Error reporting missing or truncated

  • Error reporting missing or truncated

    I'm running some extended tests (0, 1, and 5) repeatedly on V7.5 Pro build 1001, and I'm seeing inconsistent error reporting.

    Specifically, only a small fraction of the errors are reported in detail. The cumulative error count reaches the thousands, but only around 100 errors are listed.

    For example, one run reports 110 errors in detail for test 5, but the cumulative count is at the maximum of 10000.

    My configuration is as follows:
    TSTLIST=0,1,5
    NUMPASS=1200
    ADDRLIMLO=0x10000000
    ADDRLIMHI=0x100000000
    CPUSEL=PARALLEL
    PASS1FULL=1
    AUTOMODE=1
    REPORTNUMERRS=5000
    REPORTNUMWARN=5000
    SKIPSPLASH=1
    EXITMODE=0

  • #2
    Identical errors are filtered out.


    • #3
      Is there any way to determine from the logfile how many times a specific error was encountered?


      • #4
        No.

        V8.3 does slightly less filtering.

        See Memtest86 Version History
        "Modified behaviour for detection of duplicate errors. Errors with the same address (and bits) but occur in different tests are no longer considered to be duplicate."

        But I don't think that helps in this case.

        Generally it doesn't matter if you get 1 error, 100 errors or 1000 errors. The RAM is bad in all cases. Degrees of badness don't really matter.


        • #5
          I see, thanks for that information.

          In this case, I am intentionally inducing errors in otherwise functional memory, so the address and details of the errors are relevant.

          Is it reasonable to expect that an identical error would appear with that frequency? That is, given the space of random words that might be written back to a given address, I would expect that this would be very rare unless the PRNG had a very small period. Is that the case?


          • #6
            What method are you using to induce errors? We made a few attempts at this with different methods (e.g. a heat gun blasting hot air on RAM sticks), but it was very hard to induce just a small number of errors. There is often a tipping point where you go from no errors to lots of errors, and lots of errors typically crash the software or lock up the machine.

            I don't really understand the question about frequency and random numbers. Most of the tests don't use random numbers (but test #5 does).


            • #7
              I am inducing errors with radiation. I used test 0 and 1 to validate that the addressing was not being corrupted, and then used test 5 to identify bit errors. There were many errors in test 5, but only a handful were actually reported.

              To clarify my question about the random patterns: if repeated errors are not reported, and there are many more errors counted than reported, then there must be many cases where the exact same random value was written to the exact same address and encountered the same error. This seems unlikely, which is why I was wondering whether the random patterns were not so random after all. For example, if you use an LFSR to generate bytes, there is a fixed period for any given seed, which could potentially synchronize with the order in which addresses are written.
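
The synchronization concern raised above can be illustrated with a toy generator. The sketch below uses a hypothetical 4-bit LFSR (period 15) and made-up helper names. It is not MemTest86's actual random number generator, whose design is not described in this thread. It only shows that when the number of addresses written per pass is a multiple of the generator's period, every address receives the same value on every pass.

```python
def lfsr4_step(state):
    """Advance a toy 4-bit Fibonacci LFSR by one step (illustration only)."""
    bit = (state ^ (state >> 1)) & 1      # XOR of the two lowest bits
    return (state >> 1) | (bit << 3)      # shift right, feed the new bit in at the top


def period(step, seed):
    """Count the steps needed for the generator to return to its seed."""
    state, count = step(seed), 1
    while state != seed:
        state = step(state)
        count += 1
    return count


def pass_values(step, seed, num_addresses, num_passes):
    """Values written to each address on each pass, drawn from one continuous stream."""
    state = seed
    passes = []
    for _ in range(num_passes):
        row = []
        for _ in range(num_addresses):
            row.append(state)
            state = step(state)
        passes.append(row)
    return passes


if __name__ == "__main__":
    print("period:", period(lfsr4_step, 1))
    synced = pass_values(lfsr4_step, 1, 30, 2)    # 30 is a multiple of the period
    drifting = pass_values(lfsr4_step, 1, 16, 2)  # 16 is not
    print("30 addresses, passes identical:", synced[0] == synced[1])
    print("16 addresses, passes identical:", drifting[0] == drifting[1])
```

A real generator would have a far longer period, so this effect would only matter if the period happened to divide the number of words written per pass.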


              • #8
                In V8.3, an error is treated as a duplicate when the address, the error bits, and the test number are all the same.
                So the values written and read can differ each time, and the error will still be classified as a duplicate. This can happen if there is a weak bit in that byte that flips for various different values stored in that byte.
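
The duplicate rule described above can be sketched as follows. This is a minimal illustration and not MemTest86's implementation. The function name, the tuple layout, and the use of XOR to derive the error bits are assumptions. The per-key counter is included only to show how many errors collapse into one reported entry.

```python
from collections import Counter


def filter_duplicates(errors):
    """Keep the first error for each (test, address, error bits) key.

    errors: iterable of (test, address, expected, actual) tuples.
    Returns (reported, counts), where counts maps each key to its total hits.
    """
    counts = Counter()
    reported = []
    for test, address, expected, actual in errors:
        error_bits = expected ^ actual          # assumed: bits that differ
        key = (test, address, error_bits)
        if counts[key] == 0:
            reported.append((test, address, expected, actual))
        counts[key] += 1
    return reported, counts


if __name__ == "__main__":
    # A weak bit 3 at address 0x1000 flips for three different random values.
    errors = [
        (5, 0x1000, 0x0F, 0x07),
        (5, 0x1000, 0xA8, 0xA0),
        (5, 0x1000, 0x5C, 0x54),
        (5, 0x2000, 0xFF, 0xFE),   # a different address and a different bit
    ]
    reported, counts = filter_duplicates(errors)
    print(len(errors), "errors counted,", len(reported), "reported")
```

Here the three errors at address 0x1000 share the same error bits (0x08) even though the written values differ, so only the first is reported. This matches the pattern of a large cumulative count alongside a short detailed list.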

