Diagnosing Memtest Results- Rare bit fade and row hammer errors

  • Diagnosing Memtest Results- Rare bit fade and row hammer errors

    I recently purchased a used 16GB kit (2x8GB) of Crucial Ballistix Sport VLP DDR3 memory (1600MHz, PC3-12800, 1.35V) to match the identical kit I already have and expand from 16GB to 32GB. I'm currently rocking an MSI Z77A-GD65. Since I bought the RAM used on eBay, I decided to test it to confirm it was in working order. This is the first time I've really used memtest, and I have no prior experience with bad RAM.

    I started out by testing all 4 sticks at once. I'm providing the full results only for the first test so everyone can see the details; for the subsequent tests I'll just list the errors, since nothing else changed. The memtest readout shows the voltage as 1.5V, but I've confirmed that it's set to 1.35V in the BIOS. The other SPD setting is 1.5V, which I think is what memtest is choosing to display in the results for some reason. I don't have any XMP profile activated, my BIOS is updated to the latest version, and all the settings are at their defaults.

    Summary

    Report Date 2019-10-31 08:14:25
    Generated by MemTest86 V8.2 Free (64-bit)
    Result FAIL
    System Information

    EFI Specifications 2.31
    System
    Manufacturer MSI
    Product Name MS-7751
    Version 2.0
    Serial Number To be filled by O.E.M.
    BIOS
    Vendor American Megatrends Inc.
    Version V10.11
    Release Date 10/09/2013
    Baseboard
    Manufacturer MSI
    Product Name Z77A-GD65 (MS-7751)
    Version 2.0
    Serial Number To be filled by O.E.M.
    CPU Type Intel Core i5-3570 @ 3.40GHz
    CPU Clock 3400 MHz [Turbo: 3800.0 MHz]
    # Logical Processors 4
    L1 Cache 4 x 64K (93714 MB/s)
    L2 Cache 4 x 256K (52491 MB/s)
    L3 Cache 6144K (31109 MB/s)
    Memory 32740M (19095 MB/s)
    DIMM Slot #0 8GB DDR3 XMP PC3-12800
    Crucial Technology / BLS8G3D1609ES2LX0. / A9022A79
    9-9-9-24 / 1600 MHz / 1.500V
    DIMM Slot #1 8GB DDR3 XMP PC3-12800
    Crucial Technology / BLS8G3D1609ES2LX0. / A9022563
    9-9-9-24 / 1600 MHz / 1.500V
    DIMM Slot #2 8GB DDR3 XMP PC3-12800
    Crucial Technology / BLS8G3D1609ES2LX0. / A60ACDE8
    9-9-9-24 / 1600 MHz / 1.500V
    DIMM Slot #3 8GB DDR3 XMP PC3-12800
    Crucial Technology / BLS8G3D1609ES2LX0. / A90224CF
    9-9-9-24 / 1600 MHz / 1.500V
    Result summary

    Test Start Time 2019-10-30 22:50:09
    Elapsed Time 8:48:23
    Memory Range Tested 0x0 - 81F000000 (33264MB)
    CPU Selection Mode Parallel (All CPUs)
    ECC Polling Enabled
    # Tests Passed 44/48 (91%)
    Lowest Error Address 0x16C3AD628 (5827MB)
    Highest Error Address 0x7EA1A559C (32417MB)
    Bits in Error Mask 0000000000001028
    Bits in Error 3
    Max Contiguous Errors 1
    Test # Tests Passed Errors
    Test 0 [Address test, walking ones, 1 CPU] 4/4 (100%) 0
    Test 1 [Address test, own address, 1 CPU] 4/4 (100%) 0
    Test 2 [Address test, own address] 4/4 (100%) 0
    Test 3 [Moving inversions, ones & zeroes] 4/4 (100%) 0
    Test 4 [Moving inversions, 8-bit pattern] 4/4 (100%) 0
    Test 5 [Moving inversions, random pattern] 4/4 (100%) 0
    Test 6 [Block move, 64-byte blocks] 4/4 (100%) 0
    Test 7 [Moving inversions, 32-bit pattern] 4/4 (100%) 0
    Test 8 [Random number sequence] 4/4 (100%) 0
    Test 9 [Modulo 20, ones & zeros] 4/4 (100%) 0
    Test 10 [Bit fade test, 2 patterns, 1 CPU] 2/4 (50%) 2
    Test 13 [Hammer test] 2/4 (50%) 3
    Last 10 Errors
    2019-10-31 07:21:22 - [Data Error] Test: 13, CPU: 0, Address: 5B010CD30, Expected: D9F456B5, Actual: D9F45695
    2019-10-31 06:33:17 - [Data Error] Test: 10, CPU: 0, Address: 7EA1A559C, Expected: 00000000, Actual: 00000008
    2019-10-31 02:34:11 - [Data Error] Test: 13, CPU: 0, Address: 5B010CD30, Expected: 496C0D30, Actual: 496C0D10
    2019-10-31 02:03:34 - [Data Error] Test: 13, CPU: 0, Address: 16C3AD628, Expected: DA4D2BCE, Actual: DA4D3BCE
    2019-10-31 01:45:57 - [Data Error] Test: 10, CPU: 0, Address: 7EA1A559C, Expected: 00000000, Actual: 00000008
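The "Bits in Error Mask" value can be cross-checked against the error list above: XOR each Expected value with its Actual value to get the flipped bits, then OR the results together. The following is a minimal sketch of that arithmetic, not MemTest86's own code; the function names are made up for illustration.

```python
# (expected, actual) pairs copied from the "Last 10 Errors" list above.
errors = [
    (0xD9F456B5, 0xD9F45695),
    (0x00000000, 0x00000008),
    (0x496C0D30, 0x496C0D10),
    (0xDA4D2BCE, 0xDA4D3BCE),
    (0x00000000, 0x00000008),
]

def error_mask(pairs):
    """OR together the XOR of every expected/actual pair."""
    mask = 0
    for expected, actual in pairs:
        mask |= expected ^ actual
    return mask

def flipped_bits(mask):
    """Return the positions (0 = least significant) of the set bits."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

mask = error_mask(errors)
print(f"{mask:016X}")      # prints 0000000000001028
print(flipped_bits(mask))  # prints [3, 5, 12]
```

Every individual error here is a single-bit flip, and the three distinct bit positions account for the "Bits in Error 3" line in the report.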

    After the first test with all 4 sticks showed failures, I took out the eBay sticks and tested just my original 2 sticks, which passed with no errors. Next, I tested just the eBay RAM, in the same slots as the RAM that had just passed, and got the following results:
    Lowest Error Address 0x1AF50DB38 (6901MB)
    Highest Error Address 0x3F50D551C (16208MB)
    Bits in Error Mask 0000000000000108
    Bits in Error 2
    Max Contiguous Errors 1
    Test 10 [Bit fade test, 2 patterns, 1 CPU] 0/4 (0%) 7
    2019-10-31 23:35:55 - [Data Error] Test: 10, CPU: 0, Address: 3F50D551C, Expected: 00000000, Actual: 00000008
    2019-10-31 23:35:53 - [Data Error] Test: 10, CPU: 0, Address: 1AF50DB38, Expected: 00000000, Actual: 00000100
    2019-10-31 21:22:30 - [Data Error] Test: 10, CPU: 0, Address: 3F50D551C, Expected: 00000000, Actual: 00000008
    2019-10-31 21:22:28 - [Data Error] Test: 10, CPU: 0, Address: 1AF50DB38, Expected: 00000000, Actual: 00000100
    2019-10-31 20:17:37 - [Data Error] Test: 10, CPU: 0, Address: 3F50D551C, Expected: 00000000, Actual: 00000008
    2019-10-31 20:17:35 - [Data Error] Test: 10, CPU: 0, Address: 1AF50DB38, Expected: 00000000, Actual: 00000100

    Seems reasonable enough: it looks like one of the eBay sticks is bad, and it's likely not the CPU/cache or the memory channels, because I had just verified those with the previous test. Also, the errors are occurring at the same addresses on different passes, which points even more definitively to the RAM.
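To make the "same addresses on different passes" observation concrete, the six logged errors from the eBay-only run can be grouped by address. This is a rough sketch that assumes the log line layout shown above; the regular expression and the `summarize` helper are illustrative, not part of MemTest86.

```python
import re
from collections import defaultdict

# Error lines copied verbatim from the eBay-only run above.
log = """\
2019-10-31 23:35:55 - [Data Error] Test: 10, CPU: 0, Address: 3F50D551C, Expected: 00000000, Actual: 00000008
2019-10-31 23:35:53 - [Data Error] Test: 10, CPU: 0, Address: 1AF50DB38, Expected: 00000000, Actual: 00000100
2019-10-31 21:22:30 - [Data Error] Test: 10, CPU: 0, Address: 3F50D551C, Expected: 00000000, Actual: 00000008
2019-10-31 21:22:28 - [Data Error] Test: 10, CPU: 0, Address: 1AF50DB38, Expected: 00000000, Actual: 00000100
2019-10-31 20:17:37 - [Data Error] Test: 10, CPU: 0, Address: 3F50D551C, Expected: 00000000, Actual: 00000008
2019-10-31 20:17:35 - [Data Error] Test: 10, CPU: 0, Address: 1AF50DB38, Expected: 00000000, Actual: 00000100
"""

LINE = re.compile(
    r"Address: ([0-9A-F]+), Expected: ([0-9A-F]+), Actual: ([0-9A-F]+)"
)

def summarize(text):
    """Map each failing address to its hit count and set of flipped-bit masks."""
    summary = defaultdict(lambda: {"hits": 0, "flips": set()})
    for match in LINE.finditer(text):
        address = int(match.group(1), 16)
        flip = int(match.group(2), 16) ^ int(match.group(3), 16)
        summary[address]["hits"] += 1
        summary[address]["flips"].add(flip)
    return dict(summary)

for address, info in sorted(summarize(log).items()):
    flips = ", ".join(f"{f:08X}" for f in sorted(info["flips"]))
    # address >> 20 converts a byte address to whole megabytes.
    print(f"0x{address:X} ({address >> 20}MB): {info['hits']} hits, mask {flips}")
```

Both addresses fail three times each, always with the same single bit set, which is the pattern one would expect from a weak cell rather than from random noise.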

    I then tested both eBay sticks individually through the full set of tests, and they passed with no errors. I then ran each stick through again, specifying only the bit fade test and the row hammer test, and they still didn't show any errors (I just used the default of 4 passes for all tests). Next, I tried to reproduce the error by running both sticks of eBay RAM in the same slots as before, again specifying only the bit fade and row hammer tests to save time. These are the results:
    Lowest Error Address 0x1E488F1A4 (7752MB)
    Highest Error Address 0x1E488F1A4 (7752MB)
    Bits in Error Mask 0000000020000000
    Bits in Error 1
    Max Contiguous Errors 1
    Test # Tests Passed Errors
    Test 10 [Bit fade test, 2 patterns, 1 CPU] 4/4 (100%) 0
    Test 13 [Hammer test] 3/4 (75%) 1
    Last 10 Errors
    2019-11-02 17:11:40 - [Data Error] Test: 13, CPU: 0, Address: 1E488F1A4, Expected: DCD92510, Actual: FCD92510

    Part of me wants to ignore it and go on with my life, but I've read that any error in memtest is a legitimate error and should be corrected. I don't do anything requiring fault-tolerant RAM, but I'm not a fan of applications crashing for no reason. That being said, I don't think I should ignore it. Also, this is fun.

    I should also mention that I've run about 8 hours of Prime95 torture testing and seen no errors yet.

    So I guess I have a few questions:
    1) Is my analysis of these results faulty and in need of correction?
    2) Is this just one of those cases where I need to test a single stick of RAM for 24 hours to catch the error?
    3) If 2, then is it really worth replacing the RAM?
    4) Is it common that 4 passes isn't enough to detect any errors? Should I trust the results that indicate my 2 original sticks of RAM are good?
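As a rough way to think about questions 2 and 4: if an intermittent fault shows up in any given pass with probability p, and passes are treated as independent (a simplifying assumption that may not hold for real DRAM faults), the chance that n passes all miss it is (1 - p)^n. The per-pass probabilities below are hypothetical; 0.25 is chosen only because the last hammer run failed 1 pass out of 4.

```python
def miss_probability(p_per_pass, passes):
    """Chance that every pass misses a fault that appears with
    probability p_per_pass in each independent pass."""
    return (1.0 - p_per_pass) ** passes

# Hypothetical per-pass detection probabilities, not measured values.
for p in (0.25, 0.10, 0.05):
    for passes in (4, 8, 16):
        print(f"p={p:.2f}, passes={passes:2d}: "
              f"miss chance {miss_probability(p, passes):.1%}")
```

Under these assumptions, a fault that appears in a quarter of passes would still slip through a clean 4-pass run roughly a third of the time, so a single clean run is weak evidence either way.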


    Happy to provide more information as needed.

  • #2
    Errors in Test 13 (row hammer) can be less serious. See:
    https://www.memtest86.com/troubleshooting.htm#hammer
    They are more of a design issue than a hardware fault.

    Any RAM test will eventually produce errors if you run it long enough (soft errors). See:
    https://en.wikipedia.org/wiki/Soft_e...of_soft_errors
    Of course using ECC will dramatically decrease the frequency of soft errors.

    I would suspect that the eBay RAM is marginal, just on the edge of working / not working. A slightly higher voltage or a higher refresh rate might make it more stable.
