Making Sense of These Results ...

  • Making Sense of These Results ...

    Hello,

    My recently-upgraded system started exhibiting behavior I've recognized in the past as RAM errors, with programs closing/hanging unexpectedly and randomly, as well as strange OS behavior while trying to boot to MemTest.

    Sure enough, when I came back later in the day I found that MemTest had found errors. A lot of them. However, the results don't quite make sense to me; I'm used to the original MemTest86+ version, which would list the details of the errors as they occurred. This version of MemTest says there were 301 total errors and that only 54% of the tests passed, but showed only a single error's details. It wasn't even the last error, because more happened during the final hammer test.

    In passes 1 and 2, the system warned that the RAM "may be vulnerable to high frequency row hammer bit flips" but did not do so for passes 3 and 4, which signified to me that it's not just a hammer test vulnerability, but that something really is wrong with the sticks.

    The system is currently using four Corsair 16GB DDR4 sticks, all recently bought. However, when I checked detailed SPD information, the stick in DIMM Slot #1 was not reporting correctly; it reported as 16GB DDR4 PC4-17000 and 15-15-15-36 / 2134 MHz / 1.2V while the other three reported as 16GB DDR4 XMP PC4-25600 and 16-18-18-36 / 3200 MHz / 1.350V. The misreporting stick also did not identify itself as a Corsair product, or ... anything, really. When I rebooted to double check the UEFI, all four sticks reported correctly, and when I booted back into MemTest the misreporting stick started reporting correctly as well.

    So ... what's going on? Could the misreporting stick be the one having RAM issues? Why did MemTest report the details of only one error when it had detected 301?

    Here's the log file in its entirety.
    Summary

    Report Date 2019-09-28 20:55:53
    Generated by MemTest86 V8.2 Free (64-bit)
    Result FAIL
    System Information

    EFI Specifications 2.40
    System
    Manufacturer ASUS
    Product Name All Series
    Version System Version
    Serial Number System Serial Number
    BIOS
    Vendor American Megatrends Inc.
    Version 3902
    Release Date 04/19/2018
    Baseboard
    Manufacturer ASUSTeK COMPUTER INC.
    Product Name SABERTOOTH X99
    Version Rev 1.xx
    Serial Number 170294965100026
    CPU Type Intel Core i7-6950X @ 3.00GHz
    CPU Clock 2998 MHz [Turbo: 3398.0 MHz]
    # Logical Processors 20 (10 enabled for testing)
    L1 Cache 8 x 64K (148781 MB/s)
    L2 Cache 8 x 256K (47185 MB/s)
    L3 Cache 25600K (29024 MB/s)
    Memory 65456M (13509 MB/s)
    DIMM Slot #0 16GB DDR4 XMP PC4-25600
    Corsair / CMR32GX4M2C3200C16
    16-18-18-36 / 3200 MHz / 1.350V
    DIMM Slot #1 16GB DDR4 PC4-17000
    15-15-15-36 / 2134 MHz / 1.2V
    DIMM Slot #2 16GB DDR4 XMP PC4-25600
    Corsair / CMR32GX4M2C3200C16
    16-18-18-36 / 3200 MHz / 1.350V
    DIMM Slot #3 16GB DDR4 XMP PC4-25600
    Corsair / CMR32GX4M2C3200C16
    16-18-18-36 / 3200 MHz / 1.350V
    Result summary

    Test Start Time 2019-09-28 09:29:53
    Elapsed Time 11:15:30
    Memory Range Tested 0x0 - 1040000000 (66560MB)
    CPU Selection Mode Parallel (All CPUs)
    ECC Polling Enabled
    # Tests Passed 26/48 (54%)
    Lowest Error Address 0xB4F94BCB8 (46329MB)
    Highest Error Address 0xB4F94BCB8 (46329MB)
    Bits in Error Mask 0000000004000000
    Bits in Error 1
    Max Contiguous Errors 1
    Test                                        # Tests Passed  Errors
    Test 0 [Address test, walking ones, 1 CPU]  4/4 (100%)      0
    Test 1 [Address test, own address, 1 CPU]   4/4 (100%)      0
    Test 2 [Address test, own address]          4/4 (100%)      0
    Test 3 [Moving inversions, ones & zeroes]   3/4 (75%)       6
    Test 4 [Moving inversions, 8-bit pattern]   2/4 (50%)       19
    Test 5 [Moving inversions, random pattern]  1/4 (25%)       27
    Test 6 [Block move, 64-byte blocks]         4/4 (100%)      0
    Test 7 [Moving inversions, 32-bit pattern]  0/4 (0%)        112
    Test 8 [Random number sequence]             0/4 (0%)        21
    Test 9 [Modulo 20, ones & zeros]            4/4 (100%)      0
    Test 10 [Bit fade test, 2 patterns, 1 CPU]  0/4 (0%)        4
    Test 13 [Hammer test]                       0/4 (0%)        112
    Last 10 Errors
    2019-09-28 09:32:36 - [Data Error] Test: 4, CPU: 0, Address: B4F94BCB8, Expected: 80808080, Actual: 84808080
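As a sanity check on the summary figures, the 301 total and the 26/48 (54%) pass rate can be recomputed from the per-test rows of the report above. This is only a minimal sketch: the rows are copied by hand from the log, and the variable names are made up for illustration.

```python
# Per-test rows copied from the first report:
# (test number, passes, attempts, errors)
rows = [
    (0, 4, 4, 0),
    (1, 4, 4, 0),
    (2, 4, 4, 0),
    (3, 3, 4, 6),
    (4, 2, 4, 19),
    (5, 1, 4, 27),
    (6, 4, 4, 0),
    (7, 0, 4, 112),
    (8, 0, 4, 21),
    (9, 4, 4, 0),
    (10, 0, 4, 4),
    (13, 0, 4, 112),
]

total_errors = sum(errors for _, _, _, errors in rows)
passed = sum(passes for _, passes, _, _ in rows)
attempted = sum(attempts for _, _, attempts, _ in rows)
pass_percent = 100 * passed // attempted  # rounded down, as in the report

print(f"{total_errors} errors, {passed}/{attempted} passed ({pass_percent}%)")
```

So the totals in the summary are consistent with the table, even though only one error's details appear under "Last 10 Errors".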

  • #2
    At the moment, the list of errors is filtered so that an error is logged only if its address is different from the previous address
    (i.e. errors are filtered out if they look like the same error repeated multiple times).
    We might change this in a future release so that filtering applies only when the test number, address, and bitmask all match.

    Anyway it looks like the RAM is bad. This page might help.
    https://www.memtest86.com/troubleshooting.htm
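The two filtering rules described above can be sketched roughly as follows. This is not MemTest86's actual code, just an illustration of the description: `MemError`, `filter_by_address`, and `filter_by_test_address_mask` are hypothetical names, and only the first sample entry comes from the posted log; the other two are invented to show the difference.

```python
from typing import NamedTuple


class MemError(NamedTuple):
    test: int
    address: int
    expected: int
    actual: int

    @property
    def bitmask(self) -> int:
        # Bits that differ between the expected and actual values.
        return self.expected ^ self.actual


def filter_by_address(errors):
    """Rule as described for the current release: log an error only
    if its address differs from the previous error's address."""
    logged, prev_address = [], None
    for err in errors:
        if err.address != prev_address:
            logged.append(err)
        prev_address = err.address
    return logged


def filter_by_test_address_mask(errors):
    """Possible future rule: drop an error only when test number,
    address, and bitmask all match the previous error."""
    logged, prev_key = [], None
    for err in errors:
        key = (err.test, err.address, err.bitmask)
        if key != prev_key:
            logged.append(err)
        prev_key = key
    return logged


sample = [
    MemError(4, 0xB4F94BCB8, 0x80808080, 0x84808080),  # from the posted log
    MemError(4, 0xB4F94BCB8, 0x80808080, 0x84808080),  # hypothetical repeat
    MemError(7, 0xB4F94BCB8, 0x00000001, 0x04000001),  # hypothetical, other test
]

print(len(filter_by_address(sample)))            # address-only rule
print(len(filter_by_test_address_mask(sample)))  # test + address + bitmask rule
```

Under the address-only rule, every error at one address collapses into a single log line, which would explain a report with hundreds of errors but one detailed entry.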


    • #3
      Does that mean all 300+ errors were happening at address B4F94BCB8?

      I spent the rest of the day running more tests after trying to determine which RAM stick was which. Based on what I was able to look up with SIV it seems like the RAM was set up in my system this way:

      [Empty] [Dimm #0] [Empty] [Dimm #1] [CPU] [Dimm #3] [Empty] [Dimm #2] [Empty]

      I'll be calling them sticks A, B, C and D for clarity here. I took stick B out and ran MemTest again:
      Summary

      Report Date 2019-09-29 19:01:08
      Generated by MemTest86 V8.2 Free (64-bit)
      Result FAIL
      System Information

      EFI Specifications 2.40
      System
      Manufacturer ASUS
      Product Name All Series
      Version System Version
      Serial Number System Serial Number
      BIOS
      Vendor American Megatrends Inc.
      Version 3902
      Release Date 04/19/2018
      Baseboard
      Manufacturer ASUSTeK COMPUTER INC.
      Product Name SABERTOOTH X99
      Version Rev 1.xx
      Serial Number 170294965100026
      CPU Type Intel Core i7-6950X @ 3.00GHz
      CPU Clock 2998 MHz [Turbo: 3398.0 MHz]
      # Logical Processors 20 (10 enabled for testing)
      L1 Cache 8 x 64K (154346 MB/s)
      L2 Cache 8 x 256K (47118 MB/s)
      L3 Cache 25600K (28984 MB/s)
      Memory 49072M (13707 MB/s)
      DIMM Slot #0 16GB DDR4 XMP PC4-25600
      Corsair / CMR32GX4M2C3200C16
      16-18-18-36 / 3200 MHz / 1.350V
      DIMM Slot #1 16GB DDR4 XMP PC4-25600
      Corsair / CMR32GX4M2C3200C16
      16-18-18-36 / 3200 MHz / 1.350V
      DIMM Slot #2 16GB DDR4 XMP PC4-25600
      Corsair / CMR32GX4M2C3200C16
      16-18-18-36 / 3200 MHz / 1.350V
      Result summary

      Test Start Time 2019-09-29 11:14:58
      Elapsed Time 7:43:52
      Memory Range Tested 0x0 - C40000000 (50176MB)
      CPU Selection Mode Parallel (All CPUs)
      ECC Polling Enabled
      # Tests Passed 37/48 (77%)
      Lowest Error Address 0x88BAF8D38 (35002MB)
      Highest Error Address 0x88BAF8D38 (35002MB)
      Bits in Error Mask 0000000004000000
      Bits in Error 1
      Max Contiguous Errors 1
      Test                                        # Tests Passed  Errors
      Test 0 [Address test, walking ones, 1 CPU]  4/4 (100%)      0
      Test 1 [Address test, own address, 1 CPU]   4/4 (100%)      0
      Test 2 [Address test, own address]          4/4 (100%)      0
      Test 3 [Moving inversions, ones & zeroes]   4/4 (100%)      0
      Test 4 [Moving inversions, 8-bit pattern]   3/4 (75%)       1
      Test 5 [Moving inversions, random pattern]  3/4 (75%)       11
      Test 6 [Block move, 64-byte blocks]         4/4 (100%)      0
      Test 7 [Moving inversions, 32-bit pattern]  2/4 (50%)       18
      Test 8 [Random number sequence]             2/4 (50%)       6
      Test 9 [Modulo 20, ones & zeros]            4/4 (100%)      0
      Test 10 [Bit fade test, 2 patterns, 1 CPU]  0/4 (0%)        4
      Test 13 [Hammer test]                       3/4 (75%)       1
      Last 10 Errors
      2019-09-29 11:42:49 - [Data Error] Test: 10, CPU: 0, Address: 88BAF8D38, Expected: 00000000, Actual: 04000000
      Fewer errors generated, but I'm still getting errors ... as well as a different failing memory address reported than before. So after that, I pulled out stick C, as it was in a mirroring slot from stick B. I ran the test again:
      Summary

      Report Date 2019-09-30 02:35:10
      Generated by MemTest86 V8.2 Free (64-bit)
      Result FAIL
      System Information

      EFI Specifications 2.40
      System
      Manufacturer ASUS
      Product Name All Series
      Version System Version
      Serial Number System Serial Number
      BIOS
      Vendor American Megatrends Inc.
      Version 3902
      Release Date 04/19/2018
      Baseboard
      Manufacturer ASUSTeK COMPUTER INC.
      Product Name SABERTOOTH X99
      Version Rev 1.xx
      Serial Number 170294965100026
      CPU Type Intel Core i7-6950X @ 3.00GHz
      CPU Clock 2998 MHz [Turbo: 3398.0 MHz]
      # Logical Processors 20 (10 enabled for testing)
      L1 Cache 8 x 64K (153882 MB/s)
      L2 Cache 8 x 256K (47130 MB/s)
      L3 Cache 25600K (29070 MB/s)
      Memory 32688M (13412 MB/s)
      DIMM Slot #0 16GB DDR4 XMP PC4-25600
      Corsair / CMR32GX4M2C3200C16
      16-18-18-36 / 3200 MHz / 1.350V
      DIMM Slot #1 16GB DDR4 XMP PC4-25600
      Corsair / CMR32GX4M2C3200C16
      16-18-18-36 / 3200 MHz / 1.350V
      Result summary

      Test Start Time 2019-09-29 19:20:02
      Elapsed Time 3:51:13
      Memory Range Tested 0x0 - 840000000 (33792MB)
      CPU Selection Mode Parallel (All CPUs)
      ECC Polling Enabled
      # Tests Passed 39/44 (88%)
      Lowest Error Address 0x5C7CA5E38 (23676MB)
      Highest Error Address 0x5C7CA5E38 (23676MB)
      Bits in Error Mask 0000000004000000
      Bits in Error 1
      Max Contiguous Errors 1
      Test                                        # Tests Passed  Errors
      Test 0 [Address test, walking ones, 1 CPU]  4/4 (100%)      0
      Test 1 [Address test, own address, 1 CPU]   4/4 (100%)      0
      Test 2 [Address test, own address]          4/4 (100%)      0
      Test 3 [Moving inversions, ones & zeroes]   4/4 (100%)      0
      Test 4 [Moving inversions, 8-bit pattern]   4/4 (100%)      0
      Test 5 [Moving inversions, random pattern]  4/4 (100%)      0
      Test 6 [Block move, 64-byte blocks]         4/4 (100%)      0
      Test 7 [Moving inversions, 32-bit pattern]  3/4 (75%)       1
      Test 8 [Random number sequence]             2/4 (50%)       2
      Test 9 [Modulo 20, ones & zeros]            4/4 (100%)      0
      Test 10 [Bit fade test, 2 patterns, 1 CPU]  2/4 (50%)       2
      Last 10 Errors
      2019-09-29 19:35:33 - [Data Error] Test: 8, CPU: 12, Address: 5C7CA5E38, Expected: 136E2C92, Actual: 176E2C92
      Even fewer errors, but still errors ... and again a different memory address reported than the last time. Guessing that the 23676 MB range had to be stick D given the order of the DIMMs, I pulled it out and ran one more test.

      Although I took pictures I forgot to save the report ... but the lone stick, stick A, passed.

      If I'm reading these results right, does this mean all three of the sticks I pulled are bad? Or is it possible only one stick is bad and I kept pulling the wrong ones with the other tests? I'm planning on testing some of the other RAM slots with this stick that passed, just to make sure this is not a motherboard issue, but I wanted to post these results and get an opinion before I leave for work.
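To compare the three runs side by side, the one detailed error from each report can be decoded: XOR-ing the expected and actual values gives the flipped bits, and shifting the address right by 20 bits gives its position in MB. This is a rough sketch using only the values from the logs above; the `decode` helper and the run labels are made up for illustration.

```python
# (address, expected, actual) copied from the "Last 10 Errors" line of each run
errors = {
    "4 sticks": ("B4F94BCB8", "80808080", "84808080"),
    "3 sticks": ("88BAF8D38", "00000000", "04000000"),
    "2 sticks": ("5C7CA5E38", "136E2C92", "176E2C92"),
}


def decode(address_hex, expected_hex, actual_hex):
    """Return (address in MB, error bitmask, list of flipped bit positions)."""
    address = int(address_hex, 16)
    mask = int(expected_hex, 16) ^ int(actual_hex, 16)
    flipped_bits = [bit for bit in range(64) if (mask >> bit) & 1]
    return address >> 20, mask, flipped_bits


for label, fields in errors.items():
    mb, mask, bits = decode(*fields)
    print(f"{label}: {mb} MB, mask {mask:016X}, flipped bits {bits}")
```

The addresses differ between runs, but the decoded mask is 0000000004000000 each time, matching the "Bits in Error Mask" line in all three reports.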

      Thanks!


      • #4
        Likely a single stick is bad.
        It is impossible to know which stick is used for which addresses. The addresses are interleaved, so high addresses aren't just on one stick.
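A toy model shows why interleaving breaks the "high address means last stick" intuition. Everything here is an assumption for illustration: four channels, a 64-byte granularity, and a plain modulo mapping. Real memory controllers use their own, generally more complicated, mapping, so this does not say which stick actually holds the failing address.

```python
def toy_channel(address: int, channels: int = 4, granularity: int = 64) -> int:
    """Hypothetical mapping: rotate to the next channel every `granularity` bytes."""
    return (address // granularity) % channels


base = 0xB4F94BCB8  # error address from the first report

# Four consecutive 64-byte blocks starting at the error address
neighbours = [toy_channel(base + i * 64) for i in range(4)]
print(neighbours)
```

Even in this simplified model, four neighbouring 64-byte blocks land on four different channels, so an address in the 46329 MB range says nothing about physical slot order.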
