Errors in Test 13 (Row Hammer) - what does it mean?

  • Errors in Test 13 (Row Hammer) - what does it mean?

    Hi, I've recently been getting BSODs on my workstation, which had been rock solid for the last 3 years until about 2 months ago. During normal usage (email, web surfing and application development) I've had no issues, but I've seen a big uptick in BSODs and application crashes to desktop when playing games and using 3D graphics software (although not when developing image processing applications). Since it only seems to happen when running graphics-intensive applications, I first ran video memory diagnostics, but those all returned zero errors, so my next stop was memtest86 v7.0. I booted it from USB with the default settings and got zero errors on pass 1 but failed on passes 2-4 in the row hammer test. There are only 3 bits flagged as bad by the report, so I'm wondering:
    1) Are these real errors?
    2) Why did I pass on pass 1 but not on passes 2-4?
    3) Is there any way to tell which DIMM the issue is in from the reported memory addresses, or do I just have to test 1 DIMM at a time until I find the bad one(s)?

    Here is the report:
    Test Start Time 2018-04-27 12:34:25
    Elapsed Time 7:47:57
    Memory Range Tested 0x0 - 81F000000 (33264MB)
    CPU Selection Mode Parallel (All CPUs)
    ECC Polling Enabled
    # Tests Passed 46/48 (95%)
    Lowest Error Address 0xA42C62E0 (2626MB)
    Highest Error Address 0x3A32C4E30 (14898MB)
    Bits in Error Mask 0000000042000100
    Bits in Error 3
    Max Contiguous Errors 1
    Test # Tests Passed Errors
    Test 0 [Address test, walking ones, 1 CPU] 4/4 (100%) 0
    Test 1 [Address test, own address, 1 CPU] 4/4 (100%) 0
    Test 2 [Address test, own address] 4/4 (100%) 0
    Test 3 [Moving inversions, ones & zeroes] 4/4 (100%) 0
    Test 4 [Moving inversions, 8-bit pattern] 4/4 (100%) 0
    Test 5 [Moving inversions, random pattern] 4/4 (100%) 0
    Test 6 [Block move, 64-byte blocks] 4/4 (100%) 0
    Test 7 [Moving inversions, 32-bit pattern] 4/4 (100%) 0
    Test 8 [Random number sequence] 4/4 (100%) 0
    Test 9 [Modulo 20, ones & zeros] 4/4 (100%) 0
    Test 10 [Bit fade test, 2 patterns, 1 CPU] 4/4 (100%) 0
    Test 13 [Hammer test] 2/4 (50%) 3
    Last 10 Errors
    [Data Error] Test: 13, CPU: 0, Address: 3A32C4E30, Expected: D1308F30, Actual: 91308F30
    [Data Error] Test: 13, CPU: 0, Address: 1032C4138, Expected: 4414C326, Actual: 4614C326
    [Data Error] Test: 13, CPU: 0, Address: A42C62E0, Expected: D85C4FDE, Actual: D85C4EDE
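
    For anyone reading the report, the summary fields can be cross-checked against the three error lines. The sketch below is only an illustration, not anything memtest86 itself provides: it XORs each Expected/Actual pair to find the flipped bit, ORs the results to rebuild the "Bits in Error Mask", and shifts each address right by 20 bits to get the MB offset shown in the report. The helper name flipped_bits is made up for this example.

```python
# Error lines copied from the report above: (address, expected, actual).
errors = [
    (0x3A32C4E30, 0xD1308F30, 0x91308F30),
    (0x1032C4138, 0x4414C326, 0x4614C326),
    (0x0A42C62E0, 0xD85C4FDE, 0xD85C4EDE),
]


def flipped_bits(expected, actual):
    """Hypothetical helper: XOR leaves a 1 in every bit position that differs."""
    return expected ^ actual


mask = 0
for address, expected, actual in errors:
    diff = flipped_bits(expected, actual)
    mask |= diff  # accumulate all differing bits across the errors
    # address >> 20 converts a byte address to whole megabytes (1 MB = 2**20 bytes)
    print(f"{address:#x} ({address >> 20}MB): flipped bits {diff:08X}")

bits_in_error = bin(mask).count("1")
print(f"Bits in Error Mask {mask:016X}")
print(f"Bits in Error {bits_in_error}")
```

    Each of the three errors differs from the expected value in exactly one bit, and the combined mask and count agree with the report's 0000000042000100 and 3.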


    and here is my system info:
    EFI Specifications 2.31
    System
    Manufacturer MSI
    Product Name MS-7752
    Version 1.0
    Serial Number To be filled by O.E.M.
    BIOS
    Vendor American Megatrends Inc.
    Version V2.11
    Release Date 07/10/2013
    Baseboard
    Manufacturer MSI
    Product Name Z77A-G45 (MS-7752)
    Version 1.0
    Serial Number To be filled by O.E.M.
    CPU Type Intel Core i5-3570K @ 3.40GHz
    CPU Clock 3400 MHz [Turbo: 4200.0 MHz]
    # Logical Processors 4
    L1 Cache 4 x 64K (117576 MB/s)
    L2 Cache 4 x 256K (63912 MB/s)
    L3 Cache 6144K (34498 MB/s)
    Memory 32740M (19496 MB/s)
    DIMM Slot #0 8GB DDR3 XMP PC3-12800
    G Skill Intl / F3-1600C9-8GSR
    9-9-9-24 / 1600 MHz / 1.500V
    DIMM Slot #1 8GB DDR3 XMP PC3-12800
    G Skill Intl / F3-1600C9-8GSR
    9-9-9-24 / 1600 MHz / 1.500V
    DIMM Slot #2 8GB DDR3 XMP PC3-12800
    G Skill Intl / F3-1600C9-8GSR
    9-9-9-24 / 1600 MHz / 1.500V
    DIMM Slot #3 8GB DDR3 XMP PC3-12800
    G Skill Intl / F3-1600C9-8GSR
    9-9-9-24 / 1600 MHz / 1.500V

    I guess that even though it's only 3 potentially bad bits, since I'm seeing recent BSOD and crash-to-desktop behavior, I should isolate the DIMM(s) and replace them. Before I do that, though, I'm wondering whether there is any chance of "false positives" and how I can tell them apart from "real" errors.

    Thanks in advance for any info/answers!

  • #2
    This page should answer all of your questions:
    https://www.memtest86.com/troubleshooting.htm


    • #3
      Hi David,
      Thanks! Based on that troubleshooting FAQ, I ran memtest86 with default settings on each individual DIMM and got 1 error on 1 DIMM in test 13. I then paired that DIMM with the other DIMM from the same kit, re-ran with default settings, and this time got 2 errors in test 13. My workstation is running 2 different 16GB (2x8GB) kits. Is it safe to assume that I only need to replace the kit containing the DIMM that failed, or is there some reason why I should replace them all? Also, the test I originally posted came back with 3 errors total in test 13, whereas re-testing with 1 DIMM and then with its pair gave 1 error and then 2 errors respectively. Is that 3rd error in the original run something I should be concerned about? I.e., should the errors "add up", and does the fact that I am only seeing 2 errors now, when there were 3 before, hint that there may be more faulty DIMMs than the 2 I have tentatively identified?

      Thanks again!


      • #4
        Did you read this section, which is specifically about Test 13 errors?
        https://www.memtest86.com/troubleshooting.htm#hammer

        If your only errors are in test 13, ignoring them might be an option. They might be a red herring, and the real cause of your BSODs might be elsewhere.
