3 Month old build random frequent memory related BSOD confusing test results

  • 3 Month old build random frequent memory related BSOD confusing test results

    3 month old PC. Hardware is as follows:

    Asus Maximus IX Formula
    Intel Core i7-7700K @ 4.20 GHz
    4x G.Skill 16GB DDR4-3600 F4-3600C17-16GTZKW
    2x Asus GeForce GTX 1080 TI 11GB GDDR5X Founders Edition
    Primary Drive: Samsung 850 Pro 2.5" 1TB SATA III 3D Internal SSD
    Secondary Drive: WD Black 6TB HDD
    Corsair Hydro Series H100i V2 CPU cooler
    1x NZXT AC-IUSBH-M1 USB Hub
    Thermaltake RGB 1250W 80 PLUS TITANIUM Power Supply
    3.5" Rosewill 40:1 internal card reader
    LB 16x Blu-Ray rewriter
    Cooler Master HAF 932 Full Tower
    Windows 10 Pro 64-bit

    Over the last 2-3 weeks I have been getting random memory-related BSODs. I've reinstalled the OS twice. I've tried removing all peripherals. I tried replacing the RAM with an entirely different set. Every configuration produces the same type of BSOD, within seconds, minutes, or hours of booting up. The more actively I try to load programs after boot, the sooner it crashes. If I try to log in as soon as the login screen appears, it crashes. If I wait 60-120 seconds, it doesn't. Once I'm at the desktop, if I try to load any programs within 60-120 seconds, it crashes. If I wait, it generally doesn't, until I load something like a big game such as Battlefield 1, BF4, or WoW. Sometimes it will run for a few hours, sometimes not.

    After reinstalling the OS from scratch to rule out a driver problem, I figured it must be hardware. I'd seen sporadic reports that my SteelSeries Z board could cause problems with my video cards, so I tried swapping it out for a plain old USB keyboard. No luck. No matter what configuration I tried, the failures persisted.

    I moved on to testing the core components, figuring they were the only remaining things I hadn't changed. Went out and bought new RAM, 4 sticks of Corsair DDR4-3000 RGB Vengeance. The problem persisted.

    Downloaded MemTest86 from this site and began testing the original RAM. The first test was all four sticks in all four slots.
    Test Start Time 2017-09-24 15:05:01
    Elapsed Time 10:57:32
    Memory Range Tested 0x0 - 107F000000 (67568MB)
    CPU Selection Mode Parallel (All CPUs)
    ECC Polling Enabled
    # Tests Passed 32/48 (66%)
    Lowest Error Address 0x1971A0 (1MB)
    Highest Error Address 0x10700F51BC (67328MB)
    Bits in Error Mask 00000000FFFFFFFF
    Bits in Error 32
    Max Contiguous Errors 2
    Test                                        # Tests Passed  Errors
    Test 0 [Address test, walking ones, 1 CPU]  4/4 (100%)      0
    Test 1 [Address test, own address, 1 CPU]   4/4 (100%)      0
    Test 2 [Address test, own address]          4/4 (100%)      0
    Test 3 [Moving inversions, ones & zeroes]   2/4 (50%)       32
    Test 4 [Moving inversions, 8-bit pattern]   2/4 (50%)       64
    Test 5 [Moving inversions, random pattern]  0/4 (0%)        96
    Test 6 [Block move, 64-byte blocks]         4/4 (100%)      0
    Test 7 [Moving inversions, 32-bit pattern]  0/4 (0%)        17176
    Test 8 [Random number sequence]             4/4 (100%)      0
    Test 9 [Modulo 20, ones & zeros]            0/4 (0%)        177
    Test 10 [Bit fade test, 2 patterns, 1 CPU]  4/4 (100%)      0
    Test 13 [Hammer test]                       4/4 (100%)      0
    Last 10 Errors
    [Data Error] Test: 9, CPU: 2, Address: 1051902A2C, Expected: 7A62D708, Actual: 859D28F7
    [Data Error] Test: 9, CPU: 2, Address: F8338DD30, Expected: 67722560, Actual: 988DDA9F
    [Data Error] Test: 9, CPU: 2, Address: F70EA9B10, Expected: 2EF0E0D0, Actual: D10F1F2F
    [Data Error] Test: 9, CPU: 2, Address: ED08D2C88, Expected: CC2F6370, Actual: 33D09C8F
    [Data Error] Test: 9, CPU: 2, Address: EC111ED34, Expected: 6D56AF43, Actual: 92A950BC
    [Data Error] Test: 9, CPU: 2, Address: EC0077D00, Expected: 6D56AF43, Actual: 92A950BC
    [Data Error] Test: 9, CPU: 2, Address: E43B84CB8, Expected: 2909AA9A, Actual: D6F65565
    [Data Error] Test: 9, CPU: 2, Address: D80CA09A4, Expected: B61544D6, Actual: 49EABB29
    [Data Error] Test: 9, CPU: 2, Address: D40B9ED98, Expected: 228E5A95, Actual: DD71A56A
    [Data Error] Test: 9, CPU: 2, Address: D31C2BEB0, Expected: 256EBB70, Actual: DA91448F
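One way to read the "Last 10 Errors" lines is to XOR each Expected value with its Actual value, which gives the mask of bits that came back wrong. The sketch below does that for the ten pairs listed above. The helper name `error_bits` is made up for illustration and is not part of MemTest86. If every mask comes out as FFFFFFFF, the word read back was the exact bitwise complement of what was expected. That would be consistent with the "Bits in Error Mask 00000000FFFFFFFF" line, and it may point toward a lost or misplaced write rather than a few stuck bits, though that interpretation is an assumption and not something the report states.

```python
# (expected, actual) pairs copied from the "Last 10 Errors" lines of the report.
errors = [
    (0x7A62D708, 0x859D28F7),
    (0x67722560, 0x988DDA9F),
    (0x2EF0E0D0, 0xD10F1F2F),
    (0xCC2F6370, 0x33D09C8F),
    (0x6D56AF43, 0x92A950BC),
    (0x6D56AF43, 0x92A950BC),
    (0x2909AA9A, 0xD6F65565),
    (0xB61544D6, 0x49EABB29),
    (0x228E5A95, 0xDD71A56A),
    (0x256EBB70, 0xDA91448F),
]


def error_bits(expected, actual):
    """Return the 32-bit mask of bits that differ (hypothetical helper)."""
    return (expected ^ actual) & 0xFFFFFFFF


masks = [error_bits(expected, actual) for expected, actual in errors]

for (expected, actual), mask in zip(errors, masks):
    flipped = bin(mask).count("1")
    print(f"{expected:08X} ^ {actual:08X} = {mask:08X} ({flipped} bits differ)")

# True only if every failing word is the full complement of the expected word.
all_inverted = all(mask == 0xFFFFFFFF for mask in masks)
print("every word fully inverted:", all_inverted)
```

The same check can be run on the saved HTML logs from the single-stick runs by pasting in their Expected/Actual pairs, to see whether the pattern holds there too.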

    Results were not pleasing. I decided that to best isolate the problem, I would test one stick at a time, in each slot sequentially, for a total of 16 tests. I won't list out the complete results here unless asked to do so, as that is a lot of data. However, these are the results for the first three sticks:

    Test 1: All 4 sticks all 4 slots - failed (#3,4,5,7,9 - 17545)
    Test 2: Stick 1 Slot 1 - passed
    Test 3: Stick 1 Slot 2 - failed (#7-16)
    Test 4: Stick 1 Slot 3 - failed (#4,7-66)
    Test 5: Stick 1 Slot 4 - failed (#9-1)
    Test 6: Stick 2 slot 1 - failed (#7,9-67)
    Test 7: Stick 2 slot 2 - passed
    Test 8: Stick 2 slot 3 - failed (#7-16)
    Test 9: Stick 2 slot 4 - passed
    Test 10: Stick 3 slot 1 - failed (#4,7,9-9
    Test 11: Stick 3 slot 2 - failed (#5,7-32)
    Test 12: Stick 3 slot 3 - failed (#7-16)
    Test 13: Stick 3 slot 4 - failed (#7-16)
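The single-stick results above (Tests 2 through 13) are easier to reason about as a stick-by-slot grid. The sketch below is only a rough illustration: the pass/fail values are copied from the list, while the dictionary layout and variable names are my own assumptions. It checks whether any one stick passes in every slot, or any one slot passes with every stick, which is what you would expect to see if a single stick or a single slot were the culprit.

```python
# Pass/fail per (stick, slot), copied from Tests 2-13 above. True means passed.
results = {
    (1, 1): True,  (1, 2): False, (1, 3): False, (1, 4): False,
    (2, 1): False, (2, 2): True,  (2, 3): False, (2, 4): True,
    (3, 1): False, (3, 2): False, (3, 3): False, (3, 4): False,
}

sticks = sorted({stick for stick, _ in results})
slots = sorted({slot for _, slot in results})

# Print the grid: one row per stick, one column per slot.
print("        " + "  ".join(f"slot {slot}" for slot in slots))
for stick in sticks:
    row = "  ".join("pass  " if results[(stick, slot)] else "FAIL  " for slot in slots)
    print(f"stick {stick} {row}")

# A stick (or slot) is "clean" only if it never failed in any tested position.
clean_sticks = [s for s in sticks if all(results[(s, slot)] for slot in slots)]
clean_slots = [sl for sl in slots if all(results[(stick, sl)] for stick in sticks)]
passes = sum(results.values())

print("passes:", passes, "of", len(results))
print("sticks that never failed:", clean_sticks)
print("slots that never failed:", clean_slots)
```

With the data so far, no stick and no slot comes out clean, so the failures do not follow a single stick or a single slot.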

    At this point I stopped, because around test 7 or 8 I started to notice that the reporting CPU was always #2. I went back through all the previous test logs, which I had been saving as HTML files, and found that indeed every test that failed reported CPU: 2.

    I see that the tests recognize I have 8 cores, but only 4 are enabled for testing. I'm testing in parallel mode. Does this indicate that my 2nd core is faulty? Or does it just happen to be the only core that reports the failures?

    I don't have enough experience with this program to know for sure, so if anyone here does, I'd appreciate any thoughts.

    My next course of action is to reinstall all 4 sticks of RAM, disable core #2 for testing, and see if the errors persist. If they do, I suppose I'll have to single-core test all 4 sticks simultaneously, then maybe even individually, to try to isolate the problem. I'm really hoping it doesn't come to that. Even using 4 cores, a single-stick test battery takes 4 hours. All 4 cores with all 4 sticks took 11. I've already been testing for 59 hours. I'm really ready to know exactly what the problem is so I can move out of the testing phase and into the repair phase.

    Thanks for any help you guys can provide.


  • #2
    It is unlikely that all 4 sticks are bad. It is more likely that the CPU or the motherboard is bad, or that the BIOS is buggy (bad memory map or bad RAM timings). So check for new firmware to start with.



    • #3
      Already upgraded the BIOS to the most recent release. Given that each stick shows different results in different sockets, I'm leaning towards it not being a motherboard issue. I was hoping for an "aha" moment that clearly indicated the fault, so as not to have to replace multiple pieces unnecessarily. At this point I'm definitely leaning towards a CPU failure of core #2.



      • #4
        Let us know the outcome once you track the root cause down.
