Hi all,
My system (i7-3770K 4C/8T, 4x8 GB RAM) recently started to fail and reboot erratically. I have a hypervisor running on the system with a couple of VMs. It _seems_ the system is stable as long as only the hypervisor runs, but when I start VMs the system seems to run into a condition where it crashes after some time (sometimes a couple of hours, sometimes a day). I therefore concluded it _could_ be related to growing memory use (normally the system in total, with all the VMs, uses a bit less than 16 GB). So I was thinking it could be a bad memory stick, where the faulty address range is hit while a VM is dynamically increasing the memory it needs.
As context information: the CPU was slightly overclocked (with water cooling) and the RAM was running with its XMP 2.0 profile.
So I ran MemTest86 8.3, which took about half a day to complete and showed a couple of thousand errors.
Since this is the first time I'm using MemTest86, I'm not sure how to interpret the test results, mainly because they do not seem to show random errors but rather a kind of systematic error.
Code:
2020-02-26 20:11:00 - *** TEST SESSION - 2020-02-26 20:11:00 ***
Code:
2020-02-26 20:11:02 - [MEM ERROR - Data] Test: 1, CPU: 0, Address: 1DE6DDB8, Expected: 000000001DE6DDB8, Actual: 000000801DE6DDB8
2020-02-26 20:11:02 - [MEM ERROR - Data] Test: 1, CPU: 0, Address: 243A1DB8, Expected: 00000000243A1DB8, Actual: 00000080243A1DB8
2020-02-26 20:11:02 - [MEM ERROR - Data] Test: 1, CPU: 0, Address: 2466DDB8, Expected: 000000002466DDB8, Actual: 000000802466DDB8
2020-02-26 20:11:02 - [MEM ERROR - Data] Test: 1, CPU: 0, Address: 2466E9B8, Expected: 000000002466E9B8, Actual: 000000802466E9B8
2020-02-26 20:11:02 - [MEM ERROR - Data] Test: 1, CPU: 0, Address: 25DC2938, Expected: 0000000025DC2938, Actual: 0000008025DC2938
This continues a couple of thousand times in the following tests.
Code:
2020-02-26 20:11:08 - Running test #2 (Test 2 [Address test, own address])
2020-02-26 20:11:08 - MtSupportRunAllTests - Setting random seed to 0x50415353
2020-02-26 20:11:08 - MtSupportRunAllTests - Start time: 8071 ms
2020-02-26 20:11:08 - ReadMemoryRanges - Available Pages = 8302249
2020-02-26 20:11:08 - MtSupportRunAllTests - Enabling memory cache for test
2020-02-26 20:11:08 - MtSupportRunAllTests - Enabling memory cache complete
2020-02-26 20:11:09 - Start memory range test (0x0 - 0x81F600000)
2020-02-26 20:11:09 - Pre-allocating memory ranges >=16MB first...
2020-02-26 20:11:09 - All memory ranges successfully locked
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 14D46DB8, Expected: 0000000014D46DB8, Actual: 0000008014D46DB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 6, Address: 189825B8, Expected: 00000000189825B8, Actual: 00000080189825B8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 6, Address: 18F65DB8, Expected: 0000000018F65DB8, Actual: 0000008018F65DB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 0, Address: 1D08A5B8, Expected: 000000001D08A5B8, Actual: 000000801D08A5B8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 6, Address: 195C0778, Expected: 00000000195C0778, Actual: 00000080195C0778
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 16F65DB8, Expected: 0000000016F65DB8, Actual: 0000008016F65DB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 16FE1538, Expected: 0000000016FE1538, Actual: 0000008016FE1538
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 1700F5B8, Expected: 000000001700F5B8, Actual: 000000801700F5B8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 0, Address: 1D44DF78, Expected: 000000001D44DF78, Actual: 000000801D44DF78
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 6, Address: 1ABA1DB8, Expected: 000000001ABA1DB8, Actual: 000000801ABA1DB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 6, Address: 1BD45DB8, Expected: 000000001BD45DB8, Actual: 000000801BD45DB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 242AA9B8, Expected: 00000000242AA9B8, Actual: 00000080242AA9B8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 2, Address: 202ABFF8, Expected: 00000000202ABFF8, Actual: 00000080202ABFF8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 24326938, Expected: 0000000024326938, Actual: 0000008024326938
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 2, Address: 20325D38, Expected: 0000000020325D38, Actual: 0000008020325D38
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 24B26D38, Expected: 0000000024B26D38, Actual: 0000008024B26D38
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 25325D38, Expected: 0000000025325D38, Actual: 0000008025325D38
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 2, Address: 21B26938, Expected: 0000000021B26938, Actual: 0000008021B26938
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 25BA3FF8, Expected: 0000000025BA3FF8, Actual: 0000008025BA3FF8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 2, Address: 21E6EDB8, Expected: 0000000021E6EDB8, Actual: 0000008021E6EDB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 25E6DDB8, Expected: 0000000025E6DDB8, Actual: 0000008025E6DDB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 2, Address: 22326D38, Expected: 0000000022326D38, Actual: 0000008022326D38
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 262A9F78, Expected: 00000000262A9F78, Actual: 00000080262A9F78
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 2, Address: 223A2DB8, Expected: 00000000223A2DB8, Actual: 00000080223A2DB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 27105D38, Expected: 0000000027105D38, Actual: 0000008027105D38
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 2766FFF8, Expected: 000000002766FFF8, Actual: 000000802766FFF8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 2, Address: 23D447F8, Expected: 0000000023D447F8, Actual: 0000008023D447F8
What I can see is that the error seems to be systematic: in this case, the hex digit at position 10 (from the right) always seems to be 8 instead of the expected 0, regardless of the CPU used (that is, for all cores 0, 2, 4, 6).
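To check this over the whole log rather than by eye, a small Python sketch along the following lines could tally which bits differ between the expected and actual values. The regular expression is only inferred from the log lines shown in this post, not from any MemTest86 documentation, and the names `parse_errors` and `summarize` are made up for illustration. The sample entries are copied from the log above; the parser drops the forum bold markers if they are present.

```python
import re
from collections import Counter

# A few entries copied from the log in this post.
LOG = """\
2020-02-26 20:11:02 - [MEM ERROR - Data] Test: 1, CPU: 0, Address: 1DE6DDB8, Expected: 000000001DE6DDB8, Actual: 000000[B]8[/B]01DE6DDB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 4, Address: 14D46DB8, Expected: 0000000014D46DB8, Actual: 000000[B]8[/B]014D46DB8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 6, Address: 189825B8, Expected: 00000000189825B8, Actual: 000000[B]8[/B]0189825B8
2020-02-26 20:11:09 - [MEM ERROR - Data] Test: 2, CPU: 2, Address: 202ABFF8, Expected: 00000000202ABFF8, Actual: 000000[B]8[/B]0202ABFF8
"""

# Assumed entry layout, inferred from the lines above.
ERROR_RE = re.compile(
    r"Test: (\d+), CPU: (\d+), Address: ([0-9A-F]+), "
    r"Expected: ([0-9A-F]+), Actual: ([0-9A-F]+)"
)


def parse_errors(text):
    """Yield (test, cpu, address, expected, actual) for each error entry."""
    clean = re.sub(r"\[/?B\]", "", text)  # drop forum bold markers, if any
    for m in ERROR_RE.finditer(clean):
        test, cpu = int(m.group(1)), int(m.group(2))
        addr, exp, act = (int(m.group(i), 16) for i in (3, 4, 5))
        yield test, cpu, addr, exp, act


def summarize(text):
    """Count errors per (test, XOR mask) and collect the CPUs that reported them."""
    masks = Counter()
    cpus = set()
    for test, cpu, _addr, exp, act in parse_errors(text):
        masks[(test, exp ^ act)] += 1  # XOR leaves only the differing bits set
        cpus.add(cpu)
    return masks, cpus


if __name__ == "__main__":
    masks, cpus = summarize(LOG)
    for (test, mask), n in sorted(masks.items()):
        bits = [b for b in range(mask.bit_length()) if mask >> b & 1]
        print(f"test {test}: {n} error(s), mask {mask:#018x}, flipped bits {bits}")
    print("CPUs reporting errors:", sorted(cpus))
```

For the four sample entries this reports one mask with only bit 39 set, seen on CPUs 0, 2, 4 and 6. Feeding in the full log file instead of `LOG` would show whether any other masks occur.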
Later on, with the next test, it looks similar:
Code:
2020-02-26 20:11:17 - Running test #3 (Test 3 [Moving inversions, ones & zeroes])
2020-02-26 20:11:17 - MtSupportRunAllTests - Setting random seed to 0x50415353
2020-02-26 20:11:17 - MtSupportRunAllTests - Start time: 16184 ms
2020-02-26 20:11:17 - ReadMemoryRanges - Available Pages = 8302249
2020-02-26 20:11:17 - MtSupportRunAllTests - Enabling memory cache for test
2020-02-26 20:11:17 - MtSupportRunAllTests - Enabling memory cache complete
2020-02-26 20:11:17 - Start memory range test (0x0 - 0x81F600000)
2020-02-26 20:11:17 - Pre-allocating memory ranges >=16MB first...
2020-02-26 20:11:17 - All memory ranges successfully locked
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 2, Address: 1807FC, Expected: FFFFFFFF, Actual: FFFFFF7F
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 4, Address: 5C153C, Expected: FFFFFFFF, Actual: FFFFFF7F
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 2, Address: 1821BC, Expected: FFFFFFFF, Actual: FFFFFF7F
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 2, Address: 104F7C, Expected: FFFFFFFF, Actual: FFFFFF7F
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: AABFFC, Expected: FFFFFFFF, Actual: FFFFFF7F
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 4, Address: 767FFC, Expected: FFFFFFFF, Actual: FFFFFF7F
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 2, Address: 44CFFC, Expected: FFFFFFFF, Actual: FFFFFF7F
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 4, Address: 5C293C, Expected: FFFFFFFF, Actual: FFFFFF7F
2020-02-26 20:11:17 - [MEM ERROR - Data] Test: 3, CPU: 4, Address: 66DDBC, Expected: FFFFFFFF, Actual: FFFFFF7F
The interesting thing is that the difference seems to be 8 in both examples (the digit changes from 0 to 8 in the first example and from F to 7 in the second). This makes me even more doubtful that this really is an error in the RAM, as it looks too systematic for a random error.
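A difference of 8 in one hex digit means exactly one bit has changed, so the two patterns can be compared bit by bit. The sketch below is only a rough check under a few assumptions: that the logged Address is the byte address where the value starts, that values are stored little-endian (as on x86), and that a 64-byte block is a sensible unit to compare against (a typical cache line size, picked here for illustration). The function names are hypothetical.

```python
def flipped_bit(expected, actual):
    """Return (bit index, direction) for a single-bit difference."""
    diff = expected ^ actual
    if diff == 0 or diff & (diff - 1):
        raise ValueError("not a single-bit difference")
    bit = diff.bit_length() - 1
    # If the bit is set in the value read back, it went from 0 to 1.
    direction = "0->1" if actual & diff else "1->0"
    return bit, direction


def locate(address, expected, actual, block=64):
    """Return (byte offset within block, bit within byte, direction).

    Assumes little-endian storage, so bit n of the value lives in the
    byte at address + n // 8.
    """
    bit, direction = flipped_bit(expected, actual)
    byte_addr = address + bit // 8
    return byte_addr % block, bit % 8, direction


# One entry from test 2 (64-bit pattern) and one from test 3 (32-bit pattern)
print(locate(0x14D46DB8, 0x0000000014D46DB8, 0x0000008014D46DB8))
print(locate(0x1807FC, 0xFFFFFFFF, 0xFFFFFF7F))
```

Under those assumptions, both entries come out at byte offset 60 within a 64-byte block, bit 7 of that byte, once flipped from 0 to 1 and once from 1 to 0. Whether the same holds for every entry in the full log would need to be checked separately.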
It would be nice if you could share your expertise: how does this look to you?
Thanks a lot for your support!
PS: If needed, I can provide the full test log.