I am running MemTest86 on a Haswell system with suspect DIMMs. None of the DIMMs fail with uncorrectable errors. Some of them fail with ECC error reports like the following:
2017-02-28 15:23:35 - All memory ranges successfully locked
2017-02-28 15:24:44 - [Channel 1, Slot 1] DIMM err count=000001CF
2017-02-28 15:24:44 - [MEM ERROR - ECC] Test: 3, (Col,Row,Rank,Bank): (N/A,N/A,N/A,N/A), ECC Corrected: yes, Syndrome: N/A, Channel/Slot: 1/1
2017-02-28 15:24:45 - [Channel 1, Slot 1] DIMM err count=000001F8
2017-02-28 15:24:45 - [MEM ERROR - ECC] Test: 3, (Col,Row,Rank,Bank): (N/A,N/A,N/A,N/A), ECC Corrected: yes, Syndrome: N/A, Channel/Slot: 1/1
2017-02-28 15:26:27 - Cleanup - Releasing all memory ranges...
These ECC errors are shown on the MemTest86 runtime screen and in the summary test report, which gives the count and the last 10 occurrences.
However, other DIMMs cause corrected errors that appear in neither place, only in the memtest86.log file. What is different about these errors, and why aren't they included in the summary test report? They look like this:
2017-02-28 18:04:55 - All memory ranges successfully locked
2017-02-28 18:05:18 - MC10_STATUS=8800005000800091 (Overflow=No, Uncorrected=No, Recoverable=No, Corrected error count=1, Error code=0091)
2017-02-28 18:05:18 - Transaction error type: Memory read error
2017-02-28 18:05:18 - Model specific error: Corrected memory read error
2017-02-28 18:05:18 - MC10_ADDR=0000000000000000
2017-02-28 18:05:18 - MC10_MISC=4918C00400042000
2017-02-28 18:10:00 - Cleanup - Releasing all memory ranges...
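For reference, here is a minimal sketch of how the MC10_STATUS value above can be decoded by hand. It assumes the IA32_MCi_STATUS bit layout as I understand it from the Intel SDM (VAL in bit 63, OVER in 62, UC in 61, MISCV in 59, ADDRV in 58, corrected error count in bits 52:38, MCA error code in bits 15:0). The function and field names are my own and are not taken from MemTest86.

```python
# Sketch: decode an IA32_MCi_STATUS value such as the MC10_STATUS line in the log.
# Bit positions are an assumption based on my reading of the Intel SDM.

# MMM field of a memory controller error code (0000_0000_1MMM_CCCC)
MEMORY_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(value):
    """Split a 64-bit MCi_STATUS value into its main fields."""
    code = value & 0xFFFF  # bits 15:0, MCA error code
    fields = {
        "valid": bool((value >> 63) & 1),        # VAL
        "overflow": bool((value >> 62) & 1),     # OVER
        "uncorrected": bool((value >> 61) & 1),  # UC
        "misc_valid": bool((value >> 59) & 1),   # MISCV: MCi_MISC holds data
        "addr_valid": bool((value >> 58) & 1),   # ADDRV: MCi_ADDR holds an address
        "corrected_count": (value >> 38) & 0x7FFF,  # bits 52:38
        "model_specific_code": (value >> 16) & 0xFFFF,  # bits 31:16
        "mca_code": code,
    }
    # Memory controller errors use the pattern 0000_0000_1MMM_CCCC
    if (code & 0xFF80) == 0x0080:
        fields["transaction"] = MEMORY_TRANSACTIONS.get((code >> 4) & 0x7, "reserved")
        fields["channel"] = code & 0xF  # 0xF would mean "channel not specified"
    return fields


if __name__ == "__main__":
    for name, val in decode_mci_status(0x8800005000800091).items():
        print(f"{name}: {val}")
```

Under those assumptions, the value from the log decodes to a valid, corrected (UC clear) memory read error with a corrected error count of 1, which agrees with the text MemTest86 printed next to it. The ADDRV bit comes out clear while MISCV is set, which is consistent with MC10_ADDR being all zeros and MC10_MISC carrying data.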
Thank you