Some corrected errors not reported

  • Some corrected errors not reported

    I am running MemTest86 on a Haswell system with suspect DIMMs. None of the DIMMs fail with uncorrectable errors, but some of them produce ECC error reports like the following:
    2017-02-28 15:23:35 - All memory ranges successfully locked
    2017-02-28 15:24:44 - [Channel 1, Slot 1] DIMM err count=000001CF
    2017-02-28 15:24:44 - [MEM ERROR - ECC] Test: 3, (Col,Row,Rank,Bank): (N/A,N/A,N/A,N/A), ECC Corrected: yes, Syndrome: N/A, Channel/Slot: 1/1
    2017-02-28 15:24:45 - [Channel 1, Slot 1] DIMM err count=000001F8
    2017-02-28 15:24:45 - [MEM ERROR - ECC] Test: 3, (Col,Row,Rank,Bank): (N/A,N/A,N/A,N/A), ECC Corrected: yes, Syndrome: N/A, Channel/Slot: 1/1
    2017-02-28 15:26:27 - Cleanup - Releasing all memory ranges...

    These ECC errors are reported on the MemTest86 runtime screen and in the summary test report, with a count and the last 10 occurrences.

    But other DIMMs cause corrected errors that are reported in neither place, only in the memtest86.log file. I wonder what is different about these errors and why they aren't reported in the summary test report. They look like this:
    2017-02-28 18:04:55 - All memory ranges successfully locked
    2017-02-28 18:05:18 - MC10_STATUS=8800005000800091 (Overflow=No, Uncorrected=No, Recoverable=No, Corrected error count=1, Error code=0091)
    2017-02-28 18:05:18 - Transaction error type: Memory read error
    2017-02-28 18:05:18 - Model specific error: Corrected memory read error
    2017-02-28 18:05:18 - MC10_ADDR=0000000000000000
    2017-02-28 18:05:18 - MC10_MISC=4918C00400042000
    2017-02-28 18:10:00 - Cleanup - Releasing all memory ranges...

    Thank you
    Last edited by Jag; Mar-01-2017, 06:09 PM.

  • #2
    Depending on the chipset configuration, ECC errors may also be reported in the Machine Check MSRs. However, because not all chipsets use the Machine Check Exception mechanism to report ECC errors, MemTest86 does not use it to trigger an ECC error, although we write the register contents to the log file for information purposes. For the Haswell chipset, a corrected ECC error count register is polled instead.
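
For readers who want to interpret the MC10_STATUS value in the log by hand, here is a minimal Python sketch. It assumes the standard IA32_MCi_STATUS bit layout from the Intel SDM (including the corrected error count in bits 52:38, which is only defined on CPUs that support corrected-error counting) and the memory controller error code format 0000 0000 1MMM CCCC. The function and field names are made up for illustration; this is not how MemTest86 itself decodes the register.

```python
# Sketch: decode an IA32_MCi_STATUS value, assuming the Intel SDM bit layout.
# All function and key names here are hypothetical, chosen for illustration.

MEM_TRANSACTION_TYPES = {
    0b000: "Generic undefined request",
    0b001: "Memory read error",
    0b010: "Memory write error",
    0b011: "Address/command error",
    0b100: "Memory scrubbing error",
}


def decode_mc_status(status: int) -> dict:
    """Split a 64-bit machine check status value into its main fields."""
    fields = {
        "valid": bool(status >> 63 & 1),          # VAL: register holds an error
        "overflow": bool(status >> 62 & 1),       # OVER: an error was overwritten
        "uncorrected": bool(status >> 61 & 1),    # UC
        "misc_valid": bool(status >> 59 & 1),     # MISCV: MCi_MISC is meaningful
        "addr_valid": bool(status >> 58 & 1),     # ADDRV: MCi_ADDR is meaningful
        "corrected_count": status >> 38 & 0x7FFF,  # bits 52:38
        "model_specific_code": status >> 16 & 0xFFFF,
        "mca_code": status & 0xFFFF,
    }
    code = fields["mca_code"]
    # Memory controller errors use the pattern 0000 0000 1MMM CCCC.
    if code & 0xFF80 == 0x0080:
        fields["transaction"] = MEM_TRANSACTION_TYPES.get(code >> 4 & 0b111, "Reserved")
        channel = code & 0xF
        fields["channel"] = None if channel == 0xF else channel  # 0xF: not specified
    return fields


if __name__ == "__main__":
    for name, value in decode_mc_status(0x8800005000800091).items():
        print(f"{name}: {value}")
```

For the value in the log above (8800005000800091), this gives a valid, corrected (not uncorrected) error with a corrected error count of 1, MCA error code 0x0091, transaction type "Memory read error" on channel 1. The address-valid bit is clear, which is consistent with MC10_ADDR being logged as all zeros.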
