Memtest86-which memory stick is it referring to?

  • Memtest86-which memory stick is it referring to?

    Memtest86 is reporting ECC errors on channel 8 and channel 10/A. I'm using a Supermicro H12DSi-N6 with dual Epyc 7F72 CPUs and 8 DDR4 sticks per CPU (16 total). The DIMM channels on each CPU are labelled A to H.
    1. When Memtest refers to Channel 8, is that DIMM slot H? Is that CPU 1 or CPU 2?
    2. When Memtest refers to Channel 10 or A, what does that mean? There are only 8 channels per CPU. Why does it refer to Channel 10 as A in the last line in the log?
    Thanks in advance.

    Log:

    2024-12-07 09:24:44 - [Channel 8, Slot 0] DIMM err count=22 (prev=2)

    2024-12-07 09:24:44 - [MEM ERROR - ECC Errors] Test: 2, (Chan,Slot,Rank,Bank,Row,Col): (8,N/A,N/A,N/A,N/A,N/A), ECC Corrected: yes, Syndrome: N/A, Channel/Slot: 8-X

    2024-12-07 09:24:45 - [Channel 8, Slot 1] DIMM err count=22 (prev=2)

    2024-12-07 09:24:46 - [Channel 8, Slot 0] DIMM err count=40 (prev=22)

    2024-12-07 09:24:46 - [MEM ERROR - ECC Errors] Test: 2, (Chan,Slot,Rank,Bank,Row,Col): (8,N/A,N/A,N/A,N/A,N/A), ECC Corrected: yes, Syndrome: N/A, Channel/Slot: 8-X

    2024-12-07 09:24:46 - [Channel 8, Slot 1] DIMM err count=40 (prev=22)

    2024-12-07 09:25:03 - [Channel 10, Slot 0] DIMM err count=12 (prev=0)

    2024-12-07 09:25:03 - [MEM ERROR - ECC Errors] Test: 2, (Chan,Slot,Rank,Bank,Row,Col): (A,N/A,N/A,N/A,N/A,N/A), ECC Corrected: yes, Syndrome: N/A, Channel/Slot: 10-X

    [Attached images: IMG_2132.jpg, memtest86-which-memory-stick-is-it-referring-to-v0-euqkgdx38j5e1.webp, memtest86-which-memory-stick-is-it-referring-to-v0-8i19iod18j5e1.webp]

  • #2
    Thanks for the details.

    To prevent further confusion, we are looking into including the CPU socket number in the next public release when reporting ECC errors.

    In the current release, channel numbers are assigned consecutively across CPU sockets. So in your particular case, channels 0-7 refer to CPU1 and channels 8-15 refer to CPU2.

    The 'A' given as the channel in the log file is in hexadecimal, so it refers to channel 10 (0xA = 10), the same channel that the other lines show in decimal.
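
The numbering described above can be sketched in a few lines of Python. This is only an illustration, not MemTest86 code: the function name `decode_channel` and the constant `CHANNELS_PER_SOCKET` are made up for this example, and the value 8 is taken from the 8-channels-per-CPU setup in this thread. Whether channel 0 within a CPU corresponds to the slot labelled A on the board is not confirmed here, so check the motherboard manual before pulling a stick.

```python
# Assumption: 8 memory channels per CPU socket, as in the system described in this thread.
CHANNELS_PER_SOCKET = 8


def decode_channel(token, hex_format=False):
    """Split a global channel number into (cpu_number, channel_within_cpu).

    token      -- channel as it appears in the log, e.g. "8", "10" or "A"
    hex_format -- True when the value comes from the (Chan,Slot,...) tuple,
                  which prints the channel in hexadecimal
    """
    global_channel = int(token, 16 if hex_format else 10)
    cpu_number = global_channel // CHANNELS_PER_SOCKET + 1  # CPU1 holds channels 0-7
    channel_within_cpu = global_channel % CHANNELS_PER_SOCKET
    return cpu_number, channel_within_cpu


print(decode_channel("8"))                     # (2, 0)
print(decode_channel("10"))                    # (2, 2)
print(decode_channel("A", hex_format=True))    # (2, 2)
```

Under these assumptions, both "Channel 10" and "A" resolve to the same place: the third channel (index 2) on CPU2, while channel 8 is the first channel (index 0) on CPU2.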
