Channel/Slot mapping (from ECC error report) on ASRock X99 WS-E?



    Maybe what I'm thinking of isn't possible, but worth a shot to ask.

    The ASRock X99 WS-E board has memory slots named:
    A1 A2 B1 B2 | D2 D1 C2 C1

    Memtest 6.3 reports ECC errors on Channel 3/Slot 0.
    If I follow the logic that channels 0 1 2 3 = A B C D and slots 0 1 = 1 2, then the RDIMM in slot D1 would be the one producing the errors. Is this a correct assumption? Or is the Channel/Slot designation meaningless here, so that the only way to find the bad module is to keep removing modules? (4 currently installed)
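    The guessed mapping can be written down as a small lookup. This is only a sketch of the assumption stated in the question (channel index 0-3 → letters A-D, slot index 0-1 → suffixes 1-2); the helper name `slot_label` is hypothetical, and the board's real wiring may differ.

```python
# Hypothetical helper encoding the ASSUMED mapping from the question:
# channel 0..3 -> A..D, slot 0..1 -> 1..2. Not guaranteed for any board.
CHANNEL_LETTERS = "ABCD"


def slot_label(channel: int, slot: int) -> str:
    """Return the silkscreen label implied by the assumed mapping."""
    if not 0 <= channel < len(CHANNEL_LETTERS):
        raise ValueError(f"channel out of range: {channel}")
    if slot not in (0, 1):
        raise ValueError(f"slot out of range: {slot}")
    return f"{CHANNEL_LETTERS[channel]}{slot + 1}"


print(slot_label(3, 0))  # D1
print(slot_label(0, 0))  # A1
```

    Under this assumption, "Channel 3/Slot 0" points at D1, which is why it is only a starting point for isolating the module, not proof.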

    Further, if this channel/slot designation does indicate a particular DIMM, is it possible for Memtest to report, say, the serial number of the failing DIMM when available? That would really help with identifying which DIMM is bad.

  • #2
    It is not a given that the reported slot/channel number corresponds exactly to the naming of the physical slots on the board. But it would be a good bet to start with when attempting to isolate the failing module.

    You may be able to look up the module's serial number in the 'View detailed RAM (SPD) info' option in the Main Menu. Again, there is no guarantee on the ordering of the DIMM modules.



    • #3
      Originally posted by keith
      It is not a given that the reported slot/channel number corresponds exactly to the naming of the physical slots on the board. But it would be a good bet to start with when attempting to isolate the failing module.

      You may be able to look up the module's serial number in the 'View detailed RAM (SPD) info' option in the Main Menu. Again, there is no guarantee on the ordering of the DIMM modules.
      Okay, that's what I figured regarding no guarantee on ordering. The serial number is shown where you mention; however, it is not mapped to the Channel/Slot information reported with the ECC errors. The DIMMs in the 'View detailed RAM (SPD) info' screen are simply enumerated as 1..N, where N is the number of DIMMs installed. Would be nice if the extended information reported Channel/Slot information, if available, to be able to match the errors up to a particular DIMM.

      I looked briefly through the 5.1 source code here: https://github.com/Distrotech/memtest86 and searched it for that string, but I didn't find an exact reference, so the code has clearly changed quite a bit since then.

      Edit: That source code is actually for memtest86+, and not memtest86, so it's reasonable that I didn't find the string...
      Last edited by vacaloca; Jun-01-2016, 03:26 AM.



      • #4
        Originally posted by vacaloca
        Would be nice if the extended information reported Channel/Slot information if available to be able to match the errors up to a particular DIMM.
        Agreed that it would be nice, but in reality we do not know which module's SPD is connected to which channel/slot.



        • #5
          Originally posted by keith
          Agreed that it would be nice, but in reality we do not know which module's SPD is connected to which channel/slot.
          Fair enough. On this particular motherboard, I was getting Channel 3/Slot 0 (D1) ECC correctable error messages, usually on tests 5 and/or 7, with four DIMMs installed (quad channel).

          I had previously run an ~8 hour test with three modules (A1 B1 C1) with no issues, but after adding the fourth module (D1), I got ECC correctable errors in tests 2 and 5 after two runs spanning about 4-5 hours. To see whether it was an issue with quad-channel operation or with a particular DIMM, I swapped the known-good module in slot A1 with the suspected bad one from slot D1.

          The result was that Memtest 7.0.0 Beta reported 7 ECC errors, this time on Channel 0/Slot 0 (A1), which is consistent with the DIMM moved from slot D1 to slot A1 being bad.
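          The reasoning behind this swap test can be sketched as a small decision function: if the reported error location moves with the suspect module, the module is implicated; if it stays put, the slot is. This is a hypothetical illustration (the name `classify_swap` and the slot-label strings are assumptions), and a single swap is evidence rather than proof, especially with intermittent errors.

```python
from typing import Optional


def classify_swap(old_slot: str, new_slot: str, error_slot: Optional[str]) -> str:
    """Interpret one swap test (hypothetical helper).

    old_slot:   slot the suspect DIMM occupied when errors were first seen
    new_slot:   slot the suspect DIMM was moved to
    error_slot: slot label reported with errors after the swap, or None
    """
    if error_slot is None:
        return "no errors reproduced"
    if error_slot == new_slot:
        return "errors followed the DIMM"
    if error_slot == old_slot:
        return "errors stayed with the slot"
    return "inconclusive"


print(classify_swap("D1", "A1", "A1"))  # errors followed the DIMM
```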

          Since I now seem to have a handle on the mapping, I am testing the remaining DIMMs one by one in slot A1.
          Last edited by vacaloca; Jun-06-2016, 01:50 AM. Reason: updated summary of debugging bad RDIMMs



          • #6
            So, it turns out that motherboard slot D1 (Ch3 slot 0) was bad. After extensive testing, it got to the point where it was spitting out ECC uncorrectable errors seconds after the Memtest run started. I will re-test with a new board after I receive it.



            • #7
              Originally posted by vacaloca
              So, it turns out that motherboard slot D1 (Ch3 slot 0) was bad. After extensive testing, it got to the point where it was spitting out ECC uncorrectable errors seconds after the Memtest run started. I will re-test with a new board after I receive it.
              The 2nd replacement board had an issue with the PCI-E slots or the PLX chip, but the memory passed 4-6 passes of Memtest without issues. The 3rd board is the charm so far and seems good for both the PCI-E and memory slots. Solved... after a while.

