Has anyone successfully got ECC Injection detected and corrected on AM4?


  • Has anyone successfully got ECC Injection detected and corrected on AM4?

    I'm waiting for my PRO CPU to arrive to complete the 2nd half of my testing. Currently using non-PRO CPUs with different boards.

    I'm able to get the ECC Injection working. There's even a BIOS setting on the Asus PRIME X370-PRO board called "Disable Memory Error Injection", and it's set to TRUE by default. I set it to False and there are no longer warnings in memtest about injection being disabled. However, there are still no "ECC Errors Detected" messages.

    On an Asrock B550M PRO SE board, there are no options whatsoever. However, I do see a message pop up about [ECC Errors], but not [ECC Errors Detected]; this is different from the Asus board. The summary in the HTML report shows the full message of the ECC error and it says "Corrected: No". I'm guessing this means ECC isn't working despite the initial startup/configuration page saying ECC is enabled.

    Regardless, several posts mentioned ECC is just all over the place with retail AMD CPUs. These boards both claim to be ECC functional with PRO CPUs, so once my CPU arrives, I'll do the same testing and see if the injected errors are corrected.

  • #2
    several posts mentioned ECC is just all over the place
    Correct. It is all a bit of a mess.

    CPU needs to support it. Motherboard needs to support it. BIOS needs to support it. RAM needs to support it.
    Then there are different levels of support and different types of ECC. For example, ECC can be active for corrections while reporting & monitoring don't work. Injection might or might not work depending on the BIOS and the CPU. We have been documenting details here
    https://www.memtest86.com/ecc.htm

    ECC Error and it says "Corrected: No"
    If ECC was active, then this might mean the error detected was a 2-bit error.
    ECC corrects 1 bad bit and reports, but doesn't correct, 2 bad bits.
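
    To illustrate the "correct 1 bit, detect 2 bits" behaviour, here is a minimal sketch of a SECDED (single error correct, double error detect) code on a toy 4-bit word. This is an extended Hamming(8,4) code chosen purely for illustration; the function names are made up for this example, and real ECC DIMMs protect 64 data bits with 8 check bits using whatever code the memory controller implements, which may differ from this.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 = overall parity, 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: treat as a single flipped bit.
        # The syndrome is its position (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


word = encode(0b1011)

one_bit = list(word)
one_bit[5] ^= 1                         # simulate a single injected bit error
print(decode(one_bit))

two_bit = list(one_bit)
two_bit[6] ^= 1                         # second bit error in the same word
print(decode(two_bit))
```

    A single flipped bit comes back as corrected with the original data, while two flipped bits are reported but not repaired, which is the kind of case that could show up as "Corrected: No".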


    • #3
      Thanks for the reply. It turns out that Asrock board had a defective DIMM slot and that's where the errors were happening; typical Asrock quality.
      Anyway, I sent it back and got an MSI B550M PRO-VDH. And my PRO CPU arrived and here are the test results:
      1. CPU supports it: yes. Ryzen 3 PRO 1300 and 5700X3D
      2. Motherboard supports it: yes; MSI B550M PRO-VDH. Other MSI boards specifically say ECC UDIMM will run in non-ECC mode. This one does not say that, and there's an option in the BIOS for "DRAM ECC" which I set to Enabled. No PFEH or similar setting, however.
      3. RAM supports it: yes. KSM32ED8/16HD Kingston UDIMM ECC.
      4. BIOS supports it: yes? Using the latest BIOS as of this date; it's a "beta" BIOS, only changes are updated AGESA and "Sinkclose" vulnerability.
      • In Memtest, both CPUs say ECC is enabled.
      • When running with ECC error injection, the non-PRO 5700X3D gives a warning saying injection may be disabled. With the PRO 1300, it does not give that warning.
      • Neither CPU reports ECC errors or corrections during the memtest run. I assume the platform handles it first and there's no way to change that in the BIOS.
      • In Windows, there are 2 commands that can be run to check ECC functionality:
        1. wmic memorychip get datawidth,totalwidth
        2. wmic memphysical get memoryerrorcorrection
      • For the 5700X3D: #1 reports 64 and 72. #2 reports 6; apparently 5 is single bit ECC and 6 is multi-bit ECC?
      • For the PRO 1300: #1 reports 64 and 128. #2 reports 6 as well.
      I'm not sure about the discrepancy between the 2 widths for #1. Your thoughts on that David?
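
      For anyone reading along, here is a rough sketch of how the numbers above could be interpreted. The code-to-name table follows the documented values of the WMI MemoryErrorCorrection property as far as I know (which mirror the SMBIOS error correction type), and the helper name describe_widths is made up for this example. It only interprets what the firmware reports; it does not prove ECC is actually working.

```python
# Documented meanings of the MemoryErrorCorrection code, to the best of my knowledge.
ERROR_CORRECTION_TYPES = {
    1: "Other",
    2: "Unknown",
    3: "None",
    4: "Parity",
    5: "Single-bit ECC",
    6: "Multi-bit ECC",
    7: "CRC",
}


def describe_widths(data_width, total_width):
    """Interpret a datawidth/totalwidth pair (hypothetical helper for this post)."""
    extra = total_width - data_width
    if extra == 0:
        return "no extra bits: reported as non-ECC"
    if data_width == 64 and extra == 8:
        return "8 extra bits: the usual layout for a 72-bit ECC DIMM"
    return f"{extra} extra bits: not a standard ECC layout, reported data may be wrong"


# Values reported in this post for the two CPUs.
for cpu, data_width, total_width, ecc_code in [
    ("5700X3D", 64, 72, 6),
    ("PRO 1300", 64, 128, 6),
]:
    ecc_name = ERROR_CORRECTION_TYPES.get(ecc_code, "Reserved")
    print(cpu, "|", describe_widths(data_width, total_width), "|", ecc_name)
```

      By this reading, 64/72 is what an ECC UDIMM is expected to report, while 64/128 does not match any layout I would expect.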


      • #4
        We don't really know how accurate wmic is. But we are fairly sure that it uses & depends upon the SMBIOS data structures. These are populated by the BIOS firmware at boot time. We have seen dozens of errors in the SMBIOS data over the years. Motherboard vendors don't really check that this data is correct and don't bother fixing it even when they know it is wrong.

        Some quick Googling on memoryerrorcorrection turned up this list of possible values. See below.
        I don't think 6 is correct, however, as most ECC RAM only corrects 1 bad bit, not multiple bits. So just more confusion...

        [Image: image.png, list of possible memoryerrorcorrection values]

        As for datawidth & totalwidth, I don't see how 64 and 128 could be correct for ECC RAM. Maybe the 128 bits represent the dual-channel width, but then where are the extra ECC bits? So even more confusion. I would dump the SMBIOS structures next to see what is in them if you want to investigate more. But there probably isn't much point, as MSI won't fix any bugs you find.
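
        As a rough sketch of what inspecting the SMBIOS structures could look like, the snippet below decodes the relevant fields from the raw bytes of a Type 16 (Physical Memory Array) and a Type 17 (Memory Device) structure. The field offsets are taken from the SMBIOS specification as I understand it. The function names and the sample bytes are made up for illustration; how the raw bytes are obtained is left out (on Linux they can be read as root from /sys/firmware/dmi/entries/, for example).

```python
import struct

WIDTH_UNKNOWN = 0xFFFF  # SMBIOS uses FFFFh when a width is not known


def parse_memory_array(raw):
    """Decode the error correction code from a Type 16 structure."""
    stype, length, handle = struct.unpack_from("<BBH", raw, 0)
    if stype != 16:
        raise ValueError(f"expected type 16, got {stype}")
    return {"handle": handle, "error_correction": raw[0x06]}


def parse_memory_device(raw):
    """Decode total/data width (in bits) from a Type 17 structure."""
    stype, length, handle = struct.unpack_from("<BBH", raw, 0)
    if stype != 17:
        raise ValueError(f"expected type 17, got {stype}")
    total_width, data_width = struct.unpack_from("<HH", raw, 0x08)
    return {
        "handle": handle,
        "total_width": None if total_width == WIDTH_UNKNOWN else total_width,
        "data_width": None if data_width == WIDTH_UNKNOWN else data_width,
    }


# Hand-built sample structures (NOT dumped from a real board), zero-padded
# to the declared structure length.
sample_array = struct.pack("<BBHBBB", 16, 0x0F, 0x0010, 0x03, 0x03, 0x06)
sample_array += bytes(0x0F - len(sample_array))

sample_device = struct.pack("<BBHHHHH", 17, 0x28, 0x0011, 0x0010, 0xFFFE, 72, 64)
sample_device += bytes(0x28 - len(sample_device))

print(parse_memory_array(sample_array))
print(parse_memory_device(sample_device))
```

        Comparing these raw fields with the wmic output would show whether the odd 64/128 value comes from the firmware tables themselves or from somewhere further up the stack.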


        • #5
          I agree that none of these manufacturers will bother with fixing any of this stuff as it's an "enterprise" feature on a consumer board.
          That said, I got my hands on a Gigabyte B550M DS3H and will do a final test later on Asus X370-PRO.

          Gigabyte came in with a surprise.
          With the Ryzen PRO CPU, the BIOS displayed options for both PFEH and DRAM ECC. I disabled the former and enabled the latter.
          Actually, PFEH is forced to be disabled on "B1 Stepping", which I think is a designated stepping for Ryzen PRO CPUs? I couldn't enable it after disabling it.

          Anyway, with the PRO CPU, Memtest injected an ECC error and the report showed it was corrected!
          The non-PRO 5700X3D removes the PFEH option in the BIOS, I can't enable or disable it. DRAM ECC is still there and I have that enabled. There is also "Disable Memory Error Injection" that appears now, which I set to False.
          Memtest shows that it injects the errors but no reports of error correction. I can only assume the platform is handling it.

          Similar to the MSI board with WMIC, the PRO CPU shows 64/128 for the widths and 64/72 for the non-PRO CPU. Both report 6 for ECC type.

          At the very least, I can confirm this motherboard has ECC support with a PRO CPU. Whether it works with a non-PRO CPU can't be 100% confirmed with Memtest.
          However, both Memtest and WMIC show differences between ECC and non-ECC memory, so some assumptions can be made.
          Last edited by watchDominion; Feb-15-2025, 10:09 AM.


          • #6
            OK, did the same tests with the Asus X370, and the results are the same as with the MSI: ECC error injection, detection, and correction with a PRO CPU aren't working. Only the Gigabyte fully works and confirms with Memtest that ECC errors are corrected with a PRO CPU.

            There is PFEH and it's disabled by default for "B1 Stepping" CPUs, like the Gigabyte.


            • #7
              Thanks for doing the test.
              Proves that motherboard support is very hit and miss. Would not surprise me at all if ASUS and MSI never tested ECC with these boards.
