How do you verify ECC error injection working?

  • #1

    I tried ECC error injection on two systems: an X99 board with a Xeon 2670 v3 won't enable injection, while a Xeon E3-1276 v3 on a C216 chipset will.

    Now I'm running the Row Hammer test on both with v7.0.0 beta, and the Xeon E3 isn't reporting anything other than [ECC Inject] being active.

    Is ECC injection actually taking place in the Row Hammer test? (I could imagine that test is complicated enough without it.)

    And how would I otherwise be able to distinguish an injected error from a genuinely detected one?

  • #2
    For ECC injection to work properly on Intel systems, the following conditions must be met:
    1. The chipset must support hardware ECC injection.
    2. The BIOS must not lock the ECC injection registers. Once they are locked, they cannot be unlocked until the next power cycle. (ECC injection is a debug mechanism that is not meant to be enabled in production systems.)


    If 1) is not satisfied, MemTest86 will not allow ECC injection to be enabled.

    If 1) is true but 2) is not, you will see MemTest86 attempt to inject ECC errors ("[ECC Inject]" displayed on screen) but there are no subsequent detected ECC errors. This seems to be what is happening in your case. To fix this, there may be an option in the BIOS setup to leave the ECC injection registers unlocked. Otherwise, you may need to obtain a modified BIOS that does not perform locking of the ECC registers.
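The two conditions above and the outcome each combination produces can be summarised as a small decision function. This is a hypothetical model of the reasoning in this reply, not MemTest86 code; the names `InjectionOutcome` and `expected_outcome` are made up for illustration.

```python
from enum import Enum


class InjectionOutcome(Enum):
    """What you should observe in MemTest86 for each combination of conditions."""
    NOT_ALLOWED = "ECC injection cannot be enabled"
    SILENT = "[ECC Inject] shown, but no ECC errors are detected afterwards"
    WORKING = "[ECC Inject] shown, and ECC errors are detected after each injection"


def expected_outcome(chipset_supports_injection: bool,
                     registers_locked_by_bios: bool) -> InjectionOutcome:
    # Condition 1: without hardware support, injection cannot be enabled at all.
    if not chipset_supports_injection:
        return InjectionOutcome.NOT_ALLOWED
    # Condition 2: locked registers make injection attempts silently ineffective.
    if registers_locked_by_bios:
        return InjectionOutcome.SILENT
    # Both conditions met: each injection should be followed by a detection.
    return InjectionOutcome.WORKING


# The two systems from the question, as diagnosed in this reply.
print(expected_outcome(False, False).value)  # X99 + Xeon 2670 v3 (lock state irrelevant)
print(expected_outcome(True, True).value)    # Xeon E3-1276 v3 on C216
```

The point of the model is that the "silent" case is only distinguishable from a working system by what happens after the injection, not by the [ECC Inject] indicator itself.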

    ECC injection is performed at the beginning of each test, so if ECC detection is working properly, you should see detected ECC errors after each injection.
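Because injection happens at the start of each test, one way to check a run is to confirm that every injection is followed by at least one detected error within the same test. The sketch below assumes you have transcribed the run into a simple chronological list of events; the `Event` type and the "inject"/"ecc_error" labels are hypothetical and do not correspond to MemTest86's actual report format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    test: int   # test number the event belongs to
    kind: str   # hypothetical labels: "inject" or "ecc_error"


def injections_confirmed(events: list[Event]) -> dict[int, bool]:
    """For each test that attempted an injection, report whether at least
    one ECC error was detected after that injection within the same test.
    Events are assumed to be in chronological order."""
    confirmed: dict[int, bool] = {}  # test -> detection seen since injection
    for ev in events:
        if ev.kind == "inject":
            confirmed.setdefault(ev.test, False)
        elif ev.kind == "ecc_error" and ev.test in confirmed:
            # Only errors that follow an injection in the same test count.
            confirmed[ev.test] = True
    return confirmed


run = [Event(1, "inject"), Event(1, "ecc_error"), Event(2, "inject")]
print(injections_confirmed(run))  # {1: True, 2: False}
```

If every test reports False, that matches the locked-register case described above. Note that an error following an injection could still be a genuine fault, so this only confirms that detection is working, not where each error came from.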

    Update: Someone asked which CPU registers block ECC injection on Intel platforms. They are MSRs:

    LT_UNLOCK_MEMORY
    LT_LOCK_MEMORY

    As with many MSRs, there is very little documentation available on their use.
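If you want to look at MSRs yourself from a running Linux system, the kernel's `msr` driver exposes each CPU's MSRs as `/dev/cpu/N/msr`, where the file offset is the MSR address and each read returns 8 bytes. The sketch below assumes root access and a loaded `msr` module (`modprobe msr`). It deliberately takes the address as a parameter because this post does not give the addresses of LT_UNLOCK_MEMORY or LT_LOCK_MEMORY, and whether reading them actually reveals the lock state is an assumption, not something stated here.

```python
import os
import struct


def decode_msr(raw: bytes) -> int:
    # An MSR value is a 64-bit little-endian unsigned integer.
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack("<Q", raw)[0]


def read_msr(cpu: int, address: int) -> int:
    """Read one 64-bit MSR on the given CPU via the Linux msr driver.

    Assumes root privileges and the msr kernel module. The address must come
    from Intel's documentation; none is given in this thread.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, address)  # file offset selects the MSR
    finally:
        os.close(fd)
    return decode_msr(raw)
```

Reading an MSR the CPU does not implement typically fails with an I/O error, so look up the correct address before calling `read_msr`. Keep in mind that MemTest86 itself runs without an OS, so this is only a way to inspect registers from Linux, not how MemTest86 checks them.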



    • #3
      If the error injection registers can be locked, I'd have to agree that they probably are locked by default, most likely to avoid issues with malware or other kinds of creative hacking.

      I'd also guess that motherboard manufacturers might unlock the injection registers via some diagnostics device connected to the TPM port or a USB dongle, and far less likely via a special BIOS (too many things can go wrong during BIOS flashing).

      I assume that if the locked state could easily be discovered, e.g. by reading back the error injection registers or some status bit, you would already do so and report it.

      As things stand, with 99.9% of all end-user devices that support ECC being locked by default, and that state only being discoverable by enabling error injection and then seeing no fault reported, the error injection PRO feature has almost negative value:

      I was seriously confused: an error seemed to be injected and then nothing happened. You don't want to create that kind of confusion when you sell a diagnostics product rather than entertainment.

      Thanks a lot for clearing that up. I'm pretty sure I'll shell out the money for the Pro version once v7 comes out of beta (I don't want to pay extra for an upgrade just to show my appreciation).



      • #4
        Unfortunately, this locking mechanism is present in certain Intel chipsets (particularly the Intel Xeon E3 families), which limits the ECC injection capability. On other platforms (AMD and other Intel chipsets), MemTest86 should be able to inject ECC errors and subsequently detect the corrected ECC errors.



        • #5
          Sorry to bring up an old post, but this is along the lines of my question. I am building two systems to test ECC-enabled DDR3 and DDR4 modules and would ideally like to use the error injection capability of the Pro version of MemTest86. Is there a compatibility list, or any suggestions on which motherboard/chipset/CPU gives the best chance of this working?



          • #6
            The current list can be found at the bottom of the MemTest86 features page.

            As of 11 May 2017, the following chipsets may support this feature (depending on your BIOS configuration):
            • AMD Bulldozer (15h)
            • AMD Jaguar (16h)
            • Intel Nehalem
            • Intel Lynnfield
            • Intel Westmere
            • Intel Xeon E3 family
            • Intel Xeon E3 v2 family
            • Intel Xeon E3 v3 family
            • Intel Skylake
            • Intel Atom C2000
            • Intel Broadwell-H
