Question about ECC injection

  • Question about ECC injection

    With MemTest86 V7.4 Pro I have been testing the memory on an ECC-equipped ASRock C2750D4I (Intel Atom C2750 @ 2.40GHz) that has reported memory errors through the FreeNAS 11 (FreeBSD 11) operating system. After one failed test with all four sticks installed, which appeared to point to one stick as the culprit, I have tested the memory as single 8GB sticks and in pairs of 8GB sticks. The single-stick and paired tests have all passed so far.
    I had ECC injection enabled. Contrary to my expectations, very few system events have been recorded in the Event Log. For instance, in about 11 hours of testing yesterday, involving 6 tests of four passes each, zero ECC errors were asserted.
    I believe the situation was the same the day before: the only memory events recorded appear to have occurred during the initial BIOS boot, before I reset and forced a UEFI boot into V7.4 Pro. I was not carefully tracking test start times, though, so I cannot be certain for that day.
    What should I expect to see in the Event Log, if anything? How can I determine whether ECC injection is actually working?

  • #2
    I assume the "Event log" you are referring to is the one in the MemTest86 test report, and not the Windows Event log or some other log in BSD?

    ECC injection should produce 1 error per test if injection is supported by the hardware and injection is turned on (i.e. 1 error every couple of minutes).
    So if you aren't seeing 1 ECC error per test, the error injection function might not be working.

    • #3
      Thanks. Sorry that I wasn't clear. The Event Log I was referring to is the one provided by the motherboard's BMC IPMI functionality, where all system events are expected to be logged. No ECC errors were reported during the most recent test of more than 12 hours, even though the "ECC inject" indication was present on screen and the ">" kept proceeding down the list once the screen space was fully populated. The event log does show some ECC errors during boot, which seems to indicate that its ECC error capture and reporting are functional.

      • #4
        It still isn't clear to me: did MemTest86 report that ECC errors were detected?

        • #5
          Thanks.

          MemTest86 reported nothing. It only showed screen lines reading "[ECC Inject] .... Transaction Router" (once per test, as far as I can observe, as you suggested).

          Should I expect any response from either MemTest86 or the system to this error injection? If an injected error is not corrected, what indication do we have? If an injected error is corrected, what indication do we have?

          • #6
            Is this really so difficult a question to answer?
            Should I expect any response from either MemTest86 or the system to this error injection? If an injected error is not corrected, what indication do we have? If an injected error is corrected, what indication do we have?
            Although this was not the original stimulus for my posting (which was a technical inquiry to assist me understanding the testing protocol), how about I ask it this way? Respectfully - I paid $35 for Pro to get the ECC error injection function. How might I know that I got more for my money than a screen prompt telling me that there was an error injection?

            Thanks in anticipation.


            • #7
              As pointed out earlier,
              "ECC injection should produce 1 error per test if injection is supported by the hardware and injection is turned on (i.e. 1 error every couple of minutes).
              So if you aren't seeing 1 ECC error per test, the error injection function might not be working."


              If you aren't seeing (on the MemTest86 screen) one error reported per injected error, there is something wrong.

              We don't have enough information to tell you exactly what is wrong. It might be a hardware issue, it might be the RAM you are using, it might be a difference in behavior between this Atom C2750 CPU and the other C2000 series CPUs, or it might be a firmware issue. Without having the machine to investigate, we can only guess.

              Email us the MemTest86 debug log file if you want (MemTest86.log), we can see if there is anything obvious in the log which might explain it.
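For anyone who wants a rough first look at the log before emailing it, here is a minimal sketch of the "one error per injected error" check. It assumes the "[ECC Inject]" lines quoted in this thread also appear in MemTest86.log, which is not confirmed here. The format of a detected-error line is not given anywhere in this thread, so `error_marker` is a hypothetical parameter the caller has to supply, and the sample text below is made up purely for illustration.

```python
# Prefix of the injection lines as quoted in this thread.
INJECT_MARKER = "[ECC Inject]"


def compare_injections(log_text, error_marker):
    """Count injection lines and lines containing the caller-supplied marker.

    error_marker is hypothetical: the real text of a detected-ECC-error line
    is not documented in this thread and must be taken from an actual log.
    Returns (injected, reported, missing).
    """
    lines = log_text.splitlines()
    injected = sum(1 for line in lines if INJECT_MARKER in line)
    reported = sum(1 for line in lines if error_marker in line)
    return injected, reported, injected - reported


# Made-up sample, not real MemTest86 output.
sample_log = (
    "[ECC Inject] Injecting ECC Error for Intel Atom C2000 SoC Transaction Router\n"
    "[ECC Inject] Injecting ECC Error for Intel Atom C2000 SoC Transaction Router\n"
    "<ECC-ERROR-PLACEHOLDER>\n"
)
injected, reported, missing = compare_injections(sample_log, "<ECC-ERROR-PLACEHOLDER>")
print(f"injected={injected} reported={reported} missing={missing}")
```

If `missing` is greater than zero over a full run, that matches the "something is wrong" case described above, though it does not say what is wrong.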

              • #8
                Thanks, David. I am emailing you the debug log as requested. I will be interested to hear whether you can find something. I certainly did not see "one error reported per injected error". As the test progressed, the screen filled with 10 or 11 rows reporting error injection, after which a ">" moved line by line down the text lines already on the screen: "[ECC Inject] Injecting ECC Error for Intel Atom C2000 SoC Transaction Router".
                John N.

                • #9
                  Got the log. Your hardware supports ECC injection. So no problem there.

                  But the DRAM Control Operation (DCO) register shows that ECC injection has been disabled by your BIOS firmware. You may want to check your BIOS setup to see if there is an option to enable ECC injection. Otherwise, you would need to flash a custom BIOS that does not disable the ECC injection feature.

                  In the next release of MemTest86 we'll decode the bits in the DCO register to provide a clearer indication in the log that the injection feature is disabled in BIOS.
                  At the moment, this string in the log shows that the feature was disabled: "DCO=C0101201"

                  The warning message in the next release will be,

                  **Warning** DRPLOCK is set to 1. DRPLOCK must be cleared by the BIOS to enable error injection
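Until that release is out, a minimal sketch like the following could pull the DCO value out of the log and test the lock bit. The bit position of DRPLOCK is not stated in this thread; bit 0 is used here purely as an assumption (it happens to be set in the value quoted above), so verify it against the SoC datasheet before relying on the result. The function names are illustrative and not part of MemTest86.

```python
import re

# Assumption: DRPLOCK is bit 0 of DCO. The thread does not give the bit
# position; confirm it in the SoC datasheet.
ASSUMED_DRPLOCK_BIT = 0


def find_dco_values(log_text):
    """Return every 'DCO=<hex>' value found in the log text as an integer."""
    return [int(m, 16) for m in re.findall(r"DCO=([0-9A-Fa-f]{1,8})\b", log_text)]


def drplock_is_set(dco_value, bit=ASSUMED_DRPLOCK_BIT):
    """True if the assumed DRPLOCK bit is 1 in the given DCO value."""
    return (dco_value >> bit) & 1 == 1


# The string quoted in this thread.
sample = "DCO=C0101201"
for value in find_dco_values(sample):
    state = "set, injection locked out" if drplock_is_set(value) else "clear"
    print(f"DCO=0x{value:08X}: DRPLOCK (assumed bit {ASSUMED_DRPLOCK_BIT}) is {state}")
```

To run it against a real log, read MemTest86.log into a string and pass it to `find_dco_values`.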

                  • #10
                    Thanks for the review and the explanation. I appreciate the comeback. I will be making a note about this on the FreeNAS forum as there were several people interested in the issue.
                    Best.
                    John N.

                    • #11
                      I suspect the rationale for vendors disabling it is that they see it only as an internal debugging feature and (they claim) there is potential for malware to exploit it. That is also the reason you can't turn the feature back on in software once the BIOS has turned it off: malware might turn it back on, then exploit it. In other words, there is no ON switch, just an OFF switch.

                      • #12
                        Thanks once again, seems logical.
