
ECC Errors Syndrome 0000


  • ECC Errors Syndrome 0000

    Howdy,

    So I've received new memory and have been running MemTest86 V10.7 Pro to test it.

    Specifically the platform is:

    Processor: 2x Intel Xeon E5-2699A v4
    Motherboard: X10DRi-T4+
    Memory: 16x SK Hynix / HMAA8GL7MMR4N DDR4-2400 LRDIMM PC4-19200T-L Quad Rank

    Around test pass #6, I've been receiving ECC errors on three modules. I RMAed the modules, and one of the replacement modules is still throwing errors. Looking deeper into it, I found that the HTML report shows they all fail with syndrome 0000.

    Example:

    [ECC Errors] Test: 12, (Channel,Slot,Rank,Bank,Row,Col): (4,1,N/A,N/A,N/A,N/A), ECC Corrected: Yes, Syndrome: 0000, Channel-Slot: 4-1
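
For anyone who wants to tally these entries per slot, here is a minimal Python sketch that parses a report line like the one above and counts errors by Channel-Slot. It assumes every `[ECC Errors]` line follows exactly the layout of this one example; the function names `parse_ecc_line` and `count_by_slot` are made up for illustration and are not part of MemTest86.

```python
import re
from collections import Counter

# Pattern for the "[ECC Errors]" line layout seen in the example above.
# Assumption: other lines in the report use the same field order and labels.
ECC_LINE = re.compile(
    r"\[ECC Errors\] Test: (?P<test>\d+), "
    r"\(Channel,Slot,Rank,Bank,Row,Col\): \((?P<location>[^)]*)\), "
    r"ECC Corrected: (?P<corrected>Yes|No), "
    r"Syndrome: (?P<syndrome>[0-9A-Fa-f]+), "
    r"Channel-Slot: (?P<channel_slot>\d+-\d+)"
)


def parse_ecc_line(line):
    """Return a dict of fields for one ECC error line, or None if it does not match."""
    m = ECC_LINE.search(line)
    if m is None:
        return None
    return {
        "test": int(m["test"]),
        # Location fields stay as strings because some are reported as "N/A".
        "location": [field.strip() for field in m["location"].split(",")],
        "corrected": m["corrected"] == "Yes",
        "syndrome": m["syndrome"],
        "channel_slot": m["channel_slot"],
    }


def count_by_slot(lines):
    """Count matching ECC error lines per Channel-Slot, skipping anything else."""
    counts = Counter()
    for line in lines:
        record = parse_ecc_line(line)
        if record is not None:
            counts[record["channel_slot"]] += 1
    return counts
```

Comparing the per-slot counts before and after swapping modules between slots is one way to check whether the errors stay with a slot or move with a module.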
    Reading more into the details, I see that your website says syndrome 0000 isn't an error, and other sources on the web indicate the same.

    Are these real errors?

    Thank you


  • #2
    It is strange to have 3 bad modules in one batch, and then to have one of the replacement modules be bad as well. That makes me suspicious that something else is going on. Maybe the timings or voltages on the MB aren't quite right. Running the modules at a slightly slower speed might make them stable (e.g. turn off XMP if you have it on).

    syndrome 0000 isn't an error
    That is our understanding. But all this stuff is super poorly documented and nearly never tested by the vendors.
    It seems pretty clear, however, that there was an error that got corrected.



    • #3
      It's a SuperMicro server motherboard, so no XMP or anything like that. Voltages are reported as well within range, and currently the power supplies aren't loaded anywhere close to the max. It's pulling ~500 watts during testing out of 2000 watts available.

      FWIW, the errors follow the RAM chips around, even into other motherboards/power supplies.


      As long as you're sure those are real corrected errors, I'm going to run another round of RMAs and hope the new chips work better.

      Thank you for your time and expertise.



      • #4
        voltages are reported as well within range
        How did you measure them? Normally it is pretty hard to measure the real voltage on a RAM module / slot. As well, there are 4 different voltage lines per module. Timings are even harder to measure.

        BIOS reports what it thinks the voltage is set to, but each RAM slot will be running at a slightly different level. You need something like our ECC tester to measure the slot's voltage if you need an accurate measurement.



        • #5
          I was checking via the IPMI sensor data repository and system event log, looking for any values flagged as out of spec, but you are right that they won't catch everything.
