Memtest Pro 9.0 / Threadripper Pro / 128GB Modules - False Positives?

  • Memtest Pro 9.0 / Threadripper Pro / 128GB Modules - False Positives?

    Hi all,

    I work for an SI and we use Memtest on everything (literally, everything). It's not the be-all and end-all, obviously, but it gives us a good idea of system stability.

    We've got four systems in progress at the moment with the new AMD Threadripper Pro CPUs (2x 3975WX, 2x 3995WX), all on the ASUS Pro WS WRX80E-SAGE SE WIFI mainboard.

    The systems all work, fully populated with 16GB, 32GB and 64GB (3200MHz) ECC Registered parts (all Samsung original parts). However, one system needs 8x 128GB 2933MHz parts.

    They're on the ASUS QVL list - M393AAG40M3B-CYFCQ - for 1, 2, 4 and 8 DIMM configurations.

    If all eight modules are run together I get 1000+ errors and the big red FAIL. I broke them down into pairs, and three of the four pairs report ECC errors but nothing under total errors, so I'm guessing the ECC nature of the DIMMs/system is correcting any problems at that point.

    Run individually, the DIMMs all pass. I've taken four of the modules that tested for 60+ hours (no errors) and put them in a system together, and they produce ECC errors in under 20 minutes.

    I know it's not the same thing, but the built-in Windows test takes two days and completes with no errors.

    Is my configuration just too new? Could these be false positives?

    I do feel I may have one faulty DIMM, as it produces ECC errors on its own (it's the only one that does)... but it's really odd.

    Any ideas/thoughts would be very much appreciated.

  • #2
    Memtest should still work with that much RAM.

    We aren't aware of any false positive issues. Sometimes there are BIOS bugs that cause problems (e.g. wrong voltage set by the BIOS, or a wrong memory map). While these types of errors aren't hardware faults, they aren't false errors either.

    Memtest only counts an error towards the total if two bits are in error. Because the ECC RAM fixes single-bit errors, Memtest does not add these to the total error count.
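
    To illustrate the difference between a corrected ECC error and a counted error, here is a toy sketch using an extended Hamming(8,4) code. This is an assumption-laden illustration only: real ECC DIMMs use much wider codes, and this is not how Memtest is implemented. The names `encode`, `decode` and `tally` are hypothetical.

```python
from collections import Counter
from functools import reduce
from operator import xor

# Toy SECDED code: index 0 is the overall parity bit, indices 1..7 are
# Hamming(7,4) positions with parity at 1, 2, 4 and data at 3, 5, 6, 7.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(nibble):
    """Encode a 4-bit value into an 8-bit extended Hamming codeword."""
    bits = [0] * 8
    for k, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> k) & 1
    for p in (1, 2, 4):
        # Each parity bit covers the positions whose index has bit p set.
        bits[p] = reduce(xor, (bits[i] for i in range(1, 8) if i & p and i != p))
    bits[0] = reduce(xor, bits[1:])  # overall parity over positions 1..7
    return bits


def decode(bits):
    """Return (status, nibble); status is clean, corrected or uncorrectable."""
    bits = list(bits)
    syndrome = reduce(xor, (i for i in range(1, 8) if bits[i]), 0)
    overall = reduce(xor, bits)
    if syndrome == 0 and overall == 0:
        status = "clean"
    elif overall == 1:
        # Odd parity: assume a single flipped bit and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped, cannot repair.
        status = "uncorrectable"
    nibble = sum(bits[pos] << k for k, pos in enumerate(DATA_POSITIONS))
    return status, nibble


def tally(words):
    """Count how many codewords fall into each status."""
    return Counter(decode(w)[0] for w in words)


if __name__ == "__main__":
    word = encode(0b1011)
    single = word.copy()
    single[5] ^= 1                      # one flipped bit
    double = word.copy()
    double[3] ^= 1
    double[6] ^= 1                      # two flipped bits
    print(tally([word, single, double]))
```

    In this sketch the "corrected" bucket plays the role of the ECC error count, and only the "uncorrectable" bucket would add to the total error count.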

    Another possible reason for the errors is an electrical issue or EMI with all the DIMMs plugged in.
    Try contacting ASUS. While the modules are on the QVL list, ASUS might not have tested them thoroughly with such a large amount of RAM.




    • #3
      I don't want to jinx it... but I think it might be heat-related. The DIMMs are the hottest thing I've ever felt inside a computer, I think. I've got an industrial Noctua fan blasting at them now and it's been fine for the first hour or so.



      • #4
        I could totally believe the errors might be heat related with 8 x 128GB sticks.



        • #5
          Definitely heat-related.

          I thought I'd come and post a follow-up. This can be closed.

          Thanks!
