Why MemTest86 might not find all RAM errors.

    MemTest86 is a pretty good RAM tester. It finds the vast majority of issues. But from time to time we see rare cases where the RAM is probably bad, but MemTest86 doesn't detect it as bad.

    Here are some of the reasons why MemTest86 might not always find errors.

    1) There are "soft" errors. A RAM module's susceptibility to these random errors can't really be detected by quick testing.
    https://en.wikipedia.org/wiki/Soft_error

    2) There is marginal RAM that only shows errors under certain environmental conditions, like voltage, temperature or EMI conditions. Running Windows means the SSD, GPU and maybe CPU all have more load on them. This can affect the environmental conditions. We have BurnInTest software for in-Windows testing. Very occasionally it finds an error where MemTest86 doesn't.
    https://www.passmark.com/products/burnintest/index.php
    Or there might be something external to the PC that triggers the marginal RAM to fail at a particular time (e.g. a mains power spike, or an EMI pulse from external devices like microwave ovens, LED lights, motors, power supplies, etc.)

    3) The RAM might in fact have been OK when initially tested, and then gone bad later on, once in production.

    4) There is a small amount of RAM that cannot be tested. The BIOS reserves RAM for its own use. In a 32GB system, this can be around 2% of the total RAM, but it can vary a lot between machines. A fault in this 2% can still cause problems. An example memory map is here:
    https://www.memtest86.com/img/memtest86-memory-map.png

    5) The RAM was marginal, but the errors were rare and random: a very low, but non-zero, bit error rate (e.g. 1 fault every 3 days of heavy use), and no error happened to occur while the test was running.

    6) Sometimes RAM just fails with a specific access pattern, maybe one that MemTest86 doesn't exercise. There are an infinite number of patterns, however, and we can't run them all in limited time. Row hammer effects are particularly insidious.

    7) We had one case of RAM only ever failing during DMA access (so very rare, but we confirmed it happened). This looked like disk corruption, but it was a RAM issue. We added an optional DMA test for this in MemTest86. The test is optional because real disk errors also trigger failures in this test, and disk errors are more common than RAM errors.
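
    To put rough numbers on point 5, here is a minimal sketch of how likely a test is to pass cleanly on RAM that faults only rarely. It assumes faults arrive independently at a constant average rate (a Poisson process) and that the test stresses the RAM about as hard as "heavy use" does. Both are simplifying assumptions, and the function names are made up for this illustration.

```python
import math


def miss_probability(mean_hours_between_faults: float, test_hours: float) -> float:
    """Chance that a test of `test_hours` sees zero faults.

    Assumes faults follow a Poisson process with the given mean interval,
    which is a modelling assumption, not a measured property of real RAM.
    """
    return math.exp(-test_hours / mean_hours_between_faults)


def hours_for_detection(mean_hours_between_faults: float, confidence: float) -> float:
    """Test duration needed to see at least one fault with probability `confidence`."""
    return -mean_hours_between_faults * math.log(1.0 - confidence)


if __name__ == "__main__":
    mean_interval = 72.0  # "1 fault every 3 days" from the example above
    for hours in (1, 4, 8, 24, 72):
        p = miss_probability(mean_interval, hours)
        print(f"{hours:>3} h test: {p:.0%} chance of a clean pass")
    needed = hours_for_detection(mean_interval, 0.95)
    print(f"Test time for a 95% chance of catching a fault: {needed:.0f} h")
```

    Under these assumptions, a 4 hour test on RAM that faults once every 3 days still passes about 95% of the time, and you would need roughly 9 days of testing to have a 95% chance of seeing even one error.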

    Note that using ECC RAM solves nearly all of the above cases.
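
    As a rough illustration of why ECC helps, here is a toy Hamming(7,4) code that corrects any single flipped bit in a stored word. This is only a sketch of the general idea. Real ECC memory commonly uses a wider code (for example 8 check bits per 64 data bits) that also detects double-bit errors, and the function names below are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position).

    error_position is the 1-based position of the corrected bit,
    or 0 if the codeword was consistent.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate a single bit flip in "RAM" (position 5)
    data, fixed_at = hamming74_decode(stored)
    print("recovered data:", data, "corrected position:", fixed_at)
```

    The point is that a single bit flip, whether from a soft error, marginal RAM or an untested region, gets corrected on read instead of silently corrupting data.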
