G.Skill claiming Rowhammer test is not indicative of stability- BS?

  • G.Skill claiming Rowhammer test is not indicative of stability- BS?

    I RMA'd a faulty G.Skill memory kit (consistent bad addresses on one of the sticks).

    They sent me a new kit of the same model. The new kit consistently fails the Rowhammer test (Test 13) when both sticks are inserted, but each stick passes on its own.
    Additional testing info:
    • Tested in both dual-channel slot configurations, multiple times.
    • Tested with XMP off as well; they still fail.
    • Tested a different memory kit (dual channel): it passes, no issue.
    I contacted G.Skill to express my dissatisfaction with this (this is the third faulty G.Skill kit I've had, including another set that had different issues).

    Their reply (below) devalues the Rowhammer test as "not an indicator of stability". Is there any basis whatsoever for this?
    I know for a fact that it is not disabled by default as they claim, at least on my version of MemTest86.

    Thanks!

    Full G.Skill reply:
    Dear Customer,

    We apologize for the inconvenience.

    First, we would like to establish that the Rowhammer test is not an indicator of stability. To test for system stability, please make sure that the Rowhammer test in Memtest is disabled. Memtest should also have this disabled by default in its standard list of memory tests.

    If you wish to request a different memory kit, we will be able to provide you with free shipping. Please let us know if you wish to ship back your memory kit.

    Thank you so much for your understanding and cooperation.

  • #2
    They are not necessarily wrong. The Hammer Test is designed to detect RAM modules that are susceptible to charge-leakage exploits. See the FAQ entry "Why am I only getting errors during Test 13 Hammer Test?"

    • #3
      It is a difficult area.
      The row hammer issue has been a design problem for decades.
      We were told by the memory vendors that, with the introduction of DDR4, the problem would largely go away because mitigations were being put in place in both the RAM and the motherboards. This turned out not to be true: it is still a problem in DDR4. The desire for the highest performance at the lowest price means stability is a lower priority (if it weren't, we would all be using ECC RAM, which would mostly solve the problem).
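      To make "mostly" concrete: ECC memory uses a SECDED code (single-error correct, double-error detect). The sketch below is a toy extended Hamming(8,4) code, not the wider code real ECC DIMMs use (commonly 8 check bits per 64 data bits), but it shows the same guarantees: any single flipped bit in a word is corrected, any two flipped bits are detected, and three or more can be silently miscorrected. That last case is why ECC reduces, rather than eliminates, the row hammer risk.

```python
from __future__ import annotations

# Toy extended Hamming(8,4) SECDED code (illustration only, not a real DIMM's ECC).
# Codeword layout: index 0 = overall parity p0; indices 1..7 = classic Hamming(7,4)
# positions: 1=p1, 2=p2, 3=d1, 4=p4, 5=d2, 6=d3, 7=d4.


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 5, 6, 7
    word = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    p0 = 0
    for b in word:
        p0 ^= b  # overall parity bit makes total parity even
    return [p0] + word


def decode(cw: list[int]) -> tuple[str, int | None]:
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if cw[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = 0
    for b in cw:
        overall ^= b
    fixed = list(cw)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and correct it.
        # (syndrome 0 here means the overall parity bit itself flipped.)
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    return status, fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3


if __name__ == "__main__":
    cw = encode(0b1011)
    one = list(cw)
    one[5] ^= 1  # simulate one disturbed cell
    two = list(cw)
    two[2] ^= 1
    two[6] ^= 1  # simulate two disturbed cells in the same word
    print(decode(cw))   # ('ok', 11)
    print(decode(one))  # ('corrected', 11)
    print(decode(two))  # ('uncorrectable', None)
```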

      There is a good summary in this research paper:
      https://arxiv.org/pdf/1904.09724.pdf

      So it is absolutely a design failure of the system (the system being the motherboard, the memory controller in the CPU, and the RAM).

      Will replacing the RAM with more RAM of the same model fix the problem? No, probably not.
      Maybe you can fix it by tweaking BIOS settings, e.g. by increasing the refresh rate.
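      Why a higher refresh rate can help comes down to arithmetic: a victim row only has to survive until its next refresh, so a shorter refresh window caps how many times its neighbours can be activated in between. The sketch below assumes the standard DDR4 figures (64 ms refresh window, 8192 REF commands per window) and an illustrative row cycle time tRC of 45 ns; the real tRC, and whether a given BIOS exposes tREFI in memory-clock cycles, depend on the module and board. Note that increasing the refresh rate means lowering tREFI, at some cost in performance because more time is spent refreshing.

```python
# Back-of-the-envelope DDR4 refresh arithmetic (illustrative values, not measurements).
TREFW_S = 64e-3      # refresh window: every row refreshed within 64 ms (DDR4, normal temperature)
REF_COMMANDS = 8192  # REF commands issued per refresh window
TRC_S = 45e-9        # ASSUMED row cycle time (ACT to ACT, same bank); check your module's timings


def refresh_stats(refresh_multiplier: int, dram_clock_hz: float) -> dict:
    """Refresh figures when refreshing `refresh_multiplier` times as often as the baseline."""
    window = TREFW_S / refresh_multiplier  # higher refresh rate => shorter window
    trefi = window / REF_COMMANDS          # interval between REF commands
    return {
        "tREFW_ms": window * 1e3,
        "tREFI_us": trefi * 1e6,
        # tREFI expressed in memory-clock cycles (how some BIOSes present it)
        "tREFI_cycles": round(trefi * dram_clock_hz),
        # Upper bound on row activations in one bank before a victim row is refreshed
        # (ignores time lost to refresh itself, so the true figure is lower).
        "max_activations": int(window / TRC_S),
    }


if __name__ == "__main__":
    for mult in (1, 2):
        # DDR4-3200 => 1600 MHz memory clock
        print(mult, refresh_stats(mult, 1.6e9))
```

      Doubling the refresh rate halves both tREFI and the number of activations available to a hammering access pattern within one window.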

      Should we be hiding errors by turning off the test in MemTest86? Absolutely not, in our opinion. Obviously, the RAM vendors would like that, however.

      What is a big unknown, and we have no data on, is how often these errors appear in real-life applications (as opposed to the well-documented security issues that malware exploiting row hammer can cause).

      So in truth we don't know how bad it is in real life.

      Also, there are two levels of errors in MemTest86 for row hammer: an actual error (with a FAIL result), and a warning/note (with a PASS result). This is related to how easy it was to force a row hammer error. If it was easy to force an error, then it is a FAIL. If it was difficult or impossible, then we PASS the test. In V9.3 of MemTest86, we are going to add a new level:

      Green PASS - No errors.
      Yellow PASS - No errors in any normal test (tests 1 to 12), but at least 1 row hammer error, which was hard to provoke.
      Red FAIL - Errors in any normal test, or an easy-to-provoke row hammer error.
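      The three-level scheme above can be written as a small decision function. This is a hypothetical model of the rules as described, not MemTest86 code: how MemTest86 decides whether a Test 13 error was easy or hard to provoke is not modelled, so the caller passes in the two counts already split.

```python
from enum import Enum


class Result(Enum):
    GREEN_PASS = "Green PASS"
    YELLOW_PASS = "Yellow PASS"
    RED_FAIL = "Red FAIL"


def classify(normal_test_errors: int, easy_hammer_errors: int, hard_hammer_errors: int) -> Result:
    """Hypothetical model of the three result levels described above.

    normal_test_errors: errors from tests 1 to 12
    easy_hammer_errors / hard_hammer_errors: row hammer (Test 13) errors, split by how
    easily they were provoked (that split is assumed to be given, not computed here).
    """
    if normal_test_errors > 0 or easy_hammer_errors > 0:
        return Result.RED_FAIL
    if hard_hammer_errors > 0:
        return Result.YELLOW_PASS
    return Result.GREEN_PASS


if __name__ == "__main__":
    print(classify(0, 0, 0).value)  # Green PASS
    print(classify(0, 0, 2).value)  # Yellow PASS
    print(classify(0, 3, 0).value)  # Red FAIL
```

      Under this model, errors in tests 1 to 12 always dominate: a single one gives Red FAIL regardless of the row hammer results.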

      If you got an actual FAIL result, then we believe you are much more likely to see the problem in real life.
      Via anecdotal correspondence with users, we do strongly suspect that the cells that are weak to row hammer are also weak to other intermittent errors, but we have no data to prove this.

      • #4
        Originally posted by Richard (PassMark):
        They are not necessarily wrong. The Hammer Test is designed to detect RAM modules that are susceptible to charge-leakage exploits. See the FAQ entry "Why am I only getting errors during Test 13 Hammer Test?"
        I did look at that section before posting. My impression was that these are real but hard-to-trigger errors, so data corruption might occur because of them.

        Originally posted by David (PassMark):
        It is a difficult area.
        What is a big unknown, and we have no data on, is how often these errors appear in real-life applications (as opposed to the well-documented security issues that malware exploiting row hammer can cause).

        So in truth we don't know how bad it is in real life.
        ...
        If you got an actual FAIL result, then we believe you are much more likely to see the problem in real life.
        Via anecdotal correspondence with users, we do strongly suspect that the cells that are weak to row hammer are also weak to other intermittent errors, but we have no data to prove this.
        I see. My result was an actual FAIL (IIRC 4-8 failures per 4 passes, on different memory addresses).
        Thanks for the info. I'll take G.Skill up on their offer to replace the modules. Perhaps I'll sell the replacement and buy a different brand.
