Occasional crashes, memtests86 and memtest86+ huge difference

  • Occasional crashes, memtests86 and memtest86+ huge difference

    Dear experts,

    I will be fairly detailed below, to describe what started the computer issues (occasional crashes) and how I tried to diagnose them, and to help future readers with a similar issue save time, as I have already spent 2 weeks on this.

    I have had this computer for almost 2 years with no issues:
    Asus ROG Strix Z690-I
    Intel i7-12700K
    Corsair 32 GB kit DDR5 5600MHz CL36 Vengeance Black
    SSD Samsung 980 PRO 2 TB
    Windows 11
    NVIDIA 3080 FE

    The GPU was slightly undervolted the whole time; the remaining components were left at stock (auto in BIOS).

    Before Christmas I changed too many things at once: updated Windows and drivers, dusted the PC internals (without unplugging anything), connected a second monitor via a DP->VGA converter, and enabled XMP (changed from auto -> XMP1).
    For 3 days everything was OK, and then I received the first BSOD pointing to memory issues.
    As a first step I disabled the XMP profile, but the crash occurred again.
    Worried that XMP had damaged the hardware, I ran PassMark MemTest86 (the free version that comes with the motherboard), which found no errors.
    At this point the system was crashing or freezing twice a day.
    I even completely reinstalled the BIOS to be fully sure no trace of XMP remained. Still crashing.
    So I disconnected the second monitor and the crashes disappeared, for a while.
    After about 2 weeks I received another memory-related BSOD and the system froze twice. I even reverted the GPU drivers to an older version.

    As the freezes were now rarer (about once every 2-3 days), I decided to run longer tests using memtest86+, as I thought the 1h test with memtest86 was not long enough.
    It found memory errors in the first GB of memory (once at around 800 MB, once at 700 MB, and multiple times around 3 MB), usually after around 3-4h of running. So I started suspecting one bad chip on the RAM.
    As my computer froze less often at this stage, I trusted the memtest86+ results more. But weirdly, once during the test the computer restarted, and a second time memtest86+ froze.
    That weekend I was gaming on the PC almost 12h a day and only once did an Unreal game crash, which might be unrelated to the memory issue.

    To be sure the errors were not coming from the GPU, I removed it, enabled the 12700K iGPU, and ran the test again. (As the crashes had become less frequent with the second monitor removed, I was worried the adapter might have damaged the GPU, although I would expect the GPU to filter out any problems and protect the motherboard.)
    Memtest86+ found errors around the same location (3 MB), but after around 5h the CPU threw an unexpected interrupt (Gen. Prot) and stopped the process, with the stack trace around the addresses where memtest86+ had been finding issues. This repeated during a second test (this time with no memory errors found): after 5h the CPU raised the interrupt from a different core.
    To test the CPU, I booted the system. Cinebench ran OK for 10 minutes.
    To stress it even more, I installed Prime95 and ran the CPU test. I did not build the computer for such a heavy load; the temperatures are usually 50-60 degrees, and 80 during multicore Cinebench. Within 5 minutes the CPU reached 100 degrees and thermal throttling started; I left it for 10 more minutes before ending the test. No errors found. If there were a CPU issue, I would not expect it to withstand 100 degrees.

    On Monday I tried single-channel tests to get more hints on whether the issue is RAM, CPU, or mobo related.
    memtest86+ threw hundreds of errors within the first 2 minutes at random addresses all over the memory, on both sticks in both slots, and usually either completely glitched the display, forcing a hard reboot of the PC, or just froze.
    So I was now worried this pointed to a CPU or mobo issue.

    Even so, with a single stick I was able to boot into Windows, and I ran the Prime95 memory test for 3h with no errors. Afterwards I ran memtest86+ again and it ran for 1.5h without an error.

    The next day, I tried memtest86+ again with the same single stick, and again it found errors, crashed, or hard rebooted the system within the first minutes. Thinking there might be some problem with the latest V7 release, I even tried V6.2, with the same result.

    After that, I ran the PassMark MemTest86 V10.0 from the motherboard. It found no errors in two consecutive runs on the same stick.

    One interesting thing I noticed: in single-channel mode memtest86 reports these values (L1 cache: 80K 561.8 GB/s, L2 cache 1280K 117.0 GB/s, L3 cache 25600K 42.8 GB/s, memory 17.3 GB/s)
    while memtest86+ reports (L1 cache: 48K 555 GB/s, L2 cache 1.25M 132.0 GB/s, L3 cache 25M 48.6 GB/s, memory 19.7 GB/s).
    Both memory speeds reacted when I changed the DRAM speed from auto (4800) to 4400 or 5200, but with memtest86+ they were always higher.

    In dual-channel mode it was the other way around:
    memtest86 reports these values (L1 cache: 80K 538.5 GB/s, L2 cache 1280K 116.5 GB/s, L3 cache 25600K 42.6 GB/s, memory 27.2 GB/s)
    while memtest86+ reports (L1 cache: 48K 575 GB/s, L2 cache 1.25M 127.0 GB/s, L3 cache 25M 51.9 GB/s, memory 25.8 GB/s).

    So the question is: for the system/RAM I have, what speed should I expect in single-channel mode? Is memtest86+ overestimating it (and thus causing the crashes, as the speed is way above stock), or is memtest86 underestimating it?
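As a rough sanity check on the numbers above, the theoretical ceiling can be worked out from the DRAM transfer rate. The sketch below assumes the usual 64-bit (8-byte) data path per populated channel and the auto speed of 4800 MT/s mentioned in the post; the function name `peak_bandwidth_gbs` is made up for illustration. Measured throughput always sits below this ceiling, and how far below depends on how each tool measures it.

```python
# Theoretical peak DRAM bandwidth = transfers per second * bytes per transfer * channels.
# Assumption: a 64-bit (8-byte) data path per channel, i.e. per installed stick here.

def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

# Memory speeds quoted in the post (GB/s), keyed by (tool, number of channels).
measured = {
    ("memtest86", 1): 17.3,
    ("memtest86+", 1): 19.7,
    ("memtest86", 2): 27.2,
    ("memtest86+", 2): 25.8,
}

for (tool, channels), gbs in measured.items():
    peak = peak_bandwidth_gbs(4800, channels)
    print(f"{tool:11s} {channels}ch: {gbs:5.1f} GB/s = {gbs / peak:.0%} of {peak:.1f} GB/s peak")
```

Under these assumptions the ceiling is 38.4 GB/s with one stick and 76.8 GB/s with two, so all four readings are well below what the memory could deliver at stock speed. Neither tool reports a figure that would imply the RAM is being driven above its configured speed.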

    Now I am not sure what to trust. It is hard to believe memtest86+, because it crashes within the first 5-10 minutes with one stick, while the system runs for hours with no issues under stress testing. If those errors were real, given their severity I should not even be able to load Windows with one stick.
    But since I still sometimes got crashes, I am not sure whether to trust the 0 errors from memtest86.

    Can I ask for your recommendation and opinion? Should I get the memtest86 Pro version and let it run for hours to catch the rare (once every 2-3 days) crashes?
    How long should the test run to be sure the RAM is good?
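One way to put a rough number on "how long" is to treat the fault as a random event with a fixed average rate. The sketch below is only a model, not a rule from any test tool: it assumes faults arrive as a Poisson process, and that a memory test triggers them at the same rate as the crashes seen in normal use (about one every 2-3 days, taken here as 60 hours). A real memory tester stresses the RAM much harder than desktop use, so an actual fault would likely show up sooner than this suggests. The function names are made up for illustration.

```python
import math

def detection_probability(test_hours: float, mean_hours_between_errors: float) -> float:
    """Probability of seeing at least one error during the test (Poisson model)."""
    return 1.0 - math.exp(-test_hours / mean_hours_between_errors)

def hours_for_confidence(confidence: float, mean_hours_between_errors: float) -> float:
    """Test length needed to see at least one error with the given probability."""
    return -mean_hours_between_errors * math.log(1.0 - confidence)

# Assumption: one fault every 2.5 days on average, as in the crash history above.
mean_gap_hours = 60.0

for hours in (1, 8, 24, 72):
    p = detection_probability(hours, mean_gap_hours)
    print(f"{hours:3d} h test: {p:.0%} chance of catching at least one error")

needed = hours_for_confidence(0.95, mean_gap_hours)
print(f"About {needed:.0f} h needed for a 95% chance")
```

Under this pessimistic model a 1h test would almost never catch such a rare fault, which matches the concern that the short free-version run was not long enough.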

    Thanks a lot!

  • #2
    The latest memtest86+ release is pretty new. It isn't our software, but being so new, it would be easy to believe it has some issues, especially when you have both sticks throwing errors. A double stick failure is rare. You'll need to contact the developer to get help with the "+" release.

    Having said that, there has been a recent run of serious problems with Corsair RAM. See,
    https://forums.passmark.com/memtest8...rs-corsair-ram

    The MB/sec speed difference between MemTest86 and the "+" version isn't important. There is no standard way to benchmark cache and RAM, so different results from different code are to be expected. BIOS settings are what determine the RAM speed, timings and voltage. These aren't set in MemTest86.

    So I would borrow some non-Corsair RAM and try that as the next step.


    • #3
      Thanks a lot for the reply!
      I know you are not the developers of the "+". But before starting to swap hardware I need a reliable benchmark to evaluate against, which is why I reached out here.
      I have had these sticks for almost 2 years (bought in April 2022) and did not have any issues until recently, so they are not from the recent run of Corsairs.

      I got the same very quick hard crashes (in 2-3 minutes) even when running the older version (V6) of memtest86+ on a single stick.
      But Windows runs for hours with no issues on that same single stick (so I am doubtful about the crashes now). As I mentioned before, Windows crashed once in two weeks, and after that about once every 2-3 days, so I might have to borrow a stick for days.

      Your memtest86 V10.0 free version never showed an error in either single or dual mode (so maybe it is not a HW issue, but just some drivers/updates in Windows).
      That is why I now think its results are closer to what I experience with the PC; perhaps the free version just does not run long enough to detect the error?


      • #4
        Dear experts,

        I have bought memtest86 Pro v10.6 and ran several tests with both RAM sticks installed:
        1. 10 passes (7.5h) - 0 errors
        2. 20 passes (15h) - 0 errors
        3. 15 passes (9h), excluding the bit fade test so the RAM does not cool down - 0 errors
        4. 15 passes (11.5h), with the dGPU reinstalled and switched back from the iGPU - 0 errors
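For reference, the four runs listed above can be totalled with a few lines of arithmetic. This is just a tally of the figures from the post; the variable names are made up for illustration.

```python
# (run number, passes, hours) as listed in the post; every run reported 0 errors.
runs = [
    (1, 10, 7.5),
    (2, 20, 15.0),
    (3, 15, 9.0),   # bit fade test excluded
    (4, 15, 11.5),  # dGPU reinstalled
]

total_passes = sum(passes for _, passes, _ in runs)
total_hours = sum(hours for _, _, hours in runs)

for number, passes, hours in runs:
    print(f"run {number}: {hours / passes * 60:.0f} min per pass")

print(f"total: {total_passes} passes over {total_hours} h with 0 errors")
```

That is 60 passes over 43 hours without a single reported error. Run 3 works out noticeably shorter per pass than the others, which is what one would expect with one test skipped.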

        After these tests, can I conclude that the occasional freezes/crashes are not hardware related (RAM/mobo/CPU memory controller), that memtest86+ was giving me false positives, and that the issues are instead system/software related?

        Thanks a lot!


        • #5
          Yes, the RAM is likely OK.
          It is very hard to debug issues that only occur every couple of weeks. I would start recording the details of each BSOD to see if there is a pattern, or if it is random. The NirSoft BSOD analyser software can help with this.
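Recording the details can be as simple as a hand-kept list that gets tallied once a few crashes have accumulated. The sketch below is one possible way to do that, not a feature of any particular tool. The log entries are hypothetical sample data, not crashes from this thread; the bug check names are the standard Windows ones for those codes.

```python
from collections import Counter

# Hypothetical crash log, filled in by hand after each BSOD:
# (when, bug check code, module named in the crash dump).
crash_log = [
    ("day 1", 0x1A, "ntoskrnl.exe"),
    ("day 4", 0x50, "ntoskrnl.exe"),
    ("day 6", 0x1A, "ntoskrnl.exe"),
    ("day 9", 0x3B, "nvlddmkm.sys"),
]

# Standard Windows names for a few common bug check codes.
BUG_CHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
}

def summarize(log):
    """Count how often each bug check code and each module appears."""
    codes = Counter(code for _, code, _ in log)
    modules = Counter(module for _, _, module in log)
    return codes, modules

codes, modules = summarize(crash_log)

for code, count in codes.most_common():
    print(f"0x{code:02X} {BUG_CHECK_NAMES.get(code, 'UNKNOWN')}: {count}")
for module, count in modules.most_common():
    print(f"{module}: {count}")
```

If one code or one driver module dominates the tally, that is a pattern worth chasing; an even spread across unrelated codes points more towards something random.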
