Dear experts,
In the following text I will be a bit more descriptive, in order to explain what started the computer issues (the occasional crashes) and how I tried to diagnose them, and to give information to future readers with a similar issue so they can save time, as I have already spent 2 weeks on this.
I have had my computer for almost 2 years with no issues:
ASUS ROG Strix Z690-I
Intel i7-12700K
Corsair 32 GB kit DDR5 5600MHz CL36 Vengeance Black
SSD Samsung 980 PRO 2 TB
Windows 11
NVIDIA 3080 FE
The GPU was slightly undervolted the whole time; the rest of the components were left at stock (auto in BIOS).
Before Christmas I did too many things at once: updated Windows and drivers, dusted the PC internals (without unplugging anything), connected a second monitor via a DP->VGA converter, and enabled XMP (changed from auto -> XMP1).
For 3 days everything was OK, and then I got the first BSOD pointing to memory issues.
As a first step I disabled the XMP profile, but the crash occurred again.
Worried that XMP had damaged the hardware, I ran PassMark MemTest86 (the free version that comes with the motherboard), which found no errors.
At this point the system was crashing/freezing twice a day.
I even completely reinstalled the BIOS to be fully sure no trace of XMP was left. Still crashing.
So I disconnected the second monitor and the crashes disappeared, for a while.
After about 2 weeks I got another memory-related BSOD and two freezes. I even reverted the GPU drivers to an older version.
As the freezes were now rarer (about once every 2-3 days), I decided to run longer tests using memtest86+, as I thought the 1h test with memtest86 was not long enough.
It found memory errors in the 1st GB of memory (once at 800 MB, once at 700 MB, and multiple times around 3 MB), usually after around 3-4h of running. So I started suspecting that one chip on the RAM was bad.
As the freezes were less common at this stage, I trusted the memtest86+ results more. But weirdly, once during the test the computer restarted, and a second time memtest86+ froze.
That weekend I was playing on the PC almost 12h a day and only once did an Unreal game crash, which might be unrelated to the memory issue.
To be sure the errors were not coming from the GPU, I removed the GPU, enabled the 12700K iGPU, and ran the test again. (As the crashes were less frequent with the second screen removed, I was worried the adapter might have damaged the GPU, although I would expect the GPU to filter any problems and protect the motherboard.)
Memtest86+ found errors around the same location (3 MB), but after around 5h the CPU threw an unexpected interrupt (Gen. Prot) and stopped the process, with the stack trace around the addresses where memtest86+ had been finding issues. This repeated during a second test (this time with no memory errors found): after 5h the interrupt came from a different core.
To test the CPU, I booted the system. Cinebench ran OK for 10 minutes.
To stress it even more, I installed Prime95 and ran the CPU test. I did not build the computer for such a huge load; the temperatures are usually 50-60 degrees, and 80 during multicore Cinebench. Within 5 minutes the CPU reached 100 degrees and thermal throttling started; I left it for 10 more minutes before ending the test. No errors found. If there were a CPU issue, I would not expect it to withstand 100 degrees.
On Monday I tried single-channel tests to get more hints on whether the issue is RAM, CPU, or motherboard related.
memtest86+ threw hundreds of errors within the first 2 minutes at random addresses all over the memory, with both sticks and in both slots, and usually the display glitched completely, after which the PC either hard rebooted or just froze.
So I was now worried this points to a CPU or motherboard issue.
Even so, with the single stick I was able to boot into Windows, and I ran the Prime95 memory test for 3h with no errors. Afterwards I ran memtest86+ again and it ran for 1.5h without an error.
The next day, I tried memtest86+ again with the same single stick, and again it found errors/crashed/hard rebooted the system within the first minutes. Thinking it might be a problem with the latest V7 release, I even tried V6.2, with the same result.
Afterwards, I ran PassMark MemTest86 V10.0 from the motherboard. No errors were found when I ran it twice in a row on the same stick as before.
One interesting thing I noticed: in single-channel mode memtest86 reports these values (L1 cache: 80K 561.8 GB/s, L2 cache 1280K 117.0 GB/s, L3 cache 25600K 42.8 GB/s, memory 17.3 GB/s)
while memtest86+ reports (L1 cache: 48K 555 GB/s, L2 cache 1.25M 132.0 GB/s, L3 cache 25M 48.6 GB/s, memory 19.7 GB/s).
Both memory speeds reacted when I changed the DRAM speed from auto (4800) to 4400 or 5200, but with memtest86+ they were always higher.
In dual-channel mode it was the other way around:
memtest86 reports these values (L1 cache: 80K 538.5 GB/s, L2 cache 1280K 116.5 GB/s, L3 cache 25600K 42.6 GB/s, memory 27.2 GB/s)
while memtest86+ reports (L1 cache: 48K 575 GB/s, L2 cache 1.25M 127.0 GB/s, L3 cache 25M 51.9 GB/s, memory 25.8 GB/s).
So, the question is: for the system/RAM I have, what is the expected speed in single-channel mode? Is memtest86+ overestimating it (and thus causing the crashes, as the speed is way above stock), or is memtest86 underestimating it?
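For reference, a theoretical ceiling for the memory speed can be estimated from the DRAM transfer rate, which gives something to compare the reported numbers against. Below is a minimal sketch in Python, assuming a 64-bit data bus per channel (8 bytes per transfer); the function name is my own, and the rates are just the DRAM speeds mentioned above. The values shown by the testers are measurements, so they should land somewhere below this ceiling.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumption: each channel moves bus_width_bits per transfer,
    so MT/s * bytes per transfer gives MB/s per channel.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


# DRAM speeds tried in the BIOS (auto = 4800) plus the XMP rating of the kit.
for rate in (4400, 4800, 5200, 5600):
    single = peak_bandwidth_gbs(rate, channels=1)
    dual = peak_bandwidth_gbs(rate, channels=2)
    print(f"DDR5-{rate}: single {single:.1f} GB/s, dual {dual:.1f} GB/s")
```

At 4800 MT/s this gives 38.4 GB/s for one channel and 76.8 GB/s for two, so both single-channel readings (17.3 and 19.7 GB/s) sit well below the ceiling.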
Now I am not sure what to trust. It is hard to believe memtest86+, because it crashes within the first 5-10 minutes with one stick, while the system runs for hours with no issues under stress testing. If those errors were real, I should not even be able to load Windows with one stick, given their severity.
But since I did sometimes get crashes, I am not sure whether to trust the 0 errors from memtest86.
Can I ask you for a recommendation and your opinion? Should I get the memtest86 Pro version and let it run for hours to catch the rare crashes (once every 2-3 days)?
How long should the test run to be sure the RAM is good?
Thanks a lot!