RAM testing - errors to no errors?

  • RAM testing - errors to no errors?

    Hello,

    System:
    Mb: Gigabyte B650 Gaming X AX rev 1.3
    CPU: AMD Ryzen 9 7900X
    RAM: 32 GB (2x16) Corsair Vengeance 5600 MHz CL36 DDR5
    GPU: Gigabyte RTX 2070 Super

    ====
    Pre-story
    ====


    I bought an upgrade kit in August containing a mainboard, CPU and RAM. I assembled it and it immediately gave me DRAM errors and would not boot into Windows.
    I tried reseating the RAM in the other slots: same issue.
    I tried to Q-Flash the BIOS: also the same issue.
    Then I sent the system back and the shop told me the mainboard was defective. I would get a new one and they would assemble it for me.
    Back home I started the system and was able to begin installing Windows. During the installation my screen went black. Looking in the case, I saw the DRAM light on again.
    I restarted the system and it continued installing until the next "restart" event. I contacted the shop again and tried some things with them on the phone. Nothing seemed to work.
    Again I brought the system to them for analysis. This time I got another new mainboard and also a new CPU, because both were defective. The first replacement mainboard had been tested by them and everything worked, they said.
    I got my system back on October 10th and could reinstall Windows without problems. I installed other software without problems. The next day, however, I got a Windows update and installed it. It needed a restart, and that gave me another DRAM error and a black screen. Manually shutting the system down and restarting after that worked. But with every forced restart/reset I got the same issue again: DRAM error and black screen. So I went back to testing and contacting the shop. They said it could be a software conflict. Since I had started to get display driver issues too, I looked into the software conflict idea. I reinstalled Windows three times, tried different display driver versions (clean installs using DDU) and looked into the dump files created by CTDs and BSODs. All error codes pointed towards memory handling issues: c0000005 access violation errors, a MEMORY_MANAGEMENT error, unloaded system and DLL files, and also one internal_error BSOD.

    I contacted the shop again and told them I'm not sure it is software related. They sent me their list of apps they use to test for hardware issues and I started using them. Again, everything points to memory handling issues.

    ====
    Now the reason I'm writing this topic.
    ====


    I've run Memtest86 V10.6 free version multiple times.
    Standard config: test 1 - 13, 4 passes.

    I have two 16 GB RAM modules. When testing them both together, I got 42 minutes of testing before Memtest86 failed the test: the limit of 10000 errors was reached, with 39 bits in error.
    I tested one of the two modules in two RAM slots and both tests gave me 5 bits in error. I guess that module needs to be replaced.
    I tested the other module too. The first test gave me the same result as the dual test. The difference was that this test ran 43/48 tests before it reached the error limit (10000 again). Bits in error: 6.
    I did read something about the system possibly being confused about the amount of RAM available. So I ran the test again, but this time with the stick reseated in another slot.
    The module passed this test without any errors. 4 passes ran as if everything was fine. To me this was strange because until this moment every test had given me errors. (I also used OCCT, BurnInTest, Prime95, benchmarks and AIDA64 to test other components.) I then ran another test on this module in the slot where it had given me the 10000 errors before. This time it passed again, no errors.

    I could not find a lot of information about this, so I turn to the experts here.

    ====
    Actual question
    ====


    What does it mean if a RAM module fails a test run by reaching the error limit, but then appears to be in good working order on a second and third run?


    ====
    Final ranting
    ====


    I do believe my mainboard is defective again.

    I know buying new RAM is an option, but these are products under warranty. Given that I have already received two new mainboards (3 in total) and a new CPU, I doubt the shop is testing everything thoroughly, even though they keep the system for two weeks for testing and repairs. In my opinion they should find these issues, and from the start I thought the RAM was broken. I asked them to test it extensively and they said they did. But I still have a system that doesn't work. Even browsing the internet is causing issues, so there is more going on than software conflicts.
    I need to know for sure which parts could be broken, and I need to be able to prove it in some way.

  • #2
    If you have RAM errors in Memtest86 (even one error), it makes sense to replace the RAM.

    It doesn't make much sense at all to replace the motherboard or CPU for a 3rd time without trying new RAM.

    You don't need to wait until you have 10,000 errors to decide a test failed. Even one error is a failure. 10,000 errors is a catastrophic failure.

    The fact that both RAM modules threw errors is slightly strange. It is either a systematic manufacturing issue at Corsair, or maybe a compatibility issue (i.e. the BIOS isn't setting the correct voltages & timings, which could be a BIOS bug, or bad SPD data in the RAM sticks).

    Sometimes errors aren't consistent. Parts are marginal (voltages and timings are right on the edge of working or not) or the behaviour depends on external factors like temperature, EMI, power supply fluctuations, corrosion on the connectors, row hammer effects, the test pattern in use when writing the data, etc.


    • #3
      Thank you for your reply.

      I gather from your answer that a test that gave me errors (no matter how many) tells me it is a bad RAM module. Also, a second test with the same module in a different slot showing no errors is not supposed to happen. Am I right?

      "If you have RAM errors in Memtest86 (even one error), it makes sense to replace the RAM."
      I know this. Ever since I returned the original upgrade kit because of a DRAM error, I have been telling the shop that my guess is the RAM. They tested the system (as a whole), told me the mainboard was defective and replaced it.
      They tested it and said it was in working order. I got my PC back, started it and got the install screen for Windows 11. During the installation it had to restart, and that gave me a DRAM error again. I was disappointed. The issue was not solved and the system couldn't start.
      I contacted the shop again and did some troubleshooting with them on the phone. Reseating the RAM and everything else we tried didn't work. So I had to bring the system in again. I had got the system on Thursday and brought it back in on Saturday morning.
      They checked again and when I called them, they told me the mainboard was broken and they had ordered a new one. So that was the second mobo. After a week I called them because I hadn't heard anything from them. They then told me the CPU needed to be replaced too and they had ordered one. After that I got my system back. I had told them to check the RAM extensively, and they told me they did and that the system was in working order.
      I was happy when the system seemed to work. It booted fast and I could use it as I expected. The next day a Windows update caused the DRAM error again when restarting the system.
      I shut the system down manually, since my monitor was black. Starting the PC again worked. I couldn't find any errors then, so I thought it might be a weird one-time thing. But it happened again when my AV needed to update and restart.
      After I contacted the shop again, they told me a fresh install of Windows could be a good way to rule out software conflicts. They also told me to reseat the RAM if it still happened.
      After the reinstall it all seemed to work fine. I didn't have any restarts while installing software. I didn't think of triggering a manual reset myself, so I did no testing at that point.

      I used the PC for a while, but as soon as I needed a restart, the DRAM issue came back. I started to look into this and into possible software conflicts that could cause it.
      I noticed that the Gigabyte software caused issues, as did the Corsair iCUE software. I uninstalled them and could use the PC.
      Then Nvidia driver updates came and games would CTD, mostly Rockstar games. I looked into that problem and a lot of the suggested solutions are "reinstall the driver using DDU". So I did, but it didn't help.
      I contacted the shop again and told them about the issue. They told me to test the hardware and sent me the list of software titles they use for testing.


      "It doesn't make much sense at all to replace the motherboard or CPU for a 3rd time without trying new RAM."
      I agree, it would not. Same as it makes no sense that three mainboards were defective, and now a fourth one.
      That last one I know for sure. When I reseated the RAM for testing, I found that slots A1 and B1 are not working. They go straight to a DRAM error and the system does not start.
      A2 and B2, the default slots for a dual setup, are working, but it takes about 4-5 seconds to pass the DRAM check. The CPU, GPU and BOOT checks go fast.

      So the mobo needs to be replaced. I also wonder if the RAM could have been the culprit from the start, which is why I am testing the modules myself now, to rule things out.

      "You don't need to wait until you have 10,000 errors to decide a test failed. Even one error is a failure. 10,000 errors is a catastrophic failure."
      That was my thought too when I decided to start with Memtest86. I am not familiar with testing hardware this way. I can usually tell what is broken because of clear symptoms.
      I read a lot about how to read test results and how to use Memtest86.
      Since there are a lot of different opinions on this topic, I did what was most common. I also figured that the default setup of the program would be good enough. During testing I looked into things and found some information about large numbers of consecutive errors and the possibility of the system being confused about memory availability.
      My initial plan was to collect the data and send it to the shop for analysis. I am, however, way too curious about these things myself. I knew 10000 errors was bad, but I didn't know in what way.
      Since I had read about clearing CMOS before dual testing, I was wondering about the legitimacy of these results.
      While reading into it, I decided to continue testing with one module, and with that module in slots A2 and B2 I got around 2000 errors. That seemed, to me as a newbie, still a lot and a sign that the RAM would be bad. This was confirmed when I read about what Bits in Error means.
      Then I started to test the other module in slot B2 and expected to get around the same results. But this one skyrocketed to 10000 in its fourth pass and stopped the test.
      Since I am not an expert on this matter, I decided to test it in slot A2 as well, also to find out if slot B2 could be broken in some way.
      A2 didn't give me any errors, and I ran that test 1.5 times. The first time I stopped the test during pass 2 and checked the settings. I then restarted and let it run.
      Still confused about what was happening, I decided to post here. I also started to test the first module again, not for four passes but for at least one. This one passed too, with the notice about the RAM probably being too fragile for hammer testing. There were no errors and the message came after finishing pass 2. Weird, because the Memtest86 documentation says this message appears when there are errors in the first pass, and there were none.
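
A rough sketch of how the Memtest86 runs described above could be tabulated per module, so the "errors, then no errors" pattern is easy to show to the shop. The error counts come from the posts ("around 2000" is rounded). The slot labels for the dual test and for the last rerun of the first module were not stated and are assumptions, and the `classify` helper is hypothetical, written in Python purely for illustration.

```python
from collections import defaultdict

# (module, slot, errors) per run, in the order described in the posts.
# Assumptions: "around 2000" is rounded to 2000; the dual test is assumed
# to have used the default slots A2+B2; the slot of the last module-1
# rerun was not stated, so it is recorded as "unknown".
RUNS = [
    ("both", "A2+B2", 10000),
    ("module 1", "A2", 2000),
    ("module 1", "B2", 2000),
    ("module 2", "B2", 10000),
    ("module 2", "A2", 0),
    ("module 2", "B2", 0),
    ("module 1", "unknown", 0),
]


def classify(runs):
    """Group runs by module and label each module by how its runs behaved."""
    by_module = defaultdict(list)
    for module, _slot, errors in runs:
        by_module[module].append(errors)

    verdicts = {}
    for module, counts in by_module.items():
        failed = sum(1 for c in counts if c > 0)  # even one error is a failure
        if failed == 0:
            verdicts[module] = "no errors seen"
        elif failed == len(counts):
            verdicts[module] = "fails every run"
        else:
            verdicts[module] = "intermittent"
    return verdicts


verdicts = classify(RUNS)
for module, verdict in verdicts.items():
    print(f"{module}: {verdict}")
```

With the runs listed here, both single modules come out as "intermittent", which matches the inconsistent behaviour described in reply #2. It does not change the conclusion that any run with errors counts as a failed test.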

      "The fact that both RAM modules threw errors is slightly strange. It is either a systematic manufacturing issue at Corsair, or maybe a compatibility issue (i.e. the BIOS isn't setting the correct voltages & timings, which could be a BIOS bug, or bad SPD data in the RAM sticks)."
      I am not sure about manufacturing issues at Corsair; I could look into that. If so, the shop and manufacturer should have known when putting together the upgrade kit. To be clear, the upgrade kit consisted of a mobo, CPU and RAM, pre-selected, tested and found compatible, as they tell the customer.

      I did watch voltages and temperatures while using the PC and during the tests I ran (not Memtest86). No strange issues as far as I could see. I am not sure about timings.
      Changes in the BIOS, which is set to default settings, result in a DRAM error. The shop had their own profile for the setup. The first two systems had it, but it didn't work. After the last fix, the profile was never uploaded and the BIOS is running on default settings. The BIOS is updated to the latest version.

      "Sometimes errors aren't consistent. Parts are marginal (voltages and timings are right on the edge of working or not) or the behaviour depends on external factors like temperature, EMI, power supply fluctuations, corrosion on the connectors, row hammer effects, the test pattern in use when writing the data, etc."
      I can understand that errors aren't consistent; it even seems logical. The reason I posted the question is the drastic change from 10000 errors to no errors. Something has changed without me changing it.

      Also, the reason I test so much is that I want a working system. I have let the shop I got the kit from fix it twice now and I still have issues. To me that means I need to be more informed and involved to get it done.
      I thought the RAM was bad from the beginning. It looks like I was right. Because they missed this, the shop has to replace a mainboard, again, and the RAM too. That means they delivered a broken upgrade kit in the first place. I had a bad purchase, which can happen. If they had actually tested the RAM, I would think they would have found the errors like I did.

      Since I want to be sure there is no other possible cause, I tested a lot, covering all components.

      CPU: Tested with Prime95, BurnInTest and OCCT.
      - Prime95 torture test didn't run well. Cores stopped working because of hardware failure.
      - BurnInTest had no errors. I tested with the 3 minute test and cpu on its own.
      - OCCT had issues when testing with medium operations (40 min) and had a BSOD when doing large operations (<10 min)(MEMORY_MANAGEMENT).

      PSU: Tested with OCCT and no errors.

      GPU: Tested with OCCT, BurnInTest, benchmark Heaven and Furmark.
      - OCCT 3D Standard and Adaptive caused BSOD (MEMORY_MANAGEMENT)
      - BurnInTest had errors for 2D Graphics (Validating data in GPU memory)
      - Heaven benchmark and FurMark ran stable with different settings, from windowed to fullscreen and from low to ultra.

      RAM: Tested with Memtest86 & BurnInTest
      - Memtest86, as described in my post.
      - BurnInTest ran in Windows with no errors. I read somewhere that RAM testing needs to be done from a flash drive, so I am not sure this test is valid.

      Disks: Tested with BurnInTest, CrystalDiskInfo, HD Tune and AIDA64.
      - BurnInTest 3 minute test gave errors (Verifying Data). Individual testing was fine.
      - CrystalDiskInfo: Health is fine, temps are fine
      - HD Tune: Same as CrystalDiskInfo and no errors
      - AIDA64 smart scan, all fine.

      Reinstalled Windows 11 3x
      Reinstalled Nvidia drivers multiple times (older and newest version) using DDU

      Used HWMonitor to monitor the voltage and temperatures during testing and usage.
      Used CPU-Z to read out data for CPU, Mobo and RAM
      Used WinDbg and BlueScreenView to read out dump files for CTDs and BSODs
      Used Event viewer to keep an eye on errors before and after CTD and BSOD

      So my testing is a bit extensive, but like I said, I need to know what is happening to be able to make a good case and get a working computer.


      • #4
        Same as it makes no sense that three mainboards were defective, and now a fourth one.
        I think you need to consider the possibility that the people in the shop aren't experts and they have got it wrong.
        Maybe it was never the motherboard that was bad.

        To be fair, fault finding can be hard work if results aren't consistent. But,
        A) RAM is quick and easy to replace and re-test. Replacing the whole motherboard is a lot more effort. So if MemTest86 reports RAM errors, RAM replacement is a quick & easy first step.
        B) After getting to the point of considering a 4th motherboard, it should have occurred to someone that it might be time to try something else.

        RAM is cheap. Buy yourself a new $30 8GB stick for testing if the shop can't deal with it and you can't borrow some. Windows will boot fine with 8GB.


        • #5
          To be fair, that is what I am starting to think, that they are not expert enough. I have bought more parts from them in the past, though, and the service was always good. It is one of the better-known stores and service shops in the country. So all this is very disappointing to me.

          The last time I let them check my system, I explicitly asked them to test the RAM rigorously because my idea was that the RAM was bad.

          I thank you for your help. I understand a lot more about testing RAM and other hardware now.

          I am going to call the shop and present my findings to them. I still have warranty on the products. If I need to, I will get another cheap RAM stick for testing purposes.

          Thank you.
