Gigabyte x299x MemTest86 memory errors

  • Gigabyte x299x MemTest86 memory errors

    Hello and thanks for any help in advance.

    Just finished my latest build last night and updated the drivers on the Gigabyte X299X Designare 10G mobo I'm using. I changed my BIOS setting to enable XMP at 3200 and kicked off the first test. I got about halfway through pass 1 and there were already around 50 errors. I don't have a screenshot of that, but the errors were exclusively on CPU 6. I got nervous that XMP was causing the errors, so I exited that test, disabled XMP in the BIOS, and restarted the test.

    This is a screenshot of the results so far on that next run. I'm noticing that in all runs every error has come from CPU 6, and that all but one of these errors are coming from the same address. Right now I'm planning on letting this run finish and then starting to troubleshoot by testing each RAM stick individually. I wanted to make sure that I'm going down the right path and that seeing the same CPU and address pop up continually doesn't suggest something else.

    Thanks!

  • #2
    Most likely bad RAM.

    See,
    https://www.memtest86.com/troubleshooting.htm

    EDIT: Ignore that, further details below. Most likely it is NOT bad RAM.


    • #3
      Thank you,

      Based on that response I RMA'd the RAM and got a new set. I just started the test again and already on pass 1 I’ve gotten errors on tests 1, 2, 3, 4, and 5. They are still all coming from CPU 6.

      Including a screenshot from both the latest run that is running now as well as one from a previous run where it looks like some of the errors are way more than 1 bit. Based on this I think it’s highly unlikely that I got two bad sets of RAM.

      What should be my next step? BIOS issue, CPU issue, motherboard issue?


      • #4
        Just a bit more information: these errors appear to happen only with quad channel memory. Any single stick in any single slot works without errors, and any two sticks in any dual channel combination work fine. As soon as I put one stick in each of the four channels I get an immediate influx of errors. It is interesting to see that the errors tend to happen only during the first pass and not during subsequent passes. Thought this info might help.


        • #5
          All the errors in your screenshots were very high in RAM (~53GB address range). If the addresses for the errors were a constant range, this might explain why you don't see it with less RAM (as you don't reach the 53GB level).

          And you are right, if you are now getting errors with lots of bits being flipped, it does make it less likely that the RAM is at fault.
          I wonder how well Gigabyte tested this motherboard with 64GB of RAM.

          If you had a spare CPU sitting around, I would swap the CPU next. But you probably don't have a spare.
          Is the BIOS up to date? It might be a memory map issue in the BIOS. (Mapping the same memory address range to two things)
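
To make the "same address range mapped to two things" idea concrete, here is a minimal sketch of an overlap check over a memory map. The `Region` type, the `find_overlaps` helper, and every entry in `memory_map` are hypothetical and chosen only for illustration; they are not values read from this motherboard or reported by MemTest86.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int  # first byte address of the region
    size: int   # length in bytes

    @property
    def end(self) -> int:
        return self.start + self.size  # exclusive end address


def find_overlaps(regions):
    """Return (name, name) pairs for regions whose address ranges intersect."""
    ordered = sorted(regions, key=lambda r: r.start)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later region can overlap a
            overlaps.append((a.name, b.name))
    return overlaps


GIB = 1 << 30

# Hypothetical map: RAM reported as free, plus a device buffer that the
# firmware (in this made-up scenario) failed to carve out of the RAM range.
memory_map = [
    Region("usable RAM (low)", 0, 3 * GIB),
    Region("usable RAM (high)", 4 * GIB, 61 * GIB),
    Region("device buffer (hypothetical)", 49 * GIB, 1 << 20),
]

print(find_overlaps(memory_map))
```

If a firmware map really did contain an overlap like this, a memory tester would see its test patterns overwritten in that range while the RAM itself was fine.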


          • #6
            Good point on the high memory ranges. Didn’t realize that.

            I did update the BIOS to the latest version last night and still had the same errors. One thing I’ve also noticed is that the errors only occur on the first pass, and only if I’ve actually booted up Windows and used the system between tests. For example, if I run Memtest I will get errors on the first pass, but the following three passes will consistently have zero errors. If I then exit Memtest and immediately go back into Memtest rather than into Windows, all tests including the first pass will complete without errors. If I boot up Windows and use the computer between Memtest runs, I will once again get errors on pass one, and then the following passes run without errors again. It makes me wonder whether booting up the OS is doing something to that range of memory that conflicts with Memtest, and then whatever Memtest does clears that range out until Windows remaps it. I don’t understand enough about this to know whether that’s a possibility.

            The RAM I’m using is not on the official compatibility list that Gigabyte has put out for this mobo, but it’s very similar to others that are. I suppose I could try one of their officially approved kits, but I’m concerned I might have this issue no matter what. It seems like if it were a CPU issue I’d see something different, so I’m inclined to think it’s the BIOS or mobo, but I'm not sure I can do anything about it.


            • #7
              There is an option in Memtest86 to select the CPU core to use. So maybe just as an experiment try forcing to just CPU6, then maybe CPU1.

              Otherwise we don't really know. I guess in theory it is possible that booting Windows activates some piece of hardware that does memory mapped I/O, that the I/O to memory continues for some period of time after being activated, AND that the memory mapped I/O location was not correctly configured in BIOS. So BIOS reports the address range as free, even though it is in fact used later on by hardware for I/O. A test for this would be to boot Windows, run MemTest86 but not start the testing immediately, wait 30 minutes sitting in the menu, then start the tests.


              • #8
                Thanks David,

                Tried a couple additional things. I stayed with the quad channel setup and limited the test to CPU 0 and the errors remained. All of the errors that had been on CPU 6 changed to CPU 0. I then ran on only CPU 6 and had the same errors as always.

                I then went to dual channel at 64 GB RAM by putting two sticks in each channel instead of one and I also got errors so it does look like the errors generate once I get to 64 GB of RAM no matter whether it’s dual or quad channel.

                I also tried sitting at the Memtest window for an hour before starting and the errors remained so maybe not the issue there.

                At this point it seems like the RAM and CPU are fine but my motherboard just doesn’t like having 64GB of RAM. What are my options at this point? Would a different brand or version of RAM, maybe one on the official approved list, fix this? How big of a deal is this if I operate the computer this way? Do I need to find a new mobo?


                • #9
                  An update.

                  We had another customer contact us with pretty much the same issue.

                  Hardware was almost the same as the report above:

                  Motherboard: Gigabyte X299X Aorus Master (rev 1.0);
                  BIOS: F3c
                  CPU: Intel Core i9-10900X
                  RAM: G.Skill F4-3600C17Q-64GTZKW, Trident Z, DDR4-3600MHz CL17-19-19-39 1.35V, 64GB (4x16GB)

                  Behaviour was more or less the same.
                  • Errors at very high memory addresses (around the 53GB level, memory address in hexadecimal 0xC49813AD8)
                  • Errors only after booting into Windows and using Windows for a while. There were no errors on a cold boot (mains power disconnected).
                  • Errors had multiple bits in error and were not your typical single bit flip
                  • Replacing the RAM and the motherboard didn't fix the problem
                  • Using 4 x 16GB RAM sticks and 2 x 32GB RAM sticks gave the same result.

                  Example errors.

                  [Data Error] Test: 5, CPU: 6, Address: C491A189C, Expected: D0C2064C, Actual: D0C20636
                  [Data Error] Test: 5, CPU: 6, Address: C491A1890, Expected: D0C2064C, Actual: D0C2064B
                  [Data Error] Test: 7, CPU: 6, Address: C49813AD8, Expected: DFFFFFFF, Actual: DFFFFFD8
                  [Data Error] Test: 7, CPU: 6, Address: C49813AD8, Expected: 40000000, Actual: 3FFFFFEE

                  Conclusion:
                  There is a firmware fault in these Gigabyte X299X motherboards. Probably along the lines of my previous post (the memory map in UEFI BIOS is slightly wrong when 64GB of RAM is installed).

                  Added note: It might appear that CPU 6 is to blame, but we think that is just coincidence.
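
As a rough illustration of the "multiple bits in error" point, the sketch below parses the four example error lines quoted in this post and reports, for each one, the address in decimal GB and how many bits differ between the expected and actual values. The regular expression and the `parse_error` helper are assumptions written for this example; they are not part of MemTest86.

```python
import re

# Matches the "[Data Error]" lines quoted above; the comma after the
# address is optional because some of the quoted lines omit it.
LINE_RE = re.compile(
    r"Test:\s*(\d+),\s*CPU:\s*(\d+),\s*Address:\s*([0-9A-Fa-f]+),?\s*"
    r"Expected:\s*([0-9A-Fa-f]+),\s*Actual:\s*([0-9A-Fa-f]+)"
)


def parse_error(line):
    """Parse one error line into a dict, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    address, expected, actual = (int(m.group(i), 16) for i in (3, 4, 5))
    return {
        "test": int(m.group(1)),
        "cpu": int(m.group(2)),
        "address": address,
        "address_gb": address / 1e9,  # decimal gigabytes
        "bits_flipped": bin(expected ^ actual).count("1"),
    }


example_lines = [
    "[Data Error] Test: 5, CPU: 6, Address: C491A189C, Expected: D0C2064C, Actual: D0C20636",
    "[Data Error] Test: 5, CPU: 6, Address: C491A1890, Expected: D0C2064C, Actual: D0C2064B",
    "[Data Error] Test: 7, CPU: 6, Address: C49813AD8 Expected: DFFFFFFF, Actual: DFFFFFD8",
    "[Data Error] Test: 7, CPU: 6, Address: C49813AD8 Expected: 40000000, Actual: 3FFFFFEE",
]

errors = [parse_error(line) for line in example_lines]
for err in errors:
    print(f"0x{err['address']:X}  {err['address_gb']:.2f} GB  "
          f"{err['bits_flipped']} bit(s) differ")
```

Every one of these lines differs in more than one bit, which is the pattern described in the list above rather than a single flipped bit.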


                  • #10
                    Hello there,

                    I have recently purchased a Gigabyte X299X Designare 10G MB and am having the same RAM errors as you describe. Did you ever get an answer to the issue in the end?


                    • #11
                      I believe Gigabyte were contacted, but they don't seem to care.


                      • #12
                        Very similar problems and system here too, anything else from Gigabyte?

                        Motherboard: Gigabyte X299X Aorus Master (rev 1.0); BIOS: F3c
                        CPU: Intel Core i9-10940X
                        CPU Cooler: NZXT
                        RAM: Corsair 256GB
                        PSU: Corsair AX1200i
                        SSD: NVMe M.2 970 EVO Plus
                        GPU: Gigabyte Nvidia GTX 1660 super

                        I pass MemTest86 but get a hardware fail error on the Windows Memory Diagnostic, a pattern similar to what is mentioned elsewhere. My crashes mostly occur when CPU utilization is >80%. For example, it will not pass any CPU stress test and will crash within 5 mins despite being within temp limits. Crashes have also occurred in mundane tasks.

                        Not sure what else to do if Gigabyte can't fix this. RMA and go with a different motherboard?
                        @benr_ did you hear anything else back from Gigabyte?


                        • #13
                          Hello everyone! Just joined the club LOL

                          Motherboard: Gigabyte X299X Aorus Master (rev 1.0); BIOS: F3c
                          CPU: Intel Core i9-10920X
                          GPU: RTX 3090
                          RAM: 64GB 4 channel (16x4) from Gigabyte's qualified list.
                          OS: Windows 10

                          After almost 6 months of flawless gaming/coding with my rig (with a different GPU), I started to get BSODs on each cold boot when launching any web browser.
                          If I didn't launch a browser I could do anything else just fine. Memory dumps/logs were all normal. Solved by disabling Fast Boot completely.

                          Then I decided to test the RAM.

                          Windows memory check always shows an error. Panic!
                          MemTest86 free version shows errors, but not always and only on the first pass, and the messages are almost the same as the topic starter's (52GB+ addresses and mostly on CPU 6).

                          Sadly, MemTest86 hangs at the end of the run. I don't know whether I should buy a license in that case to change the cache/CPU options.


                          • #14
                            UPDATE:
                            Current Pro version runs to the finish and doesn't hang. With the recent Gigabyte BIOS there is no blue screen, but from time to time Windows now boots into a black screen, with monitor.sys as the last loaded component in the logs.
                            Windows memory diagnostics still detects bad pages, while MemTest86 only shows errors on the first run.


                            • #15
                              Originally posted by cybertiger
                              UPDATE:
                              Current Pro version runs to the finish and doesn't hang. With the recent Gigabyte BIOS there is no blue screen, but from time to time Windows now boots into a black screen, with monitor.sys as the last loaded component in the logs.
                              Windows memory diagnostics still detects bad pages, while MemTest86 only shows errors on the first run.

                              We had a look into the log file from your e-mail. It looks like a similar issue:

                              Code:
                              2021-09-05 12:50:16 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: C49608B80, Expected: FFFFFFFF, Actual: FFFFFFEC
                              2021-09-05 12:50:16 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: C49608B08, Expected: FFFFFFFF, Actual: FFFFFFE8
                              2021-09-05 12:50:17 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: C49608A84, Expected: FFFFFFFF, Actual: FFFFFFEF
                              2021-09-05 12:50:17 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: C4960891C, Expected: FFFFFFFF, Actual: FFFFFFF0
                              .
                              .
                              2021-09-05 12:51:37 - RunMemoryRangeTest - CPU #6 completed but did not signal (test time = 982ms, event wait time = 1001ms, result = 80001000) (BSP test time = 998ms)
                              2021-09-05 12:51:37 - WARNING - possible multiprocessing bug in BIOS
                              2021-09-05 12:51:37 - RunMemoryRangeTest - CPU #8 completed but did not signal (test time = 980ms, event wait time = 1022ms, result = Success) (BSP test time = 998ms)
                              2021-09-05 12:51:37 - WARNING - possible multiprocessing bug in BIOS

                              The errors are mostly around the 0xC49608xxx region.
                              The fact that the values look like overwritten values rather than errors in specific bits seems to imply that the region of memory is being used by another process/device.

                              The subsequent multiprocessor warnings may be a symptom of these errors. Possibly a bug in the UEFI BIOS multiprocessor implementation.

                              I would maybe try running in SINGLE CPU mode to see if the errors still occur.

                              In any case, there is a strong indication of a bug in the UEFI BIOS.
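
For anyone wanting to check their own log for the same clustering, here is a minimal sketch that groups error addresses by 4 KiB page and reports how far apart they are. It is run against the four `[MEM ERROR - Data]` lines quoted above; the regular expression and variable names are assumptions made for this example, not a MemTest86 feature.

```python
import re
from collections import Counter

# The four data-error lines quoted from the log above.
LOG = """\
2021-09-05 12:50:16 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: C49608B80, Expected: FFFFFFFF, Actual: FFFFFFEC
2021-09-05 12:50:16 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: C49608B08, Expected: FFFFFFFF, Actual: FFFFFFE8
2021-09-05 12:50:17 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: C49608A84, Expected: FFFFFFFF, Actual: FFFFFFEF
2021-09-05 12:50:17 - [MEM ERROR - Data] Test: 3, CPU: 6, Address: C4960891C, Expected: FFFFFFFF, Actual: FFFFFFF0
"""

ADDR_RE = re.compile(r"\[MEM ERROR - Data\].*?Address:\s*([0-9A-Fa-f]+)")

# Collect every error address as an integer.
addresses = [int(m.group(1), 16) for m in ADDR_RE.finditer(LOG)]

# Dropping the low 12 bits gives the 4 KiB page each address falls in.
pages = Counter(addr >> 12 for addr in addresses)

# Distance in bytes between the lowest and highest failing address.
span = max(addresses) - min(addresses)

for page, count in sorted(pages.items()):
    print(f"page 0x{page:X}xxx: {count} error(s)")
print(f"span between lowest and highest address: {span} bytes")
```

All four quoted addresses fall in the same 0xC49608xxx page, within a few hundred bytes of each other.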
