List of Motherboards with issues when running MemTest86 in multi-CPU selection modes


  • Originally posted by lightknightrr View Post
    I'll post the log after MS's memory scan runs its course. Finally got the thing to run.
    Yes, please post the debug log file.



    • Hello,
      MemTest86 kept getting stuck at "Testing multiprocessor support". My system consists of:

      WS-C621E-SAGE
      2 x Intel Xeon Gold 6130

      I updated BIOS to version 6801 (most recent one at the time of writing) and the problem didn't go away.
      After I added "WS-C621E-SAGE Series" to the blacklist to disable multithreading, the test worked fine.


      • Originally posted by csehydrogen View Post
        Hello,
        MemTest86 kept getting stuck at "Testing multiprocessor support". My system consists of:

        WS-C621E-SAGE
        2 x Intel Xeon Gold 6130

        I updated BIOS to version 6801 (most recent one at the time of writing) and the problem didn't go away.
        After I added "WS-C621E-SAGE Series" to the blacklist to disable multithreading, the test worked fine.
        Thanks for the logs.

        Can you give this build a try:
        https://www.passmark.com/temp/memtes...-10.1.0010.zip


        • Motherboard: Super Micro H12DSi-NT6
          CPU: Dual AMD EPYC 7642 48 core
          BIOS Firmware Version: 2.6
          BIOS Build Time: 4/13/2023

          Having issues with a Super Micro board that's not in the blacklist. See attached screenshot.

          [Attached screenshot: memtest.png]


          • Uploaded the log from the (manually aborted) run here: https://file.io/CMDuiCIMcCbv


            • Originally posted by theregoesplanb View Post
              Uploaded the log from the (manually aborted) run here: https://file.io/CMDuiCIMcCbv
              Thanks for letting us know. The link is no longer available but we've updated the blacklist.cfg file with the following:

              Code:
              "H12DSi-NT6",ALL,EXACT,RESTRICT_MP


              • Here's a new link to the log that won't disappear in a few days: https://www.dropbox.com/scl/fi/egzmn...paecegy4f9igat

                One concerning thing is that I did get ECC errors in MP mode, but I haven't been able to replicate them when running in uniprocessor mode. Is there any chance that those were false positives due to the MP incompatibility?

                You'll probably want to add H12DSi-N6 to the blacklist as well. It's basically the same board with a different onboard NIC.
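
For readers unfamiliar with the blacklist format, the entry quoted earlier in this thread (`"H12DSi-NT6",ALL,EXACT,RESTRICT_MP`) appears to be four comma-separated fields. The sketch below splits such a line and builds a matching line for the H12DSi-N6 by analogy. The field names (`board`, `bios`, `match_mode`, `flags`) are labels invented here for illustration, not PassMark's terminology, and whether the N6 needs identical settings is an assumption.

```python
import csv
from typing import NamedTuple


class BlacklistEntry(NamedTuple):
    # All four field names are hypothetical labels for illustration only.
    board: str       # quoted board name, e.g. H12DSi-NT6
    bios: str        # second field; ALL in the quoted entry
    match_mode: str  # third field; EXACT in the quoted entry
    flags: str       # fourth field; RESTRICT_MP in the quoted entry


def parse_blacklist_line(line: str) -> BlacklistEntry:
    """Split one blacklist.cfg-style line into its four fields."""
    # csv.reader copes with the double-quoted first field.
    fields = next(csv.reader([line.strip()]))
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return BlacklistEntry(*(field.strip() for field in fields))


def format_entry(entry: BlacklistEntry) -> str:
    """Write an entry back in the same shape as the quoted line."""
    return f'"{entry.board}",{entry.bios},{entry.match_mode},{entry.flags}'


nt6 = parse_blacklist_line('"H12DSi-NT6",ALL,EXACT,RESTRICT_MP')
# Assumption: the N6 variant would reuse the NT6 settings unchanged.
n6 = nt6._replace(board="H12DSi-N6")
print(format_entry(n6))
```

The resulting line is only a suggestion to try locally; the maintainers decide what actually goes into the shipped blacklist.cfg.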


                • Recently I ran MemTest86 V10.2 Pro on a system with a Xeon E5-2670, an X9SRL-F (listed in the first post) and 80 GB of RAM. After running for about 50 hours, it was only halfway through pass 3. Initially I thought something was wrong; I've never had MemTest86 take this long to get through the passes, no matter the amount of RAM.

                  Since the error count was 0, despite this note:
                  Note: Your RAM may be vulnerable to high frequency row hammer bit flips. However the conditions needed to induce these errors occur only very rarely in normal PC usage, and so this should not be of concern to most users.
                  I started thinking the long runtime might be because I was testing ECC RAM (instead of non-ECC), but I could not find any relation between ECC and longer runtimes.

                  After checking the config, I found the testing was done on 1 CPU core, which MemTest86 set because this motherboard's BIOS apparently has this bug. After setting parallel instead of single, MemTest86 ran barely a minute or two before the system reset (which I can reproduce). I guess that's the bug?

                  Since MemTest86 was running on only 1 core, could that explain the long runtime?


                  • ECC RAM is generally a bit slower than non-ECC RAM. Maybe 10% to 20%.
                    But running on 1 CPU Core can be a lot slower. Maybe 4x slower than multi-core.

                    Supermicro don't care much about their BIOS bugs. More interested in selling you a new motherboard than doing customer support.
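
As a rough back-of-the-envelope check of the figures above, the sketch below turns "about 50 hours, halfway through pass 3" into hours per pass and applies the "maybe 4x" single-core slowdown. Treating "halfway through pass 3" as 2.5 completed passes and using 4.0 as the slowdown factor are both assumptions taken from the posts, not measurements.

```python
def estimate_hours_per_pass(elapsed_hours: float, passes_completed: float) -> float:
    """Average wall-clock hours per pass so far."""
    if passes_completed <= 0:
        raise ValueError("passes_completed must be positive")
    return elapsed_hours / passes_completed


# Assumption: "maybe 4x slower" from the reply above, used as a fixed factor.
SINGLE_CORE_SLOWDOWN = 4.0

# Assumption: halfway through pass 3 is counted as 2.5 completed passes.
single_core = estimate_hours_per_pass(elapsed_hours=50.0, passes_completed=2.5)
multi_core = single_core / SINGLE_CORE_SLOWDOWN

print(f"single core: ~{single_core:.0f} h per pass")
print(f"multi-core (if it ran): ~{multi_core:.0f} h per pass")
```

Under these assumptions the single-core run averages about 20 hours per pass, against roughly 5 hours per pass if multi-core mode were usable on this board, so the single-core restriction alone can plausibly account for the long runtime.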


                    • Motherboard: Super Micro H11DSi-NT
                      CPU: Dual AMD EPYC 7F52
                      BIOS Firmware Version: 2.5


                      No freezing or reboot, the software keeps showing the "[UEFI firmware Error] Could not start CPU X" message.

                      Super Micro just released a new BIOS version (2.7) recently, so the issue may be fixed?


                      • Hello,

                        Sorry, admin, for the previous post; I thought I could edit it after posting.

                        My motherboard BIOS FW is now updated to its newest release version 2.7, but the problem is still the same.

                        My test environment can be summed up as follows:

                        Motherboard: Super Micro H11DSi-NT
                        CPU: Dual AMD EPYC 7F52
                        BIOS Firmware Version: 2.7
                        RAM: 16 x Hynix HMA82GR7CJR8N-XN DDR4 16 GB 3200 MHz RDIMM
                        Memtest86 Version: V10.6 Free
                        SMT is disabled in the BIOS
                        A retired 2U Super Micro Super Chassis with a 700 W hot-swappable power supply

                        I manually terminated the test after it finished pass #2, and the log file is attached.

                        As you may see in the attached log file, my current PC build can run the test without any error or warning messages (including the annoying "[UEFI firmware Error]" one) in the 1st pass. However, the "[UEFI firmware Error] Could not start CPU X" message shows up after the 1st pass is finished. Before the test was aborted, the system showed no memory errors, but I got loads of "[UEFI firmware Error] Could not start CPU X" messages during the test.

                        In fact, when I ran the test with BIOS version 2.5, the system could do a 4-pass test with 0 memory errors, and again, I just got many "[UEFI firmware Error] Could not start CPU X" messages at the end of the test, as you may see in my previous post.

                        The IPMI's Health Event Log shows no failure events (I cleared it before the posted test was conducted, because I got loads of fan failure messages when I pulled out the high-speed 8 cm fans at night).

                        I also ran the AIDA64 test (CPUs, cache & memory) for 3 hours to test the stability of my build. After the test was aborted, I didn't have any reported errors from Windows or the IPMI's Health Event Log.

                        Am I likely experiencing the same BIOS bug that the other users mentioned in this thread?

                        Sorry for my broken English, please let me know if there is any other info that I may be able to provide.


                        • Originally posted by mt04340434 View Post
                          Hello,
                          Motherboard: Super Micro H11DSi-NT
                          CPU: Dual AMD EPYC 7F52
                          BIOS Firmware Version: 2.7
                          RAM: 16 x Hynix HMA82GR7CJR8N-XN DDR4 16 GB 3200 MHz RDIMM
                          Memtest86 Version: V10.6 Free
                          SMT is disabled in the BIOS
                          A retired 2U Super Micro Super Chassis with a 700 W hot-swappable power supply
                          Thanks for the logs.

                          Can you first try disabling ECC polling from the main menu and run the tests again?

                          Also, we had a report about UEFI firmware issues on a similar chipset but different motherboard on v10.6. In this case, it seems the errors don't appear when running an earlier version of MemTest86 (v8.4).

                          Can you run MemTest86 v8.4 as well and upload a copy of the logs:
                          https://www.memtest86.com/downloads/...86-8.4-usb.zip


                          • Originally posted by keith View Post

                            Thanks for the logs.

                            Can you first try disabling ECC polling from the main menu and run the tests again?

                            Also, we had a report about UEFI firmware issues on a similar chipset but different motherboard on v10.6. In this case, it seems the errors don't appear when running an earlier version of MemTest86 (v8.4).

                            Can you run MemTest86 v8.4 as well and upload a copy of the logs:
                            https://www.memtest86.com/downloads/...86-8.4-usb.zip
                            Here you go!

                            I ran the tests with ECC polling disabled on both v10.6 and v8.4 of MemTest86 (1 pass only), and the v10.6 one was manually terminated.

                            I also ran MemTest86 v8.4 for 4 passes with default settings, but the log is too large (2.5 MB) to upload to the forum.

                            With ECC polling disabled on MemTest86 v10.6, the error message now appears in the 1st pass, but my PC is not crashing, freezing or rebooting at all.

                            The two runs with v8.4 showed no error messages at all.

                            I hope these help.


                            • Originally posted by keith View Post

                              Thanks for the logs.

                              Can you first try disabling ECC polling from the main menu and run the tests again?

                              Also, we had a report about UEFI firmware issues on a similar chipset but different motherboard on v10.6. In this case, it seems the errors don't appear when running an earlier version of MemTest86 (v8.4).

                              Can you run MemTest86 v8.4 as well and upload a copy of the logs:
                              https://www.memtest86.com/downloads/...86-8.4-usb.zip
                              The compressed log for the MemTest86 v8.4 test with default settings is now also uploaded for your review.


                              • Thanks for the logs.

                                The logs confirm it is likely a UEFI BIOS issue (and not related to ECC polling or an introduced bug in later versions of MemTest86).
                                Even though the errors don't show up on screen in v8.4, they are present in the log file. The [UEFI Firmware Error] messages on the screen were introduced in a later version of MemTest86.

                                Although this is a software bug that should be fixed in the BIOS by the vendor, it likely isn't the critical (hardware) error that MemTest86 makes it out to be. We'll need to revisit how to report such errors in a way that better represents their severity.
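
To check a log for these failures when an older version does not print them on screen, something like the sketch below could tally them per CPU. It assumes the log contains the same "Could not start CPU N" wording as the on-screen message; the real log format may differ, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the log uses the on-screen wording "Could not start CPU <number>".
CPU_START_FAILURE = re.compile(r"Could not start CPU\s+(\d+)", re.IGNORECASE)


def count_cpu_start_failures(log_text: str) -> Counter:
    """Return how many start failures were logged for each CPU number."""
    return Counter(int(m.group(1)) for m in CPU_START_FAILURE.finditer(log_text))


# Hypothetical sample lines, not copied from a real MemTest86 log.
sample_log = """\
Could not start CPU 3
Pass 1 complete
Could not start CPU 7
Could not start CPU 3
"""

failures = count_cpu_start_failures(sample_log)
for cpu, count in sorted(failures.items()):
    print(f"CPU {cpu}: {count} start failure(s)")
```

For a real log, read the file contents into a string first and pass that to `count_cpu_start_failures`.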
