Help diagnose memtest86 pro issue on Threadripper 2950x Asus x399-a Prime 2


  • Help diagnose memtest86 pro issue on Threadripper 2950x Asus x399-a Prime 2

    I have been trying to figure out why I am getting MemTest86 Pro errors on this system before actually putting the system into use.

    What I originally thought was a bad memory module turns out not to be the case. It seems I am only getting errors on test1, and they only occur within the first 5 minutes of testing. If the machine gets past the first 5 minutes, I've had it run for 90 hours without a single error.

    Why would the system be prone to errors whenever it starts running test1, but never produce errors once it has been running for a while? Is it a PSU issue? A voltage issue? Any ideas what exactly I should be looking for?

    Also, in your opinion, is it safe to use this system as a production server even though it sometimes spits out an error in test1 after a fresh boot, but after that seems to be fine for however long I have had time to test it?

    EDIT: I have MemTest86 reports, but the site won't allow me to upload anything, so I had to upload the reports here: https://easyupload.io/m/nen0gr
    As you can see, in one report it randomly fails at test1. Then I do a 20-pass run of test1 right afterwards and it passes all of them, with no changes made.
    I know the configuration is strange in that I have mixed memory modules, but I have already tested with 4x (same modules) and it still randomly fails on test1. I cannot see any pattern between which memory I have installed and it failing test1. It seems to occur completely at random, no matter what.

    EDIT: I had to make a completely new post just to edit my original post. What's the point of having a forum and making users register via email and even confirm their email if you can't even post on the forum afterwards? You can't upload, you can't edit, you can't do anything. I get that bots are a problem, but jesus christ guys... If you could just merge this new information with my old thread, thanks.

  • #2
    The majority of forum posts are spam nowadays. So the 1st post from anyone needs to be human approved. The human being me in this case.
    We stopped allowing posts to be edited after a certain amount of time has passed, as spammers would make one valid-looking post, then go back and edit it days later to insert spam links.
    Yes, it is a total pain in the butt, but that is the way the world is.

    It does at least superficially look like a RAM error.
    Running 6 RAM sticks isn't that common. Are they in the optimal slots according to the motherboard manual (if the manual says anything useful)?

    Maybe temperature related? Errors when cold sometimes, but never when hot?


    • #3
      Thanks for the quick reply!

      I just did 80 passes of test1 (which seems to be the only test it fails) and it went through smoothly. I cannot narrow down why it fails sometimes. The only thing I can tell is that if it fails, it always fails early on, like within the first 3 minutes of testing. After that it never fails again.

      It only ever fails on test1. It fails on test1 no matter what RAM configuration I try: 2 sticks, 4 sticks, 6 sticks, 8 sticks, mixed or non-mixed, it does not matter.

      None of the memory modules give errors in another system, which is why I conclude it cannot be a memory module issue. Do you have any tips I could try to narrow things down with this information?


      • #4
        I forgot to answer your questions:

        Yes, I have populated the modules as instructed in the manual. It does not seem to make any difference whether I run 2 modules, 4 modules, or this weird 6-module configuration, which is the last thing I tested to try to generate "more errors". Even with this very strange setup the system behaves exactly the same.

        As for temps, it doesn't seem likely, as the test usually only fails early on, when the modules can't have gotten very hot yet. The test never seems to generate any errors after it has been running for a while, which is very strange to me. It seems that if there are errors, they always show up within the first 3 minutes of testing. I just ran 80 passes of test1 without a hitch, then I restarted the test and voila, within 2 minutes of testing I got an error in test1. I am very confused, and the only pattern I can see is that errors come whenever the test is freshly started.


        • #5
          I am running tests again across the different modules. I guess I shouldn't completely rely on the information from the store I bought the modules from being accurate, because they probably didn't run that many tests, and from what I've seen these modules can easily pass multiple hours of testing and then suddenly spit out errors... I'll be back if I get more information. My god, this is frustrating.


          • #6
            Yes, it is a strange case. Errors in Test #1 are rare. (Errors in Test #6 and #7 are the most common.)
            We don't know the answer.

            There is some chance it is a BIOS bug (BIOS writing to a memory location it shouldn't early in the boot process).

            There is also some chance it is a CPU fault.
            In MemTest86, try limiting the CPU cores in use to just CPU1, then CPU2, etc.
            Just do a few test restarts per CPU core under test.

            I'll check the code, but I think Test #1 cycles the CPU core in use. So pass 2 might be using Core 2, pass 3 Core 3, etc. [Correction] It doesn't. In V9, Test #1 runs on a single CPU core.
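
            If it helps to keep track of the per-core restarts suggested above, here is a minimal sketch for tallying them. It assumes you write down by hand, for each restart, which core was selected and whether an error appeared; the function names and the sample data are made up for illustration and are not part of MemTest86.

```python
from collections import defaultdict


def tally_by_core(runs):
    """Count restarts and errors per core.

    runs: iterable of (core_id, had_error) pairs, one per test restart,
    recorded by hand (hypothetical format, not a MemTest86 export).
    """
    stats = defaultdict(lambda: {"runs": 0, "errors": 0})
    for core, had_error in runs:
        stats[core]["runs"] += 1
        if had_error:
            stats[core]["errors"] += 1
    return dict(stats)


def suspect_cores(stats):
    """Return the sorted list of cores that showed at least one error."""
    return sorted(core for core, s in stats.items() if s["errors"] > 0)


if __name__ == "__main__":
    # Made-up example: three restarts each on cores 1, 2 and 3.
    example_runs = [
        (1, False), (1, False), (1, False),
        (2, True), (2, False), (2, False),
        (3, False), (3, False), (3, False),
    ]
    print(suspect_cores(tally_by_core(example_runs)))  # prints [2]
```

            If errors only ever show up on one core, that points toward the CPU. If they are spread across cores, the core in use is probably not the cause. With an intermittent fault, a few restarts per core may not be enough to rule a core out.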


            • #7
              We had another thought. Maybe the problem isn't triggered by just Test #1, but is instead triggered by the first full test. (Test #0 isn't a full test of the RAM.)
              So if you were to run just Test #6 after a boot, maybe you would also see the same problem in Test #6. This doesn't help solve the problem, but it would at least make more sense.
