MemTest86 hangs at 3 second mark for SuperMicro X9DAi with 128GB Crucial ECC RAM

  • MemTest86 hangs at 3 second mark for SuperMicro X9DAi with 128GB Crucial ECC RAM

    I just downloaded the CD iso for MemTest86 4.2.0 from your site.

    I'm running a SuperMicro X9DAi with Dual Xeon E5-2670's and 128GB of Crucial ECC CT2K16G3ERSLD41339 RAM.

    The Video card is an nVidia Titan

    It displays: Time 0:00:03 and then hangs, after I told it to run one pass in the previous screen.

    I don't see any effect from pressing ESC or Return.

    I just tried boot option #6 (one CPU), and it appears to be running OK, so ... something to do with my dual CPUs?

    BTW - I just tried MemTest86+ and it has not hung so far.
    Last edited by RamTestWannabe; Apr-10-2013, 11:52 PM.

  • #2
    I would like to reinforce the OP above in that I have a similar, if not identical, problem. We have two brand new systems here with the following specifications.

    #1 SuperMicro X9DRi-F with Dual Xeon E5-2640's and 64GB of Registered ECC DDR3-1600 RAM in Quad Channel config.

    #2 SuperMicro X9DRi-F with Dual Xeon E5-2620's and 32GB of Unbuffered ECC DDR3-1600 RAM in Quad Channel config.

    Both motherboards already have the latest BIOS (r2.0) and have no issues at all when tested under Memtest86+ v4.2. For Memtest86 v4.2 I went for the default boot option and it hangs after a few seconds, between 10 and 12 seconds in, with no keyboard response. It appears to hang after initialising all 12 processor threads, i.e. the 12 active CPU cores start off in the 'Waiting' state, then one by one proceed to test, but once all 12 threads are past that state the screen freezes. On one attempt some errors were shown immediately before it hung; please see screenshots below/attached.

    Both systems will run OK if we select boot option (6) for the one CPU core only mode.

    It would be great if this issue could be looked into.

    Cheers!
    Edmond



    • #3
      Is there any chance you could retest with only a single CPU active in the machine?

      Our guess at this point is that it is related to the multiple CPUs in the machine.

      • #4
        Originally posted by David (PassMark):
        Is there any chance you could retest with only a single CPU active in the machine? Our guess at this point is that it is related to the multiple CPUs in the machine.
        I'm sorry, but I can't. I had to have the integrator change out a fan (and remove the heat sink), which ended up causing another problem, and I lost a week without the box.

        • #5
          Hi David,

          I have just tested a single CPU scenario with 8x8GB ECC RDIMM and also 1x2GB ECC UDIMM but no luck unfortunately (surprisingly). Tested with older r1.1 BIOS and latest r2.0 BIOS.

          Please see screenshots below.

          Cheers
          Edmond



          • #6
            Originally posted by Edmond:
            I would like to reinforce the OP above in that I have a similar, if not identical, problem... Both systems will run OK if we select boot option (6) for the one CPU core only mode.
            I would also like to add that I am experiencing the same issue running Memtest86 v4.2 and v4.1. It is an IBM Server x3550, dual CPU Quad-core Xeon with 8 x 1GB Quad-Channel Fully-Buffered ECC Registered RAM. It only takes a matter of seconds after running Memtest86 for it to freeze/lock up.

            Single-threaded mode seems to run fine.

            No memory errors are reported in either mode.

            • #7
              Clearly there is something wrong. We aren't sure if it is related to the Xeons or the ECC RAM.

              Ryoken, I note you have listed your location as Australia. Whereabouts exactly? Maybe we could drop in and do some debugging, as we don't have a machine here that has this problem.

              • #8
                Hi David, thanks for looking into this. I've sent you a PM. It's good to know you are offering such great support even for free open source software! If this response is anything to go by, I think Memtest86 is in good hands!

                • #9
                  Ryoken,

                  I sent you a direct E-mail yesterday regarding the spec of the machine. Not sure if you got it?

                  • #10
                    Hi David, sorry I didn't get the chance to check my email lately. I have received it and have just replied to your email.

                    • #11
                      Some testing was done on an older IBM X3550 server.
                      2 x Intel Xeon Dual Core 5050 3.0GHz CPU
                      8GB PC2-5300 ECC RAM
                      No problem was seen on this machine.

                      Also did some testing on an older HP WX8200 workstation.
                      Also Dual Xeon
                      4GB PC2-3200 ECC RAM
                      No problem was seen on this machine either.

                      So the problem is more likely related to either the newer Xeons or the newer DDR3 ECC RAM. Unfortunately these machines are more expensive and thus harder to get hold of.

                      • #12
                        This is exactly the kind of error the boot trace feature was designed for. Please run with boot trace enabled and provide a screenshot. The boot traces will provide a lot of good information and will narrow down the problem. The test is hanging at the first test that uses more than one CPU.

                        • #13
                          We ended up buying a new Xeon E3 server with ECC RAM.
                          The problem was more or less reproducible but also somewhat random. We didn't find the root cause (with the time and resources available) but implemented a work-around in V4.3 that should allow testing on these machines.

                          See the post on the MemTest86 V4.3 release for more details.

                          • #14
                            Originally posted by David (PassMark):
                            We ended up buying a new Xeon E3 server with ECC RAM.
                            The problem was more or less reproducible but also somewhat random. We didn't find the root cause (with the time and resources available) but implemented a work-around in V4.3 that should allow testing on these machines.

                            See the post on the MemTest86 V4.3 release for more details.
                            Hi there, I just built a new system and am experiencing the same reported issue using MemTest86 V4.3. Specifically, the machine is a Supermicro X10SLL-F with a Xeon E3-1230V3 3.3GHz Haswell and 16GB of Kingston KVR16E11/8 unbuffered ECC (two DIMMs).

                            MemTest86 offers the option of mode selection, and then "locks" at a blue screen with a blinking cursor. The machine has displayed no other issues with various OS boots and live-CD usage. I'll try the boot trace feature as soon as I get a chance.

                            I was also considering enabling full UEFI mode in BIOS and trying memtest 5.x beta, but haven't yet. I'm open to suggestions and happy to test development builds if you want to send them to me.

                            • #15
                              Yes, in V4.3 try the boot trace mode first to try and find the last line of code executed, then post a photo if you can.

                              You should also try V5 as we have started to add ECC support back into MemTest86 in V5.
