consistent error, but only one test (6) and only w/ parallel CPU -- real problem?


  • consistent error, but only one test (6) and only w/ parallel CPU -- real problem?

    Hi! I'm building a new system[1], so i ran memtest (v4.2.0) overnight with the default settings. By morning, it had reported 2 errors, both in test 6 (Block Move) and both in a very small range of the bottom 1MB of memory (0x000000140f0 - 0x000000140f8).



    I then selected only test 6 and only the bottom 1MB (0-1m) and, sure enough, that area failed consistently.



    If i change the CPU selection to anything other than parallel, i cannot reproduce the failures.

    I swapped the two DIMMs and (with parallel cpu) the error persists, but stays at the same address.

    Given all of this, i'm tempted to conclude that the memory is okay, but there is something weird about the following combination:
    1. parallel CPU mode
    2. low addresses (in particular, it may be worth noting that i'm using the onboard video)
    3. Test #6 [Block move]


    Has anyone seen anything similar? Should i worry about the memory or the system? Any more data i should provide? Any more tests i should run? (I don't think i have any other DDR3 sticks, but i can look)

    Thanks!
    --Rob*

    [1] AMD FX 8-core cpu / MSI 760-GM-E51 (FX) mainboard / Patriot DDR3 1333MHz 16GB (2x8GB)

  • #2
    Thanks for the detailed report.

    Is there any other hardware in the machine? I ask this because there was another post claiming errors in low RAM were linked to having an external eSATA drive connected to the machine.

    The addresses are also suspicious. From a bit of Googling, these addresses (0x000140f0 - 0x000140f8) can correspond to the address used by a SATA controller's bus-master DMA (bmdma). In theory MemTest86 should be getting the map of available RAM from the BIOS, but maybe the BIOS is buggy.

    Can you boot into Windows and check the memory map in Device Manager, as in the screenshot below, to see if 0x000140f0 - 0x000140f8 is mapped to anything (or just post a screenshot like the one below)?




    • #3
      np! thanks for the great tool!

      no, there was no other h/w in the machine at the time of the test.

      i don't have windows, so i can't check device manager. "dmesg" doesn't seem to show anything interesting. any interesting linux reporting that might help?

      for comparison, the latest memtest86+ beta (500b6) also appears to run some tests on multiple cpus, but i couldn't find a combination of params that failed. it only does some of the test types in parallel and others appear to be single-cpu.
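
On Linux, the closest equivalent to the Device Manager memory map is probably `/proc/iomem`, which lists physical address ranges as `start-end : name` lines. The following is a minimal sketch, not something posted in the thread: it parses text in that layout and reports which regions overlap the failing range. The sample text and the helper names (`parse_iomem`, `regions_overlapping`, `load_iomem`) are made up for illustration; on a real machine the file usually has to be read as root, since unprivileged reads tend to show zeroed addresses.

```python
import re

# Hypothetical sample in the "start-end : name" layout used by /proc/iomem.
# These values are invented for illustration only.
SAMPLE_IOMEM = """\
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000bffff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-bfffffff : System RAM
"""

LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def parse_iomem(text):
    """Return a list of (start, end, name) tuples, one per region line."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def regions_overlapping(regions, lo, hi):
    """Return every region that overlaps the inclusive range [lo, hi]."""
    return [r for r in regions if r[0] <= hi and r[1] >= lo]


def load_iomem(path="/proc/iomem"):
    """Read the real map; typically needs root to see non-zero addresses."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Swap SAMPLE_IOMEM for load_iomem() to check an actual system.
    for start, end, name in regions_overlapping(
        parse_iomem(SAMPLE_IOMEM), 0x140F0, 0x140F8
    ):
        print(f"{start:08x}-{end:08x} : {name}")
```

If the failing range falls only inside a "System RAM" region, that would suggest the OS does not see any device claiming those addresses, though it does not rule out a BIOS map problem.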



      • #4
        Originally posted by David (PassMark):
        Thanks for the detailed report.

        Is there any other hardware in the machine? I ask this because there was another post claiming errors in low RAM were linked to having an external eSATA drive connected to the machine.

        The addresses are also suspicious. From a bit of Googling, these addresses (0x000140f0 - 0x000140f8) can correspond to the address used by a SATA controller's bus-master DMA (bmdma). In theory MemTest86 should be getting the map of available RAM from the BIOS, but maybe the BIOS is buggy.

        Can you boot into Windows and check the memory map in Device Manager, as in the screenshot below, to see if 0x000140f0 - 0x000140f8 is mapped to anything (or just post a screenshot like the one below)?


        I'd like to add that I'm experiencing a very similar issue to the OP's with MemTest86 4.2 and Test #6. There is no onboard video, as it is a server/workstation-class motherboard. The system is a single-socket hex-core Xeon E5-1650 with 4 x 2GB Reg ECC 1600MHz DDR3 quad-channel RAM.

        I didn't have my camera with me at the time, but did scribble down the results after approx 31 hours of testing:

        Errors: 480 errors (on top-right)

        Error Confidence Value: 136
        Lowest Error Address: 00000018ef0 - 0.0MB
        Highest Error Address: 00000018ef8 - 0.0MB
        Bits in Error Mask: fffffffa
        Bits in Error - Total: 30 Min: 19 Max: 24 Avg: 21
        Max Contiguous Errors: 1

        TEST ERRORS
        0 0
        1 0
        2 0
        3 0
        4 0
        5 0
        6 48
        7 0
        8 0
        9 0
        10 0
        It's interesting how the errors don't tally up. Since this is a very low memory range, I wonder if these errors are possibly false-positives? I don't have any external hardware connected besides a keyboard, mouse, and monitor.
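
One figure in the scribbled results can be cross-checked with simple arithmetic: the "Bits in Error Mask" value and the "Bits in Error - Total" count. The snippet below is only a sanity check on those two reported numbers, assuming the mask is a 32-bit value in which a set bit means that bit position was seen in error at least once. The helper names are illustrative.

```python
def bits_set(mask):
    """Count the 1 bits in an integer mask."""
    return bin(mask).count("1")


def clear_bit_positions(mask, width=32):
    """List the bit positions (0 = least significant) that are 0 in the mask."""
    return [i for i in range(width) if not (mask >> i) & 1]


mask = 0xFFFFFFFA  # "Bits in Error Mask" as reported above

print(bits_set(mask))             # 30, matching "Bits in Error - Total: 30"
print(clear_bit_positions(mask))  # [0, 2]
```

So the mask and the total agree with each other: under that assumption, every bit except bits 0 and 2 was wrong at some point. This says nothing about the 480 versus 48 discrepancy between the two error counters.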

        Here is a screenshot of the memory ranges reported by the Windows Device Manager:


        Thanks for looking into this...



        • #5
          Glad I found this post. We're having a similar issue on an i7 system here. We ran some tests on three different sets of memory, one of them mixed (Crucial and Mushkin DIMMs). Everything goes along fine for the first few passes, but somewhere between passes 6 and 9, and always in the same spot, the RAM inexplicably fails.

          The first two sets of RAM were unknowns, but the last (mixed) set we know is good from prior testing with other equipment.

          There's something fishy going on here. I'm going to fire up the system again in a moment and test using the settings described in this thread to see if the problem comes up.



          • #6
            I'm having a similar issue on a new build[1].

            Does PassMark have any further insight as to whether this might be a "false positive"?

            If I've got time later, I'll provide more detail/some screenshots. (Though mine are pretty similar to those that have been posted already.)

            Thanks.

            [1] AMD FX 8350 8-core CPU, ASUS M5A99X EVO R2.0, Crucial DDR3 1866 16GB (2x8GB), etc.



            • #7
              Joe,

              A screen shot would help, or at least a concise description of what test had a problem at what memory addresses after what period of time.

              We can't say for sure at the moment, but we are thinking it is either a bug in Test #6 or a BIOS memory map issue that affects a number of machines.

              The real problem is none of our test machines have this problem (despite having similar FX 8350 hardware here to test on).

              We have been stepping through the code of test #6, trying to work out what might go wrong.

              If anyone were interested in selling (or loaning) their motherboard & CPU, let me know. Debugging it would be a lot quicker if we had a machine with the problem.



              • #8
                Just to muddy the waters.

                I found a few threads on other forums where people were also having problems with AMD FX CPUs and MemTest on the block move test.
                Like this one,
                http://forum.corsair.com/v3/showthread.php?t=105501
                It turns out it was a CPU fault in the case above (the memory controller is in the CPU).



                • #9
                  I believe I have the same problem in my system consisting of:
                  MSI X79A-GD45 8D with BIOS ver 12.0 and ver 12.2 (tested both)
                  i7-3930k six-core 3.2 GHz
                  4 x 8 Gbyte DDR3 2400 MHz CL10 ( Corsair Dominator platinum 32 Gbyte kit CMD32GX3M4A2400C10 )
                  Quadro 600

                  The system passes all memtest86 4.2.0 tests except test #6, where it fails around 348f0 to 348f8. I have tried two other known good DDR3 sticks as well and they also fail in the same spot. This testing has been from a bootable CD-ROM.

                  The memory has been moved to different memory slots and that did not change the fault condition.
                  I suspect a memory overlap that memtest86 does not recognize, but the memory map under Windows 8 did not show the faulty range as occupied, so it could be the BIOS not doing its job correctly.

                  The good news is that memtest86 4.3.0 Beta started from bootable USB stick seems not to fail. Tonight I have tested 32 GByte without fail for first time on this new system. So far I have run two complete pass plus just passed a third run on test#6. Me very happy!
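
For reference, three failing ranges have been reported in this thread so far. The sketch below just lines them up; it is a comparison of the posted numbers, not a diagnosis. The labels are shorthand for the posts above, and the third range is the approximate one ("around 348f0 to 348f8").

```python
ONE_MIB = 1 << 20

# (lowest, highest) error addresses as quoted in posts #1, #4 and #9.
REPORTED = {
    "post #1 (AMD FX 8-core)": (0x140F0, 0x140F8),
    "post #4 (Xeon E5-1650)": (0x18EF0, 0x18EF8),
    "post #9 (i7-3930K, approximate)": (0x348F0, 0x348F8),
}


def summarize(lo, hi):
    """Return basic facts about one reported error range."""
    return {
        "span_bytes": hi - lo,          # difference between the two addresses
        "offset_kib": lo / 1024,        # distance from address 0
        "below_1mib": hi < ONE_MIB,     # inside the bottom 1MB?
        "low_byte": lo & 0xFF,          # last two hex digits of the start
    }


if __name__ == "__main__":
    for label, (lo, hi) in REPORTED.items():
        s = summarize(lo, hi)
        print(
            f"{label}: {lo:#07x}-{hi:#07x} span={s['span_bytes']} "
            f"offset={s['offset_kib']:.1f} KiB below_1MiB={s['below_1mib']} "
            f"low_byte={s['low_byte']:#04x}"
        )
```

All three ranges sit in the bottom 1MB, differ by 8 between lowest and highest address, and start at an address ending in f0, while the modules, boards and CPU vendors differ.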



                  • #10
                    I was a bit too fast in my previous posting. The test failed after three passes, again in the same address range:

                    It must be caused by something other than the memory modules?...



                    • #11
                      Originally posted by Perbear:
                      I believe I have the same problem in my system consisting of:
                      MSI X79A-GD45 8D with BIOS ver 12.0 and ver 12.2 (tested both)
                      i7-3930k six-core 3.2 GHz
                      4 x 8 Gbyte DDR3 2400 MHz CL10 ( Corsair Dominator platinum 32 Gbyte kit CMD32GX3M4A2400C10 )
                      Quadro 600

                      The system passes all memtest86 4.2.0 tests except test #6, where it fails around 348f0 to 348f8. I have tried two other known good DDR3 sticks as well and they also fail in the same spot. This testing has been from a bootable CD-ROM.
                      I have now done some further testing using various CPU configurations:

                      Error occurs only when the number of CPU cores available to memtest is more than 4 (the number of cores can be configured in the BIOS)
                      Error occurs only when running the test with CPU selection: 1 - Parallel (all)
                      Error occurs only in test #6
                      Error occurs regardless of whether hyper-threading is on or off
                      Error occurs regardless of whether XMP is on or off

                      Edit: I ran the memtest86 4.2.0 varying the number of enabled cores and noted the memory speed reported. This was with XMP enabled. It is interesting that the reported speed maxed at 4 cores and dropped at 5 and 6 cores:
                      1 core: 17 305 MB/s
                      2 cores: 32 296 MB/s
                      3 cores: 42 655 MB/s
                      4 cores: 51 328 MB/s
                      5 cores: 51 280 MB/s
                      6 cores: 38 622 MB/s
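
To make the drop-off easier to see, the figures above can be turned into a speedup relative to one core and a per-core efficiency. This is plain arithmetic on the posted numbers; the variable and function names are illustrative, and "efficiency" here simply means speedup divided by core count.

```python
# Reported memory speed in MB/s, keyed by number of enabled cores.
SPEED_MB_S = {1: 17305, 2: 32296, 3: 42655, 4: 51328, 5: 51280, 6: 38622}


def scaling_table(speeds):
    """Return {cores: (speedup_vs_1_core, efficiency_per_core)}."""
    base = speeds[1]
    return {n: (s / base, s / base / n) for n, s in speeds.items()}


if __name__ == "__main__":
    table = scaling_table(SPEED_MB_S)
    for n, (speedup, eff) in sorted(table.items()):
        print(f"{n} core(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")

    peak = max(SPEED_MB_S, key=SPEED_MB_S.get)
    print(f"Highest reported speed is at {peak} cores")
```

The throughput peaks at 4 cores and the 6-core figure is lower than the 3-core one, which lines up with the error only appearing when more than 4 cores are enabled. Whether the two are actually related is not established by these numbers.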

                      If anyone has an i7 six-core CPU, it would be great if they tried memtest86 4.2.0 to confirm whether or not this is a false positive in memtest86.

                      Thanks!
                      Last edited by Perbear; Jun-24-2013, 02:20 AM. Reason: Added memory speed for different no of cores enabled



                      • #12
                        Thanks for the additional information.

                        We did some additional testing on an 8-core AMD FX-8150 machine this morning and eventually reproduced the problem. We can see it happening about 1 pass in 10. So with the 16GB of RAM in our machine and running all tests, it can take a day or more of running MemTest86 to see one instance of the problem, which is why we didn't see it reproduced earlier.

                        Running just test 6 by itself, in a loop, and reducing the amount of RAM under test to just 2GB meant that we were able to get many more passes done in a short period. So we think we can reproduce the problem fairly reliably in under an hour now.
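
As a rough guide to how many passes such a loop needs, the "about 1 pass in 10" figure can be plugged into a simple probability estimate. This sketch assumes each pass fails independently with the same probability, which is an assumption for illustration and not something the testing above establishes. The function name is hypothetical.

```python
import math


def passes_needed(p_fail_per_pass, confidence):
    """Smallest n such that P(at least one failure in n passes) >= confidence.

    Assumes independent passes with a fixed failure probability, so
    P(no failure in n passes) = (1 - p) ** n.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail_per_pass))


if __name__ == "__main__":
    p = 0.1  # "about 1 pass in 10"
    for conf in (0.95, 0.99):
        print(f"{conf:.0%} chance of seeing the error: {passes_needed(p, conf)} passes")
```

Under those assumptions, roughly 29 passes give a 95% chance of seeing at least one failure and 44 passes give 99%, which is why a short, test-6-only loop over a small RAM range is so much more practical than full overnight runs.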

                        Reproducing the problem is half the work in fixing it. So this is some progress.

                        What is sure at this point is that it isn't a fault in the RAM, as we have seen it with a couple of different sticks of RAM now. It also doesn't appear to be related to the CPU type, as both Intel and AMD machines are affected.

                        So it is looking much more likely that it is in fact a MemTest86 bug.

                        So there will surely be more news later in the week after we do a bit more investigation.



                        • #13
                          I continued setting up my workstation and it seems stable, running 15 million cell CFD problems without indication of any RAM or other hardware problems.

                          When you have updated software I would like to test it.



                          • #14
                            We spent a few more days looking at the problem, but didn't make any progress on fixing it. There are some workarounds, like turning off multi-threading for test #6, but we would prefer to fix the root cause.

                            If we don't solve it in a few more days, we'll probably implement a workaround in V4.3 and instead concentrate on getting it to work properly in V5.



                            • #15
                              We ended up implementing a workaround in the V4.3 release that should avoid this problem.

                              See the post on the MemTest86 V4.3 release for more details.

