Tricky memtest results; parallel/fail, single/pass

  • Tricky memtest results; parallel/fail, single/pass

    MemTest86 v7.5 Free running in parallel mode with 8 CPUs (AMD Ryzen 5 2500U, 4 cores, 8 threads with SMT) got 600+ errors in Tests 6 and 7, so I stopped it and ran single CPU mode to confirm the result. It has taken nearly 2.5 hours to finish 1 pass (out of 4) of all 13 tests with no error so far. What does this mean?

    I can include the test log once I stop the running test, if that would be helpful.

    I am testing both a new HP ENVY x360 laptop and new RAM upgrades. The RAM is 2 X 16GB modules of G.Skill Ripjaws DDR4 2666 MHz automatically downclocked to 2400 MHz (no RAM options in BIOS). F4-2666C18-16GRS. The modules are not a kit, but identical otherwise.

    ======

    I have a really confusing situation. Before running MemTest86 I ran HP's Hardware Diagnostics in UEFI and BIOS. The results of HP's memory testing the 2 dimms in the 2 slots, multiple times for each scenario, were as follows:

    dimm1 in slot1, dimm2 in slot2 = fail
    dimm2 in slot1, dimm1 in slot2 = pass (!!!)
    dimm1 in slot1, slot2 empty = fail
    dimm1 in slot2, slot1 empty = pass (!!!)
    dimm2 in slot1, slot2 empty = pass
    dimm2 in slot2, slot1 empty = pass

    In summary, dimm2 never failed, and dimm1 fails in slot1, but not in slot2 (so far, out of 3+ tests). What's going on?
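For what it's worth, the six results above can be cross-checked mechanically. Below is a minimal Python sketch; the `results` layout and the `suspect_pairings` helper are hypothetical names made up for illustration. It looks for a dimm/slot pairing that is present in every failing run and in no passing run:

```python
# HP diagnostics results as listed above.
# Each entry: ({slot: dimm installed in it}, passed?)
results = [
    ({"slot1": "dimm1", "slot2": "dimm2"}, False),
    ({"slot1": "dimm2", "slot2": "dimm1"}, True),
    ({"slot1": "dimm1"}, False),
    ({"slot2": "dimm1"}, True),
    ({"slot1": "dimm2"}, True),
    ({"slot2": "dimm2"}, True),
]


def suspect_pairings(results):
    """Return (slot, dimm) pairings seen in every failing run and in no passing run."""
    failing = [set(layout.items()) for layout, passed in results if not passed]
    passing = [set(layout.items()) for layout, passed in results if passed]
    if not failing:
        return set()
    in_every_failure = set.intersection(*failing)
    in_any_pass = set().union(*passing)
    return in_every_failure - in_any_pass


print(suspect_pairings(results))  # {('slot1', 'dimm1')}
```

This only shows which pairing is consistent with the failures seen so far. It cannot tell a bad stick from a bad slot, and it assumes each result is repeatable rather than intermittent.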

    Is dimm1 defective? Or is slot1 defective? Or something else? Both the laptop and the RAM are brand new and within their return periods, so I could get either replaced, but I don't know what is at fault (if anything), so I don't know whom to contact.

    Any help in this time-sensitive and confusing situation is appreciated!

  • #2
    > dimm1 in slot1, dimm2 in slot2 = fail
    > dimm2 in slot1, dimm1 in slot2 = pass (!!!)

    This could be explained if the bad memory address shifted to a memory location that isn't being tested. (On all systems, a few percent of the RAM can't be tested.)

    > dimm1 in slot1, slot2 empty = fail
    > dimm1 in slot2, slot1 empty = pass (!!!)

    So either the results are intermittent (random, infrequent errors that just happened to fall this way during this test session), or the slot and the RAM stick are marginal in combination. Or both.

    I assume the machine was stable before the RAM upgrade, so I would bet that the RAM is bad and not the slot.




    • #3
      Is the part of memory that isn't being tested predictable, e.g. at the end or beginning of the address range? Is this consistent with dimm1 not failing (yet) when in slot2 with slot1 empty?

      The machine was stable before the RAM upgrade, but I haven't really used it after the RAM upgrade. I started testing first thing after I upgraded.

      Are there any other scenarios you'd suggest testing/retesting?

      Those results were with HP's hardware diagnostics. Are they reliable?

      Aside from that, I just completed a 16hr test with Memtest86 with dimm1 in slot2 and dimm2 in slot1. No errors after all 4 passes in single cpu mode. I set it again to all 8 cpus in parallel and I'm again seeing errors in test 6, etc. I'm guessing something is wrong with the parallel implementation?

      In the end, should I ask for replacements of the RAM (both or just dimm1)?
      Last edited by passrami; Mar-03-2018, 03:09 PM.


      • #4
        > Is the part of memory that isn't being tested predictable?

        Yes, it is, but it is far more complex than you might suspect. You can view the memory map in MemTest86.

        > Those results were with HP's hardware diagnostics. Are they reliable?

        Don't know. Never looked into it.


        • #5
          Thank you, David. After extensive testing, I'm concluding that dimm1 might be defective. I will have it replaced.

          I still find it strange that single CPU mode passed but multi CPU mode failed. But this seemed to happen only with dimm1, as far as I can tell. I also tested the original RAM modules that came with the system (not dimm1 or dimm2) and they passed both single and multi CPU modes of Memtest86, as well as HP's memory diagnostics.

          RAM sure is tricky stuff.


          • #6
            Multi CPU mode generates more heat, more accesses per second and more EMI, any of which can expose marginal RAM that passes in single CPU mode.


            • #7
              I got the problematic RAM module replaced and re-ran the tests with just that module in slot1:

              - It passes HP's hardware/memory diagnostics (multiple runs).
              - It immediately crashes Memtest86 in multi CPU mode. In sequential or round robin mode (I forget which) it froze Memtest86. When I tried running just Test 6 (since that's where the errors were with the previous module), it almost instantly got thousands of errors and the test stopped after hitting the maximum error limit.
              - In single CPU mode it seemed to work, though I didn't complete all 4 passes of the test.
              - With the BIOS set to Legacy Boot (i.e. non-UEFI), multi CPU (parallel) mode worked and had no errors over all 4 passes of the test.

              Notes:
              - The BIOS is up to date (F16).
              - One of the scenarios (I forget which) gave an error like "uefi firmware error could not start cpu", which gave me the idea to try Legacy Boot mode.

              Conclusion: it seems there's a problem either with UEFI, Memtest86, or the computer, or some combination. But I'm guessing this new RAM module is ok.


              • #8
                Some motherboards have UEFI BIOS bugs which cause problems with multi-threading:
                https://www.passmark.com/forum/memte...election-modes


                • #9
                  I saw that thread recently, but it doesn't seem to list laptops. It does have an HP motherboard listed though.


                  • #10
                    1) If Memtest86 completes testing with no errors on 2 modules installed simultaneously, does that mean they are both ok? i.e. they don't need to be tested individually, and on different slots? The answer might seem obvious, but I don't want to assume anything given how tricky memory issues are.

                    2) Is it normal for Memtest86 to gauge significantly (20-100%) different L1/L2/L3/memory bandwidth rates on different runs? eg. on one run it assessed 46/35/15/9 GB/s and took 2.5-3 hrs for 1 complete pass, vs another run where on the same system, RAM, and Memtest86 configurations it saw 84/61/21/15 GB/s and ran 1 pass in 1.5 hrs.
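To put a number on that spread, here is a small Python sketch using the two sets of figures quoted above. The `percent_spread` helper and the variable names are hypothetical, chosen only for illustration; it reports how much higher the faster run was, relative to the slower one:

```python
def percent_spread(slow, fast):
    """Percentage by which each fast figure exceeds the matching slow figure."""
    return [round((f - s) / s * 100, 1) for s, f in zip(slow, fast)]


labels = ("L1", "L2", "L3", "memory")
slow_run = (46, 35, 15, 9)    # GB/s, the run that took 2.5-3 hrs per pass
fast_run = (84, 61, 21, 15)   # GB/s, the run that took 1.5 hrs per pass

spread = dict(zip(labels, percent_spread(slow_run, fast_run)))
for name, pct in spread.items():
    print(f"{name}: +{pct}%")
```

By this measure every one of the four figures differs by 40% or more between the two runs.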


                    • #11
                      If there are no errors in dual channel, finding errors in single channel mode is less likely. Plus real life usage is in dual channel mode.
                      Benchmark results should be fairly consistent, e.g. less than 10% variation between runs.
                      Maybe your CPU is overheating and throttling.


                      • #12
                        > Originally posted by David (PassMark):
                        > Benchmark results should be fairly consistent, e.g. less than 10% variation between runs.
                        > Maybe your CPU is overheating and throttling.

                        I doubt it's overheating/throttling. The benchmark is done right at the beginning of the test. The information in Memtest86, if accurate, indicated ~2GHz throughout the test, which is the normal non-boost frequency. Although, this information didn't change in the course of testing, so maybe it was just a snapshot at the beginning. Still, temperatures reported during the test were in the range of 40-75C which are in the normal operating range for the Ryzen 5 2500U.

                        To be clear, the benchmark figures I'm referring to are the static figures shown at the top left of the screen during Memtest86 tests.


                        • #13
                          > indicated ~2GHz throughout the test,

                          The clock speed isn't monitored during the RAM test. So even if the clock speed did change, the display wouldn't change. We only measure it once at startup.

                          We do monitor the temperature in real time however.
