
'Number of errors exceed max count', but also 'CPU #x timed out', is this RAM bad?


  • 'Number of errors exceed max count', but also 'CPU #x timed out', is this RAM bad?

    I'm testing a laptop, and pass 3, test #4 failed with a very large number of errors.

    Do the CPU timeouts (e.g. "RunMemoryRangeTest - CPU #2 timed out") mean the test is a false positive?

    Tests 0 through 3 reported no errors, but also had CPU timeouts.

    Images attached. I have a log for
    "Moving inversions, 8-bit pattern (for this log).jpg", but the forum at the moment won't let me upload it.

    Pass 3, Test #4 from the log:
    2019-06-18 23:57:37 - Running test #4 (Test 4 [Moving inversions, 8-bit pattern])
    2019-06-18 23:57:37 - MtSupportRunAllTests - Setting random seed to 0xCCCF43DC
    2019-06-18 23:57:37 - MtSupportRunAllTests - Start time: 4791362 ms
    2019-06-18 23:57:37 - ReadMemoryRanges - Available Pages = 1914690
    2019-06-18 23:57:37 - MtSupportRunAllTests - Enabling memory cache for test
    2019-06-18 23:57:37 - MtSupportRunAllTests - Enabling memory cache complete
    2019-06-18 23:57:37 - Start memory range test (0x0 - 0x21F000000)
    2019-06-18 23:57:37 - Pre-allocating memory ranges >=16MB first...
    2019-06-18 23:57:37 - All memory ranges successfully locked
    2019-06-18 23:57:37 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 7ms)
    2019-06-18 23:57:37 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 7ms)
    2019-06-18 23:57:42 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4486ms)
    2019-06-18 23:57:42 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4486ms)
    2019-06-18 23:57:46 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4474ms)
    2019-06-18 23:57:46 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4474ms)
    2019-06-18 23:57:51 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4474ms)
    2019-06-18 23:57:51 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4474ms)
    2019-06-18 23:57:55 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4475ms)
    2019-06-18 23:57:55 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4475ms)
    2019-06-18 23:58:00 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4474ms)
    2019-06-18 23:58:00 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4474ms)
    2019-06-18 23:58:04 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4473ms)
    2019-06-18 23:58:04 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4473ms)
    2019-06-18 23:58:09 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4473ms)
    2019-06-18 23:58:09 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4473ms)
    2019-06-18 23:58:13 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4473ms)
    2019-06-18 23:58:14 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4473ms)
    2019-06-18 23:58:18 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4475ms)
    2019-06-18 23:58:18 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4475ms)
    2019-06-18 23:58:18 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 71ms)
    2019-06-18 23:58:18 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 71ms)
    2019-06-18 23:58:19 - Cleanup - Releasing all memory ranges...
    2019-06-18 23:58:19 - MtSupportRunAllTests - Test execution time: 41.790 (Test 4 cumulative error count: 235627)
    2019-06-18 23:58:19 - Number of errors exceed maximum error count of 10000
    2019-06-19 07:47:37 - Get_AMD_15_CurTmp: SMU_INDEX_0=0xD8200CA4 SMU_DATA_0=0x2D800FEF
    2019-06-19 07:47:37 - Get_AMD_15_CurTmp: Setting SMU_INDEX_0 to 0xD8200CA4
    2019-06-19 07:47:38 - Get_AMD_15_CurTmp: SMU_DATA_0=0x2D800FEF
    2019-06-19 07:47:38 - GetAMD15Temp - Temperature: 45
    Thanks,
    Mark
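To make the timeout pattern in the excerpt above easier to see, here is a minimal Python sketch that counts "timed out" lines per CPU. The regular expression is based only on the line format visible in this excerpt, and the function name `summarize_timeouts` is hypothetical. It does not decide whether the errors are false positives; it only summarizes what the log says.

```python
import re
from collections import Counter

# Pattern derived from the log lines quoted in this thread (an assumption
# about the general format, not an official MemTest86 specification).
TIMEOUT_RE = re.compile(
    r"RunMemoryRangeTest - CPU #(\d+) timed out, "
    r"test time = (\d+)ms \(BSP test time = (\d+)ms\)"
)


def summarize_timeouts(log_text):
    """Return (timeouts per CPU, count of timeouts reporting a 0 ms test time)."""
    per_cpu = Counter()
    zero_time = 0
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a timeout line
        cpu, test_ms, _bsp_ms = (int(group) for group in match.groups())
        per_cpu[cpu] += 1
        if test_ms == 0:
            zero_time += 1
    return per_cpu, zero_time


SAMPLE = """\
2019-06-18 23:57:37 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 7ms)
2019-06-18 23:57:37 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 7ms)
2019-06-18 23:57:42 - RunMemoryRangeTest - CPU #2 timed out, test time = 0ms (BSP test time = 4486ms)
2019-06-18 23:58:19 - Cleanup - Releasing all memory ranges...
"""

per_cpu, zero_time = summarize_timeouts(SAMPLE)
print(dict(per_cpu), zero_time)
```

In the excerpt, every timeout line reports a test time of 0 ms next to a non-zero BSP time. That may suggest the non-BSP CPUs never reported back, but the log alone does not confirm that reading.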

  • #2
    There are a number of UEFI BIOSs with multi-threading bugs.

    Can you try running the tests in single threaded mode (1 CPU)?



    • #3
      MemTest86 was run twice (that we have logs for): first with default settings, then with CPU selection set to Parallel.

      The first run had no manual CPU selection, and these (selected) lines were in its log:
      Line 383: 2019-06-18 16:41:30 - MP test failed. Setting default CPU mode to SINGLE
      Line 410: 2019-06-18 16:44:24 - CPU selection mode = 3
      Line 415: 2019-06-18 16:44:24 - Starting pass #1 (of 4)
      Line 420: 2019-06-18 16:44:24 - Running test #0 (Test 0 [Address test, walking ones, 1 CPU])
      (I presume 3 is single.)
      That first test ran for 6 or so hours, with no reported problems.

      The second time, CPU Selection was manually set to Parallel, with these (selected) lines in the log:
      Line 3385: 2019-06-18 22:28:44 - Finished pass #4 (of 4) (Cumulative error count: 0)
      Line 3391: 2019-06-18 22:34:04 - CPU selection mode = 1
      Line 3396: 2019-06-18 22:34:04 - Starting pass #1 (of 4)
      Line 3401: 2019-06-18 22:34:04 - Running test #0 (Test 0 [Address test, walking ones, 1 CPU])
      Line 3422: 2019-06-18 22:36:39 - =============================================
      Line 3423: 2019-06-18 22:36:39 - MemTest86 V8.2 Free Build: 1000 (64-bit)
      Line 3424: 2019-06-18 22:36:39 - =============================================
      Line 3792: 2019-06-18 22:36:58 - MP test failed. Setting default CPU mode to SINGLE
      Line 3805: 2019-06-18 22:37:28 - CPU selection mode = 1
      Line 3811: 2019-06-18 22:37:28 - Starting pass #1 (of 4)
      Line 3817: 2019-06-18 22:37:28 - Running test #0 (Test 0 [Address test, walking ones, 1 CPU])
      Line 3827: 2019-06-18 22:37:29 - MtSupportRunAllTests - Test execution time: 0.687 (Test 0 cumulative error count: 0)
      Line 3832: 2019-06-18 22:37:29 - Running test #1 (Test 1 [Address test, own address, 1 CPU])
      Line 3846: 2019-06-18 22:37:36 - MtSupportRunAllTests - Test execution time: 6.534 (Test 1 cumulative error count: 0)
      Line 3853: 2019-06-18 22:37:46 - CPU selection mode = 1
      Line 3858: 2019-06-18 22:37:46 - Starting pass #1 (of 4)
      Line 3863: 2019-06-18 22:37:46 - Running test #0 (Test 0 [Address test, walking ones, 1 CPU])
      It looks to me like this is one of those BIOSs?

      The log also says this about the BIOS:
      2019-06-18 16:41:11 - SMBIOS BIOS INFO Vendor: "LENOVO", Version: "5QCN20WW", Release Date: "11/29/2017"
      2019-06-18 16:41:11 - SMBIOS SYSTEM INFO Manufacturer: "LENOVO", Product: "80XS", Version: "Lenovo ideapad 320-15ABR", S/N: "PF0YU7QB", SKU: "", Family: ""
      2019-06-18 16:41:11 - SMBIOS: Found SMBIOS BaseboardInformation (pbLinAddr=0xBBA92672, FormattedLen=15, iTotalLen=96)
      2019-06-18 16:41:11 - SMBIOS BASEBOARD INFO Manufacturer: "LENOVO", Product: "LNVNB161216", Version: "SDK0J40700WIN", S/N: "PF0YU7QB", AssetTag: "No Asset Tag", LocationInChassis: "Chassis Location Unknown"
      2019-06-18 16:41:11 - EFI Specifications: 2.50
      (snip)
      2019-06-18 16:41:12 - Detected blacklisted baseboard (Product Name: "LNVNB161216", BIOS version: "5QCN20WW")
      2019-06-18 16:41:12 - MemTest86 has detected a baseboard that requires console control to be disabled
      2019-06-18 16:41:12 - Console Control protocol workaround disabled
      That blacklist relates to "console control" rather than multi-threading, but still...



      • #4
        To be more clear:

        I don't have the laptop at the moment to try running in single threaded mode, but it looks like the logs from the previous run show that run 1 was in single mode anyway, because of the "MP test failed" message.

        When I next have the laptop, I will run in single threaded mode, but that looks to be moot at the moment, pending your confirmation.

        Thanks.



        • #5
          Can you post the details of the errors?



          • #6
            Here is the html summary:
            Summary

            Report Date 2019-06-19 07:47:37
            Generated by MemTest86 V8.2 Free (64-bit)
            Result FAIL
            System Information

            EFI Specifications 2.50
            System
            Manufacturer LENOVO
            Product Name 80XS
            Version Lenovo ideapad 320-15ABR
            Serial Number PF0YU7QB
            BIOS
            Vendor LENOVO
            Version 5QCN20WW
            Release Date 11/29/2017
            Baseboard
            Manufacturer LENOVO
            Product Name LNVNB161216
            Version SDK0J40700WIN
            Serial Number PF0YU7QB
            CPU Type AMD A12-9720P RADEON R7, 12 COMPUTE CORES 4C+8G
            CPU Clock 2695 MHz [Turbo: 3594.4 MHz]
            # Logical Processors 4
            L1 Cache 4 x 128K (34377 MB/s)
            L2 Cache 4 x 1024K (30139 MB/s)
            L3 Cache N/A
            Memory 7622M (4636 MB/s)
            DIMM Slot #0 4GB DDR4 PC4-19200
            Samsung / M471A5143SB1-CRC / 217C259A
            17-17-17-39 / 2400 MHz / 1.2V
            Result summary

            Test Start Time 2019-06-18 22:37:46
            Elapsed Time 1:20:33
            Memory Range Tested 0x0 - 21F000000 (8688MB)
            CPU Selection Mode Parallel (All CPUs)
            ECC Polling Enabled
            # Tests Passed 28/28 (100%)
            Lowest Error Address 0x98100000 (2433MB)
            Highest Error Address 0x981E61A8 (2433MB)
            Bits in Error Mask 00000000FFFFFFFF
            Bits in Error 32
            Max Contiguous Errors 2
            Test # Tests Passed Errors
            Test 0 [Address test, walking ones, 1 CPU] 3/3 (100%) 0
            Test 1 [Address test, own address, 1 CPU] 3/3 (100%) 0
            Test 2 [Address test, own address] 3/3 (100%) 0
            Test 3 [Moving inversions, ones & zeroes] 3/3 (100%) 0
            Test 4 [Moving inversions, 8-bit pattern] 2/2 (100%) 471253
            Test 5 [Moving inversions, random pattern] 2/2 (100%) 0
            Test 6 [Block move, 64-byte blocks] 2/2 (100%) 0
            Test 7 [Moving inversions, 32-bit pattern] 2/2 (100%) 0
            Test 8 [Random number sequence] 2/2 (100%) 0
            Test 9 [Modulo 20, ones & zeros] 2/2 (100%) 0
            Test 10 [Bit fade test, 2 patterns, 1 CPU] 2/2 (100%) 0
            Test 13 [Hammer test] 2/2 (100%) 0
            Last 10 Errors
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60B0, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60B4, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60B8, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60BC, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60C0, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60C4, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60C8, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60CC, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60D0, Expected: 7F7F7F7F, Actual: 80808080
            2019-06-18 23:58:19 - [Data Error] Test: 4, CPU: 3, Address: 981E60D4, Expected: 7F7F7F7F, Actual: 80808080

            Although the log says:
            Line 6253: 2019-06-18 23:58:19 - MtSupportRunAllTests - Test execution time: 41.790 (Test 4 cumulative error count: 235627)

            The screen dump for this test shows the first 10 errors, with addresses between 981000D7 and 981000F8, all Expected: 80808080, Actual: 7F7F7F7F, CPU: 3 (see the image "Moving inversions, 8-bit pattern (for this log).jpg" attached to the original post).

            The other screen dumps (we don't have logs for these) have:
            Test 8: addresses between 1018EB and 1018E3, with various Expected and Actual values (see image "Random number sequence (no log for this).jpg")
            Test 5: addresses between C0FFF1B and C0FFF24, all Expected: 9C824F35, Actual: 637DB0CA, CPU: 2

            (If you meant something else by "details of the errors", then sorry, can you please ask again with more detail of what I should be finding and posting?)

            Thanks



            • #7
              Interesting case.

              The Bits in Error Mask of 00000000FFFFFFFF would seem to indicate that this isn't a typical RAM failure.
              Typical RAM failures have only 1 or 2 bits in error.
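As a worked check of the "bits in error" reasoning, the following Python sketch XORs the Expected and Actual values quoted earlier in the thread. The helper names are hypothetical, and the assumption that a per-error mask is simply Expected XOR Actual is mine; how MemTest86 builds its cumulative mask internally is not stated here.

```python
def error_mask(expected, actual):
    """Bits that differ between the expected and actual 32-bit values."""
    return (expected ^ actual) & 0xFFFFFFFF


def bits_in_error(expected, actual):
    """Number of differing bits."""
    return bin(error_mask(expected, actual)).count("1")


def is_complement(expected, actual):
    """True if actual is the exact bitwise inverse of expected (32-bit)."""
    return actual == (~expected & 0xFFFFFFFF)


# (Expected, Actual) pairs quoted in the thread for Test 4 and Test 5.
PAIRS = [
    (0x7F7F7F7F, 0x80808080),
    (0x9C824F35, 0x637DB0CA),
]

for expected, actual in PAIRS:
    print(
        f"{error_mask(expected, actual):08X}",
        bits_in_error(expected, actual),
        is_complement(expected, actual),
    )
```

Both pairs give a mask of FFFFFFFF with 32 bits in error, and in both the Actual value is the exact bitwise inverse of the Expected value. A single stuck or weak bit would instead produce a mask with only one or two bits set.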

              The next most common issue is the UEFI memory map being wrong (buggy firmware): a block of addresses is being tested while it is also in use by some other memory-mapped hardware component. In that case you typically see lots of bits in error, but the error addresses remain the same across all the tests. Here, however, you have multiple blocks (981000D7, 1018EB and C0FFF1B), and it is hard to believe that so many blocks have been incorrectly mapped in the BIOS.
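To illustrate the "multiple blocks" point, this small Python sketch groups the error addresses mentioned in the thread by the 1 MiB region they fall in. The 1 MiB granularity and the function name `group_by_region` are arbitrary choices for illustration, not anything MemTest86 itself reports.

```python
def group_by_region(addresses, region_bytes=1 << 20):
    """Map region index (address // region_bytes) to the addresses inside it."""
    regions = {}
    for address in addresses:
        regions.setdefault(address // region_bytes, []).append(address)
    return regions


# Error addresses quoted in the thread (Test 4 log and screen dump, Test 8, Test 5).
ADDRESSES = [0x981E60B0, 0x981000D7, 0x1018EB, 0xC0FFF1B]

regions = group_by_region(ADDRESSES)
for index in sorted(regions):
    members = ", ".join(f"{a:X}" for a in regions[index])
    print(f"region at {index} MiB: {members}")
```

With these inputs the addresses land in three separate regions (around 1 MiB, 192 MiB and 2433 MiB), which matches the observation that the errors are not confined to one block.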

              All the screen shots and the logs still show multi-threading mode being used. I'd still like to see it in single threaded mode.



              • #8
                It seems I'm now allowed to upload ZIP files? The log is inside.

                From line 1 (2019-06-18 16:41:11) to line 3385 (2019-06-18 22:28:44), I believe the run is in single threaded mode, but can you please confirm?
                (It was run with the default CPU selection, but the log says "MP test failed. Setting default CPU mode to SINGLE", and "CPU selection mode = 3", so I presume that is a single thread test.)
                It ran passes 1 through 4 with no errors.

                After that (lines 3386 and below) is the CPU Selection = Parallel test which failed.
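One way to check which parts of the log ran in which mode is to pull out just the mode-related lines in order. The Python sketch below does that for the two message formats quoted in this thread. It deliberately does not translate the numeric "CPU selection mode" values into names, since that mapping is not confirmed anywhere here; the function name `mode_events` is hypothetical.

```python
import re

# Matches the two mode-related messages quoted in this thread.
EVENT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(MP test failed\. Setting default CPU mode to SINGLE"
    r"|CPU selection mode = (\d+))"
)


def mode_events(log_text):
    """Return a list of (timestamp, kind, mode_number_or_None) in log order."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match is None:
            continue
        timestamp, _text, mode = match.groups()
        if mode is None:
            events.append((timestamp, "mp_test_failed", None))
        else:
            events.append((timestamp, "selection_mode", int(mode)))
    return events


SAMPLE_LOG = """\
2019-06-18 16:41:30 - MP test failed. Setting default CPU mode to SINGLE
2019-06-18 16:44:24 - CPU selection mode = 3
2019-06-18 16:44:24 - Starting pass #1 (of 4)
2019-06-18 22:34:04 - CPU selection mode = 1
"""

events = mode_events(SAMPLE_LOG)
for event in events:
    print(event)
```

Run against the full log, the output would show each mode change with its timestamp, which can then be compared against the start times of the passing and failing runs.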

                (I don't have access to the laptop at the moment to run another fresh single threaded mode test.)

