Memtest 86 ECC Failure on Cold Boot

  • Memtest 86 ECC Failure on Cold Boot

    Hello

    My hardware is
    i3-6102E
    MT40A2G8SA-062E SDRAM - DDR4 Memory IC 16Gbit Parallel 1.6 GHz 19 ns 78-FBGA (7.5x11)

    On the first pass, MemTest86 catches 6 ECC errors, shown in the test logs below. However, when rerunning the test, the ECC errors no longer show up. Does MemTest86 save a configuration or apply a fix somewhere? We'd like to see what it did to fix the ECC errors and maybe work out a solution from there.

    We saw an earlier post with a similar issue. The recommended fix was to disable Quick Boot in the BIOS options. Unfortunately, we do not have this option.



    Test Logs
    2024-08-09 15:58:04 - =============================================
    2024-08-09 15:58:04 - MemTest86 V11.0 Pro Build: 1000 (64-bit)
    2024-08-09 15:58:04 - =============================================
    2024-08-09 15:58:04 - Changed to new save location to: FS0
    2024-08-09 15:58:04 - Enabling graphics mode
    2024-08-09 15:58:04 - Get screen size
    2024-08-09 15:58:04 - Current screen size: 1024 x 768
    2024-08-09 15:58:09 - Initializing spin lock (Align=64)
    2024-08-09 15:58:09 - Intel Skylake chipset init
    2024-08-09 15:58:09 - MCHBAR_LO=FED10001
    2024-08-09 15:58:09 - MCHBAR_HI=00000000
    2024-08-09 15:58:09 - MCHBAR=00000000FED10000
    2024-08-09 15:58:09 - CAPID0_A=60012671
    2024-08-09 15:58:09 - CAPID0_B=940400D8
    2024-08-09 15:58:09 - CAPID0_C=0006C000
    2024-08-09 15:58:10 - MAD_INTER=00000000
    2024-08-09 15:58:10 - [Ch0] TC_PRE=2208230F
    2024-08-09 15:58:10 - [Ch0] TC_ODT=29AF0010
    2024-08-09 15:58:10 - MAD_INTRA_0=00003110
    2024-08-09 15:58:10 - MAD_DIMM_0=00000010
    2024-08-09 15:58:10 - Ch0 DIMM L size=8589934592
    2024-08-09 15:58:10 - Ch0 DIMM L ranks=1
    2024-08-09 15:58:10 - Ch0 DIMM L width=x8
    2024-08-09 15:58:10 - Ch0 DIMM S size=0
    2024-08-09 15:58:10 - Ch0 DIMM S ranks=1
    2024-08-09 15:58:10 - Ch0 DIMM S width=x8
    2024-08-09 15:58:10 - Ch0 DIMM L map=0
    2024-08-09 15:58:10 - [Ch1] TC_PRE=18061C08
    2024-08-09 15:58:10 - [Ch1] TC_ODT=10C50000
    2024-08-09 15:58:10 - MAD_INTRA_1=00000000
    2024-08-09 15:58:10 - MAD_DIMM_1=00000000
    2024-08-09 15:58:10 - Ch1 DIMM L size=0
    2024-08-09 15:58:10 - Ch1 DIMM L ranks=1
    2024-08-09 15:58:10 - Ch1 DIMM L width=x8
    2024-08-09 15:58:10 - Ch1 DIMM S size=0
    2024-08-09 15:58:10 - Ch1 DIMM S ranks=1
    2024-08-09 15:58:10 - Ch1 DIMM S width=x8
    2024-08-09 15:58:10 - Ch1 DIMM L map=0
    2024-08-09 15:58:10 - ERRSTS=0000
    2024-08-09 15:58:10 - *** TEST SESSION - 2024-08-09 15:58:10 ***
    2024-08-09 15:58:10 - CPU selection mode = 1
    2024-08-09 15:58:10 - poll_timings_skylake - SA_PERF_STATUS=13000810 (qclk_ratio=16, qclk_ref=0)
    2024-08-09 15:58:10 - get_mem_ctrl_timings - 2132 MT/s (15-15-15-35)
    2024-08-09 15:58:10 - ReadMemoryRanges - Available Pages = 4144360
    2024-08-09 15:58:10 - Locking all memory ranges first...
    2024-08-09 15:58:10 - Skipping memory range 0x0 - 0x58000 (352KB). Range too small.
    2024-08-09 15:58:10 - Skipping memory range 0x59000 - 0x9E000 (276KB). Range too small.
    2024-08-09 15:58:10 - Memory range locked: 0x100000 - 0xAD815000 (13735756KB of available memory left)
    2024-08-09 15:58:10 - Memory range locked: 0xAD855000 - 0xB45FB000 (13623476KB of available memory left)
    2024-08-09 15:58:10 - Memory range locked: 0xB5805000 - 0xB6BF7000 (13603052KB of available memory left)
    2024-08-09 15:58:10 - Skipping memory range 0xBB653000 - 0xBB8A5000 (2376KB). Range too small.
    2024-08-09 15:58:10 - Skipping memory range 0xBBF58000 - 0xBC0A4000 (1328KB). Range too small.
    2024-08-09 15:58:10 - Remaining memory is less than 16MB. Reducing memory range from 0x100000000 - 0x43E000000 (13598720KB) => 0x100000000 - 0x43D43B000 (13586668KB)
    2024-08-09 15:58:10 - Memory range locked: 0x100000000 - 0x43D43B000 (16384KB of available memory left)
    2024-08-09 15:58:10 - All memory ranges successfully locked
    2024-08-09 15:58:10 - Starting pass #1 (of 4)
    2024-08-09 15:58:10 - poll_timings_skylake - SA_PERF_STATUS=13000810 (qclk_ratio=16, qclk_ref=0)
    2024-08-09 15:58:10 - get_mem_ctrl_timings - 2132 MT/s (15-15-15-35)
    2024-08-09 15:58:10 - Current mem timings: 2132 MT/s (15-15-15-35)
    2024-08-09 15:58:10 - Current CPU temperature: 46C
    2024-08-09 15:58:10 - Running test #0 (Test 0 [Address test, walking ones, 1 CPU])
    2024-08-09 15:58:10 - MtSupportRunAllTests - Setting random seed to 0x50415353
    2024-08-09 15:58:10 - MtSupportRunAllTests - Start time: 387 ms
    2024-08-09 15:58:10 - MtSupportRunAllTests - Enabling memory cache for test
    2024-08-09 15:58:10 - MtSupportRunAllTests - Enabling memory cache complete
    2024-08-09 15:58:10 - Start memory range test (0x0 - 0x43E000000)
    2024-08-09 15:58:11 - ERRSTS=0003
    2024-08-09 15:58:11 - ERRLOG0[0]=00D10003
    2024-08-09 15:58:11 - ERRLOG1[0]=00010000
    2024-08-09 15:58:11 - [MEM ERROR - ECC Errors] Test: 0, (Chan,Slot,Rank,Bank,Row,Col): (0,0,0,0,10000,0), ECC Corrected: yes, Syndrome: 00D1, Channel/Slot: 0-0
    2024-08-09 15:58:11 - ERRLOG0[1]=00000000
    2024-08-09 15:58:11 - ERRLOG1[1]=00000000
    2024-08-09 15:58:11 - MtSupportRunAllTests - Test execution time: 1.060s (Test 0 cumulative error count: 0, buffer full count: 0)
    2024-08-09 15:58:11 - Running test #1 (Test 1 [Address test, own address, 1 CPU])
    2024-08-09 15:58:11 - MtSupportRunAllTests - Setting random seed to 0x50415353
    2024-08-09 15:58:11 - MtSupportRunAllTests - Start time: 1525 ms
    2024-08-09 15:58:11 - MtSupportRunAllTests - Enabling memory cache for test
    2024-08-09 15:58:11 - MtSupportRunAllTests - Enabling memory cache complete
    2024-08-09 15:58:11 - Start memory range test (0x0 - 0x43E000000)
    2024-08-09 15:58:14 - ERRSTS=0003
    2024-08-09 15:58:14 - ERRLOG0[0]=00820003
    2024-08-09 15:58:14 - ERRLOG1[0]=10110000
    2024-08-09 15:58:14 - [MEM ERROR - ECC Errors] Test: 1, (Chan,Slot,Rank,Bank,Row,Col): (0,0,0,0,10000,, ECC Corrected: yes, Syndrome: 0082, Channel/Slot: 0-0
    2024-08-09 15:58:14 - ERRLOG0[1]=00000000
    2024-08-09 15:58:14 - ERRLOG1[1]=00000000
    2024-08-09 15:58:15 - ERRSTS=0003
    2024-08-09 15:58:15 - ERRLOG0[0]=00820003
    2024-08-09 15:58:15 - ERRLOG1[0]=10110C00
    2024-08-09 15:58:15 - [MEM ERROR - ECC Errors] Test: 1, (Chan,Slot,Rank,Bank,Row,Col): (0,0,0,0,10C00,, ECC Corrected: yes, Syndrome: 0082, Channel/Slot: 0-0
    2024-08-09 15:58:15 - ERRLOG0[1]=00000000
    2024-08-09 15:58:16 - ERRLOG1[1]=00000000
    2024-08-09 15:58:17 - ERRSTS=0003
    2024-08-09 15:58:17 - ERRLOG0[0]=00820003
    2024-08-09 15:58:17 - ERRLOG1[0]=10115C00
    2024-08-09 15:58:17 - [MEM ERROR - ECC Errors] Test: 1, (Chan,Slot,Rank,Bank,Row,Col): (0,0,0,0,15C00,, ECC Corrected: yes, Syndrome: 0082, Channel/Slot: 0-0
    2024-08-09 15:58:17 - ERRLOG0[1]=00000000
    2024-08-09 15:58:17 - ERRLOG1[1]=00000000
    2024-08-09 15:58:18 - ERRSTS=0003
    2024-08-09 15:58:18 - ERRLOG0[0]=00820003
    2024-08-09 15:58:18 - ERRLOG1[0]=1011AC00
    2024-08-09 15:58:18 - [MEM ERROR - ECC Errors] Test: 1, (Chan,Slot,Rank,Bank,Row,Col): (0,0,0,0,1AC00,, ECC Corrected: yes, Syndrome: 0082, Channel/Slot: 0-0
    2024-08-09 15:58:18 - ERRLOG0[1]=00000000
    2024-08-09 15:58:18 - ERRLOG1[1]=00000000
    2024-08-09 15:58:18 - MtSupportRunAllTests - Test execution time: 6.790s (Test 1 cumulative error count: 0, buffer full count: 0)
    2024-08-09 15:58:18 - Running test #2 (Test 2 [Address test, own address])
    2024-08-09 15:58:18 - MtSupportRunAllTests - Setting random seed to 0x50415353
    2024-08-09 15:58:18 - MtSupportRunAllTests - Start time: 8392 ms
    2024-08-09 15:58:18 - MtSupportRunAllTests - Enabling memory cache for test
    2024-08-09 15:58:18 - MtSupportRunAllTests - Enabling memory cache complete
    2024-08-09 15:58:18 - Start memory range test (0x0 - 0x43E000000)
    2024-08-09 15:58:19 - ERRSTS=0003
    2024-08-09 15:58:19 - ERRLOG0[0]=00820003
    2024-08-09 15:58:19 - ERRLOG1[0]=1011FC00
    2024-08-09 15:58:19 - [MEM ERROR - ECC Errors] Test: 2, (Chan,Slot,Rank,Bank,Row,Col): (0,0,0,0,1FC00,, ECC Corrected: yes, Syndrome: 0082, Channel/Slot: 0-0
    2024-08-09 15:58:19 - ERRLOG0[1]=00000000
    2024-08-09 15:58:19 - ERRLOG1[1]=00000000
    2024-08-09 15:58:26 - MtSupportRunAllTests - Test execution time: 7.690s (Test 2 cumulative error count: 0, buffer full count: 0)
    2024-08-09 15:58:26 - Running test #3 (Test 3 [Moving inversions, ones & zeroes])
    2024-08-09 15:58:26 - MtSupportRunAllTests - Setting random seed to 0x50415353
    2024-08-09 15:58:26 - MtSupportRunAllTests - Start time: 16160 ms
    2024-08-09 15:58:26 - MtSupportRunAllTests - Enabling memory cache for test
    2024-08-09 15:58:26 - MtSupportRunAllTests - Enabling memory cache complete
    2024-08-09 15:58:26 - Start memory range test (0x0 - 0x43E000000)
    2024-08-09 15:58:42 - poll_timings_skylake - SA_PERF_STATUS=13000810 (qclk_ratio=16, qclk_ref=0)
    2024-08-09 15:58:42 - get_mem_ctrl_timings - 2132 MT/s (15-15-15-35)
    2024-08-09 15:58:42 - MtSupportRunAllTests - Test execution time: 16.025s (Test 3 cumulative error count: 0, buffer full count: 0)
    2024-08-09 15:58:42 - Test aborted
    2024-08-09 15:58:42 - Cleanup - Unlocking all memory ranges...
    2024-08-09 15:58:42 - All memory ranges successfully unlocked
    2024-08-09 15:58:44 - Test result: INCOMPLETE PASS (Errors: 0)
    2024-08-09 15:58:46 - Supported motherboard ("To be filled by O.E.M.") and chipset ("Intel Skylake") detected. Display individual DIMM results.
    2024-08-09 15:58:46 - Enabling graphics mode
    2024-08-09 15:58:46 - Get screen size
    2024-08-09 15:58:46 - Current screen size: 1024 x 768
    2024-08-09 15:58:46 - [DIMM 0] Errs=0 Ranks=1 Side=0 Chips=8
    2024-08-09 15:58:46 - [DIMM 1] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:58:46 - [DIMM 2] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:58:46 - [DIMM 3] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:58:54 - Saving screenshot to DIMMResults-20240809-155854.bmp
    2024-08-09 15:58:54 - [DIMM 0] Errs=0 Ranks=1 Side=0 Chips=8
    2024-08-09 15:58:54 - [DIMM 0] Errs=0 Ranks=1 Side=1 Chips=0
    2024-08-09 15:58:54 - [DIMM 1] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:58:54 - [DIMM 1] Errs=0 Ranks=0 Side=1 Chips=0
    2024-08-09 15:58:54 - [DIMM 2] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:58:54 - [DIMM 2] Errs=0 Ranks=0 Side=1 Chips=0
    2024-08-09 15:58:54 - [DIMM 3] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:58:54 - [DIMM 3] Errs=0 Ranks=0 Side=1 Chips=0
    2024-08-09 15:58:55 - Screenshot successfully saved to "DIMMResults-20240809-155854.bmp"
    2024-08-09 15:59:01 - [DIMM 0] Errs=0 Ranks=1 Side=0 Chips=8
    2024-08-09 15:59:01 - [DIMM 1] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:01 - [DIMM 2] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:01 - [DIMM 3] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:02 - [DIMM 0] Errs=0 Ranks=1 Side=0 Chips=8
    2024-08-09 15:59:02 - [DIMM 1] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:02 - [DIMM 2] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:02 - [DIMM 3] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:03 - [DIMM 0] Errs=0 Ranks=1 Side=0 Chips=8
    2024-08-09 15:59:03 - [DIMM 1] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:03 - [DIMM 2] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:03 - [DIMM 3] Errs=0 Ranks=0 Side=0 Chips=0
    2024-08-09 15:59:05 - Display test result summary
    2024-08-09 15:59:08 - Enabling graphics mode
    2024-08-09 15:59:08 - Get screen size
    2024-08-09 15:59:08 - Current screen size: 1024 x 768
    2024-08-09 15:59:08 - poll_timings_skylake - SA_PERF_STATUS=13000810 (qclk_ratio=16, qclk_ref=0)
    2024-08-09 15:59:08 - get_mem_ctrl_timings - 2132 MT/s (15-15-15-35)
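To compare cold-boot runs against later runs, the ECC entries in a log like the one above can be tallied with a short script. This is only a sketch: the function name `parse_ecc_errors` and the `SAMPLE_LOG` constant are made up for illustration, the line format is assumed from the `[MEM ERROR - ECC Errors]` entries in this log, and the sample embeds just three of the log lines (including one whose address tuple is truncated as posted).

```python
import re
from collections import Counter

# Three lines copied from the log in this post. The last one has the
# truncated address tuple exactly as it appears in the post.
SAMPLE_LOG = """\
2024-08-09 15:58:11 - [MEM ERROR - ECC Errors] Test: 0, (Chan,Slot,Rank,Bank,Row,Col): (0,0,0,0,10000,0), ECC Corrected: yes, Syndrome: 00D1, Channel/Slot: 0-0
2024-08-09 15:58:11 - ERRLOG0[1]=00000000
2024-08-09 15:58:14 - [MEM ERROR - ECC Errors] Test: 1, (Chan,Slot,Rank,Bank,Row,Col): (0,0,0,0,10000,, ECC Corrected: yes, Syndrome: 0082, Channel/Slot: 0-0
"""


def parse_ecc_errors(log_text):
    """Return one dict per ECC error line: test number, row, syndrome."""
    errors = []
    for line in log_text.splitlines():
        if "[MEM ERROR - ECC Errors]" not in line:
            continue
        test = int(re.search(r"Test: (\d+)", line).group(1))
        syndrome = re.search(r"Syndrome: ([0-9A-Fa-f]+)", line).group(1)
        # The tuple order is (Chan,Slot,Rank,Bank,Row,Col), so Row is
        # field index 4. Splitting on commas also copes with the
        # truncated tuples seen in the posted log.
        fields = line.split("): (", 1)[1].split(",")
        errors.append({"test": test, "row": fields[4], "syndrome": syndrome})
    return errors


errors = parse_ecc_errors(SAMPLE_LOG)
per_test = Counter(e["test"] for e in errors)
print(len(errors), dict(per_test))
```

Running the same function over the full log file from each boot would show whether the error count, rows, and syndromes repeat across cold boots.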

  • #2
    What motherboard and BIOS version are you running?

    As you pointed out, there is a known issue like this.

    In order to initialize ECC, memory has to be written before it can be used (which can be slow). Usually this is done by the BIOS during boot, but with some motherboards this step is skipped if "Quick Boot" is enabled. So this is really a BIOS bug, or at least a behaviour that should be documented by the motherboard vendor (saying ECC errors at initial power on are normal and expected). In my opinion it is a bug if there is no option to disable quick boot.
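A toy model may help show why unwritten ECC memory reports errors and why the errors can vanish once every location has been written. Everything here is an illustrative assumption: `ToyEccMemory` and `check_byte` are invented names, and the XOR check byte stands in for the real SECDED code a memory controller uses.

```python
import random


def check_byte(data: int) -> int:
    """Toy check code: XOR of the eight bytes of a 64-bit word.

    Real controllers use a SECDED code. This stand-in only needs to
    detect that data and check bits do not belong together.
    """
    c = 0
    for shift in range(0, 64, 8):
        c ^= (data >> shift) & 0xFF
    return c


class ToyEccMemory:
    def __init__(self, words, seed=0):
        rng = random.Random(seed)
        # At power-on, data and check bits hold unrelated garbage.
        self.data = [rng.getrandbits(64) for _ in range(words)]
        self.check = [rng.getrandbits(8) for _ in range(words)]

    def write(self, addr, value):
        # A write stores the data together with a matching check byte.
        self.data[addr] = value
        self.check[addr] = check_byte(value)

    def read_ok(self, addr):
        # A read recomputes the check byte and compares it to the stored one.
        return check_byte(self.data[addr]) == self.check[addr]


def count_mismatches(mem):
    return sum(not mem.read_ok(a) for a in range(len(mem.data)))


mem = ToyEccMemory(words=1024)
before = count_mismatches(mem)   # reads of never-written words
for addr in range(1024):
    mem.write(addr, 0)           # the initialization step a BIOS would do
after = count_mismatches(mem)
print(before, after)
```

In this model, nearly every read before initialization fails the check, and none fail afterwards. That would be consistent with errors that appear only on the first pass after a cold boot, since the test itself writes the memory it covers.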

    Previously this was only a known problem on KONTRON AMI BIOS.
    So it would be interesting to know what you are running.

    There is also a small chance that errors occur only when the RAM is cold (temperature wise). But running the test for 15 min, then doing a power cycle, should allow testing for this.



    • #3
      Hi David,

      Thank you for the quick response. We are the motherboard designers and are running AMI APTIO V BIOS.

      We followed your suggestion and ran the test for 15 min followed by a power cycle, the ECC error still shows up.

      Does MemTest change any configurations when running the MemTest? Also, is there a way to change the test to display virtual memory address on error instead of DRAM address?



      • #4
        followed by a power cycle, the ECC error still shows up
        Likely a BIOS bug in that case. Maybe that BIOS doesn't support ECC at all?

        Does MemTest change any configurations when running the MemTest?
        No.
        Whatever was set in the BIOS is what is used.

        display virtual memory address
        Virtual addresses are really a construct of the operating system. Meaning different operating systems are likely to map out the RAM differently.
        Also, physical memory might be limited to 128GB, but virtual memory is typically 2^48 bytes = 256TB (maybe even 2^64 = 1.8e+19 bytes eventually, in future CPUs and OSes).
        So lots of ways it could be mapped.
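The sizes quoted above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are arbitrary, and the 128GB figure is taken from the reply and read here as 128 GiB.

```python
GIB = 2**30
TIB = 2**40

physical_limit = 128 * GIB   # example physical limit from the reply
virtual_48 = 2**48           # 48-bit virtual address space, in bytes
virtual_64 = 2**64           # full 64-bit address space, in bytes

print(virtual_48 // TIB)             # 256 (TiB)
print(virtual_48 // physical_limit)  # 2048 times the physical limit
print(f"{virtual_64:.1e}")           # 1.8e+19 bytes
```

With a 48-bit virtual space 2048 times larger than the physical limit, a physical page can be mapped at a huge number of virtual addresses, which is why a DRAM address does not translate to one fixed virtual address.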
