ECC errors - not happening all times

  • ECC errors - not happening all times

    Hello,

    Today I had a power outage lasting a few minutes, which was covered by the UPS. The system did an emergency shutdown. After booting the system I noticed a bunch of these errors in the syslog:

    Code:
    Jun 6 08:11:20 prxsrv kernel: [ 0.386924] EDAC MC: Ver: 3.0.0
    Jun 6 08:11:20 prxsrv kernel: [ 12.168935] EDAC MC0: Giving out device to module ie31200_edac controller IE31200: DEV 0000:00:00.0 (POLLED)
    Jun 6 08:11:22 prxsrv kernel: [ 15.218862] EDAC MC0: 1 UE ie31200 UE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x0 offset:0x0 grain:
    Jun 6 08:11:22 prxsrv kernel: [ 15.218864] EDAC MC0: 1 UE ie31200 UE on mc#0csrow#2channel#1 (csrow:2 channel:1 page:0x0 offset:0x0 grain:
    Jun 6 08:11:26 prxsrv kernel: [ 19.314923] EDAC MC0: 1 UE ie31200 UE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:
    Jun 6 08:11:33 prxsrv kernel: [ 25.462965] EDAC MC0: 1 UE ie31200 UE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x0 offset:0x0 grain:
    Jun 6 08:11:52 prxsrv kernel: [ 44.904811] EDAC MC0: 1 UE UE overwrote CE on any memory ( page:0x0 offset:0x0 grain:
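
To see at a glance which csrow/channel pairs the kernel is blaming, the EDAC lines can be tallied with a small script. This is only a sketch: the regular expressions are assumptions based on the excerpt above (truncated after "grain:" exactly as posted), and the function name `tally_edac` is made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpt (still truncated after "grain:").
SAMPLE = [
    "Jun 6 08:11:20 prxsrv kernel: [ 0.386924] EDAC MC: Ver: 3.0.0",
    "Jun 6 08:11:22 prxsrv kernel: [ 15.218862] EDAC MC0: 1 UE ie31200 UE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x0 offset:0x0 grain:",
    "Jun 6 08:11:22 prxsrv kernel: [ 15.218864] EDAC MC0: 1 UE ie31200 UE on mc#0csrow#2channel#1 (csrow:2 channel:1 page:0x0 offset:0x0 grain:",
    "Jun 6 08:11:26 prxsrv kernel: [ 19.314923] EDAC MC0: 1 UE ie31200 UE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x0 offset:0x0 grain:",
    "Jun 6 08:11:33 prxsrv kernel: [ 25.462965] EDAC MC0: 1 UE ie31200 UE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x0 offset:0x0 grain:",
    "Jun 6 08:11:52 prxsrv kernel: [ 44.904811] EDAC MC0: 1 UE UE overwrote CE on any memory ( page:0x0 offset:0x0 grain:",
]

# Assumed line shape: "EDAC MC<n>: <count> <UE|CE> ..." with an optional location.
EVENT_RE = re.compile(r"EDAC MC(\d+): (\d+) (UE|CE) ")
LOCATION_RE = re.compile(r"csrow:(\d+) channel:(\d+)")


def tally_edac(lines):
    """Count reported errors per (controller, kind, (csrow, channel))."""
    counts = Counter()
    for line in lines:
        event = EVENT_RE.search(line)
        if event is None:
            continue  # banner or driver-registration line, not an error event
        mc, n, kind = int(event.group(1)), int(event.group(2)), event.group(3)
        loc = LOCATION_RE.search(line)
        # Location is None when the kernel reports "on any memory".
        where = (int(loc.group(1)), int(loc.group(2))) if loc else None
        counts[(mc, kind, where)] += n
    return counts


if __name__ == "__main__":
    for key, n in sorted(tally_edac(SAMPLE).items(), key=str):
        print(key, n)
```

On a live system the same function could be fed the lines of the syslog file instead of `SAMPLE`.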
    The system:
    Code:
     [TABLE]
    [TR]
    [TD="class: value, width: 35%"]EFI Specifications[/TD]
     			[TD="class: altvalue, width: 65%"]2.40[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]System[/TD]
     			[TD="class: altvalue, width: 65%"] [/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Manufacturer[/TD]
     			[TD="class: altvalue, width: 65%"]Supermicro[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Product Name[/TD]
     			[TD="class: altvalue, width: 65%"]Super Server[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Version[/TD]
     			[TD="class: altvalue, width: 65%"]0123456789[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Serial Number[/TD]
     			[TD="class: altvalue, width: 65%"]0123456789[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]BIOS[/TD]
     			[TD="class: altvalue, width: 65%"] [/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Vendor[/TD]
     			[TD="class: altvalue, width: 65%"]American Megatrends Inc.[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Version[/TD]
     			[TD="class: altvalue, width: 65%"]2.5[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Release Date[/TD]
     			[TD="class: altvalue, width: 65%"]11/26/2020[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Baseboard[/TD]
     			[TD="class: altvalue, width: 65%"] [/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Manufacturer[/TD]
     			[TD="class: altvalue, width: 65%"]Supermicro[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Product Name[/TD]
     			[TD="class: altvalue, width: 65%"]X11SSH-F[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Version[/TD]
     			[TD="class: altvalue, width: 65%"]1.01[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]Serial Number[/TD]
     			[TD="class: altvalue, width: 65%"]ZM17AS029357[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]CPU Type[/TD]
     			[TD="class: altvalue, width: 65%"]Intel Xeon E3-1245 v6 @ 3.70GHz[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]CPU Clock[/TD]
     			[TD="class: altvalue, width: 65%"]3697 MHz [Turbo: 3776.6 MHz][/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]# Logical Processors[/TD]
     			[TD="class: altvalue, width: 65%"]8 (4 enabled for testing)[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]L1 Cache[/TD]
     			[TD="class: altvalue, width: 65%"]4 x 64K (50607 MB/s)[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]L2 Cache[/TD]
     			[TD="class: altvalue, width: 65%"]4 x 256K (22413 MB/s)[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]L3 Cache[/TD]
     			[TD="class: altvalue, width: 65%"]8192K (13610 MB/s)[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Memory[/TD]
     			[TD="class: altvalue, width: 65%"]65356M (8160 MB/s)[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Number of RAM SPDs detected[/TD]
     			[TD="class: altvalue, width: 65%"]4[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]SPD #0[/TD]
     			[TD="class: altvalue, width: 65%"]16GB DDR4 ECC PC4-21300[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]Kingston / 9965684-034.A00G / 03C41ABB[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]19-19-19-43 / 2666 MHz / 1.2V[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]SPD #1[/TD]
     			[TD="class: altvalue, width: 65%"]16GB DDR4 ECC PC4-21300[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]Kingston / 9965684-034.A00G / F6841ABA[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]19-19-19-43 / 2666 MHz / 1.2V[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]SPD #2[/TD]
     			[TD="class: altvalue, width: 65%"]16GB DDR4 ECC PC4-21300[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]Kingston / 9965684-034.A00G / F6841088[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]19-19-19-43 / 2666 MHz / 1.2V[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]SPD #3[/TD]
     			[TD="class: altvalue, width: 65%"]16GB DDR4 ECC PC4-21300[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]Kingston / 9965684-034.A00G / EB84198F[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]19-19-19-43 / 2666 MHz / 1.2V[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Number of RAM slots[/TD]
     			[TD="class: altvalue, width: 65%"]4[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Number of RAM modules[/TD]
     			[TD="class: altvalue, width: 65%"]4[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]DIMM Slot #0[/TD]
     			[TD="class: altvalue, width: 65%"]16GB DDR4 ECC PC4-21300[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]Kingston / 9965684-034.A00G / 03C41ABB[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]2667 MHz[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]DIMM Slot #1[/TD]
     			[TD="class: altvalue, width: 65%"]16GB DDR4 ECC PC4-21300[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]Kingston / 9965684-034.A00G / F6841ABA[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]2667 MHz[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]DIMM Slot #2[/TD]
     			[TD="class: altvalue, width: 65%"]16GB DDR4 ECC PC4-21300[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]Kingston / 9965684-034.A00G / F6841088[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]2667 MHz[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"]DIMM Slot #3[/TD]
     			[TD="class: altvalue, width: 65%"]16GB DDR4 ECC PC4-21300[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]Kingston / 9965684-034.A00G / EB84198F[/TD]
     		[/TR]
    [TR]
    [TD="class: subvalue, width: 35%"] [/TD]
     			[TD="class: altvalue, width: 65%"]2667 MHz[/TD]
     		[/TR]
    [/TABLE]
    So I ran memtest a few times. ECC errors only occur during Test #0 and Test #1.
    The tests I did:
    • all 4 RAM modules
    • only 2 RAM modules in dual-channel configuration (first A, then B)
    • only one RAM module, trying all RAM slots
    Example results:
    All modules:

    Code:
    [B]Result summary[/B]
    
    [TABLE]
    [TR]
    [TD="class: value, width: 35%"]Test Start Time[/TD]
     			[TD="class: altvalue, width: 65%"]2021-06-06 13:13:16[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Elapsed Time[/TD]
     			[TD="class: altvalue, width: 65%"]2:46:42[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Memory Range Tested[/TD]
     			[TD="class: altvalue, width: 65%"]0x0 - 1075800000 (67416MB)[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]CPU Selection Mode[/TD]
     			[TD="class: altvalue, width: 65%"]Parallel (All CPUs)[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]CPU Temperature Min/Max/Ave[/TD]
     			[TD="class: altvalue, width: 65%"]31C/36C/34C[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]RAM Temperature Min/Max/Ave[/TD]
     			[TD="class: altvalue, width: 65%"]52C/62C/57C[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]ECC Polling[/TD]
     			[TD="class: altvalue, width: 65%"]Enabled[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]# Tests Passed[/TD]
     			[TD="class: PASS, width: 65%"]11/11 (100%)[/TD]
     		[/TR]
    [/TABLE]
    [TABLE]
    [TR]
    [TD="class: value, width: 35%"]ECC Correctable Errors[/TD]
     			[TD="class: altvalue, width: 65%"]66[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]ECC Uncorrectable Errors[/TD]
     			[TD="class: altvalue, width: 65%"]0[/TD]
     		[/TR]
    [/TABLE]
    [TABLE]
    [TR]
    [TD="class: header, width: 60%"]Test[/TD]
     			[TD="class: header, width: 20%"]# Tests Passed[/TD]
     			[TD="class: header, width: 20%"]Errors[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 0 [Address test, walking ones, 1 CPU][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 1 [Address test, own address, 1 CPU][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 2 [Address test, own address][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 3 [Moving inversions, ones & zeroes][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 4 [Moving inversions, 8-bit pattern][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 5 [Moving inversions, random pattern][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 6 [Block move, 64-byte blocks][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 7 [Moving inversions, 32-bit pattern][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 8 [Random number sequence][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 9 [Modulo 20, ones & zeros][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 10 [Bit fade test, 2 patterns, 1 CPU][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 13 [Hammer test][/TD]
     			[TD="class: altvalue, width: 20%"]0/0 (0%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [/TABLE]
    [TABLE]
    [TR]
    [TD="class: header"]Last 10 Errors[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1FC00,0), ECC Corrected: Yes, Syndrome: 00FF, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1FC00,8), ECC Corrected: Yes, Syndrome: 0077, Channel/Slot: 0/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1F280,8), ECC Corrected: Yes, Syndrome: 00AC, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1F280,C), ECC Corrected: Yes, Syndrome: 00DB, Channel/Slot: 0/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1E900,8), ECC Corrected: Yes, Syndrome: 00D5, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1E900,8), ECC Corrected: Yes, Syndrome: 0012, Channel/Slot: 0/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1DF80,0), ECC Corrected: Yes, Syndrome: 00E2, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1DF80,8), ECC Corrected: Yes, Syndrome: 00CF, Channel/Slot: 0/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1D600,8), ECC Corrected: Yes, Syndrome: 0041, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1D600,C), ECC Corrected: Yes, Syndrome: 00C5, Channel/Slot: 0/0[/TD]
     		[/TR]
    [/TABLE]
    Single Module:
    Code:
    [B]Result summary[/B]
    
    [TABLE]
    [TR]
    [TD="class: value, width: 35%"]Test Start Time[/TD]
     			[TD="class: altvalue, width: 65%"]2021-06-06 11:19:31[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Elapsed Time[/TD]
     			[TD="class: altvalue, width: 65%"]0:01:01[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]Memory Range Tested[/TD]
     			[TD="class: altvalue, width: 65%"]0x0 - 475800000 (18264MB)[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]CPU Selection Mode[/TD]
     			[TD="class: altvalue, width: 65%"]Parallel (All CPUs)[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]CPU Temperature Min/Max/Ave[/TD]
     			[TD="class: altvalue, width: 65%"]30C/30C/30C[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]RAM Temperature Min/Max/Ave[/TD]
     			[TD="class: altvalue, width: 65%"]50C/50C/50C[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]ECC Polling[/TD]
     			[TD="class: altvalue, width: 65%"]Enabled[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]# Tests Passed[/TD]
     			[TD="class: PASS, width: 65%"]4/4 (100%)[/TD]
     		[/TR]
    [/TABLE]
    [TABLE]
    [TR]
    [TD="class: value, width: 35%"]ECC Correctable Errors[/TD]
     			[TD="class: altvalue, width: 65%"]10[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 35%"]ECC Uncorrectable Errors[/TD]
     			[TD="class: altvalue, width: 65%"]0[/TD]
     		[/TR]
    [/TABLE]
    [TABLE]
    [TR]
    [TD="class: header, width: 60%"]Test[/TD]
     			[TD="class: header, width: 20%"]# Tests Passed[/TD]
     			[TD="class: header, width: 20%"]Errors[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 0 [Address test, walking ones, 1 CPU][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 1 [Address test, own address, 1 CPU][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 2 [Address test, own address][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [TR]
    [TD="class: value, width: 60%"]Test 3 [Moving inversions, ones & zeroes][/TD]
     			[TD="class: altvalue, width: 20%"]1/1 (100%)[/TD]
     			[TD="class: altvalue, width: 20%"]0[/TD]
     		[/TR]
    [/TABLE]
    [TABLE]
    [TR]
    [TD="class: header"]Last 10 Errors[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,1F200,18), ECC Corrected: Yes, Syndrome: 0063, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,1CC00,8), ECC Corrected: Yes, Syndrome: 00DD, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,1A600,10), ECC Corrected: Yes, Syndrome: 00FF, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,18000,8), ECC Corrected: Yes, Syndrome: 00F9, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,15A00,8), ECC Corrected: Yes, Syndrome: 007F, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,13400,10), ECC Corrected: Yes, Syndrome: 009E, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,10E00,8), ECC Corrected: Yes, Syndrome: 00E5, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,10000,8), ECC Corrected: Yes, Syndrome: 003C, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,1BA00,0), ECC Corrected: Yes, Syndrome: 0050, Channel/Slot: 1/0[/TD]
     		[/TR]
    [TR]
    [TD="class: value"][ECC Error] Test: 0, (Rank,Bank,Row,Col): (2,0,10000,0), ECC Corrected: Yes, Syndrome: 00CE, Channel/Slot: 1/0[/TD]
     		[/TR]
    [/TABLE]
    The above errors occur for all modules, regardless of which slot they are seated in. I think the problem is not the RAM; maybe the CPU or the mainboard is fried.

    Any ideas?
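
To compare runs across modules and slots, the "Last 10 Errors" lines from the memtest reports can be grouped by test number and by channel/slot. A rough sketch follows; the regular expression is an assumption derived from the error lines quoted in this post, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed format, taken from the "[ECC Error]" lines in the reports above.
ERR_RE = re.compile(
    r"\[ECC Error\] Test: (\d+), \(Rank,Bank,Row,Col\): "
    r"\((\d+),(\d+),([0-9A-Fa-f]+),([0-9A-Fa-f]+)\), "
    r"ECC Corrected: (Yes|No), Syndrome: ([0-9A-Fa-f]+), "
    r"Channel/Slot: (\d+)/(\d+)"
)

# A few lines from the all-modules run.
ALL_MODULES = [
    "[ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1FC00,0), ECC Corrected: Yes, Syndrome: 00FF, Channel/Slot: 1/0",
    "[ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1FC00,8), ECC Corrected: Yes, Syndrome: 0077, Channel/Slot: 0/0",
    "[ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1F280,8), ECC Corrected: Yes, Syndrome: 00AC, Channel/Slot: 1/0",
    "[ECC Error] Test: 1, (Rank,Bank,Row,Col): (0,0,1F280,C), ECC Corrected: Yes, Syndrome: 00DB, Channel/Slot: 0/0",
]

# Two lines from the single-module run.
SINGLE_MODULE = [
    "[ECC Error] Test: 1, (Rank,Bank,Row,Col): (2,0,1BA00,0), ECC Corrected: Yes, Syndrome: 0050, Channel/Slot: 1/0",
    "[ECC Error] Test: 0, (Rank,Bank,Row,Col): (2,0,10000,0), ECC Corrected: Yes, Syndrome: 00CE, Channel/Slot: 1/0",
]


def summarize(lines):
    """Return (errors per test number, errors per (channel, slot))."""
    by_test = Counter()
    by_channel_slot = Counter()
    for line in lines:
        m = ERR_RE.search(line)
        if m is None:
            continue  # skip table markup and anything that is not an error line
        by_test[int(m.group(1))] += 1
        by_channel_slot[(int(m.group(8)), int(m.group(9)))] += 1
    return by_test, by_channel_slot


if __name__ == "__main__":
    print(summarize(ALL_MODULES))
    print(summarize(SINGLE_MODULE))
```

Because the pattern is applied with `search`, the lines can be passed in with the surrounding table markup still attached.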

  • #2
    There is a known BIOS bug with i3200 chipsets
    https://bugzilla.redhat.com/show_bug.cgi?id=564274
    The final comment was: "Some i3210 BIOSes have problems enabling the hardware checks at the MCU. On those hardware, customers should try to disable Quickboot and / or "Memory Remap Feature" or to disable EDAC drivers."

    This isn't the exact model you have, but the behaviour sounds similar.

    • #3
      Hello David,

      Thanks for your quick reply. The 4x runs without the #13 Hammer Test finished without errors. After that, I disabled Fastboot and Memory Remap in the BIOS and tried a short run of Test #0 + #1; the same errors still occur. So I am quite uncertain how relevant they are for the stable operation of the system. The behaviour sounds similar to the bug report, but the server is now around three years old and I never noticed these errors before, which is a bit strange.
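
One way to check from the running Linux system whether a BIOS change made any difference is to read the EDAC error counters before and after. The sketch below assumes the standard EDAC sysfs layout (`/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count`) and that the EDAC driver is loaded; `read_edac_counts` is a hypothetical helper name, and the `root` parameter exists only so the function can be pointed at another directory.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {"mc0": {"ce_count": int, "ue_count": int}, ...}.

    Returns an empty dict when the directory does not exist,
    for example when no EDAC driver is loaded.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

Running it once right after boot and again after some uptime shows whether the counters keep growing or whether the errors only appear during early initialization.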

      • #4
        Sorry for double-posting, but I can't edit the previous post anymore.

        I replaced the PSU and the CMOS battery on the failed system, then booted it with all drives disconnected. The ECC errors are still appearing. My workstation supports ECC RAM (Ryzen 5950X), so I swapped the RAM between my workstation and the server. On the workstation with the "failed" RAM from the server there are no errors so far, and the non-ECC (workstation) RAM in the server is currently testing fine, too. So I am just more confused. The CPU or the RAM controller on the server may be faulty, or there is an incompatibility between the server's RAM and BIOS, but that would be strange after three years of operation without problems...
