Hello,
I just purchased a new server motherboard, CPU, and memory combo, so I am running Memtest before deploying an OS on it. I believe there is an issue with the RAM I purchased, because I saw some ECC errors within the first 10 minutes of testing. I pulled those modules and installed a set of known-good RAM from another system to confirm that the CPU and motherboard are fine. The known-good modules survived 4 passes of Memtest in the system they normally live in.
During test #8 on the known-good RAM, I saw an error I had never seen before: "[UEFI Firmware Error] Could not start CPU X". It was reported twice, for CPUs 8 and 9. I let the tests run until test #13 and then stopped them. I saved the logs and copied them to my desktop. The HTML log is provided below, and the .log file is attached to this post. Looking through the .log file, it seems many related errors were logged for other CPU cores during this test that did not appear in the UI.
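To compare what the .log file recorded against what the UI showed, a small script could tally the failures per CPU. This is only a sketch: it assumes the log lines reuse the UI wording "Could not start CPU <n>", which I have not confirmed, and the function names are made up for illustration. The BOM check is there in case the log is saved as UTF-16 rather than UTF-8.

```python
import re
from collections import Counter

# Assumption: the .log file uses the same wording as the UI message,
# i.e. "Could not start CPU <number>". Adjust the pattern if it differs.
CPU_ERROR = re.compile(r"Could not start CPU (\d+)")


def read_log_lines(path):
    """Read a log file as text, guessing UTF-16 if a byte-order mark is present."""
    with open(path, "rb") as fh:
        raw = fh.read()
    encoding = "utf-16" if raw[:2] in (b"\xff\xfe", b"\xfe\xff") else "utf-8"
    return raw.decode(encoding, errors="replace").splitlines()


def count_cpu_start_errors(lines):
    """Return a Counter mapping CPU number to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = CPU_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Calling `count_cpu_start_errors(read_log_lines(path))` on the attached log would then show which CPUs were affected and how often, which may be easier to read than the raw file.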
After reading through the available documentation and doing some googling, I believe I understand the gist of the issue: a bug in the UEFI firmware is interfering with this test. Still, the results are a bit worrying on a system that is new to me (the motherboard is brand new, the CPU is used, and the RAM installed when these errors occurred is the known-good set). I found very few reports from others who have experienced this problem, so I have a few questions:
1. Would this UEFI firmware bug cause system instability when running a traditional OS?
2. Could Memtest actually be detecting a hardware fault here that is being misreported as a UEFI bug?
3. I am now running Memtest again on 4 of the 8 new RAM modules to identify the faulty one(s). This first pass just made it past test #8 without any firmware errors appearing in the UI. Does it make sense that this bug is more likely to show up when all 8 RAM slots are populated than when only half of them are?
4. Is there any additional information I could provide or tests that I could run that would help us all understand this issue better? I am happy to help however else I can.
I will be running more tests with the RAM I purchased as I try to identify the faulty modules, and will follow up with the results of those runs.
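For anyone following along, the halving approach I am using to isolate the bad modules can be sketched roughly as below. It assumes the errors follow the module rather than the slot or memory channel, and that a run with a faulty module installed reliably reports errors. The `has_errors` callback and the `new-1` to `new-8` labels are hypothetical stand-ins for a manual Memtest run with only those modules installed.

```python
def find_faulty(modules, has_errors):
    """Isolate faulty modules by repeatedly halving any group that fails.

    modules:    list of module labels.
    has_errors: callable taking a list of labels and returning True if a
                test run with only those modules installed reported errors.
    """
    if not modules or not has_errors(modules):
        return []  # this group is clean, nothing to split
    if len(modules) == 1:
        return list(modules)  # a failing group of one is a faulty module
    mid = len(modules) // 2
    # Recurse into both halves so more than one faulty module can be found.
    return (find_faulty(modules[:mid], has_errors)
            + find_faulty(modules[mid:], has_errors))


# Simulated example: pretend one of the eight new modules is bad.
modules = [f"new-{i}" for i in range(1, 9)]  # hypothetical labels
bad = {"new-3"}                               # made-up fault for the demo
print(find_faulty(modules, lambda group: any(m in bad for m in group)))
```

In practice each call to `has_errors` is a full Memtest run, so the split sizes also have to respect whatever DIMM population rules the board requires.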
Summary
Report Date | 2023-12-03 18:12:00 |
Generated by | MemTest86 V10.6 Free (64-bit) Visit MemTest86.com to Upgrade to Pro |
Result | INCOMPLETE PASS |
EFI Specifications | 2.70 |
System | |
Manufacturer | To Be Filled By O.E.M. |
Product Name | ROMED8U-2T |
Version | To Be Filled By O.E.M. |
Serial Number | To Be Filled By O.E.M. |
BIOS | |
Vendor | American Megatrends Inc. |
Version | P3.40 |
Release Date | 09/26/2022 |
Baseboard | |
Manufacturer | ASRockRack |
Product Name | ROMED8U-2T |
Version | |
Serial Number | BR8PFB000800047 |
CPU Type | AMD EPYC 7282 16-Core |
CPU Clock | 2800 MHz [Turbo: 3200.3 MHz] |
# Logical Processors | 32 (16 enabled for testing) |
L1 Cache | 32 x 64K (151591 MB/s) |
L2 Cache | 32 x 512K (61580 MB/s) |
L3 Cache | 1 x 65536K (13465 MB/s) |
Memory | 262034M (14190 MB/s) |
RAM Configuration | DDR4 ECC 3200MT/s / x16 Channel / 24-22-22-52 / 1.200V |
Number of RAM SPDs detected | 0 |
Number of RAM slots | 8 |
Number of RAM modules | 8 |
DIMM A1 | 32GB DDR4 2Rx8 ECC PC4-25600 |
Vendor Part Info | Micron Technology / 18ASF4G72PDZ-3G2F1 / 3614D2C9 |
SMBIOS Profile | 3200MT/s 1.2V |
DIMM B1 | 32GB DDR4 2Rx8 ECC PC4-25600 |
Vendor Part Info | Micron Technology / 18ASF4G72PDZ-3G2E1 / 27B8D90B |
SMBIOS Profile | 3200MT/s 1.2V |
DIMM C1 | 32GB DDR4 2Rx8 ECC PC4-25600 |
Vendor Part Info | Micron Technology / 18ASF4G72PDZ-3G2E1 / 27B8D926 |
SMBIOS Profile | 3200MT/s 1.2V |
DIMM D1 | 32GB DDR4 2Rx8 ECC PC4-25600 |
Vendor Part Info | Micron Technology / 18ASF4G72PDZ-3G2F1 / 3614CF94 |
SMBIOS Profile | 3200MT/s 1.2V |
DIMM E1 | 32GB DDR4 2Rx8 ECC PC4-25600 |
Vendor Part Info | Micron Technology / 18ASF4G72PDZ-3G2F1 / 3614CB66 |
SMBIOS Profile | 3200MT/s 1.2V |
DIMM F1 | 32GB DDR4 2Rx8 ECC PC4-25600 |
Vendor Part Info | Micron Technology / 18ASF4G72PDZ-3G2F1 / 3614D326 |
SMBIOS Profile | 3200MT/s 1.2V |
DIMM G1 | 32GB DDR4 2Rx8 ECC PC4-25600 |
Vendor Part Info | Micron Technology / 18ASF4G72PDZ-3G2E1 / 27B8D924 |
SMBIOS Profile | 3200MT/s 1.2V |
DIMM H1 | 32GB DDR4 2Rx8 ECC PC4-25600 |
Vendor Part Info | Micron Technology / 18ASF4G72PDZ-3G2E1 / 27B8D920 |
SMBIOS Profile | 3200MT/s 1.2V |
Test Start Time | 2023-12-03 15:57:58 |
Elapsed Time | 2:13:49 |
Memory Range Tested | 0x0 - 7FC04000000 (8372288MB) |
CPU Selection Mode | Parallel (All CPUs) |
CPU Temperature Min/Max/Ave | 32C/48C/41C |
ECC Polling | Enabled |
# Tests Completed | 11/48 (22%) |
# Tests Passed | 11/11 (100%) |
Test | # Tests Passed | Errors |
Test 0 [Address test, walking ones, 1 CPU] | 1/1 (100%) | 0 |
Test 1 [Address test, own address, 1 CPU] | 1/1 (100%) | 0 |
Test 2 [Address test, own address] | 1/1 (100%) | 0 |
Test 3 [Moving inversions, ones & zeroes] | 1/1 (100%) | 0 |
Test 4 [Moving inversions, 8-bit pattern] | 1/1 (100%) | 0 |
Test 5 [Moving inversions, random pattern] | 1/1 (100%) | 0 |
Test 6 [Block move, 64-byte blocks] | 1/1 (100%) | 0 |
Test 7 [Moving inversions, 32-bit pattern] | 1/1 (100%) | 0 |
Test 8 [Random number sequence] | 1/1 (100%) | 0 |
Test 9 [Modulo 20, ones & zeros] | 1/1 (100%) | 0 |
Test 10 [Bit fade test, 2 patterns, 1 CPU] | 1/1 (100%) | 0 |
Test 13 [Hammer test] | 0/0 (0%) | 0 |