Hi all,
I'm new here. I've built several personal PCs in the past, but this is my first time using Memtest86, and it's on a workstation-like machine that I just finished building. Everything in the machine is new. The only parts that came from eBay were the dual Epyc CPUs (from a reputable seller who states that the CPUs were not used) and the 2 RAM sticks of 64GB each (from another reputable seller who also stated the RAM is brand new). Regardless, I thought it'd be prudent to do thorough testing after building the machine to make sure everything is OK.
It all posted successfully on first try, and I've barely used the machine - the only thing I changed was the RAM speed setting in the BIOS from "auto" to "3200", which is the supported/stated speed of both the RAM and the CPUs/motherboard. I also made sure to have all fans set to full speed.
I've tried to run Memtest86 twice now, and both times the test was unable to complete: the system just rebooted before it was over (I caught it as it happened the second time). In the second attempt, it had 4 errors during pass 3 / 4 (I believe on tests 7 and 8). It was probably on test 10 of pass 3 / 4 that the whole thing stopped entirely and rebooted.
Here's a snippet of Memtest86 after the initial errors, but before it all rebooted:
I unfortunately don't have the errors themselves, but do remember they were in the format of:
Test 8, CPU 15, Address: XXXXX, Expected: XXXXX, Actual: XXXXXX
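For anyone working from error lines like this, here is a minimal Python sketch of how the Expected and Actual values could be compared to see which bits differ. It assumes the line format as I remember it above (the real Memtest86 log layout may differ) and that the values are hexadecimal. The function name `flipped_bits` and the sample values in the usage note are hypothetical, not taken from my run.

```python
import re

# Pattern for the error-line format as remembered in this post.
# Assumption: the actual Memtest86 log layout may differ from this.
ERROR_RE = re.compile(
    r"Test\s+(?P<test>\d+),\s*CPU\s+(?P<cpu>\d+),\s*"
    r"Address:\s*(?P<addr>[0-9A-Fa-f]+),\s*"
    r"Expected:\s*(?P<expected>[0-9A-Fa-f]+),\s*"
    r"Actual:\s*(?P<actual>[0-9A-Fa-f]+)"
)


def flipped_bits(line):
    """Return the bit positions where Expected and Actual differ.

    Returns None if the line does not match the assumed format.
    Bit 0 is the least significant bit.
    """
    match = ERROR_RE.search(line)
    if match is None:
        return None
    # XOR leaves a 1 in every position where the two values disagree.
    diff = int(match["expected"], 16) ^ int(match["actual"], 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]
```

With a made-up line such as `Test 8, CPU 15, Address: 1A2B3C, Expected: FFFFFFFF, Actual: FFFFFFEF`, this returns `[4]`, meaning a single flipped bit. If the same bit position showed up across several errors, that might point at one data line or chip rather than random corruption.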
OK, so there are some unexpected and annoying errors, but I'm not sure how to deal with Memtest86 not being able to finish at all. It's also slow: it was 18-19 hours in, on pass 3 / 4, before it stopped on its own, and I plan on filling this machine up to 1TB eventually.
From the IPMI, you can see when the initial errors happened on tests 7 and 8 (18:20:34), and then when the error(s) happened on likely test 10 that caused the system to reboot (20:33:56). Further below you can also see the health status for the memory sticks, though they went back to green for both sticks after the reboot.
I am also attaching the log that I found on the USB flash drive after the reboot, MemTest86-20240701-022233.txt, with a date modified of 8:32pm (so presumably right before it crashed), but I'm not sure what to look for in it, if anything. No results were output anywhere because Memtest86 never got to the end.
CPU temps and whatnot were at most ~53C but stayed in the mid-40s most of the time, including shortly before the reboot. The CPUs idle at around 32-35C and I have all fans set to full speed for this. I'm fairly certain CPU overheating was not a problem. I also opened it up right after the reboot and did not feel major heat anywhere in the system.
And this might be unrelated, but after the reboot the following 2 errors showed up:
"Entry Point Not Found - The procedure entry point GetTempPath2W could not be located in the dynamic link library C:\Windows\system32\spool\DRIVERS\x64\3\FXSTIFF.dll"
"Entry Point Not Found - The procedure entry point __CxxFrameHandler4 could not be located in the dynamic link library C:\Windows\system32\spool\DRIVERS\x64\3\FXSUI.dll"
Lastly, I am not sure what these represent, but quite a few of them show up even though they don't increase the actual error count. So I assumed this was ok? I do have 14 empty RAM slots and have just 1 stick per CPU.
[ECC Errors] Test: 4 Channel-Slot: 0-X
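Since quite a few of these lines show up, here is a minimal Python sketch of how they could be tallied per test and channel-slot from the log. It assumes the lines look exactly like the one above. The helper name `tally_ecc` is hypothetical, and the log file's text encoding is an assumption that may need adjusting.

```python
import re
from collections import Counter

# Matches lines in the form shown above, e.g.
# "[ECC Errors] Test: 4 Channel-Slot: 0-X"
ECC_RE = re.compile(r"\[ECC Errors\]\s*Test:\s*(\d+)\s+Channel-Slot:\s*(\S+)")


def tally_ecc(lines):
    """Count ECC error lines, keyed by (test number, channel-slot)."""
    counts = Counter()
    for line in lines:
        match = ECC_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

It could be run over the attached log with something like `tally_ecc(open("MemTest86-20240701-022233.txt", errors="replace"))`. A tally dominated by one channel-slot would suggest looking at that stick or slot first.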
Specs:
Motherboard | Supermicro H12DSI-N6 |
CPU 1 | AMD Epyc 7532 |
CPU 2 | AMD Epyc 7532 |
CPU Fan 1 | Arctic Freezer 4U-M |
CPU Fan 2 | Arctic Freezer 4U-M |
RAM | Hynix DDR4 64GB 3200 RDIMM PC4-25600 RDIMM ECC Registered (2x64GB for 128GB total) |
Storage | Crucial P3 Plus 4TB SSD |
Storage | WD Ultrastar DC HC550 16TB HDD |
PSU | Corsair HX1500i 80 Plus Platinum 1500W |
OS | Windows 11 Enterprise |
I'd greatly appreciate any help or advice, if anyone has any ideas what might be happening.