Hi! I could really do with some practical advice...
I ran memtest86 5.1.0 on a machine that has been crashing fairly regularly - once every two to three weeks, roughly. It reported multiple bit fade errors on its first two runs (1, 2) with all four DIMMs plugged in.
I then tested each pair of DIMMs separately, and then all four DIMMs back again in their original sockets, testing 10 full passes each time - but all tests ran perfectly and without errors. Those bit fade errors seem to have vanished.
Here's the trouble: The machine is deployed as part of a voluntary / community project - it needs to run reliably, 24/7, with no technical support on hand. (And yes, I'm aware that a server class machine with ECC memory would be much more appropriate!) There's literally no budget to replace the memory or entire machine. The memory has a lifetime warranty, but given that I can't narrow down which stick has the intermittent fault, I'm not going to have any luck claiming warranty replacement for all four DIMMs on the basis of failed tests that can no longer be replicated.
Can anyone offer any practical advice on what I should do now? How many successful tests would legitimise simply ignoring the former bit fade errors? Is there anything else I can do to try to narrow down which DIMM is faulty, or even if it's the motherboard, etc?
Any hints would be greatly appreciated.
Cheers!