We test storage-server platforms in manufacturing, using an automated Linux-based test server to PXE boot the DUT. Each DUT has an AMD EPYC processor, DDR4 ECC DIMMs (typically 8 or 16 populated DIMM slots), and a number of PCIe-connected devices and drives. We receive RMAs for our server products with reported memory ECC errors, at a rate of 1-2 corrections per hour. When we retest these units with the burn-in test over 24 hours, they are typically no-fault-found (NFF). We can, however, cause and detect ECC errors with an older memory benchmark tool such as STREAM. All of the ECC errors are correctable.
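To compare correction rates between a STREAM run and a burn-in run, the counts can be sampled before and after each test. Below is a minimal sketch, assuming the DUT boots Linux with an EDAC driver loaded (for example `amd64_edac`) so that the kernel exposes per-memory-controller `ce_count` files under `/sys/devices/system/edac/mc`. The function names `read_ce_counts` and `ce_rate_per_hour` are illustrative and not part of any existing tool.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location; only present when an EDAC driver is loaded (assumption).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ce_counts(root=EDAC_MC_ROOT):
    """Return {controller_name: corrected_error_count} for each mcN directory."""
    counts = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        ce_file = mc_dir / "ce_count"
        if ce_file.is_file():
            counts[mc_dir.name] = int(ce_file.read_text().strip())
    return counts


def ce_rate_per_hour(before, after, elapsed_seconds):
    """Corrected errors per hour, per controller, between two snapshots."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    hours = elapsed_seconds / 3600.0
    # A controller missing from the first snapshot is treated as starting at zero.
    return {mc: (after[mc] - before.get(mc, 0)) / hours for mc in after}
```

Taking one snapshot with `read_ce_counts()` at the start of a test and another at the end, then passing both to `ce_rate_per_hour` with the elapsed time, gives a per-controller figure that can be set against the 1-2 per hour reported from the field.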
Does anyone have a preferred configuration for burn-in test that would favor stressing/causing ECC errors? We run the default Cyclic Test today.