
Memory failure a few months after previous RMA



    I previously had a memory problem, described here. The RMA was successful, although it took more than a month to get the memory replaced, and I received a brand-new pair of memory sticks (manufactured in May 2022). I didn't run a full MemTest86 pass when they arrived, but the first hour showed no errors, so I went ahead, reinstalled my OS and wiped my drives to get rid of the corruption caused by the previous modules.
    Everything ran perfectly from May 23rd (when I received the RMA replacement) until 2 days ago. Close to the end of the work day, multiple applications started crashing, with segfaults in the system logs. That night I left MemTest86 running, and the results are simply shocking:

    [MemTest86 result screenshots: 2022-07-13 10-06-42.jpg, 2022-07-13 10-06-58.jpg, 2022-07-13 10-07-10.jpg]
    After another day of use yesterday, in addition to the crashing apps, my filesystem error count also started climbing (it had stayed at zero for almost 2 months).

    I'll certainly be contacting the memory vendor again, but this is quite a baffling situation. How can a brand-new set of memory get "damaged/corrupted" in less than two months? Does this indicate a problem with the motherboard or CPU?

    I'm not overclocking anything beyond having the XMP profile enabled; the CPU runs at stock and PBO is not enabled.

    Any help or advice is appreciated.

  • #2
    It does look like a memory fault (a small address range and a small number of bits in error).

    So maybe you were just really unlucky. Or maybe something is electrically damaging the RAM (power spikes, static electricity, corrosion on the pins or sockets).
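    To make the heuristic above concrete, here is a minimal Python sketch that summarizes a list of error records by address span and by which bit positions flipped. The `MemError` record and the `summarize()` / `flipped_bits()` helpers are hypothetical, not part of MemTest86; the sketch only assumes that each error gives a failing address plus the expected and actual values, which is roughly what a memory tester's error list shows.

```python
from dataclasses import dataclass


@dataclass
class MemError:
    """One reported error (hypothetical record layout, not a MemTest86 API)."""
    address: int   # physical address of the failing location
    expected: int  # test pattern that was written
    actual: int    # value that was read back


def flipped_bits(expected: int, actual: int) -> set[int]:
    """Return the bit positions that differ between expected and actual."""
    diff = expected ^ actual  # set bits mark the positions that flipped
    return {pos for pos in range(diff.bit_length()) if (diff >> pos) & 1}


def summarize(errors: list[MemError]) -> dict:
    """Summarize how spread out a set of memory errors is."""
    if not errors:
        return {"count": 0, "lowest": None, "highest": None, "span": 0, "bits": set()}
    addrs = [e.address for e in errors]
    bits: set[int] = set()
    for e in errors:
        bits |= flipped_bits(e.expected, e.actual)
    return {
        "count": len(errors),
        "lowest": min(addrs),
        "highest": max(addrs),
        "span": max(addrs) - min(addrs),  # bytes between first and last failing address
        "bits": bits,                     # every bit position that ever flipped
    }
```

    A small span and a handful of distinct bit positions point toward a localized defect (one chip or one region of a module) rather than errors scattered across all of memory, but this is a rule of thumb, not a diagnosis.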



    • #3
      Originally posted by David (PassMark)
      It does look like a memory fault (a small address range and a small number of bits in error).

      So maybe you were just really unlucky. Or maybe something is electrically damaging the RAM (power spikes, static electricity, corrosion on the pins or sockets).

      The pins look fine on visual inspection, and so do the modules. I swapped the modules: the error count dropped slightly to ~2800 and the system is a bit more stable. The weird part is that the memory error address didn't change, which, IIUC, it should have, since the address range now starts on a different module. So maybe it's a slot issue? I can't try a different slot combination in dual-channel mode because my CPU cooler blocks the memory slot closest to the CPU.



      • #4
        "I've swapped the modules ... the memory error address didn't change,"

        In dual-channel mode, memory addresses are interleaved between the channels (in a complex fashion), so swapping the order of the modules won't change the faulty memory address much at all.

        Can you borrow a known-good set of RAM to test? Or test one stick at a time.
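        A toy model can show why swapping the two modules barely moves the reported address. The Python sketch below rests on a deliberately simplified assumption: consecutive 64-byte cache lines alternate between channel 0 and channel 1. The helper names `to_channel()` and `from_channel()` are hypothetical, and real memory controllers usually hash several address bits together, so this is not how any particular board maps addresses.

```python
# Simplified dual-channel interleaving model (an assumption for illustration):
# consecutive 64-byte cache lines alternate between channel 0 and channel 1.
# Real memory controllers typically hash several address bits together, so the
# exact mapping on any given board will differ.
LINE = 64  # assumed interleave granularity in bytes


def to_channel(addr: int) -> tuple[int, int]:
    """Split a physical address into (channel, address inside that channel's module)."""
    line, byte = divmod(addr, LINE)
    channel = line % 2                       # even lines -> channel 0, odd -> channel 1
    module_addr = (line // 2) * LINE + byte  # position within the module on that channel
    return channel, module_addr


def from_channel(channel: int, module_addr: int) -> int:
    """Inverse of to_channel: rebuild the physical address."""
    line_in_module, byte = divmod(module_addr, LINE)
    return (line_in_module * 2 + channel) * LINE + byte


if __name__ == "__main__":
    bad_cell = 0x12345678  # hypothetical defective location inside one module
    before = from_channel(0, bad_cell)  # module sits in the channel-0 slot
    after = from_channel(1, bad_cell)   # same module moved to the channel-1 slot
    print(f"before swap: {before:#x}, after swap: {after:#x}, moved {after - before} bytes")
```

        Under this model the same defective cell shows up exactly 64 bytes away after the swap, which in a MemTest86 report looks like practically the same address. On real hardware the offset depends on the controller's mapping, so treat the number as illustrative only.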
