Memtest 7.5 Hammer errors always at 0x50000, bug in memtest?


  • Memtest 7.5 Hammer errors always at 0x50000, bug in memtest?

    Hi,

    I am already in contact with PassMark support by email because the SPD cannot be read by the PassMark software (or from Linux).

    But I think this problem should be discussed in a forum.

    I always get these errors:
    Test Start Time 2018-06-25 09:12:30
    Elapsed Time 0:04:00
    Memory Range Tested 0x0 - 26E800000 (9960MB)
    CPU Selection Mode Parallel (All CPUs)
    ECC Polling Enabled
    # Tests Passed 0/1 (0%)
    Lowest Error Address 0x50000 (0MB)
    Highest Error Address 0x50214 (0MB)
    Bits in Error Mask 00000000FFFFFFFF
    Bits in Error 32
    Max Contiguous Errors 2
    Test # Tests Passed Errors
    Test 13 [Hammer test] 0/1 (0%) 2010
    Last 10 Errors
    [Data Error] Test: 13, CPU: 0, Address: 50214, Expected: 168571FD, Actual: 16000000
    [Data Error] Test: 13, CPU: 0, Address: 50210, Expected: 2F783C42, Actual: 008792A3
    [Data Error] Test: 13, CPU: 0, Address: 5020C, Expected: 3C0B9A2B, Actual: 17000000
    [Data Error] Test: 13, CPU: 0, Address: 50208, Expected: A8DE6188, Actual: 0083BDAF
    [Data Error] Test: 13, CPU: 0, Address: 50204, Expected: BE0A6069, Actual: 53000000
    [Data Error] Test: 13, CPU: 0, Address: 50200, Expected: 01C2121E, Actual: 00000000
    [Data Error] Test: 13, CPU: 0, Address: 501FC, Expected: B552B837, Actual: 02000000
    [Data Error] Test: 13, CPU: 0, Address: 501F8, Expected: 389FA784, Actual: 00820D50
    [Data Error] Test: 13, CPU: 0, Address: 501F4, Expected: 82CFD915, Actual: 00000000
    [Data Error] Test: 13, CPU: 0, Address: 501F0, Expected: CFB00F3A, Actual: 00000000

    The result is almost identical on every run; only a few of the actual values and some of the expected values differ.
    That could mean something else is writing data to this area.
    Could it be that the BIOS or memtest86 itself uses this memory area, but the hammer test does not know about that?

    I already tried:
    - using different brand and size RAM
    - using any combination of slots
    - using any combination of modules (except mixing different modules or using two modules in single channel mode)
    - updating BIOS

    Gigabyte says the CPU might be faulty because that's where the memory controller is. I doubt that.

    Hardware:
    Core i3-8100
    Gigabyte B360M D3H with B360 chip and latest BIOS F4
    400 W beQuiet power supply
    2x8GB Corsair CMK16GX4M2A2400C16R from QVL
    2x4GB G.Skill F4-2400C15D-8GNT

    I tried doing a bitwise AND / XOR / OR over all addresses affected. But I am not sure if that means anything.

    AND: 0x50000
    XOR: 0x4
    OR: 0x503fc
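
The reduction above can be reproduced with a short script. This is a minimal sketch in Python that uses only the ten addresses from the "Last 10 Errors" list (the report counts 2010 errors in total, and the remaining addresses are not shown here); the variable names are illustrative.

```python
from functools import reduce
from operator import and_, or_, xor

# The ten addresses shown under "Last 10 Errors" in the report.
addresses = [
    0x50214, 0x50210, 0x5020C, 0x50208, 0x50204,
    0x50200, 0x501FC, 0x501F8, 0x501F4, 0x501F0,
]

and_all = reduce(and_, addresses)  # bits set in every address
or_all = reduce(or_, addresses)    # bits set in at least one address
xor_all = reduce(xor, addresses)   # bits set in an odd number of addresses

print(f"AND: {and_all:#x}")
print(f"XOR: {xor_all:#x}")
print(f"OR:  {or_all:#x}")
```

The AND result is the part shared by all addresses and no address can exceed the OR result, so together they suggest the errors sit in one small region starting at 0x50000. The XOR depends on how many addresses are included and probably says little on its own.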

    what now?

    thanks

  • #2
    I see that ECC RAM is supported by this CPU, but you don't seem to be using ECC-compatible RAM? Using ECC RAM would tell you immediately whether it was a RAM error or something else.

    Errors in the hammer test are normally 1 or 2 bit errors. And the addresses are normally more random and less sequential. So your pattern of errors and the low memory address would point to another source.
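
One way to check this against the report is to XOR each expected value with the actual value and count the differing bits. A minimal sketch in Python, using the ten entries from the "Last 10 Errors" list in the first post; the names are illustrative.

```python
# (address, expected, actual) taken from the "Last 10 Errors" list.
errors = [
    (0x50214, 0x168571FD, 0x16000000),
    (0x50210, 0x2F783C42, 0x008792A3),
    (0x5020C, 0x3C0B9A2B, 0x17000000),
    (0x50208, 0xA8DE6188, 0x0083BDAF),
    (0x50204, 0xBE0A6069, 0x53000000),
    (0x50200, 0x01C2121E, 0x00000000),
    (0x501FC, 0xB552B837, 0x02000000),
    (0x501F8, 0x389FA784, 0x00820D50),
    (0x501F4, 0x82CFD915, 0x00000000),
    (0x501F0, 0xCFB00F3A, 0x00000000),
]

# Number of bits that differ between expected and actual, per error.
flipped = [bin(expected ^ actual).count("1") for _, expected, actual in errors]

for (address, _, _), count in zip(errors, flipped):
    print(f"{address:#x}: {count} bits differ")
```

For these ten entries every word differs in far more than two bits, which fits the point that this does not look like a typical hammer failure.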

    We aren't aware of any bugs in the test that would cause this.

    I would be tempted to blame a bug in the BIOS memory map, resulting in memory being tested that shouldn't be. But if that were the case, one would expect other tests to produce errors as well. Or maybe they would: you only seem to be running Test #13, which would explain the lack of errors in other tests.





    • #3
      Thanks.
      I believe the board does not support ECC. You may insert ECC memory, but it won't use ECC.
      None of the other tests show any errors, and none of the other memory test software I tried shows any errors either.

      After a lot of trial and error I think you are right. It's a memory area used by the BIOS but not reported. I added it to BADRAM, so that Linux knows that it should not use this area.
      But if it's an area used by the BIOS, why does only the hammer test fail?
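
GRUB's badram filter takes address,mask pairs, where a mask bit of 1 means the corresponding address bit has to match. The values actually used in this post are not given, so the following is only a rough sketch in Python of how the smallest aligned power-of-two block covering the reported error range could be derived. The function name `badram_pair` and the 64-bit mask width are assumptions for illustration; check the GRUB documentation for the format your version expects.

```python
def badram_pair(lowest, highest, width=64):
    """Return (base, mask) for the smallest aligned power-of-two block
    that contains both addresses. Hypothetical helper, not part of GRUB."""
    # Bits above the highest differing bit are common to both addresses.
    differing_bits = (lowest ^ highest).bit_length()
    mask = ((1 << width) - 1) & ~((1 << differing_bits) - 1)
    base = lowest & mask
    return base, mask


# Lowest and highest error address from the MemTest86 report.
base, mask = badram_pair(0x50000, 0x50214)

# Line in the style of /etc/default/grub (assumed format: address,mask).
print(f'GRUB_BADRAM="{base:#x},{mask:#x}"')
```

Linux handles memory in pages, so the kernel will most likely exclude at least the whole page containing this block rather than the block alone.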

      Gigabyte provided a BIOS update which makes memtest86 halt (crash) at 0% of test 2, at 0% of test 3 (if test 2 is disabled), or in test 13 (if all previous tests are disabled). Great. Everything else is still fine.

      I tried to convince Gigabyte support to check their memory map, but, well, the support staff don't understand a word of what I am writing.

      The entry in the Linux BIOS map regarding this area seems to come from my GRUB BADRAM parameter. So it's no proof that the BIOS memory map INT delivers different data than UEFI GetMemoryMap (well, actually the map changes a lot, probably because of BIOS settings, but not regarding the area in question).
