Address decoding scheme


  • Address decoding scheme

    Any advice on determining the address decoding scheme for a particular server? We’re using an HPE ProLiant DL380 Gen9 as our host.
    Additionally, if we are able to determine the proper values for the ADDR2XXBITS parameters, could you provide an example of the output MemTest would report?

  • #2
    We don't have example values for ADDR2CHBITS, ADDR2SLBITS, etc. for your system. Sorry.

    But by coincidence we have been looking at this problem for the last 3 months.
    What we are sure of is that:
    1. It is complex. Very complex.
    2. It is secret. Intel doesn't tell anyone, even under an NDA.
    3. It is different for DDR5 and DDR4, and different between AMD and Intel. It also differs between CPU families.
    4. It is different for different memory configurations (e.g. single channel vs dual channel vs quad channel). Plus there are different capacities and ranks to consider. And it is probably more complex again for servers with more than one socket.
    5. As far as we are aware, almost(*) no one in the world has been able to reverse engineer it.
      (*) We are aware of one company that claims to do it for a wide range of DDR3/4 systems, but they also charge $100 USD per address lookup.
    So a general solution that works with all systems is very unlikely.
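
To make the problem concrete, here is a toy sketch of what a bit-based decode looks like, assuming that a parameter such as ADDR2CHBITS lists physical address bit positions that are XOR-ed together to select a channel. The bit positions below are hypothetical placeholders and are not valid for the DL380 Gen9 or any other real system; `xor_bits`, `channel_of` and `CHANNEL_BITS` are illustrative names only.

```python
def xor_bits(addr: int, bit_positions) -> int:
    """XOR together the selected bits of addr; the result is 0 or 1."""
    result = 0
    for bit in bit_positions:
        result ^= (addr >> bit) & 1
    return result


# HYPOTHETICAL bit list, chosen only for illustration.
# Real values depend on CPU family, memory type and DIMM population.
CHANNEL_BITS = [6, 13]


def channel_of(addr: int) -> int:
    """Return the toy channel selector (0 or 1) for a physical address."""
    return xor_bits(addr, CHANNEL_BITS)


if __name__ == "__main__":
    for addr in (0x0, 0x40, 0x2000, 0x2040):
        print(hex(addr), "-> channel", channel_of(addr))
```

A real decode would need one such function per level (channel, slot, rank, bank and so on), and the points listed above are exactly why the bit lists are so hard to obtain.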

    Having said that, we have made some progress on DDR5 decoding with Intel 12th Gen chips for one motherboard (one specific hardware combination, after months of work).

    We hope eventually to have an interface like the screenshot below, where we can go down to the chip level to identify a fault. We are still testing this, however, and aren't yet entirely convinced that what we have done is correct. We should have more news in a few weeks.


    RAM Address Decode chip level






    • #3
      Hi David,
      Thank you for the excellent response (well, responses, as I've started several threads lately). Given how unlikely it is that I will somehow find the magic decoding bits, do you happen to know why these systems combine all their memory ranges in the SMBIOS?
      Code:
      Getting SMBIOS data from sysfs.
      SMBIOS 2.8 present.
      
      Handle 0x0024, DMI type 19, 31 bytes
      Memory Array Mapped Address
          Starting Address: 0x00000000000
          Ending Address: 0x0007FFFFFFF
          Range Size: 2 GB
          Physical Array Handle: 0x000A
          Partition Width: 1
      
      Handle 0x0025, DMI type 19, 31 bytes
      Memory Array Mapped Address
          Starting Address: 0x0000000100000000k
          Ending Address: 0x000000C07FFFFFFFk
          Range Size: 766 GB
          Physical Array Handle: 0x000B
          Partition Width: 1
      My next attempt would be to map the data errors MemTest finds to a DIMM serial number, if I can figure out the memory address range per DIMM in a live Linux environment.
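
As a sanity check on the ranges above, here is a minimal sketch that parses the two type 19 records and recomputes their sizes. It assumes the values are byte addresses and simply ignores the trailing `k` that dmidecode prints on the second record; that assumption is at least consistent with the reported Range Size values. `parse_ranges` and `size_gib` are illustrative helper names, not part of any tool.

```python
import re

# Excerpt of the dmidecode output quoted above.
SAMPLE = """\
Handle 0x0024, DMI type 19, 31 bytes
Memory Array Mapped Address
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Range Size: 2 GB

Handle 0x0025, DMI type 19, 31 bytes
Memory Array Mapped Address
    Starting Address: 0x0000000100000000k
    Ending Address: 0x000000C07FFFFFFFk
    Range Size: 766 GB
"""

# Capture the hex digits; the optional trailing "k" is dropped (assumption).
ADDR_RE = re.compile(r"(Starting|Ending) Address:\s*0x([0-9A-Fa-f]+)k?")


def parse_ranges(text):
    """Return a list of (start, end) integer pairs, one per record."""
    ranges = []
    start = None
    for kind, hex_digits in ADDR_RE.findall(text):
        value = int(hex_digits, 16)
        if kind == "Starting":
            start = value
        elif start is not None:
            ranges.append((start, value))
            start = None
    return ranges


def size_gib(start, end):
    """Size of an inclusive address range in GiB."""
    return (end - start + 1) / 2**30


if __name__ == "__main__":
    for start, end in parse_ranges(SAMPLE):
        print(f"{start:#x}-{end:#x}: {size_gib(start, end):g} GiB")
```

The two ranges come out at 2 GiB and 766 GiB, 768 GiB in total, so the records describe the whole array rather than individual DIMMs.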



      • #4
        As per the Linux documentation on "dmidecode"
        "More often than not, information contained in the DMI tables is inaccurate, incomplete or simply wrong"

        The split into two ranges seems arbitrary. It is probably historic, as there are a lot of hardware devices that get mapped into the 1st GB of RAM. I did a quick search for you, but didn't find a coherent description of how SMBIOS determines these ranges. It might be one of those things where you need to read the UEFI BIOS source code to find out what is really going on. What I am sure about, however, is that the two memory ranges don't reflect any physical reality. The real memory map is way more complex than that. It looks more like this in real life.
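
On a live Linux system, one place to see the map the kernel is actually using is `/proc/iomem`. The sketch below pulls out the "System RAM" ranges from text in that format. The sample text is made up for illustration and is not taken from the DL380 Gen9; `system_ram_ranges` is an illustrative helper name. Note that this still shows physical address ranges only and says nothing about which DIMM backs an address.

```python
# HYPOTHETICAL sample in /proc/iomem format, for illustration only.
SAMPLE_IOMEM = """\
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
00100000-7fffffff : System RAM
  01000000-01ffffff : Kernel code
100000000-c07fffffff : System RAM
"""


def system_ram_ranges(iomem_text):
    """Return (start, end) integer pairs for every 'System RAM' line."""
    ranges = []
    for line in iomem_text.splitlines():
        span, sep, label = line.partition(" : ")
        if not sep or label.strip() != "System RAM":
            continue
        start_hex, _, end_hex = span.strip().partition("-")
        ranges.append((int(start_hex, 16), int(end_hex, 16)))
    return ranges


if __name__ == "__main__":
    for start, end in system_ram_ranges(SAMPLE_IOMEM):
        print(f"{start:#014x}-{end:#014x}")
```

To run it against a real machine, pass in the contents of `/proc/iomem` instead of the sample; the file has to be read as root, since unprivileged reads show the addresses as zeros.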
