Memory channels and benchmarking

    The theory says that more (independent) memory channels mean more memory throughput.
    So a CPU with 4 memory channels should show at least noticeably better memory performance than a CPU supporting 2 memory channels, provided of course that equal DIMMs are installed on all the memory channels.
    Somehow, practice does not confirm that.
    I have 2 machines:
    1. i7 based, 2 memory channels supported, 2 memory slots on the motherboard, DDR4-2400 installed in both.
    2. Xeon E5-26xx based, 4 memory channels supported, 8 memory slots on the motherboard, DDR4-2400 installed in all of them.

    Memtest86’s memory benchmark nevertheless shows nearly equal speed on both machines.

    How can that be? My hypotheses:
    - memory interleaving is disabled in UEFI (though there are no evident settings for it)
    - the memory interleaving step is too high, higher than the maximum memory block involved in the test
    - test specifics, for instance strictly sequential access
    - CPU caching affects the results

    Could you please explain the behavior?
    Looks like I'm just missing something important.


    Thank you



  • #2
    The Xeon is likely using buffered ECC RAM, which is slower.

    DDR4-2400
    There are many other specifications that matter for RAM (ranks, timing values, x8 vs x16, buffered, error-correcting, etc.).
    The name "DDR4-2400" doesn't capture all the performance characteristics.
    Even the "2400" can be misleading: typical RAM has multiple SPD profiles that the BIOS can select, so your 2400 RAM might not be running at 2400 MT/s.

    Then there is the CPU's memory controller specifications as well.

    It is more complex than it initially appears.



    • #3
      Dear David,

      Thank you for the details!
      >The Xeon is likely using buffered ECC RAM, which is slower.
      Unbuffered in my case

      I tried to check all possible memory differences and was not able to find anything critical
      (except the memory controllers, which are rather black boxes).
      Actually I don’t expect 4 memory channels to double the 2-channel performance;
      I just expected 4 channels to show a noticeable performance boost, or at least not be slower.

      However, if the test is strictly performance oriented, uses relatively small memory blocks (below the memory interleaving step), and is single threaded (i.e. sequential),
      all of the above is quite explainable: the 4-channel Xeon is even a bit slower than the 2-channel i7 due to ECC and a (potentially) more complex/slower memory controller.

      I'm going to write a small multi-threaded test myself for that.
      Maybe I will be able to demonstrate the multichannel benefits somehow.
      I can hardly imagine that memory channels are just a marketing race without any practical benefit.

      Could you please tell me whether there is a way to check if memory interleaving is switched on,
      and how to determine the interleaving step (or maybe the step is just typical/universal)?

      My plan is to compare total memory throughput using a 4-thread test with memory block sizes above (or, better, equal to) the interleaving step on the 4- and 2-channel CPUs respectively.
      If I see the difference I will report.

      Thank you



      • #4
        There is a multi-threaded memory test in PerformanceTest if you would like to use that.

        There are also the Advanced memory tests.

        I could look up the results, but you didn't mention which exact CPU models you are using.



        • #5
          Memtest86 and PerformanceTest (including the Advanced memory test) gave nearly the same results:

          i7-7567U, 2x8GB single-rank DDR4-2133 15-15-15-36 @ 1066 MHz shows ~24500 MB/s (maximum value)
          E5-2699 v4, 8x8GB single-rank DDR4-2400 17-17-17-39 @ 1200 MHz shows ~21500 MB/s (maximum value)
          All in all the Xeon is, alas, a bit slower in memory operations (maybe due to ECC and the more complex memory controller), even with potentially faster memory.

          However, I’ve found the explanation.
          Reading the Xeon’s memory controller registers shows that memory interleaving is switched off, so the effective/native interleave step is 8 GB (equal to the DIMM size); the same is true of the i7.
          So any test with a memory block < 8 GB will occupy only one memory channel, regardless of the number of threads involved in the test.

          Then 40 minutes of programming gave me a multithreaded test with >8 GB memory blocks that
          confirmed the initial hypothesis: memtest86 (and PerformanceTest as well) measures physical memory speed only, by design; the performance of the entire memory subsystem is simply out of scope.
          Maybe a future memtest86 version could take it into account.

          I have only 2 minor questions left.

          1. DDR4-2133 (for instance) has a maximum physical speed of 17000 MB/s.
          How can the test reach 24500 MB/s? Is it just an effect of CPU caching? Just curious.

          2. Can anything be done with the Xeon to make its DDR4-2400 DIMMs a bit faster?
          Or is it just normal for ECC machines to be _noticeably_ slower than non-ECC ones?
          What is the typical performance “price” of ECC?



          • #6
            I have some strong doubts about your theory.

            If you are running dual channel or higher, then interleaving
            1) is MUCH more complex than you assume, and
            2) happens at MUCH smaller memory address ranges. There is no such thing as 8 GB interleaving as far as I know.

            We've just been working on memory address decoding. See
            https://www.memtest86.com/tech_DIMM_Decoding.html
            The mapping of addresses to sticks and chips is way more complex than you might expect.

