low read Uncache values


  • low read Uncache values

    Hi Experts,

    We are seeing slow RAM on an IBM server.
    When the same test is repeated on a laptop, the read Uncache result is much better.

    Laptop: 3 GB RAM (Windows 7). Server: 512 GB RAM, quad CPU (Windows Server 2012), with NUMA enabled. There is no option to disable NUMA.

    Laptop (read Uncache: 12029)
    [Image: laptop.jpg]


    Server(read Uncache: 2574)
    [Image: server.jpg]

  • #2
    What hardware is in the server? It might just be that you have a slow machine. ECC RAM tends to be slower than non-ECC RAM.


    • #3
      Originally posted by David (PassMark):
      "What is the hardware in the server? It might just be that you have a slow machine. If you have ECC RAM, it tends to be slower than non-ECC RAM."

      Yes, it is ECC (Samsung DDR3-1066 MHz 16 GB 4Rx4 PC3L-8400R-07-11-AB1-D3 M393B2K70DM0-YF8).
      IBM System x3850 X5
      4 * Xeon E7-4870
      32 * 16 GB = 512 GB RAM

      I even tried reducing to 2 * 16 GB = 32 GB per CPU socket (total 32 GB * 4 sockets = 128 GB).
      This server is NUMA enabled and does not have an option to disable it.

      How can I convince the customer of the server's actual performance?


      • #4
        DDR3-1066E is one of the slowest DDR3 options. However, the results do seem pretty low considering how recent the CPU is.

        Is there any information in the BIOS about RAM timings / speed selection?
        Is it possible to try 1 CPU and just 32 GB of RAM?
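
        For a sense of scale, the theoretical peak of a single DDR3 channel can be worked out from the transfer rate and the 64-bit (8-byte) data bus. The sketch below is only that arithmetic; the function name is made up for illustration, and real benchmark results depend on many other factors (memory controller, buffering, channel population, NUMA), so it should not be read as a prediction for this server.

```python
# Theoretical peak bandwidth of DDR3 memory: transfers per second x bytes per transfer.
# Assumption: a standard 64-bit (8-byte) data bus per channel. The extra ECC bits
# carry check data, not payload, so they are not counted here.

def peak_bandwidth_mb_s(mega_transfers_per_s: float, channels: int = 1, bus_bytes: int = 8) -> float:
    """Return the theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return mega_transfers_per_s * bus_bytes * channels

ddr3_1066 = peak_bandwidth_mb_s(1066.67)
ddr3_1333 = peak_bandwidth_mb_s(1333.33)

print(f"DDR3-1066, 1 channel: {ddr3_1066:.0f} MB/s")
print(f"DDR3-1333, 1 channel: {ddr3_1333:.0f} MB/s")
print(f"Ratio 1333 / 1066:    {ddr3_1333 / ddr3_1066:.2f}x")
```

        On paper, moving from DDR3-1066 to DDR3-1333 raises the per-channel ceiling by about a quarter. Whether a benchmark score moves by anything like that depends on where the actual bottleneck is.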


        • #5
          Originally posted by David (PassMark):
          "DDR3-1066E is one of the slowest DDR3 options. The results do seem pretty low however considering how recent the CPU is. Is there any information in BIOS for RAM timings / speed selection? Is it possible to try 1 CPU and just 32GB of RAM."

          This server is also compatible with DDR3-1333. Would the upgrade bring a significant improvement?
          We cannot afford the additional cost involved, though.

          The BIOS does not have any option to select RAM timings / speed.

          We thought of trying the 1 CPU option, but it would need an entire motherboard change, so it is not an option.

          I also tried this on VMs in our VMware infrastructure (2003, 2008, 2012); they all give the same value of around 2500.

          I cannot tell whether the problem is with ECC or NUMA.


          • #6
            It is probably both.

            ECC is well known to slow down the RAM. You trade off some speed for the error correction.

            It is logical that NUMA might also have an effect, but we don't have any data, so we don't know the exact impact.

            There might also be some other factor. I looked at the results from some other quad-CPU systems and none of them perform really well (e.g. E5-4617, E7-4850).

            Do you have all the RAM sticks in the optimal slots to allow the system to get into dual / tri / quad channel mode (whatever it supports)?
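
            To make the NUMA point a little more concrete, here is a toy model of how remote memory access could drag down average bandwidth. All figures in it are hypothetical and chosen only for illustration; they are not measurements from this or any other server, and the function name is invented for the example.

```python
def effective_bandwidth(local_bw: float, remote_bw: float, local_fraction: float) -> float:
    """Average bandwidth when part of the data is read locally and the rest remotely.

    The time spent per byte adds up, so the result is a weighted harmonic mean
    of the two bandwidths rather than a simple average.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    time_per_byte = local_fraction / local_bw + (1.0 - local_fraction) / remote_bw
    return 1.0 / time_per_byte

# Hypothetical numbers, for illustration only (not measured anywhere).
LOCAL_BW = 8000.0   # MB/s when reading memory attached to the same socket
REMOTE_BW = 4000.0  # MB/s when reading memory attached to another socket

for sockets in (1, 2, 4):
    # Assumption: pages are spread evenly, so 1/sockets of the accesses are local.
    local_fraction = 1.0 / sockets
    result = effective_bandwidth(LOCAL_BW, REMOTE_BW, local_fraction)
    print(f"{sockets} socket(s): {result:.0f} MB/s")
```

            In this model, the more sockets the memory is spread over, the closer the average gets to the remote figure. Whether the benchmark actually allocates memory this way on a given machine is a separate question.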


            • #7
              Originally posted by David (PassMark):
              "It is probably both. EEC is well know to slow down the RAM. You trade off some speed for the error correction. It is logical that NUMA might also seems to have an effect. But we don't have any data so know the exact impact. There might also be some other factor. I looked at the results from some other Quad CPU systems and none of them perform really well (e.g. E5-4617, E7-4850). Do you have all the RAM sticks in the optimal slots to allow the system to get into dual / tri / quad channel mode (whatever its supports)."

              Yes. The RAM sticks were configured to use dual channel mode.

              I tried this on a ProLiant DL385-G2 DC with
              2 * Dual-Core AMD Opteron(tm) Processor 2210 HE
              4 * 2 GB = 8 GB DDR3 ECC RAM

              The read Uncache value is around 2000.
              This server has no NUMA option.
              With NUMA now eliminated, the only component common to all these low-value tests is ECC RAM.


              • #8
                It isn't totally conclusive.

                The current AMD CPUs don't have good memory controllers. Compare these charts
                http://www.memorybenchmark.net/read_..._ddr3_amd.html
                http://www.memorybenchmark.net/read_...dr3_intel.html


                • #9
                  Originally posted by David (PassMark):
                  "It isn't totally conclusive. The current AMD CPUs don't have good memory controllers. Compare these charts
                  http://www.memorybenchmark.net/read_..._ddr3_amd.html
                  http://www.memorybenchmark.net/read_...dr3_intel.html"

                  True. But the values I got from my tests were the lowest when compared with the charts you gave.


                  • #10
                    Originally posted by krishna123:
                    "True. But values that i got from my tests were the lowest, when compared with the charts given by you."

                    Today we added two memory cards per CPU. Earlier each CPU had one memory card.
                    Results:
                    There is no significant change in the read Uncache values, but the Memory-Threaded value has increased from 12000 to 31000.


                    New configuration:
                    Total: 2 CPUs and 512 GB RAM
                    1st CPU with 2 memory cards, 256 GB (128 GB per memory card)
                    2nd CPU with 2 memory cards, 256 GB (128 GB per memory card)


                    Old configuration:
                    Total: 4 CPUs and 512 GB RAM
                    1st CPU with 1 memory card of 128 GB
                    2nd CPU with 1 memory card of 128 GB
                    3rd CPU with 1 memory card of 128 GB
                    4th CPU with 1 memory card of 128 GB

                    Could you please let us know what made the threaded value increase? Was it adding 2 memory cards per CPU, or reducing the CPUs from 4 to 2?
                    Last edited by krishna123; Jan-13-2015, 12:13 PM.


                    • #11
                      We don't have any similar hardware here to experiment with, so it is hard to give any definitive answer.

                      Does the motherboard support dual (or tri or quad) channel RAM? Moving from single channel (1 stick) to dual channel (2 sticks) should help.

                      I think it is likely there is a point of diminishing returns. Four CPUs mean 80 test threads, which is probably way more than required to max out the RAM bandwidth. More threads also mean more non-local memory access, which probably limits the effectiveness of the CPU cache as well.

                      In short, after a certain point adding new threads only adds overhead and thrashing, not throughput (the system has way more CPU power than it has memory bandwidth). But this is just a guess.

                      What you can do as a test is go to the Edit / Preferences window and adjust the number of processes. Then test the speed with 1, 2, 4, 8, 16, 32 and 64 processes. You might find there is a sweet spot at a lower number of processes.
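
                      Once the scores for each process count have been written down, picking the sweet spot can be done by hand or with a few lines like the sketch below. The scores in it are placeholders invented for illustration, not real results, and the function name and the 5% tolerance are arbitrary choices for the example.

```python
def find_sweet_spot(results: dict[int, float], tolerance: float = 0.05) -> int:
    """Return the smallest process count whose score is within `tolerance` of the best score."""
    if not results:
        raise ValueError("no results recorded")
    best = max(results.values())
    threshold = best * (1.0 - tolerance)
    return min(n for n, score in results.items() if score >= threshold)

# Placeholder scores, invented for illustration only. Replace them with the
# Memory-Threaded values recorded for each process count on the real machine.
example_scores = {
    1: 3000.0,
    2: 5800.0,
    4: 10500.0,
    8: 14000.0,
    16: 14500.0,
    32: 13900.0,
    64: 12000.0,
}

print("Sweet spot:", find_sweet_spot(example_scores), "processes")
```

                      With a tolerance of 5%, the function prefers the lowest process count that gets close to the maximum, on the assumption that extra processes beyond that point add overhead without adding throughput.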


                      • #12
                        Originally posted by David (PassMark):
                        "We don't have any similar hardware here to experiment with. So it is hard to give any definitive answer. Does the motherboard support dual (or Tri or Quad) channel RAM. As moving from single channel (1 stick) to dual channel (2 sticks) should help. I think it is likely there is a point of diminishing returns. Four CPUs mean 80 test threads. Which is probably way more than required to max out the RAM bandwidth. Also more threads means more non-local memory access, which also probably limits the effectiveness of the CPU cache. In short after a certain point adding new threads only adds to overhead and thrashing and not throughput. (the system has way more CPU power than it has memory bandwidth). But this is just a guess. What you can do as a test it go to the Edit / Preferences window and adjust the number of processes. Then test the speed with 1,2,4,8,16,32 & 64 processes. You might find there is a sweet spot at a lower number of processes."

                        From the channel perspective, as shown in the picture, each CPU has two memory cards and each card has 128 GB of RAM. This is the optimal way of configuring memory for this server. There is no doubt about this.

                        But no matter how optimal the memory configuration is, and no matter what the CPU count is, all the values remain the same except the Memory-Threaded value.

                        Could you please suggest any other way to test the memory performance? Or please give me a reason for the low values on servers and the good values on desktop machines, so that I can answer my customer. Thank you.

                        [Image: cpu.jpg]
                        Last edited by krishna123; Jan-14-2015, 05:50 AM.
