Dual-CPU scores vs single (same) CPU scores

  • Dual-CPU scores vs single (same) CPU scores

    Hello,

    I am thinking of buying a used HP D800 workstation with dual Xeons for some (multithreaded/parallel) scientific simulations, so I am looking at your scores to estimate as confidently as possible the performance I can expect from it. Comparing your Multiple-CPU scores with the corresponding single-CPU scores, I am generally seeing that adding a second identical CPU does not double the score, but rather suggests a much lower gain.

    For example, a [Dual CPU] Intel Xeon E5-2697 v3 @ 2.60GHz currently scores 30106, while this processor scores 22143 on its own; that's a 1.36x gain for the dual CPU. Or, taking the processor I am actually considering, the [Dual CPU] Intel Xeon X5650 @ 2.67GHz scores 11675, while the same single CPU scores 7599, so roughly a 1.53x speed-up.
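The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the helper name `dual_cpu_gain` is made up for illustration, the scores are the ones quoted in this post, and "scaling efficiency" here simply means the speed-up divided by 2 (the number of CPUs).

```python
def dual_cpu_gain(single_score, dual_score):
    """Return (speedup, efficiency) of a dual-CPU score relative to one CPU."""
    speedup = dual_score / single_score
    efficiency = speedup / 2  # 1.0 would mean the second CPU doubled the score
    return speedup, efficiency


# Scores as quoted in the post: (single CPU, dual CPU)
scores = {
    "Xeon E5-2697 v3": (22143, 30106),
    "Xeon X5650": (7599, 11675),
}

for name, (single, dual) in scores.items():
    speedup, efficiency = dual_cpu_gain(single, dual)
    print(f"{name}: {speedup:.2f}x speedup, {efficiency:.0%} scaling efficiency")
```

Both ratios come out well below 2, which is the gap the question is about.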

    So one question is: Are the scores in the two charts (single CPU and Multiple CPU) directly comparable? Adding a second (very expensive!) CPU and getting a mere 36% gain from it just doesn't sound great somehow.

    Another, more general question is this: suppose we have a multithreaded application whose performance scales linearly up to 8 cores/threads with a constant parallel efficiency of, say, 75%. So using 4 cores should give a 3x speed-up over a single core, and using 8 cores should make it 6 times faster. Now let's say we have an 8-core chip and two quad-cores, all cores having identical single-threaded performance. Would the 8-core chip always significantly outperform a dual-CPU system made up of the two quad-cores for this application (because of memory latency issues, as I understand it)? Or, put another way, when trying to guess my possible gain in multithreaded simulation performance over my current i7-920 (scoring around 5000), should I be roughly comparing to 11675 (the dual Xeon score), or to something closer to double the single Xeon score, i.e. about 15200?
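To make the two candidate estimates concrete, here is a minimal sketch of the constant-efficiency model described above (speed-up = cores x efficiency). The function name and the variable names are illustrative only; the scores are the ones quoted in this post, with the i7-920 taken as roughly 5000.

```python
def modelled_speedup(cores, efficiency=0.75):
    """Speed-up over one core under a constant parallel efficiency."""
    return cores * efficiency


# The model from the post: 4 cores -> 3x, 8 cores -> 6x
for cores in (4, 8):
    print(f"{cores} cores: {modelled_speedup(cores):.1f}x over a single core")

# The two candidate reference points for the dual X5650, relative to the i7-920
i7_920_score = 5000            # approximate, as stated in the post
dual_x5650_measured = 11675    # score from the Multiple-CPU chart
dual_x5650_doubled = 2 * 7599  # double the single-CPU score (about 15200)

print(f"Measured dual score: {dual_x5650_measured / i7_920_score:.2f}x the i7-920")
print(f"Doubled single score: {dual_x5650_doubled / i7_920_score:.2f}x the i7-920")
```

The two reference points differ by roughly 30%, so which one applies matters for the purchase decision.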

    For all that let's suppose that other factors, like memory speed/bandwidth don't favour one or the other.

    Thanks

  • #2
    Yes, the scores are comparable.
    Adding a 2nd CPU never doubles performance. It is also the same for going from 4 cores to 8 cores.

    There are often shared resources, like the bus, cache and main RAM. How much benefit you get depends on the machine architecture and the type of software you will be running on the hardware. One would expect scientific simulation software to be reasonably well threaded, but if the code is bottlenecked by the RAM or disk or network connection, then you might see no benefit at all. Linear scaling as cores increase isn't common. Normally at some point there is a bottleneck, or communication overhead.

    The Xeons often use ECC RAM as well, which can be a bit slower than the normal DRAM used by an i7-920.

    The only way to really be sure is to run some tests on the exact simulation software you are planning on using. (Some simulation software can also run on GPUs, so don't forget to consider that.)
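One way such a test could be set up is sketched below. Everything here is an assumption for illustration: `work_chunk` is a stand-in for one independent chunk of the real simulation, the chunk sizes are arbitrary, and Python's `ProcessPoolExecutor` is used only because it is a convenient way to run CPU-bound work on several cores. The idea is to time the same workload with 1, 2 and 4 workers and compare.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def work_chunk(n):
    """Stand-in CPU-bound task; replace with one chunk of the real simulation."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_run(workers, chunks=16, chunk_size=200_000):
    """Wall-clock seconds to process all chunks with the given number of processes."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work_chunk, [chunk_size] * chunks))
    return time.perf_counter() - start


def scaling_report(timings):
    """Map {workers: seconds} to {workers: (speedup, efficiency)} relative to 1 worker."""
    base = timings[1]
    return {w: (base / t, base / t / w) for w, t in timings.items()}


if __name__ == "__main__":
    timings = {w: time_run(w) for w in (1, 2, 4)}
    for w, (speedup, efficiency) in scaling_report(timings).items():
        print(f"{w} workers: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

Running the same script on both the single-CPU and the dual-CPU machine would show how the real workload scales on each, rather than relying on benchmark charts.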



    • #3
      Thanks for the reply David.

      I guess I should have put it slightly differently: does doubling the cores within a single CPU lead to a significantly higher speed-up than doubling the number of CPUs, or would both lead to approximately the same (reduced) gain, other things being equal?

      I wrote the software myself, so I can verify it's reasonably well threaded, but it cannot use the GPU. I have no reason to think it wouldn't scale well up to 8-12 cores, as the chunks of work would still be quite large at each iteration and the communication cost should still be a very small percentage. It's not RAM-intensive, and disk and network connections are not used. It's pure computation.

      Yes, I did think of sending an exe to the eBay seller and asking him to run it and report back, but I'm not sure he will do it. Another alternative would be to buy a liquid cooler and get a roughly 30% speed-up (3.5/2.67 ≈ 1.31) by overclocking my i7-920, though I'm not sure it would hold up, as it would still have to run at 100% load for up to a week straight.




      • #4
        does doubling the cores within a single CPU lead to a significantly higher speed up than doubling the CPUs
        Multiple CPUs should generally do slightly better, as they often get their own memory bus and own set of cache memory. But then it depends if the software uses NUMA or not as well.

        Heat dissipation is often the limiting factor for CPU performance. So if you double the core count, you often have to reduce the clock speed. Having multiple CPUs increases the heat sink surface area(s) to dissipate the heat and allow higher clock speeds.



        • #5
          That's interesting. I thought it would be the opposite, having read that there's less latency on, say, an 8-core CPU because the cores can communicate more quickly, as they're all on the same chip, compared to, say, two quad-cores sitting in different sockets. I didn't know about NUMA, to be honest (I just read a little about it), but for sure it depends on how often the different threads need to communicate and access memory across CPUs. I'm a bit confused now, but I take the point that more cores on the same chip of the same technology will have a reduced clock speed to avoid overheating.




          • #6
            In the end it depends on the software being run: how it uses memory, whether that memory gets cached, whether the algorithms need to work on RAM that is being used by other threads or each 'package' of work is independent, and the type of instructions being used (especially with hyperthreading).

            The situation is complex enough that it is hard to predict without doing some experimentation with the actual software on actual hardware.



            • #7
              David, what do you think about these results http://www.spec.org/cpu2017/results/cpu2017.html ?
              In computational tasks, those tests show almost 2x performance when you double the number of CPUs, since these tasks do not hit any bottlenecks.

              As for Xeon server processors and typical server tasks, such as serving user requests as a web server or even an SQL server (with no memory/disk overload), the bottleneck is the CPU queue. So adding a second CPU should raise performance significantly (1.8-1.9x, up to 2x), provided the new CPU doesn't push you into memory/disk throughput limits.

              Many PassMark benchmarks now show only a 25-30% gain for dual-CPU systems. I would assume that your benchmarking software has some kind of bottleneck in it.



              • #8
                since these tasks do not hit any bottlenecks
                R271-Z31 (AMD EPYC 7601, 2.20 GHz) - 1 CPU - Score 134
                R271-Z31 (AMD EPYC 7601, 2.20 GHz) - 2 CPUs - Score 267

                Yes. I don't know the details of the test they are using, but as the results are almost exactly double, it means that the tests don't use any significant amount of RAM, disk, video, network or anything else. The algorithms are perfectly threaded and held entirely in the CPU cache. I don't think this is very realistic. For a web server (serving files) you should only need a few CPU cores before the disk and network become a bottleneck. SQL uses a ton of RAM, and typically has record-locking issues at high throughput. So assuming the entire database is held in the CPU cache (on all cores) doesn't make sense.
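As a rough way to compare the two sets of results, the sketch below uses Amdahl's law, which is not mentioned in this thread and is brought in here only as a simple model. It treats one CPU as a single processing unit and asks what fraction of the work would have to scale perfectly for a second CPU to give the observed speed-up. The function name is made up, and the scores are the ones quoted in this thread.

```python
def scalable_fraction(speedup, units=2):
    """Fraction p of work that scales, from Amdahl's law S = 1 / ((1 - p) + p / units)."""
    return (1 - 1 / speedup) / (1 - 1 / units)


# (single CPU score, dual CPU score) as quoted in this thread
results = {
    "SPEC, EPYC 7601": (134, 267),
    "PassMark, Xeon E5-2697 v3": (22143, 30106),
    "PassMark, Xeon X5650": (7599, 11675),
}

for name, (single, dual) in results.items():
    speedup = dual / single
    p = scalable_fraction(speedup)
    print(f"{name}: {speedup:.2f}x speedup, about {p:.0%} of the work scales")
```

Under this simplified model the SPEC result implies that almost all of the work scales with the second CPU, while the PassMark results imply a much larger non-scaling share, which is consistent with the point about shared resources made above.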

                EDIT, Feb 2020: Additional discussion on the scaling issue can be found in this topic.
