Different tasks (algorithms) scale very differently. Most algorithms make use of disk, RAM, networking, or locking primitives such as semaphores (databases, for example), all of which quickly become bottlenecks on CPUs with a large core count. You might be adding 300 CPU cores, but are you also adding 300 PCIe lanes with 300 SSDs attached?
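As a rough illustration of the locking case, here is a minimal sketch of my own (not something from PerformanceTest) where every thread has to take the same mutex, the way a naive database-style critical section would. Adding threads barely helps, and can even hurt, because the lock serializes the work:

```cpp
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    constexpr long kTotalOps = 20'000'000;  // fixed amount of work to share out
    std::mutex lock;
    long counter = 0;  // shared state guarded by the mutex

    for (unsigned n = 1; n <= std::thread::hardware_concurrency(); n *= 2) {
        counter = 0;
        const long perThread = kTotalOps / n;
        auto start = std::chrono::steady_clock::now();

        std::vector<std::thread> pool;
        for (unsigned t = 0; t < n; ++t) {
            pool.emplace_back([&, perThread] {
                for (long i = 0; i < perThread; ++i) {
                    std::lock_guard<std::mutex> g(lock);  // serialization point
                    ++counter;
                }
            });
        }
        for (auto& th : pool) th.join();

        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("%2u threads: %lld ms\n", n, static_cast<long long>(ms));
    }
    return 0;
}
```

The total work is constant, so with perfect scaling the times should halve as the thread count doubles. With a contended lock they don't, no matter how many cores you buy.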
Here is another example of an algorithm that uses RAM (i.e. one whose working set can't be kept entirely in the CPU's cache).
Scaling from 1 thread to 2 is near perfect (98%). Scaling from 2 to 3 is pretty good (41%). Scaling above 3 threads doesn't add much more throughput, and once you get into the hyperthreaded virtual cores, performance actually goes backwards.
The table above was produced on a Ryzen 5 5600X with 32GB of DDR4-3600 RAM in dual channel (16-19-19-39 timings). The Physics test uses a lot of RAM, so the memory controller and the RAM modules themselves are maxed out and quickly become the bottleneck. The scaling of the integer test looks a lot better, however.
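You can reproduce that kind of memory-bandwidth wall outside PerformanceTest with a rough sketch along these lines (the 1 GiB buffer size is an arbitrary choice of mine, and results will vary by machine). Each thread sums a disjoint slice of a buffer far larger than the caches, so there is no locking at all, yet every thread streams through the same dual-channel memory controller:

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    // 1 GiB of 64-bit values: far larger than any cache, so every pass
    // goes out to DRAM and is limited by memory bandwidth, not the cores.
    const std::size_t kWords = (1ull << 30) / sizeof(std::uint64_t);
    std::vector<std::uint64_t> data(kWords, 1);

    for (unsigned n = 1; n <= std::thread::hardware_concurrency(); ++n) {
        std::vector<std::uint64_t> partial(n, 0);
        const std::size_t chunk = kWords / n;
        auto start = std::chrono::steady_clock::now();

        std::vector<std::thread> pool;
        for (unsigned t = 0; t < n; ++t) {
            pool.emplace_back([&, t] {
                // Each thread sums its own slice: no locks, no shared writes,
                // yet all slices compete for the same memory controller.
                const std::size_t lo = t * chunk;
                const std::size_t hi = (t + 1 == n) ? kWords : lo + chunk;
                std::uint64_t sum = 0;
                for (std::size_t i = lo; i < hi; ++i) sum += data[i];
                partial[t] = sum;
            });
        }
        for (auto& th : pool) th.join();

        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::uint64_t total = std::accumulate(partial.begin(), partial.end(), 0ull);
        std::printf("%2u threads: %lld ms (sum=%llu)\n", n,
                    static_cast<long long>(ms),
                    static_cast<unsigned long long>(total));
    }
    return 0;
}
```

On a bandwidth-limited system the per-pass time stops shrinking well before the thread count reaches the core count, which is the same shape as the Physics results above.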
Yes, you could run a bunch of VMs with an instance of PerformanceTest in each one. I don't see how that would avoid the memory bandwidth limit, however, as it is a hardware bottleneck shared by every VM on the host.
Again, these super-high-core-count systems only make sense for very specific software.