We had a query recently about why 4x Opteron 6272 processors (32 cores total) were not scoring much better than 2x Opteron 6272 processors (16 cores total).
While most of the tests scored 50 to 100% better, three tests in particular (Physics, Prime Numbers and Single Threaded) were holding the 4x system back from achieving a higher score.
The single-threaded test shouldn't improve with more cores, but in this case the score actually seemed to get worse. We speculate that there may be some inefficiency in task scheduling on systems with a high number of cores. The prime number and physics tests were a bit stranger, showing little or no improvement when they should scale nearly linearly with more cores.
We ran some queries on our database to see if this was a common problem. Systems with multiple CPUs are a minority to begin with, and multi-CPU systems with such high core counts are rarer still, so we didn't have enough data to come to any solid conclusion. In fact, 4x Opteron 6272 was the only 32-core system we had in the DB.
The table below is a sample of what we pulled from the DB; scores are averages across all data for each CPU type.
As can be seen, both AMD examples seemed to exhibit the same problem with the physics and prime number tests, although the 6234 didn't seem to exhibit the single-threaded issue that the 6272 did. This may mean the single-threaded issue has something to do with 4x CPU configurations. These were the only AMD CPUs where we had data for multiple CPU count configurations, and even in these cases we only had a small number of results.
For Intel we had quite a bit more data to choose from, but the two CPU types in the table are fairly representative of the rest. In Intel's case the physics and prime number tests showed a more reasonable improvement from increasing the number of CPUs. It's still not the doubling that might be expected, although it's possible that there is a bottleneck elsewhere. The physics test in particular is heavy on memory use and may be saturating the memory bandwidth. Intel CPUs also didn't show any issue with single-threaded performance, although we don't have a 4x Intel configuration for comparison.
As for the overall CPU mark, although most of the other tests scored nearly double in all these cases, the algorithm for calculating the overall score is designed to punish low scores in individual tests and prevent a single high scoring test from giving the CPU a really high overall score. A CPU must perform well across the board in order to increase its score.
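To make the effect of such a scoring rule concrete, here is a minimal sketch. The post does not give the actual formula, so the harmonic mean used here is purely an assumption, chosen because it is one well-known average that is dragged down by low values and barely moved by a single high one. The relative scores are made-up numbers for illustration, not benchmark results.

```python
from statistics import harmonic_mean, mean

# ASSUMPTION: the real overall-score formula is not stated in the post.
# The harmonic mean is used only to illustrate an average that punishes
# low individual results. All scores below are hypothetical.

# Hypothetical per-test scores relative to a baseline system (1.0 = same).
# Five tests double, two do not improve, one gets worse.
mostly_doubled = [2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 0.8]

# Hypothetical case with one very high outlier and everything else flat.
one_outlier = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0]

for label, rel in (("mostly doubled", mostly_doubled), ("one outlier", one_outlier)):
    print(f"{label}: arithmetic {mean(rel):.2f}, harmonic {harmonic_mean(rel):.2f}")
```

With an average of this kind, the three weak tests pull the overall figure well below the plain arithmetic average, and a single outstanding test adds very little on its own.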
CPU configuration | Cores | Prime Numbers | Physics | Single Threaded | CPU Rating |
1x Opteron 6234 | 6 | 28.21879768 | 523.6332 | 827.8508301 | 6971.315 |
2x Opteron 6234 | 12 | 26.18594551 | 508.7367 | 1034.158569 | 10504.77 |
2x Opteron 6272 | 16 | 20.93978214 | 453.0519 | 937.5236816 | 10202.96 |
4x Opteron 6272 | 32 | 21.23987579 | 494.5319 | 765.5333252 | 11028.8 |
1x Xeon E5-2640 | 6 | 36.29001617 | 638.4076 | 1540.534302 | 9844.799 |
2x Xeon E5-2640 | 12 | 49.94214797 | 816.2755 | 1523.405045 | 14692.37 |
1x Xeon E5-2650 | 8 | 40.21228447 | 697.6804 | 1265.354785 | 9857.421 |
2x Xeon E5-2650 | 16 | 53.15676792 | 856.5676 | 1355.614652 | 14301.72 |
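The scaling discussed in the text can be read off the table by dividing each multi-CPU score by the score of the smaller configuration. The sketch below does that arithmetic; the figures are copied from the table, while the `scaling` helper and the pairing of configurations are our own illustrative choices.

```python
# Scores copied from the table:
# (cores, prime numbers, physics, single threaded, CPU rating)
scores = {
    "1x Opteron 6234": (6, 28.21879768, 523.6332, 827.8508301, 6971.315),
    "2x Opteron 6234": (12, 26.18594551, 508.7367, 1034.158569, 10504.77),
    "2x Opteron 6272": (16, 20.93978214, 453.0519, 937.5236816, 10202.96),
    "4x Opteron 6272": (32, 21.23987579, 494.5319, 765.5333252, 11028.8),
    "1x Xeon E5-2640": (6, 36.29001617, 638.4076, 1540.534302, 9844.799),
    "2x Xeon E5-2640": (12, 49.94214797, 816.2755, 1523.405045, 14692.37),
    "1x Xeon E5-2650": (8, 40.21228447, 697.6804, 1265.354785, 9857.421),
    "2x Xeon E5-2650": (16, 53.15676792, 856.5676, 1355.614652, 14301.72),
}

TESTS = ("Prime Numbers", "Physics", "Single Threaded", "CPU Rating")

# Each pair doubles the CPU count of the same processor model.
PAIRS = [
    ("1x Opteron 6234", "2x Opteron 6234"),
    ("2x Opteron 6272", "4x Opteron 6272"),
    ("1x Xeon E5-2640", "2x Xeon E5-2640"),
    ("1x Xeon E5-2650", "2x Xeon E5-2650"),
]


def scaling(base, scaled):
    """Return scaled/base for each test column (2.0 would be a perfect doubling)."""
    base_scores = scores[base][1:]
    scaled_scores = scores[scaled][1:]
    return {name: s / b for name, b, s in zip(TESTS, base_scores, scaled_scores)}


for base, scaled in PAIRS:
    ratios = scaling(base, scaled)
    summary = ", ".join(f"{name} x{ratio:.2f}" for name, ratio in ratios.items())
    print(f"{base} -> {scaled}: {summary}")
```

For the 6272, doubling the CPU count leaves the prime number score essentially flat (a ratio of about 1.01) and drops the single-threaded score to about 0.82 of its previous value, while the Xeon pairs gain roughly 20 to 40% on prime numbers and physics.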