I have been writing multi-threaded code for statistical analysis that also provides benchmark information (and may run for days or weeks).
Having observed the performance of both Intel and AMD CPUs while running 0.5x, 1x and 2x available threads, I find vast differences in thread scaling depending on OS and CPU.
I notice the same effect in the PassMark CPU Benchmark, where disabling SMT on an AMD CPU (5950X) can result in massive performance increases in certain sub-tests (e.g., Find Prime Numbers and Physics), placing me at or above the 99.9th percentile of global results.
I have also noticed in my own code that doubling the number of threads I issue beyond what is available on Intel w/HT enabled results in almost no performance loss, while on AMD there is a massive performance loss when doubling threads.
Halving the number of threads relative to what is available on AMD w/SMT enabled results in a substantial performance increase, similar to what I see in the PassMark results with max threads cut in half, and I get another 10% increase on top of that if I just disable SMT.
An additional note: With SMT enabled on AMD and running 0.5x threads (under WSL1) to maximize performance, the Windows 11 thread scheduler will pack threads into sequential virtual cores after a few hundred seconds or if the terminal window loses focus. I have to run Caffeine64 and keep the WSL window in focus to prevent this thread packing, which reduces performance by almost 50%. Instead of this, I have opted to disable SMT to gain an extra 10% performance.
Given the above, for a future PassMark release, it would be nice to have a CPU thread scaling benchmark subtest that provides results for 0.5x, 1x, and 2x available threads so that programmers and end users can better understand how to make best use of their CPU.
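To make the request concrete, here is a rough sketch of the kind of measurement I mean. It is not how PassMark implements anything and it is not my statistical code. The names `scaling_counts`, `time_run` and `scaling_report` are made up for illustration, and the SHA-256 workload is just a stand-in that I am assuming is reasonable because CPython's hashlib releases the GIL for buffers larger than 2047 bytes, so the threads can actually run in parallel. A fixed amount of work is split across 0.5x, 1x and 2x the logical CPU count and the throughput of each run is reported.

```python
import hashlib
import os
import threading
import time


def scaling_counts(logical_cpus):
    """Return the 0.5x, 1x and 2x thread counts for a logical CPU count."""
    return [max(1, logical_cpus // 2), logical_cpus, logical_cpus * 2]


def _worker(buffer, rounds):
    # Stand-in workload (assumption): hashlib releases the GIL for inputs
    # larger than 2047 bytes, so these threads can occupy separate cores.
    for _ in range(rounds):
        hashlib.sha256(buffer).digest()


def time_run(num_threads, total_rounds, buffer):
    """Split total_rounds across num_threads; return elapsed wall-clock seconds."""
    base, extra = divmod(total_rounds, num_threads)
    threads = [
        threading.Thread(target=_worker,
                         args=(buffer, base + (1 if i < extra else 0)))
        for i in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def scaling_report(total_rounds=256, buffer_size=1 << 20):
    """Map each thread count (0.5x, 1x, 2x) to its throughput in hashes/s."""
    buffer = bytes(buffer_size)
    logical = os.cpu_count() or 1
    results = {}
    for n in scaling_counts(logical):
        elapsed = time_run(n, total_rounds, buffer)
        results[n] = total_rounds / elapsed
    return results


if __name__ == "__main__":
    for n, rate in scaling_report().items():
        print(f"{n:4d} threads: {rate:10.1f} hashes/s")
```

The default workload only runs for a few seconds, so it will not show the thread packing that sets in after a few hundred seconds. Raising `total_rounds` by a couple of orders of magnitude, and comparing runs with SMT on and off, would be needed to see that.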
