Single Thread Score rating


  • David (PassMark)
    replied
    the same can be said about PassMark
    We don't use .NET, we have full access to the source code, there are no disk or network dependencies, we know for a fact the code is single-threaded, and any changes to the code or compiler are fully under our control. So it is nothing at all like the same situation.

    .NET is the foundation of millions of real-world applications
    It isn't.
    .NET was only ever used for business applications, and mostly for user interface work (business data entry forms). There are dozens of incompatible versions, and it was replaced by UWP to some extent. Win32 and UWP (and a multitude of web-based stuff) are what are actually being used most of the time now. But there is now talk of UWP being killed off or morphed into something else as well. If you want to benchmark hardware, it makes sense to get as close to the hardware as possible, not build something upon layers of software you don't control.

  • proboszcz
    replied
    Originally posted by David (PassMark) View Post
    .NET functions are opaque and outside the control of the benchmarker. Some functions might be single-threaded, some might not be; some might be threaded in certain .NET releases but not others. Some might be hardware accelerated in some .NET releases, some might not be. You can't even be sure what the bottleneck will be: some .NET functions are likely limited by disk speed or RAM speed and aren't even a CPU benchmark. To make a broad claim that BenchmarkDotNet is the gold-standard reference for measuring single-threaded CPU performance doesn't make any sense.
    Firstly - the same can be said about PassMark and the workloads that were chosen for the tests.
    Secondly - .NET is the foundation of millions of real-world applications in the wild. So its behavior on a given CPU is a far better measurement than a synthetic benchmark like PassMark, which is not the foundation of any real-world application.

  • David (PassMark)
    replied
    .NET functions are opaque and outside the control of the benchmarker. Some functions might be single-threaded, some might not be; some might be threaded in certain .NET releases but not others. Some might be hardware accelerated in some .NET releases, some might not be. You can't even be sure what the bottleneck will be: some .NET functions are likely limited by disk speed or RAM speed and aren't even a CPU benchmark. To make a broad claim that BenchmarkDotNet is the gold-standard reference for measuring single-threaded CPU performance doesn't make any sense.

  • proboszcz
    replied
    Originally posted by David (PassMark) View Post
    BenchmarkDotNet isn't a single-thread benchmark. So it pretty much has no relevance to single-thread scores...
    What? Then please explain: why, when running tests with the BenchmarkDotNet library, does Task Manager show full utilization of only one core?
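
    For reference, a minimal BenchmarkDotNet benchmark looks like the sketch below (hypothetical code; the SingleCoreLoop class and its loop body are invented for illustration). The harness drives each [Benchmark] method on one worker thread, so a CPU-bound body keeps exactly one core busy unless the method itself spawns threads:

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class SingleCoreLoop
    {
        // CPU-bound body: BenchmarkDotNet invokes it repeatedly on a single
        // worker thread, which is why Task Manager shows one core at 100%.
        [Benchmark]
        public long SumBits()
        {
            long acc = 0;
            for (long i = 0; i < 100_000_000; i++)
                acc += i & 0xFF;
            return acc;
        }
    }

    public static class Program
    {
        public static void Main() => BenchmarkRunner.Run<SingleCoreLoop>();
    }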

  • David (PassMark)
    replied
    As noted several times, all the old PerformanceTest V9 results have been archived on this V9 page. There is no need to post unreadable screenshots from the Wayback Machine as if you were unearthing some hidden secret.

    BenchmarkDotNet isn't a single-thread benchmark, so it has pretty much no relevance to single-thread scores. It is also a benchmark aimed primarily at tracking the performance of .NET across different releases (different compilers and versions of .NET); it isn't really a hardware benchmark. But if you really are a software developer, I guess you already know this and are just trolling.

    We welcome an informed technical discussion, but from this point forward any posts that are trolling, baseless accusations, time-wasting repetition, or just plain dumb will be deleted without notice.

  • Xeinaemm
    replied
    PassMark, really? Are you playing with Intel again? As a software developer, when I saw this and the results in BenchmarkDotNet, I laughed... BenchmarkDotNet is for microbenchmarking of various topics, and you are telling me that in less than a month Intel magically upgraded older CPUs?

    [Screenshot attached: passmark.png]

  • proboszcz
    replied
    Originally posted by David (PassMark) View Post

    Running more tests means a longer run time, and our aim was to have a relatively quick benchmark (as opposed to others on the market that can take hours to run).
    So we selected representative tests and short test periods. That decision was made around a decade ago. There is obviously a test-time/accuracy trade-off that we made. It wouldn't matter what we did; people are still going to come up with conspiracy theories as soon as they see a result they don't like or that doesn't favor the CPU they just bought.
    To really understand CPU performance you need a degree in computer science and an in-depth study of the domain. There are only a few people who truly understand x86 assembler, SIMD, NUMA, pipelining, variable alignment, caching, compilers, the Windows kernel, branch prediction, microcode, etc. (and to be clear, some of this stuff we also only half understand). So it is all too easy to explain a complex issue as just being a conspiracy. Plus it makes great clickbait for the publishers.


    Well, you do not have to remove workloads from the single-thread tests to speed up execution. You could just decrease the amount of data to process while leaving the set of workloads untouched, which would give much more consistent results between the multithread and single-thread scores than there are currently.
    Also, do the single-thread tests include workloads that utilize AVX512? If so, then for a fair comparison you should definitely also add workloads utilizing the IA SHA Extensions.

    As for the conspiracy theories: the problem with them is that in the past they came true (Intel was paying benchmark vendors and suppliers to favor only their processors). Moreover, compilers were often written in such a way that they did not "see" additional instructions on processors where the CPUID instruction did not return the "GenuineIntel" string. With that in mind, you should not change the tests so that they favor only one vendor, because it starts to look like the "good" old times have returned...
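
    For illustration, a hedged sketch of the detection issue described above (hypothetical code; requires .NET 5+ on x86/x64, where X86Base.CpuId is available). The robust approach is to test the CPUID feature bit; the anti-pattern is branching on the vendor string instead:

    using System;
    using System.Runtime.Intrinsics.X86;
    using System.Text;

    public static class ShaDetect
    {
        public static void Main()
        {
            if (!X86Base.IsSupported)
            {
                Console.WriteLine("CPUID not available (not an x86/x64 process).");
                return;
            }

            // Leaf 0 packs the vendor string into EBX, EDX, ECX (in that order).
            var (_, ebx0, ecx0, edx0) = X86Base.CpuId(0, 0);
            var raw = new byte[12];
            BitConverter.GetBytes(ebx0).CopyTo(raw, 0);
            BitConverter.GetBytes(edx0).CopyTo(raw, 4);
            BitConverter.GetBytes(ecx0).CopyTo(raw, 8);
            Console.WriteLine($"vendor: {Encoding.ASCII.GetString(raw)}"); // "GenuineIntel" or "AuthenticAMD"

            // Robust check: CPUID leaf 7, subleaf 0, EBX bit 29 => SHA extensions.
            // Branching on the vendor string instead sends capable AMD parts
            // down the slow path, which is the behaviour described above.
            var (_, ebx7, _, _) = X86Base.CpuId(7, 0);
            Console.WriteLine($"SHA extensions: {(ebx7 & (1 << 29)) != 0}");
        }
    }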

  • David (PassMark)
    replied
    Originally posted by CerianK View Post
    Intel 10xxxX processors are now at the top of the online single-thread chart, reading up to 3614. I loaded the latest individual 10.0.1004 baselines and see only about 2700.
    For some reason I mistakenly thought those were the new refresh processor leaks... it took me a bit to realize that was not the case.
    Yes, they do look a bit high.
    They are pretty rare CPUs, however. Looking at the few PTR10 results we have for these CPUs (on the 1004 build), it would seem they are going to drop a few places once we get some more results in. Some of the results we got for these CPUs also looked to be overclocked, which reduces the sample pool even further.

  • David (PassMark)
    replied
    Originally posted by proboszcz View Post
    That would give much more consistent results and would not trigger conspiracy theories...
    Running more tests means a longer run time, and our aim was to have a relatively quick benchmark (as opposed to others on the market that can take hours to run).
    So we selected representative tests and short test periods. That decision was made around a decade ago. There is obviously a test-time/accuracy trade-off that we made. It wouldn't matter what we did; people are still going to come up with conspiracy theories as soon as they see a result they don't like or that doesn't favor the CPU they just bought.
    To really understand CPU performance you need a degree in computer science and an in-depth study of the domain. There are only a few people who truly understand x86 assembler, SIMD, NUMA, pipelining, variable alignment, caching, compilers, the Windows kernel, branch prediction, microcode, etc. (and to be clear, some of this stuff we also only half understand). So it is all too easy to explain a complex issue as just being a conspiracy. Plus it makes great clickbait for the publishers.



  • CerianK
    replied
    Intel 10xxxX processors are now at the top of the online single-thread chart, reading up to 3614. I loaded the latest individual 10.0.1004 baselines and see only about 2700.
    For some reason I mistakenly thought those were the new refresh processor leaks... it took me a bit to realize that was not the case.

  • proboszcz
    replied
    Originally posted by David (PassMark) View Post

    No. I am saying you can't draw any conclusions about current-generation Intel/AMD IPC from the example you provided. One CPU was 7 years old. It wasn't even close to a fair comparison. I never said anything about .NET at all.
    In my previous post I agreed with you about the IPC comparison. What triggered me was that you told me it is irrelevant to check whether the library really supports the IA SHA Extensions on AMD processors, because even if it does not, this is a real-world example; I do not agree with that at all.

    I also explained why I showed you the results from Azure (the same tier of machine, priced almost the same, differing only in CPU vendor). As you may have also noticed, my comparison was a real-world case: when you deploy your application to Azure you are deploying it to a shared environment, and you want to know what the performance is in that shared environment (not in a synthetic one where the CPU is not doing anything else). I repeated my tests several times and the results were quite consistent (no more than a 5% difference), so I don't think the shared environment was introducing enough error to make the comparison irrelevant, as you implied.

    Originally posted by David (PassMark) View Post


    SHA hardware acceleration is used when available. Some CPUs from both Intel and AMD support it. See this post for some SHA-related graphs.
    And at the moment this grossly favours AMD in the multi-threaded test.

    But this forum topic is about the single-threaded result, and SHA is NOT USED AT ALL FOR THE SINGLE-THREADED RESULT. So it is irrelevant.



    Decompressing is easier and is normally disk bound, not CPU bound. Getting optimal compression is more CPU intensive.

    Thank you for checking the support of the SHA Extensions. However, can you explain why some workloads are only tested in the multithread run? Doing it that way means the multithread results can be vastly different from the single-thread results, which misleads people and leads to exactly these discussions, misunderstandings, and strange-looking behavior. In my opinion the single-thread tests should run the same workloads as the multithread ones but capped to one core (see the sketch below). That would give much more consistent results and would not trigger conspiracy theories...
    For example, the encryption tests should be part of the single-thread results, because on their PCs people mostly use encryption in a single-threaded manner (mainly when a web browser establishes an SSL/TLS channel).
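
    As a sketch of the "capped to one core" idea (a hypothetical helper; Process.ProcessorAffinity is a real .NET API on Windows and Linux, though not on macOS):

    using System;
    using System.Diagnostics;

    public static class SingleCoreCap
    {
        // Pin the current process to logical CPU 0 so that even a workload
        // which spawns many threads is time-sliced onto a single core.
        public static void PinToFirstCore()
        {
            using var self = Process.GetCurrentProcess();
            self.ProcessorAffinity = (IntPtr)0b1; // bit 0 => logical CPU 0 only
        }
    }

    Note that a pinned multithreaded run still pays thread-scheduling and synchronisation costs that genuinely single-threaded code does not, so the two approaches are close but not identical.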

  • David (PassMark)
    replied
    Originally posted by proboszcz View Post
    So basically you are telling me that MS .NET Core, which I showed you, is not a real-life scenario?
    No. I am saying you can't draw any conclusions about current-generation Intel/AMD IPC from the example you provided. One CPU was 7 years old. It wasn't even close to a fair comparison. I never said anything about .NET at all.

    however ensuring that the IA SHA Extensions, which are not niche but mainstream, are supported is not necessary, because for you it is not a real-world scenario?
    SHA hardware acceleration is used when available. Some CPUs from both Intel and AMD support it. See this post for some SHA-related graphs.
    And at the moment this grossly favours AMD in the multi-threaded test.

    But this forum topic is about the single-threaded result, and SHA is NOT USED AT ALL FOR THE SINGLE-THREADED RESULT. So it is irrelevant.

    So why do the single-thread tests use only compression, even though decompression is a much more real-world scenario?
    Decompressing is easier and is normally disk bound, not CPU bound. Getting optimal compression is more CPU intensive.
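
    A rough way to see that asymmetry (a hypothetical sketch using .NET's GZipStream, timed entirely in memory; the timings depend heavily on the input data and compression level, and random input is the worst case for the compressor):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.IO.Compression;

    public static class GzipTiming
    {
        public static void Main()
        {
            var data = new byte[16 * 1024 * 1024];
            new Random(42).NextBytes(data); // incompressible input

            var sw = Stopwatch.StartNew();
            using var packed = new MemoryStream();
            using (var gz = new GZipStream(packed, CompressionLevel.Optimal, leaveOpen: true))
                gz.Write(data, 0, data.Length);
            Console.WriteLine($"compress:   {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            packed.Position = 0;
            using var unz = new GZipStream(packed, CompressionMode.Decompress);
            unz.CopyTo(Stream.Null);
            Console.WriteLine($"decompress: {sw.ElapsedMilliseconds} ms");
        }
    }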


  • proboszcz
    replied
    Originally posted by David (PassMark) View Post

    Doesn't really matter. That crypto library is used by thousands of software projects. So regardless of whether the code is optimal or not, it is what is being used in real life at the moment. And reflecting real-life performance is better than having code that is super-optimised (using techniques that aren't used in normal software).
    So basically you are telling me that MS .NET Core, which I showed you, is not a real-life scenario? Thousands of web applications are based on that technology. So your point is that your particular library, which may or may not use the IA SHA Extensions, is a better view of real-life applications than the core of thousands of applications on the web today?

    So adding the niche AVX512 to the tests is fine (which favors only Intel and works only on Intel), however ensuring that the IA SHA Extensions, which are not niche but mainstream, are supported is not necessary, because for you it is not a real-world scenario?

    Originally posted by David (PassMark) View Post

    Again, you can't compare IPC unless the clock speeds are the same.
    ...
    Yes, I agree with that. I showed you the benchmarks I was able to run myself, because that is the only hardware I currently have access to. However, previous members pointed you to other benchmarks and even a quite good YouTube clip showing that the IPC is greater on Zen 2 compared to Coffee Lake. At worst the IPC is the same on both.

    As for SHA256: you told me previously that your encryption tests use SHA256, and now you are saying that they do not? So why didn't you pick today's most common hashing algorithm for the, as you said, "real-world" scenario tests? Who then defines what a real-world test is?

    Speaking of real-world usage: how often are people compressing vs. decompressing? I will tell you: people decompress far more often. Even Windows stores things in memory in a compressed form, and it needs decompression far more often than compression. So why do the single-thread tests use only compression, even though decompression is a much more real-world scenario?
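
    For reference, the kind of SHA256 microbenchmark being argued about might look like the sketch below (hypothetical code; whether SHA-NI is actually used is decided by the platform crypto provider behind SHA256.Create(), CNG on Windows or OpenSSL on Linux, not by this source):

    using System;
    using System.Security.Cryptography;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class Sha256Bench
    {
        private byte[] _data = Array.Empty<byte>();

        [Params(1024, 1024 * 1024)] // 1 KB and 1 MB payloads
        public int Size;

        [GlobalSetup]
        public void Setup() => _data = new byte[Size];

        // Identical source can behave very differently across CPUs and
        // operating systems depending on the provider's use of SHA-NI.
        [Benchmark]
        public byte[] Hash()
        {
            using var sha = SHA256.Create();
            return sha.ComputeHash(_data);
        }
    }

    public static class ShaProgram
    {
        public static void Main() => BenchmarkRunner.Run<Sha256Bench>();
    }
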
    Last edited by proboszcz; Mar-24-2020, 03:05 PM.

  • David (PassMark)
    replied
    Originally posted by proboszcz View Post
    I think it would be OK to check whether that library is actually able to use those instructions on AMD processors - in the past there were many cases where libraries "saw" additional instructions only on GenuineIntel CPUs, despite AuthenticAMD CPUs having them available.
    Doesn't really matter. That crypto library is used by thousands of software projects. So regardless of whether the code is optimal or not, it is what is being used in real life at the moment. And reflecting real-life performance is better than having code that is super-optimised (using techniques that aren't used in normal software).

    Originally posted by proboszcz View Post
    Honestly this is not as ridiculous as you want to imply.
    Yes, it was. You can't compare IPC unless the clock speeds are the same.
    It's completely disingenuous to compare a 7-year-old part to a new part, then generalise the result to all new CPUs.

    It's like saying: my new $50K Toyota has better fuel efficiency than your $5K, 7-year-old Ford; therefore Toyota must be better than all Fords.

    Originally posted by proboszcz View Post
    You can always compile those sources and compare them by your self. I recently made the comparison on Azure VMs using that code EPYC vs XEON and the results were as follows:
    Again, you can't compare IPC unless the clock speeds are the same.
    Also, running in the cloud on shared hardware means you aren't going to see single-thread turbo speed very often (the machine will probably already have multiple cores under load from other users). You have no way of controlling the test environment. So was that Xeon running at its base speed of 2.1GHz or at its single-core turbo speed of 3.7GHz for the entire test period?

    If your argument was to show that BenchmarkDotNet returns vastly different results for SHA256 computations on AMD and Intel, then that is a fair point, but it is completely irrelevant to this topic. We don't even use SHA256 for our single-threaded test.
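
    To make the environment objection concrete: a hedged sketch of a noise probe (hypothetical code; the workload and run count are arbitrary). It repeats a fixed single-threaded loop and reports the spread of run times; on dedicated hardware the spread is tiny, while a wide spread on a shared cloud VM suggests noisy neighbours or frequency changes:

    using System;
    using System.Diagnostics;
    using System.Linq;

    public static class NoiseProbe
    {
        public static void Main()
        {
            var times = new double[20];
            for (int run = 0; run < times.Length; run++)
            {
                var sw = Stopwatch.StartNew();
                long acc = 1;
                for (long i = 0; i < 200_000_000; i++)
                    acc = acc * 31 + (i & 0xFF); // fixed single-threaded workload
                sw.Stop();
                if (acc == 42) Console.WriteLine(); // keep the loop from being elided
                times[run] = sw.Elapsed.TotalMilliseconds;
            }

            var sorted = times.OrderBy(t => t).ToArray();
            Console.WriteLine($"min {sorted[0]:F0} ms, median {sorted[sorted.Length / 2]:F0} ms, max {sorted[^1]:F0} ms");
        }
    }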

  • DwaSokoly
    replied
    Originally posted by DwaSokoly View Post

    Mr. David,
    in my opinion there is another issue with the multithread scores.

    The good thing is that there are better scores for multi-core processors, but something happened to ALL AMD processors older than the Zen core. Those scores are on average 33% lower than before. That could even be plausible if the previous algorithm was improper, but there are some very suspicious records, especially if you compare the scores of 2-core or 1-core CPUs. My examples (but there are many more on the site) from my observations:

    AMD Athlon X2 370K Dual Core: previous single-thread score 1461, actual single-thread score 1461; previous multithread score 2251, actual multithread score 1463 = 65% of the previous score. It's worth mentioning that this is the same score as the single-thread figure; it looks like the algorithm didn't use 2 cores...

    AMD Opteron 150 Single Core: previous single-thread score 725, actual single-thread score 725; previous multithread score 604, actual multithread score 393 = 65% of the previous score. In this case the multithread score is at 54% of the single-thread score...

    I could be wrong, but to me it looks like some error in the algorithm for all older AMD processors that lowers the scores by about 33%.

    Best Regards,
    Darek
    Hi! Mr. David, I know this is not a single-thread issue, but could you answer the above?
    I also found some inconsistency in other multithread scores. Example:

    Intel Core i7-975 (Nehalem-Bloomfield, 45nm) 4 cores, 8 threads, 3333MHz clock, 3467MHz all cores and 3600MHz 1 thread vs.
    Intel Core i7-980X (Westmere-Gulftown, 32nm) 6 cores, 12 threads, 3333MHz clock, 3467MHz all cores and 3600MHz 1 thread

    it looks like the only difference is the core count => i7-980X = 1.5x i7-975

    and the previous scores look correct:

    Intel Core i7-975 single 1460, multi 6135
    Intel Core i7-980X single 1455, multi 8808 = 144% of i7-975 score

    but the new scores are as follows:

    Intel Core i7-975 single 1550, multi 3600
    Intel Core i7-980X single 1497, multi 7444 = 207% of i7-975 score => way too much...

    Other 6-core, 12-thread Intel CPUs show the same jump compared to 4-core, 8-thread Intel CPUs.

    There is seriously something strange with the new algorithm.

    Darek
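
    For clarity, the multithread/single-thread scaling factors implied by the scores quoted above (a quick arithmetic sketch; the figures are the ones from this post):

    using System;

    public static class ScalingCheck
    {
        public static void Main()
        {
            // multithread score / single-thread score
            Print("i7-975  (4c/8t),  previous", 6135, 1460); // ~4.2x
            Print("i7-980X (6c/12t), previous", 8808, 1455); // ~6.1x
            Print("i7-975  (4c/8t),  new", 3600, 1550);      // ~2.3x
            Print("i7-980X (6c/12t), new", 7444, 1497);      // ~5.0x
        }

        static void Print(string cpu, double multi, double single) =>
            Console.WriteLine($"{cpu}: {multi / single:F1}x scaling");
    }

    On these figures it is the new i7-975 multithread score that breaks the pattern: its scaling factor drops from about 4.2x to 2.3x, while the i7-980X stays close to its previous ratio, which is consistent with the suspicion above.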
