Single Thread Score rating


  • dylandog
    replied
    Originally posted by dylandog View Post

    David, there are many tests and benchmarks that show Ryzen 3000 with higher IPC and multicore performance than Coffee Lake (like here: https://www.youtube.com/watch?v=DjBC_SzEKh4) and it seems that you don't want to accept it. The Ryzen 3950X beats the 9900KS in most single-threaded applications, but this new update has totally broken Ryzen 3000. In gaming, Coffee Lake is faster due to its lower latency.
    Thanks, I didn't notice I had posted the wrong link. Here is the correct one: https://www.youtube.com/watch?v=1L3Hz1d6Y9o&t=2s



  • proboszcz
    replied
    Originally posted by David (PassMark) View Post
    ....

    "AES, CRC, GCM and SHA use ARM, Intel and PowerPC hardware acceleration when available". Their open source code seems to back this up. So yes, they should be used when available.
    I think it would be worthwhile to check whether that library is actually able to use those instructions on AMD processors - in the past there were many cases where libraries only "saw" additional instructions on GenuineIntel CPUs, even though AuthenticAMD CPUs had them available.
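
    For what it's worth, the check does not have to rely on the vendor string at all. Below is a minimal sketch (my own illustration, not Crypto++ or PassMark code; it assumes MSVC's __cpuidex intrinsic, while GCC/Clang would use __get_cpuid_count from <cpuid.h>) of reading the SHA-extensions feature bit directly from CPUID leaf 7, which is reported the same way by both vendors:
    Code:
    #include <intrin.h>   // __cpuidex (MSVC); GCC/Clang: <cpuid.h> and __get_cpuid_count
    #include <cstdio>
    
    // True when the CPU reports the SHA extensions, regardless of whether the
    // vendor string says GenuineIntel or AuthenticAMD.
    static bool HasShaExtensions()
    {
        int regs[4] = { 0 };               // EAX, EBX, ECX, EDX
        __cpuidex(regs, 7, 0);             // structured extended feature flags, sub-leaf 0
        return (regs[1] & (1 << 29)) != 0; // EBX bit 29 = SHA extensions
    }
    
    int main()
    {
        std::printf("SHA extensions: %s\n", HasShaExtensions() ? "yes" : "no");
        return 0;
    }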

    Originally posted by David (PassMark) View Post

    Honestly that's a ridiculous argument.
    That page presents results for the Intel i7-4702MQ and the Ryzen 7 3700X.
    Of course the 3700X is faster. You are comparing a brand new AMD part to a 7 year old Intel part. The Intel part is for laptops with thermal constraints (37W TDP) while the AMD part is a 65W desktop part. The clock speeds and RAM speeds are also better in the 3700X. You can't compare IPC between CPUs when the clock speeds aren't even the same.
    Honestly, this is not as ridiculous as you want to imply. If you look at the PassMark single-thread results, there are many 15W (U-series) Intel processors beating 95W AMD Ryzen desktop processors - so if that comparison is so ridiculous to you, then the PassMark results should be as well.
    I know that the results from that GitHub site do not fully show the IPC lead, because the author probably only had access to those two processors, one desktop grade and one mobile grade. However, they do show the enormous difference that can be seen in some workloads (like the SHA256 one, which is 10 times faster on AMD). You can always compile those sources and compare them yourself. I recently made the comparison on Azure VMs using that code, EPYC vs Xeon, and the results were as follows:

    Standard D2as_v4 VM:
    Code:
    BenchmarkDotNet=v0.12.0, OS=Windows 10.0.17763.973 (1809/October2018Update/Redstone5)
    AMD EPYC 7452, 1 CPU, 2 logical cores and 1 physical core
    .NET Core SDK=3.1.102
      [Host]     : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
      DefaultJob : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
    
    
    |               Method |             Mean |           Error |          StdDev |
    |--------------------- |-----------------:|----------------:|----------------:|
    |            EnumParse |         176.9 ns |         1.93 ns |         1.71 ns |
    | LinqOrderBySkipFirst | 205,018,995.6 ns | 1,089,789.03 ns | 1,019,389.34 ns |
    |               Sha256 |  64,276,748.2 ns |   181,903.50 ns |   161,252.72 ns |
    |     StringStartsWith | 641,017,313.3 ns | 3,507,783.77 ns | 3,281,183.12 ns |
    |          Deserialize | 425,845,785.7 ns | 4,218,624.54 ns | 3,739,700.77 ns |
    Standard D2s_v3 VM:
    Code:
    BenchmarkDotNet=v0.12.0, OS=Windows 10.0.17763.973 (1809/October2018Update/Redstone5)
    Intel Xeon Platinum 8171M CPU 2.60GHz, 1 CPU, 2 logical cores and 1 physical core
    .NET Core SDK=3.1.102
      [Host]     : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
      DefaultJob : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
    
    
    |               Method |             Mean |           Error |          StdDev |
    |--------------------- |-----------------:|----------------:|----------------:|
    |            EnumParse |         189.0 ns |         1.95 ns |         1.73 ns |
    | LinqOrderBySkipFirst | 263,568,846.7 ns | 3,371,004.70 ns | 3,153,239.89 ns |
    |               Sha256 | 619,563,485.7 ns | 5,038,128.82 ns | 4,466,169.97 ns |
    |     StringStartsWith | 852,568,914.3 ns | 6,602,072.12 ns | 5,852,564.97 ns |
    |          Deserialize | 426,299,873.3 ns | 6,837,428.74 ns | 6,395,735.09 ns |



  • David (PassMark)
    replied
    Originally posted by proboszcz View Post
    David, why didn't you answer my previous question about using IA SHA Extensions during testing?
    Because it was a busy week. There was a global pandemic and the whole company had to move to working from home, we released new software, and we had hundreds of fanboys complaining that results moved around a bit who decided that sending abusive anonymous emails was the best way to deal with it.

    There is a description of the tests on this page.
    Part of the encryption test is SHA256. The implementation comes from the standard Crypto++ library (https://www.cryptopp.com/).
    Their documentation states:
    "AES, CRC, GCM and SHA use ARM, Intel and PowerPC hardware acceleration when available". Their open source code seems to back this up. So yes, they should be used when available.


    Originally posted by proboszcz View Post
    Why do the changes to the single-core test in PassMark favor only one vendor?
    It is a two-horse race. Logically, they can't both do better than each other in the same test.

    Originally posted by proboszcz View Post
    And I don't agree with you that the Ryzen 3000 series does not have an IPC advantage over Intel Core ........ Also please look at the .NET Core sample results from here: https://github.com/djfoxer/DotNetFrameworkVsCore (this test is also single-threaded and shows quite a big advantage for Ryzen over Intel Core).
    Honestly that's a ridiculous argument.
    That page presents results for the Intel i7-4702MQ and the Ryzen 7 3700X.
    Of course the 3700X is faster. You are comparing a brand new AMD part to a 7 year old Intel part. The Intel part is for laptops with thermal constraints (37W TDP) while the AMD part is a 65W desktop part. The clock speeds and RAM speeds are also better in the 3700X. You can't compare IPC between CPUs when the clock speeds aren't even the same.



  • proboszcz
    replied
    David, why didn't you answer my previous question about using IA SHA Extensions during testing? Why do the changes to the single-core test in PassMark favor only one vendor? And I don't agree with you that the Ryzen 3000 series does not have an IPC advantage over Intel Core, because many other single-threaded benchmarks like Cinebench, POV-Ray and PassMark v9 show quite a different picture. Also please look at the .NET Core sample results from here: https://github.com/djfoxer/DotNetFrameworkVsCore (this test is also single-threaded and shows quite a big advantage for Ryzen over Intel Core).



  • CerianK
    replied
    David, thanks for replying.

    I had noticed that the single-thread results are not very RAM speed sensitive, as the 3800X build I mentioned is using very unremarkable RAM with high latency (bottom entry in the list you just posted... Chun Well = Oloy, BTW).

    Unrelated: I just upgraded another PC to 10.0.1004 and immediately re-ran locally, because the previous run on the older version was via RDP and so there was no 3D result for the GTX 1060... it would not let me upload the now 3D-complete new result because it was within 5% of the previous PassMark rating. I'm not sure if allowing that kind of back-fill on GPU results is important to you.



  • David (PassMark)
    replied
    Originally posted by CerianK View Post
    Consider the new code changes:
    Code:
    + std::minstd_rand rng(RAND_SEED);
    - pbDataBuffer[i] = (rand() % 27) + 96;
    + pbDataBuffer[i] = (rng() % 27) + 96;
    Yes, 'RAND_SEED' indicates a constant declaration, so the sequence should be deterministic if the upper bound 'i' is a constant also. However, the issue that was addressed is spending too much time generating random numbers, even though (from David): "Generating random numbers was always part of the test, but it should have been a small part." Based on that, I see some issues:
    1. The random numbers generated are 32-bit.
    2. The random numbers generated are not considered random by modern standards, even for non-cryptographic use (i.e. minstd_rand, and most others in the library, should be recommended for deprecation).

    You might accept #2 as a non-issue for benchmark purposes, but there may be hidden caveats.
    #1, however, will cause twice as much time (i.e. not a small part) to be spent on generating random numbers as should be necessary on a 64-bit processor, and could potentially end up testing how well a processor performs in 32-bit scenarios.
    The random number code is deterministic, meaning exactly the same mathematical operations are performed on each run. So if the test environment stays the same, then the execution time stays the same.

    For the other two points.

    1) Most 64-bit code isn't using 64-bit variables. It is a wasteful practice if you don't need them. So programmers use variables like int, char, float, bool, etc. all the time, and none of them are 64-bit. A good C/C++ programmer will only use 64-bit variables when they need to store numbers larger than 2^32. The situation for some scripting languages is different, however. In JavaScript you always get 64-bit numbers even if you only need 8 bits for the job, so JavaScript is hugely inefficient in its RAM usage.

    The C run-time rand() function has been 32-bit only for a long, long time, so to get a 64-bit rand() you need to call it twice. In fact it isn't really even 32-bit: it returns a pseudorandom integer in the range 0 to RAND_MAX (32767), which is only 15 bits. And if you look at our code, we shift the values into the ASCII range to simulate the compression of single-byte plain text. Yes, the code could be faster, but the idea with benchmarks isn't always to write the fastest code possible. We try to write code that reflects code that is in common use (or will be in common use).

    2) Totally irrelevant. We aren't encrypting anything.
    Semi-random data is required for the compression test, because compressing a huge buffer full of zeros isn't a realistic test case. It would be an edge case, not worthy of being a benchmark.
    There are lots of scenarios where a fast pseudo random number is preferred to cryptographic random. Random events in games for example (dice rolls, shuffling cards, weather events).
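
    To make that concrete, here is a minimal sketch of the kind of deterministic buffer fill being described (my own illustration; RAND_SEED and the buffer size are placeholder values, not PassMark's actual constants). With a fixed seed, std::minstd_rand produces the identical byte sequence on every run, and the % 27 + 96 mapping keeps each byte in the lower-case ASCII region so the compressor sees plain-text-like input:
    Code:
    #include <cstdint>
    #include <random>
    #include <vector>
    
    int main()
    {
        const uint32_t RAND_SEED = 12345;     // placeholder seed, not PassMark's actual constant
        const size_t BUFFER_SIZE = 1 << 20;   // placeholder size: 1 MB
    
        std::minstd_rand rng(RAND_SEED);      // fixed seed -> identical sequence on every run
        std::vector<uint8_t> pbDataBuffer(BUFFER_SIZE);
    
        // Map each draw into the 96..122 range ('`' to 'z'), mimicking single-byte plain text.
        for (size_t i = 0; i < pbDataBuffer.size(); ++i)
            pbDataBuffer[i] = static_cast<uint8_t>((rng() % 27) + 96);
    
        // ... a compression or encryption test would consume pbDataBuffer here ...
        return 0;
    }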



  • David (PassMark)
    replied
    Originally posted by CerianK View Post
    I do still see some indications of as much as 5% single-thread bias against AMD in Passmark if I compare to some custom workloads under Linux.
    However, Linux is not really directly comparable to Windows

    Considering the inconsistencies in Windows benchmarks, it might be a good idea for Passmark to list single-thread sub-scores for individual tests in the interest of full-disclosure.
    I think most of the Windows / Linux difference (especially for the high core count Ryzens) is what you already called out. Linux did a better job of improving the scheduler (and NUMA memory access) than Windows. Plus there are probably a lot of Linux machines not running the latest CPU microcode security patches.

    Having a Windows benchmark showing a 5% difference from a completely different Linux benchmark isn't at all surprising.
    I'm surprised they are even that close, given the completely different environments.



  • David (PassMark)
    replied
    Originally posted by dylandog View Post

    David, there are many tests and benchmarks that show Ryzen 3000 with higher IPC and multicore performance than Coffee Lake (like here: https://www.youtube.com/watch?v=DjBC_SzEKh4) and it seems that you don't want to accept it. The Ryzen 3950X beats the 9900KS in most single-threaded applications, but this new update has totally broken Ryzen 3000. In gaming, Coffee Lake is faster due to its lower latency.
    That video was 20 minutes of streamed gameplay from the game ARK, reviewing the Genesis DLC. Nothing at all to do with benchmarking Ryzen.
    Plus it was all in Italian. I'll give you the benefit of the doubt and assume it was a typo and not blatant self-promotion of a channel you are associated with.
    We've got a super low tolerance for trolling, spammers, self-promotion and bots building back links. It's just a huge waste of time for everyone. I sometimes ban up to a dozen people a day (and after 10 years of doing it, it gets tiresome). You've been warned.



  • David (PassMark)
    replied
    Originally posted by BenchmarkManiac View Post
    So far it is only getting worse. The Ryzen 3950X is already losing even to the 3600X. This is nonsense.
    A comparison of the AMD Ryzen 9 3950X and AMD Ryzen 5 3600X is here:
    https://www.cpubenchmark.net/compare...00X/3598vs3494

    The situation reversed itself over the weekend. The AMD Ryzen 9 3950X is now very slightly in front.

    But I think this is a reflection of their actual single-threaded performance, not an anomaly. Even the non-X version, the Ryzen 5 3600, can beat the 3950X, depending on the application.

    Quote from TechRadar.
    "The story continues in our Middle Earth: Shadow of War benchmark. There, the Ryzen 9 3950X scored an average of 116fps at Full HD and 49fps at 4K. The Ryzen 5 3600 hit 118fps and 51fps for the same tests, shockingly beating the Ryzen 9 3950X. The Intel Core i9-9900K came out slightly ahead with 125fps at Full HD and 52fps at 4K."

    Just by coincidence, the same quote also said Intel's higher-end chip was faster.

    I am guessing you just assume that the more expensive CPU should perform significantly better in all tasks, but it doesn't.



  • David (PassMark)
    replied
    Originally posted by CerianK View Post
    Looking at recent benchmarks, many AMD builds cap the single-thread performance by locking the CPU to a speed well under its maximum rating.
    This produces a lower single thread performance and increases the deviation in all results.
    I just ran my son's new 3800X build and the single core result was about 2790, well above average since it was able to stretch up to 4.5GHz.
    Yes, there is a spread of results. But I don't think it is due to capped turbo clock speed. I think the main cause is differences in RAM setups (channels active, latency and RAM clocks).

    Compare these two graphs below from PT10. The Physics test is more sensitive to RAM setups than Encryption.

    Baseline 1207902 obviously has some minor (non RAM related) setup problem. But the other results are pretty consistent for Encryption.

    There are really big differences in the Physics test however, for the same set of machines. You can see the same for Prime numbers which is also sensitive to RAM (graph not shown).

    The single-threaded score is slightly affected by the RAM setup, though nowhere near as much as the Physics test.

    [Attached graph: 3800X-Encryption.png]

    [Attached graph: 3800X-Physics.png]

    Originally posted by CerianK View Post
    I just ran my son's new 3800X build and the single core result was about 2790, well above average since it was able to stretch up to 4.5GHz.
    The single-threaded average for the 3800X today is 2750, so in fact you are within 2% of the average now.



  • CerianK
    replied

    Consider the new code changes:
    Code:
    +    std::minstd_rand rng(RAND_SEED);
    -        pbDataBuffer[i] = (rand() % 27) + 96;
    +        pbDataBuffer[i] = (rng() % 27) + 96;
    Yes, 'RAND_SEED' indicates a constant declaration, so the sequence should be deterministic if the upper bound 'i' is a constant also. However, the issue that was addressed is spending too much time generating random numbers, even though (from David):
    Generating random numbers was always part of the test, but it should have been a small part.
    Based on that, I see some issues:
    1. The random numbers generated are 32-bit.
    2. The random numbers generated are not considered random by modern standards, even for non-cryptographic use (i.e. minstd_rand, and most others in the library, should be recommended for deprecation).

    You might accept #2 as a non-issue for benchmark purposes, but there may be hidden caveats.
    #1, however, will cause twice as much time (i.e. not a small part) to be spent on generating random numbers as should be necessary on a 64-bit processor, and could potentially end up testing how well a processor performs in 32-bit scenarios.
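
    To illustrate what point #1 is getting at (a sketch of my own, not PassMark's code): filling a buffer with raw random bytes from a 32-bit engine takes roughly twice as many generator calls as using a 64-bit engine such as std::mt19937_64, since each 64-bit draw yields 8 bytes instead of 4:
    Code:
    #include <cstdint>
    #include <cstring>
    #include <random>
    #include <vector>
    
    // Fill a buffer 8 bytes at a time from a 64-bit engine.
    static void Fill64(std::vector<uint8_t>& buf, uint64_t seed)
    {
        std::mt19937_64 rng(seed);
        for (size_t i = 0; i + 8 <= buf.size(); i += 8) {
            uint64_t v = rng();                         // one call yields 8 bytes
            std::memcpy(&buf[i], &v, 8);
        }
    }
    
    // Fill the same buffer 4 bytes at a time from a 32-bit engine (twice the calls;
    // std::minstd_rand actually yields only ~31 usable bits per call).
    static void Fill32(std::vector<uint8_t>& buf, uint32_t seed)
    {
        std::minstd_rand rng(seed);
        for (size_t i = 0; i + 4 <= buf.size(); i += 4) {
            uint32_t v = static_cast<uint32_t>(rng());  // one call yields at most 4 bytes
            std::memcpy(&buf[i], &v, 4);
        }
    }
    
    int main()
    {
        std::vector<uint8_t> buf(1 << 20);              // 1 MB placeholder buffer
        Fill64(buf, 1);
        Fill32(buf, 1);
        return 0;
    }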



  • BenchmarkManiac
    replied
    I am asking David, of course. If they used rand() from the library, I doubt they really focused on determinism. At the very least it can change from one compiler to another and from one library to another. When people care about reproducibility of the pseudorandom sequence, they write their own generator.



  • CerianK
    replied
    Originally posted by BenchmarkManiac View Post
    Do you really make random data for the test?
    If you are asking me, then no, no connection. In any case, Passmark is now using a (presumably constant-seeded?) LCG, minstd, instead of rand(), which David discussed previously in this thread.

    If it matters, I have written PRNG benchmarks in Windows and Linux for my own research and took notice of speed discrepancies in Passmark only after a Reddit poster made a comment to me that prompted me to review my own prior data. It is a coincidence that the original issue (corrected by Version 10.0.1004) was actually related to random numbers, but that did prompt me to post here.

    I am not convinced one way or the other whether there are any remaining issues that need to be addressed in Passmark, but I can authoritatively state that an AMD 3800X has the potential to be up to 5% faster in some workloads than the Passmark results would predict. Word has it that Intel has a better AVX implementation, which I have not yet tested in my work, but will soon. It wouldn't surprise me to find Intel dominating that test by at least 5%.



  • BenchmarkManiac
    replied
    I am thinking about your random data. Do you really generate random data for the test? That makes the test nonreproducible and nondeterministic. The data for the test should be identical from one run to another and from one machine to another; otherwise you will see these crazy random effects in the results. What did you expect if you are running the test on random input data?

    I mean... you can use a self-made linear congruential pseudorandom generator to generate identical sequences of data based on the same seed, but it seems that you aren't using this method and don't really bother about the determinism of the test.
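
    For reference, a minimal sketch of the kind of hand-rolled generator being suggested (the multiplier and increment here are the classic Numerical Recipes constants, chosen purely for illustration). Because the whole state transition is spelled out, the sequence is identical for a given seed regardless of compiler or C library:
    Code:
    #include <cstdint>
    #include <cstdio>
    
    // Simple 32-bit linear congruential generator with fully specified constants,
    // so the sequence does not depend on any particular compiler or C runtime.
    struct Lcg32
    {
        uint32_t state;
        explicit Lcg32(uint32_t seed) : state(seed) {}
        uint32_t next()
        {
            state = 1664525u * state + 1013904223u;  // Numerical Recipes constants
            return state;
        }
    };
    
    int main()
    {
        Lcg32 rng(42);                               // same seed -> same sequence everywhere
        for (int i = 0; i < 4; ++i)
            std::printf("%u\n", rng.next());
        return 0;
    }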



  • CerianK
    replied
    I do still see some indications of as much as 5% single-thread bias against AMD in Passmark if I compare to some custom workloads under Linux.
    However, Linux is not really directly comparable to Windows, as the same code run under both Windows and Linux will often show a bias towards Linux, which could be nearly 10% depending on exactly what is being tested. That indicates that the delta is time wasted doing something other than the intended workload, due to many possible factors (e.g. compiler, security mitigations, thread scheduler thrashing, etc.).

    Considering the inconsistencies in Windows benchmarks, it might be a good idea for Passmark to list single-thread sub-scores for individual tests in the interest of full-disclosure.

    Originally posted by dylandog View Post
    David, there are many tests and benchmarks that show Ryzen 3000 with higher IPC and multicore performance than Coffee Lake (like here: https://www.youtube.com/watch?v=DjBC_SzEKh4) and it seems that you don't want to accept it. The Ryzen 3950X beats the 9900KS in most single-threaded applications, but this new update has totally broken Ryzen 3000. In gaming, Coffee Lake is faster due to its lower latency.
    That YouTube link seems incorrect... it is Ark Genesis gameplay in Italian.

