Single Thread Score rating

  • #46
    Originally posted by CerianK View Post
    I do still see some indications of as much as 5% single-thread bias against AMD in Passmark if I compare to some custom workloads under Linux.
    However, Linux is not really directly comparable to Windows

    Considering the inconsistencies in Windows benchmarks, it might be a good idea for Passmark to list single-thread sub-scores for individual tests in the interest of full-disclosure.
    I think most of the Windows / Linux difference (especially for the high core count Ryzens) is what you already called out. Linux did a better job of improving the scheduler (and NUMA memory access) than Windows. Plus there are probably a lot of Linux machines not running the latest CPU microcode security patches.

    Having a Windows benchmark show a 5% difference from a completely different Linux benchmark isn't at all surprising.
    I'm surprised they are even that close, given the completely different environments.

    Comment


    • #47
      Originally posted by CerianK View Post
      Consider the new code changes:
      Code:
      + std::minstd_rand rng(RAND_SEED);
      - pbDataBuffer[i] = (rand() % 27) + 96;
      + pbDataBuffer[i] = (rng() % 27) + 96;
      Yes, 'RAND_SEED' indicates a constant declaration, so the sequence should be deterministic if upper bound 'i' is a constant also. However, the issue that was addressed is spending too much time generating random numbers, even though (from David): Based on that, I see some issues:
      1. The random numbers generated are 32-bit.
      2. The random numbers generated are not considered random by modern standards, even for non-cryptographic use (i.e. minstd_rand, and most others in the library, should be recommended for deprecation).

      You might accept #2 as a non-issue for benchmark purposes, but there may be hidden caveats.
      #1, however, will cause twice as much time (i.e. not a small part) to be spent on generating random numbers as should be necessary on a 64-bit processor, and could potentially end up testing how well a processor performs in 32-bit scenarios.
      Random number code is deterministic. Meaning exactly the same mathematical operations are performed on each run. So if the test environment stays the same, then the execution time stays the same.
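The determinism point can be checked with a small sketch. This is only an illustration, not PassMark's code: `kAssumedSeed` is a hypothetical stand-in for `RAND_SEED` (whose real value is not given in this thread), and `draw_sequence` and `same_seed_repeats` are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for RAND_SEED; the real constant is not shown in the thread.
constexpr unsigned kAssumedSeed = 1u;

// Draw `count` values from a freshly constructed, fixed-seed generator.
std::vector<std::minstd_rand::result_type> draw_sequence(unsigned seed, std::size_t count) {
    assert(count > 0);
    std::minstd_rand rng(seed);
    std::vector<std::minstd_rand::result_type> values;
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(rng());
    }
    return values;
}

// True when two independent runs with the same seed produce identical output.
bool same_seed_repeats(unsigned seed, std::size_t count) {
    return draw_sequence(seed, count) == draw_sequence(seed, count);
}
```

Calling `same_seed_repeats(kAssumedSeed, 1000)` should return true on any conforming standard library, because `std::minstd_rand` is fully specified by the C++ standard. The same arithmetic is executed on every run, so only the environment can change the timing.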

      For the other two points.

      1) Most 64bit code isn't using 64bit variables. It is a wasteful practice if you don't need it. So programmers will use variables like int, char, float, bool, etc. all the time and none of them are 64bit. A good C/C++ programmer will only use 64bit variables when they need to store numbers larger than 2^32. The situation for some scripting languages is different however. In Javascript, for example, you always get 64bit numbers, even if you only need 8 bits for the job. So Javascript is hugely inefficient for RAM usage.

      The C run time rand() function has been 32bit only for a long, long time. So to get a 64bit rand() you need to call it twice. In fact it isn't really even 32bit. It returns a pseudorandom integer in the range 0 to RAND_MAX (32767), which is only 15 bits. And if you look at our code, we shift the values into the ASCII range to simulate the compression of single byte plain text. Yes, the code could be faster, but the idea with benchmarks isn't to always write the fastest code possible. We try to write code that reflects code that is in common use (or will be in common use).
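As a rough sketch of the buffer fill described here, the expression from the quoted diff, `(rng() % 27) + 96`, maps every draw into the byte range 96 to 122, which is the backtick character followed by the lowercase letters. The function names, the buffer size and the seed below are assumptions for illustration; only the expression itself comes from the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fill a buffer with text-like bytes using the expression from the quoted diff.
// `seed` is a hypothetical parameter standing in for RAND_SEED.
std::vector<std::uint8_t> make_text_like_buffer(std::size_t size, unsigned seed) {
    assert(size > 0);
    std::minstd_rand rng(seed);
    std::vector<std::uint8_t> buffer(size);
    for (std::size_t i = 0; i < size; ++i) {
        // rng() % 27 lies in [0, 26]; adding 96 gives [96, 122].
        buffer[i] = static_cast<std::uint8_t>((rng() % 27) + 96);
    }
    return buffer;
}

// True when every byte lies in the inclusive range [low, high].
bool all_in_range(const std::vector<std::uint8_t>& buffer, std::uint8_t low, std::uint8_t high) {
    for (std::uint8_t byte : buffer) {
        if (byte < low || byte > high) {
            return false;
        }
    }
    return true;
}
```

Only 27 distinct byte values are ever produced, so a 15-bit or 32-bit generator is more than wide enough for this job. Data like this is compressible but not trivially so, unlike a buffer of zeros.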

      2) Totally irrelevant. We aren't encrypting anything.
      Semi random data is required for the compression test because if we compress a huge buffer full of zeros it isn't a realistic test case. It would be an edge case, not worthy of being a benchmark.
      There are lots of scenarios where a fast pseudo random number is preferred to cryptographic random. Random events in games for example (dice rolls, shuffling cards, weather events).

      Comment


      • #48
        David, thanks for replying.

        I had noticed that the single-thread results are not very RAM speed sensitive, as the 3800X build I mentioned is using very unremarkable RAM with high latency (bottom entry in the list you just posted... Chun Well = Oloy, BTW).

        Unrelated: Just upgraded another PC to 10.0.1004 and immediately re-ran locally because previous run on older version was via RDP, so there was no 3D result for GTX1060... would not let me upload the now 3D-complete new result due to being within 5% of previous Passmark rating. I'm not sure if allowing that kind of back-fill on GPU results is important to you.

        Comment


        • #49
          David, why didn't you answer my previous question about using IA SHA Extensions during testing? Why do the changes to single core in PassMark favor only one vendor? And I don't agree with you that the Ryzen 3000 series does not have an IPC advantage over the Intel Core, because many other single threaded benchmarks like Cinebench, POV-RAY, PassMark v9 show quite a different view. Also please look at the .NET Core sample results from here: https://github.com/djfoxer/DotNetFrameworkVsCore (this test is also single-threaded and shows quite a big advantage for Ryzen over Intel Core).

          Comment


          • #50
            Originally posted by proboszcz View Post
            David, why didn't you answer my previous question about using IA SHA Extensions during testing?
            Because it was a busy week. There was a global pandemic and the whole company had to move to working from home, we released new software, and we had 100s of fan boys complaining that results moved around a bit, who decided that sending abusive anonymous emails was the best way to deal with it.

            There is a description of the tests on this page.
            Part of the encryption test is SHA256. This is the implementation from the standard https://www.cryptopp.com/ library.
            Their documentation states
            "AES, CRC, GCM and SHA use ARM, Intel and PowerPC hardware acceleration when available". Their open source code seems to back this up. So yes, they should be used when available.


            Originally posted by proboszcz View Post
            Why do the changes to single core in PassMark favor only one vendor?
            It is a two horse race. Logically they can't both do better than each other in the same test.

            Originally posted by proboszcz View Post
            And I don't agree with you that the Ryzen 3000 series does not have an IPC advantage over the Intel Core ........ Also please look at the .NET Core sample results from here: https://github.com/djfoxer/DotNetFrameworkVsCore (this test is also single-threaded and shows quite a big advantage for Ryzen over Intel Core).
            Honestly that's a ridiculous argument.
            That page presents results for the Intel i7-4702MQ and the Ryzen 7 3700X.
            Of course the 3700X is faster. You are comparing a brand new AMD part to a 7 year old Intel part. The Intel part is for laptops with thermal constraints (37W TDP), while the AMD part is a 65W part for desktops. The clock speeds and RAM speeds are also better in the 3700X. You can't compare IPC between CPUs when the clock speeds aren't even the same.

            Comment


            • #51
              Originally posted by David (PassMark) View Post
              ....

              "AES, CRC, GCM and SHA use ARM, Intel and PowerPC hardware acceleration when available". Their open source code seems to back this up. So yes, they should be used when available.
              I think it would be OK to check whether that library is actually able to use those instructions on AMD processors. In the past there were many cases where libraries were "seeing" additional instructions only on Genuine Intel CPUs, even though Authentic AMD CPUs had them available.
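For illustration, a vendor-neutral check would look at the CPUID feature bit rather than the vendor string. The sketch below is an assumption about how such a check could be written with the GCC/Clang `<cpuid.h>` helper `__get_cpuid_count`; it is not taken from Crypto++ or PassMark. The SHA extensions are reported in CPUID leaf 7, sub-leaf 0, EBX bit 29. `bit_is_set` and `cpu_reports_sha_extensions` are made-up names, and on other compilers or architectures the function simply returns false.

```cpp
#include <cassert>
#include <cstdint>

// <cpuid.h> is a GCC/Clang header for x86 targets only.
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID_H 1
#else
#define SKETCH_HAVE_CPUID_H 0
#endif

// Test a single bit of a 32-bit register value.
bool bit_is_set(std::uint32_t value, unsigned bit) {
    assert(bit < 32);
    return ((value >> bit) & 1u) != 0;
}

// Ask the CPU itself whether it supports the SHA extensions.
// The vendor string is never consulted, so AMD and Intel parts are treated alike.
bool cpu_reports_sha_extensions() {
#if SKETCH_HAVE_CPUID_H
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // Returns 0 when leaf 7 is not supported by this CPU.
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) == 0) {
        return false;
    }
    return bit_is_set(ebx, 29);  // CPUID.(EAX=07H,ECX=0):EBX[29] = SHA
#else
    return false;  // Unknown platform: report "not available" rather than guess.
#endif
}
```

A library that gates its fast path on a check like this, instead of on the vendor name, would use the instructions on any CPU that reports them.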

              Originally posted by David (PassMark) View Post

              Honestly that's a ridiculous argument.
              That page presents results for the Intel i7-4702MQ and the Ryzen 7 3700X.
              Of course the 3700X is faster. You are comparing a brand new AMD part to a 7 year old Intel part. The Intel part is for laptops with thermal constraints (37W TDP), while the AMD part is a 65W part for desktops. The clock speeds and RAM speeds are also better in the 3700X. You can't compare IPC between CPUs when the clock speeds aren't even the same.
              Honestly this is not as ridiculous as you want to imply. If you look at the PassMark single thread results, there are many 15W (U) Intel processors beating 95W Ryzen desktop processors. So if that comparison is so ridiculous to you, then the PassMark results should be too.
              I know that the results from that GitHub site do not fully show the IPC lead, probably because the author had access only to those two processors, one desktop grade and one mobile grade. However, they do show the enormous difference that can be seen in some workloads (like the SHA256 one, which is 10 times faster on AMD). You can always compile those sources and compare them yourself. I recently made the comparison on Azure VMs using that code, EPYC vs XEON, and the results were as follows:

              Standard D2as_v4 VM:
              Code:
              BenchmarkDotNet=v0.12.0, OS=Windows 10.0.17763.973 (1809/October2018Update/Redstone5)
              AMD EPYC 7452, 1 CPU, 2 logical cores and 1 physical core
              .NET Core SDK=3.1.102
                [Host]     : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
                DefaultJob : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
              
              
              |               Method |             Mean |           Error |          StdDev |
              |--------------------- |-----------------:|----------------:|----------------:|
              |            EnumParse |         176.9 ns |         1.93 ns |         1.71 ns |
              | LinqOrderBySkipFirst | 205,018,995.6 ns | 1,089,789.03 ns | 1,019,389.34 ns |
              |               Sha256 |  64,276,748.2 ns |   181,903.50 ns |   161,252.72 ns |
              |     StringStartsWith | 641,017,313.3 ns | 3,507,783.77 ns | 3,281,183.12 ns |
              |          Deserialize | 425,845,785.7 ns | 4,218,624.54 ns | 3,739,700.77 ns |
              Standard D2s_v3 VM:
              Code:
              BenchmarkDotNet=v0.12.0, OS=Windows 10.0.17763.973 (1809/October2018Update/Redstone5)
              Intel Xeon Platinum 8171M CPU 2.60GHz, 1 CPU, 2 logical cores and 1 physical core
              .NET Core SDK=3.1.102
                [Host]     : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
                DefaultJob : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
              
              
              |               Method |             Mean |           Error |          StdDev |
              |--------------------- |-----------------:|----------------:|----------------:|
              |            EnumParse |         189.0 ns |         1.95 ns |         1.73 ns |
              | LinqOrderBySkipFirst | 263,568,846.7 ns | 3,371,004.70 ns | 3,153,239.89 ns |
              |               Sha256 | 619,563,485.7 ns | 5,038,128.82 ns | 4,466,169.97 ns |
              |     StringStartsWith | 852,568,914.3 ns | 6,602,072.12 ns | 5,852,564.97 ns |
              |          Deserialize | 426,299,873.3 ns | 6,837,428.74 ns | 6,395,735.09 ns |
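To make the two tables easier to compare, here is a small sketch that divides the Xeon mean by the EPYC mean for each method. The timings are copied from the "Mean" columns above; the struct and function names are made up for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the comparison: mean time per method on each VM, in nanoseconds.
struct MethodTiming {
    std::string method;
    double epyc_ns;  // Standard D2as_v4 (AMD EPYC 7452)
    double xeon_ns;  // Standard D2s_v3 (Intel Xeon Platinum 8171M)
};

// Mean values copied from the two BenchmarkDotNet tables posted above.
std::vector<MethodTiming> posted_timings() {
    return {
        {"EnumParse", 176.9, 189.0},
        {"LinqOrderBySkipFirst", 205018995.6, 263568846.7},
        {"Sha256", 64276748.2, 619563485.7},
        {"StringStartsWith", 641017313.3, 852568914.3},
        {"Deserialize", 425845785.7, 426299873.3},
    };
}

// How many times longer the Xeon run took than the EPYC run.
double xeon_over_epyc(const MethodTiming& timing) {
    assert(timing.epyc_ns > 0.0);
    return timing.xeon_ns / timing.epyc_ns;
}
```

The Sha256 ratio comes out at roughly 9.6, while the other four methods fall between about 1.0 and 1.33. These are wall-clock ratios only and say nothing about the clock speed each VM was running at.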

              Comment


              • #52
                Originally posted by dylandog View Post

                David there are many tests and benchmarks that show ryzen 3000 with a higher ipc and multicore than coffee lake (like here https://www.youtube.com/watch?v=DjBC_SzEKh4) and it seems that you don't want to accept it.....ryzen 3950x beats 9900ks in most single thread applications but this new update totally broke ryzen 3000.......while in gaming coffee lake is faster due to lower latency
                thz i didnt see wrong link https://www.youtube.com/watch?v=1L3Hz1d6Y9o&t=2s

                Comment


                • #53
                  Originally posted by DwaSokoly View Post

                  Mr.David,
                  in my opinion there is another issue with Multithread Scores.

                  The good thing is that there are better scores for multi-core processors, but something happened with ALL AMD processors older than the Zen core. These scores are (on average) 33% lower than before. That could even be possible if the previous algorithm was improper, but there are some very suspicious records, especially if you compare 2 core or 1 core CPU scores. My examples (there are many more on the site) from my observations:

                  AMD Athlon X2 370K Dual Core: previous single thread score: 1461, actual single thread score: 1461, previous multithread score: 2251, actual multithread score: 1463 = 65% of previous score - it's worth mentioning that this is the same as the single thread figure - it looks like the algorithm didn't use 2 cores....

                  AMD Opteron 150 Single Core: previous single thread score: 725, actual single thread score: 725, previous multithread score: 604, actual multithread score: 393 = 65% of previous score - in this case the multithread score is at 54% of the single thread score....

                  I could be wrong, but to me it looks like an error in the algorithm for all older AMD processors, which lowers the scores by about 33%.

                  Best Regards,
                  Darek
                  Hi! Mr David - I know that this is not a single thread issue, but could you answer the above?
                  I found there is some mess in other multithread scores. Example:

                  Intel Core i7-975 (Nehalem-Bloomfield, 45nm) 4 cores, 8 threads, 3333MHz clock, 3467MHz all cores and 3600MHz 1 thread vs.
                  Intel Core i7-980X (Westmere-Gulftown, 32nm) 6 cores, 12 threads, 3333MHz clock, 3467MHz all cores and 3600MHz 1 thread

                  it looks like the only difference is the core count => i7-980X = 1.5x i7-975

                  and the previous scores look right:

                  Intel Core i7-975 single 1460, multi 6135
                  Intel Core i7-980X single 1455, multi 8808 = 144% of i7-975 score

                  but new scores are as follows:

                  Intel Core i7-975 single 1550, multi 3600
                  Intel Core i7-980X single 1497, multi 7444 = 207% of i7-975 score => way too much...

                  Other 6 cores, 12 threads Intel CPU have the same jump compared to Intel 4 cores, 8 threads CPU.
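The percentages quoted in this post can be reproduced with a one-line helper. This is just the arithmetic behind the figures above; `percent_of` is a made-up name.

```cpp
#include <cassert>
#include <cmath>

// Express `score` as a whole-number percentage of `baseline`, rounded to nearest.
long percent_of(double score, double baseline) {
    assert(baseline > 0.0);
    return std::lround(100.0 * score / baseline);
}
```

For example, `percent_of(8808, 6135)` gives the 144% from the previous scores and `percent_of(7444, 3600)` gives the 207% from the new ones, against an expected `percent_of(6, 4)` of 150% from the core counts alone.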

                  There is seriously something strange with the new algorithm.

                  Darek

                  Comment


                  • #54
                    Originally posted by proboszcz View Post
                    I think it would be OK to check whether that library is actually able to use those instructions on AMD processors. In the past there were many cases where libraries were "seeing" additional instructions only on Genuine Intel CPUs, even though Authentic AMD CPUs had them available.
                    Doesn't really matter. That Crypto library is used by 1000s of software projects. So regardless of whether the code is optimal or not, it is what is being used in real life at the moment. And reflecting real life performance is better than having code that is super optimised (using techniques that aren't used in normal software).

                    Originally posted by proboszcz View Post
                    Honestly this is not as ridiculous as you want to imply.
                    Yes it was. You can't compare IPC unless the clock speeds are the same.
                    It's completely disingenuous to compare a 7 year old part to a new part, then generalise the result to all new CPUs.

                    It's like saying: my new $50K Toyota has better fuel efficiency than your $5K, 7 year old Ford, therefore all Toyotas must be better than all Fords.

                    Originally posted by proboszcz View Post
                    You can always compile those sources and compare them by your self. I recently made the comparison on Azure VMs using that code EPYC vs XEON and the results were as follows:
                    Again, you can't compare IPC unless the clock speeds are the same.
                    Also, running in the cloud on shared hardware means you aren't going to see single thread turbo speed very often (as the machine will probably already have multiple cores under load from other users). You have no way of controlling the test environment. So was that Xeon running at its base speed of 2.1GHz or its single core turbo speed of 3.7GHz for the entire test period?
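A quick sketch of why the unknown clock matters: elapsed time multiplied by clock frequency gives an estimated cycle count, and any per-clock comparison depends on that count. The function name is made up, and the 2.1GHz and 3.7GHz figures are the base and turbo speeds mentioned in this post, not measured values.

```cpp
#include <cassert>

// Estimated core cycles spent, given wall time in nanoseconds and clock in GHz.
// 1 GHz is one cycle per nanosecond, so the product is a cycle count.
double estimated_cycles(double elapsed_ns, double clock_ghz) {
    assert(elapsed_ns >= 0.0 && clock_ghz > 0.0);
    return elapsed_ns * clock_ghz;
}
```

Taking the posted Xeon Sha256 mean of 619,563,485.7 ns, the estimate is about 1.30 billion cycles at 2.1GHz and about 2.29 billion cycles at 3.7GHz. That is a spread of roughly 1.76x from the clock assumption alone, before any difference between the CPUs is considered.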

                    If your argument was to show that benchmarkdotnet returns vastly different results for SHA256 computations on AMD and Intel, then that is a fair point. But completely irrelevant to this topic. We don't even use SHA256 for our single threaded test.

                    Comment


                    • #55
                      Originally posted by David (PassMark) View Post

                      Doesn't really matter. That Crypto library is used by 1000s of software projects. So regardless of whether the code is optimal or not, it is what is being used in real life at the moment. And reflecting real life performance is better than having code that is super optimised (using techniques that aren't used in normal software).
                      So basically you are going to tell me, that MS .NET Core is not a real-life scenario, which I showed you? Thousands of web applications are based on that technology. So your point is that your particular library, which may or may not use the IA SHA Extensions, is a better view of real-life applications instead of a core of thousands of applications available in the web today?

                      So adding niche AVX512 to the tests is fine (which favors only Intel and works only on Intel), but ensuring that the IA SHA Extensions, which are not niche but used in the mainstream, are actually used is not necessary because for you it is not a real-world scenario?

                      Originally posted by David (PassMark) View Post

                      Again, you can't compare IPC unless the clock speeds are the same.
                      ...
                      Yes, I agree with that. I showed you the benchmarks I was able to run myself because I currently have access only to that hardware. However, previous members pointed you to other benchmarks and even a quite good YT clip showing that the IPC is greater on Zen2 compared to Coffee Lake. At worst the IPC is the same on both.

                      As for SHA256 - you told me previously that your encryption tests are using SHA256 and now you are saying that they are not? So why didn't you pick the most common hashing algorithm today for the, as you said, "real-world" scenario tests? Who then defines what a real-world test is?

                      Speaking of real-world usage - how often are people compressing vs decompressing? I will tell you: people are decompressing much more often. Even Windows stores things in memory in a compressed form, and requires decompression much more often than compression. So why are the single thread tests using only compression, even though decompression is a much more real-world scenario?
                      Last edited by proboszcz; 03-24-2020, 03:05 PM.

                      Comment


                      • #56
                        Originally posted by proboszcz View Post
                        So basically you are going to tell me, that MS .NET Core is not a real-life scenario, which I showed you?
                        No. I am saying you can't draw any conclusions about current generation Intel / AMD IPC by looking at the example you provided. One CPU was 7 years old. It wasn't even close to a fair comparison. I never said anything about .NET at all.

                        however ensuring that not niche but used in mainstream IA SHA Extensions is not necessary because for you it is not a real-world scenario?
                        SHA hardware acceleration is used when available. Some CPUs from both Intel and AMD support it. See this post for some SHA related graphs.
                        And at the moment this is grossly in AMD's favour for the multi-threaded test.

                        But, this forum topic is about the single threaded result and SHA is NOT USED AT ALL FOR THE SINGLE THREADED RESULT. So it is irrelevant.

                        So why single tests are using only compression, even though the decompression is a much more real-world scenario
                        Decompressing is easier and normally disk bound not CPU bound. Getting optimal compression is more CPU intensive.


                        Comment


                        • #57
                          Originally posted by David (PassMark) View Post

                          No. I am saying you can't draw any conclusions about current generation Intel / AMD IPC by looking at the example you provided. One CPU was 7 years old. It wasn't even close to a fair comparison. I never said anything about .NET at all.
                          In my previous post I agreed with you about the IPC comparison. What triggered me was that you told me it is irrelevant to check whether the library really supports the IA SHA Extensions on AMD processors because, even if it does not, this is a real-world example. I do not agree with that at all.

                          I also explained why I showed you the results from Azure (the same level of machine, priced almost the same, differing only in CPU vendor). As you may also have noticed, my comparison was a real-world case, because when you deploy your application in Azure you are deploying it to a shared environment and want to know what the performance is in that shared environment (not a synthetic one where the CPU is not doing anything else). I repeated my tests several times and the results were quite consistent (not more than 5% difference), so I don't think the shared environment introduced an error big enough to make those comparisons irrelevant, as you implied.

                          Originally posted by David (PassMark) View Post


                          SHA hardware acceleration is used when available. Some CPUs from both Intel and AMD support it. See this post for some SHA related graphs.
                          And at the moment this is grossly in AMD's favour for the multi-threaded test.

                          But, this forum topic is about the single threaded result and SHA is NOT USED AT ALL FOR THE SINGLE THREADED RESULT. So it is irrelevant.



                          Decompressing is easier and normally disk bound not CPU bound. Getting optimal compression is more CPU intensive.

                          Thank you for checking the support of SHA Extensions. However, can you explain to us why some workloads are only tested in multithread? I think this will mislead people and cause discussions like this one, because the multithread results can then be vastly different from the single thread results, which leads to misunderstandings and strange looking behavior. In my opinion the single thread tests should be performed the same way as the multi thread ones (using the same workloads) but capped to 1 core. That would result in much more consistent results and will not trigger conspiracy theories...
                          For example, the encryption tests should be a part of the single thread results, because people on their PCs mostly use encryption in a single threaded manner (mainly when surfing the internet, where the web browser establishes an SSL/TLS channel).

                          Comment


                          • #58
                            Intel 10xxxX processors now at top of the online single-thread chart reading up to 3614. I loaded the latest individual 10.0.1004 baselines and see only about 2700.
                            For some reason I mistakenly thought those were the new refresh processor leaks... it took me a bit to realize that was not the case.

                            Comment


                            • #59
                              Originally posted by proboszcz View Post
                              That would result in much more consistent results and will not trigger conspiracy theories....
                              Running more tests means a longer run time, and our aim was to have a relatively quick benchmark (as opposed to others on the market that can take hours to run).
                              So we selected representative tests and short test periods. That decision was made around a decade ago. There is obviously a test time / accuracy trade off that we made. It wouldn't matter what we did, people are still going to come up with conspiracy theories as soon as they see a result they don't like, or doesn't favor the CPU they just bought.
                              To really understand CPU performance you need a degree in computer science and an in-depth study of the domain. There are only a few people who truly understand x86 assembler, SIMD, NUMA, pipelining, variable alignment, caching, compilers, the Windows kernel, branch prediction, microcode, etc. (and to be clear, some of this stuff we also only half understand). So it is all too easy to explain a complex issue as just being a conspiracy. Plus it makes great click bait for the publishers.



                              Comment


                              • #60
                                Originally posted by CerianK View Post
                                Intel 10xxxX processors now at top of the online single-thread chart reading up to 3614. I loaded the latest individual 10.0.1004 baselines and see only about 2700.
                                For some reason I mistakenly thought those were the new refresh processor leaks... it took me a bit to realize that was not the case.
                                Yes, they do look a bit high.
                                They are pretty rare CPUs however. Looking at the few PTR10 results we have for these CPUs (on the 1004 build), it would indicate that they are going to drop a few places once we get some more results in. Some of the results we got for these CPUs also looked to be overclocked, so that reduces the sample pool even further.

                                Comment
