Single Thread Score rating


  • #16
    Originally posted by BenchmarkManiac View Post
    Ok. Now with more sample data there are no weird spikes among the AMD CPUs. As for the fact that Threadrippers perform better than the 3950X, which boosts to a higher frequency: one can claim that this is due to the better thermal design of Threadrippers and that the test is slow and very tough, etc.

    All your explanations of how Zen 2 is worse than Intel just can't explain why they are still better multithreaded. The 9900KS just can't perform worse than the 3800X then. It is boosted to 5 GHz on all cores and has the same number of cores. If it is 20% faster in ST it should just rip the 3800X to shreds multithreaded. So why does it lose to the 3800X in the multithreaded test?
    It looks like Zen CPUs have much better SMT and multi-core scaling than Intel's, which isn't new if you check other threaded apps. An 8-core, 16-thread 3800X scores around 9x its single-thread value, while the 9900K with the same core count sits at 6.7x.
    Neither scores exactly 8x as the core count would suggest; they land a bit above or below depending on how much the underlying architecture shares resources between threads.
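The scaling argument above can be written down as simple arithmetic. This is only a sketch: the function names are made up for illustration, and the post gives the scaling factors (about 9x and 6.7x) rather than the raw scores, so the raw scores passed to `scaling_factor` would have to come from the charts.

```cpp
#include <cassert>

// Multi-thread scaling factor: multi-thread score divided by single-thread score.
double scaling_factor(double single_thread_score, double multi_thread_score) {
    assert(single_thread_score > 0.0);
    return multi_thread_score / single_thread_score;
}

// Scaling per physical core. A value above 1.0 means SMT adds throughput
// beyond one thread per core; below 1.0 means cores contend for shared resources.
double per_core_efficiency(double scaling, int cores) {
    assert(cores > 0);
    return scaling / cores;
}
```

With the figures quoted in the post, `per_core_efficiency(9.0, 8)` gives 1.125 for the 8-core Zen 2 part and `per_core_efficiency(6.7, 8)` gives roughly 0.84 for the 9900K.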

    For example, one thread on that Intel processor can access all of the L3, which explains why it scores better than even Intel's other 5 GHz CPUs (8086K with 2800 points).
    Each Zen2 core can access half the L3 on die, with higher latency too, but that means with more cores under work it scales better as there are more resources left available: it's made for server workloads and scales very well accordingly.

    With that said, the 20% difference in single thread is somewhat misleading, as most high-end AMD processors today are much closer to Intel's on many benches, as others have pointed out. Maybe it would be better to point out which individual sub-tests favour each architecture?

    Comment


    • #17
      Originally posted by David (PassMark) View Post
      So from the results above it seems in V9 we claimed new generation 3000 series was 40% faster for single threaded. Clearly this wasn't very reflective of real life. The 3000 series wasn't 40% faster, except maybe in a few edge cases. Now with V10 results we are claiming the 3000 series is 13% faster than the 2000 series. Which seems a much more plausible number and more reflective of real life applications. So I think from that point of view the new results are a definite improvement.
      Hi David,

      first of all let me say I appreciate your efforts to update the CPU benchmark to better reflect today's software infrastructures. I work as software developer for a hardware company and we use PassMark as a quick comparison of CPUs. But I feel version 10 isn't very useful to us. I know that such heavy changes are not easy and cause a lot of fuss you have to deal with. So, I really hope you carefully listen to the community to improve the new version.

      I don't know what's going on but something definitely isn't working as it should. And don't get me wrong, I'm not interested in any AMD vs Intel discussions. I just care about the differences between CPUs and that it should be plausible. Which isn't the case with the new version.

      Let me give you an example, I own a Ryzen 5 1600 and recently upgraded to the Ryzen 5 3600. Nothing else changed on my system. Except the memory now can run at 3200 instead of 2666. I did some quick benchmarks (CPU-Z, Fritz Chess, Linpack, etc) and saw an improvement of 30-35% core for core. This is absolutely in line with the specs (up to 16-17% more boost clocks) and what AMD claimed for the Zen 2 architecture (+15% IPC). And this is also confirmed by many single core tests you can find on the Internet. Some show even improvements of about 40%.

      Now let's have a look at your v10 single thread numbers:

      AMD Ryzen 5 1600: 1934
      AMD Ryzen 5 3600: 2370

      This is an advantage of just 22-23% for the Ryzen 3600. So no, I cannot agree with you. At the moment the new version isn't more reflective of real life applications.

      Let's look back at the v9 single thread numbers:

      AMD Ryzen 5 1600: 1822
      AMD Ryzen 5 3600: 2804

      This is an advantage of almost 54% for the Ryzen 3600. This of course isn't very reflective of real life applications either. The question is, was the Ryzen 5 1600 rated too low or the Ryzen 5 3600 too high before? People probably do not care much if newer hardware is shown in a better light than it should be. But they do care if newer hardware "mysteriously" loses some teeth for no apparent reason.
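For anyone who wants to check the percentages in this post, here is a small sketch of the arithmetic. The function names are invented for illustration; the scores and the clock/IPC figures are the ones quoted above.

```cpp
#include <cassert>

// Percentage by which `newer` exceeds `older`.
double percent_gain(double older, double newer) {
    assert(older > 0.0);
    return (newer / older - 1.0) * 100.0;
}

// Expected overall gain (in percent) when a clock gain and an IPC gain
// multiply, e.g. +16% clock and +15% IPC.
double combined_gain(double clock_gain_pct, double ipc_gain_pct) {
    const double factor = (1.0 + clock_gain_pct / 100.0) * (1.0 + ipc_gain_pct / 100.0);
    return (factor - 1.0) * 100.0;
}
```

`percent_gain(1934, 2370)` is about 22.5 (the v10 scores) and `percent_gain(1822, 2804)` is about 53.9 (the v9 scores), while `combined_gain(16, 15)` is about 33.4, which sits between the two and inside the 30-35% range measured with the other benchmarks.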

      Anyway, something between v9 and v10 scores would be the most reflective solution in my opinion. For the moment the comparison chart doesn't look very reliable to me. And I don't feel it will change that much with more submitted results. It needs more changes.

      Comment


      • #18
        Originally posted by BenchmarkManiac View Post
        All your explanations of how Zen 2 is worse than Intel just can't explain why they are still better multithreaded. The 9900KS just can't perform worse than the 3800X then. It is boosted to 5 GHz on all cores and has the same number of cores. If it is 20% faster in ST it should just rip the 3800X to shreds multithreaded. So why does it lose to the 3800X in the multithreaded test?
        First of all, the i9-9900KS shouldn't be 20% ahead of the Ryzen 7 3800X in single core. Zen 2 has a small IPC advantage over Coffee Lake, while the i9-9900KS has a single core boost advantage of ~11% over the Ryzen 7 3800X. So the difference between the two should in fact be under 10% in single core. Multithreaded is another story. It also depends on SMT. I don't know the current situation, but when the first Zen was released it showed better SMT scaling than Skylake. Zen 2 shouldn't be worse, and Coffee Lake basically still is Skylake. So I would expect AMD to do better in multithreaded scenarios than in single thread.

        Comment


        • #19
          Take a look at the Cinebench standings, for example. In their single core test Threadrippers do not outperform the boosted 3950X. In the multicore test the 9900KS is indeed faster than the 3800X. The results look consistent and predictable.

          Comment


          • #20
            Originally posted by David (PassMark) View Post
            Next there is the question of should both the 2000 and 3000 series be higher up the new PT10 single thread rankings (i.e. above the best Intel CPUs)?
            There is no right answer for this as performance really depends on the particular application you are running. The best Intel and AMD CPUs are close enough at the moment that you can make a case either to be faster based on the applications you choose to run.

            Looking at the current results today we have.
            (These will change slightly over the coming weeks however)

            Intel Core i7-9700K @ 3.60GHz
            PT10 Single threaded result: 2812

            AMD Ryzen 7 2700X
            PT10 Single threaded result: 2182 (22% slower)

            AMD Ryzen 5 2600X
            PT10 Single threaded result: 2142 (23% slower).

            How does this compare to real life? Here is a result from Tom's Hardware running the POV-RAY application, single threaded.

            [Image: SingleThreadPT10-1.png]

            So for this real life app we see the following
            (remember lower score are better for this POV rendering test)

            Intel Core i7-9700K @ 3.60GHz
            POV-Ray result: 505

            AMD Ryzen 7 2700X
            POV-Ray result: 633 (25% slower)

            AMD Ryzen 5 2600X
            POV-Ray result: 636 (26% slower)

            So these results line up pretty well with PT10.

            Cinebench (single threaded) gives results of 16% and 18%.
            Y-Cruncher (single threaded) gives results of 88% and 95% (big difference is due to AVX instructions in Intel's CPU)

            So if we accept that the Ryzen 3000 series gives +13% performance over the 2000 series, then that still puts them around 10 to 15% behind the best Intel CPUs.

            I am sure it is possible to cherry-pick counter examples, but hopefully the majority of people will see the new results as an improvement over what we had.

            This doesn't totally address everything however. We still have a problem presenting results for which we have zero PT10 samples (the rare and old CPUs). We are looking at that problem at the moment. Hopefully we'll have a reasonable solution in the next couple of days and get those rare units into their correct rankings.



            And after CPU-Z and UserBenchmark, now PassMark also nerfs Ryzen performance... And David, you can see in HwGeek's comment that the fast POV-Ray single thread CPU is the Ryzen 3950X. Next time, choose a POV-Ray result that includes Ryzen 3000.

            Comment


            • #21
              Does PassMark v10 use the SHA Extensions when available during testing? Those instructions are supported by almost all cryptographic libraries for SHA-1 and SHA-256 hashes, but are present only in AMD Zen based processors and in a very small number of Intel based processors.
              If not, then the addition of AVX512 clearly favors only Intel based processors in the benchmark...

              The single thread results, as others pointed out, now clearly favor Intel processors. Other benchmarks, like Cinebench R20, clearly show that AMD Zen 2 based processors have a lead in IPC over Intel in single thread. But PassMark v10 does not show that. Why do these changes to the benchmark favor only one vendor? This is suspicious in the same way as when UserBenchmark, just after the Zen 2 launch, changed its score calculations to favor only one vendor - Intel of course.

              Why do such things happen, and why is it always Intel being favored? Why must benchmarks be altered when AMD launches a new architecture, while there is no such need when Intel does so?

              Comment


              • #22
                Originally posted by David (PassMark) View Post
                Hi,

                We released a new version of PerformanceTest a few days ago, version 10.
                Improvements in the benchmark test algorithms & using a more modern compiler resulted in the single threaded test performing a much higher number of operations per second. These changes should push the CPU harder and use modern CPU features (out of order execution and multiple pipelines) better. The result was roughly 3x more operations per second being performed, compared to PerformanceTest V9.

                Yesterday we started to switch over the graphs on the web site to start to use results from PerformanceTest V10. This accounts for the change in the results in the graphs.

                However in hindsight we think we have done the wrong thing. We should have scaled down the PT10 single threaded result to match the PT9 results for the single threaded test. This single threaded test was already an average of values from several different single threaded algorithms. So additional scaling wouldn’t have changed the significance of the value.

                On Monday (9th March 2020) we plan to patch the version 10 release to scale the single threaded value back to the PT9 results. Things should then be back to normal. In the meantime we have reverted the single threaded graph on the web site to use only PT9 results.

                As we collect more PT10 results we expect PT10 to perform better on modern CPUs compared to older ones (relative to PT9). So over time there might be a spreading out of the single threaded results, with the newer hardware pulling away from the older hardware a bit more.

                Sorry for any confusion all this has caused.

                More info:
                See also this post for some additional details
                https://www.passmark.com/forum/pc-ha...s-huge-changes
                Mr. David,
                in my opinion there is another issue, this time with the multithread scores.

                The good thing is that there are better scores for multi-core processors, but something happened with ALL AMD processors older than the Zen core. Their scores are (on average) 33% lower than before. That could even be plausible if the previous algorithm was improper, but there are some very suspicious records, especially if you compare 2-core or 1-core CPU scores. Some examples from my observations (there are many more on the site):

                AMD Athlon X2 370K Dual Core: previous single thread score: 1461, current single thread score: 1461, previous multithread score: 2251, current multithread score: 1463 = 65% of previous score. It's worth mentioning that this is the same as the single thread figure; it looks like the algorithm didn't use 2 cores...

                AMD Opteron 150 Single Core: previous single thread score: 725, current single thread score: 725, previous multithread score: 604, current multithread score: 393 = 65% of previous score. In this case the multithread score is at 54% of the single thread score...

                I could be wrong, but to me it looks like some error in the algorithm for all older AMD processors, which lowers the scores by about 33%.

                Best Regards,
                Darek

                Comment


                • #23
                  Originally posted by David (PassMark) View Post
                  So for this real life app we see the following
                  (remember lower score are better for this POV rendering test)

                  Intel Core i7-9700K @ 3.60GHz
                  POV-Ray result: 505

                  AMD Ryzen 7 2700X
                  POV-Ray result: 633 (25% slower)

                  AMD Ryzen 5 2600X
                  POV-Ray result: 636 (26% slower)

                  So these results line up pretty well with PT10.

                  Cinebench (single threaded) gives results of 16% and 18%.
                  Y-Cruncher (single threaded) gives results of 88% and 95% (big difference is due to AVX instructions in Intel's CPU)

                  So if we accept that the Ryzen 3000 series gives +13% performance over the 2000 series, then that still puts them around 10 to 15% behind the best Intel CPUs.

                  I am sure it is possible to cherry-pick counter examples, but hopefully the majority of people will see the new results as an improvement over what we had.
                  Originally posted by HwGeek View Post
                  Dear David,
                  As you see, there is a problem with the Zen 2.0 scores: even if you compare the bench from TH, the Zen 2.0 ST performance is better than Coffee Lake and only the 5 GHz 9900K/S can match it.
                  On average the 9900K and Ryzen 3900X/3950X should be within a ~5% margin, not ~20% like the PT10 ST scores show.
                  The discrepancy between the data you guys posted is interesting. The problem with a lot of the tests that you guys picked out is that they all use AVX. That means the 2700X and 2600X are at a huge disadvantage against the 9700K, since they don't have 256-bit wide AVX (they split it into two 128-bit instructions). Zen 2 does have full 256-bit wide AVX, so in the data HwGeek shows the gap to Intel closes (or Zen 2 even pulls ahead). I think the Y-Cruncher chart HwGeek posted shows this best, with all the last gen Threadrippers doing far worse than any Zen 2 processor, but the 9980XE outperforming everything just because of AVX512.

                  Comment


                  • #24
                    Look for example at your own table: there are the i7-4770K @ 3.5 GHz and the i5-4690K @ 3.5 GHz. With PT9 these CPUs had 2249 and 2235, which sounds pretty good (same clock and generation), but now with PT10 the i7 is rated at 1962 and the i5 at 2195. Back in the days when I bought this i5, the test scaled with clock speed when you overclocked it; now you cannot use it for a simple comparison within one product line. (These two CPUs are just an example; there are more where it doesn't look right at all.)
                    I would really like to get data that is at least linear for things that can more or less be compared apples to apples. The way the ratings are at the moment, you cannot even use them for a rough estimate of how a system performs, unless you know what the old PT9 results look like, how they changed with PT10, and how that is or isn't similar to other benchmarks. And at that point I will just use another benchmark, which is sad, because I was used to this one: it was fast to use and easy to look up for each CPU.

                    Comment


                    • #25
                      A really interesting update:

                      As background: The single threaded test is an aggregate of the floating point, string sorting and data compression tests (each of which is run in series on 1 core). The compression test uses Crypto++ Gzip (based on the DEFLATE compression algorithm). This test uses memory buffers totaling about 4MB per core.

                      AMD were kind enough to take a look at the single thread results, pull apart the code from PerformanceTest v10 to see what was being executed, and contact us about it.

                      From the 3 sub-tests, the data compression test was pulling the AMD 3000 series down the most (relative to other CPUs).

                      Deeper analysis on the data compression test showed that it wasn't doing as much compression as expected; it was spending an unexpectedly large portion of its time generating random data to be compressed. Generating random numbers was always part of the test, but it should have been a small part.

                      So this is the interesting part. We compared different Windows releases for the CPU Compression test. There was a 15% drop in the compression benchmark between Win10 Build 10240 & Win10 Build 18362 (we don't know exactly which patch caused the problem, but speculation is that it was one of the many security fixes). So it seems clear that patches on Windows significantly slowed down the test and the function that became slower was Rand(). But the aim of this test was not to measure the performance of Rand(), nor to measure the impact of that security patch. So we decided to change it. We can't have a situation where different Win10 versions so significantly impact the CPU score. (That might be fine for the 2D score, but not the CPU score.)

                      We swapped Rand() for minstd_rand, which is a different random number generator algorithm. Basically a two line change in the code without altering the functionality.

                      Code:
                      +    std::minstd_rand rng(RAND_SEED);
                      -        pbDataBuffer[i] = (rand() % 27) + 96;
                      +        pbDataBuffer[i] = (rng() % 27) + 96;
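For readers who want to see the changed line in context, here is a minimal, self-contained sketch of the buffer-filling step. The function name `make_test_data`, the `std::vector` buffer and the seed value are assumptions for illustration only; the post shows `RAND_SEED` by name but not its value, and it does not show the surrounding loop.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical seed; the real value of RAND_SEED is not given in the post.
constexpr unsigned kRandSeed = 12345u;

// Fill a buffer with pseudo-random bytes in the range 96..122
// (a backtick plus the letters 'a'..'z'), using the expression from the diff.
std::vector<unsigned char> make_test_data(std::size_t size, unsigned seed = kRandSeed) {
    assert(size > 0 && "an empty buffer gives the compressor nothing to do");
    std::minstd_rand rng(seed);  // local engine, no hidden per-thread CRT state
    std::vector<unsigned char> buffer(size);
    for (auto& byte : buffer) {
        byte = static_cast<unsigned char>((rng() % 27) + 96);
    }
    return buffer;
}
```

Because the engine is a local object with a fixed seed, two calls with the same arguments produce identical data, which is what a repeatable benchmark input needs.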
                      This was the impact on the single threaded test.
                      CPU Model        PerformanceTest 10.1003   PerformanceTest 10.1004   Increase in
                                       with Rand()               with minstd_rand()        benchmark result
                      i7-8700k         2,778                     3,003                      8.1%
                      i3-4160          1,871                     2,075                     10.9%
                      FX-8120          1,405                     1,652                     17.6%
                      Ryzen 9 3900     2,529                     3,022                     19.5%
                      Ryzen TR 3970X   2,500                     2,997                     19.9%
                      So all CPUs benefited. But AMD 3000 series got the most benefit.
                      After scaling all the CPU results back down to PT9 levels, the net benefit to the 3000 series should be around 9%. Which should help close the expectation gap people are referring to above. We won't know exactly until we get a few hundred new results come in.

                      But to be clear, the old code was totally valid code & rand() has been the go-to function for random number generation for 30+ years. Rand was used everywhere historically. Microsoft have now given it inconsistent performance however.

                      It's a good illustration of just how fickle CPUs are to different code. And to some degree what folly it is to rely on a single benchmark. People need to look at a range of results. It isn't reasonable to expect all benchmarks (or real life apps) to get consistent results across a range of CPUs. Despite the fact that this change should bring us slightly closer to community expectation, we are still of the opinion that diversity of results highlighting different aspects of a CPU is a good thing. (It really isn't a good thing if all benchmarks match Cinebench and POV).

                      This change has been rolled out in PerformanceTest V10 build 1004.

                      Comment


                      • #26
                        Between rolling out the new graphs, PT10 release and dealing with impact of coronavirus today, I didn't get time to read all the posts above.

                        Hopefully they are all polite & factual. Might get to them tomorrow. Software patch probably addresses some of it anyway.

                        Comment


                        • #27
                          You've given a link to the standard C++ library rand() function. It is statically linked into the code and shouldn't differ between Windows versions. If you are using some kind of crypto API Rand() implementation, then its performance can vary widely from one platform to another; it can cause context switches etc. and absolutely shouldn't be part of a timed portion of the benchmark.
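The point about keeping data generation out of the timed portion can be sketched like this. It is not PerformanceTest's code: the struct and function names are invented, and a simple FNV-1a hash stands in for the real compression workload, purely so the example stays self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct PhaseTimes {
    double generate_ms;      // time spent producing the input data
    double work_ms;          // time spent in the workload that should be scored
    std::uint64_t checksum;  // returned so the compiler cannot drop the work
};

PhaseTimes run_once(std::size_t size) {
    assert(size > 0);
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    // Phase 1: generate the input. Timed separately and not part of the score.
    const auto t0 = clock::now();
    std::minstd_rand rng(1);
    std::vector<unsigned char> data(size);
    for (auto& b : data) b = static_cast<unsigned char>((rng() % 27) + 96);
    const auto t1 = clock::now();

    // Phase 2: the workload. An FNV-1a hash is a stand-in for compression here.
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char b : data) { h ^= b; h *= 1099511628211ULL; }
    const auto t2 = clock::now();

    return { ms(t1 - t0).count(), ms(t2 - t1).count(), h };
}
```

Only `work_ms` would feed into a score; comparing it with `generate_ms` shows how much of a run goes into input generation, which is the kind of imbalance the deeper analysis of the compression test found.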

                          Comment


                          • #28
                            An attempt is being made to follow up with Microsoft to see if we can get them to tell us what they did to Rand() to mess it up. My guess is that internally Rand() calls some Windows API and a lot of the API calls had their performance affected by the Spectre, Meltdown, etc. security patches.

                            We had a quick look at the Rand() source code today (the part of it that was public). The core of it is pretty simple (just 2 lines of code in fact). So as BenchmarkManiac correctly pointed out, obviously that can't be different from one Windows version to the next. But there is a bunch of extra code managing per thread CRT status structures (100s of lines of code) and it isn't all public. There was far more code for the memory management of the thread state than for actually generating random numbers. So there is maybe something in that code.

                            The new minstd_rand generator isn't a crypto type random. i.e. it isn't truly random. It is still pseudo random. So no special hardware acceleration & no context switches.

                            If the documentation is to be believed the new function is basically 1 line of code. Being,
                            Code:
                            x = x * 48271 % 2147483647
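To make that one-liner concrete, here is a small sketch that implements the recurrence by hand and compares it with `std::minstd_rand` from `<random>`. The helper names are made up for illustration. One detail the one-liner hides: the product needs a 64-bit intermediate, because `x * 48271` overflows 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// One step of x = x * 48271 % 2147483647, using a 64-bit intermediate
// so the multiplication cannot overflow.
std::uint32_t minstd_step(std::uint32_t x) {
    assert(x > 0 && x < 2147483647u);
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(x) * 48271u) % 2147483647u);
}

// Check that the first n outputs of the hand-written recurrence match
// std::minstd_rand seeded with the same value.
bool matches_std_minstd(std::uint32_t seed, int n) {
    std::minstd_rand reference(seed);
    std::uint32_t x = seed;
    for (int i = 0; i < n; ++i) {
        x = minstd_step(x);
        if (x != reference()) return false;
    }
    return true;
}
```

Calling `matches_std_minstd(1, 10000)` compares ten thousand outputs; the whole generator state is a single integer held by the caller, so there is no per-thread bookkeeping of the kind described for Rand().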

                            Comment


                            • #29
                              The inconsistent results of "PerformanceTest 10.1003 with Rand()" will be completely removed from the standings calculation formula shortly, right?

                              Comment


                              • #30

                                Yes, should happen tomorrow.
                                (they would be diluted shortly in any case).

                                Calculations this morning (off a fairly small number of build 1004 samples) indicate that results will look more like this tomorrow. Ignore the ridiculous number of decimal places; the numbers are nothing like that accurate.


                                [Image: SingleThread-Build 1004.png]


                                Comment
