CPU benchmarks huge changes?

This topic is closed.

  • CPU benchmarks huge changes?

    Hi,

    Today I wanted to check some CPU benchmarks and was really surprised to see that the charts have changed significantly.
    I've been using PassMark benchmarks for many years and have never seen such changes.
    Here are some examples:


    CPU        Passmark few days ago     Passmark today   Change in %

    i7-2600    8181                      5655             - around 30%
    x5670      7800                      7795             0%
    e5-2670    12374                     10506            - around 20%
    e5-1650    11736                     11733            0%
    q9500      3960                      2961             - around 30%
    i3-4130    4801                      4196             - around 15%

    i7-960     5230                      3858             - around 30%
    i7-980     7930                      8522             + around 8%

    Difference between i7-960 and i7-980 is more than 120% now??? Really?
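
As a quick arithmetic check of the gap quoted above, here is a minimal sketch that uses only the scores from the table. The helper name `percent_higher` is made up for illustration.

```python
def percent_higher(a: float, b: float) -> float:
    """Return how much higher `a` is than `b`, as a percentage of `b`."""
    return (a - b) / b * 100.0


# Scores quoted in the table above: (a few days ago, today)
i7_960 = (5230, 3858)
i7_980 = (7930, 8522)

gap_before = percent_higher(i7_980[0], i7_960[0])
gap_after = percent_higher(i7_980[1], i7_960[1])

print(f"i7-980 over i7-960, a few days ago: {gap_before:.1f}%")  # 51.6%
print(f"i7-980 over i7-960, today:          {gap_after:.1f}%")   # 120.9%
```

So the gap between the two chips went from roughly 52% to roughly 121% in the charts.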


    There are many more... And as you can see there are huge differences even for CPUs in the same generation...

    What happened? Which are the real numbers, the ones that we have used for many years or the new ones?

    Thank you in advance!

    Best regards,
    Nick

  • #2
    We released a new version of PerformanceTest a few days ago, version 10.

    Yesterday we started to switch over the graphs on the web site to use results from PerformanceTest V10 (PT10).

    For the single threaded result there were huge differences, but we are going to fix that up in a couple of days.

    For the CPUMark result there is more of a dilemma.

    We collected millions of benchmark results (baselines) that people sent in over the last few years from PerformanceTest V9 & V8. With millions of results we were able to get a pretty accurate average for each CPU model.

    But for PerformanceTest V10 we made really major changes to the CPU test algorithms. These changes included:
    - Using new CPU instructions (e.g. AVX512) only available in modern CPUs.
    - Use a more up to date compiler (Visual Studio 2019 instead of 2013), which also brings some code optimization.
    - Have better support for out of order execution, which is a feature of newer CPUs.
    - Updated the 3rd party libraries we use for some of the tests (including more modern versions of GZip, Crypto++ and Bullet Physics).
    - Fixed up a bunch of bugs that hurt performance (like some variable alignment issues and compiler optimization flags).
    - Completely rewrote some of the tests, e.g. removed TwoFish encryption and replaced it with the more common elliptic curve encryption.
    - Improving the algorithms to push more data through the CPU also results in more load on the cache and memory subsystem. So older CPUs, and those with inadequate cache or memory bandwidth, are expected not to perform so well with PT10.

    So the new individual PT10 results can't at least on the surface be compared to the PT9 results. They are really different. Probably the biggest algorithm overhaul in 20 years.

    BUT for the CPUMark value, which is a combination result derived from the results of all the individual tests, we scaled it back to PT9 levels. So the PT10 CPUMark is somewhat comparable to the PT9 CPUMark.

    Obviously we want to start using the PT10 results on our graphs. But if we wait until we have a million PT10 results, that might take a year. And in the meantime we have no results for any new CPUs on PT9, as nobody will be using PT9 anymore.

    So the solution we selected (the least worst solution from the collection of bad solutions) was to take all the average CPUMark values from PT9 (one value per CPU model) and then start averaging that with all the new PT10 results as they come in. What this means, especially for the first few weeks, is a lot of volatility as the graphs slowly move to reflect more of the PT10 results and less of the PT9 results. Initial PT10 results have a big impact, but each additional PT10 result has less impact as a new average is found.
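
The blending described above can be sketched roughly as follows. This is not PassMark's actual code: the post does not say how much weight the old PT9 average carries, so the sketch assumes it counts as a single sample (`prior_weight=1.0`), and all scores are hypothetical.

```python
def blended_cpumark(pt9_average, pt10_results, prior_weight=1.0):
    """Average the old PT9 figure with incoming PT10 results.

    Assumption: the PT9 average is treated like `prior_weight` ordinary
    samples. The real weighting used on the site is not stated.
    """
    total = pt9_average * prior_weight + sum(pt10_results)
    return total / (prior_weight + len(pt10_results))


# Hypothetical CPU: PT9 average of 8000, every PT10 submission scores 6000.
pt9_average = 8000.0
for n in (0, 1, 2, 3, 9, 99):
    value = blended_cpumark(pt9_average, [6000.0] * n)
    print(f"after {n:>2} PT10 results: {value:.1f}")
```

With these made-up numbers the first PT10 result moves the chart value from 8000 to 7000, while by the time about a hundred results are in, one more result moves it by well under one point. That is the "big impact first, less impact later" behaviour described in the post.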

    If anyone notices any really extreme moves, let us know, we can manually fix them up until the average gets better.

    Also if you want to look at the old V9 results, you can find them here
    https://www.cpubenchmark.net/pt9_cpu_list.php



    • #3

      Another effect of a new database being used for PerformanceTest V10 is that the percentile figures are going to change.

      We had V8 and V9 running collecting results for many years. So a new modern machine will compare very well against the average machine from the last 5 years.

      PerformanceTest V10 has only been collecting results for a week. So even a brand new machine will look kind of average against all the other relatively new machines that have been submitted in the last week.

      Here is a screen shot from the same machine running PerformanceTest V9 and V10.

      You can see that the percentile figures are down across the board, as now with PT10 you will be comparing your machine against a newer more powerful group of machines.
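
A minimal sketch of why the same score lands on a lower percentile once the comparison group changes. The two populations below are invented numbers, and `percentile_rank` is an illustrative helper, not the formula PerformanceTest uses.

```python
from bisect import bisect_left


def percentile_rank(score, population):
    """Percentage of the population that scored strictly below `score`."""
    ordered = sorted(population)
    below = bisect_left(ordered, score)
    return 100.0 * below / len(ordered)


# Hypothetical populations: years of mixed old and new machines versus
# one week of mostly recent machines.
older_mixed_population = [2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 12000, 15000]
recent_population = [8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 18000]

my_score = 11500
print(percentile_rank(my_score, older_mixed_population))  # 80.0
print(percentile_rank(my_score, recent_population))       # 40.0
```

The machine has not changed; only the group it is ranked against has.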

      [Image: Percentiles.png]



      • #4
        Hi, David,

        Thanks for your fast reply!

        So as far as I understand, for the first couple of weeks we should expect a lot of glitches, and they will be fixed in time.

        These are the issues that I found yesterday. I'm posting them again because they are not readable in the above post and I was not able to edit it:

        CPU        Passmark few days ago     Passmark today   Change in %

        i7-2600    8181                      5655             - around 30%
        x5670      7800                      7795             0%
        e5-2670    12374                     10506            - around 20%
        e5-1650    11736                     11733            0%
        q9500      3960                      2961             - around 30%
        i3-4130    4801                      4196             - around 15%

        i7-960     5230                      3858             - around 30%
        i7-980     7930                      8522             + around 8%


        Thanks for your cooperation!

        Best regards,
        Nick



        • #5
          I would prefer some option to go back to the old number system. The nice thing about the old system was you knew a 1000 rating was an AMD X2 3800/4200 or thereabouts (my memory is vague but it was one of those) and I had a lot of experience with that chip for years, so I knew exactly what one did for performance. It was like Celsius: you knew 0 was freezing and 100 was boiling.

          I refreshed the numbers on one of my comparisons today, and not only was I like "whoa, those are way off"... I was like, those aren't even in the right rank order... no way a 3.2 8320E low powered chip outclasses an 8350
          https://www.cpubenchmark.net/compare...74vs1780vs1781

          With the new system, it's arbitrary like Fahrenheit is arbitrary, with the numbers having no meaning for me... so I can't say I'm a fan of any changes.
          If it ain't broke, don't fix it? Or at least a checkbox in the top right corner, to go back to the old scores?



          • #6
            Hi,
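            For example, the score of the Intel Xeon E3-1230 V2 was 8868; now it is 7247. That is a drop of about 18 percent (put the other way, the old score was more than 20 percent higher than the new one).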

            Originally posted by David (PassMark)
            We released a new version of PerformanceTest a few days ago, version 10.
            If anyone notices any really extreme moves, let us know, we can manually fix them up until the average gets better.



            • #7
              My suggestion is to freeze the PT9 results, keep them public, and begin computing PT10 on a new page. This transition will make the average CPU Mark useless for a long time. The site should have a message explaining what is going on with the performance test. Combining results from PT9 with PT10 was not a good idea, in my view.



              • #8
                We collect around 500 new results a day. So after a week we should have 1000s of new results and the accuracy of the average result should significantly improve (it will take longer for rare CPUs).

                If you want to look at the old V9 results, you can find them here
                https://www.cpubenchmark.net/pt9_cpu_list.php

                It is also worth mentioning that,
                A) We eventually need to put out new benchmark algorithms to keep up with the hardware & compilers. We've been using nearly the same algorithms for around 8 years now.
                B) It doesn't make sense to release a new benchmark if it returns exactly the same numbers as the old benchmark and all the CPUs remain ranked in the same positions. So if nothing changes with the new release we really haven't been doing our job very well.



                • #9
                  Here is a more in depth analysis of the relative movements of various CPU models. I've pulled out some interesting CPU models and those with a higher number of PT10 samples. They are sorted by the percentage difference between PT9 and PT10 results.

                  Some samples get excluded from the average as they are overclocked, or running on the incorrect number of cores for the CPU model.
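
A rough sketch of that kind of filtering, assuming each submission records a core count and an overclocking flag. The `Sample` fields, the function name and the scores are hypothetical; the post does not describe how PassMark actually detects these cases.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    cpumark: float
    cores: int          # cores the benchmark actually ran on (assumed field)
    overclocked: bool   # assumed to be known for each submission


def average_cpumark(samples, expected_cores):
    """Average only samples that are not overclocked and use the expected core count.

    Returns (average, number_of_samples_kept); the average is None if nothing is kept.
    """
    kept = [s for s in samples if not s.overclocked and s.cores == expected_cores]
    if not kept:
        return None, 0
    return mean(s.cpumark for s in kept), len(kept)


# Hypothetical submissions for one 8-core CPU model.
samples = [
    Sample(6200, 8, False),
    Sample(6100, 8, False),
    Sample(7400, 8, True),   # overclocked: excluded
    Sample(3300, 4, False),  # wrong core count: excluded
]
avg, kept = average_cpumark(samples, expected_cores=8)
print(f"{kept} of {len(samples)} samples, average {avg}")
```

This corresponds to the "N of M" sample counts reported alongside each average.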

                  Note that it is very early days and these results could still move around a bit over the next few weeks.

                  As expected, the new CPUs don't move around too much, while the older models get punished, because the new test algorithms use some new features only available in new CPUs (see my post above).
                  (Samples = number of samples included in the average, out of those submitted using PT10.)

                  CPU model                       Samples   PT10 CPUMark  PT9 CPUMark  Movement
                  Intel Core i5-1035G4 @ 1.10GHz  2 of 2    9,987.37      9,032.00     11%
                  AMD Ryzen 9 3950X               18 of 20  39,651.49     35,646.00    11%
                  AMD Ryzen 7 2700                13 of 18  16,215.58     15,055.00    8%
                  AMD Ryzen 7 2700X               35 of 45  18,005.79     16,924.00    6%
                  AMD Ryzen 7 1700                11 of 14  14,641.00     13,939.00    5%
                  AMD Ryzen 9 3900X               47 of 55  32,890.65     31,954.00    3%
                  AMD Ryzen 5 1600                10 of 22  12,548.34     12,310.00    2%
                  AMD Ryzen 5 2600X               21 of 24  14,524.80     14,356.00    1%
                  Intel Core i9-9820X @ 3.30GHz   2 of 3    21,332.45     21,091.00    1%
                  AMD Ryzen 5 2600                21 of 28  13,570.40     13,492.00    1%
                  Intel Core i9-7900X @ 3.30GHz   3 of 3    21,974.50     21,869.00    0%
                  Intel Core i9-9900KF @ 3.60GHz  5 of 5    20,214.68     20,196.00    0%
                  AMD Ryzen 5 3500U               11 of 11  7,584.66      7,746.00     -2%
                  Intel Core i9-9900 @ 3.10GHz    3 of 3    18,095.73     18,549.00    -2%
                  AMD Ryzen 7 3750H               4 of 4    8,541.16      8,851.00     -4%
                  AMD Ryzen 5 2500U               8 of 8    7,109.68      7,382.00     -4%
                  AMD Ryzen 7 3700X               46 of 57  22,929.40     23,839.00    -4%
                  AMD Ryzen 7 3800X               19 of 21  23,448.01     24,488.00    -4%
                  Intel Core i7-9800X @ 3.80GHz   2 of 3    18,761.95     19,777.00    -5%
                  Intel Core i9-9900K @ 3.60GHz   35 of 39  19,081.29     20,205.00    -6%
                  AMD Ryzen 3 2200G               10 of 10  6,889.89      7,331.00     -6%
                  Intel Core i5-1035G1 @ 1.00GHz  5 of 5    8,031.25      8,624.00     -7%
                  Intel Core i9-9900KS @ 4.00GHz  3 of 6    19,771.00     21,470.00    -8%
                  Intel Core i5-1035G7 @ 1.20GHz  2 of 2    9,023.19      9,807.00     -8%
                  Intel Core i7-9700F @ 3.00GHz   2 of 2    14,967.00     16,415.00    -9%
                  Intel Core i7-9750HF @ 2.60GHz  2 of 3    13,308.80     14,598.00    -9%
                  Intel Core i7-8665U @ 1.90GHz   6 of 6    7,763.97      8,579.00     -10%
                  Intel Core i7-9700 @ 3.00GHz    7 of 7    14,354.01     15,933.00    -10%
                  AMD Ryzen 5 3600                68 of 79  17,895.22     19,864.00    -10%
                  AMD Ryzen 5 3600X               19 of 21  18,324.52     20,491.00    -11%
                  Intel Core i7-8700 @ 3.20GHz    19 of 19  13,460.50     15,136.00    -11%
                  Intel Core i7-1065G7 @ 1.30GHz  6 of 6    9,235.25      10,472.00    -12%
                  Intel Core i7-8750H @ 2.20GHz   17 of 18  10,879.81     12,388.00    -12%
                  Intel Core i7-8700K @ 3.70GHz   10 of 22  13,963.00     15,936.00    -12%
                  Intel Core i7-9700K @ 3.60GHz   19 of 26  14,931.64     17,213.00    -13%
                  Intel Core i5-9600K @ 3.70GHz   12 of 14  11,581.24     13,533.00    -14%
                  Intel Core i7-7700HQ @ 2.80GHz  11 of 12  7,483.70      8,758.00     -15%
                  Intel Core i7-7700K @ 4.20GHz   8 of 14   10,210.52     11,987.00    -15%
                  AMD Ryzen 5 3500X               3 of 3    13,514.13     15,962.00    -15%
                  Intel Core i7-9750H @ 2.60GHz   29 of 31  11,459.89     13,552.00    -15%
                  Intel Core i5-8265U @ 1.60GHz   14 of 15  6,735.17      7,979.00     -16%
                  Intel Core i7-6700K @ 4.00GHz   13 of 18  9,180.53      11,109.00    -17%
                  Intel Core i5-9400F @ 2.90GHz   14 of 15  9,919.70      12,030.00    -18%
                  Intel Core i7-6700HQ @ 2.60GHz  13 of 13  6,658.94      8,126.00     -18%
                  Intel Core i5-8250U @ 1.60GHz   15 of 16  6,250.10      7,636.00     -18%
                  AMD Ryzen 3 2200U               2 of 2    3,885.91      4,827.00     -19%
                  AMD A9-9420                     1 of 2    1,829.87      2,348.00     -22%
                  Intel Core i7-8550U @ 1.80GHz   7 of 9    6,409.11      8,226.00     -22%
                  Intel Core i7-2670QM @ 2.20GHz  2 of 2    4,404.29      5,878.00     -25%
                  Intel Core i7-4790K @ 4.00GHz   19 of 23  8,354.01      11,164.00    -25%
                  AMD FX-6100 Six-Core            3 of 3    4,014.83      5,407.00     -26%
                  AMD FX-8300 Eight-Core          2 of 2    5,765.97      7,780.00     -26%
                  Intel Core i7-4770 @ 3.40GHz    14 of 14  7,226.50      9,779.00     -26%
                  Intel Core i7-4790 @ 3.60GHz    18 of 18  7,151.74      9,991.00     -28%
                  AMD A9-9425                     2 of 2    1,742.04      2,450.00     -29%
                  Intel Core i7-7500U @ 2.70GHz   9 of 10   3,580.16      5,116.00     -30%
                  AMD FX-8350 Eight-Core          11 of 21  6,181.59      8,960.00     -31%
                  Intel Core i7-3770 @ 3.40GHz    13 of 14  6,390.64      9,277.00     -31%
                  AMD FX-9590 Eight-Core          2 of 2    6,955.94      10,188.00    -32%
                  AMD FX-6300 Six-Core            6 of 8    4,284.47      6,412.00     -33%
                  Intel Core i7-2600 @ 3.40GHz    9 of 10   5,235.99      8,178.00     -36%
                  AMD Phenom II X4 965            4 of 6    2,550.01      4,174.00     -39%
                  AMD FX-8320 Eight-Core          7 of 7    4,448.95      8,035.00     -45%
                  Intel Core i3-2100 @ 3.10GHz    3 of 3    1,898.44      3,695.00     -49%



                  • #10
                    Originally posted by choppergirl
                    I would prefer some option to go back to the old number system. The nice thing about the old system was you knew a 1000 rating was an AMD X2 3800/4200 or thereabouts (my memory is vague but it was one of those) and I had a lot of experience with that chip for years, so I knew exactly what one did for performance. It was like Celsius: you knew 0 was freezing and 100 was boiling.

                    I refreshed the numbers on one of my comparisons today, and not only was I like "whoa, those are way off"... I was like, those aren't even in the right rank order... no way a 3.2 8320E low powered chip outclasses an 8350
                    https://www.cpubenchmark.net/compare...74vs1780vs1781

                    With the new system, it's arbitrary like Fahrenheit is arbitrary, with the numbers having no meaning for me... so I can't say I'm a fan of any changes.
                    If it ain't broke, don't fix it? Or at least a checkbox in the top right corner, to go back to the old scores?
                    Originally posted by hespozel
                    My suggestion is to freeze the PT9 results, keep them public, and begin computing PT10 on a new page. This transition will make the average CPU Mark useless for a long time. The site should have a message explaining what is going on with the performance test. Combining results from PT9 with PT10 was not a good idea, in my view.

                    I strongly support Choppergirl's and Hespozel's suggestions. I think the newest version will not be accurate for a long time, and many people check the list every day. If we can't check accurate data, it will really bother me. I think the best way is:
                    1. Add a switch here https://www.cpubenchmark.net/socketType.html , so we can decide whether to check the PT9 or PT10 chart. (Default to PT9; change the default to PT10 when it's really accurate.)
                    2. Add a message at the top of the website to explain what's going on with the performance test.



                    • #11
                      Hello, I had this same exact burning question as the OP. Thanks for all the information.
                      I've been comparing some older laptops to desktops over the past year, which I use before selling.
                      I am shocked at the huge differences in 3rd generation and 4th generation CPU scores now, e.g. the 4910mq and 3740qm, especially after I did some practical tests to compare them with a 3570k in a desktop PC I have. The CPU usage, if I recall correctly, was 24% (3740), 23% (3570) and 20% (4900).

                      Anyway, I will stay tuned to see how this progresses, as I rely on these scores as a key indicator for my next purchase to use with my music projects.



                      • #12
                        Originally posted by David (PassMark)
                        - Have better support for out of order execution, which is a feature of newer CPUs.
                        Out of order execution has literally been here for decades, starting from the first Pentium in the 1990s on desktops. It is absolutely not a new feature.

                        The Single Thread performance benchmark is very suspicious. The Ryzen 3800X used to be in the top 10, and now it has lost 500 points and is placed 69th.



                        • #13
                          There is a discussion here about single thread performance.

                          New CPUs do a much much better job of out of order execution and there are more of them with this feature than in the past.

                          From Wikipedia,
                          "The high logical complexity of the out-of-order technique is the reason that it did not reach mainstream machines until the mid-1990s. Many low-end processors meant for cost-sensitive markets still do not use this paradigm due to the large silicon area required for its implementation"

                          But my statement was more of a general statement to encompass improvements in pipelines, queues, branch prediction and execution units. For example, the integer benchmark test previously didn't allow much out of order execution. This wasn't deliberate; it was just an artifact of the nearly 20-year-old code we had for the integer test. So the integer benchmark result was largely dependent on clock speed: at the same clock speed a Pentium and an i9 CPU could get roughly the same result (not exactly, but hopefully you get the idea). By restructuring the code, some out of order execution is now possible, if the CPU supports it and there are enough integer execution units in the CPU. So the benchmark result is now influenced by both the clock speed and the CPU architecture improvements (i.e. a CPU with OOO and two or more integer execution units will do better than before), meaning newer CPUs will tend to score better.
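
To illustrate the kind of restructuring meant here, the sketch below contrasts one long dependency chain with several independent chains. It is not the PerformanceTest code (which is not public in this thread), and the function names are invented. Python is used only to show the structure: an interpreter will not show the speed-up, whereas in compiled code the second form gives a CPU with several integer execution units independent work to overlap.

```python
def sum_single_chain(values):
    """Every addition depends on the previous one: one serial chain."""
    total = 0
    for v in values:
        total += v
    return total


def sum_four_chains(values):
    """Four independent accumulators: four chains a CPU could overlap."""
    a = b = c = d = 0
    n = len(values) - len(values) % 4  # largest multiple of 4 that fits
    for i in range(0, n, 4):
        a += values[i]
        b += values[i + 1]
        c += values[i + 2]
        d += values[i + 3]
    # Add the leftover elements that did not fill a group of four.
    return a + b + c + d + sum(values[n:])


data = list(range(1, 1003))
print(sum_single_chain(data) == sum_four_chains(data))  # True
```

Both functions compute the same result; only the dependency structure differs, which is why a benchmark written the first way mostly tracks clock speed.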

                          As an example, AMD Ryzen has four 128-bit execution units for floating point and vector operations.

                          I think the newest version will not be accurate for a long time
                          Using the data we have already got, we extrapolated it out today to fill in some of the holes for the rare and old CPUs. This should give them a more reasonable starting point for their new average. So I think it is already looking better today than it was yesterday. While there will surely be a few quirks (and I am sure people will let us know about them), I am fairly confident it will be looking a lot better in the coming weeks.



                          • #14
                            [Moderator] Deleted duplicate earlier version of this post.

                            Editing my last post...

                            I have been using your site for quite some time. It was reported to be a comprehensive, legit, and unbiased site.
                            I have based a lot of purchases and even directed some of my customers to include these benchmark ratings/reports
                            in their purchases. I also run a few media servers and this site was used to help gauge a server build or upgrade.
                            With that said, I don't remember any adjustments of any significance for Meltdown or Spectre patching... at least no reply to my inquiries

                            To have such dramatic swings in results especially in the negative it almost voids some of my purchases and consulting on purchases. (45 -60 since Nov, not much but a lot to me.)
                            Not to mention making the score suspect...
                            This site has now become nice charts and graphs, but is no longer accurate, reliable, or even an authority.
                            Very disappointed: no notice, no mention of the changes on the site. Changes this significant should be highlighted, but it has all been false reporting and bad information...

                            Good luck everyone.. enjoy the pictures!!

                            Here's a novel idea... add a V10 score column so that folks can see the disparity in scores and criteria (use a hover in the column title, to describe the major contributing factor in V10 [dated])

                            Respectfully,



                            • #15
                              I don't remember any adjustments of any significance for Meltdown or Spectre patching.
                              There were a few posts about it in the forum. But in the end the major impact was on those algorithms that did a lot of kernel context switches. This means software that made a lot of short requests to the Windows O/S to do various tasks suffered the most. So the CPU result didn't move very much, as there were nearly no operations performed in the Windows kernel. But the 2D and disk operations did suffer a bit. 2D was probably the worst, as it was almost entirely reliant on the O/S to perform operations (e.g. draw some text, bitblt an image).

                              To have such dramatic swings in results especially in the negative it almost voids some of my purchases
                              If you were happy with the hardware last week, there is no reason not to be happy today. The algorithms did need to change; some of them really were based on 20-year-old code. We'd surely get more criticism over the next 5 years if we didn't change. People would be asking why we were not using AVX512 instructions, why we are using a 10 year old physics library, why we are using encryption algorithms that no one uses anymore, why some of the code wasn't using the available compiler optimisation flags, etc.
                              Various users and the CPU vendors were already asking these questions years ago in fact.

                              To have such dramatic swings in results especially in the negative
                              Some CPUs will look better as the result of these changes. Some won't. It is just a fact of life that CPUs have different features and architectures, and different code benefits different CPUs. The actual number (the CPUMark) isn't really important. It doesn't have a meaningful unit of measurement (like MB/sec or frames per second), so the value of the number doesn't matter, positive or negative. All that matters is relative performance. So a lower CPUMark isn't bad if all the CPUs you compare it to are also lower. As per the table in my earlier post, we tried to avoid big swings for most new CPUs. Despite this being nearly mission impossible, we succeeded in this to a large degree.

                              There isn't one "right" benchmark number. You can't summarise the entire performance of a CPU with one number. What's right depends on how you use your computer. For some people, "right" means matching gaming performance. For others it will mean matching the performance of scientific applications. You can't do this with one number. So some people are always going to get upset and claim the number is wrong.

                              We totally get that change brings disruption. It is an enormous amount of work and pain for us as well. But we think it is necessary to remain relevant for the next 10 years.

                              If you want to look at the old V9 results to compare, you can find them here
                              https://www.cpubenchmark.net/pt9_cpu_list.php

                              You can also still download V9 of PerformanceTest to look at all the individual baseline submissions (all 1,200,000 of them)
