CPU Mark V9 vs V10 numbers, older CPUs


  • CPU Mark V9 vs V10 numbers, older CPUs

    I know I'm late to the party on this, but I just looked at the CPU Mark Mega List recently, and noticed the huge changes in V10 scores compared to previous V9 numbers.

    Yes, as has been stated before, some went up, some went down. But overall most went down 30-40% from V9 to V10, especially anything older than a few years or not top of the line: basically those below the 10,000 mark, where most PCs out there live.

    One particular note: comparing the V9 3/20/20 Mega List to the current V10 7/18/20 Mega List, there are a large number of mostly older CPUs where the CPU Mark number is identical in both lists. These cannot be real V10 data and should not be displayed in the V10 Mega List. I counted 343 CPUs with identical V9 and V10 CPU Mark numbers, in addition to the 226 CPUs which are no longer in the V10 list. Those false numbers or CPUs should probably be removed, or shown as NA since they have not been tested yet. Or perhaps add a column for the V9 CPU Mark numbers to the list.

    Just some random examples of the 343 duplicate numbers:

    CPU Name                                      V9 CPU Mark   V10 CPU Mark
    AMD Athlon64 X2 Dual Core 4600+               1108          1108
    AMD Athlon 64 X2 3800+                        959           959
    AMD Athlon 64 X2 Dual Core BE-2300            1082          1082
    AMD Athlon 64 X2 Dual Core BE-2350            1045          1045
    AMD Athlon 64 X2 Dual-Core TK-42              884           884
    AMD Athlon 1500+                              294           294
    AMD Athlon 1640B                              633           633
    AMD Opteron 2427                              3069          3069
    Intel Atom N2800 @ 1.86GHz                    633           633
    Intel Atom S1260 @ 2.00GHz                    916           916
    Intel Atom T5700 @ 1.70GHz                    2039          2039
    Intel Atom Z520 @ 1.33GHz                     240           240
    Intel Pentium 4 Mobile 1.90GHz                210           210
    Intel Pentium 4 Mobile 2.00GHz                203           203
    Intel Pentium D1508 @ 2.20GHz                 3813          3813
    Intel Pentium Extreme Edition 955 @ 3.46GHz   912           912
    Intel Pentium Extreme Edition 965 @ 3.73GHz   1155          1155
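
    For anyone who wants to repeat the duplicate and missing counts described above, here is a minimal sketch. It assumes each Mega List has already been loaded into a dict of CPU name to CPU Mark; the function name and the sample entries are made up for illustration (the last sample CPU is hypothetical), and this is not how the original count was necessarily done.

```python
def compare_lists(v9, v10):
    """Compare two {cpu_name: cpu_mark} dicts.

    Returns (identical, missing):
      identical - names whose mark is the same in both lists
      missing   - names present in v9 but absent from v10
    """
    identical = sorted(name for name, mark in v9.items() if v10.get(name) == mark)
    missing = sorted(name for name in v9 if name not in v10)
    return identical, missing


# Tiny sample; the first two rows use numbers quoted in this thread,
# the third CPU is a hypothetical placeholder.
v9 = {
    "AMD Athlon 1500+": 294,
    "Intel Celeron N3150 @ 1.60GHz": 1639,
    "Example CPU (hypothetical)": 500,
}
v10 = {
    "AMD Athlon 1500+": 294,
    "Intel Celeron N3150 @ 1.60GHz": 1191,
}

identical, missing = compare_lists(v9, v10)
print(len(identical), "identical;", len(missing), "missing from V10")
```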

  • #2
    I only grab the Mega CPU List every few months or so and put it into a spreadsheet, which is why I only now noticed the drastic change from V9 to V10 CPU benchmark scores. While the numbers have been slowly dropping over the years, nothing was as drastic as the V9 to V10 drop.

    Looking at all CPUs as a whole, and graphing the V9 CPU Mark scores sorted low to high against the V10 numbers (logarithmic scale), we get this:

    [Image: _V9 vs V10 CPU chart graph all.png]

    Even with all the "noise" from individual variances, we can clearly see how well the two plots line up. Note that the bunching up at the low end comes from the 343 duplicate V9 and V10 results on mostly older CPUs, so let's remove those:

    [Image: V9 vs V10 CPU chart graph -dupes.png]

    Now the two plots clearly line up, showing how well the V9 and V10 benchmark results match across the broad spectrum of CPUs. The only question is the scale or offset. I said before that most CPUs dropped about 30-40% with the V10 benchmark. So let's split the difference, increase the V10 scores by 35%, and we get this:

    [Image: V9 vs V10 CPU chart graph +35%.png]

    Well, there you go, problem solved! (with all the irate customers) That's a pretty good match from just a blanket 35% increase. Individual changes could probably refine it further. Only one CPU actually broke 100000, the AMD Ryzen Threadripper 3990X. There is a significant bump on the high-end CPUs, as would be expected with new features. The upper middle class is about the same, some lower, some higher. And most lower-end CPUs have a predictable but more digestible drop.
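
    As a rough sketch of the blanket rescale discussed above, the following multiplies each V10 score by 1.35 and reports how far the result lands from the V9 score. The function names are made up for illustration, and the two sample rows are V9/V10 pairs quoted elsewhere in this thread, not a representative sample.

```python
# Illustrative only: a blanket rescale of V10 CPU Mark scores.
SCALE = 1.35  # the "+35%" suggested in the post


def rescale(v10_mark, factor=SCALE):
    """Scale a V10 CPU Mark by a constant factor."""
    return v10_mark * factor


def residual_pct(v9_mark, v10_mark, factor=SCALE):
    """Percent by which the rescaled V10 mark over- or undershoots V9."""
    return (rescale(v10_mark, factor) - v9_mark) / v9_mark * 100.0


# (V9, V10) pairs taken from numbers posted in this thread
samples = {
    "Intel Celeron N3150 @ 1.60GHz": (1639, 1191),
    "Intel Pentium G850 @ 2.90GHz": (2670, 1243),
}

for name, (v9, v10) in samples.items():
    print(f"{name}: rescaled {rescale(v10):.0f}, "
          f"residual {residual_pct(v9, v10):+.1f}%")
```

    A negative residual means the CPU still sits below its V9 score even after the increase, which is the "more digestible drop" case.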

    Now I realize that you are probably not going to change your algorithm at this point, so this is all in fun. But the CPU Mark score is subjective, and the question of scale or offset is likewise a subjective one. From all those objective test results of integer/float, compression, encryption, time to complete, and all that, you roll that up to a single score somehow. But how you do that is subjective and up for interpretation.

    So why do historical numbers even matter? I realize the focus should be on newer CPUs and newer technologies; that makes sense from a business standpoint. But that doesn't mean you have to throw away all that history. If history didn't matter, you could come up with a new algorithm every month and say, that's the way it is. One month the AMD Ryzen Threadripper 3990X could have a score of 79872, the next month it could be 20.475, the next it could be 2344609, etc. The historical numbers and CPU comparisons lose ALL value at that point, unless you can instantaneously re-test all CPUs, even decades-old ones.


    • #3
      We'll have a look at the 343 duplicate numbers.

      Our aim with the scaling of the PT10 numbers was to get the numbers about the same for the newer CPUs. We knew it was impossible to get them to be the same for all CPUs, so we thought having them roughly the same for new CPUs would cause the least disruption. Of course this meant that the drop for the older CPUs would be large. At the time PT10 was released we only had around 20 data points, so it was always going to be a rough estimate. Now there are ~100,000 results to look at and yes, we probably could have got the scaling slightly more accurate if that data had been available 6 months ago. But even with all the data I still think the right decision would be to match the results for the newest CPUs (rather than your proposal of matching the results of 5 year old CPUs).

      The numbers above are from the CPU Mark value. But most of the complaints were from the AMD fanboys and about the single-threaded result. They couldn't believe that any Intel CPU could be faster than AMD's CPUs.

      The older PT9 results remain available here.


      "you roll that up to a single score somehow"
      Yes, the formula is here:
      https://www.passmark.com/forum/perfo...-and-disk-mark


      "If history didn't matter, you could come up with a new algorithm every month"
      We kept the algorithms mostly the same for 8 years prior to PT10. Some of the code was around 22 years old.

      We hope to keep the PT10 algorithms in place for several years, at least.

      For more background see this post.
      https://www.passmark.com/support/per...php#V10Results



      • #4
        "But the most complaints were from the AMD fanboys and the single threaded result. They couldn't believe that any Intel CPU could be faster than AMD's CPUs."
        Ha, ha. I guess you can never please everyone. My concern was more for the middle class, the 1000-10000 range; those high-end CPUs are out of my league.

        Oh well, just my $0.02, and thanks for listening and for the clarification.
        Last edited by joecpu; Jul-20-2020, 03:46 AM.


        • #5
          Looking at the 7/20/20 CPU Mega list, looks like all the duplicates from V9 to V10 have been removed, revised plot:

          [Image: V9 vs V10 CPU chart graph corrected.png]

          ...although, looking at the clean line at the low end, I would guess some of those old cpu numbers are interpolated.


          • #6
            In cases where we didn't have PT10 results for a CPU model, we estimated the PT10 result for that particular CPU by looking at other CPUs from the same CPU Family/Model/Stepping and at the PT9 benchmark. This helped get reasonably accurate results for those very rare or very old CPUs where we might never get PT10 results. Where PT10 results are available, they are used.

            But there were a couple of very old CPU families where we didn't have any results at all for the whole CPU family. Those were the duplicates, as they were left unscaled. But we've manually fixed those small groups up now, and also re-computed all the scaling factors based on all the results to date.

            An example extract of the scaling table looks like this:
            [Image: scaling.png]
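
            The estimation described above could be sketched roughly as follows. This is an illustration only, not PassMark's actual code: the thread does not say how the per-family factor is computed, so the median PT10/PT9 ratio, the function names, the family keys, and the sample numbers are all assumptions.

```python
from statistics import median


def family_factors(records):
    """Build a {family_key: scaling_factor} table.

    records: iterable of (family_key, pt9_mark, pt10_mark_or_None).
    Assumption: the factor is the median PT10/PT9 ratio over CPUs in the
    family that have both results.
    """
    ratios = {}
    for family, pt9, pt10 in records:
        if pt10 is not None and pt9:
            ratios.setdefault(family, []).append(pt10 / pt9)
    return {family: median(vals) for family, vals in ratios.items()}


def estimate_pt10(family, pt9, pt10, factors):
    """Return the measured PT10 mark if present, else a scaled PT9 mark."""
    if pt10 is not None:
        return pt10  # measured results always take priority
    factor = factors.get(family)
    if factor is None:
        return None  # no data for the whole family: needs manual handling
    return pt9 * factor


# Hypothetical records: (family key, PT9 mark, PT10 mark or None)
records = [
    ("family_a", 1000, 700),
    ("family_a", 2000, 1200),
    ("family_a", 1500, None),   # no PT10 result yet
    ("family_b", 300, None),    # whole family has no PT10 results
]

factors = family_factors(records)
for family, pt9, pt10 in records:
    print(family, pt9, estimate_pt10(family, pt9, pt10, factors))
```

            The `None` return for a family with no PT10 data corresponds to the case mentioned above where whole families were left unscaled and had to be fixed by hand.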


            • #7
              Another interesting thing found with V9 vs V10 CPU Mark: looking at a CPU I have in a budget Chromebook, the Intel Celeron N3160, the slightly older and lower-clocked (burst) Intel Celeron N3150 shows a higher V10 score, where on paper it should be lower, and V9 shows it as lower. This and some other samples of strange decreases within close family members:

              CPU Name                        V9 CPU Mark   V10 CPU Mark   Change
              AMD A4-4000 APU                 1923          1145           -40.46%
              AMD A4-4020 APU                 2115          1054           -50.17%
              Intel Atom x5-Z8500 @ 1.44GHz   1677          1254           -25.22%
              Intel Atom x5-Z8550 @ 1.44GHz   1818          1171           -35.59%
              Intel Celeron N3150 @ 1.60GHz   1639          1191           -27.33%
              Intel Celeron N3160 @ 1.60GHz   1686          1166           -30.84%
              Intel Pentium G840 @ 2.80GHz    2485          1306           -47.44%
              Intel Pentium G850 @ 2.90GHz    2670          1243           -53.45%
              Intel Pentium N3530 @ 2.16GHz   1816          1198           -34.03%
              Intel Pentium N3540 @ 2.16GHz   1879          1177           -37.36%

              Is this just bad test sample data, or estimated?
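
              For reference, the percentage column and the "order flipped" check can be reproduced with a few lines. This is a minimal sketch; the function names are made up for illustration and the numbers are the N3150/N3160 and A4 rows from the table above.

```python
def pct_change(v9, v10):
    """Percent change from the V9 mark to the V10 mark."""
    return (v10 - v9) / v9 * 100.0


def order_flipped(a, b):
    """True if CPUs a and b rank one way in V9 and the other way in V10.

    a and b are (v9_mark, v10_mark) tuples.
    """
    return (a[0] < b[0]) != (a[1] < b[1])


n3150 = (1639, 1191)
n3160 = (1686, 1166)

print(f"N3150: {pct_change(*n3150):.2f}%")
print(f"N3160: {pct_change(*n3160):.2f}%")
print("Order flipped between V9 and V10:", order_flipped(n3150, n3160))
```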


              • #8
                The N3160 and N3150 are very close to each other in both V9 and V10, within just a couple of percentage points. So the margin of error for the benchmark for these CPUs (especially the N3160, which is very rare now on Windows machines) likely exceeds the small performance difference.
                We have a number of V10 results for the N3150, but nothing at the moment for the N3160. So the estimate for the N3160 is likely a couple of percentage points off.

                Both the benchmark numbers and the estimates should get better over time as more V10 results are collected. Eventually only a small amount of estimating should be required, relative to the size of the entire data set.


                • #9
                  I too am disappointed about the CPU marks between versions. Just moved from v8. Upgraded without even giving it a thought because I was so satisfied with v8. If I had trialed v10 I probably would not have done the upgrade. I have an inventory of machines going back to 2014 and all that data is now obsolete. Well not really. I can just keep using v8 but....

                  I can appreciate using different (better) algorithms but running v10 on the same machine is so different that the info becomes unusable.

                  I'm guessing there is no way to compare results from different versions.

                  Luckily I kept a copy of v8 and its key. I'm doing a few upgrades and will be able to compare apples to apples.


                  • #10
                    V8 of the software was developed and released in 2011/2012. So we kept it (nearly) the same for a fairly long time.

                    As you pointed out, you can continue to use V8 of the software. The software doesn't ever expire, the download is still available on our web site, and we could have looked up old license keys for you.

                    As the graphs above so graphically illustrate there was no simple scaling algorithm that would line up all the V8/9 results with the V10 results (and we didn't have the level of detail available in those graphs when V10 was developed, we only had a few data points). The only other alternative is to never change anything.


                    • #11
                      "The only other alternative is to never change anything"
                      Not really a good option, otherwise we would still have 8088s or Z80s with a whopping 256k.

                      I guess going forward I can do both: re-do the ones I still have with v10, and run new stuff with both so I still have some feel for what I am seeing in v10.

                      I have the keys.
