CPU benchmarks huge changes?

This topic is closed.

  • #31
    Originally posted by WizardOfBoz View Post
    I think it's important to understand that when you put a very useful database of stuff online, it's going to be a big hit to a lot of people if you arbitrarily change it. Your "I think we made a mistake" comment reflects an extraordinarily mature outlook, though. My understanding is that
    1) Passmark values had been based upon aggregated hard test results from previous versions. This is DATA (important!)
    2) Passmark has released a new version that is superior in evaluating some of the newer chips which are insanely powerful, but the new version gives different numbers than previous versions
    3) One thought would be to use a multiplier to scale all the data from previous versions. This changes it from raw data directly related to tests, to an inference. Probably less useful.

    I would suggest having two columns (for now) on your spreadsheet. One for the old DATA, and another for the new test. Initially you could use a multiplier to get an "Inferred version 10 result". Once results start coming in from the new version, the "Inferred version 10 result" would be replaced by "Actual version 10 test results". In this way, the old data would be available but the new version data would have a place to go.

    In sum, suggest
    1) Not removing or changing posted data that has been used for years by people (put back the old numbers!)
    2) Add a column for the new benchmarks. Either populate with scaled version 9 data (with a notation stating so) or leave blank. As new data pours in, add the new data in the new column.

    Thanks
    Very good post, thank you. PassMark team, please read it carefully!

    It's really not a good decision to change almost everything that has been accumulated over years in a single night.

    Somebody has mentioned that a year would be required for PT10 results to become fully established. That's OK. My third suggestion (in addition to these two):
    3) Starting e.g. in 2021 (not right now), only the PT10 database would be maintained and visible, once PT10 is more or less stable, debugged and finalised.
    I.e., the old (historical) results would remain at least for 2020.

    Please, please reconsider all the pros and cons of such a dramatic shift (so many negative signs and opinions from our friends should not be ignored, should they?).

    Thank you.

    Comment


    • #32

      Originally posted by WizardOfBoz View Post

      1) Not removing or changing posted data that has been used for years by people (put back the old numbers!)

      2) Add a column for the new benchmarks. Either populate with scaled version 9 data (with a notation stating so) or leave blank. As new data pours in, add the new data in the new column.

      Thanks
      +1 !

      If you are redirecting passenger flow from a reliable and verified aircraft (v9) to a spaceship under construction (v10),
      why do you continue to display the old "reliable aircraft" sign at the entrance? It's not the truth!

      Now the situation looks similar:
      1) highly qualified programmers have made a super ingenious new testing system,
      2) then a couple of teenagers (from Pakistan and China, working and earning money after studying for food) “scaled” the previous results (v9) ...

      This causes uncertainty and distrust among users.

      So - Please, Add a NEW column for the new benchmarks!
      Last edited by Ivan Ivanov; Mar-13-2020, 09:51 PM.

      Comment


      • #33
        Originally posted by WizardOfBoz View Post
        Another thought would be to post the old version database (which has enormous amounts of information distilled in it) separately, so folks could access it if they want.
        This has been answered in my first post and then several times more in following posts.

        All the old data is here,
        https://www.cpubenchmark.net/pt9_cpu_list.php

        All the individual V9 baselines that make up these averages (more than a million of them) are also still available from within the PerformanceTest software.

        If we get this same question again because people didn't even bother to read the first post it is going to be deleted.

        Comment


        • #34
          Originally posted by WizardOfBoz View Post
          I think it's important to understand that when you put a very useful database of stuff online, it's going to be a big hit to a lot of people if you arbitrarily change it
          There were months of warning that new software and new benchmarks were coming. Years actually. But the software was available to the general public for a couple of months.
          We couldn't tell people how it would affect every CPU model in advance as we didn't know. We don't have 2000 different CPU models to test on. We have about 20 models.

          Originally posted by WizardOfBoz View Post
          Passmark has released a new version that is superior in evaluating some of the newer chips which are insanely powerful, but the new version gives different numbers than previous versions
          More or less. The CPUMark number for some of the newer CPUs didn't actually change that much as we re-scaled the new number to match the old one. See my earlier post.

          One thought would be to use a multiplier to scale all the data from previous versions.
          See my first post which covered this.

          Originally posted by WizardOfBoz View Post
          I would suggest having two columns (for now) on your spreadsheet. One for the old DATA, and another for the new test
          There is no spreadsheet. But this was covered in my earlier post.

          Originally posted by WizardOfBoz View Post
          The loss of the data was a big hit for me.
          There is no loss. Nothing was deleted. We just moved the old results to a different web page.

          Originally posted by WizardOfBoz View Post
          before the differences between the 3820qm, 3840qm, 3920xm, and 3940xm were pretty clear (like 5 or 10 percent improvement for each jump). The current listings show these all having about the same CPUBenchmark!
          I should quote Alberto Brandolini at this point.

          Anyway spent some time looking into this (why I don't know). And the numbers don't back your assertion.

          I don't know what you imagine the numbers are, but here are the real numbers

          PT9 results
          CPU                               CPUMark   Single threaded
          Intel Core i7-3820QM @ 2.70GHz    8,397     1,844
          Intel Core i7-3840QM @ 2.80GHz    8,759     1,914
          Intel Core i7-3920XM @ 2.90GHz    8,983     1,963
          Intel Core i7-3940XM @ 3.00GHz    9,133     1,982


          PT10 results as at 14/Mar/2020, 3pm
          CPU                               CPUMark   Single threaded
          Intel Core i7-3820QM @ 2.70GHz    5,701     1,878
          Intel Core i7-3840QM @ 2.80GHz    5,694     1,914
          Intel Core i7-3920XM @ 2.90GHz    5,839     1,963
          Intel Core i7-3940XM @ 3.00GHz    5,937     1,982

          PT9 CPUMark differences are,
          4.3%, 2.6% & 1.7% between the 4 CPUs. None of them hit the 10% you claimed. None of them even hit 5%.

          PT10 CPUMark differences are,
          -0.1%, 2.5% & 1.7% between the 4 CPUs. Clearly they are not all identical scores as you claimed. But there is one anomaly of about 4% (the first step, which was 4.3% in PT9).
          This 4% anomaly is due to lack of PT10 samples at this point in time for the 3 faster & rarer CPUs in this bunch.

          PT9 SingleThread differences are,
          3.8%, 2.6%, 1.0%. Again, nothing like the 5% to 10% you claim.

          PT10 SingleThread differences are,
          1.9%, 2.6%, 1.0%. Clearly they are not all the same as you claimed. But to be fair they are super close. But they were always super close.
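The step-to-step percentages quoted above can be re-derived from the posted scores. Below is a minimal Python sketch that uses only the numbers in this post; the helper name `step_changes` is made up for illustration and is not part of any PassMark tool.

```python
# Scores copied from the post, ordered 3820QM, 3840QM, 3920XM, 3940XM.
PT9_CPUMARK = [8397, 8759, 8983, 9133]
PT10_CPUMARK = [5701, 5694, 5839, 5937]
PT9_SINGLE = [1844, 1914, 1963, 1982]
PT10_SINGLE = [1878, 1914, 1963, 1982]


def step_changes(scores):
    """Percent change from each CPU to the next faster one, to 1 decimal."""
    return [round((b - a) / a * 100, 1) for a, b in zip(scores, scores[1:])]


if __name__ == "__main__":
    for label, scores in [
        ("PT9 CPUMark", PT9_CPUMARK),
        ("PT10 CPUMark", PT10_CPUMARK),
        ("PT9 single thread", PT9_SINGLE),
        ("PT10 single thread", PT10_SINGLE),
    ]:
        print(label, step_changes(scores))
```

Note that the first PT10 CPUMark step comes out slightly negative (the 3840QM currently averages 7 points below the 3820QM), which is the anomaly discussed above.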

          It is also worth noting that,
          A) In real life these tiny differences are meaningless. No one is going to notice a 4% performance difference in the CPU. Other factors are way more important. Like disk speed, battery life, etc.
          B) These differences are really below the margin of error for the benchmark. Especially for laptops which have a wide distribution of results due to power saving measures, thermal throttling and bad RAM setups. 100s and 100s of samples are needed before anyone could claim accuracy to around the 1% level.
          C) The numbers are going to bounce around a bit for a few weeks until new averages are found, like the 4% anomaly above. So there is every chance the numbers will be slightly different tomorrow and different again the day after that.
          D) For those people who want nothing to change. The whole point of releasing a new benchmark is that the new numbers aren't the same as the old ones. If they were all the same, there would have been no point in releasing new software. The software needs to keep up with modern hardware to remain relevant. If that means a few weeks of minor inconvenience to get to a better place for the next 10 years, then we are prepared to wear that.
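Point B can be illustrated with the textbook sample-size estimate for the mean of independent samples. This is only a rough sketch: the spread values used below (5%, 10% and 20% relative standard deviation) are hypothetical, not measured PassMark figures, and the normal approximation with a 95% confidence level (z = 1.96) is an assumption.

```python
import math


def samples_needed(relative_spread, margin, z=1.96):
    """Samples needed so the mean is within `margin` (as a fraction)
    at roughly 95% confidence, assuming independent, roughly normal
    results with the given relative standard deviation."""
    return math.ceil((z * relative_spread / margin) ** 2)


if __name__ == "__main__":
    # Hypothetical spreads; laptops would sit toward the high end.
    for spread in (0.05, 0.10, 0.20):
        n = samples_needed(spread, margin=0.01)
        print(f"spread {spread:.0%}: about {n} samples for a 1% margin")
```

Under these assumptions, a 10% spread between individual submissions already calls for several hundred samples before a 1% difference between two averages means anything.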


          NOTE: I'm happy to address any new issues or major discrepancies, but if new posts are just re-asking the same questions, or are just deliberate (or lazy) misrepresentations I'm going to close the topic.

          Comment


          • #35
            Dear David,

            I have been using Passmark as my go-to benchmark for years to compare processors. Thank you and your team for creating such good software and a simple interface.

            The new algorithm makes this difficult. Of course I see the need to benchmark the newest features in processors.
            However, not much real software out there makes use of them. See for example here for AVX512 https://en.wikipedia.org/wiki/Advanc...sions#Software
            Only 8 in the list use AVX512. It sure is not complete, but it gives an idea.
            It will take a long time till usual software utilizes these features.

            So it would be awesome to have two entries if possible.
            One showing the capabilities of the CPU,
            your new algorithm seems to be suited for that as far as I as an amateur can evaluate that from your descriptions.
            And one showing how current usual demanding tasks perform,
            the previous algorithm seems to be good for that,
            at least I could confirm its rankings with several personal tests with software like h264, 7zip, Video cutting software (kdenlive), image managing software (imagemagick, digikam).

            Maybe you can also take this idea further in future versions.


            Comment


            • #36
              So far, so good!
              Murphy's Law (still) continues to apply:

              Vyshkovsky's theorem:

              Regardless of the units used by the supplier or buyer,
              the manufacturer will use its own units converted to the supplier or buyer units using weird and unnatural conversion factors.



              Approaching stage 5 - "acceptance" (an agreement with an inevitable fate).
              Last edited by Ivan Ivanov; Mar-14-2020, 07:34 PM.

              Comment


              • #37
                Some time ago, User benchmark did the same thing: they updated their program to support AVX512, which isn't supported by modern consumer CPUs. So why did everything change so radically? Well, good to know that PassMark also isn't trustworthy anymore.

                Comment


                • #38
                  Originally posted by macros View Post
                  Dear David,
                  I have been using Passmark as my go-to benchmark for years to compare processors. Thank you and your team for creating such good software and a simple interface.
                  Thanks for the positive feedback.

                  Originally posted by macros View Post
                  The new algorithm makes this difficult. Of course I see the need to benchmark the newest features in processors.
                  I don't see why it is difficult. It is no harder than it was last week.

                  Originally posted by macros View Post
                  However, not much real software out there makes use of them.
                  We disagree. Visual Studio 2019 is in wide use. Out of order code execution occurs commonly. ECC encryption is more common than Twofish.

                  Originally posted by macros View Post
                  See for example here for AVX512 https://en.wikipedia.org/wiki/Advanc...sions#Software
                  Only 8 in the list use AVX512. It sure is not complete, but it gives an idea.
                  There are more than 20 packages in that surely incomplete list. Plus some of those items are libraries and compilers. Which are used to build a large collection of other software that isn't on that list. I don't know where you got the 8 packages from.

                  Originally posted by macros View Post
                  It will take a long time till usual software utilizes these features.
                  It's already happened.

                  But it is important to note that the AVX512 part of the benchmark is pretty small. It is a portion of the Extended Instructions Test, which is one of eight tests. If AVX512 isn't available then FMA instructions are still used.

                  For those of you who haven't hand coded SSE, AVX, FMA and AVX512 instructions, these are the kind of results to expect.
                  SSE       AVX       FMA       AVX512
                  8,977     16,540    30,713    41,181
                  This was from an i9-7900X and is the number of matrix multiplications per second. We average these numbers in the benchmark by the way. We don't just take the best number.

                  So AVX512 isn't a huge gain over FMA. And AVX512 accounts for only a small part of the overall result of the CPUMark as it is combined with so many other test results. Maybe just a few percent in the final calculation. There is of course no "right" level for inclusion of new code. Different people use different software, so one benchmark can't reflect the needs of everyone.
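To put the i9-7900X figures above in proportion, here is a small Python sketch of the relative gains between instruction sets. The post says the numbers are averaged but not how, so the plain arithmetic mean used here is an assumption, and the names `RATES` and `speedup` are made up for illustration.

```python
# Matrix multiplications per second on an i9-7900X, as quoted in the post.
RATES = {"SSE": 8977, "AVX": 16540, "FMA": 30713, "AVX512": 41181}


def speedup(base, faster):
    """How many times faster `faster` runs than `base`."""
    return RATES[faster] / RATES[base]


# Assumption: a simple arithmetic mean; the real weighting is not stated.
mean_rate = sum(RATES.values()) / len(RATES)

if __name__ == "__main__":
    print(f"AVX over SSE:    {speedup('SSE', 'AVX'):.2f}x")
    print(f"FMA over AVX:    {speedup('AVX', 'FMA'):.2f}x")
    print(f"AVX512 over FMA: {speedup('FMA', 'AVX512'):.2f}x")
    print(f"Mean rate:       {mean_rate:,.2f}")
```

The step from FMA to AVX512 is noticeably smaller than the earlier steps, and averaging dilutes it further before it is combined with the other tests.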

                  Important: AVX512 plays NO PART AT ALL in the single threaded test.

                  And this also explains why AMD chips are the current leaders in the charts, despite not having AVX512.
                  AVX512 isn't that important, but we have to build something to last for 10 years, so we felt the need to include it to some degree.

                  Originally posted by macros View Post
                  So it would be awesome to have two entries if possible.
                  One showing the capabilities of the CPU,
                  your new algorithm seems to be suited for that as far as I as an amateur can evaluate that from your descriptions.
                  And one showing how current usual demanding tasks perform,
                  the previous algorithm seems to be good for that,
                  Way too confusing. You can't really show the capabilities of a CPU with a single number. If you want the old data it is still available, as posted multiple times above.
                  I can't imagine the pain in trying to explain this to people 20 times a day (most of whom struggle to understand even the concept of single thread / multi-thread) .

                  Comment


                  • #39
                    Originally posted by EnterpriseNL View Post
                    Some time ago, User benchmark did the same thing: they updated their program to support AVX512, which isn't supported by modern consumer CPUs. So why did everything change so radically? Well, good to know that PassMark also isn't trustworthy anymore.
                    I've no idea what User benchmark did. Not sure how it is relevant. We don't talk to them. Don't even know who the people behind it are.

                    Unfortunately as per some of the previous posts, this is poorly researched and just wrong.
                    AVX-512 is now in consumer CPUs. The entire Ice Lake series supports it, plus others. These are laptops from HP, Lenovo, Acer, etc. How are these not consumer CPUs?

                    Allowing AVX-512 to influence the result by a couple of percent isn't a radical change. We've been adding new instruction support in each software release for the last 21 years, but always with care to ensure it doesn't dominate the results. It is amazing people are complaining about this without actually looking at the numbers to see what the influence really is. It's very hard to take people seriously when they are just parroting what some random 11 year old kid said anonymously. It really isn't a big deal and anyone that actually looked at the numbers would see that.





                    Comment


                    • #40
                      There were a few other posts that were just fact-free rants, misinformation and repetition which I have deleted.

                      Happy to have a discussion on the facts but it is a waste of time to just keep feeding the trolls.

                      Feel free to contact us directly if you have a legitimate issue with the site or software.

                      Also: Today we did a new patch release that very slightly changes the single threaded score. So that should resolve some of the concerns on the single threaded score. Details of the change are here.


                      Topic is closed.

                      Comment
