CPU benchmarks huge changes?


  • #16
    Unfortunately, the new single-thread rating looks like bullshit.

    For example:
    1571 for i3-3220 @ 3.3 MHz
    1836 for E3-1220 v2 @ 3.1 MHz (3.4 MHz in turbo mode).

    Both CPUs use the same Ivy Bridge technology, so the real difference should be less than 5%.

    IMHO, without the old database results (from 2012-2019), the site passmark.com is useless!



    • #17
      For the E3-1650, the change is 50%.

      [Image: 1PM.png]



      • #18
        About 5955 and 1756 in new version

        Comment


        • #19
          the new single-thread rating looks like bullshit. For the E3-1650, the change is 50%.
          No it isn't.
          Single thread score was 1948. It is now 1756. So that is a difference of 9.8%.
          Even if you look at the multi-threaded results, the difference isn't 50% (it's closer to 30%).
          But E5-1650 is an old CPU now. As pointed out in posts & table above old CPUs are going to suffer a bit with the new algorithms.
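The single-thread comparison in this post is a plain percent-change calculation. A minimal sketch in Python, using only the two scores quoted above (the function name `percent_change` is an illustrative choice, not anything from PerformanceTest):

```python
def percent_change(old: float, new: float) -> float:
    """Signed change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100.0


# Single-thread scores quoted in the post: 1948 before, 1756 now.
single_thread = percent_change(1948, 1756)
print(f"single-thread change: {single_thread:.2f}%")
```

The result is a drop of a little under 10%, in line with the figure given in the post and nowhere near 50%.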

          I understand that people don't like seeing their old CPUs made to look suddenly worse. But this is what happens if new code is written that makes use of features that old CPUs don't have.

          You also don't need to use the Wayback Machine & take screenshots. As pointed out above a couple of times, V9 results have all been saved here for your viewing pleasure,
          https://www.cpubenchmark.net/pt9_cpu_list.php

          I agree the single thread difference between the i3-3220 & E3-1220 v2 is a bit strange. We'll have a look at this.



          • #20
            I understand that people don't like seeing their old CPUs made to look suddenly worse.
            But this is what happens if new code is written that makes use of features that old CPUs don't have.
            It makes no difference to measure the length in inches or centimeters, if the results are repeatable, correct and can be used for comparison.

            Unfortunately, the new Passmark v10 results are not connected with the actual hardware structure of the processor
            and are therefore (since March 2020) almost useless for comparison.

            An example

            [Image: 1PM3.png]

            i3-3220 (3.3 Mhz, 2 cores & 4 threads) - rating 2278 in the new version.
            E3-1220 v2 (3.1 Mhz, 4 cores & 4 threads) - rating 4667 in the new version.
            E5-1620 v2 (3.7 Mhz, 4 cores & 8 threads) - rating 6179 in the new version.

            That is, according to the new test (at almost the same clock frequencies and exactly the same "Ivy Bridge" CPU manufacturing technology):

            1) The hyper trading mode is useless, judging by the difference between (1) and (2).

            2) An increase in the number of threads by 2 (two) times, from 4 to 8, creates a change in performance of only +11%?
            (The frequency of the processor E5-1620 v2 is 20% higher than that of the E3-1220 v2).

            [Image: 1PM4.png]

            The hyper trading mode is useless, judging by the 3.78% difference between G2140 and i3-3220
            (with exactly the same frequency & Ivy Bridge tech).

            The new v10 test results have nothing to do with real hardware
            (or with actual CPU speed and performance in typical user tasks).

            Therefore, the new results are completely devoid of physical and practical meaning in terms of comparison.

            It’s a good and great thing to come up with a new one.

            But then please give the results a new name, such as Passmark 2020 or Passmark v10, and do not use the "classic" Passmark name!

            And please show the v9 test results for "All threads" and "Single-core" in the CPU comparison table!


            Agree:
            If you take the same brand of ruler you have been used to for 8 years and find that (since March 2020) it is now rubber instead of metal, with 80 centimeters in the first meter and 60 centimeters in the second (or 10 inches in the first foot and 7 inches in the second), then it is a completely different measuring ruler (not the same instrument).
            Last edited by Ivan Ivanov; Mar-12-2020, 11:12 AM.



            • #21
              To the moderator:
              please delete my two "Unapproved" posts from today
              https://www.passmark.com/forum/pc-ha...6841#post46841
              https://www.passmark.com/forum/pc-ha...6844#post46844



              • #22
                Originally posted by David (PassMark)

                There were a few posts about it in the forum. But in the end the major impact was on those algorithms that did a lot of kernel context switches, which means software that made a lot of short requests to the Windows O/S to do various tasks suffered the most. So the CPU result didn't move very much, as there were nearly no operations performed in the Windows kernel. But the 2D and disk operations did suffer a bit. 2D was probably the worst as it was almost entirely reliant on the O/S to perform operations (e.g. draw some text, bitblt an image).



                If you were happy with the hardware last week there is no reason not to be happy today. The algorithms did need to change; some of them really were based on 20-year-old code. We'd surely get more criticism over the next 5 years if we didn't change. People would be asking why we were not using AVX512 instructions, people asking why we are using a 10-year-old physics library, people asking why we are using encryption algorithms that no one uses anymore, people asking why some of the code wasn't using the available compiler optimisation flags, etc....
                Various users and the CPU vendors were already asking these questions years ago in fact.



                Some CPUs will look better as the result of these changes. Some won't. It is just a fact of life that CPUs have different features and architectures and different code benefits different CPUs. The actual number (the CPUMark) isn't really important. It doesn't have a meaningful unit of measurement (like MB/sec or frames per second), so the value of the number doesn't matter. Positive or negative. All that matters is relative performance. So a lower CPUMark isn't bad if all the CPUs you compare it to are also lower. As per the table in my earlier post we tried to avoid big swings for most new CPUs. Despite this being nearly mission impossible, we succeeded in this to a large degree.

                There isn't one "right" benchmark number. You can't summarise the entire performance of a CPU with one number. What's right depends on how you use your computer. For some people, "right" means matching gaming performance. For others it will mean matching the performance of scientific applications. You can't do this with one number. So some people are always going to get upset and claim the number is wrong.

                We totally get that change brings disruption. It is an enormous amount of work and pain for us as well. But we think it is necessary to remain relevant for the next 10 years.

                If you want to look at the old V9 results to compare, you can find them here
                https://www.cpubenchmark.net/pt9_cpu_list.php

                You can also still download V9 of PerformanceTest to look at all the individual baseline submissions (all 1,200,000 of them)
                ...And this is why you are no longer credible...
                It is not whether I am happy or not that is insulting, but the fact that you think explaining it away is acceptable....
                You should:
                - show your changes side by side
                - as well as your algorithm change
                - alert/notify in advance a change is coming
                - let folks make their own decision based on information from both algorithms.

                ...nice picture







                • #23
                  Unfortunately, Passmark is no longer credible...

                  [Image: 1PM5.png]

                  [Image: 1PM6.png]



                  • #24
                    i3-3220 (3.3 Mhz, 2 cores & 4 threads) - rating 2278 in the new version.
                    E3-1220 v2 (3.1 Mhz, 4 cores & 4 threads) - rating 4667 in the new version.
                    E5-1620 v2 (3.7 Mhz, 4 cores & 8 threads) - rating 6179 in the new version.

                    That is, according to the new test (at almost the same clock frequencies and exactly the same "Ivy Bridge" CPU manufacturing technology):

                    1) The hyper trading mode is useless, judging by the difference between (1) and (2).

                    2) An increase in the number of threads by 2 (two) times, from 4 to 8, creates a change in performance of only +11%?
                    (The frequency of the processor E5-1620 v2 is 20% higher than that of the E3-1220 v2).
                    Honestly this comes across as more of a rant than a sensible argument. There are only a couple of us working on this, but we have 3M+ users, so the time available to respond to each user is pretty limited. I'll just focus on the 1st post, but the comments apply generally to all CPUs.

                    Super briefly.
                    1. "hyper trading mode" isn't even a thing. I guess you are talking about hyper-threading. But making up terminology (or getting it wrong) doesn't help your credibility.
                    2. You've ignored turbo speeds. The i3-3220 doesn't turbo at all.
                    3. CPUs don't run at MHz frequencies. They run at GHz. Typo or ignorance? Either way it doesn't help your credibility.
                    4. 3.1 GHz isn't "almost the same" as 3.7 GHz. It's a 20% difference.
                    5. The benefits of hyper-threading (virtual cores) are very variable. In some situations it hurts performance. So 4 physical cores are significantly better than 2 physical + 2 virtual. See my screenshot below. It is also very hard to predict in advance when it will help and when it won't.
                    6. You've completely ignored the massive differences in cache. The i3-3220 has just 3MB, while the E5-1620 v2 has 10MB.
                    7. You've ignored differences in the number of memory channels (2 vs 4). Probably a good chunk of those 3220 machines were on single channel.
                    8. You've ignored differences in the max memory speed (DDR3 1600 vs DDR3 1866).
                    9. You've ignored instruction set differences. For example support of accelerated encryption (AES) in the Xeons.
                    10. As already stated scores are going to move around a bit until new averages are found.


                    Example of virtual cores not helping performance (in a machine with 32 physical cores & 64 virtual).
                    [Image: AdvancedCPUTest.png]



                    • #25
                      - show your changes side by side
                      Screen space is a problem for mobile devices. We prefer to leave the old data on a separate page.
                      Plus for 99% of users having two very similar sets of numbers on the same page just adds to confusion and doesn't help them.

                      as well as your algorithm change
                      List of changes is here
                      https://www.passmark.com/products/pe...st/history.php

                      Description of new tests can be found here
                      https://www.cpubenchmark.net/cpu_test_info.html
                      (and in the included User's Guide for PerformanceTest as well)

                      We've only got around 20 test machines here. So we don't know the full impact of a new release until after it is released.

                      alert/notify in advance a change is coming
                      The alpha / beta release was public on our website months before the release.

                      I've been tweeting about it since Oct 2019.
                      https://twitter.com/PassMarkInc/stat...348339712?s=20

                      Unfortunately I've only got 600 followers so it probably didn't reach a huge audience, but it isn't from lack of trying.

                      We've emailed a bunch of reviewer / press people. Some replied. It is in their queue to look at (we are told).

                      let folks make their own decision based on information from both algorithms.
                      All the data is still available. Nothing has been removed. Folks can decide what data they want to use. But we would think that for the majority of users the above discussion is just techno-babble and more (slightly conflicting) data isn't going to help.



                      • #26
                        Thank you, David!

                        English is not my native language, so thank you for the corrections
                        (although I love to think and write in English)

                        Of course, the frequency in GHz... and "hyper threading" is right too ...

                        According to Passmark v10 benchmarks, the Xeon E5-1620 v2 is about 30% faster than the Xeon E3-1220 v2 (6179/4667).

                        Given that the clock frequency of the Xeon E5-1620 v2 is 20% higher (3700 MHz or 3.7 GHz)
                        compared to the Xeon E3-1220 v2 (3100 MHz or 3.1 GHz), the performance to frequency ratio is higher by 11%.

                        1) E5-1620 v2 has 8 threads compared to 4 for E3-1220 v2;
                        2) E5-1620 v2 has 1866 MHz RAM speed compared to 1600 MHz for E3-1220 v2;
                        3) E5-1620 v2 has 4 channels of access to RAM compared to 2 channels for E3-1220 v2.

                        According to Passmark v9 results (9504/6760) adjusted for the difference in frequencies, the ratio of performance to frequency of the E5-1620 v2 is higher by 17.6%.

                        It seems that the new Passmark v10 “does not like” hyper threading
                        and reduces its influence on the final result by at least 1.6 times compared to Passmark v9 in this case.
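The frequency adjustment used in this post can be sketched as follows. This is only an illustration of the poster's arithmetic, assuming base clocks of 3.7 GHz and 3.1 GHz and ignoring turbo, cache and memory differences; the function name `freq_adjusted_gain` is hypothetical.

```python
def freq_adjusted_gain(score_a: float, score_b: float,
                       ghz_a: float, ghz_b: float) -> float:
    """Percent by which CPU A's score-per-GHz exceeds CPU B's."""
    per_ghz_a = score_a / ghz_a
    per_ghz_b = score_b / ghz_b
    return (per_ghz_a / per_ghz_b - 1.0) * 100.0


# E5-1620 v2 (3.7 GHz base) vs E3-1220 v2 (3.1 GHz base), scores from the post.
v10_gain = freq_adjusted_gain(6179, 4667, 3.7, 3.1)
v9_gain = freq_adjusted_gain(9504, 6760, 3.7, 3.1)

print(f"v10: +{v10_gain:.1f}% per GHz")
print(f"v9:  +{v9_gain:.1f}% per GHz")
print(f"v9 gain / v10 gain: {v9_gain / v10_gain:.2f}")
```

With these inputs the v10 figure comes out at about 11% and the v9 figure at a little under 18%, close to the numbers quoted in the post, and the ratio between them is roughly 1.6.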

                        For the pairs Intel i3 3220/3230/3240 vs. Pentium 2120/2130/2140, the average ratio in Passmark v9 is +28% (because of hyper threading).
                        In the new Passmark v10, this ratio = +14%
                        The effect of hyper threading on the final result is reduced by at least 2 (two) times compared to Passmark v9 in this case.

                        Therefore, at first glance, the new Passmark v10 results seem to testify that hyper threading is almost useless.

                        I do not think that processor manufacturers will be happy to agree with this ...


                        And, Please:
                        Show (additionally) v9 test results for "All threads" and "Single-core" in compare CPU mode (in the table) !
                        Last edited by Ivan Ivanov; Mar-13-2020, 07:49 AM.



                        • #27
                          My personal test results in Passmark v10 for the E3-1220 v2 are a bit higher than indicated on the site: 4923 vs 4667 (+5%)

                          [Image: E3-1220_v2_PM_v09.png]
                          Last edited by Ivan Ivanov; Mar-13-2020, 01:50 PM.



                          • #28
                            I think it's important to understand that when you put a very useful database of stuff online, it's going to be a big hit to a lot of people if you arbitrarily change it. Your "I think we made a mistake" comment reflects an extraordinarily mature outlook, though. My understanding is that
                            1) Passmark values had been based upon aggregated hard test results from previous versions. This is DATA (important!)
                            2) Passmark has released a new version that is superior in evaluating some of the newer chips which are insanely powerful, but the new version gives different numbers than previous versions
                            3) One thought would be to use a multiplier to scale all the data from previous versions. This changes it from raw data directly related to tests, to an inference. Probably less useful.

                            I would suggest having two columns (for now) on your spreadsheet. One for the old DATA, and another for the new test. Initially you could use a multiplier to get an "Inferred version 10 result". Once results start coming in from the new version, the "Inferred version 10 result" would be replaced by "Actual version 10 test results". In this way, the old data would be available but the new version data would have a place to go.

                            The loss of the data was a big hit for me. I'm trying to figure out whether to upgrade a CPU, and before the differences between the 3820qm, 3840qm, 3920xm, and 3940xm were pretty clear (like 5 or 10 percent improvement for each jump). The current listings show these all having about the same CPUBenchmark! Where did the approximately 20% difference from 3820qm to 3940xm go? I'm not even sure what the new numbers represent.

                            In sum, suggest
                            1) Not removing or changing posted data that has been used for years by people (put back the old numbers!)
                            2) Add a column for the new benchmarks. Either populate with scaled version 9 data (with a notation stating so) or leave blank. As new data pours in, add the new data in the new column.
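The two-column idea suggested in this post could look roughly like the sketch below. Everything here is hypothetical: `CpuEntry`, `v10_display` and the 0.65 multiplier are placeholders for illustration, not how PassMark stores or scales its data. The v9 and v10 scores are the ones quoted earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CpuEntry:
    name: str
    v9_score: float                       # old DATA column, never modified
    v10_measured: Optional[float] = None  # filled in once real v10 results exist


def v10_display(entry: CpuEntry, multiplier: float) -> Tuple[float, str]:
    """Return the v10 column value and a label saying whether it is real or inferred."""
    if entry.v10_measured is not None:
        return entry.v10_measured, "actual"
    # No v10 submissions yet: scale the v9 score and label it clearly as an inference.
    return entry.v9_score * multiplier, "inferred"


ASSUMED_MULTIPLIER = 0.65  # placeholder value, chosen only for this example

entries = [
    CpuEntry("E3-1220 v2", v9_score=6760, v10_measured=4667),
    CpuEntry("E5-1620 v2", v9_score=9504),  # pretend no v10 results have arrived yet
]

for e in entries:
    value, kind = v10_display(e, ASSUMED_MULTIPLIER)
    print(f"{e.name}: v9={e.v9_score:.0f}  v10={value:.0f} ({kind})")
```

Keeping the label next to the value is the point of the design: a reader can always tell a scaled v9 number from an actual v10 measurement.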

                            Thanks



                            • #29
                              Another thought would be to post the old version database (which has enormous amounts of information distilled in it) separately, so folks could access it if they want.



                              • #30
                                In the new Passmark v10, this ratio = +14%
                                Therefore, at first glance, the new Passmark v10 results seem to testify that hyper threading is almost useless.
                                I don't have time to check all your numbers. But,
                                1) You can't just look at a single CPU model and assume all the other CPUs behave in the same way. It's lazy cherry picking.
                                2) Even if we assume it really is +14% for all CPUs that have Hyperthreading. How is getting a +14% speed improvement useless?
                                3) Moving from 28% to 14% (if that was even correct) isn't "reduced by at least 2 (two) times". It's half.

                                Here's a proper study of Hyper-threading from Dell
                                http://ftp.dell.com/app/4q02-Len.pdf
                                And a quote from it. "Hyper-Threading can improve the performance of some MPI applications, but not all. Depending on the cluster configuration and, most importantly, the nature of the application running on the cluster, performance gains can vary or even be negative"

                                That study found performance gains of around +10% (at best). So by your logic we would be showing 40% higher gains than this real life scenario. But the truth is that it is highly dependent on the code.
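The two comparisons in this reply are simple ratios of percentage gains. A short sketch, using only the figures quoted in the thread (+28%, +14% and the study's roughly +10%); the helper name `ratio_of_gains` is made up for the example:

```python
def ratio_of_gains(gain_a: float, gain_b: float) -> float:
    """How large gain_a is relative to gain_b (both given in percent)."""
    return gain_a / gain_b


# Hyper-threading gain as claimed for v9 (+28%) and v10 (+14%).
v10_vs_v9 = ratio_of_gains(14, 28)

# v10's +14% compared with the roughly +10% best case from the Dell study.
v10_vs_study = ratio_of_gains(14, 10)

print(f"v10 gain is {v10_vs_v9:.2f}x the v9 gain")
print(f"v10 gain is {(v10_vs_study - 1) * 100:.0f}% higher than the study's best case")
```

So the claimed gain has halved between versions, and the +14% figure still sits about 40% above the study's best-case gain.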

                                Somewhat unrelated, but interesting, Intel also suggested turning off Hyperthreading as a result of their Spectre security bugs.

