We are planning to make some changes to the GPU benchmark charts over the next couple of days to improve their accuracy, with the consequence of (temporarily) reducing how comprehensive they are.
Some background first.
The charts are made up of average results for each video card type. When we originally made the charts public we used results from PerformanceTest V7. For popular GPU types we had up to 10,000 samples to average. For rare GPUs there were maybe only a few samples to make up an average.
When PerformanceTest V8 was released (Oct 2012) we didn't have a huge number of benchmark results collected, only about 3000 submissions at launch. For comparison, for PerformanceTest V7 we had around 600,000 submissions over the preceding few years.
We wanted the charts to have a comprehensive list of video cards, so on the PT8 release date we decided to combine the results from PerformanceTest V7 and PerformanceTest V8 to make up the charts. The plan was that over time the PerformanceTest V8 results would start to dominate and displace the V7 results. The method was to take the final average V7 result for each GPU type and use it as 1 sample for that type of GPU. Then, as more V8 samples arrived, this single V7 result would be averaged with all the V8 results, effectively diluting the V7 result into insignificance over time.
The impact of this can be seen in the charts: the sample count (which you can see via mouse over on the charts) reflects the V8 sample count (+1 for the single V7 sample).
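To make the dilution concrete, here is a minimal sketch of the averaging described above. It is only an illustration, not our actual chart code: the function name blended_average and the example scores are made up, and it assumes a plain arithmetic mean with the final V7 average counted as exactly one sample.

```python
def blended_average(v7_average, v8_results):
    """Return (chart score, sample count) for one GPU type.

    Hypothetical helper: the final V7 average is treated as a single
    sample and averaged together with every individual V8 result.
    """
    samples = list(v8_results)
    if v7_average is not None:
        # The whole V7 history counts as just one sample.
        samples.append(v7_average)
    if not samples:
        return None, 0
    return sum(samples) / len(samples), len(samples)


# A rare card with no V8 submissions still shows its V7 score, 1 sample.
rare_score, rare_count = blended_average(700, [])

# A card with nine V8 results of 100 each: the V7 score is mostly diluted.
common_score, common_count = blended_average(700, [100] * 9)
```

With these made-up numbers the rare card stays at 700 with 1 sample, while the second card ends up at 160 with 10 samples, which is why cards with few or no V8 submissions stay close to their old V7 score.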
Although we made a deliberate effort to make the PT7 and PT8 3D results comparable, it is an impossible job in the end. PT8 uses DirectX 11, DirectCompute, higher resolutions, etc., so the result for a given video card is never going to be the same as in PT7, which didn't know about any of this.
The Consequences (5 months on)
The plan worked, mostly. Since the release of PT8 we have collected around 65,000 baseline results with new submissions arriving at the rate of about 20 per hour, 24 hours a day.
For popular GPUs the charts are totally dominated by the PT8 results. However, for some of the very rare and very old video cards we still don't have any PT8 results. This means the charts are now really a mix of PT7 and PT8 results, with rare cards displaying the PT7 result and popular cards displaying the PT8 result. This isn't really desirable, as the accuracy of the benchmarks for old, rare cards is now pretty poor (see the comments on comparability above).
An extreme example of this can be seen with the now rare Radeon X1800 CrossFire Edition and Radeon X1900 CrossFire Edition cards. These cards did better in PerformanceTest V7 than in V8, so we have results like this:
Radeon X1800 CrossFire Edition, 1066 3DMark, 1 sample, made up of only PT7 results.
Radeon X1800 GTO, 237 3DMark, 6 samples, made up of mostly PT8 results. (the PT7 result was ~700)
These cards should be ranked much closer together.
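As a rough check on the X1800 GTO figures, the chart average can be unpicked to estimate what the PT8 submissions alone averaged. This is a back-of-the-envelope sketch only: it assumes the 6 samples are 5 PT8 results plus the single PT7 sample, takes the approximate PT7 figure of 700 at face value, and the function name implied_v8_average is made up for the illustration.

```python
def implied_v8_average(chart_average, sample_count, v7_average):
    """Estimate the mean of the V8 samples behind a blended chart score.

    Hypothetical helper: assumes the chart score is a plain mean of
    (sample_count - 1) V8 results plus one V7 sample.
    """
    v8_count = sample_count - 1
    if v8_count <= 0:
        raise ValueError("no V8 samples to back out")
    # Total of all samples, minus the single V7 sample, spread over the V8 ones.
    return (chart_average * sample_count - v7_average) / v8_count


# Radeon X1800 GTO: chart score 237 from 6 samples, PT7 result roughly 700.
x1800_gto_v8 = implied_v8_average(237, 6, 700)
```

Under those assumptions the PT8 submissions for the X1800 GTO average somewhere around 144, so the single PT7 sample is still lifting that card's chart score noticeably, while the CrossFire Edition card is showing nothing but its PT7 score.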
What's changing
We are going to remove all the PT7 results from the video charts.
Older PT7 results will be filed away on this page as a static list, never to be updated again.
This means many of the older and rarer cards are going to drop off the main charts, so the charts will be less comprehensive than they are now, but more accurate. Our estimate is that we'll have around 800 different types of video cards in the list once the update is done in the next couple of days (down from 1900). That is probably only around 40% of all the video cards ever released for Windows-based PCs, however. As time goes on we should get more submissions and the list should grow in size again. There are some cards that will probably never make a reappearance, however, like the classic S3 ProSavage, RIVA TNT & Rage Fury.