  • Testing PT

    Good Morning,

    I have a customer who is interested in purchasing benchmark software. They are currently looking at PassMark for a very large enterprise deployment.

    In evaluating the PassMark software, the customer is quite pleased with the results, the ease of use for their users, and how easily comparisons can be made.

    Here is my issue. In order to choose PassMark, they need a third-party independent evaluation of the benchmark results I am getting. I've used a number of other benchmarking packages, and I get results that indicate Machine A is faster than Machine B; however, the performance measurements (Whetstones, Dhrystones, integer math) for CPU and MB/sec for disk and RAM do not line up at all.

    They are typically out by at least a factor of 10x compared with PassMark.

    Could anyone who is using PassMark in a corporate environment tell me what acceptance method you used to get PassMark recognized as accurate and dependable?

    I appreciate your time in reading this long post.

  • #2
    There is no benchmark "standard". So even for apparently simple tests like disk or RAM speed there are many many different ways such a test could be coded. And just because two benchmarks are implemented differently doesn't mean one must be invalid. Both might be valid, but they might be measuring different things.

    For example, if you measure RAM speed, here are some of the options that need to be considered.

    1) Is it a read or write test, or a mix?

    2) Is the read sequential, random, or stepped? Typical RAM is much faster when read sequentially, but random reads are also a valid test.

    3) How big is the block of RAM being read? You can get a 10x speed drop moving between cache and main RAM.

    4) Does the block span more than one page?

    5) Is the block small enough that it gets entirely cached?

    6) How many read loops are performed (as this can affect the cache)?

    7) Was the read done on a 4- or 8-byte boundary? Unaligned reads can be slow, but that doesn't mean the test is invalid.

    8) Is 1 byte read at a time, or 4 bytes, or 8 bytes (64 bits) at a time? This can make a huge difference, and all of these tests are valid.

    9) Is the test just measuring read speed, or does it, for each cycle, read from memory location 1 and store to location 2? I.e. is it a read test or a copy test?

    10) How was the code written? Hand-optimized assembly code, or a high-level language? This by itself can result in a 5x performance difference. If a high-level language was used, then what compiler / interpreter was used?

    11) Were only standard x86 instructions used, or were more exotic SIMD instructions used to gain performance?

    12) Was NUMA used?

    13) How many test threads were running at the same time? In the case where there are several memory buses available (multiple physical CPUs) this can make a big difference.

    So it is easy to come up with two different RAM tests whose results are 10x apart from each other, and still have both tests be valid; the sketch below shows one way that can happen.
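
    To make that concrete, here is a minimal sketch in plain C (the buffer sizes and stride are invented purely for illustration) of two read tests that are both perfectly reasonable, yet can report MB/sec figures an order of magnitude apart on the same machine: one sweeps a small cache-resident block sequentially, the other strides through a large block so most accesses miss the cache.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* Invented sizes: the small block fits in cache, the large one does not. */
    #define SMALL_BLOCK (64 * 1024)          /* 64 KB  */
    #define LARGE_BLOCK (256 * 1024 * 1024)  /* 256 MB */

    /* Read 'words' 64-bit values 'passes' times with a given stride, return MB/sec.
       A stride of 1 is a sequential sweep; a large prime stride defeats the
       prefetcher and behaves much more like random access. */
    static double read_test(const unsigned long long *buf, size_t words,
                            size_t stride, int passes)
    {
        volatile unsigned long long sink = 0; /* stops the compiler removing the reads */
        clock_t start = clock();              /* CPU time is fine for a single-threaded test */
        for (int p = 0; p < passes; p++) {
            size_t idx = 0;
            for (size_t i = 0; i < words; i++) {
                sink += buf[idx];
                idx += stride;
                if (idx >= words)
                    idx -= words;
            }
        }
        double secs  = (double)(clock() - start) / CLOCKS_PER_SEC;
        double bytes = (double)words * sizeof(*buf) * passes;
        return bytes / (1024.0 * 1024.0) / secs;
    }

    int main(void)
    {
        unsigned long long *small = malloc(SMALL_BLOCK);
        unsigned long long *large = malloc(LARGE_BLOCK);
        if (!small || !large)
            return 1;
        memset(small, 1, SMALL_BLOCK);  /* touch the pages before timing */
        memset(large, 1, LARGE_BLOCK);

        /* Same machine, two "valid" RAM read tests, very different numbers. */
        printf("cached, sequential read : %8.1f MB/sec\n",
               read_test(small, SMALL_BLOCK / 8, 1, 16384));
        printf("uncached, strided read  : %8.1f MB/sec\n",
               read_test(large, LARGE_BLOCK / 8, 4099, 4));

        free(small);
        free(large);
        return 0;
    }
    ```

    Neither test is "wrong"; they just measure different things (cache bandwidth versus main memory bandwidth).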



    • #3
      Agreed, and thank you for taking the time to respond to my query.

      I wasn't questioning the validity of the tests, the results I was getting with PassMark, or even the fact that I was seeing a 10x differential. I understand that there are no "standard" benchmarks.

      The simple Whetstone and Dhrystone "standard" measurements are not really standard, as there are areas where the code can be modified and still be considered a Whetstone or Dhrystone.

      The customer loves the interface, the graphical nature of the app, and the intuitiveness of the app (PassMark).

      However, he cannot accept the app without some third-party validation of PassMark.

      I was also considering using the deltas of performance between various applications.

      Example:

      On machine model A:


      I turn on processor affinity to a single core for the PassMark thread.
      I raise the priority of the PassMark thread to realtime.
      I set PassMark to run the CPU tests.
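
      (As an aside, here is roughly what those first two settings amount to programmatically, assuming a Windows machine and the standard Win32 calls; normally I just do this from Task Manager. It is only a sketch applied to the current process, not to PassMark itself.)

      ```c
      #include <windows.h>
      #include <stdio.h>

      int main(void)
      {
          HANDLE proc = GetCurrentProcess();

          /* Pin the process to CPU 0 only (the affinity mask has one bit per logical core). */
          if (!SetProcessAffinityMask(proc, 0x1))
              fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());

          /* Raise the process to realtime priority. Without admin rights Windows
             silently downgrades this to HIGH_PRIORITY_CLASS. */
          if (!SetPriorityClass(proc, REALTIME_PRIORITY_CLASS))
              fprintf(stderr, "SetPriorityClass failed: %lu\n", GetLastError());

          /* ... run the CPU test from here ... */
          return 0;
      }
      ```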

      On machine model B: I repeat the tests.

      As these are two different machines, I will get a delta.

      Example:

      Machine B's processor is twice as fast, i.e. it runs twice as many Whetstones/Dhrystones as Machine A.

      A 100% increase in speed.

      I do the exact same thing with another piece of software. Can I expect the results to be similar?

      Do I need to extend this to three machines with three pieces of software and derive a much more complex formula for the validation of PassMark?
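
      (To spell out the arithmetic, with the scores below being invented numbers purely for illustration: the comparison I have in mind boils down to computing each benchmark's A-to-B ratio and seeing how closely the ratios agree.)

      ```c
      #include <stdio.h>

      /* Invented scores: each row is one benchmark's result on machine A and machine B. */
      struct result { const char *benchmark; double machine_a; double machine_b; };

      int main(void)
      {
          struct result results[] = {
              { "Benchmark 1 (CPU)", 1000.0, 2000.0 },  /* B is 100% faster */
              { "Benchmark 2 (CPU)",  480.0,  910.0 },  /* B is ~90% faster */
          };
          int n = sizeof(results) / sizeof(results[0]);

          for (int i = 0; i < n; i++) {
              double delta = (results[i].machine_b / results[i].machine_a - 1.0) * 100.0;
              printf("%-20s machine B is %.0f%% faster than machine A\n",
                     results[i].benchmark, delta);
          }
          /* If both benchmarks stress the same subsystems, the deltas should be in
             the same ballpark, though they will rarely match exactly. */
          return 0;
      }
      ```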

      Again thank you for your time.



      • #4
        I wouldn't think even relative performance would necessarily be the same between all benchmarks. If your assumption were true, the consequence would be that all benchmarks rank all computers in the same order, and we know this is not true.

        For example, some 3D games favour ATI video cards over nVidia cards; other games are the reverse. Some media transcoders have been coded to favour Intel CPUs over AMD because they use "Quick Sync", but other transcoders don't use "Quick Sync" and thus AMD ranks much higher.

        Both 3D games and various transcoders are regularly used as benchmarks, but they rank CPUs in different orders. That doesn't mean they are "wrong", however.

        Different benchmarks measure different aspects of performance. The trick is to try and find a benchmark that mirrors to some degree how you individually will use your computer.

        For example if you do nothing but play World of Warcraft, then the absolute best benchmark to use would be World of Warcraft itself.
