Benchmarking Virtual Machines (A Recommended Tool?)

  • #1

    Is PassMark planning or working with anyone to make PerformanceTest (PT) give "better" (i.e., more consistent, etc.) results when running inside virtual machines (VMs)?

    Searching through the forums, I found only a handful of posts -- all a little dated -- on using PT for VMs, mostly with a tone suggesting that PassMark does not consider VMs to be very important platforms for benchmark testing. I could not disagree more. In fact, I would argue that VMs are the premier platform for benchmark testing today. Incidentally, I recently found a VMware technical white paper titled "Performance Best Practices and Benchmarking Guidelines" wherein the author(s) recommend using PT in several categories (links below). To further my point, I think it is also important to note that SPEC (spec.org) is currently in the process of developing its own virtualization benchmark.
    I, for one, have always really liked PT: the broad suite of tests, simple data points, the pretty charts, scripted runs... and I like the idea of running the exact same set of tests on my VMs that I use on my physical machines. I also like knowing that I am seeing a real "apples-to-apples comparison" when those results are put next to available baselines (created by me or by others in the community of users).
    I tried to do that very thing (compare results from physical machines with those from various VMs configured with equivalent resources) but was disappointed by the inconsistencies (e.g., some VMs were reported to outperform the host hardware). Is there any chance PassMark will take another look at PT in VMs?

    As an aside, I am also very curious: how many other people out there are using PT to benchmark VMs, and in which host environment(s) (e.g., VMware, Xen, Hyper-V, etc.)?

    - Thanks.

    http://www.vmware.com/resources/techresources/1061
    (direct: http://www.vmware.com/pdf/VI3.5_Performance.pdf, pg.36)

  • #2
    The software works on most VMs. We were aware of a few cases where PT would fail to start up at all, but this was due to bugs and incomplete emulation of real hardware in the VM. With newer VMs this type of problem seems to be disappearing.

    We are also aware of some VMs performing badly (or not at all) in some of the tests, such as the 2D and 3D tests. This seems mostly due to the unavailability of good video card device drivers and/or DirectX in the VM, or the guest O/S being emulated in such a way that it is not identical to the same O/S on real hardware.
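
    One quick way to check what the guest is actually exposing (a sketch, assuming a Windows guest; dxdiag's /t switch writes its report to a text file):

        # Sketch: dump the guest's DirectX / display driver report to a file.
        # Assumes a Windows guest with dxdiag available on the PATH.
        import subprocess

        subprocess.run(["dxdiag", "/t", "dxdiag_report.txt"], check=True)
        # Look at the "Display Devices" section of dxdiag_report.txt to see
        # which driver and DirectX support the VM actually presents.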

    We are also aware of some VMs in some circumstances getting high disk results compared to the native hardware. But this is just because of caching, and the results reported by PT still seem to be accurate (even if the user doesn't realize that caching is going on in the VM).
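
    To see the caching effect for yourself, here is a minimal sketch (plain Python, not part of PT; the file path and sizes are arbitrary examples) that times a burst of writes with and without forcing them through the cache to disk:

        # Sketch: show how write-back caching inflates apparent disk throughput.
        # Not part of PerformanceTest; path and sizes are arbitrary examples.
        import os
        import time

        PATH = "testfile.bin"      # scratch file, deleted afterwards
        BLOCK = b"\0" * (1 << 20)  # 1 MB block
        COUNT = 256                # write 256 MB in total

        def write_mb_per_sec(sync):
            start = time.time()
            with open(PATH, "wb") as f:
                for _ in range(COUNT):
                    f.write(BLOCK)
                    if sync:
                        f.flush()
                        os.fsync(f.fileno())  # push data through the cache to disk
            elapsed = time.time() - start
            os.remove(PATH)
            return COUNT / elapsed

        print("cached writes: %.0f MB/s" % write_mb_per_sec(sync=False))
        print("synced writes: %.0f MB/s" % write_mb_per_sec(sync=True))

    On a VM the first number can come out far above what the physical disk can sustain, because the hypervisor and the guest O/S are both free to buffer the writes.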

    There is a difference between people not understanding the results and the results actually being wrong or inconsistent.

    We are not aware of any genuinely wrong results in VMs. But the results can certainly be tricky to compare.



    • #3
      I understand that the software "works" in VMs -- I did run it, and I was able to compare the results across my test targets. I am wondering about making it work "better" in VMs.

      Also, I understand that the video performance (if the tests even run) will likely be poor relative to real hardware with a GPU; to be fair, these machines were intended for "server" roles anyway, so video performance was not something that mattered to me.

      The inconsistent results I saw were, I know, a direct result of the tool being run in a VM. Across the board, the tests (especially the CPU set) were affected by time-slice issues; results from multiple runs on the same VM came back with scores that more than doubled or halved... one set of numbers was an order of magnitude higher than the others, which actually put it above what the host itself should have scored.

      I consider all of these things to be side effects of virtualization, not problems with PT. However, because I want to be able to keep using PT in VMs with the same confidence I have on physical machines, I am wondering if PassMark is doing anything to bring those "tricky" results back to where I (and others like me) can actually use their empirical values (instead of subjective trends and inferences). I agree that "there is a difference between people not understanding the results and the results being wrong," and I have not seen any "real wrong results" either. I still say, however, that the results from VMs are inconsistent with the results from physical machines: I cannot trust the numbers to be within an accepted tolerance for repeated tests, and I cannot compare them directly with results from physical machines.
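
      To make "accepted tolerance" concrete, here is a rough sketch of what I mean (my own illustration in Python; the scores are made up, not real PT results): compute the coefficient of variation across repeated runs and flag anything above a chosen threshold.

          # Sketch: quantify run-to-run consistency of repeated benchmark scores.
          # The scores below are made-up examples, not real PT results.
          import statistics

          runs = [1480.2, 1502.7, 1495.1, 3012.9, 1488.6]  # hypothetical scores

          mean = statistics.mean(runs)
          stdev = statistics.stdev(runs)
          cv = stdev / mean  # coefficient of variation

          print("mean=%.1f stdev=%.1f cv=%.1f%%" % (mean, stdev, cv * 100))
          if cv > 0.05:  # 5% is an arbitrary tolerance; pick one that suits you
              print("Runs too inconsistent to compare against other machines.")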

      Especially with a direct recommendation from VMware, I was just hoping for better news.



      • #4
        We are not aware of any instance where a VM guest O/S will outperform the host O/S for the CPU test. If this is what you saw, then I think it must be a configuration or methodology issue.

        While I know what a time slice is in terms of O/S task scheduling, I am not sure why you think this would impact the results. Time slices are (or should be) measured in milliseconds, while the duration of the PT CPU tests is measured in seconds. So an extra time slice here or there shouldn't have any significant impact on the results.
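
        To put rough numbers on it (a 10 ms quantum and a 15-second test are illustrative values, not PT internals):

            # Back-of-the-envelope: impact of one extra scheduler time slice
            # on a CPU test. Both values below are illustrative assumptions.
            time_slice = 0.010   # seconds (typical O/S scheduling quantum)
            test_length = 15.0   # seconds (example CPU test duration)
            print("one extra slice = %.3f%% of the test"
                  % (100 * time_slice / test_length))
            # -> about 0.067%, well below normal run-to-run noise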

        Check the following:
        1) That the CPU numbers for the host O/S reflect comparable results; maybe the numbers on the host O/S are low (rather than the guest being high).
        2) That the test duration has been increased in the preferences window.
        3) That the number of processes set in the preferences window is at least as high as the number of execution units you have (e.g. 4 on a quad-core CPU, 8 on a quad core with hyper-threading, 32 on an 8-way system with quad-core CPUs, etc.); the sketch below shows one quick way to read this number off.
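
        A minimal sketch for point 3 (plain Python, not part of PT): os.cpu_count() reports the logical processors the guest O/S sees, which on a VM is the number of virtual CPUs assigned to it.

            # Sketch: read the logical processor count so the PT process
            # setting can be matched to it. Illustrative only.
            import os

            logical_cpus = os.cpu_count()  # cores x threads, as the O/S sees them
            print("Set the number of test processes to at least", logical_cpus)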

