Announcement

**Ian (PassMark)** · Apr-17-2011, 11:19 PM

The SSE3 and SSE4a tests have been written by PassMark to test a subset of the SSE3 and SSE4a CPU instructions, not all of them.

Specifically, the SSE3 test performs some SSE3-specific 128-bit floating-point mathematical operations, and the SSE4a test performs some SSE4a-specific 128-bit integer bitwise operations. These tests are repeated in known combinations and the results are validated.

If you can't select a CPU test, then it is not supported by your CPU. MMX and 3DNow! are very old. MMX for example is not supported when a CPU is executing in 64-bit mode.

**cnchang** · Apr-25-2011, 07:34 AM

taking SSE4a test as an example

Hi,

Thank you for replying. I selected the SSE4a extension test and got a PASS. But I can't tell either what kind of 128-bit test is actually performed (e.g., video transcoding, prime number crunching, Pi calculation, 3D rendering, ...) or what is the performance difference between running SSE3 and running SSE4a from viewing the attached summary and trace.

By the way, I am not sure if I understand MMX is not supported in 64-bit system. Do you mean that enabling MMX, versus leaving it out and running x87 FPU code, makes no performance difference in x64?

Thank you.

------- attached log and trace ------

Result summary

Test Start time: Mon Apr 25 13:12:17 2011

Test Stop time: Mon Apr 25 13:27:22 2011

Test Duration: 000h 15m 05s

| Test | Cycles | Operations | Result | Errors | Last Error |
| --- | --- | --- | --- | --- | --- |
| CPU | 1796 | 4.656 Trillion | PASS | 0 | No errors |

TEST RUN PASSED

Detailed event log

EventLOG NOTE: 2011-04-25 13:27:24, System Event, 96207 - Information, Event ID: 0x00000B7C, Source: Service Control Manager (Generated time: 2011-04-25 13:27:23) LOG NOTE: 2011-04-25 13:27:24, System Event, 96207 - Description: Multimedia Class Scheduler A w i J A C LOG NOTE: 2011-04-25 13:27:26, System Event, 96208 - Information, Event ID: 0x00000B7C, Source: Service Control Manager (Generated time: 2011-04-25 13:27:25) LOG NOTE: 2011-04-25 13:27:26, System Event, 96208 - Description: Windows Error Reporting Service A w i J A C LOG NOTE: 2011-04-25 13:29:12, System Event, 96209 - Information, Event ID: 0x0000002A, Source: Microsoft-Windows-Kernel-Power (Generated time: 2011-04-25 13:29:10) LOG NOTE: 2011-04-25 13:29:12, System Event, 96209 - Description: Windows Error Reporting Service A w i J A C LOG NOTE: 2011-04-25 13:29:20, System Event, 96210 - Information, Event ID: 0x00000B82, Source: Service Control Manager (Generated time: 2011-04-25 13:29:19) LOG NOTE: 2011-04-25 13:29:20, System Event, 96210 - Description: TCP/IP NetBIOS Helper A w \ e C w ]: 0x40030011 [ @ ~ t : s u ( w p )] : L LOG NOTE: 2011-04-25 13:29:20, System Event, 96211 - Information, Event ID: 0x00000B7C, Source: Service Control Manager (Generated time: 2011-04-25 13:29:19) LOG NOTE: 2011-04-25 13:29:20, System Event, 96211 - Description: TCP/IP NetBIOS Helper A w i J A C LOG NOTE: 2011-04-25 13:45:48, System Event, 96212 - Information, Event ID: 0x00000001, Source: Microsoft-Windows-Kernel-General (Generated time: 2011-04-25 13:45:46) LOG NOTE: 2011-04-25 13:45:54, System Event, 96212 - Description: \ ~ C LOG NOTE: 2011-04-25 13:45:54, System Event, 96213 - Information, Event ID: 0x00000B7C, Source: Service Control Manager (Generated time: 2011-04-25 13:45:46) LOG NOTE: 2011-04-25 13:45:54, System Event, 96213 - Description: Windows Error Reporting Service A w i J A C LOG NOTE: 2011-04-25 13:45:56, System Event, 96214 - Warning, Event ID: 0x000003F6, Source: Microsoft-Windows-DNS-Client 
(Generated time: 2011-04-25 13:45:50)
...
...
LOG NOTE: 2011-04-25 14:07:50, System Event, 96250 - Description: Application Experience A w i J A C LOG NOTE: 2011-04-25 14:09:32, System Event, 96251 - Information, Event ID: 0x00000B7C, Source: Service Control Manager (Generated time: 2011-04-25 14:09:30) LOG NOTE: 2011-04-25 14:09:32, System Event, 96251 - Description: Multimedia Class Scheduler A w i J A C

**David (PassMark)** · Apr-25-2011, 08:15 AM

It might help if you could keep the line breaks when pasting text. What you posted is near unreadable, and doesn't seem to be related at all to SSE3 or SSE4a instructions.

"By the way, I am not sure if I understand MMX is not supported in 64-bit system"

If you don't understand the above explanation then you might want to do some research into MMX & SSE. MMX, for example, allowed 64-bit operations to occur on 32-bit systems. But on 64-bit systems with newer SSE instructions, it doesn't make much sense to use MMX.

"or what is the performance difference between running SSE3 and running SSE4a"

BurnInTest is not a benchmark. It doesn't measure relative performance of SSE instructions.

**cnchang** · Apr-25-2011, 02:35 PM

What is SSE3 or SSE4a in a nutshell?

I am sorry the format of the pasted trace is not neat, but I wanted to fit more data in the allowed 10,000 characters limit so that you can pick a part to decipher. By the way, the pasted text is copied from the SSE4a test result and trace. Run the SSE4a or SSE3 test, and you can generate a trace in your preferred format. Then, can you explain from the trace what is done in these tests (e.g., 3D rendering, prime or Pi calculation, ...)? Would you elaborate on these, beyond 128-bit float mathematical or integer bitwise operations?

When I use TMPGEnc Video Mastering Works for video transcoding in Win 7 64-bit, I can see a performance difference when the MMX option in the CPU/GPU preferences is enabled, compared with running x87 only. Of course, enabling SSE can garner more horsepower from the CPU, as you suggested. That's where my confusion about MMX not being supported when a CPU is executing in 64-bit mode comes from.

Thanks.

**David (PassMark)** · Apr-25-2011, 09:46 PM

"...10,000 characters limit so that you can pick a part to decipher..."

As it turns out we have more interesting ways to spend our time. But a quick look seems to indicate that the log is irrelevant to your question.

The SSE3/4 tests aren't doing 3D, Prime number or PI calculations. They do the SSE equivalent of adding & multiplying numbers together and checking the result is correct. For example here is a section of the source code.

```asm
000000014011C48C movapd xmmword ptr [rsi+4120h],xmm7
000000014011C494 addpd xmm8,xmm11
000000014011C499 mulpd xmm14,xmm10
000000014011C49E movapd xmmword ptr [rsi+4100h],xmm0
000000014011C4A6 subpd xmm13,xmm1
000000014011C4AB mulpd xmm6,xmm10
000000014011C4B0 addpd xmm2,xmm4
000000014011C4B4 mulpd xmm9,xmm15
000000014011C4B9 mulpd xmm8,xmm15
000000014011C4BE mulpd xmm13,xmm5
000000014011C4C3 mulpd xmm2,xmm5
000000014011C4C7 movapd xmmword ptr [rsi+0C330h],xmm14
000000014011C4D0 movapd xmmword ptr [rsi+0C310h],xmm6
000000014011C4D8 movapd xmmword ptr [rsi+0C320h],xmm9
000000014011C4E1 movapd xmmword ptr [rsi+0C300h],xmm8
000000014011C4EA movapd xmmword ptr [rsi+4130h],xmm13
000000014011C4F3 movapd xmmword ptr [rsi+4110h],xmm2
000000014011C4FB lea rsi,[rsi+80h]
000000014011C502 add al,4
000000014011C504 jae 000000014011C21E
000000014011C50A lea rsi,[rsi+80h]
000000014011C511 add ah,80h
000000014011C514 jae 000000014011C21E
000000014011C51A lea rsi,[rsi+0C300h]
```
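As a rough illustration of the add, subtract and multiply-then-verify pattern in the listing above, here is a minimal Python sketch. It is not PassMark's code: the function names (`addpd`, `subpd`, `mulpd`, `run_cycle`, `burn_in`), the operand values and the expected results are all assumptions chosen for illustration, and each 128-bit register is modelled as a tuple of two doubles.

```python
# Illustrative model only: a 128-bit XMM register holding two doubles
# is represented as a 2-tuple. Names and values are hypothetical.

def addpd(a, b):
    """Lane-wise add of two packed doubles (models ADDPD)."""
    return (a[0] + b[0], a[1] + b[1])


def subpd(a, b):
    """Lane-wise subtract of two packed doubles (models SUBPD)."""
    return (a[0] - b[0], a[1] - b[1])


def mulpd(a, b):
    """Lane-wise multiply of two packed doubles (models MULPD)."""
    return (a[0] * b[0], a[1] * b[1])


def run_cycle(x, y, scale):
    """One add/sub followed by a multiply, loosely following the listing."""
    total = mulpd(addpd(x, y), scale)
    diff = mulpd(subpd(x, y), scale)
    return total, diff


def burn_in(cycles):
    """Repeat a fixed computation and count results that differ from the
    known-good answer. The inputs are exactly representable in binary,
    so a correct machine always reproduces `expected` bit for bit."""
    x, y, scale = (1.5, -2.0), (0.25, 4.0), (2.0, 0.5)
    expected = ((3.5, 1.0), (2.5, -3.0))
    errors = 0
    for _ in range(cycles):
        if run_cycle(x, y, scale) != expected:
            errors += 1
    return errors
```

Calling `burn_in(1000)` on a healthy machine should return `0`; any other value would mean a cycle produced something other than the known answer, which is the kind of fault a reliability test is looking for.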

I can't really comment on what TMPGEnc is doing. It isn't our software. But what I am sure of is that there is no MMX option for the GPU, and you should be comparing MMX against x86, not x87. MMX was only for integer operations. In fact, the CPU registers used for MMX (MM0 to MM7) were also shared with floating point, so you couldn't do both at the same time.

**cnchang** · Apr-26-2011, 10:25 AM

regarding SIMD tests

My understanding is that MMX provides SIMD operations for floating-point operands by using fixed-point arithmetic paradigm to boost performance. The idea is to trade off IEEE FP's minimum 23 bits of precision to a much lower precision where users may not even notice any difference on 3D rendering quality or maybe HD video transcoding quality. Of course, some video processing may be integer only. So in some cases, without MMX's integer operations or other streaming SIMD extensions, I still think some floating-point 3D or multimedia applications will need to fall back to x87. Certainly, if the target application is not using floating point to begin with, then it would be a different story.

I would like to know if there are any user-perceivable effects for each SSE test, other than adding and multiplying register data in the background, so that users can see or even compare the test results. It would be neat to see how each new extension set can fire up and enhance the SIMD-capable test routine's performance. I know PassMark may not intend to make the CPU SSE4a BIT act like a performance benchmark tool, but the numbers of cycles/operations are still recorded for each batch. BTW, TMPGEnc can also take advantage of CUDA for video transcoding, which is why they have these [preferences...] [CPU/GPU] settings.

And last but not least, I appreciate your precious time and your generosity in sharing source code here. I am very impressed with PassMark's excellent tools and am willing to recommend your product to all. Actually, I have already used your benchmark site as a reference on my web site.

**David (PassMark)** · Apr-26-2011, 08:22 PM

BurnInTest was designed to test a system's reliability and stability under extended load. It was designed to check that the components of a system are operating correctly, that is, that a given set of inputs produces a known output.

It isn't meant to be a benchmark; we have the PerformanceTest software for benchmarking, which also makes use of SIMD instructions.

"My understanding is that MMX provides SIMD operations for floating-point operands by using fixed-point arithmetic paradigm to boost performance."

MMX is integer only. From Wikipedia,
"The main usage of the MMX instruction set is based on the concept of packed data types, which means that instead of using the whole register for a single 64-bit integer, two 32-bit integers, four 16-bit integers, or eight 8-bit integers may be processed concurrently."

"MMX provides only integer operations. When originally developed, for the Intel i860, the use of integer math made sense (both 2D and 3D calculations required it), but as graphics cards that did much of this became common, integer SIMD in the CPU became somewhat redundant for graphical applications".

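The packed data types described in the Wikipedia quote can be sketched in plain Python, treating a 64-bit integer as an MMX-style register. The helper names (`pack`, `unpack`, `packed_add`) are hypothetical, and the wraparound behaviour is modelled on the non-saturating PADDB/PADDW/PADDD instructions; this is an illustration, not how any PassMark test is written.

```python
MASK64 = (1 << 64) - 1  # a 64-bit register, as used by MMX


def pack(lanes, width):
    """Pack unsigned lane values (lowest lane first) into one 64-bit integer."""
    if len(lanes) * width != 64:
        raise ValueError("lanes must fill exactly 64 bits")
    lane_mask = (1 << width) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & lane_mask) << (i * width)
    return value


def unpack(value, width):
    """Split a 64-bit integer into 64 // width unsigned lanes, lowest first."""
    lane_mask = (1 << width) - 1
    return [(value >> (i * width)) & lane_mask for i in range(64 // width)]


def packed_add(a, b, width):
    """Add lane by lane with wraparound inside each lane.

    A carry out of one lane is discarded instead of spilling into the
    next lane, which is what separates a packed add from an ordinary
    64-bit add.
    """
    sums = [x + y for x, y in zip(unpack(a, width), unpack(b, width))]
    return pack(sums, width)
```

For example, with four 16-bit lanes, adding `[0xFFFF, 2, 3, 4]` and `[1, 20, 30, 40]` gives `[0, 22, 33, 44]`: the lowest lane wraps to zero and its neighbour is unaffected. An ordinary 64-bit add of the same two values would carry into the second lane and produce 23 there.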
**cnchang** · Apr-27-2011, 02:19 AM

You can find how SIMD integer instructions can be used in floating-point oriented apps in "MMX Technology Architecture Overview" ftp://download.intel.com/technology/itj/q31997/pdf/archite.pdf
You can find how standard x87 (i.e. non-SIMD) versions can be used as a fallback in:
http://simdx86.sourceforge.net/

Yes, I do find a [CPU-SSE] category for the CPU score in PassMark PerformanceTest. But I can't tell how each of MMX, SSE, SSE2, SSE3 and SSE4a performed, the way you can test each SSE extension in BIT. What does a CPU-SSE score, like 16.7 million matrices per second, mean in comparison to fps in 3D rendering or seconds in video transcoding?

**David (PassMark)** · Apr-27-2011, 03:16 AM

"You can find how integer instructions can be used in floating-point..."

Sorry, this is just plain wrong. MMX instructions cannot be used for floating-point operations; they are integer only. And the Intel document does not describe how to do floating point. The document actually says floating-point units are hardware-intensive (and so not implemented), and that only fixed point is possible, and only then as a complicated workaround.

"In fixed-point computation, from the point of view of the processor architecture, computations are done on integer values, but programmer/applications interpret the integer values as fraction values. Some number of leading bits (determined by the application) are interpreted as an integer, while the remaining bits of the value are interpreted as a fraction. It is the application’s responsibility to perform appropriate shifts in order to scale the number"
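The scaling that the quoted passage describes can be sketched as follows. This assumes a format with 12 fraction bits (a choice made here purely for illustration; the Intel text leaves the split to the application), and the helper names `to_fixed`, `to_float` and `fixed_mul` are hypothetical.

```python
FRAC_BITS = 12  # assumed number of fraction bits; the application decides this


def to_fixed(x, frac_bits=FRAC_BITS):
    """Store a real number as an integer scaled by 2**frac_bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Interpret a fixed-point integer as a real number again."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values using integer arithmetic only.

    The raw product carries 2 * frac_bits fraction bits, so it is
    shifted right by frac_bits to return to the original scale. That
    shift is the part the application has to do itself.
    """
    return (a * b) >> frac_bits
```

Here `to_fixed(1.5)` is the integer 6144 and `to_fixed(0.25)` is 1024; `fixed_mul` of the two gives 1536, which `to_float` reads back as 0.375. The processor only ever sees integers; the meaning of the bits lives entirely in the program.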

Fixed point is not floating point, as the document points out.
"Industry-standard floating-point (IEEE FP) requires a minimum of 23 bits of precision"

You can't directly relate performance in matrices to 3D frames per second.
I think you are asking for something we can't provide, and maybe no one can provide.

**cnchang** · Apr-27-2011, 04:19 AM

No doubt it is integer operations, not floating-point operations, running inside, but isn't MMX/SSE designed as a fixed-point, lower-precision floating-point emulator to help improve media application performance?

"Media applications involve working on fraction values, for example, the use of a weighting coefficient in filtering averaging, etc. One way to support operations on fraction values is to provide SIMD operations for floating-point operands. However, floating-point units are hardware-intensive. Also, for several media applications, even precision of 10 to 12 binary bits and dynamic range of 4 to 6 bits are sufficient. Industry-standard floating-point (IEEE FP) requires a minimum of 23 bits of precision. Looking at application requirements and the trade-off of performance and design complexity leads to the use of a fixed-point arithmetic paradigm for several media applications."

-- extracted from "MMX Technology Architecture Overview"
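To make the excerpt's trade-off concrete, here is a hedged sketch of a weighted average computed with integer-only arithmetic, next to the floating-point version. The sample values, the weights, the 12-bit fraction and the function names are assumptions for illustration and are not taken from the Intel document.

```python
def weighted_average_float(samples, weights):
    """Reference result using ordinary floating-point arithmetic."""
    return sum(s * w for s, w in zip(samples, weights))


def weighted_average_fixed(samples, weights, frac_bits=12):
    """Same filter using integers only.

    Each weight is pre-scaled to an integer with `frac_bits` fraction
    bits, the products are accumulated as integers, and one final shift
    removes the scale. The fractional part of the result is truncated.
    """
    scale = 1 << frac_bits
    q_weights = [int(round(w * scale)) for w in weights]
    acc = sum(s * qw for s, qw in zip(samples, q_weights))
    return acc >> frac_bits
```

With the assumed samples `[100, 200, 50]` and weights `[0.25, 0.5, 0.25]`, the floating-point result is 137.5 and the fixed-point result is 137. For 8-bit pixel data an error under one step is usually acceptable, which is the excerpt's point about 10 to 12 bits being sufficient for several media applications.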

What I said before:

"MMX provides SIMD operations for floating-point operands by using fixed-point arithmetic paradigm to boost performance. The idea is to trade off IEEE FP's minimum 23 bits of precision to a much lower precision."

should be rephrased to:
"MMX/SIMD provides an option for using fixed-point operations to emulate low-precision floating-point operations with the help of the library/app coding".
If these MMX/SSE instructions are not available, then you can only achieve similar floating-point work by using x87, can't you?

If my perception is wrong, then can you comment on this note?

"libSIMDx86 supports Intel's MMX, SSE, SSE2, SSE3; and AMD's 3DNow!, 3DNow!+, and MMX+. Additionally, standard x87 (i.e. non-SIMD) versions of the functions have been provided as a fallback and a control."

-- extracted from http://simdx86.sourceforge.net/

Well, can you relate performance in matrices per second to SIMD (MMX/SSE) itself, then? What user-perceivable tasks are involved in the CPU-SSE test to take advantage of SIMD?


what have been tested in SSE3 or SSE4a
