strange performance with dual Opteron 6180SE

  • strange performance with dual Opteron 6180SE

    Hi Folks,

    I've been testing out different configurations on my home PC. It's a SuperMicro H8QGi-F quad-Opteron motherboard populated with two Opteron 6180SEs, i.e. 24 K10 cores running at 2.5 GHz stock.

    My specific issue is with the CPU Mark. Sometimes it scores low, in the 8800 range, while other times it scores what might be expected, in the 11700 range or thereabouts.

    My other machine is a Tyan S2927E running dual Opteron 8439SEs (6 cores each, so 12 K10 cores at 2.8 GHz). This machine scores in the 8200 range.

    As far as I know I'm the only one who has submitted 6180SE results to the database, but here are the test #s:

    #271022, #266017, #290583

    In #271022 you can see that the CPU Physics, Prime Numbers, and Encryption results are drastically different. The hardware is pretty much the same, although at different times I have fooled around with different graphics cards and hard drive controllers.

    On the low runs, PassMark is showing that a 24-core machine running 4 Istanbul dies (two dual-die Magny-Cours packages) is only about 10% faster than a 12-core machine running 2 Istanbul chips, and it seems to be due to these skewed results on certain tests.
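    To put rough numbers on that (a quick sanity check using the scores quoted above and naive cores × clock scaling, which of course ignores memory bandwidth and IPC differences):

    ```python
    # CPU Mark scores and configs quoted in this thread
    cores_a, ghz_a = 24, 2.5        # dual Opteron 6180SE (H8QGi-F)
    cores_b, ghz_b = 12, 2.8        # dual Opteron 8439SE (S2927E)

    # Naive throughput ratio from cores x clock alone
    theoretical = (cores_a * ghz_a) / (cores_b * ghz_b)
    observed_low = 8800 / 8200      # the low-scoring runs
    observed_good = 11700 / 8200    # the runs that score as expected

    print(f"naive cores x clock ratio: {theoretical:.2f}x")   # ~1.79x
    print(f"observed on low runs:      {observed_low:.2f}x")  # ~1.07x
    print(f"observed on good runs:     {observed_good:.2f}x") # ~1.43x
    ```

    So even the "good" runs fall well short of naive scaling, but the low runs barely beat the 12-core box at all.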

    There has got to be a setting that is affecting this; I just do not know what it is. The machine has consistently run 11700 at times, so perhaps it's down to a BIOS setting? Any help would be great.

  • #2
    I see from a subsequent post that you came across some analysis of similar Opteron behaviour that we did last year.

    We don't know the exact reason, but if we have anything new to add we will post it in the existing thread linked above.



    • #3
      Hi David,

      After much methodical testing, I think I have figured out the issue, or at least made some progress and gained a somewhat better understanding.

      In the SuperMicro BIOS there are four settings for how memory is handled: Bank Interleaving, Node Interleaving, Channel Interleaving, and another relevant one called Bank Swizzle.

      In order for the PassMark benchmarks to come out properly on these multi-processor rigs, i.e. to scale as one might reasonably expect on the Physics, Prime Numbers, and Encryption tests, the BIOS must be set as follows:

      Bank Interleaving needs to be Auto (ie enabled)
      Node Interleaving needs to be Auto (ie enabled)
      Channel Interleaving needs to be Auto (ie enabled)
      Bank Swizzle needs to be Disabled.

      When I do this the benchmarks come out properly, and as a result, on my dual Opteron 6180SE, the CPU Mark score jumps from 8900 to 11700.

      Now, right before I did this, I installed the following Microsoft hotfix that deals with core parking:

      http://support.microsoft.com/kb/2534356

      Link to the hotfix (this is for Server 2008 R2):

      http://hotfixv4.microsoft.com/Window...tl_x64_zip.exe

      I had installed the hotfix before going through my test matrix, i.e. before finding the fix, and was still getting the same bad results; but when I switched Node Interleaving to Auto (enabled), the benches came out proper. I do not know whether the hotfix had any effect; I just mention my steps for anyone who might need to duplicate this in the future.

      As I understand it, enabling Node Interleaving gets rid of the NUMA functionality: the OS no longer sees an SRAT telling it which portions of memory are physically attached to each processor node.

      The downside to getting rid of NUMA is that memory latency jumps sharply, from 67 ns up to around 113 ns or so. There is a 34% penalty on the Memory Mark because of this, versus having Node Interleaving disabled (NUMA on).
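      For what it's worth, the latency figures quoted above work out to roughly a 69% increase rather than a full doubling (simple arithmetic, nothing more):

      ```python
      numa_latency_ns = 67.0     # Node Interleaving disabled (NUMA on)
      uma_latency_ns = 113.0     # Node Interleaving enabled (NUMA off)

      penalty = uma_latency_ns / numa_latency_ns - 1.0
      print(f"latency penalty: {penalty:.0%}")   # ~69% higher latency
      ```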

      At any rate, I'll keep on testing, but it would appear that having the OS NUMA-aware is completely screwing up those three benchmarks on multi-processor Opteron rigs.

      Perhaps the benchmark's prep code allocates its data on a core that happens to be in one NUMA node, while the threads that work on that data execute on cores in a different NUMA node, or some such. I know that on STREAM, for me to get good results (45 GB/s) I have to set thread affinity; otherwise bandwidth drops down to 5 GB/s.
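      As a sketch of that affinity workaround: on Linux one can pin the process before touching the benchmark data, so that first-touch allocation and the worker threads land on the same node. The thread above is on Windows Server, where `start /affinity` or `SetProcessAffinityMask` would be the rough equivalent; the CPU set below is hypothetical (assumed to be the cores of node 0):

      ```python
      import os

      # Linux-only sketch: pin this process to a fixed CPU set so that
      # first-touch memory allocation and the threads using it stay on
      # the same NUMA node.
      node0_cpus = {0, 1, 2, 3}             # hypothetical: cores of node 0

      if hasattr(os, "sched_setaffinity"):  # not available on Windows
          allowed = os.sched_getaffinity(0) # CPUs we may currently run on
          os.sched_setaffinity(0, node0_cpus & allowed or allowed)
          print(sorted(os.sched_getaffinity(0)))
      ```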

      Hope this helps someone.
      Last edited by rvborgh; Sep-20-2014, 06:37 AM.
