advanced memory test shows huge performance drop for the 2nd CPU of MP workstations

  • advanced memory test shows huge performance drop for the 2nd CPU of MP workstations

    Hello. Thank you for your interest.

    I have tested my multiprocessor workstation with PerformanceTest and got some weird results.
    There is a huge difference in latency and bandwidth between the two NUMA nodes: the CPU1 socket side is always much slower than the CPU0 socket side.
    Yes, MP systems have extra remote memory access delays (over QPI), so latency had already gone up from 45 ns (single CPU) to 55 ns (dual CPU).
    However, the CPU1 socket side shows a huge additional performance drop on top of that.
    The latency difference between the CPU sockets (CPU0 vs CPU1) is 55 ns vs 81 ns and the bandwidth difference is 3800 MB/s vs 2100 MB/s, a gap of almost 50%.
    Latency (random range)    Tested on NUMA node 0   Tested on NUMA node 1
    NUMA allocation node 0          55.58 ns                55.58 ns
    NUMA allocation node 1          81.58 ns                80.88 ns

    Block write speed         Tested on NUMA node 0   Tested on NUMA node 1
    NUMA allocation node 0         3785 MB/s               3813 MB/s
    NUMA allocation node 1         2119 MB/s               2069 MB/s
    I wonder whether this huge latency and bandwidth difference is common to all multiprocessor systems, or whether it is an issue with my system only.
    If you own a multiprocessor workstation, please help by sharing your PerformanceTest memory results.
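
    For anyone who wants to cross-check this outside of PerformanceTest: below is a minimal sketch of the same measurement using the Windows NUMA APIs (pin the thread to one node with SetThreadGroupAffinity, allocate the buffer on a chosen node with VirtualAllocExNuma, and time a dependent pointer chase). This is my own rough code, not how PerformanceTest measures anything; the buffer size and iteration count are arbitrary choices.
    Code:

// Rough local-vs-remote latency check (assumes Windows x64 with 2 NUMA nodes).
// Pin the thread to the cores of cpuNode, allocate the buffer on allocNode,
// then time a dependent pointer chase so every load waits for the previous one.
#include <windows.h>
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <numeric>
#include <random>
#include <vector>

static double chase_ns(int cpuNode, int allocNode, size_t bytes, size_t iters)
{
    // Restrict the thread to the cores of the requested CPU node.
    GROUP_AFFINITY ga = {};
    GetNumaNodeProcessorMaskEx((USHORT)cpuNode, &ga);
    SetThreadGroupAffinity(GetCurrentThread(), &ga, nullptr);

    // Commit the buffer with allocNode as the preferred node
    // (Windows falls back to another node only if allocNode is out of memory).
    size_t count = bytes / sizeof(void*);
    void** buf = (void**)VirtualAllocExNuma(GetCurrentProcess(), nullptr, bytes,
                                            MEM_RESERVE | MEM_COMMIT,
                                            PAGE_READWRITE, (DWORD)allocNode);
    if (!buf) { std::fprintf(stderr, "VirtualAllocExNuma failed\n"); std::exit(1); }

    // Build one random cycle over all elements: buf[i] points at the next element.
    std::vector<size_t> order(count);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin() + 1, order.end(), std::mt19937_64(1));
    for (size_t i = 0; i + 1 < count; ++i) buf[order[i]] = &buf[order[i + 1]];
    buf[order[count - 1]] = &buf[order[0]];

    void** p = buf;
    auto t0 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < iters; ++i) p = (void**)*p;   // dependent loads
    auto t1 = std::chrono::steady_clock::now();
    volatile void* sink = p; (void)sink;                 // keep the chase from being optimized away

    VirtualFree(buf, 0, MEM_RELEASE);
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / (double)iters;
}

int main()
{
    const size_t bytes = 256u << 20;     // 256 MB, far larger than the L3 cache
    const size_t iters = 20000000;
    for (int cpu = 0; cpu <= 1; ++cpu)
        for (int alloc = 0; alloc <= 1; ++alloc)
            std::printf("Processor on node %d, allocation on node %d: %.1f ns\n",
                        cpu, alloc, chase_ns(cpu, alloc, bytes, iters));
}

    On a healthy board the fast/slow split of the four numbers should follow locality (processor node == allocation node), not the allocation node alone.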

    Below is the way I tested.
    ===============================================================================================

    1. I tested with two E5-2682v4 and two E5-2680v4 CPUs on a Huananzhi F8D Plus dual-socket mainboard,
    eight (full bank) Hynix HMA42GR7MFR4N-TF DIMMs,
    Windows 10 Pro, and PerformanceTest 11.0.

    4. In the top menu bar of PerformanceTest, select "Advanced" --> "Memory...".
    [Screenshot: perftest0.png]


    5. (latency test) Select "Latency Test"

    6. Choose "Processor#"

    Be careful: this selects a logical core, not a NUMA node. To test NUMA node 1 you have to select a core whose number is above 30 or 40, depending on the core/thread count of your CPUs. (The sketch after these steps shows how to list which logical processors belong to each node.)

    7. Choose "NUMA Allocation node" 0 or 1.

    8. Press the "Go" button. You can see the result at the bottom of the window.
    Test all combinations of “NUMA node” (0/1) and “NUMA Allocation node” (0/1).
    I could see that latency is always higher for “NUMA Allocation node 1” than for “NUMA Allocation node 0”, no matter whether "Processor #" is on “NUMA node” 0 or 1.

    [Screenshot: latency test results]

    9. (write test) Select "Block Read/Write" and "Write"

    10. Choose "Processor#"

    11. Choose "NUMA Allocation node" 0 or 1.

    12. Press the "Go" button. You can see the result in the new window.
    Test all combinations of “NUMA node” (0/1) and “NUMA Allocation node” (0/1).
    I could see that bandwidth is always lower for “NUMA Allocation node 1” than for “NUMA Allocation node 0”, no matter whether "Processor #" is on “NUMA node” 0 or 1.

    [Screenshot: block write test results]

    13. (You don't need to do this.) Turn off the computer, swap the DIMMs between the CPU0-side and CPU1-side slots, then test again.
    14. (You don't need to do this.) Turn off the computer, swap the CPUs between the CPU0 and CPU1 sockets, then test again.
    ===============================================================================================
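
    Regarding step 6 (and the BIOS/OS question below): the mapping from logical processors to NUMA nodes, and how much free memory the OS sees on each node, can be listed directly with the Windows API. This is my own minimal sketch, not part of PerformanceTest; the printed CPU indices assume 64 logical processors per processor group, and other tools may number cores differently.
    Code:

// Print each NUMA node's free memory and the logical processors that belong to it.
#include <windows.h>
#include <cstdio>

int main()
{
    ULONG highest = 0;
    GetNumaHighestNodeNumber(&highest);            // should be 1 on a dual-socket board
    std::printf("Highest NUMA node number: %lu\n", highest);

    for (USHORT node = 0; node <= (USHORT)highest; ++node) {
        GROUP_AFFINITY mask = {};
        GetNumaNodeProcessorMaskEx(node, &mask);   // logical CPUs belonging to this node

        ULONGLONG freeBytes = 0;
        GetNumaAvailableMemoryNodeEx(node, &freeBytes);

        std::printf("Node %u (group %u): %llu MB free, logical CPUs:",
                    node, mask.Group, freeBytes >> 20);
        for (int cpu = 0; cpu < 64; ++cpu)
            if (mask.Mask & (1ULL << cpu))
                std::printf(" %d", cpu + mask.Group * 64);   // assumes 64 CPUs per group
        std::printf("\n");
    }
}

    If both nodes show up with roughly half of the installed RAM each, the BIOS is at least presenting the topology to Windows correctly.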

    In my case, exchanging the CPUs and the DIMMs did not change the results at all.
    Therefore, I suspect this is a mainboard issue.
    Or did I make a mistake in some BIOS or OS setting?




  • #2
    There are some results from years ago here
    https://forums.passmark.com/performa...d-threadripper

    The whole idea of multiple sockets is going out of fashion: why have two sockets with 14 cores each when you can have a single-socket CPU with up to 96 cores in one package? Very little software is optimised for NUMA.

    It would definitely be interesting to see results from a few other machines, as I agree the results look a bit strange (as if the NUMA RAM allocation nodes were relative to the NUMA CPU nodes). If that were the case, allocation node 0 would always end up local and allocation node 1 always remote, which would produce exactly the pattern in your tables.

    As indicated in the linked post, some motherboards have BIOS settings to play around with.
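
    One way to narrow this down from the OS side: Windows can report which physical NUMA node actually backs each virtual page via QueryWorkingSetEx. The rough sketch below (my own, nothing to do with PerformanceTest internals) requests memory on a specific node and then checks where the pages really landed; if pages requested on node 1 come back reported as node 0, the problem is in the OS/BIOS allocation rather than in the benchmark.
    Code:

// Allocate on a requested NUMA node, touch the pages, then ask the OS which
// physical node backs them (PSAPI_WORKING_SET_EX_INFORMATION, VirtualAttributes.Node).
#include <windows.h>
#include <psapi.h>
#include <cstdio>
#include <cstring>

#pragma comment(lib, "psapi.lib")   // may be unnecessary with newer SDKs

int main()
{
    const DWORD requestedNode = 1;       // the node we ask for
    const SIZE_T bytes = 64 << 20;       // 64 MB

    BYTE* p = (BYTE*)VirtualAllocExNuma(GetCurrentProcess(), nullptr, bytes,
                                        MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE,
                                        requestedNode);
    if (!p) { std::printf("VirtualAllocExNuma failed (%lu)\n", GetLastError()); return 1; }
    std::memset(p, 0xAB, bytes);         // fault the pages in

    SYSTEM_INFO si;
    GetSystemInfo(&si);
    SIZE_T pages = bytes / si.dwPageSize;

    // Sample every 256th page and count which physical node each one landed on.
    size_t counts[8] = {};
    for (SIZE_T i = 0; i < pages; i += 256) {
        PSAPI_WORKING_SET_EX_INFORMATION info = {};
        info.VirtualAddress = p + i * si.dwPageSize;
        if (QueryWorkingSetEx(GetCurrentProcess(), &info, (DWORD)sizeof(info)) &&
            info.VirtualAttributes.Valid)
            counts[info.VirtualAttributes.Node & 7]++;
    }
    for (int n = 0; n < 8; ++n)
        if (counts[n])
            std::printf("node %d: %zu sampled pages\n", n, counts[n]);

    VirtualFree(p, 0, MEM_RELEASE);
}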



    • #3
      This time I tested with Intel Memory Latency Checker and got the expected result: symmetric remote memory access delays between the NUMA nodes.

      Is there any possibility that the PerformanceTest advanced memory test malfunctioned?

      Here is the Intel Memory Latency Checker test result.

      ====================================================================================================

      Intel(R) Memory Latency Checker - v3.11
      Measuring idle latencies for random access (in ns)...
                          Numa node 0   Numa node 1
      Numa node 0             91.8         125.6
      Numa node 1            128.6          90.4
      Measuring Peak Injection Memory Bandwidths for the system
      Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
      Using all the threads from each core if Hyper-threading is enabled
      Using traffic with the following read-write ratios
      ALL Reads : 126783.9
      3:1 Reads-Writes : 122172.0
      2:1 Reads-Writes : 121868.1
      1:1 Reads-Writes : 114215.6
      Stream-triad like: 107334.5

      Measuring Memory Bandwidths between nodes within system
      Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
      Using all the threads from each core if Hyper-threading is enabled
      Using Read-only traffic type
                          Numa node 0   Numa node 1
      Numa node 0           64761.6       16684.5
      Numa node 1           16725.6       64444.8
      Measuring Loaded Latencies for the system
      Using all the threads from each core if Hyper-threading is enabled
      Using Read-only traffic type
      Inject     Latency    Bandwidth
      Delay       (ns)       MB/sec
      ==========================
      00000 210.29 128318.6
      00002 210.81 128501.7
      00008 211.57 128222.4
      00015 211.70 128083.5
      00050 199.87 127241.8
      00100 183.51 125596.4
      00200 121.22 92607.0
      00300 110.09 63446.0
      00400 104.28 48124.2
      00500 100.70 38986.3
      00700 97.13 28266.5
      01000 97.98 19979.7
      01300 93.68 15644.0
      01700 92.60 12166.5
      02500 91.72 8513.3
      03500 91.03 6294.2
      05000 91.36 4614.1
      09000 91.05 2881.8
      20000 90.88 1685.7

      Measuring cache-to-cache transfer latency (in ns)...
      Using small pages for allocating buffers
      Local Socket L2->L2 HIT latency 39.7
      Local Socket L2->L2 HITM latency 43.4
      Remote Socket L2->L2 HITM latency (data address homed in writer socket)
                              Reader Numa Node 0   Reader Numa Node 1
      Writer Numa Node 0              -                   97.9
      Writer Numa Node 1             98.5                   -
      Remote Socket L2->L2 HITM latency (data address homed in reader socket)
                              Reader Numa Node 0   Reader Numa Node 1
      Writer Numa Node 0              -                   98.2
      Writer Numa Node 1             97.6                   -
      ====================================================================================================
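
      For the block-write side, here is a similarly crude cross-check (again my own sketch, not how PerformanceTest or MLC measure): from a thread pinned to node 0, memset a large buffer allocated first on node 0 and then on node 1, and compare MB/s. The absolute numbers will not match either tool, but on a healthy board the two cases should differ only by roughly the expected QPI penalty, and which case is slower should depend on locality, not on the allocation node index by itself.
      Code:

// Crude single-thread block-write bandwidth: thread pinned to one NUMA node,
// buffer allocated on node 0 or node 1 via VirtualAllocExNuma.
#include <windows.h>
#include <chrono>
#include <cstdio>
#include <cstring>

static double write_mb_per_s(int cpuNode, int allocNode, size_t bytes, int passes)
{
    // Keep the writing thread on the cores of cpuNode.
    GROUP_AFFINITY ga = {};
    GetNumaNodeProcessorMaskEx((USHORT)cpuNode, &ga);
    SetThreadGroupAffinity(GetCurrentThread(), &ga, nullptr);

    void* buf = VirtualAllocExNuma(GetCurrentProcess(), nullptr, bytes,
                                   MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE,
                                   (DWORD)allocNode);
    if (!buf) return 0.0;
    std::memset(buf, 1, bytes);                  // fault the pages in before timing

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < passes; ++i)
        std::memset(buf, i, bytes);              // the timed block writes
    auto t1 = std::chrono::steady_clock::now();

    double secs = std::chrono::duration<double>(t1 - t0).count();
    VirtualFree(buf, 0, MEM_RELEASE);
    return (double)bytes * passes / (1024.0 * 1024.0) / secs;
}

int main()
{
    const size_t bytes = 512u << 20;             // 512 MB
    for (int alloc = 0; alloc <= 1; ++alloc)
        std::printf("Processor on node 0 writing to allocation node %d: %.0f MB/s\n",
                    alloc, write_mb_per_s(0, alloc, bytes, 8));
}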

