Threadripper 1950x very low integer and floating point math tests. Help?


  • #1

    I've been struggling with this for a couple of days now and am at my wits' end. My Threadripper is posting depressingly low numbers for both the integer math test and the floating point math test (which are critical uses for me).
    Full benchmark:

    Asus ROG Zenith Extreme running BIOS 0701 (latest)
    Threadripper 1950x
    GSkill F4-3200C14Q-32GTZR
    Custom watercooled loop using EK-Supremacy EVO Threadripper Edition - Full Nickel waterblock
    850 Watt EVGA G3 power supply
    Cosmos C700P case (large, 3 fans in on waterblock in front, 3 fans exhaust @ 2 top, 1 back)
    Samsung 960 pro M.2 behind NVMe heatsink on mobo

    Fresh install of win10 on a brand new blank m.2 NVMe (reinstalled two more times in desperation), so I'm certain there isn't an older chipset/driver for a different cpu causing conflict.

    My test results vs. a rough average (average eyeballed as the approximate mean of the most populous Gaussian-ish cluster of values):
    CPU benchmark numbers (mine/cluster median)
    Integer math : 28000/85000 (distressingly below average)*******
    Prime Numbers : 75/73 (slightly above average?)
    Compression : 53000/52000 (slightly better than average)
    Physics : 1450/1500 (average)
    CPU single thread 1800/2050 (significantly below average)
    Floating point math 4800/31000 (distressingly below average)*******
    Extended instructions 1470/1500 (average)
    Encryption 8300/8300 (average)
    Sorting 3100/3100 (average)

    Interestingly, in the reported stats for people with the same chip, there is a small cluster of users who seem to be experiencing the same issue both for integer math and floating point math.

    Tried running the benchmarks with default BIOS settings, with DOCP 3200MHz only, with the CPU 4GHz overclock (Asus setting) only, and with both overclocks. Same low math results each time. During the benchmarks, my CPU usage rails at 100% (except during the single thread test, obviously). I changed the test duration to long and checked the temperature and power using HWiNFO64. During the integer math test, my max Tdie was 48.8C, Tctl was 75.8C, and max CPU package power was 130W (should it have peaked at 180W?). During the floating point math test, max temps were Tdie 40.5C, Tctl 67.5C, and 134W max CPU package power. When I run something with a normal value like the encryption test, I get Tdie 50.8C, Tctl 77.8C and 179W (full usage?).

    Performance per CPU core for medium duration integer math test (default 32):
    1 - 2996
    2 - 5902
    3 - 8900
    4 - 11762
    5 - 14598
    6 - 17243
    7 - 19704
    8 - 21919
    9 - 24058
    10 - 25498
    11 - 25473
    12 - 25684
    13 - 25991
    14 - 26117
    I don't see the point in continuing further

    Performance per CPU core for medium duration floating point math test (default 32):
    1 - 1670
    2 - 3080
    3 - 4090
    4 - 4158
    5 - 4301
    6 - 4208
    7 - 4357
    8 - 4372
    9 - 4444
    10 - 4346
    11 - 4405
    12 - 4348
    13 - 4362
    14 - 4482
    I don't see the point in continuing further

    Testing against something that appears to have a "normal" benchmark:
    Performance per CPU core for medium duration encryption test (default 32):
    1 - 358
    2 - 726
    3 - 1101
    4 - 1453
    5 - 1799
    6 - 2142
    7 - 2495
    8 - 2876
    9 - 3213
    10 - 3572
    11 - 3904
    12 - 4269
    13 - 4627
    14 - 4893
    15 - 5339
    16 - 5648
    17 - 5816
    18 - 6014
    19 - 6158
    20 - 6330
    21 - 6505
    22 - 6669
    23 - 6846
    24 - 7018
    I think we get the trend. It seems to be levelling off a bit as it creeps up to a "good" value, still making reasonable gains at 24 threads, whereas the "bad" tests level off around 9 threads (integer) and 3 threads (floating point).
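    For what it's worth, Amdahl's law makes this contrast concrete. The sketch below is my own back-of-envelope script (not part of the thread), fitting the per-thread scores quoted above to estimate how "parallel" each test behaves:

```python
# Rough sketch: fit Amdahl's law to the per-thread scores quoted above.
# Speedup S(n) = 1 / ((1 - p) + p/n), solved here for the parallel fraction p.

def estimate_parallel_fraction(score_1, score_n, n):
    """Solve Amdahl's law for p, given the 1-thread and n-thread scores."""
    s = score_n / score_1              # observed speedup
    return (1 - 1 / s) / (1 - 1 / n)   # p = (1 - 1/S) / (1 - 1/n)

# (1-thread score, n-thread score, n) taken from the post
tests = {
    "integer":        (2996, 26117, 14),
    "floating point": (1670, 4482, 14),
    "encryption":     (358, 7018, 24),
}

for name, (s1, sn, n) in tests.items():
    p = estimate_parallel_fraction(s1, sn, n)
    print(f"{name}: {sn / s1:.1f}x speedup at {n} threads -> parallel fraction ~{p:.2f}")
```

    On these numbers the encryption test behaves ~99% parallel and integer ~95%, but floating point only ~68%, which looks less like a genuinely serial benchmark and more like something serializing or throttling the FP workload.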

    These results are devastating for me since I was intending to use this computer for high-load floating point and integer math (multithreaded Matlab computation), but my nephew's Ryzen 5 1600X basically pulls the same performance in Matlab. So I got a Ryzen 5 at Threadripper 1950x prices? Am I overlooking some sort of system setting? Is this processor an RMA? Is there a chance it could be the motherboard? I would be VERY grateful to anyone who could lend insight or at least put me out of my misery after this 48-hour ordeal.

  • #2
    We had a few similar reports over the last year. Here are the links.
    Especially this post.

    So as a summary, this seemed to help in other cases:
    - Turning on Intel SpeedStep in the BIOS
    - Uninstalling Gigabyte bloatware (e.g. EasyTuningService)
    - A full reinstall of Windows if the above don't work.

    In your case you won't have Intel SpeedStep or the Gigabyte EasyTuningService software, and you have already reinstalled Windows. (I assume you ran the benchmark directly after the new install of Windows and didn't load any 3rd party "tuning" software first.)

    Some type of throttling seems likely.

    Some other things to check:
    - The Windows power plan
    - Any power saving settings in the BIOS
    - Whether any updated drivers are available
    - Some people with Ryzen CPUs blame the HPET (High Precision Event Timer) for performance issues; we don't know if this is true, however. (Important update 1/Mar/2021: ASUS confirmed there is a problem with their ASUS AI Suite 3 software and the HPET timer causing floating point performance issues.)


    • #3
      You're my personal hero right now. In my naivete, I installed the latest ASUS drivers right after installing Windows and THEN performed the benchmark. I just wiped my system and did my 4th reinstall, but went immediately from the Windows install -> switch power profile to high performance (always did that before) -> install and run PassMark. Now, without the ASUS drivers, my Threadripper is nestled very comfortably in the middle of all of the group results. You're awesome! Thank you!!!

      Now I'm off to install drivers one-by-one until I find the culprit...


      • #4
        Thanks for update. Let us know which driver is at fault, as it will likely help other people as well.


        • #5
          Following up for others (I know... it's been a busy few weeks). I've reinstalled everything except for the "Asus AI suite 3" utility and the "AMD chipset" driver. No issues whatsoever. I don't see the need for installing either of those right now since the system is currently up and running in its completed state, giving me no issues, and benchmarking very nicely.


          • #6
            Confirmed: Installing the Dual Intelligent Processors 5 software (under AI suite) completely nerfs ThreadRipper. Even if I let it "learn" my system and turn performance all the way up. Complete trash.

            This is a testament to how important software like PassMark is.


            • #7
              Thanks a lot for posting this issue and solution - shame the resolution came a couple of weeks too late for me, on my very similar rig; I'd made the same discovery the hard way in that time.

              Unfortunately, I still have an issue, as my RAM scores are some way off. I'm running a 64GB 2x Corsair CMK32GX4M2Z2400C16 setup... Would you mind posting your RAM scores for comparison, please? My latency score in particular is terrible.

              Any suggestions for what I should be looking at to resolve this? Thanks.


              • #8
                Actually, I've just checked the QVL, and the part number of what I have fitted is not on the list! I had assumed that the company I got the custom workstation from would check such matters...

                Could that essentially be the explanation for what is going on?


                • #9
                  We have a 1950x ThreadRipper machine here for testing & the memory latency is bad. We see 97ns on our system. This is with 64GB of PC4-21333 RAM.

                  You might be able to get it slightly better by playing around with the BIOS setting for the RAM. You could also try setting 'Creator Mode' and 'Game Mode' to see what impact that has. But I don't think there is any 'fixing' it as it is a consequence of the design of the CPU.

                  Those qualified memory lists are often out of date and incomplete.


                  • #10
                    Thanks, David.

                    Hmmm. Well, for a custom high-spec workstation I would expect a memory mark better than 57th percentile - although my latency is "only" 89ns!

                    Fundamentally, SOMETHING is killing my "real-world" performance, i.e. running Python computations (which involve a range of single & multi-threaded operations etc.): overall timings are 10-30% worse (even without AI Suite killing performance) than on the i7 laptop I'm moving from! As the RAM benchmark is the only score which is substantially worse than on the old system, this seems like the obvious culprit.


                    • #11
                      P.S.: BTW, I don't seem to be able to find the XMP profile in the BIOS... any ideas? Could that explain my performance troubles?


                      • #12
                        What model i7 was in that laptop? Maybe the laptop had a nice SSD which helped the Python scripts?

                        XMP was an Intel thing.
                        AMD calls it AMP, AMD Memory Profile.
                        Asus calls it DOCP, Direct Over Clock Profile.
                        Gigabyte calls it EOCP, Extended Over Clock Profiles.
                        MSI calls it A-XMP.

                        So have a look for those acronyms in your BIOS.


                        • #13
                          Sorry for the delay, thanks for getting back to me, and the explanation on overclock profile names. I've got a manual profile running currently, but it's based on a preset Asus 4GHz profile (Zenith Extreme board with 701 BIOS).

                          So the laptop was running an i7-4800MQ - 4 cores at 2.7GHz. This is rather less than the 16 cores at 4GHz with water cooling I've now got... Similarly, all other hardware now is better on paper than previous machine, e.g. new workstation is running a 1TB Samsung 960 pro M.2 SSD - vs. spinning disk on laptop. And the benchmark scores reflect that - overall scores are much higher, some individual test scores are a bit worse (notably RAM latency). Oddly, the 2D graphics score is lower on new workstation, despite running a Titan XP, vs. Quadro K4100M.

                          I did switch the RAM mode to "local" (as used in the "Game" profile in Ryzen Master software), which cut latency to about 65ns, but with a smallish penalty on the CPU scores. This has improved my Python code's performance, but it's still slower than on the laptop...

                          I'm tearing my hair out here! Any other ideas for what I should be trying??


                          • #14
                            2D graphics performance doesn't scale with the cost of the graphics card. Cheap cards can do OK at 2D.

                            My guess is that your Python code is more single threaded than you think. And the 1950X is no better than the 4800MQ in this regard. Or maybe there is some other difference. e.g. different versions of Python.
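                            One quick way to see how single threaded pure-Python code really is (a toy sketch with a hypothetical workload, not the poster's actual code): time a CPU-bound task under threads vs. processes. The GIL serializes Python bytecode, so threads typically show no speedup while processes do.

```python
# Toy check (hypothetical workload): pure-Python CPU-bound code under the GIL
# gains nothing from threads, but can scale with processes.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    """CPU-bound toy task: sum of squares in pure Python."""
    return sum(i * i for i in range(n))

def timed(executor_cls, workers, n=500_000, tasks=4):
    """Run `tasks` copies of burn(n) on the given executor; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as ex:
        results = list(ex.map(burn, [n] * tasks))
    return time.perf_counter() - start, results

if __name__ == "__main__":
    t_threads, _ = timed(ThreadPoolExecutor, 4)
    t_procs, _ = timed(ProcessPoolExecutor, 4)
    print(f"4 threads:   {t_threads:.2f}s")  # typically close to serial time (GIL)
    print(f"4 processes: {t_procs:.2f}s")    # typically much faster on multi-core
```

                            NumPy-heavy numeric code is a different story (it releases the GIL inside its C routines), but any pure-Python glue around it stays serial and caps the benefit of extra cores.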


                            • #15
                              Right... I've made some progress. There's probably some further work to be done for the system to be fully optimal for my needs, but it is usable now, and substantially faster than the previous i7 laptop.

                              Key findings:
                              - RAM configuration matters significantly. Overclocking the RAM (now at 2800MHz with CAS 16, up from 2400MHz with CAS 15; I'll try to push this further and also see if I can get some faster sticks) has made a noticeable difference to single-threaded performance.
                              - There is, at least in my Python code, a greater overhead associated with parallelization than before, so short tests won't necessarily show a speed benefit overall - but in long tests improvements over old setup become obvious. Some of my code is now running more than twice as fast as before - but given 4x core count at higher clock speed, there would appear to still be scope for improvement.
                              - My code needs to be re-profiled and re-optimized...
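                              For the re-profiling step, a minimal stdlib sketch looks like this (the `work` function is just a stand-in for the real computation); it reports which functions dominate cumulative time, i.e. where the serial bottlenecks are:

```python
# Minimal profiling sketch; work() is a placeholder for the real computation.
import cProfile
import io
import pstats

def work():
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())  # top 5 functions by cumulative time
```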

                              Thanks for support to get me this far!