AMD llano A-series benchmark and CPU bug


  • AMD llano A-series benchmark and CPU bug

    This is a post to document some investigation we did into the AMD Llano CPUs.
    These are also known as AMD Fusion chips, or the 12h family. Chips in this series include the A6-3600, A6-3650, A8-3850, and about a dozen other models.

    We have had several customers query us about benchmark results for this series of chips.

    What was reported was that a number of these CPUs performed very badly on the Integer Maths test and the Prime number test.

    Here is a graph showing the spread of the results across 180 different PCs with the AMD A6-3650 CPU. Results are in millions of operations per second.



    About half of all systems show an 80% drop in performance on this integer maths test. The graph is similar for the prime number test, and also similar for the A8-3850 CPU. However, the same behavior is not seen for other tests (like the floating point test).

    So the bad behavior is only seen on the integer maths test and the prime number test.
    The common thing about these two tests is that they make heavy use of integer division.
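To illustrate why a prime number test leans on integer division, here is a minimal sketch of a trial-division prime counter. This is an assumption for illustration only; it is not the PerformanceTest algorithm, and the function name is hypothetical. It counts how many modulo operations are needed, each of which would typically compile to a DIV/IDIV instruction in native x86 code.

```python
def count_primes_trial_division(limit):
    """Count primes below `limit` by trial division.

    Returns (prime_count, modulo_ops) so the amount of division work is visible.
    Illustrative only: not the algorithm used by any particular benchmark.
    """
    prime_count = 0
    modulo_ops = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            # In compiled x86 code, a modulo by a variable divisor is
            # typically one DIV/IDIV instruction.
            modulo_ops += 1
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            prime_count += 1
    return prime_count, modulo_ops


if __name__ == "__main__":
    primes, divides = count_primes_trial_division(1000)
    print(f"{primes} primes found using {divides} modulo operations")
```

Nearly all of the work in the inner loop is the modulo, so a change in divide latency shows up almost directly in the run time of a test like this.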

    For several months we were unable to explain the behavior. We eliminated many possibilities such as overheating, bad configuration, etc., but until now couldn't come to any conclusion.

    But a couple of days ago we came across this AMD document, "Revision Guide for AMD Family 12h Processors".

    A section of this document describes "Errata 665". Errata are the documented known bugs for a CPU.

    Here is the text of Errata 665:

    Code:
    665 Integer Divide Instruction May Cause Unpredictable Behavior

    Description
    Under a highly specific and detailed set of internal timing conditions, the processor
    core may abort a speculative DIV or IDIV integer divide instruction (due to the
    speculative execution being redirected, for example due to a mispredicted branch)
    but may hang or prematurely complete the first instruction of the non-speculative path.

    Potential Effect on System
    Unpredictable system behavior, usually resulting in a system hang.

    Suggested Workaround
    BIOS should set MSRC001_1029[31].
    This workaround alters the DIV/IDIV instruction latency specified in the Software
    Optimization Guide for AMD Family 10h and 12h Processors, order# 40546. With
    this workaround applied, the DIV/IDIV latency for AMD Family 12h Processors
    are similar to the DIV/IDIV latency for AMD Family 10h Processors.

    Fix Planned
    No

    This bug seems to closely match the behavior we are seeing. While we can't be 100% sure at this point, we believe this bug may be the cause of the poor benchmark results. There is a documented workaround, but in the BIOSes we checked, the motherboard manufacturers are not applying it.

    There also seem to be isolated incidents of this bug impacting real-life applications, but on the whole there is very little information available about the impact.

    Update: The same problem is seen on the AMD Athlon II X4 631, 641 & 651 CPUs and on mobile CPUs like the A6-3400M. This is to be expected, as they use the same 'Fusion' CPU core.

  • #2
    665 Integer Divide Instruction workaround

    A bit more speculation and some additional testing.

    We suspect there are three possible ways this bug could lead to low benchmark results.

    Possibility 1:
    Some of our test threads hang, but not all, reducing the test to running on 1 or 2 cores. (Update: this was later ruled out)

    Possibility 2:
    There might be scenarios where a system hang does not occur, but there is instead a significant performance impact, possibly due to reloading the instruction pipeline (or the like).

    Possibility 3:
    Something else we don't fully understand.

    What is MSRC001_1029[31]?
    This appears to be partially undocumented, which is strange given that it is the suggested workaround.

    MSRC001_1029 is the model-specific register known as the Decode Configuration Register (DE_CFG). But we don't know what function bit 31 serves.
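As a small sketch of what "set MSRC001_1029[31]" means at the bit level: the register address and bit number come from the erratum text, while the helper names below are hypothetical. The code only manipulates an integer holding the 64-bit register value; it does not touch any hardware.

```python
DE_CFG_MSR = 0xC0011029      # MSRC001_1029, as named in the erratum
WORKAROUND_BIT = 31          # the bit the erratum says BIOS should set
MSR_MASK = (1 << 64) - 1     # MSR values are 64 bits wide


def workaround_enabled(msr_value):
    """Return True if bit 31 is set in the given register value."""
    return bool((msr_value >> WORKAROUND_BIT) & 1)


def with_workaround(msr_value, enabled=True):
    """Return msr_value with bit 31 set or cleared, leaving other bits alone."""
    bit = 1 << WORKAROUND_BIT
    new_value = (msr_value | bit) if enabled else (msr_value & ~bit)
    return new_value & MSR_MASK


if __name__ == "__main__":
    value = 0x0000000000000010   # hypothetical register contents
    patched = with_workaround(value)
    print(hex(patched), workaround_enabled(patched))
```

Doing a read-modify-write like this matters because the other bits of DE_CFG have their own (partly undocumented) meanings and should be preserved.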

    What is the effect of setting MSRC001_1029[31]?
    Unfortunately we don't have a badly behaving machine to test this on (please contact us if you have a machine like this). But we wrote some code to test it on an A6-3650 machine that doesn't show the bad behavior, to see what impact it would have should BIOS manufacturers decide to implement it universally.

    Before patch
    ------------
    Integer Math: 3059 Millions of operations / Sec
    Prime numbers: 2026

    After patch
    ------------
    Integer Math: 2428 (20% slower)
    Prime numbers: 1494 (26% slower)

    So, as per the AMD description, this seems to slow down the execution of the DIV/IDIV instructions on the CPU.
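The percentages quoted above can be reproduced from the four scores in this post. This is just the arithmetic; the function name is ours.

```python
def percent_slower(before, after):
    """Percentage drop in a score (higher scores are better)."""
    return (before - after) / before * 100.0


# Scores from the before/after patch runs above.
results = {
    "Integer Math": (3059, 2428),
    "Prime numbers": (2026, 1494),
}

if __name__ == "__main__":
    for name, (before, after) in results.items():
        print(f"{name}: {percent_slower(before, after):.1f}% slower")
```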


    • #3
      We have made an application that applies the workaround for AMD Llano errata #665. The patch is only temporary, lasting until the next reboot, but if you are seeing very low Integer and Prime number CPU results in PerformanceTest, or having CPU hangs, then you can run this application to see if the problem is caused by CPU errata 665. Please send us feedback.

      From our experience, it would appear that few BIOSes currently implement this workaround.

      The 32-bit and 64-bit patches are here:
      http://www.passmark.com/ftp/LlanoCPUerrata665.zip (see below for updated version)


      • #4
        OK, we have worked it out. The low PerformanceTest Integer and Prime number results are caused by the workaround implemented in the BIOS for Llano CPU errata 665.

        V1.1 of our patch tool only applied/removed the patch on one CPU core, but it needs to be applied to (or removed from) all CPU cores. We have updated our utility to do this. Please try the updated patch utility, v1.2, to see the impact on performance:
        http://www.passmark.com/ftp/LlanoCPUerrata665_v1.2.zip
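For readers curious what "apply to all cores" involves, here is a rough sketch under stated assumptions. It is not the PassMark utility (which is a Windows tool); it assumes a Linux system with the `msr` kernel driver loaded, which exposes one `/dev/cpu/N/msr` device per logical CPU, and it needs root. The function names are hypothetical, and we have not tested this on Llano hardware. Writing MSRs can hang or destabilize a machine, so treat it as illustration only.

```python
import glob
import os
import struct

DE_CFG_MSR = 0xC0011029        # MSRC001_1029 from the erratum
WORKAROUND_BIT = 1 << 31       # bit 31


def msr_device_paths():
    """List the per-CPU MSR devices exposed by the Linux msr driver."""
    return sorted(glob.glob("/dev/cpu/[0-9]*/msr"))


def read_msr(path, msr=DE_CFG_MSR):
    """Read one 64-bit MSR; the file offset selects the register."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def write_msr(path, value, msr=DE_CFG_MSR):
    """Write one 64-bit MSR (requires root)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), msr)
    finally:
        os.close(fd)


def set_workaround_on_all_cores(enabled, paths=None, read=read_msr, write=write_msr):
    """Set or clear bit 31 on every core; returns the number of cores visited."""
    paths = msr_device_paths() if paths is None else paths
    for path in paths:
        old = read(path)
        new = (old | WORKAROUND_BIT) if enabled else (old & ~WORKAROUND_BIT)
        if new != old:
            write(path, new)   # read-modify-write keeps the other bits intact
    return len(paths)
```

The `read` and `write` parameters exist only so the loop can be exercised with stand-in functions instead of real hardware. The key point is the loop itself: each core has its own copy of the register, so changing it on one core leaves the others in their previous state.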

        We tested this on an A6-3650 system in our lab and found that with the patch not applied the CPU results were:
        Integer test: 3074
        Prime number: 2054

        When the workaround patch for Llano errata 665 is applied, the CPU results dropped to:
        Integer test: 528
        Prime number: 751

        The CPU tests that do not utilize integer divide are unaffected.

        This is consistent with the baselines submitted to the PassMark benchmark site for Llano CPUs: the high CPU results are from systems without the BIOS patch for errata 665, while the lower results are from systems with the BIOS patch.
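Given the two lab results above, one rough way to guess whether an A6-3650 baseline came from a patched BIOS is to compare its Integer score against a threshold between the two reference values. This is only a sketch: the threshold choice (the geometric midpoint) and the function name are our assumptions, and the reference numbers apply to the A6-3650 only.

```python
# Lab results for the A6-3650 from this post.
UNPATCHED_INTEGER = 3074   # workaround not applied
PATCHED_INTEGER = 528      # workaround applied


def likely_patched(integer_score):
    """Guess whether a score comes from a system with the workaround active.

    Uses the geometric midpoint of the two lab results as the cut-off,
    which is an arbitrary but reasonable choice for values this far apart.
    """
    threshold = (UNPATCHED_INTEGER * PATCHED_INTEGER) ** 0.5
    return integer_score < threshold


if __name__ == "__main__":
    for score in (600, 2900):   # hypothetical baseline scores
        state = "patched" if likely_patched(score) else "unpatched"
        print(f"Integer score {score}: probably {state}")
```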


        • #5
          So, with the tool above, users of these CPUs now have the choice between unpredictable system hangs and very poor integer division performance. (The bug might also impact modulo operations, but we haven't tested this.)

          For most people, poor integer division will be the better choice. While it is hard to come by real figures, the research others have done suggests that division only makes up 0.2% to 0.6% of instructions executed in some real-life applications.

          Our integer maths benchmark test in PerformanceTest V7 is 25% division, so the impact is much more obvious. If, however, you are running high-end scientific calculations on these CPUs, then you might want to rewrite your code to avoid doing divisions.
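To put rough numbers on this, here is a back-of-the-envelope estimate. It rests on a crude assumption that is ours, not AMD's or a measured fact: with the workaround off every operation costs the same, so the fraction of division operations equals the fraction of run time spent dividing. Under that assumption, the A6-3650 Integer results from the earlier post (3074 unpatched, 528 patched) and the 25% division mix give a per-divide cost multiplier `k`, which can then be applied to workloads with 0.2% to 0.6% division.

```python
def divide_cost_factor(score_before, score_after, divide_fraction):
    """Solve (1 - f) + f * k = slowdown for k, the per-divide cost multiplier."""
    slowdown = score_before / score_after
    return (slowdown - (1.0 - divide_fraction)) / divide_fraction


def estimated_slowdown(divide_fraction, k):
    """Relative run time once each divide costs k times as much as before."""
    return (1.0 - divide_fraction) + divide_fraction * k


# Integer test: 25% division, 3074 before and 528 after the workaround.
k = divide_cost_factor(3074, 528, 0.25)

if __name__ == "__main__":
    print(f"Estimated divide cost multiplier: {k:.1f}x")
    for f in (0.002, 0.006):
        print(f"{f:.1%} division: about {estimated_slowdown(f, k):.2f}x run time")
```

Real instruction costs vary widely, so the output is an order-of-magnitude guide at best. It does show why a workload with well under 1% division should see a far smaller slowdown than the benchmark does.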


          • #6
            PassMark guys,

            I ran into this late last year, or early this year when I acquired a 3870K CPU for building a new HTPC. Looking at the specs I could tell it would be a very low power system at idle (due to that 800MHz base clock speed) and on-chip video.

            When I first built the system the PassMark scores were super high, higher than any other benchmarking utility was showing. I didn't care too much, as it definitely had plenty of punch for what I wanted to do with it. I probably did 8-10 hours of constant testing on it and never had any BSODs or other strange behavior. I made a post on my blog noting the exceptional CPU results at that point.

            http://operationcsquared.com/tech/?p=29


            Then, I did a BIOS update, just to keep everything clean before pulling it out of my lab environment and moving it into active duty. At this point, all hell broke loose. The integer performance and prime number performance both went down to 25% of their original scores. I figured out which BIOS update had made the change, and backed out until my scores went back up. Again, never any stability issues.

            The question I have is, does the awesome Integer performance shown when the Errata 665 is not patched actually prove out, or is it artificially inflated due to bailing out from the process before the real work is actually done?

            Thanks,
            Chris


            • #7
              The work is fully executed in both cases. With the patch applied it just runs really slowly. So any real-life task that uses a lot of integer division will suffer a fairly dramatic loss of speed once the patch is applied.

              If you don't apply the patch, then any application that uses division runs the risk of random behavior. This random behavior doesn't need to be a BSOD, but could manifest itself in other ways, e.g. a bad calculation, an application crash, screen artifacts, etc.


              • #8
                RAM issue

                Hi David,

                Sorry if this has come a little late. I have been using the Llano (Fusion) core CPUs in many builds. We have had over 50 units with this issue (before the patch), and after a little testing found the problem was easily corrected via the RAM. If you use only one 4GB module (single channel) of DDR3 PC3-10666/PC3-12800, then the CPU runs OK, and fast for a Llano. However, with more than 4GB, or any amount in dual channel, you hit this wall of problems. We have tested this on all the Athlon II 6*1/K and A6/A8 desktop versions with A55 and A75 boards.


                • #9
                  AMD just says,
                  "Under a highly specific and detailed set of internal timing conditions",
                  which isn't very helpful. But it is possible dual channel RAM changes the timing of the instruction execution and provokes the problem.


                  • #10
                    How did you test for this?

                    Originally posted by PC Extra
                    ...I have been using the Llano (Fusion) core CPUs in many builds. We have had over 50 units with this issue (before the patch), and after a little testing found the problem was easily corrected via the RAM. If you use only one 4GB module (single channel) of DDR3 PC3-10666/PC3-12800, then the CPU runs OK, and fast for a Llano. However, with more than 4GB, or any amount in dual channel, you hit this wall of problems...
                    How did you test for this problem? Do you have a program which can be run on one of these systems, to determine whether or not it is exhibiting the problem?

                    Thanks!

                    Dave
                    http://www.geeksalive.com/email/
