CPU-Maths - no operations reported in timeout period


  • #16
    We seem to have found the cause of the problem. One of the Windows API functions we use for timing the length of certain tests (QueryPerformanceCounter, http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx) is returning unreliable values on your hardware. As a result, some tests become stuck executing without updating any of their results, and after a certain time this triggers the watchdog timer, which flags the error.

    Once we had discovered what the problem seemed to be, we wrote a simple program to log the current value returned by the QueryPerformanceCounter call and highlight the inconsistency. We ran this on the Geode system and on another XP system using an Intel E8400.
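For readers who want to try something similar, here is a minimal sketch of such a logger. It is not the program used above. The names `sample_counter` and `format_row` are made up for illustration, and the one-second sampling interval is an assumption. It reads the counter through `time.perf_counter_ns`, which, to my knowledge, CPython implements on top of QueryPerformanceCounter on Windows; it reports nanoseconds, so the nominal frequency passed in is 1,000,000,000 rather than the raw hardware frequency.

```python
import time
from typing import Callable, List, Tuple

# One logged row: sample number, counter frequency, counter value, difference
Row = Tuple[int, int, int, int]


def sample_counter(
    read_counter: Callable[[], int],
    frequency: int,
    samples: int,
    interval_s: float = 1.0,  # assumed sampling interval
    sleep: Callable[[float], None] = time.sleep,
) -> List[Row]:
    """Read the counter once per interval and record the change since the last read."""
    rows: List[Row] = []
    previous = read_counter()  # baseline reading taken before sample 1
    for number in range(1, samples + 1):
        sleep(interval_s)
        current = read_counter()
        rows.append((number, frequency, current, current - previous))
        previous = current
    return rows


def format_row(row: Row) -> str:
    """Lay a row out in four columns with thousands separators."""
    return "{:<6d} {:>15,d} {:>22,d} {:>15,d}".format(*row)


# Short demonstration run: three samples, 0.1 s apart, nanosecond units.
for row in sample_counter(time.perf_counter_ns, 1_000_000_000, samples=3, interval_s=0.1):
    print(format_row(row))
```

On a healthy system the last column should stay close to frequency multiplied by the interval, and it should never be negative.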

    The values from left to right represent: Sample Number, Counter frequency (number of counts per second, should never change), Counter current value, Difference from last sample (should be very similar to counter frequency).


    Geode results:
    Code:
       
    1      3,579,545     970,856,713     3,572,548
    2      3,579,545     957,664,136     -13,192,577
    3      3,579,545     961,248,841     3,584,705
    4      3,579,545     964,833,608     3,584,767
    5      3,579,545     968,418,269     3,584,661
    6      3,579,545     972,002,932     3,584,663
    7      3,579,545     992,364,858     20,361,926
    8      3,579,545     995,949,537     3,584,679
    9      3,579,545     999,534,244     3,584,707
    10     3,579,545     1,003,119,028    3,584,784
    11     3,579,545     989,926,436     -13,192,592
    12     3,579,545     993,511,135     3,584,699
    13     3,579,545     997,095,848     3,584,713
    14     3,579,545     1,000,680,522   3,584,674
    15     3,579,545     1,004,265,263   3,584,741
    16     3,579,545     1,024,627,167   20,361,904
    17     3,579,545     1,028,211,841   3,584,674
    18     3,579,545     1,031,796,543   3,584,702
    19     3,579,545     1,035,381,265   3,584,722
    20     3,579,545     1,038,965,967   3,584,702
    21     3,579,545     1,025,773,431   -13,192,536
    E8400 results:
    Code:
    1     3,000,060,000     61,949,764,826,091     2,990,583,234
    2     3,000,060,000     61,952,764,860,855     3,000,034,764
    3     3,000,060,000     61,955,764,868,889     3,000,008,034
    4     3,000,060,000     61,958,764,832,661     2,999,963,772
    5     3,000,060,000     61,961,764,860,666     3,000,028,005
    6     3,000,060,000     61,964,764,858,818     2,999,998,152
    7     3,000,060,000     61,967,764,858,032     2,999,999,214
    8     3,000,060,000     61,970,764,865,544     3,000,007,512
    9     3,000,060,000     61,973,764,894,080     3,000,028,536
    10    3,000,060,000     61,976,764,904,535     3,000,010,455
    11    3,000,060,000     61,979,764,908,762     3,000,004,227
    12    3,000,060,000     61,982,764,942,860     3,000,034,098
    13    3,000,060,000     61,985,764,907,838     2,999,964,978
    14    3,000,060,000     61,988,764,912,344     3,000,004,506
    15    3,000,060,000     61,991,764,921,764     3,000,009,420
    16    3,000,060,000     61,994,764,930,248     3,000,008,484
    17    3,000,060,000     61,997,764,938,408     3,000,008,160
    18    3,000,060,000     62,000,764,948,755     3,000,010,347
    19    3,000,060,000     62,003,765,004,309     3,000,055,554
    20    3,000,060,000     62,006,764,969,161     2,999,964,852
    21    3,000,060,000     62,009,764,990,443     3,000,021,282
    As you can see in the Geode results, the current counter value and the difference from the previous sample jump around significantly, even producing negative differences. A negative difference should only ever occur if the counter has reached its maximum value and started again (something that should only occur after days of system uptime). In comparison, the E8400 results are very consistent: the current counter value always increases, and the last sample minus the first sample, divided by the frequency, works out as 20 minutes elapsed during the test. This doesn’t hold true for the Geode, and the count value even seems to wrap around close to 0 about 10 mins into the test.
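The same two checks (does the counter ever go backwards, and does last minus first divided by frequency match the real elapsed time) can be run on just the 21 samples listed above. This is a sketch only: the helper names are made up, and it assumes the samples were taken one second apart, so 21 samples should span about 20 seconds.

```python
from typing import List

GEODE_FREQ = 3_579_545
GEODE_VALUES = [
    970_856_713, 957_664_136, 961_248_841, 964_833_608, 968_418_269,
    972_002_932, 992_364_858, 995_949_537, 999_534_244, 1_003_119_028,
    989_926_436, 993_511_135, 997_095_848, 1_000_680_522, 1_004_265_263,
    1_024_627_167, 1_028_211_841, 1_031_796_543, 1_035_381_265,
    1_038_965_967, 1_025_773_431,
]

E8400_FREQ = 3_000_060_000
E8400_VALUES = [
    61_949_764_826_091, 61_952_764_860_855, 61_955_764_868_889,
    61_958_764_832_661, 61_961_764_860_666, 61_964_764_858_818,
    61_967_764_858_032, 61_970_764_865_544, 61_973_764_894_080,
    61_976_764_904_535, 61_979_764_908_762, 61_982_764_942_860,
    61_985_764_907_838, 61_988_764_912_344, 61_991_764_921_764,
    61_994_764_930_248, 61_997_764_938_408, 62_000_764_948_755,
    62_003_765_004_309, 62_006_764_969_161, 62_009_764_990_443,
]


def backward_steps(values: List[int]) -> List[int]:
    """Return the 1-based sample numbers whose value is lower than the previous one."""
    return [i + 1 for i in range(1, len(values)) if values[i] < values[i - 1]]


def elapsed_seconds(values: List[int], frequency: int) -> float:
    """Elapsed time implied by the counter: (last - first) / frequency."""
    return (values[-1] - values[0]) / frequency


for name, values, freq in (
    ("Geode", GEODE_VALUES, GEODE_FREQ),
    ("E8400", E8400_VALUES, E8400_FREQ),
):
    print(name, "backward steps at samples:", backward_steps(values))
    print(name, "implied elapsed seconds: %.2f" % elapsed_seconds(values, freq))
```

For this excerpt the E8400 never steps backwards and its counter implies almost exactly 20 seconds. The Geode steps backwards at samples 2, 11 and 21, and its counter implies only about 15.3 seconds.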

    While a workaround in BurnInTest would be possible, it seems clear that this is a hardware problem caused by a BIOS or chipset bug that is corrupting the Windows high resolution timers. It’s also likely that any software that uses these timers will be affected by odd behaviour.
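To give an idea of what a software workaround could look like, here is one hypothetical approach. It is not anything BurnInTest actually does, and the class name `BackstepFilter` is invented for this sketch. It wraps the raw counter and throws away any negative step, so the adjusted value never decreases. It does nothing about the oversized forward jumps (the 20,361,926 differences in the Geode log), so time measured this way would still be inaccurate on the affected hardware.

```python
from typing import Callable


class BackstepFilter:
    """Wrap a raw counter so the value handed to callers never goes backwards.

    Hypothetical illustration only: negative steps are discarded, every other
    step is passed through unchanged.
    """

    def __init__(self, read_counter: Callable[[], int]) -> None:
        self._read = read_counter
        self._raw_last = read_counter()   # last raw reading seen
        self._adjusted = self._raw_last   # value reported to callers
        self.backward_steps = 0           # how many negative steps were dropped

    def read(self) -> int:
        raw = self._read()
        delta = raw - self._raw_last
        self._raw_last = raw
        if delta < 0:
            # The raw counter went backwards: count it and ignore the step.
            self.backward_steps += 1
            delta = 0
        self._adjusted += delta
        return self._adjusted


# Example with a made-up sequence of raw readings that steps backwards once.
raw_readings = iter([100, 110, 90, 95, 120])
counter = BackstepFilter(lambda: next(raw_readings))
print([counter.read() for _ in range(4)], counter.backward_steps)
```

Counting the dropped steps also gives a cheap way to flag affected hardware in a log rather than silently hiding the fault.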



    • #17
      So in fact the problem was pretty much exactly what we suggested it might be a few weeks ago: a problem with the system timers.

      You have a timer that should increase steadily over time, but instead it counts backwards from time to time.
