We seem to have found the cause of the problem. One of the Windows API functions we use for timing the length of certain tests, QueryPerformanceCounter (http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx), is returning unreliable values on your hardware. As a result, some tests become stuck executing without updating any of their results, and after a certain time the watchdog timer triggers to flag the error.
Once we had discovered what the problem seemed to be, we wrote a simple program that logs the current value returned by the QueryPerformanceCounter call to highlight the inconsistency. We ran this on the Geode system and on another XP system using an Intel E8400.
The values from left to right represent: sample number, counter frequency (number of counts per second, should never change), current counter value, and difference from the last sample (should be very similar to the counter frequency).
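The original logging program is not shown, so the following is only a rough Python sketch of how such a logger could produce the four columns described above. The names `log_counter_samples` and `format_row` are hypothetical, and the counter is passed in as a function so that any source can be plugged in. On Windows, CPython's `time.perf_counter_ns` is built on QueryPerformanceCounter, but it reports nanoseconds, so its nominal frequency is 1,000,000,000 counts per second rather than the raw hardware frequency.

```python
import time


def log_counter_samples(read_counter, frequency, samples, interval_s=1.0, sleep=time.sleep):
    """Collect rows of (sample number, frequency, counter value, difference).

    read_counter is any zero-argument function returning an integer count.
    This is an illustrative stand-in, not the program used in the post.
    """
    rows = []
    previous = read_counter()  # baseline reading taken before the first sample
    for number in range(1, samples + 1):
        sleep(interval_s)
        current = read_counter()
        # With a one-second interval, a healthy counter should advance by
        # roughly `frequency` counts between consecutive samples.
        rows.append((number, frequency, current, current - previous))
        previous = current
    return rows


def format_row(row):
    """Format a row with thousands separators, like the listings in this post."""
    return " ".join(f"{value:,}" for value in row)
```

To try it, call `log_counter_samples(time.perf_counter_ns, 1_000_000_000, samples=5)` and print each row with `format_row`. On a healthy system, every difference should come out close to the frequency.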
Geode results:
Code:
Sample  Frequency  Counter value   Difference
     1  3,579,545    970,856,713    3,572,548
     2  3,579,545    957,664,136  -13,192,577
     3  3,579,545    961,248,841    3,584,705
     4  3,579,545    964,833,608    3,584,767
     5  3,579,545    968,418,269    3,584,661
     6  3,579,545    972,002,932    3,584,663
     7  3,579,545    992,364,858   20,361,926
     8  3,579,545    995,949,537    3,584,679
     9  3,579,545    999,534,244    3,584,707
    10  3,579,545  1,003,119,028    3,584,784
    11  3,579,545    989,926,436  -13,192,592
    12  3,579,545    993,511,135    3,584,699
    13  3,579,545    997,095,848    3,584,713
    14  3,579,545  1,000,680,522    3,584,674
    15  3,579,545  1,004,265,263    3,584,741
    16  3,579,545  1,024,627,167   20,361,904
    17  3,579,545  1,028,211,841    3,584,674
    18  3,579,545  1,031,796,543    3,584,702
    19  3,579,545  1,035,381,265    3,584,722
    20  3,579,545  1,038,965,967    3,584,702
    21  3,579,545  1,025,773,431  -13,192,536
E8400 results:
Code:
Sample      Frequency       Counter value     Difference
     1  3,000,060,000  61,949,764,826,091  2,990,583,234
     2  3,000,060,000  61,952,764,860,855  3,000,034,764
     3  3,000,060,000  61,955,764,868,889  3,000,008,034
     4  3,000,060,000  61,958,764,832,661  2,999,963,772
     5  3,000,060,000  61,961,764,860,666  3,000,028,005
     6  3,000,060,000  61,964,764,858,818  2,999,998,152
     7  3,000,060,000  61,967,764,858,032  2,999,999,214
     8  3,000,060,000  61,970,764,865,544  3,000,007,512
     9  3,000,060,000  61,973,764,894,080  3,000,028,536
    10  3,000,060,000  61,976,764,904,535  3,000,010,455
    11  3,000,060,000  61,979,764,908,762  3,000,004,227
    12  3,000,060,000  61,982,764,942,860  3,000,034,098
    13  3,000,060,000  61,985,764,907,838  2,999,964,978
    14  3,000,060,000  61,988,764,912,344  3,000,004,506
    15  3,000,060,000  61,991,764,921,764  3,000,009,420
    16  3,000,060,000  61,994,764,930,248  3,000,008,484
    17  3,000,060,000  61,997,764,938,408  3,000,008,160
    18  3,000,060,000  62,000,764,948,755  3,000,010,347
    19  3,000,060,000  62,003,765,004,309  3,000,055,554
    20  3,000,060,000  62,006,764,969,161  2,999,964,852
    21  3,000,060,000  62,009,764,990,443  3,000,021,282
As you can see in the Geode results, the current counter value and the difference from the previous sample jump around significantly, even producing negative differences. A negative difference should only ever occur if the counter has reached its maximum value and started again (something that should only occur after days of system uptime). In comparison, the E8400 results are very consistent: the current counter value always increases, and the last sample minus the first sample, divided by the frequency, works out to the 20 minutes that elapsed during the test. This doesn’t hold true for the Geode, where the count value even seems to wrap around close to 0 about 10 minutes into the test.
While a workaround in BurnInTest would be possible, it seems clear that this is a hardware problem caused by a BIOS or chipset bug that is corrupting the Windows high-resolution timers. It’s also likely that any software that uses these timers will be affected by odd behaviour.
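The consistency checks described in this post (no negative differences, differences close to the counter frequency, and elapsed time equal to the last sample minus the first divided by the frequency) can be sketched in Python against the 21 rows quoted for each system. The function names and the 5% tolerance are assumptions chosen for illustration; they are not taken from BurnInTest or from the original test program.

```python
# Counter values copied from the 21 rows quoted in this post.
GEODE_FREQ = 3_579_545
GEODE_VALUES = [
    970_856_713, 957_664_136, 961_248_841, 964_833_608, 968_418_269,
    972_002_932, 992_364_858, 995_949_537, 999_534_244, 1_003_119_028,
    989_926_436, 993_511_135, 997_095_848, 1_000_680_522, 1_004_265_263,
    1_024_627_167, 1_028_211_841, 1_031_796_543, 1_035_381_265,
    1_038_965_967, 1_025_773_431,
]
E8400_FREQ = 3_000_060_000
E8400_VALUES = [
    61_949_764_826_091, 61_952_764_860_855, 61_955_764_868_889,
    61_958_764_832_661, 61_961_764_860_666, 61_964_764_858_818,
    61_967_764_858_032, 61_970_764_865_544, 61_973_764_894_080,
    61_976_764_904_535, 61_979_764_908_762, 61_982_764_942_860,
    61_985_764_907_838, 61_988_764_912_344, 61_991_764_921_764,
    61_994_764_930_248, 61_997_764_938_408, 62_000_764_948_755,
    62_003_765_004_309, 62_006_764_969_161, 62_009_764_990_443,
]


def find_anomalies(values, frequency, tolerance=0.05):
    """Return (sample number, difference) for every suspicious step.

    A step is flagged when the counter went backwards or when the difference
    is more than `tolerance` (an assumed 5%) away from the frequency, which
    is the expected difference for samples taken one second apart.
    """
    anomalies = []
    for index in range(1, len(values)):
        diff = values[index] - values[index - 1]
        if diff < 0 or abs(diff - frequency) > tolerance * frequency:
            anomalies.append((index + 1, diff))  # sample numbers are 1-based
    return anomalies


def elapsed_seconds(values, frequency):
    """Elapsed time implied by the counter: (last - first) / frequency."""
    return (values[-1] - values[0]) / frequency
```

On the quoted rows, `find_anomalies(GEODE_VALUES, GEODE_FREQ)` flags samples 2, 7, 11, 16 and 21, while the E8400 list produces no flags. `elapsed_seconds(E8400_VALUES, E8400_FREQ)` comes out at roughly 20 seconds, which matches 20 one-second intervals. The Geode rows imply only about 15 seconds over the same number of samples. Note that these rows cover only a short stretch of counter time, not the full 20 minute run mentioned above.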