So in fact the problem was pretty much exactly what we suggested it might be a few weeks ago: a problem with the system timers.
You have a timer that should increase steadily over time, but instead it counts backwards from time to time.
CPU-Maths - no operations reported in timeout period
-
We seem to have found the cause of the problem. One of the Windows API functions we use for timing the length of certain tests (QueryPerformanceCounter, http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx) is returning unreliable values on your hardware. As a result, some tests become stuck executing without updating any of their results, and the watchdog timer triggers after a certain time to flag the error.
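As a rough illustration of the watchdog behaviour described here, the sketch below assumes the watchdog simply compares the current operation count with the count seen at the previous check, which is what the "Operation watchdog for 1 is ... (previous ...)" lines in the customer's log further down the thread suggest. The class and method names are hypothetical and this is not BurnInTest's actual code.

```python
class OperationWatchdog:
    """Hypothetical watchdog: flags a test whose operation count stops advancing."""

    def __init__(self):
        self.previous = 0  # operation count seen at the previous check

    def check(self, operations):
        """Return True when no new operations were reported since the last check."""
        stalled = operations == self.previous
        self.previous = operations
        return stalled


watchdog = OperationWatchdog()
# Operation count taken from the "Operation watchdog" lines in the posted log.
first = watchdog.check(8419133448)   # previous was 0, so progress was made
second = watchdog.check(8419133448)  # count unchanged, so the test looks stuck
print(first, second)
```

The second check is the one that would correspond to the "No operations reported in timeout period" warning: the test may still be burning CPU, but its reported count has not moved.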
Once we had discovered what the problem seemed to be, we wrote a simple program to test and log the current value returned from the QueryPerformanceCounter call to highlight the inconsistency. We ran this on the Geode system and on another XP system using an Intel E8400.
The values from left to right represent: Sample Number, Counter frequency (number of counts per second, should never change), Counter current value, Difference from last sample (should be very similar to counter frequency).
Geode results:
Code:
Sample  Frequency   Counter value   Difference
 1      3,579,545     970,856,713     3,572,548
 2      3,579,545     957,664,136   -13,192,577
 3      3,579,545     961,248,841     3,584,705
 4      3,579,545     964,833,608     3,584,767
 5      3,579,545     968,418,269     3,584,661
 6      3,579,545     972,002,932     3,584,663
 7      3,579,545     992,364,858    20,361,926
 8      3,579,545     995,949,537     3,584,679
 9      3,579,545     999,534,244     3,584,707
10      3,579,545   1,003,119,028     3,584,784
11      3,579,545     989,926,436   -13,192,592
12      3,579,545     993,511,135     3,584,699
13      3,579,545     997,095,848     3,584,713
14      3,579,545   1,000,680,522     3,584,674
15      3,579,545   1,004,265,263     3,584,741
16      3,579,545   1,024,627,167    20,361,904
17      3,579,545   1,028,211,841     3,584,674
18      3,579,545   1,031,796,543     3,584,702
19      3,579,545   1,035,381,265     3,584,722
20      3,579,545   1,038,965,967     3,584,702
21      3,579,545   1,025,773,431   -13,192,536
Intel E8400 results:
Code:
Sample  Frequency       Counter value        Difference
 1      3,000,060,000   61,949,764,826,091   2,990,583,234
 2      3,000,060,000   61,952,764,860,855   3,000,034,764
 3      3,000,060,000   61,955,764,868,889   3,000,008,034
 4      3,000,060,000   61,958,764,832,661   2,999,963,772
 5      3,000,060,000   61,961,764,860,666   3,000,028,005
 6      3,000,060,000   61,964,764,858,818   2,999,998,152
 7      3,000,060,000   61,967,764,858,032   2,999,999,214
 8      3,000,060,000   61,970,764,865,544   3,000,007,512
 9      3,000,060,000   61,973,764,894,080   3,000,028,536
10      3,000,060,000   61,976,764,904,535   3,000,010,455
11      3,000,060,000   61,979,764,908,762   3,000,004,227
12      3,000,060,000   61,982,764,942,860   3,000,034,098
13      3,000,060,000   61,985,764,907,838   2,999,964,978
14      3,000,060,000   61,988,764,912,344   3,000,004,506
15      3,000,060,000   61,991,764,921,764   3,000,009,420
16      3,000,060,000   61,994,764,930,248   3,000,008,484
17      3,000,060,000   61,997,764,938,408   3,000,008,160
18      3,000,060,000   62,000,764,948,755   3,000,010,347
19      3,000,060,000   62,003,765,004,309   3,000,055,554
20      3,000,060,000   62,006,764,969,161   2,999,964,852
21      3,000,060,000   62,009,764,990,443   3,000,021,282
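For anyone wanting to repeat this kind of check, here is a minimal Python sketch of the idea: read a counter at a fixed interval, log the difference from the previous reading, and flag any difference that strays from the expected step. It is not the program we used. The function names and the 5% tolerance are assumptions made for illustration, and the clock is passed in as a parameter so the example can replay the first seven Geode counter values from the results above instead of reading real hardware.

```python
import time


def log_samples(count, interval_s=1.0, clock=time.perf_counter_ns, sleep=time.sleep):
    """Read `clock` once per interval; return (sample, value, difference) rows."""
    rows = []
    previous = clock()  # baseline reading, not logged as a row
    for sample in range(1, count + 1):
        sleep(interval_s)
        value = clock()
        rows.append((sample, value, value - previous))
        previous = value
    return rows


def find_anomalies(rows, expected_step, tolerance=0.05):
    """Return rows whose difference is more than `tolerance` away from expected_step."""
    limit = expected_step * tolerance
    return [row for row in rows if abs(row[2] - expected_step) > limit]


def replay(values):
    """Build a fake clock that returns the given values one call at a time."""
    iterator = iter(values)
    return lambda: next(iterator)


# Counter frequency and the first seven counter values from the Geode results.
GEODE_FREQUENCY = 3_579_545
GEODE_VALUES = [970_856_713, 957_664_136, 961_248_841, 964_833_608,
                968_418_269, 972_002_932, 992_364_858]

# The first value is used as the baseline, so row k here is sample k + 1 above.
rows = log_samples(len(GEODE_VALUES) - 1,
                   clock=replay(GEODE_VALUES), sleep=lambda seconds: None)
ANOMALIES = find_anomalies(rows, GEODE_FREQUENCY)
for sample, value, difference in ANOMALIES:
    print(f"row {sample}: value {value:,} difference {difference:,}")
```

With the replayed data this flags the -13,192,577 and 20,361,926 differences and nothing else. One observation about the numbers themselves, not something established elsewhere in this thread: both flagged differences are a normal step of about 3,584,700 counts shifted by almost exactly 2^24 = 16,777,216 (20,361,926 - 16,777,216 = 3,584,710 and -13,192,577 + 16,777,216 = 3,584,639).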
While a workaround in BurnInTest would be possible, it seems clear that this is a hardware problem caused by a BIOS or chipset bug that is corrupting the Windows high resolution timers. It is also likely that any software that uses these timers is going to be affected by odd behaviour.
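To give an idea of what a software workaround could look like, the sketch below wraps a counter so that callers never see it run backwards. This is purely an illustration under stated assumptions: the class name is hypothetical, it is not how BurnInTest handles timing, and it only hides backward steps. Forward jumps would pass straight through, and the reported time stands still until the real counter catches up.

```python
class MonotonicGuard:
    """Hypothetical wrapper that never lets a counter appear to go backwards."""

    def __init__(self, clock):
        self._clock = clock
        self._last = None
        self.rejected = 0  # number of backward readings that were suppressed

    def read(self):
        value = self._clock()
        if self._last is not None and value < self._last:
            self.rejected += 1
            return self._last  # hold the last good value instead
        self._last = value
        return value


# Replay the first six Geode counter values from the results above.
values = iter([970_856_713, 957_664_136, 961_248_841,
               964_833_608, 968_418_269, 972_002_932])
guard = MonotonicGuard(lambda: next(values))
readings = [guard.read() for _ in range(6)]
print(readings, guard.rejected)
```

On this data the guard holds the first reading for four samples before the counter passes it again, which shows why masking the symptom in software is a poor substitute for fixing the BIOS or chipset issue.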
-
The links to the images are broken (they might be on your intranet, which we can't access). But sounds OK.
-
Thanks in advance.
Thanks in advance for your help in this matter. I have verified our issue on multiple systems. To make troubleshooting a little easier, I have put together this test sled for your use.
The test sled has the SBC in question with I/O cabled out for ease of use. It is loaded with a registered and updated version of Windows XP with all drivers installed. The unit should boot upon arrival and has BurnInTest loaded. Failure will occur within the first hour to hour and a half of running. This has been verified using this same sled at our location.
Please see below pictures.
-
The post you linked to above does mention an issue on the Geode back in 2006. But this issue was corrected in Release 5.1 build 1012, 15/August/2006. The symptoms are also not identical. Their issue was a single error being flagged right at the end of a 12 hour run.
There was also this issue from early 2007,
http://www.passmark.com/forum/showthread.php?t=800
In that thread we suggested it might be a hardware / device driver fault, maybe with the system timer. This turned out to be the case. The customer later stated, "My colleagues who have been investigating have found a problem with time handling by the platform. We have a fix and will be rerunning tests this week."
If you want to ship us something you can do so at this address,
http://www.passmark.com/about/contact_us.htm
Address it to Ian Robinson. And include a printout of this page & your contact details.
Please don't ship us something that doesn't just work. We have had cases where we have been sent motherboards without a compatible CPU, RAM, device drivers, etc. We don't want to spend days trying to get it to even boot.
Second, do you know if we can get a copy of the latest version to test on the unit that exhibits this failure?
A DEBUG build is one where we add in extra logging to investigate a particular problem. We don't have one for this issue. We could do one, but my fear is that it might not show anything more than what was in the event log you already posted 9 months ago, and so we might spend several weeks doing stuff by trial and error. But we could give this a go if you want.
-
A couple of questions. First, I was perusing the forums for a failure mode similar to the one we are seeing and came up with this: http://www.passmark.com/forum/showthread.php?t=474
Is there any way I can see if the DEBUG version solves my issues as well?
Second, do you know if we can get a copy of the latest version to test on the unit that exhibits this failure? It does not seem like this issue is going to be dropped.
And third, you had offered for us to send you a board for your help to debug this issue. I am now in a position that requires this. Is there a debug tracking number that I can use with my shipping paperwork?
-
I have confirmation that no hardware was shipped to Passmark on this Topic, though I am not sure why. Since this topic was created have you had anyone else with this type of issue on a GEODE based processor?
I am in need of a paragraph explaining what is happening during this failure and a statement that it is or is not an issue.
-
I don't think anything was shipped to us in the end. But I will check. Contact details are here.
-
About Derek@CDI
Hello, My name is Will Anniss and I am the Technology Leader where Derek worked. He was working with you on an issue that we were seeing with an AMD board and the Passmark test. Materials were shipped to you to conduct said tests on the SBC.
Who should I contact to discuss this further?
-
I realize that Geode is not mainstream in the consumer market, but in the embedded world it is quite popular. As you said, though, the problem may be specific to this particular board and not the CPU/chipset in general. We've built close to 40 units so far based on this hardware in a couple of different flavors, and all of them showed that error during testing.
We appreciate your assistance, as resolving this problem is important to us. We are willing to ship the hardware to you if you think it will help. Is there a procedure we need to follow? Is your shipping address the same as listed on the website? Thanks.
-
We have never done any testing on the Geode CPU and the associated hardware. In principle it should be OK, but clearly something is going wrong. From what you have described, the display thread is crashing, so the display stops updating and the watchdog timer triggers some time later. From here we can't tell you the exact cause of this crash. I assume you don't get any pop-up windows with crash details?
This is not a known issue and no other customer has reported behaviour like this. Do you have a 2nd example of this device on which you can run the test and compare the behaviour? Given that no other customer has reported the issue, it might well be hardware related, e.g. a bad CPU, bad RAM or a bad bus.
If these were cheap devices you could ship us one, and then we could do a much deeper investigation of the problem (and ship it back at the end if required).
-
Time has passed but the problem is still here. Going along with your suggestion, we purchased v6 of BIT Pro. Unfortunately the symptoms remained the same. I ran several tests, selecting each group of instructions separately, and they all produced the same results. 15-20 minutes into the test the CPU test window stops updating. CPU usage, however, remains unchanged at around 87%. Changing the CPU load doesn't seem to affect the results either.
I'm including the trace from one of the tests below. Please try to be more specific in your explanation. Thanks.
PassMark BurnInTest Log file - http://www.passmark.com
========================================================
Date: 06/04/09 14:44:14
BurnInTest V6.0 Pro 1014
Trace detail level: Activity trace 2
**************
SYSTEM SUMMARY
**************
Windows XP Professional Service Pack 2 build 2600 (32-bit),
1 x Geode(TM) Integrated Processor by AMD PCS [499.9 MHz],
503MB RAM,
AMD Custom Driver For 640x640 Panel,
4GB HDD,
GENERAL
System Name: OEM-AF71FC943EB
System Model: AWRDACPI
Motherboard Name: AMD-GX3
BIOS Manufacturer: Phoenix Technologies, LTD
BIOS Version: 6.00 PG
BIOS Release Date: 08/28/2009
CPU
CPU manufacturer: AuthenticAMD
CPU Type: Geode(TM) Integrated Processor by AMD PCS
CPUID: Family 5, Model A, Stepping 2
Physical CPU's: 1
Cores per CPU: 1
Hyperthreading: Not capable
CPU features: MMX 3DNow!
Clock frequencies:
Measured CPU speed: 499.9 MHz
Cache per CPU package:
L1 Instruction Cache: 2 x 64 KB
L1 Data Cache: 2 x 64 KB
L2 Cache: 2 x 128 KB
MEMORY
Total Physical Memory: 503MB
Available Physical Memory: 380MB
Memory devices:
A0:
- 512MB,
A1:
- 8MB,
A2:
- 8MB,
A3:
- 8MB,
GRAPHICS
AMD Custom Driver For 640x640 Panel
Chip Type: GeodeLX
DAC Type: Internal DAC
Memory: 8MB
Driver provider: Advanced Micro Devices
Driver version: 3.0.2.0
Driver date: 6-19-2009
Monitor 1: 640x640x32 70Hz (Primary monitor)
DISK VOLUMES
C: Local drive, NTFS, (3.83GB total, 2.49GB free)
DISK DRIVES
Disk drive: Model Netlist Flash v2.0 (Size: 3.83GB)
OPTICAL DRIVES
NETWORK
Intel(R) 8255xER PCI Adapter
PORTS
Communications Port: COM2 - RS232 Serial Port (max Baud rate: 115200)
Keyboard Port: PS/2 connector
Mouse Port: PS/2 connector
Standard OpenHCD USB Host Controller
Standard Enhanced PCI to USB Host Controller
******************
DETAILED EVENT LOG
******************
LOG NOTE: 2009-06-04 14:44:14, Save Preferences before: C:\Documents and Settings\User\My Documents\PassMark\BurnInTest\LastUsed.bitcfg, 27824
LOG NOTE: 2009-06-04 14:44:14, General, C:\Documents and Settings\User\My Documents\PassMark\BurnInTest\LastUsed.bitcfg, 27824, 27824
LOG NOTE: 2009-06-04 14:44:14, Save Preferences after: C:\Documents and Settings\User\My Documents\PassMark\BurnInTest\LastUsed.bitcfg, 27824
LOG NOTE: 2009-06-04 14:44:14, Status, PassMark BurnInTest V6.0 Pro 1014
LOG NOTE: 2009-06-04 14:44:15, Status, Main Tests started
LOG NOTE: 2009-06-04 14:44:15, Perform test: CPU at 50%
LOG NOTE: 2009-06-04 14:44:15, CPU, Starting test
LOG NOTE: 2009-06-04 14:44:16, CPU, CPU General test: CPU 0 Cycle 0 Ops 0
LOG NOTE: 2009-06-04 14:52:14, Operation watchdog for 1 is 8419133448 (previous 0)
LOG NOTE: 2009-06-04 15:00:14, Operation watchdog for 1 is 8419133448 (previous 841913344
WARNING: 2009-06-04 15:00:14, CPU, No operations reported in timeout period
LOG NOTE: 2009-06-04 15:00:14, Operation watchdog triggered for 1 is 8419133448 (previous 841913344
LOG NOTE: 2009-06-04 15:08:14, Operation watchdog for 1 is 8419133448 (previous 841913344
LOG NOTE: 2009-06-04 15:14:15, StopTests [2]
LOG NOTE: 2009-06-04 15:14:15, Test run stopping - step 0.0. 0: 0 of 1
LOG NOTE: 2009-06-04 15:14:15, CPU, Stopping test
LOG NOTE: 2009-06-04 15:14:18, Status, Test run stopped
**************
RESULT SUMMARY
**************
Test Start time: Thu Jun 04 14:44:14 2009
Test Stop time: Thu Jun 04 15:14:18 2009
Test Duration: 000h 30m 04s
Test Name Cycles Operations Result Errors Last Error
CPU 52 95.788 Billion FAIL 1 No operations reported in timeout period
TEST RUN FAILED
*******************************************
SERIOUS ERROR SUMMARY FOR THE LAST TEST RUN
*******************************************
-----------------------------------------------------------------------------------------------------
******************
DETAILED EVENT LOG
******************
LOG NOTE: 2009-06-04 15:33:36, StopTests [6]
LOG NOTE: 2009-06-04 15:33:36, Test run stopping - step 0.0. 0: 0 of 0
-
As with David's post, I would first suspect device driver, BIOS or hardware issues.
If you run the CPU test at 100% load, does the problem occur more quickly?
In V5 of BurnInTest, the CPU math test operation count is increased in the CPU math test thread, while the CPU MMX test operation count is increased in the CPU MMX display thread.
The next step would be to determine whether the CPU test thread or the CPU test window update thread has stopped. To do this, open Task Manager while running BurnInTest and watch the CPU load. If the CPU test is running, you should see load on all CPU cores, in which case the display thread has stopped. If the CPU test is not running, you should not see load on all CPU cores, in which case the CPU test thread has stopped.
If the CPU test thread has stopped, I would try V6.0 of BurnInTest, as you have much more control over which CPU instructions are run from Preferences->CPU. I would then try each CPU instruction grouping separately to see if the problem occurs on particular types of CPU instructions. V6.0 also has more Activity trace level 2 logging for the CPU test. You can download a trial version of V6.0 here:
http://www.passmark.com/download/bit_download.htm
Regards,
Ian
-
Let me just clarify my previous post, because I don't think you understood me correctly.
I started out by setting up a standard test with all necessary peripherals included. Everything passed except both CPU tests, Maths and SIMD, which each registered the single error that I mentioned before. In the process of troubleshooting I started turning off different tests and finally ended up with just the CPU Maths test at 20% load. Every time, the error showed up within 30 minutes. CPU usage reported by Performance Monitor bounced between 60% and 80%, and memory usage was marginal. I also disabled the paging file to make sure it didn't affect the results, as we use an IDE flash drive in this system.
I was watching the test window and I noticed (as I mentioned before) that it freezes periodically but recovers after a while. It goes on like that until it finally freezes for good. Then, after several minutes, the error occurs. The unit, however, is perfectly responsive, and when I eventually stop the test the number of operations in the main window all of a sudden increases. I believe this tells me that the test was still running in the background but the results were not being updated. Please take another look at it, as we really need to know what's behind this error.
We have been using BurnInTest Pro 5 on various systems for a few years now and have never seen this before.
Thanks,
Derek