
CPU-Maths - no operations reported in timeout period


  • David (PassMark)
    replied
    So in fact the problem was pretty much exactly what we suggested it might be a few weeks ago. A problem with the system timers.

    In this case, a timer which should increase steadily over time is instead counting backwards from time to time.


  • Tim (PassMark)
    replied
    We seem to have found the cause of the problem. One of the Windows API functions we use for timing the length of certain tests (QueryPerformanceCounter, http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx) is returning unreliable values on your hardware. As a result, some tests become stuck executing without updating any of their results, and the watchdog timer triggers after a certain time to flag the error.

    Once we had discovered what the problem seemed to be, we wrote a simple program to test and log the current value returned from the QueryPerformanceCounter call to highlight the inconsistency. We ran this on the Geode system and on another XP system using an Intel E8400.
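
A logger of this kind could look roughly like the sketch below. It is not the PassMark program, which is not shown in this thread. It is a minimal Python sketch that assumes `time.perf_counter_ns` as the counter (on Windows, CPython reports this clock as implemented with QueryPerformanceCounter). Because that call returns nanoseconds rather than raw counts, the frequency is assumed to be 1,000,000,000. The names `sample_counter` and `format_row` are hypothetical.

```python
import time


def sample_counter(samples, interval_s, read_counter=time.perf_counter_ns,
                   sleep=time.sleep, frequency=1_000_000_000):
    """Return rows of (sample number, frequency, counter value, difference).

    read_counter and sleep are injectable so the sampler can be exercised
    with a fake clock; by default it reads the real performance counter.
    """
    rows = []
    previous = read_counter()
    for number in range(1, samples + 1):
        sleep(interval_s)
        current = read_counter()
        # On a healthy timer this difference is positive and close to
        # frequency * interval_s; a negative value means the counter
        # appeared to run backwards.
        rows.append((number, frequency, current, current - previous))
        previous = current
    return rows


def format_row(row):
    """Format one row in the four-column layout used in the listings below."""
    return "{:<6d} {:>15,d} {:>22,d} {:>18,d}".format(*row)
```

Calling `sample_counter(samples=21, interval_s=1.0)` and printing each row with `format_row` would give a listing with the same four columns as the results below.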

    The values from left to right represent: sample number, counter frequency (number of counts per second, should never change), current counter value, and difference from the last sample (should be very similar to the counter frequency).


    Geode results:
    Code:
       
    1      3,579,545     970,856,713     3,572,548
    2      3,579,545     957,664,136     -13,192,577
    3      3,579,545     961,248,841     3,584,705
    4      3,579,545     964,833,608     3,584,767
    5      3,579,545     968,418,269     3,584,661
    6      3,579,545     972,002,932     3,584,663
    7      3,579,545     992,364,858     20,361,926
    8      3,579,545     995,949,537     3,584,679
    9      3,579,545     999,534,244     3,584,707
    10     3,579,545     1,003,119,028    3,584,784
    11     3,579,545     989,926,436     -13,192,592
    12     3,579,545     993,511,135     3,584,699
    13     3,579,545     997,095,848     3,584,713
    14     3,579,545     1,000,680,522   3,584,674
    15     3,579,545     1,004,265,263   3,584,741
    16     3,579,545     1,024,627,167   20,361,904
    17     3,579,545     1,028,211,841   3,584,674
    18     3,579,545     1,031,796,543   3,584,702
    19     3,579,545     1,035,381,265   3,584,722
    20     3,579,545     1,038,965,967   3,584,702
    21     3,579,545     1,025,773,431   -13,192,536
    E8400 results:
    Code:
    1     3,000,060,000     61,949,764,826,091     2,990,583,234
    2     3,000,060,000     61,952,764,860,855     3,000,034,764
    3     3,000,060,000     61,955,764,868,889     3,000,008,034
    4     3,000,060,000     61,958,764,832,661     2,999,963,772
    5     3,000,060,000     61,961,764,860,666     3,000,028,005
    6     3,000,060,000     61,964,764,858,818     2,999,998,152
    7     3,000,060,000     61,967,764,858,032     2,999,999,214
    8     3,000,060,000     61,970,764,865,544     3,000,007,512
    9     3,000,060,000     61,973,764,894,080     3,000,028,536
    10    3,000,060,000     61,976,764,904,535     3,000,010,455
    11    3,000,060,000     61,979,764,908,762     3,000,004,227
    12    3,000,060,000     61,982,764,942,860     3,000,034,098
    13    3,000,060,000     61,985,764,907,838     2,999,964,978
    14    3,000,060,000     61,988,764,912,344     3,000,004,506
    15    3,000,060,000     61,991,764,921,764     3,000,009,420
    16    3,000,060,000     61,994,764,930,248     3,000,008,484
    17    3,000,060,000     61,997,764,938,408     3,000,008,160
    18    3,000,060,000     62,000,764,948,755     3,000,010,347
    19    3,000,060,000     62,003,765,004,309     3,000,055,554
    20    3,000,060,000     62,006,764,969,161     2,999,964,852
    21    3,000,060,000     62,009,764,990,443     3,000,021,282
    As you can see in the Geode results, the current counter value and the difference from the previous sample jump around significantly, even producing negative differences. A negative difference should only ever occur if the counter has reached its maximum value and started again (something that should only occur after days of system uptime). In comparison, the E8400 results are very consistent: the current counter value always increases, and the last sample minus the first sample, divided by the frequency, works out as 20 minutes elapsed during the test. This doesn't hold true for the Geode, and the count value even seems to wrap around close to 0 about 10 mins into the test.
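
The two checks described above (does the counter ever go backwards, and does last minus first divided by the frequency match the real elapsed time) can be sketched in Python. The function names are hypothetical and the data is copied from the Geode listing above. This is an illustration of the arithmetic, not the tool that produced the listings.

```python
def find_backward_steps(counter_values):
    """Return (sample number, difference) for each sample below its predecessor."""
    steps = []
    for index in range(1, len(counter_values)):
        diff = counter_values[index] - counter_values[index - 1]
        if diff < 0:
            steps.append((index + 1, diff))  # sample numbers start at 1
    return steps


def elapsed_seconds(first, last, frequency):
    """Elapsed time implied by two counter readings at a fixed frequency."""
    return (last - first) / frequency


GEODE_FREQUENCY = 3_579_545
GEODE_SAMPLES = [
    970_856_713, 957_664_136, 961_248_841, 964_833_608, 968_418_269,
    972_002_932, 992_364_858, 995_949_537, 999_534_244, 1_003_119_028,
    989_926_436, 993_511_135, 997_095_848, 1_000_680_522, 1_004_265_263,
    1_024_627_167, 1_028_211_841, 1_031_796_543, 1_035_381_265,
    1_038_965_967, 1_025_773_431,
]

# First and last E8400 readings from the second listing.
E8400_FREQUENCY = 3_000_060_000
E8400_FIRST = 61_949_764_826_091
E8400_LAST = 62_009_764_990_443

backward = find_backward_steps(GEODE_SAMPLES)
geode_elapsed = elapsed_seconds(GEODE_SAMPLES[0], GEODE_SAMPLES[-1], GEODE_FREQUENCY)
e8400_elapsed = elapsed_seconds(E8400_FIRST, E8400_LAST, E8400_FREQUENCY)
```

On the rows quoted here, `backward` picks out samples 2, 11 and 21. Applied only to these 21 rows, the E8400 readings imply about 20 seconds between the first and last sample and the Geode readings about 15, so the full log referred to in the post is presumably longer than the excerpt shown.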

    While a workaround in BurnInTest would be possible, it seems clear that this is a hardware problem caused by a BIOS or chipset bug that is corrupting the Windows high resolution timers. It's also likely that any software that uses these timers is going to be affected by odd behaviour.
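
For illustration only, one possible shape of such a software workaround is a wrapper that never lets the reported counter value decrease. This is a hypothetical sketch, not something BurnInTest is stated to do. The class name `MonotonicCounter` is an assumption. It hides backward steps from code that measures elapsed time, but it cannot recover the true time and does nothing about the forward jumps also visible in the Geode listing.

```python
class MonotonicCounter:
    """Wrap a raw counter so that successive reads never go backwards."""

    def __init__(self, read_counter):
        self._read = read_counter   # callable returning the raw counter value
        self._last = None           # highest value handed out so far
        self.backward_steps = 0     # how many raw reads were below _last

    def read(self):
        value = self._read()
        if self._last is not None and value < self._last:
            # Raw counter stepped backwards: count it and repeat the last
            # good value, so callers never compute a negative elapsed time.
            self.backward_steps += 1
            return self._last
        self._last = value
        return value
```

A non-zero `backward_steps` after a run would itself be a useful signal that the platform timer is misbehaving.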


  • David (PassMark)
    replied
    The links to the images are broken (they might be on your intranet, which we can't access). But sounds OK.


  • WillTA3@CDI
    replied
    Thanks in advance.

    Thanks in advance for your help in this matter. I have verified our issue on multiple systems. In order to make troubleshooting a little easier I have put together this test sled for your use.

    The test sled has the SBC in question, with I/O cabled out for ease of use. It is loaded with a registered and updated version of Windows XP with all drivers installed. The unit should boot upon arrival and has BurnInTest loaded. Failure will occur within the first hour to hour and a half of running. This has been verified using this same sled at our location.

    Please see the pictures below.



  • David (PassMark)
    replied
    The post you linked to above does mention an issue on the Geode back in 2006. But this issue was corrected in Release 5.1 build 1012, 15/August/2006. The symptoms are also not identical. Their issue was a single error being flagged right at the end of a 12 hour run.

    There was also this issue from early 2007,
    http://www.passmark.com/forum/showthread.php?t=800
    we suggested it might be a hardware / device driver fault, maybe with the system timer. And this turned out to be the case, the customer later stated, "My colleagues who have been investigating have found a problem with time handling by the platform. We have a fix and will be rerunning tests this week."

    If you want to ship us something you can do so at this address,
    http://www.passmark.com/about/contact_us.htm
    Address it to Ian Robinson. And include a printout of this page & your contact details.

    Please don't ship us something that doesn't just work. We have had cases where we have been sent motherboards without a compatible CPU, RAM, device drivers, etc. We don't want to spend days trying to get it to even boot.

    "Second, do you know if we can get a copy of the latest version to test on the unit that exhibits this failure?"
    I am not sure if I understand the question, but you can move the software between machines.

    A DEBUG build is one where we add in extra logging to investigate a particular problem. We don't have one for this issue. We could do one, but my fear is that it might not show anything more than what was in the event log you already posted 9 months ago, and so we might spend several weeks doing stuff by trial and error. But we could give this a go if you want.


  • WillTA3@CDI
    replied
    Couple of questions. First, I was perusing the forums for a failure mode similar to the one we are seeing, and I came up with this: http://www.passmark.com/forum/showthread.php?t=474

    Any way I can see if the DEBUG version solves my issues as well?
    Second, do you know if we can get a copy of the latest version to test on the unit that exhibits this failure? It does not seem like this issue is going to be dropped.

    And third, you had offered for us to send you a board for your help to debug this issue. I am now in a position that requires this. Is there a debug tracking number that I can use with my shipping paperwork?


  • WillTA3@CDI
    replied
    I have confirmation that no hardware was shipped to Passmark on this Topic, though I am not sure why. Since this topic was created have you had anyone else with this type of issue on a GEODE based processor?

    I am in need of a paragraph explaining what is happening during this failure and a statement that it is or is not an issue.


  • David (PassMark)
    replied
    I don't think anything was shipped to us in the end. But I will check. Contact details are here.


  • WillTA3@CDI
    replied
    About Derek@CDI

    Hello, my name is Will Anniss and I am the Technology Leader where Derek worked. He was working with you on an issue that we were seeing with an AMD board and the PassMark test. Materials were shipped to you to conduct said tests on the SBC.

    Who should I contact to discuss this further?


  • David (PassMark)
    replied
    I'll send you an E-Mail to arrange the details.


  • derek@CDI
    replied
    I realize that Geode is not mainstream in the consumer market, but in the embedded world it is quite popular. Although, as you said, the problem may be specific to this particular board and not the CPU/chipset in general. We've built close to 40 units so far based on this hardware in a couple of different flavors, and all of them showed that error during testing.
    We appreciate your assistance, as resolving this problem is important to us. We are willing to ship the hardware to you if you think it will help. Is there a procedure we need to follow? Is your shipping address the same as listed on the website? Thanks.


  • David (PassMark)
    replied
    We have never done any testing on the Geode CPU and the associated hardware. In principle it should be OK, but clearly something is going wrong. From what you have described, the display thread is crashing. So the display stops updating and the watchdog timer triggers some time later. From here we can't tell you the exact cause of this crash. I assume you don't get any pop-up windows with crash details?

    This is not a known issue and no other customer has reported behaviour like this. Do you have a second example of this device on which you can run the test and compare the behaviour? Given that no other customer has reported the issue, it might well be hardware related, e.g. a bad CPU, bad RAM or a bad bus.

    If these were cheap devices you could ship us one, and then we could do a much deeper investigation of the problem (and ship it back at the end if required).


  • derek@CDI
    replied
    Time has passed but the problem is still here. Going along with your suggestion, we purchased v6 of BIT Pro. Unfortunately the symptoms remained the same. I ran several tests, selecting each group of instructions separately, and they all produced the same results. 15-20 min into the test the CPU test window stops updating; CPU usage, however, remains unchanged at around 87%. Changing the CPU load doesn't seem to affect the results either.
    I'm including the trace from one of the tests below. Please try to be more specific in your explanation. Thanks.

    PassMark BurnInTest Log file - http://www.passmark.com
    ========================================================

    Date: 06/04/09 14:44:14

    BurnInTest V6.0 Pro 1014
    Trace detail level: Activity trace 2

    **************
    SYSTEM SUMMARY
    **************
    Windows XP Professional Service Pack 2 build 2600 (32-bit),
    1 x Geode(TM) Integrated Processor by AMD PCS [499.9 MHz],
    503MB RAM,
    AMD Custom Driver For 640x640 Panel,
    4GB HDD,


    GENERAL
    System Name: OEM-AF71FC943EB
    System Model: AWRDACPI
    Motherboard Name: AMD-GX3
    BIOS Manufacturer: Phoenix Technologies, LTD
    BIOS Version: 6.00 PG
    BIOS Release Date: 08/28/2009

    CPU
    CPU manufacturer: AuthenticAMD
    CPU Type: Geode(TM) Integrated Processor by AMD PCS
    CPUID: Family 5, Model A, Stepping 2
    Physical CPU's: 1
    Cores per CPU: 1
    Hyperthreading: Not capable
    CPU features: MMX 3DNow!
    Clock frequencies:
    Measured CPU speed: 499.9 MHz
    Cache per CPU package:
    L1 Instruction Cache: 2 x 64 KB
    L1 Data Cache: 2 x 64 KB
    L2 Cache: 2 x 128 KB

    MEMORY
    Total Physical Memory: 503MB
    Available Physical Memory: 380MB
    Memory devices:
    A0:
    - 512MB,
    A1:
    - 8MB,
    A2:
    - 8MB,
    A3:
    - 8MB,

    GRAPHICS
    AMD Custom Driver For 640x640 Panel
    Chip Type: GeodeLX
    DAC Type: Internal DAC
    Memory: 8MB
    Driver provider: Advanced Micro Devices
    Driver version: 3.0.2.0
    Driver date: 6-19-2009
    Monitor 1: 640x640x32 70Hz (Primary monitor)

    DISK VOLUMES
    C: Local drive, NTFS, (3.83GB total, 2.49GB free)

    DISK DRIVES
    Disk drive: Model Netlist Flash v2.0 (Size: 3.83GB)

    OPTICAL DRIVES

    NETWORK
    Intel(R) 8255xER PCI Adapter

    PORTS
    Communications Port: COM2 - RS232 Serial Port (max Baud rate: 115200)
    Keyboard Port: PS/2 connector
    Mouse Port: PS/2 connector
    Standard OpenHCD USB Host Controller
    Standard Enhanced PCI to USB Host Controller
    ******************
    DETAILED EVENT LOG
    ******************
    LOG NOTE: 2009-06-04 14:44:14, Save Preferences before: C:\Documents and Settings\User\My Documents\PassMark\BurnInTest\LastUsed.bitcfg, 27824
    LOG NOTE: 2009-06-04 14:44:14, General, C:\Documents and Settings\User\My Documents\PassMark\BurnInTest\LastUsed.bitcfg, 27824, 27824
    LOG NOTE: 2009-06-04 14:44:14, Save Preferences after: C:\Documents and Settings\User\My Documents\PassMark\BurnInTest\LastUsed.bitcfg, 27824
    LOG NOTE: 2009-06-04 14:44:14, Status, PassMark BurnInTest V6.0 Pro 1014
    LOG NOTE: 2009-06-04 14:44:15, Status, Main Tests started
    LOG NOTE: 2009-06-04 14:44:15, Perform test: CPU at 50%
    LOG NOTE: 2009-06-04 14:44:15, CPU, Starting test
    LOG NOTE: 2009-06-04 14:44:16, CPU, CPU General test: CPU 0 Cycle 0 Ops 0
    LOG NOTE: 2009-06-04 14:52:14, Operation watchdog for 1 is 8419133448 (previous 0)
    LOG NOTE: 2009-06-04 15:00:14, Operation watchdog for 1 is 8419133448 (previous 841913344
    WARNING: 2009-06-04 15:00:14, CPU, No operations reported in timeout period
    LOG NOTE: 2009-06-04 15:00:14, Operation watchdog triggered for 1 is 8419133448 (previous 841913344
    LOG NOTE: 2009-06-04 15:08:14, Operation watchdog for 1 is 8419133448 (previous 841913344
    LOG NOTE: 2009-06-04 15:14:15, StopTests [2]
    LOG NOTE: 2009-06-04 15:14:15, Test run stopping - step 0.0. 0: 0 of 1
    LOG NOTE: 2009-06-04 15:14:15, CPU, Stopping test
    LOG NOTE: 2009-06-04 15:14:18, Status, Test run stopped

    **************
    RESULT SUMMARY
    **************
    Test Start time: Thu Jun 04 14:44:14 2009
    Test Stop time: Thu Jun 04 15:14:18 2009
    Test Duration: 000h 30m 04s

    Test Name Cycles Operations Result Errors Last Error
    CPU 52 95.788 Billion FAIL 1 No operations reported in timeout period
    TEST RUN FAILED

    *******************************************
    SERIOUS ERROR SUMMARY FOR THE LAST TEST RUN
    *******************************************

    -----------------------------------------------------------------------------------------------------
    ******************
    DETAILED EVENT LOG
    ******************
    LOG NOTE: 2009-06-04 15:33:36, StopTests [6]
    LOG NOTE: 2009-06-04 15:33:36, Test run stopping - step 0.0. 0: 0 of 0
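
The watchdog entries in the trace above report an operation count together with the previous one, and the warning appears at the first check where the count has not moved. The internal logic of BurnInTest is not shown in this thread. The following Python sketch only illustrates that reading of the log, under the assumption that the watchdog compares the count at each check with the count at the previous check. `OperationWatchdog` is a hypothetical name.

```python
class OperationWatchdog:
    """Flag a test whose operation count is unchanged since the last check."""

    def __init__(self, initial_operations=0):
        # The first log entry reports "previous 0", so start from zero.
        self._previous = initial_operations

    def check(self, operations):
        """Return True when no operations were reported since the last check."""
        stalled = operations == self._previous
        self._previous = operations
        return stalled
```

Fed the counts from the trace, the first check (8419133448 against a previous value of 0) passes and the next check, with the count still at 8419133448, is the one that would raise "No operations reported in timeout period".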


  • Ian (PassMark)
    replied
    As with David's post, I would first suspect device driver, BIOS or hardware issues.

    If you run the CPU test at 100% load, does the problem occur more quickly?

    In V5 of BurnInTest, the CPU math test operation count is increased in the CPU math test thread, while the CPU MMX test operation count is increased in the CPU MMX display thread.

    The next step would be to determine whether the CPU test thread or the CPU test window update thread has stopped. To do this, open Task Manager while running BurnInTest and watch the CPU load. If the CPU test is running, you should see load on all CPU cores, in which case the display thread has stopped. If the CPU test is not running, you should not see load on all CPU cores, in which case the CPU test thread has stopped.

    If the CPU test thread has stopped, I would try V6.0 of BurnInTest, as you have much more control over which CPU instructions are run from Preferences->CPU. I would then try each CPU instruction grouping separately to see if the problem occurs on particular types of CPU instructions. V6.0 also has more Activity trace level 2 logging for the CPU test. You can download a trial version of V6.0 here:
    http://www.passmark.com/download/bit_download.htm

    Regards,
    Ian


  • derek@CDI
    replied
    Let me just clarify my previous post, because I don't think you understood me correctly.
    I started out by setting up a standard test with all necessary peripherals included. Everything passed except both CPU tests, Maths and SIMD, which registered the single error that I mentioned before. In the process of troubleshooting I started turning off different tests and finally ended up with just the CPU Maths test at 20% load. Every time, the error showed up within 30 min. CPU usage reported by Performance Monitor bounced between 60 and 80% and memory usage was marginal. I also disabled the paging file to make sure it didn't affect the results, as we use an IDE flash drive in this system.
    I was watching the test window and I noticed (as I mentioned before) that it freezes periodically but recovers after a while. It goes on like that until it finally freezes for good. Then, after several minutes, the error occurs. The unit, however, is perfectly responsive, and when I eventually stop the test the number of operations in the main window suddenly increases. I believe this tells me that the test was still running in the background but the results were not being updated. Please take another look at it, as we really need to know what's behind this error.
    We have been using BurnInTest Pro 5 on various systems for a few years now and have never seen this before.
    Thanks,
    Derek
