Burn-In Test Application clashes with WinXP SP3?

  • Burn-In Test Application clashes with WinXP SP3?

    Hi,

    I have a computer system running on the following specification:

    2X Intel® Quad-Core Xeon 2.0GHz FSB1333MHz 4MB Cache
    ServerBoard with Intel(R) 5000P Chipset
    2X 1GB FB DIMM RAM
    Nvidia GF8500 graphics card
    160GB SATA HDD
    WindowsXP Professional SP3

    We ran a Burn-In Test (BIT) on the system, and after a while (say about 1,2 hours) the BIT application closed by itself.
    No message was left to tell us what the error was.

    BTW, we never encountered this problem when we were using Windows XP Professional SP2.

    Please advise...

    P.S. We've tried both BIT V5.3 and V4.0; V4.0 sustains the test slightly longer before closing.

  • #2
    We aren't aware of any specific problem with SP3.
    What tests do you have running? Are you able to narrow the problem down to a specific test, e.g. just the 3D test or just the RAM test?


    • #3
      Basically, we are running most of the tests, all at 100%.
      The following tests are run:
      CPU - Maths
      CPU - MMX / SSE
      Memory (RAM)
      2D Graphics
      3D Graphics
      Disk (C: OS)
      Disk (D: Application)
      Disk (E: Images)
      Network 1
      Network 2
      Parallel Port
      Serial Port 1
      Serial Port 2

      Thank you for your response...


      • #4
        We found a similar problem.
        We're running the following tests @ 100%:

        CPU - Maths
        CPU - MMX / SSE
        Memory (RAM)
        2D Graphics
        3D Graphics
        Disk (C: OS)

        The OS is Windows XP Pro SP3 GER (System Builder); BIT is 5.3 Pro, latest build.
        The strange thing is that on some units the test passes and on other units BIT just stops. On all XP Pro SP2b and SP2c units there was no problem.

        The computer we're running the test on is a laptop with an Intel 945GM chipset and 1.66GHz CPU, 1-2GB RAM, 120GB Toshiba HDD.

        Is there a way to create a trace log, so that we can see what happened before the test crashed?

        On some units, if we start the test a second time, they also pass.

        My problem now is that our full production test is based on your software. Currently we're not able to ship units on time, so we're really in a hurry to find a solution!

        I'll try to find out whether we can narrow it down to one test, but that's difficult, because sometimes the test crashes after 3-6 minutes and sometimes after 6 hours.


        • #5
          This is an unrecoverable software exception in BurnInTest. The software error could originate in BurnInTest itself, a library BurnInTest uses (like DirectX), the Windows operating system, a device driver, or even the BIOS; in some cases a hardware fault (like faulty RAM) can also cause a software problem. I would say that in most cases it is BurnInTest exercising a device driver that has a bug. In your case, as it happens after a long test, maybe a device driver has a resource leak, such that once resources run low enough the driver fails in some manner. There are a couple of approaches to try to determine where the fault is:

          1) We have found a couple of problems with Graphics card device drivers lately that will cause this problem with the 2D Video memory test. Could you please try the test without the 2D test and post whether the problem still occurs. If the problem does still occur, can you please reduce the test set until you have the smallest test set that provokes this problem and let us know what this test set is (e.g. just the 3D test).

          2) As the problem is often a faulty device driver, simply check you have the latest device drivers and update accordingly. If you think it is related to the graphics test, then start by updating the graphics card device driver.

          3) Get the software error information, such as:
          - The specific build number: e.g. BurnInTest Professional v5.3.1026
          - The address related to this problem, e.g. 0x00412345. If this information is not available via a link in the "BurnInTest has stopped working" error window (or you don't see this window), you can normally get it from the Windows Application event log. In XP (other operating systems will be similar) the Application event log is here:
          Start-> right click "My Computer" and select Manage, double click "Event Viewer" on the left hand side of the window, double click "Application" and find the event of type "Error" around the time the problem occurred. This will contain information like:
          "Faulting application bit.exe, version 5.0.1003.0, faulting module bit.exe, version 5.0.1003.0, fault address 0x00412345."

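For anyone collecting these event-log entries from many units, the fault details can be pulled out of the message text mechanically. The following is only a rough Python sketch: it assumes the message wording matches the XP example quoted above (other Windows versions may phrase it differently), and the function name `parse_fault` is made up for illustration.

```python
import re

# Pattern for the XP-style Application event log message quoted above.
# Assumption: other Windows versions may word this message differently.
FAULT_RE = re.compile(
    r"Faulting application (?P<app>\S+), version (?P<app_ver>[\d.]+), "
    r"faulting module (?P<module>\S+), version (?P<mod_ver>[\d.]+), "
    r"fault address (?P<addr>0x[0-9A-Fa-f]+)",
    re.IGNORECASE,
)


def parse_fault(message):
    """Return the fields of a 'Faulting application' message, or None."""
    match = FAULT_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # If the faulting module is not the application itself, the crash
    # happened inside some other module loaded by the application.
    fields["fault_in_app"] = fields["app"].lower() == fields["module"].lower()
    return fields


if __name__ == "__main__":
    sample = ("Faulting application bit.exe, version 5.0.1003.0, "
              "faulting module bit.exe, version 5.0.1003.0, "
              "fault address 0x00412345.")
    print(parse_fault(sample))
```

The build number, faulting module and fault address returned here are the pieces of information asked for in point 3.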

          EDIT: You can set up a trace file in Preferences->Logging by setting the trace level to activity trace 2. However, this often does not provide specific enough detail for this type of problem. The best debugging tool for an access violation is a minidump file, but this is only generated if you get the "BurnInTest has stopped" prompt asking whether you want to send the error report to Microsoft. The next best thing is the information listed in the points above.

          Regards,
          Ian
          Last edited by Ian (PassMark); Sep-04-2008, 01:32 AM.


          • #6
            Originally posted by Ian (PassMark)
            1) We have found a couple of problems with Graphics card device drivers lately that will cause this problem with the 2D Video memory test. Could you please try the test without the 2D test and post whether the problem still occurs. If the problem does still occur, can you please reduce the test set until you have the smallest test set that provokes this problem and let us know what this test set is (e.g. just the 3D test).
            OK, I'll give it a try.

            Originally posted by Ian (PassMark)
            2) As the problem is often a faulty device driver, simply check you have the latest device drivers and update accordingly. If you think it is related to the graphics test, then start by updating the graphics card device driver.
            I cannot imagine that this is a driver issue, because with SP2 and exactly the same drivers everything is OK. We're in the defence business, so we cannot just update a driver. I can do this for testing, but it would trigger a lengthy qualification process if we needed to change drivers.

            Originally posted by Ian (PassMark)
            3) Get the software error information, such as:
            - The specific build number: e.g. BurnInTest Professional v5.3.1026
            - The address related to this problem, e.g. 0x00412345. If this information is not available via a link in the "BurnInTest has stopped working" error window (or you don't see this window), you can normally get it from the Windows Application event log. In XP (other operating systems will be similar) the Application event log is here:
            Start-> right click "My Computer" and select Manage, double click "Event Viewer" on the left hand side of the window, double click "Application" and find the event of type "Error" around the time the problem occurred. This will contain information like:
            "Faulting application bit.exe, version 5.0.1003.0, faulting module bit.exe, version 5.0.1003.0, fault address 0x00412345."
            I didn't have the build number on hand, and you're right, it's not the latest one: it's Build 1018. I'll update and see what happens.

            There is no error message. BIT just closes and that's it.


            • #7
              XP SP3 updates a whole bunch of system programs (.exe), system libraries (.dll), system registry entries and device drivers (.sys, .cat, .inf), just maybe not the ones you are thinking of.

              Regards,
              Ian


              • #8
                Hi Ian,
                Last night we ran a batch of 23 units of the laptop described above, and only 1 of them passed. All the others (with the same installation) just closed BurnInTest. Tested with BurnInTest Pro 5.3 Build 1026. Tests running:

                CPU - Maths
                CPU - MMX / SSE
                Memory (RAM)
                2D Graphics
                3D Graphics
                Disk (C: OS)

                On this batch we are now running the test again, but skipping one of the tests. On some other units we run only one of the tests listed above. The problem is that sometimes a unit passes if we simply launch the test once more.
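The two retest strategies described here (skipping one test at a time, and running a single test alone) can be enumerated mechanically when assigning test sets to units. This is only a small Python sketch: the test names are copied from the list above, and the function names are invented for illustration and have nothing to do with BurnInTest itself.

```python
# Test names as listed in this post.
TESTS = [
    "CPU - Maths",
    "CPU - MMX / SSE",
    "Memory (RAM)",
    "2D Graphics",
    "3D Graphics",
    "Disk (C: OS)",
]


def leave_one_out(tests):
    """One test set per test, each with that single test skipped."""
    return [[t for t in tests if t != skipped] for skipped in tests]


def single_tests(tests):
    """One test set per test, each containing only that test."""
    return [[t] for t in tests]


if __name__ == "__main__":
    for number, test_set in enumerate(leave_one_out(TESTS), start=1):
        print("Unit %d runs: %s" % (number, ", ".join(test_set)))
```

Because the failure is intermittent, a unit that passes with a reduced test set is weak evidence on its own; repeating each set on several units gives a clearer picture.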

                I have another unit, with:
                Supermicro M/B X7DBE-X
                Quadcore CPU
                2GB RAM
                2x Matrox display wall controller (one card has 8 monitor outputs)
                2x Seagate Rugged HDD running mirrored RAID

                CPU - Maths
                CPU - MMX / SSE
                Memory (RAM)
                2D Graphics
                Disk (C: OS)

                No 3D test, because the Matrox cards don't support it.
                All the drivers and the BIOS are definitely the latest available.
                I'll try to collect some more information from the event log later.

                Regards
                Malte


                • #9
                  Malte,

                  How long into the test was it before they failed (e.g. did they all fail at around the same time, say 2 hours in)?

                  We have made some changes in BurnInTest for this type of problem, v5.3.1028.

                  Can you please use this new build with one of the laptops that failed to:
                  (i) see if the problem occurs, or whether you at least now get the option to send the error report to Microsoft (which we can get from them);
                  (ii) reduce the test set to the smallest, e.g. Disk and Memory.
                  We have another report of this behavior and their testing seems to indicate that it may be related to the disk test, in conjunction with another test (but this may have nothing to do with your problem).

                  So, could you first retry the test without the disk test and see whether the problem goes away? If it does, could you please then try the disk test with SMART turned off and the test pattern set to sequential (rather than cyclic)?

                  The latest 32-bit build can be downloaded here (currently 5.3.1028):
                  http://www.passmark.com/ftp/bitpro.exe

                  Please do your testing with activity tracing turned off, but with periodic logging turned on at an interval of, e.g., 1 minute.

                  Thanks.
                  Ian


                  • #10
                    Originally posted by Ian (PassMark)
                    How long into the test was it before they failed (e.g. did they all fail at around the same time, say 2 hours in)?
                    Unfortunately I have no idea, because they failed at night when nobody was here. We started the next test run 1.5 h ago, so we might have more results soon.
                    Originally posted by Ian (PassMark)
                    We have made some changes in BurnInTest for this type of problem, v5.3.1028.

                    Can you please use this new build with one of the laptops that failed to:
                    (i) see if the problem occurs, or whether you at least now get the option to send the error report to Microsoft (which we can get from them);
                    (ii) reduce the test set to the smallest, e.g. Disk and Memory.
                    We have another report of this behavior and their testing seems to indicate that it may be related to the disk test, in conjunction with another test (but this may have nothing to do with your problem).
                    I'll wait for the results with Build 1026 first, then update to 1028.
                    Originally posted by Ian (PassMark)
                    So, could you first retry the test without the disk test and see whether the problem goes away? If it does, could you please then try the disk test with SMART turned off and the test pattern set to sequential (rather than cyclic)?
                    I'll look into that as well, but the laptops have SATA drives in a conventional (IDE-like) setup, while my display wall controller uses a RAID configuration, so I don't really think this is the problem.

                    On one of the laptops we don't run the HDD test, so this unit should pass if your guess is correct.


                    • #11
                      Thanks.

                      FYI, if you set the logging interval to 1 minute in Preferences->Logging, interim results are written every minute, which gives you a good indication of when BurnInTest closed.
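With periodic logging on, the last interim report in a log tells you roughly when BurnInTest closed. The Python sketch below shows one way to read that out of a log file. It assumes each report contains "Test Start time:" and "Current time:" lines with timestamps such as "Thu Sep 11 13:59:36 2008" (English day and month names); the function name is made up for illustration.

```python
import re
from datetime import datetime

# Assumed timestamp layout, e.g. "Thu Sep 11 13:59:36 2008".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
STAMP_RE = re.compile(
    r"^\s*(Test Start time|Current time):\s*(.+?)\s*$", re.MULTILINE
)


def last_report_times(log_text):
    """Return (start, last_report, elapsed) taken from the newest report lines."""
    start = last = None
    for label, stamp in STAMP_RE.findall(log_text):
        try:
            parsed = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # skip a line that was cut off mid-write
        if label == "Test Start time":
            start = parsed
        else:
            last = parsed
    if start is None or last is None:
        return None
    return start, last, last - start


if __name__ == "__main__":
    sample = (
        "Test Start time: Thu Sep 11 13:59:36 2008 \n"
        "Current time: Thu Sep 11 20:42:36 2008\n"
    )
    print(last_report_times(sample))
```

With a 1 minute logging interval, the program closed at most about a minute after the last "Current time" value found this way.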

                      Regards,
                      Ian


                      • #12
                        It seems to be a combination of various tests.
                        The unit that ran every test except 2D Graphics did fail. All the other units have now been running for more than 5 h.

                        I'll update with http://www.passmark.com/ftp/bitpro.zip to build 1028 now and test this failed unit once more, to see what happens.

                        By the way, there is absolutely nothing written to the event log. So I'll set up the test run as you described.


                        • #13
                          Malte,

                          Thanks.

                          I assume you are using the 2D "All video memory" test. BIT 5.3.1028 implements workarounds for two 2D Video memory test crashes: (i) in DirectX DirectShow and (ii) in the ATI atiumdag.dll library. So it will be interesting to see if this solves your problem.

                          Regards,
                          Ian


                          • #14
                            Hi Ian,
                            I have now run the test with the new build, but the result is the same.
                            There is still absolutely no error message, which is really strange to me.

                            I have 2 log files and a trace log:

                            Code:
                            ****************************************
                            RESULT SUMMARY - INTERIM PERIODIC REPORT
                            ****************************************
                            Test Start time: Thu Sep 11 13:59:36 2008 
                            Current time: Thu Sep 11 20:42:36 2008
                            Test Duration: 006h 43m 00s 
                            Test Name                   Cycles   Operations      Result Errors   Last Error
                                          CPU - Maths   11558    2.107 Trillion  PASS   0        No errors
                                           CPU - SIMD   8421     2.015 Trillion  PASS   0        No errors
                                         Memory (RAM)   45       108 Billion     PASS   0        No errors
                                          2D Graphics   1092     314 Million     PASS   0        No errors
                                          3D Graphics   188      377001       Ware komplett kommissioniert, Vollzaehligkeit geprueft:     i.O. 
                            Geprueft von: NB04 
                            Sichtpruefung:                                              i.O. 
                            Geprueft von: NB04
                            Code:
                            ****************************************
                            RESULT SUMMARY - INTERIM PERIODIC REPORT
                            ****************************************
                            Test Start time: Thu Sep 11 13:49:42 2008 
                            Current time: Thu Sep 11 22:36:42 2008
                            Test Duration: 008h 47m 00s 
                            Test Name                   Cycles   Operations      Result Errors   Last Error
                                          CPU - Maths   15046    3.089 Trillion  PASS   0        No errors
                                           CPU - SIMD   11147    3.217 Trillion  PASS   0        No errors
                                         Memory (RAM)   75       181 Billion     PASS   0        No errors
                                          2D Graphics   1107     247 Million     PASS   0        No errors
                                          3D Graphics   167      334246          PASS   0        No errors
                                           Disk (C: )   11       28.380 Billion  PASS   0        No errors
                             
                            ****************************************
                            RESULT SUMMARY - INTERIM PERIODIC REPORT
                            ****************************Ware komplett kommissioniert, Vollzaehligkeit geprueft:     i.O. 
                            Geprueft von: NB06 
                            Sichtpruefung:                                              i.O.
                            Please let me know where to send the trace log, if you need it.
                            I can also send you the configuration file and an extract of the batch file we use for launching BIT.

                            We're using a batch file that waits for BIT to close and then appends other test logs to the same log file. That's what you can see here: BIT is writing the log but doesn't finish, and then the batch file continues.
                            Code:
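                            REM Remove any stale log for this serial number.
                            if exist "%_sn%.log" del "%_sn%.log"
                            REM Write the per-unit settings (log name, machine type, serial number, notes) to a temporary script file.
                            echo SETLOG "%_sn%.log" >> tmp-script.txt
                            echo SETMACHINETYPE "%_model%" >> tmp-script.txt
                            echo SETSERIAL "%_sn%" >> tmp-script.txt
                            echo SETNOTES "Assembly order: %_fa% Tester ID: %_test%" >> tmp-script.txt
                            REM Append the generated commands to bit-script-input.txt.
                            type tmp-script.txt >> bit-script-input.txt
                            REM Launch BIT and block until it exits, so the rest of the batch runs afterwards.
                            start /wait bit -p -c .\stresstest.bitcfg -r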
                            Also, the running time is always different: I saw units that ran for less than 1 h, the ones above ran for more than 6 h, and we had units that passed without any problem. In this batch only 1 unit passed; in other batches units passed when the test was launched a second time. It's totally confusing to me.
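Since BIT closes without any message, one extra clue a launcher can record is the exit code of the process it waited for. When a Windows program dies from an unhandled exception, its exit code is normally the exception code (for example 0xC0000005 for an access violation). The Python sketch below illustrates the idea; it is not part of the batch file above, the function names are invented, and whether BurnInTest returns 0 after a normal run is an assumption that would need checking.

```python
import subprocess

# A few well-known Windows exception codes that can appear as the exit
# code of a process that died from an unhandled exception.
KNOWN_CRASH_CODES = {
    0xC0000005: "access violation",
    0xC0000094: "integer divide by zero",
    0xC00000FD: "stack overflow",
    0xC0000409: "stack buffer overrun",
}


def describe_exit_code(code):
    """Classify a Windows exit code given as a signed or unsigned 32-bit value."""
    unsigned = code & 0xFFFFFFFF
    if unsigned in KNOWN_CRASH_CODES:
        return "crash: " + KNOWN_CRASH_CODES[unsigned]
    if unsigned >= 0xC0000000:
        # Error-severity status codes start at 0xC0000000 (Windows only).
        return "crash: unrecognised exception code 0x%08X" % unsigned
    return "exited with code %d" % code


def run_and_describe(command):
    """Run a command, wait for it to finish, and describe how it ended."""
    completed = subprocess.run(command)
    return completed.returncode, describe_exit_code(completed.returncode)


if __name__ == "__main__":
    print(describe_exit_code(0xC0000005))
```

A call such as `run_and_describe(["bit", "-p", "-c", r".\stresstest.bitcfg", "-r"])` would mirror the launch line in the batch file, with the arguments copied from it unchanged.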


                            • #15
                              Malte,

                              Did you find any pattern when removing tests? Can you please tell me which test sets caused the problem (e.g. it is interesting to know that the problem occurred without the 2D test).

                              Is there anything in the Windows System or Application event logs around the time of the failure?
                              Some information about event logging is here:
                              http://support.microsoft.com/kb/308427

                              I don't read German. What does "Ware komplett kommissioniert, Vollzaehligkeit geprueft: i.O. " mean, and why is it in the log report?

                              Please send the trace file and configuration to
                              help [at] passmark [dot] com

                              I will send you a debug version of BurnInTest in response.

                              Thanks.

                              Regards,
                              Ian
