PCIe test card errors


  • PCIe test card errors

    Hi;

    I just got a PassMark PCIe test card. Using it on my ASUS X99-E WS/USB 3.1 system under 64-bit Windows 7, I ran a loopback test for a few minutes.

    I see that I get multiple uncorrectable non-fatal errors. Is this normal? An example end summary is




    I get errors like:

    Sat Jul 22 14:24:30 2017: Benchmark - Loopback blocks 3018753-3019776: Min 744.8 MB/s, Max 1007.6 MB/s
    Sat Jul 22 14:24:30 2017: Benchmark - Loopback blocks 3019777-3020800: Min 967.2 MB/s, Max 1009.1 MB/s
    Sat Jul 22 14:24:30 2017: ERROR - PCIe data transfer error: 3104 of 4096 bytes incorrect.
    Sat Jul 22 14:24:30 2017: ERROR - PCIe Transaction layer error: 17 Completer Abort error(s) (Uncorrectable - Non Fatal)
    Sat Jul 22 14:24:30 2017: ERROR - PCIe Transaction layer error: 4 Completion Timeout error(s) (Uncorrectable - Non Fatal)
    Sat Jul 22 14:24:30 2017: ERROR - PCIe Transaction layer error: 17 Unexpected Completion error(s) (Uncorrectable - Non Fatal)
    Sat Jul 22 14:24:30 2017: Benchmark - Loopback blocks 3020801-3021824: Min 948.6 MB/s, Max 1009.9 MB/s

    ...

    Sat Jul 22 14:25:09 2017: Benchmark - Loopback blocks 4276225-4277248: Min 773.0 MB/s, Max 1009.9 MB/s
    Sat Jul 22 14:25:09 2017: ERROR - PCIe Transaction layer error: 1 Completer Abort error(s) (Uncorrectable - Non Fatal)
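For longer logs, it can help to total the transaction-layer errors by type rather than read them line by line. The following is a minimal sketch, assuming the log keeps the exact line format shown above; the names `TLP_ERROR`, `tally_errors`, and `SAMPLE_LOG` are made up for illustration and are not part of the PCIe Test software.

```python
import re
from collections import Counter

# Matches lines such as:
#   "...: ERROR - PCIe Transaction layer error: 17 Completer Abort error(s) (Uncorrectable - Non Fatal)"
# The format is assumed from the excerpt in this post, not from any documentation.
TLP_ERROR = re.compile(
    r"ERROR - PCIe Transaction layer error: (\d+) (.+?) error\(s\) \((.+?)\)"
)


def tally_errors(lines):
    """Sum transaction-layer error counts, keyed by (error type, severity)."""
    totals = Counter()
    for line in lines:
        match = TLP_ERROR.search(line)
        if match:
            count, kind, severity = match.groups()
            totals[(kind, severity)] += int(count)
    return totals


# The error lines quoted in this post.
SAMPLE_LOG = [
    "Sat Jul 22 14:24:30 2017: ERROR - PCIe data transfer error: 3104 of 4096 bytes incorrect.",
    "Sat Jul 22 14:24:30 2017: ERROR - PCIe Transaction layer error: 17 Completer Abort error(s) (Uncorrectable - Non Fatal)",
    "Sat Jul 22 14:24:30 2017: ERROR - PCIe Transaction layer error: 4 Completion Timeout error(s) (Uncorrectable - Non Fatal)",
    "Sat Jul 22 14:24:30 2017: ERROR - PCIe Transaction layer error: 17 Unexpected Completion error(s) (Uncorrectable - Non Fatal)",
    "Sat Jul 22 14:25:09 2017: ERROR - PCIe Transaction layer error: 1 Completer Abort error(s) (Uncorrectable - Non Fatal)",
]

if __name__ == "__main__":
    for (kind, severity), total in sorted(tally_errors(SAMPLE_LOG).items()):
        print(f"{kind} ({severity}): {total}")
```

To run it against a saved log instead, pass the open file object to `tally_errors`. Lines that do not match the pattern, such as the data transfer error, are simply skipped.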

    Thanks;

  • #2
    Hi,

    Do you see these errors on one particular slot, or do they happen on all slots?
    Can you repeat the test using a different machine to make sure the PCIe Test Card is working as expected?
    Can you run the loopback test for 10 minutes and send us the log file?



    • #3
      Originally posted by HamidK (PassMark)
      Hi,

      Do you see these errors on one particular slot, or do they happen on all slots?
      Can you repeat the test using a different machine to make sure the PCIe Test Card is working as expected?
      Can you run the loopback test for 10 minutes and send us the log file?

      I'm sorry that this description is going to be long-winded, but it will hopefully give you some background for the data I'm attaching.

      I suspect that my motherboard is having some sort of intermittent hardware issue.

      I am seeing two intermittent problems on my motherboard, with no obvious pattern.

      1) Intermittent Event 129s. These seem to say that my M.2 NVMe SSD (mounted in an adapter card in slot 6) was reset by the driver because of a timeout. This happens in clusters, at variable intervals of 7 to 14 days, and as far as I can tell it happens with the M.2 drive in any slot. (This is a Samsung 960 Pro; I also tried a Samsung 950 Pro with the same results.)

      and

      2) I get occasional WHEA 18 events and Bugcheck 124 crashes complaining about a machine check error. These are about 14 to 34 days apart and cannot be produced on demand.

      I suspected that these might be related. My system seems to run OK between problems. Stress tests like FurMark, Prime95, and LinX don't produce any errors.

      So I bought your PCIe test card to see what it might show. Here's what I've seen so far.

      A) The errors in my first post are from when I first put the PCIe Test card in slot 7. I was seeing a lot of complaints right away about several types of non-fatal errors.

      I got a few Event 129 'SECNVMe' events from the M.2 driver while I was running a linear read test on the drive around this time.

      I have a 2 minute PCIe test log from this time named PCIe-Errors-2.txt. It wasn't hard to see lots of errors in the test logs from multiple runs.

      B) By the next day, when I tried things again, I was not seeing nearly as many errors reported by the test card, and the M.2 drive didn't seem to be getting any Event 129 resets. (I didn't move the card from its slot.)

      I ran a 10-minute test and saved the log as PCIeTestlog-10min.txt. I only see a few 'Completer Abort Errors'.

      I don't know why things have improved, but I guess that might match the pattern I was seeing, where my system runs OK for a while but intermittently fails for no obvious reason. Reported voltages seem about the same (all green except for the external power connector, which is not plugged in).

      C) I moved the PCIe test card to slot 5, and now I am able to get a run of 10 minutes with zero errors.

      This is in PCeTestlog.-slot-5.txt, and I ran this test soon after the B) test.

      So, I would guess that either my system or the test card is improved today.

      I can try to put the card in a different system, or maybe move it back to slot 7 to see how it reports.


      • #4
        Yes, we think it is worthwhile to try the card in a known good machine. This should help narrow down the problem.



        • #5
          Originally posted by David (PassMark)
          Yes, we think it is worthwhile to try the card in a known good machine. This should help narrow down the problem.

          I moved the card back to slot 7, and I'm seeing no errors (even though I was seeing some errors there the other day).

          I have a question about voltages, though. I hooked up my Fluke 189 to the card, using the 'Ground' pin on the card as the reference, and measured the voltage at the 12 V and 3.3 V pins extending out of the card. (Looking at the voltage LED side with the x4 connector pointing down, the 12 V pin is at the lower left and the 3.3 V pin is at the upper right.)

          The 12V I measured at the card test pin was different than I expected. (This is without the card being connected to an external 12V connection).

          12 V pin on card (Fluke meter):              11.538 V
          12 V reported by MB (AIDA64):                12.096 V
          12 V reported by PCIe Test program:          12.0 V - 12.1 V
          12 V at EATX 24-pin MB connector (meter):    12.068 V

          I'm a little confused as to why the 12V at the test pin measured noticeably less than what is reported by the PCIe Test program.

          The 3.3 V pin and the PCIe Test program seemed to match much better (PCIe Test says 3.3 V - 3.3 V; the meter reads 3.3047 V).

          Is this expected?

          All the test card's voltage LEDs are green (except the two for the external power connector, which isn't hooked up).

          Thanks;



          • #6
            Sometimes a noisy power supply can also lead to these types of intermittent failures. It might be worth changing the PSU.
            Regarding your measurement, the 12 V test pin is connected to pin B2 on the PCIe slot via two diodes, and a reading of 11.4 V to 11.6 V indicates a correct voltage on B2 (the difference is the voltage drop across the two diodes).
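As a rough check of that explanation against the numbers reported earlier in the thread, the arithmetic can be sketched as below. It assumes the voltage at pin B2 is about the same as the 12.068 V measured at the EATX 24-pin connector (trace and connector losses are ignored) and that the drop is shared equally by the two diodes; the function names are made up for illustration.

```python
def per_diode_drop(slot_voltage, pin_voltage, diodes=2):
    """Average forward drop per diode between the slot and the test pin."""
    return (slot_voltage - pin_voltage) / diodes


def within_expected_window(pin_voltage, low=11.4, high=11.6):
    """True if the test-pin reading falls in the 11.4 V to 11.6 V range quoted above."""
    return low <= pin_voltage <= high


# Values from the thread. Using the 24-pin connector reading as a stand-in
# for the voltage on slot pin B2 is an assumption.
V_SLOT = 12.068  # measured at the EATX 24-pin connector
V_PIN = 11.538   # measured at the card's 12 V test pin

if __name__ == "__main__":
    drop = per_diode_drop(V_SLOT, V_PIN)
    print(f"Total drop: {V_SLOT - V_PIN:.3f} V, per diode: {drop:.3f} V")
    print("Within expected window:", within_expected_window(V_PIN))
```

Under those assumptions the total drop is about 0.53 V, or roughly 0.265 V per diode, and the 11.538 V reading sits inside the stated window.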

