
Network cable errors and how to best catch them.


  • #1

    Running Linux BIT v3_1008, I notice that there exist several conditions where the Ethernet cable can become unplugged without BIT reporting an error. This seemed odd, so I did some testing and found that because I had turned off "fail on any dropped packets" and left the timeout and percentage at their default values (2000 ms and 1%, I think), a condition can exist where BIT does not consider the error significant enough to flag a failure.

    I was able to reproduce this, and it has me very concerned about the validity of these tests, as any failure later in the test will not be recognized as a significant error. The steps I take are:

    1) Start BIT running with all cables connected.
    2) Let several thousand packets go through.
    3) Disconnect Ethernet cable.

    Unless a significant amount of time is allowed for error packets to accumulate, the ratio of failed packets to good packets never gets high enough to flag a failure. Even setting the timeout very low doesn't help if the error occurs late in the test cycle (I usually run for 5 minutes). I can only set the percentage as low as 1%, and that is too high for this error.
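    As a rough sanity check of this behaviour, here is a small sketch of the math. The ratio formula bad/(good + bad) and the 1% trip point are my assumptions about how BIT decides, based on what I observed, not its documented algorithm:

```python
# Assumed model: a failure is flagged once bad / (good + bad) >= threshold,
# and after the cable is unplugged every packet costs one full timeout.
def seconds_until_flagged(good_packets, timeout_ms=2000, threshold=0.01):
    """Seconds after an unplug before the assumed ratio trips."""
    bad = 0
    while bad / (good_packets + bad) < threshold:
        bad += 1
    return bad * timeout_ms / 1000.0

# With a few minutes of good traffic already logged, the wait is long:
# after 30,000 good packets, 304 timeouts (about 10 minutes at 2000 ms
# each) are needed before the ratio ever reaches 1%.
print(seconds_until_flagged(30_000))  # 608.0
```

    Under this model, the longer the test has run cleanly, the longer a late cable failure goes unreported.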

    I find this completely unacceptable, as the main purpose of this testing is to catch parts failing after they warm up. One or two missed packets is acceptable, but there doesn't seem to be a way to specify how many packet errors are acceptable.

    I took a bunch of screen captures if more detail is required.



  • #2
    The error ratio is calculated from the number of bad packets since the last successful send, so a minimum number of bad packets is needed before the ratio is triggered. This is also affected by how long the test has been running without an error: the more packets sent successfully, the more bad packets are required to trigger the error condition.

    For example, at 1% and a 2000 ms timeout, from the start of the test 100 packets would need to be sent, and if each of those times out, that would take about 3.5 minutes. If the test had already sent 1,000,000 packets, then 10,000 bad packets would be needed to trigger the error condition. A smaller timeout would reduce the time spent waiting for the error condition.
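    The arithmetic in that example can be sketched like this (helper names are mine; it simply restates the numbers above, assuming the required bad-packet count is the threshold percentage of packets already sent):

```python
# Assumption: bad packets needed = threshold % of packets already sent,
# and each bad packet costs one full timeout period.
def bad_packets_needed(packets_sent, threshold_pct=1.0):
    return int(packets_sent * threshold_pct / 100)

def minutes_waiting(bad_packets, timeout_ms=2000):
    return bad_packets * timeout_ms / 1000 / 60

print(bad_packets_needed(1_000_000))   # 10000 bad packets needed
print(round(minutes_waiting(10_000)))  # 333 minutes of timeouts
```

    Note that a smaller timeout shrinks only the per-packet wait, not the number of bad packets required, which is why the ratio approach stays slow for long-running tests.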

    We're currently updating BurnInTest to work with some new hardware, so we'll look into whether there is a reliable way to monitor the connection status in Linux.


    • #3
      Thanks for the reply. I'm concerned that a hardware failure (say, due to temperature) after a few minutes of running would not be caught using this method. I can't use the "fail on any missed packet" option because I see an occasional dropped packet, which is acceptable. Is it possible to add an entry field for the maximum number of lost packets allowed? That way one could specify a number (say 10) as too many and not have to rely on a percentage to trigger a failure.