Windows 11 Hive Corruption Error during BIT?


  • Windows 11 Hive Corruption Error during BIT?

    I am logging Hive Recovery errors (image attached) during BIT testing of new builds. It appears that the systems attempt to create a Restore Point and fail. There should be no driver or OS updates happening, as the systems are disconnected from the network during testing; therefore, Restore Point creation shouldn't be triggered. Could the Restore Point creation be a response to a driver loaded by BIT?

    The Hive Recovery errors occur about 30% of the time and are relatively new (the last 6 months or so). Sometimes I can reload the OS (Windows 11) and test again with no issues. Sometimes the issue persists. There is a very predictable temporary spike in temperature during the attempt to create the Restore Point (whether or not Hive Recovery errors are logged). The event always happens between 5 and 20 minutes into the 12-hour test and doesn't happen again. BIT never logs errors, and the systems pass the configured tests (CPU, RAM, 2D Video @ 80%) with nothing unusual in the BIT log.

    I suppose that I could disable System Restore during the testing, but I think that I would lose previous Restore Points and would rather not do that if at all possible.

    This has spanned a couple versions of BIT. It is not limited to the most recent version.

    Any thoughts?

  • #2
    Hive recovery events aren't a known issue. As far as we know, this has never been reported before.
    As we have never seen this before, I did some Googling. A huge number of people report similar issues, but none of the reports seem to be related to BurnInTest.
    In the cases I read about, the cause always seemed to be hardware failure, most commonly RAM or disk.

    BurnInTest makes no attempt to create a restore point. And even if it did, it doesn't really make sense that this would happen 20 minutes into a test run. (A device driver is used to collect the system information, like RAM model numbers and clock speeds, but this happens at BIT launch, not after 20 minutes.) We've never observed this provoking the creation of a restore point.

    Are these machines using a clean Windows install?
    If you just turn the machines on and wait an hour, with the machines idle, does it create a restore point?



    • #3
      Nothing is 100%, but I do not believe this to be strictly hardware-related. They are clean installs (very vanilla), and the systems pass BIT and NEVER show the error again. They do not exhibit hardware-related issues or failures afterwards. It only happens during the testing.

      I misspoke about the 20 minutes. We had one system not exhibit the behavior until 40+ minutes into the 12 hours on Saturday.

      A system can have the failure event, get reloaded from scratch (same hardware, drivers, updates, everything), and complete the restore point during BIT the next time. You can still see the restore points being created in Event Viewer (and the temp spike in BIT), but the restore point succeeds. To me, it seems like the OS is having a hard time creating the restore point because of the resources being used up by BIT (which makes sense). The question I have is why the systems are triggering a restore point creation during BIT (at seemingly random times). The events always happen early in testing (and not again). I believe that Task Scheduler creates restore points at specific times each day, but that should be the same time every day and shouldn't happen if recent restore points have already been made.

      It doesn't seem to be random or isolated as I have personally had it happen on dozens of systems, but it could be isolated to our specific situation. The hardware (and drivers) that are being tested are the same each time (all of the systems we build are nearly identical). We don't run anything odd in the testing, just CPU, RAM, and 2D Video at 80% for 12 hours, no SSD testing in BIT.

      We will have to do more experimentation with the timing of the events. Thus far, we build the systems, install the OS, drivers, and updates, restart, and immediately start BIT. I will have to check:

      A. Is a System Restore point created during BIT on a system that has already been running for some time?

      B. If we wait an hour before running BIT, does the system create a restore point before BIT starts?
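
One way to compare the outcomes of experiments A and B is to line up the restore point event times (copied by hand from Event Viewer) against the BIT start time. The following is a minimal sketch only: the function name, the 12-hour default, and the sample timestamps are all hypothetical and are not taken from any real log.

```python
from datetime import datetime, timedelta


def classify_events(test_start, event_times, test_hours=12):
    """Label each event as before, during, or after the burn-in run.

    Returns a list of (timestamp, phase, minutes_from_test_start) tuples,
    sorted by timestamp. A negative offset means the event happened
    before the test started.
    """
    test_end = test_start + timedelta(hours=test_hours)
    results = []
    for t in sorted(event_times):
        if t < test_start:
            phase = "before"
        elif t <= test_end:
            phase = "during"
        else:
            phase = "after"
        offset_min = round((t - test_start).total_seconds() / 60, 1)
        results.append((t, phase, offset_min))
    return results


# Hypothetical example values, not real measurements.
bit_start = datetime(2024, 1, 6, 9, 0)
restore_point_events = [
    datetime(2024, 1, 6, 8, 30),
    datetime(2024, 1, 6, 9, 12),
    datetime(2024, 1, 6, 9, 45),
]

for stamp, phase, offset in classify_events(bit_start, restore_point_events):
    print(stamp.isoformat(), phase, offset)
```

If every event lands in the "before" bucket once a delay is added, that would suggest the restore point is tied to post-install activity rather than to BIT itself.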

      The odd thing is that this is relatively new to us. We hadn't experienced this behavior until relatively recently (the last several months). That doesn't necessarily point to BIT. It could be a change in hardware, drivers, or even Windows updates. It is just not something we are used to seeing.



      • #4

        Maybe Windows is running Windows Update and creating a restore point for that.

        If there is a scheduled job to create a restore point you could try disabling that (my Win11 machine doesn't have a scheduled task for this).
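
A minimal sketch of how such a scheduled task could be disabled for the duration of a test and re-enabled afterwards, using the standard `schtasks /Change` command from Python. The task path below is an assumption: `\Microsoft\Windows\SystemRestore\SR` is where the System Restore task usually lives, but as noted above it may not exist on every Windows 11 machine. The helper names are hypothetical, and the command needs an elevated prompt.

```python
import subprocess

# Assumption: usual location of the System Restore scheduled task.
# Check Task Scheduler first; the task may be absent on some installs.
SR_TASK = r"\Microsoft\Windows\SystemRestore\SR"


def build_schtasks_cmd(task_name, enable):
    """Build the schtasks argument list to enable or disable a task."""
    action = "/ENABLE" if enable else "/DISABLE"
    return ["schtasks", "/Change", "/TN", task_name, action]


def set_task_enabled(task_name, enable):
    """Run schtasks (Windows only, elevated). Returns True on exit code 0."""
    result = subprocess.run(
        build_schtasks_cmd(task_name, enable),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Calling `set_task_enabled(SR_TASK, False)` before the run and `set_task_enabled(SR_TASK, True)` afterwards only toggles the scheduled job. It does not turn off System Restore itself, so existing restore points should be left alone, though that is worth confirming on a test machine.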



        • #5
          We have run a few tests. If we wait 1 hour after completing the build and updating Windows, we still see the temp spike and failed System Restore Point creation events.

          If we wait 2 hours, we don't see the spike or the consequent failure events. It appears from the logs that within those first two hours, even after running all the updates and disconnecting the system from the network (unplugging Ethernet), the system still runs several update tasks and installations, thus triggering Restore Point activities.

          What concerns me is that this hasn't always happened. We have seen the spiked temperatures (CPU) for some time, and after connecting those to the System Restore events, we weren't overly concerned. The spikes weren't too high and they were very quick. What we haven't seen (until relatively recently) are the Restore Point failures and potential corruption warnings in Event Viewer.
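
The wait-before-testing finding above can be written down as a simple gate in a build checklist script. This is a sketch under stated assumptions: the 2-hour settle period comes from the tests described in this post (1 hour was not enough, 2 hours was), while the function names and the idea of recording when updates finished are hypothetical.

```python
from datetime import datetime, timedelta

# From the tests in this thread: 1 hour was not enough, 2 hours was.
SETTLE_PERIOD = timedelta(hours=2)


def earliest_start(updates_finished, settle=SETTLE_PERIOD):
    """Earliest time the burn-in should begin after updates completed."""
    return updates_finished + settle


def ready_for_burn_in(updates_finished, now, settle=SETTLE_PERIOD):
    """True once the settle period since the last update has elapsed."""
    return now >= earliest_start(updates_finished, settle)


# Hypothetical example: updates finished at 13:00.
finished = datetime(2024, 1, 6, 13, 0)
print(earliest_start(finished))
print(ready_for_burn_in(finished, datetime(2024, 1, 6, 14, 0)))
```

The settle period is a parameter because the right value may depend on the Windows build and on how many updates were installed.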

