MemTest86 V4.3 released


    MemTest V4.3 Released

    Version V4.3 of MemTest86 was released today.

    The summary of the changes is as follows.

    • Changed default CPU selection mode to round robin. Running all CPUs at once has been shown to cause false positives on a number of systems. (See more detail below)
    • Fixed a bug that could cause the program to go into a tight loop that could not be escaped when setting certain memory ranges to test.
    • Fixed a bug displaying the memory location of individual errors. The values after the decimal point in the MB readout were incorrect.
    • Fixed a bug in configuring upper and lower memory limits. Previously, lower limits equal to or greater than 2GB would not work, nor would some other more obscure configurations.
    • Added a misc option to display the system's memory map.
    • Fixed a bug that would cause the number of passes to not correctly reset after changing the selected tests.
    • Added missing source code to some of the download packages.
    • Fixed a bug in test 8 causing a single error to cascade into multiple errors.
    • Fixed a bug causing the average error bits to be incorrect once the errors had maxed out at 65k.
    • Fixed a bug that prevented test 10 from being selected as a single test to run.
    • Fixed bug displaying individual test error counts.
    • Fixed bug making overall errors 10x what they should be.


    More detail on CPU test mode changes

    For the following discussion I am treating multiple CPUs as the same as multiple cores in a single CPU. In V4.2 the default CPU test mode was multi-threaded. In simple terms this meant that when more than one CPU was present, the available RAM was split up into ranges and each memory range was tested by a different CPU, up to a maximum of 32 CPUs. This had the advantage of testing the memory faster and also testing the different memory buses when more than one was available.
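
    As a rough illustration of the range splitting described above, here is a minimal sketch in C. It is not the MemTest86 source: the struct, the function name and the choice to give any remainder to the last CPU are assumptions made for the example. Only the idea of one range per CPU and the cap of 32 CPUs come from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 32  /* V4.2 split the work across at most 32 CPUs */

/* Hypothetical descriptor for the slice of RAM one CPU tests: [start, end). */
struct mem_slice {
    uint64_t start;
    uint64_t end;
};

/* Split [base, base + len) into one contiguous slice per CPU.
 * Returns the number of slices written to out (at most MAX_CPUS). */
static int split_memory(uint64_t base, uint64_t len, int num_cpus,
                        struct mem_slice out[MAX_CPUS])
{
    assert(num_cpus > 0);
    if (num_cpus > MAX_CPUS)
        num_cpus = MAX_CPUS;

    uint64_t chunk = len / (uint64_t)num_cpus;
    for (int i = 0; i < num_cpus; i++) {
        out[i].start = base + chunk * (uint64_t)i;
        /* Assumption: the last CPU also takes any remainder of the division. */
        out[i].end = (i == num_cpus - 1) ? base + len : out[i].start + chunk;
    }
    return num_cpus;
}
```

    In the multi-threaded mode each CPU would then run the same test over its own slice at the same time, which is where the speed-up came from.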

    The problem, however, is that on a small number of systems various bad behaviors were observed. Examples of these bad behaviors are documented in these posts:
    Lockups near the start of testing.
    Test #3 reports a large number of errors and then sometimes freezes.
    Multi-CPU mode test fails on Test #3 but passes in single threaded mode.
    Test #6 reports a small number of errors in Multi-CPU mode.

    We purchased a couple of additional test machines, matching the specs of the machines that had the errors and reproduced a number of these faults in house. After reproducing the fault we were initially optimistic about fixing the problem. Too optimistic.

    It turned out the problem was more elusive than we expected. Basically, when multiple threads are running, the CPU registers appear to spontaneously become corrupted during the test, causing a flood of errors. People generally encountered the problem at Test #3, as this was the first test to use multi-threading, but the other tests had similar problems. We don't know if this is a CPU erratum, a bug in the way multi-threading is set up, non-maskable hardware interrupts messing up the CPU's state, or something more subtle. Part of the problem is also that the available debugging techniques are extremely basic, as MemTest86 runs without an operating system being loaded. Debugging the code is probably even worse than it was in the DOS 3.3 days.

    So the work-around implemented was to change the default CPU selection mode to round robin. In this mode only one CPU is used at a time, but after each test the CPU in use is rotated. So all CPUs will still get used, but only after a longer period of time.
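
    The rotation can be pictured with a few lines of C. This is only a sketch of the round robin idea, assuming CPUs are numbered from 0. The function name is made up for the example and is not taken from the MemTest86 code.

```c
#include <assert.h>

/* Hypothetical helper: given the CPU that ran the test that just finished,
 * return the CPU that should run the next test. Only one CPU is active
 * at any time. */
static int next_cpu(int current_cpu, int num_cpus)
{
    assert(num_cpus > 0);
    assert(current_cpu >= 0 && current_cpu < num_cpus);
    return (current_cpu + 1) % num_cpus;  /* wrap back to CPU 0 after the last */
}
```

    With 4 CPUs the active CPU would go 0, 1, 2, 3, 0, 1 and so on, one step per test, so every CPU gets its turn after a full cycle of tests.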

    As MemTest86 V4 is open source, maybe someone else can take a look at the multi-threading code and get it to work. For the moment, however, people need to be aware that the multi-threading option can result in false memory errors being reported on some hardware. We estimate this affects maybe ~5% of machines. Xeons with ECC RAM seemingly have more problems, but the problem isn't limited to Xeons.

    We are going to have another hack at Multi-threading in V5.0 (which to a large extent will be a re-write of the code).


    More detail on Keyboard support

    While working on the release of V4.3 we also became aware of a number of issues around keyboard support. We have known for some months there are problems with the Mac, but it turns out it is more widespread.

    The way MemTest works at the moment is that it reads keyboard scan codes from two I/O ports (ports 0x60 and 0x64, to be exact). This makes use of the AT keyboard controller interface, which has been around since about 1984.
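
    For readers who have not seen this interface, here is a minimal sketch in C of polling the AT keyboard controller. It is not the MemTest86 code. The port numbers are hexadecimal (0x60 for data, 0x64 for status), and bit 0 of the status byte means a byte is waiting to be read. The function pointer type and the two stand-in port readers are assumptions added so the logic can be tried on a normal hosted system. On bare metal the port read would be an x86 `in` instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KBD_DATA_PORT   0x60  /* scan codes are read from here */
#define KBD_STATUS_PORT 0x64  /* controller status register */
#define KBD_STATUS_OBF  0x01  /* bit 0: output buffer full, a byte is waiting */

/* Hypothetical hook for a one-byte port read (not a MemTest86 name). */
typedef uint8_t (*port_read_fn)(uint16_t port);

/* Returns the pending scan code (0..255), or -1 if nothing is waiting. */
static int poll_scan_code(port_read_fn read_port)
{
    assert(read_port != NULL);
    if ((read_port(KBD_STATUS_PORT) & KBD_STATUS_OBF) == 0)
        return -1;
    return read_port(KBD_DATA_PORT);
}

/* Stand-in port reader: pretends the 'A' key make code (0x1E in scan code
 * set 1) is waiting in the controller's output buffer. */
static uint8_t fake_port_key_waiting(uint16_t port)
{
    return port == KBD_STATUS_PORT ? KBD_STATUS_OBF : 0x1E;
}

/* Stand-in port reader: pretends no key data is waiting. */
static uint8_t fake_port_idle(uint16_t port)
{
    (void)port;
    return 0;
}
```

    If the firmware does not emulate the AT controller for a USB keyboard, a loop built on this kind of polling would simply never see the status bit set, which matches the "keyboard does nothing" symptom.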

    Once the general switch to USB keyboards took place, the BIOS was set up to emulate the behaviour of the AT keyboard controller. This means that all your software (like the DOS operating system) would keep working via this hardware emulation.

    This emulation process is covered in this document,
    Universal Serial Bus PC Legacy Compatibility Specification (http://www.otdl.com/USB_LE9.PDF)

    The problem is that some PCs, and it seems most newer Macs, don't emulate an AT keyboard any more. For most PCs the fix is trivial: there is a setting in the BIOS to enable legacy keyboard support. In fact most PCs have this on by default, so MemTest, DOS and Linux boot loaders all just work. Macs don't have this BIOS configuration option, so it seems there is no quick fix for the Mac.

    So, in short, Apple didn't follow the USB standard regarding emulation so the keyboard doesn't work. Maybe they did this deliberately, as this is an easy way to complicate other operating systems being loaded.

    The real solution would be to write some code to detect the USB devices on the system and then talk to the keyboard directly via the USB protocols. This is complicated by the fact that there are several USB host controller standards: OHCI and UHCI (USB 1.x) and EHCI (USB 2.0). We made an attempt to write code to directly interface with USB keyboards, rather than use the BIOS interface. It is technically possible, but rather complex. The USB host controllers need to be found, the USB devices need to be detected and the USB protocol needs to be implemented. So it is probably thousands of lines of code just to read in a key press. We started to pull out bits of code from Linux, but there was no clean separation of the I/O code and we didn't want to pull in all of the Linux kernel just to support a keyboard.

    So we have abandoned this attempt, as it is just too much work. Again, as V4 is open source, anyone is welcome to have a go at adding support themselves.

    Our preferred solution, which we are investigating in parallel as part of the V5 development, is to develop a pure UEFI-based solution. This will never provide a solution for the older BIOS-only machines. The workaround is to just allow MemTest86 to boot and run without any keyboard interaction, then power the machine down when you are done.

    In addition to all of this, we have also seen some UEFI/BIOS implementations that only do a partial AT keyboard emulation, or maybe they are buggy; we don't really know. For example, we have seen the emulation only work when CPU/Core 0 is the active core. In these cases the keyboard will appear not to respond until CPU0 gets its turn in the round robin, which might appear to the user as a (very) delayed keyboard response. We only see these types of problems getting worse over time as all keyboards move to the USB interface and BIOS is totally replaced by UEFI.

    Again, the medium / long term solution is to do a new version of MemTest86 that is pure UEFI. We are working on this at the moment.