Minnowboard Turbot multi-CPU failures

  • Minnowboard Turbot multi-CPU failures

    It was reported to us, and we have since verified, that running memtest86 (7.4) in multi-CPU mode on a Minnowboard Turbot shows errors. Running in single CPU mode (on any core) does not show errors. These results were repeated across multiple boards with both 2-core and 4-core CPUs.

    The Turbot boards are based on the Intel Bay Trail SoC platform with memory down (fixed memory). We tested an Intel NUC DN2820FYKH which is also based on the Bay Trail SoC family, but with DIMM memory, and it also failed with the same characteristics.

    We ran memtest86+ for comparison (yes, I understand the history there). This required us to use a Coreboot BIOS with legacy support instead of the Intel UEFI BIOS normally loaded on the Turbot boards. Those tests passed, even in multi-CPU modes. We also ran multiple instances of memtester under Linux, again with no errors reported. This is leading us to believe the hardware is fine.

    Errors are reported in tests 3, 4, 5, 8, 9, and 13, with the error mask only showing the bottom 8 bits affected. This is similar to issues reported in other threads like the one linked to this post.
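As an illustration of what "only the bottom 8 bits affected" means, here is a minimal Python sketch. It assumes the error mask is the XOR of the pattern written and the value read back; the sample values and the helper names `error_mask` and `confined_to_low_byte` are hypothetical and are not taken from the attached report or from MemTest86 itself.

```python
def error_mask(expected: int, actual: int) -> int:
    """Bits that differ between the pattern written and the value read back."""
    return expected ^ actual


def confined_to_low_byte(mask: int) -> bool:
    """True when at least one bit is wrong and every wrong bit lies in bits 0-7."""
    return mask != 0 and (mask & ~0xFF) == 0


# Hypothetical values for illustration only (not from the test report).
expected = 0xFFFFFFFF
actual = 0xFFFFFF00

mask = error_mask(expected, actual)
print(f"mask={mask:#010x} low-byte-only={confined_to_low_byte(mask)}")
```

A mask with any bit set above bit 7 would make `confined_to_low_byte` return `False`, which is how a log of masks could be checked against the pattern described above.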

    I attached an HTML test report (renamed to .log so that the forum would accept the upload).

    We can quickly ship you a Turbot board if you would like to duplicate the issue.

  • #2
    The file you attached was the test report, not the debug log. Can you email us, or post, the debug log?
    A debug log file (MemTest86.log) is automatically created and updated while MemTest86 v7 is running. This file is saved in the 'EFI/BOOT' directory in the USB drive's first partition.

    I don't think the other forum post you linked is the same issue. That issue was with V4 of the software and was resolved in V4.3.

    The test report you attached shows an error at memory address 0. So there is a good chance this is a UEFI BIOS bug with the memory map. We'll know more if we can get the debug log.


    • #3
      Attached is a fresh test log.

      • #4
        From the log, this is the start of the memory map, as reported by BIOS.

        2017-03-28 00:00:22 - 0x000000000000 - 0x00000008EFFF (572KB) {Free Memory}
        2017-03-28 00:00:22 - 0x00000008F000 - 0x00000008FFFF (4KB) {ACPI Non-volatile}
        2017-03-28 00:00:22 - 0x000000090000 - 0x00000009DFFF (56KB) {Free Memory}
        2017-03-28 00:00:22 - 0x00000009E000 - 0x0000000FFFFF (392KB) {Reserved Memory}
        2017-03-28 00:00:22 - 0x000000100000 - 0x00001FFFFFFF (511MB) {Free Memory}

        The first entry looks a bit strange: it reports address 0 as free, available RAM.
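For anyone who wants to check a similar log, the following Python sketch parses the five memory map lines quoted above, recomputes each region's size from its start and end addresses, and lists any region typed as free memory that begins at address 0. The regular expression and the function name `parse_memory_map` are assumptions made for this example; they are not part of MemTest86. Whether a free region at address 0 is actually a firmware bug is the open question here, so the sketch only flags it.

```python
import re

# The five lines quoted from the debug log above.
LOG = """\
2017-03-28 00:00:22 - 0x000000000000 - 0x00000008EFFF (572KB) {Free Memory}
2017-03-28 00:00:22 - 0x00000008F000 - 0x00000008FFFF (4KB) {ACPI Non-volatile}
2017-03-28 00:00:22 - 0x000000090000 - 0x00000009DFFF (56KB) {Free Memory}
2017-03-28 00:00:22 - 0x00000009E000 - 0x0000000FFFFF (392KB) {Reserved Memory}
2017-03-28 00:00:22 - 0x000000100000 - 0x00001FFFFFFF (511MB) {Free Memory}
"""

# Assumed line layout: "<timestamp> - <start> - <end> (<size><unit>) {<type>}"
LINE_RE = re.compile(
    r"- (0x[0-9A-Fa-f]+) - (0x[0-9A-Fa-f]+) \((\d+)(KB|MB)\) \{([^}]+)\}"
)


def parse_memory_map(text):
    """Return one dict per region with computed and logged sizes in bytes."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        unit = 1024 if m.group(4) == "KB" else 1024 * 1024
        regions.append({
            "start": start,
            "end": end,
            "size": end - start + 1,            # end address is inclusive
            "logged_size": int(m.group(3)) * unit,
            "type": m.group(5),
        })
    return regions


regions = parse_memory_map(LOG)

# Regions reported as free that begin at physical address 0.
free_at_zero = [r for r in regions
                if r["type"] == "Free Memory" and r["start"] == 0]

for r in regions:
    ok = "ok" if r["size"] == r["logged_size"] else "SIZE MISMATCH"
    print(f"{r['start']:#014x}-{r['end']:#014x} {r['type']:<20} {ok}")
print("free regions starting at address 0:", len(free_at_zero))
```

The sizes printed in the log agree with the address ranges, so the log is internally consistent; the only entry flagged is the first one.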

        Are you the developers of the board / BIOS? Are you in a position to check this is correct?


        • #5
          It's an ADI/Intel joint development. The BIOS was developed by Intel. Your findings are being passed back to the Intel BIOS team.


          • #6
            Update from Intel BIOS team:
            • “We narrowed down the issue to MP CPU driver (multi-processor UEFI driver) related. If we replace the MP CPU driver with another version in UEFI firmware, error reporting is gone (we can’t just switch to this version, long story). This ruled out the hardware issue (both CPU and memory), I think that is ADI most worry about. And the driver will only take effect when booting time, no effect after OS runs. It is NOT necessary a bug of the MP CPU driver, properly since the memtest tool use MP protocol in the wrong way. We will continue looking into it but since we don’t have full source of the memtest tool, it might be problematic.”
            The main points I'm hearing from them:
            • It’s not a memory error. An error is being reported from the MP CPU driver (multi-processor UEFI driver) caused by an MP protocol interface issue with the memtest86 tool.
            • They have limited debug capability without the source code for the memtest tool. They're looking into sending details of the protocol.


            • #7
              “It is NOT necessary a bug of the MP CPU driver, properly since the memtest tool use MP protocol in the wrong way”
              We don't believe this is the case. Do they have any evidence?
              We found similar bugs in other BIOS releases (e.g. from ASUS) and they eventually fixed the bugs.

              We can provide snippets of source code if it helps them. You can contact us directly if you need this.


              • #8
                The Intel BIOS team has root-caused the software issue that is causing the erroneous memory errors to be reported. Details of the fix will be coming from Intel.


                • #9
                  OK, I'll be interested to see what the fix was. For other users who might encounter the same problem, it would be good to know which firmware version it is fixed in.

                  For the record, this was a description of the BIOS failure, based on information in the log.

                  When EFI_MP_SERVICES_PROTOCOL->StartupThisAP() is called with the WaitEvent parameter specified, the event does not get signalled even after the AP finishes execution, which prevents multitasking from working correctly.

                  (There may also be additional problems with the memory map when in non-CSM mode.)
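To make the symptom easier to picture, here is a toy model in Python threads. It is an analogy only: `startup_this_ap` and `run_test` are hypothetical names, a `threading.Event` stands in for the UEFI event passed as WaitEvent, and the `signals_on_completion` flag stands in for the difference between firmware that signals the event and firmware that does not. It is not the real UEFI API and not MemTest86 source code.

```python
import threading


def startup_this_ap(procedure, wait_event, signals_on_completion=True):
    """Toy stand-in for a non-blocking dispatch to another processor.

    Returns a 'finished' flag (ground truth that the procedure ran)
    and the worker thread.
    """
    finished = threading.Event()

    def ap_main():
        procedure()
        finished.set()
        if signals_on_completion:
            # Well-behaved firmware signals the caller's event here.
            wait_event.set()
        # The failure described above corresponds to skipping this signal.

    worker = threading.Thread(target=ap_main)
    worker.start()
    return finished, worker


def run_test(signals_on_completion, timeout=0.2):
    """Dispatch a no-op procedure and wait on the event with a timeout."""
    wait_event = threading.Event()
    finished, worker = startup_this_ap(
        lambda: None, wait_event, signals_on_completion
    )
    signalled = wait_event.wait(timeout)  # False if the timeout expired
    worker.join()
    return finished.is_set(), signalled


print("signalling firmware:    ", run_test(True))
print("non-signalling firmware:", run_test(False))
```

In the second case the procedure has run to completion, yet the caller never sees its event signalled, so a caller that depends on the event to collect results from the other processors cannot make progress.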


                  • #10
                    This is the information posted to the Minnowboard mailing list:
                    Hello all,

                    We have tracked down the issue and found that memtest86 tool is using a function call with rare but legitimate NULL value which causes memory test failure. Currently we are working to update BIOS firmware with fix to prevent this memory test function call failure from memtest86 tool. Again this is protocol interface issue between memtest86 and our FW driver, there is no hardware issue found with memory on Minnowboard.

                    Thank you, Keerock

                    http://lists.elinux.org/pipermail/elinux-minnowboard/Week-of-Mon-20170828/002616.html

                    The issue is also being tracked on GitHub here:

                    https://github.com/MinnowBoard-org/b...help/issues/60
