MTL-H 2D 3D GPGPU fail

  • MTL-H 2D 3D GPGPU fail

    Running BurnInTest on an Intel MTL-H platform reports 2D, 3D and GPGPU failures.
    Could you help us judge whether this is a hardware, driver, or BurnInTest settings problem?

    System information is shown below:
    BurnInTest 10.2 1011 (64-bit)

    System summary:
    Windows 11 Professional Edition build 22631 (64-bit),
    1 x Intel(R) Core(TM) Ultra 7 155H [3612.3 MHz],
    31.7GB RAM,
    Intel(R) Arc(TM) Graphics,
    2 x 954GB SSD,


    General:
    System Name: DESKTOP-0OQ9SGN
    Motherboard Manufacturer: Standard
    Motherboard Name: Standard
    Motherboard Version: Standard
    Motherboard Serial Number: Standard
    BIOS Manufacturer: American Megatrends International, LLC.
    BIOS Version: B.0.05A00
    BIOS Release Date: 12/10/2023
    BIOS Serial Number: Standard
    TPM: Available, V2.0
    Webcam HD Webcam

    CPU:
    CPU manufacturer: GenuineIntel
    CPU Type: Intel(R) Core(TM) Ultra 7 155H
    CPUID: Family 6, Model AA, Stepping 4
    Physical CPU's: 1
    Cores per CPU: 16
    Threads per CPU: 22
    P-Cores: 6
    P-Threads: 6
    E-Cores: 10
    E-Threads: 20
    Hyperthreading: Enabled
    CPU features: MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 DEP PAE Intel64 AES AVX AVX2 FMA3
    Clock frequencies:
    - Measured CPU speed: 3612.3 MHz
    Cache per CPU package:
    - L1 Instruction Cache: Unknown
    - L1 data cache: Not applicable
    - L2 cache: Not applicable
    - L3 cache: Not applicable

    Memory
    Total Physical Memory: 32435MB
    Available Physical Memory: 27263MB
    Memory devices:
    Slot 1:
    - 16GB DDR5 SDRAM PC5-44800
    - Samsung M425R2GA3BB0-CWMOD, serial#: 0x46805EC3, wk/yr: 33/2023
    - 1.1V, Clk: 2800.0MHz, Timings 45-45-45-90 (@ Max. freq.)
    Slot 2:
    - 16GB DDR5 SDRAM PC5-44800
    - Samsung M425R2GA3BB0-CWMOD, serial#: 0x46805F35, wk/yr: 33/2023
    - 1.1V, Clk: 2800.0MHz, Timings 45-45-45-90 (@ Max. freq.)
    Virtual memory: C:\pagefile.sys (allocated base size 32768MB)

    Memory SPD:
    DIMM#0
    Memory type: DDR5 SDRAM
    SPD revision: 1.0
    Manufacturer: Samsung
    Manufacturing date: Year: 2023, Week: 33
    Serial number: 46805EC3
    Part number: M425R2GA3BB0-CWMOD
    Clock speed: 2800.0 MHz
    Memory size: 16384 MB
    ECC: No
    Module voltage: 1.1V

    DIMM#1
    Memory type: DDR5 SDRAM
    SPD revision: 1.0
    Manufacturer: Samsung
    Manufacturing date: Year: 2023, Week: 33
    Serial number: 46805F35
    Part number: M425R2GA3BB0-CWMOD
    Clock speed: 2800.0 MHz
    Memory size: 16384 MB
    ECC: No
    Module voltage: 1.1V


    Graphics
    Intel(R) Arc(TM) Graphics
    Memory: 128MB
    Driver provider: Intel Corporation
    Driver version: 31.0.101.5179
    Driver date: 12-13-2023
    Monitor 1: 2880x1800x32 120Hz 192 DPI (Primary monitor)

    Disk volumes
    C: Local Drive, \\?\Volume{823f8512-384a-46c0-aa39-1e82e5e76cfd}\, NTFS, (476.08GB total, 382.27GB free)
    D: Local Drive, \\?\Volume{11c7ad2f-a8b5-4ef3-96e7-d4bdc82cffc5}\, New Volume, NTFS, (476.92GB total, 476.81GB free)

    Disk drives
    Disk drive: Model: YMTC PC300-1TB-B Serial: YMA21T0JA233740CNL (Disk: 1, SSD, Size: 953.86GB, Interface: NVMe, Volumes: D)
    Disk drive: Model: YMTC PC300-1TB-B Serial: YMA21T0JA233740C87 (Disk: 0, SSD, Size: 953.86GB, Interface: NVMe, Volumes: C)

    Optical drives

    Network
    Bluetooth Device (Personal Area Network) (Speed: 3Mb/s) (MAC: 30:F6:EF:70:FE:99)
    Intel(R) Wi-Fi 6E AX211 160MHz (MAC: 30:F6:EF:70:FE:95)

    Ports

    USB
    USB xHCI Compliant Host Controller
    USB xHCI Compliant Host Controller
    - Intel(R) Wireless Bluetooth(R)

    Thank you.

  • #2
    The relevant part of the log is:

    SERIOUS: 2023-12-17 00:06:00, 3D Graphics, An error occured during the DX12 3D test
    LOG NOTE: 2023-12-17 00:06:00, 3D Graphics, Unexpected error running DirectX 12 Test. Error Number 0x887A0005 (-2005270523)
    LOG NOTE: 2023-12-17 00:06:00, 3D Graphics, The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.
    LOG NOTE: 2023-12-17 00:06:00, 3D Graphics, DXGI_ERROR_DEVICE_REMOVED (0x887a0005) - Device Removed reason: DXGI_ERROR_DEVICE_HUNG (0x887a0006)
    SERIOUS: 2023-12-17 00:06:00, 2D Graphics, Error writing to GPU memory
    LOG NOTE: 2023-12-17 00:06:00, 2D Graphics, GPU Intel(R) Arc(TM) Graphics, Failed to write GPU buffer object (-5) - Block 73 / 100

    And the errors in the user interface were:
    GPGPU 993 196 Trillion FAIL 1 No operations reported in timeout period
    2D Graphics 5922 5.405 Trillion FAIL 1 Error writing to GPU memory
    3D Graphics 36 1.104 Million FAIL 1 An error occured during the DX12 3D test

    This happened at around 8 hours into the test.

    As noted in the log, from the application's point of view the entire video card disappeared (was removed from the system). The sub-reason was a hang. The event log also reports a crash in the video card device driver.
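
    To make the codes in the log easier to read, here is a minimal Python sketch that maps the DXGI values seen in this thread to their names and shows why 0x887A0005 is also logged as -2005270523 (the same 32-bit value read as a signed integer). The helper names `to_signed32` and `describe_hresult` are made up for this illustration and are not part of BurnInTest or DirectX.

```python
# DXGI device-loss codes that appear in the log excerpts in this thread.
DXGI_CODES = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
}


def to_signed32(hresult: int) -> int:
    """Reinterpret a 32-bit HRESULT as a signed integer (hypothetical helper)."""
    hresult &= 0xFFFFFFFF
    if hresult & 0x80000000:
        return hresult - 0x100000000
    return hresult


def describe_hresult(value: int) -> str:
    """Accept either the hex or the negative decimal form and name the code."""
    unsigned = value & 0xFFFFFFFF
    name = DXGI_CODES.get(unsigned, "unknown")
    return f"{name} (0x{unsigned:08X}, {to_signed32(unsigned)})"


if __name__ == "__main__":
    # The "Error Number" from the log, given in its decimal form.
    print(describe_hresult(-2005270523))
    # The "Device Removed reason" values from the two runs.
    print(describe_hresult(0x887A0006))
    print(describe_hresult(0x887A0007))
```

    The table only covers the three codes quoted in this thread; anything else is reported as "unknown" rather than guessed.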

    I assume that if you repeat the test, the same thing happens again at around the 8 hour mark?
    If so, it might be a resource leak in the video card driver that eventually causes a hang.

    There might be a few more details available if you turn on level 2 logging in BurnInTest. The log file will be much bigger, however.
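
    Since a level 2 log can get large, a small filter helps to find the first errors. The following Python sketch assumes the line layout seen in the excerpt above ("SEVERITY: timestamp, test name, message") and only recognises the two severities shown in this thread (SERIOUS and LOG NOTE); a real level 2 log may contain other line types, which this sketch simply skips. The function names are illustrative only.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "SERIOUS: 2023-12-17 00:06:00, 3D Graphics, message"
LINE_RE = re.compile(
    r"^\s*(?P<severity>SERIOUS|LOG NOTE):\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\s+"
    r"(?P<test>[^,]+),\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for a recognised log line, or None for anything else."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "severity": match["severity"],
        "time": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        "test": match["test"].strip(),
        "message": match["message"].strip(),
    }


def serious_entries(lines):
    """Yield only the SERIOUS entries, in the order they appear."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["severity"] == "SERIOUS":
            yield entry


if __name__ == "__main__":
    # Two lines copied from the log excerpt in this thread.
    sample = [
        "SERIOUS: 2023-12-17 00:06:00, 2D Graphics, Error writing to GPU memory",
        "LOG NOTE: 2023-12-17 00:06:00, 3D Graphics, The GPU device instance has been suspended.",
    ]
    for entry in serious_entries(sample):
        print(entry["time"], entry["test"], entry["message"])
```

    To run it on a saved log, pass an open file object to `serious_entries` instead of the sample list.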

    It might also be interesting to repeat the same test without the Memory (RAM) test active, to see whether the issue is related to low RAM.




    • #3
      I tested again without the RAM test. The result is still a 2D/3D/GPGPU failure. I will test again with Level 2 logging and provide the log later.


      • #4
        The new log showed almost the same thing: a crash in the video card driver. The sub-reason was different, however. This time it was a device reset.
        Also, the crash was 13 hours into the test, not 8.

        LOG NOTE: 2023-12-20 04:30:29, 3D Graphics, DXGI_ERROR_DEVICE_REMOVED (0x887a0005) - Device Removed reason: DXGI_ERROR_DEVICE_RESET (0x887a0007)

        Next step would be checking the level 3 log and looking for excess RAM usage.
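
        One rough way to look for excess RAM usage is to take periodic readings of available physical memory during the run and check whether they trend steadily downward. The Python sketch below fits a least-squares line to such readings and returns the slope in MB per hour. How the readings are collected is left open, the function name `slope_per_hour` is invented for this illustration, and the numbers in the usage example are made-up placeholders, not values from the attached logs.

```python
def slope_per_hour(samples):
    """Least-squares slope of available memory over time.

    samples: list of (hours_elapsed, available_mb) pairs.
    Returns MB per hour; a steadily negative value hints at a leak.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical readings for illustration only (hours, MB available).
    readings = [(0.0, 27000.0), (1.0, 26900.0), (2.0, 26800.0)]
    print(f"{slope_per_hour(readings):.1f} MB/hour")
```

        A slope near zero over many hours would argue against a leak in system RAM, though it says nothing about memory held inside the graphics driver itself.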



        • #5
          Here is the Level 2 log. Because of the PassMark limitation, I put the data on Google Drive. Please download the log from the link below.
          https://drive.google.com/file/d/13Ku...usp=drive_link
          Please help us analyze this 2D/3D/GPGPU failure on the Intel MTL-H platform.



          • #6
            I deleted some of the passing log entries to reduce the file size. Please help us analyze the Level 2 log to find the root cause of the 2D/3D/GPGPU failure.


            • #7
              It doesn't look like the machine ran out of physical RAM (via a leak), which I thought might be a possibility.

              So as far as we can tell, the video card driver crashed or locked up.
              LOG NOTE: 2023-12-22 08:27:58, 3D Graphics, DXGI_ERROR_DEVICE_REMOVED (0x887a0005) - Device Removed reason: DXGI_ERROR_DEVICE_HUNG (0x887a0006)

              Shortly after this, BurnInTest started reporting errors, as you would expect if the video card suddenly disappeared.

              We don't know why the video card driver crashed. A bug of some sort, we presume.
