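0x00 Symptoms
My PC had recently been blue-screening frequently, with the stop code DRIVER_IRQL_NOT_LESS_OR_EQUAL. Because I had just replaced the graphics card, I kept looking at the GPU: I swapped graphics drivers several times, including a move from the Game Ready driver to the Studio driver, but the problem persisted.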
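0x01 Troubleshooting
One afternoon the familiar blue screen appeared again. After the machine rebooted, I decided to get to the bottom of it.
1. Open Computer Management > System Tools > Event Viewer > Windows Logs > System.

2. Filter the System log for errors in the time window of the crash. The log showed that the system had generated a dump file (.dmp) for this unexpected shutdown.

3. Open the .dmp file with windbg.exe.

4. Once it has loaded, run the !analyze -v command to analyze it: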
ExtensionGallery settings after reading 'SOFTWARE\Microsoft\Debug Engine' registry:
ExtensionGallery ExtensionRepository: Implicit
************* Preparing the environment for Debugger Extensions Gallery repositories **************
ExtensionRepository : Implicit
UseExperimentalFeatureForNugetShare : true
AllowNugetExeUpdate : true
NonInteractiveNuget : true
AllowNugetMSCredentialProviderInstall : true
AllowParallelInitializationOfLocalRepositories : true
EnableRedirectToChakraJsProvider : false
-- Configuring repositories
----> Repository : LocalInstalled, Enabled: true
----> Repository : UserExtensions, Enabled: true
>>>>>>>>>>>>> Preparing the environment for Debugger Extensions Gallery repositories completed, duration 0.000 seconds
************* Waiting for Debugger Extensions Gallery to Initialize **************
>>>>>>>>>>>>> Waiting for Debugger Extensions Gallery to Initialize completed, duration 0.031 seconds
----> Repository : UserExtensions, Enabled: true, Packages count: 0
----> Repository : LocalInstalled, Enabled: true, Packages count: 46
Microsoft (R) Windows Debugger Version 10.0.29547.1002 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Windows\Minidump\051326-25703-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
Symbol search path is: srv*
Executable search path is:
Windows 10 Kernel Version 19041 MP (16 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Kernel base = 0xfffff803`78800000 PsLoadedModuleList = 0xfffff803`7942a3f0
Debug session time: Wed May 13 16:35:39.228 2026 (UTC + 8:00)
System Uptime: 15 days 18:49:25.546
Loading Kernel Symbols
...............................................................
................................................................
................................................................
................................................................
..................
Loading User Symbols
PEB is paged out (Peb.Ldr = 000000ac`f0959018). Type ".hh dbgerr001" for details
Loading unloaded module list
................
For analysis of this file, run !analyze -v
nt!KeBugCheckEx:
fffff803`78bfd510 48894c2408 mov qword ptr [rsp+8],rcx ss:0018:ffff8603`91443b30=000000000000000a
15: kd> !analyze -v
Loading Kernel Symbols
...............................................................
................................................................
................................................................
................................................................
..................
Loading User Symbols
PEB is paged out (Peb.Ldr = 000000ac`f0959018). Type ".hh dbgerr001" for details
Loading unloaded module list
................
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000000, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000008, value 0 = read operation, 1 = write operation
Arg4: 0000000000000000, address which referenced memory
Debugging Details:
------------------
KEY_VALUES_STRING: 1
Key : Analysis.CPU.mSec
Value: 1281
Key : Analysis.Elapsed.mSec
Value: 6773
Key : Analysis.IO.Other.Mb
Value: 0
Key : Analysis.IO.Read.Mb
Value: 1
Key : Analysis.IO.Write.Mb
Value: 1
Key : Analysis.Init.CPU.mSec
Value: 593
Key : Analysis.Init.Elapsed.mSec
Value: 15776
Key : Analysis.Memory.CommitPeak.Mb
Value: 79
Key : Analysis.Version.DbgEng
Value: 10.0.29547.1002
Key : Analysis.Version.Description
Value: 10.2602.27.2 amd64fre
Key : Analysis.Version.Ext
Value: 1.2602.27.2
Key : Bugcheck.Code.LegacyAPI
Value: 0xd1
Key : Bugcheck.Code.TargetModel
Value: 0xd1
Key : Dump.Attributes.AsUlong
Value: 0x8
Key : Dump.Attributes.KernelGeneratedTriageDump
Value: 1
Key : Failure.Bucket
Value: AV_nt!KiPageFault
Key : Failure.Hash
Value: {ec3e2762-48ae-ffe9-5b16-fbcb853e8320}
Key : Faulting.IP.Type
Value: Null
Key : Hypervisor.Enlightenments.Value
Value: 77057948
Key : Hypervisor.Enlightenments.ValueHex
Value: 0x497cf9c
Key : Hypervisor.Flags.AnyHypervisorPresent
Value: 1
Key : Hypervisor.Flags.ApicEnlightened
Value: 1
Key : Hypervisor.Flags.ApicVirtualizationAvailable
Value: 0
Key : Hypervisor.Flags.AsyncMemoryHint
Value: 0
Key : Hypervisor.Flags.CoreSchedulerRequested
Value: 0
Key : Hypervisor.Flags.CpuManager
Value: 1
Key : Hypervisor.Flags.DeprecateAutoEoi
Value: 0
Key : Hypervisor.Flags.DynamicCpuDisabled
Value: 1
Key : Hypervisor.Flags.Epf
Value: 0
Key : Hypervisor.Flags.ExtendedProcessorMasks
Value: 1
Key : Hypervisor.Flags.HardwareMbecAvailable
Value: 1
Key : Hypervisor.Flags.MaxBankNumber
Value: 0
Key : Hypervisor.Flags.MemoryZeroingControl
Value: 0
Key : Hypervisor.Flags.NoExtendedRangeFlush
Value: 0
Key : Hypervisor.Flags.NoNonArchCoreSharing
Value: 1
Key : Hypervisor.Flags.Phase0InitDone
Value: 1
Key : Hypervisor.Flags.PowerSchedulerQos
Value: 0
Key : Hypervisor.Flags.RootScheduler
Value: 0
Key : Hypervisor.Flags.SynicAvailable
Value: 1
Key : Hypervisor.Flags.UseQpcBias
Value: 0
Key : Hypervisor.Flags.Value
Value: 4853999
Key : Hypervisor.Flags.ValueHex
Value: 0x4a10ef
Key : Hypervisor.Flags.VpAssistPage
Value: 1
Key : Hypervisor.Flags.VsmAvailable
Value: 1
Key : Hypervisor.RootFlags.AccessStats
Value: 1
Key : Hypervisor.RootFlags.CrashdumpEnlightened
Value: 1
Key : Hypervisor.RootFlags.CreateVirtualProcessor
Value: 1
Key : Hypervisor.RootFlags.DisableHyperthreading
Value: 0
Key : Hypervisor.RootFlags.HostTimelineSync
Value: 1
Key : Hypervisor.RootFlags.HypervisorDebuggingEnabled
Value: 0
Key : Hypervisor.RootFlags.IsHyperV
Value: 1
Key : Hypervisor.RootFlags.LivedumpEnlightened
Value: 1
Key : Hypervisor.RootFlags.MapDeviceInterrupt
Value: 1
Key : Hypervisor.RootFlags.MceEnlightened
Value: 1
Key : Hypervisor.RootFlags.Nested
Value: 0
Key : Hypervisor.RootFlags.StartLogicalProcessor
Value: 1
Key : Hypervisor.RootFlags.Value
Value: 1015
Key : Hypervisor.RootFlags.ValueHex
Value: 0x3f7
BUGCHECK_CODE: d1
BUGCHECK_P1: 0
BUGCHECK_P2: 2
BUGCHECK_P3: 8
BUGCHECK_P4: 0
FILE_IN_CAB: 051326-25703-01.dmp
DUMP_FILE_ATTRIBUTES: 0x8
Kernel Generated Triage Dump
FAULTING_THREAD: ffff8b0b051d4080
READ_ADDRESS: fffff803794fb390: Unable to get MiVisibleState
0000000000000000
PROCESS_NAME: aTrustXtunnel.
CUSTOMER_CRASH_COUNT: 1
STACK_TEXT:
ffff8603`91443b28 fffff803`78c11da9 : 00000000`0000000a 00000000`00000000 00000000`00000002 00000000`00000008 : nt!KeBugCheckEx
ffff8603`91443b30 fffff803`78c0d778 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
ffff8603`91443c70 00000000`00000000 : fffff803`7e413865 ffff8b0a`f1c5ddf0 ffff8603`91443e90 00000000`00000000 : nt!KiPageFault+0x478
SYMBOL_NAME: nt!KiPageFault+478
MODULE_NAME: nt
IMAGE_NAME: ntkrnlmp.exe
IMAGE_VERSION: 10.0.19041.6216
STACK_COMMAND: .process /r /p 0xffff8b0b059c5080; .thread /r /p 0xffff8b0b051d4080 ; kb
BUCKET_ID_FUNC_OFFSET: 478
FAILURE_BUCKET_ID: AV_nt!KiPageFault
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {ec3e2762-48ae-ffe9-5b16-fbcb853e8320}
Followup: MachineOwner
---------
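Steps 2 to 4 above can also be scripted. The following is a minimal Python sketch, not part of the original investigation: it picks the newest .dmp file from the minidump directory seen in the "Loading Dump File" line and builds a command line for cdb.exe, the console counterpart of WinDbg. The helper names newest_dump and build_cdb_command are hypothetical, and the sketch assumes cdb.exe is installed and on PATH. It only prints the command rather than running it.

```python
from pathlib import Path
from typing import List, Optional

# Minidump directory, taken from the "Loading Dump File" line in the output above.
MINIDUMP_DIR = Path(r"C:\Windows\Minidump")


def newest_dump(dump_dir: Path) -> Optional[Path]:
    """Return the most recently modified .dmp file in dump_dir, or None."""
    if not dump_dir.is_dir():
        return None
    dumps = [p for p in dump_dir.glob("*.dmp") if p.is_file()]
    if not dumps:
        return None
    return max(dumps, key=lambda p: p.stat().st_mtime)


def build_cdb_command(dump: Path, cdb_exe: str = "cdb.exe") -> List[str]:
    """Build a cdb command line that opens the dump, runs !analyze -v, and quits.

    Assumption: cdb.exe is on PATH; pass a full path in cdb_exe otherwise.
    """
    return [cdb_exe, "-z", str(dump), "-c", "!analyze -v; q"]


if __name__ == "__main__":
    dump = newest_dump(MINIDUMP_DIR)
    if dump is None:
        print(f"No .dmp files found in {MINIDUMP_DIR}")
    else:
        # Print instead of executing, so the sketch is harmless on machines
        # that do not have the debugger installed.
        print(" ".join(build_cdb_command(dump)))
```

Reading C:\Windows\Minidump usually requires an elevated prompt, so the script would need to run as administrator to find anything.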
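0x02 Analysis
Error type: DRIVER_IRQL_NOT_LESS_OR_EQUAL (0xD1)
- This error means a kernel-mode driver tried to access pageable memory (memory that can be swapped out to disk), or an outright invalid address, while running at an interrupt request level (IRQL) that is too high for such an access. Put simply, the driver touched a memory address it had no business touching at that moment.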
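The four bug check arguments in the dump say a little more than the generic description. The legend printed by WinDbg only lists 0 (read) and 1 (write) for Arg3, but Microsoft's documentation for bug check 0xD1 also lists 2 and 8 as execute. The Python sketch below decodes the arguments on that basis; the function name describe_d1 and the wording of its summary are my own, and the IRQL table only covers the lowest x64 levels.

```python
from typing import Dict

# Access-type values documented for bug check 0xD1, parameter 3.
ACCESS_TYPES: Dict[int, str] = {0: "read", 1: "write", 2: "execute", 8: "execute"}

# The lowest x64 IRQL levels; anything else is reported numerically.
IRQL_NAMES: Dict[int, str] = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def describe_d1(arg1: int, arg2: int, arg3: int, arg4: int) -> str:
    """Summarize the four arguments of a 0xD1 bug check in one sentence."""
    access = ACCESS_TYPES.get(arg3, f"unknown ({arg3:#x})")
    irql = IRQL_NAMES.get(arg2, f"IRQL {arg2}")
    summary = (
        f"{access} of address {arg1:#018x} at {irql}, "
        f"from instruction at {arg4:#018x}"
    )
    # An instruction fetch from address 0 is what a call through a
    # null function pointer looks like.
    if access == "execute" and arg1 == 0:
        summary += " (consistent with a call through a null function pointer)"
    return summary


if __name__ == "__main__":
    # Arg1..Arg4 as reported in the dump above.
    print(describe_d1(0x0, 0x2, 0x8, 0x0))
```

For this dump the result is an execute access to address 0 at DISPATCH_LEVEL, which agrees with the "Faulting.IP.Type: Null" entry in the analysis output.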
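Triggering process: PROCESS_NAME: aTrustXtunnel.
- This is the key clue for locating the problem. The program running when the crash occurred was aTrustXtunnel.exe. It is normally part of aTrust, the zero-trust security client from Sangfor (深信服), which installs its own kernel drivers to manage network connections and security policy.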
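Root cause:
- Taken together, these two points lead to the conclusion that one of the Sangfor aTrust client's drivers is defective or incompatible with the current system environment. When aTrustXtunnel.exe called into that driver, faulty logic inside the driver caused the illegal memory access, and Windows stopped with a blue screen and rebooted to protect itself. Although the stack shows the fault at nt!KiPageFault (the Windows kernel's page-fault handler), that is only where the system caught the error; the real source is the driver that made the bad access.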
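When several dumps need to be compared, it helps to pull just the fields this reasoning relies on out of the !analyze -v text. The Python sketch below does that for a few lines copied from the output above. The names extract_fields and blames_kernel_only are hypothetical helpers. The second one flags the case seen here, where the analyzer names only the kernel image, so PROCESS_NAME is a lead to follow up on rather than proof of which driver was at fault.

```python
import re
from typing import Dict

# Fields of interest in the "!analyze -v" text.
FIELDS = ("BUGCHECK_CODE", "PROCESS_NAME", "MODULE_NAME", "IMAGE_NAME", "FAILURE_BUCKET_ID")

# Lines copied from the analysis output shown earlier.
SAMPLE = """\
BUGCHECK_CODE: d1
PROCESS_NAME: aTrustXtunnel.
STACK_TEXT:
MODULE_NAME: nt
IMAGE_NAME: ntkrnlmp.exe
FAILURE_BUCKET_ID: AV_nt!KiPageFault
"""


def extract_fields(analyze_output: str) -> Dict[str, str]:
    """Collect the first value of each field of interest."""
    found: Dict[str, str] = {}
    for line in analyze_output.splitlines():
        match = re.match(r"^\s*([A-Z_]+):\s*(\S.*?)\s*$", line)
        if match and match.group(1) in FIELDS:
            found.setdefault(match.group(1), match.group(2))
    return found


def blames_kernel_only(fields: Dict[str, str]) -> bool:
    """True when the analyzer attributed the crash to the kernel image itself."""
    return fields.get("IMAGE_NAME", "").lower() == "ntkrnlmp.exe"


if __name__ == "__main__":
    fields = extract_fields(SAMPLE)
    for name in FIELDS:
        print(f"{name}: {fields.get(name, '(missing)')}")
    if blames_kernel_only(fields):
        print("Analyzer names only the kernel; treat PROCESS_NAME as a lead.")
```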
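0x03 A Puzzle
I had never enabled the aTrust client's launch-at-startup option, so why was one of its processes still running in the background? Task Manager confirmed that it was:

Digging further, I opened the Services management console and saw that aTrust had indeed installed a service. I opened its details:

The service description reads: "Sangfor zero-trust guard service; disabling it will cause features to malfunction."
That description is baffling: I had not even launched the main program, so what exactly was it guarding in the background?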
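One likely explanation is that an application's launch-at-startup setting and a Windows service are separate mechanisms: a service whose start type is automatic is launched by the Service Control Manager at boot, whether or not the main program is ever opened. The start type can be read with sc qc followed by the service name. The Python sketch below parses that output. The sample text is illustrative only, and its service name and path are hypothetical placeholders, not the real aTrust values, which this post does not record.

```python
import re
from typing import Optional

# Illustrative `sc qc` output. The service name and binary path are
# hypothetical placeholders, not the real aTrust values.
SAMPLE_SC_QC = """\
[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: ExampleGuardService
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : C:\\Example\\guard.exe
        SERVICE_START_NAME : LocalSystem
"""


def start_type(sc_qc_output: str) -> Optional[str]:
    """Return the symbolic start type (e.g. AUTO_START), or None if absent."""
    match = re.search(r"^\s*START_TYPE\s*:\s*\d+\s+(\S+)", sc_qc_output, re.MULTILINE)
    return match.group(1) if match else None


def starts_without_user_action(sc_qc_output: str) -> bool:
    """True if Windows starts the service on its own during boot."""
    return start_type(sc_qc_output) in {"AUTO_START", "BOOT_START", "SYSTEM_START"}


if __name__ == "__main__":
    print(start_type(SAMPLE_SC_QC))
    print(starts_without_user_action(SAMPLE_SC_QC))
```

If the real service reports AUTO_START, that alone would account for the background process seen in Task Manager.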
0x04 Solution
Since my upcoming work no longer required the Sangfor aTrust client, I simply uninstalled it. The blue screen has not occurred since.