There's a lot of good work here and I don't want to minimise the issue in any way but: unless the Windows ACPI stack is implemented in an
extremely fucked up way, I'd be surprised if some of the technical conclusions here are accurate. (Proviso: my experience here is pretty much all Linux, and the bits that aren't are with the ACPICA stack that's used by basically every OS other than Windows. Windows could be bizarre here, but I'd be surprised if it diverged to a huge degree.)
AML is an interpreted language. Interrupts need to be handled quickly, because while a CPU is handling an interrupt it can't handle any further interrupts. There's an obvious and immediate conflict there, and the way this is handled on every other OS is that upon receipt of an ACPI interrupt, the work is dispatched to something that can be scheduled rather than handled directly in the interrupt handler. ACPI events are not intended to be performance critical. They're also not supposed to be serialised as such - you should be able to have several ACPI events in flight at once, and the language even includes mutex support to prevent them stepping on each other[1]. Importantly, "Sleep()" is intended to be a "Wake me when at least this much time has passed" event, not a "Spin the CPU until this much time has passed" event. Calling Sleep() should let the CPU go off and handle other ACPI events or, well, anything else. So there's a legitimate discussion to be had about whether this is a sensible implementation or not, but in itself the Sleep() stuff is absolutely not the cause of all the latency.
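That model is easy to sketch in a few lines of Python (threads standing in for what an OS's ACPI core does - all the names here are illustrative, not any real kernel API): the interrupt handler just queues the event and returns immediately, worker threads run the AML-level handlers, and Sleep() blocks only the calling worker rather than spinning a CPU, so several events can be in flight at once:

```python
import queue
import threading
import time

events = queue.Queue()
results = []
lock = threading.Lock()  # stands in for an AML Mutex object

def irq_handler(gpe):
    # Interrupt-context work: record the event and return immediately
    events.put(gpe)

def aml_method(gpe):
    # Scheduled-context work: Sleep() blocks only this worker,
    # leaving the CPU free to run anything else in the meantime
    time.sleep(0.05)
    with lock:
        results.append(gpe)

def worker():
    while True:
        gpe = events.get()
        if gpe is None:
            break
        aml_method(gpe)

workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()

start = time.monotonic()
for gpe in range(4):
    irq_handler(gpe)
for _ in workers:
    events.put(None)
for t in workers:
    t.join()
elapsed = time.monotonic() - start

# Four 50 ms Sleep()s overlap instead of serialising,
# so total wall time stays close to 50 ms, not 200 ms
print(sorted(results), elapsed)
```

If Sleep() were implemented as a busy-wait in the interrupt handler instead, the four events would serialise and block everything else for the duration - which is the distinction being drawn above.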
What's causing these events in the first place? I thought I'd be able to work this out because the low GPE numbers are generally assigned to fixed hardware functions, but my back's been turned on this for about a decade and Intel's gone and made GPE 2 the "Software GPE" bit. What triggers a software GPE? Fucked if I can figure it out - it's not described in the chipset docs. Based on everything that's happening here it seems like it could be any number of things, the handler touches a lot of stuff.
But OK, we have something that's executing a bunch of code. Is that in itself sufficient to explain video and audio drops? No. All of this is being run on CPU 0, and this is a multi-core laptop - if CPU 0 is busy, the video and audio work can simply run on the other cores. The problem here is that all cores suddenly stop executing user code, and the most likely explanation for that is System Management Mode.
SMM is a CPU mode present in basically all Intel CPUs since the 386SL back in 1989 or so. Code accesses a specific IO port, the CPU stops executing the OS, and instead starts executing firmware-supplied code in a memory area the OS can't touch. The ACPI decompilation only includes the DSDT (the main ACPI table) and not any of the SSDTs (additional ACPI tables that typically contain code for additional components, such as GPU-specific methods), so I can't check for sure, but what I suspect is happening here is that one of the _PS0 or _PS3 methods is trapping into SMM, and the entire system[2] halts while that code runs, which would explain why the latency shows up at the system level rather than it just being "CPU 0 isn't doing stuff".
And, well, the root cause here is probably correctly identified: the _L02 event keeps firing, and when it does it triggers a notification to the GPU driver, which then calls an ACPI method that generates latency. The rest of the conclusions just aren't important in comparison. Sleep() is not an unreasonable thing to use in an ACPI method; it's unclear whether clearing the event bits is enough to immediately trigger another event; and it's unclear whether sending events that trigger the _PS0/_PS3 dance makes sense under any circumstances here, rather than worrying about the MUX state. There's not enough public information to really understand why _L02 is firing, nor what the firmware is trying to achieve by powering up the GPU, calling _DOS, and then powering it down again.
[1] This is absolutely necessary for some hardware - we hit issues back in 2005 where an HP laptop just wouldn't work if you couldn't handle multiple ACPI events at once
[2] Why the entire system? SMM is able to access various bits of hardware that the OS isn't, and figuring out which core is trying to touch that hardware isn't easy, so there's a single "We are in SMM" bit: all cores are pushed into SMM and stop executing OS code before the access is permitted, avoiding the case where going into SMM on one CPU would let OS code on another CPU touch the forbidden hardware. This is all fucking ludicrous, but here we are.
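The rendezvous in [2] can be simulated in a few lines of Python (threads standing in for cores - this is purely an illustration of the protocol, not how any real firmware is written): one "SMI pending" flag is broadcast to every core, every core parks at a barrier before the hardware is touched, and only once they're all parked does the triggering core proceed:

```python
import threading
import time

NCPUS = 4
smi_pending = threading.Event()        # the single "We are in SMM" bit
rendezvous = threading.Barrier(NCPUS)  # all cores must park before any access
log = []

def cpu(cpu_id):
    # OS code runs until the SMI is broadcast to this core
    while not smi_pending.is_set():
        time.sleep(0.001)
    rendezvous.wait()   # every core stops executing OS code here
    if cpu_id == 0:     # only the triggering core touches the hardware
        log.append("hardware poked with all %d cores parked" % NCPUS)
    rendezvous.wait()   # release: cores resume the OS together

threads = [threading.Thread(target=cpu, args=(i,)) for i in range(NCPUS)]
for t in threads:
    t.start()
smi_pending.set()       # firmware asserts the SMI
for t in threads:
    t.join()
print(log)
```

The cost is exactly what the comment describes: while that access happens, no core is executing OS code at all, which is why SMM-induced latency is visible system-wide and not just on CPU 0.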