There's a lot of good work here and I don't want to minimise the issue in any way but: unless the Windows ACPI stack is implemented in an
extremely fucked up way, I'd be surprised if some of the technical conclusions here are accurate. (Proviso: my experience here is pretty much all Linux, and the bits that aren't are with the ACPICA stack that's used by basically every OS other than Windows. Windows could be bizarre here, but I'd be surprised if it diverged to a huge degree.)
AML is an interpreted language. Interrupts need to be handled quickly, because while a CPU is handling an interrupt it can't handle any further interrupts. There's an obvious and immediate conflict there, and the way this is handled on every other OS is that upon receipt of an ACPI interrupt, the work is dispatched to something that can be scheduled rather than handled directly in the interrupt handler. ACPI events are not intended to be performance critical. They're also not supposed to be serialised as such - you should be able to have several ACPI events in flight at once, and the language even includes mutex support to prevent them stepping on each other[1]. Importantly, "Sleep()" is intended to be a "Wake me when at least this much time has passed" event, not a "Spin the CPU until this much time has passed" event. Calling Sleep() should let the CPU go off and handle other ACPI events or, well, anything else. So there's a legitimate discussion to be had about whether this is a sensible implementation or not, but in itself the Sleep() stuff is absolutely not the cause of all the latency.
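That model is easy to sketch in a few lines of Python (threads standing in for what an OS's ACPI core does - all the names here are illustrative, not any real kernel API): the interrupt handler just queues the event and returns immediately, worker threads run the AML-level handlers, and Sleep() blocks only the calling worker rather than spinning a CPU, so several events can be in flight at once:

```python
import queue
import threading
import time

events = queue.Queue()
results = []
lock = threading.Lock()  # stands in for an AML Mutex object

def irq_handler(gpe):
    # Interrupt-context work: record the event and return immediately
    events.put(gpe)

def aml_method(gpe):
    # Scheduled-context work: Sleep() blocks only this worker,
    # leaving the CPU free to run anything else in the meantime
    time.sleep(0.05)
    with lock:
        results.append(gpe)

def worker():
    while True:
        gpe = events.get()
        if gpe is None:
            break
        aml_method(gpe)

workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()

start = time.monotonic()
for gpe in range(4):
    irq_handler(gpe)
for _ in workers:
    events.put(None)
for t in workers:
    t.join()
elapsed = time.monotonic() - start

# Four 50 ms Sleep()s overlap instead of serialising,
# so total wall time stays close to 50 ms, not 200 ms
print(sorted(results), elapsed)
```

If Sleep() were implemented as a busy-wait in the interrupt handler instead, the four events would serialise and block everything else for the duration - which is the distinction being drawn above.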
What's causing these events in the first place? I thought I'd be able to work this out because the low GPE numbers are generally assigned to fixed hardware functions, but my back's been turned on this for about a decade and Intel's gone and made GPE 2 the "Software GPE" bit. What triggers a software GPE? Fucked if I can figure it out - it's not described in the chipset docs. Based on everything that's happening here it seems like it could be any number of things, the handler touches a lot of stuff.
But OK, we have something that's executing a bunch of code. Is that in itself sufficient to explain video and audio drops? No. All of this is being run on CPU 0, and this is a multi-core laptop - if CPU 0 is busy, the video and audio work can simply run on the other cores. The problem here is that all cores suddenly stop executing user code, and the most likely explanation for that is System Management Mode.
SMM is a CPU mode present in basically all Intel CPUs since the 386SL back in 1989 or so. Code accesses a specific IO port, the CPU stops executing the OS, and instead starts executing firmware-supplied code in a memory area the OS can't touch. The ACPI decompilation only includes the DSDT (the main ACPI table) and not any of the SSDTs (additional ACPI tables that typically contain code for additional components, such as GPU-specific methods), so I can't check for sure, but what I suspect is happening here is that one of the _PS0 or _PS3 methods is trapping into SMM, and the entire system[2] halts while that code runs, which would explain why the latency shows up at the system level rather than it just being "CPU 0 isn't doing stuff".
And, well, the root cause here is probably correctly identified: the _L02 event keeps firing, and when it does it triggers a notification to the GPU driver, which then calls an ACPI method that generates latency. The rest of the conclusions just aren't important in comparison. Sleep() is not an unreasonable thing to use in an ACPI method; it's unclear whether clearing the event bits is enough to immediately trigger another event; and it's unclear whether sending events that trigger the _PS0/_PS3 dance makes sense under any circumstances here, rather than worrying about the MUX state. There's not enough public information to really understand why _L02 is firing, nor what the firmware is trying to achieve by powering up the GPU, calling _DOS, and then powering it down again.
[1] This is absolutely necessary for some hardware - we hit issues back in 2005 where an HP laptop just wouldn't work if you couldn't handle multiple ACPI events at once
[2] Why the entire system? SMM is able to access various bits of hardware that the OS isn't, and figuring out which core is trying to touch that hardware isn't easy, so there's a single "We are in SMM" bit: all cores are pushed into SMM and stop executing OS code before the access is permitted, avoiding the case where going into SMM on one CPU would let OS code on another CPU touch the forbidden hardware. This is all fucking ludicrous, but here we are.
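The rendezvous in [2] can be simulated in a few lines of Python (threads standing in for cores - this is purely an illustration of the protocol, not how any real firmware is written): one "SMI pending" flag is broadcast to every core, every core parks at a barrier before the hardware is touched, and only once they're all parked does the triggering core proceed:

```python
import threading
import time

NCPUS = 4
smi_pending = threading.Event()        # the single "We are in SMM" bit
rendezvous = threading.Barrier(NCPUS)  # all cores must park before any access
log = []

def cpu(cpu_id):
    # OS code runs until the SMI is broadcast to this core
    while not smi_pending.is_set():
        time.sleep(0.001)
    rendezvous.wait()   # every core stops executing OS code here
    if cpu_id == 0:     # only the triggering core touches the hardware
        log.append("hardware poked with all %d cores parked" % NCPUS)
    rendezvous.wait()   # release: cores resume the OS together

threads = [threading.Thread(target=cpu, args=(i,)) for i in range(NCPUS)]
for t in threads:
    t.start()
smi_pending.set()       # firmware asserts the SMI
for t in threads:
    t.join()
print(log)
```

The cost is exactly what the comment describes: while that access happens, no core is executing OS code at all, which is why SMM-induced latency is visible system-wide and not just on CPU 0.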