HN Reader

Make Trust Irrelevant: A Gamer's Take on Agentic AI Safety

I wrote a short position paper arguing that current agentic AI safety failures are the confused deputy problem on repeat. We are handing agents ambient authority and trying to contain it with soft constraints like prompts and userland wrappers. My take: you need hard, reduce-only authority enforced at a real boundary (kernel control plane class), not something bypassable from userland. Curious how others are modeling this. What constraints do you think are truly non-negotiable?

Here are some important differences:

- The players in competitive games don't write code. Coding agents do. When you copy the code outside the sandbox and run it, what permissions does it get?

- Game players usually don't have access to confidential material, so you don't need to prevent them from exfiltrating it.

7 hours agoby skybrian

Was this written with a LLM? If so, please add a note about it at the start of the README.

8 hours agoby mzajc

People want convenience more than they want security. No one wants permission grants to go away in minutes or hours. Every time the agent is stopped by permissions grant check, the average user experience is a little worse.

8 hours agoby solidasparagus

I recommend researching sandboxes.

3 hours agoby cadamsdotcom

> I wrote a short position

> "Reality check"

Hi GPT :)

7 hours agoby zb3