HN Reader

Show HN: archgw: open-source, intelligent proxy for AI agents, built on Envoy

Hi HN! This is Adil, Salman, Co and Shuguang and we're excited to introduce archgw [1], an open source intelligent proxy for agents built on Envoy [2]. Arch moves the critical but crufty work around safety, observability, and routing of prompts outside business logic. Arch is a uniquely intelligent infrastructure primitive, engineered with purpose-built fast LLMs [3] for tasks like intent detection over multi-turn, parameter identification and extraction, triggering single/multiple function calls, and offers convenience features to auto dispatch LLM calls for summarization based on data from your APIs via system prompts configured in archgw.

Today, the approach to build a smart production-ready agent is weaving together a large set of mono-functional opinionated libraries, adding extra layers like LLM-based preprocessing to determine things like relevance and safety of the user's prompt (e.g. applying governance and guardrails). Once past that stage, developers must extract relevant information from the user prompt to determine intent, extract parameters as necessary, package relevant tools calls to an LLM to trigger a backend API to execute particular domain-specific task. etc. After all that is done then only are developers ready to trigger an LLM call for summarization and must manage upstream error handling and retry logic themselves. Not to mention, if they want to experiment with multiple LLMs or move between LLM versions, they have to write crufty undifferentiated code. This entire experience is slow, error prone, cumbersome, and not specifically unique.

Prior to building archgw, the team spent time building Envoy [2] at Lyft, API Gateway at AWS, specialized search and intent models at Microsoft Research and worked on safety at Meta. archgw was born out of the belief that several rules based mono-functional tools should be converged into a multi-functional infrastructure primitive designed for prompts and agents. We built archgw on the highly popular, battle-tested open source proxy Envoy and re-imagined it for prompts and agents. For this we had to build blazing fast LLMs [3] that can handle crufty, ahead-in-the-request-path type of work in handling and processing prompts that are sent to an agent, so that developers can focus on what matters most: building fast personalized agents without the unnecessary prompt engineering and systems integration work needed to get there.

Here are some additional details about the open source project. arghw is written in rust, and the request path has three main parts:

* Listener subsystem which handles downstream (ingress) and upstream (egress) request processing.

* Prompt handler subsystem. This is where archgw makes decisions on the safety of the incoming request via its prompt_guard primitive and identifies where to forward the conversation to via its prompt_target primitive.

* Model serving subsystem is the interface that hosts all the lightweight LLMs engineered in archgw and offers a framework for things like hallucination detection of our these models

We loved building this open source project, and our belief is that this infra primitive would help developers build faster, safer and more personalized agents without all the manual prompt engineering and systems integration work needed to get there. We hope to invite other developers to use and improve Arch. Please give it a shot and leave feedback here, or at our discord channel [4]

Also here is a quick demo of the project in action [5]. You can check out our public docs here at [6]. Our models are also available here [7].

[1] https://github.com/katanemo/archgw

[2] https://www.envoyproxy.io/

[3] https://huggingface.co/collections/katanemo/arch-function-66...

[4] https://discord.com/channels/1292630766827737088/12926307682...

[5] https://www.youtube.com/watch?v=I4Lbhr-NNXk

[6] https://docs.archgw.com/

[7] https://huggingface.co/katanemo

why didn’t you build your own gateway from ground up? especially when rust runtime in envoy is not production ready yet. From envoyproxy,

… This extension is functional but has not had substantial production burn time, use only with this caveat.

This extension has an unknown security posture and should only be used in deployments where both the downstream and upstream are trusted.

2 days agoby mudassaralam

For complex agent scenarios where there might be COT reasoning needed how does archgw work in that scenario? BTW nice detailed post. And congratulations on the launch.

2 days agoby naveed174

I am interested in knowing how the arch would run? Is it a library that I need to add in my code to make it work? Or do I need to deploy some sort of service in my infrastructure?

2 days agoby herewhere

Hey Adil, Thanks for sharing and congratulations on launch.

Can I just use arch for routing between LLMs? And what LLMs do you support? And what about key management? Do I manage access keys myself?

2 days agoby fahimulhaq

Congrats Adil! Interested idea with lot of potential.

Do you have to use envoyproxy to use archgw? Can archgw be used for LLM routing without using envoyproxy?

2 days agoby mikram

This is honestly quite a detailed and thoughtfully put together post. I do have some questions and would love to hear your thoughts on those. First off, can I use just the model itself? Do you have models hosted somewhere or they run locally? If they run locally what are the system requirements? Can I build RAG based applications on arch? And how do you do intent detection in multi-turn dialogue? How does parameter gathering work, is the model capable of conversing with the user to gather parameters?

2 days agoby Nomi21

With all the focus on language specific frameworks - this out of process architecture choice is an interesting one. On one hand, it helps you side step the "is this functionality available on js, java, etc" question, and on the other it means its not as easy as `import archgw` in python. Good luck though, feels like an interesting project

2 days agoby honorable_judge