Hi there, thanks for writing and sharing your experiences. I'm one of the builders of GoodMem (https://goodmem.ai/), which is infra that simplifies building end-to-end RAG/agentic memory systems like the one you built.
It's built on Postgres, which I know you said you left behind, but one of its cool features is hybrid search over multiple vector representations of a passage, so you can combine a dense (e.g. Nomic) and a sparse (e.g. SPLADE) search. Reranking is also built in, although it lacks automatic caching (since, in general, the corpus changes over time).
It also deploys to Fly.io or Railway and costs a few bucks a month to run if you're willing to use cloud-hosted embedding models (otherwise, you can run TEI/vLLM on CPU or GPU for the setup you described).
I hope it's helpful to someone.