In the list of features, it mentions:
> vision-based search for comprehensive document understanding
but it's not clear to me what this means. Is it just vector embeddings for each image in every document via a CLIP-like model?
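To make the question concrete, here's roughly what I'm picturing, as a minimal sketch assuming a CLIP-style dual encoder (the model name, file paths, and use of sentence-transformers are my own assumptions, not anything taken from the project):

```python
# Rough sketch of what I'm imagining by "vision-based search": embed every
# image extracted from a document with a CLIP-style model, then match text
# queries against those embeddings in the same vector space.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # CLIP-like dual encoder

# Embed the images found in a document (hypothetical file names).
image_embeddings = model.encode([
    Image.open("doc1_page3_fig1.png"),
    Image.open("doc1_page7_fig2.png"),
])

# Embed a text query into the same space and rank images by cosine similarity.
query_embedding = model.encode("quarterly revenue chart")
scores = util.cos_sim(query_embedding, image_embeddings)
print(scores)
```

Is the project doing something like this per image, or something richer (layout-aware models, OCR fusion, etc.)?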
In addition, I'd be curious about the rationale behind using such a plethora of databases. The production deployment docs spin them all up, so I assume they're all required. For instance, I'd like to understand the trade-offs between using Postgres with something like pg_search (for BM25 support, which vanilla Postgres FTS doesn't have) versus running both Postgres and Elasticsearch.
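To clarify the Postgres question, here's a rough sketch of the kind of comparison I mean, showing the vanilla-FTS side only (table and column names are hypothetical; I've left the pg_search side as a comment since I haven't verified its exact syntax):

```python
# Sketch of the trade-off I'm asking about: vanilla Postgres FTS can do
# keyword search and ranking out of the box, but ts_rank/ts_rank_cd are
# not BM25 -- that's where pg_search (or a separate Elasticsearch) comes in.
import psycopg2

conn = psycopg2.connect("dbname=docs")  # hypothetical database
cur = conn.cursor()

# Vanilla Postgres full-text search: no extra services needed, but the
# ranking function is frequency-based rather than BM25.
cur.execute(
    """
    SELECT id,
           ts_rank_cd(to_tsvector('english', body),
                      plainto_tsquery('english', %s)) AS rank
    FROM documents
    ORDER BY rank DESC
    LIMIT 10
    """,
    ("vision based search",),
)
print(cur.fetchall())

# With pg_search (ParadeDB), the same keyword query could be BM25-ranked
# inside Postgres, which would seem to remove the need for a separate
# Elasticsearch cluster -- that's the trade-off I'd like to understand.
```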
The docs are also very minimal; I'd have loved to see at least one usage example.