Reminds me of PG-Strom[1], which is a Postgres extension for GPU-bound index access methods (most notably BRIN, select GIS functions) and the like; it relies on proprietary NVIDIA GPUDirect tech for peer-to-peer PCIe transfers between the GPU and NVMe devices. I'm not sure whether the amdgpu kernel driver has this capability in the first place, and last I checked (~6 mo. ago) ROCm didn't have it in software.
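For a sense of what that looks like from the SQL side, here's a rough sketch of checking whether PG-Strom picks a query up for GPU execution. The GUC and plan node names are from memory of PG-Strom's docs and may differ by version; the table and columns are made up:

```sql
-- Assumes the pg_strom extension is installed and a CUDA-capable GPU is visible.
CREATE EXTENSION IF NOT EXISTS pg_strom;

-- Hypothetical fact table for illustration.
CREATE TABLE measurements (
    sensor_id  int,
    recorded   timestamptz,
    value      double precision
);

-- PG-Strom exposes GUCs to toggle GPU offload; pg_strom.enabled is the main switch.
SET pg_strom.enabled = on;

-- If the planner decides GPU execution is worthwhile, EXPLAIN shows custom scan
-- nodes (e.g. "Custom Scan (GpuScan)" or "Custom Scan (GpuPreAgg)") in place of
-- the usual Seq Scan / Aggregate nodes.
EXPLAIN
SELECT sensor_id, avg(value)
FROM measurements
WHERE recorded >= now() - interval '1 day'
GROUP BY sensor_id;
```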
However, I wonder whether GPUs are a good fit for this to begin with.
Counterpoint: the Xilinx side of the AMD shop has developed the Alveo series of accelerators, which started out as pretty basic SmartNIC platforms but have since evolved to include A LOT more programmable logic and compute IP. You may have heard about these in video encoding, HFT, blockchain stuff, what-have-you. A lot of it has to do with AI; see Versal[2]. Raw compute figures are often cited as "underwhelming," and it's unfortunate that so many pundits are missing the forest for the trees here. I don't think the AI tiles in these devices are really meant for end-to-end LLM inference, even though memory bandwidth in the high-end parts would allow it.
The secret sauce is compute-in-network over fabrics.
Similar to how PG-Strom feeds the GPU with relational data straight from disk or the network, many AI teams on the datacenter side are now experimenting with data movement and intermediate computations (think K/V cache management) over 100/200/800+G fabrics. IMHO, compute-in-network is the MapReduce of this decade. Obviously there's demand for it in the AI space, but a lot of it lends itself nicely to more general-purpose applications, like databases. If you're into experimental networking like that, Corundum[3] by Alex Forencich is a great, perhaps the best, open source NIC design for up to 100G line rate. Some of the cards it supports also expose direct-attach NVMe drives over MCIO for latency, and typically have two or four SFP28 ports for bandwidth.
This is a bit of a naive way to think about it, but it will have to do!
Postgres is not typically considered to "scale well," but oftentimes that's a statement about its tablespaces more than anything else; it has a foreign data wrapper[4] API, which is how you extend Postgres into a single point of consumption, foregoing some transactional guarantees in the process. This is how pg_analytics[5] brings DuckDB to Postgres, and how Steampipe[6] similarly exposes many cloud and SaaS APIs. Depending on where you stand on this, the so-called alternative SQL engines may seem like a move in the wrong direction. Shrug.
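To make the "single point of consumption" idea concrete, here's a minimal sketch of the FDW machinery using the stock postgres_fdw contrib module. The server name, credentials, and remote schema are made up; extensions like pg_analytics or Steampipe plug into the same CREATE SERVER / FOREIGN TABLE mechanism under their own wrapper names:

```sql
-- postgres_fdw ships with Postgres contrib; other FDWs (DuckDB-, Parquet-, or
-- SaaS-backed) follow the same pattern with a different wrapper name.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Hypothetical remote analytics database.
CREATE SERVER analytics_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'analytics.internal', port '5432', dbname 'warehouse');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER analytics_srv
    OPTIONS (user 'reporting', password 'secret');

-- Pull the remote table definitions into a local schema...
CREATE SCHEMA IF NOT EXISTS warehouse;
IMPORT FOREIGN SCHEMA public
    FROM SERVER analytics_srv INTO warehouse;

-- ...and query them alongside local tables. Note that the remote reads don't
-- participate in the local transaction's snapshot the way local tables do.
SELECT o.order_id, c.segment
FROM warehouse.orders o
JOIN local_customers c USING (customer_id);
```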
[1] https://heterodb.github.io/pg-strom/
[2] https://xilinx.github.io/AVED/latest/AVED%2BOverview.html
[3] https://github.com/corundum/corundum
[4] https://wiki.postgresql.org/wiki/Foreign_data_wrappers
[5] https://github.com/paradedb/pg_analytics
[6] https://hub.steampipe.io/#plugins