I love SQLite, and this is in no way meant to devalue it; the author's method is an excellent approach to getting analytical speed out of SQLite. But I have been loving DuckDB for similar analytical workloads, since it is built for exactly such tasks. Like SQLite, DuckDB reads from a single file, and it processes large data sets at extreme speed. On my MacBook M2 I have been working with about 20 million records, and it is fast, very fast.
Loading data into DuckDB is surprisingly easy:
SELECT
avg(sale_price),
count(DISTINCT customer_id)
FROM '/my-data-lake/sales/2024/*.json';
You can also load data into a JSON-typed column and query it with Postgres-style arrow syntax:
col->>'$.key'
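As a minimal sketch of what that looks like end to end (the table and field names here are made up for illustration):

```sql
-- Hypothetical table with a JSON column
CREATE TABLE events (payload JSON);
INSERT INTO events VALUES ('{"customer_id": 42, "sale_price": 9.99}');

-- ->> extracts a field as text using a JSONPath expression
SELECT payload->>'$.customer_id' AS customer_id
FROM events;
```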