Show HN: Csvdb – Git-friendly CSV directories that convert to SQLite or DuckDB
4 hours ago by jeff-gorelick
I built csvdb because I kept running into the same problem: I had small relational datasets (config tables, rate tables, seed data) that I wanted to version control, but SQLite files produce useless git diffs.

csvdb converts between a directory of CSV files and SQLite/DuckDB databases. The CSV side is the source of truth you commit to git. The database side is what you query.

  csvdb to-csvdb mydb.sqlite    # export to CSV directory
  vim mydb.csvdb/rates.csv      # edit data
  git diff mydb.csvdb/          # meaningful row-level diffs
  csvdb to-sqlite mydb.csvdb/   # rebuild database
The key design decision is deterministic output. Rows are sorted by primary key, so identical data always produces identical CSV files. This means git diffs show only actual data changes, not row reordering noise.
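
To make the determinism point concrete, here is a minimal Python sketch of the idea, not csvdb's actual code. The table name, columns, and the export_table/make_db helpers are made up for illustration. It exports the same rows from two databases that were filled in opposite insertion order and checks that the CSV text is identical once rows are ordered by primary key.

```python
import csv
import io
import sqlite3


def export_table(conn, table, pk):
    """Return the table as CSV text with rows ordered by the primary key."""
    # Identifiers are interpolated by hand: fine for a sketch, not for untrusted input.
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{pk}"')
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(col[0] for col in cur.description)  # header row
    writer.writerows(cur)
    return buf.getvalue()


def make_db(rows):
    """Build an in-memory database, inserting rows in the order given."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rates (region TEXT PRIMARY KEY, rate REAL)")
    conn.executemany("INSERT INTO rates VALUES (?, ?)", rows)
    return conn


rows = [("eu", 0.2), ("jp", 0.1), ("us", 0.07)]
csv_a = export_table(make_db(rows), "rates", "region")
csv_b = export_table(make_db(list(reversed(rows))), "rates", "region")

# Same data, different insertion order, same bytes on disk.
print(csv_a == csv_b)
```

Without the ORDER BY, the second database would typically come back in insertion order, and git would report every row as changed.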

A few details that took some thought:

- NULL handling: CSV has no native NULL. By default csvdb uses \N (the marker PostgreSQL's COPY text format uses) to distinguish NULL from empty string. Roundtrips are lossless.

- Format-independent checksums: csvdb checksum produces the same SHA-256 hash whether the input is SQLite, DuckDB, or a csvdb directory. Useful for verifying conversions.

- Schema preservation: indexes and views survive the roundtrip, not just tables.
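
The NULL and checksum points can be sketched together in a few lines of Python. This is an illustration under assumptions, not csvdb's implementation: the encode/decode/checksum helpers are hypothetical, the hashing scheme (SHA-256 over sorted, encoded rows) is a guess at the general approach, and the sketch only handles text values and does not address a string that is literally \N.

```python
import csv
import hashlib
import io

NULL_MARKER = "\\N"  # backslash followed by N


def encode(value):
    """Map None to the marker; everything else to its string form."""
    return NULL_MARKER if value is None else str(value)


def decode(field):
    """Inverse of encode for text fields."""
    return None if field == NULL_MARKER else field


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(encode(v) for v in row)
    return buf.getvalue()


def from_csv(text):
    return [tuple(decode(f) for f in row) for row in csv.reader(io.StringIO(text))]


def checksum(rows):
    """SHA-256 over the sorted, encoded rows (hypothetical scheme)."""
    canonical = to_csv(sorted(rows, key=lambda r: [encode(v) for v in r]))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


rows = [("a", None), ("b", ""), ("c", "x")]
text = to_csv(rows)
restored = from_csv(text)

print(restored == rows)                                   # NULL and "" stay distinct
print(checksum(rows) == checksum(list(reversed(rows))))   # row order does not matter
```

Because the hash is computed over a canonical form of the data rather than over the container file, any backend that yields the same rows yields the same digest, which is the property that makes a checksum usable for verifying a conversion.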

Written in Rust. Beta quality -- the file format may still change.

https://github.com/jeff-gorelick/csvdb