Show HN: Searchable compression for JSON (p50≈0.18 ms; 10-min demo)
4 points by kodomonocch1 3 days ago | 1 comment
Hi! I built SEE (Semantic Entropy Encoding) because the “data tax” (storage/egress) and the “CPU tax” (decompress/parse) keep rising together.

Tradeoff: the output is not always smaller than Zstd's, but it stays searchable while compressed and minimizes I/O. Key numbers from the demo: combined size ≈ 19.5% of raw, skip ≈ 99%, lookup p50 ≈ 0.18 ms (bloom ≈ 0.30).
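To make "searchable while compressed" concrete, here is a minimal sketch of the general idea of block-level skipping with Bloom filters. It is not SEE's actual format or code: the `Bloom` class, `build_blocks`, `lookup`, the block size, and the filter parameters are all hypothetical, and stdlib `zlib` stands in for whatever codec SEE uses.

```python
import hashlib
import json
import zlib


class Bloom:
    """Tiny Bloom filter backed by a Python int used as a bitset (illustrative only)."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0

    def _positions(self, key: str):
        # Derive n_hashes bit positions by prefixing the key with a counter.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_blocks(records, field, block_size=100):
    """Compress records in fixed-size blocks, keeping one Bloom filter per block."""
    blocks = []
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        bloom = Bloom()
        for rec in chunk:
            bloom.add(str(rec[field]))
        payload = zlib.compress(json.dumps(chunk).encode())
        blocks.append((bloom, payload))
    return blocks


def lookup(blocks, field, value):
    """Return matching records and how many blocks had to be decompressed."""
    hits, opened = [], 0
    for bloom, payload in blocks:
        if not bloom.might_contain(str(value)):
            continue  # skipped without touching the compressed payload
        opened += 1
        for rec in json.loads(zlib.decompress(payload)):
            if rec[field] == value:
                hits.append(rec)
    return hits, opened


records = [{"id": i, "user": f"user{i}"} for i in range(10_000)]
blocks = build_blocks(records, "user")
hits, opened = lookup(blocks, "user", "user4242")
print(hits, f"opened {opened} of {len(blocks)} blocks")
```

In this toy setup the skip rate is `1 - opened / len(blocks)`. Bloom false positives can cause a few extra blocks to be opened, but a block that contains the value is never skipped.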

10-min reproduction (no marketing):
1) Download the Demo ZIP (Release).
2) Follow README_FIRST.md.
3) Run `python samples/quick_demo.py`; it prints ratio/skip/bloom and p50/p95/p99.
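For readers who want to sanity-check p50/p95/p99 figures on their own workload, here is a minimal sketch of one common way to measure lookup latency percentiles. It is not the contents of `samples/quick_demo.py`: `latency_percentiles` is a hypothetical helper, and a plain dict lookup stands in for the real lookup call.

```python
import statistics
import time


def latency_percentiles(fn, inputs):
    """Time fn(x) for each input and return p50/p95/p99 in milliseconds."""
    samples_ms = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Stand-in workload: dictionary lookups instead of a compressed-store lookup.
table = {f"k{i}": i for i in range(100_000)}
keys = [f"k{i}" for i in range(0, 100_000, 50)]
stats = latency_percentiles(table.get, keys)
print({name: f"{value:.4f} ms" for name, value in stats.items()})
```

Absolute numbers depend heavily on hardware, cache state, and timer resolution, so comparing percentiles is only meaningful on the same machine and dataset.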

ROI quick math: Savings/TB ≈ (1 − 0.195) × Price_per_GB × 1000 (e.g., $0.05/GB → ~$40/TB).

NDA/VDR (private; no confidential info in public): [https://docs.google.com/forms/d/e/1FAIpQLScV2Ti592K3Za2r_WLU...]
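The ROI quick math above can be written as a small function for plugging in your own prices. The function name `savings_per_tb` and its parameter names are illustrative. The 0.195 default is the demo's combined ratio, and 1 TB is taken as 1000 GB, as in the formula.

```python
def savings_per_tb(price_per_gb: float,
                   compressed_fraction: float = 0.195,
                   gb_per_tb: int = 1000) -> float:
    """Savings per TB of raw data: (1 - compressed_fraction) * price_per_gb * gb_per_tb."""
    return (1.0 - compressed_fraction) * price_per_gb * gb_per_tb


# Example from the post: $0.05/GB.
print(f"${savings_per_tb(0.05):.2f} per TB")  # prints "$40.25 per TB"
```

The result scales linearly with price, so doubling the per-GB price doubles the estimated savings. The estimate only holds if the 19.5% ratio from the demo carries over to your data.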

Happy to answer technical questions (schema-aware layout, delta strategy, bloom density, skip heuristics, failure modes).