Show HN: 5MB Rust binary that runs HuggingFace models (no Python)
by MKuykendall
Hi HN! I built Shimmy, a lightweight AI inference server that can now load HuggingFace SafeTensors models directly without any Python dependencies.

  The core problem: I wanted to run HuggingFace models locally but didn't want the heavyweight Python ML stack. Most solutions require Python + PyTorch + the transformers library, which can be 2GB+ just for dependencies.

  What's new in v1.2.0:
  • Native SafeTensors support - loads .safetensors files directly in Rust
  • 2x faster model loading compared to traditional formats
  • Zero Python dependencies - pure Rust implementation
  • Still just a 5MB binary (vs 50MB+ alternatives like Ollama)
  • Full OpenAI API compatibility for drop-in replacement
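
  To give a feel for the OpenAI compatibility, here's a sketch of a chat completion request from Rust. The port (11435) and model name are assumptions, so substitute whatever your shimmy instance is actually serving; any existing OpenAI client should work the same way once you point its base URL at shimmy.

```rust
// Minimal sketch of a chat completion request against shimmy's
// OpenAI-compatible endpoint. The host/port (localhost:11435) and
// the model name are assumptions -- substitute your own.
//
// [dependencies]
// reqwest = { version = "0.12", features = ["blocking", "json"] }
// serde_json = "1"

use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "phi-3-mini",  // assumed model name
        "messages": [
            { "role": "user", "content": "Say hello in five words." }
        ],
        "max_tokens": 32
    });

    // Standard OpenAI-style route, so existing clients can be
    // repointed here without code changes.
    let resp = reqwest::blocking::Client::new()
        .post("http://localhost:11435/v1/chat/completions")
        .json(&body)
        .send()?
        .text()?;

    println!("{resp}");
    Ok(())
}
```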

  Technical details:
  - Built with native SafeTensors parsing, not Python bindings (format sketched below)
  - Memory-efficient tensor loading with bounds checking
  - Tested with 100MB+ models, with sub-second loading
  - Cross-platform: Windows, macOS (Intel/ARM), Linux
  - Supports mixed model formats (GGUF + SafeTensors)
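
  Part of why a pure-Rust loader is practical is that the SafeTensors format is deliberately simple: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor data. Here's an illustrative header reader; this is my sketch of the file format, not shimmy's actual code, and the serde_json dependency and the header size cap are assumptions.

```rust
// Sketch of reading a SafeTensors header (illustrative, not shimmy's code).
//
// [dependencies]
// serde_json = "1"

use std::fs::File;
use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut f = File::open("model.safetensors")?; // path is an assumption

    // First 8 bytes: length of the JSON header, little-endian u64.
    let mut len_buf = [0u8; 8];
    f.read_exact(&mut len_buf)?;
    let header_len = u64::from_le_bytes(len_buf) as usize;

    // Bounds check before allocating: a corrupt file could claim a
    // multi-gigabyte header. (The cap here is an arbitrary example.)
    if header_len > 100 * 1024 * 1024 {
        return Err("unreasonable header size".into());
    }

    // The JSON header maps tensor names to dtype, shape, and byte
    // offsets into the data section that follows it.
    let mut header_bytes = vec![0u8; header_len];
    f.read_exact(&mut header_bytes)?;
    let header: serde_json::Value = serde_json::from_slice(&header_bytes)?;

    for (name, meta) in header.as_object().into_iter().flatten() {
        if name.as_str() == "__metadata__" {
            continue; // optional free-form metadata block
        }
        println!("{name}: dtype={} shape={}", meta["dtype"], meta["shape"]);
    }
    Ok(())
}
```

  The bounds checking mentioned above matters exactly here: the offsets and lengths coming out of that JSON have to be validated against the actual file size before any slice of tensor data is handed out.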

  This bridges the gap between HuggingFace's model ecosystem and lightweight local deployment. You can now grab any SafeTensors model from HuggingFace and run it locally with just a single binary.
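
  On the mixed-format bullet above: GGUF files begin with the ASCII magic "GGUF", while SafeTensors files have no magic number, so detection has to fall back on the extension (or on trying to parse the 8-byte header length). A hypothetical routing sketch, not shimmy's actual dispatch code:

```rust
// Hypothetical sketch of routing a model file to the right loader
// when both GGUF and SafeTensors are supported. The enum and logic
// are illustrative, not shimmy's code.

use std::fs::File;
use std::io::Read;
use std::path::Path;

#[derive(Debug)]
enum ModelFormat {
    Gguf,
    SafeTensors,
    Unknown,
}

fn sniff_format(path: &Path) -> std::io::Result<ModelFormat> {
    let mut magic = [0u8; 4];
    File::open(path)?.read_exact(&mut magic)?;

    // GGUF files start with the ASCII magic "GGUF".
    if &magic == b"GGUF" {
        return Ok(ModelFormat::Gguf);
    }
    // SafeTensors has no magic number, so fall back to the extension;
    // a stricter check would validate the 8-byte JSON header length.
    match path.extension().and_then(|e| e.to_str()) {
        Some("safetensors") => Ok(ModelFormat::SafeTensors),
        _ => Ok(ModelFormat::Unknown),
    }
}

fn main() -> std::io::Result<()> {
    println!("{:?}", sniff_format(Path::new("model.safetensors"))?);
    Ok(())
}
```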

  GitHub: https://github.com/Michael-A-Kuykendall/shimmy
  Install: `cargo install shimmy`

  Happy to answer questions about the SafeTensors implementation or Rust AI inference in general!