Show HN: 5MB Rust binary that runs HuggingFace models (no Python)
by MKuykendall
Hi HN! I built Shimmy, a lightweight AI inference server that can now load HuggingFace SafeTensors models directly without any Python dependencies.

  The core problem: I wanted to run HuggingFace models locally but didn't want the heavyweight Python ML stack. Most solutions require Python + PyTorch + the transformers library, which can be 2GB+ just for dependencies.

  What's new in v1.2.0:
  • Native SafeTensors support - loads .safetensors files directly in Rust
  • 2x faster model loading compared to traditional formats
  • Zero Python dependencies - pure Rust implementation
  • Still just a 5MB binary (vs 50MB+ alternatives like Ollama)
  • Full OpenAI API compatibility for drop-in replacement
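
  To give a feel for the OpenAI compatibility, here's a sketch of a chat completion request from Rust. The port (11435) and model name are assumptions, so substitute whatever your shimmy instance is actually serving; any existing OpenAI client should work the same way once you point its base URL at shimmy.

```rust
// Minimal sketch of a chat completion request against shimmy's
// OpenAI-compatible endpoint. The host/port (localhost:11435) and
// the model name are assumptions -- substitute your own.
//
// [dependencies]
// reqwest = { version = "0.12", features = ["blocking", "json"] }
// serde_json = "1"

use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "phi-3-mini",  // assumed model name
        "messages": [
            { "role": "user", "content": "Say hello in five words." }
        ],
        "max_tokens": 32
    });

    // Standard OpenAI-style route, so existing clients can be
    // repointed here without code changes.
    let resp = reqwest::blocking::Client::new()
        .post("http://localhost:11435/v1/chat/completions")
        .json(&body)
        .send()?
        .text()?;

    println!("{resp}");
    Ok(())
}
```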

  Technical details:
  - Built with native SafeTensors parsing, not Python bindings (format sketched below)
  - Memory-efficient tensor loading with bounds checking
  - Tested with 100MB+ models, with sub-second loading
  - Cross-platform: Windows, macOS (Intel/ARM), Linux
  - Supports mixed model formats (GGUF + SafeTensors)
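
  Part of why a pure-Rust loader is practical is that the SafeTensors format is deliberately simple: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor data. Here's an illustrative header reader; this is my sketch of the file format, not shimmy's actual code, and the serde_json dependency and the header size cap are assumptions.

```rust
// Sketch of reading a SafeTensors header (illustrative, not shimmy's code).
//
// [dependencies]
// serde_json = "1"

use std::fs::File;
use std::io::Read;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut f = File::open("model.safetensors")?; // path is an assumption

    // First 8 bytes: length of the JSON header, little-endian u64.
    let mut len_buf = [0u8; 8];
    f.read_exact(&mut len_buf)?;
    let header_len = u64::from_le_bytes(len_buf) as usize;

    // Bounds check before allocating: a corrupt file could claim a
    // multi-gigabyte header. (The cap here is an arbitrary example.)
    if header_len > 100 * 1024 * 1024 {
        return Err("unreasonable header size".into());
    }

    // The JSON header maps tensor names to dtype, shape, and byte
    // offsets into the data section that follows it.
    let mut header_bytes = vec![0u8; header_len];
    f.read_exact(&mut header_bytes)?;
    let header: serde_json::Value = serde_json::from_slice(&header_bytes)?;

    for (name, meta) in header.as_object().into_iter().flatten() {
        if name.as_str() == "__metadata__" {
            continue; // optional free-form metadata block
        }
        println!("{name}: dtype={} shape={}", meta["dtype"], meta["shape"]);
    }
    Ok(())
}
```

  The bounds checking mentioned above matters exactly here: the offsets and lengths coming out of that JSON have to be validated against the actual file size before any slice of tensor data is handed out.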

  This bridges the gap between HuggingFace's model ecosystem and lightweight local deployment. You can now grab any SafeTensors model from HuggingFace and run it locally with just a single binary.
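
  On the mixed-format bullet above: GGUF files begin with the ASCII magic "GGUF", while SafeTensors files have no magic number, so detection has to fall back on the extension (or on trying to parse the 8-byte header length). A hypothetical routing sketch, not shimmy's actual dispatch code:

```rust
// Hypothetical sketch of routing a model file to the right loader
// when both GGUF and SafeTensors are supported. The enum and logic
// are illustrative, not shimmy's code.

use std::fs::File;
use std::io::Read;
use std::path::Path;

#[derive(Debug)]
enum ModelFormat {
    Gguf,
    SafeTensors,
    Unknown,
}

fn sniff_format(path: &Path) -> std::io::Result<ModelFormat> {
    let mut magic = [0u8; 4];
    File::open(path)?.read_exact(&mut magic)?;

    // GGUF files start with the ASCII magic "GGUF".
    if &magic == b"GGUF" {
        return Ok(ModelFormat::Gguf);
    }
    // SafeTensors has no magic number, so fall back to the extension;
    // a stricter check would validate the 8-byte JSON header length.
    match path.extension().and_then(|e| e.to_str()) {
        Some("safetensors") => Ok(ModelFormat::SafeTensors),
        _ => Ok(ModelFormat::Unknown),
    }
}

fn main() -> std::io::Result<()> {
    println!("{:?}", sniff_format(Path::new("model.safetensors"))?);
    Ok(())
}
```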

  GitHub: https://github.com/Michael-A-Kuykendall/shimmy
  Install: `cargo install shimmy`

  Happy to answer questions about the SafeTensors implementation or Rust AI inference in general!