HN Reader

Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust

Hey HN! We’ve just open-sourced model2vec-rs, a Rust crate for loading and running Model2Vec static embedding models with zero Python dependency. This lets you embed text at (very) high throughput, for example in a Rust-based microservice or CLI tool. It can be used for semantic search, retrieval, RAG, or any other text embedding use case.

Main Features:

- Rust-native inference: Load any Model2Vec model from Hugging Face or your local path with StaticModel::from_pretrained(...).

- Tiny footprint: The crate itself is only ~1.7 MB, with embedding models between 7 and 30 MB.
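For readers new to static embeddings, here is a minimal sketch of the general idea: each token maps to a fixed vector, and a text's embedding is the mean of its token vectors. This is not the crate's code. The toy vocabulary, the whitespace "tokenizer", and the names `toy_table` and `embed` are all assumptions made up for illustration.

```rust
use std::collections::HashMap;

/// Toy static embedding table: one fixed vector per token.
/// The vocabulary and values are invented for illustration.
fn toy_table() -> HashMap<&'static str, Vec<f32>> {
    let mut table = HashMap::new();
    table.insert("fast", vec![1.0, 0.0, 0.0]);
    table.insert("static", vec![0.0, 1.0, 0.0]);
    table.insert("embeddings", vec![0.0, 0.0, 1.0]);
    table
}

/// Embed text by averaging the vectors of its known tokens.
/// Whitespace splitting stands in for a real tokenizer (assumption).
fn embed(table: &HashMap<&'static str, Vec<f32>>, text: &str, dim: usize) -> Vec<f32> {
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for token in text.split_whitespace() {
        if let Some(row) = table.get(token) {
            for (s, v) in sum.iter_mut().zip(row) {
                *s += *v;
            }
            count += 1;
        }
    }
    // Mean pooling; unknown tokens are skipped, empty input stays all zeros.
    if count > 0 {
        for s in sum.iter_mut() {
            *s /= count as f32;
        }
    }
    sum
}

fn main() {
    let table = toy_table();
    let v = embed(&table, "fast static embeddings", 3);
    println!("{:?}", v);
}
```

Because there is no attention or other per-input computation beyond a lookup and an average, inference of this kind is cheap, which is where the throughput comes from. Note that plain mean pooling ignores word order.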

Performance:

We benchmarked single-threaded on a CPU:

- Python: ~4650 embeddings/sec

- Rust: ~8000 embeddings/sec (~1.7× speedup)
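As a rough sketch of how numbers like these can be produced and compared, the snippet below times a single-threaded loop and computes the speedup ratio from the two figures quoted above. The helper names `throughput`, `speedup`, and `measure` are hypothetical, and the workload is a stand-in, not the benchmark the authors ran.

```rust
use std::time::Instant;

/// Items processed per second, given a count and elapsed seconds.
fn throughput(items: usize, elapsed_secs: f64) -> f64 {
    items as f64 / elapsed_secs
}

/// Relative speedup of `new` over `baseline` (both in items/sec).
fn speedup(baseline: f64, new: f64) -> f64 {
    new / baseline
}

/// Time a closure over a batch of inputs on the current thread.
fn measure<F: FnMut(&str)>(inputs: &[&str], mut work: F) -> f64 {
    let start = Instant::now();
    for &text in inputs {
        work(text);
    }
    throughput(inputs.len(), start.elapsed().as_secs_f64())
}

fn main() {
    // Figures quoted in the post: 8000 / 4650 is roughly 1.72.
    let ratio = speedup(4650.0, 8000.0);
    println!("speedup: {:.2}x", ratio);

    // Stand-in workload; a real benchmark would call the embedding model here.
    let inputs = vec!["hello world"; 10_000];
    let mut total = 0usize;
    let rate = measure(&inputs, |t| total += t.len());
    println!("{:.0} items/sec (checksum {})", rate, total);
}
```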

First open-source project in Rust for us, so would be great to get some feedback!

How does it handle documents longer than the context length of the model? Sorry, there are a ton of these posted regularly, and they don't usually think about this.

Edit: it seems like it just splits into sentences, which is a weird thing to do given that in English only about 95% agreement is even possible on what a sentence is.

```
// Process in batches
for batch in sentences.chunks(batch_size) {
    // Truncate each sentence to max_length * median_token_length chars
    let truncated: Vec<&str> = batch
        .iter()
        .map(|text| {
            if let Some(max_tok) = max_length {
                Self::truncate_str(text, max_tok, self.median_token_length)
            } else {
                text.as_str()
            }
        })
        .collect();
```

8 hours ago by gthompson512
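Going by the comment in the quoted snippet, each input is cut to roughly `max_length * median_token_length` characters, so anything past that limit would be dropped rather than chunked. A minimal sketch of such a character-based truncation follows. The function name `truncate_chars` and its body are assumptions for illustration, not the crate's actual `truncate_str`.

```rust
/// Hypothetical helper: keep at most `max_tokens * median_token_len` characters.
/// Cuts on a char boundary so multi-byte UTF-8 text never causes a panic.
fn truncate_chars(text: &str, max_tokens: usize, median_token_len: usize) -> &str {
    let max_chars = max_tokens.saturating_mul(median_token_len);
    match text.char_indices().nth(max_chars) {
        // Byte offset of the first character past the limit.
        Some((byte_idx, _)) => &text[..byte_idx],
        // Text is already short enough.
        None => text,
    }
}

fn main() {
    let doc = "héllo wörld, this is a long document";
    // 2 tokens * 5 chars per token = 10 characters kept.
    let cut = truncate_chars(doc, 2, 5);
    println!("{cut}");
}
```

Counting characters instead of tokens avoids tokenizing text that will be thrown away, at the cost of only approximating the token limit. Callers who need the full document embedded would have to chunk it themselves before encoding.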

What is your preferred static text embedding model?

For someone looking to build a large embedding search, fast static embeddings seem like a good deal, but almost too good to be true. What quality tradeoff are you seeing with these models versus embedding models with attention mechanisms?

16 hours ago by noahbp

How do I load a custom model instead of the ones on Hugging Face?

5 hours ago by badmonster

Surprised it is so much faster. I would have thought the Python one is C under the hood.

16 hours ago by Havoc

I love that you're doing this, Tananon.

We've been using Candle and Cudarc and having a fairly good time of it. We've built a real-time drawing app on a custom LCM stack, and Rust makes it feel rock solid. Python is way too flimsy for something like this.

The more the Rust ML ecosystem grows, the better. It's a little bit fledgling right now, so every little bit counts.

If llama.cpp had instead been llama.rs, I feel like we would have had a runaway success.

We'll be checking this out! Kudos, and keep it up!

9 hours ago by echelon