HN Reader

328

135

7 hours agoby tosh

Coolest part of Qwen3-Next, in my opinion, (after the linear attention parts) is that they do MTP without adding another un-embedding matrix.

Deepseek R1 also has a MTP layer (layer 61) https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/mod...

But Deepseek R1 adds embed_tokens and shared_head.head tensors, which are [129280, 7168] or about 2GB in size at FP8.

Qwen3-Next doesn't have that: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct/blob...

So it saves a few GB in active parameters for MTP, which is a Big Deal. This is one of the changes that helps significantly speeds up inference.

6 hours agoby jychang

> The Qwen3-Next-80B-A3B-Instruct performs comparably to our flagship model Qwen3-235B-A22B-Instruct-2507, and shows clear advantages in tasks requiring ultra-long context (up to 256K tokens).

This is pretty impressive and a bit like how the GPT-OSS-120B came out and scored pretty well on the benchmarks despite its somewhat limited size.

That said, using LLMs for software dev use cases, I wouldn't call 256K tokens "ultra-long" context, I regularly go over 100K when working on tasks with bigger scope, e.g.:

  Look at the existing code related to this functionality and the existing design patterns in the code as well as the guidelines.
  Then plan out the implementation in detail and ask me a few questions along the way to figure the details out better.
  Finally, based on everything so far, do the actual implementation.
  Then look it over and tell me if anything has been missed from the plan, then refactor the code in any number of ways.

It could be split up into multiple separate tasks, but I find that the context being more complete (unless the model starts looping garbage, which poisons the context) leads to better results.

My current setup of running Qwen3 Coder 480B on Cerebras bumps into the 131K token limit. If not for the inference speed there (seriously great) and good enough model quality, I'd probably look more in the direction of Gemini or Claude again.

22 minutes agoby KronisLV

Alibaba keeps releasing gold content

I just tried Qwen3-Next-80B-A3B on Qwen chat, and it's fast! The quality seem to match Qwen3-235B-A22B. Quite impressive how they achieved this. Can't wait for the benchmarks at Artificial analysis

According to Qwen Chat, Qwen3-Next has the following limits:

Maximum context length: 262,144 tokens

Max summary generation length: 32,768 tokens

This is 2x higher on context length and 4x higher on summary generation compared to Qwen3-235B-A22B, damn

> Qwen3-Next [...] excels in ultra-long-context understanding and complex tasks

Even though their new hybrid architecture is fascinating, I think I'll continue to stick with Qwen2.5-Turbo because it's one of the few models that supports 1M tokens in context length. My use case is uploading large pdfs and ask questions across chapters

3 hours agoby Alifatisk

The same week Oracle is forecasting huge data center demand and the stock is rallying. If these 10x gains in efficiency hold true then this could lead to a lot less demand for Nvidia, Oracle, Coreweave etc

2 hours agoby mmmllm

The craziest part is how far MoE has come thanks to Qwen. This beats all those 72B dense models we’ve had before and runs faster than 14B model depending on how you off load your VRAM and CPU. That’s insane.

5 hours agoby syntaxing

llm -m qwen3-next-80b-a3b-thinking "An ASCII of spongebob"

Here's a classic ASCII art representation of SpongeBob SquarePants:

     .------.
    /  o o  \
   |         |
   |  \___/  |
    \_______/


  llm -m chutes/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
  "An ASCII of spongebob"

  Here's an ASCII art of SpongeBob SquarePants:

      .--..--..--..--..--..--.
    .' \  (`._   (_)     _   \
  .'    |  '._)         (_)  |
  \ _.')\      .----..--.'  /
  |(_.'  |    /    .-\-.  \---.
  \     0|    |   ( O| O) |    |
   |  _  |  .--.____.'._.-.  |
   \ (_) | o         -` .-`  |
    |    \   |`-._ _ _ _ _\ /
    \    |   |  `. |_||_|   |
     | o |    \_       \    |     -.   .-.
     |.-.  \    `--..-'   O |     `.`-' .'
   _.'  .' |     `-.-'      /-.__   ' .-'
  .' `-.` '.|='=.='=.='=.='=|._/_ `-'.'
  `-._  `.  |________/\_____|    `-.'
     .'   ).| '=' '='\/ '=' |
     `._.`  '---------------'
             //___\   //___\
               ||       ||
               ||_.-.   ||_.-.
              (_.--__) (_.--__)

Meta: I generated a few dozen spongebobs last night on the same model and NONE where as good as this. Most started well but collapsed into decoherence at the end - missing the legs off. Then this morning the very same prompt to the same model API produced a perfect bob on the first attempt. Can utilization affect response quality, if all else remains constant? Or was it just random luck?

Edit: Ok, the very next attempt, a few minutes later, failed, so I guess it is just random, and you have about a 1 in 10 chance of getting a perfect spongebob from qwen3-coder, and ~0 chance with qwen3-next.

5 hours agoby irthomasthomas

For anyone curious about what the Gated Delta Network is: https://arxiv.org/pdf/2412.06464

5 hours agoby mynti

Seems impressive, i believe better architectures are really the path forward, i don't think you need more than 100B params taking this model and what GPT OSS 120B can acchieve

6 hours agoby Jgoauh

https://archive.is/JH9XL

6 hours agoby croemer

Hmm. 80B. These days I am on the lookout for new models in the 32B range, since that is what fits and runs comfortably on my MacBook Pro (M4, 64GB).

I use ollama every day for spam filtering: gemma3:27b works great, but I use gpt-oss:20b on a daily basis because it's so much faster and comparable in performance.

4 hours agoby jwr

Complete newbie here - some questions, if I may!

This stuff can run on a local machine without internet access, correct?

And it can pretty much match Nano Banana? https://github.com/PicoTrex/Awesome-Nano-Banana-images/blob/...

Also -- what are the specs for a machine to run it (even if slowly!)

5 hours agoby slimebot80

qwen3-max was also released last week.

36 minutes agoby esafak

I’ve been using gpt-oss-120B with CPU MoE offloading on a 24GB GPU and it’s very usable. Excited to see if I can get good results on this now!

1 hour agoby binary132

I was getting a bunch of strange hallucinations and weird dialog. It sounds like some exasperated person on the verge of a mental breakdown

2 hours agoby kristopolous