Factorization is key here. It separates dataset-level structure from observation-level computation, so the model doesn't waste capacity rediscovering that structure.
I've been arguing the same for code generation: LLMs flatten parse trees into token sequences, then burn compute reconstructing the hierarchy as hidden states. Graph transformers could be a good solution to both problems: https://manidoraisamy.com/ai-mother-tongue.html — a quick sketch of the flattening is below.
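To make the flattening concrete, here's a minimal sketch using Python's own ast and tokenize modules (the expression and variable names are purely illustrative): the same one-line statement appears once as the flat token sequence a sequence model is given, and once as the tree and edge list a graph-aware model could consume directly. The edge-list encoding at the end is just one plausible graph input, not any particular library's format.

```python
import ast
import io
import tokenize

src = "total = price * (1 + tax_rate)"  # toy expression, purely illustrative

# Flat view: roughly what a sequence model sees -- order is preserved,
# but nesting and operator precedence are implicit and must be re-inferred.
flat_tokens = [tok.string
               for tok in tokenize.generate_tokens(io.StringIO(src).readline)
               if tok.string.strip()]
print(flat_tokens)
# ['total', '=', 'price', '*', '(', '1', '+', 'tax_rate', ')']

# Hierarchical view: the parse tree makes grouping explicit.
tree = ast.parse(src)
print(ast.dump(tree.body[0], indent=2))  # indent= requires Python 3.9+

# Edge list a graph transformer could attend over directly
# (a hypothetical input encoding for illustration).
edges = [(type(parent).__name__, type(child).__name__)
         for parent in ast.walk(tree)
         for child in ast.iter_child_nodes(parent)]
print(edges)
```

The point of the contrast: the flat list forces the model to spend attention recovering the BinOp nesting that the tree and edge list hand it for free.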