I implemented a neural network from scratch in x86 assembly (no frameworks, no Python) to recognize handwritten digits from MNIST.
Feedback on performance optimizations or next steps is welcome.
Uses AVX-512 SIMD for parallel float32 ops (~7× faster than NumPy).
Runs in a lightweight Debian Slim Docker container.
The goal was to understand neural networks at the CPU level.
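For readers unfamiliar with the AVX-512 point above, here is a rough sketch of what "parallel float32 ops" can look like for the dot product at the core of a dense layer. It is written in C with intrinsics rather than raw assembly and is not taken from the project; the function name `dot_f32` is hypothetical, and the scalar fallback is an assumption added so the code still builds when AVX-512F is not enabled at compile time.

```c
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the project): dot product of two
 * float32 arrays of length n. */
static float dot_f32(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float sum = 0.0f;
#ifdef __AVX512F__
    /* One 512-bit register holds 16 float32 lanes. */
    __m512 acc = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);   /* unaligned load of 16 floats */
        __m512 vb = _mm512_loadu_ps(b + i);
        acc = _mm512_fmadd_ps(va, vb, acc);   /* acc += va * vb, per lane */
    }
    sum = _mm512_reduce_add_ps(acc);          /* horizontal sum of the 16 lanes */
#endif
    /* Scalar loop: handles the tail, or the whole array without AVX-512. */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Built with something like `-mavx512f` on a CPU that supports it, the main loop handles 16 multiply-adds per instruction; without the flag, only the scalar loop runs. A hand-written assembly version would presumably use the corresponding `vfmadd231ps` instruction on `zmm` registers.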
23 hours ago by mghaderi