A spiking neural network allegedly reached 1.088 billion parameters and trained from random initialization to a reported loss of 4.4 before the builder ran out of money. That is the claim. The more interesting question is whether the evidence says pure SNN language models are finally leaving the toy-demo phase.
I started out expecting the answer to be “not really.” Most prior work says direct training is the hard part. Spikes are binary events, gradients are awkward, and deeper networks tend to lose activity as signals propagate. The best-known prior baseline here, SpikeGPT, reported 45M- and 216M-parameter models and framed direct SNN language generation as notable partly because the field had not gotten very far at scale.
What changed my mind is not that one 18-year-old on the internet says he built a big model. It’s that the pattern in the self-reported results is exactly the kind of pattern you’d expect if scale starts changing SNN behavior: extreme sparsity stays intact, persistent memory becomes more useful, and odd capabilities appear mid-training. Some of that is verified by prior literature. Some is only the builder’s claim. Separating those two is the whole story.
Why a 1B spiking neural network matters now
The scale milestone itself is plausible but not independently verified. The builder claims a 1.088B-parameter pure SNN language model, trained from scratch, with a 12GB checkpoint including weights and optimizer state on GitHub. Without an external replication, that is still one person’s report.
Even so, the scale claim matters because it would move past the prior public direct-training baseline by a lot. SpikeGPT explicitly described 216M parameters as, to the authors’ knowledge, the largest backpropagation-trained SNN model at the time. Going from 216M to 1.088B is not a rounding error; it is roughly a 5x jump.
| Model | Reported scale | Training claim | Verification status |
|---|---|---|---|
| SpikeGPT | 45M, 216M | Directly trained SNN language model | Verified by paper |
| Indie project | 1.088B | Pure SNN from random init, stopped at 27K steps | Unverified beyond self-report |
The obvious objection is that parameter count alone proves little. True. A bad billion-parameter model is still a bad model. The builder says generations were “janky and nowhere near GPT-2 fluency yet,” which is the sort of limitation people usually omit when they are trying to sell you magic beans. That honesty counts for something.
The deeper reason scale matters is architectural. Training a pure SNN from random initialization is considered hard because spikes quantize continuous membrane values into binary events, which causes information loss and can produce vanishing spikes in deep layers. That is not forum folklore. It is the explicit premise of the paper on deep activity propagation via weight initialization, which studies how to preserve activity through deep SNNs.
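The vanishing-spike failure mode is easy to see in a toy simulation. The sketch below is pure Python with illustrative parameters, not the builder’s code or the paper’s method: it pushes a random binary spike vector through a stack of naively initialized layers with hard thresholding, and activity typically collapses within a few layers.

```python
import random

random.seed(0)

def layer_fire_rate(depth, width=256, threshold=1.0, scale=0.05):
    """Push a random binary spike vector through `depth` layers of
    fresh random Gaussian weights and hard thresholding; return the
    final firing rate. Toy feedforward pass, no training involved."""
    spikes = [1.0 if random.random() < 0.5 else 0.0 for _ in range(width)]
    for _ in range(depth):
        new = []
        for _ in range(width):
            # Membrane potential: weighted sum of incoming spikes.
            v = sum(random.gauss(0.0, scale) * s for s in spikes)
            # Binary spike: all sub-threshold detail is thrown away.
            new.append(1.0 if v >= threshold else 0.0)
        spikes = new
    return sum(spikes) / width

shallow = layer_fire_rate(depth=1)  # some activity survives one layer
deep = layer_fire_rate(depth=8)     # activity tends toward zero with depth
```

With naive initialization the firing rate drops sharply per layer, so a deep stack goes silent; spike-preserving initialization schemes exist precisely to keep this rate from collapsing.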
If a pure spiking neural network really did cross 1B parameters and still converge at all, the interesting question becomes less “can SNNs work?” and more “what behaviors appear only once they get large?”
What actually made this SNN converge
Here the evidence is mixed.
What is verified: deep SNNs are hard to train, and initialization matters a lot. The Deep activity propagation paper argues standard ANN-style initialization misses the special problem in SNNs: binary spiking can kill activity as depth increases. Their proposed initialization is designed specifically to preserve spikes through deep networks, and they show better convergence in experiments.
What is plausible but unverified in this specific project: the model converged because of a particular set of bounded neuron dynamics, surrogate gradients, and memory-heavy recurrent design choices. We do not have a paper, ablation study, or independent reproduction. We have code references, a checkpoint claim, and discussion in comments.
That comment thread is more useful than it sounds. One technically detailed reply drills into exactly the right failure points: decay initialization, clamping versus functional bounding, stochastic versus deterministic spikes, surrogate choice like sigmoid(4x) versus Atan(2), and separate positive/negative traces. In other words, the discussion immediately moved to the known pain points of SNN training, not vague cheering.
That matters because it suggests the builder is operating inside the real design space, not inventing sci-fi nouns. The linked traceTorch project is relevant here too. It is a real PyTorch library built around SNN, RNN, and state-space-style recurrent layers, with explicit state management, multiple spiking layer types, and support for persistent hidden dynamics. That does not validate this 1B run. It does validate that the surrounding implementation ideas are standard enough to be legible.
The missing piece is the one you’d want most: an ablation showing why this model converged when others fail. Right now, “a pure SNN converged from random init” is unverified outside the builder’s own report. “Initialization and spike-preserving dynamics are central to whether deep SNNs train at all” is well supported by literature.
The three results that change the SNN conversation
The first result is ~93% sparsity, meaning roughly 7% of neurons fire per token. That number is unverified for this run, but the general premise is verified: sparse, event-driven activations are the whole reason SNNs attract interest. SpikeGPT argued neuromorphic hardware could cut operations by 20x when it can exploit those sparse events.
This is where people often get sloppy. 93% sparsity does not automatically mean 93% lower real-world cost. On ordinary GPUs, irregular sparsity is often annoying rather than cheap. Memory movement, kernel overhead, and control flow can eat the gains. On neuromorphic hardware, though, sparse activations are much closer to the hardware’s native economics. That is why the builder’s question about Loihi is sensible.
The second result is the weird one: Russian text reportedly emerged around 25K steps without special weighting in the dataset mix. That is entirely unverified. It could be real emergence. It could be contamination from multilingual training data. It could be cherry-picked samples. Treat it as an anecdote, not a finding.
Still, the anecdote is interesting for one reason. In large models, odd capabilities often show up as phase changes during training rather than smooth progress. We have seen versions of that pattern elsewhere. If sparse recurrent SNNs have their own capability thresholds, the exact question for future work is not “did Russian appear?” but “what metrics jumped around that step?” Loss curve? Routing entropy? Layer firing rates? No data, no conclusion.
The third result is the strongest: a reported 39% shift of activation routing into a persistent memory module when scaling from 600M to 1B. Again, this is only the builder’s claim. But if it holds up, it is the one result that would matter most strategically.
Why? Because it hints that scale may push SNNs toward memory-first computation. A sparse recurrent system that increasingly routes work through persistent state is not just a worse transformer. It is a different tradeoff surface. That connects directly to why memory architectures are getting renewed attention in mainstream AI as well; see our piece on the AI memory system. Bigger models may not just need more parameters. They may need better places to store useful temporal structure.
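To be concrete about what a “routing shift” would even measure, here is one plausible accounting: the fraction of total activation events attributed to each module. The per-module counts below are entirely hypothetical; we have no real numbers from the run.

```python
def memory_routing_share(module_activity):
    """Fraction of total activation events handled by the
    persistent-memory module. Module names are illustrative."""
    total = sum(module_activity.values())
    return module_activity["memory"] / total

# Hypothetical per-module spike counts at two model scales.
small_model = {"feedforward": 700, "recurrent": 200, "memory": 100}
large_model = {"feedforward": 450, "recurrent": 160, "memory": 390}

shift = memory_routing_share(large_model) - memory_routing_share(small_model)
```

A metric like this is cheap to log during training, which is why a verified version of the 39% claim would be easy for a replication to check.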
What the work does and doesn’t prove
Here is the clean version.
Verified:
– Deep SNNs are unusually hard to train from scratch because spiking can kill activity in deep networks.
– Prior direct-training SNN language models existed, including SpikeGPT at 45M and 216M parameters.
– Sparse activations are the core promise of SNNs, especially for neuromorphic hardware.
– Recurrent state handling and explicit memory dynamics are normal, important design concerns in SNN tooling.
Plausible but not independently verified:
– A pure spiking neural network reached 1.088B parameters.
– It trained from random initialization to loss 4.4 in 27K steps.
– It maintained ~93% sparsity.
– Scaling caused a 39% routing shift into persistent memory.
Unverified anecdote:
– The mid-training emergence of structurally correct Russian text.
That leaves us with a conclusion that is narrower than the hype and more interesting than the dismissal. This does not prove SNNs are ready to replace dense language models. The reported loss is high. The generations are reportedly poor. There is no benchmark table, no energy measurement on target hardware, and no third-party reproduction.
But it may prove something else: once SNN language models get large enough, the main story might stop being “can they imitate transformers?” and start being “what new training dynamics appear when sparse recurrence and persistent memory have enough scale to matter?” That is a much better question. It also overlaps with adjacent efforts in neuro-symbolic AI and more automated research workflows like Karpathy Autoresearch: the frontier is shifting from raw size to what structures let models use computation more selectively.
Key Takeaways
- The 1.088B-parameter spiking neural network is a meaningful claim, not a verified milestone. Treat it as promising until reproduced.
- The strongest evidence is historical, not personal: prior papers already show direct SNN language training is possible and deep SNN convergence depends heavily on initialization.
- 93% sparsity is only valuable if hardware can exploit it. On neuromorphic systems, maybe. On GPUs, not automatically.
- The memory-routing shift is the most important reported result. If real, it suggests scale changes how SNNs compute, not just how big they are.
- The right question has changed. For large SNNs, the bottleneck may be less “can they work?” and more “what architectures emerge once they do?”
Further Reading
- SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks, Prior direct-training SNN language model baseline at 45M and 216M parameters.
- Deep activity propagation via weight initialization in spiking neural networks, Why deep SNNs are hard to train and how initialization can preserve activity.
- traceTorch, A PyTorch library for SNNs, RNNs, and SSMs that shows the practical state-management patterns discussed here.
- IBM’s TrueNorth spiking-neural-network work, Historical background on why neuromorphic hardware keeps pulling people back to spiking models.
A billion-parameter SNN is not the story. The story is that if these results replicate, spiking neural network research may be entering the phase where scale stops being a bragging right and starts being a source of new behavior.
