Not all is lost, however. The outcome was a very in-depth neural network atlas for the Qwen3-8B model, complete with its own queryable SQLite database, which I can now share with you all. The database combines these methods for a full deep dive:
- Neuron Taxonomy
- Category Separation Scoring
- Co-activation Analysis
- Per-Head Decomposition
- Component Comparison
- Attribution Patching
- Sparse Non-negative Matrix Factorization
- NeuronLens
- DAS SVD rotation
- Cross-layer Coherence
- SQLite database
So if you've ever wondered where a specific behaviour or ability lives in the hidden dimensions of Qwen3-8B, or perhaps wanted to make informed quantization decisions, please enjoy the fruits of my ill-informed labour lol. 😂
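As a rough idea of what querying an atlas like this could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `neurons`, its columns, and the sample rows are all hypothetical placeholders for illustration; the real schema in the atlas may look quite different, so check it before adapting this.

```python
import sqlite3

# Hypothetical schema and toy rows: the real atlas tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE neurons (layer INTEGER, neuron INTEGER, category TEXT, separation REAL)"
)
conn.executemany(
    "INSERT INTO neurons VALUES (?, ?, ?, ?)",
    [
        (12, 301, "math", 0.91),
        (12, 77, "code", 0.64),
        (30, 1502, "math", 0.83),
        (5, 9, "math", 0.12),
    ],
)


def top_neurons(conn, category, limit=2):
    """Return (layer, neuron, separation) rows for a category, strongest first."""
    cur = conn.execute(
        "SELECT layer, neuron, separation FROM neurons "
        "WHERE category = ? ORDER BY separation DESC LIMIT ?",
        (category, limit),
    )
    return cur.fetchall()


rows = top_neurons(conn, "math")
print(rows)
```

To point this at a real file, you would swap `":memory:"` for the path to the downloaded database and list its actual tables first with `SELECT name FROM sqlite_master WHERE type = 'table'`.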
juiceb0xc0de/qwen3-8b-atlas
Qwen/Qwen3-8B
Apply for a GPU community grant: Personal project
Filter Models page by Base Models only
I applaud you on your journey into the void with small models. I too am deeply fascinated with optimizing smaller models rather than asking for more parameters and terabytes of scraped internet data. I hope to see what you've come up with in a few weeks' time.
I just finished designing a sparsity training scheduler that trains, on average, 35% of a model's available weights with almost no hidden dimensions between transformers adjoined and zero throughput while randomizing trainable locations. It cuts VRAM use and training time, and the models set higher benchmarks on mathematics than FFT models trained on the same corpus. I discovered this while fucking around for fun.
I don't doubt that training smaller architectures has many more surprises in store for us.
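The scheduler itself isn't described in any detail above, so the following is only a toy illustration of the general idea of updating a randomly chosen subset of weights on each step. The 35% fraction comes from the comment; the function names, the plain SGD update, and re-drawing the mask per step are assumptions made for the sketch, not the actual method.

```python
import random


def random_update_mask(num_weights, fraction=0.35, rng=None):
    """Pick a random subset of weight indices that may be updated this step."""
    rng = rng or random.Random()
    k = round(num_weights * fraction)
    chosen = set(rng.sample(range(num_weights), k))
    return [i in chosen for i in range(num_weights)]


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Apply a plain SGD update only where the mask is True; other weights stay frozen."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]


# Toy example: 20 weights, all with the same gradient (assumed values).
rng = random.Random(0)
weights = [1.0] * 20
grads = [0.5] * 20

mask = random_update_mask(len(weights), fraction=0.35, rng=rng)
new_weights = masked_sgd_step(weights, grads, mask)

updated = sum(1 for old, new in zip(weights, new_weights) if old != new)
print(f"updated {updated} of {len(weights)} weights")
```

In a real training loop the memory savings would come from only keeping gradients and optimizer state for the selected weights, which this list-based sketch does not attempt to model.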
@danielhanchen what happened to this magnificent model!? I had the perfect place to slot it into my team of AI bros! I would love to see this back on HF. 🤗