A tiny 55-million-parameter model trained on 1.3 billion tokens from a custom dataset mixture, with a context length of 1024 tokens.
| Dataset | Weight |
|---|---|
| HuggingFaceFW/fineweb-edu | 50% |
| epfml/FineWeb-HQ | 30% |
| HuggingFaceTB/cosmopedia (stories split) | 20% |
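
To make the mixture concrete, the sketch below converts the listed weights into an approximate token budget per source and shows one way a sampler could draw sources in proportion to those weights. It assumes the weights apply to tokens (they may instead apply to samples), and the helper names `token_budget` and `sample_source` are hypothetical, not taken from the actual training code.

```python
import random

# Mixture weights in percent, as listed in the dataset table.
MIXTURE = {
    "HuggingFaceFW/fineweb-edu": 50,
    "epfml/FineWeb-HQ": 30,
    "HuggingFaceTB/cosmopedia (stories split)": 20,
}
TOTAL_TOKENS = 1_300_000_000  # 1.3 billion training tokens


def token_budget(mixture, total_tokens):
    """Approximate tokens per source, ASSUMING the weights are token-level."""
    return {name: total_tokens * pct // 100 for name, pct in mixture.items()}


def sample_source(mixture, rng):
    """Pick the source of the next training sample in proportion to its weight."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


budget = token_budget(MIXTURE, TOTAL_TOKENS)
for name, tokens in budget.items():
    print(f"{name}: {tokens:,} tokens")

# Draw a few sources with a fixed seed so the run is repeatable.
rng = random.Random(0)
print([sample_source(MIXTURE, rng) for _ in range(5)])
```

Under the same assumption, 1.3 billion tokens for 55 million parameters works out to roughly 24 training tokens per parameter.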
The tokenizer is a basic BPE tokenizer with a vocabulary size of 8,000, trained on an 80,000-sample subset of the same data mixture.
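
For readers unfamiliar with BPE, here is a minimal pure-Python sketch of how merges are learned up to a target vocabulary size. It is a toy illustration only: the model card does not say which library or pre-tokenization was used, the corpus below is made up, and `train_bpe`, `merge_pair`, and `encode` are hypothetical names. The real tokenizer targets a vocabulary of 8,000; the toy run uses 20.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(texts, vocab_size):
    """Learn merges until the vocabulary reaches `vocab_size` or no pairs remain."""
    # Each whitespace-separated word starts as a tuple of single characters.
    words = Counter(tuple(word) for text in texts for word in text.split())
    vocab = {ch for word in words for ch in word}
    merges = []
    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab.add(best[0] + best[1])
        # Apply the new merge to every word in the corpus.
        merged_words = Counter()
        for word, freq in words.items():
            merged_words[merge_pair(word, best)] += freq
        words = merged_words
    return merges, vocab


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus standing in for the 80,000-sample subset.
corpus = ["low low low lower lowest", "new newer newest"]
merges, vocab = train_bpe(corpus, vocab_size=20)
print(merges)
print(encode("lower", merges))
```

A small vocabulary such as 8,000 keeps the embedding table small, which matters at this parameter count, at the cost of longer token sequences per document.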
This model has not undergone any post-training.
This base model is best suited for fine-tuning on specific tasks. On its own it is very limited, but it can serve as a flexible foundation for narrow applications such as toxic comment detection or sentiment analysis.