A tiny 55-million-parameter model trained on 1.3 billion tokens from a custom dataset mixture, with a context length of 1024 tokens.

| Dataset | Weight |
|---|---|
| HuggingFaceFW/fineweb-edu | 50% |
| epfml/FineWeb-HQ | 30% |
| HuggingFaceTB/cosmopedia (stories split) | 20% |
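
To make the mixture concrete, here is a minimal sketch of how the weights translate into per-dataset token budgets and into a weighted choice of source for each training sample. It assumes the percentages apply to tokens, which the card does not state. The helper names `token_budgets` and `sample_sources` are hypothetical and are not taken from the actual training code.

```python
import random

# Mixture weights as listed in the table above.
MIXTURE = {
    "HuggingFaceFW/fineweb-edu": 0.50,
    "epfml/FineWeb-HQ": 0.30,
    "HuggingFaceTB/cosmopedia (stories split)": 0.20,
}
TOTAL_TOKENS = 1_300_000_000  # 1.3 billion training tokens


def token_budgets(mixture, total_tokens):
    """Split a total token budget across datasets in proportion to their weights.

    Assumption: weights are fractions of tokens, not of documents.
    """
    return {name: round(total_tokens * weight) for name, weight in mixture.items()}


def sample_sources(mixture, n, seed=0):
    """Draw n dataset names, each with probability equal to its mixture weight."""
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


budgets = token_budgets(MIXTURE, TOTAL_TOKENS)
for name, tokens in budgets.items():
    print(f"{name}: {tokens:,} tokens")
```

Calling `sample_sources(MIXTURE, 10)` returns a list of ten dataset names drawn according to the weights, which is one simple way to decide where each next sample comes from when interleaving streams.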

The tokenizer is a basic BPE tokenizer with a vocabulary size of 8,000, trained on a subset of 80,000 samples drawn from this same data mixture.
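
For readers unfamiliar with BPE, the following is a toy, standard-library-only sketch of how byte-pair-encoding merges are learned. It is not the tokenizer code used for this model: it splits on whitespace, works on characters rather than bytes, and the functions `train_bpe`, `merge_pair`, and `encode_word` are hypothetical names chosen for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(texts, vocab_size):
    """Learn BPE merges from an iterable of strings until the vocab reaches vocab_size."""
    # Each distinct word is stored as a tuple of symbols, weighted by its frequency.
    words = Counter(tuple(w) for text in texts for w in text.split())
    vocab = {ch for word in words for ch in word}
    merges = []
    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab.add(best[0] + best[1])
        # Rewrite every word with the chosen pair merged.
        merged_words = Counter()
        for word, freq in words.items():
            merged_words[merge_pair(word, best)] += freq
        words = merged_words
    return merges, vocab


def encode_word(word, merges):
    """Tokenize one word by applying the learned merges in training order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["low low low lower lowest", "new newer newest"]
merges, vocab = train_bpe(corpus, vocab_size=20)
print(encode_word("lowest", merges))
```

The tokenizer described above corresponds to running this kind of procedure with a target vocabulary of 8,000 on 80,000 samples; in practice a dedicated library would be used instead of this toy loop.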

This model has not undergone any post-training.

This base model is best suited for fine-tuning on specific tasks. On its own, it is very limited, but it is a pretty flexible foundation for applications such as toxic comment detection or sentiment analysis.

Model size: 55.7M params (Safetensors), tensor type F32.