A tiny 55-million-parameter model trained on 1.3 billion tokens from a custom dataset mixture. The context length is 1024 tokens.

| Dataset | Weight |
| --- | --- |
| `HuggingFaceFW/fineweb-edu` | 50% |
| `epfml/FineWeb-HQ` | 30% |
| `HuggingFaceTB/cosmopedia` (stories split) | 20% |
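
As a rough illustration of what these weights imply, the sketch below splits the 1.3 billion training tokens across the three sources and draws a source at random in proportion to its weight. It assumes the weights are shares of the token budget, which the card does not state; the helper names `token_budget` and `sample_source` are hypothetical and are not taken from the actual training code.

```python
import random

# Mixture weights in percent, copied from the table above.
MIXTURE = {
    "HuggingFaceFW/fineweb-edu": 50,
    "epfml/FineWeb-HQ": 30,
    "HuggingFaceTB/cosmopedia (stories split)": 20,
}
TOTAL_TOKENS = 1_300_000_000  # 1.3 billion training tokens


def token_budget(mixture, total_tokens):
    """Split a token count across datasets in proportion to their weights.

    Assumption: weights are shares of the token budget (hypothetical helper).
    """
    return {name: total_tokens * pct // 100 for name, pct in mixture.items()}


def sample_source(mixture, rng):
    """Pick the dataset to draw the next sample from, weighted by the mixture."""
    names = list(mixture)
    weights = list(mixture.values())
    return rng.choices(names, weights=weights, k=1)[0]


budget = token_budget(MIXTURE, TOTAL_TOKENS)
for name, tokens in budget.items():
    print(f"{name}: {tokens:,} tokens")

rng = random.Random(0)  # fixed seed so the draw is reproducible
print("next sample comes from:", sample_source(MIXTURE, rng))
```

Under that assumption the budget works out to 650 million, 390 million, and 260 million tokens respectively.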

The tokenizer is a basic BPE tokenizer with a vocabulary size of 8,000, trained on an 80,000-sample subset of the same data mixture.
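
For readers unfamiliar with BPE, here is a minimal pure-Python sketch of how merge rules are learned. It is a toy illustration of the general technique, not the code used to train this model's tokenizer: the function name `train_bpe`, the whitespace pre-splitting, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def train_bpe(texts, num_merges):
    """Learn up to `num_merges` BPE merge rules from an iterable of strings.

    Toy sketch: words are split on whitespace and start as single characters.
    """
    # Map each word (as a tuple of symbols) to its frequency in the corpus.
    words = Counter(tuple(word) for text in texts for word in text.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        merged = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += freq
        words = merged
    return merges


print(train_bpe(["low low low lower"], num_merges=2))
```

In a scheme like this, the final vocabulary is the base alphabet plus one new symbol per merge, so a target vocabulary size such as 8,000 effectively fixes how many merges are learned.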

This model has not undergone any post-training.

This base model is best suited for fine-tuning on specific tasks. On its own it is very limited, but it is a flexible foundation for narrow applications such as toxic comment detection or sentiment analysis.

Model size: 55.7M params (Safetensors, F32 tensors)