How to use kiyam/Harrier-270M-Amharic with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("kiyam/Harrier-270M-Amharic")
sentences = [
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

This model is Microsoft Harrier OSS v1 270M fine-tuned with Amharic passage retrieval supervision. It was introduced in the paper "The Multilingual Curse at the Retrieval Layer: Evidence from Amharic".
Code: https://github.com/rasyosef/amharic-neural-ir
| Model | R@5 | R@10 | MRR@10 | NDCG@10 |
|---|---|---|---|---|
| microsoft/harrier-oss-v1-270m (zero-shot, prompted) | 0.697 | 0.753 | 0.576 | 0.619 |
| This model (fine-tuned) | 0.860 | 0.903 | 0.760 | 0.795 |
Fine-tuning yields a +32.0% relative MRR@10 gain over zero-shot. This is the strongest Amharic-fine-tuned multilingual retriever in the paper.
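The relative gain can be reproduced from the table with a few lines of arithmetic. This is a minimal sketch using the rounded scores shown above; the helper name `relative_gain` is hypothetical. From the rounded values, MRR@10 works out to roughly 31.9%, so the reported +32.0% is presumably computed from unrounded scores (an assumption, not stated in the source).

```python
# Scores copied from the table above (rounded to three decimals).
zero_shot = {"R@5": 0.697, "R@10": 0.753, "MRR@10": 0.576, "NDCG@10": 0.619}
fine_tuned = {"R@5": 0.860, "R@10": 0.903, "MRR@10": 0.760, "NDCG@10": 0.795}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Relative gain of the fine-tuned model over zero-shot, per metric.
gains = {metric: relative_gain(zero_shot[metric], fine_tuned[metric]) for metric in zero_shot}

for metric, gain in gains.items():
    print(f"{metric}: +{gain:.1f}%")
```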
Evaluation dataset: rasyosef/Amharic-Passage-Retrieval-Dataset-V2
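For readers unfamiliar with the metrics in the table, the sketch below shows how Recall@k, MRR@k, and NDCG@k are commonly computed from a ranked list. It assumes exactly one relevant passage per query, which may not match the actual evaluation setup; the function names and the toy passage IDs are hypothetical and are not taken from the paper's code.

```python
import math


def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant passage appears in the top k, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def mrr_at_k(ranked_ids, relevant_id, k):
    """Reciprocal rank of the relevant passage within the top k (0.0 if absent)."""
    for rank, passage_id in enumerate(ranked_ids[:k], start=1):
        if passage_id == relevant_id:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_id, k):
    """NDCG with binary relevance and one relevant passage (ideal DCG is 1)."""
    for rank, passage_id in enumerate(ranked_ids[:k], start=1):
        if passage_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Toy runs: (ranked passage IDs for one query, ID of its relevant passage).
runs = [
    (["p3", "p1", "p7"], "p1"),  # relevant passage at rank 2
    (["p2", "p9", "p4"], "p2"),  # relevant passage at rank 1
    (["p5", "p6", "p8"], "p0"),  # relevant passage not retrieved
]

recall_10 = mean(recall_at_k(ranked, rel, 10) for ranked, rel in runs)
mrr_10 = mean(mrr_at_k(ranked, rel, 10) for ranked, rel in runs)
ndcg_10 = mean(ndcg_at_k(ranked, rel, 10) for ranked, rel in runs)
print(recall_10, mrr_10, ndcg_10)
```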
This fine-tuned model is used without the zero-shot instruction prompt:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("kiyam/Harrier-270M-Amharic")
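queries = ["የኢትዮጵያ ዋና ከተማ የትኛው ናት?"]  # "Which city is the capital of Ethiopia?"
passages = ["አዲስ አበባ የኢትዮጵያ ዋና ከተማ ናት።"]  # "Addis Ababa is the capital of Ethiopia."
# Unit-normalize so the dot product below equals cosine similarity.
query_embeddings = model.encode(queries, normalize_embeddings=True)
passage_embeddings = model.encode(passages, normalize_embeddings=True)
# scores[i, j] is the similarity between query i and passage j.
scores = query_embeddings @ passage_embeddings.T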
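Once a score matrix like the one above is available, retrieval reduces to sorting each row. The following sketch uses small hand-written unit vectors in place of real model output so it runs without downloading the model; the helper `top_k` is a hypothetical name, not part of sentence-transformers.

```python
import numpy as np


def top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring passages per query, best first."""
    k = min(k, scores.shape[1])
    # Negate so that argsort's ascending order puts the largest scores first.
    return np.argsort(-scores, axis=1)[:, :k]


# Toy unit vectors standing in for model.encode(..., normalize_embeddings=True).
query_embeddings = np.array([[1.0, 0.0]])
passage_embeddings = np.array([[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]])

scores = query_embeddings @ passage_embeddings.T  # shape: (num_queries, num_passages)
ranking = top_k(scores, k=2)
print(ranking)
```

For large passage collections an approximate nearest-neighbor index is usually preferable to a full sort, but the ranking logic stays the same.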
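This model uses Matryoshka embeddings, so you can truncate to fewer dimensions (e.g. 256) for faster retrieval at a small quality cost. Truncate before normalizing so the shortened vectors stay unit-length, and apply the same truncation to queries and passages:
model = SentenceTransformer("kiyam/Harrier-270M-Amharic", truncate_dim=256)
query_embeddings = model.encode(queries, normalize_embeddings=True)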
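One detail worth checking when truncating by hand: slicing a unit-length vector leaves it shorter than unit length, so dot products no longer equal cosine similarity unless the result is renormalized. The sketch below illustrates this with random vectors in place of real embeddings. The full dimensionality of 640 and the helper name `truncate_and_normalize` are assumptions made for illustration only.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` dimensions, then rescale each row to unit L2 norm."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return truncated / np.clip(norms, 1e-12, None)


# Random stand-ins for full-size, unit-normalized embeddings
# (640 is an assumed dimensionality, not confirmed by the model card).
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 640))
full /= np.linalg.norm(full, axis=1, keepdims=True)

sliced_only = full[:, :256]                  # norms fall below 1 after slicing
fixed = truncate_and_normalize(full, 256)    # norms restored to 1

print(np.linalg.norm(sliced_only, axis=1))
print(np.linalg.norm(fixed, axis=1))
```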
Base model: microsoft/harrier-oss-v1-270m (270M parameters)
Training dataset: yosefw/amharic-news-retrieval-dataset-v2-with-negatives-V2

@inproceedings{alemneh2026amharicir,
title = {The Multilingual Curse at the Retrieval Layer: Evidence from Amharic},
author = {Alemneh, Yosef Worku and Mekonnen, Kidist Amde and de Rijke, Maarten},
booktitle = {Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM), ACL 2026},
year = {2026},
}