AI & ML interests
Natural Language Processing, Poetry Generation, Linguistics, Low-resource languages
Recent Activity
posted an update 12 days ago
New Russian Stylometry Dataset!
Russian Stylometric Dataset (RSD): 322 texts from the 19th to early 20th centuries (16 million words), prepared for analysis in stylo (R) and machine learning (Python).
What's inside?
Fiction, journalism, scientific texts, drama, poetry
Grouped by author, gender, age, genre, literary movements (Romanticism/Realism)
Character speech (Tolstoy, Gogol, Ostrovsky)
Generated texts (LSTM, GPT)
Use cases: authorship attribution, clustering, classification, benchmarking methods.
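For authorship attribution, one common starting point is Burrows's Delta, a distance measure that stylo also offers. Below is a minimal pure-Python sketch of the idea, assuming naive whitespace tokenization and toy placeholder strings in place of RSD texts. The function names `relative_freqs` and `burrows_delta` are hypothetical and are not part of RSD or stylo.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in a text."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in vocab]


def burrows_delta(known, unknown, n_mfw=5):
    """Return {author: delta} for an unknown text against known texts.

    known   -- dict mapping an author label to one text by that author
    unknown -- text whose author is to be guessed
    n_mfw   -- number of most frequent words used as features
    """
    # 1. Pick the most frequent words across the known texts.
    corpus_counts = Counter()
    for text in known.values():
        corpus_counts.update(text.lower().split())
    vocab = [w for w, _ in corpus_counts.most_common(n_mfw)]

    # 2. Relative frequencies per known text.
    profiles = {a: relative_freqs(t, vocab) for a, t in known.items()}

    # 3. Per-word mean and standard deviation across the known texts.
    columns = list(zip(*profiles.values()))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against zero spread

    def z_scores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, means, stds)]

    # 4. Delta = mean absolute difference of z-scores.
    z_unknown = z_scores(relative_freqs(unknown, vocab))
    return {
        a: mean([abs(x - y) for x, y in zip(z_scores(p), z_unknown)])
        for a, p in profiles.items()
    }


# Toy placeholder data, not taken from RSD.
known = {
    "author_a": "the cat and the dog and the bird",
    "author_b": "a cat or a dog or a bird but not a fish",
}
scores = burrows_delta(known, "the cat and the dog and the bird")
best_guess = min(scores, key=scores.get)
```

The lowest delta marks the closest known author. With a real corpus, the feature list would typically hold hundreds of frequent words, and tokenization would need to handle Russian punctuation and orthography properly.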
Public domain + GPL-3.0 license.
Learn more: https://github.com/nevmenandr/RSD
DOI: 10.5281/zenodo.20701309