Bijankhan Corpus
![](http://upload.wikimedia.org/wikipedia/commons/2/22/Bijankhan_Corpus_Logo.gif)
The Bijankhan corpus (Persian: پیکرهٔ بیجنخان) is a tagged corpus that is suitable for natural language processing (NLP) research on the Persian language. This collection is gathered from daily news and common texts. In this collection all documents are categorized into different subjects such as political, cultural, etc.; in about 4300 different subject categories. The corpus contains about 2.6 million manually tagged words with a tag set that contains 550 Persian part-of-speech tags.
The Bijankhan corpus was created by the Database Research Group at the University of Tehran.[1] The corpus is non-free in that it is not free for commercial use, although these restrictions vary by country. The Bijankhan corpus is named after Mahmood Bijankhan, professor of linguistics at the University of Tehran due to his contributions in this area.
See also
- Hamshahri Corpus
- Persian Today Corpus
References
- ^ "Database Research Group". Archived from the original on 2017-05-15. Retrieved 2016-12-25.
External links
- [1].
- v
- t
- e
English
- American National Corpus
- Bank of English
- Bergen Corpus of London Teenage Language
- British National Corpus
- Brown Corpus
- Buckeye Corpus
- Cambridge English Corpus
- Corpus of Contemporary American English
- Enron Corpus
- EnTenTen
- International Corpus of English
- Lancaster-Oslo-Bergen Corpus
- Oxford English Corpus
- PropBank
- Spoken English Corpus
- Switchboard Telephone Speech Corpus
- TIMIT
- VerbNet
- Wellington Corpus of Spoken New Zealand English
non-English
- Bijankhan Corpus
- CHILDES
- CorCenCC National Corpus of Contemporary Welsh
- Croatian Language Corpus
- Croatian National Corpus
- Czech National Corpus
- Europarl Corpus
- German Reference Corpus
- Hamshahri Corpus
- National Corpus of Polish
- Neo-Assyrian Text Corpus Project
- Persian Speech Corpus
- Quranic Arabic Corpus
- Russian National Corpus
- Scottish Corpus of Texts and Speech
- Slovenian National Corpus
- TalkBank
- Tatoeba
- Tehran Monolingual Corpus
- Tekstaro de Esperanto
- TenTen Corpus Family
- Thesaurus Linguae Graecae
![]() | This article about a digital library is a stub. You can help Wikipedia by expanding it. |
- v
- t
- e
![]() | This Indo-European languages-related article is a stub. You can help Wikipedia by expanding it. |
- v
- t
- e