Named entity recognition

Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.

Example:

| Mark | Watney | visited | Mars |
| --- | --- | --- | --- |
| B-PER | I-PER | O | B-LOC |
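
To make the notation concrete, here is a minimal sketch of how a BIO tag sequence can be decoded back into typed entity spans. The function name `bio_to_spans` and the lenient handling of a stray `I-` tag (treated as the start of a new entity) are assumptions made for illustration, not part of any dataset's official tooling.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (type, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        # The open entity continues only on an I- tag of the same type.
        continues = tag.startswith("I-") and current == tag[2:]
        if current is not None and not continues:
            spans.append((current, start, i))
            start, current = None, None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
        elif tag.startswith("I-") and current is None:
            # Assumption: a stray I- tag opens a new entity instead of raising.
            start, current = i, tag[2:]
    if current is not None:
        # Close an entity that runs to the end of the sentence.
        spans.append((current, start, len(tags)))
    return spans


tokens = ["Mark", "Watney", "visited", "Mars"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
entities = [(label, " ".join(tokens[s:e])) for label, s, e in bio_to_spans(tags)]
print(entities)  # [('PER', 'Mark Watney'), ('LOC', 'Mars')]
```

Spans are returned as token offsets rather than strings so that they can be compared directly against gold annotations.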

ArmanPersoNERCorpus

The ArmanPersoNERCorpus dataset contains 7,682 sentences with 250,015 tokens, tagged in IOB format with six classes: Organization, Person, Location, Facility, Event, and Product.

Download Links: ARMAN

| Model | F1 | Paper / Source | Code |
| --- | :---: | --- | --- |
| ParsBERT (Farahani et al., 2020) | 99.84 | ParsBERT: Transformer-based Model for Persian Language Understanding | Official |
| LSTM-CRF (Hafezi, Rezaeian, 2018) | 86.55 | Neural Architecture for Persian Named Entity Recognition | - |
| mBERT (Taher et al., 2020) | 84.03 | Beheshti-NER: Persian Named Entity Recognition Using BERT | Official |
| Deep-CRF (Bokaei, Mahmoudi, 2018) | 81.50 | Improved Deep Persian Named Entity Recognition | - |
| Deep-Local (Bokaei, Mahmoudi, 2018) | 79.19 | Improved Deep Persian Named Entity Recognition | - |
| BiLSTM-CRF (Poostchi et al., 2018) | 77.45 | BiLSTM-CRF for Persian Named-Entity Recognition | - |
| SVM-HMM (Poostchi et al., 2016) | 72.59 | PersoNER: Persian Named-Entity Recognition | - |

PEYMA

The PEYMA dataset includes 7,145 sentences with 302,530 tokens, of which 41,148 are tagged in IOB format with seven classes: Organization, Percent, Money, Location, Date, Time, and Person.

Download Links: PEYMA

| Model | F1 | Paper / Source | Code |
| --- | :---: | --- | --- |
| ParsBERT (Farahani et al., 2020) | 93.40 | ParsBERT: Transformer-based Model for Persian Language Understanding | Official |
| mBERT (Taher et al., 2020) | 90.59 | Beheshti-NER: Persian Named Entity Recognition Using BERT | Official |
| Rule-Based-CRF (Shahshahani et al., 2018) | 84.00 | PEYMA: A Tagged Corpus for Persian Named Entities | - |
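
The F1 scores reported on this page are taken from the cited papers, which may differ in how they match predictions to gold entities (for example, word-level versus phrase-level scoring). As a rough illustration only, the sketch below computes an exact-match, entity-level F1 over `(type, start, end)` spans. The function name `entity_f1` and the toy spans are hypothetical and are not drawn from either dataset.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, exclusive end token)


def entity_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1: a prediction counts only if type and both boundaries agree."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: the person span is correct, the location is mislabelled as ORG.
gold = {("PER", 0, 2), ("LOC", 3, 4)}
pred = {("PER", 0, 2), ("ORG", 3, 4)}
print(round(100 * entity_f1(gold, pred), 2))  # 50.0
```

Under this assumption a span with the right boundaries but the wrong type earns no credit, which is why the toy prediction scores 50.0 rather than 100.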