Handling Label Sparsity and Inconsistency in Named Entity Recognition
May 28, 2019 – August 9, 2019 (10 weeks)
Named Entity Recognition (NER), the identification of names of people, organizations, locations, etc., in text, is often considered a close-to-solved problem. For clean newswire text in high-resource languages with large amounts of training data, this claim may be accurate. But when training labels are sparse or inconsistent, NER performance drops dramatically. Sparse and inconsistent labels can arise in a variety of situations:
- A user wants to add one or more new entity types (e.g., vehicle or operating system), or to split a known type into several finer-grained types (e.g., doctor or scientist in place of person).
- A user corrects an error made by the NER system, and wants that correction to be honored in subsequent system output.
- NER training data may contain annotation errors and/or follow differing tagging conventions.
- Errors in system input (e.g., optical character recognition or speech-to-text errors) introduce character sequences that are poorly represented in the training data.
- System input that contains more than one language is poorly covered by NER models, which are customarily trained for a single language.
In Summer 2019, we investigated techniques that handle the sparse and inconsistent labels that arise in these contexts. Research questions addressed included:
- What are the features of sparse and inconsistent training sets that lead to the majority of errors made by existing approaches to NER?
- How can new entity types or refinements of old entity types be added with a minimum of new labels?
- Can multilingual training be used to reduce sparsity in a given language?
- How can partially annotated training sets be used most effectively?
- How should NER models adapt to corrected training data?
- How can neural NER models be effectively trained in the presence of label sparsity and inconsistency?
While NER over low-resource languages is an important problem, this project did not focus on label sparsity for such languages; the primary languages covered by this project were Russian, Chinese, and English. Performance was measured both with traditional intrinsic measures (e.g., precision and recall) and by an extrinsic retrieval task.
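The intrinsic measures mentioned above are commonly computed at the entity level. The following is a minimal sketch of exact-match (CoNLL-style) entity scoring, offered only as an illustration and not as the project's actual evaluation code. The span tuple layout and the function name `entity_prf` are hypothetical choices made for this example.

```python
from typing import Set, Tuple

# Hypothetical span representation (an assumption for this sketch):
# (sentence_id, start_token, end_token_exclusive, entity_type)
Span = Tuple[int, int, int, str]


def entity_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1.

    A predicted entity counts as correct only if both its boundaries and
    its type match a gold entity exactly.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 0, 2, "PER"), (0, 5, 6, "LOC"), (1, 3, 4, "ORG")}
    predicted = {(0, 0, 2, "PER"), (0, 5, 6, "ORG"), (1, 3, 4, "ORG"), (1, 7, 8, "LOC")}
    p, r, f = entity_prf(gold, predicted)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints "P=0.50 R=0.67 F1=0.57"
```

In the usage example, the span `(0, 5, 6)` has correct boundaries but the wrong type, so exact-match scoring counts it as both a false positive and a false negative. This is one reason inconsistent type conventions in training or test data can depress measured precision and recall at the same time.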