SCALE 2010

All-Source Knowledge Base Population

While traditional extraction research has focused on lexically‐anchored facts, typically within a sentence, SCALE 2010 explored research opportunities in inferring facts from non‐explicit or latent features spanning multiple utterances, documents, or conversations. Examples include inferring the relationship between dialogue participants based on an analysis of register, discourse structure, utterance length, emotion, prosody, etc.; and identifying the sentiment of speakers or authors towards specific entities or topics. Knowledge‐Base Population (KBP) is related to other areas of HLT such as extraction and question‐answering, but is focused on the insertion of information into a knowledge base.

Most prior HLT work has focused on vast quantities of English newswire. SCALE 2010 instead targeted different sources, especially informal communications, both spoken and written. The team also explored what could be done with multiple sources, an additional language, and limited training data. To address these challenges, SCALE 2010 centered on all‐source knowledge base population. Research directions included:

  • Multiple sources, including conversational speech and informal text genres
  • Languages: English and Arabic, individually and in combination
  • Linking entities into a knowledge base
  • Slot‐filling
  • Inference of higher‐order knowledge units such as relations and sentiment
  • Maintaining knowledge base viability when augmenting it over time

Johns Hopkins University

Human Language Technology Center of Excellence

810 Wyman Park Drive, Baltimore, MD 21218

  • 410-516-4800
