The team’s approach was to detect High-Information-Value Elements (HIVEs) in the foreign-language source and give them semantic structure, so that this information could inform the machine translation (MT) process and produce better translations.
When extracting information from a speech corpus, it is desirable to use speech processing tools, such as speech recognizers, that facilitate the task. In many situations such tools are not available because there is no transcribed speech with which to train the recognizers. The workshop addressed this problem by advancing the technology for automatically training speech recognizers without manual transcriptions. Previous work had demonstrated that speech recognizers trained automatically, without transcriptions, can be useful for information extraction from speech. That work established the feasibility of training speech recognizers without transcriptions, but much work was still required to improve performance and robustness. In addition, the team wanted to learn how to make more effective use of small amounts (less than an hour) of manually transcribed data as it becomes available. Training followed a sequential, hierarchical approach to learning: at each stage of the process, the team extracted information from the corpus and made it available to the next stage.
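The sequential, hierarchical idea can be illustrated with a minimal Python sketch. It is not the workshop's actual system. The stage functions `discover_units` and `count_units`, the `run_hierarchy` driver, and the use of character strings in place of audio are all hypothetical stand-ins, chosen only to show how each stage can read the corpus together with whatever earlier stages learned.

```python
from typing import Callable, Dict, List

# A stage reads the corpus plus everything learned so far and returns new knowledge.
Stage = Callable[[List[str], Dict[str, object]], Dict[str, object]]


def run_hierarchy(corpus: List[str], stages: List[Stage]) -> Dict[str, object]:
    """Run the stages in order, passing accumulated knowledge to each one."""
    knowledge: Dict[str, object] = {}
    for stage in stages:
        # Each stage sees the raw corpus and the output of all earlier stages.
        knowledge.update(stage(corpus, knowledge))
    return knowledge


def discover_units(corpus: List[str], knowledge: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical first stage: find the inventory of distinct symbols."""
    # Assumption: each string stands in for an utterance, each character for a sound unit.
    return {"units": sorted({symbol for utterance in corpus for symbol in utterance})}


def count_units(corpus: List[str], knowledge: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical second stage: count units, relying on the first stage's inventory."""
    counts = {unit: 0 for unit in knowledge["units"]}
    for utterance in corpus:
        for symbol in utterance:
            counts[symbol] += 1
    return {"unit_counts": counts}


result = run_hierarchy(["abba", "cab"], [discover_units, count_units])
```

The second stage cannot run without the inventory produced by the first, which is the dependency the staged description implies. A real system would replace these toy stages with acoustic and lexical learning steps.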