Given a report (from a newspaper or an information analyst) and a collection of material that contains supporting evidence, can one identify the relevant source materials and align them to this summary document? In keeping with the structure of SCALE 12, this project served as a high-level end task that decomposed into a number of interesting sub-projects, in this case relevant to DARPA’s DEFT program. At a coarse level, the task can be viewed as finding source documents that are relevant to a report, but the primary focus was on the finer-grained task of linking predicates and their arguments, both within and across documents.
Research in the field of automatic speaker recognition has made great progress over the past few years, as demonstrated by impressive performance in NIST Speaker Recognition Evaluations. However, real applications can present difficulties not captured by current efforts. In this workshop, we planned to address the following challenges:
Limited training data
Recent speaker recognition techniques often assume a large set of in-domain labeled development data from which to estimate model parameters. The focus of this problem was to develop new methods of training speaker recognition systems where labeled data is limited.
Noisy labels
A common assumption underlying many speaker recognition modeling techniques is that the labels on training and development data are 100% accurate. The challenge here was to understand and develop techniques for building speaker recognition systems when only noisy labels are available.
Large-scale speaker clustering with side-information
For some applications, speaker clustering is of more interest than per-cut speaker identification. This problem concerned algorithms for efficient speaker clustering, as well as the related problems of cluster merging and splitting.