Given a report (from a newspaper or an information analyst) and a collection of material that contains supporting evidence, can one identify and align the relevant source materials to this summary document? In keeping with the structure of SCALE 12, this project serves as a high-level end task that decomposes into a number of interesting sub-projects, in this case relevant to DARPA’s DEFT program. At a coarse level, the task can be viewed as finding source documents that are relevant to a report, but the primary focus will be on the finer-grained task of linking predicates and their arguments, both within and across documents.
Research in the field of automatic speaker recognition has made great progress over the past few years, as demonstrated by impressive performance in NIST Speaker Recognition Evaluations. However, real applications can present difficulties not captured by current efforts. In this workshop, we plan to address the following challenges:
Limited training data
Recent speaker recognition techniques often assume a large set of in-domain labeled development data from which to estimate model parameters. The focus of this problem is to develop new methods for training speaker recognition systems when labeled data is limited.
A common assumption underlying many speaker recognition modeling techniques is that the labels on training and development data are 100% accurate. The challenge here is to understand and develop techniques for building speaker recognition systems using only noisy labels.
Large-scale speaker clustering with side-information
For some applications, speaker clustering is of more interest than per-cut speaker identification. This problem covers research on algorithms that perform speaker clustering efficiently, as well as the related problems of cluster merging and splitting.