SCALE 2011

Vertex Nomination

If I know of a few “interesting” people, how can human language technology and graph theory help me find other “interesting” people? If I know of a few people committing a crime (e.g., fraud), how can I determine who their co-conspirators are?

Given a set of actors deemed “interesting”, we aim to find other actors who are similarly “interesting”. We are given a collection of informal communications (written and spoken) and a corresponding communications graph. In this graph, each vertex represents an actor and each edge connects a pair of actors who communicate. Attached to each edge is the set of documents in which that pair of actors communicates, providing content in context (i.e., the language of a communication in the context of who speaks to whom). In this set of documents, the actors identified as “interesting” communicate with one another and with other actors whose “interestingness” is unknown. Our objective is to nominate, from all candidate vertices (those with unknown “interestingness”), the one vertex most likely to be “interesting”.

As an illustrative example, consider the email corpus of a hypothetical corporation, which consists of communications between actors, a few of whom are committing fraud. Some of their fraudulent activity is captured in emails between them, along with many other innocuous emails (both between the fraudsters and between the other employees in the company). We are given the identities of a few fraudster vertices and asked to nominate one other vertex in the graph as likely representing another actor committing fraud.
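To make the setup concrete, here is a minimal Python sketch of the task on a toy version of the fraud example. It is not the method developed at the workshop. The scoring rule is an assumed baseline chosen only for illustration: it counts the words a candidate shares with known “interesting” actors that also appear in messages among those actors, but not in messages among candidates. All names (`nominate`, `edge_docs`, the actors, and the email texts) are hypothetical.

```python
from collections import defaultdict


def tokens(doc):
    """Lowercased whitespace tokens of one document, as a set."""
    return set(doc.lower().split())


def nominate(edge_docs, interesting):
    """Nominate one candidate vertex (illustrative baseline only).

    edge_docs:   dict mapping an actor pair (u, v) to the list of
                 documents exchanged by that pair.
    interesting: set of actors already known to be "interesting".
    Returns (nominee, scores), where scores covers every candidate.
    """
    seed_vocab, background = set(), set()
    for (u, v), docs in edge_docs.items():
        words = set().union(*(tokens(d) for d in docs))
        if u in interesting and v in interesting:
            seed_vocab |= words      # language among known actors
        elif u not in interesting and v not in interesting:
            background |= words      # language among candidates only
    # Assumed heuristic: words distinctive of the known actors.
    signal = seed_vocab - background

    scores = defaultdict(int)
    for (u, v), docs in edge_docs.items():
        for cand, other in ((u, v), (v, u)):
            if cand in interesting:
                continue
            scores[cand] += 0        # make sure every candidate is scored
            if other in interesting:
                # Content in context: score only documents that the
                # candidate exchanges with a known "interesting" actor.
                scores[cand] += sum(len(tokens(d) & signal) for d in docs)

    nominee = max(scores, key=scores.get) if scores else None
    return nominee, dict(scores)


# Hypothetical toy corpus: alice and bob are the known fraudsters.
edge_docs = {
    ("alice", "bob"): ["move the funds offshore tonight", "lunch at noon"],
    ("alice", "carol"): ["move the funds offshore before the audit"],
    ("bob", "dave"): ["lunch at noon tomorrow"],
    ("carol", "dave"): ["quarterly report draft attached"],
    ("dave", "erin"): ["lunch at noon by the printer"],
}
interesting = {"alice", "bob"}

nominee, scores = nominate(edge_docs, interesting)
print(nominee, scores)
```

The sketch uses both sources of information named above: the graph restricts scoring to edges that touch a known actor, and the documents on those edges supply the language evidence. A realistic system would replace the word-overlap score with a proper statistical model of content and connectivity.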

Human Language Technology Center of Excellence