If I know of a few “interesting” people, how can human language technology and graph theory help me find other interesting people? If I know of a few people committing a crime (e.g., fraud), how can I determine who their co-conspirators are?
If I can infer basic properties of an individual, does this help? Given a set of actors deemed interesting, we aim to find other actors who are similarly interesting. We are given a collection of informal communications (written and spoken) and a corresponding communications graph.
In this graph, each vertex represents either a communication handle or a communication (e.g., an email), and each edge connects a handle to a communication in which that handle participated. Our goals are threefold: (1) posit a set of actors that use one or more handles; (2) associate author attributes with actors, based on communication content; and (3) nominate an actor as interesting, based on other actors already labeled interesting.
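To make the graph structure concrete, here is a minimal sketch in Python of a bipartite communications graph, in which handle vertices connect only to communication vertices. The toy data, the handle names, and the `build_graph` helper are hypothetical illustrations and are not taken from any actual corpus or from the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy data: each communication lists the handles that took part in it.
communications = {
    "email_1": ["alice@corp.example", "bob@corp.example"],
    "email_2": ["bob@corp.example", "admin@corp.example"],
    "email_3": ["alice@corp.example", "carol@corp.example", "admin@corp.example"],
}


def build_graph(comms):
    """Return adjacency sets for a bipartite handle/communication graph.

    Vertices are tagged tuples, ("handle", name) or ("comm", identifier),
    so the two vertex types can never be confused with each other.
    """
    adjacency = defaultdict(set)
    for comm_id, handles in comms.items():
        for handle in handles:
            # One undirected edge per (handle, communication) participation.
            adjacency[("comm", comm_id)].add(("handle", handle))
            adjacency[("handle", handle)].add(("comm", comm_id))
    return adjacency


graph = build_graph(communications)

# Communications in which one particular handle participated.
bob_comms = sorted(c for _, c in graph[("handle", "bob@corp.example")])
```

Tagging each vertex with its type keeps the graph bipartite by construction: every edge added by `build_graph` joins one handle vertex to one communication vertex.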
For an illustrative example, consider a corporate email corpus that consists of communications between actors, a few of whom are committing fraud.
Some of their fraudulent activity is captured in emails between them, along with many other innocuous emails (both between the fraudsters and between the other employees in the company).
Some accounts may be used by multiple actors, such as an administrative account shared by several administrators. Some actors may use multiple accounts, such as an administrator who uses the administrative account as well as an individual email address. We are to assign basic properties to the actors based on their language use.
We are then given the identities of a few fraudster vertices and asked to nominate one other vertex in the graph as likely representing another actor committing fraud.
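As a rough illustration of the nomination task, the sketch below scores each unlabeled handle by how often it appears in a communication alongside a known fraudster and nominates the top scorer. This co-participation count is only a simple baseline assumed here for illustration; it is not the method described in this work, it ignores communication content and author attributes, and it assumes one handle per actor. The data and the `nominate` function are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: communication identifier -> participating handles.
communications = {
    "email_1": ["f1", "f2", "x"],
    "email_2": ["f1", "x"],
    "email_3": ["f2", "y"],
    "email_4": ["y", "z"],
    "email_5": ["z", "w"],
}


def nominate(comms, known_interesting):
    """Nominate one unlabeled handle by co-participation with known handles.

    Each communication adds, to every unlabeled participant, the number of
    known-interesting handles that also took part in that communication.
    """
    known = set(known_interesting)
    scores = Counter()
    for handles in comms.values():
        participants = set(handles)
        shared = len(participants & known)
        if shared == 0:
            continue  # no known-interesting handle took part
        for handle in participants - known:
            scores[handle] += shared
    if not scores:
        return None, scores
    # Highest score wins; ties are broken alphabetically for determinism.
    best = min(scores, key=lambda h: (-scores[h], h))
    return best, scores


nominee, scores = nominate(communications, ["f1", "f2"])
```

In this toy corpus, handle `x` shares two communications with the known handles and is nominated, while `z` and `w` receive no score because they never communicate directly with a known fraudster.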