SCALE 2022

Authorship Identification

June 6 – August 4, 2022, Baltimore, Maryland


The SCALE 2022 workshop will consider the problem of authorship identification (AID): predicting whether two documents were likely written by the same author. More generally, given one collection of documents written by a single author and a second collection written by a single, possibly different, author, the objective is to predict whether both collections share the same author.

The problem has potential applications in several areas, including the following:

  • Plagiarism detection. AID can flag passages of a supposedly single-author document that may have been written by different authors. It may further be used to identify all the contributors to a document.
  • Social media moderation. Anonymous social media platforms are an important venue for free speech, but they are vulnerable to abuse, including misinformation, hate speech, radicalization, and content inciting violence. Banning the accounts of users who post such content is the usual recourse. AID gives moderators a mechanism, possibly a fully automated one, for identifying users who attempt to circumvent account bans.
  • Detecting social media account takeovers. By indicating that a change in authorship has occurred, AID can help identify accounts that have been stolen for phishing, ransom, or spreading misinformation.

Recently, data-driven methods that learn representations of authorship from scratch have shown initial promise [1, 2, 3]. However, a number of research questions remain open, including: (a) how to characterize uncertainty, for example by producing well-calibrated confidences along with system decisions; (b) how to justify and explain system decisions; (c) how to account for multiple authors in a stream of documents; and (d) how to effectively incorporate non-textual signals, such as communication graphs, into the learned representations.

We invite interested researchers and students to apply to the SCALE 2022 program, a funded 10-week research workshop. The workshop will be held in person.


  1. Learning Universal Authorship Representations. EMNLP (2021)
    Rafael Rivera-Soto, Olivia Miano, Juanita Ordonez, Barry Chen, Aleem Khan, Marcus Bishop and Nicholas Andrews
  2. A Deep Metric Learning Approach to Account Linking. NAACL (2021)
    Aleem Khan, Elizabeth Fleming, Noah Schofield, Marcus Bishop and Nicholas Andrews
  3. Learning Invariant Representations of Social Media Users. EMNLP (2019)
    Nicholas Andrews and Marcus Bishop


For Additional Information or to Apply*:

Contact us at [email protected]

*Interested participants should send CVs along with a short message detailing their interest.

Applications will be considered until May 1, but decisions will be made on a rolling basis as applications arrive, so we encourage you to apply as soon as possible.



Human Language Technology Center of Excellence