SCALE 2025

RAG for Request-Guided Summarization of Multilingual Sources 

June 2nd to August 8th, 2025 

************************************************************************************************

With the advancement of generative models, users have begun to move away from information access through a search service, such as Google, toward more direct, interactive interfaces, such as ChatGPT. However, to verify the coverage and integrity of generated responses, users still need access to the source documents those generative models used to create the summary. In this setting, we would like an integrated system that can access multilingual documents and generate an informative, credible summary that is responsive to a user’s stated, multi-faceted information needs.


Multilingual Information Retrieval (MLIR) models provide the ability to find information about a searcher’s information need across many languages; retrieval-augmented generation (RAG) systems allow clear and succinct summarization of such information. Combining MLIR with RAG will allow a user to obtain summaries of multilingual content in their own language that are tailored to their stated requirements.
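The MLIR-plus-RAG pipeline described above can be sketched in a few lines. Everything in this sketch is hypothetical: the toy corpus, the term-overlap scorer standing in for a trained multilingual retrieval model, and the prompt-assembly helper are illustrations only, not the workshop's implementation.

```python
# Illustrative MLIR + RAG sketch. All names and data here are hypothetical;
# a real system would use a trained multilingual retrieval model and an LLM.
CORPUS = {
    "doc1": ("en", "Flooding displaced thousands in the river delta."),
    "doc2": ("es", "Las inundaciones desplazaron a miles en el delta."),
    "doc3": ("en", "The festival drew record crowds this summer."),
}

def retrieve(query, corpus, k=2):
    """Rank documents by naive term overlap (a stand-in for MLIR scoring)."""
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, (lang, text) in corpus.items():
        overlap = len(q_terms & set(text.lower().split()))
        scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

def build_prompt(request, doc_ids, corpus):
    """Assemble a generation prompt with citable source snippets."""
    sources = "\n".join(
        f"[{d}] ({corpus[d][0]}) {corpus[d][1]}" for d in doc_ids
    )
    return f"Report request: {request}\nSources:\n{sources}\nCite sources by id."
```

In a full system, the final prompt would be sent to a generative model, which composes the summary in the user's language while citing the retrieved document ids.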


A SCALE system will be given a report request comprising a user story and a problem statement. The user story explains the report requester’s background, problem setting, and report-writing philosophy, and describes the audience for the report. The problem statement indicates the content that the report must contain. It may also include background information describing what is already known about the topic (and therefore need not appear in the report), as well as constraints, such as a limit on the length of the report or a temporal window for sources. The system must then generate a summary of the source information tailored to the report request, with citations to supporting documents.
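For illustration, the report-request structure just described might be represented as a simple record. The field names and types below are assumptions for the sake of the sketch, not a specification published by the workshop.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ReportRequest:
    """Hypothetical schema for a report request (field names are assumed)."""
    user_story: str                  # requester background, setting, philosophy, audience
    problem_statement: str           # content the report must contain
    background: str = ""             # already-known information; need not be repeated
    max_length_words: Optional[int] = None               # optional length constraint
    source_date_range: Optional[Tuple[str, str]] = None  # optional temporal window
```

Separating the mandatory content (problem statement) from optional background and constraints lets the system decide what to include in the report versus what merely shapes retrieval and generation.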


SCALE research will focus on ensuring that MLIR retrieval is useful for a generative system and on determining how best to approach summary generation. In addition, the workshop will develop new approaches to evaluating the output of a RAG system using manual, semi-manual, and automatic techniques. Our research questions include:

  • What are the most effective retrieval models to feed a RAG system?
  • What is the best way to select retrieved documents for summarization?
  • How can a RAG system best take a detailed report request into account?
  • What is the most effective approach to directed summary generation?
  • How can generated reports best be evaluated? How might the evaluation be automated? What metrics should be used?

************************************************************************************************

For Additional Information or to Apply 

We invite interested researchers and students to apply to the SCALE 2025 program — a funded 10-week research workshop hosted at the Human Language Technology Center of Excellence (HLTCOE) at Johns Hopkins University. The workshop will be in-person in Baltimore, Maryland. 

Please contact us at [email protected]. Interested participants should send CVs along with a short message detailing their interest. 

For priority consideration, please apply by 15 December 2024. The latest we will consider applications is 15 April 2025, but we will make decisions on a rolling basis as applications are received, so we encourage you to apply as soon as possible. 

Human Language Technology Center of Excellence