Ben Van Durme

SENIOR RESEARCH SCIENTIST

ASSOCIATE PROFESSOR

Primary Appointment: Department of Computer Science

Secondary Appointment: Department of Cognitive Science

Research Interests

  • Artificial Intelligence
  • Natural Language Processing (Computational Semantics)
  • Streaming Algorithms

My research centers on knowledge acquisition, covering NLP (extracting structure from text), data mining (efficient algorithms and data structures for working with large collections), and linguistic semantics (understanding how people convey knowledge).

Some of my collaborators at JHU include Kevin Duh, David Yarowsky, Jason Eisner, Matthew Post, James Mayfield, Paul McNamee, Max Thomas, Tom Lippincott, and Mark Dredze. Various artifacts from our work can be found on the HLTCOE GitHub page.

Along with Chris Callison-Burch at UPenn and his students Ellie Pavlick and Juri Ganitkevitch, we are pursuing methods for paraphrasing natural language text. This has led to the largest collection of paraphrases in the world, spanning multiple languages, available at paraphrase.org.

Teamed with Kyle Rawlins, a linguist in the Cognitive Science Department at JHU, and together with our students and post-docs, we are pursuing a multilingual, decompositional semantics. Kyle and I often co-instruct graduate seminars that mix readings from theoretical and computational semantics/pragmatics. In a related vein, I've begun working with Kevin Duh on neural models for textual inference.

Ashwin Lall, Miles Osborne, and I have explored algorithms for handling large quantities of (streaming) data. This helped lead to efforts with Aren Jansen (now at Google) to create exceptionally scalable tools for searching raw speech data. Lately, together with students, I've been working with Vladimir Braverman of JHU CS on related streaming and sampling techniques. Much of the earlier work is bundled into the Jerboa package.
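
To give a flavor of the sampling techniques involved (a generic textbook sketch in Python, not code from Jerboa or from our papers), the snippet below implements reservoir sampling: drawing a uniform sample of k items in a single pass over a stream whose length is not known in advance, using only O(k) memory.

    import random

    def reservoir_sample(stream, k, rng=random):
        """Uniformly sample k items from a stream of unknown length
        in one pass, using O(k) memory (Vitter's Algorithm R)."""
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                # Fill the reservoir with the first k items.
                reservoir.append(item)
            else:
                # Replace a random slot with probability k / (i + 1),
                # which keeps every item seen so far equally likely
                # to remain in the sample.
                j = rng.randrange(i + 1)
                if j < k:
                    reservoir[j] = item
        return reservoir

    # Example: sample 5 lines from a large corpus without loading it all.
    # with open("corpus.txt") as f:
    #     print(reservoir_sample(f, 5))
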

My thesis was joint in Computer Science and Linguistics, titled: Extracting Implicit Knowledge from Text. The committee consisted of Len Schubert and Dan Gildea of Rochester Computer Science, Greg Carlson of Rochester Linguistics, and William Cohen of the Machine Learning Department at CMU. This work fell under the KNEXT project, aimed at extracting commonsense knowledge from text. I spent two summers as an intern at Google Research, working with Marius Pasca on knowledge extraction from search engine query logs.

At Rochester I interacted heavily with the Human Language Processing Lab (HLP) led by T. Florian Jaeger (I'm interested in how data-driven psycholinguistic models, and more generally the notion of bounded rationality, can inform AI). Prior to Rochester I was a student in the Language Technologies Institute at CMU, working with Eric Nyberg and colleagues on Question Answering (work that eventually helped lead to IBM's Watson). Before that I worked as a research engineer in the AI Division of Lockheed Martin's Advanced Technology Laboratory.

Human Language Technology Center of Excellence