Publications

Additional (non-HLTCOE) publications may be found on researchers' personal websites.



2013 (57 total)

Findings of the 2013 Workshop on Statistical Machine Translation
Ondrej Bojar, Christian Buck, Chris Callison-Burch, Christian Federmann, Barry Haddow, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut and Lucia Specia
Eighth Workshop on Statistical Machine Translation – 2013

[bib]

@inproceedings{bojar-EtAl:2013:WMT, author = {Ondrej Bojar and Christian Buck and Callison-Burch, Chris and Christian Federmann and Barry Haddow and Philipp Koehn and Christof Monz and Post, Matt and Radu Soricut and Lucia Specia}, title = {Findings of the 2013 Workshop on Statistical Machine Translation}, booktitle = {Eighth Workshop on Statistical Machine Translation}, month = {August}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, pages = {1--44}, url = {http://www.aclweb.org/anthology/W13-2201} }

Joshua 5.0: Sparser, Better, Faster, Server
Matt Post, Juri Ganitkevitch, Luke Orland, Jonathan Weese, Yuan Cao and Chris Callison-Burch
Eighth Workshop on Statistical Machine Translation – 2013

[bib]

@inproceedings{post-EtAl:2013:WMT, author = {Post, Matt and Juri Ganitkevitch and Orland, Luke and Jonathan Weese and Yuan Cao and Callison-Burch, Chris}, title = {Joshua 5.0: Sparser, Better, Faster, Server}, booktitle = {Eighth Workshop on Statistical Machine Translation}, month = {August}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, pages = {206--212}, url = {http://www.aclweb.org/anthology/W13-2226} }

Combining Bilingual and Comparable Corpora for Low Resource Machine Translation
Ann Irvine and Chris Callison-Burch
Eighth Workshop on Statistical Machine Translation – 2013

[bib]

@inproceedings{irvine-callisonburch:2013:WMT, author = {Irvine, Ann and Callison-Burch, Chris}, title = {Combining Bilingual and Comparable Corpora for Low Resource Machine Translation}, booktitle = {Eighth Workshop on Statistical Machine Translation}, month = {August}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, pages = {262--270}, url = {http://www.aclweb.org/anthology/W13-2233} }

Next Generation Storage for the HLTCOE
Scott Roberts
Technical Report 9, Human Language Technology Center of Excellence, Johns Hopkins University, 2013

[abstract] [pdf] | [bib]

Abstract

The explosion of unstructured data in high performance computing presents a challenge for existing storage architecture and design. We present a combination of hardware and software which addresses the storage needs of our center's compute cluster. We also demonstrate that at a constant total cost of ownership our proposed solution provides an order of magnitude better performance than Johns Hopkins University's GrayWulf cluster and is two orders of magnitude faster than the center's existing storage array.
@techreport{roberts_tech:2013, author = {Scott Roberts}, title = {Next Generation Storage for the HLTCOE}, number = {9}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, month = {April}, year = {2013}, abstract = {The explosion of unstructured data in high performance computing presents a challenge for existing storage architecture and design. We present a combination of hardware and software which addresses the storage needs of our center's compute cluster. We also demonstrate that at a constant total cost of ownership our proposed solution provides an order of magnitude better performance than Johns Hopkins University's GrayWulf cluster and is two orders of magnitude faster than the center's existing storage array.} }

Improved Speech-to-Text Translation with the Fisher and Callhome Spanish--English Speech Translation Corpus
Matt Post, Gaurav Kumar, Adam Lopez, Damianos Karakos, Chris Callison-Burch and Sanjeev Khudanpur
Proceedings of the International Workshop on Spoken Language Translation (IWSLT) – 2013

[pdf] | [bib]

@inproceedings{post2013improved, author = {Post, Matt and Gaurav Kumar and Lopez, Adam and Karakos, Damianos and Callison-Burch, Chris and Khudanpur, Sanjeev}, title = {Improved Speech-to-Text Translation with the Fisher and Callhome Spanish--English Speech Translation Corpus}, booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)}, month = {December}, year = {2013}, address = {Heidelberg, Germany} }

Beyond Bitext: Five open problems in machine translation
Adam Lopez and Matt Post
Twenty Years of Bitext – 2013

[pdf] | [bib]

@inproceedings{lopez2013beyond, author = {Lopez, Adam and Post, Matt}, title = {Beyond Bitext: Five open problems in machine translation}, booktitle = {Twenty Years of Bitext}, month = {October}, year = {2013}, address = {Seattle, Washington, USA} }

Answer Extraction as Sequence Tagging with Tree Edit Distance
Xuchen Yao, Benjamin Van Durme, Peter Clark and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Xuchen Yao and Van Durme, Benjamin and Peter Clark and Callison-Burch, Chris}, title = {Answer Extraction as Sequence Tagging with Tree Edit Distance}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1106.pdf} }

Generating Expressions that Refer to Visible Objects
Margaret Mitchell, Kees van Deemter and Ehud Reiter
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Mitchell, Margaret and Kees van Deemter and Ehud Reiter}, title = {Generating Expressions that Refer to Visible Objects}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.m-mitchell.com/papers/MitchellEtAl-13-VisObjects.pdf} }

Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs
Hua He, Jimmy Lin and Adam Lopez
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Hua He and Jimmy Lin and Lopez, Adam}, title = {Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1033.pdf} }

Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals
Ann Irvine and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Irvine, Ann and Callison-Burch, Chris}, title = {Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology//N/N13/N13-1056.pdf} }

PPDB: The Paraphrase Database
Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Juri Ganitkevitch and Van Durme, Benjamin and Callison-Burch, Chris}, title = {PPDB: The Paraphrase Database}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1092.pdf} }

Improving the Quality of Minority Class Identification in Dialog Act Tagging
Adinoyi Omuya, Vinodkumar Prabhakaran and Owen Rambow
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Adinoyi Omuya and Vinodkumar Prabhakaran and Owen Rambow}, title = {Improving the Quality of Minority Class Identification in Dialog Act Tagging}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.newdesign.aclweb.org/anthology-new/N/N13/N13-1099.pdf} }

Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter
Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson and David Yarowsky
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{bergsma:2013, author = {Bergsma, Shane and Dredze, Mark and Van Durme, Benjamin and Wilson, Theresa and Yarowsky, David}, title = {Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.cs.jhu.edu/~vandurme/papers/broadly-improving-user-classfication-via-communication-based-name-and-location-clustering-on-twitter.pdf} }

What's in a Domain? Multi-Domain Learning for Multi-Attribute Data
Mahesh Joshi, Mark Dredze, William Cohen and Carolyn P. Rose
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{joshi:2013, author = {Mahesh Joshi and Dredze, Mark and William Cohen and Carolyn P. Rose}, title = {What's in a Domain? Multi-Domain Learning for Multi-Attribute Data}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1080.pdf} }

Separating Fact from Fear: Tracking Flu Infections on Twitter
Alex Lamb, Michael Paul and Mark Dredze
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{lamb:2013, author = {Alex Lamb and Michael Paul and Dredze, Mark}, title = {Separating Fact from Fear: Tracking Flu Infections on Twitter}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.cs.jhu.edu/~mdredze/publications/naacl_2013_flu.pdf} }

Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models
Michael Paul and Mark Dredze
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{Paul:2013, author = {Michael Paul and Dredze, Mark}, title = {Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.cs.jhu.edu/~mdredze/publications/naacl_2013_drugs.pdf} }

Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation
Damianos Karakos, Mark Dredze and Sanjeev Khudanpur
Technical Report 8, Human Language Technology Center of Excellence, Johns Hopkins University, 2013

[abstract] [pdf] | [bib]

Abstract

Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model's parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.
@techreport{karakos_tech:2013, author = {Karakos, Damianos and Dredze, Mark and Khudanpur, Sanjeev}, title = {Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation}, number = {8}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, year = {2013}, abstract = {Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model's parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.} }

Sustained Firing of Model Central Auditory Neurons Yields a Discriminative Spectro-temporal Representation for Natural Sounds
Michael A. Carlin and Mounya Elhilali
PLoS Computational Biology – 2013

[bib]

@article{Carlin2013, author = {Carlin, Michael and Mounya Elhilali}, title = {Sustained Firing of Model Central Auditory Neurons Yields a Discriminative Spectro-temporal Representation for Natural Sounds}, journal = {PLoS Computational Biology}, year = {2013}, url = {http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002982} }

Nonconvex Global Optimization for Latent Variable Models
Matthew R. Gormley and Jason Eisner
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{gormley-eisner:2013, author = {Matthew R. Gormley and Eisner, Jason}, title = {Nonconvex Global Optimization for Latent Variable Models}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology//P/P13/P13-1044.pdf} }

Learning to translate with products of novices: a suite of open-ended challenge problems for teaching MT
Adam Lopez, Matt Post, Chris Callison-Burch, Jonathan Weese, Juri Ganitkevitch, Narges Ahmidi, Olivia Buzek, Leah Hanson, Beenish Jamil, Matthias Lee, Ya-Ting Lin, Henry Pao, Fatima Rivera, Leili Shahriyari, Debu Sinha, Adam Teichert, Stephen Wampler, Michael Weinberger, Daguang Xu, Lin Yang and Shang Zhao
Transactions of the Association for Computational Linguistics – 2013

[bib]

@article{Lopez+etal:2013:tacl:mt-class, author = {Lopez, Adam and Post, Matt and Callison-Burch, Chris and Jonathan Weese and Juri Ganitkevitch and Narges Ahmidi and Buzek, Olivia and Leah Hanson and Beenish Jamil and Matthias Lee and Ya-Ting Lin and Henry Pao and Fatima Rivera and Leili Shahriyari and Debu Sinha and Adam Teichert and Stephen Wampler and Michael Weinberger and Daguang Xu and Lin Yang and Shang Zhao}, title = {Learning to translate with products of novices: a suite of open-ended challenge problems for teaching MT}, journal = {Transactions of the Association for Computational Linguistics}, year = {2013}, url = {https://aclweb.org/anthology/Q/Q13/Q13-1014.pdf} }

Dirt Cheap Web-Scale Parallel Text from the Common Crawl
Jason R Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch and Adam Lopez
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{Smith+etal:2013:acl, author = {Smith, Jason and Herve Saint-Amand and Magdalena Plamada and Philipp Koehn and Callison-Burch, Chris and Lopez, Adam}, title = {Dirt Cheap Web-Scale Parallel Text from the Common Crawl}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology/P/P13/P13-1135.pdf} }

KELVIN: a tool for automated knowledge base construction
Paul McNamee, James Mayfield, Tim Finin, Tim Oates, Dawn Lawrie, Tan Xu and Douglas W. Oard
North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session (NAACL-HLT) – 2013

[bib]

@inproceedings{, author = {McNamee, Paul and Mayfield, James and Finin, Tim and Oates, Tim and Lawrie, Dawn and Tan Xu and Douglas W. Oard}, title = {KELVIN: a tool for automated knowledge base construction}, booktitle = {North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session (NAACL-HLT)}, year = {2013}, url = {http://aclweb.org/anthology//N/N13/N13-3008.pdf} }

Using Conceptual Class Attributes to Characterize Social Media Users
Shane Bergsma and Benjamin Van Durme
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{, author = {Bergsma, Shane and Van Durme, Benjamin}, title = {Using Conceptual Class Attributes to Characterize Social Media Users}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology//P/P13/P13-1070.pdf} }

SenseSpotting: Never let your parallel data tie you to an old domain
Marine Carpuat, Hal Daume III, Katharine Henry, Ann Irvine, Jagadeesh Jagarlamudi and Rachel Rudinger
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{, author = {Marine Carpuat and Hal Daume III and Katharine Henry and Irvine, Ann and Jagadeesh Jagarlamudi and Rachel Rudinger}, title = {SenseSpotting: Never let your parallel data tie you to an old domain}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology/P/P13/P13-1141.pdf} }

Lightly Supervised Learning of Procedural Dialog Systems
Svitlana Volkova, Pallavi Choudhury, Chris Quirk, Bill Dolan and Luke Zettlemoyer
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{Volkova_2013_ACL, author = {Volkova, Svitlana and Pallavi Choudhury and Chris Quirk and Bill Dolan and Luke Zettlemoyer}, title = {Lightly Supervised Learning of Procedural Dialog Systems}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology//P/P13/P13-1164.pdf} }

Learning to Relate Literal and Sentimental Descriptions of Visual Properties
Mark Yatskar, Svitlana Volkova, Alsi Celikyilmaz, Bill Dolan and Luke Zettlemoyer
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{Volkova_2013_NAACL, author = {Mark Yatskar and Volkova, Svitlana and Alsi Celikyilmaz and Bill Dolan and Luke Zettlemoyer}, title = {Learning to Relate Literal and Sentimental Descriptions of Visual Properties}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1043.pdf} }

Supervector Bayesian Speaker Comparison
Bengt J. Borgström and Alan McCree
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2013

[pdf] | [bib]

@inproceedings{, author = {Bengt J. Borgström and McCree, Alan}, title = {Supervector Bayesian Speaker Comparison}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2013} }

Discriminatively Trained Bayesian Speaker Comparison of I-Vectors
Bengt J. Borgström and Alan McCree
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2013

[pdf] | [bib]

@inproceedings{, author = {Bengt J. Borgström and McCree, Alan}, title = {Discriminatively Trained Bayesian Speaker Comparison of I-Vectors}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2013} }

UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems
Lushan Han, Abhay L. Kashyap, Tim Finin, James Mayfield and Jonathan Weese
Joint Conference on Lexical and Computational Semantics (*SEM) – 2013

[abstract] [pdf] | [bib]

Abstract

We describe three semantic text similarity systems developed for the *SEM 2013 STS shared task and the results of the corresponding three runs. All of them used a word similarity feature that combined LSA word similarity and WordNet knowledge. The first run, which achieved the top mean score on the task of all the submissions, used a simple term alignment algorithm. The other two runs, ranked second and fourth, used SVM models to combine larger sets of features.
@inproceedings{, author = {Lushan Han and Abhay L. Kashyap and Finin, Tim and Mayfield, James and Jonathan Weese}, title = {UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems}, booktitle = {Joint Conference on Lexical and Computational Semantics (*SEM)}, year = {2013}, abstract = {We describe three semantic text similarity systems developed for the *SEM 2013 STS shared task and the results of the corresponding three runs. All of them used a word similarity feature that combined LSA word similarity and WordNet knowledge. The first run, which achieved the top mean score on the task of all the submissions, used a simple term alignment algorithm. The other two runs, ranked second and fourth, used SVM models to combine larger sets of features.} }

Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition
Carolina Parada, Mark Dredze, Abhinav Sethy and Ariya Rastrow
Technical Report 10, Human Language Technology Center of Excellence, Johns Hopkins University, 2013

[abstract] [pdf] | [bib]

Abstract

Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information-rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word-based systems; new words can then be represented by combinations of sub-word units. We present a novel probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid system. We combine the proposed hybrid system with confidence-based metrics to improve OOV detection performance. Previous work addresses OOV detection as a binary classification task, where each region is independently classified using local information. We propose to treat OOV detection as a sequence labeling problem, and we show that 1) jointly predicting out-of-vocabulary regions, 2) including contextual information from each region, and 3) learning sub-lexical units optimized for this task lead to substantial improvements with respect to state-of-the-art on an English Broadcast News and MIT Lectures task.
@techreport{, author = {Carolina Parada and Dredze, Mark and Abhinav Sethy and Ariya Rastrow}, title = {Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition}, number = {10}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, year = {2013}, abstract = {Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information-rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word-based systems; new words can then be represented by combinations of sub-word units. We present a novel probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid system. We combine the proposed hybrid system with confidence-based metrics to improve OOV detection performance. Previous work addresses OOV detection as a binary classification task, where each region is independently classified using local information. We propose to treat OOV detection as a sequence labeling problem, and we show that 1) jointly predicting out-of-vocabulary regions, 2) including contextual information from each region, and 3) learning sub-lexical units optimized for this task lead to substantial improvements with respect to state-of-the-art on an English Broadcast News and MIT Lectures task.} }

PARMA: A Predicate Argument Aligner
Travis Wolfe, Benjamin Van Durme, Mark Dredze, Nicholas Andrews, Charley Beller, Chris Callison-Burch, Jay DeYoung, Justin Snyder, Jonathan Weese, Tan Xu and Xuchen Yao
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{Wolfe:2013lr, author = {Wolfe, Travis and Van Durme, Benjamin and Dredze, Mark and Andrews, Nicholas and Charley Beller and Callison-Burch, Chris and Jay DeYoung and Justin Snyder and Jonathan Weese and Tan Xu and Xuchen Yao}, title = {PARMA: A Predicate Argument Aligner}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology//P/P13/P13-2012.pdf} }

Explicit and Implicit Syntactic Features for Text Classification
Matt Post and Shane Bergsma
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{, author = {Post, Matt and Bergsma, Shane}, title = {Explicit and Implicit Syntactic Features for Text Classification}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology//P/P13/P13-2150.pdf} }

Frequency Offset Correction in Speech without Detecting Pitch
Pascal Clark, Sri Harish Mallidi, Aren Jansen and Hynek Hermansky
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2013

[pdf] | [bib]

@inproceedings{, author = {Clark, Pascal and Sri Harish Mallidi and Jansen, Aren and Hermansky, Hynek}, title = {Frequency Offset Correction in Speech without Detecting Pitch}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2013} }

Complementary envelope estimation for frequency-modulated random signals
Pascal Clark, Ivars Kirsteins and Les Atlas
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2013

[pdf] | [bib]

@inproceedings{, author = {Clark, Pascal and Ivars Kirsteins and Les Atlas}, title = {Complementary envelope estimation for frequency-modulated random signals}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2013} }

A Lightweight and High Performance Monolingual Word Aligner
Xuchen Yao, Benjamin Van Durme, Peter Clark and Chris Callison-Burch
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{yao-EtAl:2013:ACL, author = {Xuchen Yao and Van Durme, Benjamin and Peter Clark and Callison-Burch, Chris}, title = {A Lightweight and High Performance Monolingual Word Aligner}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, url = {http://aclweb.org/anthology//P/P13/P13-2123.pdf} }

Arabic Dialect Identification
Omar F Zaidan and Chris Callison-Burch
Computational Linguistics – 2013

[bib]

@article{zaidan-callisonburch:CL:2013, author = {Omar F Zaidan and Callison-Burch, Chris}, title = {Arabic Dialect Identification}, journal = {Computational Linguistics}, year = {2013}, url = {https://www.cs.jhu.edu/~ccb/publications/arabic-dialect-id.pdf} }

Automatic Coupling of Answer Extraction and Information Retrieval
Xuchen Yao, Benjamin Van Durme and Peter Clark
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{yao1-EtAl:2013:ACL, author = {Xuchen Yao and Van Durme, Benjamin and Peter Clark}, title = {Automatic Coupling of Answer Extraction and Information Retrieval}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, url = {http://aclweb.org/anthology//P/P13/P13-2029.pdf} }

A Symmetric Kernel Partial Least Squares Framework for Speaker Recognition
Balaji Vasan Srinivasa, Yuancheng Luo, Daniel Garcia-Romero, Dmitry N. Zotkin and Ramani Duraiswami
IEEE Transactions on Audio, Speech, and Language Processing – 2013

[bib]

@article{6480796, author = {Balaji Vasan Srinivasa and Yuancheng Luo and Garcia-Romero, Daniel and Dmitry N. Zotkin and Ramani Duraiswami}, title = {A Symmetric Kernel Partial Least Squares Framework for Speaker Recognition}, journal = {IEEE Transactions on Audio, Speech, and Language Processing}, year = {2013}, pages = {1415--1423}, url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6480796} }

Subspace-constrained Supervector PLDA for Speaker Verification
Daniel Garcia-Romero and Alan McCree
International Speech Communication Association (INTERSPEECH) – 2013

[bib]

@inproceedings{dgr-IS13, author = {Garcia-Romero, Daniel and McCree, Alan}, title = {Subspace-constrained Supervector PLDA for Speaker Verification}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2013} }

Nerit: Named Entity Recognition for Informal Text
David Etter, Francis Ferraro, Ryan Cotterell, Olivia Buzek and Benjamin Van Durme
Technical Report 11, Human Language Technology Center of Excellence, Johns Hopkins University, 2013

[abstract] [pdf] | [bib]

Abstract

We describe a multilingual named entity recognition system using language-independent feature templates, designed for processing short, informal media arising from Twitter and other microblogging services. We crowdsource the annotation of tens of thousands of English and Spanish tweets and present classification results on this resource.
@techreport{, author = {David Etter and Francis Ferraro and Ryan Cotterell and Buzek, Olivia and Van Durme, Benjamin}, title = {Nerit: Named Entity Recognition for Informal Text}, number = {11}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, year = {2013}, abstract = {We describe a multilingual named entity recognition system using language-independent feature templates, designed for processing short, informal media arising from Twitter and other microblogging services. We crowdsource the annotation of tens of thousands of English and Spanish tweets and present classification results on this resource.} }

Open Domain Targeted Sentiment
Margaret Mitchell, Jacqui Aguilar, Theresa Wilson and Benjamin Van Durme
Empirical Methods in Natural Language Processing (EMNLP) – 2013

[bib]

@inproceedings{, author = {Mitchell, Margaret and Jacqui Aguilar and Wilson, Theresa and Van Durme, Benjamin}, title = {Open Domain Targeted Sentiment}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2013}, url = {http://aclweb.org/anthology/D/D13/D13-1171.pdf} }

Typicality and Object Reference
Margaret Mitchell, Ehud Reiter and Kees van Deemter
Annual Meeting of the Cognitive Science Society (CogSci) – 2013

[bib]

@inproceedings{, author = {Mitchell, Margaret and Ehud Reiter and Kees van Deemter}, title = {Typicality and Object Reference}, booktitle = {Annual Meeting of the Cognitive Science Society (CogSci)}, year = {2013}, url = {http://mindmodeling.org/cogsci2013/papers/0547/paper0547.pdf} }

Attributes in Visual Object Reference
Margaret Mitchell, Kees van Deemter and Ehud Reiter
PRE-CogSci – 2013

[bib]

@inproceedings{, author = {Mitchell, Margaret and Kees van Deemter and Ehud Reiter}, title = {Attributes in Visual Object Reference}, booktitle = {PRE-CogSci}, year = {2013}, url = {http://pre2013.uvt.nl/pdf/mitchell-reiter-vandeemter.pdf} }

Graphs and Spatial Relations in the Generation of Referring Expressions
Jette Viethen, Margaret Mitchell and Emiel Krahmer
European Workshop on Natural Language Generation (ENLG) – 2013

[bib]

@inproceedings{, author = {Jette Viethen and Mitchell, Margaret and Emiel Krahmer}, title = {Graphs and Spatial Relations in the Generation of Referring Expressions}, booktitle = {European Workshop on Natural Language Generation (ENLG)}, year = {2013}, url = {http://bridging.uvt.nl/pdf/viethen_mitchell_krahmer_enlg_2013.pdf} }

Semi-Markov Phrase-based Monolingual Alignment
Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch and Peter Clark
Empirical Methods in Natural Language Processing (EMNLP) – 2013

[bib]

@inproceedings{yao-EtAl:2013:EMNLP, author = {Xuchen Yao and Van Durme, Benjamin and Callison-Burch, Chris and Peter Clark}, title = {Semi-Markov Phrase-based Monolingual Alignment}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2013}, address = {Seattle, Washington}, publisher = {Association for Computational Linguistics}, url = {http://cs.jhu.edu/~ccb/publications/semi-markov-phrase-based-monolingual-alignment.pdf} }

Intrinsic Spectral Analysis
Aren Jansen and Partha Niyogi
IEEE Transactions on Signal Processing – 2013

[bib]

@article{jan_tsp13, author = {Jansen, Aren and Partha Niyogi}, title = {Intrinsic Spectral Analysis}, year = {2013}, publisher = {IEEE}, pages = {1698--1710}, url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=06409472} }

Fixed-Dimensional Acoustic Embeddings of Variable-Length Segments in Low-Resource Settings
Keith Levin, Katharine Henry, Aren Jansen and Karen Livescu
ASRU – 2013

[bib]

@inproceedings{jan_asru13, author = {Levin, Keith and Katharine Henry and Jansen, Aren and Karen Livescu}, title = {Fixed-Dimensional Acoustic Embeddings of Variable-Length Segments in Low-Resource Settings}, booktitle = {ASRU}, year = {2013} }

Text-to-Speech Inspired Duration Modeling for Improved Whole-Word Acoustic Models
Keith Kintzley, Aren Jansen and Hynek Hermansky
Proceedings of Interspeech – 2013

[bib]

@inproceedings{jan_is13a, author = {Keith Kintzley and Jansen, Aren and Hermansky, Hynek}, title = {Text-to-Speech Inspired Duration Modeling for Improved Whole-Word Acoustic Models}, booktitle = {Proceedings of Interspeech}, year = {2013}, url = {http://www.isca-speech.org/archive/interspeech_2013/i13_1253.html} }

Semi-Supervised Manifold Learning Approaches for Spoken Term Verification
Atta Norouzian, Rick Rose and Aren Jansen
Interspeech – 2013

[bib]

@inproceedings{jan_is13b, author = {Atta Norouzian and Rose, Rick and Jansen, Aren}, title = {Semi-Supervised Manifold Learning Approaches for Spoken Term Verification}, booktitle = {Interspeech}, year = {2013}, url = {http://www.isca-speech.org/archive/interspeech_2013/i13_2594.html} }

Evaluating Speech Features with the Minimal-Pair ABX Task: Analysis of the Classical MFC/PLP Pipeline
Thomas Schatz, Vijayaditya Peddinti, Francis Bach, Aren Jansen, Hynek Hermansky and Emmanuel Dupoux
Proceedings of Interspeech – 2013

[bib]

@inproceedings{jan_is13c, author = {Thomas Schatz and Vijayaditya Peddinti and Francis Bach and Jansen, Aren and Hermansky, Hynek and Emmanuel Dupoux}, title = {Evaluating Speech Features with the Minimal-Pair ABX Task: Analysis of the Classical MFC/PLP Pipeline}, booktitle = {Proceedings of Interspeech}, year = {2013}, url = {http://www.isca-speech.org/archive/interspeech_2013/i13_1781.html} }

Weak Top-Down Constraints for Unsupervised Acoustic Model Training
Aren Jansen, Samuel Thomas and Hynek Hermansky
Proceedings of ICASSP – 2013

[bib]

@inproceedings{jan_icassp13a, author = {Jansen, Aren and Samuel Thomas and Hermansky, Hynek}, title = {Weak Top-Down Constraints for Unsupervised Acoustic Model Training}, booktitle = {Proceedings of ICASSP}, year = {2013}, url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639241} }

A Summary of the 2012 CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard Rose, Mike Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Borschinger, Justin Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-ying Lee, Keith Levin, Atta Norouzian, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz and Samuel Thomas
Proceedings of ICASSP – 2013

[bib]

@inproceedings{jan_icassp13b, author = {Jansen, Aren and Emmanuel Dupoux and Sharon Goldwater and Mark Johnson and Khudanpur, Sanjeev and Kenneth Church and Naomi Feldman and Hermansky, Hynek and Florian Metze and Richard Rose and Mike Seltzer and Clark, Pascal and Ian McGraw and Balakrishnan Varadarajan and Erin Bennett and Benjamin Borschinger and Justin Chiu and Ewan Dunbar and Abdellah Fourtassi and David Harwath and Chia-ying Lee and Levin, Keith and Atta Norouzian and Vijayaditya Peddinti and Rachael Richardson and Thomas Schatz and Samuel Thomas}, title = {A Summary of the 2012 CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition}, booktitle = {Proceedings of ICASSP}, year = {2013}, url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639245} }

Zero Resource Graph-Based Confidence Estimation for Open Vocabulary Spoken Term Detection
Atta Norouzian, Richard Rose, Sina Ghalehjegh and Aren Jansen
Proceedings of ICASSP – 2013

[bib]

@inproceedings{jan_icassp13d, author = {Atta Norouzian and Richard Rose and Sina Ghalehjegh and Jansen, Aren}, title = {Zero Resource Graph-Based Confidence Estimation for Open Vocabulary Spoken Term Detection}, booktitle = {Proceedings of ICASSP}, year = {2013}, url = {http://www.ece.mcgill.ca/~rrose1/papers/AttaRoseZeroResource_ICASSP13.pdf} }

Measuring Machine Translation Errors in New Domains
Ann Irvine, John Morgan, Marine Carpuat, Hal Daume III and Dragos Munteanu
Transactions of the Association for Computational Linguistics (TACL) – 2013

[bib]

@article{IrvineEtAlDAMTErrors, author = {Irvine, Ann and John Morgan and Marine Carpuat and Hal Daume III and Dragos Munteanu}, title = {Measuring Machine Translation Errors in New Domains}, year = {2013}, url = {https://aclweb.org/anthology/Q/Q13/Q13-1035.pdf} }

Monolingual Marginal Matching for Translation Model Adaptation
Ann Irvine, Chris Quirk and Hal Daume III
Empirical Methods in Natural Language Processing (EMNLP) – 2013

[bib]

@inproceedings{irvineQuirkDaumeEMNLP13, author = {Irvine, Ann and Chris Quirk and Hal Daume III}, title = {Monolingual Marginal Matching for Translation Model Adaptation}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2013}, url = {http://aclweb.org/anthology//D/D13/D13-1109.pdf} }

Bayesian Tree Substitution Grammars as a Usage-Based Approach
Matt Post and Daniel Gildea
Language and Speech – 2013

[bib]

@article{post2013bayesian, author = {Post, Matt and Daniel Gildea}, title = {Bayesian Tree Substitution Grammars as a Usage-Based Approach}, year = {2013}, pages = {291--308}, url = {http://las.sagepub.com/content/56/3/291.abstract} }

2012 (80 total)

Detecting Power Relations from Written Dialog
Vinodkumar Prabhakaran
Proceedings of ACL 2012 Student Research Workshop – 2012

[bib]

@inproceedings{prabhakaran:2012:SRW, author = {Vinodkumar Prabhakaran}, title = {Detecting Power Relations from Written Dialog}, booktitle = {Proceedings of ACL 2012 Student Research Workshop}, month = {July}, year = {2012}, address = {Jeju Island, Korea}, publisher = {Association for Computational Linguistics}, pages = {7--12}, url = {http://www.aclweb.org/anthology/W12-3302} }

Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing
Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, Lori Levin, Christine D. Piatko, Owen Rambow and Benjamin Van Durme
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics – 2012

[bib]

@inproceedings{prabhakaran-EtAl:2012:ExProM, author = {Vinodkumar Prabhakaran and Bloodgood, Michael and Mona Diab and Dorr, Bonnie and Lori Levin and Christine D. Piatko and Owen Rambow and Van Durme, Benjamin}, title = {Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing}, booktitle = {Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics}, month = {July}, year = {2012}, address = {Jeju, Republic of Korea}, publisher = {Association for Computational Linguistics}, pages = {57--64}, url = {http://www.aclweb.org/anthology/W12-3807} }

A Context-Aware Approach to Entity Linking
Veselin Stoyanov, James Xu, Douglas Oard, Dawn Lawrie, Tim Oates and Tim Finin
Proceedings of the NAACL Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX) – 2012

[bib]

@inproceedings{stoyanov-EtAl:2012:AKBC-WEKEX, author = {Stoyanov, Veselin and James Xu and Douglas Oard and Lawrie, Dawn and Oates, Tim and Finin, Tim}, title = {A Context-Aware Approach to Entity Linking}, booktitle = {Proceedings of the NAACL Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {62--67}, url = {http://www.aclweb.org/anthology/W12-3012} }

Evaluating the Quality of a Knowledge Base Populated from Text
James Mayfield and Tim Finin
Proceedings of the NAACL Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX) – 2012

[bib]

@inproceedings{mayfield-finin:2012:AKBC-WEKEX, author = {Mayfield, James and Finin, Tim}, title = {Evaluating the Quality of a Knowledge Base Populated from Text}, booktitle = {Proceedings of the NAACL Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {68--73}, url = {http://www.aclweb.org/anthology/W12-3013} }

Findings of the 2012 Workshop on Statistical Machine Translation
Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut and Lucia Specia
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{callisonburch-EtAl:2012:WMT, author = {Callison-Burch, Chris and Philipp Koehn and Christof Monz and Post, Matt and Radu Soricut and Lucia Specia}, title = {Findings of the 2012 Workshop on Statistical Machine Translation}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {10--51}, url = {http://cs.jhu.edu/~ccb/publications/findings-of-the-wmt12-shared-tasks.pdf} }

Using Categorial Grammar to Label Translation Rules
Jonathan Weese, Chris Callison-Burch and Adam Lopez
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{weese-callisonburch-lopez:2012:WMT, author = {Jonathan Weese and Callison-Burch, Chris and Lopez, Adam}, title = {Using Categorial Grammar to Label Translation Rules}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {222--231}, url = {http://aclweb.org/anthology//W/W12/W12-3127.pdf} }

Joshua 4.0: Packing, PRO, and Paraphrases
Juri Ganitkevitch, Yuan Cao, Jonathan Weese, Matt Post and Chris Callison-Burch
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{ganitkevitch-EtAl:2012:WMT, author = {Juri Ganitkevitch and Yuan Cao and Jonathan Weese and Post, Matt and Callison-Burch, Chris}, title = {Joshua 4.0: Packing, PRO, and Paraphrases}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {283--291}, url = {http://cs.jhu.edu/~ccb/publications/joshua-4.0.pdf} }

Monolingual Distributional Similarity for Text-to-Text Generation
Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch
*SEM First Joint Conference on Lexical and Computational Semantics – 2012

[bib]

@inproceedings{Ganitkevitch-etal:2012:StarSEM, author = {Juri Ganitkevitch and Van Durme, Benjamin and Callison-Burch, Chris}, title = {Monolingual Distributional Similarity for Text-to-Text Generation}, booktitle = {*SEM First Joint Conference on Lexical and Computational Semantics}, month = {June}, year = {2012}, address = {Montreal}, publisher = {Association for Computational Linguistics}, url = {http://cs.jhu.edu/~ccb/publications/monolingual-distributional-similarity-for-text-to-text-generation.pdf} }

Machine Translation of Arabic Dialects
Rabih Zbib, Erika Malchiodi, Jacob Devlin, David Stallard, Spyros Matsoukas, Richard Schwartz, John Makhoul, Omar Zaidan and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{Zbib-etal:2012:NAACL, author = {Rabih Zbib and Erika Malchiodi and Jacob Devlin and David Stallard and Spyros Matsoukas and Richard Schwartz and John Makhoul and Omar Zaidan and Callison-Burch, Chris}, title = {Machine Translation of Arabic Dialects}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, month = {June}, year = {2012}, address = {Montreal}, publisher = {Association for Computational Linguistics}, url = {http://cs.jhu.edu/~ccb/publications/machine-translation-of-arabic-dialects.pdf} }

Language Identification for Creating Language-Specific Twitter Collections
Shane Bergsma, Paul McNamee, Mossaab Bagdouri, Clay Fink and Theresa Wilson
Workshop on Language and Social Media at the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{bergsma2012lid, author = {Bergsma, Shane and McNamee, Paul and Mossaab Bagdouri and Clay Fink and Wilson, Theresa}, title = {Language Identification for Creating Language-Specific Twitter Collections}, booktitle = {Workshop on Language and Social Media at the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)}, month = {June}, year = {2012}, publisher = {Association for Computational Linguistics}, url = {http://aclweb.org/anthology//W/W12/W12-2108.pdf} }

Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing
Matt Post, Chris Callison-Burch and Miles Osborne
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{post-callisonburch-osborne:2012:WMT, author = {Post, Matt and Callison-Burch, Chris and Miles Osborne}, title = {Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, month = {June}, year = {2012}, publisher = {Association for Computational Linguistics}, pages = {401--409}, url = {http://www.aclweb.org/anthology/W12-3152} }

Predicting Overt Display of Power in Written Dialogs
Vinodkumar Prabhakaran, Owen Rambow and Mona Diab
North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{prabhakaran_et_al_naacl2012, author = {Vinodkumar Prabhakaran and Owen Rambow and Mona Diab}, title = {Predicting Overt Display of Power in Written Dialogs}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, url = {http://aclweb.org/anthology//N/N12/N12-1057.pdf} }

Creating and Curating a Cross-Language Entity Linking Collection
Dawn Lawrie, James Mayfield, Paul McNamee and Douglas Oard
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC) – 2012

[bib]

@inproceedings{2012-LREC-Lawrie, author = {Lawrie, Dawn and Mayfield, James and McNamee, Paul and Douglas Oard}, title = {Creating and Curating a Cross-Language Entity Linking Collection}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC)}, month = {May}, year = {2012}, url = {http://www.lrec-conf.org/proceedings/lrec2012/pdf/655_Paper.pdf} }

Annotations for Power Relations on Email Threads
Vinodkumar Prabhakaran, Huzaifa Neralwala, Owen Rambow and Mona Diab
Proceedings of the Eighth conference on International Language Resources and Evaluation (LREC'12) – 2012

[bib]

@inproceedings{prabhakaran_et_al_lrec2012, author = {Vinodkumar Prabhakaran and Huzaifa Neralwala and Owen Rambow and Mona Diab}, title = {Annotations for Power Relations on Email Threads}, booktitle = {Proceedings of the Eighth conference on International Language Resources and Evaluation (LREC'12)}, month = {May}, year = {2012}, address = {Istanbul, Turkey}, publisher = {European Language Resources Association (ELRA)}, url = {http://www.lrec-conf.org/proceedings/lrec2012/pdf/1006_Paper.pdf} }

Refinement of a Method for Identifying Probable Archaeological Sites from Remotely Sensed Data
James Tilton, Douglas Comer, Carey Priebe, Daniel Sussman and Li Chen
SPIE Defense, Security, and Sensing – 2012

[bib]

@article{tilton2012refinement, author = {James Tilton and Douglas Comer and Priebe, Carey and Daniel Sussman and Li Chen}, title = {Refinement of a Method for Identifying Probable Archaeological Sites from Remotely Sensed Data}, month = {April}, year = {2012}, pages = {23--27}, url = {http://proceedings.spiedigitallibrary.org/data/Conferences/SPIEP/67559/83901K.pdf} }

Toward Statistical Machine Translation without Parallel Corpora
Alex Klementiev, Ann Irvine, Chris Callison-Burch and David Yarowsky
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL) – 2012

[bib]

@inproceedings{klementiev-etal:2012:EACL, author = {Alex Klementiev and Irvine, Ann and Callison-Burch, Chris and Yarowsky, David}, title = {Toward Statistical Machine Translation without Parallel Corpora}, booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL)}, month = {April}, year = {2012}, address = {Avignon, France}, publisher = {Association for Computational Linguistics}, url = {http://cs.jhu.edu/~ccb/publications/toward-statistical-machine-translation-without-parallel-corpora.pdf} }

Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification
Shuai Huang, Glen Coppersmith and Damianos Karakos
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{Huang:2012fk, author = {Shuai Huang and Coppersmith, Glen and Karakos, Damianos}, title = {Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_2037.html} }

Indexing Raw Acoustic Features for Scalable Zero Resource Search
Aren Jansen and Benjamin Van Durme
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{janvan12, author = {Jansen, Aren and Van Durme, Benjamin}, title = {Indexing Raw Acoustic Features for Scalable Zero Resource Search}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_2466.html} }

Inverting the Point Process Model for Fast Phonetic Keyword Search
Keith Kintzley, Aren Jansen, Ken Church and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[abstract] [bib]

Abstract

Normally, we represent speech as a long sequence of frames and model the keyword with a relatively small set of parameters, commonly with a hidden Markov model (HMM). However, since the input speech is much longer than the keyword, suppose instead that we represent the speech as a relatively sparse set of impulses (roughly one per phoneme) and model the keyword as a filter-bank where each filter's impulse response relates to the likelihood of a phone at a given position within a word. Evaluating keyword detections can then be seen as a convolution of an impulse train with an array of filters. This view enables huge speedups; runtime no longer depends on the frame rate and is instead linear in the number of events (impulses). We apply this intuition to redesign the runtime engine behind the point process model for keyword spotting. We demonstrate impressive real-time speedups (500,000x faster than real-time) with minimal loss in search accuracy.
@inproceedings{kintzley-jansen-church-hermansky:is2012b, author = {Keith Kintzley and Jansen, Aren and Church, Ken and Hermansky, Hynek}, title = {Inverting the Point Process Model for Fast Phonetic Keyword Search}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, address = {Portland, Oregon, USA}, publisher = {International Speech Communication Association}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_2438.html}, abstract = {Normally, we represent speech as a long sequence of frames and model the keyword with a relatively small set of parameters, commonly with a hidden Markov model (HMM). However, since the input speech is much longer than the keyword, suppose instead that we represent the speech as a relatively sparse set of impulses (roughly one per phoneme) and model the keyword as a filter-bank where each filter's impulse response relates to the likelihood of a phone at a given position within a word. Evaluating keyword detections can then be seen as a convolution of an impulse train with an array of filters. This view enables huge speedups; runtime no longer depends on the frame rate and is instead linear in the number of events (impulses). We apply this intuition to redesign the runtime engine behind the point process model for keyword spotting. We demonstrate impressive real-time speedups (500,000x faster than real-time) with minimal loss in search accuracy.} }

MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors
Keith Kintzley, Aren Jansen and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[abstract] [bib]

Abstract

The intrinsic advantages of whole-word acoustic modeling are offset by the problem of data sparsity. To address this, we present several parametric approaches to estimating intra-word phonetic timing models under the assumption that relative timing is independent of word duration. We show evidence that the timing of phonetic events is well described by the Gaussian distribution. We explore the construction of models in the absence of keyword examples (dictionary-based), when keyword examples are abundant (Gaussian mixture models), and also present a Bayesian approach which unifies the two. Applying these techniques in a point process model keyword spotting framework, we demonstrate a 55% relative improvement in performance for models constructed from few examples.
@inproceedings{kintzley-jansen-hermansky:is2012a, author = {Keith Kintzley and Jansen, Aren and Hermansky, Hynek}, title = {MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, address = {Portland, Oregon, USA}, publisher = {International Speech Communication Association}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_0787.html}, abstract = {The intrinsic advantages of whole-word acoustic modeling are offset by the problem of data sparsity. To address this, we present several parametric approaches to estimating intra-word phonetic timing models under the assumption that relative timing is independent of word duration. We show evidence that the timing of phonetic events is well described by the Gaussian distribution. We explore the construction of models in the absence of keyword examples (dictionary-based), when keyword examples are abundant (Gaussian mixture models), and also present a Bayesian approach which unifies the two. Applying these techniques in a point process model keyword spotting framework, we demonstrate a 55\% relative improvement in performance for models constructed from few examples.} }

Acoustic and Data-driven Features for Robust Speech Activity Detection
Samuel Thomas, Sri Harish Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab Shamma, Tim Ng, Bing Zhang, Long Nguyen and Spyros Matsoukas
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{samuel_acoustic:2012, author = {Samuel Thomas and Sri Harish Mallidi and Thomas Janu and Hermansky, Hynek and Nima Mesgarani and Xinhui Zhou and Shihab Shamma and Tim Ng and Bing Zhang and Long Nguyen and Spyros Matsoukas}, title = {Acoustic and Data-driven Features for Robust Speech Activity Detection}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_1985.html} }

Analysis of Temporal Resolution in Frequency Domain Linear Prediction
Sriram Ganapathy and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{sriram_analysis:2012, author = {Ganapathy, Sriram and Hynek Hermansky}, title = {Analysis of Temporal Resolution in Frequency Domain Linear Prediction}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_1828.html} }

Data-driven Posterior Features for Low Resource Speech Recognition Applications
Samuel Thomas, Sriram Ganapathy, Aren Jansen and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{samuel_data-driven:2012, author = {Samuel Thomas and Ganapathy, Sriram and Jansen, Aren and Hermansky, Hynek}, title = {Data-driven Posterior Features for Low Resource Speech Recognition Applications}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_0791.html} }

Estimating Classifier Performance in Unknown Noise
Ehsan Variani and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{ehsan_estimating:2012, author = {Ehsan Variani and Hermansky, Hynek}, title = {Estimating Classifier Performance in Unknown Noise}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_1800.html} }

Exploiting Discriminative Point Process Models for Spoken Term Detection
Atta Norouzian, Aren Jansen, Richard C. Rose and Samuel Thomas
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{atta_exploiting:2012, author = {Atta Norouzian and Jansen, Aren and Richard C. Rose and Samuel Thomas}, title = {Exploiting Discriminative Point Process Models for Spoken Term Detection}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_2442.html} }

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition
Sriram Ganapathy, Samuel Thomas and Hynek Hermansky
Odyssey Speaker and Language Recognition Workshop – 2012

[bib]

@inproceedings{sriram_feature:2012, author = {Ganapathy, Sriram and Samuel Thomas and Hermansky, Hynek}, title = {Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition}, booktitle = {Odyssey Speaker and Language Recognition Workshop}, year = {2012} }

Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition
Aren Jansen, Samuel Thomas and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{aren_intrinsic:2012, author = {Jansen, Aren and Samuel Thomas and Hermansky, Hynek}, title = {Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_0879.html} }

Multilingual MLP Features For Low-resource LVCSR Systems
Samuel Thomas, Sriram Ganapathy and Hynek Hermansky
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2012

[bib]

@inproceedings{samuel_multilingual:2012, author = {Samuel Thomas and Ganapathy, Sriram and Hermansky, Hynek}, title = {Multilingual MLP Features For Low-resource LVCSR Systems}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2012}, url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6288862} }

Phone recognition in critical bands using sub-band temporal modulations
Feipeng Li, Sri Harish Mallidi and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{feipeng_phone:2012, author = {Feipeng Li and Sri Harish Mallidi and Hermansky, Hynek}, title = {Phone recognition in critical bands using sub-band temporal modulations}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_1816.html} }

Robust phoneme recognition based on biomimetic speech contours
Michael A. Carlin, Kailash Patil, Sridhar Nemala and Mounya Elhilali
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{Carlin2012b, author = {Carlin, Michael and Kailash Patil and Sridhar Nemala and Mounya Elhilali}, title = {Robust phoneme recognition based on biomimetic speech contours}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_1348.html} }

Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation
Michael A. Carlin, Nicolas Malyska and Thomas F. Quatieri
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{Carlin2012d, author = {Carlin, Michael and Nicolas Malyska and Thomas F. Quatieri}, title = {Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_0583.html} }

A Flexible Solver for Finite Arithmetic Circuits
Nathaniel Wesley Filardo and Jason Eisner
Technical Communications of the 28th International Conference on Logic Programming, ICLP 2012 – 2012

[bib]

@inproceedings{filardo-eisner-2012-iclp, author = {Filardo, Nathaniel and Eisner, Jason}, title = {A Flexible Solver for Finite Arithmetic Circuits}, booktitle = {Technical Communications of the 28th International Conference on Logic Programming, ICLP 2012}, year = {2012}, url = {http://cs.jhu.edu/~jason/papers/#filardo-eisner-2012-iclp} }

How Social Media Will Change Public Health
Mark Dredze
IEEE Intelligent Systems – 2012

[bib]

@article{Dredze:2012qy, author = {Dredze, Mark}, title = {How Social Media Will Change Public Health}, year = {2012}, pages = {81--84}, url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6285937} }

Factorial LDA: Sparse Multi-Dimensional Text Models
Michael J Paul and Mark Dredze
Neural Information Processing Systems (NIPS) – 2012

[bib]

@inproceedings{Paul:2012lr, author = {Michael J Paul and Dredze, Mark}, title = {Factorial LDA: Sparse Multi-Dimensional Text Models}, booktitle = {Neural Information Processing Systems (NIPS)}, year = {2012}, url = {http://books.nips.cc/papers/files/nips25/NIPS2012_1224.pdf} }

Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics
Alex Lamb, Michael Paul and Mark Dredze
AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text – 2012

[abstract] [bib]

Abstract

We present preliminary results for mining concerned awareness of influenza tweets. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.
@inproceedings{lamb:2012, author = {Alex Lamb and Michael Paul and Dredze, Mark}, title = {Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, year = {2012}, url = {http://www.aaai.org/ocs/index.php/FSS/FSS12/paper/view/5571/5856}, abstract = {We present preliminary results for mining concerned awareness of influenza tweets. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.} }

Malpractice and Malcontent: Analyzing Medical Complaints in Twitter
Atul Nakhasi, Ralph J Passarella, Sarah G Bell, Michael J Paul, Mark Dredze and Peter J Pronovost
AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text – 2012

[abstract] [bib]

Abstract

In this paper we report preliminary results from a study of Twitter to identify patient safety reports, which offer an immediate, untainted, and expansive patient perspective unlike any other mechanism to date for this topic. We identify patient safety related tweets and characterize them by which medical populations caused errors, who reported these errors, what types of errors occurred, and what emotional states were expressed in response. Our long term goal is to improve the handling and reduction of errors by incorporating this patient input into the patient safety process.
@inproceedings{nakhasi:2012, author = {Atul Nakhasi and Ralph J Passarella and Sarah G Bell and Michael J Paul and Dredze, Mark and Peter J Pronovost}, title = {Malpractice and Malcontent: Analyzing Medical Complaints in Twitter}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, year = {2012}, url = {http://www.aaai.org/ocs/index.php/FSS/FSS12/paper/view/5572/5857}, abstract = {In this paper we report preliminary results from a study of Twitter to identify patient safety reports, which offer an immediate, untainted, and expansive patient perspective unlike any other mechanism to date for this topic. We identify patient safety related tweets and characterize them by which medical populations caused errors, who reported these errors, what types of errors occurred, and what emotional states were expressed in response. Our long term goal is to improve the handling and reduction of errors by incorporating this patient input into the patient safety process.} }

Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions
Michael J Paul and Mark Dredze
AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text – 2012

[abstract] [bib]

Abstract

Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. In this work we support such research through the use of a multi-dimensional latent text model -- factorial LDA -- that captures orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsupervised model is unlikely to discover specific factors of interest to clinical researchers, we modify the structure of factorial LDA to incorporate prior knowledge, including the use of observed variables, informative priors and background components. The resulting model learns factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). We demonstrate that the improved model yields better quantitative and more interpretable results.
@inproceedings{Paul:2012fk, author = {Michael J Paul and Dredze, Mark}, title = {Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, year = {2012}, url = {http://www.aaai.org/ocs/index.php/FSS/FSS12/paper/view/5573/5848}, abstract = {Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. In this work we support such research through the use of a multi-dimensional latent text model -- factorial LDA -- that captures orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsupervised model is unlikely to discover specific factors of interest to clinical researchers, we modify the structure of factorial LDA to incorporate prior knowledge, including the use of observed variables, informative priors and background components. The resulting model learns factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). We demonstrate that the improved model yields better quantitative and more interpretable results.} }

Twitter as a Source for Learning about Patient Safety Events
Ralph Passarella, Atul Nakhasi, Sarah Bell, Michael Paul, Peter Pronovost and Mark Dredze
Annual Symposium of the American Medical Informatics Association (AMIA) – 2012

[bib]

@inproceedings{Passarella:2012fk, author = {Ralph Passarella and Atul Nakhasi and Sarah Bell and Michael Paul and Peter Pronovost and Dredze, Mark}, title = {Twitter as a Source for Learning about Patient Safety Events}, booktitle = {Annual Symposium of the American Medical Informatics Association (AMIA)}, year = {2012}, url = {http://knowledge.amia.org/amia-55142-a2012a-1.636547/t-006-1.640361/f-001-1.640362/a-255-1.640447/an-255-1.640449} }

Deriving conversation-based features from unlabeled speech for discriminative language modeling
Damianos Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraclar, Dan Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post and Darcey Riley
International Speech Communication Association (INTERSPEECH) – 2012

[abstract] [bib]

Abstract

The perceptron algorithm was used in [1] to estimate discriminative language models which correct errors in the output of ASR systems. In its simplest version, the algorithm simply increases the weight of n-gram features which appear in the correct (oracle) hypothesis and decreases the weight of n-gram features which appear in the 1-best hypothesis. In this paper, we show that the perceptron algorithm can be successfully used in a semi-supervised learning (SSL) framework, where limited amounts of labeled data are available. Our framework has some similarities to graph-based label propagation [2] in the sense that a graph is built based on proximity of unlabeled conversations, and then it is used to propagate confidences (in the form of features) to the labeled data, based on which perceptron trains a discriminative model. The novelty of our approach lies in the fact that the confidence "flows" from the unlabeled data to the labeled data, and not vice-versa, as is done traditionally in SSL. Experiments conducted at the 2011 CLSP Summer Workshop on the conversational telephone speech corpora Dev04f and Eval04f demonstrate the effectiveness of the proposed approach.
@inproceedings{Karakos:2012fk, author = {Karakos, Damianos and Brian Roark and Izhak Shafran and Kenji Sagae and Maider Lehr and Emily Prud'hommeaux and Puyang Xu and Nathan Glenn and Khudanpur, Sanjeev and Murat Saraclar and Dan Bikel and Dredze, Mark and Callison-Burch, Chris and Yuan Cao and Keith Hall and Eva Hasler and Philipp Koehn and Lopez, Adam and Post, Matt and Darcey Riley}, title = {Deriving conversation-based features from unlabeled speech for discriminative language modeling}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_0202.html}, abstract = {The perceptron algorithm was used in [1] to estimate discriminative language models which correct errors in the output of ASR systems. In its simplest version, the algorithm simply increases the weight of n-gram features which appear in the correct (oracle) hypothesis and decreases the weight of n-gram features which appear in the 1-best hypothesis. In this paper, we show that the perceptron algorithm can be successfully used in a semi-supervised learning (SSL) framework, where limited amounts of labeled data are available. Our framework has some similarities to graph-based label propagation [2] in the sense that a graph is built based on proximity of unlabeled conversations, and then it is used to propagate confidences (in the form of features) to the labeled data, based on which perceptron trains a discriminative model. The novelty of our approach lies in the fact that the confidence "flows" from the unlabeled data to the labeled data, and not vice-versa, as is done traditionally in SSL. Experiments conducted at the 2011 CLSP Summer Workshop on the conversational telephone speech corpora Dev04f and Eval04f demonstrate the effectiveness of the proposed approach.} }

Efficient Structured Language Modeling for Speech Recognition
Ariya Rastrow, Mark Dredze and Sanjeev Khudanpur
International Speech Communication Association (INTERSPEECH) – 2012

[abstract] [bib]

Abstract

The structured language model (SLM) of [1] was one of the first to successfully integrate syntactic structure into language models. We extend the SLM framework in two new directions. First, we propose a new syntactic hierarchical interpolation that improves over previous approaches. Second, we develop a general information-theoretic algorithm for pruning the underlying Jelinek-Mercer interpolated LM used in [1], which substantially reduces the size of the LM, enabling us to train on large data. When combined with hill-climbing [2] the SLM is an accurate model, space-efficient and fast for rescoring large speech lattices. Experimental results on broadcast news demonstrate that the SLM outperforms a large 4-gram LM.
@inproceedings{Rastrow:2012, author = {Ariya Rastrow and Dredze, Mark and Khudanpur, Sanjeev}, title = {Efficient Structured Language Modeling for Speech Recognition}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, url = {http://www.isca-speech.org/archive/interspeech_2012/i12_1660.html}, abstract = {The structured language model (SLM) of [1] was one of the first to successfully integrate syntactic structure into language models. We extend the SLM framework in two new directions. First, we propose a new syntactic hierarchical interpolation that improves over previous approaches. Second, we develop a general information-theoretic algorithm for pruning the underlying Jelinek-Mercer interpolated LM used in [1], which substantially reduces the size of the LM, enabling us to train on large data. When combined with hill-climbing [2] the SLM is an accurate model, space-efficient and fast for rescoring large speech lattices. Experimental results on broadcast news demonstrate that the SLM outperforms a large 4-gram LM.} }

Multi-Domain Learning: When Do Domains Matter?
Mahesh Joshi, Mark Dredze, William W Cohen and Carolyn P Rose
Empirical Methods in Natural Language Processing (EMNLP) – 2012

[abstract] [bib]

Abstract

We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced label setting, although in practice many multi-domain settings have domain-specific label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
@inproceedings{Joshi:2012fk, author = {Mahesh Joshi and Dredze, Mark and William W Cohen and Carolyn P Rose}, title = {Multi-Domain Learning: When Do Domains Matter?}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2012}, url = {http://aclweb.org/anthology//D/D12/D12-1119.pdf}, abstract = {We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced label setting, although in practice many multi-domain settings have domain-specific label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.} }

Revisiting the Case for Explicit Syntactic Information in Language Models
Ariya Rastrow, Sanjeev Khudanpur and Mark Dredze
NAACL Workshop on the Future of Language Modeling for HLT – 2012

[abstract] [bib]

Abstract

Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naive, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, are examined here. It is demonstrated that n-grams are good word-predictors, even linguistically speaking, in a large majority of word-positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on positions where n-grams are weak.
@inproceedings{Rastrow:2012fl, author = {Ariya Rastrow and Khudanpur, Sanjeev and Dredze, Mark}, title = {Revisiting the Case for Explicit Syntactic Information in Language Models}, booktitle = {NAACL Workshop on the Future of Language Modeling for HLT}, year = {2012}, url = {http://aclweb.org/anthology//W/W12/W12-2707.pdf}, abstract = {Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naive, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, are examined here. It is demonstrated that n-grams are good word-predictors, even linguistically speaking, in a large majority of word-positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on positions where n-grams are weak.} }

Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
Ariya Rastrow, Mark Dredze and Sanjeev Khudanpur
Association for Computational Linguistics (ACL) – 2012

[abstract] [bib]

Abstract

Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and part of speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.
@inproceedings{Rastrow:2012fk, author = {Ariya Rastrow and Dredze, Mark and Khudanpur, Sanjeev}, title = {Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2012}, url = {http://aclweb.org/anthology//P/P12/P12-1019.pdf}, abstract = {Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and part of speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.} }

