Publications

Additional (non-HLTCOE) publications may be found on researchers' personal websites.


2013 (33 total)

Next Generation Storage for the HLTCOE
Scott Roberts
Technical Report 9, Human Language Technology Center of Excellence, Johns Hopkins University, 2013

[abstract] [pdf] | [bib]

Abstract

The explosion of unstructured data in high performance computing presents a challenge for existing storage architecture and design. We present a combination of hardware and software which addresses the storage needs of our center's compute cluster. We also demonstrate that at a constant total cost of ownership our proposed solution provides an order of magnitude better performance than Johns Hopkins University's GrayWulf cluster and is two orders of magnitude faster than the center's existing storage array.
@techreport{roberts_tech:2013, author = {Scott Roberts}, title = {Next Generation Storage for the HLTCOE}, number = {9}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, month = {April}, year = {2013}, abstract = {The explosion of unstructured data in high performance computing presents a challenge for existing storage architecture and design. We present a combination of hardware and software which addresses the storage needs of our center's compute cluster. We also demonstrate that at a constant total cost of ownership our proposed solution provides an order of magnitude better performance than Johns Hopkins University's GrayWulf cluster and is two orders of magnitude faster than the center's existing storage array.} }

Answer Extraction as Sequence Tagging with Tree Edit Distance
Xuchen Yao, Benjamin Van Durme, Peter Clark and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Xuchen Yao and Van Durme, Benjamin and Peter Clark and Callison-Burch, Chris}, title = {Answer Extraction as Sequence Tagging with Tree Edit Distance}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Generating Expressions that Refer to Visible Objects
Margaret Mitchell, Kees van Deemter and Ehud Reiter
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Mitchell, Margaret and Kees van Deemter and Ehud Reiter}, title = {Generating Expressions that Refer to Visible Objects}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs
Hua He, Jimmy Lin and Adam Lopez
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Hua He and Jimmy Lin and Lopez, Adam}, title = {Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals
Ann Irvine and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Irvine, Ann and Callison-Burch, Chris}, title = {Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

PPDB: The Paraphrase Database
Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Juri Ganitkevitch and Van Durme, Benjamin and Callison-Burch, Chris}, title = {PPDB: The Paraphrase Database}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Improving the Quality of Minority Class Identification in Dialog Act Tagging
Adinoyi Omuya, Vinodkumar Prabhakaran and Owen Rambow
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Omuya, Adinoyi and Vinodkumar Prabhakaran and Owen Rambow}, title = {Improving the Quality of Minority Class Identification in Dialog Act Tagging}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter
Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson and David Yarowsky
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{bergsma:2013, author = {Bergsma, Shane and Dredze, Mark and Van Durme, Benjamin and Wilson, Theresa and Yarowsky, David}, title = {Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

What's in a Domain? Multi-Domain Learning for Multi-Attribute Data
Mahesh Joshi, Mark Dredze, William Cohen and Carolyn P. Rose
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{joshi:2013, author = {Mahesh Joshi and Dredze, Mark and William Cohen and Carolyn P. Rose}, title = {What's in a Domain? Multi-Domain Learning for Multi-Attribute Data}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Separating Fact from Fear: Tracking Flu Infections on Twitter
Alex Lamb, Michael Paul and Mark Dredze
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{lamb:2013, author = {Alex Lamb and Michael Paul and Dredze, Mark}, title = {Separating Fact from Fear: Tracking Flu Infections on Twitter}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models
Michael Paul and Mark Dredze
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{Paul:2013, author = {Michael Paul and Dredze, Mark}, title = {Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation
Damianos Karakos, Mark Dredze and Sanjeev Khudanpur
Technical Report 8, Human Language Technology Center of Excellence, Johns Hopkins University, 2013

[abstract] [pdf] | [bib]

Abstract

Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model’s parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.
@techreport{karakos_tech:2013, author = {Karakos, Damianos and Dredze, Mark and Khudanpur, Sanjeev}, title = {Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation}, number = {8}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, year = {2013}, abstract = {Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model’s parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.} }

Topic Models and Metadata for Visualizing Text Corpora
Justin Snyder, Rebecca Knowles, Mark Dredze, Matt Gormley and Travis Wolfe
North American Chapter of the Association for Computational Linguistics (NAACL) (Demo Paper) – 2013

[abstract] [bib]

Abstract

Effectively exploring and analyzing large text corpora requires visualizations that provide a high level summary. Past work has relied on faceted browsing of document metadata or on natural language processing of document text. In this paper, we present a new web-based tool that integrates topics learned from an unsupervised topic model in a faceted browsing experience. The user can manage topics, filter documents by topic and summarize views with metadata and topic graphs. We report a user study of the usefulness of topics in our tool.
@inproceedings{Snyder:2013lr, author = {Justin Snyder and Rebecca Knowles and Dredze, Mark and Gormley, Matt and Wolfe, Travis}, title = {Topic Models and Metadata for Visualizing Text Corpora}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL) (Demo Paper)}, year = {2013}, abstract = {Effectively exploring and analyzing large text corpora requires visualizations that provide a high level summary. Past work has relied on faceted browsing of document metadata or on natural language processing of document text. In this paper, we present a new web-based tool that integrates topics learned from an unsupervised topic model in a faceted browsing experience. The user can manage topics, filter documents by topic and summarize views with metadata and topic graphs. We report a user study of the usefulness of topics in our tool.} }

Sustained Firing of Model Central Auditory Neurons Yields a Discriminative Spectro-temporal Representation for Natural Sounds
Michael A. Carlin and Mounya Elhilali
PLoS Computational Biology – 2013

[bib]

@article{Carlin2013, author = {Carlin, Michael and Mounya Elhilali}, title = {Sustained Firing of Model Central Auditory Neurons Yields a Discriminative Spectro-temporal Representation for Natural Sounds}, journal = {PLoS Computational Biology}, year = {2013} }

Nonconvex Global Optimization for Grammar Induction
Matt Gormley and Jason Eisner
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{gormley-eisner:2013, author = {Gormley, Matt and Eisner, Jason}, title = {Nonconvex Global Optimization for Grammar Induction}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013} }

Learning to translate with products of novices: Teaching MT with open-ended challenge problems
Adam Lopez, Matt Post, Chris Callison-Burch, Jonathan Weese, Juri Ganitkevitch, Narges Ahmidi, Olivia Buzek, Leah Hanson, Beenish Jamil, Nam Lee, Michael A. Carlin, Henry Pao, Fatima Rivera, Leili Shahriyari, Debu Sinha, Adam Teichert, Stephen Wampler, Michael Weinberger, Daguang Xu, Lin Yang and Shang Zhao
Transactions of the Association for Computational Linguistics – 2013

[bib]

@article{Lopez+etal:2013:tacl:mt-class, author = {Lopez, Adam and Post, Matt and Callison-Burch, Chris and Jonathan Weese and Juri Ganitkevitch and Narges Ahmidi and Buzek, Olivia and Leah Hanson and Beenish Jamil and Lee, Nam and Carlin, Michael and Henry Pao and Fatima Rivera and Leili Shahriyari and Debu Sinha and Adam Teichert and Stephen Wampler and Michael Weinberger and Daguang Xu and Lin Yang and Shang Zhao}, title = {Learning to translate with products of novices: Teaching MT with open-ended challenge problems}, journal = {Transactions of the Association for Computational Linguistics}, year = {2013} }

Dirt Cheap Web-Scale Parallel Text from the Common Crawl
Jason R Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch and Adam Lopez
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{Smith+etal:2013:acl, author = {Smith, Jason and Herve Saint-Amand and Magdalena Plamada and Philipp Koehn and Callison-Burch, Chris and Lopez, Adam}, title = {Dirt Cheap Web-Scale Parallel Text from the Common Crawl}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013} }

KELVIN: a tool for automated knowledge base construction
Paul McNamee, James Mayfield, Tim Finin, Tim Oates, Dawn Lawrie, Tan Xu and Douglas W. Oard
North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session (NAACL-HLT) – 2013

[bib]

@inproceedings{, author = {McNamee, Paul and Mayfield, James and Finin, Tim and Oates, Tim and Lawrie, Dawn and Tan Xu and Douglas W. Oard}, title = {KELVIN: a tool for automated knowledge base construction}, booktitle = {North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session (NAACL-HLT)}, year = {2013} }

Using Conceptual Class Attributes to Characterize Social Media Users
Shane Bergsma and Benjamin Van Durme
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{, author = {Bergsma, Shane and Van Durme, Benjamin}, title = {Using Conceptual Class Attributes to Characterize Social Media Users}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013} }

SenseSpotting: Never let your parallel data tie you to an old domain
Marine Carpuat, Hal Daume III, Katie Henry, Ann Irvine, Jagadeesh Jagarlamudi and Rachel Rudinger
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{, author = {Marine Carpuat and Hal Daume III and Katie Henry and Irvine, Ann and Jagadeesh Jagarlamudi and Rachel Rudinger}, title = {SenseSpotting: Never let your parallel data tie you to an old domain}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013} }

Lightly Supervised Learning of Procedural Dialog Systems
Svitlana Volkova, Pallavi Choudhury, Chris Quirk, Bill Dolan and Luke Zettlemoyer
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{Volkova_2013_ACL, author = {Volkova, Svitlana and Pallavi Choudhury and Chris Quirk and Bill Dolan and Luke Zettlemoyer}, title = {Lightly Supervised Learning of Procedural Dialog Systems}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013} }

Learning to Relate Literal and Sentimental Descriptions of Visual Properties
Mark Yatskar, Svitlana Volkova, Alsi Celikyilmaz, Bill Dolan and Luke Zettlemoyer
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{Volkova_2013_NAACL, author = {Mark Yatskar and Volkova, Svitlana and Alsi Celikyilmaz and Bill Dolan and Luke Zettlemoyer}, title = {Learning to Relate Literal and Sentimental Descriptions of Visual Properties}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013} }

Supervector Bayesian Speaker Comparison
Bengt J. Borgström and Alan McCree
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2013

[pdf] | [bib]

@inproceedings{, author = {Bengt J. Borgström and McCree, Alan}, title = {Supervector Bayesian Speaker Comparison}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2013} }

Discriminatively Trained Bayesian Speaker Comparison of I-Vectors
Bengt J. Borgström and Alan McCree
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2013

[pdf] | [bib]

@inproceedings{, author = {Bengt J. Borgström and McCree, Alan}, title = {Discriminatively Trained Bayesian Speaker Comparison of I-Vectors}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2013} }

UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems
Lushan Han, Abhay L. Kashyap, Tim Finin, James Mayfield and Jonathan Weese
Joint Conference on Lexical and Computational Semantics (*SEM) – 2013

[abstract] [pdf] | [bib]

Abstract

We describe three semantic text similarity systems developed for the *SEM 2013 STS shared task and the results of the corresponding three runs. All of them used a word similarity feature that combined LSA word similarity and WordNet knowledge. The first run, which achieved the top mean score on the task of all the submissions, used a simple term alignment algorithm. The other two runs, ranked second and fourth, used SVM models to combine a larger set of features.
@inproceedings{, author = {Lushan Han and Abhay L. Kashyap and Finin, Tim and Mayfield, James and Jonathan Weese}, title = {UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems}, booktitle = {Joint Conference on Lexical and Computational Semantics (*SEM)}, year = {2013}, abstract = {We describe three semantic text similarity systems developed for the *SEM 2013 STS shared task and the results of the corresponding three runs. All of them used a word similarity feature that combined LSA word similarity and WordNet knowledge. The first run, which achieved the top mean score on the task of all the submissions, used a simple term alignment algorithm. The other two runs, ranked second and fourth, used SVM models to combine a larger set of features.} }

Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition
Carolina Parada, Mark Dredze, Abhinav Sethy and Ariya Rastrow
Technical Report 10, Human Language Technology Center of Excellence, Johns Hopkins University, 2013

[abstract] [pdf] | [bib]

Abstract

Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of sub-word units. We present a novel probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid system. We combine the proposed hybrid system with confidence based metrics to improve OOV detection performance. Previous work addresses OOV detection as a binary classification task, where each region is independently classified using local information. We propose to treat OOV detection as a sequence labeling problem, and we show that 1) jointly predicting out-of-vocabulary regions, 2) including contextual information from each region, and 3) learning sub-lexical units optimized for this task, lead to substantial improvements with respect to state-of-the-art on an English Broadcast News and MIT Lectures task.
@techreport{, author = {Carolina Parada and Dredze, Mark and Abhinav Sethy and Ariya Rastrow}, title = {Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition}, number = {10}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, year = {2013}, abstract = {Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of sub-word units. We present a novel probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid system. We combine the proposed hybrid system with confidence based metrics to improve OOV detection performance. Previous work addresses OOV detection as a binary classification task, where each region is independently classified using local information. We propose to treat OOV detection as a sequence labeling problem, and we show that 1) jointly predicting out-of-vocabulary regions, 2) including contextual information from each region, and 3) learning sub-lexical units optimized for this task, lead to substantial improvements with respect to state-of-the-art on an English Broadcast News and MIT Lectures task.} }

PARMA: A Predicate Argument Aligner
Travis Wolfe, Benjamin Van Durme, Mark Dredze, Nicholas Andrews, Charley Beller, Chris Callison-Burch, Jay DeYoung, Justin Snyder, Jonathan Weese, Tan Xu and Xuchen Yao
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{Wolfe:2013lr, author = {Wolfe, Travis and Van Durme, Benjamin and Dredze, Mark and Andrews, Nicholas and Charley Beller and Callison-Burch, Chris and Jay DeYoung and Justin Snyder and Jonathan Weese and Tan Xu and Xuchen Yao}, title = {PARMA: A Predicate Argument Aligner}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013} }

Explicit and Implicit Syntactic Features for Text Classification
Matt Post and Shane Bergsma
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{, author = {Post, Matt and Bergsma, Shane}, title = {Explicit and Implicit Syntactic Features for Text Classification}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013} }

Frequency Offset Correction in Speech without Detecting Pitch
Pascal Clark, Sri Harish Mallidi, Aren Jansen and Hynek Hermansky
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2013

[pdf] | [bib]

@inproceedings{, author = {Clark, Pascal and Sri Harish Mallidi and Jansen, Aren and Hermansky, Hynek}, title = {Frequency Offset Correction in Speech without Detecting Pitch}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2013} }

Complementary envelope estimation for frequency-modulated random signals
Pascal Clark, Ivars Kirsteins and Les Atlas
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2013

[pdf] | [bib]

@inproceedings{, author = {Clark, Pascal and Ivars Kirsteins and Les Atlas}, title = {Complementary envelope estimation for frequency-modulated random signals}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2013} }

A Lightweight and High Performance Monolingual Word Aligner
Xuchen Yao, Peter Clark, Benjamin Van Durme and Chris Callison-Burch
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{yao-EtAl:2013:ACL, author = {Xuchen Yao and Peter Clark and Van Durme, Benjamin and Callison-Burch, Chris}, title = {A Lightweight and High Performance Monolingual Word Aligner}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics} }

Arabic Dialect Identification
Omar Zaidan and Chris Callison-Burch
Computational Linguistics – 2013

[bib]

@article{zaidan-callisonburch:CL:2013, author = {Omar Zaidan and Callison-Burch, Chris}, title = {Arabic Dialect Identification}, journal = {Computational Linguistics}, year = {2013} }

Automatic Coupling of Answer Extraction and Information Retrieval
Xuchen Yao, Benjamin Van Durme and Peter Clark
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{yao1-EtAl:2013:ACL, author = {Xuchen Yao and Van Durme, Benjamin and Peter Clark}, title = {Automatic Coupling of Answer Extraction and Information Retrieval}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics} }

2012 (79 total)

Detecting Power Relations from Written Dialog
Vinodkumar Prabhakaran
Proceedings of ACL 2012 Student Research Workshop – 2012

[bib]

@inproceedings{prabhakaran:2012:SRW, author = {Vinodkumar Prabhakaran}, title = {Detecting Power Relations from Written Dialog}, booktitle = {Proceedings of ACL 2012 Student Research Workshop}, month = {July}, year = {2012}, address = {Jeju Island, Korea}, publisher = {Association for Computational Linguistics}, pages = {7--12}, url = {http://www.aclweb.org/anthology/W12-3302} }

Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing
Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, Lori Levin, Christine Piatko and Benjamin Van Durme
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics – 2012

[bib]

@inproceedings{prabhakaran-EtAl:2012:ExProM, author = {Vinodkumar Prabhakaran and Bloodgood, Michael and Mona Diab and Dorr, Bonnie and Lori Levin and Piatko, Christine and Van Durme, Benjamin}, title = {Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing}, booktitle = {Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics}, month = {July}, year = {2012}, address = {Jeju, Republic of Korea}, publisher = {Association for Computational Linguistics}, pages = {57--64}, url = {http://www.aclweb.org/anthology/W12-3807} }

A Context-Aware Approach to Entity Linking
Veselin Stoyanov, James Xu, Douglas Oard, Dawn Lawrie, Tim Oates and Tim Finin
Proceedings of the NAACL Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX) – 2012

[bib]

@inproceedings{stoyanov-EtAl:2012:AKBC-WEKEX, author = {Stoyanov, Veselin and James Xu and Douglas Oard and Lawrie, Dawn and Oates, Tim and Finin, Tim}, title = {A Context-Aware Approach to Entity Linking}, booktitle = {Proceedings of the NAACL Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {62--67}, url = {http://www.aclweb.org/anthology/W12-3012} }

Evaluating the Quality of a Knowledge Base Populated from Text
James Mayfield and Tim Finin
Proceedings of the NAACL Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX) – 2012

[bib]

@inproceedings{mayfield-finin:2012:AKBC-WEKEX, author = {Mayfield, James and Finin, Tim}, title = {Evaluating the Quality of a Knowledge Base Populated from Text}, booktitle = {Proceedings of the NAACL Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {68--73}, url = {http://www.aclweb.org/anthology/W12-3013} }

Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing
Matt Post, Chris Callison-Burch and Miles Osborne
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{post-callisonburch-osborne:2012:WMT, author = {Post, Matt and Callison-Burch, Chris and Miles Osborne}, title = {Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, month = {June}, year = {2012}, publisher = {Association for Computational Linguistics}, pages = {401--409}, url = {http://www.aclweb.org/anthology/W12-3152} }

Findings of the 2012 Workshop on Statistical Machine Translation
Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut and Lucia Specia
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{callisonburch-EtAl:2012:WMT, author = {Callison-Burch, Chris and Philipp Koehn and Christof Monz and Post, Matt and Radu Soricut and Lucia Specia}, title = {Findings of the 2012 Workshop on Statistical Machine Translation}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {10--51}, url = {http://cs.jhu.edu/~ccb/publications/findings-of-the-wmt12-shared-tasks.pdf} }

Using Categorial Grammar to Label Translation Rules
Jonathan Weese, Chris Callison-Burch and Adam Lopez
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{weese-callisonburch-lopez:2012:WMT, author = {Jonathan Weese and Callison-Burch, Chris and Lopez, Adam}, title = {Using Categorial Grammar to Label Translation Rules}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {222--231} }

Joshua 4.0: Packing, PRO, and Paraphrases
Juri Ganitkevitch, Yuan Cao, Jonathan Weese, Matt Post and Chris Callison-Burch
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{ganitkevitch-EtAl:2012:WMT, author = {Juri Ganitkevitch and Yuan Cao and Jonathan Weese and Post, Matt and Callison-Burch, Chris}, title = {Joshua 4.0: Packing, PRO, and Paraphrases}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics}, pages = {283--291}, url = {http://cs.jhu.edu/~ccb/publications/joshua-4.0.pdf} }

Monolingual Distributional Similarity for Text-to-Text Generation
Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch
*SEM First Joint Conference on Lexical and Computational Semantics – 2012

[bib]

@inproceedings{Ganitkevitch-etal:2012:StarSEM, author = {Juri Ganitkevitch and Van Durme, Benjamin and Callison-Burch, Chris}, title = {Monolingual Distributional Similarity for Text-to-Text Generation}, booktitle = {*SEM First Joint Conference on Lexical and Computational Semantics}, month = {June}, year = {2012}, address = {Montreal}, publisher = {Association for Computational Linguistics}, url = {http://cs.jhu.edu/~ccb/publications/monolingual-distributional-similarity-for-text-to-text-generation.pdf} }

Predicting Overt Display of Power in Written Dialogs
Vinodkumar Prabhakaran and Mona Diab
North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{prabhakaran_et_al_naacl2012, author = {Vinodkumar Prabhakaran and Mona Diab}, title = {Predicting Overt Display of Power in Written Dialogs}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, month = {June}, year = {2012}, address = {Montreal, Canada}, publisher = {Association for Computational Linguistics} }

Machine Translation of Arabic Dialects
Rabih Zbib, Erika Malchiodi, Jacob Devlin, David Stallard, Spyros Matsoukas, Richard Schwartz, John Makhoul, Omar Zaidan and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{Zbib-etal:2012:NAACL, author = {Rabih Zbib and Erika Malchiodi and Jacob Devlin and David Stallard and Spyros Matsoukas and Richard Schwartz and John Makhoul and Omar Zaidan and Callison-Burch, Chris}, title = {Machine Translation of Arabic Dialects}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, month = {June}, year = {2012}, address = {Montreal}, publisher = {Association for Computational Linguistics}, url = {http://cs.jhu.edu/~ccb/publications/machine-translation-of-arabic-dialects.pdf} }

Language Identification for Creating Language-Specific Twitter Collections
Shane Bergsma, Paul McNamee, Mossaab Bagdouri, Clay Fink and Theresa Wilson
Workshop on Language and Social Media at the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{bergsma2012lid, author = {Bergsma, Shane and McNamee, Paul and Mossaab Bagdouri and Clay Fink and Wilson, Theresa}, title = {Language Identification for Creating Language-Specific Twitter Collections}, booktitle = {Workshop on Language and Social Media at the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)}, month = {June}, year = {2012}, publisher = {Association for Computational Linguistics} }

Annotations for Power Relations on Email Threads
Vinodkumar Prabhakaran and Mona Diab
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) – 2012

[bib]

@inproceedings{prabhakaran_et_al_lrec2012, author = {Vinodkumar Prabhakaran and Mona Diab}, title = {Annotations for Power Relations on Email Threads}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)}, month = {May}, year = {2012}, address = {Istanbul, Turkey}, publisher = {European Language Resources Association (ELRA)} }

Creating and Curating a Cross-Language Entity Linking Collection
Dawn Lawrie, James Mayfield, Paul McNamee and Douglas Oard
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC) – 2012

[bib]

@inproceedings{2012-LREC-Lawrie, author = {Lawrie, Dawn and Mayfield, James and McNamee, Paul and Douglas Oard}, title = {Creating and Curating a Cross-Language Entity Linking Collection}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC)}, month = {May}, year = {2012} }

Refinement of a Method for Identifying Probable Archaeological Sites from Remotely Sensed Data
James Tilton, Douglas Comer, Carey Priebe, Daniel Sussman and Li Chen
SPIE Defense, Security, and Sensing – 2012

[bib]

@article{tilton2012refinement, author = {James Tilton and Douglas Comer and Priebe, Carey and Daniel Sussman and Li Chen}, title = {Refinement of a Method for Identifying Probable Archaeological Sites from Remotely Sensed Data}, month = {April}, year = {2012}, pages = {23--27} }

Toward Statistical Machine Translation without Parallel Corpora
Alex Klementiev, Ann Irvine, Chris Callison-Burch and David Yarowsky
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL) – 2012

[bib]

@inproceedings{klementiev-etal:2012:EACL, author = {Alex Klementiev and Irvine, Ann and Callison-Burch, Chris and Yarowsky, David}, title = {Toward Statistical Machine Translation without Parallel Corpora}, booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL)}, month = {April}, year = {2012}, address = {Avignon, France}, publisher = {Association for Computational Linguistics}, url = {http://cs.jhu.edu/~ccb/publications/toward-statistical-machine-translation-without-parallel-corpora.pdf} }

Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification
Shuai Huang, Glen Coppersmith and Damianos Karakos
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{Huang:2012fk, author = {Shuai Huang and Coppersmith, Glen and Karakos, Damianos}, title = {Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Indexing Raw Acoustic Features for Scalable Zero Resource Search
Aren Jansen and Benjamin Van Durme
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{janvan12, author = {Jansen, Aren and Van Durme, Benjamin}, title = {Indexing Raw Acoustic Features for Scalable Zero Resource Search}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Inverting the Point Process Model for Fast Phonetic Keyword Search
Keith Kintzley, Aren Jansen, Ken Church and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[abstract] [bib]

Abstract

Normally, we represent speech as a long sequence of frames and model the keyword with a relatively small set of parameters, commonly with a hidden Markov model (HMM). However, since the input speech is much longer than the keyword, suppose instead that we represent the speech as a relatively sparse set of impulses (roughly one per phoneme) and model the keyword as a filter-bank where each filter's impulse response relates to the likelihood of a phone at a given position within a word. Evaluating keyword detections can then be seen as a convolution of an impulse train with an array of filters. This view enables huge speedups; runtime no longer depends on the frame rate and is instead linear in the number of events (impulses). We apply this intuition to redesign the runtime engine behind the point process model for keyword spotting. We demonstrate impressive real-time speedups (500,000x faster than real-time) with minimal loss in search accuracy.
@inproceedings{kintzley-jansen-church-hermansky:is2012b, author = {Keith Kintzley and Jansen, Aren and Church, Ken and Hermansky, Hynek}, title = {Inverting the Point Process Model for Fast Phonetic Keyword Search}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, address = {Portland, Oregon, USA}, publisher = {International Speech Communication Association}, abstract = {Normally, we represent speech as a long sequence of frames and model the keyword with a relatively small set of parameters, commonly with a hidden Markov model (HMM). However, since the input speech is much longer than the keyword, suppose instead that we represent the speech as a relatively sparse set of impulses (roughly one per phoneme) and model the keyword as a filter-bank where each filter's impulse response relates to the likelihood of a phone at a given position within a word. Evaluating keyword detections can then be seen as a convolution of an impulse train with an array of filters. This view enables huge speedups; runtime no longer depends on the frame rate and is instead linear in the number of events (impulses). We apply this intuition to redesign the runtime engine behind the point process model for keyword spotting. We demonstrate impressive real-time speedups (500,000x faster than real-time) with minimal loss in search accuracy.} }

MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors
Keith Kintzley, Aren Jansen and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[abstract] [bib]

Abstract

The intrinsic advantages of whole-word acoustic modeling are offset by the problem of data sparsity. To address this, we present several parametric approaches to estimating intra-word phonetic timing models under the assumption that relative timing is independent of word duration. We show evidence that the timing of phonetic events is well described by the Gaussian distribution. We explore the construction of models in the absence of keyword examples (dictionary-based), when keyword examples are abundant (Gaussian mixture models), and also present a Bayesian approach which unifies the two. Applying these techniques in a point process model keyword spotting framework, we demonstrate a 55% relative improvement in performance for models constructed from few examples.
@inproceedings{kintzley-jansen-hermansky:is2012a, author = {Keith Kintzley and Jansen, Aren and Hermansky, Hynek}, title = {MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, address = {Portland, Oregon, USA}, publisher = {International Speech Communication Association}, abstract = {The intrinsic advantages of whole-word acoustic modeling are offset by the problem of data sparsity. To address this, we present several parametric approaches to estimating intra-word phonetic timing models under the assumption that relative timing is independent of word duration. We show evidence that the timing of phonetic events is well described by the Gaussian distribution. We explore the construction of models in the absence of keyword examples (dictionary-based), when keyword examples are abundant (Gaussian mixture models), and also present a Bayesian approach which unifies the two. Applying these techniques in a point process model keyword spotting framework, we demonstrate a 55\% relative improvement in performance for models constructed from few examples.} }

Acoustic and Data-driven Features for Robust Speech Activity Detection
Samuel Thomas, Sri Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab Shamma, Tim Ng, Bing Zhang, Long Nguyen and Spyros Matsoukas
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{samuel_acoustic:2012, author = {Samuel Thomas and Sri Mallidi and Thomas Janu and Hermansky, Hynek and Nima Mesgarani and Xinhui Zhou and Shihab Shamma and Tim Ng and Bing Zhang and Long Nguyen and Spyros Matsoukas}, title = {Acoustic and Data-driven Features for Robust Speech Activity Detection}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Adaptation Transforms of Auto-Associative Neural Networks as Features for Speaker Verification
Samuel Thomas, Sri Mallidi, Sriram Ganapathy and Hynek Hermansky
Odyssey Speaker and Language Recognition Workshop – 2012

[bib]

@inproceedings{samuel_adaptation:2012, author = {Samuel Thomas and Sri Mallidi and Ganapathy, Sriram and Hermansky, Hynek}, title = {Adaptation Transforms of Auto-Associative Neural Networks as Features for Speaker Verification}, booktitle = {Odyssey Speaker and Language Recognition Workshop}, year = {2012} }

Analysis of Temporal Resolution in Frequency Domain Linear Prediction
Sriram Ganapathy and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{sriram_analysis:2012, author = {Ganapathy, Sriram and Hermansky, Hynek}, title = {Analysis of Temporal Resolution in Frequency Domain Linear Prediction}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Data-driven Posterior Features for Low Resource Speech Recognition Applications
Samuel Thomas, Sriram Ganapathy, Aren Jansen and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{samuel_data-driven:2012, author = {Samuel Thomas and Ganapathy, Sriram and Jansen, Aren and Hermansky, Hynek}, title = {Data-driven Posterior Features for Low Resource Speech Recognition Applications}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Estimating Classifier Performance in Unknown Noise
Ehsan Variani and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{ehsan_estimating:2012, author = {Ehsan Variani and Hermansky, Hynek}, title = {Estimating Classifier Performance in Unknown Noise}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Exploiting Discriminative Point Process Models for Spoken Term Detection
Atta Norouzian, Aren Jansen, Richard Rose and Samuel Thomas
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{atta_exploiting:2012, author = {Atta Norouzian and Jansen, Aren and Richard Rose and Samuel Thomas}, title = {Exploiting Discriminative Point Process Models for Spoken Term Detection}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition
Sriram Ganapathy, Samuel Thomas and Hynek Hermansky
Odyssey Speaker and Language Recognition Workshop – 2012

[bib]

@inproceedings{sriram_feature:2012, author = {Ganapathy, Sriram and Samuel Thomas and Hermansky, Hynek}, title = {Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition}, booktitle = {Odyssey Speaker and Language Recognition Workshop}, year = {2012} }

Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition
Aren Jansen, Samuel Thomas and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{aren_intrinsic:2012, author = {Jansen, Aren and Samuel Thomas and Hermansky, Hynek}, title = {Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Multilingual MLP Features For Low-resource LVCSR Systems
Samuel Thomas, Sriram Ganapathy and Hynek Hermansky
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2012

[bib]

@inproceedings{samuel_multilingual:2012, author = {Samuel Thomas and Ganapathy, Sriram and Hermansky, Hynek}, title = {Multilingual MLP Features For Low-resource LVCSR Systems}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2012} }

Phone recognition in critical bands using sub-band temporal modulations
Feipeng Li, Sri Mallidi and Hynek Hermansky
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{feipeng_phone:2012, author = {Feipeng Li and Sri Mallidi and Hermansky, Hynek}, title = {Phone recognition in critical bands using sub-band temporal modulations}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Robust phoneme recognition using biomimetic speech contours
Michael A. Carlin, Kailash Patil, Sridhar Nemala and Mounya Elhilali
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{Carlin2012b, author = {Carlin, Michael and Kailash Patil and Sridhar Nemala and Mounya Elhilali}, title = {Robust phoneme recognition using biomimetic speech contours}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation
Michael A. Carlin, Nicolas Malyska and Thomas Quatieri
International Speech Communication Association (INTERSPEECH) – 2012

[bib]

@inproceedings{Carlin2012d, author = {Carlin, Michael and Nicolas Malyska and Thomas Quatieri}, title = {Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012} }

A Flexible Solver for Finite Arithmetic Circuits
Nathaniel Wesley Filardo and Jason Eisner
Technical Communications of the 28th International Conference on Logic Programming, ICLP 2012 – 2012

[bib]

@inproceedings{filardo-eisner-2012-iclp, author = {Filardo, Nathaniel and Eisner, Jason}, title = {A Flexible Solver for Finite Arithmetic Circuits}, booktitle = {Technical Communications of the 28th International Conference on Logic Programming, ICLP 2012}, year = {2012}, url = {http://cs.jhu.edu/~jason/papers/#filardo-eisner-2012-iclp} }

Fast and Accurate Prediction via Evidence-Specific MRF Structure
Veselin Stoyanov and Jason Eisner
ICML Workshop on Inferning: Interactions between Inference and Learning – 2012

[bib]

@inproceedings{stoyanov-eisner-2012-icmlw, author = {Stoyanov, Veselin and Eisner, Jason}, title = {Fast and Accurate Prediction via Evidence-Specific MRF Structure}, booktitle = {ICML Workshop on Inferning: Interactions between Inference and Learning}, year = {2012}, url = {http://cs.jhu.edu/~jason/papers/#stoyanov-eisner-2012-icmlw} }

How Social Media Will Change Public Health
Mark Dredze
IEEE Intelligent Systems – 2012

[bib]

@article{Dredze:2012qy, author = {Dredze, Mark}, title = {How Social Media Will Change Public Health}, year = {2012}, pages = {81--84} }

Factorial LDA: Sparse Multi-Dimensional Text Models
Michael Paul and Mark Dredze
Neural Information Processing Systems (NIPS) – 2012

[bib]

@inproceedings{Paul:2012lr, author = {Michael Paul and Dredze, Mark}, title = {Factorial LDA: Sparse Multi-Dimensional Text Models}, booktitle = {Neural Information Processing Systems (NIPS)}, year = {2012} }

Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics
Alex Lamb, Michael Paul and Mark Dredze
AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text – 2012

[abstract] [bib]

Abstract

We present preliminary results for mining concerned awareness of influenza tweets. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.
@inproceedings{lamb:2012, author = {Alex Lamb and Michael Paul and Dredze, Mark}, title = {Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, year = {2012}, abstract = {We present preliminary results for mining concerned awareness of influenza tweets. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.} }

Malpractice and Malcontent: Analyzing Medical Complaints in Twitter
Atul Nakhasi, Ralph Passarella, Sarah Bell, Michael Paul, Mark Dredze and Peter Pronovost
AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text – 2012

[abstract] [bib]

Abstract

In this paper we report preliminary results from a study of Twitter to identify patient safety reports, which offer an immediate, untainted, and expansive patient perspective unlike any other mechanism to date for this topic. We identify patient safety related tweets and characterize them by which medical populations caused errors, who reported these errors, what types of errors occurred, and what emotional states were expressed in response. Our long term goal is to improve the handling and reduction of errors by incorporating this patient input into the patient safety process.
@inproceedings{nakhasi:2012, author = {Atul Nakhasi and Ralph Passarella and Sarah Bell and Michael Paul and Dredze, Mark and Peter Pronovost}, title = {Malpractice and Malcontent: Analyzing Medical Complaints in Twitter}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, year = {2012}, abstract = {In this paper we report preliminary results from a study of Twitter to identify patient safety reports, which offer an immediate, untainted, and expansive patient perspective unlike any other mechanism to date for this topic. We identify patient safety related tweets and characterize them by which medical populations caused errors, who reported these errors, what types of errors occurred, and what emotional states were expressed in response. Our long term goal is to improve the handling and reduction of errors by incorporating this patient input into the patient safety process.} }

Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions
Michael Paul and Mark Dredze
AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text – 2012

[abstract] [bib]

Abstract

Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. In this work we support such research through the use of a multi-dimensional latent text model -- factorial LDA -- that captures orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsupervised model is unlikely to discover specific factors of interest to clinical researchers, we modify the structure of factorial LDA to incorporate prior knowledge, including the use of observed variables, informative priors and background components. The resulting model learns factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). We demonstrate that the improved model yields better quantitative and more interpretable results.
@inproceedings{Paul:2012fk, author = {Michael Paul and Dredze, Mark}, title = {Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, year = {2012}, abstract = {Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. In this work we support such research through the use of a multi-dimensional latent text model -- factorial LDA -- that captures orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsupervised model is unlikely to discover specific factors of interest to clinical researchers, we modify the structure of factorial LDA to incorporate prior knowledge, including the use of observed variables, informative priors and background components. The resulting model learns factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). We demonstrate that the improved model yields better quantitative and more interpretable results.} }

Twitter as a Source for Learning about Patient Safety Events
Ralph Passarella, Atul Nakhasi, Sarah Bell, Michael Paul, Peter Pronovost and Mark Dredze
Annual Symposium of the American Medical Informatics Association (AMIA) – 2012

[bib]

@inproceedings{Passarella:2012fk, author = {Ralph Passarella and Atul Nakhasi and Sarah Bell and Michael Paul and Peter Pronovost and Dredze, Mark}, title = {Twitter as a Source for Learning about Patient Safety Events}, booktitle = {Annual Symposium of the American Medical Informatics Association (AMIA)}, year = {2012} }

Deriving conversation-based features from unlabeled speech for discriminative language modeling
Damianos Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraclar, Dan Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philip Koehn, Adam Lopez, Matt Post and Darcey Riley
International Speech Communication Association (INTERSPEECH) – 2012

[abstract] [bib]

Abstract

The perceptron algorithm was used in [1] to estimate discriminative language models which correct errors in the output of ASR systems. In its simplest version, the algorithm simply increases the weight of n-gram features which appear in the correct (oracle) hypothesis and decreases the weight of n-gram features which appear in the 1-best hypothesis. In this paper, we show that the perceptron algorithm can be successfully used in a semi-supervised learning (SSL) framework, where limited amounts of labeled data are available. Our framework has some similarities to graph-based label propagation [2] in the sense that a graph is built based on proximity of unlabeled conversations, and then it is used to propagate confidences (in the form of features) to the labeled data, based on which perceptron trains a discriminative model. The novelty of our approach lies in the fact that the confidence "flows" from the unlabeled data to the labeled data, and not vice-versa, as is done traditionally in SSL. Experiments conducted at the 2011 CLSP Summer Workshop on the conversational telephone speech corpora Dev04f and Eval04f demonstrate the effectiveness of the proposed approach.
@inproceedings{Karakos:2012fk, author = {Karakos, Damianos and Brian Roark and Izhak Shafran and Kenji Sagae and Maider Lehr and Emily Prud'hommeaux and Puyang Xu and Nathan Glenn and Khudanpur, Sanjeev and Murat Saraclar and Dan Bikel and Dredze, Mark and Callison-Burch, Chris and Yuan Cao and Keith Hall and Eva Hasler and Philip Koehn and Lopez, Adam and Post, Matt and Darcey Riley}, title = {Deriving conversation-based features from unlabeled speech for discriminative language modeling}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, abstract = {The perceptron algorithm was used in [1] to estimate discriminative language models which correct errors in the output of ASR systems. In its simplest version, the algorithm simply increases the weight of n-gram features which appear in the correct (oracle) hypothesis and decreases the weight of n-gram features which appear in the 1-best hypothesis. In this paper, we show that the perceptron algorithm can be successfully used in a semi-supervised learning (SSL) framework, where limited amounts of labeled data are available. Our framework has some similarities to graph-based label propagation [2] in the sense that a graph is built based on proximity of unlabeled conversations, and then it is used to propagate confidences (in the form of features) to the labeled data, based on which perceptron trains a discriminative model. The novelty of our approach lies in the fact that the confidence "flows" from the unlabeled data to the labeled data, and not vice-versa, as is done traditionally in SSL. Experiments conducted at the 2011 CLSP Summer Workshop on the conversational telephone speech corpora Dev04f and Eval04f demonstrate the effectiveness of the proposed approach.} }

Efficient Structured Language Modeling for Speech Recognition
Ariya Rastrow, Mark Dredze and Sanjeev Khudanpur
International Speech Communication Association (INTERSPEECH) – 2012

[abstract] [bib]

Abstract

The structured language model (SLM) of [1] was one of the first to successfully integrate syntactic structure into language models. We extend the SLM framework in two new directions. First, we propose a new syntactic hierarchical interpolation that improves over previous approaches. Second, we develop a general information-theoretic algorithm for pruning the underlying Jelinek-Mercer interpolated LM used in [1], which substantially reduces the size of the LM, enabling us to train on large data. When combined with hill-climbing [2] the SLM is an accurate model, space-efficient and fast for rescoring large speech lattices. Experimental results on broadcast news demonstrate that the SLM outperforms a large 4-gram LM.
@inproceedings{Rastrow:2012, author = {Ariya Rastrow and Dredze, Mark and Khudanpur, Sanjeev}, title = {Efficient Structured Language Modeling for Speech Recognition}, booktitle = {International Speech Communication Association (INTERSPEECH)}, year = {2012}, abstract = {The structured language model (SLM) of [1] was one of the first to successfully integrate syntactic structure into language models. We extend the SLM framework in two new directions. First, we propose a new syntactic hierarchical interpolation that improves over previous approaches. Second, we develop a general information-theoretic algorithm for pruning the underlying Jelinek-Mercer interpolated LM used in [1], which substantially reduces the size of the LM, enabling us to train on large data. When combined with hill-climbing [2] the SLM is an accurate model, space-efficient and fast for rescoring large speech lattices. Experimental results on broadcast news demonstrate that the SLM outperforms a large 4-gram LM.} }

Multi-Domain Learning: When Do Domains Matter?
Mahesh Joshi, Mark Dredze, William Cohen and Carolyn Rose
Empirical Methods in Natural Language Processing (EMNLP) – 2012

[abstract] [bib]

Abstract

We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced label setting, although in practice many multi-domain settings have domain-specific label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
@inproceedings{Joshi:2012fk, author = {Mahesh Joshi and Dredze, Mark and William Cohen and Carolyn Rose}, title = {Multi-Domain Learning: When Do Domains Matter?}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2012}, abstract = {We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced label setting, although in practice many multi-domain settings have domain-specific label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.} }

Revisiting the Case for Explicit Syntactic Information in Language Models
Ariya Rastrow, Sanjeev Khudanpur and Mark Dredze
NAACL Workshop on the Future of Language Modeling for HLT – 2012

[abstract] [bib]

Abstract

Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naive, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, are examined here. It is demonstrated that n-grams are good word-predictors, even linguistically speaking, in a large majority of word-positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on positions where n-grams are weak.
@inproceedings{Rastrow:2012fl, author = {Ariya Rastrow and Khudanpur, Sanjeev and Dredze, Mark}, title = {Revisiting the Case for Explicit Syntactic Information in Language Models}, booktitle = {NAACL Workshop on the Future of Language Modeling for HLT}, year = {2012}, abstract = {Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naive, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, are examined here. It is demonstrated that n-grams are good word-predictors, even linguistically speaking, in a large majority of word-positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on positions where n-grams are weak.} }

Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
Ariya Rastrow, Mark Dredze and Sanjeev Khudanpur
Association for Computational Linguistics (ACL) – 2012

[abstract] [bib]

Abstract

Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and part of speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.
@inproceedings{Rastrow:2012fk, author = {Ariya Rastrow and Dredze, Mark and Khudanpur, Sanjeev}, title = {Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2012}, abstract = {Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and part of speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.} }

Processing Informal, Romanized Pakistani Text Messages
Ann Irvine, Jonathan Weese and Chris Callison-Burch
Proceedings of the NAACL Workshop on Language in Social Media – 2012

[bib]

@inproceedings{IrvineWeeseCallisonburchSMS12, author = {Irvine, Ann and Jonathan Weese and Callison-Burch, Chris}, title = {Processing Informal, Romanized Pakistani Text Messages}, booktitle = {Proceedings of the NAACL Workshop on Language in Social Media}, year = {2012} }

Digitizing 18th-Century French Literature: Comparing transcription methods for a critical edition text
Ann Irvine, Laure Marcellesi and Afra Zomorodian
Proceedings of the NAACL Workshop on Computational Linguistics for Literature – 2012

[bib]

@inproceedings{IrvineMarcellesiZomorodianFrench12, author = {Irvine, Ann and Laure Marcellesi and Afra Zomorodian}, title = {Digitizing 18th-Century French Literature: Comparing transcription methods for a critical edition text}, booktitle = {Proceedings of the NAACL Workshop on Computational Linguistics for Literature}, year = {2012} }

Vertex Nomination via Content and Context
Glen Coppersmith and Carey Priebe
arXiv:1201.4118v1 – 2012

[bib]

@article{coppersmith2012vertex, author = {Coppersmith, Glen and Priebe, Carey}, title = {Vertex Nomination via Content and Context}, year = {2012} }

Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown
D.E. Fishkind, D.L. Sussman, M. Tang, J.T. Vogelstein and Carey Priebe
arXiv:1205.0309v1 – 2012

[bib]

@article{STFPV, author = {D.E. Fishkind and D.L. Sussman and M. Tang and J.T. Vogelstein and Priebe, Carey}, title = {Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown}, year = {2012} }

An implied latent position process for doubly stochastic messaging activities
N. Lee, Carey Priebe and M. Tang
Annual International Conference on Computational Mathematics, Computational Geometry & Statistics (CMCGS 2012) – 2012

[bib]

@article{lee2012implied, author = {N. Lee and Priebe, Carey and M. Tang}, title = {An implied latent position process for doubly stochastic messaging activities}, year = {2012}, pages = {30--31} }

Manifold Matching: Joint Optimization of Fidelity and Commensurability
Carey Priebe, D. Marchette, Z. Ma and S. Adali
Brazilian Journal of Probability and Statistics, accepted for publication, February – 2012

[bib]

@article{priebe2012manifold, author = {Priebe, Carey and D. Marchette and Z. Ma and S. Adali}, title = {Manifold Matching: Joint Optimization of Fidelity and Commensurability}, year = {2012} }

Quantitative Horizon Scanning for Mitigating Technological Surprise: Detecting the potential for collaboration at the interface
Carey Priebe, J. Solka, D. Marchette and A. Bryant
Statistical Analysis and Data Mining – 2012

[bib]

@article{priebe2012quantitative, author = {Priebe, Carey and J. Solka and D. Marchette and A. Bryant}, title = {Quantitative Horizon Scanning for Mitigating Technological Surprise: Detecting the potential for collaboration at the interface}, year = {2012} }

On the Limiting Distribution of a Graph Scan Statistic
A. Rukhin and Carey Priebe
Communications in Statistics - Theory and Methods – 2012

[bib]

@article{rukhin2012on, author = {A. Rukhin and Priebe, Carey}, title = {On the Limiting Distribution of a Graph Scan Statistic}, year = {2012}, pages = {1151--1170} }

A consistent dot product embedding for stochastic blockmodel graphs
D. Sussman, M. Tang, D. Fishkind and Carey Priebe
Journal of the American Statistical Association – 2012

[bib]

@article{sussman2012, author = {D. Sussman and M. Tang and D. Fishkind and Priebe, Carey}, title = {A consistent dot product embedding for stochastic blockmodel graphs}, year = {2012} }

Hallucinated N-Best Lists for Discriminative Language Modeling
Kenji Sagae, Maider Lehr, Emily Prud'hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark, Murat Saraçlar, Izhak Shafran, Daniel Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post and Darcey Riley
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2012

[bib]

@inproceedings{Sagae+etal:2012:icassp, author = {Kenji Sagae and Maider Lehr and Emily Prud'hommeaux and Puyang Xu and Nathan Glenn and Karakos, Damianos and Khudanpur, Sanjeev and Brian Roark and Murat Sara{\c c}lar and Izhak Shafran and Daniel Bikel and Callison-Burch, Chris and Yuan Cao and Keith Hall and Eva Hasler and Philipp Koehn and Lopez, Adam and Post, Matt and Darcey Riley}, title = {Hallucinated N-Best Lists for Discriminative Language Modeling}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2012} }

Continuous Space Discriminative Language Modeling
Puyang Xu, Sanjeev Khudanpur, Maider Lehr, Emily Prud'hommeaux, Nathan Glenn, Damianos Karakos, Brian Roark, Kenji Sagae, Murat Saraçlar, Izhak Shafran, Dan Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post and Darcey Riley
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2012

[bib]

@inproceedings{Xu+etal:2012:icassp, author = {Puyang Xu and Khudanpur, Sanjeev and Maider Lehr and Emily Prud'hommeaux and Nathan Glenn and Karakos, Damianos and Brian Roark and Kenji Sagae and Murat Sara{\c c}lar and Izhak Shafran and Dan Bikel and Callison-Burch, Chris and Yuan Cao and Keith Hall and Eva Hasler and Philipp Koehn and Lopez, Adam and Post, Matt and Darcey Riley}, title = {Continuous Space Discriminative Language Modeling}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2012} }

Semi-Supervised Discriminative Language Modeling for Turkish ASR
Arda Celebi, Haşim Sak, Erinç Dikici, Murat Saraçlar, Maider Lehr, Emily Prud'hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark, Kenji Sagae, Izhak Shafran, Dan Bikel, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post and Darcey Riley
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2012

[bib]

@inproceedings{Celebi+etal:2012:icassp, author = {Arda Celebi and Ha{\c s}im Sak and Erin{\c c} Dikici and Murat Sara{\c c}lar and Maider Lehr and Emily Prud'hommeaux and Puyang Xu and Nathan Glenn and Karakos, Damianos and Khudanpur, Sanjeev and Brian Roark and Kenji Sagae and Izhak Shafran and Dan Bikel and Callison-Burch, Chris and Yuan Cao and Keith Hall and Eva Hasler and Philipp Koehn and Lopez, Adam and Post, Matt and Darcey Riley}, title = {Semi-Supervised Discriminative Language Modeling for Turkish ASR}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2012} }

Putting Human Assessments of Machine Translation Systems in Order
Adam Lopez
Proceedings of the Seventh Workshop on Statistical Machine Translation – 2012

[bib]

@inproceedings{Lopez:2012:wmt, author = {Lopez, Adam}, title = {Putting Human Assessments of Machine Translation Systems in Order}, booktitle = {Proceedings of the Seventh Workshop on Statistical Machine Translation}, year = {2012} }

Name Phylogeny: A Generative Model of String Variation
Nicholas Andrews, Jason Eisner and Mark Dredze
Empirical Methods in Natural Language Processing (EMNLP) – 2012

[bib]

@inproceedings{Andrews:2012uq, author = {Andrews, Nicholas and Eisner, Jason and Dredze, Mark}, title = {Name Phylogeny: A Generative Model of String Variation}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, year = {2012} }

CLex: A Lexicon for Exploring Color, Concept and Emotion Associations in Language
Svitlana Volkova, Bill Dolan and Theresa Wilson
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL) – 2012

[bib]

@inproceedings{prabhakaran_et_al_naacl2012, author = {Volkova, Svitlana and Bill Dolan and Wilson, Theresa}, title = {CLex: A Lexicon for Exploring Color, Concept and Emotion Associations in Language}, booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL)}, year = {2012} }

Use of Modality and Negation in Semantically-Informed Syntactic MT
Kathryn Baker, Bonnie Dorr, Michael Bloodgood, Chris Callison-Burch, Nathaniel Wesley Filardo, Christine Piatko, Lori Levin and Scott Miller
Computational Linguistics – 2012

[bib]

@article{baker-etal:2012:CL, author = {Kathryn Baker and Dorr, Bonnie and Bloodgood, Michael and Callison-Burch, Chris and Filardo, Nathaniel and Piatko, Christine and Lori Levin and Scott Miller}, title = {Use of Modality and Negation in Semantically-Informed Syntactic MT}, year = {2012}, url = {http://cs.jhu.edu/~ccb/publications/modality-and-negation-in-semantically-informed-syntactic-mt.pdf} }

Stylometric Analysis of Scientific Articles
Shane Bergsma, Matt Post and David Yarowsky
North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{bergsma2012stylometric, author = {Bergsma, Shane and Post, Matt and Yarowsky, David}, title = {Stylometric Analysis of Scientific Articles}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2012} }

Shared Components Topic Models
Matthew Gormley, Mark Dredze, Benjamin Van Durme and Jason Eisner
North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{Gormley:2012fk, author = {Matthew Gormley and Dredze, Mark and Van Durme, Benjamin and Eisner, Jason}, title = {Shared Components Topic Models}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2012} }

Entity Clustering Across Languages
Spence Green, Nicholas Andrews, Matthew Gormley, Mark Dredze and Christopher Manning
North American Chapter of the Association for Computational Linguistics (NAACL) – 2012

[bib]

@inproceedings{Green:2012uq, author = {Spence Green and Andrews, Nicholas and Matthew Gormley and Dredze, Mark and Christopher Manning}, title = {Entity Clustering Across Languages}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2012} }

Improving Word Similarity by Augmenting PMI with Estimates of Word Polysemy
Lushan Han, Tim Finin, Paul McNamee, Anupam Joshi and Yelena Yesha
IEEE Transactions on Knowledge and Data Engineering, IEEE Computer Society – 2012

[bib]

@article{Han:lr, author = {Lushan Han and Finin, Tim and McNamee, Paul and Anupam Joshi and Yelena Yesha}, title = {Improving Word Similarity by Augmenting PMI with Estimates of Word Polysemy}, year = {2012}, url = {http://ebiquity.umbc.edu/get/a/publication/615.pdf} }

New H-∞ Bounds for the Recursive Least Squares Algorithm Exploiting Input Structure
Koby Crammer, Alex Kulesza and Mark Dredze
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2012

[bib]

@inproceedings{Crammer:2012fk, author = {Koby Crammer and Alex Kulesza and Dredze, Mark}, title = {New H-∞ Bounds for the Recursive Least Squares Algorithm Exploiting Input Structure}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2012} }

Confidence-Weighted Linear Classification for Text Categorization
Koby Crammer, Mark Dredze and Fernando Pereira
Journal of Machine Learning Research (JMLR) – 2012

[abstract] [bib]

Abstract

Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of text-categorization tasks show that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing.
@article{Pereira:2011fk, author = {Koby Crammer and Dredze, Mark and Fernando Pereira}, title = {Confidence-Weighted Linear Classification for Text Categorization}, year = {2012}, abstract = {Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of text-categorization tasks show that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing.} }

