Publications

Additional (non-HLTCOE) publications may be found on researchers' personal websites.


2015 (6 total)

Inferring Latent User Properties from Texts Published in Social Media (Demo)
Svitlana Volkova, Yoram Bachrach, Michael Armstrong and Vijay Sharma
Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI) – 2015

[pdf] | [bib]

@inproceedings{volkova-EtAl:2015:AAAI, author = {Volkova, Svitlana and Yoram Bachrach and Michael Armstrong and Vijay Sharma}, title = {Inferring Latent User Properties from Texts Published in Social Media (Demo)}, booktitle = {Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI)}, month = {January}, year = {2015}, address = {Austin, TX}, url = {http://www.aclweb.org/anthology/P/P14/P14-1018} }

Online Bayesian Models for Personal Analytics in Social Media
Svitlana Volkova and Benjamin Van Durme
Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI) – 2015

[bib]

@inproceedings{volkova-vandurme:2015:AAAI, author = {Volkova, Svitlana and Van Durme, Benjamin}, title = {Online Bayesian Models for Personal Analytics in Social Media}, booktitle = {Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI)}, month = {January}, year = {2015}, address = {Austin, TX} }

The Hurricane Sandy Twitter Corpus
Haoyu Wang, Eduard Hovy and Mark Dredze
AAAI Workshop on the World Wide Web and Public Health Intelligence – 2015

[bib]

@inproceedings{Wang:2015ve, author = {Haoyu Wang and Eduard Hovy and Dredze, Mark}, title = {The Hurricane Sandy Twitter Corpus}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, year = {2015} }

Social Media as a Sensor of Air Quality and Public Response in China
Shiliang Wang, Michael Paul and Mark Dredze
AAAI Workshop on the World Wide Web and Public Health Intelligence – 2015

[bib]

@inproceedings{Wang:2015eu, author = {Shiliang Wang and Michael Paul and Dredze, Mark}, title = {Social Media as a Sensor of Air Quality and Public Response in China}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, year = {2015} }

Worldwide Influenza Surveillance through Twitter
Michael Paul, Mark Dredze, David Broniatowski and Nicholas Generous
AAAI Workshop on the World Wide Web and Public Health Intelligence – 2015

[bib]

@inproceedings{Paul:2015la, author = {Michael Paul and Dredze, Mark and David Broniatowski and Nicholas Generous}, title = {Worldwide Influenza Surveillance through Twitter}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, year = {2015} }

Tobacco Watcher: Real-time Global Surveillance for Tobacco Control
Joanna Cohen, John Ayers and Mark Dredze
World Conference on Tobacco or Health (WCTOH) – 2015

[bib]

@inproceedings{Cohen:2015zl, author = {Joanna Cohen and John Ayers and Dredze, Mark}, title = {Tobacco Watcher: Real-time Global Surveillance for Tobacco Control}, booktitle = {World Conference on Tobacco or Health (WCTOH)}, year = {2015} }

2014 (60 total)

Entity Type Recognition for Heterogeneous Semantic Graphs
Jennifer Sleeman, Tim Finin and Anupam Joshi
AI Magazine – 2014

[pdf] | [bib]

@article{Entity_Type_Recognition_for_Heterogeneous_Semantic_Graphs, author = {Jennifer Sleeman and Finin, Tim and Anupam Joshi}, title = {Entity Type Recognition for Heterogeneous Semantic Graphs}, month = {September}, year = {2014} }

Meerkat Mafia: Multilingual and Cross-Level Semantic Textual Similarity systems
Abhay Kashyap, Lushan Han, Roberto Yus, Jennifer Sleeman, Taneeya Satyapanich, Sunil Gandhi and Tim Finin
Proceedings of the 8th International Workshop on Semantic Evaluation – 2014

[pdf] | [bib]

@inproceedings{Meerkat_Mafia_Multilingual_and_Cross_Level_Semantic_Textual_Similarity_systems, author = {Abhay Kashyap and Lushan Han and Roberto Yus and Jennifer Sleeman and Taneeya Satyapanich and Sunil Gandhi and Finin, Tim}, title = {Meerkat Mafia: Multilingual and Cross-Level Semantic Textual Similarity systems}, booktitle = {Proceedings of the 8th International Workshop on Semantic Evaluation}, month = {August}, year = {2014}, publisher = {Association for Computational Linguistics}, pages = {416-423} }

Efficient Elicitation of Annotations for Human Evaluation of Machine Translation
Keisuke Sakaguchi, Matt Post and Benjamin Van Durme
Proceedings of the Workshop on Statistical Machine Translation – 2014

[bib]

@inproceedings{sakaguchi2014efficient, author = {Keisuke Sakaguchi and Post, Matt and Van Durme, Benjamin}, title = {Efficient Elicitation of Annotations for Human Evaluation of Machine Translation}, booktitle = {Proceedings of the Workshop on Statistical Machine Translation}, month = {June}, year = {2014}, address = {Baltimore, Maryland}, publisher = {Association for Computational Linguistics} }

Low-Resource Semantic Role Labeling
Matt Gormley, Margaret Mitchell, Benjamin Van Durme and Mark Dredze
Association for Computational Linguistics (ACL) – 2014

[bib]

@inproceedings{gormley-etal:2014:SRL, author = {Gormley, Matt and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark}, title = {Low-Resource Semantic Role Labeling}, booktitle = {Association for Computational Linguistics (ACL)}, month = {June}, year = {2014}, url = {http://www.cs.jhu.edu/~mrg/publications/srl-acl-2014.pdf} }

Robust Feature Extraction Using Modulation Filtering of Autoregressive Models
Sriram Ganapathy, Sri Harish and Hynek Hermansky
2014

[abstract] [pdf] | [bib]

Abstract

Speaker and language recognition in noisy and degraded channel conditions continue to be a challenging problem mainly due to the mismatch between clean training and noisy test conditions. In the presence of noise, the most reliable portions of the signal are the high energy regions which can be used for robust feature extraction. In this paper, we propose a front end processing scheme based on autoregressive (AR) models that represent the high energy regions with good accuracy followed by a modulation filtering process. The AR model of the spectrogram is derived using two separable time and frequency AR transforms. The first AR model (temporal AR model) of the sub-band Hilbert envelopes is derived using frequency domain linear prediction (FDLP). This is followed by a spectral AR model applied on the FDLP envelopes. The output 2-D AR model represents a low-pass modulation filtered spectrogram of the speech signal. The band-pass modulation filtered spectrograms can further be derived by dividing two AR models with different model orders (cut-off frequencies). The modulation filtered spectrograms are converted to cepstral coefficients and are used for a speaker recognition task in noisy and reverberant conditions. Various speaker recognition experiments are performed with clean and noisy versions of the NIST-2010 speaker recognition evaluation (SRE) database using the state-of-the-art speaker recognition system. In these experiments, the proposed front-end analysis provides substantial improvements (relative improvements of up to 25%) compared to baseline techniques. Furthermore, we also illustrate the generalizability of the proposed methods using language identification (LID) experiments on highly degraded high-frequency (HF) radio channels and speech recognition experiments on noisy data.
@{, author = {Ganapathy, Sriram and Sri Harish and Hynek Hermansky}, title = {Robust Feature Extraction Using Modulation Filtering of Autoregressive Models}, month = {June}, year = {2014}, publisher = {IEEE}, url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6826560&queryText%3DRobust+Feature+Extraction+Using+Modulation+Filtering+of+Autoregressive+models}, abstract = {Speaker and language recognition in noisy and degraded channel conditions continue to be a challenging problem mainly due to the mismatch between clean training and noisy test conditions. In the presence of noise, the most reliable portions of the signal are the high energy regions which can be used for robust feature extraction. In this paper, we propose a front end processing scheme based on autoregressive (AR) models that represent the high energy regions with good accuracy followed by a modulation filtering process. The AR model of the spectrogram is derived using two separable time and frequency AR transforms. The first AR model (temporal AR model) of the sub-band Hilbert envelopes is derived using frequency domain linear prediction (FDLP). This is followed by a spectral AR model applied on the FDLP envelopes. The output 2-D AR model represents a low-pass modulation filtered spectrogram of the speech signal. The band-pass modulation filtered spectrograms can further be derived by dividing two AR models with different model orders (cut-off frequencies). The modulation filtered spectrograms are converted to cepstral coefficients and are used for a speaker recognition task in noisy and reverberant conditions. Various speaker recognition experiments are performed with clean and noisy versions of the NIST-2010 speaker recognition evaluation (SRE) database using the state-of-the-art speaker recognition system. In these experiments, the proposed front-end analysis provides substantial improvements (relative improvements of up to 25%) compared to baseline techniques. 
Furthermore, we also illustrate the generalizability of the proposed methods using language identification (LID) experiments on highly degraded high-frequency (HF) radio channels and speech recognition experiments on noisy data.} }

Findings of the 2014 Workshop on Statistical Machine Translation
Ondrej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia and Aleš Tamchyna
Proceedings of the Ninth Workshop on Statistical Machine Translation – 2014

[bib]

@inproceedings{bojar-EtAl:2014:W14-33, author = {Ondrej Bojar and Christian Buck and Christian Federmann and Barry Haddow and Koehn, Philipp and Johannes Leveling and Christof Monz and Pavel Pecina and Post, Matt and Herve Saint-Amand and Radu Soricut and Lucia Specia and Aleš Tamchyna}, title = {Findings of the 2014 Workshop on Statistical Machine Translation}, booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation}, month = {June}, year = {2014}, address = {Baltimore, Maryland, USA}, publisher = {Association for Computational Linguistics}, pages = {12--58}, url = {http://aclweb.org/anthology/W/W14/W14-3302.pdf} }

Some Insights From Translating Conversational Telephone Speech
Gaurav Kumar, Matt Post, Daniel Povey and Sanjeev Khudanpur
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2014

[bib]

@inproceedings{kumar2014some, author = {Gaurav Kumar and Post, Matt and Povey, Daniel and Khudanpur, Sanjeev}, title = {Some Insights From Translating Conversational Telephone Speech}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, month = {May}, year = {2014}, address = {Florence, Italy}, url = {http://cs.jhu.edu/~post/papers/kumar2013some.pdf} }

Population Health Concerns During the United States' Great Recession
Ben Althouse, Jon-Patrick Allem, Matt Childers, Mark Dredze and John Ayers
American Journal of Preventive Medicine – 2014

[bib]

@article{Althouse:2014lr, author = {Ben Althouse and Jon-Patrick Allem and Matt Childers and Dredze, Mark and John Ayers}, title = {Population Health Concerns During the United States' Great Recession}, month = {February}, year = {2014}, pages = {166-170} }

The Language Demographics of Amazon Mechanical Turk
Ellie Pavlick, Matt Post, Ann Irvine, Dmitry Kachaev and Chris Callison-Burch
Transactions of the Association for Computational Linguistics – 2014

[bib]

@article{pavlick2014language, author = {Ellie Pavlick and Post, Matt and Irvine, Ann and Dmitry Kachaev and Callison-Burch, Chris}, title = {The Language Demographics of Amazon Mechanical Turk}, month = {February}, year = {2014}, pages = {79--92}, url = {http://www.cis.upenn.edu/~ccb/publications/language-demographics-of-mechanical-turk.pdf} }

Improving Gender Prediction of Social Media Users via Weighted Annotator Rationales
Svitlana Volkova and David Yarowsky
NIPS 2014 Workshop on Personalization: Methods and Applications – 2014

[bib]

@inproceedings{volkova-yarowsky:2014, author = {Volkova, Svitlana and Yarowsky, David}, title = {Improving Gender Prediction of Social Media Users via Weighted Annotator Rationales}, booktitle = {NIPS 2014 Workshop on Personalization: Methods and Applications}, month = {December}, year = {2014}, address = {Montreal, Canada} }

KELVIN: Extracting Knowledge from Large Text Collections
James Mayfield, Paul McNamee, Craig Harman, Tim Finin and Dawn Lawrie
AAAI Fall Symposium on Natural Language Access to Big Data – 2014

[pdf] | [bib]

@inproceedings{KELVIN_Extracting_Knowledge_from_Large_Text_Collections, author = {Mayfield, James and McNamee, Paul and Craig Harman and Finin, Tim and Lawrie, Dawn}, title = {KELVIN: Extracting Knowledge from Large Text Collections}, booktitle = {AAAI Fall Symposium on Natural Language Access to Big Data}, month = {November}, year = {2014}, publisher = {AAAI Press} }

Infoboxer: Using Statistical and Semantic Knowledge to Help Create Wikipedia Infoboxes
Roberto Yus, Varish Mulwad, Tim Finin and Eduardo Mena
13th International Semantic Web Conference (ISWC 2014), Riva del Garda (Italy) – 2014

[pdf] | [bib]

@inproceedings{Infoboxer_Using_Statistical_and_Semantic_Knowledge_to_Help_Create_Wikipedia_Infoboxes, author = {Roberto Yus and Varish Mulwad and Finin, Tim and Eduardo Mena}, title = {Infoboxer: Using Statistical and Semantic Knowledge to Help Create Wikipedia Infoboxes}, booktitle = {13th International Semantic Web Conference (ISWC 2014), Riva del Garda (Italy)}, month = {October}, year = {2014} }

Reducing Reliance on Relevance Judgments for System Comparison by Using Expectation-Maximization
Ning Gao, William Webber and Douglas W Oard
The 36th European Conference on Information Retrieval – 2014

[bib]

@inproceedings{Gao2014ECIR, author = {Ning Gao and William Webber and Douglas W Oard}, title = {Reducing Reliance on Relevance Judgments for System Comparison by Using Expectation-Maximization}, booktitle = {The 36th European Conference on Information Retrieval}, year = {2014}, publisher = {Springer}, pages = {1--12}, url = {http://terpconnect.umd.edu/~oard/pdf/ecir14.pdf} }

A Wikipedia-based Corpus for Contextualized Machine Translation
Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar, Benjamin Van Durme and Matt Post
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC) – 2014

[bib]

@inproceedings{Drexler2014, author = {Jennifer Drexler and Pushpendre Rastogi and Aguilar, Jacqueline and Van Durme, Benjamin and Post, Matt}, title = {A Wikipedia-based Corpus for Contextualized Machine Translation}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC)}, year = {2014} }

Learning Polylingual Topic Models from Code-Switched Social Media Documents
Nanyun Peng, Yiming Wang and Mark Dredze
Association for Computational Linguistics (ACL) – 2014

[abstract] [bib]

Abstract

Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multi-lingual corpus analysis. We experiment on two code-switching corpora (English-Spanish Twitter data and English-Chinese Weibo data) and show that csLDA improves perplexity over LDA, and learns semantically coherent aligned topics as judged by human annotators.
@inproceedings{Peng:2014fk, author = {Nanyun Peng and Yiming Wang and Dredze, Mark}, title = {Learning Polylingual Topic Models from Code-Switched Social Media Documents}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2014}, abstract = {Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multi-lingual corpus analysis. We experiment on two code-switching corpora (English-Spanish Twitter data and English-Chinese Weibo data) and show that csLDA improves perplexity over LDA, and learns semantically coherent aligned topics as judged by human annotators.} }

Measuring Post Traumatic Stress Disorder in Twitter
Glen Coppersmith, Craig Harman and Mark Dredze
International Conference on Weblogs and Social Media (ICWSM) – 2014

[abstract] [pdf] | [bib]

Abstract

Traditional mental health studies rely on information primarily collected and analyzed through personal contact with a health care professional. Recent work has shown the utility of social media data for studying depression, but there have been limited evaluations of other mental health conditions. We consider post traumatic stress disorder (PTSD), a serious condition that affects millions worldwide, with especially high rates in military veterans. We show how to obtain a PTSD classifier for social media using simple searches of available Twitter data, a significant reduction in training data cost compared to previous work on mental health. We demonstrate its utility by an examination of language use from PTSD individuals, and by detecting elevated rates of PTSD at and around US military bases using our classifiers.
@inproceedings{Coppersmith:2014lr, author = {Coppersmith, Glen and Harman, Craig and Dredze, Mark}, title = {Measuring Post Traumatic Stress Disorder in Twitter}, booktitle = {International Conference on Weblogs and Social Media (ICWSM)}, year = {2014}, abstract = {Traditional mental health studies rely on information primarily collected and analyzed through personal contact with a health care professional. Recent work has shown the utility of social media data for studying depression, but there have been limited evaluations of other mental health conditions. We consider post traumatic stress disorder (PTSD), a serious condition that affects millions worldwide, with especially high rates in military veterans. We show how to obtain a PTSD classifier for social media using simple searches of available Twitter data, a significant reduction in training data cost compared to previous work on mental health. We demonstrate its utility by an examination of language use from PTSD individuals, and by detecting elevated rates of PTSD at and around US military bases using our classifiers.} }

Robust Entity Clustering via Phylogenetic Inference
Nicholas Andrews, Jason Eisner and Mark Dredze
Association for Computational Linguistics (ACL) – 2014

[abstract] [bib]

Abstract

Entity clustering must determine when two named-entity mentions refer to the same entity. Typical approaches use a pipeline architecture that clusters the mentions using fixed or learned measures of name and context similarity. In this paper, we propose a model for cross-document coreference resolution that achieves robustness by learning similarity from unlabeled data. The generative process assumes that each entity mention arises from copying and optionally mutating an earlier name from a similar context. Clustering the mentions into entities depends on recovering this copying tree jointly with estimating models of the mutation process and parent selection process. We present a block Gibbs sampler for posterior inference and an empirical evaluation on several datasets. On a challenging Twitter corpus, our method outperforms the best baseline by 12.6 points of F1 score.
@inproceedings{Andrews:2014fk, author = {Andrews, Nicholas and Eisner, Jason and Dredze, Mark}, title = {Robust Entity Clustering via Phylogenetic Inference}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2014}, abstract = {Entity clustering must determine when two named-entity mentions refer to the same entity. Typical approaches use a pipeline architecture that clusters the mentions using fixed or learned measures of name and context similarity. In this paper, we propose a model for cross-document coreference resolution that achieves robustness by learning similarity from unlabeled data. The generative process assumes that each entity mention arises from copying and optionally mutating an earlier name from a similar context. Clustering the mentions into entities depends on recovering this copying tree jointly with estimating models of the mutation process and parent selection process. We present a block Gibbs sampler for posterior inference and an empirical evaluation on several datasets. On a challenging Twitter corpus, our method outperforms the best baseline by 12.6 points of F1 score.} }

Quantifying Mental Health Signals in Twitter
Glen Coppersmith, Mark Dredze and Craig Harman
ACL Workshop on Computational Linguistics and Clinical Psychology – 2014

[abstract] [pdf] | [bib]

Abstract

The ubiquity of social media provides a rich opportunity to enhance the data available to mental health clinicians and researchers, enabling a better-informed and better-equipped mental health field. We present analysis of mental health phenomena in publicly available Twitter data, demonstrating how rigorous application of simple natural language processing methods can yield insight into specific disorders as well as mental health writ large, along with evidence that as-of-yet undiscovered linguistic signals relevant to mental health exist in social media. We present a novel method for gathering data for a range of mental illnesses quickly and cheaply, then focus on analysis of four in particular: post-traumatic stress disorder (PTSD), major depressive disorder, bipolar disorder, and seasonal affective disorder. We intend for these proof-of-concept results to inform the necessary ethical discussion regarding the balance between the utility of such data and the privacy of mental health related information.
@inproceedings{Coppersmith:2014fk, author = {Coppersmith, Glen and Dredze, Mark and Harman, Craig}, title = {Quantifying Mental Health Signals in Twitter}, booktitle = {ACL Workshop on Computational Linguistics and Clinical Psychology}, year = {2014}, abstract = {The ubiquity of social media provides a rich opportunity to enhance the data available to mental health clinicians and researchers, enabling a better-informed and better-equipped mental health field. We present analysis of mental health phenomena in publicly available Twitter data, demonstrating how rigorous application of simple natural language processing methods can yield insight into specific disorders as well as mental health writ large, along with evidence that as-of-yet undiscovered linguistic signals relevant to mental health exist in social media. We present a novel method for gathering data for a range of mental illnesses quickly and cheaply, then focus on analysis of four in particular: post-traumatic stress disorder (PTSD), major depressive disorder, bipolar disorder, and seasonal affective disorder. We intend for these proof-of-concept results to inform the necessary ethical discussion regarding the balance between the utility of such data and the privacy of mental health related information.} }

A Comparison of the Events and Relations Across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards
Jacqueline Aguilar, Charley Beller, Paul McNamee, Benjamin Van Durme, Stephanie Strassel, Zhiyi Song and Joe Ellis
ACL Workshop: EVENTS – 2014

[bib]

@inproceedings{Aguilar2014, author = {Aguilar, Jacqueline and Charley Beller and McNamee, Paul and Van Durme, Benjamin and Stephanie Strassel and Zhiyi Song and Joe Ellis}, title = {A Comparison of the Events and Relations Across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards}, booktitle = {ACL Workshop: EVENTS}, year = {2014}, url = {https://www.aclweb.org/anthology/W/W14/W14-2907.pdf} }

Facebook, Twitter and Google Plus for Breaking News: Is there a winner?
Miles Osborne and Mark Dredze
International Conference on Weblogs and Social Media (ICWSM) – 2014

[abstract] [bib]

Abstract

Twitter is widely seen as being the go to place for breaking news. Recently however, competing Social Media have begun to carry news. Here we examine how Facebook, Google Plus and Twitter report on breaking news. We consider coverage (whether news events are reported) and latency (the time when they are reported). Using data drawn from three weeks in December 2013, we identify 29 major news events, ranging from celebrity deaths, plague outbreaks to sports events. We find that all media carry the same major events, but Twitter continues to be the preferred medium for breaking news, almost consistently leading Facebook or Google Plus. Facebook and Google Plus largely repost newswire stories and their main research value is that they conveniently package multiple sources of information together.
@inproceedings{Osborne:2014fk, author = {Miles Osborne and Dredze, Mark}, title = {Facebook, Twitter and Google Plus for Breaking News: Is there a winner?}, booktitle = {International Conference on Weblogs and Social Media (ICWSM)}, year = {2014}, abstract = {Twitter is widely seen as being the go to place for breaking news. Recently however, competing Social Media have begun to carry news. Here we examine how Facebook, Google Plus and Twitter report on breaking news. We consider coverage (whether news events are reported) and latency (the time when they are reported). Using data drawn from three weeks in December 2013, we identify 29 major news events, ranging from celebrity deaths, plague outbreaks to sports events. We find that all media carry the same major events, but Twitter continues to be the preferred medium for breaking news, almost consistently leading Facebook or Google Plus. Facebook and Google Plus largely repost newswire stories and their main research value is that they conveniently package multiple sources of information together.} }

Featherweight Phonetic Keyword Search for Conversational Speech
Keith Kintzley, Aren Jansen and Hynek Hermansky
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2014

[bib]

@inproceedings{kintzleyfeatherweight, author = {Keith Kintzley and Jansen, Aren and Hermansky, Hynek}, title = {Featherweight Phonetic Keyword Search for Conversational Speech}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2014} }

Unsupervised Idiolect Discovery for Speaker Recognition
Aren Jansen, Daniel Garcia-Romero, Pascal Clark and Jaime Hernandez-Cordero
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2014

[bib]

@inproceedings{jansenidiolect, author = {Jansen, Aren and Garcia-Romero, Daniel and Clark, Pascal and Jaime Hernandez-Cordero}, title = {Unsupervised Idiolect Discovery for Speaker Recognition}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2014} }

Bridging the Gap between Speech Technology and Natural Language Processing: An Evaluation Toolbox for Term Discovery Systems
Bogdan Ludusan, Maarten Versteegh, Aren Jansen, Guillaume Gravier, Xuan-Nga Cao, Mark Johnson and Emmanuel Dupoux
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC) – 2014

[bib]

@inproceedings{jansenlrec, author = {Bogdan Ludusan and Maarten Versteegh and Jansen, Aren and Guillaume Gravier and Xuan-Nga Cao and Mark Johnson and Emmanuel Dupoux}, title = {Bridging the Gap between Speech Technology and Natural Language Processing: An Evaluation Toolbox for Term Discovery Systems}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC)}, year = {2014} }

Could Behavioral Medicine Lead the Web Data Revolution?
John Ayers, Benjamin Althouse and Mark Dredze
Journal of the American Medical Association (JAMA) – 2014

[bib]

@article{Ayers:2014fk, author = {John Ayers and Benjamin Althouse and Dredze, Mark}, title = {Could Behavioral Medicine Lead the Web Data Revolution?}, year = {2014} }

Improving Lexical Embeddings with Semantic Knowledge
Mo Yu and Mark Dredze
Association for Computational Linguistics (ACL) – 2014

[abstract] [bib]

Abstract

Word embeddings learned on unlabeled data are a popular tool in semantics, but may not capture the desired semantics. We propose a new learning objective that incorporates both a neural language model objective and prior knowledge from semantic resources to learn improved lexical semantic embeddings. We demonstrate that our embeddings improve over those learned solely on raw text in three settings: language modeling, measuring semantic similarity, and predicting human judgements.
@inproceedings{Yu:2014, author = {Mo Yu and Dredze, Mark}, title = {Improving Lexical Embeddings with Semantic Knowledge}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2014}, abstract = {Word embeddings learned on unlabeled data are a popular tool in semantics, but may not capture the desired semantics. We propose a new learning objective that incorporates both a neural language model objective and prior knowledge from semantic resources to learn improved lexical semantic embeddings. We demonstrate that our embeddings improve over those learned solely on raw text in three settings: language modeling, measuring semantic similarity, and predicting human judgements.} }

What's the Healthiest Day? Circaseptan (Weekly) Rhythms in Healthy Considerations
John Ayers, Benjamin Althouse, Morgan Johnson, Mark Dredze and Joanna Cohen
American Journal of Preventive Medicine – 2014

[bib]

@article{Ayers:2014lr, author = {John Ayers and Benjamin Althouse and Morgan Johnson and Dredze, Mark and Joanna Cohen}, title = {What's the Healthiest Day? Circaseptan (Weekly) Rhythms in Healthy Considerations}, year = {2014} }

Biases in Predicting the Human Language Model
Alex Fine, Austin Frank, T. Jaeger and Benjamin Van Durme
Association for Computational Linguistics (ACL), Short Papers – 2014

[bib]

@inproceedings{FineFrankJaegerVanDurmeACL14, author = {Alex Fine and Austin Frank and T. Jaeger and Van Durme, Benjamin}, title = {Biases in Predicting the Human Language Model}, booktitle = {Association for Computational Linguistics (ACL), Short Papers}, year = {2014} }

I'm a Belieber: Social Roles via Self-identification and Conceptual Attributes
Charley Beller, Rebecca Knowles, Craig Harman, Shane Bergsma, Margaret Mitchell and Benjamin Van Durme
Association for Computational Linguistics (ACL), Short Papers – 2014

[bib]

@inproceedings{BellerKnowlesHarmanBergsmaMitchellVanDurmeACL14, author = {Charley Beller and Rebecca Knowles and Harman, Craig and Bergsma, Shane and Mitchell, Margaret and Van Durme, Benjamin}, title = {I'm a Belieber: Social Roles via Self-identification and Conceptual Attributes}, booktitle = {Association for Computational Linguistics (ACL), Short Papers}, year = {2014}, url = {http://aclweb.org/anthology/P14-2030} }

Freebase QA: Information Extraction or Semantic Parsing?
Xuchen Yao, Jonathan Berant and Benjamin Van Durme
Association for Computational Linguistics (ACL), Workshop on Semantic Parsing – 2014

[bib]

@inproceedings{YaoBerantVanDurmeACL14, author = {Xuchen Yao and Jonathan Berant and Van Durme, Benjamin}, title = {Freebase QA: Information Extraction or Semantic Parsing?}, booktitle = {Association for Computational Linguistics (ACL), Workshop on Semantic Parsing}, year = {2014} }

Wikipedia-based Corpus for Contextualized Machine Translation
Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar, Benjamin Van Durme and Matt Post
LREC – 2014

[bib]

@inproceedings{drexlerLREC14, author = {Jennifer Drexler and Pushpendre Rastogi and Aguilar, Jacqueline and Van Durme, Benjamin and Post, Matt}, title = {Wikipedia-based Corpus for Contextualized Machine Translation}, booktitle = {LREC}, year = {2014} }

Is the Stanford Dependency Representation Semantic?
Rachel Rudinger and Benjamin Van Durme
Association for Computational Linguistics (ACL), Workshop on EVENTS – 2014

[bib]

@inproceedings{RudingerVanDurmeACL14, author = {Rachel Rudinger and Van Durme, Benjamin}, title = {Is the Stanford Dependency Representation Semantic?}, booktitle = {Association for Computational Linguistics (ACL), Workshop on EVENTS}, year = {2014} }

Augmenting FrameNet Via PPDB
Pushpendre Rastogi and Benjamin Van Durme
Association for Computational Linguistics (ACL), Workshop on EVENTS – 2014

[bib]

@inproceedings{RastogiVanDurmeACL14, author = {Pushpendre Rastogi and Van Durme, Benjamin}, title = {Augmenting FrameNet Via PPDB}, booktitle = {Association for Computational Linguistics (ACL), Workshop on EVENTS}, year = {2014} }

Predicting Fine-grained Social Roles with Selectional Preferences
Charley Beller, Craig Harman and Benjamin Van Durme
Association for Computational Linguistics (ACL), Workshop on Language Technologies and Computational Social Science (LACSS) – 2014

[bib]

@inproceedings{BellerHarmanVanDurmeACL14, author = {Charley Beller and Harman, Craig and Van Durme, Benjamin}, title = {Predicting Fine-grained Social Roles with Selectional Preferences}, booktitle = {Association for Computational Linguistics (ACL), Workshop on Language Technologies and Computational Social Science (LACSS)}, year = {2014}, url = {https://www.aclweb.org/anthology/W/W14/W14-2515.pdf} }

A Wikipedia-based Corpus for Contextualized Machine Translation
Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar, Benjamin Van Durme and Matt Post
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC) – 2014

[bib]

@inproceedings{DrexlerRastogiAguilarVanDurmePostACL14, author = {Jennifer Drexler and Pushpendre Rastogi and Aguilar, Jacqueline and Van Durme, Benjamin and Post, Matt}, title = {A Wikipedia-based Corpus for Contextualized Machine Translation}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC)}, year = {2014} }

Information Extraction over Structured Data: Question Answering with Freebase
Xuchen Yao and Benjamin Van Durme
Association for Computational Linguistics (ACL) – 2014

[bib]

@inproceedings{YaoVanDurmeACL14, author = {Xuchen Yao and Van Durme, Benjamin}, title = {Information Extraction over Structured Data: Question Answering with Freebase}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2014} }

Inferring User Political Preferences from Streaming Communications
Svitlana Volkova, Glen Coppersmith and Benjamin Van Durme
Association for Computational Linguistics (ACL) – 2014

[bib]

@inproceedings{VolkovaCoppersmithVanDurmeACL14, author = {Volkova, Svitlana and Coppersmith, Glen and Van Durme, Benjamin}, title = {Inferring User Political Preferences from Streaming Communications}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2014} }

Particle Filter Rejuvenation and Latent Dirichlet Allocation
Chandler May, Alex Clemmer and Benjamin Van Durme
Association for Computational Linguistics (ACL), Short Papers – 2014

[pdf] | [bib]

@inproceedings{MayClemmerVanDurmeACL14, author = {Chandler May and Alex Clemmer and Van Durme, Benjamin}, title = {Particle Filter Rejuvenation and Latent Dirichlet Allocation}, booktitle = {Association for Computational Linguistics (ACL), Short Papers}, year = {2014} }

Exponential Reservoir Sampling for Streaming Language Models
Miles Osborne, Ashwin Lall and Benjamin Van Durme
Association for Computational Linguistics (ACL), Short Papers – 2014

[bib]

@inproceedings{OsborneLallVanDurmeACL14, author = {Miles Osborne and Ashwin Lall and Van Durme, Benjamin}, title = {Exponential Reservoir Sampling for Streaming Language Models}, booktitle = {Association for Computational Linguistics (ACL), Short Papers}, year = {2014} }

A long, deep and wide artificial neural net for robust speech recognition in unknown noise
Feipeng Li, Phani Sankar Nidadavolu and Hynek Hermansky
2014

[abstract] [bib]

Abstract

A long deep and wide artificial neural net (LDWNN) with multiple ensemble neural nets for individual frequency subbands is proposed for robust speech recognition in unknown noise. It is assumed that the effect of arbitrary additive noise on speech recognition can be approximated by white noise (or speech-shaped noise) of similar level across multiple frequency subbands. The ensemble neural nets are trained in clean and speech-shaped noise at 20, 10, and 5 dB SNR to accommodate noise of different levels, followed by a neural net trained to select the most suitable neural net for optimum information extraction within a frequency subband. The posteriors from multiple frequency subbands are fused by another neural net to give a more reliable estimation. Experimental results show that the subband ensemble net adapts well to unknown noise.
@{, author = {Feipeng Li and Phani Sankar Nidadavolu and Hermansky, Hynek}, title = {A long, deep and wide artificial neural net for robust speech recognition in unknown noise}, year = {2014}, publisher = {INTERSPEECH}, url = {http://www.researchgate.net/publication/261707505_A_long_deep_and_wide_artificial_neural_net_for_robust_speech_recognition_in_unknown_noise}, abstract = {A long deep and wide artificial neural net (LDWNN) with multiple ensemble neural nets for individual frequency subbands is proposed for robust speech recognition in unknown noise. It is assumed that the effect of arbitrary additive noise on speech recognition can be approximated by white noise (or speech-shaped noise) of similar level across multiple frequency subbands. The ensemble neural nets are trained in clean and speech-shaped noise at 20, 10, and 5 dB SNR to accommodate noise of different levels, followed by a neural net trained to select the most suitable neural net for optimum information extraction within a frequency subband. The posteriors from multiple frequency subbands are fused by another neural net to give a more reliable estimation. Experimental results show that the subband ensemble net adapts well to unknown noise.} }

The Machine Translation Leaderboard
Matt Post and Adam Lopez
The Prague Bulletin of Mathematical Linguistics – 2014

[bib]

@article{post2014machine, author = {Post, Matt and Lopez, Adam}, title = {The Machine Translation Leaderboard}, year = {2014}, pages = {37--46}, url = {http://cs.jhu.edu/~post/papers/post-lopez-2014-mt-leaderboard.pdf} }

Music Tonality Features for Speech/Music Discrimination
Greg Sell and Pascal Clark
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2014

[pdf] | [bib]

@inproceedings{Sell.Clark:2014A, author = {Sell, Greg and Clark, Pascal}, title = {Music Tonality Features for Speech/Music Discrimination}, booktitle = {Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2014} }

Automatic Carrier Pitch Estimation for Coherent Demodulation
Greg Sell
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) – 2014

[pdf] | [bib]

@inproceedings{Sell:2014A, author = {Sell, Greg}, title = {Automatic Carrier Pitch Estimation for Coherent Demodulation}, booktitle = {Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2014} }

Speaker Diarization with PLDA I-Vector Scoring and Unsupervised Calibration
Greg Sell and Daniel Garcia-Romero
Proceedings of the IEEE Spoken Language Technology Workshop – 2014

[pdf] | [bib]

@inproceedings{Sell.Garcia-Romero:2014A, author = {Sell, Greg and Garcia-Romero, Daniel}, title = {Speaker Diarization with PLDA I-Vector Scoring and Unsupervised Calibration}, booktitle = {Proceedings of the IEEE Spoken Language Technology Workshop}, year = {2014} }

Unsupervised Lexical Clustering of Speech Segments Using Fixed-Dimensional Acoustic Embeddings
Herman Kamper, Aren Jansen, Simon King and Sharon Goldwater
IEEE Workshop on Spoken Language Technology – 2014

[pdf] | [bib]

@inproceedings{kamperunsupervised, author = {Herman Kamper and Jansen, Aren and Simon King and Sharon Goldwater}, title = {Unsupervised Lexical Clustering of Speech Segments Using Fixed-Dimensional Acoustic Embeddings}, booktitle = {IEEE Workshop on Spoken Language Technology}, year = {2014} }

A Keyword Search System Using Open Source Software
Jan Trmal, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur, Pegah Ghahremani, Xiaohui Zhang, Vimal Manohar, Chunxi Liu, Aren Jansen and Dietrich Klakow
IEEE Workshop on Spoken Language Technology – 2014

[pdf] | [bib]

@inproceedings{trmalkeyword, author = {Jan Trmal and Guoguo Chen and Povey, Daniel and Khudanpur, Sanjeev and Pegah Ghahremani and Xiaohui Zhang and Vimal Manohar and Chunxi Liu and Jansen, Aren and Dietrich Klakow}, title = {A Keyword Search System Using Open Source Software}, booktitle = {IEEE Workshop on Spoken Language Technology}, year = {2014} }

Low-Resource Open Vocabulary Keyword Search Using Point Process Models
Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal and Sanjeev Khudanpur
Fifteenth Annual Conference of the International Speech Communication Association – 2014

[pdf] | [bib]

@inproceedings{liu2014low, author = {Chunxi Liu and Jansen, Aren and Guoguo Chen and Keith Kintzley and Jan Trmal and Khudanpur, Sanjeev}, title = {Low-Resource Open Vocabulary Keyword Search Using Point Process Models}, booktitle = {Fifteenth Annual Conference of the International Speech Communication Association}, year = {2014} }

Social Media Analytics for Smart Health
Ahmed Abbasi, Donald Adjeroh, Mark Dredze, Michael Paul, Fatemeh Zahedi, Huimin Zhao, Nitin Walia, Hemant Jain, Patrick Sanvanson, Reza Shaker, Marco Huesch, Richard Beal, Wanhong Zheng, Marie Abate and Arun Ross
IEEE Intelligent Systems – 2014

[bib]

@article{Dredze:2014lq, author = {Ahmed Abbasi and Donald Adjeroh and Dredze, Mark and Michael Paul and Fatemeh Zahedi and Huimin Zhao and Nitin Walia and Hemant Jain and Patrick Sanvanson and Reza Shaker and Marco Huesch and Richard Beal and Wanhong Zheng and Marie Abate and Arun Ross}, title = {Social Media Analytics for Smart Health}, year = {2014}, pages = {60--80} }

A Test Collection for Email Entity Linking
Ning Gao, Douglas Oard and Mark Dredze
NIPS Workshop on Automated Knowledge Base Construction – 2014

[bib]

@inproceedings{Gao:2014ty, author = {Ning Gao and Douglas Oard and Dredze, Mark}, title = {A Test Collection for Email Entity Linking}, booktitle = {NIPS Workshop on Automated Knowledge Base Construction}, year = {2014} }

Faster (and Better) Entity Linking with Cascades
Adrian Benton, Jay Deyoung, Adam Teichert, Mark Dredze, Benjamin Van Durme, Stephen Mayhew and Karen Daughton-Thomas
NIPS Workshop on Automated Knowledge Base Construction – 2014

[bib]

@inproceedings{Benton:2014qe, author = {Adrian Benton and Jay Deyoung and Adam Teichert and Dredze, Mark and Van Durme, Benjamin and Stephen Mayhew and Daughton-Thomas, Karen}, title = {Faster (and Better) Entity Linking with Cascades}, booktitle = {NIPS Workshop on Automated Knowledge Base Construction}, year = {2014} }

Factor-based Compositional Embedding Models
Mo Yu, Matt Gormley and Mark Dredze
NIPS Workshop on Learning Semantics – 2014

[pdf] | [bib]

@inproceedings{Mo-Yu:2014qv, author = {Mo Yu and Gormley, Matt and Dredze, Mark}, title = {Factor-based Compositional Embedding Models}, booktitle = {NIPS Workshop on Learning Semantics}, year = {2014} }

High Risk Pregnancy Prediction from Clinical Text
Rebecca Knowles, Mark Dredze, Kathleen Evans, Elyse Lasser, Tom Richards, Jonathan Weiner and Hadi Kharrazi
NIPS Workshop on Machine Learning for Clinical Data Analysis – 2014

[bib]

@inproceedings{Knowles:2014ly, author = {Rebecca Knowles and Dredze, Mark and Kathleen Evans and Elyse Lasser and Tom Richards and Jonathan Weiner and Hadi Kharrazi}, title = {High Risk Pregnancy Prediction from Clinical Text}, booktitle = {NIPS Workshop on Machine Learning for Clinical Data Analysis}, year = {2014} }

Twitter Improves Influenza Forecasting
Michael Paul, Mark Dredze and David Broniatowski
PLOS Currents Outbreaks – 2014

[abstract] [bib]

Abstract

Accurate disease forecasts are imperative when preparing for influenza epidemic outbreaks; nevertheless, these forecasts are often limited by the time required to collect new, accurate data. In this paper, we show that data from the microblogging community Twitter significantly improves influenza forecasting. Most prior influenza forecast models are tested against historical influenza-like illness (ILI) data from the U.S. Centers for Disease Control and Prevention (CDC). These data are released with a one-week lag and are often initially inaccurate until the CDC revises them weeks later. Since previous studies utilize the final, revised data in evaluation, their evaluations do not properly determine the effectiveness of forecasting. Our experiments using ILI data available at the time of the forecast show that models incorporating data derived from Twitter can reduce forecasting error by 17-30% over a baseline that only uses historical data. For a given level of accuracy, using Twitter data produces forecasts that are two to four weeks ahead of baseline models. Additionally, we find that models using Twitter data are, on average, better predictors of influenza prevalence than are models using data from Google Flu Trends, the leading web data source.
@article{Paul_Dredze_Broniatowski:2014, author = {Michael Paul and Dredze, Mark and David Broniatowski}, title = {Twitter Improves Influenza Forecasting}, year = {2014}, abstract = {Accurate disease forecasts are imperative when preparing for influenza epidemic outbreaks; nevertheless, these forecasts are often limited by the time required to collect new, accurate data. In this paper, we show that data from the microblogging community Twitter significantly improves influenza forecasting. Most prior influenza forecast models are tested against historical influenza-like illness (ILI) data from the U.S. Centers for Disease Control and Prevention (CDC). These data are released with a one-week lag and are often initially inaccurate until the CDC revises them weeks later. Since previous studies utilize the final, revised data in evaluation, their evaluations do not properly determine the effectiveness of forecasting. Our experiments using ILI data available at the time of the forecast show that models incorporating data derived from Twitter can reduce forecasting error by 17-30% over a baseline that only uses historical data. For a given level of accuracy, using Twitter data produces forecasts that are two to four weeks ahead of baseline models. Additionally, we find that models using Twitter data are, on average, better predictors of influenza prevalence than are models using data from Google Flu Trends, the leading web data source.} }

What Are Health-related Users Tweeting? A Qualitative Content Analysis of Health-related Users and their Messages on Twitter
Joy Lee, Matthew DeCamp, Mark Dredze, Margaret Chisolm and Zackary Berger
Journal of Medical Internet Research (JMIR) – 2014

[bib]

@article{Lee:2014ve, author = {Joy Lee and Matthew DeCamp and Dredze, Mark and Margaret Chisolm and Zackary Berger}, title = {What Are Health-related Users Tweeting? A Qualitative Content Analysis of Health-related Users and their Messages on Twitter}, year = {2014} }

Discovering Health Topics in Social Media Using Topic Models
Michael Paul and Mark Dredze
PLoS ONE – 2014

[abstract] [bib]

Abstract

By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), as well as our system for filtering general Twitter data based on health keywords and supervised classification. We show how ATAM and other topic models can automatically infer health topics in 144 million Twitter messages from 2011 to 2013. ATAM discovered 13 coherent clusters of Twitter messages, some of which correlate with seasonal influenza (r = 0.689) and allergies (r = 0.810) temporal surveillance data, as well as exercise (r = .534) and obesity (r = −.631) related geographic survey data in the United States. These results demonstrate that it is possible to automatically discover topics that attain statistically significant correlations with ground truth data, despite using minimal human supervision and no historical data to train the model, in contrast to prior work. Additionally, these results demonstrate that a single general-purpose model can identify many different health topics in social media.
@article{Paul:2014rt, author = {Michael Paul and Dredze, Mark}, title = {Discovering Health Topics in Social Media Using Topic Models}, year = {2014}, abstract = {By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), as well as our system for filtering general Twitter data based on health keywords and supervised classification. We show how ATAM and other topic models can automatically infer health topics in 144 million Twitter messages from 2011 to 2013. ATAM discovered 13 coherent clusters of Twitter messages, some of which correlate with seasonal influenza (r = 0.689) and allergies (r = 0.810) temporal surveillance data, as well as exercise (r = .534) and obesity (r = −.631) related geographic survey data in the United States. These results demonstrate that it is possible to automatically discover topics that attain statistically significant correlations with ground truth data, despite using minimal human supervision and no historical data to train the model, in contrast to prior work. Additionally, these results demonstrate that a single general-purpose model can identify many different health topics in social media.} }

Twitter: Big Data Opportunities (Letter)
David Broniatowski, Michael Paul and Mark Dredze
Science – 2014

[bib]

@article{Broniatowski:2014nr, author = {David Broniatowski and Michael Paul and Dredze, Mark}, title = {Twitter: Big Data Opportunities (Letter)}, year = {2014}, pages = {148} }

A Large-Scale Quantitative Analysis of Latent Factors and Sentiment in Online Doctor Reviews
Byron Wallace, Michael Paul, Urmimala Sarkar, Thomas Trikalinos and Mark Dredze
Journal of the American Medical Informatics Association (JAMIA) – 2014

[bib]

@article{Wallace:2014qd, author = {Byron Wallace and Michael Paul and Urmimala Sarkar and Thomas Trikalinos and Dredze, Mark}, title = {A Large-Scale Quantitative Analysis of Latent Factors and Sentiment in Online Doctor Reviews}, year = {2014} }

HealthTweets.org: A Platform for Public Health Surveillance using Twitter
Mark Dredze, Renyuan Cheng, Michael Paul and David Broniatowski
AAAI Workshop on the World Wide Web and Public Health Intelligence – 2014

[abstract] [bib]

Abstract

We present HealthTweets.org, a new platform for sharing the latest research results on Twitter data with researchers and public officials. In this demo paper, we describe data collection, processing, and features of the site. The goal of this service is to transition results from research to practice.
@inproceedings{Dredze:2014fk, author = {Dredze, Mark and Renyuan Cheng and Michael Paul and David Broniatowski}, title = {HealthTweets.org: A Platform for Public Health Surveillance using Twitter}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, year = {2014}, abstract = {We present HealthTweets.org, a new platform for sharing the latest research results on Twitter data with researchers and public officials. In this demo paper, we describe data collection, processing, and features of the site. The goal of this service is to transition results from research to practice.} }

Challenges in Influenza Forecasting and Opportunities for Social Media
Michael Paul, Mark Dredze and David Broniatowski
AAAI Workshop on the World Wide Web and Public Health Intelligence – 2014

[bib]

@inproceedings{paul_dredze_aaai:14, author = {Michael Paul and Dredze, Mark and David Broniatowski}, title = {Challenges in Influenza Forecasting and Opportunities for Social Media}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, year = {2014} }

Exploring Health Topics in Chinese Social Media: An Analysis of Sina Weibo
Shiliang Wang, Michael Paul and Mark Dredze
AAAI Workshop on the World Wide Web and Public Health Intelligence – 2014

[abstract] [bib]

Abstract

This paper seeks to identify and characterize health-related topics discussed on the Chinese microblogging website, Sina Weibo. We identified nearly 1 million messages containing health-related keywords, filtered from a dataset of 93 million messages spanning five years. We applied probabilistic topic models to this dataset and identified the prominent health topics. We show that a variety of health topics are discussed in Sina Weibo, and that four flu-related topics are correlated with monthly influenza case rates in China.
@inproceedings{Wang:2014fk, author = {Shiliang Wang and Michael Paul and Dredze, Mark}, title = {Exploring Health Topics in Chinese Social Media: An Analysis of Sina Weibo}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, year = {2014}, abstract = {This paper seeks to identify and characterize health-related topics discussed on the Chinese microblogging website, Sina Weibo. We identified nearly 1 million messages containing health-related keywords, filtered from a dataset of 93 million messages spanning five years. We applied probabilistic topic models to this dataset and identified the prominent health topics. We show that a variety of health topics are discussed in Sina Weibo, and that four flu-related topics are correlated with monthly influenza case rates in China.} }

Concretely Annotated Corpora
Francis Ferraro, Max Thomas, Matt Gormley, Travis Wolfe, Craig Harman and Benjamin Van Durme
4th Workshop on Automated Knowledge Base Construction (AKBC) – 2014

[pdf] | [bib]

@inproceedings{concretely-annotated-2014, author = {Francis Ferraro and Max Thomas and Gormley, Matt and Wolfe, Travis and Harman, Craig and Van Durme, Benjamin}, title = {Concretely Annotated Corpora}, booktitle = {4th Workshop on Automated Knowledge Base Construction (AKBC)}, year = {2014} }


2013 (75 total)

Perceptual Properties of Current Speech Recognition Technology
Hynek Hermansky, Jordan R. Cohen and Richard M. Stern
2013

[abstract] [pdf] | [bib]

Abstract

In recent years, a number of feature extraction procedures for automatic speech recognition (ASR) systems have been based on models of human auditory processing, and one often hears arguments in favor of implementing knowledge of human auditory perception and cognition into machines for ASR. This paper takes a reverse route, and argues that the engineering techniques for automatic recognition of speech that are already in widespread use are often consistent with some well-known properties of the human auditory system.
@{, author = {Hermansky, Hynek and Jordan R. Cohen and Richard M. Stern}, title = {Perceptual Properties of Current Speech Recognition Technology}, month = {September}, year = {2013}, publisher = {IEEE}, pages = {1968--1985}, url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6566018}, abstract = {In recent years, a number of feature extraction procedures for automatic speech recognition (ASR) systems have been based on models of human auditory processing, and one often hears arguments in favor of implementing knowledge of human auditory perception and cognition into machines for ASR. This paper takes a reverse route, and argues that the engineering techniques for automatic recognition of speech that are already in widespread use are often consistent with some well-known properties of the human auditory system.} }

Findings of the 2013 Workshop on Statistical Machine Translation
Ondrej Bojar, Christian Buck, Chris Callison-Burch, Christian Federmann, Barry Haddow, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut and Lucia Specia
Eighth Workshop on Statistical Machine Translation – 2013

[bib]

@inproceedings{bojar-EtAl:2013:WMT, author = {Ondrej Bojar and Christian Buck and Callison-Burch, Chris and Christian Federmann and Barry Haddow and Philipp Koehn and Christof Monz and Post, Matt and Radu Soricut and Lucia Specia}, title = {Findings of the 2013 Workshop on Statistical Machine Translation}, booktitle = {Eighth Workshop on Statistical Machine Translation}, month = {August}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, pages = {1--44}, url = {http://www.aclweb.org/anthology/W13-2201} }

Joshua 5.0: Sparser, Better, Faster, Server
Matt Post, Juri Ganitkevitch, Luke Orland, Jonathan Weese, Yuan Cao and Chris Callison-Burch
Eighth Workshop on Statistical Machine Translation – 2013

[bib]

@inproceedings{post-EtAl:2013:WMT, author = {Post, Matt and Juri Ganitkevitch and Luke Orland and Jonathan Weese and Yuan Cao and Callison-Burch, Chris}, title = {Joshua 5.0: Sparser, Better, Faster, Server}, booktitle = {Eighth Workshop on Statistical Machine Translation}, month = {August}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, pages = {206--212}, url = {http://www.aclweb.org/anthology/W13-2226} }

Combining Bilingual and Comparable Corpora for Low Resource Machine Translation
Ann Irvine and Chris Callison-Burch
Eighth Workshop on Statistical Machine Translation – 2013

[bib]

@inproceedings{irvine-callisonburch:2013:WMT, author = {Irvine, Ann and Callison-Burch, Chris}, title = {Combining Bilingual and Comparable Corpora for Low Resource Machine Translation}, booktitle = {Eighth Workshop on Statistical Machine Translation}, month = {August}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, pages = {262--270}, url = {http://www.aclweb.org/anthology/W13-2233} }

Multi-stream recognition of noisy speech with performance monitoring
Ehsan Variani, Feipeng Li and Hynek Hermansky
2013

[pdf] | [bib]

@{, author = {Ehsan Variani and Feipeng Li and Hermansky, Hynek}, title = {Multi-stream recognition of noisy speech with performance monitoring}, month = {August}, year = {2013}, pages = {2978--2981}, url = {http://www.isca-speech.org/archive/interspeech_2013/i13_2978.html} }

Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
Svitlana Volkova, Theresa Wilson and David Yarowsky
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) – 2013

[bib]

@inproceedings{volkova-EtAl:2013:ACL, author = {Volkova, Svitlana and Wilson, Theresa and Yarowsky, David}, title = {Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams}, booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)}, month = {August}, year = {2013}, address = {Sofia, Bulgaria}, publisher = {Association for Computational Linguistics}, pages = {505--510}, url = {http://www.aclweb.org/anthology/P13-2090} }

Developing a Speaker Identification System for the DARPA RATS Project
Oldrich Plchot, Spyros Matsoukas, Pavel Matejka, Najim Dehak, Jeff Ma, S. Cumani, O. Glembek, Hynek Hermansky, S.H. Mallidi, N. Mesgarani, R. Schwartz, M. Soufifar, Z.H. Tan, S. Thomas, B. Zhang and X. Zhou
Proceedings of ICASSP 2013 – 2013

[abstract] [pdf] | [bib]

Abstract

This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differing mainly in the algorithm used for voice activity detection (VAD) and feature extraction. We show that (a) unsupervised VAD performs as well as supervised methods in terms of downstream SID performance, (b) noise-robust feature extraction methods such as CFCCs out-perform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel id.
@{, author = {Oldrich Plchot and Spyros Matsoukas and Pavel Matejka and Najim Dehak and Jeff Ma and S. Cumani and O. Glembek and Hynek Hermansky and S.H. Mallidi and N. Mesgarani and R. Schwartz and M. Soufifar and Z.H. Tan and S. Thomas and B. Zhang and X. Zhou}, title = {Developing a Speaker Identification System for the DARPA RATS Project}, booktitle = {Proceedings of ICASSP 2013}, month = {May}, year = {2013}, address = {Vancouver, BC}, publisher = {IEEE}, pages = {6768--6772}, url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6638972&queryText%3DDeveloping+A+Speaker+Identification+System+For+The+DARPA+RATS+Project}, abstract = {This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differing mainly in the algorithm used for voice activity detection (VAD) and feature extraction. We show that (a) unsupervised VAD performs as well as supervised methods in terms of downstream SID performance, (b) noise-robust feature extraction methods such as CFCCs out-perform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel id.} }

Filter-Bank Optimization for Frequency Domain Linear Prediction
Vijayaditya Peddinti and Hynek Hermansky
ICASSP'13 – 2013

[abstract] [pdf] | [bib]

Abstract

The sub-band Frequency Domain Linear Prediction (FDLP) technique estimates autoregressive models of Hilbert envelopes of subband signals, from segments of discrete cosine transform (DCT) of a speech signal, using windows. Shapes of the windows and their positions on the cosine transform of the signal determine implied filtering of the signal. Thus, the choices of shape, position and number of these windows can be critical for the performance of the FDLP technique. So far, we have used Gaussian or rectangular windows. In this paper asymmetric cochlear-like filters are being studied. Further, a frequency differentiation operation, that introduces an additional set of parameters describing local spectral slope in each frequency sub-band, is introduced to increase the robustness of sub-band envelopes in noise. The performance gains achieved by these changes are reported in a variety of additive noise conditions, with an average relative improvement of 8.04% in phoneme recognition accuracy.
@inproceedings{, author = {Vijayaditya Peddinti and Hynek Hermansky}, title = {Filter-Bank Optimization for Frequency Domain Linear Prediction}, booktitle = {ICASSP'13}, month = {May}, year = {2013}, address = {Vancouver, BC}, publisher = {IEEE}, pages = {7102-7106}, url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6639040&queryText%3DFilter-Bank+Optimization+for+Frequency+Domain+Linear+Prediction}, abstract = {The sub-band Frequency Domain Linear Prediction (FDLP) technique estimates autoregressive models of Hilbert envelopes of subband signals, from segments of discrete cosine transform (DCT) of a speech signal, using windows. Shapes of the windows and their positions on the cosine transform of the signal determine implied filtering of the signal. Thus, the choices of shape, position and number of these windows can be critical for the performance of the FDLP technique. So far, we have used Gaussian or rectangular windows. In this paper asymmetric cochlear-like filters are being studied. Further, a frequency differentiation operation, that introduces an additional set of parameters describing local spectral slope in each frequency sub-band, is introduced to increase the robustness of sub-band envelopes in noise. The performance gains achieved by these changes are reported in a variety of additive noise conditions, with an average relative improvement of 8.04% in phoneme recognition accuracy.} }

Mean Temporal Distance: Predicting ASR Error from Temporal Properties of Speech Signal
Hynek Hermansky, Vijayaditya Peddinti and Ehsan Variani
2013

[abstract] [bib]

Abstract

Extending previous work on prediction of phoneme recognition error from unlabeled data that were corrupted by unpredictable factors, the current work investigates a simple but effective method of estimating ASR performance by computing a function M(Δt), which represents the mean distance between speech feature vectors evaluated over certain finite time interval, determined as a function of temporal distance Δt between the vectors. It is shown that M(Δt) is a function of signal-to-noise ratio of speech signal. Comparing M(Δt) curves, derived on data used for training of the classifier, and on test utterances, allows for predicting error on the test data. Another interesting observation is that M(Δt) remains approximately constant, as temporal separation Δt exceeds certain critical interval (about 200 ms), indicating the extent of coarticulation in speech sounds.
@{, author = {Hermansky, Hynek and Vijayaditya Peddinti and Ehsan Variani}, title = {Mean Temporal Distance: Predicting ASR Error from Temporal Properties of Speech Signal}, month = {May}, year = {2013}, address = {Vancouver, BC}, publisher = {IEEE}, abstract = {Extending previous work on prediction of phoneme recognition error from unlabeled data that were corrupted by unpredictable factors, the current work investigates a simple but effective method of estimating ASR performance by computing a function M(Δt), which represents the mean distance between speech feature vectors evaluated over certain finite time interval, determined as a function of temporal distance Δt between the vectors. It is shown that M(Δt) is a function of signal-to-noise ratio of speech signal. Comparing M(Δt) curves, derived on data used for training of the classifier, and on test utterances, allows for predicting error on the test data. Another interesting observation is that M(Δt) remains approximately constant, as temporal separation Δt exceeds certain critical interval (about 200 ms), indicating the extent of coarticulation in speech sounds.} }

Effect Of Filter Bandwidth and Spectral Sampling Rate of Analysis Filterbank on Automatic Phoneme Recognition
Feipeng Li and Hynek Hermansky
2013

[abstract] [pdf] | [bib]

Abstract

In this study we investigate the effect of filter bandwidth and spectral sampling rate of analysis filterbank for speech recognition. Two experiments are conducted to evaluate the performance of an automatic phoneme recognition system on clean speech and speech in noise as the filter bandwidth increases from 0.5 to 3.5 ERB and the spectral resolution changes from 1, 1.5, 2, 3, 4, to 6 samples per Bark. Results indicate that the optimum filter bandwidth varies for different speech sounds at different frequency ranges. A spectral sampling of 4 filters per Bark with the filter bandwidth being ≈ 1 ERB produces the best performance on average.
@{, author = {Feipeng Li and Hermansky, Hynek}, title = {Effect Of Filter Bandwidth and Spectral Sampling Rate of Analysis Filterbank on Automatic Phoneme Recognition}, month = {May}, year = {2013}, address = {Vancouver, BC}, publisher = {IEEE}, pages = {7121-7124}, url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6639044&queryText%3DEffect+Of+Filter+Bandwidth+and+Spectral+Sampling+Rate+of+Analysis+Filterbank+on+Automatic+Phoneme+Recognition}, abstract = {In this study we investigate the effect of filter bandwidth and spectral sampling rate of analysis filterbank for speech recognition. Two experiments are conducted to evaluate the performance of an automatic phoneme recognition system on clean speech and speech in noise as the filter bandwidth increases from 0.5 to 3.5 ERB and the spectral resolution changes from 1, 1.5, 2, 3, 4, to 6 samples per Bark. Results indicate that the optimum filter bandwidth varies for different speech sounds at different frequency ranges. A spectral sampling of 4 filters per Bark with the filter bandwidth being ≈ 1 ERB produces the best performance on average.} }

Deep Neural Network Features and Semi-Supervised Training for Low Resource Speech Recognition
Samuel Thomas, Michael Seltzer, Kenneth Church and Hynek Hermansky
2013

[abstract] [pdf] | [bib]

Abstract

We propose a new technique for training deep neural networks (DNNs) as data-driven feature front-ends for large vocabulary continuous speech recognition (LVCSR) in low resource settings. To circumvent the lack of sufficient training data for acoustic modeling in these scenarios, we use transcribed multilingual data and semi-supervised training to build the proposed feature front-ends. In our experiments, the proposed features provide an absolute improvement of 16% in a low-resource LVCSR setting with only one hour of in-domain training data. While close to three-fourths of these gains come from DNN-based features, the remaining are from semi-supervised training.
@{, author = {Samuel Thomas and Michael Seltzer and Kenneth Church and Hermansky, Hynek}, title = {Deep Neural Network Features and Semi-Supervised Training for Low Resource Speech Recognition}, month = {May}, year = {2013}, address = {Vancouver, BC}, publisher = {IEEE}, pages = {6704 - 6708}, url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6638959&queryText%3DDeep+Neural+Network+Features+and+Semi-Supervised+Training+for+Low+Resource+Speech+Recognition}, abstract = {We propose a new technique for training deep neural networks (DNNs) as data-driven feature front-ends for large vocabulary continuous speech recognition (LVCSR) in low resource settings. To circumvent the lack of sufficient training data for acoustic modeling in these scenarios, we use transcribed multilingual data and semi-supervised training to build the proposed feature front-ends. In our experiments, the proposed features provide an absolute improvement of 16% in a low-resource LVCSR setting with only one hour of in-domain training data. While close to three-fourths of these gains come from DNN-based features, the remaining are from semi-supervised training.} }

Dealing with Unknown Unknowns: Multi-stream Recognition of Speech
Hynek Hermansky
2013

[abstract] [pdf] | [bib]

Abstract

The paper discusses an approach for dealing with unexpected acoustic elements in speech. The approach is motivated by observations of human performance on such problems, which indicate the existence of multiple parallel processing streams in the human speech processing cognitive system, combined with the human ability to know when the correct information is being received. Some earlier relevant engineering approaches in multistream automatic recognition of speech (ASR) that aimed at processing of noisy speech and at dealing with unexpected out-of-vocabulary words are reviewed. The paper also reviews some currently active research in multistream ASR, focusing mainly on feedback-based techniques involving fusion of information between individual processing streams. The difference between the system behavior on its training data and during its operation is proposed as a substitute for the human ability of knowing when knowing. Most recent results indicate 9% relative improvement in error rates in phoneme recognition of high signal-to-noise ratio speech and as high as 30% relative improvements in moderate noise.
@{, author = {Hynek Hermansky}, title = {Dealing with Unknown Unknowns: Multi-stream Recognition of Speech}, month = {May}, year = {2013}, publisher = {IEEE}, pages = {1076 - 1088}, abstract = {The paper discusses an approach for dealing with unexpected acoustic elements in speech. The approach is motivated by observations of human performance on such problems, which indicate the existence of multiple parallel processing streams in the human speech processing cognitive system, combined with the human ability to know when the correct information is being received. Some earlier relevant engineering approaches in multistream automatic recognition of speech (ASR) that aimed at processing of noisy speech and at dealing with unexpected out-of-vocabulary words are reviewed. The paper also reviews some currently active research in multistream ASR, focusing mainly on feedback-based techniques involving fusion of information between individual processing streams. The difference between the system behavior on its training data and during its operation is proposed as a substitute for the human ability of knowing when knowing. Most recent results indicate 9% relative improvement in error rates in phoneme recognition of high signal-to-noise ratio speech and as high as 30% relative improvements in moderate noise.} }

Next Generation Storage for the HLTCOE
Scott Roberts
Technical Report 9, Human Language Technology Center of Excellence, Johns Hopkins University – 2013

[abstract] [pdf] | [bib]

Abstract

The explosion of unstructured data in high performance computing presents a challenge for existing storage architecture and design. We present a combination of hardware and software which addresses the storage needs of our center's compute cluster. We also demonstrate that at a constant total cost of ownership our proposed solution provides an order of magnitude better performance than Johns Hopkins University's GrayWulf cluster and is two orders of magnitude faster than the center's existing storage array.
@techreport{roberts_tech:2013, author = {Roberts, Scott}, title = {Next Generation Storage for the HLTCOE}, number = {9}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, month = {April}, year = {2013}, abstract = {The explosion of unstructured data in high performance computing presents a challenge for existing storage architecture and design. We present a combination of hardware and software which addresses the storage needs of our center's compute cluster. We also demonstrate that at a constant total cost of ownership our proposed solution provides an order of magnitude better performance than Johns Hopkins University's GrayWulf cluster and is two orders of magnitude faster than the center's existing storage array.} }

Improved Speech-to-Text Translation with the Fisher and Callhome Spanish--English Speech Translation Corpus
Matt Post, Gaurav Kumar, Adam Lopez, Damianos Karakos, Chris Callison-Burch and Sanjeev Khudanpur
Proceedings of the International Workshop on Spoken Language Translation (IWSLT) – 2013

[pdf] | [bib]

@inproceedings{post2013improved, author = {Post, Matt and Gaurav Kumar and Lopez, Adam and Karakos, Damianos and Callison-Burch, Chris and Khudanpur, Sanjeev}, title = {Improved Speech-to-Text Translation with the Fisher and Callhome Spanish--English Speech Translation Corpus}, booktitle = {Proceedings of the International Workshop on Spoken Language Translation (IWSLT)}, month = {December}, year = {2013}, address = {Heidelberg, Germany} }

HLTCOE Participation at TAC 2013
Paul McNamee, Tim Finin, Dawn Lawrie and James Mayfield
Proceedings of the Sixth Text Analysis Conference – 2013

[pdf] | [bib]

@inproceedings{HLTCOE_Participation_at_TAC_2013, author = {McNamee, Paul and Finin, Tim and Lawrie, Dawn and Mayfield, James}, title = {HLTCOE Participation at TAC 2013}, booktitle = {Proceedings of the Sixth Text Analysis Conference}, month = {November}, year = {2013}, publisher = {National Institute of Standards and Technology} }

Comparing and Evaluating Semantic Data Automatically Extracted from Text
Dawn Lawrie, Tim Finin, James Mayfield and Paul McNamee
AAAI 2013 Fall Symposium on Semantics for Big Data – 2013

[pdf] | [bib]

@inproceedings{Comparing_and_Evaluating_Semantic_Data_Automatically_Extracted_from_Text_, author = {Lawrie, Dawn and Finin, Tim and Mayfield, James and McNamee, Paul}, title = {Comparing and Evaluating Semantic Data Automatically Extracted from Text}, booktitle = {AAAI 2013 Fall Symposium on Semantics for Big Data}, month = {November}, year = {2013}, publisher = {AAAI Press} }

Beyond Bitext: Five open problems in machine translation
Adam Lopez and Matt Post
Twenty Years of Bitext – 2013

[pdf] | [bib]

@inproceedings{lopez2013beyond, author = {Lopez, Adam and Post, Matt}, title = {Beyond Bitext: Five open problems in machine translation}, booktitle = {Twenty Years of Bitext}, month = {October}, year = {2013}, address = {Seattle, Washington, USA} }

Answer Extraction as Sequence Tagging with Tree Edit Distance
Xuchen Yao, Benjamin Van Durme, Peter Clark and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Xuchen Yao and Van Durme, Benjamin and Peter Clark and Callison-Burch, Chris}, title = {Answer Extraction as Sequence Tagging with Tree Edit Distance}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1106.pdf} }

Generating Expressions that Refer to Visible Objects
Margaret Mitchell, Kees van Deemter and Ehud Reiter
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Mitchell, Margaret and Kees van Deemter and Ehud Reiter}, title = {Generating Expressions that Refer to Visible Objects}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.m-mitchell.com/papers/MitchellEtAl-13-VisObjects.pdf} }

Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs
Hua He, Jimmy Lin and Adam Lopez
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Hua He and Jimmy Lin and Lopez, Adam}, title = {Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1033.pdf} }

Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals
Ann Irvine and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Irvine, Ann and Callison-Burch, Chris}, title = {Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology//N/N13/N13-1056.pdf} }

PPDB: The Paraphrase Database
Juri Ganitkevitch, Benjamin Van Durme and Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Juri Ganitkevitch and Van Durme, Benjamin and Callison-Burch, Chris}, title = {PPDB: The Paraphrase Database}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1092.pdf} }

Improving the Quality of Minority Class Identification in Dialog Act Tagging
Adinoyi Omuya, Vinodkumar Prabhakaran and Owen Rambow
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{, author = {Adinoyi Omuya and Vinodkumar Prabhakaran and Owen Rambow}, title = {Improving the Quality of Minority Class Identification in Dialog Act Tagging}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.newdesign.aclweb.org/anthology-new/N/N13/N13-1099.pdf} }

Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter
Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson and David Yarowsky
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{bergsma:2013, author = {Bergsma, Shane and Dredze, Mark and Van Durme, Benjamin and Wilson, Theresa and Yarowsky, David}, title = {Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.cs.jhu.edu/~vandurme/papers/broadly-improving-user-classfication-via-communication-based-name-and-location-clustering-on-twitter.pdf} }

What's in a Domain? Multi-Domain Learning for Multi-Attribute Data
Mahesh Joshi, Mark Dredze, William Cohen and Carolyn P. Rose
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{joshi:2013, author = {Mahesh Joshi and Dredze, Mark and William Cohen and Carolyn P. Rose}, title = {What's in a Domain? Multi-Domain Learning for Multi-Attribute Data}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://aclweb.org/anthology/N/N13/N13-1080.pdf} }

Separating Fact from Fear: Tracking Flu Infections on Twitter
Alex Lamb, Michael Paul and Mark Dredze
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{lamb:2013, author = {Alex Lamb and Michael Paul and Dredze, Mark}, title = {Separating Fact from Fear: Tracking Flu Infections on Twitter}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.cs.jhu.edu/~mdredze/publications/naacl_2013_flu.pdf} }

Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models
Michael Paul and Mark Dredze
North American Chapter of the Association for Computational Linguistics (NAACL) – 2013

[bib]

@inproceedings{Paul:2013, author = {Michael Paul and Dredze, Mark}, title = {Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, year = {2013}, url = {http://www.cs.jhu.edu/~mdredze/publications/naacl_2013_drugs.pdf} }

Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation
Damianos Karakos, Mark Dredze and Sanjeev Khudanpur
Technical Report 8, Human Language Technology Center of Excellence, Johns Hopkins University – 2013

[abstract] [pdf] | [bib]

Abstract

Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model's parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.
@techreport{karakos_tech:2013, author = {Karakos, Damianos and Dredze, Mark and Khudanpur, Sanjeev}, title = {Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation}, number = {8}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, year = {2013}, abstract = {Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model's parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.} }

Sustained Firing of Model Central Auditory Neurons Yields a Discriminative Spectro-temporal Representation for Natural Sounds
Michael A. Carlin and Mounya Elhilali
PLoS Computational Biology – 2013

[bib]

@article{Carlin2013, author = {Carlin, Michael and Mounya Elhilali}, title = {Sustained Firing of Model Central Auditory Neurons Yields a Discriminative Spectro-temporal Representation for Natural Sounds}, year = {2013}, url = {http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002982} }

Nonconvex Global Optimization for Latent Variable Models
Matthew R. Gormley and Jason Eisner
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{gormley-eisner:2013, author = {Matthew R. Gormley and Eisner, Jason}, title = {Nonconvex Global Optimization for Latent Variable Models}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology//P/P13/P13-1044.pdf} }

Learning to translate with products of novices: a suite of open-ended challenge problems for teaching MT
Adam Lopez, Matt Post, Chris Callison-Burch, Jonathan Weese, Juri Ganitkevitch, Narges Ahmidi, Olivia Buzek, Leah Hanson, Beenish Jamil, Matthias Lee, Ya-Ting Lin, Henry Pao, Fatima Rivera, Leili Shahriyari, Debu Sinha, Adam Teichert, Stephen Wampler, Michael Weinberger, Daguang Xu, Lin Yang and Shang Zhao
Transactions of the Association for Computational Linguistics – 2013

[bib]

@article{Lopez+etal:2013:tacl:mt-class, author = {Lopez, Adam and Post, Matt and Callison-Burch, Chris and Jonathan Weese and Juri Ganitkevitch and Narges Ahmidi and Buzek, Olivia and Leah Hanson and Beenish Jamil and Matthias Lee and Ya-Ting Lin and Henry Pao and Fatima Rivera and Leili Shahriyari and Debu Sinha and Adam Teichert and Stephen Wampler and Michael Weinberger and Daguang Xu and Lin Yang and Shang Zhao}, title = {Learning to translate with products of novices: a suite of open-ended challenge problems for teaching MT}, year = {2013}, url = {https://aclweb.org/anthology/Q/Q13/Q13-1014.pdf} }

Dirt Cheap Web-Scale Parallel Text from the Common Crawl
Jason R Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch and Adam Lopez
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{Smith+etal:2013:acl, author = {Smith, Jason and Herve Saint-Amand and Magdalena Plamada and Philipp Koehn and Callison-Burch, Chris and Lopez, Adam}, title = {Dirt Cheap Web-Scale Parallel Text from the Common Crawl}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology/P/P13/P13-1135.pdf} }

KELVIN: a tool for automated knowledge base construction
Paul McNamee, James Mayfield, Tim Finin, Dawn Lawrie, Tan Xu and Douglas W. Oard
North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session (NAACL-HLT) – 2013

[bib]

@inproceedings{, author = {McNamee, Paul and Mayfield, James and Finin, Tim and Lawrie, Dawn and Tan Xu and Douglas W. Oard}, title = {KELVIN: a tool for automated knowledge base construction}, booktitle = {North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session (NAACL-HLT)}, year = {2013}, url = {http://aclweb.org/anthology//N/N13/N13-3008.pdf} }

Using Conceptual Class Attributes to Characterize Social Media Users
Shane Bergsma and Benjamin Van Durme
Association for Computational Linguistics (ACL) – 2013

[bib]

@inproceedings{, author = {Bergsma, Shane and Van Durme, Benjamin}, title = {Using Conceptual Class Attributes to Characterize Social Media Users}, booktitle = {Association for Computational Linguistics (ACL)}, year = {2013}, url = {http://aclweb.org/anthology//P/P13/P13-1070.pdf} }
