Topical Coherence in LDA-based Models through Induced Segmentation

Authors

Hesam Amoualian, Wei Lu, Eric Gaussier, Massih R. Amini, Georgios Balikas, and Marianne Clausel

Publisher

In Proceedings of ACL 2017 (the 55th Annual Meeting of the Association for Computational Linguistics)

Abstract

This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula that binds the topics associated with the words of a segment. In addition, the model relies on both document-specific and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six different publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information (which captures the coherence between the generated topics), and the Micro F1 measure for text classification.

Bibtex

@InProceedings{amoualian-EtAl:2017:Long,
author    = {Amoualian, Hesam  and  Lu, Wei  and  Gaussier, Eric  and  Balikas, Georgios  and  
Amini, Massih R  and  Clausel, Marianne},
title     = {Topical Coherence in LDA-based Models through Induced Segmentation},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational 
Linguistics (Volume 1: Long Papers)},
month     = {July},
year      = {2017},
address   = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages     = {1799--1809},
abstract  = {This paper presents an LDA-based model that generates topically coherent
segments within documents by jointly segmenting documents and assigning topics
to their words. The coherence between topics is ensured through a copula,
binding the topics associated to the words of a segment. In addition, this
model relies on both document and segment specific topic distributions so as to
capture fine grained differences in topic assignments. We show that the
proposed model naturally encompasses other state-of-the-art LDA-based models
designed for similar tasks. Furthermore, our experiments, conducted on six
different publicly available datasets, show the effectiveness of our model in
terms of perplexity, Normalized Pointwise Mutual Information, which captures
the coherence between the generated topics, and the Micro F1 measure for text
classification.},
url       = {http://aclweb.org/anthology/P17-1165}
}

Code

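As a rough illustration of the generative process described in the abstract, the sketch below draws a document-level topic distribution, derives a segment-specific distribution for each segment, and binds the topic assignments of a segment's words through a copula so that words in the same segment tend to share topics. The Gaussian copula, the fixed segment lengths, and all names and hyper-parameter values (generate_document, rho, K, and so on) are illustrative assumptions for this sketch; they are not taken from the paper or its released code.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative (hypothetical) sizes and priors -- not the paper's settings.
K = 5            # number of topics
V = 50           # vocabulary size
alpha = 0.1      # document-level Dirichlet prior
beta = 0.01      # topic-word Dirichlet prior
rho = 0.8        # copula dependence: higher rho -> more topical coherence per segment

phi = rng.dirichlet([beta] * V, size=K)        # topic-word distributions

def generate_document(n_segments=4, seg_len=6):
    """Sketch of a copula-bound, segment-aware LDA generative process."""
    theta_doc = rng.dirichlet([alpha] * K)     # document-specific topic mix
    words, topics = [], []
    for _ in range(n_segments):
        # Segment-specific topic mix, perturbing the document-level one.
        theta_seg = rng.dirichlet(alpha + theta_doc * K)
        # Gaussian copula: correlated uniforms drive the topic draws,
        # so the words of a segment tend to receive the same topic.
        cov = np.full((seg_len, seg_len), rho) + (1 - rho) * np.eye(seg_len)
        u = norm.cdf(rng.multivariate_normal(np.zeros(seg_len), cov))
        cum = np.cumsum(theta_seg)
        for ui in u:
            z = int(np.searchsorted(cum, ui))  # inverse-CDF topic assignment
            topics.append(z)
            words.append(int(rng.choice(V, p=phi[z])))
    return words, topics

words, topics = generate_document()
print("topics per word:", topics)

Setting rho close to 0 makes the topic draws within a segment nearly independent, which recovers standard LDA-style assignments; increasing rho strengthens the within-segment coherence that the model is designed to capture.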