Module topicnet.viewers

Viewers

The viewers module extracts information from a topic model that helps to estimate the model's quality. Its advantage is a unified infrastructure for querying the topic model, which makes the routine and tedious task of extracting this information easy.

Currently, the module contains the following viewers:

base_viewer (BaseViewer)

Module providing the base infrastructure shared by all viewers.
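Here is a minimal sketch of what a unified call infrastructure can look like: a base class stores the model and exposes a single `view()` entry point that every concrete viewer implements. The class names `SketchBaseViewer` and `TopicCountViewer` are hypothetical. TopicNet's actual `BaseViewer` may define a different interface.

```python
from abc import ABC, abstractmethod


class SketchBaseViewer(ABC):
    """Hypothetical base class: keeps a reference to the model, defines one entry point."""

    def __init__(self, model):
        self._model = model

    @property
    def model(self):
        return self._model

    @abstractmethod
    def view(self, *args, **kwargs):
        """Extract information from the model; each subclass implements this."""


class TopicCountViewer(SketchBaseViewer):
    """Toy subclass: here the "model" is just a dict with a list of topic names."""

    def view(self):
        return len(self.model["topics"])


print(TopicCountViewer({"topics": ["sports", "politics"]}).view())  # 2
```

Because every viewer is called the same way, code that inspects a model does not need to know the details of each extraction routine.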

document_cluster (DocumentClusterViewer)

Module for visualizing the documents of a collection. It may be slow for large document collections because it uses the t-SNE algorithm from the sklearn library.

<div>
    <img src="../docs/images/doc_cluster__plot.png" width="80%" alt/>
</div>
<em>
    Visualisation of reduced document embeddings, colored according to the documents' topics, made by DocumentClusterViewer.
</em>
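The following sketch shows how such a picture can be produced with sklearn's `TSNE`. It assumes the model provides a topics-by-documents matrix (column j holds p(t | d_j)) and colors each document by its most probable topic. `embed_documents` is a hypothetical helper, not DocumentClusterViewer's actual code, and the data is randomly generated for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_documents(theta, perplexity=10.0, random_state=0):
    """Project documents to 2D with t-SNE and label each by its dominant topic.

    theta: array of shape (n_topics, n_docs); column j holds p(t | d_j).
    """
    doc_vectors = theta.T  # one row per document
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="random", random_state=random_state)
    coords = tsne.fit_transform(doc_vectors)  # shape (n_docs, 2)
    labels = theta.argmax(axis=0)  # dominant topic per document, used as the color
    return coords, labels


# Hypothetical data: 60 documents drawn from a sparse Dirichlet over 4 topics.
rng = np.random.default_rng(0)
theta = rng.dirichlet(np.full(4, 0.3), size=60).T
coords, labels = embed_documents(theta)
# With matplotlib installed: plt.scatter(coords[:, 0], coords[:, 1], c=labels)
```

t-SNE's running time grows quickly with the number of documents, which is why this kind of visualization becomes slow on large collections. Note that `perplexity` must be smaller than the number of documents.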

spectrum (TopicSpectrumViewer)

Module containing heuristics for the travelling salesman problem (TSP), used to order topics so that the total distance along the spectrum is as small as possible.

<div>
    <img src="../docs/images/topic_spectrum__refined_view.png" width="80%" alt/>
</div>
<em>
    Each point on the plot represents a topic.
    The viewer computes a route through the topics in which each topic is connected to a similar one, and so on, forming a closed loop.
</em>
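Here is a minimal sketch of one classic TSP heuristic pair: a greedy nearest-neighbor tour followed by 2-opt refinement over a topic distance matrix. Using the Hellinger distance between topic-word distributions is an assumption made for illustration. TopicSpectrumViewer may use other heuristics and distances, and all function names here are hypothetical.

```python
import numpy as np


def hellinger_distances(topics):
    """Pairwise Hellinger distances; topics has shape (n_topics, n_tokens)."""
    root = np.sqrt(topics)
    diff = root[:, None, :] - root[None, :, :]
    return np.sqrt(0.5 * (diff ** 2).sum(axis=-1))


def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbor_tour(dist, start=0):
    """Greedy construction: always move to the closest unvisited topic."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[last, j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(dist, tour):
    """Local search: reverse a segment whenever that shortens the closed tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # both edges touch tour[0]; reversal only flips direction
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Replace edges (a, b) and (c, d) with (a, c) and (b, d).
                delta = dist[a, c] + dist[b, d] - dist[a, b] - dist[c, d]
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


# Hypothetical data: 8 topics over a 50-token vocabulary.
rng = np.random.default_rng(1)
topics = rng.dirichlet(np.ones(50), size=8)
dist = hellinger_distances(topics)
spectrum = two_opt(dist, nearest_neighbor_tour(dist))
```

The greedy step is fast but can leave long "return" edges. 2-opt removes crossings until no single segment reversal helps, which gives a good, though not necessarily optimal, ordering.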

top_documents_viewer (TopDocumentsViewer)

Module with functions that work with the document collection of a dataset, in particular for finding the top documents of each topic.

<div>
    <img src="../docs/images/top_doc__view.png" width="80%" alt/>
</div>
<em>
    The viewer shows fragments of the top documents for a given topic.
</em>
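Here is a minimal sketch of how top documents can be ranked. It assumes a topics-by-documents matrix where column j holds p(t | d_j) and simply sorts one topic's row in descending order. `top_documents` is a hypothetical helper, not TopDocumentsViewer's API.

```python
import numpy as np


def top_documents(theta, doc_ids, topic, k=3):
    """Return the k documents with the highest p(topic | d).

    theta: array of shape (n_topics, n_docs); column j holds p(t | d_j).
    """
    scores = theta[topic]
    order = np.argsort(-scores, kind="stable")[:k]  # descending; ties keep input order
    return [(doc_ids[j], float(scores[j])) for j in order]


theta = np.array([
    [0.9, 0.1, 0.5, 0.7],  # topic 0
    [0.1, 0.9, 0.5, 0.3],  # topic 1
])
print(top_documents(theta, ["d0", "d1", "d2", "d3"], topic=0, k=2))
# [('d0', 0.9), ('d3', 0.7)]
```

A viewer would then print a fragment of each returned document's text next to its score.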

top_similar_documents_viewer (TopSimilarDocumentsViewer)

Module containing a class for finding documents similar to a given one. This viewer helps to estimate the homogeneity of the clusters produced by the model.

<div>
    <img src="../docs/images/top_sim_doc__refined_view.png" width="80%" alt/>
</div>
<em>
    A document from the text collection (on top) and the documents nearest to it according to the topic model.
    Currently, the viewer outputs only document names, but a picture like this one is not difficult to produce.
</em>
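Here is a minimal sketch of the nearest-document search. It compares documents by the cosine similarity of their topic vectors and, like the viewer, returns only document names. Cosine similarity is an assumed choice (other distances between topic distributions work the same way), and `top_similar_documents` is a hypothetical helper.

```python
import numpy as np


def top_similar_documents(theta, doc_ids, query, k=2):
    """Names of the k documents whose topic vectors are closest to doc `query`.

    theta: array of shape (n_topics, n_docs); columns are non-zero topic distributions.
    """
    vecs = theta.T  # one row per document
    norms = np.linalg.norm(vecs, axis=1)
    sims = vecs @ vecs[query] / (norms * norms[query])  # cosine similarity to the query
    sims[query] = -np.inf  # never return the query document itself
    order = np.argsort(-sims, kind="stable")[:k]
    return [doc_ids[j] for j in order]


theta = np.array([
    [0.9, 0.1, 0.8, 0.2],  # topic 0
    [0.1, 0.9, 0.2, 0.8],  # topic 1
])
print(top_similar_documents(theta, ["d0", "d1", "d2", "d3"], query=0))
# ['d2', 'd3']
```

If the neighbors of most documents share their dominant topic, the clusters induced by the model are fairly homogeneous.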

top_tokens_viewer (TopTokensViewer)

Module with a class for displaying the most relevant tokens in each topic of the model.

<div>
    <img src="../docs/images/top_tokens__view.png" width="80%" alt/>
</div>
<em>
    Output of TopTokensViewer. Every token receives a score within the topic; the score function can be specified when the viewer is initialized.
</em>
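The design described in the caption, a score function chosen at initialization, can be sketched with a pluggable `score_fn`. `SimpleTopTokens`, `probability_score` and `lift_score` are hypothetical names. The two scores (raw p(w | t), and p(w | t) divided by the token's mean probability over topics) are illustrative and not necessarily the ones TopicNet offers. The sketch assumes a tokens-by-topics matrix.

```python
import numpy as np


def probability_score(phi):
    """phi: (n_tokens, n_topics), column t holds p(w | t); score is the probability itself."""
    return phi


def lift_score(phi):
    """p(w | t) divided by the token's mean probability across topics."""
    return phi / phi.mean(axis=1, keepdims=True)


class SimpleTopTokens:
    """Hypothetical viewer: the scoring rule is fixed when the object is created."""

    def __init__(self, phi, tokens, score_fn=probability_score, num_top=3):
        self.phi = phi
        self.tokens = tokens
        self.score_fn = score_fn
        self.num_top = num_top

    def view(self):
        scores = self.score_fn(self.phi)
        result = {}
        for t in range(scores.shape[1]):
            order = np.argsort(-scores[:, t], kind="stable")[: self.num_top]
            result[t] = [self.tokens[i] for i in order]
        return result


tokens = ["the", "cat", "dog", "mat"]
phi = np.array([[0.40, 0.40], [0.30, 0.05], [0.05, 0.30], [0.25, 0.25]])
print(SimpleTopTokens(phi, tokens, num_top=1).view())              # {0: ['the'], 1: ['the']}
print(SimpleTopTokens(phi, tokens, lift_score, num_top=1).view())  # {0: ['cat'], 1: ['dog']}
```

The example shows why the choice matters: a frequent token such as "the" tops every topic under raw probability, while a lift-style score surfaces the tokens that distinguish each topic.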

topic_mapping (TopicMapViewer)

Module for comparing the topics of two different models trained on the same collection.

<div>
    <img src="../docs/images/topic_map__view.png" width="80%" alt/>
</div>
<em>
    The mapping between topics of two models (currently only topic names are displayed).
</em>
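Here is a minimal sketch of one way to build such a mapping. It computes Hellinger distances between the topics of two models over a shared vocabulary and then finds the one-to-one matching with the smallest total distance using SciPy's `linear_sum_assignment`. The Hellinger distance and the optimal-assignment step are assumptions chosen for illustration, not necessarily what TopicMapViewer does, and `map_topics` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hellinger_matrix(topics_a, topics_b):
    """Distances between topics of two models; each array is (n_topics, n_tokens)."""
    ra, rb = np.sqrt(topics_a), np.sqrt(topics_b)
    return np.sqrt(0.5 * ((ra[:, None, :] - rb[None, :, :]) ** 2).sum(axis=-1))


def map_topics(topics_a, topics_b):
    """One-to-one mapping {topic of A: topic of B} minimizing the total distance."""
    cost = hellinger_matrix(topics_a, topics_b)
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows.tolist(), cols.tolist()))


topics_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
topics_b = topics_a[[2, 0, 1]]  # same topics, different order
print(map_topics(topics_a, topics_b))  # {0: 1, 1: 2, 2: 0}
```

If the models have different numbers of topics, `linear_sum_assignment` still works on the rectangular cost matrix but matches only as many topics as the smaller model has.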

Deprecated

  • initial_doc_to_topic_viewer - first edition of TopDocumentsViewer

  • tokens_viewer - first edition of TopTokensViewer

Source code
from .base_viewer import BaseViewer
from .document_cluster import DocumentClusterViewer
from .spectrum import TopicSpectrumViewer
from .top_documents_viewer import TopDocumentsViewer
from .top_similar_documents_viewer import TopSimilarDocumentsViewer
from .top_tokens_viewer import TopTokensViewer
from .topic_mapping import TopicMapViewer

Sub-modules

topicnet.viewers.base_viewer
topicnet.viewers.document_cluster
topicnet.viewers.initial_doc_to_topic_viewer
topicnet.viewers.spectrum

A few ways to obtain a "decent" solution to the TSP, which in our case returns a spectrum of topics.
If speed is of the essence, I recommend using …

topicnet.viewers.top_documents_viewer
topicnet.viewers.top_similar_documents_viewer
topicnet.viewers.top_tokens_viewer
topicnet.viewers.topic_flow_viewer
topicnet.viewers.topic_mapping