Module topicnet.viewers
Viewers
The viewers module
extracts information from a topic model
to help estimate the model quality. Its advantage is a unified call
infrastructure for the topic model, which makes the routine and tedious task of
extracting this information easy.
Currently the module contains the following viewers:
base_viewer
(BaseViewer
)
Module responsible for base infrastructure.
document_cluster
(DocumentClusterViewer
)
Module for visualizing the documents of a collection. It may be slow for large document collections, as it uses the t-SNE algorithm from the sklearn library.
<img src="../docs/images/doc_cluster__plot.png" width="80%" alt/>
</div>
<em>
Visualisation of reduced document embeddings, colored according to their topic, made by DocumentClusterViewer.
</em>
spectrum
(TopicSpectrumViewer
)
Module containing heuristics for solving the travelling salesman problem (TSP), used to arrange topics so that the total distance along the spectrum is minimized.
<img src="../docs/images/topic_spectrum__refined_view.png" width="80%" alt/>
</div>
<em>
Each point on the plot represents some topic.
The viewer computes a route between topics in which each topic is connected to a similar one, and so on, forming a circle.
</em>
top_documents_viewer
(TopDocumentsViewer
)
Module with functions for working with the document collection of a dataset.
<img src="../docs/images/top_doc__view.png" width="80%" alt/>
</div>
<em>
The viewer shows fragments of top documents corresponding to some topic.
</em>
top_similar_documents_viewer
(TopSimilarDocumentsViewer
)
Module containing a class for finding documents similar to a given one. This viewer helps to estimate the homogeneity of the clusters given by the model.
<img src="../docs/images/top_sim_doc__refined_view.png" width="80%" alt/>
</div>
<em>
A document from the text collection (on top) and the documents nearest to it according to the topic model.
The viewer currently gives only document names as output, but a picture like this one is not very difficult to make.
</em>
top_tokens_viewer
(TopTokensViewer
)
Module with a class for displaying the most relevant tokens in each topic of the model.
<img src="../docs/images/top_tokens__view.png" width="80%" alt/>
</div>
<em>
Output of the TopTokensViewer. A score is calculated for every token in the topic; the score function can be specified when the viewer is initialized.
</em>
topic_mapping
(TopicMapViewer
)
Module for comparing topics between two different models trained on the same collection.
<img src="../docs/images/topic_map__view.png" width="80%" alt/>
</div>
<em>
The mapping between topics of two models (currently only topic names are displayed).
</em>
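To illustrate the unified call pattern and the idea of a score function chosen at initialization (as described for TopTokensViewer), here is a minimal, self-contained sketch. It does not use TopicNet and is not the library's implementation: the class `SketchTopTokensViewer`, the functions `probability_score` and `lift_score`, and the toy `PHI` table are all hypothetical names and made-up data, introduced only for this example.

```python
from typing import Callable, Dict, List

# Hypothetical toy data: p(token | topic) for two topics (values are made up).
PHI: Dict[str, Dict[str, float]] = {
    "topic_0": {"the": 0.5, "cat": 0.3, "dog": 0.2},
    "topic_1": {"the": 0.5, "stock": 0.3, "market": 0.2},
}

ScoreFunction = Callable[[Dict[str, Dict[str, float]], str, str], float]


def probability_score(phi, topic, token) -> float:
    """Score a token by its probability in the topic."""
    return phi[topic][token]


def lift_score(phi, topic, token) -> float:
    """Score a token by its probability relative to its mean over all topics."""
    mean = sum(dist.get(token, 0.0) for dist in phi.values()) / len(phi)
    return phi[topic][token] / mean


class SketchTopTokensViewer:
    """Hypothetical viewer: configured once, then queried through view()."""

    def __init__(self, phi, score: ScoreFunction = probability_score,
                 num_top_tokens: int = 2):
        self._phi = phi
        self._score = score
        self._num_top_tokens = num_top_tokens

    def view(self) -> Dict[str, List[str]]:
        """Return the highest-scoring tokens for every topic."""
        result = {}
        for topic, dist in self._phi.items():
            ranked = sorted(
                dist,
                key=lambda token: self._score(self._phi, topic, token),
                reverse=True,
            )
            result[topic] = ranked[: self._num_top_tokens]
        return result


if __name__ == "__main__":
    print(SketchTopTokensViewer(PHI).view())
    print(SketchTopTokensViewer(PHI, score=lift_score).view())
```

With the plain probability score, the common word "the" ranks first in both toy topics; with the lift-style score it drops out of the top two, because it is equally probable in every topic. This is the kind of difference a configurable score function is meant to expose.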
Deprecated
initial_doc_to_topic_viewer
- first edition of TopDocumentsViewer
tokens_viewer
- first edition of TopTokensViewer
from .base_viewer import BaseViewer
from .document_cluster import DocumentClusterViewer
from .spectrum import TopicSpectrumViewer
from .top_documents_viewer import TopDocumentsViewer
from .top_similar_documents_viewer import TopSimilarDocumentsViewer
from .top_tokens_viewer import TopTokensViewer
from .topic_mapping import TopicMapViewer
Sub-modules
topicnet.viewers.base_viewer
topicnet.viewers.document_cluster
topicnet.viewers.initial_doc_to_topic_viewer
topicnet.viewers.spectrum
A few ways to obtain a "decent" solution to the TSP, which in our case returns a spectrum of topics.
If speed is of the essence, I recommend using …
topicnet.viewers.top_documents_viewer
topicnet.viewers.top_similar_documents_viewer
topicnet.viewers.top_tokens_viewer
topicnet.viewers.topic_flow_viewer
topicnet.viewers.topic_mapping
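The description of topicnet.viewers.spectrum mentions heuristics that give a "decent" TSP solution for ordering topics. As a rough illustration of what such a heuristic can look like, the sketch below uses the generic nearest-neighbour rule: start at one topic and repeatedly move to the closest topic not yet visited. This is an assumption made for illustration and not necessarily one of the methods implemented in the sub-module; the function names and the distance matrix are hypothetical.

```python
from typing import List, Sequence


def nearest_neighbour_order(distances: Sequence[Sequence[float]],
                            start: int = 0) -> List[int]:
    """Greedy TSP heuristic: always move to the closest unvisited topic."""
    order = [start]
    unvisited = set(range(len(distances))) - {start}
    while unvisited:
        current = order[-1]
        nearest = min(unvisited, key=lambda j: distances[current][j])
        order.append(nearest)
        unvisited.remove(nearest)
    return order


def tour_length(distances: Sequence[Sequence[float]],
                order: List[int]) -> float:
    """Total length of the closed route (it returns to the first topic)."""
    return sum(distances[a][b] for a, b in zip(order, order[1:] + order[:1]))


# Hypothetical symmetric distances between four topics (values are made up).
DISTANCES = [
    [0.0, 0.9, 0.2, 0.7],
    [0.9, 0.0, 0.8, 0.1],
    [0.2, 0.8, 0.0, 0.6],
    [0.7, 0.1, 0.6, 0.0],
]

if __name__ == "__main__":
    spectrum = nearest_neighbour_order(DISTANCES)
    print(spectrum, tour_length(DISTANCES, spectrum))
```

The greedy rule is fast but gives no optimality guarantee. On this toy matrix it visits the topics in the order 0, 2, 3, 1, which is shorter than the original order 0, 1, 2, 3.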