What’s hidden in your document cabinet – Clouds and Topics!!

Organizations – big and small – accumulate enormous amounts of text data in the form of emails, documents, customer feedback, and so on. Making sense of that text and teasing out actionable nuggets takes a great deal of manual effort and domain expertise.

Using Text Analytics or Natural Language Processing

Text analytics, or Natural Language Processing (NLP), tools can help analyze this data quickly and surface insights. Many open source tools are at a data scientist's disposal.

From simple word clouds to sophisticated topic models, these analytical tools are indispensable for an organization that depends on insights from text data for business efficiency.

Word Clouds and Topic Models

Let’s first look at word clouds. Word clouds are a simple way to get ideas across quickly to audiences with varied backgrounds. A picture is worth a thousand words – but a word cloud is worth a thousand words, PLUS we have a picture!

With word clouds and topic models, trends and patterns in text that are difficult to see in a table or a paragraph pop out clearly. They are especially useful for the following applications:

  • Understand your Customers:
    • What are customers saying about your services? You can quickly sort feedback that runs to many sentences into positive and negative clouds.
    • Similarly, identify the important topics of discussion and how you can address those to win customer loyalty.
  • Insights from survey data:
    • Thousands of lines of survey data can be quickly summarized, and the top terms and topics can be identified.
  • Search terms for your website:
    • A word cloud built from your website content can help identify keywords that could improve your SEO.

A typical word cloud displays the words used in text on a given topic, sized by how frequently they appear.
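The counting behind such a cloud can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the clouds in this post: the stop-word list, the `word_frequencies` helper, and the sample text are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real projects usually use a much longer one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of",
              "our", "the", "to", "we", "with"}


def word_frequencies(text, top_n=10):
    """Return the top_n (word, count) pairs, most frequent first."""
    # Lowercase the text and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up sample text, for illustration only.
sample = (
    "Clients love the dashboards. The dashboards are fast, "
    "and clients love the fast support. Fast answers!"
)

for word, count in word_frequencies(sample, top_n=5):
    print(f"{word:<12}{count}")
```

Pairs like these are the kind of input a word cloud renderer typically uses to decide how large to draw each word.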

Here, I used text that describes Syntelli Solutions. From the cloud, you can see that Syntelli is a wonderful place to work, that clients love our products, and that we use Python, Microsoft Azure, and Hadoop for big data, among other things. A more fun option, however, is to generate the cloud in the shape of a logo of interest, which speaks even better, as below:


To complete the story, one can combine several kinds of word clouds, including LetterCloud and image clouds, in tandem to make a word cloud sentence!

R code for such an implementation:

And the result is:

Using Natural Language Processing to Optimize Text for Topics

We can extend these NLP techniques to a whole corpus of text: find the optimal number of topics hidden in the text and identify what those topics are. For example, we worked with a client who had customer complaints data and wanted to efficiently identify the topics customers were calling about. Doing so would reduce the time and cost of servicing those customers.


Using techniques like LDA (Latent Dirichlet Allocation) and open source methods, we have implemented a topic modelling solution for the text data. For example, let’s take the service calls coming to a TV manufacturer.

A simple visual tool can help them identify the topics and the terms associated with each topic. This information can then be used to route customer complaints efficiently, which in turn can yield large savings in tech time and call time.
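To give a feel for what LDA does under the hood, here is a rough sketch of a collapsed Gibbs sampler in plain Python. It is not the solution we implemented for the client: the `lda_gibbs` function, its hyperparameter defaults, and the handful of service-call snippets are all invented for illustration, and in practice one would more likely reach for an open source library such as gensim or scikit-learn.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (top words per topic, per-document topic counts).
    alpha, beta, and iterations are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # words in doc d on topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is on topic k
    topic_total = [0] * num_topics                       # words on topic k overall
    assignments = []

    # Start by giving every word occurrence a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the word out of its current topic...
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...and resample a topic with probability proportional to
                # (doc-topic count + alpha) * (topic-word count + beta)
                #     / (topic total + vocab_size * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    top_words = [
        [w for w, c in counts.most_common(3) if c > 0] for counts in topic_word
    ]
    return top_words, doc_topic


# Made-up TV service-call snippets, already reduced to keywords.
calls = [
    "screen flicker screen lines picture",
    "remote battery remote pairing buttons",
    "picture flicker screen dark",
    "remote buttons pairing stuck",
    "screen dark picture lines",
    "battery remote buttons",
]
docs = [call.split() for call in calls]

topics, doc_topic = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(topics):
    print(f"Topic {k}: {', '.join(words)}")
for call, counts in zip(calls, doc_topic):
    # Route each call to the topic that claims most of its words.
    print(f"{call!r} -> topic {counts.index(max(counts))}")
```

The number of topics is fixed at two here purely to keep the example small; choosing it well is a separate step, and on a corpus this tiny the topics the sampler finds are only suggestive.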
