{"id":41350,"date":"2020-08-10T15:03:00","date_gmt":"2020-08-10T13:03:00","guid":{"rendered":"https:\/\/www.sotrender.com\/blog\/?p=41350\/"},"modified":"2022-09-20T18:48:21","modified_gmt":"2022-09-20T16:48:21","slug":"social-media-and-topic-modeling","status":"publish","type":"post","link":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/","title":{"rendered":"Social media and topic modeling: how to analyze posts in practice"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">There is a substantial amount of data generated on the internet every second &#8211; posts, comments, photos, and videos. These different data types mean that there is a lot of ground to cover, so let\u2019s focus on one &#8211; text.<\/span><\/p>\n<p><!--more--><\/p>\n<p><span style=\"font-weight: 400;\">All social conversations are based on written words &#8211; tweets, Facebook posts, comments, online reviews, and so on. Being a social media marketer, a Facebook group\/profile moderator, or trying to promote your business on social media requires you to know how your audience reacts to the content you are uploading. One way is to read it all, mark hateful comments, divide them into similar topic groups, calculate statistics and&#8230; lose a big chunk of your time just to see that there are thousands of new comments to add to your calculations. Fortunately, there is another solution to this problem &#8211; <\/span><b>machine learning<\/b><span style=\"font-weight: 400;\">. 
From this text you will learn:\u00a0<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Why do you need specialised tools for social media analyses?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">What can you get from topic modeling and how it is done?<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">How to automatically look for hate speech in comments?<\/span><\/li>\n<\/ul>\n<h2><strong>Why are social media texts unique?<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Before jumping to the analyses, it is really important to understand why social media texts are so unique:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Posts and comments are short. They mostly contain one simple sentence or even single word or expression. This gives us a limited amount of information to obtain just from one post.<br \/>\n<a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image8.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-41351\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image8.png\" alt=\"\" width=\"506\" height=\"83\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image8.png 512w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image8-475x78.png 475w\" sizes=\"(max-width: 506px) 100vw, 506px\" \/><\/a> <a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-41352\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image6.png\" alt=\"\" width=\"226\" height=\"34\" \/><\/a><\/span><\/span><\/li>\n<li><span style=\"font-weight: 400;\">Emojis and smiley faces &#8211; used almost exclusively on social media. 
They give additional details about the author&#8217;s emotions and context.<br \/>\n<a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image11.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-41353\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image11.png\" alt=\"\" width=\"506\" height=\"60\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image11.png 701w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image11-475x56.png 475w\" sizes=\"(max-width: 506px) 100vw, 506px\" \/><\/a> <a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-41354\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image3.png\" alt=\"\" width=\"226\" height=\"31\" \/><\/a><br \/>\n<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/li>\n<li><span style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Slang phrases which make posts resemble spoken language rather than written. 
They make statements appear more casual.<br \/>\n<a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image10.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-41355\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image10.png\" alt=\"\" width=\"346\" height=\"85\" \/><\/a> <a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image14.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-41356\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image14.png\" alt=\"\" width=\"57\" height=\"31\" \/><\/a><\/span><\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">These features make social media a distinct source of information that demands special attention when running a machine learning analysis. In contrast, most open-source machine learning solutions are trained on long, formal text, like Wikipedia articles and other website posts. As a result, these models perform badly on social media data, because they don\u2019t understand the additional forms of expression it includes. This problem is called domain shift and is common in NLP. Different data also require customised <a href=\"https:\/\/improvado.io\/blog\/data-preparation-tools\">data preparation methods<\/a>, called preprocessing. This step consists of removing uninformative tokens such as URLs or mentions from the text and converting it to a machine-readable format (<a href=\"https:\/\/towardsdatascience.com\/modelling-hate-speech-in-polish-importance-of-domain-specific-embeddings-206a02fb3a3b\" target=\"_blank\" rel=\"dofollow noopener\">more about how we do it in Sotrender<\/a>)<\/span><span style=\"font-weight: 400;\">. 
This is why <\/span><b>it is crucial to use tools created especially for your data source to get the best results<\/b><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<h2><strong>Topic Modeling for social media<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Machine learning for text analysis (<\/span><a href=\"https:\/\/www.sotrender.com\/blog\/2019\/06\/natural-language-processing\/\" target=\"_blank\" rel=\"dofollow noopener\"><span style=\"font-weight: 400;\">Natural Language Processing<\/span><\/a><span style=\"font-weight: 400;\">) is a vast field with lots of different model types that can give you insight into your data. One of the areas that can answer the question \u201cwhat are the topics of the given texts?\u201d is <\/span><b>topic modeling<\/b><span style=\"font-weight: 400;\">. These models help you understand what people are talking about in general. Topic modeling does not require a specially prepared data set with predefined topics. It finds topics, which are patterns hidden within the data, on its own and without supervision &#8211; which makes it an <\/span><b>unsupervised machine learning <\/b><span style=\"font-weight: 400;\">method. This means that <\/span><b>it is easy to build a model for each individual problem<\/b><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are lots of different algorithms that can be used for this task, but the most widely used is LDA (Latent Dirichlet Allocation). It\u2019s based on word frequencies and the distribution of topics in texts. To put it simply, this method counts words in a given data set and groups them into topics based on their co-occurrence. Then the percentage distribution of topics in each document is calculated. 
As a result this method assumes that each text is a mixture of topics which works great with long documents where every paragraph relates to a different matter.<\/span><\/p>\n<div id=\"attachment_41357\" style=\"width: 634px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image2.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41357\" class=\"wp-image-41357 size-full\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image2.png\" alt=\"\" width=\"624\" height=\"400\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image2.png 624w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image2-475x304.png 475w\" sizes=\"(max-width: 624px) 100vw, 624px\" \/><\/a><p id=\"caption-attachment-41357\" class=\"wp-caption-text\"><strong>Figure 1.<\/strong> LDA algorithm (Credit: <a href=\"http:\/\/www.cs.columbia.edu\/~blei\/papers\/Blei2012.pdf\" target=\"_blank\" rel=\"nofollow noopener\">Columbia University<\/a>)<\/p><\/div>\n<p><b>That\u2019s why social media texts need a different procedure<\/b><span style=\"font-weight: 400;\">. One of the new algorithms is GSDMM (Gibbs sampling algorithm for a Dirichlet Mixture Model). 
What makes this one so different?<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">It is fast,\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">it is designed for short texts,\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">it is easily explained with the analogy of a teacher (the algorithm) who wants to divide students (texts) into groups (topics) of similar interests.<\/span><\/li>\n<\/ol>\n<div id=\"attachment_41358\" style=\"width: 613px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image18.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41358\" class=\"wp-image-41358 size-full\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image18.png\" alt=\"\" width=\"603\" height=\"203\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image18.png 603w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image18-475x160.png 475w\" sizes=\"(max-width: 603px) 100vw, 603px\" \/><\/a><p id=\"caption-attachment-41358\" class=\"wp-caption-text\"><strong>Figure 2.<\/strong> Group assignment algorithm<\/p><\/div>\n<p><span style=\"font-weight: 400;\">Students are given 2 minutes to write down some movie titles they liked. Most students are able to list 3-5 movies within this time frame (this corresponds to the limited number of words in social media texts). Then they are randomly assigned to a group. 
The last step is for every student to pick a new table (group) with two rules in mind:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">pick a group with more students &#8211; this favours bigger groups\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">or a group with the most similar movie titles &#8211; this makes groups more cohesive.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">This last step is repeated multiple times. The first rule, which favours bigger groups, is crucial to ensure that groups are not excessively fragmented. Due to the limited number of movie titles (words) for each student (text), each group (topic) is bound to have members with different movies in their lists but from the same genre.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As a result of the GSDMM algorithm you obtain an assignment of each text to one topic, as well as a list of the most important words for every topic.<\/span><\/p>\n<div id=\"attachment_41359\" style=\"width: 424px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image9.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41359\" class=\"size-full wp-image-41359\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image9.png\" alt=\"\" width=\"414\" height=\"227\" \/><\/a><p id=\"caption-attachment-41359\" class=\"wp-caption-text\">Figure 3. 
Assignment of documents to topics and extraction of topic words<\/p><\/div>\n<p><span style=\"font-weight: 400;\">The tricky part is deciding on the number of topics (a common problem with unsupervised methods), but once you do, you can gain quite a lot of insight from the data:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Distribution of topics in your data<br \/>\n<\/span><\/span><\/p>\n<div id=\"attachment_41360\" style=\"width: 415px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image16.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41360\" class=\"size-full wp-image-41360\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image16.png\" alt=\"\" width=\"405\" height=\"275\" \/><\/a><p id=\"caption-attachment-41360\" class=\"wp-caption-text\"><strong>Figure 4.<\/strong> Topic distribution in data<\/p><\/div>\n<p>&nbsp;<\/li>\n<li><span style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Word clouds &#8211; these allow us to comprehend the topic and name it. 
It is a quick and easy solution that can replace reading the whole set of texts and spare you hours of tedious work dividing it into groups.<br \/>\n<\/span><\/span><\/p>\n<p><div id=\"attachment_41366\" style=\"width: 739px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/Screenshot-80.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41366\" class=\" wp-image-41366\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/Screenshot-80-950x279.png\" alt=\"\" width=\"729\" height=\"214\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/Screenshot-80-950x279.png 950w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/Screenshot-80-475x139.png 475w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/Screenshot-80-768x226.png 768w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/Screenshot-80.png 1100w\" sizes=\"(max-width: 729px) 100vw, 729px\" \/><\/a><p id=\"caption-attachment-41366\" class=\"wp-caption-text\"><strong>Figure 5.<\/strong> The picture above shows three examples of word clouds. Looking from left to right, the first one contains the words president, government, disease and covid &#8211; we can assume the main theme is politics. There are also less prominent words like cough, sick and health, so it\u2019s a topic about government actions regarding health issues.<\/p><\/div><\/li>\n<li><span style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Time series analysis of topics &#8211; as we can see in the plot below, some topics gain more attention over time, like number 7, and some fade away, like number 4. 
When trying to grasp what is popular now or may become popular in the future, it is a good idea to look back and see how topics changed in the past.<\/span><\/span><\/li>\n<\/ol>\n<div id=\"attachment_41367\" style=\"width: 610px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image1.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41367\" class=\" wp-image-41367\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image1-950x901.png\" alt=\"\" width=\"600\" height=\"569\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image1-950x901.png 950w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image1-475x451.png 475w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image1-768x728.png 768w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image1.png 1941w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><p id=\"caption-attachment-41367\" class=\"wp-caption-text\"><strong>Figure 6.<\/strong> Distribution of topics over time.<\/p><\/div>\n<h2><b>Use case<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">In one of our recent projects for Collegium Civitas we analyzed 50 000 social media posts and comments and performed topic analysis on them. It allowed our client to answer questions like:\u00a0<\/span><\/p>\n<p><b><i>1) What was discussed on social media over a span of 2 months?<\/i><\/b><\/p>\n<p><span style=\"font-weight: 400;\">In the dataset we were able to distinguish 10 different topics, all revolving around COVID-19. 
Discussions covered statistics and COVID-19 etiology, everyday life, government response to the pandemic, consequences of travel restrictions, trade market and supplies, health care during the pandemic, church and politics, common knowledge and conspiracy theories about COVID-19, politics and economy, spam messages and ads.\u00a0<\/span><\/p>\n<p><b><i>2) How were the discussions influenced by the pandemic situation?<\/i><\/b><\/p>\n<p><span style=\"font-weight: 400;\">During the outbreak of the pandemic the biggest theme was the origin and statistics of COVID-19. People talked about how the situation was changing and exchanged information about how the disease spreads. To read more visit <a href=\"https:\/\/www.civitas.edu.pl\/pl\/uczelnia\/aktualnosci\/raport-czas-pandemii-w-polskich-mediach-spolecznosciowych\" target=\"_blank\" rel=\"dofollow noopener\">Collegium Civitas&#8217;s site<\/a> (there is only a Polish version).\u00a0<\/span><\/p>\n<h2><strong>Hate speech recognition<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Another question that can be answered with machine learning is \u201c<\/span><b>what kind of emotion do people express in their comments or posts?<\/b><span style=\"font-weight: 400;\">\u201d or \u201c<\/span><b>is my content generating hateful comments?<\/b><span style=\"font-weight: 400;\">\u201d. There are only a few solutions for these tasks in the Polish language. That is why we built models based on social media text for Sentiment and Hate Speech recognition at <\/span><b>Sotrender<\/b><span style=\"font-weight: 400;\">. Our solutions were built in two steps.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The first step is to convert text and emojis into numerical vector representations (embeddings) to be used later in neural networks. 
The main goal of this step is to obtain a language model (LM) that captures knowledge of human language, so that vectors representing similar words are close to each other <\/span><span style=\"font-weight: 400;\">(for example: queen and king, or paragraph and article), which indicates that these words have similar meanings (semantic similarity)<\/span><span style=\"font-weight: 400;\">.<\/span><span style=\"font-weight: 400;\"> This property is shown in the graph below.<\/span><\/p>\n<div id=\"attachment_41381\" style=\"width: 509px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image16-1.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41381\" class=\" wp-image-41381\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image16-1.png\" alt=\"\" width=\"499\" height=\"252\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image16-1.png 483w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image16-1-475x240.png 475w\" sizes=\"(max-width: 499px) 100vw, 499px\" \/><\/a><p id=\"caption-attachment-41381\" class=\"wp-caption-text\">Figure 7. The intuition behind word similarity<\/p><\/div>\n<p><span style=\"font-weight: 400;\">Training this model is similar to teaching a child how to speak by talking to them. By listening to their parents talk, children are able to grasp the meaning of words, and the more they hear, the more they understand.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Following this analogy, we have to use a huge set of social media texts to train our model to understand their language. That is why we used a set of 100 million posts and comments to train our model so it can properly assign vectors to words as well as to emojis. 
Tokens vectorised with the embedding model provide the input to the neural network.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The second step is designing neural networks for a specific task &#8211; hate speech recognition. The most important thing is the data set &#8211; the model needs examples of hate speech and non-hateful texts to learn how to tell them apart. In order to get the best results you need to experiment with different architectures and model hyperparameters.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The hate speech recognition model gives us another grouping of our data set. Now we can see <\/span><b>how our audience reacts and how many hateful comments or posts it creates<\/b><span style=\"font-weight: 400;\">. What\u2019s more, by combining it again with the time of publication of each comment, we can see <\/span><b>if there was a specific time period when the most hateful comments were generated <\/b><span style=\"font-weight: 400;\">as shown in the histogram below.<\/span><\/p>\n<div id=\"attachment_41369\" style=\"width: 617px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image17.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41369\" class=\" wp-image-41369\" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image17-950x809.png\" alt=\"\" width=\"607\" height=\"517\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image17-950x809.png 950w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image17-475x404.png 475w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image17-768x654.png 768w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image17.png 1359w\" sizes=\"(max-width: 607px) 100vw, 607px\" \/><\/a><p id=\"caption-attachment-41369\" class=\"wp-caption-text\">Figure 8. 
Hate speech distribution over time<\/p><\/div>\n<p><span style=\"font-weight: 400;\">Combining this distribution with recent posts or events can give you<\/span><b> insight into the type of content that provokes people<\/b><span style=\"font-weight: 400;\">. Changes in the share of hate speech over time can also be related to changes in topic distribution. Combining all the information from the analysis can provide an in-depth picture of the dataset.<\/span><\/p>\n<div id=\"attachment_41370\" style=\"width: 814px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image5.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-41370\" class=\"wp-image-41370 \" src=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image5-950x378.png\" alt=\"\" width=\"804\" height=\"320\" srcset=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image5-950x378.png 950w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image5-475x189.png 475w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image5-768x306.png 768w, https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/image5.png 1737w\" sizes=\"(max-width: 804px) 100vw, 804px\" \/><\/a><p id=\"caption-attachment-41370\" class=\"wp-caption-text\"><strong>Figure 9.<\/strong> Weekly text count with hate speech<\/p><\/div>\n<p><span style=\"font-weight: 400;\">As the histogram above shows, most hate is connected to topics 3, 6 and 7. <\/span><b>Knowing what makes people angry gives you the opportunity to avoid sensitive topics in the future.<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The same goes for sentiment analysis. We can produce similar visualizations for positive, negative or neutral comments and see their distribution over time or across topics. 
If you would like to read the whole report built on our analysis of these 8 weeks of data, you can find it <a href=\"https:\/\/www.civitas.edu.pl\/wp-content\/uploads\/2020\/07\/czas_pandemii_w_polskich_mediach_spolecznosciowych.pdf\" target=\"_blank\" rel=\"dofollow noopener\">here<\/a> (Polish version only).<\/span><\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Here at Sotrender, we have models for hate speech and sentiment recognition that are constantly improved and updated for social media texts. What\u2019s more, we have experience in building topic models for individual cases. As you can see, there are a lot of benefits to this type of analysis:\u00a0<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Getting to know your audience<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Taking an in-depth look at the topics of comments<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Discovering trending themes<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Finding the source of hatred or negativity around your content<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">To name just a few!<\/span><\/p>\n<p>If you&#8217;re interested in learning more about how we can use our machine learning models for your brand&#8217;s social media profiles, feel free to contact our <a href=\"mailto:sales@sotrender.com\">Sales team<\/a> for more information. \ud83d\ude09<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There is a substantial amount of data generated on the internet every second &#8211; posts, comments, photos, and videos. 
These different data types mean that there is a lot of ground to cover, so let\u2019s focus on one &#8211; text.<\/p>\n","protected":false},"author":186,"featured_media":41372,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1779],"tags":[1281,2939,3072,3073],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Social media and topic modeling: how to analyze posts in practice<\/title>\n<meta name=\"description\" content=\"By using topic modeling to analyze your social media, you can find trending themes in your comments and get to know your audience. Here&#039;s how it&#039;s done.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Social media and topic modeling: how to analyze posts in practice\" \/>\n<meta property=\"og:description\" content=\"By using topic modeling to analyze your social media, you can find trending themes in your comments and get to know your audience. 
Here&#039;s how it&#039;s done.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/\" \/>\n<meta property=\"og:site_name\" content=\"Sotrender Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Sotrender\" \/>\n<meta property=\"article:published_time\" content=\"2020-08-10T13:03:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-09-20T16:48:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/alina-grubnyak-ZiQkhI7417A-unsplash.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"3456\" \/>\n\t<meta property=\"og:image:height\" content=\"2304\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Dominika Sagan\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Sotrender\" \/>\n<meta name=\"twitter:site\" content=\"@Sotrender\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Dominika Sagan\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/\",\"url\":\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/\",\"name\":\"Social media and topic modeling: how to analyze posts in practice\",\"isPartOf\":{\"@id\":\"https:\/\/www.sotrender.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/alina-grubnyak-ZiQkhI7417A-unsplash.jpg\",\"datePublished\":\"2020-08-10T13:03:00+00:00\",\"dateModified\":\"2022-09-20T16:48:21+00:00\",\"author\":{\"@id\":\"https:\/\/www.sotrender.com\/blog\/#\/schema\/person\/3a995d2440c23466d51eec75615e950d\"},\"description\":\"By using topic modeling to analyze your social media, you can find trending themes in your comments and get to know your audience. 
Here's how it's done.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#primaryimage\",\"url\":\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/alina-grubnyak-ZiQkhI7417A-unsplash.jpg\",\"contentUrl\":\"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/alina-grubnyak-ZiQkhI7417A-unsplash.jpg\",\"width\":3456,\"height\":2304},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.sotrender.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Social media and topic modeling: how to analyze posts in practice\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.sotrender.com\/blog\/#website\",\"url\":\"https:\/\/www.sotrender.com\/blog\/\",\"name\":\"Sotrender Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.sotrender.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.sotrender.com\/blog\/#\/schema\/person\/3a995d2440c23466d51eec75615e950d\",\"name\":\"Dominika 
Sagan\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.sotrender.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/4a75a66798f9706d03f14938b7720baa?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/4a75a66798f9706d03f14938b7720baa?s=96&d=mm&r=g\",\"caption\":\"Dominika Sagan\"},\"description\":\"At Sotrender, she is involved in all stages of NLP modeling - from data analysis, creating baseline models, to their development and quality improvement. Fan of K-dramas and amateur ornithologist.\",\"url\":\"https:\/\/www.sotrender.com\/blog\/author\/d-sagan\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Social media and topic modeling: how to analyze posts in practice","description":"By using topic modeling to analyze your social media, you can find trending themes in your comments and get to know your audience. Here's how it's done.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/","og_locale":"en_US","og_type":"article","og_title":"Social media and topic modeling: how to analyze posts in practice","og_description":"By using topic modeling to analyze your social media, you can find trending themes in your comments and get to know your audience. 
Here's how it's done.","og_url":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/","og_site_name":"Sotrender Blog","article_publisher":"https:\/\/www.facebook.com\/Sotrender","article_published_time":"2020-08-10T13:03:00+00:00","article_modified_time":"2022-09-20T16:48:21+00:00","og_image":[{"width":3456,"height":2304,"url":"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/alina-grubnyak-ZiQkhI7417A-unsplash.jpg","type":"image\/jpeg"}],"author":"Dominika Sagan","twitter_card":"summary_large_image","twitter_creator":"@Sotrender","twitter_site":"@Sotrender","twitter_misc":{"Written by":"Dominika Sagan","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/","url":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/","name":"Social media and topic modeling: how to analyze posts in practice","isPartOf":{"@id":"https:\/\/www.sotrender.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#primaryimage"},"image":{"@id":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#primaryimage"},"thumbnailUrl":"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/alina-grubnyak-ZiQkhI7417A-unsplash.jpg","datePublished":"2020-08-10T13:03:00+00:00","dateModified":"2022-09-20T16:48:21+00:00","author":{"@id":"https:\/\/www.sotrender.com\/blog\/#\/schema\/person\/3a995d2440c23466d51eec75615e950d"},"description":"By using topic modeling to analyze your social media, you can find trending themes in your comments and get to know your audience. 
Here's how it's done.","breadcrumb":{"@id":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#primaryimage","url":"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/alina-grubnyak-ZiQkhI7417A-unsplash.jpg","contentUrl":"https:\/\/www.sotrender.com\/blog\/wp-content\/uploads\/2020\/08\/alina-grubnyak-ZiQkhI7417A-unsplash.jpg","width":3456,"height":2304},{"@type":"BreadcrumbList","@id":"https:\/\/www.sotrender.com\/blog\/2020\/08\/social-media-and-topic-modeling\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.sotrender.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Social media and topic modeling: how to analyze posts in practice"}]},{"@type":"WebSite","@id":"https:\/\/www.sotrender.com\/blog\/#website","url":"https:\/\/www.sotrender.com\/blog\/","name":"Sotrender Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.sotrender.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.sotrender.com\/blog\/#\/schema\/person\/3a995d2440c23466d51eec75615e950d","name":"Dominika Sagan","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.sotrender.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/4a75a66798f9706d03f14938b7720baa?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4a75a66798f9706d03f14938b7720baa?s=96&d=mm&r=g","caption":"Dominika Sagan"},"description":"At Sotrender, she is involved in all stages of NLP modeling - from data 
analysis, creating baseline models, to their development and quality improvement. Fan of K-dramas and amateur ornithologist.","url":"https:\/\/www.sotrender.com\/blog\/author\/d-sagan\/"}]}},"_links":{"self":[{"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/posts\/41350"}],"collection":[{"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/users\/186"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/comments?post=41350"}],"version-history":[{"count":14,"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/posts\/41350\/revisions"}],"predecessor-version":[{"id":46213,"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/posts\/41350\/revisions\/46213"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/media\/41372"}],"wp:attachment":[{"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/media?parent=41350"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/categories?post=41350"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sotrender.com\/blog\/wp-json\/wp\/v2\/tags?post=41350"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}