
Tag Cloud Tutorial

November 23, 2014 by Gregory Saxton


In this post I’ll provide a brief tutorial on how to create a tag cloud, as seen here.

This tutorial assumes you have already downloaded a set of tweets into an SQLite database. If you are using a different database, please modify the code accordingly. To get to this stage, work through the first 8 tutorials listed here (you can skip the seventh tutorial, on downloading tweets by a list of users).

Here is the full code for processing the Twitter data you’ve downloaded so that you can generate a tag cloud:

Now I’ll try to walk you through the basic steps here. For those of you who are completely new to Python, you should work through some of my other tutorials.

Understanding the Code

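The first line in the code above is the shebang, which you'll find at the top of many Python scripts. On Unix-like systems it tells the shell which interpreter to use when the script is run directly.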

#!/usr/bin/env python

 

Lines 3 – 6 contain the docstring, another Python convention. This is a multi-line string that describes the code. For single-line comments, use the # symbol at the start of the line.

"""

tags_from_tweets.py - Take hashtags from tweets in SQLite database. 
Output to text file.

"""
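Although a docstring is often used like a comment, it is a string literal that Python keeps at runtime, whereas a # comment is discarded. A minimal sketch of the difference, written for Python 3; the function name extract_tags is hypothetical and used only for illustration:

```python
def extract_tags():
    """Take hashtags from tweets in an SQLite database.

    Output them to a text file.
    """
    # A single-line comment: the interpreter discards this entirely.
    pass

# Unlike a comment, the docstring is available at runtime as __doc__.
summary = extract_tags.__doc__.splitlines()[0]
print(summary)
```

The same holds for a docstring at the top of a script, which tools such as help() can display.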

 

Next we’ll import the Python modules needed to run the code.


import sys
import re
import sqlite3

 

In lines 14-16 we create a connection with the SQLite database, make a query to select all of the tweets in the database, and assign the returned tweets to the variable tweets.

    database = "arnova14.sqlite"
    conn = sqlite3.connect(database)
    c = conn.cursor()
    c.execute('SELECT * FROM search_tweets')  
    tweets = c.fetchall() 
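Because the query selects every column, the later code has to pick out values by position. As a rough sketch of the same connect, query, and fetch pattern, here is a Python 3 version that runs against an in-memory database. The two-column table layout and the sample rows are assumptions made for illustration; the real search_tweets table has many more columns.

```python
import sqlite3

# In-memory database so the sketch runs without arnova14.sqlite.
conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Hypothetical cut-down version of search_tweets (assumed column names).
c.execute("CREATE TABLE search_tweets (id INTEGER PRIMARY KEY, hashtags TEXT)")
c.executemany(
    "INSERT INTO search_tweets (id, hashtags) VALUES (?, ?)",
    [(1, "ARNOVA14, nonprofits"), (2, None), (3, "BigData")],
)
conn.commit()

# Naming the columns avoids depending on their position in the table.
c.execute("SELECT id, hashtags FROM search_tweets ORDER BY id")
tweets = c.fetchall()
conn.close()

print(tweets)
```

Each row comes back as a tuple, and a tweet with no hashtags has None in that position, which is why the loop later checks the value before lowercasing it.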

 

Line 21 creates an empty list in which we will place the hashtags from each tweet.

    all_text = []

 

In lines 23-36 we loop over each tweet. First we identify the two specific columns in the database we’re interested in (the tweet id and the hashtags column), then add the tags to the all_text variable created earlier.

    for row in tweets:
        tweet_id = row[0]    # the tweet id (named tweet_id to avoid shadowing the built-in id)
        hashtags = row[31]   # the tags
        if hashtags:
            tags = hashtags.lower()
            print(tags)
        else:
            tags = ''
        tags = re.sub('\n', ' ', tags)

        # Python 2 only: encode to a UTF-8 byte string, which removes the u'' prefix
        # shown when the list is printed (this does not work with SQLite insertion)
        tags = tags.encode("utf-8")

        all_text.append(tags)
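The loop above can also be packaged as a small function, which makes it easy to try on a few sample rows. This is a sketch for Python 3, where strings are already Unicode and the encode step is not needed. The function name collect_tags, the hashtag_index parameter, and the sample rows are hypothetical.

```python
import re

def collect_tags(rows, hashtag_index=1):
    """Return one lowercased, single-line hashtag string per row."""
    all_text = []
    for row in rows:
        hashtags = row[hashtag_index]
        # Rows without hashtags contribute an empty string, as in the loop above.
        tags = hashtags.lower() if hashtags else ''
        # Replace embedded newlines so each entry stays on one line.
        tags = re.sub('\n', ' ', tags)
        all_text.append(tags)
    return all_text

sample_rows = [(1, "ARNOVA14\nNonprofits"), (2, None), (3, "BigData")]
print(collect_tags(sample_rows))
```

With the full table from this tutorial you would pass hashtag_index=31, matching the column position used above.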

 

Finally, in lines 43-46 we join the all_text list into a single string, then write it to a text file.

    all_hashtags = ' '.join(all_text)
    # open() in a with-block closes the file once the write is done
    with open('all_text_HASHTAGS.txt', 'w') as out:
        out.write(all_hashtags)
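If you are running Python 3, the same join-and-write step might look like the sketch below. The sample list and the temporary output directory are assumptions for illustration; in practice you would write to all_text_HASHTAGS.txt in your working directory.

```python
import os
import tempfile

# Sample data standing in for the list built from the tweets.
all_text = ['arnova14 nonprofits', '', 'bigdata']
all_hashtags = ' '.join(all_text)

# Written to a temporary directory so the sketch does not overwrite a real output file.
out_path = os.path.join(tempfile.mkdtemp(), 'all_text_HASHTAGS.txt')

# The with-block closes the file even if the write fails.
with open(out_path, 'w', encoding='utf-8') as out:
    out.write(all_hashtags)

# Read the file back to confirm what was written.
with open(out_path, encoding='utf-8') as check:
    written = check.read()
print(written)
```

Passing encoding='utf-8' explicitly keeps hashtags with non-ASCII characters intact regardless of the system default.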

 

Once you’ve got this text file, open it, copy all of the text, and use it to create your own word cloud on Wordle.
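Word clouds size each word by how often it appears, so it can be useful to check the most frequent tags before pasting the text anywhere. A quick sketch in Python 3, assuming the tags are separated by whitespace (the sample string is made up, and a word cloud tool may split words differently):

```python
from collections import Counter

# Sample text standing in for the contents of all_text_HASHTAGS.txt.
all_hashtags = 'arnova14 nonprofits arnova14 bigdata arnova14 nonprofits'

# Count how many times each whitespace-separated tag occurs.
counts = Counter(all_hashtags.split())

for tag, n in counts.most_common(2):
    print(tag, n)
```

The tags at the top of this list are the ones you should expect to see drawn largest in the cloud.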

I hope this helps. If you need help with actually getting the tweets into your database, take a look at some of the other tutorials I’ve posted. If you have any questions, let me know, and have fun with the data!


