Tag Cloud Tutorial

In this post I’ll provide a brief tutorial on how to create a tag cloud, as seen here.

First, this assumes you have downloaded a set of tweets into an SQLite database. If you are using a different database please modify accordingly. Also, to get to this stage, work through the first 8 tutorials listed here (you can skip over the seventh tutorial on downloading tweets by a list of users).

Here is the full code for processing the Twitter data you’ve downloaded so that you can generate a tag cloud:

Now I’ll try to walk you through the basic steps here. For those of you who are completely new to Python, you should work through some of my other tutorials.

Understanding the Code

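The first line in the code above is the shebang, which you’ll find at the top of many Python scripts. On Unix-like systems it tells the shell which interpreter to use when the file is run directly.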

[python]#!/usr/bin/env python[/python]

Lines 3-6 contain the docstring, also a Python convention. This is a multi-line string that describes the code and serves the same purpose as a multi-line comment. For single-line comments, use the # symbol at the start of the line.

[python] """

tags_from_tweets.py – Take hashtags from tweets in SQLite database.
Output to text file.

"""
[/python]
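One practical difference between the two: a docstring is an ordinary string that Python keeps and exposes through the __doc__ attribute, while # comments are simply ignored. Here is a small sketch using a hypothetical function named describe (it is not part of the script); the same idea applies to a docstring at the top of a file.

```python
def describe():
    """Take hashtags from tweets and output them to a text file."""
    # This single-line comment is ignored by Python.
    return None


# The docstring is stored on the function and can be read back.
print(describe.__doc__)
```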

Next we’ll import several Python packages needed to run the code.

[python]

import sys
import re
import sqlite3

[/python]

In lines 14-16 we create a connection with the SQLite database, make a query to select all of the tweets in the database, and assign the returned tweets to the variable tweets.

[python] database = "arnova14.sqlite"
conn = sqlite3.connect(database)
c = conn.cursor()
c.execute('SELECT * FROM search_tweets')
tweets = c.fetchall()
[/python]
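If you want to try the connect, query, and fetch pattern without the arnova14.sqlite file, here is a minimal sketch written for Python 3 that uses an in-memory database. The two-column table layout and the sample rows are made up for illustration; the real search_tweets table has many more columns.

```python
import sqlite3

# Assumption: a tiny in-memory stand-in for the real database file.
conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Hypothetical two-column table; the real search_tweets table is much wider.
c.execute("CREATE TABLE search_tweets (id INTEGER, hashtags TEXT)")
c.executemany(
    "INSERT INTO search_tweets VALUES (?, ?)",
    [(1, "Nonprofit, Arnova14"), (2, None), (3, "DataViz")],
)
conn.commit()

# Same pattern as in the tutorial: run the query, then fetch every row.
c.execute("SELECT * FROM search_tweets")
tweets = c.fetchall()
conn.close()

print(len(tweets))  # number of rows returned
print(tweets[0])    # each row is a tuple, so columns are accessed by position
```

Because each row comes back as a tuple, the position of a column in the table determines the index used to read it, which is why the script later refers to columns by number.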

Line 21 creates an empty list in which we will place all of the hashtags from each tweet.

[python] all_text = [] [/python]

In lines 23-36 we loop over each tweet. First we identify the two specific columns in the database we’re interested in (the tweet id and the hashtags column), then add the tags to the all_text variable created earlier.

[python]
for row in tweets:
    id = row[0]
    hashtags = row[31]  # the tags
    if hashtags:
        tags = hashtags.lower()
        print(tags)
    else:
        tags = ''
    # replace line breaks with spaces
    tags = re.sub('\n', ' ', tags)

    # to remove 'u' before each tweet in the list (Python 2) --> DOESN'T WORK WITH SQLITE INSERTION
    tags = tags.encode("utf-8")

    all_text.append(tags)
[/python]
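The per-row logic can also be pulled into a small helper, which makes it easier to test on its own. This is only a sketch for Python 3, not the script from this tutorial: the function name clean_tags and the sample rows are hypothetical, and the hashtags are assumed to sit at index 1 rather than 31.

```python
import re


def clean_tags(hashtags):
    """Lowercase a hashtags value and flatten line breaks; None or '' gives ''."""
    if not hashtags:
        return ""
    return re.sub(r"\n", " ", hashtags.lower())


# Hypothetical rows shaped like (id, hashtags).
sample_rows = [(1, "Nonprofit\nArnova14"), (2, None), (3, "DataViz")]

all_text = []
for row in sample_rows:
    all_text.append(clean_tags(row[1]))

print(all_text)
```

In Python 3 strings are already Unicode, so the encode step is left out of this version.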

Finally, in lines 43-46 we convert the all_text variable from a list to a single string, then output it to a text file.

[python]
all_hashtags = ' '.join(all_text)
out = open('all_text_HASHTAGS.txt', 'w')
out.write(all_hashtags)
out.close()
[/python]
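A common alternative is to write the file inside a with block, which closes it automatically even if an error occurs. The sketch below assumes Python 3 and writes to a temporary directory so that it does not overwrite anything; the sample tag list is invented.

```python
import os
import tempfile

all_text = ["nonprofit arnova14", "", "dataviz"]  # made-up sample data

# Join the list items into one space-separated string.
all_hashtags = " ".join(all_text)

# Assumption: write into a temporary directory rather than the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "all_text_HASHTAGS.txt")
with open(out_path, "w", encoding="utf-8") as out:
    out.write(all_hashtags)

# Read the file back to confirm what was written.
with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

Tweets without hashtags contribute empty strings, so the joined text can contain doubled spaces; these are harmless when the text is later split into words.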

Once you’ve got this text file, open it, copy all of the text, and use it to create your own word cloud on Wordle.
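Before pasting the text anywhere, it can be handy to check which tags occur most often, since a word cloud typically sizes each word by how often it appears. Here is a minimal sketch with collections.Counter, assuming Python 3 and whitespace-separated tags; the sample string is invented.

```python
from collections import Counter

# Made-up sample of the joined hashtag text.
all_hashtags = "nonprofit arnova14 dataviz arnova14 nonprofit arnova14"

# Split on whitespace and count how many times each tag appears.
tag_counts = Counter(all_hashtags.split())

# Show the two most frequent tags with their counts.
for tag, count in tag_counts.most_common(2):
    print(tag, count)
```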

I hope this helps. If you need help with actually getting the tweets into your database, take a look at some of the other tutorials I’ve posted. If you have any questions, let me know, and have fun with the data!
