Social Metrics

Social Media, Organizations, and Academic Research

Analyzing Big Data with Python PANDAS

October 23, 2015 by Gregory Saxton

This is a series of IPython notebooks for analyzing Big Data (specifically, Twitter data) using Python’s powerful PANDAS (Python Data Analysis) library. In these tutorials I’ll walk you through how to analyze your raw social media data using a typical social science approach.

These tutorials are aimed at readers who want to work through the key steps of taking a social media dataset through the stages needed to deliver a valuable research product. I’ll show you how to import your data, aggregate tweets by organization and by time period, analyze hashtags, create new variables, produce a summary statistics table for publication, analyze audience reaction (e.g., the number of retweets), and, finally, run a logistic regression to test your hypotheses. Collectively, these tutorials cover the essential steps needed to move from the data-collection stage to the research-product stage.
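To give a rough feel for the two aggregation steps (by organization and by time), here is a minimal pandas sketch on a tiny made-up DataFrame. The column names `screen_name`, `created_at`, and `retweet_count` are assumptions for illustration, not necessarily those used in the notebooks.

```python
import pandas as pd

# Tiny made-up tweet dataset; the column names are illustrative assumptions.
df = pd.DataFrame({
    'screen_name': ['orgA', 'orgA', 'orgB', 'orgB', 'orgB'],
    'created_at': pd.to_datetime([
        '2015-01-05', '2015-02-11', '2015-01-20', '2015-01-28', '2015-03-02',
    ]),
    'retweet_count': [3, 10, 0, 5, 7],
})

# Aggregate by organization: number of tweets and mean retweets per account.
by_org = df.groupby('screen_name')['retweet_count'].agg(['count', 'mean'])
print(by_org)

# Aggregate by time: number of tweets per calendar month.
by_month = df.groupby(df['created_at'].dt.to_period('M')).size()
print(by_month)
```

The same `groupby` pattern extends to other units of analysis; grouping on both `screen_name` and the month period would give an organization-month panel.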

Prerequisites

I’ve put these tutorials in a GitHub repository called PANDAS. The tutorials assume you have already downloaded some data and are now ready to begin examining it. In the first notebook I will show you how to set up your IPython working environment and import the Twitter data we have downloaded. If you are new to Python, you may wish to work through, in order, the series of introductory tutorials I have created.
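The import, case-selection, and save steps might look roughly like the following minimal sketch. It assumes the downloaded tweets were exported to CSV with hypothetical columns (`tweet_id`, `screen_name`, `created_at`, `content`, `retweet_count`, `lang`); an in-memory string stands in for the real file so the example runs on its own, and the English-only filter is just an example of selecting cases.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical CSV export of downloaded tweets; with real data you would pass
# a file path such as 'tweets.csv' to pd.read_csv instead of this buffer.
RAW_CSV = """tweet_id,screen_name,created_at,content,retweet_count,lang
1,orgA,2015-01-05 10:00:00,Join us #nonprofit,3,en
2,orgA,2015-02-11 14:30:00,Thanks donors! #giving,10,en
3,orgB,2015-01-20 09:15:00,Hola amigos,0,es
4,orgB,2015-03-02 16:45:00,New report out #research #nonprofit,7,en
"""

df = pd.read_csv(io.StringIO(RAW_CSV), parse_dates=['created_at'])

# Select cases: keep only English-language tweets.
df = df[df['lang'] == 'en']

# Select variables: keep only the columns needed for later analyses.
df = df[['tweet_id', 'screen_name', 'created_at', 'content', 'retweet_count']]

# Save the cleaned DataFrame so later notebooks can reload it quickly.
out_path = os.path.join(tempfile.gettempdir(), 'tweets_clean.pkl')
df.to_pickle(out_path)

reloaded = pd.read_pickle(out_path)
print(reloaded.shape)
```

Pickling preserves dtypes such as the parsed `created_at` timestamps, which a round trip through CSV would not.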

If you want to skip the data download and just use the sample data, but don’t yet have Python set up on your computer, you may wish to go through the tutorial “Setting up Your Computer to Use My Python Code”.

Also note that we are using the IPython notebook interactive computing framework to run the code in this tutorial. If you’re unfamiliar with it, see the tutorial “Four Ways to Run your Code”.

For a more general set of PANDAS notebook tutorials, I’d recommend the pandas cookbook by Julia Evans. I also have a growing list of “recipes” containing frequently used PANDAS commands.

As you may know from my other tutorials, I am a big fan of the free Anaconda distribution of Python 2.7. It includes all of the prerequisites you need and will save you a lot of headaches when setting up your system.

Chapters:

At the GitHub site you’ll find the following chapters in the tutorial set:

Chapter 1 – Import Data, Select Cases and Variables, Save DataFrame.ipynb
Chapter 2 – Aggregating and Analyzing Data by Twitter Account.ipynb
Chapter 3 – Analyzing Twitter Data by Time Period.ipynb
Chapter 4 – Analyzing Hashtags.ipynb
Chapter 5 – Generating New Variables.ipynb
Chapter 6 – Producing a Summary Statistics Table for Publication.ipynb
Chapter 7 – Analyzing Audience Reaction on Twitter.ipynb
Chapter 8 – Running, Interpreting, and Outputting Logistic Regression.ipynb
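Hashtag analysis and variable creation (Chapters 4 and 5) can be sketched with pandas string methods. This assumes a hypothetical `content` column holding the tweet text and uses a simple regular expression, `#(\w+)`, as a stand-in for whatever extraction the notebooks actually use; Twitter’s own hashtag metadata, where available, would be more reliable.

```python
import pandas as pd

df = pd.DataFrame({
    'screen_name': ['orgA', 'orgA', 'orgB', 'orgB'],
    'content': [
        'Join us #nonprofit',
        'Thanks donors! #Giving',
        'Hello friends',
        'New report out #research #nonprofit',
    ],
})

# Extract all hashtags per tweet, lower-cased so #Giving and #giving match.
hashtags = df['content'].str.lower().str.findall(r'#(\w+)')

# New variables: number of hashtags and a 0/1 indicator for any hashtag.
df['num_hashtags'] = hashtags.apply(len)
df['has_hashtag'] = (df['num_hashtags'] > 0).astype(int)

# Frequency table of hashtags across all tweets; explode() puts one hashtag
# per row, and value_counts() drops the NaN left by tweets with no hashtags.
hashtag_counts = hashtags.explode().value_counts()
print(df[['screen_name', 'num_hashtags', 'has_hashtag']])
print(hashtag_counts)
```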
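A publication-style summary statistics table (Chapter 6) and a first look at audience reaction (Chapter 7) could start from something like the sketch below. The variables, including the `has_url` indicator, are hypothetical, and writing the table to CSV is only one option; the notebooks may format their output differently.

```python
import os
import tempfile

import pandas as pd

# Made-up tweet-level variables for illustration.
df = pd.DataFrame({
    'retweet_count': [3, 10, 0, 5, 7, 1],
    'num_hashtags': [1, 1, 0, 2, 0, 1],
    'has_url': [1, 0, 0, 1, 1, 0],
})

# Summary statistics: one row per variable, common descriptive columns.
summary = df.describe().T[['count', 'mean', 'std', 'min', 'max']].round(2)
print(summary)

# Save the table so it can be opened in a spreadsheet and pasted into a paper.
summary.to_csv(os.path.join(tempfile.gettempdir(), 'summary_stats.csv'))

# Audience reaction: mean retweets for tweets without (0) and with (1) a URL.
reaction = df.groupby('has_url')['retweet_count'].mean()
print(reaction)
```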
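In Python, logistic regression is usually run with a library such as statsmodels (e.g., `statsmodels.formula.api.logit`), and the Chapter 8 notebook may well do so. To show what the model estimates without extra dependencies, here is a from-scratch sketch that fits it by Newton-Raphson with NumPy and reports coefficients, standard errors, and odds ratios. The outcome `retweeted` and predictor `has_hashtag` are hypothetical, and the data are made up.

```python
import numpy as np
import pandas as pd

# Made-up data: was the tweet retweeted (1/0), and did it contain a hashtag?
df = pd.DataFrame({
    'has_hashtag': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    'retweeted':   [0, 0, 0, 1, 0, 1, 1, 1, 1, 0],
})


def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Fit logistic regression coefficients by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X.dot(beta)))     # predicted probabilities
        w = p * (1.0 - p)                           # Bernoulli variances
        score = X.T.dot(y - p)                      # gradient of log-likelihood
        information = X.T.dot(X * w[:, None])       # X'WX (information matrix)
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the estimate.
    p = 1.0 / (1.0 + np.exp(-X.dot(beta)))
    cov = np.linalg.inv(X.T.dot(X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Design matrix with an intercept column.
X = np.column_stack([np.ones(len(df)), df['has_hashtag'].values])
y = df['retweeted'].values.astype(float)

beta, se = fit_logit(X, y)
results = pd.DataFrame({
    'coef': beta,
    'std_err': se,
    'odds_ratio': np.exp(beta),
}, index=['Intercept', 'has_hashtag'])
print(results.round(3))
```

With a single binary predictor the estimates can be checked by hand: 1 of 4 tweets without a hashtag was retweeted (odds 1/3) versus 4 of 6 with one (odds 2), so the odds ratio for `has_hashtag` should come out at 6.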

I hope you find these tutorials helpful; please acknowledge the source in your own research papers if you’ve found them useful:

    Saxton, Gregory D. (2015). Analyzing Big Data with Python. Buffalo, NY: http://social-metrics.org

Also, please share and spread the word to help build a vibrant community of PANDAS users.

Happy coding!

Filed Under: Big Data, ipython_notebook, notebooks, pandas, python, research, Twitter Tagged With: academic research, Big Data, hashtags, iPython, PANDAS, Programming, python, research, social media, socialmedia, tutorial, Twitter

Recent Posts

  • Making a Contribution in Accounting Research, Part IV: Mapping the Conceptual Relationships in Nonprofit Accounting Articles
  • Making a Contribution in Accounting Research, Part II: Focus on Nonprofit Accounting
  • Making a Contribution in Accounting Research, Part I: Types of Contributions
  • Making a Contribution in Accounting Research, Part III: Relationships in Top Nonprofit Accounting Articles
  • Quest for Attention: Nonprofit Advocacy in a Social Media Age

Related posts

  • Levels of Analysis in Big Data
  • Python Tutorials for Downloading Twitter Data

Contact Information

Gregory D. Saxton
Schulich School of Business
York University
Toronto, ON
gsaxton@yorku.ca

Copyright © 2021