
Why I Use Python for Academic Research

April 24, 2014 by Gregory Saxton


Academics and other researchers have to choose from a variety of research skills. Most social scientists do not add computer programming to their skill set. As a strong proponent of the value of learning a programming language, I will lay out how this has proven to be useful for me. A budding programmer could choose from a number of good options, including Perl, C++, Java, PHP, or others, but Python has a reputation as being one of the most accessible and intuitive. I obviously like it.

No matter your choice of language, there are a variety of ways learning programming will be useful for social scientists and other data scientists. The most important areas are data gathering, data manipulation, and data visualization and analysis.

Data Gathering

When I started learning Python four years ago, I kept a catalogue of the various scripts I wrote. Going over these scripts, I have personally written Python code to do the following:

  • Download lender and borrower information for thousands of donation transactions on kiva.org.
  • Download tweets from a list of 100 large nonprofit organizations.
  • Download Twitter profile information from 150 advocacy nonprofits.
  • Scrape the ‘Walls’ from 65 organizations’ Facebook accounts.
  • Download @messages sent to 38 community foundations.
  • Traverse and download HTML files for thousands of webpages on large accounting firms’ websites.
  • Scrape data from thousands of organizational profiles on a charity rating site.
  • Scrape data from several thousand organizations raising money on the crowdfunding site Indiegogo.
  • Download hundreds of YouTube videos used in Indiegogo fundraising campaigns.
  • Gather data available through the InfoChimps API.
  • Scrape pinning and re-pinning data from health care organizations’ Pinterest accounts.
  • Tap into the Facebook Graph API to download status updates and number of likes, comments and shares for 100 charities.

This is just a sample. The point is that you can use a programming language like Python to get just about any data from the Web. When the website or social media platform makes available an API (application programming interface), accessing the data is easy. Twitter is fantastic for this very reason. In other cases, including most websites, you will have to scrape the data through creative use of programming. Either way, you can gain access to valuable data.
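To give a flavor of what scraping involves, here is a minimal sketch that pulls every link and its visible text out of a page using only Python's standard library. It is an illustration, not one of my actual scripts: the `LinkCollector` class, the `extract_links` function, and the sample HTML are all hypothetical, and a real script would first download the page rather than read it from a string.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href and visible text of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []             # list of (href, text) tuples
        self._current_href = None   # href of the <a> tag we are inside, if any
        self._current_text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <h1>Charity profiles</h1>
  <ul>
    <li><a href="/org/1">Helping Hands</a></li>
    <li><a href="/org/2">Food for All</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return a list of (href, link text) pairs found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

Looping a function like this over a list of page addresses is the basic pattern behind most of the scraping tasks listed above.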

There’s no need to be an expert to obtain real-world benefits from programming. I started learning Python four years ago (I now consider myself an intermediate-level programmer) and gained substantive benefits right from the start.

Data Manipulation

Budding researchers often seem to underestimate how much time they will spend manipulating, reshaping, and processing their data. Python excels at data munging. I have recently used Python code to:

  • Loop over hundreds of thousands of tweets and modify characters, convert date formats, etc.
  • Identify and delete duplicate entries in an SQL database.
  • Loop over 74 nonprofit organizations’ Twitter friend-follower lists to create a 74 x 74 friendship network.
  • Read in and write text and CSV data.
  • Run countless grouping, merging, and aggregation functions.
  • Automatically count the number of “negative” words in thousands of online donation appeals.
  • Loop over hundreds of thousands of tweets to create an edge list for a retweet network.
  • Compute word counts for a word-document matrix from thousands of crowdfunding appeals.
  • Create text files combining all of an organization’s tweets for use in creating word clouds.
  • Download images included in a set of tweets.
  • Merge text files.
  • Count number of Facebook statuses per organization.
  • Loop over hundreds of thousands of rows of tweets in an SQLite database and create additional variables for future analysis.
  • Deal with missing data.
  • Create dummy variables.
  • Find the oldest entry for each organization in a Twitter database.
  • Use pandas (Python Data Analysis Library) to aggregate Twitter data to the daily, weekly, and monthly level.
  • Create a text file of all hashtags in a Twitter database.
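Two of the tasks above, building a retweet edge list and collecting hashtags, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the tweets are hypothetical (author, text) pairs held in memory rather than rows in a database, a retweet is assumed to start with "RT @name", and the function names are my own for this example.

```python
import re
from collections import Counter

# Hypothetical records: (author screen name, tweet text).
TWEETS = [
    ("org_a", "RT @org_b: Join us Saturday! #volunteer #nonprofit"),
    ("org_c", "RT @org_b: Join us Saturday! #volunteer #nonprofit"),
    ("org_b", "Thanks to everyone who came out #Volunteer"),
    ("org_a", "RT @org_c: New report released today"),
]

# Assumption: old-style retweets that begin with "RT @screen_name".
RETWEET_PATTERN = re.compile(r"^RT @(\w+)")
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def retweet_edges(tweets):
    """Return a Counter mapping (retweeter, original author) to a count."""
    edges = Counter()
    for author, text in tweets:
        match = RETWEET_PATTERN.match(text)
        if match:
            edges[(author, match.group(1).lower())] += 1
    return edges


def hashtag_counts(tweets):
    """Return a Counter of lowercased hashtags across all tweets."""
    counts = Counter()
    for _, text in tweets:
        for tag in HASHTAG_PATTERN.findall(text):
            counts[tag.lower()] += 1
    return counts


if __name__ == "__main__":
    for (source, target), weight in retweet_edges(TWEETS).items():
        print(source, target, weight)
    print(hashtag_counts(TWEETS).most_common())
```

The edge list can then be written to a CSV file and read into a network analysis package, and the hashtag counts can be written to a text file in the same way.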

Data Visualization and Analysis

With the proliferation of scientific computing modules such as pandas, statsmodels, and scikit-learn, Python’s data analysis capabilities have gotten much more powerful over the past few years. With such tools Python can now compete in many areas with dedicated statistical programs such as R or Stata, which I have traditionally used for most of my data analysis and visualization. Lately I’m doing more and more of this work directly in Python. Here are some of the analyses I have run recently using Python:

  • Implement a naive Bayesian classifier to classify the sentiment in hundreds of thousands of tweets.
  • Run linguistic analyses of donation appeals and tweets using Python’s Natural Language Toolkit (NLTK).
  • Create plots of number of tweets, retweets, and public reply messages per day, week, and month.
  • Run descriptive statistics and multiple regressions.
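For readers curious about what a naive Bayes sentiment classifier looks like, here is a toy sketch written from scratch with the standard library. It is not the code I used on my tweet data: the four training sentences are made up, the `train` and `classify` functions are hypothetical names for this example, and a real project would more likely rely on a library such as NLTK or scikit-learn and a far larger training set.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(labeled_docs):
    """Return (class document counts, per-class word counts, vocabulary)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Start from the log prior: the share of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing so an unseen word-class pair is not zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training examples for illustration only.
TRAINING = [
    ("love this great cause", "positive"),
    ("great work thank you", "positive"),
    ("terrible service very disappointed", "negative"),
    ("awful experience terrible response", "negative"),
]

model = train(TRAINING)

if __name__ == "__main__":
    print(classify("great cause", model))
    print(classify("terrible awful", model))
```

The classifier multiplies per-word probabilities within each class (here summed as logarithms to avoid numerical underflow) and picks the class with the highest score, which is why it scales easily to hundreds of thousands of short texts.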

Summary

Learning a programming language is a challenge. Of that there is little doubt. Yet the payoff in improved productivity alone can be substantial. Add to that the powerful analytical and data visualization capabilities that open up to the researcher who is skilled in a programming language. Lastly, leaving aside the buzzword “Big Data,” programming opens up a world of new data found on websites, social media platforms, and online data repositories. I would thus go so far as to say that any researcher interested in social media is doing themselves a great disservice by not learning some programming. For this very reason, one of my goals on this site is to provide guidance to those who are interested in getting up and running on Python for conducting academic and social media research.

Contact Information

Gregory D. Saxton
Schulich School of Business
York University
Toronto, ON
gsaxton@yorku.ca
