Python Data Analytics Tutorials: PANDAS, Machine Learning, and Big Data

The bulk of my research involves some degree of ‘Big Data’, such as datasets with a million or more tweets. Preparing these data for analysis can involve a great deal of data manipulation: aggregating data to the daily or organizational level, merging in additional variables, and generating the data required for social network analysis. For all such steps I now almost exclusively use Python’s PANDAS library (‘Python Data Analysis Library’). Used in conjunction with the Jupyter Notebook interactive computing framework and packages such as NetworkX, it gives you a powerful set of analysis tools. This page contains links to the tutorials I have created to help you learn data analytics in Python. I also have a page with shorter (typically one-liner) data analytic code bytes.
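To give a flavor of the kind of manipulation described above, here is a minimal sketch of aggregating tweet-level data to the account-day level and then merging in an organization-level variable. The data and column names (`screen_name`, `created_at`, `retweet_count`, `sector`) are made up for illustration and are not taken from the tutorials; treat this as one possible approach rather than the exact code used there.

```python
import pandas as pd

# Hypothetical tweet-level data; all column names and values are illustrative.
tweets = pd.DataFrame({
    "screen_name": ["org_a", "org_a", "org_b", "org_b", "org_b"],
    "created_at": pd.to_datetime([
        "2014-01-01 09:00", "2014-01-01 17:30", "2014-01-01 12:00",
        "2014-01-02 08:15", "2014-01-02 20:45",
    ]),
    "retweet_count": [3, 0, 5, 1, 2],
})

# Hypothetical organization-level variable to merge in.
orgs = pd.DataFrame({
    "screen_name": ["org_a", "org_b"],
    "sector": ["health", "education"],
})

# Aggregate from the tweet level to the account-day level.
tweets["date"] = tweets["created_at"].dt.date
daily = (
    tweets.groupby(["screen_name", "date"], as_index=False)
          .agg(n_tweets=("retweet_count", "size"),
               total_retweets=("retweet_count", "sum"))
)

# Merge the organization-level variable onto the aggregated data.
daily = daily.merge(orgs, on="screen_name", how="left")

print(daily)
```

The result has one row per account per day, with the number of tweets, the total retweets, and the merged-in `sector` column. A left merge is used here so that every account-day row is kept even if an account has no match in the organization table.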

Data Collection

Data Analysis

Analyzing Big Data with Python PANDAS (Overview)
Set up Jupyter, Import Twitter Data and Select Cases
Aggregating and Analyzing Data by Twitter Account (coming soon)
Analyzing Twitter Data by Time Period (coming soon)
Analyzing Hashtags (coming soon)

Generating New Variables (coming soon)
Producing a Summary Statistics Table for Publication (coming soon)
Analyzing Audience Reaction on Twitter (coming soon)
Running, Interpreting, and Outputting Logistic Regression (coming soon)

I hope you have found this helpful. If so, please spread the word, and happy coding!

