The bulk of my research involves some degree of ‘Big Data’, such as datasets with a million or more tweets. Preparing these data for analysis can involve a great deal of data manipulation: anything from aggregating data to the daily or organizational level, to merging in additional variables, to generating the data required for social network analysis. For all such steps I now almost exclusively use Python’s pandas library (the ‘Python Data Analysis Library’). Used together with the Jupyter Notebook interactive computing framework and packages such as NetworkX, it gives you a powerful set of analysis tools. This page contains links to the tutorials I have created to help you learn data analytics in Python. I also have a page with shorter (typically one-liner) data analytic code bytes.
- Analyzing Big Data with Python PANDAS (Overview)
- Set up IPython, Import Twitter Data and Select Cases (coming soon)
- Aggregating and Analyzing Data by Twitter Account (coming soon)
- Analyzing Twitter Data by Time Period (coming soon)
- Analyzing Hashtags (coming soon)
- Generating New Variables (coming soon)
- Producing a Summary Statistics Table for Publication (coming soon)
- Analyzing Audience Reaction on Twitter (coming soon)
- Running, Interpreting, and Outputting Logistic Regression (coming soon)
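To give a flavor of the manipulation steps mentioned at the top of this page (aggregating to the account or daily level and merging in additional variables), here is a minimal sketch in pandas. It is not taken from the tutorials themselves: the tiny `tweets` and `org_info` tables, and column names such as `account`, `created_at`, `retweets`, and `sector`, are made up purely for illustration.

```python
import pandas as pd

# Hypothetical tweet-level data; the column names are illustrative assumptions.
tweets = pd.DataFrame({
    "account": ["org_a", "org_a", "org_b", "org_b", "org_b"],
    "created_at": pd.to_datetime([
        "2014-01-01 09:00", "2014-01-01 17:30", "2014-01-01 12:00",
        "2014-01-02 08:15", "2014-01-02 20:45",
    ]),
    "retweets": [3, 0, 5, 1, 2],
})

# Aggregate to the organizational (account) level:
# one row per account with a tweet count and a retweet total.
by_account = (
    tweets.groupby("account")
    .agg(n_tweets=("retweets", "size"), total_retweets=("retweets", "sum"))
    .reset_index()
)

# Aggregate to the daily level: total retweets per calendar day.
by_day = tweets.set_index("created_at").resample("D")["retweets"].sum()

# Hypothetical account-level variables to merge in.
org_info = pd.DataFrame({
    "account": ["org_a", "org_b"],
    "sector": ["health", "education"],
})

# Left merge keeps every aggregated account and adds the new variable.
merged = by_account.merge(org_info, on="account", how="left")

print(by_account)
print(by_day)
print(merged)
```

The same pattern should carry over to much larger datasets, since `groupby`, `resample`, and `merge` operate on whole columns rather than row by row.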
I hope you have found this helpful. If so, please spread the word, and happy coding!