Module 4

In Module 3 you learned to import data. However, data by themselves are not very useful, so we need to start doing some basic care and feeding of the data we’ve imported. In this module we look at good practices for getting to know a new data set. Spending a little time up front to understand your data will speed up your analysis later on. For this session we focus on three objectives to pursue when first opening a new data set:

  1. Review the codebook
  2. Learn about the data
  3. Visualize the data

Tutorials & Resources

Please work through the following tutorials prior to class. The skills and functions introduced in these tutorials will be necessary to complete your project deliverable #2.

1. Review the codebook: Understanding the source data is crucial to any analysis. A codebook is the documentation that explicitly tells you about the data you are working with and should be the first thing you review before starting any kind of analysis. Read Review the Codebook to get a taste of what to look for.

2. Learn about the data: When first opening a data set, it is important to get a basic understanding of the data dimensions (rows and columns), what the data look like, how many values are missing, and some basic summary statistics such as the mean, median, and range of each variable. Read and work through Learn About the Data to understand some of the first things you should do with a fresh data set.
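
As a rough sketch of these first checks, the base R code below looks at the dimensions, structure, missing values, and summary statistics of a data frame. It uses R's built-in airquality data set as a stand-in for your own imported data, and the object names df and n_missing are only illustrative. The tutorial may use different functions or packages.

```r
# Built-in example data; replace with your own imported data frame
df <- airquality

# Dimensions: number of rows and columns
dim(df)

# Structure: variable names, types, and the first few values of each
str(df)

# First six rows, to see what the data look like
head(df)

# Number of missing values in each column
n_missing <- colSums(is.na(df))
n_missing

# Minimum, quartiles, median, mean, maximum, and NA count for each variable
summary(df)

# Range of a single variable, ignoring missing values
range(df$Ozone, na.rm = TRUE)
```

Checking missing values early matters because many R functions, such as mean(), return NA unless told to drop missing values with na.rm = TRUE.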

3. Quick visualizations: It is also good to get an initial understanding of your data through visual means. Module 5 will take a deeper dive into creating more sophisticated visualizations; however, it is important to know how to do some basic plotting for quick data exploration. Read and work through Getting Started with Charts in R.
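
For a sense of what quick exploratory plotting can look like, here is a minimal sketch using base R graphics only. It again assumes the built-in airquality data set as a placeholder for your own data. The tutorial may cover other chart types or plotting packages.

```r
# Built-in example data; replace with your own imported data frame
df <- airquality

# Histogram: distribution of a single numeric variable
# (missing values are dropped automatically)
h <- hist(df$Ozone, main = "Distribution of Ozone", xlab = "Ozone (ppb)")

# Box plot: compare a numeric variable across groups
boxplot(Ozone ~ Month, data = df,
        main = "Ozone by Month", xlab = "Month", ylab = "Ozone (ppb)")

# Scatter plot: relationship between two numeric variables
plot(df$Temp, df$Ozone,
     main = "Ozone vs. Temperature",
     xlab = "Temperature (degrees F)", ylab = "Ozone (ppb)")
```

These plots are meant to be fast rather than polished: a histogram can reveal skew or outliers, and a scatter plot can hint at relationships worth a closer look later.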


Class Prep

  1. Work through the exercises in the module 3 & 4 tutorials. Bring your answers (and code) to class.
  2. Bring your thesis/dissertation data to class along with the codebook (if available). If you do not have your thesis data, identify an interesting data set you want to analyze for your final project and bring it to class. Be ready to import and get to know your data in class!

You can download class material here: