Module 6

Module 6 builds on our exploratory data analysis capabilities by focusing on the transformation component of EDA. Common transformation procedures include filtering observations by their values, reordering rows, selecting variables, creating new variables as functions of existing variables, and collapsing many values down to a single summary statistic. Furthermore, we often group our data at different aggregation levels when performing these transformation tasks. This module covers these basic capabilities.
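As a rough preview of what these procedures look like in practice, here is a minimal sketch using dplyr. It assumes the dplyr package is installed and uses R's built-in mtcars data set as a stand-in for your own data; the object names (cars, efficient, by_cyl) are made up for illustration.

```r
library(dplyr)

# mtcars ships with R; it is only a stand-in for your own data.
# Move the row names (car models) into a regular column.
cars <- as_tibble(mtcars, rownames = "model")

efficient <- cars %>%
  filter(cyl %in% c(4, 6)) %>%     # filter observations by their values
  arrange(desc(mpg)) %>%           # reorder rows, highest mpg first
  select(model, mpg, cyl, wt) %>%  # select variables
  mutate(wt_lbs = wt * 1000)       # new variable (wt is recorded in 1000 lbs)

# Group at a different aggregation level, then collapse each group
# down to summary statistics.
by_cyl <- cars %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg))
```

Each verb takes a data frame and returns a data frame, which is why the steps chain together with the pipe.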


Tutorials & Resources

Please work through the following tutorials prior to class. The skills and functions introduced in these tutorials will be necessary to complete your project deliverable #3.

  1. Transform your data: Although many fundamental data manipulation functions exist in base R, they have historically been somewhat convoluted, lacking consistent syntax and the ability to flow together easily. dplyr is a package built for the sole purpose of simplifying the process of manipulating, sorting, summarizing, and joining data frames. Read and work through Chapter 5: Data Transformation in R for Data Science.

  2. Exploratory data analysis: Combining visualization and data transformation allows you to efficiently explore your data. Read and work through Chapter 7: Exploratory Data Analysis in R for Data Science.
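To give a sense of how transformation and visualization combine, the sketch below summarises a data set with dplyr and then plots the summary with ggplot2. It assumes both packages are installed and uses the mpg data set that ships with ggplot2 as a placeholder for your own data; mpg_by_class and p are illustrative names only.

```r
library(dplyr)
library(ggplot2)

# Transformation: one row per vehicle class, with a count and a
# median highway mileage.
mpg_by_class <- mpg %>%
  group_by(class) %>%
  summarise(n = n(), median_hwy = median(hwy)) %>%
  arrange(desc(median_hwy))

# Visualization: plot the summarised data, ordering the bars by
# the statistic computed above.
p <- ggplot(mpg_by_class, aes(x = reorder(class, median_hwy), y = median_hwy)) +
  geom_col() +
  coord_flip() +
  labs(x = "Vehicle class", y = "Median highway mpg")

p  # display the plot
```

A typical exploratory sequence repeats this loop: ask a question, transform the data to the level the question needs, plot the result, and refine the question.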


Class Prep

  1. Work through the exercises in Chapters 5 & 7 of R for Data Science.
  2. Building on the last module, expand your previous list of questions or identify at least 10 new questions you want to ask of your thesis data. What exploratory data analysis sequences do you need to implement to answer these questions? Be ready to use dplyr and ggplot2 to answer them in class.

You can download class material here: