Module 6 builds on our exploratory data analysis (EDA) capabilities by focusing on the transformation component of EDA. Common transformation procedures include filtering observations by their values, reordering rows, selecting variables, creating new variables as functions of existing variables, and collapsing many values down to a single summary statistic. Furthermore, we often group our data at different aggregation levels when performing these transformation tasks. This module covers these basic capabilities.
Please work through the following tutorials prior to class. The skills and functions introduced in these tutorials will be necessary to complete your project deliverable #3.
Transform your data: Although many fundamental data manipulation functions exist in R, they have been a bit convoluted to date and have lacked consistent coding and the ability to easily flow together. dplyr is a package built for the sole purpose of simplifying the process of manipulating, sorting, summarizing, and joining data frames. Read and work through Chapter 5: Data Transformation in R for Data Science.
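As a preview of what the chapter covers, here is a minimal sketch of the main dplyr verbs chained together. It assumes the dplyr package is installed and uses the built-in mtcars data set purely as a stand-in; the object names (powerful_cars, by_cyl) and the derived column wt_lb are illustrative and do not come from the tutorial.

```r
library(dplyr)

# mtcars ships with base R; it is used here only as example data.
powerful_cars <- mtcars %>%
  filter(hp > 100) %>%            # keep observations by their values
  select(mpg, cyl, hp, wt) %>%    # pick variables by name
  mutate(wt_lb = wt * 1000) %>%   # new variable (wt is recorded in 1000 lbs)
  arrange(desc(mpg))              # reorder rows, highest mpg first

# Group the data, then collapse each group to summary statistics.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_cars   = n(),
    mean_mpg = mean(mpg),
    .groups  = "drop"
  )

print(head(powerful_cars))
print(by_cyl)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps flow together through the pipe.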
Exploratory data analysis: Combining visualization and data transformation allows you to efficiently explore your data. Read and work through Chapter 7: Exploratory Data Analysis in R for Data Science.
We will use dplyr and ggplot2 to answer these questions in class.
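To illustrate how transformation and visualization combine during exploration, the sketch below summarises a data set with dplyr and then plots the result with ggplot2. It assumes both packages are installed and uses the mpg data set bundled with ggplot2 as an example; the question it asks (which vehicle class has the best average highway mileage?) and the object names are illustrative only, not taken from the chapter.

```r
library(dplyr)
library(ggplot2)

# Transform: average highway mileage for each vehicle class.
hwy_by_class <- mpg %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy), .groups = "drop") %>%
  arrange(desc(mean_hwy))

# Visualize: bar chart of the summary, ordered by the summarised value.
p <- ggplot(hwy_by_class, aes(x = reorder(class, mean_hwy), y = mean_hwy)) +
  geom_col() +
  coord_flip() +
  labs(x = "Vehicle class", y = "Mean highway miles per gallon")

print(hwy_by_class)
print(p)
```

Summarising first keeps the plot focused on one value per group; plotting the raw rows instead (for example with geom_boxplot()) is an equally reasonable choice when the spread within each group matters.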