Last week we discussed general guidelines for first interacting with a new data set. This week we build on those activities by performing early exploratory data analysis: answering questions about your data by visualizing and transforming it. We have two objectives for this week:
Use the ggplot2 package to advance your visualization skills so you can systematically analyze your data.
Use the dplyr package to perform many common data transformation and manipulation tasks.
Combining the activities of data transformation and visualization in a methodical way is what defines exploratory data analysis (EDA). Only by systematically applying these techniques will you be able to answer and refine questions about your data. Module 5 focuses on the visualization component of EDA.
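To illustrate how transformation and visualization feed into each other, here is a small sketch. It assumes the mpg data set that ships with ggplot2; the grouping variable, the summary statistic, and the object names class_summary and p_summary are arbitrary choices made for illustration only.

```r
library(dplyr)
library(ggplot2)

# Transform: average highway mileage and row count per vehicle class
class_summary <- mpg %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy), n = n()) %>%
  arrange(desc(mean_hwy))

# Visualize: horizontal bar chart of the summarised values
p_summary <- ggplot(class_summary, aes(x = reorder(class, mean_hwy), y = mean_hwy)) +
  geom_col() +
  coord_flip() +
  labs(x = "Vehicle class", y = "Mean highway miles per gallon")

print(p_summary)
```

The summary step answers one question (which classes are most efficient on average?), and the plot usually prompts the next one, which is the iterative loop EDA relies on.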
Being able to create visualizations (graphical representations) of data is a key step in data analysis. In this module you will learn to use the ggplot2 library to visualize your data. As illustrated in the last module, R does provide built-in plotting functions; however, the ggplot2 library implements what is known as the Grammar of Graphics. This makes it particularly effective for describing how visualizations should represent data, and has turned it into the preeminent plotting library in R.
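As a minimal sketch of what the Grammar of Graphics looks like in practice, the example below describes a plot as data, aesthetic mappings, and a geometric layer. It assumes the mpg data set bundled with ggplot2; the variables chosen and the object name p are illustrative, not a required part of the module.

```r
library(ggplot2)

# Data + aesthetic mappings (which variable drives x, y, and color)
# + a geometric layer (points) + labels
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    color = "Vehicle class"
  )

print(p)
```

Swapping geom_point() for another geom, or changing a mapping inside aes(), changes the plot without rewriting the rest of the description.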
The following tutorials will provide you the knowledge and skills required to create the meaningful, elegant, and finely tuned data visualizations that I will be looking for in the remainder of your project deliverables.
Introduction to ggplot2: Read and work through Chapter 3: Data Visualization in R for Data Science to get an introduction to the ggplot2 package.
Advancing your visualizations: In your final project I will be looking for publication-worthy visualizations. Thus, I fully expect your visualizations to improve with each deliverable submitted. Therefore, it is essential that you learn how to use some of the more advanced features of ggplot2 and other packages that work with ggplot2. Here are some resources to help you take your visualizations to the next level:
ggplot2 to answer these questions in class.