Welcome to Operational Data Science with R! This course provides an intensive, hands-on introduction to data science with the R programming language. You will learn the fundamental skills required to acquire, munge, transform, manipulate, visualize, and model data in a computing environment that fosters reproducibility.
Data science is the study of the generalizable extraction of knowledge from data. Being a data scientist requires an integrated skill set spanning operations research, statistics, and computer science, along with a good understanding of how to formulate a problem so that it leads to an effective solution. This course will introduce students to this rapidly growing field and equip them with some of its basic principles and tools, as well as its general mindset. Students will learn the concepts, techniques, and tools they need to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, descriptive and predictive modeling, data product creation, evaluation, and effective communication. The treatment of these topics will focus on breadth rather than depth, with emphasis on integrating and synthesizing concepts and applying them to solve problems. To make the learning contextual, real datasets from a variety of Air Force domains will be used.
Upon successfully completing this course, you will be able to:
…all with R!
This class follows a “flipped” classroom style in which you learn the material outside of class and then spend the majority of in-class time reviewing and executing code.
Each week I plan to have you read through selected tutorials on specific analytic activities in R. I will assign problems or activities that you will need to complete before each session. In each class I’ll spend about 15-30 minutes reviewing the analytic activity and answering any burning questions. You will then break into defined small groups and review each other’s code and approaches to solving the assigned problems. Finally, for the last 30-45 minutes of class, you and your small group will work together to complete another task.
The purpose of this course structure is multi-dimensional:
The second half of the quarter will be student driven. Each student will select an analytic technique that is relevant to their thesis and learn how to execute it, as well as how to interpret and validate its results. Students will create a toy-problem tutorial on their technique and use it to teach the class. Students will then apply this technique to their thesis data for their final project. By the end of this class you should have initial results for one of your thesis objectives!
All required classroom material will be provided in class or online. Recommended but optional material will also be provided in the classroom notes.
| Week | Date | Lesson Description & Material |
|---|---|---|
| 1 | | Introduction & Reproducibility |
| | Jun 26 | Intro to data science, R, and course outline |
| | Jun 28 | Managing workflow & reproducibility |
| 2 | | First Date Guidelines for Data |
| | Jul 3 | Importing & exporting data (No Class) |
| | Jul 5 | Getting to know your data |
| 3 | | Exploratory Data Analysis |
| | Jul 10 | Advancing your visualizations with |
| 4 | | Controlling Your Data |
| | Jul 17 | Data structures & tidiness |
| | Jul 19 | Relational data |
| 5 | | Dealing with Different Types of Data |
| | Jul 24 | Text mining |
| | Jul 26 | Factors & dates |
| 6 | | Creating Efficient Code |
| | Jul 31 | Writing functions |
| | Aug 7 | Analytic development |
| | Aug 9 | Analytic development |
| 8 | | Student-led Analytic Learning |
| | Aug 14 | Analytic technique |
| | Aug 16 | Analytic technique |
| 9 | | Student-led Analytic Learning |
| | Aug 21 | Analytic technique |
| | Aug 23 | Analytic technique |
| 10 | | Student-led Analytic Learning |
| | Aug 28 | Analytic technique |
| | Aug 30 | Analytic technique |
| | Sep 6 | No class; final project due |
Course grades will consist of:
40% final project
Final grades will be distributed according to the following cutoffs:
We will use this software during the course. Plan on bringing a computer to each class meeting.
I have drawn ideas or readings from the following syllabi: