OPER 682: Operational Data Science with R

Welcome to Operational Data Science with R! This course provides an intensive, hands-on introduction to data science with the R programming language. You will learn the fundamental skills required to acquire, munge, transform, manipulate, visualize, and model data in a computing environment that fosters reproducibility.

Course Overview

Data science is the study of the generalizable extraction of knowledge from data. Being a data scientist requires an integrated skill set spanning operations research, statistics, and computer science, along with a good understanding of how to formulate problems so that they can be solved effectively. This course will introduce students to this rapidly growing field and equip them with some of its basic principles and tools as well as its general mindset. Students will learn the concepts, techniques, and tools they need to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, descriptive and predictive modeling, data product creation, evaluation, and effective communication. The treatment of these topics will focus on breadth rather than depth, with emphasis on integrating and synthesizing concepts and applying them to solve problems. To make the learning contextual, the course will use real datasets from a variety of Air Force domains.

Class Information

Course Objectives

Upon successfully completing this course, you will be able to:

  • Perform your data analysis in a literate programming environment
  • Import and manage structured and unstructured data
  • Manipulate, transform, and summarize your data
  • Join disparate data sources
  • Methodically explore and visualize your data
  • Use iteration to apply operations across your data
  • Write your own functions
  • Understand how a broad range of modeling techniques are implemented
  • Execute an analytic technique thoroughly

…all with R!

Class Structure

This course uses a “flipped” classroom style: you learn the material outside of class and then spend most of the in-class time reviewing and executing code.

First Half of Quarter

Each week I plan to have you read through selected tutorials on specific analytic activities in R. I will assign problems or activities that you will need to complete before each session. In each class I’ll spend about 15-30 minutes reviewing the analytic activity and answering any burning questions. You will then break into defined small groups and review each other’s code and approaches to solving the assigned problems. Finally, for the last 30-45 minutes of class, you and your small group will work together to complete another task.

This course structure serves several purposes:

  1. It will teach you to learn R programming techniques on your own from tutorials
  2. The out-of-class assignments will ensure that you come to each class prepared, and they will also build the skills you need for your final project
  3. The in-class peer review will help you get feedback on your code and also teach you to review other people’s code
  4. The in-class small group work will teach you to work on a coding task collaboratively and within a time limit

Second Half of Quarter

The second half of the quarter will be student-driven. Students will select an analytic technique that is relevant to their thesis, learn how to execute it, and learn how to interpret and validate its results. Each student will then create a tutorial built around a toy problem and present it to the class. Students will also apply this technique to their thesis data for their final project. By the end of this class you should have initial results for one of your thesis objectives!

Material

All required course material will be provided in class or online. Recommended but optional material will also be provided in the classroom notes.

Schedule

tentative

Week 1: Introduction & Reproducibility
  • Jun 26: Intro to data science, R, and course outline
  • Jun 28: Managing workflow & reproducibility
Week 2: First Date Guidelines for Data
  • Jul 3: Importing & exporting data (No Class)
  • Jul 5: Getting to know your data
Week 3: Exploratory Data Analysis
  • Jul 10: Advancing your visualizations with ggplot2
  • Jul 12: dplyr for data transformation
Week 4: Controlling Your Data
  • Jul 17: Data structures & tidiness
  • Jul 19: Relational data
Week 5: Dealing with Different Types of Data
  • Jul 24: Text mining
  • Jul 26: Factors & dates
Week 6: Creating Efficient Code
  • Jul 31: Writing functions
  • Aug 2: Iteration
Week 7: Bonus Week
  • Aug 7: Analytic development
  • Aug 9: Analytic development
Week 8: Student-led Analytic Learning
  • Aug 14: Analytic technique
  • Aug 16: Analytic technique
Week 9: Student-led Analytic Learning
  • Aug 21: Analytic technique
  • Aug 23: Analytic technique
Week 10: Student-led Analytic Learning
  • Aug 28: Analytic technique
  • Aug 30: Analytic technique
Week 11: Finals Week
  • Sep 6: No class (final project due)

Grading Policies

Course grades will consist of:

Final grades will be distributed according to the following cutoffs:

  • A       94 – 100%
  • A-      90 – 93%
  • B+      87 – 89%
  • B       83 – 86%
  • B-      80 – 82%
  • C+      77 – 79%
  • C       73 – 76%
  • C-      70 – 72%
  • D & F   Hopefully None!

Software

We will use the following software during the course. Plan on bringing a computer to each class meeting.

  • R and RStudio will be used to perform all programming activities, assignments, and the final project. You can find details on how to download these here.
  • Slack will replace e-mail and Blackboard for our course. You will receive an invitation to the AFIT DSL Slack team. You may wish to install one of the Slack apps.

Policies

  1. Attendance: Attendance at all class sessions and exams is mandatory for military members and civilians assigned to AFIT as full-time students, except in extenuating circumstances. Scheduled classes and exams are defined by the instructor and are documented in the course schedule. Part-time students are expected to attend scheduled classes, and absences should be explained to the instructor. Students should provide advance notice when possible. (References: Student Handbook, Graduate School Catalog)
  2. Academic Integrity: All students must adhere to the highest standards of academic integrity. Students are prohibited from engaging in plagiarism, cheating, misrepresentation, or any other act constituting a lack of academic integrity. Failure on the part of any individual to practice academic integrity is not condoned and will not be tolerated. Individuals who violate this policy are subject to adverse administrative action including disenrollment from school and disciplinary action. Individuals subject to the Uniform Code of Military Justice may be prosecuted under it. Violations by government civilian employees may result in administrative disciplinary action without regard to otherwise applicable criminal or civil sanctions for violations of related laws. (References: Student Handbook, ENOI 36-107, Academic Integrity)
  3. Academic Grievance: AFIT and the Graduate School of Engineering and Management affirm the right of each student to resolve grievances with the Institution. Students are guaranteed the right of fair hearing and appeal in all matters of judgment of academic performance. Procedures are detailed in ENOI 36-138, Student Academic Performance Appeals.
  4. Testing Policy: This is a project-based course. Consequently, there will be no midterm or final exam.
  5. Late Assignments and Make-Ups: Late submissions will not be accepted.
  6. Tentative Plan: The course syllabus is a general plan for the course; the instructor may announce deviations to the class as necessary.

Acknowledgments

I have drawn ideas or readings from the following syllabi: