OPER 682: Operational Data Science with R
Welcome to Operational Data Science with R! This course provides an intensive, hands-on introduction to data science with the R programming language. You will learn the fundamental skills required to acquire, munge, transform, manipulate, visualize, and model data in a computing environment that fosters reproducibility.
Course Overview
Data Science is the study of the generalizable extraction of knowledge from data. Being a data scientist requires an integrated skill set spanning operations research, statistics, and computer science, along with a good understanding of how to formulate a problem so that it leads to an effective solution. This course will introduce students to this rapidly growing field and equip them with some of its basic principles and tools as well as its general mindset. Students will learn the concepts, techniques, and tools they need to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, descriptive and predictive modeling, data product creation, evaluation, and effective communication. These topics will be treated with a focus on breadth rather than depth, and emphasis will be placed on integrating and synthesizing concepts and applying them to solve problems. To make the learning contextual, real datasets from a variety of Air Force domains will be used.
Course Objectives
Upon successfully completing this course, you will be able to:
- Perform your data analysis in a literate programming environment
- Import and manage structured and unstructured data
- Manipulate, transform, and summarize your data
- Join disparate data sources
- Methodically explore and visualize your data
- Perform iterative functions
- Write your own functions
- Understand the implementation of a breadth of modeling techniques
- Execute an analytic technique thoroughly
…all with R!
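As a small preview of several of these objectives (joining, summarizing, writing a function, and iterating), here is a minimal base R sketch. The data frames, column names, and the `hours_range` helper are invented for illustration and are not course material; the schedule also covers dplyr for this kind of work.

```r
# Toy data invented for illustration; not a course dataset.
sorties <- data.frame(
  unit  = c("A", "A", "B", "B", "C"),
  hours = c(2.5, 3.0, 1.5, 4.0, 2.0)
)
bases <- data.frame(
  unit = c("A", "B", "C"),
  base = c("North", "South", "North")
)

# Join the two sources on their shared key column.
joined <- merge(sorties, bases, by = "unit")

# Summarize: total hours per base.
totals <- aggregate(hours ~ base, data = joined, FUN = sum)

# Write a function (hypothetical helper), then apply it to each unit.
hours_range <- function(x) max(x) - min(x)
spread <- vapply(split(joined$hours, joined$unit), hours_range, numeric(1))

print(totals)
print(spread)
```

Using `vapply()` instead of an explicit loop states the expected return type up front, so a helper that returns the wrong shape fails loudly.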
Class Structure
This class emphasizes a “flipped” class style where you learn the material outside of class and then spend the majority of in-class time reviewing and executing code.
First Half of Quarter
Each week I plan to have you read through selected tutorials on specific analytic activities in R. I will assign problems/activities that you will need to complete prior to each session. In each class I’ll spend about 15-30 minutes reviewing the analytic activity and answering any burning questions. You will then break into assigned small groups and review each other’s code and approaches to solving the assigned problems. Finally, for the last 30-45 minutes of class, you and your small group will work together to complete another task.
This course structure serves several purposes:
- It will teach you to read and learn R programming tutorials and techniques on your own
- The out-of-class assignments will force you to come to each class prepared, and they will also prepare you for your final project
- The in-class peer review will help you get feedback on your code and also teach you to review other people’s code
- The in-class small group work will teach you to work on a coding task collaboratively and within a constrained time limit
Second Half of Quarter
The second half of the quarter will be student driven. Each student will select an analytic technique that is relevant to their thesis and will learn how to execute it and how to interpret and validate its results. Each student will then build a toy-problem tutorial and teach it to the class. Furthermore, students will apply the technique to their thesis data for their final project. By the end of this class you should have initial results for one of your thesis objectives!
Material
All required classroom material will be provided in class or online. Any recommended yet optional material will also be provided in the classroom notes.
Schedule
tentative
| Week | Lesson Description & Material |
|---|---|
| 1 | Introduction & Reproducibility |
| Jun 26 | Intro to data science, R, and course outline |
| Jun 28 | Managing workflow & reproducibility |
| 2 | First Date Guidelines for Data |
| Jul 3 | Importing & exporting data (No Class) |
| Jul 5 | Getting to know your data |
| 3 | Exploratory Data Analysis |
| Jul 10 | Advancing your visualizations with ggplot2 |
| Jul 12 | dplyr for data transformation |
| 4 | Controlling Your Data |
| Jul 17 | Data structures & tidyness |
| Jul 19 | Relational data |
| 5 | Dealing with Different Types of Data |
| Jul 24 | Text mining |
| Jul 26 | Factors & dates |
| 6 | Creating Efficient Code |
| Jul 31 | Writing functions |
| Aug 2 | Iteration |
| 7 | Bonus Week |
| Aug 7 | Analytic development |
| Aug 9 | Analytic development |
| 8 | Student-led Analytic Learning |
| Aug 14 | Analytic technique |
| Aug 16 | Analytic technique |
| 9 | Student-led Analytic Learning |
| Aug 21 | Analytic technique |
| Aug 23 | Analytic technique |
| 10 | Student-led Analytic Learning |
| Aug 28 | Analytic technique |
| Aug 30 | Analytic technique |
| 11 | Finals Week |
| Sep 6 | No class - final project due |
Grading Policies
Course grades will consist of:
Final grades will be distributed according to the following cutoffs:
- A 94 – 100%
- A- 90 – 93%
- B+ 87 – 89%
- B 83 – 86%
- B- 80 – 82%
- C+ 77 – 79%
- C 73 – 76%
- C- 70 – 72%
- D & F Hopefully None!
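For anyone who wants to check where a score lands, the cutoffs above can be sketched as a small R function. This is an illustration, not official policy: the function name `letter_grade` is hypothetical, scores are assumed to be rounded to a whole percent with `round()` before lookup, and anything under 70 is lumped into a single "D or F" band because no separate cutoffs are listed for those grades.

```r
# Hypothetical helper, not part of the official grading policy.
# Assumption: scores are rounded to a whole percent before lookup.
letter_grade <- function(score) {
  stopifnot(is.numeric(score), all(score >= 0 & score <= 100, na.rm = TRUE))
  # Lower bound of each band; intervals are closed on the left, e.g. [90, 94).
  breaks <- c(-Inf, 70, 73, 77, 80, 83, 87, 90, 94, Inf)
  labels <- c("D or F", "C-", "C", "C+", "B-", "B", "B+", "A-", "A")
  as.character(cut(round(score), breaks = breaks, labels = labels, right = FALSE))
}

letter_grade(c(95.2, 89.6, 72, 69.4))
# [1] "A"      "A-"     "C-"     "D or F"
```

Setting `right = FALSE` in `cut()` makes each band include its lower bound, which matches how the cutoffs are written (90 is an A-, 94 is an A).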
Software
We will use this software during the course. Plan on bringing a computer to each class meeting.
- R and RStudio will be used to perform all programming activities, assignments, and the final project. You can find details on how to download these here.
- Slack will replace e-mail and Blackboard for our course. You will receive an invitation to the AFIT DSL Slack team. You may wish to install one of the apps.
Policies:
- Attendance: Attendance at all class sessions and exams is mandatory for military and civilians assigned to AFIT as full-time students except for extenuating circumstances. Scheduled classes and exams are defined by the instructor and they are documented in the course schedule. Part-time students are expected to attend scheduled classes, and absences should be explained to the instructor. The student should provide advance notice, if possible. (References: Student Handbook, Graduate School Catalog)
- Academic Integrity: All students must adhere to the highest standards of academic integrity. Students are prohibited from engaging in plagiarism, cheating, misrepresentation, or any other act constituting a lack of academic integrity. Failure on the part of any individual to practice academic integrity is not condoned and will not be tolerated. Individuals who violate this policy are subject to adverse administrative action including disenrollment from school and disciplinary action. Individuals subject to the Uniform Code of Military Justice may be prosecuted under it. Violations by government civilian employees may result in administrative disciplinary action without regard to otherwise applicable criminal or civil sanctions for violations of related laws. (References: Student Handbook, ENOI 36 – 107, Academic Integrity)
- Academic Grievance: AFIT and the Graduate School of Engineering and Management affirm the right of each student to resolve grievances with the Institution. Students are guaranteed the right of fair hearing and appeal in all matters of judgment of academic performance. Procedures are detailed in ENOI 36 – 138, Student Academic Performance Appeals.
- Testing Policy: This is a project-based course. Consequently, there will be no midterm or final exam.
- Late Assignments and Make-Ups: Late submissions will not be accepted.
- Tentative Plan: The course syllabus is a general plan for the course; deviations announced to the class by the instructor may be necessary.
Acknowledgments:
I have drawn ideas or readings from the following syllabi: