LOGM 655: Text Mining

Text mining is the organization, classification, labeling, and extraction of information from text sources. With ever more information readily available through the internet, analysts and decision makers find themselves overloaded with data. Text mining can help analysts glean the information they need, whether to gain a general understanding of a corpus of text documents or to put text into a form suitable for other analysis techniques.

This course will introduce students to this rapidly growing field and equip them with some of its basic principles and tools as well as its general mindset. Students will learn the concepts, techniques, and tools they need to practice text mining in a Joint military context.

Class Information

Course Objectives

The primary objective of this course is to help the student understand the basic techniques and processes for text mining and how to use them to make better decisions. More specifically, at the completion of the course, each student should be able to:

  • Understand when, where, and how to apply text mining to appropriate problems and data sets.
  • Write and understand code that uses relevant R packages for text mining.
  • Lead and work within a group of researchers analyzing a specific text mining problem.

Course Text

  1. Text Mining with R: A Tidy Approach, J. Silge and D. Robinson, free web version: http://tidytextmining.com/
  2. AFIT Data Science Lab R Programming Guide: https://afit-r.github.io/descriptive#text-mining

Class Structure

This course blends textbook reading with online lectures and demonstrations that emphasize discussion and illustration of methods, as well as hands-on, practical applications that provide both a sound base of learning and an opportunity to test and develop skill. The course follows a flipped-classroom model: students learn the material outside the classroom through the textbook and online material, and most in-class time is reserved for review, clarification, and hands-on projects and coding. Students should therefore bring a laptop to class and be prepared to apply the tools and skills they are learning. Students should expect to spend approximately 2 hours on coursework outside the classroom for every 1 hour in the classroom.

Performance Evaluation

Your final course grade will be determined according to the following requirements and their respective weights.

Final grades will be distributed according to the following cutoffs:

  • A     94 – 100%
  • A-    90 – 93%
  • B+    87 – 89%
  • B      83 – 86%
  • B-    80 – 82%
  • C+    77 – 79%
  • C      73 – 76%
  • C-    70 – 72%
  • D & F   Hopefully None!

Schedule

tentative

Week Dates Lesson Description Learning Material Deliverables
1 Oct 2-6 Regular expressions  
2 Oct 9-13 Parsing & tidying text  
3 Oct 16-20 Methods for word relationship analysis Student-led
4 Oct 23-27 Methods for topic analysis Student-led
5 Oct 30-Nov 3 Methods for text classification   Student-led
6 Nov 6-10 Methods for text clustering   Student-led
7 Nov 13-17 Methods for sentiment analysis   Student-led
8 Nov 20-24 TBD    
9 Nov 27-Dec 1 Student presentations   Presentation
10 Dec 4-8 Student presentations   Presentation

Software

We will use this software during the course. Plan on bringing a computer to each class meeting.

  • R and RStudio will be used to perform all programming activities, assignments, and the final project. You can find details on how to download these here.
  • Slack will replace e-mail and Blackboard for our course. You will receive an invitation to the AFIT DSL Slack team. You may wish to install one of the Slack apps.

Policies

  1. Attendance: Attendance at all class sessions and exams is mandatory for military and civilians assigned to AFIT as full-time students except for extenuating circumstances. Scheduled classes and exams are defined by the instructor and they are documented in the course schedule. Part-time students are expected to attend scheduled classes, and absences should be explained to the instructor. The student should provide advance notice, if possible. (References: Student Handbook, Graduate School Catalog)
  2. Academic Integrity: All students must adhere to the highest standards of academic integrity. Students are prohibited from engaging in plagiarism, cheating, misrepresentation, or any other act constituting a lack of academic integrity. Failure on the part of any individual to practice academic integrity is not condoned and will not be tolerated. Individuals who violate this policy are subject to adverse administrative action including disenrollment from school and disciplinary action. Individuals subject to the Uniform Code of Military Justice may be prosecuted under it. Violations by government civilian employees may result in administrative disciplinary action without regard to otherwise applicable criminal or civil sanctions for violations of related laws. (References: Student Handbook, ENOI 36 – 107, Academic Integrity)
  3. Academic Grievance: AFIT and the Graduate School of Engineering and Management affirm the right of each student to resolve grievances with the Institution. Students are guaranteed the right of fair hearing and appeal in all matters of judgment of academic performance. Procedures are detailed in ENOI 36 – 138, Student Academic Performance Appeals.
  4. Testing Policy: This is a project-based course. Consequently there will be no midterm or final exam.
  5. Late Assignments and Make-Ups: Late submissions will not be accepted.
  6. Tentative Plan: The course syllabus is a general plan for the course; deviations announced to the class by the instructor may be necessary.