Module 3

“What we have is a data glut.” - Vernor Vinge

Data are being generated by everything around us at all times. Every digital process and social media exchange produces them. Systems, sensors and mobile devices transmit them. Countless databases collect them. Data are arriving from multiple sources at an alarming rate, and analysts and organizations are seeking ways to leverage these new sources of information. Consequently, analysts need to understand how to get data from these sources. Furthermore, since analysis is often a collaborative effort, analysts also need to know how to share their data.

Welcome to Module 3! This session covers importing, exporting, and scraping data. First, you will learn the basics of importing tabular and spreadsheet data into R. You will also learn the equally important process of exporting data out of R. Then, because modern-day data wrangling often involves scraping from the flood of web-based data now available to organizations and analysts, you will learn the fundamentals of web scraping with R. This includes importing spreadsheet files stored online, scraping HTML text and data tables, and leveraging APIs.

Together, these topics will give you a strong foundation in the different ways to get your data into and out of R.


Tutorials & Resources

Read and work through the following tutorials. Due to the holiday, we do not have class. However, the skills and functions introduced in these tutorials will be built on in the Module 4 material and are also necessary to complete your project deliverable due at the end of the week, so don’t put this material off until the last minute!

Importing and exporting spreadsheet data

Scraping text & tables


Class Prep

There is no class; however, I recommend that you follow along and complete the exercises throughout the tutorials.