A codebook is a technical description of the data that were collected for a particular purpose. It describes how the data are arranged in the computer file or files, what the various numbers and letters mean, and any special instructions on how to use the data properly. Like any other kind of “book,” some codebooks are better than others. The best codebooks have:
R comes with many built-in data sets. To see the 100+ data sets that come with R, just type data() in your console and you’ll see a list of their names, each with a short description.
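The same list can also be captured as an object rather than just viewed. The following is a minimal sketch, assuming you only want the data sets that ship in the base datasets package; the variable names available and catalog are illustrative and not part of the original lesson.

```r
# data() returns an object whose $results matrix describes each data set
# (columns include "Item", the data set name, and "Title", its description)
available <- data(package = "datasets")$results

# Keep only the name and the one-line description of each data set
catalog <- as.data.frame(available[, c("Item", "Title")],
                         stringsAsFactors = FALSE)

nrow(catalog)   # how many data sets the datasets package lists
head(catalog)   # the first few names and descriptions
```

Having the list in a data frame makes it easy to search, for example with grepl(), when you are looking for a data set on a particular topic.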
For any of these built-in data sets, you will find the “codebook,” the technical description of the data, by typing ? and then the name of the data set. This will bring up the “codebook” in your Help console. For instance, ?mtcars will provide you with the technical information regarding the mtcars built-in data set.
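As a short sketch of how this might look in practice, the lines below open the mtcars codebook and then inspect the data themselves, so you can compare what the documentation says with what is actually in the data frame. The choice of str(), dim(), and head() as the checks is an assumption for illustration; any inspection functions you prefer would do.

```r
# Open the codebook (help page) for mtcars; equivalent to typing ?mtcars
help("mtcars")

# Compare the codebook's description with the data themselves
str(mtcars)    # variable names and their types
dim(mtcars)    # number of rows (cars) and columns (variables)
head(mtcars)   # the first few observations
```

If the variable names or the number of observations do not match what the codebook describes, that is a sign you are reading the wrong documentation or a different version of the data.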
Getting the codebook for data that you are importing and using for your own analysis is a little more difficult. If you are using organizational data at your employer, this will likely require you to request the codebook from the database engineers or other people who are intimately familiar with the data source. This seemingly simple task may surprise you by showing how few people truly understand the technical details underlying organizational data. If you are using publicly available online data, you may need to do some searching to find the codebook. Sometimes codebooks are obviously and explicitly linked on the website; other times you have to do some digging to find them. Some examples of codebooks follow:
The important thing to remember is that you need to identify the documentation that explicitly tells you about the data you are working with. If you cannot find it, then in your analysis you need to state what you take the data to mean, and you should also state that ambiguity may exist because a codebook could not be identified. With your final project, I expect you to describe the source data you analyze and provide a citation (or URL link if possible) to the source database and codebook.
This site contains files of daily average temperatures for 157 U.S. cities. Identify the following: