Crime by the Numbers
This book introduces the programming language R and is meant for undergrads or graduate students studying criminology. R is a programming language that is well-suited to the type of work frequently done in criminology - taking messy data and turning it into useful information. While R is a useful tool for many fields of study, this book focuses on the skills criminologists should know and uses crime data for the example data sets.
For this book you should have the latest version of R installed and be running it through RStudio Desktop (The free version) . We’ll get into what R and RStudio are soon but please have them installed to be able to follow along with each chapter. I highly recommend following along with the code for each lesson and then try to use the lessons learned on a data set you are interested in. To download the data used in this book please see here.
Why learn to program?
With the exception of some more advanced techniques like scraping data from websites or from PDFs, nearly everything we do here can be done through Excel, a software you’re probably more familiar with. The basic steps for research projects are generally:
- Open up a data set - which frequently comes as an Excel file!
- Change some values - misspellings or too specific categories for our purposes are very common in crime data
- Delete some values - such as states you won’t be studying
- Make some graphs
- Calculate some values - such as number of crimes per year
- Sometimes do a statistical analysis depending on the type of project
- Write up what you find
R can do all of this but why should you want (or have) to learn a entirely new skill just to do something you can already do? R is useful for two main reasons: scale and reproducibility.
If you do a one-off project in your career such as downloading some data and making a graph out of it, it makes sense to stick with software like Excel. The cost (in time and effort) of learning R is certainly not worth it for a single (or even several) project - even one perfectly suited for using R. R (and many programming languages more generally, such as Python) has its strength in doing something fairly simple many many times. For example, it may be quicker to download one file yourself than it is to write the code in R to download that file. But when it comes to downloading hundreds of files, writing the R code becomes very quickly the better option than doing it by hand.
For most tasks you do in criminology when dealing with data you will end up them doing many times (including doing the same task in future projects). So R offers the trade-off of spending time upfront by learning the code with the benefit of that code being able to do work at a large scale with little extra work from you. Please keep in mind this trade-off - you need to front-load the costs of learning R for the rewards of making your life easier when dealing with data - when feeling discouraged about the small returns you get early in learning R.
The second major benefit of using R over something like Excel is that R is reproducible. Every action you take is written down. This is useful when collaborating with others (including your future self) as they can look at your code and follow along what you did without you having to show them every click you made as you frequently would on Excel. Your collaborator can look at your code to help you figure out a bug in the code or to add their own code to yours.
In the research context specifically you want to have code to give to people to ensure that your research was done correctly and there aren’t bugs in the code. Additionally, if you build a tool to, for example, interpret raw crime data from an agency and turn it into a map, being able to share the code so others can modify it for their own city saves these people a lot of time and effort.
What you will learn
For many of the lessons we will be working through real research questions and working from start to finish as you would on your own project. This involves thinking about what you want to accomplish from the data you have and what steps you need to take to reach that goal. This involves more than just knowing what code to write - it includes figuring out what your data has, whether it can answer the question you’re asking, and planning out (without writing any code yet) what you need to do when you start coding.
There is a large range of skills in criminological research - far too large to cover in a single book. Here we will attempt to teach fundamental skills to build a solid foundation for future work. We’ll be focusing on the following skills and trying to reinforce our skills with each lesson.
- Subsetting - Taking only certain rows or columns from a data set
- Regular expressions - Essentially R’s “Find and Replace” function for text
- Getting data from websites (webscraping)
- Getting data from PDFs
- Writing documents through R
Criminology has a large - and growing - number of data sets publicly available for us to use. In this book we will focus on a few prominent ones including the following:
- Uniform Crime Report (UCR) - A FBI data set with agency-level crime data for nearly every agency in the United States
We’ll also cover a number of other data sets such as local police data and government data on alcohol consumption in the United States.
What you won’t learn
This book is not a statistics book so we will not be covering any statistical techniques. Though some data sets we handle are fairly large, this book does not discuss how to deal with Big Data. While the lessons you learn in this book can apply to larger data sets, Big Data (which I tend to define loosely as data that are too large for my computer to handle) requires special skills that are outside the realm of this book. If you do intend to deal with huge data sets I recommend you look at the R package data.table which is an excellent resource for it. While we briefly cover mapping, this book will not cover working with geographic data in detail. For a comprehensive look at geographic data please see this book.
Simple vs Easy
In the course of this book we will cover things that are very simple. For example, we’ll take a data set (think of it like an Excel file) with crime for nearly every agency in the United States and keep only data from Colorado for a small number of years. We’ll then find out how many murders happened in Colorado each year. This is a fairly simple task - it can be expressed in two sentences. You’ll find that most of what you do is simple like this - it is quick to talk about what you are doing and the concepts are not complicated. What it isn’t is easy. To actually write the R code to do this takes knowing a number of interrelated concepts in R and several lines of code to implement each step.
While this distinction may seem minor, I think it is important for newer programmers to understand that what they are doing may be simple to talk about but hard to implement. It is easy to feel like a bad programmer because something that can be articulated in 10 seconds may take hours to do. So during times when you are working with R try to keep in mind that even though a project may be simple to articulate, it may be hard to code and that there is often very little correlation between the two.
How to Contribute
If you have any questions, suggestions, or find any issues please email me at firstname.lastname@example.org. If this book has helped you, also email me so I can try to measure the book’s impact and who is using it.