Introduction to R Markdown for a Thesis

Writing a thesis is stressful. R Markdown can help by reducing the number of things you have to think about. This guide can be used as an introduction to R Markdown, specifically for people working on a research thesis.

If you’d like to follow along with the actual R Markdown file, here is a link to that. The file you want is “2018-03-11-introduction-to-r-markdown.Rmd” and you can download it by right clicking and downloading. And here is a link to the images included in this post - you’ll need those images to fully run the Markdown file (or you can delete the references to them.)

Abstract

R’s version of Markdown is essentially a Word document that allows R code. You can write in it just like in Word and include R code which will output right in the document. This is useful because if you change your code (such as changing a graph), it will also be changed in the document. You no longer have to remember to copy over the new results (graphs, tables, etc.) from R to Word. If you are changing your results (and you will more than you expect), using R Markdown will save you time.

Plain text

Markdown works nearly identical to Word. The major difference is that it allows R code and you have to do more formatting yourself. If you do use Markdown, the main advice is to ignore formatting until the end! Just ignore formatting until you absolutely need to. It is easiest to output to an HTML file until you are ready to format. HTML generally ignores formatting issues (such as figure alignment) that PDF or Word encounter.

That said, let’s start with making a Markdown file in RStudio.

Opening a new R Markdown file

Opening a new R Markdown file

As seen above, you make a new R Markdown file using the same menu as making an R Script file, but now select R Markdown.... From here it’ll open up a window where you select the title, author, and type of output. You can always change all three of these selections right in the Markdown file. Selecting PDF may require you to download additional software to get it to output - some OS may already have the software installed. For a nice guide to using PDF with R Markdown, see here. Formatting is also trickier in PDF than in HTML or Word.

Selecting the file’s name, author, and output type When you click OK, it will open a new R Markdown file that is already populated with example plain text and code. You can delete this entirely or modify what’s there to work with what you need.

The default R Markdown file

The default R Markdown file

Let’s start with the text and its formatting. The first line of text shows two pound signs (#) followed by “R Markdown”. When a line start with # it means that that line is a header. Sections of the thesis should start with an # to indicate a new section. There are 6 types of headers and each is smaller than the previous one (1 is biggest, 6 is smallest). Header 1 has one # (e.g. # Header 1), header 6 has 6 # (e.g. ###### Header 6). Headers after header 1 are good for making subsections.

Header 1

Header 2

Header 3

Header 4

Header 5
Header 6

New paragraphs can be made by ending a line with two spaces (or, for simplicity you can just have an entire blank line between paragraphs). Though it is not common in academic writing, you can make a word (or words) italic by surrounding it by a single asterix or underscore, like so (*so*) and so (_so_). To bold a word (or words), use two asterix, like so (**so**) and so (__so__). Include a block quote by starting a line with > followed by your quote

This is a block quoted. Good for multiline quotes.

You can make tables using formatting in the text, but it’s easier to do so using code. We will see that soon.

Remembering all the formatting rules gets tricky and is not a good use of your time or effort. Refer to this reference guide for all the rules (such as how to include links in text). The reference guide includes more formatting options than noted here. This section of the free book R for Data Science is also an excellent resource.

Again, focus on the content - write as if you were just using Word. Worry about formatting last.

Code

The reason R Markdown is useful is because you can include code output in the file. Here is an example before we get into using code in Markdown. Below is an arbitrary graph.

plot(1:100, type = "h", col = "yellow",
     main = "Awful graph title")

If you had put this graph in your Word document and decided to change it you would have to replace it every time you made a change. While this is a minor procedure, it adds up if you have many graphs or decide to change the graph frequently. Below is the same graph with just the title and color changed. If this were my thesis, I could have changed the original code and been assured that the most up-to-date graph (or table, etc.) is always the one included with the thesis. Change the code and never think about it.

plot(1:100, type = "h", col = "blue",
     main = "Great graph title!")

Only use code to make output

Code in R Markdown is for showing results. This means only include the code that makes the graphs, tables (including regression output), etc.. The code to read in data and clean it should remain in your R Scripts. The code you do include here should only read in the cleaned data and create the output to include in the document.

The terminology used here will be chunks. This just means pieces of R code included among the plain text. All code must be in these chunks to be run. Inside a chunk, you can run any code you like. To make a chunk, click Insert and then R. It will add a chunk wherever your cursor is.

If you do not have the R Markdown file in the same folder as your data, you’ll need to set the working directory in a chunk before reading the data. However, once a working directory is set, or data is read, it applies for all following chunks. You will also need to run any packages (using library()) to use them in a chunk.

Adding a new code chunk

Adding a new code chunk

A code chunk

A code chunk

Notice the three ` at the top and bottom of the chunk. Don’t touch these! They tell R that anything in it is a chunk (i.e. that R should run the code). Inside the squiggly brackets are instructions about how the code is outputted. Here you can specify, among other things (see reference guide for more), if the code will be outputted or just the output itself, captions for tables or graphs, and formatting for output. Include all of these options after the r in the squiggly brackets. Multiple options must be separated by a comma (just like options in normal R functions). The most common option is echo which says whether to show the code in the document or not. For the thesis you only want to show the output, not the code so include echo = FALSE.

Here’s an example, first with default options then with echo = FALSE.

print("Hello")
## [1] "Hello"
## [1] "Hello"

Though you can’t see it unless looking at the Markdown file, there are two code chunks, the second which has the echo = FALSE option. That is why you only see the code “print(”Hello“)” above the first result. When you make the output file (called “knitting” the file), all code chunks will run and you will see the output included in the document. To run the code inside of RStudio, click the green right-arrow at the top-right of each chunk. This will run all code in that chunk.

Run the code in a chunk

Run the code in a chunk

Figures and Tables

Above was a code chunk that makes a graph. The only addition you will likely make to this type of code is to add a caption. Do this by adding the option fig.cap = "" with the caption in the quotes.

This is an example figure caption

Figure 1: This is an example figure caption

Note the figure caption is in the chunk options section (squiggly brackets)

Note the figure caption is in the chunk options section (squiggly brackets)

There a number of packages that change how tables are displayed. We will use the basic knitr package. The easiest way to make a table in Markdown is to make a data.frame with all the data (and column names) you want and then show that data.frame. For this example we will subset the mtcars data (which is included in R) to just the first 5 rows and columns. The kable function from the knitr package will then make a nice looking table. With kable you can add the caption directly in the kable() function rather than using fig.cap ="" in the chunk. The option echo is set to TRUE here so you can see the code without looking at the Markdown file itself.

library(knitr)
mtcars_small <- mtcars[1:5, 1:5]
kable(mtcars_small, caption = "This is an example table caption")
Table 1: This is an example table caption
mpg cyl disp hp drat
Mazda RX4 21.0 6 160 110 3.90
Mazda RX4 Wag 21.0 6 160 110 3.90
Datsun 710 22.8 4 108 93 3.85
Hornet 4 Drive 21.4 6 258 110 3.08
Hornet Sportabout 18.7 8 360 175 3.15

Finally, making the output file

To create the Word or PDF output click Knit and it will create the output in the format set in the very top. To change this format click the white down-arrow directly to the right of Knit and it will drop-down a menu with output options. Click the option you want and it will output it in that format and change that to the new default. Sometimes it takes a while for it to output, so be patient. And sometimes it fails so just try to output again.

Change the output format

Change the output format

Markdown has many more options than presented here. This lesson should get you through the thesis. If you are interested in Markdown, there are great resources online.

Related