knitr
It's easy to generate reports dynamically in R. Literate programming is a paradigm that originally came from Donald Knuth (Knuth, 1984).
Basic idea: Write data + software + documentation (or in this case manuscripts, reports) together.
Analysis code can be divided into text and code "chunks". Doing so allows us to extract the code for machine-readable documents (technically referred to as a tangle) or produce a human-readable document (also called weave).
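As a minimal sketch of these two operations, knitr's `purl()` tangles a document into plain R code, while `knit()` weaves it into a report. The tiny document below is made up for illustration and is passed in through the `text` argument instead of a file.

```r
library(knitr)

# A made-up document: one line of narrative and one code chunk.
src <- c(
  "The mean of the first five integers:",
  "",
  "```{r}",
  "mean(1:5)",
  "```"
)

# Tangle: keep only the R code (machine-readable).
code <- purl(text = src, quiet = TRUE)

# Weave: run the code and merge its output with the narrative (human-readable).
report <- knit(text = src, quiet = TRUE)

cat(code)
cat(report)
```

The tangled result contains only the R code, while the woven result contains the narrative, the code, and its printed output.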
Literate programming involves three main steps:
Results from scientific research have to be easy to reproduce so that others can verify them, which is what makes them trustworthy. Otherwise we risk producing one-off results that no one outside the original research group can reproduce. In this lesson we will learn about reproducible research, which is one of the by-products of dynamic report generation. However, this process alone will not always guarantee reproducibility.
knitr
# Installing knitr is quite easy.
install.packages("knitr", dependencies = TRUE)
install.packages("pander") # Pander is a useful package for formatting tables in markdown.
knitr supports a variety of documentation formats including markdown, HTML and LaTeX. It also allows for easy export to PDF and HTML.
Markdown is an incredibly simple semantic file format, not too dissimilar from .doc, .rtf, or .txt. Markdown makes it easy for even those without significant knowledge of markup languages like HTML or LaTeX to write any sort of text (including links, lists, bullets, etc.) and have it parsed into a variety of formats. To learn more about the basics of markdown, peruse this short tutorial on the format.
csv over xls). This will ensure that your data will be readable long into the future.
Good and Bad Practices
Related: See best practices in the R-basics folder.
In RStudio, choose new R Markdown file (easiest way), or create a new text file and save it with the extension .Rmd.
A basic code chunk looks like this:
```{r}
# some R code
```
You can knit this document using the knit button or do it programmatically using the knit() function.
library(knitr)
knit("file.Rmd")
What just happened?
knitr read the Rmd file, located and ran all the code chunks identified by the backticks, and replaced them with the output of the R function calls. If any of those calls generate figures, the figures are included using markdown syntax.
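To see this for yourself, the sketch below writes a small R Markdown file to a temporary location, knits it, and prints the markdown that knitr produced. The file content and variable names are made up for illustration.

```r
library(knitr)

# Write a small R Markdown file to a temporary location (hypothetical content).
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "Summary of a toy vector:",
  "",
  "```{r}",
  "x <- c(2, 4, 6)",
  "sum(x)",
  "```"
), rmd)

# knit() runs the chunk and returns the path of the markdown file it wrote.
md <- knit(rmd, output = sub("\\.Rmd$", ".md", rmd), quiet = TRUE)

# The output keeps the narrative and adds the code plus its printed result.
md_lines <- readLines(md)
print(md_lines)
```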
You can also name your code chunks. This allows you to keep all the code in a separate script and refer to code chunks using meaningful names (e.g. data-processing, analysis, model-fitting, visualization, figures, tables).
{r, chunk_name}
Some rules on naming chunks
In addition to naming chunks within the curly braces, you can also add a bunch of other options on how that particular code chunk should behave.
Other options you can add to the tag
Option | Description |
---|---|
echo = TRUE or FALSE | to show or hide code. |
eval = TRUE or FALSE | to run or skip the code. |
warning = TRUE or FALSE | to show or hide function warnings. |
message = TRUE or FALSE | to show or hide function R messages. |
results = "hide" | to hide the results; the code is still executed. |
fig.height = | height of the figure, in inches. |
fig.width = | width of the figure, in inches. |
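As a small illustration of one of these options, the sketch below knits a made-up chunk with `echo = FALSE`. The chunk label and its contents are hypothetical.

```r
library(knitr)

# A chunk whose code is hidden but whose result is still shown.
src <- c(
  "```{r hidden-code, echo = FALSE}",
  "total <- sum(1:10)",
  "total",
  "```"
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```

The knitted output contains the printed value of `total` but not the code that computed it.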
Once your output markdown (.md) files are generated, you should never edit them because they are automatically generated. The next time you knit the original .Rmd files, all the changes in the .md file will get wiped out.
Write sentences in text with inline output
Include some text `r mean(1:5)`.
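A minimal sketch of inline output in action: the made-up document below defines `x` in a hidden chunk and then uses it inside a sentence.

```r
library(knitr)

src <- c(
  "```{r define-x, echo = FALSE}",
  "x <- 1:5",
  "```",
  "",
  "The mean of x is `r mean(x)`."
)

# The inline expression is replaced by its value in the knitted text.
out <- knit(text = src, quiet = TRUE)
cat(out)
```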
Summarizing output from models.
```{r fit_model}
library(datasets)
data(airquality)
fit = lm(Ozone ~ Wind + Temp + Solar.R, data = airquality)
```
## Including formatted tables in markdown
```{r showtable, results="asis", echo = FALSE, message = FALSE, eval = FALSE, warning = FALSE}
library(pander)
pander(fit)
```
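If pander is not available, one alternative (swapped in here for illustration, not the approach used above) is knitr's own `kable()` function, which turns a matrix or data frame into a markdown table. The sketch refits the same model so that it runs on its own.

```r
library(knitr)
library(datasets)

fit <- lm(Ozone ~ Wind + Temp + Solar.R, data = airquality)

# coef(summary(fit)) is a numeric matrix with one row per model term.
coef_table <- coef(summary(fit))

# Render the coefficient matrix as a markdown table.
tbl <- kable(coef_table, format = "markdown", digits = 3)
print(tbl)
```

Inside an .Rmd chunk, calling `kable(coef_table)` as the last expression is usually enough for the table to appear in the knitted document.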
Global options are shared across all the following chunks after the location in which the options are set, and local options in the chunk header can override global options.
```{r setoptions, eval = FALSE, echo = FALSE}
options(width = 60, show.signif.stars = FALSE)
# The knitr:: prefix lets this work even when knitr is not attached
knitr::opts_chunk$set(echo = FALSE,
                      results = "asis",
                      warning = FALSE,
                      message = FALSE,
                      fig.width = 5,
                      fig.height = 4,
                      tidy = TRUE,
                      fig.align = 'center')
```
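The sketch below illustrates how a local option overrides a global one. The document and its chunk labels are made up: the setup chunk hides code globally, and the last chunk turns `echo` back on for itself only.

```r
library(knitr)

src <- c(
  "```{r setup, include = FALSE}",
  "knitr::opts_chunk$set(echo = FALSE)",
  "```",
  "",
  "```{r uses-global}",
  "a <- 1 + 1",
  "```",
  "",
  "```{r overrides-global, echo = TRUE}",
  "b <- 2 + 2",
  "```"
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```

Only the code of the last chunk appears in the knitted output, because its local `echo = TRUE` takes precedence over the global setting.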
Dealing with long-running processes
Add cache = TRUE to a chunk header to cache its results. After the first run, the results are cached and reused on later runs. We'll discuss better ways to achieve the same thing using Make in the next section.
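A rough way to see the cache at work is to knit the same file twice and compare timings. In this sketch the slow step is simulated with `Sys.sleep()`, and the file name, chunk label, and working directory are all made up for illustration.

```r
library(knitr)

# Work in a fresh temporary directory so the cache folder lands there.
demo_dir <- tempfile("cache-demo")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

writeLines(c(
  "```{r slow-step, cache = TRUE}",
  "Sys.sleep(2)   # stands in for a long-running computation",
  "x <- 42",
  "```"
), "cache-demo.Rmd")

# First run executes the chunk; second run should load it from the cache.
first_run  <- system.time(knit("cache-demo.Rmd", quiet = TRUE))["elapsed"]
second_run <- system.time(knit("cache-demo.Rmd", quiet = TRUE))["elapsed"]

setwd(old_wd)
print(c(first = first_run, second = second_run))
```

The second run should be noticeably faster as long as the chunk has not changed between runs.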
Generating reports in knitr doesn't always have to involve a laborious .Rmd file where scripts need to be broken down into smaller chunks. Sometimes a user might need a simple report generated very quickly from an existing script. The function stitch() in knitr makes it possible to generate nicely formatted reports from R scripts.
knitr provides a template of the source document with default settings, which allows the user to simply pass any R script into this template (think of the script as one giant code chunk). knitr will then compile the template to a report. The package currently has built-in support for a range of templates, including LaTeX, HTML, and markdown. To stitch a report:
library(knitr)
stitch("your-script.R")
Chunks are extremely flexible and more options (beyond the ones listed in the table above) can be included in the header. These look exactly like the kinds of arguments that one might pass to standard R functions. In the example below, the chunk will only be executed if the condition (in this case x less than 5) is satisfied.
{r chunk_name, eval = if (x < 5) TRUE else FALSE}
This allows your document to be dynamic, so that certain chunks are executed only when specific conditions are met.
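Here is a minimal sketch of conditional evaluation. The document is made up: `x` is set in a hidden chunk, and two chunks then use complementary conditions in their `eval` option.

```r
library(knitr)

src <- c(
  "```{r define-x, include = FALSE}",
  "x <- 10",
  "```",
  "",
  "```{r small-x-only, eval = x < 5}",
  "print('x is small')",
  "```",
  "",
  "```{r large-x-only, eval = x >= 5}",
  "print('x is large')",
  "```"
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```

With `x` set to 10, only the second `print()` call is executed. The code of the first chunk is still shown, but it produces no output.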
By default knitr will not stop execution if it encounters an error. It will continue through to the end of the document and include any errors that arise within chunks. The reason for this behavior is that knitr treats the code as if it were fed directly into the R console. Any errors get printed to the screen and the remaining commands are executed.
To stop knitr as soon as it encounters an error, one can set an option explicitly:
opts_knit$set(stop_on_error = 2L)
Possible options for error handling
Option | What it does |
---|---|
0L | Do not stop on errors; continue as if the code was pasted into the R console. |
1L | When an error occurs, return the results up to this point and ignore the rest of the code within that particular chunk without reporting any further errors. |
2L | Completely stop upon encountering the first error. |
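The "continue as if pasted into the console" behavior can be sketched with a made-up chunk that fails on its first line. The sketch sets the chunk option `error = TRUE` explicitly, which is an assumption on our part to make the behavior independent of how the document is rendered.

```r
library(knitr)

src <- c(
  "```{r with-error, error = TRUE}",
  "stop('something went wrong')",
  "1 + 1",
  "```"
)

# The error message is written into the output and the next line still runs.
out <- knit(text = src, quiet = TRUE)
cat(out)
```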
knitr
If you use ggplot2 (from the data visualization section), your plots can easily be included in your document.
library(ggplot2)
# geom_hex() needs the hexbin package to be installed
p <- ggplot(diamonds, aes(carat, price)) + geom_hex()
p
# no need to explicitly print(p)
Sometimes it can be rather tedious to include dozens of lines of code in the same file as the narrative. In such cases, one can improve readability by externalizing the code into a separate script and simply calling the chunks at the appropriate locations in a document. The code will get read in and executed at those points. There are two steps to making this happen.
First, read the script using read_chunk() at the top of any .Rmd file:
read_chunk("source_code.R")
This is usually done in an early chunk such as the first chunk of a document, and we can use the chunk data-processing later in the source document:
{r, data-processing}
Then call any chunk as needed by using its label. You do not have to include any code between the backticks. In the external script, mark the start of each chunk with a comment that carries its label:
## @knitr data-processing
Be sure to leave a blank line between chunks.
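The two steps can be sketched end to end as follows. The script contents, the temporary file, and the document are all made up for illustration.

```r
library(knitr)

# Hypothetical external script holding one labelled chunk.
script <- tempfile(fileext = ".R")
writeLines(c(
  "## @knitr data-processing",
  "x <- c(1, 2, 3)",
  "sum(x)"
), script)
script <- normalizePath(script, winslash = "/")

# The document reads the script in its first chunk, then calls the chunk
# by label; the second chunk is deliberately left empty.
src <- c(
  "```{r setup, include = FALSE}",
  sprintf("knitr::read_chunk('%s')", script),
  "```",
  "",
  "```{r data-processing}",
  "```"
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```

The knitted output shows the code from the external script together with its result, even though the chunk in the document itself is empty.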
Pandoc (http://johnmacfarlane.net/pandoc) is a universal document converter. In particular, Pandoc can convert Markdown to many other document formats, including LaTeX, HTML, Rich Text Format (.rtf), e-book (.epub), Microsoft Word (.docx), and OpenDocument Text (.odt). Pandoc is a command-line tool. Linux users should be fine with it; Windows users can open a command window from the Start menu by running cmd. Once we have opened a command window (or terminal), we can type commands like these to convert a Markdown file, say test.md, to other formats:
pandoc test.md -o test.html
pandoc test.md -o test.odt
pandoc test.md -o test.rtf
pandoc test.md -o test.docx
pandoc test.md -o test.pdf
You can add more options to the basic Pandoc call. To see a full list of options:
pandoc --help
A commonly used option is to add margins using the -V argument (in this case 1 inch):
pandoc -V geometry:margin=1in test.md -o test.pdf
You can use Make to automate much of this process. For example, by setting up a series of dependencies, you can have a new document knitted if the underlying data or code changes but not otherwise.