Friday, February 20, 2015

R dynamic report generation with Knitr


Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.

              Donald E. Knuth, Literate Programming, 1984


Overview and Motivation


So what is dynamic documentation, and why do we need it? Unlike typical software, R programs are often meant to be consumed as reports by people who are not developers, whether they are data scientists, statisticians or managers. Moreover, R programs by nature tend not to be huge; they rarely span hundreds of thousands of lines of code. All of this has created a huge demand for a good documentation framework. But how do you document a report? One way, of course, is to write a passage and then paste a copied graph into it. However, once something changes you have to re-copy every graph, which is a tedious and unrewarding procedure.

Knitr


Knitr, developed by Yihui Xie, is an R package that makes it straightforward to integrate R code into written reports. It is very powerful, easy to get started with, and has many potential uses.
All you need to do is create an .Rmd file and write regular Markdown with a few additional features. For example:
## Loading and preprocessing the data
```{r echo = TRUE}
data = read.csv("activity.csv")
summary(data)
```
Here we create a header and then insert a code chunk wrapped in ```{r} and ```. The echo = TRUE option tells knitr to include the source code in the output alongside its results (echo = TRUE is actually the default; set echo = FALSE to show only the results). That's it, it's that easy. Variables you create in one chunk are visible in later chunks, so you can write your program as usual, using any packages or functions you want. In the end, the knitting process parses your document, runs all the R chunks and inserts the code and the results, where needed, into the generated document, which can then be turned into HTML or PDF.
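To see both points in action (chunks sharing variables, and echo controlling whether code is shown), here is a minimal sketch that knits a tiny, hypothetical R Markdown document held in a character vector instead of a file. It relies on the text argument of knitr::knit(), which returns the knitted Markdown as a character vector; the document contents and the variable x are made up for illustration.

```r
library(knitr)

# A hypothetical two-chunk R Markdown document stored in memory
rmd <- c(
  "## Step one",
  "```{r echo = FALSE}",
  "x <- c(2, 4, 6)",   # created here, but echo = FALSE hides this line
  "```",
  "",
  "## Step two",
  "```{r}",
  "mean(x)",           # x from the first chunk is still visible here
  "```"
)

# With `text`, knit() returns the resulting Markdown instead of writing a file
md <- knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```

In the printed Markdown, the first chunk's code does not appear, while the second chunk shows both mean(x) and its result, prefixed with knitr's default "##" comment marker.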

Knitr is fully integrated with RStudio; however, if you're using another IDE or simply prefer the console, you can always knit your document by running the following command:
Rscript -e "library(knitr); knit('./file-here.rmd')"
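The same step can be run from inside an R session. The sketch below writes a small, hypothetical report to a temporary .Rmd file (the file names are assumptions for illustration, and it uses R's built-in cars dataset), then knits it. Note that knit() on an .Rmd file produces Markdown (.md), and it returns the path of the file it wrote.

```r
library(knitr)

# Write a small, hypothetical report into a temporary .Rmd file
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "## Speed summary",
  "```{r echo = TRUE}",
  "summary(cars$speed)",
  "```"
), rmd_file)

# knit() runs every chunk and writes a Markdown file; it returns that file's path
md_file <- knit(rmd_file, output = file.path(tempdir(), "report.md"), quiet = TRUE)

# Show the generated Markdown: the code followed by its "##"-prefixed output
cat(readLines(md_file), sep = "\n")
```

If you want HTML or PDF straight away, rmarkdown::render() knits the file and then converts it with pandoc in a single call.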
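Keep in mind that knit() turns an .Rmd file into a Markdown (.md) file; producing HTML or PDF takes an additional conversion step, for example rmarkdown::render(), which performs both steps at once. For more advanced options and more detailed examples, please see the documentation and demos sections of the knitr website.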

Publish


Sharing reports has always been an obstacle. An endless email thread with PDF attachments: sounds familiar? One of the best features of RStudio is the ability to publish knitr reports to rpubs.com. You can then share the link with anyone who needs to view the report. Here is an example of my weather events analysis: Simple weather events analysis
Bear in mind that reports published on RPubs are publicly available, so you probably shouldn't publish anything confidential there.

I hope you enjoyed the article, and stay tuned for the next one, of course :)