Explore tidyverse with liftr
Nan Xiao <https://nanx.me>
Source: vignettes/liftr-tidyverse.Rmd
Introduction
Creating Docker images from scratch can be time- and labor-consuming. Fortunately, the R community has many pre-built and regularly updated Docker images ready for use, which is especially helpful when creating your own containerized R Markdown documents with liftr.
Sources of such pre-built Docker images include the rocker project and the Bioconductor Docker containers. In this article, we will use the tidyverse image provided by rocker. This image includes the essential tidyverse packages and the devtools environment loved by many data scientists (Wickham 2014). We will demonstrate how to containerize and render your tidyverse-heavy R Markdown document using Docker in only a few minutes.
Install Docker
If Docker has not been installed on your system, run install_docker() and follow the guidelines to install it.
After that, check_docker_install() and check_docker_running() will help you make sure that Docker has been installed and is running properly.
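As a quick preliminary check, base R can also tell you whether a docker executable is on your PATH. This is a minimal sketch with a hypothetical helper name, docker_on_path(); it is not part of liftr, and because it only looks for the executable it cannot tell whether the Docker daemon is running (that is what check_docker_running() is for).

```r
# Hypothetical helper (not part of liftr): TRUE if a `docker` executable
# is found on the PATH. It does NOT check whether the daemon is running.
docker_on_path <- function() {
  unname(nzchar(Sys.which("docker")))
}

if (docker_on_path()) {
  message("docker executable found at: ", Sys.which("docker"))
} else {
  message("docker not found on the PATH; see install_docker().")
}
```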
Example document
Let’s create a new folder first and copy the example R Markdown document to this folder:
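# Working folder; keep the trailing slash, since paths are built with paste0()
path = "~/liftr-tidyverse/"
dir.create(path, showWarnings = FALSE)
# Copy the example document shipped with liftr into the folder
file.copy(system.file("examples/liftr-tidyverse.Rmd", package = "liftr"), path)
input = paste0(path, "liftr-tidyverse.Rmd")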
If we open the R Markdown file, we will see that its YAML header includes a liftr section, which defines the Docker system environment required to render this document. In our case, it is short and simple:
---
title: "Explore tidyverse with liftr"
author: "Nan Xiao <<me@nanx.me>>"
date: "2024-03-11"
output:
  rmarkdown::pdf_document:
    toc: true
    number_sections: true
liftr:
  from: "rocker/tidyverse:latest"
  maintainer: "Nan Xiao"
  email: "me@nanx.me"
  pandoc: false
  texlive: true
  cran:
    - nycflights13
---
Most of the fields are self-explanatory:
- The from field specifies the latest rocker/tidyverse image as our base image, which saves us a lot of time compared with creating a custom base image with all the tidyverse dependencies.
- The custom pandoc installation is not included (pandoc: false) because the tidyverse image already includes pandoc.
- TeX Live is included (texlive: true) since we intend to render a PDF file in the end.
- The CRAN data package nycflights13 will be installed (cran).
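To see which liftr settings a document declares, you can read its YAML header directly. The sketch below uses only base R and a hypothetical helper, front_matter_lines(); it does a naive line-based scan rather than real YAML parsing, and it runs on a small temporary file so it is self-contained. For real use, rmarkdown::yaml_front_matter() parses the header into a nested list.

```r
# Hypothetical helper: return the lines of an R Markdown file's YAML header,
# i.e. everything between the first two lines consisting only of "---".
front_matter_lines <- function(file) {
  lines <- readLines(file, warn = FALSE)
  delims <- which(trimws(lines) == "---")
  if (length(delims) < 2L || delims[2] - delims[1] < 2L) {
    return(character(0))
  }
  lines[(delims[1] + 1L):(delims[2] - 1L)]
}

# Demo on a temporary file with a liftr section like the one above.
demo <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "liftr:",
  "  from: \"rocker/tidyverse:latest\"",
  "  texlive: true",
  "---",
  "Body text."
), demo)

header <- front_matter_lines(demo)
# Naively extract the base image named in the `from` field.
from_line <- grep("^\\s*from:", header, value = TRUE, perl = TRUE)
from <- sub('^\\s*from:\\s*"?([^"]*)"?\\s*$', "\\1", from_line, perl = TRUE)
print(from)

# With the vignette's document: front_matter_lines(input)
```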
Containerize the document
Let’s containerize this document by generating a Dockerfile for it, using liftr::lift():
lift(input)
A file named Dockerfile will be generated in the same directory as the input Rmd file. It contains the commands needed to build the Docker image in which the document is rendered.
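To inspect what lift() produced, you can read the first few lines of that Dockerfile from R. This is a small base-R sketch with a hypothetical helper, peek_dockerfile(); it relies only on the Dockerfile sitting next to the input file, as described above.

```r
# Hypothetical helper: return the first `n` lines of the Dockerfile that
# lift() writes next to the input R Markdown file, or NULL if it is missing.
peek_dockerfile <- function(input, n = 10L) {
  dockerfile <- file.path(dirname(input), "Dockerfile")
  if (!file.exists(dockerfile)) {
    message("No Dockerfile found in ", dirname(input), "; run lift(input) first.")
    return(invisible(NULL))
  }
  readLines(dockerfile, n = n, warn = FALSE)
}

# Usage with the vignette's input file, after lift(input):
# peek_dockerfile(input)
```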
Render the document
We can use render_docker() to build the Docker image, start the container, and render the document inside it:
render_docker(input)
Let’s view the rendered document:
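One way to open the PDF from R is sketched below. It assumes the output is written next to the input file with the same base name and a .pdf extension (rmarkdown's default for pdf_document); rendered_pdf() is a hypothetical helper, and utils::browseURL() should hand a local file to your default viewer on most platforms.

```r
# Hypothetical helper: expected location of the rendered PDF, assuming it
# sits next to the input file with the same base name (rmarkdown's default).
rendered_pdf <- function(input) {
  paste0(tools::file_path_sans_ext(input), ".pdf")
}

# Usage with the vignette's input file, after render_docker(input):
# pdf <- rendered_pdf(input)
# if (file.exists(pdf)) utils::browseURL(pdf)
```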
In the last section of the rendered PDF, we will see that the session information probably differs from that of your current system. This is because the document was generated entirely inside a newly built, isolated Linux environment, using Docker.
In this way, the R Markdown document gains system-level reproducibility, so it can be easily replicated by other users whose system and R package environment may differ from yours. This helps team collaboration and large-scale document orchestration. Best of all, the only thing you need to share is still the document itself, with a few extra metadata fields.
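To make the comparison with the session information at the end of the PDF concrete, you can print a few facts about the host R session. This is a base-R sketch; host_session is a hypothetical name, and the running entry is dropped automatically if R cannot report the OS version.

```r
# A few facts about the current (host) R session, for comparison with the
# session information printed in the last section of the rendered PDF.
host_session <- c(
  r_version = R.version.string,
  platform  = R.version$platform,
  running   = utils::sessionInfo()$running
)
print(host_session)
```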
Housekeeping
The Docker images stored on your system can take up several gigabytes and grow as you build more images. Let’s remove the generated Docker image to save some disk space:
prune_image(paste0(path, "liftr-tidyverse.docker.yml"))
If we do this, the Docker image will be rebuilt the next time you use render_docker(). If not, the image will be cached on the system and reused when compiling the document later, which saves time.
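To check how much disk space your images use before or after pruning, the docker CLI's `docker images` command lists local images with a SIZE column. The sketch below calls it from R via system2(); list_docker_images() is a hypothetical helper that returns NULL when docker is not on the PATH. If the daemon is not running, the returned text will contain Docker's error message instead of the image list.

```r
# Hypothetical helper: list local Docker images (with their SIZE column)
# by calling the docker CLI; returns NULL if docker is not on the PATH.
list_docker_images <- function() {
  if (!nzchar(Sys.which("docker"))) return(NULL)
  system2("docker", "images", stdout = TRUE, stderr = TRUE)
}

imgs <- list_docker_images()
if (!is.null(imgs)) writeLines(imgs)
```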