A Quick Introduction to liftr
Nan Xiao <https://nanx.me>
Source:vignettes/liftr-intro.Rmd
liftr-intro.Rmd
Introduction
In essence, liftr aims to solve the problem of persistent reproducible reporting. To achieve this goal, it extends the R Markdown metadata format, and uses Docker to containerize and render R Markdown documents.
Metadata for containerization
To containerize your R Markdown document, the first step is adding
liftr
fields to the YAML metadata section of the document.
For example:
---
title: "The Missing Example of liftr"
author: "Author Name"
date: "2024-03-11"
output: rmarkdown::html_document
liftr:
maintainer: "Maintainer Name"
email: "name@example.com"
from: "rocker/r-base:latest"
pandoc: true
texlive: false
sysdeps:
- gfortran
cran:
- glmnet
bioc:
- Gviz/3.9
remotes:
- "nanxstats/liftr"
include: "DockerfileSnippet"
---
All available metadata fields are expained below.
Required metadata
-
maintainer
Maintainer’s name for the Dockerfile.
-
email
Maintainer’s email address for the Dockerfile.
Optional metadata
-
from
Base image for building the docker image. Default is
"rocker/r-base:latest"
. For R users, the images offered by the rocker project and Bioconductor can be considered first. -
pandoc
Should we install pandoc in the container? Default is
true
.If pandoc was already installed in the base image, this should be set to
false
to avoid potential errors. For example, forrocker/rstudio
images andbioconductor/...
images, this option will be automatically set tofalse
since they already have pandoc installed. -
texlive
Is TeX environment needed when rendering the document? Default is
false
. Should betrue
particularly when the output format is PDF. -
sysdeps
Debian/Ubuntu system software packages depended in the document.
Please also include software packages depended by the R packages below. For example, here
gfortran
is required for compilingglmnet
. -
cran
CRAN packages depended in the document.
If only
pkgname
is provided,liftr
will install the latest version of the package on CRAN. To improve reproducibility, we recommend to use the package name with a specified version number:pkgname/pkgversion
(e.g.ggplot2/1.0.0
), even if the version is the current latest version. Note:pkgversion
must be provided to install the archived versions of packages. -
bioc
Bioconductor packages depended in the document. If used, the first package’s name must be followed by the desired Bioconductor version (e.g.
Gviz/3.9
). All the packages used must be installed from the same Bioconductor version. -
remotes
Remote R packages that are not available from CRAN or Bioconductor.
The remote package naming specification from devtools is adopted here. Packages can be installed from GitHub, Bitbucket, Git/SVN servers, URLs, etc.
-
include
The path to a text file that contains custom Dockerfile snippet. The snippet will be included in the generated Dockerfile. This can be used to install additional software packages or further configure the system environment.
Note that this file should be in the same directory as the input R Markdown file.
Containerize the document
After adding proper liftr
metadata to the document YAML
data block, we can use lift()
to parse the document and
generate a Dockerfile.
We will use a minimal example included in the liftr package. First, we create a new directory and copy the R Markdown document into the directory:
path = "~/liftr-minimal/"
dir.create(path)
file.copy(system.file("examples/liftr-minimal.Rmd", package = "liftr"), path)
Then, we use lift()
to parse the document and generate
the Dockerfile:
After successfully running lift()
, the Dockerfile will
be in the ~/liftr-minimal/
directory.
Render the document
Now we can use render_docker()
to render the document
into an HTML file, under a Docker container:
render_docker(input)
The function render_docker()
will parse the Dockerfile,
build a new Docker image, and run a Docker container to render the input
document. If successfully rendered, the output
liftr-minimal.html
will be in the
~/liftr-minimal/
directory. You can also pass additional
arguments in rmarkdown::render
to this function.
In order to share the dockerized R Markdown document, simply share
the .Rmd
file. Other users can use the lift()
and render_docker()
functions to render the document as
above.
Housekeeping
Normally, the argument prune
is set to TRUE
in render_docker()
. This means any dangling containers or
images due to unsuccessful builds will be automatically cleaned.
To clean up the dangling containers, images, and everything without
specifying names, please use prune_container_auto()
,
prune_image_auto()
, and prune_all_auto()
.
If you wish to manually remove the Docker container or image (whose
information will be stored in an output YAML file) after sucessful
rendering, use prune_container()
and
prune_image()
:
purge_image(paste0(path, "liftr-minimal.docker.yml"))
The above input YAML file contains the basic information of the
Docker container, image, and commands to render the document. It is
generated by setting purge_info = TRUE
(default) in
render_docker()
.
Install Docker
Docker is an essential system requirement when using liftr to render
the R Markdown documents. install_docker()
will help you
find the proper guide to install and set up Docker in your system. To
check if Docker is correctly installed, use
check_docker_install()
; to check if the Docker daemon is
running, use check_docker_running()
. In particular, Linux
users should configure Docker to run
without sudo.