1 Commits

Author SHA1 Message Date
Eric Coissac
a8c59b7cf0 Enhance documentation and automate R package management
Update documentation to reflect that all tools are provided via a builder Docker image

- Simplify prerequisites section in Readme.md
- Add detailed explanation of the builder image and its role
- Document R package caching mechanism for faster builds
- Update start-jupyterhub.sh to build and use the builder image
- Add Dockerfile.builder to provide the build environment
- Implement automatic R dependency detection and installation
- Update Slides.qmd to use gt package for better table formatting
2026-01-22 19:46:31 +01:00
6 changed files with 412 additions and 131 deletions

3
.gitignore vendored

@@ -5,6 +5,7 @@
/jupyterhub_volumes/caddy /jupyterhub_volumes/caddy
/jupyterhub_volumes/course/data/Genbank /jupyterhub_volumes/course/data/Genbank
/jupyterhub_volumes/web/ /jupyterhub_volumes/web/
/jupyterhub_volumes/builder
/**/.DS_Store /**/.DS_Store
/web_src/**/*.RData /web_src/**/*.RData
/web_src/**/*.pdf /web_src/**/*.pdf
@@ -16,4 +17,4 @@
ncbitaxo_* ncbitaxo_*
Readme_files Readme_files
Readme.html Readme.html
tmp.* tmp.*


@@ -6,32 +6,51 @@ This project packages the MetabarcodingSchool training lab into one reproducible
## Prerequisites (with quick checks) ## Prerequisites (with quick checks)
You need Docker, Docker Compose, Quarto, and Python 3 available on the machine that will host the lab. You only need **Docker and Docker Compose** on the machine that will host the lab. All other tools (Quarto, Hugo, Python, R) are provided via a builder Docker image and do not need to be installed on your system.
- macOS: install [OrbStack](https://orbstack.dev/) (recommended) or Docker Desktop; both ship Docker Engine and Compose. - macOS: install [OrbStack](https://orbstack.dev/) (recommended) or Docker Desktop; both ship Docker Engine and Compose.
- Linux: install Docker Engine and the Compose plugin from your distribution (e.g., `sudo apt install docker.io docker-compose-plugin`) or from Docker's official packages. - Linux: install Docker Engine and the Compose plugin from your distribution (e.g., `sudo apt install docker.io docker-compose-plugin`) or from Docker's official packages.
- Windows: install Docker Desktop with the WSL2 backend enabled. - Windows: install Docker Desktop with the WSL2 backend enabled.
- Quarto CLI: get installers from <https://quarto.org/docs/get-started/>.
- Python 3: any recent version is fine (only the standard library is used).
Verify from a terminal; if a command is missing, install it before moving on: Verify from a terminal:
```bash ```bash
docker --version docker --version
docker compose version # or: docker-compose --version docker compose version # or: docker-compose --version
quarto --version
python3 --version
``` ```
## How the startup script works ## How the startup script works
`./start-jupyterhub.sh` is the single entry point. It builds the Docker images, renders the course website, prepares the volume folders, and starts the stack. Internally it: `./start-jupyterhub.sh` is the single entry point. It builds the Docker images, renders the course website, prepares the volume folders, and starts the stack. Internally it:
- creates the `jupyterhub_volumes/` tree (caddy, course, shared, users, web) - creates the `jupyterhub_volumes/` tree (caddy, course, shared, users, web...)
- builds the `obijupyterhub-builder` image (contains Quarto, Hugo, R, Python) if not already present
- builds `jupyterhub-student` and `jupyterhub-hub` images - builds `jupyterhub-student` and `jupyterhub-hub` images
- detects R package dependencies from Quarto files using the `{attachment}` package and installs them automatically
- renders the Quarto site from `web_src/`, generates PDF galleries and `pages.json`, and copies everything into `jupyterhub_volumes/web/` - renders the Quarto site from `web_src/`, generates PDF galleries and `pages.json`, and copies everything into `jupyterhub_volumes/web/`
- runs `docker-compose up -d --remove-orphans` - runs `docker-compose up -d --remove-orphans`
### Builder image
The builder image (`obijupyterhub-builder`) contains all the tools needed to prepare the course materials:
- **Quarto** for rendering the course website
- **Hugo** for building the obidoc documentation
- **R** with the `{attachment}` package for automatic dependency detection
- **Python 3** for utility scripts
This means you don't need to install any of these tools on your host system. The script automatically builds this image on first run and reuses it for subsequent builds. Use `--force-rebuild` to rebuild the builder image if needed.
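For a one-off task, a tool from the builder image can also be run by hand. The following is a minimal sketch modelled on the `run_in_builder` helper that this commit adds to `start-jupyterhub.sh`. The function name `builder_command` is hypothetical, and it only prints the `docker run` command so that the command can be inspected before it is executed.

```bash
# Hypothetical helper (not part of the repository): print the docker
# command that would run a tool inside the builder image, with the
# project mounted at /workspace and the R package cache mounted as the
# R site library.
builder_command() {
    local project_root="$1"
    shift
    local mounts="-v ${project_root}:/workspace -v ${project_root}/jupyterhub_volumes/builder/R_packages:/usr/local/lib/R/site-library"
    local r_libs="-e R_LIBS=/opt/R/builder-packages:/usr/local/lib/R/site-library"
    # The words are joined with spaces for display only; a project path
    # containing spaces would need quoting before pasting into a shell.
    printf 'docker run --rm %s %s -w /workspace obijupyterhub-builder:latest %s\n' \
        "$mounts" "$r_libs" "$*"
}

# Show the command that would report the Quarto version of the builder.
builder_command "$PWD" quarto --version
```

Printing rather than executing keeps the sketch usable on a machine where the builder image has not been built yet.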
### R package caching for builds
R packages required by your Quarto documents are automatically detected and installed during the build process. These packages are cached in `jupyterhub_volumes/builder/R_packages/` so they persist across builds. This means:
- **First build**: All R packages used in your `.qmd` files are detected and installed (may take some time)
- **Subsequent builds**: Only missing packages are installed, making builds much faster
- **Adding new packages**: Simply use `library(newpackage)` in your Quarto files; the build process will detect and install it automatically
To clear the R package cache and force a fresh installation, delete the `jupyterhub_volumes/builder/R_packages/` directory.
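The "only missing packages are installed" behaviour amounts to a set difference between the detected packages and what is already in the cache. The build itself does this in R (`tools/install_quarto_deps.R`); the shell sketch below only illustrates the idea. The function name `missing_packages` is hypothetical, and the sketch assumes that checking a single library directory is enough, whereas the R script consults every library on its search path.

```bash
# Hypothetical helper: print the package names given after the library
# path that have no installed copy in that R library directory.
missing_packages() {
    local lib_dir="$1"
    local pkg
    shift
    for pkg in "$@"; do
        # An installed R package is a directory holding a DESCRIPTION file.
        if [ ! -f "${lib_dir}/${pkg}/DESCRIPTION" ]; then
            printf '%s\n' "$pkg"
        fi
    done
}

# Demonstration against a throwaway directory standing in for
# jupyterhub_volumes/builder/R_packages/.
demo_lib="$(mktemp -d)"
mkdir -p "${demo_lib}/gt"
touch "${demo_lib}/gt/DESCRIPTION"
missing_packages "$demo_lib" gt vegan
rm -rf "$demo_lib"
```

In the demonstration `gt` is present in the stand-in cache and `vegan` is not, so only `vegan` is reported as needing installation.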
You can tailor what it does with a few flags: You can tailor what it does with a few flags:
- `--no-build` (or `--offline`): skip Docker image builds and reuse existing images (useful when offline). - `--no-build` (or `--offline`): skip Docker image builds and reuse existing images (useful when offline).
@@ -67,6 +86,8 @@ OBIJupyterHub
└── web_src - Quarto sources for the course website └── web_src - Quarto sources for the course website
``` ```
Note: the `obijupyterhub/` directory also contains `Dockerfile.builder`, which provides the build environment; the `tools/` directory contains utility scripts, including `install_quarto_deps.R` for automatic R dependency detection; and `jupyterhub_volumes/builder/` stores cached R packages for faster builds.
3) Prepare course materials (optional before first run): 3) Prepare course materials (optional before first run):
- Put notebooks, datasets, scripts, binaries, or PDFs for students under `jupyterhub_volumes/course/`. They will appear read-only at `/home/jovyan/work/course/`. - Put notebooks, datasets, scripts, binaries, or PDFs for students under `jupyterhub_volumes/course/`. They will appear read-only at `/home/jovyan/work/course/`.
- For collaborative work, drop files in `jupyterhub_volumes/shared/` (read/write for all at `/home/jovyan/work/shared/`). - For collaborative work, drop files in `jupyterhub_volumes/shared/` (read/write for all at `/home/jovyan/work/shared/`).
@@ -246,9 +267,12 @@ ports:
``` bash ``` bash
pushd obijupyterhub pushd obijupyterhub
docker-compose down -v docker-compose down -v
docker rmi jupyterhub-hub jupyterhub-student docker rmi jupyterhub-hub jupyterhub-student obijupyterhub-builder
popd popd
# Optionally clear the R package cache
rm -rf jupyterhub_volumes/builder/R_packages
# Then rebuild everything # Then rebuild everything
./start-jupyterhub.sh ./start-jupyterhub.sh
``` ```


@@ -0,0 +1,68 @@
# Dockerfile.builder
# Image containing all tools needed to prepare the OBIJupyterHub stack
# This allows the host system to only require Docker to be installed
FROM ubuntu:24.04
LABEL maintainer="OBIJupyterHub"
LABEL description="Builder image for OBIJupyterHub preparation tasks"
# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC
# Install base dependencies and R
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates \
curl \
wget \
git \
rsync \
python3 \
r-base \
r-base-dev \
libcurl4-openssl-dev \
libssl-dev \
libxml2-dev \
libfontconfig1-dev \
libharfbuzz-dev \
libfribidi-dev \
libfreetype6-dev \
libpng-dev \
libtiff5-dev \
libjpeg-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install the attachment package in a separate location (not overwritten by volume mount)
# This ensures attachment is always available even when site-library is mounted as a volume
ENV R_LIBS_BUILDER=/opt/R/builder-packages
RUN mkdir -p ${R_LIBS_BUILDER} \
&& R -e "install.packages('attachment', lib='${R_LIBS_BUILDER}', repos='https://cloud.r-project.org/')"
# Install Hugo (extended version for SCSS support)
# Detect architecture and download appropriate binary
ARG HUGO_VERSION=0.140.2
RUN ARCH=$(dpkg --print-architecture) \
&& case "$ARCH" in \
amd64) HUGO_ARCH="amd64" ;; \
arm64) HUGO_ARCH="arm64" ;; \
*) echo "Unsupported architecture: $ARCH" && exit 1 ;; \
esac \
&& curl -fsSL "https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_extended_${HUGO_VERSION}_linux-${HUGO_ARCH}.tar.gz" \
| tar -xz -C /usr/local/bin hugo \
&& chmod +x /usr/local/bin/hugo
# Install Quarto using the official .deb package (handles all dependencies properly)
ARG QUARTO_VERSION=1.6.42
RUN ARCH=$(dpkg --print-architecture) \
&& curl -fsSL -o /tmp/quarto.deb "https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-${ARCH}.deb" \
&& dpkg -i /tmp/quarto.deb \
&& rm /tmp/quarto.deb
# Create working directory
WORKDIR /workspace
# Default command
CMD ["/bin/bash"]
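The architecture handling in the Hugo step can be checked outside Docker. The sketch below reproduces the same `case` mapping as a plain shell function and prints the release URL that the `RUN` instruction would download. The function name `hugo_asset_url` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: build the Hugo (extended) release asset URL for a
# given version and Debian architecture name, rejecting any architecture
# that the Dockerfile does not handle.
hugo_asset_url() {
    local version="$1"
    local arch="$2"
    case "$arch" in
        amd64|arm64) ;;
        *) echo "Unsupported architecture: $arch" >&2; return 1 ;;
    esac
    printf 'https://github.com/gohugoio/hugo/releases/download/v%s/hugo_extended_%s_linux-%s.tar.gz\n' \
        "$version" "$version" "$arch"
}

# Same version as the HUGO_VERSION build argument above.
hugo_asset_url 0.140.2 amd64
```

Inside the image the architecture comes from `dpkg --print-architecture`, so any other value stops the build with the "Unsupported architecture" message.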


@@ -7,6 +7,7 @@ set -e
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
DOCKER_DIR="${SCRIPT_DIR}/obijupyterhub/" DOCKER_DIR="${SCRIPT_DIR}/obijupyterhub/"
BUILDER_IMAGE="obijupyterhub-builder:latest"
# Colors for display # Colors for display
GREEN='\033[0;32m' GREEN='\033[0;32m'
@@ -48,44 +49,102 @@ while [[ $# -gt 0 ]]; do
done done
if $STOP_SERVER && $UPDATE_LECTURES; then if $STOP_SERVER && $UPDATE_LECTURES; then
echo " --stop-server and --update-lectures cannot be used together" >&2 echo "Error: --stop-server and --update-lectures cannot be used together" >&2
exit 1 exit 1
fi fi
echo "🚀 Starting JupyterHub for Lab" echo "Starting JupyterHub for Lab"
echo "==============================" echo "=============================="
echo "" echo ""
echo -e "${BLUE}🔨 Building the volume directories...${NC}" echo -e "${BLUE}Building the volume directories...${NC}"
pushd "${SCRIPT_DIR}/jupyterhub_volumes" >/dev/null pushd "${SCRIPT_DIR}/jupyterhub_volumes" >/dev/null
mkdir -p caddy mkdir -p caddy/data
mkdir -p caddy/config
mkdir -p course/bin mkdir -p course/bin
mkdir -p course/R_packages mkdir -p course/R_packages
mkdir -p jupyterhub mkdir -p jupyterhub
mkdir -p shared mkdir -p shared
mkdir -p users mkdir -p users
mkdir -p web/obidoc mkdir -p web/obidoc
mkdir -p builder/R_packages
popd >/dev/null popd >/dev/null
pushd "${DOCKER_DIR}" >/dev/null pushd "${DOCKER_DIR}" >/dev/null
# Check we're in the right directory # Check we're in the right directory
if [ ! -f "Dockerfile" ] || [ ! -f "docker-compose.yml" ]; then if [ ! -f "Dockerfile" ] || [ ! -f "docker-compose.yml" ]; then
echo "Error: Run this script from the jupyterhub-tp/ directory" echo "Error: Run this script from the jupyterhub-tp/ directory"
exit 1 exit 1
fi fi
check_if_image_needs_rebuild() {
local image_name="$1"
local dockerfile="$2"
# Check if image exists
if ! docker image inspect "$image_name" >/dev/null 2>&1; then
return 0 # Need to build (image doesn't exist)
fi
# If force rebuild, always rebuild
if $FORCE_REBUILD; then
return 0 # Need to rebuild
fi
# Compare Dockerfile modification time with image creation time
if [ -f "$dockerfile" ]; then
local dockerfile_mtime=$(stat -c %Y "$dockerfile" 2>/dev/null || echo 0)
local image_created=$(docker image inspect "$image_name" --format='{{.Created}}' 2>/dev/null | sed 's/\.000000000//' | xargs -I {} date -d "{}" +%s 2>/dev/null || echo 0)
if [ "$dockerfile_mtime" -gt "$image_created" ]; then
echo -e "${YELLOW}Dockerfile is newer than image, rebuild needed${NC}"
return 0 # Need to rebuild
fi
fi
return 1 # No need to rebuild
}
build_builder_image() {
if check_if_image_needs_rebuild "$BUILDER_IMAGE" "Dockerfile.builder"; then
local build_flag=()
if $FORCE_REBUILD; then
build_flag+=(--no-cache)
fi
echo ""
echo -e "${BLUE}Building builder image...${NC}"
docker build "${build_flag[@]}" -t "$BUILDER_IMAGE" -f Dockerfile.builder .
else
echo -e "${BLUE}Builder image is up to date, skipping build.${NC}"
fi
}
# Run a command inside the builder container with the workspace mounted
# R packages are persisted in jupyterhub_volumes/builder/R_packages
# R_LIBS includes both the builder packages (attachment) and the mounted volume
run_in_builder() {
docker run --rm \
-v "${SCRIPT_DIR}:/workspace" \
-v "${SCRIPT_DIR}/jupyterhub_volumes/builder/R_packages:/usr/local/lib/R/site-library" \
-e "R_LIBS=/opt/R/builder-packages:/usr/local/lib/R/site-library" \
-w /workspace \
"$BUILDER_IMAGE" \
bash -c "$1"
}
stop_stack() { stop_stack() {
echo -e "${BLUE}📦 Stopping existing containers...${NC}" echo -e "${BLUE}Stopping existing containers...${NC}"
docker-compose down 2>/dev/null || true docker-compose down 2>/dev/null || true
echo -e "${BLUE}🧹 Cleaning up student containers...${NC}" echo -e "${BLUE}Cleaning up student containers...${NC}"
docker ps -aq --filter name=jupyter- | xargs -r docker rm -f 2>/dev/null || true docker ps -aq --filter name=jupyter- | xargs -r docker rm -f 2>/dev/null || true
} }
build_images() { build_images() {
if $NO_BUILD; then if $NO_BUILD; then
echo -e "${YELLOW}⏭️ Skipping image builds (offline/no-build mode).${NC}" echo -e "${YELLOW}Skipping image builds (offline/no-build mode).${NC}"
return return
fi fi
@@ -94,20 +153,30 @@ build_images() {
build_flag+=(--no-cache) build_flag+=(--no-cache)
fi fi
echo "" # Check and build student image
echo -e "${BLUE}🔨 Building student image...${NC}" if check_if_image_needs_rebuild "jupyterhub-student:latest" "Dockerfile"; then
docker build "${build_flag[@]}" -t jupyterhub-student:latest -f Dockerfile . echo ""
echo -e "${BLUE}Building student image...${NC}"
docker build "${build_flag[@]}" -t jupyterhub-student:latest -f Dockerfile .
else
echo -e "${BLUE}Student image is up to date, skipping build.${NC}"
fi
echo "" # Check and build JupyterHub image
echo -e "${BLUE}🔨 Building JupyterHub image...${NC}" if check_if_image_needs_rebuild "jupyterhub-hub:latest" "Dockerfile.hub"; then
docker build "${build_flag[@]}" -t jupyterhub-hub:latest -f Dockerfile.hub . echo ""
echo -e "${BLUE}Building JupyterHub image...${NC}"
docker build "${build_flag[@]}" -t jupyterhub-hub:latest -f Dockerfile.hub .
else
echo -e "${BLUE}JupyterHub image is up to date, skipping build.${NC}"
fi
} }
build_obidoc() { build_obidoc() {
local dest="${SCRIPT_DIR}/jupyterhub_volumes/web/obidoc" local dest="${SCRIPT_DIR}/jupyterhub_volumes/web/obidoc"
if $NO_BUILD; then if $NO_BUILD; then
echo -e "${YELLOW}⏭️ Skipping obidoc build in offline/no-build mode.${NC}" echo -e "${YELLOW}Skipping obidoc build in offline/no-build mode.${NC}"
return return
fi fi
@@ -119,73 +188,79 @@ build_obidoc() {
fi fi
if ! $needs_build; then if ! $needs_build; then
echo -e "${BLUE} obidoc already present; skipping rebuild (use --build-obidoc to force).${NC}" echo -e "${BLUE}obidoc already present; skipping rebuild (use --build-obidoc to force).${NC}"
return return
fi fi
echo "" echo ""
echo -e "${BLUE}🔨 Building obidoc documentation...${NC}" echo -e "${BLUE}Building obidoc documentation (in builder container)...${NC}"
BUILD_DIR=$(mktemp -d -p .) run_in_builder '
pushd "$BUILD_DIR" >/dev/null set -e
git clone --recurse-submodules \ BUILD_DIR=$(mktemp -d)
--remote-submodules \ cd "$BUILD_DIR"
-j 8 \ git clone --recurse-submodules \
https://github.com/metabarcoding/obitools4-doc.git --remote-submodules \
pushd obitools4-doc >/dev/null -j 8 \
hugo -D build --baseURL "/obidoc/" https://github.com/metabarcoding/obitools4-doc.git
mkdir -p "$dest" cd obitools4-doc
rm -rf "${dest:?}/"* hugo -D build --baseURL "/obidoc/"
mv public/* "$dest" mkdir -p /workspace/jupyterhub_volumes/web/obidoc
popd >/dev/null rm -rf /workspace/jupyterhub_volumes/web/obidoc/*
popd >/dev/null mv public/* /workspace/jupyterhub_volumes/web/obidoc/
rm -rf cd /
rm -rf "$BUILD_DIR"
'
} }
build_website() { build_website() {
echo "" echo ""
echo -e "${BLUE}🔨 Building web site...${NC}" echo -e "${BLUE}Building web site (in builder container)...${NC}"
pushd ../web_src >/dev/null run_in_builder '
quarto render set -e
find . -name '*.pdf' -print \ echo "-> Detecting and installing R dependencies..."
| while read pdfname ; do Rscript /workspace/tools/install_quarto_deps.R /workspace/web_src
dest="../jupyterhub_volumes/web/pages/${pdfname}"
dirdest=$(dirname "$dest") echo "-> Rendering Quarto site..."
mkdir -p "$dirdest" cd /workspace/web_src
echo "cp '${pdfname}' '${dest}'" quarto render
done \ find . -name "*.pdf" -print | while read pdfname; do
| bash dest="/workspace/jupyterhub_volumes/web/pages/${pdfname}"
python3 ../tools/generate_pdf_galleries.py dirdest=$(dirname "$dest")
python3 ../tools/generate_pages_json.py mkdir -p "$dirdest"
popd >/dev/null cp "$pdfname" "$dest"
done
python3 /workspace/tools/generate_pdf_galleries.py
python3 /workspace/tools/generate_pages_json.py
'
} }
start_stack() { start_stack() {
echo "" echo ""
echo -e "${BLUE}🚀 Starting JupyterHub...${NC}" echo -e "${BLUE}Starting JupyterHub...${NC}"
docker-compose up -d --remove-orphans docker-compose up -d --remove-orphans
echo "" echo ""
echo -e "${YELLOW}Waiting for JupyterHub to start...${NC}" echo -e "${YELLOW}Waiting for JupyterHub to start...${NC}"
sleep 3 sleep 3
} }
print_success() { print_success() {
if docker ps | grep -q jupyterhub; then if docker ps | grep -q jupyterhub; then
echo "" echo ""
echo -e "${GREEN}JupyterHub is running!${NC}" echo -e "${GREEN}JupyterHub is running!${NC}"
echo "" echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" echo "-------------------------------------------"
echo -e "${GREEN}🌐 JupyterHub available at: http://localhost:8888${NC}" echo -e "${GREEN}JupyterHub available at: http://localhost:8888${NC}"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" echo "-------------------------------------------"
echo "" echo ""
echo "📝 Password: metabar2025" echo "Password: metabar2025"
echo "👥 Students can connect with any username" echo "Students can connect with any username"
echo "" echo ""
echo "🔑 Admin account:" echo "Admin account:"
echo " Username: admin" echo " Username: admin"
echo " Password: admin2025" echo " Password: admin2025"
echo "" echo ""
echo "📂 Each student will have access to:" echo "Each student will have access to:"
echo " - work/ : personal workspace (everything saved)" echo " - work/ : personal workspace (everything saved)"
echo " - work/R_packages/ : personal R packages (writable)" echo " - work/R_packages/ : personal R packages (writable)"
echo " - work/shared/ : shared workspace" echo " - work/shared/ : shared workspace"
@@ -193,12 +268,12 @@ print_success() {
echo " - work/course/R_packages/ : shared R packages by prof (read-only)" echo " - work/course/R_packages/ : shared R packages by prof (read-only)"
echo " - work/course/bin/ : shared executables (in PATH)" echo " - work/course/bin/ : shared executables (in PATH)"
echo "" echo ""
echo "🔍 To view logs: docker-compose logs -f jupyterhub" echo "To view logs: docker-compose logs -f jupyterhub"
echo "🛑 To stop: docker-compose down" echo "To stop: docker-compose down"
echo "" echo ""
else else
echo "" echo ""
echo -e "${YELLOW}⚠️ JupyterHub container doesn't seem to be starting${NC}" echo -e "${YELLOW}JupyterHub container doesn't seem to be starting${NC}"
echo "Check logs with: docker-compose logs jupyterhub" echo "Check logs with: docker-compose logs jupyterhub"
exit 1 exit 1
fi fi
@@ -211,12 +286,14 @@ if $STOP_SERVER; then
fi fi
if $UPDATE_LECTURES; then if $UPDATE_LECTURES; then
build_builder_image
build_website build_website
popd >/dev/null popd >/dev/null
exit 0 exit 0
fi fi
stop_stack stop_stack
build_builder_image
build_images build_images
build_website build_website
build_obidoc build_obidoc
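A portability note on `check_if_image_needs_rebuild`: it reads the Dockerfile's modification time with `stat -c %Y`, which is the GNU coreutils spelling, whereas BSD and macOS `stat` use `-f %m`. When the call fails, the function falls back to 0, so the Dockerfile never looks newer than the image. Below is a minimal sketch of a more portable lookup. The helper names `file_mtime` and `is_newer_than` are hypothetical, and the sketch covers only the file side of the comparison, not the parsing of the image creation date.

```bash
# Hypothetical helper: print a file's modification time as epoch seconds.
# Tries the GNU coreutils form first, then the BSD/macOS form, and prints
# 0 when neither succeeds (for example, when the file does not exist).
file_mtime() {
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null || echo 0
}

# Succeed when the file was modified after the given epoch timestamp.
is_newer_than() {
    [ "$(file_mtime "$1")" -gt "$2" ]
}

# Demonstration: a freshly created file is newer than the Unix epoch.
demo_file="$(mktemp)"
if is_newer_than "$demo_file" 0; then
    echo "rebuild needed"
fi
rm -f "$demo_file"
```

Trying the GNU form first keeps the behaviour on Linux hosts unchanged and adds the BSD form only as a fallback.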


@@ -0,0 +1,59 @@
#!/usr/bin/env Rscript
# Script to dynamically detect and install R dependencies from Quarto files
# Uses the {attachment} package to scan .qmd files for library()/require() calls
args <- commandArgs(trailingOnly = TRUE)
quarto_dir <- if (length(args) > 0) args[1] else "."
# Target library for installing packages (the mounted volume)
target_lib <- "/usr/local/lib/R/site-library"
cat("Scanning Quarto files in:", quarto_dir, "\n")
cat("Target library:", target_lib, "\n")
# Find all .qmd files
qmd_files <- list.files(
path = quarto_dir,
pattern = "\\.qmd$",
recursive = TRUE,
full.names = TRUE
)
if (length(qmd_files) == 0) {
cat("No .qmd files found.\n")
quit(status = 0)
}
cat("Found", length(qmd_files), "Quarto files\n")
# Extract dependencies using attachment
deps <- attachment::att_from_rmds(qmd_files, inline = TRUE)
if (length(deps) == 0) {
cat("No R package dependencies detected.\n")
quit(status = 0)
}
cat("\nDetected R packages:\n")
cat(paste(" -", deps, collapse = "\n"), "\n\n")
# Filter out base R packages that are always available
base_pkgs <- rownames(installed.packages(priority = "base"))
deps <- setdiff(deps, base_pkgs)
# Check which packages are not installed
installed <- rownames(installed.packages())
to_install <- setdiff(deps, installed)
if (length(to_install) == 0) {
cat("All required packages are already installed.\n")
} else {
cat("Installing missing packages:", paste(to_install, collapse = ", "), "\n\n")
install.packages(
to_install,
lib = target_lib,
repos = "https://cloud.r-project.org/",
dependencies = TRUE
)
cat("\nPackage installation complete.\n")
}
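To preview which packages such a scan might pick up without starting R, the idea can be roughly approximated with `grep`. This is only an illustration and not how `{attachment}` works internally: the function name `scan_qmd_packages` is hypothetical, and the pattern recognises only unquoted `library(pkg)` and `require(pkg)` calls.

```bash
# Hypothetical helper: list the package names that appear in unquoted
# library(pkg) or require(pkg) calls in the .qmd files under a directory.
scan_qmd_packages() {
    find "$1" -type f -name '*.qmd' -exec cat {} + \
        | grep -oE '(library|require)\([A-Za-z][A-Za-z0-9.]*' \
        | sed 's/^[a-z]*(//' \
        | sort -u
}

# Demonstration on a throwaway Quarto file.
demo_dir="$(mktemp -d)"
printf 'library(gt)\nrequire(vegan)\nlibrary(gt)\n' > "${demo_dir}/slides.qmd"
scan_qmd_packages "$demo_dir"
rm -rf "$demo_dir"
```

The demonstration lists `gt` and `vegan` once each, because `sort -u` removes the duplicate `library(gt)` call.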


@@ -3,22 +3,31 @@ title: "Biodiversity metrics \ and metabarcoding"
author: "Eric Coissac" author: "Eric Coissac"
date: "02/02/2024" date: "02/02/2024"
bibliography: inst/REFERENCES.bib bibliography: inst/REFERENCES.bib
format: format:
revealjs: revealjs:
css: ../../slides.css css: ../../slides.css
transition: slide transition: slide
scrollable: true scrollable: true
theme: beige theme: beige
html-math-method: mathjax html-math-method: mathjax
embed-resources: true embed-resources: true
editor: visual editor: visual
--- ---
```{r setup, include=FALSE} ```{r setup, include=FALSE}
library(knitr) library(knitr)
library(Rdpack)
library(tidyverse) library(tidyverse)
library(kableExtra) library(gt)
library(latex2exp) library(latex2exp)
# Install MetabarSchool if not available
if (!requireNamespace("MetabarSchool", quietly = TRUE)) {
if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes", dependencies = TRUE)
}
remotes::install_git('https://forge.metabarcoding.org/MetabarcodingSchool/biodiversity-metrics.git')
}
library(MetabarSchool) library(MetabarSchool)
opts_chunk$set(echo = FALSE, opts_chunk$set(echo = FALSE,
@@ -49,7 +58,7 @@ install.packages("devtools",dependencies = TRUE)
Then you can install *MetabarSchool* Then you can install *MetabarSchool*
```{r eval=FALSE, echo=TRUE} ```{r eval=FALSE, echo=TRUE}
devtools::install_git("https://git.metabarcoding.org/MetabarcodingSchool/biodiversity-metrics.git") remotes::install_git('https://forge.metabarcoding.org/MetabarcodingSchool/biodiversity-metrics.git')
``` ```
You will also need the *vegan* package You will also need the *vegan* package
@@ -68,11 +77,15 @@ A 16 plants mock community
data("plants.16") data("plants.16")
x = cbind(` ` =seq_len(nrow(plants.16)),plants.16) x = cbind(` ` =seq_len(nrow(plants.16)),plants.16)
x$`Relative abundance`=paste0('1/',1/x$dilution) x$`Relative abundance`=paste0('1/',1/x$dilution)
knitr::kable(x[,-(4:5)], x[,-(4:5)] %>%
format = "html", gt() %>%
row.names = FALSE, cols_align(align = "center", columns = 1) %>%
align = "rlrr") %>% cols_align(align = "left", columns = 2) %>%
kable_styling(position = "center") cols_align(align = "right", columns = c(3, 4)) %>%
tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
## The experiment {.flexbox .vcenter} ## The experiment {.flexbox .vcenter}
@@ -100,11 +113,14 @@ data("positive.motus")
- `positive.count` read count matrix $`r nrow(positive.count)` \; PCRs \; \times \; `r ncol(positive.count)` \; MOTUs$ - `positive.count` read count matrix $`r nrow(positive.count)` \; PCRs \; \times \; `r ncol(positive.count)` \; MOTUs$
```{r} ```{r}
knitr::kable(positive.count[1:5,1:5], as.data.frame(positive.count[1:5,1:5]) %>%
format="html", gt() %>%
align = 'rc') %>% cols_align(align = "right", columns = 1) %>%
kable_styling(position = "center") %>% cols_align(align = "center", columns = 2:ncol(positive.count[1:5,1:5])) %>%
row_spec(0, angle = -45) tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
<br> <br>
@@ -126,10 +142,14 @@ data("positive.motus")
- `positive.samples` a `r nrow(positive.samples)` rows `data.frame` of `r ncol(positive.samples)` columns describing each PCR - `positive.samples` a `r nrow(positive.samples)` rows `data.frame` of `r ncol(positive.samples)` columns describing each PCR
```{r} ```{r}
knitr::kable(head(positive.samples,n=3), head(positive.samples,n=3) %>%
format="html", gt() %>%
align = 'rc') %>% cols_align(align = "right", columns = 1) %>%
kable_styling(position = "center") cols_align(align = "center", columns = 2:ncol(head(positive.samples,n=3))) %>%
tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
<br> <br>
@@ -151,10 +171,16 @@ data("positive.motus")
- `positive.motus` : a `r nrow(positive.motus)` rows `data.frame` of `r ncol(positive.motus)` columns describing each MOTU - `positive.motus` : a `r nrow(positive.motus)` rows `data.frame` of `r ncol(positive.motus)` columns describing each MOTU
```{r} ```{r}
knitr::kable(head(positive.motus,n=3), head(positive.motus,n=3) %>%
format = "html", gt() %>%
align = 'rlrc') %>% cols_align(align = "right", columns = 1) %>%
kable_styling(position = "center") cols_align(align = "left", columns = 2) %>%
cols_align(align = "right", columns = 3) %>%
cols_align(align = "center", columns = 4) %>%
tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
<br> <br>
@@ -172,10 +198,17 @@ table(colSums(positive.count) == 1)
``` ```
```{r} ```{r}
kable(t(table(colSums(positive.count) == 1)), as.data.frame(t(table(colSums(positive.count) == 1))) %>%
format = "html") %>% gt() %>%
kable_styling(position = "center") %>% cols_align(align = "center", columns = everything()) %>%
row_spec(0, align = 'c') tab_style(
style = cell_text(align = "center"),
locations = cells_column_labels()
) %>%
tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
<br> <br>
@@ -195,7 +228,7 @@ positive.motus = positive.motus[are.not.singleton,]
Despite all standardization efforts Despite all standardization efforts
```{r fig.height=3} ```{r fig.height=3}
par(bg=NA) par(bg=NA)
hist(rowSums(positive.count), hist(rowSums(positive.count),
breaks = 15, breaks = 15,
xlab="Read counts", xlab="Read counts",
@@ -209,7 +242,7 @@ Is it related to the amount of DNA in the extract ?
## What do the reading numbers per PCR mean? {.smaller} ## What do the reading numbers per PCR mean? {.smaller}
```{r echo=TRUE, fig.height=4} ```{r echo=TRUE, fig.height=4}
par(bg=NA) par(bg=NA)
boxplot(rowSums(positive.count) ~ positive.samples$dilution,log="y") boxplot(rowSums(positive.count) ~ positive.samples$dilution,log="y")
abline(h = median(rowSums(positive.count)),lw=2,col="red",lty=2) abline(h = median(rowSums(positive.count)),lw=2,col="red",lty=2)
``` ```
@@ -288,7 +321,7 @@ table(are.still.present)
## Rarefying read count (4) {.flexbox .vcenter} ## Rarefying read count (4) {.flexbox .vcenter}
```{r echo=TRUE, fig.height=3.5} ```{r echo=TRUE, fig.height=3.5}
par(bg=NA) par(bg=NA)
boxplot(colSums(positive.count) ~ are.still.present, log="y") boxplot(colSums(positive.count) ~ are.still.present, log="y")
``` ```
@@ -360,10 +393,13 @@ knitr::include_graphics("figures/alpha_diversity.svg")
E1 = c(A=0.25,B=0.25,C=0.25,D=0.25,E=0,F=0,G=0) E1 = c(A=0.25,B=0.25,C=0.25,D=0.25,E=0,F=0,G=0)
E2 = c(A=0.55,B=0.07,C=0.02,D=0.17,E=0.07,F=0.07,G=0.03) E2 = c(A=0.55,B=0.07,C=0.02,D=0.17,E=0.07,F=0.07,G=0.03)
environments = t(data.frame(`Environment 1` = E1,`Environment 2` = E2)) environments = t(data.frame(`Environment 1` = E1,`Environment 2` = E2))
kable(environments, as.data.frame(environments) %>%
format="html", gt() %>%
align = 'rr') %>% cols_align(align = "right", columns = everything()) %>%
kable_styling(position = "center") tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
## Richness {.flexbox .vcenter} ## Richness {.flexbox .vcenter}
@@ -379,10 +415,13 @@ S = rowSums(environments > 0)
``` ```
```{r} ```{r}
kable(data.frame(S=S), data.frame(S=S) %>%
format="html", gt() %>%
align = 'rr') %>% cols_align(align = "right", columns = everything()) %>%
kable_styling(position = "center") tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
## Gini-Simpson's index {.smaller} ## Gini-Simpson's index {.smaller}
@@ -414,10 +453,13 @@ GS = 1 - rowSums(environments^2)
``` ```
```{r} ```{r}
kable(data.frame(`Gini-Simpson`=GS), data.frame(`Gini-Simpson`=GS) %>%
format="html", gt() %>%
align = 'rr') %>% cols_align(align = "right", columns = everything()) %>%
kable_styling(position = "center") tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
## Shannon entropy {.smaller} ## Shannon entropy {.smaller}
@@ -443,10 +485,13 @@ H = - rowSums(environments * log(environments),na.rm = TRUE)
``` ```
```{r} ```{r}
kable(data.frame(`Shannon index`=H), data.frame(`Shannon index`=H) %>%
format="html", gt() %>%
align = 'rr') %>% cols_align(align = "right", columns = everything()) %>%
kable_styling(position = "center") tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
## Hill's number {.smaller} ## Hill's number {.smaller}
@@ -476,10 +521,17 @@ D2 = exp(- rowSums(environments * log(environments),na.rm = TRUE))
``` ```
```{r} ```{r}
kable(data.frame(`Hill Numbers`=D2), data.frame(`Hill Numbers` = D2) %>%
format="html", gt() %>%
align = 'rr') %>% cols_align(align = "center") %>%
kable_styling(position = "center") tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels()
) %>%
tab_options(
table.align = "center",
heading.align = "center"
)
``` ```
## Generalized logarithmic function {.smaller} ## Generalized logarithmic function {.smaller}
@@ -493,7 +545,7 @@ $$
The function is not defined for $q=1$ but when $q \longrightarrow 1\;,\; ^q\log(x) \longrightarrow \log(x)$ The function is not defined for $q=1$ but when $q \longrightarrow 1\;,\; ^q\log(x) \longrightarrow \log(x)$
$$ $$
^q\log(x) = \left\{ ^q\log(x) = \left\{
\begin{align} \begin{align}
\log(x),& \text{if } q = 1\\ \log(x),& \text{if } q = 1\\
\frac{x^{(1-q)}-1}{1-q},& \text{otherwise} \frac{x^{(1-q)}-1}{1-q},& \text{otherwise}
@@ -505,7 +557,7 @@ $$
log_q = function(x,q=1) { log_q = function(x,q=1) {
if (q==1) if (q==1)
log(x) log(x)
else else
(x^(1-q)-1)/(1-q) (x^(1-q)-1)/(1-q)
} }
``` ```
@@ -535,7 +587,7 @@ legend("topleft",legend = qs,fill = seq_along(qs),cex=1.5)
## And its inverse function {.flexbox .vcenter} ## And its inverse function {.flexbox .vcenter}
$$ $$
^qe^x = \left\{ ^qe^x = \left\{
\begin{align} \begin{align}
e^x,& \text{if } q = 1 \\ e^x,& \text{if } q = 1 \\
(1 + x(1-q))^{(\frac{1}{1-q})},& \text{otherwise} (1 + x(1-q))^{(\frac{1}{1-q})},& \text{otherwise}
@@ -601,14 +653,14 @@ environments.dq = apply(environments,MARGIN = 1,D_spectrum,q=qs)
```{r} ```{r}
par(mfrow=c(1,2),bg=NA) par(mfrow=c(1,2),bg=NA)
plot(qs,environments.hq[,2],type="l",col="red", plot(qs,environments.hq[,2],type="l",col="red",
xlab=TeX('$q$'), xlab=TeX('$q$'),
ylab=TeX('$^qH$'), ylab=TeX('$^qH$'),
xlim=c(-0.5,3.5), xlim=c(-0.5,3.5),
main="generalized entropy") main="generalized entropy")
points(qs,environments.hq[,1],type="l",col="blue") points(qs,environments.hq[,1],type="l",col="blue")
abline(v=c(0,1,2),lty=2,col=4:6) abline(v=c(0,1,2),lty=2,col=4:6)
plot(qs,environments.dq[,2],type="l",col="red", plot(qs,environments.dq[,2],type="l",col="red",
xlab=TeX('$q$'), xlab=TeX('$q$'),
ylab=TeX('$^qD$'), ylab=TeX('$^qD$'),
main="Hill's number") main="Hill's number")
@@ -657,7 +709,7 @@ plot(qs,H.mock,type="l",
xlim=c(-0.5,3.5), xlim=c(-0.5,3.5),
main="generalized entropy") main="generalized entropy")
abline(v=c(0,1,2),lty=2,col=4:6) abline(v=c(0,1,2),lty=2,col=4:6)
plot(qs,D.mock,type="l", plot(qs,D.mock,type="l",
xlab=TeX('$q$'), xlab=TeX('$q$'),
ylab=TeX('$^qD$'), ylab=TeX('$^qD$'),
main="Hill's number") main="Hill's number")
@@ -674,7 +726,7 @@ positive.H = apply(positive.count.relfreq,
``` ```
```{r} ```{r}
par(bg=NA) par(bg=NA)
boxplot(t(positive.H), boxplot(t(positive.H),
xlab=TeX('$q$'), xlab=TeX('$q$'),
ylab=TeX('$^qH$'), ylab=TeX('$^qH$'),
@@ -685,7 +737,7 @@ points(H.mock,col="red",type="l")
## Biodiversity spectrum and metabarcoding (2) {.flexbox .vcenter .smaller} ## Biodiversity spectrum and metabarcoding (2) {.flexbox .vcenter .smaller}
```{r} ```{r}
par(bg=NA) par(bg=NA)
boxplot(t(positive.H)[,11:31], boxplot(t(positive.H)[,11:31],
xlab=TeX('$q$'), xlab=TeX('$q$'),
ylab=TeX('$^qH$'), ylab=TeX('$^qH$'),
@@ -706,7 +758,7 @@ positive.D = apply(positive.count.relfreq,
``` ```
```{r} ```{r}
par(bg=NA) par(bg=NA)
boxplot(t(positive.D), boxplot(t(positive.D),
xlab=TeX('$q$'), xlab=TeX('$q$'),
ylab=TeX('$^qD$'), ylab=TeX('$^qD$'),
@@ -753,7 +805,7 @@ positive.clean.H = apply(positive.clean.count.relfreq,
``` ```
```{r fig.height=3.5} ```{r fig.height=3.5}
par(bg=NA) par(bg=NA)
boxplot(t(positive.clean.H), boxplot(t(positive.clean.H),
xlab=TeX('$q$'), xlab=TeX('$q$'),
ylab=TeX('$^qH$'), ylab=TeX('$^qH$'),
@@ -771,7 +823,7 @@ positive.clean.D = apply(positive.clean.count.relfreq,
``` ```
```{r} ```{r}
par(bg=NA) par(bg=NA)
boxplot(t(positive.clean.D), boxplot(t(positive.clean.D),
xlab=TeX('$q$'), xlab=TeX('$q$'),
ylab=TeX('$^qD$'), ylab=TeX('$^qD$'),
@@ -1069,7 +1121,7 @@ BC_{jk}=\frac{\sum _{i=1}^{p}(N_{ij} - min(N_{ij},N_{ik}) + (N_{ik} - min(N_{ij}
$$ $$
$$ $$
BC_{jk}=\frac{\sum _{i=1}^{p}|N_{ij} - N_{ik}|}{\sum _{i=1}^{p}N_{ij}+\sum _{i=1}^{p}N_{ik}} BC_{jk}=\frac{\sum _{i=1}^{p}|N_{ij} - N_{ik}|}{\sum _{i=1}^{p}N_{ij}+\sum _{i=1}^{p}N_{ik}}
$$ $$
$$ $$
@@ -1159,7 +1211,7 @@ legend("topleft",legend = levels(samples.type),fill = 1:4,cex=1.2)
````{=html} ````{=html}
<!--- <!---
## Computation of norms ## Computation of norms
```{r guiana_norm, echo=TRUE} ```{r guiana_norm, echo=TRUE}
guiana.n1.dist = norm(guiana.relfreq.final,l=1) guiana.n1.dist = norm(guiana.relfreq.final,l=1)
@@ -1168,7 +1220,7 @@ guiana.n3.dist = norm(guiana.relfreq.final^(1/3),l=3)
guiana.n4.dist = norm(guiana.relfreq.final^(1/100),l=100) guiana.n4.dist = norm(guiana.relfreq.final^(1/100),l=100)
``` ```
## pCoA on norms ## pCoA on norms
```{r dependson="guiana_norm"} ```{r dependson="guiana_norm"}
guiana.n1.pcoa = cmdscale(guiana.n1.dist,k=3,eig = TRUE) guiana.n1.pcoa = cmdscale(guiana.n1.dist,k=3,eig = TRUE)