# OBIJupyterHub - the DNA Metabarcoding Learning Server

## Intended use

This project packages the MetabarcodingSchool training lab into one reproducible bundle. You get Python, R, and Bash kernels, a Quarto-built course website, and preconfigured admin/student accounts, so onboarding a class is a single command instead of a day of setup. Everything runs locally on a single machine, student work persists between sessions, and `./start-jupyterhub.sh` takes care of building images, rendering the site, preparing volumes, and bringing JupyterHub up at `http://localhost:8888`. Defaults (accounts, passwords, volumes) live in the repo so instructors can tweak them quickly.

## Prerequisites (with quick checks)

You only need **Docker and Docker Compose** on the machine that will host the lab. All other tools (Quarto, Hugo, Python, R) are provided via a builder Docker image and do not need to be installed on your system.
- macOS: install [OrbStack](https://orbstack.dev/) (recommended) or Docker Desktop; both ship Docker Engine and Compose.
- Linux: install Docker Engine and the Compose plugin from your distribution (e.g., `sudo apt install docker.io docker-compose-plugin`) or from Docker's official packages.
- Windows: install Docker Desktop with the WSL2 backend enabled.

Verify from a terminal:

```bash
docker --version
docker compose version # or: docker-compose --version
```
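
To run both checks in one go, a small preflight script along these lines may help. It is a sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and the script only confirms that the commands exist, not that the Docker daemon is running.

```bash
# Hypothetical helper (not part of the repository):
# print the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

if [ -n "$(missing_tools docker)" ]; then
    echo "Docker is not installed or not on PATH"
elif docker compose version >/dev/null 2>&1 || docker-compose --version >/dev/null 2>&1; then
    # Compose may be the plugin ("docker compose") or the standalone binary.
    echo "Docker and Compose are available"
else
    echo "Docker found, but Compose is missing"
fi
```
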
## How the startup script works
`./start-jupyterhub.sh` is the single entry point. It builds the Docker images, renders the course website, prepares the volume folders, and starts the stack. Internally it:
- creates the `jupyterhub_volumes/` tree (caddy, course, shared, users, web...)
- builds the `obijupyterhub-builder` image (contains Quarto, Hugo, R, Python) if not already present
- builds `jupyterhub-student` and `jupyterhub-hub` images
- detects R package dependencies from Quarto files using the `{attachment}` package and installs them automatically
- renders the Quarto site from `web_src/`, generates PDF galleries and `pages.json`, and copies everything into `jupyterhub_volumes/web/`
- runs `docker-compose up -d --remove-orphans`
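
The first step in the list above can be pictured as follows. This is a sketch rather than the script's actual code: `prepare_volumes` is a hypothetical function, and it creates only the sub-folders named above (the real tree contains more, as the ellipsis suggests). The demo runs in a temporary directory so it does not touch a real checkout.

```bash
# Hypothetical sketch: create the host-side volume folders under a root
# directory (default: jupyterhub_volumes). Only the folder names listed
# in this README are used; the real script may create more.
prepare_volumes() {
    root="${1:-jupyterhub_volumes}"
    for sub in caddy course shared users web; do
        mkdir -p "$root/$sub"   # -p makes the call safe to repeat
    done
}

# Demo in a throwaway directory
demo_root="$(mktemp -d)/jupyterhub_volumes"
prepare_volumes "$demo_root"
ls "$demo_root"
```

Because `mkdir -p` succeeds on existing directories, rerunning the function leaves existing content untouched.
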
### Builder image
The builder image (`obijupyterhub-builder`) contains all the tools needed to prepare the course materials:
- **Quarto** for rendering the course website
- **Hugo** for building the obidoc documentation
- **R** with the `{attachment}` package for automatic dependency detection
- **Python 3** for utility scripts

This means you don't need to install any of these tools on your host system. The script automatically builds this image on first run and reuses it for subsequent builds. Use `--force-rebuild` to rebuild the builder image if needed.
### R package caching for builds
R packages required by your Quarto documents are automatically detected and installed during the build process. These packages are cached in `jupyterhub_volumes/builder/R_packages/` so they persist across builds. This means:
- **First build**: All R packages used in your `.qmd` files are detected and installed (may take some time)
- **Subsequent builds**: Only missing packages are installed, making builds much faster
- **Adding new packages**: Simply use `library(newpackage)` in your Quarto files; the build process will detect and install it automatically

To clear the R package cache and force a fresh installation, delete the `jupyterhub_volumes/builder/R_packages/` directory.
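
To see what is currently cached, a helper like the following could be used. It assumes the cache directory is laid out as a plain R library, with one sub-directory per installed package; `cached_r_packages` is a hypothetical name, not a tool shipped with the project.

```bash
# Hypothetical helper: print the names of the packages found in an
# R library directory (assumed layout: one sub-directory per package).
cached_r_packages() {
    lib="${1:-jupyterhub_volumes/builder/R_packages}"
    [ -d "$lib" ] || return 0          # nothing cached yet
    for pkg in "$lib"/*/; do
        if [ -d "$pkg" ]; then         # skips the unexpanded glob when empty
            basename "$pkg"
        fi
    done
}

# Run from the repository root; prints nothing if the cache does not exist.
cached_r_packages
```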
### Command-line flags
You can tailor what `./start-jupyterhub.sh` does with a few flags:
- `--no-build` (or `--offline`): skip Docker image builds and reuse existing images (useful when offline).
- `--force-rebuild`: rebuild images without cache.
- `--stop-server`: stop the stack and remove student containers, then exit.
- `--update-lectures`: rebuild the course website only (no Docker stop/start).
- `--build-obidoc`: force rebuilding the obidoc documentation (auto-built if empty; skipped in offline mode).
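
As an illustration of how these options relate to each other, here is a hedged sketch of a flag parser. It is not the code of `start-jupyterhub.sh`; the function and variable names are invented for the example, and only the flag names come from the list above.

```bash
# Hypothetical flag parser: sets one variable per documented option.
parse_flags() {
    NO_BUILD=0 FORCE_REBUILD=0 STOP_SERVER=0 UPDATE_LECTURES=0 BUILD_OBIDOC=0
    for arg in "$@"; do
        case "$arg" in
            --no-build|--offline) NO_BUILD=1 ;;         # two spellings, same effect
            --force-rebuild)      FORCE_REBUILD=1 ;;
            --stop-server)        STOP_SERVER=1 ;;
            --update-lectures)    UPDATE_LECTURES=1 ;;
            --build-obidoc)       BUILD_OBIDOC=1 ;;
            *) echo "Unknown option: $arg" >&2; return 1 ;;
        esac
    done
}

parse_flags --offline --build-obidoc
echo "no_build=$NO_BUILD build_obidoc=$BUILD_OBIDOC"
```

Treating `--offline` as an alias of `--no-build` in a single `case` branch keeps the two spellings from drifting apart.
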
## Installation and first run
1) Clone the project:
```bash
git clone https://forge.metabarcoding.org/MetabarcodingSchool/OBIJupyterHub.git
cd OBIJupyterHub
```

2) (Optional) glance at the structure you'll populate:

```
OBIJupyterHub
├── start-jupyterhub.sh        - single entry point (build + render + start)
├── obijupyterhub              - Docker images and stack definitions
│   ├── docker-compose.yml
│   ├── Dockerfile
│   ├── Dockerfile.hub
│   └── jupyterhub_config.py
├── jupyterhub_volumes         - data persisted on the host
│   ├── course                 - read-only for students (notebooks, data, bin, R packages)
│   ├── shared                 - shared read/write space for everyone
│   ├── users                  - per-user persistent data
│   └── web                    - rendered course website
└── web_src                    - Quarto sources for the course website
```
Note: The `obijupyterhub/` directory also contains `Dockerfile.builder`, which provides the build environment. The `tools/` directory contains utility scripts, including `install_quarto_deps.R` for automatic R dependency detection, and `jupyterhub_volumes/builder/` stores cached R packages for faster builds.

3) Prepare course materials (optional before first run):
- Put notebooks, datasets, scripts, binaries, or PDFs for students under `jupyterhub_volumes/course/`. They will appear read-only at `/home/jovyan/work/course/`.
- For collaborative work, drop files in `jupyterhub_volumes/shared/` (read/write for all at `/home/jovyan/work/shared/`).
- Edit or add Quarto sources in `web_src/` to update the course website; the script will render them.
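
Depositing course material is a plain file copy on the host. The helper below is a sketch under that assumption; `deposit_course_files`, the `notebooks` sub-folder, and the file name are all hypothetical, and the demo works in a temporary directory rather than a real checkout.

```bash
# Hypothetical helper: copy files into a sub-folder of the course volume.
# Usage: deposit_course_files <volumes_root> <subfolder> <file>...
deposit_course_files() {
    dest="$1/course/$2"
    shift 2
    mkdir -p "$dest"
    cp -- "$@" "$dest/"
}

# Demo with a placeholder file in a throwaway directory. In a real
# checkout the root would be jupyterhub_volumes, and students would see
# the file under /home/jovyan/work/course/notebooks/.
demo="$(mktemp -d)"
echo "placeholder" > "$demo/intro.ipynb"
deposit_course_files "$demo/jupyterhub_volumes" notebooks "$demo/intro.ipynb"
ls "$demo/jupyterhub_volumes/course/notebooks"
```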

4) Start everything (build + render + launch):

```bash
./start-jupyterhub.sh
```

5) Access JupyterHub in a browser at `http://localhost:8888`.

6) Stop the stack when you're done (run from `obijupyterhub/`):

```bash
docker-compose down
```

### Operating the stack (one command, a few options)

- Start or rebuild: `./start-jupyterhub.sh` (rebuilds images, regenerates the website, starts the stack).
- Start without rebuilding images (offline): `./start-jupyterhub.sh --no-build`
- Force rebuild without cache: `./start-jupyterhub.sh --force-rebuild`
- Stop only: `./start-jupyterhub.sh --stop-server`
- Rebuild website only (no Docker stop/start): `./start-jupyterhub.sh --update-lectures`
- Rebuild obidoc docs: `./start-jupyterhub.sh --build-obidoc` (also builds automatically if `jupyterhub_volumes/web/obidoc` is empty; skipped in offline mode)
- Access at `http://localhost:8888` (students: any username / password `metabar2025`; admin: `admin` / `admin2025`).
- Check logs from `obijupyterhub/` with `docker-compose logs -f jupyterhub`.
- Stop with `docker-compose down` (from `obijupyterhub/`). Rerun `./start-jupyterhub.sh` to start again or after config changes.

## Managing shared data

Each student lands in `/home/jovyan/work/` with three key areas: their own files, a shared space, and a read-only course space. Everything under `work/` is persisted on the host in `jupyterhub_volumes`.
```
work/                          # Personal workspace root (persistent)
├── [student files]            # Their own files and notebooks
├── R_packages/                # Personal R packages (writable by student)
├── shared/                    # Shared workspace (read/write, shared with all)
└── course/                    # Course files (read-only, managed by admin)
    ├── R_packages/            # Shared R packages (read-only, installed by prof)
    ├── bin/                   # Shared executables (in PATH)
    └── [course materials]     # Your course files
```
R looks for packages in this order: personal `work/R_packages/`, then shared `work/course/R_packages/`, then system libraries. Because everything lives under `work/`, student files survive restarts.

### User Accounts

Defaults are defined in `obijupyterhub/docker-compose.yml`: admin (`admin` / `admin2025`) with write access to `course/`, and students (any username, password `metabar2025`) with read-only access to `course/`. Adjust `JUPYTERHUB_ADMIN_PASSWORD` and `JUPYTERHUB_PASSWORD` there, then rerun `./start-jupyterhub.sh`.

### Installing R Packages (Admin Only)

From the host, install shared R packages into `course/R_packages/`:
``` bash
# Install packages
tools/install_packages.sh reshape2 plotly knitr
```
Students can install their own packages into their personal `work/R_packages/`:
```r
# Install in personal library (each student has their own)
install.packages('mypackage') # Will install in work/R_packages/
```

### Using R Packages (Students)

Students simply load packages normally:

``` r
library(reshape2) # R checks: 1) work/R_packages/ 2) work/course/R_packages/ 3) system
library(plotly)
```
R automatically searches in this order:
1. Personal packages: `/home/jovyan/work/R_packages/` (R_LIBS_USER)
1. Prof packages: `/home/jovyan/work/course/R_packages/` (R_LIBS_SITE)
1. System packages

### List Available Packages

``` r
# List all available packages (personal + course + system)
installed.packages()[,"Package"]

# Check personal packages
list.files("/home/jovyan/work/R_packages")

# Check course packages (installed by prof)
list.files("/home/jovyan/work/course/R_packages")
```

### Deposit or retrieve course and student files

On the host, place course files in `jupyterhub_volumes/course/` (they appear read-only to students), shared files in `jupyterhub_volumes/shared/`, and collect student work from `jupyterhub_volumes/users/`.
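
Collecting student work can be as simple as archiving the `users/` folder. The following sketch assumes one sub-folder per student under `jupyterhub_volumes/users/`; `archive_student_work` is a hypothetical helper, and depending on file ownership inside the volumes you may need elevated permissions to read every file.

```bash
# Hypothetical helper: write a timestamped tar.gz of the per-user
# folders and print the path of the archive.
# Usage: archive_student_work [users_dir] [output_dir]
archive_student_work() {
    users_dir="${1:-jupyterhub_volumes/users}"
    out_dir="${2:-.}"
    archive="$out_dir/student-work-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps the paths inside the archive relative to the users folder.
    tar -czf "$archive" -C "$users_dir" . && printf '%s\n' "$archive"
}

# Demo in a throwaway directory with one fake student
demo="$(mktemp -d)"
mkdir -p "$demo/users/student1" "$demo/out"
echo "print('hello')" > "$demo/users/student1/test.py"
archive_student_work "$demo/users" "$demo/out"
```

Writing the archive outside the `users/` folder avoids including the archive in itself.
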

## User Management

### Option 1: Predefined User List

In `jupyterhub_config.py`, uncomment and modify:
``` python
c.Authenticator.allowed_users = {'student1', 'student2', 'student3'}
```

### Option 2: Allow Everyone (for testing)

By default, the configuration allows any user:
``` python
c.Authenticator.allow_all = True
```

⚠️ **Warning**: DummyAuthenticator is ONLY for local testing!

## Kernel Verification

Once logged in, create a new notebook and verify you have access to:
- **Python 3** (default kernel)
- **R** (R kernel)
- **Bash** (bash kernel)

## Customization for Your Labs

### Add Additional R Packages

Modify the `Dockerfile` (before `USER ${NB_UID}`):
``` dockerfile
RUN R -e "install.packages(c('your_package'), repos='https://cran.rstudio.com/')"
```
Then rerun `./start-jupyterhub.sh` to rebuild and restart.

### Add Python Packages

Add to the `Dockerfile` (before `USER ${NB_UID}`):
``` dockerfile
RUN pip install numpy pandas matplotlib seaborn
```
Then rerun `./start-jupyterhub.sh` to rebuild and restart.

### Change Port (if the default port is occupied)

Modify in `docker-compose.yml`:
``` yaml
ports:
  - "8001:8000" # Accessible on localhost:8001
```

## Advantages of This Approach

**Everything in Docker**: No need to install Python/JupyterHub on your computer\
**Portable**: Easy to deploy on another server\
**Isolated**: No pollution of your system environment\
**Easy to Clean**: A simple `docker-compose down` is enough\
**Reproducible**: Students will have exactly the same environment

## Troubleshooting

- Docker daemon unavailable: make sure OrbStack/Docker Desktop/daemon is running; verify `/var/run/docker.sock` exists.
- Student containers do not start: check `docker-compose logs jupyterhub` and confirm the images exist with `docker images | grep jupyterhub-student`.
- Port conflict: change the published port in `docker-compose.yml`.

**I want to start from scratch**:

``` bash
pushd obijupyterhub
docker-compose down -v
docker rmi jupyterhub-hub jupyterhub-student obijupyterhub-builder
popd

# Optionally clear the R package cache
rm -rf jupyterhub_volumes/builder/R_packages

# Then rebuild everything
./start-jupyterhub.sh
```