14 Commits

Author SHA1 Message Date
Eric Coissac
021e2a3919 A pretty jupyter login 2025-11-21 14:40:46 +01:00
Eric Coissac
30b7175702 Make cleaning 2025-11-17 14:18:13 +01:00
Eric Coissac
35d275c104 Add pedagogic content 2025-11-16 14:55:30 +01:00
Eric Coissac
78156a8c95 User volumes bug 2025-11-16 14:55:30 +01:00
Eric Coissac
0eae496a94 Downsize the student image 2025-11-05 17:28:55 +01:00
Eric Coissac
d49592c5d4 Evolution of Unix course 2025-10-31 19:56:10 +01:00
Eric Coissac
1d0db893f1 Unix lecture - translation 2025-10-31 14:20:37 +01:00
cc0a7d446d Merge pull request 'First complete version' (#4) from push-qsttonrunzsp into master
Reviewed-on: #4
2025-10-16 18:49:32 +00:00
Eric Coissac
02b48e75fa First complete version 2025-10-16 20:48:35 +02:00
57bf9934a3 Merge pull request 'Add a sftpgo server and a web server' (#3) from push-xmzwrlqxnvns into master
Reviewed-on: #3
2025-10-15 23:03:39 +00:00
Eric Coissac
a3608759c5 Add the sftpgo server and a web server 2025-10-16 01:02:07 +02:00
ae77f71b6c Merge pull request 'Correction of the doc' (#2) from push-zkpmmkmzmzwl into master
Reviewed-on: #2
2025-10-15 13:37:49 +00:00
Eric Coissac
362720d93d Correction of the doc 2025-10-15 15:16:42 +02:00
6b49ae48a3 Merge pull request 'push-svpkkstsnzoy' (#1) from push-svpkkstsnzoy into push-qlwkxzrrwlkv
Reviewed-on: #1
2025-10-15 13:00:00 +00:00
437 changed files with 1256224 additions and 304 deletions

15
.gitignore vendored Normal file

@@ -0,0 +1,15 @@
/Affinity
/jupyterhub_volumes/users
/jupyterhub_volumes/shared
/jupyterhub_volumes/jupyterhub
/jupyterhub_volumes/caddy
/jupyterhub_volumes/course/data/Genbank
/**/.DS_Store
/web_src/**/*.RData
/web_src/**/*.pdf
/web_src/**/*_files
/web_src/**/*_cache
/.luarc.json
/sandbox
*.log
ncbitaxo_*

View File

@@ -0,0 +1 @@
{"rule":"MORFOLOGIK_RULE_EN_US","sentence":"^\\QA password that secures access to the system\nA user number or UID (User IDentifier) that identifies the user on the machine\nA location on the hard disk to store the user's files, called Home or home\nA group of users, which allows working in groups on the machine (see later)\\E$"}

3
.vscode/settings.json vendored Normal file

@@ -0,0 +1,3 @@
{
"git.enabled": false
}

View File

@@ -1,49 +0,0 @@
# ---------- Stage 1 : builder ----------
FROM jupyter/base-notebook:latest AS builder
USER root
# Install system dependencies for R, build tools and Go/Rust
RUN apt-get update && apt-get install -y \
r-base r-base-dev \
libcurl4-openssl-dev libssl-dev libxml2-dev \
build-essential git curl \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
# Install R kernel + useful packages
RUN R -e "install.packages('IRkernel', repos='http://cran.rstudio.com/')" && \
R -e "IRkernel::installspec(user = FALSE)" && \
R -e "install.packages(c('tidyverse','vegan','ade4'), repos='http://cran.rstudio.com/')"
# Install bash kernel
RUN pip install bash_kernel && python -m bash_kernel.install --sys-prefix
# Install obitools4
RUN curl -L https://raw.githubusercontent.com/metabarcoding/obitools4/master/install_obitools.sh | bash
# Install csvkit
RUN pip install csvkit
# Install csvlens via Rust
RUN curl https://sh.rustup.rs -sSf | bash -s -- -y && \
. $HOME/.cargo/env && \
cargo install csvlens
RUN apt-get update && apt-get install -y ruby ruby-dev build-essential \
&& gem install youplot
# Copy csvlens to /usr/local/bin for final use
RUN cp $HOME/.cargo/bin/csvlens /usr/local/bin/
# Set permissions for Jupyter user
RUN mkdir -p /home/${NB_USER}/.local/share/jupyter && \
chown -R ${NB_UID}:${NB_GID} /home/${NB_USER}
# Switch back to Jupyter user
USER ${NB_UID}:${NB_GID}
WORKDIR /home/${NB_USER}/work
# Environment variables
ENV PATH="/home/${NB_USER}/work/course/bin:${PATH}"
ENV R_LIBS_USER="/home/${NB_USER}/work/R_packages"
ENV R_LIBS_SITE="/home/${NB_USER}/work/course/R_packages:/usr/local/lib/R/site-library:/usr/lib/R/site-library"

285
Readme.md

@@ -1,96 +1,127 @@
# JupyterHub Configuration with OrbStack on Mac (all in Docker)
## Prerequisites
- OrbStack installed and running
## File Structure
You must have docker running on your computer
Your `~/jupyterhub-tp` directory should contain:
```
~/jupyterhub-tp/
├── Dockerfile # Image for students (already created)
├── Dockerfile.hub # Image for JupyterHub (new)
├── jupyterhub_config.py # Configuration
├── docker-compose.yml # Orchestration
└── start-jupyterhub.sh # Startup script
```
- On macOS, [OrbStack](https://orbstack.dev/ "A Docker implementation optimised for macOS") is recommended
## Installation Steps
### 1. Create Directory Structure
```bash
mkdir -p ~/jupyterhub-tp
cd ~/jupyterhub-tp
git clone https://forge.metabarcoding.org/MetabarcodingSchool/OBIJupyterHub.git
```
### 2. Create All Necessary Files
Create the following files with the content from artifacts:
- `Dockerfile` (artifact "Dockerfile for JupyterHub with R and Bash")
- `Dockerfile.hub` (artifact "Dockerfile for JupyterHub container")
- `jupyterhub_config.py` (artifact "JupyterHub Configuration")
- `docker-compose.yml` (artifact "docker-compose.yml")
- `start-jupyterhub.sh` (artifact "start-jupyterhub.sh")
### 3. Make Startup Script Executable
Move into the `OBIJupyterHub` directory:
```bash
chmod +x start-jupyterhub.sh
cd OBIJupyterHub
```
### 4. Start JupyterHub
#### File Structure
```bash
Your `OBIJupyterHub` directory should contain:
```
OBIJupyterHub
├── start-jupyterhub.sh - The script used to setup and start the server
├── obijupyterhub - The files describing the docker images and the stack
│   ├── Caddyfile
│   ├── docker-compose.yml
│   ├── Dockerfile
│   ├── Dockerfile.hub
│   ├── jupyterhub_config.py
│   ├── sftpgo_config.json
│   └── start-notebook.sh
├── jupyterhub_volumes - The directory containing the docker volumes
│   ├── caddy
│   ├── course - Read only volume mounted on every student container
│   │   ├── bin
│   │   └── R_packages
│   ├── jupyterhub
│   ├── shared - Read write volume shared in every student container
│   ├── users
│   └── web
│   ├── img
│   │   └── welcome_metabar.webp
│   ├── index.html
│   └── pages
├── Readme.md - This documentation
├── tools
│   ├── generate_pages_json.py
│   └── install_packages.sh
└─── web_src - The quarto document sources used to build the web site
   ├── _output
   ├── _quarto.yml
   ├── 00_home.qmd
   ├── lectures
   │   └── computers
   │   └── regex
   │   ├── lecture_regex.qmd
   │   ├── slides_regex.qmd
   │   └── slides.css
   └── scripts
   └── copy-to-web.sh
```
### 2. Start JupyterHub
From the terminal, in the `OBIJupyterHub` directory, run the following command:
``` bash
./start-jupyterhub.sh
```
### 5. Access JupyterHub
### 3. Access JupyterHub
Open your browser and go to: **http://localhost:8888**
You can log in with any username and password: `metabar2025`
You can log in as a student with any username and the shared password `metabar2025`.
## Useful Commands
### View JupyterHub logs
```bash
``` bash
cd obijupyterhub
docker-compose logs -f jupyterhub
```
### View all containers (hub + students)
```bash
docker ps
``` bash
docker ps | grep jupyterhub
```
### Stop JupyterHub
```bash
``` bash
cd obijupyterhub
docker-compose down
```
### Restart JupyterHub (after config modification)
```bash
docker-compose restart jupyterhub
```
### Rebuild after Dockerfile modification
```bash
# For student image
docker build -t jupyterhub-student:latest -f Dockerfile .
``` bash
cd obijupyterhub
docker-compose restart jupyterhub
# For hub image
docker-compose up -d --build
```
### View logs for a specific student
```bash
docker logs jupyter-username
``` bash
docker logs jupyter-<username>
```
Replace `<username>` with the student's actual username.
### Clean up after lab
```bash
``` bash
# Stop and remove all containers
cd obijupyterhub
docker-compose down
# Remove student containers
@@ -110,8 +141,9 @@ docker volume prune -f
### Directory Structure for Each Student
Each student will see this directory structure in their JupyterLab (everything under `work/` is persistent):
```
work/ # Personal workspace root (persistent)
```
work/ # Personal workspace root (persistent)
├── [student files] # Their own files and notebooks
├── R_packages/ # Personal R packages (writable by student)
├── shared/ # Shared workspace (read/write, shared with all)
@@ -121,56 +153,38 @@ work/ # Personal workspace root (persistent)
└── [course materials] # Your course files
```
**R Package Priority:**
1. R checks `work/R_packages/` first (personal, writable)
2. Then `work/course/R_packages/` (shared, read-only, installed by prof)
3. Then system libraries
**R Package Priority:**
1. R checks `work/R_packages/` first (personal, writable)
1. Then `work/course/R_packages/` (shared, read-only, installed by prof)
1. Then system libraries
**Important:** Everything is under `work/`, so all student files are automatically saved in their persistent volume.
### User Accounts
**Admin Account:**
- Username: `admin`
- Password: `admin2025` (change in docker-compose.yml: `JUPYTERHUB_ADMIN_PASSWORD`)
**Admin Account:**
- Username: `admin`
- Password: `admin2025` (change in docker-compose.yml: `JUPYTERHUB_ADMIN_PASSWORD`)
- Can write to `course/` directory
**Student Accounts:**
- Username: any name
- Password: `metabar2025` (change in docker-compose.yml: `JUPYTERHUB_PASSWORD`)
**Student Accounts:**
- Username: any name
- Password: `metabar2025` (change in docker-compose.yml: `JUPYTERHUB_PASSWORD`)
- Read-only access to `course/` directory
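The login rule described above can be sketched as a plain function. This is only an illustration, assuming the hub compares the submitted password with the two environment variables named above; the helper `check_login` and its return values are hypothetical and are not taken from `jupyterhub_config.py`.

```python
import os


def check_login(username, password, env=None):
    """Return "admin", "student" or None for a login attempt (hypothetical helper)."""
    env = os.environ if env is None else env
    # Defaults mirror the passwords documented above; the real values
    # are set in docker-compose.yml.
    admin_pw = env.get("JUPYTERHUB_ADMIN_PASSWORD", "admin2025")
    student_pw = env.get("JUPYTERHUB_PASSWORD", "metabar2025")
    if username == "admin":
        return "admin" if password == admin_pw else None
    # Any other non-empty username is accepted with the shared student password.
    return "student" if username and password == student_pw else None
```

With the default passwords, `check_login("alice", "metabar2025", {})` is accepted as a student, while the same password is rejected for `admin`.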
### Installing R Packages (Admin Only)
**From your Mac (recommended):**
```bash
chmod +x install-r-packages-admin.sh
``` bash
# Install packages
./install-r-packages-admin.sh reshape2 plotly knitr
tools/install_packages.sh reshape2 plotly knitr
```
This script:
- Installs packages in the `course/R_packages/` directory
- All students can use them (read-only)
- No need to rebuild the image
**From admin notebook:**
Login as `admin` and create an R notebook:
```r
# Install packages in course/R_packages (admin only, available to all students)
course_lib <- "/home/jovyan/work/course/R_packages"
dir.create(course_lib, recursive = TRUE, showWarnings = FALSE)
install.packages(c('reshape2', 'plotly', 'knitr'),
lib = course_lib,
repos = 'http://cran.rstudio.com/')
```
Note: Admin account has write access to the course directory.
This script:
- Installs packages in the `course/R_packages/` directory
- All students can use them (read-only)
- No need to rebuild the image
**Students can also install their own packages:**
@@ -178,25 +192,27 @@ Students can install packages in their personal `work/R_packages/`:
```r
# Install in personal library (each student has their own)
install.packages(c('mypackage')) # Will install in work/R_packages/
install.packages('mypackage') # Will install in work/R_packages/
```
### Using R Packages (Students)
Students simply load packages normally:
```r
``` r
library(reshape2) # R checks: 1) work/R_packages/ 2) work/course/R_packages/ 3) system
library(plotly)
```
R automatically searches in this order:
1. Personal packages: `/home/jovyan/work/R_packages/` (R_LIBS_USER)
2. Prof packages: `/home/jovyan/work/course/R_packages/` (R_LIBS_SITE)
3. System packages
R automatically searches in this order:
1. Personal packages: `/home/jovyan/work/R_packages/` (R_LIBS_USER)
1. Prof packages: `/home/jovyan/work/course/R_packages/` (R_LIBS_SITE)
1. System packages
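The search order can be pictured as a "first match wins" lookup over an ordered list of library directories. The sketch below is an illustration in Python, not R's actual implementation; `find_package` is a hypothetical helper, and temporary directories stand in for the real locations.

```python
import os
import tempfile


def find_package(name, lib_paths):
    """Return the first library directory that contains `name`, or None."""
    for lib in lib_paths:
        if os.path.isdir(os.path.join(lib, name)):
            return lib
    return None


# Temporary directories stand in for the personal and course libraries.
with tempfile.TemporaryDirectory() as root:
    personal = os.path.join(root, "work", "R_packages")          # role of R_LIBS_USER
    course = os.path.join(root, "work", "course", "R_packages")  # role of R_LIBS_SITE
    os.makedirs(os.path.join(personal, "plotly"))
    os.makedirs(os.path.join(course, "plotly"))
    os.makedirs(os.path.join(course, "reshape2"))
    # A package present in both places resolves to the personal copy.
    found_plotly = find_package("plotly", [personal, course]) == personal
    # A package only installed by the prof resolves to the course copy.
    found_reshape2 = find_package("reshape2", [personal, course]) == course
```

A student who installs their own version of a course package therefore shadows the course version, which is the intent of putting the personal library first.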
### List Available Packages
```r
``` r
# List all available packages (personal + course + system)
installed.packages()[,"Package"]
@@ -211,7 +227,7 @@ list.files("/home/jovyan/work/course/R_packages")
To put files in the `course/` directory (accessible read-only):
```bash
``` bash
# Create a temporary directory
mkdir -p ~/jupyterhub-tp/course-files
@@ -226,34 +242,21 @@ docker run --rm \
alpine sh -c "cp -r /source/* /target/"
```
### Access Shared Files Between Students
Students can collaborate via the `shared/` directory:
```python
# In a notebook, to read a shared file
import pandas as pd
df = pd.read_csv('/home/jovyan/work/shared/group_data.csv')
# To write a shared file
df.to_csv('/home/jovyan/work/shared/alice_results.csv')
```
### Retrieve Student Work
```bash
``` bash
# List user volumes
docker volume ls | grep jupyterhub-user
docker volume ls | grep 'obijupyterhub_user-'
# Copy files from a specific student
docker run --rm \
-v jupyterhub-user-alice:/source \
-v obijupyterhub_user-alice:/source \
-v ~/submissions:/target \
alpine sh -c "mkdir -p /target/alice && cp -r /source/* /target/alice/"
# Copy all shared work
docker run --rm \
-v jupyterhub-shared:/source \
-v obijupyterhub_shared:/source \
-v ~/submissions/shared:/target \
alpine sh -c "cp -r /source/* /target/"
```
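To collect the work of several students, the same `docker run` call can be generated per username. The sketch below only builds the argument list; it assumes the `obijupyterhub_user-<name>` volume naming visible in the commands above, and `copy_volume_command` is a hypothetical helper that is not part of the repository.

```python
import os


def copy_volume_command(student, dest_root="~/submissions"):
    """Build the `docker run` argument list that copies one student's volume."""
    volume = f"obijupyterhub_user-{student}"  # naming scheme assumed from the commands above
    target = os.path.expanduser(dest_root)    # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/source",
        "-v", f"{target}:/target",
        "alpine", "sh", "-c",
        # Create the per-student directory before copying into it.
        f"mkdir -p /target/{student} && cp -r /source/. /target/{student}/",
    ]
```

Each list can then be passed to `subprocess.run(cmd, check=True)` in a loop over the usernames.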
@@ -261,14 +264,18 @@ docker run --rm \
## User Management
### Option 1: Predefined User List
In `jupyterhub_config.py`, uncomment and modify:
```python
``` python
c.Authenticator.allowed_users = {'student1', 'student2', 'student3'}
```
### Option 2: Allow Everyone (for testing)
By default, the configuration allows any user:
```python
``` python
c.Authenticator.allow_all = True
```
@@ -276,75 +283,87 @@ c.Authenticator.allow_all = True
## Kernel Verification
Once logged in, create a new notebook and verify you have access to:
- **Python 3** (default kernel)
- **R** (R kernel)
Once logged in, create a new notebook and verify you have access to:
- **Python 3** (default kernel)
- **R** (R kernel)
- **Bash** (bash kernel)
## Customization for Your Labs
### Add Additional R Packages
Modify the `Dockerfile` (before `USER ${NB_UID}`):
```dockerfile
``` dockerfile
RUN R -e "install.packages(c('your_package'), repos='http://cran.rstudio.com/')"
```
Then rebuild:
Then restart the server (it rebuilds the images if needed):
```bash
docker build -t jupyterhub-student:latest -f Dockerfile .
docker-compose restart jupyterhub
./start-jupyterhub.sh
```
### Add Python Packages
Add to the `Dockerfile` (before `USER ${NB_UID}`):
```dockerfile
``` dockerfile
RUN pip install numpy pandas matplotlib seaborn
```
### Distribute Files to Students
Create a `files_lab/` directory and add to the `Dockerfile`:
```dockerfile
``` dockerfile
COPY files_lab/ /home/${NB_USER}/lab/
RUN chown -R ${NB_UID}:${NB_GID} /home/${NB_USER}/lab
```
### Change Port (if 8000 is occupied)
Modify in `docker-compose.yml`:
```yaml
``` yaml
ports:
- "8001:8000" # Accessible on localhost:8001
```
## Advantages of This Approach
**Everything in Docker**: No need to install Python/JupyterHub on your Mac
**Portable**: Easy to deploy on another Mac or server
**Isolated**: No pollution of your system environment
**Easy to Clean**: A simple `docker-compose down` is enough
✅ **Everything in Docker**: No need to install Python/JupyterHub on your computer\
✅ **Portable**: Easy to deploy on another server\
✅ **Isolated**: No pollution of your system environment\
✅ **Easy to Clean**: A simple `docker-compose down` is enough\
✅ **Reproducible**: Students will have exactly the same environment
## Troubleshooting
**Error "Cannot connect to Docker daemon"**:
- Check that OrbStack is running
**Error "Cannot connect to Docker daemon"**:
- Check that OrbStack is running
- Verify the socket exists: `ls -la /var/run/docker.sock`
**Student containers don't start**:
- Check logs: `docker-compose logs jupyterhub`
**Student containers don't start**:
- Check logs: `docker-compose logs jupyterhub`
- Verify student image exists: `docker images | grep jupyterhub-student`
**Port 8000 already in use**:
**Port 8000 already in use**:
- Change port in `docker-compose.yml`
**After config modification, changes are not applied**:
```bash
docker-compose restart jupyterhub
```
**I want to start from scratch**:
```bash
``` bash
pushd obijupyterhub
docker-compose down -v
docker rmi jupyterhub-hub jupyterhub-student
popd
# Then rebuild everything
./start-jupyterhub.sh
```
```


@@ -1,37 +0,0 @@
services:
jupyterhub:
build:
context: .
dockerfile: Dockerfile.hub
container_name: jupyterhub
image: jupyterhub-hub:latest
ports:
- "8888:8000"
volumes:
# Access to Docker socket to spawn student containers
- /var/run/docker.sock:/var/run/docker.sock
# JupyterHub database persistence
- jupyterhub-data:/srv/jupyterhub
# Mount config file directly (for easy modifications)
- ./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro
networks:
- jupyterhub-network
restart: unless-stopped
environment:
# Shared password for all students
JUPYTERHUB_PASSWORD: metabar2025
# Admin password (for installing R packages)
JUPYTERHUB_ADMIN_PASSWORD: admin2025
# Optional environment variables
DOCKER_NOTEBOOK_DIR: /home/jovyan/work
networks:
jupyterhub-network:
name: jupyterhub-network
driver: bridge
volumes:
jupyterhub-data:
jupyterhub-shared:
jupyterhub-course:

862
fish-primers.qmd Normal file

@@ -0,0 +1,862 @@
---
title: "Designing Fish primers"
format: revealjs
editor: visual
---
```{r setup, include=FALSE}
library(knitr)
```
# Preparing the data
------------------------------------------------------------------------
## What do we need ?
To design a new animal DNA metabarcode, we download the following data from the NCBI:
- The complete set of whole mitochondrial genomes
- The NCBI taxonomy
------------------------------------------------------------------------
## We also need:
- A Unix computer: a Mac or a Linux box
- A Unix terminal window for typing commands
- Installed on the computer:
- The OBITools --- <http://github.com/metabarcoding/obitools4>
- ecoPrimers --- <http://metabarcoding.org/ecoprimers>
- R --- <http://www.r-project.org>
------------------------------------------------------------------------
## Downloading the mitochondrial genomes
We can use an internet browser and download the files from NCBI FTP website
![](ncbi-ftp.png){fig-align="center"}
------------------------------------------------------------------------
## Downloading the mitochondrial genomes
We can use an internet browser and download the files from NCBI FTP website
or run the following command lines
```{bash}
#| eval: false
#| echo: true
curl 'https://ftp.ncbi.nlm.nih.gov/genomes/refseq/mitochondrion/mitochondrion.1.genomic.gbff.gz' \
> mito.all.gb.gz
```
------------------------------------------------------------------------
```{bash}
#| eval: false
#| echo: true
zless mito.all.gb.gz
```
```
LOCUS NW_009243181 45189 bp DNA linear CON 06-OCT-2014
DEFINITION Fonticula alba strain ATCC 38817 mitochondrial scaffold
supercont2.211, whole genome shotgun sequence.
ACCESSION NW_009243181 NZ_AROH01000000
VERSION NW_009243181.1
DBLINK BioProject: PRJNA262900
Assembly: GCF_000388065.1
KEYWORDS WGS; RefSeq.
SOURCE mitochondrion Fonticula alba
ORGANISM Fonticula alba
Eukaryota; Rotosphaerida; Fonticulaceae; Fonticula.
REFERENCE 1 (bases 1 to 45189)
```
------------------------------------------------------------------------
## Downloading the complete taxonomy
```{bash}
#| eval: false
#| echo: true
obitaxonomy --download-ncbi
```
```
INFO[0000] Number of workers set 16
INFO[0000] Downloading NCBI Taxdump to ncbitaxo_20250211.tgz
downloading 100% ████████████████████████████████████████| (66/66 MB, 5.1 MB/s)
```
The NCBI taxonomy contains all the relationships between taxa. Each taxon is identified by a unique numerical identifier, its `taxid`.
------------------------------------------------------------------------
## The archive contains several files
file: `nodes.dmp`
```
1 | 1 | no rank | | 8 | 0 | ...
2 | 131567 | superkingdom | | 0 | 0 |
6 | 335928 | genus | | 0 | 1 |
7 | 6 | species | AC | 0 | 1 |
9 | 32199 | species | BA | 0 |
10 | 135621 | genus | | 0 |
11 | 1707 | species | CG | 0 | 1 |
13 | 203488 | genus | | 0 | 1 |
14 | 13 | species | DT | 0 | 1 |
```
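In `nodes.dmp`, the first three columns are the taxid, the taxid of the parent taxon and the taxonomic rank. A minimal Python sketch of how such lines can be read, assuming `|` as the field separator as in the listing above (`parse_nodes` is a hypothetical helper, not an OBITools function):

```python
def parse_nodes(lines):
    """Map each taxid to (parent taxid, rank) from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


# Three lines copied from the listing above.
node_lines = [
    "1 | 1 | no rank | | 8 | 0 |",
    "6 | 335928 | genus | | 0 | 1 |",
    "7 | 6 | species | AC | 0 | 1 |",
]
nodes = parse_nodes(node_lines)
```

Here taxon 7 is a species whose parent is the genus 6, and the root (taxid 1) is its own parent.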
------------------------------------------------------------------------
## file: `names.dmp`
```
1 | root | | scientific name |
2 | Bacteria | Bacteria <prokaryote> | scientific name |
2 | Monera | Monera <Bacteria> | in-part |
2 | Procaryotae | Procaryotae <Bacteria> | in-part |
2 | Prokaryota | Prokaryota <Bacteria> | in-part |
2 | Prokaryotae | Prokaryotae <Bacteria> | in-part |
2 | bacteria | bacteria <blast2> | blast name |
2 | eubacteria | | genbank common name |
2 | prokaryote | prokaryote <Bacteria> | in-part |
...
10 | Cellvibrio | | scientific name |
11 | [Cellvibrio] gilvus | | scientific name |
13 | Dictyoglomus | | scientific name |
14 | Dictyoglomus thermophilum | | scientific name |
```
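A taxid can carry several names, but only one of them is tagged `scientific name`. A minimal Python sketch that keeps only that name for each taxid, under the same `|` separator assumption as the listing above (`scientific_names` is a hypothetical helper):

```python
def scientific_names(lines):
    """Map each taxid to its scientific name from names.dmp-style lines."""
    names = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        # Column 4 is the name class; synonyms and common names are ignored.
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


# Four lines copied from the listing above.
name_lines = [
    "2 | Bacteria | Bacteria <prokaryote> | scientific name |",
    "2 | Monera | Monera <Bacteria> | in-part |",
    "2 | eubacteria | | genbank common name |",
    "11 | [Cellvibrio] gilvus | | scientific name |",
]
names = scientific_names(name_lines)
```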
------------------------------------------------------------------------
## Preparing the set of complete genomes
```{bash}
#| eval: false
#| echo: true
obiconvert --skip-empty \
--update-taxid \
-t ncbitaxo_20250211.tgz \
mito.all.gb.gz \
> mito.all.fasta
head -5 mito.all.fasta
```
The first five lines of the new `mito.all.fasta` file:
```
>NC_072933 {"definition":"Echinosophora koreensis mitochondrion, complete genome.","scientific_name":"mitochondrion Echinosophora koreensis","taxid":228658}
ctttcgggtcggaaatagaagatctggattagatcccttctcgatagctttagtcagagc
tcatccctcgaaaaagggagtagtgagatgagaaaagggtgactagaatacggaaattca
actagtgaagtcagatccgggaattccactattgaagttatccgtcttaggcttcaagca
agctatctttcaaggaagtcagtctaagccctaagccaagatctgctttttgccagtcaa
```
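In this output, the header line holds the sequence identifier followed by the annotations as a JSON map. A minimal Python sketch of how such a header can be split, based only on the example above (`parse_header` is a hypothetical helper):

```python
import json


def parse_header(header):
    """Split an OBITools4-style FASTA header into (identifier, annotations)."""
    ident, _, rest = header.lstrip(">").partition(" ")
    # Headers without a JSON block yield an empty annotation map.
    annotations = json.loads(rest) if rest.strip().startswith("{") else {}
    return ident, annotations


# Header copied from the listing above.
header = ('>NC_072933 {"definition":"Echinosophora koreensis mitochondrion, '
          'complete genome.","scientific_name":"mitochondrion Echinosophora '
          'koreensis","taxid":228658}')
seq_id, annotations = parse_header(header)
```

After parsing, `annotations["taxid"]` gives the taxid attached to the sequence.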
------------------------------------------------------------------------
## We want to:
- annotate sequences with their species `taxid`
- keep a single genome per species
- extract only vertebrate genomes
------------------------------------------------------------------------
## Looking for the **Vertebrata**'s taxid
```{bash}
#| eval: false
#| echo: true
obitaxonomy -t ncbitaxo_20250211.tgz \
--fixed \
'vertebrata'
```
``` csv
taxid,parent,taxonomic_rank,scientific_name
taxon:1261581 [Vertebrata]@genus,taxon:2008651 [Polysiphonioideae]@subfamily,genus,Vertebrata
taxon:7742 [Vertebrata]@clade,taxon:89593 [Craniata]@subphylum,clade,Vertebrata
```
------------------------------------------------------------------------
## Looking for the **Vertebrata**'s taxid
```{bash}
#| eval: false
#| echo: true
obitaxonomy -t ncbitaxo_20250211.tgz \
--fixed \
'vertebrata' \
| csvlook
```
``` csv
| taxid | parent | taxonomic_rank | scientific_name |
| -------------------------------- | ------------------------------------------- | -------------- | --------------- |
| taxon:1261581 [Vertebrata]@genus | taxon:2008651 [Polysiphonioideae]@subfamily | genus | Vertebrata |
| taxon:7742 [Vertebrata]@clade | taxon:89593 [Craniata]@subphylum | clade | Vertebrata |
```
## A genus called **Vertebrata**
```{bash}
#| eval: false
#| echo: true
obitaxonomy -t ncbitaxo_20250211.tgz \
-p 2008651 \
| csvlook
```
``` csv
| taxid | parent | taxonomic_rank | scientific_name |
| ------------------------------------------- | ------------------------------------------- | -------------- | ------------------ |
| taxon:2008651 [Polysiphonioideae]@subfamily | taxon:2803 [Rhodomelaceae]@family | subfamily | Polysiphonioideae |
| taxon:2803 [Rhodomelaceae]@family | taxon:2802 [Ceramiales]@order | family | Rhodomelaceae |
| taxon:2802 [Ceramiales]@order | taxon:2045261 [Rhodymeniophycidae]@subclass | order | Ceramiales |
| taxon:2045261 [Rhodymeniophycidae]@subclass | taxon:2806 [Florideophyceae]@class | subclass | Rhodymeniophycidae |
| taxon:2806 [Florideophyceae]@class | taxon:2763 [Rhodophyta]@phylum | class | Florideophyceae |
| taxon:2763 [Rhodophyta]@phylum | taxon:2759 [Eukaryota]@superkingdom | phylum | Rhodophyta |
| taxon:2759 [Eukaryota]@superkingdom | taxon:131567 [cellular organisms]@no rank | superkingdom | Eukaryota |
| taxon:131567 [cellular organisms]@no rank | taxon:1 [root]@no rank | no rank | cellular organisms |
| taxon:1 [root]@no rank | taxon:1 [root]@no rank | no rank | root |
```
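The path shown above is obtained by following parent links up to the root. A minimal Python sketch of that walk, using the taxids from the two tables above (the functions `path_to_root` and `is_descendant` are hypothetical helpers, not OBITools code):

```python
# child taxid -> parent taxid, copied from the tables above
PARENT = {
    1261581: 2008651,  # Vertebrata (genus) -> Polysiphonioideae
    2008651: 2803,     # -> Rhodomelaceae
    2803: 2802,        # -> Ceramiales
    2802: 2045261,     # -> Rhodymeniophycidae
    2045261: 2806,     # -> Florideophyceae
    2806: 2763,        # -> Rhodophyta
    2763: 2759,        # -> Eukaryota
    2759: 131567,      # -> cellular organisms
    131567: 1,         # -> root
    1: 1,              # the root is its own parent
}


def path_to_root(taxid, parent):
    """Return the list of taxids from `taxid` up to the root."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def is_descendant(taxid, ancestor, parent):
    """True if `ancestor` lies on the path from `taxid` to the root."""
    return ancestor in path_to_root(taxid, parent)
```

The genus *Vertebrata* (1261581) descends from Rhodophyta (2763), the red algae, and not from the clade Vertebrata (7742), which is why the clade taxid 7742 is the one to use for selecting vertebrates.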
------------------------------------------------------------------------
## Reannotation and selection of the genomes
```{bash}
#| eval: false
#| echo: true
obiannotate -t ncbitaxo_20250211.tgz \
--with-taxon-at-rank=species \
mito.all.fasta | \
obiannotate -S 'ori_taxid=annotations.taxid' | \
obiannotate -S 'taxid=annotations.species_taxid' | \
obiuniq -c taxid > mito.one.fasta
```
------------------------------------------------------------------------
## Species representation
```{bash}
#| eval: false
#| echo: true
obicsv -k taxid mito.one.fasta \
| tail -n +2 \
| sort \
| uniq -c \
| sort -nk1 \
| cut -w -f 2 \
| uplot count
```
```
┌ ┐
1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 17769.0
2 ┤ 90.0
3 ┤ 17.0
4 ┤ 5.0
5 ┤ 4.0
6 ┤ 2.0
7 ┤ 1.0
└ ┘
```
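The shell pipeline counts twice: first the number of genomes per `taxid`, then how many species share each of these counts. The same logic as a Python sketch, with a made-up list of taxids for illustration:

```python
from collections import Counter


def genomes_per_species_histogram(taxids):
    """Count how many species have 1, 2, 3... genomes."""
    per_species = Counter(taxids)         # role of `sort | uniq -c`
    return Counter(per_species.values())  # role of `uplot count`


# Hypothetical taxid column: t1 and t3 appear once, t2 twice, t4 three times.
taxids = ["t1", "t2", "t2", "t3", "t4", "t4", "t4"]
histogram = genomes_per_species_histogram(taxids)
```

In this toy example, two species have a single genome, one has two and one has three.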
------------------------------------------------------------------------
## Selection of the vertebrata genomes
```{bash}
#| eval: false
#| echo: true
obigrep -t ncbitaxo_20250211.tgz \
-r 7742 \
mito.one.fasta > mito.vert.fasta
```
```{bash}
#| eval: false
#| echo: true
obicount mito.vert.fasta \
| csvlook
```
```
| entities | n |
| -------- | ----------- |
| variants | 7,822 |
| reads | 7,823 |
| symbols | 131,378,756 |
```
------------------------------------------------------------------------
## Prepare data for ecoPrimers 1/3
```{bash}
#| eval: false
#| echo: true
mkdir ncbitaxo_20250211
cd ncbitaxo_20250211
tar zxvf ../ncbitaxo_20250211.tgz
cd ..
```
------------------------------------------------------------------------
## Prepare data for ecoPrimers 2/3
```{bash}
#| eval: false
#| echo: true
obiconvert -O mito.vert.fasta > mito.vert.old.fasta
```
```{bash}
#| eval: false
#| echo: true
head -5 mito.vert.old.fasta
```
``` csv
>NC_071784 taxid=taxon:2065826 [Sineleotris saccharae]@species; count=1; ori_taxid=taxon:2065826 [Sineleotris saccharae]@species; scientific_name=mitochondrion Sineleotris saccharae; species_name=Sineleotris saccharae; species_taxid=taxon:2065826 [Sineleotris saccharae]@species; Sineleotris saccharae mitochondrion, complete genome.
gctagcgtagcttaaccaaagcataacactgaagatgttaagatgggccctagaaagccc
cgcaagcacaaaagcttggtcctggctttactatcagcttaggctaaacttacacatgca
agtatccgcatccccgtgagaatgcccttaagctcccaccgctaacaggagtcaaggagc
cggtatcaggcacaaccctgagttagcccacgacaccttgctcagccacacccccaaggg
```
------------------------------------------------------------------------
## Prepare data for ecoPrimers 3/3
```{bash}
#| eval: false
#| echo: true
ecoPCRFormat -t ncbitaxo_20250211 \
-f \
-n vertebrata \
mito.vert.old.fasta
```
```{bash}
#| eval: false
#| echo: true
ls -l vertebrata*
```
```
-rw-r--r--@ 1 coissac staff 260899785 Feb 11 11:53 vertebrata.ndx
-rw-r--r--@ 1 coissac staff 546 Feb 11 11:53 vertebrata.rdx
-rw-r--r--@ 1 coissac staff 121379751 Feb 11 11:53 vertebrata.tdx
-rw-r--r--@ 1 coissac staff 40446318 Feb 11 11:54 vertebrata_001.sdx
```
------------------------------------------------------------------------
## Looking for the *Teleostei* `taxid`
```{bash}
#| eval: false
#| echo: true
obitaxonomy -t ncbitaxo_20250211.tgz \
--fixed \
'Teleostei' \
| csvlook
```
``` csv
| taxid | parent | taxonomic_rank | scientific_name |
| ---------------------------------- | ---------------------------------- | -------------- | --------------- |
| taxon:32443 [Teleostei]@infraclass | taxon:41665 [Neopterygii]@subclass | infraclass | Teleostei |
```
------------------------------------------------------------------------
## Selecting the best primer pairs
```{bash}
#| eval: false
#| echo: true
ecoPrimers -d vertebrata \
-e 3 -3 2 \
-l 30 -L 150 \
-r 32443 \
-c > Teleostei.ecoprimers
```
- Total pair count : 9407
- Total good pair count : 407
------------------------------------------------------------------------
```{bash}
#| eval: false
#| echo: true
head -35 Teleostei.ecoprimers
```
``` csv
#
# ecoPrimer version 0.5
# Rank level optimisation : species
# max error count by oligonucleotide : 3
#
# Restricted to taxon:
# 32443 : Teleostei (infraclass)
#
# strict primer quorum : 0.70
# example quorum : 0.90
# counterexample quorum : 0.10
#
# database : vertebrata
# Database is constituted of 3909 examples corresponding to 3876 species
# and 0 counterexamples corresponding to 0 species
#
# amplifiat length between [30,150] bp
# DB sequences are considered as circular
# Pairs having specificity less than 0.60 will be ignored
#
0 AGAGTGACGGGCGGTGTG CGTCAGGTCGAGGTGTAG 62.8 42.4 57.5 34.1 12 11 GG 3864 0 0.988 3832 0 0.989 2731 0.713 134 146 138.22
1 CGTCAGGTCGAGGTGTAG GAGTGACGGGCGGTGTGT 57.5 34.1 63.1 42.9 11 12 GG 3863 0 0.988 3831 0 0.988 2730 0.713 133 145 137.22
2 CGTCAGGTCGAGGTGTAG GGGAGAGTGACGGGCGGT 57.5 34.1 64.5 37.0 11 13 GG 3811 0 0.975 3779 0 0.975 2689 0.712 137 149 141.22
3 CGTCAGGTCGAGGTGTAG GGGGAGAGTGACGGGCGG 57.5 34.1 65.5 38.4 11 14 GG 3804 0 0.973 3772 0 0.973 2682 0.711 138 149 142.22
4 ACACCGCCCGTCACTCTC ACCTTCCGGTACACTTAC 62.5 36.8 54.0 16.6 12 9 GG 3850 0 0.985 3818 0 0.985 2658 0.696 46 132 66.51
5 AACGTCAGGTCGAGGTGT AGAGTGACGGGCGGTGTG 58.8 28.4 62.8 41.7 10 12 GG 3779 0 0.967 3746 0 0.966 2653 0.708 137 148 140.23
6 ACACCGCCCGTCACTCTC CACCTTCCGGTACACTTA 62.5 36.8 54.0 16.6 12 9 GG 3846 0 0.984 3814 0 0.984 2654 0.696 47 133 67.51
7 AACGTCAGGTCGAGGTGT GAGTGACGGGCGGTGTGT 58.8 28.4 63.1 42.1 10 12 GG 3778 0 0.966 3745 0 0.966 2652 0.708 136 147 139.23
8 ACCTTCCGGTACACTTAC CACACCGCCCGTCACTCT 54.0 16.6 62.8 37.3 9 12 GG 3845 0 0.984 3813 0 0.984 2653 0.696 47 133 67.51
9 ACACCGCCCGTCACTCTC TCCGGTACACTTACCATG 62.5 36.8 54.1 18.1 12 9 GG 3851 0 0.985 3819 0 0.985 2651 0.694 42 128 62.51
10 ACACCGCCCGTCACTCTC CCGGTACACTTACCATGT 62.5 36.8 54.4 18.6 12 9 GG 3851 0 0.985 3819 0 0.985 2651 0.694 41 127 61.51
11 ACACCGCCCGTCACTCTC CCAAGTGCACCTTCCGGT 62.5 36.8 60.7 28.9 12 11 GG 3837 0 0.982 3805 0 0.982 2650 0.696 54 140 74.51
12 ACACCGCCCGTCACTCTC GCACCTTCCGGTACACTT 62.5 36.8 57.7 22.5 12 10 GG 3842 0 0.983 3810 0 0.983 2650 0.696 48 134 68.51
13 ACACCGCCCGTCACTCTC CGGTACACTTACCATGTT 62.5 36.8 52.4 15.7 12 8 GG 3850 0 0.985 3818 0 0.985 2650 0.694 40 126 60.51
14 ACACCGCCCGTCACTCTC CACTTACCATGTTACGAC 62.5 36.8 51.1 27.7 12 8 GG 3850 0 0.985 3817 0 0.985 2649 0.694 35 121 55.51
```
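Each data row describes one primer pair, and the last three columns give the minimum, maximum and mean amplicon length. A minimal Python sketch for reading such rows, assuming whitespace-separated columns as displayed above (`parse_pairs` is a hypothetical helper and reads only the columns it needs):

```python
def parse_pairs(lines):
    """Extract id, primers and amplicon lengths from ecoPrimers output rows."""
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip the commented header
        f = line.split()
        pairs.append({
            "id": int(f[0]),
            "forward": f[1],
            "reverse": f[2],
            "min_len": int(f[-3]),
            "max_len": int(f[-2]),
            "mean_len": float(f[-1]),
        })
    return pairs


# One header line and two rows copied from the output above.
rows = [
    "# ecoPrimer version 0.5",
    "0 AGAGTGACGGGCGGTGTG CGTCAGGTCGAGGTGTAG 62.8 42.4 57.5 34.1 12 11 GG "
    "3864 0 0.988 3832 0 0.989 2731 0.713 134 146 138.22",
    "11 ACACCGCCCGTCACTCTC CCAAGTGCACCTTCCGGT 62.5 36.8 60.7 28.9 12 11 GG "
    "3837 0 0.982 3805 0 0.982 2650 0.696 54 140 74.51",
]
# Keep the pairs whose longest amplicon does not exceed 140 bp.
short_pairs = [p["id"] for p in parse_pairs(rows) if p["max_len"] <= 140]
```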
------------------------------------------------------------------------
- Primer ID : 11
 
| Primer | sequence | tm max | tm min | GC count |
|---------|--------------------|--------|--------|----------|
| Forward | ACACCGCCCGTCACTCTC | 62.5 | 36.8 | 12 |
| Reverse | CCAAGTGCACCTTCCGGT | 60.7 | 28.9 | 11 |
 
- amplifying 3837/3909 sequences\
- identifying 2650/3876 species
- Size ranging from 54 bp to 140 bp (mean: 74.51 bp)
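The ratios printed on row 11 of the ecoPrimers output can be recomputed from its counts. The sketch below assumes the usual reading of the columns (amplified sequences, amplified species, unambiguously identified species); that interpretation is an assumption, but the three values do match the row.

```python
# Counts taken from row 11 and from the header of the ecoPrimers output.
amplified_sequences, example_sequences = 3837, 3909
amplified_species, example_species = 3805, 3876
identified_species = 2650

# Fraction of the example sequences that are amplified.
sequence_coverage = amplified_sequences / example_sequences
# Fraction of the example species that are amplified.
species_coverage = amplified_species / example_species
# Fraction of the amplified species that are unambiguously identified.
specificity = identified_species / amplified_species
```

Rounded to three decimals, these give 0.982, 0.982 and 0.696, the three ratios shown on that row.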
## Testing the new primer pair
```{bash}
#| eval: false
#| echo: true
obipcr --forward ACACCGCCCGTCACTCTC \
--reverse CCAAGTGCACCTTCCGGT \
-e 5 \
-l 30 -L 150 \
-c \
mito.vert.fasta \
> Teleostei_11.fasta
```
```{bash}
#| eval: false
#| echo: true
head Teleostei_11.fasta
```
``` csv
>NC_022183_sub[925..998] {"count":1,"definition":"Acrossocheilus hemispinus mitochondrion, complete genome.","direction":"forward","forward_error":1,"forward_match":"acaccgcccgtcaccctc","forward_primer":"ACACCGCCCGTCACTCTC","ori_taxid":"taxon:356810 [Acrossocheilus hemispinus]@species","reverse_error":0,"reverse_match":"ccaagtgcaccttccggt","reverse_primer":"CCAAGTGCACCTTCCGGT","scientific_name":"mitochondrion Acrossocheilus hemispinus","species_name":"Acrossocheilus hemispinus","species_taxid":"taxon:356810 [Acrossocheilus hemispinus]@species","taxid":"taxon:356810 [Acrossocheilus hemispinus]@species"}
cccgtcaaaatacaccaaaaatacttaatacaataacactaacaaggggaggcaagtcgt
aacatggtaagtgt
>NC_018560_sub[916..988] {"count":1,"definition":"Astatotilapia calliptera mitochondrion, complete genome.","direction":"forward","forward_error":0,"forward_match":"acaccgcccgtcactctc","forward_primer":"ACACCGCCCGTCACTCTC","ori_taxid":"taxon:8154 [Astatotilapia calliptera]@species","reverse_error":1,"reverse_match":"ccaagtacaccttccggt","reverse_primer":"CCAAGTGCACCTTCCGGT","scientific_name":"mitochondrion Astatotilapia calliptera (eastern happy)","species_name":"Astatotilapia calliptera","species_taxid":"taxon:8154 [Astatotilapia calliptera]@species","taxid":"taxon:8154 [Astatotilapia calliptera]@species"}
cccaagccaacaacatcctataaataatacattttaccggtaaaggggaggcaagtcgta
acatggtaagtgt
>NC_056117_sub[923..997] {"count":1,"definition":"Pseudocrossocheilus tridentis mitochondrion, complete genome.","direction":"forward","forward_error":0,"forward_match":"acaccgcccgtcactctc","forward_primer":"ACACCGCCCGTCACTCTC","ori_taxid":"taxon:887881 [Pseudocrossocheilus tridentis]@species","reverse_error":0,"reverse_match":"ccaagtgcaccttccggt","reverse_primer":"CCAAGTGCACCTTCCGGT","scientific_name":"mitochondrion Pseudocrossocheilus tridentis","species_name":"Pseudocrossocheilus tridentis","species_taxid":"taxon:887881 [Pseudocrossocheilus tridentis]@species","taxid":"taxon:887881 [Pseudocrossocheilus tridentis]@species"}
ccctgtcaaaaagcatcaaatatatataataaattagcaatgacaaggggaggcaagtcg
taacacggtaagtgt
>NC_045904_sub[919..997] {"count":1,"definition":"Eospalax fontanierii mitochondrion, complete genome.","direction":"forward","forward_error":1,"forward_match":"acaccgcccgtcgctctc","forward_primer":"ACACCGCCCGTCACTCTC","ori_taxid":"taxon:146134 [Eospalax fontanierii]@species","reverse_error":4,"reverse_match":"ccaagcacactttccagt","reverse_primer":"CCAAGTGCACCTTCCGGT","scientific_name":"mitochondrion Eospalax fontanierii","species_name":"Eospalax fontanierii","species_taxid":"taxon:146134 [Eospalax fontanierii]@species","taxid":"taxon:146134 [Eospalax fontanierii]@species"}
```
------------------------------------------------------------------------
Convert the FASTA file to CSV
```{bash}
#| eval: false
#| echo: true
obicsv --auto -s -i Teleostei_11.fasta > Teleostei_11.csv
```
and display the beginning of the table
```{bash}
#| eval: false
#| echo: true
head Teleostei_11.csv | csvlook
```
``` csv
| id | count | direction | forward_error | forward_match | forward_primer | ori_taxid | reverse_error | reverse_match | reverse_primer | scientific_name | species_name | species_taxid | taxid | sequence |
| ------------------------- | ----- | --------- | ------------- | ------------------ | ------------------ | ---------------------------------------------------- | ------------- | ------------------ | ------------------ | ------------------------------------------------------ | ----------------------------- | ---------------------------------------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------- |
| NC_022183_sub[925..998] | True | forward | True | acaccgcccgtcaccctc | ACACCGCCCGTCACTCTC | taxon:356810 [Acrossocheilus hemispinus]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Acrossocheilus hemispinus | Acrossocheilus hemispinus | taxon:356810 [Acrossocheilus hemispinus]@species | taxon:356810 [Acrossocheilus hemispinus]@species | cccgtcaaaatacaccaaaaatacttaatacaataacactaacaaggggaggcaagtcgtaacatggtaagtgt |
| NC_018560_sub[916..988] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:8154 [Astatotilapia calliptera]@species | 1 | ccaagtacaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Astatotilapia calliptera (eastern happy) | Astatotilapia calliptera | taxon:8154 [Astatotilapia calliptera]@species | taxon:8154 [Astatotilapia calliptera]@species | cccaagccaacaacatcctataaataatacattttaccggtaaaggggaggcaagtcgtaacatggtaagtgt |
| NC_056117_sub[923..997] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:887881 [Pseudocrossocheilus tridentis]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Pseudocrossocheilus tridentis | Pseudocrossocheilus tridentis | taxon:887881 [Pseudocrossocheilus tridentis]@species | taxon:887881 [Pseudocrossocheilus tridentis]@species | ccctgtcaaaaagcatcaaatatatataataaattagcaatgacaaggggaggcaagtcgtaacacggtaagtgt |
| NC_045904_sub[919..997] | True | forward | True | acaccgcccgtcgctctc | ACACCGCCCGTCACTCTC | taxon:146134 [Eospalax fontanierii]@species | 4 | ccaagcacactttccagt | CCAAGTGCACCTTCCGGT | mitochondrion Eospalax fontanierii | Eospalax fontanierii | taxon:146134 [Eospalax fontanierii]@species | taxon:146134 [Eospalax fontanierii]@species | ctcaagtacataaacttggatatattcttaataacccaacaaaaatattagaggagataagtcgtaacaaggtaagcat |
| NC_018546_sub[916..987] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:30732 [Oryzias melastigma]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Oryzias melastigma (Indian medaka) | Oryzias melastigma | taxon:30732 [Oryzias melastigma]@species | taxon:30732 [Oryzias melastigma]@species | cccgacccattttaaaaattaaataaaagatttcaggaactaaggggaggcaagtcgtaacatggtaagtgt |
| NC_044151_sub[922..993] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:2597641 [Sicyopterus squamosissimus]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Sicyopterus squamosissimus (cling goby) | Sicyopterus squamosissimus | taxon:2597641 [Sicyopterus squamosissimus]@species | taxon:2597641 [Sicyopterus squamosissimus]@species | cccaaaacaaacacacacataaataagaaaaaatgaaaataaaggggaggcaagtcgtaacatggtaagtgt |
| NC_044152_sub[922..994] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:2597642 [Sicyopterus stiphodonoides]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Sicyopterus stiphodonoides (cling goby) | Sicyopterus stiphodonoides | taxon:2597642 [Sicyopterus stiphodonoides]@species | taxon:2597642 [Sicyopterus stiphodonoides]@species | cccaaaacaaacacacacataaataagaaaaaantgaaaataaaggggaggcaagtcgtaacatggtaagtgt |
| NC_026976_sub[1453..1531] | True | forward | True | acaccgcccgtcactccc | ACACCGCCCGTCACTCTC | taxon:9545 [Macaca nemestrina]@species | 1 | ccaagtgcaccttccagt | CCAAGTGCACCTTCCGGT | mitochondrion Macaca nemestrina (pig-tailed macaque) | Macaca nemestrina | taxon:9545 [Macaca nemestrina]@species | taxon:9545 [Macaca nemestrina]@species | ctcaaatatatttaaggaacatcttaactaaacgccctaatatttatatagaggggataagtcgtaacatggtaagtgt |
| NC_031553_sub[921..995] | True | forward | False | acaccgcccgtcactctc | ACACCGCCCGTCACTCTC | taxon:643337 [Puntioplites proctozystron]@species | 0 | ccaagtgcaccttccggt | CCAAGTGCACCTTCCGGT | mitochondrion Puntioplites proctozystron | Puntioplites proctozystron | taxon:643337 [Puntioplites proctozystron]@species | taxon:643337 [Puntioplites proctozystron]@species | ccctgtcaaaacgcactaaaaatatctaatacaaaagcaccgacaaggggaggcaagtcgtaacacggtaagtgt |
```
# We are now switching to R
------------------------------------------------------------------------
## Preparing our R session
First, we load the three following libraries
```{r}
#| echo: true
library(tidyverse)
library(ggpubr)
library(ROBITools4)
```
------------------------------------------------------------------------
## Loading the data
```{r}
#| echo: true
fish <- read_csv('Teleostei_11.csv', show_col_types = FALSE)
taxo <- read_ncbi_taxdump('ncbitaxo_20250211')
assign_default_taxonomy(taxo)
```
------------------------------------------------------------------------
Looking for the Teleostei taxid
```{r}
#| echo: true
teleo_taxid <- ecofind('Teleostei')
teleo_taxid
```
------------------------------------------------------------------------
## Format taxids
```{r}
#| echo: true
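# taxid values look like "taxon:356810 [Acrossocheilus hemispinus]@species":
# keep the part before the first space, then the number after the colon
fish %>% mutate(
  taxid = as_taxid(
    as.integer(
      str_split_fixed(
        str_split_fixed(
          taxid, pattern = " ",
          n = 2)[, 1],
        ":",
        2)[, 2]))
  ) %>%
  as_tbl_obipcr() %>%
  # 32443 is the NCBI taxid of Teleostei
  mutate(across(taxon(),
                .names = "category",
                .fn = taxonomy_classifier(Teleostei = 32443))) %>%
  group_by(category) %>%
  # weights are computed separately within each category
  mutate(weight = taxonomic_weights(taxid, taxo)) %>%
  ungroup() -> fish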
```
------------------------------------------------------------------------
## The fish tibble
```{r}
#| echo: true
head(fish,n = 4)
```
------------------------------------------------------------------------
## Identifying which sequences belong to fish
```{r}
#| echo: true
table(fish$category)
```
------------------------------------------------------------------------
## Testing the conservation of the priming sites
```{r}
#| echo: true
pssm_forward <- pssm(fish$forward_match,
weights = fish$weight,
categories = fish$category)
pssm_reverse <- pssm(fish$reverse_match,
weights = fish$weight,
categories = fish$category)
```
```{r}
#| echo: true
pssm_forward
```
------------------------------------------------------------------------
## Rescaling the matrix as Shannon entropy
$$
H = - \sum_{i \in \{A,C,G,T\}} p_i \times \frac{\log(p_i)}{\log(2)}
$$
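As a quick sanity check of this formula (illustrative frequencies, not values from the data set), take a primer position where two nucleotides are each observed half of the time and the other two never occur. With the usual convention $0 \times \log(0) = 0$:

$$
H = -\left(\frac{1}{2} \times \frac{\log(1/2)}{\log(2)} + \frac{1}{2} \times \frac{\log(1/2)}{\log(2)}\right) = 1 \text{ bit}
$$

A fully conserved position gives $H = 0$ bits, and four equally frequent nucleotides give the maximum, $H = 2$ bits.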
```{r}
#| echo: true
pssm_forward <- pssm_scale_shanon(pssm_forward)
pssm_reverse <- pssm_scale_shanon(pssm_reverse)
```
------------------------------------------------------------------------
## Display the rescaled matrix
```{r}
#| echo: true
pssm_forward
```
------------------------------------------------------------------------
## The DNA logo of our primer pair
```{r}
#| echo: true
flogo <- ggbarcodelogo(pssm_forward) +
xlab("Forward primer") + ylab("Bits")
rlogo <- ggbarcodelogo(pssm_reverse) +
xlab("Reverse primer") + ylab("Bits")
ggarrange(flogo,rlogo,ncol=2) -> dnaplot
dnaplot
```
------------------------------------------------------------------------
## How many mismatches?
```{r}
#| echo: true
ggbarcodemistmatch(fish$forward_error,
fish$reverse_error,
otu=fish$species_name,
categories=fish$category) + theme_minimal()
```
------------------------------------------------------------------------
## Do we discriminate taxa?
```{r}
#| echo: true
with(fish %>% filter(category == "Teleostei"),
discriminated_at_rank(taxid,
c("species","genus","family","order"),
sequence))
```
------------------------------------------------------------------------
## How many sequences provide information at each rank?
```{r}
#| echo: true
with(fish %>% filter(category == "Teleostei"),
discriminant_at_rank(taxid,
c("species","genus","family","order"),
sequence))
```
------------------------------------------------------------------------
## Is it the same for *Cyprinidae*?
```{r}
#| echo: true
cyprinidae_taxid <- ecofind('Cyprinidae')
cyprinidae_taxid
```
------------------------------------------------------------------------
## Classify according to both categories
```{r}
#| echo: true
fish %>%
mutate(across(taxon(),
.names="category2",
.fn=taxonomy_classifier(Teleostei = 32443,
Cyprinidae = 7953))) -> fish
table(fish$category2)
```
------------------------------------------------------------------------
## Do we discriminate taxa?
```{r}
#| echo: true
with(fish %>% filter(category2 == "Cyprinidae"),
discriminated_at_rank(taxid,
c("species","genus"),
sequence))
```
# Go back to Unix
## We run ecoPrimers and ecoPCR (`obipcr`) on the selected primer pair
```{bash}
#| eval: false
#| echo: true
ecoPrimers -d vertebrata \
-e 3 -3 2 \
-l 30 -L 150 \
-r 7953 -c > Cyprinidae.ecoprimers
```
```{bash}
#| eval: false
#| echo: true
obipcr --forward ACGGCGTAAAGGGTGGTT \
--reverse TATCTAATCCCAGTTTGT \
-e 5 \
-l 30 -L 500 \
-c \
mito.vert.fasta \
> Cyprinidae_14.fasta
```
```{bash}
#| eval: false
#| echo: true
obicsv --auto -s -i Cyprinidae_14.fasta > Cyprinidae_14.csv
```
# Go back to R
```{r}
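# Same preparation as for the fish table: parse the numeric taxid,
# classify each sequence, then compute weights within each category
cyprinidae <- read_csv('Cyprinidae_14.csv',
                       show_col_types = FALSE) %>%
  mutate(
    taxid = as_taxid(
      as.integer(
        str_split_fixed(
          str_split_fixed(
            taxid, pattern = " ",
            n = 2)[, 1],
          ":",
          2)[, 2]))
  ) %>%
  as_tbl_obipcr() %>%
  # 32443 and 7953 are the NCBI taxids of Teleostei and Cyprinidae
  mutate(across(taxon(),
                .names = "category",
                .fn = taxonomy_classifier(Teleostei = 32443,
                                          Cyprinidae = 7953))) %>%
  group_by(category) %>%
  mutate(weight = taxonomic_weights(taxid, taxo)) %>%
  ungroup()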
```
------------------------------------------------------------------------
## Identifying which sequences belong to fish and *Cyprinidae*
```{r}
#| echo: true
table(cyprinidae$category)
```
------------------------------------------------------------------------
## Looking for conservation
```{r}
#| echo: true
pssm_forward <- pssm(cyprinidae$forward_match,
weights = cyprinidae$weight,
categories = cyprinidae$category) %>%
pssm_scale_shanon()
pssm_reverse <- pssm(cyprinidae$reverse_match,
weights = cyprinidae$weight,
categories = cyprinidae$category) %>%
pssm_scale_shanon()
```
------------------------------------------------------------------------
## Plot the new DNA logo
```{r}
flogo <- ggbarcodelogo(pssm_forward) +
xlab("Forward primer") + ylab("Bits")
rlogo <- ggbarcodelogo(pssm_reverse) +
xlab("Reverse primer") + ylab("Bits")
ggarrange(flogo,rlogo,ncol=2) -> dnaplot
dnaplot
```
------------------------------------------------------------------------
## How many mismatches?
```{r}
#| echo: true
ggbarcodemistmatch(cyprinidae$forward_error,
cyprinidae$reverse_error,
otu=cyprinidae$species_name,
categories=cyprinidae$category) + theme_minimal()
```
------------------------------------------------------------------------
## Do we discriminate *Cyprinidae* taxa?
```{r}
#| echo: true
with(cyprinidae %>% filter(category == "Cyprinidae"),
discriminated_at_rank(taxid,
c("species","genus"),
sequence))
```

View File

@@ -0,0 +1,8 @@
@param,matching,strict
@param,primer_mismatches,2
@param,indels,false
experiment,sample,sample_tag,forward_primer,reverse_primer
wolf_diet,13a_F730603,aattaac,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,15a_F730814,gaagtag,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,26a_F040644,gaatatc,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,29a_F260619,gcctcct,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG

3
jupyterhub_volumes/web/.gitignore vendored Normal file
View File

@@ -0,0 +1,3 @@
/.quarto/
**/*.quarto_ipynb
/pages/

View File

@@ -0,0 +1,206 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>DNA Metabarcoding Learning Server</title>
<style>
body {
margin: 0;
font-family: sans-serif;
display: flex;
height: 100vh;
overflow: hidden;
}
/* Sidebar */
nav {
width: 250px;
background-color: #2c3e50;
color: white;
display: flex;
flex-direction: column;
justify-content: space-between;
padding: 20px 0;
overflow-y: auto;
}
nav ul {
list-style: none;
padding-left: 15px;
margin: 0;
}
nav li {
margin: 4px 0;
}
nav a {
color: white;
text-decoration: none;
display: block;
padding: 4px 8px;
border-radius: 4px;
font-weight: 500;
}
nav a:hover {
background-color: #34495e;
}
/* Toggle icons */
.folder {
cursor: pointer;
font-weight: bold;
display: flex;
align-items: center;
}
.folder::before {
content: "▸";
display: inline-block;
margin-right: 6px;
transition: transform 0.2s ease;
}
.folder.open::before {
transform: rotate(90deg);
}
ul.collapsed {
display: none;
}
/* Admin links */
nav .admin-links {
border-top: 1px solid #34495e;
margin-top: 10px;
padding-top: 10px;
}
/* Main content */
main {
flex: 1;
display: flex;
flex-direction: column;
overflow: hidden;
align-items: center;
background-color: #f7f7f7;
}
header img {
width: 100%;
max-width: 1000px;
height: auto;
display: block;
}
iframe#content-frame {
flex: 1;
width: 100%;
border: none;
max-width: 1000px;
background-color: white;
}
nav::-webkit-scrollbar {
width: 8px;
}
nav::-webkit-scrollbar-thumb {
background-color: #34495e;
border-radius: 4px;
}
</style>
</head>
<body>
<nav>
<ul id="nav-menu"></ul>
<ul class="admin-links">
<li><a href="/obidoc/" target="_blank">OBITools 4 Doc</a></li>
<li><a href="/jupyter/" target="_blank">JupyterHub</a></li>
<li><a href="/sftp/" target="_blank">Data Admin</a></li>
</ul>
</nav>
<main>
<header>
<img src="img/welcome_metabar.webp" alt="Welcome Banner">
</header>
<iframe id="content-frame" src=""></iframe>
</main>
<script>
const iframe = document.getElementById("content-frame");
const navMenu = document.getElementById("nav-menu");
/**
* Recursively builds the menu from the JSON tree
*/
function buildMenu(items, parent) {
items.forEach(item => {
const li = document.createElement("li");
if (item.children) {
const folder = document.createElement("div");
folder.className = "folder";
folder.textContent = item.label;
const subUl = document.createElement("ul");
subUl.classList.add("collapsed");
folder.addEventListener("click", () => {
folder.classList.toggle("open");
subUl.classList.toggle("collapsed");
});
li.appendChild(folder);
li.appendChild(subUl);
parent.appendChild(li);
buildMenu(item.children, subUl);
} else if (item.file) {
const a = document.createElement("a");
a.href = "#";
a.textContent = item.label;
a.addEventListener("click", e => {
e.preventDefault();
iframe.src = "pages/" + item.file;
history.replaceState(null, null, "#" + item.file);
});
li.appendChild(a);
parent.appendChild(li);
}
});
}
fetch('pages/pages.json')
.then(resp => resp.json())
.then(pages => {
buildMenu(pages, navMenu);
// Load the default page (first entry without children)
let defaultPage = null;
function findFirstFile(items) {
for (const it of items) {
if (it.file) return it.file;
if (it.children) {
const child = findFirstFile(it.children);
if (child) return child;
}
}
return null;
}
defaultPage = findFirstFile(pages);
if (location.hash) {
iframe.src = "pages/" + location.hash.substring(1);
} else if (defaultPage) {
iframe.src = "pages/" + defaultPage;
}
})
.catch(err => {
console.error("Error loading pages.json", err);
iframe.srcdoc = "<p>Unable to load the content.</p>";
});
</script>
</body>
</html>

View File

@@ -0,0 +1,56 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="theme-color" media="(prefers-color-scheme: light)" content="#ffffff">
<meta name="theme-color" media="(prefers-color-scheme: dark)" content="#343a40">
<meta name="color-scheme" content="light dark"><meta property="og:url" content="http://metabar:8888/obidoc/404.html">
<meta property="og:site_name" content="OBITools4 documentation">
<meta property="og:title" content="404 Page not found">
<meta property="og:locale" content="en_us">
<meta property="og:type" content="website">
<title>404 Page not found | OBITools4 documentation</title>
<link rel="icon" href="/obidoc/favicon.png" >
<link rel="manifest" href="/obidoc/manifest.json">
<link rel="canonical" href="http://metabar:8888/obidoc/404.html">
<link rel="stylesheet" href="/obidoc/book.min.5fd7b8e2d1c0ae15da279c52ff32731130386f71b58f011468f20d0056fe6b78.css" integrity="sha256-X9e44tHArhXaJ5xS/zJzETA4b3G1jwEUaPINAFb&#43;a3g=" crossorigin="anonymous">
<script defer src="/obidoc/fuse.min.js"></script>
<script defer src="/obidoc/en.search.min.4da51bdd2d833922fdbc0e19df517221387fc625ffb68ee140d605b3c5b68058.js" integrity="sha256-TaUb3S2DOSL9vA4Z31FyITh/xiX/to7hQNYFs8W2gFg=" crossorigin="anonymous"></script>
<script defer src="/obidoc/sw.min.32af8eafce4180aa1c5dea66d99fb26ba9043ea7c7a4c706138c91d9051b285e.js" integrity="sha256-Mq&#43;Or85BgKocXepm2Z&#43;ya6kEPqfHpMcGE4yR2QUbKF4=" crossorigin="anonymous"></script>
<!--
Made with Book Theme
https://github.com/alex-shpak/hugo-book
-->
<link rel="stylesheet" type="text/css" href="http://metabar:8888/obidoc/hugo-cite.css" />
<style>
.not-found {
text-align: center;
}
.not-found h1 {
margin: .25em 0 0 0;
opacity: .25;
font-size: 40vmin;
}
</style>
</head>
<body>
<main class="flex justify-center not-found">
<div>
<h1>404</h1>
<h2>Page Not Found</h2>
<h3>
<a href="/obidoc/">OBITools4 documentation</a>
</h3>
</div>
</main>
</body>
</html>

View File

@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Categories on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/categories/</link>
<description>Recent content in Categories on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<atom:link href="http://metabar:8888/obidoc/categories/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

View File

@@ -0,0 +1,9 @@
<!DOCTYPE html>
<html lang="en-us">
<head>
<title>http://metabar:8888/obidoc/categories/</title>
<link rel="canonical" href="http://metabar:8888/obidoc/categories/">
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url=http://metabar:8888/obidoc/categories/">
</head>
</html>

View File

@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Commands on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/commands/</link>
<description>Recent content in Commands on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<atom:link href="http://metabar:8888/obidoc/commands/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

View File

@@ -0,0 +1,9 @@
<!DOCTYPE html>
<html lang="en-us">
<head>
<title>http://metabar:8888/obidoc/commands/</title>
<link rel="canonical" href="http://metabar:8888/obidoc/commands/">
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url=http://metabar:8888/obidoc/commands/">
</head>
</html>

View File

@@ -0,0 +1 @@
.admonition{margin:1rem 0;border-radius:4px;box-shadow:0 1px 3px rgba(0,0,0,0.12);transition:all 0.3s ease}.admonition-header{padding:0.5rem 1rem;display:flex;align-items:center;font-weight:600;border-bottom:1px solid rgba(0,0,0,0.1);font-size:1.1rem;border-radius:4px 4px 0 0}.admonition-header svg{width:1.1em;height:1.1em;margin-right:0.5rem;fill:currentColor}.admonition-content{padding:1rem;background-color:#fff;border-radius:0 0 4px 4px;color:#000;transition:background-color 0.3s ease, color 0.3s ease}.admonition-content p{margin:0 0 0.5rem 0}.admonition-content p:last-child{margin-bottom:0}.admonition-content ul,.admonition-content ol{margin:0 0 0.5rem 0;padding-left:1.2rem}.admonition-content ul:last-child,.admonition-content ol:last-child{margin-bottom:0}.admonition-content blockquote{margin:0 0 0.5rem 0;padding-left:1rem;border-left:3px solid #e0e0e0}.admonition-content blockquote:last-child{margin-bottom:0}.admonition-content code{background-color:#f5f5f5;color:#24292e;padding:0.2em 0.4em;border-radius:3px;font-size:0.9em}@media (prefers-color-scheme: dark){.admonition-content{background-color:#1D1E20;color:#e6e6e6}.admonition-content code{background-color:#313244;color:#cdd6f4}.admonition-content blockquote{border-left-color:#45475a;color:#cdd6f4}}body.dark .admonition-content{background-color:#1D1E20;color:#e6e6e6}body.dark .admonition-content code{background-color:#313244;color:#cdd6f4}body.dark .admonition-content blockquote{border-left-color:#45475a;color:#cdd6f4}.admonition.abstract{background:transparent;border-left:4px solid #209fb5}.admonition.abstract .admonition-header{background:rgba(32,159,181,0.1);color:#209fb5}.admonition.caution{background:transparent;border-left:4px solid #e64553}.admonition.caution .admonition-header{background:rgba(230,69,83,0.1);color:#e64553}.admonition.code{background:transparent;border-left:4px solid #7287fd}.admonition.code 
.admonition-header{background:rgba(114,135,253,0.1);color:#7287fd}.admonition.conclusion{background:transparent;border-left:4px solid #dd7878}.admonition.conclusion .admonition-header{background:rgba(221,120,120,0.1);color:#dd7878}.admonition.danger{background:transparent;border-left:4px solid #fe640b}.admonition.danger .admonition-header{background:rgba(254,100,11,0.1);color:#fe640b}.admonition.error{background:transparent;border-left:4px solid #d20f39}.admonition.error .admonition-header{background:rgba(210,15,57,0.1);color:#d20f39}.admonition.example{background:transparent;border-left:4px solid #dc8a78}.admonition.example .admonition-header{background:rgba(220,138,120,0.1);color:#dc8a78}.admonition.experiment{background:transparent;border-left:4px solid #51bb2a}.admonition.experiment .admonition-header{background:rgba(81,187,42,0.1);color:#51bb2a}.admonition.goal{background:transparent;border-left:4px solid #e64553}.admonition.goal .admonition-header{background:rgba(230,69,83,0.1);color:#e64553}.admonition.idea{background:transparent;border-left:4px solid #df8e1d}.admonition.idea .admonition-header{background:rgba(223,142,29,0.1);color:#df8e1d}.admonition.important{background:transparent;border-left:4px solid #7D4DDA}.admonition.important .admonition-header{background:rgba(125,77,218,0.1);color:#7D4DDA}.admonition.info{background:transparent;border-left:4px solid #04a5e5}.admonition.info .admonition-header{background:rgba(4,165,229,0.1);color:#04a5e5}.admonition.memo{background:transparent;border-left:4px solid #e64553}.admonition.memo .admonition-header{background:rgba(230,69,83,0.1);color:#e64553}.admonition.note{background:transparent;border-left:4px solid #096ae1}.admonition.note .admonition-header{background:rgba(9,106,225,0.1);color:#096ae1}.admonition.notify{background:transparent;border-left:4px solid #0d48bd}.admonition.notify .admonition-header{background:rgba(13,72,189,0.1);color:#0d48bd}.admonition.question{background:transparent;border-left:4px 
solid #179299}.admonition.question .admonition-header{background:rgba(23,146,153,0.1);color:#179299}.admonition.quote{background:transparent;border-left:4px solid #7287fd}.admonition.quote .admonition-header{background:rgba(114,135,253,0.1);color:#7287fd}.admonition.success{background:transparent;border-left:4px solid #40a02b}.admonition.success .admonition-header{background:rgba(64,160,43,0.1);color:#40a02b}.admonition.task{background:transparent;border-left:4px solid #8839ef}.admonition.task .admonition-header{background:rgba(136,57,239,0.1);color:#8839ef}.admonition.tip{background:transparent;border-left:4px solid #179299}.admonition.tip .admonition-header{background:rgba(23,146,153,0.1);color:#179299}.admonition.warning{background:transparent;border-left:4px solid #df8e1d}.admonition.warning .admonition-header{background:rgba(223,142,29,0.1);color:#df8e1d}

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Advanced tools on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/advanced/</link>
<description>Recent content in Advanced tools on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/advanced/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Sequence alignments on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/alignments/</link>
<description>Recent content in Sequence alignments on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/alignments/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

View File

@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Exact alignment on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/alignments/obipairing/exact-alignment/</link>
<description>Recent content in Exact alignment on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<atom:link href="http://metabar:8888/obidoc/docs/commands/alignments/obipairing/exact-alignment/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

View File

@@ -0,0 +1,149 @@
---
title: "Untitled"
format: html
editor: visual
---
```{r}
library(tidyverse)
library(plotly)
library(matrixStats)
```
```{r}
# Numerically stable log(exp(a) + exp(b)), element-wise on two vectors
log_sum_exp <- function(a,b) {
  m <- map2_dbl(a,b,max)
  m + log(exp(a-m)+exp(b-m))
}
```
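The function above appears to implement the standard log-sum-exp identity; as a reminder, with $m = \max(a, b)$:

$$
\log(e^a + e^b) = m + \log\left(e^{a-m} + e^{b-m}\right)
$$

Both exponents $a-m$ and $b-m$ are at most zero, so neither exponential can overflow, and one of them is exactly $e^0 = 1$.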
$$
\log(1 - e^b) = \log\left(-\left(e^b - 1\right)\right) = \log\left(-\operatorname{expm1}(b)\right), \quad b < 0
$$
```{r}
log1m_exp <- function(b) {
if (any(b >= 0)) {
stop(glue::glue("b must be negative (b={b})"))
}
return(log(-expm1(b))) # expm1(b) = exp(b) - 1, to avoid rounding errors
}
```
$$
\log(e^a - e^b) = \log\left(e^a \left(1 - e^{b-a}\right)\right) = a + \log(1 - e^{b-a})
$$
```{r}
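log_diff_exp <- function(a, b) {
  # Check that a >= b, otherwise e^a - e^b is negative and its log is undefined
  if (any(a < b)) {
    stop(glue::glue("Error: a ({a}) must be greater than or equal to b ({b}) so that e^a - e^b is not negative."))
  }
  # Compute log(e^a - e^b) in a numerically stable way.
  # ifelse() evaluates its branches on the whole vector, so the positions
  # where a == b get a dummy negative difference that log1m_exp() accepts;
  # their result is replaced by -Inf (log of zero).
  equal <- a == b
  delta <- ifelse(equal, -1, b - a)
  ifelse(equal, -Inf, a + log1m_exp(delta))
}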
```
$$
P_{error} = 10^{-\frac{Q}{10}}
$$
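A short worked example, assuming the usual Phred quality scale:

$$
Q = 20 \Rightarrow P_{error} = 10^{-2} = 0.01
\qquad
Q = 30 \Rightarrow P_{error} = 10^{-3} = 0.001
$$

In natural-log form, $\log(P_{error}) = -\frac{Q}{10} \cdot \log(10)$, which is about $-4.61$ for $Q = 20$.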
$$
\begin{aligned}
q_F &= -\frac{Q_F}{10} \cdot \log(10) \\
q_R &= -\frac{Q_R}{10} \cdot \log(10) \\
P(match | Obs(match)) &= (1-e^{q_F}) (1-e^{q_R}) + (1-e^{q_F})\frac{e^{q_R}}{4}+ (1-e^{q_R})\frac{e^{q_F}}{4} + \frac{e^{q_F+q_R}}{4} \\
&=1 - e^{q_R} - e^{q_F} + e^{q_F+q_R} + \frac{e^{q_R}}{4} - \frac{e^{q_F+q_R}}{4} + \frac{e^{q_F}}{4} - \frac{e^{q_F+q_R}}{4} + \frac{e^{q_F+q_R}}{4} \\
&= \frac{4 - 4e^{q_F} - 4e^{q_R} + 4e^{q_F+q_R} + e^{q_F} + e^{q_R} - e^{q_F+q_R}}{4} \\
&= \frac{4 - 3e^{q_F}- 3e^{q_R} + 3e^{q_F+q_R}}{4}\\
&= \frac{4 - 3(e^{q_F}+e^{q_R}-e^{q_F+q_R})}{4} \\
&= 1 - \frac{3}{4}\left(e^{q_F}+e^{q_R}-e^{q_F+q_R}\right)
\end{aligned}
$$
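As a numerical check of the last line (illustrative values, not from a real run), take $Q_F = Q_R = 20$, so that $e^{q_F} = e^{q_R} = 0.01$:

$$
P(match | Obs(match)) = 1 - \frac{3}{4}\left(0.01 + 0.01 - 0.0001\right) = 1 - \frac{3}{4} \times 0.0199 = 0.985075
$$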
```{r}
Pm_match_observed <- function(Q_F, Q_R) {
l10 <- log(10)
l3 <- log(3)
l4 <- log(4)
q_F <- -Q_F/10*l10
q_R <- -Q_R/10*l10
term1 <- log_sum_exp(q_F,q_R)
term2 <- log_diff_exp(term1,q_F+q_R) + l3 - l4
log1m_exp(term2)
}
```
$$
\begin{aligned}
P(match | Obs(mismatch)) &= \frac{(1-e^{q_F})e^{q_R}}{4} + \frac{(1-e^{q_R})e^{q_F}}{4} + \frac{e^{q_F+q_R}}{4} \\
&= \frac{(1-e^{q_F})e^{q_R} + (1-e^{q_R})e^{q_F} + e^{q_F+q_R}}{4} \\
&= \frac{e^{q_R} - e^{q_F+q_R} + e^{q_F} - e^{q_F+q_R} + e^{q_F+q_R}}{4} \\
&= \frac{e^{q_F} + e^{q_R} - e^{q_F+q_R}}{4}
\end{aligned}
$$
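As a numerical check of the last line (illustrative values, not from a real run), take $Q_F = Q_R = 20$, so that $e^{q_F} = e^{q_R} = 0.01$:

$$
P(match | Obs(mismatch)) = \frac{0.01 + 0.01 - 0.0001}{4} = \frac{0.0199}{4} = 0.004975
$$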
```{r}
Pm_mismatch_observed <- function(Q_F, Q_R) {
l10 <- log(10)
l3 <- log(3)
l4 <- log(4)
q_F <- -Q_F/10*l10
q_R <- -Q_R/10*l10
term1 <- log_sum_exp(q_F,q_R)
log_diff_exp(term1,q_F+q_R) - l4
}
```
```{r}
score_match_observed <- function(Q_F, Q_R) {
Pm_match_observed(Q_F,Q_R) - log1m_exp(Pm_match_observed(Q_F,Q_R))
}
score_mismatch_observed <- function(Q_F, Q_R) {
Pm_mismatch_observed(Q_F,Q_R) - log1m_exp(Pm_mismatch_observed(Q_F,Q_R))
}
```
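Both `Pm_*` functions return the logarithm of a probability, so each score above is a log-odds ratio computed in log space. Writing $P$ for the probability:

$$
score = \log(P) - \log(1 - P) = \log\left(\frac{P}{1 - P}\right)
$$

For example, plugging $Q_F = Q_R = 20$ into the two formulas derived earlier gives $P = 0.985075$ for an observed match and $P = 0.004975$ for an observed mismatch, hence scores of about $\log(66.0) \approx 4.19$ and $\log(0.0050) \approx -5.30$ (natural logarithms).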
```{r}
scores <- expand_grid(QF=0:40,QR=0:40) %>%
mutate(score_match_observed = round(score_match_observed(QF,QR),2),
score_mismatch_observed = round(score_mismatch_observed(QF,QR),2))
```
```{r}
plot_match <- plot_ly(scores,
x=~QF, y=~QR, z=~score_match_observed,
type="mesh3d") %>%
layout(
plot_bgcolor = "#bababa",
scene = list(
xaxis = list(title = "Q forward read"), # Change x/y/z axis title
yaxis = list(title = "Q reverse read"),
zaxis = list(title = "Score match")))
```
```{r}
plot_mismatch <- plot_ly(scores,
x=~QF, y=~QR, z=~score_mismatch_observed,
type="mesh3d") %>%
layout(
plot_bgcolor = "#bababa",
scene = list(
xaxis = list(title = "Q forward read"), # Change x/y/z axis title
yaxis = list(title = "Q reverse read"),
zaxis = list(title = "Score mismatch")))
```
```{r}
write(plotly_json(plot_match,FALSE),"content/docs/commands/alignments/obipairing/exact-alignment/match.json")
write(plotly_json(plot_mismatch,FALSE),"content/docs/commands/alignments/obipairing/exact-alignment/mismatch.json")
```

View File

@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>The FASTA-like alignment on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/alignments/obipairing/fasta-like/</link>
<description>Recent content in The FASTA-like alignment on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<atom:link href="http://metabar:8888/obidoc/docs/commands/alignments/obipairing/fasta-like/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Basics on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/basics/</link>
<description>Recent content in Basics on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/basics/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Demultiplexing samples on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/demultiplexing/</link>
<description>Recent content in Demultiplexing samples on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/demultiplexing/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Experimentals on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/experimental/</link>
<description>Recent content in Experimentals on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/experimental/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>The OBITools4 commands on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/</link>
<description>Recent content in The OBITools4 commands on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate>Fri, 04 Oct 2024 17:16:03 +0200</lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Glossary of tags</title>
<link>http://metabar:8888/obidoc/docs/commands/tags/</link>
<pubDate>Fri, 04 Oct 2024 17:16:03 +0200</pubDate>
<guid>http://metabar:8888/obidoc/docs/commands/tags/</guid>
<description>&lt;h1 id=&#34;glossary-of-tags&#34;&gt;&#xA; Glossary of tags&#xA; &lt;a class=&#34;anchor&#34; href=&#34;#glossary-of-tags&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h1&gt;&#xA;&lt;h3 id=&#34;--d--&#34;&gt;&#xA; - D -&#xA; &lt;a class=&#34;anchor&#34; href=&#34;#--d--&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&#xA;&lt;p&gt;&lt;strong&gt;definition&lt;/strong&gt; :&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;text information about the sequence present in the original sequence file.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&#xA;&lt;p&gt;&lt;strong&gt;direction&lt;/strong&gt; :&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;set to “forward” if the original sequence did not need to be reverse-complemented to be processed, set to “reverse” otherwise.&#xA;(&#xA; &lt;a href=&#34;http://metabar:8888/obidoc/obitools/obipcr/&#34;&gt;obipcr&lt;/a&gt;)&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h3 id=&#34;--f--&#34;&gt;&#xA; - F -&#xA; &lt;a class=&#34;anchor&#34; href=&#34;#--f--&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&#xA;&lt;p&gt;&lt;strong&gt;forward_error&lt;/strong&gt; :&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Number of mismatch between forward primer and priming site&#xA;(&#xA; &lt;a href=&#34;http://metabar:8888/obidoc/obitools/obipcr/&#34;&gt;obipcr&lt;/a&gt;)&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&#xA;&lt;p&gt;&lt;strong&gt;forward_match&lt;/strong&gt; :&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Forward primer priming site sequence&#xA;(&#xA; &lt;a href=&#34;http://metabar:8888/obidoc/obitools/obipcr/&#34;&gt;obipcr&lt;/a&gt;)&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;&#xA;&lt;p&gt;&lt;strong&gt;forward_primer&lt;/strong&gt; :&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Forward primer sequence&#xA;(&#xA; &lt;a 
href=&#34;http://metabar:8888/obidoc/obitools/obipcr/&#34;&gt;obipcr&lt;/a&gt;)&lt;/p&gt;</description>
</item>
</channel>
</rss>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Shared command options on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/options/</link>
<description>Recent content in Shared command options on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/options/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

View File

@@ -0,0 +1,6 @@
>AB061527 {"count":1,"definition":"Sorex unguiculatus mitochondrial NA, complete genome.","family_name":"Soricidae","family_taxid":9376,"genus_name":"Sorex","genus_taxid":9379,"obicleandb_level":"family","obicleandb_trusted":2.2137847111025621e-13,"species_name":"Sorex unguiculatus","species_taxid":62275,"taxid":62275}
ttagccctaaacttaggtatttaatctaacaaaaatacccgtcagagaactactagcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>AL355887 {"count":2,"definition":"Human chromosome 14 NA sequence BAC R-179O11 of library RPCI-11 from chromosome 14 of Homo sapiens (Human)XXKW HTG.; HTGS_ACTIVFIN.","family_name":"Hominidae","family_taxid":9604,"genus_name":"Homo","genus_taxid":9605,"obicleandb_level":"genus","obicleandb_trusted":0,"species_name":"Homo sapiens","species_taxid":9606,"taxid":9606}
ttagccctaaactctagtagttacattaacaaaaccattcgtcagaatactacgagcaac
agcttaaaactcaaaggacctggcagttctttatatccct
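
In the two records above, each header line holds a sequence identifier followed by a JSON object of annotations. A minimal sketch of how such a header could be split in Python, assuming exactly that layout (identifier, one space, JSON object); `parse_obitools_header` is a hypothetical helper and the header string is abbreviated from the first record:

```python
import json


def parse_obitools_header(line):
    """Split a '>id {json}' header into (identifier, annotations dict)."""
    ident, _, rest = line[1:].strip().partition(" ")
    annotations = json.loads(rest) if rest else {}
    return ident, annotations


# Abbreviated version of the first header shown above
header = '>AB061527 {"count":1,"family_name":"Soricidae","taxid":62275}'
ident, ann = parse_obitools_header(header)
print(ident, ann["family_name"], ann["taxid"])
```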

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Others on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/others/</link>
<description>Recent content in Others on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/others/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Taxonomy on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/commands/taxonomy/</link>
<description>Recent content in Taxonomy on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/commands/taxonomy/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

View File

@@ -0,0 +1,604 @@
#!/usr/bin/env python3
import re
import gzip
import struct
import sys
import time
import getopt
from functools import cmp_to_key
_dbenable=False
#####
#
#
# Generic file function
#
#
#####
def universalOpen(file):
if isinstance(file,str):
if file[-3:] == '.gz':
rep = gzip.open(file)
else:
rep = open(file)
else:
rep = file
return rep
def universalTell(file):
if isinstance(file, gzip.GzipFile):
file=file.myfileobj
return file.tell()
def fileSize(file):
if isinstance(file, gzip.GzipFile):
file=file.myfileobj
pos = file.tell()
file.seek(0,2)
length = file.tell()
file.seek(pos,0)
return length
def progressBar(pos,max,reset=False,delta=[]):
if reset:
del delta[:]
if not delta:
delta.append(time.time())
delta.append(time.time())
delta[1]=time.time()
elapsed = delta[1]-delta[0]
percent = float(pos)/max * 100
remain = time.strftime('%H:%M:%S',time.gmtime(elapsed / percent * (100-percent)))
bar = '#' * int(percent/2)
bar+= '|/-\\-'[pos % 5]
bar+= ' ' * (50 - int(percent/2))
sys.stderr.write('\r%5.1f %% |%s] remain : %s' %(percent,bar,remain))
#####
#
#
# NCBI Dump Taxonomy reader
#
#
#####
def endLessIterator(endedlist):
for x in endedlist:
yield x
while(1):
yield endedlist[-1]
class ColumnFile(object):
def __init__(self,stream,sep=None,strip=True,types=None):
if isinstance(stream,str):
self._stream = open(stream)
else:
try:
iter(stream)
self._stream = stream
except TypeError:
raise ValueError('stream must be string or an iterator')
self._delimiter=sep
self._strip=strip
if types:
self._types=[x for x in types]
for i in range(len(self._types)):
if self._types[i] is bool:
self._types[i]=ColumnFile.str2bool
else:
self._types=None
def str2bool(x):
return bool(eval(x.strip()[0].upper(),{'T':True,'V':True,'F':False}))
str2bool = staticmethod(str2bool)
def __iter__(self):
return self
def __next__(self):
ligne = next(self._stream)
data = ligne.split(self._delimiter)
if self._strip or self._types:
data = [x.strip() for x in data]
if self._types:
it = endLessIterator(self._types)
data = [x[1](x[0]) for x in ((y,next(it)) for y in data)]
return data
def taxonCmp(t1,t2):
if t1[0] < t2[0]:
return -1
elif t1[0] > t2[0]:
return +1
return 0
def bsearchTaxon(taxonomy,taxid):
    # Binary search of a taxid in a taxonomy list sorted by taxid.
    # Returns the position of the taxon, or None if it is absent.
    taxCount = len(taxonomy)
    begin = 0
    end = taxCount
    oldcheck=taxCount
    check = (begin + end) // 2
    while check != oldcheck and taxonomy[check][0]!=taxid :
        if taxonomy[check][0] < taxid:
            begin=check
        else:
            end=check
        oldcheck=check
        check = (begin + end) // 2
    if taxonomy[check][0]==taxid:
        return check
    else:
        return None
def readNodeTable(file):
file = universalOpen(file)
nodes = ColumnFile(file,
sep='|',
types=(int,int,str,
str,str,bool,
int,bool,int,
bool,bool,bool,str))
print("Reading taxonomy dump file...", file=sys.stderr)
taxonomy=[[n[0],n[2],n[1]] for n in nodes]
print("List all taxonomy rank...", file=sys.stderr)
ranks =list(set(x[1] for x in taxonomy))
ranks.sort()
ranks = {rank: index for index, rank in enumerate(ranks)}
print("Sorting taxons...", file=sys.stderr)
taxonomy.sort(key=lambda x: x[0])
print("Indexing taxonomy...", file=sys.stderr)
index = {}
for t in taxonomy:
index[t[0]]=bsearchTaxon(taxonomy, t[0])
print("Indexing parent and rank...", file=sys.stderr)
for t in taxonomy:
t[1]=ranks[t[1]]
t[2]=index[t[2]]
return taxonomy,ranks,index
def nameIterator(file):
file = universalOpen(file)
names = ColumnFile(file,
sep='|',
types=(int,str,
str,str))
for taxid,name,unique,classname,white in names:
yield taxid,name,classname
def mergedNodeIterator(file):
file = universalOpen(file)
merged = ColumnFile(file,
sep='|',
types=(int,int,str))
for taxid,current,white in merged:
yield taxid,current
def deletedNodeIterator(file):
file = universalOpen(file)
deleted = ColumnFile(file,
sep='|',
types=(int,str))
for taxid,white in deleted:
yield taxid
def readTaxonomyDump(taxdir):
taxonomy,ranks,index = readNodeTable('%s/nodes.dmp' % taxdir)
print("Adding scientific name...", file=sys.stderr)
alternativeName=[]
for taxid,name,classname in nameIterator('%s/names.dmp' % taxdir):
alternativeName.append((name,classname,index[taxid]))
if classname == 'scientific name':
taxonomy[index[taxid]].append(name)
print("Adding taxid alias...", file=sys.stderr)
for taxid,current in mergedNodeIterator('%s/merged.dmp' % taxdir):
index[taxid]=index[current]
print("Adding deleted taxid...", file=sys.stderr)
for taxid in deletedNodeIterator('%s/delnodes.dmp' % taxdir):
index[taxid]=None
return taxonomy,ranks,alternativeName,index
#####
#
#
# Genbank/EMBL sequence reader
#
#
#####
def entryIterator(file):
file = universalOpen(file)
rep =[]
ligne = file.readline()
while ligne:
rep.append(ligne)
if ligne == '//\n':
rep = ''.join(rep)
yield rep
rep = []
ligne = file.readline()
def fastaEntryIterator(file):
file = universalOpen(file)
rep =[]
ligne = file.readline()
while ligne:
if ligne[0] == '>' and rep:
rep = ''.join(rep)
yield rep
rep = []
rep.append(ligne)
ligne = file.readline()
if rep:
rep = ''.join(rep)
yield rep
_cleanSeq = re.compile('[ \n0-9]+')
def cleanSeq(seq):
return _cleanSeq.sub('',seq)
_gbParseID = re.compile('(?<=^LOCUS {7})[^ ]+(?= )',re.MULTILINE)
_gbParseDE = re.compile('(?<=^DEFINITION {2}).+?\\. *$(?=[^ ])',re.MULTILINE+re.DOTALL)
_gbParseSQ = re.compile('(?<=^ORIGIN).+?(?=^//$)',re.MULTILINE+re.DOTALL)
_gbParseTX = re.compile('(?<= /db_xref="taxon:)[0-9]+(?=")')
def genbankEntryParser(entry):
Id = _gbParseID.findall(entry)[0]
De = ' '.join(_gbParseDE.findall(entry)[0].split())
Sq = cleanSeq(_gbParseSQ.findall(entry)[0].upper())
try:
Tx = int(_gbParseTX.findall(entry)[0])
except IndexError:
Tx = None
return {'id':Id,'taxid':Tx,'definition':De,'sequence':Sq}
######################
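# Match the "DE   " prefix of EMBL continuation lines only
# (the previous character class removed every 'D' and 'E' from the text)
_cleanDef = re.compile('\nDE {3}')
def cleanDef(definition):
    return _cleanDef.sub(' ',definition)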
_emblParseID = re.compile('(?<=^ID {3})[^ ]+(?=;)',re.MULTILINE)
_emblParseDE = re.compile('(?<=^DE {3}).+?\\. *$(?=[^ ])',re.MULTILINE+re.DOTALL)
_emblParseSQ = re.compile('(?<=^ ).+?(?=^//$)',re.MULTILINE+re.DOTALL)
_emblParseTX = re.compile('(?<= /db_xref="taxon:)[0-9]+(?=")')
def emblEntryParser(entry):
Id = _emblParseID.findall(entry)[0]
De = ' '.join(cleanDef(_emblParseDE.findall(entry)[0]).split())
Sq = cleanSeq(_emblParseSQ.findall(entry)[0].upper())
try:
Tx = int(_emblParseTX.findall(entry)[0])
except IndexError:
Tx = None
return {'id':Id,'taxid':Tx,'definition':De,'sequence':Sq}
######################
_fastaSplit=re.compile(';\\W*')
def parseFasta(seq):
seq=seq.split('\n')
title = seq[0].strip()[1:].split(None,1)
id=title[0]
if len(title) == 2:
field = _fastaSplit.split(title[1])
else:
field=[]
info = dict(x.split('=',1) for x in field if '=' in x)
definition = ' '.join([x for x in field if '=' not in x])
seq=(''.join([x.strip() for x in seq[1:]])).upper()
return id,seq,definition,info
def fastaEntryParser(entry):
id,seq,definition,info = parseFasta(entry)
Tx = info.get('taxid',None)
if Tx is not None:
match = re.search(r'taxon:(\d+)', Tx)
if match:
Tx = match.group(1)
Tx=int(Tx)
return {'id':id,'taxid':Tx,'definition':definition,'sequence':seq}
def sequenceIteratorFactory(entryParser,entryIterator):
def sequenceIterator(file):
for entry in entryIterator(file):
yield entryParser(entry)
return sequenceIterator
def taxonomyInfo(entry,connection):
taxid = entry['taxid']
curseur = connection.cursor()
curseur.execute("""
select taxid,species,genus,family,
taxonomy.scientificName(taxid) as sn,
taxonomy.scientificName(species) as species_sn,
taxonomy.scientificName(genus) as genus_sn,
taxonomy.scientificName(family) as family_sn
from
(
select alias as taxid,
taxonomy.getSpecies(alias) as species,
taxonomy.getGenus(alias) as genus,
taxonomy.getFamily(alias) as family
from taxonomy.aliases
where id=%d ) as tax
""" % taxid)
rep = curseur.fetchone()
entry['current_taxid']=rep[0]
entry['species']=rep[1]
entry['genus']=rep[2]
entry['family']=rep[3]
entry['scientific_name']=rep[4]
entry['species_sn']=rep[5]
entry['genus_sn']=rep[6]
entry['family_sn']=rep[7]
return entry
#####
#
#
# Binary writer
#
#
#####
def ecoSeqPacker(sq):
compactseq = gzip.zlib.compress(bytes(sq['sequence'],"ascii"),9)
cptseqlength = len(compactseq)
delength = len(sq['definition'])
totalSize = 4 + 20 + 4 + 4 + 4 + cptseqlength + delength
packed = struct.pack('> I I 20s I I I %ds %ds' % (delength,cptseqlength),
totalSize,
sq['taxid'],
bytes(sq['id'],"ascii"),
delength,
len(sq['sequence']),
cptseqlength,
bytes(sq['definition'],"ascii"),
compactseq)
assert len(packed) == totalSize+4, "error in sequence packing"
return packed
def ecoTaxPacker(tx):
namelength = len(tx[3])
totalSize = 4 + 4 + 4 + 4 + namelength
packed = struct.pack('> I I I I I %ds' % namelength,
totalSize,
tx[0],
tx[1],
tx[2],
namelength,
bytes(tx[3],"ascii"))
return packed
def ecoRankPacker(rank):
namelength = len(rank)
packed = struct.pack('> I %ds' % namelength,
namelength,
bytes(rank, 'ascii'))
return packed
def ecoNamePacker(name):
namelength = len(name[0])
classlength= len(name[1])
totalSize = namelength + classlength + 4 + 4 + 4 + 4
packed = struct.pack('> I I I I I %ds %ds' % (namelength,classlength),
totalSize,
int(name[1]=='scientific name'),
namelength,
classlength,
name[2],
bytes(name[0], 'ascii'),
bytes(name[1], 'ascii'))
return packed
def ecoSeqWriter(file,input,taxindex,parser):
    output = open(file,'wb')
    input = universalOpen(input)
    inputsize = fileSize(input)
    entries = parser(input)
    seqcount=0
    skipped = []
    # Placeholder for the sequence count, rewritten once all entries are processed
    output.write(struct.pack('> I',seqcount))
    progressBar(1, inputsize,reset=True)
    for entry in entries:
        if entry['taxid'] is not None:
            try:
                entry['taxid']=taxindex[entry['taxid']]
            except KeyError:
                entry['taxid']=None
            if entry['taxid'] is not None:
                seqcount+=1
                output.write(ecoSeqPacker(entry))
            else:
                skipped.append(entry['id'])
            where = universalTell(input)
            progressBar(where, inputsize)
            print(" Sequences read: %d " % seqcount, end=' ', file=sys.stderr)
        else:
            skipped.append(entry['id'])
    print(file=sys.stderr)
    # Go back to the start of the file and store the final sequence count
    output.seek(0,0)
    output.write(struct.pack('> I',seqcount))
    output.close()
    return skipped
def ecoTaxWriter(file,taxonomy):
output = open(file,'wb')
output.write(struct.pack('> I',len(taxonomy)))
for tx in taxonomy:
output.write(ecoTaxPacker(tx))
output.close()
def ecoRankWriter(file,ranks):
output = open(file,'wb')
output.write(struct.pack('> I',len(ranks)))
rankNames = list(ranks.keys())
rankNames.sort()
for rank in rankNames:
output.write(ecoRankPacker(rank))
output.close()
def nameCmp(n1,n2):
name1=n1[0].upper()
name2=n2[0].upper()
if name1 < name2:
return -1
elif name1 > name2:
return 1
return 0
def ecoNameWriter(file,names):
output = open(file,'wb')
output.write(struct.pack('> I',len(names)))
names.sort(key=lambda x:x[0].upper())
for name in names:
output.write(ecoNamePacker(name))
output.close()
def ecoDBWriter(prefix,taxonomy,seqFileNames,parser):
ecoRankWriter('%s.rdx' % prefix, taxonomy[1])
ecoTaxWriter('%s.tdx' % prefix, taxonomy[0])
ecoNameWriter('%s.ndx' % prefix, taxonomy[2])
filecount = 0
for filename in seqFileNames:
filecount+=1
sk=ecoSeqWriter('%s_%03d.sdx' % (prefix,filecount),
filename,
taxonomy[3],
parser)
if sk:
print("Skipped entry :", file=sys.stderr)
print(sk, file=sys.stderr)
def ecoParseOptions(arguments):
opt = {
'prefix' : 'ecodb',
'taxdir' : 'taxdump',
'parser' : sequenceIteratorFactory(genbankEntryParser,
entryIterator)
}
o,filenames = getopt.getopt(arguments,
'ht:T:n:gfe',
['help',
'taxonomy=',
'taxonomy_db=',
'name=',
'genbank',
'fasta',
'embl'])
for name,value in o:
if name in ('-h','--help'):
printHelp()
sys.exit()
elif name in ('-t','--taxonomy'):
opt['taxmod']='dump'
opt['taxdir']=value
elif name in ('-T','--taxonomy_db'):
opt['taxmod']='db'
opt['taxdb']=value
elif name in ('-n','--name'):
opt['prefix']=value
elif name in ('-g','--genbank'):
opt['parser']=sequenceIteratorFactory(genbankEntryParser,
entryIterator)
elif name in ('-f','--fasta'):
opt['parser']=sequenceIteratorFactory(fastaEntryParser,
fastaEntryIterator)
elif name in ('-e','--embl'):
opt['parser']=sequenceIteratorFactory(emblEntryParser,
entryIterator)
else:
raise ValueError('Unknown option %s' % name)
return opt,filenames
def printHelp():
print("-----------------------------------")
print(" ecoPCRFormat.py")
print("-----------------------------------")
print("ecoPCRFormat.py [option] <argument>")
print("-----------------------------------")
print("-e --embl :[E]mbl format")
print("-f --fasta :[F]asta format")
print("-g --genbank :[G]enbank format")
print("-h --help :[H]elp - print this help")
print("-n --name :[N]ame of the new database created")
print("-t --taxonomy :[T]axonomy - path to the taxonomy database")
print(" :bcp-like dump from GenBank taxonomy database.")
print("-----------------------------------")
if __name__ == '__main__':
opt,filenames = ecoParseOptions(sys.argv[1:])
taxonomy = readTaxonomyDump(opt['taxdir'])
ecoDBWriter(opt['prefix'], taxonomy, filenames, opt['parser'])
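
The record layout written by `ecoTaxPacker` can be checked with a small round trip. The sketch below is a standalone illustration under the assumption that a taxon record is a big-endian sequence of five unsigned 32-bit integers (total size, taxid, rank index, parent index, name length) followed by the ASCII name, as the format string `'> I I I I I %ds'` suggests. `pack_taxon` and `unpack_taxon` are hypothetical names and the values are made up.

```python
import struct

HEADER = "> I I I I I"  # total size, taxid, rank index, parent index, name length
HEADER_SIZE = struct.calcsize(HEADER)  # 20 bytes, no padding in big-endian mode


def pack_taxon(taxid, rank_index, parent_index, name):
    """Pack one taxon record with the same field order as ecoTaxPacker."""
    raw = name.encode("ascii")
    # As in the script, the size field excludes its own 4 bytes
    total_size = 4 + 4 + 4 + 4 + len(raw)
    return struct.pack("> I I I I I %ds" % len(raw),
                       total_size, taxid, rank_index, parent_index,
                       len(raw), raw)


def unpack_taxon(record):
    """Read back a record produced by pack_taxon."""
    total_size, taxid, rank_index, parent_index, name_len = \
        struct.unpack_from(HEADER, record, 0)
    (raw,) = struct.unpack_from("> %ds" % name_len, record, HEADER_SIZE)
    return total_size, taxid, rank_index, parent_index, raw.decode("ascii")


record = pack_taxon(9606, 3, 42, "Homo sapiens")
print(len(record), unpack_taxon(record))
```

The stored size is four bytes smaller than the record itself, which is the relation that `ecoSeqPacker` asserts for sequence records.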

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Designing new barcodes on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/cookbook/ecoprimers/</link>
<description>Recent content in Designing new barcodes on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<atom:link href="http://metabar:8888/obidoc/docs/cookbook/ecoprimers/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

Binary file not shown.

View File

@@ -0,0 +1,103 @@
SHELL := /bin/bash
FTPNCBI=ftp.ncbi.nlm.nih.gov
GBURL=https://$(FTPNCBI)/genbank
GBRELEASE_URL=$(GBURL)/GB_Release_Number
TAXOURL=https://$(FTPNCBI)/pub/taxonomy/taxdump.tar.gz
GBRELEASE:=$(shell curl $(GBRELEASE_URL))
GBDIV_ALL:=$(shell curl -L ${GBURL} \
| grep -E 'gb.+\.seq\.gz' \
| sed -E 's@^.*<a href="gb([^0-9]+)[0-9]+\.seq.gz.*$$@\1@' \
| sort \
| uniq)
GBDIV=bct inv mam phg pln pri rod vrl vrt
DIRECTORIES=fasta fasta_fgs
GBFILE_ALL:=$(shell curl -L ${GBURL} \
| grep -E "gb($$(tr ' ' '|' <<< "${GBDIV}"))[0-9]+" \
| sed -E 's@^<a href="(gb.+.seq.gz)">.*$$@\1@')
SUFFIXES += .d
NODEPS:=clean taxonomy
DEPFILES:=$(wildcard Release_$(GBRELEASE)/depends/*.d)
ifeq (0, $(words $(findstring $(MAKECMDGOALS), $(NODEPS))))
#Chances are, these files don't exist. GMake will create them and
#clean up automatically afterwards
-include $(DEPFILES)
endif
all: depends directories FORCE
@$(MAKE) downloads
downloads: taxonomy fasta_files
@echo Genbank Release number $(GBRELEASE)
@echo all divisions : $(GBDIV_ALL)
FORCE:
@sleep 1
.PHONY: all directories depends taxonomy fasta_files FORCE
depends: directories Release_$(GBRELEASE)/depends/gbfiles.d Makefile
division: $(GBDIV)
taxonomy: directories Release_$(GBRELEASE)/taxonomy
directories: Release_$(GBRELEASE)/fasta Release_$(GBRELEASE)/stamp Release_$(GBRELEASE)/tmp
Release_$(GBRELEASE):
@mkdir -p $@
@echo Create $@ directory
Release_$(GBRELEASE)/fasta: Release_$(GBRELEASE)
@mkdir -p $@
@echo Create $@ directory
Release_$(GBRELEASE)/stamp: Release_$(GBRELEASE)
@mkdir -p $@
@echo Create $@ directory
Release_$(GBRELEASE)/tmp: Release_$(GBRELEASE)
@mkdir -p $@
@echo Create $@ directory
Release_$(GBRELEASE)/depends/gbfiles.d: Makefile
@echo Create depends directory
@mkdir -p Release_$(GBRELEASE)/depends
@for f in ${GBFILE_ALL} ; do \
echo -e "Release_$(GBRELEASE)/stamp/$$f.stamp:" ; \
echo -e "\t@echo Downloading file : $$f..." ; \
echo -e "\t@mkdir -p Release_$(GBRELEASE)/tmp" ; \
echo -e "\t@mkdir -p Release_$(GBRELEASE)/stamp" ; \
echo -e "\t@curl -L ${GBURL}/$$f > Release_$(GBRELEASE)/tmp/$$f && touch \$$@" ; \
echo ; \
div=$$(sed -E 's@^gb(...).*$$@\1@' <<< $$f) ; \
fasta="Release_$(GBRELEASE)/fasta/$$div/$${f/.seq.gz/.fasta.gz}" ; \
fasta_fgs="Release_$(GBRELEASE)/fasta_fgs/$$div/$${f/.seq.gz/.fasta.gz}" ; \
fasta_files="$$fasta_files $$fasta" ; \
fasta_fgs_files="$$fasta_fgs_files $$fasta_fgs" ; \
echo -e "$$fasta: Release_$(GBRELEASE)/stamp/$$f.stamp" ; \
echo -e "\t@echo converting file : \$$< in fasta" ; \
echo -e "\t@mkdir -p Release_$(GBRELEASE)/fasta/$$div" ; \
echo -e "\t@obiconvert -Z --fasta-output --skip-empty \\" ; \
echo -e "\t Release_$(GBRELEASE)/tmp/$$f > Release_$(GBRELEASE)/tmp/$${f/.seq.gz/.fasta.gz} \\" ; \
echo -e "\t && mv Release_$(GBRELEASE)/tmp/$${f/.seq.gz/.fasta.gz} \$$@ \\" ; \
echo -e "\t && rm -f Release_$(GBRELEASE)/tmp/$$f \\" ; \
echo -e "\t || rm -f \$$@" ; \
echo -e "\t@echo conversion of \$$@ done." ; \
echo ; \
done > $@ ; \
echo >> $@ ; \
echo "fasta_files: $$fasta_files" >> $@ ;
Release_$(GBRELEASE)/taxonomy:
mkdir -p $@
curl -L $(TAXOURL) \
| tar -C $@ -zxf -
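
The generated dependency rules map each GenBank flat file to a FASTA target under a per-division directory. A rough Python sketch of that naming scheme follows, assuming file names of the form `gb<div><n>.seq.gz` with a three-letter division code. `fasta_target` is a hypothetical helper and the release number `264` is only a placeholder.

```python
import re


def fasta_target(release, seq_file):
    """Return the FASTA path derived from a GenBank flat-file name."""
    # Same extraction as the sed expression 's@^gb(...).*$@\1@'
    div = re.sub(r"^gb(...).*$", r"\1", seq_file)
    # Same substitution as the shell expansion ${f/.seq.gz/.fasta.gz}
    fasta_name = seq_file.replace(".seq.gz", ".fasta.gz", 1)
    return "Release_%s/fasta/%s/%s" % (release, div, fasta_name)


print(fasta_target("264", "gbmam12.seq.gz"))
```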

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Analysing an Illumina data set on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/cookbook/illumina/</link>
<description>Recent content in Analysing an Illumina data set on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<atom:link href="http://metabar:8888/obidoc/docs/cookbook/illumina/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

Binary file not shown.

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,26 @@
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":12205}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":337}}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"13a_F730603":"h"},"obiclean_weight":{"13a_F730603":8039}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":202}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:98:16207:5869#0/1_sub[78..81] {"count":2007,"merged_sample":{"29a_F260619":2007},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":2021}}
tttt
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s","29a_F260619":"h"},"obiclean_weight":{"26a_F040644":12,"29a_F260619":105}}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obiclean_head":true,"obiclean_headcount":2,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":0,"obiclean_status":{"15a_F730814":"h","29a_F260619":"h"},"obiclean_weight":{"15a_F730814":8822,"29a_F260619":5789}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":376}}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":4,"obiclean_singletoncount":3,"obiclean_status":{"13a_F730603":"s","15a_F730814":"s","26a_F040644":"h","29a_F260619":"s"},"obiclean_weight":{"13a_F730603":17,"15a_F730814":5,"26a_F040644":468,"29a_F260619":1}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct

View File

@@ -0,0 +1,24 @@
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":12205}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":337}}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"13a_F730603":"h"},"obiclean_weight":{"13a_F730603":8039}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":202}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s","29a_F260619":"h"},"obiclean_weight":{"26a_F040644":12,"29a_F260619":105}}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obiclean_head":true,"obiclean_headcount":2,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":0,"obiclean_status":{"15a_F730814":"h","29a_F260619":"h"},"obiclean_weight":{"15a_F730814":8822,"29a_F260619":5789}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":376}}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":4,"obiclean_singletoncount":3,"obiclean_status":{"13a_F730603":"s","15a_F730814":"s","26a_F040644":"h","29a_F260619":"s"},"obiclean_weight":{"13a_F730603":17,"15a_F730814":5,"26a_F040644":468,"29a_F260619":1}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct

@@ -0,0 +1,329 @@
>HELIUM_000100422_612GNAAXX:7:56:19300:10949#0/1_sub[28..127] {"count":37,"merged_sample":{"29a_F260619":37},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":43}}
ttagccctaaacacaagtaattaatataacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:63:4595:15643#0/1_sub[28..126] {"count":2,"merged_sample":{"29a_F260619":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":4}}
ttagccctaaacataagctattccataacaaaataattcgccagagtactaccggcaata
gcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:7:8807:7823#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s"},"obiclean_weight":{"15a_F730814":3}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtcataccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:19171:11413#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":1,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":1,"obiclean_mutation":{"HELIUM_000100422_612GNAAXX:7:8:6794:4925#0/1_sub[28..127]":"(t)->(g)@38"},"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"i","29a_F260619":"s"},"obiclean_weight":{"15a_F730814":1,"29a_F260619":1}}
ttagccctaaacacaagtaattaatataacaaaagtagtcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:44:5008:2115#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaaca
gcctgaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:53:16956:10563#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":4}}
ttagccctaaacataaacattcaataaacaaggatgttcgcaagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:54:10323:7022#0/1_sub[28..127] {"count":3,"merged_sample":{"13a_F730603":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":7}}
ttagccctaaacacaaataattatataaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:45:7460:13396#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s"},"obiclean_weight":{"15a_F730814":3}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttatgcccgt
>HELIUM_000100422_612GNAAXX:7:58:8409:9911#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaacattcaataaacaaaataattcgccagaggactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":12205}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:100:18844:15930#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s"},"obiclean_weight":{"15a_F730814":2}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaagcgcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:29:9723:20435#0/1_sub[28..127] {"count":2,"merged_sample":{"29a_F260619":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":4}}
ttagccctaaacacaaataattacacaaacaaaattgttcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:43:4660:16319#0/1_sub[28..126] {"count":22,"merged_sample":{"26a_F040644":22},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":42}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:103:11045:3860#0/1_sub[28..127] {"count":4,"merged_sample":{"15a_F730814":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s"},"obiclean_weight":{"15a_F730814":4}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaagggcttggcggtgctttatgccctt
>HELIUM_000100422_612GNAAXX:7:67:10944:19430#0/1_sub[28..127] {"count":2,"merged_sample":{"29a_F260619":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":3}}
ttagccctaaacacaagtaattaatataacaaaattattcgtcagagtactaccggcaat
agcttaaaactcaaaggacgtggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:56:6962:17278#0/1_sub[28..126] {"count":4,"merged_sample":{"26a_F040644":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":7}}
ttagccctaaacataaacattcaataaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:60:13553:20530#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":1,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":2,"obiclean_status":{"15a_F730814":"s","29a_F260619":"s"},"obiclean_weight":{"15a_F730814":1,"29a_F260619":1}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
atgttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:35:13167:18371#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":2}}
ttagccctaaacataaacattcaataaacaagaatgttcgccggagaactactaggaaca
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:105:1895:7593#0/1_sub[28..127] {"count":11,"merged_sample":{"29a_F260619":11},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":13}}
ttagccctaaacctcaacagttaaatcaacaaaactgctcgccagaacactacgagccac
agcttaaaactcaaaggacctggcggtgcttcatatccct
>HELIUM_000100422_612GNAAXX:7:118:15661:12736#0/1_sub[28..127] {"count":2,"merged_sample":{"29a_F260619":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":2}}
ttagccctaaacacaagtaattaatataacaaaattatccgcaagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:30:8391:13152#0/1_sub[28..127] {"count":17,"merged_sample":{"13a_F730603":17},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"13a_F730603":"h"},"obiclean_weight":{"13a_F730603":25}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaac
agcccaaaactcaaaggacttggcggtgcttcacaccctt
>HELIUM_000100422_612GNAAXX:7:95:14375:10838#0/1_sub[28..127] {"count":4,"merged_sample":{"29a_F260619":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":5}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:82:13334:18499#0/1_sub[28..127] {"count":6,"merged_sample":{"29a_F260619":6},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":10}}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:11272:1128#0/1_sub[28..127] {"count":2,"merged_sample":{"13a_F730603":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":2}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactactagcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:7:6016:14767#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":4}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtacgtctagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":337}}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:9:8337:12329#0/1_sub[28..126] {"count":5,"merged_sample":{"26a_F040644":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":6}}
ttagccctaaacataaacagtcaataaacaaggatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:15:12854:18952#0/1_sub[28..126] {"count":10,"merged_sample":{"29a_F260619":10},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":16}}
ttagccctaaacataagctattccataacaaaattattcgccagagtactaccggcaata
gcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:1:4513:20277#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaaccattctataacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:57:18237:6765#0/1_sub[28..127] {"count":2,"merged_sample":{"29a_F260619":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":3}}
ttagccctaaacacaagtaattaatataacaaaattatgcgccagagtactgccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"13a_F730603":"h"},"obiclean_weight":{"13a_F730603":8039}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:119:8691:15994#0/1_sub[28..127] {"count":2,"merged_sample":{"29a_F260619":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":2}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcgat
agcttaaaacgcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:6:1739:11421#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":2}}
ttagccctaaacataaacattcaataaacaagaatgttcgctagagtactactagcaaca
gcctgacactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:73:17136:17728#0/1_sub[28..127] {"count":6,"merged_sample":{"13a_F730603":6},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":7}}
ctagccttaaacacaaatagttatgcaaacaaaattattcgccagagtactaccggcaac
agcccaaaactcaaaggacttggcggtgcttcacaccctt
>HELIUM_000100422_612GNAAXX:7:76:9874:3117#0/1_sub[28..127] {"count":9,"merged_sample":{"29a_F260619":9},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":12}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":202}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:18:16679:15889#0/1_sub[28..127] {"count":4,"merged_sample":{"29a_F260619":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":4}}
ttagccctaaacctcaacagttaaatcaacaaaactgctcgccagaacactacgagccac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:52:6908:8410#0/1_sub[28..126] {"count":5,"merged_sample":{"26a_F040644":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":10}}
ttagccctaaacataagctattctataacaaaataattcgccagagaactactagcaaca
gcttaaaactcaaaggacttggcggtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:71:13461:7411#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s"},"obiclean_weight":{"15a_F730814":3}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagaagactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:98:16207:5869#0/1_sub[78..81] {"count":2007,"merged_sample":{"29a_F260619":2007},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":2021}}
tttt
>HELIUM_000100422_612GNAAXX:7:47:6989:13864#0/1_sub[28..126] {"count":3,"merged_sample":{"26a_F040644":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaaccattctataacaaaataattcgccagagaactactagcaaca
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:72:8941:18482#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaatg
gcctaaaactcaaaggacttggtggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:25:10818:13742#0/1_sub[28..133] {"count":8,"merged_sample":{"26a_F040644":8},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":8}}
ttagccctaaacgtaaactgcaaactattccataataaaataattcgcccaagaactact
agcaacagcttaaaactcaaaggacttggtggtgctttctacccct
>HELIUM_000100422_612GNAAXX:7:70:11509:6042#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s"},"obiclean_weight":{"15a_F730814":2}}
ttagccctaaacacaagaaattaatataacaaaaatattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s","29a_F260619":"h"},"obiclean_weight":{"26a_F040644":12,"29a_F260619":105}}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:46:4342:17317#0/1_sub[28..126] {"count":5,"merged_sample":{"26a_F040644":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":8}}
ttagccctaaacataaatcattctataacaaaataattcgccggagaactactaggaaca
gcttaaaactcaaaggacttggcggtgccttacgtccct
>HELIUM_000100422_612GNAAXX:7:111:17844:17230#0/1_sub[28..126] {"count":13,"merged_sample":{"26a_F040644":13},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":17}}
ttagccctaaacataaatcagtctataacaaaataattcgccagagaactactagcaaca
gcttaaaactcaaaggacgtggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:34:17680:16952#0/1_sub[28..127] {"count":15,"merged_sample":{"13a_F730603":15},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":15}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgcttcacaccctt
>HELIUM_000100422_612GNAAXX:7:34:2640:2539#0/1_sub[28..127] {"count":19,"merged_sample":{"29a_F260619":19},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":25}}
ttagccctaaacacaaataattacacaaacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:71:18891:9661#0/1_sub[28..126] {"count":3,"merged_sample":{"26a_F040644":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaatcagtctataacaaaataattcgccagagaactactagcaaca
gcttaaaactcaaaggacgtggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:26:16553:1613#0/1_sub[77..81] {"count":38,"merged_sample":{"13a_F730603":38},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":38}}
caata
>HELIUM_000100422_612GNAAXX:7:45:5732:11220#0/1_sub[28..126] {"count":3,"merged_sample":{"26a_F040644":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggtggtgctttctacccct
>HELIUM_000100422_612GNAAXX:7:1:14254:1103#0/1_sub[28..126] {"count":8,"merged_sample":{"26a_F040644":8},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":11}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:50:17151:20608#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":2}}
ttagccctaaacataaacattcaataaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:96:13203:2879#0/1_sub[28..127] {"count":5,"merged_sample":{"13a_F730603":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":6}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaac
agcccaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:78:18041:18996#0/1_sub[28..126] {"count":13,"merged_sample":{"13a_F730603":9,"15a_F730814":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":2,"obiclean_status":{"13a_F730603":"s","15a_F730814":"s"},"obiclean_weight":{"13a_F730603":9,"15a_F730814":4}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:80:18387:10166#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":1,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":1,"obiclean_mutation":{"HELIUM_000100422_612GNAAXX:7:100:14685:15065#0/1_sub[28..127]":"(a)->(t)@24","HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127]":"(t)->(g)@71"},"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s","29a_F260619":"i"},"obiclean_weight":{"15a_F730814":1,"29a_F260619":1}}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaacgcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:54:10113:12172#0/1_sub[28..127] {"count":7,"merged_sample":{"15a_F730814":4,"29a_F260619":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":2,"obiclean_status":{"15a_F730814":"s","29a_F260619":"s"},"obiclean_weight":{"15a_F730814":5,"29a_F260619":4}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:97:17822:18365#0/1_sub[28..127] {"count":3,"merged_sample":{"29a_F260619":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":4}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactgccggcaat
agcttaaaactcaaaggacttggcggtgctttatcccctt
>HELIUM_000100422_612GNAAXX:7:87:6328:12406#0/1_sub[28..126] {"count":6,"merged_sample":{"26a_F040644":6},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":8}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:7:4611:13145#0/1_sub[28..127] {"count":3,"merged_sample":{"13a_F730603":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":4}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtacgtccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obiclean_head":true,"obiclean_headcount":2,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":0,"obiclean_status":{"15a_F730814":"h","29a_F260619":"h"},"obiclean_weight":{"15a_F730814":8822,"29a_F260619":5789}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:78:6346:5817#0/1_sub[28..127] {"count":5,"merged_sample":{"13a_F730603":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":5}}
ttagccctaaacacaaataattatataaacaaaattattcgccagagtactaccggcaac
agcccaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:7:15122:17310#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":1,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":1,"obiclean_mutation":{"HELIUM_000100422_612GNAAXX:7:94:6384:20392#0/1_sub[28..127]":"(a)->(t)@52"},"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"i","29a_F260619":"s"},"obiclean_weight":{"15a_F730814":1,"29a_F260619":1}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtacttcctgcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:27:7695:17738#0/1_sub[28..126] {"count":7,"merged_sample":{"26a_F040644":7},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":11}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:48:15379:13773#0/1_sub[28..126] {"count":5,"merged_sample":{"26a_F040644":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":5}}
ttagccctaaacatagataattttacaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:117:16246:17184#0/1_sub[28..127] {"count":5,"merged_sample":{"13a_F730603":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":10}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagaggactactagcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:87:11044:6323#0/1_sub[28..126] {"count":69,"merged_sample":{"26a_F040644":69},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":84}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:40:12248:18615#0/1_sub[28..126] {"count":4,"merged_sample":{"26a_F040644":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":4}}
ttagccctaaacataagctattctataacaaaataattcgccagagaactactagcaaca
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:7:6470:13562#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s"},"obiclean_weight":{"15a_F730814":2}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtacctccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:72:11850:15705#0/1_sub[28..126] {"count":4,"merged_sample":{"26a_F040644":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":4}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:82:8566:4827#0/1_sub[28..126] {"count":6,"merged_sample":{"26a_F040644":6},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":9}}
ttagccctaaacataaacattcaataaacaaggatgttcgcaagagtactactagcaatg
gcctaaaactcaaaggacttggtggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:43:16297:17399#0/1_sub[28..127] {"count":2,"merged_sample":{"29a_F260619":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":3}}
ttagccctaaacacaagtaattaatataacaaaattgttcaccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:7:15117:2564#0/1_sub[28..127] {"count":2,"merged_sample":{"13a_F730603":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":2}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtacctccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:90:4058:17862#0/1_sub[28..127] {"count":4,"merged_sample":{"13a_F730603":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":5}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagaggactactagcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:43:13909:1377#0/1_sub[28..126] {"count":10,"merged_sample":{"26a_F040644":10},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":14}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaatg
gcctaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:36:3584:21256#0/1_sub[28..127] {"count":2,"merged_sample":{"13a_F730603":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":2}}
ctagccttaaacacaaatagttatgcaaacaaagctattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttatgccctt
>HELIUM_000100422_612GNAAXX:7:80:13357:2959#0/1_sub[74..81] {"count":12,"merged_sample":{"29a_F260619":12},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":12}}
ccgatagg
>HELIUM_000100422_612GNAAXX:7:70:8097:4516#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":2}}
ttagccctaaacataaacattcaagaaacaagaatgttcaccagagtactactagcaatg
gcctaaaactcaaaggacttggcagtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:7:8746:5790#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":2}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtacctctagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:8:9165:18915#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":2}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactatgaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:85:7323:6139#0/1_sub[28..126] {"count":9,"merged_sample":{"26a_F040644":9},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":16}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:23:6103:3418#0/1_sub[28..126] {"count":3,"merged_sample":{"26a_F040644":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":4}}
ttagccctaaacatagataattttacaacaaaataattcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:107:2103:10677#0/1_sub[28..127] {"count":20,"merged_sample":{"13a_F730603":20},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"13a_F730603":"h"},"obiclean_weight":{"13a_F730603":22}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:1:15993:20360#0/1_sub[28..127] {"count":4,"merged_sample":{"13a_F730603":4},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":4}}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactacctgcaat
agcttaaaactcaaaggacttggcggtgctttatgccctt
>HELIUM_000100422_612GNAAXX:7:103:1205:6990#0/1_sub[28..127] {"count":2,"merged_sample":{"13a_F730603":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":2}}
ttagccctaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:111:2380:10482#0/1_sub[28..126] {"count":2,"merged_sample":{"29a_F260619":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":2}}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaata
gcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:76:10530:11312#0/1_sub[28..126] {"count":43,"merged_sample":{"26a_F040644":43},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":69}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:40:15984:4911#0/1_sub[28..126] {"count":16,"merged_sample":{"26a_F040644":16},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":30}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaata
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:109:19171:17424#0/1_sub[28..126] {"count":2,"merged_sample":{"13a_F730603":1,"26a_F040644":1},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":1,"obiclean_mutation":{"HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126]":"(a)->(t)@51"},"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s","26a_F040644":"i"},"obiclean_weight":{"13a_F730603":1,"26a_F040644":1}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggacttctagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:77:17898:19592#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":1,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":2,"obiclean_status":{"15a_F730814":"s","29a_F260619":"s"},"obiclean_weight":{"15a_F730814":1,"29a_F260619":1}}
ttagccctaaacacaagtaattaatataacaaaataattcgccagaggactactagcaat
agcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:32:4908:16517#0/1_sub[78..81] {"count":7,"merged_sample":{"29a_F260619":7},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":7}}
ccgc
>HELIUM_000100422_612GNAAXX:7:100:8022:19461#0/1_sub[28..127] {"count":5,"merged_sample":{"29a_F260619":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":11}}
ttagccctaaacacaagtaattaatataacaaaataattcgccagagaactactagcaac
agattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:59:2390:15297#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":2}}
ttagccctaaacataaacattcaataaacaaggatgttcgcaagagtactactagcaatg
gcctaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:98:10839:20244#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":2}}
ttagccctaaacataaacattcaataaacaaggatgttcgccagagtactactagcaatg
gcctaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:34:13086:6440#0/1_sub[28..127] {"count":14,"merged_sample":{"13a_F730603":14},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":18}}
ttagccctaaacacaaataattatataaacaaaattattcgccagagtactaccggcaac
agcccaaaactcaaaggacttggcggtgcttcacaccctt
>HELIUM_000100422_612GNAAXX:7:73:10944:14101#0/1_sub[28..127] {"count":2,"merged_sample":{"13a_F730603":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"13a_F730603":"s"},"obiclean_weight":{"13a_F730603":3}}
ttagccctaaacacaaataattatataaacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:61:17561:21218#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaatcattctataacaaaataattcgccggagaactactaggaaca
gcttaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:93:7569:17305#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaatcagtctataacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:55:11954:15731#0/1_sub[28..126] {"count":6,"merged_sample":{"29a_F260619":6},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":7}}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":376}}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:54:15067:12524#0/1_sub[28..126] {"count":26,"merged_sample":{"26a_F040644":26},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":49}}
ttagccctaaacatagataattttacaacaaaataattcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:6:10451:2548#0/1_sub[28..126] {"count":12,"merged_sample":{"26a_F040644":12},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":15}}
ttagccctaaacataaacagtcaataaacaaggatgttcgccagagtactactagcaatg
gcctaaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:23:10872:20213#0/1_sub[28..126] {"count":2,"merged_sample":{"26a_F040644":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":3}}
ttagccctaaacataaatcattctataacaaaataattcgccggagaactactagcaaca
gcttaaaactcaaaggacttggcggtgccttacgtccct
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":4,"obiclean_singletoncount":3,"obiclean_status":{"13a_F730603":"s","15a_F730814":"s","26a_F040644":"h","29a_F260619":"s"},"obiclean_weight":{"13a_F730603":17,"15a_F730814":5,"26a_F040644":468,"29a_F260619":1}}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:98:6034:5203#0/1_sub[28..127] {"count":3,"merged_sample":{"29a_F260619":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":4}}
ttagccctaaacacaagtaattaatataacaaaattattcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:70:2429:19509#0/1_sub[28..126] {"count":5,"merged_sample":{"26a_F040644":5},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":5}}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgccttacgtccct
>HELIUM_000100422_612GNAAXX:7:65:1843:2567#0/1_sub[28..126] {"count":7,"merged_sample":{"26a_F040644":7},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s"},"obiclean_weight":{"26a_F040644":11}}
ttagccctaaacataaaccattctataacaaaataattcgccagagaactactagcaaca
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:92:1339:19811#0/1_sub[28..127] {"count":3,"merged_sample":{"29a_F260619":3},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"29a_F260619":"s"},"obiclean_weight":{"29a_F260619":3}}
ttagccctaaacacaagtaattacacaaacaaaattgttcacaagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:15:13855:1746#0/1_sub[28..127] {"count":3,"merged_sample":{"15a_F730814":2,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":1,"obiclean_mutation":{"HELIUM_000100422_612GNAAXX:7:7:14405:19348#0/1_sub[28..127]":"(t)->(g)@51"},"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s","29a_F260619":"i"},"obiclean_weight":{"15a_F730814":3,"29a_F260619":1}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtgcgaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:7:7092:11003#0/1_sub[28..127] {"count":2,"merged_sample":{"15a_F730814":2},"obiclean_head":true,"obiclean_headcount":0,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":1,"obiclean_status":{"15a_F730814":"s"},"obiclean_weight":{"15a_F730814":2}}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtacgtccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt

@@ -0,0 +1,24 @@
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":12205},"obitag_bestid":0.9797979797979798,"obitag_bestmatch":"AY227529","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9992 [Marmota]@genus"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":337},"obitag_bestid":0.9405940594059405,"obitag_bestmatch":"AF154263","obitag_match_count":9,"obitag_rank":"infraorder","obitag_similarity_method":"lcs","taxid":"taxon:35500 [Pecora]@infraorder"}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"13a_F730603":"h"},"obiclean_weight":{"13a_F730603":8039},"obitag_bestid":1,"obitag_bestmatch":"AB245427","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9860 [Cervus elaphus]@species"}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":202},"obitag_bestid":0.9494949494949495,"obitag_bestmatch":"AY227530","obitag_match_count":2,"obitag_rank":"tribe","obitag_similarity_method":"lcs","taxid":"taxon:337730 [Marmotini]@tribe"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":1,"obiclean_status":{"26a_F040644":"s","29a_F260619":"h"},"obiclean_weight":{"26a_F040644":12,"29a_F260619":105},"obitag_bestid":0.9595959595959596,"obitag_bestmatch":"AC187326","obitag_match_count":1,"obitag_rank":"subspecies","obitag_similarity_method":"lcs","taxid":"taxon:9615 [Canis lupus familiaris]@subspecies"}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obiclean_head":true,"obiclean_headcount":2,"obiclean_internalcount":0,"obiclean_samplecount":2,"obiclean_singletoncount":0,"obiclean_status":{"15a_F730814":"h","29a_F260619":"h"},"obiclean_weight":{"15a_F730814":8822,"29a_F260619":5789},"obitag_bestid":1,"obitag_bestmatch":"AJ885202","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":1,"obiclean_singletoncount":0,"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":376},"obitag_bestid":1,"obitag_bestmatch":"AJ972683","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obiclean_head":true,"obiclean_headcount":1,"obiclean_internalcount":0,"obiclean_samplecount":4,"obiclean_singletoncount":3,"obiclean_status":{"13a_F730603":"s","15a_F730814":"s","26a_F040644":"h","29a_F260619":"s"},"obiclean_weight":{"13a_F730603":17,"15a_F730814":5,"26a_F040644":468,"29a_F260619":1},"obitag_bestid":1,"obitag_bestmatch":"AB048590","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9611 [Canis]@genus"}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct

@@ -0,0 +1,24 @@
>seq0001 {"count":10172,"merged_sample":{"26a_F040644":10172},"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":12205},"obitag_bestid":0.9797979797979798,"obitag_bestmatch":"AY227529","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","seq_number":1,"taxid":"taxon:9992 [Marmota]@genus"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>seq0002 {"count":260,"merged_sample":{"29a_F260619":260},"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":337},"obitag_bestid":0.9405940594059405,"obitag_bestmatch":"AF154263","obitag_match_count":9,"obitag_rank":"infraorder","obitag_similarity_method":"lcs","seq_number":2,"taxid":"taxon:35500 [Pecora]@infraorder"}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>seq0003 {"count":7146,"merged_sample":{"13a_F730603":7146},"obiclean_status":{"13a_F730603":"h"},"obiclean_weight":{"13a_F730603":8039},"obitag_bestid":1,"obitag_bestmatch":"AB245427","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","seq_number":3,"taxid":"taxon:9860 [Cervus elaphus]@species"}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>seq0004 {"count":87,"merged_sample":{"26a_F040644":87},"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":202},"obitag_bestid":0.9494949494949495,"obitag_bestmatch":"AY227530","obitag_match_count":2,"obitag_rank":"tribe","obitag_similarity_method":"lcs","seq_number":4,"taxid":"taxon:337730 [Marmotini]@tribe"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>seq0005 {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obiclean_status":{"26a_F040644":"s","29a_F260619":"h"},"obiclean_weight":{"26a_F040644":12,"29a_F260619":105},"obitag_bestid":0.9595959595959596,"obitag_bestmatch":"AC187326","obitag_match_count":1,"obitag_rank":"subspecies","obitag_similarity_method":"lcs","seq_number":5,"taxid":"taxon:9615 [Canis lupus familiaris]@subspecies"}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>seq0006 {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obiclean_status":{"15a_F730814":"h","29a_F260619":"h"},"obiclean_weight":{"15a_F730814":8822,"29a_F260619":5789},"obitag_bestid":1,"obitag_bestmatch":"AJ885202","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","seq_number":6,"taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>seq0007 {"count":319,"merged_sample":{"29a_F260619":319},"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":376},"obitag_bestid":1,"obitag_bestmatch":"AJ972683","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","seq_number":7,"taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>seq0008 {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obiclean_status":{"13a_F730603":"s","15a_F730814":"s","26a_F040644":"h","29a_F260619":"s"},"obiclean_weight":{"13a_F730603":17,"15a_F730814":5,"26a_F040644":468,"29a_F260619":1},"obitag_bestid":1,"obitag_bestmatch":"AB048590","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","seq_number":8,"taxid":"taxon:9611 [Canis]@genus"}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct

@@ -0,0 +1,9 @@
id,count,obitag_bestid,obitag_bestmatch,obitag_match_count,obitag_rank,obitag_similarity_method,seq_number,taxid,sequence
seq0001,10172,0.9797979797979798,AY227529,1,genus,lcs,1,taxon:9992 [Marmota]@genus,ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaacagcctgaaactcaaaggacttggcggtgctttacatccct
seq0002,260,0.9405940594059405,AF154263,9,infraorder,lcs,2,taxon:35500 [Pecora]@infraorder,ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaacagcttaaaactcaaaggacttggcggtgctttataccctt
seq0003,7146,1,AB245427,1,species,lcs,3,taxon:9860 [Cervus elaphus]@species,ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaatagcttaaaactcaaaggacttggcggtgctttataccctt
seq0004,87,0.9494949494949495,AY227530,2,tribe,lcs,4,taxon:337730 [Marmotini]@tribe,ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaatagcttaaaactcaaaggacttggcggtgctttatatccct
seq0005,95,0.9595959595959596,AC187326,1,subspecies,lcs,5,taxon:9615 [Canis lupus familiaris]@subspecies,ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaacagattaaacctcaaaggacttggcagtgctttatacccct
seq0006,12004,1,AJ885202,1,species,lcs,6,taxon:9858 [Capreolus capreolus]@species,ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaatagcttaaaactcaaaggacttggcggtgctttataccctt
seq0007,319,1,AJ972683,1,species,lcs,7,taxon:9858 [Capreolus capreolus]@species,ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaatagcttaaaactcaaaggacttggcggtgctttataccctt
seq0008,366,1,AB048590,1,genus,lcs,8,taxon:9611 [Canis]@genus,ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaatagcttaaaactcaaaggacttggcggtgctttatatccct

@@ -0,0 +1,6 @@
id,seq0001,seq0002,seq0003,seq0004,seq0005,seq0006,seq0007,seq0008
29a_F260619,0,337,0,0,105,5789,376,1
15a_F730814,0,0,0,0,0,8822,0,5
13a_F730603,0,0,8039,0,0,0,0,17
26a_F040644,12205,0,0,202,12,0,0,468

@@ -0,0 +1,24 @@
>HELIUM_000100422_612GNAAXX:7:118:3572:14633#0/1_sub[28..126] {"count":10172,"merged_sample":{"26a_F040644":10172},"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":12205},"obitag_bestid":0.9797979797979798,"obitag_bestmatch":"AY227529","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9992 [Marmota]@genus"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagagtactactagcaaca
gcctgaaactcaaaggacttggcggtgctttacatccct
>HELIUM_000100422_612GNAAXX:7:99:9351:13090#0/1_sub[28..127] {"count":260,"merged_sample":{"29a_F260619":260},"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":337},"obitag_bestid":0.9405940594059405,"obitag_bestmatch":"AF154263","obitag_match_count":9,"obitag_rank":"infraorder","obitag_similarity_method":"lcs","taxid":"taxon:35500 [Pecora]@infraorder"}
ttagccctaaacacaaataattacacaaacaaaattgttcaccagagtactagcggcaac
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:108:10111:9078#0/1_sub[28..127] {"count":7146,"merged_sample":{"13a_F730603":7146},"obiclean_status":{"13a_F730603":"h"},"obiclean_weight":{"13a_F730603":8039},"obitag_bestid":1,"obitag_bestmatch":"AB245427","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9860 [Cervus elaphus]@species"}
ctagccttaaacacaaatagttatgcaaacaaaactattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:38:14204:12725#0/1_sub[28..126] {"count":87,"merged_sample":{"26a_F040644":87},"obiclean_status":{"26a_F040644":"h"},"obiclean_weight":{"26a_F040644":202},"obitag_bestid":0.9494949494949495,"obitag_bestmatch":"AY227530","obitag_match_count":2,"obitag_rank":"tribe","obitag_similarity_method":"lcs","taxid":"taxon:337730 [Marmotini]@tribe"}
ttagccctaaacataaacattcaataaacaagaatgttcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct
>HELIUM_000100422_612GNAAXX:7:30:9942:4495#0/1_sub[28..126] {"count":95,"merged_sample":{"26a_F040644":11,"29a_F260619":84},"obiclean_status":{"26a_F040644":"s","29a_F260619":"h"},"obiclean_weight":{"26a_F040644":12,"29a_F260619":105},"obitag_bestid":0.9595959595959596,"obitag_bestmatch":"AC187326","obitag_match_count":1,"obitag_rank":"subspecies","obitag_similarity_method":"lcs","taxid":"taxon:9615 [Canis lupus familiaris]@subspecies"}
ttagccctaaacataagctattccataacaaaataattcgccagagaactactagcaaca
gattaaacctcaaaggacttggcagtgctttatacccct
>HELIUM_000100422_612GNAAXX:7:51:16702:19393#0/1_sub[28..127] {"count":12004,"merged_sample":{"15a_F730814":7465,"29a_F260619":4539},"obiclean_status":{"15a_F730814":"h","29a_F260619":"h"},"obiclean_weight":{"15a_F730814":8822,"29a_F260619":5789},"obitag_bestid":1,"obitag_bestmatch":"AJ885202","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattaatataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:84:14502:1617#0/1_sub[28..127] {"count":319,"merged_sample":{"29a_F260619":319},"obiclean_status":{"29a_F260619":"h"},"obiclean_weight":{"29a_F260619":376},"obitag_bestid":1,"obitag_bestmatch":"AJ972683","obitag_match_count":1,"obitag_rank":"species","obitag_similarity_method":"lcs","taxid":"taxon:9858 [Capreolus capreolus]@species"}
ttagccctaaacacaagtaattattataacaaaattattcgccagagtactaccggcaat
agcttaaaactcaaaggacttggcggtgctttataccctt
>HELIUM_000100422_612GNAAXX:7:50:10637:6527#0/1_sub[28..126] {"count":366,"merged_sample":{"13a_F730603":13,"15a_F730814":5,"26a_F040644":347,"29a_F260619":1},"obiclean_status":{"13a_F730603":"s","15a_F730814":"s","26a_F040644":"h","29a_F260619":"s"},"obiclean_weight":{"13a_F730603":17,"15a_F730814":5,"26a_F040644":468,"29a_F260619":1},"obitag_bestid":1,"obitag_bestmatch":"AB048590","obitag_match_count":1,"obitag_rank":"genus","obitag_similarity_method":"lcs","taxid":"taxon:9611 [Canis]@genus"}
ttagccctaaacatagataattttacaacaaaataattcgccagaggactactagcaata
gcttaaaactcaaaggacttggcggtgctttatatccct

Binary file not shown. Size: 17 KiB

@@ -0,0 +1,8 @@
@param,matching,strict
@param,primer_mismatches,2
@param,indels,false
experiment,sample,sample_tag,forward_primer,reverse_primer
wolf_diet,13a_F730603,aattaac,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,15a_F730814,gaagtag,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,26a_F040644,gaatatc,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,29a_F260619,gcctcct,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
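
The sample description file above mixes `@param` lines with an ordinary CSV table. As a rough sketch only (this is not how OBITools4 itself reads the file), the two parts could be separated in Python as follows; the function name `read_sample_sheet` is an assumption made for the illustration.

```python
import csv

SAMPLE_SHEET = """\
@param,matching,strict
@param,primer_mismatches,2
@param,indels,false
experiment,sample,sample_tag,forward_primer,reverse_primer
wolf_diet,13a_F730603,aattaac,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,15a_F730814,gaagtag,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,26a_F040644,gaatatc,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
wolf_diet,29a_F260619,gcctcct,TTAGATACCCCACTATGC,TAGAACAGGCTCCTCTAG
"""


def read_sample_sheet(text):
    """Return (params, rows): '@param' settings and the sample table."""
    params = {}
    table_lines = []
    for line in text.splitlines():
        if line.startswith("@param,"):
            # '@param,<name>,<value>' lines are kept as strings.
            _, name, value = line.split(",", 2)
            params[name] = value
        elif line.strip():
            table_lines.append(line)
    # The first remaining line is the header row of the table.
    rows = list(csv.DictReader(table_lines))
    return params, rows


params, rows = read_sample_sheet(SAMPLE_SHEET)
tags = [row["sample_tag"] for row in rows]
```

A quick check that all `sample_tag` values are distinct (`len(set(tags)) == len(tags)`) is a cheap way to catch copy-paste errors before demultiplexing.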

File diff suppressed because it is too large

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Cookbook on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/cookbook/</link>
<description>Recent content in Cookbook on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/cookbook/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

View File

@@ -0,0 +1,102 @@
SHELL := /bin/bash

FTPNCBI=ftp.ncbi.nlm.nih.gov
GBURL=https://$(FTPNCBI)/genbank
GBRELEASE_URL=$(GBURL)/GB_Release_Number
TAXOURL=https://$(FTPNCBI)/pub/taxonomy/taxdump.tar.gz

# Current Genbank release number, read from the NCBI server
GBRELEASE:=$(shell curl $(GBRELEASE_URL))

# Every division code found in the Genbank directory listing
GBDIV_ALL:=$(shell curl -L ${GBURL} \
           | grep -E 'gb.+\.seq\.gz' \
           | sed -E 's@^.*<a href="gb([^0-9]+)[0-9]+\.seq.gz.*$$@\1@' \
           | sort \
           | uniq)

# Divisions that are actually downloaded
GBDIV=bct inv mam phg pln pri rod vrl vrt
DIRECTORIES=fasta fasta_fgs

# Flat files belonging to the selected divisions
GBFILE_ALL:=$(shell curl -L ${GBURL} \
            | grep -E "gb($$(tr ' ' '|' <<< "${GBDIV}"))[0-9]+" \
            | sed -E 's@^<a href="(gb.+.seq.gz)">.*$$@\1@')

SUFFIXES += .d
NODEPS:=clean taxonomy
DEPFILES:=$(wildcard Release_$(GBRELEASE)/depends/*.d)

ifeq (0, $(words $(findstring $(MAKECMDGOALS), $(NODEPS))))
#Chances are, these files don't exist. GMake will create them and
#clean up automatically afterwards
-include $(DEPFILES)
endif

# The dependency file must exist before the download rules can be read,
# hence the recursive call once it has been generated.
all: depends directories FORCE
	@$(MAKE) downloads

downloads: taxonomy fasta_files
	@echo Genbank Release number $(GBRELEASE)
	@echo all divisions : $(GBDIV_ALL)

FORCE:

.PHONY: all downloads directories depends taxonomy fasta_files FORCE

depends: directories Release_$(GBRELEASE)/depends/gbfiles.d Makefile

division: $(GBDIV)

taxonomy: directories Release_$(GBRELEASE)/taxonomy

directories: Release_$(GBRELEASE)/fasta Release_$(GBRELEASE)/stamp Release_$(GBRELEASE)/tmp

Release_$(GBRELEASE):
	@mkdir -p $@
	@echo Create $@ directory

Release_$(GBRELEASE)/fasta: Release_$(GBRELEASE)
	@mkdir -p $@
	@echo Create $@ directory

Release_$(GBRELEASE)/stamp: Release_$(GBRELEASE)
	@mkdir -p $@
	@echo Create $@ directory

Release_$(GBRELEASE)/tmp: Release_$(GBRELEASE)
	@mkdir -p $@
	@echo Create $@ directory

# Generate one download rule and one conversion rule per Genbank file
Release_$(GBRELEASE)/depends/gbfiles.d: Makefile
	@echo Create depends directory
	@mkdir -p Release_$(GBRELEASE)/depends
	@for f in ${GBFILE_ALL} ; do \
		echo -e "Release_$(GBRELEASE)/stamp/$$f.stamp:" ; \
		echo -e "\t@echo Downloading file : $$f..." ; \
		echo -e "\t@mkdir -p Release_$(GBRELEASE)/tmp" ; \
		echo -e "\t@mkdir -p Release_$(GBRELEASE)/stamp" ; \
		echo -e "\t@curl -L ${GBURL}/$$f > Release_$(GBRELEASE)/tmp/$$f && touch \$$@" ; \
		echo ; \
		div=$$(sed -E 's@^gb(...).*$$@\1@' <<< $$f) ; \
		fasta="Release_$(GBRELEASE)/fasta/$$div/$${f/.seq.gz/.fasta.gz}" ; \
		fasta_fgs="Release_$(GBRELEASE)/fasta_fgs/$$div/$${f/.seq.gz/.fasta.gz}" ; \
		fasta_files="$$fasta_files $$fasta" ; \
		fasta_fgs_files="$$fasta_fgs_files $$fasta_fgs" ; \
		echo -e "$$fasta: Release_$(GBRELEASE)/stamp/$$f.stamp" ; \
		echo -e "\t@echo converting file : \$$< in fasta" ; \
		echo -e "\t@mkdir -p Release_$(GBRELEASE)/fasta/$$div" ; \
		echo -e "\t@obiconvert -Z --fasta-output --skip-empty \\" ; \
		echo -e "\t Release_$(GBRELEASE)/tmp/$$f > Release_$(GBRELEASE)/tmp/$${f/.seq.gz/.fasta.gz} \\" ; \
		echo -e "\t && mv Release_$(GBRELEASE)/tmp/$${f/.seq.gz/.fasta.gz} \$$@ \\" ; \
		echo -e "\t && rm -f Release_$(GBRELEASE)/tmp/$$f \\" ; \
		echo -e "\t || rm -f \$$@" ; \
		echo -e "\t@echo conversion of \$$@ done." ; \
		echo ; \
	done > $@ ; \
	echo >> $@ ; \
	echo "fasta_files: $$fasta_files" >> $@ ;

# No -i here: response headers in the output would corrupt the tar stream
Release_$(GBRELEASE)/taxonomy:
	mkdir -p $@
	curl -L $(TAXOURL) \
	| tar -C $@ -zxf -
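
The rule generator in the Makefile above derives, for each Genbank flat file, a division code and the path of the converted FASTA file. The same mapping can be sketched in Python to make the naming scheme easier to follow. The function name `fasta_target`, the release number `000`, and the file name `gbbct1.seq.gz` are illustrative assumptions, not values taken from the repository.

```python
def fasta_target(release, gbfile):
    """Mirror the Makefile's shell logic for one Genbank file name.

    The division is the three characters following the 'gb' prefix
    (the sed expression 's@^gb(...).*$@\\1@'), and the output name
    replaces the first '.seq.gz' with '.fasta.gz'.
    """
    division = gbfile[2:5]
    fasta_name = gbfile.replace(".seq.gz", ".fasta.gz", 1)
    return f"Release_{release}/fasta/{division}/{fasta_name}"


# Hypothetical release number and file name, for illustration only.
example = fasta_target("000", "gbbct1.seq.gz")
```

With these inputs, `example` is `Release_000/fasta/bct/gbbct1.fasta.gz`, which matches the shape of the targets written to `gbfiles.d`.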

File diff suppressed because it is too large

View File

@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Prepare a local copy of Genbank on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/cookbook/local_genbank/</link>
<description>Recent content in Prepare a local copy of Genbank on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<atom:link href="http://metabar:8888/obidoc/docs/cookbook/local_genbank/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

File diff suppressed because it is too large

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Oxford Nanopore data analysis on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/cookbook/minion/</link>
<description>Recent content in Oxford Nanopore data analysis on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/cookbook/minion/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

File diff suppressed because it is too large

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Build a reference database on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/cookbook/reference_db/</link>
<description>Recent content in Build a reference database on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<lastBuildDate></lastBuildDate>
<atom:link href="http://metabar:8888/obidoc/docs/cookbook/reference_db/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

File diff suppressed because it is too large

View File

@@ -0,0 +1,11 @@
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>File formats on OBITools4 documentation</title>
<link>http://metabar:8888/obidoc/docs/file_format/</link>
<description>Recent content in File formats on OBITools4 documentation</description>
<generator>Hugo</generator>
<language>en-us</language>
<atom:link href="http://metabar:8888/obidoc/docs/file_format/index.xml" rel="self" type="application/rss+xml" />
</channel>
</rss>

Some files were not shown because too many files have changed in this diff