refactor: aggregate query results at sequence level
Refactor the query pipeline to buffer partition outputs into a per-sequence `seq_results` vector, deferring final accumulation until all partitions complete. This ensures global position ordering before computing k-mer presence, counts, and coverage statistics. Additionally, removes a resolved TODO and documents a known BLAST false-positive issue where chloroplast and bacterial contaminants yield unrealistic high-confidence matches.
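The buffering described above can be pictured with a minimal Python sketch. It is not the project's code: the function name `aggregate_sequence`, the `(position, label)` hit tuples, and the output keys (borrowed from the read annotation shown in the notes) are assumptions made for illustration. Only the name `seq_results` comes from the commit message.

```python
from collections import defaultdict

def aggregate_sequence(partition_outputs, kmer_count):
    """Combine the hits of every partition for ONE sequence.

    partition_outputs: iterable of lists of (position, label) tuples,
    one list per index partition (hypothetical shape).
    kmer_count: number of k-mers in the sequence.
    """
    # 1. Buffer all partition outputs; nothing is accumulated yet.
    seq_results = []
    for hits in partition_outputs:
        seq_results.extend(hits)

    # 2. Restore global position order once every partition has reported.
    #    (In this simplified sketch the statistics below do not depend on
    #    the order; the sort only mirrors the step the commit describes.)
    seq_results.sort(key=lambda hit: hit[0])

    # 3. Only now compute presence, counts and coverage.
    coverage = defaultdict(lambda: [0] * kmer_count)
    matches = defaultdict(int)
    for position, label in seq_results:
        if coverage[label][position] == 0:  # count each position once
            coverage[label][position] = 1
            matches[label] += 1

    return {
        "kmer_count": kmer_count,
        "kmer_strict_matches": dict(matches),
        "coverage": dict(coverage),
    }

# Two partitions reporting hits out of order for the same sequence.
result = aggregate_sequence([[(2, "gbbct")], [(0, "gbbct"), (1, "gbbct")]], 3)
```

Deferring step 3 until all partitions are buffered means the per-sequence statistics are computed once, from a complete and ordered view of the hits.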
@@ -24,3 +24,37 @@ Except that with an approximate index, the results will be approximate.
--detail and --mismatch: still to be implemented

- status: displays the status of the index

## Biological problem with contaminant identification

Example of problematic reads:
```
>LH00534:161:22WMGWLT4:4:1101:45301:1420 {"coverage":{"gbbct":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]},"kmer_count":117,"kmer_strict_matches":{"gbbct":117}}
GCCCCTACCGTACTCCAGCTTGGTAGTTTCCACCGCCTGTCCAGGGTTGAGCCCTGGGATTTGACGGCGGACTTAAAAAGCCACCTACAGACGCTTTACGCCCAATCATTCCGGATAACGCTTGCATCCTCTGTATTACCGCGGCTGCTGG
```
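
To inspect such reads in bulk, the JSON annotation carried in the header can be parsed and summarised. The following is a minimal Python sketch, assuming the header always has the shape `>read_id {json}` seen above. The helper names are hypothetical, and reading `kmer_strict_matches / kmer_count` as the fraction of matched k-mers is an interpretation of the field names, not something the notes state.

```python
import json

def parse_annotated_header(header):
    """Split '>read_id {json}' into the read id and its annotation dict."""
    read_id, _, annotation = header.lstrip(">").partition(" ")
    parsed = json.loads(annotation) if annotation else {}
    return read_id, parsed

def matched_fraction(annotation, label):
    """Fraction of the read's k-mers reported as strict matches for `label`."""
    total = annotation.get("kmer_count", 0)
    if total == 0:
        return 0.0
    return annotation.get("kmer_strict_matches", {}).get(label, 0) / total

# Shortened, made-up header with the same fields as the read above.
header = '>read1 {"coverage":{"gbbct":[1,1,0]},"kmer_count":3,"kmer_strict_matches":{"gbbct":2}}'
read_id, annotation = parse_annotated_header(header)
fraction = matched_fraction(annotation, "gbbct")
```

For the read shown above the ratio is 117/117, meaning every k-mer of the read is reported as a strict match for `gbbct`.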

With BLAST, this read matches an implausible number of chloroplast genomes at 100% identity (6554 hits for Streptophyta), but also a substantial number of bacterial OTU sequences (uncultured bacteria, 115 hits), likewise at 100% similarity.

```
Uncultured bacterium clone Otu01032 16S ribosomal RNA gene, partial sequence
Sequence ID: KX996137.1  Length: 440  Number of Matches: 1
Range 1: 153 to 303

Alignment statistics for match #1
Score          Expect  Identities     Gaps       Strand
273 bits(302)  2e-69   151/151(100%)  0/151(0%)  Plus/Minus

Query  1    GCCCCTACCGTACTCCAGCTTGGTAGTTTCCACCGCCTGTCCAGGGTTGAGCCCTGGGAT  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  303  GCCCCTACCGTACTCCAGCTTGGTAGTTTCCACCGCCTGTCCAGGGTTGAGCCCTGGGAT  244

Query  61   TTGACGGCGGACTTAAAAAGCCACCTACAGACGCTTTACGCCCAATCATTCCGGATAACG  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  243  TTGACGGCGGACTTAAAAAGCCACCTACAGACGCTTTACGCCCAATCATTCCGGATAACG  184

Query  121  CTTGCATCCTCTGTATTACCGCGGCTGCTGG  151
            |||||||||||||||||||||||||||||||
Sbjct  183  CTTGCATCCTCTGTATTACCGCGGCTGCTGG  153
```