NextPolish

NextPolish fixes base errors (SNVs and indels) in genomes assembled from noisy long reads. It can be used with short-read data only, long-read data only, or a combination of both. It contains two core modules and uses a stepwise approach to correct erroneous bases in the reference genome. To correct or assemble raw third-generation sequencing (TGS) long reads with approximately 10-15% sequencing errors, please use NextDenovo.

Installation

DOWNLOAD

Click here or use the following command:
```
wget https://github.com/Nextomics/NextPolish/releases/latest/download/NextPolish.tgz
```
Note

If you get an error like `version 'GLIBC_2.14' not found` or `liblzma.so.0: cannot open shared object file`, please download this version.
REQUIREMENT
- Python (versions 2 and 3 are supported):
  - Paralleltask

INSTALL

```
pip install paralleltask
tar -vxzf NextPolish.tgz && cd NextPolish && make
```

UNINSTALL
```
cd NextPolish && make clean
```
TEST
```
nextPolish test_data/run.cfg
```

Quick Start

Prepare sgs_fofn

```
ls reads1_R1.fq reads1_R2.fq reads2_R1.fq reads2_R2.fq > sgs.fofn
```
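Because the fofn is simply a list of paths, a typo in a file name may only surface later in the run. The following is a minimal sketch of a pre-flight check, not part of NextPolish: `check_fofn` is a hypothetical helper name, and it assumes one path per line, resolved relative to the current directory.

```shell
# Hypothetical helper (not shipped with NextPolish): verify that every
# path listed in a fofn exists and is non-empty.
# Usage: check_fofn sgs.fofn
check_fofn() (
    # Fail early if the fofn itself cannot be read.
    [ -r "$1" ] || { echo "cannot read fofn: $1" >&2; exit 1; }
    missing=0
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue            # skip blank lines
        if [ ! -s "$path" ]; then             # -s: exists and size > 0
            echo "missing or empty: $path" >&2
            missing=1
        fi
    done < "$1"
    # Exit status 0 only if every listed file was found.
    [ "$missing" -eq 0 ]
)
```

The function body runs in a subshell, so its variables do not leak into the calling shell; a call such as `check_fofn sgs.fofn` returns a non-zero status if any listed file is absent.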

Create run.cfg

```
genome=input.genome.fa
echo -e "task = best\ngenome = $genome\nsgs_fofn = sgs.fofn" > run.cfg
```

Run
```
nextPolish run.cfg
```
Final polished genome
- Sequence: /path_to_work_directory/genome.nextpolish.fasta
- Statistics: /path_to_work_directory/genome.nextpolish.fasta.stat
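As a quick sanity check on the result, it can help to compare the number of sequences and total bases before and after polishing. The sketch below is an assumption-laden convenience, not a NextPolish feature: `fasta_summary` is a hypothetical helper that reads a plain (uncompressed) FASTA file, and it does not parse the `.stat` file, whose format is not described here.

```shell
# Hypothetical helper: count sequences and bases in an uncompressed FASTA.
# Usage: fasta_summary input.genome.fa
#        fasta_summary /path_to_work_directory/genome.nextpolish.fasta
fasta_summary() {
    awk '
        /^>/ { nseq++; next }                       # header line: one more sequence
        { gsub(/[ \t\r]/, ""); nbases += length($0) }  # sequence line: count bases
        END { printf "sequences=%d bases=%.0f\n", nseq, nbases }
    ' "$1"
}
```

The sequence count should normally be unchanged by polishing, while the base count may shift slightly as indels are corrected.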

Tip

You can also use your own alignment pipeline and use NextPolish only to polish the genome, which will be faster than the default pipeline when running on a local system. The accuracy of the polished genome is the same as with the default pipeline. The following example uses bwa for alignment.

```
# Set input and parameters
round=2
threads=20
read1=reads_R1.fastq.gz
read2=reads_R2.fastq.gz
input=input.genome.fa
for ((i=1; i<=${round}; i++)); do
    # step 1:
    # index the genome file and do alignment
    bwa index ${input};
    bwa mem -t ${threads} ${input} ${read1} ${read2}|samtools view --threads 3 -F 0x4 -b -|samtools fixmate -m --threads 3  - -|samtools sort -m 2g --threads 5 -|samtools markdup --threads 5 -r - sgs.sort.bam
    # index bam and genome files
    samtools index -@ ${threads} sgs.sort.bam;
    samtools faidx ${input};
    # polish genome file
    python NextPolish/lib/nextpolish1.py -g ${input} -t 1 -p ${threads} -s sgs.sort.bam > genome.polishtemp.fa;
    input=genome.polishtemp.fa;
    # step 2:
    # index genome file and do alignment
    bwa index ${input};
    bwa mem -t ${threads} ${input} ${read1} ${read2}|samtools view --threads 3 -F 0x4 -b -|samtools fixmate -m --threads 3  - -|samtools sort -m 2g --threads 5 -|samtools markdup --threads 5 -r - sgs.sort.bam
    # index bam and genome files
    samtools index -@ ${threads} sgs.sort.bam;
    samtools faidx ${input};
    # polish genome file
    python NextPolish/lib/nextpolish1.py -g ${input} -t 2 -p ${threads} -s sgs.sort.bam > genome.nextpolish.fa;
    input=genome.nextpolish.fa;
done;
# Final polished genome file: genome.nextpolish.fa
```
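The script above relies on `bwa`, `samtools`, and `python` being available on `PATH`, and a missing tool would otherwise only show up partway through a round. A minimal sketch of a check to run first is shown here; `require_tools` is a hypothetical helper name, not something provided by NextPolish.

```shell
# Hypothetical helper: confirm that each named command is on PATH.
# Usage: require_tools bwa samtools python
require_tools() (
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    # Exit status 0 only if every tool was found.
    [ "$missing" -eq 0 ]
)
```

Calling `require_tools bwa samtools python || exit 1` at the top of the script would stop it before any indexing or alignment starts.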

Note

It is recommended to polish the raw genome with long reads (set task to start with “5” and provide lgs_fofn, or use racon) before polishing with short reads. This avoids incorrect mapping of short reads in regions with a high error rate, especially for assemblies generated without a consensus step, such as those from miniasm.

Getting Help

HELP

Feel free to raise an issue on the issue page. Issues are also helpful to other users.
CONTACT

For additional help, please send an email to huj_at_grandomics_dot_com.

Copyright

NextPolish is freely available for academic use and other non-commercial use.

Cite

Hu, Jiang, et al. “NextPolish: a fast and efficient genome polishing tool for long read assembly.” Bioinformatics (Oxford, England) (2019).

Limitations

NextPolish is designed for genomes assembled from long reads, so it assumes an input genome without gaps (N bases). If your input contains gaps, please split the assembly at the gaps and link the pieces back together after polishing. We usually scaffold a genome using BioNano or Hi-C data after the polishing step.
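One way to split an assembly at its gaps is sketched below with `awk`. This is an illustrative assumption rather than a recommended or official procedure: `split_at_gaps` and the `_partN` naming scheme are hypothetical, each sequence is held in memory while it is processed, and gap positions and lengths are not recorded, so you would need to keep that information separately in order to link the pieces back after polishing.

```shell
# Hypothetical helper: cut every sequence at runs of N/n and write each
# gap-free piece as NAME_part1, NAME_part2, ... (one line per piece).
# Usage: split_at_gaps scaffolds.fa > contigs.fa
split_at_gaps() {
    awk '
        # Print the gap-free pieces of the sequence collected so far.
        function emit(   n, i, parts, k) {
            if (name == "") return
            n = split(seq, parts, /[Nn]+/)
            k = 0
            for (i = 1; i <= n; i++) {
                if (parts[i] == "") continue   # leading or trailing gap
                k++
                print ">" name "_part" k
                print parts[i]
            }
        }
        /^>/ { emit(); name = substr($1, 2); seq = ""; next }  # new record
        { seq = seq $0 }                                       # join wrapped lines
        END { emit() }
    ' "$1"
}
```

Only the first word of each header is kept as the sequence name, so any description after the first space is dropped from the output.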

Star

You can track updates by clicking the Star button in the upper-right corner of the GitHub page.