Dear Ole,
I am using GNU parallel to split a huge BAM file (>100 GB) and pipe each block to a Perl script, which then produces two output files per chunk. The aim is to process a huge file in parallel by splitting it and then combining the outputs.
However, I often get the following error when I run the job on the Harvard computing cluster with the SLURM scheduler:
parallel: This should not happen. You have found a bug.
Include this in the report:
* The version number: 20180622
* The bugid: swap_activity_file-r
* The command line being run
* The files being read (put the files on a webserver if they are big)
If you get the error on smaller/fewer files, please include those instead.
The command I use is:
samtools view $input | time parallel --resume-failed --no-run-if-empty --memfree 10G --joblog parseGNU.log --tmpdir $tempdir --noswap --pipe --block 1995M perl parse.pl -bam - -outprefix part_{#} -wd "$workdir" -clip "$alignment_score"
(Here -bam, -outprefix, -wd, and -clip are the inputs to my Perl script parse.pl.)
The bug is not reproducible: sometimes a file runs completely fine, but other times I get the error. Also, the bug is only reported for larger files; smaller files (<10 GB) work absolutely fine. I tried increasing --memfree to 20G, but the bug is still reported.
Do you know what is going wrong here? Also, would you recommend splitting a BAM file the way I did and running the job in parallel using --pipe?
Thank you very much in advance for your input.
Best,
Dhawal
--
Dhawal Jain.
Department of Biomedical Informatics