
Re: How to debug `parallel` crash?


From: Rob Sargent
Subject: Re: How to debug `parallel` crash?
Date: Sat, 9 Jul 2022 20:18:28 -0600

You’ll need to show a log of some kind for anyone to really nail this down.

rjs
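
One way to get such a log is GNU Parallel's `--joblog` option, which writes one tab-separated line per job, including its exit value and any signal that killed it. The sketch below is only an illustration: the file name `run.joblog` and the helper `failed_jobs` are hypothetical names, not anything used in this thread.

```shell
#!/usr/bin/env bash
# Record one line per job while running, for example:
#   parallel --joblog run.joblog --jobs 20 -a "$job_list_name"
# (run.joblog is a hypothetical file name.)

# Print Seq, Exitval, Signal and Command for every job in a GNU Parallel
# joblog that exited non-zero or was terminated by a signal.
# Joblog columns are tab-separated:
#   Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
failed_jobs() {
  awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $1 "\t" $7 "\t" $8 "\t" $9 }' "$1"
}
```

Running `failed_jobs run.joblog` after a run lists only the jobs worth looking at. If the whole machine goes down, the joblog written so far should still show which jobs had finished by then.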

On Jul 9, 2022, at 7:12 PM, Rob Sargent <robjsargent@gmail.com> wrote:


 

On Jul 9, 2022, at 4:55 PM, Nagle, Michael F <michael.nagle@oregonstate.edu> wrote:


Thanks for your response, Rob. I will do my best to answer your questions. Please let me know if anything is unclear and more info would help. I appreciate your attention to this!

This is a rather powerful Dell workstation running Ubuntu 22.04 LTS, with a 12-core Intel processor and 503GB RAM.

I'm running as a user with admin privileges, but am not using sudo, so as I understand these should not be root processes.

In short, we're running some custom Python code to analyze ~1.3GB hyperspectral images, do some linear algebra and output some plots and arrays describing the biochemical composition in these images. This is benchmarked to take 2-4GB of RAM per image. There is one image per job. By default, parallel is running 24 jobs, dual-threading on each of 12 cores... There should be plenty of RAM to run 24 4GB jobs at once. Since this is an embarrassingly parallel computation and we already use bash scripting in this workflow, I prefer to keep it simple and use GNU Parallel rather than Python parallel frameworks... it always worked great in the past.
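
As a rough check of that estimate, the worst case can be worked out from the figures above alone. This is just arithmetic on the quoted numbers (24 jobs, 4GB per job at the top of the benchmark, 503GB installed); the variable names are illustrative, and it says nothing about what the jobs actually use at peak.

```shell
#!/usr/bin/env bash
# Worst-case memory budget, using only the figures quoted above.
njobs=24           # default number of simultaneous jobs reported
per_job_gb=4       # upper end of the 2-4GB per-image benchmark
total_ram_gb=503   # installed RAM

peak_gb=$(( njobs * per_job_gb ))
headroom_gb=$(( total_ram_gb - peak_gb ))
echo "peak=${peak_gb}GB headroom=${headroom_gb}GB"
# prints: peak=96GB headroom=407GB
```

If the benchmark holds, memory alone should leave a wide margin, so a crash would point either to jobs exceeding the benchmark or to something other than RAM.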

Here is the script I'm calling from the command line, inside the jobs file described further below: gmodetector_py/analyze_sample.py at master · naglemi/gmodetector_py (github.com)

# This is what we run to execute the .jobs file
parallel -a $job_list_name

# I have also tried limiting the number of jobs to 20, which also leads to the same crashing problem after a few runs.
parallel --jobs 20 -a $job_list_name

# Here is how we prepare the .jobs file. We produce one job per image, each given its own line in a text file, with options set by a bunch of variables in a Jupyter notebook. Note, I have also confirmed it still crashes if we run outside of Jupyter.
for file in $data/*.hdr
do
 if [[ "$file" != *'hroma'* ]] && [[ "$file" != *'roadband'* ]]; then
  echo "python wrappers/analyze_sample.py \
--file_path $file \
--fluorophores ${fluorophores[*]} \
--min_desired_wavelength ${desired_wavelength_range[0]} \
--max_desired_wavelength ${desired_wavelength_range[1]} \
--red_channel ${FalseColor_channels[0]} \
--green_channel ${FalseColor_channels[1]} \
--blue_channel ${FalseColor_channels[2]} \
--red_cap ${FalseColor_caps[0]} \
--green_cap ${FalseColor_caps[1]} \
--blue_cap ${FalseColor_caps[2]} \
--plot 1 \
--spectral_library_path "$spectral_library_path" \
--output_dir $output_directory_full \
--threshold 38" >> $job_list_name
 fi
done

Thanks again!

Well, I’m shocked you’ve managed to crash the machine. Not at my desk just now, but I think you’ll need sudo to look for ‘panic’ reports in the syslog.

Have you kept an eye on physical memory?  dmesg?
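
A minimal sketch of that kind of check, assuming a systemd-based Ubuntu install: the helper name `scan_kernel_log` and its pattern list are illustrative guesses at what a panic or an out-of-memory kill tends to leave behind, not an exhaustive filter.

```shell
#!/usr/bin/env bash
# Filter kernel log text on stdin for lines hinting at a kernel panic or
# at the OOM killer. The pattern list is an assumption, not exhaustive.
scan_kernel_log() {
  grep -iE 'panic|out of memory|oom-killer|killed process'
}

# Typical ways to feed it (these may need sudo):
#   sudo dmesg -T | scan_kernel_log              # current boot
#   sudo journalctl -k -b -1 | scan_kernel_log   # previous boot, if the journal is persistent
# And for a snapshot of physical memory while jobs are running:
#   free -h
```

After a hard crash the current boot's `dmesg` starts fresh, so the previous-boot journal (or the older syslog files) is the more likely place to find anything.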
