Re: Performance problem running multiple copies of Octave on a multicore processor
From: Mike Miller
Subject: Re: Performance problem running multiple copies of Octave on a multicore processor
Date: Thu, 10 Dec 2015 13:11:43 -0500
User-agent: Mutt/1.5.24 (2015-08-30)

On Thu, Dec 10, 2015 at 13:46:53 +0000, Ian McCallion wrote:
> Hi Olaf,
>
> Here is the code. ldd is unix-only so haven't done the commands you
> suggested. However I will send you the win equivalent shortly.
>
> The cmd script is how I switch blas versions.
>
> One important thing I note (which I thought was a testing error and
> ignored earlier), is that the only combination that runs fast is
> Octave 3.8.0 and the blas shipped with it.
>
> I will experiment with different versions of lapack here. Please take
> a quick look at the code to ensure there are no silly errors. You will
> need source changes for your environment to run it.

For comparison, here is what I get calling your functions with Octave
3.8.2 and 4.0.0 in my Debian environment (using OpenBLAS). I also upped
the count a little to something meaningful for my system.

Without OMP_NUM_THREADS:

  >> perf1 (8, 1000, 12, 500, 5000)
  4.0.0, , rand(1000,12)*rand(12,500) 5000 times took 14.52 seconds
  3.8.2, , rand(1000,12)*rand(12,500) 5000 times took 19.39 seconds
  Running 8 processes
  >> perf1 (1, 1000, 12, 500, 8*5000)
  4.0.0, , rand(1000,12)*rand(12,500) 40000 times took 13.92 seconds
  3.8.2, , rand(1000,12)*rand(12,500) 40000 times took 13.98 seconds
  Running 1 processes

With OMP_NUM_THREADS set to 1 (disabling multi-processing within
OpenBLAS):

  >> perf1 (8, 1000, 12, 500, 5000)
  4.0.0, 1.0.0, rand(1000,12)*rand(12,500) 5000 times took 10.16 seconds
  3.8.2, 1.0.0, rand(1000,12)*rand(12,500) 5000 times took 13.02 seconds
  Running 8 processes
  >> perf1 (1, 1000, 12, 500, 8*5000)
  4.0.0, 1.0.0, rand(1000,12)*rand(12,500) 40000 times took 29.05 seconds
  3.8.2, 1.0.0, rand(1000,12)*rand(12,500) 40000 times took 29.23 seconds
  Running 1 processes

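So the observations I make:
• There is no significant difference between 3.8.2 and 4.0.0; 4.0.0
may be slightly faster when running multiple jobs (14.52 vs. 19.39
seconds, and 10.16 vs. 13.02 seconds with OMP_NUM_THREADS=1).
• Make sure to take into account the interaction between running
parallel Octave jobs and the use of OpenMP within OpenBLAS. Here,
setting OMP_NUM_THREADS to 1 sped up the 8-process run (10.16 vs.
14.52 seconds on 4.0.0) but roughly doubled the time for the single
process (29.05 vs. 13.92 seconds).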
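
To make the second point concrete, here is a minimal sketch of one way to start several worker processes with the BLAS thread count pinned through the environment. It is written in Python purely for illustration and is not the perf1 code from this thread; the name launch_jobs, its parameters, and the stand-in worker command are all assumptions. A real run would replace the worker with an Octave invocation.

```python
import os
import subprocess
import sys


def launch_jobs(command, n_jobs, blas_threads=1):
    """Start n_jobs copies of command, each with OMP_NUM_THREADS pinned.

    Hypothetical helper: the idea is that each job gets a fixed number of
    OpenMP threads, so n_jobs * blas_threads can be matched to the cores.
    """
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(blas_threads)
    # Start all jobs first so they run concurrently.
    procs = [
        subprocess.Popen(command, env=env, stdout=subprocess.PIPE, text=True)
        for _ in range(n_jobs)
    ]
    # communicate() waits for the process to exit and returns (stdout, stderr).
    return [p.communicate()[0] for p in procs]


if __name__ == "__main__":
    # Stand-in worker that only reports the thread limit it inherited.
    # In practice this would be an Octave command line (assumption).
    worker = [sys.executable, "-c",
              "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(launch_jobs(worker, n_jobs=4, blas_threads=1))
```

With 8 jobs on an 8-core machine, blas_threads=1 would correspond to the "OMP_NUM_THREADS set to 1" runs above; for a single job, leaving the variable unset lets OpenBLAS use its own threads.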
Are you using the 4.0.0 official binary? Which 3.8.2 binary are you
using?
--
mike