[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnuastro-commits] master 3d72b24: NoiseChisel: negative outlier tiles a
From: |
Mohammad Akhlaghi |
Subject: |
[gnuastro-commits] master 3d72b24: NoiseChisel: negative outlier tiles are also being removed |
Date: |
Wed, 22 Jul 2020 21:05:53 -0400 (EDT) |
branch: master
commit 3d72b24d0694919457c6d0adc03b0c95c3f03edf
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>
NoiseChisel: negative outlier tiles are also being removed
Until now, when removing outlier tiles, NoiseChisel would only remove
positive outliers. In standard/basic datasets this is fine (skewness caused
by data is always positive). But in some modified datasets (for example
when models have been subtracted), it is possible to have negative outliers
which can cause problems.
With this commit, the core library function doing this (which is now named
'gal_statistics_outlier_bydistance') will also apply its algorithm in the
negative direction and tiles with both positive and negative outliers are
found and discarded.
This bug was found by Joanna Sakowska.
---
NEWS | 5 ++
THANKS | 3 +-
doc/announce-acknowledge.txt | 1 +
doc/gnuastro.texi | 70 +++++++++-------------
lib/gnuastro/statistics.h | 7 ++-
lib/statistics.c | 18 +++---
lib/tile-internal.c | 139 +++++++++++++++++++++++++++++++++----------
7 files changed, 159 insertions(+), 84 deletions(-)
diff --git a/NEWS b/NEWS
index 832c81a..0a89eba 100644
--- a/NEWS
+++ b/NEWS
@@ -75,12 +75,17 @@ See the end of the file for license conditions.
- gal_type_string_to_number: Numbers ending in '.' or '.0' will be
parsed as floating point. Until now, it would only parse numbers as
floating point if they had non-zero decimals.
+ - gal_statistics_outlier_bydistance: new name for the old
+ 'gal_statistics_outlier_positive'. It can now use the same algorithm
+ for negative outliers with a new argument.
** Bugs fixed
bug #58434: MakeCatalog crash when ordering is required and no usable pixels.
bug #58455: Timezone is not portable and uses flag instead of seconds.
bug #58696: Warp with --centeroncorner --scale making wrong size.
bug #58774: Warp' s output on a cube is a 2D image or wrong size.
+ bug #58809: NoiseChisel not removing negative outlier tiles.
+
diff --git a/THANKS b/THANKS
index 4041a45..5cdbccf 100644
--- a/THANKS
+++ b/THANKS
@@ -70,6 +70,7 @@ support in Gnuastro. The list is ordered alphabetically (by
family name).
Bob Proulx bob@proulx.com
Joseph Putko josephputko@gmail.com
Teymoor Saifollahi teymur.saif@gmail.com
+ Joanna Sakowska js01093@surrey.ac.uk
Elham Saremi saremi@ipm.ir
Yahya Sefidbakht y.sefidbakht@gmail.com
Alejandro Serrano Borlaff asborlaff@ucm.es
@@ -122,4 +123,4 @@ Copyright (C) 2015-2020, Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.3 or any later
version published by the Free Software Foundation; with no Invariant
-Sections, with no Front-Cover Texts, and with no Back-Cover Texts.
\ No newline at end of file
+Sections, with no Front-Cover Texts, and with no Back-Cover Texts.
diff --git a/doc/announce-acknowledge.txt b/doc/announce-acknowledge.txt
index 489ac99..c290d42 100644
--- a/doc/announce-acknowledge.txt
+++ b/doc/announce-acknowledge.txt
@@ -5,6 +5,7 @@ Carlos Allende Prieto
Leindert Boogaard
Alexey Dokuchaev
Raúl Infante Sainz
+Joanna Sakowska
Zahra Sharbaf
Ole Streicher
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index ae0dd67..10d4d3b 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -24288,52 +24288,38 @@ above.
@end deftypefun
-@deftypefun {gal_data_t *} gal_statistics_outlier_positive (gal_data_t
@code{*input}, size_t @code{window_size}, float @code{sigma}, float
@code{sigclip_multip}, float @code{sigclip_param}, int @code{inplace}, int
@code{quiet})
-Find the first positive outlier in the @code{input} distribution. The
-returned dataset contains a single element: the first positive outlier. It
-is one of the dataset's elements, in the same type as the input. If the
-process fails for any reason (for example no outlier was found), a
-@code{NULL} pointer will be returned.
-
-All (possibly existing) blank elements are first removed from the input
-dataset, then it is sorted. A sliding window of @code{window_size} elements
-is parsed over the dataset. Starting from the @code{window_size}-th element
-of the dataset, in the direction of increasing values. This window is used
-as a reference. The first element where the distance to the previous
-(sorted) element is @code{sigma} units away from the distribution of
-distances in its window is considered an outlier and returned by this
-function.
+@deftypefun {gal_data_t *} gal_statistics_outlier_bydistance (int
@code{pos1_neg0}, gal_data_t @code{*input}, size_t @code{window_size}, float
@code{sigma}, float @code{sigclip_multip}, float @code{sigclip_param}, int
@code{inplace}, int @code{quiet})
+
+Find the first positive outlier (if @code{pos1_neg0!=0}) in the @code{input}
distribution.
+When @code{pos1_neg0==0}, the same algorithm goes to the start of the dataset.
+The returned dataset contains a single element: the first positive outlier.
+It is one of the dataset's elements, in the same type as the input.
+If the process fails for any reason (for example no outlier was found), a
@code{NULL} pointer will be returned.
+
+All (possibly existing) blank elements are first removed from the input
dataset, then it is sorted.
+A sliding window of @code{window_size} elements is parsed over the dataset.
+Starting from the @code{window_size}-th element of the dataset, in the
direction of increasing values.
+This window is used as a reference.
+The first element where the distance to the previous (sorted) element is
@code{sigma} units away from the distribution of distances in its window is
considered an outlier and returned by this function.
-Formally, if we assume there are @mymath{N} non-blank elements. They are
-first sorted. Searching for the outlier starts on element @mymath{W}. Let's
-take @mymath{v_i} to be the @mymath{i}-th element of the sorted input (with
-no blank values) and @mymath{m} and @mymath{\sigma} as the
-@mymath{\sigma}-clipped median and standard deviation from the distances of
-the previous @mymath{W} elements (not including @mymath{v_i}). If the value
-given to @code{sigma} is displayed with @mymath{s}, the @mymath{i}-th
-element is considered as an outlier when the condition below is true.
+Formally, if we assume there are @mymath{N} non-blank elements.
+They are first sorted.
+Searching for the outlier starts on element @mymath{W}.
+Let's take @mymath{v_i} to be the @mymath{i}-th element of the sorted input
(with no blank values) and @mymath{m} and @mymath{\sigma} as the
@mymath{\sigma}-clipped median and standard deviation from the distances of the
previous @mymath{W} elements (not including @mymath{v_i}).
+If the value given to @code{sigma} is displayed with @mymath{s}, the
@mymath{i}-th element is considered as an outlier when the condition below is
true.
@dispmath{{(v_i-v_{i-1})-m\over \sigma}>s}
-The @code{sigclip_multip} and @code{sigclip_param} arguments specify the
-properties of the @mymath{\sigma}-clipping (see @ref{Sigma clipping} for
-more). You see that by this definition, the outlier cannot be any of the
-lower half elements. The advantage of this algorithm compared to
-@mymath{\sigma}-clippign is that it only looks backwards (in the sorted
-array) and parses it in one direction.
-
-If @code{inplace!=0}, the removing of blank elements and sorting will be
-done within the input dataset's allocated space. Otherwise, this function
-will internally allocate (and later free) the necessary space to keep the
-intermediate space that this process requires.
-
-If @code{quiet!=0}, this function will report the parameters every time it
-moves the window as a separate line with several columns. The first column
-is the value, the second (in square brackets) is the sorted index, the
-third is the distance of this element from the previous one. The Fourth and
-fifth (in parenthesis) are the median and standard deviation of the
-@mymath{\sigma}-clipped distribution within the window and the last column
-is the difference between the third and fourth, divided by the fifth.
+The @code{sigclip_multip} and @code{sigclip_param} arguments specify the
properties of the @mymath{\sigma}-clipping (see @ref{Sigma clipping} for more).
+You see that by this definition, the outlier cannot be any of the lower half
elements.
+The advantage of this algorithm compared to @mymath{\sigma}-clippign is that
it only looks backwards (in the sorted array) and parses it in one direction.
+
+If @code{inplace!=0}, the removing of blank elements and sorting will be done
within the input dataset's allocated space.
+Otherwise, this function will internally allocate (and later free) the
necessary space to keep the intermediate space that this process requires.
+
+If @code{quiet!=0}, this function will report the parameters every time it
moves the window as a separate line with several columns.
+The first column is the value, the second (in square brackets) is the sorted
index, the third is the distance of this element from the previous one.
+The Fourth and fifth (in parenthesis) are the median and standard deviation of
the @mymath{\sigma}-clipped distribution within the window and the last column
is the difference between the third and fourth, divided by the fifth.
@end deftypefun
diff --git a/lib/gnuastro/statistics.h b/lib/gnuastro/statistics.h
index 6700da1..2b4ead5 100644
--- a/lib/gnuastro/statistics.h
+++ b/lib/gnuastro/statistics.h
@@ -177,9 +177,10 @@ gal_statistics_sigma_clip(gal_data_t *input, float multip,
float param,
int inplace, int quiet);
gal_data_t *
-gal_statistics_outlier_positive(gal_data_t *input, size_t window_size,
- float sigma, float sigclip_multip,
- float sigclip_param, int inplace, int quiet);
+gal_statistics_outlier_bydistance(int pos1_neg0, gal_data_t *input,
+ size_t window_size, float sigma,
+ float sigclip_multip, float sigclip_param,
+ int inplace, int quiet);
gal_data_t *
gal_statistics_outlier_flat_cfp(gal_data_t *input, size_t numprev,
diff --git a/lib/statistics.c b/lib/statistics.c
index 28b8727..c151674 100644
--- a/lib/statistics.c
+++ b/lib/statistics.c
@@ -2235,11 +2235,15 @@ gal_statistics_sigma_clip(gal_data_t *input, float
multip, float param,
/* Find the first outlier in a distribution. */
#define OUTLIER_BYTYPE(IT) { \
IT *arr=nbs->array; \
- for(i=window_size;i<nbs->size;++i) \
+ for(i=window_size; i<nbs->size && i!=0; pos1_neg0 ? ++i : --i) \
{ \
/* Fill in the distance array. */ \
- for(j=0; j<wtakeone; ++j) \
- darr[j] = arr[i-window_size+j+1] - arr[i-window_size+j]; \
+ if(pos1_neg0) \
+ for(j=0; j<wtakeone; ++j) \
+ darr[j] = arr[i-window_size+j+1] - arr[i-window_size+j]; \
+ else \
+ for(j=0; j<wtakeone; ++j) \
+ darr[j] = arr[i+window_size-j+1] - arr[i+window_size-j]; \
\
/* Get the sigma-clipped information. */ \
sclip=gal_statistics_sigma_clip(dist, sigclip_multip, \
@@ -2271,9 +2275,10 @@ gal_statistics_sigma_clip(gal_data_t *input, float
multip, float param,
} \
}
gal_data_t *
-gal_statistics_outlier_positive(gal_data_t *input, size_t window_size,
- float sigma, float sigclip_multip,
- float sigclip_param, int inplace, int quiet)
+gal_statistics_outlier_bydistance(int pos1_neg0, gal_data_t *input,
+ size_t window_size, float sigma,
+ float sigclip_multip, float sigclip_param,
+ int inplace, int quiet)
{
float *sarr;
double *darr;
@@ -2300,7 +2305,6 @@ gal_statistics_outlier_positive(gal_data_t *input, size_t
window_size,
}
*/
-
/* Allocate space to keep the distances. */
wtakeone=window_size-1;
dist=gal_data_alloc(NULL, GAL_TYPE_FLOAT64, 1, &wtakeone, NULL,
diff --git a/lib/tile-internal.c b/lib/tile-internal.c
index 2bc9e7e..7965aa8 100644
--- a/lib/tile-internal.c
+++ b/lib/tile-internal.c
@@ -44,11 +44,11 @@ tileinternal_no_outlier_work(gal_data_t *first, gal_data_t
*second,
size_t tottilesinch, double *outliersclip,
float outliersigma)
{
- gal_data_t *outlier, *nbs;
size_t i, osize=first->size;
size_t start=tottilesinch*channelid;
float *oa1=NULL, *oa2=NULL, *oa3=NULL;
- float o, *arr1=NULL, *arr2=NULL, *arr3=NULL;
+ gal_data_t *nbs, *outlier_p, *outlier_n;
+ float o_p, o_n, *arr1=NULL, *arr2=NULL, *arr3=NULL;
/* A small sanity check. */
if(first->type!=GAL_TYPE_FLOAT32)
@@ -83,56 +83,133 @@ tileinternal_no_outlier_work(gal_data_t *first, gal_data_t
*second,
first array. */
arr1=first->array;
nbs=gal_statistics_no_blank_sorted(first, 0);
- outlier=gal_statistics_outlier_positive(nbs, nbs->size/2, outliersigma,
- outliersclip[0], outliersclip[1],
- 0, 1);
+ outlier_p=gal_statistics_outlier_bydistance(1, nbs, nbs->size/2,
outliersigma,
+ outliersclip[0], outliersclip[1],
+ 1, 1);
+ outlier_n=gal_statistics_outlier_bydistance(0, nbs, nbs->size/2,
outliersigma,
+ outliersclip[0], outliersclip[1],
+ 1, 1);
+ /* For a check.
+ {
+ float *med;
+ gal_data_t *median=gal_statistics_median(nbs, 1);
+ float *out_n=outlier_n->array, *out_p=outlier_p->array;
+ med=median->array;
+ printf("vals: %f (out_n), %f (med), %f (out_p)\n",
+ out_n[0], med[0], out_p[0]);
+ exit(0);
+ }
+ */
+
+ /* Clean up the temporary 'nbs' array. */
gal_data_free(nbs);
- if(outlier)
+
+ /* If outliers exist, then implement them. */
+ if(outlier_p)
{
- o = *((float *)(outlier->array));
- for(i=0;i<first->size;++i)
- /* Just note that we have blank (NaN) values, so to avoid doing a
- NaN check with 'isnan', we will check if the value is below the
- quantile, if it succeeds (isn't NaN and is below the quantile),
- then we'll put it's actual value, otherwise, a NaN. */
- arr1[i] = arr1[i]<=o ? arr1[i] : NAN;
- gal_data_free(outlier);
+ o_p = *((float *)(outlier_p->array));
+ gal_data_free(outlier_p);
+ if(outlier_n)
+ {
+ /* For easy reading, put the negative outlier into 'o_n'. */
+ o_n = *((float *)(outlier_n->array));
+ gal_data_free(outlier_n);
+
+ /* See description below, it just includes a negative outlier. */
+ for(i=0;i<first->size;++i)
+ arr1[i] = arr1[i]<o_p ? (arr1[i]>o_n ? arr1[i] : NAN) : NAN;
+ }
+ else
+ /* Just note that we have blank (NaN) values, so to avoid doing a
+ NaN check with 'isnan', we will check if the value is below
+ the quantile, if it succeeds (isn't NaN and is below the
+ quantile), then we'll put it's actual value, otherwise, a
+ NaN. */
+ for(i=0;i<first->size;++i)
+ arr1[i] = arr1[i]<o_p ? arr1[i] : NAN;
}
+ else
+ if(outlier_n)
+ {
+ o_n = *((float *)(outlier_n->array));
+ for(i=0;i<first->size;++i)
+ arr1[i] = arr1[i]>o_n ? arr1[i] : NAN;
+ gal_data_free(outlier_n);
+ }
/* Second quantile threshold. We are finding the outliers independently
on each dataset to later remove any tile that is blank in atleast one
of them. */
arr2=second->array;
nbs=gal_statistics_no_blank_sorted(second, 0);
- outlier=gal_statistics_outlier_positive(nbs, nbs->size, outliersigma,
- outliersclip[0], outliersclip[1],
- 0, 1);
+ outlier_p=gal_statistics_outlier_bydistance(1, nbs, nbs->size, outliersigma,
+ outliersclip[0], outliersclip[1],
+ 1, 1);
+ outlier_n=gal_statistics_outlier_bydistance(0, nbs, nbs->size, outliersigma,
+ outliersclip[0], outliersclip[1],
+ 1, 1);
gal_data_free(nbs);
- if(outlier)
+ if(outlier_p)
{
- o = *((float *)(outlier->array));
- for(i=0;i<second->size;++i)
- arr2[i] = arr2[i]<=o ? arr2[i] : NAN;
- gal_data_free(outlier);
+ o_p = *((float *)(outlier_p->array));
+ gal_data_free(outlier_p);
+ if(outlier_n)
+ {
+ o_n = *((float *)(outlier_n->array));
+ for(i=0;i<first->size;++i)
+ arr2[i] = arr2[i]<o_p ? (arr2[i]>o_n ? arr2[i] : NAN) : NAN;
+ gal_data_free(outlier_n);
+ }
+ else
+ for(i=0;i<first->size;++i)
+ arr2[i] = arr2[i]<o_p ? arr2[i] : NAN;
}
+ else
+ if(outlier_n)
+ {
+ o_n = *((float *)(outlier_n->array));
+ for(i=0;i<first->size;++i)
+ arr2[i] = arr2[i]>o_n ? arr2[i] : NAN;
+ gal_data_free(outlier_n);
+ }
/* The third (if it exists). */
if(third)
{
arr3=third->array;
nbs=gal_statistics_no_blank_sorted(third, 0);
- outlier=gal_statistics_outlier_positive(nbs, nbs->size/2,
- outliersigma,
- outliersclip[0],
- outliersclip[1], 0, 1);
+ outlier_p=gal_statistics_outlier_bydistance(1, nbs, nbs->size/2,
+ outliersigma,
+ outliersclip[0],
+ outliersclip[1], 1, 1);
+ outlier_n=gal_statistics_outlier_bydistance(0, nbs, nbs->size/2,
+ outliersigma,
+ outliersclip[0],
+ outliersclip[1], 1, 1);
gal_data_free(nbs);
- if(outlier)
+ if(outlier_p)
{
- o = *((float *)(outlier->array));
- for(i=0;i<third->size;++i)
- arr3[i] = arr3[i]<=o ? arr3[i] : NAN;
- gal_data_free(outlier);
+ o_p = *((float *)(outlier_p->array));
+ gal_data_free(outlier_p);
+ if(outlier_n)
+ {
+ o_n = *((float *)(outlier_n->array));
+ for(i=0;i<first->size;++i)
+ arr3[i] = arr3[i]<o_p ? (arr3[i]>o_n ? arr3[i] : NAN) : NAN;
+ gal_data_free(outlier_n);
+ }
+ else
+ for(i=0;i<first->size;++i)
+ arr3[i] = arr3[i]<o_p ? arr3[i] : NAN;
}
+ else
+ if(outlier_n)
+ {
+ o_n = *((float *)(outlier_n->array));
+ for(i=0;i<first->size;++i)
+ arr3[i] = arr3[i]>o_n ? arr3[i] : NAN;
+ gal_data_free(outlier_n);
+ }
}
/* Make sure all three have the same NaN pixels. */
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [gnuastro-commits] master 3d72b24: NoiseChisel: negative outlier tiles are also being removed,
Mohammad Akhlaghi <=