[gnuastro-commits] master 3d72b24: NoiseChisel: negative outlier tiles a

gnuastro-commits
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnuastro-commits] master 3d72b24: NoiseChisel: negative outlier tiles a

From:	Mohammad Akhlaghi
Subject:	[gnuastro-commits] master 3d72b24: NoiseChisel: negative outlier tiles are also being removed
Date:	Wed, 22 Jul 2020 21:05:53 -0400 (EDT)
branch: master
commit 3d72b24d0694919457c6d0adc03b0c95c3f03edf
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    NoiseChisel: negative outlier tiles are also being removed
    
    Until now, when removing outlier tiles, NoiseChisel would only remove
    positive outliers. In standard/basic datasets this is fine (skewness caused
    by data is always positive). But in some modified datasets (for example
    when models have been subtracted), it is possible to have negative outliers
    which can cause problems.
    
    With this commit, the core library function doing this (which is now named
    'gal_statistics_outlier_bydistance') will also apply its algorithm in the
    negative direction and tiles with both positive and negative outliers are
    found and discarded.
    
    This bug was found by Joanna Sakowska.
---
 NEWS                         |   5 ++
 THANKS                       |   3 +-
 doc/announce-acknowledge.txt |   1 +
 doc/gnuastro.texi            |  70 +++++++++-------------
 lib/gnuastro/statistics.h    |   7 ++-
 lib/statistics.c             |  18 +++---
 lib/tile-internal.c          | 139 +++++++++++++++++++++++++++++++++----------
 7 files changed, 159 insertions(+), 84 deletions(-)

diff --git a/NEWS b/NEWS
index 832c81a..0a89eba 100644
--- a/NEWS
+++ b/NEWS
@@ -75,12 +75,17 @@ See the end of the file for license conditions.
    - gal_type_string_to_number: Numbers ending in '.' or '.0' will be
      parsed as floating point. Until now, it would only parse numbers as
      floating point if they had non-zero decimals.
+   - gal_statistics_outlier_bydistance: new name for the old
+     'gal_statistics_outlier_positive'. It can now use the same algorithm
+     for negative outliers with a new argument.
 
 ** Bugs fixed
   bug #58434: MakeCatalog crash when ordering is required and no usable pixels.
   bug #58455: Timezone is not portable and uses flag instead of seconds.
   bug #58696: Warp with --centeroncorner --scale making wrong size.
   bug #58774: Warp' s output on a cube is a 2D image or wrong size.
+  bug #58809: NoiseChisel not removing negative outlier tiles.
+
 
 
 
diff --git a/THANKS b/THANKS
index 4041a45..5cdbccf 100644
--- a/THANKS
+++ b/THANKS
@@ -70,6 +70,7 @@ support in Gnuastro. The list is ordered alphabetically (by 
family name).
     Bob Proulx                           bob@proulx.com
     Joseph Putko                         josephputko@gmail.com
     Teymoor Saifollahi                   teymur.saif@gmail.com
+    Joanna Sakowska                      js01093@surrey.ac.uk
     Elham Saremi                         saremi@ipm.ir
     Yahya Sefidbakht                     y.sefidbakht@gmail.com
     Alejandro Serrano Borlaff            asborlaff@ucm.es
@@ -122,4 +123,4 @@ Copyright (C) 2015-2020, Free Software Foundation, Inc.
 Permission is granted to copy, distribute and/or modify this document under
 the terms of the GNU Free Documentation License, Version 1.3 or any later
 version published by the Free Software Foundation; with no Invariant
-Sections, with no Front-Cover Texts, and with no Back-Cover Texts.
\ No newline at end of file
+Sections, with no Front-Cover Texts, and with no Back-Cover Texts.
diff --git a/doc/announce-acknowledge.txt b/doc/announce-acknowledge.txt
index 489ac99..c290d42 100644
--- a/doc/announce-acknowledge.txt
+++ b/doc/announce-acknowledge.txt
@@ -5,6 +5,7 @@ Carlos Allende Prieto
 Leindert Boogaard
 Alexey Dokuchaev
 Raúl Infante Sainz
+Joanna Sakowska
 Zahra Sharbaf
 Ole Streicher
 
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index ae0dd67..10d4d3b 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -24288,52 +24288,38 @@ above.
 @end deftypefun
 
 
-@deftypefun {gal_data_t *} gal_statistics_outlier_positive (gal_data_t 
@code{*input}, size_t @code{window_size}, float @code{sigma}, float 
@code{sigclip_multip}, float @code{sigclip_param}, int @code{inplace}, int 
@code{quiet})
-Find the first positive outlier in the @code{input} distribution. The
-returned dataset contains a single element: the first positive outlier. It
-is one of the dataset's elements, in the same type as the input. If the
-process fails for any reason (for example no outlier was found), a
-@code{NULL} pointer will be returned.
-
-All (possibly existing) blank elements are first removed from the input
-dataset, then it is sorted. A sliding window of @code{window_size} elements
-is parsed over the dataset. Starting from the @code{window_size}-th element
-of the dataset, in the direction of increasing values. This window is used
-as a reference. The first element where the distance to the previous
-(sorted) element is @code{sigma} units away from the distribution of
-distances in its window is considered an outlier and returned by this
-function.
+@deftypefun {gal_data_t *} gal_statistics_outlier_bydistance (int 
@code{pos1_neg0}, gal_data_t @code{*input}, size_t @code{window_size}, float 
@code{sigma}, float @code{sigclip_multip}, float @code{sigclip_param}, int 
@code{inplace}, int @code{quiet})
+
+Find the first positive outlier (if @code{pos1_neg0!=0}) in the @code{input} 
distribution.
+When @code{pos1_neg0==0}, the same algorithm goes to the start of the dataset.
+The returned dataset contains a single element: the first positive outlier.
+It is one of the dataset's elements, in the same type as the input.
+If the process fails for any reason (for example no outlier was found), a 
@code{NULL} pointer will be returned.
+
+All (possibly existing) blank elements are first removed from the input 
dataset, then it is sorted.
+A sliding window of @code{window_size} elements is parsed over the dataset.
+Starting from the @code{window_size}-th element of the dataset, in the 
direction of increasing values.
+This window is used as a reference.
+The first element where the distance to the previous (sorted) element is 
@code{sigma} units away from the distribution of distances in its window is 
considered an outlier and returned by this function.
 
-Formally, if we assume there are @mymath{N} non-blank elements. They are
-first sorted. Searching for the outlier starts on element @mymath{W}. Let's
-take @mymath{v_i} to be the @mymath{i}-th element of the sorted input (with
-no blank values) and @mymath{m} and @mymath{\sigma} as the
-@mymath{\sigma}-clipped median and standard deviation from the distances of
-the previous @mymath{W} elements (not including @mymath{v_i}). If the value
-given to @code{sigma} is displayed with @mymath{s}, the @mymath{i}-th
-element is considered as an outlier when the condition below is true.
+Formally, if we assume there are @mymath{N} non-blank elements.
+They are first sorted.
+Searching for the outlier starts on element @mymath{W}.
+Let's take @mymath{v_i} to be the @mymath{i}-th element of the sorted input 
(with no blank values) and @mymath{m} and @mymath{\sigma} as the 
@mymath{\sigma}-clipped median and standard deviation from the distances of the 
previous @mymath{W} elements (not including @mymath{v_i}).
+If the value given to @code{sigma} is displayed with @mymath{s}, the 
@mymath{i}-th element is considered as an outlier when the condition below is 
true.
 
 @dispmath{{(v_i-v_{i-1})-m\over \sigma}>s}
 
-The @code{sigclip_multip} and @code{sigclip_param} arguments specify the
-properties of the @mymath{\sigma}-clipping (see @ref{Sigma clipping} for
-more). You see that by this definition, the outlier cannot be any of the
-lower half elements. The advantage of this algorithm compared to
-@mymath{\sigma}-clippign is that it only looks backwards (in the sorted
-array) and parses it in one direction.
-
-If @code{inplace!=0}, the removing of blank elements and sorting will be
-done within the input dataset's allocated space. Otherwise, this function
-will internally allocate (and later free) the necessary space to keep the
-intermediate space that this process requires.
-
-If @code{quiet!=0}, this function will report the parameters every time it
-moves the window as a separate line with several columns. The first column
-is the value, the second (in square brackets) is the sorted index, the
-third is the distance of this element from the previous one. The Fourth and
-fifth (in parenthesis) are the median and standard deviation of the
-@mymath{\sigma}-clipped distribution within the window and the last column
-is the difference between the third and fourth, divided by the fifth.
+The @code{sigclip_multip} and @code{sigclip_param} arguments specify the 
properties of the @mymath{\sigma}-clipping (see @ref{Sigma clipping} for more).
+You see that by this definition, the outlier cannot be any of the lower half 
elements.
+The advantage of this algorithm compared to @mymath{\sigma}-clippign is that 
it only looks backwards (in the sorted array) and parses it in one direction.
+
+If @code{inplace!=0}, the removing of blank elements and sorting will be done 
within the input dataset's allocated space.
+Otherwise, this function will internally allocate (and later free) the 
necessary space to keep the intermediate space that this process requires.
+
+If @code{quiet!=0}, this function will report the parameters every time it 
moves the window as a separate line with several columns.
+The first column is the value, the second (in square brackets) is the sorted 
index, the third is the distance of this element from the previous one.
+The Fourth and fifth (in parenthesis) are the median and standard deviation of 
the @mymath{\sigma}-clipped distribution within the window and the last column 
is the difference between the third and fourth, divided by the fifth.
 
 @end deftypefun
 
diff --git a/lib/gnuastro/statistics.h b/lib/gnuastro/statistics.h
index 6700da1..2b4ead5 100644
--- a/lib/gnuastro/statistics.h
+++ b/lib/gnuastro/statistics.h
@@ -177,9 +177,10 @@ gal_statistics_sigma_clip(gal_data_t *input, float multip, 
float param,
                           int inplace, int quiet);
 
 gal_data_t *
-gal_statistics_outlier_positive(gal_data_t *input, size_t window_size,
-                                float sigma, float sigclip_multip,
-                                float sigclip_param, int inplace, int quiet);
+gal_statistics_outlier_bydistance(int pos1_neg0, gal_data_t *input,
+                                  size_t window_size, float sigma,
+                                  float sigclip_multip, float sigclip_param,
+                                  int inplace, int quiet);
 
 gal_data_t *
 gal_statistics_outlier_flat_cfp(gal_data_t *input, size_t numprev,
diff --git a/lib/statistics.c b/lib/statistics.c
index 28b8727..c151674 100644
--- a/lib/statistics.c
+++ b/lib/statistics.c
@@ -2235,11 +2235,15 @@ gal_statistics_sigma_clip(gal_data_t *input, float 
multip, float param,
 /* Find the first outlier in a distribution. */
 #define OUTLIER_BYTYPE(IT) {                                            \
     IT *arr=nbs->array;                                                 \
-    for(i=window_size;i<nbs->size;++i)                                  \
+    for(i=window_size; i<nbs->size && i!=0; pos1_neg0 ? ++i : --i)      \
       {                                                                 \
         /* Fill in the distance array. */                               \
-        for(j=0; j<wtakeone; ++j)                                       \
-          darr[j] = arr[i-window_size+j+1] - arr[i-window_size+j];      \
+        if(pos1_neg0)                                                   \
+          for(j=0; j<wtakeone; ++j)                                     \
+            darr[j] = arr[i-window_size+j+1] - arr[i-window_size+j];    \
+        else                                                            \
+          for(j=0; j<wtakeone; ++j)                                     \
+            darr[j] = arr[i+window_size-j+1] - arr[i+window_size-j];    \
                                                                         \
         /* Get the sigma-clipped information. */                        \
         sclip=gal_statistics_sigma_clip(dist, sigclip_multip,           \
@@ -2271,9 +2275,10 @@ gal_statistics_sigma_clip(gal_data_t *input, float 
multip, float param,
       }                                                                 \
   }
 gal_data_t *
-gal_statistics_outlier_positive(gal_data_t *input, size_t window_size,
-                                float sigma, float sigclip_multip,
-                                float sigclip_param, int inplace, int quiet)
+gal_statistics_outlier_bydistance(int pos1_neg0, gal_data_t *input,
+                                  size_t window_size, float sigma,
+                                  float sigclip_multip, float sigclip_param,
+                                  int inplace, int quiet)
 {
   float *sarr;
   double *darr;
@@ -2300,7 +2305,6 @@ gal_statistics_outlier_positive(gal_data_t *input, size_t 
window_size,
         }
       */
 
-
       /* Allocate space to keep the distances. */
       wtakeone=window_size-1;
       dist=gal_data_alloc(NULL, GAL_TYPE_FLOAT64, 1, &wtakeone, NULL,
diff --git a/lib/tile-internal.c b/lib/tile-internal.c
index 2bc9e7e..7965aa8 100644
--- a/lib/tile-internal.c
+++ b/lib/tile-internal.c
@@ -44,11 +44,11 @@ tileinternal_no_outlier_work(gal_data_t *first, gal_data_t 
*second,
                              size_t tottilesinch, double *outliersclip,
                              float outliersigma)
 {
-  gal_data_t *outlier, *nbs;
   size_t i, osize=first->size;
   size_t start=tottilesinch*channelid;
   float *oa1=NULL, *oa2=NULL, *oa3=NULL;
-  float o, *arr1=NULL, *arr2=NULL, *arr3=NULL;
+  gal_data_t *nbs, *outlier_p, *outlier_n;
+  float o_p, o_n, *arr1=NULL, *arr2=NULL, *arr3=NULL;
 
   /* A small sanity check. */
   if(first->type!=GAL_TYPE_FLOAT32)
@@ -83,56 +83,133 @@ tileinternal_no_outlier_work(gal_data_t *first, gal_data_t 
*second,
      first array. */
   arr1=first->array;
   nbs=gal_statistics_no_blank_sorted(first, 0);
-  outlier=gal_statistics_outlier_positive(nbs, nbs->size/2, outliersigma,
-                                          outliersclip[0], outliersclip[1],
-                                          0, 1);
+  outlier_p=gal_statistics_outlier_bydistance(1, nbs, nbs->size/2, 
outliersigma,
+                                              outliersclip[0], outliersclip[1],
+                                              1, 1);
+  outlier_n=gal_statistics_outlier_bydistance(0, nbs, nbs->size/2, 
outliersigma,
+                                              outliersclip[0], outliersclip[1],
+                                              1, 1);
+  /* For a check.
+  {
+    float *med;
+    gal_data_t *median=gal_statistics_median(nbs, 1);
+    float *out_n=outlier_n->array, *out_p=outlier_p->array;
+    med=median->array;
+    printf("vals: %f (out_n), %f (med), %f (out_p)\n",
+           out_n[0], med[0], out_p[0]);
+    exit(0);
+  }
+  */
+
+  /* Clean up the temporary 'nbs' array. */
   gal_data_free(nbs);
-  if(outlier)
+
+  /* If outliers exist, then implement them. */
+  if(outlier_p)
     {
-      o = *((float *)(outlier->array));
-      for(i=0;i<first->size;++i)
-        /* Just note that we have blank (NaN) values, so to avoid doing a
-           NaN check with 'isnan', we will check if the value is below the
-           quantile, if it succeeds (isn't NaN and is below the quantile),
-           then we'll put it's actual value, otherwise, a NaN. */
-        arr1[i] = arr1[i]<=o ? arr1[i] : NAN;
-      gal_data_free(outlier);
+      o_p = *((float *)(outlier_p->array));
+      gal_data_free(outlier_p);
+      if(outlier_n)
+        {
+          /* For easy reading, put the negative outlier into 'o_n'. */
+          o_n = *((float *)(outlier_n->array));
+          gal_data_free(outlier_n);
+
+          /* See description below, it just includes a negative outlier. */
+          for(i=0;i<first->size;++i)
+            arr1[i] = arr1[i]<o_p ? (arr1[i]>o_n ? arr1[i] : NAN) : NAN;
+        }
+      else
+          /* Just note that we have blank (NaN) values, so to avoid doing a
+             NaN check with 'isnan', we will check if the value is below
+             the quantile, if it succeeds (isn't NaN and is below the
+             quantile), then we'll put it's actual value, otherwise, a
+             NaN. */
+        for(i=0;i<first->size;++i)
+          arr1[i] = arr1[i]<o_p ? arr1[i] : NAN;
     }
+  else
+    if(outlier_n)
+      {
+        o_n = *((float *)(outlier_n->array));
+        for(i=0;i<first->size;++i)
+          arr1[i] = arr1[i]>o_n ? arr1[i] : NAN;
+        gal_data_free(outlier_n);
+      }
 
   /* Second quantile threshold. We are finding the outliers independently
      on each dataset to later remove any tile that is blank in atleast one
      of them. */
   arr2=second->array;
   nbs=gal_statistics_no_blank_sorted(second, 0);
-  outlier=gal_statistics_outlier_positive(nbs, nbs->size, outliersigma,
-                                          outliersclip[0], outliersclip[1],
-                                          0, 1);
+  outlier_p=gal_statistics_outlier_bydistance(1, nbs, nbs->size, outliersigma,
+                                              outliersclip[0], outliersclip[1],
+                                              1, 1);
+  outlier_n=gal_statistics_outlier_bydistance(0, nbs, nbs->size, outliersigma,
+                                              outliersclip[0], outliersclip[1],
+                                              1, 1);
   gal_data_free(nbs);
-  if(outlier)
+  if(outlier_p)
     {
-      o = *((float *)(outlier->array));
-      for(i=0;i<second->size;++i)
-        arr2[i] = arr2[i]<=o ? arr2[i] : NAN;
-      gal_data_free(outlier);
+      o_p = *((float *)(outlier_p->array));
+      gal_data_free(outlier_p);
+      if(outlier_n)
+        {
+          o_n = *((float *)(outlier_n->array));
+          for(i=0;i<first->size;++i)
+            arr2[i] = arr2[i]<o_p ? (arr2[i]>o_n ? arr2[i] : NAN) : NAN;
+          gal_data_free(outlier_n);
+        }
+      else
+        for(i=0;i<first->size;++i)
+          arr2[i] = arr2[i]<o_p ? arr2[i] : NAN;
     }
+  else
+    if(outlier_n)
+      {
+        o_n = *((float *)(outlier_n->array));
+        for(i=0;i<first->size;++i)
+          arr2[i] = arr2[i]>o_n ? arr2[i] : NAN;
+        gal_data_free(outlier_n);
+      }
 
   /* The third (if it exists). */
   if(third)
     {
       arr3=third->array;
       nbs=gal_statistics_no_blank_sorted(third, 0);
-      outlier=gal_statistics_outlier_positive(nbs, nbs->size/2,
-                                              outliersigma,
-                                              outliersclip[0],
-                                              outliersclip[1], 0, 1);
+      outlier_p=gal_statistics_outlier_bydistance(1, nbs, nbs->size/2,
+                                                  outliersigma,
+                                                  outliersclip[0],
+                                                  outliersclip[1], 1, 1);
+      outlier_n=gal_statistics_outlier_bydistance(0, nbs, nbs->size/2,
+                                                  outliersigma,
+                                                  outliersclip[0],
+                                                  outliersclip[1], 1, 1);
       gal_data_free(nbs);
-      if(outlier)
+      if(outlier_p)
         {
-          o = *((float *)(outlier->array));
-          for(i=0;i<third->size;++i)
-            arr3[i] = arr3[i]<=o ? arr3[i] : NAN;
-          gal_data_free(outlier);
+          o_p = *((float *)(outlier_p->array));
+          gal_data_free(outlier_p);
+          if(outlier_n)
+            {
+              o_n = *((float *)(outlier_n->array));
+              for(i=0;i<first->size;++i)
+                arr3[i] = arr3[i]<o_p ? (arr3[i]>o_n ? arr3[i] : NAN) : NAN;
+              gal_data_free(outlier_n);
+            }
+          else
+            for(i=0;i<first->size;++i)
+              arr3[i] = arr3[i]<o_p ? arr3[i] : NAN;
         }
+      else
+        if(outlier_n)
+          {
+            o_n = *((float *)(outlier_n->array));
+            for(i=0;i<first->size;++i)
+              arr3[i] = arr3[i]>o_n ? arr3[i] : NAN;
+            gal_data_free(outlier_n);
+          }
     }
 
   /* Make sure all three have the same NaN pixels. */
[Prev in Thread]
Current Thread
[Next in Thread]
[gnuastro-commits] master 3d72b24: NoiseChisel: negative outlier tiles are also being removed, Mohammad Akhlaghi <=
Prev by Date: [gnuastro-commits] master 68f645f: Table: --metaupdate, new option to update column meta data
Next by Date: [gnuastro-commits] master cd2fca2: Book: better explanation of pre-convolved stars in first tutorial
Previous by thread: [gnuastro-commits] master 68f645f: Table: --metaupdate, new option to update column meta data
Next by thread: [gnuastro-commits] master cd2fca2: Book: better explanation of pre-convolved stars in first tutorial
Index(es):
- Date
- Thread