bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1) and mv(1)


From: jeff.liu
Subject: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1) and mv(1) v2
Date: Sat, 17 Apr 2010 20:38:12 +0800
User-agent: Thunderbird 2.0.0.14 (X11/20080505)

Hi Jim,

Thank you so much for your so quick response!

Jim Meyering wrote:
> jeff.liu wrote:
>> Hello All,
>>
>> This is the revised version.
> 
> Thanks for persevering and for the examples.
> 
> Would you please maintain a script that automates some of the
> commands you've been running, so that we can ensure they
> continue to work as advertised?  I keep hoping you'll contribute
> a test script or two, too, so I don't have write those, but if
> you can at least provide the basis for the tests, I'll massage
> that into the required form.  I.e., instead of running the many
> commands as you outlined in your mail, you would run a single
> script that encapsulates those, where you would be free to assume
> (specific-to-your-setup) hard-coded absolute directory names for
> each of the different file system types.
> Also, hard-coded paths for $cp_orig and $cp_new would be fine,
> since you'd use those to demonstrate the performance improvements.
Yes, I kept in mind for your suggestions about how to meet the requirements for 
new features. :)

Actually, I already working on a test script, the reason why I didn't submit it 
along with the
current patch is I have concern if the current test coverage is enough, so I 
send out the patch
first.  The test script will be submit in my next try.
> 
>> Both '--fiemap' and '--fiemap-sync' options were eliminated in this version.
>> Now cp(1) will perform fiemap optimization whenever possible if cp(1) 
>> invoked with --sparse=[WHEN],
>> otherwise, it will fall back to the standard copy.
> ...
>> address@hidden:~/opensource_dev/coreutils$ time sudo /bin/cp --sparse=always 
>> /ocfs2/sparse_4
> 
> Hmm... you're using sudo here.
> Does that mean the fiemap ioctl works only as root?
sorry! I also forgot why I using sudo at that time. :(
But I am sure fiemap ioctl(2) could works without root privilage.

> Otherwise, why bother with sudo?
> 
>> Subject: [PATCH 1/1] Introduct fiemap ioctl(2) for efficient sparse file 
>> copy v3
>>
>> Signed-off-by: Jie Liu <address@hidden>
>> ---
>>  src/copy.c   |  141 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 243 insertions(+), 0 deletions(-)
>>  create mode 100644 src/fiemap.h
>>
>> diff --git a/src/copy.c b/src/copy.c
> ...
>> +static bool
>> +fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
>> +                off_t src_total_size, char const *src_name,
>> +                char const *dst_name)
>> +{
>> +  int last = 0;
> 
> Use "bool" for booleans:
> 
>     bool last = false;
>> +  unsigned int i;
>> +  bool return_val = true;
> 
> Please don't use a name like "return_val".
> In this case, using "ok" is more descriptive.
> (or reverse the meaning and use "fail")
Thanks for the point out.
> 
>> +  char fiemap_buf[4096] = "";
> 
> This is a dead store.  Remove the ' = ""';
> 
>> +  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
>> +  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>> +  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
>> +                    sizeof (struct fiemap_extent);
>> +  uint64_t last_ext_logical = 0;
>> +  uint64_t last_ext_len = 0;
>> +  uint64_t last_read_size = 0;
>> +
>> +  memset (fiemap, 0, sizeof (*fiemap));
> 
> What is the point of initializing this buffer?
> I suspect you can safely remove it.
I'll double check for this point.
> 
>> +  do
>> +    {
>> +      fiemap->fm_start = 0ULL;
>> +      fiemap->fm_length = FIEMAP_MAX_OFFSET;
>> +      fiemap->fm_extent_count = count;
>> +
>> +      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
>> +          return false;
>> +
>> +      /* If 0 extents are returned, then more ioctls are not needed.  */
>> +      if (fiemap->fm_mapped_extents == 0)
>> +          return true;
>> +
>> +      for (i = 0; i < fiemap->fm_mapped_extents; i++)
>> +        {
>> +          uint64_t ext_logical = fm_ext[i].fe_logical;
> 
> There is no point in declaring this as uint64_t when the
> sole uses are cast to "off_t".
> Instead declare it as off_t, and add an assertion
> to ensure that the RHS value was in range.
> 
>              off_t ext_logical = fm_ext[i].fe_logical;
>              assert (fm_ext[i].fe_logical <= OFF_T_MAX);
> 
Yes, it should be declared to 'off_t' here.

>> +          uint64_t ext_len = fm_ext[i].fe_length;
> 
>> +
>> +          if (lseek (src_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>> +            {
>> +              error (0, errno, _("cannot lseek %s"), quote (src_name));
>> +              return_val = false;
>> +            }
>> +
>> +          if (lseek (dest_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>> +            {
>> +              error (0, errno, _("cannot lseek %s"), quote (dst_name));
>> +              return_val = false;
>> +            }
> 
> If either lseek fails, then we must not proceed to read or write.
> Setting "return_val" is insufficient.
> 
> This makes me think we'll have to redesign a little, to avoid
> the possibility of corruption.
> 
> Consider this change:
> 
>   If the fiemap ioctl succeeds: use only this specialized code to
>   perform the copy and do not fall back on normal copy.  However, if
>   any operation fails here (like these lseeks, read or write), we
>   diagnose it as usual.  Not falling back upon failure has the added
>   benefit that there's no risk of a duplicate diagnostic if the same
>   failure occurs again during the regular copy.
> 
> Otherwise, imagine that lseek advances the source FD,
> but the destination lseek fails.  Now the FDs are inconsistent.
> We would have to realign (or restore) them, or risk writing
> data at the wrong offset: corruption.
So, do you means if fiemap ioctl(2) succeeds, no matter what error(read, write, 
lseek) occurred
afterwards, we should always diagnosing it and then abort with the 
corresponding error msg?

My original design really cause the file corruption if the lseek(2) already 
advances the source FD
but the destination lseek(2) fails.

But is it make sense to seek back to the start offset for both source and 
destination FDs, then fall
back to the regular copy?

> 
>> +          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
>> +            {
>> +              last_ext_logical = ext_logical;
>> +              last_ext_len = ext_len;
>> +              last = 1;
> 
>                  last = true;
> 
>> +            }
>> +
>> +          char buf[buf_size];
> 
> "buf" is used only within the loop, so please declare it there.
> 
>> +          while (0 < ext_len)
>> +            {
>> +              memset (buf, 0, sizeof (buf));
> 
> I see no need to initialize "buf".
> The only data you use from it is put there by "read".
This initizalize really do not need here.
> 
>> +              /* Avoid reading into the holes if the left extent
>> +                 length is shorter than the buffer size.  */
>> +              if (ext_len < buf_size)
>> +                buf_size = ext_len;
>> +
>> +              ssize_t n_read = read (src_fd, buf, buf_size);
>> +              if (n_read < 0)
>> +                {
>> +#ifdef EINTR
>> +                  if (errno == EINTR)
>> +                    continue;
>> +#endif
>> +                  error (0, errno, _("reading %s"), quote (src_name));
>> +                  return_val = false;
>> +                }
>> +
>> +              if (n_read == 0)
>> +                {
>> +                   /* Figure out how many bytes read from the last extent.  
>> */
>> +                   last_read_size = last_ext_len - ext_len;
>> +                   break;
> 
> Each of the above three lines is indented one space too far.
oh, I do such stupid thing again!!!!
> 
>> +                }
>> +
>> +              if (full_write (dest_fd, buf, n_read) != n_read)
>> +                {
>> +                  error (0, errno, _("writing %s"), quote (dst_name));
>> +                  return_val = false;
>> +                }
>> +
>> +              ext_len -= n_read;
>> +            }
>> +
>> +          fiemap->fm_start = (fm_ext[i-1].fe_logical + 
>> fm_ext[i-1].fe_length);
>> +        }
>> +    } while (last == 0);
> 
>        } while (!last);
> 
>> +
>> +  /* If a file ends up with holes, the sum of the last extent logical offset
>> +     and the read returned size should shorter than the actual size of the 
>> file.
>> +     We should sets the file size to ((struct stat) st_buf.st_size).  */
> 
>      /* If a file ends with a hole, the sum of the last extent logical offset
>         and the read-returned size will be shorter than the actual size of the
>         file.  Use ftruncate to extend the length of the destination file.  */
> 
>> +  if (last_ext_logical + last_read_size < src_total_size)
>> +    {
>> +      if (ftruncate (dest_fd, src_total_size) < 0)
>> +        {
>> +          error (0, errno, _("truncating %s"), quote (dst_name));
> 
> s/truncating/extending/
thanks for above 3 corrections!
> 
>> +          return_val = false;
>> +        }
>> +    }
>> +
>> +  return return_val;
>> +}
>> +#else
>> +static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
>> +#endif
>> +
>>  /* FIXME: describe */
>>  /* FIXME: rewrite this to use a hash table so we avoid the quadratic
>>     performance hit that's probably noticeable only on trees deeper
>> @@ -679,6 +807,16 @@ copy_reg (char const *src_name, char const *dst_name,
>>  #endif
>>          }
>>
>> +      if (make_holes)
>> +        {
>> +          /* Perform efficient FIEMAP copy for sparse files, fall back to 
>> the
>> +             standard copy if fails.  */
>> +          if (fiemap_copy_ok (source_desc, dest_desc,
>> +                              buf_size, src_open_sb.st_size,
>> +                              src_name, dst_name))
>> +            goto preserve_metadata;
>> +        }

Best Regards,
-Jeff






reply via email to

[Prev in Thread] Current Thread [Next in Thread]