[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Tar doesn't handle short reads
From: |
Nikos Tsipinakis |
Subject: |
Tar doesn't handle short reads |
Date: |
Mon, 27 Jul 2020 11:41:30 +0300 |
[ I'm not subscribed to the list, so please keep me in Cc ]
Greetings tar users and developers,
For the past 2 days I've been hunting a tar-related failure that I could only
reproduce by running dpkg-source, running the tar incantation dpkg-source uses
internally on its own doesn't appear to work. I'll document my debugging
endeavour here to make sure I have my facts straight, as well as to help anyone
else stumbling onto this.
The error is as follows:
$ dpkg-source -b .
dpkg-source: info: using source format '3.0 (quilt)'
dpkg-source: info: building newsboat using existing
./newsboat_2.20.1.orig.tar.xz
dpkg-source: info: building newsboat using existing
./newsboat_2.20.1.orig.tar.xz.asc
dpkg-source: info: building newsboat in newsboat_2.20.1-1.debian.tar.xz
tar: debian/changelog: File shrank by 36 bytes; padding with zeros
dpkg-source: error: tar -cf - subprocess returned exit status 1
On further investigation, this bug is triggered if all of the following
conditions are met (on my system):
1. Have a changelog file >4096 bytes
2. The build directory should be in a 9p filesystem (in a VM)
3. Kernel version should be >=5.6 (it doesn't error on 4.19)
4. Do the exact combination of steps dpkg-source does (running the tar command
directly
doesn't appear to cause it)
Condition 3 threw me off and I started looking into the kernel as the source of
this, however, I now believe it's some change in the 9p driver that simply
uncovered it.
$ strace -f dpkg-source -b .
[...]
72606 execve("/home/nikos/tar-1.32/src//tar", ["tar", "-cf", "-",
"--format=gnu", "--sort=name", "--mtime", "@1595708407", "--clamp-mtime",
"--null", "--numeric-owner", "--owner=0", "--group=0", "--exclude=*.a",
"--exclude=*.la", "--exclude=*.o", "--exclude=*.so", "--exclude=.*.sw?",
"--exclude=*/*~", "--exclude=,,*", "--exclude=.[#~]*", "--exclude=.arch-ids",
"--exclude=.arch-inventory", "--exclude=.be", "--exclude=.bzr",
"--exclude=.bzr.backup", "--exclude=.bzr.tags", "--exclude=.bzrignore",
"--exclude=.cvsignore", "--exclude=.deps", "--exclude=.git",
"--exclude=.gitattributes", "--exclude=.gitignore", ...], 0x55b9e7b2b4d0 /* 22
vars */ <unfinished ...>
[...]
72606 newfstatat(3, "changelog", {st_mode=S_IFREG|0644, st_size=4132, ...},
AT_SYMLINK_NOFOLLOW) = 0
72606 openat(3, "changelog", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC)
= 4
72606 fstat(4, {st_mode=S_IFREG|0644, st_size=4132, ...}) = 0
72606 read(4, "newsboat (2.20.1-1) UNRELEASED; "..., 4132) = 4096
72606 write(2, "tar: ", 5) = 5
72606 write(2, "debian/changelog: File shrank by"..., 61) = 61
72606 write(2, "\n", 1) = 1
72606 close(4) = 0
[...]
The read() call returns less bytes than requested and then tar assumed it was
truncated and issues a warning. I traced this call to create.c:dump_regular_file
which indeed doesn't seem to handle this possibility.
Attached are two patches, the first replaces the safe_read call in
blocking_read with
full_read from gnulib, I have verified that this change fixes the bug.
I can't seem to understand what's the point of distinguishing between
safe_{read,write} and full_{read,write}, is there a situation where not handling
short read/writes would be important? Why not merge these functions?
So finally, since I couldn't find any reason not to, the second patch replaces
all instances of safe_read with full_read to avoid such problems in other areas
of the code. (note: Without much verification!)
0001-Use-full_read-to-avoid-errors-due-to-short-reads.patch
Description: Text Data
0002-Remove-remaining-usages-of-safe_read.patch
Description: Text Data
- Tar doesn't handle short reads,
Nikos Tsipinakis <=