bug-tar
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Tar doesn't handle short reads


From: Nikos Tsipinakis
Subject: Tar doesn't handle short reads
Date: Mon, 27 Jul 2020 11:41:30 +0300

[ I'm not subscribed to the list, so please keep me in Cc ]

Greetings tar users and developers,

For the past 2 days I've been hunting a tar-related failure that I could only
reproduce by running dpkg-source, running the tar incantation dpkg-source uses
internally on its own doesn't appear to work. I'll document my debugging
endeavour here to make sure I have my facts straight, as well as to help anyone
else stumbling onto this.

The error is as follows:

$ dpkg-source -b .
dpkg-source: info: using source format '3.0 (quilt)'
dpkg-source: info: building newsboat using existing 
./newsboat_2.20.1.orig.tar.xz
dpkg-source: info: building newsboat using existing 
./newsboat_2.20.1.orig.tar.xz.asc
dpkg-source: info: building newsboat in newsboat_2.20.1-1.debian.tar.xz
tar: debian/changelog: File shrank by 36 bytes; padding with zeros
dpkg-source: error: tar -cf - subprocess returned exit status 1

On further investigation, this bug is triggered if all of the following
conditions are met (on my system):

1. Have a changelog file >4096 bytes
2. The build directory should be in a 9p filesystem (in a VM)
3. Kernel version should be >=5.6 (it doesn't error on 4.19)
4. Do the exact combination of steps dpkg-source does (running the tar command 
directly
doesn't appear to cause it)

Condition 3 threw me off and I started looking into the kernel as the source of
this, however, I now believe it's some change in the 9p driver that simply
uncovered it.

$ strace -f dpkg-source -b .
[...]
72606 execve("/home/nikos/tar-1.32/src//tar", ["tar", "-cf", "-", 
"--format=gnu", "--sort=name", "--mtime", "@1595708407", "--clamp-mtime", 
"--null", "--numeric-owner", "--owner=0", "--group=0", "--exclude=*.a", 
"--exclude=*.la", "--exclude=*.o", "--exclude=*.so", "--exclude=.*.sw?", 
"--exclude=*/*~", "--exclude=,,*", "--exclude=.[#~]*", "--exclude=.arch-ids", 
"--exclude=.arch-inventory", "--exclude=.be", "--exclude=.bzr", 
"--exclude=.bzr.backup", "--exclude=.bzr.tags", "--exclude=.bzrignore", 
"--exclude=.cvsignore", "--exclude=.deps", "--exclude=.git", 
"--exclude=.gitattributes", "--exclude=.gitignore", ...], 0x55b9e7b2b4d0 /* 22 
vars */ <unfinished ...>
[...]
72606 newfstatat(3, "changelog", {st_mode=S_IFREG|0644, st_size=4132, ...}, 
AT_SYMLINK_NOFOLLOW) = 0
72606 openat(3, "changelog", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) 
= 4
72606 fstat(4, {st_mode=S_IFREG|0644, st_size=4132, ...}) = 0
72606 read(4, "newsboat (2.20.1-1) UNRELEASED; "..., 4132) = 4096
72606 write(2, "tar: ", 5)              = 5
72606 write(2, "debian/changelog: File shrank by"..., 61) = 61
72606 write(2, "\n", 1)                 = 1
72606 close(4)                          = 0
[...]

The read() call returns less bytes than requested and then tar assumed it was
truncated and issues a warning. I traced this call to create.c:dump_regular_file
which indeed doesn't seem to handle this possibility.

Attached are two patches, the first replaces the safe_read call in 
blocking_read with
full_read from gnulib, I have verified that this change fixes the bug.

I can't seem to understand what's the point of distinguishing between
safe_{read,write} and full_{read,write}, is there a situation where not handling
short read/writes would be important? Why not merge these functions?

So finally, since I couldn't find any reason not to, the second patch replaces
all instances of safe_read with full_read to avoid such problems in other areas
of the code. (note: Without much verification!)

Attachment: 0001-Use-full_read-to-avoid-errors-due-to-short-reads.patch
Description: Text Data

Attachment: 0002-Remove-remaining-usages-of-safe_read.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]