Re: Excessive file system usage
From: Dave Trollope
Subject: Re: Excessive file system usage
Date: Wed, 4 Dec 2019 09:24:47 -0600
Hi Alan,
Sorry, yes, I forgot to mention this is Linux: Debian GNU/Linux 9
Linux e1e6db1d8408 4.9.184-linuxkit #1 SMP Tue Jul 2 22:58:16 UTC 2019 x86_64 GNU/Linux
I’ve reproduced this behavior in Kubernetes and outside Kubernetes in a raw
Docker container, so it's not Kubernetes-specific, but it may be related to the
way the containerized image is built in Docker.
We haven’t observed this on our standard EC2 instances, but to be honest we
haven’t monitored them in the same way. We have enough space there that it
could have gone unnoticed. I will try that and see.
What I'm doing is watching the filesystem as the SAVE TRANSLATE command is
running, using watch -n 0.5 "df -H; ls -ltr /tmp"
The only file being written is the CSV, but the filesystem's free space is
dropping at a much higher rate than data is being written. No other temp files
are being placed in /tmp.
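
The same observation can be scripted so the numbers can be compared side by
side. This is only a minimal sketch, not something from our setup: the function
names `sample` and `monitor` are made up for illustration, and it assumes a
Linux filesystem where `st_blocks` is counted in 512-byte units. It records how
much the filesystem's used space has grown alongside the CSV's apparent size
and the space actually allocated to it.

```python
import os
import shutil
import time


def sample(fs_path, out_file):
    """Return (filesystem bytes used, file apparent size, file bytes allocated)."""
    used = shutil.disk_usage(fs_path).used
    try:
        st = os.stat(out_file)
    except FileNotFoundError:
        # The output file has not been created yet.
        return used, 0, 0
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    return used, st.st_size, st.st_blocks * 512


def monitor(fs_path, out_file, interval=0.5, samples=20):
    """Collect (growth in fs used bytes, file size, file allocated bytes) rows."""
    base_used, _, _ = sample(fs_path, out_file)
    rows = []
    for _ in range(samples):
        used, size, allocated = sample(fs_path, out_file)
        rows.append((used - base_used, size, allocated))
        time.sleep(interval)
    return rows
```

Calling something like `monitor("/tmp", "/tmp/out.csv")` (a hypothetical output
path) while SAVE TRANSLATE runs would show whether the growth in used space
tracks the CSV itself or comes from somewhere else on the filesystem.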
I also reproduced this using a RAM-based fs. If you watch the usage it behaves
the same, so I don't think it's specific to dockerized filesystems, but I might
yet be wrong on that.
The link you shared describes a common problem when starting out with
containers, where the build process creates lots of images. As you build lots
of images, you have to clean up. It's one of the first things you learn as you
step into the container world!
Appreciate the quick reply. It certainly was a shocking observation when I
found it :-)
Cheers
Dave
On Dec 4, 2019, 8:29 AM -0600, Alan Mead <address@hidden>, wrote:
> Wow, that's a lot. Do you mean that 7GB of space are needed (for, I guess
> temporary files)? And you did not observe that previously?
>
> Maybe the devs are familiar with kubernetes; I only know the name. Can you
> describe the environment (e.g., OS)? And pspp version? How many conversions
> have you observed this behavior?
>
> And you're sure this isn't a kubernetes problem (like it's making snapshots
> as it writes the file or something)? I ask because when I google about this,
> it looks like there are sharp edges; glancing through, these don't seem to
> directly and specifically address the behavior you're seeing, but it looks
> like there could be these kinds of issues with kubernetes and the PSPP devs
> wouldn't be able to help unless they knew kubernetes:
>
> https://cntnr.io/whats-eating-my-disk-docker-system-commands-explained-d778178f96f1
> https://softwareengineeringdaily.com/2019/01/11/why-is-storage-on-kubernetes-is-so-hard/
>
> -Alan
>
>
> On 12/4/2019 6:40 AM, Dave Trollope wrote:
> > We just moved Pspp to Kubernetes containers where we use it to extract csvs
> > from sav files. The sav files are about 1gb and each csv is about 150mb.
> >
> > We’ve watched the file system as it does it and over 7gb of the file system
> > is used while writing 150mb. I assume the SAVE command is doing lots of
> > seeks and insertions in the file magnifying the file system usage. Any
> > options to limit this behavior?
> >
> > Here is the script we are using
> > GET FILE = "{}"
> >
> > SAVE TRANSLATE
> > /OUTFILE="{}"
> > /TYPE=CSV
> > /FIELDNAMES
> > /REPLACE
> > /KEEP={}
> > /MISSING=RECODE
> > /CELLS=LABELS.
> > Cheers
> > Dave
> >
>
> --
>
> Alan D. Mead, Ph.D.
> President, Talent Algorithms Inc.
>
> science + technology = better workers
>
> http://www.alanmead.org
>
> The irony of this ... is that the Internet is
> both almost-infinitely expandable, while at the
> same time constrained within its own pre-defined
> box. And if that makes no sense to you, just
> reflect on the existence of Facebook. We have
> the vastness of the internet and yet billions
> of people decided to spend most of their time
> within a horribly designed, fake-news emporium
> of a website that sucks every possible piece of
> personal information out of you so it can sell it
> to others. And they see nothing wrong with that.
>
> -- Kieren McCarthy, commenting on why we are not
> all using IPv6