guix-devel
Re: File search


From: Ludovic Courtès
Subject: Re: File search
Date: Tue, 25 Jan 2022 12:15:43 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I also had the idea of making it a package... this way only the people
> who opt to install the database locally would incur the cost (in
> bandwidth).
>
> Perhaps a question for Vagrant: talking about size, is this SQLite
> database file comparable or smaller in size to the apt-file database
> that needs to be downloaded?  With the Debian software catalog being
> about 30% bigger, I'd expect a similarly bigger file size.
>
> If Debian is doing better in terms of database file size, we could look
> at how they're doing it.

As a back-of-the-envelope estimate, here’s the amount of text that needs
to be available in the database:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~/src$ sqlite3 -csv  /tmp/db 'select name,version from packages; 
select name from directories;select name from files;'|wc -c
197689978
ludo@berlin ~/src$ guile -c '(pk (/ 197689978 (expt 2. 20)))'

;;; (188.5318546295166)
ludo@berlin ~/src$ du -h /tmp/db
389M    /tmp/db
--8<---------------cut here---------------end--------------->8---

So roughly, SQLite with this particular schema ends up taking twice as
much space as the raw-text lower bound (389 MiB vs. about 189 MiB).

We can do a bit better (I’m not an expert, so I’m just trying things
naively) by dropping the index and cleaning up the database:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~/src$ cp /tmp/db{,.without-index}
ludo@berlin ~/src$ sqlite3  /tmp/db.without-index
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> drop index IndexFiles;
sqlite> .quit
ludo@berlin ~/src$ du -h /tmp/db.without-index 
389M    /tmp/db.without-index
ludo@berlin ~/src$ sqlite3  /tmp/db.without-index 
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> vacuum;
sqlite> .quit
ludo@berlin ~/src$ du -h /tmp/db.without-index 
290M    /tmp/db.without-index
--8<---------------cut here---------------end--------------->8---
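
The same effect can be reproduced on a toy database.  This is only a
minimal sketch: the single-column `files` table and the generated file
names are made up for illustration (the real schema has more tables and
columns); only the index name `IndexFiles` comes from the session above.
It shows that dropping the index alone leaves the freed pages in the
file, and that `VACUUM` is what actually shrinks it.

```python
import os
import sqlite3
import tempfile


def build_db(path, n_rows=20000):
    """Create a toy database with an indexed `files` table (hypothetical schema)."""
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are controlled explicitly with BEGIN/COMMIT.
    con = sqlite3.connect(path, isolation_level=None)
    con.execute("CREATE TABLE files (name TEXT NOT NULL)")
    con.execute("CREATE INDEX IndexFiles ON files (name)")
    con.execute("BEGIN")
    con.executemany(
        "INSERT INTO files VALUES (?)",
        ((f"/gnu/store/pkg-{i}/bin/tool-{i}",) for i in range(n_rows)),  # made-up names
    )
    con.execute("COMMIT")
    con.close()


def drop_index_and_vacuum(path):
    """Drop the index, then VACUUM; return (size after drop, size after vacuum)."""
    con = sqlite3.connect(path, isolation_level=None)
    con.execute("DROP INDEX IndexFiles")
    after_drop = os.path.getsize(path)  # freed pages are still in the file
    con.execute("VACUUM")               # rewrites the file without free pages
    con.close()
    return after_drop, os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "toy.db")
        build_db(db)
        before = os.path.getsize(db)
        after_drop, after_vacuum = drop_index_and_vacuum(db)
        print("with index:       ", before)
        print("index dropped:    ", after_drop)
        print("dropped + vacuum: ", after_vacuum)
```

The exact sizes depend on the SQLite version and page size, so none are
quoted here; the point is only the ordering of the three numbers.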

With compression:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~/src$ zstd -19 < /tmp/db.without-index > /tmp/db.without-index.zst
ludo@berlin ~/src$ du -h /tmp/db.without-index.zst 
37M     /tmp/db.without-index.zst
--8<---------------cut here---------------end--------------->8---

(Down from 61MB.)  For comparison, this is smaller than guile, perl,
and gtk+, and roughly the same size as glibc:out.

For the record, with compression, the lower bound is about 12 MiB:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~/src$ sqlite3 -csv  /tmp/db 'select name,version from packages; 
select name from directories;select name from files;'|zstd -19|wc -c
12128674
ludo@berlin ~/src$ guile -c '(pk (/ 12128674 (expt 2. 20)))'

;;; (11.566804885864258)
--8<---------------cut here---------------end--------------->8---
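
To put the figures from the sessions above side by side, here is a small
arithmetic sketch.  It assumes that the `M` suffix printed by `du -h`
means MiB; all the numbers are copied from the transcripts, and the
helper name `overhead` is just for illustration.

```python
MIB = 2 ** 20

# Raw text extracted from the database (the lower bound), in MiB.
raw_text_mib = 197689978 / MIB
# Same text compressed with zstd -19, in MiB.
raw_text_zst_mib = 12128674 / MIB

# Sizes reported by `du -h`, assumed to be in MiB.
db_with_index_mib = 389
db_without_index_mib = 290
db_without_index_zst_mib = 37


def overhead(size_mib, lower_bound_mib):
    """Return how many times larger a file is than its lower bound."""
    return size_mib / lower_bound_mib


if __name__ == "__main__":
    print(f"db with index vs. raw text:      {overhead(db_with_index_mib, raw_text_mib):.2f}x")
    print(f"db without index vs. raw text:   {overhead(db_without_index_mib, raw_text_mib):.2f}x")
    print(f"compressed db vs. compressed raw: {overhead(db_without_index_zst_mib, raw_text_zst_mib):.2f}x")
```

So the uncompressed database goes from roughly 2x to roughly 1.5x the
raw text once the index is gone, while the compressed database is still
about 3x the compressed raw text.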

All this to say that we could distribute the database in a form that
gets closer to the optimal size, at the expense of extra processing on
the client side upon reception to put it into shape (creating an index,
etc.).
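
As a rough illustration of what that client-side step could look like,
here is a sketch under several assumptions: the database is shipped
without its index, it is compressed with `lzma` from the Python standard
library as a stand-in for zstd, and the `files` table has a single
`name` column.  The function names (`make_indexless_db`, `install`,
`uses_index`) are hypothetical; nothing like this exists in Guix as far
as this message is concerned.

```python
import lzma
import os
import sqlite3
import tempfile


def make_indexless_db(path, names):
    """Server side: build a toy database without any index (hypothetical schema)."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE files (name TEXT NOT NULL)")
    con.executemany("INSERT INTO files VALUES (?)", ((n,) for n in names))
    con.commit()
    con.close()


def install(compressed, dest):
    """Client side: decompress the received blob, then create the index locally."""
    with open(dest, "wb") as f:
        f.write(lzma.decompress(compressed))  # lzma stands in for zstd here
    con = sqlite3.connect(dest)
    con.execute("CREATE INDEX IF NOT EXISTS IndexFiles ON files (name)")
    con.commit()
    con.close()


def uses_index(db_path):
    """Check whether a lookup by file name is planned through IndexFiles."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM files WHERE name = ?", ("x",)
    ).fetchall()
    con.close()
    # The human-readable plan description is the last column of each row.
    return any("IndexFiles" in row[-1] for row in rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "server.db")
        dst = os.path.join(tmp, "client.db")
        make_indexless_db(src, (f"/bin/tool-{i}" for i in range(1000)))
        with open(src, "rb") as f:
            blob = lzma.compress(f.read())
        install(blob, dst)
        print("lookup uses index:", uses_index(dst))
```

The trade-off is the one described above: fewer bytes on the wire, paid
for with index creation time on each client.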

Ludo’.