From: Marcus Müller
Subject: Re: Experience on how to publish data on Zenodo
Date: Thu, 27 Jul 2023 13:09:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
Now, if your database is really just JSON-style documents, I'd keep that as the exchange format: a directory full of JSON files. Throw it all in a compressed archive¹, or, if the result is still large, convert the JSON to BJSON before compressing².
If things just *happen* to be in JSON, but could actually reasonably be considered tabular data, things look different.
In any case, some datasets I saw on Zenodo also include the scripts (typically R or Python) to convert the data from the archive format to the deployment format (in your case, the elasticsearch-dump link, and maybe a script to run it and its inverse?). I like that, because data without clear guidance on how to read it correctly can, at worst, become useless.
Cheers,
Marcus

¹ Make a .tar and run `zstd -15` on it (GNU tar doesn't let you set zstd compression options through command line options), or run `mksquashfs directory/ archive.squash -comp zstd` if you think the consumers will have Linux, which can directly mount that, or a modern 7zip installation.
² If that helps at all – zstd isn't bad with grammatically strict syntax, and IIRC the LZMA derivatives are actually pretty good at it, so maybe try lzlib or fastlzma2.
On 27.07.23 09:40, Johannes Demel wrote:
Hi everyone,

I'd like to publish some of the data I recorded with my GNU Radio demo on zenodo.org. It is not IQ data but other data that we stored in an elasticsearch database. Maybe some of you know about a suitable data format or a source that discusses best practices.

Right now, I export everything via https://github.com/elasticsearch-dump/elasticsearch-dump

But they use their custom format and change it over time. I doubt it is suitable for archiving purposes.

I'd appreciate hints on how to do that properly.

Cheers
Johannes