
Re: Experience on how to publish data on Zenodo


From: Marcus Müller
Subject: Re: Experience on how to publish data on Zenodo
Date: Thu, 27 Jul 2023 13:09:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

Can you tell us what kind of documents you stored in the elasticsearch database? I'm not familiar with using elasticsearch myself, but as far as my understanding goes, it's a full-text index for JSON-style hierarchical data, so I guess it does both full-text search and structured retrieval.

Now, if your database really is just JSON-style documents, I'd keep that as the exchange format: a directory full of JSON files. Throw it all into a compressed archive¹, or, if that is still of significant size, first convert the JSON to BJSON prior to compression².
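
For the "directory full of JSON files" route, an export could be as small as this sketch (untested; it assumes the official `elasticsearch` Python client, a node at http://localhost:9200 and an index called "demo-data", all of which you'd swap for your actual setup):

    import json
    import pathlib

    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch("http://localhost:9200")
    out_dir = pathlib.Path("export/demo-data")
    out_dir.mkdir(parents=True, exist_ok=True)

    # helpers.scan() pages through every hit of the index for us.
    for hit in helpers.scan(client, index="demo-data",
                            query={"query": {"match_all": {}}}):
        # One pretty-printed JSON file per document, named after its id.
        path = out_dir / f"{hit['_id']}.json"
        path.write_text(json.dumps(hit["_source"], indent=2, sort_keys=True))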

If things just *happen* to be in JSON, but could actually reasonably be considered tabular data, things look different: then a plain tabular format (CSV, say) is probably the better archive.
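
For illustration, flattening such records into one CSV table could look like this (untested, and it naively takes the union of all keys as the header; it assumes flat records and the per-document JSON files from the export sketch above):

    import csv
    import json
    import pathlib

    # Load the per-document JSON files written by the export sketch.
    records = [json.loads(p.read_text())
               for p in sorted(pathlib.Path("export/demo-data").glob("*.json"))]
    # Naive schema discovery: the union of all keys becomes the CSV header.
    fieldnames = sorted({key for record in records for key in record})

    with open("demo-data.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)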

In any case, some datasets I saw on Zenodo also include the (typically R or Python) scripts to convert the data from the archive format to the deployment format (in your case, the elasticsearch-dump link and maybe a script to run it and its inverse?). I like that, because data without clear guidance on how to read it correctly can, at worst, become useless.
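
The "inverse" script could be this small (untested, same assumptions as the export sketch: official Python client, local node, index "demo-data", one JSON file per document):

    import json
    import pathlib

    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch("http://localhost:9200")

    def actions():
        # One bulk action per archived file; the file name doubles as the
        # document id, mirroring the export sketch.
        for path in pathlib.Path("export/demo-data").glob("*.json"):
            yield {
                "_index": "demo-data",
                "_id": path.stem,
                "_source": json.loads(path.read_text()),
            }

    helpers.bulk(client, actions())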

Cheers,
Marcus


¹ Make a .tar and run `zstd -15` on that (GNU tar's `--zstd` flag doesn't let you pick a compression level), or run `mksquashfs directory/ archive.squash -comp zstd` if you think the consumers will have Linux (which can mount that directly) or a modern 7zip installation.
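
The tar-then-zstd variant, scripted so it can ship with the dataset (untested; it assumes the `zstd` binary is on the PATH and the export directory from the sketches above):

    import subprocess
    import tarfile

    # Plain, uncompressed tar of the export directory ...
    with tarfile.open("demo-data.tar", "w") as archive:
        archive.add("export/demo-data", arcname="demo-data")

    # ... then let zstd do the compression at the level we actually want.
    # --rm deletes the intermediate .tar once compression succeeded.
    subprocess.run(["zstd", "-15", "--rm", "demo-data.tar"], check=True)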

² If that is any better at all: zstd isn't bad at compressing grammatically strict syntax, and IIRC LZMA derivatives are actually pretty good at it, so maybe also try lzlib or fastlzma2.


On 27.07.23 09:40, Johannes Demel wrote:
Hi everyone,

I'd like to publish some of the data I recorded with my GNU Radio demo on zenodo.org. It is not IQ data but other data that we stored in an elasticsearch database. Maybe some of you know about a suitable data format or a source that discusses best practices. Right now, I export everything via
https://github.com/elasticsearch-dump/elasticsearch-dump
But it uses its own custom format and changes it over time, so I doubt it is suitable for archiving purposes.

I'd appreciate hints on how to do that properly.

Cheers
Johannes



