
Re: [Monotone-devel] serialization format


From: J Decker
Subject: Re: [Monotone-devel] serialization format
Date: Tue, 5 Apr 2016 20:26:45 -0700

If the structures might change over time, something like JSON is pretty brief.
If you need high reliability, sqlite, for instance, will store a blob
escaping only the NUL byte (as \0) and the backslash (as \\) ...

That results in a copy or shift of the data, but decoding needs only a
simple comparison against '\\'; kind of like base 254, sort of :)
Depending on which character occurs least often, you could swap <nul> for
<del> or something ...
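
To make the escaping idea concrete, here is a minimal Python sketch, assuming the scheme is exactly "NUL becomes \0, backslash becomes \\, everything else is copied through". The function names are hypothetical, and this is only an illustration of the technique, not a description of how sqlite stores blobs internally.

```python
ESC = 0x5C  # backslash, used as the escape byte (assumption from the text above)
NUL = 0x00  # the byte that must not appear in the encoded output


def escape(data: bytes) -> bytes:
    """Encode data so that the result contains no NUL byte."""
    out = bytearray()
    for b in data:
        if b == NUL:
            out += b"\\0"       # NUL -> backslash, '0'
        elif b == ESC:
            out += b"\\\\"      # backslash -> backslash, backslash
        else:
            out.append(b)       # every other byte is copied unchanged
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Reverse escape(); only the backslash needs to be compared against."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b != ESC:
            out.append(b)
            continue
        nxt = next(it, None)    # the byte following the escape byte
        if nxt == ord("0"):
            out.append(NUL)
        elif nxt == ESC:
            out.append(ESC)
        else:
            raise ValueError("invalid escape sequence")
    return bytes(out)
```

The worst case doubles the size (input consisting only of NUL or backslash bytes), while typical binary data grows by well under one percent, which is where the "base 254" comparison comes from.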



On Tue, Apr 5, 2016 at 9:25 AM, Markus Wanner <address@hidden> wrote:
> On 04/04/2016 10:02 PM, Ludovic Brenta wrote:
>> No but they might care about performance.  How much of monotone's time
>> is actually spent translating between binary and hex?  Is this really a
>> major performance bottleneck?
>
> Well, not the conversion between hex and binary itself, no. But the
> effect the serialization format has on hashing.
>
> Let's have a look at some perf samples gathered during a functional test
> run:
>
>> #
>> # Overhead  Shared Object          Symbol
>> # ........  .....................  ........................................
>> #
>>      6.80%  libbotan-1.10.so.1.10  [.] _ZN5Botan12SHA_160_SSE210compress_nEPKhm
>>      3.74%  libc-2.21.so           [.] _int_free
>>      2.60%  libstdc++.so.6.0.21    [.] _ZSt18_Rb_tree_incrementPKSt18_Rb_tree_node_base
>>      2.24%  libstdc++.so.6.0.21    [.] _ZSt29_Rb_tree_insert_and_rebalancebPSt18_Rb_tree_node_baseS0_RS_
>>      1.85%  libc-2.21.so           [.] malloc
>>      1.85%  mtn                    [.] _ZNSt8_Rb_treeIN6option6optionI7optionsEES3_St9_IdentityIS3_ESt4lessIS3_ESaIS3_EE7_M_copyINS9_20_Reuse_or_alloc
>>      1.73%  mtn                    [.] _ZSt11__set_unionISt23_Rb_tree_const_iteratorIN6option6optionI7optionsEEES5_St15insert_iteratorISt3setIS4_St4le
>>      1.66%  ld-2.21.so             [.] do_lookup_x
>>      1.57%  libcrypto.so.1.0.0     [.] DES_encrypt2
>>      1.36%  libc-2.21.so           [.] __memcmp_sse4_1
>>      1.17%  mtn                    [.] _ZNSt8_Rb_treeIN6option6optionI7optionsEES3_St9_IdentityIS3_ESt4lessIS3_ESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS
>>      1.04%  libc-2.21.so           [.] free
>>      1.03%  libc-2.21.so           [.] malloc_consolidate
>>      0.98%  [unknown]              [k] 0xffffffff817f4ca0
>>      0.75%  mtn                    [.] _ZNSt17_Function_handlerIFvP7optionsNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEMS0_FvS7_EE10_M_manage
>>      0.71%  ld-2.21.so             [.] _dl_lookup_symbol_x
>>      0.71%  mtn                    [.] _ZNSt17_Function_handlerIFvP7optionsEMS0_FvvEE10_M_managerERSt9_Any_dataRKS6_St18_Manager_operation
>>      0.67%  libcrypto.so.1.0.0     [.] DES_encrypt1
>>      0.64%  libbotan-1.10.so.1.10  [.] _ZN5Botan16MDx_HashFunction12final_resultEPh
>>      0.64%  libgmp.so.10.2.0       [.] __gmpn_redc_1
>>      0.62%  [unknown]              [k] 0xffffffff811b24fa
>>      0.58%  libc-2.21.so           [.] strlen
>>      0.58%  libbotan-1.10.so.1.10  [.] _ZN5Botan16MDx_HashFunction8add_dataEPKhm
>>      0.57%  [unknown]              [k] 0xffffffff813d3417
> ...
>>      0.06%  libbotan-1.10.so.1.10  [.] _ZN5Botan10hex_decodeEPhPKcmRmb
> ...
>>      0.02%  libbotan-1.10.so.1.10  [.] _ZN5Botan10hex_encodeEPcPKhmb
>
>
> Hashing is probably the single most time-consuming operation here, at
> about 8% of the time spent (note that the add_data and final_result
> methods are within the top 25 as well).
>
> The CPU time that's used for the actual hex encoding and decoding is
> vanishingly small, below 0.1%.
>
>
> Now, I'm clearly not into micro-optimizations (I'd rather consider
> modifications like using base58 instead of the hex encoding for hashes
> presented to the user, an encoding that's certain to consume more CPU
> time, though I'm not sure how much more).
>
> However, reducing the amount of data to be hashed, cached and moved
> around (in memory, over the network, etc.) sounds like a generally good
> idea to me, performance-wise. It's equally clearly a bad idea from a
> usability perspective. So there's a balance. That's why I started this
> thread.
>
> Given the arguments so far I tend towards a binary encoding, as I think
> developers should be able to handle binary data. And if users really
> don't care...
>
> Regards
>
> Markus Wanner
>
>
>
> _______________________________________________
> Monotone-devel mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/monotone-devel
>

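The point in the quoted message about the serialization format affecting hashing can be illustrated with a small Python sketch. It assumes, purely for illustration, a serialized structure that embeds three SHA-1 IDs, once as raw bytes and once as hex text; the variable names and sample inputs are made up and do not reflect monotone's actual formats.

```python
import hashlib

# Three made-up 20-byte SHA-1 IDs standing in for embedded references.
ids = [hashlib.sha1(b"revision %d" % i).digest() for i in range(3)]

# The same references serialized as raw binary and as hex text.
binary_payload = b"".join(ids)
hex_payload = b"".join(d.hex().encode("ascii") for d in ids)

# Hex text is exactly twice as long, so the hash function has to
# process twice as many bytes for the same information.
binary_size = len(binary_payload)
hex_size = len(hex_payload)

# Hashing either form works; only the amount of input differs.
binary_digest = hashlib.sha1(binary_payload).hexdigest()
hex_digest = hashlib.sha1(hex_payload).hexdigest()

print(binary_size, hex_size)
```

The cost of the hex conversion itself is not what this shows; it only shows that every hex-encoded ID doubles the bytes that are later hashed, cached and moved around.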

