Hi,
this issue has been known for several months now - see [0], [1].
The keys used for this are very large (around 30-60 MB). Syncing
them takes significant bandwidth, and indexing/writing them to disk
consumes a lot of CPU and I/O resources. If the addition to the
database fails, the key is added again when another peer
synchronizes it, causing the same load (which results in the large
I/O spikes that you see).
SKS is single-threaded, so any other action is blocked while the
key addition takes place. This has also been known for years, and
the fact that it can easily be used for attacks has been ignored.
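To illustrate the effect, here is a toy single-threaded server in
Python - purely illustrative, not SKS's actual OCaml code - in
which one expensive key merge stalls every other client:

    import socket, time

    def handle(conn):
        # Stand-in for an expensive key merge; a 30-60 MB key can
        # keep the process busy here for a long time.
        time.sleep(60)
        conn.sendall(b"HTTP/1.0 503 Service Unavailable\r\n\r\n")

    srv = socket.socket()
    srv.bind(("127.0.0.1", 11371))
    srv.listen()
    while True:
        conn, _ = srv.accept()  # single thread: the next client is
        handle(conn)            # not accepted until this returns
        conn.close()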
If I remember correctly, the behaviour above is intended, so I
would not expect any fixes in the coming months. There have been
some fixes which exclude some of the bad keys [2] (these might be
included in the Ubuntu/Debian SKS packages, which may be why it
stopped over the last months); however, this only works as long as
nobody generates and uploads a new bad key.
Best Regards,
Moritz
[0]: https://bitbucket.org/skskeyserver/sks-keyserver/issues/61/key-addition-failed-blocks-web-interface
[1]: https://bitbucket.org/skskeyserver/sks-keyserver/issues/60/denial-of-service-via-large-uid-packets
[2]: https://lists.nongnu.org/archive/html/sks-devel/2018-07/msg00053.html
On 13.12.18 at 07:28, Steffen Kaiser wrote:
On Wed, 12 Dec 2018, Todd Fleisher wrote:
> Looks like the issue has spread to other peers as the behavior
> returned
When I started my server initially, it was hit by the same problem
with heavy I/O and CPU consumption. I had to throw more and more
RAM at it. Building up the database from scratch required me to
give it 4 GB or 8 GB of RAM; then I lowered the limit, and it looks
like some keys require the same amount. For my server "the problem
suddenly stopped" as well; I don't know why, except to assume that
the bad keys are included by now.

Wasn't there a thread some months back saying that some index would
be read into memory, or something like that?

Well, it's a strange user experience that users cannot query the
server when it is adding keys and stalling like that.
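For what it's worth, the Berkeley DB cache that backs SKS can be
tuned with a DB_CONFIG file in the KDB and PTree directories. The
values below are only an example (a 512 MB cache), not a
recommendation:

    set_cachesize 0 536870912 1
    set_flags DB_LOG_AUTOREMOVE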
> when reconciling with yet another peer (pgpkeys.co.uk), so I've
> uncommented the previous peer (sks.infcs.de) and will wait for
> someone to advise if there's anything that can be done to reduce
> this extra I/O load (https://imgur.com/a/wHPYGsK)
> -T
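For reference, peers are enabled and disabled by editing the
membership file, one "host port" pair per line (11370 is the
default recon port); commenting a line out, as below, stops
reconciliation with that peer:

    # disabled while it triggers the I/O spikes:
    # pgpkeys.co.uk 11370
    sks.infcs.de 11370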
>> On Dec 11, 2018, at 10:10 AM, Todd Fleisher <address@hidden> wrote:
>>
>> I had gotten things under control after sending this, but starting
>> yesterday it came back when reconciling with a different peer. I
>> commented that peer out for now and things are back to normal.
>>
>> Is anyone else seeing similar behavior? Is there anything that can
>> be done other than pausing reconciliation with peers that bring on
>> the issue?
>>
>> Here is a graph of my I/O during the issue. You can see it drop
>> back to normal immediately after I commented out the problem peer.
>>
>> -T
>>
>>> On Oct 19, 2018, at 11:38 PM, Paul Fawkesley <address@hidden> wrote:
>>>
>>> Hi Todd, for what it's worth, I've been experiencing this too
>>> since March.
>>>
>>> The hangs are so severe that my keyserver would fail to respond
>>> to requests. In order not to provide a poor experience to users
>>> of the pool, I removed myself from it.
>>>
>>> Anecdotally, it appears other keyservers still in the pool are
>>> similarly affected: I experience high rates of timeouts and
>>> failures when using the pool these days.
>>>
>>> I installed Hockeypuck on another server and peered it with my
>>> SKS instance. It syncs successfully, but Hockeypuck *also* goes
>>> nuts periodically while syncing. Its memory and CPU usage rocket,
>>> often pushing into gigabytes of swap space, so that server is
>>> pretty unresponsive too.
>>>
>>> I'm about to arrive at the OpenPGP Email Summit in Brussels; I'm
>>> sure this will come up as a topic, and I shall report back...
>>>
>>> Paul
>>>
>>> On Wed, 10 Oct 2018, at 19:00, Todd Fleisher wrote:
>>>> Hi All,
>>>> I wanted to follow up on this and add some new data points. I
>>>> tried building some new SKS instances based on a more recent
>>>> dump (specifically 2018-10-07 from
>>>> https://keyserver.mattrude.com/dump/) and found those instances
>>>> were plagued by the same issue when I began peering with my
>>>> existing instances. When I re-built the new instances from an
>>>> older dump (specifically 2018-10-01 from the same source), the
>>>> issues went away. This seems to imply that some problematic data
>>>> introduced into the pool during the first week of October is
>>>> causing the issues.
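For anyone reproducing this, the usual recipe for rebuilding an
instance from such a dump looks roughly like the following; the
cache sizes are illustrative and the dump path is a placeholder:

    sks build dump/*.pgp -n 10 -cache 100   # bulk-load keys from the dump
    sks cleandb                             # drop keys that fail sanity checks
    sks pbuild -cache 20 -ptree_cache 70    # build the prefix tree used by recon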
>>>>
>>>> I found an existing issue logged about this behavior at
>>>> https://bitbucket.org/skskeyserver/sks-keyserver/issues/61/key-addition-failed-blocks-web-interface
>>>>
>>>> For now, I'm able to keep my instances stable by building them
>>>> from the earlier 2018-10-01 dump and not adding the second peer
>>>> to my membership file. I would like to better understand why this
>>>> is happening and figure out how to fix it, in part so I can begin
>>>> peering with more servers to improve the mesh.
>>>>
>>>> -T
>>>>
>>>>> On Oct 8, 2018, at 1:54 PM, Todd Fleisher <address@hidden> wrote:
>>>>>
>>>>> Hi All,
>>>>> I recently joined the pool and started having an issue after
>>>>> adding a second external peer to my membership file. The
>>>>> symptoms are abnormally high I/O load on the disk whenever my
>>>>> server tries to reconcile with the second peer (149.28.198.86),
>>>>> ending with a failure message "add_keys_merge failed:
>>>>> Eventloop.SigAlarm". It consistently tries to reconcile a large
>>>>> number of keys (100) when this happens. I've read previous list
>>>>> threads about this message (e.g.
>>>>> https://lists.nongnu.org/archive/html/sks-devel/2018-06/msg00051.html
>>>>> and
>>>>> https://lists.nongnu.org/archive/html/sks-devel/2011-06/msg00077.html)
>>>>> which mention the cause potentially being a large key that fails
>>>>> to be processed. I tried increasing my client_max_body_size from
>>>>> 8m -> 32m in NGINX, but the issue persisted. For now, I have
>>>>> removed the second peer from my membership file to keep from
>>>>> over-taxing my server with no apparent benefit. I have included
>>>>> an excerpt of my logs showing the behavior. Can someone please
>>>>> advise what might be causing this issue and what can be done to
>>>>> resolve it? Thanks in advance.
>>>>>
>>>>> <skslog2.txt>
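For context, client_max_body_size belongs in the NGINX server (or
location) block that fronts SKS. A minimal sketch, assuming the
common setup where NGINX listens on the public HKP port 11371 and
proxies to a local SKS instance on 11372 (ports are illustrative):

    server {
        listen 11371;
        client_max_body_size 32m;  # raised from 8m as described above
        location / {
            proxy_pass http://127.0.0.1:11372;  # local SKS HKP port
        }
    }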
-- Steffen Kaiser