help-gnu-emacs

Re: [git] puplic repo url incorrect or server overloaded?


From: Bob Proulx
Subject: Re: [git] puplic repo url incorrect or server overloaded?
Date: Thu, 13 Nov 2014 11:15:28 -0700
User-agent: Mutt/1.5.23 (2014-03-12)

H. Dieter Wilhelm wrote:
>     git://git.sv.gnu.org/emacs
> ...
>       fatal: read error: Connection reset by peer
> ...
> Is this just a sign that the server is overloaded or am I'm doing
> something wrong?

There has been some miscommunication all around.  Let me bend
everyone's ear for a moment and fill in some behind the scenes
background information.

Emacs has made another attempt at converting from bzr to git.  I am
one of the volunteers who help administer the Savannah systems and
have been helping them with this process on the admin side.

Recently esr made another conversion and uploaded 13G of source to the
git repository.  

  vcs:~# du -sh /srv/git/emacs.git
  13G     /srv/git/emacs.git
    
I had set up the initial empty repository and didn't realize that the
recent upload was 13G.  Then it was announced as the new repository.
I didn't know it had been announced that this was the new source
location.  But I did notice that vcs browned out and needed help.  I
rebooted it twice to rescue it.  On two other occasions I had to stop
all of the git daemons.

Think about what happens.  Everyone and their dog tries to download a
fresh copy of the repository.  At 13G each!  The concurrency limit was
40 processes.  Each of those was trying to download 13G.  That would
take hours each.  It starved all other projects of access to their
git repositories.
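
As a rough illustration of the arithmetic above, here is a small
sketch.  The 40 and 13G figures come from the message; the 10 Mbit/s
per-client throughput is purely an assumption picked to show the scale,
not a measured value.

```shell
#!/bin/sh
# Back-of-the-envelope estimate of the load described above.
clones=40        # concurrency limit quoted in the message
repo_gib=13      # size of the uploaded repository, from du -sh
mbit_per_s=10    # ASSUMPTION: hypothetical per-client throughput

# Data in flight if every slot is busy with a fresh clone.
total_gib=$((clones * repo_gib))

# Minutes for one clone: GiB -> MiB -> Mbit, then divide by the rate.
minutes_each=$((repo_gib * 1024 * 8 / mbit_per_s / 60))

echo "in flight with all slots busy: ${total_gib}G"
echo "one clone at ${mbit_per_s} Mbit/s: about ${minutes_each} minutes"
```

Under that assumed rate a single clone holds its slot for roughly
three hours, which is consistent with "hours each".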

Worse, it overloaded the system to the point that it was
inaccessible.  Turns out that it can't handle 40 simultaneous git
downloads of 13G each.  It pushed the system over a tipping point
where existing processes couldn't finish faster than new processes
were started.  The last load average logged was 350 before it stopped
responding at all.

Alerts notified me of the problem.  The server was unresponsive.  The
FSF admins and I were in conference about the problem.  We rebooted
the server to rescue it.  Things seemed okay for a bit.  Then the load
average would creep up again.  We lost it again.  I needed to reboot
it again.  I figured out that it was git that was doing it.  I was
forced to disable git in order to keep the system alive.  I shut down
git and killed all of the git downloads.

Of course initially it wasn't known that it was the new 13G emacs
repository that was the problem.  That became apparent only after
digging into the problem.  The git daemons don't log what they are
doing and all of the projects share the pool.  All I could see was
that git was overloading the system.  I reduced the limits on git
resources.  I added more virtual memory.  I turned overcommit off to
keep the OOM killer from killing critical processes.  I turned git on
again and watched the process list closely.  That is when I figured
out that everyone was downloading emacs.
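
For readers who want to see what that kind of tuning can look like,
here is a dry-run sketch.  It is not the configuration used on vcs:
the message does not say how the git daemons are launched or which
values were chosen, so max_conn below is a hypothetical placeholder.
The vm.overcommit_memory sysctl and the git daemon --max-connections
option are real; everything else is illustrative.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands unless APPLY=1 is set (needs root).
max_conn=15          # HYPOTHETICAL reduced limit (the original was 40)
base_path=/srv/git   # path seen in the du output in this message

run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi
}

# Mode 2 switches heuristic overcommit off: allocations that cannot be
# backed fail up front instead of the OOM killer picking a victim later.
run sysctl -w vm.overcommit_memory=2

# Cap the number of simultaneous git:// clients served by the daemon.
run git daemon --base-path="$base_path" --max-connections="$max_conn" --detach
```

Keeping it as a dry run by default makes it safe to read the output
before touching a live server.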

During this time your emacs git clone, like others, was probably
reset more than once.  The errors you saw were fallout from this.

Then we figured out that the emacs repository was new and 13G in size
and that this was the problem!  I asked esr to repack it and upload it
again.  I moved the 13G repository out of the way to prevent the
continuing problem.  I killed all of the troubled git downloads and
restarted git so that other projects could function again.  esr
repacked the emacs archive and uploaded it again.  The new repacked
repository is now only 200M in size.  As you can imagine that makes a
world of difference!

  vcs:/srv/git/emacs.git# du -sh
  200M    .
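
The message does not say which commands esr used to repack.  As one
plausible illustration only, a full repack with stock git can be
measured like this; repack_report is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report a repository's on-disk size before and after a full
# repack.  NOT necessarily the procedure used for emacs.git.
repack_report() {
    repo=$1
    before=$(du -sk "$repo" | cut -f1)
    # -a: put all reachable objects into a single pack
    # -d: delete the packs and loose objects made redundant by it
    # -f: recompute deltas rather than reusing existing ones
    git -C "$repo" repack -a -d -f -q
    after=$(du -sk "$repo" | cut -f1)
    echo "before=${before}K after=${after}K"
}
```

Usage would be along the lines of "repack_report /srv/git/emacs.git",
run by someone with write access to the repository.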

Everything is now back to operating normally.  I still have the
reduced process limits for git in place, though, because at any time
some other project might do the same thing.

This is all discussed at length in the firehose that is emacs-devel.
Unfortunately I don't have the time to read it right now, so as a note
to others: I am only reacting and helping when people CC me on tasks
from there.

And that is the behind-the-scenes story of yesterday.

Bob