Re: celery connection problems to RabbitMQ
From: ayleph
Subject: Re: celery connection problems to RabbitMQ
Date: Mon, 11 May 2020 19:37:07 +0000 (UTC)
Hi Fernando,
I'm so glad you brought this up. I think I've had the same issues with Celery
for months. It's been so bad that I had to disable uploads on goblinrefuge.com
because every time I restarted Celery it would mark a lot of uploads as failed,
which required too much manual intervention to fix.
I don't have any useful debug information to contribute. When I switched from
flup/fcgi to uwsgi, I lost a lot of my error logging and haven't been able to
figure out how to get it back.
May 11, 2020 03:44:42 Ben Sturmfels <address@hidden>:
> Hi Fernando,
>
> Please post a patch or a link to a remote branch to an issue on the
> issue tracker - ideally a separate change for celery and spectrograms.
>
> Regarding PySoundFile, the file you're after is setup.py.
>
> For what it's worth, I see that the Python SoundFile library is
> available in Debian, but PySoundFile doesn't appear to be. This isn't a
> complete showstopper, but it would help us when tackling distro
> packaging in the near future.
>
> Regards,
> Ben
>
> On Mon, 11 May 2020, Fernando Gutierrez wrote:
>
> > Hi Ben
> >
> > Sorry, I think I didn't explain clearly. I only fixed the connection reset
> > exceptions in celery, but the bug where media is changed to the failed
> > state is not fixed.
> >
> > I will continue debugging but it may take some time. I don't know
> > why celery thinks a completed task needs to be run again.
> >
> > In the meantime I will submit a patch for the systemd file, the
> > BROKER_HEARTBEAT issue and also a fix for the audio spectrogram
> > code as I mentioned in the IRC channel.
> >
> > I have a couple of questions:
> >
> > 1) I'm not familiar with the development process. I already created an
> > account in savannah.gnu.org but I don't see how to submit a patch
> > for review.
> >
> > 2) For the spectrogram I used the PySoundFile package. What file do
> > I need to modify so it gets pulled during setup? In my setup I
> > manually called ./bin/pip install PySoundFile
> >
> > Thanks
> > Fernando
> >
> > On Sun, May 10, 2020 at 6:39 AM Ben Sturmfels <address@hidden> wrote:
> >
> > Hi Fernando,
> >
> > On Sun, 10 May 2020, Fernando Gutierrez wrote:
> >
> > > I recently asked in the IRC channel about RabbitMQ connection reset
> > > errors in celeryd logs.
> > >
> > > I think there are two issues:
> > >
> > > 1) The example systemd file (mediagoblin-celeryd.service) from
> > > https://mediagoblin.readthedocs.io/en/stable/siteadmin/deploying.html
> > > does not specify that celeryd must be started after RabbitMQ, so it is
> > > sometimes started before and fails because RabbitMQ is not running
> > > yet.
> > >
> > > 2) In mediagoblin/mediagoblin/init/celery/__init__.py, it sets
> > > celery_settings['BROKER_HEARTBEAT'] = 1. In slower systems or
> > > under heavy load if the worker is too slow to respond in < 1 second it
> > > will miss the heartbeat and after a few missed heartbeats the
> > > connection is considered dead and reset.
> > > I'm not sure what is the purpose of changing BROKER_HEARTBEAT to
> > > 1 but the celery docs recommend not using such a small value. In my
> > > install I changed it to 20 and I no longer see any connection
> > > problems.
> > >
> > > Are you willing to accept a patch for
> > > mediagoblin/docs/source/siteadmin/deployment.rst and
> > > mediagoblin/mediagoblin/init/celery/__init__.py to fix those two
> > > problems?
> >
> > Thank you very much for diving in and investigating the issue. We'd be
> > happy to take a patch on this. If you can add a comment to explain the
> > new BROKER_HEARTBEAT value in the code, that would be great.
> >
> > I wonder if there might still be a problem lurking here though,
> > even if your system is now working properly. Not being able to connect to
> > RabbitMQ or an unresponsive celery worker probably shouldn't change
> > existing processed media items to failed.
> >
> > Thanks for your work on this!
> >
> > Regards,
> > Ben
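
For reference, the BROKER_HEARTBEAT change discussed in the quoted thread
might look roughly like the sketch below. This is only an illustration, not
the patch Fernando submitted: the helper name with_heartbeat and the constant
SUGGESTED_BROKER_HEARTBEAT are made up here, and the only details taken from
the thread are the setting name, the original value of 1 and the value of 20
that Fernando reported working on his install.

```python
# Value (in seconds) that Fernando reported using on his install.
# The thread says the existing code sets BROKER_HEARTBEAT to 1.
SUGGESTED_BROKER_HEARTBEAT = 20


def with_heartbeat(celery_settings, seconds=SUGGESTED_BROKER_HEARTBEAT):
    """Return a copy of celery_settings with BROKER_HEARTBEAT set.

    Hypothetical helper for illustration only; it is not part of
    MediaGoblin or Celery.
    """
    if seconds < 0:
        raise ValueError("heartbeat interval must not be negative")
    updated = dict(celery_settings)  # leave the caller's dict untouched
    # A larger interval gives a slow or heavily loaded worker more time to
    # answer before the connection is considered dead and reset.
    updated['BROKER_HEARTBEAT'] = seconds
    return updated


old_settings = {'BROKER_HEARTBEAT': 1}
new_settings = with_heartbeat(old_settings)
print(old_settings['BROKER_HEARTBEAT'], '->', new_settings['BROKER_HEARTBEAT'])
```

Keeping the value in a named constant with a comment next to it is one way
to cover Ben's request that the new value be explained in the code.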