From: Mathieu Othacehe
Subject: [bug#50040] [PATCH 0/2] publish: Always render nar/narinfo during backing.
Date: Fri, 17 Sep 2021 17:27:04 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hey Ludo,
> The ‘mumi worker’ process also seems to be doing a lot of I/O and using
> quite a bit of CPU (presumably allocating a lot). Its ‘perf’ profile is
> like this:
>
> 2.62% .mumi-real libguile-3.0.so.1.1.1 [.] scm_ilength
> 1.20% .mumi-real libguile-3.0.so.1.1.1 [.] scm_getc
> 1.12% .mumi-real libgc.so.1.4.3 [.] GC_add_to_black_list_normal.part.4
> 1.06% .mumi-real libgc.so.1.4.3 [.] GC_build_fl
> 0.98% .mumi-real libgc.so.1.4.3 [.] GC_reclaim_clear
> 0.97% .mumi-real libguile-3.0.so.1.1.1 [.] get_callee_vcode
> 0.91% .mumi-real libgc.so.1.4.3 [.] GC_generic_malloc_many
> 0.90% .mumi-real libguile-3.0.so.1.1.1 [.] peek_byte_or_eof
> 0.78% .mumi-real [JIT] tid 62601 [.] 0x00007f886964804d
>
> Food for thought…
Yep, interesting findings, thanks for sharing. I just discovered that
the Nginx server logs the errors we are interested in to
/var/log/nginx/error.log:
--8<---------------cut here---------------start------------->8---
2021/09/17 14:13:37 [error] 129925#0: *3727055 upstream timed out
(110: Connection timed out) while reading response header from upstream,
client: XXX, server: ci.guix.gnu.org,
request: "GET /3m1j4ddx11prp6azw3rjdhljg5vchf1s.narinfo HTTP/1.1",
upstream: "http://127.0.0.1:3000/3m1j4ddx11prp6azw3rjdhljg5vchf1s.narinfo",
host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3727020 upstream timed out
(110: Connection timed out) while reading response header from upstream,
client: XXX, server: ci.guix.gnu.org,
request: "GET /pdhr7fgql3qm1x24yxkj4c2f6s7vffi5.narinfo HTTP/1.1",
upstream: "http://127.0.0.1:3000/pdhr7fgql3qm1x24yxkj4c2f6s7vffi5.narinfo",
host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3726717 upstream timed out
(110: Connection timed out) while reading response header from upstream,
client: XXX, server: ci.guix.gnu.org,
request: "GET /19gv6sqqcmn8q020bpm9jyl6n9bjpg05.narinfo HTTP/1.1",
upstream: "http://127.0.0.1:3000/19gv6sqqcmn8q020bpm9jyl6n9bjpg05.narinfo",
host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3725361 upstream timed out
(110: Connection timed out) while reading response header from upstream,
client: 141.80.167.169, server: ci.guix.gnu.org,
request: "GET /y8dp69gky4dn5vnn7wnx04j20q0yrdja.narinfo HTTP/1.1",
upstream: "http://127.0.0.1:3000/y8dp69gky4dn5vnn7wnx04j20q0yrdja.narinfo",
host: "141.80.167.131"
--8<---------------cut here---------------end--------------->8---
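To get a rough idea of how widespread this is, a quick Guile sketch
along these lines could count the timed-out requests per client
(assuming the stock error.log format shown above; nothing mumi- or
publish-specific here):
--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 rdelim)
             (ice-9 regex))

;; Count "upstream timed out" lines per client address; the log file
;; path and format are the ones from the excerpt above.
(define (count-timeouts file)
  (let ((counts (make-hash-table)))
    (call-with-input-file file
      (lambda (port)
        (let loop ((line (read-line port)))
          (unless (eof-object? line)
            (when (string-contains line "upstream timed out")
              (let ((m (string-match "client: ([^,]+)," line)))
                (when m
                  (let ((client (match:substring m 1)))
                    (hash-set! counts client
                               (1+ (or (hash-ref counts client) 0)))))))
            (loop (read-line port))))))
    (hash-for-each (lambda (client n)
                     (format #t "~a: ~a timeouts~%" client n))
                   counts)))

(count-timeouts "/var/log/nginx/error.log")
--8<---------------cut here---------------end--------------->8---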
Even with the following diff applied, which extends the timeouts to
10 seconds, we still see a lot of those errors. Some occur on the
build farm (141.80.167.X), others directly on user machines.
--8<---------------cut here---------------start------------->8---
diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 44ff28e..7a085e5 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -97,9 +97,9 @@
 ;; Do not tolerate slowness of hydra.gnu.org when fetching
 ;; narinfos: better return 504 quickly than wait forever.
- "proxy_connect_timeout 2s;"
- "proxy_read_timeout 2s;"
- "proxy_send_timeout 2s;"
+ "proxy_connect_timeout 10s;"
+ "proxy_read_timeout 10s;"
+ "proxy_send_timeout 10s;"
--8<---------------cut here---------------end--------------->8---
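As a quick check of whether 'guix publish' itself is the slow party,
something like this sketch could time a narinfo request directly
against port 3000, bypassing Nginx (the hash is copied from the log
excerpt above):
--8<---------------cut here---------------start------------->8---
(use-modules (web client)
             (web response)
             (web uri)
             (ice-9 format))

;; Time a single narinfo request against the publish server itself.
(let ((start (get-internal-real-time)))
  (call-with-values
      (lambda ()
        (http-get (string->uri
                   "http://127.0.0.1:3000/3m1j4ddx11prp6azw3rjdhljg5vchf1s.narinfo")))
    (lambda (response body)
      (format #t "status: ~a, elapsed: ~,2f s~%"
              (response-code response)
              (exact->inexact
               (/ (- (get-internal-real-time) start)
                  internal-time-units-per-second))))))
--8<---------------cut here---------------end--------------->8---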
I'll see if I can reproduce the I/O pressure on demand, so that I can
run more accurate strace/perf investigations.
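On the Scheme side, Guile's bundled statprof module could complement
perf with a Scheme-level profile. A minimal sketch, where
handle-request is a hypothetical stand-in for mumi's hot request path:
--8<---------------cut here---------------start------------->8---
(use-modules (statprof))

;; Hypothetical stand-in for the hot request path; builds and then
;; measures a long list, the kind of work where scm_ilength shows up.
(define (handle-request)
  (let loop ((i 0) (acc '()))
    (if (< i 100000)
        (loop (+ i 1) (cons i acc))
        (length acc))))

;; Sample at 100 Hz and count calls while running the thunk.
(statprof (lambda ()
            (handle-request))
          #:hz 100
          #:count-calls? #t)
--8<---------------cut here---------------end--------------->8---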
Thanks,
Mathieu