guix-devel

Re: How I got stuck (and ways to resolve)


From: Gábor Boskovits
Subject: Re: How I got stuck (and ways to resolve)
Date: Sun, 23 Dec 2018 23:24:50 +0100

Hello,

Björn Höfling <address@hidden> wrote (on Thu, 20 Dec 2018 at 16:03):
>
> Hi Guix,
>
> the task for mentors in this Outreachy week #3 is to give an
> example of how one got stuck in a problem, and show ways to go on and
> finally succeed with your problem.
>

Thanks for taking the initiative here!

> So, I write a bit about my problems and encourage everyone else to tell
> their story of small or big stucks.
>
> Our intern in this round is doing video documentation, so her tasks are
> a bit different than "normal" contributions, anyway I will report about
> packaging problems as this is what I mostly did for Guix and where I
> really got stuck and I think everyone of us gets stuck quickly with
> this.
>

Yes, you are most probably right. I also often get stuck in packaging.
Most of the time I feel I know next to nothing when I encounter some
software that is radically new to me. Go packages were one such example.

> When packaging a new software for Guix, I start with documenting the
> process: I create a new MarkDown file (I will be very sloppy with the
> syntax), write the date and package I want to pack and start. Mostly,
> this is "write-only": I just write down my activity, my problems and
> the solution. I rarely read that again. But it helps me structure the
> task and continue on when I put it aside for a while.
>

I do something similar: I usually keep an org file, so that I can
organize my tasks as a todo list. I also have a template with the items
that need to be done every time, for example:
- check for bundled software
- check for reproducibility problems using --rounds=2 or similar
- run guix lint
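
The two automatable items of that checklist can be sketched as a small Python helper. This is only an illustration: the function name `checklist_commands` is hypothetical, and checking for bundled software still has to be done by hand. The commands themselves (`guix build --rounds=N` and `guix lint`) are the ones named above.

```python
import shlex


def checklist_commands(package, rounds=2):
    """Return the guix command lines for the recurring checklist items.

    `package` is the package name as guix knows it; `rounds` is how many
    times the build is repeated so that the results can be compared.
    """
    if rounds < 2:
        # A single build has nothing to be compared against.
        raise ValueError("need at least two rounds to compare builds")
    return [
        ["guix", "build", f"--rounds={rounds}", package],
        ["guix", "lint", package],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in checklist_commands("opencv"):
        print(shlex.join(command))
```

Printing instead of executing is a deliberate choice here, since a full rebuild can take hours.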

> In general, my problem is that I know nothing (or: always too little)
> about makefiles, about C and C++, about CMake, Python, Qt, Ruby, and
> all the rest: I'm a Java expert and I know nothing about all these
> "strange" languages and their "brainfucked" error messages.
>

I am quite into C++ and like it very much, but some error messages can be
really cryptic. When some template-related error surfaces from deep inside
a template metaprogramming library, that is really bad.

I am packaging OpenJDK11 now, and it took me a while (at least a few hours)
to realize why I could not unbundle libpng. I had to look at the code of the
build system and find the check; only then did I realize that it uses
pkg-config, which I had failed to provide as an input. After that everything
worked fine.

> So, one of the first "solutions" is to keep calm and read the error
> message. Read it again. Try to parse it. What is the problem? From which
> program/compiler/tool did it come? From which dependency? Is THIS
> really the problem, or is it caused by something ELSE somewhere above? In
> which build phase did it occur (configure, compile, test, ...)?
>
> Let's dig into one from opencv, don't look too much into the details
> of the error message:
>
> ---8<---unaltered-citation::start--8<-------------------------------
>
> [...]
>
> Now it compiles.
>
> But then I stumble over a nasty build error:
>
> ```
> cd /tmp/guix-build-opencv-3.4.0.drv-0/build/modules/video && /gnu/store/5sv5zy2kgg6iaqyv8zw49w4243j0xkd0-gcc-5.4.0/bin/c++ -DCVAPI_EXPORTS -D_USE_MATH_DEFINES -D__OPENCV_BUILD=1 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/tmp/guix-build-opencv-3.4.0.drv-0/build -I/tmp/guix-build-opencv-3.4.0.drv-0/opencv-3.4.0/modules/video/include -I/tmp/guix-build-opencv-3.4.0.drv-0/opencv-3.4.0/modules/video/src -I/tmp/guix-build-opencv-3.4.0.drv-0/build/modules/video -I/tmp/guix-build-opencv-3.4.0.drv-0/opencv-3.4.0/modules/core/include -I/tmp/guix-build-opencv-3.4.0.drv-0/opencv-3.4.0/modules/imgproc/include -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O2 -g -DNDEBUG -fPIC -Winvalid-pch -include "/tmp/guix-build-opencv-3.4.0.drv-0/build/modules/video/precomp.hpp" -o CMakeFiles/opencv_video.dir/src/ecc.cpp.o -c /tmp/guix-build-opencv-3.4.0.drv-0/opencv-3.4.0/modules/video/src/ecc.cpp
> In file included from /tmp/guix-build-opencv-3.4.0.drv-0/opencv-3.4.0/modules/imgcodecs/src/grfmt_exr.hpp:52:0,
>                  from /tmp/guix-build-opencv-3.4.0.drv-0/opencv-3.4.0/modules/imgcodecs/src/grfmts.hpp:53,
>                  from /tmp/guix-build-opencv-3.4.0.drv-0/opencv-3.4.0/modules/imgcodecs/src/loadsave.cpp:47:
> /gnu/store/kikj95f44ygrp3fapd1yybykxl167i0l-openexr-2.2.1/include/OpenEXR/ImfChromaticities.h:46:22: fatal error: ImathVec.h: No such file or directory
> compilation terminated.
> make[2]: *** [modules/imgcodecs/CMakeFiles/opencv_imgcodecs.dir/build.make:66: modules/imgcodecs/CMakeFiles/opencv_imgcodecs.dir/src/loadsave.cpp.o] Error 1
> make[2]: Leaving directory '/tmp/guix-build-opencv-3.4.0.drv-0/build'
> make[1]: *** [CMakeFiles/Makefile2:4057: modules/imgcodecs/CMakeFiles/opencv_imgcodecs.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs....
>
> ```
>
> Bug reports maybe related:
>
> * https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865862
> * https://github.com/ampas/CTL/issues/15
>
> But somehow not.
>
> While I was executing `find /gnu/store -name ImathVec.h`, I had that idea:
> Look again at the Guix definition of *openexr*. And yes: There is also that
> package `ilmbase`.
> And here we go:
>
> ```
> ./6knqzzds4fp6y4qrzyg2ppx6qmiil7jr-ilmbase-2.2.1/include/OpenEXR/ImathVec.h
> ```
>
> So, let's add that to the inputs, too!
>
> Failed again. Would have been too nice to work.
>

I also often struggle like this: rebuilding a package from the beginning
to test some little modification, only to see it fail again. Before the R-B
Summit we discussed that it would be nice to have a kind of "package
development environment", where we could snapshot build states after
phases, modify the definition, re-run from a given phase, and so on.
That could significantly reduce packaging time.

> I found something in graphics.scm: blender has also a workaround for it:
>
> ```
>        #:phases
>        (modify-phases %standard-phases
>          (add-after 'set-paths 'add-ilmbase-include-path
>            (lambda* (#:key inputs #:allow-other-keys)
>              ;; OpenEXR propagates ilmbase, but its include files do not appear
>              ;; in the CPATH, so we need to add "$ilmbase/include/OpenEXR/" to
>              ;; the CPATH to satisfy the dependency on "ImathVec.h".
>              (setenv "CPATH"
>                      (string-append (assoc-ref inputs "ilmbase")
>                                     "/include/OpenEXR"
>                                     ":" (or (getenv "CPATH") "")))
>              #t)))))
> ```
>
>
> Yes, that's it.
>
> Going to the next error:
>
> ---8<---unaltered-citation::end--8<-------------------------------
>
> * This was during the compile phase
> * You see this "Error 2", which is not the error.
> * The error is above "Error 1", but if you compile in parallel, there
> could be some garbage in between. Be aware to always search up for
> "Error 1".
> * Some header file could not be found.
> * I didn't know what to do.
> * So I googled for it.
> * I found some reports which were not too helpful
> * Out of desperation, I brute-force searched the store for it.
> * I somehow had the idea to look closer at/around package "openexr" and
> found "ilmbase" as the root package.
> * With that hint, I just added that dependency too.
> * OK, it would have been too nice if it worked directly...
> * I searched around and could find another package using it that
> prepared a CPATH. Copy&paste solved that problem finally.
> * Surprise, the very next problem is directly coming!
>
>
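
A minimal sketch of the "search up for Error 1" advice from the list above, in Python. The function name and the regular expressions are my own assumptions about what make and gcc output usually looks like; this is not part of any Guix tooling:

```python
import re

# make reports a failed recipe as "make[N]: *** [target] Error K".
# The first "Error 1" is the failing compile; "Error 2" lines are parents
# that merely propagate it.
MAKE_ERROR_1 = re.compile(r"^make(\[\d+\])?: \*\*\* .*Error 1\s*$")
# Compiler diagnostics such as "file.h:46:22: fatal error: ...".
# "-Werror=..." flags on the command line do not match because of the colon.
DIAGNOSTIC = re.compile(r"\berror:")


def find_root_error(log_lines):
    """Locate the first make 'Error 1' and the compiler error above it.

    Returns a tuple (make_index, diagnostic_index) of 0-based line indexes,
    with diagnostic_index set to None if no diagnostic precedes the make
    error, or None if the log contains no 'Error 1' at all.
    """
    for i, line in enumerate(log_lines):
        if MAKE_ERROR_1.search(line):
            # Walk upwards: with parallel builds, unrelated output may sit
            # between the diagnostic and the make error.
            for j in range(i - 1, -1, -1):
                if DIAGNOSTIC.search(log_lines[j]):
                    return i, j
            return i, None
    return None
```

Applied to a log like the opencv one quoted earlier, this would skip the "Error 2" line and point at the make[2] "Error 1" line together with the "fatal error: ImathVec.h" line before it.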
> What is always annoying are failing tests. What's even more annoying is
> when you notice that they fail on upstream too and nobody cares.
>

Another very annoying type of test failure is when a test suite fails
nondeterministically. Sometimes you literally have to run the build with
--rounds=100 or so to reproduce it, and you have to use --keep-failed to be
able to look at the test log. :(

> Here is one example out of my log:
>
> ---8<---unaltered-citation::start--8<-------------------------------
>
>
> I found out that opencv is using Google Test framework.
> And you can disable individual tests by adding "DISABLED_" in front
> of the test name
>
> https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md
>
> chapter: Temporarily Disabling Tests
> ```
> For example, the following tests won't be run by Google Test, even though 
> they will still be compiled:
>
>
> // Tests that Foo does Abc.
> TEST(FooTest, DISABLED_DoesAbc) { ... }
> ```
>
> ---8<---unaltered-citation::end--8<-------------------------------
>
> So, here OpenCV is using a specific test framework and I first had to
> find out how to disable tests in it.
>
> In the final package definition, this introduced a new package phase
> that disables the tests with a substitute* construct to add the
> "DISABLED_" macro in the C file.
>
>
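
To illustrate the text rewrite such a phase performs, here is a small Python sketch. In the package itself this is done with substitute* as described above; the helper name `disable_gtest` and the regular expression are assumptions of mine, and the test names are the ones from the Google Test documentation excerpt, not real OpenCV tests:

```python
import re


def disable_gtest(source, suite, name):
    """Return `source` with one Google Test case renamed to DISABLED_<name>.

    Handles TEST, TEST_F and TEST_P declarations of the given suite and
    test name; every other test is left untouched.
    """
    pattern = re.compile(
        r"\b(TEST(?:_F|_P)?)\(\s*"
        + re.escape(suite)
        + r"\s*,\s*"
        + re.escape(name)
        + r"\s*\)"
    )
    # Keep the macro that was used and only prefix the test name.
    return pattern.sub(
        lambda m: f"{m.group(1)}({suite}, DISABLED_{name})", source
    )


if __name__ == "__main__":
    print(disable_gtest("TEST(FooTest, DoesAbc) { ... }", "FooTest", "DoesAbc"))
```

Matching on both the suite and the test name keeps the change narrow, so a test with a similar name in another suite keeps running.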
>
>
> Some general ways to resolve:
>
> * Look exactly at the error message.
> * Google it. Is it fixed somewhere else?

It also helps if it is not fixed yet but has an upstream issue. Sometimes
we are in a better position to reproduce the error, so an issue marked as
needing more information can be resolved faster. Sometimes the problem is
obvious and just lacks a report. Feel free to file one; if you can provide
a patch, that is even better. Last time I did this with ovmf: a new gcc had
added a warning about indentation style, and warnings were treated as
errors. I have also created a lot of small reproducibility-related patches
to fix tools that did not respect SOURCE_DATE_EPOCH.

> * Look at packages that use the dependency that causes the problem. Do
> they have a solution?
> * Is there a newer version of your software?
> * Is there documentation about the dependencies?
> * Maybe you need a SPECIFIC version of one library to build your
> package? Try to up/downgrade that dependency.
> * Write down your problem. Be very specific and write it down in a way
> that others can reproduce it.
> * Sometimes, just writing helps already. If not, send it out to the
> mailing list to ask for help.
>

Really nice summary!

> And finally:
>
> * Take a break.
> * Drink a tea.
> * Get back when you are fresh and concentrated.
>

All so true!

> And if all doesn't help:
>
> * Get sarcastic. You are not alone with your problems: There is at
> least Kenneth Hoste with his great talk "How To Make Package Managers
> Cry" from FOSDEM 2018:
>
> https://archive.fosdem.org/2018/schedule/event/how_to_make_package_managers_cry/
>
>
>
> Björn

Now I will go a bit into the OpenJDK packaging problems I am having
right now, to give you some insight:

Julien did a lot of work getting in OpenJDK9 and 10, which meant that I
mostly just had to look around. I decided to write the OpenJDK11 definition
without using inherit. The point was not to write more code, but to have
the current version very cleanly defined. I recommended this workflow
earlier on the mailing list: keep the top of the bootstrap free of inherit,
so that we can safely modify the bootstrap without breaking the exported
packages. I also decided to document the flags and decisions in the
package, so that nobody has to wonder later why a given configure flag was
added, for example. Now I am struggling with making it reproducible.

There are some problems with this, though:
1. This is a multi-output package, and the daemon does not play nicely
with these: when you build with --rounds=2, only the first differing
output is printed. I currently have doc reproducible, jdk not
reproducible, and out unknown :)
2. In jdk11 they managed to solve all earlier reproducibility problems
with the doc, and they managed to introduce a new one :)
3. jdk is not reproducible. What is still to be done:
- make ct.sym reproducible (this file contains timestamps; the easiest way
seems to be to regenerate the content from Scheme with a timestamp taken
from SOURCE_DATE_EPOCH, but I have to test whether this breaks something)
- repack the jmods: the .jmod files are modified zip files, apparently with
a 4-byte header prepended, so simply using zip/unzip does not work on
them; I have to use the jmod tool
- remove the timestamps from the autogenerated files in the src.zip
archive before resetting its timestamps
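
To make the timestamp resetting for a plain zip archive such as src.zip more concrete, here is a rough Python sketch. It is not what the package does and it is untested against OpenJDK; the function name `normalize_zip` is hypothetical, and the 1980 fallback is my assumption, since the zip format cannot store earlier dates. It does not apply to .jmod files because of the extra header mentioned above.

```python
import io
import os
import time
import zipfile


def normalize_zip(data, epoch=None):
    """Return a copy of the zip archive in `data` (bytes) in which every
    member carries the same timestamp and members are sorted by name.

    `epoch` defaults to SOURCE_DATE_EPOCH from the environment, or to
    1980-01-01 UTC when that variable is not set.
    """
    if epoch is None:
        epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "315532800"))
    # The zip format cannot represent dates before 1980, so clamp.
    date_time = max(time.gmtime(epoch)[:6], (1980, 1, 1, 0, 0, 0))
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        # Sorting removes any dependence on the original member order.
        for info in sorted(src.infolist(), key=lambda i: i.filename):
            clean = zipfile.ZipInfo(info.filename, date_time=date_time)
            clean.compress_type = info.compress_type
            clean.external_attr = info.external_attr
            dst.writestr(clean, src.read(info.filename))
    return out.getvalue()
```

Two archives with the same contents but different member order or timestamps should come out byte-identical after this step, which is the property that a --rounds=2 build compares.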

I hope that we will soon have a reproducible OpenJDK11 package. The next
steps are then to refactor the bootstrap into a separate module, fix the
reproducibility problems there, and switch the default jdk to jdk11. :)

These are things that will take a while :)

Best regards,
g_bor


