guix-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug#74609] [PATCH] Adding a fully-bootstrapped mono


From: unmush
Subject: [bug#74609] [PATCH] Adding a fully-bootstrapped mono
Date: Fri, 29 Nov 2024 15:05:09 +0000

We used to have a mono package, but it was removed due to includingbootstrap 
binaries (among others).  This patch series introduces a full,
17-mono-package sequence that takes us from a mono-1.2.6 built fully
from source to mono-6.12.0 built fully from source, using only packages
that already have full bootstrap paths.  I make no promise that this is
the shortest or most optimal path, but it exists and I have verified it
works.

As I've spent what is probably an unreasonable amount of time working
toward this, I thought I'd share some of my thoughts, experiences, and
commentary.  Sorry in advance if it gets a bit rambly or lecture-ish.

* Prologue

I started down this road because someone I'm working on a project with
decided to depend on a C# package that requires C# 12.0 features, and my
personal mono package based on the tarball releases (which include
bootstrap binaries) only went up to C# 7.0.  This meant that the C#
package in question de facto required strictly Microsoft's (er, I mean,
"the .NET foundation"'s) .NET implementation - hereafter referred to as
"dotnet" - and a very recent version no less.  The bootstrapping story
with dotnet is very bad:
https://github.com/dotnet/source-build/issues/1930.  Even beginning to
untangle it would probably require a relatively modern C# compiler, and
something that at least sort of understands msbuild.  And there's not
much point to "bootstrapping" dotnet from something that isn't
bootstrapped itself.  So I figured I may as well start with mono.

* History

While mono is today probably the most well-known alternative to
Microsoft's .NET offerings, it is not the only one.  Indeed, in the
early 2000s there were at least 2 competing free software
implementations: Mono, and DotGNU's Portable.NET (abbreviated pnet).
They differed in goals, licenses, and methods: Portable.NET was a GNU
project concerned with, among other things, limiting the ability of
Microsoft to impose vendor lock-in via its proprietary .NET
implementation and software patents.  As a GNU project, it used the GPL
for its runtime and compiler, and the GPL with a linking exception for
its standard library, pnetlib.  Mono, on the other hand, used a mix of
many copyleft and permissive licenses: X11 for the standard library, GPL
for the compiler (later dual-licensed to add an X11 option), and LGPL
for the runtime, with GPL and LGPL code also offered "under commercial
terms for when the GPL and the LGPL are not suitable".  In 2016 after
its acquisition by Microsoft, the runtime was relicensed to use the
Expat (MIT) license.

But perhaps most importantly to us, while Mono opted to write its C#
compiler, mcs, in... C#, Portable.NET's runtime and C# compiler were
both written in C.  Portable.NET along with the entire DotGNU project
was decommissioned in 2012, but the source is still available, and it
still works fine (with a few modifications for compatibility with newer
versions of its dependencies).  In https://issues.guix.gnu.org/57625
Adam Faiz submitted patches to package pnet and pnetlib, along with one
of their dependencies named treecc.  These packages were based on the
last release of Portable.NET, version 0.8.0, released in 2007.  I
initially used these packages as the basis for my bootstrap efforts, and
even managed to get mono-1.2.6 built using them, but later discovered
that using a more recent version from git made it much easier.  For
example, while pnet-0.8.0 can do pointer arithmetic inside unsafe code
blocks, it doesn't support the += or -= operators specifically, which
requires lots of patching (after all, who would use x = x + y when you
could do x+= y?).  There are many other similar improvements in the git
version, so for this patch series I've decided to go with pnet-git.

* The start

After building mono-1.2.6, I tried a few later versions, and the third
or fourth one would always fail with errors about missing methods.  It
turns out that the reason for this is that, contrary to what their
marketing suggests, C# and Java are not "write once, run everywhere".
This is because their compilers rely on the details of the libraries
that the program will be run with at compile-time.  This is used, for
example, to do overload resolution.  Suppose, for example, that a
certain implementation of the "==" operator is present in version 1.0 of
a library, and then in version 2.0 of a library a more specific
implementation is introduced.  Now code that is compiled against version
2.0 may instead automatically reference the more-specific
implementation, as is in accordance with the rules of C#.  But when it
is run with version 1.0, it will fail because that implementation
doesn't exist.  In my case, for some reason the initial mcs and core
libraries being built to compile the rest of mono were being compiled
against a 2.0 library and then run with a 1.0 library.  It turns out
that this was because mcs uses mono's code for producing assemblies
(.NET dlls and exes), and mono decides which version to put in an
assembly it writes based on "which runtime version" is being used, and
that version is decided at startup based on... the version that was put
in the assembly it is running.  So for example, mono-1.9.1 would produce
2.0 assemblies because mono-1.2.6 produced 2.0 assemblies because pnet
produced 2.0 assemblies.  So I modified mono's runtime in mono-1.9.1 to
allow for this version to be overridden via environment variable, and
set it to "v1.1.4322", and things went a lot more smoothly after that.

>From there on it was mostly the usual trial-and-error process of
identifying where things had bitrotted.  I made sure to unvendor libgc
wherever possible, though eventually by mono-4.9.0 they explicitly
dropped support in their configure script for using any libgc other than
what was bundled, so at that point I switched to using their homebrewed
sgen garbage collector.

* A concerning development

Once I got to mono-2.11.4, though, things took a turn for the
interesting: mono started using git submodules, and the (recursive? #t)
clones were all failing.  It turns out that this is because their
submodules reference github.com using the git:// protocol.

This is notable for a few reasons.

First, github dropped support for the git:// protocol in 2021, so
recursive clones won't work now.  This means I have to explicitly list
out every submodule, its commit, and its sha256 hash, for every mono
version until they switched to using http or https.  mono-2.11.4 has
only 4 submodules, but that doesn't last for long: by mono-4.9.0 it has
14 submodules.  A significant portion of these patches is just listing
these submodules and their hashes.  It's a bit annoying.

The more concerning reason, though, is *why* github dropped support for
the git:// protocol: it is unencrypted and unauthenticated.  This is
mitigated somewhat by the use of sha-1 hashes to identify commits in the
referenced submodules, putting a significant computational burden on
anyone who would try to alter what was fetched corresponding to a given
submodule.  Significantly more risky, though, is the process of
*updating* submodules that use git:// URLs.  It is quite unlikely that a
developer is going to independently clone one of the submodules over
https, navigate to a desirable commit, copy the sha-1 hash, and manually
update the submodule reference's commit.  They're far more likely to run
'cd submodule; git pull; cd ..; git add submodule; git commit ...' or an
equivalent.

Of course, any changes a network man-in-the-middle might try to make
here would still be reflected in the commit history, so even if a
developer did that, they or any of their fellow committers could spot
anything strange or malicious and point it out.  Also, the changes
couldn't be propagated to others trying to pull them who weren't on a
path containing the MITM because the potentially-malicious commit
wouldn't be present in the real submodule's repository.  So the
transparency of git clearly showing changes to text files, combined with
the fact that surely no git hosting platform would just allow arbitrary
entities to make whatever commits they want accessible under any
arbitrary repository URL, rather mitigate this security issue.

This usage of git:// URLs lasted all the way until September 28, 2021,
when github's removal of support for it forced the developers to change
them to https.

* Meanwhile, in reality

On November 28, 2016, mono added a submodule named roslyn-binaries.
Unsurprisingly, it included binary blobs for Microsoft's Roslyn compiler
(which I believe had been open-sourced shortly prior).  From here on,
mono's build system would default to using these binaries for building
on little-endian systems (though another compiler could be specified
with the --with-csc configure flag).  I happen to know that it is
extremely unlikely that many mono developers used this configure flag.
I know this because the 5.0 series is an absolute pain in the neck to
build from source, because they consistently depend on new C# features
*before* they implement them.

To go on a brief tangent: does anyone remember back when youtube-dl was
temporarily taken down from github due to the RIAA's DMCA request?  Many
were unhappy about that.  One such unhappy person made news when they
made the full contents of youtube-dl's repository available to access
through the DMCA request repository:
https://gist.github.com/lrvick/02088ee5466ca51116bdaf1e709ddd7c.  It
turns out that there are many actions that one can take on github that
will make arbitrary commits available under arbitrary repository URLs.

So, in reality, for the span of time from November 28, 2016 to
September 28, 2021, anybody sitting on the network path between github
and any mono developer updating the roslyn-binaries submodule could
decide on any arbitrary new commit to be used.  Of course, merely
inspecting the diff for the commit will reveal nothing of use, because
the contents are binary blobs.  And not only are these blobs those of a
compiler, they are the blobs of a compiler that is sure to be used to
compile another compiler, which will then be redistributed as an opaque,
non-bootstrappable binary blob to be used for compiling other compilers.

You would be hard-pressed to find a more fertile breeding ground for Ken
Thompson / Trusting Trust attacks.  If every agent of the NSA (and
whatever other agencies, including those of other countries, had access
to the appropriate network traffic) somehow failed to capitalize on 6
years of opportunity to compromise an entire software ecosystem using
only a basic MITM of unencrypted traffic, they deserve to be sacked.
Whether such an attack actually occurred or not, this is a case study in
carelessness and why bootstrappability is so important; discovering all
this made me quite worried about having used a mono version built from
blobs previously, and has convinced me that, as time-wasting and tedious
as this project has been, it is nevertheless probably an important one.

* Another note on roslyn-binaries

If you're going to write a self-hosting compiler, the least you can do
is keep it self-hosting.  Deciding to write a self-hosting compiler is a
valid choice, of course, with its own merits and demerits, but there is
something bitterly poetic about mono starting out requiring specifically
Microsoft's C# compiler in order to build (mono did its initial
bootstrapping using Microsoft's proprietary csc), achieving independence
through self-hosting, being acquired by Microsoft, and thereafter coming
crawling back to Microsoft's C# compiler once more before eventually
dying.

The funny thing is that it's not even necessary.  The dependencies on
new C# features are all in mono's standard library (which increasingly
borrowed code from Microsoft's corefx library), not in mono's compiler.

* More binary submodules?

Even before roslyn-binaries, there was binary-reference-assemblies,
which contained prebuilt "reference" blobs for the various versions of
the standard libraries.  These exist, I assume, precisely because of the
library incompatibility problems regarding overloading that I mentioned
earlier.  While later versions of mono included sources and a build
system for producing these reference binaries, mono-4.9.0 and earlier
did not.  Mono's build system still demanded /something/ to install,
though, so I told it to use the real standard library of the input mono
version.  When I did get to a mono version that at least claimed to
support regenerating the reference binaries, I found that it didn't work
with mcs due to differences in which libraries had to be referenced, so
I had to patch it to add a bunch of references determined through trial
and error.

The xunit-binaries submodule was also added sometime before mono-5.1.0.
This dependency makes it impossible to run the full test suite without
binary blobs.  Presumably for this reason, Debian elects to only run
tests within the mono/mini/ and mono/tests/ subdirectories.  For my
part, I've disabled all tests except for those of mono-6.12.0, the final
version, limited to the two aforementioned subdirectories.  This is
because it would take extra time for the builds, because several of the
tests depend on binary blobs bundled into the mono repository itself
(which my thorough cleaning of all dlls and exes from the sources
removes), because a large chunk of the tests depend on binary blobs in
xunit-binaries in later versions, and because "expect some test
failures" is part of the mono documentation and I don't have the
time to figure out for the mono developers every reason why each of 17
versions of their test suite is broken.

* The long march through the 5.0s

The 5.0 series was when Microsoft acquired Mono, and it shows.  You'll
notice I needed to introduce "pre-" packages for various versions
because in several cases a tagged release could not build the following
tagged release.  For that matter, they couldn't build the pre- package
either, but it at least took fewer patches to get them working.  The
reason for this is that Mono added a dependency on Microsoft's corefx
library source code, and it usually started using C# features well
before mcs was able to compile them.  Because of this, despite taking 8
versions to get from 1.2.6 to 4.9.0, it took another 8 versions to get
through the 5.0 series, and 5 of them required nontrivial patching to
massage the source into a form compilable by mcs.

* The final stretch

Eventually I realized that the dependencies on new features were all
coming from corefx, not from mono's compiler.  Consequently, the only
reason for this particular bootstrap-hostile ordering of builds is that
it happened to be the order the mono devs committed things.  So I just
cherry-picked every commit I could find touching mcs/mcs (magit was
quite useful for this) and applied it to 5.10.0 to produce what is
essentially the 6.12.0 compiler, then used it to jump straight to
building 6.12.0.

Use of this technique earlier on in the bootstrap process may be
of interest to anyone looking to shorten the chain of packages.

* The finishing touches

My initial goal was to package dotnet, and I had tried to progress
toward that from mono-4.9.0 for a period, but with no success.  During
that time, though, I did encounter a bug in mono's xbuild condition
parser, which I wrote a patch for, and included in mono-6.12.0.

I also discovered that xbuild would wrongly complain about missing
references even when the proper assemblies were in MONO_PATH or
MONO_GAC_PREFIX, because xbuild would erroneously only consider the path
/gnu/store/...mono-6.12.0/lib/mono/gac when looking for global assembly
caches, completely ignoring MONO_GAC_PREFIX.  So I wrote a patch to fix
that, and included it in mono-6.12.0.

Having witnessed how much nicer it is to package things that use rpath /
runpath than things that use environment variables (like python) and
therefore require constant wrapping of executables and use of
propagated-inputs, I devised a patch that would extend mono's
per-assembly config files to support a <runpath> element.  For example,
if you have a file /tmp/dir2/test2.exe, and there is also a file
/tmp/dir2/test2.exe.config, and its contents are

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runpath path="/tmp/dir1"/>
</configuration>

and it references test1.dll, it will first look for it at
/tmp/dir1/test1.dll.  Note that, of course, test1.dll still needs to be
accessible to the compiler at compile-time through MONO_PATH or an
explicitly-specified path passed on the mcs command line.

It is my hope that this feature will be of use to anybody interested in
developing a build system.

* Future work

Mono had several difficult points in bootstrapping and packaging, but at
the end of the day it still met the basic description of a software
package: well-defined environment-supplied inputs and sources, a
user-supplied install prefix, and files installed under that prefix.

The dotnet world is an entirely different beast.  The first step of most
build systems I have encountered from that realm is downloading an
entire toolchain, among other dependencies, as a binary blob.  They
heavily depend on the exact packages they specify being available
exactly where they say to install them.  There is no "install", there
are no "install directories" to my knowledge.  A build that doesn't
contact nuget.org is an abberation.  I am at a loss how to build these
things, much less package them.  I badly need help.

* Closing thoughts

"You wish now that our places had been exchanged.  That I had died, and
DotGNU had lived?"

"... Yes.  I wish that."

Maintenance of Mono was recently transferred over to WineHQ.  With that
announcement this statement was placed at https://www.mono-project.com:

"We want to recognize that the Mono Project was the first .NET
implementation on Android, iOS, Linux, and other operating systems. The
Mono Project was a trailblazer for the .NET platform across many
operating systems. It helped make cross-platform .NET a reality and
enabled .NET in many new places and we appreciate the work of those who
came before us."

I would like to clarify that, according to Miguel de Icaza himself
(https://www.mono-project.com/archived/mailpostearlystory/), DotGNU
"started working on the system about the same time".  According to
https://lwn.net/2002/0103/a/dotgnu.php3 Portable.NET began "in January
2001".  While it's unclear exactly when Portable.NET reached various
milestones, and the significance of the various milestones varies
somewhat (for example, mono probably does not care that Portable.NET
also includes a Java and C compiler), I think that there is cause to
dispute the claim that Mono was "the first" .NET implementation on
Linux.

On a related note, if we haven't looked at the possibility of using
Portable.NET in the Java bootstrap process, it may be worth visiting at
some point.

Thank you for your time, I think I need to get some rest now.

- unmush

Attachment: 0001-gnu-add-treecc.patch
Description: Text Data

Attachment: 0005-gnu-Add-mono-1.9.1.patch
Description: Text Data

Attachment: 0002-gnu-Add-pnet-git.patch
Description: Text Data

Attachment: 0003-gnu-Add-pnetlib-git.patch
Description: Text Data

Attachment: 0004-gnu-Add-mono-1.2.6.patch
Description: Text Data

Attachment: 0009-gnu-Add-mono-3.0.patch
Description: Text Data

Attachment: 0013-gnu-Add-mono-5.1.0.patch
Description: Text Data

Attachment: 0007-gnu-Add-mono-2.6.4.patch
Description: Text Data

Attachment: 0011-gnu-Add-mono-4.9.0.patch
Description: Text Data

Attachment: 0008-gnu-Add-mono-2.11.4.patch
Description: Text Data

Attachment: 0010-gnu-Add-mono-3.12.1.patch
Description: Text Data

Attachment: 0006-gnu-Add-mono-2.4.2.patch
Description: Text Data

Attachment: 0014-gnu-Add-mono-5.2.0.patch
Description: Text Data

Attachment: 0012-gnu-Add-mono-5.0.1.patch
Description: Text Data

Attachment: 0017-gnu-Add-mono-5.8.0.patch
Description: Text Data

Attachment: 0015-gnu-Add-mono-5.4.0.patch
Description: Text Data

Attachment: 0016-gnu-Add-mono-pre-5.8.0.patch
Description: Text Data

Attachment: 0018-gnu-Add-mono-pre-5.10.0.patch
Description: Text Data

Attachment: 0020-gnu-Add-libgdiplus.patch
Description: Text Data

Attachment: 0021-gnu-Add-mono-6.12.0.patch
Description: Text Data

Attachment: 0019-gnu-Add-mono-5.10.0.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]