Re: aarch64-gnu (and Happy New Year!)

From: Guy-Fleury Iteriteka
Subject: Re: aarch64-gnu (and Happy New Year!)
Date: Sun, 31 Dec 2023 23:38:25 +0200

On December 31, 2023 9:53:26 PM GMT+02:00, Sergey Bugaev <bugaevc@gmail.com> wrote:
>Hello, and happy holidays!
>Every now and then, I hear someone mention potential ports of gnumach
>to new architectures. I think I have heard RISC-V and (64-bit?) ARM
>mentioned somewhere recently as potential new port targets. Being
>involved in the x86_64 port last spring was a really fun and
>interesting experience, and I learned a lot; so I, for one, have
>always thought doing more ports would be a great idea, and that I
>would be glad to be a part of such an effort again.
>Among the architectures, AArch64 and RISC-V indeed seem most
>attractive (not that I know much about either). Among those two,
>RISC-V is certainly newer and more exciting, but AArch64 is certainly
>more widespread and established. (Wouldn't it be super cool if we
>could run GNU/Hurd everywhere from tiny ARM boards, to Raspberry Pis,
>to common smartphones, to, now, ARM-based laptops and desktops?) Also I
>have had some experience with ARM in the past, so I knew a tiny bit of
>ARM assembly.
>So I thought, what would it take to port the Hurd to AArch64, a
>completely non-x86 architecture, one that I knew very little about?
>There is no AArch64 gnumach (that I know of) yet, but I could try to
>hack on glibc even without one, I'd only need some headers, right?
>There's also no compiler toolchain, but those patches to add the
>x86_64-gnu target looked pretty understandable, so, how hard could it
>be?
>Well, I did more than think about it :)
>I read up on AArch64 registers / assembly / architecture / calling
>convention, added the aarch64-gnu target to binutils and GCC, added
>basic versions of mach/aarch64/ headers to gnumach (but no actual
>code), and made a mostly complete port of glibc. I haven't spent much
>effort on Hurd proper, but I have tried running the build, and the
>core Hurd servers (ext2fs, proc, exec, auth) do get built.
>I will be posting the patches soon. For now, here's just a little teaser:
>glibc/build $ file libc.so elf/ld.so
>libc.so: ELF 64-bit LSB shared object, ARM aarch64, version 1
>(GNU/Linux), dynamically linked, interpreter /lib/ld-aarch64.so.1, for
>GNU/Hurd 0.0.0, with debug_info, not stripped
>elf/ld.so: ELF 64-bit LSB shared object, ARM aarch64, version 1
>(SYSV), dynamically linked, with debug_info, not stripped
>hurd/build $ file ext2fs/ext2fs.static proc/proc
>ext2fs/ext2fs.static: ELF 64-bit LSB executable, ARM aarch64, version
>1 (GNU/Linux), statically linked, for GNU/Hurd 0.0.0, with debug_info,
>not stripped
>proc/proc: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV),
>dynamically linked, interpreter /lib/ld-aarch64.so.1, for GNU/Hurd
>0.0.0, with debug_info, not stripped
>glibc/build $ aarch64-gnu-objdump --disassemble=__mig_get_reply_port libc.so
>libc.so:     file format elf64-littleaarch64
>Disassembly of section .plt:
>Disassembly of section .text:
>000000000002b8e0 <__mig_get_reply_port>:
>   2b8e0: a9be7bfd stp x29, x30, [sp, #-32]!
>   2b8e4: 910003fd mov x29, sp
>   2b8e8: f9000bf3 str x19, [sp, #16]
>   2b8ec: d53bd053 mrs x19, tpidr_el0
>   2b8f0: b85f8260 ldur w0, [x19, #-8]
>   2b8f4: 34000080 cbz w0, 2b904 <__mig_get_reply_port+0x24>
>   2b8f8: f9400bf3 ldr x19, [sp, #16]
>   2b8fc: a8c27bfd ldp x29, x30, [sp], #32
>   2b900: d65f03c0 ret
>   2b904: 97fffbef bl 2a8c0 <__mach_reply_port>
>   2b908: b81f8260 stur w0, [x19, #-8]
>   2b90c: f9400bf3 ldr x19, [sp, #16]
>   2b910: a8c27bfd ldp x29, x30, [sp], #32
>   2b914: d65f03c0 ret
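Read back into C, the listing above is a lazily filled per-thread cache: load a slot 8 bytes below the thread pointer, return it if non-zero, otherwise call __mach_reply_port, store the result and return it. The following is only a minimal sketch of that control flow, not the glibc source. port_name_t, fake_mach_reply_port and cached_reply_port are hypothetical names, and the stub stands in for the real Mach trap so the sketch can run without gnumach.

```c
#include <stdint.h>

/* Hypothetical stand-in for Mach's mach_port_t.  */
typedef uint32_t port_name_t;

/* Hypothetical stub replacing __mach_reply_port (), which is a real Mach
   trap and cannot run without a kernel.  It just hands out fresh names.  */
static port_name_t next_name = 100;

static port_name_t
fake_mach_reply_port (void)
{
  return next_name++;
}

/* Per-thread cache slot.  The listing reads it from tpidr_el0 - 8; here
   the compiler picks the TLS offset.  */
static _Thread_local port_name_t cached_reply_port;

port_name_t
get_reply_port (void)
{
  /* ldur w0, [x19, #-8] ; cbz w0, ...  */
  if (cached_reply_port == 0)
    /* bl __mach_reply_port ; stur w0, [x19, #-8]  */
    cached_reply_port = fake_mach_reply_port ();
  return cached_reply_port;
}
```

Calling get_reply_port () twice on the same thread returns the same name; only the first call reaches the stub.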
>So it compiles and links, but does it work? — well, we can't know
>that, not until someone ports gnumach, right?
>Well actually we can :) I've done the same thing as last time, when
>working on the x86_64 port: run a statically linked hello world
>executable on Linux, under GDB, carefully skipping over and emulating
>syscalls and RPCs. This did uncover a number of bugs, both in my port
>of glibc and in how the toolchain was set up (the first issue was that
>static-init.S was not even getting linked in, the second issue was
>that static-init.S was crashing even prior to the _hurd_stack_setup
>call, and so on). But, I fixed all of those, and got the test
>executable working! — as in, successfully running all the glibc
>initialization (no small feat; this includes TLS setup, hwcaps /
>cpu-features, and ifuncs), reaching main (), successfully doing puts
>(), and shutting down. So it totally works, and is only missing an
>AArch64 gnumach to run on.
>The really unexpected part is how easy this actually was: it took me
>like 3 days from "ok, guess I'm doing this, let's add a new target to
>binutils and gcc" to glibc building successfully, and a couple more
>days to get hello world to work (single-stepping under GDB is just
>that time-consuming). Either I'm getting good at this..., or (perhaps
>more realistically) maybe it was just easy all along, and it was my
>inexperience with glibc internals that slowed me down the last time.
>Also, we have worked out a lot of 64-bit issues with the x86_64 port,
>so this is something I didn't have to deal with this time.
>Now to some of the more technical things:
>* The TLS implementation is basically complete and working. We're using
>  tpidr_el0 for the thread pointer (as can be seen in the listing above),
>  like GNU/Linux and unlike Windows (which uses x18, apparently) and
>  macOS (which uses tpidrro_el0). We're using "Variant I" layout, as
>  described in "ELF Handling for Thread-Local Storage", again same as
>  GNU/Linux, and unlike what we do on both x86 targets. This actually
>  ends up being simpler than what we had for x86! The other cool thing is
>  that we can do "msr tpidr_el0, x0" from userspace without any gnumach
>  involvement, so that part of the implementation is quite a bit simpler
>  too.
>* Conversely, while on x86 it is possible to perform "cpuid" and identify
>  CPU features entirely in user space, on AArch64 this requires access
>  to some EL1-only registers. On Linux and the BSDs, the kernel exposes
>  info about the CPU features via AT_HWCAP (and more recently, AT_HWCAP2)
>  auxval entries. Moreover, Linux allows userland to read some otherwise
>  EL1-only registers (notably for us, midr_el1) by catching the trap that
>  results from the EL0 code trying to do that, and emulating its effect.
>  Also, Linux exposes midr_el1 and revidr_el1 values through procfs.
>  The Hurd does not use auxval, nor is gnumach involved in execve anyway.
>  So I thought the natural way to expose this info would be with an RPC,
>  and so in mach_aarch64.defs I have an aarch64_get_hwcaps routine that
>  returns the two hwcaps values (using the same bits as AT_HWCAP{,2}) and
>  the values of midr_el1/revidr_el1. This is hooked to init_cpu_features
>  in glibc, and used to initialize GLRO(dl_hwcap) / GLRO(dl_hwcap2) and
>  eventually to pick the appropriate ifunc implementations.
>* The page size (or rather, paging granularity) is notoriously not
>  necessarily 4096 on ARM, and the best practice is for userland not to
>  assume any specific page size and always query it dynamically. GNU Mach
>  will (probably) have to be built with support for some specific page size,
>  but I've cleaned up a few places in glibc where things were relying on
>  a statically defined page size.
>* There are a number of hardware hardening features available on AArch64
>  (PAC, BTI, MTE — why do people keep adding more and more workarounds,
>  including hardware ones, instead of rewriting software in a properly
>  memory-safe language...). Those are not really supported right now; all
>  of them would require some support from the gnumach side; we'll probably
>  need new protection flags (VM_PROT_BTI, VM_PROT_MTE), for one thing.
>  We would need to come up with a design for how we want these to work
>  Hurd-wide. For example I imagine it's the userland that will be
>  generating PAC keys (and setting them for a newly exec'ed task), since
>  gnumach does not contain the functionality to generate random values
>  (nor should it); but this leaves open the question of what should happen to
>  early bootstrap tasks and whether they can start using PAC after
>  initial startup.
>* Unlike on x86, I believe it is not possible to fully restore execution
>  context (the values of all registers, including pc and cpsr) purely in
>  userland; one of the reasons for that being that we can apparently no
>  longer do a load from memory straight into pc, like it was possible in
>  previous ARM revisions. So on Linux, sigreturn () is of course a
>  syscall that takes a struct sigcontext and writes it over the saved
>  thread state. Sounds familiar? Of course: that's almost exactly like
>  thread_set_state () in Mach-speak.
>  The difference being that thread_set_state () explicitly disallows you
>  to set the calling thread's state, which makes it impossible to use for
>  implementing sigreturn (). So I'm thinking we should lift that
>  restriction; there's no reason why thread_set_state () cannot be made
>  to work on the calling thread; it only requires some careful coding to
>  make sure the return register (%eax/%rax/x0) is *not* rewritten with
>  mach_msg_trap's return code, unlike normally.
>  But other than that, I do have AArch64 versions of trampoline.c and
>  intr-msg.h (complete with SYSCALL_EXAMINE & MSG_EXAMINE). Whether they
>  work, we'll only learn once we have enough of the Hurd running to have
>  the proc server.
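On the CPU-features point above: once the two hwcaps words and midr_el1 reach userland, decoding them is plain bit manipulation. The sketch below assumes the reply carries four 64-bit values. struct hwcaps_reply is a hypothetical layout, since the actual aarch64_get_hwcaps signature is not shown in this mail. The HWCAP_* bits are the AArch64 AT_HWCAP values used by Linux, and the MIDR_EL1 field positions follow the Arm architecture definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few AT_HWCAP bits as defined for AArch64 on Linux.  */
#define HWCAP_FP       (1UL << 0)
#define HWCAP_ASIMD    (1UL << 1)
#define HWCAP_AES      (1UL << 3)
#define HWCAP_ATOMICS  (1UL << 8)

/* Hypothetical shape of what the RPC hands back; the real routine may
   use separate out-parameters instead.  */
struct hwcaps_reply
{
  uint64_t hwcap;
  uint64_t hwcap2;
  uint64_t midr_el1;
  uint64_t revidr_el1;
};

static inline bool
has_cap (uint64_t hwcap, uint64_t bit)
{
  return (hwcap & bit) != 0;
}

/* MIDR_EL1 fields: implementer [31:24], variant [23:20],
   architecture [19:16], part number [15:4], revision [3:0].  */
static inline unsigned
midr_implementer (uint64_t midr)
{
  return (midr >> 24) & 0xff;
}

static inline unsigned
midr_variant (uint64_t midr)
{
  return (midr >> 20) & 0xf;
}

static inline unsigned
midr_partnum (uint64_t midr)
{
  return (midr >> 4) & 0xfff;
}

static inline unsigned
midr_revision (uint64_t midr)
{
  return midr & 0xf;
}
```

For example, a midr_el1 value of 0x410fd083 splits into implementer 0x41 (Arm), variant 0, part number 0xd08 and revision 3. An ifunc resolver would typically test has_cap (reply.hwcap, HWCAP_ATOMICS) or similar before picking an implementation.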
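On the page-size point above: the usual way for userland to avoid a hard-coded 4096 is to ask at run time. Below is a minimal sketch using the POSIX sysconf (_SC_PAGESIZE) call. round_to_pages is a hypothetical helper name, not something taken from the glibc cleanup, and it assumes the page size is a power of two.

```c
#include <stddef.h>
#include <unistd.h>

/* Round LEN up to a whole number of pages, using the page size reported
   at run time rather than a compile-time constant.  Assumes the page
   size is a power of two.  */
size_t
round_to_pages (size_t len)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  return (len + page - 1) & ~(page - 1);
}
```

The same binary then behaves correctly whether the kernel was built for 4 KiB, 16 KiB or 64 KiB pages.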
>Anyways, enjoy! As said, I will be posting the patches some time soon.
>I of course don't expect to get any reviews during the holidays. And —
>any volunteers for a gnumach port? :)
Not me :)
>P.S. Believe it or not, this is not the announcement that I was going
>to make at Joshua's Christmas party; I only started hacking on this
>later, after that email exchange. That other thing is still to be
>announced :)
I'm impatient to hear that :)

Hello Sergey, happy new year! 

I don't know how you can achieve something like that so quickly, but it's 
always a pleasure to hear something new from you.

