Re: GNUstep on Windows using Clang + MSVC ABI
From: David Chisnall
Subject: Re: GNUstep on Windows using Clang + MSVC ABI
Date: Tue, 9 Feb 2021 17:00:26 +0000
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1
On 09/02/2021 16:20, Richard Frith-Macdonald wrote:
> On 29 Jan 2021, at 16:28, David Chisnall <gnustep@theravensnest.org> wrote:
>> We have given up supporting MinGW for snmalloc and I am considering
>> optionally supporting snmalloc for Objective-C object allocations since it is
>> much faster than the system malloc on most platforms and is particularly good
>> for fixed-size allocations as are common in Objective-C. It would be a shame
>> for this to be an everywhere-except-Windows thing.
> Hi David, I am giving snmalloc a try, but I'm wondering if (and why) you think
> it is actually better than mimalloc?
Well, I am biased as one of the authors of snmalloc...
> As far as I can tell mimalloc ticks the general performance boxes and is
> admirably small (i.e. relatively simple code).
> My interest is probably unusual in that it's primarily about maintaining a low
> memory footprint for long-running processes rather than about speed, so if
> snmalloc does a better job of avoiding fragmentation it would be very appealing.
I think mimalloc may do slightly better there. That said, we've adopted
a lot of techniques from mimalloc, and snmalloc was doing better than
mimalloc the last time we benchmarked. The main difference between the
two is that
mimalloc frees directly back into the chunks using atomic operations,
whereas snmalloc has a per-thread message queue that receives freed
allocations, making the alloc and free operations entirely thread-local.
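To make the remote-free difference concrete, here is a minimal C++ sketch of a per-thread inbox for frees. It is not snmalloc's code: `OwnerQueue` and `FreeBlock` are hypothetical names, and where snmalloc's real implementation is more elaborate, this sketch assumes a simple atomic stack that the owning thread empties with a single `exchange`. Other threads push a freed block with one compare-and-swap; only the owner ever touches its own free list.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// A freed block is reused as its own link node (hypothetical layout).
struct FreeBlock {
  FreeBlock* next;
};

// Hypothetical per-thread inbox for blocks freed by *other* threads.
class OwnerQueue {
 public:
  // Called by any thread freeing a block that this queue's thread allocated.
  void remote_free(void* p) {
    assert(p != nullptr);
    auto* b = static_cast<FreeBlock*>(p);
    b->next = head_.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads the current head into
    // b->next, so the loop simply retries.
    while (!head_.compare_exchange_weak(b->next, b,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
    }
  }

  // Called only by the owning thread (e.g. on its allocation slow path).
  // Takes every pending block in one atomic exchange, then moves them onto
  // the thread-local free list with no further synchronisation.
  // Returns the number of blocks moved.
  std::size_t drain_into(FreeBlock*& local_free_list) {
    FreeBlock* b = head_.exchange(nullptr, std::memory_order_acquire);
    std::size_t n = 0;
    while (b != nullptr) {
      FreeBlock* next = b->next;
      b->next = local_free_list;
      local_free_list = b;
      b = next;
      ++n;
    }
    return n;
  }

 private:
  std::atomic<FreeBlock*> head_{nullptr};
};
```

Because the owner takes the whole stack at once rather than popping nodes one at a time, this sketch avoids the ABA problem that a general-purpose lock-free stack has to handle.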
From my incredibly biased perspective:
Snmalloc is written in C++ with clean abstractions and uses template
parameters for a lot of its policy, which lets it be very flexible while
still compiling down to tiny amounts of code (around 10 x86 instructions
on the fast path for malloc). Mimalloc is a C codebase.
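As an illustration of policy-through-template-parameters (a hypothetical sketch, not snmalloc's actual classes), the allocator below takes the object size, the chunk backend and the batch size as template arguments. The compiler sees them as constants, so `alloc()` reduces to a null check and a free-list pop. No claim is made here about the exact instruction count this produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical backend policy: where fresh chunks come from (slow path only).
struct MallocBackend {
  static void* get_chunk(std::size_t bytes) { return std::malloc(bytes); }
  static void put_chunk(void* chunk) { std::free(chunk); }
};

// Hypothetical fixed-size-class allocator. ObjectSize, Backend and
// BatchCount are compile-time parameters, so the fast path sees only
// constants: an empty check and a free-list pop.
template <std::size_t ObjectSize, typename Backend = MallocBackend,
          std::size_t BatchCount = 64>
class SizeClassAllocator {
  struct Slot { Slot* next; };
  static_assert(ObjectSize >= sizeof(Slot), "a slot must fit a link");
  static_assert(ObjectSize % alignof(Slot) == 0, "slots must stay aligned");

 public:
  SizeClassAllocator() = default;
  SizeClassAllocator(const SizeClassAllocator&) = delete;
  SizeClassAllocator& operator=(const SizeClassAllocator&) = delete;
  ~SizeClassAllocator() {
    for (void* c : chunks_) Backend::put_chunk(c);
  }

  void* alloc() {
    if (free_list_ == nullptr) refill();  // slow path, rarely taken
    Slot* s = free_list_;
    free_list_ = s->next;
    return s;
  }

  void dealloc(void* p) {
    assert(p != nullptr);
    Slot* s = static_cast<Slot*>(p);
    s->next = free_list_;
    free_list_ = s;
  }

 private:
  void refill() {
    // Carve one backend chunk into BatchCount slots of ObjectSize bytes.
    auto* chunk =
        static_cast<char*>(Backend::get_chunk(ObjectSize * BatchCount));
    if (chunk == nullptr) throw std::bad_alloc();
    chunks_.push_back(chunk);
    for (std::size_t i = 0; i < BatchCount; ++i)
      dealloc(chunk + i * ObjectSize);
  }

  Slot* free_list_ = nullptr;
  std::vector<void*> chunks_;  // so the destructor can return every chunk
};
```

Swapping in a different backend (say, one that takes memory from a platform layer instead of `std::malloc`) is just a different type argument; nothing on the fast path changes.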
Snmalloc has very clean architecture and platform abstractions. We
support a wide range of both, including things like Haiku and running
inside an SGX enclave with OpenEnclave. The mimalloc OS abstraction
layer has a lot of ifdefs interleaved with code:
https://github.com/microsoft/mimalloc/blob/master/src/os.c
The snmalloc equivalent is a platform-abstraction layer where we define
a class for each platform. Some of these are simple, for example
OpenBSD just says to use the generic POSIX paths:
https://github.com/microsoft/snmalloc/blob/master/src/pal/pal_openbsd.h
Others are a bit more complex, with Windows providing its own codepaths:
https://github.com/microsoft/snmalloc/blob/master/src/pal/pal_windows.h
This abstraction layer is even flexible enough that we can run
inside the FreeBSD kernel with a handful of lines of code:
https://github.com/microsoft/snmalloc/blob/master/src/pal/pal_freebsd_kernel.h
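A rough sketch of the one-class-per-platform idea, assuming a POSIX host for the `mmap` case. The class names and the `reserve`/`release` interface are hypothetical and far smaller than snmalloc's real PAL. The point is the shape: a platform that is plain POSIX just inherits, a very different environment supplies its own class, and the allocator code is written once as a template over whichever PAL it is given.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical generic POSIX platform layer: one class, static functions.
// The allocator code below contains no platform #ifdefs.
struct PalPosixSketch {
  static void* reserve(std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
  }
  static void release(void* p, std::size_t size) { munmap(p, size); }
};

// A platform that is "just POSIX" inherits the generic behaviour.
struct PalOpenBSDSketch : PalPosixSketch {};

// A platform without mmap (think embedded or in-kernel use) can supply
// memory from a fixed arena behind the same interface.
// Not thread-safe; good enough for a sketch.
struct PalStaticArenaSketch {
  static void* reserve(std::size_t size) {
    alignas(64) static unsigned char arena[1 << 16];
    static std::size_t used = 0;
    const std::size_t rounded = (size + 63) & ~std::size_t{63};
    if (rounded > sizeof(arena) - used) return nullptr;
    void* p = arena + used;
    used += rounded;
    return p;
  }
  static void release(void*, std::size_t) {}  // arena memory is never reused
};

// Allocator-side code, written once against any PAL.
template <typename Pal>
struct ChunkSource {
  static constexpr std::size_t kChunkSize = 16 * 1024;
  void* get_chunk() { return Pal::reserve(kChunkSize); }
  void put_chunk(void* p) {
    assert(p != nullptr);
    Pal::release(p, kChunkSize);
  }
};
```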
We're also able to tune aggressively for size classes. If we know the
size statically at compile time, or if we know it at free time, then we
can be more accurate. For Objective-C, I'd like to steal one bit from
the isa pointer or refcount to record whether the extraBytes argument
to class_createInstance was used and, if not, use the snmalloc API that
doesn't need to look up the size. This should save two instructions and
one or two cache-line accesses in the free path for most Objective-C
dealloc operations.
We originally built snmalloc for Verona, where we expect to do a lot of
allocations on one thread and the corresponding frees on another thread.
A lot of producer-consumer workloads have this characteristic and
we've seen some incredible speedups reported for snmalloc over jemalloc
on workloads like this and generally a modest improvement over mimalloc.
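A workload with this shape is easy to reproduce when comparing allocators. The harness below is a minimal, allocator-agnostic sketch (the function name and parameters are hypothetical): the producer thread `malloc`s every buffer and the consumer thread `free`s it, so whichever `malloc` the program is linked against sees nothing but cross-thread frees.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdlib>
#include <deque>
#include <mutex>
#include <thread>

// Producer allocates every message; consumer frees it. Returns the number
// of buffers the consumer freed (which should equal `messages`).
std::size_t run_cross_thread_workload(std::size_t messages,
                                      std::size_t message_size) {
  assert(message_size > 0);
  std::mutex m;
  std::condition_variable cv;
  std::deque<void*> queue;
  bool done = false;
  std::size_t freed = 0;  // touched only by the consumer until join()

  std::thread consumer([&] {
    for (;;) {
      std::unique_lock<std::mutex> lock(m);
      cv.wait(lock, [&] { return !queue.empty() || done; });
      if (queue.empty()) return;  // producer finished and queue drained
      void* p = queue.front();
      queue.pop_front();
      lock.unlock();
      std::free(p);  // freed on a different thread from the allocating one
      ++freed;
    }
  });

  for (std::size_t i = 0; i < messages; ++i) {
    void* p = std::malloc(message_size);  // allocated on the producer thread
    {
      std::lock_guard<std::mutex> lock(m);
      queue.push_back(p);
    }
    cv.notify_one();
  }
  {
    std::lock_guard<std::mutex> lock(m);
    done = true;
  }
  cv.notify_one();
  consumer.join();
  return freed;
}
```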
We are doing all of the CHERI temporal safety work on top of snmalloc.
It is the building block for a fully memory-safe C/C++ environment. As
a side effect of this (and of the fact that it's a C++ codebase), we've
been adding smart-pointer types for all of the different kinds of
pointer (e.g. pointers into free lists, pointers handed out to the
malloc consumer). We are about to start a security audit, and this makes
it *far* easier for anyone to reason about the correctness of our code
than it would be in a C codebase.
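A minimal sketch of the typed-pointer idea, assuming hypothetical tag names (`FreeListTag`, `ClientTag`) rather than snmalloc's real wrapper types. Each kind of pointer gets its own type with the same representation as a raw pointer, so passing a free-list pointer where a client pointer is expected fails to compile, and every conversion goes through a named function that an auditor can inspect.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tags naming what a pointer may be used for.
struct FreeListTag {};  // points into an allocator free list
struct ClientTag {};    // handed out to the malloc caller

// Thin wrapper: same size as a raw pointer, but each tag is a distinct type.
template <typename Tag>
class TypedPtr {
 public:
  explicit TypedPtr(void* p) : p_(p) {}
  void* unsafe_raw() const { return p_; }
  bool operator==(const TypedPtr& o) const { return p_ == o.p_; }

 private:
  void* p_;
};

using FreePtr = TypedPtr<FreeListTag>;
using ClientPtr = TypedPtr<ClientTag>;
static_assert(sizeof(FreePtr) == sizeof(void*), "no space overhead");

// The only sanctioned conversions are explicit, named functions; in a real
// allocator these are natural places for extra checks.
inline ClientPtr hand_out(FreePtr p) { return ClientPtr(p.unsafe_raw()); }
inline FreePtr take_back(ClientPtr p) { return FreePtr(p.unsafe_raw()); }

// void bad(FreePtr f) { ClientPtr c = f; }  // would not compile
```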
We've also done a lot of work to support sandboxing within an address
space with snmalloc. This isn't quite finished, but it gives us a very
cheap way of allocating sandbox memory from either inside or outside a
sandbox, which can then be freed cheaply on the other side of the
boundary. This is important for the Verona foreign-code model, which is
built entirely on top of sandboxing.
All of that said, mimalloc is also evolving and was the source of
several ideas that improve performance. My favourite was initialising
the TLS for the allocator with a dummy allocator and then replacing it
only on the slow paths. This took an extra 'has malloc been
initialised' branch off the fast path for every allocation.
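The dummy-allocator trick can be sketched as follows, assuming a hypothetical single-size-class per-thread cache (not the real data structures of either allocator). Every thread's TLS pointer starts at a shared dummy whose cache is always empty, so the first allocation falls into the same slow path as an ordinary cache miss, and that slow path is the only place that checks whether the thread has been set up. Frees go straight to `std::free` because the sketch does not cache them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kObjSize = 64;  // single size class, for brevity
constexpr std::size_t kBatch = 8;

struct ThreadAlloc {
  void* cache[kBatch] = {};
  std::size_t count = 0;
};

// Shared dummy whose cache is always empty; it is never written to.
static ThreadAlloc dummy_alloc;
// Each thread's real state, reached only via the slow path.
thread_local ThreadAlloc real_alloc;
// Every thread starts out pointing at the dummy.
thread_local ThreadAlloc* tls_alloc = &dummy_alloc;

// Slow path: runs on a thread's first allocation (tls_alloc is still the
// dummy) and whenever the cache is empty.
void* slow_alloc() {
  assert(dummy_alloc.count == 0);  // the dummy must stay empty
  if (tls_alloc == &dummy_alloc) tls_alloc = &real_alloc;  // lazy init
  ThreadAlloc* a = tls_alloc;
  while (a->count < kBatch - 1) {  // refill, keeping one slot spare
    void* p = std::malloc(kObjSize);
    if (p == nullptr) break;
    a->cache[a->count++] = p;
  }
  return std::malloc(kObjSize);
}

// Fast path: one TLS load and one branch, no "is malloc initialised?" test.
inline void* fast_alloc() {
  ThreadAlloc* a = tls_alloc;
  if (a->count != 0) return a->cache[--a->count];
  return slow_alloc();
}
```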
Note that any comparison of total memory usage for snmalloc should be
done with a system under memory pressure. On *NIX platforms that
support some variant of `MADV_FREE`, we use it in preference to
returning memory directly, so snmalloc will show a large RSS until the
OS decides it needs to reclaim some pages. On Windows, we use the low-memory
notification and don't return pages until it has been triggered. This
lets us grow to the available memory and only start returning pages when
the OS tells us that we should.
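For the *NIX side, a sketch of "prefer `MADV_FREE`, otherwise return the pages" might look like this. `decommit_lazily` is a hypothetical name, and the fallback to `MADV_DONTNEED` is an assumption, not a description of snmalloc's code. The Windows low-memory-notification path is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Tell the OS that a page-aligned range no longer holds useful data.
// Returns true if the kernel accepted the advice.
bool decommit_lazily(void* page_aligned, std::size_t length) {
  assert(page_aligned != nullptr);
#if defined(MADV_FREE)
  // Lazy: pages stay mapped and keep counting towards RSS until the kernel
  // is short of memory, at which point it may reclaim them.
  if (madvise(page_aligned, length, MADV_FREE) == 0) return true;
  // Headers may define MADV_FREE even if the running kernel rejects it.
#endif
  // Eager fallback: on Linux this discards private anonymous pages at once;
  // other systems may treat it only as a hint.
  return madvise(page_aligned, length, MADV_DONTNEED) == 0;
}
```

With `MADV_FREE`, RSS can stay high after the call until the system is under memory pressure, which is why measuring the footprint on an idle machine can be misleading.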
David