gnustep-dev

Re: GNUstep on Windows using Clang + MSVC ABI


From: David Chisnall
Subject: Re: GNUstep on Windows using Clang + MSVC ABI
Date: Tue, 9 Feb 2021 17:00:26 +0000
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1

On 09/02/2021 16:20, Richard Frith-Macdonald wrote:


> On 29 Jan 2021, at 16:28, David Chisnall <gnustep@theravensnest.org> wrote:
>
>>   We have given up supporting MinGW for snmalloc and I am considering
>> optionally supporting snmalloc for Objective-C object allocations since it is
>> much faster than the system malloc on most platforms and is particularly good
>> for fixed-size allocations as are common in Objective-C.  It would be a shame
>> for this to be an everywhere-except-Windows thing.
>
> Hi David, I am giving snmalloc a try, but I'm wondering if (and why) you think
> it is actually better than mimalloc?

Well, I am biased as one of the authors of snmalloc...

> As far as I can tell mimalloc ticks the general performance boxes and is
> admirably small (i.e. relatively simple code).
> My interest is probably unusual in that it's primarily about maintaining a low
> memory footprint for long-running processes rather than about speed, so if
> snmalloc does a better job of avoiding fragmentation it would be very appealing.

I think mimalloc may do slightly better there. We've adopted a lot of techniques from mimalloc and snmalloc was doing better than mimalloc last time we benchmarked. The main difference between the two is that mimalloc frees directly back into the chunks using atomic operations, whereas snmalloc has a per-thread message queue that receives freed allocations, making the alloc and free operations entirely thread-local.
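To make the difference concrete, here is a minimal sketch of the message-queue idea. It is not snmalloc's code: any thread that frees a block owned by another allocator pushes it onto that allocator's inbox with a single atomic operation, and only the owning thread ever drains the inbox into its plain, non-atomic free list. The names (`ToyAllocator`, `remote_free`, `drain_inbox`) are hypothetical, and a lock-free LIFO stack stands in for snmalloc's actual queue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// A freed block is reused as its own link, so the inbox needs no extra memory.
struct RemoteNode {
  RemoteNode* next;
};

// Hypothetical allocator owned by exactly one thread.
class ToyAllocator {
 public:
  // May be called from any thread that frees a block owned by this allocator.
  // The only shared-state operation is one compare-and-swap push.
  void remote_free(void* p) {
    RemoteNode* node = ::new (p) RemoteNode{nullptr};
    RemoteNode* head = inbox_.load(std::memory_order_relaxed);
    do {
      node->next = head;
    } while (!inbox_.compare_exchange_weak(head, node,
                                           std::memory_order_release,
                                           std::memory_order_relaxed));
  }

  // Called only by the owning thread (e.g. on its slow path): take the whole
  // inbox with one exchange, then move the blocks to the thread-local list.
  std::size_t drain_inbox() {
    RemoteNode* n = inbox_.exchange(nullptr, std::memory_order_acquire);
    std::size_t moved = 0;
    while (n != nullptr) {
      RemoteNode* next = n->next;
      n->next = local_head_;  // plain stores: no other thread touches this list
      local_head_ = n;
      n = next;
      ++moved;
    }
    local_count_ += moved;
    return moved;
  }

  std::size_t local_free_count() const { return local_count_; }

 private:
  std::atomic<RemoteNode*> inbox_{nullptr};  // shared, multi-producer
  RemoteNode* local_head_ = nullptr;         // owner-only
  std::size_t local_count_ = 0;              // owner-only
};
```

The design point is that the free list used by the allocation fast path is never written by another thread, so it needs no atomics; cross-thread traffic is confined to the inbox.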

From my incredibly biased perspective:

Snmalloc is written in C++ with clean abstractions and uses template parameters for a lot of the policy, which lets it be very flexible and still compile down to tiny amounts of code (around 10 x86 instructions on the fast path for malloc). Mimalloc is a C codebase.
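As a rough illustration of policy-through-template-parameters (a toy, not snmalloc's real configuration types), the allocator below is assembled from a size-class policy and a debug policy at compile time. Every policy decision is a constant expression, so the compiler can fold it away instead of branching on it at run time. `PowerOfTwoClasses`, `Multiple16Classes`, `NoChecks`, `CheckedFrees` and `Allocator` are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical size-class policies; snmalloc's real size classes differ.
struct PowerOfTwoClasses {
  static constexpr std::size_t round(std::size_t n) {
    std::size_t c = 16;  // smallest class
    while (c < n) c <<= 1;
    return c;
  }
};

struct Multiple16Classes {
  static constexpr std::size_t round(std::size_t n) {
    return (n + 15) & ~static_cast<std::size_t>(15);
  }
};

// Hypothetical debug policies.
struct NoChecks {
  static constexpr bool check_frees = false;
};
struct CheckedFrees {
  static constexpr bool check_frees = true;
};

// The allocator is assembled from policies at compile time. Each policy call
// is a constant expression, so an instantiation only contains the code for
// the policies it was given.
template <typename SizeClasses, typename Debug>
struct Allocator {
  static constexpr std::size_t size_class(std::size_t n) {
    return SizeClasses::round(n);
  }
  static constexpr bool checks_frees() { return Debug::check_frees; }
};

using FastAlloc = Allocator<PowerOfTwoClasses, NoChecks>;
using DebugAlloc = Allocator<Multiple16Classes, CheckedFrees>;

// Resolved entirely by the compiler; no run-time code is emitted for these.
static_assert(FastAlloc::size_class(40) == 64, "40 rounds up to 64");
static_assert(DebugAlloc::size_class(40) == 48, "40 rounds up to 48");
```

Swapping a policy means naming a different type in the `using` line; none of the shared allocator code changes.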

Snmalloc has very clean architecture and platform abstractions. We support a lot of both, including things like Haiku and running inside an SGX enclave with OpenEnclave. The mimalloc OS abstraction layer has a lot of ifdefs interleaved with code:

https://github.com/microsoft/mimalloc/blob/master/src/os.c

The snmalloc equivalent is a platform-abstraction layer where we define a class for each platform. Some of these are simple, for example OpenBSD just says to use the generic POSIX paths:

https://github.com/microsoft/snmalloc/blob/master/src/pal/pal_openbsd.h

Others are a bit more complex, with Windows providing its own codepaths:

https://github.com/microsoft/snmalloc/blob/master/src/pal/pal_windows.h

This abstraction layer is even sufficiently flexible that we can run inside the FreeBSD kernel with a handful of lines of code:

https://github.com/microsoft/snmalloc/blob/master/src/pal/pal_freebsd_kernel.h
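A minimal sketch of the class-per-platform pattern, assuming static (CRTP) dispatch rather than virtual calls. The class and method names are hypothetical and far simpler than snmalloc's actual PAL interface. A platform that is happy with the generic POSIX code just derives from it, as the OpenBSD one does; a platform that differs overrides only the hook it needs. This sketch compiles only on POSIX systems.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Generic POSIX code paths. OS is the concrete platform class (CRTP), so the
// generic code calls OS::page_size() and picks up any platform override
// without virtual dispatch.
template <typename OS>
struct PosixPal {
  static std::size_t page_size() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  }

  static void* reserve_pages(std::size_t count) {
    void* p = mmap(nullptr, count * OS::page_size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
  }

  static void release_pages(void* p, std::size_t count) {
    munmap(p, count * OS::page_size());
  }
};

// Like the OpenBSD case: accept every generic POSIX path unchanged.
struct GenericPal : PosixPal<GenericPal> {};

// A platform that overrides a single hook (a hypothetical fixed 64 KiB
// granularity) and inherits everything else.
struct BigPagePal : PosixPal<BigPagePal> {
  static std::size_t page_size() { return 64 * 1024; }
};

// Allocator code is written once against whichever PAL is selected.
template <typename Pal>
void* grab_pages(std::size_t count) {
  return Pal::reserve_pages(count);
}
```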

We're also able to tune aggressively for size class. If we know the size statically at compile time, or if we know it at free time, then we can be more accurate. For Objective-C, I'd like to steal one bit from the isa pointer or refcount to record whether we've used the extraBytes argument in class_createInstance and, if not, use the snmalloc API that doesn't need to look up the size. This should save two instructions and one or two cache lines in the free path for most Objective-C dealloc operations.
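A toy sketch of the stolen-bit idea, with everything hypothetical: `toy_heap::dealloc_sized` stands in for an allocator entry point that is told the size and can skip the lookup, `toy_heap::dealloc` for one that must find the size itself, and the flag lives in the top bit of a refcount word. The real runtime's object layout and the actual snmalloc calls are not shown; the counters exist only so the chosen path is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical allocator interface. Both frees forward to std::free;
// the counters record which entry point a dealloc took.
namespace toy_heap {
std::size_t sized_frees = 0;
std::size_t unsized_frees = 0;
void* alloc(std::size_t n) { return std::malloc(n); }
// Stand-in for a free that must look up the size from allocator metadata.
void dealloc(void* p) { ++unsized_frees; std::free(p); }
// Stand-in for a free that is given the size and can skip the lookup.
void dealloc_sized(void* p, std::size_t /*size*/) { ++sized_frees; std::free(p); }
}  // namespace toy_heap

// Stolen top bit of the refcount word: set if the instance was created with
// extra bytes, i.e. its size is not just the class's instance size.
constexpr std::uintptr_t kExtraBytesFlag =
    std::uintptr_t{1} << (sizeof(std::uintptr_t) * 8 - 1);

struct ObjHeader {
  std::uintptr_t refcount_word;
};

struct ToyClass {
  std::size_t instance_size;  // must be at least sizeof(ObjHeader)
};

void* create_instance(const ToyClass& cls, std::size_t extra_bytes) {
  assert(cls.instance_size >= sizeof(ObjHeader));
  void* mem = toy_heap::alloc(cls.instance_size + extra_bytes);
  if (mem == nullptr) return nullptr;
  ::new (mem) ObjHeader{extra_bytes != 0 ? kExtraBytesFlag : 0};
  return mem;
}

// The refcount proper is the word with the stolen bit masked off.
std::uintptr_t refcount(const void* obj) {
  return static_cast<const ObjHeader*>(obj)->refcount_word & ~kExtraBytesFlag;
}

void dealloc_instance(const ToyClass& cls, void* obj) {
  const auto* hdr = static_cast<const ObjHeader*>(obj);
  if ((hdr->refcount_word & kExtraBytesFlag) == 0) {
    // Common case: the size is exactly the class's instance size.
    toy_heap::dealloc_sized(obj, cls.instance_size);
  } else {
    toy_heap::dealloc(obj);
  }
}
```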

We originally built snmalloc for Verona, where we expect to do a lot of allocations on one thread and the corresponding frees on another thread. A lot of producer-consumer workloads have this characteristic and we've seen some incredible speedups reported for snmalloc over jemalloc on workloads like this and generally a modest improvement over mimalloc.

We are doing all of the CHERI temporal safety work on top of snmalloc. It is the building block for a fully memory-safe C/C++ environment. As a side effect of this (and of the fact that it's a C++ codebase), we've been adding smart-pointer types for all of the different kinds of pointer (e.g. pointers into free lists, pointers handed out to the malloc consumer). We are about to start a security audit and this makes it *far* easier for anyone to reason about the correctness of our code than it would be in a C codebase.
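The smart-pointer approach can be sketched in a few lines. This is a hypothetical illustration, not snmalloc's actual pointer types: each kind of pointer gets its own wrapper type, so passing a free-list pointer where a consumer pointer is expected fails to compile, and every conversion between kinds goes through a named function that an auditor can search for.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tags describing what a pointer is allowed to be used for.
struct FreeListTag {};  // points into an allocator free list
struct UserTag {};      // handed out to the malloc consumer

template <typename T, typename Tag>
class TaggedPtr {
 public:
  explicit TaggedPtr(T* p) : p_(p) {}
  // Named escape hatch, easy to find during review.
  T* unsafe_get() const { return p_; }

 private:
  T* p_;
};

using FreeListPtr = TaggedPtr<void, FreeListTag>;
using UserPtr = TaggedPtr<void, UserTag>;

// The only ways to change kind. Here they just re-tag the address; a real
// allocator could also validate or scrub metadata at these points.
UserPtr hand_out(FreeListPtr p) { return UserPtr(p.unsafe_get()); }
FreeListPtr take_back(UserPtr p) { return FreeListPtr(p.unsafe_get()); }

// Mixing kinds is rejected at compile time.
static_assert(!std::is_convertible<FreeListPtr, UserPtr>::value,
              "free-list pointers must not leak to the consumer");
static_assert(!std::is_convertible<UserPtr, FreeListPtr>::value,
              "consumer pointers must go through take_back");
```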

We've also done a lot of work to support sandboxing within an address space with snmalloc. This isn't quite finished but it gives us a very cheap way of allocating sandbox memory from either inside or outside a sandbox that can then be freed on the other side of the boundary cheaply. This is important for the Verona foreign-code model, which is built entirely on top of sandboxing.

All of that said, mimalloc is also evolving and was the source of several ideas that improve performance. My favourite was initialising the TLS for the allocator with a dummy allocator and then replacing it only on the slow paths. This took an extra 'has malloc been initialised' branch off the fast path for every allocation.
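A minimal single-size-class sketch of the placeholder-allocator trick as described above, with hypothetical names and no free path. The thread-local pointer starts out pointing at a shared placeholder whose free list is always empty, so the first fast-path allocation on a thread falls through to the slow path, which installs the real allocator. The empty-list test the fast path needs anyway doubles as the initialisation check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t kObjSize = 64;  // single hypothetical size class
constexpr std::size_t kBatch = 8;     // objects carved per refill

struct FreeObject {
  FreeObject* next;
};

struct ThreadAlloc {
  FreeObject* free_list = nullptr;
  bool is_real = false;
};

// Shared placeholder: nothing is ever pushed onto its free list, so any
// allocation attempted through it takes the slow path.
ThreadAlloc g_placeholder;
thread_local ThreadAlloc* t_alloc = &g_placeholder;
std::size_t g_slow_path_calls = 0;  // demo counter, not thread-safe

void* alloc_slow() {
  ++g_slow_path_calls;
  if (t_alloc == &g_placeholder) {
    // First allocation on this thread: swap in the real allocator.
    static thread_local ThreadAlloc real;
    real.is_real = true;
    t_alloc = &real;
  }
  // Refill: carve a batch from malloc (never returned in this toy), hand out
  // the first object and thread the rest onto the free list.
  char* chunk = static_cast<char*>(std::malloc(kObjSize * kBatch));
  if (chunk == nullptr) return nullptr;
  for (std::size_t i = 1; i < kBatch; ++i) {
    t_alloc->free_list =
        ::new (chunk + i * kObjSize) FreeObject{t_alloc->free_list};
  }
  return chunk;
}

// Fast path: one TLS load and one empty-list test; there is no separate
// "has malloc been initialised?" branch.
inline void* alloc_fast() {
  ThreadAlloc* a = t_alloc;
  FreeObject* obj = a->free_list;
  if (obj != nullptr) {
    a->free_list = obj->next;
    return obj;
  }
  return alloc_slow();
}
```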

Note that any comparison of total memory usage for snmalloc should be done with a system under memory pressure. On *NIX platforms that support some variant of `MADV_FREE`, we use it in preference to returning memory directly, so a process will show a large RSS until the OS decides it needs to reclaim some pages. On Windows, we use the low-memory notification and don't return pages until it has been triggered. This lets us grow to the available memory and only start returning pages when the OS tells us that we should.
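For the POSIX side, a minimal sketch of returning memory lazily might look like this (the helper name is hypothetical; this is not snmalloc's code). It tries `MADV_FREE` and falls back to `MADV_DONTNEED` when `MADV_FREE` is missing at compile time or rejected by the running kernel (older Linux kernels return an error for it). With `MADV_FREE` the pages can stay resident, and keep counting towards RSS, until the kernel is short of memory. The Windows side is not shown; the Win32 `CreateMemoryResourceNotification` API is one way to receive a low-memory notification.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: tell the kernel a range is unused but keep it mapped.
// MADV_FREE lets the kernel reclaim the pages lazily, only under memory
// pressure; if it is unavailable, fall back to MADV_DONTNEED, which on Linux
// discards private anonymous pages immediately.
int return_pages_lazily(void* p, std::size_t len) {
#ifdef MADV_FREE
  if (madvise(p, len, MADV_FREE) == 0) return 0;
#endif
  return madvise(p, len, MADV_DONTNEED);
}
```

Either way the mapping stays valid, so the allocator can reuse the range later without another system call to map it.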

David



