bug-hurd

Re: [RFC] Implementing RLIMIT_AS


From: Samuel Thibault
Subject: Re: [RFC] Implementing RLIMIT_AS
Date: Sun, 22 Dec 2024 02:35:08 +0100

Hello,

Diego Nieto Cid, on Thu, 19 Dec 2024 12:47:50 -0300, wrote:
> After playing a bit with the setrlimit calls in Linux to see how the
> resource RLIMIT_AS actually works, it seems to be limiting the amount
> of virtual memory available to the process (and not the available
> virtual address range).

Yes, see the POSIX documentation about RLIMIT_AS.

>   1. Add a `hard_limit` field, of type size_t, to the vm_map struct.
>      I started with a hard limit because I still need to research how
>      soft limits work.

What do you refer to by hard/soft?

> So, for now, it's a plain rejection with ENOMEM.

Yes, that's what we want in the end.

>   2. At vm_map_setup, initialize the `hard_limit` field with the
>      appropriate value which should be RLIM_INFINITY.

As mentioned on the contributing page, we can use the Linux default as a
rule of thumb: half the physical memory plus the size of swap.

>   3. Finally, enforce the limit in `vm_allocate`, `vm_map` and
>      `vm_allocate_contiguous` by checking that current map size
>      (`size` field) plus the requested size (`size` param) is less
>      than the current map's `hard_limit` field.

As mentioned in the thread, the check should *really* rather be made
inside the vm_map/pmap functions, so that it is shared by whatever
happens to allocate addressing space. Again, looking at what updates
the `size` field would probably be a good fit.

> I thought of adding an RPC call that sets the `hard_limit` field
> which, I guess, should be located among the other task related RPCs.

Yes, with the host port being an optional parameter for the case where
the limit is being raised.
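A hypothetical MIG sketch of such an RPC (the routine and parameter names are made up; only raising the limit would require the privileged host port):

```
routine task_set_hard_limit(
		target_task	: task_t;
		host_priv	: mach_port_t;	/* MACH_PORT_NULL unless raising the limit */
		hard_limit	: vm_size_t);
```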

Luca, on Thu, 19 Dec 2024 22:36:49 +0100, wrote:
> On 19/12/24 16:47, Diego Nieto Cid wrote:
> > After playing a bit with the setrlimit calls in Linux to see how the
> > resource RLIMIT_AS actually works, it seems to be limiting the amount
> > of virtual memory available to the process (and not the available
> > virtual address range).
> 
> I see that some limits (e.g. RLIMIT_DATA) are managed in glibc instead of
> gnumach, maybe this could be a simpler way to add some minimal support?

Probably not, because that wouldn't cover other vm_allocates (from
e.g. MIG etc.)

> I guess that one might overcome these limits by using directly the
> mach rpc or hijacking glibc, but it could be enough for some use
> cases.

For the targeted use cases (programs happily mallocating a lot of
memory), yes. But it will probably be quite simple to implement inside
gnumach.

> One big point to address is how to enforce the ability to change this limit,
> e.g. an unprivileged task shouldn't be able to increase its own memory
> limit. You could reuse the host privileged port, but maybe it could make
> sense to have a dedicated one for resource limits?

I'd say using the host port will be fine for now.

> > Also, I wanted to ask whether I covered all the allocation points
> > or there is somewhere else where limits shall be enforced. For instance,
> > something I still have to look at is the out-of-line data sent through
> > a mach_msg call.
> 
> One additional point would be at least in task_create(), I guess the new
> task would have the same restriction as the one creating it.

Yes, it should be getting inherited through task_create.

Diego Nieto Cid, on Thu, 19 Dec 2024 22:54:23 -0300, wrote:
> I've been testing RLIMIT_DATA...
> 
> On Thu, Dec 19, 2024 at 10:36:49PM +0100, Luca wrote:
> > I see that some limits (e.g. RLIMIT_DATA) are managed in glibc instead of
> > gnumach, maybe this could be a simpler way to add some minimal support? I
> > guess that one might overcome these limits by using directly the mach rpc or
> > hijacking glibc, but it could be enough for some use cases.
> 
> I failed to make it do anything. The only place where RLIMIT_DATA is read is
> in the Mach/Hurd specific `__hurd_set_brk` function.
> 
>     glibc-2.40/sysdeps/mach/hurd/brk.c:  rlimit = _hurd_rlimits[RLIMIT_DATA].rlim_cur;

Yes, and this is indeed enforcing RLIMIT_DATA as it should: it limits
the heap size.

> Also, I cannot make it fail with the attached test program.

Note that malloc() uses mmap() for big allocations, thus escaping
RLIMIT_DATA, as it should. You'd need a lot of small malloc()s to
actually make the heap grow and reach RLIMIT_DATA.

Luca, on Fri, 20 Dec 2024 10:29:33 +0100, wrote:
> This program also succeeds on Linux; according to setrlimit(2), you could
> try to use mmap() or sbrk() instead of malloc() to make it fail.

mmap() won't hit RLIMIT_DATA. sbrk() will, but better use the portable
malloc() call, just with small values.

Diego Nieto Cid, on Thu, 19 Dec 2024 19:54:31 -0300, wrote:
> > > 
> > >       I tried a lower value, like 2GB, but some process is mapping
> > >       4GB at once during boot and it just hangs when the allocation
> > >       fails.
> > 
> > Which process is that?
> 
> Its task name is `exec` and it's using vm_map. The log in question is:
> 
>     [vm_map] [task exec] map size: 0, requested size: 4294967296, hard limit: 2147483648

It'd be useful to get a backtrace. You can make your grub use
/hurd/exec.static instead of /hurd/exec, and use kdb's trace/u command
to get the userland backtrace easily. You could also add mach_print()s
in exec.c.

Luca, on Fri, 20 Dec 2024 10:25:02 +0100, wrote:
> are you working on x86_64? if yes, that could be the redzone configured
> here:
> 
> https://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/exec/exec.c#n1247

That's indeed a very good candidate.

One thing to note: it is a VM_PROT_NONE/VM_PROT_NONE area. We wouldn't
really want such areas to count toward RLIMIT_AS, as they are not meant
to store anything.

> > One additional point would be at least in task_create(), I guess the new
> > task would have the same restriction as the one creating it.
> 
> Yes, indeed. A quick look at kern/task.c shows I should check vm_map_fork:
> 
>     } else if (inherit_memory) {
>         new_task->map = vm_map_fork(parent_task->map);

Not really: exec does not set inherit_memory to 1; it always re-creates
a completely new task. What you want is to make task_create always
inherit the limit from the parent_task, if any.

> > > > ----
> > > > 
> > > > Index: gnumach-1.8+git20240714/vm/vm_map.c
> > > 
> > > It would be better to create the patches from the git repository instead 
> > > of
> > > the debian package, and then use git-format-patch and git-send-mail.
> > > 
> > 
> > Sorry, I've fallen to the temptation of `dpkg-buildpackage` and friends :)
> 
> I think the only debian patch you might need to be able to boot is the dde
> patch,

Even that patch shouldn't be needed nowadays: the support was committed
upstream, and only very old builds of netdde/rumpdisk would still need
the debian patch.

Sergey Bugaev, on Fri, 20 Dec 2024 12:18:36 +0300, wrote:
> > Index: gnumach-1.8+git20240714/vm/vm_map.c
> > ===================================================================
> > --- gnumach-1.8+git20240714.orig/vm/vm_map.c
> > +++ gnumach-1.8+git20240714/vm/vm_map.c
> > @@ -198,6 +198,9 @@ void vm_map_setup(
> >         map->first_free = vm_map_to_entry(map);
> >         map->hint = vm_map_to_entry(map);
> >         map->name = NULL;
> > +       /* TODO hardcoded limit for testing purposes, rather use 
> > RLIM_INFINITY */
> > +       /* TODO add RPC to update this limit */
> > +       map->hard_limit = 8l * 1024l * 1024l * 1024l;
> 
> I suppose the default state should be 'unlimited',

As mentioned above, we'd rather have some limitation by default.

> Hope that helps. And now some overall design questions for the feature
> (not to discourage you): why? do we actually want this limit? what's
> it useful for? isn't address space cheap? is it a sort of advisory
> limit, or is it meant to be robust against malicious tasks?

Not only malicious ones, but mostly dumb ones, which indeed do try to
malloc() to somehow see how much they can afford. Or which just go
haywire. Or which just check whether they get a failure (some glibc
tests do that).

> Isn't the limit trivial to work around by spawning a new task (forking
> at Unix level)?

Yes, but that's not a problem for the immediate issue at stake.

> Even if the new task inherits the parent's limit, you
> now have twice as much address space available. Moreover, the Hurd's
> exec server will happily give anyone a fresh new task derived from
> itself (as opposed to the caller) if you pass oldtask =
> MACH_PORT_NULL.

Good point; we'd probably want to use the caller's task if oldtask is
passed as NULL.

Sergey Bugaev, on Sat, 21 Dec 2024 22:06:10 +0300, wrote:
> > > do we actually want this limit?
> >
> > Hrm don't know :( I gathered from here[4] that it's something we'd like to
> > have. But I may have misunderstood Samuel on that.
> >
> > [4] https://lists.gnu.org/archive/html/bug-hurd/2024-12/msg00133.html
> 
> I cannot speak for Samuel of course :) but that too sounds like we'd
> want to put limits on memory usage and not on address space.

We could also want to put limits on actual memory usage (RLIMIT_RSS),
but here it is really the overcommit question that is at stake. Better
to limit processes when they call malloc() rather than when they
actually touch the memory. And that also shows that it's not really a
cgroup that we are after: we are not aiming for the user to see various
processes get ENOMEM failures, but rather to see the culprit process,
the one allocating like crazy, get an ENOMEM.

Samuel


