l4-hurd
Re: IPC etc. (was: Future Direction of GNU Hurd?)


From: William ML Leslie
Subject: Re: IPC etc. (was: Future Direction of GNU Hurd?)
Date: Sun, 28 Mar 2021 14:33:09 +1100

On Sat, 27 Mar 2021 at 08:16, Jonathan S. Shapiro
<jonathan.s.shapiro@gmail.com> wrote:
>
> On Wed, Mar 24, 2021 at 3:12 PM William ML Leslie 
> <william.leslie.ttg@gmail.com> wrote:
>> Coyotos has a 64 KB limit on the size of indirect strings, and these
>> get truncated if any of the pages are not prepared.  This gets much
>> more irritating in the async case, so being able to asynchronously say
>> "have these pages loaded, and then send this message" becomes
>> essential.
>
>
> I think the "have pages loaded" statement is a little confusing. I do not now 
> remember if the receive area is pre-probed or if the send is retried when a 
> receive page fault occurs. Either way, the actual requirement is that the 
> receive page must exist and be writable. This is a requirement in any copying 
> IPC.
>

Righto, that makes more sense.  So it's the receiver's responsibility
to ensure that the string is not truncated.
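
A toy model may make the truncation rule concrete. The sketch below is not
Coyotos kernel code: `recv_area`, `copy_string`, the 4 KB page size, and the
per-page `writable` flags are all assumptions made for illustration. It also
assumes the transfer simply stops at the first receive page that is missing or
read-only, whereas the quoted text leaves open whether the real kernel
pre-probes the receive area or retries the send. It shows only that the number
of bytes delivered is bounded by the 64 KB limit, by the receiver's stated
limit, and by the first receive page that is not present and writable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PG_SIZE     4096u                 /* assumed page size */
#define MAX_STRING  (64u * 1024u)         /* the 64 KB indirect string limit */
#define MAX_PAGES   (MAX_STRING / PG_SIZE)

/* Hypothetical receive area: a page-aligned buffer plus one
 * "present and writable" flag per page. */
struct recv_area {
    unsigned char *base;
    size_t         limit;                 /* bytes the receiver will accept */
    bool           writable[MAX_PAGES];
};

/* Copy as much of the string as the rules allow and return the number of
 * bytes transferred. Stops at the first receive page that is not writable. */
size_t copy_string(struct recv_area *rcv, const unsigned char *src, size_t len)
{
    assert(rcv != NULL && src != NULL);

    size_t want = len;
    if (want > MAX_STRING) want = MAX_STRING;
    if (want > rcv->limit) want = rcv->limit;

    size_t done = 0;
    while (done < want) {
        size_t page = done / PG_SIZE;
        if (!rcv->writable[page])
            break;                        /* truncated: receiver did not prepare this page */
        size_t chunk = PG_SIZE - (done % PG_SIZE);
        if (chunk > want - done)
            chunk = want - done;
        memcpy(rcv->base + done, src + done, chunk);
        done += chunk;
    }
    return done;
}
```

With a three-page receive area whose middle page is not writable, a three-page
send delivers only the first page. Once the receiver makes that page writable,
the same send delivers all three.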

It's possible (at least in a system with persistent objects) for the
receiver's endpoint not to be in main memory, regardless of the
presence of indirect strings in the message, so the system already
handles the case (in kern_Process) that it needs to prepare the
endpoint and process in order for the send to succeed.

> Concerning longer strings, my memory may no longer be correct and I don't 
> have time to look at the code right now. I remember that we went back and 
> forth on this - the Coyotos IPC mechanism already has a small state machine 
> because of scheduler activations, so it would be possible to add states for 
> "expecting more string payload". I'm going from memory here, but I think the 
> reason we abandoned that is that it creates a denial of service issue. The 
> receiver must temporarily be in an exclusive state with the sender. A bad 
> sender could exploit this by sending the initial 64K and then failing to 
> continue the protocol. A timeout would be required to handle this case. 64K 
> was as high as I was willing to go as a non-preemptible operation, and it's 
> probably too long.
>
> In Coyotos, for strings longer than this, a memory region named by a 
> capability should be used.
>

Sure.  I'm guessing that this is not a Region but rather a GPT or a
Window, but I'll tackle that later.

>> >> Another is a GC heap walk.  Most operating systems get very confused
>> >> by GC and get to the point where they make no progress, because they
>> >> see that pages were recently touched and so decide those pages
>> >> shouldn't be paged out.  Having the GC declare where its fingers are
>> >> and where they are headed as the requirement for residency gives the
>> >> power back to the process.
>> >
>> >
>> > Modern GCs use things like madvise() to alleviate this.
>> >
>>
>> For which we have no interface in Coyotos, yet.
>
>
> Have a look at cap_Page.c and the OC_coyotos_AddressSpace_erase operation. It 
> zeros the page. Pages that are known to be zero were not checkpointed, and 
> this would behave the same way in Coyotos when we implemented checkpointing.
>
> The current implementation of Coyotos is memory-only, so releasing the frame 
> doesn't make sense, but it *does* seem to me, looking now, that a parameter 
> could be added indicating that the page should be aggressively released from 
> the in-memory working pool.
>
> The necessary bottom-up recursion would need to be implemented by invoking 
> the capabilities from user level - there is no good way to do a large-scale 
> zero from the kernel level (I can see how to do it, but it would need to 
> become part of the kernel background collector).
>
> I think the right way to do this would be to add a new type of virtual copy 
> space that implements this as a service.
>

I still think this interface is a little backwards.  Consider: we've
got a large number of objects, many of which are deprepared.  Now
something sends a message to one of these objects or otherwise
schedules it.  It would be convenient to know which pages we should
schedule to load, rather than ping-ponging between loading a page,
executing a few instructions, and then faulting again.

This is basically the situation we're in when starting the system too.

So, I think it would be more convenient to have a small explicit list
somewhere of stuff we should have pre-loaded.
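
One way to picture such a list, as a minimal sketch only: each object carries
a short array of page identifiers, and when the object is scheduled the pager
collects every listed page that is not yet resident and issues those loads in
one batch. Every name here (`preload_list`, `store`, `plan_preload`, the list
size of 8, and the 64-page toy store) is hypothetical and not taken from
Coyotos or CapROS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PRELOAD_MAX 8        /* "small explicit list": the size is an assumption */
#define N_PAGES     64       /* size of the toy object store */

typedef uint64_t page_id;    /* hypothetical page identifier, an index into the store */

/* Pages an object would like resident before it runs. */
struct preload_list {
    size_t  count;
    page_id pages[PRELOAD_MAX];
};

/* Toy stand-in for the pager's view of which pages are in memory. */
struct store {
    bool resident[N_PAGES];
};

/* Write every listed page that is not yet resident into `out` (up to
 * `out_cap` entries) so the loads can be issued together. Returns the
 * number of loads planned. */
size_t plan_preload(const struct preload_list *pl, const struct store *st,
                    page_id *out, size_t out_cap)
{
    assert(pl != NULL && st != NULL && out != NULL);
    assert(pl->count <= PRELOAD_MAX);

    size_t n = 0;
    for (size_t i = 0; i < pl->count && n < out_cap; i++) {
        page_id p = pl->pages[i];
        assert(p < N_PAGES);
        if (!st->resident[p])
            out[n++] = p;    /* needs loading before the object runs */
    }
    return n;
}
```

The point of the design is that the loads are known before the first
instruction runs, instead of being discovered one page fault at a time.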

>> >> CapROS (and presumably EROS, too) has a set of non-persistent
>> >> applications that most of the persistent processes depend on.  It
>> >> feels a little like how GNU shepherd and systemd pre-open sockets and
>> >> pretend the application is already available.
>> >
>> >
>> > These were for drivers only. They never worked to my satisfaction.
>> >
>>
>> I'm not even sure what you settled on here.  FWIR we went back and
>> forth on orthogonal persistence a bunch of times.
>
>
> It stopped being an issue in Coyotos when we focused on the initial, 
> memory-only implementation.
>
> Assuming you can reinstate the capabilities (which the Endpoint object would 
> permit), there is no problem with a persistent process calling a 
> non-persistent process.
>
> The problem comes when the driver wants to send you a message back after a 
> restart has occurred. If the caller is in a "receive wait" state, it will 
> never get woken up. Since the driver has been re-created, it no longer holds 
> the reply cap.
>
> Just thinking out loud, I can now see two ways this could be handled:
>
> 1. Make the driver Endpoint objects in both directions persistent, and
>    have a registry and protocol for re-establishing the process capability
>    that points to the driver process (on the send side) and the reply
>    capability that points to the receiver (on the driver side). The driver
>    side of this is the part that is the nuisance.
> 2. Alternatively, register the Endpoints as before, and have a restart
>    agent that uses them to perform a SEND to each direct driver client
>    advising that the driver has been restarted and a connection should be
>    re-built.
>
> I'm sure neither of these is quite right, but I think either one could be 
> made to work with some tinkering.
>

Thanks, I'll have a go at it soon enough.  I've been really grateful
that the CapROS source is online, and it has plenty of examples of
drivers serving persistent applications.
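
As a rough illustration of the restart-agent alternative quoted above, the
sketch below models a registry of driver clients and an agent that notifies
each one after the driver is re-created. It is a toy under stated assumptions:
the `client` struct, the state names, and `notify_driver_restart` are all
hypothetical, and the state change merely stands in for a real SEND through
the client's registered Endpoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client states; the names are not from Coyotos or CapROS. */
enum client_state {
    CLIENT_RUNNING,          /* not currently waiting on the driver */
    CLIENT_AWAITING_REPLY,   /* blocked in "receive wait" for a driver reply */
    CLIENT_MUST_RECONNECT    /* told that the driver was restarted */
};

struct client {
    unsigned          endpoint_id;   /* stands in for a registered Endpoint */
    enum client_state state;
};

/* Toy restart agent: advise every registered client that the driver was
 * restarted. Returns how many clients were stuck waiting for a reply that
 * the re-created driver could no longer send. */
size_t notify_driver_restart(struct client *clients, size_t count)
{
    assert(clients != NULL || count == 0);

    size_t woken = 0;
    for (size_t i = 0; i < count; i++) {
        if (clients[i].state == CLIENT_AWAITING_REPLY)
            woken++;         /* this client would otherwise never wake up */
        /* Stands in for a SEND through the client's registered Endpoint. */
        clients[i].state = CLIENT_MUST_RECONNECT;
    }
    return woken;
}
```

Each notified client is then responsible for rebuilding its own connection,
which keeps the reply-capability bookkeeping out of the driver.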

--
William Leslie

Q: What is your boss's password?
A: "Authentication", clearly

Notice:
Likely much of this email is, by the nature of copyright, covered
under copyright law.  You absolutely MAY reproduce any part of it in
accordance with the copyright law of the nation you are reading this
in.  Any attempt to DENY YOU THOSE RIGHTS would be illegal without
prior contractual agreement.


