[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3.1 25/31] hostmem: add properties for NUMA mem
From: |
Hu Tao |
Subject: |
Re: [Qemu-devel] [PATCH v3.1 25/31] hostmem: add properties for NUMA memory policy |
Date: |
Mon, 9 Jun 2014 10:12:07 +0800 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Fri, Jun 06, 2014 at 01:15:28PM -0300, Eduardo Habkost wrote:
> On Fri, Jun 06, 2014 at 11:37:26AM +0800, Hu Tao wrote:
> > On Mon, May 19, 2014 at 08:34:54PM -0300, Eduardo Habkost wrote:
> > > On Tue, May 06, 2014 at 05:27:46PM +0800, Hu Tao wrote:
> > > [...]
> > > > @@ -203,6 +296,20 @@ host_memory_backend_memory_init(UserCreatable *uc,
> > > > Error **errp)
> > > > if (backend->prealloc) {
> > > > os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz);
> > > > }
> > > > +
> > > > +#ifdef CONFIG_NUMA
> > > > + unsigned long maxnode = find_last_bit(backend->host_nodes,
> > > > MAX_NODES);
> > > > +
> > > > + /* This is a workaround for a long standing bug in Linux'
> > > > + * mbind implementation, which cuts off the last specified
> > > > + * node.
> > > > + */
> > >
> > > What if the bug is fixed? mbind() documentation says "nodemask points to
> >
> > No it won't, otherwise softwares depend on mbind() will break.
> >
> > > a bit mask of nodes containing up to maxnode bits", so we must ensure
> > > backend->host_nodes has the one extra bit.
> >
> > Yes.
> >
> > >
> > > Also, if no bit is set, we can pass nodemask=NULL or maxnode=0 as
> > > argument.
> > >
> > > We could address both issues, and do this:
> > >
> > > struct HostMemoryBackend { [...]
> > > DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
> > > [...]
> > > lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
> > > /* lastbit == MAX_NODES means maxnode=0 */
> > > maxnode = (lastbit + 1) % (MAX_NODES + 1);
> > > /* We can have up to MAX_NODES nodes, but we need to pass maxnode+1
> > > * as argument to mbind() due to an old Linux bug (feature?) which
> > > * cuts off the last specified node. This means backend->host_nodes
> > > * must have MAX_NODES+1 bits available.
> > > */
> > > assert(sizeof(backend->host_nodes) >= BITS_TO_LONGS(MAX_NODES + 1) *
> > > sizeof(unsigned long));
> > > assert(maxnode <= MAX_NODES);
> >
> > I think we can just omit these two asserts since they are guaranteed to
> > be true.
>
> asserts() must be always guaranteed to be true, that's the whole point
> of using them. They can detect subtle off-by-one bugs if somebody
> introduces them in the future.
>
> >
> > > mbind(ptr, sz, policy, maxnode ? backend->host_nodes : NULL, maxnode
> > > + 1, flags);
> > >
> > >
> > > (I am starting to wonder if it was worth dropping the libnuma
> > > requirement and implementing our own mbind()-calling code.)
> > >
> > > > + if (mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode +
> > > > 2, 0)) {
> > > > + error_setg_errno(errp, errno,
> > > > + "cannot bind memory to host NUMA nodes");
> > >
> > > Don't we want to set flags to MPOL_MF_STRICT here? I believe we
> > > shouldn't have any pages preallocated at this point, but in case we do,
> > > I would expect them to be moved instead of ignoring the policy set by
> > > the user.
> >
> > MPOL_MF_STRICT | MPOL_MF_MOVE to move. Actually in this version the
> > preallocation happens before mbind, which is fixed in v3.2.
>
> If memory was already allocated in a different node and has to be moved
> that early, that's a bug we want to detect and fix (instead of
> triggering useles memory moves). So I would use only MPOL_MF_STRICT.
Fair enough. But what about huge pages? As man page says, MPOL_MF_STRICT
is ignored on huge page mappings. Is leaving a comment at the place of
memory preallocation to warn people against alocating memory before mbind
(like it's done in v3.2) the only thing we can do?
Regards,
Hu