qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()


From: David Hildenbrand
Subject: Re: [Qemu-devel] [PATCH v1 2/4] s390x/tcg: Introduce probe_read_access()
Date: Wed, 21 Aug 2019 23:33:29 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 21.08.19 22:38, Richard Henderson wrote:
> On 8/21/19 12:36 PM, David Hildenbrand wrote:
>>>> There are certain cases where we can't get access to the raw host
>>>> page. Namely, cpu watchpoints, LAP, NODIRTY. In summary: this won't
>>>> as you describe. (my first approach did exactly this)
>>>
>>> NODIRTY and LAP are automatically handled via probe_write
>>> faulting instead of returning the address.  I think there
>>> may be a bug in probe_write at present not checking
>>
>> For LAP pages we immediately set TLB_INVALID_MASK again, to trigger a
>> new fault on the next write access (only). The could be handled in
>> tlb_vaddr_to_host(), simply returning the address to the page after
>> trying to fill the tlb and succeeding (I implemented that, that's the
>> easy part).
> 
> Yes.
> 
>> TLB_NOTDIRTY and TLB_MMIO are the real issue. We don't want to refault,
>> we want to treat that memory like IO memory and route it via
>> MemoryRegionOps() - e.g., watch_mem_ops() in qemu/exec.c. So it's not a
>> fault but IO memory.
> 
> Watchpoints are not *really* i/o memory (unless of course it's a watchpoint on
> a device, which is a different matter), that's merely how we've chosen to
> implement it to force the memory operation through the slow path.  We can, and
> probably should, implement this differently wrt probe_write.

Yes, I agree wrt probe_write.

> 
> Real MMIO can only fault via cc->transaction_failed(), for some sort of bus
> error.  Which s390x does not currently implement.  In the meantime, a
> probe_write proves that the page is at least mapped correctly, so we have
> eliminated the normal access fault.

Yes, and that's all we care about on s390x.

> 
> NOTDIRTY cannot fault at all.  The associated rcu critical section is ugly
> enough to make me not want to do anything except continue to go through the
> regular MMIO path.
> 
> In any case, so long as we eliminate *access* faults by probing the page 
> table,
> then falling back to the byte-by-byte loop is, AFAICS, sufficient to implement
> the instructions correctly.

"In any case, so long as we eliminate *access* faults by probing the
page table" - that's what I'm doing in this patch (and even more correct
in the prototype patch I shared), no? (besides the watchpoint madness below)

"falling back to the byte-by-byte loop is, AFAICS, sufficient"

I don't think this is sufficient. E.g., LAP protected pages
(PAGE_WRITE_INV which immediately requires a new MMU walk on the next
access) will trigger a new MMU walk on every byte access (that's why I
chose to pre-translate in my prototype). If another CPU modified the
page tables in between, we could suddenly get a fault - although we
checked early. What am I missing?

> 
>> probe_write() performs the MMU translation. If that succeeds, there is
>> no fault. If there are watchpoints, the memory is treated like IO and
>> memory access is rerouted. I think this works as designed.
> 
> Well, if BP_STOP_BEFORE_ACCESS, then we want to raise a debug exception before
> any changes are made.  If !BP_STOP_BEFORE_ACCESS, then we longjmp back to the
> main cpu loop and single-step the current instruction.

I see that we use BP_STOP_BEFORE_ACCESS for PER (Program Event
Recording) on s390x. I don't think that's correct. We want to get
notified after the values were changed.

"A storage-alteration event occurs whenever a CPU,
by using a logical or virtual address, makes a store
access without an access exception to the storage
area designated by control registers 10 and 11. ..."

"For a PER instruction-fetching nullification event, the
unit of operation is nullified. For other PER events,
the unit of operation is completed"

Oh man, why is everything I take a look at broken.

> 
> In the latter case, if the instruction has had any side effects prior to the
> longjmp, they will be re-done when we re-start the current instruction.
> 
> To me this seems like a rather large bug in our implementation of watchpoints,
> as it only really works properly for simple load/store/load-op-store type
> instructions.  Anything that works on many addresses and doesn't delay side
> effects until all accesses are complete will Do The Wrong Thing.
> 
> The fix, AFAICS, is for probe_write to call check_watchpoint(), so that we
> take the debug exit early.

Indeed. I see what you mean now. (I was ignoring the "before access"
because I was assuming we don't need it on s390x)

probe_write() would have to check for all BP_STOP_BEFORE_ACCESS watchpoints.

> 
>> You mean two pages I assume. Yeah, I could certainly simplify the
>> prototype patch I have here quite a lot. 2 pages should be enough for
>> everybody ;)
> 
> Heh.  But, seriously, TARGET_PAGE_SIZE bytes is enough at one go, without
> releasing control so that interrupts etc may be recognized.

Yes, that's what I mean, TARGET_PAGE_SIZE, but eventually crossing a
page boundary. The longer I stare at the MVCL code, the more broken it
is. There are more nice things buried in the PoP. MVCL does not detect
access exceptions beyond the next 2k. So we have to limit it there
differently.

So what I understand is that

- we should handle watchpoints in probe_write()
- not bypass IO memory (especially NOTDIRTY). We cannot always relay on
  getting access to a host page.

Thanks!

-- 

Thanks,

David / dhildenb



reply via email to

[Prev in Thread] Current Thread [Next in Thread]