qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 41/42] target/arm: Implement FEAT_HAFDBS


From: Richard Henderson
Subject: Re: [PATCH v3 41/42] target/arm: Implement FEAT_HAFDBS
Date: Fri, 7 Oct 2022 09:45:47 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 10/7/22 06:47, Peter Maydell wrote:
On Sat, 1 Oct 2022 at 18:04, Richard Henderson
<richard.henderson@linaro.org> wrote:

Perform the atomic update for hardware management of the access flag
and the dirty bit.

A limitation of the implementation so far is that the page table
itself must already be writable, i.e. the dirty bit for the stage2
page table must already be set, i.e. we cannot set both dirty bits
at the same time.

This is allowed because it is CONSTRAINED UNPREDICTABLE whether any
atomic update happens at all.  The implementation is allowed to simply
fall back on software update at any time.

I can't see where in the Arm ARM this is stated.

In any case, if HA is set you can't simply return ARMFault_AccessFlag
without breaking the bit in D5.4.12 which says
"When hardware updates of the Access flag are enabled for a stage of
  translation an address translation instruction that uses that stage
  of translation will not report that the address will give rise to
  an Access flag fault in the PAR"

I think this may have been loose (or incorrect) reading of what has become R_QSPMS in I_a, via "access generates ... a Permission fault", due to the pte page being read-only, due to the dirty bit being clear.

However, having performed the same atomic update conversion for x86, I can see now that I merely need to perform the lookup for s1 ptw with MMU_DATA_STORE rather than MMU_DATA_LOAD in order for the s1 pte page to have its own dirty bit processed and become writable.

+    t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 2);   /* FEAT_HAFDBS */

I think we should split the access flag update and the
dirty-bit update into separate commits. It might be useful
for bisection purposes later, and it looks like they're pretty
cleanly separable. (Though if you look at my last comment in this
email, maybe not quite so clean as in the code as you have it here.)

Shouldn't be too hard to split.  I'll try, at least.

+static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
+                             uint64_t new_val, const S1TranslateResult *s1,
+                             ARMMMUFaultInfo *fi)
+{
+    uint64_t cur_val;
+
+    if (unlikely(!s1->hphys)) {
+        fi->type = ARMFault_UnsuppAtomicUpdate;
+        fi->s1ptw = true;
+        return 0;
+    }
+
+#ifndef CONFIG_ATOMIC64
+    /*
+     * We can't support the atomic operation on the host.  We should be
+     * running in round-robin mode though, which means that we would only
+     * race with dma i/o.
+     */
+    qemu_mutex_lock_iothread();

Are there definitely no code paths where we might try to do
a page table walk with the iothread already locked ?

I'll double-check, but another possibility is to simply perform the atomic operation on the low 32-bits, where both AF and DB are located. Another trick I learned from x86...

+        old_des = descriptor;
+        if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+            new_des = descriptor | (1ull << 7);   /* S2AP[1] */
+        } else {
+            new_des = descriptor & ~(1ull << 7);  /* AP[2] */
+        }

If we update the prot bits, we need to also re-calculate the exec bit,
I think, because the execute-never stuff depends on whether the page is
writeable. Alternatively you can do it the way the pseudocode does and
pre-figure-out the final permission bits value (see AArch64.S1ApplyOutputPerms()
and AArch64.S2ApplyOutputPerms()) so you only need to calculate the
exec bit once.

Good catch.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]