help-guix

Re: ABI mismatch on boot on arm32 system


From: Christoph Buck
Subject: Re: ABI mismatch on boot on arm32 system
Date: Wed, 06 Nov 2024 11:25:29 +0100
User-agent: mu4e 1.12.0; emacs 30.0.50

Hi Guix!

So I looked into the Guile source code and, as expected, the `scm_hash`
function (see hash.c in Guile) uses `unsigned long`, which is 8 bytes on
x64 and 4 bytes on arm32/i686. If `string-hash` is called with the size
parameter `n`, the hash value is reduced to that size by taking it
modulo `n`; see `scm_ihash` in hash.c:440, namely

> (unsigned long) scm_raw_ihash (obj, 10) % n

(The `10` can be ignored, as far as I can tell.) Since the raw hash
values differ between platforms, the results of the modulo differ as well.

However, if you step through the call stack of `string-hash`, you can
see that the actual hash value is calculated by the
`JENKINS_LOOKUP3_HASHWORD2` macro, which contains a rather interesting
comment and a possible workaround for the ABI problem, namely

--8<---------------cut here---------------start------------->8---
/* Scheme can access symbol-hash, which exposes this value.  For    \
   cross-compilation reasons, we ensure that the high 32 bits of    \
   the hash on a 64-bit system are equal to the hash on a 32-bit    \
   system.  The low 32 bits just add more entropy.  */              \
if (sizeof (ret) == 8)                                              \
    ret = (((unsigned long) c) << 32) | b;                          \
else                                                                \
    ret = c;                                                        \
--8<---------------cut here---------------end--------------->8---

in hash.c:82.

In other words, on an x64 platform the upper 32 bits of the 64-bit hash
are equal to the hash value on a 32-bit platform. A simple test case in
C++ looks like this:

--8<---------------cut here---------------start------------->8---
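#include <libguile.h>
#include <climits>
#include <iostream>

int main(int argc, char** argv)
{
    scm_init_guile();
    // Example input in the form hashed by compute-abi-cookie
    // (field names without properties).
    auto strToHash = scm_from_locale_string("((device) (mount-point))");
    // Use the largest possible bound so the modulo does not truncate the hash.
    auto maxULong = scm_from_ulong(ULONG_MAX);
    auto hashResult = scm_hash(strToHash, maxULong);
    auto hashResultUL = scm_to_ulong(hashResult);
    std::cout << "Max ULONG_MAX: " << ULONG_MAX << std::endl;
    std::cout << "Original hashResult ulong: " << hashResultUL << std::endl;

    // Where unsigned long is 8 bytes, the high 32 bits are the 32-bit hash.
    if (sizeof(hashResultUL) == 8)
    {
        std::cout << "Corrected for 32bit: " << (hashResultUL >> 32)
                  << std::endl;
    }
}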
--8<---------------cut here---------------end--------------->8---

On x64 this prints

> Max ULONG_MAX: 18446744073709551615
> Original hashResult ulong: 10454028974864831
> Corrected for 32bit: 2434018

and on arm32:

> Max ULONG_MAX: 4294967295
> Original hashResult ulong: 2434018

This suggests the following workaround: always limit the hash to 32
bits, even on a 64-bit platform (or, more precisely, a platform where
`unsigned long` is 8 bytes wide). Do this by shifting the hash value
right by 32 bits instead of relying on the size parameter of the
`string-hash` function.

In code, it could look something like this:

--8<---------------cut here---------------start------------->8---
(define (compute-abi-cookie field-specs)
  ;; Compute an "ABI cookie" for the given FIELD-SPECS.  We use
  ;; 'string-hash' because that's a better hash function than 'hash' on a
  ;; list of symbols.
  ;;
  ;; On a 64-bit host the high 32 bits of the raw hash equal the whole
  ;; hash on a 32-bit host, so keep only those instead of passing a size
  ;; bound to 'string-hash'.  This makes the cookie word-size independent.
  (syntax-case field-specs ()
    (((field get properties ...) ...)
     (let ((hash-value (string-hash
                        (object->string
                         (syntax->datum #'((field properties ...) ...))))))
       (if (= (native-word-size) 8)
           (ash hash-value -32)
           hash-value)))))
--8<---------------cut here---------------end--------------->8---

where `native-word-size` is defined by

--8<---------------cut here---------------start------------->8---
(define (native-word-size)
  ((@ (system foreign) sizeof) '*))
--8<---------------cut here---------------end--------------->8---

(taken from `cross-compilation.test`). There might be a cleaner way to
formulate this, but you get the point.

This seems to work for all combinations I tried on my machine:
x64 -> arm, x64 -> i686, i686 -> x64, ...

I can only think of two drawbacks:

1) Lost entropy on 64-bit machines.
2) An ABI break, because after recompilation the hash values on 64-bit
   platforms will change.

1) is IMHO irrelevant, because the hash is not used for anything
cryptographically important. As for 2), I am not sure how important
this is.

Any thoughts on this?

Might this be something worth fixing and sending a patch in?

Best regards

Christoph
