qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 5/5] accel: abort if we fail to load the accelerator plugi


From: Claudio Fontana
Subject: Re: [PATCH v6 5/5] accel: abort if we fail to load the accelerator plugin
Date: Mon, 26 Sep 2022 13:21:41 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0

On 9/26/22 12:56, Kevin Wolf wrote:
> Am 26.09.2022 um 09:58 hat Claudio Fontana geschrieben:
>> On 9/24/22 14:35, Philippe Mathieu-Daudé via wrote:
>>> On 24/9/22 01:21, Claudio Fontana wrote:
>>>> if QEMU is configured with modules enabled, it is possible that the
>>>> load of an accelerator module will fail.
>>>> Abort in this case, relying on module_object_class_by_name to report
>>>> the specific load error if any.
>>>>
>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>>> ---
>>>>   accel/accel-softmmu.c | 8 +++++++-
>>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/accel/accel-softmmu.c b/accel/accel-softmmu.c
>>>> index 67276e4f52..9fa4849f2c 100644
>>>> --- a/accel/accel-softmmu.c
>>>> +++ b/accel/accel-softmmu.c
>>>> @@ -66,6 +66,7 @@ void accel_init_ops_interfaces(AccelClass *ac)
>>>>   {
>>>>       const char *ac_name;
>>>>       char *ops_name;
>>>> +    ObjectClass *oc;
>>>>       AccelOpsClass *ops;
>>>>   
>>>>       ac_name = object_class_get_name(OBJECT_CLASS(ac));
>>>> @@ -73,8 +74,13 @@ void accel_init_ops_interfaces(AccelClass *ac)
>>>>   
>>>>       ops_name = g_strdup_printf("%s" ACCEL_OPS_SUFFIX, ac_name);
>>>>       ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));
>>>> +    oc = module_object_class_by_name(ops_name);
>>>> +    if (!oc) {
>>>> +        error_report("fatal: could not load module for type '%s'", 
>>>> ops_name);
>>>> +        abort();
>>>
>>> I still think a coredump won't help at all to figure the problem here: a 
>>
>> I can change this from abort to exit(1), the issue I am seeing is, usually 
>> when we fail to create or initialize objects
>> we seem to be using abort(), the most prominent examples are in qom/object.c:
>>
>> static TypeImpl *type_new(const TypeInfo *info)
>> {
>>     TypeImpl *ti = g_malloc0(sizeof(*ti));
>>     int i;
>>
>>     g_assert(info->name != NULL);
>>
>>     if (type_table_lookup(info->name) != NULL) {
>>         fprintf(stderr, "Registering `%s' which already exists\n", 
>> info->name);
>>         abort();
>>     }
>>
>> ...
>>
>> void object_initialize(void *data, size_t size, const char *typename)
>> {
>>     TypeImpl *type = type_get_by_name(typename);
>>
>> #ifdef CONFIG_MODULES
>>     if (!type) {
>>         Error *local_err = NULL;
>>         int rv = module_load_qom(typename, &local_err);
>>         if (rv > 0) {
>>             type = type_get_by_name(typename);
>>         } else if (rv < 0) {
>>             error_report_err(local_err);
>>         }
>>     }
>> #endif
>>     if (!type) {
>>         error_report("missing object type '%s'", typename);
>>         abort();
>>     }
>>
>>     object_initialize_with_type(data, size, type);
>> }
>>
>>
>> Do you propose to change only the assert in accel_init_ops_interfaces
>> to exit(1)?
>>
>> Or the other case as well in the series? (ie hw/core/qdev.c qdev_new()
>> ?)
>>
>> Do you propose to change this consistently through the codebase
>> including the object.c snippets above?
> 
> The difference with the snippets above (in the non-module case) is that
> calling object_new() with a type that doesn't exist is a bug, it's an
I see where you are going, however in the object initialization case,
we still have the possibility that the type is available only after module load.

So according to your differentiation, we would have to exit() in the module 
case, and abort() if modules are disabled.


> programming error. Calling type_new() twice for the same TypeInfo or for
> two TypeInfos with the same name is a programming error, too. abort() is
> correct for situations that should never happen in a bug free QEMU.
> 
> Not being able to load a module is generally not a bug in QEMU, it's an
> error of external origin. So here abort() is not appropriate.
> 
> The CONFIG_MODULES code in object_initialize() is problematic because it
> doesn't have a way to deal with an error case that can happen without a
> bug in QEMU. Without changing the prototype of the function to actually
> allow error returns (which I suspect might be a very invasive change),
> maybe the best approach is to just make it a fatal error and leave the
> code mostly as it is in current master:
> 
> #ifdef CONFIG_MODULES
>     if (!type) {
>         /* Assuming that module_load_qom_one() returns an error if the
>          * module doesn't exist */
>         module_load_qom_one(typename, &error_fatal);
>         type = type_get_by_name(typename);
>     }
> #endif
>     if (!type) {
>         error_report("missing object type '%s'", typename);
>         abort();
>     }
> 
>     object_initialize_with_type(data, size, type);
> 
> This makes it print an error message and exit(). Which is honestly not
> great during runtime because it doesn't properly shut down QEMU, let
> alone just fail the operation and keep running, but at least slightly
> better than abort().

Ok so the core of the issue is, if CONFIG_MODULES we want to exit() and not 
abort() if the type is not available.

> 
>>> module is missing, we know its name. Anyhow I don't mind much, and this
>>> can be cleaned later, so:
>>
>> Sure this could be fixed later with a series that tries to use exit()
>> vs abort() consistently throughout the codebase when initializing and
>> creating objects.
> 
> This should mean consistently distinguishing programming errors (i.e.
> QEMU bugs) from errors of external origin.
> 
> Kevin
> 

Thanks,

I'll have a new pass with this in mind.

Claudio



reply via email to

[Prev in Thread] Current Thread [Next in Thread]