qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 3/5] module: add Error arguments to module_load and module


From: Claudio Fontana
Subject: Re: [PATCH v7 3/5] module: add Error arguments to module_load and module_load_qom
Date: Wed, 28 Sep 2022 14:02:48 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0

On 9/28/22 13:31, Markus Armbruster wrote:
> Claudio Fontana <cfontana@suse.de> writes:
> 
>> improve error handling during module load, by changing:
>>
>> bool module_load(const char *prefix, const char *lib_name);
>> void module_load_qom(const char *type);
>>
>> to:
>>
>> int module_load(const char *prefix, const char *name, Error **errp);
>> int module_load_qom(const char *type, Error **errp);
>>
>> where the return value is:
>>
>>  -1 on module load error, and errp is set with the error
>>   0 on module or one of its dependencies are not installed
>>   1 on module load success
>>   2 on module load success (module already loaded or built-in)
> 
> Two changes, if I understand things correctly:
> 
> 1. Convert to Error API from fprintf(stderr, ...)
> 
> 2. Return a more useful value
> 
> Right?

Yes.

> 
> Do you add any new errors here that weren't reported before?  Just

Yes.

> trying to calibrate my expectations before I dig into the actual patch.
> 
>> module_load_qom_one has been introduced in:
>>
>> commit 28457744c345 ("module: qom module support"), which built on top of
>> module_load_one, but discarded the bool return value. Restore it.
>>
>> Adapt all callers to emit errors, or ignore them, or fail hard,
>> as appropriate in each context.
>>
>> Some memory leaks also fixed as part of the module_load changes.
> 
> Where?

I misinterpreted the use of g_hash_table_add, so there is a fix for me here too 
(module_name should not be freed as the g_hash_table_add takes ownership).
The g_* API is a bit messy here, as the ownership and whether the key is freed 
depends on how the table was created (supplying a free function or not).

However, the code in the loop following the g_hash_table_add may "return 
false;" from the function directly,
skipping the cleanup code:

g_hash_table_add(loaded_modules, module_name);

-    for (modinfo = module_info; modinfo->name != NULL; modinfo++) {
-        if (modinfo->arch) {
-            if (strcmp(modinfo->name, module_name) == 0) {
-                if (!module_check_arch(modinfo)) {
-                    return false;
-                }
-            }
-        }
     }

So there is no chance to free at:

-    if (!success) {
-        g_hash_table_remove(loaded_modules, module_name);
-        g_free(module_name);
      }

But, I should not have taken the g_free(...) out of the test, will fix.



> 
>> audio: when attempting to load an audio module, report module load errors.
>> block: when attempting to load a block module, report module load errors.
>> console: when attempting to load a display module, report module load errors.
>>
>> qdev: when creating a new qdev Device object (DeviceState), report load 
>> errors.
>>       If a module cannot be loaded to create that device, now abort 
>> execution.
>>
>> qom/object.c: when initializing a QOM object, or looking up class_by_name,
>>               report module load errors.
>>
>> qtest: when processing the "module_load" qtest command, report errors
>>        in the load of the module.
> 
> This looks like a list of behavioral changes.  Appreciated!  It's a bit
> terse, though.  I might come back to this and suggest improvement.  But
> first, I need to understand the patch.
> 
>>
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>> ---
>>  audio/audio.c         |  16 ++--
>>  block.c               |  20 +++-
>>  block/dmg.c           |  14 ++-
>>  hw/core/qdev.c        |  17 +++-
>>  include/qemu/module.h |  37 +++++++-
>>  qom/object.c          |  18 +++-
>>  softmmu/qtest.c       |   8 +-
>>  ui/console.c          |  18 +++-
>>  util/module.c         | 211 +++++++++++++++++++++++-------------------
>>  9 files changed, 235 insertions(+), 124 deletions(-)
>>
>> diff --git a/audio/audio.c b/audio/audio.c
>> index 0a682336a0..ea51793843 100644
>> --- a/audio/audio.c
>> +++ b/audio/audio.c
>> @@ -72,20 +72,24 @@ void audio_driver_register(audio_driver *drv)
>>  audio_driver *audio_driver_lookup(const char *name)
>>  {
>>      struct audio_driver *d;
>> +    Error *local_err = NULL;
>> +    int rv;
>>  
>>      QLIST_FOREACH(d, &audio_drivers, next) {
>>          if (strcmp(name, d->name) == 0) {
>>              return d;
>>          }
>>      }
>> -
>> -    audio_module_load(name);
>> -    QLIST_FOREACH(d, &audio_drivers, next) {
>> -        if (strcmp(name, d->name) == 0) {
>> -            return d;
>> +    rv = audio_module_load(name, &local_err);
>> +    if (rv > 0) {
>> +        QLIST_FOREACH(d, &audio_drivers, next) {
>> +            if (strcmp(name, d->name) == 0) {
>> +                return d;
>> +            }
>>          }
>> +    } else if (rv < 0) {
>> +        error_report_err(local_err);
>>      }
>> -
>>      return NULL;
>>  }
>>  
> 
> Before: audio_module_load() reports to stderr, but the caller can't

before: reports _some_ errors to stderr

> know.  So, error or no error, search the driver registry for the one we
> want.  Return it if found, else fail.
> 
> After: if audio_module_load() fails, report to stderr or current
> monitor, and fail.  If it could find no module or loaded one, search the
> driver registry.  Return it if found, else fail.
> 
> What if audio_module_load() fails, but a search for the driver succeeds?
> Before the patch, we succeed.  

audio_module_load() is the only way that audio_drivers can be updated and the 
search would return a different result.
The previous code was just lazily running the same code twice to make it 
simpler for the programmer in those conditions.

Afterwards, we fail.  Can this happen?

No.

> Yes: module_load() starts with
> 
>        if (!g_module_supported()) {
>            error_setg(errp, "%s", "this platform does not support GLib 
> modules");
>            return -1;
>        }
> 
> Regression.
> 
> One way to fix: ensure module_load() cannot fail when the search will
> succeed.
> 
> Another: search first, and module_load() only if not found, then search
> again.
> 

Does not apply.

> Now let's review your use of the Error API.  Error objects should be
> passed up the call chain to the place that handles then.  The call
> chains are:
> 
> * audio_init(), which is called by
> 
>   - AUD_register_card(), which is called by a bunch of device creation
>     helpers called within machine initialization functions (I think),
>     which are called from qemu_init() via qemu_init_board() and
>     machine_run_board_init()
> 
>   - AUD_add_capture(), which is called by
> 
>     · wav_start_capture(), which is called by hmp_wavcapture on behalf
>       of HMP command wavcapture
> 
>     · audio_add(), which is called by protocol_client_msg() in respone
>       of a VNC_MSG_CLIENT_QEMU_AUDIO_ENABLE message from a VNC client (I
>       think).  Since audio_add() does return anything, its callers
>       remain oblivious of failure.
> 
>   - audio_init_audiodevs(), which is called by
>     qemu_create_early_backends(), which is called by qemu_init()
> 
> * audio_help(), which is called by
> 
>   - audio_parse_option(), which is called by qemu_init()
> 
>   - qemu_init()
> 
> * audio_handle_legacy_opts()
> 
>   - audio_init(); see above
> 
>   - audio_legacy_help(), which is called by qemu_init()
> 
> For the call chains that end in qemu_init(), error_report_err() is okay.
> Just beware of error cascades, i.e. one issue triggering multiple
> reports.  That's bad UX.  The extra reporting should predate this patch,
> which means it's no reason to reject this patch.
> 
> Likewise for call chains that end in an HMP command, because
> error_report_err() does the right thing then: it reports to the current
> monitor.  Bug fix: before the patch, it reports to stderr.  Commit
> message should mention the fix.
> 
> The problematic case is audio_add().  Reporting an error and then
> ignoring it feels wrong.  But it's how audio_add() works even before
> this patch.  Not this patch's problem to solve.
> 
>> diff --git a/block.c b/block.c
>> index 72c7f6d47d..7a94739aed 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -464,12 +464,18 @@ BlockDriver *bdrv_find_format(const char *format_name)
>>      /* The driver isn't registered, maybe we need to load a module */
>>      for (i = 0; i < (int)ARRAY_SIZE(block_driver_modules); ++i) {
>>          if (!strcmp(block_driver_modules[i].format_name, format_name)) {
>> -            block_module_load(block_driver_modules[i].library_name);
>> +            Error *local_err = NULL;
>> +            int rv = block_module_load(block_driver_modules[i].library_name,
>> +                                       &local_err);
>> +            if (rv > 0) {
>> +                return bdrv_do_find_format(format_name);
>> +            } else if (rv < 0) {
>> +                error_report_err(local_err);
>> +            }
>>              break;
>>          }
>>      }
>> -
>> -    return bdrv_do_find_format(format_name);
>> +    return NULL;
>>  }
>>  
> 
> Same regression, I think.

There is no problem here either.

> 
> Callers:
> 
> * bdrv_open_common()
> 
> * bdrv_fill_options()
> 
> * bdrv_open_inherit()
> 
> * bdrv_insert_node()
> 
> * bdrv_img_create()
> 
> * qmp_x_blockdev_amend()
> 
> * qmp_blockdev_create()
> 
> * qcow_co_create_opts()
> 
> * enable_write_target()
> 
> These all use the Error API.  Thus, we have instances of the
> "error_report() or similar from within a user of th Error API"
> anti-pattern.
> 
> Let's look more closely at just one of them: qmp_blockdev_create().
> Since we're in QMP monitor context, bdrv_find_format()'s
> error_report_err() will report a specific error to stderr, and then
> qmp_blockdev_create() will report a less specific one to the QMP client.
> This is wrong.
> 
> The obvious fix is to convert bdrv_find_format() to the Error API.
> 
> If you can't do that now, please note the issue in the commit message.
> 
>>  static int bdrv_format_is_whitelisted(const char *format_name, bool 
>> read_only)
>> @@ -976,12 +982,16 @@ BlockDriver *bdrv_find_protocol(const char *filename,
>>      for (i = 0; i < (int)ARRAY_SIZE(block_driver_modules); ++i) {
>>          if (block_driver_modules[i].protocol_name &&
>>              !strcmp(block_driver_modules[i].protocol_name, protocol)) {
>> -            block_module_load(block_driver_modules[i].library_name);
>> +            int rv = 
>> block_module_load(block_driver_modules[i].library_name, errp);
>> +            if (rv > 0) {
>> +                drv1 = bdrv_do_find_protocol(protocol);
>> +            } else if (rv < 0) {
>> +                return NULL;
>> +            }
>>              break;
>>          }
>>      }
>>  
>> -    drv1 = bdrv_do_find_protocol(protocol);
>>      if (!drv1) {
>>          error_setg(errp, "Unknown protocol '%s'", protocol);
>>      }
> 
> Same regression, I think.
> 
> bdrv_find_protocol() already uses the Error API, and your patch passes
> the Error up the chain.  Good.

This can be done here.

> 
> [...]
> 
> Out of steam for today, intend to continue later.  If you want to respin
> meanwhile, that's okay.  If you prefer to wait for me to finish
> reviewing this one, that's okay, too.
> 

I will be in vacation for two weeks, my concern is that this issue will be lost 
to the noise.
Will respin with the actual fixes, and hope this can be put into qemu and then 
move from there.

Thanks,

Claudio



reply via email to

[Prev in Thread] Current Thread [Next in Thread]