Re: [PATCH v2 2/4] smbus: Fix spd_data_generate() error API violation

qemu-devel

From:	BALATON Zoltan
Subject:	Re: [PATCH v2 2/4] smbus: Fix spd_data_generate() error API violation
Date:	Mon, 29 Jun 2020 23:31:51 +0200 (CEST)
User-agent:	Alpine 2.22 (BSF 395 2020-01-19)

On Mon, 29 Jun 2020, Philippe Mathieu-Daudé wrote:

On 6/27/20 9:17 AM, Markus Armbruster wrote:

BALATON Zoltan <balaton@eik.bme.hu> writes:

On Wed, 22 Apr 2020, BALATON Zoltan wrote:

On Wed, 22 Apr 2020, Philippe Mathieu-Daudé wrote:

On 4/22/20 4:27 PM, BALATON Zoltan wrote:

On Wed, 22 Apr 2020, Markus Armbruster wrote:

The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.

spd_data_generate() can pass @errp to error_setg() more than once when
it adjusts both memory size and type.  Harmless, because no caller
passes anything that needs adjusting.  Until the previous commit,
sam460ex passed types that needed adjusting, but not sizes.

spd_data_generate()'s contract is rather awkward:

   If everything's fine, return non-null and don't set an error.

   Else, if memory size or type need adjusting, return non-null and
   set an error describing the adjustment.

   Else, return null and set an error reporting why no data can be
   generated.

Its callers treat the error as a warning even when null is returned.
They don't create the "smbus-eeprom" device then.  Suspicious.

Since the previous commit, only "everything's fine" can actually
happen.  Drop the unused code and simplify the callers.  This gets rid
of the error API violation.


This leaves board code no chance to recover from values given by
user that won't fit without duplicating checks that this function
does. Also this will abort without giving meaningful errors if an
invalid value does get through and result in a crash which is not
user friendly. So I don't like this but if others think this is
acceptable maybe at least unit tests should be adjusted to make
sure aborts cannot be triggered by user for values that are not
usually tested during development.


Agreed. Do you have an example (or more) to better show Markus how this
code is used? So we can add tests.


After Markus's patches probably nothing uses it any more but this
comes with the result that previously giving some random value such
as -m 100 did produce a working sam460ex machine after some warnings
but now it just throws back some errors to the user which may or may
not be helpful to them.

Personally I'd use a script to generate a dumb static array of all
possible sizes...


Maybe testing with the biggest valid value such as -m 2048 (that's
commonly used probably) and an invalid value such as -m 100 might be
enough. Testing all possible values might take too long and would
not test what happens with invalid values. Ideally those invalid
values should also work like before a0258e4afa but should at least
give a meaningful warning so the user can fix the command line
without too much head scratching. Actually that commit was from Igor
not from Markus so sorry for attributing that to Markus too, I
remembered wrong.

By the way you could argue that on real machine you cannot plug
certain combinations of memory modules so it's enough to model that
but I think QEMU does not have to be that strict and also support
configs that cannot happen on real hardware but would work. This
might be useful for example if you have some amount of memory to
set aside for a VM on a host but that's not a size that exists in
memory modules on real hardware. This also works on pc machine in
qemu-system-i386 for example: it accepts -m 100 and does its best to
create a machine with such unrealistic size. The sam460ex did the
same (within SoC's limits) and before a0258e4afa -m 100 was fixed up
to 96 MB which is now not possible due to change in QEMU internal
APIs. This probably isn't important enough to be worth the extra effort
to support but would have been nice to preserve.


Besides the above here's another use case of the fix ups that I wanted
to keep:

https://patchew.org/QEMU/cover.1592315226.git.balaton@eik.bme.hu/b5f4598529a77f15f554c593e9be2d0ff9e5fab3.1592315226.git.balaton@eik.bme.hu/

This board normally uses OpenBIOS which gets RAM size from fw_cfg and
so works with whatever amount of RAM (also Linux booted with -kernel
probably does not care) so any -memory value is valid. However some
may want to also use original firmware ROM for compatibility which
detects RAM reading SPD eeproms (the i2c emulation needed for that is
not working yet but once that's fixed this will be the case). I want
to add smbus_eeproms for this but do not want to just abort for cases
where -memory given by user cannot be covered with SPD data. Instead a
warning and covering as much RAM as possible should be enough (the ROM
will detect less RAM than given with -m but that's OK and better than
just bailing out without a message, tripping an assert). But I don't
want to replicate in board code the calculation and checks the
spd_data_generate() function does anyway (that would just puzzle
reviewers for every use of this function).

Previously this was possible with my original spd_data_generate()
implementation. What's your suggestion to bring that functionality
back without breaking Error API? Maybe adding new parameters to tell
the spd_data_generate() which fixups are allowed?


Quick reply without having thought through the issues at all: I'm not
opposed to you doing work to enable additional or even arbitrary memory
sizes where these actually work.  I'm first and foremost opposed to me
wasting time on "improving" code that is not used for anything.  That's
why I dumbed down spd_data_generate().


I'm starting to understand Zoltan's point. What I'm seeing is Zoltan
using hobbyist code, that just happens to work for hobbyists, but gets
in the way of enterprise quality standards.

This is not necessarily a conflict between hobbyist vs enterprise but
more like a different view on what the qemu-system-* CLI should be. I
think the CLI is the main human interface of QEMU as it does not really
provide a GUI for configuring or running VMs (as for example VirtualBox
does, QEMU only has a minimal GUI to view and control running VMs) so
users are forced to use either the command line or maybe an external
management frontend, but for simple things (like hobbyist use) that's
overkill and also not a good match as those are designed for enterprise
use. (Also these hobbyists are on Windows or macOS where these
management apps are not available and getting a working QEMU binary is
already a challenge.)

The problem is that these management frontends don't have a proper API
to control QEMU but abuse the CLI and QEMU monitor for this, which are
supposed to be human interfaces in the first place, but changing the
commands for the needs of management apps results in arcane command
lines. Note that humans and management apps likely have different
requirements so if you mean hobbyist = human and enterprise = management
frontend then that's about what my problem is. I think humans and
management apps could coexist using the same interfaces if these cannot
be cleanly separated (as that would need either changing management apps
to use something else than the main human interface or providing a
proper GUI or CLI frontend for humans) but if they use the same CLI then
allowing some convenience commands to make the life of humans easier
should not be forbidden. Running a VM should be simple and not require
typing multiple lines of options just to result in an error that
something is not what QEMU thinks is acceptable even though it could
work and could be fixed. That's really annoying for a human but may be
desirable for a management app so it does not need to check it got what
it thinks it specified.

Zoltan doesn't have the skills/time/motivation to rework his working
code to meet the enterprise quality level. Enterprise developers tried
to understand twice (first Igor, then Markus) the hobbyist use to get
it done safer, so it can stay maintained.

Of course I don't have time or motivation to make it enterprise quality
when I work unpaid on this in my free time and for fun. I already spend
too much time with this so while I try to make it good enough to be
included upstream the direction is clearly different than what
enterprise users need. But that's OK as the machines I work with are not
really used in an enterprise setting and mostly used by hobbyists, but
if some of the components or machines could be useful to enterprise
people I expect them to put in the effort to get them to enterprise
level.

But this probably does not apply to the very problem discussed here.
When I've added new machines (apart from sam460ex also pegasos2 which is
not upstream yet and now hopefully Mac machines soon too) these needed
SPD eeproms because their firmwares detected RAM based on it. There were
some already existing boards which emulated SPD but these were ad-hoc
implementations without any commonality. To avoid increasing the mess by
adding a few more independent SPD emulations that would get out of sync
I've spent some time to come up with a common function that could be
used by all these boards and the new ones I wanted to add. The goal of
this function was to put SPD emulation in a single place and make it
easy for board code to use it without needing to duplicate code.

Also Markus mentioned uniformity between machines: Most machines, like
pc ones, accept any memory size such as -m 100 even though on real
hardware it's not possible but can work with the firmware in QEMU that
usually takes this info from FW_CFG or something else and is not
restricted by SPD data. I wanted to do the same in sam460ex and allow it
to use any memory size exactly for uniformity besides user convenience,
even though that machine has some constraints so it required fixing up
the RAM size to meet those constraints. So -m 100 would result in 96 MB
of RAM that the SoC and firmware can handle and is closest to what the
user intended. This worked well until Igor changed memory allocation to
memdev (which I don't even know what it is: some enterprise stuff not
really needed for hobbyists but maybe could be useful e.g. to save guest
memory image so why not) but this required getting rid of fix ups of
memory size in boards (sam460ex wasn't the only one) because memdev
could not support this for some reason and Igor did not want to add that
(even though I've proposed some designs, you can look up in patch
review). So this broke fix ups, then Markus noticed that error reporting
via the err object cannot be used for warnings as I've tried to use it,
so to fix it he just removed all the remaining traces of it thereby
making it more difficult to add SPD eeproms to mac_oldworld without
duplicating the removed checks in board code which I wanted to avoid
because:


1. This is knowledge about SPD eeproms that should be in that func

2. Would duplicate non-trivial code in boards that would puzzle reviewers
   and is error prone too.

Zoltan, I guess I understood your use and have an idea to rework it in
a way that everybody is happy, but as Markus said, since the freeze is
next week, I won't have time to get it done in this short amount of
time.

It's not urgent but if we can agree on something that's acceptable for
everyone I may be able to submit a patch but don't want to put in effort
if it will be turned down anyway due to nothing else than the current
solution being acceptable based on principles over convenience. Arguing
with Markus about it before got me that impression so I'd rather ask
before wasting time with it.

From the PPC460EX-NUB800T-AMCC-datasheet-11553412.pdf datasheet I
understand the 460EX can support "Up to 8 GB in four external banks",
but the SAM 460ex board only wires a single bank (to the SODIMM
connector). You want to use a virtual board with up-to 4 banks in
use, right?

No, the firmware won't check additional banks because it only checks the
one wired. So what we need is to put as much RAM as possible on that
SODIMM (and we can use the fact that the SoC can handle both DDR and
DDR2) but since it's already broken and limited to valid SODIMM sizes
due to memdev not supporting memory size fix ups, fixing this again is
not high priority.

What I'd like is reverting f26740c61a57f and fixing that some other way
so I don't have to duplicate the size check in board code as can be seen
in the patchew link above but could just call spd_data_generate() to do
its job. This was discussed at the time that patch was in review; you
can read it here:


http://patchwork.ozlabs.org/project/qemu-devel/patch/20200420132826.8879-3-armbru@redhat.com/

My points were not really considered then, now that I have another use
case maybe it could be revisited and fixed. What I want is to be able to
call spd_data_generate() from board code with whatever size (the board
does not need to know about SPD limits and so cannot pre-check the size)
and the function should return the largest possible size SPD and some
indication if the size was not used completely. If Error cannot be used
for this, return the message or error some other way but let the board
code decide if it wants to abort or it can use the smaller SPD. Do not
assert in the helper function.

Maybe the DIMM type fix up can be dropped and only keep the size fix up
so then we don't need to use error twice, the board could call the
function again if a different type is also acceptable, since only
sam460ex would need this I can do that there for type fixup and call
spd_data_generate() again with DDR2 if first one with DDR could not fit
all ram. But at least the asserts should be dropped for this and the
size check brought back.

Then adding SPD to mac_oldworld could also be done by calling
spd_data_generate() instead of duplicating the checks this function does
anyway. This board has three slots so if user says -m 1400 it would call
spd_data_generate() with 1400 first, get back 512 SPD that it adds to
first slot then calls spd_data_generate() again with 888, gets 512 again
that it adds to 2nd slot and calls spd_data_generate() for last slot
with 376 which would give 256 and 120 remaining that it may warn the
user about but still continue because the SPD data is only used by a ROM
from real hardware (that may be used for compatibility with some
software) but the default OpenBIOS disregards SPD data and would still
use 1400 so it's not an error to abort on. Simply if using a firmware
ROM then only 1280 MB of the 1400 will be available due to its
limitations but that's not a reason to force users to change their
command line. Printing a warning is enough to hint they may use a
different value but aborting without an error message on an assert,
which is the current situation, is not really user friendly.

Hopefully at least somebody will read it up to this point, sorry for
writing that much but hopefully this explains my point of view.


Regards,
BALATON Zoltan
