[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 2/4] smbus: Fix spd_data_generate() error API violation
From: |
BALATON Zoltan |
Subject: |
Re: [PATCH v2 2/4] smbus: Fix spd_data_generate() error API violation |
Date: |
Mon, 29 Jun 2020 23:31:51 +0200 (CEST) |
User-agent: |
Alpine 2.22 (BSF 395 2020-01-19) |
On Mon, 29 Jun 2020, Philippe Mathieu-Daudé wrote:
On 6/27/20 9:17 AM, Markus Armbruster wrote:
BALATON Zoltan <balaton@eik.bme.hu> writes:
On Wed, 22 Apr 2020, BALATON Zoltan wrote:
On Wed, 22 Apr 2020, Philippe Mathieu-Daudé wrote:
On 4/22/20 4:27 PM, BALATON Zoltan wrote:
On Wed, 22 Apr 2020, Markus Armbruster wrote:
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL. Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.
spd_data_generate() can pass @errp to error_setg() more than once when
it adjusts both memory size and type. Harmless, because no caller
passes anything that needs adjusting. Until the previous commit,
sam460ex passed types that needed adjusting, but not sizes.
spd_data_generate()'s contract is rather awkward:
  If everything's fine, return non-null and don't set an error.
  Else, if memory size or type need adjusting, return non-null and
  set an error describing the adjustment.
  Else, return null and set an error reporting why no data can be
  generated.
Its callers treat the error as a warning even when null is returned.
They don't create the "smbus-eeprom" device then. Suspicious.
Since the previous commit, only "everything's fine" can actually
happen. Drop the unused code and simplify the callers. This gets rid
of the error API violation.
This leaves board code no chance to recover from values given by
user that won't fit without duplicating checks that this function
does. Also this will abort without giving meaningful errors if an
invalid value does get through and result in a crash which is not
used friendly. So I don't like this but if others think this is
acceptable maybe at least unit test should be adjusted to make
sure aborts cannot be triggered by user for values that are not
usually tested during development.
Agreed. Do you have an example (or more) to better show Markus this
code use? So we can add tests.
After Markus's patches probably nothing uses it any more but this
comes with the result that previously giving some random value such
as -m 100 did produce a working sam460ex machine after some warnings
but now it just thows back some errors to the user which may or may
not be helpful to them.
Personally I'd use a script to generate a dumb static array of all
possible sizes...
Maybe testing with the biggest valid value such as -m 2048 (that's
commonly used probably) and an invalid value such as -m 100 might be
enough. Testing all possible values might take too long and would
not test what happens with invalid values. Ideally those invalud
values should also work like before a0258e4afa but should at least
give a meaningful warning so the user can fix the command line
without too much head scratching. Actually that commit was from Igor
not from Marcus so sorry for attributing that to Marcus too, I
remembered wrong.
By the way you could argue that on real machine you cannot plug
certain combinations of memory modules so it's enough to model that
but I think QEMU does not have to be that strict and also support
configs that cannot happen on real hadware but would work. This
might be useful for example if you have some ammount of memory to
set aside for a VM on a host but that's not a size that exists in
memory modules on real hardware. This also works on pc machine in
qemu-system-i386 for example: it accepts -m 100 and does its best to
create a machine with such unrealistic size. The sam460ex did the
same (within SoC's limits) and before a0258e4afa -m 100 was fixed up
to 96 MB which is now not possible due to change in QEMU internal
APIs. This probably isn't important enough to worth the extra effort
to support but would have been nice to preserve.
Besides the above here's another use case of the fix ups that I wanted
to keep:
cover.1592315226.git.balaton@eik.bme.hu/b5f4598529a77f15f554c593e9be2d0ff9e5fab3.1592315226.git.balaton@eik.bme.hu/">https://patchew.org/QEMU/cover.1592315226.git.balaton@eik.bme.hu/b5f4598529a77f15f554c593e9be2d0ff9e5fab3.1592315226.git.balaton@eik.bme.hu/
This board normally uses OpenBIOS which gets RAM size from fw_cfg and
so works with whatever amount of RAM (also Linux booted with -kernel
probably does not care) so any -memory value is valid. However some
may want to also use original firmware ROM for compatibility which
detects RAM reading SPD eeproms (the i2c emulation needed for that is
not working yet but once that's fixed this will be the case). I want
to add smbus_eeproms for this but do not want to just abort for cases
where -memory given by user cannot be covered with SPD data. Instead a
warning and covering as much RAM as possible should be enough (the ROM
will detect less RAM than given with -m
but that's OK and better than just bailing out without a message
tripping an assert). But I don't want to replicate in board code the
calculation and checks the spd_data_generate() function does anyway
(that would just puzzle reviewers for every use of this functions).
Previously this was possible with my original spd_data_generate()
implementation. What's your suggestion to bring that functionality
back without breaking Error API? Maybe adding new parameters to tell
the spd_data_generate() which fixups are allowed?
Quick reply without having thought through the issues at all: I'm not
opposed to you doing work to enable additional or even arbitrary memory
sizes where these actually work. I'm first and foremost opposed to me
wasting time on "improving" code that is not used for anything. That's
why I dumbed down spd_data_generate().
I'm starting to understand Zoltan point. What I'm seeing is Zoltan using
a hobbyist code, that just happens to work for hobbyists, but get in the
way of enterprise quality standards.
This is not necessarily a conflict between hobbyist vs enterprise but more
like different view on what the qemu-system-* CLI should be. I think the
CLI is the main human interface of QEMU as it does not really provide a
GUI for configuring or running VMs (as for example VirtualBox does, QEMU
only has minimal GUI to view and control running VMs) so users are forced
to use either the command line or maybe an external management frontend,
but for simple things (like hobbyist use) that's an overkill and also not
a good match as those are designed for enterprise use. (Also these
hobbyist are on Windows or macOS where these management apps are not
available and getting a working QEMU binary is already a challenge.)
The problem is that these management frontends don't have a proper API to
control QEMU but abuse the CLI and QEMU monitor for this which are
supposed to be human interfaces at the first place but changing the
commands for the needs of management apps result in arcane command lines.
Note that humans and management apps likely have different requirements so
if you mean hobbyist = human and enterprise = management frontend then
that's about what my problem is. I think humans and management apps could
coexist using the same interfaces if these cannot be cleanly separated (as
that would need either changing management apps to use something else than
the main human interface or providing proper GUI or CLI frontend for
humans) but if they use the same CLI then allowing some convenience
commands to make the life of humans easier should not be forbidden.
Running a VM should be simple and not require typing multiple lines of
options just to result in an error that something is not what QEMU thinks
is acceptable even though it could work and could be fixed. That's really
annoying for a human but may be desirable for a management app so it does
not need to check it got what it think it specified.
Zoltan doesn't have the skills/time/motivation to rework its working
code to meet the enterprise quality level. Enterprise developers tried
to understand twice (first Igor, then Markus) the hobbyist use to get
it done safer, so it can stay maintained.
Of course I don't have time or motivation to make it enterprise quality
when I work unpayed on this in my free time and for fun. I already spend
too much time with this so while I try to make it good enough to be
included upstream the direction is clearly different than what enterprise
users need. But that's OK as the machines I work with are not really used
in an enterprise setting and mostly used by hobbyists, but if some of the
components or machines could be useful to enterprise people I expect them
to put in the effort to get them to enterprise level.
But this probably does not apply to the very problem discussed here. When
I've added new machines (apart from sam460ex also pegasos2 which is not
upstream yet and now hopefully Mac machines soon too) these needed SPD
eeproms because their firmwares detected RAM based on it. There were some
already existing boards which emulated SPD but these were ad-hoc
implementations without any commonality. To avoid increasing the mess by
adding a few more independent SPD emulations that would get out of sync
I've spent some time to come up with a common function that could be used
by all these boards and the new ones I wanted to add. The goal of this
function was to put SPD emulation in a single place and make it easy for
board code to use it without needing to duplicate code.
Also Marcus mentioned uniformity between machines: Most machines, like pc
ones accept any memory size such as -m 100 even though on real hardware
it's not possible but can work with the firmware in QEMU that usually take
this info from FW_CFG or something else and not resticted by SPD data. I
wanted to do the same in sam460ex and allow it to use any memory size
exactly for uniformity besides used convenience, even though that machine
has some constraints so it required to fix up RAM size to meet those
constraints. So -m 100 would result in 96 MB of RAM that the SoC and
firmware can handle and is closest to what the user intended. This worked
well until Igor changed memory allocation to memdev (which I don't even
know what it is: some enterprise stuff not really needed for hobbyists but
maybe could be useful e.g. to save guest memory image so why not) but this
required getting rid of fix ups of memory size in boards (sam460ex wasn't
the only one) beacuse memdev could not support this for some reason and
Igor did not want to add that (even though I've proposed some designs, you
can look up in patch review). So this broke fix ups, then Marcus noticed
that errors reporting via err object cannot be used for warnings as I've
tried to use so to fix it he just removed all the reamaining traces of it
thereby making it more difficult to add SPD eeproms to mac_oldworld
without duplicating the removed checks in board code which I wanted to
avoid because:
1. This is knowledge about SPD eeproms that should be in that func
2. Would duplicate non-trivial code in boards that would puzzle reviewers
and is error prone too.
Zoltan, I guess I understood your use and have an idea to rework it in
a way that everybody is happy, but as Markus said, since the freeze is
next week, I won't have time to get it done in this short amount of
time.
It's not urgent but if we can agree on something that's acceptable for
everyone I may be able to submit a patch but don't want to put in effort
if it will be turned down anyway due to nothing else than the current
solution being acceptable based on principles over convenience. Arguing
with Markus about it before got me that impression so I'd rather ask
before wasting time with it.
From the PPC460EX-NUB800T-AMCC-datasheet-11553412.pdf datasheet I
understand the 460EX can support "Up to 8 GB in four external banks",
but the SAM 460ex board only wires a single bank (to the SODIMM
connector). You want to use a virtual board with up-to 4 banks in
use, right?
No, the firmware won't check additional banks because it only checks the
one wired. So what we need is to put as much RAM as possible on that
SODIMM (and we can use that SoC can handle both DDR and DDR2) but since
it's already broken and limited to valid SODIMM sizes due to memdev not
supporting memory size fix ups fixing this again is not high priority.
What I'd like is reverting f26740c61a57f and fix that some other way so I
don't have to duplicate size check in board code as can be seen in the
patchew link above but could just call spd_data_generate() to do its job.
This was discussed at the time that patch was in review you can read it
here:
http://patchwork.ozlabs.org/project/qemu-devel/patch/20200420132826.8879-3-armbru@redhat.com/
My points were not really considered then, now that I have another use
case maybe it could be revisited and fixed. What I want is to be able to
call spd_data_generate() from board code with whatever sizé (the board
does not need to know about SPD limits and so cannot pre-check the size)
and the function should return the largest possible size SPD and some
indication if the size was not used completely. If Error cannot be used
for this, return the message or error some other way but let the board
code decide if it wants to abort or it can use the smaller SPD. Do not
assert in the helper function. Maybe the DIMM type fix up can be dropped
and only keep the size fix up so then we don't need to use error twice,
the board could call the function again if a different type is also
acceptable, since only sam460ex would need this I can do that there for
type fixup and call spd_data_generate() again with DDR2 if first one with
DDR could not fit all ram. But at least the asserts should be dropped for
this and the size check brought back. Then adding SPD to mac_oldworld
could also be done by calling spd_data_generate() instead of duplicating
the checks this function does anyway. This board has three slots so if
user says -m 1400 it would call spd_data_generate() with 1400 first, get
back 512 SPD that it adds to first slot then calls spd_data_generate()
again with 888, gets 512 again that it adds to 2nd slot and calls
spd_data_generate() for last slot with 376 which would give 256 and 120
remaining that it may warn the user about but still continue because the
SPD data is only used by a ROM from real hardware (that may be used for
compatibility with some software) but the default OpenBIOS disregards SPD
data and would still use 1400 so it's not an error to abort on. Simply if
using a firmare ROM then only 1280 MB of the 1400 will be available due to
its limitations but that's not a reason to force users to change their
command line. Printing a warning is enough to hint they may use different
value but aborting without an error message on an assert which is the
current situation is not really a user friendly way.
Hopefully at least somebody will read it up to this point, sorry for
writing that much but hopefully this explains my point of view.
Regards,
BALATON Zoltan