From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH v2 6/6] i8259: add -no-spurious-interrupt-hack option
Date: Mon, 27 Aug 2012 08:55:06 -0500
User-agent: Notmuch/0.13.2+93~ged93d79 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)

Matthew Ogilvie <address@hidden> writes:
> This patch provides a way to optionally suppress spurious interrupts,
> as a workaround for systems described below:
>
> Some old operating systems do not handle spurious interrupts well,
> and qemu tends to generate them significantly more often than
> real hardware.
This is the wrong approach. You should add a LostTickPolicy property to the
i8259 device instead.
Regards,
Anthony Liguori
>
> Examples:
> - Microport UNIX System V/386 v 2.1 (ca 1987)
> (The main problem I'm fixing: Without this patch, it panics
> sporadically when accessing the hard disk.)
> - AT&T UNIX System V/386 Release 4.0 Version 2.1a (ca 1991)
> See screenshot in "QEMU Official OS Support List":
> http://www.claunia.com/qemu/objectManager.php?sClass=application&iId=9
> (I don't have this system to test.)
> - A report about OS/2 boot lockup from 2004 by Hampa Hug:
> http://lists.nongnu.org/archive/html/qemu-devel/2004-09/msg00367.html
> (My patch was partially inspired by his.)
> Also:
> http://lists.nongnu.org/archive/html/qemu-devel/2005-06/msg00243.html
> (I don't have this system to test.)
>
> Signed-off-by: Matthew Ogilvie <address@hidden>
> ---
>
> Note: checkpatch.pl gives an error about initializing the global
> "int no_spurious_interrupt_hack = 0;", even though existing lines
> near it are doing the same thing. Should I give precedence to
> checkpatch.pl, or nearby code?
>
> There was no version 1 of this patch; this was the last thing I had to
> work around to get UNIX running.
>
> High level symptoms:
> 1. Despite using this UNIX system for nearly 10 years (ca 1987-1996)
> on an early 80386, I don't remember ever seeing any crash like
> this. I vaguely remember I may have had one or two crashes for
> which I don't have other explanations that perhaps could have
> been this, but I don't remember the error messages to confirm it.
> 2. It is somewhat random when UNIX crashes when running in qemu.
> - Sometimes it crashes the first time the floppy-based installer
> tries to access the hard disk (partition table?).
> - Other times (though fairly rarely), it actually finishes
> formatting and copying the first disk's files to the
> hard disk without crashing.
> - On the other hand, I've never seen it successfully boot from
> the hard disk without this patch. An attempt to boot from
> the hard drive always panics quite early.
> 3. I tried -win2k-hack instead, thinking maybe the hard disk is just
> responding faster than UNIX expected. But it doesn't seem
> to have any effect. UNIX still panics sporadically the same way.
> - TANGENT: I was going to see if my patch provides an
> alternative fix for installing Windows 2000, but
> I was unable to reproduce the original -win2k-hack problem at
> all (with neither -win2k-hack NOR this patch). Maybe
> some other change has fixed it some other way? Or maybe
> it is only an issue in configurations I didn't test?
> (KVM instead of TCG? Less RAM? Something else?)
> It might be worth doing a little more investigation,
> and eliminating the -win2k-hack option if appropriate.
> 4. If I enable KVM, I get a different error very early in
> bootup (in splx function instead of splint), and this patch
> doesn't help.
>
> ============
> My low level analysis of what is going on:
>
> It is hard to track down all the details, but based on logging a
> lot of qemu IRQ stuff, and setting a breakpoint in the earliest
> panic-related UNIX function using gdb, it looks like:
>
> 1. It is near the end of servicing a previous IRQ14 from the
> hard disk.
> 2. The processor has interrupts disabled (I think), while UNIX
> clears the slave 8259's IMR (mask) register (sets it to 0), allowing
> all interrupts to be passed on to the master.
> 3. While in that state, IRQ14 is raised (on the slave), which
> gets propagated to the master (IRQ2), but the CPU
> is not interrupted yet.
> 4. UNIX then masks the slave 8259's IMR register
> completely (sets to 0xff).
> 5. Because the master elcr register is set (by BIOS; UNIX never
> touches it) to edge trigger for IRQ2, the master latched on
> to IRQ2 earlier, and continues to assert the processor's INT line
> (the env->interrupt_request&CPU_INTERRUPT_HARD bit) even
> after all slave IRQs have been masked off (clearing the input
> IRQ2).
> 6. Finally, UNIX enables CPU interrupts and the interrupt is delivered
> to the CPU, which ends up as a spurious IRQ15 due to the
> slave's imr register. UNIX doesn't know what to do with
> that, and panics/halts.
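
The six steps above can be replayed in a toy model. This is only a sketch under stated assumptions, not QEMU's i8259 code: `ToyPic`, `toy_get_irq`, `toy_read_irq` and `toy_replay_sequence` are hypothetical names, priority handling is reduced to "lowest number wins", the ISR is not modeled, and the vector bases 0x08/0x70 are the usual PC BIOS defaults (the guest may well reprogram them).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one 8259; only the fields this scenario needs. */
typedef struct {
    uint8_t irr;      /* latched interrupt requests */
    uint8_t imr;      /* interrupt mask register, 1 = masked */
    uint8_t irq_base; /* vector delivered for this PIC's input 0 (assumed) */
} ToyPic;

/* Lowest-numbered pending and unmasked input, or -1 if there is none. */
static int toy_get_irq(const ToyPic *p)
{
    uint8_t pending = p->irr & (uint8_t)~p->imr;
    for (int i = 0; i < 8; i++) {
        if (pending & (1u << i)) {
            return i;
        }
    }
    return -1;
}

/* Interrupt acknowledge: returns the vector the CPU would receive. */
static int toy_read_irq(ToyPic *master, ToyPic *slave)
{
    int irq = toy_get_irq(master);
    if (irq < 0) {
        return master->irq_base + 7;          /* spurious on the master */
    }
    master->irr &= (uint8_t)~(1u << irq);     /* edge request is consumed */
    if (irq == 2) {
        int irq2 = toy_get_irq(slave);
        if (irq2 < 0) {
            irq2 = 7;                         /* spurious on the slave: IRQ15 */
        } else {
            slave->irr &= (uint8_t)~(1u << irq2);
        }
        return slave->irq_base + irq2;
    }
    return master->irq_base + irq;
}

/* Replays steps 2-6 and returns the vector seen by the CPU. */
int toy_replay_sequence(void)
{
    ToyPic master = { .irr = 0, .imr = 0x00, .irq_base = 0x08 };
    ToyPic slave  = { .irr = 0, .imr = 0x00, .irq_base = 0x70 }; /* step 2 */

    /* Step 3: IRQ14 is slave input 6; the slave output raises master IRQ2,
     * which the edge-triggered master latches. */
    slave.irr |= 1u << 6;
    if (toy_get_irq(&slave) >= 0) {
        master.irr |= 1u << 2;
    }

    /* Step 4: the guest masks every slave input. */
    slave.imr = 0xff;

    /* Step 5: the master's latch survives, so INT is still asserted. */
    assert(toy_get_irq(&master) == 2);

    /* Step 6: the CPU finally acknowledges the interrupt. */
    return toy_read_irq(&master, &slave);
}
```

With the assumed bases, `toy_replay_sequence()` returns 0x77, i.e. slave base 0x70 plus input 7: the spurious IRQ15 described in step 6, even though the only real request was IRQ14.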
>
> I'm not sure why it only sporadically hits this sequence of events.
> There don't seem to be any other IRQs asserted or serviced anywhere
> in the near past; the last several were all IRQ14's. But I can't
> help feeling I'm not reading the log output correctly or something,
> because that doesn't make sense. Maybe there is some kind
> of a-few-instructions delay before a CPU interrupt is actually
> delivered after interrupts are enabled, or some delay in raising
> IRQ14 after a hard drive operation is requested, and such delays
> need to fall into a narrow window of opportunity left by UNIX?
>
> I can get a disassembly of the UNIX kernel using a "coff"-enabled
> build of GNU objdump, giving function names but not much else.
> But I haven't studied it in enough detail to actually find the
> relevant code path that is manipulating imr as described above.
> However, this old post outlines some of the high level theory
> of UNIX spl*() functions:
> http://www.linuxmisc.com/29-unix-internals/4e6c1f6fa2e41670.htm
>
> If anyone wants to look into this further, I can provide access to the
> initial boot install floppy, at least. Email me. (Without the rest
> of the install disks, it isn't much use for anything except testing
> virtual machines like qemu against rare corner cases...)
>
> ============
> Alternative Approaches:
>
> An alternative to this patch that might work (I haven't tried) would
> be to have BIOS set the master's elcr register 0x04 bit, making IRQ2
> level triggered instead of edge triggered. I'm not sure what other
> effects this might have. Maybe it would actually be a more accurate
> model (I haven't checked documentation; maybe "slave mode" of an
> IRQ line into the master is supposed to be level triggered?)
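
The difference between the two trigger modes for IRQ2 can be sketched as below. This is a simplified illustration, not the actual QEMU implementation: `ToyInputs`, `toy_set_irq` and `toy_irq2_pending_after_pulse` are hypothetical names, and masking, priorities and the ISR are left out. It only shows whether a request that was raised and then withdrawn is still pending.

```c
#include <assert.h>
#include <stdint.h>

/* Input-side state of a toy PIC. */
typedef struct {
    uint8_t irr;       /* pending requests */
    uint8_t last_irr;  /* previous input levels, used for edge detection */
    uint8_t elcr;      /* per-input trigger mode, 1 = level triggered */
} ToyInputs;

/* Apply a new level to one input line. */
static void toy_set_irq(ToyInputs *s, int irq, int level)
{
    assert(irq >= 0 && irq < 8);
    uint8_t mask = (uint8_t)(1u << irq);

    if (s->elcr & mask) {
        /* Level triggered: the request simply follows the input. */
        if (level) {
            s->irr |= mask;
        } else {
            s->irr &= (uint8_t)~mask;
        }
    } else {
        /* Edge triggered: a rising edge latches the request, and a
         * falling edge does not clear it. */
        if (level && !(s->last_irr & mask)) {
            s->irr |= mask;
        }
    }

    if (level) {
        s->last_irr |= mask;
    } else {
        s->last_irr &= (uint8_t)~mask;
    }
}

/* Raise IRQ2, then drop it again (as when the slave masks all of its
 * inputs); returns 1 if the request is still pending afterwards. */
int toy_irq2_pending_after_pulse(uint8_t elcr)
{
    ToyInputs m = { .irr = 0, .last_irr = 0, .elcr = elcr };
    toy_set_irq(&m, 2, 1);
    toy_set_irq(&m, 2, 0);
    return (m.irr & 0x04) != 0;
}
```

In this model, `toy_irq2_pending_after_pulse(0x00)` returns 1 (edge mode keeps the stale request) and `toy_irq2_pending_after_pulse(0x04)` returns 0 (level mode drops it), which is the behavior change the 0x04 bit would be expected to bring.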
>
> Or perhaps find a way to model the minimum timescale that an interrupt
> request needs to be active to be recognized?
>
> Or maybe my analysis isn't correct; I wasn't able to find the
> relevant code path in the UNIX kernel.
>
> ============
>
> cpu-exec.c | 12 +++++++-----
> hw/i8259.c | 18 ++++++++++++++++++
> qemu-options.hx | 12 ++++++++++++
> sysemu.h | 1 +
> vl.c | 4 ++++
> 5 files changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 134b3c4..c309847 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -329,11 +329,15 @@ int cpu_exec(CPUArchState *env)
> 0);
> env->interrupt_request &= ~(CPU_INTERRUPT_HARD |
> CPU_INTERRUPT_VIRQ);
> intno = cpu_get_pic_interrupt(env);
> - qemu_log_mask(CPU_LOG_TB_IN_ASM, "Servicing hardware INT=0x%02x\n", intno);
> - do_interrupt_x86_hardirq(env, intno, 1);
> - /* ensure that no TB jump will be modified as
> - the program flow was changed */
> - next_tb = 0;
> + if (intno >= 0) {
> + qemu_log_mask(CPU_LOG_TB_IN_ASM,
> + "Servicing hardware INT=0x%02x\n",
> + intno);
> + do_interrupt_x86_hardirq(env, intno, 1);
> + /* ensure that no TB jump will be modified as
> + the program flow was changed */
> + next_tb = 0;
> + }
> #if !defined(CONFIG_USER_ONLY)
> } else if ((interrupt_request & CPU_INTERRUPT_VIRQ) &&
> (env->eflags & IF_MASK) &&
> diff --git a/hw/i8259.c b/hw/i8259.c
> index 6587666..7ecb7e1 100644
> --- a/hw/i8259.c
> +++ b/hw/i8259.c
> @@ -26,6 +26,7 @@
> #include "isa.h"
> #include "monitor.h"
> #include "qemu-timer.h"
> +#include "sysemu.h"
> #include "i8259_internal.h"
>
> /* debug PIC */
> @@ -193,6 +194,20 @@ int pic_read_irq(DeviceState *d)
> pic_intack(slave_pic, irq2);
> } else {
> /* spurious IRQ on slave controller */
> + if (no_spurious_interrupt_hack) {
> + /* Pretend it was delivered and acknowledged. If
> + * it was spurious due to slave_pic->imr, then
> + * as soon as the mask is cleared, the slave will
> + * re-trigger IRQ2 on the master. If it is spurious for
> + * some other reason, make sure we don't keep trying
> + * to half-process the same spurious interrupt over
> + * and over again.
> + */
> + s->irr &= ~(1<<irq);
> + s->last_irr &= ~(1<<irq);
> + s->isr &= ~(1<<irq);
> + return -1;
> + }
> irq2 = 7;
> }
> intno = slave_pic->irq_base + irq2;
> @@ -202,6 +217,9 @@ int pic_read_irq(DeviceState *d)
> pic_intack(s, irq);
> } else {
> /* spurious IRQ on host controller */
> + if (no_spurious_interrupt_hack) {
> + return -1;
> + }
> irq = 7;
> intno = s->irq_base + irq;
> }
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 03e13ec..57bb0b4 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1188,6 +1188,18 @@ Windows 2000 is installed, you no longer need this option (this option
> slows down the IDE transfers).
> ETEXI
>
> +DEF("no-spurious-interrupt-hack", 0, QEMU_OPTION_no_spurious_interrupt_hack,
> + "-no-spurious-interrupt-hack disable delivery of spurious interrupts\n",
> + QEMU_ARCH_I386)
> +STEXI
> +@item -no-spurious-interrupt-hack
> +@findex -no-spurious-interrupt-hack
> +Use it as a workaround for operating systems that drive PICs in a way that
> +can generate spurious interrupts, but the OS doesn't handle spurious
> +interrupts gracefully. (e.g. late 80s/early 90s versions of ATT UNIX
> +and derivatives)
> +ETEXI
> +
> HXCOMM Deprecated by -rtc
> DEF("rtc-td-hack", 0, QEMU_OPTION_rtc_td_hack, "", QEMU_ARCH_I386)
>
> diff --git a/sysemu.h b/sysemu.h
> index 65552ac..0170109 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -117,6 +117,7 @@ extern int graphic_depth;
> extern DisplayType display_type;
> extern const char *keyboard_layout;
> extern int win2k_install_hack;
> +extern int no_spurious_interrupt_hack;
> extern int alt_grab;
> extern int ctrl_grab;
> extern int usb_enabled;
> diff --git a/vl.c b/vl.c
> index 16d04a2..6de41c1 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -204,6 +204,7 @@ CharDriverState *serial_hds[MAX_SERIAL_PORTS];
> CharDriverState *parallel_hds[MAX_PARALLEL_PORTS];
> CharDriverState *virtcon_hds[MAX_VIRTIO_CONSOLES];
> int win2k_install_hack = 0;
> +int no_spurious_interrupt_hack = 0;
> int usb_enabled = 0;
> int singlestep = 0;
> int smp_cpus = 1;
> @@ -3046,6 +3047,9 @@ int main(int argc, char **argv, char **envp)
> case QEMU_OPTION_win2k_hack:
> win2k_install_hack = 1;
> break;
> + case QEMU_OPTION_no_spurious_interrupt_hack:
> + no_spurious_interrupt_hack = 1;
> + break;
> case QEMU_OPTION_rtc_td_hack: {
> static GlobalProperty slew_lost_ticks[] = {
> {
> --
> 1.7.10.2.484.gcd07cc5