From: Thomas Huth
Subject: Re: [qemu-s390x] [PATCH v5 15/15] s390-bios: Support booting from real dasd device
Date: Fri, 29 Mar 2019 09:33:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0

On 13/03/2019 17.31, Jason J. Herne wrote:
> Allows guest to boot from a vfio configured real dasd device.
> 
> Signed-off-by: Jason J. Herne <address@hidden>
> Reviewed-by: Cornelia Huck <address@hidden>
> ---
[...]
> diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
> new file mode 100644
> index 0000000..236428a
> --- /dev/null
> +++ b/docs/devel/s390-dasd-ipl.txt
> @@ -0,0 +1,133 @@
> +*****************************
> +***** s390 hardware IPL *****
> +*****************************
> +
> +The s390 hardware IPL process consists of the following steps.
> +
> +1. A READ IPL ccw is constructed in memory location 0x0.
> +    This ccw, by definition, reads the IPL1 record which is located on the disk
> +    at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
> +    so when it is complete another ccw will be fetched and executed from memory
> +    location 0x08.
> +
> +2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
> +    IPL1 data is 24 bytes in length and consists of the following pieces of
> +    information: [psw][read ccw][tic ccw]. When the machine executes the Read
> +    IPL ccw it read the 24-bytes of IPL1 to be read into memory starting at
> +    location 0x0. Then the ccw program at 0x08 which consists of a read
> +    ccw and a tic ccw is automatically executed because of the chain flag from
> +    the original READ IPL ccw. The read ccw will read the IPL2 data into memory
> +    and the TIC (Tranfer In Channel) will transfer control to the channel

s/Tranfer/Transfer/ ?
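
For readers following the thread, the 24-byte [psw][read ccw][tic ccw] layout described above can be written down as plain C. This is only a host-side sketch: the names RawCcw0, Ipl1Record, CCW0_FLAG_CC and the two helpers are hypothetical and are not the Ccw0 type from the patch. It assumes the architected format-0 CCW layout (command code, 24-bit data address, flags, ignored byte, 16-bit count) and uses byte arrays so that it does not depend on host endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Command-chaining flag inside the format-0 CCW flags byte. */
#define CCW0_FLAG_CC 0x40

/* Hypothetical raw image of a format-0 CCW (8 bytes, big-endian fields). */
typedef struct {
    uint8_t cmd_code;
    uint8_t cda[3];    /* 24-bit data address */
    uint8_t flags;
    uint8_t reserved;
    uint8_t count[2];  /* byte count */
} RawCcw0;

/* Hypothetical image of the IPL1 record as it lands at address 0x0. */
typedef struct {
    uint8_t psw[8];    /* short-format PSW, loaded at the end of the IPL */
    RawCcw0 read_ccw;  /* at 0x08: reads the IPL2 data */
    RawCcw0 tic_ccw;   /* at 0x10: transfers to the IPL2 channel program */
} Ipl1Record;

static_assert(sizeof(RawCcw0) == 8, "format-0 CCW is 8 bytes");
static_assert(sizeof(Ipl1Record) == 24, "IPL1 data is 24 bytes");
static_assert(offsetof(Ipl1Record, read_ccw) == 0x08, "read ccw at 0x08");
static_assert(offsetof(Ipl1Record, tic_ccw) == 0x10, "tic ccw at 0x10");

/* Assemble the 24-bit data address from its three big-endian bytes. */
static uint32_t raw_ccw0_cda(const RawCcw0 *ccw)
{
    return ((uint32_t)ccw->cda[0] << 16) |
           ((uint32_t)ccw->cda[1] << 8) |
           (uint32_t)ccw->cda[2];
}

/* True if the command-chaining flag is set in this CCW. */
static bool raw_ccw0_chained(const RawCcw0 *ccw)
{
    return (ccw->flags & CCW0_FLAG_CC) != 0;
}
```

The offsets 0x08 and 0x10 are the same addresses that check_ipl1() and read_ipl2_addr() in the patch use for the read and tic ccws.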

[...]
> +**********************************************************
> +***** How this all pertains to QEMU (and the kernel) *****
> +**********************************************************
> +
> +In theory we should merely have to do the following to IPL/boot a guest
> +operating system from a DASD device:
> +
> +1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
> +2. Execute channel program at 0x0.
> +3. LPSW 0x0.
> +
> +However, our emulation of the machine's channel program logic within the kernel
> +is missing one key feature that is required for this process to work:
> +non-prefetch of ccw data.
> +
> +When we start a channel program we pass the channel subsystem parameters via an
> +ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
> +bit is on then the vfio-ccw kernel driver is allowed to read the entire channel
> +program from guest memory before it starts executing it. This means that any
> +channel commands that read additional channel commands will not work as expected
> +because the newly read commands will only exist in guest memory and NOT within
> +the kernel's channel subsystem memory. The kernel vfio-ccw driver currently
> +requires this bit to be on for all channel programs. This is a problem because
> +the IPL process consists of transferring control from the "Read IPL" ccw
> +immediately to the IPL1 channel program that was read by "Read IPL".
> +
> +Not being able to turn off prefetch will also prevent the TIC at the end of the
> +IPL1 channel program from transferring control to the IPL2 channel program.
> +
> +Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
> +tansfers control to another channel program segment immediately after reading it

s/tansfers/transfers/

> +from the disk. So we need to be able to handle this case.
> +
> +**************************
> +***** What QEMU does *****
> +**************************
> +
> +Since we are forced to live with prefetch we cannot use the very simple IPL
> +procedure we defined in the preceding section. So we compensate by doing the
> +following.
> +
> +1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
> +2. Execute "Read IPL" at 0x0.
> +
> +   So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
> +
> +4. Write a custom channel program that will seek to the IPL2 record and then
> +   execute the READ and TIC ccws from IPL1.  Normamly the seek is not required

s/Normamly/Normally/

[...]
> diff --git a/pc-bios/s390-ccw/dasd-ipl.c b/pc-bios/s390-ccw/dasd-ipl.c
> new file mode 100644
> index 0000000..1a44469
> --- /dev/null
> +++ b/pc-bios/s390-ccw/dasd-ipl.c
> @@ -0,0 +1,249 @@
> +/*
> + * S390 IPL (boot) from a real DASD device via vfio framework.
> + *
> + * Copyright (c) 2019 Jason J. Herne <address@hidden>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include "libc.h"
> +#include "s390-ccw.h"
> +#include "s390-arch.h"
> +#include "dasd-ipl.h"
> +#include "helper.h"
> +
> +static char prefix_page[PAGE_SIZE * 2]
> +            __attribute__((__aligned__(PAGE_SIZE * 2)));
> +
> +static void enable_prefixing(void)
> +{
> +    memcpy(&prefix_page, (void *)0, 4096);

You could use the "lowcore" variable from s390-arch.h here instead of
"(void *)0", I guess.

> +    set_prefix(ptr2u32(&prefix_page));
> +}
> +
> +static void disable_prefixing(void)
> +{
> +    set_prefix(0);
> +    /* Copy io interrupt info back to low core */
> +    memcpy((void *)0xB8, prefix_page + 0xB8, 12);

Maybe use &lowcore->subchannel_id instead of 0xB8 ? ... not sure whether
that's nicer here, though...
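
A rough host-side sketch of what the field-based variant could look like, to make the comparison with the bare 0xB8 / 12 concrete. LowCoreSketch is a hypothetical stand-in and not the LowCore struct from s390-arch.h; only subchannel_id is named in this thread, and the other three field names are assumptions that mirror the architected I/O interruption fields at 0xB8..0xC3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first part of the lowcore. */
typedef struct {
    uint8_t  pad0[0xB8];
    uint16_t subchannel_id;  /* 0xB8 */
    uint16_t subchannel_nr;  /* 0xBA (assumed name) */
    uint32_t io_int_parm;    /* 0xBC (assumed name) */
    uint32_t io_int_word;    /* 0xC0 (assumed name) */
} LowCoreSketch;

/* Start and length of the I/O interrupt info, derived from the fields. */
#define IO_INT_INFO_OFF offsetof(LowCoreSketch, subchannel_id)
#define IO_INT_INFO_LEN (offsetof(LowCoreSketch, io_int_word) + \
                         sizeof(uint32_t) - IO_INT_INFO_OFF)

static_assert(IO_INT_INFO_OFF == 0xB8, "I/O interrupt info starts at 0xB8");
static_assert(IO_INT_INFO_LEN == 12, "I/O interrupt info is 12 bytes");

/* Copy only the I/O interrupt info from src to dst, nothing else. */
static void copy_io_int_info(LowCoreSketch *dst, const LowCoreSketch *src)
{
    memcpy((uint8_t *)dst + IO_INT_INFO_OFF,
           (const uint8_t *)src + IO_INT_INFO_OFF,
           IO_INT_INFO_LEN);
}
```

Whether this reads better than the literal offset is a matter of taste; the static assertions at least document where the 0xB8 and the 12 come from.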

> +}
> +
> +static bool is_read_tic_ccw_chain(Ccw0 *ccw)
> +{
> +    Ccw0 *next_ccw = ccw + 1;
> +
> +    return ((ccw->cmd_code == CCW_CMD_DASD_READ ||
> +            ccw->cmd_code == CCW_CMD_DASD_READ_MT) &&
> +            ccw->chain && next_ccw->cmd_code == CCW_CMD_TIC);
> +}
> +
> +static bool dynamic_cp_fixup(uint32_t ccw_addr, uint32_t  *next_cpa)
> +{
> +    Ccw0 *cur_ccw = (Ccw0 *)(uint64_t)ccw_addr;
> +    Ccw0 *tic_ccw;
> +
> +    while (true) {
> +        /* Skip over inline TIC (it might not have the chain bit on)  */
> +        if (cur_ccw->cmd_code == CCW_CMD_TIC &&
> +            cur_ccw->cda == ptr2u32(cur_ccw) - 8) {
> +            cur_ccw += 1;
> +            continue;
> +        }
> +
> +        if (!cur_ccw->chain) {
> +            break;
> +        }
> +        if (is_read_tic_ccw_chain(cur_ccw)) {
> +            /*
> +             * Breaking a chain of CCWs may alter the semantics or even the
> +             * validity of a channel program. The heuristic implemented below
> +             * seems to work well in practice for the channel programs
> +             * generated by zipl.
> +             */
> +            tic_ccw = cur_ccw + 1;
> +            *next_cpa = tic_ccw->cda;
> +            cur_ccw->chain = 0;
> +            return true;
> +        }
> +        cur_ccw += 1;
> +    }
> +    return false;
> +}
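
For readers following the thread, the chain-splitting heuristic above can be modelled on the host roughly as follows. This is only a sketch under stated assumptions: ModelCcw, the array-index "addresses" and split_at_read_tic() are hypothetical, the command code values are illustrative, and the bounds check on the array is an addition that the firmware code does not have.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative command codes for the model. */
enum { MODEL_CMD_READ = 0x06, MODEL_CMD_TIC = 0x08, MODEL_CMD_READ_MT = 0x86 };

/* Simplified CCW: array indices stand in for guest addresses. */
typedef struct {
    uint8_t cmd;
    bool    chain;   /* command-chaining flag */
    size_t  target;  /* TIC only: index of the CCW to continue at */
} ModelCcw;

/*
 * Walk the chain beginning at prog[start]. If a chained READ that is
 * directly followed by a TIC is found, cut the chain at the READ, report
 * the TIC target through *next and return true. Otherwise return false.
 */
static bool split_at_read_tic(ModelCcw *prog, size_t len, size_t start,
                              size_t *next)
{
    for (size_t i = start; i < len; i++) {
        ModelCcw *cur = &prog[i];

        /* Skip a TIC that loops back to the previous CCW (search loop). */
        if (cur->cmd == MODEL_CMD_TIC && cur->target + 1 == i) {
            continue;
        }
        if (!cur->chain) {
            break;
        }
        if ((cur->cmd == MODEL_CMD_READ || cur->cmd == MODEL_CMD_READ_MT) &&
            i + 1 < len && prog[i + 1].cmd == MODEL_CMD_TIC) {
            *next = prog[i + 1].target;
            cur->chain = false;
            return true;
        }
    }
    return false;
}
```

A caller would run the truncated segment, then call the function again with the returned target as the new start, which is the loop that run_dynamic_ccw_program() implements further down in the patch.
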
> +
> +static int run_dynamic_ccw_program(SubChannelId schid, uint16_t cutype,
> +                                   uint32_t cpa)
> +{
> +    bool has_next;
> +    uint32_t next_cpa = 0;
> +    int rc;
> +
> +    do {
> +        has_next = dynamic_cp_fixup(cpa, &next_cpa);
> +
> +        print_int("executing ccw chain at ", cpa);
> +        enable_prefixing();
> +        rc = do_cio(schid, cutype, cpa, CCW_FMT0);
> +        disable_prefixing();
> +
> +        if (rc) {
> +            break;
> +        }
> +        cpa = next_cpa;
> +    } while (has_next);
> +
> +    return rc;
> +}
> +
> +static void make_readipl(void)
> +{
> +    Ccw0 *ccwIplRead = (Ccw0 *)0x00;
> +
> +    /* Create Read IPL ccw at address 0 */
> +    ccwIplRead->cmd_code = CCW_CMD_READ_IPL;
> +    ccwIplRead->cda = 0x00; /* Read into address 0x00 in main memory */
> +    ccwIplRead->chain = 0; /* Chain flag */
> +    ccwIplRead->count = 0x18; /* Read 0x18 bytes of data */
> +}
> +
> +static void run_readipl(SubChannelId schid, uint16_t cutype)
> +{
> +    if (do_cio(schid, cutype, 0x00, CCW_FMT0)) {
> +        panic("dasd-ipl: Failed to run Read IPL channel program\n");
> +    }
> +}
> +
> +/*
> + * The architecture states that IPL1 data should consist of a psw followed by
> + * format-0 READ and TIC CCWs. Let's sanity check.
> + */
> +static void check_ipl1(void)
> +{
> +    Ccw0 *ccwread = (Ccw0 *)0x08;
> +    Ccw0 *ccwtic = (Ccw0 *)0x10;
> +
> +    if (ccwread->cmd_code != CCW_CMD_DASD_READ ||
> +        ccwtic->cmd_code != CCW_CMD_TIC) {
> +        panic("dasd-ipl: IPL1 data invalid. Is this disk really bootable?\n");
> +    }
> +}
> +
> +static void check_ipl2(uint32_t ipl2_addr)
> +{
> +    Ccw0 *ccw = u32toptr(ipl2_addr);
> +
> +    if (ipl2_addr == 0x00) {
> +        panic("IPL2 address invalid. Is this disk really bootable?\n");
> +    }
> +    if (ccw->cmd_code == 0x00) {
> +        panic("IPL2 ccw data invalid. Is this disk really bootable?\n");
> +    }
> +}
> +
> +static uint32_t read_ipl2_addr(void)
> +{
> +    Ccw0 *ccwtic = (Ccw0 *)0x10;
> +
> +    return ccwtic->cda;
> +}
> +
> +static void ipl1_fixup(void)
> +{
> +    Ccw0 *ccwSeek = (Ccw0 *) 0x08;
> +    Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
> +    Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
> +    Ccw0 *ccwRead = (Ccw0 *) 0x20;
> +    CcwSeekData *seekData = (CcwSeekData *) 0x30;
> +    CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;
> +
> +    /* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
> +    memcpy(ccwRead, (void *)0x08, 16);

lowcore->ccw1 ?

> +    /* Disable chaining so we don't TIC to IPL2 channel program */
> +    ccwRead->chain = 0x00;
> +
> +    ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
> +    ccwSeek->cda = ptr2u32(seekData);
> +    ccwSeek->chain = 1;
> +    ccwSeek->count = sizeof(*seekData);
> +    seekData->reserved = 0x00;
> +    seekData->cyl = 0x00;
> +    seekData->head = 0x00;
> +
> +    ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
> +    ccwSearchID->cda = ptr2u32(searchData);
> +    ccwSearchID->chain = 1;
> +    ccwSearchID->count = sizeof(*searchData);
> +    searchData->cyl = 0;
> +    searchData->head = 0;
> +    searchData->record = 2;
> +
> +    /* Go back to Search CCW if correct record not yet found */
> +    ccwSearchTic->cmd_code = CCW_CMD_TIC;
> +    ccwSearchTic->cda = ptr2u32(ccwSearchID);
> +}
> +
> +static void run_ipl1(SubChannelId schid, uint16_t cutype)
> + {
> +    uint32_t startAddr = 0x08;
> +
> +    if (do_cio(schid, cutype, startAddr, CCW_FMT0)) {
> +        panic("dasd-ipl: Failed to run IPL1 channel program\n");
> +    }
> +}
> +
> +static void run_ipl2(SubChannelId schid, uint16_t cutype, uint32_t addr)
> +{
> +    if (run_dynamic_ccw_program(schid, cutype, addr)) {
> +        panic("dasd-ipl: Failed to run IPL2 channel program\n");
> +    }
> +}
> +
> +static void lpsw(void *psw_addr)
> +{
> +    PSWLegacy *pswl = (PSWLegacy *) psw_addr;
> +
> +    pswl->mask |= PSW_MASK_EAMODE;   /* Force z-mode */
> +    pswl->addr |= PSW_MASK_BAMODE;
> +    asm volatile("  llgtr 0,0\n llgtr 1,1\n"     /* Some OS's expect to be */
> +                 "  llgtr 2,2\n llgtr 3,3\n"     /* in 32-bit mode. Clear  */
> +                 "  llgtr 4,4\n llgtr 5,5\n"     /* high part of regs to   */
> +                 "  llgtr 6,6\n llgtr 7,7\n"     /* avoid messing up       */
> +                 "  llgtr 8,8\n llgtr 9,9\n"     /* instructions that work */
> +                 "  llgtr 10,10\n llgtr 11,11\n" /* in both addressing     */
> +                 "  llgtr 12,12\n llgtr 13,13\n" /* modes, like servc.     */
> +                 "  llgtr 14,14\n llgtr 15,15\n"
> +                 "  lpsw %0\n"
> +                 : : "Q" (*pswl) : "cc");
> +}

Have you already tried using jump_to_low_kernel() here? It might be
cleaner to do the diag 0x308 reset here, too, so that no part of the
machine is left in an unexpected state...

 Thomas
