Hello,
I may not have made the detail clear in my previous email. The details of the AHCI device, after running the reproducer I attached in my report are as follows. If there is any information I can provide, please let me know. Thank you.
###root cause###
(1) The s->packet_transfer_size is bigger than the actual data.
(2) packet_transfer_size is passed into ide_atapi_cmd_reply_end, as the total number of iterations. Each iterate round, s->io_buffer_index is increased by 2048, but without boundary check.
(3) The call to ide_transfer_start_norecurse use s->io_buffer + s->io_buffer_index - size as the index, cause an OOB access.
###details###
1. The reproducer sends a command of [WIN_PACKETCMD] + [CMD_READ] and value of IDE device's registers from guest to qemu.
Callback ahci_port_write is called, then check_cmd is called.
2. The packet will set all the registers of the device via: handle_cmd --> handle_reg_h2d_fis.
Registers will be set here:
handle_reg_h2d_fis(..){
...
ide_state->feature = cmd_fis[3]; //######[1]###### , cmd_fis is the data sent by the reproducer.
ide_state->sector = cmd_fis[4]; /* LBA 7:0 */
ide_state->lcyl = cmd_fis[5]; /* LBA 15:8 */
ide_state->hcyl = cmd_fis[6]; /* LBA 23:16 */
ide_state->select = cmd_fis[7]; /* LBA 27:24 (LBA28) */
ide_state->hob_sector = cmd_fis[8]; /* LBA 31:24 */
ide_state->hob_lcyl = cmd_fis[9]; /* LBA 39:32 */
ide_state->hob_hcyl = cmd_fis[10]; /* LBA 47:40 */
ide_state->hob_feature = cmd_fis[11];
ide_state->nsector = (int64_t)((cmd_fis[13] << 8) | cmd_fis[12]);
and ide_exec_cmd will be called to process [WIN_PACKETCMD] command.
ide_exec_cmd(&s->dev[port].port, cmd_fis[2]);
3. ide_exec_cmd (core.c) handles the command,
[WIN_PACKETCMD] = { cmd_packet, CD_OK },
and make a call to cmd_packet,
cmd_packet(...) {
...
s->atapi_dma = s->feature & 1; //######[2]######
if (s->atapi_dma) {
s->dma_cmd = IDE_DMA_ATAPI;
}
s->nsector = 1;
ide_transfer_start(s, s->io_buffer, ATAPI_PACKET_SIZE,
ide_atapi_cmd);
...
}
and set the device to use PIO mode according to s->feature (set in Step 2->##[1]##).
Then, ide_transfer_start is called.
It will pass the [CMD_READ] part after [WIN_PACKETCMD] to ide_atapi_cmd.
4. ide_atapi_cmd parses [CMD_READ], and then calls the corresponding handler: cmd_read.
[ 0x28 ] = { cmd_read, /* (10) */ CHECK_READY },
In cmd_read, the values of nb_sectors and lba are determined according to the packets passed by the reproducer.
In my PoC I set lba to -1 (0xfffffff) and nb_sectors to a large value, such as 0x800.
static void cmd_read(IDEState *s, uint8_t* buf)
{
int nb_sectors, lba;
if (buf[0] == GPCMD_READ_10) {
nb_sectors = lduw_be_p(buf + 7);
} else {
nb_sectors = ldl_be_p(buf + 6); //#####3#####
}
lba = ldl_be_p(buf + 2); //######4######
....
ide_atapi_cmd_read(s, lba, nb_sectors, 2048);
}
5. The code enters the ide_atapi_cmd_read -> ide_atapi_cmd_read_pio.
static void ide_atapi_cmd_read(.....)
{...
if (s->atapi_dma) {
ide_atapi_cmd_read_dma(s, lba, nb_sectors, sector_size);
} else {
ide_atapi_cmd_read_pio(s, lba, nb_sectors, sector_size); //######5#######
}
}
It will set the attributes according to the value passed in the previous steps.
The number of s->packet_transfer_size, which is the packet to read or write, nb_sectors * 2048 can be larger than the buffer pre-allocated by qemu (about 256KB).
static void ide_atapi_cmd_read_pio(IDEState *s, int lba, int nb_sectors,
int sector_size)
{
s->lba = lba;
s->packet_transfer_size = nb_sectors * sector_size; //########6#########
s->elementary_transfer_size = 0;
s->io_buffer_index = sector_size;
s->cd_sector_size = sector_size;
ide_atapi_cmd_reply_end(s);
}
6. In ide_atapi_cmd_reply_end, the data is processed according to packet_transfer_size.
void ide_atapi_cmd_reply_end(IDEState *s)
{
...
while (s->packet_transfer_size > 0) { //########7#######
....
s->packet_transfer_size -= size;
s->elementary_transfer_size -= size;
s->io_buffer_index += size; //#######8#######
if (!ide_transfer_start_norecurse(s,
s->io_buffer + s->io_buffer_index - size,
size, ide_atapi_cmd_reply_end)) {
return;
}
The "size" is usually 2048, which means the io_buffer_index increases by 2048 per round.
Qemu does not test if the result of this operation exceeds the length of the io_buffer itself, resulting in out-of-bounds access.
In ide_transfer_start_norecurse,
bool ide_transfer_start_norecurse(IDEState *s, uint8_t *buf, int size,
EndTransferFunc *end_transfer_func)
{
s->data_ptr = buf; //s->io_buffer + s->io_buffer_index - size
s->data_end = buf + size; //data_ptr + 2048
....
s->bus->dma->ops->pio_transfer(s->bus->dma); //#######9########
return true;
}
//####9####:
static const IDEDMAOps ahci_dma_ops = {
...
.pio_transfer = ahci_pio_transfer,
...
};
In the final processing function ahci_pio_transfer:
static void ahci_pio_transfer(const IDEDMA *dma)
{
....
uint32_t size = (uint32_t)(s->data_end - s->data_ptr); // 2048, usually
uint16_t opts = le16_to_cpu(ad->cur_cmd->opts); //####user controlled value#####
int is_write = opts & AHCI_CMD_WRITE; // read or write is decided by user.
.....
if (has_sglist && size) {
if (is_write) {
dma_buf_write(s->data_ptr, size, &s->sg); //##10##### both can be reached ####
} else {
dma_buf_read(s->data_ptr, size, &s->sg); //##11##### both can be reached ####
}
}
}
s->data_ptr can be a value out of range, so base on ad->cur_cmd->opts, ##10## ##11## can be OOB read or OOB write.
OOB read: obtain the leaked information, which can be used to bypass ASLR or obtain information about the host.
OOB write: which may overwrite some structs of the virtual device after it, overwrite the function pointers in the struct.
Best regards,
Wenxiang Qian