[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH] fix: avoid infinite loop when blockjob encounte
From: |
sochin.jiang |
Subject: |
Re: [Qemu-block] [PATCH] fix: avoid infinite loop when blockjob encountering failure |
Date: |
Thu, 15 Jun 2017 11:24:27 +0800 |
User-agent: |
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0 |
I realized blockjob is freed after completed unless we call block_job_ref()
before run_block_job is called.
On 2017/6/15 10:38, sochin.jiang wrote:
> Thanks for your kindly reply.
>
> I do have made a mistake that ignoring the AIOContext lock.
>
> About the patch, firstly, if job->ret comes to be non-zero(also means
> job->completed to be true) , blockjob 'callback'(common_block_job_cb) will be
> called, blockjob error will be put into errp. It won't report success.
>
> Secondly, blockjob fails with 'ret < 0' and without calling
> block_job_complete_sync(), we won't have segfault because bdrv_reopen won't
> be called. Also, with the use-after-free problems.
>
> So, skip the block_job_complete_sync() call if job->completed(job->ret to be
> non-zero) is true can avoid all the problems, am I right ?
>
> Thank you again.
>
>
> Best Regard.
>
> Sochin
>
>
>
>
>
>
>
> On 2017/6/14 21:12, Max Reitz wrote:
>> Thanks for your patch! The issue can be reproduced as follows:
>>
>> $ qemu-img create -f qcow2 -b \
>> "json:{'driver':'raw','file':{
>> 'driver':'blkdebug','inject-error':[{'event':'write_aio'}],
>> 'image':{'driver':'null-co'}}}" \
>> overlay.qcow2
>> $ qemu-io -c 'write 0 64k' overlay.qcow2
>> $ qemu-img commit overlay.qcow2
>>
>> While your patch fixes that issue, I still have some comments:
>>
>> On 2017-06-14 08:22, sochin.jiang wrote:
>>> From: "sochin.jiang" <address@hidden>
>>>
>>> img_commit could fall into infinite loop if it's blockjob
>> This should be "into an infinite loop" and "its" instead if "it's".
>>
>> This empty line should be omitted.
>>
>>> fail encountering any I/O error. Try to fix it.
>> Should be "fails on any I/O error" or "fails on encountering any I/O
>> error". Also, you're not trying to fix it but let's all hope you really
>> are fixing it. :-)
>>
>> (So "Fix it." instead of "Try to fix it.")
>>
>>> Signed-off-by: sochin.jiang <address@hidden>
>>> ---
>>> qemu-img.c | 5 ++++-
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index 0ad698d..6ba565d 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
>>> @@ -895,8 +895,11 @@ static void run_block_job(BlockJob *job, Error **errp)
>>> aio_poll(aio_context, true);
>>> qemu_progress_print(job->len ?
>>> ((float)job->offset / job->len * 100.f) :
>>> 0.0f, 0);
>>> - } while (!job->ready);
>>> + } while (!job->ready && !job->ret);
>> I think it would be better to test job->completed instead of job->ret.
>>
>>>
>>> + if (job->ret) {
>>> + return;
>>> + }
>> We shouldn't just return here but still do all the deinitialization like
>> call aio_context_release(). I guess the best would be to just skip the
>> block_job_complete_sync() call if job->completed is true.
>>
>>> block_job_complete_sync(job, errp);
>>> aio_context_release(aio_context);
>> Then, there are three more issues I found while reviewing this patch:
>>
>> First, if the block job is completed before block_job_complete_sync() is
>> called (i.e. if an error occurred), it is automatically freed. This is
>> bad because this means we'll have some instances of use-after-free here.
>> Therefore, we need to invoke block_job_ref() before run_block_job() and
>> block_job_unref() afterwards. (And since these functions are currenctly
>> static in blockjob.c, we'll have to make them global.)
>>
>> Secondly, run_block_job() doesn't evaluate job->ret. Therefore it will
>> report success even if the commit failed (it is expecting
>> block_job_complete_sync() to put an error into errp, but it will not do
>> that). So we'll have to do that (manually check job->ret and if it's
>> negative, put an error message into errp; also, assert that
>> job->cancelled is false).
>>
>> Thirdly, we have segfault in bdrv_reopen_prepare() if the image has
>> non-string options... I'll handle this one.
>>
>> I can also handle the other two issues, if you'd like me to.
>>
>>
>> Finally, an iotest would be nice (see my reproducer above). But I can
>> handle that as well, if you decide not to write one.
>>
>> Max
>>