bug-ddrescue

Re: Suggestion about error control


From: Scott Dwyer
Subject: Re: Suggestion about error control
Date: Fri, 12 Jun 2020 17:36:57 -0400
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

To be honest, I don't think I ever used any T10 documentation for the SCSI passthrough. It is needed for ATA passthrough, but there is plenty of other documentation and open source code for the SCSI passthrough, and I know for sure everything I found was free. And from what I can tell, the SCSI passthrough is still processed by the kernel, and the kernel deals with the inconsistencies of devices, so your concerns about the "zillion existing exceptions" are still well handled by the kernel.

You only need five SCSI commands:

1) INQUIRY

2) READ CAPACITY (10)

3) READ CAPACITY (16)

4) READ (10)

5) READ (16)
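
As a rough illustration of the five commands above, the command descriptor blocks (CDBs) could be built like this in Python. The opcodes and field layouts are the standard ones; the function names are hypothetical and are not taken from ddrescue or ddrutility, and actually sending the CDBs through the passthrough interface is left out.

```python
import struct

def inquiry_cdb(alloc_len=36):
    # INQUIRY, opcode 0x12: 6-byte CDB, allocation length in bytes 3-4.
    return struct.pack(">BBBHB", 0x12, 0, 0, alloc_len, 0)

def read_capacity10_cdb():
    # READ CAPACITY (10), opcode 0x25: 10-byte CDB, remaining bytes zero.
    return struct.pack(">B9x", 0x25)

def read_capacity16_cdb(alloc_len=32):
    # READ CAPACITY (16) is SERVICE ACTION IN (16), opcode 0x9E,
    # with service action 0x10; allocation length in bytes 10-13.
    return struct.pack(">BBQIBB", 0x9E, 0x10, 0, alloc_len, 0, 0)

def read10_cdb(lba, blocks):
    # READ (10), opcode 0x28: 32-bit LBA, 16-bit transfer length.
    if not (0 <= lba <= 0xFFFFFFFF and 0 < blocks <= 0xFFFF):
        raise ValueError("LBA or block count does not fit in READ (10)")
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)

def read16_cdb(lba, blocks):
    # READ (16), opcode 0x88: 64-bit LBA, 32-bit transfer length.
    if not (0 <= lba <= 0xFFFFFFFFFFFFFFFF and 0 < blocks <= 0xFFFFFFFF):
        raise ValueError("LBA or block count does not fit in READ (16)")
    return struct.pack(">BBQIBB", 0x88, 0, lba, blocks, 0, 0)
```

All multi-byte fields are big-endian, which is why every format string starts with ">".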

Originally I was using the host_status as one way to tell if a drive was offline, but some devices cause this status to be bad for no reason. So after every read error, perform an INQUIRY; if it fails, the device is no longer responding. Also perform a READ CAPACITY command and verify that the capacity is still reported as the same size; if not, the drive is no longer responding properly. It is really that simple, once you get past the somewhat complicated part of actually performing and processing the SCSI passthrough.

Other than the host_status issue, the only other issue I have seen involves READ CAPACITY. Normally, if a device is large enough to require READ CAPACITY (16), it is supposed to report a block capacity of 0xffffffff in response to the READ CAPACITY (10) command, so you would know to use the 16-byte commands. I don't remember exactly why or what the conditions were, but I found it better to try a READ CAPACITY (16) command first, and if it fails for invalid command, then stick to the 10-byte commands.
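
A minimal sketch of the two rules above, in Python. The callables passed in (run_rc16, run_rc10, run_inquiry, run_probe) are hypothetical stand-ins for whatever actually issues the passthrough commands; they are assumed to return the response bytes, or None on failure. For simplicity this sketch falls back to READ CAPACITY (10) on any READ CAPACITY (16) failure, not only on an invalid-command error.

```python
import struct

def parse_read_capacity10(data):
    # Response: 32-bit last LBA, then 32-bit block length (big-endian).
    return struct.unpack(">II", data[:8])

def parse_read_capacity16(data):
    # Response: 64-bit last LBA, then 32-bit block length (big-endian).
    return struct.unpack(">QI", data[:12])

def probe_capacity(run_rc16, run_rc10):
    """Try READ CAPACITY (16) first, then fall back to READ CAPACITY (10).

    Returns (command_size, total_blocks, block_length).
    """
    data = run_rc16()
    if data is not None:
        last_lba, block_len = parse_read_capacity16(data)
        return 16, last_lba + 1, block_len
    data = run_rc10()
    if data is None:
        raise OSError("device answered neither READ CAPACITY command")
    last_lba, block_len = parse_read_capacity10(data)
    return 10, last_lba + 1, block_len

def device_still_ok(run_inquiry, run_probe, expected_blocks):
    """Check to run after a read error: INQUIRY must succeed and the
    reported capacity must match the one seen at startup."""
    if run_inquiry() is None:
        return False
    try:
        _, blocks, _ = run_probe()
    except OSError:
        return False
    return blocks == expected_blocks
```

The capacity recorded by probe_capacity() at startup would be the expected_blocks value passed to device_still_ok() later.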

One other rule that must be followed: there is a buffer limit for every connected device when using passthrough mode. The limit is stored at /sys/block/DEVICE/queue/max_sectors_kb, where "DEVICE" is the device you are reading (for example, "/sys/block/sda/queue/max_sectors_kb"). The number stored here is expressed in KB, and the default for a hard drive is usually 512 (meaning 512KB). This number is usually smaller for a USB connected drive (120KB). This size limit must not be exceeded when reading, or bad things will happen.
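
One way to respect that limit, sketched in Python: read the value from sysfs, convert it to a block count, and split larger reads into chunks. The helper names and the sysfs_root parameter are hypothetical, and the 512-byte default block size is an assumption; the real block length should come from READ CAPACITY.

```python
import os

def read_max_sectors_kb(device, sysfs_root="/sys/block"):
    # Reads e.g. /sys/block/sda/queue/max_sectors_kb and returns the KB value.
    path = os.path.join(sysfs_root, device, "queue", "max_sectors_kb")
    with open(path) as f:
        return int(f.read().strip())

def max_blocks_per_read(max_sectors_kb, block_size=512):
    # Largest number of whole blocks that fits within the KB limit.
    blocks = (max_sectors_kb * 1024) // block_size
    if blocks < 1:
        raise ValueError("block size is larger than the transfer limit")
    return blocks

def split_read(start_lba, total_blocks, max_blocks):
    # Yield (lba, count) pairs, each no larger than max_blocks.
    lba = start_lba
    remaining = total_blocks
    while remaining > 0:
        count = min(remaining, max_blocks)
        yield lba, count
        lba += count
        remaining -= count
```

With the usual 512KB limit and 512-byte blocks this gives at most 1024 blocks per command; with the 120KB USB value it gives 240.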

You may find those issues to be a reason to say something like "See, there are things that are inconsistent and that is not safe". But I can say that following those basic rules has been rock solid for me with the SCSI passthrough. As for the "zillion existing exceptions", I have stepped into the realm of direct packet communication with USB devices, and at that level it does get very messy. It makes one aware of how much the kernel does deal with the inconsistencies of the devices so that we don't see the chaos.

Regards,
Scott


On 6/3/2020 5:18 PM, Antonio Diaz Diaz wrote:
Scott Dwyer wrote:
No, you have spent much time on an excellent program, the only one of
its kind in the open source world, and I bet with little financial return.

Thanks. You are right about the "little financial return". I have received about 20 euros in donations in the last three months. (6.67 eur/month).

My intention was to reply to the suggestion of error control that
ddrescue doesn't do like other programs. You must go deeper to
accomplish this, at a minimum SCSI passthrough. I do it in Linux, and
the other program can also do it in Windows I believe. Both are specific
and non-portable, due to the nature of what needs to be done at a lower
level. It is obviously more complicated, but when done correctly it is
no more dangerous than what the kernel does.

How can one be sure that it is done correctly given the zillion existing exceptions? You know: some drive does not implement some SCSI command, another implements it in a funny way, another has a bug in its implementation... I mean, the kernel already does it badly enough (especially for USB drives).

See for example this note from http://sg.danny.cz/sg/
"The term SCSI has several meaning depending on the context. This leads to confusion. One practical way of defining it today is everything that the T10 INCITS committee controls, see www.t10.org . Probably the most succinct overview is this standards architecture page . For practical purposes a "SCSI device" in Linux is any device that uses the Linux SCSI subsystem and this often includes SATA disks."

Moreover, SCSI standards are not freely accessible[1]. If I can't find a free copy, I'll need someone to donate one for the development of ddrescue.

[1] http://www.t10.org/t10_access.htm

And FYI the kernel does NOT know best when it comes to a failing drive,
it will thrash it more than needed in Linux, and Windows is even worse.

I believe you. But at least if linux gets any bug related to a failing drive, say returning wrong data for good sectors near a bad sector, I expect it to be discovered faster than if I make the same mistake in (the much less used) ddrescue, for example.

Then maybe someone can come up with the SCSI passthrough code for
ddrescue (hint to programmers out there that want to, I have produced
open source Linux patches for this in the past that would be a good
starting point, look into the old ddrutility stuff).

Thank you for the patches. I keep them and I plan to use them at least to compare them with my own code as a way to find possible errors in my code.

IIRC, the main reason why I have never used your SCSI passthrough patch is that its main feature is increasing the read performance, which I think should be done by the kernel when --idirect is used. I do not consider that reading data through the SCSI passthrough interface is safe enough for ddrescue. The readme file for your patch tends to confirm this[2].

[2] http://sourceforge.net/projects/ddrutility/files/ddrescue%20patches/passthrough%20patch/

You have done good work, but I plan to keep the risks low and limit ddrescue's use of the SCSI passthrough interface to improving the detection of error conditions in the input device.

IMO every piece of software should either publish the full source code
(so that users can decide if they trust it) or offer an unlimited
warranty in case of misbehavior of the code.

If this were the case, then all software would be open source or open to
incredible liability. Without the hope of financial gain (or having the
fear of great loss), there would be much less effort, and many good
programs would not exist.

It does not need to be "open source" in the sense of "free software", only in the sense of "the users may verify it, even if they aren't allowed to redistribute it". This surely would increase the safety of the software by removing lots of crappy non-free software from the market.

Maybe I should remove myself from the list so I don't see the emails, and
therefore not tempted to reply. I might just do that...

Please, don't. Your contributions are valuable and appreciated. It is just that writing about non-free software (especially to promote it) is off-topic in GNU lists.

Best regards,
Antonio.

