Re: Suggestion about error control

bug-ddrescue

From:	Scott Dwyer
Subject:	Re: Suggestion about error control
Date:	Fri, 12 Jun 2020 17:36:57 -0400
User-agent:	Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

To be honest I don't think I ever used any T10 documentation for the SCSI passthrough. It is needed for ATA passthrough, but there is plenty of other documentation and open source code for the SCSI passthrough, and I know for sure everything I found was free. And from what I can tell, the SCSI passthrough is still processed by the kernel, and the kernel deals with the inconsistencies of devices, so your concerns about the "zillion existing exceptions" are still well handled by the kernel.


You only need five SCSI commands:

1) INQUIRY

2) READ CAPACITY (10)

3) READ CAPACITY (16)

4) READ (10)

5) READ (16)
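
As a rough illustration of how small that command set is, here is a sketch that only builds the command descriptor blocks (CDBs) for the five commands listed above. The function names and default allocation lengths are my own choices for illustration, not taken from any existing patch, and actually sending the bytes to the device (for example through the Linux SG_IO ioctl) is not shown.

```python
import struct

# All multi-byte CDB fields are big-endian, hence ">" in every format string.

def inquiry_cdb(alloc_len=36):
    """6-byte INQUIRY (opcode 0x12) asking for standard inquiry data."""
    return struct.pack(">BBBHB", 0x12, 0, 0, alloc_len, 0)

def read_capacity_10_cdb():
    """10-byte READ CAPACITY (10), opcode 0x25, all other fields zero."""
    return struct.pack(">BBIHBB", 0x25, 0, 0, 0, 0, 0)

def read_capacity_16_cdb(alloc_len=32):
    """16-byte READ CAPACITY (16): opcode 0x9E with service action 0x10."""
    return struct.pack(">BBQIBB", 0x9E, 0x10, 0, alloc_len, 0, 0)

def read_10_cdb(lba, blocks):
    """10-byte READ (10), opcode 0x28: 32-bit LBA, 16-bit block count."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA does not fit in READ (10)")
    if not 0 < blocks <= 0xFFFF:
        raise ValueError("block count does not fit in READ (10)")
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)

def read_16_cdb(lba, blocks):
    """16-byte READ (16), opcode 0x88: 64-bit LBA, 32-bit block count."""
    if not 0 <= lba <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("LBA does not fit in READ (16)")
    if not 0 < blocks <= 0xFFFFFFFF:
        raise ValueError("block count does not fit in READ (16)")
    return struct.pack(">BBQIBB", 0x88, 0, lba, blocks, 0, 0)
```

For example, read_10_cdb(0x01020304, 8) gives the ten bytes 28 00 01 02 03 04 00 00 08 00 in hex.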

Originally I was using the host_status as one way to tell if a drive was offline, but some devices cause this status to be bad for no reason. So after every read error, perform an inquiry; if it fails, then the device is no longer responding. Also perform a read capacity command and verify the capacity is still reported as the same size; if not, then the drive is no longer responding properly. It is really that simple, once you get past the somewhat complicated part of actually performing and processing the SCSI passthrough. Other than the host_status issue, the only other issue I have seen is that normally if a device is large enough to require READ CAPACITY (16) it is supposed to report a block capacity of 0xffffffff with the READ CAPACITY (10) command, so you would know to use size 16 commands. I don't remember exactly why or what the conditions were, but I found it better to try a READ CAPACITY (16) command first, and if it fails for invalid command then stick to size 10 commands.
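
The check described above could be sketched roughly like this. It is only an outline under some assumptions: the callables passed in (read_capacity_16, read_capacity_10, inquiry, probe) are hypothetical stand-ins for whatever code really issues the passthrough commands, and a return value of None is my own convention for "the command failed or was rejected". Only the parsing of the returned capacity data follows the standard layout (last LBA followed by block length, both big-endian).

```python
import struct

def parse_read_capacity_10(data):
    """First 8 bytes: 32-bit last LBA, 32-bit block length."""
    return struct.unpack(">II", data[:8])

def parse_read_capacity_16(data):
    """First 12 bytes: 64-bit last LBA, 32-bit block length."""
    return struct.unpack(">QI", data[:12])

def probe_capacity(read_capacity_16, read_capacity_10):
    """Try READ CAPACITY (16) first, fall back to (10) if it is rejected.

    Both arguments are hypothetical callables returning the raw response
    bytes, or None on failure. Returns (command_size, last_lba, block_size),
    or None if neither command worked.
    """
    data = read_capacity_16()
    if data is not None:
        return (16,) + parse_read_capacity_16(data)
    data = read_capacity_10()
    if data is not None:
        return (10,) + parse_read_capacity_10(data)
    return None

def device_still_responding(inquiry, probe, expected):
    """Run after every read error.

    inquiry: hypothetical callable, True if INQUIRY succeeded.
    probe: callable returning the same tuple as probe_capacity().
    expected: the tuple recorded when the rescue started.
    """
    if not inquiry():
        return False           # device no longer answers at all
    return probe() == expected # capacity changed or vanished: not healthy
```

The idea is to record the result of probe_capacity() once at startup and pass it as expected on every later check, so a drive that suddenly reports a different size is treated the same as one that has gone silent.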

One other thing that must be followed is there is a buffer limit for every connected device when using passthrough mode. The limit is stored at /sys/block/DEVICE/queue/max_sectors_kb, where "DEVICE" is the device you are reading (example "/sys/block/sda/queue/max_sectors_kb"). The number stored here is referenced in KB, and the default for a hard drive is usually 512 (meaning 512KB). This number is usually smaller for a USB connected drive (120KB). This size limit must not be exceeded when reading, or bad things will happen.
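
To make the arithmetic concrete, here is a small sketch of reading that limit and turning it into a maximum block count per read. The helper names, the sys_block parameter (there so the code can be pointed at a test directory), and the choice to also cap at the 16-bit count of READ (10) are assumptions of this example, not something prescribed by the kernel.

```python
from pathlib import Path

def read_max_sectors_kb(device, sys_block="/sys/block"):
    """Return the per-device transfer limit in KB, e.g. for device 'sda'."""
    path = Path(sys_block, device, "queue", "max_sectors_kb")
    return int(path.read_text().strip())

def max_blocks_per_read(max_sectors_kb, block_size, cdb_limit=0xFFFF):
    """Largest block count that stays within the limit.

    cdb_limit defaults to the 16-bit transfer length of READ (10).
    """
    blocks = (max_sectors_kb * 1024) // block_size
    if blocks < 1:
        raise ValueError("limit is smaller than one block")
    return min(blocks, cdb_limit)

def split_read(start_lba, total_blocks, max_blocks):
    """Yield (lba, blocks) pairs, none larger than max_blocks."""
    lba, remaining = start_lba, total_blocks
    while remaining > 0:
        chunk = min(remaining, max_blocks)
        yield lba, chunk
        lba += chunk
        remaining -= chunk
```

With the figures mentioned above and 512-byte blocks, a 512KB limit allows 1024 blocks per read and a 120KB limit allows 240.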

You may find those issues to be a reason to say something like "See, there are things that are inconsistent and that is not safe". But I can say that following those basic rules has been rock solid for me with the SCSI passthrough. As for the "zillion existing exceptions", I have stepped into the realm of direct packet communication with USB devices, and at that level it does get very messy. It makes one aware of how much the kernel does deal with the inconsistencies of the devices so that we don't see the chaos.


Regards,
Scott


On 6/3/2020 5:18 PM, Antonio Diaz Diaz wrote:

Scott Dwyer wrote:
No, you have spent much time on an excellent program, the only one of
its kind in the open source world, and I bet with little financial return.
Thanks. You are right about the "little financial return". I have received about 20 euros in donations in the last three months. (6.67 eur/month).
My intention was to reply to the suggestion of error control that
ddrescue doesn't do like other programs. You must go deeper to
accomplish this, at a minimum SCSI passthrough. I do it in Linux, and
the other program can also do it in Windows I believe. Both are specific
and non-portable, due to the nature of what needs to be done at a lower
level. It is obviously more complicated, but when done correctly it is
no more dangerous than what the kernel does.
How can one be sure that it is done correctly given the zillion existing exceptions? You know. Some drive does not implement some SCSI command. Some other implements it in a funny way. Some other has a bug in the implementation... I mean, the kernel already does it badly enough (especially for USB drives).
See for example this note from http://sg.danny.cz/sg/
"The term SCSI has several meaning depending on the context. This leads to confusion. One practical way of defining it today is everything that the T10 INCITS committee controls, see www.t10.org . Probably the most succinct overview is this standards architecture page . For practical purposes a "SCSI device" in Linux is any device that uses the Linux SCSI subsystem and this often includes SATA disks."
Moreover, SCSI standards are not freely accessible[1]. If I can't find a free copy, I'll need someone to donate one for the development of ddrescue.
[1] http://www.t10.org/t10_access.htm
And FYI the kernel does NOT know best when it comes to a failing drive,
it will thrash it more than needed in Linux, and Windows is even worse.
I believe you. But at least if linux gets any bug related to a failing drive, say returning wrong data for good sectors near a bad sector, I expect it to be discovered faster than if I make the same mistake in (the much less used) ddrescue, for example.
Then maybe someone can come up with the SCSI passthrough code for
ddrescue (hint to programmers out there that want to, I have produced
open source Linux patches for this in the past that would be a good
starting point, look into the old ddrutility stuff).
Thank you for the patches. I keep them and I plan to use them at least to compare them with my own code as a way to find possible errors in my code.
IIRC, the main reason why I have never used your SCSI passthrough patch is that its main feature is increasing the read performance, which I think should be done by the kernel when --idirect is used. I do not consider that reading data through the SCSI passthrough interface is safe enough for ddrescue. The readme file for your patch tends to confirm this[2].
[2]http://sourceforge.net/projects/ddrutility/files/ddrescue%20patches/passthrough%20patch/
You have done good work, but I plan to keep the risks low and limit ddrescue's use of the SCSI passthrough interface to the improvement of the detection of error conditions in the input device.
IMO every piece of software should either publish the full source code
(so that users can decide if they trust it) or offer an unlimited
warranty in case of misbehavior of the code.
If this were the case, then all software would be open source or open to
incredible liability. Without the hope of financial gain (or having the
fear of great loss), there would be much less effort, and many good
programs would not exist.
It does not need to be "open source" in the sense of "free software", only in the sense of "the users may verify it, even if they aren't allowed to redistribute it". This surely would increase the safety of the software by removing lots of crappy non-free software from the market.
Maybe I should remove myself from the list so I don't see the emails, and
therefore not tempted to reply. I might just do that...
Please, don't. Your contributions are valuable and appreciated. It is just that writing about non-free software (especially to promote it) is off-topic in GNU lists.
Best regards,
Antonio.
