From: Angel
Subject: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR Translator have problem )
Date: Thu, 17 Jan 2008 13:17:23 +0100
User-agent: KMail/1.9.1

Hi,

Managing log files seems pretty hard to me at this moment; are you confident it
is feasible?

On the other hand, checksumming also seems very interesting to me as a usable
userspace feature (offloading checksums from client apps to the server).

Checksumming is definitely on my TODO list.

I'm very busy now and still have my QUOTA xlator pet project in progress..

Anyway, it is hard to start making a logfile AFR without disturbing the current AFR
developers.

I'm sure they have their own ideas about what to do on this subject.


Regards,

Life's hard but root password helps!

On Thursday, 17 January 2008 11:11, Gareth Bult wrote:
> Erm, I said;
> 
> >to write the change to a logfile on the remaining volumes
> 
> By which I meant that the log file would be written on the remaining 
> available server volumes ... (!)
> 
> Regards,
> Gareth.
> 
> ----- Original Message -----
> From: "Angel" <address@hidden>
> To: address@hidden
> Sent: 17 January 2008 10:07:12 o'clock (GMT) Europe/London
> Subject: Re: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR 
> Translator have problem )
> 
> The problem is:
> 
> If you place AFR on the client, how do the servers get the log file during
> recovery operations?
> 
> Regards, Angel
> 
> 
> On Thursday, 17 January 2008 10:44, Gareth Bult wrote:
> > Hi,
> > 
> > Yes, I would agree these changes would improve the current implementation.
> > 
> > However, a "better" way would be for the client, on failing to write to ONE 
> > of the AFR volumes, to write the change to a logfile on the remaining 
> > volumes .. then for the recovering server to play back the logfile when it 
> > comes back up, or to recopy the file if there are insufficient logs or if 
> > the file has been erased.
> > 
> > This would "seem" to be a very simple implementation .. 
> > 
> > Client;
> > 
> > Write to AFR
> > If Fail then
> >   if log file does not exist create log file
> >   Record file, version, offset, size, data in logfile
> > 
> > On server;
> > 
> > When recovering;
> > 
> >   for each entry in logfile
> >      if file age > most recent transaction
> >         re-copy whole file
> >      else
> >         replay transaction
> > 
> >   if all volumes "UP", remove logfile 
> > 
> > ?????
> > 
> > One of the REAL benefits of this is that the file is still available DURING 
> > a heal operation.
> > At the moment a HEAL only takes place when a file is being opened, and 
> > while the copy is taking place the file blocks ...
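A minimal, unofficial sketch of that log-and-replay scheme, in Python rather than as a real translator. The record fields (file, version, offset, size, data) and the re-copy-versus-replay rule come from the pseudocode above; the log path, the JSON record format and the helper names (log_failed_write, replay_log, copy_whole_file) are just assumptions for illustration, and how the log gets from the surviving volumes to the recovering server is glossed over.

import json
import os
import time

# Assumed location of the pending-write log kept on the surviving volumes.
LOG_PATH = "/var/log/glusterfs/afr-pending.log"

def log_failed_write(path, version, offset, data):
    """Client side: one AFR subvolume failed, so record the write in the log."""
    record = {
        "time": time.time(),
        "file": path,
        "version": version,
        "offset": offset,
        "size": len(data),
        "data": data.hex(),
    }
    with open(LOG_PATH, "a") as log:           # create the log if it does not exist
        log.write(json.dumps(record) + "\n")

def replay_log(copy_whole_file):
    """Recovering server: replay each logged write, or fall back to a full
    re-copy when the file was erased or changed after the logged transaction."""
    if not os.path.exists(LOG_PATH):
        return
    with open(LOG_PATH) as log:
        for line in log:
            rec = json.loads(line)
            stale = (not os.path.exists(rec["file"])
                     or os.path.getmtime(rec["file"]) > rec["time"])
            if stale:
                copy_whole_file(rec["file"])   # insufficient logs: re-copy the whole file
            else:
                with open(rec["file"], "r+b") as f:
                    f.seek(rec["offset"])      # replay the transaction in place
                    f.write(bytes.fromhex(rec["data"]))
    os.remove(LOG_PATH)                        # all volumes up again: drop the log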
> > 
> > Gareth.
> > 
> > ----- Original Message -----
> > From: "Angel" <address@hidden>
> > To: "Gareth Bult" <address@hidden>
> > Cc: address@hidden
> > Sent: 17 January 2008 08:47:06 o'clock (GMT) Europe/London
> > Subject: New IDEA: The Checksumming xlator ( AFR Translator have problem )
> > 
> > Hi Gareth
> > 
> > You said it!! Gluster is revolutionary!!
> > 
> > AFR does a good job, we only have to help AFR be a better guy!!
> > 
> > What we need is a checksumming translator!!
> > 
> > Suppose you have your posix volumes A and B on different servers.
> > 
> > So you are using AFR(A,B) on the client.
> > 
> > One of your AFRed nodes fails ( A ) and some time later it comes back to life, 
> > but its backend filesystem got trashed and fsck'ed, and now there may be subtle 
> > differences in the files inside.
> > 
> > Your beloved 100GB XEN files now don't match between your "faulty" A node and 
> > your fresh B node!! 
> > 
> > AFR would notice this (I think) by means of xattrs on both files, that is, 
> > VERSION(FILE on node A) != VERSION(FILE on node B) or something like that.
> > 
> > But the real problem, as you pointed out, is that AFR only knows the files don't 
> > match, so it has to copy every byte of your 100GB image from B to A 
> > (automatically on self-heal or on file access).
> > 
> > That's many GBs (maybe PBs) going back and forth over the net. THIS IS 
> > VERY EXPENSIVE, we all know that.
> > 
> > Enter the checksumming xlator (SHA1 or MD5, maybe MD4, since rsync seems to use 
> > that without any problem).
> > 
> > The checksumming xlator sits atop your posix modules on every node. Whenever 
> > you request the xattr SHA1[block_number] on a file, the checksumming xlator 
> > intercepts this call, reads block number "block_number" from the file, 
> > calculates its SHA1, and returns this as an xattr key:value pair.
> > 
> > Now AFR can request SHA1s blockwise on both servers and update only those 
> > blocks whose SHA1 doesn't match.
> > 
> > With a decent block size we can save a lot of info for every transaction.
> > 
> > -- In case your faulty node lost its contents, you have to copy the 
> > whole 100GB XEN files again.
> > -- In case of a SHA1 mismatch, AFR can update only the differences, saving a 
> > lot of resources, like rsync does. 
> > 
> > One more advanced feature would be to incorporate xdelta library functions, 
> > making it possible to generate binary patches against files...
> > 
> > Now we only need someone to implement this xlator :-)
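A rough sketch of what that could look like, in Python rather than as a real xlator. The 1 MiB block size and the function names are assumptions; in an actual checksumming xlator the digests would be computed server side and handed back through the SHA1[block_number] xattr, instead of both files being read locally as they are here.

import hashlib
import os

BLOCK_SIZE = 1024 * 1024   # assumed 1 MiB blocks

def block_checksums(path):
    """What the checksumming xlator would expose per block through an xattr:
    the SHA1 of every BLOCK_SIZE chunk of the file."""
    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            sums.append(hashlib.sha1(block).hexdigest())
    return sums

def sync_mismatched_blocks(fresh_path, faulty_path):
    """What AFR could then do: copy from the fresh node to the faulty node
    only the blocks whose checksums differ, rsync-style."""
    fresh_sums = block_checksums(fresh_path)
    faulty_sums = block_checksums(faulty_path)
    with open(fresh_path, "rb") as src, open(faulty_path, "r+b") as dst:
        for i, digest in enumerate(fresh_sums):
            if i >= len(faulty_sums) or faulty_sums[i] != digest:
                src.seek(i * BLOCK_SIZE)
                dst.seek(i * BLOCK_SIZE)
                dst.write(src.read(BLOCK_SIZE))
        dst.truncate(os.path.getsize(fresh_path))  # make the faulty copy exactly the fresh size

Only the small digests would need to cross the network; the bulk reads stay local to each server, which is where the saving over re-copying the whole 100GB image comes from.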
> > 
> > Regards
> >  
> > On Thursday, 17 January 2008 01:49, the following was written:
> > > Mmm...
> > > 
> > > There are a couple of real issues with self heal at the moment that make 
> > > it a minefield for the inexperienced.
> > > 
> > > Firstly there's the mount bug .. if you have two servers and two clients, 
> > > and one AFR, there's a temptation to mount each client against a 
> > > different server. Which initially works fine .. right up until one of the 
> > > glusterfsd's ends .. when it still works fine. However, when you restart 
> > > the failed glusterfsd, one client will erroneously connect to it (or this 
> > > is my interpretation of the net effect), regardless of the fact that 
> > > self-heal has not taken place .. and because it's out of sync, doing a 
> > > "head -c1" on a file you know has changed gets you nowhere. So 
> > > essentially you need to remount clients against non-crashed servers 
> > > before starting a crashed server .. which is not nice. (this is a filed 
> > > bug)
> > > 
> > > Then we have us poor XEN users who store 100Gb's worth of XEN images on a 
> > > gluster mount .. which means we can live migrate XEN instances between 
> > > servers .. which is fantastic. However, after a server config change or a 
> > > server crash, it means we need to copy 100Gb between the servers .. which 
> > > wouldn't be so bad if we didn't have to stop and start each XEN instance 
> > > in order for self heal to register the file as changed .. and while 
> > > self-heal is re-copying the images, they can't be used, so you're looking 
> > > at 3-4 mins of downtime per instance.
> > > 
> > > Apart from that (!) I think gluster is a revolutionary filesystem and 
> > > will go a long way .. especially if the bug list shrinks .. ;-)
> > > 
> > > Keep up the good work :)
> > > 
> > > [incidentally, I now have 3 separate XEN/gluster server stacks, all 
> > > running live-migrate - it works!]
> > > 
> > > Regards,
> > > Gareth.
> > >
> > 
> 

-- 
Don't be shive by the tone of my voice. Just got my new weapon, weapon of 
choice...
->>--------------------------------------------------

 Angel J. Alvarez Miguel, Sección de Sistemas 
 Area de Explotación y Seguridad Informática
 Servicios Informaticos, Universidad de Alcalá (UAH)
 Alcalá de Henares 28871, Madrid  ** ESPAÑA **
 Tfno: +34 91 885 46 32 Fax: 91 885 51 12

------------------------------------[www.uah.es]-<<--
"No va mas señores..."



