TITLE: Proposal for a local filesystem cache in the GlusterFS architecture
AUTHOR: Angel Alvarez

Requirements for a local filesystem cache with minimal code changes and
improved modularity.

The purpose of this memo is to outline a possible roadmap to a local
filesystem-backed cache built from current gluster components, with minimal
changes and a model consistent with current development.

Details on proposed modifications

1- Max available space control feature for the filter or posix xlator

   Definition:
     This feature limits the available storage space by intercepting size
     calculations (statvfs?).
   Implementation:
     If the underlying module reports Size > Max, then set Size = Max.
     Option A: Implemented as an additional option to the posix storage
               translator
     Option B: Implemented as an additional option to the filter xlator
   Benefits:
     Improved QA for the rest of the modules, as we can now test the
     behavior of upper modules under size constraints.
     Control of the space exported on every node.

2- Auto-pruning xlator

   Definition:
     The auto-pruning xlator always reports the maximum space of the
     underlying modules as available. For example, an auto-pruning xlator
     atop a 50GB posix brick always shows 50GB of storage space available.
     When upper modules try to store a file on this xlator and space is
     scarce, it tries to prune files based on a policy. A policy can be
     something like "Delete the biggest file on the underlying storage" and
     is implemented as a scheduler that controls the deletion of files.
     You start storing files and this xlator auto-prunes them to keep
     storage available.
     On reads, return ERROR if the file does not exist; otherwise act
     normally. Upper modules should not complain about nonexistent-file
     errors on this volume.
   Benefits:
     REAL UNLIMITED STORAGE!! (as seen by upper modules)
     This xlator never gives up on new file creations, as it prunes files
     as needed to accommodate new ones.
   Implementation:
     Always return the maximum file space as available to upper modules'
     requests.
     On reads, proceed if possible; otherwise return the error properly.
     On creation or write, prune files as needed to recover space when the
     underlying storage fills up.
     Deletion candidates are chosen by means of pluggable schedulers
     (prune-big etc.).
     Always follow the scheduler's candidates, provided those files are not
     open.
     Other operations behave as usual.

3- New scheduler definitions

   - Scheduler prune-big
     Definition:
       This scheduler maintains a list of the n biggest files. It updates
       its internal info upon close file ops.
       The auto-prune module calls this scheduler to choose deletion
       victims in the filesystem cache.

   - Scheduler prune-lru
     Definition:
       This scheduler maintains a list of the least recently used files.
       The auto-prune module calls this scheduler to choose deletion
       victims in the filesystem cache.

4- AFR modifications

   Definition:
     The current AFR provides the mechanism to ensure proper replication of
     subvolumes. AFR tries to favor the local volume for reading.
     The new AFR knows that the local volume is an (auto-pruning) cache.
     Do not complain about nonexistent files on the local cache.
     Validate local cache files against timestamps from the remote volumes.

GLUSTER SETUP TO USE ONE LOCAL VOLUME AS FILESYSTEM BACKED CACHE

Server setup
##########################################################################

volume server-posix
  type storage/posix                     # POSIX FS translator
  option directory /home/export          # Export this directory
end-volume

volume server
  type protocol/server
  option transport-type tcp/server       # For TCP/IP transport
  option bind-address                    # Default is to listen on all interfaces
  subvolumes server-posix
  option auth.ip.server-posix.allow *    # Allow access to "server-posix" volume
end-volume

Client setup
###########################################################################

# REMOTE VOLUME ON SERVER
volume client
  type protocol/client
  option transport-type tcp/client       # for TCP/IP transport
  option remote-host                     # IP address of the remote brick
                                         # from server for each request
  option remote-subvolume server-posix   # name of the remote volume
end-volume

# LOCAL VOLUME TO USE AS TEMPORARY CACHE
volume temporal-local-posix
  type storage/posix                     # POSIX FS translator
  option directory /tmp/glusterfs-cache  # Export this directory
end-volume

# LIMIT CACHE SIZE TO 10GB MAX
volume limited-size-temporal
  type features/filter
  subvolumes temporal-local-posix
  option maxsize 10GB
  option read-only no
end-volume

# AUTOPRUNE MODULE FOR CACHE MAINTENANCE
# prunes n biggest files to make room when needed
volume local-cache
  type performance/autoprune
  subvolumes limited-size-temporal
  # remember the 100 biggest files from prior close operations and
  # select victims from these
  option scheduler prune-big:100
end-volume

# AFR SPECIAL CASE REMOTE+LOCAL USING LOCAL FOR READS WHEN POSSIBLE
# replicates remote volume on local cache, don't try to self-heal missing
# cache files from cache volume
# read first from local if possible, checking mod time on remote volumes
# don't complain about nonexistent files on cache
volume local-cache-example
  type cluster/afr
  option self-heal off                   # Don't try to self-heal cache volume
  option read-subvolume local-cache      # Always try to read from this volume, validate from other volumes
  option volume-is-cache local-cache     # don't complain about nonexistent files
  subvolumes client local-cache
end-volume

END OF MEMO
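
ADDENDUM: ILLUSTRATIVE SKETCH OF THE AUTO-PRUNE BEHAVIOR

The size clamp of section 1, the auto-pruning xlator of section 2 and the
prune-big scheduler of section 3 can be modelled in a few lines of Python.
This is only a toy sketch under stated assumptions, not GlusterFS code: the
names clamp_reported_size, PruneBigScheduler and AutoPruneCache are
hypothetical, the underlying storage is an in-memory dict of path -> size,
and a whole file is stored in a single call.

```python
import errno


def clamp_reported_size(reported: int, max_size: int) -> int:
    """Section 1: if the underlying module reports Size > Max, report Max."""
    return min(reported, max_size)


class PruneBigScheduler:
    """Tracks up to n of the biggest files, updated on close (hypothetical)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.sizes = {}  # path -> size recorded at close time

    def on_close(self, path: str, size: int) -> None:
        self.sizes[path] = size
        if len(self.sizes) > self.n:
            # Drop the smallest tracked file so at most n entries remain.
            del self.sizes[min(self.sizes, key=self.sizes.get)]

    def forget(self, path: str) -> None:
        self.sizes.pop(path, None)

    def candidates(self) -> list:
        """Deletion candidates, biggest first."""
        return sorted(self.sizes, key=self.sizes.get, reverse=True)


class AutoPruneCache:
    """Toy model of the auto-pruning xlator (assumption: in-memory storage)."""

    def __init__(self, capacity: int, scheduler: PruneBigScheduler) -> None:
        self.capacity = capacity
        self.scheduler = scheduler
        self.files = {}          # stands in for the underlying storage
        self.open_files = set()  # files that must not be pruned

    def used(self) -> int:
        return sum(self.files.values())

    def reported_free(self) -> int:
        # Always show the maximum space as available to upper modules.
        return self.capacity

    def store(self, path: str, size: int) -> None:
        # Rewriting an existing file first releases its old space.
        self.files.pop(path, None)
        self.scheduler.forget(path)
        for victim in self.scheduler.candidates():
            if self.used() + size <= self.capacity:
                break
            if victim in self.open_files:
                continue  # never prune an open file
            del self.files[victim]
            self.scheduler.forget(victim)
        if self.used() + size > self.capacity:
            # Pruning could not free enough space (file too big, or the
            # remaining files are open or not tracked by the scheduler).
            raise OSError(errno.ENOSPC, "cache full", path)
        self.files[path] = size
        self.scheduler.on_close(path, size)

    def read(self, path: str) -> int:
        if path not in self.files:
            # A pruned file is simply reported as missing.
            raise FileNotFoundError(path)
        return self.files[path]
```

With a capacity of 100 units, storing files of 60, 30 and then 50 units
prunes the 60-unit file to make room, while the reported free space stays
at 100 throughout. A prune-lru scheduler would only need a different
candidates() ordering.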