Dynamic Mirroring
=================

There are several possible approaches:

***generic proxy***
-------------------

Use an existing generic http/ftp caching proxy.

- not tuned for mirrors (i.e. large files, low update rates)
- cached files are hard to access directly (must go through the proxy)
- doesn't support resume and/or rsync for failed/partial fetches
- supports http access only
- exists already

***redirects + cgi/php***
-------------------------

Use a proxy/httpd redirect to a cgi/php script that maintains the mirror.

- uses existing http/proxy daemons
- caches/builds the mirror on a normal filesystem
- can fetch using http/ftp/rsync, with resume of partial fetches
- supports http access only
- doesn't exist, but simple? to implement

***mirroring daemon***
----------------------

Use a custom ftp/http daemon that builds the mirror on demand.

- caches/builds the mirror on a normal filesystem
- can fetch using http/ftp/rsync, with resume of partial fetches
- can support http/ftp/rsync? access
- doesn't exist (apt-proxy is a partial solution, for debian mirrors only)

***mirror filesystem***
-----------------------

Use an os-level mirroring filesystem.

- provides transparent os-level filesystem access
- platform specific
- can fetch using http/ftp/rsync, with resume of partial fetches
- supports http/ftp/rsync/* access
- doesn't exist yet (though there is an ftpfs now)

***application layer filesystem***
----------------------------------

Use a generic application layer filesystem (as in gmc/gnome etc).

- not a solution in itself, but useful for building one
- can fetch using http/ftp/rsync, with resume of partial fetches
- existing application filesystems are targeted at gui/desktop browsing, not daemons

Problems with mirroring
=======================

Proper mirroring requires fetching mtimes, symbolic links, inodes (needed only for mirroring hardlinks), and directories. It is also handy to be able to obtain sizes.
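To make the required attributes concrete: whatever approach is used, the mirror needs a per-entry record of the metadata listed above, recovered from some remote listing. As a rough sketch (hypothetical code, not from any existing tool — `MirrorEntry` and the regex are mine), here is how a Unix-style `ls -l` listing line, the kind most ftp servers emit, might be parsed into such a record:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class MirrorEntry:
    """Attributes a dynamic mirror needs for each remote entry."""
    name: str
    size: int
    mtime: str          # raw date fields; real code would parse to an epoch
    is_dir: bool
    link_target: Optional[str] = None  # set for symbolic links

# Matches Unix-style "ls -l" lines, e.g.
#   drwxr-xr-x   2 ftp ftp     4096 Mar 01 12:00 pub
#   lrwxrwxrwx   1 ftp ftp       11 Mar 01 12:00 stable -> dists/woody
LIST_RE = re.compile(
    r'^([-dl])[rwxsStT-]{9}\s+\d+\s+\S+\s+\S+\s+'
    r'(\d+)\s+(\w{3}\s+\d+\s+[\d:]+)\s+(.+)$'
)

def parse_list_line(line: str) -> Optional[MirrorEntry]:
    m = LIST_RE.match(line)
    if not m:
        return None   # non-Unix listing format; would need another parser
    kind, size, mtime, name = m.groups()
    target = None
    if kind == 'l' and ' -> ' in name:
        name, target = name.split(' -> ', 1)
    return MirrorEntry(name, int(size), mtime, kind == 'd', target)
```

Note the symlink case: the target is only recoverable because Unix listings happen to include it; other listing formats would lose it.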
It is possible to simply fetch and cache directory listings, but unfortunately these are formatted differently for different protocols, so they need to be translated.

ftp: much info can only be retrieved from directory listings, but fortunately these are systematically structured for easy parsing and interpretation. Some info can be obtained by special ftp commands (SIZE, MDTM?), but these are not always supported. The sad news is that talking through an http proxy means you are actually using http, and hence get the crippled http listings (see below).

http: some info can be extracted from directory listings, but these are painfully html formatted, and different httpds/proxies format them differently. http traditionally supports fetching metadata through "HEAD" requests, but servers are not required to send all the desired attributes. I'm not sure all info can always be fetched, making true mirroring not always possible. Symbolic links are particularly tricky...

rsync: rsync is built for mirroring, but that doesn't necessarily mean dynamic mirroring. rsync will fetch files and build directory structures, with full support for symbolic and hard links. However, I'm not sure it can be used to fetch directory listings or file attributes without fetching the whole file; that would mean it cannot be used to fetch directory listings only. UPDATE: it looks like rsync can be used for listing remote files... checking it out soon.

Other?? not widely used.

PEER-TO-PEER
============

There are some interesting developments in using peer-to-peer tricks to reduce load on servers. In particular, bittorrent is very interesting. Bittorrent has the limitation that it is designed to distribute fetches of a single large file; it does not work well for lots of little files.

For a peer-to-peer mirror arrangement, it would be nice if a network of mirrors could redirect clients to their nearest available server. Perhaps a master site could issue ICP queries to peers, and return redirects to clients.
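The query-then-redirect idea above could look something like the following sketch. It is hypothetical throughout: the `peers` dict (rtt plus a set of held paths) stands in for what real ICP HIT/MISS replies and their round-trip timing would tell the master site, and the helper names are my own.

```python
from typing import Dict, Optional

def pick_mirror(path: str, peers: Dict[str, dict]) -> Optional[str]:
    """Choose the best peer for `path` and return a full URL to it.

    `peers` maps a peer base URL to {'rtt_ms': float, 'has': set of
    paths} -- a stand-in for ICP query results.
    """
    candidates = [
        (info['rtt_ms'], base)
        for base, info in peers.items()
        if path in info['has']          # an ICP HIT
    ]
    if not candidates:
        return None                     # all MISS: serve/fetch locally
    _, best = min(candidates)           # lowest round-trip time wins
    return best.rstrip('/') + path

def redirect_response(url: str) -> str:
    """Minimal HTTP 302 the master site could emit to the client."""
    return f"HTTP/1.0 302 Found\r\nLocation: {url}\r\n\r\n"
```

Picking by ICP round-trip time is a crude "nearest" metric, but it is the same trick squid uses for cache-peer selection, and it needs no topology database.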
For propagating changes to mirrors, some sort of "distributed" update would be good, but this requires more thought.
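Whatever form the distributed update takes, one cheap basis for deciding what to propagate is the size+mtime "quick check" rsync itself uses by default: only entries whose metadata changed need re-fetching. A minimal sketch of that check (hypothetical helper, not part of any existing tool):

```python
from typing import Dict, List, Tuple

# (size, mtime-as-epoch) -- the same pair rsync's default quick check compares
Meta = Tuple[int, int]

def needs_update(local: Dict[str, Meta], remote: Dict[str, Meta]) -> List[str]:
    """Paths whose remote (size, mtime) differs from the local copy,
    or which don't exist locally yet.  Deletions are ignored here;
    a real mirror would handle those separately."""
    return sorted(
        path for path, meta in remote.items()
        if local.get(path) != meta
    )
```

This catches most changes without reading file contents; the known weakness (also rsync's) is a file modified without changing size or mtime, which only a checksum pass would detect.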