Dynamic Mirroring
=================

There are several approaches:

**generic proxy**
-----------------

Use an existing generic http/ftp caching proxy.

- not tuned for mirrors (i.e. large files, infrequent updates)
- cached files are hard to access directly (must go through the proxy)
- doesn't support resume and/or rsync for failed/partial fetches
- supports http access only
- exists already

**redirects + cgi/php**
-----------------------

Use a proxy/httpd redirect to a cgi/php script that maintains the mirror.

- uses existing http/proxy daemons
- caches/builds the mirror on a normal filesystem
- can fetch using http/ftp/rsync and resume failed/partial fetches
- supports http access only
- doesn't exist, but should be simple to implement
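A minimal sketch of what such a script might do, assuming a hypothetical cache root and upstream mirror (both names below are made up): on a cache miss, fetch the file from upstream into a normal filesystem tree, then hand back the local path for the httpd to serve.

```python
import os
import urllib.request

CACHE_ROOT = "/var/cache/mirror"           # hypothetical cache location
UPSTREAM = "http://ftp.example.org/pub"    # hypothetical upstream mirror

def cache_path(request_path):
    """Map a requested URL path onto the local cache tree.

    The leading-slash normpath keeps '..' components from escaping
    CACHE_ROOT.
    """
    clean = os.path.normpath("/" + request_path).lstrip("/")
    return os.path.join(CACHE_ROOT, clean)

def serve(request_path):
    """Return the local path for a request, fetching it on a cache miss."""
    local = cache_path(request_path)
    if not os.path.exists(local):
        os.makedirs(os.path.dirname(local), exist_ok=True)
        urllib.request.urlretrieve(
            UPSTREAM + "/" + request_path.lstrip("/"), local)
    return local
```

A real version would also need resume of partial fetches (e.g. by handing the miss to wget or rsync instead of urlretrieve) and some expiry policy for stale files.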

**mirroring daemon**
--------------------

Use a custom ftp/http daemon that builds the mirror on demand.

- caches/builds the mirror on a normal filesystem
- can fetch using http/ftp/rsync and resume failed/partial fetches
- could support http/ftp/rsync(?) access
- doesn't exist (apt-proxy is a partial solution, for debian mirrors only)


**mirror filesystem**
---------------------

Use an os-level mirroring filesystem.

- provides transparent os-level filesystem access
- platform specific
- can fetch using http/ftp/rsync and resume failed/partial fetches
- supports http/ftp/rsync/* access
- doesn't exist yet (though there is an ftpfs now)

**application layer filesystem**
--------------------------------

Use a generic application-layer filesystem (as in gmc/gnome etc.).

- not a solution in itself, but useful for building one
- can fetch using http/ftp/rsync and resume failed/partial fetches
- existing application filesystems are targeted at gui/desktop browsing, not daemons

Problems with mirroring
=======================

Proper mirroring requires fetching mtimes, symbolic links, inodes (needed
only for mirroring hardlinks), and directories. It is also handy to be able
to obtain sizes. It is possible to simply fetch and cache directory
listings, but unfortunately these are formatted differently by different
protocols, so they will need to be translated into a common form.

ftp:

Much info can only be retrieved through directory listings, but fortunately
these are systematically structured for easy parsing and interpretation.
Some info can be obtained through the extension commands SIZE and MDTM (for
mtimes), but these are not always supported. The sad news is that talking
through an http proxy means you are actually using http, and hence get the
crippled http listings (see below).
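For illustration, the standard-library ftplib exposes both routes: its size() method wraps SIZE, and MDTM replies can be parsed by hand. The host and path below are placeholders, and some servers append fractional seconds to the MDTM timestamp, which this sketch just drops.

```python
from datetime import datetime, timezone
from ftplib import FTP  # only used in the commented live example below

def parse_mdtm(reply):
    """Parse a '213 YYYYMMDDhhmmss' reply to the FTP MDTM command."""
    code, _, stamp = reply.partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: %r" % reply)
    stamp = stamp.strip().split(".")[0]  # drop optional fractional seconds
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc)

# Against a live server (placeholder host/path, not run here):
#   ftp = FTP("ftp.example.org")
#   ftp.login()
#   size = ftp.size("pub/README")                      # SIZE, where supported
#   mtime = parse_mdtm(ftp.sendcmd("MDTM pub/README"))
```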

http:

Some info can be extracted from directory listings, but these are painfully
html-formatted, and different httpds/proxies format them differently. http
traditionally supports fetching metadata through HEAD requests, but servers
are not required to send all the desired attributes. I'm not sure all info
can always be fetched, making true mirroring not always possible. Symbolic
links are particularly tricky...
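A sketch of pulling out what metadata a HEAD response does give you, using only the standard library. The connection part uses a placeholder host; the header helper is the interesting bit, and it has to tolerate either header being absent, which is exactly the problem described above.

```python
import http.client
from email.utils import parsedate_to_datetime

def metadata_from_headers(headers):
    """Extract (size, mtime) from HTTP response headers, where present.

    Either value may be None: servers are not obliged to send
    Content-Length or Last-Modified.
    """
    size = headers.get("Content-Length")
    mtime = headers.get("Last-Modified")
    return (int(size) if size is not None else None,
            parsedate_to_datetime(mtime) if mtime else None)

# Against a live server (placeholder host, not run here):
#   conn = http.client.HTTPConnection("www.example.org")
#   conn.request("HEAD", "/pub/README")
#   print(metadata_from_headers(dict(conn.getresponse().getheaders())))
```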

rsync:

rsync is built for mirroring, but that doesn't necessarily mean dynamic
mirroring. rsync will fetch files and build directory structures, with full
support for symbolic and hard links. However, I'm not sure it can be used
to fetch directory listings or file attributes without fetching the whole
file; if not, it cannot be used to fetch directory listings only.

UPDATE: looks like rsync can be used for listing remote files... checking it
out soon.
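Assuming the listing works as it appears to (pointing rsync at a module path with no destination, e.g. ``rsync rsync://host/module/path/``, prints one entry per line: permissions, size, date, time, name), a parser might look like the sketch below. The line format is inferred from observed output, not from any spec, so treat it as an assumption.

```python
def parse_rsync_entry(line):
    """Parse one line of `rsync rsync://host/module/path/` listing output.

    Assumed line format: '-rw-r--r--  1,234 2024/01/01 12:00:00 name'.
    Symlink entries end in ' -> target', kept here as part of the name.
    """
    perms, size, date, time, name = line.split(None, 4)
    return {
        "name": name,
        "is_dir": perms.startswith("d"),
        "is_link": perms.startswith("l"),
        "size": int(size.replace(",", "")),  # newer rsyncs comma-group sizes
        "mtime": date + " " + time,
    }
```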

other protocols:

Not widely used.


PEER-TO-PEER
============

There are some interesting developments in using peer-to-peer tricks
to reduce load on servers. In particular, bittorrent is very
interesting.

Bittorrent's limitation is that it is designed to distribute fetches of a
single large file; it does not work well for lots of little files.

For a mirror peer to peer arrangement, it would be nice if a network
of mirrors could redirect clients to their nearest available server.

For this perhaps a master site could use ICP queries to peers, and
return redirects to clients.

For propagating changes to mirrors, some sort of "distributed" update
would be good, but requires more thought.