+The librsync Python extension for pysync
=========================================
++Document Overview
-----------------
This document is a development record for the librsync Python extension for
pysync. As this project was being sponsored by Accellion, I felt obliged to
produce a decent record of my efforts.
++Project Overview
----------------
This project was to include a Python interface to librsync in pysync. It is
sponsored by Accellion.
The following are the initial milestones and estimates agreed upon.
A. Establish feasibility of librsync on windows. Produce a working MSVC++
compiled exe that demonstrates librsync working on Windows. Estimated 1 day.
B. Use swig to create a librsync Python extension on Linux. Produce a
librsync Python extension tested and documented on Linux. Estimated 3 days.
C. Port this Python extension to Windows. Produce a Windows Python extension
compiled with Windows MSVC++, plus any windows specific documentation.
Estimated 1 day.
++Development Journal
===================
+++2002-04-15 Monday, Phase 1 Started
----------------------------------
Checked mail for initial enquiries posted to librsync list on Friday. One
post from someone else who has compiled librsync in MSVC++. They hacked it
manually using a MSVC project and seemed to have it working. They've offered
to send me their stuff. Postponed response until I have working environment
ready.
Prepared working environment of Debian woody running VMware Workstation 3.1
(update installed on Sunday) with win98 guest, running MS Visual C++ and
Cygwin (update downloaded on Sunday).
Installed MSVC++ in VMware. Played with it. It's quite different to the
Borland C++ I used ages ago... might take a while to get used to. At this
stage I'm planning on just using MSC's nmake so the whole GUI stuff might be
unnecessary.
Installed Cygwin in VMware. Hit major problems running Cygwin in VMware. It
freezes win98 badly. Posted query to VMware and Cygwin news groups. Wasted
much time trying to characterise the problem. I could try rolling back to
VMWare v2.0.4 and/or using Win2000, but I don't have the time to experiment.
Prepared librsync project. Extracted librsync-0.9.5 and checked into PRCS as
"orig.1".
Compiled librsync under linux. Easy, "configure; make" with no incidents.
Adjusted PRCS project file to exclude compiler output and checked in.
Looking at autoconf and automake input and output. Searched web for how
other projects support Unix+MSVC. It would be nice to do this properly so it
could be largely automated and incorporated upstream. It seems there is no
"right way", with some projects using MSVC "projects" and maintaining
config.h.msc, some manually maintaining Makefile.msc+config.h.msc, and some
autogenerating from Makefile.msc.in & config.h.msc.in.
Received request from VMware staff to submit incident report. You must have
paid support to file incident reports to VMware, and I had to find where you
do it on their website, after I had to "register online" the licence they
sent me. I guess I'm using the 30days support I got when I paid for it. End
result so far is VMware+Win98+Cygwin = no-go, which means I will need to run
cygwin in Win98 native, which means re-booting each time I switch between
Windows and Linux, which means I need MSVC on Win98 native.
Installed MSVC++ in Win98 native.
Installed Cygwin in Win98 native. Found a few quirks when installing Cygwin
in Win98 on my samba network. Fixed these and updated the cygwin-utils
project's README.txt.
Compiled librsync in Cygwin. Hit a problem with config reporting gcc "not
working". Should be simple to fix, but it'll have to be done tomorrow. I'm
already behind, but because of development environment problems, not the
project itself.
+++2002-04-16 Tuesday, Phase 1 continued
-------------------------------------
Prepared Project Journal and configured web access. While I was at it I
configured similar web access for all my projects. Apache needed to be
tweaked to ensure README's were not stripped from directory listings.
Checked mail for more feedback and/or messages from Accellion. Composed and
sent message to librsync-devel list about putting MSC support into librsync.
Checked vmware and cygwin newsgroups for feedback, and posted more details
to vmware group.
Checked out autoconf handling of win32 platforms in more detail for other
projects, including Python. Downloaded latest Python 2.1.4 windows installer
and source.
Looked ahead at the latest Python distutils documentation for any win32
platform issues. Wow, distutils has come a long way. It can build tar.gz,
rpm or even windows installers from a setup.py, even compiling c source for
you. Looks like I won't need nmake makefiles for the final thing.
Responded to some more queries on the vmware newsgroup, giving more details
of problem.
Rebooted into Win98. Installed latest Python2.1. Re-attempted cygwin compile
after reading cygwin documentation, failed again. After much stuffing around
it seems my cygwin is not happy running off a samba server.
Reinstalled cygwin locally. Much faster and works fine. My first win32
executable of rdiff that works, but not yet compiled with MSVC++. I needed
to create a symlink from inttypes.h to sys/stdtypes.h to get it working, but
quickly found info on this via google.
There is a next intermediate step of getting cygwin to compile a minGW exe
that doesn't need the cygwin1.dll, but I'll skip that as time is getting on.
Sent status report to Nikhil at Accellion, reporting a little slippage, but
nothing serious.
Research and an email on the librsync-devel list suggest you can get
configure to work for MSVC++. Attempted to get ./configure to configure for
MSVC++. Failed miserably. Looks like I need to read up more on how configure
works.
Looked at latest Python build methods. They use MSVC++ workspaces and
projects for the windows build, and have a config.h that I could probably
just grab if I can't get configure to generate one for me.
Got email from Nikhil at Accellion asking for a phone contact number. Sent
him a reply... still hoping to get an MSVC++ compile done before end of
today.
Damn... still no MSVC++ exe, and another day gone. I'm going to go overtime
by a day or so it looks like. Tomorrow I'll just use an MSVC++ workspace with
projects and the python config.h.
+++2002-04-17 Wednesday, Phase 1 continued
-------------------------------------
Checked mail and News. Nothing special. Updated diary.
Cleaned up separate build directories, librsync-linux, librsync-cygwin,
librsync-msvcrt.
Found more info on getting config to work with MSVC++, I'll have one more
crack at it, then send an enquiry on the librsync list.
I've been doing research and posting a monolog of my problems and
resolutions to the rproxy-devel list for a public record and in the hope of
getting feedback. I will put an archive of this thread into this swf, rather
than repeat it all here (see librsync.mail).
The 0.9.5 source needs some fixes. It has #includes for headers not
available in cygwin and MSVC that are correctly detected as missing by
configure. The autoconf used to generate configure.in and configure needed
to be updated to version 2.53. There are some unix specific routines (getuid,
geteuid) in popt that can be replaced with simple #defines protected by
#ifdef _WIN32.
Reached compile working in both cygwin+MSVC and MSVC project, just failing
the link.
+++2002-04-18 Thursday, Phase 1 completed, Phase 2 started
----------------------------------------------------------
Had final dentist's appointment this morning. New tooth is fine.
Compilation using an MSVC++ project works, so I'm leaving the cygwin+MSVC
for now. I have a working rdiff.exe compiled with MSVC++.
Did some quick testing with compiled rdiff vs Debian distro rdiff...
identical results. Compared results to pysync, and noticed pysync produces
much smaller diffs. I used the same tests used to test and verify pysync.
Posted email about this to librsync-devel list and got response... someone
is looking at it.
The python wrapper for librsync will provide fast md4sum and rolling
checksum routines, so even if you don't use all of librsync, you can use
these bits to speed up the existing pysync code.
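To illustrate what a rolling checksum routine does, here is a minimal
pure-Python sketch of an rsync-style weak checksum. It is an illustration
only: the class name RollSum, its method names, and the plain 16-bit sums are
assumptions made for this example, not the librsync or pysync implementation.

```python
class RollSum:
    """Weak rolling checksum over a sliding window of bytes (illustrative sketch)."""

    def __init__(self, data=b""):
        self.count = 0  # number of bytes currently in the window
        self.s1 = 0     # running sum of the bytes, modulo 2**16
        self.s2 = 0     # running sum of the successive s1 values, modulo 2**16
        self.update(data)

    def update(self, data):
        """Append bytes to the end of the window."""
        for byte in data:
            self.s1 = (self.s1 + byte) & 0xFFFF
            self.s2 = (self.s2 + self.s1) & 0xFFFF
            self.count += 1

    def rotate(self, out_byte, in_byte):
        """Slide the window one byte: drop out_byte from the front, add in_byte."""
        self.s1 = (self.s1 + in_byte - out_byte) & 0xFFFF
        self.s2 = (self.s2 + self.s1 - self.count * out_byte) & 0xFFFF

    def digest(self):
        """Combine both sums into a single 32-bit value."""
        return (self.s2 << 16) | self.s1
```

The point of rotate() is that moving the window by one byte costs a few
additions, instead of re-summing the whole block. Doing that cheap step in C
rather than Python is where an extension module would be expected to help.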
librsync compiling with MSVC6 Patch submitted on SF. Found but did not
bother fixing the issues with cygwin+MSVC6, just documented them with the
patch submission.
Started on swigg'ed wrapper for it all.
+++2002-04-19 Friday, Phase 2
-----------------------------
Checked mail, checked news, updated diary. Posted a little more feedback on
vmware issues. They have agreed that there is a problem and are working on it.
Sent status report to Accellion. The first milestone has been achieved. I am
three days behind schedule. The whole MSVC thing was much more troublesome
than I thought. Part of the problem was I was not deeply familiar with MSVC
or autoconf, so I was never sure if the problems were in librsync, MSVC, or
autoconf. I am now much more familiar with both and have found and fixed the
problems in librsync related to MSVC.
Working on swigged wrapper.
VMware have responded to my bug report with a workaround solution. I posted
this to newsgroups for closure on my earlier queries. This will make
development much easier.
Applied the workaround to my vmware installation. I can now compile/debug
etc in windows and linux without needing a reboot.
Received some positive feedback for my librsync MSVC patch on sourceforge.
Received interesting post from Van Gale about something called pyrex that
could be useful for pysync. A brief look has me very excited, as this is a
very easy way to convert python into compiled python extensions. This is the
first thing I'll look at after the librsync wrapper.
+++2002-04-22 Monday, Phase 2 cont
---------------------------------
Checked mail and news. Updated diary. Response from Nikhil at Accellion that
they were not majorly concerned about delays. There was a post on the rsync
list about updating popt to v 1.5.1. librsync is currently using v1.3 so
perhaps this will need upgrading soon.
Added a comment to pysync's freshmeat page advertising upcoming librsync
wrapper and sponsorship by Accellion.
The swig wrapper is going fine.
Got a bit off track and had a look at using pyrex to optimize adler32.py
with disappointing results. It looks like the generated C code is slower than
the pure Python. Looking at the generated C code it looks like the problem is
a combination of sub-optimal generated code, and the overhead of all the
Python boilerplate for safely getting from Python objects to C and back.
Another thing that could be contributing is pyrex doesn't support classes
yet, so I had to use a Python class wrapper around the pyrex routines,
effectively adding another function call layer. For something like the
"rotate" routine which actually doesn't do much, all the overheads outweigh
the benefits.
In general Pyrex looks very promising, but has a very "pre-alpha" feel to
it. You can almost declare C variable constants, but they are not
initialised. Some things appear to work, until you run them and look at the
code. When/if pyrex supports classes and improves a bit, it could be
brilliant. In the mean time it still makes a convenient starting point for
converting Python code to C.
+++2002-04-23 Tuesday, Phase 2 cont
-----------------------------------
Checked mail, news. Updated diary. Nikhil sent a message saying he's very
happy with the online SWF tracking of development.
Hoping to get the swig wrapper finished on Linux today.
The swig wrapper is at the point where I have the librsync.i generating the
wrapper module, but I need to trim it down a bit and make it more
Python-friendly. I'll probably borrow a bit of the Python zlibmodule.c code
and embed it in librsync.i to make the Python interface cleaner.
+++2002-04-24 Wednesday
-----------------------
My son Jethro was off school today so I was looking after him. Got very
little done work-wise.
+++2002-04-25 Thursday
-----------------------
ANZAC day public holiday in Australia. Jethro not at school, and a children's
birthday to go to in the park. Ended up being a CFD (Computer Free Day).
+++2002-04-26 Friday
--------------------
Jethro at school, a chance to work at last. Only thing in my way is my BAS
needs to be in today (Australian tax thing), which I will do ASAP after I
check my mail.
Checked mail. Heaps as usual after two days away from computer. A note of
concern from Nikhil about my silence which I responded to. "Shirish H.
Phatak" sent in a patch for the librsync delta size problem. Martin Pool
said he was busy supporting rsync itself and wanted to hand over managing
librsync to someone else like me or Shirish.
Shirish responded he could do it with my support, and I indicated likewise.
It is probable that librsync will become its own project on either SF or
samba.org with me and Shirish as project admins.
Updated diary.
I will apply the delta fix patch, check it works OK with MSVC, and include
it. The swig wrapper tuning will continue, and hopefully I can try it under
MSVC while I'm testing the delta fix patch.
+++2002-04-27 Saturday
----------------------
Neglected diary. Updated this later.
Working on cleaning up the swig wrapper, I noticed the librsync
implementation of md4 was quite different to the RSA md4 found in libmd
<"http://www.penguin.cz/~mhi/libmd/">. The RSA version is very similar to
the RSA version of md5 used by Python for its md5 module. So similar, I
easily created a native md4 module without using swig by modifying Python's
md5module.c.
I created a distutils setup.py for my md4module and built binary
distributions for it on linux and windows. The windows build required some
changes to "md4.h" for MSVC compilation. The distutils windows exe installer
rocks!
I posted a query to the rsync and librsync lists about the origins and
status of the librsync md4 implementation.
+++2002-04-28 Sunday
--------------------
Neglected diary. Updated this later.
Got a response from Martin Pool on the rsync list saying there was nothing
special about the librsync md4 code, except Tridge had found a bug in it
that results in non-compliant sums in some rare cases. This doesn't affect
rsync or librsync operation, but should probably be fixed.
Hacked together swig wrappers for libmd's md4c.c and librsync's mdfour.c
with a quick test program. Benchmarked them both against the native md4
module using libmd's md4. Results for a Cel-366 doing 10K sums of 4K blocks;
swig'd libmd 3.1secs, swig'd librsync 2.5secs, native libmd 1.6secs.
Conclusion: libmd is a nicer implementation to wrap because it's identical to
the RSA md5 used by Python, but librsync is faster. I didn't see any
differences in the sums produced, so I guess I didn't hit the bug Tridge
was talking about.
I'll use the native libmd version for pysync, but make the swig'ed librsync
version available from the librsync wrapper. At some stage in the future it
might be nice to change the librsync implementation to match the libmd
interface so that a native librsync md4 module could be built. Perhaps this
implementation could be incorporated upstream by libmd?
+++2002-04-29 Monday
--------------------
Checked mail. Nothing of significance. Final wrapup day.
Quick benchmark of md4 vs md5 using 256K sums of 1K blocks random data. I
used native md4 and md5 modules on a Cel-366. Results; md4 12.8secs, md5
15.6secs. I suspect that using the librsync md4 could bring this as low as
8secs, which would be nearly half the execution time.
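A benchmark of this kind can be sketched in a few lines of modern Python.
This is a hedged illustration, not the script used for the numbers above: the
function name time_digests is made up for this example, and it uses hashlib's
md5 and sha1 as stand-ins because md4 is only reachable through
hashlib.new("md4") on systems whose OpenSSL build still provides it.

```python
import hashlib
import os
import time


def time_digests(new_hash, block_count, block_size):
    """Return the seconds taken to digest block_count random blocks."""
    # Generate the random input first so it is not part of the timed region.
    blocks = [os.urandom(block_size) for _ in range(block_count)]
    start = time.perf_counter()
    for block in blocks:
        new_hash(block).digest()
    return time.perf_counter() - start


# Small demonstration run: 1000 sums of 1K blocks for two hash constructors.
md5_secs = time_digests(hashlib.md5, 1000, 1024)
sha1_secs = time_digests(hashlib.sha1, 1000, 1024)
print("md5 %.4fsecs, sha1 %.4fsecs" % (md5_secs, sha1_secs))
```

Note that the sketch holds every block in memory, so a full run of 256K sums
of 1K blocks would need roughly 256MB; generating blocks inside the loop
avoids that at the cost of timing the random data generation too.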
Changed pysync.py to use the md4 module. A quick bench gave a small speed
improvement, though the bench I used was just the pysync-test.py script
which has a lot of unrelated overheads.
Finalising swig wrapper and putting together a distutils setup.py.
Argh... not happy with the swig wrapper. I've omitted
dull bits implemented by standard python libs like base64 stuff. The general
stuff like result and error codes, converting them to strings, and logging
is all done. The md4 class is all done. The stats class is all done, and
nicely does readable output and logging. The signature class is done, but
the tie-in with the job class needs tidying.
The job class in conjunction with the buffer class was painful to use from
Python. The callbacks for patch were also painful. I'm re-implementing this
borrowing ideas from zlibmodule so that the buffer class is hidden from the
Python interface.
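For reference, this is the style of interface that Python's zlib module
exposes: the caller feeds in chunks and collects output, and never sees the
underlying C stream buffers. The sketch below uses only the real zlib API;
the helper name compress_stream is made up for this example, and how the
librsync wrapper's own classes ended up looking is not shown here.

```python
import zlib


def compress_stream(chunks):
    """Compress an iterable of byte chunks using zlib's streaming object."""
    comp = zlib.compressobj()
    out = []
    for chunk in chunks:
        # compress() may return an empty string while zlib buffers internally.
        out.append(comp.compress(chunk))
    # flush() returns whatever output is still held inside the object.
    out.append(comp.flush())
    return b"".join(out)


packed = compress_stream([b"hello ", b"world"])
print(zlib.decompress(packed))
```

A job object wrapped the same way would accept input in arbitrary pieces and
hand back output as it becomes available, with a final flush at end of input.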
librsync is structured in a way that could encourage small memory leaks...
I'm putting together an email for the librsync list identifying things I'd
like to fix.
The restructure is complete. It's almost working... I have written a test
program that exercises it and it's falling over on delta calculation after
flushing the read input... I suspect I've slightly misunderstood the "iter"
return codes... will fix tomorrow.
+++2002-04-30 Tuesday
--------------------
The never ending story continues...
librsync is playing up. I cannot figure out what is going wrong, but I am
wondering if I've tripped over some inherent bug in librsync that is not
related to the swig wrapper.
signature calculation works fine. signature loading works fine. delta
calculation _seems_ to work, but the delta I get is different to that
returned by rdiff. The delta stats look correct, but the delta is 21 bytes
smaller and starts to differ at the 1907th byte.
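Locating where two deltas diverge is easy to script. The helper below is a
minimal sketch with a made-up name, first_difference, and is not taken from
the test program mentioned above; it compares two byte strings and reports
the 0-based offset of the first mismatch.

```python
def first_difference(a, b):
    """Return the 0-based offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


print(first_difference(b"delta-one", b"delta-two"))
```

Reading both delta files in binary mode and passing their contents to this
function would report offset 1906 for a difference at the 1907th byte, if
that count is taken to start from one.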
Attempting to patch goes totally screwy immediately with the copy_cb callback
being called by librsync with invalid values, resulting in a segfault.
I have no idea why the delta is slightly different, but I don't think this
is related to the patch failure because it fails before it even processes
any of the delta.
More debugging is required, but I'm pretty frustrated right now. Looking
through the librsync code I keep seeing ways that it could have been done
better. In the process I have written my own librsync compatible rollsum
that I might swig-wrap and integrate into pysync for a change...
I haven't heard back from Nikhil for a while. My last post said I would be
finished before Monday... and now Tuesday's gone with the wind. He must be
thinking I'm a wanker, and I'm starting to feel like one. I'm at the point
where it's all done, just this damn elusive bug. Once I solve this it will
take me an hour to put out rpm's and windows installers.
I sent Nikhil a status report, suggesting I submit what I've got and we
negotiate payment. I could keep working on this forever but he probably
wants some results, and I need to draw a line.
+++2002-05-01 Wednesday
-----------------------
Checked mail. Nothing from Nikhil yet.
Preparing for release using distutils and updating documentation.
Moved files around to clean up directory structure. Fiddled with Makefile to
make things easier. Added MANIFEST.in for distutils to build source
distributions from. Added keywords to PRCS project file for distutils to
automatically get project details in setup.py.
Started a rollsummodule.c based on md4module.c that will lead to a Python
extension of rollsum.c.
Hit a small problem with librsync wrapper using Python config.h instead of
librsync config.h, which had slight differences related to LONG_LONG. Will
fix tomorrow.
Went out to see a heap of bands playing for the "Rockin in the free world"
charity benefit, which was to collect instruments for the kids of
East-Timor. Usual mix of good-n-bad, with one truly outstanding solo female
vocalist. Also worth mentioning were comedy-punk band "The Twits", and
rockabilly band "The Dice Age".
+++2002-05-02 Thursday
-----------------------
Nursed head. Checked mail.
The LONG_LONG issue seems to be MSVC doesn't like "long long", so librsync's
config.h has SIZEOF_LONG_LONG defined as '0'. Python's config.h has "LONG_LONG"
defined as "__int64" with SIZEOF_LONG_LONG defined as '8'. The 'rsync.h'
header checks SIZEOF_LONG_LONG to see if it can define a "long long" type.
Arg. distutils builds hit the same _nsprintf linking problem that cygwin+MSVC
builds hit. Looking into it. Fixed it, submitted new patch to Sourceforge
that incidentally fixes the cygwin make problems I had before.
A bit more work on the README. Then tweaked the Makefile. Then tweaked it
some more. Played with the distutils windows installer. Added more
documentation to the README. Cleaned up the test code. Fiddle fiddle fiddle.
Jethro has a major ear ache. Much family disruption and doctor visits etc. So
near to finishing, so many things in the way :-)
Posted email to librsync list about updated patch which includes rolling
checksum fix. Small discussion ensued about the future of librsync
development.
Froze pysync source at version 2.7. Built tar.bz2, rpm, zip and exe
distributions. Announced on Freshmeat. Sent final status report.
Prepared invoice, just waiting for confirmation from Nikhil about payment
details.
Added a comment to Freshmeat project page about latest release.
+++2002-05-03 Friday
--------------------
Relaxed. Checked mail. Nikhil responded that payment of the full amount
originally agreed was reasonable, given that the bug would probably be fixed
sometime in the near future anyway. Emailed him a copy of my invoice, and
will also Fax it to him... need to look into getting minkirri to send faxes
via the dialin modems.
There was also a little mail from Shirish about the future of librsync.
Posted off a response + summary of additional work required. Suggested we
get SF developer access to the rproxy project.
Added a Diary entry on SF. Added a comment pointing to the windows installer
and SWF on Freshmeat.
+++2002-05-06 Monday
--------------------
A bit more mail on lists about librsync development over the week end.
Nikhil indicated he was pleased with the work and payment should be wired
over tomorrow. Barring further work on librsync, this is probably the last
diary entry.
+++2002-06-24 Monday
--------------------
Started looking again at pysync. Since the last official release, I've
implemented a proper rollsum extension module and updated docs. I'm going to
release this now, even though I originally wanted to include inverse delta
calcs too.
Right now, my higher priority is to make librsync better. I'm going to
integrate rollsum into librsync right now. librsync has a new sourceforge
site that will hopefully accelerate development.
+++2003-02-24 Monday
--------------------
Got admin rights on the librsync project and set up lists, configured
tracker, assigned myself patches and bugs, and started getting a release
together.
+++2003-10-18 Friday
--------------------
Finally got around to updating pysync to use librsync 0.9.6. Also
tested out psyco and added support for it. psyco reduced the
pysync1T.py tests from 21secs to 14secs! That's a 33% speedup with
only 2mins worth of RTFM and coding!
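Optional psyco support is usually a guarded import, and the speedup figure is
simple arithmetic. The sketch below shows both under stated assumptions: the
function names are made up for this example, the guarded call to psyco.full()
is the common pattern rather than a copy of what pysync does, and on
interpreters without psyco it simply reports that nothing was enabled.

```python
def enable_psyco():
    """Turn on psyco if it is installed; return True when it was enabled."""
    try:
        import psyco
    except ImportError:
        # psyco is optional: carry on at normal interpreter speed without it.
        return False
    psyco.full()
    return True


def time_reduction_percent(before_secs, after_secs):
    """Percentage of the original run time that was saved."""
    return 100.0 * (before_secs - after_secs) / before_secs


USING_PSYCO = enable_psyco()
print("psyco enabled:", USING_PSYCO)
print("run time cut by %.0f%%" % time_reduction_percent(21, 14))
```

Going from 21secs to 14secs is a one third cut in run time; expressed as
throughput, the same change is 21/14 = 1.5 times faster.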
Did a bit of distro cleanup, adding NEWS and TODO. Also tweaked
Makefile and setup.py. Released 2.24.