+The librsync Python extension for pysync
=========================================

++Document Overview
-----------------

This document is a development record for the librsync Python extension for pysync. As this project was being sponsored by Accellion, I felt obliged to produce a decent record of my efforts.

++Project Overview
----------------

This project was to include a Python interface to librsync in pysync. It is sponsored by Accellion. The following are the initial milestones and estimates agreed upon.

A. Establish feasibility of librsync on windows. Produce a working MSVC++ compiled exe that demonstrates librsync working on Windows. Estimated 1 day.

B. Use swig to create a librsync Python extension on Linux. Produce a librsync Python extension tested and documented on Linux. Estimated 3 days.

C. Port this Python extension to Windows. Produce a Windows Python extension compiled with Windows MSVC++, plus any windows specific documentation. Estimated 1 day.

++Development Journal
===================

+++2002-04-15 Monday, Phase 1 Started
----------------------------------

Checked mail for initial enquiries posted to librsync list on Friday. One post from someone else who has compiled librsync in MSVC++. They hacked it manually using an MSVC project and seemed to have it working. They've offered to send me their stuff. Postponed response until I have a working environment ready.

Prepared working environment of Debian woody running VMware Workstation 3.1 (update installed on Sunday) with win98 guest, running MS Visual C++ and Cygwin (update downloaded on Sunday).

Installed MSVC++ in VMware. Played with it. It's quite different to the Borland C++ I used ages ago... might take a while to get used to. At this stage I'm planning on just using MSC's nmake so the whole GUI stuff might be unnecessary.

Installed Cygwin in VMware. Hit major problems running Cygwin in VMware. It freezes win98 badly. Posted query to VMware and Cygwin news groups.
Wasted much time trying to characterise the problem. I could try rolling back to VMware v2.0.4 and/or using Win2000, but I don't have the time to experiment.

Prepared librsync project. Extracted librsync-0.9.5 and checked into PRCS as "orig.1". Compiled librsync under linux. Easy, "configure; make" with no incidents. Adjusted PRCS project file to exclude compiler output and checked in.

Looking at autoconf and automake input and output. Searched web for how other projects support Unix+MSVC. It would be nice to do this properly so it could be largely automated and incorporated upstream. It seems there is no "right way", with some projects using MSVC "projects" and maintaining config.h.msc, some manually maintaining Makefile.msc+config.h.msc, and some autogenerating from Makefile.msc.in & config.h.msc.in.

Received request from VMware staff to submit incident report. You must have paid support to file incident reports to VMware, and I had to find where you do it on their website, after I had to "register online" the licence they sent me. I guess I'm using the 30 days support I got when I paid for it.

End result so far is VMware+Win98+Cygwin = no-go, which means I will need to run cygwin in Win98 native, which means re-booting each time I switch between Windows and Linux, which means I need MSVC on Win98 native.

Installed MSVC++ in Win98 native. Installed Cygwin in Win98 native. Found a few quirks when installing Cygwin in Win98 on my samba network. Fixed these and updated the cygwin-utils project's README.txt.

Compiled librsync in Cygwin. Hit a problem with config reporting gcc "not working". Should be simple to fix, but it'll have to be done tomorrow.

I'm already behind, but because of development environment problems, not the project itself.

+++2002-04-16 Tuesday, Phase 1 continued
-------------------------------------

Prepared Project Journal and configured web access. While I was at it I configured similar web access for all my projects.
Apache needed to be tweaked to ensure READMEs were not stripped from directory listings.

Checked mail for more feedback and/or messages from Accellion. Composed and sent message to librsync-devel list about putting MSC support into librsync. Checked vmware and cygwin newsgroups for feedback, and posted more details to vmware group.

Checked out autoconf handling of win32 platforms in more detail for other projects, including Python. Downloaded latest Python 2.1.4 windows installer and source. Looked ahead at the latest Python distutils documentation for any win32 platform issues. Wow, distutils has come a long way. It can build tar.gz, rpm or even windows installers from a setup.py, even compiling C source for you. Looks like I won't need nmake makefiles for the final thing.

Responded to some more queries on the vmware newsgroup, giving more details of the problem.

Rebooted into Win98. Installed latest Python2.1. Re-attempted cygwin compile after reading cygwin documentation, failed again. After much stuffing around it seems my cygwin is not happy running off a samba server. Reinstalled cygwin locally. Much faster and works fine. My first win32 executable of rdiff that works, but not yet compiled with MSVC++. I needed to create a symlink from inttypes.h to sys/stdtypes.h to get it working, but quickly found info on this via google. There is a next intermediate step of getting cygwin to compile a minGW exe that doesn't need the cygwin1.dll, but I'll skip that as time is getting on.

Sent status report to Nikhil at Accellion, reporting a little slippage, but nothing serious.

Research and an email on the librsync-devel list suggest you can get configure to work for MSVC++. Attempted to get ./configure to configure for MSVC++. Failed miserably. Looks like I need to read up more on how configure works.

Looked at latest Python build methods.
They use MSVC++ workspaces and projects for the windows build, and have a config.h that I could probably just grab if I can't get configure to generate one for me.

Got email from Nikhil at Accellion asking for a phone contact number. Sent him a reply... still hoping to get an MSVC++ compile done before end of today.

Damn... still no MSVC++ exe, and another day gone. It looks like I'm going to go overtime by a day or so. Tomorrow I'll just use an MSVC++ workspace with projects and the Python config.h.

+++2002-04-17 Wednesday, Phase 1 continued
-------------------------------------

Checked mail and News. Nothing special. Updated diary. Cleaned up separate build directories, librsync-linux, librsync-cygwin, librsync-msvcrt.

Found more info on getting config to work with MSVC++. I'll have one more crack at it, then send an enquiry on the librsync list.

I've been doing research and posting a monologue of my problems and resolutions to the rproxy-devel list for a public record and in the hope of getting feedback. I will put an archive of this thread into this swf, rather than repeat it all here (see librsync.mail).

The 0.9.5 source needs some fixes. It has #includes for headers not available in cygwin and MSVC that are correctly detected as missing by configure. The autoconf used to generate configure.in and configure needed to be updated to version 2.53. There are some unix specific routines (getuid, geteuid) in popt that can be replaced with simple #defines protected by #ifdef _WIN32.

Got compilation working in both cygwin+MSVC and the MSVC project, with only the link step failing.

+++2002-04-18 Thursday, Phase 1 completed, Phase 2 started
----------------------------------------------------------

Had final dentist's appointment this morning. New tooth is fine.

Compilation using an MSVC++ project works, so I'm leaving the cygwin+MSVC for now. I have a working rdiff.exe compiled with MSVC++. Did some quick testing with compiled rdiff vs Debian distro rdiff... identical results.
Compared results to pysync, and noticed pysync produces much smaller diffs. I used the same tests used to test and verify pysync. Posted email about this to librsync-devel list and got a response... someone is looking at it.

The python wrapper for librsync will provide fast md4sum and rolling checksum routines, so even if you don't use all of librsync, you can use these bits to speed up the existing pysync code.

Submitted a patch on SF for compiling librsync with MSVC6. Found but did not bother fixing the issues with cygwin+MSVC6, just documented them with the patch submission.

Started on swig'ed wrapper for it all.

+++2002-04-19 Friday, Phase 2
-----------------------------

Checked mail, checked news, updated diary. Posted a little more feedback on vmware issues. They have agreed that there is a problem and are working on it.

Sent status report to Accellion. The first milestone has been achieved. I am three days behind schedule. The whole MSVC thing was much more troublesome than I thought. Part of the problem was I was not deeply familiar with MSVC or autoconf, so I was never sure if the problems were in librsync, MSVC, or autoconf. I am now much more familiar with both and have found and fixed the problems in librsync related to MSVC.

Working on swig'ed wrapper.

VMware have responded to my bug report with a workaround solution. I posted this to newsgroups for closure on my earlier queries. This will make development much easier. Applied the workaround to my vmware installation. I can now compile/debug etc in windows and linux without needing a reboot.

Received some positive feedback for my librsync MSVC patch on sourceforge.

Received interesting post from Van Gale about something called pyrex that could be useful for pysync. A brief look has me very excited, as this is a very easy way to convert python into compiled python extensions. This is the first thing I'll look at after the librsync wrapper.
+++2002-04-22 Monday, Phase 2 cont
----------------------------------

Checked mail and news. Updated diary. Response from Nikhil at Accellion that they were not majorly concerned about delays. There was a post on the rsync list about updating popt to v1.5.1. librsync is currently using v1.3 so perhaps this will need upgrading soon.

Added a comment to pysync's freshmeat page advertising upcoming librsync wrapper and sponsorship by Accellion.

The swig wrapper is going fine.

Got a bit off track and had a look at using pyrex to optimize adler32.py with disappointing results. It looks like the generated C code is slower than the pure Python. Looking at the generated C code it looks like the problem is a combination of sub-optimal generated code, and the overhead of all the Python boilerplate for safely getting from Python objects to C and back. Another thing that could be contributing is pyrex doesn't support classes yet, so I had to use a Python class wrapper around the pyrex routines, effectively adding another function call layer. For something like the "rotate" routine which actually doesn't do much, all the overheads outweigh the benefits.

In general Pyrex looks very promising, but has a very "pre-alpha" feel to it. You can almost declare C variable constants, but they are not initialised. Some things appear to work, until you run them and look at the code. When/if pyrex supports classes and improves a bit, it could be brilliant. In the meantime it still makes a convenient starting point for converting Python code to C.

+++2002-04-23 Tuesday, Phase 2 cont
-----------------------------------

Checked mail, news. Updated diary. Nikhil sent a message saying he's very happy with the online SWF tracking of development.

Hoping to get the swig wrapper finished on Linux today. The swig wrapper is at the point where I have the librsync.i generating the wrapper module, but I need to trim it down a bit and make it more Python-friendly.
I'll probably borrow a bit of the Python zlibmodule.c code and embed it in librsync.i to make the Python interface cleaner.

+++2002-04-24 Wednesday
-----------------------

My son Jethro was off school today so I was looking after him. Got very little done work-wise.

+++2002-04-25 Thursday
----------------------

ANZAC day public holiday in Australia. Jethro not at school, and a children's birthday to go to in the park. Ended up being a CFD (Computer Free Day).

+++2002-04-26 Friday
--------------------

Jethro at school, a chance to work at last. Only thing in my way is my BAS needs to be in today (Australian tax thing), which I will do ASAP after I check my mail.

Checked mail. Heaps as usual after two days away from computer. A note of concern from Nikhil about my silence which I responded to. "Shirish H. Phatak" sent in a patch for the librsync delta size problem. Martin Pool said he was busy supporting rsync itself and wanted to hand over managing librsync to someone else like me or Shirish. Shirish responded he could do it with my support, and I indicated likewise. It is probable that librsync will become its own project on either SF or samba.org with me and Shirish as project admins.

Updated diary.

I will apply the delta fix patch, check it works OK with MSVC, and include it. The swig wrapper tuning will continue, and hopefully I can try it under MSVC while I'm testing the delta fix patch.

+++2002-04-27 Saturday
----------------------

Neglected diary. Updated this later.

Working on cleaning up the swig wrapper, I noticed the librsync implementation of md4 was quite different to the RSA md4 found in libmd <"http://www.penguin.cz/~mhi/libmd/">. The RSA version is very similar to the RSA version of md5 used by Python for its md5 module. So similar, I easily created a native md4 module without using swig by modifying Python's md5module.c. I created a distutils setup.py for my md4module and built binary distributions for it on linux and windows.
The windows build required some changes to "md4.h" for MSVC compilation. The distutils windows exe installer rocks!

I posted a query to the rsync and librsync lists about the origins and status of the librsync md4 implementation.

+++2002-04-28 Sunday
--------------------

Neglected diary. Updated this later.

Got a response from Martin Pool on the rsync list saying there was nothing special about the librsync md4 code, except Tridge had found a bug in it that results in non-compliant sums in some rare cases. This doesn't affect rsync or librsync operation, but should probably be fixed.

Hacked together swig wrappers for libmd's md4c.c and librsync's mdfour.c with a quick test program. Benchmarked them both against the native md4 module using libmd's md4. Results for a Cel-366 doing 10K sums of 4K blocks: swig'd libmd 3.1secs, swig'd librsync 2.5secs, native libmd 1.6secs. Conclusion: libmd is a nicer implementation to wrap because it's identical to the RSA md5 used by Python, but librsync is faster. I didn't see any differences in the sums produced, so I guess I didn't hit the bug Tridge was talking about.

I'll use the native libmd version for pysync, but make the swig'ed librsync version available from the librsync wrapper. At some stage in the future it might be nice to change the librsync implementation to match the libmd interface so that a native librsync md4 module could be built. Perhaps this implementation could be incorporated upstream by libmd?

+++2002-04-29 Monday
--------------------

Checked mail. Nothing of significance. Final wrapup day.

Quick benchmark of md4 vs md5 using 256K sums of 1K blocks of random data. I used native md4 and md5 modules on a Cel-366. Results: md4 12.8secs, md5 15.6secs. I suspect that using the librsync md4 could bring this as low as 8secs, which would be nearly half the md5 execution time.

Changed pysync.py to use the md4 module.
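
The md4 vs md5 timing described in this entry can be approximated with a small harness like the one below. This is only a sketch using Python's standard hashlib rather than the md4 and md5 extension modules discussed here. The function names `make_blocks`, `time_digest` and `compare` are made up for illustration, and md4 is reported as None when the local hashlib build does not provide it.

```python
import hashlib
import os
import time


def make_blocks(count, size):
    """Return `count` blocks of `size` random bytes each."""
    return [os.urandom(size) for _ in range(count)]


def time_digest(name, blocks):
    """Seconds taken to hash every block with algorithm `name`.

    Returns None if this hashlib build does not provide `name`
    (md4 is often missing on current systems).
    """
    try:
        hashlib.new(name)
    except ValueError:
        return None
    start = time.perf_counter()
    for block in blocks:
        hashlib.new(name, block).digest()
    return time.perf_counter() - start


def compare(names, blocks):
    """Map each algorithm name to its elapsed time (or None)."""
    return {name: time_digest(name, blocks) for name in names}
```

Calling `compare(["md4", "md5"], make_blocks(256 * 1024, 1024))` would roughly match the run above, at the cost of holding about 256 MB of random data in memory. A smaller block count is enough to see the relative difference.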
A quick bench of pysync with the md4 module gave a small speed improvement, though the bench I used was just the pysync-test.py script which has a lot of unrelated overheads.

Finalising swig wrapper and putting together a distutils setup.py.

Argh... not happy with the swig wrapper. I've omitted dull bits implemented by standard python libs like base64 stuff. The general stuff like result and error codes, converting them to strings, and logging is all done. The md4 class is all done. The stats class is all done, and nicely does readable output and logging. The signature class is done, but the tie-in with the job class needs tidying. The job class in conjunction with the buffer class was painful to use from Python. The callbacks for patch were also painful. I'm re-implementing this borrowing ideas from zlibmodule so that the buffer class is hidden from the Python interface. librsync is structured in a way that could encourage small memory leaks... I'm putting together an email for the librsync list identifying things I'd like to fix.

The restructure is complete. It's almost working... I have written a test program that exercises it and it's falling over on delta calculation after flushing the read input... I suspect I've slightly misunderstood the "iter" return codes... will fix tomorrow.

+++2002-04-30 Tuesday
--------------------

The never ending story continues... librsync is playing up. I cannot figure out what is going wrong, but I am wondering if I've tripped over some inherent bug in librsync that is not related to the swig wrapper. Signature calculation works fine. Signature loading works fine. Delta calculation _seems_ to work, but the delta I get is different to that returned by rdiff. The delta stats look correct, but the delta is 21 bytes smaller and starts to differ at the 1907th byte. Attempting to patch goes totally screwy immediately with the copy_cb callback being called by librsync with invalid values, resulting in a segfault.
I have no idea why the delta is slightly different, but I don't think this is related to the patch failure because it fails before it even processes any of the delta. More debugging is required, but I'm pretty frustrated right now.

Looking through the librsync code I keep seeing ways that it could have been done better. In the process I have written my own librsync compatible rollsum that I might swig-wrap and integrate into pysync for a change...

I haven't heard back from Nikhil for a while. My last post said I would be finished before Monday... and now Tuesday's gone with the wind. He must be thinking I'm a wanker, and I'm starting to feel like one. I'm at the point where it's all done, just this damn elusive bug. Once I solve this it will take me an hour to put out rpms and windows installers.

I sent Nikhil a status report, suggesting I submit what I've got and we negotiate payment. I could keep working on this forever but he probably wants some results, and I need to draw a line.

+++2002-05-01 Wednesday
-----------------------

Checked mail. Nothing from Nikhil yet.

Preparing for release using distutils and updating documentation. Moved files around to clean up directory structure. Fiddled with Makefile to make things easier. Added MANIFEST.in for distutils to build source distributions from. Added keywords to PRCS project file for distutils to automatically get project details in setup.py.

Started a rollsummodule.c based on md4module.c that will lead to a Python extension of rollsum.c.

Hit a small problem with librsync wrapper using Python config.h instead of librsync config.h, which had slight differences related to LONG_LONG. Will fix tomorrow.

Went out to see a heap of bands playing for the "Rockin in the free world" charity benefit, which was to collect instruments for the kids of East Timor. Usual mix of good-n-bad, with one truly outstanding solo female vocalist. Also worth mentioning were comedy-punk band "The Twits", and rockabilly band "The Dice Age".
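
For reference, the kind of rolling checksum that rollsum.c and the planned rollsummodule.c deal with can be sketched in pure Python. This is a minimal rsync-style weak checksum, not the librsync code. The class name `RollSum`, its method names, and the lack of any per-byte offset are assumptions made for illustration, so its digests should not be expected to match those produced by librsync.

```python
class RollSum:
    """Rsync-style weak rolling checksum over a fixed-size window (sketch)."""

    MOD = 1 << 16

    def __init__(self):
        self.count = 0  # number of bytes currently in the window
        self.s1 = 0     # plain sum of the window bytes
        self.s2 = 0     # sum of the running s1 values

    def update(self, data):
        """Append bytes to the window."""
        for c in data:
            self.s1 = (self.s1 + c) % self.MOD
            self.s2 = (self.s2 + self.s1) % self.MOD
        self.count += len(data)

    def rotate(self, out_byte, in_byte):
        """Slide the window one byte: drop out_byte, append in_byte."""
        self.s1 = (self.s1 - out_byte + in_byte) % self.MOD
        self.s2 = (self.s2 - self.count * out_byte + self.s1) % self.MOD

    def digest(self):
        """Combine both sums into one 32-bit value."""
        return (self.s2 << 16) | self.s1
```

The point of `rotate` is that sliding the window by one byte costs a few additions, instead of re-summing the whole block at every offset.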
+++2002-05-02 Thursday
-----------------------

Nursed head. Checked mail.

The LONG_LONG issue seems to be that MSVC doesn't like "long long", so librsync's config.h has SIZEOF_LONG_LONG defined as '0'. Python's config.h has "LONG_LONG" defined as "__int64" with SIZEOF_LONG_LONG defined as '8'. The 'rsync.h' header checks SIZEOF_LONG_LONG to see if it can define a "long long" type.

Argh. distutils builds hit the same _nsprintf linking problem that cygwin+MSVC builds hit. Looking into it. Fixed it, submitted new patch to Sourceforge that incidentally fixes the cygwin make problems I had before.

A bit more work on the README. Then tweaked the Makefile. Then tweaked it some more. Played with the distutils windows installer. Added more documentation to the README. Cleaned up the test code. Fiddle fiddle fiddle.

Jethro has a major ear ache. Much family disruption and doctor visits etc. So near to finishing, so many things in the way :-)

Posted email to librsync list about updated patch which includes rolling checksum fix. Small discussion ensued about the future of librsync development.

Froze pysync source at version 2.7. Built tar.bz2, rpm, zip and exe distributions. Announced on Freshmeat. Sent final status report. Prepared invoice, just waiting for confirmation from Nikhil about payment details.

Added a comment to Freshmeat project page about latest release.

+++2002-05-03 Friday
--------------------

Relaxed. Checked mail.

Nikhil responded that payment of the full amount originally agreed was reasonable, given that the bug would probably be fixed sometime in the near future anyway. Emailed him a copy of my invoice, and will also Fax it to him... need to look into getting minkirri to send faxes via the dialin modems.

There was also a little mail from Shirish about the future of librsync. Posted off a response + summary of additional work required. Suggested we get SF developer access to the rproxy project.

Added a Diary entry on SF.
Added a comment pointing to the windows installer and SWF on Freshmeat.

+++2002-05-06 Monday
--------------------

A bit more mail on lists about librsync development over the weekend.

Nikhil indicated he was pleased with the work and payment should be wired over tomorrow.

Barring further work on librsync, this is probably the last diary entry.

+++2002-06-24 Monday
--------------------

Started looking again at pysync. Since the last official release, I've implemented a proper rollsum extension module and updated docs. I'm going to release this now, even though I originally wanted to include inverse delta calcs too. Right now, my higher priority is to make librsync better. I'm going to integrate rollsum into librsync right now. librsync has a new sourceforge site that will hopefully accelerate development.

+++2003-02-24 Monday
--------------------

Got admin rights on the librsync project and set up lists, configured tracker, assigned myself patches and bugs, and started getting a release together.

+++2003-10-18 Friday
--------------------

Finally got around to updating pysync to use librsync 0.9.6. Also tested out psyco and added support for it. psyco reduced the pysync1T.py tests from 21secs to 14secs! That's a 33% speedup with only 2mins worth of RTFM and coding!

Did a bit of distro cleanup, adding NEWS and TODO. Also tweaked Makefile and setup.py. Released 2.24.
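
The psyco support and the speedup figure in this last entry can be sketched as follows. The optional-import pattern is an assumption about what "added support" might look like, not the actual pysync code. `psyco.full()` is psyco's call for compiling everything it can. psyco only ever ran on Python 2, so on a modern interpreter the fallback branch is taken. `HAVE_PSYCO` and `speedup_percent` are hypothetical names.

```python
# Optional acceleration: use psyco when it is installed, otherwise
# carry on in plain Python (assumed pattern, not the pysync source).
try:
    import psyco
    psyco.full()
    HAVE_PSYCO = True
except ImportError:
    HAVE_PSYCO = False


def speedup_percent(before_secs, after_secs):
    """Reduction in run time as a percentage of the original time."""
    return 100.0 * (before_secs - after_secs) / before_secs


# The figures from the entry above: 21 seconds down to 14 seconds.
reduction = speedup_percent(21, 14)
```

With the figures above, `reduction` works out to one third of the original run time, which is where the 33% comes from.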