This section of my blog is for those who are into computer technology. It is intended to be followed by computer geeks who are interested in all of my tech-related writings.

If you are interested in more specialised feeds, those are available here:


SamV back-catalog now available

Just realised that a bunch of my projects are not online any more. Well, now they are back up via git dumb http at http://git.utsl.gen.nz/ - gitweb to follow.

Posted Wednesday evening, June 15th, 2011

The GitTorrent Commit Reel

The commit reel is defined in section 5 of the GitTorrent RFC.

It is defined as an uncompressed stream of objects, sorted in a particular way. In practice, it is only the commit objects that are sorted, and all of the dependent objects for those commits are placed with the commit which first introduces them.

So, you start with a repository:

a horizontal chart of a project history

You sort the objects so that they are in reverse date order (tie breaking is still required over git rev-list --date-order), then fetch their types and sizes, to produce the commit reel index.

SHA1 hash     type    size  info
e951c3b45579  blob     971  lib/VCS/Git/Torrent/Tracker.pm
4a39b387218e  tree      38  lib/VCS/Git/Torrent
46a6dd40761e  blob    1797  lib/VCS/Git/Torrent.pm
cb169dea8427  tree      72  lib/VCS/Git
6856da5de8a8  tree      30  lib/VCS
e028c2ec652f  tree      30  lib
a8c6175cb855  tree      30
6d669a0d7649  commit   177
d7934d77db6d  blob     508  lib/VCS/Git/Torrent/PWP/Message.pm
831a2dce3123  tree      38  lib/VCS/Git/Torrent/PWP
b67f62af3325  blob    2062  lib/VCS/Git/Torrent/PWP.pm
8e49bb567004  tree     102  lib/VCS/Git/Torrent
d9cfbd2965e1  tree      72  lib/VCS/Git
760c03b92584  tree      30  lib/VCS
58e8231290fa  tree      30  lib
08d6743bc1cd  tree      30
6e85df39b2e9  commit   233
ae59d4c6cdad  blob     239  t/91-pod-coverage.t
...
9f21fdc6b232  commit   504
7ed81b753c34  blob     528  lib/VCS/Git/Torrent/Reference.pm
111a3c708d42  tree     321  lib/VCS/Git/Torrent
32f0b74a2902  blob    6311  lib/VCS/Git/Torrent.pm
da591fe54883  tree      72  lib/VCS/Git
7b702d0cf7de  tree      30  lib/VCS
39ec1765b517  tree      30  lib
6e5bb34706f6  tree     245
5e8f6a7807a3  commit   277

a commit reel
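
To make the "tape" idea concrete, here is a rough Python sketch (not the Perl from the distribution) that walks a reel index in order and notes the running byte offset at which each commit ends. The function name commit_offsets and the REEL list are made up for illustration; the rows are the first two commits from the listing above.

```python
# Hypothetical helper: walk a commit reel index in order and record the
# running byte offset reached at the end of each commit.
def commit_offsets(reel):
    offsets = []
    position = 0
    for sha, obj_type, size in reel:
        position += size
        if obj_type == "commit":
            offsets.append((sha, position))
    return offsets


# The first two commits of the listing, as (hash, type, size) rows.
REEL = [
    ("e951c3b45579", "blob", 971),
    ("4a39b387218e", "tree", 38),
    ("46a6dd40761e", "blob", 1797),
    ("cb169dea8427", "tree", 72),
    ("6856da5de8a8", "tree", 30),
    ("e028c2ec652f", "tree", 30),
    ("a8c6175cb855", "tree", 30),
    ("6d669a0d7649", "commit", 177),
    ("d7934d77db6d", "blob", 508),
    ("831a2dce3123", "tree", 38),
    ("b67f62af3325", "blob", 2062),
    ("8e49bb567004", "tree", 102),
    ("d9cfbd2965e1", "tree", 72),
    ("760c03b92584", "tree", 30),
    ("58e8231290fa", "tree", 30),
    ("08d6743bc1cd", "tree", 30),
    ("6e85df39b2e9", "commit", 233),
]

for sha, end in commit_offsets(REEL):
    print(sha, "commit", end)
```

This prints 3145 for the first commit and 6250 for the second, which is where those two commits sit on the tape.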

Then, you take the total size of the "tape" and divide by the number of blocks you require. Let's go with 4 for this example.

a horizontal chart of a project history, broken into 4 segments

The listing from the test commit in VCS::Git::Torrent has a total of 233141 bytes of uncompressed object data. Let's divide that into 4 segments on 58285 byte boundaries:

Chunk 1
6d669a0d7649 commit 3145
6e85df39b2e9 commit 6250
d16fe9b37f1c commit 7269
b9b5df08c542 commit 10216
9f5380b003fc commit 13715
3d954bf97808 commit 15211

Chunk 2
53b2a50ab357 commit 64934
f8a02453062d commit 76844
60f7c92ec68f commit 78718
8e4c833bc0ed commit 90027
9595e4d0ed4a commit 99113
2499769d4e5b commit 113780
2b67a6d1898a commit 116380

Chunk 3
c24dcdcd46de commit 158557
bffe789b4a13 commit 162339
cc77ed21cf03 commit 164454
1dfd53badd66 commit 170494
497da251f9dc commit 174642

Chunk 4
5b7e980dce4b commit 178961
6c1fd6467f49 commit 183229
ae4aee0f484e commit 187522
69ff2248cf7f commit 191852
40149c3f6e62 commit 199468
93083bfcc5ee commit 202889
4ff65c62c570 commit 209765
76ed2bbc552c commit 214713
9f21fdc6b232 commit 225327
5e8f6a7807a3 commit 233141
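
As a sanity check on that split, here is a small Python sketch of the bucketing step alone. It assumes a commit lands in whichever block its end offset falls in, and that the block size is the total rounded up; both are my reading of the numbers, not something lifted from testpacking.pl. The names slice_reel and COMMITS are made up, and the sketch does nothing about working out which extra branch heads each pack needs.

```python
# Hypothetical helper: assign each commit to the block in which its end
# offset falls. commits is a list of (hash, end offset) in reel order.
def slice_reel(commits, n_blocks):
    total = commits[-1][1]
    block = -(-total // n_blocks)  # ceiling division (assumed rounding)
    slices = [[] for _ in range(n_blocks)]
    for sha, end in commits:
        slices[(end - 1) // block].append((sha, end))
    return block, slices


# Commit end offsets copied from the chunk listing.
COMMITS = [
    ("6d669a0d7649", 3145), ("6e85df39b2e9", 6250),
    ("d16fe9b37f1c", 7269), ("b9b5df08c542", 10216),
    ("9f5380b003fc", 13715), ("3d954bf97808", 15211),
    ("53b2a50ab357", 64934), ("f8a02453062d", 76844),
    ("60f7c92ec68f", 78718), ("8e4c833bc0ed", 90027),
    ("9595e4d0ed4a", 99113), ("2499769d4e5b", 113780),
    ("2b67a6d1898a", 116380), ("c24dcdcd46de", 158557),
    ("bffe789b4a13", 162339), ("cc77ed21cf03", 164454),
    ("1dfd53badd66", 170494), ("497da251f9dc", 174642),
    ("5b7e980dce4b", 178961), ("6c1fd6467f49", 183229),
    ("ae4aee0f484e", 187522), ("69ff2248cf7f", 191852),
    ("40149c3f6e62", 199468), ("93083bfcc5ee", 202889),
    ("4ff65c62c570", 209765), ("76ed2bbc552c", 214713),
    ("9f21fdc6b232", 225327), ("5e8f6a7807a3", 233141),
]

block, slices = slice_reel(COMMITS, 4)
start = 0
# Every slice is non-empty for this data; an empty one would need a guard.
for number, members in enumerate(slices):
    last_sha, end = members[-1]
    print(f"Slice #{number}: ends at {last_sha}, {end - start} bytes")
    start = end
```

The slices come out at 15211, 101169, 58262 and 58499 bytes, ending at 3d954bf97808, 2b67a6d1898a, 497da251f9dc and 5e8f6a7807a3. The lopsided second slice is just what happens when one commit drags in a lot of objects: it cannot be cut in half.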

The testpacking.pl script in the VCS::Git::Torrent distribution can generate these lists and show how much bandwidth is wasted by using 4 separate packs:

$ git update-ref refs/heads/oldeg 5e8f6a7807a3
$ perl bin/testpacking.pl -n4 oldeg
Generating index...
Length is 233141, 4 blocks of 58286 each
do_pack(3d954bf97808)
Slice #0 (up to 58286): 15211 => 6554 (43%)
do_pack(2b67a6d1898a 9595e4d0ed4a --not 3d954bf97808)
Slice #1 (up to 116572): 101169 => 30035 (29%)
do_pack(497da251f9dc --not 2b67a6d1898a 9595e4d0ed4a)
Slice #2 (up to 174858): 58262 => 16951 (29%)
do_pack(5e8f6a7807a3 --not 497da251f9dc 9595e4d0ed4a)
Slice #3: 58499 => 10224 (17%)
Overall: 233141 => 63764 (27%)
vs Bundle: 233141 => 58297 (25%)
Overall inefficiency: 9%
$ 

So what this is saying is that our repository, originally a 58k bundle, can be split into 4 chunks, defined by the listed boundary commits. At the end, you get 4 bundles of varying sizes, with an extra 5k, or 9% of overhead (yes, these packs are thin).
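
If you want to check the arithmetic, this Python sketch redoes the summary lines from the numbers in the output above. The percent helper is made up, and rounding percentages down is an assumption on my part: it is simply what matches the printed figures, and the script may well compute them differently.

```python
# Hypothetical helper: whole-number percentage, rounded down (assumed).
def percent(numerator, denominator):
    return numerator * 100 // denominator


raw_size = 233141                           # uncompressed object data
slice_packs = [6554, 30035, 16951, 10224]   # packed size of each slice
bundle_size = 58297                         # a single bundle of everything

sliced_total = sum(slice_packs)
overhead = sliced_total - bundle_size

print("Overall:", sliced_total, f"({percent(sliced_total, raw_size)}%)")
print("vs Bundle:", bundle_size, f"({percent(bundle_size, raw_size)}%)")
print("Overhead:", overhead, "bytes,", f"{percent(overhead, bundle_size)}%")
```

The four packs add up to 63764 bytes, which is 5467 bytes more than the single bundle, or 9% of it.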

So that's the idea anyway. To run the above example, you can clone the github repository, and install the requisite modules via CPAN:

$ git clone git://github.com/samv/vcs-git-torrent.git \
        VCS-Git-Torrent
...
$ cd VCS-Git-Torrent
$ perl Makefile.PL
...
$ make
...
$

If it complains about missing modules, install via CPAN:

$ cpan Test::Depends Bencode IO::Plumbing
...

Update: Figures for my git.git clone:

arcturus:~/src/git$ time perl ../VCS-Git-Torrent/bin/testpacking.pl -n32 master maint pu
missing fields on Reference at /usr/lib/perl5/Class/MOP/Mixin/AttributeCore.pm line 53
Generating index...
Length is 1104821033, 32 blocks of 34525658 each
do_pack(7e011c40bc6c 466fede1bdfd 76a8323ac7f5)
Slice #0 (up to 34525658): 34518888 => 1909503 (5%)
do_pack(cf1fe88ce1fb b3f041fb0f7d a9572072f0ab fdeb2fb61669 --not 7e011c40bc6c 466fede1bdfd 76a8323ac7f5)
Slice #1 (up to 69051316): 34529558 => 1417850 (4%)
do_pack(38035cf4a51c 1b83ace35e78 50b44eceed21 2326acfa95ac --not cf1fe88ce1fb b3f041fb0f7d a9572072f0ab fdeb2fb61669)
Slice #2 (up to 103576974): 34528221 => 1243468 (3%)
do_pack(c7162c1db6fe b642d9ef6433 ada5853c98c5 --not 38035cf4a51c 1b83ace35e78 f2f880f53707 50b44eceed21 2326acfa95ac cf1fe88ce1fb)
Slice #3 (up to 138102632): 34483917 => 1109044 (3%)
do_pack(f16db173a468 f25b79397c97 61ffbcb98804 8c6ab35efe63 3d234d0afacd efffea033457 53cda8d97e6e da7bad50ed08 c27d205aaefb 96bc4de85cf8 8e27364128b0 a0764cb838c2 b1e9fff7e76c 5faf64cd28bf --not c7162c1db6fe b642d9ef6433 2e1ded44f709 ada5853c98c5 cf1fe88ce1fb)
Slice #4 (up to 172628290): 34429886 => 898299 (2%)
do_pack(a2540023dcf8 3159c8dc2da4 5a03e7f25334 ab41dfbfd4f3 e4fe4b8ef7cd 9c7b0b3fc46e a06f678eb998 d0b353b1a7a2 d0c25035df48 18b0fc1ce1ef 1729fa9878ed 1f24c58724a6 f2b579256475 937a515a15f7 --not f16db173a468 f25b79397c97 61ffbcb98804 8c6ab35efe63 3d234d0afacd efffea033457 53cda8d97e6e da7bad50ed08 c27d205aaefb 96bc4de85cf8 8e27364128b0 a0764cb838c2 b1e9fff7e76c 5faf64cd28bf cf1fe88ce1fb)
...
do_pack(607a9e8aaa9b e39e0d375d1d 106a36509dc7 0e098b6d79fb 14c674e9dc52 43485d3d16e4 7a4ee28f4127 118d938812f3 cc580af88507 86386829d425 3b5ef0e216d2 36e4986f26d1 41fe87fa49cb 9e4b7ab65256 3deffc52d88d b53bb301f578 ad17f01399a9 17635fc90067 375881fa6a43 --not f8b5a8e13cb4 50ff23667020 345a38039414 3f721d1d6d6e 977e289e0d73 2ff4d1ab9ef6 69932bc6117d 1d7b1af42028 fcdd0e92d9d4 754ae192a439 3eb969973335 0cd29a037183 6e0800ef2575 df533f34a318 32d86ca53195 f0cea83f6316 4e65b538acc9 3cb1f9c98203 0eaadfe625fd cf1fe88ce1fb)
Slice #29 (up to 1035769740): 35405510 => 677049 (1%)
do_pack(609621a4ad81 eab58f1e8e5e e7e55483439b 46e09f310567 134748353b2a 500348aa6859 a4ca1465ec8a d23749fe36f1 c8998b4823cb 4d23660e79db ad3f9a71a820 b1a01e1c0762 24ab81ae4d12 c591d5f311e0 9f67d2e8279e 2aae905f23f7 a75d7b54097e 86140d56c150 9bccfcdbff3b 02edd56b84f0 204d363f5a05 7c85d2742978 a5ca8367c223 46148dd7ea41 b7b10385a84c a099469bbcf2 fe0a3cb23c79 6b87ce231d14 1ba447b8dc2e 9fa708dab1cc 1414e5788b85 aa43561ac0c1 63267de2acc1 --not 607a9e8aaa9b e39e0d375d1d 106a36509dc7 30ae47b4cc19 e9c5dcd1313d 51ea55190b6e d5f6a96fa479 0e098b6d79fb 14c674e9dc52 43485d3d16e4 7a4ee28f4127 118d938812f3 cc580af88507 86386829d425 3b5ef0e216d2 36e4986f26d1 41fe87fa49cb 9e4b7ab65256 3deffc52d88d b53bb301f578 ad17f01399a9 17635fc90067 375881fa6a43 3cb1f9c98203 cf1fe88ce1fb)
Slice #30 (up to 1070295398): 34563903 => 641336 (1%)
do_pack(8644f69753e0 --not 609621a4ad81 eab58f1e8e5e e7e55483439b d52dc4b10b2f ebc9d420566d f740cc25298e 492cf3f72f9d 46e09f310567 134748353b2a 500348aa6859 a4ca1465ec8a d23749fe36f1 c8998b4823cb 4d23660e79db ad3f9a71a820 b1a01e1c0762 24ab81ae4d12 c591d5f311e0 9f67d2e8279e 2aae905f23f7 a75d7b54097e 86140d56c150 9bccfcdbff3b 02edd56b84f0 204d363f5a05 7c85d2742978 a5ca8367c223 46148dd7ea41 b7b10385a84c a099469bbcf2 fe0a3cb23c79 6b87ce231d14 1ba447b8dc2e 9fa708dab1cc 1414e5788b85 aa43561ac0c1 63267de2acc1 17635fc90067 cf1fe88ce1fb)
Slice #31: 34528236 => 603576 (1%)
Overall: 1104821033 => 26021211 (2%)
vs Bundle: 1104821033 => 23888867 (2%)
Overall inefficiency: 8%

real	16m54.074s
user	4m30.961s
sys	11m9.302s

That's dividing the pack defined by three branches into 32 generally evenly-sized chunks. Actually the chunks at the beginning are larger than the later ones, which are all between 500kB and 950kB. While they are not perfectly sized, at least they can be generated by any node with the underlying objects, without transferring a binary pack.

However, what will matter is that execution time; the Perl prototype is needlessly inefficient. With a revision cache, we should be able to reduce that time drastically, and hopefully retrieve the boundary commits for a given range of commits and number of chunks in milliseconds. The remaining work is mostly in git pack-objects, but given we've drastically reduced the work it has to do, the overall load on the network should not be drastically higher. And because peers can potentially trade these blocks, the workload can be spread out.

Posted in the wee hours of Wednesday night, March 10th, 2011

Improving Google Navigate's Pronunciation of Māori Words

Google Navigate is pronouncing Māori words poorly, which is a shame because it's a phonetic language with few exceptions.

I'd like to contribute some notes for recognising and joining sounds from an English voice syllable bank, to try to make as few errors as possible.

Recognition

A Māori word will match the following regular expression (Unicode quirks notwithstanding):

  (([hkmnprtw]|ng|wh)?([aeiou][aeiou]?|[āēīōū])-?)+

The presence of any letters not on this list means it's not a Māori word.
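
Here is one way the recognition step might look in Python. It is only a sketch: the function name looks_maori is made up, the pattern is the one above with "ng" and "wh" tried first, and the lowercasing and NFC normalisation are my own guess at taming the Unicode quirks (so a macron vowel is always a single character).

```python
import re
import unicodedata

# The pattern from above, anchored by fullmatch() so the whole word must fit.
MAORI_WORD = re.compile(
    r"(?:(?:ng|wh|[hkmnprtw])?(?:[aeiou][aeiou]?|[āēīōū])-?)+"
)


def looks_maori(word):
    # Assumption: fold case and compose macrons before matching.
    word = unicodedata.normalize("NFC", word).lower()
    return MAORI_WORD.fullmatch(word) is not None


for candidate in ["Māori", "Whangarei", "Kaihoro", "Wellington", "rare"]:
    print(candidate, looks_maori(candidate))
```

"Wellington" is rejected because of the "l"; "rare" sails through, which is exactly the kind of false positive to expect from a test that only looks at spelling.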

There are some words which match this pattern but are not Māori words, but they are rare (rah-reh). Otago is not quite one of them, though: its lone "g" is not in the pattern, so the anglicised spelling is rejected.

Pronunciation

What I'm assuming here is that producing a sound bank specifically for Māori, which would be the best solution, is not feasible. But these notes can also be applied by people who just want to know the closest way to say a Māori word, without sounding awkward. (For a more authoritative source, with sound bites, see Whakahuatanga o te reo Māori from the University of Otago.)

  1. simple consonants [hkmnprtw] above: pronounce as in English. Yes, there are exceptions surrounding slight differences for "r" and "t" but they are less important, finesse really.
  2. simple vowels: [aeiou] above. While the actual vowel sounds as spoken by a native Māori speaker and measured on the charts linguists use to map vowels will differ from this, it's perfectly acceptable to use:
    a: rhymes with car or far
    e: either a schwa (as in bed) or as in the first part of air
    i: rhymes with tree
    o: rhymes with paw
    u: rhymes with shoe
  3. macron vowels: this indicates stress, so emphasise or make the sound slightly longer. Don't try to make a diphthong with the next vowel. If no macron appears on the word then you need to stress the first syllable or probably use no stress if it's a diphthong.

    Eg, "Māori" can be seen as Mā-o-ri (maaa-ore-ree) or Ma-ao-ri (Mar-ow-ree) which are pretty similar.

    Other guides list them as "long" vowels vs. "short" vowels with corresponding English words, though I don't know how useful this is.

  4. diphthongs: these can always be constructed by taking the two vowel sounds (as above, but ideally as natively spoken), saying them next to each other, and then repeating it and trying to merge the two sounds into one smooth transition. As I'm using the term, it's this smooth transition that distinguishes a diphthong from two vowel sounds simply spoken together.

    Unfortunately not all the combinations seem to be used in English, or I can't think of a suitably rhyming sound, but these are a good start:

        ae: same as ai
        ai: rhymes with try
        ao: rhymes with how
        au: rhymes with toe (almost)
        ea: sounds like air
        ei: rhymes with way
        eo: (punt)
        eu: rhymes with clue
        ia: sounds like "ya"
        ie: sounds like yeah
        io: sounds like yore
        iu: sounds like you
        oa: rhymes with drawer (not really a diphthong)
        oe: (punt)
        oi: rhymes with toy
        ou: rhymes with toe (li
        ua: sounds like "wah"
        ue: sounds like where
        ui: sounds like "whee"
        uo: sounds like war
    

    The ones I've marked (punt) are either hard to diphthong or I couldn't think of a suitably distinct English word to rhyme it with. They can be pronounced as two syllables, as the difference between a diphthong and two consecutive vowel sounds is only slight anyway.

    As for the diphthongs starting with "u" and "i", I've used "sounds like" because "y" and "w" are not real consonants anyway. You won't find a diphthong following "wu" and there is no "y" in Māori.

  5. Odd consonants:
    • "wh" can correctly be pronounced as "f" consistently, though some regions will pronounce it as "w". There's a historical argument there.
    • "ng" is a nasalised "n", which only really happens mid-word in English but if you don't have sound bank files for that then it's perfectly reasonable to just say "n". In particular NEVER end a syllable with "ng". "orongomai" does not rhyme with "o, wrong am I", it rhymes with "aw, raw know my".
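
The "never end a syllable with ng" rule falls out naturally if you split words with the same building blocks as the recognition pattern: an optional consonant followed by one or two vowels. A minimal Python sketch, with a made-up function name, and with the caveat that greedily pairing vowels is a simplification (it will happily pair two vowels that a speaker would keep apart):

```python
import re

# One syllable: optional consonant (ng and wh tried first), then either
# one or two plain vowels or a single macron vowel.
SYLLABLE = re.compile(r"(?:ng|wh|[hkmnprtw])?(?:[aeiou][aeiou]?|[āēīōū])")


def syllables(word):
    # Assumes the word has already been recognised as Māori.
    return SYLLABLE.findall(word.lower())


for word in ["orongomai", "Māori", "whakahuatanga"]:
    print("-".join(syllables(word)))
```

For "orongomai" this gives o-ro-ngo-mai, so the "ng" opens a syllable instead of closing the one before it, and "Māori" comes out as mā-o-ri.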

The above 5 rules will let you get pronunciation closer to correct than any attempt to read the words based on English pronunciation rules. They are not perfect, and to some degree include something of an imperial stamp to them. Also I've found people who consider themselves to "know" how a place name is pronounced because they live around it and some mangled-up English mockery of a pronunciation is what everyone in the area has adopted. These tend to not be regular, and are certainly not worth trying to emulate.

Posted Monday night, March 7th, 2011

GitTorrent: a synthesis of past efforts

If you read this list post (gmane archive), then you will probably see not much new here. I include it as a back-drop for the subsequent articles.

GitTorrent concept: torrent the pack files

The idea of applying the straight BitTorrent protocol to the pack files was the starting point for GitTorrent. However, this turns out not to be useful, as the pack files are not deterministic. It is only under a very strict set of precarious circumstances that any two nodes computing a pack for a given set of git objects will produce the same binary content. Fluke, if you will.

Therefore, it seemed to add little to the idea of using unmodified BitTorrent, perhaps distributing a pack file or a git bundle; for instance, no peer could participate in the swarm - even with a complete clone of the repository - without downloading the exact pack file that the repository was serving.

So, over the period of several months, Jonas and I revised the RFC, principally to express it in terms of stable object manifests, with the goal that any node holding the underlying objects could participate. You can get a flavour for the exchange by glancing at the RFC source history.

The resultant RFC invents terms such as "Commit Reel", defined by a sorting algorithm for objects, similar to the order returned by:

git rev-list --date-order --objects

The above ordering is for all intents and purposes stable, with only a very minor edge case where no strict order exists.

GitTorrent Summer of Code project

There is prototype code from a 2008 Google Summer of Code project. While this project was not considered successful, some key concepts can be demonstrated with it and so I will make that the starting point of the next post in this series, and use it to illustrate the design of the protocol.

One of the practical discoveries was that the code base could not quickly generate the object indexes required for efficiently answering GitTorrent messages.

Related project: git rev-cache

This project was aimed at being a generic cache for git revision tree walking. The idea is that while git's graph colouring algorithm is fast enough for most operations that are important to a user, such as good interactive performance, it is not sufficient for a gittorrent server, or even for the 'initial git clone' case:

  1. Computing the results involves a huge amount of pointer chasing that requires that the cache be hot. If the cache is not hot, such as on a busy server, it can take minutes just to calculate the amount of work to do.

  2. If you want to take a large number of objects and retrieve a particular sub-section of them, then you have to do all the above work.

So, the revision cache helps by keeping just the important data in a binary, sequential file: all of the important information necessary for graph traversal can be retrieved quickly and computed quickly, too. I will dedicate at least one post to this project, where I will try to merge it with the latest git and show it in action.

GitTorrent distilled: mirror-sync

One of the challenges with GitTorrent was the amount of infrastructure that was required just to get to the point where the core algorithms could be designed. By using Perl, there were already off-the-shelf packages available for things like Bencoding, etc - but it was still quite a drag.

After some reflection on this, and from having read the BitTorrent protocol, I decided that the BitTorrent protocol itself is all cruft and that trying to cut it down to be useful was a waste of time.

The idea of "automatic mirroring" came from this. With Automatic Mirroring, the two main functions of P2P operation - peer discovery and partial transfer - are broken into discrete features.

I presented this idea at GitTogether 2009, and produced a patch series called "client-side mirroring" as a first effort towards this goal.

The design of Mirror-Sync is simple enough to be expressed on a single page, making it a vast improvement over GitTorrent already. Additionally, it would fit within the existing git protocol, allowing existing git servers to smoothly get the benefits from peer to peer technology.

If you want to follow this series, you can subscribe to the gittorrent tag, my git section, my comp section or even my entire blog.

Posted Sunday evening, March 6th, 2011

ikiwiki

This blog software is awesome, I welcome my new cult overlord. Here is how I did it:

apt-get install ikiwiki git-core nginx

Then make a new user and path for the wiki owner

adduser ikiwiki
mkdir /var/www/blog
chown ikiwiki /var/www/blog

Set up the user for SSH login

mkdir ~ikiwiki/.ssh
ssh-add -L > ~ikiwiki/.ssh/authorized_keys
chown -R ikiwiki ~ikiwiki/.ssh
chmod -R og-rwX ~ikiwiki/.ssh

Then you can ssh to the ikiwiki user on your server host and run the basic setup.

ssh ikiwiki@myhost.com
ikiwiki --setup /etc/ikiwiki/auto-blog.setup

... insert a montage of mucking around, editing config files, etc ...

vi myblog.setup

ikiwiki -setup myblog.setup

ikiwiki-mass-rebuild

I got to this nginx configuration (/etc/nginx/sites-available.d/blog):

server {
   listen   80; ## listen for ipv4
   server_name  myblog.com;
   access_log  /var/log/nginx/blog.log;
   location / {
        root   /var/www/blog/docs;
        index  index.html index.htm;

        # I use this little redirector script for an easy, client-side JS redirect
        rewrite ^/blog/(20../.*\.html)$ /redirector.html?$1 last;
   }
   error_page  404  /404.html;
   #error_page   403 500 502 503 504  /err/50x.html;
   #location = /50x.html {
   #       root   /var/www/nginx-default;
   #}
   location = /ikiwiki.cgi {
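        # nginx needs the unix: prefix for a socket path; the path must match the socket fcgiwrap listens on
        fastcgi_pass   unix:/var/run/fcgiwrap.socket;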
        fastcgi_index  ikiwiki.cgi;
        fastcgi_param  QUERY_STRING       $query_string;
        fastcgi_param  REQUEST_METHOD     $request_method;
        fastcgi_param  CONTENT_TYPE       $content_type;
        fastcgi_param  CONTENT_LENGTH     $content_length;
        fastcgi_param  SCRIPT_NAME        ../cgi-bin/ikiwiki.cgi;
        fastcgi_param  REQUEST_URI        $request_uri;
        fastcgi_param  DOCUMENT_URI       $document_uri;
        fastcgi_param  DOCUMENT_ROOT      /var/www/blog/docs/;
        fastcgi_param  SERVER_PROTOCOL    $server_protocol;
        fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
        fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;
        fastcgi_param  REMOTE_ADDR        $remote_addr;
        fastcgi_param  REMOTE_PORT        $remote_port;
        fastcgi_param  SERVER_ADDR        $server_addr;
        fastcgi_param  SERVER_PORT        $server_port;
        fastcgi_param  SERVER_NAME        $server_name;
   }
}

The SCRIPT_NAME must match what you pass to "cgi_wrapper" in the .setup file configuration.

Activate and restart nginx to get it to pick it up:

cd /etc/nginx/sites-enabled.d
ln -s ../sites-available.d/blog .
sudo /etc/init.d/nginx restart

I had to enable quite a few plugins, like "lockedit html date" which were very useful.

I added an /etc/inittab entry for spawn-fcgi:

iki:4:respawn:/bin/su ikiwiki /usr/bin/spawn-fcgi -s /tmp/fcgi.socket -n -- /usr/sbin/fcgiwrap

(And run 'init q' so that init re-reads /etc/inittab.)

Then, my blog started to work. I could clone my remote repo

git clone ikiwiki@myblog.com:myblog

I find to my delight that I now have a CGI form for adding posts (old school! :-)) and that I can also add them via a text edit and publish via:

git push

Compared to my old wiki scripts, this is much more slick!

I also found it all very difficult until I found the locally installed ikiwiki documentation and got familiar with it. Very useful!

Also useful was this post-commit hook:

#!/bin/sh

# if uncommitted changes, don't do anything yet
if [ -z "$(git ls-files -d -m -u --exclude-standard| tail -1)" ]
then
    branch=$(git symbolic-ref HEAD)
    branch=$(expr "$branch" : 'refs/heads/\(.*\)')
    [ -n "$branch" ] || exit 0;
    [ -n "$(git config --get "branch.$branch.merge")" ] || exit 0;
    printf "Pull from upstream: "
    git pull --rebase
    echo
fi

This meant that I was always making a linear history.

Posted in the wee hours of Wednesday night, February 24th, 2011

making Perl command-line scripts faster with pperl

So, you have a script which is slow, perhaps because you are using a whole collection of modern perl features, which aren't necessarily terribly fast yet. You can't wait for the runtime to implement the features natively and hence run quickly, but there is another solution.

For instance, the XML::SRS distribution on CPAN makes use of some fairly advanced features of Moose, such as meta-attribute meta-roles. These are a win from a coding and maintenance point of view, as they allow a single attribute declaration to give you a Perl class which has XML marshalling as well as type constraints. However it does have a high startup penalty.

How high? Let's try by, say, taking a script which takes a JSON document on input and passes that to a Moose constructor, then outputting the XML.

#!/usr/bin/perl
use XML::SRS;
use JSON::XS;
my $json = join "", <>;
print XML::SRS::Domain::Create->new(
    %{decode_json($json)}
)->to_xml(1);

Fairly simple, right? Now, let's convert the data structure from the SYNOPSIS on the man page to JSON, pass it in, and see how quickly it runs:

$ json=\
'{"domain_name":"kaihoro.co.nz","contact_registrant":{"email":\
"kaihoro.takeaways@gmail.com","name":"Lord Crumb","address":{\
"city":"Kaihoro","cc":"NZ","region":"Nelson","address1":\
"57 Mount Pleasant St","address2":"Burbia"},"phone":{"subscriber":\
"499 2267","ndc":"4","cc":"64"}},"delegate":1,"nameservers":[\
"ns1.registrar.net.nz","ns2.registrar.net.nz"],"action_id":\
"kaihoro.co.nz-create-1298944261","term":12}'
$ echo $json | time ./test.pl
<?xml version="1.0" encoding="ISO-8859-1"?>
<DomainCreate Delegate="1" ActionId="kaihoro.co.nz-create-1298944261" DomainName="kaihoro.co.nz" Term="12">
  <RegistrantContact Name="Lord Crumb" Email="kaihoro.takeaways@gmail.com">
    <PostalAddress Address2="Burbia" Address1="57 Mount Pleasant St" Province="Nelson" City="Kaihoro" CountryCode="NZ"/>
    <Phone LocalNumber="499 2267" AreaCode="4" CountryCode="64"/>
  </RegistrantContact>
  <NameServers>
    <Server FQDN="ns1.registrar.net.nz"/>
    <Server FQDN="ns2.registrar.net.nz"/>
  </NameServers>
</DomainCreate>
1.14user 0.03system 0:01.19elapsed 98%CPU (0avgtext+0avgdata 113152maxresident)k
0inputs+0outputs (0major+7174minor)pagefaults 0swaps

Ok, so the script ran in 1.14s that time. Not exactly a speed demon!

But if we change one line in the script:

#!/usr/bin/perl

to:

#!/usr/bin/pperl

Then we get a much different time the second time that the script is run:

$ echo $json | time ./test.pl
<?xml version="1.0" encoding="ISO-8859-1"?>
<DomainCreate Delegate="1" ActionId="kaihoro.co.nz-create-1298944261" DomainName="kaihoro.co.nz" Term="12">
  <RegistrantContact Name="Lord Crumb" Email="kaihoro.takeaways@gmail.com">
    <PostalAddress Address2="Burbia" Address1="57 Mount Pleasant St" Province="Nelson" City="Kaihoro" CountryCode="NZ"/>
    <Phone LocalNumber="499 2267" AreaCode="4" CountryCode="64"/>
  </RegistrantContact>
  <NameServers>
    <Server FQDN="ns1.registrar.net.nz"/>
    <Server FQDN="ns2.registrar.net.nz"/>
  </NameServers>
</DomainCreate>
0.00user 0.00system 0:00.17elapsed 2%CPU (0avgtext+0avgdata 4704maxresident)k
0inputs+0outputs (0major+376minor)pagefaults 0swaps
$

Great! Down to 170ms! That's much more like an acceptable start-up time :-). Knowing the code base I happen to know that there is a lot of lazy evaluation which is responsible for a lot of that 170ms, so this could probably be improved upon. But a >80% total improvement is a pretty big win for adding a single character to the script.

That's one of the reasons I like Perl. It might suck for a number of reasons, but hey most languages suck for some reason or the other, and at least with Perl there's already a bunch of solutions available, either on CPAN or (in this case) Debian/Ubuntu.

Posted in the wee hours of Sunday night, March 1st, 2010

Ah, remember the days?

Remember the days of squeezing TSRs into high memory?

If you want to access that 3GB of XMS, you'll have to page it in and out in 64K chunks.

On a freshly delivered HP Compaq system, without Windows. Clearly they couldn't ship it with no OS loaded at all, because that would be admitting that the person receiving it was pirating (or so the logic goes). So, they pre-loaded FreeDOS on it.

Posted at lunch time on Thursday, October 15th, 2009

Reprap update

A picture of my RepRap with the top frame in place and the three Stepper Motor Controller PCBs

Not a huge amount of progress... it's January and of course New Zealand basically shuts down this time of year. But here's another photo of the RepRap. It's now got a top frame and I've soldered together some of the boards. This is shaping up to be one of those "month of Sundays" projects...

Posted late Friday morning, January 30th, 2009

Reprap - early beginnings

A picture of Sam holding a beer standing by a well stocked set of fastenings and a RepRap in early stages of construction

Spent today following much of these instructions for building a RepRap - in short, it's a 3D printer printer. It can squirt out little lines of plastic and slowly make shapes, including parts for itself.

Some cut lengths of steel on a table

The steel rod just after being cut

Of course it doesn't print *absolutely* everything it needs - I had to buy a whole lot of steel rod, which I found from suppliers in Petone and Lower Hutt, and today I found various fastenings suppliers in Te Aro for the nuts and bolts. The aim is for it to be able to print all of the parts for making itself that aren't 'easily' found in your friendly neighbourhood hardware stores and engineering supply shops.

Currently there are still some parts which are specially made - principally the electronics; though you could make them out of stripboard if required. The Micro-Controller is a free design called Arduino that in principle anyone with silicon fabrication facilities could produce. It's quite possible that this machine, or the "child" reprap I build with it, will become one of the generation of 3D printers which is also able to drop molten metal and thereby print circuits, potentially getting rid of the need for the circuit boards. There's also one more head design which is a bit like a caulking gun, and can print anything which you can squidge in and out of a balloon, such as sealant, glue or even chocolate.

The three motors it uses are not exactly off-the-shelf items; they're a sort of "digital" motor called a stepper motor - they used to be common in dot matrix printers, though. They can be found, and Vik has managed to find a supplier who will stock and sell them for about $25 each.

Of course printers are not the only thing this can print, and there is a small collection of interesting gadgets to make with it appearing over at Thingiverse. In the short term, I hope to contribute to this project by assembling a parts list for this model, and helping Vik tidy up the assembly instructions. Longer term, who knows - producing designs for parts for third world development sure is tempting. But no doubt there will be a bit of room for making some "convenient solutions to modern living" along the way.

Posted late Sunday morning, November 16th, 2008

I have a dream! It's a dream about an editor...

To me, the perfect editor;

  1. would run under Parrot.
  2. would be a very simple reimplementation of emacs, in PIR - supporting: files, buffers, windows, frames, major and minor modes, and keyboard <-> function mapping
  3. would be extensible in any parrot-supported language
  4. would support syntax highlighting, by attaching highlighting hints to a TGE grammar, effectively allowing you to write a parsing grammar at the same time as a highlighting mode
  5. would have a keymapping that is identical to VI. Of course being extensible it is likely that people could contrib emacs-like keybindings (sick people - even I don't use those in emacs)

(in response to The Quest for the Perfect Editor)

Posted mid-morning Sunday, September 14th, 2008