<?xml version="1.0"?>
<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcterms="http://purl.org/dc/terms/" >
<channel>
<title>pages tagged gittorrent</title>
<link>http://sam.vilain.net//tags/gittorrent.html</link>
<description>samv.blog</description>
<item>

	<title>past synthesis</title>


	<guid isPermaLink="false">http://sam.vilain.net//comp/git/gittorrent/past_synthesis.html</guid>

	<link>http://sam.vilain.net//comp/git/gittorrent/past_synthesis.html</link>


	<category>git</category>

	<category>gittorrent</category>


	<pubDate>Sun, 06 Mar 2011 19:18:00 +0000</pubDate>
	<dcterms:modified>2011-03-13T16:00:31Z</dcterms:modified>

	<description>&lt;h1&gt;GitTorrent: a synthesis of past efforts&lt;/h1&gt;

&lt;p&gt;If you have read &lt;a href=&quot;http://git.661346.n2.nabble.com/Re-Resumable-clone-Gittorrent-again-stable-packs-tp5894379p5908685.html&quot;&gt;this list post&lt;/a&gt; (&lt;a href=&quot;http://thread.gmane.org/gmane.comp.version-control.git/164569/focus=164897&quot;&gt;gmane archive&lt;/a&gt;), then you will probably not see much new here.  I include it as a backdrop for the subsequent articles.&lt;/p&gt;

&lt;h2&gt;GitTorrent concept: torrent the pack files&lt;/h2&gt;

&lt;p&gt;The idea of applying the straight BitTorrent protocol to the pack
files was the starting point for GitTorrent.  However, this turns out
not to be useful, as the pack files are not deterministic.  It is
only under a very strict and precarious set of circumstances that any two
nodes computing a pack for a given set of git objects will produce the
same binary content.  A fluke, if you will.&lt;/p&gt;

&lt;p&gt;Therefore, the concept seemed to add little over simply using unmodified
BitTorrent to distribute a pack file or a git bundle; for
instance, no peer could participate in the swarm - even with a
complete clone of the repository - without downloading the exact pack
file that the repository was serving.&lt;/p&gt;

&lt;p&gt;So, over the period of several months, Jonas and I revised the RFC
principally to express it in terms of stable object manifests, with
the goal that nodes could participate with .  You can get a flavour
for the exchange by glancing at &lt;a href=&quot;https://github.com/samv/gittorrent/commits/master?page=3&quot;&gt;the RFC source
history&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://utsl.gen.nz/gittorrent/rfc.html&quot;&gt;resultant RFC&lt;/a&gt; invents
terms such as &quot;Commit Reel&quot;, defined by a sorting algorithm for
objects, similar to the order returned by:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git rev-list --date-order --objects
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The above ordering is for all intents and purposes stable, with only a
very minor edge case where no strict order exists.&lt;/p&gt;

&lt;h2&gt;GitTorrent Summer of Code project&lt;/h2&gt;

&lt;p&gt;There is &lt;a href=&quot;http://github.com/samv/VCS-Git-Torrent&quot;&gt;prototype code&lt;/a&gt; from
a 2008 Google Summer of Code project.  While this project was not
considered successful, some key concepts can be demonstrated with it
and so I will make that the starting point of the next post in this
series, and use it to illustrate the design of the protocol.&lt;/p&gt;

&lt;p&gt;One of the practical discoveries was that the code base could not
quickly generate the object indexes required for efficiently answering
GitTorrent messages.&lt;/p&gt;

&lt;h2&gt;Related project: git rev-cache&lt;/h2&gt;

&lt;p&gt;This project was aimed at being a generic cache for git revision tree
walking.  The idea is that while git&#39;s &lt;a href=&quot;http://en.wikipedia.org/wiki/Graph_coloring&quot;&gt;graph
colouring&lt;/a&gt; algorithm is
fast enough for the operations that matter most to a user, giving
good interactive performance, it is not sufficient for a gittorrent
server, or even for the &#39;initial git clone&#39; case:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Computing the results involves a huge amount of &lt;em&gt;pointer chasing&lt;/em&gt; that requires that the cache be &lt;em&gt;hot&lt;/em&gt;.  If the cache is not hot, such as on a busy server, it can take &lt;em&gt;minutes&lt;/em&gt; just to calculate the amount of work to do.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you want to take a large number of objects and retrieve a particular sub-section of them, then you still have to do all of the above work.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;So, the revision cache helps by keeping just the information necessary for graph traversal in a binary, sequential file, so that it can be retrieved quickly and computed on quickly, too.  I will dedicate at least one post to this project, where I will try to merge it with the latest git and show it in action.&lt;/p&gt;

&lt;h2&gt;GitTorrent distilled: mirror-sync&lt;/h2&gt;

&lt;p&gt;One of the challenges with GitTorrent was the amount of infrastructure
that was required just to get to the point where the core algorithms
could be designed.  Using Perl meant there were already off-the-shelf
packages available for things like Bencoding, but it was still
quite a drag.&lt;/p&gt;

&lt;p&gt;After some reflection on this, and from having read the BitTorrent
protocol, I decided that the BitTorrent protocol itself is all cruft
and that trying to cut it down to be useful was a waste of time.&lt;/p&gt;

&lt;p&gt;The idea of &quot;automatic mirroring&quot; came from this.  With Automatic
Mirroring, the two main functions of P2P operation - peer discovery
and partial transfer - are broken into discrete features.&lt;/p&gt;

&lt;p&gt;I presented this idea at &lt;a href=&quot;https://git.wiki.kernel.org/index.php/GitTogether&quot;&gt;GitTogether&lt;/a&gt; 2009, and produced &lt;a href=&quot;http://thread.gmane.org/gmane.comp.version-control.git/133626/focus=133628&quot;&gt;a patch series&lt;/a&gt; called &quot;client-side mirroring&quot; that was intended as an effort towards this goal.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;http://code.google.com/p/gittorrent/wiki/MirrorSync&quot;&gt;design of
Mirror-Sync&lt;/a&gt; is
simple enough to be expressed on a single page, making it a vast
improvement over GitTorrent already.  Additionally, it would fit
within the existing git protocol, allowing existing git servers to
smoothly gain the benefits of peer-to-peer technology.&lt;/p&gt;

&lt;p&gt;If you want to follow this series, you can subscribe to &lt;a href=&quot;http://sam.vilain.net/tags/gittorrent.html&quot;&gt;the
gittorrent tag&lt;/a&gt;, &lt;a href=&quot;http://sam.vilain.net/comp/git.html&quot;&gt;my git
section&lt;/a&gt;, &lt;a href=&quot;http://sam.vilain.net/comp.html&quot;&gt;my comp section&lt;/a&gt; or even &lt;a href=&quot;http://sam.vilain.net/blog.html&quot;&gt;my
entire blog&lt;/a&gt;.&lt;/p&gt;
</description>


	<comments>/comp/git/gittorrent/past_synthesis.html#comments</comments>

</item>

</channel>
</rss>
