[TUHS] another conversion of the CSRG BSD SCCS archives to Git

Greg A. Woods woods at robohack.ca
Sun Dec 1 11:25:22 AEST 2019


At Fri, 29 Nov 2019 22:52:58 +0100, Steffen Nurpmeso <steffen at sdaoden.eu> wrote:
Subject: Re: [TUHS] another conversion of the CSRG BSD SCCS archives to Git
>
> Greg A. Woods wrote in <m1iVoBV-0036tPC at more.local>:
>  |I've been fixing and enhancing James Youngman's git-sccsimport to use
>  |with some of my SCCS archives, and I thought it might be the ultimate
>  |stress test of it to convert the CSRG BSD SCCS archives.
>  |
>  |The conversion takes about an hour to run on my old-ish Dell server.
>  |
>  |This conversion is unlike others -- there is some mechanical compression
>  |of related deltas into a single Git commit.
>  |
>  |https://github.com/robohack/ucb-csrg-bsd
>
> Thanks for taking the time to produce a CSRG repo that seems to
> mimic changesets as they really happened.  As i never made it
> there on my own, i have switched to yours some weeks ago.  (Mind
> you, after doing "gc --aggressive --prune=all" the repository size
> has more than halved, it was the final reason to prepare new
> repositories on a vhost with good internet connection before
> getting this through my flaky wifi here.  Storage and internet
> bandwidth and their cost really do not seem to bother anyone
> anymore.  I have no offense in mind, i only recognized it (the
> hard way).)

Ah!  I did indeed forget the "git gc" step that many conversion guides
recommend.  I might change the import script to do that automatically,
particularly if it has also initialised the repository in the same run.

Apparently github themselves run it regularly:

	https://stackoverflow.com/a/56020315/816536

Probably they do this by configuring "gc.auto" in each repository,
though I've not found any reference to what they might configure it to.

However, it seems that without the "--aggressive" option nothing will be
done in this repository.  With it, though, I go from 316M down to just 71M.
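
A rough way to reproduce that before/after measurement is sketched
below.  It is only an illustration, not part of git-sccsimport: the
helper names are made up here, it assumes a non-bare clone with its
".git" directory in the usual place, and it simply reuses the options
from the command quoted above.

```python
import os
import subprocess


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def gc_command(aggressive=True):
    """Build the git gc command line (hypothetical helper)."""
    cmd = ["git", "gc"]
    if aggressive:
        # Same options as in the quoted message.
        cmd += ["--aggressive", "--prune=all"]
    return cmd


def repack_and_report(repo):
    """Run git gc in repo; return (.git size before, .git size after)."""
    git_dir = os.path.join(repo, ".git")  # assumes a non-bare repository
    before = dir_size_bytes(git_dir)
    subprocess.run(gc_command(), cwd=repo, check=True)
    after = dir_size_bytes(git_dir)
    return before, after
```

Calling repack_and_report() on a scratch clone gives the two sizes in
bytes; the numbers will of course vary with the Git version and the
pack settings in use.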

I don't see any way to force/tell/ask github to run "git gc --aggressive".

Perhaps I can just delete it from github and immediately re-create it
with the re-packed repository, and in theory all the hashes should stay
the same and any existing clones should be unaffected.  What do you think?
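
The reasoning behind "the hashes should stay the same" can be sketched
as follows.  This is a simplified illustration, assuming the default
SHA-1 object format; the function names are invented for the example.
An object's ID is computed from its uncompressed content plus a small
header, so changing how the object is compressed or packed on disk does
not change the ID.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob: SHA-1 over 'blob <size>\\0' plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def stored_form(content: bytes, level: int) -> bytes:
    """A loose-object-style encoding: header plus content, zlib-compressed."""
    return zlib.compress(b"blob %d\x00" % len(content) + content, level)


data = b"4.4BSD-Lite\n"
fast = stored_form(data, 1)    # quick, larger on disk
small = stored_form(data, 9)   # slower, smaller on disk

# Different storage effort, identical logical object, hence identical ID.
assert zlib.decompress(fast) == zlib.decompress(small)
print(git_blob_id(data))
```

Commits and trees are hashed the same way over their own content, which
refers to other objects by ID, so a repack that only changes
compression and delta choices leaves every commit hash as it was.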

Note I have some thoughts of re-doing the whole conversion anyway, with
more ideas on dealing with "removed" files (SCCS files renamed to the
likes of "S.foo") and also including the many files that were never
checked into SCCS, perhaps even on a per-release basis, thus being able
to create release tags that can be checked out to match the actual
releases on the CDs.  But this will not happen quite so soon.

--
					Greg A. Woods <gwoods at acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods at robohack.ca>
Planix, Inc. <woods at planix.com>     Avoncote Farms <woods at avoncote.ca>

