<div dir="ltr"><div>Larry, thanks for this. I had read some things you've written about the weave before, but not with this level of detail. Sounds weird, but I didn't really appreciate the implications of the weave even though I'm the guy who thought it up. I did understand the importance of not copying data if you can reference it, which is a principle of database design (normal forms, etc).</div><div><br></div><div>In my paper, I can add a little more about the weave and its advantages. Aside from this TUHS post, is there something I can put in the References that people can find?</div><div><br></div><div>Question: Is this right, that TeamWare was literally layered on top of AT&T SCCS, but BitKeeper was layered on your implementation of SCCS? Or, was it more complicated than that?</div><div><br></div><div>Was your implementation of SCCS ever released by itself?</div><div><br></div><div>Marc</div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Fri, Dec 13, 2024 at 11:06 AM Larry McVoy <<a href="mailto:lm@mcvoy.com">lm@mcvoy.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Dec 13, 2024 at 09:52:28AM -0700, Marc Rochkind wrote:<br>
> IEEE Transactions on Software Engineering has asked me to write a<br>
> retrospective on the influence of SCCS over the last 50 years, as my SCCS<br>
> paper was published in 1975. They consider it one of the most influential<br>
> papers from TSE's first decade.<br>
> <br>
> There's a funny quote from Ken Thompson that circulates from time-to-time:<br>
> <br>
> "SCCS, the source motel! Programs check in and never check out!"<br>
> <br>
> But nobody seems to know what it means exactly. As part of my research, I<br>
> asked Ken what the quote meant, sunce I wanted to include it. He explained<br>
> that it refers to SCCS storing binary data in its repository file,<br>
> preventing UNIX text tools from operating on the file.<br>
> <br>
> Of course, this is only one of SCCS's many weaknesses. If you have anything<br>
> funny about any of the others, post it here. I already have all the boring<br>
> usual stuff (e.g., long-term locks, file-oriented, no merging).<br>
<br>
Warning, I know more about SCCS than the average person, I've<br>
reimplemented it from scratch and then built BitKeeper on top of an<br>
extended SCCS file format. So lots of info coming. Rick Smith and<br>
Wayne Scott know as much as I do, Rick knows more, he joined me and<br>
promptly started fixing my SCCS implementation. So far as I know, <br>
there is not a more knowledgable person that Rick when it comes to<br>
weave file formats.<br>
<br>
SCCS's strength is the weave format. It's largely not understood, even<br>
by other people working in source management. Here's the benefit of<br>
that weave (if people use it, which, other than BitKeeper, they don't,<br>
looking at you, Clearcase, you had a weave and didn't use it): SCCS can<br>
pass merge data by reference, everyone else copies the data.<br>
<br>
SCCS is a set based system. Each node has a revision number, like 1.5,<br>
but because SCCS, unlike RCS, limited the revisions to at most 4 fields,<br>
like 1.5.1.1, it is _impossible_ to build the history graph from the <br>
revisions, you can in simple graphs but as soon as you branch from a<br>
branch, all bets are off.<br>
<br>
The graph is built from what BitKeeper called serial numbers. Each node<br>
in the graph has at least 2 serials, one that names that node in the<br>
graph, and one that is the parent.<br>
<br>
So if I have a file with 5 revisions in straight line history, the<br>
internal stuff will look something like<br>
<br>
rev me parent<br>
1.5 5 4<br>
1.4 4 3<br>
1.3 3 2<br>
1.2 2 1<br>
1.1 1 0<br>
<br>
So what's the set? Pretty simple for straight line history, you walk<br>
the history from the rev that you want, adding the "me" serial and<br>
going to the parent, repeat until the parent is 0.<br>
<br>
Suppose you branch from rev 1.3.<br>
<br>
rev me parent<br>
1.3.1.1 6 3<br>
1.5 5 4<br>
1.4 4 3<br>
1.3 3 2<br>
...<br>
<br>
See that 1.3.1.1 is me: 6 and parent: 3. So if I were building the set<br>
for 1.3.1.1, it becomes 6, then go to parent 3, 2, 1, skipping over 5<br>
and 4. If you understand that, you are starting to understand the set<br>
and how it is constructed.<br>
<br>
Did you know you can have an argument in the revision history without<br>
adding anything to the data part? SCCS has the ability to include <br>
and/or exclude serials as part of a delta. Lets say Marc looked at<br>
my 1.5 and thought it was garbage. He can exclude it from the <br>
set like so:<br>
<br>
rev me parent include exclude<br>
1.6 7 5 0 5<br>
1.3.1.1 6 3<br>
1.5 5 4<br>
1.4 4 3<br>
1.3 3 2<br>
...<br>
<br>
That doesn't change the data part of the file AT ALL, it's just saying<br>
Marc doesn't want anyone to see the 1.5 changes.<br>
<br>
To understand that, you need to know how SCCS checks out a file. And<br>
you need to know how the data is stored. Which is in a weave. RCS,<br>
and pretty much everything that followed it, doesn't use a weave at<br>
all. RCS stores the most recent version of the file as a complete <br>
copy of the checked out file. Then each delta working backwards up<br>
the trunk is a patch, what diff produces.<br>
<br>
Think about what that means for working on a branch. You have to start<br>
with the most recent version of the file, apply backward patches to go<br>
to earlier versions all the way back to the branch point, then apply<br>
forward patches to work your way to tip of the branch. Ask Dave Miller<br>
how pleasant it is to work on gcc on a branch. It's crazy slow and<br>
painful.<br>
<br>
So how does SCCS do it? Lets say the first version of a file is<br>
<br>
1<br>
2<br>
3<br>
4<br>
5<br>
<br>
The data portion of the history file will look like:<br>
<br>
^AI 1<br>
1<br>
2<br>
3<br>
4<br>
5<br>
^AE 1<br>
<br>
SCCS used ^A at the beginning of a line to mean "this is metadata for<br>
SCCS". ^AI is an insert, ^AD is a delete, and insert/delete are paired<br>
with a ^AE which means end. The number after is the serial. So that<br>
weave says "If serial 1 is in your set, everything after ^AI 1 is part<br>
of that set until you hit the matching ^AE 1.<br>
<br>
Lets say the 2nd version is<br>
<br>
1<br>
2<br>
serial 2 added this<br>
3<br>
4<br>
<br>
Notice that serial 2 deleted what was line 5.<br>
<br>
^AI 1<br>
1<br>
2<br>
^AI 2<br>
serial 2 added this<br>
^AE 2<br>
3<br>
4<br>
^AD 2<br>
5<br>
^AE 2<br>
^AE 1<br>
<br>
So now we can start to see how you walk the weave. If I'm trying to<br>
check out 1.1 aka serial 1, I build a set that has only '1' in the set.<br>
I hit the ^AI 1 see that I have 1 in my set, so I'm now in print mode,<br>
which means print each data line. I hit ^AI 2, that's not in my set,<br>
so I'm now in skip mode. And I skip the stuff inserted by serial 2.<br>
I see the ^AE 2 and I revert back to print mode. I get to ^AD 2,<br>
2 is NOT in my set, so I stay in print mode. Etc.<br>
<br>
Let's make a branch, 1.1.1.1, with lots of data.<br>
<br>
1<br>
2<br>
3<br>
branch line 1<br>
branch line 2<br>
...<br>
branch line 10000<br>
4<br>
5<br>
<br>
^AI 1<br>
1<br>
2<br>
^AI 2<br>
serial 2 added this<br>
^AE 2<br>
3<br>
^AI 3<br>
branch line 1<br>
branch line 2<br>
...<br>
branch line 10000<br>
^AE 3<br>
4<br>
^AD 2<br>
5<br>
^AE 2<br>
^AE 1<br>
<br>
So if I checked out 1.1.1.1, the set is 1, 3, I walk the weave and I'll<br>
print anything inserted by either of those, delete anything deleted<br>
by those, skip anything inserted by anything not in the set, skip any<br>
deletes by anything not in the set.<br>
<br>
The delta table looks like this, notice I've added an author column:<br>
<br>
rev me parent include exclude author<br>
1.1.1.1 3 1 lm<br>
1.2 2 1 lm<br>
1.1 1 0 lm<br>
<br>
If you followed all that, you can see how SCCS can merge by reference.<br>
Lets say Clem decides to merge my branch onto the trunk. The delta table<br>
will get a new entry:<br>
<br>
rev me parent include exclude author<br>
1.3 4 2 3 clem<br>
1.1.1.1 3 1 lm<br>
1.2 2 1 lm<br>
1.1 1 0 lm<br>
<br>
The weave DOES NOT CHANGE. That's the pass by reference. You do the 3 way<br>
merge, it will find the lines "3" and "5" as anchor points in both versions,<br>
so it is a simple insert with no new data added to the weave.<br>
<br>
Here's some magic that *everyone* else gets wrong when they pass by value:<br>
In a system that passes by value (copies) the data, the merge done by clem<br>
would have an annotated listing like so:<br>
<br>
lm 1<br>
lm 2<br>
lm 3<br>
clem branch line 1<br>
clem branch line 2<br>
clem ...<br>
clem branch line 10000<br>
lm 4<br>
lm 5<br>
<br>
Since it copied the data, it looks like Clem wrote it but he didn't, he<br>
just automerged it. In SCCS/BitKeeper it would look like:<br>
<br>
lm 1<br>
lm 2<br>
lm 3<br>
lm branch line 1<br>
lm branch line 2<br>
lm ...<br>
lm branch line 10000<br>
lm 4<br>
lm 5<br>
<br>
which is correct, all of those lines were authored by one person. The only<br>
time the merger should show up as an author is if there was a conflict, <br>
however the merger resolved that conflict is new work and should be<br>
authored by the merger.<br>
<br>
What BitKeeper did, that was a profound step forward, was make the idea<br>
of a repository a formal thing and introduced the concept of changesets<br>
that keeps track of all this stuff at the repository level. So it does<br>
all this stuff at the file level but you don't have to do that low level<br>
work. You could think of SCCS as assembly and BitKeeper as more like C,<br>
it upleveled things to the point that humans can manage the repository<br>
easily.<br>
<br>
Whew. That's a butt load of info. Perhaps better for COFF? Any<br>
questions? It should be obvious that I *love* SCCS, it's a dramatically<br>
better file format than a patch based one, you can get *any* version of<br>
the file in constant time, authorship can be preserved across versions,<br>
it's pretty brilliant and I consider myself blessed to be posting this<br>
in response to SCCS's creator. Hats off to Marc. And big boo, hiss,<br>
to the RCS guy, who got a PhD for RCS (give me a break) and did the<br>
world a huge disservice by bad mouthing SCCS so he could promote RCS.<br>
<br>
--lm<br>
</blockquote></div><div><br clear="all"></div><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><i>My new email address is <a href="mailto:mrochkind@gmail.com" target="_blank">mrochkind@gmail.com</a></i></div></div></div></div>