[TUHS] SCCS roach motel
Clem Cole
clemc at ccc.com
Sat Dec 14 07:46:58 AEST 2024

@Marc and Larry
As a satisfied user of SCCS (and later BitKeeper), it's still my preferred
choice. To this day, I have to look up how to do simple things in Git that
were natural for me in SCCS. I don't think it's old-dog syndrome, either.
SCCS was hardly perfect, but it solved a problem very well. Eric's sccs(1)
front end for it from UCB cleaned up a few of the rough edges, and
experience taught us a little about its care and feeding.
Truth is, I still use it for small projects. It's easier to set up, and it
just protects me from myself.
As a side note, it also exposed my real dislike for NFS early on, when we
started to see ZERO-filled blocks in SCCS files (stateless just sucks).
So thank you both. I have no idea how many times you saved my team and me
time and bailed us out.
On Fri, Dec 13, 2024 at 1:33 PM Marc Rochkind <mrochkind at gmail.com> wrote:
> Larry, thanks for this. I had read some things you've written about the
> weave before, but not with this level of detail. Sounds weird, but I didn't
> really appreciate the implications of the weave even though I'm the guy who
> thought it up. I did understand the importance of not copying data if you
> can reference it, which is a principle of database design (normal forms,
> etc).
>
> In my paper, I can add a little more about the weave and its advantages.
> Aside from this TUHS post, is there something I can put in the References
> that people can find?
>
> Question: Is this right, that TeamWare was literally layered on top of
> AT&T SCCS, but BitKeeper was layered on your implementation of SCCS? Or,
> was it more complicated than that?
>
> Was your implementation of SCCS ever released by itself?
>
> Marc
>
> On Fri, Dec 13, 2024 at 11:06 AM Larry McVoy <lm at mcvoy.com> wrote:
>
>> On Fri, Dec 13, 2024 at 09:52:28AM -0700, Marc Rochkind wrote:
>> > IEEE Transactions on Software Engineering has asked me to write a
>> > retrospective on the influence of SCCS over the last 50 years, as my
>> > SCCS paper was published in 1975. They consider it one of the most
>> > influential papers from TSE's first decade.
>> >
>> > There's a funny quote from Ken Thompson that circulates from
>> > time to time:
>> >
>> > "SCCS, the source motel! Programs check in and never check out!"
>> >
>> > But nobody seems to know what it means exactly. As part of my research,
>> > I asked Ken what the quote meant, since I wanted to include it. He
>> > explained that it refers to SCCS storing binary data in its repository
>> > file, preventing UNIX text tools from operating on the file.
>> >
>> > Of course, this is only one of SCCS's many weaknesses. If you have
>> > anything funny about any of the others, post it here. I already have
>> > all the boring usual stuff (e.g., long-term locks, file-oriented, no
>> > merging).
>>
>> Warning: I know more about SCCS than the average person.  I've
>> reimplemented it from scratch and then built BitKeeper on top of an
>> extended SCCS file format, so lots of info is coming.  Rick Smith and
>> Wayne Scott know as much as I do; Rick knows more.  He joined me and
>> promptly started fixing my SCCS implementation.  So far as I know,
>> there is no one more knowledgeable than Rick when it comes to
>> weave file formats.
>>
>> SCCS's strength is the weave format.  It's not widely understood, even
>> by other people working in source management.  Here's the benefit of
>> the weave (if people use it, which, other than BitKeeper, they don't;
>> looking at you, ClearCase, you had a weave and didn't use it): SCCS can
>> pass merge data by reference, while everyone else copies the data.
>>
>> SCCS is a set-based system.  Each node has a revision number, like 1.5,
>> but because SCCS, unlike RCS, limits revisions to at most 4 fields,
>> like 1.5.1.1, it is _impossible_ to build the history graph from the
>> revision numbers alone.  You can in simple graphs, but as soon as you
>> branch from a branch, all bets are off.
>>
>> The graph is built from what BitKeeper called serial numbers.  Each node
>> in the graph has at least 2 serials: one that names that node in the
>> graph, and one that names its parent.
>>
>> So if I have a file with 5 revisions in straight-line history, the
>> internal data will look something like:
>>
>> rev     me      parent
>> 1.5     5       4
>> 1.4     4       3
>> 1.3     3       2
>> 1.2     2       1
>> 1.1     1       0
>>
>> So what's the set?  Pretty simple for straight-line history: you walk
>> the history from the rev that you want, adding the "me" serial and
>> going to the parent, and repeat until the parent is 0.
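A minimal Python sketch of that walk, assuming the delta table is reduced to a dict that maps each "me" serial to its parent serial. The names parents and build_set are made up for illustration; they are not taken from SCCS or BitKeeper.

```python
def build_set(parents: dict[int, int], serial: int) -> set[int]:
    """Collect the serials visible from `serial` by following parent links.

    `parents` maps each delta's own serial ("me") to its parent serial;
    a parent of 0 marks the root of the history.
    """
    result = set()
    while serial != 0:
        result.add(serial)
        serial = parents[serial]
    return result


# Straight-line history 1.1 .. 1.5 from the table above: me -> parent
parents = {5: 4, 4: 3, 3: 2, 2: 1, 1: 0}
print(sorted(build_set(parents, 5)))  # [1, 2, 3, 4, 5]  (rev 1.5)
print(sorted(build_set(parents, 3)))  # [1, 2, 3]        (rev 1.3)
```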
>>
>> Suppose you branch from rev 1.3.
>>
>> rev     me      parent
>> 1.3.1.1 6       3
>> 1.5     5       4
>> 1.4     4       3
>> 1.3     3       2
>> ...
>>
>> See that 1.3.1.1 is me: 6 and parent: 3.  So if I were building the set
>> for 1.3.1.1, it starts with 6, then goes to parent 3, then 2, then 1,
>> skipping over 5 and 4.  If you understand that, you are starting to
>> understand the set and how it is constructed.
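The same walk handles the branch without any special casing. In this sketch the hypothetical DELTA_TABLE holds (rev, me, parent) rows copied from the table above, and the made-up serial_set helper uses the revision string only to find the starting row; everything after that follows the me/parent columns.

```python
# Delta table rows from the example above: (rev, me, parent)
DELTA_TABLE = [
    ("1.3.1.1", 6, 3),
    ("1.5", 5, 4),
    ("1.4", 4, 3),
    ("1.3", 3, 2),
    ("1.2", 2, 1),
    ("1.1", 1, 0),
]


def serial_set(rows, rev):
    """Return the set of serials visible in `rev`.

    Only the me/parent columns drive the walk; the revision string is just
    a label used to find where to start.
    """
    parent_of = {me: parent for _, me, parent in rows}
    serial = next(me for r, me, _ in rows if r == rev)
    visible = set()
    while serial != 0:
        visible.add(serial)
        serial = parent_of[serial]
    return visible


print(sorted(serial_set(DELTA_TABLE, "1.3.1.1")))  # [1, 2, 3, 6]  (5 and 4 skipped)
print(sorted(serial_set(DELTA_TABLE, "1.5")))      # [1, 2, 3, 4, 5]
```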
>>
>> Did you know you can have an argument in the revision history without
>> adding anything to the data part?  SCCS has the ability to include
>> and/or exclude serials as part of a delta.  Let's say Marc looked at
>> my 1.5 and thought it was garbage.  He can exclude it from the
>> set like so:
>>
>> rev     me      parent  include exclude
>> 1.6     7       5       0       5
>> 1.3.1.1 6       3
>> 1.5     5       4
>> 1.4     4       3
>> 1.3     3       2
>> ...
>>
>> That doesn't change the data part of the file AT ALL; it's just saying
>> Marc doesn't want anyone to see the 1.5 changes.
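A sketch of how an exclude shrinks the set, under a stated assumption: the include/exclude lists of deltas in the set are applied while walking serials from newest to oldest, and the first decision made about a serial wins. That precedence rule is a guess chosen to reproduce the example, not a description of how AT&T SCCS or BitKeeper actually resolve conflicting lists. Delta, TABLE, and build_set are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    parent: int
    include: list[int] = field(default_factory=list)
    exclude: list[int] = field(default_factory=list)


# The delta table above, keyed by "me" serial
TABLE = {
    7: Delta(parent=5, exclude=[5]),  # 1.6: Marc excludes 1.5
    6: Delta(parent=3),               # 1.3.1.1
    5: Delta(parent=4),               # 1.5
    4: Delta(parent=3),               # 1.4
    3: Delta(parent=2),               # 1.3
    2: Delta(parent=1),               # 1.2
    1: Delta(parent=0),               # 1.1
}


def build_set(table: dict[int, Delta], tip: int) -> set[int]:
    # Plain ancestry first: follow parent links from the tip.
    ancestry = set()
    serial = tip
    while serial != 0:
        ancestry.add(serial)
        serial = table[serial].parent
    # Assumed precedence: newest serial first, first decision about a serial wins.
    decided: dict[int, bool] = {}
    for serial in sorted(table, reverse=True):
        if serial in ancestry:
            decided.setdefault(serial, True)
        if decided.get(serial):  # only deltas that are in the set apply their lists
            for s in table[serial].include:
                decided.setdefault(s, True)
            for s in table[serial].exclude:
                decided.setdefault(s, False)
    return {s for s, keep in decided.items() if keep}


print(sorted(build_set(TABLE, 7)))  # [1, 2, 3, 4, 7]  1.6 without the 1.5 changes
print(sorted(build_set(TABLE, 5)))  # [1, 2, 3, 4, 5]  1.5 itself is untouched
```

Nothing in the data part is consulted here; the exclude only changes which serials end up in the set.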
>>
>> To understand that, you need to know how SCCS checks out a file, and
>> how the data is stored, which is in a weave.  RCS, and pretty much
>> everything that followed it, doesn't use a weave at all.  RCS stores
>> the most recent version of the file as a complete copy of the
>> checked-out file.  Each earlier delta on the trunk is then stored as a
>> reverse patch, the kind of thing diff produces.
>>
>> Think about what that means for working on a branch.  You have to start
>> with the most recent version of the file, apply backward patches to go
>> to earlier versions all the way back to the branch point, then apply
>> forward patches to work your way to the tip of the branch.  Ask Dave
>> Miller how pleasant it is to work on gcc on a branch.  It's crazy slow
>> and painful.
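To put a number on it, here is a toy model of RCS storage. Assumptions: the trunk is numbered 1.1 through 1.N with only 1.N stored whole, and there is a single branch numbered 1.B.1.x off 1.B. The hypothetical rcs_checkout_path lists the revisions you pass through; each step after the first is one patch application.

```python
def rcs_checkout_path(trunk_tip: int, branch_point: int, branch_depth: int) -> list[str]:
    """Revisions visited to reach 1.<branch_point>.1.<branch_depth> in a toy RCS model."""
    # Backward: start from the full text of 1.<trunk_tip>, reverse-patch down to the branch point.
    path = [f"1.{n}" for n in range(trunk_tip, branch_point - 1, -1)]
    # Forward: apply branch deltas 1.<bp>.1.1, 1.<bp>.1.2, ... up to the requested one.
    path += [f"1.{branch_point}.1.{n}" for n in range(1, branch_depth + 1)]
    return path


path = rcs_checkout_path(5, 3, 2)
print(path)           # ['1.5', '1.4', '1.3', '1.3.1.1', '1.3.1.2']
print(len(path) - 1)  # 4 patch applications
print(len(rcs_checkout_path(1000, 10, 3)) - 1)  # 993
```

In this model the cost grows with how far the branch point is from the trunk tip, which is the gcc-on-a-branch pain described above.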
>>
>> So how does SCCS do it?  Let's say the first version of a file is:
>>
>> 1
>> 2
>> 3
>> 4
>> 5
>>
>> The data portion of the history file will look like:
>>
>> ^AI 1
>> 1
>> 2
>> 3
>> 4
>> 5
>> ^AE 1
>>
>> SCCS used ^A (Ctrl-A, the ASCII SOH byte) at the beginning of a line to
>> mean "this is metadata for SCCS".  ^AI is an insert, ^AD is a delete,
>> and each insert or delete is closed by a matching ^AE, which means end.
>> The number after each is the serial.  So that weave says "If serial 1
>> is in your set, everything after ^AI 1 is part of that set until you
>> hit the matching ^AE 1."
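In the file itself, ^A is the single byte 0x01. A small sketch of recognizing the weave's control lines and checking that every ^AI/^AD block is closed by an ^AE with the same serial. parse_control and blocks_balanced are illustrative names; the check deliberately does not assume the blocks nest like parentheses, only that each one gets closed, and it ignores every other kind of ^A line.

```python
CTRL = "\x01"  # the ^A (SOH) byte that starts an SCCS control line


def parse_control(line: str):
    """Return (kind, serial) for a ^AI/^AD/^AE line, or None for a data line."""
    if len(line) > 2 and line[0] == CTRL and line[1] in "IDE":
        return line[1], int(line[2:])
    return None


def blocks_balanced(weave: list[str]) -> bool:
    """Check that every ^AI/^AD block is closed by a ^AE with the same serial."""
    open_serials = set()
    for line in weave:
        parsed = parse_control(line)
        if parsed is None:
            continue
        kind, serial = parsed
        if kind == "E":
            if serial not in open_serials:
                return False  # ^AE for a block that was never opened
            open_serials.remove(serial)
        else:
            open_serials.add(serial)
    return not open_serials  # anything still open was never closed


first = [CTRL + "I 1", "1", "2", "3", "4", "5", CTRL + "E 1"]
print(parse_control(first[0]))      # ('I', 1)
print(parse_control(first[1]))      # None
print(blocks_balanced(first))       # True
print(blocks_balanced(first[:-1]))  # False (missing ^AE 1)
```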
>>
>> Let's say the 2nd version is:
>>
>> 1
>> 2
>> serial 2 added this
>> 3
>> 4
>>
>> Notice that serial 2 deleted what was line 5.  The weave now looks like:
>>
>> ^AI 1
>> 1
>> 2
>> ^AI 2
>> serial 2 added this
>> ^AE 2
>> 3
>> 4
>> ^AD 2
>> 5
>> ^AE 2
>> ^AE 1
>>
>> So now we can start to see how you walk the weave.  If I'm trying to
>> check out 1.1, aka serial 1, I build a set that has only 1 in it.
>> I hit ^AI 1, see that 1 is in my set, and so I'm now in print mode,
>> which means print each data line.  I hit ^AI 2; that's not in my set,
>> so I'm now in skip mode, and I skip the stuff inserted by serial 2.
>> I see the ^AE 2 and go back to print mode.  I get to ^AD 2; 2 is NOT
>> in my set, so the delete doesn't apply and I stay in print mode.  Etc.
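The walk just described can be sketched in a few lines of Python. This is a simplification, not the real SCCS get: it keeps the currently open blocks in a dict and prints a data line only if every enclosing ^AI belongs to a serial in the set and no enclosing ^AD does. ^A is spelled as the byte 0x01, and checkout and weave are illustrative names.

```python
CTRL = "\x01"  # the ^A byte that starts a control line


def checkout(weave: list[str], serials: set[int]) -> list[str]:
    """Return the data lines visible for `serials` (simplified print rule)."""
    open_blocks: dict[int, str] = {}  # serial -> "I" or "D" for blocks not yet closed
    out = []
    for line in weave:
        if line.startswith(CTRL):
            kind, serial = line[1], int(line[2:])
            if kind == "E":
                del open_blocks[serial]  # ^AE n closes serial n's block
            else:
                open_blocks[serial] = kind
            continue
        # Print mode: every enclosing insert is in the set...
        inserted = all(s in serials for s, k in open_blocks.items() if k == "I")
        # ...and no enclosing delete is in the set (deletes by others are ignored).
        deleted = any(s in serials for s, k in open_blocks.items() if k == "D")
        if inserted and not deleted:
            out.append(line)
    return out


# The weave for the second version, with ^A spelled as CTRL
weave = [
    CTRL + "I 1", "1", "2",
    CTRL + "I 2", "serial 2 added this", CTRL + "E 2",
    "3", "4",
    CTRL + "D 2", "5", CTRL + "E 2",
    CTRL + "E 1",
]

print(checkout(weave, {1}))     # ['1', '2', '3', '4', '5']  (1.1)
print(checkout(weave, {1, 2}))  # ['1', '2', 'serial 2 added this', '3', '4']  (1.2)
```

Both versions come out of one pass over the same weave; only the serial set differs.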
>>
>> Let's make a branch, 1.1.1.1, with lots of data.
>>
>> 1
>> 2
>> 3
>> branch line 1
>> branch line 2
>> ...
>> branch line 10000
>> 4
>> 5
>>
>> ^AI 1
>> 1
>> 2
>> ^AI 2
>> serial 2 added this
>> ^AE 2
>> 3
>> ^AI 3
>> branch line 1
>> branch line 2
>> ...
>> branch line 10000
>> ^AE 3
>> 4
>> ^AD 2
>> 5
>> ^AE 2
>> ^AE 1
>>
>> So if I checked out 1.1.1.1, the set is {1, 3}.  I walk the weave and
>> print anything inserted by either of those, drop anything deleted by
>> either of those, skip anything inserted by a serial not in the set, and
>> ignore any deletes by a serial not in the set.
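The same rule, written more compactly: for each open block, an ^AI needs its serial in the set and an ^AD needs its serial out of the set. This sketch (illustrative names again, with the 10000 branch lines generated rather than typed) checks out 1.1.1.1 with the set {1, 3}.

```python
CTRL = "\x01"  # ^A


def checkout(weave, serials):
    """Yield visible lines: an open ^AI block needs its serial in `serials`,
    an open ^AD block needs its serial outside `serials`."""
    open_blocks = {}  # serial -> "I" or "D"
    for line in weave:
        if line.startswith(CTRL):
            kind, serial = line[1], int(line[2:])
            if kind == "E":
                open_blocks.pop(serial)
            else:
                open_blocks[serial] = kind
        elif all((s in serials) == (k == "I") for s, k in open_blocks.items()):
            yield line


branch = [f"branch line {n}" for n in range(1, 10001)]
weave = ([CTRL + "I 1", "1", "2", CTRL + "I 2", "serial 2 added this", CTRL + "E 2",
          "3", CTRL + "I 3"] + branch +
         [CTRL + "E 3", "4", CTRL + "D 2", "5", CTRL + "E 2", CTRL + "E 1"])

lines = list(checkout(weave, {1, 3}))  # rev 1.1.1.1
print(lines[:4])   # ['1', '2', '3', 'branch line 1']
print(lines[-3:])  # ['branch line 10000', '4', '5']
print(len(lines))  # 10005
```

Serial 2's insert is skipped and its delete of "5" is ignored, so the branch still sees the original tail of the file.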
>>
>> The delta table looks like this (notice I've added an author column):
>>
>> rev     me      parent  include exclude author
>> 1.1.1.1 3       1                       lm
>> 1.2     2       1                       lm
>> 1.1     1       0                       lm
>>
>> If you followed all that, you can see how SCCS can merge by reference.
>> Let's say Clem decides to merge my branch onto the trunk.  The delta
>> table will get a new entry:
>>
>> rev     me      parent  include exclude author
>> 1.3     4       2       3               clem
>> 1.1.1.1 3       1                       lm
>> 1.2     2       1                       lm
>> 1.1     1       0                       lm
>>
>> The weave DOES NOT CHANGE.  That's the pass by reference.  You do the
>> 3-way merge; it finds the lines "3" and "4" as anchor points in both
>> versions, so it is a simple insert with no new data added to the weave.
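A sketch of why the merge is just one new row: the hypothetical TABLE below maps each serial to (parent, include list), and the made-up serial_set follows parent links while adding any included serials. For simplicity it ignores excludes and assumes an included delta is added by itself, without walking its own parents.

```python
# Delta table after Clem's merge: serial -> (parent, include list)
TABLE = {
    4: (2, [3]),  # 1.3     clem: parent 1.2, includes the branch delta
    3: (1, []),   # 1.1.1.1 lm
    2: (1, []),   # 1.2     lm
    1: (0, []),   # 1.1     lm
}


def serial_set(table, tip):
    """Follow parent links from `tip`, adding included serials along the way."""
    result = set()
    serial = tip
    while serial != 0:
        parent, include = table[serial]
        result.add(serial)
        result.update(include)  # the include list pulls in the branch delta by reference
        serial = parent
    return result


print(sorted(serial_set(TABLE, 2)))  # [1, 2]        trunk before the merge
print(sorted(serial_set(TABLE, 3)))  # [1, 3]        the branch
print(sorted(serial_set(TABLE, 4)))  # [1, 2, 3, 4]  the merge
```

Serial 3 enters 1.3's set through the include list alone, so the branch lines already in the weave become visible without being copied.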
>>
>> Here's some magic that *everyone* else gets wrong when they pass by value.
>> In a system that passes the data by value (copies it), the merge done by
>> Clem would have an annotated listing like so:
>>
>> lm      1
>> lm      2
>> lm      serial 2 added this
>> lm      3
>> clem    branch line 1
>> clem    branch line 2
>> clem    ...
>> clem    branch line 10000
>> lm      4
>>
>> Since it copied the data, it looks like Clem wrote those lines, but he
>> didn't; he just automerged them.  In SCCS/BitKeeper it would look like:
>>
>> lm      1
>> lm      2
>> lm      serial 2 added this
>> lm      3
>> lm      branch line 1
>> lm      branch line 2
>> lm      ...
>> lm      branch line 10000
>> lm      4
>>
>> which is correct: all of those lines were authored by one person.  The
>> only time the merger should show up as an author is if there was a
>> conflict; however the merger resolved that conflict is new work and
>> should be attributed to the merger.
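Annotation falls out of the weave: the author of a visible line is whoever owns the innermost ^AI block around it. A sketch under the same simplified print rule, with a hypothetical AUTHOR map taken from the delta table and only three branch lines instead of 10000; annotate is an illustrative name.

```python
CTRL = "\x01"  # ^A

# serial -> author, from the delta table (1.1, 1.2, 1.1.1.1 by lm; merge 1.3 by clem)
AUTHOR = {1: "lm", 2: "lm", 3: "lm", 4: "clem"}


def annotate(weave, serials):
    """Yield (author, line) for every line visible in the given serial set."""
    open_blocks = []  # (serial, kind) pairs in the order they were opened
    for line in weave:
        if line.startswith(CTRL):
            kind, serial = line[1], int(line[2:])
            if kind == "E":
                open_blocks = [b for b in open_blocks if b[0] != serial]
            else:
                open_blocks.append((serial, kind))
            continue
        inserts = [s for s, k in open_blocks if k == "I"]
        deleted = any(s in serials for s, k in open_blocks if k == "D")
        if inserts and all(s in serials for s in inserts) and not deleted:
            # The innermost insert block is the delta that wrote this line.
            yield AUTHOR[inserts[-1]], line


branch = [f"branch line {n}" for n in range(1, 4)]
weave = ([CTRL + "I 1", "1", "2", CTRL + "I 2", "serial 2 added this", CTRL + "E 2",
          "3", CTRL + "I 3"] + branch +
         [CTRL + "E 3", "4", CTRL + "D 2", "5", CTRL + "E 2", CTRL + "E 1"])

# Clem's merge 1.3 (serial 4) sees serials {1, 2, 3, 4}
for author, line in annotate(weave, {1, 2, 3, 4}):
    print(f"{author:8}{line}")
```

clem never appears, because his merge delta (serial 4) inserted no lines; he would only show up for lines he wrote while resolving a conflict.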
>>
>> What BitKeeper did that was a profound step forward was to make the
>> repository a formal thing and to introduce changesets, which keep track
>> of all this at the repository level.  It still does all this at the file
>> level, but you don't have to do that low-level work yourself.  You could
>> think of SCCS as assembly and BitKeeper as more like C: it upleveled
>> things to the point that humans can manage the repository easily.
>>
>> Whew.  That's a butt load of info.  Perhaps better for COFF?  Any
>> questions?  It should be obvious that I *love* SCCS.  It's a dramatically
>> better file format than a patch-based one: you can get *any* version of
>> the file with one pass over the weave, no matter how far it is from the
>> tip, and authorship can be preserved across versions.  It's pretty
>> brilliant, and I consider myself blessed to be posting this in response
>> to SCCS's creator.  Hats off to Marc.  And big boo, hiss, to the RCS guy,
>> who got a PhD for RCS (give me a break) and did the world a huge
>> disservice by badmouthing SCCS so he could promote RCS.
>>
>> --lm
>>
>
>
> --
> *My new email address is mrochkind at gmail.com <mrochkind at gmail.com>*
>