[TUHS] origin of string.h and ctype.h

Doug McIlroy doug at cs.dartmouth.edu
Sun Aug 13 11:58:19 AEST 2017


Jon Steinhart <jon at fourwinds.com> asked this question (not on tuhs).
It highlights some less well known contributors to research Unix.

> I'm trying to find out who came up with strcmp and the idea of
> returning -1,0,1 for a string comparison.  I can see that it's not in
> my V6 manual but is in V7.  Don't see anything in Algol, PL/I, BCPL, or B

The -1,0,1 return from comparisons stems from the interface of qsort,
which was written by Lee McMahon. As far as I know, the interface for
the comparison-function parameter originated with him, but conceivably
he borrowed it from some other sort utility. The negative-zero-positive
convention for the return value encouraged (and perhaps was motivated by)
this trivial comparison function for integers
	int compar(a,b) { return(a-b); }
This screws up on overflow, so cautious folks would write it with
comparisons. And -1,0,1 were the easiest conventional values to return:
	int compar(a,b) {
		if(a<b) return(-1);
		if(a>b) return(1);
		return(0);
	}
qsort was in v2. In v3 a string-comparison routine called "compar"
appeared, with a man page titled "string comparison for sort". So the
convention was established early on.

Compar provided the model for strcmp, one of a package of basic string
operations that came much later, in v7, under the banner of string.h
and ctype.h.

These packages were introduced at the urging of Nils-Peter Nelson, a
good friend of the Unix lab, who was in the Bell Labs comp center.
Here's the story in his own words.

I wrote a memo to dmr with some suggestions for additions to C.  I asked
for the str... because the mainframes had single instructions to implement
them. I know for sure I had a blindingly fast implementation of isupper,
ispunct, etc. I had a table of length 128 integers for the ascii character
set; I assigned bits for upper, lower, numeric, punct, control, etc. So
ispunct(c) became

	#define PUNCT 0400
	return(qtable[c]&PUNCT)
instead of
	if(c==':' || c ==';' || ...
[or
	switch(c) {
		default:
			return 0;
		case ':':
		case ';':
		...
		return 1;
	}
MDM]
dmr argued people could easily write their own but when I showed
him my qtable was 20 times faster he gave in.  I also asked for type
logical which dmr implemented as unsigned, which was especially useful
when bitfields were implemented (a 2 bit int would have values -2, -1,
0, 1 instead of 0, 1, 2, 3).  I requested a way to interject assembler,
which became asm() (yes, a bad idea).



More information about the TUHS mailing list