[TUHS] troff was not so widely usable

Thu Feb 11 23:06:23 AEST 2021

Andrew Hume <andrew at humeweb.com> wrote:
> ken wanted to do the right thing and tried to license the fonts from
> Merganthaler, but we had endless discussions where Ken would say "we
> know how the fonts are encoded, can we license them?" and the sales
> person would say "no you can't; they're secret"...

The 202 reconstruction paper also bemoans the bit-rot loss of the
information about the Mergenthaler character representation.

This reminded me of a project that I and a small team did in the 1980s.
We were licensees of Sun's NeWS source code, and we wanted our software
to be able to use the wide variety of fonts sold commercially by Adobe
and font design companies.  The problem was, they were encoded in Adobe
Type 1 font definitions, which Adobe considered a proprietary trade
secret.  Due to the lack of copyright protection for fonts (whose whole
raison d'etre is to be copied onto paper many thousands or millions of
times by each user), font designers used security by obscurity to
protect their work.

Our team ended up pulling the ROMs out of an original LaserWriter, and
writing and improving a 68000 disassembler.  One of our team members
read the code, figured out which parts handled these fonts, and how it
decoded them.  He wrote that down in his own words in a plain text
document, not a program, following the prevailing court decisions about
how to avoid copyright issues while reverse-engineering a trade secret.
Ultimately, we released that document to some interested people, so that
others could implement support for Type 1 fonts.  Shortly afterward,
Adobe magnanimously decided to "release the specification", as Wikipedia
says.  Later, I got a nice note from L. Peter Deutsch, maintainer of
Ghostscript, who said:

  I just received my copy of the Adobe Type 1 Font Format book, and
  compared the contents against the message you sent me last July.  You
  guys at Grasshopper really did a good job of cracking the format.  I
  was amused to see that the book omits quite a few operators that you
  deciphered on your own.

This wasn't too long after Adobe had also been threatening Sun and its
NeWS licensees for re-implementing PostScript from its spec, rather than
licensing it from them.  I had noticed a paragraph in the prospectus for
their IPO, which said something like, "Adobe has put PostScript into the
public domain in order to encourage its wide use."  Either the language
was free for anyone to implement, or they were guilty of securities
fraud.  When hoist on that petard, so to speak, they backed down.

	John

Appendix:  The reverse-engineered font format.  I was happy to be able
to find a copy of this on my current machine -- it too could have
bit-rotted in the last 30 years.  File mod date is 1989.  I am not
the author.

		Description of Adobe PostScript FontType 1

PostScript FontType 1 is Adobe's proprietary font format.  The
internal fonts in Apple LaserWriters (the Times, Helvetica, and
Courier families) are stored in this format, although the stroke
descriptions are not normally available outside the LaserWriter.
Other fonts from Adobe are also in this format, although an additional
layer of encryption prevents the stroke descriptions from being
directly visible.

IMPORTANT NOTE: The shrink-wrap license agreement under which external
Adobe fonts are distributed expressly forbids the decryption and
decompilation of these fonts.  It does not appear to forbid the using
of a program to decrypt and display these fonts, (since this is what
happens to them inside a PostScript printer) as long as they are not
used on more than one PRINTER at a time.

PostScript fonts are accessed through Font Dictionaries.  See the Red
Book, section 5.3, for a description of the standard entries in a Font
Dictionary.  The entries we will concern ourselves with here are the
dictionaries CharStrings and Private.  The character shape
descriptions are stored in CharStrings in an encrypted format.  The
shape descriptions are reached via the Encoding vector, see section
5.4.  When a character needs to be rendered, a pseudo-code interpreter
is called, and handed the encrypted shape description, and the Font
Dictionary.

The pseudo-code interpreter checks that this is a valid Type 1 font.
The Font Dictionary for a valid font must contain an entry called
`Private', which is a dictionary.  Private must contain the entry
`password', an integer having the value 5839.

The shape descriptions have the ability to call other encrypted
routines, accessed by their index in the array `Subrs', defined in
Private.  The encrypted routines can also call PostScript code
directly, executing an element from the array `OtherSubrs', defined in
Private.

Execution of the pseudo-code occurs in the same environment as the
BuildChar routine for a user defined font (see section 5.7).  A gsave
precedes this execution, and the CTM is modified by the FontMatrix.
Upon exiting, a grestore is performed, and the currentpoint is updated
by the width of the character.

Encrypted routines from CharStrings and Subrs can be decrypted with
the following program.  Routines in OtherSubrs are ordinary,
unencrypted PostScript code.

#include <stdio.h>

int magic1, magic2, magic3, magic4, mask;

main()
{
	int c, d, skip4;

#ifdef eexec
	magic1 = 0x9a36dda2 ;
	magic2 = 0x9a3704d3 ;
#else
	magic1 = 0x3f8927b5 ;
	magic2 = 0x3fea375f ;
#endif
	magic3 = 0x3fc5ce6d ;
	magic4 = 0x0d8658bf ;

	mask = magic1 ^ magic2 ;
	skip4 = 0 ;

	printf("<");
	while ((c=getchar())!=EOF) {
		d = ((mask >> 8) ^ c) & 0xff ;
		if (++skip4 > 4)
#ifdef eexec
			putchar(d);
#else
			printf(" %02x", d);
#endif
		mask = magic4 + magic3 * (mask + c) ;
		}
	printf(" >");
	exit(0);
}

Note that this is the same decryption algorythm used for eexec, except
that the initial mask value, as determined by magic1 and magic2, is
different.  The result of decrypting is a font rendering pseudo code.
Strings are stored encrypted in memory, and decrypted on the fly by
the interpreter as it executes the pseudo-code.

		Description of the pseudo-code.

The pseudo-code interpreter obtains bytes in sequence from the
decryption routine.  These bytes are grouped into a sequence of
instructions.  Each instruction encodes either a number (which is
pushed onto an internal stack), or an operation (which is performed).
The initial byte of each pseudo-code instruction encodes its type and
length.

Initial bytes in the range 0x00-0x1f (inclusive) encode operations.
Common operations are encoded completely by a single byte, and require
no additional bytes.  Less common operations are encoded in two bytes.
The initial byte of a two byte operation is always 0x0c.  The second
byte has a value in the range 0x00-0x21 (inclusive), which specifies
the operation.

Initial bytes in the range 0x20-0xf6 (inclusive) encode small numbers.
No additional bytes are required for a small number.  The value pushed
on the stack is the initial byte minus 0x8b.

Initial bytes in the range 0xf7-0xfe (inclusive) encode medium
numbers.  Medium numbers are followed by one additional byte.  Medium
numbers come in two flavors: positive, and negative.  Initial bytes in
the range 0xf7-0xfa (inclusive) encode positive numbers, the remainder
are negative.  The magnitude of the number pushed is:

	(( ((initial_byte-0xf7)&3) << 8) | additional_byte) + 0x6c

If the initial byte indicates that the number is negative, the number
calculated above is negated before being pushed onto the stack.

An initial byte of 0xff indicates a large number.  A large number
requires 4 additional bytes to specify its value.  The large number's
value is encoded directly in the additional bytes, most significant
byte first, in decending order of significance.

When a number is encountered, it is pushed onto an internal stack
(seperate from the PostScript operand stack).  Most operations take
values from this stack, and a few return values on the stack.  In
general it is not important that this is not the real operand stack.
The exceptions are OtherSubrs and StackShift.  One of the arguments to
OtherSubrs is an argument count.  It transfers that many arguments
from the internal stack to the operand stack before calling the
PostScript procedure.  After the procedure returns, StackShift may be
used to transfer individual values from the operand stack to the
internal stack.

Another difference between the stacks is that some of the operations
clear the stack after execution.  These operations read their
arguments from the bottom of the stack, not the top, so in this sence,
it isn't really a stack, but it can still be thought of as behaving
like one, at least locally.  Operations which exhibit this behavior
are marked below with an *.

Many of the operations perform functions similar to certain PostScript
operators.  In these cases, only the name of the appropriate
PostScript operator is given.  Many of the path extension commands
have additional versions which restrict the motion associated with
them to being horizontal or vertical, thus one fewer argument is
needed for them.  These are identified by an `h' or `v', generally as
the second character of the name.  Thus rhlineto is equivalent to:

	{ 0 exch rlineto }

Additionally, the operators max and min are available, which are, for
some reason, missing from PostScript.

Codes labeled `(variable)' look up the given name in the Private
dictionary and push the associated value onto the internal stack.

		Descriptions of short operations.

0x00 *	VHintWidth	ycenter ==> -
0x01 *	VHint		bottom vwidth ==> -
0x02 *	HHintWidth	xcenter ==> -
0x03 *	HHint		left hwidth ==> -
	These operations give information about the position of some
of the features of a character to the non-linear scaling code.  The
dynamics of the non-linear scaling is not yet clear.  The *Width
operations take one argument, assuming StrokeWidth as the second arg
(the first arg indicates the center of the line to be stroked).  The
other operations take 2 arguments, the first being a (horizontal or
vertical) position of the (left or bottom) side of the feature, and
the second being the width of the feature being described.  These, and
similar hint routines, are called before any path construction
operators.

0x04 *	rvmoveto	ydelta ==> -
0x05 *	rlineto		xdelta ydelta ==> -
0x06 *	rhlineto	xdelta ==> -
0x07 *	rvlineto	ydelta ==> -

0x08 *	rspline		dx1 dy1 dx2 dy2 dx3 dy3 ==> -
	similar to rcurveto, but each control point is specified
relative to the previous one, instead of relative to the currentpoint. 

0x09 *	closepath	- ==> -

0x0a	Subrs		sub ==> -
	takes one argument, the number of the subroutine to execute in
Private/Subrs.  

0x0b	Retn		- ==> -
	return from subroutine.  All elements of Subrs end with this
operation.  

0x0c	LongOp
	followed by another operation code, as described below.

0x0d *	Metrics		lsb_x width_x ==> -
	takes two arguments:  the left side bearing, and width.  This
information is used (with the y values of each assumed zero) if it is
not overrided by an element in the font dictionary's Metrics entry.
This is usually the first operation executed by a CharString routine.
The internal currentpoint is set to the left side bearing.

0x0e	Finish		- ==> (stack no longer exists)
	clean up and either fill or stroke the path (depending on
PaintType), grestore, and exit.  Apparently, font cache information is
determined here for all PaintTypes except 3.  Fonts of PaintType 3
would call setcachedevice before executing any rendering operations,
and would exit with QuickFinish.  Some of the operations labeled here
with ?'s are likely to be fill and stroke, for use only by PaintType 3
fonts.

0x0f *	moveto		x y ==> -
0x10 *	lineto		x y ==> -
0x11 *	curveto		x1 y1 x2 y2 x3 y3 ==> -
0x12	min		a b ==> min(a, b)
0x13 *		? 26c8c6(3fe0000000000000) ?			- ==> -
0x14 *	newpath		- ==> -
0x15 *	rmoveto		dx dy ==> -
0x16 *	rhmoveto	dx ==> -
0x17 *		? set two bits of something in gstate ?		ferd ==> -
0x18	mul		a b ==> a*b
0x19	strokewidth	(variable)
0x1a	baseline	(variable)
0x1b	capheight	(variable)
0x1c	bover		(variable)

0x1e *	htovrspline	dx1 dx2 dy2 dy3 ==> -
0x1f *	vtohrspline	dy1 dx2 dy2 dx3 ==> -
	these take 4 args instead of six, set the remaining two to 0,
and call rspline.  The curves are constrained to start and end either
vertically or horizontally.

		Long Operations

0x00 *		? toggle something and remember... strange. ?	- ==> -
	this operation appears around some subrs.  sometimes it is
outside the call, and sometimes inside the subr itself.  it always
appears in pairs.  the first call to it sets a value to zero.  the
second call sets it to some function of the currentpoint.  the value
is used to modify all of the coordinates somehow, apparently in
conjunction with the non-linear scaling, but only when other
conditions are met. 

0x01 *	HMHint		x1 width1 x2 width2 x3 width3 ==> -
0x02 *	VMHint		y1 height1 y2 height2 y3 height3 ==> -
	These each take 6 args and appear to encode information for 3
position/value pairs.  See HHint above.

0x03 *		? something about MinFeature ?
0x04 *	arc		x y r ang1 ang2 ==> -
0x05 *		? 1 arg to 26a9f4 ?	ferd ==> -

0x06 *	Accent		ferd x y base accent ==> (no stack)
	Used to create composite characters.  Executes the character
routine corresponding to base, then adjusts the current position by x
and y, and executes the character routine corresponding to accent.
Base and accent are character codes (0-255).  This terminates
execution for this character.  First arg is unclear, but modifies the
x position of the accent character somehow.

0x07 *	LongMetrics	lsbx lsby widthx widthy ==> -
	same as Metrics, but specifies y values as well.

0x08 *	setcachedevice	llx lly urx ury ==> -
	character width is taken from the metric information, so only
4 args are required.  the bounding box is expanded by strokewidth/2 on
all sides before being passed to the actual routine.  Presumably only
used by PaintType 3 fonts.

0x09 *	QuickFinish	- ==> (no stack)
	exit without cleaning up or stroking or filling.  just grestore.

0x0a	add		a b ==> a+b
0x0b	sub		a b ==> a-b
0x0c	div		a b ==> a/b
0x0d	max		a b ==> max(a, b)
0x0e	neg		a ==> -a
0x0f	TestAdd		a b c d ==> b > c ? a + d : a

0x10	OtherSubrs	a1 a2 a3 ... an n index ==> -
	call directly to PostScript code.  top of stack is index of
code to call, next is number of args to transfer to operand stack,
followed by the args to be transferred.

0x11	StackShift	- ==> n
	move a number from the operand stack to the internal stack.

0x12	decend		(variable)
0x13	ascend		(variable)
0x14	overshoot	(variable)
0x15 *		? set two bits of something in gstate ?		ferd ==> -
0x16	xover		(variable)
0x17	capover		(variable)
0x18	aover		(variable)
0x19	halfsw		(variable)

0x1a	PixelRound
	round a value to an even boundary in device space.  this is
just a guess, not verified.

0x1b *	arcn		x y r ang1 ang2 ==> -
0x1c	exch		a b ==> b a
0x1d	index		an ... a1 a0 n ==> an ... a1 a0 an

0x1e *	VHintRound	bottom vwidth ==> -
0x1f *	HHintRound	left hwidth ==> -
	Same as VHint and HHint, except the width is sent to the
pixel round routine.  Additionally, these will take negative widths
and deal with them correctly, the others might not.

0x20 **	currentpoint	- ==> x y
	after pushing the coordinates onto the top of the stack, the
stack pointer is set to point at the bottom of the stack.  presumably
this should only be used when the stack is clear.  in any case, no
math can be performed on these values, as the stack is `empty' when
they are there.  this might be a bug, as I suspect these were meant to
be passed to Qmoveto, but they can never get there.  moveto, however
will work correctly.

0x21 **	Qmoveto		x y ==> -
	doesn't actually call moveto, just remembers new position
internally.  the arguments are poped from the top of the stack, and
then the stack is cleared.