Wirth fixed it in Modula-2 and Oberon.
On Jul 5, 2021, at 5:29 PM, Clem Cole
<clemc(a)ccc.com> wrote:
On Mon, Jul 5, 2021 at 4:16 PM Dan Stromberg <drsalists(a)gmail.com> wrote:
A null-terminated array of char is a petri dish. A proper string type is more like a
disinfectant.
Hrrmpt.... maybe (in theory), but I can say that I have never seen it really work in
practice -- bwk, in "Why Pascal is Not My Favorite Programming Language", describes many
of the practical realities of this sort of choice:
2.1. The size of an array is part of its type
If one declares
    var arr10 : array [1..10] of integer;
        arr20 : array [1..20] of integer;
then arr10 and arr20 are arrays of 10 and 20 integers respectively. Suppose we want to
write a procedure 'sort' to sort an integer array. Because arr10 and arr20 have
different types, it is not possible to write a single procedure that will sort them both.
The place where this affects Software Tools particularly, and I think programs in
general, is that it makes it difficult indeed to create a library of routines for doing
common, general-purpose operations like sorting.
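
For contrast, here is a minimal sketch in C (not from the essay; `sort_ints` is a hypothetical name) of the single routine that the Pascal type rule forbids. It assumes only that the caller passes the element count alongside the array, so the length is a run-time argument rather than part of the type:

```c
#include <stddef.h>

/* Hypothetical helper: insertion sort over n ints.
   The length travels as a separate argument, not as part of the type. */
void sort_ints(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift larger elements right until key's slot is found. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```

With `int arr10[10]` and `int arr20[20]`, the calls `sort_ints(arr10, 10)` and `sort_ints(arr20, 20)` both go through the same function. The price is that nothing checks the count the caller supplies.
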
The particular data type most often affected is 'array of char', for in Pascal
a string is an array of characters. Consider writing a function 'index(s,c)'
that will return the position in the string s where the character c first occurs, or zero
if it does not. The problem is how to handle the string argument of 'index'.
The calls 'index('hello',c)' and
'index('goodbye',c)' cannot both be legal, since the strings have
different lengths. (I pass over the question of how the end of a constant string like
'hello' can be detected, because it can't.) The next try is
    var temp : array [1..10] of char;
    temp := 'hello';
    n := index(temp,c);
but the assignment to 'temp' is illegal because 'hello' and
'temp' are of different lengths.
The only escape from this infinite regress is to define a family of routines with a
member for each possible string size, or to make all strings (including constant strings
like 'define' ) of the same length.
The latter approach is the lesser of two great evils. In 'Tools', a type
called 'string' is declared as
    type string = array [1..MAXSTR] of char;
where the constant 'MAXSTR' is ``big enough,'' and all strings in all
programs are exactly this size. This is far from ideal, although it made it possible to
get the programs running. It does not solve the problem of creating true libraries of
useful routines.
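
As a point of comparison, a rough C counterpart of the essay's 'index(s,c)' might look like the sketch below. It is not from the essay: the name `str_index` is hypothetical (chosen to avoid clashing with the legacy `index()` in `<strings.h>`), and it assumes the null-terminated convention, which is how the end of a constant string is detected:

```c
#include <stddef.h>

/* Hypothetical C counterpart of the essay's index(s,c): returns the 1-based
   position of the first c in s, or 0 if c does not occur. */
size_t str_index(const char *s, char c)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == c)
            return i + 1;
    }
    return 0;
}
```

Here `str_index("hello", c)` and `str_index("goodbye", c)` are both legal, as is a call on a `char temp[10]` that holds "hello", because the function sees only a pointer and relies on the terminator rather than on the declared size.
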
There are some situations where it is simply not acceptable to use the fixed-size array
representation. For example, the 'Tools' program to sort lines of text operates
by filling up memory with as many lines as will fit; its running time depends strongly on
how full the memory can be packed.
Thus for 'sort', another representation is used, a long array of characters and
a set of indices into this array:
    type charbuf = array [1..MAXBUF] of char;
         charindex = array [1..MAXINDEX] of 0..MAXBUF;
But the procedures and functions written to process the fixed-length representation
cannot be used with the variable-length form; an entirely new set of routines is needed to
copy and compare strings in this representation. In Fortran or C the same functions could
be used for both.
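
The last sentence can be illustrated with a small C sketch (not from the essay). The sizes `MAXSTR` and `MAXBUF` and the helper `pack_line` are assumptions made for illustration; the point is only that the standard `strcmp` works on a fixed-size array and on a string stored at an offset inside one long packed buffer:

```c
#include <string.h>

#define MAXSTR 32   /* assumed sizes, for illustration only */
#define MAXBUF 64

/* Hypothetical helper: append s (with its terminator) to the packed buffer
   at offset *used and return the offset where it was stored, or -1 if full. */
int pack_line(char *buf, int *used, const char *s)
{
    int len = (int)strlen(s) + 1;
    if (*used + len > MAXBUF)
        return -1;
    memcpy(buf + *used, s, (size_t)len);
    int start = *used;
    *used += len;
    return start;
}
```

Given `char fixed[MAXSTR] = "goodbye"` and a packed `buf` in which "goodbye" was stored at offset `j`, the call `strcmp(fixed, buf + j)` compares the two representations directly, with no second family of routines.
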
As suggested above, a constant string is written as
    'this is a string'
and has the type 'packed array [1..n] of char', where n is the length. Thus
each string literal of different length has a different type. The only way to write a
routine that will print a message and clean up is to pad all messages out to the same
maximum length:
    error('short message                    ');
    error('this is a somewhat longer message');
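
By comparison, a C version needs no padding, since a string literal of any length is passed as a pointer. The sketch below is not from the essay: `format_error` is a hypothetical helper, and it formats into a caller-supplied buffer (rather than printing and cleaning up) only so that the result is easy to inspect:

```c
#include <stdio.h>

/* Hypothetical helper: format an error message of any length into out.
   Returns the number of characters snprintf would have written. */
int format_error(char *out, size_t outsize, const char *msg)
{
    return snprintf(out, outsize, "error: %s\n", msg);
}
```

Both `format_error(out, sizeof out, "short message")` and `format_error(out, sizeof out, "this is a somewhat longer message")` compile against the same prototype.
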
Many commercial Pascal compilers provide a 'string' data type that explicitly
avoids the problem; 'string's are all taken to be the same type regardless of
size. This solves the problem for this single data type, but no other. It also fails to
solve secondary problems like computing the length of a constant string; another built-in
function is the usual solution.
Pascal enthusiasts often claim that to cope with the array-size problem one merely has to
copy some library routine and fill in the parameters for the program at hand, but the
defense sounds weak at best:(12)
``Since the bounds of an array are part of its type (or, more exactly, of the type of its
indexes), it is impossible to define a procedure or function which applies to arrays with
differing bounds. Although this restriction may appear to be a severe one, the
experiences we have had with Pascal tend to show that it tends to occur very infrequently.
[...] However, the need to bind the size of parametric arrays is a serious defect in
connection with the use of program libraries.''
This botch is the biggest single problem with Pascal. I believe that if it could be
fixed, the language would be an order of magnitude more usable. The proposed ISO standard
for Pascal(13) provides such a fix (``conformant array schemas''), but the
acceptance of this part of the standard is apparently still in doubt.