[TUHS] v7 K&R C

Dan Cross crossd at gmail.com
Sun May 17 02:14:28 AEST 2020

On Fri, May 15, 2020 at 8:58 PM Brantley Coile <brantley at coraid.com> wrote:

> I always kept local, single characters in ints. This avoided the problem
> with loading a character being signed or unsigned. The reason for not
> specifying is obvious. Today, you can pick the move-byte-into-word
> instruction that either sign extends or doesn't. But when C was defined
> that wasn't the case. Some machines sign extended when a byte was loaded
> into a register and some filled the upper bits with zero. For machines that
> filled with zero, a char was unsigned. If you forced the language to do one
> or the other, it would be expensive on the opposite kind of machine.

Not only that, but if one used an exactly `char`-width value to hold, er,
character data as returned from `getchar` et al, then one would necessarily
give up the possibility of handling whatever character value was chosen for
the sentinel marking end-of-input stream.  `getchar` et al are defined to
return EOF on end of input; if they didn't return a wider type than `char`,
there would be data that could not be read. On probably every machine I am
ever likely to use again in my lifetime, byte value 255 would be -1 as a
signed char, but it is also a perfectly valid value for a byte.

The details of whether char is signed or unsigned aside, use of a wider
type is necessary for correctness and ability to completely represent the
input data.

> It's one of the things that made C a good choice on a wide variety of
> machines.
> I guess I always "saw" the return value of the getchar() as being in an int
> sized register, at first namely R0, so kept the character values returned
> as ints. The actual EOF indication from a read is a return value of zero
> for the number of characters read.

That's certainly true. Had C supported multiple return values or some kind
of option type from the outset, `getchar`, `read`, etc. might have returned
a pair of some useful value (e.g., for `getchar` the byte read; for `read` a
length) and an indication of error/EOF/OK. Notably, both Go and Rust support
essentially this: in Go, the `Read` method of `io.Reader` returns an
`(int, error)` pair, with the error `io.EOF` on end of input; in Rust, the
`read` method of the `Read` trait returns a `Result<usize, io::Error>`, with
`Ok(n)` where `n == 0` indicating end of input.

But I'm just making noise because I'm sure everyone knows all this.

I think it's worthwhile stating these things explicitly, sometimes.

        - Dan C.

> On May 15, 2020, at 4:18 PM, ron at ronnatalie.com wrote:
> >
> > EOF is defined to be -1.
> > getchar() returns int, but if c is an unsigned char, the value of (c =
> > getchar()) will be 255. This will never compare equal to -1.
> >
> >
> >
> > Ron,
> >
> > Hmmm... getchar/getc are defined as returning int in the man page, and c
> > is traditionally defined as an int in this code.
> >
> > On Fri, May 15, 2020 at 4:02 PM <ron at ronnatalie.com> wrote:
> >> Unfortunately, if c is char on a machine with unsigned chars, or it’s
> >> of type unsigned char, the EOF will never be detected.
> >>
> >>
> >>
> >>>     • while ((c = getchar()) != EOF) if (c == '\n') { /* entire record
> >>>       is now there */