There is a certain amount of "use makes master" about word and byte length. I think DEC's decision to go with 6-bit characters in a 36-bit word was probably fine, in the context of beliefs around BCD. The complaint that it turned out to be a royal pain in the neck for an 8-bit-byte world is a bit overblown, given its time and place. People found ways to exploit the 4 extra bits in a word to do things. DEC provided the UUO mechanism, and people coded odd things into it. If BCD had been more significant, who knows how long packed ASCII might have lasted.

The entire field from Bletchley onwards was full of arch, fey witticisms about machine names. Ending names with -AC (for automatic computer) led to SILLIAC at Sydney Uni, and there was a sort of poem which included them all, all the way to MANIAC. I can't help but think the neologism "byte", from 'bite-sized chunks of a whole word', goes directly to this tendency to play with language. And the times were ones with many strange players on many continents: fertile ground for wordplay.

8 is a useful number. 5-hole Baudot wasn't enough: with parity, cases, and in-band control signals, data was heading to 8 bits irrespective.

Aligning data with memory and registers makes sense.

On Fri, Sep 9, 2022 at 9:35 AM Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
>
> > I heard that the IBM 709
> > series had 36 bit words because Arthur Samuel,
> > then at IBM, needed 32 bits to identify the playable squares on a
> > checkerboard, plus some bits for color and kinged
>
> To be precise, Samuel's checkers program was written for
> the 701, which originated the architecture that the 709 inherited.
>
> Note that IBM punched cards had 72 data columns plus 8
> columns typically dedicated to sequence numbers. 700-series
> machines supported binary IO encoded two words per row, 12
> rows per card--a perfect fit to established technology. (I do
> not know whether the fit was deliberate or accidental.)
>
> As to where the byte came from, it was christened for the IBM
> Stretch, aka 7020. The machine was bit-addressed and the width
> of a byte was variable. Multidimensional arrays of packed bytes
> could be streamed at blinding speeds. Eight bits, which synced
> well with the 7020's 64-bit words, was standardized in the 360
> series. The term "byte" was not used in connection with
> 700-series machines.
>
> Doug