Minix1.5/commands/elvis/READ_ME
E L V I S
Elvis is a clone of vi/ex. It boasts about 97% of the vi commands and about
92% of the ex commands. It is generally quite fast. It can edit files that
are larger than a single process' data space. Elvis also has a few features
that the real vi lacks. Several related programs are included, too.
Elvis runs under BSD/SysV UNIX, MINIX-ST, and MINIX-PC.
Elvis should be fairly easy to port to any OS that has the termcap functions.
It cannot be compiled on MINIX using the standard ACK compiler because asld
runs out of memory while linking. If you must recompile it, you will have to
use some other C compiler and cross compiler on MS-DOS. This problem will
be remedied in MINIX 2.0 when a new ACK compiler becomes available.
------------------------------- COPYING -----------------------------
The copyright of Elvis is controlled by the author, Steve Kirkendall.
(That's me. Hello!)
This document describes the restrictions on how Elvis can be distributed.
The restrictions are, basically, that anybody can make & distribute copies,
and that nobody except me can say otherwise.
You can make any number of copies of Elvis, and distribute it in either
source code form or executable form, either alone or as part of some
other package, to anybody you wish, provided that the following conditions
are met:
* If you are distributing Elvis as part of a package, then you must
either distribute the whole package for free, or you must be willing
to distribute Elvis separately for free to anybody who wants it.
You can charge up to $6 for disks and labor, and still say the
software is free.
* If you are distributing Elvis via a BBS or a network, then you
may charge for connection time & subscription fees, but there must
not be any surcharge for downloading Elvis.
* This document must be included with each copy that you distribute.
* The name "Elvis" cannot be changed -- not even in advertisements.
In particular, I don't want anybody claiming that Elvis is the
real "vi" from BSD. It's okay to call it "Elvis - a clone of vi",
though.
* This agreement cannot be modified without my permission.
My address is: Steve Kirkendall
16820 SW Tallac Way
Beaverton, OR 97006
My phone# is: (503) 642-9905
Email: kirkenda@cs.pdx.edu
...uunet!tektronix!psueea!eecs!kirkenda
------------------------------- Termcap -----------------------------
REQUIRED NUMERIC CAPABILITIES
:co#: number of columns on the screen (characters per line)
:li#: number of lines on the screen
REQUIRED STRING CAPABILITIES
:ce=: clear to end-of-line
:cl=: home the cursor & clear the screen
:cm=: move the cursor to a given row/column
:up=: move the cursor up one line
BOOLEAN CAPABILITIES
:am: auto margins - wrap when a char is written to the last column?
:pt: physical tabs?
OPTIONAL STRING CAPABILITIES
:al=: insert a blank row on the screen
:dl=: delete a row from the screen
:cd=: clear to end of display
:ei=: end insert mode
:ic=: insert a blank character
:im=: start insert mode
:dc=: delete a character
:sr=: scroll reverse (insert a row at the top of the screen)
:vb=: visible bell
OPTIONAL STRINGS RECEIVED FROM THE KEYBOARD
:kd=: sequence sent by the <down arrow> key
:kl=: sequence sent by the <left arrow> key
:kr=: sequence sent by the <right arrow> key
:ku=: sequence sent by the <up arrow> key
:PU=: sequence sent by the <PgUp> key
:PD=: sequence sent by the <PgDn> key
:HM=: sequence sent by the <Home> key
:EN=: sequence sent by the <End> key
OPTIONAL CAPABILITIES THAT DESCRIBE CHARACTER ATTRIBUTES
:so=: :se=: start/end standout mode (We don't care about :sg#:)
:us=: :ue=: start/end underlined mode
:VB=: :Vb=: start/end boldface mode
:as=: :ae=: start/end alternate character set (italics)
:ug#: visible gap left by :us=:, :ue=:, :VB=:, or :Vb=:
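As a rough illustration of the required capabilities listed above, the
following C sketch scans the text of a termcap entry and counts how many of
the required capabilities are absent. The function names (has_cap,
missing_required) are hypothetical and are not part of Elvis. A real program
would normally use the termcap library calls tgetent(), tgetnum(), and
tgetstr() rather than parsing the entry by hand.

```c
#include <string.h>

/* Return 1 if the entry text has a field that starts with `name`
 * followed by `kind` ('#' for numeric, '=' for string capabilities). */
static int has_cap(const char *entry, const char *name, char kind)
{
    size_t n = strlen(name);
    const char *p = entry;

    while ((p = strchr(p, ':')) != NULL) {
        p++;                            /* step past the ':' separator */
        if (strncmp(p, name, n) == 0 && p[n] == kind)
            return 1;
    }
    return 0;
}

/* Count the required capabilities that are absent from the entry. */
int missing_required(const char *entry)
{
    static const char *const nums[] = { "co", "li" };
    static const char *const strs[] = { "ce", "cl", "cm", "up" };
    int missing = 0;
    size_t i;

    for (i = 0; i < sizeof nums / sizeof nums[0]; i++)
        if (!has_cap(entry, nums[i], '#'))
            missing++;
    for (i = 0; i < sizeof strs / sizeof strs[0]; i++)
        if (!has_cap(entry, strs[i], '='))
            missing++;
    return missing;
}
```

An entry that defines :co#:, :li#:, :ce=:, :cl=:, :cm=:, and :up=: yields 0;
each absent capability adds one to the result.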
------------------------------- Cflags -----------------------------
Elvis uses many preprocessor symbols to control compilation...
-DM_SYSV
If defined, then Elvis uses SysV ioctl() calls to control the tty;
normally it uses V7/BSD/MINIX ioctl() calls.
-DDATE=\'\"`date`\"\'
DATE should be defined to be a string constant. It is printed by the
:version command as the compilation date of the program.
It is only used in cmd1.c, and even there you may leave it undefined
without causing an urp.
The form shown above only works if you use "eval". See the Makefile.
-DTMPNAME=\"/tmp/vi%04x%04x\"
This allows you to use a different name for Elvis' temporary files.
The default value is defined near the top of vi.h, so you only need
to use this on the command line if the default name is wrong.
It should contain two "%d" or "%x" formats, which are replaced by
the inode number and device major/minor number.
-DCUTNAME=\"/tmp/cut%04x%04x\"
This is similar to TMPNAME, but is used to generate names for old
temp files which are being kept around because they are referred to
by cut buffers.
It should contain two "%d" or "%x" formats, which are replaced by
Elvis' process ID (from getpid()) and a file-descriptor number.
-DCRUNCH
This option causes some large & often-used macros to be replaced by
equivalent functions. It reduces the size of the ".text" segment by
about 4K, and you don't sacrifice any features -- just a little speed.
-DSET_NOCHARATTR
Permanently disables the charattr option. This reduces the size of
your ".text" segment by about 850 bytes.
-DNO_RECYCLE
Normally, Elvis will recycle space from the tmp file which contains
totally obsolete text. This flag disables this recycling. Without
recycling, the ".text" segment is about 1K smaller than it would
otherwise be, but the tmp file grows much faster. If you have a lot
of free space on your hard disk, but Elvis is too bulky to run with
recycling, then try it without recycling.
-DDEBUG
This adds the ":debug" and ":validate" commands, and also adds many
internal consistency checks. It increases the size of the ".text"
segment by about 5K.
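To make the TMPNAME and CUTNAME formats described above more concrete, here
is a minimal C sketch of how such a format string could be expanded. The
helper names (make_tmpname, make_cutname) are hypothetical, and the use of
snprintf() is a modern convenience; the real code in Elvis may differ.

```c
#include <stdio.h>

#ifndef TMPNAME
#define TMPNAME "/tmp/vi%04x%04x"    /* same pattern as the -DTMPNAME example */
#endif
#ifndef CUTNAME
#define CUTNAME "/tmp/cut%04x%04x"   /* same pattern as the -DCUTNAME example */
#endif

/* Build a temp-file name from an inode number and a device number.
 * Returns the length of the name, as reported by snprintf(). */
int make_tmpname(char *buf, size_t size, unsigned inode, unsigned dev)
{
    return snprintf(buf, size, TMPNAME, inode, dev);
}

/* Build a cut-file name from a process ID and a file-descriptor number. */
int make_cutname(char *buf, size_t size, unsigned pid, unsigned fd)
{
    return snprintf(buf, size, CUTNAME, pid, fd);
}
```

With the default patterns, an inode of 0x1a2b on device 0x0301 gives
"/tmp/vi1a2b0301". Both patterns take exactly two integer arguments, which
is why a replacement format must also contain two "%d" or "%x" conversions.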
------------------------------- Mods -----------------------------
A few ideas for modifications...
MODE INDICATORS
Elvis always reads keystrokes via the getkey() function. This function
is called with an argument which describes the context in which the
character will be processed:
WHEN_EX - called from the vgets() function to read
a single line of text. Either EX command
mode, EX text entry mode, or VI while reading
a search string.
WHEN_VICMD - VI mode, getting a command character.
WHEN_VIINP - VI's input mode.
WHEN_VIREP - VI's replace mode (the R command).
0 - misc times, e.g. "HIT A KEY TO CONTINUE"
So, the getkey() function would be a good place to add some kind of
mode indicator. Like, you could change the shape of the cursor for
input mode vs. VI command mode.
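One possible starting point for such an indicator is sketched below in C.
The numeric values given to the WHEN_* symbols are placeholders chosen for
this example (the real values come from Elvis' own headers), and
mode_label() is a hypothetical helper that getkey() could call before
reading a key.

```c
/* Placeholder values for illustration only. */
#define WHEN_EX     1
#define WHEN_VICMD  2
#define WHEN_VIINP  3
#define WHEN_VIREP  4

/* Return a short label for the mode implied by getkey()'s argument. */
const char *mode_label(int when)
{
    switch (when) {
    case WHEN_EX:    return "ex";
    case WHEN_VICMD: return "command";
    case WHEN_VIINP: return "input";
    case WHEN_VIREP: return "replace";
    default:         return "";     /* 0: misc prompts, no indicator */
    }
}
```

The label could be drawn in a corner of the screen, or the switch could
instead emit a terminal-specific string that changes the cursor shape.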
ARROW KEYS IN INPUT MODE
The arrow keys are not normally mapped during input mode. It might
be fun, though, to map them to ESC + [hjkl] + a. This way, if you
hit an arrow key while in input mode, elvis would take you out of input
mode momentarily, move the cursor, and drop you back into input mode.
Neat, huh?
Something similar could be done with replace mode.
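The mapping suggested above can be sketched as a small C helper that builds
the key sequence for one arrow key. The function name and the single-letter
direction codes are assumptions made for this example only.

```c
/* Fill out[] with ESC, the vi motion for the arrow, then 'a'.
 * dir is one of 'u', 'd', 'l', 'r' (hypothetical direction codes).
 * Returns 1 on success, 0 if dir is not recognized. */
int arrow_expansion(char dir, char out[4])
{
    char motion;

    switch (dir) {
    case 'l': motion = 'h'; break;
    case 'd': motion = 'j'; break;
    case 'u': motion = 'k'; break;
    case 'r': motion = 'l'; break;
    default:  return 0;
    }
    out[0] = '\033';    /* ESC leaves input mode */
    out[1] = motion;    /* move the cursor */
    out[2] = 'a';       /* re-enter input mode */
    out[3] = '\0';
    return 1;
}
```

The resulting three-character string is what the arrow key's :ku=:, :kd=:,
:kl=:, or :kr=: sequence would be mapped to while in input mode.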
WRAP LONG LINES (INSTEAD OF SCROLLING SIDEWAYS)
This would mostly require changes to redraw(), mark2phys(), and
drawtext(). All of these are in the file "redraw.c".
ADD MORE SUPPORT FOR NON-ASCII CHARACTER SETS
Elvis displays 8-bit character sets just fine, but is a bit weak in
the input and search departments.
For input, something similar to :map would be nice. Actually, :abbr
is a little closer. How about ":digraph" to map a specified pair of
ASCII characters into a single non-ASCII character?
REWRITE THE REGULAR EXPRESSION PARSER AND THE SEARCHING CODE
The current code doesn't allow you to search for non-ASCII characters.
It could probably be made smaller & faster.
Suggestions are welcome.
------------------------------- internal -----------------------------
INTERNAL TEXT REPRESENTATION
When elvis starts up, the file is copied into a temporary file. Small
amounts of extra space are inserted into the temporary file to ensure
that no text lines cross block boundaries; this speeds up processing.
The "extra space" is filled with NUL characters; the input file must
not contain any NULs, to avoid confusion.
The first block of the temporary file is an array of shorts which
describe the order of the blocks; i.e. header[1] is the block number
of the first block, and so on. This limits the temporary file to
512 active blocks, so the largest file you can edit is about 400K
bytes long. I hope that's enough!
When blocks are altered, they are rewritten to a *different* block
in the file, and the in-core version of the header block is updated
accordingly. The in-core header block will be copied to the temp
file immediately before the next change... or, to undo this change,
swap the old header (from the temp file) with the new (in-core)
header.
Elvis maintains another in-core array which contains the line-number
of the last line in every block. This allows you to go directly to a
line, given its line number.
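The bookkeeping described above can be sketched in C roughly as follows.
The structure and function names are hypothetical, index 0 of each array is
left unused so that entry 1 describes the first block, and the 512-entry
size is taken from the limit mentioned in the text. This is a simplified
model, not the actual Elvis data structures.

```c
#include <string.h>

#define MAXBLKS 512                     /* header entries, per the text above */

struct tmpfile_state {
    unsigned short disk_hdr[MAXBLKS];   /* header as stored in the temp file */
    unsigned short core_hdr[MAXBLKS];   /* in-core header, with latest change */
    long lnum[MAXBLKS];                 /* lnum[i]: last line number in block i */
    int nblocks;                        /* number of active blocks */
};

/* Find the logical block holding `line`; returns 0 if out of range. */
int block_of_line(const struct tmpfile_state *t, long line)
{
    int i;

    if (line < 1)
        return 0;
    for (i = 1; i <= t->nblocks; i++)
        if (line <= t->lnum[i])
            return i;
    return 0;
}

/* A changed block is written elsewhere; only the in-core header moves. */
void rewrite_block(struct tmpfile_state *t, int i, unsigned short newphys)
{
    t->core_hdr[i] = newphys;
}

/* Undo by swapping the old (temp file) and new (in-core) headers. */
void undo_swap(struct tmpfile_state *t)
{
    unsigned short tmp[MAXBLKS];

    memcpy(tmp, t->disk_hdr, sizeof tmp);
    memcpy(t->disk_hdr, t->core_hdr, sizeof tmp);
    memcpy(t->core_hdr, tmp, sizeof tmp);
}
```

Because the old blocks are never overwritten, undo needs no copying of
text: exchanging the two headers is enough to bring back the old contents.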
IMPLEMENTATION OF EDITING
There are three basic operations which affect text:
* delete text - delete(from, to)
* add text - add(at, text)
* yank text - cut(from, to)
To yank text, all text between two text positions is copied into
a cut buffer. The original text is not changed. To copy the text
into a cut buffer, you need only remember which physical blocks
contain the cut text, the offset into the first block of the start of
the cut, the offset into the last block of the end of the cut, and
what kind of cut it was. (Cuts may be either character cuts or line
cuts; the kind of a cut affects the way it is later "put".) This is
implemented in the function cut().
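A cut buffer of the kind just described might be modelled in C as shown
below. The names (struct cutbuf, cut_record) and the 512-block limit are
assumptions for illustration; the real cut() function is not reproduced
here.

```c
#define MAXCUTBLKS 512      /* assumed upper bound on blocks per cut */

enum cut_kind { CUT_CHARS, CUT_LINES };

struct cutbuf {
    unsigned short phys[MAXCUTBLKS];    /* physical blocks holding the text */
    int nblocks;                        /* how many entries of phys[] are used */
    int start_off;                      /* offset of the cut in the first block */
    int end_off;                        /* offset of the end in the last block */
    enum cut_kind kind;                 /* character cut or line cut */
};

/* Record a cut spanning logical blocks first..last of `header`.
 * No text is copied; only block numbers and offsets are remembered.
 * Returns the number of blocks recorded, or -1 for a bad range. */
int cut_record(struct cutbuf *cb, const unsigned short *header,
               int first, int last, int start_off, int end_off,
               enum cut_kind kind)
{
    int i;

    if (first < 1 || last < first || last - first + 1 > MAXCUTBLKS)
        return -1;
    for (i = first; i <= last; i++)
        cb->phys[i - first] = header[i];
    cb->nblocks = last - first + 1;
    cb->start_off = start_off;
    cb->end_off = end_off;
    cb->kind = kind;
    return cb->nblocks;
}
```

This only works because altered blocks are rewritten to different physical
blocks, so the blocks named by a cut buffer keep their old contents.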
To delete text, you must modify the first and last blocks, and remove
any reference to the intervening blocks in the header's list. The
text to be deleted is specified by two marks. This is implemented in
the function delete().
To add text, you must specify the text to insert (as a NUL-terminated
string) and the place to insert it (as a mark). The block into which
the text is to be inserted may need to be split into as many as four
blocks, with new intervening blocks needed as well... or it could be
as simple as modifying the block. This is implemented in the function
add().
Other interesting functions are paste() (to copy text from a cut buffer
into the file), modify() (for an efficient way to implement a combined
delete/add sequence), and input() (to get text from the user & insert
it into the file).
When text is modified, an internal file-revision counter, called
"changes", is incremented. This counter is used to detect when
certain caches are out of date. (The "changes" counter is also
incremented when we switch to a different file, and also in one
or two similar situations -- all related to invalidating caches.)
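The way a revision counter can invalidate a cache is sketched below in C.
Only the name "changes" comes from the description above; the cache
structure, its fields, and the helper functions are hypothetical.

```c
static long changes;        /* bumped whenever the text is modified */

struct line_cache {
    long stamp;             /* value of `changes` when the cache was filled */
    long line;              /* line number the cached data belongs to */
    int  width;             /* hypothetical cached datum */
};

/* Called by anything that modifies text or switches files. */
void note_change(void)
{
    changes++;
}

/* Remember a computed value together with the current revision. */
void cache_fill(struct line_cache *c, long line, int width)
{
    c->stamp = changes;
    c->line = line;
    c->width = width;
}

/* The cache is usable only if nothing has changed since it was filled. */
int cache_valid(const struct line_cache *c, long line)
{
    return c->stamp == changes && c->line == line;
}
```

The benefit of this design is that the editing functions never need to know
which caches exist; they only increment the counter.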
MARKS AND THE CURSOR
Marks are places within the text. They are represented internally
as a long variable which is split into two bitfields: a line number
and a character index. Line numbers start with 1, and character
indexes start with 0.
Since line numbers start with 1, it is impossible for a set mark to
have a value of 0L. 0L is therefore used to represent unset marks.
When you do the "delete text" change, any marks that were part of
the deleted text are unset, and any marks that were set to points
after it are adjusted. Similarly, marks are adjusted after new text
is inserted.
The cursor is represented as a mark.
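A minimal C sketch of this representation follows. The macro names and the
10-bit width of the character index are assumptions made for the example;
Elvis' own definitions may use different names and a different split. The
adjustment function handles only the simple case of deleting whole lines.

```c
typedef long MARK;

#define IDX_BITS    10                  /* assumed width of the index field */
#define MARK_UNSET  0L                  /* line numbers start at 1, so 0L is free */

#define MARK_MAKE(line, idx) ((MARK)(((long)(line) << IDX_BITS) | (idx)))
#define MARK_LINE(m)         ((long)((m) >> IDX_BITS))
#define MARK_IDX(m)          ((int)((m) & ((1L << IDX_BITS) - 1)))

/* Adjust mark m after lines from..to (inclusive) have been deleted. */
MARK adjust_after_delete(MARK m, long from, long to)
{
    long line = MARK_LINE(m);

    if (m == MARK_UNSET || line < from)
        return m;                       /* before the deletion: unchanged */
    if (line <= to)
        return MARK_UNSET;              /* inside the deletion: unset */
    /* after the deletion: move up, keeping the character index */
    return m - ((to - from + 1) << IDX_BITS);
}
```

Because the line number occupies the high bits, two marks can be compared
with the ordinary < and > operators to see which comes first in the text.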
EX COMMAND INTERPRETATION
EX commands are parsed, and the command name is looked up in an array
of structures which also contain a pointer to the function that
implements the command, and a description of the arguments that the
command can take. If the command is recognized and its arguments
are legal, then the function is called.
Each function performs its task; this may cause the cursor to be moved
to a different line, or whatever.
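A table-driven dispatcher of this general shape can be sketched in C as
follows. The command set, the argument flags, and every name in the example
are hypothetical; the real table in Elvis is larger and describes arguments
in more detail.

```c
#include <string.h>

#define ARG_FILE 0x01       /* command accepts a file name (hypothetical) */
#define ARG_BANG 0x02       /* command accepts a trailing '!' (hypothetical) */

struct excmd {
    const char *name;               /* command name */
    int (*fn)(const char *args);    /* function that implements it */
    int argflags;                   /* which arguments are legal */
};

/* Stand-in implementations that just return a distinct code. */
static int cmd_write(const char *args) { (void)args; return 1; }
static int cmd_quit(const char *args)  { (void)args; return 2; }

static const struct excmd cmdtab[] = {
    { "write", cmd_write, ARG_FILE | ARG_BANG },
    { "quit",  cmd_quit,  ARG_BANG },
    { NULL,    NULL,      0 }
};

/* Look the command up, check its arguments, and call its function.
 * Returns the function's result, or -1 if the command is unknown
 * or was given an argument that it does not accept. */
int do_ex(const char *name, const char *args)
{
    const struct excmd *c;

    for (c = cmdtab; c->name != NULL; c++) {
        if (strcmp(c->name, name) == 0) {
            if (args[0] != '\0' && !(c->argflags & ARG_FILE))
                return -1;
            return c->fn(args);
        }
    }
    return -1;
}
```

Adding a command then means adding one function and one table row; the
parser itself does not change.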
SCREEN CONTROL
The screen is updated via a package which looks like the "curses"
library, but which is actually implemented in a simpler, faster way.
Most curses operations are implemented as macros which copy characters
into a large I/O buffer, which is then written with a single large
write() call as part of the refresh() operation.
The functions which modify text remember where text has been modified;
the screen redrawing function needs this information to help it reduce
the amount of text that is redrawn each time.
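The buffering idea can be sketched in C as shown below. The names
(buf_addch, buf_addstr, buf_refresh) and the 4096-byte buffer are
assumptions for this example and are not the names used by Elvis' own
curses-like package.

```c
#include <unistd.h>

#define OUTSIZE 4096                    /* assumed size of the I/O buffer */

static char outbuf[OUTSIZE];
static char *outptr = outbuf;           /* next free byte in outbuf */

/* Send everything buffered so far with a single write() call. */
static void buf_refresh(void)
{
    if (outptr > outbuf) {
        ssize_t n = write(1, outbuf, (size_t)(outptr - outbuf));
        (void)n;                        /* errors are ignored in this sketch */
        outptr = outbuf;
    }
}

/* Append one character; a macro, so the common case costs no call. */
#define buf_addch(ch) \
    do { \
        if (outptr == outbuf + OUTSIZE) \
            buf_refresh(); \
        *outptr++ = (ch); \
    } while (0)

/* Append a NUL-terminated string. */
static void buf_addstr(const char *s)
{
    while (*s != '\0') {
        buf_addch(*s);
        s++;
    }
}
```

Nothing reaches the terminal until buf_refresh() runs, so a whole screen
update costs one system call instead of one per character.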
------------------------------- regexp -----------------------------
Regular Expressions
Syntax
The code for handling regular expressions is derived from Henry Spencer's
regexp package. However, I have modified the syntax to resemble that of
the real vi.
ELVIS' regexp package treats the following one- or two-character strings
(called meta-characters) in special ways:
\( \) Used to control grouping
^ Matches the beginning of a line
$ Matches the end of a line
\< Matches the beginning of a word
\> Matches the end of a word
. Matches any single character
[ ] Matches any single character inside the brackets
* The preceding may be repeated 0 or more times
+ The preceding may be repeated 1 or more times
? The preceding is optional
\| Separates two alternatives
Anything else is treated as a normal character which must match exactly.
The special strings may all be preceded by a backslash to force them to
be treated normally.
For example, "\(for\|back\)ward" will find "forward" or "backward", and
"\<text\>" will find "text" but not "context".
Options
ELVIS has two options which affect the way regular expressions are used.
These options may be examined or set via the :set command.
The first option is called "[no]magic". This is a boolean option, and it is
"magic" (TRUE) by default. While in magic mode, all of the meta-characters
behave as described above. In nomagic mode, only ^ and $ retain their
special meaning.
The second option is called "[no]ignorecase". This is a boolean option, and
it is "noignorecase" (FALSE) by default. While in ignorecase mode, the
searching mechanism will not distinguish between an uppercase letter and its
lowercase form. In noignorecase mode, uppercase and lowercase are treated
as being different.
Also, the "[no]wrapscan" option affects searches.
Substitutions
The :s command has at least two arguments: a regular expression, and a
substitution string. The text that matched the regular expression is
replaced by text which is derived from the substitution string.
Most characters in the substitution string are copied into the text literally,
but a few have special meaning:
& Causes a copy of the original text to be inserted
\1 Inserts a copy of that portion of the original text which
matched the first set of \( \) parentheses.
\2 - \9 Does the same for the second (etc.) pair of \( \).
These may be preceded by a backslash to force them to be treated normally.
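The expansion rules above can be sketched in C as follows. The function
name expand_subst and the convention that groups[0] holds the whole matched
text while groups[1] through groups[9] hold the \( \) portions are
assumptions for this example; it does not reproduce the code in Elvis.

```c
#include <string.h>

/* Append up to n bytes of s at out[pos], always leaving room for a NUL. */
static size_t put(char *out, size_t size, size_t pos, const char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n && pos + 1 < size; i++)
        out[pos++] = s[i];
    return pos;
}

/* Expand a substitution string into out[]; returns the result length. */
size_t expand_subst(const char *tmpl, const char *const groups[10],
                    char *out, size_t size)
{
    size_t pos = 0;

    if (size == 0)
        return 0;
    for (; *tmpl != '\0'; tmpl++) {
        const char *src = tmpl;         /* default: copy this char literally */
        size_t n = 1;

        if (*tmpl == '&') {             /* whole matched text */
            src = groups[0];
            n = src ? strlen(src) : 0;
        } else if (*tmpl == '\\' && tmpl[1] != '\0') {
            tmpl++;
            if (*tmpl >= '1' && *tmpl <= '9') {
                src = groups[*tmpl - '0'];  /* text matched by a \( \) pair */
                n = src ? strlen(src) : 0;
            } else {
                src = tmpl;             /* escaped char, taken literally */
            }
        }
        pos = put(out, size, pos, src, n);
    }
    out[pos] = '\0';
    return pos;
}
```

For example, if the whole match is "forward" and the first group matched
"for", the substitution string "[&] \1-\&" expands to "[forward] for-&".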