E L V I S Elvis is a clone of vi/ex. It boasts about 97% of the vi commands and about 92% of the ex commands. It is generally quite fast. It can edit files that are larger than a single process' data space. Elvis also has a few features that the real vi lacks. Several related programs are included, too. Elvis runs under BSD/SysV UNIX, MINIX-ST, and MINIX-PC. Elvis should be fairly easy to port to any OS that has the termcap functions. It cannot be compiled on MINIX using the standard ACK compiler because asld runs out of memory while linking. If you must recompile it, you will have to use some other C compiler and cross compiler on MS-DOS. This problem will be remedied in MINIX 2.0 when a new ACK compiler becomes available. ------------------------------- COPYING ----------------------------- The copyright of Elvis is controlled by the author, Steve Kirkendall. (That's me. Hello!) This document describes the restrictions on how Elvis can be distributed. The restrictions are, basically, that anybody can make & distribute copies, and that nobody except me can say otherwise. You can make any number of copies of Elvis, and distribute it in either source code form or executable form, either alone or as part of some other package, to anybody you wish, provided that the following conditions are met: * If you are distributing Elvis as part of a package, then you must either distribute the whole package for free, or you must be willing to distribute Elvis separately for free to anybody who wants it. You can charge up to $6 for disks and labor, and still say the software is free. * If you are distributing Elvis via a BBS or a network, then you may charge for connection time & subscription fees, but there must not be any surcharge for downloading Elvis. * This document must be included with each copy that you distribute. * The name "Elvis" cannot be changed -- not even in advertisements. In particular, I don't want anybody claiming that Elvis is the real "vi" from BSD. It's okay to call it "Elvis - a clone of vi", though. * This agreement cannot be modified without my permission. My address is: Steve Kirkendall 16820 SW Tallac Way Beaverton, OR 97006 My phone# is: (503) 642-9905 Email: kirkenda@cs.pdx.edu ...uunet!tektronix!psueea!eecs!kirkenda ------------------------------- Termcap ----------------------------- REQUIRED NUMERIC CAPABILITIES :co#: number of columns on the screen (characters per line) :li#: number of lines on the screen REQUIRED STRING CAPABILITIES :ce=: clear to end-of-line :cl=: home the cursor & clear the screen :cm=: move the cursor to a given row/column :up=: move the cursor up one line BOOLEAN CAPABILITIES :am: auto margins - wrap when a char is written to the last column? :pt: physical tabs? OPTIONAL STRING CAPABILITIES :al=: insert a blank row on the screen :dl=: delete a row from the screen :cd=: clear to end of display :ei=: end insert mode :ic=: insert a blank character :im=: start insert mode :dc=: delete a character :sr=: scroll reverse (insert a row at the top of the screen) :vb=: visible bell OPTIONAL STRINGS RECEIVED FROM THE KEYBOARD :kd=: sequence sent by the <down arrow> key :kl=: sequence sent by the <left arrow> key :kr=: sequence sent by the <right arrow> key :ku=: sequence sent by the <up arrow> key :PU=: sequence sent by the <PgUp> key :PD=: sequence sent by the <PgDn> key :HM=: sequence sent by the <Home> key :EN=: sequence sent by the <End> key OPTIONAL CAPABILITIES THAT DESCRIBE CHARACTER ATTRIBUTES :so=: :se=: start/end standout mode (We don't care about :sg#:) :us=: :ue=: start/end underlined mode :VB=: :Vb=: start/end boldface mode :as=: :ae=: start/end alternate character set (italics) :ug#: visible gap left by :us=:, :ue=:, :VB=:, or :Vb=: ------------------------------- Cflags ----------------------------- Elvis uses many preprocessor symbols to control compilation... -DM_SYSV If defined, then Elvis uses SysV ioctl() calls to control the tty; normally it uses V7/BSD/MINIX ioctl() calls. -DDATE=\'\"`date`\"\' DATE should be defined to be a string constant. It is printed by the :version command as the compilation date of the program. It is only used in cmd1.c, and even there you may leave it undefined without causing an urp. The form shown above only works if you use "eval". See the Makefile. -DTMPNAME=\"/tmp/vi%04x%04x\" This allows you to use a different name for Elvis' temporary files. The default value is defined near the top of vi.h, so you only need to use this on the commandline if the default name is wrong. It should contain two "%d" or "%x" formats, which are replaced by the inode number and device major/minor number. -DCUTNAME=\"/tmp/cut%04x%04x\" This is similar to TMPNAME, but is used to generate names for old temp files which are being kept around because they are refered to by cut buffers. It should contain two "%d" or "%x" formats, which are replaced by the Elvis' getpid() and file-descriptor numbers. -DCRUNCH This option causes some large & often-used macros to be replaced by equivelent functions. It reduces the size of the ".text" segment by about 4K, and you don't sacrifice any features -- just a little speed. -DSET_NOCHARATTR Permanently disables the charattr option. This reduces the size of your ".text" segment by about 850 bytes. -DNO_RECYCLE Normally, Elvis will recycle space from the tmp file which contains totally obsolete text. This flag disables this recycling. Without recycling, the ".text" segment is about 1K smaller that it would otherwise be, but the tmp file grows much faster. If you have a lot of free space on your harddisk, but Elvis is too bulky to run with recycling, then try it without recycling. -DDEBUG This adds the ":debug" and ":validate" commands, and also adds many internal consistency checks. It increases the size of the ".text" segment by about 5K. ------------------------------- Mods ----------------------------- A few ideas for modifications... MODE INDICATORS Elvis always reads keystrokes via the getkey() function. This function is called with an argument which describes the context in which the character will be processed: WHEN_EX - called from the vgets() function to read a single line of text. Either EX command mode, EX text entry mode, or VI while reading a search string. WHEN_VICMD - VI mode, getting a command character. WHEN_VIINP - VI's input mode. WHEN_VIREP - VI's replace mode (the R command). 0 - misc times, e.g. "HIT A KEY TO CONTINUE" So, the getkey() function would be a good place to add some kind of mode indicator. Like, you could change the shape of the cursor for input mode vs. VI command mode. ARROW KEYS IN INPUT MODE The arrow keys are not normally mapped during input mode. It might be fun, though, to map them to ESC + [hjkl] + a. This way, if you hit an arrow key while in input mode, elvis would take you out of input mode momentarily, move the cursor, and drop you back into input mode. Neat, huh? Something similar could be done with replace mode. WRAP LONG LINES (INSTEAD OF SCROLLING SIDEWAYS) This would mostly require changes to redraw(), mark2phys(), and drawtext(). All of these are in the file "redraw.c". ADD MORE SUPPORT FOR NON-ASCII CHARACTER SETS Elvis displays 8-bit character sets just fine, but is a bit weak in the input and search departments. For input, something similar to :map would be nice. Actually, :abbr is a little closer. How about ":digraph" to map a specified pair of ASCII characters into a single non-ASCII character? REWRITE THE REGULAR EXPRESSION PARSER AND THE SEARCHING CODE The current doesn't allow you to search for non-ASCI characters. It could probably be made smaller & faster. Suggestions are welcome. ------------------------------- internal ----------------------------- INTERNAL TEXT REPRESENTATION When elvis starts up, the file is copied into a temporary file. Small amounts of extra space are inserted into the temporary file to insure that no text lines cross block boundaries; this speeds up processing. The "extra space" is filled with NUL charcters; the input file must not contain any NULs, to avoid confusion. The first block of the temporary file is an array of shorts which describe the order of the blocks; i.e. header[1] is the block number of the first block, and so on. This limits the temporary file to 512 active blocks, so the largest file you can edit is about 400K bytes long. I hope that's enough! When blocks are altered, they are rewritten to a *different* block in the file, and the in-core version of the header block is updated accordingly. The in-core header block will be copied to the temp file immediately before the next change... or, to undo this change, swap the old header (from the temp file) with the new (in-core) header. Elvis maintains another in-core array which contains the line-number of the last line in every block. This allows you to go directly to a line, given its line number. IMPLEMENTATION OF EDITING There are three basic operations which affect text: * delete text - delete(from, to) * add text - add(at, text) * yank text - cut(from, to) To yank text, all text between two text positions is copied into a cut buffer. The original text is not changed. To copy the text into a cut buffer, you need only remember which physical blocks that contain the cut text, the offset into the first block of the start of the cut, the offset into the last block of the end of the cut, and what kind of cut it was. (Cuts may be either character cuts or line cuts; the kind of a cut affects the way it is later "put".) This is implemented in the function cut(). To delete text, you must modify the first and last blocks, and remove any reference to the intervening blocks in the header's list. The text to be deleted is specified by two marks. This is implemented in the function delete(); To add text, you must specify the text to insert (as a NUL-terminated string) and the place to insert it (as a mark). The block into which the text is to be inserted may need to be split into as many as four blocks, with new intervening blocks needed as well... or it could be as simple as modifying the block. This is implemented in the function add(). Other interesting functions are paste() (to copy text from a cut buffer into the file), modify() (for an efficient way to implement a combined delete/add sequence), and input() (to get text from the user & insert it into the file). When text is modified, an internal file-revision counter, called "changes", is incremented. This counter is used to detect when certain caches are out of date. (The "changes" counter is also incremented when we switch to a different file, and also in one or two similar situations -- all related to invalidating caches.) MARKS AND THE CURSOR Marks are places within the text. They are represented internally as a long variable which is split into two bitfields: a line number and a character index. Line numbers start with 1, and character indexes start with 0. Since line numbers start with 1, it is impossible for a set mark to have a value of 0L. 0L is therefore used to represent unset marks. When you do the "delete text" change, any marks that were part of the deleted text are unset, and any marks that were set to points after it are adjusted. Similarly, marks are adjusted after new text is inserted. The cursor is represented as a mark. EX COMMAND INTERPRETATION EX commands are parsed, and the command name is looked up in an array of structures which also contain a pointer to the function that implements the command, and a description of the arguments that the command can take. If the command is recognized and its arguments are legal, then the function is called. Each function performs its task; this may cause the cursor to be moved to a different line, or whatever. SCREEN CONTROL The screen is updated via a package which looks like the "curses" library, but which is actually implemented in a simpler, faster way. Most curses operations are implemented as macros which copy characters into a large I/O buffer, which is then written with a single large write() call as part of the refresh() operation. The functions which modify text remember where text has been modified; the screen redrawing function needs this information to help it reduce the amount of text that is redrawn each time. ------------------------------- regexp ----------------------------- Regular Expressions Syntax The code for handling regular expressions is derived from Henry Spencer's regexp package. However, I have modified the syntax to resemble that of the real vi. ELVIS' regexp package treats the following one- or two-character strings (called meta-characters) in special ways: \( \) Used to control grouping ^ Matches the beginning of a line $ Matches the end of a line \< Matches the beginning of a word \> Matches the end of a word . Matches any single character [ ] Matches any single character inside the brackets * The preceding may be repeated 0 or more times + The preceding may be repeated 1 or more times ? The preceding is optional \| Separates two alternatives Anything else is treated as a normal character which must match exactly. The special strings may all be preceded by a backslash to force them to be treated normally. For example, "\(for\|back\)ward" will find "forward" or "backward", and "\<text\>" will find "text" but not "context". Options ELVIS has two options which affect the way regular expressions are used. These options may be examined or set via the :set command. The first option is called "[no]magic". This is a boolean option, and it is "magic" (TRUE) by default. While in magic mode, all of the meta-characters behave as described above. In nomagic mode, only ^ and $ retain their special meaning. The second option is called "[no]ignorecase". This is a boolean option, and it is "noignorecase" (FALSE) by default. While in ignorecase mode, the searching mechanism will not distinguish between an uppercase letter and its lowercase form. In noignorecase mode, uppercase and lowercase are treated as being different. Also, the "[no]wrapscan" option affects searches. Substitutions The :s command has at least two arguments: a regular expression, and a substitution string. The text that matched the regular expression is replaced by text which is derived from the substitution string. Most characters in the substitution string are copied into the text literally but a few have special meaning: & Causes a copy of the original text to be inserted \1 Inserts a copy of that portion of the original text which matched the first set of \( \) parentheses. \2 - \9 Does the same for the second (etc.) pair of \( \). These may be preceded by a backslash to force them to be treated normally.