v13i073: Split file based on patterns or line numbers

Rich Salz rsalz at bbn.com
Mon Feb 29 04:13:47 AEST 1988


Submitted-by: Gary Puckering <cognos!garyp>
Posting-number: Volume 13, Issue 73
Archive-name: slice

Slice splits up a file into lots of little files.  It reads its input
a line at a time, and starts a new output file when

	*  the input line matches a pattern, or
	*  there have been n lines written to the current output file.

You can use it to split a mailbox or an archive of news articles into
one article per file, for example.  In fact, you can do this with
about 5 lines of awk, but you run into problems with long lines (and
speed, if it bothers you!).
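
The splitting rule above can be sketched in a few lines of C.  This is only
an illustration, not code from slice.c: the helper name starts_new_slice and
its parameters are hypothetical, and it assumes POSIX <regex.h> rather than
the re_comp/regcmp routines the real program uses.  Note that slice itself
treats the two conditions as separate modes (-n cannot be combined with a
pattern); the sketch accepts either one.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not taken from slice.c: returns 1 when a new
 * output file should be started before writing `line`.
 * `re` may be NULL when splitting purely by line count;
 * `n` is 0 when splitting purely by pattern. */
int starts_new_slice(const regex_t *re, const char *line,
                     long written, long n)
{
    assert(line != NULL);
    if (re != NULL && regexec(re, line, 0, NULL, 0) == 0)
        return 1;                 /* the input line matches the pattern */
    if (n > 0 && written >= n)
        return 1;                 /* n lines already in the current file */
    return 0;
}
```

A caller would compile the pattern once with regcomp(), then ask this
question for every input line and reset its line counter whenever the
answer is 1.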

Slice was originally contributed by Russell Quinn as the program
"mailsplit".  It was subsequently revised by me and posted to
net.sources.  A patch for System V was also issued.  Since then, I've
made quite a few changes, enough so that this version is not 100%
compatible with the net.sources versions.  Also, I've not retested
this version on System V, so there may be some problems.

Slice allows multiple output formats to be specified (rather than
multiple input files).  This makes it possible to deposit the pieces
(slices!) into files named whatever you want.  For example:

     slice -f article -x -r README '^--* [Cc]ut' article.sh

will deposit everything up to the cut line into README and everything
after it into article.sh.  The -x option causes the matched line to be
excluded.  The -r option specifies the destination of rejected lines
(which in this example includes all lines appearing before the cut line).
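
The effect of -x and -r in that example can be pictured with the following
C sketch.  It is a simplification under stated assumptions: the function
name split_at_cut is hypothetical, the cut line is recognised by a plain
prefix comparison instead of a regular expression, only one cut line is
expected, and lines are assumed to fit in the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: send lines to `reject` until the first line that
 * starts with `cut`, drop that line, then send the rest to `out`.
 * Returns the number of lines rejected. */
long split_at_cut(FILE *in, FILE *reject, FILE *out, const char *cut)
{
    char buf[1024];
    long rejected = 0;
    int seen_cut = 0;

    assert(in && reject && out && cut);
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (!seen_cut && strncmp(buf, cut, strlen(cut)) == 0) {
            seen_cut = 1;          /* like -x: the matched line is excluded */
            continue;
        }
        if (seen_cut) {
            fputs(buf, out);       /* everything after the cut line */
        } else {
            fputs(buf, reject);    /* like -r: lines before the first match */
            rejected++;
        }
    }
    return rejected;
}
```

With README opened as the reject stream and article.sh as the output
stream, this reproduces the division described above for an article with
a single cut line.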

There are even options to make slicing mailboxes and files containing
shell scripts easier (-m and -s).  The mailbox option enables you to
easily reorder messages in your mailbox into proper date sequence:
just slice the mailbox with -m then cat the resulting files back
together.  This neat trick is accomplished by the use of substitution
parameters in the output pattern.  If an output pattern contains, say,
a #3 then the 3rd token from each matched line is substituted into the
file name.  Thus, to split a mailbox the default output pattern causes
each message to be put in a file whose name is constructed from the
date and time on the "From" line.
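
As a rough illustration of the #1 to #9 substitution, here is a small C
sketch.  It is not the mkname() routine from slice.c: the name
expand_tokens, the fixed 1024-byte working copy and the return convention
are all assumptions made for the example, and it ignores #f, #n, #$ and
the printf-style format codes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: copy `fmt` into `out`, replacing #1..#9 with the
 * corresponding space-separated token of `line`.  Returns 0 on success,
 * -1 if the result would not fit in `outsz` bytes. */
int expand_tokens(const char *fmt, const char *line, char *out, size_t outsz)
{
    char copy[1024];
    char *tok[9] = {0};
    size_t used = 0;
    int ntok = 0;

    assert(fmt && line && out && outsz > 0);

    /* strtok modifies its argument, so tokenise a private copy */
    snprintf(copy, sizeof copy, "%s", line);
    for (char *t = strtok(copy, " \n"); t && ntok < 9; t = strtok(NULL, " \n"))
        tok[ntok++] = t;

    for (const char *p = fmt; *p; p++) {
        const char *piece;
        size_t len;

        if (p[0] == '#' && p[1] >= '1' && p[1] <= '9') {
            /* a missing token expands to the empty string */
            piece = tok[p[1] - '1'] ? tok[p[1] - '1'] : "";
            len = strlen(piece);
            p++;                       /* skip the digit */
        } else {
            piece = p;                 /* ordinary character, copied as-is */
            len = 1;
        }
        if (used + len >= outsz)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Given the format "#3.c" and a matched line whose third token is "main",
the sketch produces the file name "main.c".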

There are some good examples in the man page.

Source, Makefile and manual entry enclosed.  To install, do the
following:

1:	Edit the Makefile: you'll need to alter the "R=/usr/local" if
	you don't want slice to live in /usr/local/bin.

2:	make slice

3:	have a play with it & satisfy yourself that it behaves reasonably

4:	make install

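"make install" will do a "$(MAKE) $(CLEAN)" afterwards.  If you don't
want to remove the binary, say

        CLEAN="" make -e install

at step 4.  The -e flag is needed so that the value of CLEAN in the
environment overrides the one set in the Makefile.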


--------------------- cut here ----------------------------------------
#!/bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #!/bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	slice.1
#	Makefile
#	opts.h
#	slice.c
#	CHANGES
# By:	Gary Puckering ()
export PATH; PATH=/bin:$PATH
echo shar: extracting "'slice.1'" '(10249 characters)'
if test -f 'slice.1'
then
	echo shar: over-writing existing file "'slice.1'"
fi
sed 's/^X//' << \SHAR_EOF > 'slice.1'
X.TH SLICE 1L "1987 September 14" "Cognos Inc."
X.SH NAME
Xslice \- split a file into pieces, by pattern or every n'th line
X.SH SYNOPSIS
X.B slice
X.tr _-
X[\|\fB_f\ \fIfilename\fP\|]
X[\|\fB_i\fP\fIn\fP\|]
X[\|\fB_a\fP\|]
X[\|\fB_x\fP\|]
X[\|\fB_r\fP\ \fIrejectfile\fP\|]
X[\|\fB_\&m\fP\||\ \fB_\&s\fP\||\ \fB_\&n\fP\fIn\fP\|]
X[\|\fIformat\fP\ .\|.\|.\|]
X.LP
X.sp
X.B slice
X[\|\fB_f\ \fIfilename\fP\|]
X[\|\fB_i\fP\fIn\fP\|]
X[\|\fB_a\fP\|]
X[\|\fB_x\fP\|]
X[\|\fB_r\fP\ \fIrejectfile\fP\|]
X\ \fB_\&e\ \fIexpression\fP\ 
X[\|\fIformat\fP\ .\|.\|.\|]
X.LP
X.sp
X.B slice
X[\|\fB_f\ \fIfilename\fP\|]
X[\|\fB_i\fP\fIn\fP\|]
X[\|\fB_a\fP\|]
X[\|\fB_x\fP\|]
X[\|\fB_r\fP\ \fIrejectfile\fP\|]
X\ \fIexpression\fP\ 
X[\|\fIformat\fP\ .\|.\|.\|]
X.tr
X.SH DESCRIPTION
X.I Slice
Xsplits a file into pieces (called \fIslices\fR) which are deposited
Xinto one or more output files.  The output files are named according
Xto the \fIformat\fR strings provided.  The input file is split
Xwhenever a pattern is matched or every \fIn\fR lines, depending on the
Xoptions selected.  Because some of the options are mutually exclusive,
Xthere are three forms of the command.
X.LP
XWhenever a pattern match is used to slice the file, lines occurring
Xbefore the first match are sent to the \fIreject\fR file (which is
X\fI/dev/null\fR by default).  Lines are also rejected when an output file
Xcannot be opened, for whatever reason.
X.LP
X.SH OPTIONS
X.LP
X.IP "\fB-f\fR \fIfilename\fR"
XInput for \fIslice\fR is taken from the named file rather than \fIstdin\fR.
X.IP "\fB-i\fP\fIn\fP"
XThe starting number for numbering output files generated by formats
Xcontaining "#\&n" (see \fISubstitutions\fR below).  The default
Xstarting number is one.
X.IP "\fB-a\fR"
XCauses the file to be split \fBafter\fR the line matching the pattern,
Xrather than before (as is normally the case).  This option will not
Xwork properly if an output format contains the substitution parameters
X"#\&1", "#\&2", etc.  (See \fIformat\fR below).
X.IP "\fB-x\fR"
XCauses the matched line to be excluded from the output files.  Handy
Xfor eliminating cut lines, etc.
X.IP "\fB-m\fR"
XUses the pattern "^From " to split the file.  This is convenient for
Xbreaking up a mailbox file.  It also changes the default output
Xpattern so that the output files are named according to the date and
Xtime of each message.  This makes it easy to sort the messages in a
Xmailbox into chronological order \- just slice the box and cat the
Xresulting files back together again.
X.IP "\fB-s\fR"
XUses the pattern "^#\&!\ */bin/sh" to split the file.  This is ideal for
Xbreaking up a mail or news article containing a Bourne shell script.
X.IP "\fB-n\fIn\fR"
XSplit the file every \fIn\fR lines.  In this case, no pattern matching 
Xis performed.  This is the behaviour of \fIsplit (1)\fR,
Xexcept that the default output filename format for
X\fIslice\fR is different.
X.IP "\fB-r\fR \fIrejectfile\fR"
XEstablishes the name of a file to which rejected lines are sent
X(default is \fI/dev/null\fR).  Lines appearing before the first matched
Xpattern are rejected.  Also, any time an output file cannot be opened,
Xsource lines are sent to the reject file instead.  Substitution
Xparameters are not allowed in the reject file name, but a leading "+"
Xis allowed (to indicate that the reject file should be opened for append).
XThe name \fIstderr\fR can also be used for the standard error file.
X.IP "\fB-e\fR \fIexpression\fR"
XUses the pattern specified by \fIexpression\fR to identify where the
Xfile is to be split.  The pattern may contain newlines (which match
Xthemselves).  See
X.I ed (1)
Xfor details of the regular expressions.
X.IP \fIformat\fR
XAll three forms of the command allow the specification of zero or more
X\fIformat\fR strings as non-flag parameters.  Formats are used to
Xgenerate output filenames.  You can provide as many as you like.
XEvery time a split is required, a format is used to generate the file
Xname.  If the name is the same as the last name generated from that
Xformat, a new format is selected.  If \fIslice\fR runs out of formats,
Xa warning will be issued (unless the last output file is \fIstdout\fR)
Xand the remaining lines will be sent to the reject file.  The format
Xstrings can contain substitution parameters.  
X.SH SUBSTITUTIONS
X.IP "#\&f"
XThe name of the input file, less any leading pathname, is substituted.
XIf the input file is \fIstdin\fR, the name \fIslice\fR is used.
X.IP "#\&n"
XA unique number, beginning with the value specified in the \fB-i\fR
Xoption, is substituted.  This number is incremented by 1 for each
Xoutput file produced by the current output format.  When an output
Xformat produces the same name twice, a new format is selected and
Xnumbering begins again with the initial value.
X.IP "#\&1, #\&2 ..."
XParameters of the form #\&1, #\&2, ... #\&9 are replaced by corresponding
Xtokens drawn from the source line which matched the slice pattern.
XFor example, if each procedure in a C program began with a comment
Xline of the following form:
X.sp
X\ \ \ \ /\&* Procedure:\ \ \ main */
X.br
X\ \ \ \ #\&1\ \ \ #\&2\ \ \ \ \ \ \ \ \ \ \ #\&3\ \ #\&4 
X.IP
Xthen parameter #\&3 could be used to extract the procedure name (in
Xthis case, "main").  If #\&3 appears in an output format, it will be
Xreplaced by this value.
X.IP "#$"
XThis parameter is used to select the last non-blank parameter on the
Xmatched line.  For example, if the matched line were:
X.sp
X\ \ \ \ \From garyp at cognos Tue Sep 15 15:08:23 EDT 1987
X.sp
Xthen "#$" would select "1987", the last token on the line.
X.SH FORMAT SPEC's
X.LP
XSubstitution parameters can be followed by an optional 
X.I printf 
Xstyle format specification.  They can be used to control the width of
Xthe field and formatting of a numeric token into a string.
X.LP
XIf a numeric format like %d is used, the token is assumed to contain
Xan integer value and the function \fIatoi\fR is used to convert it.
XThe resulting value is then formatted with \fIsprintf\fR using the
Xformat code.
X.LP
XFor example, the output format zzz.#\&n%02d will produce files of the
Xform zzz.01, zzz.02, zzz.03, etc.  The #\&n is replaced with the current
Xcount value, which is then formatted by %02d (signed decimal integer,
X2 digits wide, print leading zeros).
X.LP
XFloating point format codes ("f", "e" and "g") are not allowed, nor is the
Xsingle-character code "c".
X.LP
XThere is a special format code (not one of the usual \fIprintf\fR set)
Xfor conversion of 3 character month names to numbers.  The code "m"
Xcan be used to convert "Jan", "Feb", ...  "Dec" to the numbers 1, 2,
X\&... 12.  The numeric month value is then formatted as if the format
Xstring were of the %d type.  For example, the code %02m will result in
Xthe month name token "Aug" being transformed into the string "08".
X.SH DEFAULT FORMATS
XIf no output formats are supplied, the following default format is used:
X.sp .5v
X\ \ \ \ #\&f:#\&n%03d
X.sp .5v
XThis results in files having names like \fIfilename\fR:001, 
X\fIfilename\fR:002, \fIfilename\fR:003 and so on.  
XThe default format was chosen because the resulting files are listed 
Xin numerical order by
X.I ls
Xor by
X.I echo *
Xwhich is sometimes useful.
X.LP
XIf the \fB-m\fR option is used, the default output format is:
X.sp .5v
X\ \ \ \ #\&f:#\&$-#\&4%02m-#\&5%02d.#\&6
X.sp .5v
Xwhich results in files having names like \fIfilename\fR:1987-01-25.08:15:30.
XThe date and time are taken from the "From" line of each message in
Xthe mailbox.  
X.LP
XYou can take advantage of this naming convention to reassemble the
Xmessages in chronological order by \fIcat\fR'ing the files back
Xtogether again using the command:
X.sp
X\ \ \ \ cat \fIfilename\fR:*\ \  >\fImailbox.new\fR
X.SH SPECIAL CONVENTIONS
X.LP
XNormally, \fIslice\fR opens each output file for "write".  By
Xprepending an output format string with "+" (a plus sign) the
Xdesignated output file will be opened for "append".  Thus, a filename
Xcan appear more than once.
X.LP
XIf a format string is simply a "+", then \fIstdout\fR is
Xused.  This is handy for piping one of the pieces to another
Xcommand.  
X.LP
XThe special format string "@" is used as an abbreviation for
X/dev/null.  This is handy for throwing away slices.
X.LP
X.SH EXAMPLES
XSplit up a mail folder into a series of files, one for each message,
Xnamed after the folder and date/time of each message.
X.sp
X	slice -m -f FOLDER
X.sp
XPipe a news article containing a shell script to slice, saving
Xeverything before the "#!/bin/sh" line to a file named HEADER and
Xpiping the script portion to \fIsh\fR to unshar it:
X.sp
X	| slice -s -r HEADER +\ \ | sh
X.sp
XSplit \fIstdin\fR at every line of dashes into the files 0, 1, 2, etc.:
X.sp
X	cat anyfile | slice -i0\ \  "^--* *$"\ \  #\&n
X.sp
XPipe the middle portion of a file to sh, keeping the head and tail in a
Xfile called README.  Exclude the cut lines:
X.sp
X	slice -f myfile -x -r README "^--* [Cc]ut "\ \ +\ \ +README\ \ | sh
X.sp
XKeep the middle portion of a file, discarding the head and tail:
X.sp
X	slice -f myfile -r @ "^--* [Cc]ut "\ \ middle @
X.sp
XSplit a C program which has an identifying comment before each
Xprocedure into separate files for each procedure (named after the
Xprocedure): 
X.sp
X	slice -f program.c "/[*] *Procedure: *.* *[*]/"\ \  #\&3.c
X.sp
X.SH BUGS
X.IP a) 4
XWatch out for filename expansion by the shell.  This could cause
X\fIslice\fR to interpret the extra filenames as output formats causing
Xfiles to be overwritten.  To be safe, do "set noclobber" in csh.
X.IP b) 4
XWhen the -a option is used and an output format contains token
Xsubstitutions (#\&1, #\&2, etc.) the wrong filenames are generated.  To
Xgenerate the correct filenames, either slice has to lookahead to find
Xthe next match line or it has to direct lines for the current slice
Xinto a temporary file until it finds the line matching the pattern.
X.SH DIAGNOSTICS
X``Internal Error'' indicates a bug in \fIslice\fR, and should be reported.
XExit status 1 indicates an error parsing options \- for example, if an unknown
Xflag was used.
XExit status 2 indicates a meaningless combination was detected and rejected.
XExit status 3 indicates a run-time problem \- for example, if a file couldn't
Xbe opened.
X.LP
XIf a reject file is not provided, a count of rejected lines is reported.
X.SH "SEE ALSO"
X.I cat (1),
X.I ed (1),
X.I ls (1),
X.I mail (1),
X.I split (1),
X.I mailsplit (1L),
X.I atoi (3),
X.I printf (3).
SHAR_EOF
if test 10249 -ne "`wc -c 'slice.1'`"
then
	echo shar: error transmitting "'slice.1'" '(should have been 10249 characters)'
fi
echo shar: extracting "'Makefile'" '(1301 characters)'
if test -f 'Makefile'
then
	echo shar: over-writing existing file "'Makefile'"
fi
sed 's/^X//' << \SHAR_EOF > 'Makefile'
X# Makefile for slice
X#
X# Originally contributed as mailsplit, written by:
X#   R E Quin, October 1986 University of Warwick (UK) Computer Science
X#   warwick!req     +44 203 523193
X#
X# Modified and recontributed by:
X#   Gary Puckering        3755 Riverside Dr.
X#   Cognos Incorporated   Ottawa, Ontario
X#   (613) 738-1440        CANADA  K1G 3N3
X#
X# This makefile is intended for the sys5 Augmented make.
X# 
XMAKE=make 
XCLEAN=clean 
XCC=cc 
X
X# R is the root of the filesystem -- i.e. where to install things.
X# PROG is what to make; BINDIR is where to put it.
X# The binaries are installed in $(BINDIR).
XR=/usr/local
XBINDIR=$R/bin
XMANDIR=$R/man/man1
XPROG=slice 
XMANPAGE=slice.1
X
X# HACKS are for -DBUGFIX style things.
XHACKS=
X
X# For BSD 4.2 or 4.3
XUSG=
X
X# For USG System V version
X# USG=-DUSG -lPW
X
XCFLAGS=-O $(HACKS) $(USG)
X
X
X# "make install " does a $(MAKE) $(CLEAN) at the end, so you can say
X# CLEAN=  make -e install
X# if you don't want to remove the garbage at the end, for example.
X# This is useful primarily for testing the install: entry!
X
Xall: $(PROG)
X 
Xslice: opts.h slice.o
X	$(CC) $(CFLAGS) -o $(PROG) slice.o
X 
Xinstall: slice slice.1
X	chmod a+x $(PROG)
X	/bin/cp $(PROG) $(BINDIR)
X	chmod a+r $(MANPAGE)
X	/bin/cp $(MANPAGE) $(MANDIR)
X	$(MAKE) $(CLEAN)
X 
Xclean: 
X	rm -rf core *.o $(PROG) a.out
SHAR_EOF
if test 1301 -ne "`wc -c 'Makefile'`"
then
	echo shar: error transmitting "'Makefile'" '(should have been 1301 characters)'
fi
echo shar: extracting "'opts.h'" '(1008 characters)'
if test -f 'opts.h'
then
	echo shar: over-writing existing file "'opts.h'"
fi
sed 's/^X//' << \SHAR_EOF > 'opts.h'
X
X#define FALSE 0
X#define TRUE 1
Xtypedef int bool;
X
X#define EXIT_SYNTAX 1	/* syntax error parsing commandline options */
X#define EXIT_SEMANT 2	/* options are correct but meaningless */
X#define EXIT_RUNERR 3	/* error opening a file, for example */
X#define EXIT_INTERN 4	/* internal error -- bug!! */
X
X#define nextstr(s,count,array,failure)	\
X	{if (((count)<2) && !((array)[0][1])) {failure;}\
X	else {if ((array)[0][1]) { s = &((array)[0][1]); } \
X	      else {s = array[1]; --count; array++;}}}
X
X#define DFLTNAME "slice"	/* input filename (for stdin) */
X#define DFLTREJECT "/dev/null" /* default reject file */
X#define BUFLEN BUFSIZ	/* the maximum length of an input line (incl. "\n\0") */
X#define MAXFILENAMELEN BUFSIZ	/* longer than the longest possible file name */
X#define DFLTOUTNAME	"#f:#n%03d"	/* o/p file name format */
X#define MBOXFORMAT  "+#f:#$-#4%02m-#5%02d.#6"	/* default for mailbox */
X#define MAXPARM 9				/* maximum parm index value */
X#define PARMESCAPE '#'			/* parameter escape character */
SHAR_EOF
if test 1008 -ne "`wc -c 'opts.h'`"
then
	echo shar: error transmitting "'opts.h'" '(should have been 1008 characters)'
fi
echo shar: extracting "'slice.c'" '(15915 characters)'
if test -f 'slice.c'
then
	echo shar: over-writing existing file "'slice.c'"
fi
sed 's/^X//' << \SHAR_EOF > 'slice.c'
X/*
X *	Program name:  slice
X *
X *	Copyright (c) 1987 by Gary Puckering
X *
X *	This program may be freely used and/or modified, 
X *	with the following provisions:
X *	1. This notice and the above copyright notice must remain intact.
X *	2. Neither this program, nor any modification of it,
X *	   may be sold for profit without written consent of the author.
X *
X *	-----------------------------------------------------------------
X *
X *	This program allows you to cut a file into pieces, either at every
X *  n lines (like fsplit) or based on a pattern match.  Slices are sent
X *  to output files, which are named by providing a format string.  A
X *  file name may be a constant, or may contain substitution parameters,
X *  such as #f for the input file name or #1, #2, ... #9 for tokens
X *  1 through 9 on the line matching the pattern.
X *
X *	-----------------------------------------------------------------
X */
X
Xchar version[] = "@(#)slice.c	2.4";
X
X#include <stdio.h>
X#include <ctype.h>
X#include <string.h>
X#include <sys/file.h>
X
X#include "opts.h"				/* defines nextstr() etc */
X
Xbool exclude = FALSE;			/* exclude matched line from o/p files */
Xbool split_after = FALSE;		/* split after matched line */
Xbool m_flag = FALSE;			/* was -m option used */
X
XFILE *output = (FILE *) NULL;	/* fd of current output file */
XFILE *rejectfd = (FILE *) NULL;	/* fd of reject file */
X
Xint  n_format;					/* number of format strings */
Xint  filenumber = 0;			/* #n substitution for each file */
Xint  every_n_lines = 0;			/* split every n lines */
Xint rejectcnt = 0;				/* count of rejected lines */
X
Xchar inbuffer[BUFLEN];			/* input buffer */
Xchar *progname = "slice";		/* for error messages */
Xchar *pattern = (char *) NULL;	/* reg expr used to split file */
Xchar **format;					/* ptr for format strings */
Xchar *defaultfmt[] = {DFLTOUTNAME};	/* default format string */
Xchar *mboxformat[] = {MBOXFORMAT};	/* default format for mailboxes */
Xchar parmbuf[BUFLEN];			/* parameter buffer */
Xchar *parm[MAXPARM+1];			/* array of pointers to parms */
Xchar nullstring[1] = {""};		/* a null string */
Xchar *infile = (char *) NULL;	/* input file name */
Xchar rejectfile[MAXFILENAMELEN+2] = {DFLTREJECT}; /* reject file name */
X
X/* forward declarations */
XFILE * openfile();
XFILE * mkreject();
Xchar * mkname();
Xchar * rmpath();
Xchar   getfmt();
X
X
Xmain(argc, argv)
X	char *argv[];
X{
X	/* split files at points that match a given pattern */
X
X	/* initialise things */
X	char *buffer;
X	int i;
X	int getnum();		/* does more checking than atoi */
X	char *rmpath();    /* removes leading pathname from a filename */
X
X	for (i=0; i<=MAXPARM; i++) parm[i] = nullstring;
X
X	/* now remove possible leading pathname
X	 * (e.g. /usr/bin/slice is to report its errors as slice)
X	*/
X	progname = rmpath(argv[0]);
X
X
X	while (--argc) {
X	  if (**++argv == '-') {
X		switch(*++*argv) {
X			case 'a': {				/* split after pattern */
X				split_after = TRUE;
X				break;
X			}
X			case 'e': {				/* pattern (expression) */
X				++argv; argc--;
X				if (argc==0 || !**argv) {
X					error("Pattern after -e missing or null\n");
X					usage(1);
X				}
X				pattern = *argv;
X				break;
X			}
X			case 'm': {				/* mailbox pattern */
X				pattern = "^From ";
X				m_flag = TRUE;
X				break; 
X			}
X			case 's': {				/* shell pattern */
X				pattern = "^#! *\/bin\/sh";
X				break; 
X			}
X			case 'n': {				/* -n n_lines -- split every n lines */
X				nextstr(buffer,argc,argv,usage(2));
X				every_n_lines = getnum(buffer);
X				if (every_n_lines <= 0) {
X					error("-n: number must be at least 1\n");
X					exit(EXIT_SYNTAX);
X				}
X				break;
X			} 
X			case 'f': {
X				++argv; argc--;
X				if (argc==0 || !**argv) {
X					error("Filename after -f missing or null\n");
X					usage(1);
X				}
X				infile = *argv;
X				break;
X			}				
X			case 'r': {
X				++argv; argc--;
X				if (argc==0 ||!**argv) {
X					error("Filename after -r missing or null\n");
X					usage(1);
X				}
X				strcpy(rejectfile,*argv);
X				break;
X			}
X		    case 'i': {	/* -i initial_number */
X				nextstr(buffer,argc,argv,usage(2));
X				filenumber = getnum(buffer);
X				if (filenumber < 0) {
X			    	error("-i must be followed by a non-negative number\n");
X				    exit(EXIT_SYNTAX);
X				 }
X				filenumber--;	/* needs to be one less to start with */
X				break;
X		    }
X			case 'x': { /* exclude matched lines */
X				exclude = TRUE;
X				break;
X			}
X		    default: {
X				error("Unknown flag -%c\n", **argv);
X				usage(1);
X		    }
X		}			/* end switch */
X	  } else {	
X		if (!pattern) pattern = *argv;	/* first non-flag is pattern */
X		else break;						/* break while loop */
X	  }			/* end if */
X     }		/* end while */
X
X	 if (!argc) {
X		if (m_flag) {
X			format = mboxformat;
X		} else {
X			format = defaultfmt;
X		}
X		n_format = 1; 
X	 } else {
X		format = argv;
X		n_format = argc;
X	 }
X
X#ifdef DEBUG
X	printf("argc=%d\n",argc);
X	printf("format='%s'\n",*format);
X	printf("pattern='%s'\n",pattern);
X#endif
X
X	 if (!infile) split(stdin, DFLTNAME, pattern);
X	 else        fsplit(infile, pattern);
X
X     exit(0);
X}
X
X
X/* split a file that hasn't been opened yet */
X
Xfsplit(name, pat)
X     char *name;
X     char *pat;
X{
X     FILE *fd;
X
X     if (!name || !*name) {
X	  error("Can't split a file with an empty name\n");
X	  usage(2);
X     }
X
X     if ( (fd = fopen(name, "r")) == NULL) {
X	  error("Can't open %s\n", name);
X	  return;
X     }
X
X     (void) split(fd, name, pat);
X
X     if (fclose(fd) == EOF) {	/* something's gone wrong */
X	  error("Can't close %s -- giving up\n", name);
X	  exit(EXIT_RUNERR);
X     }
X}
X
X
X/* Split a file that's already been opened */
X
Xsplit(input, name, pattern)
X     FILE *input;		/* fd of input file */
X     char *name;		/* input filename */
X     char *pattern;		/* pattern used to split file */
X{
X
X#ifndef USG
X     extern char *re_comp();     /* compile string into automaton */
X     extern int   re_exec();	/* try to match string */
X     int reg_status = 0;	/* regular expression status */
X     char *errmessage;
X#define REMATCH 1
X#define RENOMATCH 0
X#define REFAULT -1
X#define match(expr) ((expr) == REMATCH)
X#else
X     extern char *regcmp();	/* compile string into automaton */
X     extern char *regex();	/* match string with automaton */
X     char *reg_status;		/* regular expression status */
X     char *rex;
X#define match(expr) ((expr) != NULL)
X#endif
X
X	 char *fname;
X     int line = 0;
X
X	 rejectcnt = 0;
X
X	 if (split_after && exclude) {
X	  error("Can't specify both -a and -x\n");
X	  usage(2);
X	 }
X
X	 if (every_n_lines && exclude) {
X	  error("Can't specify both -n and -x\n");
X	  usage(2);
X	 }
X
X	 if (every_n_lines && split_after) {
X	  error("Can't specify both -n and -a\n");
X	  usage(2);
X	 }
X
X	 if (every_n_lines && pattern) {
X	  error("Can't specify both -n and pattern\n");
X	  usage(2);
X	 }
X
X     if (!every_n_lines && (!pattern || !*pattern)) {
X	  error("Can't match an empty pattern\n");
X	  usage(2);
X     }
X
X#ifndef USG
X     if (!every_n_lines && (errmessage = re_comp(pattern)) != NULL) {
X	  error("Error in pattern <%s>: %s\n", pattern, errmessage);
X	  exit(EXIT_RUNERR);
X     }
X     /* errmessage is NULL here */
X#else
X     if (!every_n_lines && (rex = regcmp(pattern,(char *)0)) == NULL) {
X	  error("Error in pattern <%s>\n", pattern);
X	  exit(EXIT_RUNERR);
X     }
X     /* rex is pointer to compiled expression.... */
X#endif
X
X	/* if split after mode, open file at start */
X	if (split_after) {
X		fname = mkname(name);
X		output = openfile(fname);
X	}
X	
X     /* the -2 to fgets is because of the null and \n appended */
X     while (fgets(inbuffer, BUFLEN - 2, input) != NULL) {
X
X	  if ((every_n_lines > 0 && (++line == every_n_lines)) || 	/* nth line */
X	     (!every_n_lines &&
X#ifndef USG
X	     ( match(reg_status = re_exec(inbuffer) ) ) ) ) { 		/* matches pat */
X#else
X	     ( match(reg_status = regex(rex, inbuffer) ) ) ) ) { 		/* matches pat */
X#endif
X
X			if (split_after) putbuff(inbuffer,output);
X
X			if (!every_n_lines) get_parms(inbuffer);
X			
X			/* close the current file */
X			if (output && output != stderr && 
X						  output != stdout &&
X						  output != rejectfd) {
X				if (fclose(output) == EOF) {
X					error("Can't close output file\n");
X					exit(EXIT_RUNERR);
X				}
X			}
X
X			fname = mkname(name);
X
X			if (*fname) {
X				/* open a new file */
X				output = openfile(fname);
X			} else {
X				/* no filename to open, so use reject file */
X				error("Insufficient formats -- remainder rejected\n");
X				output = (FILE *) NULL;
X			}
X
X			line = 0;  /* reset input line count */
X
X			/* if matched lines are excluded, skip the putbuff */
X			if (exclude && match(reg_status)) continue;
X
X			/* if file is to be split after pattern, put already done */
X			if (split_after && match(reg_status)) continue;
X#ifndef USG
X	  } else {
X	  		if (reg_status == REFAULT) {	/* the re_exec failed */
X				error("Internal error trying to match <%s> to <%s>\n",pattern, inbuffer);
X				exit(EXIT_INTERN);
X			}
X#endif
X	  }  /* end match pattern test  */
X
X	  putbuff(inbuffer, output);		/* now put line out */
X
X      }  /* end while */
X
X	  if (rejectcnt && strcmp(rejectfile,"/dev/null")==0) {
X	  	error("%d lines rejected to /dev/null\n",rejectcnt);
X	  }
X	  
X      return (filenumber == -1);	/* exit status for main */
X}
X
X
X/* Make a reject file */
X
XFILE *
Xmkreject()
X{
X	if (rejectfd) return(rejectfd);	/* if there's already one, don't bother */
X
X	if ( strcmp(rejectfile,"stderr")==0 ) 
X			rejectfd = stderr;
X	else	rejectfd = openfile(rejectfile);
X
X	if (!rejectfd) {
X		error("Cannot open reject file %s\n",rejectfile);
X		exit(EXIT_RUNERR);
X	}
X
X	return (rejectfd);
X}
X
X
X/* Open an output file (or the reject file) */
X
X/* If filename starts with '+' open for append */
X/* If filename is '@' use /dev/null */
X
XFILE *
Xopenfile(fname)
X	char *fname;				/* file to be opened */
X{
X	FILE *fd = (FILE *) NULL;
X	bool exists = FALSE;
X
X	switch (fname[0]) {
X		case '+': {
X			fname++;
X			/* check for output file = input file */
X			if (infile && (strcmp(fname,infile)==0) ) {
X				error("Output file %s same as input file -- slice rejected\n",fname);
X				break;
X			}
X			if (fname[0]==NULL) {
X				fd = stdout;
X				break;
X			} else {
X				if ((fd = fopen(fname, "a")) == NULL) {
X					error("Can't open output file %s for append -- slice rejected\n", fname);
X					break;
X				}
X			}
X			break;
X		}
X		case '@': {
X			fname++;
X			if (fname[0]==NULL) {
X				if ((fd = fopen("/dev/null", "w")) == NULL) {
X					error("Can't open output file /dev/null -- slice rejected\n");
X					break;
X				}
X				break;
X			} else {
X				fname--;
X				/* fall through to process as normal filename */
X			}
X		}
X		default: {
X			if (access(fname,F_OK)==0) exists = TRUE;
X			if ((fd = fopen(fname, "w")) == NULL) {
X				error("Can't open output file %s for write -- slice rejected\n", fname);
X				break;
X			}
X			if (exists && strcmp(fname,"/dev/null")!=0) {
X				error("File %s overwritten\n",fname);
X			}
X		}
X	}
X	return (fd);
X}
X
X
X/* Make a new file name using a format string */
X
Xchar *
Xmkname(name)
X	char *name;		/* file name for #f substitution */
X{
X	static char fnambuf[MAXFILENAMELEN + 2]; /* +1 for null, +1 for overflow */
X	static char lastname[MAXFILENAMELEN + 2];
X
X	int i, j;
X	char *p, *q, *fn;
X	char fmtcode;
X	char fmt[MAXFILENAMELEN];
X	char tempbuf[MAXFILENAMELEN];
X
X  do_format:
X	for (p=(*format), q=fnambuf; *p; p++) {
X		if (*p != PARMESCAPE) { 
X			*q = *p; 
X			q++;
X		} else {
X			*q = NULL;
X			switch (*++p) {
X				case PARMESCAPE: {
X					*q = PARMESCAPE;
X					q++;
X					break;
X				}
X				case 'f': {
X					fn = rmpath(name);
X					strcat(q,fn);
X					q += strlen(fn);
X					break;
X				}
X				case 'n': {
X					++filenumber;
X					if (*(p+1) == '%') {
X						p++;
X						fmtcode = getfmt(fmt,p);
X						p += strlen(fmt) - 1;
X						sprintf(tempbuf,fmt,filenumber);
X					} else {
X						sprintf(tempbuf,"%d",filenumber);
X					}
X					strcat(q,tempbuf);
X					q += strlen(tempbuf);
X					break;
X				}
X				case '1':
X				case '2':
X				case '3':
X				case '4':
X				case '5':
X				case '6':
X				case '7':
X				case '8':
X				case '9': 
X				case '$': {
X					if (*p == '$') {
X						i = lastparm();
X					} else {
X						i = (*p) - '1';
X					}
X					if (*(p+1) == '%') {
X						p++;
X						fmtcode = getfmt(fmt,p);
X						p += strlen(fmt) - 1;
X						if (fmtcode != 's') {
X							if (fmtcode == 'm') {
X								j = mtoi(parm[i]);
X							} else {
X								j = atoi(parm[i]);
X							}
X							sprintf(tempbuf,fmt,j);
X						} else {
X							sprintf(tempbuf,fmt,parm[i]);
X						}
X					} else {
X						strcpy(tempbuf,parm[i]);
X					}
X					strcat(q,tempbuf);
X					q += strlen(tempbuf);
X					break;
X				}
X				default: {
X					error("Invalid substitution #%c in format '%s'\n",*p,*format);
X					exit(EXIT_RUNERR);
X				}
X			}	/* end switch */
X		}	/* end if-else */
X	}	/* end for */
X	
X	*q = NULL;
X
X	/* if name is same, try next format */
X	if (strcmp(fnambuf,lastname)==0) {
X		if (n_format>1) {	/* must be a format left to try */
X			format++;
X			--n_format;
X			filenumber=0;
X			lastname[0] = NULL;
X			goto do_format;
X		} else {			/* we've run out of formats */
X			fnambuf[0] = NULL;			
X		}
X	}
X
X	if (fnambuf[0]) strcpy(lastname,fnambuf);
X
X	return(fnambuf);
X
X} /* end routine */
X
X
X
X/* Get a printf-style format string from within an output format */
X
Xchar
Xgetfmt(fmt,p)		/* returns last character of string (format code) */
X	char *fmt;		/* target of format string */
X	char *p;		/* p should point to the '%' starting the format */
X{
X	char *q;
X	char fmtcode;
X
X	if (*p != '%') {
X		error("Internal error -- getfmt called when not pointing at %%\n");
X		exit(EXIT_RUNERR);
X	}
X	q = strpbrk(p,"mduxhocsfeg");  /* 'm' is a special extension */
X	
X	if (!q) {
X		error("Can't find end of format spec\n");
X		exit(EXIT_RUNERR);
X	}
X
X	fmtcode = *q;
X	strncpy(fmt,p,(q-p+1));
X	fmt[q-p+1] = '\0';	/* strncpy does not terminate the copy */
X
X	switch (*q) {
X		case 'h':
X		case 'c':
X		case 'f':
X		case 'e':
X		case 'g': {
X			error("Format '%s' is not supported\n",fmt);
X			exit(EXIT_RUNERR);
X		}
X		case 'm': {
X			fmt[strlen(fmt) - 1] = 'd';	/* change to d */
X		}
X	}
X
X	return(fmtcode);
X}
X
X
X
X/* Write an input line to the output file */
X/* If the output file isn't open, write it to the reject file */
X
Xputbuff(buffer,fd)
X	char *buffer;
X	FILE *fd;
X{
X	if (fd) fputs(buffer,fd);
X	else {
X		rejectcnt++;
X		fputs( buffer,mkreject() );
X	}
X}
X
X
X
X/* getnum(s) returns the value of the unsigned int in s.  If there's any
X * trailing garbage, or the number isn't +ve, we return -1
X */
X
Xgetnum(s)
X     char *s;
X{
X     register char *p;
X
X     for (p = s; *p; p++) {
X	  if (!isdigit(*p)) {
X	       return -1;
X	  }
X     }
X     return atoi(s);
X}
X
X
X
X/* Remove the leading pathname from a filename */
X
Xchar *
Xrmpath(fullname)
X    char *fullname;
X{
X    register char *p;
X    char *q = (char *) NULL;
X
X    for (p = fullname; p && *p; p++) {
X         if (*p == '/')
X            q = p + 1;	/* leave p alone; the loop advances it */
X    }
X    if (q && *q) {
X         return(q);
X    }
X    return(fullname);
X}
X
X
X
X/* Get tokens (parameters) from matched input lines */
X
Xget_parms(buffer)
X	char *buffer;
X{
X	int  i,l;
X
X	strcpy(parmbuf,buffer);
X	l = strlen(parmbuf);
X	if(parmbuf[l-1]=='\n') parmbuf[l-1]='\0';
X
X	parm[0]=strtok(parmbuf," ");
X
X	for (i=1; i<=MAXPARM; i++) {
X		parm[i]=strtok(NULL," ");
X	}
X
X	for (i=0; i<=MAXPARM; i++) {
X		if (!parm[i]) parm[i] = nullstring;
X	}
X}
X
X
X
X/* Find last non-null parameter */
X
Xlastparm()
X{
X	int i;
X
X	for (i=MAXPARM; i>0; i--) {
X		if (parm[i] && *parm[i]) return(i);
X	}
X	return(0);
X}
X
X
X
X/* Convert three character month name to integer */
X
Xmtoi(monthname)
X	char *monthname;
X{
X	static char *months[] = { 
X		"Jan", "Feb", "Mar", "Apr", "May", "Jun",
X		"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
X		};
X
X	int i;
X
X	for (i=1; i<=12; i++) {
X		if (strcmp(monthname,months[i-1])==0) return(i);
X	}
X
X	error("Invalid month '%s' found -- zero used\n",monthname);
X	i=0;
X	return(i);
X}
X
X
Xerror(fmt, a1, a2, a3, a4)
X     char *fmt;
X{
X     fputs(progname, stderr);
X     fputs(": ", stderr);
X     fprintf(stderr, fmt, a1, a2, a3, a4);
X}
X
X
Xusage(status)
X     int status;	/* exit if status != 0 */
X{
X     fprintf(stderr,"Usage: %s [-f filename] [-a|-x] [-i<n>] [-w] [-m|-s|-n<n>] [-r file] [-e expression | expression] [format...]\n", progname);
X     if (status)
X	  exit(status);
X}
X
SHAR_EOF
if test 15915 -ne "`wc -c 'slice.c'`"
then
	echo shar: error transmitting "'slice.c'" '(should have been 15915 characters)'
fi
echo shar: extracting "'CHANGES'" '(2731 characters)'
if test -f 'CHANGES'
then
	echo shar: over-writing existing file "'CHANGES'"
fi
sed 's/^X//' << \SHAR_EOF > 'CHANGES'
X 
X------------------------------ CHANGES ------------------------------
X
Xv2.3    87/09/20  21:36:52  garyp
X        Make default reject file /dev/null.  Report count of lines
X        rejected, unless reject file specified.
X
Xv2.2    87/09/15  17:33:33  garyp
X        Fix bug that caused dump when parm was null.  Changed mkname so
X        that the path is removed from the input filename.  Added #$
X        substitution parameter, to correctly split mailboxes.  Changed
X        the default pattern for -m to use #$ instead of #7.
X
Xv2.1    87/09/15  00:20:07  garyp
X        New release of slice, now supports substitution parameters.
X        Use of %s and %d replaced by #f and #n respectively.
X        Parameters #1, #2, ... #9 are replaced by tokens from the
X        matched line.  Printf-style format (e.g. %02d) may immediately
X        follow any substitution.
X
Xv1.12   87/09/10  17:05:04  garyp
X        Reject file now added.
X
Xv1.11   87/09/08  23:53:29  garyp
X        Add features "+<filename>" and "@"
X
Xv1.10   87/02/12  10:32:50  garyp
X        Minor revision to Sys V changes.  Added #define match() to
X        reduce number of #ifndef USG.  Added some comments.
X
Xv1.9    87/02/09  10:10:27  garyp
X        Revisions to support System V regular expression routines.
X        Provided by Jon A. Tankersley.
X
Xv1.8    86/12/12  22:28:40  garyp
X        Add -x and -a options.  Improve semantic checking.  Fix -n
X        (wasn't working -- caused core dump).
X
Xv1.7    86/12/12  11:54:56  garyp
X        Fix problem with multiple '+' formats.  Switch from 'w' to 'a'
X        (append) access for output files.
X
Xv1.6    86/12/11  21:18:08  garyp
X        Cleaned up error status codes.  Allowed '+' format to designate
X        stdout.
X
Xv1.5    86/12/11  17:29:25  garyp
X        New improved mailsplit with lots of features.  Renamed slice.
X
Xv1.4    86/12/09  17:03:05  garyp
X        First parameter is now either -m, for mailbox pattern, or -s,
X        for shell pattern, or -n for line count split, or a user
X        specified pattern (not starting with '-').  These options
X        replace -p.  The default output filename is now "%s:%03d" where
X        %s is the input filename.
X
Xv1.3    86/12/09  12:57:49  garyp
X        Fixed search for %d.  All forms now allowed.  Only %s allowed,
X        though (no other form seems useful anyway).
X
Xv1.2    86/12/09  12:10:09  garyp
X        Changed to allow output pattern to contain %s to represent
X        input filename.  However, code only does exact match on %d and
X        %s.  It should do a pattern match to allow things like %6d.
X
Xv1.1    86/12/08  21:08:12  garyp
X        date and time created 86/12/08 21:08:12 by garyp
X---------------------------------------------------------------------
X 
SHAR_EOF
if test 2731 -ne "`wc -c 'CHANGES'`"
then
	echo shar: error transmitting "'CHANGES'" '(should have been 2731 characters)'
fi
#	End of shell archive
exit 0


-- 
For comp.sources.unix stuff, mail to rsalz at uunet.uu.net.


