crufty memcmp on 386

Doug Ingraham dpi at loft386.uucp
Wed Nov 14 05:21:33 AEST 1990


In article <1990Nov12.173812.3227 at cbnewsh.att.com>, gls at cbnewsh.att.com (Col. G. L. Sicherman) writes:
> There is a problem with the memcmp routine on my 6386WGS running Vr3.2.
> The ordering it returns is not transitive, or even antisymmetric.  In
> other words, it can tell you that x < y and y < x, or that x < y < z < x.
> 
> I do not know whether the formal specifications for memcmp require that
> it produces a lineal ordering, but obviously it should.  A nonlineal
> memcmp cannot be used for sorting and searching.

The ANSI spec that requires the behavior is in section 4.11.4 on page 165
of the standard.

4.11.4 Comparison Functions

   The sign of a nonzero value returned by the comparison functions memcmp,
strcmp, and strncmp is determined by the sign of the difference between the
values of the first pair of characters (both interpreted as unsigned char)
that differ in the objects being compared.

4.11.4.1 The memcmp Function

Synopsis

	#include <string.h>
	int memcmp(const void *s1, const void *s2, size_t n);

Description

   The memcmp function compares the first n characters of the object pointed
to by s1 to the first n characters of the object pointed to by s2.

Returns
   The memcmp function returns an integer greater than, equal to, or less
than zero, accordingly as the object pointed to by s1 is greater than, equal
to, or less than the object pointed to by s2.

I wrote an exhaustive test for the purposes of testing my version of this
function (included at the end) but the version supplied with Bell Tech's
version of System V/386 3.2 passes my test.

> 
> The 386-specific code does this:
> 
> 	jz	zeroexit
> 	lahf		// load low byte of flags to high byte of AX
> 	cwtl		// convert word to longword with sign extension
> 
> There are two problems with this.  One is that if EAX contains zero and
> all the flag bits are 0, it will return 0 instead of positive.  This
> never seems to happen; apparently the unused bit 1 is always set to 1.

The lahf instruction converts the 8086 16 bit flags register to the
8080 8 bit format.  The jz before the lahf is used to ensure that
a zero is returned only when equalty is true.  The not equal case which
is what is being determined is the only case where the lahf cwtl comes
into play.  You will get a negative number or a positive non-zero from
the output of this sequence.  Probably the comparison operation on the
previous line was a rep cmpsb which will compare the two bytes via
an 8 bit subtraction and set the flags.  One of the unused bits is set
to a 1 and was specified to do that on the 8085 (I think) so this is
a very safe assumption.

> Is anybody working on a fix for the code and (if necessary) the formal
> specs for memcmp()?

I can't make the original fail.  Could you provide a short code segment
that will display the claimed behavior?  If you can make AT&T's fail,
I would like to know if mine fails also.

The official specs are copied above and the following is my replacement
routine that can be almost 4 times faster than the AT&T implementation.

------------------------------------------------------------------------

	.file	"memcmp.s"
/ The Intel 80386 implementation of the ANSI memcmp() function.
/ Written April 8, 1990 by Doug Ingraham.  dpi at loft386.UUCP
/ Copyright 1990 by Douglas P. Ingraham.  This code is freely
/ distributable so long as this notice remains intact.
	.text
	.align	4
	.globl	memcmp
memcmp:
/ Save edi and esi in edx and eax to save 4 clocks over popping them
	movl	%edi,%edx		/ 2 clocks 2 bytes
	movl	%esi,%eax		/ 2 clocks 2 bytes
/ esi is pointer to the first string
	movl	0x4(%esp),%esi		/ 2 clocks 4 bytes
/ edi is pointer to the second string
	movl	0x8(%esp),%edi		/ 2 clocks 4 bytes
/ This tests to see if the pointers are to the same string.
	cmpl	%esi,%edi		/ 2 clocks 2 bytes
	je	same			/ 7+m,3 clocks 2 bytes
/ The ecx gets the count.  If Zero then by definition they are the same.
	movl	0xC(%esp),%ecx		/ 2 clocks 4 bytes
	jcxz	same			/ 9+m,5 clocks 2 bytes
/ Test the count to make sure there are enough bytes to bother with this
/ speed optimization.  12 was chosen as a ballpark figure.  Must be greater
/ than 7 as a minimum.
	cmpl	$12,%ecx		/ 2 clocks 3 bytes
	jl	byte			/ 7+m,3 clocks 2 bytes
/ Test to see if either pointer is word aligned.  Most is 3 cmpsb's to align
/ so the loop is unrolled.  75 clocks worst case.
	testl	$3,%esi			/ 2 clocks 6 bytes
	jz	is_aligned		/ 7+m,3 clocks 2 bytes
	testl   $3,%edi			/ 2 clocks 6 bytes
	jz	is_aligned		/ 7+m,3 clocks 2 bytes
	cmpsb				/ 10 clocks 1 byte
	jne	different		/ 7+m,3 clocks 2 bytes
	decl	%ecx			/ 2 clocks 1 byte
	testl	$3,%esi			/ 2 clocks 6 bytes
	jz	is_aligned		/ 7+m,3 clocks 2 bytes
	testl   $3,%edi			/ 2 clocks 6 bytes
	jz	is_aligned		/ 7+m,3 clocks 2 bytes
	cmpsb				/ 10 clocks 1 byte
	jne	different		/ 7+m,3 clocks 2 bytes
	decl	%ecx			/ 2 clocks 1 byte
	testl	$3,%esi			/ 2 clocks 6 bytes
	jz	is_aligned		/ 7+m,3 clocks 2 bytes
	testl   $3,%edi			/ 2 clocks 6 bytes
	jz	is_aligned		/ 7+m,3 clocks 2 bytes
	cmpsb				/ 10 clocks 1 byte
	jne	different		/ 7+m,3 clocks 2 bytes
	decl	%ecx			/ 2 clocks 1 byte
/ Save the modified count back on the stack for possible later use.
is_aligned:
	movl	%ecx,0xC(%esp)		/ 2 clocks 4 bytes
/ Divide by 4 so that we can do word compares
	shrl	$2,%ecx			/ 3 clocks 3 bytes
/ Do the long level search.
	repz;	cmpsl			/ 5+9*n clocks 2 bytes
	je	byte_equal		/ 7+m,3 clocks 2 bytes
/ backup 4 bytes
	movl	$4,%ecx			/ 2 clocks 5 bytes
	subl	%ecx,%esi		/ 2 clocks 2 bytes
	subl	%ecx,%edi		/ 2 clocks 2 bytes
/ Do the byte level search.
byte:
	repz;	cmpsb			/ 5+9*n clocks 2 bytes
	je	same			/ 7+m/3 clocks 2 bytes
/ Restore the esi so that eax is available for return value.
different:
	movl	%eax,%esi		/ 2 clocks 2 byte
/ This is a trick to put a positive or negative in the eax.
	lahf				/ 2 clocks 1 byte
	cwtl				/ 3 clocks 1 byte
/ Restore the edi.
	movl	%edx,%edi		/ 2 clocks 2 bytes
	ret				/ 10+m clocks 1 byte
/ If the string compared out then come here.
same:	movl	%eax,%esi		/ 2 clocks 2 bytes
short_same:
	movl	%edx,%edi		/ 2 clocks 2 bytes
	xorl	%eax,%eax		/ 2 clocks 2 bytes
	ret				/ 10+m clocks 1 byte
/ come here if the high speed test was equal 
byte_equal:
	movl	0xC(%esp),%ecx		/ 2 clocks 4 bytes
	andl	$3,%ecx			/ 2 clocks 6 bytes
	repz;	cmpsb			/ 5+9*n clocks 2 bytes
/ Restore the esi so that eax is available for return value.
	movl	%eax,%esi		/ 2 clocks 2 byte
	je	short_same		/ 7+m/3 clocks 2 bytes
/ This is a trick to put a positive or negative in the eax.
	lahf				/ 2 clocks 1 byte
	cwtl				/ 3 clocks 1 byte
/ Restore the edi.
	movl	%edx,%edi		/ 2 clocks 2 bytes
	ret				/ 10+m clocks 1 byte

------------------------------------------------------------------------

It is quite a bit longer than the original, but is much faster in the
general case.

Apologies to those with other architectures.


-- 
Doug Ingraham (SysAdmin)
Lofty Pursuits (Public Access for Rapid City SD USA)
bigtex!loft386!dpi
uunet!loft386!dpi



More information about the Comp.bugs.sys5 mailing list