[COFF] Requesting thoughts on extended regular expressions in grep.

Ralph Corderoy ralph at inputplus.co.uk
Sat Mar 4 20:07:17 AEST 2023


Hi Grant,

> Suppose I have the following two lines:
>
>     aaa aaa
>     aaa bbb
>
> Does the following RE w/ back-reference introduce a big performance
> penalty?
>
>     (aaa|bbb) \1
>
> As in:
>
>     % echo "aaa aaa" | egrep "(aaa|bbb) \1"
>     aaa aaa

You could measure the number of CPU instructions and experiment.

    $ echo xyzaaa aaaxyz >f
    $ ticks() { LC_ALL=C perf stat -e instructions egrep "$@"; }
    $
    $ ticks '(aaa|bbb) \1' <f
    xyzaaa aaaxyz

     Performance counter stats for 'egrep (aaa|bbb) \1':

	       2790889      instructions:u                                              

	   0.009146904 seconds time elapsed

	   0.009178000 seconds user
	   0.000000000 seconds sys


    $
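
One experiment would be to set the back-reference against an RE that
spells out the same matches without one.  Here is a minimal sketch of
checking that the two select the same lines before timing them.  The
sample lines and file names are my own invention, and I've assumed
a grep -E that accepts back-references, as GNU's does; POSIX doesn't
define them for EREs.

```shell
# Sketch only: sample lines and file names are made up for illustration.
d=$(mktemp -d)
printf '%s\n' 'aaa aaa' 'aaa bbb' 'bbb bbb' 'xyzaaa aaaxyz' >"$d/in"

# The original RE, relying on the \1 back-reference.
grep -E '(aaa|bbb) \1' "$d/in" >"$d/with"

# The same matches spelt out, so no back-reference is needed.
grep -E 'aaa aaa|bbb bbb' "$d/in" >"$d/without"

# cmp is silent and exits 0 when the two selections are identical.
cmp "$d/with" "$d/without" && echo same
```

If they agree, ticks 'aaa aaa|bbb bbb' <f gives a count to put beside
the one above.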

Bear in mind that egreps differ, and even a single one, GNU egrep say,
changes from version to version.  Other tools are worth comparing too.

    $ LC_ALL=C perf stat -e instructions egrep '(aaa|bbb) \1' f
    xyzaaa aaaxyz
    ...
	       2795836      instructions:u                                              
    ...
    $ LC_ALL=C perf stat -e instructions perl -ne '/(aaa|bbb) \1/ and print' f
    xyzaaa aaaxyz
    ...
	       2563488      instructions:u                                              
    ...
    $ LC_ALL=C perf stat -e instructions sed -nr '/(aaa|bbb) \1/p' f
    xyzaaa aaaxyz
    ...
		610213      instructions:u                                              
    ...
    $
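
A one-line input may say more about each program's start-up than about
its matching, though I haven't measured that split here.  A rough
sketch of building a larger input to experiment with follows; the line
count and the use of awk to generate it are arbitrary choices of mine.

```shell
# Sketch only: the line count and temporary file are arbitrary choices.
big=$(mktemp)
awk 'BEGIN { for (i = 0; i < 100000; i++) print "xyzaaa aaaxyz" }' >"$big"

# -c prints only the number of matching lines, keeping the output small.
grep -Ec '(aaa|bbb) \1' "$big"
```

Every line matches, so the count printed should equal the number of
lines generated.  The same file can then be fed to each command above
in place of f.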

-- 
Cheers, Ralph.

