[TUHS] pointer disambiguation (was Re: Disassemblers)

Sat Jul 3 02:04:21 AEST 2021

On 7/1/21, scj at yaccman.com <scj at yaccman.com> wrote:

> When PCC came along and started running on 32-bit machines, I started
> thinking about algorithms for optimization.  A problem that I had no
> good solution for could be illustrated by a simple piece of code:
>
>          x = *p;
>
>          y = *q;
>
>          q gets changed
>
>          *q = z;
>
> The question is, do I need to reload x now because q might have been
> changed to point to the same place as p?

Yes, this is a very well-known problem in scalar optimization in
compiler engineering.  It's called pointer disambiguation and is part
of the more general problem of data flow analysis.  As you observed,
getting it wrong can lead to very subtle and hard-to-debug correctness
problems.  In the worst case, one has to throw out all current data
flow analysis of global and currently active local variables and start
over. In your example, the statement "*q = z" may end up forcing the
compiler to toss out all data flow information on x and z (and maybe p
and q as well).  If q could possibly point to x and x is in a
register, the assignment forces x to be reloaded before its next use.
Ambiguous pointers prohibit a lot of important optimizations.  This
problem is the source of a lot of bugs in compilers that do aggressive
optimizations.

Fortunately a bit of knowledge of just how "q gets changed" can rescue
the situation.  In strongly typed languages, for example, if x and z
are different data types, we know the assignment of z through q can't
affect x.  We also know that the assignment can't affect x if x and z
have disjoint scopes.

The 'restrict' qualifier, added to C pointer declarations in C99, was
introduced to help mitigate this problem.

Some compilers also have a command line option that allows the user to
say, "I solemnly swear that I won't do this sort of thing".

-Paul W.