[Mono-devel-list] Alias analysis
Massimiliano Mantione
massi at ximian.com
Wed Jan 26 18:15:51 EST 2005
This is a possible plan to implement alias analysis.
Those interested can read, and comment on it so that I have
some feedback...
Purpose of alias analysis: understand the operations performed
on local variables, so that SSA can be enabled, and liveness
information will be not only "conservatively correct", but as
accurate as possible.
This means we are (in this step) only interested in the aliasing
of local variables, and not of storage locations external to the
current method (analyzing aliasing of those locations could lead
to the elimination of redundant operations on them, or other
optimizations, but this is not the goal now).
So, what should alias analysis do?
[1] Detect when local variables are aliased.
Basically, this means watching when the address of a variable
is taken.
[2] Understand where the address that has been taken goes.
This is important for the effectiveness of the next step:
[3] Whenever an address is used, if the address is actually an
alias, this means that the aliased variable is used. The use
could be a load or a store (according to how the pointer is
used, for reading or writing).
The key point is that we should be accurate.
Being conservative, one could say that whenever a pointer is
used, all variables are potentially used. This would produce
correct code, but would not help much in optimizing it.
Instead, we should reason in the following way: every time a
pointer is used, only the variables potentially aliased by that
pointer are potentially used.
In fact, a variable whose address is never taken is not aliased.
And also a variable whose address *has* been taken is obviously
unaffected by operations on *other* addresses: this is why step
two is important, so that step three will be more accurate.
Now, how should this be implemented?
This raises another question, which in fact is more important:
how would the JIT code make use of the aliasing data?
Before answering this, let's look at what the code does now.
Currently there is an enum that states which kind of operation
is performed on local variables, and every instruction is flagged
with one of the values (MONO_SSA_NOP is the default):
/* values for MonoInst.ssa_op */
enum {
MONO_SSA_NOP,
MONO_SSA_LOAD,
MONO_SSA_STORE,
MONO_SSA_MAYBE_LOAD,
MONO_SSA_MAYBE_STORE
};
The LOAD and STORE values are widely used, often (always?) with
the assumption that "inst->inst_i0->inst_c0" is a valid variable
index. IMHO, when reworking these flags, it would be wise not to
break this assumption. Basically, these values flag pure "read"
and "write" accesses to local variables or arguments.
The MONO_SSA_NOP value, as said, is the default.
It is used explicitly only when making a MonoInst a NOP inside
the ssa code, and it is tested only in one place, in local copy
propagation. The test is the following: if a MonoInst is a store,
but it is flagged MONO_SSA_NOP, assume that we do not know where
it is stored, so invalidate the state of *all* local variables
(because all of them have now a potentially unknown value).
The MONO_SSA_MAYBE_LOAD value has a misleading name.
It is set in each MonoInst that takes the address of a local (or
argument). Note that this operation is not really a load, the
variable value is not used and the variable itself is unchanged.
And, after such an instruction, the address could in fact be used
to write to the variable instead of reading (this is why the name
is misleading IMHO).
This value is tested for in a couple of places in SSA construction
and destruction code, simply because it points to a use of the
variable (taking the address is a form of use after all), and SSA
must rename all variables in all places where they are used.
It is also used in SSA based copyprop, but IMHO in the wrong way
(and SSA based copyprop seems broken anyway).
What is important about MONO_SSA_MAYBE_LOAD is that it already
flags *all* the places where an alias is created (because taking
the address of a variable means creating an alias).
The MONO_SSA_MAYBE_STORE value is never used, so it can disappear.
OK, so this is the current state of the code, where do we go from
here?
The idea is to work on the MonoInst.ssa_op values we have seen,
making them represent also operations that happen through aliases
so that we can properly track them.
Since MonoInst.ssa_op has three bits, we could use them fully in
the following way:
/* values for MonoInst.ssa_op */
enum {
MONO_SSA_NOP = 0,
MONO_SSA_ADDRESS_TAKEN = 1,
MONO_SSA_LOAD = 2,
MONO_SSA_STORE = 4,
MONO_SSA_INDIRECT_LOAD = MONO_SSA_LOAD|MONO_SSA_ADDRESS_TAKEN,
MONO_SSA_INDIRECT_STORE = MONO_SSA_STORE|MONO_SSA_ADDRESS_TAKEN,
MONO_SSA_INDIRECT_LOAD_STORE =
MONO_SSA_LOAD|MONO_SSA_STORE|MONO_SSA_ADDRESS_TAKEN
};
In the existing code, all occurrences of MONO_SSA_LOAD and
MONO_SSA_STORE would work unchanged, and MONO_SSA_MAYBE_LOAD would
be replaced by MONO_SSA_ADDRESS_TAKEN.
Also MONO_SSA_NOP would work as before, with the exception of the
single place where it is tested... how this would change will be
clear afterwords.
The new "*_INDIRECT_*" values would mark operations performed
through aliases, which basically belong to one of the following
categories:
- CEE_STIND_* instructions that store to something that could be
an alias (instead of storing to a local or arg), these should
be marked MONO_SSA_INDIRECT_STORE.
- CEE_LDIND_* instructions that read from something that could be
an alias (instead of reading from a local or arg), these should
be marked MONO_SSA_INDIRECT_LOAD.
- CALL instructions that have at least one parameter that is passed
by reference (it could either point to a local, or be an alias),
these should be marked MONO_SSA_INDIRECT_LOAD_STORE.
The real issue, then is: which values are aliases?
This is why step [2] above is so important: every value that is
the address of a local variable is an alias, so we should track
all MONO_SSA_ADDRESS_TAKEN instructions, and see where they put
the address.
There are not many alternatives for this, the possible places
where an alias can end up are the following:
[a] An OP_OUTARG instruction (so the alias is passed to a called
method, which can use it for reading, writing, or both).
[b] A local variable (or an argument).
[c] A field of an object. The field could be static, and the object
could be a value type (instead of a class instance), but in this
first analysis we will ignore the differences between these cases.
Note that in verifiable code only [a] is allowed (see Partition I,
section 12.4.1.5.2 of the ECMA standard for CIL).
This is very important, because we expect the vast majority of
the code we will run to be verifiable, and case [a] is very easy
to handle.
In contrast, case [b] is harder (we have full control of local
variables, but tracking them is in any case nontrivial), and case
[c] is likely to be unsolvable.
So, an initial approach would be to handle case [a] accurately,
and cases [b] and [c] conservatively.
- Case [a] (verifiable code)
To handle case [a], we must examine the MonoCallInst, and follow
in parallel the arguments (eventually skipping the "this") and
their parameter declarations in the signature.
For each "byref" parameter, we should see if we can determine
which local or argument has been passed. For each of them, the
combined "LOAD+STORE" operation applies.
If what is passed to one byref parameter is not just the address
of a local, we have some trouble... basically this method call
behaves like an operation of kind [b] or [c] (we must handle it
conservatively).
- Cases [b] and [c]
Basically, when flagging instructions as MONO_SSA_ADDRESS_TAKEN,
if the generated alias is of kind [b] or [c] we should also flag
the local (or argument), stating that its address has been taken.
Since local variables have a MonoInst as well, and since their
MonoInst.ssa_op makes no sense (they are not real operations), it
could make sense to flag them exactly setting that field to
MONO_SSA_ADDRESS_TAKEN :-)
Then, we just know that each instruction flagged as
MONO_SSA_INDIRECT_LOAD or MONO_SSA_INDIRECT_STORE that operates
on an unknown value will potentially affect all variables flagged
as MONO_SSA_ADDRESS_TAKEN that have a type which is compatible
with the type of the operation.
Maybe it would be worth adding a flag to MonoCompile that is true
if at leas one variable is potentially aliased, this would allow
to avoid scanning the whole array of variables to find the ones
that are potentially aliased.
Or maybe we could keep one list of MonoInst for each primitive
type, and put in there the aliased variables... but these are just
implementation details.
OK, this seems all...
If you made it that far... congratulations ;-)
More information about the Mono-devel-list
mailing list