[Mono-devel-list] Alias analysis

Paolo Molaro lupus at ximian.com
Thu Jan 27 10:13:03 EST 2005


On 01/27/05 Massimiliano Mantione wrote:
> Purpose of alias analysis: understand the operations performed
> on local variables, so that SSA can be enabled, and liveness
> information will be not only "conservatively correct", but as
> accurate as possible.

Note that there are two issues to solve for correct liveness
info when full optimizations are enabled:
*) keeping track of aliased memory locations so that
we properly flag when the var is used or stored to.
*) keeping track of CFG edges to exception handlers
(we don't have a good solution for this yet, so SSA and other
affected optimizations need to remain disabled for now when
we have exception clauses).

> [3] Whenever an address is used, if the address is actually an
> alias, this means that the aliased variable is used. The use

s/variable/variables/: there could be many

> The MONO_SSA_MAYBE_LOAD value has a misleading name.
> It is set in each MonoInst that takes the address of a local (or
> argument). Note that this operation is not really a load, the
> variable value is not used and the variable itself is unchanged.
> And, after such an instruction, the address could in fact be used
> to write to the variable instead of reading (this is why the name
> is misleading IMHO).
> This value is tested for in a couple of places in SSA construction
> and destruction code, simply because it points to a use of the
> variable (taking the address is a form of use after all), and SSA
> must rename all variables in all places where they are used.
> It is also used in SSA based copyprop, but IMHO in the wrong way
> (and SSA based copyprop seems broken anyway).
> What is important about MONO_SSA_MAYBE_LOAD is that it already
> flags *all* the places where an alias is created (because taking
> the address of a variable means creating an alias).

The important thing here is: the variables whose address is taken need
to be recorded (it is explicitly recorded in the instruction, but
we need to propagate this info where the pointer is used, even if it 
is copied around in other variables).

> Since MonoInst.ssa_op has three bits, we could use them fully in
> the following way:

We can add more bits if needed.

> 
> /* values for MonoInst.ssa_op */
> enum {
> MONO_SSA_NOP = 0,
> MONO_SSA_ADDRESS_TAKEN = 1,
> MONO_SSA_LOAD = 2,
> MONO_SSA_STORE = 4,
> MONO_SSA_INDIRECT_LOAD = MONO_SSA_LOAD|MONO_SSA_ADDRESS_TAKEN,
> MONO_SSA_INDIRECT_STORE = MONO_SSA_STORE|MONO_SSA_ADDRESS_TAKEN,
> MONO_SSA_INDIRECT_LOAD_STORE =
> MONO_SSA_LOAD|MONO_SSA_STORE|MONO_SSA_ADDRESS_TAKEN
> };
> 
> In the existing code, all occurrences of MONO_SSA_LOAD and
> MONO_SSA_STORE would work unchanged, and MONO_SSA_MAYBE_LOAD would
> be replaced by MONO_SSA_ADDRESS_TAKEN.
> Also MONO_SSA_NOP would work as before, with the exception of the
> single place where it is tested... how this would change will be
> clear afterwords.
> 
> The new "*_INDIRECT_*" values would mark operations performed
> through aliases, which basically belong to one of the following
> categories:
> - CEE_STIND_* instructions that store to something that could be
>   an alias (instead of storing to a local or arg), these should
>   be marked MONO_SSA_INDIRECT_STORE.
> - CEE_LDIND_* instructions that read from something that could be
>   an alias (instead of reading from a local or arg), these should
>   be marked MONO_SSA_INDIRECT_LOAD.
> - CALL instructions that have at least one parameter that is passed
>   by reference (it could either point to a local, or be an alias),
>   these should be marked MONO_SSA_INDIRECT_LOAD_STORE.
> 
> The real issue, then is: which values are aliases?
> This is why step [2] above is so important: every value that is
> the address of a local variable is an alias, so we should track
> all MONO_SSA_ADDRESS_TAKEN instructions, and see where they put
> the address.
> 
> There are not many alternatives for this, the possible places
> where an alias can end up are the following:
> 
> [a] An OP_OUTARG instruction (so the alias is passed to a called
> method, which can use it for reading, writing, or both).
> [b] A local variable (or an argument).
> [c] A field of an object. The field could be static, and the object
> could be a value type (instead of a class instance), but in this
> first analysis we will ignore the differences between these cases.
> 
> Note that in verifiable code only [a] is allowed (see Partition I,
> section 12.4.1.5.2 of the ECMA standard for CIL).
> This is very important, because we expect the vast majority of
> the code we will run to be verifiable, and case [a] is very easy
> to handle.
> In contrast, case [b] is harder (we have full control of local
> variables, but tracking them is in any case nontrivial), and case

Note that I don't think that part of the standard excludes the 
storing of a managed pointer in a local to be unverifiable and
I can't think of a reason why it would be unverifiable. The attached
test case passes PEVerify, for example and besides, we store such
pointers allt eh time internally during code generation, so we must
handle this case.
As for non-verifiable code: I must re-read the standard, but non-verifiable
code doesn't mean that the jit is allowed to produce incorrect code, it 
only means that executing the code may violate the type and security
constraints.

> - Case [a] (verifiable code)
> 
> To handle case [a], we must examine the MonoCallInst, and follow
> in parallel the arguments (eventually skipping the "this") and
> their parameter declarations in the signature.

Valuetypes are passed by ref as this, for example, so I don't think we
should skip arg0.

> For each "byref" parameter, we should see if we can determine
> which local or argument has been passed. For each of them, the
> combined "LOAD+STORE" operation applies.

We could use the parameter flags here to optimize later to
handle the args that are only in or only out.

> Maybe it would be worth adding a flag to MonoCompile that is true
> if at leas one variable is potentially aliased, this would allow
> to avoid scanning the whole array of variables to find the ones
> that are potentially aliased.

As a first step it's fine to do this: later we want to track which
vars an address could point to more precisely to do more
aggressive optimizations.
Thanks.

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better



More information about the Mono-devel-list mailing list