[Mono-devel-list] Again on alias analysis

Fri Feb 18 12:02:23 EST 2005

On Fri, 2005-02-18 at 17:30 +0100, Massimiliano Mantione wrote:
> Cases [1] to [4] look fairly easy, because the alias seems short
> lived, it is "consumed" by the following call instruction.
> So it is easy to consider that call as a use/redefinition of
> the local variable, and be OK with that.
> The problem is that this is true only if the alias is *really*
> short lived.
> What I mean is that in principle the called method could store
> the pointer in a location that outlives the call.
> Since we do not perform any interprocedural analysis, we don't
> know if the called method does this or not.
> The problem is that if this happens, then any subsequent call to
> any method could use/redefine our local variable (to be picky,
> any use of any pointer of which we don't know exactly the value
> could do it).
> If I understand the standard correctly, this can happen only if
> the parameter of the called method were a pointer (int*), and it
> cannot happen if it is an argument passed by reference (ref int).
> So it is important to be able to relate each method argument to
> its declaration in the called method's signature, to know if it
> is a pointer passed by value (int*, like in C, which is unsafe
> and we cannot really control), or a local passed by reference
> (ref int or int&, which we know will be used only in the called
> method).
> To verify this, I have written a small test program (attached).
> In that program, it is evident that "(outarg (ldaddr local[0]))"
> is used both for "int*" and "int&" parameters, so it is obvious
> that to distinguish them we must look at the call signature.
> By the way, all of this actually matters only for case [1].
> In cases [2] to [4], the "parameter" is the "this" of a method
> call, so we can simply assume that the value is used/written.

For verifiable code, your assumption is correct: the value of a `ref
int' will never be stored anywhere that outlives the method (IIRC,
verification prevents you from storing an int& anywhere other than in a
local variable). For unverifiable code, however, I see no reason why
somebody couldn't do that. For example, one could convert the int& to an
int*.
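
To make the distinction concrete, here is a minimal sketch in C of how
an alias pass might classify an address-taken local that is passed as a
call argument. All names (ParamKind, addr_arg_may_escape) are
hypothetical and not part of the Mono JIT, and the callee_is_verifiable
flag is an assumption: it presumes the caller has some way of knowing
that. The sketch only encodes the rule discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical parameter classification, read from the callee's signature. */
typedef enum {
	PARAM_BY_VALUE,  /* plain value, e.g. int */
	PARAM_BYREF,     /* managed pointer, e.g. ref int / int& */
	PARAM_POINTER    /* unmanaged pointer, e.g. int* */
} ParamKind;

/*
 * Returns true if passing the address of a local for this parameter must
 * be treated as letting the address outlive the call.
 * callee_is_verifiable is an assumption made for this sketch.
 */
static bool
addr_arg_may_escape (ParamKind kind, bool callee_is_verifiable)
{
	switch (kind) {
	case PARAM_BY_VALUE:
		return false;  /* no address is passed at all */
	case PARAM_BYREF:
		/* Only verifiable code is kept from storing an int& elsewhere. */
		return !callee_is_verifiable;
	case PARAM_POINTER:
		return true;   /* unsafe pointer: the callee may keep it */
	}
	assert (!"unknown ParamKind");
	return true;       /* stay conservative if asserts are disabled */
}
```

The design choice is to answer "may escape" whenever we are not sure,
so the only case treated as a plain use/redefinition is a byref
argument to a callee known to be verifiable.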

Just did a test case. Consider this:

using System;

unsafe class X {

	static int* xx;

	static void Main ()
	{
		int foo = 0;

		Blah (ref foo);

		Console.WriteLine ("foo = {0}", foo);

		foo = 0;

		*xx = 1;

		Console.WriteLine ("foo = {0}", foo);
	}

	static void Blah (ref int mptr)
	{
		xx = &mptr;
	}
}

Note how we just passed `foo' to `Blah' as a ref param. By doing this,
its address got taken and stored in `xx', where it outlives the call. If
you assume that the ldaddr in the method call to `Blah' is just a
use/redef, this test case would print 0/0. However, it prints 0/1.

This test case does not compile with csc; mcs accepts it because of an
mcs bug. With csc, however, you can write:

		fixed (int* x = &mptr) {
			xx = x;
		}

(Note that this form does not work in mcs, because of the bug.)

This test case is 100% non-verifiable. However, I am not sure if it is
legal as non-verifiable code. As suggested by csc, I need to fix the
managed pointer when I take its address, because it could be the address
of a field in the gc heap. However, if I know that it is already pinned,
I am not sure if it is legal to un-pin the pointer.

MSFT prints 0/1 here, but that may be because they don't even attempt
this...

But it seems that once the ldaddr goes outside the method, all bets are
off.
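
As a rough illustration of what "all bets are off" means for the
analysis, here is a small C sketch that walks a sequence of simplified
operations on one local and reports whether the final load may still be
folded to a constant. The opcode names and the function are hypothetical
and much simpler than real MonoInst trees; the sequence in the usage
note mirrors the test case above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified operations on a single tracked local. */
typedef enum {
	OP_STORE_CONST,     /* local = constant */
	OP_CALL_WITH_ADDR,  /* call that receives the local's address */
	OP_CALL,            /* any other call */
	OP_STORE_INDIRECT,  /* store through a pointer of unknown value */
	OP_LOAD             /* read of the local */
} OpKind;

/*
 * Returns true if the value read by the last op (which must be OP_LOAD)
 * is still a known constant. addr_escapes_in_call says whether
 * OP_CALL_WITH_ADDR may let the address outlive the call.
 */
static bool
final_load_is_constant (const OpKind *ops, size_t n_ops, bool addr_escapes_in_call)
{
	bool known = false;    /* is the local's value a known constant? */
	bool escaped = false;  /* may an unknown pointer refer to the local? */
	size_t i;

	assert (n_ops > 0 && ops [n_ops - 1] == OP_LOAD);

	for (i = 0; i < n_ops; i++) {
		switch (ops [i]) {
		case OP_STORE_CONST:
			known = true;
			break;
		case OP_CALL_WITH_ADDR:
			known = false;  /* the call itself is a use/redefinition */
			if (addr_escapes_in_call)
				escaped = true;
			break;
		case OP_CALL:
		case OP_STORE_INDIRECT:
			if (escaped)
				known = false;  /* anything may write the local */
			break;
		case OP_LOAD:
			break;
		}
	}
	return known;
}
```

For the sequence store, call-with-address, load, store, indirect store,
load, the function reports the last load as constant only when the
address is assumed not to escape. That is exactly the assumption the
test case above breaks.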

> I still have to look at how cases [5] and [6] work exactly, but
> my impression is that they do not generate any alias, they just
> change the local value overwriting its memory area.

It'd be nice if we could have the IR express these operations in such a
way that they did not look like taking an address.

> Case [7] is simple in principle, but problematic in practice
> (because without proper data flow analysis we cannot really know
> where the alias will be used).
> Since it is relatively rare (with respect to other cases), and
> since this seems the *only* case where the alias propagator
> should perform data flow analysis, in the beginning I'd propose
> to handle it conservatively, and just assume that the pointer is
> potentially everywhere, so any "suspicious" operation can affect
> that local variable.
> However, this should change in the future.
> Particularly, when we'll have the linear IR, I guess that many
> cases will become of this kind: we will not have instruction
> trees, so all the tree nodes will be virtual registers.
> Now, on the other hand, traversing a MonoInst tree is like
> performing a simple local data flow analysis, but this will be
> lost with the linear IR.
> To ease this problem, perhaps we could handle in a special way
> those virtual registers that happen to be defined and used only
> once (maybe recording explicitly those unique use and definition
> points in the virtual register data structure).
> Zoltan, as you are looking at this linear IR thing, do you have
> any comment?
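
The idea quoted above of recording unique definition and use points
could look roughly like the following C sketch. The VReg structure and
its helpers are hypothetical, not the Mono JIT's actual data structures;
they only show the bookkeeping needed to recognize a virtual register
that is defined once and used once.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_POINT (-1)

/* Hypothetical virtual register record, for illustration only. */
typedef struct {
	int def_count;
	int use_count;
	int def_point;  /* index of the only definition, or NO_POINT */
	int use_point;  /* index of the only use, or NO_POINT */
} VReg;

static void
vreg_init (VReg *v)
{
	v->def_count = 0;
	v->use_count = 0;
	v->def_point = NO_POINT;
	v->use_point = NO_POINT;
}

/* Record a definition at instruction index ins_index. */
static void
vreg_record_def (VReg *v, int ins_index)
{
	assert (ins_index >= 0);
	v->def_count++;
	/* The point is only meaningful while the definition is unique. */
	v->def_point = (v->def_count == 1) ? ins_index : NO_POINT;
}

/* Record a use at instruction index ins_index. */
static void
vreg_record_use (VReg *v, int ins_index)
{
	assert (ins_index >= 0);
	v->use_count++;
	v->use_point = (v->use_count == 1) ? ins_index : NO_POINT;
}

/* True if the register behaves like an edge in an instruction tree. */
static bool
vreg_is_single_def_use (const VReg *v)
{
	return v->def_count == 1 && v->use_count == 1;
}
```

A register for which vreg_is_single_def_use returns true could then be
followed from its definition straight to its use, much like walking
from a tree node to its parent today.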

Where do these operations happen? C# doesn't let you create an int&
local variable. Are these unsafe methods?

> On case [8] I have a few doubts.
> First of all I'd like to know if the assumption that those cases
> correspond to field accesses is true.

Like cases [5] and [6], it'd be nice for the IR to express these ops in
such a way that they didn't look like taking an address.

> I have verified a few of them, and it seems so...
> Moreover mono_method_to_ir in fact translates field accesses that
> way (as displacements from the base value address, getting the
> offset from the metadata).
> If it were all like this, we could safely assume that those cases
> do not really generate aliasing (not until we want to take care of
> the value of struct fields individually).
> The point is that up to now we do not (yet) have the framework to
> track each field's value individually. In fact each field use seems
> just an operation to/from a calculated memory address.
> We can optimize the *calculation* of those addresses (SSAPRE
> already does it), but we still cannot assume anything on the
> values contained there.
> In any case, to analyze what happens when pointer arithmetic is
> related to field accesses, we should see where the result of the
> add is stored/used. We would then reduce this analysis to the
> same categories we are examining now.

That sounds like scalar replacement.

> The reason why I have doubts is that (like for method arguments)
> we do not really know if this pointer arithmetic is done to
> access a field, or instead because the code is playing with unsafe
> pointers.
> At least, we do not know it if we just look at the "add" instruction
> that applies the displacement. We could look at the MonoType of the
> local, and see if that displacement is "compatible" with the type
> declaration (it refers to a specific field). But this seems a bit
> tricky, and I wouldn't try it in the first development of the alias
> analyzer.

Rather than tracking this, it seems more logical to ensure that the IR
carries this information explicitly.

> Last, in cases [9] and [10], I would just ignore the operation.
> Even if the address of a local is taken, it is used in such a way
> that no real aliasing can happen, so I think this should be safe.

I am interested in who is generating these. Maybe these are just field
loads in a struct, but it looks like ldaddr [1] + 0, and the JIT is
doing constant folding before you even see it?

-- Ben