[Mono-dev] How To Convince Mono To Allocate Big Arrays

Rodrigo Kumpera kumpera at gmail.com
Thu May 8 09:29:48 EDT 2008


One important thing I forgot: if you break your patch into a few smaller
ones, the review process will be a lot easier for everyone involved.

The first one can introduce the new types and configuration foo; then
another can fix codegen and the Array methods; and, at last, a bunch of
fixes for classlib issues like sockets, file I/O and so on.


Cheers,
Rodrigo


On Wed, May 7, 2008 at 11:50 PM, Rodrigo Kumpera <kumpera at gmail.com> wrote:

> Hi Luis,
>
> To have your patch integrated, some changes are needed. First, we want to
> default to 32-bit array sizes on 64-bit machines, so your changes must be
> conditionally compiled. Some comments on the patch follow:
>
> Instead of replacing guint32 with gsize, it's better to create a new type,
> let's say array_size_t. That would confine the conditional compilation to
> fewer places.
>
>  /* helper macros to check for overflow when calculating the size of arrays
> */
> -#define MYGUINT32_MAX 4294967295U
> +#if (GLIB_SIZEOF_SIZE_T < 4 )
> +#define MYGUINT32_MAX 0xFFFFFFFFUL
> +#define MYGUINT_MAX MYGUINT32_MAX
>
> This #if seems bogus; don't you mean "#if (GLIB_SIZEOF_SIZE_T == 4)"? Mono
> has never supported 16-bit machines.
> The macros can be unified by using MYGUINT_MAX and the 'array_size_t' type
> I mentioned before. The definition of MYGUINT_MAX should go in the same
> place we define 'array_size_t'. And we could go with a more meaningful
> name, don't you think?
>
>
> -    if (CHECK_MUL_OVERFLOW_UN (n, elem_size))
> -        mono_gc_out_of_memory (MYGUINT32_MAX);
> +    if (CHECK_MUL_OVERFLOW_UN (n, elem_size)) {
> +        g_print("CHECK_MUL_OVERFLOW_UN(%zd,%zd) failed\n",n, elem_size);
> +        mono_gc_out_of_memory (MYGUINT_MAX);
> +        }
>
> If you find that keeping such debug code is really important, you should
> follow the same pattern as the rest of the project. Take a look at how
> DEBUG_IMT is used in object.c.
>
>
> @@ -3548,34 +3559,30 @@
>      /* A single dimensional array with a 0 lower bound is the same as an szarray */
>      if (array_class->rank == 1 && ((array_class->byval_arg.type == MONO_TYPE_SZARRAY) || (lower_bounds && lower_bounds [0] == 0))) {
>          len = lengths [0];
> -        if ((int) len < 0)
> -            arith_overflow ();
>
>
> Why are you removing overflow checks here?
>
>
> @@ -562,6 +607,26 @@
>      if (this->bounds == NULL)
>          return this->max_length;
>
> +    length = this->bounds [dimension].length;
> +    if (length > G_MAXINT32)
> +            mono_raise_exception (mono_get_exception_overflow ());
> +
> +    return length;
> +}
>
> Why throw an exception here? I'm not sure this is the way to go;
> unfortunately, this area is underspecified in the ECMA standard. Not that
> truncating is a good option either.
>
> Changes to amd64 code I'll leave to Zoltan.
>
>
> Cheers,
> Rodrigo
>
>
> 2008/5/5 Luis F. Ortiz <LuisOrtiz at verizon.net>:
>
>> Back in September ("Big Arrays, Many Changes --- Request for Advice") I
>> asked folks for advice on how to go about adding the capability to Mono to
>> allocate arrays with more than Int32.MaxValue elements.  After playing
>> around with it for a few months, I'm at the point where I have an
>> implementation that mostly works, with a couple of warts which could
>> probably be quickly fixed by someone who knows more than I do about Mono
>> internals. I spoke with Miguel about these patches, and he encouraged me to
>> post them to mono-dev as soon as I got them to pass the "make check" test
>> suite.  So here I am a week later.
>> I want to start by going over the changes themselves, what alternatives
>> there might be to what I had done and what flaws I know to exist in the
>> implementation.
>>
>> First off, though Microsoft chose for some reason NOT to implement large
>> array allocation, the necessary APIs are in the .NET specification.  For
>> example, in the System.Array class, we find:
>>
>> public long GetLongLength (int dimension);
>> public long LongLength { get; }
>> public static Array CreateInstance (Type elementType, params long[] lengths);
>> public Object GetValue (long index);
>> public void SetValue (Object value, long index);
>> ... (other overloads omitted, but there)
>>
>> and we find that the Newarr opcode takes a natural int or an Int32 as the
>> length, so the bytecode level is ready too.
>>
>> Mono as of 1.2.6 already implemented most (all?) of the necessary
>> interfaces in mcs/class/corlib/System/Array.cs, but they all cast their
>> long arguments down to integers, as the underlying implementation was not
>> there.
>>
>> So the first set of changes were to:
>> * mono/metadata/object.c
>> * mono/metadata/object.h
>> * mono/metadata/icall-def.h
>> * mono/metadata/icall.c
>> * mono/metadata/socket-io.c
>>
>> In object.h I made three changes:
>>
>> 1) Changed MonoArrayBounds to use gsize instead of guint32 as the type for
>> length and lower_bound,
>> 2) Changed MonoArray to use gsize instead of guint32 as the type of
>> max_length,
>> 3) Changed the prototypes for mono_array_new(), mono_array_new_full(), and
>> mono_array_new_specific() to take gsize's instead of guint32's for their
>> size and bounds parameters.  I.e.:
>> MonoArray*
>> -mono_array_new (MonoDomain *domain, MonoClass *eclass, guint32 n);
>> +mono_array_new (MonoDomain *domain, MonoClass *eclass, gsize n);
>>
>> MonoArray*
>>  mono_array_new_full        (MonoDomain *domain, MonoClass *array_class,
>> -                            guint32 *lengths, guint32 *lower_bounds);
>> +                            gsize *lengths, gsize *lower_bounds);
>>
>> MonoArray *
>> -mono_array_new_specific    (MonoVTable *vtable, guint32 n);
>> +mono_array_new_specific    (MonoVTable *vtable, gsize n);
>>
>> This is the first place an alternative shows up.  Which type is better:
>> gsize or gssize?  The unsigned type gsize better matches the type memory
>> allocation functions use (size_t or some variant) and the existing guint32,
>> but the signed type better matches the interface presented at the top level
>> (i.e. CreateInstance).  I chose the unsigned alternative, but an argument
>> could be made for the signed type.  Another alternative would be to create
>> 64 bit versions of the mono_array_new(), mono_array_new_full(), and
>> mono_array_new_specific() functions, but that seemed to be too much work.
>>
>> The changes in *object.c* add the implementations of the
>> modified mono_array_new(), mono_array_new_full(), and
>> mono_array_new_specific() functions.  There were some confusing #defines
>> around MYGUINT32_MAX that I did not like, but rather than replace that
>> cruft, I extended it.
>>
>> The changes in *icall-def.h* add two new method calls to the array
>> object, CreateInstanceImpl64() and GetLongLength().  It might be possible to
>> avoid the CreateInstanceImpl64() definition and make it an overload
>> of CreateInstanceImpl() with long parameters, if I were sure how to do
>> that.
>>
>> The changes to *icall.c* tweak the implementation
>> of ves_icall_System_Array_CreateInstanceImpl() to change the type of the
>> sizes array and add the implementations
>> of ves_icall_System_Array_CreateInstanceImpl64()
>> and ves_icall_System_Array_GetLongLength().
>>
>> The change to *socket-io.c* was to tweak its usage of mono_array_new_full
>> to use gssize instead of int for the array of lengths.
>>
>> So all these changes get us to the point where the basic foundation
>> is laid down, but there is still the JIT to contend with.  It requires a few
>> more files to be changed:
>> * mono/mini/mini.c
>> * mono/mini/jit-icalls.c
>> * mono/mini/exceptions.cs
>>
>> The changes in *mini.c* change the signature of mono_array_new
>> and mono_array_new_specific to take "int" instead of "int32".
>>
>> The changes in *jit-icalls.c* do the boring change of guint32's into
>> gsize's.
>>
>> The changes in *exceptions.cs* split the test case for test_0_array_size
>> into a 32 and 64 bit variant, because an allocation of Int32.MaxValue can
>> succeed after these changes are applied.
>>
>> There was only one touch-up needed in the C# compiler: the special-cased
>> inline code generation for the GetLength property needed to be tweaked,
>> since it is now possible to get an array length that will not fit into an
>> I4.  Changing *mcs/mcs/ecore.cs* and *mcs/mcs/expression.cs* to use
>> *OpCodes.Conv_Ovf_I4* after *OpCodes.Ldlen* instead of *OpCodes.Conv_I4*
>> fixed that.
>>
>> Oh, yeah, and ALL the long method calls in
>> *mcs/class/corlib/System/Array.cs* needed to be converted over to
>> use CreateInstanceImpl64() and GetLongLength().  The SetValue() and
>> GetValue() implementations still need work, but since there are no unit
>> tests for those methods, I put them off.
>>
>> That gets us to the point where we can allocate a large array, but it does
>> not let us index a large array.   I changed the following files to start the
>> process of converting the indexing operations to do bounds checking against
>> the now 32/64 bit length of arrays and to index using a 64/32 bit index:
>> * mono/mini/inssel-amd64.brg
>> * mono/mini/mini-amd64.c
>> * mono/mini/mini-ops.h
>> * mono/mini/cpu-amd64.md
>>
>> In *inssel-amd64.brg*, I changed the
>> macro MONO_EMIT_NEW_AMD64_ICOMPARE_MEMBASE_REG to use a new
>> opcode OP_AMD64_COMPARE_MEMBASE_REG_I8 instead
>> of OP_AMD64_ICOMPARE_MEMBASE_REG and hacked CEE_LDELEMA to no longer use
>>  the faster OP_X86_LEA because I was not sure how to generalize it.
>> Perhaps MONO_EMIT_NEW_AMD64_ICOMPARE_MEMBASE_REG should be retired and
>> explicit I4 and I8 versions substituted where appropriate.
>>
>> In *mini-amd64.c* I added the OP_AMD64_COMPARE_MEMBASE_REG_I8 opcode as
>> being the same as OP_AMD64_ICOMPARE_MEMBASE_REG, except with an operand size
>> of 8 bytes instead of 4.
>>
>> In *mini-ops.h* and *cpu-amd64.md* I added entries
>> for OP_AMD64_COMPARE_MEMBASE_REG_I8 and amd64_compare_membase_reg_i8.
>>
>> These changes seem to get 64 bit indexing working, and passed all the
>> regression tests in 1.2.6, but in 1.9.1 a regression test
>> called test_0_regress_75832() breaks.  It could be that the changes I made
>> in mono/mini are incorrect.  I am sure the changes are incomplete, and I
>> have not considered what to do to other architectures.
>>
>> Advice or assistance is greatly appreciated.
>>
>> Luis F. Ortiz
>> Official Mono Modifier
>> Interactive Supercomputing,  Inc.
>>
>> PS:  Here are the changes proper:
>>
>>
>>
>>
>>
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>
>>
>

