[Mono-devel-list] Interop with unmanaged code without copying or memory allocation?

Tue Jan 13 07:05:53 EST 2004

.NET provides the System.Xml.XmlReader API instead of the SAX API
(which has also been implemented by libxml2).  It avoids the memory
overhead of a DOM, and the callback issues with SAX.  Conceptually, it
resembles a compiler lexer -- you Read() to get the next "token" (XML
element), which you can then inspect using the XmlReader properties.

XmlReader is intended for processing multi-gigabyte files.  I'm not sure
whether it actually succeeds, but it *should* be able to do so without
requiring gigabytes of memory (as a DOM would).

It would be interesting to know whether XmlReader would work for your
particular high-performance applications.

- Jon

On Tue, 2004-01-13 at 00:10, Karl Waclawek wrote:
> > It sounds like you're trying to wrap a difficult API.  Good luck.
> 
> Thanks. ;-) Did it with Delphi. Hope it isn't much more difficult with C#.
>  
> > Regarding Problem 1 (interning strings), is there any particular reason
> > you want the strings interned?  The only time it's useful is if you want
> > to compare strings by reference instead of by value, which would
> > require that users do this:
> > 
> > ((object) String1) == ((object) String2)
> > 
> > I suspect most people will stick with the typical:
> > 
> > String1 == String2
> > 
> > which calls String's overloaded == operator (a character-by-character
> > comparison via String.Equals), so there's no reason to intern the string
> > (unless you want to require that your users use the first form).
> 
> The reason is that in a typical case the XML parser will report
> the same name over and over again. Rather than allocating the
> same string over and over I would prefer to look it up
> in the pool. I did the same with Delphi and it seems to be
> beneficial, resource- and performance-wise. Memory allocation
> is expensive.
>  
> > Assuming you do want to intern the string, you could create a hashtable,
> > manually hash the "const XML_Char *name", and use this hash value to
> > look up the interned string.  This would likely require writing your own
> > hash function (so it can operate on a "const XML_Char*"), and you'd have
> > to handle hash collisions, but this could be made to work.
> 
> Yes - I have done this already with another wrapper (the Delphi one) - I was
> just hoping I could use the built-in string pool. Since that seems
> practically impossible from what I see, I will have to port the string
> hash table to C#.
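Jon's suggestion (hash the raw `const XML_Char *name` yourself and use the hash to find an already-interned string) could look roughly like the following C sketch. It is not Karl's Delphi pool and not part of Expat; it assumes Expat's default build, where `XML_Char` is plain `char` and names arrive NUL-terminated, and the names `NamePool`, `pool_intern` and `pool_free`, the FNV-1a hash and the fixed bucket count are illustrative choices. Collisions are handled by chaining, and because each distinct name is stored exactly once, interned names can afterwards be compared by pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name pool: each distinct name is stored exactly once. */
#define POOL_BUCKETS 1024 /* power of two, so the hash can be masked */

typedef struct Entry {
    struct Entry *next;   /* chaining resolves hash collisions */
    uint32_t hash;        /* full hash, checked before memcmp */
    size_t len;
    char name[];          /* NUL-terminated copy of the name */
} Entry;

typedef struct {
    Entry *buckets[POOL_BUCKETS];
} NamePool;

/* FNV-1a over exactly len bytes. */
static uint32_t fnv1a(const char *s, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the pooled copy of name, allocating only the first time a
   name is seen. Returns NULL if that first allocation fails. */
const char *pool_intern(NamePool *pool, const char *name) {
    size_t len = strlen(name);
    uint32_t h = fnv1a(name, len);
    Entry **bucket = &pool->buckets[h & (POOL_BUCKETS - 1)];

    for (Entry *e = *bucket; e != NULL; e = e->next) {
        if (e->hash == h && e->len == len && memcmp(e->name, name, len) == 0)
            return e->name; /* hit: no allocation at all */
    }

    Entry *fresh = malloc(sizeof *fresh + len + 1);
    if (fresh == NULL)
        return NULL;
    fresh->hash = h;
    fresh->len = len;
    memcpy(fresh->name, name, len + 1); /* copy including the NUL */
    fresh->next = *bucket;
    *bucket = fresh;
    return fresh->name;
}

/* Releases every entry and leaves the pool empty. */
void pool_free(NamePool *pool) {
    for (size_t i = 0; i < POOL_BUCKETS; i++) {
        Entry *e = pool->buckets[i];
        while (e != NULL) {
            Entry *next = e->next;
            free(e);
            e = next;
        }
        pool->buckets[i] = NULL;
    }
}
```

Calling `pool_intern` twice with "item" returns the same pointer, so only the first occurrence of an element name costs an allocation. In a C# port each entry would presumably hold the managed `string` instead of a C copy; the lookup logic stays the same.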
> 
> > 
> > Personally, I wouldn't worry about it until you've done the performance
> > profiling (mono --profile is your friend!) and determined that string
> > interning would actually be a benefit.
> 
> Yes; the open question is whether what works elsewhere also works in C#.
> However, it seems avoiding memory allocations is a good optimization
> technique regardless of language (at least for the popular ones).
> 
> In the C/C++ world, Expat is used for high-performance applications,
> like parsing XML files in the gigabyte range. Maybe in the C# world
> there is no audience for that. You are right, and if I had to write
> all the wrapper code from scratch I would delay this to the end.
> However, I think I can port most of it from Delphi, in the style
> of a "re-write in spirit", not a straight copy. Will be a good
> C# exercise.
>  
> > Regarding Problem 2, I can't think of any good way to avoid the
> > marshaling/copying overhead.  Managed and unmanaged memory must be kept
> > separate (to permit the use of non-conservative garbage collectors). 
> > You could employ C# "unsafe" code in the callback methods, but this
> > would prevent non-C# languages (VB, JavaScript, etc.) from being used as
> > callbacks...
> 
> Yes, that is what I want to avoid.
> 
> One theoretical possibility is this: Expat allows you to pass it a set
> of memory handlers (same signatures as the C runtime functions
> malloc, realloc and free).
> I could use this to force Expat to use "managed" memory which exists
> as some byte array on the C# side. That way I could go from the
> pointer passed in the call-back to the index in the array, without
> having to copy. However, writing a memory allocator looks like
> overkill and would probably cost more than it benefits.
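The memory-handler idea can be sketched in C as a bump allocator over one byte array, which turns "pointer passed to the callback" into "index into the array" with a single subtraction. This is a hypothetical illustration: the function names and the 1 MiB arena size are assumptions, and the code does not link against Expat. Functions with these `malloc`/`realloc`/`free` signatures are what Expat's `XML_Memory_Handling_Suite` holds when passed to `XML_ParserCreate_MM`. The sketch never reclaims freed memory, which hints at why a real allocator would be the costly part; on the .NET side such an array would also have to be pinned so the garbage collector cannot move it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical arena: every block lives inside arena_buf, so any pointer
   handed out can be converted to an offset. Requires C11 (_Alignas,
   max_align_t). */
#define ARENA_SIZE ((size_t)1 << 20) /* 1 MiB, arbitrary */

/* Header stored before each block; realloc needs the old size. The
   max_align_t member keeps the payload suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} BlockHeader;

static _Alignas(max_align_t) unsigned char arena_buf[ARENA_SIZE];
static size_t arena_top = 0; /* next free byte in arena_buf */

/* Same signature as malloc. */
void *arena_malloc(size_t size) {
    const size_t hdr = sizeof(BlockHeader);
    if (size > ARENA_SIZE)
        return NULL;
    /* header + payload, rounded up so the next header stays aligned */
    size_t need = (hdr + size + hdr - 1) / hdr * hdr;
    if (need > ARENA_SIZE - arena_top)
        return NULL;
    BlockHeader *h = (BlockHeader *)(void *)(arena_buf + arena_top);
    h->size = size;
    arena_top += need;
    return h + 1;
}

/* Same signature as free. A bump allocator cannot cheaply reuse memory,
   so this is a no-op. */
void arena_free(void *ptr) { (void)ptr; }

/* Same signature as realloc: grows by copying into a new block. */
void *arena_realloc(void *ptr, size_t size) {
    if (ptr == NULL)
        return arena_malloc(size);
    BlockHeader *old = (BlockHeader *)ptr - 1;
    if (size <= old->size)
        return ptr; /* existing block is already big enough */
    void *fresh = arena_malloc(size);
    if (fresh != NULL)
        memcpy(fresh, ptr, old->size);
    return fresh; /* on failure ptr stays valid, as with realloc */
}

/* Pointer received in a callback -> index into the shared byte array. */
ptrdiff_t arena_offset(const void *p) {
    return (const unsigned char *)p - arena_buf;
}
```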
>  
> > This is really the problem behind Problem 1 -- memory must be kept
> > separate, and coming up with efficient ways to bridge the barrier is
> > difficult, hence the marshaling overhead...
> > 
> >  - Jon
> 
> 
> Thanks for your insights,
> 
> Karl 
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list