[Mono-list] 64bit gmcs/mcs in SLES/openSuSE rpms?

Alan McGovern alan.mcgovern at gmail.com
Tue Apr 28 05:20:25 EDT 2009


Hi,

On Tue, Apr 28, 2009 at 10:08 AM, David Henderson <dnadavewa at yahoo.com> wrote:

>
> My apologies for a tardy reply.  I'll address all of the questions in this
> e-mail, rather than reply multiple times.
>
> 1) I used file to determine that the .exe files were 32bit.  It is entirely
> possible that file returns 32bit for all .exe files rather than examining each one.
> 2) Is there a way to store char/string data as something smaller than
> UTF-16?  The data are SNP genotypes, i.e. a single SNP genotype looks like A
> T and there are almost a million of these per individual.  I'm thinking that
> what I need to do is record the genotype as bits, i.e. 0 or 1, and relate
> that back to a translation class that returns A or T when that SNP is
> queried.  It would be simpler if I could store char/string data as something
> reasonably small.


Use the BitArray class; that's exactly what it's for. If it's possible for
you to store your genotypes as bits rather than as strings, you'll *vastly*
reduce your memory requirements.
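If the data really are biallelic (only two possible letters at each site), each genotype fits in a single bit instead of a two-byte UTF-16 char. A minimal sketch of that idea, assuming a hypothetical A/T alphabet (class and method names here are illustrative, not from the thread):

```csharp
using System;
using System.Collections;

class SnpBits
{
    // Hypothetical biallelic encoding: false = 'A', true = 'T'.
    // One bit per genotype instead of one two-byte char.
    public static BitArray Encode(string genotypes)
    {
        var bits = new BitArray(genotypes.Length);
        for (int i = 0; i < genotypes.Length; i++)
            bits[i] = genotypes[i] == 'T';
        return bits;
    }

    // Translate a bit back to its letter when the SNP is queried.
    public static char Decode(BitArray bits, int i)
    {
        return bits[i] ? 'T' : 'A';
    }

    static void Main()
    {
        BitArray bits = Encode("ATTA");
        Console.WriteLine(Decode(bits, 1));  // T
    }
}
```

For a million genotypes per individual this is roughly 125 KB of bits versus ~2 MB of UTF-16 characters. Sites with more than two alleles would need two or more bits each, but the same packing idea applies.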

Alan.

> 3) What I'm currently doing is:
>  a) read in each line as a single string which is split based upon
> whitespace
>  b) input each SNP into a class which is stored in an ArrayList, or as a
> string array in a List<string> (I've implemented it both ways)
>  c) once the whole file is read in, output each collection of SNPs by
> chromosome to a different file for processing by other software
>
> I've been able to get past my initial problem by re-compiling mono with the
> large heap size GC and when the entire data is read in, it takes up 17GB RAM
> for a 300MB file.  I know I'm new to mono/C#, but I've been programming in
> C++ for years and have written many commercial applications for large data
> and nothing I've written to date has been as memory hungry as this.  I'm
> hopeful I can get some good suggestions on how to improve performance.
>
> Thanks!!
>
> Dave H
>
>
>
> ----- Original Message ----
> From: Jonathan Pryor <jonpryor at vt.edu>
> To: dnadavewa <dnadavewa at yahoo.com>
> Cc: mono-list at lists.ximian.com
> Sent: Friday, April 24, 2009 12:14:12 PM
> Subject: Re: [Mono-list]  64bit gmcs/mcs in SLES/openSuSE rpms?
>
> On Thu, 2009-04-23 at 14:20 -0700, dnadavewa wrote:
> > I'm working on a large data problem where I'm reading in data from text
> files
> > with almost 2 million columns.  In doing this, I can read in about 25
> rows
> > before Mono bombs out with an out of memory error.
>
> How are you reading in these lines?
>
> > What I found was the mono executable was indeed 64 bit, but gmcs.exe and
> > mcs.exe were 32 bit.
>
> As Chris Howie mentioned, these are actually in platform-neutral IL, and
> will be run using a 64-bit address space when using `mono`.
>
> > One other point, memory usage is horrible.  I admit that I'm new to C#
> and
> > mono, so my coding skills are not as good as others, but a 300MB file
> should
> > not use 2GB RAM to read in 1/8 of the file.
>
> That depends almost entirely on how you're reading in the file.
>
> Also keep in mind that .NET strings are UTF-16, so if your input text is
> ASCII, you will require twice as much RAM as the size of the file, e.g.
> 600MB of RAM to store the entire file as a string.  (There are also
> various object-overhead considerations, but those are likely tiny
> compared to the 300MB you're looking at.)
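To put rough numbers on the UTF-16 point above, here is a small sketch comparing the bytes needed for ASCII-encoded storage with the bytes the same text occupies as UTF-16 string characters (the class and method names are illustrative):

```csharp
using System;
using System.Text;

class TextMemory
{
    // Bytes needed to hold s as an ASCII byte[].
    public static int AsciiBytes(string s)
    {
        return Encoding.ASCII.GetByteCount(s);
    }

    // Bytes occupied by s's characters in a .NET string: 2 per char (UTF-16).
    public static int Utf16Bytes(string s)
    {
        return s.Length * sizeof(char);
    }

    static void Main()
    {
        string line = new string('A', 1000);  // 1000 ASCII characters
        Console.WriteLine(AsciiBytes(line));  // 1000
        Console.WriteLine(Utf16Bytes(line));  // 2000
    }
}
```

So keeping ASCII input as byte[] (or processing it without materializing strings at all) halves the character storage before any object overhead is counted.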
>
> > I stopped using classes to
> > store the data and went with List<string> and List<string[]> to read in
> this
> > much data.  Any comments on how I might improve this performance would be
> > appreciated.
>
> To provide any comments we'd need to know more about what you're trying
> to do.  For example, reading a 300MB XML file using XmlDocument will
> require *lots* of RAM, as in addition to the UTF-16 string issue, each
> element, attribute, etc. will be represented as separate objects, with
> varying amounts of memory required.  DOM would be something to avoid
> here, while XmlReader would be much better.
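A minimal sketch of the XmlReader approach contrasted with DOM above: the stream is walked node by node, so only the current node is in memory, never a whole tree. The element name and input here are hypothetical:

```csharp
using System;
using System.IO;
using System.Xml;

class XmlStream
{
    // Count occurrences of an element without building a DOM tree.
    public static int CountElements(TextReader input, string name)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
            while (reader.Read())  // advances one node at a time
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                    count++;
        return count;
    }

    static void Main()
    {
        var xml = new StringReader("<snps><snp/><snp/><snp/></snps>");
        Console.WriteLine(CountElements(xml, "snp"));  // 3
    }
}
```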
>
> The easiest question, though, is this: do you really need to keep the
> entire file contents in memory all at once?
>
> Or can you instead process each line independently (caching only
> minimal data from one line to the next, so that the contents of previous
> lines don't need to be retained)?  That would let you drop your
> List<string> and save a ton of memory.
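A minimal sketch of that line-at-a-time approach, assuming whitespace-separated columns (all names are illustrative): each line is read, split, and processed inside the loop, so nothing accumulates in a List<string>. Per-chromosome output would be written where the comment indicates rather than held in memory.

```csharp
using System;
using System.IO;

class StreamSnps
{
    // Process one line at a time; only the current line is ever in memory.
    public static int CountColumns(TextReader reader)
    {
        int maxCols = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Split on any whitespace; null char[] means "all whitespace".
            string[] fields = line.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length > maxCols)
                maxCols = fields.Length;
            // ...write this line's SNPs to the appropriate per-chromosome
            // output file here; line and fields become garbage before
            // the next iteration...
        }
        return maxCols;
    }

    static void Main()
    {
        // StringReader stands in for a StreamReader over the real file.
        using (TextReader reader = new StringReader("A T G C\nA T"))
            Console.WriteLine(CountColumns(reader));  // 4
    }
}
```

With this shape, peak memory is proportional to the longest line (a couple of MB for 2 million columns), not to the whole 300MB file.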
>
> - Jon
>
>
>
> _______________________________________________
> Mono-list maillist  -  Mono-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-list
>

