[Mono-list] 64bit gmcs/mcs in SLES/openSuSE rpms?
Alan McGovern
alan.mcgovern at gmail.com
Tue Apr 28 05:20:25 EDT 2009
Hi,
On Tue, Apr 28, 2009 at 10:08 AM, David Henderson <dnadavewa at yahoo.com> wrote:
>
> My apologies for a tardy reply. I'll address all of the questions in this
> e-mail, rather than reply multiple times.
>
> 1) I used file to determine that the .exe files were 32bit. It is entirely
> possible that file reports 32bit for all .exe files, rather than actually
> examining each file.
> 2) Is there a way to store char/string data as something smaller than
> UTF-16? The data are SNP genotypes, i.e. a single SNP genotype looks like
> "A T", and there are almost a million of these per individual. I'm thinking
> that what I need to do is record each genotype as bits, i.e. 0 or 1, and
> relate that back to a translation class that returns A or T when that SNP
> is queried. It would be simpler if I could store char/string data as
> something reasonably small.
Use the BitArray class; that's exactly what it's for. If you can store your
genotypes as bits instead of strings, you'll *vastly* reduce your memory
requirements.
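Something along these lines might work (just an untested sketch; it assumes
each SNP is biallelic so one bit per call is enough, with a fixed pair of
alleles to translate the bit back to a character - real data with missing
calls or per-SNP allele pairs would need a richer layout):

using System;
using System.Collections;

class GenotypeStore
{
    // One bit per SNP call: false -> first allele, true -> second allele.
    private readonly BitArray calls;
    private readonly char allele0; // e.g. 'A'
    private readonly char allele1; // e.g. 'T'

    public GenotypeStore(int snpCount, char allele0, char allele1)
    {
        calls = new BitArray(snpCount);
        this.allele0 = allele0;
        this.allele1 = allele1;
    }

    // Record the observed character as a single bit.
    public void Set(int snpIndex, char observed)
    {
        calls[snpIndex] = (observed == allele1);
    }

    // Translate the stored bit back to the original character.
    public char Get(int snpIndex)
    {
        return calls[snpIndex] ? allele1 : allele0;
    }
}

A million calls stored this way is roughly 125KB per individual, versus
several megabytes as strings.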
Alan.
> 3) What I'm currently doing is:
> a) read in each line as a single string, which is split on whitespace
> b) put each SNP into a class which is stored in an ArrayList, or as a
> string array in a List<string> (I've implemented it both ways)
> c) once the whole file is read in, output each collection of SNPs by
> chromosome to a different file for processing by other software
>
> I've been able to get past my initial problem by re-compiling mono with the
> large-heap GC, and when the entire data set is read in, it takes up 17GB of
> RAM for a 300MB file. I know I'm new to mono/C#, but I've been programming
> in C++ for years and have written many commercial applications for large
> data sets, and nothing I've written to date has been as memory-hungry as
> this. I'm hopeful I can get some good suggestions on how to improve
> performance.
>
> Thanks!!
>
> Dave H
>
>
>
> ----- Original Message ----
> From: Jonathan Pryor <jonpryor at vt.edu>
> To: dnadavewa <dnadavewa at yahoo.com>
> Cc: mono-list at lists.ximian.com
> Sent: Friday, April 24, 2009 12:14:12 PM
> Subject: Re: [Mono-list] 64bit gmcs/mcs in SLES/openSuSE rpms?
>
> On Thu, 2009-04-23 at 14:20 -0700, dnadavewa wrote:
> > I'm working on a large data problem where I'm reading in data from text
> files
> > with almost 2 million columns. In doing this, I can read in about 25
> rows
> > before Mono bombs out with an out of memory error.
>
> How are you reading in these lines?
>
> > What I found was the mono executable was indeed 64 bit, but gmcs.exe and
> > mcs.exe were 32 bit.
>
> As Chris Howie mentioned, these are actually in platform-neutral IL, and
> will be run using a 64-bit address space when using `mono`.
>
> > One other point, memory usage is horrible. I admit that I'm new to C#
> and
> > mono, so my coding skills are not as good as others, but a 300MB file
> should
> > not use 2GB RAM to read in 1/8 of the file.
>
> That depends almost entirely on how you're reading in the file.
>
> Also keep in mind that .NET strings are UTF-16, so if your input text is
> ASCII, you will require twice as much RAM as the size of the file, e.g.
> 600MB of RAM to store the entire file as a string. (Then there are
> various object-overhead considerations, but these are likely tiny
> compared to the 300MB you're looking at.)
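As an aside, if the input really is plain ASCII, keeping it as bytes (or
parsing it straight into something compact like the BitArray above) avoids
that doubling. A rough, untested illustration of the size difference (the
file name is just a placeholder):

using System;
using System.IO;
using System.Text;

class AsciiVsUtf16
{
    static void Main()
    {
        // Hypothetical input path, for illustration only.
        string path = "genotypes.txt";

        // As a string: roughly 2 bytes per character in memory (UTF-16).
        string text = File.ReadAllText(path, Encoding.ASCII);

        // As raw bytes: 1 byte per character for ASCII input.
        byte[] bytes = File.ReadAllBytes(path);

        Console.WriteLine("chars: {0}, bytes: {1}", text.Length, bytes.Length);
    }
}

(Both calls still load the whole file at once, of course, which is the
bigger issue Jon gets to below.)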
>
> > I stopped using classes to
> > store the data and went with List<string> and List<string[]> to read in
> this
> > much data. Any comments on how I might improve this performance would be
> > appreciated.
>
> To provide any comments we'd need to know more about what you're trying
> to do. For example, reading a 300MB XML file using XmlDocument will
> require *lots* of RAM: in addition to the UTF-16 string issue, each
> element, attribute, etc. will be represented as a separate object, with
> varying amounts of memory required. The DOM would be something to avoid
> here, while XmlReader would be much better.
>
> The easiest question, though, is this: do you really need to keep the
> entire file contents in memory all at once?
>
> Or can you instead process each line independently (or while caching
> minimal data from one line to the next, so that the contents of previous
> lines don't need to be maintained)? This would allow you to remove your
> List<string> and save a ton of memory.
>
> - Jon
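To make Jon's line-at-a-time suggestion concrete, here's a rough, untested
sketch of the kind of streaming loop that avoids holding the file in memory.
It assumes one record per line with the chromosome in the first column,
which may well not match your layout - adjust the splitting and the output
naming to the real format:

using System;
using System.Collections.Generic;
using System.IO;

class SplitByChromosome
{
    static void Main(string[] args)
    {
        string inputPath = args[0];
        var writers = new Dictionary<string, StreamWriter>();

        using (var reader = new StreamReader(inputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Split on any whitespace, dropping empty fields.
                string[] fields = line.Split(
                    (char[])null, StringSplitOptions.RemoveEmptyEntries);
                if (fields.Length == 0)
                    continue;

                // Assumed: first field identifies the chromosome.
                string chromosome = fields[0];
                StreamWriter writer;
                if (!writers.TryGetValue(chromosome, out writer))
                {
                    writer = new StreamWriter("chr" + chromosome + ".txt");
                    writers[chromosome] = writer;
                }
                writer.WriteLine(line);
            }
        }

        foreach (var writer in writers.Values)
            writer.Dispose();
    }
}

Only one input line and the per-chromosome writers live in memory at a
time, so memory use stays flat regardless of the file size.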