[Mono-list] 64bit gmcs/mcs in SLES/openSuSE rpms?
David Henderson
dnadavewa at yahoo.com
Tue Apr 28 05:08:44 EDT 2009
My apologies for a tardy reply. I'll address all of the questions in this e-mail, rather than reply multiple times.
1) I used file to determine that the .exe files were 32-bit. It is entirely possible that file reports 32-bit for every .exe rather than examining the contents.
2) Is there a way to store char/string data as something smaller than UTF-16? The data are SNP genotypes, i.e. a single SNP genotype looks like "A T", and there are almost a million of these per individual. I'm thinking that what I need to do is record each genotype as bits, i.e. 0 or 1, and relate that back to a translation class that returns "A" or "T" when that SNP is queried. It would be simpler if I could store char/string data as something reasonably small.
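The bit-encoding idea above can be sketched roughly like this. This is only a hypothetical illustration, assuming biallelic SNPs so that each genotype fits in 2 bits (the class name, codes, and allele strings are all made up for the example):

```csharp
using System;

// Hypothetical sketch: pack biallelic SNP genotypes into 2 bits each
// (0 = A A, 1 = A T, 2 = T T, 3 = missing), with a lookup table to
// translate a code back to allele strings when a SNP is queried.
class GenotypePacker
{
    static readonly string[] Translation = { "A A", "A T", "T T", "? ?" };
    readonly byte[] data;   // four genotypes per byte

    public GenotypePacker(int snpCount)
    {
        data = new byte[(snpCount + 3) / 4];
    }

    public void Set(int index, int code)   // code in 0..3
    {
        int shift = (index % 4) * 2;
        data[index / 4] = (byte)((data[index / 4] & ~(3 << shift)) | (code << shift));
    }

    public string Get(int index)
    {
        int shift = (index % 4) * 2;
        return Translation[(data[index / 4] >> shift) & 3];
    }
}
```

At 2 bits per genotype, a million genotypes per individual would fit in roughly 250KB instead of megabytes of strings.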
3) What I'm currently doing is:
a) read in each line as a single string which is split based upon whitespace
b) input each SNP into a class which is stored in an ArrayList, or as a string array in a List<string> (I've implemented it both ways)
c) once the whole file is read in, output each collection of SNPs by chromosome to a different file for processing by other software
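Steps (a) and (b) as described might look like the following minimal sketch (the file name and class name are hypothetical). The point to note is that every split row is held until the whole file is read, which is what drives memory use up:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch of steps (a) and (b): read each line, split on whitespace,
// and keep every row in a List until the whole file is in memory.
class RowLoader
{
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
            rows.Add(line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
        return rows;
    }

    static void Main()
    {
        // "genotypes.txt" is a hypothetical input file name.
        using (var reader = new StreamReader("genotypes.txt"))
            Console.WriteLine("rows read: {0}", ReadRows(reader).Count);
    }
}
```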
I've been able to get past my initial problem by re-compiling mono with the large heap size GC, but when the entire data set is read in, it takes up 17GB of RAM for a 300MB file. I know I'm new to mono/C#, but I've been programming in C++ for years and have written many commercial applications for large data sets, and nothing I've written to date has been as memory hungry as this. I'm hopeful I can get some good suggestions on how to improve performance.
Thanks!!
Dave H
----- Original Message ----
From: Jonathan Pryor <jonpryor at vt.edu>
To: dnadavewa <dnadavewa at yahoo.com>
Cc: mono-list at lists.ximian.com
Sent: Friday, April 24, 2009 12:14:12 PM
Subject: Re: [Mono-list] 64bit gmcs/mcs in SLES/openSuSE rpms?
On Thu, 2009-04-23 at 14:20 -0700, dnadavewa wrote:
> I'm working on a large data problem where I'm reading in data from text files
> with almost 2 million columns. In doing this, I can read in about 25 rows
> before Mono bombs out with an out of memory error.
How are you reading in these lines?
> What I found was the mono executable was indeed 64 bit, but gmcs.exe and
> mcs.exe were 32 bit.
As Chris Howie mentioned, these are actually in platform-neutral IL, and
will be run using a 64-bit address space when using `mono`.
> One other point, memory usage is horrible. I admit that I'm new to C# and
> mono, so my coding skills are not as good as others, but a 300MB file should
> not use 2GB RAM to read in 1/8 of the file.
That depends ~entirely on how you're reading in the file.
Also keep in mind that .NET strings are UTF-16, so if your input text is
ASCII, you will require twice as much RAM as the size of the file, e.g.
600MB of RAM to store the entire file as a string. (Then there are
various object overhead considerations, but these are likely tiny
compared to the 300MB you're looking at.)
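The doubling Jon describes is easy to see directly: a .NET char is two bytes, so an ASCII line kept as a string occupies roughly twice the bytes of the raw text, and storing the raw ASCII bytes instead halves that part of the footprint. A small illustration:

```csharp
using System;
using System.Text;

// Each .NET char is 2 bytes (UTF-16). For ASCII input, the string
// form takes twice the bytes of the raw ASCII representation.
class Footprint
{
    static void Main()
    {
        string line = "A T G C";
        byte[] raw = Encoding.ASCII.GetBytes(line);
        Console.WriteLine("chars: {0}, UTF-16 bytes: {1}, ASCII bytes: {2}",
            line.Length, line.Length * 2, raw.Length);
    }
}
```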
> I stopped using classes to
> store the data and went with List<string> and List<string[]> to read in this
> much data. Any comments on how I might improve this performance would be
> appreciated.
To provide any comments we'd need to know more about what you're trying
to do. For example, reading a 300MB XML file using XmlDocument will
require *lots* of RAM, as in addition to the UTF-16 string issue, each
element, attribute, etc. will be represented as separate objects, with
varying amounts of memory required. DOM would be something to avoid
here, while XmlReader would be much better.
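The contrast between the two XML APIs can be sketched briefly. XmlReader visits one node at a time instead of building the whole tree in memory the way XmlDocument does (the element-counting task here is just an arbitrary example):

```csharp
using System;
using System.IO;
using System.Xml;

// Streaming sketch: XmlReader walks the document node by node, so
// memory use stays flat regardless of document size, unlike a DOM.
class StreamXml
{
    public static int CountElements(TextReader input)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
            while (reader.Read())
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
        return count;
    }

    static void Main()
    {
        string xml = "<snps><snp id='1'/><snp id='2'/></snps>";
        Console.WriteLine(CountElements(new StringReader(xml)));
    }
}
```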
The easiest question, though, is this: do you really need to keep the
entire file contents in memory all at once?
Or can you instead process each line independently (or while caching
minimal data from one line to the next, so that the contents of previous
lines don't need to be maintained). This would allow you to remove your
List<string>, and save a ton of memory.
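A line-at-a-time version of the chromosome split described earlier might look like the sketch below. The column layout (chromosome in the first column) and file-naming scheme are assumptions for illustration; the key property is that only one input line is ever held in memory:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Streaming sketch: parse each line, write it to the appropriate
// per-chromosome output immediately, and keep nothing else around.
class SplitByChromosome
{
    public static void Process(TextReader input, Func<string, TextWriter> writerFor)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            string[] fields = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length < 2)
                continue;
            writerFor(fields[0]).WriteLine(line);   // route by chromosome column
        }
    }

    static void Main()
    {
        var writers = new Dictionary<string, TextWriter>();
        Process(Console.In, chrom => {
            TextWriter w;
            if (!writers.TryGetValue(chrom, out w))
                writers[chrom] = w = new StreamWriter("chr" + chrom + ".txt");
            return w;
        });
        foreach (TextWriter w in writers.Values)
            w.Close();
    }
}
```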
- Jon