memory

Robert Van Dam rvandam00 at gmail.com
Tue May 2 20:05:09 CEST 2006


I've been working for some time with some large datasets that require
me to use large amounts of memory in Mozart.  As a result, I've come
across several issues and I was hoping to find out if anyone considers
these to be bugs or not.

Most of these involve the 'gc.*' properties accessed through the
Property module.  Also, I am using a 32-bit machine with 4 GB of
memory.

1. 'gc.threshold' and 'gc.size' can grow until they become negative
(clearly using signed ints).  If one or both is negative and garbage
collection occurs, the emulator dies.

For example, assume that 'gc.size' is slightly greater than 1 GB.  Then
the smallest that 'gc.threshold' can be is slightly greater than
2 GB (see (2) below).  But a value above 2 GB stored in a signed 32-bit
int wraps around to a negative number.  So your active heap size is
actually limited to 1 GB, not 2 GB as you might expect (since garbage
collection has to copy everything over).  In practice, the emulator
dies from Virtual Memory exhausted errors slightly before the active
heap reaches 1 GB.
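
The wraparound described in (1) can be sketched in a few lines of Python.
This is only a model of the arithmetic, not Mozart source: the helper
to_int32 and the factor of 2 are assumptions taken from the description
above, and the real emulator may compute the threshold differently.

```python
def to_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    return ((value + 2**31) % 2**32) - 2**31


def threshold_for(gc_size, factor=2):
    """Hypothetical model: threshold = gc_size * factor, kept in a signed 32-bit int."""
    return to_int32(gc_size * factor)


if __name__ == "__main__":
    just_under_1gb = 2**30 - 1
    just_over_1gb = 2**30 + 1
    # Below 1 GB the doubled value still fits in a signed 32-bit int.
    print(just_under_1gb, "->", threshold_for(just_under_1gb))
    # Above 1 GB the doubled value exceeds 2**31 - 1 and wraps negative.
    print(just_over_1gb, "->", threshold_for(just_over_1gb))
```

With an unsigned 32-bit field the same doubled value would still fit,
which is why switching to unsigned ints looks like a plausible first step.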

2. You can supposedly set 'gc.free' to any number in the range 1..100.
However, that number gets run through the calculation 100/(100-X)
using integer division.  So the only values of 'gc.free' at which the
result changes are (0,50,67,75,80,84,86,88-100) (don't ask me how 100
works).  This contributes to the limitation in (1) above because
setting 'gc.free' to 0-49 (inclusive) is useless (active*1) and setting
it to 50-66 means the threshold is (approximately) twice the active
heap (active*2).  I would find a lot of factors between 1 and 2 VERY
useful.  Was this integer division intentional?
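
The effect of the integer division in (2) can be tabulated with a short
Python sketch.  The function names are made up for illustration, and the
formula is the one quoted above, not something read out of the emulator
source.  The value 100 is left out because it would divide by zero in
this model.

```python
def growth_factor(gc_free):
    """Factor 100/(100-X) computed with integer division, as described above."""
    return 100 // (100 - gc_free)


def distinct_settings(lo=0, hi=99):
    """Return (gc_free, factor) pairs at which the integer factor changes."""
    result = []
    last = None
    for x in range(lo, hi + 1):
        factor = growth_factor(x)
        if factor != last:
            result.append((x, factor))
            last = factor
    return result


if __name__ == "__main__":
    for x, factor in distinct_settings():
        print("gc.free =", x, "-> threshold = active *", factor)
    # For comparison: true division would give a factor between 1 and 2 here.
    print("gc.free = 33 with true division:", 100 / (100 - 33))
```

Every setting from 0 to 49 collapses to a factor of 1 and every setting
from 50 to 66 to a factor of 2, so nothing between 1 and 2 is reachable.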

3. If you open the Oz Panel and turn on configuration you can set the
minimal heap size to any value between 1 and 1024 MB (which works
exactly as it should).  However, if you try to do

{Property.put 'gc.min' {Pow 2 30}} (equivalent to 1024 MB) you get:

%*************************** type error *************************
%**
%** Expected type: int>=0
%** At argument:   2
%** In statement:  {Property.put 'gc.min' 1073741824}
%**
%** Call Stack:
%** toplevel abstraction in line 1, column 0, PC = 274433260
%**--------------------------------------------------------------

{Property.put 'gc.min' {Pow 2 28}} (equivalent to 256 MB): same error
{Property.put 'gc.min' {Pow 2 28}-1} (equivalent to 256 MB - 1 B): works fine
This is reminiscent of the note in the 'Limitation' section, which states
that integers of 28 bits or fewer are handled in registers while
larger integers are handled on the stack.

Ideally, I'd like to use up my full 2 GB but I'd settle for starting
out at 1024 MB.

4. Very large nested records can't be pickled.  You don't get any kind
of informative message, only an 'Aborted' message.  I've not been able
to discover the exact limitations here because different types of
trees can result in very different pickle sizes.

I guess I would feel somewhat better if some of these were documented
as limitations but at the same time, some of them seem like bugs that
could be fixed without too much trouble (for example, using unsigned
ints and not doing integer division would be a start).

Can anyone comment on whether they consider these legitimate bugs?
Also, I've tried scouring through the source code to possibly fix some
of these problems myself but have not been successful in pinpointing
what changes to make.  I would greatly appreciate any pointers into
the code so that at least I can make these changes on my own system.

I would assume that most if not all of these bugs would need to be
fixed before a 64 bit port of Mozart would be viable.  I know some
work has gone into eventually porting to 64 bit.  Perhaps someone on
the list already knows what needs to be done in these limited cases (I
don't have a 64 bit machine so I don't need a full port, just the
ability to max out my 32 bit machine).

Rob
