gump tokenizer problem
Renaud De Landtsheer
rdl at info.ucl.ac.be
Fri Jan 16 10:24:55 CET 2004
I found the solution:
\gumpscannerprefix <atom>
and it is working fine (though I don't fully understand why)
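
A likely explanation, offered as a sketch rather than a confirmed diagnosis: flex-generated C scanners export their functions and tables under a fixed name prefix, so two Gump scanners loaded into the same emulator can end up sharing symbols, and the second scanner then runs with the first one's tables. That would match the symptom in the quoted message, where the lexer received a token number that belongs to the preprocessor. Giving each scanner its own prefix keeps the symbols apart. In the fragment below, the file names, scanner names, prefix atoms and the 'IDENT' token are hypothetical; only the directive and the GumpScanner calls come from this thread, and the surrounding functor or declare is left out.

```oz
%% PreProcessor.ozg (hypothetical file name)
%% Assumption: any atom that no other scanner uses will do as a prefix.
\gumpscannerprefix preproc

scanner PreProcessor from GumpScanner.'class'
   lex letter = <[A-Za-z]> end
   lex mletter = <{letter}|"_"> end
   lex digit = <[0-9]> end
   %% Identifier rule, mirroring the one found in PreProcessor.l
   lex <{mletter}({mletter}|{digit})*> A in
      GumpScanner.'class', getAtom(?A)
      GumpScanner.'class', putToken('IDENT' A)
   end
end

%% Lexer.ozg (hypothetical file name)
%% A different atom, so the generated C symbols do not collide.
\gumpscannerprefix lexer

scanner Lexer from GumpScanner.'class'
   %% Character constant rule taken from the quoted message
   lex <'.'> S in
      GumpScanner.'class', getString(?S)
      case S of [39 A 39] then
         GumpScanner.'class', putToken('CONSTANT' A)
      end
   end
end
```

With distinct prefixes, both scanners should be usable in one emulator session, whether they run at the same time or one after the other.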
--
Renaud De Landtsheer
>
> Dear all,
>
> I have (nearly) finished writing a compiler with Mozart.
> (I have to say that I was astonished by the productivity gain of using Oz
> instead of, say, Java.)
> (By the way, sorry for my English in this mail ;) )
>
> I used Gump for tokenizing and parsing.
> Now I want to put a preprocessor in front of my lexer.
> I wrote a preprocessor that also uses Gump tokenizing, as it is built on top
> of a tokenizer.
>
> Everything works fine when the two are executed separately
> (with the emulator restarted between the precompiler run and the compiler
> run)
>
> BUT
>
> When I put everything together, I just got a big mess.
> I thought: "Okay, Gump uses flex, and the code generated by flex (as far as
> I know) is C and not C++, so maybe it holds a reference to a single
> lexer.so-linuxxxx file.
> (NOTE: my classes have different names to avoid *.so-linux* file
> collisions)
> Let's use the preprocessor and the lexer sequentially."
>
> I then wrote a preprocessor buffer, which gets every line
> from the preprocessor, keeps everything in a queue, and presents the same
> interface to my lexer as the preprocessor does.
> I took special care to drop every reference to the preprocessor class
> before instantiating the lexer, and I also call the garbage
> collector in between.
>
> Big mess again.
>
> I tried running everything with the debugger on.
> I found that the lexer failed because it ran a test on a token
> in a lex construct,
> and the token it got does not actually match the rule of that lex
> construct.
>
> lex <'.'> S X in
>    GumpScanner.'class', getString(?S)
>    GumpScanner.'class', getAtom(X)
>    case S of [39 A 39] then
>       GumpScanner.'class', putToken('CONSTANT' A)
>    end
> end
>
> NOTE: the X variable was inserted to display the token in the debugger.
> I found that X has the value 'DataAdress'.
>
> In my debugger, there is a variable called 'Syn0' whose value is
> 44.
> I looked into PreProcessor.l, the lex file generated by Gump during the
> compilation of
> my preprocessor, and I saw that the rule whose action is "return 44;"
> has a pattern that my token matches.
>
> letter [A-Za-z]
> mletter {letter}|"_"
>
> <INITIAL>{mletter}({mletter}|{digit})* return 44;
>
> I concluded that the Gump-generated native C code has not been
> properly unloaded
> from memory by the garbage collector.
>
> I inserted a 1 second delay with {Delay 1000} after the garbage collection
> call,
> to make sure the collection, if it runs in another thread, has finished.
> But it didn't change anything.
> I suppose that this kind of data is outside the scope of the garbage
> collector.
>
> => How can I force this unloading?
> => Is there any way to have two lexers operating at the same time?
> => Is there any way to have two lexers operating sequentially?
>
> ... or do I need to put everything in the same lexer, with lexer states to
> select the appropriate operation mode?
>
> ... or am I completely wrong?
>
> thank you.
>
> --
> Renaud De Landtsheer
>
>