gump tokenizer problem

Renaud De Landtsheer rdl at info.ucl.ac.be
Fri Jan 16 10:24:55 CET 2004


I found the solution: 

\gumpscannerprefix <atom>

and it's working fine (but I don't fully understand why).
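
My best guess at the reason, assuming the directive works like flex's own
prefix option (-P, or %option prefix): flex names the C symbols it exports
with one fixed prefix, so two generated scanners loaded into the same
emulator share those symbols and the lexer ends up running the
preprocessor's rules. Giving each scanner its own prefix would keep them
apart. A minimal sketch of how this might look; the prefix atoms, the
scanner names and the placeholder rules are illustrative assumptions, not
my actual sources:

```oz
\switch +gump

% Preprocessor source: 'preproc' is a hypothetical prefix atom.
\gumpscannerprefix preproc
scanner PreProcessor from GumpScanner.'class'
   lex <.> skip end   % placeholder rule, ignores every character
end

% Compiler source: any atom different from the one above.
\gumpscannerprefix lexer
scanner Lexer from GumpScanner.'class'
   lex <.> skip end   % placeholder rule, ignores every character
end
```

The only point is that the two atoms differ; the directive has to appear
before the scanner definition it should apply to.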

--
Renaud De Landtsheer

> 
> Dear all, 
> 
> I have (nearly) written a compiler with Mozart. 
> (I have to say that I was astonished by the productivity gain of using Oz
> instead of, say, Java.)
> (BTW: sorry for my English in this mail ;) )
> 
> I used Gump for tokenizing and parsing. 
> Now I want to put a preprocessor in front of my lexer. 
> I wrote a preprocessor, which also uses Gump tokenizing, as it is built on top
> of a tokenizer. 
> 
> Everything works fine when the two are executed separately
> (with the emulator restarted between the preprocessor run and the compiler
> run).
> 
> BUT
> 
> When I put everything together, I just get a big mess. 
> I thought: "Okay, Gump uses flex, and the code generated by flex (as far as
> I know) is C and not C++, so maybe it holds a reference to a single
> lexer.so-linuxxxx file. 
> (NOTE: my classes have different names to avoid *.so-linux* file 
> collisions.)
> Let's use the preprocessor and the lexer sequentially."
> 
> I then wrote a preprocessor buffer, which gets every line
> from the preprocessor, keeps everything in a queue, and presents the same
> interface 
> to my lexer as the preprocessor does. 
> I took particular care to drop every reference to the preprocessor
> class 
> before instantiating the lexer, and I also call the garbage
> collector in between. 
> 
> Big mess again. 
> 
> I tried running everything with the debugger on. 
> I found that the lexer failed because it ran a test on a token
> inside a lex construct, 
> and the token it received does not actually match the rule of that lex
> construct. 
> 
> lex <'.'> S X in
>   GumpScanner.'class', getString(?S)
>   GumpScanner.'class', getAtom(X)
>     case S of
>       [39 A 39] then 
>          GumpScanner.'class', putToken('CONSTANT' A)
>     end
> end
> 
> NOTE: the X variable was inserted to display the token in the debugger. 
> I found that X has the value 'DataAdress'. 
> 
> In my debugger, I have a variable called 'Syn0' whose value is
> 44. 
> I looked into PreProcessor.l, the lex file generated by Gump during the
> compilation of 
> my preprocessor, and I saw that the rule whose action is "return 44;" 
> has a pattern that matches my token. 
> 
> letter [A-Za-z]
> mletter {letter}|"_"
> 
> <INITIAL>{mletter}({mletter}|{digit})* return 44;
> 
> I conclude that the native C code generated by Gump has not been
> properly unloaded 
> from memory by the garbage collector. 
> 
> I inserted a 1 second delay with {Delay 1000} after the garbage collection
> call, 
> to give the collection time to finish in case it runs in another thread. 
> But it didn't change anything. 
> I suppose that this kind of data is outside the scope of the garbage
> collector. 
> 
> => How can I force this unloading?
> => Is there any way to have two lexers operating at the same time?
> => Is there any way to have two lexers operating sequentially?
> 
> ... or do I need to put everything in the same lexer, with lexer states to
> select the appropriate operation mode?
> 
> ... or am I completely wrong?
> 
> Thank you. 
> 
> --
> Renaud De Landtsheer
> 
> 