2021

## Implementing a C like program in racket using lex/yacc

##### Por N Katz

##### 2 Comentarios

##### Visto 107 veces

I was looking at these two resources (https://github.com/racket/parser-tools/blob/master/parser-tools-lib/parser-tools/examples/calc.rkt and https://gist.github.com/gcr/1318240) and although I don't fully understand yet how the main calc function works I wondered if it's possible to extend this to work for a simple c like program just without functions? So it will lex, parse and evaluate ifs, whiles and print statements. So something like `(define-empty-tokens op-tokens ( newline = OC CC (open-curly/closed-curly for block statements) DEL PRINT WHILE (WHILE exp S) S IF S1 S2 (IF exp S1 S2) OP CP + - * / || % or && == != >= <= > < EOF ))`

Here is how I've extended it (the code of the first link) so far to also work with booleans:

So in calcl I added these two lines:

```
[ (:= 2 #\|) (token-||)]
[(:or "=" "+" "-" "*" "/" "%" "&&" "==" "!=" ">=" "<=" ">" "<") (string->symbol lexeme)]
```

And then later:

```
(define calcp
(parser
(start start)
(end newline EOF)
(tokens value-tokens op-tokens)
(error (lambda (a b c) (void)))
(precs (right =)
(left ||)
(left &&)
(left == !=)
(left <= >= < >)
(left - +)
(left * / %)
)
(grammar
(start [() #f]
[(error start) $2]
[(exp) $1])
(exp [(NUM) $1]
[(VAR) (hash-ref vars $1 (lambda () 0))]
[(VAR = exp) (begin (hash-set! vars $1 $3)
$3)]
[(exp || exp) (if (not(and (equal? $1 0) (equal? $3 0) )) 1 0) ]
[(exp && exp) (and $1 $3)]
[(exp == exp) (equal? $1 $3)]
[(exp != exp) (not(equal? $1 $3))]
[(exp < exp) (< $1 $3)]
[(exp > exp) (> $1 $3)]
[(exp >= exp) (>= $1 $3)]
[(exp <= exp) (<= $1 $3)]
[(exp + exp) (+ $1 $3)]
[(exp - exp) (- $1 $3)]
[(exp * exp) (* $1 $3)]
[(exp / exp) (/ $1 $3)]
[(exp % exp) (remainder $1 $3)]
[(OP exp CP) $2]))))
```

But I'm struggling to understand the above code as well as the below. I would lile to change it so that it also to works for ifs and whiles etc. if it's at all possible?

```
(define (calc ip)
(port-count-lines! ip)
(letrec ((one-line
(lambda ()
(let ((result (calcp (lambda () (calcl ip)) )))
(when result (printf "~a\n" result) (one-line))
)
) ))
(one-line))
)
```

Also, this guy seems to be relying on newlines to mark the end of a statement. i.e. you can't have more than 1 statement on one line. I want the program to recognise two statements on one line and evaluate them separately by somehow looking ahead and checking whether there's a new undeclared variable, special keyword or open/closed bracket etc.

Update:

I managed, with the below rules, to build in brag an AST for arith expressions but how do I get rid of all but the important parens so that I can evaluate it?

Eg: with input list: `(list (token 'NUM 17) '+ (token 'NUM 1) '* (token 'NUM 3) '/ 'OP (token 'NUM 6) '- (token 'NUM 5) 'CP)`

I'm getting back:

```
'(exp (((((factor 17)))) + (((((factor 1))) * ((factor 3))) / ((((((factor 6)))) - (((factor 5))))))))
```

Here are my rules:

```
exp : add
/add : add ('+' mul)+ | add ('-' mul)+ | mul
/mul : mul ('*' atom)+ | mul ('/' atom)+ | mul ('%' atom)+ | atom
/atom : /OP add /CP | factor
factor : NUM | ID
```

#### COMENTARIOS

#### rici

You cannot easily implement a language with conditionals and looping constructs using an evaluator based on immediate evaluation.

That should be clear at least for loops. If you have something like (using a super-simplified syntax):

```
repeat 3 { i = i + 1 }
```

If you evaluate during the parse, `i = i + 1`

will be evaluated exactly once, since the string is parsed exactly once. In order for it to be evaluated several times, the parser needs to convert `i = i + 1`

into something that can be evaluated several times when the `repeat`

is evaluated.

This something is usually an *Abstract Syntax Tree (AST)*, or possibly a list of *virtual machine* operations. With Scheme, you could also just turn the expression being parsed into a functional.

All of this is totally practical and not even particularly difficult, but you do need to be prepared to do some reading, both about parsing and about generating executables. For the latter, I highly recommend the classic *Structure and Interpretation of Computer Programs* (Abelson & Sussman).

#### N Katz

Based on what I've seen here and here I managed to implement the lex and parse phase of an interpreter for a simple c-like program. It doesn't have functions in it but it has assignments, variables, conditionals, loops and print statements (As well as arithmetic, it also contains logical expressions.). I'm posting it below, in case others may find it useful (The whole thing including the evaluation phase and samples of input is here):

```
(require parser-tools/yacc //provides you with the lexer, the parser and the lexeme tools eg. string-> symbol, string->number etc - In general, with the ability to map the literals in the input
parser-tools/lex
(prefix-in : parser-tools/lex-sre))
(define-tokens value-tokens (NUM VAR ))
(define-empty-tokens op-tokens ( newline = OC CC DEL OP CP + - * / || % or && == != >= <= > < EOF PRINT WHILE IF ELSE ))
(define vars (make-hash)) ;to store the values in the variables
(define-lex-abbrevs
(lower-letter (:/ "a" "z"))
(upper-letter (:/ #\A #\Z))
(digit (:/ "0" "9")))
(define calcl ;lexer is mapping the literals to their tokens or values
(lexer
[(eof) 'EOF]
[(:or #\tab #\space #\return #\newline ) (calcl input-port)]
[ (:= 2 #\|) (token-||)]
[(:or "=" "+" "-" "*" "/" "%" "&&" "==" "!=" ">=" "<=" ">" "<") (string->symbol lexeme)]
["(" 'OP]
[")" 'CP]
["{" 'OC]
["}" 'CC]
[ "print" 'PRINT ]
[#\, 'DEL ]
[ "while" 'WHILE ]
[ "if" 'IF ]
[ "else" 'ELSE ]
[(:+ (:or lower-letter upper-letter)) (token-VAR (string->symbol lexeme))]
[(:+ digit) (token-NUM (string->number lexeme))]
))
(define calcp ;defines how to parse the program and how the program structure is made up (recursively)
(parser
(start start);refers to the block named 'start' below. Every parser has a start, end, tokens definition, optional error message when needed and operator precedence def.
(end EOF)
(tokens value-tokens op-tokens )
(error (?(ok? name value) (if (boolean? value) (printf "Couldn't parse: ~a\n" name) (printf "Couldn't parse: ~a\n" value))))
(precs ;sets the precedence of the operators in relation to each other - that is, to which operand they bind stronger
(left DEL);from lowest to highest
(right =)
(left ||)
(left &&)
(left == !=)
(left <= >= < >)
(left - +)
(left * / %)
(right OP)
(left CP)
(right OC)
(left CC)
)
(grammar ;what is the grammar of my program?
(start
[() '()] ; returns empty list when it matches onto nothing
[(statements) `(,$1)]
[(statements start) `(,$1,$2)] ;we can have more than one statement - one example of the recursiveness
)
(statements /what type of major statements we might have
[(var = exp ) `(assign ,$1 ,$3)]
[(IF ifState) $2]
[(WHILE while) $2]
[(PRINT printVals) `(print ,$2)]
)
(ifState ; It's assumed that you cannot have an if inside an if unless it's within curly braces
[(OP exp CP statements) `(if ,$2 ,$4)] /combinations of different major statements
[(OP exp CP block) `(if ,$2 ,$4)]
[(OP exp CP block ELSE statements ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP statements ELSE block ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP statements ELSE statements ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP block ELSE block ) `(if ,$2 ,$4 ,$6)]
)
(while
[(OP exp CP block) `(while ,$2, $4)]
)
(block
[(OC start CC) $2] ;we can statements or entire program wrapped into curly braces - a block
)
(var
[(VAR) $1]
)
(printVals
[(exp DEL printVals ) `(,$1 ,$3)]
[(exp) $1]
)
(exp [(NUM) $1] ; smallest (most reducible) chunk in an expression when the chunk is an integer
[(VAR) $1] ; smallest (most reducible) chunk in an expression when the chunk is a variable
[(exp || exp) `((lambda (a b) (or a b)) ,$1 ,$3) ]
[(exp && exp) `((lambda (a b) (and a b)) ,$1 ,$3)]
[(exp == exp) `(equal? ,$1 ,$3)]
[(exp != exp) `(not(equal? ,$1 ,$3))]
[(exp < exp) `(< ,$1 ,$3)]
[(exp > exp) `(> ,$1 ,$3)]
[(exp >= exp) `(>= ,$1 ,$3)]
[(exp <= exp) `(<= ,$1 ,$3)]
[(exp + exp) `(+ ,$1 ,$3)]
[(exp - exp) `(- ,$1 ,$3)]
[(exp * exp) `(* ,$1 ,$3)]
[(exp / exp) `(quotient ,$1 ,$3)]
[(exp % exp) `(modulo ,$1 ,$3)]
[(OP exp CP) $2]) ;when the expressions are wrapped parentheses
)
)
)
```

Procedure calls the parser passing it a lambda (that calls the lexer providing it with the input) which it will use to get values from the lexer, as many values at a time as it sees fit. ; i.e. according to what the parsing rules above, prescribe

```
(define (calceval ip)
(calcp (lambda () (calcl ip))))
```

To see the evaluator (unfortunately without much comments yet) see here