N Katz
Based on what I've seen here and here I managed to implement the lex and parse phase of an interpreter for a simple c-like program. It doesn't have functions in it but it has assignments, variables, conditionals, loops and print statements (As well as arithmetic, it also contains logical expressions.). I'm posting it below, in case others may find it useful (The whole thing including the evaluation phase and samples of input is here):
(require parser-tools/yacc //provides you with the lexer, the parser and the lexeme tools eg. string-> symbol, string->number etc - In general, with the ability to map the literals in the input
parser-tools/lex
(prefix-in : parser-tools/lex-sre))
(define-tokens value-tokens (NUM VAR ))
(define-empty-tokens op-tokens ( newline = OC CC DEL OP CP + - * / || % or && == != >= <= > < EOF PRINT WHILE IF ELSE ))
(define vars (make-hash)) ;to store the values in the variables
(define-lex-abbrevs
(lower-letter (:/ "a" "z"))
(upper-letter (:/ #\A #\Z))
(digit (:/ "0" "9")))
(define calcl ;lexer is mapping the literals to their tokens or values
(lexer
[(eof) 'EOF]
[(:or #\tab #\space #\return #\newline ) (calcl input-port)]
[ (:= 2 #\|) (token-||)]
[(:or "=" "+" "-" "*" "/" "%" "&&" "==" "!=" ">=" "<=" ">" "<") (string->symbol lexeme)]
["(" 'OP]
[")" 'CP]
["{" 'OC]
["}" 'CC]
[ "print" 'PRINT ]
[#\, 'DEL ]
[ "while" 'WHILE ]
[ "if" 'IF ]
[ "else" 'ELSE ]
[(:+ (:or lower-letter upper-letter)) (token-VAR (string->symbol lexeme))]
[(:+ digit) (token-NUM (string->number lexeme))]
))
(define calcp ;defines how to parse the program and how the program structure is made up (recursively)
(parser
(start start);refers to the block named 'start' below. Every parser has a start, end, tokens definition, optional error message when needed and operator precedence def.
(end EOF)
(tokens value-tokens op-tokens )
(error (?(ok? name value) (if (boolean? value) (printf "Couldn't parse: ~a\n" name) (printf "Couldn't parse: ~a\n" value))))
(precs ;sets the precedence of the operators in relation to each other - that is, to which operand they bind stronger
(left DEL);from lowest to highest
(right =)
(left ||)
(left &&)
(left == !=)
(left <= >= < >)
(left - +)
(left * / %)
(right OP)
(left CP)
(right OC)
(left CC)
)
(grammar ;what is the grammar of my program?
(start
[() '()] ; returns empty list when it matches onto nothing
[(statements) `(,$1)]
[(statements start) `(,$1,$2)] ;we can have more than one statement - one example of the recursiveness
)
(statements /what type of major statements we might have
[(var = exp ) `(assign ,$1 ,$3)]
[(IF ifState) $2]
[(WHILE while) $2]
[(PRINT printVals) `(print ,$2)]
)
(ifState ; It's assumed that you cannot have an if inside an if unless it's within curly braces
[(OP exp CP statements) `(if ,$2 ,$4)] /combinations of different major statements
[(OP exp CP block) `(if ,$2 ,$4)]
[(OP exp CP block ELSE statements ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP statements ELSE block ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP statements ELSE statements ) `(if ,$2 ,$4 ,$6)]
[(OP exp CP block ELSE block ) `(if ,$2 ,$4 ,$6)]
)
(while
[(OP exp CP block) `(while ,$2, $4)]
)
(block
[(OC start CC) $2] ;we can statements or entire program wrapped into curly braces - a block
)
(var
[(VAR) $1]
)
(printVals
[(exp DEL printVals ) `(,$1 ,$3)]
[(exp) $1]
)
(exp [(NUM) $1] ; smallest (most reducible) chunk in an expression when the chunk is an integer
[(VAR) $1] ; smallest (most reducible) chunk in an expression when the chunk is a variable
[(exp || exp) `((lambda (a b) (or a b)) ,$1 ,$3) ]
[(exp && exp) `((lambda (a b) (and a b)) ,$1 ,$3)]
[(exp == exp) `(equal? ,$1 ,$3)]
[(exp != exp) `(not(equal? ,$1 ,$3))]
[(exp < exp) `(< ,$1 ,$3)]
[(exp > exp) `(> ,$1 ,$3)]
[(exp >= exp) `(>= ,$1 ,$3)]
[(exp <= exp) `(<= ,$1 ,$3)]
[(exp + exp) `(+ ,$1 ,$3)]
[(exp - exp) `(- ,$1 ,$3)]
[(exp * exp) `(* ,$1 ,$3)]
[(exp / exp) `(quotient ,$1 ,$3)]
[(exp % exp) `(modulo ,$1 ,$3)]
[(OP exp CP) $2]) ;when the expressions are wrapped parentheses
)
)
)
Procedure calls the parser passing it a lambda (that calls the lexer providing it with the input) which it will use to get values from the lexer, as many values at a time as it sees fit.
; i.e. according to what the parsing rules above, prescribe
(define (calceval ip)
(calcp (lambda () (calcl ip))))
To see the evaluator (unfortunately without much comments yet) see here