November 2013

Kalami: Working with the Syntax Tree

While the grammar from the previous post on parsing puts some restrictions on the form of a valid Kalami programme, it doesn’t reject all invalid programmes. In particular, it doesn’t ensure that all variables are bound (declared) before use. Defining a simple type system to check for this is straight forward, but will also serve to illustrate how easy it is to work with an abstract syntax tree in Caml.

To quickly illustrate the problem, consider programmes such as:

if x == 5 then accept
otherwise reject

which are meaningless since x is unbound. Now, since I don’t know any good way to present the rules of the type system here on the blog in the usual visual style of inference rules (see the appendix in my master’s thesis for instance), I’m just going to give the code implementing them. Fortunately though, Caml allows us to express such rules so concisely that not much is lost this way anyway — a legacy of ML (its distant father) being a language for formulating logical rules for theorem provers!

Starting with expressions, we assume that a list definedvars of variables have already been bound (by outer statements) and by recursion on the structure of expression expr check that all variables it mentions are in the list. The code looks as follows:

let rec wellformed_expr definedvars expr =
    match expr with

        ExprNumber(n) ->
            true

        | ExprVariable(id) ->
            List.mem id definedvars

        | ExprProduct(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | ExprDivision(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | ExprSum(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | ExprSubtraction(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | ExprPower(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

with the most interesting case being for ExprVariable where a call to standard library function List.mem checks the id is a member of the list. Checking a condition is similar, again assuming a list of variables that have already been bound; notice that it calls the above function for cases CndEqual and CndLessthan:

let rec wellformed_cnd definedvars cnd =
    match cnd with

        CndTrue ->
            true

        | CndFalse ->
            true

        | CndEqual(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | CndLessthan(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | CndNegation(cnd) ->
            wellformed_cnd definedvars cnd

        | CndOr(cnd1, cnd2) ->
            (wellformed_cnd definedvars cnd1) &&
            (wellformed_cnd definedvars cnd2)

        | CndAnd(cnd1, cnd2) ->
            (wellformed_cnd definedvars cnd1) &&
            (wellformed_cnd definedvars cnd2)

Now, for statements it finally gets slightly more interesting. The function still takes a list definedvars as input since it calls itself recursively, and hence some variables may be bound by outer statements. But it now also extends this list when a let– or guess-statement is encountered; in these cases it also ensures that no variable is re-bound:

let rec wellformed_stmt definedvars stmt =
    match stmt with

        StmtLet(id, expr, stmt) ->
            (not (List.mem id definedvars)) &&
            (wellformed_expr definedvars expr) && 
            (wellformed_stmt (id::definedvars) stmt)

        | StmtIf(cnd, stmt_true, stmt_false) ->
            (wellformed_cnd definedvars cnd) &&
            (wellformed_stmt definedvars stmt_true) &&
            (wellformed_stmt definedvars stmt_false)

        | StmtGuess(id, stmt) ->
            (not (List.mem id definedvars)) &&
            (wellformed_stmt (id::definedvars) stmt)

        | StmtGuessLowerBound(id, _, stmt) ->
            (not (List.mem id definedvars)) &&
            (wellformed_stmt (id::definedvars) stmt)

        | StmtGuessLowerUpperBound(id, _, _, stmt) ->
            (not (List.mem id definedvars)) &&
            (wellformed_stmt (id::definedvars) stmt)

        | StmtAccept ->
            true

        | StmtReject ->
            true

The above functions are all that is needed to perform our validation check: simply invoke wellformed_stmt on the main programme statement with an empty list, and check that it returns true. We may wrap this in function:

let wellformed stmt =
    wellformed_stmt [] stmt

in order to hide some of the implementation details.

Kalami: Parsing

Next up in the presentation of the little toy language Kalami is parsing. I remember how we used SableCC during my undergraduate years, and I also remember how we simply couldn’t make it pretty no matter what measure we used! Perhaps it can be made better in the Java environment than we ever managed to, but after I saw how it is done in the Caml environment I never looked back: all of a sudden it became fun to write parsers and interpreters!

So, parsing is basically about how to turn a input text string into an in-memory representation in the form of an abstract syntax tree. I don’t want to go into a lot of background but a good source for slightly more details is Chapter 12 in the Ocaml Manual.

The files needed can be found in the ZIP file kalami-parsing.zip; out of these the most important are structure.ml, lexer.mll, and parser.mly.

Starting with structure.ml we first have a piece of Caml code defining what we would like the abstract syntax tree of a Kalami programme to look like:

type identifier = string
type number = int
type bound = int

type expression =
	ExprNumber of number
	| ExprVariable of identifier
	| ExprProduct of expression * expression
	| ExprDivision of expression * expression
	| ExprSum of expression * expression
	| ExprSubtraction of expression * expression
	| ExprPower of expression * expression

type condition =
	CndTrue
	| CndFalse
	| CndEqual of expression * expression
	| CndLessthan of expression * expression
	| CndNegation of condition
	| CndOr of condition * condition
	| CndAnd of condition * condition

type statement =
	StmtLet of identifier * expression * statement
	| StmtIf of condition * statement * statement
	| StmtGuess of identifier * statement
	| StmtGuessLowerBound of identifier * ...
	| StmtGuessLowerUpperBound of identifier * ...
	| StmtAccept
	| StmtReject

Essentially, the above code defines expressions, conditions, and statements as labelled tuples, such that when we for instance write “4*5” in Kalami then this can be represented in memory as expression ExprProduct(ExprNumber(4), ExprNumber(5)). We shall later see that in the end a Kalami programme is simply a statement.

By the way, note the elegance with which we can express an abstract syntax tree in Caml: were we to express the same in Java it would look a lot less intuitive, at least in my experience. This is one of the great strengths of Caml, of which there are plenty more when we start evaluating and analysing the abstract syntax tree in later posts.

Next up is the lexer responsible for the first layer of abstraction of the input, namely turning the raw string into a stream of tokens. Looking in lexer.mll we see that several regular expressions are used to abstract strings, from the very basic labelling of characters and keywords:

	| '('				{ LPAREN }
	| ')'				{ RPAREN }
	| '+'				{ PLUS }
	| '-'				{ MINUS }

	| "let"				{ LET }
	| "in"				{ IN }
	| "if"				{ IF }
	| "then"			{ THEN }

to the slightly more advanced regular expressions:

	| (['0'-'9']+ as i)		{ INT(int_of_string i) }

that also includes a native Caml instruction for turning a string into a number.

Besides this abstraction of strings as tokens not much happens in the lexer, perhaps apart from the little extra code needed for allowing nested comments, and the insistence that STRs, and in turn identifiers, start with a digit (an easy way to distinguish them from INTs).

However, in parser.mly we may use these tokens to put together a grammar. In particular we see from rule main that an input (now a string of tokens) is a Kalami programme if it can be parsed according to rule “stmt” and its sub-rules as given by:

id:
	STR				{ $1 }
;

expr:
	INT				{ ExprNumber($1) }
	| id				{ ExprVariable($1) }
	| expr MULT expr		{ ExprProduct($1, $3) }
	| expr DIV expr			{ ExprDivision($1, $3) }
	| expr PLUS expr		{ ExprSum($1, $3) }
	| expr MINUS expr		{ ExprSubtraction($1...) }
	| expr EXP expr			{ ExprPower($1, $3) }
	| LPAREN expr RPAREN		{ $2 }
;

cnd:
	expr EQ expr			{ CndEqual($1, $3) }
	| expr LT expr			{ CndLessthan($1, $3) }
	| NOT cnd			{ CndNegation($2) }
	| cnd OR cnd			{ CndOr($1, $3) }
	| cnd AND cnd			{ CndAnd($1, $3) }
;

stmt:
	LET id ASSIGN expr IN stmt	{ StmtLet($2, $4, $6) }
	| IF cnd THEN stmt OTHERWI...	{ StmtIf($2, $4, $6) }
	| GUESS id IN stmt		{ StmtGuess($2, $4) }
	| GUESS id FROM expr IN ...	{ StmtGuessLowerBo... }
	| GUESS id FROM expr TO ...	{ StmtGuessLowerUp... }
	| ACCEPT			{ StmtAccept }
	| REJECT			{ StmtReject }
;

Besides these rules the file also contains instructions defining the precedence of for example the mathematical operations as well as their associativity, as needed to help the parser solve ambiguity.

Having looked at the primary files for the parser we turn to the plumbing needed to put them together. File printing.ml simply takes a statement and recursively prints it on the screen, and file kalamiparsing.ml contains Caml code that reads the input, invokes the lexer and the parser, and prints the statement in case no error occurred:

open Printing

let _ =
	try
		let lexbuf = Lexing.from_channel stdin in
		let stmt = Parser.main Lexer.token lexbuf in

		print_string "\n*** Statement ***\n\n";
		print_stmt stmt;
		print_string "\n\n"

	with 
		Lexer.Eof ->
			print_string "End of file\n";
			exit 0
		| Parsing.Parse_error ->
			print_string "Parsing error\n";
			exit 1

The only thing missing now is how to compile everything. This is not completely straight-forward since the lexer and the parser are referencing each other, and as a result we must execute the compilation commands in the specific order as seen in file compile.sh. Doing so will produce the kalamiparsing binary which we can tell to either load a programme from a file through

./kalamiparsing < "inputfile"

or run “interactively” by simply executing

./kalamiparsing

and typing a programme directly into it (ending with an EOF signal by pressing control + d).

In the next post we’ll look at how we may evaluate an in-memory Kalami programme in order to find a satisfying guess; as mentioned previously this requires something slightly more involved than a trivial brute-force.

Street Art Walking Tour

Yesterday I joined a street art walking tour led by Richard, the guy behind Paris Play. Although I often notice the Space Invader pieces put up by Invader, it was amazing to see how many other pieces by other artists I’ve just walked by several times without noticing.

We were a group of circa 15 people from The Paris Photography Meetup Group walking a four hour circle in the 11th and 18th arrondissement. Besides the street art, it was also nice to get another view on the neighbourhoods (especially the 18th offered positive surprises in the form of splendid views and cosy cafés) and spend time with the friendly and English-speaking group.

Extraordinarily I brought along the digital camera this time, since 1) it’s better to keep the hipster tendencies to a minimum on a first meeting, and 2) I wanted to be free to re-take photos without worries.

The tour is arranged monthly by Underground Paris and comes highly recommended, in particular for a day with dry, or even warm, weather!

Nikon FM and Film #10

Another analogue camera arrived in the mail about a month ago, but this time a Nikon FM instead of the usual Olympus OM-1! Granted, it is silly to get involved in too many systems as the lenses from one system often cannot be used on a camera from the other (and as a result you need to buy two separate sets) — but I kept reading about how much of a classic the FM is, … et voilà!

Here’s first the result of quickly running a film through the camera to see if everything worked alright before the return option expired:

I like Le Guardien for how it draws me into the frame, and the dynamics added by the central walking character. For Passing Sky it’s the silhouettes, although this was not entirely my intention. And as RooftopBalcony Overview, and Paris shows, I’m apparently not quite done with negative space and rooftops yet. For Gare du Nord it’s the dynamics of the various lines, the contrast they contribute, and the movement of the two characters at the bottom.

The camera is the black version and I thought at first that it could be used together with the 18-55mm lens from my digital Nikon D60. The lens mounted alright, but after a few shots something seemed wrong. I went back to check the fine print, and while I had already read that the lens would not cover the entire frame at the wide focal lengths, I had missed that you’re forced to always use the smallest aperture, meaning f/22 in my case! The Passing Sky photo above is a perfect example of the implications: the corners are black and rounded, and the shadows are heavily underexposed. The former is not critical, perhaps even a bit interesting, but the later made me look around for a new lens.

At first I thought about a 50mm f/1.8 standard lens. But discovering that the price-drop for Nikon lenses has nowhere-near followed that of the Olympus lenses, I gave up on also starting a set of Nikon primes and went with a 28-70mm zoom instead; at f/3.5-4.5 it is a lot darker that my Olympus primes, but at 70€ it seems to have been a bargain.

In the end I hence ended up spending around twice, but so far have been very happy with the outcome. The camera it slightly bigger than the OM-1 it was apparently made to compete with, but this is actually not a bad thing as it fits my hands better. It also feels slightly more robust, not least the film advancer and the shutter. On the other hand, the viewfinder in the OM-1 kicks arse (!!) and having the shutter dial around the lens mount so far also works better for me, even if I most of the time work only with the aperture.

Film #9

Another film has been processed at the kitchen sink, this time a T-Max 400 shot at ASA 800 and accordingly pushed during development. With the 50mm f/1.4 this has given usable photos from low-light concert and night scenes.

To be honest, no photo in this set gets me really exited; there are a few cute ones but haven’t given more than 2 stars to any of them. Maybe this will have changed by the next time I look at them, but until then it doesn’t really matter because I remember having a great time shooting them: walking around the city at night is a sure way to get a glimpse of the romance seen in the famous photos from the 1920-30s!

Surprise from Lomography X-Pro Chrome Film

I bought my Fisheye together with two three-packs of Lomography film: the Color Negative 400 and the X-Pro Chrome 100. But while the former have produced decent results, the latter has until now been quite a let-down.

The first two rolls of the X-Pro Chrome was shot on a Diana Mini, and while this is certainly a toy camera, it has actually resulted in some usable photos in the past. However, for the two X-Pro Chrome films the results have been very poor, as bad in fact that the negatives just sit here in my drawer without having been printed nor properly scanned (the colours got too weird for me to continue).

To test if it was really a poor film or simply a bad combination with the Diana Mini, I decided to load the last roll into an Olympus OM-1. Not excepting much I quickly finished it and went to the developer, who, as an additional experiment, I asked to make small prints instead of scans (same price). Since I’m always a bit embarrassed to hand over a Lomography film for its low quality, my surprise when picking them up was even better: the photos were way better than expected, especially the vivid colours. It doesn’t renders the sky that nicely, but for the Autumn colours of yellow and red it is definitely not too bad.

As mentioned I asked for prints instead of scans, and this might be the strategy from now on, at least when I feel like saving the 5€ extra it is to have both: obtaining a good result from scanning the prints myself are far easier than from scanning the negatives! It doesn’t allow me to change the exposure of course (Under the Bridge, for instance, could benefit from more shadow detail), but given how much trouble this particular kind of film caused it was no doubt worth it. Plus, it’s nice to actually have prints to hold in your hands or even go as far as putting them into frames.

Bottom-line? When the film comes back in stock I’ll be getting a lot more of it, even if it means making excuses and covering my face when I go to the developer.

Kalami: Introduction

A few years ago I was the teaching assistant in a course on computability and as such was involved in explaining the concept of non-deterministic “guessing”. I always found this topic fascinating myself, so since I also had to come up with a few project suggestions I thought some of them might be interested in making a programming language with a guessing construct. And to give them a place to start I quickly put together a prototype interpreter in Caml that I’ll explain over the course of a few posts. But let’s first get an idea of what the little toy language can do.

Say you want to find two factors in a given number. There are many ways to do this, but using non-determinism we may simple ask the computer to “guess” them. Expressed in the toy language we can write:

let n = 681 in

guess p from 2 in
guess q from 2 in

if n == p*q then accept
otherwise reject

where p and q are the factors to be guessed, and assumed to be larger than 2 to avoid the trivial solutions with p=1 or q=1. This could of course be expressed similarly in e.g. Prolog, but the aim was also for the project to have a touch of finding suitable evaluation strategies and using static analysis, as well as to illustrate the power of working with languages in Caml.

To illustrate some of the issues consider programme

guess x in
guess y in
if x == y+1 then accept
otherwise reject

which have solution x=1 and y=0. Yet if the evaluation strategy simply starts with x=0 and tries all values for y then it will never find it; in other words, the evaluation strategy must eventually enumerate all tuples of values.

For some programmes we may also benefit from first employing static analysis. Consider for instance programme

guess x in
if x == x+1 then accept
otherwise reject

that is easily seen to not have any solutions — yet the interpreter might not realise this without some kind of static analyse. Likewise, for programmes such as

guess n in
if n == 1*2*3 then accept
otherwise reject

static analysis may speed up execution time by letting the interpreter discover that it only needs to consider a subset of values — in this case only n=6 — instead of it simply brute forcing for all possibly values.

So there you have it. In the next post I’ll provide the source code for the interpreter and begin dinning into it by looking at its parser. Oh, and by the way, the name Kalami came from The Meaning of Liff by Douglas Adams and John Lloyd, which I was reading at the time.