Programming

From Academia to Industry

So, now that the year is coming to an end and I’ve been in my first “real job” for about half a year or so, I figured it was about time to write a few lines about what happens when you enter the software industry as a full-time software developer, coming rom days spent reading papers, attending seminars, teaching, and trying to make proofs go through.

Well, there has obviously been a lot of catching up to do. Getting used to the Android ecosystem is one thing (which is understated given my rather limited prior knowledge), but also a refresher of the basic skills and an update on where the industry has been going over the last 5-6 years — since I already have my nose in Android a lot these days I’m only going to mention here that 1) I like the upgrade from iOS and am not missing the latter at all, and 2) I’m surprised how much guessing is involved in developing for this system due to inadequate, or at least uneasy to find, documentation.

The refreshing part has been quite enjoyable, and involved going through the CLRS algorithms book again, re-finding the beauty in software architecture and general design patterns, and covering concrete Java stuff such as best practices and its concurrency library.

At the same time it was also good to catching up on where the industry was gone. There were some things that I didn’t see coming at all (JavaScript on the server for instance), and some things that I was particular happy to see, such as functional languages (e.g. OCaml and Haskell) growing in popularity and influencing the mainstream languages. Finally having the time to go deeper in the NoSQL databases was good, and again happy to see that some of the more theoretical concepts are making their way into industry such as graph databases (agreed, relational database has relational algebra, but never found it overwhelmingly exciting).

All of this was before getting a job though, and after starting it’s been incredible nice to actually have some energy/time/conscience left in the evenings to look into other interesting things — and since this is pure pleasure I’ve been taking a breadth-first approach opening several books at the same time:

Besides software-related topics I also finally had the time to get a non-algorithmic point of view on game theory, including concrete games and mechanism design. And finally, getting involved in business, management, and interviewing processes at work has also been very interesting, admittedly still having to draw a lot from teaching and early university group work.

Book: Programming Pearls

Another book I came across when refreshing the more practical computer science skills was Jon Bentley’s Programming Pearls (second edition), composed of columns — blog posts, I suppose, by today’s terminology — written by him for the ACM back in the 1980s. And while it is somewhat outdated in terms of technology, the strength of this little book lies in something entirely more timeless.

The book covers several topic within software construction, including programme verification, testing, and code tuning, but most pages concern the construction of good algorithms to solve a given problem. Incidentally, one of my favourite problems from the book was posed to me at a job interview, supporting the claim that the book is well-known within the software community, at least by senior staff (as a side-note, I mentioned that I’d already encountered the problem from reading this book, and funnily enough that almost seemed as good an answer as an algorithm).

The problem was this (section 2.4 and 2.8):

Given a dictionary of words, find all sets of anagrams.

The good thing is that the book not only poses interesting questions, it also discusses several solutions, comparing them in terms of time and space efficiency. For the anagram problem above there’s the brute force approach as usual, yet as soon as you remember equivalence classes and their representatives you’re close to a much better algorithm.

Another favourite was presented in chapter 8:

Given a array of n integers (positive and negative), find a maximum sum contiguous subarray.

For instance, in [ 31, -41, 59, 26, -53, 58, 97, -93, -23, 84 ] one solution is [ 59, 26, -53, 58, 97 ] with sum 187, obtained by removing the first two and last three elements. Again there’s the obvious brute force approach — e.g. computing the sum for all C(n, 2) pairs of indices — but by a bit of dynamic programming it’s possible to reach an O(n) algorithm that only scans the array twice.

Now, as mentioned above, the book does not just talk about algorithms but through various topics aims to improve your overall programming skills. And for that I recommend it highly: it will probably not teach you a new technology but it might improve the quality of your code.

Which brings me to a pattern I’ve noticed over the years: my technology books usually end up somewhere in the basement, mostly kept for nostalgic reasons, whereas books like this one, focusing on the fundamentals, are much more likely to still sit high on my office bookshelf, and as such still attractive to buy in dead-wood format.

The Power of Pressure

The other day I was, relatively speaking, very close to landing a good job in the software industry. However, during the last 5% of the interview process pressure kicked in and my mental processes collapsed, leaving me starring at the white board unable it seemed, to even spell my name.

At this point it had already been a surprisingly long recruiting process, consisting of more than 8 hours of problem solving interviews. I’d had a break for the weekend, and was now called in for a final interview with the big boss. This was of course an indication that they were considering me seriously, however the recruiting agent had let me know that a few of the interviewers had expressed a wish that I’d been a bit faster at coming up with solutions.

The first half hour with the big boss went well, focusing more on me being a cultural fit than technical skill. But then, for the last half hour he suddenly asked me another problem solving question: implement method transposeMatrix(int[] matrix, int n) where the matrix is assumed to be of size n by n represented by a one-dimensional array, and with only a constant memory overhead (so no allocation of another array).

I quickly realised how the algorithm would work, using two pointers running respectively out of the rows and down the columns. However, when it came to writing the code I froze up — and not because it’s difficult code, but because I knew that me getting the job or not most likely came down to these particular 30 minutes. As a result, I got self-conscious and my focus started drifting away from the white board towards the big boss looking over my shoulder.

The outcome was that I turned completely blank, fighting to bring back focus where it was desperately needed. But it didn’t happen until after 20 minutes of nonsense, when I got a 5 minute break while he left to pick up a new whiteboard marker. When he came back my focus was there already and the solution came quickly:

void transposeMatrix(int[] matrix, int n) {
    for (int row = 0; row < n-1; row++) {
        for (int col = row+1; col < n; col++) {
            int i = row * n + col;
            int j = col * n + row;
            // swap matrix[i] and matrix[j]
        }
    }
}

yet it was too late; given the performance I illustrated that day I wouldn’t have hired myself, so in full understanding of their decision.

The frustrating part instead, is that this can still happen to me when I want something badly (it never happens when I don’t care). But what’s to learn from this? Well, I’m better at problem solving with a piece of paper in front of me and 5 minutes of “quite time”; I’ve been hesitating to take this at interviews for a fear of disrupting the flow, but it seems it can come to a point where it’s worth it. Also, I’ve started on ways of controlling nerves by basically caring less about the surroundings; since this is rooted in a current insecurity (from being unemployed?) it is easy to practise in everyday life.

Book: Java Generics and Collections

During preparations for a job interview I noticed that my understand of Java Generics had some holes in it; and during the interview itself it became clear that there were also a few holes in my knowledge about the corner cases of the Java Collections framework introduced with Java 5 and 6. So, after looking around a bit, which book could be more suiting than Java Generics and Collections?

Now, I wasn’t completely lost for either of the two, but they were both introduced after I had my Java programming course (in Java 1.4) and apparently since then there hasn’t been a situation requiring much more than your typical

List<String> strList = new ArrayList<String>();

and the like (perhaps there rarely are?). However, when I came across the follwing signature:

public static <T extends Comparable<? super T>> T max(
    Collection<? extends T> coll) {
    // iterate to find max ...
}

it was clear that there was something not so trivial going on; a nice little example of why this crooked specification is useful is given in Section 3.3, but funnily enough I find the use of keywords extends and super a bit counter-intuitive here — mostly likely because I’ve “grown up” with the formal <: notation instead.

Now, in general the chapters on generics are well-written and usefully sprinkled with illustrative examples. There could sometimes have been more technical explanations, and perhaps links to formal type theory (it was written by Philip Wadler after all), however, overall it’s a highly recommended read!

It even taught me a little peculiar thing about Java bytecode, namely that generics allows one to overload methods based on return type:

class Works {

    public static void main(String[] args) {
        Works o = new Works();
        int i = o.<Integer>foo();
        boolean b = o.<Boolean>foo();
    }

    public <T extends Integer> int foo() {
        return 0;
    }

    public <T extends Boolean> boolean foo() {
        return false;
    }

}

which wouldn’t compile without generics:

class DoesntWork {

    public static void main(String[] args) {
        DoesntWork o = new DoesntWork();
        int i = o.foo();
        boolean b = o.foo();
    }

    public int foo() {
        return 0;
    }

    public boolean foo() {
        return false;
    }

}

since the compiler doesn’t know which method to pick in main. However, the bytecode calls a method by looking up in an index table, and with generics the compiler has enough information to differentiate these.

For the second part of the book, on the Collection framework, the book disappoints a bit: it covers most classes and uses, but few details are given and there’s quite a bit of fluff. It gave me the overview I needed, but not much more than going over the online Java documentation.

Book: Programming Interviews Exposed

In my preparation for potentially taking on a general software developer role, Programming Interviews Exposed was recommended to me on several occasions and hence gave it a shot. It’s not too bad a read, and would most likely recommended it in the future, despite is having a few glitches; at the very least, it serves its purpose as a good source of concrete problems to solve.

Overall the book has the decent “Wrox feeling” I remember from way back in the days of secondary school reading Professional ASP.NET and Professional C#, and later Beginning Visual C++ 6 in the early university days (fond memories of a world opening up): on the surface the paper feels the same, even a few pages with faded print, and the contents is the same easy-to-read-yet-with-good-substance. It contains some good non-technical advise as well, so for its price I don’t see why anyone looking for a job in this area (and perhaps a bit rusty in the practical areas of computer science) shouldn’t pick up a copy.

The trees do not grow all the way into Heaven though, as the Danish saying goes, and there are a few place with room for improvement; for the interested here’s my two cents.

First of all, the writing follows the idea that the reader could have come up with every step of reasoning the authors take by just thinking the right way (and not letting much room for the use of prior knowledge). When coming up with a proposal, this approach may seem a bit strange yet it still teaches you the tricks. However, in analysing a solution they often skip arguments that could potentially teach you something. As a concrete example, the two-pointer solution for detecting cycles in a linked list (page 60 in the third edition) seems more like a useful trick to know than something I would have come up with in an interview situation; fair enough I’ve now learnt it. However, they give no argument to the claim that it has a linear running time; while I may not see the obvious, coming up with a proof was a useful exercise, taking me a car ride and the use of modular arithmetic to find. In short, you’ll need to look elsewhere for brushing up on proving and analysing algorithms (go Cormen et al.!).

Secondly, the given solutions sometimes seem to ignore standard mathematical thinking, and simply brute force their way through, in turn making it more confusing (at least to me) than it has to be. One concrete example is the problem of generating all combinations of characters in a string (page 116 in the third edition), which has a simple mathematical formulation in the form of the power set that also leads to an easy recursive implementation. Yet they do not follow nor mention this approach, instead doing a lot of hand-waving to end up with something that works but leaves me less certain about its correctness (their solution leaves out the empty string for instance, which may or may not be an error, but at least we know when the goal is a well-known concept such as the power set).

Thirdly, they do not always follow the official Java code conventions, which honestly seems a bit silly after several companies have highlighted the importance of this during interviews. For instance, the following code (page 93 in the third edition):

public void foo(String str) {
    int i, length;
    Character c;

    length = str.length();
    for (i = 0; i < length; i++) {
        c = str.charAt(i);
        // do first scan ...
    }
    for (i = 0; i < length; i++) {
        c = str.charAt(i);
        // do second scan ...
    }
}

goes against both the advise on initialisation at the point of declaration (length) and declaring local indices within their for-loop (i). Although a bit contradictory, it also goes against the advise given in Effective Java (Item 45 in the second edition) of not declaring a variable before use (c). To me, rewriting the code to this:

public void foo(String str) {
    int length = str.length();
    for (int i = 0; i < length; i++) {
        Character c = str.charAt(i);
        // do first scan ...
    }
    for (int i = 0; i < length; i++) {
        Character c = str.charAt(i);
        // do second scan ...
    }
}

seems both safer and easier to follow.

That was a lot of bashing actually; let me just make sure to express that overall I enjoyed reading the book, and almost all problems in it are interesting and with an entertaining side; of the latter I especially liked the pancake sorting algorithm. So, getting a copy is not a bad idea!

If curious, my programming solutions are available at the progin project on GitHub.

The Ten Minutes After Enlightenment

I had a job interview earlier today, which didn’t go completely awful, yet didn’t go terribly well either. During the interview I only found a simple solution to one of the problems and then the ten-minutes-after-enlightenment happened: what you should have said pops up in your head just after the interview is over. Sometimes it may take a book or an internet search for this to happen, but in these cases it seems to be because of nerves settling down and the brain starting to work again, fortunately. Why? Well perhaps a methodology can help, by extending the offline preparation phase a bit.

What happened? Well, I was asked to come up with an algorithm for computing a certain result, and the only viable thing that came to mind was a brute force method (not completely unexpected by the way, as important exams have earlier been seen to make my brain strike and confusion taking over — one might assume it to be different after 18+ years in the educational systems, yet here we are). This is obviously not the best solution, but when nothing else presents itself I’ve found earlier that coding this naive approach often leads to an insight that in turn yields an improvement, sometimes simply in the form of a better understanding of the problem and the objective.

So, I started coding the naive solution, hoping something better would show up along the way. Unfortunately it didn’t, and I even managed to mess up the coding despite the fact that I’ve used the same programming pattern numerous times, not least in my functional programming. At that point desperation kicked in, a dark well opened up in the ground, and the universe decided to test my ability to catch helping hands while tumbling towards a deep bottom. The usual stuff. Fortunately though, my interviewer was very kind and basically put out a platform for me to fall on. I made it up the well again, and by then time ran out and the interview ended (with me having goosebumps, a mild shaking, and the urge to play the guitar loudly).

And then it happens. Less than ten minutes later, after having first summoned a ringing in my right ear and next looked out the window from the back of my armchair, a better idea that I wish we could have pursued occurred to me: a data structure could have helped (a radix tree in this case), at the very least by providing a space/time trade-off. But the interviewer is now long gone and this is for your enjoyment only.

This insight may of course have been just around the corner independently of the interview ending or not, yet my point here is that in my state at the time it seems unlikely: the problem, I suspect, lies more in not being in your best working state than in the problem being too difficult for the time frame.

So, how to improve? Given that it’s difficult to just stop being nervous, one thing I’ll try for the next one is to improve my methodology. More concretely, I’ll keep with me a list of the most commonly used data structures:

  • hash table
  • (doubly) linked list
  • heap
  • binary search tree
  • radix
  • etc.

and if needed, go over them asking How could the problem be solved using one of these? Similarly, I’ll try with a list of important problems (and good algorithms for them):

  • sorting
  • breadth-first search
  • max flow
  • etc.

in the hope that a suitable reduction could be enough. And last but not least, it may also pay off to keep a list of the most common programming techniques:

  • tail recursion
  • dynamic programming
  • greedy
  • etc.

in case nothing else yields.

Book: Java Puzzlers

It’s always a bit embarrassing to acknowledge, but Java Puzzlers: Traps, Pitfalls, and Corner Cases is one of the books on my bookshelf that has remained largely unread for quite a while (since 2006 in fact; I have a habit of writing the acquisition date inside the cover). Yet this week I finally finished it, and for a book on programming languages it’s actually a pretty good read. Here’s a few off the top of my head reasons.

Most importantly of course is that it’s educational. While a few of the puzzles are indeed corner cases that I suspect most will only rarely if ever encounter (such as the weird behaviour that may arise in floating point arithmetic), most of the material seems likely to come in handy from time to time when a bug creeps in. And not only for the simple bug: on a few occasions were the bugs I found in the puzzles not really bugs at all, with the real problem being something a lot more profound.

Fortunately, while the depth of the material sometimes makes the puzzle format of the book seem a bit quaint (for instance when the bug lies deep in the Java framework), the book also contains an elaborate overview/index in the back so that it’s also useful for those of us without photographic memory.

Which brings me to the final point, namely that the presentation is also quite pleasant: the font is easy to read, the writing is concise, and — most importantly — the code snippets are nicely formatted, to the point, and without clutter (to see how much impact this can have just open the horrible Thinking in Java). Due to the puzzle/solution nature of the book some blank space had to occur, but even this is nicely filled, with visual illusions.

Book: Violent Python

Since I’ve been refreshing my practical computer science knowledge recently, including picking up the Python language and getting dirty with the practical aspects of computer security, I thought that the Violent Python book might be the perfect match. It’s definitely not a bad book, but it will probably get a nice little place on the basement bookshelf.

It is hardly scientific, but the fact that the book contains full terminal dumps from installing packages and downloading files using wget is usually not a good sign. I don’t know why this is done (if not to just increase the page count for higher profit) yet it makes the book harder to read since a lot has to be skipped. Of course this shouldn’t be used as the only means of judging the book, but also in this case does the correlation manifest itself.

Having said that though, the book also contains a lot of interesting material, most noticeably a good survey of the extension module available for Python as well as interesting case studies of previous exploits. Although for the latter, it is no match for Hacking Exposed which contain a lot more concentrated.

What would I have liked to see in the book? Well I suppose its subscript “a cookbook for hackers, forensic analysts, penetration testers, and security engineers” is actually rather fitting, except for the latter: as someone who’s primarily interested in building security systems using Python, it helped me understand more about how to break into them but very little about how to build them.

Kalami: Evaluation

It’s been a while now since the last post in my little series on Kalami, but I’ve finally polished the code for evaluating programmes as I wanted to. Overall, it’s not that different from the wellformed code of the previous post; in fact, the most challenging aspect is how to enumerate all guesses.

As earlier, we’re dealing with statements, conditions, and expressions. The first one is the tricky one, so to start off simple I’ll first look at the other two. The full source code is available in the evalwoa branch on GitHub.

 

Assuming bindings in environment env for all identifiers in an expression expr, it may be evaluated straight-forwardly by the following recursive function eval_expr:

let rec eval_expr env expr =
    match expr with

        ExprNumber(n) ->
            n

        | ExprVariable(id) ->
            List.assoc id env

        | ExprProduct(expr1, expr2) ->
            let val1 = eval_expr env expr1 in
            let val2 = eval_expr env expr2 in
            val1 * val2

        | ExprDivision(expr1, expr2) ->
            let val1 = eval_expr env expr1 in
            let val2 = eval_expr env expr2 in
            val1 / val2

        | ExprSum(expr1, expr2) ->
            let val1 = eval_expr env expr1 in
            let val2 = eval_expr env expr2 in
            val1 + val2

        | ExprSubtraction(expr1, expr2) ->
            let val1 = eval_expr env expr1 in
            let val2 = eval_expr env expr2 in
            val1 - val2

        | ExprPower(expr1, expr2) ->
            let val1 = eval_expr env expr1 in
            let val2 = eval_expr env expr2 in
            power val1 val2

Note first that since env is just a list of pairs binding identifiers to integers

let env = [ (id1, n1); (id2, n2); ... (idk, nk) ];;

we may extract the latter using built-in function List.assoc as in case ExprVariable. Next, note that there’s no built-in function for integer exponentiation, hence the power function used in case ExprPower is implemented in evaluation.ml as a tail recursive repeated squaring (initially it was using the naive approach, but since I did do my PhD in crypto, this was one of the small changes needed; see separate file power.ml for the various implementations).

 

Evaluating condition cnd is similar, with the small difference that for one reason or another (purist perhaps) I used the prefix notation for function invocation here:

let rec eval_cnd env cnd =
    match cnd with

        CndTrue ->
            true

        | CndFalse ->
            false

        | CndEqual(expr1, expr2) ->
            let val1 = eval_expr env expr1 in
            let val2 = eval_expr env expr2 in
            (=) val1 val2

        | CndLessthan(expr1, expr2) ->
            let val1 = eval_expr env expr1 in
            let val2 = eval_expr env expr2 in
            (<) val1 val2

        | CndNegation(cnd) ->
            let val1 = eval_cnd env cnd in
            (not) val1

        | CndOr(cnd1, cnd2) ->
            let val1 = eval_cnd env cnd1 in
            let val2 = eval_cnd env cnd2 in
            (||) val1 val2

        | CndAnd(cnd1, cnd2) ->
            let val1 = eval_cnd env cnd1 in
            let val2 = eval_cnd env cnd2 in
            (&&) val1 val2

 

For statement stmt it finally gets a bit more interesting. Cases StmtLet and StmtIf are straight-forward, as are StmtAccept and StmtReject which respectively return the current environment and None. However, for a guessing statement it’s a bit more involved. If the guess has an upper bound then we may just try all possibilities as in case StmtGuessLowerUpperBound below. But this will of course not work when no upper bound is specified since an inner guess will prevent outer guesses from ever increasing. As an example, consider programme:

guess x in
guess y in
if x == y+1 then accept
otherwise reject

which has solution [ ("x", 1); ("y", 0) ], yet if we first let x=0 and then use the strategy of enumerating all values for y then we’ll never make it back to incrementing x.

Instead, for a programme with n (unbounded) guesses, i.e. guesses without an upper bound, the simple strategy used here is to enumerate all n-arity integer tuples in an outer loop, and for each evaluate the main statement with a second environment guesses linking the tuples with the identifiers of the (unbounded) guesses. Statements may then be evaluated as follows, using an initial empty environment env for the main statement:

let rec eval_stmt env guesses stmt =
    match stmt with

        StmtLet(id, expr, stmt) ->
            let expr_value = eval_expr env expr in
            let env' = (id, expr_value) :: env in
            eval_stmt env' guesses stmt

        | StmtIf(cnd, stmt_true, stmt_false) ->
            let cnd_value = eval_cnd env cnd in
            if cnd_value then
                eval_stmt env guesses stmt_true
            else
                eval_stmt env guesses stmt_false

        | StmtGuess(id, stmt) ->
            let guess = List.assoc id guesses in
            let env' = (id, guess) :: env in
            eval_stmt env' guesses stmt

        | StmtGuessLowerBound(id, lower_bound_expr, stmt) ->
            let lower_bound = eval_expr env lower_bound_expr in
            let guess = List.assoc id guesses in
            if guess < lower_bound then
                  None
             else
                 let env' = (id, guess) :: env in
                 eval_stmt env' guesses stmt

        | StmtGuessLowerUpperBound(id, lb_expr, ub_expr, stmt) ->
            let lower_bound = eval_expr env lb_expr in
            let upper_bound = eval_expr env ub_expr in
            let rec helper guess =
                let env' = (id, guess) :: env in
                let result = eval_stmt env' guesses stmt in
                match result with
                    None ->
                        if guess < upper_bound then
                            helper (guess + 1)
                        else
                            None
                    | Some(_) ->
                        result
            in
            helper lower_bound

        | StmtAccept ->
            Some(env)

        | StmtReject ->
            None

where we see that in cases StmtGuess and StmtGuessLowerBound the guess performed in the outer enumeration is simply added to the environment.

The only thing missing is hence the outer enumeration. First we have a simple function collecting identifiers:

let rec get_unbounded_guesses stmt =
    match stmt with

        StmtLet(_, _, stmt) ->
            get_unbounded_guesses stmt

        | StmtIf(_, stmt_true, stmt_false) ->
            (get_unbounded_guesses stmt_true) 
            @ (get_unbounded_guesses stmt_false)

        | StmtGuess(id, stmt) ->
            id :: (get_unbounded_guesses stmt)

        | StmtGuessLowerBound(id, _, stmt) ->
            id :: (get_unbounded_guesses stmt)

        | StmtGuessLowerUpperBound(_, _, _, stmt) ->
            get_unbounded_guesses stmt

        | StmtAccept ->
            []

        | StmtReject ->
            []

Next is a projection function project arity number performing a one-to-one mapping of an integer into a integer tuple with the given arity (see also projection.ml). The code for this gets a bit messy so I won’t give it here, yet the main principle is to partition the digits in number such that for instance:

# project 3 123456;;
- : (int * int) list = [(3, 14); (2, 25); (1, 36)]

It could definitely be done in other ways (as well as optimised), but I vaguely remember going with this simply because of it keeping the projected numbers slightly “balanced”.

Finally, we put this together to obtain:

let eval stmt =
    let unb_guess_ids = get_unbounded_guesses stmt in
    if (List.length unb_guess_ids) > 0 then
        let projector = project (List.length unb_guess_ids) in
        let rec helper number = 
            let projection = projector number in
            let guesses = List.combine unb_guess_ids projection in
            let result = eval_stmt [] guesses stmt in
            match result with
                None ->
                    helper (number + 1)
                | Some(_) ->
                    result
        in
        helper 0
    else
        eval_stmt [] [] stmt

that will indeed find a solution to the example programme from above!

 

While this is now a working interpreter, it does have its shortcomings when used without any kind of static analysis thrown in. For example, on programme:

guess x in
reject

the interpreter will enter an infinite loop, despite the fact that the programme obviously doesn’t have any solutions. Fixing this will be the topic of the next post.

Kalami: Working with the Syntax Tree

While the grammar from the previous post on parsing puts some restrictions on the form of a valid Kalami programme, it doesn’t reject all invalid programmes. In particular, it doesn’t ensure that all variables are bound (declared) before use. Defining a simple type system to check for this is straight forward, but will also serve to illustrate how easy it is to work with an abstract syntax tree in Caml.

To quickly illustrate the problem, consider programmes such as:

if x == 5 then accept
otherwise reject

which are meaningless since x is unbound. Now, since I don’t know any good way to present the rules of the type system here on the blog in the usual visual style of inference rules (see the appendix in my master’s thesis for instance), I’m just going to give the code implementing them. Fortunately though, Caml allows us to express such rules so concisely that not much is lost this way anyway — a legacy of ML (its distant father) being a language for formulating logical rules for theorem provers!

Starting with expressions, we assume that a list definedvars of variables have already been bound (by outer statements) and by recursion on the structure of expression expr check that all variables it mentions are in the list. The code looks as follows:

let rec wellformed_expr definedvars expr =
    match expr with

        ExprNumber(n) ->
            true

        | ExprVariable(id) ->
            List.mem id definedvars

        | ExprProduct(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | ExprDivision(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | ExprSum(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | ExprSubtraction(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | ExprPower(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

with the most interesting case being for ExprVariable where a call to standard library function List.mem checks the id is a member of the list. Checking a condition is similar, again assuming a list of variables that have already been bound; notice that it calls the above function for cases CndEqual and CndLessthan:

let rec wellformed_cnd definedvars cnd =
    match cnd with

        CndTrue ->
            true

        | CndFalse ->
            true

        | CndEqual(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | CndLessthan(expr1, expr2) ->
            (wellformed_expr definedvars expr1) &&
            (wellformed_expr definedvars expr2)

        | CndNegation(cnd) ->
            wellformed_cnd definedvars cnd

        | CndOr(cnd1, cnd2) ->
            (wellformed_cnd definedvars cnd1) &&
            (wellformed_cnd definedvars cnd2)

        | CndAnd(cnd1, cnd2) ->
            (wellformed_cnd definedvars cnd1) &&
            (wellformed_cnd definedvars cnd2)

Now, for statements it finally gets slightly more interesting. The function still takes a list definedvars as input since it calls itself recursively, and hence some variables may be bound by outer statements. But it now also extends this list when a let– or guess-statement is encountered; in these cases it also ensures that no variable is re-bound:

let rec wellformed_stmt definedvars stmt =
    match stmt with

        StmtLet(id, expr, stmt) ->
            (not (List.mem id definedvars)) &&
            (wellformed_expr definedvars expr) && 
            (wellformed_stmt (id::definedvars) stmt)

        | StmtIf(cnd, stmt_true, stmt_false) ->
            (wellformed_cnd definedvars cnd) &&
            (wellformed_stmt definedvars stmt_true) &&
            (wellformed_stmt definedvars stmt_false)

        | StmtGuess(id, stmt) ->
            (not (List.mem id definedvars)) &&
            (wellformed_stmt (id::definedvars) stmt)

        | StmtGuessLowerBound(id, _, stmt) ->
            (not (List.mem id definedvars)) &&
            (wellformed_stmt (id::definedvars) stmt)

        | StmtGuessLowerUpperBound(id, _, _, stmt) ->
            (not (List.mem id definedvars)) &&
            (wellformed_stmt (id::definedvars) stmt)

        | StmtAccept ->
            true

        | StmtReject ->
            true

The above functions are all that is needed to perform our validation check: simply invoke wellformed_stmt on the main programme statement with an empty list, and check that it returns true. We may wrap this in function:

let wellformed stmt =
    wellformed_stmt [] stmt

in order to hide some of the implementation details.