Skip to content

Implementing ML like language

Alex edited this page May 22, 2018 · 12 revisions

Simon Peyton Jones's book The Implementation of Functional Programming Languages is worth reading, it discusses implementing an ML-like programming language, compiling it to lambda calculus, then discusses optimizations. CAML Light realizes aspects of the book, compiling to the Categorical Abstract Machine (CAM). It might be worth while to look at this in greater detail.

Algebraic Data Types

How it seems to work under the hood is:

data Tree = Empty
          | Leaf Int
          | Node Tree Tree

depth :: Tree -> Int
depth Empty = 0
depth (Leaf n) = 1
depth (Node l r) = 1 + max (depth l) (depth r)

Loosely speaking, this is "the same" as the following code:

typedef struct tree tree;

enum {Tree_Empty, Tree_Leaf, Tree_Node} Tree_constructor;

typedef struct {} Empty;
typedef struct { int leaf; } Leaf;
typedef struct { tree *t1; tree *t2; } Node;

struct Tree {
    enum Tree_constructor constructor;
    union {
        Empty *empty;
        Leaf *leaf;
        Node *node;
    }
};

int depth(struct Tree *tree) {
    switch(tree->constructor) {
        case Tree_Empty: return 0;
        case Tree_Leaf:  return 1;
        case Tree_Node:  return 1 + max(depth(tree->t1), depth(tree->t2));
    }
}

Or something like that. Also see:

Abstract Machines

Apparently, the approach many languages take is to implement an "abstract machine" like an extension of the SECD, and compile down to that. GHC's abstract machine is the STG — spineless Tagless G-Machine.

Most (at least, all I have seen) are stack machines. At least, the SECD, the Categorical Abstract Machine, the Three Instruction Machine, and (I think) the G-Machine are all stack machines.

Stack Machine

Categorical Abstract Machine

The Categorical abstract machine is a generic stack machine, with a register storing the most recently popped value. It is well suited for strict/eager languages.

Three Instruction Machine (TIM)

  • Jon Fairbairn, Stuart Wray, "Tim: A Simple, Lazy Abstract Machine to Execute Supercombinators". Eprint

G-Machine

The Chalmers group implemented a lazy ML compiler using a G-Machine as the target abstract machine.

Haskell's Virtual Machine: The STG

In its bare essentials, the STG machine consists of three parts:

  1. The STG registers:
    • There are rather a lot of registers here: more than can be practicably stored in actual available processor registers on most architectures.
    • To deal with the lack of processor registers, most of the STG registers are actually kept on the stack in a block of memory pointed to by a special STG register called the "base register" (or BaseReg). To get or set values of registers which are not kept in processor registers, the STG machine generates an instruction to load or store from an address relative to the BaseReg.
    • The most important four registers are the BaseReg, the stack pointer (Sp), the heap pointer (Hp), and the general purpose register R1 which is used for intermediate values, as well as for returning evaluated values when unwinding the stack. These are the four registers which are assigned actual processor registers when implementing the STG machine on x86.
  2. The STG stack:
    • Stores function arguments and continuations (i.e. the stack frames which are executed when a function returns)
    • Grows downwards in memory
    • The top of the stack is pointed to by the STG register Sp, and the maximum available stack pointer is stored in SpLim. There is no frame pointer.
  3. The heap:
    • Used to store many different sorts of heap object: notably functions, thunks and data constructors
    • Grows upwards in memory, towards the stack
    • All allocation occurs using a bump-allocator: the heap pointer is simply incremented by the number of bytes desired (subject to to a check that this does not exhaust available memory). The garbage collector is responsible for moving objects out of the area of the heap managed by the bump allocator and into the care of its generational collector.
    • The last address in the bump-allocated part of the heap that has been used is pointed to by the STG register Hp, with HpLim holding the maximum address available for bump-allocation.

This actually poses a challenge when using the LLVM as a backend for the GHC (from blogpost):

GHC defines an abstract machine that implements the execution model for Haskell code. This is called the 'STG Machine' (or Spineless Tagless G-Machine) and its job is to evaluate the final, functional IR used by GHC, STG. The STG machine consists of three main parts, registers, a stack and a heap. For this stack, GHC doesn't use the standard C stack but implements its own. What we are concerned with though is just how the registers are implemented. The easiest method is to just store them all in memory as a structure and indeed GHC supports this method (its refereed to as 'unregistered mode' and is used for easier porting of GHC). However because of how often they are accessed a far more efficient way to implement them is to map them onto real hardware registers, which roughly halves the runtime of a typical Haskell program. So this is what GHC does, although as there are far too many STG machine registers to map onto real registers, it has to still store some of them in memory.

This is a problem for the LLVM backend though as it has no control over the register allocation. We can still create a working backend by only supporting 'unregistered mode' but this isn't very useful due to the poor performance. Also we aren't just focused on performance, compatibility with the other backends is a major concern. We need to support the same register mapping as they use so that Haskell code compiled by LLVM will be able to link with code compiled by the other backends. Lets look quickly at how the other backends achieve this register mapping.

With the native code generator its very straight forward since it has full control over the register allocation. How about the C backend though? Typically this would be a problem for C as well since it offers no control over register allocation. Thankfully GCC offers an extension, 'Global Register Variables', which allows you to assign a global variable to always reside in a specific hardware register. GCC implements this feature basically by removing the register specified from the list of known registers that its register allocator uses.

So the solution for LLVM is a new calling convention but how does this work? Well the calling convention passes arguments in the hardware registers that GHC expects to find the STG machine registers in. So on entry to any function they're in the correct place. Unlike with the NCG or C backend this doesn't exclusively reserve the registers, so in the middle of a function we can't guarantee that the STG machine registers will still be in the hardware registers, they may have been spilled to the stack. This is fine, in fact its an improvement. It allows LLVM to generate the most efficient register allocation, having more registers and flexibility than the other backends, while still maintaining compatibility with them since on any function entry the STG machine registers are guaranteed to be in the correct hardware registers.

Reading List

Some random links on stuff related to Haskell GHC-implementation-details.

Apparently there is a core component to GHC, which is some flavor of typed lambda calculus (System FC, a proper superset of system F). See:

A random interesting read to think about is the LLVM Language Reference.

For some toy models:

Eval/apply vs push/enter has been the dichotomy for semantics of eager and lazy languages (respectively), with graph reduction models as an optimized lazy approach to evaluation.

STG Expressions

An GenStgExpr is defined in StgSyn.hs

{-
************************************************************************
*                                                                      *
\subsection{STG expressions}
*                                                                      *
************************************************************************

The @GenStgExpr@ data type is parameterised on binder and occurrence
info, as before.

************************************************************************
*                                                                      *
\subsubsection{@GenStgExpr@ application}
*                                                                      *
************************************************************************

An application is of a function to a list of atoms [not expressions].
Operationally, we want to push the arguments on the stack and call the
function. (If the arguments were expressions, we would have to build
their closures first.)

There is no constructor for a lone variable; it would appear as
@StgApp var []@.
-}

data GenStgExpr bndr occ
  = StgApp
        occ             -- function
        [GenStgArg occ] -- arguments; may be empty

{-
************************************************************************
*                                                                      *
\subsubsection{@StgConApp@ and @StgPrimApp@---saturated applications}
*                                                                      *
************************************************************************

There are specialised forms of application, for constructors,
primitives, and literals.
-}

  | StgLit      Literal

        -- StgConApp is vital for returning unboxed tuples or sums
        -- which can't be let-bound first
  | StgConApp   DataCon
                [GenStgArg occ] -- Saturated
                [Type]          -- See Note [Types in StgConApp] in UnariseStg

  | StgOpApp    StgOp           -- Primitive op or foreign call
                [GenStgArg occ] -- Saturated.
                Type            -- Result type
                                -- We need to know this so that we can
                                -- assign result registers

{-
************************************************************************
*                                                                      *
\subsubsection{@StgLam@}
*                                                                      *
************************************************************************

StgLam is used *only* during CoreToStg's work. Before CoreToStg has
finished it encodes (\x -> e) as (let f = \x -> e in f)
-}

  | StgLam
        (NonEmpty bndr)
        StgExpr    -- Body of lambda

{-
************************************************************************
*                                                                      *
\subsubsection{@GenStgExpr@: case-expressions}
*                                                                      *
************************************************************************

This has the same boxed/unboxed business as Core case expressions.
-}

  | StgCase
        (GenStgExpr bndr occ)
                    -- the thing to examine

        bndr        -- binds the result of evaluating the scrutinee

        AltType

        [GenStgAlt bndr occ]
                    -- The DEFAULT case is always *first*
                    -- if it is there at all

{-
************************************************************************
*                                                                      *
\subsubsection{@GenStgExpr@: @let(rec)@-expressions}
*                                                                      *
************************************************************************

The various forms of let(rec)-expression encode most of the
interesting things we want to do.
\begin{enumerate}
\item
\begin{verbatim}
let-closure x = [free-vars] [args] expr
in e
\end{verbatim}
is equivalent to
\begin{verbatim}
let x = (\free-vars -> \args -> expr) free-vars
\end{verbatim}
\tr{args} may be empty (and is for most closures).  It isn't under
circumstances like this:
\begin{verbatim}
let x = (\y -> y+z)
\end{verbatim}
This gets mangled to
\begin{verbatim}
let-closure x = [z] [y] (y+z)
\end{verbatim}
The idea is that we compile code for @(y+z)@ in an environment in which
@z@ is bound to an offset from \tr{Node}, and @y@ is bound to an
offset from the stack pointer.

(A let-closure is an @StgLet@ with a @StgRhsClosure@ RHS.)

\item
\begin{verbatim}
let-constructor x = Constructor [args]
in e
\end{verbatim}

(A let-constructor is an @StgLet@ with a @StgRhsCon@ RHS.)

\item
Letrec-expressions are essentially the same deal as
let-closure/let-constructor, so we use a common structure and
distinguish between them with an @is_recursive@ boolean flag.

\item
\begin{verbatim}
let-unboxed u = an arbitrary arithmetic expression in unboxed values
in e
\end{verbatim}
All the stuff on the RHS must be fully evaluated.
No function calls either!

(We've backed away from this toward case-expressions with
suitably-magical alts ...)

\item
~[Advanced stuff here! Not to start with, but makes pattern matching
generate more efficient code.]

\begin{verbatim}
let-escapes-not fail = expr
in e'
\end{verbatim}
Here the idea is that @e'@ guarantees not to put @fail@ in a data structure,
or pass it to another function. All @e'@ will ever do is tail-call @fail@.
Rather than build a closure for @fail@, all we need do is to record the stack
level at the moment of the @let-escapes-not@; then entering @fail@ is just
a matter of adjusting the stack pointer back down to that point and entering
the code for it.

Another example:
\begin{verbatim}
f x y = let z = huge-expression in
        if y==1 then z else
        if y==2 then z else
        1
\end{verbatim}

(A let-escapes-not is an @StgLetNoEscape@.)

\item
We may eventually want:
\begin{verbatim}
let-literal x = Literal
in e
\end{verbatim}
\end{enumerate}

And so the code for let(rec)-things:
-}

  | StgLet
        (GenStgBinding bndr occ)    -- right hand sides (see below)
        (GenStgExpr bndr occ)       -- body

  | StgLetNoEscape
        (GenStgBinding bndr occ)    -- right hand sides (see below)
        (GenStgExpr bndr occ)       -- body

{-
%************************************************************************
%*                                                                      *
\subsubsection{@GenStgExpr@: @hpc@, @scc@ and other debug annotations}
%*                                                                      *
%************************************************************************

Finally for @hpc@ expressions we introduce a new STG construct.
-}

  | StgTick
    (Tickish bndr)
    (GenStgExpr bndr occ)       -- sub expression

-- END of GenStgExpr

Gofer's Machine

Gofer has a rather curious abstract machine worth studying. See section 9 of Jones's writeup. It is more pantheistic than the GHC, borrowing from the three instruction machine, the G-machine, among others.

C is the Lingua Franca

OCaml is implemented in C, SML/NJ and (I think) Moscow Standard ML uses C, Haskell (GHC, Gofer/HUGS) uses C. The only language which doesn't use C is F#, and that's because C# is Microsoft's cash cow...or so I can infer/guess, the early history of F# is shrouded in mystery to me because it's not freely available nor is it openly discussed, and now F# is self-bootstrapped. Mark Jones justifies using C for implementing Gofer (section 3.1):

One of the first decisions to be made was the choice of implementation language. Given that Gofer was intended to run on a wide variety of machines, including small PCs with limited memory, the C language was an obvious candidate. Even so, the decision to use C was not easy; we believe that the development of Gofer would have been both easier, and less error-prone, had it been written in a strongly typed functional language which it accepts. This approach is often used by designers as a means of testing, and of demonstrating confidence, in the use of a new language for program development. For example, Chalmers LML [4], Standard ML of New Jersey [3] and Glasgow Haskell [12] are all able to compile their own source code. However, each of these systems requires substantial machine resources. In addition, we were also concerned about bootstrapping; each of the systems above relies on a compiler to achieve reasonable performance, while Gofer was originally conceived as an interpreter. Nevertheless, even though Gofer was written in an imperative language, it also shows strong influences from functional programming.

For what it's worth, Gofer implements a fairly decent Haskell environment in under 20k lines of C.

Good C style, well, we can examine xv6 which has the following characteristics:

  • struct is usually defined in headers, sometimes (when used "locally") in .c files
  • the only typedefs used are aliasing basic types (and 2 isolated typedef in umalloc.c, and 1 in mmu.h), all struct and enum types are still used as struct mytype x (as opposed to defining typedef struct mytype mytype)
  • K&R indentation is used, and two spaces are the indentation length
  • enum values are uppercased, macro constants are uppercased, everything else is lowercased
  • there's an average of 132 lines in any given .c file
  • seldom are functions longer than 40 lines

(Curiously, if we were to print out xv6 source code --- which, including whitespace, is 8273 lines of code --- the xv6.pdf is 91 pages; Hugs has 52674 lines of code in 61 .c files, including whitespace, which would produce something like 546 pages, assuming the number of pages is linear in the number of lines of code. Gofer, in contrast, has 25 .c files consisting of a total of 25147 lines of code, including whitespace. This isn't far off, Hugs is 429 pages, when put on letter paper landscaped.)

Clone this wiki locally