The Objective Caml system release 1.02 Documentation and user's manual Xavier Leroy September 27, 1996 Copyright oc 1996 Institut National de Recherche en Informatique et Automatique Contents I An introduction to Objective Caml 6 1 The core language 7 2 Objects in Caml 19 3 The module system 28 II The Objective Caml language 36 4 The Objective Caml language 37 4.1 Lexical conventions . . . . . . . . . . . . . . . . . . . . . . . 37 4.2 Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.3 Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.4 Type expressions. . . . . . . . . . . . . . . . . . . . . . . . . 43 4.5 Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.6 Patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.7 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.8 Type and exception definitions. . . . . . . . . . . . . . . . . . 55 4.9 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.10 Module types (module specifications). . . . . . . . . . . . . . . 59 4.11 Module expressions (module implementations) . . . . . . . . . . . 62 4.12 Compilation units . . . . . . . . . . . . . . . . . . . . . . . . 65 5 Language extensions 66 5.1 Streams and stream parsers. . . . . . . . . . . . . . . . . . . . 66 5.2 Range patterns. . . . . . . . . . . . . . . . . . . . . . . . . . 67 III The Objective Caml tools 68 6 Batch compilation (ocamlc) 69 6.1 Overview of the compiler. . . . . . . . . . . . . . . . . . . . . 69 6.2 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.3 Modules and the file system . . . . . . . . . . . . . . . . . . . 72 6.4 Common errors . . . . . . . . . . . . . . . . . . . . . . . . . . 72 7 The toplevel system (ocaml) 76 7.1 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 7.2 Toplevel directives . . . . . . . . . . . . . . . . . . . . . . . 78 7.3 The toplevel and the module system. . . . . . . . . . . . . . . . 79 7.4 Common errors . . . . . . . . . . . . . . . . . . . . . . . . . . 79 7.5 Building custom toplevel systems: ocamlmktop . . . . . . . . . . 80 7.6 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 8 The runtime system (ocamlrun) 82 8.1 Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 8.2 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 1 2 8.3 Common errors . . . . . . . . . . . . . . . . . . . . . . . . . . 83 9 Native-code compilation (ocamlopt) 85 9.1 Overview of the compiler. . . . . . . . . . . . . . . . . . . . . 85 9.2 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 9.3 Common errors . . . . . . . . . . . . . . . . . . . . . . . . . . 87 9.4 Compatibility with the bytecode compiler. . . . . . . . . . . . . 87 10 Lexer and parser generators (ocamllex, ocamlyacc) 89 10.1 Overview of ocamllex. . . . . . . . . . . . . . . . . . . . . . . 89 10.2 Syntax of lexer definitions . . . . . . . . . . . . . . . . . . . 89 10.3 Overview of ocamlyacc . . . . . . . . . . . . . . . . . . . . . . 91 10.4 Syntax of grammar definitions . . . . . . . . . . . . . . . . . . 92 10.5 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 10.6 A complete example. . . . . . . . . . . . . . . . . . . . . . . . 94 11 Dependency generator (ocamldep) 96 11.1 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 11.2 A typical Makefile. . . . . . . . . . . . . . . . . . . . . . . . 96 12 Profiling (ocamlprof) 98 12.1 Compiling for profiling . . . . . . . . . . . . . . . . . . . . . 98 12.2 Profiling an execution. . . . . . . . . . . . . . . . . . . . . . 98 12.3 Printing profiling information. . . . . . . . . . . . . . . . . . 99 12.4 Time profiling. . . . . . . . . . . . . . . . . . . . . . . . . . 99 13 Interfacing C with Objective Caml 100 13.1 Overview and compilation information. . . . . . . . . . . . . . . 100 13.2 The value type. . . . . . . . . . . . . . . . . . . . . . . . . . 103 13.3 Representation of Caml Light data types . . . . . . . . . . . . . 104 13.4 Operations on values. . . . . . . . . . . . . . . . . . . . . . . 105 13.5 Living in harmony with the garbage collector. . . . . . . . . . . 108 13.6 A complete example. . . . . . . . . . . . . . . . . . . . . . . . 110 IV The Objective Caml library 113 14 The core library 114 14.1 Module Pervasives: the initially opened module . . . . . . . . . 114 15 The standard library 128 15.1 Module Arg: parsing of command line arguments. . . . . . . . . . 129 15.2 Module Array: array operations . . . . . . . . . . . . . . . . . 130 15.3 Module Char: character operations. . . . . . . . . . . . . . . . 131 15.4 Module Digest: MD5 message digest. . . . . . . . . . . . . . . . 132 15.5 Module Filename: operations on file names. . . . . . . . . . . . 132 15.6 Module Format: pretty printing . . . . . . . . . . . . . . . . . 133 15.7 Module Gc: memory management control and statistics. . . . . . . 139 15.8 Module Genlex: a generic lexical analyzer. . . . . . . . . . . . 140 15.9 Module Hashtbl: hash tables and hash functions . . . . . . . . . 141 15.10 Module Lexing: the run-time library for lexers generated by camllex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 15.11Module List: list operations . . . . . . . . . . . . . . . . . . 144 15.12Module Map: association tables over ordered types. . . . . . . . 146 15.13Module Oo: object-oriented extension . . . . . . . . . . . . . . 148 15.14Module Parsing: the run-time library for parsers generated by camlyacc. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 15.15Module Printexc: a catch-all exception handler . . . . . . . . . 148 15.16Module Printf: formatting printing functions . . . . . . . . . . 149 3 15.17Module Queue: first-in first-out queues. . . . . . . . . . . . . 150 15.18Module Random: pseudo-random number generator. . . . . . . . . . 150 15.19Module Set: sets over ordered types. . . . . . . . . . . . . . . 151 15.20Module Sort: sorting and merging lists . . . . . . . . . . . . . 153 15.21Module Stack: last-in first-out stacks . . . . . . . . . . . . . 153 15.22Module Stream: streams and stream parsers operations . . . . . . 154 15.23Module String: string operations . . . . . . . . . . . . . . . . 155 15.24Module Sys: system interface . . . . . . . . . . . . . . . . . . 156 16 The unix library: Unix system calls 159 16.1 Module Unix: interface to the Unix system. . . . . . . . . . . . 160 17 The num library: arbitrary-precision rational arithmetic 180 17.1 Module Num: operation on arbitrary-precision numbers . . . . . . 181 17.2 Module Arith_status: flags that control rational arithmetic. . . 183 18 The str library: regular expressions and string processing 185 18.1 Module Str: regular expressions and high-level string processing 185 19 The threads library 189 19.1 Module Thread: user-level lightweight threads. . . . . . . . . . 189 19.2 Module Mutex: locks for mutual exclusion . . . . . . . . . . . . 191 19.3 Module Condition: condition variables to synchronize between threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 19.4 Module Event: first-class synchronous communication. . . . . . . 192 19.5 Module ThreadUnix: thread-compatible system calls. . . . . . . . 193 20 The graphics library 196 20.1 Module Graphics: machine-independent graphics primitives . . . . 197 21 The dbm library: access to NDBM databases 203 21.1 Module Dbm: interface to the NDBM databases. . . . . . . . . . . 203 V Appendix 205 Index to the library 206 Index of keywords 213 Foreword This manual documents the release 1.02 of the Objective Caml system. It is organized as follows. - Part I, ``An introduction to Objective Caml'', gives an overview of the language. - Part II, ``The Objective Caml language'', is the reference description of the language. - Part III, ``The Objective Caml tools, documents the compilers, toplevel system, and programming utilities. - Part IV, ``The Objective Caml library'', describes the modules provided in the standard library. - Part V, ``Appendix'', contains an index of all identifiers defined in the standard library, and an index of keywords. Conventions Objective Caml runs on several operating systems. The parts of this manual that are specific to one operating system are presented as shown below: Unix: This is material specific to Unix. Windows: This is material specific to MS Windows (NT and 95). License c The Objective Caml system is copyright o 1996 Institut National de Recherche en Informatique et en Automatique (INRIA). INRIA holds all ownership rights to the Objective Caml system. See the file LICENSE in the distribution for licensing information. The Objective Caml system can be freely copied, but not sold. More precisely, INRIA grants any user of the Objective Caml system the right to reproduce it, provided that the copies are distributed free of charge and under the conditions given in the LICENSE file. The present documentation is distributed under the same conditions. Availability The complete Objective Caml distribution resides on the machine ftp.inria.fr. The distribution files can be transferred by anonymous FTP: 4 5 Host: ftp.inria.fr (Internet address 192.93.2.54) Login name: anonymous Password: your e-mail address Directory: lang/caml-light Files: see the index in file README More information on the Caml family of languages is also available on the World Wide Web, http://pauillac.inria.fr/caml/. Part I An introduction to Objective Caml 6 Chapter 1 The core language This part of the manual is a tutorial introduction to the Objective Caml language. A good familiarity with programming in a conventional languages (say, Pascal or C) is assumed, but no prior exposure to functional languages is required. The present chapter introduces the core language. Chapter 2 deals with the object-oriented features, and chapter 3 with the module system. Basics For this overview of Caml, we use the interactive system, which is started by running ocaml from the Unix shell, or by launching the OCamlwin.exe application under Windows. This tutorial is presented as the transcript of a session with the interactive system: lines starting with # represent user input; the system responses are printed below, without a leading #. Under the interactive system, the user types Caml phrases, terminated by ;;, in response to the # prompt, and the system compiles them on the fly, executes them, and prints the outcome of evaluation. Phrases are either simple expressions, or let definitions of identifiers (either values or functions). #1+2*3;; - : int = 7 #let pi = 4.0 *. atan 1.0;; val pi : float = 3.14159265359 #let square x = x *. x;; val square : float -> float = #square(sin pi) +. square(cos pi);; - : float = 1 The Caml system computes both the value and the type for each phrase. Even function parameters need no explicit type declaration: the system infers their types from their usage in the function. Notice also that integers and floating-point numbers are distinct types, with distinct operators: + and * operate on integers, but +. and *. operate on floats. #1.0 * 2;; Characters 0-3: This expression has type float but is here used with type int Recursive functions are defined with the let rec binding: #let rec fib n = 7 Chapter 1. The core language 8 # if n < 2 then 1 else fib(n-1) + fib(n-2);; val fib : int -> int = #fib 10;; - : int = 89 Data types In addition to integers and floating-point numbers, Caml offers the usual basic data types: booleans, characters, and character strings. #(1 < 2) = false;; - : bool = false #'a';; - : char = 'a' #"Hello world";; - : string = "Hello world" Predefined data structures include tuples, arrays, and lists. General mechanisms for defining your own data structures are also provided. They will be covered in more details later; for now, we concentrate on lists. Lists are either given in extension as a bracketed list of semicolon-separated elements, or built from the empty list [] (pronounce ``nil'') by adding elements in front using the :: (``cons'') operator. #let l = ["is"; "a"; "tale"; "told"; "etc."];; val l : string list = ["is"; "a"; "tale"; "told"; "etc."] #"Life" :: l;; - : string list = ["Life"; "is"; "a"; "tale"; "told"; "etc."] As with all other Caml data structures, lists do not need to be explicitly allocated and deallocated from memory: all memory management is entirely automatic in Caml. Similarly, there is no explicit handling of pointers: the Caml compiler silently introduces pointers where necessary. As with most Caml data structures, inspecting and destructuring lists is performed by pattern-matching. List patterns have the exact same shape as list expressions, with identifier representing unspecified parts of the list. As an example, here is insertion sort on a list: #let rec sort lst = # match lst with # [] -> [] # | head :: tail -> insert head (sort tail) #and insert elt lst = # match lst with # [] -> [elt] # | head :: tail - > if elt <= head then elt :: lst else head :: insert elt tail #;; val sort : 'a list -> 'a list = val insert : 'a -> 'a list -> 'a list = #sort l;; - : string list = ["a"; "etc."; "is"; "tale"; "told"] Chapter 1. The core language 9 The type inferred for sort, 'a list -> 'a list, means that sort can actually apply to lists of any type, and returns a list of the same type. This is because the comparisons (=, <=, etc.) are polymorphic in Caml: they operate between any two values of the same type. This makes sort itself polymorphic over all list types. #sort [6;2;5;3];; - : int list = [2; 3; 5; 6] #sort [3.14; 2.718];; - : float list = [2.718; 3.14] The sort function above does not modify its input list: it builds and returns a new list containing the same elements as the input list, in ascending order. There is actually no way in Caml to modify in-place a list once it is built: we say that lists are immutable data structures. Most Caml data structures are immutable, but a few (most notably arrays) are mutable, meaning that they can be modified in-place at any time. Functions as values Caml is a functional language: functions in the full mathematical sense are supported and can be passed around freely just as any other piece of data. For instance, here is a deriv function that takes any float function as argument and returns an approximation of its derivative function: #let deriv f dx = function x -> (f(x +. dx) -. f(x)) /. dx;; val deriv : (float -> float) -> float -> float -> float = #let cos' = deriv sin 1e-6;; val cos' : float -> float = #cos' pi;; - : float = -1.00000000014 Even function composition is definable: #let compose f g = function x -> f(g(x));; val compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = #let cos2 = compose square cos;; val cos2 : float -> float = Functions that take other functions as arguments are called ``functionals'', or ``higher-order functions''. Functionals are especially useful to provide iterators or similar generic operations over a data structure. For instance, the standard Caml library provides a List.map functional that applies a given function to each element of a list, and returns the list of the results: #List.map (function n -> n * 2 + 1) [0;1;2;3;4];; - : int list = [1; 3; 5; 7; 9] This functional, along with a number of other list and array functionals, is predefined because it is often useful, but there is nothing magic with it: it can easily be defined as follows. #let rec map f l = # match l with Chapter 1. The core language 10 # [] -> [] # | hd :: tl -> f hd :: map f tl;; val map : ('a -> 'b) -> 'a list -> 'b list = Records and variants User-defined data structures include records and variants. Both are defined with the type declaration. Here, we declare a record type to represent rational numbers. #type ratio = {num: int; denum: int};; type ratio = { num: int; denum: int } #let add_ratio r1 r2 = # {num = r1.num * r2.denum + r2.num * r1.denum; # denum = r1.denum * r2.denum};; val add_ratio : ratio -> ratio -> ratio = #add_ratio {num=1; denum=3} {num=2; denum=5};; - : ratio = {num=11; denum=15} The declaration of a variant type lists all possible shapes for values of that type. Each case is identified by a name, called a constructor, which serves both for constructing values of the variant type and inspecting them by pattern-matching. Constructor names are capitalized to distinguish them from variable names (which must start with a lowercase letter). For instance, here is a variant type for doing mixed arithmetic (integers and floats): #type number = Int of int | Float of float | Error;; type number = Int of int | Float of float | Error This declaration expresses that a value of type number is either an integer, a floating-point number, or the constant Error representing the result of an invalid operation (e.g. a division by zero). Enumerated types are a special case of variant types, where all alternatives are constants: #type sign = Positive | Negative;; type sign = Positive | Negative #let sign_int n = if n >= 0 then Positive else Negative;; val sign_int : int -> sign = To define arithmetic operations for the number type, we use pattern-matching on the two numbers involved: #let add_num n1 n2 = # match (n1, n2) with # (Int i1, Int i2) -> # (* Check for overflow of integer addition *) # if sign_int i1 = sign_int i2 && sign_int(i1 + i2) <> sign_int i1 # then Float(float i1 +. float i2) # else Int(i1 + i2) # | (Int i1, Float f2) -> Float(float i1 +. f2) # | (Float f1, Int i2) -> Float(f1 +. float i2) # | (Float f1, Float f2) -> Float(f1 +. f2) # | (Error, _) -> Error Chapter 1. The core language 11 # | (_, Error) -> Error;; val add_num : number -> number -> number = #add_num (Int 123) (Float 3.14159);; - : number = Float 126.14159 The most common usage of variant types is to describe recursive data structures. Consider for example the type of binary trees: #type 'a btree = Empty | Node of 'a * 'a btree * 'a btree;; type 'a btree = Empty | Node of 'a * 'a btree * 'a btree This definition reads as follow: a binary tree containing values of type 'a (an arbitrary type) is either empty, or is a node containing one value of type 'a and two subtrees containing also values of type 'a, that is, two 'a btree. Operations on binary trees are naturally expressed as recursive functions following the same structure as the type definition itself. For instance, here are functions performing lookup and insertion in ordered binary trees (elements increase from left to right): #let rec member x btree = # match btree with # Empty -> false # | Node(y, left, right) -> # if x = y then true else # if x < y then member x left else member x right;; val member : 'a -> 'a btree -> bool = #let rec insert x btree = # match btree with # Empty -> Node(x, Empty, Empty) # | Node(y, left, right) -> # if x <= y then Node(y, insert x left, right) # else Node(y, left, insert x right);; val insert : 'a -> 'a btree -> 'a btree = Imperative features Though all examples so far were written in purely applicative style, Caml is also equipped with full imperative features. This includes the usual while and for loops, as well as mutable data structures such as arrays. Arrays are either given in extension between [| and |] brackets, or allocated and initialized with the Array.create function, then filled up later by assignments. For instance, the function below sums two vectors (represented as float arrays) componentwise. #let add_vect v1 v2 = # let len = min (Array.length v1) (Array.length v2) in # let res = Array.create len 0.0 in # for i = 0 to len - 1 do # res.(i) <- v1.(i) +. v2.(i) # done; # res;; val add_vect : float array -> float array -> float array = #add_vect [| 1.0; 2.0 |] [| 3.0; 4.0 |];; - : float array = [|4; 6|] Chapter 1. The core language 12 Record fields can also be modified by assignment, provided they are declared mutable in the definition of the record type: #type mutable_point = { mutable x: float; mutable y: float };; type mutable_point = { mutable x: float; mutable y: float } #let translate p dx dy = # p.x <- p.x +. dx; p.y <- p.y +. dy;; val translate : mutable_point -> float -> float -> unit = #let mypoint = { x = 0.0; y = 0.0 };; val mypoint : mutable_point = {x=0; y=0} #translate mypoint 1.0 2.0;; - : unit = () #mypoint;; - : mutable_point = {x=1; y=2} Caml has no built-in notion of variable -- identifiers whose current value can be changed by assignment. (The let binding is not an assignment, it introduces a new identifier with a new scope.) However, the standard library provides references, which are mutable indirection cells (or one-element arrays), with operators ! to fetch the current contents of the reference and := to assign the contents. Variables can then be emulated by let-binding a reference. For instance, here is an in-place insertion sort over arrays: #let insertion_sort a = # for i = 1 to Array.length a - 1 do # let val_i = a.(i) in # let j = ref i in # while !j > 0 && val_i < a.(!j) do # a.(!j) <- a.(!j - 1); # j := !j - 1 # done; # a.(!j) <- val_i # done;; val insertion_sort : 'a array -> unit = References are also useful to write functions that maintain a current state between two calls to the function. For instance, the following pseudo-random number generator keeps the last returned number in a reference: #let current_rand = ref 0;; val current_rand : int ref = {contents=0} #let random () = # current_rand := !current_rand * 25713 + 1345; # !current_rand;; val random : unit -> int = Again, there is nothing magic with references: they are implemented as a one-field mutable record, as follows. #type 'a ref = { mutable contents: 'a };; type 'a ref = { mutable contents: 'a } #let (!) r = r.contents;; Chapter 1. The core language 13 val ! : 'a ref -> 'a = #let (:=) r newval = r.contents <- newval;; val := : 'a ref -> 'a -> unit = Exceptions Caml provides exceptions for signalling and handling exceptional conditions. Exceptions can also be used as a general-purpose non-local control structures. Exceptions are declared with the exception construct, and signalled with the raise operator. For instance, the function below for taking the head of a list uses an exception to signal the case where an empty list is given. #exception Empty_list;; exception Empty_list #let head l = # match l with # [] -> raise Empty_list # | hd :: tl -> hd;; val head : 'a list -> 'a = #head [1;2];; - : int = 1 #head [];; Uncaught exception: Empty_list Exceptions are used throughout the standard library to signal cases where the library functions cannot complete normally. For instance, the List.assoc function, which returns the data associated with a given key in a list of (key, data) pairs, raises the predefined exception Not_found when the key does not appear in the list: #List.assoc 1 [(0, "zero"); (1, "one")];; - : string = "one" #List.assoc 2 [(0, "zero"); (1, "one")];; Uncaught exception: Not_found Exceptions can be trapped with the try...with construct: #let name_of_binary_digit digit = # try # List.assoc digit [0, "zero"; 1, "one"] # with Not_found -> # "not a binary digit";; val name_of_binary_digit : int -> string = #name_of_binary_digit 0;; - : string = "zero" #name_of_binary_digit (-1);; - : string = "not a binary digit" The with part is actually a regular pattern-matching on the exception value. Thus, several exceptions can be caught by one try...with construct. Also, Chapter 1. The core language 14 finalization can be performed by trapping all exceptions, performing the finalization, then raising again the exception: #let temporarily_set_reference ref newval funct = # let oldval = !ref in # try # ref := newval; # let res = funct () in # ref := oldval; # res # with x -> # ref := oldval; # raise x;; val temporarily_set_reference : 'a ref -> 'a -> (unit -> 'b) -> 'b = Symbolic processing of expressions We finish this introduction with a more complete example representative of the use of Caml for symbolic processing: formal manipulations of arithmetic expressions containing variables. The following variant type describes the expressions we shall manipulate: #type expression = # Const of float # | Var of string # | Sum of expression * expression (* e1 + e2 *) # | Diff of expression * expression (* e1 - e2 *) # | Prod of expression * expression (* e1 * e2 *) # | Quot of expression * expression (* e1 / e2 *) #;; type expression = Const of float | Var of string | Sum of expression * expression | Diff of expression * expression | Prod of expression * expression | Quot of expression * expression We first define a function to evaluate an expression given an environment that maps variable names to their values. For simplicity, the environment is represented as an association list. #exception Unbound_variable of string;; exception Unbound_variable of string #let rec eval env exp = # match exp with # Const c -> c # | Var v -> # (try List.assoc v env with Not_found -> raise(Unbound_variable v)) # | Sum(f, g) -> eval env f +. eval env g # | Diff(f, g) -> eval env f -. eval env g # | Prod(f, g) -> eval env f *. eval env g # | Quot(f, g) -> eval env f /. eval env g;; val eval : (string * float) list -> expression -> float = #eval [("x", 1.0); ("y", 3.14)] (Prod(Sum(Var "x", Const 2.0), Var "y"));; Chapter 1. The core language 15 - : float = 9.42 Now for a real symbolic processing, we define the derivative of an expression with respect to a variable dv: #let rec deriv exp dv = # match exp with # Const c -> Const 0.0 # | Var v -> if v = dv then Const 1.0 else Const 0.0 # | Sum(f, g) -> Sum(deriv f dv, deriv g dv) # | Diff(f, g) -> Diff(deriv f dv, deriv g dv) # | Prod(f, g) -> Sum(Prod(f, deriv g dv), Prod(deriv f dv, g)) # | Quot(f, g) -> Quot(Diff(Prod(deriv f dv, g), Prod(f, deriv g dv)), # Prod(g, g)) #;; val deriv : expression -> string -> expression = #deriv (Quot(Const 1.0, Var "x")) "x";; - : expression = Quot (Diff (Prod (Const 0, Var "x"), Prod (Const 1, Const 1)), Prod (Var "x", Var "x")) Pretty-printing and parsing As shown in the examples above, the internal representation (also called abstract syntax) of expressions quickly becomes hard to read and write as the expressions get larger. We need a printer and a parser to go back and forth between the abstract syntax and the concrete syntax, which in the case of expressions is the familiar algebraic notation (e.g. 2*x+1). For the printing function, we take into account the usual precedence rules (i.e. * binds tighter than +) to avoid printing unnecessary parentheses. To this end, we maintain the current operator precedence and print parentheses around an operator only if its precedence is less than the current precedence. #let print_expr exp = # (* Local function definitions *) # let open_paren prec op_prec = # if prec > op_prec then print_string "(" in # let close_paren prec op_prec = # if prec > op_prec then print_string ")" in # let rec print prec exp = (* prec is the current precedence *) # match exp with # Const c -> print_float c # | Var v -> print_string v # | Sum(f, g) -> # open_paren prec 0; # print 0 f; print_string " + "; print 0 g; # close_paren prec 0 # | Diff(f, g) -> # open_paren prec 0; # print 0 f; print_string " - "; print 1 g; # close_paren prec 0 # | Prod(f, g) -> # open_paren prec 2; # print 2 f; print_string " * "; print 2 g; # close_paren prec 2 Chapter 1. The core language 16 # | Quot(f, g) -> # open_paren prec 2; # print 2 f; print_string " / "; print 3 g; # close_paren prec 2 # in print 0 exp;; val print_expr : expression -> unit = #let e = Sum(Prod(Const 2.0, Var "x"), Const 1.0);; val e : expression = Sum (Prod (Const 2, Var "x"), Const 1) #print_expr e; print_newline();; 2 * x + 1 - : unit = () #print_expr (deriv e "x"); print_newline();; 2 * 1 + 0 * x + 0 - : unit = () Parsing (transforming concrete syntax into abstract syntax) is usually more delicate. Caml offers several tools to help write parsers: on the one hand, Caml versions of the lexer generator Lex and the parser generator Yacc (see chapter 10), which handle LALR(1) languages using push-down automata; on the other hand, a predefined type of streams (of characters or tokens) and pattern-matching over streams, which facilitate the writing of recursive-descent parsers for LL(1) languages. An example using ocamllex and ocamlyacc is given in chapter 10. Here, we will use stream parsers. #open Genlex;; #let lexer = make_lexer ["("; ")"; "+"; "-"; "*"; "/"];; val lexer : char Stream.t -> Genlex.token Stream.t = For the lexical analysis phase (transformation of the input text into a stream of tokens), we use a ``generic'' lexer provided in the standard library module Genlex. The make_lexer function takes a list of keywords and returns a lexing function that ``tokenizes'' an input stream of characters. Tokens are either identifiers, keywords, or literals (integer, floats, characters, strings). Whitespace and comments are skipped. #let token_stream = lexer(Stream.of_string "1.0 +x");; val token_stream : Genlex.token Stream.t = #Stream.next token_stream;; - : Genlex.token = Float 1 #Stream.next token_stream;; - : Genlex.token = Kwd "+" #Stream.next token_stream;; - : Genlex.token = Ident "x" The parser itself operates by pattern-matching on the stream of tokens. As usual with recursive descent parsers, we use several intermediate parsing functions to reflect the precedence and associativity of operators. Pattern-matching over streams is more powerful than on regular data structures, as it allows recursive calls to parsing functions inside the patterns, for matching sub-components of the input stream. See chapter 5 for more details. Chapter 1. The core language 17 #let rec parse_expr = parser # [< e1 = parse_mult; e = parse_more_adds e1 >] -> e #and parse_more_adds e1 = parser # [< 'Kwd "+"; e2 = parse_mult; e = parse_more_adds (Sum(e1, e2)) >] -> e # | [< 'Kwd "-"; e2 = parse_mult; e = parse_more_adds (Diff(e1, e2)) >] -> e # | [< >] -> e1 #and parse_mult = parser # [< e1 = parse_simple; e = parse_more_mults e1 >] -> e #and parse_more_mults e1 = parser # [< 'Kwd "*"; e2 = parse_simple; e = parse_more_mults (Prod(e1, e2)) >] - > e # | [< 'Kwd "/"; e2 = parse_simple; e = parse_more_mults (Quot(e1, e2)) >] - > e # | [< >] -> e1 #and parse_simple = parser # [< 'Ident s >] -> Var s # | [< 'Int i >] -> Const(float i) # | [< 'Float f >] -> Const f # | [< 'Kwd "("; e = parse_expr; 'Kwd ")" >] -> e;; val parse_expr : Genlex.token Stream.t -> expression = val parse_more_adds : expression -> Genlex.token Stream.t -> expression = val parse_mult : Genlex.token Stream.t -> expression = val parse_more_mults : expression -> Genlex.token Stream.t -> expression = val parse_simple : Genlex.token Stream.t -> expression = Composing the lexer and parser, we finally obtain a function to read an expression from a character string: #let read_expr s = parse_expr(lexer(Stream.of_string s));; val read_expr : string -> expression = #read_expr "2*(x+y)";; - : expression = Prod (Const 2, Sum (Var "x", Var "y")) Standalone Caml programs All examples given so far were executed under the interactive system. Caml code can also be compiled separately and executed non-interactively using the batch compilers ocamlc or ocamlopt. The source code must be put in a file with extension .ml. It consists of a sequence of phrases, which will be evaluated at runtime in their order of appearance in the source file. Unlike in interactive mode, types and values are not printed automatically; the program must call printing functions explicitly to produce some output. Here is a sample standalone program to print Fibonacci numbers: (* File fib.ml *) let rec fib n = if n < 2 then 1 else fib(n-1) + fib(n-2);; let main () = let arg = int_of_string Sys.argv.(1) in print_int(fib arg); print_newline(); exit 0;; main ();; Chapter 1. The core language 18 Sys.argv is an array of strings containing the command-line parameters. Sys.argv.(1) is thus the first command-line parameter. The program above is compiled and executed with the following shell commands: $ ocamlc -o fib fib.ml $ ./fib 10 89 $ ./fib 20 10946 Chapter 2 Objects in Caml This chapter gives an overview of the object-oriented features of Objective Caml. Classes and objects The class point has one instance variable x and two methods get_x and move. The initial value of the instance variable is given here by the class parameter x_init. The variable x is declared mutable, so the method move can change its value. #class point x_init = # val mutable x = x_init # method get_x = x # method move d = x <- x + d #end;; class point (int) = val mutable x : int method get_x : int method move : int -> unit end We now create a new point p, giving the initialization argument 7. #let p = new point 7;; val p : point = Note that the type of p is point. This is an abbreviation automatically defined by the class definition above. It stands for the object type < get_x : int; move : int -> unit>, listing the methods of class point along with their types. Let us apply some methods to p: #p#get_x;; - : int = 7 #p#move 3;; - : unit = () #p#get_x;; - : int = 10 The library function Oo.copy copies an object. Its type is < .. > as 'a -> 'a. The keyword as in that type binds the type variable 'a to the object type 19 Chapter 2. Objects in Caml 20 < .. >. Oo.copy therefore takes an object with any methods (represented by the ellipsis), and returns an object of the same type. The type of Oo.copy is different from type < .. > -> < .. > as each ellipsis represents a different set of methods. Ellipsis actually behaves as a type variable. #let q = Oo.copy p;; val q : point = #q#move 7; (p#get_x, q#get_x);; - : int * int = 10, 17 Inheritance We now define a new class color_point. This class inherits from class point. So, it has all the instance variable and all the methods of point, plus a new instance variable c and a new method color. #class color_point x (c : string) = # inherit point x # val c = c # method color = c #end;; class color_point (int) (string) = val c : string val mutable x : int method get_x : int method move : int -> unit method color : string end #let p' = new color_point 5 "red";; val p' : color_point = #p'#get_x, p'#color;; - : int * string = 5, "red" A point and a color point have incompatible types: a point has no method color. Thus, if one wants to put p and p' in the same list, one needs to coerce p' to the type of points, hiding its color method. #let l = [p; (p' :> point)];; val l : point list = [; ] The function get_x below is a generic function applying method get_x to any object p which has this method (and possibly some others, which are represented by an ellipsis in the type). The method needs not be declared previously, as shown by the companion function set_x. Function get_x can then be mapped on list l. #let get_x p = p#get_x;; val get_x : < get_x : 'a; .. > -> 'a = #let set_x p = p#set_x;; val set_x : < set_x : 'a; .. > -> 'a = #List.map get_x l;; - : int list = [10; 5] Chapter 2. Objects in Caml 21 Parameterized classes Reference cells can also be implemented as objects: #class ref x_init = # val mutable x = x_init # method get = x # method set y = x <- y #end;; Characters 5-85: The type variable 'a is not bound in implicit type definition ref = < get : 'a; set : 'a -> unit > It should be captured by a class type parameter The reason why this definition does not typecheck is that at least one of the methods has a polymorphic type (here, the type of the value stored in the reference cell), thus the class should be parametric. A monomorphic instance of the class could be defined by: #class ref (x_init:int) = # val mutable x = x_init # method get = x # method set y = x <- y #end;; class ref (int) = val mutable x : int method get : int method set : int -> unit end A class for polymorphic references must explicitly list the type parameters in its declaration. The type parameters must also be bound somewhere in the class body by a type constraint. #class 'a ref x_init = # val mutable x = (x_init : 'a) # method get = x # method set y = x <- y #end;; class 'a ref ('a) = val mutable x : 'a method get : 'a method set : 'a -> unit end #let r = new ref 1 in r#set 2; (r#get);; - : int = 2 The type parameter in the declaration may actually be constrained in the body of the class definition. In the class type, the actual value of the type parameter is displayed in the constraint clause. #class 'a ref (x_init:'a) = # val mutable x = x_init + 1 # method get = x # method set y = x <- y #end;; class 'a ref ('a) = Chapter 2. Objects in Caml 22 constraint 'a = int val mutable x : int method get : int method set : int -> unit end Let us consider a more realistic example. We put an additional type constraint in method move, since no free variables must remain uncaptured by a type parameter. #class 'a circle (c : 'a) = # val mutable center = c # method center = center # method set_center c = center <- c # method move = (center#move : int -> unit) #end;; class 'a circle ('a) = constraint 'a = < move : int -> unit; .. > val mutable center : 'a method center : 'a method set_center : 'a -> unit method move : int -> unit end An alternate definition of circle, using a constraint clause in the class definition, is shown below. The type #point used below in the constraint clause is an abbreviation produced by the definition of class point. This abbreviation unifies with the type of any object belonging to a subclass of class point. It actually expands to < get_x : int; move : int -> unit; .. >. This leads to the following alternate definition of circle, which has slightly stronger constraints on its argument, as we now expect center to have a method get_x. #class 'a circle (c : 'a) = # constraint 'a = #point # val mutable center = c # method center = center # method set_center c = center <- c # method move = center#move #end;; class 'a circle ('a) = constraint 'a = #point val mutable center : 'a method center : 'a method set_center : 'a -> unit method move : int -> unit end The class color_circle is a specialized version of class circle which requires the type of the center to unify with #color_point, and adds a method color. #class 'a color_circle c = # constraint 'a = #color_point # inherit ('a) circle c # method color = center#color #end;; class 'a color_circle ('a) = Chapter 2. Objects in Caml 23 constraint 'a = < get_x : int; move : int -> unit; color : string; .. > val mutable center : 'a method center : 'a method set_center : 'a -> unit method move : int -> unit method color : string end Self reference A method can also send messages to the object that invoked the method. For that, self must be explicitly bound, here to the variable s. #class printable_point y as s = # inherit point y # method print = print_int s#get_x #end;; class printable_point (int) = val mutable x : int method get_x : int method move : int -> unit method print : unit end #let p = new printable_point 7;; val p : printable_point = #p#print;; 7- : unit = () The variable s is bound at the invocation of a method. In particular, if the class printable_point is inherited, the variable s will correctly be bound to an object of the subclass. Multiple inheritance Multiple inheritance is allowed. Only the last definition of a method (or of an instance variable) is kept. But previous definitions of a method can be reused by binding the related ancestor. Here, super is bound to the ancestor printable_point. The name super is not actually a variable and can only be used to select a method as in super#print. #class printable_color_point y c as self = # inherit color_point y c # inherit printable_point y as super # method print = # print_string "("; # super#print; # print_string ", "; # print_string (self#color); # print_string ")" #end;; class printable_color_point (int) (string) = val c : string val mutable x : int method get_x : int Chapter 2. Objects in Caml 24 method move : int -> unit method color : string method print : unit end #let p' = new printable_color_point 7 "red";; val p' : printable_color_point = #p'#print;; (7, red)- : unit = () Non-mutable objects It is possible to write a version of class point without assignments on the instance variables. The construct {< ... >} returns a copy of ``self'' (that is, the current object), possibly changing the value of some instance variables. #class functional_point y = # val x = y # method get_x = x # method move d = {< x = x + d >} #end;; class functional_point (int) : 'a = val x : int method get_x : int method move : int -> 'a end #let p = new functional_point 7;; val p : functional_point = #p#get_x;; - : int = 7 #(p#move 3)#get_x;; - : int = 10 #p#get_x;; - : int = 7 Note that the type abbreviation functional_point is recursive, which can be seen in the class type of functional_point: the type of self to 'a and 'a appears inside the type of the move method. Virtual methods The class comparable below is a template for classes with a binary method leq of type 'a -> bool where the type variable 'a is bound to the type of self. Since this class has a method declared but not defined, it must be flagged virtual and cannot be instantiated (that is, no object of this class can be created). It still defines abbreviations. In particular, #comparable expands to < leq : 'a -> bool; .. > as 'a. We see here that the binder as also allows to write recursive types. #class virtual comparable () : 'a = Chapter 2. Objects in Caml 25 # virtual leq : 'a -> bool #end;; class virtual comparable (unit) : 'a = virtual leq : 'a -> bool end We then defines a subclass of comparable, which wraps integers as comparable objects. There is a type constraint on the class parameter x as the primitive <= is a polymorphic comparison function in Objective Caml. The inherit clause ensures that the type of objects of this class is an instance of #comparable. #class int_comparable (x : int) = # inherit comparable () # val x = x # method x = x # method leq p = x <= p#x #end;; class int_comparable (int) : 'a = val x : int method leq : 'a -> bool method x : int end Objects of class int_comparable2 below can also modify the integer they hold. The status of instance variable x is changed. It is now mutable and private; that is, subclasses cannot access it (it does no longer appear in the class type). Note that the type int_comparable2 is not a subtype of type int_comparable, as the self type appears in contravariant position in the type of method leq. #class int_comparable2 x = # inherit int_comparable x # val private mutable x # method set_x y = x <- y #end;; class int_comparable2 (int) : 'a = method leq : 'a -> bool method x : int method set_x : int -> unit end The function min will return the minimum of any two objects whose type unify with #comparable. The type of min is not the same as #comparable -> #comparable -> #comparable, as the abbreviation #comparable hides a type variable (an ellipsis). Each occurrence of this abbreviation generates a new variable. #let min (x : #comparable) y = # if x#leq y then x else y;; val min : (#comparable as 'a) -> 'a -> 'a = This function can be applied to objects of type int_comparable or int_comparable2. #(min (new int_comparable 7) (new int_comparable 11))#x;; - : int = 7 #(min (new int_comparable2 5) (new int_comparable2 3))#x;; - : int = 3 Chapter 2. Objects in Caml 26 Recursive classes This last example is somewhat more complicated. It demonstrates recursive classes. Indeed, lists can also be implemented by classes. We recursively define three classes: 1. A virtual class lst implements functional over lists, here a map method, an interation method, and a printing method. 2. A class nil defines the behavior of the empty list. 3. A class cons defines the behavior of non-empty lists. All three classes have the same interface. The class lst depends on classes nil and cons since the map method creates objets of the classes nil and cons, which themselves depend on the class lst, since they both inherits the class lst. #class virtual 'a lst () as self = # virtual null : bool # virtual hd : 'a # virtual tl : 'a lst # method map f = # (if self#null then # new nil () # else # new cons (f self#hd) (self#tl#map f) # : 'a lst) # method iter (f : 'a -> unit) = # if self#null then () # else begin # f self#hd; # self#tl#iter f # end # method print (f : 'a -> unit) = # print_string "("; # self#iter (fun x -> f x; print_string "::"); # print_string "[]"; # print_string ")" #and 'a nil () = # inherit ('a) lst () # method null = true # method hd = failwith "hd" # method tl = failwith "tl" # #and 'a cons h t = # inherit ('a) lst () # val h = h val t = t # method null = false # method hd = h # method tl = t #end;; class virtual 'a lst (unit) = method map : ('a -> 'a) -> 'a lst method iter : ('a -> unit) -> unit method print : ('a -> unit) -> unit virtual null : bool virtual hd : 'a Chapter 2. Objects in Caml 27 virtual tl : 'a lst end class 'a nil (unit) = method null : bool method hd : 'a method tl : 'a lst method map : ('a -> 'a) -> 'a lst method iter : ('a -> unit) -> unit method print : ('a -> unit) -> unit end class 'a cons ('a) ('a lst) = val h : 'a val t : 'a lst method null : bool method hd : 'a method tl : 'a lst method map : ('a -> 'a) -> 'a lst method iter : ('a -> unit) -> unit method print : ('a -> unit) -> unit end It is a weakness of Objective Caml that objects cannot have polymorphic methods: the method map can only return a list of the same type as the objet to which it applies. #let l1 = new cons 3 (new cons 10 (new nil ()));; val l1 : int cons = #l1#print print_int;; (3::10::[])- : unit = () #let l2 = l1#map (fun x -> x + 1);; val l2 : int lst = #l2#print print_int;; (4::11::[])- : unit = () A polymorphic map function can still be defined: #let rec map_list f (x:'a lst) = # if x#null then new nil() # else new cons (f x#hd) (map_list f x#tl);; val map_list : ('a -> 'b) -> 'a lst -> 'b lst = #let p1 = (map_list (fun x -> new printable_color_point x "red") l1);; val p1 : printable_color_point lst = #p1#print (fun x -> x#print);; ((3, red)::(10, red)::[])- : unit = () Chapter 3 The module system This chapter introduces the module system of Objective Caml. Structures A primary motivation for modules is to package together related definitions (such as the definitions of a data type and associated operations over that type) and enforce a consistent naming scheme for these definitions. This avoids running out of names or accidentally confusing names. Such a package is called a structure and is introduced by the struct...end construct, which contains an arbitrary sequence of definitions. The structure is usually given a name with the module binding. Here is for instance a structure packaging together a type of priority queues and their operations: #module PrioQueue = # struct # type priority = int # type 'a queue = Empty | Node of priority * 'a * 'a queue * 'a queue # let empty = Empty # let rec insert queue prio elt = # match queue with # Empty -> Node(prio, elt, Empty, Empty) # | Node(p, e, left, right) -> # if prio <= p # then Node(prio, elt, insert right p e, left) # else Node(p, e, insert right prio elt, left) # exception Queue_is_empty # let rec remove_top = function # Empty -> raise Queue_is_empty # | Node(prio, elt, left, Empty) -> left # | Node(prio, elt, Empty, right) -> right # | Node(prio, elt, (Node(lprio, lelt, _, _) as left), # (Node(rprio, relt, _, _) as right)) -> # if lprio <= rprio # then Node(lprio, lelt, remove_top left, right) # else Node(rprio, relt, left, remove_top right) # let extract = function # Empty -> raise Queue_is_empty # | Node(prio, elt, _, _) as queue -> (prio, elt, remove_top queue) # end;; module PrioQueue : sig type priority = int type 'a queue = Empty | Node of priority * 'a * 'a queue * 'a queue 28 Chapter 3. The module system 29 val empty : 'a queue val insert : 'a queue -> priority -> 'a -> 'a queue exception Queue_is_empty val remove_top : 'a queue -> 'a queue val extract : 'a queue -> priority * 'a * 'a queue end Outside the structure, its components can be referred to using the ``dot notation'', that is, identifiers qualified by a structure name. For instance, PrioQueue.insert in a value context is the function insert defined inside the structure PrioQueue. Similarly, PrioQueue.queue in a type context is the type queue defined in PrioQueue. #PrioQueue.insert PrioQueue.empty 1 "hello";; - : string PrioQueue.queue = PrioQueue.Node (1, "hello", PrioQueue.Empty, PrioQueue.Empty) Signatures Signatures are interfaces for structures. A signature specifies which components of a structure are accessible from the outside, and with which type. It can be used to hide some components of a structure (e.g. local function definitions) or export some components with a restricted type. For instance, the signature below specifies the three priority queue operations empty, insert and extract, but not the auxiliary function remove_top. Similarly, it makes the queue type abstract (by not providing its actual representation as a concrete type). #module type PRIOQUEUE = # sig # type priority = int (* still concrete *) # type 'a queue (* now abstract *) # val empty : 'a queue # val insert : 'a queue -> int -> 'a -> 'a queue # val extract : 'a queue -> int * 'a * 'a queue # exception Queue_is_empty # end;; module type PRIOQUEUE = sig type priority = int type 'a queue val empty : 'a queue val insert : 'a queue -> int -> 'a -> 'a queue val extract : 'a queue -> int * 'a * 'a queue exception Queue_is_empty end Restricting the PrioQueue structure by this signature results in another view of the PrioQueue structure where the remove_top function is not accessible and the actual representation of priority queues is hidden: #module AbstractPrioQueue = (PrioQueue : PRIOQUEUE);; module AbstractPrioQueue : PRIOQUEUE #AbstractPrioQueue.remove_top;; Characters 0-28: Unbound value AbstractPrioQueue.remove_top Chapter 3. The module system 30 #AbstractPrioQueue.insert AbstractPrioQueue.empty 1 "hello";; - : string AbstractPrioQueue.queue = The restriction can also be performed during the definition of the structure, as in module PrioQueue = (struct ... end : PRIOQUEUE);; An alternate syntax is provided for the above: module PrioQueue : PRIOQUEUE = struct ... end;; Functors Functors are ``functions'' from structures to structures. They are used to express parameterized structures: a structure A parameterized by a structure B is simply a functor F with a formal parameter B (along with the expected signature for B) which returns the actual structure A itself. The functor F can then be applied to one or several implementations B1,...,Bn of B, yielding the corresponding structures A1,...,An. For instance, here is a structure implementing sets as sorted lists, parameterized by a structure providing the type of the set elements and an ordering function over this type (used to keep the sets sorted): #type comparison = Less | Equal | Greater;; type comparison = Less | Equal | Greater #module type ORDERED_TYPE = # sig # type t # val cmp: t -> t -> comparison # end;; module type ORDERED_TYPE = sig type t val cmp : t -> t -> comparison end #module Set = # functor (Elt: ORDERED_TYPE) -> # struct # type element = Elt.t # type set = element list # let empty = [] # let rec add x s = # match s with # [] -> [x] # | hd::tl -> # match Elt.cmp x hd with # Equal -> s (* x is already in s *) # | Less -> x :: s (* x is smaller than all elements of s *) # | Greater -> hd :: add x tl # let rec member x s = # match s with # [] -> false # | hd::tl -> # match Elt.cmp x hd with # Equal -> true (* x belongs to s *) # | Less -> false (* x is smaller than all elements of s *) # | Greater -> member x tl # end;; module Set : Chapter 3. The module system 31 functor(Elt : ORDERED_TYPE) -> sig type element = Elt.t type set = element list val empty : 'a list val add : Elt.t -> Elt.t list -> Elt.t list val member : Elt.t -> Elt.t list -> bool end By applying the Set functor to a structure implementing an ordered type, we obtain set operations for this type: #module OrderedString = # struct # type t = string # let cmp x y = if x = y then Equal else if x < y then Less else Greater # end;; module OrderedString : sig type t = string val cmp : 'a -> 'a -> comparison end #module StringSet = Set(OrderedString);; module StringSet : sig type element = OrderedString.t type set = element list val empty : 'a list val add : OrderedString.t -> OrderedString.t list -> OrderedString.t list val member : OrderedString.t -> OrderedString.t list -> bool end #StringSet.member "bar" (StringSet.add "foo" StringSet.empty);; - : bool = false Functors and type abstraction As in the PrioQueue example, it would be good style to hide the actual implementation of the type set, so that users of the structure will not rely on sets being lists, and we can switch later to another, more efficient representation of sets without breaking their code. This can be achieved by restricting Set by a suitable functor signature: #module type SETFUNCTOR = # functor (Elt: ORDERED_TYPE) -> # sig # type element = Elt.t (* concrete *) # type set (* abstract *) # val empty : set # val add : element -> set -> set # val member : element -> set -> bool # end;; module type SETFUNCTOR = functor(Elt : ORDERED_TYPE) -> sig type element = Elt.t type set val empty : set val add : element -> set -> set Chapter 3. The module system 32 val member : element -> set -> bool end #module AbstractSet = (Set : SETFUNCTOR);; module AbstractSet : SETFUNCTOR #module AbstractStringSet = AbstractSet(OrderedString);; module AbstractStringSet : sig type element = OrderedString.t type set = AbstractSet(OrderedString).set val empty : set val add : element -> set -> set val member : element -> set -> bool end #AbstractStringSet.add "gee" AbstractStringSet.empty;; - : AbstractStringSet.set = In an attempt to write the type constraint above more elegantly, one may wish to name the signature of the structure returned by the functor, then use that signature in the constraint: #module type SET = # sig # type element # type set # val empty : set # val add : element -> set -> set # val member : element -> set -> bool # end;; module type SET = sig type element type set val empty : set val add : element -> set -> set val member : element -> set -> bool end #module WrongSet = (Set : functor(Elt: ORDERED_TYPE) -> SET);; module WrongSet : functor(Elt : ORDERED_TYPE) -> SET #module WrongStringSet = WrongSet(OrderedString);; module WrongStringSet : sig type element = WrongSet(OrderedString).element type set = WrongSet(OrderedString).set val empty : set val add : element -> set -> set val member : element -> set -> bool end #WrongStringSet.add "gee" WrongStringSet.empty;; Characters 19-24: This expression has type string but is here used with type WrongStringSet.element = WrongSet(OrderedString).element Chapter 3. The module system 33 The problem here is that SET specifies the type element abstractly, so that the type equality between element in the result of the functor and t in its argument is forgotten. Consequently, WrongStringSet.element is not the same type as string, and the operations of WrongStringSet cannot be applied to strings. As demonstrated above, it is important that the type element in the signature SET be declared equal to Elt.t; unfortunately, this is impossible above since SET is defined in a context where Elt does not exist. To overcome this difficulty, Objective Caml provides a with type construct over signatures that allow to enrich a signature with extra type equalities: #module AbstractSet = # (Set : functor(Elt: ORDERED_TYPE) -> (SET with type element = Elt.t));; module AbstractSet : functor(Elt : ORDERED_TYPE) -> sig type element = Elt.t type set val empty : set val add : element -> set -> set val member : element -> set -> bool end As in the case of simple structures, an alternate syntax is provided for defining functors and restricting their result: module AbstractSet(Elt: ORDERED_TYPE) : (SET with type element = Elt.t) = struct ... end;; Abstracting a type component in a functor result is a powerful technique that provides a high degree of type safety, as we now illustrate. Consider an ordering over character strings that is different from the standard ordering implemented in the OrderedString structure. For instance, we compare strings without distinguishing upper and lower case. #module NoCaseString = # struct # type t = string # let normalize s = (* Convert to lowercase *) # let r = String.copy s in # for i = 0 to String.length r - 1 do # if r.[i] >= 'A' && r.[i] <= 'Z' then # r.[i] <- Char.chr(Char.code r.[i] + 32) # done; # r # let cmp s1 s2 = OrderedString.cmp (normalize s1) (normalize s2) # end;; module NoCaseString : sig type t = string val normalize : string -> string val cmp : string -> string -> comparison end #module NoCaseStringSet = AbstractSet(NoCaseString);; module NoCaseStringSet : sig type element = NoCaseString.t type set = AbstractSet(NoCaseString).set Chapter 3. The module system 34 val empty : set val add : element -> set -> set val member : element -> set -> bool end #NoCaseStringSet.add "FOO" AbstractStringSet.empty;; Characters 26-49: This expression has type AbstractStringSet.set = AbstractSet(OrderedString).set but is here used with type NoCaseStringSet.set = AbstractSet(NoCaseString).set Notice that the two types AbstractStringSet.set and NoCaseStringSet.set are not compatible, and values of these two types do not match. This is the correct behavior: even though both set types contain elements of the same type (strings), both are built upon different orderings of that type, and different invariants need to be maintained by the operations (being strictly increasing for the standard ordering and for the case-insensitive ordering). Applying operations from AbstractStringSet to values of type NoCaseStringSet.set could give incorrect results, or build lists that violate the invariants of NoCaseStringSet. Modules and separate compilation All examples of modules so far have been given in the context of the interactive system. However, modules are most useful for large, batch-compiled programs. For these programs, it is a practical necessity to split the source into several files, called compilation units, that can be compiled separately, thus minimizing recompilation after changes. In Objective Caml, compilation units are special cases of structures and signatures, and the relationship between the units can be explained easily in terms of the module system. A compilation unit a comprises two files: - the implementation file a.ml, which contains a sequence of definitions, analogous to the inside of a struct...end construct; - the interface file a.mli, which contains a sequence of specifications, analogous to the inside of a sig...end construct. Both files define a structure named A (same name as the base name a of the two files, with the first letter capitalized), as if the following definition was entered at top-level: module A: sig (* contents of file a.mli *) end = struct (* contents of file a.ml *) end;; The files defining the compilation units can be compiled separately using the ocaml -c command (the -c option means ``compile only, do not try to link''); this produces compiled interface files (with extension .cmi) and compiled object code files (with extension .cmo). When all units have been compiled, their .cmo files are linked together using the ocaml command. For instance, the following commands compile and link a program composed of two compilation units aux and main: $ ocamlc -c aux.mli # produces aux.cmi $ ocamlc -c aux.ml # produces aux.cmo $ ocamlc -c main.mli # produces main.cmi $ ocamlc -c main.ml # produces main.cmo Chapter 3. The module system 35 $ ocamlc -o theprogram aux.cmo main.cmo The program behaves exactly as if the following phrases were entered at top-level: module Aux: sig (* contents of aux.mli *) end = struct (* contents of aux.ml *) end;; module Main: sig (* contents of main.mli *) end = struct (* contents of main.ml *) end;; In particular, Main can refer to Aux: the definitions and declarations contained in main.ml and main.mli can refer to definition in aux.ml, using the Aux.ident notation, provided these definitions are exported in aux.mli. The order in which the .cmo files are given to ocaml during the linking phase determines the order in which the module definitions occur. Hence, in the example above, Aux appears first and Main can refer to it, but Aux cannot refer to Main. Notice that only top-level structures can be mapped to separately-compiled files, but not functors nor module types. However, all module-class objects can appear as components of a structure, so the solution is to put the functor or module type inside a structure, which can then be mapped to a file. Part II The Objective Caml language 36 Chapter 4 The Objective Caml language Foreword This document is intended as a reference manual for the Objective Caml language. It lists the language constructs, and gives their precise syntax and informal semantics. It is by no means a tutorial introduction to the language: there is not a single example. A good working knowledge of Caml is assumed. No attempt has been made at mathematical rigor: words are employed with their intuitive meaning, without further definition. As a consequence, the typing rules have been left out, by lack of the mathematical framework required to express them, while they are definitely part of a full formal definition of the language. Notations The syntax of the language is given in BNF-like notation. Terminal symbols are set in typewriter font (like this). Non-terminal symbols are set in italic font (like that). Square brackets [...] denote optional components. Curly brackets {...} denotes zero, one or several repetitions of the enclosed components. Curly bracket with a trailing plus sign {...}+ denote one or several repetitions of the enclosed components. Parentheses (...) denote grouping. 4.1 Lexical conventions Blanks The following characters are considered as blanks: space, newline, horizontal tabulation, carriage return, line feed and form feed. Blanks are ignored, but they separate adjacent identifiers, literals and keywords that would otherwise be confused as one single identifier, literal or keyword. Comments Comments are introduced by the two characters (*, with no intervening blanks, and terminated by the characters *), with no intervening blanks. Comments are treated as blank characters. Comments do not occur inside string or character literals. Nested comments are handled correctly. 37 Chapter 4. The Objective Caml language 38 Identifiers ident ::= letter {letter | 0...9 | _ | '} letter ::= A...Z | a...z Identifiers are sequences of letters, digits, _ (the underscore character), and ' (the single quote), starting with a letter. Letters contain at least the 52 lowercase and uppercase letters from the ASCII set. The current implementation also recognizes as letters all accented characters from the ISO 8859-1 (``ISO Latin 1'') set. All characters in an identifier are meaningful. The current implementation places no limits on the number of characters of an identifier. Integer literals integer-literal ::= [-] {0...9}+ | [-] (0x | 0X) {0...9 | A...F | a...f}+ | [-] (0o | 0O) {0...7}+ | [-] (0b | 0B) {0...1}+ An integer literal is a sequence of one or more digits, optionally preceded by a minus sign. By default, integer literals are in decimal (radix 10). The following prefixes select a different radix: -------------------------------- |Prefix|Radix | -------------------------------- |0x, 0X|hexadecimal (radix 16) | |0o, 0O|octal (radix 8) | |0b, 0B|binary (radix 2) | -------------------------------- (The initial 0 is the digit zero; the O for octal is the letter O.) The interpretation of integer literals that fall outside the range of representable integer values is undefined. Floating-point literals float-literal ::= [-] {0...9}+ [. {0...9}] [(e | E) [+ | -] {0...9}+] Floating-point decimals consist in an integer part, a decimal part and an exponent part. The integer part is a sequence of one or more digits, optionally preceded by a minus sign. The decimal part is a decimal point followed by zero, one or more digits. The exponent part is the character e or E followed by an optional + or - sign, followed by one or more digits. The decimal part or the exponent part can be omitted, but not both to avoid ambiguity with integer literals. The interpretation of floating-point literals that fall outside the range of representable floating-point values is undefined. Character literals char-literal ::= ' regular-char ' | ' \ (\ | ' | n | t | b | r) ' | ' \ (0...9) (0...9) (0...9) ' Character literals are delimited by ' (single quote) characters. The two single quotes enclose either one character different from ' and \, or one of the escape sequences below: Chapter 4. The Objective Caml language 39 -------------------------------------------------------- |Sequence|Character denoted | -------------------------------------------------------- |\\ |backslash (\) | |\' |single quote (') | |\n |newline (LF) | |\r |return (CR) | |\t |horizontal tabulation (TAB) | |\b |backspace (BS) | |\ddd |the character with ASCII code ddd in decimal | -------------------------------------------------------- String literals string-literal ::= " {string-character} " string-character ::= regular-char | \ (\ | " | n | t | b | r) | \ (0...9) (0...9) (0...9) String literals are delimited by " (double quote) characters. The two double quotes enclose a sequence of either characters different from " and \, or escape sequences from the table below: -------------------------------------------------------- |Sequence|Character denoted | -------------------------------------------------------- |\\ |backslash (\) | |\" |double quote (") | |\n |newline (LF) | |\r |return (CR) | |\t |horizontal tabulation (TAB) | |\b |backspace (BS) | |\ddd |the character with ASCII code ddd in decimal | -------------------------------------------------------- The current implementation places no restrictions on the length of string literals. Prefix and infix symbols infix-symbol ::= (= | < | > | @ | ^ | | | & | + | - | * | / | $ | %) {operator-char} prefix-symbol ::= (! | ? | ~) {operator-char} operator-char ::= ! | $ | % | & | * | + | - | . | / | : | < | = | > | ? | @ | ^ | | | ~ Sequences of ``operator characters'', such as <=> or !!, are read as a single token from the infix-symbol or prefix-symbol class. These symbols are parsed as prefix and infix operators inside expressions, but otherwise behave much as identifiers. Keywords The identifiers below are reserved as keywords, and cannot be employed otherwise: and as asr begin class closed constraint do done downto else end exception external false for fun function functor if in include inherit land let lor lsl lsr lxor match method mod module mutable new of open or parser private rec sig struct then to true try type Chapter 4. The Objective Caml language 40 val virtual when while with The following character sequences are also keywords: # & ' ( ) * , -> ? . .. .( .[ : :: := ; ;; <- = [ [| [< {< ] |] >] >} _ { | } Ambiguities Lexical ambiguities are resolved according to the ``longest match'' rule: when a character sequence can be decomposed into two tokens in several different ways, the decomposition retained is the one with the longest first token. 4.2 Values This section describes the kinds of values that are manipulated by Caml Light programs. 4.2.1 Base values Integer numbers 30 30 Integer values are integer numbers from -2 to 2 -1, that is -1073741824 to 1073741823. The implementation may support a wider range of integer values: on 64-bit platforms, the current implementation supports integers 62 62 ranging from -2 to 2 -1. Floating-point numbers Floating-point values are numbers in floating-point representation. The current implementation uses double-precision floating-point numbers conforming to the IEEE 754 standard, with 53 bits of mantissa and an exponent ranging from -1022 to 1023. Characters Character values are represented as 8-bit integers between 0 and 255. Character codes between 0 and 127 are interpreted following the ASCII standard. The current implementation interprets character codes between 128 and 255 following the ISO 8859-1 standard. Character strings String values are finite sequences of characters. The current implementation 24 supports strings containing up to 2 -6 characters (16777210 characters). 4.2.2 Tuples Tuples of values are written (v1,...,vn), standing for the n-tuple of values 22 v1 to vn. The current implementation supports tuple of up to 2 -1 elements Chapter 4. The Objective Caml language 41 (4194303 elements). 4.2.3 Records Record values are labeled tuples of values. The record value written {label1=v1 ;...;labeln =vn} associates the value vi to the record label labeli, for i=1...n. The current implementation supports records with up to 22 2 -1 fields (4194303 fields). 4.2.4 Arrays Arrays are finite, variable-sized sequences of values of the same type. The 22 current implementation supports arrays containing to 2 -1 elements (4194303 elements). 4.2.5 Variant values Variant values are either a constant constructor, or a pair of a non-constant constructor and a value. The former case is written cconstr; the latter case is written ncconstr(v), where v is said to be the argument of the non-constant constructor ncconstr. The following constants are treated like built-in constant constructors: ------------------------------ |Constant|Constructor | ------------------------------ |false |the boolean false | |true |the boolean true | |() |the ``unit'' value | |[] |the empty list | ------------------------------ The current implementation limits the number of distinct constructors in a given variant type to at most 249. 4.2.6 Functions Functional values are mappings from values to values. 4.2.7 Objects Objects are composed of a hidden internal state which is a record of instance variables, and a set of methods for accessing and modifying these variables. The structure of an object is described by the toplevel class that created it. 4.3 Names Identifiers are used to give names to several classes of language objects and refer to these objects by name later: - value names (syntactic class value-name), - value constructors (constant -- class cconstr-name -- or non-constant -- class ncconstr-name), - type constructors (typeconstr-name), Chapter 4. The Objective Caml language 42 - record labels (label-name), - class names (class-name), - method names (method-name), - instance variable names (inst-var-name), - module names (module-name), - module type names (modtype-name). These nine name spaces are distinguished both by the context and by the capitalization of the identifier: whether the first letter of the identifier is in lowercase (written lowercase-ident below) or in uppercase (written capitalized-ident). Naming objects value-name ::= lowercase-ident | ( operator-name ) operator-name ::= prefix-symbol | infix-symbol | * | = | or | & | := cconstr-name ::= capitalized-ident | false | true | [ ] | ( ) ncconstr-name ::= capitalized-ident | :: typeconstr-name ::= lowercase-ident label-name ::= lowercase-ident module-name ::= capitalized-ident modtype-name ::= ident class-name ::= lowercase-ident inst-var-name ::= lowercase-ident method-name ::= lowercase-ident As shown above, prefix and infix symbols as well as some keywords can be used as value names, provided they are written between parentheses. Keywords such as '::' and 'false' are also constructor names. The capitalization rules are summarized in the table below. ----------------------------------------- |Name space |Case of first letter | ----------------------------------------- |Values |lowercase | |Constructors |uppercase | |Type constructors |lowercase | |Record labels |lowercase | |Classes |lowercase | |Methods |lowercase | |Modules |uppercase | |Module types |any | ----------------------------------------- Chapter 4. The Objective Caml language 43 Referring to named objects value-path ::= value-name | module-path . lowercase-ident cconstr ::= cconstr-name | module-path . capitalized-ident ncconstr ::= ncconstr-name | module-path . capitalized-ident typeconstr ::= typeconstr-name | extended-module-path . lowercase-ident label ::= label-name | module-path . lowercase-ident module-path ::= module-name | module-path . capitalized-ident extended-module-path ::= module-name | extended-module-path . capitalized-ident | extended-module-path ( extended-module-path ) modtype-path ::= modtype-name | extended-module-path . ident class-path ::= class-name | module-path . lowercase-ident A named object can be referred to either by its name (following the usual static scoping rules for names) or by an access path prefix . name, where prefix designates a module and name is the name of an object defined in that module. The first component of the path, prefix, is either a simple module name or an access path name1 . name2..., in case the defining module is itself nested inside other modules. For referring to type constructors or module types, the prefix can also contain simple functor applications (as in the syntactic class extended-module-path above), in case the defining module is the result of a functor application. Instance variable names and method names need not be qualified: the former are local to a class while the latter are global labels. 4.4 Type expressions typexpr ::= ' ident | ( typexpr ) | typexpr -> typexpr | typexpr {* typexpr}+ | typeconstr | typexpr typeconstr | ( typexpr {, typexpr} ) typeconstr | typexpr as ' ident | < [..] > | < method-type {; method-type} [; ..] > | # class-path | typexpr # class-path | ( typexpr {, typexpr}) # class-path method-type ::= method-name : typexpr The table below shows the relative precedences and associativity of operators and non-closed type constructions. The constructions with higher precedences come first. Chapter 4. The Objective Caml language 44 --------------------------------------------- |Operator |Associativity | --------------------------------------------- |Type constructor application |-- | |* |-- | |-> |right | --------------------------------------------- Type expressions denote types in definitions of data types as well as in type constraints over patterns and expressions. Type variables The type expression ' ident stands for the type variable named ident. In data type definitions, type variables are names for the data type parameters. In type constraints, they represent unspecified types that can be instantiated by any type to satisfy the type constraint. Parenthesized types The type expression ( typexpr ) denotes the same type as typexpr. Function types The type expression typexpr1 -> typexpr2 denotes the type of functions mapping arguments of type typexpr1 to results of type typexpr2. Tuple types The type expression typexpr1 *...* typexprn denotes the type of tuples whose elements belong to types typexpr1,...typexprn respectively. Constructed types Type constructors with no parameter, as in typeconstr, are type expressions. The type expression typexpr typeconstr, where typeconstr is a type constructor with one parameter, denotes the application of the unary type constructor typeconstr to the type typexpr. The type expression (typexpr1,...,typexprn) typeconstr, where typeconstr is a type constructor with n parameters, denotes the application of the n-ary type constructor typeconstr to the types typexpr1 through typexprn. Recursive types The type expression typexpr as ' ident denotes the same type as typexpr, and also binds the type variable ident to type typexpr both in typexpr and in the remaining part of the toplevel phrase. If the type variable ident actually occurs in typexpr, a recursive type is created. Recursive types are allowed as long as any recursion crosses a type constructor or an object type. Object types An object type < method-type {; method-type} > is a record of method types. The type < method-type {; method-type} ; .. > is the type of an object with methods and their associated types are described by method-type1,...,method-typen, and possibly some other methods represented by the ellipsis. This ellipsis actually behaves as a type variable. Chapter 4. The Objective Caml language 45 #-types The type # class-path is a special kind of abbreviation. This abbreviation unifies with the type of any object belonging to a subclass of class class-path. It is handled in a special way as it usually hides a type variable (an ellipsis, representing the methods that may be added in a subclass). In particular, it vanishes when the ellipsis gets instantiated. Each type expression # class-path defines a new type variable, so type # class-path -> # class-path is usually not the same as type # class-path as ' ident -> ' ident. 4.5 Constants constant ::= integer-literal | float-literal | char-literal | string-literal | cconstr The syntactic class of constants comprises literals from the four base types (integers, floating-point numbers, characters, character strings), and constant constructors. 4.6 Patterns pattern ::= value-name | _ | constant | pattern as value-name | ( pattern ) | ( pattern : typexpr ) | pattern | pattern | ncconstr pattern | pattern {, pattern} | { label = pattern {; label = pattern} } | [ pattern {; pattern} ] | pattern :: pattern The table below shows the relative precedences and associativity of operators and non-closed pattern constructions. The constructions with higher precedences come first. ---------------------------------------- |Operator |Associativity | ---------------------------------------- |Constructor application|-- | |:: |right | |, |-- | || |left | |as |-- | ---------------------------------------- Patterns are templates that allow selecting data structures of a given shape, and binding identifiers to components of the data structure. This selection operation is called pattern matching; its outcome is either ``this value does not match this pattern'', or ``this value matches this pattern, resulting in the following bindings of names to values''. Chapter 4. The Objective Caml language 46 Variable patterns A pattern that consists in a value name matches any value, binding the name to the value. The pattern _ also matches any value, but does not bind any name. Patterns are linear: a variable cannot appear several times in a given pattern. In particular, there is no way to test for equality between two parts of a data structure using only a pattern (but when guards can be used for this purpose). Constant patterns A pattern consisting in a constant matches the values that are equal to this constant. Alias patterns The pattern pattern1 as value-name matches the same values as pattern1. If the matching against pattern1 is successful, the name name is bound to the matched value, in addition to the bindings performed by the matching against pattern1. Parenthesized patterns The pattern ( pattern1 ) matches the same values as pattern1. A type constraint can appear in a parenthesized pattern, as in ( pattern1 : typexpr ). This constraint forces the type of pattern1 to be compatible with type. ``Or'' patterns The pattern pattern1 | pattern2 represents the logical ``or'' of the two patterns pattern1 and pattern2. A value matches pattern1 | pattern2 either if it matches pattern1 or if it matches pattern2. The two sub-patterns pattern1 and pattern2 must contain no identifiers. Hence no bindings are returned by matching against an ``or'' pattern. Variant patterns The pattern ncconstr pattern1 matches all variants whose constructor is equal to ncconstr, and whose argument matches pattern1. The pattern pattern1 :: pattern2 matches non-empty lists whose heads match pattern1, and whose tails match pattern2. This pattern behaves like ( :: ) ( pattern1 , pattern2 ). The pattern [ pattern1 ;...; patternn ] matches lists of length n whose elements match pattern1 ...patternn, respectively. This pattern behaves like pattern1 ::...:: patternn :: []. Tuple patterns The pattern pattern1 ,..., patternn matches n-tuples whose components match the patterns pattern1 through patternn. That is, the pattern matches the tuple values (v1,...,vn) such that patterni matches vi for i =1, ...,n. Record patterns The pattern { label1 = pattern1 ;...; labeln = patternn } matches records that define at least the labels label1 through labeln, and such that the Chapter 4. The Objective Caml language 47 value associated to labeli match the pattern patterni, for i= 1,...,n. The record value can define more labels than label1 ...labeln; the values associated to these extra labels are not taken into account for matching. 4.7 Expressions expr ::= value-path | constant | ( expr ) | begin expr end | ( expr : typexpr ) | expr , expr {, expr} | ncconstr expr | expr :: expr | [ expr {; expr} ] | [| expr {; expr} |] | { label = expr {; label = expr} } | expr expr | prefix-symbol expr | expr infix-op expr | expr . label | expr . label <- expr | expr .( expr ) | expr .( expr ) <- expr | expr .[ expr ] | expr .[ expr ] <- expr | if expr then expr [else expr] | while expr do expr done | for ident = expr (to | downto) expr do expr done | expr ; expr | match expr with pattern-matching | function pattern-matching | fun multiple-matching | try expr with pattern-matching | let [rec] let-binding {and let-binding} in expr | new class-path | expr # method-name | ( expr :> typexpr ) | ( expr : typexpr :> typexpr ) | {< inst-var-name = expr {; inst-var-name = expr} >} pattern-matching ::= pattern [when expr] -> expr {| pattern [when expr] -> expr} multiple-matching ::= {pattern}+ [when expr] -> expr let-binding ::= pattern = expr | value-name {pattern}+ [: typexpr] = expr infix-op ::= infix-symbol | * | = | or | & The table below shows the relative precedences and associativity of operators and non-closed constructions. The constructions with higher precedence come first. For infix and prefix symbols, we write ``*...'' to mean ``any symbol starting with *''. Chapter 4. The Objective Caml language 48 ---------------------------------------------------------------------- |Construction or operator |Associativity | ---------------------------------------------------------------------- |prefix-symbol |-- | |. .( .[ |-- | |function application |right | |constructor application |-- | |- -. (prefix) |-- | |**... |right | |*... /... %... mod |left | |+... -... |left | |:: |right | |@ ^ |right | |comparisons (= == < etc.), all other infix symbols|left | |not |-- | |& && |left | |or || |left | |, |-- | |<- := |right | |if |-- | |; |right | |let match fun function try |-- | ---------------------------------------------------------------------- 4.7.1 Basic expressions Constants Expressions consisting in a constant evaluate to this constant. Value paths Expressions consisting in an access path evaluate to the value bound to this path in the current evaluation environment. The path can be either a value name or an access path to a value component of a module. Parenthesized expressions The expressions ( expr ) and begin expr end have the same value as expr. Both constructs are semantically equivalent, but it is good style to use begin...end inside control structures: if ... then begin ... ; ... end else begin ... ; ... end and (...) for the other grouping situations. Parenthesized expressions can contain a type constraint, as in ( expr : type ). This constraint forces the type of expr to be compatible with type. Parenthesized expressions can also contain coercions ( expr [: type] :> type ) (see subsection 4.7.5 below). Function application Function application is denoted by juxtaposition of expressions. The expression expr1 expr2...exprn evaluates the expressions expr1 to exprn. The expression expr1 must evaluate to a functional value, which is then applied to the values of expr2,...,exprn. The order in which the expressions expr1,...,exprn are evaluated is not specified. Chapter 4. The Objective Caml language 49 Function definition Two syntactic forms are provided to define functions. The first form is introduced by the keyword function: function pattern1 -> expr1 | ... | patternn -> exprn This expression evaluates to a functional value with one argument. When this function is applied to a value v, this value is matched against each pattern pattern1 to patternn. If one of these matchings succeeds, that is, if the value v matches the pattern patterni for some i, then the expression expri associated to the selected pattern is evaluated, and its value becomes the value of the function application. The evaluation of expri takes place in an environment enriched by the bindings performed during the matching. If several patterns match the argument v, the one that occurs first in the function definition is selected. If none of the patterns matches the argument, the exception Match_failure is raised. The other form of function definition is introduced by the keyword fun: fun pattern1...patternn -> expr This expression is equivalent to: function pattern1 ->...function patternn -> expr That is, the fun expression above evaluates to a curried function with n arguments: after applying this function n times to the values v1 ... vm, the values will be matched in parallel against the patterns pattern1...patternn. If the matching succeeds, the function returns the value of expr in an environment enriched by the bindings performed during the matchings. If the matching fails, the exception Match_failure is raised. Guards in pattern-matchings Cases of a pattern matching (in the function, fun, match and try constructs) can include guard expressions, which are arbitrary boolean expressions that must evaluate to true for the match case to be selected. Guards occur just before the -> token and are introduced by the when keyword: function pattern1 [whencond1] -> expr1 | ... | patternn [whencondn] -> exprn Matching proceeds as described before, except that if the value matches some pattern patterni which has a guard condi, then the expression condi is evaluated (in an environment enriched by the bindings performed during matching). If condi evaluates to true, then expri is evaluated and its value returned as the result of the matching, as usual. But if condi evaluates to false, the matching is resumed against the patterns following patterni. Local definitions The let and let rec constructs bind value names locally. The construct let pattern1 = expr1 and...and patternn = exprn in expr Chapter 4. The Objective Caml language 50 evaluates expr1...exprn in some unspecified order, then matches their values against the patterns pattern1...patternn. If the matchings succeed, expr is evaluated in the environment enriched by the bindings performed during matching, and the value of expr is returned as the value of the whole let expression. If one of the matchings fails, the exception Match_failure is raised. An alternate syntax is provided to bind variables to functional values: instead of writing let ident = fun pattern1...patternm -> expr in a let expression, one may instead write let ident pattern1 ...patternm = expr Recursive definitions of names are introduced by let rec: let rec pattern1 = expr1 and...and patternn = exprn in expr The only difference with the let construct described above is that the bindings of names to values performed by the pattern-matching are considered already performed when the expressions expr1 to exprn are evaluated. That is, the expressions expr1 to exprn can reference identifiers that are bound by one of the patterns pattern1,...,patternn, and expect them to have the same value as in expr, the body of the let rec construct. The recursive definition is guaranteed to behave as described above if the expressions expr1 to exprn are function definitions (fun... or function...), and the patterns pattern1...patternn are just value names, as in: let rec name1 = fun...and...and namen = fun...in expr This defines name1...namen as mutually recursive functions local to expr. The behavior of other forms of let rec definitions is implementation-dependent. The current implementation also supports a certain class of recursive definitions of non-functional values, such as let rec name1 = 1 :: name2 and name2 = 2 :: name1 in expr which binds name1 to the cyclic list 1::2::1::2::..., and name2 to the cyclic list 2::1::2::1::...Informally, the class of accepted definitions consists of those definitions where the defined names occur only inside function bodies or as argument to a data constructor. 4.7.2 Control structures Sequence The expression expr1 ; expr2 evaluates expr1 first, then expr2, and returns the value of expr2. Conditional The expression if expr1 then expr2 else expr3 evaluates to the value of expr2 if expr1 evaluates to the boolean true, and to the value of expr3 if expr1 evaluates to the boolean false. The else expr3 part can be omitted, in which case it defaults to else (). Chapter 4. The Objective Caml language 51 Case expression The expression match expr with pattern1 -> expr1 | ... | patternn -> exprn matches the value of expr against the patterns pattern1 to patternn. If the matching against patterni succeeds, the associated expression expri is evaluated, and its value becomes the value of the whole match expression. The evaluation of expri takes place in an environment enriched by the bindings performed during matching. If several patterns match the value of expr, the one that occurs first in the match expression is selected. If none of the patterns match the value of expr, the exception Match_failure is raised. Boolean operators The expression expr1 & expr2 evaluates to true if both expr1 and expr2 evaluate to true; otherwise, it evaluates to false. The first component, expr1, is evaluated first. The second component, expr2, is not evaluated if the first component evaluates to false. Hence, the expression expr1 & expr2 behaves exactly as if expr1 then expr2 else false. The expression expr1 or expr2 evaluates to true if one of expr1 and expr2 evaluates to true; otherwise, it evaluates to false. The first component, expr1, is evaluated first. The second component, expr2, is not evaluated if the first component evaluates to true. Hence, the expression expr1 or expr2 behaves exactly as if expr1 then true else expr2. Loops The expression while expr1 do expr2 done repeatedly evaluates expr2 while expr1 evaluates to true. The loop condition expr1 is evaluated and tested at the beginning of each iteration. The whole while...done expression evaluates to the unit value (). The expression for name = expr1 to expr2 do expr3 done first evaluates the expressions expr1 and expr2 (the boundaries) into integer values n and p. Then, the loop body expr3 is repeatedly evaluated in an environment where name is successively bound to the values n, n+1, ..., p -1, p. The loop body is never evaluated if n>p. The expression for name = expr1 downto expr2 do expr3 done evaluates similarly, except that name is successively bound to the values n, n-1, ..., p+1, p. The loop body is never evaluated if n expr1 | ... | patternn -> exprn evaluates the expression expr and returns its value if the evaluation of expr does not raise any exception. If the evaluation of expr raises an exception, the exception value is matched against the patterns pattern1 to patternn. If the matching against patterni succeeds, the associated expression expri is evaluated, and its value becomes the value of the whole try expression. The evaluation of expri takes place in an environment enriched by the bindings performed during matching. If several patterns match the value of expr, the one that occurs first in the try expression is selected. If none of the patterns matches the value of expr, the exception value is raised again, thereby transparently ``passing through'' the try construct. 4.7.3 Operations on data structures Products The expression expr1 ,..., exprn evaluates to the n-tuple of the values of expressions expr1 to exprn. The evaluation order for the subexpressions is not specified. Variants The expression ncconstr expr evaluates to the variant value whose constructor is ncconstr, and whose argument is the value of expr. For lists, some syntactic sugar is provided. The expression expr1 :: expr2 stands for the constructor ( :: ) applied to the argument ( expr1 , expr2 ), and therefore evaluates to the list whose head is the value of expr1 and whose tail is the value of expr2. The expression [ expr1 ;...; exprn ] is equivalent to expr1 ::...:: exprn :: [], and therefore evaluates to the list whose elements are the values of expr1 to exprn. Records The expression { label1 = expr1 ;...; labeln = exprn } evaluates to the record value { label1 = v1 ;...; labeln = vn }, where vi is the value of expri for i=1, ...,n. The labels label1 to labeln must all belong to the same record types; all labels belonging to this record type must appear exactly once in the record expression, though they can appear in any order. The order in which expr1 to exprn are evaluated is not specified. The expression expr1 . label evaluates expr1 to a record value, and returns the value associated to label in this record value. The expression expr1 . label <- expr2 evaluates expr1 to a record value, which is then modified in-place by replacing the value associated to label in this record by the value of expr2. This operation is permitted only if label has been declared mutable in the definition of the record type. The whole expression expr1 . label <- expr2 evaluates to the unit value (). Chapter 4. The Objective Caml language 53 Arrays The expression [| expr1 ;...; exprn |] evaluates to a n-element array, whose elements are initialized with the values of expr1 to exprn respectively. The order in which these expressions are evaluated is unspecified. The expression expr1 .( expr2 ) returns the value of element number expr2 in the array denoted by expr1. The first element has number 0; the last element has number n-1, where n is the size of the array. The exception Invalid_argument is raised if the access is out of bounds. The expression expr1 .( expr2 ) <- expr3 modifies in-place the array denoted by expr1, replacing element number expr2 by the value of expr3. The exception Invalid_argument is raised if the access is out of bounds. The value of the whole expression is (). Strings The expression expr1 .[ expr2 ] returns the value of character number expr2 in the string denoted by expr1. The first character has number 0; the last character has number n-1, where n is the length of the string. The exception Invalid_argument is raised if the access is out of bounds. The expression expr1 .[ expr2 ] <- expr3 modifies in-place the string denoted by expr1, replacing character number expr2 by the value of expr3. The exception Invalid_argument is raised if the access is out of bounds. The value of the whole expression is (). 4.7.4 Operators Symbols from the class infix-symbols, as well as the keywords *, =, or and &, can appear in infix position (between two expressions). Symbols from the class prefix-symbols can appear in prefix position (in front of an expression). Infix and prefix symbols do not have a fixed meaning: they are simply interpreted as applications of functions bound to the names corresponding to the symbols. The expression prefix-symbol expr is interpreted as the application ( prefix-symbol ) expr. Similarly, the expression expr1 infix-symbol expr2 is interpreted as the application ( infix-symbol ) expr1 expr2. The table below lists the symbols defined in the initial environment and their initial meaning. (See the description of the standard library module Pervasive in chapter 14 for more details). Their meaning may be changed at any time using let ( infix-op ) name1 name2 =... Chapter 4. The Objective Caml language 54 ------------------------------------------------------------------------ |Operator |Initial meaning | ------------------------------------------------------------------------ |+ |Integer addition. | |- (infix) |Integer subtraction. | |- (prefix) |Integer negation. | |* |Integer multiplication. | |/ |Integer division. Raise Division_by_zero if second | | |argument is zero. The result is unspecified if either | | |argument is negative. | |mod |Integer modulus. Raise Division_by_zero if second | | |argument is zero. The result is unspecified if either | | |argument is negative. | |land |Bitwise logical ``and'' on integers. | |lor |Bitwise logical ``or on integers. | |lxor |Bitwise logical ``exclusive or'' on integers. | |lsl |Bitwise logical shift left on integers. | |lsr |Bitwise logical shift right on integers. | |asr |Bitwise arithmetic shift right on integers. | |+. |Floating-point addition. | |-. (infix) |Floating-point subtraction. | |-. (prefix) |Floating-point negation. | |*. |Floating-point multiplication. | |/. |Floating-point division. | |** |Floating-point exponentiation. | |@ |List concatenation. | |^ |String concatenation. | |! |Dereferencing (return the current contents of a | | |reference). | |:= |Reference assignment (update the reference given as | | |first argument with the value of the second argument). | |= |Structural equality test. | |<> |Structural inequality test. | |== |Physical equality test. | |!= |Physical inequality test. | |< |Test ``less than''. | |<= |Test ``less than or equal''. | |> |Test ``greater than''. | |>= |Test ``greater than or equal'' | ------------------------------------------------------------------------ 4.7.5 Objects Object creation The expression new class-path denotes a function that takes some initialization arguments and returns a new object of class class-path. Message sending The expression expr # method-name invokes the method method-name of the object denoted by expr. Coercion The type of an object can be coerced (weakened) to a subtype. The expression ( expr :> typexpr ) coerces the expression expr to type typexpr. The expression ( expr : typexpr1 :> typexpr2 ) coerces the expression expr from type typexpr1 to type typexpr2. It is more general than the previous one. Chapter 4. The Objective Caml language 55 Object duplication An object can be duplicated using the library function Oo.copy (see section 15.13). Inside a method, the expression {< inst-var-name = expr {; inst-var-name = expr} >} returns a copy of self with the given instance variables replaced by the values of the associated expressions; other instance variables have the same value in the returned object as in self. 4.8 Type and exception definitions 4.8.1 Type definitions Type definitions bind type constructors to data types: either variant types, record types, type abbreviations, or abstract data types. They also bind the value constructors and record labels associated with the definition. type-definition ::= type typedef {and typedef} typedef ::= [type-params] typeconstr-name [type-equation] [type-representation] type-equation ::= = typexpr type-representation ::= = constr-decl {| constr-decl} | = { label-decl {; label-decl} } type-params ::= ' ident | ( ' ident {, ' ident} ) constr-decl ::= cconstr-name | ncconstr-name of typexpr label-decl ::= label-name : typexpr | mutable label-name : typexpr Type definitions are introduced by the type keyword, and consist in one or several simple definitions, possibly mutually recursive, separated by the and keyword. Each simple definition defines one type constructor. A simple definition consists in a lowercase identifier, possibly preceded by one or several type parameters, and followed by an optional type equation, then an optional type representation. The identifier is the name of the type constructor being defined. The optional type parameters are either one type variable ' ident, for type constructors with one parameter, or a list of type variables (' ident1,...,' identn), for type constructors with several parameters. These type parameters can appear in the type expressions of the right-hand side of the definition. The optional type equation = typexpr makes the defined type equivalent to the type expression typexpr on the right of the = sign: one can be substituted for the other during typing. If no type equation is given, a new type is generated: the defined type is incompatible with any other type. The optional type representation describes the data structure representing the defined type, by giving the list of associated constructors (if it is a variant type) or associated labels (if it is a record type). If no type representation is given, nothing is assumed on the structure of the type besides what is stated in the optional type equation. The type representation = constr-decl {| constr-decl} describes a variant type. The constructor declarations constr-decl1,...,constr-decln describe the constructors associated to this variant type. The constructor declaration ncconstr-name of typexpr declares the name ncconstr-name as a non-constant constructor, whose argument has type typexpr. The constructor declaration cconstr-name declares the name cconstr-name as a constant constructor. Chapter 4. The Objective Caml language 56 Constructor names must be capitalized. The type representation = { label-decl {; label-decl} } describes a record type. The label declarations label-decl1,...,label-decln describe the labels associated to this record type. The label declaration label-name : typexpr declares label-name as a label whose argument has type typexpr. The label declaration mutable label-name : typexpr behaves similarly; in addition, it allows physical modification over the argument to this label. The two components of a type definition, the optional equation and the optional representation, can be combined independently, giving rise to four typical situations: Abstract type: no equation, no representation. When appearing in a module signature, this definition specifies nothing on the type constructor, besides its number of parameters: its representation is hidden and it is assumed incompatible with any other type. Type abbreviation: an equation, no representation. This defines the type constructor as an abbreviation for the type expression on the right of the = sign. New variant type or record type: no equation, a representation. This generates a new type constructor and defines associated constructors or labels, through which values of that type can be directly built or inspected. Re-exported variant type or record type: an equation, a representation. In this case, the type constructor is defined as an abbreviation for the type expression given in the equation, but in addition the constructors or labels given in the representation remain attached to the defined type constructor. The type expression in the equation part must agree with the representation: it must be of the same kind (record or variant) and have exactly the same constructors or labels, in the same order, with the same arguments. 4.8.2 Exception definitions exception-definition ::= exception constr-decl Exception definitions add new constructors to the built-in variant type exn of exception values. The constructors are declared as for a definition of a variant type. Chapter 4. The Objective Caml language 57 4.9 Classes 4.9.1 Class definitions class-definition ::= class-header = {constraint} {class-fields} class-header ::= class-tags parameterized-class-name class-params class-binders class-tags ::= [virtual] [closed] parameterized-class-name ::= class-name | ' ident class-name | ( ' ident {, ' ident} ) class-name class-params ::= {pattern}+ class-binders ::= [as value-name] [: ' ident] constraint ::= constraint ' ident = typexpr class-fields ::= inherit ancestor | val value | virtual method-type | method method ancestor ::= [( typexpr {, typexpr} )] class-path {expr}+ [as value-name] value ::= [private] [mutable] inst-var-name [= expr] method ::= method-name {pattern} = expr method-type ::= method-name : typexpr Class definitions A class definition class class-definition {and class-definition} is recursive. Each class-definition defines a class-name that can be used in the whole expression except for inheritance. It can also be used for inheritance, but only in the definitions that follow its own. Type parameters A class class-name automatically defines two abbreviations : class-name and # class-name. The first one is the type of objects of this class, while the second is more general as it unifies with the type of any object belonging to a subclass (see section 4.4). The class type parameters correspond to the ones of these two abbreviations. They must be bound to actual types in the class definition using type constraints. The abbreviations are obtained by pruning the method types at all occurrences of a type parameter. So that the abbreviations are well-formed, no type variable must remain in the pruned types. Class parameters The parameters class-params are the ones of the object creation function new class-path, as well as the ones of the inheritance construct. Self and self type binders The binders class-binders, i.e. as value-name and : ' ident, allow to bind self (the current object) and its type, respectively. The variable value-name can then be used as any variable in method body. Constraints on type parameters The construct constraint ' ident = typexpr allows to specify type parameters. The value of the type parameter ident will be an instance of typexpr (more Chapter 4. The Objective Caml language 58 precisely, ident and typexpr are unified). Inheritance The inheritance construct inherit [( typexpr {, typexpr} )] class-path {expr}+ allows to reuse methods from other classes. It adds the instance variables and methods from class class-path into the current class, possibly overriding previously defined ones of the same name. Parent instance variables are initialized with parent class parameters bound to the arguments {expr}+. Parent type parameters are unified with type arguments typexpr1,...,typexprn. An ancestor can be bound by prepending the construct as value-name to the inheritance construct above. value-name is not a true variable and can only be used to select a method, i.e. in an expression value-name # method-name. This gives access to the method method-name as it was defined in the parent class even if it is redefined in the current class. Instance variable definition Instance variables can be defined or their status can be changed using the construct val value. The definition val [private] [mutable] inst-var-name = expr adds an instance variable inst-var-name whose initial value is the value of expression expr. If the variable was previously defined, its previous initial value is overridden. By default, this variable is visible in subclasses of current class. The flag private makes this variable only visible in the current class. The flag mutable allows physical modification of this variable by methods. The construct val [private] [mutable] inst-var-name enables to change the variable status (makes it private and/or mutable), while keeping the same initial value. Method definition Method definition is written method method. The definition of a method overrides any previous definition of this method. Method bodies do not have access to class parameters, but to instance variables. Some special expressions are available in method bodies for manipulating instance variables and duplicating self: expr ::= ... | inst-var-name | inst-var-name <- expr | {< [inst-var-name = expr {; inst-var-name = expr}] >} The expression inst-var-name evaluates to the value of the corresponding instance variable in the current object, while the expression inst-var-name <- expr modifies in-place the current object by replacing the value associated to inst-var-name by the value of expr. Of course, this instance variable must have been declared mutable. The expression {< [inst-var-name = expr {; inst-var-name = expr}] >} evaluates to a copy of the current object in which the values of instance variables inst-var-name1,...,inst-var-namen have been replaced by the values of the corresponding expressions expr1,...,exprn. Chapter 4. The Objective Caml language 59 Virtual class Methods can be declared, without being defined, with the construct virtual method-type. Methods that are declared in this way or applied to self but not actually defined are said to be virtual. A class must be flagged virtual if one of its methods is virtual. Objects cannot be created from a virtual class. Closed class A class can be flagged as closed. A closed class is a class to which subclasses cannot add methods. 4.9.2 Class types Class types are specifications for class definitions. The syntax of class types is closely modeled on class definitions. Some type information can be hidden in a class type: it can be given stronger constraints than a matched class; some instance variables can be omitted; some previously non-mutable instance variables can be flagged mutable. class-type ::= class-type-header = {constraint} {class-type-fields} class-type-header ::= class-tags parameterized-class-name class-type-params [: ' ident] class-type-params ::= {( typexpr )}+ class-type-fields ::= inherit ancestor-type | val value-type | virtual method-type | method method-type ancestor-type ::= [( typexpr {, typexpr} )] class-path value-type ::= [private] [mutable] inst-var-name [: typexpr] method-type ::= method-name : typexpr 4.10 Module types (module specifications) Module types are the module-level equivalent of type expressions: they specify the general shape and type properties of modules. module-type ::= modtype-path | sig {specification [;;]} end | functor ( module-name : module-type ) -> module-type | module-type with constraint {and constraint} | ( module-type ) specification ::= val value-name : typexpr | external value-name : typexpr = external-declaration | type-definition | exception-definition | class class-type {and class-type} end | module module-name : module-type | module module-name {( module-name : module-type )} : module-type | module type modtype-name | module type modtype-name = module-type | open module-path constraint ::= type [type-parameters] typeconstr = typexp | module module-path = extended-module-path Chapter 4. The Objective Caml language 60 4.10.1 Simple module types The expression modtype-path is equivalent to the module type bound to the name modtype-path. The expression ( module-type ) denotes the same type as module-type. 4.10.2 Signatures Signatures are type specifications for structures. Signatures sig...end are collections of type specifications for value names, type names, exceptions, module names and module type names. A structure will match a signature if the structure provides definitions (implementations) for all the names specified in the signature (and possibly more), and these definitions meet the type requirements given in the signature. For compatibility with Caml Light, an optional ;; is allowed after each specification in a signature. The ;; has no semantic meaning. Value specifications A specification of a value component in a signature is written val value-name : typexpr, where value-name is the name of the value and typexpr its expected type. The form external value-name : typexpr = external-declaration is similar, except that it requires in addition the name to be implemented as the external function specified in external-declaration (see chapter 13). Type specifications A specification of one or several type components in a signature is written type typedef {and typedef} and consists of a sequence of mutually recursive definitions of type names. Each type definition in the signature specifies an optional type equation = typexp and an optional type representation = constr-decl... or = { label-decl...}. The implementation of the type name in a matching structure must be compatible with the type expression specified in the equation (if given), and have the specified representation (if given). Conversely, users of that signature will be able to rely on the type equation or type representation, if given. More precisely, we have the following four situations: Abstract type: no equation, no representation. Names that are defined as abstract types in a signature can be implemented in a matching structure by any kind of type definition (provided it has the same number of type parameters). The exact implementation of the type will be hidden to the users of the structure. In particular, if the type is implemented as a variant type or record type, the associated constructors and labels will not be accessible to the users; if the type is implemented as an abbreviation, the type equality between the type name and the right-hand side of the