Skip to main content

Union-find

In this example, we will highlight an advanced use case of ghost declarations in Gospel through the specification of a union-find datastructure.

Union-find is a fairly simple datastructure used to keep track of a collection of elements partitioned into disjoint sets. Its most simple interface is as follows:

type 'a element
(** The type for set elements. *)

val make : 'a -> 'a element
(** [make x] is the singleton containing an element wrapping [x]. *)

val find : 'a element -> 'a element
(** [find e] is the representative of the subset containing [e]. *)

val union : 'a element -> 'a element -> unit
(** [union x y] merges the subset containing [x] with the subset
containing [y]. *)

Why is this hard to specify?

Union-find structures store a partition of a set. Notice that a typical interface for union-find doesn't materialise this greater set (let's call it our universe). Instead, the programmer has access to individual elements of the set, so they may merge them or access a representative of its partition.

However, for the sake of specification, not having this global set is a problem. How can we state that the subsets are disjoint? How do we refer to the set of elements that exist in the union-find universe? Can we even tell that the representative of an element (returned by find) is indeed part of a subset that was unioned with it at some point? It seems that we cannot do any of these by attaching contracts to our functions and type only.

This example shows how Gospel's ghost declarations help describe such complex behaviours.

Introducing the Ghost Universe

Seemingly, the root of our problems is that union-find relies on the implicit universe of elements. It doesn't exist in the OCaml's interface, so let's introduce it as a ghost-type declaration. We can also add that this universe is mutable. As long as elements are added, for instance, it will be modified.

(*@ type 'a universe *)
(*@ ephemeral *)

Now, our three functions still apply to set elements, but they also apply in the context of a universe since you create a new singleton subset in the context of the greater set, or find the representative of an element in the rest of the universe. This translates into our three functions taking a value of type 'a universe as its argument. Of course, 'a universe is a ghost type and you don't want to modify the functions' signatures anyway, so this argument is ghost, too1. We already get a sense that make and union will modify the universe.

val make : 'a -> 'a element
(*@ e = make [u: 'a universe] v
modifies u *)

val find : 'a element -> 'a element
(*@ e = find [u: 'a universe] x *)

val union : 'a element -> 'a element -> unit
(*@ union [u: 'a universe] x y
modifies u *)

Since we now have a type for universes, and functions that take values of such a type, we probably need a constructor for this type. Let's introduce that as a ghost value:

(*@ val make_universe: unit -> 'a universe *)

Having this abstract type on is not very useful so far. We're able to state that the functions apply in the context of a universe and they may mutate it, but that's pretty much it. Let's be more precise than that.

Elements: Gotta Catch 'em All

The first interesting property to capture is that the subsets are indeed disjoint. For instance, make shouldn't create an element that's already in the universe, otherwise you'd have two different subsets containing the same element.

In order to specify this, we need to talk about the set of existing elements in the universe. Let's introduce this as a logical model attached to our universe type:

(*@ type 'a universe *)
(*@ mutable model dom : 'a element set *)
tip

Since at least one model is mutable, we may now omit the ephemeral keyword (although it's also valid to keep it if you prefer). For instance, you may keep it if you want to indicate that the type is also mutable in a way that is not visible in the models.

Now let's add more information on how the functions interact with it. The constructor should ensure two things:

  • The returned element is a fresh element.
  • The universe's domain is augmented with the singleton containing the provided value.
val make : 'a -> 'a element
(*@ e = make [u: 'a universe] v
modifies u.dom
ensures not (Set.mem e (old u.dom))
ensures u.dom = Set.add e (old u.dom) *)

The find function obviously needs an element of the universe, and it also returns an element that's part of the universe:

val find : 'a element -> 'a element
(*@ e = find [u: 'a universe] x
requires Set.mem x u.dom
ensures Set.mem e u.dom *)

Finally, union requires that the provided elements are part of the universe too. Please note: it doesn't modify the universe domain (no element is added nor removed); however, since we added the modify u clause, we need to state that explicitly, or the contract may imply that u.dom was modified (in an unspecified way).

val union : 'a element -> 'a element -> unit
(*@ union [u: 'a universe] x y
requires Set.mem x u.dom
requires Set.mem y u.dom
modifies u
ensures u.dom = old u.dom *)

Find Your Representative

We can now talk about all the elements in our partition, but we still haven't mentioned the elements' representatives. Each element in the universe has a representative in the set, so we can represent this using a 'a element -> 'a element function. We can also add two invariants:

  • The representative of an element must live in the same universe as the element itself.
  • The rep function is idempotent: the representative of an element is its own representative.
(*@ type 'a universe *)
(*@ mutable model dom : 'a element set
mutable model rep : 'a element -> 'a element
with u
invariant forall e. Set.mem e u.dom -> Set.mem (u.rep e) u.dom
invariant forall e. Set.mem e u.dom -> u.rep (u.rep e) = u.rep e *)
caution

Notice how in both cases, speaking of the element's representative only makes sense for elements that live in the universe. However, Gospel's logic is total, so rep is also defined outside of the universe; however, it's unspecified there.

Now let's add clauses to our functions that indicate how they interact with rep. After a call to our constructor, the created element is obviously its own representative, and all other representatives are left unchanged:

val make : 'a -> 'a element
(*@ e = make [u: 'a universe] v
modifies u.dom
ensures not (Set.mem e (old u.dom))
ensures u.dom = Set.add e (old u.dom)
ensures u.rep = (old u.rep)[e -> e] *)
note

The _[_ -> _] operator is defined in the standard library. The notation f[x -> y] is a shorthand notation for the function defined as fun i -> if i = x then y else f i

The find function doesn't modify the representatives, but they return the input element's representative:

val find : 'a element -> 'a element
(*@ e = find [u: 'a universe] x
requires Set.mem x u.dom
ensures Set.mem e u.dom
ensures e = u.rep x *)

Finally, the postcondition for union is a bit more involved. We need to capture that the new representative of an element may change.

First, it's left unchanged if the element is not in x nor y subset.

Let's start by introducing a predicate that will help us decide if two elements are in the same subset (or equivalence class). This is the case if and only if they have the same subset representative:

(*@ predicate equiv (u: 'a universe) (x y: 'a element) =
u.rep x = u.rep y *)

We can now use this to state that elements that aren't equivalent to x or y have the same representative:

val union : 'a element -> 'a element -> unit
(*@ union [u: 'a universe] x y
requires Set.mem x u.dom
requires Set.mem y u.dom
modifies u
ensures u.dom = old u.dom
ensures forall e.
not (old (equiv u x e \/ equiv u y e))
-> u.rep e = old (u.rep e) *)

And finally, elements that were in the unioned subsets now must all have the same representative, and that element is either the old x representative, or the old y representative.

val union : 'a element -> 'a element -> unit
(*@ union [u: 'a universe] x y
requires Set.mem x u.dom
requires Set.mem y u.dom
modifies u
ensures u.dom = old u.dom
ensures forall e. not (old (equiv u x e \/ equiv u y e))
-> u.rep e = old (u.rep e)
ensures exists r. (r = old (u.rep x) \/ r = old (u.rep y))
/\ forall e. old (equiv u x e \/ equiv u y e)
-> u.rep e = r *)

We could go further and add more functions, like an equality function over elements or a get function to extract an element's value, but we'll keep it there for this tutorial. Hopefully, this gives you a better overview of the purpose of ghost types in Gospel specifications and how they can help you refer to meta-elements not present or not exposed in the code.


  1. If you're not comfortable with ghost arguments, you may want to read our Fibonacci walk-through first.