Types and values

2024-09-25

Analyzing the distinction that is easily ignored, and a soft introduction to GADTs.

When most people think about types, they usually view them as describing the underlying data associated with the type’s name. This is generally good practice – if I tell you that I have a value of type int, you can probably infer a couple of things about that value:

its purpose
- numbers count things, and integers are numbers
its size
- lets assume we’re dealing with word-sized integers here
its shape

This also extends to parameterized or generic types. We almost instinctually comprehend that a value of type List<int> is simply a list of “things” where those things are integers.

With all that being said, many programmers will scratch their heads when they see something like the following OCaml code:

type _ t =
  | Int: int -> string t

or the equivalent Haskell:

data T a where
  Int :: Int -> T String

It’s unfortunate that the syntax of both languages is relatively arcane, but the concept above is pretty simple:

we define a new data type t
we specify that t has a single constructor, Int which takes a value of type int

Until now, this could be represented in any language that has sum types, like Rust:

enum T {
    Int(i64)
}

The confusing bit is when I actually construct a value:

let my_t = Int i (* has type string t *)

Huh? Where is the string? We haven’t seen any "s or string values, how could this value possibly have type string t?

I’ll admit, this was pretty confusing to me as well when I first started learning about the more advanced features of the OCaml type system. But the crucial thing to realize is that, at the end of the day, types are just names.

They’re useful names, no doubt. If I’m calling a library function that takes two integers (like add), the fact that we have these names means that I can’t pass a string or even an Option<int> into this function. I as the programmer need to put in the work to prove that I can call add with two Option<int>s, usually using something like bind and map.

But to drive home this point, I think its useful to think about the cardinality of types. A type’s cardinality is essentially the number of values that that type can take on. Here are a couple (relatively) language agnostic examples:

void - this is known as the “absurd” type, because it can’t take on any values.
uint8_t - an unsigned 8-bit integer can take on any value from $0$ to $2^{8} - 1 = 255$ .
string - this one’s a bit trickier, but we can assume that we have a computer with infinite memory, meaning that the cardinality of this type is also infinite.
Option<T> - because Options can either be Some v or None, the cardinality of Option<T> is $|\mathtt{T}| + 1$ .

Ok, what’s the big deal? Try running this exercise using your language of choice:

Is it possible to construct a generic/parameterized type such that the cardinality of that type is less than the cardinality of the parameter?

There is one basic rule: the parameterized type has to use its parameter in some non-trivial way. i.e. the following Java class won’t cut it:

class MyClass<T> {
  private int foo;

  public int getFoo() {
    return this.foo;
  }
}

As far as I know, in most languages this is not possible. It’s only narrowly possible in Rust using the std type PhantomData<T>, and even that has some weird quirks. You may be saying “well of course this isn’t possible because it’s useless – why would I ever want to use a type that doesn’t immediately correspond with some value?”. In order to answer that, lets take another look at that strange OCaml type.

What is the cardinality of that type? Well, we can only construct a 'a t if we provide an int, so that means $|\mathtt{t}| = |\mathtt{int}|$ . But because 'a t only has the Int constructor, any value of 'a t must be a value of string t. But we established above that $|\mathtt{string}| = \infty$ . And so we’ve done it: we’ve constructed a type (string t) where its cardinality ( $2^{n} - 1$ ) is less than its parameter’s ( $\infty$ ).

Even though I’ve spent a decent amount of time talking about it, the cardinality here isn’t what we actually care about. Rather, it’s simply a proxy for showing that most languages only let you define types that imply the existence of an associated value.

I somewhat violated my rule from above where I said that the parameter has to be used in some non-trivial way. In order to show how that’s possible, let’s build a new type:

type (_, _) eq = Eq : ('a, 'a) t

Again, you can be forgiven for not understanding what’s going on here. What I’ve done is define a new type eq with two generic parameters. eq has a single constructor Eq that takes no data but can only be constructed over two equal types.

I implore you to use this link to try this out, because it’s still not immediately intuitive.

Before we get into why this is useful, we should first examine the cardinality of this type. Lets use the trivial example of (int, int) eq, meaning “the type int is equivalent to the type int”. As we’ve shown before, the cardinality of an integer with $n$ bits of precision is $2^{n} - 1$ . We have two of them in our type, so the minimum “implied” cardinality is $\left(2^{n} - 1\right) \cdot \left(2^{n} - 1\right) = \left(2^{n} - 1\right)^{2}$ . But (int, int) eq has a single constructor with no data, so its cardinality is actually $1$ .

Ok, so why is this useful? I’ll grant that proving that int is equivalent to int is not useful, but what if we could prove that some abstract type t is equivalent to int? That would mean that we could cast any value of t to int and be sure that we won’t end up with any memory issues. You might think of this as a safer version of Java’s instanceof – it’s safer because instanceof relies on runtime checks (i.e. reflection), whereas we can do this completely statically. In fact, the types of OCaml values are completely erased after the code is compiled, meaning we couldn’t use something like reflection even if we tried.

If you want to play around with this, try starting here. We define an abstract type, PrivInt.t, and expose a proof that t is equal to int. This allows us to do transparent and safe casting of PrivInt.one to int.

Similar to how it was intuitive what a List<int> is, with a little bit of reframing, the semantics of ('a, 'b) eq can be equally clear. The key is to understand that types can be used to describe other types themselves, rather than only values. Even though ('a, 'b) eq does tell you something about values of type 'a and 'b, it tells you more about the types themselves (i.e. they’re equal).

How does all of this interact with type safety? Remember earlier when I said that the onus is on the programmer to prove that a value of type Option<int> can be mapped to a value of type int? The same is true here – I can parameterize my type however I want, at the end of the day its still my burden to prove that my foo t is mappable to foo.

One of the most prolific OCaml compiler maintainers, octachron, has a great StackOverflow post explaining when the above type might come in handy. In the first sentence of his post, he says:

The equality type [..] is probably the quintessential example of GADTs

Tada! I’ve taught you about GADTs without you even knowing it. The formal definition of what a GADT is is pretty complicated, but I think the easiest way to understand them is that they allow you to divorce types from their value representations and establish constraints on those types. A constraint of the eq type is that both of it’s parameters must be equal. And even though (int, int) eq references int twice, we don’t actually carry around any integers when we pass a value of it around.

Programming this way can feel extremely clunky at first, but once you start getting the hang of it, GADTs become a powerful tool that you reach for more than you might initially expect.

The type/value distinction extends beyond GADTs, but they provide a simple example of how to realize it. If you want to learn more about GADTs or how you can more effectively use types, I suggest you write some OCaml 🙂.