Types and values
Analyzing the distinction that is easily ignored, and a soft introduction to GADTs.
When most people think about types, they usually view them as
describing the underlying data associated with the type’s name. This is
generally good practice – if I tell you that I have a value of type
int
, you can probably infer a couple of things about that
value:
-
its purpose
- numbers count things, and integers are numbers
-
its size
- lets assume we’re dealing with word-sized integers here
- its shape
This also extends to parameterized or generic types. We almost
instinctually comprehend that a value of type
List<int>
is simply a list of “things” where those
things are integers.
With all that being said, many programmers will scratch their heads when they see something like the following OCaml code:
type _ t =
Int: int -> string t |
or the equivalent Haskell:
data T a where
Int :: Int -> T String
It’s unfortunate that the syntax of both languages is relatively arcane, but the concept above is pretty simple:
-
we define a new data type
t
-
we specify that
t
has a single constructor,Int
which takes a value of typeint
Until now, this could be represented in any language that has sum types, like Rust:
enum T {
i64)
Int(}
The confusing bit is when I actually construct a value:
let my_t = Int i (* has type string t *)
Huh? Where is the string
? We haven’t seen any
"
s or string
values, how could this value
possibly have type string t
?
I’ll admit, this was pretty confusing to me as well when I first started learning about the more advanced features of the OCaml type system. But the crucial thing to realize is that, at the end of the day, types are just names.
They’re useful names, no doubt. If I’m calling a library function
that takes two integers (like add
), the fact that we have
these names means that I can’t pass a string
or even an
Option<int>
into this function. I as the programmer
need to put in the work to prove that I can call add
with
two Option<int>
s, usually using something like
bind
and map
.
But to drive home this point, I think its useful to think about the cardinality of types. A type’s cardinality is essentially the number of values that that type can take on. Here are a couple (relatively) language agnostic examples:
-
void
- this is known as the “absurd” type, because it can’t take on any values. -
uint8_t
- an unsigned 8-bit integer can take on any value from to . -
string
- this one’s a bit trickier, but we can assume that we have a computer with infinite memory, meaning that the cardinality of this type is also infinite. -
Option<T>
- becauseOption
s can either beSome v
orNone
, the cardinality ofOption<T>
is .
Ok, what’s the big deal? Try running this exercise using your language of choice:
Is it possible to construct a generic/parameterized type such that the cardinality of that type is less than the cardinality of the parameter?
There is one basic rule: the parameterized type has to use its parameter in some non-trivial way. i.e. the following Java class won’t cut it:
class MyClass<T> {
private int foo;
public int getFoo() {
return this.foo;
}
}
As far as I know, in most languages this is not possible. It’s only
narrowly possible in Rust using the std
type
PhantomData<T>
, and even that has some weird quirks.
You may be saying “well of course this isn’t possible because it’s
useless – why would I ever want to use a type that doesn’t immediately
correspond with some value?”. In order to answer that, lets take another
look at that strange OCaml type.
What is the cardinality of that type? Well, we can only construct a
'a t
if we provide an int
, so that means
.
But because 'a t
only has the Int
constructor,
any value of 'a t
must be a value of
string t
. But we established above that
.
And so we’ve done it: we’ve constructed a type (string t
)
where its cardinality
(
)
is less than its parameter’s
(
).
Even though I’ve spent a decent amount of time talking about it, the cardinality here isn’t what we actually care about. Rather, it’s simply a proxy for showing that most languages only let you define types that imply the existence of an associated value.
I somewhat violated my rule from above where I said that the parameter has to be used in some non-trivial way. In order to show how that’s possible, let’s build a new type:
type (_, _) eq = Eq : ('a, 'a) t
Again, you can be forgiven for not understanding what’s going on
here. What I’ve done is define a new type eq
with two
generic parameters. eq
has a single constructor
Eq
that takes no data but can only be constructed
over two equal types.
I implore you to use this link to try this out, because it’s still not immediately intuitive.
Before we get into why this is useful, we should first examine the
cardinality of this type. Lets use the trivial example of
(int, int) eq
, meaning “the type int
is
equivalent to the type int
”. As we’ve shown before, the
cardinality of an integer with
bits of precision is
.
We have two of them in our type, so the minimum “implied” cardinality is
.
But (int, int) eq
has a single constructor with no data, so
its cardinality is actually
.
Ok, so why is this useful? I’ll grant that proving that
int
is equivalent to int
is not useful, but
what if we could prove that some abstract type t
is
equivalent to int
? That would mean that we could cast any
value of t
to int
and be sure that we won’t
end up with any memory issues. You might think of this as a safer
version of Java’s instanceof
– it’s safer because
instanceof
relies on runtime checks (i.e. reflection),
whereas we can do this completely statically. In fact, the types of
OCaml values are completely erased after the code is compiled, meaning
we couldn’t use something like reflection even if we tried.
If you want to play around with this, try starting here.
We define an abstract type, PrivInt.t
, and expose a proof
that t
is equal to int
. This allows us to do
transparent and safe casting of PrivInt.one
to
int
.
Similar to how it was intuitive what a List<int>
is, with a little bit of reframing, the semantics of
('a, 'b) eq
can be equally clear. The key is to understand
that types can be used to describe other types themselves,
rather than only values. Even though ('a, 'b) eq
does tell
you something about values of type 'a
and 'b
,
it tells you more about the types themselves (i.e. they’re equal).
How does all of this interact with type safety? Remember earlier when
I said that the onus is on the programmer to prove that a value of type
Option<int>
can be mapped to a value of type
int
? The same is true here – I can parameterize my type
however I want, at the end of the day its still my burden to prove that
my foo t
is mappable to foo
.
One of the most prolific OCaml compiler maintainers, octachron, has a great StackOverflow post explaining when the above type might come in handy. In the first sentence of his post, he says:
The equality type [..] is probably the quintessential example of GADTs
Tada! I’ve taught you about GADTs without you even knowing it. The formal
definition of what a GADT is is pretty complicated, but I think the
easiest way to understand them is that they allow you to divorce types
from their value representations and establish constraints on those
types. A constraint of the eq
type is that both of
it’s parameters must be equal. And even though
(int, int) eq
references int
twice, we don’t
actually carry around any integers when we pass a value of it
around.
Programming this way can feel extremely clunky at first, but once you start getting the hang of it, GADTs become a powerful tool that you reach for more than you might initially expect.
The type/value distinction extends beyond GADTs, but they provide a simple example of how to realize it. If you want to learn more about GADTs or how you can more effectively use types, I suggest you write some OCaml 🙂.