Hello Internet. In this blog post we are going to learn some type theory, which
is a field that lies in the intersection of mathematics and computer science. We
are eventually going to learn the variant called Martin-Löf Type Theory, or MLTT for short.
To motivate it, we are going to try answering the question, “What is a valid
program?”
Not every program makes sense, for example what would a program consisting of
the single expression x + 2 mean? If we don’t know what x is, then we don’t
know what the meaning of that expression is. More importantly, a program can
combine expressions that are incompatible with each other. For example, if +
is the usual operation that adds two numbers then an expression like "abc" + 5
can’t be given a meaning. To avoid such meaningless programs, we should talk
about the different kinds of data and the valid operations on these pieces of
data. In Type Theory, we call the different kinds of data “types.” Type theory
gives each piece of data a type, and the type tells you how the expression is
constructed and how to write a meaningful program that processes that data. We
will be working at an informal level of detail; much more detail would be
required if, for example, we wanted to implement a programming language based on
type theory. Also, MLTT is quite complex at first glance, so we’ll build our way
there in small steps.
System T
System T has two types. The first one is natural numbers, whose type we’ll call
$\mathbb{N}$. The second one is functions $A → B$, which take a value of type
$A$ and return a value of type $B$. So an example function type will be
$\mathbb{N} → \mathbb{N}$, any value of this type will be a function that maps
natural numbers to natural numbers. Before we jump in, it’ll be useful to
introduce some notation. An obviously useful piece of notation will be $x : A$,
which means $x$ has the type $A$, for example we expect to prove that $5 :
\mathbb{N}$, meaning $5$ is a natural number. Occasionally, I will say 5 is
an element of the type of natural numbers, to mean $5 : \mathbb{N}$.
Let’s start defining the function type.
What is a function $A → B$? It is a machine that allows you to put in a value of type
$A$ and get out a value of type $B$. For example, the statement $(+1) :
\mathbb{N} → \mathbb{N}$ means that assuming you have a natural number $x :
\mathbb{N}$, you can get another one $(+1)(x) : \mathbb{N}$. To express this
idea more formally, we need a way of asserting that “Given certain assumptions,
some value has the type $A$”. We write this as $Γ \vdash x : A$, which
means given the assumptions $Γ$, the value $x$ has the type $A$. This assertion
is pronounced “$Γ$ entails that $x$ has the type $A$.” The
assumptions $Γ$ will be of the form $x_1 : A_1, x_2 : A_2, \cdots$, which
simply means that we are assuming that $x_1$ has the type $A_1$, $x_2$ has the
type $A_2$ and so on. In Type Theory, we call the rule that tells you how to use
a value of a given type the elimination rule. So for functions we get the
following:
Elimination Rule for Functions If $Γ \vdash f : A → B$, and $Γ
\vdash a : A$ then $Γ \vdash f(a) : B$. In other words, if certain
assumptions allow you to make a function then you can plug in a value of the
appropriate type and get a particular type of output. Technically speaking we
need one such axiom for every expression $f$, $A$ and $B$, and we need to write
down a grammar that defines all the valid expressions; I’ll leave that to you,
it’s not that difficult.
Okay, but how do you make a function? Well, let’s say you are making a
function with the type $\mathbb{N} → \mathbb{N}$, for example. You have to
create a program with a variable $x$, which will be a placeholder for the input
to the function. But, in order for your function to really have the type
$\mathbb{N} → \mathbb{N}$, you can’t just use any old expression involving
$x$, you have to make sure that whenever $x$ is a natural number then that
expression is also a natural number. So, in conclusion if you have an expression
$φ$ such that given the assumption $x : \mathbb{N}$, we have $φ :
\mathbb{N}$, then you can construct a function $\mathbb{N}$ → $\mathbb{N}$.
The notation for this function is $λ \, x.φ$ which is shorthand for
“The function which takes $x$ as input and returns $φ$ as output.” If we
generalise to arbitrary input and output types, we get this rule:
Introduction Rule for Functions If $Γ, x : A \vdash φ : B$, then
$Γ \vdash λ \, x.φ : A → B$.
Let me give an example of using the above two rules. Let’s assume that we have a
function that adds one to a natural number $(+1) : \mathbb{N} → \mathbb{N}$.
How do we define a function that adds two to a natural number? Well, $(+1) :
\mathbb{N} → \mathbb{N}, x : \mathbb{N} \vdash (+1)(x) : \mathbb{N}$ by the
elimination rule for functions. Applying the elimination rule again, we have
$(+1) : \mathbb{N} → \mathbb{N}, x : \mathbb{N} \vdash (+1)((+1)(x)) : \mathbb{N}$. The
introduction rule then allows us to form the function $λ \, x.
(+1)((+1)(x)) : \mathbb{N} → \mathbb{N}$, that takes in $x$ and adds one to it
twice. I’ll introduce some shorthand for this definition. Let us write $f(x) :≡ y$ to mean
that the function $f$ is defined to be equal to $λ x.y$, so for example we would write
$$\mathrm{plus\_two}(x) :≡ (+1)((+1)(x))$$
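To make this concrete, here is the same definition sketched in Haskell (my choice of language, not something System T prescribes); the names plusOne and plusTwo are mine, and I use Haskell’s built-in integers in place of $\mathbb{N}$:

    -- A sketch of plus_two, with Integer standing in for the natural numbers.
    plusOne :: Integer -> Integer
    plusOne x = x + 1

    -- The introduction rule in action: a lambda whose body has the right type.
    plusTwo :: Integer -> Integer
    plusTwo = \x -> plusOne (plusOne x)

Evaluating plusTwo 5 gives 7; the computation rule below says why.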
The elimination rule tells us that we use a function by applying it to input.
The natural question is, “What is the resulting value of the expression?” Since,
we hope, every function is made using the introduction rule, we just have to
specify what $(λ \, x.φ)(y)$ means when $y : A$. Intuitively, what
should happen is that we substitute $y$ for $x$ in $φ$, however defining this
is quite difficult. Let’s say that, after a lot of work you managed to define
integration, and a type of real numbers $\mathbb{R}$. Consider the function
$λ \, x.∫^2_1 x - y\, dy$. Assuming that $y : \mathbb{R}$, how do we
calculate $\left(λ \, x. ∫^2_1 \, x - y \, dy\right)(y)$? The wrong
thing to do would be to simply replace the symbol $x$ with the symbol $y$ and get
$∫^2_1 \, y - y \, dy$, because the two uses of $y$ are different, one is the
variable we are integrating with respect to and the other is the input to the
function. Making sure that different variables which have the same name don’t
get confused with each other, so that instead of the wrong answer we get
something like $∫ ^2_1 y - t\, dt$ is the primary difficulty with actually
defining substitution. For now, we’ll just sweep this detail under the rug. So
what we want to say is that the value of $(λ \, x.φ)(y)$ is
defined to be $φ$ but with $x$ substituted with $y$. This motivates the idea
of definitional equality. We say that two values of the same type, $a, b : A$
for example, are definitionally equal if $a$ is equal to $b$ by definition. This
means that whenever we have $a$ we can always swap it with $b$ and get the same
program. To give an example, given an appropriate definition of addition $2 + 3$
will be definitionally equal to $5$, but if we assume that $x, y : \mathbb{N}$
then $x + y$ won’t be definitionally equal to $y + x$. The definition of
addition allows one to simplify $2 + 3$ as being equal to $5$, but $x + y$
can’t be further simplified. Indeed, one actually has to prove that $x + y = y +
x$, it isn’t a simple matter of using the definition of addition to write $x +
y$ as $y + x$. The notation is $Γ \vdash a ≡_A b$, which means $a$ and
$b$ are definitionally equal to each other as elements of the type $A$. So,
without further ado:
Computation rule for functions If $Γ, x : A \vdash φ : B$, and $Γ
\vdash y : A$ then $Γ \vdash (λ \, x.φ)(y) ≡_B φ[x/y]$,
where $φ[x/y]$ just means “$φ$ with $x$ substituted with $y$.”
Let me give an example of the above rule. Given our good old $(+1)$ function, we
derived that $λ \, x. (+1)((+1)(x)) : \mathbb{N} → \mathbb{N}$, and so
by the computation rule for functions $(λ \, x. (+1)((+1)(x)))(5)
≡_\mathbb{N} (+1)((+1)(5))$. From now on I will be lazy and write $≡$
instead of $≡_B$ for example.
The last thing we want to say is that every function can be written uniquely
using $λ$. This results in the totally obvious:
Uniqueness principle for functions If $Γ \vdash f : A → B$, then
$Γ \vdash f ≡ (λ \, x . f(x))$.
You might have the question, “How do we talk about functions like $+$,
which take two arguments?” The answer is quite elegant actually, a
function that takes two arguments, one of type $A$ and the other of
type $B$ and returns something of type $C$, is the same thing as
a function with the type $A → (B → C)$. Such a function $f : A → (B → C)$,
takes the first argument as input, and returns a function that takes the
second argument and outputs the result. For example, we will soon be able
to define $+ : \mathbb{N} → (\mathbb{N} → \mathbb{N})$, such that
$+(3) : \mathbb{N} → \mathbb{N}$ is the function that adds three to its input. In
a similar way a function with three inputs will have a type like $A → (B → (C → D))$,
which is quite cumbersome to write. So I’ll be lazy and write $f : A → B → C → D$,
and while I’m at it I’ll be lazier and write $f(a, b, c)$ instead of $f(a)(b)(c)$, for
example. The idea of defining multi argument functions like this is called currying.
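Here is a quick Haskell sketch of currying (again my own illustration; the names plus and addThree are hypothetical):

    -- The type Integer -> Integer -> Integer really means
    -- Integer -> (Integer -> Integer): the arrow associates to the right.
    plus :: Integer -> Integer -> Integer
    plus x y = x + y

    -- Partially applying plus gives a one-argument function,
    -- just like +(3) in the text.
    addThree :: Integer -> Integer
    addThree = plus 3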
Let’s move on from functions and think about how to define the natural numbers.
The pattern will be mostly the same, we will give introduction, elimination and
computation rules.
Introduction Rule for Natural Numbers We always have $Γ \vdash 0 : \mathbb{N}$,
i.e. zero is a natural number. Further, if $Γ \vdash n : \mathbb{N}$ then
$Γ \vdash \mathrm{succ}(n) : \mathbb{N}$. The function $\mathrm{succ}$ allows
you to construct new natural numbers from old ones, by adding one to them. So, given
no assumptions $0 : \mathbb{N}$, and so $\mathrm{succ}(0) : \mathbb{N}$ as well. The
name humans give to $\mathrm{succ}(0)$ is $1$, and similarly $\mathrm{succ}(\mathrm{succ}(0)) : \mathbb{N}$
which gives us the number $2$, and so on. We call $0$ and $\mathrm{succ}$ the constructors
for $\mathbb{N}$; a constructor is a fundamental way of making an element of a type,
and these constructors are part of the definition of the type.
Elimination rule for Natural Numbers We want to make a function that takes a
natural number as input. Let’s think about one of the simplest such functions,
the factorial. The factorial is defined like this, assuming a proper definition of multiplication:
$latex \begin{aligned}
0! &:≡ 1 \\
\mathrm{succ}(n)! &:≡ \mathrm{succ}(n) × n!
\end{aligned}$
At first glance, it does seem kind of circular to define any function in terms of itself,
like what the second equality tries to do. However, this definition makes sense, and
always produces a natural number $n! : \mathbb{N}$ for any $n : \mathbb{N}$; here is
some intuition about how this works. Say we want to simplify $4!$; we can use the definition
like this:
$$4! ≡ \mathrm{succ}(3)! ≡ 4 × 3! ≡ 4 × 3 × 2! ≡ \cdots ≡ 4 × 3 × 2 × 1 × 0! ≡ 4 × 3 × 2 × 1 × 1$$
In the above calculation we use the definition $4 ≡ \mathrm{succ}(3)$,
and apply the second equation above. We repeat this process until we
get to $0$, after which we may apply the first equation. So the reason such a
“circular” definition works is that we reduce the number step by step from
$\mathrm{succ}(n)$ to $n$ until we get to $0$, and we know that this will
eventually reach $0$ since $0$ and $\mathrm{succ}$ are the only ways of making
a natural number given in the introduction rule. Computer scientists may recognise
this idea as recursion. So, we should define functions $f : \mathbb{N} → A$ by their
values $f(0) : A$ and $f(\mathrm{succ}(n)) : A$, and we should be able to define $f(\mathrm{succ}(n))$
in terms of $n : \mathbb{N}$ and $f(n) : A$. This elimination rule is kind of
intuitive given the introduction rules above. Now, some definitions may look valid but may run forever,
for example if we define:
$latex \begin{aligned}
f(0) &:≡ 0 \\
f(\mathrm{succ}(n)) &:≡ f(\mathrm{succ}(n))
\end{aligned}$
Then we are simply defining the value of the function as itself, which doesn’t
help one to compute the value of the function. Let’s say I try to work out
$f(4)$; well, the second rule allows me to write this as $f(4)$, and I can
apply the second rule again and get $f(4)$, and so on, we’re just going in circles. To
prevent this, the elimination rule is carefully designed to make sure that when you
define $f(\mathrm{succ}(n))$ you may only use the values of $n$ and $f(n)$, and
in particular you can’t use, say, $f(n+1)$. So here is the elimination rule:
Given the following data
$e_0 : A$
$e_\mathrm{succ} : \mathbb{N} → A → A$
We have:
$rec_\mathbb{N}(e_0, e_\mathrm{succ}) : \mathbb{N} → A$
Here I’m being lazy again, and I’m leaving out the assumptions $Γ$, since
these rules apply under whatever assumptions you have. The first piece
of data you need to make a function $f : \mathbb{N} → A$ is the value of $f(0)$; this
is given by $e_0$. The next piece of data you need is the value of
$f(\mathrm{succ}(n))$ given $n$ and the value of $f(n)$; this is provided by
the function $e_\mathrm{succ}$, which takes as input $n$ and the value of $f(n)$,
and returns the value of $f(\mathrm{succ}(n))$.
Phrasing the elimination rule this way makes sure that we can only
define sensible functions by recursion. We call $rec_\mathbb{N}$ the recursor
for the type $\mathbb{N}$ of natural numbers.
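If it helps, here is a minimal Haskell sketch of the natural numbers and their recursor (the names Nat and recNat are mine):

    -- The two constructors from the introduction rule.
    data Nat = Zero | Succ Nat

    -- The recursor: e0 is the value at Zero; eSucc receives the
    -- predecessor n together with the result for n, and produces
    -- the result for Succ n.
    recNat :: a -> (Nat -> a -> a) -> Nat -> a
    recNat e0 eSucc Zero     = e0
    recNat e0 eSucc (Succ n) = eSucc n (recNat e0 eSucc n)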
Let us do an example. We will define $+ : \mathbb{N} → \mathbb{N} →
\mathbb{N}$: We will have to write $+(x) :≡ \cdots$ where $\cdots$ is a
function $\mathbb{N} → \mathbb{N}$, whose definition must depend on $x$. We
will define this function by recursion, since we want $+(x, 0) :≡ x$, and
$+(x, \mathrm{succ}(y)) :≡ \mathrm{succ}(+(x, y))$, so here we go:
$$+(x) :≡ rec_\mathbb{N}(x, λ \, y. λ \, r. \mathrm{succ}(r))$$
Now strictly speaking, we have to write the output type of the recursor, but it turns out
that there is an automatic algorithm for figuring out output types, which even works in languages
much more general than System T. So I’ll usually leave them out.
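Continuing the Haskell sketch from above, the same definition of addition reads:

    -- Addition via the recursor: the base case returns x, and the
    -- step case ignores the predecessor and wraps the recursive
    -- result in Succ.
    plus :: Nat -> Nat -> Nat
    plus x = recNat x (\_ r -> Succ r)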
The only thing we need to make sure is that we really have $+(x, 0) ≡ x$
and $+(x, \mathrm{succ}(y)) ≡ \mathrm{succ}(+(x, y))$, and for this we need a
computation rule.
Computation rule for Natural Numbers Given $e_0 : A$, $e_\mathrm{succ} : \mathbb{N} → A → A$
we have:
$latex \begin{aligned}
rec_\mathbb{N}(e_0, e_\mathrm{succ})(0) &≡ e_0 \\
rec_\mathbb{N}(e_0, e_\mathrm{succ})(\mathrm{succ}(n)) &≡ e_\mathrm{succ}(n, rec_\mathbb{N}(e_0, e_\mathrm{succ})(n))
\end{aligned}$
The equalities for $+$ follow as a special case of this. Now it’s your turn:
Exercise Write the definitions for addition, multiplication and exponentiation
using $rec_\mathbb{N}$. After that, just so you can see how
powerful $rec_\mathbb{N}$ can be when combined with other constructs we’ve
been talking about, define the Ackermann function, $ack : \mathbb{N} → \mathbb{N}
→ \mathbb{N}$, which satisfies the following equalities:
$latex \begin{aligned}
ack(0, n) &≡ \mathrm{succ}(n) \\
ack(\mathrm{succ}(m), 0) &≡ ack(m, 1) \\
ack(\mathrm{succ}(m), \mathrm{succ}(n)) &≡ ack(m, ack(\mathrm{succ}(m), n))
\end{aligned}$
One of the things we would want to add to System T is pairs, so that we
can talk about multiple pieces of data at once. For this we define the product
type $A × B$ of pairs $(a, b)$ where $a : A$ and $b : B$, the rules are
intuitive enough that I would encourage you to figure them out yourself before
reading on.
Introduction Rule Given $a : A$ and $b : B$, we have $(a, b) : A × B$
Elimination Rule Given $f : A → B → X$, we have $rec_{A × B}(f) : (A × B) → X$
Computation Rule Given $f : A → B → X$, $a : A$, and $b : B$ we have:
$$rec_{A × B}(f)((a, b)) ≡ f(a, b)$$
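In the Haskell sketch, the recursor for pairs is the standard function usually called uncurry; writing it out (recPair is my name for it):

    -- To use a pair, it is enough to say how to use its two components.
    recPair :: (a -> b -> x) -> (a, b) -> x
    recPair f (a, b) = f a b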
Finally, often in programming we have to talk about either
having one kind of data or a different kind. For example, we might want to say
“When I look up a value in a table, I either get a result or I get nothing.” For
this we will introduce the disjoint union $A + B$ which either contains a value
of type $A$ or a value of type $B$.
Introduction Rule Given $a : A$, we have $\mathrm{inl}(a) : A + B$, and similarly
given $b : B$ we have $\mathrm{inr}(b) : A + B$.
Elimination Rule Given $e_\mathrm{inl} : A → X$ and $e_\mathrm{inr} : B → X$
we have $rec_{A + B}(e_\mathrm{inl}, e_\mathrm{inr}) : A + B → X$, or in other
words if you can get an $X$ from either an $A$ or a $B$ then you can get an $X$ from
a value with the type $A + B$.
Computation Rule Given $e_\mathrm{inl} : A → X$, $e_\mathrm{inr} : B → X$,
we have:
Assuming $a : A$, we have $rec_{A + B}(e_\mathrm{inl}, e_\mathrm{inr})(\mathrm{inl}(a)) ≡ e_\mathrm{inl}(a)$.
Assuming $b : B$, we have $rec_{A + B}(e_\mathrm{inl}, e_\mathrm{inr})(\mathrm{inr}(b)) ≡ e_\mathrm{inr}(b)$.
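In Haskell the disjoint union is the standard Either type, with Left and Right playing the roles of inl and inr, and the recursor is the standard library function called either; a sketch (recSum is my name):

    -- Case analysis on a disjoint union: one function per constructor.
    recSum :: (a -> x) -> (b -> x) -> Either a b -> x
    recSum eInl _    (Left a)  = eInl a
    recSum _    eInr (Right b) = eInr b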
Now, we’ll introduce some more abbreviations. It is quite cumbersome to write
out recursors like $rec_{A + B}$ all the time, so I’ll use what’s called
pattern matching notation. Here is an example, we can define $+$ using pattern
matching like this:
$latex \begin{aligned}
x + 0 &:≡ x \\
x + \mathrm{succ}(n) &:≡ \mathrm{succ}(x + n)
\end{aligned}$
So first of all I wrote $x + 0$, for example, instead of $+(x, 0)$; also I just
wrote the definitional equalities I want satisfied instead of defining the
function with the recursor. Of course, if one is too sloppy with pattern
matching notation one can get functions that aren’t definable with the recursor,
but once you’ve defined a few functions with the recursor you get a lot of
intuition about how to translate between pattern matching and recursors. In fact,
many programming languages based on type theory can automatically translate between
recursors and pattern matching.
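For instance, in the Haskell sketch from earlier, addition by pattern matching reads almost verbatim like the equations above (plusPM is my name):

    -- Addition by pattern matching instead of via recNat;
    -- the two equations mirror the two definitional equalities.
    plusPM :: Nat -> Nat -> Nat
    plusPM x Zero     = x
    plusPM x (Succ n) = Succ (plusPM x n)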
Logic
Now you may be wondering, “How does this have anything to do
with mathematics?” Well, after very little more we can already do logic.
The main idea is that you can define any statement by describing what
it would take to prove that statement. For example, you prove the statement
“$A$ and $B$ are true,” by giving two proofs: you first prove $A$
and then you prove $B$. This constitutes a definition of the
statement “$A$ and $B$ are true.” So we will model propositions as types,
where a proposition is viewed as the same thing as the type of its proofs.
So let’s say you have a proposition $A$, and another one $B$. We will
call the type whose elements are proofs of $A$ by the same name,
and we will do the same thing with $B$; since a statement’s meaning
is given by the data one needs to prove it, there is no need to distinguish
between the two.
So what is a proof of “$A$ and $B$”? Well, it is just a pair of proofs,
one for $A$ and another for $B$, which is just an element of the product
type $A × B$. Similarly, what is a proof of “$A$ or $B$”? It’s just
either a proof of $A$, or a proof of $B$ or in other terms an element
of the disjoint union $A + B$. In the same vein, the statement “$A$ implies
$B$,” is proven by giving a process that transforms proofs of $A$ into
proofs of $B$, in other words a function $A → B$. The only thing we are missing to do
propositional logic is the propositions true ($\top$) and false ($\bot$). Let’s define them
using propositions as types, starting with true.
How do you prove the proposition “true”? Well, the proposition “true”
is trivially true, and in particular you don’t need to do anything to
prove it true. So the corresponding type, which we call $\mathbf{1}$,
has a constructor that is trivial in the sense that it doesn’t require
any arguments. Let’s define $\mathbf{1}$, but again I would advise
you to come up with its introduction, elimination and computation rules
yourself.
Introduction Rule We always have $* : \mathbf{1}$.
Elimination Rule Given $e_* : X$, we have a function $rec_\mathbf{1}(e_*) : \mathbf{1} → X$.
Computation Rule Assuming $e_* : X$, we have $rec_\mathbf{1}(e_*)(*) ≡ e_*$.
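In Haskell, $\mathbf{1}$ is the unit type (), whose only value is also written (); a sketch of its recursor (recUnit is my name):

    -- One case, for the one constructor.
    recUnit :: x -> () -> x
    recUnit e () = e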
Okay, how about “false.” How do you prove “false”? The short answer is that you
can’t, “false” is not trivially provable. If our logic is inconsistent, then
maybe you can make a proof of false but it won’t be a simple matter of invoking
one of the constructors. However, one can prove in the standard foundation of
mathematics that this logic is consistent, and so it’s just as reliable as any
result in mathematics. Now on to the (very simple) definition of the type
$\mathbf{0}$ which corresponds to the proposition “false.”
There is no introduction rule, since there shouldn’t be a way of proving false.
Elimination Rule Many people find this elimination rule tricky, so let me
phrase it this way. The elimination rules for $A + B$ and $\mathbb{N}$ have two
cases, since these types have two constructors. The elimination rule for, say,
$\mathbf{1}$ has one case since it has one constructor. Since $\mathbf{0}$ has
no constructors, its elimination rule should have no cases; we can always just
claim $rec_\mathbf{0} : \mathbf{0} → X$ since there are no constructors to deal with.
Logicians call this principle ex falso quodlibet, which means “from the false
anything follows,” if you assume that false things are true (or equivalently
that $\mathbf{0}$ has an element), then you can prove whatever you want from
that.
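In Haskell, $\mathbf{0}$ exists as the type Void from Data.Void, and its recursor is the library function absurd; a sketch of ex falso quodlibet:

    import Data.Void (Void, absurd)

    -- Given a (non-existent) element of Void we can produce anything;
    -- absurd :: Void -> a plays the role of the recursor for 0.
    exFalso :: Void -> Integer
    exFalso v = absurd v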
The last logical notion we need to define is the proposition “not A”. We
will simply define this as the type $A → \mathbf{0}$, since the statement
“A is not true” means the same thing as “A implies a falsehood.” Thinking
in terms of proofs, you can prove the statement “not A” by assuming $A$ to
be true and deriving something that can’t be true. However, such a proof
is just a function $A → \mathbf{0}$ which takes a proof of $A$ as input
and returns a proof of $\mathbf{0}$, a false statement, as output.
Let’s prove some logical theorems by writing them out as computer programs
in System T. Let’s first try to prove $(A → \mathbf{0}) + (B → \mathbf{0})
→ (A × B) → \mathbf{0}$, or in English “not A or not B implies
not (A and B).” I’ll write the proof using pattern matching notation; it’s
quite a good exercise to translate this into recursors. So,
there are two possible cases for the first input:
$latex \begin{aligned}
p &: (A → \mathbf{0}) + (B → \mathbf{0}) → (A × B) → \mathbf{0} \\
p(\mathrm{inl}(notA), \cdots) &:≡ \cdots \\
p(\mathrm{inr}(notB), \cdots) &:≡ \cdots
\end{aligned}$
After pattern matching on the pair, assuming it is of the form
$(a, b)$ we get this:
$latex \begin{aligned}
p &: (A → \mathbf{0}) + (B → \mathbf{0}) → (A × B) → \mathbf{0} \\
p(\mathrm{inl}(notA), (a, b)) &:≡ \cdots \\
p(\mathrm{inr}(notB), (a, b)) &:≡ \cdots
\end{aligned}$
Now each of the $\cdots$ has to have the type $\mathbf{0}$; seeing
as we can’t explicitly construct an element of $\mathbf{0}$, we have to use some
of our input data somehow. In the first case, we have
$a : A$ and $notA : A → \mathbf{0}$, so $notA(a) : \mathbf{0}$,
and a similar thing works for the second case. Here is the finished proof:
$latex \begin{aligned}
p &: (A → \mathbf{0}) + (B → \mathbf{0}) → (A × B) → \mathbf{0} \\
p(\mathrm{inl}(notA), (a, b)) &:≡ notA(a) \\
p(\mathrm{inr}(notB), (a, b)) &:≡ notB(b)
\end{aligned}$
Indeed the proof reads almost naturally, once you get used to type theory. The
two lines of the type theoretic proof, are exactly the two cases of an informal
proof, so you could read the proof like this: “Suppose (not A) or (not B),
then there are two cases. Firstly, we may have (not A) in which case if
we also have A and B, then we have A and so (not A) and A is true which is
a contradiction. Secondly, we may have (not B), and assuming (A and B) we have
B, and in particular (not B) and B is true, which is a contradiction.” This is
one of the things that makes type theory an elegant foundation of mathematics,
whereas traditionally you would have data encoded as sets and you would reason
about that data separately using logic, both of these mathematical activities
type theoretically speaking are just programming, and the idea of a type unifies
them both.
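As a last illustration, here is the whole proof as a compiling Haskell program, writing Not a for $A → \mathbf{0}$ (the synonym Not and the name p are my own):

    import Data.Void (Void)

    -- “not A” is a function from A to the empty type.
    type Not a = a -> Void

    -- Not A or not B implies not (A and B); the two equations
    -- are exactly the two cases of the informal proof.
    p :: Either (Not a) (Not b) -> (a, b) -> Void
    p (Left notA)  (a, _) = notA a
    p (Right notB) (_, b) = notB b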
Anyways, that’s quite enough for today. Next time, we’ll generalise
type theory by making every type a value, i.e. an element of another type. This
will allow us to use propositions as types to reason about our programs, so that
we can prove that, for example a certain algorithm sorts a list, or more simply
that addition is commutative. We will introduce new types which exploit this,
which will lead to a logic with quantifiers and equality. But for now, here are some
exercises.
Exercises
Prove the following statements using propositions as types:
If $A$, then (if $B$ then $A$).
If $A$, then (not (not $A$)).
If $A$ or $B$, and $A$ implies $C$, and $B$ implies $C$, then $C$.
Let’s say that instead of the recursor $rec_\mathbb{N}$, I gave
you the following function, which basically is the recursor but it
doesn’t keep track of the current number while recursing:
$latex \begin{aligned}
iter &: A → (A → A) → \mathbb{N} → A \\
iter(c_0, c_\mathrm{succ}, 0) &:≡ c_0 \\
iter(c_0, c_\mathrm{succ}, \mathrm{succ}(n)) &:≡ c_\mathrm{succ}(iter(c_0, c_\mathrm{succ}, n))
\end{aligned}$
Can you define $rec_\mathbb{N}$ such that it satisfies the definitional equalities
we listed before?