6. 结构体(Structures)
现代数学广泛使用了代数结构,这些代数结构封装了可在不同环境中实例化的模式,而且往往有多种方法来对它们进行定义和实例化。
因此,Lean 提供了相应的方法来形式化定义这些结构并对其进行操作。
此前你已经接触过一些代数结构的示例,比如在
Chapter 2
中的环(rings)和格(lattices)。
本章将解释之前出现过的方括号语法,比如 [Ring α]
、 [Lattice α]
,并介绍如何创建和使用自定义的代数结构。
如需了解更多技术细节,可以参考 Theorem Proving in Lean, 以及 Anne Baanen 的这篇论文 Use and abuse of instance parameters in the Lean mathematical library。
6.1. 定义结构体
广义上来说,结构体是对特定形式数据集合的约定,包括包含数据的形式以及这些数据要满足的一些约束条件。 而结构体的实例则是某一组满足约束的具体数据。例如,我们可以规定一个点是由三个实数组成的三元组:
@[ext]
structure Point where
x : ℝ
y : ℝ
z : ℝ
上面用到的 @[ext]
注解让 Lean 自动生成一个定理,内容是当该结构体的两个实例各组成部分对应相同时,这两个实例相等。该属性也称为外延性(extensionality)。
#check Point.ext
example (a b : Point) (hx : a.x = b.x) (hy : a.y = b.y) (hz : a.z = b.z) : a = b := by
ext
repeat' assumption
接着我们定义一个 Point
结构体的实例。
Lean 提供多种实例化方式来达成目的。
def myPoint1 : Point where
x := 2
y := -1
z := 4
def myPoint2 : Point :=
⟨2, -1, 4⟩
def myPoint3 :=
Point.mk 2 (-1) 4
第一个例子中,我们明确地指明了结构体的各个字段。
在定义 myPoint3
时用到的函数 Point.mk
叫做 Point
结构体的构造函数(constructor),用于构造结构体成员。
你也可以为构造函数指定一个不同的名字,比如 build
。
structure Point' where build ::
x : ℝ
y : ℝ
z : ℝ
#check Point'.build 2 (-1) 4
接下来的两个例子展示了如何定义结构体的函数。
第二个例子中明确指出了构造函数 Point.mk
中的字段名,而第一个例子则使用更为简洁的匿名构造函数。
Lean 能根据 add
的目标类型推断出所需的构造函数。
通常,我们会将与结构体(例如这里的 Point
)相关的定义和定理放在一个同名的命名空间(namespace)中。
在下面的示例中,由于我们启用了 Point
命名空间,所以 add
的完整名称实际是 Point.add
。
当命名空间没被启用时,就得使用完整名称。
不过这时也可以使用匿名投影记号(anonymous projection notation),它允许我们用 a.add b
代替 Point.add a b
。
因为 a 的类型是 Point
,Lean 能在没有开启对应命名空间的时候将 a.add b
推断为 Point.add a b
。
namespace Point
def add (a b : Point) : Point :=
⟨a.x + b.x, a.y + b.y, a.z + b.z⟩
def add' (a b : Point) : Point where
x := a.x + b.x
y := a.y + b.y
z := a.z + b.z
#check add myPoint1 myPoint2
#check myPoint1.add myPoint2
end Point
#check Point.add myPoint1 myPoint2
#check myPoint1.add myPoint2
接下来我们继续在相关命名空间中添加定义,但在引用的代码片段中会省略开关命名空间相关的指令。
在证明加法函数性质时,可使用 rw
来展开定义,并用 ext
将结构体两个实例之间的等号转化为它们各个组成部分之间的等号。
下面的代码使用 protected
关键字,使得在命名空间打开的情况下定理的名字依然是 Point.add_comm
。
当我们希望避免与泛型定理如 add_comm
产生歧义时,这样做是有帮助的。
protected theorem add_comm (a b : Point) : add a b = add b a := by
rw [add, add]
ext <;> dsimp
repeat' apply add_comm
example (a b : Point) : add a b = add b a := by simp [add, add_comm]
因为 Lean 能在内部自动展开定义并简化投影,所以有时我们需要的等式在定义上就自动成立。
theorem add_x (a b : Point) : (a.add b).x = a.x + b.x :=
rfl
与在
Section 5.2
中定义递归函数类似,我们定义函数时也可以使用模式匹配。
下面定义的 addAlt
和 addAlt'
本质上是相同的,形式上的的区别在于我们在后者的定义中使用了匿名构造函数。
虽然以此方式定义函数有时会显得更加简洁,并且结构化 η-规约(structural eta-reduction)也确保了这种简写在定义上的等价性,但这么做可能会为后续的证明带来不便。
def addAlt : Point → Point → Point
| Point.mk x₁ y₁ z₁, Point.mk x₂ y₂ z₂ => ⟨x₁ + x₂, y₁ + y₂, z₁ + z₂⟩
def addAlt' : Point → Point → Point
| ⟨x₁, y₁, z₁⟩, ⟨x₂, y₂, z₂⟩ => ⟨x₁ + x₂, y₁ + y₂, z₁ + z₂⟩
theorem addAlt_x (a b : Point) : (a.addAlt b).x = a.x + b.x := by
rfl
theorem addAlt_comm (a b : Point) : addAlt a b = addAlt b a := by
rw [addAlt, addAlt]
-- the same proof still works, but the goal view here is harder to read
ext <;> dsimp
repeat' apply add_comm
数学构造通常涉及将捆绑的信息拆分开来,并以不同的方式重新组合。
这也是 Lean 和 Mathlib 提供如此多方法来高效处理这些操作的原因。
作为练习,请尝试证明 Point.add
是满足结合律的。
然后请定义点的标量乘法并证明其满足对加法的分配律。
protected theorem add_assoc (a b c : Point) : (a.add b).add c = a.add (b.add c) := by
sorry
def smul (r : ℝ) (a : Point) : Point :=
sorry
theorem smul_distrib (r : ℝ) (a b : Point) :
(smul r a).add (smul r b) = smul r (a.add b) := by
sorry
使用结构体是走向代数抽象的第一步。
我们目前还没有方法将 Point.add
与泛型的 +
符号联系起来,
抑或是将 Point.add_comm
、Point.add_assoc
与泛型的 add_comm
、add_assoc
定理联系起来。
这些任务属于结构体在代数层面的应用,我们将在下一节介绍如何具体实现。
对现在来说,我们只需要把结构体视作一种捆绑对象和信息的方法。
在结构体中我们不仅可以指定数据类型,还可以指定数据需要满足的约束。
在 Lean 中,后者被表示为类型 Prop
的字段。
例如,标准2-单纯形(standard 2-simplex)定义为满足 \(x ≥ 0\),\(y ≥ 0\),\(z ≥ 0\),\(x + y + z = 1\) 的点 \((x, y, z)\) 的集合。
如果你不熟悉这个概念,可以画个图,其实这个集合就是三维空间中以 \((1, 0, 0)\),\((0, 1, 0)\),和 \((0, 0, 1)\) 为顶点的等边三角形以及其内部。
在 Lean 中可以这样形式化表示:
structure StandardTwoSimplex where
x : ℝ
y : ℝ
z : ℝ
x_nonneg : 0 ≤ x
y_nonneg : 0 ≤ y
z_nonneg : 0 ≤ z
sum_eq : x + y + z = 1
请注意,最后四个字段用到的 x
,y
,z
指的就是前三个字段。
我们可以定义一个从2-单纯形到其自身的映射,该映射交换 x
和 y
:
def swapXy (a : StandardTwoSimplex) : StandardTwoSimplex
where
x := a.y
y := a.x
z := a.z
x_nonneg := a.y_nonneg
y_nonneg := a.x_nonneg
z_nonneg := a.z_nonneg
sum_eq := by rw [add_comm a.y a.x, a.sum_eq]
有趣的来了,我们可以计算单纯形内两个点的中点。
为此需要先在文件开头加上 noncomputable section
语句以便使用实数上的除法。
noncomputable section
def midpoint (a b : StandardTwoSimplex) : StandardTwoSimplex
where
x := (a.x + b.x) / 2
y := (a.y + b.y) / 2
z := (a.z + b.z) / 2
x_nonneg := div_nonneg (add_nonneg a.x_nonneg b.x_nonneg) (by norm_num)
y_nonneg := div_nonneg (add_nonneg a.y_nonneg b.y_nonneg) (by norm_num)
z_nonneg := div_nonneg (add_nonneg a.z_nonneg b.z_nonneg) (by norm_num)
sum_eq := by field_simp; linarith [a.sum_eq, b.sum_eq]
上面的代码中,对于 x_nonneg
,y_nonneg
,z_nonneg
,我们用简洁的证明项即可建立。
而对于 sum_eq
,我们选择使用 by
在策略模式下进行证明。
给定参数 \(\lambda\) 满足 \(0 \le \lambda \le 1\),
可定义标准2-单纯形中 \(a\) 和 \(b\) 两点的加权平均为 \(\lambda a + (1 - \lambda) b\)。
请尝试参考前面的 midpoint
形式化定义该加权平均。
def weightedAverage (lambda : Real) (lambda_nonneg : 0 ≤ lambda) (lambda_le : lambda ≤ 1)
(a b : StandardTwoSimplex) : StandardTwoSimplex :=
sorry
结构体还可以依赖于参数。
例如,可以将标准2-单纯形推广到任意维数 \(n\) 下的 \(n\) -单纯形。
目前阶段,你不需要对 Fin n
了解太多,只需知道它有 \(n\) 个元素,并且 Lean 直到如何在其上进行就和操作即可。
open BigOperators
structure StandardSimplex (n : ℕ) where
V : Fin n → ℝ
NonNeg : ∀ i : Fin n, 0 ≤ V i
sum_eq_one : (∑ i, V i) = 1
namespace StandardSimplex
def midpoint (n : ℕ) (a b : StandardSimplex n) : StandardSimplex n
where
V i := (a.V i + b.V i) / 2
NonNeg := by
intro i
apply div_nonneg
· linarith [a.NonNeg i, b.NonNeg i]
norm_num
sum_eq_one := by
simp [div_eq_mul_inv, ← Finset.sum_mul, Finset.sum_add_distrib,
a.sum_eq_one, b.sum_eq_one]
field_simp
end StandardSimplex
作为额外练习,请尝试定义一下标准 :math`n` -单纯形中两点的加权平均。
可以使用 Finset.sum_add_distrib
和 Finset.mul_sum
来实现相关的求和操作。
前面我们已经了解到结构体可以用来将数据和属性打包在一起。
有趣的是,结构体也可以用来打包脱离具体数据的抽象属性。
例如,下面的结构体 IsLinear
,将线性性(linearity)的两个主要组成部分打包在了一起。
structure IsLinear (f : ℝ → ℝ) where
is_additive : ∀ x y, f (x + y) = f x + f y
preserves_mul : ∀ x c, f (c * x) = c * f x
section
variable (f : ℝ → ℝ) (linf : IsLinear f)
#check linf.is_additive
#check linf.preserves_mul
end
值得一提的是,结构体并不是打包数据的唯一方法。
前面提到的 Point
数据结构也可以用泛型乘积类型来定义,而 IsLinear
可以使用简单的 and
来定义。
def Point'' :=
ℝ × ℝ × ℝ
def IsLinear' (f : ℝ → ℝ) :=
(∀ x y, f (x + y) = f x + f y) ∧ ∀ x c, f (c * x) = c * f x
泛型类型构造甚至可以替代包含依赖关系的结构体。
例如,子类型(subtype)构造结合了一段数据以及一条属性。
你可以将下面例子中的 PReal
看作正实数类型。
每个 x : PReal
都有两个分量:它的值,以及“它是正的”这一属性。
你可以分别用 x.val
与 x.property
来访问这两个分量,其中前者的类型为 ℝ
,后者是一个属性 0 < x.val
。
def PReal :=
{ y : ℝ // 0 < y }
section
variable (x : PReal)
#check x.val
#check x.property
#check x.1
#check x.2
end
我们也可以用子类型来定义前面的标准2-单纯形,以及任意维数 \(n\) 下的标准 \(n\) -单纯形。
def StandardTwoSimplex' :=
{ p : ℝ × ℝ × ℝ // 0 ≤ p.1 ∧ 0 ≤ p.2.1 ∧ 0 ≤ p.2.2 ∧ p.1 + p.2.1 + p.2.2 = 1 }
def StandardSimplex' (n : ℕ) :=
{ v : Fin n → ℝ // (∀ i : Fin n, 0 ≤ v i) ∧ (∑ i, v i) = 1 }
类似地,Sigma 类型是有序对的推广,其中第二个分量的类型依赖于第一个分量。
def StdSimplex := Σ n : ℕ, StandardSimplex n
section
variable (s : StdSimplex)
#check s.fst
#check s.snd
#check s.1
#check s.2
end
给定 s : StdSimplex
,它的第一个分量 s.fst
是个自然数,第二个分量是对应维数下标准单纯形中的一个元素 StandardSimplex s.fst
。
Sigma 类型与子类型的区别在于 Sigma 类型的第二个分量是数据而非属性。
尽管可以使用乘积类型、子类型、 Sigma 类型等替代结构体,但使用结构体其实有许多好处。 定义结构体可以抽象出底层的表达,且能为成员提供自定义的名称以供访问。 这使得证明更加健壮: 对于那些仅依赖结构体接口的证明,通常只需用新接口替换旧接口,就能让证明代码在结构体定义变更后依然有效。 此外,正如我们即将看到的,Lean 支持将各种结构体编织成一个丰富的、相互关联的层次体系,以管理它们之间的交互关系。
6.2. 代数结构
To clarify what we mean by the phrase algebraic structure, it will help to consider some examples.
A partially ordered set consists of a set \(P\) and a binary relation \(\le\) on \(P\) that is transitive and reflexive.
A group consists of a set \(G\) with an associative binary operation, an identity element \(1\), and a function \(g \mapsto g^{-1}\) that returns an inverse for each \(g\) in \(G\). A group is abelian or commutative if the operation is commutative.
A lattice is a partially ordered set with meets and joins.
A ring consists of an (additively written) abelian group \((R, +, 0, x \mapsto -x)\) together with an associative multiplication operation \(\cdot\) and an identity \(1\), such that multiplication distributes over addition. A ring is commutative if the multiplication is commutative.
An ordered ring \((R, +, 0, -, \cdot, 1, \le)\) consists of a ring together with a partial order on its elements, such that \(a \le b\) implies \(a + c \le b + c\) for every \(a\), \(b\), and \(c\) in \(R\), and \(0 \le a\) and \(0 \le b\) implies \(0 \le a b\) for every \(a\) and \(b\) in \(R\).
A metric space consists of a set \(X\) and a function \(d : X \times X \to \mathbb{R}\) such that the following hold:
\(d(x, y) \ge 0\) for every \(x\) and \(y\) in \(X\).
\(d(x, y) = 0\) if and only if \(x = y\).
\(d(x, y) = d(y, x)\) for every \(x\) and \(y\) in \(X\).
\(d(x, z) \le d(x, y) + d(y, z)\) for every \(x\), \(y\), and \(z\) in \(X\).
A topological space consists of a set \(X\) and a collection \(\mathcal T\) of subsets of \(X\), called the open subsets of \(X\), such that the following hold:
The empty set and \(X\) are open.
The intersection of two open sets is open.
An arbitrary union of open sets is open.
In each of these examples, the elements of the structure belong to a set, the carrier set, that sometimes stands proxy for the entire structure. For example, when we say “let \(G\) be a group” and then “let \(g \in G\),” we are using \(G\) to stand for both the structure and its carrier. Not every algebraic structure is associated with a single carrier set in this way. For example, a bipartite graph involves a relation between two sets, as does a Galois connection, A category also involves two sets of interest, commonly called the objects and the morphisms.
The examples indicate some of the things that a proof assistant has to do in order to support algebraic reasoning. First, it needs to recognize concrete instances of structures. The number systems \(\mathbb{Z}\), \(\mathbb{Q}\), and \(\mathbb{R}\) are all ordered rings, and we should be able to apply a generic theorem about ordered rings in any of these instances. Sometimes a concrete set may be an instance of a structure in more than one way. For example, in addition to the usual topology on \(\mathbb{R}\), which forms the basis for real analysis, we can also consider the discrete topology on \(\mathbb{R}\), in which every set is open.
Second, a proof assistant needs to support generic notation on structures.
In Lean, the notation *
is used for multiplication in all the usual number systems,
as well as for multiplication in generic groups and rings.
When we use an expression like f x * y
,
Lean has to use information about the types of f
, x
, and y
to determine which multiplication we have in mind.
Third, it needs to deal with the fact that structures can inherit definitions, theorems, and notation from other structures in various ways. Some structures extend others by adding more axioms. A commutative ring is still a ring, so any definition that makes sense in a ring also makes sense in a commutative ring, and any theorem that holds in a ring also holds in a commutative ring. Some structures extend others by adding more data. For example, the additive part of any ring is an additive group. The ring structure adds a multiplication and an identity, as well as axioms that govern them and relate them to the additive part. Sometimes we can define one structure in terms of another. Any metric space has a canonical topology associated with it, the metric space topology, and there are various topologies that can be associated with any linear ordering.
Finally, it is important to keep in mind that mathematics allows us to use functions and operations to define structures in the same way we use functions and operations to define numbers. Products and powers of groups are again groups. For every \(n\), the integers modulo \(n\) form a ring, and for every \(k > 0\), the \(k \times k\) matrices of polynomials with coefficients in that ring again form a ring. Thus we can calculate with structures just as easily as we can calculate with their elements. This means that algebraic structures lead dual lives in mathematics, as containers for collections of objects and as objects in their own right. A proof assistant has to accommodate this dual role.
When dealing with elements of a type that has an algebraic structure associated with it, a proof assistant needs to recognize the structure and find the relevant definitions, theorems, and notation. All this should sound like a lot of work, and it is. But Lean uses a small collection of fundamental mechanisms to carry out these tasks. The goal of this section is to explain these mechanisms and show you how to use them.
The first ingredient is almost too obvious to mention:
formally speaking, algebraic structures are structures in the sense
of Section 6.1.
An algebraic structure is a specification of a bundle of data satisfying
some axiomatic hypotheses, and we saw in Section 6.1 that
this is exactly what the structure
command is designed to accommodate.
It’s a marriage made in heaven!
Given a data type α
, we can define the group structure on α
as follows.
structure Group₁ (α : Type*) where
mul : α → α → α
one : α
inv : α → α
mul_assoc : ∀ x y z : α, mul (mul x y) z = mul x (mul y z)
mul_one : ∀ x : α, mul x one = x
one_mul : ∀ x : α, mul one x = x
inv_mul_cancel : ∀ x : α, mul (inv x) x = one
Notice that the type α
is a parameter in the definition of group₁
.
So you should think of an object struc : Group₁ α
as being
a group structure on α
.
We saw in Section 2.2
that the counterpart mul_inv_cancel
to inv_mul_cancel
follows from the other group axioms, so there is no need
to add it to the definition.
This definition of a group is similar to the definition of Group
in
Mathlib,
and we have chosen the name Group₁
to distinguish our version.
If you write #check Group
and ctrl-click on the definition,
you will see that the Mathlib version of Group
is defined to
extend another structure; we will explain how to do that later.
If you type #print Group
you will also see that the Mathlib
version of Group
has a number of extra fields.
For reasons we will explain later, sometimes it is useful to add
redundant information to a structure,
so that there are additional fields for objects and functions
that can be defined from the core
data. Don’t worry about that for now.
Rest assured that our simplified version Group₁
is
morally the same as the definition of a group that Mathlib uses.
It is sometimes useful to bundle
the type together with the structure, and Mathlib also
contains a definition of a GroupCat
structure that is equivalent to
the following:
structure Group₁Cat where
α : Type*
str : Group₁ α
The Mathlib version is found in Mathlib.Algebra.Category.GroupCat.Basic
,
and you can #check
it if you add this to the imports at the
beginning of the examples file.
For reasons that will become clearer below, it is more often
useful to keep the type α
separate from the structure Group α
.
We refer to the two objects together as a partially bundled structure,
since the representation combines most, but not all, of the components
into one structure. It is common in Mathlib
to use capital roman letters like G
for a type
when it is used as the carrier type for a group.
Let’s construct a group, which is to say, an element of the Group₁
type.
For any pair of types α
and β
, Mathlib defines the type Equiv α β
of equivalences between α
and β
.
Mathlib also defines the suggestive notation α ≃ β
for this type.
An element f : α ≃ β
is a bijection between α
and β
represented by four components:
a function f.toFun
from α
to β
,
the inverse function f.invFun
from β
to α
,
and two properties that specify these functions are indeed inverse
to one another.
variable (α β γ : Type*)
variable (f : α ≃ β) (g : β ≃ γ)
#check Equiv α β
#check (f.toFun : α → β)
#check (f.invFun : β → α)
#check (f.right_inv : ∀ x : β, f (f.invFun x) = x)
#check (f.left_inv : ∀ x : α, f.invFun (f x) = x)
#check (Equiv.refl α : α ≃ α)
#check (f.symm : β ≃ α)
#check (f.trans g : α ≃ γ)
Notice the creative naming of the last three constructions. We think of the
identity function Equiv.refl
, the inverse operation Equiv.symm
,
and the composition operation Equiv.trans
as explicit evidence
that the property of being in bijective correspondence is an equivalence relation.
Notice also that f.trans g
requires composing the forward functions
in reverse order. Mathlib has declared a coercion from Equiv α β
to the function type α → β
, so we can omit writing .toFun
and have Lean insert it for us.
example (x : α) : (f.trans g).toFun x = g.toFun (f.toFun x) :=
rfl
example (x : α) : (f.trans g) x = g (f x) :=
rfl
example : (f.trans g : α → γ) = g ∘ f :=
rfl
Mathlib also defines the type perm α
of equivalences between
α
and itself.
example (α : Type*) : Equiv.Perm α = (α ≃ α) :=
rfl
It should be clear that Equiv.Perm α
forms a group under composition
of equivalences. We orient things so that mul f g
is
equal to g.trans f
, whose forward function is f ∘ g
.
In other words, multiplication is what we ordinarily think of as
composition of the bijections. Here we define this group:
def permGroup {α : Type*} : Group₁ (Equiv.Perm α)
where
mul f g := Equiv.trans g f
one := Equiv.refl α
inv := Equiv.symm
mul_assoc f g h := (Equiv.trans_assoc _ _ _).symm
one_mul := Equiv.trans_refl
mul_one := Equiv.refl_trans
inv_mul_cancel := Equiv.self_trans_symm
In fact, Mathlib defines exactly this Group
structure on Equiv.Perm α
in the file GroupTheory.Perm.Basic
.
As always, you can hover over the theorems used in the definition of
permGroup
to see their statements,
and you can jump to their definitions in the original file to learn
more about how they are implemented.
In ordinary mathematics, we generally think of notation as
independent of structure.
For example, we can consider groups \((G_1, \cdot, 1, \cdot^{-1})\),
\((G_2, \circ, e, i(\cdot))\), and \((G_3, +, 0, -)\).
In the first case, we write the binary operation as \(\cdot\),
the identity at \(1\), and the inverse function as \(x \mapsto x^{-1}\).
In the second and third cases, we use the notational alternatives shown.
When we formalize the notion of a group in Lean, however,
the notation is more tightly linked to the structure.
In Lean, the components of any Group
are named
mul
, one
, and inv
,
and in a moment we will see how multiplicative notation is
set up to refer to them.
If we want to use additive notation, we instead use an isomorphic structure
AddGroup
(the structure underlying additive groups). Its components are named add
, zero
,
and neg
, and the associated notation is what you would expect it to be.
Recall the type Point
that we defined in Section 6.1,
and the addition function that we defined there.
These definitions are reproduced in the examples file that accompanies
this section.
As an exercise, define an AddGroup₁
structure that is similar
to the Group₁
structure we defined above, except that it uses the
additive naming scheme just described.
Define negation and a zero on the Point
data type,
and define the AddGroup₁
structure on Point
.
structure AddGroup₁ (α : Type*) where
(add : α → α → α)
-- fill in the rest
@[ext]
structure Point where
x : ℝ
y : ℝ
z : ℝ
namespace Point
def add (a b : Point) : Point :=
⟨a.x + b.x, a.y + b.y, a.z + b.z⟩
def neg (a : Point) : Point := sorry
def zero : Point := sorry
def addGroupPoint : AddGroup₁ Point := sorry
end Point
We are making progress. Now we know how to define algebraic structures in Lean, and we know how to define instances of those structures. But we also want to associate notation with structures so that we can use it with each instance. Moreover, we want to arrange it so that we can define an operation on a structure and use it with any particular instance, and we want to arrange it so that we can prove a theorem about a structure and use it with any instance.
In fact, Mathlib is already set up to use generic group notation,
definitions, and theorems for Equiv.Perm α
.
variable {α : Type*} (f g : Equiv.Perm α) (n : ℕ)
#check f * g
#check mul_assoc f g g⁻¹
-- group power, defined for any group
#check g ^ n
example : f * g * g⁻¹ = f := by rw [mul_assoc, mul_inv_cancel, mul_one]
example : f * g * g⁻¹ = f :=
mul_inv_cancel_right f g
example {α : Type*} (f g : Equiv.Perm α) : g.symm.trans (g.trans f) = f :=
mul_inv_cancel_right f g
You can check that this is not the case for the additive group structure
on Point
that we asked you to define above.
Our task now is to understand that magic that goes on under the hood
in order to make the examples for Equiv.Perm α
work the way they do.
The issue is that Lean needs to be able to find the relevant
notation and the implicit group structure,
using the information that is found in the expressions that we type.
Similarly, when we write x + y
with expressions x
and y
that have type ℝ
, Lean needs to interpret the +
symbol as the relevant addition function on the reals.
It also has to recognize the type ℝ
as an instance of a commutative ring,
so that all the definitions and theorems for a commutative ring are available.
For another example,
continuity is defined in Lean relative to any two topological spaces.
When we have f : ℝ → ℂ
and we write Continuous f
, Lean has to find the
relevant topologies on ℝ
and ℂ
.
The magic is achieved with a combination of three things.
Logic. A definition that should be interpreted in any group takes, as arguments, the type of the group and the group structure as arguments. Similarly, a theorem about the elements of an arbitrary group begins with universal quantifiers over the type of the group and the group structure.
Implicit arguments. The arguments for the type and the structure are generally left implicit, so that we do not have to write them or see them in the Lean information window. Lean fills the information in for us silently.
Type class inference. Also known as class inference, this is a simple but powerful mechanism that enables us to register information for Lean to use later on. When Lean is called on to fill in implicit arguments to a definition, theorem, or piece of notation, it can make use of information that has been registered.
Whereas an annotation (grp : Group G)
tells Lean that it should
expect to be given that argument explicitly and the annotation
{grp : Group G}
tells Lean that it should try to figure it out
from contextual cues in the expression,
the annotation [grp : Group G]
tells Lean that the corresponding
argument should be synthesized using type class inference.
Since the whole point to the use of such arguments is that
we generally do not need to refer to them explicitly,
Lean allows us to write [Group G]
and leave the name anonymous.
You have probably already noticed that Lean chooses names like _inst_1
automatically.
When we use the anonymous square-bracket annotation with the variables
command,
then as long as the variables are still in scope,
Lean automatically adds the argument [Group G]
to any definition or
theorem that mentions G
.
How do we register the information that Lean needs to use to carry
out the search?
Returning to our group example, we need only make two changes.
First, instead of using the structure
command to define the group structure,
we use the keyword class
to indicate that it is a candidate
for class inference.
Second, instead of defining particular instances with def
,
we use the keyword instance
to register the particular instance with
Lean. As with the names of class variables, we are allowed to leave the
name of an instance definition anonymous,
since in general we intend Lean to find it and put it to use
without troubling us with the details.
class Group₂ (α : Type*) where
mul : α → α → α
one : α
inv : α → α
mul_assoc : ∀ x y z : α, mul (mul x y) z = mul x (mul y z)
mul_one : ∀ x : α, mul x one = x
one_mul : ∀ x : α, mul one x = x
inv_mul_cancel : ∀ x : α, mul (inv x) x = one
instance {α : Type*} : Group₂ (Equiv.Perm α) where
mul f g := Equiv.trans g f
one := Equiv.refl α
inv := Equiv.symm
mul_assoc f g h := (Equiv.trans_assoc _ _ _).symm
one_mul := Equiv.trans_refl
mul_one := Equiv.refl_trans
inv_mul_cancel := Equiv.self_trans_symm
The following illustrates their use.
#check Group₂.mul
def mySquare {α : Type*} [Group₂ α] (x : α) :=
Group₂.mul x x
#check mySquare
section
variable {β : Type*} (f g : Equiv.Perm β)
example : Group₂.mul f g = g.trans f :=
rfl
example : mySquare f = f.trans f :=
rfl
end
The #check
command shows that Group₂.mul
has an implicit argument
[Group₂ α]
that we expect to be found by class inference,
where α
is the type of the arguments to Group₂.mul
.
In other words, {α : Type*}
is the implicit argument for the type
of the group elements and [Group₂ α]
is the implicit argument for the
group structure on α
.
Similarly, when we define a generic squaring function my_square
for Group₂
, we use an implicit argument {α : Type*}
for
the type of the elements and an implicit argument [Group₂ α]
for
the Group₂
structure.
In the first example,
when we write Group₂.mul f g
, the type of f
and g
tells Lean that in the argument α
to Group₂.mul
has to be instantiated to Equiv.Perm β
.
That means that Lean has to find an element of Group₂ (Equiv.Perm β)
.
The previous instance
declaration tells Lean exactly how to do that.
Problem solved!
This simple mechanism for registering information so that Lean can find it
when it needs it is remarkably useful.
Here is one way it comes up.
In Lean’s foundation, a data type α
may be empty.
In a number of applications, however, it is useful to know that a
type has at least one element.
For example, the function List.headI
, which returns the first
element of a list, can return the default value when the list is empty.
To make that work, the Lean library defines a class Inhabited α
,
which does nothing more than store a default value.
We can show that the Point
type is an instance:
instance : Inhabited Point where default := ⟨0, 0, 0⟩
#check (default : Point)
example : ([] : List Point).headI = default :=
rfl
The class inference mechanism is also used for generic notation.
The expression x + y
is an abbreviation for Add.add x y
where—you guessed it—Add α
is a class that stores
a binary function on α
.
Writing x + y
tells Lean to find a registered instance of [Add.add α]
and use the corresponding function.
Below, we register the addition function for Point
.
instance : Add Point where add := Point.add
section
variable (x y : Point)
#check x + y
example : x + y = Point.add x y :=
rfl
end
In this way, we can assign the notation +
to binary operations on other
types as well.
But we can do even better. We have seen that *
can be used in any
group, +
can be used in any additive group, and both can be used in
any ring.
When we define a new instance of a ring in Lean,
we don’t have to define +
and *
for that instance,
because Lean knows that these are defined for every ring.
We can use this method to specify notation for our Group₂
class:
instance hasMulGroup₂ {α : Type*} [Group₂ α] : Mul α :=
⟨Group₂.mul⟩
instance hasOneGroup₂ {α : Type*} [Group₂ α] : One α :=
⟨Group₂.one⟩
instance hasInvGroup₂ {α : Type*} [Group₂ α] : Inv α :=
⟨Group₂.inv⟩
section
variable {α : Type*} (f g : Equiv.Perm α)
#check f * 1 * g⁻¹
def foo : f * 1 * g⁻¹ = g.symm.trans ((Equiv.refl α).trans f) :=
rfl
end
In this case, we have to supply names for the instances, because
Lean has a hard time coming up with good defaults.
What makes this approach work is that Lean carries out a recursive search.
According to the instances we have declared, Lean can find an instance of
Mul (Equiv.Perm α)
by finding an
instance of Group₂ (Equiv.Perm α)
, and it can find an instance of
Group₂ (Equiv.Perm α)
because we have provided one.
Lean is capable of finding these two facts and chaining them together.
The example we have just given is dangerous, because Lean’s
library also has an instance of Group (Equiv.Perm α)
, and
multiplication is defined on any group.
So it is ambiguous as to which instance is found.
In fact, Lean favors more recent declarations unless you explicitly
specify a different priority.
Also, there is another way to tell Lean that one structure is an
instance of another, using the extends
keyword.
This is how Mathlib specifies that, for example,
every commutative ring is a ring.
You can find more information in Section 7 and in a
section on class inference in Theorem Proving in Lean.
In general, it is a bad idea to specify a value of
*
for an instance of an algebraic structure that already has
the notation defined.
Redefining the notion of Group
in Lean is an artificial example.
In this case, however, both interpretations of the group notation unfold to
Equiv.trans
, Equiv.refl
, and Equiv.symm
, in the same way.
As a similarly artificial exercise,
define a class AddGroup₂
in analogy to Group₂
.
Define the usual notation for addition, negation, and zero
on any AddGroup₂
using the classes Add
, Neg
, and Zero
.
Then show Point
is an instance of AddGroup₂
.
Try it out and make sure that the additive group notation works for
elements of Point
.
class AddGroup₂ (α : Type*) where
add : α → α → α
-- fill in the rest
It is not a big problem that we have already declared instances
Add
, Neg
, and Zero
for Point
above.
Once again, the two ways of synthesizing the notation should come up
with the same answer.
Class inference is subtle, and you have to be careful when using it, because it configures automation that invisibly governs the interpretation of the expressions we type. When used wisely, however, class inference is a powerful tool. It is what makes algebraic reasoning possible in Lean.
6.3. Building the Gaussian Integers
We will now illustrate the use of the algebraic hierarchy in Lean by building an important mathematical object, the Gaussian integers, and showing that it is a Euclidean domain. In other words, according to the terminology we have been using, we will define the Gaussian integers and show that they are an instance of the Euclidean domain structure.
In ordinary mathematical terms, the set of Gaussian integers \(\Bbb{Z}[i]\) is the set of complex numbers \(\{ a + b i \mid a, b \in \Bbb{Z}\}\). But rather than define them as a subset of the complex numbers, our goal here is to define them as a data type in their own right. We do this by representing a Gaussian integer as a pair of integers, which we think of as the real and imaginary parts.
@[ext]
structure GaussInt where
re : ℤ
im : ℤ
We first show that the Gaussian integers have the structure of a ring,
with 0
defined to be ⟨0, 0⟩
, 1
defined to be ⟨1, 0⟩
, and
addition defined pointwise. To work out the definition of multiplication,
remember that we want the element \(i\), represented by ⟨0, 1⟩
, to
be a square root of \(-1\). Thus we want
This explains the definition of Mul
below.
instance : Zero GaussInt :=
⟨⟨0, 0⟩⟩
instance : One GaussInt :=
⟨⟨1, 0⟩⟩
instance : Add GaussInt :=
⟨fun x y ↦ ⟨x.re + y.re, x.im + y.im⟩⟩
instance : Neg GaussInt :=
⟨fun x ↦ ⟨-x.re, -x.im⟩⟩
instance : Mul GaussInt :=
⟨fun x y ↦ ⟨x.re * y.re - x.im * y.im, x.re * y.im + x.im * y.re⟩⟩
As noted in Section 6.1, it is a good idea to put all the definitions
related to a data type in a namespace with the same name. Thus in the Lean
files associated with this chapter, these definitions are made in the
GaussInt
namespace.
Notice that here we are defining the interpretations of the notation 0
,
1
, +
, -
, and *
directly, rather than naming them
GaussInt.zero
and the like and assigning the notation to those.
It is often useful to have an explicit name for the definitions, for example,
to use with simp
and rewrite
.
theorem zero_def : (0 : GaussInt) = ⟨0, 0⟩ :=
rfl
theorem one_def : (1 : GaussInt) = ⟨1, 0⟩ :=
rfl
theorem add_def (x y : GaussInt) : x + y = ⟨x.re + y.re, x.im + y.im⟩ :=
rfl
theorem neg_def (x : GaussInt) : -x = ⟨-x.re, -x.im⟩ :=
rfl
theorem mul_def (x y : GaussInt) :
x * y = ⟨x.re * y.re - x.im * y.im, x.re * y.im + x.im * y.re⟩ :=
rfl
It is also useful to name the rules that compute the real and imaginary parts, and to declare them to the simplifier.
@[simp]
theorem zero_re : (0 : GaussInt).re = 0 :=
rfl
@[simp]
theorem zero_im : (0 : GaussInt).im = 0 :=
rfl
@[simp]
theorem one_re : (1 : GaussInt).re = 1 :=
rfl
@[simp]
theorem one_im : (1 : GaussInt).im = 0 :=
rfl
@[simp]
theorem add_re (x y : GaussInt) : (x + y).re = x.re + y.re :=
rfl
@[simp]
theorem add_im (x y : GaussInt) : (x + y).im = x.im + y.im :=
rfl
@[simp]
theorem neg_re (x : GaussInt) : (-x).re = -x.re :=
rfl
@[simp]
theorem neg_im (x : GaussInt) : (-x).im = -x.im :=
rfl
@[simp]
theorem mul_re (x y : GaussInt) : (x * y).re = x.re * y.re - x.im * y.im :=
rfl
@[simp]
theorem mul_im (x y : GaussInt) : (x * y).im = x.re * y.im + x.im * y.re :=
rfl
It is now surprisingly easy to show that the Gaussian integers are an instance
of a commutative ring. We are putting the structure concept to good use.
Each particular Gaussian integer is an instance of the GaussInt
structure,
whereas the type GaussInt
itself, together with the relevant operations, is an
instance of the CommRing
structure. The CommRing
structure, in turn,
extends the notational structures Zero
, One
, Add
,
Neg
, and Mul
.
If you type instance : CommRing GaussInt := _
, click on the light bulb
that appears in VS Code, and then ask Lean to fill in a skeleton for the
structure definition, you will see a scary number of entries.
Jumping to the definition of the structure, however, shows that many of the
fields have default definitions that Lean will fill in for you automatically.
The essential ones appear in the definition below.
A special case are nsmul
and zsmul
which should be ignored for now
and will be explained in the next chapter.
In each case, the relevant
identity is proved by unfolding definitions, using the ext
tactic
to reduce the identities to their real and imaginary components,
simplifying, and, if necessary, carrying out the relevant ring calculation in
the integers. Note that we could easily avoid repeating all this code, but
this is not the topic of the current discussion.
instance instCommRing : CommRing GaussInt where
zero := 0
one := 1
add := (· + ·)
neg x := -x
mul := (· * ·)
nsmul := nsmulRec
zsmul := zsmulRec
add_assoc := by
intros
ext <;> simp <;> ring
zero_add := by
intro
ext <;> simp
add_zero := by
intro
ext <;> simp
neg_add_cancel := by
intro
ext <;> simp
add_comm := by
intros
ext <;> simp <;> ring
mul_assoc := by
intros
ext <;> simp <;> ring
one_mul := by
intro
ext <;> simp
mul_one := by
intro
ext <;> simp
left_distrib := by
intros
ext <;> simp <;> ring
right_distrib := by
intros
ext <;> simp <;> ring
mul_comm := by
intros
ext <;> simp <;> ring
zero_mul := by
intros
ext <;> simp
mul_zero := by
intros
ext <;> simp
Lean’s library defines the class of nontrivial types to be types with at least two distinct elements. In the context of a ring, this is equivalent to saying that the zero is not equal to the one. Since some common theorems depend on that fact, we may as well establish it now.
instance : Nontrivial GaussInt := by
use 0, 1
rw [Ne, GaussInt.ext_iff]
simp
We will now show that the Gaussian integers have an important additional property. A Euclidean domain is a ring \(R\) equipped with a norm function \(N : R \to \mathbb{N}\) with the following two properties:
For every \(a\) and \(b \ne 0\) in \(R\), there are \(q\) and \(r\) in \(R\) such that \(a = bq + r\) and either \(r = 0\) or N(r) < N(b).
For every \(a\) and \(b \ne 0\), \(N(a) \le N(ab)\).
The ring of integers \(\Bbb{Z}\) with \(N(a) = |a|\) is an archetypal example of a Euclidean domain. In that case, we can take \(q\) to be the result of integer division of \(a\) by \(b\) and \(r\) to be the remainder. These functions are defined in Lean so that the satisfy the following:
example (a b : ℤ) : a = b * (a / b) + a % b :=
Eq.symm (Int.ediv_add_emod a b)
example (a b : ℤ) : b ≠ 0 → 0 ≤ a % b :=
Int.emod_nonneg a
example (a b : ℤ) : b ≠ 0 → a % b < |b| :=
Int.emod_lt a
In an arbitrary ring, an element \(a\) is said to be a unit if it divides \(1\). A nonzero element \(a\) is said to be irreducible if it cannot be written in the form \(a = bc\) where neither \(b\) nor \(c\) is a unit. In the integers, every irreducible element \(a\) is prime, which is to say, whenever \(a\) divides a product \(bc\), it divides either \(b\) or \(c\). But in other rings this property can fail. In the ring \(\Bbb{Z}[\sqrt{-5}]\), we have
and the elements \(2\), \(3\), \(1 + \sqrt{-5}\), and \(1 - \sqrt{-5}\) are all irreducible, but they are not prime. For example, \(2\) divides the product \((1 + \sqrt{-5})(1 - \sqrt{-5})\), but it does not divide either factor. In particular, we no longer have unique factorization: the number \(6\) can be factored into irreducible elements in more than one way.
In contrast, every Euclidean domain is a unique factorization domain, which
implies that every irreducible element is prime.
The axioms for a Euclidean domain imply that one can write any nonzero element
as a finite product of irreducible elements. They also imply that one can use
the Euclidean algorithm to find a greatest common divisor of any two
nonzero elements a
and b
, i.e.~an element that is divisible by any
other common divisor. This, in turn, implies that factorization
into irreducible elements is unique up to multiplication by units.
We now show that the Gaussian integers are a Euclidean domain with the norm defined by \(N(a + bi) = (a + bi)(a - bi) = a^2 + b^2\). The Gaussian integer \(a - bi\) is called the conjugate of \(a + bi\). It is not hard to check that for any complex numbers \(x\) and \(y\), we have \(N(xy) = N(x)N(y)\).
To see that this definition of the norm makes the Gaussian integers a Euclidean domain, only the first property is challenging. Suppose we want to write \(a + bi = (c + di) q + r\) for suitable \(q\) and \(r\). Treating \(a + bi\) and \(c + di\) are complex numbers, carry out the division
The real and imaginary parts might not be integers, but we can round them to the nearest integers \(u\) and \(v\). We can then express the right-hand side as \((u + vi) + (u' + v'i)\), where \(u' + v'i\) is the part left over. Note that we have \(|u'| \le 1/2\) and \(|v'| \le 1/2\), and hence
Multiplying through by \(c + di\), we have
Setting \(q = u + vi\) and \(r = (c + di) (u' + v'i)\), we have \(a + bi = (c + di) q + r\), and we only need to bound \(N(r)\):
The argument we just carried out requires viewing the Gaussian integers as a subset of the complex numbers. One option for formalizing it in Lean is therefore to embed the Gaussian integers in the complex numbers, embed the integers in the Gaussian integers, define the rounding function from the real numbers to the integers, and take great care to pass back and forth between these number systems appropriately. In fact, this is exactly the approach that is followed in Mathlib, where the Gaussian integers themselves are constructed as a special case of a ring of quadratic integers. See the file GaussianInt.lean.
Here we will instead carry out an argument that stays in the integers. This illustrates an choice one commonly faces when formalizing mathematics. Given an argument that requires concepts or machinery that is not already in the library, one has two choices: either formalizes the concepts or machinery needed, or adapt the argument to make use of concepts and machinery you already have. The first choice is generally a good investment of time when the results can be used in other contexts. Pragmatically speaking, however, sometimes seeking a more elementary proof is more efficient.
The usual quotient-remainder theorem for the integers says that for
every \(a\) and nonzero \(b\), there are \(q\) and \(r\)
such that \(a = b q + r\) and \(0 \le r < b\).
Here we will make use of the following variation, which says that there
are \(q'\) and \(r'\) such that
\(a = b q' + r'\) and \(|r'| \le b/2\).
You can check that if the value of \(r\) in the first statement
satisfies \(r \le b/2\), we can take \(q' = q\) and \(r' = r\),
and otherwise we can take \(q' = q + 1\) and \(r' = r - b\).
We are grateful to Heather Macbeth for suggesting the following more
elegant approach, which avoids definition by cases.
We simply add b / 2
to a
before dividing and then subtract it
from the remainder.
def div' (a b : ℤ) :=
(a + b / 2) / b
def mod' (a b : ℤ) :=
(a + b / 2) % b - b / 2
theorem div'_add_mod' (a b : ℤ) : b * div' a b + mod' a b = a := by
rw [div', mod']
linarith [Int.ediv_add_emod (a + b / 2) b]
theorem abs_mod'_le (a b : ℤ) (h : 0 < b) : |mod' a b| ≤ b / 2 := by
rw [mod', abs_le]
constructor
· linarith [Int.emod_nonneg (a + b / 2) h.ne']
have := Int.emod_lt_of_pos (a + b / 2) h
have := Int.ediv_add_emod b 2
have := Int.emod_lt_of_pos b zero_lt_two
revert this; intro this -- FIXME, this should not be needed
linarith
Note the use of our old friend, linarith
. We will also need to express
mod'
in terms of div'
.
theorem mod'_eq (a b : ℤ) : mod' a b = a - b * div' a b := by linarith [div'_add_mod' a b]
We will use the fact that \(x^2 + y^2\) is equal to zero if and only if \(x\) and \(y\) are both zero. As an exercise, we ask you to prove that this holds in any ordered ring.
theorem sq_add_sq_eq_zero {α : Type*} [LinearOrderedRing α] (x y : α) :
x ^ 2 + y ^ 2 = 0 ↔ x = 0 ∧ y = 0 := by
sorry
We will put all the remaining definitions and theorems in this section
in the GaussInt
namespace.
First, we define the norm
function and ask you to establish
some of its properties.
The proofs are all short.
def norm (x : GaussInt) :=
x.re ^ 2 + x.im ^ 2
@[simp]
theorem norm_nonneg (x : GaussInt) : 0 ≤ norm x := by
sorry
theorem norm_eq_zero (x : GaussInt) : norm x = 0 ↔ x = 0 := by
sorry
theorem norm_pos (x : GaussInt) : 0 < norm x ↔ x ≠ 0 := by
sorry
theorem norm_mul (x y : GaussInt) : norm (x * y) = norm x * norm y := by
sorry
Next we define the conjugate function:
def conj (x : GaussInt) : GaussInt :=
⟨x.re, -x.im⟩
@[simp]
theorem conj_re (x : GaussInt) : (conj x).re = x.re :=
rfl
@[simp]
theorem conj_im (x : GaussInt) : (conj x).im = -x.im :=
rfl
theorem norm_conj (x : GaussInt) : norm (conj x) = norm x := by simp [norm]
Finally, we define division for the Gaussian integers
with the notation x / y
, that rounds the complex quotient to the nearest
Gaussian integer. We use our bespoke Int.div'
for that purpose.
As we calculated above, if x
is \(a + bi\) and y
is \(c + di\),
then the real and imaginary parts of x / y
are the nearest integers to
respectively. Here the numerators are the real and imaginary parts of \((a + bi) (c - di)\), and the denominators are both equal to the norm of \(c + di\).
instance : Div GaussInt :=
⟨fun x y ↦ ⟨Int.div' (x * conj y).re (norm y), Int.div' (x * conj y).im (norm y)⟩⟩
Having defined x / y
, We define x % y
to be the remainder,
x - (x / y) * y
. As above, we record the definitions in the
theorems div_def
and
mod_def
so that we can use them with simp
and rewrite
.
instance : Mod GaussInt :=
⟨fun x y ↦ x - y * (x / y)⟩
theorem div_def (x y : GaussInt) :
x / y = ⟨Int.div' (x * conj y).re (norm y), Int.div' (x * conj y).im (norm y)⟩ :=
rfl
theorem mod_def (x y : GaussInt) : x % y = x - y * (x / y) :=
rfl
These definitions immediately yield x = y * (x / y) + x % y
for every
x
and y
, so all we need to do is show that the norm of x % y
is
less than the norm of y
when y
is not zero.
We just defined the real and imaginary parts of x / y
to be
div' (x * conj y).re (norm y)
and div' (x * conj y).im (norm y)
,
respectively.
Calculating, we have
(x % y) * conj y = (x - x / y * y) * conj y = x * conj y - x / y * (y * conj y)
The real and imaginary parts of the right-hand side are exactly mod' (x * conj y).re (norm y)
and mod' (x * conj y).im (norm y)
.
By the properties of div'
and mod'
,
these are guaranteed to be less than or equal to norm y / 2
.
So we have
norm ((x % y) * conj y) ≤ (norm y / 2)^2 + (norm y / 2)^2 ≤ (norm y / 2) * norm y
.
On the other hand, we have
norm ((x % y) * conj y) = norm (x % y) * norm (conj y) = norm (x % y) * norm y
.
Dividing through by norm y
we have norm (x % y) ≤ (norm y) / 2 < norm y
,
as required.
This messy calculation is carried out in the next proof. We encourage you to step through the details and see if you can find a nicer argument.
theorem norm_mod_lt (x : GaussInt) {y : GaussInt} (hy : y ≠ 0) :
(x % y).norm < y.norm := by
have norm_y_pos : 0 < norm y := by rwa [norm_pos]
have H1 : x % y * conj y = ⟨Int.mod' (x * conj y).re (norm y), Int.mod' (x * conj y).im (norm y)⟩
· ext <;> simp [Int.mod'_eq, mod_def, div_def, norm] <;> ring
have H2 : norm (x % y) * norm y ≤ norm y / 2 * norm y
· calc
norm (x % y) * norm y = norm (x % y * conj y) := by simp only [norm_mul, norm_conj]
_ = |Int.mod' (x.re * y.re + x.im * y.im) (norm y)| ^ 2
+ |Int.mod' (-(x.re * y.im) + x.im * y.re) (norm y)| ^ 2 := by simp [H1, norm, sq_abs]
_ ≤ (y.norm / 2) ^ 2 + (y.norm / 2) ^ 2 := by gcongr <;> apply Int.abs_mod'_le _ _ norm_y_pos
_ = norm y / 2 * (norm y / 2 * 2) := by ring
_ ≤ norm y / 2 * norm y := by gcongr; apply Int.ediv_mul_le; norm_num
calc norm (x % y) ≤ norm y / 2 := le_of_mul_le_mul_right H2 norm_y_pos
_ < norm y := by
apply Int.ediv_lt_of_lt_mul
· norm_num
· linarith
We are in the home stretch. Our norm
function maps Gaussian integers to
nonnegative integers. We need a function that maps Gaussian integers to natural
numbers, and we obtain that by composing norm
with the function
Int.natAbs
, which maps integers to the natural numbers.
The first of the next two lemmas establishes that mapping the norm to the
natural numbers and back to the integers does not change the value.
The second one re-expresses the fact that the norm is decreasing.
theorem coe_natAbs_norm (x : GaussInt) : (x.norm.natAbs : ℤ) = x.norm :=
Int.natAbs_of_nonneg (norm_nonneg _)
theorem natAbs_norm_mod_lt (x y : GaussInt) (hy : y ≠ 0) :
(x % y).norm.natAbs < y.norm.natAbs := by
apply Int.ofNat_lt.1
simp only [Int.natCast_natAbs, abs_of_nonneg, norm_nonneg]
apply norm_mod_lt x hy
We also need to establish the second key property of the norm function on a Euclidean domain.
theorem not_norm_mul_left_lt_norm (x : GaussInt) {y : GaussInt} (hy : y ≠ 0) :
¬(norm (x * y)).natAbs < (norm x).natAbs := by
apply not_lt_of_ge
rw [norm_mul, Int.natAbs_mul]
apply le_mul_of_one_le_right (Nat.zero_le _)
apply Int.ofNat_le.1
rw [coe_natAbs_norm]
exact Int.add_one_le_of_lt ((norm_pos _).mpr hy)
We can now put it together to show that the Gaussian integers are an
instance of a Euclidean domain. We use the quotient and remainder function we
have defined.
The Mathlib definition of a Euclidean domain is more general than the one
above in that it allows us to show that remainder decreases with respect
to any well-founded measure.
Comparing the values of a norm function that returns natural numbers is
just one instance of such a measure,
and in that case, the required properties are the theorems
natAbs_norm_mod_lt
and not_norm_mul_left_lt_norm
.
instance : EuclideanDomain GaussInt :=
{ GaussInt.instCommRing with
quotient := (· / ·)
remainder := (· % ·)
quotient_mul_add_remainder_eq :=
fun x y ↦ by simp only; rw [mod_def, add_comm] ; ring
quotient_zero := fun x ↦ by
simp [div_def, norm, Int.div']
rfl
r := (measure (Int.natAbs ∘ norm)).1
r_wellFounded := (measure (Int.natAbs ∘ norm)).2
remainder_lt := natAbs_norm_mod_lt
mul_left_not_lt := not_norm_mul_left_lt_norm }
An immediate payoff is that we now know that, in the Gaussian integers, the notions of being prime and being irreducible coincide.
example (x : GaussInt) : Irreducible x ↔ Prime x :=
irreducible_iff_prime