Szemeredi's theorem

In this post I would like to discuss Szemeredi's theorem, a cornerstone of additive combinatorics, and some of the ideas that go into one of the many proofs. What follows is a discussion of just one possible approach, and the interested reader is encouraged to search for other proofs.

A subset $A$ of the natural numbers is said to have positive upper density if $\limsup \frac{|A \cap \{1, 2, \dots, n\}|}{n}>0$. Szemeredi's theorem asserts that in any subset $A$ of the natural numbers with positive upper density and for any positive integer $k$ there exists an arithmetic progression of length $k$. Equivalently, defining $r_k(N)$ to be the largest possible size of a subset of $\{1, 2, \dots, N\}$ without an AP of length $k$, that for any $k$, $r_k(N) = o(N)$.

This is a theorem that shows the large amount of structure underlying large subsets of the positive integers, which at least in hindsight should motivate the approach we present. We'll mostly be following Wikipedia because, rather unusually for mathematics, that had the cleanest and best exposition I could find about this topic.

Definitions 
  • Let $U, W$ be disjoint subsets of $V$, with $|V| = n$. The edge density $d(X,Y)$ of the pair $(X,Y)$ is defined to be proportion of possible edges between $U,W$ that actually exist, i.e. $\frac{|E(X,Y)|}{|X||Y|}$.
  • For $\epsilon>0$ a pair of vertex sets $X,Y$ is called $\epsilon -$regular if for all subsets $A \subset X, B \subset Y$ such that $|A|\geq \epsilon |X| $ and $|B| \geq \epsilon |Y|$ we have  $|d(X,Y)-d(A,B)| \leq \epsilon$. This means that any sufficiently large subsets approximate the edge density to $\epsilon$ precision.
  • A partition of $V$ into sets $V_1 \dots V_k$ is called an $\epsilon-$regular partition if $\sum_{(V_i,V_j) \text{not } \epsilon-\text{regular}} |V_i||V_j| \leq \epsilon |V(G)|^2$
  • The energy of the pair $(U,W)$ is defined as:
\[q(U, W) := \frac{|U||W|}{n^2} d(U,W)^2,\] where $d(U,W)$ is the edge density between $U$ and $W$. For partitions $\mathcal{P}_U = \{ U_1, \ldots, U_k \}$ of $U$ and $\mathcal{P}_W = \{ W_1, \ldots, W_l \}$ of $W$, define \[q(\mathcal{P}_U, \mathcal{P}_W) := \sum_{i=1}^k \sum_{j=1}^l q(U_i, W_j).\] Finally, for a partition $\mathcal{P} = \{ V_1, \ldots, V_k \}$ of $V$, define \[ q(\mathcal{P}) = \sum_{i=1}^k \sum_{j=1}^k q(V_i, V_j) = \sum_{i=1}^k \sum_{j=1}^k \frac{|V_i||V_j|}{n^2} d(V_i, V_j)^2. \]

Note that $q(\mathcal{P}) \leq 1$, since $d(V_i, V_j) \leq 1$. We can now state our first result:

Szemeredi's regularity lemma: For every $\varepsilon > 0$ and positive integer $m$, there exists an integer $M$ such that if $G$ is a graph with at least $M$ vertices, there exists an integer $k$ in the range $m \leq k \leq M$ and an $\varepsilon$-regular partition of the vertex set of $G$ into $k$ sets.


We will prove Szemeredi's regularity lemma by finding subsets that fail to be $\epsilon-$regular and just cutting them up. To show that the process terminates, we define a monovariant called energy and show that the energy has to increase every so often (take that physics!) by an amount that is bounded below, and sufficiently large energy implies regularity.


Lemma 1 For any partitions $\mathcal{P}_U$ and $\mathcal{P}_W$ of $U$ and $W$, we have
\[
q(\mathcal{P}_U, \mathcal{P}_W) \geq q(U,W).
\]


Proof Let $\mathcal{P}_U = \{ U_1, \ldots, U_k \}$ and $\mathcal{P}_W = \{ W_1, \ldots, W_l \}$. Choose $x \in U$ and $y \in W$ uniformly at random, where $x$ lies in $U_i$ and $y$ lies in $W_j$.

Define the random variable $Z = d(U_i, W_j)$. Then,
\[
\mathbb{E}[Z] = \sum_{i=1}^k \sum_{j=1}^l \frac{|U_i|}{|U|} \frac{|W_j|}{|W|} d(U_i, W_j) = \frac{e(U,W)}{|U||W|} = d(U,W),
\]
and
\[
\mathbb{E}[Z^2] = \sum_{i=1}^k \sum_{j=1}^l \frac{|U_i|}{|U|} \frac{|W_j|}{|W|} d(U_i, W_j)^2 = \frac{n^2}{|U||W|} q(\mathcal{P}_U, \mathcal{P}_W).
\]
By convexity, $\mathbb{E}[Z^2] \geq (\mathbb{E}[Z])^2$, so
\[ \frac{n^2}{|U||W|} q(\mathcal{P}_U, \mathcal{P}_W) \geq d(U,W)^2.\]
Multiplying both sides by $\frac{|U||W|}{n^2}$ yields $q(\mathcal{P}_U, \mathcal{P}_W) \geq q(U,W)$. $\blacksquare$

Energy boost lemma If $U_1 \subset U$ and $W_1 \subset W$ witness that $(U,W)$ is not $\varepsilon$-regular, then
\[
q\left( \{U_1, U \setminus U_1\}, \{W_1, W \setminus W_1\} \right) > q(U,W) + \varepsilon^4 \frac{|U||W|}{n^2}.
\]

Proof Define $Z$ as above. Then,
\[
\text{Var}(Z) = \mathbb{E}[Z^2] - (\mathbb{E}[Z])^2 = \frac{n^2}{|U||W|} \left( q\left( \{U_1, U \setminus U_1\}, \{W_1, W \setminus W_1\} \right) - q(U,W) \right).
\]
Observe that $|Z - \mathbb{E}[Z]| = |d(U_1, W_1) - d(U,W)|$ with probability $\frac{|U_1|}{|U|} \frac{|W_1|}{|W|}$.

Thus,
\[
\text{Var}(Z) \geq \frac{|U_1|}{|U|} \frac{|W_1|}{|W|} (d(U_1, W_1) - d(U,W))^2 > \varepsilon \cdot \varepsilon \cdot \varepsilon^2 = \varepsilon^4,
\]
yielding the desired result. $\blacksquare$

Energy increment lemma If a partition $\mathcal{P} = \{ V_1, \ldots, V_k \}$ of $V(G)$ is not $\varepsilon$-regular, then there exists a refinement $\mathcal{Q}$ of $\mathcal{P}$ where each $V_i$ is partitioned into at most $2^k$ parts such that
\[
q(\mathcal{Q}) \geq q(\mathcal{P}) + \varepsilon^5.
\]

Proof For each irregular pair $(V_i, V_j)$, find subsets $A^{i,j} \subset V_i$ and $A^{j,i} \subset V_j$ witnessing their irregularity. Let $\mathcal{Q}$ be the common refinement of $\mathcal{P}$ according to all such sets $A^{i,j}$. Each $V_i$ is partitioned into at most $2^k$ parts.

Then,
\[
q(\mathcal{Q}) = \sum_{(i,j) \in [k]^2} q(\mathcal{Q}_{V_i}, \mathcal{Q}_{V_j}),
\]
where $\mathcal{Q}_{V_i}$ denotes the partition of $V_i$ induced by $\mathcal{Q}$.

Let $\mathcal{Q}_{V_i}$ be the partition of $V_i$ given by $\mathcal{Q}$. By Lemma 1, the quantity
\[
\sum_{(V_i, V_j)\ \varepsilon\text{-regular}} q(V_i, V_j)
+ \sum_{(V_i, V_j)\ \text{not } \varepsilon\text{-regular}} q(\{A^{i,j}, V_i \setminus A^{i,j}\}, \{A^{j,i}, V_j \setminus A^{j,i}\})
\]
is a lower bound on $q(\mathcal{Q})$.

Since $V_i$ is cut by $A^{i,j}$ when creating $\mathcal{Q}$, we know that $\mathcal{Q}_{V_i}$ is a refinement of $\{A^{i,j}, V_i \setminus A^{i,j}\}$. By Lemma 2, the above sum is at least
\[
\sum_{(i,j)\in [k]^2} q(V_i, V_j)
+ \sum_{(V_i, V_j)\ \text{not } \varepsilon\text{-regular}} \varepsilon^4 \cdot \frac{|V_i||V_j|}{n^2}.
\]

But the second sum is at least $\varepsilon^5$ since $\mathcal{P}$ is not $\varepsilon$-regular, so we deduce the desired inequality.  $\blacksquare$

Starting from any partition, we can keep applying the Energy Increment Lemma as long as the resulting partition is not $\varepsilon$-regular. Since the energy increases by at least $\varepsilon^5$ in each step and is bounded above by 1, the process must terminate after at most $\varepsilon^{-5}$ steps, producing an $\varepsilon$-regular partition. This finishes the proof of the regularity lemma.

Two important applications are the following:

Graph counting lemma Let $H$ be a graph with $V(H) = [k]$, and let $\varepsilon > 0$. Let $G$ be an $n$-vertex graph with vertex sets $V_1, \dots, V_k \subseteq V(G)$ such that $(V_i, V_j)$ is $\varepsilon$-regular whenever $\{i,j\} \in E(H)$. Then the number of labeled copies of $H$ in $G$ is within $e(H)\varepsilon |V_1| \cdots |V_k|$ of
\[
\left(\prod_{\{i,j\} \in E(H)} d(V_i, V_j)\right)
\left(\prod_{i=1}^{k} |V_i|\right).
\]
 
Graph removal lemma Let $H$ be a graph with $h$ vertices. The graph removal lemma states that for any $\epsilon > 0$, there exists a constant $\delta = \delta(\epsilon, H) > 0$ such that for any $n$-vertex graph $G$ with fewer than $\delta n^h$ subgraphs isomorphic to $H$, it is possible to eliminate all copies of $H$ by removing at most $\epsilon n^2$ edges from $G$.

An alternative way to state this is: for any $n$-vertex graph $G$ with $o(n^h)$ subgraphs isomorphic to $H$, it is possible to eliminate all copies of $H$ by removing $o(n^2)$ edges from $G$.

The proof is somehow just get rid of the really bad stuff while keeping track of how much you've gotten rid of using the graph counting lemma.

To prove Szemeredi's theorem, we will need a strengthening of this to hypergraphs:
A directed hypergraph is a pair $(X,E)$, where $X$ is a set of elements called nodes, vertices, points, or elements and $E$ is a set of pairs of subsets of $X$. Each of these pairs $(D,C) \in E$ is called an edge or hyperedge; the vertex subset $D$ is known as its tail and $C$ as its head. What this really means is that a hyperedge can have any number of vertices, but edges in 'standard' graphs can only have two.

The hypergraph removal lemma states that when a hypergraph contains few copies of a given sub-hypergraph, then all of the copies can be eliminated by removing a small number of hyperedges. We defer the proof to Tao. A large difficulty that needs to be overcome is that it isn't clear what the correct notion of hypergraph regularity should be. Thankfully other people have figured that out, so we can sketch a proof of Szemeredi's theorem using this handwavy statement:

The high-level idea of the proof is that we construct a hypergraph from a subset without any length-$k$ arithmetic progression, then use the graph removal lemma to show that this hypergraph cannot have too many hyperedges, which in turn shows that the original subset cannot be too big.

Let $A \subset \{1,\dots,N\}$ be a subset that does not contain any length-$k$ arithmetic progression. Let $M = k^2 N + 1$ be a large enough integer. We can think of $A$ as a subset of $\mathbb{Z}/M\mathbb{Z}$. Clearly, if $A$ doesn't have a length-$k$ arithmetic progression in $\mathbb{Z}$, it also doesn't have one in $\mathbb{Z}/M\mathbb{Z}$.

We will construct a $k$-partite $(k-1)$-uniform hypergraph $G$ from $A$, with parts $V_1, V_2, \dots, V_k$, all of which are $M$-element vertex sets indexed by $\mathbb{Z}/M\mathbb{Z}$.

For each $1 \leq i \leq k$, we add a hyperedge among vertices $(v_j \in V_j)_{j \in [k]\setminus\{i\}}$ if and only if
\[
\sum_{j \neq i} (j-i)v_j \in A.
\]

Let $H$ be the complete $k$-partite $(k-1)$-uniform hypergraph.

If $G$ contains an isomorphic copy of $H$ with vertices $v_1, \dots, v_k$, then
\[
\alpha_i = \sum_{j \neq i} (j-i)v_j \in A
\]
for any $1 \leq i \leq k$. However, note that $\alpha_i$ forms a length-$k$ arithmetic progression with common difference
\[
\alpha_{i+1} - \alpha_i = -\sum_j v_j.
\]
Since $A$ has no length-$k$ arithmetic progression, it must be the case that
\[
\alpha_1 = \alpha_2 = \cdots = \alpha_k,
\]
so
\[
\sum_j v_j = 0.
\]

Thus, for each hyperedge $(v_j \in V_j)_{j \in [k]\setminus\{i\}}$, we can find a unique copy of $H$ that this edge lies in by finding
\[
v_i = -\sum_{j \neq i} v_j.
\]

The number of copies of $H$ in $G$ equals
\[
\frac{1}{k}e(G) = O(N^{k-1}) = o(N^k).
\]
Therefore, by the hypergraph removal lemma, we can remove $o(N^{k-1})$ edges to eliminate all copies of $H$ in $G$. Since every hyperedge of $G$ is in a unique copy of $H$, to eliminate all copies of $H$ in $G$, we must remove at least $e(G)/k$ edges.

Thus,
\[
e(G) = o(N^{k-1}).
\]
The number of hyperedges in $G$ is
\[
kM^{k-2} |A| = o(N^{k-1}),
\]
which shows that $|A| = o(N)$. $\blacksquare$

I would like to end by mentioning that in fact one can push this to subsets of any group isomorphic to $\mathbb{Z}^k$ for any $k >1$, in particular $\mathbb{Z}[i]$, by using this clever trick.

Comments

Popular Posts