Finding and proving identities
In this and the next post I document my notes from reading the book A=B, which is freely available on the internet.
As a secondary school student I often enjoyed proving various identities about the binomial coefficients, for example Vandermonde's identity ${\displaystyle {m+n \choose r}=\sum _{k=0}^{r}{m \choose k}{n \choose r-k}}$, because they admit various proofs from different areas of mathematics. For instance, one can prove Vandermonde's identity by a double counting argument or by directly manipulating binomial expansions. More sophisticated identities might require a mixture of insights from various fields to be proved cleanly. Sometimes, discovering new identities and trying to interpret them gives rise to new areas of mathematics. At some point, however, I slowly lost interest in showing the identities because it was both cumbersome and unilluminating: I no longer understood why what I was proving was interesting and how to gain insight from the proofs (which is related to the proofs I was able to come up with being suboptimal in many respects). It is therefore with great pleasure that I learned that there exists a much more systematic (and not too difficult) way of coming up with nice proofs of many such identities with a computer, which I hope to discuss in this and the next post.
First we need to introduce some notation.
A geometric series $\sum_k t_k$ is one in which the ratio of every two consecutive terms is constant, i.e., $t_{k+1}/t_k$ is a constant function of the summation index $k$. The $k^{th}$ term of a geometric series is of the form $c x^k$ where $c$ and $x$ are constants, i.e., are independent of the summation index $k$. Therefore a general geometric series looks like
\[\sum_k c x^k.\]
A hypergeometric series $\sum_k t_k$ is one in which $t_0 = 1$ and the ratio of two consecutive terms is a rational function of the summation index $k$, i.e., in which
\[\frac{t_{k+1}}{t_k} = \frac{P(k)}{Q(k)},\]
where $P$ and $Q$ are polynomials in $k$. In this case we will call the terms hypergeometric terms. Examples of such hypergeometric terms are $t_k = x^k$, or $k!$, or $(2k+7)!/(k-3)!$, or $(k^2 -1)(3k +1)!/((k +3)!(2k +7))$.
The set of hypergeometric terms is closed under multiplication and reciprocation but not under addition. For example, while $n^2 + 1$ is a hypergeometric term, $2^n +1$ isn't, although it is a nice and well behaved expression.
It turns out that many of the sums we are interested in consist of hypergeometric terms, which gives rise to the following question. Consider a sum
\[s_n = \sum_{k=0}^{n-1} t_k, \]
where $t_k$ is a hypergeometric term that does not depend on $n$, i.e., the consecutive term ratio
\[r(k) = \frac{t_{k+1}}{t_k} \]
is a rational function of $k$. We would like to express $s_n$ in closed form.
We note that $s_n$ plays the role of an antiderivative here. Instead of its derivative being the integrand, its difference is the summand. That is, $s_{n+1} - s_n = t_n$. Hence, just as in the integration problem, we are led to inquire if, given a hypergeometric term $t_n$, there exists a hypergeometric term $z_n$, say, such that
\[z_{n+1} - z_n = t_n. \tag{1} \]
If we can somehow find such a function $z_n$ then we will indeed have expressed the sum $s_n$ in the simple form of a single hypergeometric term plus a constant. Conversely, any solution $z_n$ will have the form
\[z_n = z_{n-1} + t_{n-1} = z_{n-2} + t_{n-2} + t_{n-1} = \cdots = z_0 + \sum_{k=0}^{n-1} t_k = s_n + c,\]
where $c = z_0$ is a constant.
Gosper's algorithm
Gosper's algorithm answers the following question: Given a hypergeometric term $t_n$, is there a hypergeometric term $z_n$ as above?
If the answer is affirmative, then $s_n$ can be expressed as a hypergeometric term plus a constant, and the algorithm outputs such a term. In this case we will say that $t_n$ is Gosper summable. On the other hand, if Gosper's algorithm returns a negative answer, then we will see that proves that (1) has no hypergeometric solution.
If $z_n$ is a hypergeometric term that satisfies (1) then the ratio
\[\frac{z_n}{t_n} = \frac{z_n}{z_{n+1} - z_n} = \frac{1}{\frac{z_{n+1}}{z_n} - 1}\]
is a rational function of $n$. So let
\[z_n = y(n)t_n, \tag{2}\]
where $y(n)$ is an (as yet unknown) rational function of $n$. Substituting $y(n)t_n$ for $z_n$ in (1) reveals that $y(n)$ satisfies
\[r(n)y(n+1) - y(n) = 1, \tag{3}\]
where $r(k) = \frac{t_{k+1}}{t_k}$ is as above. This is a first-order linear recurrence relation with rational coefficients and constant right hand side. Thus we have reduced the problem of finding hypergeometric solutions of (1) to the problem of finding rational solutions of (3).
It turns out that for every rational function a factorization of the form
\[r(n) = \frac{a(n)}{b(n)}\frac{c(n+1)}{c(n)},\] exists,
where $a(n)$, $b(n)$, $c(n)$ are polynomials in $n$, and
\[\gcd(a(n), b(n+h)) = 1, \quad \text{for all nonnegative integers } h. \tag{4}\]
Moreover, there exists an algorithm to find it, but I don't particularly want to dwell on this. The interested reader is invited to consult A=B.
We look for a nonzero rational solution of (3) in the form
\[y(n) = \frac{b(n-1)x(n)}{c(n)}, \tag{5}\]
where $x(n)$ is an unknown rational function of $n$. Substitution of the form of $r(n)$ and (5) into (3) shows that $x(n)$ satisfies
\[a(n)x(n+1) - b(n-1)x(n) = c(n). \tag{6}\]
And now, remarkably, lots of things simplify themselves.
Theorem 1: Let $a(n)$, $b(n)$, $c(n)$ be polynomials in $n$ such that equation (4) holds. If $x(n)$ is a rational function of $n$ satisfying (6), then $x(n)$ is a polynomial in $n$.
Proof: Let $x(n) = f(n)/g(n)$, where $f(n)$ and $g(n)$ are relatively prime polynomials in $n$. Then (6) can be rewritten as
\[a(n)f(n+1)g(n) - b(n-1)f(n)g(n+1) = c(n)g(n)g(n+1). \tag{7}\]
Suppose that the conclusion of the theorem is false. Then $g(n)$ is a non-constant polynomial. Let $N$ be the largest integer such that $\gcd(g(n), g(n+N))$ is a non-constant polynomial; note that $N \geq 0$. Let $u(n)$ be a non-constant irreducible common divisor of $g(n)$ and $g(n+N)$. Since $u(n-N)$ divides $g(n)$ it follows from (7) that
\[u(n - N) \mid b(n - 1)f(n)g(n+1).\]
Now $u(n-N)$ does not divide $f(n)$ since it divides $g(n)$, which is relatively prime to $f(n)$ by assumption. It also does not divide $g(n + 1)$, or else $u(n)$ would be a non-constant common factor of $g(n)$ and $g(n+N+1)$, contrary to our choice of $N$. Therefore $u(n - N) \mid b(n - 1)$ and hence $u(n+1) \mid b(n+N)$.
Similarly, it follows from (7) that
\[u(n + 1) \mid a(n)f(n+1)g(n).\]
Again, $u(n+1)$ does not divide $f(n+1)$ by assumption. It also does not divide $g(n)$, or else $u(n)$ would be a non-constant common factor of $g(n-1)$ and $g(n+N)$, contrary to our choice of $N$. Hence $u(n + 1) \mid a(n)$. But then, by the previous paragraph, $u(n + 1)$ is a non-constant common factor of $a(n)$ and $b(n+N)$, contrary to (4). This contradiction shows that $g(n)$ is constant, and so $x(n)$ is a polynomial in $n$. $\blacksquare$
Finding hypergeometric solutions of (1) is therefore equivalent to finding polynomial solutions of (6). The correspondence between them is that if $x(n)$ is a nonzero polynomial solution of (6) then
\[z_n = \frac{b(n-1)x(n)}{c(n)} t_n\]
is a hypergeometric solution of (1), and vice versa.
Again, there is an algorithm to find polynomial solutions, which the interested reader is invited to look up in A=B. This gives rise to the following, known as Gosper's algorithm.
Given a hypergeometric term $t_n$, do the following:
- Form the ratio $r(n)=\frac{t_{n+1}}{{t_n}}$ which is a rational function of $n$.
- Write $r(n)$ in the form of equation 4
- Find a nonzero polynomial solution x(n) of (6), if one exists, otherwise return $\sum_{i=0}^{n-1}t_k$ and stop.
- Return $\frac{b(n-1)x(n)}{c(n)}t_n$ and stop.
If we make it all the way through the algorithm the output is a hypergeometric term $z_n$ satisfying (1).
We now discuss how the algorithm failing is actually a proof of impossibility.
Question 2: Given a hypergeometric term $t_n$, how can we decide if the sum $s_n = \sum_n t_k$ is expressible as a linear combination of several (but a fixed number of) hypergeometric terms? For example, since $k!$ is not Gosper-summable we know that the sum $\sum_n k!$ cannot be expressed as a hypergeometric term plus a constant; but could it be equal to a sum of two, or three, or any fixed (independent of $n$) number of hypergeometric terms?
In considering sums of hypergeometric terms, an important role is played by the relation of similarity.
Definition: Two hypergeometric terms $s_n$ and $t_n$ are similar if their ratio is a rational function of $n$. In this case we write $s_n \sim t_n$.
Similarity is obviously an equivalence relation in the set of all hypergeometric terms. One equivalence class, for example, consists of all rational functions. It is a matter of routine algebra to show the following:
- If $s_n$ is a non-constant hypergeometric term then $s_{n+1} - s_n$ is a hypergeometric term similar to $s_n$.
- Let $s_n$ and $t_n$ be hypergeometric terms such that $s_n + t_n \neq 0$. Then $s_n + t_n$ is hypergeometric if and only if $s_n \sim t_n$.
- Let $t^{(1)}_n, t^{(2)}_n, \ldots, t^{(k)}_n$ be hypergeometric terms such that\[\sum_{i=1}^k t^{(i)}_n = 0. \] Then $t^{(i)}_n \sim t^{(j)}_n$ for some $i$ and $j$, $1 \leq i < j \leq k$.
- Every sum of a fixed number of hypergeometric terms can be written as a sum of pairwise dissimilar hypergeometric terms.
Let $t_n$ be a hypergeometric term and $r(n) = t_{n+1}/t_n$ its rational consecutive-term ratio. Let
\[r(n) = \frac{A(n)}{B(n)}\frac{C(n+1)}{C(n)} \quad \text{and} \quad \frac{B(n)}{A(n)} = \frac{a(n)}{b(n)}\frac{c(n +1)}{c(n)} \]
be the canonical factorizations of $r(n)$ and of $B(n)/A(n)$, respectively (which we asserted exist above). Then $t_n$ is a rational function of $n$ if and only if $A(n)$ is monic and $a(n) = b(n) = 1$.
Suppose that $\sum_{k=0}^{n-1} t_k = a^{(1)}_n + a^{(2)}_n + \cdots + a^{(m)}_n + C$ where $a^{(i)}_n$ are hypergeometric terms. By item 4 we can assume that these terms are pairwise dissimilar. Let $\Delta$ be the difference operator defined such that $\Delta a_n= a_n-a_{n-1}$. Then $t_n = \Delta a^{(1)}_n + \cdots + \Delta a^{(m)}_n$. The nonzero terms on the right are pairwise dissimilar. By item 3, there can be at most one nonzero term on the right. It follows that $m$ above is at most 2, and if it is 2 then one of $a^{(1)}_n$, $a^{(2)}_n$ must be a constant. So the answer to question 2 is as follows.
Theorem 3: If Gosper's algorithm does not succeed then the given sum cannot be expressed as a linear combination of a fixed number of hypergeometric terms (i.e., the sum is not expressible in closed form).
We can in fact decide the following:
Question 4: Given a linear combination $c_n$ of hypergeometric terms, how can we decide if the sum $s_n = \sum_n c_k$ is expressible in the same form, that is, as a linear combination of hypergeometric terms? Note that such a combination may be Gosper-summable even though its individual terms are not. For example, take $t_{k+1}-t_k$ where $t_k$ is a hypergeometric term which is not Gosper-summable.
Here is an algorithm: given hypergeometric terms $t^(1)_n, \dots t^(p)_n$, do the following:
- Write $\sum_{j=1}^p t_n^{(j)} = \sum_{j=1}^qu_n^{(j)}$ where the $u_n^{(j)}$ are pairwise dissimilar.
- For each $j =1 \dots q$ use Gosper's algorithm to find $s_n(j)$ such that $\Delta s_n^{(j)}=u_n^{(j)}$ . If Gosper's algorithm does not succeed then let $s_n^{(j)}=\sum_{k=0}^{n-1}u_k^{(j)}$
- Return $\sum_{j=1}^qs_n^{(j)}$ and stop.
The output will be the required discrete functions which will be hypergeometric terms if this is at all possible.
Example: Does the sum of the first $n+1$ factorials
\[S_n = \sum_{k=0}^n k!\]
have a closed form? Here $t_n = n!$ and $r(n) = t_{n+1}/t_n = n + 1$, so we can take $a(n) = n+1$, $b(n) = c(n) = 1$. The equation (6) is
\[(n + 1)x(n+1) - x(n) = 1,\]
Since $\deg a(n) \neq \deg b(n)$, the leading terms on the left of (6) do not cancel. Hence the degree of the left hand side of (6) is $d+max\{\text{deg }a(n), \text{deg }b(n)\}$. Since the degree of the right hand side is deg $c(n)$, it follows that $d =\text{deg }c(n)- max\{\text{deg }a(n), \text{deg }b(n)\}$, which is $\deg c(n) - \deg a(n) = -1$, so (6) has no nonzero polynomial solution in this case, proving that our sum cannot be written as a hypergeometric term plus a constant.
Wilf-Zeilberger algorithm
Gosper's algorithm gave us a way of summing finitely many things. Suppose instead we want to prove an identity $\sum_k F(n,k) = r(n)$.(but crucially here we know what the identity should be! To find what this $r(n)$ should be without knowing it before there are other algorithms that are exposed in A=B). When $F(n,k)$ is hypergeometric in both arguments, i.e. $\frac{F(n+1,k)}{F(n,k)}$ and $\frac{F(n,k+1)}{F(n,k)}$ are both rational functions in $n, k$, in many cases there is a very quick way of proving such identities known as the Wilf-Zeilberger algorithm, which we now describe.
First, if the right side, $r(n)$, is nonzero, divide through by that right hand side, and write the identity that is to be proved as
\[\sum_k \left(\frac{F(n,k)}{r(n)}\right) = 1.\]
That being done, we might as well just think of $F(n,k)/r(n)$ as having been the original summand. In other words, we can now assume, without loss of generality, that the identity that we're trying to prove is
\[\sum_k F(n,k) = \text{const}. \tag{8}\]
Let's call the left hand side $f(n)$. So we're trying to prove that $f(n) = \text{const}$ for all $n$. We will attempt to show that $f(n+1) - f(n) = 0$ for all $n$.
A good way to certify the fact that $f(n+1) - f(n) = 0$ for all $n$ would be to display a function $G(n,k)$ such that
\[F(n+1;k) - F(n,k) = G(n;k+1) - G(n,k), \tag{9}\]
for then we would simply sum (9) over all integers $k$ to find that, under suitable hypotheses, indeed $f(n+1) - f(n)$ is always $0$.
A pair of functions $(F,G)$ that satisfy (9) is called a WZ pair.
Begin with your summand $F(n,k)$ from (8). Now form the difference $D = F(n+1;k) - F(n,k)$, and collect and simplify it as much as possible.
This difference $D$ depends on $n$ and $k$ and the various parameters that may have been in the summand. Let's think of $n$, for the time being, as one of the parameters, so we won't explicitly name it as one of the variables that $D$ depends on. That means that $D$ is a function of $k$ alone, so call it $D(k)$.
Take $D(k)$ and give it to Gosper's algorithm. If possible, Gosper's algorithm will produce a function $g(k)$ such that $D(k)=g(k+1)-g(k)$, and in practice it often does, although when exactly it does seems to be a bit of a mystery. That function $g(k)$ will have the parameter $n$ in it, so let's rename $g(k)$ to $G(n,k)$. This $G(n,k)$ does exactly what we wanted it to do, namely it is the WZ-mate for $F$ in equation (9).
Furthermore, by equation (2), $G/F$ is a rational function.
The theorem which underpins this algorithm requires the following hypotheses
- [(F1)] For each integer $k$, the limit \[ f_k = \lim_{n\to\infty} F(n,k) \tag{10}\] exists and is finite.
- [(G1)] For each integer $n \geq 0$, we have \[\lim_{k\to\pm\infty} G(n,k) = 0.\]
- [(G2)] We have $\lim_{L\to\infty} \sum_n G(n;-L) = 0$.
Theorem 5: Let $(F,G)$ satisfy equation (9).
- If (G1) holds, then \[ \sum_k F(n,k) = \text{const} \quad (n=0,1,2,\ldots). \tag{11}\]
- If (F1) and (G2) both hold, then we have the companion identity \[\sum_n G(n,k) = \sum_{j\leq k-1} (f_j - F(0;j)),\] where $f$ is defined by (10).
Proof: We use the symbol $\Delta_n$ for the forward difference operator on $n$: $\Delta_n h(n) = h(n+1) - h(n)$. Sum both sides of equation (9) from $k = -L$ to $k = K$, getting
\[\Delta_n \left(\sum_{k=-L}^K F(n,k)\right) = \sum_{k=-L}^K \{\Delta_k G(n,k)\} = G(n,K+1) - G(n,-L).\]
Now let $K,L \to \infty$ and use (G1) to find that $\Delta_n \sum_k F(n,k) = 0$; i.e., that $\sum_k F(n,k)$ is independent of $n \geq 0$, which establishes (11).
If we sum both sides of (9) from $n=0$ to $N$, we get
\[F(N+1;k) - F(0;k) = \Delta_k \left(\sum_{n=0}^N G(n,k)\right).\]
Now let $N \to \infty$ and use (F1) to get
\[f_k - F(0;k) = \Delta_k \left(\sum_n G(n,k)\right).\]
Replace $k$ by $k'$, sum from $k'=-L$ to $k' = k-1$, let $L \to \infty$, and use (G2) to obtain the companion identity, completing the proof. $\blacksquare$
This gives a quick way of proving an identity $\sum_k f(n,k)=r(n)$ from its WZ certificate $R(n,k)$:
- If $r(n)$ isn't 0, set $F(n,k):=f(n,k)/r(n)$; otherwise set $F(n,k):=f(n,k)$. Also set $G(n,k):=R(n,k)F(n,k)$.
- Just check (9) and verify the identity for one value of $n$
Since the WZ equation is symmetric in $n,k$ the proof in fact shows that the companion identity is true, which gives us new identities for free. There are more ways of doing this. The most prominent is the dual identity (which is a true dual since the mapping from an identity to its companion is not an involution in general).
Suppose $F(n,k)$ is a summand of the form
\[F(n,k) = x^n y^k \rho(n,k) \frac{\prod_{i}(a_i n + b_i k + c_i)!}{\prod_{i}(u_i n + v_i k + w_i)!},\]
where $\rho$ is a rational function of $n,k$. Here's how to construct the dual identity.
Find any factor $(an+bk+c)!$ that appears, say, in the numerator of $F$. Remove that factor from the numerator of $F$, and place in the denominator of $F$ a factor $(-1-an-bk-c)!$. Then multiply $F$ by $(-1)^{an+bk}$. By doing this to $F$ we will have changed $F$ to a new function, say $\tilde{F}$. Perform exactly the same operation on the WZ mate, $G$, of $F$, getting $\tilde{G}$.
We claim that $\tilde{F}$, $\tilde{G}$ are still a WZ pair.
To see why, begin with the reflection formula for the gamma function,
\[\Gamma(z)\Gamma(1-z) = \frac{\pi}{\sin \pi z}.\]
Thus we have
\[(an+bk+c)! = -\frac{\pi}{\sin\left(\pi(an+bk+c)\right)(-1-an-bk-c)!}.\]
It follows that if we take some term $F$, and divide it by $(an+bk+c)!$, and then divide it by $(-1)^{an+bk}(-1-an-bk-c)!$, then we have in effect multiplied the term $F$ by the factor
\[\left\{(-1)^{an+bk+1}\frac{\sin\left((an+bk+c)\pi\right)}{\pi}\right\}.\]
But the remarkable thing about this latter factor is that it is a periodic function of $n$ and $k$ of period 1. Consequently, if we perform this operation
\[(an+bk+c)! \longrightarrow \frac{(-1)^{an+bk}}{(-an-bk-c-1)!} \]
on both $F$ and $G$, then the basic WZ equation
\[F(n+1,k) - F(n,k) = G(n,k+1) - G(n,k)\]
will still hold between the new functions since $F,G$ will have just been multiplied by functions of period 1.
For other methods, see A=B and the references within. Often the identities proved this way aren't interesting or aesthetic (to me at least), but what is most interesting is that although one identity might be a special case of another, the companions and duals might not be special cases of the corresponding companions and duals, which means that at least in principle one could generate new identities ad infinitum.
What's appealing about this is partly how clever it is, and partly how useful it is, and arguably partly because a computer can now produce the rational function $R(n,k)$ and prove identities. The authors of A=B say that this opens a new chapter in the theory of hypergeometric functions and their q-analogues, which is definitely true. They do muse upon whether the various fields of mathematics that have been generated by proving identities and wondering what to think about the proofs would exist if this method had been around earlier. I think that's interesting food for thought, but we now have this tool so we can only speculate how mathematics would be different without it. I think the main thing to bear in mind is that when using computers it is most important not to forget to think (sadly a mistake some people do make), and if we try to understand the proof certificates I hope that we will discover what we would've discovered anyway, but faster.
Comments
Post a Comment