## Analytic View of Functional Dependency

### January 3, 2014

A recurring theme of this blog is that there is more to a relation than being merely a set of tuples. First, a relation is an object which obeys certain rules. The structure of this object is less important as long as it satisfies relational algebra axioms. Second, in previous post we have learned that relations are systems of constraints. The idea that a relation can be viewed as a set of constraints has been promoted in the 90s by Paris Kanellakis. Our intuition, however, is that by narrowing scope to equalities we can leverage powerful machinery of algebraic geometry.

Let’s examine the example that we ended up last time. The relation

```[x y z]
1 1 1
2 1 1
3 2 1
3 2 2
```

corresponds to the variety

$-6 + 11 x - 6 x^2 + x^3 = 0$
$-4 + 3 x - x^2 + 2 y = 0$
$3 - x - 3 z + x z = 0$
$2 - 3 z + z^2 = 0$

which is convenient to rewrite with some equations factored

$(x-3)(x-2)(x-1) = 0$
$-4 + 3 x - x^2 + 2 y = 0$
$(x-3)(z-1) = 0$
$(z-2)(z-1) = 0$

The first and the last equations are univariate. They assert that values x belong to the set {1,2,3}, and values z are restricted to the set {1,2}. The third equation asserts that either x=3 or z=1. What is the second constraint?

Before answering that question, though, let’s comment that these equation are akin to “columnar” constraints. This is refreshing but somewhat perverse perspective to what “column-store” database formally is.

Returning back to the second equation we see that y is limited in power to 1. Therefore, it can be rewritten as

$y = (-4 + 3 x - x^2)/2$

In other words, y is a function of x. This is just an example of general phenomenon: given a functional dependency `x->y`, one can always apply polynomial interpolation, thus explicitly expressing y as a function of x. That is right, in algebraic geometry view functional dependency holds whenever we can exhibit an analytic function.

Next, Heath’s theorem declares that given a relation Q such that `Q ^ [] = [x y z]`, and functional dependency `x->y`, then Q can be decomposed into join of projections:

`Q = (Q v [x y]) ^ (Q v [x z])`

What is Heath’s theorem in algebraic geometry terms?

First, let’s focus on join of two relations. If both relations have the same set attributes, then join is set intersection of tuples. Relation corresponds to finite variety, and we already know how to perform intersection of varieties: just combine both sets of defining equations. If two varieties have different sets of attributes, then we can expand them into larger space spanning the common set of attributes. This procedure doesn’t affect the set of defining equations. For example, if we consider a variety defined by a single equation

$(x-3)(x-2)(x-1) = 0$

in $R^1$, then it is also a variety in $R^2$ — there is simply no constraints onto the other variable. Likewise, the variety defined by the system

$(x-3)(x-2)(x-1) = 0$
$-4 + 3 x - x^2 + 2 y = 0$

is actually two varieties (at least): one defined in space $R^2$ of variables {x,y}, and the other in space $R^3$ of variables {x,y,z}. In a similar venue, the system

$(x-3)(x-2)(x-1) = 0$
$(x-3)(z-1) = 0$
$(z-2)(z-1) = 0$

defines one variety in {x,z}, and the other in {x,y,z}.

What about going the other way: can we take a variety in space of the two variables x and y and project it into x?

Unfortunately, in general, a projection of a variety is not a variety, as witnessed by the following counterexample:

$x^2 + y^2 - 1 = 0$

Elimination of variables is the proper way to project a variety. Fortunately, in our example, variety in space $R^2$

$(x-3)(x-2)(x-1) = 0$
$-4 + 3 x - x^2 + 2 y = 0$

is projection of variety defined by the same equations in space $R^3$ : the variable z being not present in the system makes it elimination trivial.

Let’s summarize what we have achieved so far. We have split 4 equations defining a variety in $R^3$ into two parts:

$(x-3)(x-2)(x-1) = 0$
$-4 + 3 x - x^2 + 2 y = 0$

and

$(x-3)(x-2)(x-1) = 0$
$(x-3)(z-1) = 0$
$(z-2)(z-1) = 0$

We can view both systems either in the same space $R^3$, or project the first one to {x,y}, and the second one into {x,z}. If we look onto this pair of varieties in $R^3$, then the result is geometrically intersection of varieties. If we consider them as residing in different $R^2$ spaces (i.e. {x,y} and {x,z}), then it is a new operation: [natural] join of varieties. These “projected” varieties correspond to relations

```[x y]
1 1
2 1
3 2
```
```[x z]
1 1
2 1
3 1
3 2
```

The first one is immediate, because the roots of the first cubic univariate equation in x are the domain of three values {1,2,3}. The second equation is a function unambiguously defining each (x,y) value pair.

For the second system its correspondence to the relation is little less obvious. A brute force method to check it is Wolfram Alpha/Mathematica command

`Solve[(x - 3)(x - 2)(x - 1) == 0 && (x - 3)(z - 1) == 0 && (z - 2)(z - 1) == 0, {x, y, z}]`

Checking it manually is not hard either. The last constraint assets that z is either 1 or 2. If it is 1, then both the second and the third constraint are satisfied, so x can be either of the three values {1,2,3} from the first constraint. If z = 2, then x has to be 3 in order to satisfy the second constraint.

There is one last detail, that we have missed. In general case, what if we calculated Groebner basis, but still have the two constraints: one in all three variables {x,y,z}, and the other one in {y,z} — sure we can’t decompose the system anymore? Well, the functional dependency would still be there; then we express y as an explicit function of x, and eliminate it in both equations by substitution.

To summarize, the interpretation of Heath’s theorem in algebraic geometry setting is quite intuitive. Most of the effort here has been spent establishing correspondence between join of relations and intersection/join of varieties. Heath’s theorem itself can be eloquently described in few steps:

• Write down the system of constraints defining the relation
• Partition all constraints into 3 groups:
1.  Unary [polynomial] constraint defining the values of functional dependency’s domain. In our example its $(x-3)(x-2)(x-1) = 0$
2.  Functional dependency itself, e.g. $y = (-4 + 3 x - x^2)/2$
3.  The rest of the system
• Eliminate the dependent variable in 3 by substitution.
• We have required decomposition of the constraint system into {1,2} and {1,3}.