## Relations as Finite Varieties

### January 2, 2014

It is common wisdom that database field is firmly grounded in the two math disciplines: predicate logic and set theory. However, neither logic nor set theory are dominant math subjects. Just counting tags at mathoverflow.net can give rough idea of the importance of an individual math topic, and it is evident that *algebraic geometry* governs the math world. This field is so rich that it spawned new sub-fields, such as category theory. Furthermore, *mathoverflow.net* hints that category theory is more popular than either predicate logic or set theory! Some category theorists even proposed to rebuild the entire math foundation with category theory toolkit…

Does algebraic geometry offer any insights to database theory? First, we recall what basic mathematical objects of algebraic geometry are. The basic geometric object is *affine variety* — a system of polynomial equations. Affine variety is a set of tuples in , and our focus on database applications prompts that we must narrow our scope to finite varieties, when polynomial system has finite number of roots.

OK, this abstract motivation is fine, but given a [database] relation, how do we construct a variety out of it? This is accomplished after we learn the two fundamental operations over varieties.

1. Set intersection of two varieties and is a set which is, again, a variety. The defining set of polynomials for is a union of the polynomial sets defining the and .

2. Set union of two varieties and is a set which is, again, a variety. The defining set of polynomials for is a set of all pairwise products of the polynomials from and .

Union and intersection of varieties is exactly what is needed for being able to exhibit a variety corresponding to a relation. Consider an unary relations with a single tuple

[x] 1

Geometrically, it corresponds to a single point on the x axis, so it becomes immediately obvious what equation defines it:

Likewise, the relation

[y] 1

is defines by

Next, we construct join of the two relations

[x y] 1 1

Join is set intersection, and intersection of varieties gives us the system of defining equations

Let’s expand our example and add one more attribute:

[x y z] 1 1 1

Our system of equations grows with one more constraint:

Now, knowing how to construct “single tuple varieties”, we are ready to move onto relations with more than one tuple. This is accomplished via union. Consider a relation

[x y z] 1 1 1 2 1 1

which is a union of already familiar relation

[x y z] 1 1 1

with

[x y z] 2 1 1

To build the union of varieties we need a polynomial system defining the second variety

The polynomial system for the union is

Wait a minute, 9(!) equations for a relation with just couple of tuples — this doesn’t look promising… The critical observation is that not all of these equations are independent. A well known method to greatly reduce polynomial system is Groebner basis. Executing the following command at Wolfram Alpha:

`GroebnerBasis[{(x-1)(x-2), (y-1)(x-2), (z-1)(x-2),`

(x-1)(y-1), (y-1)(y-1), (z-1)(y-1),

(x-1)(z-1), (y-1)(z-1), (z-1)(z-1)} ,{z, y, x}]

we obtain the system

As an afterthought, this result is obvious. The first equation constraints `x`

to being either 1 or 2, the second equation asserts that `y`

is equal to 1, while the third one asserts `z=1`

. It is good to learn the general method, though.

With this technique we can proceed and find a variety corresponding to the relation

[x y z] 1 1 1 2 1 1 3 2 1 3 2 2

An inquisitive reader might have already noticed functional dependency `x -> y`

lurking in plain view. In the next installment we’ll investigate what functional dependency is from algebraic geometry perspective.

January 18, 2014 at 7:14 am

Is this representation column-order-independent? If yes, is it possible to demonstrate so? If no, is it possible to make it column-order-independent as described for example in chapter “Symmetric Functions” here: vadimtropashko.files.wordpress.com/2014/01/book_sql_chap4_v2.pdf ?

January 23, 2014 at 7:25 pm

I don’t understand the question. Multivariate polynomials correspond to relations in named perspective (“named” vs. “unnamed” is the Alice book terminology). In other words, attributes are referred by variable names and not by positions.

The referred “Symmetric Functions” book section introduced a fancy unique constraint. Here is how to interpret it in the new setting. Equip the Boxes relation with 6 attributes: length, height, depth, A, B, C, where A, B, and C are functionally dependent of length, height, depth via symmetric polynomials. The uniqueness constraint says that {A, B, C} is a key, or in other words we have functional dependency {A, B, C}->{A, B, C, length, height, depth}. What a functional dependency is in geometric terms is covered here.