## Translating Relational Lattice Expressions into Predicate Calculus

### March 29, 2014

Relational Lattice is a simplified, “attribute-free” Relational Algebra. It allows one to axiomatize Relational Algebra in the same spirit as Relation Algebra. Here is the axiom system by Litak, Hidders & Mikulas:

x ^ y = y ^ x.
(x ^ y) ^ z = x ^ (y ^ z).
x ^ (x v y) = x.
x v y = y v x.
(x v y) v z = x v (y v z).
x v (x ^ y) = x.


x ^ (y v z) = (x ^ (z v (R00 ^ y))) v (x ^ (y v (R00 ^ z))).
(x ^ y) v (x ^ z) = x ^ ( ((x v z) ^ y) v ((x v y) ^ z) ).
(R00 ^ (x ^ (y v z))) v (y ^ z) = ((R00 ^ (x ^ y)) v z) ^ ((R00 ^ (x ^ z)) v y).

The authors hint that each individual axiom can be proved “by direct calculation”. Let’s prove them by converting each equality into an equivalent Predicate Calculus formula.

The first snag is that Relational Lattice expressions don’t refer to [relation] attributes. We have to equip each individual relation variable with attributes. This is an inherently ambiguous step, although one observation might help.

Consider an expression of two variables x and y, e.g. x ^ (x v y). There might be more than one common attribute for x and y, but when calculating the result of either operation, join ^ or meld v, two common attributes match the same way as a single composite attribute. For example, for all practical purposes joining by two conditions matching first and last name is the same as matching by their concatenation, the full name. Therefore, when considering expressions with two variables, we equip x with two attributes: p1, which is in x but not in y, together with p2, which is in both x and y. Likewise, we equip y with two attributes: p2, which is in both x and y, together with p3, which is in y but not in x. This canonical attribute introduction generalizes to any number of relvars.
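The generalization can be made concrete: for n relvars, introduce one attribute per nonempty subset of relvars, shared by exactly the members of that subset. A minimal Python sketch (the p_x/p_xy naming scheme is mine, not QBQL’s):

```python
from itertools import combinations

def canonical_attributes(relvars):
    """Equip each relvar with one attribute per nonempty subset of relvars
    containing it; a subset's attribute is shared by exactly its members."""
    attrs = {r: [] for r in relvars}
    for k in range(1, len(relvars) + 1):
        for subset in combinations(sorted(relvars), k):
            a = "p_" + "".join(subset)   # e.g. p_xy is in both x and y
            for r in subset:
                attrs[r].append(a)
    return attrs

print(canonical_attributes(["x", "y"]))
# {'x': ['p_x', 'p_xy'], 'y': ['p_y', 'p_xy']}
```

For two relvars this reproduces the p1/p2/p3 scheme above (p_x plays the role of p1, p_xy of p2, p_y of p3).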

Now that all relvars are equipped with attributes, we can translate complex Relational Lattice expressions into Predicate Calculus as well. The join operation is straightforward: it is Boolean conjunction, and the set of free variables in the resulting formula is the union of the free variables of the operands. The meld operation is Boolean disjunction; however, we must add existential quantification over the arguments which are in the first operand but absent from the second, and the arguments which are in the second operand but not the first (the symmetric difference). For example,

x v y

translates to

exists p1 exists p3 x(p1,p2) | y(p2,p3)

The translation procedure is entirely mechanical and begs for automation. In QBQL you just write an axiom into the input file qbql/lang/parse/RL.axiom, then run qbql.lang.parse.Translate.main() and, finally, witness the translated expression in the output. The result can then be fed directly into Prover9.
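The two rules are simple enough to sketch in a few lines of Python (a toy illustration, not QBQL’s actual implementation; the tuple encoding of expressions is mine):

```python
# Toy translator from Relational Lattice expressions to Predicate Calculus.
# An expression is ("var", name, attrs) or (op, left, right), op in {"^", "v"}.

def free(e):
    """Free variables of the translated formula."""
    if e[0] == "var":
        return set(e[2])
    _, x, y = e
    # join keeps the union of free variables; meld quantifies away the
    # symmetric difference, leaving the intersection
    return free(x) | free(y) if e[0] == "^" else free(x) & free(y)

def translate(e):
    if e[0] == "var":
        return f"{e[1]}({','.join(sorted(e[2]))})"
    op, x, y = e
    if op == "^":                      # join -> Boolean conjunction
        return f"({translate(x)} & {translate(y)})"
    # meld -> disjunction, existentially quantifying the symmetric difference
    q = "".join(f"exists {v} " for v in sorted(free(x) ^ free(y)))
    return f"({q}({translate(x)} | {translate(y)}))"

x = ("var", "x", ["p1", "p2"])
y = ("var", "y", ["p2", "p3"])
print(translate(("v", x, y)))   # (exists p1 exists p3 (x(p1,p2) | y(p2,p3)))
```

Feeding it the absorption law’s right-hand side, translate(("v", x, ("^", x, y))), produces the quantified disjunction used in the proof below, modulo attribute naming.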

Let’s try this algorithm for several axioms. The absorption law

x v (x ^ y) = x.

translates into the predicate equivalence

exists p2(x(p1,p3) | (x(p1,p3)&y(p2,p3))) <-> x(p1,p3). 
The Prover9 proof of the latter is remarkable:

1 (exists p2 (x(p1,p3) | x(p1,p3) & y(p2,p3))) <-> x(p1,p3) # label(non_clause) # label(goal).  [goal].
2 $F.  [deny(1)].

It is certainly shorter than the proof in one of the earlier papers, and warrants an explanation, to say the least. Prover9’s succinct output is the result of several steps done internally, such as Skolemization, rewriting to Conjunctive Normal Form, etc. In this case, Skolemization followed by the CNF transformation produced the value false, thus proving the theorem.

Let’s investigate more relational operations. Relational complement translates into Boolean negation, while inversion is a little more sophisticated. Here is a theorem which can be considered a definition of R00 (better known in the D&D community as DUM):

(y' v y)' = R00.

It translates into

-( exists p1 (-(y(p1)) | ( exists p1 y(p1) )) ) <-> $F.

where one doesn’t need the help of any automated proof system to be convinced of its validity. Here we see yet again that R00 is the analog of the nullary predicate “false”.

Now that we have established the conversion of Relational Lattice expressions into Predicate Calculus, it is natural to ask whether we can translate in the opposite direction. Consider an arbitrary calculus formula and let’s drill down into its structure. There are several possibilities for what syntactic constructions we may encounter at each level.

If we come across an existential quantifier, we introduce projection, i.e. meld with an empty relation. The universal quantifier is dual to the existential one via De Morgan’s law; there we have to introduce negation, which is domain dependent. If we encounter Boolean conjunction, we introduce [natural] join (leaving all free variables unaffected). The only step which is not straightforward is handling disjunction. We translate it into the outer union, AKA <OR>. Outer union is a domain-dependent operation, but, as you see, ignoring domain dependence rewarded us with a very succinct proof of Codd’s theorem. Free variables flow unobstructed through outer union, similarly to natural join, which is another reason why this transformation step was so easy.
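This direction is mechanical as well; a recursive translator can be sketched in a few lines (a toy illustration: the tuple encoding of formulas and the bracketed empty-relation literal [p2 p3] are my assumptions, not QBQL syntax):

```python
# Formulas: ("atom", relvar, attrs), ("and", f, g), ("or", f, g),
# ("not", f), ("exists", var, f)

def free(f):
    """Free variables of a calculus formula."""
    if f[0] == "atom":
        return set(f[2])
    if f[0] == "exists":
        return free(f[2]) - {f[1]}
    if f[0] == "not":
        return free(f[1])
    return free(f[1]) | free(f[2])

def to_lattice(f):
    op = f[0]
    if op == "atom":
        return f[1]
    if op == "and":                    # conjunction -> natural join
        return f"({to_lattice(f[1])} ^ {to_lattice(f[2])})"
    if op == "or":                     # disjunction -> outer union
        return f"({to_lattice(f[1])} <OR> {to_lattice(f[2])})"
    if op == "not":                    # negation -> complement (domain dependent)
        return f"({to_lattice(f[1])})'"
    # exists v -> projection: meld with an empty relation whose header
    # is the remaining free variables
    return f"({to_lattice(f[2])} v [{' '.join(sorted(free(f)))}])"

X = ("atom", "x", ["p1", "p2"])
Y = ("atom", "y", ["p2", "p3"])
print(to_lattice(("exists", "p1", ("and", X, Y))))   # ((x ^ y) v [p2 p3])
```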

The last step is rewriting outer union in terms of standard lattice operations using one of the following equivalent definitions:

x <OR> y = (x ^ R11) v (y ^ (x v R11)).
x <OR> y = (R11 ^ y) v (x ^ y) v (x ^ R11).
x <OR> y = (x ^ (y v R11)) v (y ^ (x v R11)).
x <OR> y = (x' ^ y')'.
x <OR> y = ((R00 ^ x) ^ y) v (R11 ^ y) v (x ^ R11).
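The equivalence of these definitions can be spot-checked mechanically by modeling relations as header/tuple-set pairs over a finite domain. A minimal sketch, assuming a two-value domain and a three-attribute universe of my own choosing:

```python
from itertools import product

DOM = (0, 1)               # tiny value domain (my assumption)
ATTRS = ("a", "b", "c")    # attribute universe (my assumption)

def rows_over(header):
    """All tuples over a header, encoded as sorted (attribute, value) pairs."""
    hs = sorted(header)
    return {tuple(zip(hs, vals)) for vals in product(DOM, repeat=len(hs))}

def proj(t, header):
    return tuple((a, v) for a, v in t if a in header)

def join(x, y):            # ^ : natural join, header is the union
    hx, tx = x
    hy, ty = y
    common = hx & hy
    rows = {tuple(sorted(set(s) | set(t)))
            for s in tx for t in ty if proj(s, common) == proj(t, common)}
    return (hx | hy, rows)

def meld(x, y):            # v : project both onto the common header, then union
    hx, tx = x
    hy, ty = y
    h = hx & hy
    return (h, {proj(t, h) for t in tx} | {proj(t, h) for t in ty})

def comp(x):               # ' : complement relative to x's own header
    h, tx = x
    return (h, rows_over(h) - tx)

R00 = (frozenset(), set())                    # DUM: empty header, no tuples
R11 = (frozenset(ATTRS), rows_over(ATTRS))    # universal relation

x = (frozenset("ab"), {(("a", 0), ("b", 1)), (("a", 1), ("b", 1))})
y = (frozenset("bc"), {(("b", 1), ("c", 0))})

defs = [
    meld(join(x, R11), join(y, meld(x, R11))),
    meld(meld(join(R11, y), join(x, y)), join(x, R11)),
    meld(join(x, meld(y, R11)), join(y, meld(x, R11))),
    comp(join(comp(x), comp(y))),
    meld(meld(join(join(R00, x), y), join(R11, y)), join(x, R11)),
]
assert all(d == defs[0] for d in defs)   # all five definitions agree
```

This is only a finite sanity check on one pair of relations, of course, not a proof.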

## Implicit and Explicit dependencies

### February 18, 2012

Shadows of the Truth by Alexandre Borovik offers an interesting perspective on the human learning experience. The book abounds with examples of concepts that are challenging at an early age but clarified later (sometimes much later) in life. One of my stumbling blocks was the definition of implicit and explicit dependencies. Now, with the modest help of relational thinking (that is, basic dependency theory), I can report some progress.

The intuition behind implicit and explicit dependencies is clear. For example, given the two functions

$y(x,t) = x^2 + t$
$x(t) = t^2$

then, in the formula for $y$ we notice two variables $x$ and $t$, which suggests that $y$ explicitly depends on $t$. Compare it with

$y(x,t) = x^2$
$x(t) = t^2$

where the formula for $y$ involves the variable $x$ only. Since, on the second line, $x$ is expressed in terms of $t$, $y$ still depends on $t$, but the dependence is implicit.

The concept of implicit and explicit dependencies surfaces in many places, for example Partial Differential Equations and Noether’s conservation theorems, both of which are part of the undergraduate math and physics curriculum. Nevertheless, most textbooks take this concept for granted, perhaps implying that a mathematically mature reader should have no problem understanding it. Wikipedia offers a couple of dedicated articles: Dependent and Independent Variables, giving overview and intuition, and the more ambitious Time-Invariant System, with an attempt at a formal definition.

The concept of a time-invariant system belongs to physics, the idea being that shifting the time variable $t$ doesn’t affect a time-invariant system’s behavior. This is illustrated by a “formal example” in the middle of the page, where by comparing values of $y(x,t)$ with the two arguments $t$ vs. $t+\delta$ they suggest that $y(x,t) = t x(t)$ is time-dependent, while $y(x,t) = 10 x(t)$ is not. Compared to math, physics standards of rigor are lax, so it takes little effort to find a flaw. Take $x(t) = t$; then $y(x,t) = t x(t) = {x(t)}^2$, so $y(x,t)$ is time-invariant with a proper choice of $x(t)$!

Can relational dependency theory suggest any insight? By glancing over the standard definitions of functional dependency:

$\forall y_{1} \forall y_{2} \forall x : S(x, y_{1}) \wedge S(x, y_{2}) \implies y_{1} = y_{2}$

and independence:

$\exists y_{1} \exists y_{2} \exists x : S(x, y_{1}) \wedge S(x, y_{2}) \wedge y_{1} \not= y_{2}$

it becomes obvious that dependency concepts hinge upon equality/inequality (amended, perhaps, with some quantifiers) and not upon the domain’s algebraic structure (time shifts). Let’s examine two examples closely:

$y(x,t) = t-x$
$x(t) = t^2-2 t$

vs.

$y(x,t) = x^2-3 x$
$x(t) = t^2-2 t$

Tabulating values at points $t=0,1,2,3$ we’ll get relations

R=[t  x  y]
0  0  0
1  -1  2
2  0  2
3  3  0
;


and

S=[t  x  y]
0  0  0
1  -1  4
2  0  0
3  3  0
;


respectively. The second relation S honors the FDs t->x and x->y (and, by Armstrong transitivity, t->y), while the first one, R, honors only t->x and t->y. Therefore, the formal counterpart of the variable y [explicitly] depending on t is the absence of the functional dependency x->y: if not counterintuitive, then terminologically confusing, to say the least!
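The dependency claims above are easy to verify mechanically; a minimal sketch (the dict encoding of rows is mine):

```python
def holds_fd(rows, lhs, rhs):
    """True iff the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# R tabulates y = t - x with x = t^2 - 2t at t = 0..3
R = [{"t": 0, "x": 0, "y": 0}, {"t": 1, "x": -1, "y": 2},
     {"t": 2, "x": 0, "y": 2}, {"t": 3, "x": 3, "y": 0}]
# S tabulates y = x^2 - 3x with x = t^2 - 2t at t = 0..3
S = [{"t": 0, "x": 0, "y": 0}, {"t": 1, "x": -1, "y": 4},
     {"t": 2, "x": 0, "y": 0}, {"t": 3, "x": 3, "y": 0}]

assert holds_fd(R, ["t"], ["x"]) and holds_fd(R, ["t"], ["y"])
assert not holds_fd(R, ["x"], ["y"])   # x = 0 co-occurs with y = 0 and y = 2
assert holds_fd(S, ["t"], ["x"]) and holds_fd(S, ["x"], ["y"])
```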

## TABLE_DUM divided by Relation

### September 1, 2011

Here is a curious identity:

TABLE_DUM divided by a relation evaluates to the relation’s complement.

Formally in QBQL:

TABLE_DUM /= x = <NOT> x.

What division am I talking about? Googling “relational division” brings up a list of the usual suspects (articles by Date, Celko, etc.), with some confusion leading to “Todd’s division”, a ternary (!) division operation, and so on. QBQL follows the much cleaner definitions of set equality join, set containment join and the like, readily found in academic literature. In the above identity, “/=” is the binary set equality join operation, informally known as relational division.

The dual identity is also interesting. First, the dual of TABLE_DUM, aka R00, is the universal relation R11. The dual of the unary complement operation is inversion. Duality among the multiple division-like operations is not evident, but the reader might verify that the set intersection join, analogous to composition in the algebra of Binary Relations, fits the bill:

R11 /^ x = <INV> x.
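The first identity can be spot-checked with a toy, avowedly domain-dependent implementation of set equality join (a sketch under an assumed two-value domain; the encoding and names are mine, not QBQL’s):

```python
from itertools import product

DOM = (0, 1)   # assumed tiny value domain; set equality join is domain dependent

DUM = (frozenset(), [])   # TABLE_DUM: empty header, no tuples

def set_eq_join(x, y, dom=DOM):
    """x /= y: pair up the 'private' parts of x and y whose images over
    the common attributes coincide (candidates enumerated over dom)."""
    hx, tx = x
    hy, ty = y
    common = sorted(hx & hy)
    px, py = sorted(hx - hy), sorted(hy - hx)

    def image(bind, rows):
        # set of common-attribute values co-occurring with the binding
        return {tuple(r[a] for a in common)
                for r in rows if all(r[k] == v for k, v in bind.items())}

    out = []
    for av in product(dom, repeat=len(px)):
        for cv in product(dom, repeat=len(py)):
            a, c = dict(zip(px, av)), dict(zip(py, cv))
            if image(a, tx) == image(c, ty):
                out.append({**a, **c})
    return (frozenset(px + py), out)

x = (frozenset({"p"}), [{"p": 0}])
print(set_eq_join(DUM, x)[1])   # [{'p': 1}]  -- <NOT> x over the domain
```

Since TABLE_DUM has no tuples, its image set is empty, so a candidate tuple qualifies exactly when its own image in x is empty, i.e. when it is absent from x: the complement.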