Two years ago we have ventured into mathematical foundation of relational theory. From algebraic geometry perspective relations were viewed as Finite Varieties. In the followup  we were able to describe functional dependencies via explicit analytic formulas and provide intuitive interpretation of the classic result from database dependency theory — Heath’s theorem. Recent arXiv preprint provides a little more  coherent exposition of those ideas together with some excursion into Quantum Theory.

Here we complement the aforementioned arXiv article with demonstration that your average off-the-shelf Computer Algebra System is actually a [rudimentary] RDBMS. The CAS system of choice is CoCoA.

The starting point is being able to exhibit a system of polynomial equations constraining a relation defined as a set of set of tuples.

Given a binary relation with attributes x and y

[x y]
 1 1
 2 1
 3 2

we execute the following series of CoCoA commands. First, we need to specify the polynomial ring:

Use XY ::= QQ[x,y];

Then, list the tuples:

Points := mat([[1, 1], [2, 1], [3, 2]]);

Finally, specify the ideal, and print it out:

I := IdealOfPoints(XY, Points);
I;

The output:

ideal(y^2 -3*y +2, x*y -x -3*y +3, x^2 -3*x -2*y +4)

Next, for the second relation:

[x z]
 1 1
 2 1
 3 1
 3 2

we execute the series of commands

Use XZ ::= QQ[x,z];
Points := mat([[1, 1], [2, 1], [3, 1],[3, 2]]);
J := IdealOfPoints(XZ, Points);
J;

which outputs

ideal(z^2 -3*z +2, x*z -x -3*z +3, x^3 -6*x^2 +11*x -6)

What is the natural join of the two relations? It is the sum I+J. However, we must switch to polynomial ring which contains both XY and XZ:

Use XYZ ::= QQ[x,y,z];
I:=ideal(y^2 -3*y +2, x*y -x -3*y +3, x^2 -3*x -2*y +4);
J:=ideal(z^2 -3*z +2, x*z -x -3*z +3, x^3 -6*x^2 +11*x -6);

After redefining verbatim both ideals in the larger ring (can it be accomplished easier?), we calculate their “join”:

I+J;

which outputs

ideal(y^2 -3*y +2, x*y -x -3*y +3, x^2 -3*x -2*y +4, 
z^2 -3*z +2, x*z -x -3*z +3, x^3 -6*x^2 +11*x -6)

This is, again an ideal of points, which is evident with the command

RationalSolve(GBasis(I+J));
[[1, 1, 1], [2, 1, 1], [3, 2, 1], [3, 2, 2]]

Here we leveraged GBasis function as a way to convert an ideal into list (because RationalSolve accepts a list, not an ideal).

Next, projection is an elimination ideal. Once again, it is zero-dimensional ideal of points, so that a typical database user would like to list the tuples:

RationalSolve(GBasis(elim(z, I+J)));
[[1, 1], [2, 1], [3, 2]]

Finally, let’s hint what are counterparts of the rest of RA operations. The set union is the intersection of an ideals. The set difference is colon ideal. The only operation which doesn’t have an obvious analog in polynomial algebra is the least challenging one — renaming.

Database links allow clean solutions for many problems, which slowly began to recognize in oracle community. For example, oracle 12c features PDB instances communicating via database links. So, what is the status of SQLDeveloper database links in SQLDeveloper 4.1, a year since last announcement?

The major omission was a failure to run individual query which outputs result set into a grid (as opposed to running it as a script). Admittedly, there was a technical culprit for that. When a query is executed into a grid, the contract between the server (RDBMS) and client (SQLDeveloper) is somewhat loosely defined. Specifically, the client asks the server to execute the query, then fetches 50 records. The cursor remains open “just in case” if user might want to scroll down beyond the first batch of records. In other words, the end of sql execution is not precisely demarcated. How does this affect SQLDeveloper automatic DB link feature? Well, SQLDeveloper needs to know when to delete temporary database links, and the only solution is attaching listener to the event of grid closure. Hence, after you you set dblinks on, and execute a query into a grid, but before closing the grid, you’ll witness database link under the links node in the navigator. Please be aware that deleting temporary links from the client is not bulletproof; for example, server process going down with ORA-600 and taking user session with it, would result in leftover database links.  If your database development work revolves around environment with strict security requirements, it is advisable to reevaluate using this feature.

In releases prior to SQLDeveloper 4.1RC1 there were also smaller snags. For example, PL/SQL anonymous blocks execution was not accompanied by database link creation.  Also, database link names with identifiers separated by dots were not recognized. Regarding the second issue, please keep in mind that SQLDeveloper’s syntax for connection names is less restricted than that of database links. For example, something like 11.45.34.12_SCOTT is legitimate connection alias, but is not a database link.

With these bug fixes behind, let’s venture into applications. How about copying [remote] database schema?  In introductory computer science class students learn that there are two kinds of objects in programming: data and code. Here is the command which copies the data:

set autodblink on;

BEGIN
   FOR rec IN (SELECT TABLE_NAME FROM user_tables@gbr30060_DB11GR24_hr) LOOP
      begin
         DBMS_OUTPUT.PUT_LINE('Processing Table: ' || rec.TABLE_NAME );
         execute immediate 'create table ' || rec.TABLE_NAME 
                  || ' as select * from ' || rec.TABLE_NAME || '@gbr30060_DB11GR24_hr';      
      exception when others then
         DBMS_OUTPUT.PUT_LINE('Skipped Table: ' || rec.TABLE_NAME ||' -> -ERROR- '||SQLERRM);
      end;
   END LOOP;
END;
/

and the code

BEGIN
   FOR rec IN (SELECT name, type, LISTAGG(text, 'chr(13)') WITHIN GROUP (ORDER BY line) AS code
               FROM   user_source@gbr30060_DB11GR24_hr
               GROUP BY name, type ) LOOP
      begin
         DBMS_OUTPUT.PUT_LINE('Processing Object: ' || rec.NAME );
         execute immediate 'create ' || rec.code;      
      exception when others then
         DBMS_OUTPUT.PUT_LINE('Skipped : ' || rec.type || '' || rec.name ||' --ERROR-- '||SQLERRM);
      end;
   END LOOP;
END;
/

Recently, Litak & Mikulas & Hidders published the extended version of their earlier work. However, in a typical mathematical tradition following the famous saying “when fine building is revealed to the public eye, the scaffolding should be removed”, reading it requires some detective work. In particular, lets investigate the motivation behind their axiom system:

x ^ (y v z) = (x ^ (z v (R00 ^ y))) v (x ^ (y v (R00 ^ z))). % AxRH1

(R00 ^ (x ^ (y v z))) v (y ^ z) =
= ((R00 ^ (x ^ y)) v z) ^ ((R00 ^ (x ^ z)) v y). % AxRH2

(x ^ y) v (x ^ z) = x ^ ( ((x v z) ^ y) v ((x v y) ^ z) ). % AxRL1

Let’s start with well known lattice inequality

x v (y ^ z) > (x v y) ^ (x v z).

Here we abused notation a little and assume the order symbol > to be reflexive relation. If the LHS of inequality is greater (or equal) than the RHS, then perhaps one can hope to substitute y and z with greater values at RHS, or with lower values at LHS so that inequality becomes an identity? This is the technique employed by Sergey Polin when refuting McKenzie’s conjecture. Specifically, he introduced a non-decreasing chain of values

y1 = y
z1 = z

y2 = y v (x ^ z1).
z2 = z v (x ^ y1).

y3 = y v (x ^ z2).
z3 = z v (x ^ y2).
...

together with weakened distributivity law

x v (y ^ z) = (x v yN) ^ (x v zN).

Unfortunately, literal Polin-style weakened distributivity law fails in relational lattices for n=1,…,6, at least. However, Polin’s technique can be naturally generalized when an algebra offers constant(s). The axiom AxRH1 is weakened distributivity law with y elevated to y v (R00 ^ z) and z elevated to y v (R00 ^ z).

In a similar venue, the axiom AxRL1 is Polin-style weakening applied to LHS of distributivity law. (Litak et.al. attribute the axiom to Padmanabhan work.)

Genesis of AxRH2 is probably various earlier discovered conditional distributivity identities involving the header relation R00. For professional logician converting implication into equality is a walk in the park.

How about the dual distributivity law of union over join? The stricter conditional distributivity of union over join established in an earlier paper hints that the axiom might be more complicated. One might have to lower/elevate both sides of inequality. After some trial permutations of variables and the header constant R00, it revealed the following identity:

(x v (y ^ (x v z))) ^ (x v (z ^ (x v y))) = x v ((y v (x ^ R00)) ^ (z v (x ^ R00))).

This can be translated into relational calculus expression

(( exists p1 exists p2 exists p6(x(p1,p3,p5,p7) | (y(p2,p3,p6,p7)& exists p1 exists p3 exists p4 exists p6(x(p1,p3,p5,p7) | z(p4,p5,p6,p7))))& exists p1 exists p4 exists p6(x(p1,p3,p5,p7) | (z(p4,p5,p6,p7)& exists p1 exists p5 exists p2 exists p6(x(p1,p3,p5,p7) | y(p2,p3,p6,p7)))))
<->
exists p1(x(p1,p3,p5,p7) | ( exists p2 exists p6 exists p1 exists p5(y(p2,p3,p6,p7) | (x(p1,p3,p5,p7)&$F))& exists p4 exists p6 exists p1 exists p3(z(p4,p5,p6,p7) | (x(p1,p3,p5,p7)&$F))))).

which validity is easily verified with Prover9.

This axiom doesn’t seem to be independent from the {AxRH1,AxRH2,AxRL1} with Mace4 models lookup up to cardinality 26 at least, although Prover9 fails to confirm it after 3 hours running time. (This is not entirely surprising, with union over join distributivity law being often enforced by join over union).

People invent really complicated ways to achieve simple tasks such as copying a table from one database to another. With database link (which is arguably RDBMS’s the most under appreciated feature) it is just one SQL command:

create table emp as select * from scott.emp@dblink1;

But who has time to create database links? In SQLDeveloper 4.0 you don’t have to. Assuming that you have the right connection the script

set autodblink on
create table emp as select * from scott.emp@connection1; -- assuming there is connection "connection1"

would create temporary db link, then run the statement and, finally, drop the link.

Another example, which benefits from this functionality is comparing a table in two different databases:

set autodblink on
(select * from employees@connection1 
minus
select * from employees)
union
(select * from employees
minus
select * from employees@connection1);

Watch out for pitfalls: if connection info doesn’t store a password, you won’t be asked for it — the fix is scheduled for 4.0.2 patch.

Relational Lattice is simplified “attribute-free” Relational Algebra. It allows to axiomatize Relational Algebra in the same spirit as Relation Algebra. Here is axiom system by Litak&Hidders&Mikulas:

x ^ y = y ^ x.
(x ^ y) ^ z = x ^ (y ^ z).
x ^ (x v y) = x.
x v y = y v x.
(x v y) v z = x v (y v z).
x v (x ^ y) = x.

x ^ (y v z) = (x ^ (z v (R00 ^ y))) v (x ^ (y v (R00 ^ z))).
(x ^ y) v (x ^ z) = x ^ ( ((x v z) ^ y) v ((x v y) ^ z) ).
(R00 ^ (x ^ (y v z))) v (y ^ z) = ((R00 ^ (x ^ y)) v z) ^ ((R00 ^ (x ^ z)) v y).

The authors hint the proof of each individual axiom “by direct calculation”. Let’s prove them by converting each equality into equivalent Predicate Calculus formula.

The first snag is that Relational Lattice expressions don’t refer to [relation] attributes. We have to equip each individual relation variable with attributes. This is inherently ambiguous step, although one observation might help.

Consider an expression of two variables x and y, e.g. x ^ (x v y). There might be more than one common attribute for x and y, but when calculating a result of either operation join ^ or meld v, two common attributes match the same way as a single composite attribute. For example, for all practical purposes joining by two conditions matching first and last name is the same as matching by their concatenation — full name. Therefore, considering expressions with two variables we equip x with two attributes: p1, which is in x but not in y, together with p2, which is both in x and y. Likewise, we equip y with two attributes: p2, which is both in x and y, together with p3, which is in y but not in x. This canonical attribute introduction generalizes to any number of relvars.

Now that all relvars are equipped with attributes we can translate complex Relational Lattice expressions into Predicate Calculus as well. Join operation is straightforward: it is Boolean conjunction. The set of free variables in the resulting formula is a union of free variables of the operands. Meld operation is Boolean disjunction, however we must add existential quantification over arguments which are in the first operand but are absent from the second one, and arguments which are in the second operand but not the first one (symmetric difference). For example,

x v y

translates to

exists p1 exists p3 x(p1,p2) | y(p2,p3)

Translation procedure is entirely mechanical and begs for automation. In QBQL you just write down an axiom into the input file qbql/lang/parse/RL.axiom, then run qbql.lang.parse.Translate.main() and, finally, witness the translated expression in the output. The result then can be fed directly into Prover9.

Let’s try this algorithm for several axioms. The absorption law

x v (x ^ y) = x.

translates into predicate equivalence

exists p2(x(p1,p3) | (x(p1,p3)&y(p2,p3)))
<->
x(p1,p3).

The Prover9 proof of the later is remarkable:

1 (exists p2 (x(p1,p3) | x(p1,p3) & y(p2,p3))) x(p1,p3) # label(non_clause) # label(goal). [goal].
2 $F. [deny(1)].

It is certainly shorter than the proof in one of the earlier paper, and warrants an explanation, to say the least. Prover9 succinct output is a result of several steps done internally, such as Scolemization, rewriting to Conjunctive Normal Form, etc. In this case, Scolemization followed by CNF transformation produced the false value, thus proving the theorem.

Let’s investigate more relational operations. Relational complement translates into Boolean negation, while inversion is a little bit more sophisticated. Here is a theorem, which can be considered definition for R00 (better known in D&D community as DUM):

(y' v y`)'=R00.

It translates into

-( exists p1(-(y(p1)) | ( exists p1 y(p1))))
<->
$F.

where one don’t need help of any automatic proof system to be convinced of its validity. There we see yet again that R00 is analog of nullary predicate “false”.

Now that we established conversion of Relational Lattice expressions into Predicate Calculus it is naturally to ask if we can translate in the opposite direction. Consider arbitrary calculus formula and let’s drill down into its structure. There are several possibilities of what syntactic constructions we encounter on each level.

If we come across existential quantifiers, then we introduce projection, i.e. meld with an empty relation. Universal quantifier is dual to existential via De Morgan’s formula. There we have to introduce negation, which is domain dependent. If we encounter Boolean conjunction, then we introduce [natural] join (leaving all free variables unaffected). The only step which is not straightforward is handling disjunction. We translate it into the outer union AKA <OR>. Outer union is domain-dependent operation, but as you see ignoring domain dependence rewarded us with very succinct proof of Codd’s theorem. Free variables flow unobstructed through outer union, similar to natural join; giving another reason why this transformation step was so easy.

The last step is rewriting outer union in terms of standard lattice operations using one of the following equivalent definitions:

x <OR> y = (x ^ R11) v (y ^ (x v R11)).
x <OR> y = (R11 ^ y) v (x ^ y) v (x ^ R11).
x <OR> y = (x ^ (y v R11)) v (y ^ (x v R11)).
x <OR> y = (x' ^ y')'.
x <OR> y = ((R00 ^ x) ^ y) v (R11 ^ y) v (x ^ R11).

Constraints have always been under appreciated in database practice. Today we’ll focus on inclusion dependencies, which in SQL community are better known as foreign keys. Let’s start with an example borrowed from David Spivak’s work on categorical foundation of databases — the familiar Employees table

Employees = [empno mgr dept]
               1    3   10
               2    2   20
               3    4   10
               4    4   10
               5    4   10

together with constraint

“For every employee e, the manager of e works in the same department
that e works in.”

One may argue that this is not very realistic constraint. If an employee and his manager have to be in the same department, then an employee and his manager’s manager have to be in the same department as well. Then, by transitivity it follows that the entire hierarchy tree of employees must be in the same department. If hierarchy graph is disjoint (i.e. a forest of trees), then the department number is just a label distinguishing a connected component of hierarchy.

Yet, a business rule is just that: an [arbitrary] rule, which is silly to argue with. Let see if we can formally enforce it. For each pair (mgr,dept) there should exist a pair (emp,dept). That sounds like inclusion dependency, so one may be tempted to enforce it as a foreign key:

CREATE TABLE employees (
    empno INT, 
    mgr INT,
    dept INT,
    PRIMARY KEY (empno,dept) ,
    CONSTRAINT fk FOREIGN KEY fk (mgr,dept)
    REFERENCES employees (empno,dept)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION
);

So far, so good. However, the initial enthusiasm of discovering foreign keys referencing the same table hits the first snag:

insert into emps (empno,mgr,dpt) values(3,4,10);

SQL Error: Cannot add or update a child row: a foreign key constraint fails (`test`.`employees`, CONSTRAINT `fk` FOREIGN KEY (`mgr`, `dept`) REFERENCES `employees` (`empno`, `dept`) ON DELETE NO ACTION ON UPDATE NO ACTION)

A bug? Nope. Apparently, the constraint doesn’t allow to insert an employee before manager. The correct table update sequence should be like this:

insert into emps (empno,mgr,dpt) values(4,4,10);
insert into emps (empno,mgr,dpt) values(3,4,10);

Union via Join

February 12, 2014

Here is really bizarre technique: one can perform union via join! Consider:

(([p] 0 <OR> [q] a b) <AND> ([p] 1 <OR> [q] c)) v [q];

which outputs

[q]
a
b
c

(We assumed domain p to include {0,1} and q to include {a,b,c}, at least). You may object that the expression involves [domain dependent] union operation <OR>, but it is applied for two relations with disjoint attributes, so it is nothing like classic union.

This observation leads to more general assertion:
(z <OR> x) ^ (<NOT>z <OR> y) = (<NOT>z ^ x) <OR> (z ^ y).
which by leveraging relational lattice De Morgan’s law can be rewritten as

(x' ^ z')' ^ (y' ^ z)' = ((z' ^ x)' ^ (z ^ y)')'.

Here we have switched notation a little from D&D to Prover9Mace4, otherwise, it is just too many negation operators to struggle with. Loading this assertion into automatic prover demonstrates that it is not trivial. It doesn’t follow from semi-lattice axioms amended with distributivity of <OR> over join:

x ^ x = x.
x ^ y = y ^ x.
(x ^ y) ^ z = x ^ (y ^ z).
x ^ (y' ^ z')' = ((x ^ y)' ^ (x ^ z)')'.

Follow

Get every new post delivered to your Inbox.