NOT EXISTS in Relational Algebra and QBQL

September 24, 2012


Many people criticize SQL for bloated and inconsistent syntax. However, one must admit that some of its features give Relational Algebra run for the money. Consider recent stack overflow question “Relational Algebra equivalent of SQL “NOT IN””

Is there a relational algebra equivalent of the SQL expression NOT IN? For example, if I have the relation:

A1 | A2
----------
x | y
a | b
y | x

I want to remove all tuples in the relation for which A1 is in A2. In SQL I might query:

SELECT * FROM R
WHERE A1 NOT IN( SELECT A2 FROM R );

What is really stumping me is how to subquery inside the relational algebra selection operator, is this possible?:

\sigma_{some subquery here}R

The answer, which was accepted, suggested the query:

R - \rho_{A1,A2} ( \pi_{a_{11},a_{22}} \sigma_{a_{11}=a_{22}} (\rho_{a_{11},a_{12}} R \bowtie \rho_{a_{21},a_{22}} R) )

The length and complexity of this expression is puzzling: one can’t convincingly argue of inferiority of SQL to “truly relational” languages without being able to crack it.

On close inspection it becomes evident that the author used more renaming than its necessary, so we have:

R - ( \pi_{A1,A2} \sigma_{A1=A2} (\rho_{A1,a_{12}} R \bowtie \rho_{a_{21},A2} R) )

It is still longer and less intuitive than SQL query. Can Relational Algebra do better than this? Also, if SQL is allowed to use extra operations which are equivalent to Relational Algebra antijoin, then perhaps the competition ground is not fair?

There is one operation — relational division, initially included by Codd into the list of base operation, but excluded later when it has been discovered that it can be expressed in terms of the others. We’ll demonstrate that with division-like operators at hand we can achieve much shorter query.

Modern database theory treats relational division as set-containment-join. Set containment join is easy to describe if we temporarily step outside of first normal form relations. Consider the following relations

Certified=[name  skill]
           claire  java
           max  java
           max  sql
           peter  html
           peter  java
;
Requirements=[job  skill]
              appl  java
              appl  sql
              teaching  java
              web  html
;

We can nest their common attribute skill into the set

Certified=[name  skillSet]
           claire  {java}
           max  {java,sql}
           peter  {html,java}
;
Requirements=[job  skillSet]
              appl  {java,sql}
              teaching  {java}
              web  {html}
;

Next, for each pair of applicant and job we check if their skills matches job requirements, that is if one is subset of the other, and keep those pairs which meet this criteria.

This is classic set containment join, but mathematically inclined person would be tempted to do something about empty sets. What about the other applicants and jobs in their respective domains, shouldn’t they be included into both relations as well? Suppose there is a person named “Joe” who doesn’t have any qualification, and let the “entry” job assume no requirements at all:

Certified=[name  skillSet]
           claire  {java}
           max  {java,sql}
           peter  {html,java}
           joe {}
;
Requirements=[job  skillSet]
              appl  {java,sql}
              teaching  {java}
              web  {html}
              entry {}
;

Then, the amended set containment join would output additional tuple of Joe qualified for entry level job.

Certainly, the amended set join is no longer domain independent, but one may argue that the benefits overweight the drawbacks. For example, compared to classic set containment join, the amended operation expressed in terms of the basic ones is much more concise. We’ll show relational algebra with amended set containment join would greatly simplify our puzzle query as well.

In general, to sharpen intuition around problems like this it is useful to leverage relational algebra query engine. As usual, we use QBQL. Most of QBQL operations generalize Relational Algebra, and, therefore, are easily translatable into the later.

Here is QBQL query step by step. The most surprising is the very first step where we divide the input relation R over the empty relation with attribute A2:

R=
[A1   A2]
x     y
a     b
y     x
;
R /< [A2];

which outputs

[A1]
b

Here an explanation may be warranted. In order to evaluate set containment join we have to partition sets of attributes of R and [A2] into exclusive and common attributes. The attribute set of the result is common attributes that is symmetric difference of the attributes of both inputs. In our example the output header is, as expected, {A1}. Now, let’s evaluate the output content. For R the exclusive/common attributes partition is easy: the A1 is exclusive attribute, while A2 is common. In “de-first-normalized” form the table R looks like this:

R=
[A1 A2]
x {y}
y {x}
a {b}
b {}

For the relation [A2] the common attribute is again A2, while the exclusive one is empty. How we set-join such a relation? In de-normalized form the table [A2] looks like this:

[A2 {}]
{} R01

That’s right, sets are relations with single attribute, and when we are challenged to exibit a nested relation with no attributes, then we have only two choices: R01 and R00 (i.e. TABLE_DEE and TABLE_DUM). After set containment join we obtain

[A1 {}]
b R01

where the fictitious empty set column should be excluded from the normalized result.

After obtaining the relation

[A1]
b

the next steps are easy: rename the attribute to A2 and join with R. In QBQL renaming is technically set-intersection join (/^) with binary equality relation (although set-equality /^ and set-containment would produce the same result), so the query becomes:

R ^ ("A1=A2" /^ (R /< [A2]));

and it outputs

[A1 A2]
a b

as expected.

Advertisements

2 Responses to “NOT EXISTS in Relational Algebra and QBQL”

  1. AntC Says:

    Thanks Vadim, I suspect that all that the stackoverflow answer shows is that it’s just as easy to write obfuscated RA as obfuscated SQL. I’ve posted an improved solution (with an explanation).

    The QBQL would go like this, I think:
    R – (R ^ ((R v [A2] ^ “A2=A1”) v [A1]))

    (This produces the right answer in my implementation of QBQL.)

    The minus (-) is relative complement aka NOT MATCHING.
    I’ve avoided your compose operator /^, and used explicit ^ and v to achieve the rename from A2 to A1. So there’s no set-join operations.

    This is reasonably succinct, IMO. But not as intuitive as the RA solution In my implementation of QBQL, I’ve defined some extra operators as ‘syntactic sugar’ for rename and project.


  2. It produces empty result set due to missing parenthesis:
    R – (R ^ (((R v [A2]) ^ “A2=A1”) v [A1]));
    ————-^———-^


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: