Integer Predicates

September 19, 2007


Another thought provoking OTN exchange.

To recap, given a relation with column QTY, for example:

 KEY   STRING   QTY 
 1   a   2 
 2   b   3 

produce a multiset such that each row of the original relation is duplicated the QTY times:

 KEY   STRING   QTY 
 1   a   2 
 1   a   2 
 2   b   3 
 2   b   3 
 2   b   3 

We’ll focus on the two solutions, the one suggested by Joe:

select t.*
from T
CONNECT BY CONNECT_BY_ROOT KEY = KEY AND LEVEL <= QTY

and “the standard” one

select t.* from T, (
SELECT level L from DUAL
connect by level <= (select max(qty) from T)
) iv where L <= QTY

On the surface it appears as if Joe just merged the inline view iv from the standard solution!

Let’s contrast them in more detail. The inline view in the standard solution is a segment of the infinite integer numbers relation. The eyesore level <= (select max(qty) from T) expression is a safety predicate: it is logically not required! We can think of joining the original table t with the set of all integers, the filtering out only those rows that satisfy the L <= QTY condition. Alternatively, we can just join the original table with the infinite L <= QTY predicate

 L   QTY 
 1   1 
 1   2 
 1   3 
 1   4 
 2   2 
 2   3 
 2   4 
 3   3 
 3   4 
 4   4 

This imaginary SQL solution is remarkably succinct:

select t.* from T, LE
where LE.QTY = T.QTY

It could be reduced even more

select t.* from T
where L <= QTY

The normal SQL syntax wouldn’t allow any of these, of course. The first query refers to some relation LE which is not defined, while the second one refers to a variable L which is free. The answer is that SQL should be extended to admit predicates as first class citizens. The query execution engine should become aware of the infinite relations and be extended with the methods allowing to join predicates and relations.

In this particular example what is the join method? Clearly any join with LE as a leading relation is going to fail. The only feasible execution is to scan the finite T relation first, and try to find the matching tuples from LE by some sort of index lookup, otherwise we, again, have to make the full scan of LE. It is pretty clear what kind of index do we need, it has to be a function that accepts the QTY as an integer argument, and computes the set of all the (L,QTY) pairs that match.

With this idea in mind, here is the proposed syntax:

create predicate LE (
L integer,
QTY integer
);
create index IDX_LE
on LE (QTY) implemented as a function...;

As you see it differs from the traditional table/index SQL definition in two aspects:

  • a user is not allowed to to modify the LE relation
  • he has to supply the index implementation

This is the first step, which “fixes” the first query. The second step is for SQL engine to automatically recognize “built-in” predicates, so that the query

select t.* from T
where L <= QTY

become legitimate as well.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: