## Integer Predicates

### September 19, 2007

Another thought provoking OTN exchange.

To recap, given a relation with column QTY, for example:

 KEY STRING QTY 1 a 2 2 b 3

produce a multiset such that each row of the original relation is duplicated the QTY times:

 KEY STRING QTY 1 a 2 1 a 2 2 b 3 2 b 3 2 b 3

We’ll focus on the two solutions, the one suggested by Joe:

```select t.* from T CONNECT BY CONNECT_BY_ROOT KEY = KEY AND LEVEL <= QTY ```
and “the standard” one

```select t.* from T, ( SELECT level L from DUAL connect by level <= (select max(qty) from T) ) iv where L <= QTY ```

On the surface it appears as if Joe just merged the inline view iv from the standard solution!

Let’s contrast them in more detail. The inline view in the standard solution is a segment of the infinite integer numbers relation. The eyesore level <= (select max(qty) from T) expression is a safety predicate: it is logically not required! We can think of joining the original table t with the set of all integers, the filtering out only those rows that satisfy the L <= QTY condition. Alternatively, we can just join the original table with the infinite L <= QTY predicate

 L QTY 1 1 1 2 1 3 1 4 2 2 2 3 2 4 3 3 3 4 4 4

This imaginary SQL solution is remarkably succinct:

```select t.* from T, LE where LE.QTY = T.QTY ```

It could be reduced even more

```select t.* from T where L <= QTY ```

The normal SQL syntax wouldn’t allow any of these, of course. The first query refers to some relation LE which is not defined, while the second one refers to a variable L which is free. The answer is that SQL should be extended to admit predicates as first class citizens. The query execution engine should become aware of the infinite relations and be extended with the methods allowing to join predicates and relations.

In this particular example what is the join method? Clearly any join with LE as a leading relation is going to fail. The only feasible execution is to scan the finite T relation first, and try to find the matching tuples from LE by some sort of index lookup, otherwise we, again, have to make the full scan of LE. It is pretty clear what kind of index do we need, it has to be a function that accepts the QTY as an integer argument, and computes the set of all the (L,QTY) pairs that match.

With this idea in mind, here is the proposed syntax:

```create predicate LE ( L integer, QTY integer ); create index IDX_LE on LE (QTY) implemented as a function...; ```

As you see it differs from the traditional table/index SQL definition in two aspects:

• a user is not allowed to to modify the LE relation
• he has to supply the index implementation

This is the first step, which “fixes” the first query. The second step is for SQL engine to automatically recognize “built-in” predicates, so that the query

```select t.* from T where L <= QTY ```

become legitimate as well.