What is the difference between predicate and table?

May 2, 2008


This issue is becoming a recurring theme of mine. This time it has been triggered by OTN exchange. To my surprise, people have difficulty accepting this seemingly obvious idea.

In SQL there is syntactic difference between a predicate and a table. However, let’s look beyond what you may have learned from syntax diagrams in Oracle SQL Reference (or other vendor for that matter). For our purposes let’s create a simple table Dept10:

create table Dept10 (
deptno integer;
);
insert into Dept10 values (10);

Here is the question: is there a difference between this table and the predicate deptno = 10?

Sure there is: we can examine the content of Dept10 with a simple query:

select * from Dept10

and we have no means to “examine the content” of a predicate! However, let’s include the predicate deptno = 10 and the Dept10 table into a bigger query:

select dname from dept where deptno = 10

select dname from Dept natural join Dept10;

Now, there is no difference between the two! In fact, you can construct many more queries: for every query involving Dept10 we may compose equivalent query without Dept10 and vice versa, with the only exception being the early mentioned single table query:

select * from Dept10

Therefore, I propose to extend SQL to allow queries like this

select * where deptno = 10

One consequence of this amendment is that the famous Dual table is no longer required. The other implication of equating tables with predicates is that physical query optimization can be achieved in a simpler optimizer framework.

Some people suggested to go even further and just write

select 10

how this proposal fits? Let’s analyze this expression step by step. The legitimate SQL query which achieves this is

select 10 from Dual

where the Dual could be any table or view containing one row. Next, what is this constant 10 in the select clause? I’m asking because in relational algebra you can’t just take a projection operator and substitute a column with an arbitrary expression. We may think of this expression being evaluated for each row, but a better idea is that we invoke some function. In our case it happen to be a function taking an empty argument list and returning a constant

function return10 return integer is
BEGIN
RETURN 10;
END return10;

Now the query transforms to

select return10() from dual

Next, every function is a relation (or predicate) that matches its input with output. Which predicate the function return10 corresponds to? Well, it must be deptno=10! Certainly function output is unnamed, so the predicate argument deptno can be anything. With the query output column being aliased

select return10() as deptno from dual

however, we can interpret row-wise function invocation as a join:

select deptno from dual natural join Dept10

Removing the redundant Dual we get

select deptno from Dept10

Once again, the Dept10 can be thought as a table which materializes the deptno = 10 predicate, and we established that there is unambiguous interpretation of the queries

select 10

and

select 10 as deptno

Advertisements

11 Responses to “What is the difference between predicate and table?”

  1. Chen Shapira Says:

    I think the reason people have trouble accepting the idea is that SQL language is designed to be friendly and hide the relation model.
    (Even if you ignore obvious deviations such as distinct and order by).

    Using the syntax “select * from t where x=1”, simply sounds like t is a big bucket that you look into and pick the rows that answer to a certain condition. All the predicates appear after “where”, so they must be different from a table that appears after “from”. Thats how SQL is taught.

    This does a very good job of hiding the obvious: that T is a relation, and so is “x=1” and that functions are really a specific case for relations, so they can be used as a table too.

    Anyway, good post, and I enjoyed the paper you linked to as well.


  2. One serious drawback with the proposal is that every innocent user typo in a column name is no longer caught by compiler. For example, the query

    select * from emp where enpno = 1000

    would successfully execute and produce result that user almost certainly didn’t expect!

  3. prodlife Says:

    Maybe its time that SQL will have parse warnings – correct syntax that looks suspicious, but will still run and give results.

    C developers live with that for almost 40 years now 🙂
    (non-sense syntax such as “if (x=y) ” gives unexpected results, and you are lucky if the compiler warns you)


  4. “if (x=y)” is not quite convincing example — java just fixed the problem…


  5. Here is IMO a satisfactory answer. Generally, in programming a variable is referenced in more than one place. For example, we define “int i;” then we use it somewhere. Likewise, the attribute “empno” is defined in the “emp” table and used in the predicate “empno=1000”. A standard SQL compiler would complain that the attribute “enpno” is not defined anywhere.

    I suggest more symmetric perspective. The attribute “empno” is referenced two places: in the “emp” table and the predicate “empno=1000”. Any attribute is required to be referenced in at least two places (otherwise, one can’t possibly utilize this attribute for any purpose). Thus “select x where x=10” is legitimate, and “select ename from emp where enpno=1000” is not.

  6. prodlife Says:

    what about type issues?
    For example:
    select x where x=10 and x=’hello’

  7. prodlife Says:

    well, maybe using “and” in the predicate was a bad example, since this would be false anyway…
    select x where x=10 or x=’hello’;

    what would be type of x, in this case? implicit cast of 10 to varchar would be consistent, but error prone.


  8. Is it different from conventional SQL?

    select 10 a from dual
    UNION
    select ‘hi’ from dual

    When asking what is the type of X in

    select x where x < 100

    I wondered if x is integer, rational, or real number data type. It it is integer, we are OK as the query generates a finite set of numbers. If it is rational, we are still sort of OK, because we can pipeline the result set, and the client may signal to stop the query based upon some other criteria. But if the datatype is real, we can’t possibly have the query to return a meaningful result!

  9. prodlife Says:

    If you think about set-theoretic side of SQL then real numbers may be a problem.
    But our limited computers don’t handle reals, just approximations of various precisions.
    So you can assume everything you work with is interpreted according to either default or user specified scale+precision, and happily pipeline the result away.
    The number of results may be very large, but finite.


  10. OK, let’s look little closer, what is a type, a set of values? Then, we can define the type of variable is by an unary predicate! The predicate in our case is modulo function:

    mod(x,2)=0 restricts x to even number
    mod(x,1)=0 says that x is integer type

    Here is amusing little query in traditional SQL:

    select * from (
    select level/10 i from dual
    connect by level < 100
    ) where mod(i,1/2)=0


  11. Isn’t there a concept of domain in SQL already (which nobody uses)? Therefore, my proposal may be enhanced with domain predicates such as

    STRING(x) — character strings (no varchar2 please:-)
    INTEGER(x) — more straightforward than equivalent mod(x,1)==0


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: