On Boolean datatype in SQL and beyond

September 16, 2010

The question if Boolean is a legitimate SQL datatype is the source of long debates; witness the 10 years long discussion at Ask Tom, for example. Here is my perspective.

To begin with, what is Boolean datatype, and why it is ubiquitous in each and every procedural programming language? Mathematically, depending how you look into it, Boolean objects form a ring or lattice. Then, any algebraic system prompts a datatype. We have integers (of finite and infinite precision), floating point numbers (as bastardized version of reals), alphanumeric strings, complex numbers, vectors, matrices, and so on. Boolean is just yet another type among many (and not the one that is terribly hard to implement, either).

However, relational databases are different. Relations are predicates and the simplest form of predicate is a proposition. For example, the assertion “Employee ___ has been hired at ___” is a binary predicate, while “Employee SCOTT has been hired at 01/01/2010” is a proposition. What values does a proposition have? Here, you have it: True and False. The relations whose values are True and False are sometimes known in database world under colorful names of DEE and DUM, but practitioner would have hard time finding anything of resemblance in his favorite RDBMS. The takeout lesson here is that if we allow propositional values of relational attributes then, perhaps, why don’t we allow predicates as well? This leads us to the old idea of nested relations.

Let’s get back to earth, and borrow an example from the above Ask Tom discussion. Consider the Boolean attribute IS_CREDIT_WORTHY of the relation CUSTOMERS. This is sloppy database design: it is obvious that for credit agency this Boolean attribute is calculated, while genuine customer data contains credit score. Given a relation CUSTOMER with attribute CREDIT_SCORE, or CREDIT_LIMIT one can always make a view

select * from CUSTOMERS where CREDIT_LIMIT > 0

Therefore, I interpret all these pleas for Boolean datatype as “I’m too lazy (or impatient) to do proper database design, please give us Boolean shortcut. (After all, you have stuffed RDBMS with other things that don’t fit there, like Blob, XML and Java!)”. Customer is always right isn’t e?

And, finally, returning back to asktom discussion, what about [PL/SQL] function which returns a BOOLEAN: how such a function can be leveraged in a SQL query without BOOLEAN datatype available? Suppose this function takes two parameters DEPTNO and DNAME and returns TRUE if a department with such attributes exist. One may argue that whatever this function does, table function constructing a set of tuples with valid DEPTNO and DNAME can do as well. There is one important caveat, though: database relations are finite. Pipelined table functions can accommodate infinite relations, so any database query with some boolean function in the select operator (AKA SQL WHERE clause) can be expressed as an equivalent query against a view supported by pipelined function. A function which returns BOOLEAN can be viewed as an index over a relation specified by pipelined function. This perspective goes quite beyond SQL idiosyncrasies, which I touched in this article.


7 Responses to “On Boolean datatype in SQL and beyond”

  1. Probably unrelated, but I know at least two SQL functions that returns boolean… LNNVL and REGEXP_LIKE

    Why is it not possible to do the same?

  2. Yes, one may ask why certain functions is allowed to be in the “where” but not “select” clause. Well, at the end of the day, no matter what technical merits are, the argument would always be “Look, all your competitors have this feature, just do it”.

  3. select nvl2(lnnvl(regexp_like(‘x’,’y’)),1,2) from dual

    but this is abusing the language 🙂

  4. Paul Mansour Says:

    I can’t speak for SQL, but in relational database theory, the boolean type is fundamental. And its values are not the relations Dee and DUM. The expression A=B or A>B returns a scalar boolean value, not a relation. The fact that it does means that a scalar boolean type must be known to the system.

    With respect to the above comments, in relational theory, the result of the where clause is more stringent than the select clause. What is the where clause but a boolean returning expression? Any expression in a where clause should be allowed in a select clause.

    • The expressions A=B and A>B are binary predicates. In relational algebra the expression sigma(A>B, R(A,B,C)) is nothing more than join of finite ternary relation R(A,B,C) with predicate A>B. Therefore, relational database systems already have knowledge about predicates; and predicates are, again, more general than propositional values true and false.

      • Erwin Smout Says:

        Isn’t the point that we would want to be able to evaluate A>B without the A>B relation actually being materialised ?

        I’m not sure I’m getting the point, but without boolean, how do you formulate the answer to (loosely) ‘(3,2) elm-of {A>B}’ ?

      • In QBQL

        [lft rgt] 3 2 ^ LE';

        is perfectly legal even though LE (and its complement LE') is predicate and not [finite] relation. The evaluation strategy is described here.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: