## On Boolean datatype in SQL and beyond

### September 16, 2010

The question if `Boolean`

is a legitimate SQL datatype is the source of long debates; witness the 10 years long discussion at Ask Tom, for example. Here is my perspective.

To begin with, what is Boolean datatype, and why it is ubiquitous in each and every procedural programming language? Mathematically, depending how you look into it, Boolean objects form a ring or lattice. Then, any algebraic system prompts a datatype. We have integers (of finite and infinite precision), floating point numbers (as bastardized version of reals), alphanumeric strings, complex numbers, vectors, matrices, and so on. Boolean is just yet another type among many (and not the one that is terribly hard to implement, either).

However, relational databases are different. Relations are predicates and the simplest form of predicate is a proposition. For example, the assertion *“Employee ___ has been hired at ___”* is a binary predicate, while *“Employee SCOTT has been hired at 01/01/2010” *is a proposition. What values does a proposition have? Here, you have it: *True* and *False*. The relations whose values are *True* and *False* are sometimes known in database world under colorful names of `DEE`

and `DUM`

, but practitioner would have hard time finding anything of resemblance in his favorite RDBMS. The takeout lesson here is that if we allow propositional values of relational attributes then, perhaps, why don’t we allow predicates as well? This leads us to the old idea of *nested relations*.

Let’s get back to earth, and borrow an example from the above Ask Tom discussion. Consider the Boolean attribute `IS_CREDIT_WORTHY`

of the relation `CUSTOMERS`

. This is sloppy database design: it is obvious that for credit agency this Boolean attribute is calculated, while genuine customer data contains credit score. Given a relation `CUSTOMER`

with attribute `CREDIT_SCORE`

, or `CREDIT_LIMIT`

one can always make a view

`create view CREDIT_WORTHY_CUSTOMERS as`

select * from CUSTOMERS where CREDIT_LIMIT > 0

Therefore, I interpret all these pleas for Boolean datatype as *“I’m too lazy (or impatient) to do proper database design, please give us Boolean shortcut. (After all, you have stuffed RDBMS with other things that don’t fit there, like Blob, XML and Java!)”*. Customer is always right isn’t e?

And, finally, returning back to asktom discussion, what about [PL/SQL] function which returns a BOOLEAN: how such a function can be leveraged in a SQL query without BOOLEAN datatype available? Suppose this function takes two parameters DEPTNO and DNAME and returns TRUE if a department with such attributes exist. One may argue that whatever this function does, table function constructing a set of tuples with valid DEPTNO and DNAME can do as well. There is one important caveat, though: database relations are finite. Pipelined table functions can accommodate infinite relations, so any database query with some boolean function in the select operator (AKA SQL WHERE clause) can be expressed as an equivalent query against a view supported by pipelined function. A function which returns BOOLEAN can be viewed as an index over a relation specified by pipelined function. This perspective goes quite beyond SQL idiosyncrasies, which I touched in this article.

September 17, 2010 at 7:41 am

Probably unrelated, but I know at least two SQL functions that returns boolean… LNNVL and REGEXP_LIKE

Why is it not possible to do the same?

September 17, 2010 at 3:13 pm

Yes, one may ask why certain functions is allowed to be in the “where” but not “select” clause. Well, at the end of the day, no matter what technical merits are, the argument would always be “Look, all your competitors have this feature, just do it”.

September 17, 2010 at 4:06 pm

select nvl2(lnnvl(regexp_like(‘x’,’y’)),1,2) from dual

2

but this is abusing the language 🙂

September 29, 2010 at 12:04 pm

I can’t speak for SQL, but in relational database theory, the boolean type is fundamental. And its values are not the relations Dee and DUM. The expression A=B or A>B returns a scalar boolean value, not a relation. The fact that it does means that a scalar boolean type must be known to the system.

With respect to the above comments, in relational theory, the result of the where clause is more stringent than the select clause. What is the where clause but a boolean returning expression? Any expression in a where clause should be allowed in a select clause.

September 29, 2010 at 2:01 pm

The expressions A=B and A>B are binary predicates. In relational algebra the expression sigma(A>B, R(A,B,C)) is nothing more than join of finite ternary relation R(A,B,C) with predicate A>B. Therefore, relational database systems already have knowledge about predicates; and predicates are, again, more general than propositional values

trueandfalse.September 29, 2010 at 2:54 pm

Isn’t the point that we would want to be able to evaluate A>B without the A>B relation actually being materialised ?

I’m not sure I’m getting the point, but without boolean, how do you formulate the answer to (loosely) ‘(3,2) elm-of {A>B}’ ?

September 29, 2010 at 7:23 pm

In QBQL

`[lft rgt] 3 2 ^ LE';`

is perfectly legal even though

`LE`

(and its complement`LE'`

) is predicate and not [finite] relation. The evaluation strategy is described here.