Introduction

In the opening article about QBQL we discussed the foundation ideas, and tasted a flavor of QBQL programming on a basic example of string processing. The article conclusion raised the two major issues: how to join predicates, and how to program problems involving transitive closure. In this article we’ll cover more advanced string processing problems which require addressing the first concern — joining predicates. We’ll also cover more ground such as generalizing indexes beyond functional dependencies, and introducing two new operations: complement and set intersection join.

Operations

Let’s rehash the two fundamental QBQL operations: natural join and inner union. The join operation is ubiquitous, given the two relations

Certified=[name  skill]
           claire  java
           max  java
           max  sql
           peter  html
           peter  java
;
Requirements=[job  skill]
              appl  java
              appl  sql
              teaching  java
              web  html
;

their join is

Certified^Requirements;
-- QBQL output:
Certified^Requirements=[job  name  skill]
                        appl  claire  java
                        appl  max  java
                        appl  max  sql
                        appl  peter  java
                        teaching  claire  java
                        teaching  max  java
                        teaching  peter  java
                        web  peter  html
;

In Tutotial D Date&Darwen introduced The New Relational Algebra (named promptly as A), where they argued that natural join is the logical AND operator. They go further to suggest a relational reincarnation of logical OR. Relational Lattice analog of logical OR is the inner union. The union of the Certified and Requirements relations is:

Certified v Requirements;
-- QBQL output:
CertifiedvRequirements=[skill]
                        html
                        java
                        sql
;

Inner union proves handy to simulate Relational Algebra projection:

Certified v [name];
-- QBQL output:
Certifiedv[name]=[name]
                  claire
                  max
                  peter
;

Either of these operations is symmetric, which implies that the above projection can also be written as [name] v Certified.

Next, we introduce two more operations. The first one is renaming, which renames a relation/predicate attribute. It is not fundamental operation: let’s demonstrate how to rename the skill attribute of the Certified relation into the craft. The critical idea is leveraging an equality predicate [skill=craft] which could informally be thought as an infinite relation:

[skill craft]
sql sql
java java
html html
... ...

Renaming the skill into the craft is joining the Certified relation with the equality predicate [skill=craft] and projecting away the redundant column:

(Certified^[skill=craft]) v [name craft];
-- QBQL output:
(Certified^[skill=craft])v[namecraft]=[craft  name]
                                       html  peter
                                       java  claire
                                       java  max
                                       java  peter
                                       sql  max
;

Set containment join provides less cumbersome way to rename an attribute. Set joins, in general, (some of which are also known as relational division) have reputation of being pretty sophisticated topic. We’ll go into detailed exploration later, but for now all that is required to know is that set containment join /^ is a natural join projected into the set of attributes, which is a symmetric difference of both operand headers, e.g:

Certified /^ Requirements;
-- QBQL output:
Certified/^Requirements=[job  name]
                         appl  claire
                         appl  max
                         appl  peter
                         teaching  claire
                         teaching  max
                         teaching  peter
                         web  peter
;

Therefore, set containment join takes care of eliminating the attribute which both relations are joined with, which establishes a succinct notation for renaming operation:

Certified /^ [skill=craft];
-- QBQL output:
Certified/^[skill=craft]=[name  craft]
                          claire  java
                          max  java
                          max  sql
                          peter  html
                          peter  java
;

Again, the join operation is symmetric so QBQL allows operands in any order. Renaming is also associative (while set joins, in general, aren’t) which makes convenient omitting brackets in chain of renaming.

The last operation in this section is complement. Generally, complement is unsafe, so that when applied to a finite relation within infinite domain it produces an infinite relation. Even for finite domain the result is domain dependent; this is why, relational theory introduced relative complement (incarnated as set difference, anti-join, etc). On the other hand, unary complement shares many properties with boolean negation, making it very attractive from algebraic perspective. This is the main reason why we decided to have unary complement operation, and use the quote mark in QBQL notation.

As we have noticed before, the result of applying complement to a relation depends on the context. If the Certified relation has being created during database initialization, then the output is an ordinary relation

Certified';
-- QBQL output:
Certified'=[name  skill]
            claire  html
            claire  sql
            joe  html
            joe  java
            joe  sql
            max  html
            peter  sql
;

Admittedly, this interpretation is only useful for theoretic applications which check the properties of the complement operation. In normal usage (such as all the examples that we have seen so far), the output is expected to be a predicate

Certified';
-- QBQL output:
Certified'=[name  skill]
            ...
;

As before, the evaluation is lazy: this predicate can be joined with another relation so that the result becomes well defined. For example, the complement of the relation [skill] java is domain dependent, while the query asking all the Requrements that don’t involve java skill is domain independent:

Requirements ^ ([skill] java)';
-- QBQL output:
Requirements^([skill]java)'=[job  skill]
                             appl  sql
                             web  html
;

Complex predicates

In the opening article we sketched how to program basic predicates such as Cat — string concatenation. Unlike (finite) relations, predicates have to be specified via procedures. In many respects these procedures behave like table indexes which gives an excuse for name abuse.

Let’s refine the Cat indexes that we have sketched earlier:

public class Cat {
    public static NamedTuple source_from_prefix_postfix( 
            String source, int from 
    ) {
        String[] columns = new String[]{"prefix","postfix"};
        Object[] data = new Object[]{
                source.substring(0,from),
                source.substring(from)};
        return new NamedTuple(columns,data);
    }
    public static NamedTuple prefix_postfix_source_from( 
            String prefix, String postfix 
    ) {
        String[] columns = new String[]{"source","from"};
        Object[] data = new Object[]{prefix+postfix,prefix.length()};
        return new NamedTuple(columns,data);
    }
    public static Relation source_from_prefix_postfix( String source ) {
        Relation ret = new Relation(
            new String[]{"from","prefix","postfix"}
        );
        for( int i = 0; i <= source.length(); i++ ) {
            Map<String,Object> content = new HashMap<String,Object>();
            content.put("from", i);
            content.put("prefix", source.substring(0,i));
            content.put("postfix", source.substring(i));
            ret.addTuple(content);
        }
        return ret;
    }  
}

The first source_from_prefix_postfix method implements the {source,from}->{prefix,postfix} functional dependency. Most of the method complexity is in boilerplate code that boxes the output into a NamedTuple object. Java is not a relation-friendly language: it has positioned rather than named arguments and no facility of returning multiple named values.

Dual to the method of splitting a string into the parts is string concatenation that is reflected in the {prefix,postfix}->{source,from} functional dependency and implemented via the prefix_postfix_source_from method.

The last method source_from_prefix_postfix overrides the first method name but, it is not a functional dependency anymore. The source argument Cat predicate doesn’t determine the rest of the arguments uniquely; however, the number of tuples is finite! Therefore, we introduce the index source_from_prefix_postfix which consumes the source argument and produces a relation. As usual, it might be intuitive to exhibit QBQL output

Cat ^  /^ [postfix=fragment]) does the split of the prefix attribute “produced” by the first Cat; this is why we matched the prefix attribute of the first Cat to the source attribute of the second Cat via renaming to some bogus attribute name _1. The rest of the renaming is to match the attributes to that of Substr. The final set join was favored over natural join because it eliminates the internal bind variable — the attribute _1.

Ok, now that the ground work is done, let’s run some queries:

Find all the substrings beginning at position 1 (including) and ending at 3 (excluding):

(Substr ^  Hello World ^ [from to] 1 3) v [fragment];
-- QBQL output:
(Substr^HelloWorld^[fromto]13)v[fragment]=[fragment]
                                                   el
                                                   or
;

Find all the occurrencies of the strings “o” and “l”:

(Substr ^  Hello World ^ [fragment] o l) v [from fragment];
-- QBQL output:
(Substr^HelloWorld^[fragment]"o"l)v[fromfragment]=[fragment  from]
                                                    l  2
                                                    l  3
                                                    o  1
                                                    o  4
;

Find all common substrings in “Hello” and “World”:

HelloFgmts = ([from=from1] /^ Substr /^  Hello) v [from1 fragment];
WorldFgmts = ((Substr /^  World) /^ [from=from2]) v [from2 fragment];
HelloFgmts ^ WorldFgmts ;
-- QBQL output:
HelloFgmts^WorldFgmts=[fragment  from1  from2]
                   0  0
                   0  1
                   0  2
                   0  3
                   0  4
                   0  5
                   1  0
                   1  1
                   1  2
                   1  3
                   1  4
                   1  5
                   2  0
                   2  1
                   2  2
                   2  3
                   2  4
                   2  5
                   3  0
                   3  1
                   3  2
                   3  3
                   3  4
                   3  5
                   4  0
                   4  1
                   4  2
                   4  3
                   4  4
                   4  5
                   5  0
                   5  1
                   5  2
                   5  3
                   5  4
                   5  5
                 l  2  3
                 l  3  3
                 o  4  1
;

The output looks not quite what one may expect. Apparently the query discovered empty words at each and every position in both strings! Here the complement operation becomes handy:

HelloFgmts = ([from=from1] /^ Substr /^  Hello) v [from1 fragment];
WorldFgmts = ((Substr /^  World) /^ [from=from2]) v [from2 fragment];
HelloFgmts ^ WorldFgmts ^ ([fragment]"")';
-- QBQL output:
HelloFgmts^WorldFgmts^([fragment]"")'=[fragment  from1  from2]
                                       l  2  3
                                       l  3  3
                                       o  4  1
;

Perhaps the predicate ([fragment]"")' should be pushed into the Substr predicate definition?

Substr1 = Substr ^ ([fragment]"")';
;

The final problem:

Substitute the substring “o” with the “***” everywhere in the string “Hello World”:

SubstrSrc = (Substr /^  "Hello World" /^ [fragment]o) v [prefix postfix];
(Substr /^ SubstrSrc /^ [fragment]"***") v  ;
-- QBQL output:
(Substr/^SubstrSrc/^[fragment]"***")v=
                                               Hell*** World
                                               Hello W***rld
;

Concluding remarks

Many modern programming languages feature Regular Expressions, while we sided with Fernando C. N. Pereira, who prompted logic perspective into parsing. Parallel between logic and parsing are not coincidental. Arguably, the two most important operators in logic are conjunction and disjunction. Incidentally, conjunction and disjunction are the two most prominent operations in grammar theory.

Let’s illuminate algebraic differences between them. Disjunction is essentially the same in both cases: it’s idempotent, symmetric, and associative. Conjunction in logic is genuinely dual to disjunction, so it retains these properties. However, in formal language theory, it no longer enjoys either symmetric, or idempotent property. Repetitive conjunction gives rise to the other prominent operator: transitive closure.

The reader may have noticed that in the final example we have failed to achieve the goal: substitute all the occurrences of the word. A procedural way to do this is to substitute the first occurrence of the word, them move on to the second occurrence, and so on. Transitive closure, again, is important feature that has to find its way into QBQL.

Continue to QBQL Programming, part3

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: