## Manipulating Strings

Introduction

In the opening article about QBQL we discussed the foundation ideas, and tasted a flavor of QBQL programming on a basic example of string processing. The article conclusion raised the two major issues: how to join predicates, and how to program problems involving transitive closure. In this article we’ll cover more advanced string processing problems which require addressing the first concern — joining predicates. We’ll also cover more ground such as generalizing indexes beyond functional dependencies, and introducing two new operations: complement and set intersection join.

Operations

Let’s rehash the two fundamental QBQL operations: natural join and inner union. The join operation is ubiquitous, given the two relations

```Certified=[name  skill]
claire  java
max  java
max  sql
peter  html
peter  java
;
Requirements=[job  skill]
appl  java
appl  sql
teaching  java
web  html
;```

their join is

```Certified^Requirements;
-- QBQL output:
Certified^Requirements=[job  name  skill]
appl  claire  java
appl  max  java
appl  max  sql
appl  peter  java
teaching  claire  java
teaching  max  java
teaching  peter  java
web  peter  html
;```

In Tutotial D Date&Darwen introduced The New Relational Algebra (named promptly as A), where they argued that natural join is the logical `AND` operator. They go further to suggest a relational reincarnation of logical `OR`. Relational Lattice analog of logical `OR` is the inner union. The union of the `Certified` and `Requirements` relations is:

```Certified v Requirements;
-- QBQL output:
CertifiedvRequirements=[skill]
html
java
sql
;```

Inner union proves handy to simulate Relational Algebra projection:

```Certified v [name];
-- QBQL output:
Certifiedv[name]=[name]
claire
max
peter
;```

Either of these operations is symmetric, which implies that the above projection can also be written as `[name] v Certified`.

Next, we introduce two more operations. The first one is renaming, which renames a relation/predicate attribute. It is not fundamental operation: let’s demonstrate how to rename the `skill` attribute of the `Certified` relation into the `craft`. The critical idea is leveraging an equality predicate `[skill=craft]` which could informally be thought as an infinite relation:

```[skill craft] sql sql java java html html ... ... ```
Renaming the `skill` into the `craft` is joining the `Certified` relation with the equality predicate `[skill=craft]` and projecting away the redundant column:

```(Certified^[skill=craft]) v [name craft];
-- QBQL output:
(Certified^[skill=craft])v[namecraft]=[craft  name]
html  peter
java  claire
java  max
java  peter
sql  max
;```

Set containment join provides less cumbersome way to rename an attribute. Set joins, in general, (some of which are also known as relational division) have reputation of being pretty sophisticated topic. We’ll go into detailed exploration later, but for now all that is required to know is that set containment join `/^` is a natural join projected into the set of attributes, which is a symmetric difference of both operand headers, e.g:

```Certified /^ Requirements;
-- QBQL output:
Certified/^Requirements=[job  name]
appl  claire
appl  max
appl  peter
teaching  claire
teaching  max
teaching  peter
web  peter
;```

Therefore, set containment join takes care of eliminating the attribute which both relations are joined with, which establishes a succinct notation for renaming operation:

```Certified /^ [skill=craft];
-- QBQL output:
Certified/^[skill=craft]=[name  craft]
claire  java
max  java
max  sql
peter  html
peter  java
;```

Again, the join operation is symmetric so QBQL allows operands in any order. Renaming is also associative (while set joins, in general, aren’t) which makes convenient omitting brackets in chain of renaming.

The last operation in this section is complement. Generally, complement is unsafe, so that when applied to a finite relation within infinite domain it produces an infinite relation. Even for finite domain the result is domain dependent; this is why, relational theory introduced relative complement (incarnated as set difference, anti-join, etc). On the other hand, unary complement shares many properties with boolean negation, making it very attractive from algebraic perspective. This is the main reason why we decided to have unary complement operation, and use the quote mark in QBQL notation.

As we have noticed before, the result of applying complement to a relation depends on the context. If the Certified relation has being created during database initialization, then the output is an ordinary relation

```Certified';
-- QBQL output:
Certified'=[name  skill]
claire  html
claire  sql
joe  html
joe  java
joe  sql
max  html
peter  sql
;```

Admittedly, this interpretation is only useful for theoretic applications which check the properties of the complement operation. In normal usage (such as all the examples that we have seen so far), the output is expected to be a predicate

```Certified';
-- QBQL output:
Certified'=[name  skill]
...
;```

As before, the evaluation is lazy: this predicate can be joined with another relation so that the result becomes well defined. For example, the complement of the relation `[skill] java` is domain dependent, while the query asking all the `Requrements` that don’t involve `java` skill is domain independent:

```Requirements ^ ([skill] java)';
-- QBQL output:
Requirements^([skill]java)'=[job  skill]
appl  sql
web  html
;```

Complex predicates

In the opening article we sketched how to program basic predicates such as `Cat` — string concatenation. Unlike (finite) relations, predicates have to be specified via procedures. In many respects these procedures behave like table indexes which gives an excuse for name abuse.

Let’s refine the `Cat` indexes that we have sketched earlier:

```public class Cat {
public static NamedTuple source_from_prefix_postfix(
String source, int from
) {
String[] columns = new String[]{"prefix","postfix"};
Object[] data = new Object[]{
source.substring(0,from),
source.substring(from)};
return new NamedTuple(columns,data);
}
public static NamedTuple prefix_postfix_source_from(
String prefix, String postfix
) {
String[] columns = new String[]{"source","from"};
Object[] data = new Object[]{prefix+postfix,prefix.length()};
return new NamedTuple(columns,data);
}
public static Relation source_from_prefix_postfix( String source ) {
Relation ret = new Relation(
new String[]{"from","prefix","postfix"}
);
for( int i = 0; i <= source.length(); i++ ) {
Map<String,Object> content = new HashMap<String,Object>();
content.put("from", i);
content.put("prefix", source.substring(0,i));
content.put("postfix", source.substring(i));
ret.addTuple(content);
}
return ret;
}
}```

The first `source_from_prefix_postfix` method implements the `{source,from}->{prefix,postfix}` functional dependency. Most of the method complexity is in boilerplate code that boxes the output into a `NamedTuple` object. Java is not a relation-friendly language: it has positioned rather than named arguments and no facility of returning multiple named values.

Dual to the method of splitting a string into the parts is string concatenation that is reflected in the `{prefix,postfix}->{source,from}` functional dependency and implemented via the `prefix_postfix_source_from` method.

The last method `source_from_prefix_postfix` overrides the first method name but, it is not a functional dependency anymore. The `source` argument `Cat` predicate doesn’t determine the rest of the arguments uniquely; however, the number of tuples is finite! Therefore, we introduce the index `source_from_prefix_postfix` which consumes the `source` argument and produces a relation. As usual, it might be intuitive to exhibit QBQL output

```Cat ^  /^ [postfix=fragment]) does the split of the `prefix` attribute “produced” by the first `Cat`; this is why we matched the `prefix` attribute of the first `Cat` to the `source` attribute of the second `Cat` via renaming to some bogus attribute name `_1`. The rest of the renaming is to match the attributes to that of `Substr`. The final set join was favored over natural join because it eliminates the internal bind variable — the attribute `_1`.
Ok, now that the ground work is done, let’s run some queries:
Find all the substrings beginning at position 1 (including) and ending at 3 (excluding):
(Substr ^  Hello World ^ [from to] 1 3) v [fragment];
-- QBQL output:
(Substr^HelloWorld^[fromto]13)v[fragment]=[fragment]
el
or
;
Find all the occurrencies of the strings “o” and “l”:
(Substr ^  Hello World ^ [fragment] o l) v [from fragment];
-- QBQL output:
(Substr^HelloWorld^[fragment]"o"l)v[fromfragment]=[fragment  from]
l  2
l  3
o  1
o  4
;
Find all common substrings in “Hello” and “World”:
HelloFgmts = ([from=from1] /^ Substr /^  Hello) v [from1 fragment];
WorldFgmts = ((Substr /^  World) /^ [from=from2]) v [from2 fragment];
HelloFgmts ^ WorldFgmts ;
-- QBQL output:
HelloFgmts^WorldFgmts=[fragment  from1  from2]
0  0
0  1
0  2
0  3
0  4
0  5
1  0
1  1
1  2
1  3
1  4
1  5
2  0
2  1
2  2
2  3
2  4
2  5
3  0
3  1
3  2
3  3
3  4
3  5
4  0
4  1
4  2
4  3
4  4
4  5
5  0
5  1
5  2
5  3
5  4
5  5
l  2  3
l  3  3
o  4  1
;
The output looks not quite what one may expect. Apparently the query discovered empty words at each and every position in both strings! Here the complement operation becomes handy:
HelloFgmts = ([from=from1] /^ Substr /^  Hello) v [from1 fragment];
WorldFgmts = ((Substr /^  World) /^ [from=from2]) v [from2 fragment];
HelloFgmts ^ WorldFgmts ^ ([fragment]"")';
-- QBQL output:
HelloFgmts^WorldFgmts^([fragment]"")'=[fragment  from1  from2]
l  2  3
l  3  3
o  4  1
;
Perhaps the predicate `([fragment]"")'` should be pushed into the `Substr` predicate definition?
Substr1 = Substr ^ ([fragment]"")';
;
The final problem:
Substitute the substring “o” with the “***” everywhere in the string “Hello World”:

SubstrSrc = (Substr /^  "Hello World" /^ [fragment]o) v [prefix postfix];
(Substr /^ SubstrSrc /^ [fragment]"***") v  ;
-- QBQL output:
(Substr/^SubstrSrc/^[fragment]"***")v=
Hell*** World
Hello W***rld
;
Concluding remarks
Many modern programming languages feature Regular Expressions, while we sided with Fernando C. N. Pereira, who prompted logic perspective into parsing. Parallel between logic and parsing are not coincidental. Arguably, the two most important operators in logic are conjunction and disjunction. Incidentally, conjunction and disjunction are the two most prominent operations in grammar theory.
Let’s illuminate algebraic differences between them. Disjunction is essentially the same in both cases: it’s idempotent, symmetric, and associative. Conjunction in logic is genuinely dual to disjunction, so it retains these properties. However, in formal language theory, it no longer enjoys either symmetric, or idempotent property. Repetitive conjunction gives rise to the other prominent operator: transitive closure.
The reader may have noticed that in the final example we have failed to achieve the goal: substitute all the occurrences of the word. A procedural way to do this is to substitute the first occurrence of the word, them move on to the second occurrence, and so on. Transitive closure, again, is important feature that has to find its way into QBQL.
Continue to QBQL Programming, part3

__ATA.cmd.push(function() {
__ATA.initVideoSlot('atatags-370373-5b0b54cb2d0db', {
sectionId: '370373',
format: 'inread'
});
});

Advertisements

__ATA.cmd.push(function() {
__ATA.initSlot('atatags-26942-5b0b54cb2d11b',  {
collapseEmpty: 'before',
sectionId: '26942',
width: 300,
height: 250
});
});

__ATA.cmd.push(function() {
__ATA.initSlot('atatags-114160-5b0b54cb2d11e',  {
collapseEmpty: 'before',
sectionId: '114160',
width: 300,
height: 250
});
});

Rate this:Like this:Like Loading...					```