Nested Sets and Relational Division
May 5, 2007
Introduction to Nested Sets
There are many ways to perform hierarchical queries in SQL database, and the Nested Sets model is one of them. For our purpose, we’ll generalize it.
Consider the following tree example:
If we collect the leaf nodes at each tree branch into sets, we’ll get something like this:
Figure 2: Nested sets structure for the tree at fig. 1. Set elements are boxes, and sets are the ovals including them. Every parent set contains its children sets, for example the set {C,D} corresponding to the child node B is a subset of the set {C,D,E} corresponding to the parent node A.
This is the essence of the Nested Sets model: a tree node is a parent of another node whenever the corresponding sets are in the subset relation. One can invent many variations of the theme, for example, Nested Sets in Joe Celko’s interpretation are intervals of integers. Interval containment is much easier to check than subset relationship, which partially explains why you never heard of any other Nested Sets model than the one popularized by Joe. Yet, for the terminology sake, let’s brand our model as Generalized Nested Sets.
Adjacency Relation
Even though Nested Sets are pretty intuitive, it is the the Adjacency Relation that is crowned as the king of all the hierarchy models. Any hierarchy is a graph, and graphs are defined by its nodes and edges. The Adjacency Relation is just another term for the graph Edges. Recalling our example from figure 1, we have the following Adjacency Relation:
| Tail | Head |
| A | B |
| B | C |
| B | D |
| A | E |
This table looks deceptively simple, but querying it is not an easy matter! For example, a the grandparent of a node could be found with the following self-join query:
select a.head AS grandparent, b.tail AS grandchild
from Edges a, Edges b
where a.tail=b.head
Finding node’s great grand father is not a problem either. However, querying node’s ancestors to unlimited depth is not possible with basic SQL (which is standard Relational Algebra amended with aggregation). One way to get around the problem is transforming the Adjacency Relation to make it transitively and reflectively closed. Informally, we add the edges which are loops, for example A->A, and edges made of all adjacent edges joined together, for example, A->C which is a composition of A->B and B->C. Here is the result:
| Tail | Head |
| A | B |
| B | C |
| B | D |
| A | E |
| A | A |
| B | B |
| C | C |
| D | D |
| E | E |
| A | C |
| A | D |
A reader interested how to automatically maintain the transitive closure relation is referred to the work of Dong et.al. and/or chapter 6 of my book.
A typical query “find all the ancestors of node C” is no longer a problem, because we listed all the ancestor nodes in the transitive closure relation T. Expressed in SQL it is
select tail from T
where head = ‘C’
Transitive Adjacency Relation and Nested Sets
There is another perspective into Transitive Adjacency Relation. Let’s rename the Tail column into the Set, and the Head column into the Element. Let’s also rearrange the rows, and transform the Element column values into the lower case, in order to emphasize the set structure:
| Set | Elem |
| A | a |
| A | b |
| A | c |
| A | d |
| A | e |
| B | b |
| B | c |
| B | d |
| C | c |
| D | d |
| E | e |
If you haven’t already noticed a Nested Set structure hidden in the table above, here are those sets written in traditional notation:
A = {a,b,c,d,e}
B = {b,c,d}
C = {c}
D = {d}
E = {e}
We have mentioned that in the Generalized Nested Sets model we perform hierarchical query via set containment, which is relational division. “find all the ancestors of node B (including itself)” is done in 2 steps:
1. Find all the elements of the set at node B. This would be
select elem from T
where set = ‘B’
which evaluates into the following unary relation
| Elem |
| b |
| c |
| d |
It is natural to wonder if we are selecting and projecting onto the wrong columns. It appears as if we look for all the descendants of the node B! Hang on to step 2.
2. Divide the relation T(Set,Elem) by the relation obtained at step1, that is find all the sets that contain the set {b,c,d}. That would be the set {A,B}, which is the right answer.
Let’s summarize our finding:
Transitive Adjacency Relation divided by the node’s descendants produces node’s ancestors.
Symmetrically:
Transitive Adjacency Relation divided by the node’s ancestors produces node’s descendants.


June 23, 2008 at 9:05 pm
A typical query “find all the ancestors of node C” is no longer a problem, because we listed all the ancestor nodes in the transitive closure relation T. Expressed in SQL it is
select head from T
where tail = ‘C’
This will only select the one self-referential node. The believe that in order to find all the ancestors of C, the correct query would be:
select head from T
where head = ‘C’
Which would return {A,B,C}
June 24, 2008 at 7:48 pm
You are right as written the query looks for all the descendants (including the node itself) which should return one node only. To find ancestors we need to exchange the tail and head.
In your version of the query you select the same column as you placed constraint, sure the output would be
[C,C,C], and not [A,B,C]