Errata

p.17
Not exactly a typo, but a comment from Laurent Schneider that grouping by the empty set can be explicitly expressed as
group by ()

p.19-21
Missing \cup and \cap symbols.

p.20
Missing symbols \alpha and β:

if x ≥ x0 then \alpha else β endif

p.28 and elsewhere
inner view should read as inline view

p.21
Missing symbols \alpha and β again:

if x ≥ x0 then \alpha else β endif = \alpha 1 x-x0 + β 1 x0-x

Also in the following text fragment:

Easy: the above formula for general step function should handle the case where α is an expression, rather than constant.

p.47.

Since the solution differs between various vendors, now it’s a good time to branch out this information

This was copyedited from

The solution differs between various vendors, so that we have to branch our story here.

which is IMO easier to read.

p.82
Missing symbol \gamma:

\gamma(x+1) = …

\gamma(6) = 59.589686421

p.95
Missing symbol \cap in the expression:

(A \cup B) \ (A \cap B)

p.147
Another typesetting issue of missing arrow:

∅ → {A,B,C}

X → Y

p.170 (brought to my attention by Gary Howard)
Typo, the root node should be

[2 -1]
[1 0]

and not

[1 -1]
[1 0]

p.172 (by David Brusowankin)

Consider the child node 1.3.2.5 which matrix encoding is

[107 -19]
[ ]
[62 -11]

and it’s parent 1.3.2

[19 -7]
[ ]
[11 -4]

so the child.a12 = -19 is the negative of the parent.a11 = 19
and the child.a22 = -11 is the negative of the parent.a21 = 11

The queries:

Find all the employees who report directly to Jones.

select child.name
from MatrixTreeNodes parent, MatrixTreeNodes child
where parent.name = ‘Jones’
and child.a12 = parent.a11 and child.a22 = parent.a21

Who is Jones’ manager?

select parent.name
from MatrixTreeNodes parent, MatrixTreeNodes child
where child.name = ‘Jones’
and child.a12 = parent.a11 and child.a22 = parent.a21

p.173
David Brusowankin and Steve Johnston challenged the algebraic step of division removal. Here is the detailed exposition:

We take

descendant.a11/descendant.a21 <=
(node.a11-node.a12)/(node.a21-node.a22)

and multiply both sides by descendant.a21*(node.a21-node.a22) to get

descendant.a11*(node.a21-node.a22) <=
(node.a11-node.a12)*descendant.a21

which is the same as

descendant.a11*node.a21-descendant.a11*node.a22 <=
node.a11*descendant.a21-node.a12*descendant.a21

At this point we have already reached our goal and removed the division. We can try to simplify more but Steve comment makes me suspect that the following steps have an error hidden somewhere.

Given that we already have

descendant.a11*node.a21 >= node.a11*descendant.a21

we deduce

-descendant.a11*node.a22 <= -node.a12*descendant.a21

which turns to

descendant.a11*node.a22 >= node.a12*descendant.a21

p.181
Steve Johnston noticed that the parent and child matrix entries are also confused in the following query:
select max(floor(a11/a12)) as N from MatrixTreeNodes
where a12 = :parent_a11 and a22 = :parent_a21

Then, there is a case when a node has no children, so the query should be amended as follows:
select case when count(1)=0 then 0 else max(floor(a11/a12)) end as N
from MatrixTreeNodes
where a11 = :parent_a12 and a21 = :parent_a22

Finally, we want to insert into position N+1, not N, because N is occupied.

p.193
Missing symbol \alpha:

(\alpha1,\alpha2,\alpha3,…,\alphak)

Figures for chapter 1

Fig1_1

Figure 1.1: Cartesian product of the set A = {2,3,4,6,9} by itself. Counting all the elements x that are no greater than y produces the sequence number of y in the set A.

image003.gif

Figure 1.2: Cartesian product of the set A = {2,3,4,6,9} by itself. The predecessor of y is the maximal number in a set of x that are less than y. There is no predecessor for y = 2.

image005.gif

Figure 1.3: Indicator functions 1A(x) and 1B(x).

image007.gif

Figure 1.4: 1A(x)+1B(x) is not an indicator function, because it is equal to 2 for any x∈A∩B . By inclusion-exclusion principle we have to subtract 1A(x)1B(x).

image009.gif

Figure 1.5: Step function f(x)=1x.

image011.gif

Figure 1.6: Endpoints of the covering interval are constructed from the endpoints of the first and last intervals in a chain of intervals with no gaps.

image013.gif

Figure 1.7: In the chain of intervals fst, int, lst the endpoint int.x is covered by the fst interval. Likewise, the fst.y is covered by the int.

image015.gif

Figure 1.8: The chain starting with the fst interval and ending with lst is not maximal, since the lst.y is covered by some other interval.

image017.gif

Figure 1.9: A chain of intervals beginning with fst1 and ending with lst1 has a gap. Therefore, there is an element lst2, such that lst2.y is not covered by any other interval.

image019.gif

Figure 1.10: Interval Packing. A set of integers 0,1,2,3,5,6,7,9 is packed into the intervals [0,3], [5,7], and [9,9].

Figures for chapter 5

fig_chap5_image031.gif

Figure 5.14: Nested sets structure for the tree in fig. 5.2a with set containment reversed. Now, a child set is required to contain its parent.

fig_chap5_image033.gif

Figure 5.15: A different view of the reverse nested sets structure for the tree in fig. 5.14. A set is indistinguishable from path to the root.

Advertisement

33 thoughts on “Errata

  1. I have some concerns about the section titled ‘Inserting Nodes’ in the book SQL Design Patters’ The sections starts on page 180.

    The first area of concern reads

    1. Query all the immediate children and find the oldest child.

    select max(floor(a11/a12)) as N from MatrixTreeNodes
    where a11 = :parent_a12
    and a21 = :parent_a22.

    I think this query should read:

    select max(floor(a11/a12)) as N from MatrixTreeNodes
    where a12 = :parent_a11
    and a22 = :parent_a21.

    After making this correction, the query should return N, the number of children of the parent node. This can also be called the position of the oldest child among all the other children of the parent node.

    Following the above query, the book states the following

    2. Insert the node at the n-th position:

    insert into MatrixTreeNodes(a11,a12,a21,a22) values
    (:parent_a11*(:N+1) – :parent_a12,
    :parent_a11,
    :parent_a21*(:N+1) – :parent_a22,
    :parent_a21);

    The insert statement implies that you carry the value of N directly down from the first query. This is not the case. You are not inserting into the nth position as it is already occupied. You are actually inserting into the N+1 position.

    Additionally, if there are no children for the parent node, the first query returns a null value.

    I would recommend the additional corrections to the first query:

    MSSQL:
    select isnull(max(floor(a11/a12)),0) as N from MatrixTreeNodes
    where a11 = :parent_a12
    and a21 = :parent_a22.

    ORACLE:
    select nvl(max(floor(a11/a12)),0) as N from MatrixTreeNodes
    where a11 = :parent_a12
    and a21 = :parent_a22.

    I would also recommend adding a statement that N needs to be incremented because it is referring to the current oldest node and not where the new node should be inserted.

    :newN = N + 1

    insert into MatrixTreeNodes(a11,a12,a21,a22) values
    (:parent_a11*(:newN+1) – :parent_a12,
    :parent_a11,
    :parent_a21*(:newN+1) – :parent_a22,
    :parent_a21);

  2. Now that I think about this, my comment

    “After making this correction, the query should return N, the number of children of the parent node.”

    is not correct. It does not account for deletions.

    I wonder if an enhancement needs to be considered to fill gaps when children are deleted.

  3. I am having some serious problems implementing Nested Intervals using matrix encoding based on whats in this book.

    Here is what I have found:

    After creating a table and insert procedure that follows the suggestions in this book I could not get the ‘query all descendants’ to work. After going through the numbers over and over again I went back to a query proposed in the book.

    select descendant.*
    from MatrixTreeNodes descendant, MatrixTreeNodes node
    where descendant.a11/descendant.a21 between node.a11/node.a21
    and (node.a11-node.a12)/(node.a21-node.a22)
    and node.name = … — predicate uniquely identifying a node

    It then states the following:
    Unfortunately, this query would produce a wrong result. None of the database vendors support the rational number datatype. The ratios of integers would be silently casted into float numbers with accompanying errors due to lack of precision. All the expressions must be rewritten with divisions within the means of integer arithmetic:

    The resulting query is this:


    select descendant.*
    from MatrixTreeNodes descendant, MatrixTreeNodes node
    where descendant.a11*node.a21 >= descendant.a21*node.a11
    and descendant.a11*node.a22 >= descendant.a21*node.a12
    and node.name = … — predicate identifying a node uniquely

    This does not work for what I have created.

    So, taking the first query, I pulled out the conditions and went through the process of moving things around to get rid of the division. I came up with the following. I had to switch the greater than with the less than.


    select dec.*
    from
    tblSiteLogonHierarchy dec,tblSiteLogonHierarchy node
    where dec.matrixUL*node.matrixLL ((node.matrixUL*dec.matrixLL) – (node.matrixUR*dec.matrixLL))
    and node.lSiteLogonID =-200
    order by 4

    matrixUL = UpperLeft (a11 from book)
    matrixLL = Lower Left (a21 from book)
    matrixUR = Upper Right (a12 from book)
    matrixLR = Lower Right (a22 from book)
    lSiteLogonID = unique identifier

    What am I missing here? After some very small scale experimintation, my revised query seems to work as I would expect.

    Does anyone have an opinion on this? Is the book wrong? Am I wrong?

  4. David Brusowankin says:

    Regarding “Relocating Tree Branches”, I don’t believe that the code on page 182 accounts for the case in which the new node root has existing children.

  5. If new root is not empty, sure there might be nodes with conflicting encoding. In terms of materialized path, suppose we move all the children of 1.3 into new location 1.7.2 Next, assume the node 5.4 is in the relocated branch. The new location for that node is encoded as 1.7.2.5.4 and we have no guarantee that it is not already occupied. So yes, the discussion should go deeper and introduce some policy how to insert children safely, but this is IMO easier problem than figuring out how to modify encodings when moving nodes.

  6. David Brusowankin says:

    On page 179: finding the parent of
    [7 -2]
    [ ]
    [4 -1]

    1 – mod (4, 1) = 1 – 0 = 1

    Not 0 so the root matrix is not obtained.

    Is the root a special case somehow ?

    Thanks,

    David

  7. Consider the matrix equation from the top of page 178 adapted to the matrix from your example

    [2..x]…[n.-1]…[7.-2]
    [….].*.[….].=.[….]
    [1..y]…[1..0]…[4.-1]

    (dots are used as spacers). Due to matrix multiplication law, the 3 unknowns x,y and n should satisfy the following equations:

    2n + x = 7
    n + y = 4

    It is a system of two equations with three variable that we are solving, and modulo function is just a handy tool. In the first equation the n is multiplied by a number greater than 1. There is no ambiguity, and we get a single integer solution n=4, x=-1. The second equation, however, admits two solutions n=4, y=0 and n=5, y=-1 and the modulo function finds the wrong one (that is the one inconsistent with the first equation)!

    To summarize, you are right, the modulo function is the robust means to find the parent matrix upper right element, but the lower right element is better calculated some other way. You can use linear equation as above, or even more simple, leverage the determinant constraint.

  8. David Brusowankin says:

    One can achieve much higher precision without overflow by use of DECIMAL (38,0). Still, this has its limits too.

  9. Yuriy Fuksenko says:

    Here it is again:

    on page 172,
    expression a11/a21 < ((a11-a12)/(a21-a22)) seems to be incorrect.
    I think you got boundaries switched.

    If you take examples from the book
    For 7/4,2/1 : 7/4 is more then 5/3 (7*3 > 5*4)
    For 107/62, 19/11: 107/62 > 88/51 (5457 > 5456)

  10. Correct. The expression at the very top of page 172

    a11 a22 – a12 a21

    is not the determinant but its negative.

    This bug convinces me that the decision to store the right matrix column as absolute values was wrong. (It was made under impression of referential constraint on page 169, but this constraint cant be enforced anyway). Databases are perfectly capable representing negative integers, so nothing was really gained except introducing confusion.

  11. Yuriy Fuksenko says:

    Well, but everything works once you go through it all. And on the bright side, those errors makke you really understand what is going on vs just copying code.

    By the way, your math on simplifying boundary expression in get descendants query is correct, the error Steve johnson had was due to this problem.

    Couple questions for you:
    1. How would you compare this method to using “connect by” or somethig like that.
    2. How would you sort that tree so that you have children under the node? I mean, I can use boundaries, but that involves division – is there another way of doing that.
    3. what is theoretical limit on tree depth/wideness
    4. What are your thoughts on this method scalability?
    5. In the very first chapter “Counting ordered rows”, why are you not using rownum?

  12. Yuriy Fuksenko says:

    Sorry, one more thing
    How would I limit amount of children returned to each node to some number, let say 10?
    Let me explain:
    I want all descendants of a given node, but no more than first 10 children on each level. since N used to compute node placement is not sequencial, I can not use it.

  13. 1. The table on page 250 summarizes various methods. It is slightly expanded version of this post:
    http://groups.google.com/group/comp.databases.theory/msg/7b772060322df739
    2. Division is rather simple and clean operation from mathematical perspective. It would be surprising if you find more elegant solution.
    3. Consider sequence of nodes 1, 1.1, 1.1.1, … The numbers in the matrix are Fibonacci sequence which grows approximately as powers of 1.6. There was a usenet exchange where somebody calculated limits with various integer precision assumptions, but I’m unable to find it:-(
    5. Rownum is a hack (albeit a very good one). It is not portable and is not a relational attribute — hence the odd behaviour e.g. select * from emp where rownum = 2.
    6. Build a temporary relation containing atomic matrices {.1,.2,.3,…,.10}. Select all the descendants and filter one oldest child in each set of siblings. Make a Cartesian product of the previous two relations and calculate a column the product of path matrices. Now you have a tree with 10 children at each level. Intersect it with the original relation to filter out the real nodes. (This method is not flawless. If you have gaps, e.g, {1.1, 1.7, 1.8, 1.9, 1.10, 1.11} then the node 1.11 would be missing in the result).

  14. Yuriy Fuksenko says:

    for #3 – I can’t figure it out. Can you tell me the level of the numbers? Let say I use LONG in Oracle for the matrix, or BIGINT in Sql Server, will it be able to handle couple hundreds levels with couple thousends children at each level? 20 millions leafs total?

    for #6 – I am trying to come up with some type of SQL solution to limit amount of nodes with the same (a12,a22) in the result – this will take care of the gaps too. Not much luck though yet.

  15. Yuriy Fuksenko says:

    Hi,
    may be this will be useful. The following query returns first available gap to insert a new child. The idea is to get a list of children, ordered by a11 and join it to itself pairing each a11 with predecessor (using techniqe from p.10 of your book). If the difference between numbers is bigger than a11 of the parent – you have a gap.

    I am not very proficient in SQL, so this is what I came up with, but I suspect a lot of space for improvement.

    select min(FLOOR(a11/a12)) from
    (select m.a11 , m.a12, p.a11 p,
    (select case when min(m2.a11) is NULL then m.a11 + p.a11 + 1 else min(m2.a11)end
    from LinkMatrixes m2
    where m2.a12=p.a11 and m2.a22=p.a21 and m2.a11 > m.a11) – m.a11 diff
    from LinkMatrixes m, LinkMatrixes p
    where m.a12=p.a11 and m.a22=p.a21
    and p.id=1) t
    WHERE diff > p

    Note, that 1 in this example is unique id of parent record.

  16. David Brusowankin says:

    Vadim,

    How could the matrix (or farey interval) encoding be reversed (inverted) so that I could query ancestors in the same way the published encoding queries descendants?

    Thanks,

    David

  17. Descendants are queried via interval containment query. You can query ancestors via interval containment as well, which is quite elegant method, but it is not the most efficient way to do it.

  18. Was there a followup to Steve Johnson’s entry on Oct 7, 2007? I ran into the same problems when querying for descendents. I couldn’t get the book’s query to work, and Steve’s query wouldn’t pass the sql checker. What is the official SQL for descendents of a node?

  19. select descendant.*
    from MatrixTreeNodes descendant, MatrixTreeNodes node
    where descendant.a11*node.a21 >= descendant.a21*node.a11
    and descendant.a11*node.a21-descendant.a11*node.a22 <=
    node.a11*descendant.a21-node.a12*descendant.a21
    and node.name = … — predicate uniquely identifying a node

  20. Yuriy Fuksenko says:

    Hi Vadim,
    Just to let you know we have implementation of the algorithm in production since April. Here is the product:
    http://digitalmedia.opentext.com/our-product/Media-Management-7.aspx
    I added a little twist to it to support multiple parents. By requirements it should support very wide tree may be 10 levels deep, up to 40 mil nodes overall, and I don’t see any problems with that. Implementation was tested on SQL Server and Oracle, but it is hibernate based, so it should easily support a bunch of different databases.

      1. ok…query seems to work now with the following clause for the lower bound:
        ….
        AND (descendant.a11 * (node.a21 – node.a22)) > ((node.a11 – node.a12) * descendant.a21)

        Pretty close to Vadim’s with the exception of the > operator.

  21. TRC(i,j) contains aggregated weight for all the paths between nodes i and j. One can manipulate sets of [alternative] paths in place of aggregated weights. Unfortunately, this is very inefficient, and is akin to working with aggregated weight numbers in base 1 system.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s