p.17
Not exactly a typo, but a comment from Laurent Schneider that grouping by the empty set can be explicitly expressed as
group by ()
p.19-21
Missing and
symbols.
p.20
Missing symbols and β:
if x ≥ x0 then else β endif
p.28 and elsewhere
inner view should read as inline view
p.21
Missing symbols and β again:
if x ≥ x0 then else β endif =
1 x-x0 + β 1 x0-x
Also in the following text fragment:
Easy: the above formula for general step function should handle the case where α is an expression, rather than constant.
p.47.
Since the solution differs between various vendors, now it’s a good time to branch out this information
This was copyedited from
The solution differs between various vendors, so that we have to branch our story here.
which is IMO easier to read.
p.82
Missing symbol :
(x+1) = …
(6) = 59.589686421
p.95
Missing symbol in the expression:
(A B) \ (A
B)
p.147
Another typesetting issue of missing arrow:
∅ → {A,B,C}
X → Y
p.170 (brought to my attention by Gary Howard)
Typo, the root node should be
[2 -1]
[1 0]
and not
[1 -1]
[1 0]
p.172 (by David Brusowankin)
Consider the child node 1.3.2.5 which matrix encoding is
[107 -19]
[ ]
[62 -11]
and it’s parent 1.3.2
[19 -7]
[ ]
[11 -4]
so the child.a12 = -19 is the negative of the parent.a11 = 19
and the child.a22 = -11 is the negative of the parent.a21 = 11
The queries:
Find all the employees who report directly to Jones.
select child.name
from MatrixTreeNodes parent, MatrixTreeNodes child
where parent.name = ‘Jones’
and child.a12 = parent.a11 and child.a22 = parent.a21
Who is Jones’ manager?
select parent.name
from MatrixTreeNodes parent, MatrixTreeNodes child
where child.name = ‘Jones’
and child.a12 = parent.a11 and child.a22 = parent.a21
p.173
David Brusowankin and Steve Johnston challenged the algebraic step of division removal. Here is the detailed exposition:
We take
descendant.a11/descendant.a21 <=
(node.a11-node.a12)/(node.a21-node.a22)
and multiply both sides by descendant.a21*(node.a21-node.a22)
to get
descendant.a11*(node.a21-node.a22) <=
(node.a11-node.a12)*descendant.a21
which is the same as
descendant.a11*node.a21-descendant.a11*node.a22 <=
node.a11*descendant.a21-node.a12*descendant.a21
At this point we have already reached our goal and removed the division. We can try to simplify more but Steve comment makes me suspect that the following steps have an error hidden somewhere.
Given that we already have
descendant.a11*node.a21 >= node.a11*descendant.a21
we deduce
-descendant.a11*node.a22 <= -node.a12*descendant.a21
which turns to
descendant.a11*node.a22 >= node.a12*descendant.a21
p.181
Steve Johnston noticed that the parent and child matrix entries are also confused in the following query:
select max(floor(a11/a12)) as N from MatrixTreeNodes
where a12 = :parent_a11 and a22 = :parent_a21
Then, there is a case when a node has no children, so the query should be amended as follows:
select case when count(1)=0 then 0 else max(floor(a11/a12)) end as N
from MatrixTreeNodes
where a11 = :parent_a12 and a21 = :parent_a22
Finally, we want to insert into position N+1, not N, because N is occupied.
p.193
Missing symbol :
(1,
2,
3,…,
k)
Figures for chapter 1
Figure 1.1: Cartesian product of the set A = {2,3,4,6,9} by itself. Counting all the elements x that are no greater than y produces the sequence number of y in the set A.
Figure 1.2: Cartesian product of the set A = {2,3,4,6,9} by itself. The predecessor of y is the maximal number in a set of x that are less than y. There is no predecessor for y = 2.
Figure 1.3: Indicator functions 1A(x) and 1B(x).
Figure 1.4: 1A(x)+1B(x) is not an indicator function, because it is equal to 2 for any x∈A∩B . By inclusion-exclusion principle we have to subtract 1A(x)1B(x).
Figure 1.5: Step function f(x)=1x.
Figure 1.6: Endpoints of the covering interval are constructed from the endpoints of the first and last intervals in a chain of intervals with no gaps.
Figure 1.7: In the chain of intervals fst, int, lst the endpoint int.x is covered by the fst interval. Likewise, the fst.y is covered by the int.
Figure 1.8: The chain starting with the fst interval and ending with lst is not maximal, since the lst.y is covered by some other interval.
Figure 1.9: A chain of intervals beginning with fst1 and ending with lst1 has a gap. Therefore, there is an element lst2, such that lst2.y is not covered by any other interval.
Figure 1.10: Interval Packing. A set of integers 0,1,2,3,5,6,7,9 is packed into the intervals [0,3], [5,7], and [9,9].
Figures for chapter 5
Figure 5.14: Nested sets structure for the tree in fig. 5.2a with set containment reversed. Now, a child set is required to contain its parent.
Figure 5.15: A different view of the reverse nested sets structure for the tree in fig. 5.14. A set is indistinguishable from path to the root.
there are two figures 6.15
I have some concerns about the section titled ‘Inserting Nodes’ in the book SQL Design Patters’ The sections starts on page 180.
The first area of concern reads
1. Query all the immediate children and find the oldest child.
select max(floor(a11/a12)) as N from MatrixTreeNodes
where a11 = :parent_a12
and a21 = :parent_a22.
I think this query should read:
select max(floor(a11/a12)) as N from MatrixTreeNodes
where a12 = :parent_a11
and a22 = :parent_a21.
After making this correction, the query should return N, the number of children of the parent node. This can also be called the position of the oldest child among all the other children of the parent node.
Following the above query, the book states the following
2. Insert the node at the n-th position:
insert into MatrixTreeNodes(a11,a12,a21,a22) values
(:parent_a11*(:N+1) – :parent_a12,
:parent_a11,
:parent_a21*(:N+1) – :parent_a22,
:parent_a21);
The insert statement implies that you carry the value of N directly down from the first query. This is not the case. You are not inserting into the nth position as it is already occupied. You are actually inserting into the N+1 position.
Additionally, if there are no children for the parent node, the first query returns a null value.
I would recommend the additional corrections to the first query:
MSSQL:
select isnull(max(floor(a11/a12)),0) as N from MatrixTreeNodes
where a11 = :parent_a12
and a21 = :parent_a22.
ORACLE:
select nvl(max(floor(a11/a12)),0) as N from MatrixTreeNodes
where a11 = :parent_a12
and a21 = :parent_a22.
I would also recommend adding a statement that N needs to be incremented because it is referring to the current oldest node and not where the new node should be inserted.
:newN = N + 1
insert into MatrixTreeNodes(a11,a12,a21,a22) values
(:parent_a11*(:newN+1) – :parent_a12,
:parent_a11,
:parent_a21*(:newN+1) – :parent_a22,
:parent_a21);
Now that I think about this, my comment
“After making this correction, the query should return N, the number of children of the parent node.”
is not correct. It does not account for deletions.
I wonder if an enhancement needs to be considered to fill gaps when children are deleted.
I am having some serious problems implementing Nested Intervals using matrix encoding based on whats in this book.
Here is what I have found:
After creating a table and insert procedure that follows the suggestions in this book I could not get the ‘query all descendants’ to work. After going through the numbers over and over again I went back to a query proposed in the book.
select descendant.*
from MatrixTreeNodes descendant, MatrixTreeNodes node
where descendant.a11/descendant.a21 between node.a11/node.a21
and (node.a11-node.a12)/(node.a21-node.a22)
and node.name = … — predicate uniquely identifying a node
It then states the following:
Unfortunately, this query would produce a wrong result. None of the database vendors support the rational number datatype. The ratios of integers would be silently casted into float numbers with accompanying errors due to lack of precision. All the expressions must be rewritten with divisions within the means of integer arithmetic:
The resulting query is this:
select descendant.*
from MatrixTreeNodes descendant, MatrixTreeNodes node
where descendant.a11*node.a21 >= descendant.a21*node.a11
and descendant.a11*node.a22 >= descendant.a21*node.a12
and node.name = … — predicate identifying a node uniquely
This does not work for what I have created.
So, taking the first query, I pulled out the conditions and went through the process of moving things around to get rid of the division. I came up with the following. I had to switch the greater than with the less than.
select dec.*
from
tblSiteLogonHierarchy dec,tblSiteLogonHierarchy node
where dec.matrixUL*node.matrixLL ((node.matrixUL*dec.matrixLL) – (node.matrixUR*dec.matrixLL))
and node.lSiteLogonID =-200
order by 4
matrixUL = UpperLeft (a11 from book)
matrixLL = Lower Left (a21 from book)
matrixUR = Upper Right (a12 from book)
matrixLR = Lower Right (a22 from book)
lSiteLogonID = unique identifier
What am I missing here? After some very small scale experimintation, my revised query seems to work as I would expect.
Does anyone have an opinion on this? Is the book wrong? Am I wrong?
Vadim, I dont suppose you happen to stop by here to examine this at all? Would love to hear your input on this.
Updated the errata, sorry for late reply
Regarding “Relocating Tree Branches”, I don’t believe that the code on page 182 accounts for the case in which the new node root has existing children.
If new root is not empty, sure there might be nodes with conflicting encoding. In terms of materialized path, suppose we move all the children of 1.3 into new location 1.7.2 Next, assume the node 5.4 is in the relocated branch. The new location for that node is encoded as 1.7.2.5.4 and we have no guarantee that it is not already occupied. So yes, the discussion should go deeper and introduce some policy how to insert children safely, but this is IMO easier problem than figuring out how to modify encodings when moving nodes.
On page 179: finding the parent of
[7 -2]
[ ]
[4 -1]
1 – mod (4, 1) = 1 – 0 = 1
Not 0 so the root matrix is not obtained.
Is the root a special case somehow ?
Thanks,
David
Consider the matrix equation from the top of page 178 adapted to the matrix from your example
[2..x]…[n.-1]…[7.-2]
[….].*.[….].=.[….]
[1..y]…[1..0]…[4.-1]
(dots are used as spacers). Due to matrix multiplication law, the 3 unknowns x,y and n should satisfy the following equations:
2n + x = 7
n + y = 4
It is a system of two equations with three variable that we are solving, and modulo function is just a handy tool. In the first equation the n is multiplied by a number greater than 1. There is no ambiguity, and we get a single integer solution n=4, x=-1. The second equation, however, admits two solutions n=4, y=0 and n=5, y=-1 and the modulo function finds the wrong one (that is the one inconsistent with the first equation)!
To summarize, you are right, the modulo function is the robust means to find the parent matrix upper right element, but the lower right element is better calculated some other way. You can use linear equation as above, or even more simple, leverage the determinant constraint.
One can achieve much higher precision without overflow by use of DECIMAL (38,0). Still, this has its limits too.
Here it is again:
on page 172,
expression a11/a21 < ((a11-a12)/(a21-a22)) seems to be incorrect.
I think you got boundaries switched.
If you take examples from the book
For 7/4,2/1 : 7/4 is more then 5/3 (7*3 > 5*4)
For 107/62, 19/11: 107/62 > 88/51 (5457 > 5456)
Correct. The expression at the very top of page 172
a11 a22 – a12 a21
is not the determinant but its negative.
This bug convinces me that the decision to store the right matrix column as absolute values was wrong. (It was made under impression of referential constraint on page 169, but this constraint cant be enforced anyway). Databases are perfectly capable representing negative integers, so nothing was really gained except introducing confusion.
Well, but everything works once you go through it all. And on the bright side, those errors makke you really understand what is going on vs just copying code.
By the way, your math on simplifying boundary expression in get descendants query is correct, the error Steve johnson had was due to this problem.
Couple questions for you:
1. How would you compare this method to using “connect by” or somethig like that.
2. How would you sort that tree so that you have children under the node? I mean, I can use boundaries, but that involves division – is there another way of doing that.
3. what is theoretical limit on tree depth/wideness
4. What are your thoughts on this method scalability?
5. In the very first chapter “Counting ordered rows”, why are you not using rownum?
Sorry, one more thing
How would I limit amount of children returned to each node to some number, let say 10?
Let me explain:
I want all descendants of a given node, but no more than first 10 children on each level. since N used to compute node placement is not sequencial, I can not use it.
1. The table on page 250 summarizes various methods. It is slightly expanded version of this post:
http://groups.google.com/group/comp.databases.theory/msg/7b772060322df739
2. Division is rather simple and clean operation from mathematical perspective. It would be surprising if you find more elegant solution.
3. Consider sequence of nodes 1, 1.1, 1.1.1, … The numbers in the matrix are Fibonacci sequence which grows approximately as powers of 1.6. There was a usenet exchange where somebody calculated limits with various integer precision assumptions, but I’m unable to find it:-(
5. Rownum is a hack (albeit a very good one). It is not portable and is not a relational attribute — hence the odd behaviour e.g. select * from emp where rownum = 2.
6. Build a temporary relation containing atomic matrices {.1,.2,.3,…,.10}. Select all the descendants and filter one oldest child in each set of siblings. Make a Cartesian product of the previous two relations and calculate a column the product of path matrices. Now you have a tree with 10 children at each level. Intersect it with the original relation to filter out the real nodes. (This method is not flawless. If you have gaps, e.g, {1.1, 1.7, 1.8, 1.9, 1.10, 1.11} then the node 1.11 would be missing in the result).
for #3 – I can’t figure it out. Can you tell me the level of the numbers? Let say I use LONG in Oracle for the matrix, or BIGINT in Sql Server, will it be able to handle couple hundreds levels with couple thousends children at each level? 20 millions leafs total?
for #6 – I am trying to come up with some type of SQL solution to limit amount of nodes with the same (a12,a22) in the result – this will take care of the gaps too. Not much luck though yet.
https://vadimtropashko.wordpress.com/2008/07/12/matrix-encoding-growth/
Hi,
may be this will be useful. The following query returns first available gap to insert a new child. The idea is to get a list of children, ordered by a11 and join it to itself pairing each a11 with predecessor (using techniqe from p.10 of your book). If the difference between numbers is bigger than a11 of the parent – you have a gap.
I am not very proficient in SQL, so this is what I came up with, but I suspect a lot of space for improvement.
select min(FLOOR(a11/a12)) from
(select m.a11 , m.a12, p.a11 p,
(select case when min(m2.a11) is NULL then m.a11 + p.a11 + 1 else min(m2.a11)end
from LinkMatrixes m2
where m2.a12=p.a11 and m2.a22=p.a21 and m2.a11 > m.a11) – m.a11 diff
from LinkMatrixes m, LinkMatrixes p
where m.a12=p.a11 and m.a22=p.a21
and p.id=1) t
WHERE diff > p
Note, that 1 in this example is unique id of parent record.
Vadim,
How could the matrix (or farey interval) encoding be reversed (inverted) so that I could query ancestors in the same way the published encoding queries descendants?
Thanks,
David
Descendants are queried via interval containment query. You can query ancestors via interval containment as well, which is quite elegant method, but it is not the most efficient way to do it.
Was there a followup to Steve Johnson’s entry on Oct 7, 2007? I ran into the same problems when querying for descendents. I couldn’t get the book’s query to work, and Steve’s query wouldn’t pass the sql checker. What is the official SQL for descendents of a node?
select descendant.*
from MatrixTreeNodes descendant, MatrixTreeNodes node
where descendant.a11*node.a21 >= descendant.a21*node.a11
and descendant.a11*node.a21-descendant.a11*node.a22 <=
node.a11*descendant.a21-node.a12*descendant.a21
and node.name = … — predicate uniquely identifying a node
Vadim,
are there copyrights/patents on the algorithm?
Algorithms are not patentable:-)
Hi Vadim,
Just to let you know we have implementation of the algorithm in production since April. Here is the product:
http://digitalmedia.opentext.com/our-product/Media-Management-7.aspx
I added a little twist to it to support multiple parents. By requirements it should support very wide tree may be 10 levels deep, up to 40 mil nodes overall, and I don’t see any problems with that. Implementation was tested on SQL Server and Oracle, but it is hibernate based, so it should easily support a bunch of different databases.
where can one find the solutions for the questions asked in your book? Is it anywhere?
There is not. Please don’t hesitate to email me (frstname+tro@gmail.com) about specific exercise you found challenging.
Tried all the variants listed in the errata and the comments to get the all descendants query to work, but to no avail ( i’m using mySQL). Eventually, I came up with a solution that corrects for the lower bound test, by finding the a11 and a21 of the first child of the node…seems sound:
https://snipt.net/dimumurray/hierarchical-models-nested-intervals-using-matrix-encodings/
Nope…doesn’t quite work…
ok…query seems to work now with the following clause for the lower bound:
….
AND (descendant.a11 * (node.a21 – node.a22)) > ((node.a11 – node.a12) * descendant.a21)
Pretty close to Vadim’s with the exception of the > operator.
Regarding the Matrix solution (TRC) to graphs, is there a reasonably efficient way to return all of the paths for any entry TRC (i,j) ?
TRC(i,j) contains aggregated weight for all the paths between nodes i and j. One can manipulate sets of [alternative] paths in place of aggregated weights. Unfortunately, this is very inefficient, and is akin to working with aggregated weight numbers in base 1 system.