The algorithms are so similar that clearly Celko must have copied from
Kamfonas, or Kamfonas from Celko ... or both from someone else.
(If you don't know what I'm on about, it's this:
http://www.dbpd.com/vault/9811/kamfn.shtml
versus this:
http://www.dbmsmag.com/9604d06.html )
Any ideas? Thanks|||An earlier article by Celko on nested sets appeared in March 1996:
http://www.dbmsmag.com/9603d06.html
whereas the Kamfonas article is dated 1998.
However, Joe Celko wrote (about the nested sets model for hierarchies) "I
made it popular by filling in some holes, but the idea was not mine."
in the following message:
http://groups-beta.google.com/group...
517df0172eba8
Razvan|||Appeo Allkam wrote:
> http://www.dbpd.com/vault/9811/kamfn.shtml
I don't quite understand the following passage in the Kamfonas article
<quote>
You may ask: "Why don't we use a hierarchical keyword to achieve the
same result?" For example, an IP address-like scheme enumerates the
nodes of a tree as 1,1.1,1.2,1.3,1.1.1, and so on. The problem with
this scheme is that you have to start all qualifications from the top.
Otherwise, you'll have to scan the whole index or table. With the L and
R enumeration however, you can anchor your search at any level, relate
any node to any other one, and still employ matching index scans.
</quote>
Any interpretations?|||Mikito Harakiri wrote:
> Appeo Allkam wrote:
> I don't quite understand the following passage in the Kamfonas article
> <quote>
> You may ask: "Why don't we use a hierarchical keyword to achieve the
> same result?" For example, an IP address-like scheme enumerates the
> nodes of a tree as 1,1.1,1.2,1.3,1.1.1, and so on. The problem with
> this scheme is that you have to start all qualifications from the top.
> Otherwise, you'll have to scan the whole index or table. With the L and
> R enumeration however, you can anchor your search at any level, relate
> any node to any other one, and still employ matching index scans.
> </quote>
> Any interpretations?
Well, I'm reading further, and the next paragraphs are partly true and
partly completely wrong:
<quote>
Descendent-seeking queries are the most common and the most efficient. The
optimizer will use a matching index scan to find the qualifying D.L values
that lie between the A.L and A.R constants. With this query plan, the cost
of descendent-seeking queries is a scan of a number of contiguous pages
proportional to the answer set's size. In ancestor-seeking queries the
predicate restricts a constant, D.L, between the two columns A.L and A.R.
A combined index on L (descending) and R (ascending) helps these ancestor
searches. The best plan we can expect for these queries is a matching
lookup of D.L on the combined index, and a scan to the end of the index
using index only access. Consequently, the average cost for
ancestor-seeking queries [...]
</quote>
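For concreteness, the two query shapes being discussed look roughly like
this (just a sketch, assuming a nested-set table tree(id, l, r); the table
and column names are illustrative, not from the article):

select d.*
  from tree a, tree d
 where a.id = :anchor
   and d.l between a.l and a.r;   -- descendant-seeking: range scan on l

select a.*
  from tree a, tree d
 where d.id = :node
   and d.l between a.l and a.r;   -- ancestor-seeking: d.l is a constant

(Both forms also return the node itself; add a.id <> d.id to exclude it.)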
The cost estimation for descendant-seeking queries is correct. But what is
the efficiency of ancestor search? My understanding is that with a combined
index, or a bitmap index, or even a spatial index, it still sucks.
With Materialized Path/Nested Intervals you *calculate* the chain of
ancestors (it doesn't really matter whether on the client or on the server)
and construct a dynamic SQL query
select * from tree where path in ('1', '1.2', '1.2.1')
which, as a concatenation of 3 index unique scans, is extremely fast.
<quote>
The L and R method has a level of magnitude performance advantage over
any recursive SQL method, such as the SQL recursive union or Connect By
clause. The node enumeration captures the nodes' topological ordering
once, thus enabling transitive closure in one simple step, as opposed
to multiple invocations. Detecting whether two nodes have an
ancestor-descendent relationship normally requires the path traversal
from one node to the other. Using the L and R numbers, however, you can
test any two nodes using a simple between clause--without traversing
the graph. Because the L and R method doesn't need to traverse the
structure, more selective predicates may filter down the qualifying
rows before applying the ancestor-descendent qualification. In
recursive SQL, path traversal has to happen on unconstrained node sets,
postponing highly filtering predicates until after the paths are
traversed exhaustively.
</quote>
This is entirely wrong. Traversing an adjacency list is fast: each next
node is found by an index unique scan. Multiple invocations are not evil,
as long as the calls are not exposed over a network connection between
client and server. This is why it makes sense to support recursive SQL on
the server, instead of the client querying the hierarchy with multiple
dynamically generated nonrecursive SQL queries.
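For illustration, the server-side traversal can be as simple as Oracle's
Connect By form (a sketch, assuming an adjacency-list table emp(id, mgr_id)
with an index on mgr_id; the names are placeholders):

select id, mgr_id, level
  from emp
 start with id = :root
connect by prior id = mgr_id;
-- each level is resolved inside the server by index lookups on mgr_id,
-- with no client round trips
|||Mikito Harakiri wrote: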
> Appeo Allkam wrote:
> I don't quite understand the following passage in the Kamfonas article
<quote>
The "prestored transitive closure" approach involves an intermediate
table that contains X's descendents, and a join to the org table. The
intermediate table is either temporary, generated every time you issue
a query, or permanent, containing the transitive closure of the
"reports to" relationship.
There are different ways to do this. The first solution "walks the
structure," requiring multiple requests that let you extract
children sets for each node you retrieve. The advantage of this
approach is that it doesn't introduce any update anomalies because you
don't maintain redundant data. The second solution uses set processing,
but it requires that you maintain a very large redundant table and deal
with the associated update anomalies. For example, a tree five layers
deep with a fan-out of 10 children per node has a total of 11,111 nodes
and 11,110 parent-child relationships. But there are more than
11,000,000 ancestor-descendent relationships in the transitive closure.
A single maintenance operation may affect more than one million of
these ancestor-descendent relationships. The cost of this approach
grows geometrically as the fanout and depth grow. This option is
undesirable only because of maintenance complications and the lack of
scalability.
</quote>
Where did this 11,000,000 number come from? Each of the 11,111 nodes has no
more than 5 ancestors, so the size of the transitive closure is certainly
less than 5*11,111.
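To make the count concrete (assuming the root is at depth 0, the leaves at
depth 4, and counting proper ancestor-descendent pairs only):

depth 1:     10 nodes x 1 ancestor  =     10 pairs
depth 2:    100 nodes x 2 ancestors =    200 pairs
depth 3:  1,000 nodes x 3 ancestors =  3,000 pairs
depth 4: 10,000 nodes x 4 ancestors = 40,000 pairs
total                               = 43,210 pairs

which is nowhere near 11,000,000.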
Just for the record, the size of the Materialized Path/Nested
Intervals/Nested Sets encoding is about the same as that of the transitive
closure. True, there are only 11,111 records, but each encoding grows in
size as the number of nodes in the tree increases.
I guess I'll stop reading, unless convinced that there is a single not
entirely wrong idea in this article.|||Razvan Socol wrote:
> An earlier article by Celko on nested sets appeared in March 1996:
> http://www.dbmsmag.com/9603d06.html
> whereas the Kamfonas article is dated 1998.
>
I heard a presentation on the nested set structure back in 1995 by
two Norwegian guys, Leif Morten Kofoed and Håkon Erdal. It seemed
to me that they figured out the idea on their own. They used
it in a large system involving the Central Bank of Norway.
Lauri Pietarinen|||Mikito Harakiri wrote:
> This is entirely wrong. Traversing an adjacency list is fast: each next
> node is found by an index unique scan. Multiple invocations are not evil,
> as long as the calls are not exposed over a network connection between
> client and server. This is why it makes sense to support recursive SQL on
> the server, instead of the client querying the hierarchy with multiple
> dynamically generated nonrecursive SQL queries.
Isn't it the case that a recursive query that uses an index is
roughly comparable to an indexed nested loop join? That is to say,
it will perform quite well.
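For instance, I imagine something like this (just a sketch, assuming an
adjacency-list table emp(id, mgr_id) with an index on mgr_id; the names are
placeholders):

with recursive subordinates (id, mgr_id) as (
    select id, mgr_id
      from emp
     where id = :root                            -- seed row
    union all
    select e.id, e.mgr_id
      from emp e
      join subordinates s on e.mgr_id = s.id     -- one more level per round
)
select * from subordinates;

Each round is defined as a join of the previously produced rows back
against emp, and that join looks like it could be driven by the index on
mgr_id.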
I'm still trying to wrap my head around recursive queries; they
are a fairly new thing to think about, and I don't have a good
model for how they are implemented. The 'recursive with' mentioned
recently makes me think it's going to build the whole recursively
defined set in advance of the select, but I expect that's probably
not right.
I completely agree with your conclusion. Client code executing
multiple queries over the network is bad in so many ways.
Marshall|||"Marshall Spight" <marshall.spight@.gmail.com> wrote in message
news:1122529011.244856.250850@.g43g2000cwa.googlegroups.com...
> I'm still trying to wrap my head around recursive queries; they
> are a fairly new thing to think about, and I don't have a good
> model for how they are implemented. The 'recursive with' mentioned
> recently makes me think it's going to build the whole recursively
> defined set in advance of the select, but I expect that's probably
> not right.
Take advantage of this moment of ignorance. It won't come again. Once you
wrap your head around the "how", your vision of the "what" will be more
cloudy than it is now.
One of the consistent failures we all make is to deal with the "what"
rather than the "how".
Once you have a workable model of how recursive joins are implemented, that
model will begin
to displace the model you now have of what recursive joins really are.
This has happened to me several times in my long career. I began to think
of the "what" stated in a program as being shorthand for the "how" that I
or a code generator might use to carry it out. It's an illusion, albeit a
useful one.|||David Cressey wrote:
> "Marshall Spight" <marshall.spight@.gmail.com> wrote in message
> news:1122529011.244856.250850@.g43g2000cwa.googlegroups.com...
>
> Take advantage of this moment of ignorance. It won't come again. Once you
> wrap your head around the "how", your vision of the "what" will be more
> cloudy than it is now.
> One of the consistent failures we all make is to deal with the "what"
> rather than the "how".
Argh! This is plainly excellent advice. It's quite ironic for me to
get it, because these days I'm working with a lot of less experienced
engineers, and I'm always chiding them for applying all their mental
energy to implementation and none to thinking about interface
or model or concept or whatever.
How funny to receive the exact advice one is dispensing on a daily
basis, and how amusing, in a self-deprecating way, to realize that
one wasn't following one's own best practices.
Thanks! I hereby resolve to build a full conceptual model of recursive
queries before moving on to implementation.
Marshall|||"Mikito Harakiri" <mikharakiri_nospaum@.yahoo.com> wrote in message
news:1122488079.859136.9130@.o13g2000cwo.googlegroups.com...
> Appeo Allkam wrote:
> I don't quite understand the following passage in the Kamfonas article
> <quote>
> You may ask: "Why don't we use a hierarchical keyword to achieve the
> same result?" For example, an IP address-like scheme enumerates the
> nodes of a tree as 1,1.1,1.2,1.3,1.1.1, and so on. The problem with
> this scheme is that you have to start all qualifications from the top.
> Otherwise, you'll have to scan the whole index or table. With the L and
> R enumeration however, you can anchor your search at any level, relate
> any node to any other one, and still employ matching index scans.
> </quote>
> Any interpretations?
>
It's not clear to me from your post whether you are interested in who
invented the nested sets method.
I don't know, but I can tell you this: years before I saw Joe Celko's
description of nested sets, I saw a magazine article entitled, "Taming the
dreaded hierarchy" by, I think, John Baugh. In this article he outlined a
method that's very similar to nested sets.