I'm trying to implement a tree-like structure using the Materialized Path model described here: http://www.dbazine.com/oracle/or-articles/tropashko4.
Is it possible to enforce referential integrity on the [path] field? I don't see how SQL could do it, do I have to do it manually in the DAL?
Yes, you have to enforce data integrity yourself in the DAL when you use either Materialized Path or Nested Sets solutions for hierarchical data.
Adjacency List supports referential integrity, and this is true also for a design I call "Closure Table" (Tropashko calls this design "transitive closure relation").
"Materialized path" as presented by Vadim Tropashko in that article, introduces the notion of order into a relation ("Jones is the second member".).
"Materialized path" is nothing but "some form of materialized view" on the transitive closure, and therefore suffers all and exactly the same problems as any other "materialized view", except that matters are algorithmically worse precisely because of the involvement of a closure.
SQL is almost completely powerless when constraints-on-a-closure are in play. (Meaning: yes, SQL requires you to do everything yourself.) It's one of those areas where the RM shows the maximum of its almost unlimited power, but where SQL fails abysmally, and where it is such a shame that the majority of people mistake SQL for being relational.
(@Bill Karwin: I'd like to be able to give you +1 for your remark on the relation between the depth of the trees and the effect on performance. There are no known algorithms to compute closures that perform well for trees of "crazy" depth. It's an algorithmic problem, not an SQL nor a relational one.)
EDIT
Yes, RM = Relational Model
In the materialized path model you can use arbitrary strings (maybe Unicode strings, to allow more than 256 children) instead of special strings of the form "x.y.z". The id of the parent is then the id of a direct child with the last character removed. You can easily enforce this with a check constraint like (my example works in PostgreSQL)
check(parent_id = substring(id from 1 for char_length(id)-1)),
within your create table command. If you insist on strings of the form "x.y.z", you'll have to play around with regular expressions, but I'd guess it's possible to find a corresponding check constraint.
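Putting it together, a minimal sketch in PostgreSQL (the table and column names, and the rule that root nodes get single-character ids, are my assumptions):

create table node (
    id        text primary key,
    parent_id text references node (id),
    check (parent_id = substring(id from 1 for char_length(id) - 1)
           or (parent_id is null and char_length(id) = 1))
);

The foreign key on parent_id gives you the referential integrity along the path, and the check constraint ties parent_id to the path encoded in id.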
Yes, we can "enforce referential integrity on the [path] field". I did that a couple of years ago, described here:
Store your configuration settings as a hierarchy in a database
In just about every SQL-based database application I have worked on so far, sooner or later the following three-faceted requirement has popped up:
There is some entity, linked in a hierarchical fashion (i.e. the tuples form a tree structure).
Users must be able to define any number of custom attributes with values for the tuples, and these values are inherited/overridden towards the leaves of the tree structure. ("Dumb" attributes usually suffice. That is, no uniqueness constraints, no foreign keys, only one value per attribute, ...)
Users must be able to run arbitrary queries on this data (i.e. custom boolean expressions, based upon filters for the values of the user-defined attributes that are linked with AND/OR).
Storing the data, roughly matching the first two bullets above, is quite straightforward:
The hierarchy is built up by giving the respective table a parent column. This column will be null for root nodes, and a pointer to the ID of the parent node for all other nodes.
The user-defined attributes are stored according to the entity-attribute-value pattern.
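A minimal sketch of that storage layout (all names here are assumptions of mine):

CREATE TABLE node (
    id        int PRIMARY KEY,
    parent_id int NULL REFERENCES node (id)  -- NULL for root nodes
);
CREATE TABLE node_attribute (
    node_id int NOT NULL REFERENCES node (id),
    name    varchar(100) NOT NULL,
    value   varchar(255) NULL,
    PRIMARY KEY (node_id, name)              -- "dumb": one value per attribute
);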
While there are numerous resources that suggest using a different approach, especially for the latter point (e.g. answers here, here, or here), I have not usually been in a position to move away from a traditional static relational database schema. Hence, let's simply assume the above as a given. Also, hardly ever could I rely on the specifics of a particular DBMS; the more usual case was systems that were supposed to work with MS SQL Server, Oracle, and possibly others as backends without requiring two significantly different product versions.
Solving the third item, however, is always problematic (even without considering the hierarchical inheritance of attribute values). The number of joins depends on the number of distinct attributes referenced in the boolean expression. Alternatively, the number of joins can be somewhat reduced by determining the maximum number of distinct attributes referenced in any single disjunct of the expression, which may save joins but makes the resulting queries, and the code used to generate them, even less intelligible and maintainable. For instance,
a = 5 or (b = 8 and c = 9)
could do with 2 joins to the attribute-value table.
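For illustration, a hedged sketch of such a two-join query, with assumed names entity and attr_value(entity_id, attr, value):

SELECT DISTINCT e.id
FROM entity e
JOIN attr_value v1 ON v1.entity_id = e.id          -- join 1: matches a or b
LEFT JOIN attr_value v2 ON v2.entity_id = e.id
                       AND v2.attr = 'c'           -- join 2: only needed for c
WHERE (v1.attr = 'a' AND v1.value = '5')
   OR (v1.attr = 'b' AND v1.value = '8' AND v2.value = '9');

The DISTINCT guards against duplicate rows when an entity satisfies the predicate through more than one attribute row.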
I have always been able to do this "somehow", but as this appears to be a fairly ubiquitous situation, I am looking for the "canonical" way to generate SQL queries in this situation. Is there a "standard pattern" to follow here?
Careful not to fall prey to the inner platform effect. It is a complicated problem, and SQL itself is designed to handle the complexities. Generate DDL to add and remove columns as needed, and generate simple select statements for queries. Store each Tuple Type (distinct set of attributes) as a table.
With regards to inheritance, I recommend handling it in the application or DAL, and only storing the non-inherited values. On retrieval, read all parent rows to calculate the functional values. If you do need to access "functional" values from SQL, use an indexed view or triggers to maintain them separate from storage.
Hierarchies can be represented as you describe, but a simple "Parent" column can make it difficult to query beyond a single level. Look at hierarchyid on SQL Server or CONNECT BY on Oracle.
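For instance, Oracle's CONNECT BY walks an entire parent-child hierarchy in one statement (category is an assumed table name):

SELECT id, parent_id, LEVEL
FROM category
START WITH parent_id IS NULL       -- begin at the roots
CONNECT BY PRIOR id = parent_id;   -- descend to children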
Avoiding EAV stores allows you to:
Use indexes and statistics where needed
Keep efficient storage (ints stored as ints, money stored as money)
Keep understandable queries (SELECT * FROM vwProducts WHERE Color = 'RED' ORDER BY Price ASC)
If you want an EAV system because you have too many attributes (>1024 per type) or they are not somewhat statically defined (many changes per hour), I would avoid using a relational database in the first place. Use an EAV (NoSQL) database server instead.
tl;dr: If you have a schema, use DDL to tell the server about it. If you don't, use a more appropriate server.
Does anyone have any good references to read regarding whether PostgreSQL conforms to all 12 of Codd's rules?
If not, are there any veteran PostgreSQL users that have an opinion on this topic?
Thank you.
PostgreSQL does not comply with all 12 rules. The rules that PG is unable to respect are:
2 - The guaranteed access rule: because you can create a table without a PRIMARY KEY. But most RDBMSs (Oracle, DB2, SQL Server...) allow tables without PKs too, except SQL Azure.
3 - Systematic treatment of null values: the one and only marker that is accepted for unknown or inapplicable is the NULL one. PostgreSQL adds some extra markers like infinity or -infinity (other RDBMSs, like SQL Server, do not use such ugly things, which cannot be combined in algebraic operations or functions...).
6 - The view updating rule: PostgreSQL does not accept INSERT, UPDATE or DELETE on views containing JOINs. Most professional RDBMSs accept this (Oracle, DB2, SQL Server...) but limit it to modifying one table in the view, per the general standard SQL rule about INSERTs/UPDATEs/DELETEs.
7 - Relational Operations Rule (for high-level insert, update, and delete): PostgreSQL is unable to properly UPDATE keys (PRIMARY or UNIQUE). This is because, when updating a batch of rows, PostgreSQL acts row by row and checks the constraint each time it has updated a row, not, as all other RDBMSs do, once all rows have been updated. The theory of relational databases is based on the fact that updates must be set-based and not row by row... Of course there is a possible ugly patch for PostgreSQL (deferrable constraints...), but this affects some other logic (see the sketch after this list).
11 - Distribution independence: when partitioning a table or an index (and more generally for any storage-management operation) there must be no logical aspect likely to affect applications... More generally, Codd's theory is based on the fact that logical objects and logical processing must in no case depend on physical objects, and vice versa. Physical/logical separation is essential, and all major RDBMSs provide it (Oracle, SQL Server, IBM DB2), but PostgreSQL requires the creation of additional tables to divide a table into partitions...
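To illustrate the key-update behaviour from rule 7 above, a small sketch (t and t2 are assumed names; behaviour as in PostgreSQL):

create table t (id int primary key);
insert into t values (1), (2), (3);
update t set id = id + 1;   -- typically fails with a duplicate-key error:
                            -- uniqueness is checked row by row

create table t2 (id int primary key deferrable initially deferred);
insert into t2 values (1), (2), (3);
update t2 set id = id + 1;  -- succeeds: the check is deferred until commit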
Of course, RDBMSs that respect all those rules are very rare. I have worked with about 20 different RDBMSs, but the only one that I think fully respects all 12 of Codd's rules is Microsoft Azure SQL.
My answer is based on other answers via Google.
The first ANSI standard of the SQL language was created in 1986. The most recent ANSI standard is ANSI SQL:2003. Its requirements are nevertheless implemented in only a few RDBMSs. So far the most widespread RDBMSs respect ANSI SQL:1999, or the older ANSI SQL:1992. PostgreSQL supports ANSI SQL:1999 completely and ANSI SQL:2003 partly.
Strictly speaking of Codd's rules:
Codd's 12 rules aren't all there is to the relational model. In fact, he expanded these 12 to 40 rules in his 1990 book The Relational Model for Database Management.

But furthermore, if you care to read Christopher J. Date's 1999 An Introduction to Database Systems, you will see that the relational model comprises some basic elements and some principles.

The basic element is the domain, or data type. PostgreSQL does not really enforce domains, because it accepts NULL, which by definition is not part of any domain. Thus the triplet of domain, name and value called an attribute breaks down, and so does the tuple -- because it represents a proposition, and a proposition with missing information is another proposition, not the one declared in the relation's header -- and so the relation breaks down as well.

Furthermore, a relation is a set, not a bag. A bag accepts duplicates, but a relation does not. So because PostgreSQL does not enforce the necessity of declaring a candidate key for each and every table, its tables are not necessarily relations, but quite possibly and commonly bags -- not of tuples, as shown above, but simply of rows.

Also, the first principle is The Information Principle, under which the whole database must be represented by data. Object IDs violate this, with serious consequences for data independence, which by the way is necessary for another relational model sine qua non, namely the separation between user, logical and physical schemas. This is also not properly supported by PostgreSQL.
I've been doing the relational database thing for years now, but lately have moved into Cassandra/Redis territory. NoSQL makes sense for what we're doing, so that's fine.
As I was working through defining Cassandra column families today a question occurred to me: In relational databases, why doesn't DDL let us define denormalization rules in such a way that the database engine itself could manage the resulting consistency issues natively. In other words, when a relational database programmer denormalizes to achieve performance goals... why is he/she then left to maintain consistency via purpose-written SQL?
Maybe there's something obvious that I'm missing? Is there some reason why such a suggestion is silly, because it seems to me like having this capability might be awfully useful.
EDIT:
Appreciate the feedback so far. I still feel like I have an unanswered (perhaps because it's been poorly articulated) question on my hands. I understand that materialized views attempt to offer engine-managed consistency for denormalized data. However, my understanding is that they aren't updated immediately with changes to the underlying tables. If this is true, it means the engine really isn't managing the consistency issues resulting from the denormalization... at least not at write-time. What I'm getting at is that a normalized data structure without true, feature-rich, engine-managed denormalization hamstrings relational database engines when it comes time to scale a system with heavy read load against complex relational models. I suppose it's true that adjusting materialized view refresh rates equates to tunable "eventual consistency" offered by NoSQL engines like Cassandra. I need to read up on how efficiently engines are able to sync their materialized views. In order to be considered viable relative to NoSQL options, the time it takes to sync a view would need to increase linearly with the number of added/updated rows.
Anyway, I'll think about this some more and re-edit. Hopefully with some representative examples of imagined DDL.
Some relational database systems are able to maintain consistency of denormalized data to some extent (if I understand right what you mean).
In Oracle, this is called materialized views, in SQL Server — indexed views.
Basically, this means that you can create a self-maintained denormalized table as a result of an SQL query and index it:
CREATE VIEW dbo.a_b
WITH SCHEMABINDING
AS
SELECT b.id AS id, b.value, b.a_id, a.property
FROM dbo.b b
JOIN dbo.a a
ON a.id = b.a_id;
GO
-- the view is materialized by giving it a unique clustered index:
CREATE UNIQUE CLUSTERED INDEX IX_a_b ON dbo.a_b (id);
The resulting view, a_b, were it a real table, would violate 2NF since property is functionally dependent on a_id which is not a candidate key. However, the database system maintains this functional dependency and you can create a composite index on, say, (value, property).
Even MySQL and PostgreSQL which don't support materialized views natively are capable of maintaining some kind of denormalized tables.
For instance, when you create a FULLTEXT index on a column or a set of columns in MySQL, you get two indexes at once: the first contains one entry for each distinct word in each record (with a reference to the original record id); the second contains one record per distinct word in the whole table, with the total word count. This allows searching for words quickly and ordering by relevance.
The total word count table is of course dependent on the individual words table and hence violates 5NF but, again, the system maintains this dependency.
Similar things are done for GIN and GiST indexes in PostgreSQL.
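For example, the MySQL fulltext machinery described above is engaged like this (article is an assumed table name):

CREATE FULLTEXT INDEX ft_article ON article (title, body);

SELECT *, MATCH(title, body) AGAINST ('denormalization') AS relevance
FROM article
WHERE MATCH(title, body) AGAINST ('denormalization')
ORDER BY relevance DESC;   -- order by the maintained relevance score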
Of course not all possible denormalizations can be maintained, that means that you cannot materialize and index just any query in real time: some are too expensive to maintain, some are theoretically possible but not implemented in actual systems, etc.
However, you may maintain them using your own logic in triggers, stored procedures or whatever; that's exactly what those are there for.
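A minimal sketch of that do-it-yourself route in PostgreSQL (all names are assumptions of mine): a redundant comment_count on post, kept in sync by a trigger on comment.

CREATE TABLE post (
    id            int PRIMARY KEY,
    comment_count int NOT NULL DEFAULT 0   -- denormalized, trigger-maintained
);
CREATE TABLE comment (
    id      serial PRIMARY KEY,
    post_id int NOT NULL REFERENCES post (id)
);

CREATE FUNCTION sync_comment_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE post SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
    ELSE  -- DELETE
        UPDATE post SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER comment_count_sync
AFTER INSERT OR DELETE ON comment
FOR EACH ROW EXECUTE PROCEDURE sync_comment_count();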
Denormalisation in an RDBMS is a special case, not the standard. One only does it when one has a proven case. If you design in denormalised data up front, you've already lost.
Given that each case is by definition "special", how can there be standard SQL constructs to maintain the denormalised data?
An RDBMS differs from NoSQL in that it is designed to work with normalised designs. IMHO, you can't compare RDBMS and NoSQL like this.
Is it when you're trying to get data and there is no apparent easy way of doing it?
When you find something that should be a table on its own?
What are the laws?
Check out Wikipedia. The article talks about database normalization and the different forms (first, second, third, etc.). Most times you should be aiming for at least third normal form. There are times when you want to relax the rules a bit (it may be too expensive to join multiple tables together, so you might want to denormalize a bit), but for the most part third normal form is good.
When you notice you have to repeat the same data, or when you start using single fields as arrays.
While this is a somewhat snarky answer, when you discover that the data isn't sufficiently normalized. There are many resources on the web about the levels (or, more properly, "forms") of normalization, and they more completely describe the forms than I could here. First and second normal forms should be pretty much required. If you aren't at third (or, really, fourth) normal form, you need to have a strong justification as to why.
Check out the Wikipedia article on database normalization.
When you're starting to question whether an SQL database needs more normalization.
Whenever you have a relational database.... <grin/>
No, actually there are laws, check out this Wikipedia link.
They are called the five normal forms or something like that, originally from E. F. Codd, the man who invented relational databases around 1970.
"The key the whole key and nothing but the Key, so help me Codd"
This is a synopsis:
First normal form (1NF): The table faithfully represents a relation and has no repeating groups.
Second normal form (2NF): No non-prime attribute in the table is functionally dependent on a part (proper subset) of a candidate key.
Third normal form (3NF): Every non-prime attribute is non-transitively dependent on every key of the table.
Boyce-Codd normal form (BCNF): Every non-trivial functional dependency in the table is a dependency on a superkey.
Fourth normal form (4NF): Every non-trivial multivalued dependency in the table is a dependency on a superkey.
Fifth normal form (5NF): Every non-trivial join dependency in the table is implied by the superkeys of the table.
Domain/key normal form (DKNF, Ronald Fagin 1981): Every constraint on the table is a logical consequence of the table's domain constraints and key constraints.
Sixth normal form (6NF): The table features no non-trivial join dependencies at all (with reference to a generalized join operator).
Other people have pointed you to the formal rules for normalization. Here are some informal guidelines I use:
If you have columns in a table whose names differ only by a number (e.g. Phone1 and Phone2; see the sketch after this list).
If you have any columns in a table that should be filled in only when another column in the table is filled in.
If updating a "fact" in the database (such as a street address) requires more than one UPDATE.
If the same question could ever get two different answers depending on which table you get your information from.
If the answer to any non-trivial question can be gotten from the database without JOINing at least two tables.
If you have any quantity-based restrictions in the database other than "only 1 of something is allowed" (that is, "only one address is allowed" is okay, but "only two addresses are allowed" indicates a normalization problem).
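As an illustration of the first guideline, a sketch (all names assumed) of turning numbered columns into rows:

-- before: customer(id, name, Phone1, Phone2)
CREATE TABLE customer (
    id   int PRIMARY KEY,
    name varchar(100) NOT NULL
);
CREATE TABLE customer_phone (
    customer_id int NOT NULL REFERENCES customer (id),
    phone       varchar(20) NOT NULL,
    PRIMARY KEY (customer_id, phone)   -- any number of phones, no empty columns
);

This also resolves the last guideline: "only two phone numbers are allowed" stops being baked into the table's shape.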
3NF is generally all you need and it follows three rules:
Every column in the table should be dependent on:
the key (1NF),
the whole key (2NF),
and nothing but the key (3NF) (so help me Codd is the way that quote usually ends).
You can often "downgrade" to 2NF for performance reasons, provided you understand the implications and only when you strike problems, but 3NF should be the initial goal for all your designs..
As everyone else has said, you know when you start having (too many) duplicate columns in multiple tables.
That being said, it is sometimes useful to have redundant columns across multiple tables. This can reduce the number of JOINs you have to do in complicated queries. Just be careful to keep all the tables in sync, or you're just asking for trouble.
This is a pretty good article. Getting normal is a science, not an art. Now knowing when to DEnormalize... that's an art.
http://www.alvechurchdata.co.uk/hints-and-tips/softnorm.html
See Description of the database normalization basics
What level of normalization are you currently at? If you can't answer that, I assume your database is a nasty mess. I always hit third normal form on the initial design and denormalize or normalize further if and when needed.
I assume you're talking about a transactional database supporting an interactive application, but for what it's worth...
OLAP databases used exclusively for reporting and only updated by ETL processes may benefit from a less normalized structure. In these applications you accept the cost of redundant data storage and duplication for the performance benefit of fewer joins and the increased ease of use for (sometimes less technical) data analysts and business analysts.
Transactional databases should always be normalized to the extent practical (at least 3NF) and then selectively denormalized only as needed. And the need to denormalize should ideally be based on actual performance testing results.
When you have to search through huge amounts of data just to extract some basic info - e.g. what kinds of Product categories there are, or something like that.
My employer, a small office supply company, is switching suppliers and I am looking through their electronic content to come up with a robust database schema; our previous schema was pretty much just thrown together without any thought at all, and it's pretty much led to an unbearable data model with corrupt, inconsistent information.
The new supplier's data is much better than the old one's, but their data is what I would call hypernormalized. For example, their product category structure has 5 levels: Master Department, Department, Class, Subclass, Product Block. In addition the product block content has the long description, search terms and image names for products (the idea is that a product block contains a product and all variations - e.g. a particular pen might come in black, blue or red ink; all of these items are essentially the same thing, so they apply to a single product block). In the data I've been given, this is expressed as the products table (I say "table" but it's a flat file with the data) having a reference to the product block's unique ID.
I am trying to come up with a robust schema to accommodate the data I'm provided with, since I'll need to load it relatively soon, and the data they've given me doesn't seem to match the type of data they provide for demonstration on their sample website (http://www.iteminfo.com). In any event, I'm not looking to reuse their presentation structure so it's a moot point, but I was browsing the site to get some ideas of how to structure things.
What I'm unsure of is whether or not I should keep the data in this format, or for example consolidate Master/Department/Class/Subclass into a single "Categories" table, using a self-referencing relationship, and link that to a product block (product block should be kept separate as it's not a "category" as such, but a group of related products for a given category). Currently, the product blocks table references the subclass table, so this would change to "category_id" if I consolidate them together.
I am probably going to be creating an e-commerce storefront making use of this data with Ruby on Rails (or that's my plan, at any rate) so I'm trying to avoid getting snagged later on or having a bloated application - maybe I'm giving it too much thought but I'd rather be safe than sorry; our previous data was a real mess and cost the company tens of thousands of dollars in lost sales due to inconsistent and inaccurate data. Also I am going to break from the Rails conventions a little by making sure that my database is robust and enforces constraints (I plan on doing it at the application level, too), so that's something I need to consider as well.
How would you tackle a situation like this? Keep in mind that I already have the data to be loaded in flat files that mimic a table structure (I have documentation saying which columns are which and what references are set up); I'm trying to decide if I should keep them as normalized as they currently are, or if I should look to consolidate. I need to be aware of how each method will affect the way I program the site using Rails, since if I do consolidate, there will be essentially 4 "levels" of categories in a single table, but that definitely seems more manageable than separate tables for each level, since apart from Subclass (which directly links to product blocks) they don't do anything except show the next level of category under them.

I'm always at a loss for the "best" way to handle data like this - I know of the saying "Normalize until it hurts, then denormalize until it works" but I've never really had to implement it until now.
I would prefer the "hypernormalized" approach over a denormalized data model. The self-referencing table you mentioned might reduce the number of tables and simplify life in some ways, but in general this type of relationship can be tricky to deal with. Hierarchical queries become a pain, as does mapping an object model to this (if you decide to go that route).
A couple of extra joins is not going to hurt and will keep the application more maintainable. Unless performance degrades due to the excessive number of joins, I would opt to leave things like they are. As an added bonus if any of these levels of tables needed additional functionality added, you will not run into issues because you merged them all into the self referencing table.
I totally disagree with the criticisms about self-referencing table structures for parent-child hierarchies. The linked list structure makes UI and business layer programming easier and more maintainable in most cases, since linked lists and trees are the natural way to represent this data in languages that the UI and business layers would typically be implemented in.
The criticism about the difficulty of maintaining data integrity constraints on these structures is perfectly valid, though the simple solution is to use a closure table that hosts the harder check constraints. The closure table is easily maintained with triggers.
The tradeoff is a little extra complexity in the DB (closure table and triggers) for a lot less complexity in UI and business layer code.
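A minimal sketch of the closure-table idea above (all names are assumptions of mine):

CREATE TABLE node (
    id        int PRIMARY KEY,
    parent_id int REFERENCES node (id)
);
-- one row per ancestor/descendant pair, including self-pairs at depth 0;
-- maintained by triggers on node, as described above
CREATE TABLE node_closure (
    ancestor_id   int NOT NULL REFERENCES node (id),
    descendant_id int NOT NULL REFERENCES node (id),
    depth         int NOT NULL,
    PRIMARY KEY (ancestor_id, descendant_id)
);
-- e.g. all descendants of node 42, any depth, without recursion:
SELECT descendant_id FROM node_closure WHERE ancestor_id = 42 AND depth > 0;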
If I understand correctly, you want to take their separate tables and turn them into a hierarchy that's kept in a single table with a self-referencing FK.
This is generally a more flexible approach (for example, if you want to add a fifth level), BUT SQL and relational data models don't tend to work well with linked lists like this, even with newer syntax like MS SQL Server's CTEs. Admittedly, CTEs make it much better though.
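A sketch of such a query (category is an assumed table name; SQL Server spells it without the RECURSIVE keyword):

WITH RECURSIVE category_tree AS (
    SELECT id, parent_id, name, 1 AS level
    FROM category
    WHERE parent_id IS NULL              -- start at the roots
    UNION ALL
    SELECT c.id, c.parent_id, c.name, t.level + 1
    FROM category c
    JOIN category_tree t ON c.parent_id = t.id
)
SELECT * FROM category_tree ORDER BY level;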
It can be difficult and costly to enforce things, like that a product must always be on the fourth level of the hierarchy, etc.
If you do decide to do it this way, then definitely check out Joe Celko's SQL for Smarties, which I believe has a section or two on modeling and working with hierarchies in SQL or better yet get his book that is devoted to the subject (Joe Celko's Trees and Hierarchies in SQL for Smarties).
Normalization implies data integrity, that is: each normal form reduces the number of situations where your data is inconsistent.
As a rule, denormalization has a goal of faster querying, but leads to increased space, increased DML time, and, last but not least, increased efforts to make data consistent.
One usually writes code faster (writes faster, not the code faster) and the code is less prone to errors if the data is normalized.
Self-referencing tables almost always turn out to be much worse to query, and perform worse, than normalized tables. Don't do it. It may look more elegant to you, but it is not, and it is a very poor database design technique. Personally, the structure you described sounds just fine to me, not hypernormalized.

A properly normalized database (with foreign key constraints, as well as default values, triggers (if needed for complex rules), and data validation constraints) is also far likelier to have consistent and accurate data. I agree about having the database enforce the rules; likely this is part of why the last application had bad data: the rules were not enforced in the proper place and people were able to easily get around them. Not that the application shouldn't check as well (there is no point in even sending an invalid date, for instance, for the database to fail on insert). Since you are redesigning, I would put more time and effort into designing the necessary constraints and choosing the correct data types (do not store dates as string data, for instance) than into trying to make a perfectly ordinary normalized structure look more elegant.
I would bring it in as close to their model as possible (and, if at all possible, I would get files which match their schema, not a flattened version). If you bring the data directly into your model, what happens if the data they send starts to break assumptions in the transformation to your internal application's model?
Better to bring their data in, run sanity checks and check that assumptions are not violated. Then if you do have an application-specific model, transform it into that for optimal use by your application.
Don't denormalize. Trying to achieve a good schema design by denormalizing is like trying to get to San Francisco by driving away from New York. It doesn't tell you which way to go.
In your situation, you want to figure out what a normalized schema would look like. You can base that largely on the source schema, but you need to learn what the functional dependencies (FDs) in the data are. Neither the source schema nor the flattened files are guaranteed to reveal all the FDs to you.
Once you know what a normalized schema would look like, you now need to figure out how to design a schema that meets your needs. If that schema is somewhat less than fully normalized, so be it. But be prepared for difficulties in programming the transformation between the data in the flattened files and the data in your designed schema.
You said that previous schemas at your company cost millions due to inconsistency and inaccuracy. The more normalized your schema is, the more protected you are from internal inconsistency. This leaves you free to be more vigilant about inaccuracy. Consistent data that's consistently wrong can be as misleading as inconsistent data.
Is your storefront (or whatever it is you're building, not quite clear on that) always going to be using data from this supplier? Might you ever change suppliers or add additional, different suppliers?
If so, design a general schema that meets your needs, and map the vendor data to it. Personally I'd rather suffer the (incredibly minor) 'pain' of a self-referencing Category (hierarchical) table than maintain four (apparently semi-useless) levels of Category variants, and then next year find out they've added a 5th, or introduced a product line with only three...
For me, the real question is: what fits the model better?
It's like comparing a Tuple and a List.
Tuples are a fixed size and are heterogeneous -- they are "hypernormalized".
Lists are an arbitrary size and are homogeneous.
I use a Tuple when I need a Tuple and a List when I need a List; they fundamentally serve different purposes.
In this case, since the product structure is already well defined (and I assume not likely to change) then I would stick with the "Tuple approach". The real power/use of a List (or recursive table pattern) is when you need it to expand to an arbitrary depth, such as for a BOM or a genealogy tree.
I use both approaches in some of my databases, depending upon the need. However, there is also the "hidden cost" of a recursive pattern, which is that not all ORMs (not sure about AR) support it well. Many modern DBs have support for "join-throughs" (Oracle), hierarchy IDs (SQL Server) or other recursive patterns. Another approach is to use a set-based hierarchy (which generally relies on triggers/maintenance). In any case, if the ORM used does not support recursive queries well, then there may be the extra "cost" of using the DB features directly -- either in terms of manual query/view generation or management such as triggers. If you don't use a funky ORM, or simply use a logic separator such as iBatis, then this issue may not even apply.
As far as performance, on new Oracle or SQL Server (and likely others) RDBMS, it ought to be very comparable so that would be the least of my worries: but check out the solutions available for your RDBMS and portability concerns.
Everybody who recommends against introducing a hierarchy in the database is considering just the option of a self-referencing table. But this is not the only way to model a hierarchy in the database.
You may use a different approach, that provides you with easier and faster querying without using recursive queries.
Let's say you have a big set of nodes (categories) in your hierarchy:
Set1 = (Node1 Node2 Node3...)
Any node in this set can also be another set by itself, that contains other nodes or nested sets:
Node1=(Node2 Node3=(Node4 Node5=(Node6) Node7))
Now, how can we model that? Let's give each node two attributes that set the boundaries of the nodes it contains:
Node = { Id: int, Min: int, Max: int }
To model our hierarchy, we just assign those min/max values accordingly:
Node1 = { Id = 1, Min = 1, Max = 10 }
Node2 = { Id = 2, Min = 2, Max = 2 }
Node3 = { Id = 3, Min = 3, Max = 9 }
Node4 = { Id = 4, Min = 4, Max = 4 }
Node5 = { Id = 5, Min = 5, Max = 7 }
Node6 = { Id = 6, Min = 6, Max = 6 }
Node7 = { Id = 7, Min = 8, Max = 8 }
Now, to query all nodes under the Set/Node5:
select n.* from Nodes as n, Nodes as s
where s.Id = 5 and s.Min < n.Min and n.Max < s.Max
The only resource-consuming operation would be if you want to insert a new node, or move some node within the hierarchy, as many records will be affected, but this is fine, as the hierarchy itself does not change very often.
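For reference, a runnable sketch of the whole example above (Nodes is the table name assumed by the query):

CREATE TABLE Nodes (Id int PRIMARY KEY, Min int NOT NULL, Max int NOT NULL);
INSERT INTO Nodes VALUES (1, 1, 10), (2, 2, 2), (3, 3, 9), (4, 4, 4),
                         (5, 5, 7), (6, 6, 6), (7, 8, 8);
-- all nodes under Node5 (returns Node6):
SELECT n.* FROM Nodes AS n, Nodes AS s
WHERE s.Id = 5 AND s.Min < n.Min AND n.Max < s.Max;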