In MySQL, I store categories this way:
categories:
- category_id
- category_name
- parent_category_id
What would be the most efficient way to generate the trail / breadcrumb for a given category_id?
For example
breadcrumbs(category_id):
General > Sub 1 > Sub 2
In theory, there could be unlimited levels.
I'm using PHP.
UPDATE:
I saw this article (http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/) about the Nested Set Model.
It looks interesting, but how would you go about dynamically managing categories?
It looks easier on paper, when you know the categories ahead of time, but not when the user can create/delete/edit categories on the fly ...
What do you think?
I like to use the Materialized Path method, since it essentially contains your breadcrumb trail, and makes it easy to do things like select all descendants of a node without using recursive queries.
Materialized Path model
The idea with the Materialized Path model is to link each node in the hierarchy to its position in the tree. This is done with a concatenated list of all the node's ancestors, usually stored in a delimited string. Note the "Lineage" field below.
CAT_ID  NAME      CAT_PARENT  Lineage
1       Home                  .
2       product   1           .1
3       CD’s      2           .1.2
4       LP’s      2           .1.2
5       Artists   1           .1
6       Genre     5           .1.5
7       R&B       6           .1.5.6
8       Rock      6           .1.5.6
9       About Us  1           .1
Traversing the table
-- Note: the || concatenation and two-argument LPAD are Oracle/PostgreSQL syntax;
-- in MySQL you would use CONCAT() and the three-argument form of LPAD.
SELECT LPAD('-', LENGTH(t1.lineage)) || t1.name AS listing
FROM   category t1, category t2
WHERE  t1.lineage LIKE t2.lineage || '%'
AND    t2.name = 'Home'
ORDER  BY t1.lineage;
Listing
Home
-product
--CD’s
--LP’s
-Artists
--Genre
---R&B
---Rock
-About Us
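To answer the original breadcrumb question with this model, the trail for a single category can be read straight out of the lineage column. A minimal MySQL sketch, assuming the table above is named category and that lineage lists only the ancestors (the node's own name is picked up by the extra cat_id match):

-- Breadcrumb for one category, e.g. cat_id = 3 (CD's): Home > product > CD's
SELECT GROUP_CONCAT(a.name ORDER BY LENGTH(a.lineage) SEPARATOR ' > ') AS breadcrumb
FROM category x
JOIN category a
  ON a.cat_id = x.cat_id
  OR FIND_IN_SET(a.cat_id, REPLACE(TRIM(BOTH '.' FROM x.lineage), '.', ',')) > 0
WHERE x.cat_id = 3;

Ordering by the length of each ancestor's lineage puts the root first; the OR in the join keeps the sketch short at the cost of index use.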
Generate it (however you like) from a traditional parent model and cache it. It's too expensive to generate it on the fly, and changes to the hierarchy are usually orders of magnitude less frequent than other changes. I wouldn't bother with the nested set model, since the hierarchy will be changing and then you have to go fooling around with the lefts and rights. (Note that the article only included recipes for adding and deleting, not re-parenting, which is very simple in the parent model.)
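As a sketch of what that generation step could look like in SQL alone, assuming MySQL 8.0+ (older versions would need a short loop in PHP), the categories table from the question, a NULL parent for top-level categories, and 42 as a hypothetical starting category:

-- Walk up the parent chain and collapse it into one cached breadcrumb string.
WITH RECURSIVE trail AS (
    SELECT category_id, category_name, parent_category_id, 0 AS depth
    FROM categories
    WHERE category_id = 42
  UNION ALL
    SELECT c.category_id, c.category_name, c.parent_category_id, t.depth + 1
    FROM categories c
    JOIN trail t ON c.category_id = t.parent_category_id
)
SELECT GROUP_CONCAT(category_name ORDER BY depth DESC SEPARATOR ' > ') AS breadcrumb
FROM trail;

The result is a single string such as General > Sub 1 > Sub 2 that can be stored next to the category row and invalidated whenever the hierarchy changes.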
The beauty of nested sets is that you can easily add/remove nodes from the graph with just a few simple SQL statements. It's really not all that expensive, and can be coded pretty quickly.
If you happen to be using PHP (or even if you don't), you can look at this code to see a fairly straightforward implementation of adding nodes to a nested set model (archive.org backup). Removing (or even moving) nodes is similarly straightforward.
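For reference, the breadcrumb query itself is also a one-liner in the nested set model. A sketch, assuming lft/rgt columns as described in the linked article (the column names are an assumption):

-- All ancestors of one node, root first.
SELECT parent.category_name
FROM categories AS node
JOIN categories AS parent
  ON node.lft BETWEEN parent.lft AND parent.rgt
WHERE node.category_id = 42   -- hypothetical category
ORDER BY parent.lft;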
Related
I have no idea if I wrote that correctly. I want to start learning higher-end data mining techniques, and I'm currently using SQL Server and Access 2016.
I have a system that tracks ID cards. Each ID is tagged to one particular level of a security hierarchy, which has many branches.
For example
Root
- Maintenance
  - Management
    - Supervisory
    - Manager
    - Executive
  - Vendors
    - Secure
    - Per Diem
    - Inside Trades
There are many other departments like Maintenance, some simple, some with much more convoluted hierarchies.
Each ID card is tagged to a level, so in the Maintenance example an ID might be tagged Per Diem:Vendors:Maintenance:Root. Others may be tagged just to Vendors, and some to Maintenance itself (no one has Root, thank god).
So let's say I have 20 ID cards selected; these are available personnel I can task to a job, but since they have different areas of security, I want to find commonalities they can all work on together as a 20-person group, or whatever other groupings I can make.
So the intended output would be
CommonMatch = Per Diem
    CardID = 1
    CardID = 3
CommonMatch = Vendors
    CardID = 1
    CardID = 3
    CardID = 20
So in the example above, while I could have two people working on Per Diem work, because that is their lowest common security similarity, there is also card holder #20, who has rights to the predecessor group (Vendors) that 1 and 3 share, so I could have all three of them work at that level.
I'm not looking for anyone to do the work for me (although examples are always welcome), more to point me in the right direction: what I should be studying, what the thing I'm trying to do is called, etc. I know CTEs are one way to go, but that seems like only a tool in a much bigger process that needs to be done.
Thank you all in advance
Well, it is not so much a graph-theory or data-mining problem but rather a data-structure problem and one that has almost solved itself.
The objective is to be able to partition the set of card IDs into disjoint subsets given a security clearance level.
So, the main idea here would be to lay out the hierarchy tree and then assign each card ID to the path implied by its security clearance level. For this purpose, each node of the hierarchy tree now becomes a container of card IDs (i.e. each node of the hierarchy tree holds a) its own name (as a unique identifier), b) pointers to other nodes, and c) a list of card IDs assigned to its "name").
Then, retrieving the set of cards with clearance UP TO a specific security level is simply a case of traversing the tree from that specific level down to the tree's leaves, collecting the card IDs from the node containers as they are encountered.
Suppose that we have the following access tree:
A
+- B
+- C
+- D
   +- E
And card ID assignments:
B:[1,2,3]
C:[4,8]
E:[10,12]
At the moment, B, C and E only make sense as tags; there is no structural information associated with them. We therefore need to first "build" the tree. The following example uses NetworkX, but the same thing can be achieved in a multitude of ways:
import networkx
G = networkx.DiGraph() #Establish a directed graph
G.add_edge("A","B")
G.add_edge("A","C")
G.add_edge("A","D")
G.add_edge("D","E")
Now, assign the card IDs to the node containers (in NetworkX, nodes can be any valid Python object, so I am going to go with a very simple list):
# NetworkX 1.x: G.node is a plain dict keyed by node, so a list can be stored
# directly here (in NetworkX 2.x you would store it as a node attribute instead,
# e.g. G.nodes["B"]["cards"] = [1, 2, 3]).
G.node["B"] = [1, 2, 3]
G.node["C"] = [4, 8]
G.node["E"] = [10, 12]
So, now, to get everybody working under "A" (the root of the tree), you can traverse the tree from that level downwards, either via Depth First Search (DFS) or Breadth First Search (BFS), and collect the card IDs from the containers. I am going to use DFS here, purely because NetworkX has a function that directly returns the visited nodes in visiting order.
# dfs_preorder_nodes returns a generator, which is an efficient way of iterating
# very large collections in Python, but I am casting it to a list here so that
# we get the actual list of nodes back.
vis_nodes = list(networkx.dfs_preorder_nodes(G, "A"))  # Start from node "A" and DFS downwards

cardIDs = []
# I could do the following with a one-line reduce, but it might be clearer this way
for aNodeID in vis_nodes:
    if G.node[aNodeID]:
        cardIDs.extend(G.node[aNodeID])
At the end of the above iteration, cardIDs will contain all card IDs from branch "A" downwards in one convenient list.
Of course, this example is ultra simple, but since we are talking about trees, the tree can be as large as you like and you still traverse it in the same way, requiring only a single point of entry (the top-level branch).
Finally, just as a note, the fact that you are using Access as your backend is not necessarily an impediment, but relational databases do not handle graph-type data with great ease. You might get away with it for something simple like the tree you have here, but the hassle of supporting this probably justifies carrying the process out outside of the database (e.g. use the database just for retrieving the data and do the graph-type processing in a different environment; doing a DFS in SQL is the sort of hassle I am referring to).
Hope this helps.
Let me first start by stating that in the last two weeks I have received ENORMOUS help from just about all of you (ok ok not all... but I think perhaps two dozen people commented, and almost all of these comments were helpful). This is really amazing and I think it shows that the stackoverflow team really did something GREAT altogether. So thanks to all!
Now as some of you know, I am working at a campus right now and I have to use a windows machine. (I am the only one who has to use windows here... :( )
Now I managed to set up (ok, the IT department did that for me) and populate a Postgres database (this I did on my own) with about 400 MB of data. That is perhaps not so much for most of you heavy Postgres users, but I was more used to SQLite databases for personal use, which rarely ever exceeded 2 MB.
Anyway, sorry for being so chatty - now the queries from that database work nicely. I use Ruby to run the queries.
The entries in the Postgres database are interconnected, insofar as they are like "pointers" - each entry has a field that points to another entry.
Example:
Entry 3667 points to entry 35785, which points to entry 15566. So it is quite simple.
The main entry is 1, so every one of these chains ends at 1. From any other number, we eventually reach 1 as the last result.
I am using Ruby to make as many individual queries to the database as needed until the last result returned is 1. This can take up to 10 individual queries. I do this by logging into psql with my password and data and then performing the SQL query via -c. This is probably not ideal: it takes a little time to do these logins and queries, and ideally I would log in only once, perform ALL the queries in Postgres, and then exit with a result (all these entries as the result).
Now here comes my question:
- Is there a way to make conditional queries entirely inside of Postgres?
I know how to do it in a shell script and in Ruby, but I do not know if this is available in PostgreSQL at all.
I would need to make the query, in literal English, like so:
"Please give me all the entries that point to the parent entry, until the last found entry is eventually 1, then return all of these entries."
I already "solved" it by using Ruby to make several queries until 1 is eventually returned, but this strikes me as fairly inelegant and possibly not efficient.
Any information is very much appreciated - thanks!
Edit (argh, I fail at pasting...):
Example dataset, the table would be like this:
 id | parent
----+--------
  1 |      1
  2 | 131567
  6 | 335928
  7 |      6
  9 |      1
 10 | 135621
 11 |      9
I hope that works; I tried to narrow it down to a minimal example.
For instance, id 11 points to id 9, and id 9 points to id 1.
It would be great if one could use SQL to return:
11 -> 9 -> 1
Unless you give some example table definitions, what you're asking for vaguely reminds me of a tree structure, which could be manipulated with recursive queries: http://www.postgresql.org/docs/8.4/static/queries-with.html
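Given the example table from the edit, a recursive CTE can follow the whole chain in a single query. A sketch, assuming the table is called entries (the name is an assumption) and PostgreSQL 8.4 or later:

-- Follow the parent pointers from entry 11 up to the root entry 1.
WITH RECURSIVE chain AS (
    SELECT id, parent, 1 AS depth
    FROM entries
    WHERE id = 11                    -- starting entry
  UNION ALL
    SELECT e.id, e.parent, c.depth + 1
    FROM entries e
    JOIN chain c ON e.id = c.parent
    WHERE c.id <> 1                  -- entry 1 points to itself, so stop there
)
SELECT id FROM chain ORDER BY depth; -- returns 11, 9, 1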
I am working on a system to display information about real estate. It runs in angular with the data stored as a json file on the server, which is updated once a day.
I have filters on number of bedrooms, bathrooms, price, and a free-text field for the address. It's all very snappy, but the problem is the load time of the app. This is why I am looking at Redis. Trouble is, I just can't get my head around how to get data out with several different filters running.
Let's say I have some data like this (leaving off lots of fields for simplicity):
id  beds  price
0   3     270000
1   2     130000
2   4     420000
etc...
I am thinking I could set up three sets, one to hold the whole dataset, one to create an index on bedrooms and another for price:
beds  id
2     1
3     0
4     2
and the same for price:
price   id
130000  1
270000  0
420000  2
Then I was thinking I could use SINTER to return the overlapping sets.
Let's say I'm looking for a house with more than 2 bedrooms that is less than 300000.
From the bedrooms set I get IDs 0, 2 for beds > 2.
From the prices set I get IDs 0, 1 for price < 300000.
So the common id is 0, which I would then lookup in the main dataset.
It all sounds good in theory, but being a Redis newbie, I have no clue how to go about achieving it!
Any advice would be gratefully received!
You're on the right track; sets + sorted sets is the right answer.
Two sources for all of the information that you could ever want:
Chapter 7 of my book, Redis in Action - http://bitly.com/redis-in-action
My Python/Redis object mapper - https://github.com/josiahcarlson/rom (it uses ideas directly from chapter 7 of my book to implement SQL-like indices)
Both of those resources use Python as the programming language, though chapter 7 has been translated into Java: https://github.com/josiahcarlson/redis-in-action/ (go to the java path to see the code).
... That said, a normal relational database (especially one with built-in Geo handling like Postgres) should handle this data with ease. Have you considered a relational database?
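To illustrate that last suggestion, here is a minimal sketch of the relational version of the same filter, using a hypothetical properties table with the columns from the example above:

CREATE TABLE properties (
    id    INTEGER PRIMARY KEY,
    beds  INTEGER NOT NULL,
    price INTEGER NOT NULL
);
CREATE INDEX idx_properties_beds_price ON properties (beds, price);

-- More than 2 bedrooms and under 300000 (returns id 0 for the sample rows):
SELECT id, beds, price
FROM properties
WHERE beds > 2
  AND price < 300000;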
I am using Oracle APEX 4.2.2 and have constructed a Tree region based off a view.
Now when I take this query (see below) and run it in, say, Oracle SQL Developer, all is fine. But when I place this same query within the page in Oracle APEX, based off a Tree region, everything saves correctly, yet when I run the page, no records/tree is displayed at all.
Now the underlying view can change in record size, but for the example I am talking about here, I have just over 6000 records that I need to build an Oracle tree hierarchy from.
One thing I have noticed is that if I reduce the record size to, say, 500 rows, the tree displays perfectly.
Questions:
1) Now is there a limitation that I am not aware of, as I really need to get this going whether there are 500 records or 6000 records?
2) Is 6000 rows too many for a tree hierarchy representation?
3) Could it possibly be because Oracle APEX 4.2.2 is now using JavaScript for building trees, and that is causing issues due to the quantity of data?
4) Is there a means of reducing the depth of the tree records so that I can still at least display something to the user?
My query is something like:
SELECT case when connect_by_isleaf = 1 then 0
when level = 1 then 1
else -1
end as status,
level,
c as title,
null as icon,
c as value,
null as tooltip,
null as link
FROM t
start with p IS NULL
CONNECT BY NOCYCLE PRIOR c = p;
Also I've noticed that if I try and run the query in SQL Workshop, it doesn't work there either unless I reduce the record size down to say 500 records.
I asked about using IE because the 'too large tree' issue especially plays up in IE. I've seen this issue pass by, and been asked about it, a couple of times already. The conclusion was simply that there isn't much to be done about it: browsers generally don't cope too well with a tree built from such a large dataset. Usually the issue isn't there, or is minimal, in Firefox or Chrome, while IE is mostly not playing ball; my guess is that this has to do with memory and DOM manipulation.
1) Now is there a limitation that I am not aware of, as I really need to get this going whether there are 500 records or 6000 records?
No limitation.
2) Is 6000 rows too many for a tree hierarchy representation?
Probably, yes.
3) Could it possibly be because Oracle APEX 4.2.2 is now using JavaScript for building trees, and that is causing issues due to the quantity of data?
Trees have been built with jsTree since 4.0 (I don't know about 3.2). APEX puts out a global variable in the tree region which holds all the data. The initialization of the widget will then create the complete ul-li list structure. Part of the issue might be that there are so many nodes to begin with, then how this is run through jsTree, and the huge amount of DOM manipulation occurring. I'm not sure whether this would go better with the newer release of jsTree (the APEX version is 0.9.9, while 1.x has been released for a while now).
4) Is there a means of reducing the depth of the tree records so that I can still at least display something to the user?
If you want to limit the depth, you can limit the query by using level in the where clause, e.g.
WHERE level <= 3
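Applied to the query from the question, that would look something like the sketch below; in Oracle the WHERE filter is evaluated after CONNECT BY builds the hierarchy, so level can be used there:

SELECT CASE WHEN connect_by_isleaf = 1 THEN 0
            WHEN level = 1 THEN 1
            ELSE -1
       END AS status,
       level,
       c    AS title,
       NULL AS icon,
       c    AS value,
       NULL AS tooltip,
       NULL AS link
FROM   t
WHERE  level <= 3                 -- cap the depth of the rendered tree
START WITH p IS NULL
CONNECT BY NOCYCLE PRIOR c = p;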
Alternative options will probably be non-APEX solutions: dynamic trees, AJAX for the tree nodes, another plugin, ... I haven't really explored those, as I haven't had to deal with such a big tree yet.
In my experience, the number of displayable tree nodes also depends on the text lengths in your tree (e.g. node labels and tooltips). The shorter the texts, the more nodes your tree can display. However, it makes a difference of maybe 50 nodes, so it won't solve your problem, as it didn't solve mine.
My moderately educated guess is that this ul-li structure is limited in size.
I built in a drop-down prefilter, so the user has to narrow down what she/he wants to have displayed.
Duplicate:
SQL - how to store and navigate hierarchies
If I have a database where the client requires categories, sub-categories, sub-sub-categories and so on, what's the best way to do that? If they only needed three, and always knew they'd need three, I could just create three tables: cat, subcat, subsubcat, or the like. But what if they want further depth? I don't like the three tables, but it's the only way I know how to do it.
I have seen the "SQL adjacency list", but didn't know if that was the only way possible. I was hoping for input so that the client can have any level of categories and subcategories. I believe this means hierarchical data.
EDIT: Was hoping for the SQL to get the list back out, if possible.
Thank you.
table categories: id, title, parent_category_id
id | title | parent_category_id
----+-------+-------------------
1 | food | NULL
2 | pizza | 1
3 | wines | NULL
4 | red | 3
5 | white | 3
6 | bread | 1
I usually do a select * and assemble the tree algorithmically in the application layer.
You might have a look at Joe Celko's book, or this previous question.
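Regarding the edit asking for the SQL to get the list back out: on a database with recursive CTEs (MySQL 8.0+, PostgreSQL, SQL Server), a sketch against the categories table above could look like this (MySQL-flavoured syntax):

-- Every category together with its full path, roots first.
WITH RECURSIVE tree AS (
    SELECT id, title, CAST(title AS CHAR(500)) AS path
    FROM categories
    WHERE parent_category_id IS NULL
  UNION ALL
    SELECT c.id, c.title, CONCAT(t.path, ' > ', c.title)
    FROM categories c
    JOIN tree t ON c.parent_category_id = t.id
)
SELECT id, path FROM tree ORDER BY path;
-- e.g. food, food > bread, food > pizza, wines, wines > red, wines > white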
Creating a table with a relation to itself is the best way to do this. It's easy and flexible to whatever extent you want, without any limitation. I don't think I need to repeat the structure you should use, since that has already been suggested in the first answer.
I have worked with a number of methods, but I still stick to the plain "id, parent_id" intra-table relationship, where root items have parent_id = 0. If you need to query the items in a tree a lot, especially when you only need 'branches' (all underlying elements of one node), you can use a second table: "id, path_id, level", holding a reference to each node in the upward path of each node. This might look like a lot of data, but it drastically improves branch lookups, and it is quite manageable to maintain with triggers.
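A minimal sketch of that second lookup table and a branch query, with hypothetical names since the answer does not spell out a schema:

-- One row per (node, ancestor) pair, plus how far up the ancestor sits.
CREATE TABLE category_paths (
    id      INT NOT NULL,   -- the node itself
    path_id INT NOT NULL,   -- a node on its upward path (an ancestor)
    level   INT NOT NULL,   -- distance from id up to path_id
    PRIMARY KEY (id, path_id)
);

-- All underlying elements (the whole branch) of node 42:
SELECT c.*
FROM categories c
JOIN category_paths p ON p.id = c.id
WHERE p.path_id = 42;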
Not a recommended method, but I have seen people use dot-notation on the data.
Food.Pizza or Wines.Red.Cabernet
You end up doing lots of LIKE or mid-string queries, which don't use indices terribly well, and you end up parsing things a lot.