Ordered tree in a database - sql

I have a folder structure which I need to store in a database. Each folder has a name, a primary key and a foreign key to their parent folder. So a folder can have sub folders.
What I am having trouble with is when a user wants to move up or down or add a new sub folder (adding a sub folder is added at the end of the tree) and I want to record the order of the sub folders.
How do I record the user ordering in a table?
So if I have sub folders B, C and D under A (in that order) and I move D up, then the order is B, D and then C. How do you reflect this in a database?

If you are looking to create a sub-hierarchy, you can apply an ORDER BY clause on an integer column to sort. For example:
ID, ParentID, SortOrder, Name
Where SortOrder is an integer between 0 and foldercount-1. Then just apply:
ORDER BY SortOrder ASC
I would also suggest adding Lineage and Depth fields to your table, so that you can traverse up and down the tree more quickly, without expensive queries. Have a look at this article.
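As a rough sketch of the SortOrder idea, here is how "move D up" could look as a swap of SortOrder values between two siblings. It uses Python's built-in sqlite3 module purely so it can be run anywhere. The table name Folder and the helper names children and move_up are assumptions for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Folder (
        ID        INTEGER PRIMARY KEY,
        ParentID  INTEGER REFERENCES Folder(ID),
        SortOrder INTEGER NOT NULL,
        Name      TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Folder (ID, ParentID, SortOrder, Name) VALUES (?, ?, ?, ?)",
    [(1, None, 0, "A"), (2, 1, 0, "B"), (3, 1, 1, "C"), (4, 1, 2, "D")],
)


def children(parent_id):
    """Names of the sub folders of parent_id, in the user's order."""
    rows = conn.execute(
        "SELECT Name FROM Folder WHERE ParentID = ? ORDER BY SortOrder ASC",
        (parent_id,),
    )
    return [name for (name,) in rows]


def move_up(folder_id):
    """Swap SortOrder with the sibling directly above; do nothing if already first."""
    parent_id, order = conn.execute(
        "SELECT ParentID, SortOrder FROM Folder WHERE ID = ?", (folder_id,)
    ).fetchone()
    above = conn.execute(
        "SELECT ID, SortOrder FROM Folder "
        "WHERE ParentID IS ? AND SortOrder < ? "
        "ORDER BY SortOrder DESC LIMIT 1",
        (parent_id, order),
    ).fetchone()
    if above is None:
        return
    with conn:  # both updates commit together
        conn.execute("UPDATE Folder SET SortOrder = ? WHERE ID = ?", (above[1], folder_id))
        conn.execute("UPDATE Folder SET SortOrder = ? WHERE ID = ?", (order, above[0]))
```

Calling move_up(4) on this data changes the result of children(1) from B, C, D to B, D, C. Moving down is the mirror image, and a new sub folder simply takes the highest SortOrder of its siblings plus one.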

There is an alternative way to keep hierarchical structures in one table, called nested sets. This model allows faster queries for child or parent nodes, at the price of more expensive changes to the tree.
Consider table
id  left  right  node
0   0     9      root node
1   1     4      left node
2   5     8      right node
3   2     3      left sub node
.... etc
In order to get all parents of node N, you need to find all records i whose interval (left-i; right-i) includes the interval (left-N; right-N).
In order to get all children of node N, you need to find all records i whose interval (left-i; right-i) is included in (left-N; right-N).
So the nested set model allows simple hierarchical queries without recursion.
Here is wiki on Nested Set Model
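To make the two interval conditions concrete, here is a small sketch using Python's sqlite3 module and only the four rows listed above. The columns are named lft and rgt here because LEFT and RIGHT are SQL keywords. The table name tree and the helper names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER, node TEXT)")
conn.executemany(
    "INSERT INTO tree (id, lft, rgt, node) VALUES (?, ?, ?, ?)",
    [
        (0, 0, 9, "root node"),
        (1, 1, 4, "left node"),
        (2, 5, 8, "right node"),
        (3, 2, 3, "left sub node"),
    ],
)


def ancestors(node_id):
    """All nodes whose interval strictly contains the interval of node_id."""
    rows = conn.execute(
        "SELECT p.node FROM tree AS n "
        "JOIN tree AS p ON p.lft < n.lft AND p.rgt > n.rgt "
        "WHERE n.id = ? ORDER BY p.lft",
        (node_id,),
    )
    return [node for (node,) in rows]


def descendants(node_id):
    """All nodes whose interval lies strictly inside the interval of node_id."""
    rows = conn.execute(
        "SELECT c.node FROM tree AS n "
        "JOIN tree AS c ON c.lft > n.lft AND c.rgt < n.rgt "
        "WHERE n.id = ? ORDER BY c.lft",
        (node_id,),
    )
    return [node for (node,) in rows]
```

Ordering by lft returns ancestors from the root downwards and descendants in depth-first order, so sibling order comes for free. The cost is that inserting or moving a node means renumbering lft and rgt for part of the tree.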

Have you considered the HierarchyID data type?
http://msdn.microsoft.com/en-us/library/bb677290
See http://msdn.microsoft.com/en-us/magazine/cc794278.aspx

Perhaps what you're looking for is hierarchical queries. This enables you to automatically do a depth-first query on your hierarchical table, and define a sort order on any level of the hierarchy (so that each folder is ordered, but after each item you will get its sub-items before getting the next item on that folder).
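One way such a depth-first, per-folder ordered listing might look is a recursive common table expression that builds a sortable path from each folder's position. This is only a sketch run through Python's sqlite3 module. The Folder table, the SortOrder column and the four-digit zero padding (which assumes fewer than 10000 siblings per folder) are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Folder (ID INTEGER PRIMARY KEY, ParentID INTEGER, "
    "SortOrder INTEGER NOT NULL, Name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO Folder (ID, ParentID, SortOrder, Name) VALUES (?, ?, ?, ?)",
    [
        (1, None, 0, "A"),
        (2, 1, 0, "B"),
        (3, 1, 2, "C"),
        (4, 1, 1, "D"),
        (5, 2, 0, "E"),  # sub folder of B
    ],
)

# Each row carries the zero-padded SortOrder values of all its ancestors,
# so ordering by that path yields a depth-first listing.
QUERY = """
WITH RECURSIVE walk(ID, Name, depth, path) AS (
    SELECT ID, Name, 0, printf('%04d', SortOrder)
    FROM Folder
    WHERE ParentID IS NULL
    UNION ALL
    SELECT f.ID, f.Name, w.depth + 1,
           w.path || '/' || printf('%04d', f.SortOrder)
    FROM Folder AS f
    JOIN walk AS w ON f.ParentID = w.ID
)
SELECT depth, Name FROM walk ORDER BY path
"""

listing = conn.execute(QUERY).fetchall()
for depth, name in listing:
    print("  " * depth + name)
```

Here B's sub folder E is listed directly after B and before the next sibling D. Databases with native hierarchical syntax (such as Oracle's CONNECT BY with ORDER SIBLINGS BY) can express the same thing without building the path by hand.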

Related

Displaying multiple columns on one row (SQL)

I have a report I am trying to make that displays parent information and all children in one household on ONE row.
There is no "parent" table that stores the information on parents and there is no ID that links parents to child and no ID that links sibling to sibling. The only way to tell if they are siblings is if they have the same address (logic being that if they have the same address, they live together, and are part of the same household). All the information is pulled from a "student" table or a custom field in the student table that stores the parent information, address they live at, etc.
Instead of displaying parent info twice I want to display
the information like this:
Parent_name, address, phone,child1_name, child1_schoolname, child1_age, child2_name, child2_schoolname, child2_age, etc(for every child in that household)
The problem is that not every household will have the same amount of children and I can only link siblings by their address.
How can I display all information for each household on ONE row? Is this possible, and how? I've tried a pivot table but to no avail.
This is a classic 'you shouldn't be doing reports in the database' question. A database is for data retrieval, not data formatting. But let's assume you know this and need to do it anyway for some reason.
The algorithm I'd use for this would be
Create some windowed queries across the data; group by address (the joinable value) and sort by age desc.
Create a query that utilizes this window and returns the first item in each group.
Create additional queries that return the second, the third, the fourth, etc. item in each group.
Outer join these together.
This is going to be far easier if you define some maximum number of siblings (five?) as opposed to dynamically building these siblings.
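The steps above could be sketched roughly as follows, with a fixed maximum of two children per household to keep it short. The example runs through Python's sqlite3 module and needs SQLite 3.25 or later for window functions. The student table and every column name in it are assumptions, since the question does not show its schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, school TEXT, age INTEGER, "
    "address TEXT, parent_name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Ann", "North High", 16, "1 Elm St", "Pat Doe", "555-0100"),
        ("Bob", "Elm Middle", 12, "1 Elm St", "Pat Doe", "555-0100"),
        ("Cy", "North High", 15, "2 Oak Ave", "Lee Roe", "555-0101"),
    ],
)

# Step 1: number the children within each address, oldest first.
# Steps 2-4: take child 1 per address and outer join child 2 onto it.
QUERY = """
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY address ORDER BY age DESC, name) AS rn
    FROM student
)
SELECT c1.parent_name, c1.address, c1.phone,
       c1.name, c1.school, c1.age,
       c2.name, c2.school, c2.age
FROM ranked AS c1
LEFT JOIN ranked AS c2 ON c2.address = c1.address AND c2.rn = 2
WHERE c1.rn = 1
ORDER BY c1.address
"""

households = conn.execute(QUERY).fetchall()
```

Households with a single child get NULLs in the child2 columns. Supporting a third or fourth child means adding one more LEFT JOIN per slot, which is why a fixed maximum makes this so much easier.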
If the parents are in the same table, how do you know which items are parents and which are children?
If you have two tables, one for parents (PARENT) and one for children (CHILD), you can do something like this in your data model:
select Parent.NAME as parent_name,
Parent.ADDRESS as parent_address,
Parent.PHONE AS phone,
(
select listagg(Child.NAME,',')
within group(order by Child.NAME)
from CHILD Child
where Child.ADDRESS=Parent.ADDRESS
)as children_names,
(
select
listagg(Child.AGE,',')
within group(order by Child.NAME)
from CHILD Child
where Child.ADDRESS=Parent.ADDRESS
)as children_ages
from PARENT Parent
And you will have the output query result:
LISTAGG is the solution here: it does what you want by combining multiple rows into one.
However, LISTAGG is only available in Oracle 11g Release 2 and later, so if you have an older version this is not going to work.
Hope this helps.

JIRA - SQL Find all Subtasks given a parent issue number

The main table on the jira database has no information about subtasks.
I'm trying to find all subtasks from a given parent issue using the SQL and not the JIRA web interface.
Where is this information located?
In Jira, the parent issue, child issue/subtask relationship is stored in the table issuelink.
The issue link table has the following fields:
| id | linktype | source | destination | sequence |
Where source is the jiraissue.id value of the parent, and destination is the jiraissue.id of the children.
The following query will return the children of the parent issue:
SELECT destination AS children FROM issuelink WHERE source=XXX;
where XXX is your parent jiraissue.id number.
I found that the link does not always run in that direction. Besides the case where source is the jiraissue.id of the parent and destination is the jiraissue.id of the child, the reverse also occurs: destination is the jiraissue.id of the parent and source is the jiraissue.id of the child.
So links can run in either direction.
The issuelink table is the table to use but you must realize that the table is also used for other task relationships as well (blocks, clones, duplicates, etc) so that means if you are only looking for subtasks you must indicate the specific link type you are looking for. Thus the proper query would be:
SELECT jiraschema.issuelink.DESTINATION AS childID
FROM jiraschema.issuelink
INNER JOIN jiraschema.issuelinktype
ON jiraschema.issuelink.LINKTYPE = jiraschema.issuelinktype.ID
WHERE jiraschema.issuelinktype.pstyle = 'jira_subtask'
AND jiraschema.issuelink.SOURCE = [parent_issue_id]

Do I really need a relation table in my case?

Let's say I have a module. I build an interface where the user can assign the module to groups.
Let's say I currently have 3 groups. In the UI the user would choose all 3 groups to assign the module to them, for example in a multiple select box.
The intention of the user is to assign the module to ALL groups.
I guess I would need a many to many relation table. My source code would execute a sql query to insert 3 entries.
But wait. What if two weeks later the admin adds a new group... In the relation table are only 3 entries. And the user wonders why the module is not assigned to the new added group.
What would be an elegant solution? Either I definitely need to update the relation table, or I make a new column (called, let's say, "groups") in my module table where I add the assigned group IDs in this format: "1;2;7;15", or the keyword "All".
The advantage would be that with the keyword "All" I could know in my code that the module is assigned to all groups.
With the relation table I do not have this option. In addition I do not need to assign a group to a module. I just need to assign a module to groups.
In my opinion I do not need a relation table in this case.
What would you say? Or do you have another approach?
To make a many-to-many relation, you need a relation table.
A solution like putting comma-separated values in a field to represent multiple relations only works if you fetch the data from the separate tables in separate queries. When you need to use that field to join the data in the database, it becomes very complicated very fast.
In a relation table you could use a null value to mean "all". In this example the modules 1 and 2 are members of the group 1 only, and the module 3 is a member of all groups:
ModuleId GroupId
-------- -------
1 1
2 1
3 null
To fetch data for groups using a relation like that you would use a query like:
select
g.GroupName,
m.ModuleName
from
Groups g
inner join GroupModules gm on gm.GroupId is null or gm.GroupId = g.GroupId
inner join Modules m on m.ModuleId = gm.ModuleId
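Here is a small runnable sketch of the "null means all" idea, using Python's sqlite3 module with the example rows from above. The group and module names are made up, and the groups table is called UserGroups here (an assumption) simply to stay clear of SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserGroups (GroupId INTEGER PRIMARY KEY, GroupName TEXT NOT NULL);
    CREATE TABLE Modules (ModuleId INTEGER PRIMARY KEY, ModuleName TEXT NOT NULL);
    -- GroupId is nullable: NULL stands for "all groups"
    CREATE TABLE GroupModules (ModuleId INTEGER NOT NULL, GroupId INTEGER);
    INSERT INTO UserGroups VALUES (1, 'Admins'), (2, 'Users');
    INSERT INTO Modules VALUES (1, 'Reports'), (2, 'Billing'), (3, 'Help');
    INSERT INTO GroupModules VALUES (1, 1), (2, 1), (3, NULL);
""")


def modules_by_group():
    """(group name, module name) pairs, treating a NULL GroupId as every group."""
    return conn.execute("""
        SELECT g.GroupName, m.ModuleName
        FROM UserGroups AS g
        INNER JOIN GroupModules AS gm
                ON gm.GroupId IS NULL OR gm.GroupId = g.GroupId
        INNER JOIN Modules AS m ON m.ModuleId = gm.ModuleId
        ORDER BY g.GroupId, m.ModuleId
    """).fetchall()
```

With this data, Admins get Reports, Billing and Help, while Users get only Help. A group added later picks up Help automatically, without any change to GroupModules.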
Another alternative is to use the relation table as usual, and add a property on the module that it's a member of all groups. When you add a new group, you would also add records in the relation table for all modules that are members of all groups. Example (in T-SQL):
create procedure Group_Add
@GroupName varchar(50)
as
set nocount on
declare @id int
insert into Groups (
GroupName
) values (
@GroupName
)
set @id = scope_identity()
insert into GroupModules (
GroupId,
ModuleId
)
select
@id, ModuleId
from
Modules
where
IsInAllGroups = 1
The typical database pattern for this is 3 tables, Module, Group, GroupModule. That is how many to many relationships are properly handled in database design.
How to get the data populated is a problem for your UI. You can indeed have a pull down list that includes the word all for them to use in choosing the groups a module will be associated with. It can even be the default. What you do is write code to interpret ALL to insert one record for every group.
Now, if you are adding groups as well as modules, you also need a process to make sure that when a group is added, all the modules which should be in all groups get added to the new group. Personally I would put an IS_ALL flag on the Module table to make this easier. Then you know which modules have been selected for all groups. You will need to make sure that if someone goes back and changes the module to specific groups instead of all, this field is updated.
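A minimal sketch of that "add a group" process, written against Python's sqlite3 module so it can be tried out directly. The flag is called IsInAllGroups here, the table UserGroups and the helper names are assumptions, and a real implementation would live wherever groups are created in your application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserGroups (GroupId INTEGER PRIMARY KEY, GroupName TEXT NOT NULL);
    CREATE TABLE Modules (
        ModuleId      INTEGER PRIMARY KEY,
        ModuleName    TEXT NOT NULL,
        IsInAllGroups INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE GroupModules (
        GroupId  INTEGER NOT NULL,
        ModuleId INTEGER NOT NULL,
        PRIMARY KEY (GroupId, ModuleId)
    );
    INSERT INTO Modules VALUES (1, 'Reports', 1), (2, 'Billing', 0);
""")


def add_group(name):
    """Create a group and link it to every module flagged as 'all groups'."""
    with conn:  # the group row and its links commit together
        group_id = conn.execute(
            "INSERT INTO UserGroups (GroupName) VALUES (?)", (name,)
        ).lastrowid
        conn.execute(
            "INSERT INTO GroupModules (GroupId, ModuleId) "
            "SELECT ?, ModuleId FROM Modules WHERE IsInAllGroups = 1",
            (group_id,),
        )
    return group_id


def modules_of(group_id):
    """Names of the modules assigned to a group, via the plain relation table."""
    rows = conn.execute(
        "SELECT m.ModuleName FROM GroupModules AS gm "
        "JOIN Modules AS m ON m.ModuleId = gm.ModuleId "
        "WHERE gm.GroupId = ? ORDER BY m.ModuleId",
        (group_id,),
    )
    return [name for (name,) in rows]
```

The benefit of this variant is that every read query stays an ordinary join on GroupModules. The flag only matters at the moment a group is created or a module's assignment is edited.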

How to copy an entity tree in database

I have a pretty standard relational data situation in which there is a root entity (and corresponding table) with children entities. These children have children entities and so on and so forth for about 6 levels. Each level has a many children to one parent relationship. I would like to write a procedure that effectively copies the root entity and all of its children entities (recursively copying the childrens' children), creating new entities for each along the way while storing each in its respective table.
I know this could be done with nested cursors, but I don't want to do it that way. I know that there is a more elegant solution out there I just need help creating it. I have a feeling that the solution lies in a combination of OUTPUT clauses and MERGE statements.
If you could, please tailor your answer to the novice SQL developer level. I will need an explanation or a link to an explanation for any structure you use that is outside of the basic SELECT INSERT UPDATE and DELETE.
Thank you for your time.
You need to use a common table expression. Check this:
http://blog.sqlauthority.com/2012/04/24/sql-server-introduction-to-hierarchical-query-using-a-recursive-cte-a-primer/
I'll assume that you want to copy a subset of the data that is in a hierarchy of tables. By hierarchy I mean tables that are interrelated through foreign keys in the obvious sense of the word. For example, Customers would be a root table, Orders a child of it and OrderDetails another child (at the 3rd level).
First we copy the root table of the hierarchy:
MERGE RootTable as target
USING (
SELECT *
FROM RootTable
WHERE SomeCondition
) AS src
ON 1=2 -- this is so that all rows that do not match will be added
WHEN NOT MATCHED THEN INSERT (AllColumns) VALUES (AllColumns)
OUTPUT src.ID as OldID, INSERTED.ID as NewID INTO #RootTableMapping;
Now we have a 1 to 1 mapping of the copy source and copy target IDs of the root table in #RootTableMapping. Also, all root rows were copied.
We now need to copy all child tables. Here's the statement for one:
MERGE ChildTable as target
USING (
SELECT *, #RootTableMapping.NewID AS NewParentID
FROM ChildTable
JOIN #RootTableMapping ON ChildTable.RootID = #RootTableMapping.OldID
WHERE SomeCondition
) AS src
ON 1=2 -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN INSERT (AllColumns, RootID) VALUES (AllColumns, NewParentID);
Here, we obtain for each child row the ID of the cloned root table row so that we can link up the hierarchy. For that we use #RootTableMapping. We copy all columns unmodified, except for the ID of the parent which we substitute with the NewID from the mapping.
You'd need one such MERGE statement for each child table. The concept also extends to hierarchies with more than 2 levels by adding additional joins. All levels except for the bottom level must record the mapping of copy-source IDs to copy-target IDs to allow the next layer below to obtain the new IDs.
Feel free to ask further questions in case I did not make everything clear enough. I know this is a rough sketch.
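To illustrate the old-ID-to-new-ID mapping without MERGE and OUTPUT (which SQLite does not have), here is a sketch in Python's sqlite3 module where the mapping is kept in a Python dict instead of a #RootTableMapping table. The three tables Root, Child and GrandChild and all helper names are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Root (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Child (ID INTEGER PRIMARY KEY, RootID INTEGER, Name TEXT);
    CREATE TABLE GrandChild (ID INTEGER PRIMARY KEY, ChildID INTEGER, Name TEXT);
    INSERT INTO Root VALUES (1, 'Customer');
    INSERT INTO Child VALUES (1, 1, 'Order A'), (2, 1, 'Order B');
    INSERT INTO GrandChild VALUES (1, 1, 'Line 1'), (2, 1, 'Line 2'), (3, 2, 'Line 3');
""")


def copy_root(old_root_id):
    """Copy one Root row with all its Child and GrandChild rows; return the new root ID."""
    with conn:  # the whole copy commits or rolls back as one unit
        (name,) = conn.execute(
            "SELECT Name FROM Root WHERE ID = ?", (old_root_id,)
        ).fetchone()
        new_root_id = conn.execute(
            "INSERT INTO Root (Name) VALUES (?)", (name,)
        ).lastrowid

        # Level 2: copy the children and remember old ID -> new ID.
        child_map = {}
        old_children = conn.execute(
            "SELECT ID, Name FROM Child WHERE RootID = ?", (old_root_id,)
        ).fetchall()
        for old_id, child_name in old_children:
            child_map[old_id] = conn.execute(
                "INSERT INTO Child (RootID, Name) VALUES (?, ?)",
                (new_root_id, child_name),
            ).lastrowid

        # Level 3: the bottom level needs the mapping but records none itself.
        for old_child_id, new_child_id in child_map.items():
            conn.execute(
                "INSERT INTO GrandChild (ChildID, Name) "
                "SELECT ?, Name FROM GrandChild WHERE ChildID = ?",
                (new_child_id, old_child_id),
            )
    return new_root_id


def tree(root_id):
    """[(child name, [grandchild names])] for a root, ignoring IDs."""
    result = []
    children = conn.execute(
        "SELECT ID, Name FROM Child WHERE RootID = ? ORDER BY Name", (root_id,)
    ).fetchall()
    for child_id, child_name in children:
        grand = conn.execute(
            "SELECT Name FROM GrandChild WHERE ChildID = ? ORDER BY Name", (child_id,)
        ).fetchall()
        result.append((child_name, [n for (n,) in grand]))
    return result
```

The pattern is the same as in the MERGE version: each level except the last records a mapping, and the level below uses it to point at the new parent. With six levels you would repeat the middle step once per level.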

Implementing a sorting key in SQL

We store relationships between documents in an Oracle db using a table having a column named docid and a column named parentid. If I have a document, doc1, related to child documents, child1_1 and child1_2 they would be represented by the following records in the Documents table.
docid parentid
1000 null record for doc1
1001 1000 " " child1_1
1002 1000 " " child1_2
The Documents table can have millions of rows, so to make sure all related documents are grouped together in our UI we pre-sort the Documents table by using an indexed varchar column named sortedfamily and populate it with the concatenation of the docids of the related documents. Without using the sortedfamily column sorting the records at query time is too slow. The records shown above become.
docid parentid sortedfamily
1000 null 1000 record for doc1
1001 1000 1000_1001 " " child1_1
1002 1000 1000_1002 " " child1_2
This allows us to add 'ordered by sortedfamily' to our queries and the returned records will always be sorted by related documents. What I outlined above works pretty well but it has some limitations related to a document family hierarchical depth and it feels weird concatenating integers to sort the records. Is there a way to do the above using only integers?
Thanks in advance.
UPDATE: My example above was not detailed enough. The children themselves may also have related documents. If child1_1 had a related document the resulting value for sortedfamily may be "1000_1001_2000".
Oracle has excellent support for hierarchical queries. You can get your document hierarchy without resorting to the sortedfamily column. Here's the query:
SELECT docid, PRIOR docid AS "Parent"
FROM Documents
START WITH parentid IS NULL
CONNECT BY parentid = PRIOR docid
ORDER SIBLINGS BY docid
Now to explain:
SELECT docid, PRIOR docid AS "Parent"
This gets the document and its parent on the same row by "looking back" with the PRIOR operator.
START WITH parentid IS NULL
This defines the hierarchy's root. Every row that has a null parentid is considered the root of a branch.
CONNECT BY parentid = PRIOR docid
This says how rows are linked: a row is a child of the previously visited (PRIOR) row when its parentid equals that row's docid.
ORDER SIBLINGS BY docid
This sorts rows that share the same parent by docid while preserving the hierarchical order, so each parent is still immediately followed by its own descendants.
The best thing about the Oracle hierarchical queries is that they'll query an entire branch, so if you have a document with a child that has a child (that has a child, and so on...) Oracle will handle it. It will also handle multiple children per parent.
There's a SQL Fiddle here with your data plus a few additional documents.
The Fiddle also includes a column that shows the entire "root to branch" relationship using the SYS_CONNECT_BY_PATH function. The SYS_CONNECT_BY_PATH does the same thing as your sortedfamily column, but it does it dynamically, without the need to maintain the column. It's also a good way to visualize each branch of the hierarchy.
Addendum
Note that the query above will return every branch for every document. If you're just interested in a single document such as docid = 1000, replace the START WITH parentid IS NULL with this:
START WITH docid = 1000
That will give you the entire branch for docid 1000. If you have an index on docid it will be very fast.
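For readers without an Oracle instance at hand, a comparable result can be approximated with a recursive common table expression. The sketch below runs in Python's sqlite3 module and builds the same kind of path that sortedfamily or SYS_CONNECT_BY_PATH would give. It is not Oracle syntax, and the extra row 2000 (a child of 1001, as in the question's update) is added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (docid INTEGER PRIMARY KEY, parentid INTEGER)")
conn.executemany(
    "INSERT INTO Documents (docid, parentid) VALUES (?, ?)",
    [(1000, None), (1001, 1000), (1002, 1000), (2000, 1001)],
)

# Start at the rows without a parent, then repeatedly attach children,
# extending the path with each child's docid.
QUERY = """
WITH RECURSIVE family(docid, parentid, path) AS (
    SELECT docid, parentid, CAST(docid AS TEXT)
    FROM Documents
    WHERE parentid IS NULL
    UNION ALL
    SELECT d.docid, d.parentid, f.path || '_' || d.docid
    FROM Documents AS d
    JOIN family AS f ON d.parentid = f.docid
)
SELECT docid, path FROM family ORDER BY path
"""

rows = conn.execute(QUERY).fetchall()
for docid, path in rows:
    print(docid, path)
```

Note that ordering by the text path only matches numeric order among siblings when all docids have the same number of digits. Zero-padding each segment would remove that limitation.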