Performance issues in SQL query with a hierarchical relationship - sql

I have an Oracle table that represents parent-child relationships, and I want to improve the performance of a query that searches the hierarchy for an ancestor record. I'm testing with the small data set here, though the real table is much larger:
id name parent_id tagged
== ==== ========= ======
1 One null null
2 Two 1 1
3 Three 2 null
4 Four 3 null
5 Five null null
6 Six 5 1
7 Seven 6 null
8 Eight null null
9 Nine 8 null
parent_id refers back to id in this same table in a foreign key relationship.
I want to write a query that returns each leaf record (those records that have no descendants... id 4 and id 7 in this example) which has an ancestor record that has tagged = 1 (walking back through the parent_id relationship).
So, for the above source data, I want my query to return:
id name tagged_ancestor_id
== ==== ==================
4 Four 2
7 Seven 6
My current query to retrieve these records is:
select * from (
select id,
name,
connect_by_root id tagged_ancestor_id
from mytree
connect by prior id = parent_id
start with tagged is not null
) m1
where not exists (
select * from mytree m2 where m2.parent_id = m1.id
)
This query works fine on this simple little example table, but its performance is terrible on my real table which has about 11,000,000 records. The query takes over a minute to run.
There are indexes on both fields in the connect by clause.
The "tagged" field in the start with clause also has an index on it, and there are about 1,500,000 records in my table with non-null values in this field.
The where clause doesn't seem to be the problem, because when I modify it to return a specific name (also indexed) with where name = 'somename' instead of where not exists ..., the query still takes about the same amount of time.
So, what are some strategies I can use to try to make these types of queries on this hierarchy run faster?

Here is what I would check first:
Make sure your table has a primary key.
Make sure the statistics on the table are current. Use DBMS_STATS.GATHER_TABLE_STATS to collect the statistics. See this URL: (for ORACLE version 11.1):
http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_stats.htm
Even if you have indexes on both fields individually, you still need an index on the two fields combined. Create an index on ID and PARENT_ID (index_name below is a placeholder):
CREATE INDEX index_name ON table_name (id, parent_id);
See this URL:
Optimizing Oracle CONNECT BY when used with WHERE clause
Make sure the underlying table does not have row chaining or other problems (E.G. corruption).
Make sure the table and all indexes are in the same tablespace.

I'm not sure whether this is any faster without the data volume to test with, but it's something to consider. The hope is that by dealing only with records that are tagged and only with those that are leaves, we process a smaller volume, which may yield a performance gain. The string manipulation overhead does seem hackish, though.
with cte(id, name, parent_id, tagged) as (
SELECT 1, 'ONE', null, null from dual union all
SELECT 2, 'TWO', 1, 1 from dual union all
SELECT 3, 'THREE', 2, null from dual union all
SELECT 4, 'FOUR', 3, null from dual union all
select 5, 'FIVE', null, null from dual union all
select 6, 'SIX', 5, 1 from dual union all
select 7, 'SEVEN', 6, null from dual union all
select 8, 'EIGHT', null, null from dual union all
select 9, 'NINE', 8, null from dual),
Leafs(id, name) as (
  select id, name
  from cte
  where connect_by_isleaf = 1
  start with parent_id is null
  connect by nocycle prior id = parent_id),
Tagged as (
  select id, name,
         SYS_CONNECT_BY_PATH(id, '/') as path,
         substr(SYS_CONNECT_BY_PATH(id, '/'), 2,
                instr(SYS_CONNECT_BY_PATH(id, '/'), '/', 2) - 2) as leaf
  from cte
  where tagged = 1
  start with id in (select id from leafs)
  connect by nocycle prior parent_id = id)
select l.*, t.id as tagged_ancestor
from leafs l
inner join tagged t
  on l.id = t.leaf
In essence I created three CTEs: one for the data (cte), one for the leaves (leafs), and one for the tagged records (tagged).
We traverse the hierarchy twice: once to get all the leaves, and once to walk up from the leaves to the tagged records. We then parse the first value (the leaf id) out of each tagged record's path and join it back to leafs to get the leaves that have a tagged ancestor.
As to whether this is faster than what you're doing... shrug. I didn't spend the time testing, since I have neither your indexes nor your data volume.
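The "start at the leaves and walk up" idea can also be tried without string parsing. The following is only a sketch, not a tested Oracle solution: it uses Python's sqlite3 module and a SQLite recursive CTE as a stand-in (Oracle 11gR2 and later accept a similar recursive WITH clause, without the RECURSIVE keyword). The function name tagged_leaves and the walk CTE are made up for illustration. Note one deliberate difference from the original query: each walk stops at the nearest tagged row, so a leaf under several tagged ancestors yields one row instead of several.

```python
import sqlite3

# Sample rows from the question: (id, name, parent_id, tagged)
ROWS = [
    (1, "One", None, None), (2, "Two", 1, 1), (3, "Three", 2, None),
    (4, "Four", 3, None), (5, "Five", None, None), (6, "Six", 5, 1),
    (7, "Seven", 6, None), (8, "Eight", None, None), (9, "Nine", 8, None),
]

QUERY = """
WITH RECURSIVE walk(leaf_id, leaf_name, cur_id, cur_parent, cur_tagged) AS (
    -- anchor: leaf rows, i.e. rows that no other row names as its parent
    SELECT t.id, t.name, t.id, t.parent_id, t.tagged
    FROM mytree t
    WHERE NOT EXISTS (SELECT 1 FROM mytree c WHERE c.parent_id = t.id)
    UNION ALL
    -- step: move one level up, but stop once a tagged row has been reached
    SELECT w.leaf_id, w.leaf_name, p.id, p.parent_id, p.tagged
    FROM walk w
    JOIN mytree p ON p.id = w.cur_parent
    WHERE w.cur_tagged IS NULL
)
SELECT leaf_id, leaf_name, cur_id AS tagged_ancestor_id
FROM walk
WHERE cur_tagged IS NOT NULL
ORDER BY leaf_id
"""


def tagged_leaves(rows):
    """Load rows into an in-memory table and return (leaf id, name, tagged ancestor id)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytree (id INTEGER PRIMARY KEY, name TEXT, "
        "parent_id INTEGER, tagged INTEGER)"
    )
    con.executemany("INSERT INTO mytree VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(tagged_leaves(ROWS))
```

On the sample data this returns ids 4 and 7 with ancestors 2 and 6, matching the expected output in the question. Whether it beats CONNECT BY on 11,000,000 rows is something only a test against the real table and its indexes can show.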

Related

How to migrate datas using .sql script?

I'm struggling with how to migrate data using a .sql script. I'm quite new to SQL and am trying to figure out how to migrate data purely in SQL. I want to add my old data to the new table as new records with a different structure.
Here's my case: I have two old tables and I want to merge them into my new table, which has a different structure and additional columns. I'm stuck because I'm not used to writing conditionals in SQL.
The table prefixes are schemas.
Old tables:
old.groups
id group_name
== ==========
10 Apex
11 Pred
12 Tor
old.sub_groups
parent_id sub_group
========= =========
10 sub-apex
11 sub-pred
11 sub-sub-pred
New table (expected migrated data):
public.new_groups (id is auto-incremented; freshly populated table)
id group_name level parent_id
== ========== ===== =========
0 Apex 1 10
1 Pred 1 11
2 Tor null null
3 sub-apex 2 10
4 sub-pred 2 11
5 sub-sub-pred 2 11
I want to merge them conditionally, but I can't work out the SQL.
Condition 1: If an old.groups.id has no match in old.sub_groups.parent_id, the group is inserted into public.new_groups with public.new_groups.level and public.new_groups.parent_id defaulting to null.
Condition 2: If an old.groups.id has a match in old.sub_groups.parent_id, the group is also inserted into public.new_groups with level 1 (1 means parent group in my structure). In addition, the matching sub_groups are inserted as new records (the three rows with public.new_groups ids 3, 4, and 5) with level 2, and their parent_id is the parent_id from old.sub_groups, i.e. the id of the parent in old.groups.
This is my unfinished query. It only selects the data; the conditional logic and the update are missing, and I think it is also wrong:
INSERT INTO public.new_groups(
SELECT *, b.sub_group as group_name, b.parent_id FROM old.groups as a
LEFT JOIN old.sub_groups as b ON a.id = b.parent_id....
)
If you create your table like this:
CREATE TABLE new (
id SERIAL PRIMARY KEY ,
group_name VARCHAR(20),
level INTEGER,
parent_id INTEGER
);
You can copy the tables with this statement:
INSERT INTO new(group_name, level, parent_id)
SELECT DISTINCT
group_name,
CASE WHEN subgroups.parent_id IS NULL THEN NULL ELSE 1 END as level,
subgroups.parent_id
FROM old
LEFT JOIN subgroups ON old.id = subgroups.parent_id
UNION ALL
SELECT
sub_group,
2,
parent_id
FROM subgroups;
see: DBFIDDLE
The only difference is that my id starts at 1, not 0.
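To see why the DISTINCT in the first branch matters (Pred has two sub-groups, so the LEFT JOIN produces it twice), here is a small sketch that runs the same statement shape against the sample data. It uses Python's sqlite3 module rather than PostgreSQL, and the table names old_groups and old_sub_groups are stand-ins for old.groups and old.sub_groups, so treat it as an illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE old_groups (id INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE old_sub_groups (parent_id INTEGER, sub_group TEXT);
CREATE TABLE new_groups (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    group_name TEXT,
    level INTEGER,
    parent_id INTEGER
);
INSERT INTO old_groups VALUES (10, 'Apex'), (11, 'Pred'), (12, 'Tor');
INSERT INTO old_sub_groups VALUES
    (10, 'sub-apex'), (11, 'sub-pred'), (11, 'sub-sub-pred');

-- Parent groups first (level 1 when a sub-group exists, otherwise NULL),
-- then every sub-group as a level 2 row.
INSERT INTO new_groups (group_name, level, parent_id)
SELECT DISTINCT g.group_name,
       CASE WHEN s.parent_id IS NULL THEN NULL ELSE 1 END,
       s.parent_id
FROM old_groups g
LEFT JOIN old_sub_groups s ON g.id = s.parent_id
UNION ALL
SELECT sub_group, 2, parent_id FROM old_sub_groups;
""")

rows = con.execute(
    "SELECT group_name, level, parent_id FROM new_groups ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The result has six rows, three groups and three sub-groups, with Pred appearing once. The order in which the ids are assigned within the first branch is not guaranteed, so add an ORDER BY if specific ids matter.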

How to find table names having ID (primary key) of a certain value in a hierarchy of tables?

I use Oracle 11g and have a massive number of tables representing inheritance, where a base parent table has a primary key NUMBER ID. The subsequent tables inherit from it through the shared primary key NUMBER ID. Let's assume there are multiple layers of such inheritance.
To have a clear picture, let's work with the following simplified structure and assume the hierarchy is quite complex:
- TABLE FOOD
  - TABLE FRUIT
    - TABLE CYTRUS
      - TABLE ORANGE
      - TABLE GREPFRUIT
  - TABLE VEGETABLE
  - TABLE MEAT
    - TABLE BEEF
      - TABLE SIRLOIN
      - TABLE RIB EYE
    - TABLE CHICKEN
This list is not exhaustive. Regardless of how dumb the example is, assume such a multi-layered hierarchy using Class Table Inheritance (aka Table Per Type Inheritance).
If you want to insert a record into the table ORANGE with a certain generated ID, records must be inserted into the parent tables (CYTRUS, FRUIT and FOOD) as well. Assume an ORM engine takes care of this, as keeping such consistency by hand would be very complex.
Let's also assume the name of each table in the hierarchy ends with a certain word (let's say FOOD: FRUIT_FOOD, CYTRUS_FOOD etc.). I didn't include it in the chart above for the sake of clarity.
Question: I have found a record in the FOOD table with ID = 123 based on certain criteria. Given the hierarchical structure, how do I find which tables contain a record with the very same ID, using SQL only? I.e. my goal is to find out which is the lowest type in the hierarchy that a given ID belongs to.
Note: If you have also an answer for a newer version of Oracle, don't hesitate to include it as long as others might find it useful.
This assumes all these tables have a column named ID; adjust as needed based on the example.
Q1. What tables contain the record with the very same ID, using SQL only?
You could use a series of unions to determine this, e.g.
SELECT
id,
table_type,
heirarchy_level
FROM (
SELECT ID, 'FOOD' AS table_type, 1 AS heirarchy_level FROM FOOD
UNION ALL
SELECT ID,'FRUIT',2 FROM FRUIT
UNION ALL
SELECT ID,'CYTRUS',3 FROM CYTRUS
UNION ALL
SELECT ID,'ORANGE',4 FROM ORANGE
UNION ALL
SELECT ID,'GREPFRUIT',4 FROM GREPFRUIT
UNION ALL
SELECT ID,'VEGETABLE',2 FROM VEGETABLE
UNION ALL
SELECT ID,'MEAT',2 FROM MEAT
UNION ALL
SELECT ID,'BEEF',3 FROM BEEF
UNION ALL
SELECT ID,'SIRLOIN',4 FROM SIRLOIN
UNION ALL
SELECT ID,'RIBEYE',4 FROM RIBEYE
UNION ALL
SELECT ID,'CHICKEN',3 FROM CHICKEN
) t
WHERE
id = 123
This would return the rows with id = 123, listing every table where the record is present along with its depth/level in the hierarchy. You could then use MAX or ORDER BY to determine the deepest level.
Q2. what is the lowest type in the hierarchy the certain ID is related to
This would return only one record with the lowest type
-- Oracle has no LIMIT clause: in 11g, filter ROWNUM over the sorted subquery
-- (12c and later can use FETCH FIRST 1 ROW ONLY after the ORDER BY instead)
SELECT
id,
table_type,
heirarchy_level
FROM (
  SELECT
  id,
  table_type,
  heirarchy_level
  FROM (
  SELECT ID, 'FOOD' AS table_type, 1 AS heirarchy_level FROM FOOD
  UNION ALL
  SELECT ID,'FRUIT',2 FROM FRUIT
  UNION ALL
  SELECT ID,'CYTRUS',3 FROM CYTRUS
  UNION ALL
  SELECT ID,'ORANGE',4 FROM ORANGE
  UNION ALL
  SELECT ID,'GREPFRUIT',4 FROM GREPFRUIT
  UNION ALL
  SELECT ID,'VEGETABLE',2 FROM VEGETABLE
  UNION ALL
  SELECT ID,'MEAT',2 FROM MEAT
  UNION ALL
  SELECT ID,'BEEF',3 FROM BEEF
  UNION ALL
  SELECT ID,'SIRLOIN',4 FROM SIRLOIN
  UNION ALL
  SELECT ID,'RIBEYE',4 FROM RIBEYE
  UNION ALL
  SELECT ID,'CHICKEN',3 FROM CHICKEN
  ) t
  WHERE
  id = 123
  ORDER BY
  heirarchy_level desc
)
WHERE
ROWNUM = 1
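With many tables, writing the UNION ALL by hand gets tedious, so the statement can be generated from a mapping of table name to depth. The sketch below is a rough illustration using Python's sqlite3 module, not Oracle; the LEVELS mapping, the deepest_table helper and the reduced set of tables are all hypothetical. SQLite accepts LIMIT 1, whereas Oracle 11g would need the ROWNUM form.

```python
import sqlite3

# Hypothetical subset of the hierarchy: table name -> depth in the tree
LEVELS = {"FOOD": 1, "FRUIT": 2, "CYTRUS": 3, "ORANGE": 4, "MEAT": 2}


def deepest_table(con, levels, record_id):
    """Return (table_type, heirarchy_level) of the deepest table holding record_id."""
    # Table names come from our own mapping, never from user input,
    # so formatting them into the statement is acceptable here.
    branches = " UNION ALL ".join(
        f"SELECT id, '{name}' AS table_type, {level} AS heirarchy_level FROM {name}"
        for name, level in levels.items()
    )
    sql = (
        f"SELECT table_type, heirarchy_level FROM ({branches}) "
        "WHERE id = ? ORDER BY heirarchy_level DESC LIMIT 1"
    )
    return con.execute(sql, (record_id,)).fetchone()


con = sqlite3.connect(":memory:")
for name in LEVELS:
    con.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
# Record 123 is an ORANGE, so it also exists in every parent table.
for name in ("FOOD", "FRUIT", "CYTRUS", "ORANGE"):
    con.execute(f"INSERT INTO {name} VALUES (123)")
# Record 7 is plain MEAT.
for name in ("FOOD", "MEAT"):
    con.execute(f"INSERT INTO {name} VALUES (7)")

print(deepest_table(con, LEVELS, 123))
print(deepest_table(con, LEVELS, 7))
```

In a real schema the mapping could be maintained by hand or derived from the foreign keys between the tables; that part is left out here.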

Oracle Recursive Query Connect By Loop in data

I have a table that looks essentially like this (the first row pk1=1 is the parent row)
pk1 event_id parent_event_id
=== ======== ===============
1 123 123
2 456 123
3 789 456
Given any particular row in the above table, I need a query that returns all the related rows (up and down the hierarchy). I was trying to do this via an initial CTE table that grabs all the parent rows. Then use that as my base table and join back into the above table using a recursive query to navigate down (this seems wildly inefficient and I assume there is a better way???).
However, trying even the first step (populating my CTE table) and using a query like below to navigate up returns the connect by LOOP error.
select event_id, level
from myTable
start with pk1 = 2
connect by prior parent_event_id = event_id
I assume this is due to the fact the parent row is self-referencing (event_id = parent_event_id)? If I add in the NOCYCLE statement, then the recursion stops at the row prior to the actual parent.
Two questions:
1.) Is there a better way to do this in one query?
2.) Any clue how to tweak the above to get the parent row returned?
Thanks
I'm not super clear on what you mean by "all the related rows (up and down the tree)", but it might be possible.
Here, I'm adding more logic to the connect clause to go up OR down the tree. This includes direct parents and descendants, but also includes siblings/cousins to the starting node. That might or might not be what you want.
with mytable as (select 1 as pk1, 123 as event_id, 123 as parent_event_id from dual
union select 2, 456, 123 from dual
union select 3, 789, 456 from dual
union select 4, 837, 123 from dual)
select pk1, event_id, level, SYS_CONNECT_BY_PATH(event_id, '/') as path
from myTable
start with pk1 = 2
connect by nocycle (prior parent_event_id = event_id and prior event_id <> event_id)
or (prior event_id = parent_event_id)
The tweak to get the root parent to show up is just the extra condition "and prior event_id <> event_id", i.e. don't go further up the tree if the parent node is the same as the current node.
I added an example row (pk1=4) to show a sibling row (not direct parent or descendant) being returned.
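If siblings and cousins are not wanted, one alternative is to collect ancestors and descendants separately and combine them. The following is a sketch using a SQLite recursive CTE via Python's sqlite3 module, not Oracle CONNECT BY, and the names related_events, up and down are invented for the example. The recursive parts use UNION rather than UNION ALL, so the self-referencing root row is produced once and then discarded as a duplicate, which ends the recursion without any cycle error.

```python
import sqlite3

# Rows from the question plus the sibling row (pk1 = 4) added in the answer
ROWS = [(1, 123, 123), (2, 456, 123), (3, 789, 456), (4, 837, 123)]

QUERY = """
WITH RECURSIVE
up(event_id, parent_event_id) AS (
    SELECT event_id, parent_event_id FROM mytable WHERE pk1 = :start
    UNION  -- UNION drops rows already seen, so the self-parented root ends the walk
    SELECT p.event_id, p.parent_event_id
    FROM up u
    JOIN mytable p ON p.event_id = u.parent_event_id
),
down(event_id) AS (
    SELECT event_id FROM mytable WHERE pk1 = :start
    UNION
    SELECT c.event_id
    FROM down d
    JOIN mytable c ON c.parent_event_id = d.event_id
)
SELECT event_id FROM up
UNION
SELECT event_id FROM down
ORDER BY event_id
"""


def related_events(rows, start_pk):
    """Return the event ids of the start row, its ancestors and its descendants."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable (pk1 INTEGER PRIMARY KEY, event_id INTEGER, "
        "parent_event_id INTEGER)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = [r[0] for r in con.execute(QUERY, {"start": start_pk})]
    con.close()
    return result


print(related_events(ROWS, 2))
```

Starting from pk1 = 2 this yields 123, 456 and 789; the sibling 837 is left out because it is neither an ancestor nor a descendant of 456.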

Recursive Delete SQL Oracle

I'm looking for a way to do a recursive delete on a table.
The table has 3 foreign keys, 1 on itself and 2 others, and I want to delete rows depending on the date of the occurrence.
Table1 --> Id1, dateOCC, ParentID
1, 13-12-26, null
2, 13-07-18, null
3, 14-12-31, 1
4, 13-06-26, 1
5, 14-07-23, null
6, 13-07-22, 2
Table2--> ID, stuff
Table3 --> ID, stuff
The IDs of Table2 and Table3 are linked directly to the ID of Table1.
Table1 holds approximately 20,000,000 rows, and the other tables hold approximately the same amount.
Here is one of the queries I tried (it runs inside a cursor that deletes the rows returned):
SELECT EO.ID,
EO.DATEOCC,
EO.PARENTID
FROM TABLE1 EO
WHERE EO.DATEOCC <= TO_DATE ('2013-12-31','YYYY-MM-DD')
AND NOT EXISTS(SELECT 1 FROM TABLE2 WHERE ID = EO.ID)
AND NOT EXISTS( SELECT 1 FROM TABLE3 WHERE ID = EO.ID)
START WITH EO.PARENTID IS NULL
CONNECT BY PRIOR EO.ID = EO.PARENTID;
This query is really slow to return the data I want.
It also seems that it does not return the rows I need to delete.
Edit #1
OK, so here's an example of what I need to do (in this example I assume that Table2 and Table3 have no matching IDs in Table1):
Table1 --> Id1, dateOCC, ParentID
1, 13-12-26, null
2, 13-07-18, null
3, 14-12-31, 1
4, 13-06-26, 1
5, 14-07-23, null
6, 13-07-22, 2
After the delete sequence, the table has to look like this if the cutoff date is 13-12-31:
Table1 --> Id1, dateOCC, ParentID
1, 13-12-26, null
3, 14-12-31, 1
5, 14-07-23, null
So as you can see, I delete each child that I can, along with its parent if possible. If I can't delete the parent because another child exists that I can't delete, I don't delete the parent (I delete only the children that I can).
In a hierarchical query, the WHERE clause is applied after the START WITH and CONNECT BY are used to build the hierarchy. But syntactically it comes first, which makes it intuitively seem that it will be applied first.
If what you really want is to apply the WHERE clause first, then build the hierarchy, you can use a subquery like this:
SELECT EO.ID,
EO.DATEOCC,
EO.PARENTID
FROM (
SELECT * FROM TABLE1 EO
WHERE EO.DATEOCC <= TO_DATE ('2013-12-31','YYYY-MM-DD')
AND NOT EXISTS(SELECT 1 FROM TABLE2 WHERE ID = EO.ID)
AND NOT EXISTS( SELECT 1 FROM TABLE3 WHERE ID = EO.ID)
) EO
START WITH EO.PARENTID IS NULL
CONNECT BY PRIOR EO.ID = EO.PARENTID;
But it is not clear whether that is what you want. This would give you the top-level parents within the desired date range, and without children in the other tables, then build the entire hierarchy for those parents. It's possible that lower nodes in the hierarchy would have children in the other tables, which would cause the delete to fail.
If that's not what you want, I think you need to describe your requirements more clearly.
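One possible reading of Edit #1 is: delete every row dated on or before the cutoff unless some descendant is newer than the cutoff. Under that assumption, which is an interpretation and not something the question confirms, the rows to keep are the newer rows plus all of their ancestors. Here is a sketch using a SQLite recursive CTE through Python's sqlite3 module. It is not Oracle syntax, the names purge and keep are made up, the dates are written as ISO strings, and Table2 and Table3 are ignored, as in the edit's example.

```python
import sqlite3

# Table1 rows from the question: (id, dateocc, parentid), dates as ISO text
ROWS = [
    (1, "2013-12-26", None), (2, "2013-07-18", None), (3, "2014-12-31", 1),
    (4, "2013-06-26", 1), (5, "2014-07-23", None), (6, "2013-07-22", 2),
]

DELETE_SQL = """
WITH RECURSIVE keep(id) AS (
    -- rows newer than the cutoff must stay
    SELECT id FROM table1 WHERE dateocc > :cutoff
    UNION
    -- and so must every ancestor of a row that stays
    SELECT t.parentid
    FROM table1 t
    JOIN keep k ON t.id = k.id
    WHERE t.parentid IS NOT NULL
)
DELETE FROM table1 WHERE id NOT IN (SELECT id FROM keep)
"""


def purge(rows, cutoff):
    """Apply the delete to a fresh in-memory copy and return the remaining ids."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, dateocc TEXT, parentid INTEGER)"
    )
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    con.execute(DELETE_SQL, {"cutoff": cutoff})
    remaining = [r[0] for r in con.execute("SELECT id FROM table1 ORDER BY id")]
    con.close()
    return remaining


print(purge(ROWS, "2013-12-31"))
```

With the sample data and a cutoff of 2013-12-31, rows 1, 3 and 5 remain, which matches the table shown in Edit #1. Extending the keep set with rows referenced from Table2 and Table3 would follow the same pattern.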

Reporting against a CSV field in a SQL server 2005 DB

OK, so I am writing a report against a third-party database in SQL Server 2005. For the most part it's normalized, except for one field in one table. They have a table of users (which includes groups). This table has a UserID field (PK), an IsGroup field (bit), and a Members field (text). The Members field holds a comma-separated list of all the members of the group or, if the row is not a group, a comma-separated list of the groups the member belongs to.
The question is what is the best way to write a stored procedure that displays what users are in what groups? I have a function that parses out the ids into a table. So the best way I could come up with was to create a cursor that cycles through each group and parse out the userid, write them to a temp table (with the group id) and then select out from the temp table?
UserTable
Example:
ID|IsGroup|Name|Members
1|True|Admin|3
2|True|Power|3,4
3|False|Bob|1,3
4|False|Susan|2
5|True|Normal|6
6|False|Bill|5
I want my query to show:
GroupID|UserID
1|3
2|3
2|4
5|6
Hope that makes sense...
If you have (or could create) a separate table containing the groups you could join it with the users table and match them with the charindex function with comma padding of your data on both sides. I would test the performance of this method with some fairly extreme workloads before deploying. However, it does have the advantage of being self-contained and simple. Note that changing the example to use a cross-join with a where clause produces the exact same execution plan as this one.
Example with data:
SELECT *
FROM (SELECT 1 AS ID,
'1,2,3' AS MEMBERS
UNION
SELECT 2,
'2'
UNION
SELECT 3,
'3,1'
UNION
SELECT 4,
'2,1') USERS
LEFT JOIN (SELECT '1' AS MEMBER
UNION
SELECT '2'
UNION
SELECT '3'
UNION
SELECT '4') GROUPS
ON CHARINDEX(',' + GROUPS.MEMBER + ',',',' + USERS.MEMBERS + ',') > 0
Results:
id members group
1 1,2,3 1
1 1,2,3 2
1 1,2,3 3
2 2 2
3 3,1 1
3 3,1 3
4 2,1 1
4 2,1 2
Your technique will probably be the best method.
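The comma-padded match from the CHARINDEX answer can be tried directly on the UserTable example by joining the table to itself: group rows on one side, user rows on the other. This is only a sketch in SQLite via Python's sqlite3 module. SQLite's instr(haystack, needle) plays the role of SQL Server's CHARINDEX(needle, haystack), with the arguments in the opposite order, and || replaces + for concatenation. The function name group_members is made up for the example.

```python
import sqlite3

# UserTable rows from the question: (ID, IsGroup, Name, Members)
ROWS = [
    (1, 1, "Admin", "3"), (2, 1, "Power", "3,4"), (3, 0, "Bob", "1,3"),
    (4, 0, "Susan", "2"), (5, 1, "Normal", "6"), (6, 0, "Bill", "5"),
]

QUERY = """
SELECT g.id AS group_id, u.id AS user_id
FROM usertable g
JOIN usertable u
  ON u.isgroup = 0
 -- pad both sides with commas so that id 3 does not match inside '13'
 AND instr(',' || g.members || ',', ',' || u.id || ',') > 0
WHERE g.isgroup = 1
ORDER BY g.id, u.id
"""


def group_members(rows):
    """Return (group id, user id) pairs read from each group's Members list."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE usertable (id INTEGER PRIMARY KEY, isgroup INTEGER, "
        "name TEXT, members TEXT)"
    )
    con.executemany("INSERT INTO usertable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(group_members(ROWS))
```

On the sample rows this gives the pairs 1|3, 2|3, 2|4 and 5|6 from the question. Because the join condition cannot use an index, it compares every group with every user, so the earlier advice about testing with large workloads applies here as well.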