Improving recursive SQL looping - sql

I am trying to solve a performance issue on an inherited system that appears when we have a significant amount of data.
We have a table that contains the two fields "ItemID" and "ParentItemID".
The "ParentItemID" field relates to another row in the same table where the "ItemID" field matches this row's "ParentItemID" field.
This relationship can be many, many rows deep in places.
The following query is being run and looks like it could be another cause of slowdown:
WHILE 1=1
BEGIN
    SELECT @ParentID = ParentItemID FROM Items WHERE ItemID = @LastParentID
    IF @ParentID IS NULL
    BEGIN
        BREAK
    END
    ELSE
    BEGIN
        SET @LastParentID = @ParentID
    END
END
Is there a better way of doing this sort of recursive search?
Note: we are NOT allowed to make table changes at this point, so adding a "RootItemID" column is not possible (I've already asked, as this would solve the problem outright!)

You could use a common table expression for this:
WITH Antecedents (ItemID, ParentItemID, Level)
AS
(
    -- Anchor member: the row we start from
    SELECT ItemID, ParentItemID, 0 AS Level FROM Items WHERE ItemID = @StartingID
    UNION ALL
    -- Recursive member: step up to the parent of the previous row
    SELECT Items.ItemID, Items.ParentItemID, Antecedents.Level + 1 AS Level
    FROM Items
    INNER JOIN Antecedents
        ON Antecedents.ParentItemID = Items.ItemID
)
SELECT TOP 1 @LastParentID = ItemID
FROM Antecedents
ORDER BY Level DESC
More info on recursive CTEs here:
http://msdn.microsoft.com/en-us/library/ms186243.aspx
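To try the upward walk outside SQL Server, here is a minimal sketch that runs the same kind of recursive CTE in SQLite through Python's sqlite3 module. SQLite is only a stand-in here (it uses WITH RECURSIVE and LIMIT 1 rather than TOP 1), and the sample rows and the find_root helper are hypothetical, not part of the original system.

```python
import sqlite3

def find_root(conn, starting_id):
    """Return the ItemID of the topmost ancestor, or None if the item is unknown."""
    row = conn.execute(
        """
        WITH RECURSIVE Antecedents (ItemID, ParentItemID, Level) AS (
            -- Anchor member: the row we start from
            SELECT ItemID, ParentItemID, 0
            FROM Items
            WHERE ItemID = ?
            UNION ALL
            -- Recursive member: step up to the parent of the previous row
            SELECT i.ItemID, i.ParentItemID, a.Level + 1
            FROM Items AS i
            JOIN Antecedents AS a ON a.ParentItemID = i.ItemID
        )
        SELECT ItemID FROM Antecedents ORDER BY Level DESC LIMIT 1
        """,
        (starting_id,),
    ).fetchone()
    return row[0] if row else None

# Hypothetical sample data: two separate chains, 4 -> 3 -> 2 -> 1 and 11 -> 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, ParentItemID INTEGER)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, 3), (10, None), (11, 10)],
)

root_of_4 = find_root(conn, 4)
root_of_11 = find_root(conn, 11)
print(root_of_4, root_of_11)
```

The whole walk happens in one statement, so the engine can use the index on ItemID for every step instead of paying for a separate round trip per level as the WHILE loop does.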

You can do it with a common table expression like this:
;WITH cte_hierarchy
AS (SELECT *
    FROM Items
    WHERE ItemID = @ParentID
    UNION ALL
    SELECT i.*
    FROM Items i
    JOIN cte_hierarchy h
        ON i.ItemID = h.ParentItemID)
SELECT *
FROM cte_hierarchy
WHERE .....

Related

How can I improve the native query for a table with 7 million rows?

I have the view (table) below in my database (SQL Server).
I want to retrieve two things from this table.
The object that has the latest booking date for each product number.
It will return the objects {0001, 2, 2019-06-06 10:39:58} and {0003, 2, 2019-06-07 12:39:58}.
If none of the step numbers has a booking date for a product number, it will return the object with step number = 1. It will return the object {0002, 1, NULL}.
The view has 7,000,000 rows. I must do it using a native query.
The first query that retrieves the product with the latest booking date:
SELECT DISTINCT *
FROM TABLE t
WHERE t.BOOKING_DATE = (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER)
The second query, which retrieves the product with a NULL booking date and step number = 1:
SELECT DISTINCT *
FROM TABLE t
WHERE (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER) IS NULL AND t.STEP_NUMBER = 1
I tried using a single query, but it takes too long.
For now I use two queries to get this information, but I need to improve this in the future. Do you have an alternative? I also cannot use a stored procedure or function inside SQL Server. I must do it with a native query from Java.
Try this:
Declare @p table(pumber int,step int,bookdate datetime)
insert into @p values
(1,1,'2019-01-01'),(1,2,'2019-01-02'),(1,3,'2019-01-03')
,(2,1,null),(2,2,null),(2,3,null)
,(3,1,null),(3,2,null),(3,3,'2019-01-03')
;With CTE as
(
    select pumber,max(bookdate)bookdate
    from @p p1
    where bookdate is not null
    group by pumber
)
select p.* from @p p
where exists(select 1 from CTE c
    where p.pumber=c.pumber and p.bookdate=c.bookdate)
union all
select p1.* from @p p1
where p1.bookdate is null and step=1
and not exists(select 1 from CTE c
    where p1.pumber=c.pumber)
If performance is the main concern, it does not matter whether you use one query or two; what matters in the end is how fast they run.
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
Go
If more than 90% of the rows fall on one side (BookingDate is not null, or BookingDate is null), you can create a filtered index on it:
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
where BookingDate is not null
Go
Try row_number() with a proper ordering. SQL Server's ORDER BY treats NULL values as the lowest possible values.
SELECT TOP(1) WITH TIES *
FROM myTable t
ORDER BY row_number() over(partition by PRODUCT_NUMBER order by BOOKING_DATE DESC, STEP_NUMBER);
Pay attention to the indexes SQL Server recommends in order to get good performance.
Possibly the most efficient method is a correlated subquery:
select t.*
from t
where t.step_number = (select top (1) t2.step_number
                       from t t2
                       where t2.product_number = t.product_number
                       order by t2.booking_date desc, t2.step_number
                      );
In particular, this can take advantage of an index on (product_number, booking_date desc, step_number).
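As a quick check of the correlated-subquery idea, the sketch below runs it in SQLite through Python's sqlite3 module, with LIMIT 1 standing in for TOP (1). The rows are hypothetical, made up only to match the three results described in the question, and the sketch assumes step_number is unique within a product. SQLite, like SQL Server, sorts NULL last under ORDER BY ... DESC, which is what lets the all-NULL product fall back to step 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (product_number TEXT, step_number INTEGER, booking_date TEXT)"
)
# Hypothetical rows; booking dates are ISO strings so text order equals time order.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("0001", 1, "2019-06-05 09:00:00"),
        ("0001", 2, "2019-06-06 10:39:58"),
        ("0001", 3, None),
        ("0002", 1, None),
        ("0002", 2, None),
        ("0002", 3, None),
        ("0003", 1, None),
        ("0003", 2, "2019-06-07 12:39:58"),
        ("0003", 3, None),
    ],
)

# For each product, keep the row whose step is first when sorted by
# latest booking date (NULLs last), then by lowest step number.
picked = conn.execute(
    """
    SELECT t.product_number, t.step_number, t.booking_date
    FROM t
    WHERE t.step_number = (
        SELECT t2.step_number
        FROM t AS t2
        WHERE t2.product_number = t.product_number
        ORDER BY t2.booking_date DESC, t2.step_number
        LIMIT 1
    )
    ORDER BY t.product_number
    """
).fetchall()

for row in picked:
    print(row)
```

Both requirements are covered by one statement, which is why a single index on (product_number, booking_date desc, step_number) can serve the whole query.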

SQL to retrieve all linked records

I have a product table (tProduct) and a product links table (tProductLink) to allow establishing links between products. Given a ProductID and ProductLinkID, I need to get all of the tProduct.ID records that are related.
In the example table (tProductLink) below, all of the IDs would be returned. Note that it's not possible to create a recursive link; that is, given the first row in the table below, there cannot be a row where ProductID is 31563 and ProductLinkID is 28818.
So say I search for all products related to the link in row 4, ProductID 137902 and ProductLinkID 410901. Given that link, it should return all six rows.
Here is an example of the data.
I have tried various techniques such as a recursive CTE and calling a table function using "cross apply" but I have got nowhere.
This is one of the last solutions I tried, which ended up not returning all products as noted in the comments.
declare @ProductID int, @ProductLinkID int
select @ProductID = 137902
select @ProductLinkID = 410901
;with p1 as
(
    select ProductID, ProductLinkID
    from tProductLink
    where ProductID = @ProductID and ProductLinkID = @ProductLinkID
    union all
    select tProductLink.ProductID, tProductLink.ProductLinkID
    from tProductLink
    join p1 on p1.ProductLinkID = tProductLink.ProductID
    where not (tProductLink.ProductID = @ProductID and tProductLink.ProductLinkID = @ProductLinkID)
)
select distinct ProductID from p1
union
select ProductLinkID from p1
You start with one ID. It can appear in multiple rows of the second table, as either ProductLinkId or ProductId. You then look up the corresponding IDs found this way in the second table again.
This calls for a recursive query that keeps collecting all corresponding IDs. Unfortunately, SQL Server does not support DISTINCT in recursive queries, so the same IDs get looked up multiple times. SQL Server also does not guard against cycles (the query fails instead), so we must prevent them ourselves by remembering which IDs we have already found. Ideally this would be done with an array or set that we fill, but SQL Server does not support those, so we build a string instead.
The complete query:
with cte(id, seen) as
(
select 28520 as id, cast('/28520/' as varchar(max)) as seen from t1
union all
select case when cte.id = t2.productid then t2.linkproductid
else t2.productid end as id,
cte.seen + cast(case when cte.id = t2.productid
then t2.linkproductid
else t2.productid end as varchar(max)) + '/'
from cte
join t2 on cte.id in (t2.productid, t2.linkproductid)
and charindex('/' + cast(case when cte.id = t2.productid
then t2.linkproductid
else t2.productid end as varchar(max))+ '/', cte.seen) = 0
)
select distinct id from cte
option (maxrecursion 1000);
Rextester demo: http://rextester.com/WJJ78304
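The "remember what we have seen in a string" technique can be tried without SQL Server. The sketch below is an assumption-laden port to SQLite via Python's sqlite3 module: instr replaces charindex, || replaces +, and a helper CTE named edges (my own addition) lists every link in both directions so the CASE expressions are not needed. The link rows and the related_products helper are hypothetical.

```python
import sqlite3

def related_products(conn, start_id):
    """Return every product ID reachable from start_id through links, in either direction."""
    rows = conn.execute(
        """
        WITH RECURSIVE
        edges(a, b) AS (
            -- Each link is usable in both directions.
            SELECT ProductID, ProductLinkID FROM tProductLink
            UNION ALL
            SELECT ProductLinkID, ProductID FROM tProductLink
        ),
        cte(id, seen) AS (
            SELECT CAST(:start AS INTEGER), '/' || :start || '/'
            UNION ALL
            -- Follow a link only if its target is not yet on this path.
            SELECT e.b, cte.seen || e.b || '/'
            FROM cte
            JOIN edges AS e ON e.a = cte.id
            WHERE instr(cte.seen, '/' || e.b || '/') = 0
        )
        SELECT DISTINCT id FROM cte ORDER BY id
        """,
        {"start": start_id},
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tProductLink (ProductID INTEGER, ProductLinkID INTEGER)")
# Hypothetical links: one connected group plus an unrelated pair (70001, 70002).
conn.executemany(
    "INSERT INTO tProductLink VALUES (?, ?)",
    [(28818, 31563), (28818, 137902), (137902, 410901), (410901, 55501), (70001, 70002)],
)

group = related_products(conn, 137902)
print(group)
```

Because each path carries its own seen string, the query enumerates every simple path from the start ID. That is fine for small link groups but can grow quickly on densely linked data.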

Moving an entire row from one position to another

I have 5 rows in my query results and I want to know whether it is possible to move an entire row from one position to another, for example the third row to the first position.
Below is my sql code:
DECLARE @ClientID int = 1041
DECLARE @ProfileID int = 2520
DECLARE @PageType tinyint = 2
BEGIN
    WITH SortedList
    AS (
        SELECT PageID, PageName, PageTitle, PageUrl, ParentID,
               CAST((PageName) AS VARCHAR(1000)) AS "Path"
        FROM Pagelist p
        WHERE p.ParentId IS NULL and p.PageType=@PageType
        UNION ALL
        SELECT p.PageID, p.PageName, p.PageTitle, p.PageUrl, p.ParentID,
               CAST((a.path + '/' + p.PageName) AS VARCHAR(1000)) AS "Path"
        FROM Pagelist AS p
        JOIN SortedList AS a
            ON p.ParentID = a.PageID
        WHERE p.PageType=@PageType
    )
    SELECT a.PageID,a.PageTitle,a.ParentID,ua.Access,a.Path,a.PageName
    FROM SortedList as a,ProfilePageAccess ua,UserProfile up
    WHERE ua.ClientID=@ClientID and ua.PageID=a.PageID
    and up.ProfileID=ua.ProfileID and up.ProfileID=@ProfileID
    ORDER BY a.Path
END
I want to be able to move the third row to the first row.
These are my current results:
PAGEID  PAGETITLE        PARENTID  ACCESS  PATH                      PAGENAME
001R    Administration   801       2       HRAdmin                   HrAdmin
002R    Performance      802       2       HRAdmin/AdminPer          AdminPer
003R    Overall Ratings  803       2       HRAdmin/AdminPerformance  Perform
004R    Score Ratings    804       2       HRAdmin/AdminPerformance  Perform
005R    Template Setup   805       2       HRAdmin/AdminPerformance  Perform
In your ORDER BY you have: ORDER BY a.Path
This is ordering by a.Path in ascending (alphabetical) order.
If you want to order by specific values of some other column, you can use an ORDER BY with a CASE expression.
For example:
...
SELECT a.PageID,a.PageTitle,a.ParentID,ua.Access,a.Path,a.PageName
FROM SortedList as a,ProfilePageAccess ua,UserProfile up
WHERE ua.ClientID=@ClientID and ua.PageID=a.PageID
and up.ProfileID=ua.ProfileID and up.ProfileID=@ProfileID
ORDER BY CASE a.PageTitle
    WHEN 'Overall Ratings' THEN 1
    WHEN 'Score Ratings' THEN 2
    ELSE 3
END
One way is to change your ORDER BY to something more complex. For example, in Postgres you could change your query from
SELECT a.PageID,a.PageTitle,a.ParentID,ua.Access,a.Path,a.PageName
to
SELECT a.PageID,a.PageTitle,a.ParentID,ua.Access,a.Path,a.PageName, (a.PageID = '003R') AS priority_column
and change
ORDER BY a.Path
to
ORDER BY priority_column DESC, a.Path
Then it will sort results such that all rows where priority_column is true are above all rows where priority_column is false.
EDIT: Zorkolot's answer is cleaner than this; probably go with that one.
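Both answers boil down to sorting on a priority expression first and on the path second. Here is a small sketch of that idea, run in SQLite through Python's sqlite3 module as a stand-in for the real database. The trimmed-down Pages table and the pinned_order helper are hypothetical, and PageID is added as a final tie-breaker because several rows share the same path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pages (PageID TEXT, PageTitle TEXT, Path TEXT)")
# Rows modelled on the result set shown in the question.
conn.executemany(
    "INSERT INTO Pages VALUES (?, ?, ?)",
    [
        ("001R", "Administration", "HRAdmin"),
        ("002R", "Performance", "HRAdmin/AdminPer"),
        ("003R", "Overall Ratings", "HRAdmin/AdminPerformance"),
        ("004R", "Score Ratings", "HRAdmin/AdminPerformance"),
        ("005R", "Template Setup", "HRAdmin/AdminPerformance"),
    ],
)

def pinned_order(conn, pinned_id):
    """Return PageIDs with pinned_id first, then the rest in Path order."""
    rows = conn.execute(
        """
        SELECT PageID
        FROM Pages
        ORDER BY CASE WHEN PageID = ? THEN 0 ELSE 1 END,  -- pinned row first
                 Path,
                 PageID
        """,
        (pinned_id,),
    ).fetchall()
    return [r[0] for r in rows]

order = pinned_order(conn, "003R")
print(order)
```

Rows that do not match the CASE all get the same priority value, so their relative order is still decided by the remaining sort keys.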

Teradata SQL Reverse Parent Child Hierarchy

I know how to build a hierarchy starting with the root node (i.e. where parent_id is null or something like that), but I can't find anything on how to build a hierarchy upward from the final child/edge node. I'd like to start with a child and build all the way back up to the top. Assume I don't know how many levels, or who the parent is, and we'll have to use SQL to figure it out.
Here is my base table:
old_entity_key,new_entity_key
1,2
2,3
3,4
4,5
5,6
Desired output:
new_entity_key,path
2,1/2
3,1/2/3
4,1/2/3/4
5,1/2/3/4/5
6,1/2/3/4/5/6
This is also acceptable:
new_entity_key,path
2,2/1
3,3/2/1
4,4/3/2/1
5,5/4/3/2/1
6,6/5/4/3/2/1
Here is the CTE I've started with:
with recursive history as (
select
old_entity_key,
new_entity_key,
cast(old_entity_key||'/'||new_entity_key as varchar(1000)) as path
from table
where new_entity_key not in (select old_entity_key from table)
and cast(start_time as date) between current_date - interval '3' day and current_date
union all
select
c.old_entity_key,
c.new_entity_key,
p.new_entity_key||'/'||c.path
from history c
join table p on p.new_entity_key = c.old_entity_key
)
select new_entity_key, old_entity_key, substr(path, 1, instr(path, '/') - 1) as original_entity_key, path
from history s;
The problem with the above query is that it runs forever. I think I've created an infinite loop. I've also tried using the below where filter in the bottom query of the union to try to find the root node, but Teradata gives me an error:
where p.new_entity_key in (select old_entity_key from table)
Any help would be greatly appreciated.
You'll need some sort of counter, and I think your join logic in your CTE doesn't make sense. I threw together a very simple volatile table example:
create volatile table tb
(old_entity_key char(1),
new_entity_key char(1),
rn integer)
on commit preserve rows;
insert into tb values ('1','2',1);
insert into tb values ('2','3',2);
insert into tb values ('3','4',3);
Now we can put together a recursive CTE:
with recursive history as (
select
old_entity_key,
new_entity_key,
cast(old_entity_key||'/'||new_entity_key as varchar(1000)) as path,
rn
from tb t
where
rn = 1
union all
select
t.old_entity_key,
t.new_entity_key,
h.path || '/' || t.new_entity_key,
t.rn
from
tb t
join history h
on t.rn = h.rn + 1
)
select * from history order by rn
The important things here are:
Limit your first pass (accomplished here by rn=1).
The second pass needs to pick up the "next" row, based on the previous row (t.rn = h.rn + 1)
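The rn counter works when rows are already numbered in chain order. As a variation on the same two rules (not the answer's exact method), the sketch below seeds the recursion from rows whose old_entity_key never appears as a new_entity_key, and picks up the "next" row by matching keys instead of a counter. It runs in SQLite through Python's sqlite3 module as a stand-in for Teradata, uses the five rows from the question, and assumes the data contains no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (old_entity_key INTEGER, new_entity_key INTEGER)")
conn.executemany(
    "INSERT INTO tb VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)],
)

paths = conn.execute(
    """
    WITH RECURSIVE history(new_entity_key, path) AS (
        -- First pass: only chain heads, i.e. old keys that are nobody's new key.
        SELECT new_entity_key, old_entity_key || '/' || new_entity_key
        FROM tb
        WHERE old_entity_key NOT IN (SELECT new_entity_key FROM tb)
        UNION ALL
        -- Later passes: the row that continues from the previous row's new key.
        SELECT t.new_entity_key, h.path || '/' || t.new_entity_key
        FROM tb AS t
        JOIN history AS h ON t.old_entity_key = h.new_entity_key
    )
    SELECT new_entity_key, path
    FROM history
    ORDER BY new_entity_key
    """
).fetchall()

for key, path in paths:
    print(key, path)
```

Limiting the first pass to chain heads is what keeps the recursion finite here: every later pass can only move forward along a chain, one row at a time.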

Ordering a SQL query based on the value in a column determining the value of another column in the next row

My table looks like this:
Value Previous Next
37 NULL 42
42 37 3
3 42 79
79 3 NULL
Except that the table is all out of order. (There are no duplicates, so that is not an issue.) I was wondering whether there is any way to write a query that orders the output, basically saying "next row's 'Value' = this row's 'Next'", as shown above?
I have no control over the database or how this data is stored. I am just trying to retrieve it and organize it. This is SQL Server, I believe 2008.
I realize that this wouldn't be difficult to reorganize afterwards, but I was just curious if I could write a query that just did that out of the box so I wouldn't have to worry about it.
This should do what you need:
WITH CTE AS (
SELECT YourTable.*, 0 Depth
FROM YourTable
WHERE Previous IS NULL
UNION ALL
SELECT YourTable.*, Depth + 1
FROM YourTable JOIN CTE
ON YourTable.Value = CTE.Next
)
SELECT * FROM CTE
ORDER BY Depth;
[SQL Fiddle] (Referential integrity and indexes omitted for brevity.)
We use a recursive common table expression (CTE) to travel from the head of the list (WHERE Previous IS NULL) to the trailing nodes (ON YourTable.Value = CTE.Next) and at the same time memorize the depth of the recursion that was needed to reach the current node (in Depth).
In the end, we simply sort by the depth of recursion that was needed to reach each of the nodes (ORDER BY Depth).
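To see the depth-based ordering at work, here is a minimal sketch that runs an equivalent query in SQLite through Python's sqlite3 module (a stand-in for SQL Server, hence WITH RECURSIVE). The rows are the four from the question, inserted deliberately out of order; the CTE name chain is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Value INTEGER, Previous INTEGER, Next INTEGER)")
# The question's rows, stored out of order on purpose.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(3, 42, 79), (79, 3, None), (37, None, 42), (42, 37, 3)],
)

ordered = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE chain(Value, Previous, Next, Depth) AS (
            -- Head of the list: the row with no predecessor.
            SELECT Value, Previous, Next, 0
            FROM YourTable
            WHERE Previous IS NULL
            UNION ALL
            -- Follow the Next pointer and count how many hops it took.
            SELECT t.Value, t.Previous, t.Next, c.Depth + 1
            FROM YourTable AS t
            JOIN chain AS c ON t.Value = c.Next
        )
        SELECT Value FROM chain ORDER BY Depth
        """
    )
]
print(ordered)
```

Sorting by Depth rather than relying on the order in which the CTE happens to emit rows keeps the result deterministic.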
Use a recursive query. With the one I list here, you can have multiple paths along your linked list:
with cte (Value, Previous, Next, Level)
as
(
select Value, Previous, Next, 0 as Level
from data
where Previous is null
union all
select d.Value, d.Previous, d.Next, Level + 1
from data d
inner join cte c on d.Previous = c.Value
)
select * from cte
fiddle here
If you are using Oracle, try START WITH ... CONNECT BY:
select ... start with initial-condition connect by
nocycle recursive-condition;
EDIT: For SQL-Server, use WITH syntax as below:
WITH rec(value, previous, next) AS
(SELECT value, previous, next
FROM table1
WHERE previous is null
UNION ALL
SELECT nextRec.value, nextRec.previous, nextRec.next
FROM table1 as nextRec, rec
WHERE rec.next = nextRec.value)
SELECT value, previous, next FROM rec;
One way to do this is with a join:
select t.*
from t left outer join
t tnext
on t.next = tnext.value
order by tnext.value
However, won't this do?
select t.*
from t
order by t.next
Something like this should work:
With Parent As (
Select
Value,
Previous,
Next
From
table
Where
Previous Is Null
Union All
Select
t.Value,
t.Previous,
t.Next
From
table t
Inner Join
Parent
On Parent.Next = t.Value
)
Select
*
From
Parent
Example