SQL to retrieve all linked records - sql

I have a product table (tProduct) and a product links table (tProductLink) to allow establishing links between products. Given a ProductID and ProductLinkID, I need to get all of the tProduct.ID records that are related.
In the example table (tProductLink) below, all of the IDs would be returned. Note that it's not possible to create a recursive link; that is, given the first row in the table below, there cannot be a row where ProductID is 31563 and ProductLinkID is 28818.
So say I search for all products related to the link in row 4, ProductID 137902 and ProductLinkID 410901. Given that link, it should return all six rows.
Here is an example of the data.
I have tried various techniques such as a recursive CTE and calling a table function using "cross apply" but I have got nowhere.
This is one of the last solutions I tried, which ended up not returning all products as noted in the comments.
declare @ProductID int, @ProductLinkID int
select @ProductID = 137902
select @ProductLinkID = 410901
;with p1 as
(
    select ProductID, ProductLinkID
    from tProductLink
    where ProductID = @ProductID and ProductLinkID = @ProductLinkID
    union all
    select tProductLink.ProductID, tProductLink.ProductLinkID
    from tProductLink
    join p1 on p1.ProductLinkID = tProductLink.ProductID
    where not (tProductLink.ProductID = @ProductID and tProductLink.ProductLinkID = @ProductLinkID)
)
select distinct ProductID from p1
union
select ProductLinkID from p1

You start with one ID. It can appear in multiple rows of the second table, either as ProductId or as LinkProductId. You then look up the corresponding IDs found this way in the second table again.
This calls for a recursive query, where you keep collecting all corresponding IDs. Unfortunately SQL Server does not support DISTINCT in recursive queries, so the same IDs get looked up multiple times. SQL Server also doesn't guard against cycles (the query simply fails once the recursion limit is hit), so we must prevent them ourselves by remembering which IDs we have already found. This would ideally be done with an array or set that we fill, but SQL Server doesn't support such types, so we must build a string instead.
The complete query:
with cte(id, seen) as
(
    select 28520 as id, cast('/28520/' as varchar(max)) as seen from t1
    union all
    select case when cte.id = t2.productid then t2.linkproductid
                else t2.productid end as id,
           cte.seen + cast(case when cte.id = t2.productid
                                then t2.linkproductid
                                else t2.productid end as varchar(max)) + '/'
    from cte
    join t2 on cte.id in (t2.productid, t2.linkproductid)
           and charindex('/' + cast(case when cte.id = t2.productid
                                         then t2.linkproductid
                                         else t2.productid end as varchar(max)) + '/', cte.seen) = 0
)
select distinct id from cte
option (maxrecursion 1000);
Rextester demo: http://rextester.com/WJJ78304
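As an alternative to the string of seen IDs, the same "remember what you already found" idea can be sketched as a loop that fills a table variable. This is not the query from the answer above, only a minimal sketch under assumptions: the table variables @links and @found, the variable @added, and the sample IDs are hypothetical, and the column names follow the question's tProductLink.

```sql
-- Hypothetical sample links: 1-2, 2-3 and 4-3 form one group; 5-6 is unrelated.
declare @links table (ProductID int, ProductLinkID int);
insert into @links values (1, 2), (2, 3), (4, 3), (5, 6);

-- Set of IDs found so far, seeded with the starting product.
declare @found table (ID int primary key);
insert into @found values (1);

declare @added int = 1;

-- Keep expanding until a pass adds no new ID.
while @added > 0
begin
    insert into @found (ID)
    select x.ID
    from (
        -- follow links in both directions
        select l.ProductLinkID as ID
        from @links l join @found f on f.ID = l.ProductID
        union
        select l.ProductID
        from @links l join @found f on f.ID = l.ProductLinkID
    ) x
    where not exists (select 1 from @found f2 where f2.ID = x.ID);

    set @added = @@rowcount;
end

select ID from @found order by ID;  -- 1, 2, 3, 4
```

Because every ID is inserted at most once, cycles cannot cause endless looping, and no recursion limit applies.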

How can I improve the native query for a table with 7 millions rows?

I have the below view(table) in my database(SQL SERVER).
I want to retrieve 2 things from this table.
The object which has the latest booking date for each Product number.
It will return the objects = {0001, 2, 2019-06-06 10:39:58} and {0003, 2, 2019-06-07 12:39:58}.
If none of the step numbers has a booking date for a Product number, it will return the object with Step number = 1. It will return the object = {0002, 1, NULL}.
The view has 7,000,000 rows. I must do it by using a native query.
The first query that retrieves the product with the latest booking date:
SELECT DISTINCT *
FROM TABLE t
WHERE t.BOOKING_DATE = (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER)
The second query that retrieves the product with booking date NULL and Step number = 1:
SELECT DISTINCT *
FROM TABLE t
WHERE (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER) IS NULL AND t.STEP_NUMBER = 1
I tried using a single query, but it takes too long.
For now I use two queries to get this information, but for the future I need to improve this. Do you have an alternative? I also cannot use stored procedures or functions inside SQL Server. I must do it with a native query from Java.
Try this,
Declare @p table(pumber int,step int,bookdate datetime)
insert into @p values
(1,1,'2019-01-01'),(1,2,'2019-01-02'),(1,3,'2019-01-03')
,(2,1,null),(2,2,null),(2,3,null)
,(3,1,null),(3,2,null),(3,3,'2019-01-03')
;With CTE as
(
    select pumber,max(bookdate)bookdate
    from @p p1
    where bookdate is not null
    group by pumber
)
select p.* from @p p
where exists(select 1 from CTE c
             where p.pumber=c.pumber and p.bookdate=c.bookdate)
union all
select p1.* from @p p1
where p1.bookdate is null and step=1
and not exists(select 1 from CTE c
               where p1.pumber=c.pumber)
If performance is the main concern, it does not matter whether you use one query or two; what counts in the end is how fast they run.
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
Go
If more than 90% of the rows have a non-null BookingDate (or, the other way round, a null BookingDate), you can create a filtered index on it.
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
where BookingDate is not null
Go
Try row_number() with a proper ordering. SQL Server's ORDER BY treats NULL values as the lowest possible values.
SELECT TOP(1) WITH TIES *
FROM myTable t
ORDER BY row_number() over(partition by PRODUCT_NUMBER order by BOOKING_DATE DESC, STEP_NUMBER);
Pay attention to the indexes SQL Server recommends to get good performance.
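To check the ordering on a small data set, the same row_number() ranking can be written with a CTE and a filter on rn = 1. This is only a sketch: the table variable @t and its rows are made-up sample data, chosen so that the result matches the three objects named in the question.

```sql
-- Hypothetical sample rows; column names follow the question.
declare @t table (PRODUCT_NUMBER char(4), STEP_NUMBER int, BOOKING_DATE datetime);
insert into @t values
    ('0001', 1, '2019-06-05T09:00:00'),
    ('0001', 2, '2019-06-06T10:39:58'),
    ('0002', 1, null),
    ('0002', 2, null),
    ('0003', 1, null),
    ('0003', 2, '2019-06-07T12:39:58');

with ranked as (
    select t.*,
           -- latest booking date first; NULLs sort last under DESC,
           -- so an all-NULL product falls back to its lowest step number
           row_number() over (partition by PRODUCT_NUMBER
                              order by BOOKING_DATE desc, STEP_NUMBER) as rn
    from @t t
)
select PRODUCT_NUMBER, STEP_NUMBER, BOOKING_DATE
from ranked
where rn = 1
order by PRODUCT_NUMBER;
-- 0001, 2, 2019-06-06 10:39:58
-- 0002, 1, NULL
-- 0003, 2, 2019-06-07 12:39:58
```

Both requirements are covered by one pass over the data, so a single native query sent from Java should be enough.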
Possibly the most efficient method is a correlated subquery:
select t.*
from t
where t.step_number = (select top (1) t2.step_number
                       from t t2
                       where t2.product_number = t.product_number
                       order by t2.booking_date desc, t2.step_number
                      );
In particular, this can take advantage of an index on (product_number, booking_date desc, step_number).

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
       MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Two rows are currently giving me the problem: they're both remittances for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remit_To_Activate table, so only ONE remittance will be "activated" per Claim.
You can change your query like this:
SELECT
    p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
    Remittance AS p
    INNER JOIN
    (SELECT MAX(create_datetime) AS DATE_OF_LATEST_REMIT,
            claim_id
     FROM Claims_Group2
     WHERE BILLED_AMOUNT > 0
     GROUP BY claim_id) AS latest_remit
    ON latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without more information on the structure of your database, especially the structure of Claims_Group2 and REMITTANCE and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into Remit_To_Activate.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your Remit_To_Activate table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
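When two remittances share the same timestamp, row_number() ordered only by create_datetime picks one of them arbitrarily. If a repeatable choice is wanted, one option is to add a second sort key as a tie-breaker. The following is a minimal sketch under assumptions: the table variable @Claims_Group2 and its rows are hypothetical sample data, and using REMITTANCE_UUID as the tie-breaker is an arbitrary choice, not something the question prescribes.

```sql
-- Hypothetical sample data; columns follow the Claims_Group2 fields listed in the question.
declare @Claims_Group2 table (
    REMITTANCE_UUID uniqueidentifier,
    CLAIM_ID        int,
    BILLED_AMOUNT   decimal(10, 2),
    CREATE_DATETIME datetime
);

insert into @Claims_Group2 values
    (newid(), 1, 100.00, '2018-08-15T13:07:50.933'),
    (newid(), 1, 100.00, '2018-08-15T13:07:50.933'),  -- same claim, same timestamp
    (newid(), 2,  50.00, '2017-12-31T10:00:00.000');

with ranked as (
    select cg2.*,
           row_number() over (
               partition by cg2.CLAIM_ID
               order by cg2.CREATE_DATETIME desc,
                        cg2.REMITTANCE_UUID          -- tie-breaker for equal timestamps
           ) as rn
    from @Claims_Group2 cg2
    where cg2.BILLED_AMOUNT > 0
)
select REMITTANCE_UUID, CLAIM_ID, CREATE_DATETIME
from ranked
where rn = 1;  -- exactly one row per CLAIM_ID
```

Any column that is unique within a claim would serve equally well as the second sort key.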
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Improving recursive SQL looping

I am trying to solve a performance issue on an inherited system that appears when we have a significant amount of data.
We have a table that contains the two fields "ItemID" and "ParentItemID".
The "ParentItemID" field relates to another row in the same table where the "ItemID" field matches this row's "ParentItemID" field.
This relationship can be many, many rows deep in places.
The following query is being run and looks like it could be another cause of slowdown:
WHILE 1=1
BEGIN
    SELECT @ParentID = ParentItemID FROM Items WHERE ItemID = @LastParentID
    IF @ParentID IS NULL
    BEGIN
        break
    END
    ELSE
    BEGIN
        SET @LastParentID = @ParentID
    END
END
Is there a better way of doing this sort of recursive search?
note: we are NOT allowed to make table changes at this point, so adding a "RootItemID" column is not possible (I've already asked, as this would solve the problem outright!)
You could use a common table expression for this:
WITH Antecedents (ItemID, ParentItemID, Level)
AS
(
    -- Anchor member definition
    SELECT ItemID, ParentItemID, 0 AS Level FROM Items WHERE ItemID = @StartingID
    UNION ALL
    -- Recursive member: step up to the parent row
    SELECT Items.ItemID, Items.ParentItemID, Antecedents.Level + 1 AS Level
    FROM Items
    INNER JOIN Antecedents
        ON Antecedents.ParentItemID = Items.ItemID
)
SELECT TOP 1 @LastParentID = ItemID
FROM Antecedents
ORDER BY Level DESC
More info on recursive CTEs here:
http://msdn.microsoft.com/en-us/library/ms186243.aspx
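Since the question mentions hierarchies that are "many, many rows deep", keep in mind that a recursive CTE in SQL Server stops with an error after 100 recursion levels by default; OPTION (MAXRECURSION n) raises the limit and 0 removes it. Below is a minimal, self-contained sketch of the walk to the root. The table variable @Items, the variable @RootID, and the sample rows are hypothetical.

```sql
-- Hypothetical sample hierarchy: 4 -> 3 -> 2 -> 1 (root), plus a second tree 11 -> 10.
declare @Items table (ItemID int primary key, ParentItemID int null);
insert into @Items values (1, null), (2, 1), (3, 2), (4, 3), (10, null), (11, 10);

declare @StartingID int = 4;
declare @RootID int;

with Antecedents as (
    -- anchor: the row we start from
    select i.ItemID, i.ParentItemID
    from @Items i
    where i.ItemID = @StartingID
    union all
    -- recursive step: move to the parent of the current row
    select i.ItemID, i.ParentItemID
    from @Items i
    join Antecedents a on a.ParentItemID = i.ItemID
)
select @RootID = ItemID
from Antecedents
where ParentItemID is null      -- the root is the row without a parent
option (maxrecursion 0);        -- lift the default limit of 100 levels

select @RootID as RootItemID;   -- 1
```

With the limit removed, a cycle in the data would make the query run indefinitely, so a finite limit may be the safer choice if the data is not guaranteed to be a tree.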
You can do it with a common table expression like this:
;WITH cte_hierarchy
AS (SELECT *
    FROM Items
    WHERE ItemID = @ParentID
    UNION ALL
    SELECT i.*
    FROM Items i
    JOIN cte_hierarchy h
        ON i.ItemID = h.ParentItemID)
SELECT *
FROM cte_hierarchy
WHERE .....

SQL Server CTE - SELECT after UPDATE using OUTPUT INSERTED

I've seen a few posts on using CTE (WITH) that I thought would address my issue but I can't seem to make it work for my specific use case. My use case is that I have a table with a series of records, and I need to pull some number of records AFTER a small update has been made to them.
i.e.
- retrieve records where a series of conditions are met
- update one or more columns in each of those records
- return the updated records
I know I can return the IDs of the records using the following:
WITH cte AS
( SELECT TOP 1 * FROM msg
WHERE guid = 'abcd'
AND active = 1
ORDER BY created DESC )
UPDATE cte SET active = 0
OUTPUT INSERTED.msg_id
WHERE guid = 'abcd'
That nicely returns the msg_id field. I tried wrapping all of that in a SELECT * FROM msg WHERE msg_id IN () query, but it fails.
Anyone have a suggestion? For reference, using SQL Server 2008 R2.
CREATE TABLE #t (msg_id int)
;
WITH cte AS
( SELECT TOP 1 * FROM msg
WHERE guid = 'abcd'
AND active = 1
ORDER BY created DESC )
UPDATE cte SET active = 0
OUTPUT INSERTED.msg_id INTO #t
WHERE guid = 'abcd'
SELECT *
FROM #t
You can select the data that you need by just adding all columns that you want. INSERTED contains all columns, not just the ones written to. You can also output columns from the cte alias. Example:
OUTPUT INSERTED.SomeOtherColumn, cte.SomeOtherColumn
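To illustrate returning the full updated rows straight from the UPDATE, here is a minimal sketch. The temp table #msg and its sample rows are hypothetical; only the column names msg_id, guid, active and created come from the question.

```sql
-- Hypothetical stand-in for the msg table from the question.
create table #msg (
    msg_id  int primary key,
    guid    varchar(36),
    active  bit,
    created datetime
);

insert into #msg values
    (1, 'abcd', 1, '2019-01-01T00:00:00'),
    (2, 'abcd', 1, '2019-01-02T00:00:00'),
    (3, 'efgh', 1, '2019-01-03T00:00:00');

-- Update the newest active 'abcd' row and return every column of the updated row.
with cte as (
    select top 1 *
    from #msg
    where guid = 'abcd' and active = 1
    order by created desc
)
update cte
set active = 0
output inserted.msg_id, inserted.guid, inserted.active, inserted.created;
-- returns the row with msg_id = 2, now with active = 0

drop table #msg;
```

This avoids both the extra temp table for the IDs and the second SELECT against msg.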

SQL - passing variable from first select to second select

I have one table things full of items listed by ItemID. Given an ItemID, I need to get the record with the ItemID and all other items with the same name.
In the sample data below, given the ItemID of 1, I need to select all records with the same name (in this case, "poptarts") as ItemID 1, including the record with ItemID 1.
ItemID = 1 name = poptarts
ItemID = 7 name = poptarts
ItemID = 8 name = cheddar
ItemID = 323 name = poptarts
select a.ItemID, a.name from things where a.ItemID = '1'
UNION
select b.ItemID, b.name from things where b.name = a.name
The SQL I've written above however does not pass a.name to the second select. Is there any way to pass the first name value to the second select? I would like for the statement to return itemid = 1 as the first row and 7 and 323 as the other rows.
UNION is only really used to concatenate two distinct sets. Based on your example, you could probably do something like this:
SELECT a.ItemID, a.Name
FROM things a
WHERE name IN (SELECT name FROM things WHERE itemID = 1)
There are lots of ways to write this kind of query, and the best one will depend on which flavor of SQL you're using, but this should be more or less universal.
select
a.itemID,
a.name
from
things a
where a.name in (
select name
from things b
where b.itemID = '1'
)
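The IN-subquery form can be tried against the sample data from the question. The sketch below also adds an ORDER BY so that the requested item comes back first, which the question asks for; the table variable @things, the variable @ItemID, and that ORDER BY are additions made here for illustration.

```sql
-- Sample data from the question, loaded into a hypothetical table variable.
declare @things table (ItemID int primary key, name varchar(50));
insert into @things values
    (1, 'poptarts'), (7, 'poptarts'), (8, 'cheddar'), (323, 'poptarts');

declare @ItemID int = 1;

select t.ItemID, t.name
from @things t
where t.name in (select s.name from @things s where s.ItemID = @ItemID)
order by case when t.ItemID = @ItemID then 0 else 1 end,  -- requested item first
         t.ItemID;
-- returns ItemID 1, 7, 323 (all 'poptarts')
```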
SELECT this.name, this.id, that.id
FROM thing this
LEFT JOIN thing that ON that.name=this.name AND that.id <> this.id
WHERE this.id = 1
;
NOTE: this also selects the this-rows that have no twin records; in that case the that.id will be NULL. If you want to suppress the records without twin-records, remove the LEFT.
UPDATE: added the id <> id clause to suppress the obvious match.
If you really only have one table, there is no need to bring it in twice, use UNION, or do anything fancy like that.
SELECT
name
FROM
a --assuming this is your only table
GROUP BY
itemID, name
HAVING
itemID = '1'