Faster Join on Sub-query - sql

I'm attempting to do a single query for a report that I need and I'm not sure how to get past a speed issue.
Expected Outcome
I need a single row per patient that lists all of their diagnosis codes in the same column. My code works and gets the job done, but it increases the runtime of the report, which must be run 30 times under different criteria; that would turn a 5-minute process into roughly a 30-minute one.
Attempted Resolution
I am using the following code in a left outer join:
left outer join (Select distinct add2.VisitID,
substring((Select ', '+add1.Diagnosis AS [text()]
From AbsDrgDiagnoses add1 Where add1.VisitID = add2.VisitID
ORDER BY add1.VisitID,DiagnosisSeqID For XML PATH ('')), 2, 1000) DiagText
From [Livendb].[dbo].[AbsDrgDiagnoses] add2) add3 on diag.VisitID = add3.VisitID
Outcome
This works, but my 9-second query over a month of data, filtered on only 1 of the 30 codes, rises to 1m 12s. If I run the subquery by itself it takes 3m 49s to complete, so joining it in is an improvement over my main table, but I would like to slim this down if possible.
Other Attempted Resolutions
I attempted to create a view from the query and use that, but got the same run time.
I also added SourceID, which is always the same value; my 8 tables use it in their index, but it actually slightly increased my time.
Conclusion
The table I need to merge contains around 30 million rows, which is most likely the issue, and there may be no way around the increased time, but I'm hoping someone has a trick that could help me decrease it.

This is your subquery:
(Select distinct add2.VisitID,
substring((Select ', '+add1.Diagnosis AS [text()]
From AbsDrgDiagnoses add1
Where add1.VisitID = add2.VisitID
order by add1.VisitID,DiagnosisSeqID
For XML PATH ('')
), 2, 1000) DiagText
From [Livendb].[dbo].[AbsDrgDiagnoses] add2
) add3
on diag.VisitID = add3.VisitID
Let me assume that when you remove it, the query is fast.
I think you would be better off with outer apply:
outer apply
(select stuff((Select ', ' + add1.Diagnosis as [text()]
From AbsDrgDiagnoses add1
Where diag.VisitID = add1.VisitID
order by DiagnosisSeqID
For XML PATH ('')
), 1, 2, '') DiagText
) add3
I can't imagine that the second level of subqueries actually helps performance.
And, speaking of performance, this query can take advantage of an index on AbsDrgDiagnoses(VisitID, DiagnosisSeqID, Diagnosis).
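Such an index could be created along these lines (the index name is illustrative); keying on (VisitID, DiagnosisSeqID) with Diagnosis in the key lets the correlated FOR XML subquery seek per visit and read the diagnoses already in sequence order:

```sql
-- Illustrative name; lets the correlated subquery seek on VisitID and
-- read Diagnosis in DiagnosisSeqID order without key lookups.
CREATE NONCLUSTERED INDEX IX_AbsDrgDiagnoses_Visit
    ON [Livendb].[dbo].[AbsDrgDiagnoses] (VisitID, DiagnosisSeqID, Diagnosis);
```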

Related

Best way to fetch 24K records and stuff in sql

I have a query which fetches addresses of industry xyz, around 24K records. These have to be displayed in Level1->Level2->Level3->Level4->Level5->Level6 format in one column of an Excel sheet.
It takes more than 30 minutes to execute.
Select Left(REPLACE(STUFF(
(
SELECT Char(10) + LevelPath
FROM #TempIndustryAddress ubu WITH (NOLOCK)
WHERE LevelPath<> ''
AND industrycode= ubu.industrycode
GROUP BY LevelPath
FOR XML PATH('')
)
,1,1,''), '&amp;','&'),32766) AS [Industry Addresses]
Two things. First, remove the GROUP BY from the subquery -- unless you know you have duplicates (which would be strange).
Second, add an index on #TempIndustryAddress(industrycode, LevelPath).
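That index might look like this (sketch; the index name is illustrative, and the outer query's FROM clause, omitted from the snippet above, is assumed to supply the correlation):

```sql
-- Illustrative name; supports the correlated lookup on industrycode
-- and returns LevelPath in index order instead of scanning the heap.
CREATE NONCLUSTERED INDEX IX_TempIndustryAddress
    ON #TempIndustryAddress (industrycode, LevelPath);
```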

CTE query in SQL Server : exit when one row exists in result

I'm writing a SQL Server procedure to optimize the cutting of bars. I haven't found the best method yet. A recursive CTE seems to be the way, but I'm stuck.
I try to write a stored procedure to optimize cut of bars. For my test, I have to cut 18 pieces (3 of 1000 mm, 3 of 1500 mm, 3 of 2500mm, 3 of 3500 mm, 3 of 4500 mm and 3 of 6000 mm), and I have 3 sizes of bars (5500mm, 7000mm and 8500mm).
After that, I generate every possible combination of cuts within a single bar.
I tried with a while loop and a temporary table; it took me an hour and a half. But I think I can do better with a recursive CTE...
Now, I must generate every combination of bars to cover my 18 cuts. I wrote another CTE, but I haven't found a way to stop the recursion once at least one combination contains all the cuts. So my query finds over 150 million combinations, with 8, 9, 10, 11... bars, and it keeps looping all the way up to 18 bars. I want it to stop at 8 bars (I know that is the smallest bar count I need for my cuts). And it takes more than two days!
I have 2 temporary tables. One holds my combinations of bars (#COMBI_BARRE) with this structure: ID_ART, an identity for the article; COLOR; CUT_COMBI, a varchar concatenating the cut IDs of the bar combination (1-2-3-4...); NB_CUTS, an integer with the count of cuts in the bar; and FIRST_CUT, the smallest cut ID in the bar.
The other temporary table, #DET_BAR, holds the detail of my cuts, with 2 columns: ID_COMBI_BAR, the bar combination ID, and ID_CUT_STR, the cut ID as varchar (to avoid cast or convert in the CTE, for better performance).
I store the result in a table called Combi, with ID_ART; COLOR; a varchar column COMBI that concatenates the bar combination IDs (1-2-3-4...); a varchar column COMBI_CUT that concatenates the cut IDs (1-2-3-4-5...); NB_BAR, the count of bars in the combination; NB_CUTS, the count of cuts in the combination; and MAX_CUTS, the total number of cuts I must make for my article and color.
As it makes one loop per bar, I tried to add an EXISTS clause to stop the recursion once a loop has produced at least one combination with all my cuts; I know I must not cut 10 bars if I can do it with 8. But I get an error: "recursive member of a common table expression has multiple recursive references".
How can I write my query so it avoids the unnecessary loops?
;WITH Combi (ID_ART, COLOR, COMBI, COMBI_CUT, NB_BAR, NB_CUTS, MAX_CUTS)
AS
( SELECT C.ID_ART,
C.COLOR,
'-' + ID_COMBI_BAR_STR + '-',
'-' + C.CUT_COMBI + '-',
1,
C.NB_CUTS,
ISNULL(MAXI.CUT_NUM,0)
FROM #COMBI_BARRE C with(nolock)
outer apply (select top 1 D.CUT_NUM
from #DEBITS D
where D.ID_ART = C.ID_ART
and D.COLOR= C.COLOR
order by D.NUM_OCC_DEB desc) MAXI
WHERE C.FIRST_CUT = 1
UNION ALL
SELECT C.ID_ART,
C.COLOR,
Combi.COMBI + ID_COMBI_BAR_STR + '-',
Combi.COMBI_CUT+ C.CUT_COMBI + '-',
Combi.NB_BAR+ 1,
Combi.NB_CUTS+ C.NB_CUTS,
Combi.MAX_CUTS
FROM #COMBI_BARRE C with(nolock)
INNER JOIN Combi on C.ID_ART = Combi.ID_ART
and C.COLOR= Combi.COLOR
where C.FIRST_CUT > Combi.NB_BAR
and Combi.NB_CUTS+ C.NB_CUTS<= Combi.MAX_CUTS
and NOT EXISTS(select * from #DET_BAR D with(nolock)
where D.ID_COMBI_BAR = C.ID_COMBI_BAR
and PATINDEX(D.ID_CUT_STR, Combi.COMBI_CUT) > 0)
and NOT EXISTS(select top 1 * from Combi Combi2 where Combi2.ID_ART = C.ID_ART and Combi2.COLOR = C.COLOR and Combi2.NB_CUTS = Combi2.MAX_CUTS)
)
select * from Combi
This is a variation of the bin packing problem; that search term should point you in the right direction.
Also, you can go to my Bin Packing page, which gives several approaches to a simpler version of your problem.
A small warning: the linked article(s) don't use any (recursive) CTE, so they won't answer your specific CTE question.

Trouble with pulling distinct data

OK, this is hard to explain, partially because I'm bad at SQL, but this code isn't doing exactly what I want it to do. I'll try to explain what it is supposed to do as best I can and hopefully someone can spot a glaring mistake. I'm sorry about the long-winded explanation, but there is a lot going on here and I could really use the help.
The point of this script is to search for parts which need to be obsoleted; in other words, they haven't been used in three years and are still active.
When we obsolete a part, part.status is set to 'O'. It is normally NULL. Also, the word 'OBSOLETE' is usually written into part.description.
The WORK_ORDER table contains every scheduled work order. These are identified by base, lot, and sub IDs. It also contains several dates, such as the date the work order was closed.
The REQUIREMENT table contains all the parts required for each job. Many jobs require multiple parts, some at different legs of the job. For a given REQUIREMENT.WORKORDER_BASE_ID and REQUIREMENT.WORKORDER_LOT_ID, there may be a dozen or so rows, each specifying a different REQUIREMENT.PART_ID. The sub ID separates which leg of the job the part is needed on. All of the parts I care about start with 'PCH'.
When I run this code it returns 14 rows; I happen to know it should be returning about 39 right now. I believe the screwy part starts at line 17. I found that code on another forum, hoping it would help solve the original problem. Without it, I get about 27K rows because the DB pulls every criteria-matching requirement from every criteria-matching work order; many of these parts are used on multiple jobs. I've also tried using DISTINCT on REQUIREMENT.PART_ID, which seems like it should solve the problem. Alas, it doesn't.
So I know, despite all the information, I probably still didn't give nearly enough. Does anyone have any suggestions?
SELECT
PART.ID [Engr Master]
,PART.STATUS [Master Status]
,WO.CLOSE_DATE
,PT.ID [Die]
,PT.STATUS [Die Status]
FROM PART
CROSS APPLY(
SELECT
WORK_ORDER.BASE_ID
,WORK_ORDER.LOT_ID
,WORK_ORDER.SUB_ID
,WORK_ORDER.PART_ID
,WORK_ORDER.CLOSE_DATE
FROM WORK_ORDER
WHERE
GETDATE() - (360*3) > WORK_ORDER.CLOSE_DATE
AND PART.ID = WORK_ORDER.PART_ID
AND PART.STATUS ='O'
)WO
CROSS APPLY(
SELECT
REQUIREMENT.WORKORDER_BASE_ID
,REQUIREMENT.WORKORDER_LOT_ID
,REQUIREMENT.WORKORDER_SUB_ID
,REQUIREMENT.PART_ID
FROM REQUIREMENT
WHERE
WO.BASE_ID = REQUIREMENT.WORKORDER_BASE_ID
AND WO.LOT_ID = REQUIREMENT.WORKORDER_LOT_ID
AND WO.SUB_ID = REQUIREMENT.WORKORDER_SUB_ID
AND REQUIREMENT.PART_ID LIKE 'PCH%'
)REQ
CROSS APPLY(
SELECT
PART.ID
,PART.STATUS
FROM PART
WHERE
REQ.PART_ID = PART.ID
AND PART.STATUS IS NULL
)PT
ORDER BY PT.ID
This is difficult to understand without any sample data, but I took a stab at it anyway. I removed the second JOIN to PART (aliased PART1) as it seemed unnecessary. I also removed the subquery that was looking for parts HAVING COUNT(PART_ID) = 1.
The first JOIN to PART should be done on REQUIREMENT.PART_ID = PART.PART_ID, as the relationship has already been defined from WORK_ORDER to REQUIREMENT, hence you can JOIN PART directly to REQUIREMENT at this point.
EDIT 03/23/2015
If I understand this correctly, you just need a distinct list of PCH parts, and their respective last (read: MAX) CLOSE_DATE. If that is the case, here is what I propose.
I broke the query up into a couple of CTEs. The first CTE simply goes through the PART table and pulls out a DISTINCT list of PCH parts, grouping by PART_ID and DESCRIPTION.
The second CTE goes through the REQUIREMENT table, joins to the WORK_ORDER table and, for each PART_ID (handled by the PARTITION), assigns each CLOSE_DATE a ROW_NUMBER in descending order. This ensures that each ROW_NUMBER with a value of "1" is the max CLOSE_DATE for its PART_ID.
The final SELECT statement simply joins the two CTEs on PART_ID, filtering where LastCloseDate = 1 (the ROW_NUMBER assigned in the second CTE).
If I understand the requirements correctly, this should give you the desired results.
Additionally, I removed the filter WHERE PART.DESCRIPTION NOT LIKE 'OB%' because we're already filtering by PART.STATUS IS NULL and you stated above that an 'O' is placed in this field for Obsolete parts. Also, [DIE] and [ENGR MASTER] have the same value in the 27 rows being pulled before, so I just used the same field and labeled them differently.
; WITH Parts AS(
SELECT prt.PART_ID
, prt.DESCRIPTION
FROM PART prt
WHERE prt.STATUS IS NULL
AND prt.PART_ID LIKE 'PCH%'
GROUP BY prt.PART_ID, prt.DESCRIPTION
)
, LastCloseDate AS(
SELECT req.PART_ID
, wrd.CLOSE_DATE
, ROW_NUMBER() OVER(PARTITION BY req.PART_ID ORDER BY wrd.CLOSE_DATE DESC) AS LastCloseDate
FROM REQUIREMENT req
INNER JOIN WORK_ORDER wrd
ON wrd.BASE_ID = req.WORKORDER_BASE_ID
AND wrd.LOT_ID = req.WORKORDER_LOT_ID
AND wrd.SUB_ID = req.WORKORDER_SUB_ID
WHERE wrd.CLOSE_DATE IS NOT NULL
AND GETDATE() - (365 * 3) > wrd.CLOSE_DATE
)
SELECT prt.PART_ID AS [DIE]
, prt.PART_ID AS [ENGR MASTER]
, prt.DESCRIPTION
, lst.CLOSE_DATE
FROM Parts prt
INNER JOIN LastCloseDate lst
ON prt.PART_ID = lst.PART_ID
WHERE LastCloseDate = 1

Get a duplicated record only once (SQL)

SELECT DISTINCT A.LeaseID,
C.SerialNumber,
B.LeasedProjectNumber As 'ProjectNumber',
A.LeaseComment As 'LeaseContractComments'
FROM aLease A
LEFT OUTER JOIN aLeasedAsset B
ON a.LeaseID = B.LeaseID
LEFT OUTER JOIN aAsset C
ON B.LeasedProjectNumber = C.ProjectNumber AND B.PartID = C.aPartid
WHERE A.LeaseComment IS NOT NULL
I got this result from the query above, but I don't want the comment in the last column repeated for the 3 records that share the same value in the second column.
I want the repeated comment written only once per value in the second column, like a GROUP BY.
Alright, I'll take a stab at this. It's pretty unclear what exactly you're hoping for, but reading your comments, it sounds like you're looking to build a hierarchy of sorts in your table.
Something like this:
"Lease Terminated Jan 29, 2013 due to the event of..."
216 24914 87
216 724992 87
216 724993 87
"Other potential column"
217 2132 86
...
...
Unfortunately, I don't believe that that's possible. SQL Server is pretty strict about returning a table, which is two-dimensional by definition. There's no good way to describe a hierarchy such as this in SQL. There is the hierarchyid type, but that's not really relevant here.
Given that, you only really have two options:
My preference 99% of the time, just accept the duplicates. Handle them in your procedural code later on, which probably does have support for these trees. Unless you're dealing with performance-critical situations, or if you're pulling back a lot of data (or really long comments), that should be totally fine.
If you're hoping to print this result directly to the user, or if network performance is a big issue, aggregate your columns into a single record for each comment. It's well-known that you can't have multiple values in the same column, for the very same reason as the above-listed result isn't possible. But what you could do, data and your own preferences permitting, is write an aggregate function to concatenate the grouped results into a single, comma-delimited column.
You'd likely then have to parse those commas out, though, so unless network traffic is your biggest concern, I'd really just do it procedural-side.
SELECT STUFF((SELECT DISTINCT ', ' + SerialNumber
FROM [vLeasedAsset]
WHERE A.LeaseID = LeaseID AND A.ProjectNumber = ProjectNumber
FOR XML PATH (''))
, 1, 2, '') AS SerialNumber, [ProjectNumber],
MAX(ContractComment) 'LeaseContractComment'
FROM [vLeasedAsset] A
WHERE ContractComment != ''
GROUP BY [ProjectNumber], LeaseID
Output:
SerialNumber    ProjectNumber
24914, 724993   87
23401, 720356   91

Retrieve a random row with a LIKE statement (over 5 million rows)

I have a DB with two tables
tblVideos is about 8 million rows and contains Id (auto-increment 1,1), videoId, Name, Tags, (FK) VideoProviderId.
tblVideoProviders has about 6 providers at the moment and 3 columns:
Id (auto-increment 1,1, tinyint), Name, Url (used to build the link from the provider + video Id).
Unlike YouTube, the smaller providers don't have an API that returns an array from which I could pick something at random.
Retrieving a totally random row takes under a second with both methods I have now:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url, tblVideos.Name,
tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
WHERE ((ABS(CAST(
(BINARY_CHECKSUM
(tblVideos.id, NEWID())) as int))
% 6800000) < 10 )
OR
slightly longer
select top 1 tblVideoProvider.Url + tblVideos.videoId as url,
tblVideos.Name, tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
ORDER BY NEWID()
but once I start looking for something more specific:
select top 1 tblVideoProvider.Url + tblVideos.videoId as url, tblVideos.Name,
tblVideos.tags from tblVideos
inner join tblVideoProvider
on tblVideos.VideoProviderId = tblVideoProvider.id
where (tblVideos.tags like '%' + @tag + '%')
or (tblVideos.Name like '%' + @tag + '%')
ORDER BY NEWID()
The query hits 8 seconds; removing the last OR on tblVideos.Name takes it down to 4~5 seconds, but that's still way too high.
Running the whole query without the ORDER BY NEWID() takes much less time, but then the application would consume about 0.2~2 MB of data per user, and with 200~400 simultaneous requests that adds up to a lot of data.
In general the LIKE operator is very expensive, and when the pattern starts with '%', even an index on the respective column (assuming you have one) cannot be used. I think there is no easy way to increase the performance of your query.
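If word matching is acceptable instead of arbitrary substring matching, one alternative worth testing is SQL Server full-text search: the CONTAINS predicate uses a full-text index rather than scanning every row. This is only a sketch; it assumes a unique index on tblVideos.Id (called PK_tblVideos here) and a full-text catalog, both illustrative names:

```sql
-- One-time setup (names are illustrative):
-- CREATE FULLTEXT CATALOG ftVideos;
-- CREATE FULLTEXT INDEX ON tblVideos (Name, tags)
--     KEY INDEX PK_tblVideos ON ftVideos;

SELECT TOP 1 p.Url + v.videoId AS url, v.Name, v.tags
FROM tblVideos v
INNER JOIN tblVideoProvider p
        ON v.VideoProviderId = p.id
WHERE CONTAINS((v.Name, v.tags), @tag)  -- word-based match, not '%substring%'
ORDER BY NEWID();
```

Note that full-text matching is word-based, so CONTAINS will not find substrings inside words the way LIKE '%...%' does; whether that trade-off is acceptable depends on how your tags are stored.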