Rewrite query in FOR LOOP to single query - sql

Could you please help me rewrite the following query, which runs inside a FOR loop, as a single query without the loop?
for rec in
    select distinct sp.student_id, v.test_id, v.name, p.version_id
    from student_pages sp, pages p, versions v
    where p.id = sp.page_id and v.id = p.version_id
    order by student_id, test_id, name
LOOP
    select STRING_AGG(cast(p.index as varchar), ';' ORDER BY p.index) as lost_page_indices
    into l_lost_page_indices
    from pages p
    left join student_pages sp on p.id = sp.page_id and sp.student_id = rec.student_id
    where p.version_id = rec.version_id and sp.page_id is null;
end loop;
In the final query I need the following fields: sp.student_id, v.test_id, v.name and lost_page_indices.

First cut: take the query you're looping over, turn it into a subquery, and join it to the query from the loop body.
select rec.student_id, rec.test_id, rec.name,
    STRING_AGG(cast(p.index as varchar), ';' ORDER BY p.index) as lost_page_indices
from pages p
join (
    select distinct sp.student_id, v.test_id, v.name, p.version_id
    from student_pages sp
    join pages p on p.id = sp.page_id
    join versions v on v.id = p.version_id
) as rec on rec.version_id = p.version_id
left join student_pages sp on p.id = sp.page_id and
    sp.student_id = rec.student_id
where sp.page_id is null
group by rec.student_id, rec.test_id, rec.name
order by rec.student_id, rec.test_id, rec.name
I reorganized the subquery using join syntax for easier reading.
order by cannot be relied on to survive a join, so it moves into the outer query.
string_agg() is an aggregate, so the other selected columns go into a group by. The subquery is joined on version_id because sp is not yet in scope at that point in the from clause. The into l_lost_page_indices clause is dropped, since a single query returns one row per student and version instead of filling one variable per loop iteration. This is untested, so treat it as a starting point.
And, as others have pointed out, string_agg() is not a built-in Oracle function. Oracle 11g Release 2 and later provide LISTAGG for the same purpose.
That can be simplified, there's a lot of redundancy between the two joins. That subquery is putting together student_pages, pages, and versions which can all be done with a normal join. The only thing left is the distinct sp.student_id which can be better done with a group by sp.student_id.
select STRING_AGG(cast(p.index as varchar), ';' ORDER BY p.index)
as lost_page_indices into l_lost_page_indices
from pages p
left join student_pages sp on sp.page_id = p.id
join versions v on v.id = p.version_id
where sp.page_id is null
group by sp.student_id
order by sp.student_id, v.test_id, v.name
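I'm not 100% sure that's an equivalent query, so check it against the loop's output before relying on it. In particular, once where sp.page_id is null is applied, sp.student_id is null on every remaining row, so as written this finds pages that no student has at all, not the pages each individual student is missing. It does make the intent a lot clearer: find orphaned pages and collect their indices.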
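For anyone who wants to check a single-query version against sample data, here is a minimal sketch using Python's standard sqlite3 module. Everything in it is an assumption for illustration rather than the asker's schema: the sample rows are made up, page_index replaces the index column (index is a reserved word in SQLite), and group_concat stands in for STRING_AGG. It keeps the distinct (student, version) subquery from the loop header and anti-joins student_pages per student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE versions (id INTEGER PRIMARY KEY, test_id INTEGER, name TEXT);
CREATE TABLE pages (id INTEGER PRIMARY KEY, version_id INTEGER, page_index INTEGER);
CREATE TABLE student_pages (student_id INTEGER, page_id INTEGER);

-- Hypothetical sample data: version 1 has pages 1..3, version 2 has pages 1..2.
INSERT INTO versions VALUES (1, 10, 'A'), (2, 10, 'B');
INSERT INTO pages VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 2);
-- Student 100 has only page 1 of version A; student 200 only page 2 of version B.
INSERT INTO student_pages VALUES (100, 1), (200, 5);
""")

QUERY = """
SELECT rec.student_id, rec.test_id, rec.name,
       group_concat(p.page_index, ';') AS lost_page_indices
FROM (
    -- the (student, version) pairs the original loop iterated over
    SELECT DISTINCT sp.student_id, v.test_id, v.name, p.version_id
    FROM student_pages sp
    JOIN pages p ON p.id = sp.page_id
    JOIN versions v ON v.id = p.version_id
) AS rec
JOIN pages p ON p.version_id = rec.version_id
LEFT JOIN student_pages sp
       ON sp.page_id = p.id AND sp.student_id = rec.student_id
WHERE sp.page_id IS NULL          -- keep only pages this student lacks
GROUP BY rec.student_id, rec.test_id, rec.name
ORDER BY rec.student_id, rec.test_id, rec.name
"""

def lost_pages(connection):
    """Return {(student_id, test_id, name): sorted list of missing page indices}."""
    result = {}
    for student_id, test_id, name, joined in connection.execute(QUERY):
        # group_concat gives no ordering guarantee here, so sort in Python
        result[(student_id, test_id, name)] = sorted(int(x) for x in joined.split(";"))
    return result

print(lost_pages(conn))
```

One difference from the loop worth knowing: a student/version pair with no missing pages produces no row here, whereas the loop body would have yielded a NULL for it.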

Related

Getting multiple returns on distinct selection

This is going to look like a mess. I have a request for multiple tables to output into one spreadsheet. I'm still new at this and have only really used inner joins before. Here's what my SQL looks like:
select distinct (o.objectnumber), g.locale, g.locus, g.excavation, g.mapreferencenumber,
case when t.texttypeid like '9' then t.textentry end,
case when f.userfieldid like '25' then f.fieldvalue end
from objects o
inner join TextEntries t on t.id = o.objectid
inner join ObjGeography g on g.ObjectID = o.objectid
inner join userfieldxrefs f on f.id = o.objectid
inner join PackageList pl on o.objectID = pl.ID
inner join Packages p on pl.PackageID = p.PackageID
where p.packageid like '8502'
order by g.mapreferencenumber asc
I know, it's a mess right? It's giving me the correct output, but is also creating multiple rows with the same data. I've done some googling on this and have seen some remarks about outer joins, but I'm not sure how to correctly apply this to my statement. Any ideas?
Sorry; I got called away and just now had a chance to check on this. It's my first question, and I guess I didn't really understand what you needed to see in order to understand what I wanted. It looks like GROUP BY will work best. I had seen it when researching the statement but didn't understand how to implement it properly. Thanks, everyone!
It is better to use GROUP BY on the columns you want in the result set:
select o.objectnumber,
g.locale,
g.locus,
g.excavation,
g.mapreferencenumber,
case when t.texttypeid like '9' then t.textentry end,
case when f.userfieldid like '25' then f.fieldvalue end
from objects o
inner join TextEntries t on t.id = o.objectid
inner join ObjGeography g on g.ObjectID = o.objectid
inner join userfieldxrefs f on f.id = o.objectid
inner join PackageList pl on o.objectID = pl.ID
inner join Packages p on pl.PackageID = p.PackageID
where p.packageid like '8502'
group by o.objectnumber,
g.locale,
g.locus,
g.excavation,
g.mapreferencenumber,
case when t.texttypeid like '9' then t.textentry end,
case when f.userfieldid like '25' then f.fieldvalue end
order by g.mapreferencenumber asc

How can I make this SQL query more efficient?

I have a query trying to pull data from multiple tables, but when I run it, it takes a really long time (so long that I haven't been able to wait for it to finish). I know it's extremely inefficient and wanted to get some input as to how it can be written better. Here it is:
SELECT
P.patient_name,
LOH.patient_id,
LOH.requesting_location,
LOH.sample_date,
LOH.lab_doing_work,
L.location_name,
LOD.test_code,
LOD.test_rdx,
LSR.tube_type
FROM
mis_db.dbo.lab_order_header AS LOH,
mis_db.dbo.patient AS P,
mis_db.dbo.lab_order_detail AS LOD,
mis_db.dbo.lab_sample_rule AS LSR,
mis_db.dbo.location AS L
WHERE
LOH.requesting_location = '000839' AND
LOH.lab_order_id = LOD.lab_order_id AND
LOH.sample_date IN ('05/28/2015', '05/29/2015')
--LOH.patient_id = LOD.patient_id
--LOD.sample_date = LOH.sample_date
ORDER BY
P.patient_name DESC
Try this (or something like it):
SELECT P.patient_name,
lo.patient_id, lo.requesting_location,
lo.sample_date, lo.lab_doing_work,
l.location_name, d.test_code, d.test_rdx,
r.tube_type -- tube_type came from lab_sample_rule (LSR) in the question
FROM mis_db.dbo.lab_order_header lo
join mis_db.dbo.patient p on p.patient_id = lo.Patient_id
join mis_db.dbo.lab_order_detail d on d.lab_order_id = lo.lab_order_id
join mis_db.dbo.lab_sample_rule r on r.rule_id = lo.ruleId -- ????
join mis_db.dbo.location l on l.locationid = lo.requesting_location
WHERE lo.requesting_location = '000839' AND
lo.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY p.patient_name DESC
I ended up going with the following and was able to get the results I wanted:
SELECT LOH.patient_id,
       patient_name,
       [mis_db_rpt].[common].[string_date_format](LOD.sample_date) AS [Draw Date],
       test_description,
       LOD.test_code,
       LOH.lab_doing_work,
       tube_type,
       L.short_name
FROM [mis_db].[dbo].[lab_order_header] LOH
INNER JOIN [mis_db].[dbo].[lab_order_detail] LOD
    ON LOH.lab_order_id = LOD.lab_order_id
INNER JOIN [mis_db].[dbo].[patient] P
    ON P.patient_id = LOD.patient_id
INNER JOIN [mis_db].[dbo].[sample_tube] ST
    ON LOD.sample_id = ST.sample_id
INNER JOIN [mis_db].[dbo].[location] AS L
    ON LOH.lab_doing_work = L.location_id
INNER JOIN [mis_db].[dbo].[lab_test] AS LT
    ON LOD.test_code = LT.test_code
WHERE LOH.requesting_location = '000839'
  AND LOD.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY LOD.sample_date,
         patient_name,
         LOD.patient_id,
         test_description
I would try the following:
Display the estimated execution plan in SSMS and see if it suggests any missing indexes. I would think a nonclustered index on lo.requesting_location and sample_date might help with the filter.
A descending index on p.patient_name may also help the performance of the ORDER BY.
Try changing the IN date filter to BETWEEN '05/28/2015' AND '05/29/2015'. If sample_date is a date with no time component, the two filters select the same rows; if it carries a time, they do not.
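To make the index suggestion concrete, here is a rough sketch of the before/after check. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so the plan text and optimizer behavior are not what SSMS would show. The cut-down table, the ISO date strings, and the index name idx_loh_location_date are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lab_order_header (
        lab_order_id INTEGER PRIMARY KEY,
        requesting_location TEXT,
        sample_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO lab_order_header (requesting_location, sample_date) VALUES (?, ?)",
    [("000839", "2015-05-28"), ("000839", "2015-05-29"), ("000111", "2015-05-28")],
)

FILTER_QUERY = """
    SELECT lab_order_id FROM lab_order_header
    WHERE requesting_location = '000839'
      AND sample_date IN ('2015-05-28', '2015-05-29')
"""

def plan(connection, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]

# Plan without any supporting index (a full table scan in SQLite).
plan_before = plan(conn, FILTER_QUERY)

# Composite index covering both filter columns, equality column first.
conn.execute(
    "CREATE INDEX idx_loh_location_date "
    "ON lab_order_header (requesting_location, sample_date)"
)
plan_after = plan(conn, FILTER_QUERY)

print(plan_before)
print(plan_after)
```

The same habit carries over to SQL Server: capture the plan, add the candidate index, and compare, rather than assuming the index will be picked up.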

Query to return SINGLE DISTINCT row

I have the query below working, but I need to list each unique "VolumeSerialNumber0" only once. There's no shortage of questions and approaches to this problem on SO, and they suggest using subqueries and a GROUP BY clause, but when I try that I get the error "columnname is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
I feel like it has to be close; I'm just not getting the magical syntax exactly right.
SELECT
dbo.v_R_System.Netbios_Name0,
dbo.v_GS_LOGICAL_DISK.TimeStamp,
dbo.v_GS_LOGICAL_DISK.Description0,
dbo.v_GS_LOGICAL_DISK.DeviceID0,
dbo.v_GS_LOGICAL_DISK.DriveType0,
dbo.v_GS_LOGICAL_DISK.Name0,
dbo.v_GS_LOGICAL_DISK.SystemName0,
dbo.v_GS_LOGICAL_DISK.VolumeName0,
dbo.v_GS_LOGICAL_DISK.VolumeSerialNumber0,
dbo.v_GS_PARTITION.Size0,
dbo.v_GS_LOGICAL_DISK.FileSystem0
FROM
dbo.v_R_System
INNER JOIN dbo.v_GS_LOGICAL_DISK
ON dbo.v_R_System.ResourceID = dbo.v_GS_LOGICAL_DISK.ResourceID
INNER JOIN dbo.v_GS_PARTITION
ON dbo.v_GS_LOGICAL_DISK.ResourceID = dbo.v_GS_PARTITION.ResourceID
SELECT
MAX(S.Netbios_Name0),
MAX(L.TimeStamp),
MAX(L.Description0),
MAX(L.DeviceID0),
MAX(L.DriveType0),
MAX(L.Name0),
MAX(L.SystemName0),
MAX(L.VolumeName0),
L.VolumeSerialNumber0,
MAX(P.Size0),
MAX(L.FileSystem0)
FROM
dbo.v_R_System S
INNER JOIN dbo.v_GS_LOGICAL_DISK L
ON S.ResourceID = L.ResourceID
INNER JOIN dbo.v_GS_PARTITION P
ON L.ResourceID = P.ResourceID
GROUP BY
L.VolumeSerialNumber0
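
A small sketch of what the GROUP BY plus MAX() form does, run against SQLite via Python's sqlite3 module. The two cut-down tables and their rows are hypothetical stand-ins for the SCCM views, kept only to show the row counts before and after grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logical_disk (ResourceID INTEGER, Name0 TEXT, VolumeSerialNumber0 TEXT);
CREATE TABLE disk_partition (ResourceID INTEGER, Size0 INTEGER);
INSERT INTO logical_disk VALUES (1, 'C:', 'AAAA1111'), (2, 'D:', 'BBBB2222');
-- two partitions on resource 1, so the join repeats serial AAAA1111
INSERT INTO disk_partition VALUES (1, 100), (1, 250), (2, 500);
""")

# Plain join: one row per (disk, partition) pair, so serials repeat.
joined = conn.execute("""
    SELECT L.VolumeSerialNumber0, L.Name0, P.Size0
    FROM logical_disk L
    JOIN disk_partition P ON P.ResourceID = L.ResourceID
""").fetchall()

# Grouped: exactly one row per serial; every other column needs an aggregate.
grouped = conn.execute("""
    SELECT L.VolumeSerialNumber0, MAX(L.Name0), MAX(P.Size0)
    FROM logical_disk L
    JOIN disk_partition P ON P.ResourceID = L.ResourceID
    GROUP BY L.VolumeSerialNumber0
    ORDER BY L.VolumeSerialNumber0
""").fetchall()

print(len(joined), grouped)
```

Keep in mind that each MAX() is evaluated on its own, so a grouped row can combine values that came from different source rows. That is fine when the extra columns are identical within a serial number, less so when they are not.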

Recursive query with outer joins?

I'm attempting the following query,
DECLARE @EntityType varchar(25)
SET @EntityType = 'Accessory';
WITH Entities (
E_ID, E_Type,
P_ID, P_Name, P_DataType, P_Required, P_OnlyOne,
PV_ID, PV_Value, PV_EntityID, PV_ValueEntityID,
PV_UnitValueID, PV_UnitID, PV_UnitName, PV_UnitDesc, PV_MeasureID, PV_MeasureName, PV_UnitValue,
PV_SelectionID, PV_DropDownID, PV_DropDownName, PV_DropDownOptionID, PV_DropDownOptionName, PV_DropDownOptionDesc,
RecursiveLevel
)
AS
(
-- Original Query
SELECT dbo.Entity.ID AS E_ID, dbo.EntityType.Name AS E_Type,
dbo.Property.ID AS P_ID, dbo.Property.Name AS P_Name, DataType.Name AS P_DataType, Required AS P_Required, OnlyOne AS P_OnlyOne,
dbo.PropertyValue.ID AS PV_ID, dbo.PropertyValue.Value AS PV_Value, dbo.PropertyValue.EntityID AS PV_EntityID, dbo.PropertyValue.ValueEntityID AS PV_ValueEntityID,
dbo.UnitValue.ID AS PV_UnitValueID, dbo.UnitOfMeasure.ID AS PV_UnitID, dbo.UnitOfMeasure.Name AS PV_UnitName, dbo.UnitOfMeasure.Description AS PV_UnitDesc, dbo.Measure.ID AS PV_MeasureID, dbo.Measure.Name AS PV_MeasureName, dbo.UnitValue.UnitValue AS PV_UnitValue,
dbo.DropDownSelection.ID AS PV_SelectionID, dbo.DropDown.ID AS PV_DropDownID, dbo.DropDown.Name AS PV_DropDownName, dbo.DropDownOption.ID AS PV_DropDownOptionID, dbo.DropDownOption.Name AS PV_DropDownOptionName, dbo.DropDownOption.Description AS PV_DropDownOptionDesc,
0 AS RecursiveLevel
FROM dbo.Entity
INNER JOIN dbo.EntityType ON dbo.EntityType.ID = dbo.Entity.TypeID
INNER JOIN dbo.Property ON dbo.Property.EntityTypeID = dbo.Entity.TypeID
INNER JOIN dbo.PropertyValue ON dbo.Property.ID = dbo.PropertyValue.PropertyID AND dbo.PropertyValue.EntityID = dbo.Entity.ID
INNER JOIN dbo.DataType ON dbo.DataType.ID = dbo.Property.DataTypeID
LEFT JOIN dbo.UnitValue ON dbo.UnitValue.ID = dbo.PropertyValue.UnitValueID
LEFT JOIN dbo.UnitOfMeasure ON dbo.UnitOfMeasure.ID = dbo.UnitValue.UnitOfMeasureID
LEFT JOIN dbo.Measure ON dbo.Measure.ID = dbo.UnitOfMeasure.MeasureID
LEFT JOIN dbo.DropDownSelection ON dbo.DropDownSelection.ID = dbo.PropertyValue.DropDownSelectedID
LEFT JOIN dbo.DropDownOption ON dbo.DropDownOption.ID = dbo.DropDownSelection.SelectedOptionID
LEFT JOIN dbo.DropDown ON dbo.DropDown.ID = dbo.DropDownSelection.DropDownID
WHERE dbo.EntityType.Name = @EntityType
UNION ALL
-- Recursive Query?
SELECT E2.E_ID AS E_ID, dbo.EntityType.Name AS E_Type,
dbo.Property.ID AS P_ID, dbo.Property.Name AS P_Name, DataType.Name AS P_DataType, Required AS P_Required, OnlyOne AS P_OnlyOne,
dbo.PropertyValue.ID AS PV_ID, dbo.PropertyValue.Value AS PV_Value, dbo.PropertyValue.EntityID AS PV_EntityID, dbo.PropertyValue.ValueEntityID AS PV_ValueEntityID,
dbo.UnitValue.ID AS PV_UnitValueID, dbo.UnitOfMeasure.ID AS PV_UnitID, dbo.UnitOfMeasure.Name AS PV_UnitName, dbo.UnitOfMeasure.Description AS PV_UnitDesc, dbo.Measure.ID AS PV_MeasureID, dbo.Measure.Name AS PV_MeasureName, dbo.UnitValue.UnitValue AS PV_UnitValue,
dbo.DropDownSelection.ID AS PV_SelectionID, dbo.DropDown.ID AS PV_DropDownID, dbo.DropDown.Name AS PV_DropDownName, dbo.DropDownOption.ID AS PV_DropDownOptionID, dbo.DropDownOption.Name AS PV_DropDownOptionName, dbo.DropDownOption.Description AS PV_DropDownOptionDesc,
(RecursiveLevel + 1)
FROM Entities AS E2
INNER JOIN dbo.Entity ON dbo.Entity.ID = E2.PV_ValueEntityID
INNER JOIN dbo.EntityType ON dbo.EntityType.ID = dbo.Entity.TypeID
INNER JOIN dbo.Property ON dbo.Property.EntityTypeID = dbo.Entity.TypeID
INNER JOIN dbo.PropertyValue ON dbo.Property.ID = dbo.PropertyValue.PropertyID AND dbo.PropertyValue.EntityID = E2.E_ID
INNER JOIN dbo.DataType ON dbo.DataType.ID = dbo.Property.DataTypeID
INNER JOIN dbo.UnitValue ON dbo.UnitValue.ID = dbo.PropertyValue.UnitValueID
INNER JOIN dbo.UnitOfMeasure ON dbo.UnitOfMeasure.ID = dbo.UnitValue.UnitOfMeasureID
INNER JOIN dbo.Measure ON dbo.Measure.ID = dbo.UnitOfMeasure.MeasureID
INNER JOIN dbo.DropDownSelection ON dbo.DropDownSelection.ID = dbo.PropertyValue.DropDownSelectedID
INNER JOIN dbo.DropDownOption ON dbo.DropDownOption.ID = dbo.DropDownSelection.SelectedOptionID
INNER JOIN dbo.DropDown ON dbo.DropDown.ID = dbo.DropDownSelection.DropDownID
)
SELECT E_ID, E_Type,
P_ID, P_Name, P_DataType, P_Required, P_OnlyOne,
PV_ID, PV_Value, PV_EntityID, PV_ValueEntityID,
PV_UnitValueID, PV_UnitID, PV_UnitName, PV_UnitDesc, PV_MeasureID, PV_MeasureName, PV_UnitValue,
PV_SelectionID, PV_DropDownID, PV_DropDownName, PV_DropDownOptionID, PV_DropDownOptionName, PV_DropDownOptionDesc,
RecursiveLevel
FROM Entities
INNER JOIN [dbo].[Entity] AS dE
ON dE.ID = PV_EntityID
The problem is that the second query, the "recursive one", is not getting the data I expect, since I can't do the LEFT JOINs like in the first query (at least to my understanding).
If I remove the fetching of the data that requires the LEFT (OUTER) JOINs, then the recursion works perfectly. My problem is that I need both. Is there a way I can accomplish this?
Per http://msdn.microsoft.com/en-us/library/ms175972.aspx you cannot have a LEFT, RIGHT, or OUTER JOIN in the recursive member of a CTE.
The recursive member can't use a subquery either, so I suggest following this example.
They use two CTEs. The first is not recursive and does the left join to get the data it needs. The second CTE is recursive and inner joins on the first CTE. Since CTE1 is not recursive, it can left join and supply default values for the missing rows, so it is guaranteed to work in the inner join.
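The two-CTE layout can be sketched roughly as follows. This runs on SQLite through Python's sqlite3 module, which needs the RECURSIVE keyword (T-SQL uses plain WITH) and, unlike SQL Server, would not reject an outer join in the recursive member, so it only illustrates the structure. The node and label tables and the '(none)' default are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE label (node_id INTEGER, label_text TEXT);
INSERT INTO node VALUES (1, NULL), (2, 1), (3, 2);
INSERT INTO label VALUES (1, 'root'), (3, 'leaf');   -- node 2 has no label
""")

rows = conn.execute("""
WITH RECURSIVE
labelled (id, parent_id, label_text) AS (
    -- CTE 1, not recursive: the outer join and the default value live here
    SELECT n.id, n.parent_id, COALESCE(l.label_text, '(none)')
    FROM node n
    LEFT JOIN label l ON l.node_id = n.id
),
tree (id, label_text, level) AS (
    -- anchor member: start from the root rows of CTE 1
    SELECT id, label_text, 0 FROM labelled WHERE parent_id IS NULL
    UNION ALL
    -- recursive member: inner join only, against CTE 1
    SELECT c.id, c.label_text, t.level + 1
    FROM tree t
    JOIN labelled c ON c.parent_id = t.id
)
SELECT id, label_text, level FROM tree ORDER BY level
""").fetchall()

print(rows)
```

Node 2 still shows up in the recursion even though it has no label row, because the missing value was filled in before the recursive part ever ran.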
However, you can also emulate a left join with a union and a subselect. It isn't normally useful, but it is interesting.
In that case, you would keep your first statement as it is. It will match all rows that join successfully.
Then UNION that query with another query that removes the join but has a
NOT EXISTS(SELECT 1 FROM MISSING_ROWS_TABLE WHERE MAIN_TABLE.JOIN_CONDITION = MISSING_ROWS_TABLE.JOIN_CONDITION)
This gets all the rows that failed the join condition in query 1. You can replace the columns you would get from MISSING_ROWS_TABLE with NULL. I had to do this once using a coding framework that didn't support outer joins. Since recursive CTEs don't allow subqueries, you have to use the first solution.
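As a quick illustration of the union trick outside a recursive CTE, the sketch below compares a real LEFT JOIN with the inner join plus NOT EXISTS emulation. It uses SQLite via Python's sqlite3 module; the table names follow the placeholders above, while the columns and rows are made up.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE missing_rows_table (main_id INTEGER, detail TEXT);
INSERT INTO main_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO missing_rows_table VALUES (1, 'x'), (1, 'y'), (3, 'z');  -- nothing for id 2
""")

left_join = conn.execute("""
    SELECT m.id, m.name, d.detail
    FROM main_table m
    LEFT JOIN missing_rows_table d ON d.main_id = m.id
""").fetchall()

emulated = conn.execute("""
    -- rows that join successfully
    SELECT m.id, m.name, d.detail
    FROM main_table m
    JOIN missing_rows_table d ON d.main_id = m.id
    UNION ALL
    -- rows that found no partner, padded with NULL
    SELECT m.id, m.name, NULL
    FROM main_table m
    WHERE NOT EXISTS (
        SELECT 1 FROM missing_rows_table d WHERE d.main_id = m.id
    )
""").fetchall()

# Compare as multisets, since neither query promises a row order.
print(Counter(left_join) == Counter(emulated))
```

UNION ALL is used so that duplicate rows produced by the join are preserved exactly as the LEFT JOIN would return them.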

Strange performance issue with SELECT (SUBQUERY)

I have a stored procedure that has been having some issues lately and I finally narrowed it down to 1 SELECT. The problem is I cannot figure out exactly what is happening to kill the performance of this one query. I re-wrote it, but I am not sure the re-write is the exact same data.
Original Query:
SELECT
@userId, p.job, p.charge_code, p.code
, (SELECT SUM(b.total) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, p.ponumber
, p.billable
, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = @userId
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
This query will practically never finish running. It actually times out for some larger jobs we have.
And if I change the 2 subqueries in the select to read like joins instead:
SELECT
@userid, p.job, p.charge_code, p.code
, (SELECT SUM(b.TOTAL))
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX))
, p.ponumber, p.billable, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = 11190030
INNER JOIN [BACKORDER W/TOTAL] b
ON P.PONUMBER = b.ponumber AND P.code = b.code
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
The data comes out looking very nearly identical to me (though there are thousands of lines overall so I could be wrong), and it runs very quickly.
Any ideas appreciated.
Performance
In the second query you have fewer logical reads because the table [BACKORDER W/TOTAL] is scanned only once. In the first query the two subqueries are processed independently, so the table is scanned twice even though both subqueries have the same predicates.
Correctness
If you want to check whether two queries return the same result set, you can use the EXCEPT operator.
If both statements:
First SELECT Query...
EXCEPT
Second SELECT Query...
and
Second SELECT Query...
EXCEPT
First SELECT Query...
return an empty set, the result sets are identical (apart from duplicate rows, which EXCEPT ignores).
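If you want to script that check, a minimal sketch looks like this. It uses SQLite via Python's sqlite3 module; the helper name same_result_set and the tables t1 and t2 are hypothetical, and the queries passed in must not carry their own ORDER BY.

```python
import sqlite3

def same_result_set(connection, query_a, query_b):
    """True when both A EXCEPT B and B EXCEPT A come back empty."""
    a_minus_b = connection.execute(f"{query_a} EXCEPT {query_b}").fetchall()
    b_minus_a = connection.execute(f"{query_b} EXCEPT {query_a}").fetchall()
    return not a_minus_b and not b_minus_a

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (k INTEGER, v INTEGER);
CREATE TABLE t2 (k INTEGER, v INTEGER);
INSERT INTO t1 VALUES (1, 10), (2, 20);
-- same distinct rows as t1, in another order and with one duplicate
INSERT INTO t2 VALUES (2, 20), (1, 10), (1, 10);
""")

print(same_result_set(conn, "SELECT k, v FROM t1", "SELECT k, v FROM t2"))
```

Note that t2 holds a duplicate row and the check still passes, because EXCEPT compares distinct rows. If duplicate counts matter, compare row counts as well.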
In terms of correctness, you are inner joining [BACKORDER W/TOTAL] in the second query, so if the first query has Null values in the subqueries, these rows would be missing in the second query.
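Here is a small sketch of that difference, again with SQLite via Python's sqlite3 module. The po and backorder tables are cut-down, made-up stand-ins for dbo.PO and [BACKORDER W/TOTAL]. A LEFT JOIN variant is included as one possible way to keep the rows that the inner join drops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE po (ponumber INTEGER, code TEXT);
CREATE TABLE backorder (ponumber INTEGER, code TEXT, total REAL);
INSERT INTO po VALUES (1, 'A'), (2, 'B');          -- PO 2 has no backorder rows
INSERT INTO backorder VALUES (1, 'A', 5.0), (1, 'A', 7.0);
""")

# Correlated subquery: every PO row survives, with NULL where nothing matches.
subquery_version = conn.execute("""
    SELECT p.ponumber,
           (SELECT SUM(b.total) FROM backorder b
            WHERE b.ponumber = p.ponumber AND b.code = p.code)
    FROM po p ORDER BY p.ponumber
""").fetchall()

# Inner join: PO 2 disappears because it has no matching backorder row.
inner_join_version = conn.execute("""
    SELECT p.ponumber, SUM(b.total)
    FROM po p
    JOIN backorder b ON b.ponumber = p.ponumber AND b.code = p.code
    GROUP BY p.ponumber ORDER BY p.ponumber
""").fetchall()

# Left join: single pass over backorder, and PO 2 is kept with a NULL sum.
left_join_version = conn.execute("""
    SELECT p.ponumber, SUM(b.total)
    FROM po p
    LEFT JOIN backorder b ON b.ponumber = p.ponumber AND b.code = p.code
    GROUP BY p.ponumber ORDER BY p.ponumber
""").fetchall()

print(subquery_version, inner_join_version, left_join_version)
```

Whether the dropped rows matter depends on whether POs without backorder lines should appear in the report at all.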
For performance, the optimizer relies on heuristics and estimates. It will sometimes pick spectacularly bad query plans, and even minimal changes can lead to a completely different plan. Your best chance is to compare the query plans of the two versions and see what causes the difference.