Error in nested SQL statement - sql

Can someone help me fix this SQL statement? I have 2 tables... trying to get a list of all records in table 1 (c) along with a count (if any) of matching records in table 2 (cp_docs).
SELECT TOP 100 c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
cp_docs.cpd
FROM cal_procedure c
OUTTER JOIN (select cal_procedure as cp, count(id) as cpd
from cal_procedure_doc
group by cal_procedure) cp_docs
ON cp_docs.cp = c.cal_procedure
Thanks,
Tracy

It's hard to say without the error message, but your outer join has a couple of issues.
OUTER is misspelled as OUTTER.
The OUTER keyword needs to be prefixed with LEFT, RIGHT, or FULL. With the logic in your query, you most likely want LEFT.
Fixed SQL:
SELECT TOP 100 c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
cp_docs.cpd
FROM cal_procedure c
LEFT OUTER JOIN (select cal_procedure as cp, count(id) as cpd
from cal_procedure_doc
group by cal_procedure) cp_docs
ON cp_docs.cp = c.cal_procedure
Note that with this query you can get NULL values in the cpd column when a procedure has no matching rows in the cal_procedure_doc table. With Max's answer you would get 0s instead. If you want to keep your current approach but display zeros, wrap cp_docs.cpd in a coalesce function:
coalesce(cp_docs.cpd, 0)
In the end, I think Max's answer is easier to read, and it is probably how I would write this query. If the tables are huge, you may want to check how each version performs to see whether one is better than the other.
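To see the NULL-versus-zero difference concretely, here is a minimal sketch. It assumes two tiny made-up tables and uses Python's built-in sqlite3 module as a stand-in for SQL Server (TOP is omitted because SQLite does not support it); the column names raw_count and safe_count are only for illustration.

```python
import sqlite3

# Hypothetical miniature tables; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cal_procedure (id INTEGER PRIMARY KEY, cal_procedure TEXT);
    CREATE TABLE cal_procedure_doc (id INTEGER PRIMARY KEY, cal_procedure TEXT);
    INSERT INTO cal_procedure (cal_procedure) VALUES ('CP-1'), ('CP-2');
    INSERT INTO cal_procedure_doc (cal_procedure) VALUES ('CP-1'), ('CP-1');
""")

query = """
    SELECT c.cal_procedure,
           cp_docs.cpd              AS raw_count,   -- NULL when nothing matches
           COALESCE(cp_docs.cpd, 0) AS safe_count   -- 0 when nothing matches
    FROM cal_procedure c
    LEFT OUTER JOIN (SELECT cal_procedure AS cp, COUNT(id) AS cpd
                     FROM cal_procedure_doc
                     GROUP BY cal_procedure) cp_docs
        ON cp_docs.cp = c.cal_procedure
    ORDER BY c.cal_procedure
"""
left_join_rows = conn.execute(query).fetchall()
print(left_join_rows)
```

CP-2 has no documents, so its raw count comes back as NULL (None in Python) while the COALESCE column reports 0.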

You can just add a subquery to the SELECT clause. It's cleaner than joining a derived table. If you try to read someone else's query to figure out how a calculation is done, you'll start with the SELECT statement. If the select statement points you to a table alias (e.g. cp_docs), you need to find the table in the FROM clause... etc. The execution plans are almost identical; the proposed SELECT clause subquery actually eliminates one innocuous Compute Scalar step.
SELECT c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
(SELECT COUNT(*) FROM cal_procedure_doc where cal_procedure = c.cal_procedure) AS cpd
FROM cal_procedure c
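As a rough check of the claim that the SELECT-clause subquery yields 0 rather than NULL, the sketch below runs the same shape of query against two tiny made-up tables. It assumes SQLite (via Python's sqlite3 module) behaves like SQL Server for this pattern, and the alias d is added only for clarity.

```python
import sqlite3

# Hypothetical miniature tables; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cal_procedure (id INTEGER PRIMARY KEY, cal_procedure TEXT);
    CREATE TABLE cal_procedure_doc (id INTEGER PRIMARY KEY, cal_procedure TEXT);
    INSERT INTO cal_procedure (cal_procedure) VALUES ('CP-1'), ('CP-2');
    INSERT INTO cal_procedure_doc (cal_procedure) VALUES ('CP-1'), ('CP-1');
""")

# COUNT(*) over an empty set is 0, so no COALESCE is needed.
subquery_rows = conn.execute("""
    SELECT c.cal_procedure,
           (SELECT COUNT(*)
            FROM cal_procedure_doc d
            WHERE d.cal_procedure = c.cal_procedure) AS cpd
    FROM cal_procedure c
    ORDER BY c.cal_procedure
""").fetchall()
print(subquery_rows)
```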

Perhaps you want OUTER APPLY:
SELECT TOP 100 c.cal_procedure, c.description, c.active, c.create_user,
c.create_date, c.edit_user, c.edit_date, c.id, cp_docs.cpd
FROM cal_procedure c OUTER APPLY
(select count(id) as cpd
from cal_procedure_doc
where cal_procedure = c.cal_procedure
) cp_docs
ORDER BY ? ? ? ;

Related

How to convert inline SQL queries to JOINS in SQL SERVER to reduce load time

I need help in optimizing this SQL query.
In the main SELECT statement there are three columns which are dependent on the outer query result. This is why my query is taking a long time to return data. I have tried making left joins but this is not working properly.
Can anyone help me to resolve this issue?
SELECT
DISTINCT ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
(
SELECT
STRING_AGG(
(ug.UG_Name),
','
)
FROM
Groups ug
INNER JOIN ApplicantUserGroup augm ON augm.AUGM_UserGroupID = ug.UG_ID
WHERE
augm.AUGM_OrganizationUserID = ou.OrganizationUserID
AND ug.UG_IsDeleted = 0
AND augm.AUGM_IsDeleted = 0
) AS UserGroups,
order1.OrderNumber AS OrderId -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttribute),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributes -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttributeID),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributeID
FROM
ApplicantData acd WITH (NOLOCK)
INNER JOIN ClientPackage ps WITH (NOLOCK) ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
INNER JOIN [ClientOrder] order1 WITH (NOLOCK) ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
INNER JOIN OUser ou WITH (NOLOCK) ON ou.OrganizationUserID = ps.OrganizationUserID
It looks like this query can be simplified and the dependent subqueries in your SELECT clause removed. Consider your second and third dependent subqueries. You can refactor them into one nondependent subquery with a LEFT JOIN. Using nondependent subqueries is more efficient because the query planner can run them just once, rather than once for each row.
You want two STRING_AGG() results from the same table. This subquery gives those two outputs for every possible combination of HierarchyNodeID and OrganizationUserID values. STRING_AGG() is an aggregate function like SUM() and so works nicely with GROUP BY.
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes, -- UAT-2455
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
You can run this subquery itself to convince yourself it works.
Now, we can LEFT JOIN that into your query. Like this. (For readability I took out the NOLOCKs and used JOIN: it means the same thing as INNER JOIN.)
SELECT DISTINCT
ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
'tempvalue' AS UserGroups, -- shortened for testing
order1.OrderNumber AS OrderId, -- UAT-2455
uat2455.CustomAttributes, -- UAT-2455
uat2455.CustomAttributeIDs -- UAT-2455
FROM ApplicantData acd
JOIN ClientPackage ps
ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
JOIN ClientOrder order1
ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
JOIN OUser ou
ON ou.OrganizationUserID = ps.OrganizationUserID
LEFT JOIN (
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes, -- UAT-2455
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
) uat2455
ON uat2455.HierarchyNodeID = dpm.DPM_ID
AND uat2455.OrganizationUserId = ps.OrganizationUserID
See how we collapsed your second and third dependent subqueries to just one, then used it as a virtual table with LEFT JOIN? We transformed the WHERE clauses from the dependent subqueries into an ON clause.
You can test this: run it with TOP(50) and eyeball the results.
When you're happy, the next step is to transform your first dependent subquery the same way.
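For that remaining step, a minimal sketch of the same transformation applied to the UserGroups subquery might look like this. It is an illustration under assumptions, not a drop-in replacement: the sample rows are made up, the alias ug_list is hypothetical, and it runs on SQLite (through Python's sqlite3 module), where GROUP_CONCAT plays the role of SQL Server's STRING_AGG.

```python
import sqlite3

# Made-up sample data; SQLite's GROUP_CONCAT stands in for STRING_AGG.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OUser (OrganizationUserID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE [Groups] (UG_ID INTEGER PRIMARY KEY, UG_Name TEXT, UG_IsDeleted INTEGER);
    CREATE TABLE ApplicantUserGroup (AUGM_UserGroupID INTEGER,
                                     AUGM_OrganizationUserID INTEGER,
                                     AUGM_IsDeleted INTEGER);
    INSERT INTO OUser VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO [Groups] VALUES (10, 'Nursing', 0), (11, 'Pharmacy', 0), (12, 'Old', 1);
    INSERT INTO ApplicantUserGroup VALUES (10, 1, 0), (11, 1, 0), (12, 1, 0);
""")

# The dependent subquery's WHERE on the user id becomes GROUP BY plus an ON clause.
rows = conn.execute("""
    SELECT ou.OrganizationUserID, ou.FirstName, ug_list.UserGroups
    FROM OUser ou
    LEFT JOIN (
        SELECT augm.AUGM_OrganizationUserID AS OrganizationUserID,
               GROUP_CONCAT(ug.UG_Name, ',') AS UserGroups
        FROM [Groups] ug
        JOIN ApplicantUserGroup augm ON augm.AUGM_UserGroupID = ug.UG_ID
        WHERE ug.UG_IsDeleted = 0 AND augm.AUGM_IsDeleted = 0
        GROUP BY augm.AUGM_OrganizationUserID
    ) ug_list ON ug_list.OrganizationUserID = ou.OrganizationUserID
    ORDER BY ou.OrganizationUserID
""").fetchall()

# Concatenation order is not guaranteed, so sort the names before comparing.
user_groups = {user_id: sorted(groups.split(",")) if groups else []
               for user_id, _name, groups in rows}
print(user_groups)
```

The LEFT JOIN keeps users with no groups (Bob here), and the deleted group is filtered out inside the derived table before aggregation.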
Pro tip: Don't use WITH (NOLOCK), ever, unless a database administration expert tells you to after looking at your specific query. If your query's purpose is a historical report and you don't care whether the most recent transactions in your database are represented exactly right, you can precede your query with this statement. It lets the query run without waiting on locks, at the cost of possibly reading uncommitted data.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Pro tip: Be obsessive about formatting your queries for readability. You, your colleagues, and yourself a year from now must be able to read and reason about queries like this.

Subquery returning more than one result

I am still fairly new to SQL and the stored procedure I recently created keeps telling me that a subquery is returning more than one result but I can't figure out which one is the problem. If anyone has a moment and can tell me what I am missing, I would greatly appreciate it!
Thanks!
SELECT DISTINCT a.customer_no [id],
x.esal1_desc [constituent],
a.perf [activity],
a.sp_act_dt [activity_date],
c.description[activity_type],
d.display_name_tiny [solicitor],
s.description [status],
ISNULL(a.num_attendees,0)[attending],
a.notes [notes],
e.address [email]
FROM [dbo].t_special_activity a
left outer join [dbo].tr_special_activity_status s ON s.id = a.status
left outer join [dbo].tr_special_activity c ON c.id = a.sp_act
left outer JOIN [dbo].FT_CONSTITUENT_DISPLAY_NAME() d ON a.worker_customer_no = d.customer_no
left outer JOIN [dbo].T_EADDRESS e on a.customer_no=e.customer_no and primary_ind='Y'
left outer JOIN [dbo].TX_CUST_SAL x on a.customer_no=x.customer_no and default_ind='Y'
WHERE a.status IN (ISNULL(@status, (SELECT DISTINCT id FROM TR_SPECIAL_ACTIVITY_STATUS)))
AND a.sp_act_dt BETWEEN (ISNULL(@activity_start,(SELECT MIN(sp_act_dt) FROM T_SPECIAL_ACTIVITY)))
AND (ISNULL(@activity_end,(SELECT MAX(sp_act_dt) FROM T_SPECIAL_ACTIVITY)))
AND ((ISNULL(@list,0) = 0) OR EXISTS (SELECT customer_no FROM T_LIST_CONTENTS lc WITH (NOLOCK)
WHERE a.customer_no = lc.customer_no and lc.list_no = @list))
Alas, you cannot use this expression:
WHERE a.status IN (ISNULL(@status, (SELECT DISTINCT id FROM TR_SPECIAL_ACTIVITY_STATUS)))
The subquery is in a place where a single value is expected. In any case, I think you want:
WHERE (@status IS NULL OR
a.status = @status)
Note that select distinct is irrelevant in an IN clause. At best it does nothing; at worst it impedes the optimizer.
I realize this is a little confusing. You are thinking that IN takes a list -- and the list could even be a subquery. But, the elements of the list are scalars not lists. So, when a subquery is an element of the list, then it is assumed to be a single value.
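The "no value means no filter" idea can be sketched as follows. This assumes the intent is to match a single status when one is supplied and every row otherwise; the table contents and the function name activities are made up, and SQLite (through Python's sqlite3 module) stands in for SQL Server, with a named parameter in place of the T-SQL variable.

```python
import sqlite3

# Made-up sample rows; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_special_activity (customer_no INTEGER, status INTEGER);
    INSERT INTO t_special_activity VALUES (1, 1), (2, 2), (3, 2);
""")


def activities(status=None):
    """Return customer numbers, optionally restricted to one status."""
    sql = """
        SELECT customer_no
        FROM t_special_activity a
        WHERE (:status IS NULL OR a.status = :status)
        ORDER BY customer_no
    """
    return [row[0] for row in conn.execute(sql, {"status": status})]


print(activities())    # no filter supplied: every row
print(activities(2))   # only rows whose status is 2
```

The parentheses around the OR matter as soon as further AND conditions follow, as they do in the original query.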

How to filter out all rows matching a certain ID when a variable in one row is 'X'? SQL Server

First off, I'm sorry if the wording of this is bad or incorrect, I'm trying my best.
But anyway, to try to make things simple as possible, I am trying to generate a report of messages that have not been read. So there is a communications table that links to the comm_recpts table where there is a "Has_read_msg" field.
The issue is, these communications are sometimes being routed to a few different users. Therefore, the "has_read_msg" field only updates to "y" for the individual user that has read the message. So if even one user has read the message, I don't want this message at all on my report.
If that wasn't clear, what I am currently getting is all the USERS that haven't read the message, but if the message was read by someone, I don't want them on there. I still want the users names on my report though (If the message wasn't read by anyone).
Here's what I have so far, if it helps at all...
SELECT DISTINCT c.comm_id, c.sender_id, p.last_name, p.first_name, p.date_of_birth,
CASE WHEN(rec.last_name+', '+rec.first_name) IS NULL
THEN r.name
ELSE (rec.last_name+', '+rec.first_name)
END 'Recipient',
r.has_read_msg, r.recipient_type, r.recipient_id, c.sender_type, c.create_timestamp,
r.has_read_msg, c.replied_when, c.delete_ind, c.priority_flag, c.subject,
c.body
FROM ngweb_communications c
LEFT OUTER JOIN person p
ON c.sender_id=CONVERT(varchar(50),p.person_id)
LEFT OUTER JOIN ngweb_comm_recpts r
ON c.comm_id=r.comm_id
LEFT OUTER JOIN user_mstr rec
ON r.recipient_id=CONVERT(varchar(50),rec.user_id)
WHERE 1=1
AND c.sender_type=2
AND r.has_read_msg='N'
AND c.body NOT LIKE 'This message was read by%'
ORDER BY c.create_timestamp desc
I have tried nested statements and aggregate functions, but haven't been able to get it to work yet...
Like I said I'm trying my best to word this, but if I can clarify at all, or share anymore info, please let me know.
I am using SQL Server 2008...thanks to anyone even taking a look at this!
Using not exists():
select distinct
c.comm_id
, c.sender_id
, p.last_name
, p.first_name
, p.date_of_birth
, Recipient = isnull(rec.last_name + ', ' + rec.first_name,r.name)
, r.has_read_msg
, r.recipient_type
, r.recipient_id
, c.sender_type
, c.create_timestamp
, r.has_read_msg
, c.replied_when
, c.delete_ind
, c.priority_flag
, c.subject
, c.body
from ngweb_communications c
left join person p
on c.sender_id = convert(varchar(50), p.person_id)
left join ngweb_comm_recpts r
on c.comm_id = r.comm_id
left join user_mstr rec
on r.recipient_id = convert(varchar(50), rec.user_id)
where 1 = 1
and c.sender_type = 2
and r.has_read_msg = 'N'
and c.body not like 'This message was read by%'
and not exists (
select 1
from ngweb_communications ic
inner join ngweb_comm_recpts ir
on ic.comm_id = ir.comm_id
where ic.comm_id = c.comm_id
and ir.has_read_msg = 'Y'
)
order by c.create_timestamp desc
Notes & Questions:
I replaced your case statement with isnull() for simplicity. coalesce() would also work in the same manner.
Do not use string literals for aliases.
Why are you converting ids to varchar(50) for equality comparison? This has a significant detrimental impact on performance.
Replace this line in your WHERE clause:
AND r.has_read_msg='N'
With this:
AND NOT EXISTS (Select 1 FROM ngweb_comm_recpts rx WHERE rx.comm_id = c.comm_id AND rx.has_read_msg = 'Y')
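A small runnable sketch of the NOT EXISTS filter, assuming cut-down versions of the two tables with invented rows and SQLite (via Python's sqlite3 module) in place of SQL Server 2008:

```python
import sqlite3

# Cut-down, made-up versions of the tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ngweb_communications (comm_id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE ngweb_comm_recpts (comm_id INTEGER, recipient_id TEXT, has_read_msg TEXT);
    INSERT INTO ngweb_communications VALUES (1, 'Lab result'), (2, 'Refill request');
    -- Message 1: one of two recipients has read it. Message 2: nobody has.
    INSERT INTO ngweb_comm_recpts VALUES (1, 'u1', 'Y'), (1, 'u2', 'N'),
                                         (2, 'u1', 'N'), (2, 'u3', 'N');
""")

# Keep a message's recipients only if no recipient of that message has read it.
unread = conn.execute("""
    SELECT c.comm_id, r.recipient_id
    FROM ngweb_communications c
    JOIN ngweb_comm_recpts r ON r.comm_id = c.comm_id
    WHERE NOT EXISTS (SELECT 1
                      FROM ngweb_comm_recpts rx
                      WHERE rx.comm_id = c.comm_id
                        AND rx.has_read_msg = 'Y')
    ORDER BY c.comm_id, r.recipient_id
""").fetchall()
print(unread)
```

Message 1 disappears entirely, including recipient u2 who has not read it, while both recipients of message 2 stay on the report.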

Remove duplicate SQL Join results

Okay, I am totally new to SQL so bear with me. I created a statement which gives me the results I wanted, but I want to get rid of duplicate results. What's an easy solution to this? Here is my statement
SELECT
li.location,
li.logistics_unit,
li.item,
li.company,
li.item_desc,
li.on_hand_qty,
li.in_transit_qty,
li.allocated_qty,
li.lot,
i.item_category3,
l.locating_zone,
l.location_subclass,
i.item_category4
FROM
location_inventory li
INNER JOIN item i ON li.item = i.item
INNER JOIN location l ON l.location = li.location
WHERE
i.item_category3 = 'AS' AND
li.warehouse = 'river' AND
li.location NOT LIKE 'd%' AND
li.location NOT LIKE 'stg%'
ORDER BY
li.item asc
If you are confident in your JOIN then DISTINCT should do the trick like so:
select DISTINCT
location_inventory.location ,
location_inventory.logistics_unit ,
location_inventory.item ,
location_inventory.company ,
location_inventory.item_desc ,
location_inventory.on_hand_qty ,
location_inventory.in_transit_qty ,
location_inventory.allocated_qty ,
location_inventory.lot ,
item.item_category3 ,
location.locating_zone ,
location.location_subclass ,
item.item_category4
from location_inventory
INNER JOIN item
on location_inventory.item=item.item
INNER JOIN location
on location_inventory.location=location.location
where item.item_category3 = 'AS' and
location_inventory.warehouse = 'river' and
location_inventory.location not like 'd%' and
location_inventory.location not like 'stg%'
order by location_inventory.item asc
Assuming that i.item and l.location are primary or unique keys, any duplicates you see are being caused by duplicate items in your location_inventory table. That might or might not be what you want.
SELECT DISTINCT will only eliminate true duplicates (i.e. those where all of the selected columns are duplicate). If that's what you want, use it. Otherwise, you might need to make an inner select that uses SELECT DISTINCT to identify the columns in which you don't want duplicates, and join the results of the inner select back to the tables to pull out all the other data.
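To illustrate what counts as a "true duplicate", here is a minimal sketch with a made-up, trimmed-down location_inventory table, run on SQLite through Python's sqlite3 module. Which columns you actually want to deduplicate on is an assumption you would have to settle for your own data.

```python
import sqlite3

# Trimmed-down, made-up version of location_inventory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location_inventory (location TEXT, item TEXT, lot TEXT, on_hand_qty INTEGER);
    INSERT INTO location_inventory VALUES
        ('A1', 'widget', 'L1', 5),
        ('A1', 'widget', 'L1', 5),   -- exact duplicate of the row above
        ('A1', 'widget', 'L2', 7);   -- same location and item, different lot
""")

all_rows = conn.execute(
    "SELECT location, item, lot, on_hand_qty FROM location_inventory"
).fetchall()

# DISTINCT over every selected column removes only the exact duplicate.
distinct_rows = conn.execute(
    "SELECT DISTINCT location, item, lot, on_hand_qty "
    "FROM location_inventory ORDER BY lot"
).fetchall()

# DISTINCT over fewer columns collapses rows that differ only elsewhere.
distinct_pairs = conn.execute(
    "SELECT DISTINCT location, item FROM location_inventory"
).fetchall()

print(len(all_rows), distinct_rows, distinct_pairs)
```

The L2 row survives the first DISTINCT because its lot and quantity differ; it only collapses once those columns are left out of the select list.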

Efficient SQL Query Design

What is generally considered the most efficient way to do this type of query?
We have a database of 10 years' worth of laboratory data and we would like to select out performance data for various tests. This query, for example, will select the number of hours it has taken to do a test, calculate an average turnaround time, and allow us to plot a sparkline of avg TAT per day.
Say we have 100 test names: is it acceptable in terms of performance to iterate over the test names in a loop and fire this query off once per loop? Or is there a more efficient way?
SELECT
Date_Authorised_Index.Date_Authorised
, Result_Set.Date_Booked_In
, avg(DATEDIFF('hh',Result_Set.Date_Time_Booked_In,Result_Set.Date_Time_Authorised)) as HrsIn
, count(Date_Authorised_Index.Date_Authorised) as numbers
, Date_Authorised_Index.Registration_Number
, Date_Authorised_Index.Request_Row_ID
, Date_Authorised_Index.Specimen_Number
, Result_Set.Authorised_By
, Result_Set.Namespace
, Result_Set.Set_Code
, Result_Set.Date_Time_Authorised
, Request.Date_Time_Received
, Request.Location
FROM
Date_Authorised_Index Date_Authorised_Index
, Result_Set Result_Set
, Request
WHERE
Date_Authorised_Index.Date_Authorised = Result_Set.Date_Authorised
AND Date_Authorised_Index.Request_Row_ID = Request.Request_Row_ID
AND Date_Authorised_Index.Request_Row_ID = Result_Set.Request_Row_ID
AND (Date_Authorised_Index.Discipline='C') AND Result_Set.Set_Code=?
GROUP BY
Result_Set.Date_Booked_In
For starters, I would rewrite this query so it uses explicit join syntax.
Also, even though MySQL does not force you to restate every non-aggregate column in the GROUP BY clause, that doesn't mean leaving them out is a good thing.
Unless Result_Set.Date_Booked_In uniquely identifies a row, you are selecting arbitrary values from multiple rows.
SELECT
dai.Date_Authorised
, rs.Date_Booked_In
, avg(DATEDIFF('hh',rs.Date_Time_Booked_In,rs.Date_Time_Authorised)) as HrsIn
, count(dai.Date_Authorised) as numbers
, dai.Registration_Number
, dai.Request_Row_ID
, dai.Specimen_Number
, rs.Authorised_By
, rs.Namespace
, rs.Set_Code
, rs.Date_Time_Authorised
, r.Date_Time_Received
, r.Location
FROM
Date_Authorised_Index dai
INNER JOIN Result_Set rs ON (dai.Date_Authorised = rs.Date_Authorised
AND dai.Request_Row_ID = rs.Request_Row_ID)
INNER JOIN Request r ON (dai.Request_Row_ID = r.Request_Row_ID)
WHERE
(dai.Discipline= 'C') AND rs.Set_Code=?
GROUP BY
rs.Date_Booked_In
If you want to select all 100 set codes in one go, just make a new table with the set codes you want to select and join against that.
Make sure you index the field sc.Set_Code (or, better yet, make it the primary key).
SELECT lots_of_columns
FROM table1 as dai
INNER JOIN table2 as rs ON (what you joined on before)
INNER JOIN table3 as r ON (same here)
INNER JOIN Setcodes as sc ON (sc.Set_Code = rs.Set_Code) -- extra join
WHERE
dai.discipline = 'C'
GROUP BY rs.Date_Booked_In
Or you can even use an IN (...) list like below, although that will probably be slower than a join.
SELECT lots_of_columns
FROM table1 as dai
INNER JOIN table2 as rs ON (what you joined on before)
INNER JOIN table3 as r ON (same here)
WHERE
dai.discipline = 'C' AND rs.Set_Code IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
GROUP BY rs.Date_Booked_In
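As a rough sketch of the "one query instead of a loop" idea, the example below fetches averages for several set codes in a single round trip. It makes several assumptions: the rows are invented, Hours_Taken is a hypothetical precomputed column standing in for the DATEDIFF expression, the function name average_tat is made up, and it runs on SQLite via Python's sqlite3 module rather than MySQL. It also groups by Set_Code as well as the date, on the assumption that you want one series per test.

```python
import sqlite3

# Invented sample data; Hours_Taken is a hypothetical precomputed column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Result_Set (Set_Code TEXT, Date_Booked_In TEXT, Hours_Taken REAL);
    INSERT INTO Result_Set VALUES
        ('FBC', '2011-01-01', 2.0), ('FBC', '2011-01-01', 4.0),
        ('FBC', '2011-01-02', 6.0), ('UE',  '2011-01-01', 1.0),
        ('LFT', '2011-01-01', 9.0);
""")


def average_tat(set_codes):
    """Average turnaround per set code and day, fetched in one query."""
    # One bound placeholder per code keeps the values out of the SQL text.
    placeholders = ", ".join("?" for _ in set_codes)
    sql = f"""
        SELECT Set_Code, Date_Booked_In, AVG(Hours_Taken) AS HrsIn, COUNT(*) AS numbers
        FROM Result_Set
        WHERE Set_Code IN ({placeholders})
        GROUP BY Set_Code, Date_Booked_In
        ORDER BY Set_Code, Date_Booked_In
    """
    return conn.execute(sql, list(set_codes)).fetchall()


print(average_tat(["FBC", "UE"]))
```

The application can then split the result by Set_Code to draw each sparkline, instead of issuing 100 separate queries.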