Multiple columns in aggregated expression - sql

I've been struggling with an elegant solution for this for a while, and thought I'd finally cracked it but am now getting the error
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.
Which is frustrating me!
In essence the query is:
select
u.username + ' ' + u.surname,
CASE WHEN ugt.type = 'Contract'
THEN
(
select sum(dbo.GET_INVOICE_WEEKLY_AVERAGE_VALUE(pc.placementid, u.UserId))
from PlacementConsultants pc
where pc.UserId = u.UserId
and pc.CommissionPerc >= 80
)
END
from usergradetypes ugt
inner join usergrades ug on ug.gradeid = ugt.usergradetypeid
inner join users u on u.userid = ug.userid
The function GET_INVOICE_WEEKLY_AVERAGE_VALUE is as follows
ALTER function [dbo].[GET_INVOICE_WEEKLY_AVERAGE_VALUE]( @placementid INT, @userid INT )
RETURNS numeric(9,2)
AS
BEGIN
DECLARE @retVal numeric(9,2)
DECLARE @rollingweeks int
SET @rollingweeks = (select ISNULL(rollingweeks,26) FROM UserGradeTypes ugt
inner join UserGrades ug on ugt.UserGradeTypeID = ug.gradeid
WHERE ug.userid = @userid)
SELECT @retVal =
sum(dbo.GET_INVOICE_NET_VALUE(id.InvoiceId)) / @rollingweeks from PlacementInvoices pli
inner join invoicedetails id on id.invoiceid = pli.InvoiceId
where pli.PlacementID = @placementid
and pli.CreatedOn between DATEADD(ww,-@rollingweeks,getdate()) and GETDATE()
RETURN @retVal
END
The query runs fine without the sum but when I'm trying to sum the value of the deals, it's falling over (which I need to do for a summary page)

I do not know why this fails:
select sum(dbo.GET_INVOICE_WEEKLY_AVERAGE_VALUE(pc.placementid, u.UserId))
but this works:
select sum(dbo.GET_INVOICE_WEEKLY_AVERAGE_VALUE(pc.placementid, pc.UserId))
It is curious and seems like a bug to me.
The error message, though, suggests that all the columns inside the sum() need to come from either the outer referenced tables or the inner referenced tables, but not both. I don't understand the reason for this. My best guess is that mixing the two types of references confuses the optimizer.
I haven't seen this error message before, by the way.
EDIT:
It is very easy to reproduce, and does not require a function call:
with t as (select 1 as col)
select t.*,
(select sum(t2.col + t.col) from t t2) as newcol
from t;
Very interesting. I think this might violate the standard. The equivalent query does run on Oracle.
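Because the subquery already filters on pc.UserId = u.UserId, the two columns carry the same value inside it, so passing pc.UserId keeps the aggregated expression free of outer references. Here is a minimal sketch of that rewrite, using Python's sqlite3 module as a stand-in for SQL Server. The table layout, the sample data, and the weekly_average function are made up for illustration; weekly_average is only a hypothetical placeholder for dbo.GET_INVOICE_WEEKLY_AVERAGE_VALUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE placement_consultants (
        placement_id INTEGER, user_id INTEGER, commission_perc INTEGER
    );
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO placement_consultants VALUES
        (10, 1, 100), (11, 1, 80), (12, 1, 50), (20, 2, 90);
""")


def weekly_average(placement_id, user_id):
    # Hypothetical placeholder for the real scalar function; it just
    # returns a number that depends on both arguments.
    return placement_id * 1.0 + user_id


conn.create_function("weekly_average", 2, weekly_average)

# The aggregated expression references only inner columns (pc.*).
# The outer reference u.user_id appears only in the WHERE clause.
rows = conn.execute("""
    SELECT u.name,
           (SELECT SUM(weekly_average(pc.placement_id, pc.user_id))
              FROM placement_consultants pc
             WHERE pc.user_id = u.user_id
               AND pc.commission_perc >= 80) AS total
      FROM users u
     ORDER BY u.user_id
""").fetchall()

print(rows)
```

The placement with a commission of 50 is filtered out, so Ann's total covers placements 10 and 11 only.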

Related

How to convert inline SQL queries to JOINS in SQL SERVER to reduce load time

I need help in optimizing this SQL query.
In the main SELECT statement there are three columns that depend on the outer query result. This is why my query takes a long time to return data. I have tried using left joins instead, but that did not work properly.
Can anyone help me to resolve this issue?
SELECT
DISTINCT ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
(
SELECT
STRING_AGG(
(ug.UG_Name),
','
)
FROM
Groups ug
INNER JOIN ApplicantUserGroup augm ON augm.AUGM_UserGroupID = ug.UG_ID
WHERE
augm.AUGM_OrganizationUserID = ou.OrganizationUserID
AND ug.UG_IsDeleted = 0
AND augm.AUGM_IsDeleted = 0
) AS UserGroups,
order1.OrderNumber AS OrderId -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttribute),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributes -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttributeID),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributeID
FROM
ApplicantData acd WITH (NOLOCK)
INNER JOIN ClientPackage ps WITH (NOLOCK) ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
INNER JOIN [ClientOrder] order1 WITH (NOLOCK) ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
INNER JOIN OUser ou WITH (NOLOCK) ON ou.OrganizationUserID = ps.OrganizationUserID
It looks like this query can be simplified, and the dependent subqueries in your SELECT clause removed. Consider your second and third dependent subqueries. You can refactor them into one nondependent subquery with a LEFT JOIN. Using nondependent subqueries is usually more efficient because the query planner can run them just once, rather than once for each row.
You want two STRING_AGG() results from the same table. This subquery gives those two outputs for every possible combination of HierarchyNodeID and OrganizationUserID values. STRING_AGG() is an aggregate function like SUM() and so works nicely with GROUP BY.
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes, -- UAT-2455
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
You can run this subquery itself to convince yourself it works.
Now, we can LEFT JOIN that into your query. Like this. (For readability I took out the NOLOCKs and used JOIN: it means the same thing as INNER JOIN.)
SELECT DISTINCT
ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
'tempvalue' AS UserGroups, -- shortened for testing
order1.OrderNumber AS OrderId, -- UAT-2455
uat2455.CustomAttributes, -- UAT-2455
uat2455.CustomAttributeIDs -- UAT-2455
FROM ApplicantData acd
JOIN ClientPackage ps
ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
JOIN ClientOrder order1
ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
JOIN OUser ou
ON ou.OrganizationUserID = ps.OrganizationUserID
LEFT JOIN (
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes, -- UAT-2455
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
) uat2455
ON uat2455.HierarchyNodeID = dpm.DPM_ID
AND uat2455.OrganizationUserId = ps.OrganizationUserID
See how we collapsed your second and third dependent subqueries to just one, then used it as a virtual table with LEFT JOIN? We transformed the WHERE clauses from the dependent subqueries into an ON clause.
You can test this: run it with TOP(50) and eyeball the results.
When you're happy, the next step is to transform your first dependent subquery the same way.
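If you want to convince yourself that the two shapes return the same data, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The students and attrs tables are made up for illustration, and SQLite's group_concat plays the role of STRING_AGG. It runs the dependent-subquery form and the pre-aggregated LEFT JOIN form and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE attrs (student_id INTEGER, attr TEXT);
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO attrs VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# Dependent (correlated) subquery in the SELECT list.
correlated = """
    SELECT s.student_id,
           (SELECT group_concat(a.attr, ',') FROM attrs a
             WHERE a.student_id = s.student_id) AS attr_list
      FROM students s
"""

# Same result from one pre-aggregated derived table and a LEFT JOIN.
joined = """
    SELECT s.student_id, agg.attr_list
      FROM students s
      LEFT JOIN (SELECT student_id, group_concat(attr, ',') AS attr_list
                   FROM attrs
                  GROUP BY student_id) agg
        ON agg.student_id = s.student_id
"""


def normalized(sql):
    # group_concat does not guarantee element order, so sort the pieces.
    return sorted(
        (sid, sorted(lst.split(",")) if lst is not None else None)
        for sid, lst in conn.execute(sql)
    )


correlated_rows = normalized(correlated)
joined_rows = normalized(joined)
print(correlated_rows == joined_rows)
```

Student 3 has no attributes and comes back with NULL in both versions, which is why the join has to be a LEFT JOIN rather than an inner one.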
Pro tip: Don't use WITH (NOLOCK), ever, unless a database administration expert tells you to after looking at your specific query. If your query's purpose is a historical report and you don't care whether the most recent transactions in your database are represented exactly right, you can precede your query with this statement. It also allows the query to run while avoiding locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Pro tip: Be obsessive about formatting your queries for readability. You, your colleagues, and yourself a year from now must be able to read and reason about queries like this.

Try to define a left join only with where condition

I am trying to find a SQL statement to replace an old SQL query. In short, I am trying to express a left join using only WHERE conditions.
Here is my test environment:
create table Mst
(
Id bigint not null primary key clustered,
Firstname nvarchar(200) not null,
Lastname nvarchar(200) not null
);
create table Dtl
(
Id bigint not null primary key clustered,
MstId bigint not null references Mst(Id),
DetailDescr nvarchar(500) not null
);
I fill the tables with some data:
declare @i as bigint = 0;
while @i < 999
begin
insert into Mst values (@i, N'Name ' + Str(@i), N'Lastname ' + Str(@i));
if (@i % 10 = 0)
insert into Dtl values (@i*5+0, @i, N'Description 1 for ' + Str(@i));
if (@i % 2 = 0)
insert into Dtl values (@i*5+1, @i, N'Description 2 for ' + Str(@i));
if (@i % 3 = 0)
insert into Dtl values (@i*5+2, @i, N'Description 3 for ' + Str(@i));
set @i = @i + 1;
end;
The usual way for a left join is this:
select m.Id, m.Firstname, m.Lastname, d.DetailDescr
From Mst m left join Dtl d
on m.id = d.MstId;
This query returns 1266 rows. But in the old application, which I am trying to migrate, the SELECT and FROM parts are predefined:
select m.Id, m.Firstname, m.Lastname, d.DetailDescr
From Mst m, Dtl d
The old where condition (defined in a separate software module) expresses the LEFT JOIN with an operator that is no longer available:
where m.id *= d.MstId
So we have to migrate that approach, and we would like to modify only the where condition if possible. For an inner join, the where condition is easy to define:
where m.id = d.MstId
But I need a left join, and I can find no way to get one by modifying only the where condition. Rewriting only the where condition would be the best approach in that particular application.
Thanks in advance for your ideas.
Once upon a time, SQL did not support outer join syntax. It was an ancient world, where telephones were connected by wires to walls, where countries in Europe each had their own currencies, and most Americans watched one of three or four major networks on television.
At that time, Microsoft did not even have a real database. But Sybase offered an outer join operator in the WHERE clause, *=, which Microsoft eventually adapted into SQL Server. Microsoft SQL Server supported this through SQL Server 2008. Hence, no supported version of SQL Server supports outer joins in the WHERE clause.
Happily, a much better standard syntax now exists (lest we be despondent and think that things do not get better over time). The "comma operator" in the FROM clause is relegated to its original definition -- a CROSS JOIN. A CROSS JOIN pairs every row of one table with every row of the other, so once the WHERE clause filters those pairs, rows without a match are gone. For instance, if Dtl has no rows, then the CROSS JOIN returns no rows at all.
That is, there is no way to do what you want generically in the WHERE clause. There are queries that can replicate an outer JOIN, but they require much more surgery to the query. But there is a good alternative, which is to write your queries with the correct, modern syntax.
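To make the "much more surgery" concrete, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server, with a cut-down version of the question's Mst and Dtl tables (12 master rows instead of 999, fewer columns). It compares the comma join, the real LEFT JOIN, and one possible emulation that keeps the comma join but appends the unmatched rows with UNION ALL and NOT EXISTS. The emulation is only one way to do it, not something taken from the original application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Mst (Id INTEGER PRIMARY KEY, Firstname TEXT);
    CREATE TABLE Dtl (Id INTEGER PRIMARY KEY, MstId INTEGER, DetailDescr TEXT);
""")
for i in range(12):
    conn.execute("INSERT INTO Mst VALUES (?, ?)", (i, f"Name {i}"))
    if i % 10 == 0:
        conn.execute("INSERT INTO Dtl VALUES (?, ?, ?)", (i * 5, i, "Description 1"))
    if i % 2 == 0:
        conn.execute("INSERT INTO Dtl VALUES (?, ?, ?)", (i * 5 + 1, i, "Description 2"))
    if i % 3 == 0:
        conn.execute("INSERT INTO Dtl VALUES (?, ?, ?)", (i * 5 + 2, i, "Description 3"))

# Comma join plus WHERE: behaves like an inner join.
comma_join = "SELECT m.Id, d.DetailDescr FROM Mst m, Dtl d WHERE m.Id = d.MstId"

# Modern syntax: unmatched Mst rows are preserved with NULL details.
left_join = "SELECT m.Id, d.DetailDescr FROM Mst m LEFT JOIN Dtl d ON m.Id = d.MstId"

# Emulation: the inner-join rows plus every Mst row that has no Dtl row.
emulated = comma_join + """
    UNION ALL
    SELECT m.Id, NULL FROM Mst m
     WHERE NOT EXISTS (SELECT 1 FROM Dtl d WHERE d.MstId = m.Id)
"""


def fetch_sorted(sql):
    # Sort on (Id, description), treating NULL as an empty string.
    return sorted(conn.execute(sql).fetchall(), key=lambda r: (r[0], r[1] or ""))


comma_rows = fetch_sorted(comma_join)
left_rows = fetch_sorted(left_join)
emulated_rows = fetch_sorted(emulated)
print(len(comma_rows), len(left_rows), len(emulated_rows))
```

The emulation touches the SELECT list and adds a second query block, so it does not fit a design where only the WHERE clause can be swapped out.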

Update different data type columns

I have two tables, GCB.NewsOne and GCB.NewsTwo. Both tables are the same except for one column, GCode: in the dbo.News table GCode is varchar(100) null, while the GCB.News table has it as a bigint null column.
Now I want to update GCode in dbo.News to the value from GCB.News.
I tried the query below, but it's not working:
UPDATE [GCB].[NewsOne] AS G
SET G.Code = (SELECT P.Code FROM GCB.NewsTwo P WHERE G.ID = P.ID)
Try casting the bigint to varchar:
UPDATE G
SET Code = CAST(P.Code AS VARCHAR(MAX))
FROM [GCB].[NewsOne] G
INNER JOIN GCB.NewsTwo P
ON G.ID = P.ID;
This assumes that your problem really is the types of the two codes, and not something else.
Also note that I rewrote your join using update join syntax, which I think is easier to read.
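In SQL Server you may not declare an alias directly on the target table of an UPDATE statement (UPDATE [GCB].[NewsOne] AS G is a syntax error); an alias is only available through a FROM clause. Without an alias, this works fine: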
UPDATE [GCB].[NewsOne]
SET [GCB].[NewsOne].Code = ( SELECT P.Code FROM GCB.NewsTwo P
WHERE [GCB].[NewsOne].ID=P.ID )
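One thing to watch with the correlated-subquery form: for every row in the target table that has no match, the subquery returns NULL and the existing Code is overwritten. The join form only touches matching rows. Below is a minimal sketch of the subquery form with an explicit cast and an EXISTS guard, using Python's sqlite3 module as a stand-in for SQL Server. The sample rows are made up, and TEXT/INTEGER stand in for varchar/bigint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NewsOne (ID INTEGER PRIMARY KEY, Code TEXT);     -- varchar side
    CREATE TABLE NewsTwo (ID INTEGER PRIMARY KEY, Code INTEGER);  -- bigint side
    INSERT INTO NewsOne VALUES (1, 'old-1'), (2, 'old-2');
    INSERT INTO NewsTwo VALUES (1, 9007199254740993);
""")

# Cast the integer to text, and only update rows that have a match.
# Without the WHERE EXISTS guard, row 2 would end up with Code = NULL.
conn.execute("""
    UPDATE NewsOne
       SET Code = (SELECT CAST(P.Code AS TEXT) FROM NewsTwo P
                    WHERE P.ID = NewsOne.ID)
     WHERE EXISTS (SELECT 1 FROM NewsTwo P WHERE P.ID = NewsOne.ID)
""")

rows = conn.execute("SELECT ID, Code FROM NewsOne ORDER BY ID").fetchall()
print(rows)
```

Row 2 keeps its old value because NewsTwo has no row with that ID.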

Where is the signature value read from in this query?

I have the following SQL query, and need to figure out where the "signatures" data is actually being read from. It's not from the 'claims' table, and doesn't seem to be from the 'questionnaire_answers' table. I believe it will be a boolean value, if that helps at all.
I'm reasonably proficient at SQL, but the joins have left me a bit confused.
(There's some PHP, but it's not relevant to the question).
$SQL="SELECT surveyor, COUNT(signed_total) AS 'total', SUM(signed_total) AS 'signed_total' FROM (
SELECT DISTINCT claims.claim_id, CONCAT(surveyors.user_first_name, CONCAT(' ', surveyors.user_surname)) AS 'surveyor', CASE WHEN signatures.claim_id IS NOT NULL THEN 1 ELSE 0 END AS 'signed_total' FROM claims
INNER JOIN users surveyors ON claims.surveyor_id = surveyors.user_id
LEFT OUTER JOIN signatures ON claims.claim_id = signatures.claim_id
INNER JOIN questionnaire_answers ON questionnaire_answers.claim_id = claims.claim_id
WHERE (claims.claim_type <> ".$conn->qstr(TYPE_DESKTOP).")
AND (claims.claim_type <> ".$conn->qstr(TYPE_AUDIT).")
AND (claims.claim_cancelled_id <= 0)
AND (claims.date_completed BETWEEN '".UK2USDate($start_date)." 00:00:00' AND '".UK2USDate($end_date)." 23:59:59')
) AS tmp
GROUP BY surveyor
ORDER BY surveyor ASC
";
Thank you!
signatures is a table (see LEFT OUTER JOIN signatures in your query).
As written in the FROM clause:
FROM claims
INNER JOIN users surveyors ON claims.surveyor_id = surveyors.user_id
LEFT OUTER JOIN signatures ON claims.claim_id = signatures.claim_id
The LEFT keyword means that the rows of the left table are preserved. So all rows from the claims table are kept, and NULL marks are added as placeholders for the attributes from the nonpreserved side of the join, which is the signatures table here.
So CASE WHEN signatures.claim_id IS NOT NULL THEN 1 ELSE 0 END AS 'signed_total' basically checks that if a match between these two tables exists based on claim_id then signed_total column should have value 1 else 0.
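Here is a small sketch of that behaviour, using Python's sqlite3 module and made-up data (three claims, two of them with a row in signatures). The tables are cut down to the columns that matter, and the surveyor name is stored directly on claims to keep the example short; that is a simplification, not your real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, surveyor TEXT);
    CREATE TABLE signatures (claim_id INTEGER);
    INSERT INTO claims VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO signatures VALUES (1), (3);
""")

# Claim 2 has no signatures row, so s.claim_id is NULL for it after the
# LEFT OUTER JOIN and the CASE expression yields 0.
rows = conn.execute("""
    SELECT surveyor,
           COUNT(signed_total) AS total,
           SUM(signed_total)   AS signed_total
      FROM (SELECT DISTINCT c.claim_id, c.surveyor,
                   CASE WHEN s.claim_id IS NOT NULL THEN 1 ELSE 0 END AS signed_total
              FROM claims c
              LEFT OUTER JOIN signatures s ON c.claim_id = s.claim_id) AS tmp
     GROUP BY surveyor
     ORDER BY surveyor
""").fetchall()

print(rows)
```

Ann has two claims and one of them is signed; Bob has one claim and it is signed.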
Hope that helps!!

Convert sub-query with "NOT IN" operator to join with multiple tables

I need to convert the following sub-query to JOIN. Here I already have JOIN operator in the inner query. Please help.
SELECT *
FROM Consultants
WHERE Consultants.ConsIntID
NOT IN (SELECT Links.ToID
FROM Links JOIN Reminders
ON Links.FromID = Reminders.RemIntID
AND ApptSubType = 'Placed'
AND ToID LIKE 'CS%')
Alright, so you probably shouldn't change this to a join. I would use NOT EXISTS instead; the reasons for doing so are stated here.
I've also replaced your ancient join syntax and added aliases to clear this up. The method shown below has been the accepted standard for about 22 years now and is the preferred way to write queries.
SELECT C.*
FROM Consultants as C -- aliases are very useful for clarity
WHERE
NOT EXISTS (
SELECT 1
FROM Links as L
INNER JOIN Reminders as R --New join syntax
ON L.FromID = R.RemIntID
WHERE C.ConsIntID = L.ToID
AND ApptSubType = 'Placed'
AND ToID LIKE 'CS%'
)
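One commonly cited reason for preferring NOT EXISTS (this is my assumption about the argument, since the link is not reproduced here) is how NOT IN treats NULL: if the subquery returns even one NULL, NOT IN can never evaluate to true and the outer query returns no rows. The sketch below shows this with Python's sqlite3 module and made-up, cut-down tables. In your query the ToID LIKE 'CS%' filter already drops NULLs, so you would not hit this today, but it is an easy trap once the filter changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Consultants (ConsIntID TEXT);
    CREATE TABLE Links (ToID TEXT);
    INSERT INTO Consultants VALUES ('CS1'), ('CS2');
    INSERT INTO Links VALUES ('CS1'), (NULL);
""")

# 'CS2' NOT IN ('CS1', NULL) evaluates to NULL, not true, so nothing passes.
not_in = conn.execute("""
    SELECT ConsIntID FROM Consultants
     WHERE ConsIntID NOT IN (SELECT ToID FROM Links)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = conn.execute("""
    SELECT C.ConsIntID FROM Consultants C
     WHERE NOT EXISTS (SELECT 1 FROM Links L WHERE L.ToID = C.ConsIntID)
""").fetchall()

print(not_in, not_exists)
```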
SELECT *
FROM Consultants
WHERE Consultants.ConsIntID
NOT IN (SELECT Links.ToID
FROM Links
JOIN Reminders ON(Links.FromID = Reminders.RemIntID)
WHERE ApptSubType = 'Placed'
AND ToID LIKE 'CS%')