aggregate functions are not allowed in WHERE - sql

I am using this query to find the unique records by latest date in PostgreSQL. The error I get is "aggregate functions are not allowed in WHERE". Following the linked question "How to fix error “aggregate functions are not allowed in WHERE”", I tried using an inner select, but that did not work. Please help me fix the query. I am using pgAdmin III as the client.
SELECT Distinct t1.pa_serial_
,t1.homeownerm_name
,t1.districtvdc
,t1.date as firstrancheinspection_date
,t1.status
,t1.name_of_data_collector
,t1.fulcrum_id
,first_tranche_inspection_v2_reporting_questionnaire.date_reporting
From first_tranche_inspection_v2 t1
LEFT JOIN first_tranche_inspection_v2_reporting_questionnaire ON (t1.fulcrum_id = first_tranche_inspection_v2_reporting_questionnaire.fulcrum_parent_id)
where first_tranche_inspection_v2_reporting_questionnaire.date_reporting = (
select Max(first_tranche_inspection_v2_reporting_questionnaire.date_reporting)
from first_tranche_inspection_v2
where first_tranche_inspection_v2.pa_serial_ = t1.pa_serial_
);

You want to join the latest reporting questionnaire per inspection. In PostgreSQL you can use DISTINCT ON for this:
select fti.*, rq.*
from first_tranche_inspection_v2 fti
left join
(
select distinct on (fulcrum_parent_id) *
from first_tranche_inspection_v2_reporting_questionnaire
order by fulcrum_parent_id, date_reporting desc
) rq on rq.fulcrum_parent_id = fti.fulcrum_id;
Or use standard SQL's ROW_NUMBER:
select fti.*, rq.*
from first_tranche_inspection_v2 fti
left join
(
select
ftirq.*,
row_number() over (partition by fulcrum_parent_id order by date_reporting desc) as rn
from first_tranche_inspection_v2_reporting_questionnaire ftirq
) rq on rq.fulcrum_parent_id = fti.fulcrum_id and rq.rn = 1;
What you were trying to do should look like this:
select fti.*, rq.*
from first_tranche_inspection_v2 fti
left join first_tranche_inspection_v2_reporting_questionnaire rq
on rq.fulcrum_parent_id = fti.fulcrum_id
and (rq.fulcrum_parent_id, rq.date_reporting) in
(
select fulcrum_parent_id, max(date_reporting)
from first_tranche_inspection_v2_reporting_questionnaire
group by fulcrum_parent_id
);
This works, too, and only has the disadvantage that you read the table first_tranche_inspection_v2_reporting_questionnaire twice.
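As a rough check that the ROW_NUMBER variant and the (parent, MAX) IN variant pick the same rows, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL, so DISTINCT ON is left out, and it needs a SQLite build with window functions (3.25 or later). The table names inspection and questionnaire and the sample rows are made-up stand-ins for the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, shortened stand-ins for the two tables in the question.
    CREATE TABLE inspection (fulcrum_id TEXT PRIMARY KEY, pa_serial_ TEXT);
    CREATE TABLE questionnaire (fulcrum_parent_id TEXT, date_reporting TEXT);
    INSERT INTO inspection VALUES ('a', 'PA-1'), ('b', 'PA-2'), ('c', 'PA-3');
    INSERT INTO questionnaire VALUES
        ('a', '2017-01-05'), ('a', '2017-03-01'),
        ('b', '2017-02-10');
""")

# Variant 1: number the questionnaires per inspection, newest first, keep rn = 1.
row_number_sql = """
    SELECT i.fulcrum_id, rq.date_reporting
    FROM inspection i
    LEFT JOIN (
        SELECT q.*,
               ROW_NUMBER() OVER (PARTITION BY fulcrum_parent_id
                                  ORDER BY date_reporting DESC) AS rn
        FROM questionnaire q
    ) rq ON rq.fulcrum_parent_id = i.fulcrum_id AND rq.rn = 1
    ORDER BY i.fulcrum_id
"""

# Variant 2: keep only rows whose (parent, date) pair equals the per-parent maximum.
max_in_sql = """
    SELECT i.fulcrum_id, rq.date_reporting
    FROM inspection i
    LEFT JOIN questionnaire rq
           ON rq.fulcrum_parent_id = i.fulcrum_id
          AND (rq.fulcrum_parent_id, rq.date_reporting) IN (
                  SELECT fulcrum_parent_id, MAX(date_reporting)
                  FROM questionnaire
                  GROUP BY fulcrum_parent_id)
    ORDER BY i.fulcrum_id
"""

latest_by_row_number = conn.execute(row_number_sql).fetchall()
latest_by_max = conn.execute(max_in_sql).fetchall()
print(latest_by_row_number)
print(latest_by_max)
```

Inspection 'c' has no questionnaire, so the LEFT JOIN keeps it with a NULL date in both variants. The two variants can differ when several questionnaires share the same latest date: ROW_NUMBER keeps exactly one of them, while the MAX variant keeps all of them.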

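DISTINCT often ends up being implemented as a GROUP BY query in many RDBMS. What I think is happening in your current query is that there is already an implicit aggregation involving the columns in your SELECT. Hence, the correlated subquery involving MAX() actually is an aggregation because of the DISTINCT. Note also that the column inside MAX() belongs to a table of the outer query, not to the table in the subquery's FROM clause, so PostgreSQL evaluates the aggregate at the level of the outer query, that is, inside its WHERE clause.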
One quick workaround might be to perform the original query without DISTINCT, then subquery the result set to retain only distinct records:
WITH cte AS (
SELECT t1.pa_serial_,
t1.homeownerm_name,
t1.districtvdc,
t1.date as firstrancheinspection_date,
t1.status,
t1.name_of_data_collector,
t1.fulcrum_id,
t2.date_reporting
FROM first_tranche_inspection_v2 t1
LEFT JOIN first_tranche_inspection_v2_reporting_questionnaire t2
ON t1.fulcrum_id = t2.fulcrum_parent_id
WHERE t2.date_reporting = (SELECT MAX(t.date_reporting)
FROM first_tranche_inspection_v2 t
WHERE t.pa_serial_ = t1.pa_serial_)
)
SELECT DISTINCT t.pa_serial_,
t.homeownerm_name,
t.districtvdc,
t.firstrancheinspection_date,
t.status,
t.name_of_data_collector,
t.fulcrum_id,
t.date_reporting
FROM cte t
Note that I went ahead and added an alias to the second table in your join, which leaves the query much easier to read.

Related

How to convert inline SQL queries to JOINS in SQL SERVER to reduce load time

I need help in optimizing this SQL query.
In the main SELECT statement there are three columns which depend on the outer query result. This is why my query is taking a long time to return data. I have tried using left joins instead, but that did not work properly.
Can anyone help me to resolve this issue?
SELECT
DISTINCT ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
(
SELECT
STRING_AGG(
(ug.UG_Name),
','
)
FROM
Groups ug
INNER JOIN ApplicantUserGroup augm ON augm.AUGM_UserGroupID = ug.UG_ID
WHERE
augm.AUGM_OrganizationUserID = ou.OrganizationUserID
AND ug.UG_IsDeleted = 0
AND augm.AUGM_IsDeleted = 0
) AS UserGroups,
order1.OrderNumber AS OrderId -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttribute),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributes -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttributeID),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributeID
FROM
ApplicantData acd WITH (NOLOCK)
INNER JOIN ClientPackage ps WITH (NOLOCK) ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
INNER JOIN [ClientOrder] order1 WITH (NOLOCK) ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
INNER JOIN OUser ou WITH (NOLOCK) ON ou.OrganizationUserID = ps.OrganizationUserID
It looks like this query can be simplified, and the dependent subqueries in your SELECT clause removed. Consider your second and third dependent subqueries. You can refactor them into one nondependent subquery with a LEFT JOIN. Using nondependent subqueries is more efficient because the query planner can run them just once, rather than once for each row.
You want two STRING_AGG() results from the same table. This subquery gives those two outputs for every possible combination of HierarchyNodeID and OrganizationUserID values. STRING_AGG() is an aggregate function like SUM() and so works nicely with GROUP BY.
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes, -- UAT-2455
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
You can run this subquery itself to convince yourself it works.
Now we can LEFT JOIN that into your query, like this. (For readability I took out the NOLOCKs and used JOIN, which means the same thing as INNER JOIN.)
SELECT DISTINCT
ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
'tempvalue' AS UserGroups, -- shortened for testing
order1.OrderNumber AS OrderId, -- UAT-2455
uat2455.CustomAttributes, -- UAT-2455
uat2455.CustomAttributeIDs -- UAT-2455
FROM ApplicantData acd
JOIN ClientPackage ps
ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
JOIN ClientOrder order1
ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
JOIN OUser ou
ON ou.OrganizationUserID = ps.OrganizationUserID
LEFT JOIN (
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes, -- UAT-2455
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
) uat2455
ON uat2455.HierarchyNodeID = dpm.DPM_ID
AND uat2455.OrganizationUserId = ps.OrganizationUserID
See how we collapsed your second and third dependent subqueries to just one, then used it as a virtual table with LEFT JOIN? We transformed the WHERE clauses from the dependent subqueries into an ON clause.
You can test this: run it with TOP(50) and eyeball the results.
When you're happy, the next step is to transform your first dependent subquery the same way.
Pro tip: Don't use WITH (NOLOCK), ever, unless a database administration expert tells you to after looking at your specific query. If your query's purpose is a historical report and you don't care whether the most recent transactions in your database are represented exactly right, you can precede your query with the statement below. Like NOLOCK, it lets the query run without taking shared locks, at the cost of possibly reading uncommitted data.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Pro tip: Be obsessive about formatting your queries for readability. You, your colleagues, and yourself a year from now must be able to read and reason about queries like this.
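The refactoring described in this answer, replacing two dependent subqueries with one grouped derived table that is LEFT JOINed, can be tried out on a toy schema. The sketch below is only an illustration: it uses SQLite via Python's sqlite3 module, with group_concat standing in for SQL Server's STRING_AGG, and the tables users and attrs are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; they are not the ones from the question.
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE attrs (user_id INTEGER, attr_id INTEGER, attr TEXT);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO attrs VALUES (1, 10, 'red'), (1, 11, 'tall'), (2, 12, 'blue');
""")

# Before: two dependent (correlated) subqueries, each evaluated per outer row.
correlated_sql = """
    SELECT u.user_id,
           (SELECT group_concat(a.attr, ',')
              FROM attrs a WHERE a.user_id = u.user_id) AS attrs,
           (SELECT group_concat(a.attr_id, ',')
              FROM attrs a WHERE a.user_id = u.user_id) AS attr_ids
    FROM users u
    ORDER BY u.user_id
"""

# After: one grouped derived table, joined once on the former WHERE condition.
joined_sql = """
    SELECT u.user_id, g.attrs, g.attr_ids
    FROM users u
    LEFT JOIN (
        SELECT user_id,
               group_concat(attr, ',')    AS attrs,
               group_concat(attr_id, ',') AS attr_ids
        FROM attrs
        GROUP BY user_id
    ) g ON g.user_id = u.user_id
    ORDER BY u.user_id
"""


def normalise(rows):
    """Sort the pieces of each concatenated string; group_concat order is unspecified."""
    def split(value):
        return None if value is None else tuple(sorted(value.split(",")))
    return [(uid, split(attrs), split(ids)) for uid, attrs, ids in rows]


correlated_rows = normalise(conn.execute(correlated_sql).fetchall())
joined_rows = normalise(conn.execute(joined_sql).fetchall())
print(correlated_rows == joined_rows)
```

The LEFT JOIN matters for user 3, who has no attributes: an inner join would drop that row, whereas the dependent subqueries return NULL for it.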

New to SQL writing a query with nesting and joins

Here is my query. I am not sure why it won't run; it doesn't seem to like the joins. It ran without the nesting, but now the joins won't work.
select *
from (
select * ,
row_number () over (partition by t.activity_type__c order by t.ActivityDate desc) ,
x.name,
MAX(t.ActivityDate) as Last_Activity_Date,
/* all these below x or a alias? */
x.Channel__c,
x.Account_18_Digit_ID__c,
x.Advisor_Approach__c,
x.name,
x.BillingState,
x.Current_Month_WT_AUM__c,
x.WT_ETF_AUM_mil__c,
x.ETF_AUM__c,
x.WT_ETF_Market_Share__c,
x.priority_type__C,
x.phone,
/* x.ownerid,
x.ID ? */
rn
from account a
) where rn=1
join [User] u on u.id = x.OwnerId
left join Task t on t.WhatId = x.Id
where t.Activity_Type__c <> 'attempt' and
( Advisor_Approach__c like 'CAPFINANCIAL_SECURITIES%' )
There's a lot going on here, so to start out we need to format it with better indentation. This helps make it obvious we have two where clauses at the same nesting level, one of which is out of place (before the JOINs).
Looking deeper, I see a MAX() function, but it's not allowed in this context unless you also have a GROUP BY clause. We're also missing an alias for the inner query... perhaps this is where the x is supposed to come from? And the inner nested query references columns from tables in the outer query, which are not yet available. Also, you can't use a windowing function result at the same level of nesting, and I don't see what the User table is needed for. After we fix most of this, we also no longer need to nest the accounts table by itself.
This is the closest I could come to fixing all these issues, but I know it's still wrong because of (at least) the MAX() function:
select *
from (
select a.name,
/* MAX(t.ActivityDate) as Last_Activity_Date, */
a.Channel__c,
a.Account_18_Digit_ID__c,
a.Advisor_Approach__c,
a.BillingState,
a.Current_Month_WT_AUM__c,
a.WT_ETF_AUM_mil__c,
a.ETF_AUM__c,
a.WT_ETF_Market_Share__c,
a.priority_type__C,
a.phone,
/* a.ownerid, a.ID ? */
row_number () over (partition by t.activity_type__c order by t.ActivityDate desc) as rn
from account a
join [User] u on u.id = a.OwnerId
left join Task t on t.WhatId = a.Id
where t.Activity_Type__c <> 'attempt'
and ( Advisor_Approach__c like 'CAPFINANCIAL_SECURITIES%' )
) x
where rn = 1
It seems you have the query upside-down. You seem to be trying to join the latest task per activity type to the account, but instead of selecting from account and then joining the latest tasks, you are trying to select the latest task from the account table, which cannot work.
Something along the lines of:
select *
from account a
join [User] u on u.id = a.ownerid
left join
(
select
task.*,
row_number() over (partition by activity_type__c order by activitydate desc) as rn
from task
where activity_type__c <> 'attempt'
) t on t.whatid = a.id and t.rn = 1
where a.advisor_approach__c like 'CAPFINANCIAL_SECURITIES%'
order by a.id, t.activity_type__c;
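One point made above, that a window function result cannot be filtered at the same nesting level where it is computed, can be checked with a small sketch. It uses SQLite through Python's sqlite3 module instead of SQL Server, and the task table here is a cut-down, hypothetical version of the real one, partitioned by account for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified task table.
    CREATE TABLE task (id INTEGER PRIMARY KEY, what_id INTEGER, activity_date TEXT);
    INSERT INTO task (what_id, activity_date) VALUES
        (1, '2020-01-01'), (1, '2020-02-01'), (2, '2020-01-15');
""")

# Attempt 1: filter on rn in the same SELECT that computes it.
same_level_sql = """
    SELECT what_id, activity_date,
           ROW_NUMBER() OVER (PARTITION BY what_id
                              ORDER BY activity_date DESC) AS rn
    FROM task
    WHERE rn = 1
"""
try:
    conn.execute(same_level_sql).fetchall()
    same_level_failed = False
except sqlite3.OperationalError as exc:
    # SQLite refuses to use a window function inside WHERE.
    same_level_failed = True
    print("rejected:", exc)

# Attempt 2: compute rn in a derived table, filter one level further out.
nested_sql = """
    SELECT what_id, activity_date
    FROM (
        SELECT what_id, activity_date,
               ROW_NUMBER() OVER (PARTITION BY what_id
                                  ORDER BY activity_date DESC) AS rn
        FROM task
    ) x
    WHERE rn = 1
    ORDER BY what_id
"""
latest_tasks = conn.execute(nested_sql).fetchall()
print(latest_tasks)
```

The reason is evaluation order: WHERE is applied before window functions are computed, so the row number only exists once the inner query has finished.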

SQL INTERSECT not supported in Phoenix: alternative for INTERSECT in Phoenix?

I have the following SQL expression:
SELECT SS_ITEM_SK AS POP_ITEM_SK
FROM (SELECT SS_ITEM_SK
FROM (SELECT SS_ITEM_SK,(ITEM_SOLD-ITEM_RETURNED) AS TOT_SOLD_QTY FROM (SELECT SS_ITEM_SK,COUNT(SS_ITEM_SK) AS ITEM_SOLD,COUNT(SR_ITEM_SK) AS ITEM_RETURNED FROM STORE_SALES1 right outer join STORE_RETURNS1 on SS_TICKET_NUMBER = SR_TICKET_NUMBER AND SS_ITEM_SK = SR_ITEM_SK GROUP BY SS_ITEM_SK)))
INTERSECT
SELECT CS_ITEM_SK AS POP_ITEM_SK FROM (SELECT CS_ITEM_SK
FROM (SELECT CS_ITEM_SK,(ITEM_SOLD-ITEM_RETURNED) AS TOT_SOLD_QTY FROM (SELECT CS_ITEM_SK,COUNT(CS_ITEM_SK) AS ITEM_SOLD,COUNT(CR_ITEM_SK) AS ITEM_RETURNED FROM CATALOG_SALES1 right outer join CATALOG_RETURNS1 on CS_ORDER_NUMBER = CR_ORDER_NUMBER and CS_ITEM_SK = CR_ITEM_SK GROUP BY CS_ITEM_SK)))
INTERSECT
SELECT WS_ITEM_SK AS POP_ITEM_SK FROM (SELECT WS_ITEM_SK
FROM (SELECT WS_ITEM_SK,(ITEM_SOLD-ITEM_RETURNED) AS TOT_SOLD_QTY FROM (SELECT WS_ITEM_SK,COUNT(WS_ITEM_SK) AS ITEM_SOLD,COUNT(WR_ITEM_SK) AS ITEM_RETURNED FROM WEB_SALES1 right outer join WEB_RETURNS1 on WS_ORDER_NUMBER = WR_ORDER_NUMBER AND WS_ITEM_SK = WR_ITEM_SK GROUP BY WS_ITEM_SK)))
Apache Phoenix does not support the keyword INTERSECT. Can somebody please help me rewrite the above query without using INTERSECT?
I think there are multiple ways you can do this:
Join Method
select * from ((query1 inner join query2 on column_names) inner join query3 on column_names)
Exists Method
(query1 where exists (query2 where exists (query3)) )
In Method
(query1 where column_name in (query2 where column_name in (query3)) )
References: https://blog.jooq.org/2015/10/06/you-probably-dont-use-sql-intersect-or-except-often-enough/
and http://phoenix.apache.org/subqueries.html
I would prefer EXISTS or IN over the join, though: if these queries return a lot of data, the join version may need optimizing as described here:
https://phoenix.apache.org/joins.html
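To make the three rewrites concrete, here is a minimal sketch that compares them against a real INTERSECT. It runs on SQLite through Python's sqlite3 module, not on Phoenix, and the three single-column tables are hypothetical stand-ins for the three aggregated subqueries in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the store, catalog and web subqueries.
    CREATE TABLE store_items (item_sk INTEGER);
    CREATE TABLE catalog_items (item_sk INTEGER);
    CREATE TABLE web_items (item_sk INTEGER);
    INSERT INTO store_items VALUES (1), (2), (2), (3);
    INSERT INTO catalog_items VALUES (2), (3), (4);
    INSERT INTO web_items VALUES (2), (3), (3), (5);
""")

queries = {
    # Reference result, using the operator that is to be replaced.
    "intersect": """
        SELECT item_sk FROM store_items
        INTERSECT
        SELECT item_sk FROM catalog_items
        INTERSECT
        SELECT item_sk FROM web_items
        ORDER BY item_sk
    """,
    # INTERSECT removes duplicates, so every rewrite needs DISTINCT.
    "join": """
        SELECT DISTINCT s.item_sk
        FROM store_items s
        JOIN catalog_items c ON c.item_sk = s.item_sk
        JOIN web_items w ON w.item_sk = s.item_sk
        ORDER BY s.item_sk
    """,
    "exists": """
        SELECT DISTINCT s.item_sk
        FROM store_items s
        WHERE EXISTS (SELECT 1 FROM catalog_items c WHERE c.item_sk = s.item_sk)
          AND EXISTS (SELECT 1 FROM web_items w WHERE w.item_sk = s.item_sk)
        ORDER BY s.item_sk
    """,
    "in": """
        SELECT DISTINCT s.item_sk
        FROM store_items s
        WHERE s.item_sk IN (SELECT item_sk FROM catalog_items)
          AND s.item_sk IN (SELECT item_sk FROM web_items)
        ORDER BY s.item_sk
    """,
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

One difference to keep in mind: INTERSECT treats two NULLs as equal, while the join, EXISTS and IN rewrites compare with = and therefore never match NULL keys.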

SQL - select only newest record with WHERE clause

I have been trying to get some data off our database but got stuck when I needed to only get the newest file upload for each file type. I have done this before using the WHERE clause but this time there is an extra table involved that is needed to determine the file type.
My query looks like this so far, and I am getting six records for this user (2x filetypeNo4 and 4x filetypeNo2).
SELECT db_file.fileID
,db_profile.NAME
,db_applicationFileType.fileTypeID
,db_file.dateCreated
FROM db_file
LEFT JOIN db_applicationFiles
ON db_file.fileID = db_applicationFiles.fileID
LEFT JOIN db_profile
ON db_applicationFiles.profileID = db_profile.profileID
LEFT JOIN db_applicationFileType
ON db_applicationFiles.fileTypeID = db_applicationFileType.fileTypeID
WHERE db_profile.profileID IN ('19456')
AND db_applicationFileType.fileTypeID IN ('2','4')
I have the WHERE clause looking like this which is not working:
(db_file.dateCreated IS NULL
OR db_file.dateCreated = (
SELECT MAX(db_file.dateCreated)
FROM db_file left join
db_applicationFiles on db_file.fileID = db_applicationFiles.fileID
WHERE db_applicationFileType.fileTypeID = db_applicationFiles.FiletypeID
))
Sorry I am a noob so this may be really simple, but I just learn this stuff as I go on my own..
SELECT
ff.fileID,
pf.NAME,
ff.fileTypeID,
ff.dateCreated
FROM db_profile pf
OUTER APPLY
(
SELECT TOP 1 af.fileTypeID, df.dateCreated, df.fileID
FROM db_file df
INNER JOIN db_applicationFiles af
ON df.fileID = af.fileID
WHERE af.profileID = pf.profileID
AND af.fileTypeID IN ('2','4')
ORDER BY df.dateCreated DESC
) ff
WHERE pf.profileID IN ('19456')
It also looks like all of your joins are effectively INNER, because the WHERE clause filters on columns of the joined tables. The exception is a profile without files, which is why this uses OUTER APPLY instead of CROSS APPLY.
What about the obvious approach? (Note that this relies on MySQL-style lenient GROUP BY handling; SQL Server rejects it.)
SELECT * FROM
(SELECT * FROM db_file ORDER BY dateCreated DESC) AS files1
GROUP BY fileTypeID ;
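For the original goal, the newest file per file type for one profile, the WHERE-clause idea from the question can work if the subquery is correlated on both the profile and the file type. The following is a minimal sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the two tables are reduced to the columns that matter, and the sample rows (including the second profile 99999) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical versions of the tables from the question.
    CREATE TABLE db_file (fileID INTEGER PRIMARY KEY, dateCreated TEXT);
    CREATE TABLE db_applicationFiles (fileID INTEGER, profileID INTEGER, fileTypeID INTEGER);
    INSERT INTO db_file VALUES
        (1, '2019-01-01'), (2, '2019-02-01'), (3, '2019-03-01'), (4, '2019-04-01'),
        (5, '2019-05-01'), (6, '2019-06-01'), (7, '2019-07-01');
    INSERT INTO db_applicationFiles VALUES
        (1, 19456, 2), (2, 19456, 2), (3, 19456, 2), (4, 19456, 2),
        (5, 19456, 4), (6, 19456, 4),
        (7, 99999, 2);  -- newer file of type 2, but for another profile
""")

newest_per_type_sql = """
    SELECT f.fileID, af.fileTypeID, f.dateCreated
    FROM db_file f
    JOIN db_applicationFiles af ON af.fileID = f.fileID
    WHERE af.profileID = 19456
      AND af.fileTypeID IN (2, 4)
      AND f.dateCreated = (
            -- Newest upload for the same profile and the same file type.
            SELECT MAX(f2.dateCreated)
            FROM db_file f2
            JOIN db_applicationFiles af2 ON af2.fileID = f2.fileID
            WHERE af2.profileID = af.profileID
              AND af2.fileTypeID = af.fileTypeID)
    ORDER BY af.fileTypeID
"""
newest_per_type = conn.execute(newest_per_type_sql).fetchall()
print(newest_per_type)
```

The inner query uses its own aliases (f2, af2) and refers to the outer row only in its WHERE clause. Without the profile condition, file 7 from the other profile would be the newest of type 2 and the outer row for profile 19456 would not match it.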

Combine two queries, one based upon the other, into one

I have two queries, one based partly on the other. Is there a way of combining them into a single query?
SELECT tblIssues.*, tblIssues.NewsletterLookup
FROM tblIssues
WHERE (((tblIssues.NewsletterLookup)=5));
SELECT tblArea.ID, tblArea.AreaName
FROM tblArea LEFT JOIN Query2 ON tblArea.ID = Query2.[AreaLookup]
WHERE (((tblArea.Dormant)=False) AND ((Query2.tblIssues.NewsletterLookup) Is Null));
If you want to do this in a single query without Query2, you can use the equivalent SQL from Query2 as a subquery in your second example:
SELECT a.ID, a.AreaName
FROM
tblArea AS a
LEFT JOIN
(
SELECT i.*
FROM tblIssues AS i
WHERE i.NewsletterLookup=5
) AS sub
ON a.ID = sub.[AreaLookup]
WHERE
a.Dormant=False
AND sub.NewsletterLookup Is Null;
You meant to perform a JOIN like
SELECT ti.*, ta.ID, ta.AreaName
FROM tblArea ta
LEFT JOIN tblIssues ti ON ta.ID = ti.[AreaLookup]
WHERE (ti.NewsletterLookup=5 OR ti.NewsletterLookup Is Null)
AND ta.Dormant=False;
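The two answers above are not interchangeable, and a small test shows where they differ. This is a sketch only: it uses SQLite via Python's sqlite3 module instead of Access, with 0 standing in for False, and the sample areas and issues are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data for the two tables in the question.
    CREATE TABLE tblArea (ID INTEGER PRIMARY KEY, AreaName TEXT, Dormant INTEGER);
    CREATE TABLE tblIssues (AreaLookup INTEGER, NewsletterLookup INTEGER);
    INSERT INTO tblArea VALUES
        (1, 'North', 0), (2, 'South', 0), (3, 'East', 0), (4, 'West', 1);
    INSERT INTO tblIssues VALUES
        (1, 5),   -- North has an issue of newsletter 5
        (2, 7);   -- South only has an issue of another newsletter
""")

# Filter inside the subquery: active areas WITHOUT an issue of newsletter 5.
subquery_sql = """
    SELECT a.ID
    FROM tblArea AS a
    LEFT JOIN (
        SELECT i.* FROM tblIssues AS i WHERE i.NewsletterLookup = 5
    ) AS sub ON a.ID = sub.AreaLookup
    WHERE a.Dormant = 0
      AND sub.NewsletterLookup IS NULL
    ORDER BY a.ID
"""

# Filter in the outer WHERE: areas with a newsletter-5 issue OR with no issue at all.
where_sql = """
    SELECT ta.ID
    FROM tblArea ta
    LEFT JOIN tblIssues ti ON ta.ID = ti.AreaLookup
    WHERE (ti.NewsletterLookup = 5 OR ti.NewsletterLookup IS NULL)
      AND ta.Dormant = 0
    ORDER BY ta.ID
"""

areas_without_issue_5 = [row[0] for row in conn.execute(subquery_sql)]
areas_from_where_filter = [row[0] for row in conn.execute(where_sql)]
print(areas_without_issue_5)
print(areas_from_where_filter)
```

With this data the subquery version returns South and East, the areas that still lack an issue of newsletter 5. The plain join keeps North, which already has one, and drops South, because its only issue belongs to a different newsletter.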