New to SQL writing a query with nesting and joins

New to SQL writing a query with nesting and joins - sql

Here is my query I am not sure why it wont run. it doesnt seem to like the joins. It ran without the nesting but now the now the joins wont work.
select *
from (
select * ,
row_number () over (partition by t.activity_type__c order by t.ActivityDate desc) ,
x.name,
MAX(t.ActivityDate) as Last_Activity_Date,
/* all these below x or a alias? */
x.Channel__c,
x.Account_18_Digit_ID__c,
x.Advisor_Approach__c,
x.name,
x.BillingState,
x.Current_Month_WT_AUM__c,
x.WT_ETF_AUM_mil__c,
x.ETF_AUM__c,
x.WT_ETF_Market_Share__c,
x.priority_type__C,
x.phone,
/* x.ownerid,
x.ID ? */
rn
from account a
) where rn=1
join [User] u on u.id = x.OwnerId
left join Task t on t.WhatId = x.Id
where t.Activity_Type__c <> 'attempt' and
( Advisor_Approach__c like 'CAPFINANCIAL_SECURITIES%' )

There's a lot going on here, so to start out we need to format it with better indentation. This helps make it obvious we have two where clauses at the same nesting level, one of which is out of place (before the JOINs).
Looking deeper, I see a MAX() function, but it's not allowed in this context unless you also have a GROUP BY clause. We're also missing an alias for the inner query... perhaps this is where the x is supposed to come from? And the inner nested query references columns from tables in the outer query, which are not yet available. Also, you can't use a windowing function result at the same level of nesting, and I don't see what the User table is needed for. After we fix most of this, we also no longer need to nest the accounts table by itself.
This is closest I could come to fixing all these issues, but I know it's still wrong because of (at least) the MAX() function:
select *
from (
select a.name,
/* MAX(t.ActivityDate) as Last_Activity_Date, */
a.Channel__c,
a.Account_18_Digit_ID__c,
a.Advisor_Approach__c,
a.name,
a.BillingState,
a.Current_Month_WT_AUM__c,
a.WT_ETF_AUM_mil__c,
a.ETF_AUM__c,
a.WT_ETF_Market_Share__c,
a.priority_type__C,
a.phone,
/* a.ownerid, a.ID ? */
row_number () over (partition by t.activity_type__c order by t.ActivityDate desc) as rn
from account a
join [User] u on u.id = a.OwnerId
left join Task t on t.WhatId = a.Id
where t.Activity_Type__c <> 'attempt'
and ( Advisor_Approach__c like 'CAPFINANCIAL_SECURITIES%' )
) x
where rn = 1

It seems you have the query upside-down. You seem to be trying to join the lastest task per activity type to the account, but instead of selecting from account and then joining the latest tasks, you are trying to select the lastest task from the account table somehow, which cannot work of course.
Something along the lines of:
select *
from account a
join [User] u on u.id = a.ownerid
left join
(
select
task.*,
row_number() over (partition by activity_type__c order by activitydate desc) as rn
from task
where activity_type__c <> 'attempt'
) t on t.whatid = a.id and t.rn = 1
where a.advisor_approach__c like 'CAPFINANCIAL_SECURITIES%'
order by a.id, t.activity_type__c;

Related

postgres: LEFT JOIN table and field does not exist

This is my query
SELECT org.id,
org.name,
org.type,
org.company_logo,
(SELECT org_profile.logo_url FROM org_profile WHERE org_profile.org_id = org.id AND org_profile.status = 'active' ORDER BY org_profile.id DESC LIMIT 1) as logo_url,
org.short_description,
org_profile.value_prop,
count(*) OVER () AS total
FROM org
LEFT JOIN user_info ON user_info.id = org.approved_by
INNER JOIN (select distinct org_profile.org_id from org_profile) org_profile ON org_profile.org_id = org.id
WHERE
org.type = 'Fintech'
AND org.status = 'APPROVED'
AND org.draft != TRUE
ORDER BY org.id DESC
I am using LEFT JOIN query with my org_profile table. I used distinct for unique org id but the problem is org_profile.value_prop column does not work. The error is showing column org_profile.value_prop does not exist
I'm trying to solve this issue. But I don't figure out this issue.

basically, the error informs that you try to get the value_prop field from org_profile subquery, which basically doesn't exist.
It's difficult to give a working query by writting just on the paper, but I think that:
it's worth to apply the handy aliasses for each subquery
deduplication, if required, should be done in the subquery. When multiple fields used DISTINCT may be insufficient - RANK function may be required.
you make some operations to get the logo_url by a scalar subquery - it seems a bit strange, especially the same table is used in JOIN - I would suggest to move all logic related to org_profile to the subquery. Scalar expressions would throw an error in case multiple values would be found in output.
SELECT
org.id,
org.name,
org.type,
org.company_logo,
prof.logo_url,
org.short_description,
prof.value_prop,
count(*) OVER () AS total
FROM org
JOIN (
select distinct org_id, logo_url, value_prop -- other deduplication type (RANK) may be required
from org_profile
where status = 'active' -- ?
) prof ON org.id = prof.org_id
LEFT JOIN user_info usr ON usr.id = org.approved_by
WHERE
org.type = 'Fintech'
AND org.status = 'APPROVED'
AND org.draft != TRUE
ORDER BY org.id DESC

SQL not efficient enough, tuning assistance required

We have some SQL that is ok on smaller data volumes but poor once we scale up to selecting from larger volumes. Is there a faster alternative style to achieve the same output as below? The idea is to pull back a single unique row to get latest version of the data... The SQL does reference another view but this view runs very fast - so we expect the issue is here below and want to try a different approach
SELECT *
FROM
(SELECT (select CustomerId from PremiseProviderVersionsToday
where PremiseProviderId = b.PremiseProviderId) as CustomerId,
c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
ROW_NUMBER() OVER (PARTITION BY b.PremiseProviderId
ORDER BY a.effectiveDate DESC) AS rowNumber
FROM PremiseMeterProviderVersions a, PremiseProviders b,
PremiseMeterProviders c
WHERE (a.TransactionDateTimeEnd IS NULL
AND a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND b.PremiseProviderId = c.PremiseProviderId)
) data
WHERE data.rowNumber = 1

As Bilal Ayub stated above, the correlated subquery can result in performance issues. See here for more detail. Below are my suggestions:
Change all to explicit joins (ANSI standard)
Use aliases that are more descriptive than single characters (this is mostly to help readers understand what each table does)
Convert data subquery to a temp table or cte (temp tables and ctes usually perform better than subqueries)
Note: normally, you should explicitly create and insert into your temp table but I chose not to do that here as I do not know the data types of your columns.
SELECT d.CustomerId
, c.D3001_MeterId
, b.CoreSPID
, a.EnteredBy
, rowNumber = ROW_NUMBER() OVER(PARTITION BY b.PremiseProviderId ORDER BY a.effectiveDate DESC)
INTO #tmp_RowNum
FROM PremiseMeterProviderVersions a
JOIN PremiseMeterProviders c ON c.PremiseMeterProviderId = a.PremiseMeterProviderId
JOIN PremiseProviders b ON b.PremiseProviderId = c.PremiseProviderId
JOIN PremiseProviderVersionsToday d ON d.PremiseProviderId = b.PremiseProviderId
WHERE a.TransactionDateTimeEnd IS NULL
SELECT *
FROM #tmp_RowNum
WHERE rowNumber = 1

You are running a correlated query that will run in loop, if size of table is small it will be faster, i would suggest to change it and try to join the table in order to get customerid.
(select CustomerId from PremiseProviderVersionsToday where PremiseProviderId = b.PremiseProviderId) as CustomerId

Consider derived tables including an aggregate query that calculates maximum EffectoveDate by PremiseProviderId and unit level query, each using explicit joins (current ANSI SQL standard) and not implicit as you currently use:
SELECT data.*
FROM
(SELECT t.CustomerId, c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
b.PremiseProviderId, a.EffectiveDate
FROM PremiseMeterProviders c
INNER JOIN PremiseMeterProviderVersions a
ON a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND a.TransactionDateTimeEnd IS NULL
INNER JOIN PremiseProviders b
ON b.PremiseProviderId = c.PremiseProviderId
INNER JOIN PremiseProviderVersionsToday t
ON t.PremiseProviderId = b.PremiseProviderId
) data
INNER JOIN
(SELECT b.PremiseProviderId, MAX(a.EffectiveDate) As MaxEffDate
FROM PremiseMeterProviders c
INNER JOIN PremiseMeterProviderVersions a
ON a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND a.TransactionDateTimeEnd IS NULL
INNER JOIN PremiseProviders b
ON b.PremiseProviderId = c.PremiseProviderId
GROUP BY b.PremiseProviderId
) agg
ON data.PremiseProviderId = agg.PremiseProviderId
AND data.EffectiveDate = agg.MaxEffDate

aggregate functions are not allowed in WHERE

I am using this query to find the unique records by latest date using postgresql. The error I am having is "aggregate functions are not allowed in WHERE". How to fix error “aggregate functions are not allowed in WHERE” Following this link I have tried to use inner select function. But this did not work. Please help me to edit the query. I am using PgAdmin III as client.
SELECT Distinct t1.pa_serial_
,t1.homeownerm_name
,t1.districtvdc
,t1.date as firstrancheinspection_date
,t1.status
,t1.name_of_data_collector
,t1.fulcrum_id
,first_tranche_inspection_v2_reporting_questionnaire.date_reporting
From first_tranche_inspection_v2 t1
LEFT JOIN first_tranche_inspection_v2_reporting_questionnaire ON (t1.fulcrum_id = first_tranche_inspection_v2_reporting_questionnaire.fulcrum_parent_id)
where first_tranche_inspection_v2_reporting_questionnaire.date_reporting = (
select Max(first_tranche_inspection_v2_reporting_questionnaire.date_reporting)
from first_tranche_inspection_v2
where first_tranche_inspection_v2.pa_serial_ = t1.pa_serial_
);

You want to join the latest reporting questionaire per inspection. In PostgreSQL you can use DISTINCT ON for this:
select fti.*, rq.*
from first_tranche_inspection_v2 fti
left join
(
select distinct on (fulcrum_parent_id) *
from first_tranche_inspection_v2_reporting_questionnaire
order by fulcrum_parent_id, date_reporting desc
) rq on rq.fulcrum_parent_id = fti.fulcrum_id;
Or use standard SQL's ROW_NUMBER:
select fti.*, rq.*
from first_tranche_inspection_v2 fti
left join
(
select
ftirq.*,
row_number() over (partition by fulcrum_parent_id order by date_reporting desc) as rn
from first_tranche_inspection_v2_reporting_questionnaire ftirq
) rq on rq.fulcrum_parent_id = fti.fulcrum_id and rq.rn = 1;
What you were trying to do should look like this:
select fti.*, rq.*
from first_tranche_inspection_v2 fti
left join first_tranche_inspection_v2_reporting_questionnaire rq
on rq.fulcrum_parent_id = fti.fulcrum_id
and (rq.fulcrum_parent_id, rq.date_reporting) in
(
select fulcrum_parent_id, max(date_reporting)
from first_tranche_inspection_v2_reporting_questionnaire
group by fulcrum_parent_id
);
This works, too, and only has the disadvantage that you read the table first_tranche_inspection_v2_reporting_questionnaire twice.

DISTINCT often ends up being implemented with a GROUP BY query in many RDBMS. What I think is happening in your current query is that there is already an implicit aggregation involving the columns in your SELECT. Hence, the correlated subquery involving MAX() actually is an aggregation because of the DISTINCT.
One quick workaround might be to perform the original query without DISTINCT, then subquery the result set to retain only distinct records:
WITH cte AS (
SELECT t1.pa_serial_,
t1.homeownerm_name,
t1.districtvdc,
t1.date as firstrancheinspection_date,
t1.status,
t1.name_of_data_collector,
t1.fulcrum_id,
t2.date_reporting
FROM first_tranche_inspection_v2 t1
LEFT JOIN first_tranche_inspection_v2_reporting_questionnaire t2
ON t1.fulcrum_id = t2.fulcrum_parent_id
WHERE t2.date_reporting = (SELECT MAX(t.date_reporting)
FROM first_tranche_inspection_v2 t
WHERE t.pa_serial_ = t1.pa_serial_)
);
SELECT DISTINCT t.pa_serial_,
t.homeownerm_name,
t.districtvdc,
t.firstrancheinspection_date,
t.status,
t.name_of_data_collector,
t.fulcrum_id,
t.date_reporting
FROM cte t
Note that I went ahead and added an alias to the second table in your join, which leaves the query much easier to read.

SQL - select only newest record with WHERE clause

I have been trying to get some data off our database but got stuck when I needed to only get the newest file upload for each file type. I have done this before using the WHERE clause but this time there is an extra table involved that is needed to determine the file type.
My query looks like this so far and i am getting six records for this user (2x filetypeNo4 and 4x filetypeNo2).
SELECT db_file.fileID
,db_profile.NAME
,db_applicationFileType.fileTypeID
,> db_file.dateCreated
FROM db_file
LEFT JOIN db_applicationFiles
ON db_file.fileID = db_applicationFiles.fileID
LEFT JOIN db_profile
ON db_applicationFiles.profileID = db_profile.profileID
LEFT JOIN db_applicationFileType
ON db_applicationFiles.fileTypeID = > > db_applicationFileType.fileTypeID
WHERE db_profile.profileID IN ('19456')
AND db_applicationFileType.fileTypeID IN ('2','4')
I have the WHERE clause looking like this which is not working:
(db_file.dateCreated IS NULL
OR db_file.dateCreated = (
SELECT MAX(db_file.dateCreated)
FROM db_file left join
db_applicationFiles on db_file.fileID = db_applicationFiles.fileID
WHERE db_applicationFileType.fileTypeID = db_applicationFiles.FiletypeID
))
Sorry I am a noob so this may be really simple, but I just learn this stuff as I go on my own..

SELECT
ff.fileID,
pf.NAME,
ff.fileTypeID,
ff.dateCreated
FROM db_profile pf
OUTER APPLY
(
SELECT TOP 1 af.fileTypeID, df.dateCreated, df.fileID
FROM db_file df
INNER JOIN db_applicationFiles af
ON df.fileID = af.fileID
WHERE af.profileID = pf.profileID
AND af.fileTypeID IN ('2','4')
ORDER BY create_date DESC
) ff
WHERE pf.profileID IN ('19456')
And it looks like all of your joins are actually INNER. Unless there may be profile without files (that's why OUTER apply instead of CROSS).

What about an obvious:
SELECT * FROM
(SELECT * FROM db_file ORDER BY dateCreated DESC) AS files1
GROUP BY fileTypeID ;

"Simple" SQL Query

Each of my clients can have many todo items and every todo item has a due date.
What would be the query for discovering the next undone todo item by due date for each file? In the event that a client has more than one todo, the one with the lowest id is the correct one.
Assuming the following minimal schema:
clients (id, name)
todos (id, client_id, description, timestamp_due, timestamp_completed)
Thank you.

I haven't tested this yet, so you may have to tweak it:
SELECT
TD1.client_id,
TD1.id,
TD1.description,
TD1.timestamp_due
FROM
Todos TD1
LEFT OUTER JOIN Todos TD2 ON
TD2.client_id = TD1.client_id AND
TD2.timestamp_completed IS NULL AND
(
TD2.timestamp_due < TD1.timestamp_due OR
(TD2.timestamp_due = TD1.timestamp_due AND TD2.id < TD1.id)
)
WHERE
TD2.id IS NULL
Instead of trying to sort and aggregate, you're basically answering the question, "Is there any other todo that would come before this one?" (based on your definition of "before"). If not, then this is the one that you want.
This should be valid on most SQL platforms.

This question is the classic pick-a-winner for each group. It gets posted about twice a day.
SELECT *
FROM todos t
WHERE t.timestamp_completed is null
and
(
SELECT top 1 t2.id
FROM todos t2
WHERE t.client_id = t2.client_id
and t2.timestamp_completed is null
--there is no earlier record
and
(t.timestamp_due > t2.timestamp_due
or (t.timestamp_due = t2.timestamp_due and t.id > t2.id)
)
) is null

SELECT c.name, MIN(t.id)
FROM clients c, todos t
WHERE c.id = t.client_id AND t.timestamp_complete IS NULL
GROUP BY c.id
HAVING t.timestamp_due <= MIN(t.timestamp_due)
Avoids a subquery, correlated or otherwise but introduces a bunch of aggregate operations which aren't much better.

Some Jet SQL, I realize it is unlikely that the questioner is using Jet, however the reader may be.
SELECT c.name, t.description, t.timestamp_due
FROM (clients c
INNER JOIN
(SELECT t.client_id, Min(t.id) AS MinOfid
FROM todos t
WHERE t.timestamp_completed Is Null
GROUP BY t.client_id) AS tm
ON c.id = tm.client_id)
INNER JOIN todos t ON tm.MinOfid = t.id

The following should get you close, first get the min time for each client, then lookup the client/todo information
SELECT
C.Id,
C.Name,
T.Id
T.Description,
T.timestamp_due
FROM
{
SELECT
client_id,
MIN(timestamp_due) AS "DueDate"
FROM todos
WHERE timestamp_completed IS NULL
GROUP BY ClientId
} AS MinValues
INNER JOIN Clients C
ON (MinValues.client_id = C.Id)
INNER JOIN todos T
ON (MinValues.client_id = T.client_id
AND MinValues.DueDate = T.timestamp_due)
ORDER BY C.Name
NOTE: Written assuming SQL Server

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

New to SQL writing a query with nesting and joins - sql

Related

postgres: LEFT JOIN table and field does not exist

SQL not efficient enough, tuning assistance required

aggregate functions are not allowed in WHERE

SQL - select only newest record with WHERE clause

"Simple" SQL Query

Categories

Resources