Is using two inner joins best for optimizing a query? - sql

I just got a challenge from school: optimize this query. This is a theoretical question.
Challenge:
SELECT TO_CHAR(CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date"),'YYYY-MM') AS "date_month",
COUNT(DISTINCT CASE WHEN (tableB."date" IS NOT NULL) THEN tableB._id ELSE NULL END) AS "tableB.countB",
COUNT(DISTINCT CASE WHEN (tableC."date" IS NOT NULL) THEN tableC._id ELSE NULL END) AS "tableC.countC"
FROM tableA AS tableA
LEFT JOIN tableB AS tableB ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableB."date"))) = (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date")))
LEFT JOIN tableC AS tableC ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableC."date"))) = (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date")))
WHERE tableA."date" >= CONVERT_TIMEZONE ('America/Los_Angeles','UTC',DATEADD (month,-17,DATE_TRUNC('month',DATE_TRUNC('day',CONVERT_TIMEZONE ('UTC','America/Los_Angeles',GETDATE ()))))))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
To optimize it, I just removed the CASE expressions from the query above; I think this will also improve the query's efficiency:
SELECT To_char(Convert_timezone ('UTC','America/Los_Angeles',tablea."date"),'YYYY-MM') AS "date_month",
Count(DISTINCT
decode(tableb."date", null, null, tableb._id))
AS "tableB.countB",
Count(DISTINCT
decode(tablec."date", null, null, tablec._id))
AS "tableC.countC"
FROM tablea AS tablea
LEFT JOIN tableb AS tableb
ON (
Date (Convert_timezone ('UTC','America/Los_Angeles',tableb."date"))) = (Date (Convert_timezone ('UTC','America/Los_Angeles',tablea."date")))
LEFT JOIN tablec AS tablec
ON (
Date (Convert_timezone ('UTC','America/Los_Angeles',tablec."date"))) = (Date (Convert_timezone ('UTC','America/Los_Angeles',tablea."date")))
WHERE tablea."date" >= convert_timezone ('America/Los_Angeles','UTC',Dateadd (month,-17,Date_trunc('month',Date_trunc('day',Convert_timezone ('UTC','America/Los_Angeles',Getdate ()))))))
GROUP BY 1
ORDER BY 1 DESC
LIMIT 500;
What do you suggest? If we remove one of the left joins and merge the statements, is that fine for optimization?

... or, use shorter aliases that actually make the SQL shorter and cleaner; this also helps readability. Also, format the query so that the clauses (SELECT, FROM, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, etc.) are easy to distinguish by eye, and use indentation that is consistent with the logical structure, so that it supports rather than hinders your ability to separate those sections from one another.
Just as an example, here's your first SQL query reformatted, but identical in logical structure to what you posted:
SELECT TO_CHAR(CONVERT_TIMEZONE ('UTC','America/Los_Angeles', a.date),'YYYY-MM') date_month,
COUNT(DISTINCT CASE WHEN (b."date" IS NOT NULL) THEN b._id ELSE NULL END) countB,
COUNT(DISTINCT CASE WHEN (c."date" IS NOT NULL) THEN c._id ELSE NULL END) countC
FROM tableA a
LEFT JOIN tableB b
ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',b.date))) =
(DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',a.date)))
LEFT JOIN tableC c
ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',c.date))) =
(DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',a.date)))
WHERE a.date >= CONVERT_TIMEZONE ('America/Los_Angeles', 'UTC',
DATEADD (month,-17,DATE_TRUNC('month',
DATE_TRUNC('day',CONVERT_TIMEZONE ('UTC','America/Los_Angeles',
GETDATE ()))))))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
Here is an optimized version:
SELECT TO_CHAR(DATEADD(hour, -8, a.date), 'YYYY-MM') date_month, -- fixed 8-hour offset; ignores daylight saving time
sum(case when b.date is Not null then 1 else 0 end) countb,
sum(case when c.date is Not null then 1 else 0 end) countc
FROM tableA a
LEFT JOIN tableB b
ON b.Date = a.Date -- Timezone offsets are not necessary,
LEFT JOIN tableC c
ON c.date = a.date -- both in same timezone
WHERE a.date >= DateAdd(hour, 8,
DATEADD (month,-17,DATE_TRUNC('month',
GETDATE () )))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;

Presumably, the _id columns are unique. So:
SELECT TO_CHAR(CONVERT_TIMEZONE('UTC','America/Los_Angeles', a."date"), 'YYYY-MM') AS date_month,
SUM(CASE WHEN b."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableB_countB,
SUM(CASE WHEN c."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableC_countC
FROM tableA a LEFT JOIN
tableB b
ON DATE(CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles', b."date")) = DATE(CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles', a."date")) LEFT JOIN
tableC c
ON DATE(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', c."date")) = DATE(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', a."date"))
WHERE a."date" >= CONVERT_TIMEZONE('America/Los_Angeles', 'UTC',
DATEADD(month, -17, DATE_TRUNC('month', DATE_TRUNC('day', CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', GETDATE ()))))))
GROUP BY 1
ORDER BY 1 DESC
LIMIT 500;
Then, the date conversions in the ON clause don't seem necessary, because the two sides are being converted from the same time zone. If the values have no time component (as suggested by a name like date), then the DATE() is not needed either:
SELECT TO_CHAR(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', a."date"), 'YYYY-MM') AS date_month,
SUM(CASE WHEN b."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableB_countB,
SUM(CASE WHEN c."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableC_countC
FROM tableA a LEFT JOIN
tableB b
ON b."date" = a."date" LEFT JOIN
tableC c
ON c."date" = a."date"
WHERE a."date" >= CONVERT_TIMEZONE('America/Los_Angeles', 'UTC',
DATEADD(month, -17, DATE_TRUNC('month', DATE_TRUNC('day', CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', GETDATE ()))))))
GROUP BY 1
ORDER BY 1 DESC
LIMIT 500;
The WHERE clause is fine. It can take advantage of an index on a(date).
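One caveat worth checking before swapping COUNT(DISTINCT ...) for SUM(CASE ...): when tableB and tableC are both joined to tableA on the same date, each tableA row is paired with every combination of matching tableB and tableC rows, so a plain SUM counts combinations rather than distinct ids. Below is a minimal sketch using Python's sqlite3 module. The rows are invented, the column name d is a stand-in for "date", and the time zone conversions are left out, so treat it as an illustration rather than a reproduction of the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (d TEXT);
    CREATE TABLE tableB (_id INTEGER, d TEXT);
    CREATE TABLE tableC (_id INTEGER, d TEXT);
    -- Hypothetical data: one day with 2 rows in tableB and 3 rows in tableC.
    INSERT INTO tableA VALUES ('2019-11-03');
    INSERT INTO tableB VALUES (1, '2019-11-03'), (2, '2019-11-03');
    INSERT INTO tableC VALUES (10, '2019-11-03'), (11, '2019-11-03'), (12, '2019-11-03');
""")

# Both joins use the same key, so the joined result has 2 * 3 = 6 rows.
fanout_row = conn.execute("""
    SELECT COUNT(DISTINCT b._id)                            AS distinct_b,
           SUM(CASE WHEN b.d IS NOT NULL THEN 1 ELSE 0 END) AS summed_b,
           COUNT(DISTINCT c._id)                            AS distinct_c,
           SUM(CASE WHEN c.d IS NOT NULL THEN 1 ELSE 0 END) AS summed_c
    FROM tableA a
    LEFT JOIN tableB b ON b.d = a.d
    LEFT JOIN tableC c ON c.d = a.d
""").fetchone()

print(fanout_row)
```

If the distinct and summed figures differ on real data, the SUM rewrite is not equivalent; in that case keep COUNT(DISTINCT ...) or aggregate tableB and tableC separately before joining them to tableA.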

How to apply filter using a joined table

I'm trying to apply a filter to my query (accounts.provider = 'z') using the accounts table. The query I have at the moment does not apply the filter correctly: the full list of payments is added up, regardless of the provider condition. I'm using table x to join the accounts table because table t doesn't have the account_id column that would let me join it with the accounts table directly.
This is my current query:
SELECT
distinct on (x.day) x.day,
coalesce(pending_payments,0)
from
(( SELECT day::date
FROM generate_series(timestamp '2017-03-13', current_date + interval '1 week', interval '1 day') day
) d
left JOIN (
SELECT date_trunc('day', payment_date)::date AS day,
sum(case when payment_amount > 0
and description not ilike '%credit%'
and state = 'pending'
then payment_amount end) as pending_payments
FROM payments
GROUP BY 1
) t USING (day) inner join payments on payments.payment_date = t.day) x
inner join accounts on accounts.id = x.account_id and accounts.provider = 'z'
where day <= current_date + interval '1 week'
and day >= current_date - interval'6 months'
ORDER BY x.day desc
Thanks for your help
Updated query based on suggestions in the comments but it's not producing the right outcome (see comments).
SELECT
distinct on (t.day) t.day as day,
coalesce(pending_payments,0)
from
( SELECT day::date
FROM generate_series(timestamp '2017-03-13', current_date + interval '1 week', interval '1 day') day
) d
left JOIN (
SELECT date_trunc('day', t.payment_date)::date AS day,
sum(case when t.payment_amount > 0
and t.description not ilike '%credit%'
and t.state = 'success'
then t.payment_amount end) as pending_payments
FROM payments t
inner join payments p on p.payment_date = date_trunc('day', t.payment_date)::date
inner join accounts on accounts.id = p.account_id and accounts.provider = 'z'
where date_trunc('day', t.payment_date)::date <= current_date + interval '1 week'
and date_trunc('day', t.payment_date)::date >= current_date - interval'1 months'
GROUP BY 1
) t USING (day)
ORDER BY day desc
You are calculating pending_payments (in the sub-query) before applying the accounts.provider = 'z' condition, so the sum already includes every provider by the time the filter runs.
You should replace this code:
....
....
left JOIN (
SELECT date_trunc('day', payment_date)::date AS day,
sum(case when payment_amount > 0
and description not ilike '%credit%'
and state = 'pending'
then payment_amount end) as pending_payments
FROM payments
GROUP BY 1
) t USING (day) inner join payments on payments.payment_date = t.day) x
inner join accounts on accounts.id = x.account_id and accounts.provider = 'z'
....
....
with
....
....
left JOIN (
SELECT date_trunc('day', p.payment_date)::date AS day,
sum(case when p.payment_amount > 0
and p.description not ilike '%credit%'
and p.state = 'pending'
then p.payment_amount end) as pending_payments
FROM payments p
-- join accounts directly; a self-join on payment_date would count each payment once per payment made that day
inner join accounts on accounts.id = p.account_id and accounts.provider = 'z'
GROUP BY 1
) t
....
....
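To illustrate why the provider filter has to sit inside the aggregating sub-query, here is a small sketch using Python's sqlite3 module. The rows are invented, and it assumes payments carries an account_id column (as the join on x.account_id in the question suggests).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER, provider TEXT);
    CREATE TABLE payments (account_id INTEGER, payment_date TEXT, payment_amount REAL);
    -- Hypothetical data: account 1 uses provider 'z', account 2 does not.
    INSERT INTO accounts VALUES (1, 'z'), (2, 'y');
    INSERT INTO payments VALUES
        (1, '2017-03-13', 10),
        (2, '2017-03-13', 5),
        (1, '2017-03-14', 7);
""")

# Aggregating first sums every provider's payments for the day.
unfiltered = conn.execute("""
    SELECT payment_date, SUM(payment_amount)
    FROM payments
    GROUP BY payment_date
    ORDER BY payment_date
""").fetchall()

# Joining accounts inside the aggregate restricts the sum to provider 'z'.
filtered = conn.execute("""
    SELECT p.payment_date, SUM(p.payment_amount)
    FROM payments p
    INNER JOIN accounts a ON a.id = p.account_id AND a.provider = 'z'
    GROUP BY p.payment_date
    ORDER BY p.payment_date
""").fetchall()

print(unfiltered)
print(filtered)
```

Once the daily totals have been computed without the join, no later filter on accounts can take the other providers' amounts back out of them.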

SQL join not taking all records from another table

I have a query like this:
WITH CTE AS
(
SELECT
U.Name, U.Adluserid AS 'Empid',
MIN(CASE WHEN IOType = 0 THEN Edatetime END) AS 'IN',
MAX(CASE WHEN IOType = 1 THEN Edatetime END) AS 'out',
(CASE
WHEN MAX(E.Status) = 1 THEN 'AL'
WHEN MAX(E.Status) = 2 THEN 'SL'
ELSE 'L'
END) AS leave_status
FROM
Mx_ACSEventTrn
RIGHT JOIN
Mx_UserMst U ON Mx_ACSEventTrn.UsrRefcode = U.UserID
LEFT JOIN
Tbl_Zeo_Empstatus E ON Mx_ACSEventTrn.UsrRefcode = E.Emp_Id
WHERE
CAST(Edatetime AS DATE) BETWEEN '2019-11-03' AND '2019-11-03'
GROUP BY
U.Name, U.Adluserid
)
SELECT
[Name], [Empid], [IN], [OUT],
(CASE
WHEN CAST([IN] AS TIME) IS NULL THEN CAST(leave_status AS NVARCHAR(50))
WHEN CAST([IN] AS TIME) < CAST('08:15' AS TIME) THEN 'P'
ELSE 'L'
END) AS status
FROM
CTE
My employee master table Mx_UserMst has 67 employees, but the query shows only the few employees who have punched in. I want to show all employees from the employee master table.
I believe that the problem is the WHERE clause:
where cast(Edatetime as date) between '2019-11-03' and '2019-11-03'
Why not cast(Edatetime as date) = '2019-11-03'?
I'm not sure in which table the column Edatetime belongs (you should qualify all the columns with the correct table name/alias).
You must move the condition to an ON clause:
WITH CTE AS
(
select U.Name,U.Adluserid as 'Empid',
min(case when IOType=0 then Edatetime end) as 'IN',
max(case when IOType=1 then Edatetime end) as 'out',
case max(E.Status) when 1 then 'AL' when 2 then 'SL' else 'L' end as leave_status
from Mx_UserMst U
left join Mx_ACSEventTrn on Mx_ACSEventTrn.UsrRefcode=U.UserID and (cast(Edatetime as date) between '2019-11-03' and '2019-11-03')
left join Tbl_Zeo_Empstatus E on Mx_ACSEventTrn.UsrRefcode=E.Emp_Id
group by U.Name,U.Adluserid
)
SELECT [Name], [Empid],[IN],[OUT],
case
when cast([IN] as time) is null then cast(leave_status as nvarchar(50))
when cast([IN] as time) < cast('08:15' as time) then 'P'
else 'L'
end as status
FROM CTE
If Edatetime belongs to Tbl_Zeo_Empstatus move the condition to the next join's ON clause.
I also changed the RIGHT join to a LEFT join to make the statement more readable.
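The effect of moving the condition from WHERE to ON can be seen in a small sketch with Python's sqlite3 module. The users and events tables and their rows are hypothetical stand-ins for Mx_UserMst and Mx_ACSEventTrn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE events (user_id INTEGER, event_date TEXT);
    -- Hypothetical data: Ann punched in on the day, Bob did not.
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO events VALUES (1, '2019-11-03');
""")

# Condition in WHERE: Bob's NULL event_date fails the test, so his row is dropped.
where_rows = conn.execute("""
    SELECT u.name, e.event_date
    FROM users u
    LEFT JOIN events e ON e.user_id = u.user_id
    WHERE e.event_date = '2019-11-03'
    ORDER BY u.name
""").fetchall()

# Condition in ON: it only limits which events match; every user is kept.
on_rows = conn.execute("""
    SELECT u.name, e.event_date
    FROM users u
    LEFT JOIN events e ON e.user_id = u.user_id
                      AND e.event_date = '2019-11-03'
    ORDER BY u.name
""").fetchall()

print(where_rows)
print(on_rows)
```

In other words, a WHERE condition on the outer-joined table silently turns the LEFT JOIN into an inner join.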
If you want to keep everything in a particular table, then that should be the first table in the FROM clause. Subsequent joins should be LEFT JOINs and conditions on subsequent tables should be in the ON clause rather than the WHERE clause.
I would also advise you to use table aliases and to only use single quotes for string and date constants -- NOT column aliases.
The following assumes that IOType and Edatetime are in the table Mx_ACSEventTrn. I should not have to guess. You should qualify all column names in the query.
WITH CTE AS (
SELECT U.Name, U.Adluserid AS Empid,
MIN(CASE WHEN AE.IOType = 0 THEN AE.Edatetime END) AS in_dt,
MAX(CASE WHEN AE.IOType = 1 THEN AE.Edatetime END) AS out_dt,
(CASE WHEN MAX(ES.Status) = 1 THEN 'AL'
WHEN MAX(ES.Status) = 2 THEN 'SL'
ELSE 'L'
END) AS leave_status
FROM Mx_UserMst U LEFT JOIN
Mx_ACSEventTrn AE
ON AE.UsrRefcode = U.UserID AND
CAST(AE.Edatetime AS DATE) BETWEEN '2019-11-03' AND '2019-11-03' LEFT JOIN
Tbl_Zeo_Empstatus ES
ON AE.UsrRefcode = ES.Emp_Id
GROUP BY U.Name, U.Adluserid
)
SELECT Name, Empid, IN_DT, OUT_DT,
(CASE WHEN IN_DT IS NULL THEN leave_status
WHEN CAST(IN_DT AS TIME) < CAST('08:15' AS TIME) THEN 'P'
ELSE 'L'
END) AS status
FROM CTE;
Some more points:
Don't give aliases names like IN that are already keywords. That is why I used the name IN_DT.
There is no reason to cast to a TIME to compare to NULL.
I don't see a reason to cast to NVARCHAR(50) in the outer CASE expression.

How to use subquery in select clause that joins to from clause subquery

I am trying to join a subquery to a from clause subquery. However, doing so causes the following error:
SQL Error [208] [S0002]: Invalid object name 'transactions'
I am trying to combine multiple queries into a single query because the queries are almost identical; only a WHERE clause differs.
Here is one of my attempts:
SELECT
transactions.OpeningDateFormatted,
(SELECT SUM(transactions.amount)
FROM transactions
WHERE transactions.transactiontypeid = 5) AS AdjustmentSum,
(SELECT SUM(transactions.amount)
FROM transactions
WHERE transactions.transactiontypeid = 1) AS InterestSum
FROM
(SELECT
FORMAT(files.OpeningDate, 'yyyy-MM') as OpeningDateFormatted,
amount,
transactiontypeid
FROM
FilesTransactions
INNER JOIN
files ON files.id = filestransactions.exid
WHERE
FilesTransactions.TransactionDate BETWEEN '2015-10-15' AND '2019-10-15'
AND ExID IN (SELECT id FROM files
WHERE files.OpeningDate BETWEEN '2015-10-01' AND '2019-09-30'
AND files.CustomerID = 3258)) transactions
GROUP BY
transactions.OpeningDateFormatted
I have also tried the following, but it gives me the same amount for every month:
select
FORMAT(files.OpeningDate, 'yyyy-MM') as OpeningDateFormatted,
(select sum(FilesTransactions.Amount) as CollectedSum from FilesTransactions f2 join FilesTransactions on FilesTransactions.id=f2.id where f2.transactiontypeid = 5 and FORMAT(f2.TransactionDate, 'yyyy-MM') like FORMAT(FilesTransactions.TransactionDate, 'yyyy-MM') )
FROM
FilesTransactions
inner join files on files.id = filestransactions.exid
where
FilesTransactions.TransactionDate between '2015-10-15' and '2019-10-15' and
ExID in
(
select id from files where files.OpeningDate between '2015-10-15' and '2019-10-15' and files.CustomerID = 3258
)
GROUP BY
FORMAT(FilesTransactions.TransactionDate, 'yyyy-MM'), FORMAT(files.OpeningDate, 'yyyy-MM')
What I'd like to have is one query that gives me the following:
OpeningDateFormatted | AdjustmentSum | InterestSum
2015-11 0 45
2015-12 45.25 7
... ... ...
Consider using conditional aggregation and avoid the derived table entirely.
SELECT FORMAT(files.OpeningDate, 'yyyy-MM') as OpeningDateFormatted
, SUM(CASE when transactiontypeid = 5 then amount else 0 end) AS AdjustmentSum
, SUM(CASE when transactionTypeID = 1 then amount else 0 end) AS InterestSum
FROM FilesTransactions
INNER JOIN files
ON files.id = filestransactions.exid
WHERE FilesTransactions.TransactionDate BETWEEN '2015-10-15' AND '2019-10-15'
AND ExID IN (SELECT id
FROM files
WHERE files.OpeningDate BETWEEN '2015-10-01' AND '2019-09-30'
AND files.CustomerID = 3258)
GROUP BY FORMAT(files.OpeningDate, 'yyyy-MM')
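As a rough illustration of conditional aggregation, here is a sketch using Python's sqlite3 module. The tx table is a hypothetical, already-joined stand-in for FilesTransactions and files, with the month precomputed, and the rows are invented to match the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tx (opening_month TEXT, transactiontypeid INTEGER, amount REAL);
    -- Hypothetical data shaped like the expected output in the question.
    INSERT INTO tx VALUES
        ('2015-11', 1, 45.0),
        ('2015-12', 5, 45.25),
        ('2015-12', 1, 7.0);
""")

# Each SUM only adds the amounts whose type matches; other rows contribute 0.
monthly = conn.execute("""
    SELECT opening_month,
           SUM(CASE WHEN transactiontypeid = 5 THEN amount ELSE 0 END) AS AdjustmentSum,
           SUM(CASE WHEN transactiontypeid = 1 THEN amount ELSE 0 END) AS InterestSum
    FROM tx
    GROUP BY opening_month
    ORDER BY opening_month
""").fetchall()

for row in monthly:
    print(row)
```

Because both sums are computed in one pass over the grouped rows, each month gets its own totals without any sub-query.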
I might be more inclined to use EXISTS instead of IN for ExID.
Or, if you're really stuck on using the transactions derived table, use a common table expression (CTE):
WITH Transactions as (SELECT
FORMAT(files.OpeningDate, 'yyyy-MM') as OpeningDateFormatted,
amount,
transactiontypeid
FROM
FilesTransactions
INNER JOIN
files ON files.id = filestransactions.exid
WHERE
FilesTransactions.TransactionDate BETWEEN '2015-10-15' AND '2019-10-15'
AND ExID IN (SELECT id FROM files
WHERE files.OpeningDate BETWEEN '2015-10-01' AND '2019-09-30'
AND files.CustomerID = 3258))
SELECT transactions.OpeningDateFormatted,
(SELECT SUM(t.amount)
FROM transactions t
WHERE t.transactiontypeid = 5
AND t.OpeningDateFormatted = transactions.OpeningDateFormatted) AS AdjustmentSum,
(SELECT SUM(t.amount)
FROM transactions t
WHERE t.transactiontypeid = 1
AND t.OpeningDateFormatted = transactions.OpeningDateFormatted) AS InterestSum
FROM transactions
GROUP BY transactions.OpeningDateFormatted

How to optimize this query for my school project

This is my assignment; kindly help me optimize the two queries below.
Optimize assignment 1:
SELECT
n.node_id,
MIN(LEAST(n.date,ec.date)) date
FROM
n, ec
WHERE
(n.node_id = ec.node_id_from OR n.node_id = ec.node_id_to)
AND n.date - ec.date > 0
GROUP BY
n.node_id;
Optimize assignment 2:
SELECT
TO_CHAR(CONVERT_TIMEZONE ('UTC','America/Los_Angeles', tableA."date"), 'YYYY-MM') AS "date_month",
COUNT(DISTINCT CASE WHEN (tableB."date" IS NOT NULL) THEN tableB._id ELSE NULL END) AS "tableB.countB",
COUNT(DISTINCT CASE WHEN (tableC."date" IS NOT NULL) THEN tableC._id ELSE NULL END) AS "tableC.countC"
FROM
tableA AS tableA
LEFT JOIN
tableB AS tableB ON (DATE (CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles',tableB."date"))) = (DATE (CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles',tableA."date")))
LEFT JOIN
tableC AS tableC ON (DATE (CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles',tableC."date"))) = (DATE (CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles',tableA."date")))
WHERE
tableA."date" >= CONVERT_TIMEZONE ('America/Los_Angeles', 'UTC', DATEADD (month, -17, DATE_TRUNC('month', DATE_TRUNC('day', CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles',GETDATE ()))))))
GROUP BY
1
ORDER BY
1 DESC
LIMIT 500;
Use short aliases to make the SQL query shorter and cleaner.
Here is the optimized version of the second query:
SELECT TO_CHAR(DATEADD(hour, -8, a.date), 'YYYY-MM') date_month, -- fixed 8-hour offset; ignores daylight saving time
sum(case when b.date is Not null then 1 else 0 end) countb,
sum(case when c.date is Not null then 1 else 0 end) countc
FROM tableA a
LEFT JOIN tableB b
ON b.Date = a.Date -- Timezone offsets are not necessary,
LEFT JOIN tableC c
ON c.date = a.date -- both in same timezone
WHERE a.date >= DateAdd(hour, 8,
DATEADD (month,-17,DATE_TRUNC('month',
GETDATE () )))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
A very simple solution for assignment #1:
SELECT n.node_id, MIN(ec.date) as date
FROM n
JOIN ec
ON n.node_id IN (ec.node_id_from, ec.node_id_to) AND ec.date < n.date
GROUP BY n.node_id;
This just uses MIN(ec.date) instead of MIN(LEAST(n.date, ec.date)), because the JOIN condition already forces ec.date to be lower than n.date, so LEAST always returns ec.date anyway.
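A quick way to convince yourself that the two forms agree is to run both on sample data. The sketch below uses Python's sqlite3 module with invented rows. SQLite has no LEAST(), so its two-argument scalar MIN() stands in for it; the column is named dt instead of date, and the original n.date - ec.date > 0 test is written as ec.dt < n.dt because the dates are stored as text here. All of these are assumptions of the sketch, not part of the assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE n (node_id INTEGER, dt TEXT);
    CREATE TABLE ec (node_id_from INTEGER, node_id_to INTEGER, dt TEXT);
    -- Hypothetical data: some edges are earlier than their node, some later.
    INSERT INTO n VALUES (1, '2020-01-10'), (2, '2020-01-05');
    INSERT INTO ec VALUES
        (1, 2, '2020-01-03'),
        (2, 1, '2020-01-07'),
        (1, 3, '2020-01-20');
""")

# Shape of the original query: outer MIN() aggregates, inner MIN(a, b) acts as LEAST.
original = conn.execute("""
    SELECT n.node_id, MIN(MIN(n.dt, ec.dt)) AS dt
    FROM n, ec
    WHERE (n.node_id = ec.node_id_from OR n.node_id = ec.node_id_to)
      AND ec.dt < n.dt
    GROUP BY n.node_id
    ORDER BY n.node_id
""").fetchall()

# Simplified query: the join condition already guarantees ec.dt < n.dt.
simplified = conn.execute("""
    SELECT n.node_id, MIN(ec.dt) AS dt
    FROM n
    JOIN ec
      ON n.node_id IN (ec.node_id_from, ec.node_id_to) AND ec.dt < n.dt
    GROUP BY n.node_id
    ORDER BY n.node_id
""").fetchall()

print(original == simplified, simplified)
```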
Also note that a where clause like
where (x >= y and x <= z)
can be changed to
where (x between y and z)

Count based on subset of data

I have a join to a table and I want to include all users who have a record after a certain date, but to only include records after another date in the count.
Here is my SQL:
select a.userid, count(ce.codeentryid)
from [account] a
inner join [profile] p
on a.userid = p.userid
inner join codesentered ce
on a.userid = ce.userid and ce.entrydate > '2011-01-01 00:00:00'
where a.camp = 0
group by a.userid
order by a.userid
So here I want to view a list of all users who have entered a code after 1st Jan 2011, but to only include in the count codes entered after 1st Jan 2013. How would I do this?
EDIT : So this would give me all users who have entered a code after 01/01/2011, but only include codes entered after 01/01/2013 in the count?
select a.userid, count(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END)
from [account] a
inner join [profile] p
on a.userid = p.userid
inner join codesentered ce
on a.userid = ce.userid and ce.entrydate > '2011-01-01 00:00:00'
where a.camp = 0
group by a.userid
order by a.userid
Remove the date condition from the ON clause, and use this in the SELECT clause instead of COUNT(ce.codeentryid):
SUM(CASE WHEN ce.entrydate > '2011-01-01 00:00:00' THEN 1 ELSE 0 END)
Your question doesn't make sense as written, because using two dates is redundant, unless I assume that you want users whose first entry is after 2011-01-01 and then only count what happens after 2013-01-01.
If that is what you want, then use a having clause:
select a.userid, sum(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END)
from [account] a inner join
[profile] p
on a.userid = p.userid inner join
codesentered ce
on a.userid = ce.userid
where a.camp = 0
group by a.userid
having min(ce.entrydate) > '2011-01-01 00:00:00'
order by a.userid;
Note that count(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END) is the same as count(*). count() counts non-null values, and this CASE expression returns 1 or 0 but never NULL. Use sum() instead.
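The difference is easy to check on a few rows. Here is a minimal sketch using Python's sqlite3 module with invented entry dates for a single user; only the codesentered table is modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codesentered (userid INTEGER, entrydate TEXT);
    -- Hypothetical data: three codes, only one entered after 2013-01-01.
    INSERT INTO codesentered VALUES
        (1, '2011-06-01 00:00:00'),
        (1, '2012-03-15 00:00:00'),
        (1, '2013-02-01 00:00:00');
""")

counts = conn.execute("""
    SELECT COUNT(*),
           -- ELSE 0 is still non-NULL, so every row is counted
           COUNT(CASE WHEN entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END),
           -- without ELSE the CASE yields NULL, which COUNT skips
           COUNT(CASE WHEN entrydate > '2013-01-01 00:00:00' THEN 1 END),
           SUM(CASE WHEN entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END)
    FROM codesentered
""").fetchone()

print(counts)
```

So either drop the ELSE 0 and keep COUNT, or keep the ELSE 0 and switch to SUM; both give the conditional count.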