Window function issue - max over partition - sql

I am trying to rewrite SQL statements like the one below (with many subqueries) into a more efficient form using an outer join and max/count/... over partition. The old statement:
select a.ID,
(select max(b.valA) from something b where a.ID = b.ID_T and b.status != 0),
(select max(b.valB) from something b where a.ID = b.ID_T and b.status != 0),
(select max(b.valC) from something b where a.ID = b.ID_T and b.status != 0),
(select max(b.valD) from something b where a.ID = b.ID_T)
from tool a;
What is important here: the condition for max(b.valD) is different. At first I didn't notice this difference and wrote something like this:
select distinct a.ID,
max(b.valA) over (partition by b.ID_T),
max(b.valB) over (partition by b.ID_T),
max(b.valC) over (partition by b.ID_T),
max(b.valD) over (partition by b.ID_T)
from tool a,
(select * from something
where status != 0) b
where a.ID = b.ID_T(+);
Can I put the b.status != 0 condition somewhere inside max over partition? Or would it be better to add a third table to the join, like this:
select distinct a.ID,
max(b.valA) over (partition by b.ID_T),
max(b.valB) over (partition by b.ID_T),
max(b.valC) over (partition by b.ID_T),
max(c.valD) over (partition by c.ID_T)
from tool a,
(select * from something
where status != 0) b,
something c
where a.ID = b.ID_T(+)
and a.ID = c.ID_T(+);
The problem is that this selects and joins millions of rows; my example is just a simplification of my query. Could anyone help me write more efficient SQL?

You could try to do this using CASE:
select a.ID,
max(CASE WHEN b.status != 0 THEN b.valA END),
max(CASE WHEN b.status != 0 THEN b.valB END),
max(CASE WHEN b.status != 0 THEN b.valC END),
max(b.valD)
from tool a
left join something b ON( b.ID_T = a.ID )
group by a.ID;
Note that I replaced your implicit join by the "new" join-syntax for better readability.
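As a quick sanity check, the sketch below compares the original correlated subqueries with the CASE rewrite on a tiny data set. It runs in SQLite through Python's sqlite3 module rather than Oracle, the sample rows are invented, and only valA and valD are kept to keep it short.

```python
import sqlite3

# In-memory SQLite database standing in for the real Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tool (ID INTEGER PRIMARY KEY);
CREATE TABLE something (ID_T INTEGER, status INTEGER, valA INTEGER, valD INTEGER);
INSERT INTO tool VALUES (1), (2), (3);
-- Invented sample rows: tool 3 has no rows in "something" at all.
INSERT INTO something VALUES
    (1, 1, 10, 100),
    (1, 0, 99, 500),
    (2, 0, 7, 70);
""")

# Original form: one correlated subquery per column.
subquery_rows = conn.execute("""
SELECT a.ID,
       (SELECT MAX(b.valA) FROM something b WHERE a.ID = b.ID_T AND b.status != 0),
       (SELECT MAX(b.valD) FROM something b WHERE a.ID = b.ID_T)
FROM tool a
ORDER BY a.ID
""").fetchall()

# Rewrite: a single outer join with conditional aggregation.
case_rows = conn.execute("""
SELECT a.ID,
       MAX(CASE WHEN b.status != 0 THEN b.valA END),
       MAX(b.valD)
FROM tool a
LEFT JOIN something b ON b.ID_T = a.ID
GROUP BY a.ID
ORDER BY a.ID
""").fetchall()

print(subquery_rows)
print(case_rows)
```

Both queries return the same rows here, including NULLs for tools without matching rows; whether the rewrite is actually faster on millions of rows depends on the execution plan.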

One more way is to use JOIN and group by subquery:
select a.ID,
b.MAX_A,
b.MAX_B,
b.MAX_C,
b2.MAX_D
from tool a
LEFT JOIN
(
SELECT ID_T,max(valA) MAX_A, max(valB) MAX_B, max(valC) MAX_C
FROM something
WHERE status != 0
GROUP BY ID_T
) b
ON a.ID=b.ID_T
LEFT JOIN
(
SELECT ID_T, max(valD) MAX_D
FROM something
GROUP BY ID_T
) b2
ON a.ID=b2.ID_T

Related

Subquery problem when using WHERE in secondary SELECT

I have a problem with this query.
SELECT
a.code,
b.codename,
(SELECT COUNT(b.IdNum)
FROM
(SELECT *
FROM NumTable
WHERE YEAR(DateOfPoint) = 2019)) AS CountNumber
FROM
NumTable b
JOIN
CodeTable a ON a.id = b.id
WHERE
a.SellYear IS NOT NULL
My first question is about CountNumber: is it OK? I need to count only those b.IdNum values whose DateOfPoint is in 2019. The condition should apply only to this field and not to anything else in the query, which is why I didn't put it in the final WHERE.
My second question is also about CountNumber: I keep getting an error message about incorrect syntax there. I looked for the cause for about an hour and couldn't find it.
Thanks
I'm not sure what you are trying to get here, but I think GROUP BY is more logical:
SELECT a.code
,b.codename
,SUM(CASE WHEN YEAR(b.DateOfPoint) = 2019 THEN 1 ELSE 0 END) AS CountNumber
FROM NumTable b
JOIN CodeTable a ON a.id = b.id
WHERE a.SellYear IS NOT NULL
group by a.code,b.codename
You will get one row for each code and codename, with the number of rows whose DateOfPoint falls in 2019. If there are none, it returns 0.
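A minimal sketch of this conditional count, run in SQLite via Python's sqlite3 module. SQLite has no YEAR() function, so strftime('%Y', ...) stands in for it here; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CodeTable (id INTEGER, code TEXT, SellYear INTEGER);
CREATE TABLE NumTable (id INTEGER, codename TEXT, IdNum INTEGER, DateOfPoint TEXT);
-- Invented sample data: code C has no SellYear and is filtered out.
INSERT INTO CodeTable VALUES (1, 'A', 2018), (2, 'B', 2019), (3, 'C', NULL);
INSERT INTO NumTable VALUES
    (1, 'alpha', 11, '2019-03-01'),
    (1, 'alpha', 12, '2019-07-15'),
    (1, 'alpha', 13, '2018-01-01'),
    (2, 'beta',  21, '2017-05-05'),
    (3, 'gamma', 31, '2019-02-02');
""")

# The 2019 condition lives inside the aggregate, so it affects only
# CountNumber and does not remove rows from the rest of the query.
rows = conn.execute("""
SELECT a.code,
       b.codename,
       SUM(CASE WHEN strftime('%Y', b.DateOfPoint) = '2019' THEN 1 ELSE 0 END) AS CountNumber
FROM NumTable b
JOIN CodeTable a ON a.id = b.id
WHERE a.SellYear IS NOT NULL
GROUP BY a.code, b.codename
ORDER BY a.code
""").fetchall()

print(rows)
```

Code B still appears with a count of 0, which is the behaviour described above.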
You can use this query. I think it will work for you:
SELECT a.code,
b.codename,
d.CountNumber
FROM
NumTable b
JOIN
CodeTable a ON a.id = b.id
JOIN
(SELECT id, COUNT(IdNum) AS CountNumber
FROM NumTable
WHERE YEAR(DateOfPoint) = 2019
GROUP BY id) d ON d.id = b.id
WHERE
a.SellYear IS NOT NULL
You can use a window function:
SELECT c.code, n.codename, n.cnt_2019
FROM (SELECT n.*,
SUM(CASE WHEN YEAR(DateOfPoint) = 2019 THEN 1 ELSE 0 END) OVER () as cnt_2019
FROM NumTable n
) n JOIN
CodeTable c
ON c.id = n.id
WHERE c.SellYear IS NOT NULL;
Note that I also changed the table aliases so they are abbreviations for the table names rather than arbitrary letters.

find all card numbers in which the largest id oracle

I have a query. It works, but I can't figure out how to improve it. I want it to show not all records for each ID_CARD, but only the one with the largest id. For example, a card may have one record with id 100 and another with id 101; in that case I want only the record with id 101 in the result.
select a.id,
a.employee_id,
a.STATUS,
a.expiration_date,
a.ID_CARD
from EM_STATUS_CARD a
left join EM_CARD b on a.ID_CARD = b.ID_CARD
where b.del != 'true' or b.del is null
It's good practice to provide sample data and the expected result, but in the absence of that information I believe you are looking for something like the query below.
with main_query as (select a.id,
a.employee_id,
a.STATUS,
a.expiration_date,
a.ID_CARD
from EM_STATUS_CARD a
left join EM_CARD b on a.ID_CARD = b.ID_CARD
where b.del != 'true' or b.del is null)
select x.id, y.employee_id, y.id_card, y.status, y.expiration_date from
(select max(id) id, id_card from main_query group by id_card) x, main_query y
where x.id_card = y.id_card and x.id = y.id;
This should work just fine. I did some refactoring, as Oracle supports USING in joins:
select id,
employee_id,
STATUS,
expiration_date,
ID_CARD
from EM_STATUS_CARD
left join EM_CARD using(ID_CARD)
where (del != 'true' or del is null) and (ID_CARD, id) in (
select ID_CARD, max(id)
from EM_STATUS_CARD
group by ID_CARD
)
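To illustrate "largest id per card", here is a small sketch of the CTE approach (filter first, then keep the row with the maximum id for each ID_CARD). It runs in SQLite through Python's sqlite3 module instead of Oracle, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EM_CARD (ID_CARD INTEGER, del TEXT);
CREATE TABLE EM_STATUS_CARD (id INTEGER, employee_id INTEGER, STATUS TEXT,
                             expiration_date TEXT, ID_CARD INTEGER);
-- Invented sample data: card 2 is deleted, card 1 has two status rows.
INSERT INTO EM_CARD VALUES (1, NULL), (2, 'true'), (3, 'false');
INSERT INTO EM_STATUS_CARD VALUES
    (100, 7, 'old', '2020-01-01', 1),
    (101, 7, 'new', '2021-01-01', 1),
    (102, 8, 'x',   '2021-06-01', 2),
    (103, 9, 'y',   '2022-01-01', 3);
""")

rows = conn.execute("""
WITH main_query AS (
    SELECT a.id, a.employee_id, a.STATUS, a.expiration_date, a.ID_CARD
    FROM EM_STATUS_CARD a
    LEFT JOIN EM_CARD b ON a.ID_CARD = b.ID_CARD
    WHERE b.del != 'true' OR b.del IS NULL
)
SELECT y.id, y.ID_CARD, y.STATUS
FROM (SELECT MAX(id) AS id, ID_CARD FROM main_query GROUP BY ID_CARD) x
JOIN main_query y ON x.ID_CARD = y.ID_CARD AND x.id = y.id
ORDER BY y.id
""").fetchall()

print(rows)
```

Card 1 keeps only the row with id 101, and the deleted card 2 drops out entirely.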

How to calculate the z score after joining 3 tables in MySQL

I have joined three tables A, B, D using this query,
SELECT [A].ID, [A].Surname, [A].[Given Name], [B].[Pre-U Grade], [D ].[Total Score], [B].[score]
FROM ([A] LEFT JOIN [D] ON [A].ID = [D].[Student ID]) INNER JOIN [B-Results] ON [A].ID = [B].ID
WHERE ((([B].[Pre-U Grade])=IsNumeric([B]![Pre-U Grade])) AND (([D].[Total Score]) Is Not Null) AND (([A].Status) Not In ("REJECTED","OFFERED","WITHDRAWN"))) OR ((([B].[Pre-U Grade])>"0") AND (([D].[Total Score]) Is Not Null) AND (([A].Status) Not In ("REJECTED","OFFERED","WITHDRAWN")))
ORDER BY [D].[Date] DESC;
After joining the tables, the z-score for the 3 numerical columns needs to be calculated.
I came across this example
Calculating Z-Score for each row in MySQL? (simple)
but I didn't know how to adapt the code given there to my problem. Can someone kindly help me with this?
-- Column and table names use underscores in place of the original names with spaces and hyphens.
-- AVG(...) OVER () and STD(...) OVER () keep one row per student; window functions need MySQL 8.0+.
SELECT
r.id,
r.surname,
r.given_name,
(r.pre_u_grade - AVG(r.pre_u_grade) OVER ()) / STD(r.pre_u_grade) OVER () AS z_pre_u_grade,
(r.total_score - AVG(r.total_score) OVER ()) / STD(r.total_score) OVER () AS z_total_score,
(r.score - AVG(r.score) OVER ()) / STD(r.score) OVER () AS z_score
FROM
(SELECT
a.id,
a.surname,
a.given_name,
b.pre_u_grade,
d.total_score,
b.score,
d.date
FROM
a
LEFT JOIN
d
ON
a.id = d.student_id
INNER JOIN
b_results b
ON
a.id = b.id
WHERE
(
-- ISNUMERIC is carried over from the Access query; MySQL has no such built-in
b.pre_u_grade = ISNUMERIC(b.pre_u_grade)
OR
b.pre_u_grade > 0
)
AND d.total_score IS NOT NULL
AND a.status NOT IN ('rejected', 'offered', 'withdrawn')
) r
ORDER BY
r.date DESC;
Try this.

How rewrite COUNT DISTINCT?

I have a problem with only one reducer in hive, because of using count and distinct in one query.
How to rewrite select to eliminate this? Is it possible in window functions?
select
a.second_id,
if(a.proc_id = 'CONST1' and bb.third_id is not null,
count(distinct bb.first_id),
'') as qty
from a a
join (select
b.first_id,
b.second_id,
b.third_id
from b b) bb
on bb.second_id = a.second_id
group by
a.second_id,
a.proc_id,
bb.third_id;
This is your query:
select a.second_id,
(case when a.proc_id = 'CONST1' and bb.third_id is not null
then count(distinct bb.first_id)
end) as qty
from a join
(select b.first_id, b.second_id, b.third_id
from b
) bb
on bb.second_id = a.second_id
group by a.second_id, a.proc_id, bb.third_id;
The count(distinct) can really be handled in the subquery, using group by and window functions. I don't see any value to not aggregating first, so:
select a.second_id,
(case when a.proc_id = 'CONST1' and bb.third_id is not null
then max(bb.num_firsts)
end) as qty
from a join
(select b.second_id, b.third_id,
count(distinct first_id) as num_firsts
from b
group by b.second_id, b.third_id
) bb
on bb.second_id = a.second_id
group by a.second_id, a.proc_id, bb.third_id;
You are aggregating by second_id and third_id in the outer query, so there is only one row from the aggregated subquery in each outer group. The above version uses max(num_firsts), but you could also include num_firsts in the outer group by.
That still might not fix your problem, but this query is easier to modify. If I recall, the best approach in Hive is a select distinct subquery:
select a.second_id,
(case when a.proc_id = 'CONST1' and bb.third_id is not null
then max(bb.num_firsts)
end) as qty
from a join
(select b.second_id, b.third_id,
count(*) as num_firsts
from (select distinct second_id, third_id, first_id
from b
) b
group by b.second_id, b.third_id
) bb
on bb.second_id = a.second_id
group by a.second_id, a.proc_id, bb.third_id;
This is the same thing if first_id is never NULL. If it can be NULL, this version counts NULL as a separate value; if you don't want that, just filter those rows out.
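The NULL difference can be seen on a tiny example. The sketch below uses SQLite through Python's sqlite3 module rather than Hive, with invented rows, and compares COUNT(DISTINCT ...) against COUNT(*) over a SELECT DISTINCT subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE b (first_id INTEGER, second_id INTEGER, third_id INTEGER);
-- Invented sample data: one duplicate and one NULL first_id in group (10, 100).
INSERT INTO b VALUES
    (1, 10, 100), (1, 10, 100), (2, 10, 100), (NULL, 10, 100),
    (3, 20, 200);
""")

# Direct form: COUNT(DISTINCT ...) ignores NULLs.
direct = conn.execute("""
SELECT second_id, third_id, COUNT(DISTINCT first_id)
FROM b
GROUP BY second_id, third_id
ORDER BY second_id
""").fetchall()

# Two-step form: deduplicate first, then count rows. NULL survives as a row.
two_step = conn.execute("""
SELECT second_id, third_id, COUNT(*)
FROM (SELECT DISTINCT second_id, third_id, first_id FROM b) d
GROUP BY second_id, third_id
ORDER BY second_id
""").fetchall()

# Two-step form with NULLs filtered out matches the direct form again.
two_step_filtered = conn.execute("""
SELECT second_id, third_id, COUNT(*)
FROM (SELECT DISTINCT second_id, third_id, first_id
      FROM b
      WHERE first_id IS NOT NULL) d
GROUP BY second_id, third_id
ORDER BY second_id
""").fetchall()

print(direct, two_step, two_step_filtered)
```

Whether the two-step form actually spreads the work over more reducers in Hive depends on the engine and its settings; the sketch only checks that the results line up.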

Join Using Maximum Date

I have a table that I am trying to join with based on an ID, however I only want to join with the rows that have the maximum "PeriodDT" (a datetime column) for that ID.
I have tried using TOP 1 ordered by "PeriodDT"; however, it only lets me select one column, otherwise I get the error:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS
Here is the query I used:
Select a.Name as PropertyName,
a.PropertyNum as PropertyNum,
a.City as City,
a.State as State,
b.Name as LoanName,
b.LoanNum,
(select Top 1 c.IntRate as IntRate,
c.MaturityDT
from vNoteDetail c where c.LoanID = b.LoanID Order By c.PeriodDT DESC)
from vProperty a join vLoan b on a.LoanID = b.LoanId
Is there a better way to do this?
Try
Select a.Name as PropertyName,
a.PropertyNum as PropertyNum,
a.City as City,
a.State as State,
b.Name as LoanName,
b.LoanNum,
v1.IntRate,
v1.MaturityDT
from vProperty a join vLoan b on a.LoanID = b.LoanId
CROSS APPLY (select Top 1 c.IntRate as IntRate,
c.MaturityDT
from vNoteDetail c where c.LoanID = b.LoanID Order By c.PeriodDT DESC) AS V1
Try this:
;WITH cte
AS (SELECT Row_number() OVER(partition BY loanid ORDER BY perioddt DESC) rn,
intrate,
maturitydt,
loanid
FROM vnotedetail)
SELECT a.NAME AS PropertyName,
a.propertynum AS PropertyNum,
a.city AS City,
a.state AS State,
b.NAME AS LoanName,
b.loannum,
c.intrate,
c.maturitydt
FROM vproperty a
JOIN vloan b
ON a.loanid = b.loanid
JOIN cte c
ON c.loanid = b.loanid
WHERE c.rn = 1
FYI, the error comes from the restrictions on subqueries: a subquery in the select list can return only one expression.
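The ROW_NUMBER approach can be sketched on a cut-down version of the tables. This runs in SQLite (3.25 or newer, for window functions) through Python's sqlite3 module rather than SQL Server; the columns are reduced and the rows invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vLoan (LoanID INTEGER, LoanNum TEXT);
CREATE TABLE vNoteDetail (LoanID INTEGER, PeriodDT TEXT, IntRate REAL, MaturityDT TEXT);
-- Invented sample data: loan 1 has two periods, loan 2 has one.
INSERT INTO vLoan VALUES (1, 'L-1'), (2, 'L-2');
INSERT INTO vNoteDetail VALUES
    (1, '2020-01-31', 4.5,  '2030-01-01'),
    (1, '2020-02-29', 4.25, '2030-01-01'),
    (2, '2019-12-31', 3.0,  '2025-06-30');
""")

# Number the detail rows per loan, newest PeriodDT first, and keep rn = 1.
rows = conn.execute("""
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY LoanID ORDER BY PeriodDT DESC) AS rn,
           LoanID, IntRate, MaturityDT
    FROM vNoteDetail
)
SELECT b.LoanNum, c.IntRate, c.MaturityDT
FROM vLoan b
JOIN cte c ON c.LoanID = b.LoanID
WHERE c.rn = 1
ORDER BY b.LoanNum
""").fetchall()

print(rows)
```

Each loan is joined to exactly one detail row, the one with the latest PeriodDT, and any number of columns can be taken from it.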