SQLZoo Window LAG #8

Question: For each country that has had at least 1000 new cases in a single day, show the date of the peak number of new cases.
Here is some sample data from the covid table.
What I wrote:
SELECT name,date,MAX(confirmed-lag) AS PeakNew
FROM(
SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') date, confirmed,
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) lag
FROM covid
ORDER BY confirmed
) temp
GROUP BY name
HAVING PeakNew>=1000
ORDER BY PeakNew DESC;
The result I got is odd: PeakNew seems correct, but the corresponding date is not.
[Screenshot: my result]
[Screenshot: the expected result]
Can anyone help me get the right answer? Thank you!

The query below works for me. The dates and values are correct, but the checker may still report it as wrong because the row order differs from the expected answer. Here the rows are ordered by date only (ORDER BY 2).
SELECT z1.name, DATE_FORMAT(c.dt,'%Y-%m-%d'), z1.nc
FROM
(
SELECT z.name, MAX(z.nc) AS 'mx'
FROM (
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid ) z
WHERE z.nc >= 1000
GROUP BY z.name
) z1
INNER JOIN
(
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid
) c
ON c.nc = z1.mx
AND c.name = z1.name
ORDER BY 2 ASC

The date value in the outer query doesn't correspond to the row where MAX(confirmed-lag) is found; it's just an arbitrary date value from that group. For more information, see the section titled "The ONLY_FULL_GROUP_BY Issue" in this blog post: https://www.percona.com/blog/2019/05/13/solve-query-failures-regarding-only_full_group_by-sql-mode/
I used the ROW_NUMBER() function to get the entire row corresponding to the maximum number of new cases. However, my final result wasn't ordered the way the expected answer was, and the question doesn't specify how it should be ordered, so I still didn't get that satisfying happy emoji.
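To see the ROW_NUMBER() approach end to end, here is a minimal sketch that runs it on SQLite through Python's standard sqlite3 module. It assumes SQLite 3.25 or later (for LAG and ROW_NUMBER), stores whn as ISO date text instead of using MySQL's DATE_FORMAT, and uses a few hypothetical rows rather than the real SQLZoo covid data. The outermost ORDER BY is what fixes the row order; an ORDER BY inside a subquery or CTE does not.

```python
import sqlite3

# Assumes SQLite >= 3.25 (window functions).
# Hypothetical cumulative counts (name, whn, confirmed); NOT the SQLZoo data.
ROWS = [
    ("Italy", "2020-03-01", 1000),
    ("Italy", "2020-03-02", 2500),  # +1500
    ("Italy", "2020-03-03", 3200),  # +700
    ("Spain", "2020-03-01", 100),
    ("Spain", "2020-03-02", 600),   # +500
    ("Spain", "2020-03-03", 1900),  # +1300
    ("Peru", "2020-03-01", 10),
    ("Peru", "2020-03-02", 50),     # +40, never reaches 1000
]

PEAK_QUERY = """
WITH daily AS (
    -- new cases = this row's cumulative total minus the previous day's total
    SELECT name, whn,
           confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) AS newcases
    FROM covid
),
ranked AS (
    -- number each country's qualifying days from the largest increase down
    SELECT name, whn, newcases,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY newcases DESC) AS rn
    FROM daily
    WHERE newcases >= 1000
)
SELECT name, whn, newcases
FROM ranked
WHERE rn = 1
ORDER BY whn, name  -- only the outermost ORDER BY fixes the output order
"""


def peak_days(rows):
    """Return (name, date, new cases) for each country's peak day."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
        con.executemany("INSERT INTO covid VALUES (?, ?, ?)", rows)
        return con.execute(PEAK_QUERY).fetchall()
    finally:
        con.close()


print(peak_days(ROWS))
```

The first day of each country has no previous row, so LAG returns NULL and that day drops out at the `newcases >= 1000` filter.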

You need a self-join to obtain the date on which the maximum increase occurred:
WITH CTE1 as
(SELECT name, DATE_FORMAT(whn, "%Y-%m-%d") as date,
confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY DATE(whn)) as increase
FROM covid),
CTE2 AS
(SELECT name, MAX(increase) as max_increase
FROM CTE1
WHERE increase > 999
GROUP BY name)
SELECT c1.name, c1.date, c2.max_increase as peakNewCases
FROM CTE1 as c1
JOIN CTE2 as c2
ON c1.name = c2.name AND c1.increase = c2.max_increase
-- an ORDER BY inside a CTE does not determine the final order; sort here instead
ORDER BY c1.date
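One property of this join-back approach is worth knowing: if a country reaches its maximum increase on two different days, both rows come back, whereas ROW_NUMBER() keeps exactly one. A minimal sketch on SQLite via Python's sqlite3 (assumes SQLite 3.25+; the data is hypothetical, and dates are stored as ISO text so no DATE_FORMAT is needed):

```python
import sqlite3

# Assumes SQLite >= 3.25 (window functions).
# Hypothetical data with a tie: Chile gains +1200 on two different days.
ROWS = [
    ("Chile", "2020-04-01", 0),
    ("Chile", "2020-04-02", 1200),  # +1200
    ("Chile", "2020-04-03", 2400),  # +1200 (same peak)
    ("Chile", "2020-04-04", 2500),  # +100
]

JOIN_QUERY = """
WITH cte1 AS (
    SELECT name, whn,
           confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) AS increase
    FROM covid
),
cte2 AS (
    SELECT name, MAX(increase) AS max_increase
    FROM cte1
    WHERE increase > 999
    GROUP BY name
)
-- join back to recover every day whose increase equals the country's maximum
SELECT c1.name, c1.whn, c2.max_increase
FROM cte1 AS c1
JOIN cte2 AS c2
  ON c1.name = c2.name AND c1.increase = c2.max_increase
ORDER BY c1.whn
"""


def peak_rows(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
        con.executemany("INSERT INTO covid VALUES (?, ?, ?)", rows)
        return con.execute(JOIN_QUERY).fetchall()
    finally:
        con.close()


print(peak_rows(ROWS))
```

With the tie, both 2020-04-02 and 2020-04-03 are returned; whether that is acceptable depends on what the exercise expects.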

WITH CTE1 as
(SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') as date_form, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY whn) AS newcases
FROM covid
ORDER BY name,whn)
SELECT name, date_form, newcases FROM
(
-- alias rn rather than rank: RANK is a reserved word in MySQL 8.0
SELECT name, date_form, newcases, ROW_NUMBER() OVER (PARTITION BY name ORDER BY newcases DESC) as rn
FROM CTE1
WHERE newcases > 999
) cte2
WHERE rn = 1

Related

SQL Find the minimum date based on consecutive values

I'm having trouble constructing a query that can find consecutive values meeting a condition. Example data is below; note that Date is sorted in descending order and grouped by ID.
To be selected, for each ID, the most recent RESULT must be 'Fail', and what I need back is the earliest date in that run of 'Fails'. For ID==1, only the first two values are of interest (the last doesn't count because of the earlier 'Complete'). ID==2 doesn't count at all, since it fails the first condition, and for ID==3, only the first value matters.
A result table might be:
The trick seems to be some kind of run-length encoding, but even after several attempts with ROW_NUMBER and an attempt at the tabibitosan method for grouping consecutive values, I've been unable to gain traction.
Any help would be appreciated.
If your database supports window functions, you can do this:
select id, case when result='Fail' then earliest_fail_date end earliest_fail_date
from (
select t.*
,row_number() over(partition by id order by dt desc) rn
,min(case when result = 'Fail' then dt end) over(partition by id) earliest_fail_date
from tablename t
) x
where rn=1
Use row_number() to find the latest row for each id, and min() over() to get the earliest Fail date for each id. If the latest row has status Fail, the query returns earliest_fail_date; otherwise it returns null.
Note that the expected result for id=1 appears to be wrong: it should be 2016-09-20, as that is the earliest Fail date.
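The first query can be tried on SQLite (3.25+) through Python's sqlite3 module. The rows below are hypothetical (the question's own sample data is not shown), chosen so that id 1 has a Complete between two Fail runs; for that id the query returns the earliest Fail in the whole history, not the start of the latest run.

```python
import sqlite3

# Assumes SQLite >= 3.25 (window functions).
# Hypothetical rows (id, dt, result); the question's own sample data is not shown.
ROWS = [
    (1, "2021-01-20", "Fail"),
    (1, "2021-01-25", "Complete"),
    (1, "2021-01-28", "Fail"),
    (1, "2021-01-30", "Fail"),
    (2, "2021-01-10", "Fail"),
    (2, "2021-01-12", "Complete"),
    (3, "2021-01-01", "Complete"),
    (3, "2021-01-05", "Fail"),
]

LATEST_ROW_QUERY = """
SELECT id, CASE WHEN result = 'Fail' THEN earliest_fail_date END AS earliest_fail_date
FROM (
    SELECT t.*,
           -- rn = 1 marks the most recent row per id
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn,
           -- earliest Fail anywhere in the id's history (not only the latest run)
           MIN(CASE WHEN result = 'Fail' THEN dt END) OVER (PARTITION BY id) AS earliest_fail_date
    FROM tablename t
) x
WHERE rn = 1
ORDER BY id
"""


def latest_fail_dates(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tablename (id INTEGER, dt TEXT, result TEXT)")
        con.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
        return con.execute(LATEST_ROW_QUERY).fetchall()
    finally:
        con.close()


print(latest_fail_dates(ROWS))
```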
Edit: Having re-read the question, I think this is what you might be looking for: the minimum Fail date from the latest run of consecutive Fail rows.
with grps as (
select t.*,row_number() over(partition by id order by dt desc) rn
,row_number() over(partition by id order by dt)-row_number() over(partition by id,result order by dt) grp
from tablename t
)
,maxfailgrp as (
select g.*,
max(case when result = 'Fail' then grp end) over(partition by id) maxgrp
from grps g
)
select id,
case when result = 'Fail' then (select min(dt) from maxfailgrp where id = m.id and grp = m.maxgrp and result = 'Fail') end earliest_fail_date
from maxfailgrp m
where rn=1
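Running the tabibitosan query on SQLite (3.25+) via Python's sqlite3 makes the grouping concrete. The data is hypothetical. The sketch adds `AND result = 'Fail'` to the correlated subquery: a Complete row can receive the same grp number as the latest Fail run, and without that condition its date leaks into MIN(dt), as the second call (with the condition switched off) shows.

```python
import sqlite3

# Assumes SQLite >= 3.25 (window functions).
# Hypothetical rows: id 1 has Fail, Complete, Fail, Fail.
ROWS = [
    (1, "2021-01-20", "Fail"),
    (1, "2021-01-25", "Complete"),
    (1, "2021-01-28", "Fail"),
    (1, "2021-01-30", "Fail"),
    (2, "2021-01-10", "Fail"),
    (2, "2021-01-12", "Complete"),
    (3, "2021-01-01", "Complete"),
    (3, "2021-01-05", "Fail"),
]

ISLANDS_QUERY = """
WITH grps AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn,
           -- tabibitosan: overall position minus position within the same result;
           -- constant across a run of consecutive rows with the same result
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt)
             - ROW_NUMBER() OVER (PARTITION BY id, result ORDER BY dt) AS grp
    FROM tablename t
),
maxfailgrp AS (
    -- the latest Fail run has the largest grp among Fail rows
    SELECT g.*, MAX(CASE WHEN result = 'Fail' THEN grp END) OVER (PARTITION BY id) AS maxgrp
    FROM grps g
)
SELECT id,
       CASE WHEN result = 'Fail' THEN
           (SELECT MIN(dt) FROM maxfailgrp
            WHERE id = m.id AND grp = m.maxgrp {fail_filter})
       END AS earliest_fail_date
FROM maxfailgrp m
WHERE rn = 1
ORDER BY id
"""


def latest_run_start(rows, filter_fail_rows=True):
    # A Complete row can share a grp value with a Fail run, so the subquery
    # must also restrict itself to Fail rows.
    fail_filter = "AND result = 'Fail'" if filter_fail_rows else ""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tablename (id INTEGER, dt TEXT, result TEXT)")
        con.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
        return con.execute(ISLANDS_QUERY.format(fail_filter=fail_filter)).fetchall()
    finally:
        con.close()


print(latest_run_start(ROWS))
print(latest_run_start(ROWS, filter_fail_rows=False))
```

For id 1 the rows get grp values 0, 1, 1, 1 in date order, so the Complete on 2021-01-25 shares grp 1 with the final Fail run.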

SQL Server Group By with Max on Date field

I hope I can explain the issue I'm having, and that someone can point me in the right direction.
I'm trying to GROUP BY email address on a subset of data and then use MAX() on a date field, but because other fields have differing values, the query brings back more rows than required.
I just want to return the latest record per email address, along with the other fields from that same row.
How can I write this query?
This is a task for ROW_NUMBER:
select *
from
(
select t.*,
-- assign sequential number starting with 1 for the maximum date
row_number() over (partition by email_address order by datecol desc) as rn
from tab
) as dt
where rn = 1 -- only return the latest row
You can write this query using row_number():
select t.*
from (select t.*,
row_number() over (partition by emailaddress order by date desc) as seqnum
from t
) t
where seqnum = 1;
How about something like this?
select a.*
from baseTable as a
inner join
(select Email,
Max(EmailDate) as EmailDate
from baseTable
group by Email) as b
on a.Email = b.Email
and a.EmailDate = b.EmailDate
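The asker's symptom and the ROW_NUMBER() fix can both be reproduced on SQLite (3.25+) with Python's sqlite3. The table name, column names and rows are hypothetical: adding another column to GROUP BY splits one address into several groups, while ROW_NUMBER() keeps the single latest row together with its other fields.

```python
import sqlite3

# Assumes SQLite >= 3.25 (window functions).
# Hypothetical emails (email_address, email_date, subject).
ROWS = [
    ("a@example.com", "2021-01-01", "Hi"),
    ("a@example.com", "2021-02-01", "Re: Hi"),
    ("b@example.com", "2021-01-15", "Hello"),
]

# Grouping by a second column splits a@example.com into two groups.
GROUPED = """
SELECT email_address, subject, MAX(email_date)
FROM emails
GROUP BY email_address, subject
"""

# ROW_NUMBER keeps exactly one row (the latest) per email address.
LATEST = """
SELECT email_address, email_date, subject
FROM (
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY email_address ORDER BY email_date DESC) AS rn
    FROM emails e
) x
WHERE rn = 1
ORDER BY email_address
"""


def run(query, rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE emails (email_address TEXT, email_date TEXT, subject TEXT)")
        con.executemany("INSERT INTO emails VALUES (?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


print(len(run(GROUPED, ROWS)))  # one row per (email, subject) pair
print(run(LATEST, ROWS))
```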

Need help to find the middle row using Row_Number

SELECT median.spaid
,median.total
,ROW_NUMBER() OVER (
ORDER BY median.total
) AS row
FROM (
SELECT SpaID
,COUNT(1) AS Total
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID
) AS median
ORDER BY median.total
My issue is that I need to find the middle row of the "Total" column using ROW_NUMBER, that is, which "SpaID" is linked to the middle row of "Total".
This is a shot in the dark based on very sparse details but I think you are looking for something like this.
with numberedResults as
(
select spaid
, ROW_NUMBER() over(order by count(*)) as RowNum
from [order]
where DateCreated between '20140401' AND '20140430'
group by SpaID
)
, Medians as
(
select (MAX(RowNum) + 1) / 2 as Median
, MAX(RowNum) as TotalCount
from numberedResults
)
select *
from numberedResults r
join Medians m on m.Median = r.RowNum
I would suggest not relying on ROW_NUMBER here, because it numbers rows with equal totals in an arbitrary order, so the result can change between runs. I understand this seems bulky; the challenge is that the "median" is the middle of grouped rows. Here's the query I believe should work for you:
SELECT SpaID, d FROM
(SELECT SpaID,
DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
FROM dbo.[Order]
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID) AS ranked
WHERE d =
-- dividing by 2.0 avoids integer division, so an odd maximum rounds up to the middle rank
(SELECT ROUND(MAX(r.d) / 2.0, 0)
FROM (SELECT DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
FROM dbo.[Order]
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID) AS r)
Here is one method of finding the median:
SELECT o.*
FROM (SELECT SpaID, COUNT(*) AS Total,
ROW_NUMBER() OVER (ORDER BY COUNT(*)) as seqnum,
COUNT(*) OVER () as cnt
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '2014-04-01' AND '2014-04-30'
GROUP BY SpaID
) o
WHERE 2*o.seqnum IN (cnt, cnt + 1);
This is approximate when you have an even number of rows. You are looking for an actual row, so you have to choose either the one before or the one after the median (which falls between two rows).
Note: You should express date constants using the ISO standard formats, either YYYYMMDD or YYYY-MM-DD. The first is the safest in SQL Server (although I personally prefer the hyphens for readability).
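The selection rule is easy to check on SQLite (3.25+) via Python's sqlite3, using a hypothetical orders table with distinct totals so that ROW_NUMBER has no ties to break. With `IN (cnt - 1, cnt)` a 5-group result would keep seqnum 2 rather than the true middle row 3, so this sketch uses `IN (cnt, cnt + 1)`, which keeps the middle row for an odd count and the lower of the two middle rows for an even count.

```python
import sqlite3

# Assumes SQLite >= 3.25 (window functions over grouped results).
MEDIAN_QUERY = """
SELECT o.spaid, o.total
FROM (
    SELECT spaid, COUNT(*) AS total,
           ROW_NUMBER() OVER (ORDER BY COUNT(*)) AS seqnum,  -- position by total
           COUNT(*) OVER () AS cnt                           -- number of groups
    FROM orders
    GROUP BY spaid
) o
-- middle row for odd cnt, lower of the two middle rows for even cnt
WHERE 2 * o.seqnum IN (o.cnt, o.cnt + 1)
"""


def median_spa(order_counts):
    """order_counts maps a hypothetical spaid to how many orders it has."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (spaid TEXT)")
        for spaid, n in order_counts.items():
            con.executemany("INSERT INTO orders VALUES (?)", [(spaid,)] * n)
        return con.execute(MEDIAN_QUERY).fetchall()
    finally:
        con.close()


print(median_spa({"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}))  # odd: 5 groups
print(median_spa({"A": 1, "B": 2, "C": 3, "D": 4}))          # even: 4 groups
```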

SQL SERVER QUERY to select max value record per item

This is the sample table:
What I need is to display only the record of each tenant with the highest months value. If the months values are equal, I need to use the latest date value instead. Here is the desired output:
I started with this code, using the MAX function and a temp table, but I was unable to get the desired result.
select tenant, name, date, months
into #sample
from tenant
select *
from #sample
where months = (select max(months)from #sample)
The output looks something like this. I believe the code is getting the max value over the whole list rather than per tenant.
Any help will be greatly appreciated :)
This can be done with the row_number window function:
select tenant, name, date, months
from (select t.*,
row_number() over (partition by t.tenant, t.name order by t.months desc, t.date desc) as rn
from TableName t) x
where rn = 1
You can use the ROW_NUMBER function.
Query
;with cte as
(
select rn = row_number() over
(
partition by tenant
order by months desc,[date] desc
),*
from table_name
)
select tenant,name,[date],months from cte
where rn = 1;
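The two-key ordering inside OVER() is what implements "highest months, then latest date". A minimal sketch on SQLite (3.25+) through Python's sqlite3; the table name tenant_table and the rows are hypothetical.

```python
import sqlite3

# Assumes SQLite >= 3.25 (window functions).
# Hypothetical tenant rows (tenant, name, date, months).
ROWS = [
    ("T1", "Alice", "2021-01-01", 12),
    ("T1", "Alice", "2021-03-01", 12),  # same months, later date: should win
    ("T1", "Alice", "2021-06-01", 6),
    ("T2", "Bob", "2021-02-01", 3),
]

LATEST_MAX_MONTHS = """
SELECT tenant, name, date, months
FROM (
    SELECT t.*,
           -- highest months first; the later date breaks ties
           ROW_NUMBER() OVER (PARTITION BY tenant ORDER BY months DESC, date DESC) AS rn
    FROM tenant_table t
) x
WHERE rn = 1
ORDER BY tenant
"""


def best_rows(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tenant_table (tenant TEXT, name TEXT, date TEXT, months INTEGER)")
        con.executemany("INSERT INTO tenant_table VALUES (?, ?, ?, ?)", rows)
        return con.execute(LATEST_MAX_MONTHS).fetchall()
    finally:
        con.close()


print(best_rows(ROWS))
```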

Select Record with Maximum Creation Date

Let us say that I have a database table with the following records:
CACHE_ID BUSINESS_DATE CREATED_DATE
1183 13-09-06 13-09-19 16:38:59.336000000
1169 13-09-06 13-09-24 17:19:05.762000000
1152 13-09-06 13-09-17 14:18:59.336000000
1173 13-09-05 13-09-19 15:48:59.136000000
1139 13-09-05 13-09-24 12:59:05.263000000
1152 13-09-05 13-09-27 13:28:59.332000000
I need to write a query that will return the CACHE_ID for the record which has the most recent CREATED_DATE.
I am having trouble crafting such a query. I can GROUP BY BUSINESS_DATE and get the MAX(CREATED_DATE), but then I don't have the CACHE_ID of that record.
Could someone help with this?
Not positive on Oracle syntax, but use the ROW_NUMBER() function:
SELECT BUSINESS_DATE, CACHE_ID
FROM (SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY BUSINESS_DATE ORDER BY CREATED_DATE DESC) RN
FROM YourTable t
)sub
WHERE RN = 1
The ROW_NUMBER() function assigns a number to each row. PARTITION BY is optional, but it restarts the numbering for each value in that group, i.e. if you PARTITION BY BUSINESS_DATE, then for each unique BUSINESS_DATE value the numbering starts over at 1. ORDER BY defines how the counting proceeds, and it is required in the ROW_NUMBER() function.
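To make the restart-per-partition behavior concrete, here is a sketch that numbers the question's own six sample rows in SQLite (3.25+) through Python's sqlite3. Storing the dates as plain text is an assumption that only works because, in this sample, text order matches chronological order; a real table would use a DATE or TIMESTAMP type.

```python
import sqlite3

# Assumes SQLite >= 3.25 (window functions).
# The question's sample rows, kept as text; within this sample the text
# order of CREATED_DATE matches chronological order.
ROWS = [
    (1183, "13-09-06", "13-09-19 16:38:59.336000000"),
    (1169, "13-09-06", "13-09-24 17:19:05.762000000"),
    (1152, "13-09-06", "13-09-17 14:18:59.336000000"),
    (1173, "13-09-05", "13-09-19 15:48:59.136000000"),
    (1139, "13-09-05", "13-09-24 12:59:05.263000000"),
    (1152, "13-09-05", "13-09-27 13:28:59.332000000"),
]

NUMBERED = """
SELECT BUSINESS_DATE, CACHE_ID,
       -- numbering restarts at 1 for every BUSINESS_DATE
       ROW_NUMBER() OVER (PARTITION BY BUSINESS_DATE ORDER BY CREATED_DATE DESC) AS RN
FROM YourTable
ORDER BY BUSINESS_DATE, RN
"""


def numbered_rows(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE YourTable (CACHE_ID INTEGER, BUSINESS_DATE TEXT, CREATED_DATE TEXT)")
        con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
        return con.execute(NUMBERED).fetchall()
    finally:
        con.close()


for row in numbered_rows(ROWS):
    print(row)  # rows with RN = 1 are the ones kept by WHERE RN = 1
```

Each BUSINESS_DATE gets its own sequence 1, 2, 3, so filtering on RN = 1 leaves 1152 for 13-09-05 and 1169 for 13-09-06.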
You want to group on business date, and get the CACHE_ID with the most current created date? Use something like this:
select yt.CACHE_ID, yt.BUSINESS_DATE, yt.CREATED_DATE
from YourTable yt
where yt.CREATED_DATE = (select max(yt1.CREATED_DATE)
from YourTable yt1
where yt1.BUSINESS_DATE = yt.BUSINESS_DATE)
Not sure of the exact syntax, but conceptually, can't you just sort by CREATED_DATE descending and take the first one?
Across all records -
select top 1 CACHE_ID from YourTable order by CREATED_DATE desc
For each BUSINESS_DATE -
select distinct
a.BUSINESS_DATE,
(
select top 1 b.CACHE_ID
from YourTable b where a.BUSINESS_DATE = b.BUSINESS_DATE
order by b.CREATED_DATE desc
) as Last_CACHE_ID
from YourTable a
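The sort-and-take-first idea can be checked on SQLite (3.25+ is not even needed here) through Python's sqlite3, again with the question's sample rows stored as text. SQLite spells the row limit `LIMIT 1`; SQL Server uses `TOP 1` and Oracle 12c+ uses `FETCH FIRST 1 ROW ONLY`, so the dialect is an assumption of this sketch.

```python
import sqlite3

# The question's sample rows, kept as text (text order = time order here).
ROWS = [
    (1183, "13-09-06", "13-09-19 16:38:59.336000000"),
    (1169, "13-09-06", "13-09-24 17:19:05.762000000"),
    (1152, "13-09-06", "13-09-17 14:18:59.336000000"),
    (1173, "13-09-05", "13-09-19 15:48:59.136000000"),
    (1139, "13-09-05", "13-09-24 12:59:05.263000000"),
    (1152, "13-09-05", "13-09-27 13:28:59.332000000"),
]

# Across all records: sort newest first and keep one row (SQLite syntax).
LATEST_OVERALL = "SELECT CACHE_ID FROM YourTable ORDER BY CREATED_DATE DESC LIMIT 1"

# Per BUSINESS_DATE: a correlated subquery picks the newest CACHE_ID for each date.
LATEST_PER_DATE = """
SELECT DISTINCT a.BUSINESS_DATE,
       (SELECT b.CACHE_ID
        FROM YourTable b
        WHERE b.BUSINESS_DATE = a.BUSINESS_DATE
        ORDER BY b.CREATED_DATE DESC
        LIMIT 1) AS Last_CACHE_ID
FROM YourTable a
ORDER BY a.BUSINESS_DATE
"""


def latest_ids(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE YourTable (CACHE_ID INTEGER, BUSINESS_DATE TEXT, CREATED_DATE TEXT)")
        con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
        overall = con.execute(LATEST_OVERALL).fetchone()[0]
        per_date = con.execute(LATEST_PER_DATE).fetchall()
        return overall, per_date
    finally:
        con.close()


print(latest_ids(ROWS))
```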