Need help to find the middle row using Row_Number - sql

SELECT median.spaid
,median.total
,ROW_NUMBER() OVER (
ORDER BY median.total
) AS row
FROM (
SELECT SpaID
,COUNT(1) AS Total
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID
) AS median
ORDER BY median.total
My issue here is that I need to find the middle row for column "Total" using Row_number. I need to find which "SpaID" is linked to the middle row of the "Total" column.

This is a shot in the dark based on very sparse details but I think you are looking for something like this.
with numberedResults as
(
select spaid
, ROW_NUMBER() over(order by count(*)) as RowNum
from [order]
where DateCreated between '20140401' AND '20140430'
group by SpaID
)
, Medians as
(
select (MAX(RowNum) + 1) / 2 as Median
, MAX(RowNum) as TotalCount
from numberedResults
)
select *
from numberedResults r
join Medians m on m.Median = r.RowNum

I would suggest not relying on ROW_NUMBER here, because its results are not deterministic when several SpaIDs share the same count: tied rows can be numbered in any order. The query below looks bulky because the "median" has to be taken over the grouped rows. Here's the query I believe should work for you:
SELECT SpaID, d
FROM (SELECT SpaID,
DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
FROM dbo.[Order]
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID) AS ranked
WHERE d =
(SELECT ROUND(MAX(d) / 2.0, 0)
FROM (SELECT DENSE_RANK() OVER (ORDER BY COUNT(1)) AS d
FROM dbo.[Order]
WHERE DateCreated BETWEEN '04-01-2014'
AND '04-30-2014'
GROUP BY SpaID) AS r)

Here is one method of finding the median:
SELECT o.*
FROM (SELECT SpaID, COUNT(*) AS Total,
ROW_NUMBER() OVER (ORDER BY COUNT(*)) as seqnum,
COUNT(*) OVER () as cnt
FROM dbo.[Order](NOLOCK)
WHERE DateCreated BETWEEN '2014-04-01' AND '2014-04-30'
GROUP BY SpaID
) o
WHERE 2*o.seqnum IN (o.cnt, o.cnt + 1);
This is approximate when you have an even number of rows. You are looking for the exact row id, so you have to choose either the one before or after the median (which is between two rows).
Note: You should express date constants using the ISO standard formats, either YYYYMMDD or YYYY-MM-DD. The first is the safest way in SQL Server (although I personally prefer the hyphens for readability).
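To make the middle-row test concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. Everything in it is an assumption for illustration: the `orders` table stands in for dbo.[Order], the sample data is made up (spa N has N orders), and SQLite 3.25 or later is assumed for window-function support. The counts are computed in a CTE first, and SpaID is added as a tie-breaker so the numbering is repeatable.

```python
import sqlite3

# Hypothetical stand-in for dbo.[Order]: spa N has N orders in April 2014.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (SpaID INTEGER, DateCreated TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(spa, "2014-04-10") for spa in range(1, 6) for _ in range(spa)],
)

median_sql = """
WITH totals AS (
    SELECT SpaID, COUNT(*) AS Total
    FROM orders
    WHERE DateCreated BETWEEN '2014-04-01' AND '2014-04-30'
    GROUP BY SpaID
), numbered AS (
    SELECT SpaID, Total,
           -- SpaID breaks ties so the row numbering is deterministic
           ROW_NUMBER() OVER (ORDER BY Total, SpaID) AS seqnum,
           COUNT(*) OVER () AS cnt
    FROM totals
)
SELECT SpaID, Total
FROM numbered
-- odd cnt: 2*seqnum = cnt + 1 (exact middle); even cnt: 2*seqnum = cnt (lower middle)
WHERE 2 * seqnum IN (cnt, cnt + 1)
"""
median_rows = conn.execute(median_sql).fetchall()
print(median_rows)  # [(3, 3)]
```

With five groups the third row is the middle one; with an even number of groups this sketch picks the lower of the two middle rows.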

Related

Second minimum value for every customer

I am using MySQL database. So, there are two columns I am working on, CustomerId, and OrderDate. I want to find a second-order date (2nd minimum order date) for each customer.
If you are using MySQL 8+, then ROW_NUMBER can be used here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY OrderDate) rn
FROM yourTable
)
SELECT CustomerId, OrderDate
FROM cte
WHERE rn = 2;
I would recommend using DENSE_RANK, as it gives the correct result even when a customer has duplicate order dates:
SELECT * FROM
(SELECT t.*, DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY OrderDate) dr
FROM yourTable t
) t where dr = 2;
If your MySQL version does not support analytic functions, you can use a correlated subquery instead. It keeps the rows that have exactly one earlier distinct order date for the same customer:
SELECT T.*
FROM YOURTABLE T
WHERE 1 = (
SELECT COUNT(DISTINCT TT.ORDER_DATE)
FROM YOURTABLE TT
WHERE TT.CUSTOMER_ID = T.CUSTOMER_ID
AND TT.ORDER_DATE < T.ORDER_DATE
)
I would use a subquery like this:
select o.*
from orders o
where o.order_date = (select o2.order_date
from orders o2
where o2.customer_id = o.customer_id
order by o2.order_date
limit 1 offset 1
);
The subquery is a correlated subquery that returns the second date. If you want the second date with other columns, it can be moved to the select.
With an index on (customer_id, order_date), this is likely to be the fastest solution.
This assumes that there is one row per date (or that if there are multiple rows, "second" can be the earliest date). If you want the second distinct date, use select distinct in the subquery; note that select distinct or group by would incur additional overhead.
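The difference between ROW_NUMBER and DENSE_RANK only shows up when a customer has duplicate dates. A small sketch of that case, run through SQLite from Python rather than MySQL (an assumption made purely so the example is self-contained; the sample rows are invented and SQLite 3.25+ is assumed for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (CustomerId INTEGER, OrderDate TEXT)")
# Hypothetical data: customer 1 ordered twice on the same first date.
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", [
    (1, "2021-01-01"), (1, "2021-01-01"), (1, "2021-01-05"), (1, "2021-02-01"),
    (2, "2021-03-01"), (2, "2021-03-02"),
])

def second_date(func):
    """Return (CustomerId, OrderDate) where the given ranking function equals 2."""
    sql = f"""
    SELECT DISTINCT CustomerId, OrderDate
    FROM (SELECT t.*,
                 {func}() OVER (PARTITION BY CustomerId ORDER BY OrderDate) AS rn
          FROM yourTable t) x
    WHERE rn = 2
    ORDER BY CustomerId
    """
    return conn.execute(sql).fetchall()

by_row_number = second_date("ROW_NUMBER")
by_dense_rank = second_date("DENSE_RANK")
print(by_row_number)  # [(1, '2021-01-01'), (2, '2021-03-02')]
print(by_dense_rank)  # [(1, '2021-01-05'), (2, '2021-03-02')]
```

ROW_NUMBER returns the second row, which for customer 1 is the duplicate of the first date; DENSE_RANK returns the second distinct date. Which one is "right" depends on what "second order date" should mean.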

SQL ZOO Window LAG #8

Question: For each country that has had at least 1000 new cases in a single day, show the date of the peak number of new cases.
Here is some sample data from the covid table.
What I write:
SELECT name,date,MAX(confirmed-lag) AS PeakNew
FROM(
SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') date, confirmed,
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) lag
FROM covid
ORDER BY confirmed
) temp
GROUP BY name
HAVING PeakNew>=1000
ORDER BY PeakNew DESC;
The result I got is weird, PeakNew seems correct, but the related date is not.
My answer
The right answer
Can anyone help me get the right answer? Thank you!
The below query works perfectly fine for me. Though the dates and values are correct, the output will say otherwise as the order is different. Here the order is by date, then by name.
SELECT z1.name, DATE_FORMAT(c.dt,'%Y-%m-%d'), z1.mx
FROM
(
SELECT z.name, MAX(z.nc) AS 'mx'
FROM (
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid ) z
WHERE z.nc >= 1000
GROUP BY z.name
) z1
INNER JOIN
(
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid
) c
ON c.nc = z1.mx
AND c.name = z1.name
ORDER BY 2 ASC
The date value in the outer query doesn't correspond to the row where MAX(confirmed-lag) is found; it's just an arbitrary date value from within that group. Check out the section titled, "The ONLY_FULL_GROUP_BY Issue" in this blog post: https://www.percona.com/blog/2019/05/13/solve-query-failures-regarding-only_full_group_by-sql-mode/ for more information.
I used the ROW_NUMBER() function to get the entire row corresponding to the maximum new cases. However, my final result wasn't ordered the way the answer was, and there's no specification to how it should be ordered, so I still didn't get that satisfying happy emoji.
You need to self join to obtain the date on which the max count occurred:
WITH CTE1 as
(SELECT name,DATE_FORMAT(whn, "%Y-%m-%d") as date,
confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY DATE(whn)) as increase
FROM covid
ORDER BY whn),
CTE2 AS
(SELECT name, MAX(increase) as max_increase
FROM CTE1
WHERE increase >999
GROUP BY name)
SELECT c1.name,c1.date,c2.max_increase as peakNewCases
FROM CTE1 as c1
JOIN CTE2 as c2
ON c1.name=c2.name AND c1.increase=c2.max_increase
WITH CTE1 as
(SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') as date_form, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY whn) AS newcases
FROM covid
ORDER BY name,whn)
SELECT name, date_form, newcases FROM
(
SELECT name, date_form, newcases, ROW_NUMBER() OVER (PARTITION BY name ORDER BY newcases DESC) as rn
FROM CTE1
WHERE newcases > 999
) cte2
WHERE rn = 1
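For anyone who wants to check the ROW_NUMBER approach without the SQL Zoo site, here is a rough sketch against an in-memory SQLite database from Python. The table contents are invented, SQLite stands in for MySQL (so the dates are kept as plain text instead of going through DATE_FORMAT), and SQLite 3.25+ is assumed for LAG and ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
# Hypothetical cumulative counts; country B never reaches 1000 new cases in a day.
conn.executemany("INSERT INTO covid VALUES (?, ?, ?)", [
    ("A", "2020-03-01", 100), ("A", "2020-03-02", 1500),
    ("A", "2020-03-03", 4000), ("A", "2020-03-04", 5000),
    ("B", "2020-03-01", 10), ("B", "2020-03-02", 20), ("B", "2020-03-03", 500),
])

peak_sql = """
WITH daily AS (
    -- new cases = today's cumulative total minus the previous day's
    SELECT name, whn,
           confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) AS newcases
    FROM covid
), ranked AS (
    -- number each country's qualifying days from highest to lowest
    SELECT name, whn, newcases,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY newcases DESC) AS rn
    FROM daily
    WHERE newcases >= 1000
)
SELECT name, whn, newcases
FROM ranked
WHERE rn = 1
ORDER BY name
"""
peak_rows = conn.execute(peak_sql).fetchall()
print(peak_rows)  # [('A', '2020-03-03', 2500)]
```

Because the whole row is picked by rn = 1, the date always belongs to the day with the maximum, which is exactly what the GROUP BY version in the question cannot guarantee.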

SQL Finding five largest numbers instead of one Max in a table

I have a table and need to run a query that contains some aggregate functions such as maximum, average, and standard deviation,
but instead of a single maximum it should return the 5 largest values.
The simplified query is something like this:
SELECT OSI_KEY , MAX(VALUE) , AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
and I need some Magical ;) Query like this:
SELECT OSI_KEY , MAX1(VALUE) ,MAX2(VALUE) ,MAX3(VALUE) ,MAX4(VALUE) , MAX5(VALUE) ,
AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
I appreciate your considerations.
Oracle has an NTH_VALUE() function. Unfortunately, it is only an analytic (window) function and not an aggregate function. This leads to the strange construct of SELECT DISTINCT with a bunch of analytic functions:
SELECT DISTINCT OSI_KEY,
MAX(VALUE) OVER (PARTITION BY OSI_KEY),
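NTH_VALUE(VALUE, 2) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_2,
NTH_VALUE(VALUE, 3) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_3,
NTH_VALUE(VALUE, 4) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_4,
NTH_VALUE(VALUE, 5) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_5,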
AVG(VALUE) OVER (PARTITION BY OSI_KEY),
STDDEV(VALUE) OVER (PARTITION BY OSI_KEY),
variance(VALUE) OVER (PARTITION BY OSI_KEY)
FROM DATA_VALUES_5MIN_6_2013
ORDER BY OSI_KEY;
You can also do this using conditional aggregation, with a row_number() or dense_rank() in a subquery.
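A rough sketch of that conditional-aggregation idea, run against SQLite from Python so it is self-contained. The table name `data_values` and its rows are made up for illustration, SQLite 3.25+ is assumed for ROW_NUMBER, and STDDEV/VARIANCE are left out only because SQLite has no built-in equivalents; in Oracle they would sit next to AVG in the outer select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_values (OSI_KEY INTEGER, VALUE INTEGER)")
# Hypothetical data: key 1 has six values, key 2 has only two.
conn.executemany(
    "INSERT INTO data_values VALUES (?, ?)",
    [(1, v) for v in (10, 50, 30, 20, 60, 40)] + [(2, v) for v in (7, 9)],
)

top5_sql = """
SELECT OSI_KEY,
       MAX(CASE WHEN rn = 1 THEN VALUE END) AS max1,
       MAX(CASE WHEN rn = 2 THEN VALUE END) AS max2,
       MAX(CASE WHEN rn = 3 THEN VALUE END) AS max3,
       MAX(CASE WHEN rn = 4 THEN VALUE END) AS max4,
       MAX(CASE WHEN rn = 5 THEN VALUE END) AS max5,
       AVG(VALUE) AS avg_value  -- still computed over every row of the key
FROM (SELECT OSI_KEY, VALUE,
             ROW_NUMBER() OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC) AS rn
      FROM data_values) t
GROUP BY OSI_KEY
ORDER BY OSI_KEY
"""
top5_rows = conn.execute(top5_sql).fetchall()
for row in top5_rows:
    print(row)
# (1, 60, 50, 40, 30, 20, 35.0)
# (2, 9, 7, None, None, None, 8.0)
```

The subquery keeps all rows, so the ordinary aggregates are unaffected; keys with fewer than five values simply get NULL in the missing slots. Swapping ROW_NUMBER for DENSE_RANK would return the five largest distinct values instead.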
SELECT OSI_KEY, MaxValue FROM (
SELECT OSI_KEY, MAX(value) AS MaxValue FROM table GROUP BY OSI_KEY
)
ORDER BY MaxValue DESC
FETCH FIRST 5 ROWS ONLY;

How to write a derived query in Netezza SQL?

I need to query the data per inviteid: for each inviteid I need the top 5 IDs and ID descriptions.
The query I wrote takes forever to return. I didn't notice an error or anything obviously wrong with it.
The code is:
SELECT count(distinct ID),
IDdesc,
inviteid,
A
FROM (
SELECT
ID,
IDdesc,
inviteid,
RANK() OVER(order by invtypeid asc ) A
FROM Fact_s
--WHERE dateid ='26012013'
GROUP BY invteid,IDdesc,ID
ORDER BY invteid,IDdesc,ID
) B
WHERE A <=5
GROUP BY A, IDDESC, inviteid
ORDER BY A
I'm not sure I understood your requirement completely, but as far as I can tell the group by in the derived table is not necessary (just like the order by, as Mark mentioned) because you are using a window function.
And you probably want row_number() instead of rank() in there.
Including the result of rank() in the outer query seems dubious as well.
So this leads to the following statement:
SELECT count(distinct ID),
IDdesc,
inviteid
FROM (
SELECT ID,
IDdesc,
inviteid,
row_number() OVER (order by invtypeid asc ) as rn
FROM Fact_s
) B
WHERE rn <= 5
GROUP BY IDDESC, inviteid;
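If "top 5 for each inviteid" is really what is wanted, the row numbering probably has to restart per inviteid, which means adding PARTITION BY to the window. The following is only a sketch under that reading of the question: it runs on SQLite from Python instead of Netezza, the rows are invented, and ordering by invtypeid (with ID as a tie-breaker) is an assumption about what "top" means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Fact_s (ID INTEGER, IDdesc TEXT, inviteid INTEGER, invtypeid INTEGER)"
)
# Hypothetical data: invite 100 has six rows, invite 200 has two.
conn.executemany(
    "INSERT INTO Fact_s VALUES (?, ?, ?, ?)",
    [(i, f"desc{i}", 100, i) for i in range(1, 7)]
    + [(7, "desc7", 200, 1), (8, "desc8", 200, 2)],
)

top_sql = """
SELECT inviteid, ID, IDdesc
FROM (SELECT ID, IDdesc, inviteid,
             -- numbering restarts at 1 for every inviteid
             ROW_NUMBER() OVER (PARTITION BY inviteid
                                ORDER BY invtypeid, ID) AS rn
      FROM Fact_s) b
WHERE rn <= 5
ORDER BY inviteid, rn
"""
top_rows = conn.execute(top_sql).fetchall()
ids_by_invite = {}
for inviteid, id_, _desc in top_rows:
    ids_by_invite.setdefault(inviteid, []).append(id_)
print(ids_by_invite)  # {100: [1, 2, 3, 4, 5], 200: [7, 8]}
```

Without the PARTITION BY, rn <= 5 keeps only five rows across the whole table rather than five per inviteid.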

Query to find the FIRST AND SECOND largest value from a group

I have a query like this:
SELECT
DATEPART(year,some_date),
DATEPART(month,some_date),
MAX(some_value) max_value
FROM
some_table
GROUP BY
DATEPART(year,some_date),
DATEPART(month,some_date)
This returns a table with: year, month, the largest value for the month.
I would like to modify the query so that I could obtain:
year, month, the largest value for the month, the second largest value for the month in each row.
It seems to me that the well-known solutions like "TOP 2", "NOT IN TOP 1" or a subselect won't work here.
(To be really specific: I am using SQL Server 2008.)
It seems to me that the question calls for a query that would return best, and second best in the same row for each month and year, like so:
month, year, best, second best
...
...
and not two rows for the same month and year containing best and second best value.
This is the solution that I came up with, so if anyone has a simpler way of achieving this, I would like to know.
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
rank() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
select
t1.year,
t1.month,
t1.views as [best],
t2.views as [second best]
from ranks t1
inner join ranks t2
on t1.year = t2.year
and t1.month = t2.month
and t1.rank = 1
and t2.rank = 2
EDIT: Just out of curiosity I did a bit more testing and ended up with a simpler variation on Stephanie Page's answer that doesn't use an additional subquery. I also changed the rank() function to row_number(), because rank() doesn't work when two max values are the same: both rows get rank 1 and no row gets rank 2.
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
row_number() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
select
t1.year,
t1.month,
max(case when t1.rank = 1 then t1.views else 0 end) as [best],
max(case when t1.rank = 2 then t1.views else 0 end) as [second best]
from
ranks t1
where
t1.rank in (1,2)
group by
t1.year, t1.month
RANK() is maybe the thing you are looking for...
http://msdn.microsoft.com/en-us/library/ms176102.aspx
To do this without joins (I'll show the Oracle version; in SQL Server you would use CASE instead of DECODE):
with ranks as (
select
year(entrydate) as [year],
month(entrydate) as [month],
views,
rank() over (partition by year(entrydate), month(entrydate) order by views desc) as [rank]
from product
)
SELECT [year], [month], Max([best]), Max([second best])
FROM
( select
t1.year,
t1.month,
Decode([rank],1,t1.views,0) as [best],
Decode([rank],2,t1.views,0) as [second best]
from ranks t1
where t1.rank <= 2 ) x
GROUP BY [year], [month]
This is a bit old-school but TOP and a subquery will work if you use ORDER BY. Try this:
SELECT TOP 2
DATEPART(year,some_date),
DATEPART(month,some_date),
(SELECT MAX(st1.some_value) FROM some_table AS st1
WHERE DATEPART(month,some_date) = DATEPART(month,st1.some_date)) AS max_value
FROM
some_table
GROUP BY
DATEPART(year,some_date),
DATEPART(month,some_date)
ORDER BY DATEPART(month,some_date) DESC
That will give you the two rows with the "highest" month values and the added subselect should give you the max from each grouping.
You can use a CTE with the ranking functions in SQL Server 2005 and up:
;WITH TopValues AS
(
SELECT
YEAR(some_date) AS 'Year',
MONTH(some_date) AS 'Month',
Some_Value,
ROW_NUMBER() OVER(PARTITION BY YEAR(some_date),MONTH(some_date)
ORDER BY Some_Value DESC) AS 'RowNumber'
FROM
dbo.some_table
)
SELECT
Year, Month, Some_Value
FROM
TopValues
WHERE
RowNumber <= 2
This will "partition" (i.e. group) your data by month/year, order inside each group by Some_Value descending (largest first), and then you can select the first two of each group from that CTE.
RANK() works as well (I most often use ROW_NUMBER) - it produces slightly different results, though - really depends on what your needs are.
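To see how the "slightly different results" play out when the top value is tied, here is a small sketch that pivots best and second best into one row per month with each of the three ranking functions. It runs on SQLite from Python rather than SQL Server (an assumption for the sake of a self-contained example), so strftime replaces YEAR()/MONTH(); the `product` rows are invented and SQLite 3.25+ is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (entrydate TEXT, views INTEGER)")
# Hypothetical data: January has a tie for the highest value.
conn.executemany("INSERT INTO product VALUES (?, ?)", [
    ("2014-01-03", 90), ("2014-01-10", 90), ("2014-01-20", 70),
    ("2014-02-05", 50), ("2014-02-15", 40),
])

def best_two(func):
    """Best and second-best views per month, using the given ranking function."""
    sql = f"""
    WITH ranks AS (
        SELECT strftime('%Y-%m', entrydate) AS ym, views,
               {func}() OVER (PARTITION BY strftime('%Y-%m', entrydate)
                              ORDER BY views DESC) AS rnk
        FROM product
    )
    SELECT ym,
           MAX(CASE WHEN rnk = 1 THEN views END) AS best,
           MAX(CASE WHEN rnk = 2 THEN views END) AS second_best
    FROM ranks
    WHERE rnk IN (1, 2)
    GROUP BY ym
    ORDER BY ym
    """
    return conn.execute(sql).fetchall()

print(best_two("ROW_NUMBER"))  # [('2014-01', 90, 90), ('2014-02', 50, 40)]
print(best_two("RANK"))        # [('2014-01', 90, None), ('2014-02', 50, 40)]
print(best_two("DENSE_RANK"))  # [('2014-01', 90, 70), ('2014-02', 50, 40)]
```

ROW_NUMBER treats the tied row as second best, RANK skips rank 2 entirely after a tie (so the second-best column comes back NULL here), and DENSE_RANK gives the second distinct value.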
Hmmm it's kind of a rig, but you can do this with subqueries... instead of using that max I'd select the some_values which have the matching year & month, row_number()=1 / row_number() = 2 respectively and order by some_value DESC.
The inability to use OFFSET / LIMIT like you can in SQLite is one of my dislikes about SQL Server.