Unrecognized name "rank" in BigQuery - sql

I'm writing a query to get the top ten of each area. The query returns the ranks correctly, but when I try to keep only the top ten, it returns an "Unrecognized name" error, even though I declared the name.
with rank as (
select RANK() OVER (PARTITION BY area ORDER BY total_orders_last30days DESC)
FROM `dataset`)
SELECT orders, name, area,
FROM `dataset` where rank<=10

You have multiple problems with your code:
You define a CTE but don't use it.
You have a column in the CTE but don't provide an alias.
You refer to the original dataset, which doesn't have that column.
This is what you want:
WITH r as (
SELECT d.*,
RANK() OVER (PARTITION BY area ORDER BY total_orders_last30days DESC) as rnk
FROM `dataset` d
)
SELECT orders, name, area,
FROM r
WHERE rnk <= 10;
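As an alternative to the CTE above, BigQuery's QUALIFY clause can filter on a window function directly. This is a minimal sketch of that different technique, not the answer's method. It reuses the table and column names from the question, and `WHERE TRUE` is included only as a harmless placeholder in case the environment expects QUALIFY to accompany a WHERE, GROUP BY, or HAVING clause.

```sql
-- Sketch: keep the top ten rows per area without a CTE.
-- Assumes `dataset` has the columns orders, name, area and
-- total_orders_last30days, as in the question.
SELECT orders, name, area
FROM `dataset`
WHERE TRUE
QUALIFY RANK() OVER (PARTITION BY area ORDER BY total_orders_last30days DESC) <= 10;
```

Because RANK() gives tied rows the same rank, an area can return more than ten rows when there are ties at the cutoff.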

Related

How to ask another question in one question with rank

I want the rows whose first rank equals 1 to appear first in the output.
SELECT id, name,password,oldname,number,
RANK() OVER(ORDER BY oldname="mxmx" DESC),
RANK() OVER(ORDER BY number DESC)
as "Rank1"
FROM account
Please guide me toward a solution.
If you want the first-ranked items to appear first in the results, use:
SELECT
id,
name,
password,
oldname,
number,
RANK() OVER (ORDER BY oldname="mxmx" DESC),
RANK() OVER (ORDER BY number DESC) AS "Rank1"
FROM account
ORDER BY
RANK() OVER (ORDER BY oldname="mxmx" DESC)
Alternatively, you could wrap your original query inside a SELECT * FROM (...) table_alias ORDER BY rank_alias if you prefer. (Though you should select the actual columns you need -- I write the * for brevity only)
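The wrapping approach described above might look like the following sketch for MySQL 8. The aliases `name_rank`, `number_rank`, and `t` are hypothetical names introduced here for illustration; the columns come from the question.

```sql
-- Sketch: compute both ranks in a derived table, then sort by their aliases.
SELECT t.id, t.name, t.password, t.oldname, t.number,
       t.name_rank, t.number_rank
FROM (
    SELECT id, name, password, oldname, number,
           RANK() OVER (ORDER BY oldname = 'mxmx' DESC) AS name_rank,
           RANK() OVER (ORDER BY number DESC)           AS number_rank
    FROM account
) AS t
ORDER BY t.name_rank, t.number_rank;
```

Giving each rank an alias means the outer ORDER BY states the intended sort explicitly, and the window expression does not have to be repeated.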
PS. I also found that, in MySQL 8 at least, the results come back sorted by the last RANK() in the select list ("last-RANK-first"). I can't find a reference that guarantees this behaviour, but if you rely on it, your query would be:
SELECT
id,
name,
password,
oldname,
number,
RANK() OVER (ORDER BY number DESC) AS "Rank1",
RANK() OVER (ORDER BY oldname="mxmx" DESC)
FROM account

Get last two rows from a row_number() window function in Snowflake

Hopefully, someone can help me...
I'm trying to get the last two values from a row_number() window function. Let's say my results contain row numbers up to 6, for example. How would it be possible to get the rows where the row number is 5 and 6?
Let me know if it can be done with another window function or in another way.
Kind regards,
Using QUALIFY:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(ORDER BY ... DESC) <= 2;
This approach could be further extended to get two rows per each partition:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... DESC) <= 2;
You can use TOP with ORDER BY ... DESC, like this:
select top 2 row_number() over([partition by] [order by]) as rn
from table
order by rn desc
I'd say @Shmiel's answer is the formal and elegant way. Just in case, it would be the same as:
WITH CTE AS
(SELECT product,
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id
FROM CTE
WHERE RN < 3;
Use ORDER BY [order_condition] with DESC, then filter on RN (the row number) to select as many rows as you want.
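If the original ascending row numbers need to be kept (rows 5 and 6 out of 6, as in the question), one option is to compare each row number against the total row count. This is a minimal sketch, assuming a hypothetical ordering column `sort_col` on `tab`; the aliases `rn`, `total_rows`, and `s` are also introduced here for illustration.

```sql
-- Sketch: keep the rows whose ascending row number is among the last two.
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY sort_col) AS rn,
           COUNT(*)     OVER ()                  AS total_rows
    FROM tab t
) s
WHERE rn > total_rows - 2;
```

With six rows, `total_rows - 2` is 4, so only the rows numbered 5 and 6 pass the filter.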

SQL ZOO Window LAG #8

Question: For each country that has had at least 1000 new cases in a single day, show the date of the peak number of new cases.
Here are a few sample rows from the covid table.
What I wrote:
SELECT name,date,MAX(confirmed-lag) AS PeakNew
FROM(
SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') date, confirmed,
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) lag
FROM covid
ORDER BY confirmed
) temp
GROUP BY name
HAVING PeakNew>=1000
ORDER BY PeakNew DESC;
The result I got is weird: PeakNew seems correct, but the related date is not.
Can anyone help me get the right answer? Thank you!
The query below works perfectly fine for me. Although the dates and values are correct, the result may still be reported as wrong because the row order is different. Here the rows are ordered by date.
SELECT z1.name, DATE_FORMAT(c.dt,'%Y-%m-%d'), z1.mx
FROM
(
SELECT z.name, MAX(z.nc) AS 'mx'
FROM (
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid ) z
WHERE z.nc >= 1000
GROUP BY z.name
) z1
INNER JOIN
(
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid
) c
ON c.nc = z1.mx
AND c.name = z1.name
ORDER BY 2 ASC
The date value in the outer query doesn't correspond to the row where MAX(confirmed-lag) is found; it's just an arbitrary date value from within that group. Check out the section titled "The ONLY_FULL_GROUP_BY Issue" in this blog post for more information: https://www.percona.com/blog/2019/05/13/solve-query-failures-regarding-only_full_group_by-sql-mode/
I used the ROW_NUMBER() function to get the entire row corresponding to the maximum new cases. However, my final result wasn't ordered the way the expected answer was, and there's no specification of how it should be ordered, so I still didn't get that satisfying happy emoji.
You need to self join to obtain the date on which the max count occurred:
WITH CTE1 AS
(SELECT name, DATE_FORMAT(whn, "%Y-%m-%d") AS date,
confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY DATE(whn)) AS increase
FROM covid),
CTE2 AS
(SELECT name, MAX(increase) AS max_increase
FROM CTE1
WHERE increase > 999
GROUP BY name)
SELECT c1.name, c1.date, c2.max_increase AS peakNewCases
FROM CTE1 AS c1
JOIN CTE2 AS c2
ON c1.name = c2.name AND c1.increase = c2.max_increase;
Alternatively, using ROW_NUMBER() to pick the peak row per country:
WITH CTE1 AS
(SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') AS date_form, confirmed - LAG(confirmed,1) OVER (PARTITION BY name ORDER BY whn) AS newcases
FROM covid)
SELECT name, date_form, newcases FROM
(
SELECT name, date_form, newcases, ROW_NUMBER() OVER (PARTITION BY name ORDER BY newcases DESC) AS rn
FROM CTE1
WHERE newcases > 999
) cte2
WHERE rn = 1

Aggregate function like MAX for most common cell in column?

Grouping by the highest number in a column worked great with MAX(), but what if I would like to get the value that is most common?
As example:
ID
100
250
250
300
200
250
So I would like to group by ID and, instead of getting the lowest (MIN) or highest (MAX) number, get the most common one (that would be 250, because it appears three times).
Is there an easy way in SQL Server 2012 or am I forced to add a second SELECT where I COUNT(DISTINCT ID) and add that somehow to my first SELECT statement?
You can use dense_rank to return all the ids with the highest count. This also handles ties for the highest count.
select id from
(select id, dense_rank() over(order by count(*) desc) as rnk from tablename group by id) t
where rnk = 1
A simple way to do what you want uses top and order by:
SELECT top 1 id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
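If several ids can share the highest count and all of them should be returned, SQL Server's TOP ... WITH TIES is one way to keep the query this short. This is a sketch of that variant, using the same hypothetical table `t` as above.

```sql
-- Sketch: return every id tied for the highest count.
SELECT TOP 1 WITH TIES id
FROM t
GROUP BY id
ORDER BY COUNT(*) DESC;
```

WITH TIES requires an ORDER BY clause and returns all rows that match the last qualifying row on the ORDER BY expression, here COUNT(*).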
This is a statistic called the mode. Getting the mode and max is a bit challenging in SQL Server. I would approach it as:
WITH cte AS (
SELECT t.id, COUNT(*) AS cnt,
row_number() OVER (ORDER BY COUNT(*) DESC) AS seqnum
FROM t
GROUP BY id
)
SELECT MAX(id) AS themax, MAX(CASE WHEN seqnum = 1 THEN id END) AS MODE
FROM cte;

How to write a derived query in Netezza SQL?

I need to query the data per inviteid. For each inviteid I need to have the top 5 IDs and ID descriptions.
The query I wrote takes forever to return results, but I don't see an error or anything obviously wrong with it.
The code is:
SELECT count(distinct ID),
IDdesc,
inviteid,
A
FROM (
SELECT
ID,
IDdesc,
inviteid,
RANK() OVER(order by invtypeid asc ) A
FROM Fact_s
--WHERE dateid ='26012013'
GROUP BY invteid,IDdesc,ID
ORDER BY invteid,IDdesc,ID
) B
WHERE A <=5
GROUP BY A, IDDESC, inviteid
ORDER BY A
I'm not sure I understood your requirement completely, but as far as I can tell the GROUP BY in the derived table is not necessary (nor is the ORDER BY, as Mark mentioned) because you are using a window function.
And you probably want row_number() instead of rank() in there.
Including the result of rank() in the outer query seems dubious as well.
So this leads to the following statement:
SELECT count(distinct ID),
IDdesc,
inviteid
FROM (
SELECT ID,
IDdesc,
inviteid,
row_number() OVER (order by invtypeid asc ) as rn
FROM Fact_s
) B
WHERE rn <= 5
GROUP BY IDDESC, inviteid;
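If "top 5 for each inviteid" is meant literally, the window would also need a PARTITION BY, since without one the row numbers run across the whole table. The following is only a sketch under assumptions: the question does not say what "top" is measured by, so ordering by ID is a placeholder, and it assumes Fact_s has at most one row per (inviteid, ID).

```sql
-- Sketch: first five IDs per inviteid, ordered by an assumed sort column (ID).
SELECT ID, IDdesc, inviteid
FROM (
    SELECT ID,
           IDdesc,
           inviteid,
           ROW_NUMBER() OVER (PARTITION BY inviteid ORDER BY ID) AS rn
    FROM Fact_s
) B
WHERE rn <= 5;
```

If the same ID can appear more than once per inviteid, the duplicates would use up slots in the top five, so the rows would need to be de-duplicated first.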