Redshift: how to find row_number after grouping and aggregating?

Suppose I have a table of customer purchases ("my_table") like this:
--------------------------------------
customerid | date_of_purchase | price
-----------|------------------|-------
1 | 2019-09-20 | 20.23
2 | 2019-09-21 | 1.99
1 | 2019-09-21 | 123.34
...
I'd like to be able to find the nth highest spending customer in this table (say n = 5). So I tried this:
with cte as (
select customerid, sum(price) as total_pay,
row_number() over (partition by customerid order by total_pay desc) as rn
from my_table group by customerid order by total_pay desc)
select * from cte where rn = 5;
But this gives me nonsense results. For some reason rn doesn't seem to be unique (for example there are a bunch of customers with rn = 1). I don't understand why. Isn't rn supposed to be just a row number?

Remove the partition by in the definition of row_number():
with cte as (
select customerid, sum(price) as total_pay,
row_number() over (order by sum(price) desc) as rn
from my_table
group by customerid
)
select *
from cte
where rn = 5;
You are already grouping by customerid, so each customer has only one row. Partitioning by customerid then puts every row in its own partition, so rn is always 1.
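To see the difference, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Redshift. It assumes SQLite 3.25 or later for window functions, and the purchase rows are hypothetical. It aggregates in a derived table first and then numbers the aggregated rows, which has the same effect as the query above. Note that row_number() breaks ties arbitrarily; dense_rank() would give tied customers the same rank.

```python
import sqlite3

# Hypothetical purchases (not the question's data): six customers with distinct totals.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (customerid integer, date_of_purchase text, price real)")
conn.executemany("insert into my_table values (?, ?, ?)", [
    (1, "2019-09-20", 20.23), (1, "2019-09-21", 123.34),  # total 143.57
    (2, "2019-09-21", 1.99),                              # total 1.99
    (3, "2019-09-22", 50.0),                              # total 50.0
    (4, "2019-09-22", 75.0), (4, "2019-09-23", 25.0),     # total 100.0
    (5, "2019-09-23", 10.0),                              # total 10.0
    (6, "2019-09-24", 30.0),                              # total 30.0
])

# One aggregated row per customer.
per_customer = "select customerid, sum(price) as total_pay from my_table group by customerid"

# Buggy: each customerid is its own partition, so every row gets rn = 1.
buggy = conn.execute(f"""
    select customerid, total_pay,
           row_number() over (partition by customerid order by total_pay desc) as rn
    from ({per_customer})
""").fetchall()
buggy_rns = {rn for _, _, rn in buggy}

# Fixed: a single window over all customers, ordered by total spend.
fifth = conn.execute(f"""
    select customerid, total_pay
    from (
        select customerid, total_pay,
               row_number() over (order by total_pay desc) as rn
        from ({per_customer})
    )
    where rn = 5
""").fetchone()

print(buggy_rns)  # {1}
print(fifth)      # (5, 10.0)
```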

Related

How do I find the Sum and Max value per Unique ID in HIVE?

Basically, how do I turn
id name quantity
1 Jerry 1
1 Jerry 2
1 Nana 1
2 Max 4
2 Lenny 3
into
id name quantity
1 Jerry 3
2 Max 4
in HIVE?
I want to sum the quantity for each name and then find the highest total for each unique ID.
You can use window functions with aggregation:
select id, name, quantity
from (select id, name, sum(quantity) as quantity,
row_number() over (partition by id order by sum(quantity) desc) as seqnum
from t
group by id, name
) t
where seqnum = 1;
You can first calculate the sum of quantity per group, then number the rows within each id by descending quantity, and finally keep the rows numbered 1.
select
id, name, quantity
from (
select
s.*,
row_number() over (partition by id order by quantity desc) as rn
from (
-- total quantity per (id, name)
select id, name, sum(quantity) as quantity
from mytable
group by id, name
) s
) r
where rn = 1;
Try something like the following (it returns every name that ties for the highest total):
with cte as
(
select id,name,sum(quantity) as q
from table_name group by id,name
) select id,name,q from cte t1
where t1.q=( select max(q) from cte t2 where t1.id=t2.id)
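A quick way to check the aggregate-then-filter approaches is to run them against the question's sample rows in SQLite via Python's sqlite3 module, as a stand-in for Hive (assuming SQLite 3.25 or later for window functions; the table name t is an assumption). The sketch runs the row_number() version and the max() subquery version and shows that they agree on this data. On tied totals, the max() version keeps every tied name while row_number() picks one.

```python
import sqlite3

# The question's sample rows; the table name t is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, name text, quantity integer)")
conn.executemany("insert into t values (?, ?, ?)", [
    (1, "Jerry", 1), (1, "Jerry", 2), (1, "Nana", 1),
    (2, "Max", 4), (2, "Lenny", 3),
])

# Sum per (id, name), number the sums within each id, keep number 1.
rows = conn.execute("""
    select id, name, quantity
    from (
        select s.*,
               row_number() over (partition by id order by quantity desc) as rn
        from (
            select id, name, sum(quantity) as quantity
            from t
            group by id, name
        ) s
    ) r
    where rn = 1
    order by id
""").fetchall()

# Same result via a correlated max() over the per-name sums (keeps ties).
rows_max = conn.execute("""
    with cte as (
        select id, name, sum(quantity) as q
        from t
        group by id, name
    )
    select id, name, q
    from cte t1
    where t1.q = (select max(q) from cte t2 where t2.id = t1.id)
    order by id
""").fetchall()

print(rows)      # [(1, 'Jerry', 3), (2, 'Max', 4)]
print(rows_max)  # [(1, 'Jerry', 3), (2, 'Max', 4)]
```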

Distinct particular field in select query

I have a table with the sample values below.
|Id|Keyword|insertedon|
|:-|:------|:---------|
|1 | abcd | 13/12/20 |
|2 | cdef | 14/12/20 |
|3 | abcd | 14/12/20 |
|4 | defg | 14/12/20 |
In the table above, I need the distinct keyword values ordered by insertedon, descending.
I need the 5 most recent results.
Expected Result:
defg
abcd
cdef
Please let me know how to achieve this.
You get the top 5 results with TOP(5) in SQL Server. You'd order the keywords by their last insertedon date:
select top(5) keyword
from mytable
group by keyword
order by max(insertedon) desc;
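A sketch of the group-by-max approach run in SQLite via Python's sqlite3 module, where LIMIT 5 plays the role of TOP(5). Assumptions: the dd/mm/yy dates are read as 2020 and stored as ISO-8601 text (strings like 13/12/20 would not sort chronologically), and max(id) is added as a hypothetical tie-breaker because all three keywords share the same latest date. With that tie-breaker, the output matches the expected result.

```python
import sqlite3

# Sample rows with insertedon stored as ISO-8601 text (dd/mm/yy read as 2020, an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, keyword text, insertedon text)")
conn.executemany("insert into mytable values (?, ?, ?)", [
    (1, "abcd", "2020-12-13"),
    (2, "cdef", "2020-12-14"),
    (3, "abcd", "2020-12-14"),
    (4, "defg", "2020-12-14"),
])

# One row per keyword, ordered by its latest insert; max(id) is a hypothetical tie-breaker.
rows = conn.execute("""
    select keyword
    from mytable
    group by keyword
    order by max(insertedon) desc, max(id) desc
    limit 5
""").fetchall()
keywords = [k for (k,) in rows]
print(keywords)  # ['defg', 'abcd', 'cdef']
```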
If you are looking for the latest entries based on the insertedon column, you can use a group by clause, something like this:
select keyword, max(insertedon)
from table
group by keyword
order by 2 desc
You can just use select distinct:
select distinct keyword
from t;
If you wanted a full row, you could use row_number():
select t.*
from (select t.*,
row_number() over (partition by keyword order by newid()) as seqnum
from t
) t
where seqnum = 1;
EDIT:
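For the edited version, you can use the following. Note that it removes duplicates only after taking the 5 most recent rows, so it can return fewer than 5 keywords: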
select distinct keyword
from (select top (5) keyword
from t
order by insertedon desc
) k
Give each row a number based on the descending order of the date column within each keyword, and then select the rows with row number 1.
Query
;with cte as(
select [keyword],
[rn] = row_number() over(
partition by [keyword]
order by [insertedon] desc, [id] desc
)
from [t] -- replace t with your table name
)
select [keyword] from cte
where [rn] = 1;
You can use the analytical functions as follows:
select t.* from
(select t.*,
row_number() over (partition by keyword order by insertedon desc) as rn,
Dense_rank() over (order by insertedon desc) as dr
from t ) t where rn = 1 and dr <= 5;

Return the highest SUM value of all donors by designations

I have the following script:
SELECT DISTINCT GIFT_ID, GIFT_DESG, SUM(GIFT_AMT)
FROM GIFT_TABLE
GROUP BY GIFT_ID, GIFT_DESG
It will return something like this:
GIFT_ID GIFT_DESG SUM(GIFT_AMT)
1 A 25
1 B 500
1 C 75
2 A 100
2 B 200
2 C 300
...
My desired outcome is:
GIFT_ID GIFT_DESG SUM(GIFT_AMT)
1 B 500
2 C 300
How would I do that?
Possibly row_number(), right? I think it's the summing of gift amounts by designation that is throwing me off.
Thank you.
If your DBMS supports the ROW_NUMBER window function, you can number the rows within each GIFT_ID ordered by SUM(GIFT_AMT) descending, then keep the row where rn = 1.
SELECT t1.GIFT_ID,t1.GIFT_DESG,t1.GIFT_AMT
FROM (
SELECT t1.*,ROW_NUMBER() OVER(PARTITION BY GIFT_ID ORDER BY GIFT_AMT DESC) rn
FROM (
SELECT GIFT_ID, GIFT_DESG, SUM(GIFT_AMT) GIFT_AMT
FROM GIFT_TABLE
GROUP BY GIFT_ID, GIFT_DESG
) t1
) t1
where rn =1
Note
Because you already use GROUP BY, the DISTINCT keyword is redundant; you can remove it from your query.
Here is a sample
CREATE TABLE T(
GIFT_ID int,
GIFT_DESG varchar(5),
GIFT_AMT int
);
insert into t values (1,'A' ,25);
insert into t values (1,'B' ,500);
insert into t values (1,'C' ,75);
insert into t values (2,'A' ,100);
insert into t values (2,'B' ,200);
insert into t values (2,'C' ,300);
Query 1:
SELECT t1.GIFT_ID,t1.GIFT_DESG,t1.GIFT_AMT
FROM (
SELECT t1.*,ROW_NUMBER() OVER(PARTITION BY GIFT_ID ORDER BY GIFT_AMT DESC) rn
FROM T t1
) t1
where rn =1
Results:
| GIFT_ID | GIFT_DESG | GIFT_AMT |
|---------|-----------|----------|
| 1 | B | 500 |
| 2 | C | 300 |
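The sample above has a single row per designation, so Query 1 can rank GIFT_AMT directly. When a designation has several gifts, the SUM has to happen before the ranking. A sketch in SQLite via Python's sqlite3 module (SQLite 3.25 or later assumed; the rows are hypothetical, chosen to add up to the question's totals) contrasts ranking per-designation sums with ranking raw rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table gift_table (gift_id integer, gift_desg text, gift_amt integer)")
# Hypothetical gifts: several per designation, adding up to the question's totals.
conn.executemany("insert into gift_table values (?, ?, ?)", [
    (1, "A", 25),
    (1, "B", 300), (1, "B", 200),  # B totals 500
    (1, "C", 50), (1, "C", 25),    # C totals 75
    (2, "A", 100),
    (2, "B", 200),
    (2, "C", 150), (2, "C", 150),  # C totals 300
])

# Sum per designation first, then keep the top-ranked sum per gift_id.
ranked_sums = conn.execute("""
    select gift_id, gift_desg, gift_amt
    from (
        select s.*,
               row_number() over (partition by gift_id order by gift_amt desc) as rn
        from (
            select gift_id, gift_desg, sum(gift_amt) as gift_amt
            from gift_table
            group by gift_id, gift_desg
        ) s
    ) r
    where rn = 1
    order by gift_id
""").fetchall()

# Ranking the raw rows instead picks the single largest gift, not the largest total.
ranked_raw = conn.execute("""
    select gift_id, gift_desg, gift_amt
    from (
        select g.*,
               row_number() over (partition by gift_id order by gift_amt desc) as rn
        from gift_table g
    ) r
    where rn = 1
    order by gift_id
""").fetchall()

print(ranked_sums)  # [(1, 'B', 500), (2, 'C', 300)]
print(ranked_raw)   # [(1, 'B', 300), (2, 'B', 200)]
```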
You can do this with no subquery:
SELECT TOP (1) WITH TIES GIFT_ID, GIFT_DESG, SUM(GIFT_AMT)
FROM GIFT_TABLE
GROUP BY GIFT_ID, GIFT_DESG
ORDER BY ROW_NUMBER() OVER (PARTITION BY GIFT_ID ORDER BY SUM(GIFT_AMT) DESC);
You can also do it like this (Oracle syntax):
WITH t as (
SELECT GIFT_ID, GIFT_DESG, SUM(GIFT_AMT) AS GIFT_AMT
FROM GIFT_TABLE
GROUP BY GIFT_ID, GIFT_DESG)
SELECT GIFT_ID,
max(GIFT_DESG) KEEP (DENSE_RANK LAST ORDER BY GIFT_AMT) GIFT_DESG,
max(GIFT_AMT) GIFT_AMT
FROM T
GROUP BY GIFT_ID;

Min Date from one column multiple rows

My apologies, I should have included every column and the complete problem, not just a portion.
I have a table A that stores all invoices issued (type 1) and payments received (type 4) from clients. Sometimes a client pays in 2-3 installments. I want to find the date difference between when an invoice was issued and when the last payment for that invoice was collected. My data looks like this:
cltid | Invnum | Cash | Date       | type | status
------|--------|------|------------|------|-------
70    | 112    | -200 | 2012-03-01 | 4    | P
70    | 112    | -500 | 2012-03-12 | 4    | P
90    | 124    | -550 | 2012-01-20 | 4    | P
70    | 112    | 700  | 2012-02-20 | 1    | p
55    | 101    | 50   | 2012-01-15 | 1    | d
90    | 124    | 550  | 2012-01-15 | 1    | P
I am running
Select *, Datediff(dd,T.date,P.date)
from (select a.cltid, a.invnumber,a.cash, min(a.date)date
from table.A as A
where a.status<>'d' and a.type=1
group by a.cltid, a.invnumber,a.cash)T
join
Select *
from (select a.cltid, a.invnumber,a.cash, min(a.date)date
from table.A as A
where a.status<>'d' and a.type=4
group by a.cltid, a.invnumber,a.cash)P
on
T.invnumb=P.invnumber and T.cltid=P.cltid
How can I make it work so that it shows:
70|112|-500|2012-03-12|4|P 70|112|700|2012-02-20|1|p|21
90|124|-550|2012-01-20|4|P 90|124|550|2012-01-15|1|P|5
Edited***
You can use row_number to assign a sequence number within each cltid in order of decreasing date, and then filter to get the first row for each cltid, which is the row with the latest date for that cltid:
select *
from (
select A.*,
row_number() over (
partition by a.cltid order by a.date desc
) rn
from table.A as A
) t
where rn = 1;
It will return one row (with latest date) for each client. If you want to return all the rows which have latest date, use rank() instead.
Use a ranking function to get all the columns:
select a.*
from (select a.*,
row_number() over (partition by cltid order by date desc) as seqnum
from a
) a
where seqnum = 1;
Use aggregation if you only want the date. The issue with your query is that the group by clause has too many columns:
select a.cltid, max(a.date) as date
from table.A as A
group by a.cltid;
Also, min() returns the earliest date, not the latest.
There are many ways to do this. Here are some of them:
test setup: http://rextester.com/VGUY60367
common table expression (cte) version using row_number():
with cte as (
select *
, rn = row_number() over (
partition by cltid, Invnum
order by [date] desc
)
from a
)
select cltid, Invnum, Cash, [date]
from cte
where rn = 1
cross apply version:
select distinct
a.cltid
, a.Invnum
, x.Cash
, x.[date]
from a
cross apply (
select top 1
cltid, Invnum
, [date]
, Cash
from a as i
where i.cltid =a.cltid
and i.Invnum=a.Invnum
order by i.[date] desc
) as x;
top with ties version:
select top 1 with ties
*
from a
order by
row_number() over (
partition by cltid, Invnum
order by [date] desc
)
all return:
+-------+--------+---------------------+------+
| cltid | Invnum | date | Cash |
+-------+--------+---------------------+------+
| 70 | 112 | 12.03.2012 00:00:00 | -500 |
| 90 | 124 | 20.01.2012 00:00:00 | -550 |
+-------+--------+---------------------+------+
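You can get the latest date per invoice with this. Group only by the columns that identify the invoice; also grouping by cash and date puts every row in its own group, so max() would just return each row's own date:
Select
a.cltid, a.invnumber, max(a.date) [date]
from
YourTable a
group by
a.cltid, a.invnumber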
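For the day count the question asks about, one sketch is to take the invoice date (type 1) and the last payment date (type 4) per cltid and Invnum in two grouped subqueries, then join them. It runs in SQLite via Python's sqlite3 module, so julianday() arithmetic stands in for SQL Server's DATEDIFF(dd, ...); the table and column names follow the sample data, and status 'd' rows are excluded as in the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table a (cltid integer, invnum integer, cash integer, "date" text, type integer, status text)')
conn.executemany("insert into a values (?, ?, ?, ?, ?, ?)", [
    (70, 112, -200, "2012-03-01", 4, "P"),
    (70, 112, -500, "2012-03-12", 4, "P"),
    (90, 124, -550, "2012-01-20", 4, "P"),
    (70, 112, 700, "2012-02-20", 1, "p"),
    (55, 101, 50, "2012-01-15", 1, "d"),
    (90, 124, 550, "2012-01-15", 1, "P"),
])

rows = conn.execute("""
    with inv as (   -- invoice date per invoice (type 1), skipping status d
        select cltid, invnum, min("date") as inv_date
        from a
        where type = 1 and status <> 'd'
        group by cltid, invnum
    ),
    pay as (        -- last payment date per invoice (type 4)
        select cltid, invnum, max("date") as last_pay
        from a
        where type = 4 and status <> 'd'
        group by cltid, invnum
    )
    select inv.cltid, inv.invnum,
           cast(julianday(pay.last_pay) - julianday(inv.inv_date) as integer) as days
    from inv
    join pay on pay.cltid = inv.cltid and pay.invnum = inv.invnum
    order by inv.cltid
""").fetchall()
print(rows)  # [(70, 112, 21), (90, 124, 5)]
```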

Comparing row values in oracle

I have Table1 with three columns:
Key | Date | Price
----------------------
1 | 26-May | 2
1 | 25-May | 2
1 | 24-May | 2
1 | 23-May | 3
1 | 22-May | 4
2 | 26-May | 2
2 | 25-May | 2
2 | 24-May | 2
2 | 23-May | 3
2 | 22-May | 4
I want to select the row where the price last changed to 2 (24-May). I tried ranking the dates with the RANK function.
I am not able to get the desired results. Any help will be appreciated.
SELECT *
FROM (SELECT key, DATE, price,
RANK() over (partition BY key order by DATE DESC) AS r2
FROM Table1 ORDER BY DATE DESC) temp;
Another way of looking at the problem is that you want to find the most recent record with a price different from the last price. Then you want the next record.
with lastprice as (
select t.*
from (select t.*
from table1 t
order by date desc
) t
where rownum = 1
)
select t.*
from (select t.*
from table1 t
where date > (select max(date)
from table1 t2
where t2.price <> (select price from lastprice)
)
order by date asc
) t
where rownum = 1;
This query looks complicated, but it is structured so that it can take advantage of an index on table1(date). The subqueries are necessary in Oracle versions before 12c; in 12c and later, you can use fetch first 1 row only instead.
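A sketch of this first approach in SQLite via Python's sqlite3 module, for a single key. LIMIT 1 replaces Oracle's rownum = 1, and the year 2020 is a hypothetical choice because the question gives only day and month:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table table1 ("date" text, price integer)')
conn.executemany("insert into table1 values (?, ?)", [
    ("2020-05-26", 2), ("2020-05-25", 2), ("2020-05-24", 2),
    ("2020-05-23", 3), ("2020-05-22", 4),
])

row = conn.execute("""
    with lastprice as (          -- the most recent price
        select price from table1 order by "date" desc limit 1
    )
    select "date", price
    from table1
    where "date" > (select max("date")       -- last date with a different price
                    from table1
                    where price <> (select price from lastprice))
    order by "date" asc
    limit 1                                  -- first row after that change
""").fetchone()
print(row)  # ('2020-05-24', 2)
```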
EDIT:
Another solution is to use lag() and find the most recent time when the value changed:
select t1.*
from (select t1.*
from (select t1.*,
lag(price) over (order by date) as prev_price
from table1 t1
) t1
where prev_price is null or prev_price <> price
order by date desc
) t1
where rownum = 1;
Under many circumstances, I would expect the first version to perform better, because the only heavy work is done in the innermost subquery that computes max(date). This version has to calculate lag() as well as do the order by. However, if performance is an issue, you should test on your data in your environment.
EDIT II:
My best guess is that you want this per key. Your original question says nothing about key, but:
select t1.*
from (select t1.*,
row_number() over (partition by key order by date desc) as seqnum
from (select t1.*,
lag(price) over (partition by key order by date) as prev_price
from table1 t1
) t1
where prev_price is null or prev_price <> price
order by date desc
) t1
where seqnum = 1;
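The per-key lag() version can be checked the same way. This sketch runs it in SQLite via Python's sqlite3 module (SQLite 3.25 or later assumed for window functions) against the question's rows for keys 1 and 2, with the year 2020 assumed; "key" and "date" are quoted because they collide with SQL keywords and functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table table1 ("key" integer, "date" text, price integer)')
rows_in = []
for k in (1, 2):
    for day, price in ((26, 2), (25, 2), (24, 2), (23, 3), (22, 4)):
        rows_in.append((k, f"2020-05-{day}", price))
conn.executemany("insert into table1 values (?, ?, ?)", rows_in)

result = conn.execute("""
    select "key", "date", price
    from (
        -- newest change first within each key
        select c.*,
               row_number() over (partition by "key" order by "date" desc) as seqnum
        from (
            -- price on the previous (older) date for the same key
            select t.*,
                   lag(price) over (partition by "key" order by "date") as prev_price
            from table1 t
        ) c
        where prev_price is null or prev_price <> price
    ) r
    where seqnum = 1
    order by "key"
""").fetchall()
print(result)  # [(1, '2020-05-24', 2), (2, '2020-05-24', 2)]
```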
You can try this:
SELECT Date FROM Table1
WHERE Price = 2
AND PrimaryKey = (SELECT MAX(PrimaryKey) FROM Table1
WHERE Price = 2)
This is very similar to the second option by Gordon Linoff but introduces a second windowed function row_number() to locate the most recent row that changed the price. This will work for all or a range of keys.
select
r.*
from (
select
c.*
, row_number() over(partition by Key order by "Date" DESC) rn
from (
select
t.*
, NVL(lag(Price) over(partition by Key order by "Date"), 0) prevPrice -- price on the previous (older) date
from table1 t
where Key IN (1,2,3,4,5) -- as an example
) c
where Price <> prevPrice
) r
where rn = 1
Apologies, but I haven't been able to test this at all.