SQL get rank of ordered data - sql

I have a data set that looks like this (ordered by date):
date value first_id second_id
2020-01-01 10 1 1
2020-01-02 15 1 1
2020-01-03 5 1 2
2020-01-04 75 2 2
2020-01-05 101 2 2
2020-01-06 12 1 1
2020-01-07 5 1 1
2020-01-08 14 1 2
I need to aggregate over each run of consecutive rows (in date order) that share the same first_id and second_id. Taking max(value) as the aggregate, let's say, the result should be:
max_value first_id second_id
15 1 1
5 1 2
101 2 2
12 1 1
14 1 2
A plain max(value) with group by first_id, second_id returns just one row per combination, regardless of the date ordering.
I was thinking of adding a rank that increases whenever one of the ids changes, e.g.:
date value first_id second_id rank
2020-01-01 10 1 1 1
2020-01-02 15 1 1 1
2020-01-03 5 1 2 2
2020-01-04 75 2 2 3
2020-01-05 101 2 2 3
2020-01-06 12 1 1 4
2020-01-07 5 1 1 4
2020-01-08 14 1 2 5
But I don't know how to compute that rank either, since rows with the same id combination are ranked together.

You can use lag() and a cumulative sum to define the groups and then aggregate. You can see the groups if you run this query:
select t.*,
       sum(case when prev_date = prev_date2 then 0 else 1 end) over (order by date) as grp
from (select t.*,
             lag(date) over (order by date) as prev_date,
             lag(date) over (partition by first_id, second_id order by date) as prev_date2
      from t
     ) t;
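The logic is that a new group starts whenever the previous row (by date) does not have the same values in the two id columns: prev_date equals prev_date2 only when the previous row overall is also the previous row for the same first_id and second_id.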
Then the aggregation is:
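with grps as (
      select t.*,
             sum(case when prev_date = prev_date2 then 0 else 1 end) over (order by date) as grp
      from (select t.*,
                   lag(date) over (order by date) as prev_date,
                   lag(date) over (partition by first_id, second_id order by date) as prev_date2
            from t
           ) t
     )
-- first_id and second_id are constant within a grp; listing them in
-- group by keeps the query valid in dialects that reject ungrouped columns
select first_id, second_id, max(value), min(date), max(date)
from grps
group by grp, first_id, second_id
order by min(date);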
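A different way to define the same groups, not the one used above, is the difference between two row_number() sequences. The sketch below is self-contained: the inline CTE t simply reproduces the sample rows from the question (dates kept as ISO strings, which sort chronologically), and the names numbered, rn_all and rn_pair are made up for illustration. It assumes a dialect with window functions and CTEs.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
with t as (
    select '2020-01-01' as date, 10 as value, 1 as first_id, 1 as second_id
    union all select '2020-01-02', 15, 1, 1
    union all select '2020-01-03', 5, 1, 2
    union all select '2020-01-04', 75, 2, 2
    union all select '2020-01-05', 101, 2, 2
    union all select '2020-01-06', 12, 1, 1
    union all select '2020-01-07', 5, 1, 1
    union all select '2020-01-08', 14, 1, 2
),
numbered as (
    select t.*,
           -- position among all rows
           row_number() over (order by date) as rn_all,
           -- position among rows with the same id pair
           row_number() over (partition by first_id, second_id order by date) as rn_pair
    from t
)
-- Within one uninterrupted run of an id pair both counters advance together,
-- so rn_all - rn_pair stays constant; it changes once other rows intervene.
select max(value) as max_value, first_id, second_id
from numbered
group by first_id, second_id, rn_all - rn_pair
order by min(date);
-- max_value first_id second_id
-- 15 1 1
-- 5 1 2
-- 101 2 2
-- 12 1 1
-- 14 1 2
```

The difference rn_all - rn_pair is only meaningful together with the id pair, which is why all three appear in the group by.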

Related

Finding Latest First x among consecutive x from table

I am trying to write a query to find the first date of the latest run of consecutive 1's in each group, as shown below. For example, for Group 1 it shouldn't be 1/2/2022, since a later run starts at 1/6/2022. It shouldn't be 1/7/2022 either.
Please let me know if you have any idea.
Thanks!
Table x (AsOfDate, Group_Id, Value)
AsOfDate Group_Id Value
1/1/2022 1 0
1/1/2022 2 1
1/2/2022 1 1
1/2/2022 2 1
1/3/2022 1 0
1/3/2022 2 0
1/4/2022 1 0
1/4/2022 2 0
1/5/2022 1 0
1/5/2022 2 1
1/6/2022 1 1
1/6/2022 2 0
1/7/2022 1 1
1/7/2022 2 0
Output
AsOfDate Group_Id
1/6/2022 1
1/5/2022 2
What you want is the earliest date in the last group of consecutive rows with Value = 1:
Use the LAG() window function to find the groups of consecutive equal Value.
Use dense_rank() ordered by grp descending to find the latest group (r = 1).
Use min() to get the "first" AsOfDate.
select AsOfDate = min(AsOfDate),
       Group_Id
from
(
    select *, r = dense_rank() over (partition by Group_Id, Value
                                     order by grp desc)
    from
    (
        select *, grp = sum(g) over (partition by Group_Id order by AsOfDate)
        from
        (
            select *, g = case when Value <> lag(Value) over (partition by Group_Id
                                                              order by AsOfDate)
                               then 1
                               else 0
                          end
            from x
        ) x
    ) x
) x
where Value = 1
and   r = 1
group by Group_Id

How to add the first value of a column in SQL over another grouping variable

I have a database with a table of the following kind:
customer_id customer_type customer_state state_date
1 A 0 2020-01-01
1 A 1 2020-01-05
1 B 2 2020-01-06
2 X 0 2019-02-07
2 Y 0 2019-02-07
2 X 0 2019-02-07
The columns customer_state and state_date represent the evolution over time of the customer's state, while customer_id is a unique identifier for the customer.
I am interested in obtaining a table (using an SQL query) with an additional column first_type that gives, for each customer, the first type it had, as in this example:
customer_id customer_type customer_state state_date first_type
1 A 0 2020-01-01 A
1 A 1 2020-01-05 A
1 B 2 2020-01-06 A
2 X 0 2019-02-07 X
2 Y 0 2019-02-07 X
2 X 0 2019-02-07 X
Is it possible to do it in SQL? I've tried with a self-join, but it's complicated for me to understand how to pick the first row, or generally the n-th row, over each customer.
Specifically, I'm using Teradata SQL, if some specific functions can be used for this task.
One possibility, using ROW_NUMBER:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY state_date) rn
    FROM yourTable
)
SELECT customer_id, customer_type, customer_state, state_date,
       MAX(CASE WHEN rn = 1 THEN customer_type END) OVER
           (PARTITION BY customer_id) first_type
FROM cte;
There's a function in Teradata/Standard SQL:
SELECT
    t.*
   ,FIRST_VALUE(customer_type)
    OVER (PARTITION BY customer_id
          ORDER BY state_date) AS first_type
FROM mytable AS t

How can I get the minimum date based on a condition in Redshift?

Let's say you have the following dataset:
id date_col boolean_col
1 2020-01-01 0
1 2020-01-05 1
1 2020-02-01 0
1 2020-03-01 1
2 2020-01-01 0
2 2020-05-01 0
3 2020-01-01 0
3 2020-03-05 1
My final output should be grouped, one row per id. The grouping rule is: if the boolean column is true for any row of the id, I want the minimum date among those rows (or the maximum; I would like to test both, if possible). If the boolean column is false for every row of the id, I want the highest date. The desired output would be something like this:
id date_col boolean_col
1 2020-01-05 1
2 2020-05-01 0
3 2020-03-05 1
Any ideas on how to get this? I'm really struggling to find a way
One method is row_number():
select t.*
from (select t.*,
             row_number() over (partition by id order by boolean_col desc, date_col desc) as seqnum
      from t
     ) t
where seqnum = 1;
There are two other fun methods. One is aggregation with some cleverness:
select id,
       coalesce(max(case when boolean_col = 1 then date_col end),
                max(date_col)
               ) as date_col,
       max(boolean_col) as boolean_col
from t
group by id;
The other treats this as a prioritization and uses union all:
select id, max(date_col), max(boolean_col)
from t
where boolean_col = 1
group by id
union all
select id, max(date_col), max(boolean_col)
from t
group by id
having max(boolean_col) = 0;
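The sample output in the question shows the earliest true date for id 1 (2020-01-05), while the queries above pick the latest one. A minimal sketch of the minimum variant follows, based on the aggregation approach. It assumes the column names from the question (date_col, boolean_col) and uses an inline CTE called sample_data, a made-up name standing in for the real table. The output aliases chosen_date and any_true are also illustrative.

```sql
-- sample_data is a hypothetical stand-in for the real table;
-- dates are ISO strings here, which sort chronologically.
with sample_data as (
    select 1 as id, '2020-01-01' as date_col, 0 as boolean_col
    union all select 1, '2020-01-05', 1
    union all select 1, '2020-02-01', 0
    union all select 1, '2020-03-01', 1
    union all select 2, '2020-01-01', 0
    union all select 2, '2020-05-01', 0
    union all select 3, '2020-01-01', 0
    union all select 3, '2020-03-05', 1
)
select id,
       -- earliest date among true rows; when there are none the inner
       -- min() is null and coalesce falls back to the latest date overall
       coalesce(min(case when boolean_col = 1 then date_col end),
                max(date_col)) as chosen_date,
       max(boolean_col) as any_true
from sample_data
group by id
order by id;
-- id chosen_date any_true
-- 1 2020-01-05 1
-- 2 2020-05-01 0
-- 3 2020-03-05 1
```

Swapping the inner min() for max() gives the latest true date instead, which makes it easy to test both behaviours.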

Is there a way to get first row of a group in postgres based on Max(date)

Input :
id name value1 value2 date
1 A 1 1 2019-01-01
1 A 2 2 2019-02-15
1 A 3 3 2019-01-15
1 A 1 1 2019-07-13
2 B 1 2 2019-01-01
2 B 1 3 2019-02-15
2 B 2 1 2019-07-13
3 C 2 4 2019-02-15
3 C 1 2 2019-01-01
3 C 1 9 2019-07-13
3 C 3 1 2019-02-15
Expected Output :
id name value1 value2 date
1 A 1 Avg(value2) 2019-07-13
2 B 2 Avg(value2) 2019-07-13
3 C 1 Avg(value2) 2019-07-13
You can use window functions. rank() over() can be used to identify the first record in each group, and avg() over() will give you a window average of value2 in each group:
select id, name, value1, avg_value2 value2, date
from (
    select
        t.*,
        avg(value2) over(partition by id, name) avg_value2,
        rank() over(partition by id, name order by date desc) rn
    from mytable t
) t
where rn = 1
Sort your data in the right order, use the window function row_number() as an identifier, and select the first entry of every partition.
with temp_data as
(
    select
        row_number() over (partition by debug.tbl_data.id order by debug.tbl_data.date desc) as index,
        *,
        avg(debug.tbl_data.value2) over (partition by debug.tbl_data.id) as data_avg
    from debug.tbl_data
    order by id asc, debug.tbl_data.date desc
)
select
    *
from temp_data
where index = 1
You seem to want the most common value of value1. In statistics, this is called the "mode". You can do this as:
select id, name,
       mode() within group (order by value1) as value1_mode,
       avg(value2),
       max(date)
from t
group by id, name;

Select and aggregate last records based on order

I have different versions of the charges in a table. I want to grab the last charge of each Type and sum them.
So I want to add 9.87, 9.63 and 1.65.
As the result of this query I want the Parent ID and sum(9.87 + 9.63 + 1.65).
We use MSSQL.
ID ORDER CHARGES TYPE PARENT ID
1 1 6.45 1 1
2 2 1.25 1 1
3 3 9.87 1 1
4 1 6.54 2 1
5 2 5.64 2 1
6 3 0.84 2 1
7 4 9.63 2 1
8 1 7.33 3 1
9 2 5.65 3 1
10 3 8.65 3 1
11 4 5.14 3 1
12 5 1.65 3 1
WITH recordsList
AS
(
    SELECT Type, Charges,
           ROW_NUMBER() OVER (PARTITION BY Type
                              ORDER BY [ORDER] DESC) rn
    FROM tableName
)
SELECT SUM(Charges) totalCharge
FROM recordsList
WHERE rn = 1
Use row_number() to identify the rows to be summed, and then sum them:
select SUM(charges)
from (select t.*,
             ROW_NUMBER() over (PARTITION by type order by id desc) as seqnum
      from t
     ) t
where seqnum = 1
Alternatively you could use a window aggregate MAX():
SELECT SUM(Charges)
FROM (
    SELECT
        [ORDER],
        Charges,
        MaxOrder = MAX([ORDER]) OVER (PARTITION BY [TYPE])
    FROM atable
) s
WHERE [ORDER] = MaxOrder
;
SELECT t.PARENT_ID, SUM(t.CHARGES)
FROM dbo.test73 t
WHERE EXISTS (
    SELECT 1
    FROM dbo.test73
    WHERE [TYPE] = t.[TYPE]
    HAVING MAX([ORDER]) = t.[ORDER]
)
GROUP BY t.PARENT_ID
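If the table holds more than one parent, the latest version probably needs to be picked per parent and type rather than per type alone. That is an assumption, since the sample only contains Parent ID 1. A hedged T-SQL sketch of the row_number() approach with that change, using the sample rows inlined under the made-up name charge_rows and a parent_id column name chosen for illustration:

```sql
-- charge_rows is a hypothetical stand-in for the real table.
with charge_rows as (
    select *
    from (values
        (1, 1, 6.45, 1, 1),
        (2, 2, 1.25, 1, 1),
        (3, 3, 9.87, 1, 1),
        (4, 1, 6.54, 2, 1),
        (5, 2, 5.64, 2, 1),
        (6, 3, 0.84, 2, 1),
        (7, 4, 9.63, 2, 1),
        (8, 1, 7.33, 3, 1),
        (9, 2, 5.65, 3, 1),
        (10, 3, 8.65, 3, 1),
        (11, 4, 5.14, 3, 1),
        (12, 5, 1.65, 3, 1)
    ) as v (id, [order], charges, [type], parent_id)
),
ranked as (
    select parent_id, charges,
           -- rn = 1 marks the highest [order] within each parent and type
           row_number() over (partition by parent_id, [type]
                              order by [order] desc) as rn
    from charge_rows
)
select parent_id, sum(charges) as total_charges
from ranked
where rn = 1
group by parent_id;
-- parent_id total_charges
-- 1 21.15
```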