Get the record with the most recent date in each month (SQL)

ID   Date        Value
------------------------
715  2022-03-20  183.74
715  2022-03-13  86.45
715  2022-03-06  -10.84
715  2022-02-27  291.87
715  2022-02-20  194.58
715  2022-02-13  97.29
715  2022-02-06  0.00
Is there a way to get the ID and value for the last date provided in each month?
Output:
ID   Date        Value
------------------------
715  2022-03-20  183.74
715  2022-02-27  291.87
I have already tried:
SELECT T.*
FROM (
SELECT ID, value, date,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date ASC) AS rn
FROM table
) t
But it doesn't quite do it.
Thanks

Adjust your current query to order the ROW_NUMBER() by date descending and add the month of the date to the partition. Then add a WHERE clause to keep only rn = 1:
SELECT T.*
FROM (
SELECT ID, value, date,
ROW_NUMBER() OVER (PARTITION BY ID, month(Date) ORDER BY Date desc) AS rn
FROM table
) t
where t.rn = 1
If the data spans more than one year, include the year in the partition as well; otherwise the same month from different years ends up in one partition:
SELECT T.*
FROM (
SELECT ID, value, date,
ROW_NUMBER() OVER (PARTITION BY ID, month(Date), year(Date) ORDER BY Date desc) AS rn
FROM table
) t
where t.rn = 1
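A minimal sketch of the year-and-month partitioning, run against an in-memory SQLite database through Python's standard sqlite3 module so the result can be checked (window functions need SQLite 3.25 or later). The table name readings and the column name dt are hypothetical, and strftime('%Y-%m', dt) stands in for month(Date) plus year(Date), because SQLite has no MONTH() or YEAR() function.

```python
import sqlite3

# Sample rows from the question: (id, date, value)
ROWS = [
    (715, "2022-03-20", 183.74), (715, "2022-03-13", 86.45),
    (715, "2022-03-06", -10.84), (715, "2022-02-27", 291.87),
    (715, "2022-02-20", 194.58), (715, "2022-02-13", 97.29),
    (715, "2022-02-06", 0.00),
]

QUERY = """
SELECT id, dt, value
FROM (
    SELECT id, dt, value,
           ROW_NUMBER() OVER (
               -- 'YYYY-MM' keeps the same month of different years apart
               PARTITION BY id, strftime('%Y-%m', dt)
               ORDER BY dt DESC  -- newest date in each month gets rn = 1
           ) AS rn
    FROM readings
)
WHERE rn = 1
ORDER BY id, dt DESC
"""

def latest_per_month(rows):
    """Return the latest row for each id and calendar month (hypothetical table `readings`)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (id INTEGER, dt TEXT, value REAL)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(latest_per_month(ROWS))
# [(715, '2022-03-20', 183.74), (715, '2022-02-27', 291.87)]
```

Partitioning on the 'YYYY-MM' string is equivalent to partitioning on both month and year, which is why a single expression covers both variants of the answer.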

Related

How to find the time and steps between status changes

I'm trying to query a dataset of user status changes, and I want to find out how long it takes for the status to change and how many steps (rows) lie in between.
Example data:
user_id  Status  date
---------------------------
1        a       2001-01-01
1        a       2001-01-08
1        b       2001-01-15
1        b       2001-01-28
1        a       2001-01-31
1        b       2001-02-01
2        a       2001-01-08
2        a       2001-01-18
2        a       2001-01-28
3        b       2001-03-08
3        b       2001-03-18
3        b       2001-03-19
3        a       2001-03-20
Desired output:
user_id  From  To  Days in between  Steps in between
----------------------------------------------------
1        a     b   14               2
1        b     a   16               2
1        a     b   1                1
3        b     a   12               3
You might consider the following alternative approach:
WITH partitions AS (
SELECT *, COUNTIF(flag) OVER w AS part FROM (
SELECT *, ROW_NUMBER() OVER w AS rn, status <> LAG(status) OVER w AS flag,
FROM sample_data
WINDOW w AS (PARTITION BY user_id ORDER BY date)
) WINDOW w AS (PARTITION BY user_id ORDER BY date)
)
SELECT user_id,
LAG(ANY_VALUE(status)) OVER w AS `from`,
ANY_VALUE(status) AS `to`,
EXTRACT(DAY FROM MIN(date) - LAG(MIN(date)) OVER w) AS days_in_between,
MIN(rn) - LAG(MIN(rn)) OVER w AS steps_in_between
FROM partitions
GROUP BY user_id, part
QUALIFY `from` IS NOT NULL
WINDOW w AS (PARTITION BY user_id ORDER BY MIN(date));
with main as (
select
*,
dense_rank() over(partition by user_id order by date) as rank_,
row_number() over(partition by user_id, status order by date) as rank_2,
-- constant within each run of consecutive identical statuses (assuming no duplicate dates per user)
row_number() over(partition by user_id, status order by date) - dense_rank() over(partition by user_id order by date) as diff,
row_number() over(partition by user_id order by date) as row_num,
lag(status) over(partition by user_id order by date) as prev_status,
concat(lag(status) over(partition by user_id order by date), ' to ', status) as status_change
from sample_data -- replace with your table name
),
new_rank as (
select
*,
-- (status, diff) identifies a run, so min_date is the date the current run started
min(date) over(partition by user_id, status, diff) as min_date
from main
),
prev_date as (
select
*,
lag(min_date) over(partition by user_id order by date) as prev_min_date
from new_rank
)
select
user_id,
prev_status as `from`,
status as `to`,
date_diff(min_date, prev_min_date, DAY) as days_in_between
from prev_date
where status != prev_status and prev_status is not null
Does this work for you? It was hard to test without a fiddle. Also:
you can remove the extra steps/ranks I added; I left them in so you can see what each one does
I didn't understand your steps logic, so it is missing from the code
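The run detection used in both answers can be sketched end to end, including the steps column, with the classic gaps-and-islands trick: the difference between a per-user row number and a per-user-and-status row number stays constant within each run of identical statuses. This is a hedged SQLite sketch (3.25 or later, run from Python's sqlite3 module), not either author's exact query. The table name status_log and the column dt are hypothetical, julianday() replaces DATE_DIFF, and steps are counted as the number of rows from the first row of one run to the first row of the next, which is the reading that matches the desired output.

```python
import sqlite3

# Sample rows from the question: (user_id, status, date)
STATUS_LOG = [
    (1, "a", "2001-01-01"), (1, "a", "2001-01-08"), (1, "b", "2001-01-15"),
    (1, "b", "2001-01-28"), (1, "a", "2001-01-31"), (1, "b", "2001-02-01"),
    (2, "a", "2001-01-08"), (2, "a", "2001-01-18"), (2, "a", "2001-01-28"),
    (3, "b", "2001-03-08"), (3, "b", "2001-03-18"), (3, "b", "2001-03-19"),
    (3, "a", "2001-03-20"),
]

QUERY = """
WITH numbered AS (
    SELECT user_id, status, dt,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt) AS rn,
           -- stays constant inside each run of identical statuses (the island id)
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt)
             - ROW_NUMBER() OVER (PARTITION BY user_id, status ORDER BY dt) AS island
    FROM status_log
),
runs AS (
    -- one row per run: its first date and its first row number
    SELECT user_id, status, MIN(dt) AS start_dt, MIN(rn) AS start_rn
    FROM numbered
    GROUP BY user_id, status, island
),
changes AS (
    SELECT user_id, start_dt,
           LAG(status) OVER (PARTITION BY user_id ORDER BY start_dt) AS from_status,
           status AS to_status,
           julianday(start_dt)
             - julianday(LAG(start_dt) OVER (PARTITION BY user_id ORDER BY start_dt)) AS days_between,
           start_rn - LAG(start_rn) OVER (PARTITION BY user_id ORDER BY start_dt) AS steps_between
    FROM runs
)
SELECT user_id, from_status, to_status,
       CAST(days_between AS INTEGER), steps_between
FROM changes
WHERE from_status IS NOT NULL  -- a user's first run has nothing to change from
ORDER BY user_id, start_dt
"""

def status_changes(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE status_log (user_id INTEGER, status TEXT, dt TEXT)")
    con.executemany("INSERT INTO status_log VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in status_changes(STATUS_LOG):
    print(row)
# (1, 'a', 'b', 14, 2)
# (1, 'b', 'a', 16, 2)
# (1, 'a', 'b', 1, 1)
# (3, 'b', 'a', 12, 3)
```

User 2 never changes status, so it produces no rows, just as in the desired output.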

Need to get maximum date range which is overlapping in SQL

I have a table with three columns: id, start_date, end_date.
Some of the values are as follows:
1 2018-01-01 2030-01-01
1 2017-10-01 2018-10-01
1 2019-01-01 2020-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2010-10-01 2010-12-01
2 2008-01-01 2009-01-01
For each id, I have to filter out overlapping date ranges, keeping only the longest range among them, while keeping any other range that does not overlap.
Hence desired output should be:
1 2018-01-01 2030-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2008-01-01 2009-01-01
I can't find the right way to write this in Impala. Can someone please help me?
I have tried:
with cte as(
select a.*, row_number() over(partition by id order by datediff(end_date, start_date) desc) as flag from mytable a) select * from cte where flag=1
but this also removes the other date ranges, the ones that don't overlap. Please help.
Use ROW_NUMBER() together with a row count per id (countItem):
with cte as(
select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable
)
select id,start_date,end_date
from cte
where seq = 1 or seq = countItem
Or, without a CTE:
select id,start_date,end_date
from
(select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable) t
where seq = 1 or seq = countItem
demo in db<>fiddle
You can use a cumulative max to see if there is any overlap with preceding rows. If there is not, then you have the first row of a new group (row in the result set).
A cumulative sum of the starts assigns each row in the source to a group. Then aggregate:
select id, min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date >= start_date then 0 else 1 end) over
(partition by id
order by start_date
rows between unbounded preceding and current row
) as grp
from (select t.*,
max(end_date) over (partition by id
order by start_date
rows between unbounded preceding and 1 preceding
) as prev_end_date
from t
) t
) t
group by id, grp;
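To see what the cumulative-max approach returns for the sample data, here is a sketch of the same query in SQLite (3.25 or later) driven from Python's sqlite3 module. The table name ranges is an assumption, and ISO-formatted text dates are compared as plain strings. Note that it merges each chain of overlapping ranges rather than keeping only the longest one, so for id 1 it returns 2017-10-01 to 2030-01-01 instead of the 2018-01-01 to 2030-01-01 shown in the desired output; for id 2 the two results coincide.

```python
import sqlite3

# Sample rows from the question: (id, start_date, end_date)
RANGES = [
    (1, "2018-01-01", "2030-01-01"), (1, "2017-10-01", "2018-10-01"),
    (1, "2019-01-01", "2020-01-01"), (1, "2015-01-01", "2016-01-01"),
    (2, "2010-01-01", "2011-02-01"), (2, "2010-10-01", "2010-12-01"),
    (2, "2008-01-01", "2009-01-01"),
]

QUERY = """
SELECT id, MIN(start_date) AS merged_start, MAX(end_date) AS merged_end
FROM (
    SELECT *,
           -- a new group starts when no earlier range reaches this start date
           SUM(CASE WHEN prev_end_date >= start_date THEN 0 ELSE 1 END) OVER (
               PARTITION BY id ORDER BY start_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS grp
    FROM (
        SELECT *,
               -- latest end among all earlier-starting ranges (NULL for the first row)
               MAX(end_date) OVER (
                   PARTITION BY id ORDER BY start_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS prev_end_date
        FROM ranges
    )
)
GROUP BY id, grp
ORDER BY id, merged_start
"""

def merge_overlaps(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ranges (id INTEGER, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO ranges VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in merge_overlaps(RANGES):
    print(row)
```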

How to cross join using the latest value in BigQuery

I have the table below:
date        id  value
---------------------
2021-01-01  1   3
2021-01-04  1   5
2021-01-05  1   10
I expect output like the following, where the date column increases one day at a time and the value column carries forward the last known value for the id:
date        id  value
---------------------
2021-01-01  1   3
2021-01-02  1   3
2021-01-03  1   3
2021-01-04  1   5
2021-01-05  1   10
2021-01-06  1   10
I think I could use a CROSS JOIN, but I can't get the expected output, and I suspect there is a special syntax or technique for this.
Consider the approach below:
select * from `project.dataset.table`
union all
select missing_date, prev_row.id, prev_row.value
from (
select *, lag(t) over(partition by id order by date) prev_row
from `project.dataset.table` t
), unnest(generate_date_array(prev_row.date + 1, date - 1)) missing_date
I would write this using:
select dte, t.id, t.value
from (select t.*,
lead(date, 1, date '2021-01-06') over (partition by id order by date) as next_date
from `table` t
) t cross join
unnest(generate_date_array(
date,
ifnull(
date_add(next_date, interval -1 day), -- generate missing date rows
(select max(date) from `table`) -- add last row
)
)) dte;
Note that this requires neither union all nor window function to fill in the values.
An alternative solution uses LAST_VALUE(). You can explore the following query and customize the logic that generates the days, if needed:
WITH
query AS (
SELECT
date,
id,
value
FROM
`mydataset.newtable`
ORDER BY
date ),
generated_days AS (
SELECT
day
FROM (
SELECT
MIN(date) min_dt,
MAX(date) max_dt
FROM
query),
UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day )
SELECT
g.day,
LAST_VALUE(q.id IGNORE NULLS) OVER(ORDER BY g.day) id,
LAST_VALUE(q.value IGNORE NULLS) OVER(ORDER BY g.day) value,
FROM
generated_days g
LEFT OUTER JOIN
query q
ON
g.day = q.date
ORDER BY
g.day
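For a quick local check of the fill-forward idea, here is a hedged SQLite sketch (3.25 or later, via Python's sqlite3 module) that mirrors the LEAD-based answer: each row is expanded from its own date up to the day before the next row's date, and the last row of each id is extended to an explicit end date. SQLite has no GENERATE_DATE_ARRAY, so a recursive CTE generates the days instead. The table name readings, the column dt, and the end_date parameter are assumptions; 2021-01-06 is taken from the expected output.

```python
import sqlite3

# Sample rows from the question: (date, id, value)
READINGS = [("2021-01-01", 1, 3), ("2021-01-04", 1, 5), ("2021-01-05", 1, 10)]

QUERY = """
WITH RECURSIVE bounds AS (
    SELECT dt, id, value,
           -- first day NOT covered by this row: the next row's date,
           -- or the day after end_date for the last row of each id
           COALESCE(LEAD(dt) OVER (PARTITION BY id ORDER BY dt),
                    date(:end_date, '+1 day')) AS next_dt
    FROM readings
),
filled(day, id, value, next_dt) AS (
    SELECT dt, id, value, next_dt FROM bounds
    UNION ALL
    -- add one day at a time, carrying the same value forward
    SELECT date(day, '+1 day'), id, value, next_dt
    FROM filled
    WHERE date(day, '+1 day') < next_dt
)
SELECT day, id, value FROM filled ORDER BY id, day
"""

def fill_forward(rows, end_date):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (dt TEXT, id INTEGER, value INTEGER)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, {"end_date": end_date}).fetchall()
    con.close()
    return result

for row in fill_forward(READINGS, "2021-01-06"):
    print(row)
```

Because LEAD() is partitioned by id, each id is filled independently, which the single-id LAST_VALUE() query above does not do on its own.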

SQL partition by on date range

Assume this is my table:
ID NUMBER DATE
------------------------
1 45 2018-01-01
2 45 2018-01-02
2 45 2018-01-27
I need to use PARTITION BY and ROW_NUMBER() so that the numbering restarts whenever the difference between a date and the previous one is greater than 5 days. For the example above, the result would be something like this:
ROWNUMBER ID NUMBER DATE
-----------------------------
1 1 45 2018-01-01
2 2 45 2018-01-02
1 3 45 2018-01-27
My current query is something like this:
SELECT ROW_NUMBER() OVER(PARTITION BY NUMBER ORDER BY ID DESC) AS ROWNUMBER, ...
But as you can see, it doesn't take the dates into account. How can I achieve that?
You can use the lag() function:
select *, row_number() over (partition by number, grp order by id) as [ROWNUMBER]
from (select *, (case when datediff(day, lag(date,1,date) over (partition by number order by id), date) <= 5
then 1 else 2
end) as grp
from table
) t;
By using the lag() and datediff() functions:
select * from
(
select t.*,
datediff(day,
lag(DATE) over (partition by NUMBER order by id),
DATE
) as diff
from t
) as TT where diff>5
http://sqlfiddle.com/#!18/130ae/11
I think you want to identify the groups using lag(), dateadd(), and a cumulative sum. Then use row_number():
select t.*,
row_number() over (partition by number, grp order by date) as rownumber
from (select t.*,
sum(grp_start) over (partition by number order by date) as grp
from (select t.*,
(case when lag(date) over (partition by number order by date) >= dateadd(day, -5, date)
then 0 else 1 -- 1 = first row, or more than 5 days after the previous date
end) as grp_start
from t
) t
) t;
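A runnable sketch of the cumulative-sum approach in SQLite (3.25 or later) via Python's sqlite3 module, under these assumptions: the table t and the column dt are hypothetical stand-ins for the question's table and DATE column, julianday() replaces DATEDIFF(), and the sample uses IDs 1, 2, 3 as in the desired output. A row starts a new group when it has no predecessor or falls more than max_gap_days after the previous date; the running SUM() turns those starts into group numbers, and ROW_NUMBER() restarts within each group.

```python
import sqlite3

# Rows as in the question's desired output: (id, number, date)
ROWS = [(1, 45, "2018-01-01"), (2, 45, "2018-01-02"), (3, 45, "2018-01-27")]

QUERY = """
SELECT ROW_NUMBER() OVER (PARTITION BY number, grp ORDER BY dt) AS rownumber,
       id, number, dt
FROM (
    -- running total of group starts = group number
    SELECT *, SUM(grp_start) OVER (PARTITION BY number ORDER BY dt) AS grp
    FROM (
        SELECT *,
               -- 1 when there is no previous row or the gap exceeds the limit
               CASE WHEN julianday(dt)
                         - julianday(LAG(dt) OVER (PARTITION BY number ORDER BY dt)) <= ?
                    THEN 0 ELSE 1 END AS grp_start
        FROM t
    )
)
ORDER BY number, dt
"""

def number_within_gaps(rows, max_gap_days=5):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, number INTEGER, dt TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY, (max_gap_days,)).fetchall()
    con.close()
    return result

for row in number_within_gaps(ROWS):
    print(row)
# (1, 1, 45, '2018-01-01')
# (2, 2, 45, '2018-01-02')
# (1, 3, 45, '2018-01-27')
```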

Finding the interval between dates in SQL Server

I have a table with more than 5 million rows of sales transactions. For each customer, I would like to find the sum of the date intervals between their three most recent purchases.
Suppose my table looks like this:
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result I'm looking for might look like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve each customer's three most recent transactions (based on ServiceStartDate) and then calculate the intervals between the start and expiry dates of those transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, t1.ServiceExpiryDate, t2.ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
and t1.rn <= 3 -- don't reach back past the third most recent purchase
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(case when rn < 3 then datediff(day, prevEnd, ServiceStartDate) end) as Intervaldays -- only gaps within the last three
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays, and since only two differences are summed, that doesn't seem correct to me either.
Additional logic must be applied based on your rules regarding:
customers with fewer than 3 purchases
overlapping intervals
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from table t
) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t )dt
WHERE dt.rn < 3 -- only the gaps between the three most recent purchases
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138
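The gap logic can be checked with a small SQLite sketch (3.25 or later, via Python's sqlite3 module) that follows the LEAD() variant: with each customer's purchases ordered newest first, LEAD(expiry_date) is the previous purchase's expiry, and only rows 1 and 2 are summed, so a fourth, older purchase cannot add an extra gap. The table and column names are hypothetical, and julianday() replaces DATEDIFF(day, ...). For the sample data it reproduces the 805 and 138 above, that is, without subtracting one day per purchase.

```python
import sqlite3

# Sample rows from the question: (customer, product, service start, service expiry)
PURCHASES = [
    ("A", "X1", "2010-01-01", "2010-06-01"),
    ("A", "X2", "2010-08-12", "2010-12-30"),
    ("B", "X4", "2011-10-01", "2012-01-15"),
    ("B", "X3", "2012-04-01", "2012-06-01"),
    ("B", "X7", "2012-08-01", "2013-10-01"),
    ("A", "X5", "2013-01-01", "2015-06-01"),
]

QUERY = """
SELECT customer_id,
       CAST(SUM(julianday(start_date) - julianday(prev_end)) AS INTEGER) AS interval_days
FROM (
    SELECT customer_id, start_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY start_date DESC) AS rn,
           -- newest first, so LEAD() returns the previous purchase's expiry date
           LEAD(expiry_date) OVER (PARTITION BY customer_id ORDER BY start_date DESC) AS prev_end
    FROM purchases
)
WHERE rn <= 2  -- rows 1 and 2 hold the two gaps between the three latest purchases
GROUP BY customer_id
ORDER BY customer_id
"""

def gap_days_last_three(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE purchases "
        "(customer_id TEXT, product_id TEXT, start_date TEXT, expiry_date TEXT)"
    )
    con.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(gap_days_last_three(PURCHASES))
# [('A', 805), ('B', 138)]
```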