Get the max month from a query that returns several years - sql

I have a table with dates, one date per month (some months will be missing, but that is expected), but several years are returned. I need only the latest date for each month. So if I have data for, say, months 8, 7, 6, etc. in 2020, then return those StartDate values. For months 10, 11, and 12, it should return the StartDate from 2019, or from whichever year is the latest one that has that month. id and courseLength are part of the table but irrelevant for this task. StartDate is of type date.
These are the top 15 rows of the table:
id StartDate courseLength
153 2020-08-31 63
153 2020-07-31 35
153 2020-06-30 60
153 2020-05-31 17
153 2020-03-31 51
153 2020-01-31 59
153 2019-12-31 30
153 2019-10-31 51
153 2019-08-31 59
153 2019-06-30 54
153 2019-05-31 17
153 2019-03-31 56
153 2019-01-31 55
153 2018-12-31 27
153 2018-10-31 54
And this is what I am expecting:
id StartDate courseLength
153 2020-08-31 63
153 2020-07-31 35
153 2020-06-30 60
153 2020-05-31 17
153 2020-03-31 51
153 2020-01-31 59
153 2019-12-31 30
153 2019-10-31 51
153 2018-11-30 65
153 2018-09-31 53
153 2019-05-31 17
153 2018-04-30 13

You can use window functions:
select *
from (
select t.*,
row_number() over(partition by id, month(startdate) order by startdate desc) rn
from mytable t
) t
where rn = 1
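A minimal sketch of this query run against the question's sample rows, using Python's sqlite3 module as a stand-in engine. SQLite has no month() function, so strftime('%m', ...) is swapped in here; the table name mytable and the in-memory database are assumptions for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# The 15 sample rows from the question: (id, StartDate, courseLength).
rows = [
    (153, "2020-08-31", 63), (153, "2020-07-31", 35), (153, "2020-06-30", 60),
    (153, "2020-05-31", 17), (153, "2020-03-31", 51), (153, "2020-01-31", 59),
    (153, "2019-12-31", 30), (153, "2019-10-31", 51), (153, "2019-08-31", 59),
    (153, "2019-06-30", 54), (153, "2019-05-31", 17), (153, "2019-03-31", 56),
    (153, "2019-01-31", 55), (153, "2018-12-31", 27), (153, "2018-10-31", 54),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, StartDate TEXT, courseLength INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# Number the rows of each (id, calendar month) group from newest to oldest,
# then keep only the newest row of every group.
query = """
SELECT id, StartDate, courseLength
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY id, strftime('%m', StartDate)
               ORDER BY StartDate DESC
           ) AS rn
    FROM mytable t
) AS ranked
WHERE rn = 1
ORDER BY StartDate DESC
"""
latest_per_month = conn.execute(query).fetchall()
for row in latest_per_month:
    print(row)
```

With these 15 rows only eight distinct calendar months are present, so eight rows come back, one per month.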

Try this:
SELECT
R.id, R.StartDate, R.courseLength
FROM (
SELECT
id, StartDate, courseLength, RANK() OVER(PARTITION BY MONTH(StartDate) ORDER BY StartDate DESC) as rank
FROM
#t
) R
WHERE
R.rank = 1

Or you can join against the latest date per month and id:
select t.*
from mytable t
join (
select
max(StartDate) maxdate
, id
from mytable
group by
month(StartDate), id
) m
on m.id = t.id
and m.maxdate = t.StartDate

Related

Snowflake sql - how to get the total subscriber based on gained/lost columns for each day and id?

I have the sample below from my table, which has thousands of ids and one row per day for each id. I also have subscriber_gained and subscriber_lost for each day/id. Is there a way to calculate how many followers I have on each day with this amount of data?
metrics_date id subscriber_lost subscriber_gained
2022-12-03 3343 54 37
2022-12-02 3343 29 27
2022-12-03 1223 44 26
2022-12-02 1223 21 36
I want to have a query that shows the running total for that day for that id:
metrics_date id subscriber_lost subscriber_gained number_of_visitors
2022-12-03 3343 54 37 1209
2022-12-02 3343 29 27 1226
2022-12-03 1223 44 26 3521
2022-12-02 1223 21 36 3539
I've tried this query, but the total is off:
select
METRICS_DATE,
channel_id,
number_of_visitors,
case
when lag(number_of_visitors) over(order by METRICS_DATE) is null
then number_of_visitors
when lag(number_of_visitors) over(order by METRICS_DATE) < number_of_visitors
then number_of_visitors - lag(number_of_visitors) over(order by METRICS_DATE)
else 0
end subscribers_gained,
case when lag(number_of_visitors) over(order by METRICS_DATE) > number_of_visitors
then lag(number_of_visitors) over(order by METRICS_DATE) - number_of_visitors
else 0
end subscribers_lost
from (
select METRICS_DATE,
channel_id,
count(*) number_of_visitors
from you.p_content_owner_basic_a3_you
where channel_id = '3343'
group by METRICS_DATE,
channel_id
) t
order by METRICS_DATE desc;
So, with some data in a CTE for simplicity:
with data(metrics_date, id, subscriber_lost, subscriber_gained) as (
select * from values
('2022-12-03'::date, 3343, 54, 37),
('2022-12-02'::date, 3343, 29, 27),
('2022-12-03'::date, 1223, 44, 26),
('2022-12-02'::date, 1223, 21, 36),
('2022-12-01'::date, 9999, 0, 10),
('2022-12-02'::date, 9999, 5, 10),
('2022-12-03'::date, 9999, 15, 10),
('2022-12-04'::date, 9999, 10, 10)
)
What you want is to subtract the two windowed SUMs of the two columns (gained minus lost):
select
d.*
,sum(d.subscriber_gained) over ( partition by d.id order by d.metrics_date) -
sum(d.subscriber_lost) over ( partition by d.id order by d.metrics_date) as number_of_visitors
from data as d
order by 2,1;
This can also be expressed as the running sum of the difference:
select
d.*
,sum(d.subscriber_gained - d.subscriber_lost) over ( partition by d.id order by d.metrics_date) as number_of_visitors
from data as d
order by 2,1;
METRICS_DATE ID SUBSCRIBER_LOST SUBSCRIBER_GAINED NUMBER_OF_VISITORS
2022-12-02 1223 21 36 15
2022-12-03 1223 44 26 -3
2022-12-02 3343 29 27 -2
2022-12-03 3343 54 37 -19
2022-12-01 9999 0 10 10
2022-12-02 9999 5 10 15
2022-12-03 9999 15 10 10
2022-12-04 9999 10 10 10
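Note that a running sum of gained minus lost starts from zero, so it gives the net change since the first row rather than an absolute count. The question's expected values (1226 then 1209 for id 3343) imply a count that existed before the first day. A minimal sketch of adding such a starting count, run through Python's sqlite3 module rather than Snowflake: the opening table and its balances (1228 and 3524) are hypothetical, back-calculated from the expected output purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (metrics_date TEXT, id INTEGER,
                   subscriber_lost INTEGER, subscriber_gained INTEGER);
INSERT INTO data VALUES
    ('2022-12-03', 3343, 54, 37),
    ('2022-12-02', 3343, 29, 27),
    ('2022-12-03', 1223, 44, 26),
    ('2022-12-02', 1223, 21, 36);

-- Hypothetical subscriber counts before the first day of data (assumed values).
CREATE TABLE opening (id INTEGER, balance INTEGER);
INSERT INTO opening VALUES (3343, 1228), (1223, 3524);
""")

# Opening balance plus the running net change per id, ordered by date.
query = """
SELECT d.metrics_date,
       d.id,
       o.balance + SUM(d.subscriber_gained - d.subscriber_lost)
           OVER (PARTITION BY d.id ORDER BY d.metrics_date) AS number_of_visitors
FROM data AS d
JOIN opening AS o ON o.id = d.id
ORDER BY d.id, d.metrics_date
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

With those assumed balances the totals match the figures in the question: 3539 and 3521 for id 1223, 1226 and 1209 for id 3343.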
Percentage change
select
d.*
,d.subscriber_gained - d.subscriber_lost as change
,sum(change) over ( partition by d.id order by d.metrics_date) as number_of_visitors
,round(div0(change, number_of_visitors+change) *100,1) as before_percent_change
,round(div0(change, number_of_visitors) *100,1) as after_percent_change
from data as d
order by 2,1;
gives:
METRICS_DATE ID SUBSCRIBER_LOST SUBSCRIBER_GAINED CHANGE NUMBER_OF_VISITORS BEFORE_PERCENT_CHANGE AFTER_PERCENT_CHANGE
2022-12-02 1223 21 36 15 15 50 100
2022-12-03 1223 44 26 -18 -3 85.7 600
2022-12-02 3343 29 27 -2 -2 50 100
2022-12-03 3343 54 37 -17 -19 47.2 89.5
2022-12-01 9999 0 10 10 10 50 100
2022-12-02 9999 5 10 5 15 25 33.3
2022-12-03 9999 15 10 -5 10 -100 -50
2022-12-04 9999 10 10 0 10 0 0

Finding most recent startdate, and endDate from consecutive dates

I have a table like below:
user_id store_id stock date
116 2 0 2021-10-18
116 2 0 2021-10-19
116 2 0 2021-10-20
116 2 0 2021-08-16
116 2 0 2021-08-15
116 2 0 2021-07-04
116 2 0 2021-07-03
389 2 0 2021-07-02
389 2 0 2021-07-01
389 2 0 2021-10-27
52 6 0 2021-10-28
52 6 0 2021-10-29
52 6 0 2021-10-30
116 38 0 2021-05-02
116 38 0 2021-05-03
116 38 0 2021-05-04
116 38 0 2021-04-06
The table can have multiple consecutive days where a product ran out of stock, so I'd like to create a query with the last startDate and endDate where the product ran out of stock. For the table above, the results have to be:
user_Id store_id startDate endDate
116 2 2021-10-18 2021-10-20
116 38 2021-05-02 2021-05-04
389 2 2021-07-01 2021-07-02
52 6 2021-10-28 2021-10-30
I have tried the solution with row_number(), but it didn't work. Does someone have a tip or idea to solve this problem with SQL (PostgreSQL)?
Here is how you can do it:
select user_id, store_id,min(date) startdate,max(date) enddate
from (
select *, rank() over (partition by user_id, store_id order by grp desc) rn from (
select *, date - row_number() over (partition by user_id,store_id order by date) * interval '1 day' grp
from tablename
) t) t where rn = 1
group by user_id, store_id,grp
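The query relies on the "gaps and islands" trick: within a run of consecutive days, the date minus its row number is constant, so that value identifies each run. A minimal sketch of the same idea, run through Python's sqlite3 module instead of PostgreSQL; julianday() is swapped in for the interval arithmetic, and only a subset of the question's rows is loaded, both as assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (user_id INTEGER, store_id INTEGER, stock INTEGER, date TEXT)"
)
# Subset of the question's rows; stock is always 0 (out of stock).
conn.executemany("INSERT INTO tablename VALUES (?, ?, 0, ?)", [
    (116, 2, "2021-10-18"), (116, 2, "2021-10-19"), (116, 2, "2021-10-20"),
    (116, 2, "2021-08-16"), (116, 2, "2021-08-15"),
    (52, 6, "2021-10-28"), (52, 6, "2021-10-29"), (52, 6, "2021-10-30"),
    (116, 38, "2021-05-02"), (116, 38, "2021-05-03"), (116, 38, "2021-05-04"),
    (116, 38, "2021-04-06"),
])

query = """
WITH grouped AS (
    -- Day number minus row number is constant within a run of consecutive days.
    SELECT user_id, store_id, date,
           julianday(date) - ROW_NUMBER() OVER (
               PARTITION BY user_id, store_id ORDER BY date
           ) AS grp
    FROM tablename
),
ranked AS (
    -- The run with the largest grp value is the most recent one.
    SELECT *,
           RANK() OVER (PARTITION BY user_id, store_id ORDER BY grp DESC) AS rn
    FROM grouped
)
SELECT user_id, store_id, MIN(date) AS startdate, MAX(date) AS enddate
FROM ranked
WHERE rn = 1
GROUP BY user_id, store_id
ORDER BY user_id, store_id
"""
last_runs = conn.execute(query).fetchall()
for row in last_runs:
    print(row)
```

Every row of the latest run ties at rank 1, which is why RANK() is used rather than ROW_NUMBER().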

How To Check If Value Is Decreasing Over Months SQLite

I have monthly revenue for each account. What I am looking for is a view of the earnings for each account in descending order from the last decrease.
Here is the query:
SELECT account_id,
monthly_date,
earnings
FROM accounts_revenue
GROUP BY account_id,
monthly_date
The data looks something like this:
account_id monthly_date earnings
55 2017-01-01 2000
55 2017-02-01 1950
55 2017-10-01 2000
55 2018-02-01 1500
55 2018-05-01 1200
55 2018-12-01 3000
55 2019-01-01 900
55 2019-02-01 810
55 2019-04-01 1000
55 2019-05-01 600
55 2020-01-01 800
55 2020-02-01 100
122 2020-01-01 800
122 2020-02-01 100
So the result should look like this:
account_id monthly_date earnings
55 2017-01-01 2000
55 2017-02-01 1950
55 2018-02-01 1500
55 2018-05-01 1200
55 2019-01-01 900
55 2019-02-01 810
55 2019-05-01 600
55 2020-02-01 100
122 2020-01-01 800
122 2020-02-01 100
Any idea how to achieve this?
Use NOT EXISTS:
SELECT ar1.*
FROM accounts_revenue ar1
WHERE NOT EXISTS (
SELECT 1
FROM accounts_revenue ar2
WHERE ar2.account_id = ar1.account_id
AND ar2.monthly_date < ar1.monthly_date
AND ar2.earnings <= ar1.earnings
)
ORDER BY ar1.account_id, ar1.monthly_date;
You can use the lag() window function and a CTE (Or subquery if you prefer) to filter out rows you don't want:
WITH revenue AS
(SELECT account_id, monthly_date, earnings,
lag(earnings) OVER (PARTITION BY account_id ORDER BY monthly_date) AS prev_earnings
FROM accounts_revenue)
SELECT account_id, monthly_date, earnings
FROM revenue
WHERE earnings < prev_earnings OR prev_earnings IS NULL
ORDER BY account_id, monthly_date;
For efficiency, you'll want an index on accounts_revenue(account_id, monthly_date).
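The two answers are not equivalent in general. NOT EXISTS keeps a row only when it is lower than every earlier row for the account (a new low), while lag() keeps a row whenever it is lower than the row immediately before it. A minimal sketch of the difference using Python's sqlite3 module; the four rows for account 1 are made-up illustration data, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts_revenue (account_id INTEGER, monthly_date TEXT, earnings INTEGER);
-- Hypothetical data: a rise to 3000, then a drop that is still above the first value.
INSERT INTO accounts_revenue VALUES
    (1, '2021-01-01', 2000),
    (1, '2021-02-01', 3000),
    (1, '2021-03-01', 2500),
    (1, '2021-04-01', 1900);
""")

# Keeps rows that are lower than every earlier row of the same account.
not_exists_sql = """
SELECT ar1.earnings
FROM accounts_revenue ar1
WHERE NOT EXISTS (
    SELECT 1
    FROM accounts_revenue ar2
    WHERE ar2.account_id = ar1.account_id
      AND ar2.monthly_date < ar1.monthly_date
      AND ar2.earnings <= ar1.earnings
)
ORDER BY ar1.account_id, ar1.monthly_date
"""

# Keeps rows that are lower than the immediately preceding row.
lag_sql = """
WITH revenue AS (
    SELECT account_id, monthly_date, earnings,
           lag(earnings) OVER (PARTITION BY account_id ORDER BY monthly_date) AS prev_earnings
    FROM accounts_revenue
)
SELECT earnings
FROM revenue
WHERE earnings < prev_earnings OR prev_earnings IS NULL
ORDER BY account_id, monthly_date
"""

new_lows = [r[0] for r in conn.execute(not_exists_sql)]
drops = [r[0] for r in conn.execute(lag_sql)]
print(new_lows)  # [2000, 1900]
print(drops)     # [2000, 2500, 1900]
```

On the question's sample data both queries happen to return the same rows, so which one is right depends on whether "decrease" means a new minimum or a drop from the previous month.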

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
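As a cross-check outside the database, the same logic can be sketched in plain Python: for every row whose running amount is zero, take the time to the next row of the same ID (the equivalent of lead()), then take the median per ID. The row list is transcribed from the question; the variable names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# (id, timestamp, running_amount) rows from the question.
rows = [
    (10, "2019-06-27 14:30", 100), (10, "2019-06-29 15:26", 0),
    (10, "2019-07-03 01:56", 83), (10, "2019-07-04 17:53", 98),
    (10, "2019-07-05 15:09", 0), (10, "2019-07-05 15:53", 98.98),
    (10, "2019-07-05 19:54", 0), (10, "2019-07-07 01:36", 90.97),
    (10, "2019-07-07 13:02", 0), (10, "2019-07-07 16:32", 39.88),
    (10, "2019-07-08 13:41", 89.88),
    (20, "2019-01-08 09:03", 890.97), (20, "2019-01-09 14:47", 799.88),
    (20, "2019-01-09 14:53", 899.88), (20, "2019-01-09 14:59", 500.88),
    (20, "2019-01-09 18:24", 811.88), (20, "2019-01-09 23:25", 861.88),
    (20, "2019-01-10 16:18", 0), (20, "2019-01-12 16:46", 894.49),
    (20, "2019-01-25 05:40", 23.44),
]

by_id = defaultdict(list)
for id_, ts, running in rows:
    by_id[id_].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), running))

median_age = {}
for id_, events in by_id.items():
    events.sort()  # order by timestamp within each id
    # Days from each zero-balance row to the next row (the lead() equivalent).
    diffs = [
        (nxt[0] - cur[0]).total_seconds() / 86400
        for cur, nxt in zip(events, events[1:])
        if cur[1] == 0
    ]
    median_age[id_] = median(diffs)

for id_, age in sorted(median_age.items()):
    print(id_, round(age, 6), round(age))
```

This reproduces the unrounded medians shown above (about 0.6917 days for ID 10 and 2.0194 days for ID 20) before rounding to 1 and 2.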

MAX value of column with corresponding columns

I am using an old SQL Server 2000.
Here is some sample data:
ROOMDATE rate bus_id quantity
2018-09-21 00:00:00.000 129 346686 2
2018-09-21 00:00:00.000 162 354247 36
2018-09-21 00:00:00.000 159 382897 150
2018-09-21 00:00:00.000 120 556111 25
2018-09-22 00:00:00.000 129 346686 8
2018-09-22 00:00:00.000 162 354247 86
2018-09-22 00:00:00.000 159 382897 150
2018-09-22 00:00:00.000 120 556111 25
2018-09-23 00:00:00.000 129 346686 23
2018-09-23 00:00:00.000 162 354247 146
2018-09-23 00:00:00.000 159 382897 9
2018-09-23 00:00:00.000 94 570135 23
Essentially, what I want is the MAX quantity for each day with its corresponding rate and bus_id.
For example, I would want the following rows from my sample data above:
ROOMDATE rate bus_id quantity
2018-09-21 00:00:00.000 159 382897 150
2018-09-22 00:00:00.000 159 382897 150
2018-09-23 00:00:00.000 162 354247 146
From what I have read, SQL Server 2000 did not support ROW_NUMBER. But we can phrase your query using a subquery which finds the max quantity for each day:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT
CONVERT(char(10), ROOMDATE, 120) AS ROOMDATE,
MAX(quantity) AS max_quantity
FROM yourTable
GROUP BY CONVERT(char(10), ROOMDATE, 120)
) t2
ON CONVERT(char(10), t1.ROOMDATE, 120) = t2.ROOMDATE AND
t1.quantity = t2.max_quantity
ORDER BY
t1.ROOMDATE;
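A minimal sketch of the same join-on-group-maximum pattern against the sample data, run through Python's sqlite3 module rather than SQL Server 2000. SQLite's date() replaces CONVERT(char(10), ..., 120) here, and the in-memory table is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (ROOMDATE TEXT, rate INTEGER, bus_id INTEGER, quantity INTEGER)"
)
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", [
    ("2018-09-21 00:00:00.000", 129, 346686, 2),
    ("2018-09-21 00:00:00.000", 162, 354247, 36),
    ("2018-09-21 00:00:00.000", 159, 382897, 150),
    ("2018-09-21 00:00:00.000", 120, 556111, 25),
    ("2018-09-22 00:00:00.000", 129, 346686, 8),
    ("2018-09-22 00:00:00.000", 162, 354247, 86),
    ("2018-09-22 00:00:00.000", 159, 382897, 150),
    ("2018-09-22 00:00:00.000", 120, 556111, 25),
    ("2018-09-23 00:00:00.000", 129, 346686, 23),
    ("2018-09-23 00:00:00.000", 162, 354247, 146),
    ("2018-09-23 00:00:00.000", 159, 382897, 9),
    ("2018-09-23 00:00:00.000", 94, 570135, 23),
])

# Find the maximum quantity per day, then join back to fetch the full rows.
query = """
SELECT t1.ROOMDATE, t1.rate, t1.bus_id, t1.quantity
FROM yourTable t1
INNER JOIN (
    SELECT date(ROOMDATE) AS room_day, MAX(quantity) AS max_quantity
    FROM yourTable
    GROUP BY date(ROOMDATE)
) t2
  ON date(t1.ROOMDATE) = t2.room_day
 AND t1.quantity = t2.max_quantity
ORDER BY t1.ROOMDATE
"""
max_rows = conn.execute(query).fetchall()
for row in max_rows:
    print(row)
```

If two rows share the maximum quantity on the same day, this pattern returns both of them.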