I've used the LAG function to find the previous value. However, I have run into an issue that requires a more complex query.
Here is my scenario.
Our table currently keeps month-end data for each record, with the exception of the last 95 days, for which we keep daily records. This is what I mean by month-end and daily records:
ID Date Amount
123 10/31/2019 52
123 11/30/2019 56
123 12/31/2019 59
123 01/25/2020 32
123 01/26/2020 28
123 ... ..
123 03/12/2020 103
Imagine that the ... represents a daily record for ID 123 for each day up until yesterday.
My approach worked perfectly for our month-end historical data, but I ran into an issue with our daily historical data.
What I want is to get the value from the last day of the previous month, for all months.
This is what I currently have for my query:
Select ID, Date, Amount,
       LAG(Amount, 1, 0) OVER (PARTITION BY ID ORDER BY Date) AS SharePreviousBalance
from dbo.shares
where Date >= '20191031'
This is the output I would like to have, but my current query does not produce it:
ID Date Amount SharePreviousBalance
123 10/31/2019 52 0
123 11/30/2019 56 52
123 12/31/2019 59 56
123 ... .. ..
123 01/25/2020 32 0
123 01/26/2020 28 0
123 01/27/2020 28 0
123 ... .. ..
123 01/31/2020 28 59
123 ... .. ..
123 02/15/2020 28 0
123 ... .. ..
123 02/29/2020 25 28
123 ... .. ..
123 03/05/2020 29 0
123 ... .. ..
123 03/10/2020 30 0
123 ... .. ..
123 03/12/2020 103 25
Any Ideas?
Thank you
With a little conditional logic, you can still do this with lag():
select
    t.*,
    case when date = eomonth(date) then
        coalesce(
            lag(amount) over(
                partition by id, case when date = eomonth(date) then 1 else 0 end
                order by date
            ),
            0
        )
    end SharePreviousBalance
from mytable t
The idea is to build a partition for "end-of-month" rows (i.e., rows whose date is the last day of a month). Within that partition, an end-of-month row can access the previous end-of-month value with lag().
Demo on DB Fiddle - I added a few rows to your sample data:
ID | Date | Amount | SharePreviousBalance
--: | :--------- | -----: | -------------------:
123 | 2019-10-31 | 52 | 0
123 | 2019-11-30 | 56 | 52
123 | 2019-12-31 | 59 | 56
123 | 2020-01-20 | 28 | null
123 | 2020-01-25 | 32 | null
123 | 2020-01-26 | 28 | null
123 | 2020-01-31 | 28 | 59
123 | 2020-02-12 | 103 | null
123 | 2020-02-28 | 103 | null
123 | 2020-02-29 | 103 | 28
If you also want to show the value of the previous end of month for the current date, then add that row to the "end-of-month" partition:
select
    t.*,
    case when date in (eomonth(date), cast(getdate() as date)) then
        coalesce(
            lag(amount) over(
                partition by
                    id,
                    case when date in (eomonth(date), cast(getdate() as date)) then 1 else 0 end
                order by date
            ),
            0
        )
    end SharePreviousBalance
from mytable t
order by id, date
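For completeness, here is a minimal setup sketch to reproduce the demo (the table name mytable matches the queries above and the rows mirror the demo output; note that eomonth() requires SQL Server 2012 or later):
-- Sample data matching the demo above; assumes the table name used in the queries
CREATE TABLE mytable (id INT, date DATE, amount INT);

INSERT INTO mytable (id, date, amount) VALUES
(123, '2019-10-31', 52),
(123, '2019-11-30', 56),
(123, '2019-12-31', 59),
(123, '2020-01-20', 28),
(123, '2020-01-25', 32),
(123, '2020-01-26', 28),
(123, '2020-01-31', 28),
(123, '2020-02-12', 103),
(123, '2020-02-28', 103),
(123, '2020-02-29', 103);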
This was a question asked in a BI Engineer interview. I could not solve it and am still scratching my head over it. I used the LAG window function with CASE statements, but the interviewer did not seem impressed by it. Is there any other way it can be solved?
Below is the input data. The running total resets every Monday: on a Monday, the previous day's amount is not added. Amounts are accumulated only within the week (Monday through Sunday), which gives a week-to-date total.
Date Amount
27-Sep-2021 1
28-Sep-2021 13
29-Sep-2021 15
30-Sep-2021 20
1-Oct-2021 9
2-Oct-2021 20
3-Oct-2021 8
4-Oct-2021 2
5-Oct-2021 9
6-Oct-2021 11
7-Oct-2021 15
8-Oct-2021 8
9-Oct-2021 16
10-Oct-2021 3
11-Oct-2021 3
12-Oct-2021 18
And below is the expected output
Date Amount WTD
27-Sep-2021 1 1
28-Sep-2021 13 14
29-Sep-2021 15 29
30-Sep-2021 20 49
1-Oct-2021 9 58
2-Oct-2021 20 78
3-Oct-2021 8 86
4-Oct-2021 2 2
5-Oct-2021 9 11
6-Oct-2021 11 22
7-Oct-2021 15 37
8-Oct-2021 8 45
9-Oct-2021 16 61
10-Oct-2021 3 64
11-Oct-2021 3 3
12-Oct-2021 18 21
The window function sum() over() should help here. Also note that I use SET DATEFIRST 1 to set Monday as the first day of the week.
Example on dbFiddle:
SET DATEFIRST 1;
Select *
,WTD = sum(Amount) over (partition by datepart(Week,Date) order by Date rows unbounded preceding)
from YourTable
Of course, sitting at a terminal, I have the luxury of fixing syntax errors as they appear. I always try to remind myself to check whether a running value can feed into ROWS UNBOUNDED PRECEDING when a gaps-and-islands style problem arises.
CREATE Table MyTable(Date DATETIME, Amount INT)
INSERT MyTable VALUES
('27-Sep-2021',1),
('28-Sep-2021',13),
('29-Sep-2021',15),
('30-Sep-2021',20),
('1-Oct-2021',9),
('2-Oct-2021',20),
('3-Oct-2021',8),
('4-Oct-2021',2),
('5-Oct-2021',9),
('6-Oct-2021',11),
('7-Oct-2021',15),
('8-Oct-2021',8),
('9-Oct-2021',16),
('10-Oct-2021',3),
('11-Oct-2021',3),
('12-Oct-2021',18)
GO
16 rows affected
--Monday = 2 under the default DATEFIRST setting (7)
;WITH Grouped AS
(
    SELECT *,
        IsMondayGroup = SUM(CASE WHEN DATEPART(WEEKDAY, Date) = 2 THEN 1 ELSE 0 END)
                        OVER (ORDER BY Date ROWS UNBOUNDED PRECEDING)
    FROM MyTable
)
SELECT
    Date, Amount,
    GroupSum = SUM(Amount) OVER (PARTITION BY IsMondayGroup ORDER BY Date)
FROM Grouped
GO
Date | Amount | GroupSum
:---------------------- | -----: | -------:
2021-09-27 00:00:00.000 | 1 | 1
2021-09-28 00:00:00.000 | 13 | 14
2021-09-29 00:00:00.000 | 15 | 29
2021-09-30 00:00:00.000 | 20 | 49
2021-10-01 00:00:00.000 | 9 | 58
2021-10-02 00:00:00.000 | 20 | 78
2021-10-03 00:00:00.000 | 8 | 86
2021-10-04 00:00:00.000 | 2 | 2
2021-10-05 00:00:00.000 | 9 | 11
2021-10-06 00:00:00.000 | 11 | 22
2021-10-07 00:00:00.000 | 15 | 37
2021-10-08 00:00:00.000 | 8 | 45
2021-10-09 00:00:00.000 | 16 | 61
2021-10-10 00:00:00.000 | 3 | 64
2021-10-11 00:00:00.000 | 3 | 3
2021-10-12 00:00:00.000 | 18 | 21
I have a table like this:
Date       | Week
2021-01-01 | 53
2021-01-02 | 53
2021-01-03 | 53
2021-01-04 | 1
2021-01-05 | 1
2021-01-06 | 1
2021-01-07 | 1
...        | ...
2021-12-30 | 52
2021-12-31 | 52
I want to rank weeks not by their values but in ascending date order. I tried to use
dense_rank() over (order by Week)
and got this result:
Date       | Week | Rank
2021-01-01 | 53   | 53
2021-01-02 | 53   | 53
2021-01-03 | 53   | 53
2021-01-04 | 1    | 1
2021-01-05 | 1    | 1
2021-01-06 | 1    | 1
2021-01-07 | 1    | 1
...        | ...  | ...
2021-12-30 | 52   | 52
2021-12-31 | 52   | 52
But week 53 gets rank 53, not rank 1 as I want. Do you know what I need to use in that case? Thanks!
You can try using the MOD function in the ORDER BY.
Because the week numbers appear to range from 1 to 53, MOD will produce
MOD(53, 53) => 0
MOD(1, 53)  => 1
and so on.
dense_rank() over (order by MOD(Week, 53))
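For illustration, a minimal sketch of the ranking this produces (the table name mytable is an assumption, and the exact MOD syntax depends on your database; SQL Server would use Week % 53 instead). Assuming all week numbers 1 through 53 are present:
select Date, Week,
       dense_rank() over (order by MOD(Week, 53)) as week_rank
from mytable
-- Week 53 -> MOD(53, 53) = 0  -> rank 1
-- Week 1  -> MOD(1, 53)  = 1  -> rank 2
-- ...
-- Week 52 -> MOD(52, 53) = 52 -> rank 53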
Use ORDER BY ... DESC:
select *, row_number() over (order by week desc) from table_name
You can simply play with Vertica's date/time functions and add @D-Shih's clever idea with the modulo function; no dense_rank() is needed if the result you display is the one you want:
WITH
indata (dt) AS (
SELECT DATE '2020-12-30'
UNION ALL SELECT DATE '2020-12-31'
UNION ALL SELECT DATE '2021-01-01'
UNION ALL SELECT DATE '2021-01-02'
UNION ALL SELECT DATE '2021-01-03'
UNION ALL SELECT DATE '2021-01-04'
UNION ALL SELECT DATE '2021-01-05'
[...]
UNION ALL SELECT DATE '2021-12-30'
UNION ALL SELECT DATE '2021-12-31'
UNION ALL SELECT DATE '2022-01-01'
UNION ALL SELECT DATE '2022-01-02'
UNION ALL SELECT DATE '2022-01-03'
UNION ALL SELECT DATE '2022-01-04'
)
SELECT
dt
, WEEK(dt) AS stdweek
, WEEK_ISO(dt) AS isoweek
, MOD(WEEK(dt),53) AS stdwkmod53
, MOD(WEEK_ISO(dt),53) AS isowkmod53
FROM indata;
-- out dt | stdweek | isoweek | stdwkmod53 | isowkmod53
-- out ------------+---------+---------+------------+------------
-- out 2020-12-30 | 53 | 53 | 0 | 0
-- out 2020-12-31 | 53 | 53 | 0 | 0
-- out 2021-01-01 | 1 | 53 | 1 | 0
-- out 2021-01-02 | 1 | 53 | 1 | 0
-- out 2021-01-03 | 2 | 53 | 2 | 0
-- out 2021-01-04 | 2 | 1 | 2 | 1
-- out 2021-01-05 | 2 | 1 | 2 | 1
[...]
-- out 2021-12-30 | 53 | 52 | 0 | 52
-- out 2021-12-31 | 53 | 52 | 0 | 52
-- out 2022-01-01 | 1 | 52 | 1 | 52
-- out 2022-01-02 | 2 | 52 | 2 | 52
-- out 2022-01-03 | 2 | 1 | 2 | 1
-- out 2022-01-04 | 2 | 1 | 2 | 1
I am trying to find the number of days it took to close each account.
I have a table like below:
OPER_DAY | CODE_FILIAL | SUM_IN | SALDO_OUT | ACC
-------------------------------------------------------------------------------
2020-11-02 | 00690 | 0 | 1578509367.58 | 001
2020-11-03 | 00690 | 1578509367.58 | 9116497.5 | 001
2020-11-04 | 00690 | 9116497.5 | 0 | 001
2020-11-02 | 00690 | 0 | 157430882.96 | 101
2020-11-03 | 00690 | 157430882.96 | 0 | 101
2020-11-09 | 00690 | 0 | 500000 | 101
2020-11-19 | 00690 | 500000 | 0 | 101
For a particular ACC, a period starts with SUM_IN = 0 and ends with SALDO_OUT = 0. I need to find the number of days the filial took to close the account.
For example, for ACC 001 it took 2 days, from 2020-11-02 to 2020-11-04. For ACC 101 it took 11 days: from 2020-11-02 to 2020-11-03 is 1 day,
and from 2020-11-09 to 2020-11-19 is 10 days.
Overall: 13 days.
Result I want:
----------------------------
CODE_FILIAL | NUM_OF_DAYS
---------------------------
00690 | 13
This reads like a gaps-and-island problem. An island starts with a value of 0 in sum_in, and ends with a value of 0 in saldo_out.
Assuming there is always at most one end for each start, you can use window functions and aggregation as follows:
select code_filial, sum(end_dt - start_dt) as num_of_days
from (
    select code_filial, acc, grp,
        min(oper_day) as start_dt,
        max(case when saldo_out = 0 then oper_day end) as end_dt
    from (
        select t.*,
            sum(case when sum_in = 0 then 1 else 0 end) over(partition by code_filial, acc order by oper_day) as grp
        from mytable t
    ) t
    group by code_filial, acc, grp
) t
group by code_filial
This works by building groups of records with a window sum that increments every time a value of 0 is met in column sum_in for a given (code_filial, acc) tuple. We can then use aggregation to compute the corresponding start and end dates. The final step is to aggregate by code_filial.
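To make the grouping step concrete, here is a sketch of just the inner query, with the grp values it would produce for the sample rows shown as comments (the table name mytable follows the answer above):
-- Inner step only: grp increments each time a new island starts (sum_in = 0)
select t.*,
       sum(case when sum_in = 0 then 1 else 0 end)
           over(partition by code_filial, acc order by oper_day) as grp
from mytable t
-- ACC 001: 2020-11-02 -> grp 1, 2020-11-03 -> grp 1, 2020-11-04 -> grp 1
-- ACC 101: 2020-11-02 -> grp 1, 2020-11-03 -> grp 1, 2020-11-09 -> grp 2, 2020-11-19 -> grp 2
-- Aggregating per grp then yields the spans (2020-11-02, 2020-11-04), (2020-11-02, 2020-11-03)
-- and (2020-11-09, 2020-11-19), i.e. 2 + 1 + 10 = 13 days for filial 00690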
I have a file that has a record of all usage of our product, which includes user id, number of calls made and the date the calls were made (it's rolled up to the date, by user id).
user_id | num_calls | date
123 | 32 | 2018-04-17
435 | 21 | 2018-04-17
123 | 35 | 2018-04-18
435 | 10 | 2018-04-18
123 | 20 | 2018-04-19
435 | 90 | 2018-04-20
I want to produce a chart that shows, for each day in the past and going forward, the users who were active in the 30 days up to and including that date, and how many calls they made over that 30-day period. Ultimately, I will be using this to set various thresholds for "high usage" in a given 30-day period. It would look like this:
user_id | num_calls_in_previous_30_days | date
123 | 32 | 2018-04-17
435 | 21 | 2018-04-17
123 | 67 | 2018-04-18
435 | 31 | 2018-04-18
123 | 87 | 2018-04-19
435 | 31 | 2018-04-19
123 | 87 | 2018-04-20
435 | 121 | 2018-04-20
The issue I'm having is that when I try to use the window function
sum(num_calls) over (partition by id ORDER BY UNIX_SECONDS(timestamp(date)) range BETWEEN 2505600 PRECEDING AND CURRENT ROW)
I only get the total number of calls in the last 30 days for users who were active on each specific date, as opposed to including all users who were active in the 30 days prior to that date along with their usage over that time frame. Using the same data from above, it looks like this:
user_id | num_calls_in_previous_30_days | date
123 | 32 | 2018-04-17
435 | 21 | 2018-04-17
123 | 67 | 2018-04-18
435 | 31 | 2018-04-18
123 | 87 | 2018-04-19
435 | 121 | 2018-04-20
I tried another route, which was getting all unique user_ids from the previous 30 days from each date, but I wasn't sure how to join this with my existing usage data to get my desired result.
I'm sure there's a simple solution here, but I've spent a few hours on it and can't seem to wrap my head around how to solve this.
Thanks in advance!
The example below is for BigQuery Standard SQL.
#standardSQL
WITH dates AS (
SELECT DATE
FROM (
SELECT MIN(DATE) start, MAX(DATE) finish
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(start, finish)) DATE
), users AS (
SELECT DISTINCT user_id
FROM `project.dataset.table`
)
SELECT user_id, num_calls, DATE,
SUM(num_calls) OVER (win30days) num_calls_in_previous_30_days
FROM users
CROSS JOIN dates
LEFT JOIN `project.dataset.table` USING(DATE, user_id)
WINDOW win30days AS (
PARTITION BY user_id
ORDER BY UNIX_SECONDS(TIMESTAMP(DATE))
RANGE BETWEEN 2505600 PRECEDING AND CURRENT ROW
)
You can test and play with the above using the dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 123 user_id, 32 num_calls, DATE '2018-04-17' DATE UNION ALL
SELECT 435, 21, '2018-04-17' UNION ALL
SELECT 123, 35, '2018-04-18' UNION ALL
SELECT 435, 10, '2018-04-18' UNION ALL
SELECT 123, 20, '2018-04-19' UNION ALL
SELECT 435, 90, '2018-04-20'
), dates AS (
SELECT DATE
FROM (
SELECT MIN(DATE) start, MAX(DATE) finish
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(start, finish)) DATE
), users AS (
SELECT DISTINCT user_id
FROM `project.dataset.table`
)
SELECT user_id, num_calls, DATE,
SUM(num_calls) OVER (win30days) num_calls_in_previous_30_days
FROM users
CROSS JOIN dates
LEFT JOIN `project.dataset.table` USING(DATE, user_id)
WINDOW win30days AS (
PARTITION BY user_id
ORDER BY UNIX_SECONDS(TIMESTAMP(DATE))
RANGE BETWEEN 2505600 PRECEDING AND CURRENT ROW
)
-- ORDER BY DATE, user_id
with the result:
Row user_id num_calls DATE num_calls_in_previous_30_days
1 123 32 2018-04-17 32
2 435 21 2018-04-17 21
3 123 35 2018-04-18 67
4 435 10 2018-04-18 31
5 123 20 2018-04-19 87
6 435 null 2018-04-19 31
7 123 null 2018-04-20 87
8 435 90 2018-04-20 121
I am currently making a monthly report using MySQL. I have a table named "monthly" that looks something like this:
id | date | amount
10 | 2009-12-01 22:10:08 | 7
9 | 2009-11-01 22:10:08 | 78
8 | 2009-10-01 23:10:08 | 5
7 | 2009-07-01 21:10:08 | 54
6 | 2009-03-01 04:10:08 | 3
5 | 2009-02-01 09:10:08 | 456
4 | 2009-02-01 14:10:08 | 4
3 | 2009-01-01 20:10:08 | 20
2 | 2009-01-01 13:10:15 | 10
1 | 2008-12-01 10:10:10 | 5
Then, when I make a monthly report (grouped by month per year), I get something like this:
yearmonth | total
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-07 | 54
2009-10 | 5
2009-11 | 78
2009-12 | 7
I used this query to achieve the result:
SELECT substring( date, 1, 7 ) AS yearmonth, sum( amount ) AS total
FROM monthly
GROUP BY substring( date, 1, 7 )
But I need something like this:
yearmonth | total
2008-01 | 0
2008-02 | 0
2008-03 | 0
2008-04 | 0
2008-05 | 0
2008-06 | 0
2008-07 | 0
2008-08 | 0
2008-09 | 0
2008-10 | 0
2008-11 | 0
2008-12 | 5
2009-01 | 30
2009-02 | 460
2009-03 | 3
2009-04 | 0
2009-05 | 0
2009-06 | 0
2009-07 | 54
2009-08 | 0
2009-09 | 0
2009-10 | 5
2009-11 | 78
2009-12 | 7
I need something that would display zeroes for the months that don't have any value. Is it even possible to do that in a MySQL query?
You should generate a dummy rowsource and LEFT JOIN with it:
SELECT year, month, COALESCE(SUM(m.amount), 0) AS total
FROM (
    SELECT 1 AS month
    UNION ALL
    SELECT 2
    …
    UNION ALL
    SELECT 12
) months
CROSS JOIN
(
    SELECT 2008 AS year
    UNION ALL
    SELECT 2009 AS year
) years
LEFT JOIN
    mydata m
    ON  m.date >= CONCAT_WS('.', year, month, 1)
    AND m.date <  CONCAT_WS('.', year, month, 1) + INTERVAL 1 MONTH
GROUP BY
    year, month
You can create these as tables on disk rather than generate them each time.
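For instance, a minimal sketch of persisting those helper tables (the table names months and years are assumptions):
-- Hypothetical one-time setup; reuse these tables in the LEFT JOIN above
CREATE TABLE months (month TINYINT NOT NULL PRIMARY KEY);
INSERT INTO months (month) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);

CREATE TABLE years (year SMALLINT NOT NULL PRIMARY KEY);
INSERT INTO years (year) VALUES (2008),(2009);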
MySQL is the only system of the major four that does not have an easy way to generate arbitrary result sets; Oracle, SQL Server, and PostgreSQL do (CONNECT BY, recursive CTEs, and generate_series, respectively).
Quassnoi is right, and I'll add a comment about how to recognize when you need something like this:
You want '2008-01' in your result, yet nothing in the source table has a date in January, 2008. Result sets have to come from the tables you query, so the obvious conclusion is that you need an additional table - one that contains each month you want as part of your result.
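A sketch of that idea against the question's monthly table, assuming the months and years helper tables from above (the exact yearmonth formatting is an illustration):
-- Every (year, month) combination appears once; months with no rows fall back to 0 via COALESCE
SELECT CONCAT(y.year, '-', LPAD(m.month, 2, '0')) AS yearmonth,
       COALESCE(SUM(t.amount), 0) AS total
FROM years y
CROSS JOIN months m
LEFT JOIN monthly t
       ON t.date >= CONCAT(y.year, '-', LPAD(m.month, 2, '0'), '-01')
      AND t.date <  CONCAT(y.year, '-', LPAD(m.month, 2, '0'), '-01') + INTERVAL 1 MONTH
GROUP BY y.year, m.month
ORDER BY y.year, m.month;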