I have below Set of data where Count is calculated for each month.
I have to create a report with this data but the count we get for each month around 5th or 6th business day. Till then it always shows NULL or 0.
so, I would like to populate previous value till we get the count for the months.
so, if below is my data set, then I would like to see the count 14793417 for the months >='2018-08-01'. As soon as we get the data for these months then it should pull in the right count and not previous count.
Count yyyymm Month-Date
15339017 201801 2018-01-01
13832952 201802 2018-02-01
14432701 201803 2018-03-01
14259210 201804 2018-04-01
14863942 201805 2018-05-01
14377470 201806 2018-06-01
14793417 201807 2018-07-01
NULL NULL 2018-08-01
NULL NULL 2018-09-01
NULL NULL 2018-10-01
NULL NULL 2018-11-01
NULL NULL 2018-12-01
I tried with lag function but i am not getting it right.
select distinct
[count],
CASE WHEN [count] IS NULL THEN
lag([count]) over(partition by [Month-Date]) order by [Month-Date])
else [count] end,
yyyymm
from #table
group by [Month-Date],yyyymm
Can somebody help me with this?
Not sure what is your #table, but below is the example:
WITH ABC
AS
(
SELECT 1 as month, 500 as total
UNION ALL
SELECT 2, 1000
UNION ALL
SELECT 3, NULL
UNION ALL
SELECT 4 , NULL
)
SELECT month,ISNULL(total,previousMonth)
FROM
(
SELECT *, LAG(TOTAL)OVER(ORDER BY month) as previousMonth
FROM ABC
) as A
Related
I have reviewed many posts about how to find gaps in dates and believe that I am close to figuring it out but need just a little extra help. Per my query I am pulling distinct days with a record count for each distinct day. I have added a "Gap_Days" column which should return a zero if no gap from previous date OR the number of days since the previous date. As you can see all of my Gap_Days are zero when in fact I am missing 10/24 and 10/25. Therefore on 10/26 there should be a gap of 2 since the previous date is 10/23.
Thanks in advance for pointing out what I am probably looking right at.
SELECT DISTINCT Run_Date, COUNT(Run_Date) AS Daily_Count,
Gap_Days = Coalesce(DateDiff(Day,Lag(Run_Date) Over (partition by Run_Date order by Run_Date DESC), Run_Date)-1,0)
FROM tblUnitsOfWork
WHERE (Run_Date >= '2022-10-01')
GROUP BY Run_Date
ORDER BY Run_Date DESC;
Run_Date Daily_Count Gap_Days
2022-10-29 00:00:00.000 8431 0
2022-10-28 00:00:00.000 8204 0
2022-10-27 00:00:00.000 8705 0
2022-10-26 00:00:00.000 7885 0
2022-10-23 00:00:00.000 7485 0
2022-10-22 00:00:00.000 8699 0
2022-10-21 00:00:00.000 9212 0
2022-10-20 00:00:00.000 9220 0
First let's set up some demo data:
DECLARE #table TABLE (ID INT IDENTITY, date DATE)
DECLARE #dt DATE
WHILE (SELECT COUNT(*) FROM #table) < 30
BEGIN
SET #dt = DATEADD(DAY,(ROUND(((50 - 1 -1) * RAND() + 1), 0) - 1)-25,CURRENT_TIMESTAMP)
IF NOT EXISTS (SELECT 1 FROM #table WHERE date = #dt) INSERT INTO #table (date) SELECT #dt
END
ID date
--------
1 2022-11-10
2 2022-11-15
3 2022-10-20
...
28 2022-10-14
29 2022-11-13
30 2022-11-21
This gives us a table variable with 30 random dates in a 50 day window. Now let's look for missing dates:
SELECT *, CASE WHEN ROW_NUMBER() OVER (ORDER BY date) > 1 AND LAG(date,1) OVER (ORDER BY date) <> DATEADD(DAY,-1,date) THEN 'GAP! ' + CAST(DATEDIFF(DAY,LAG(date,1) OVER (ORDER BY date),date)-1 AS NVARCHAR) + ' DAYS MISSING!' END
FROM #table
ORDER BY date
All we're doing here is ignoring the first date (since it's expected there wouldn't be one before then) and from then on comparing the last date (using lag ordered by date) to the current date. If it is not a day before the case statement will produce a message with how many days were missing.
ID date MissingDatesFlag
----------------------------
1 2022-10-08 NULL
4 2022-10-09 NULL
25 2022-10-10 NULL
28 2022-10-11 NULL
22 2022-10-15 GAP! 4 DAYS MISSING!
2 2022-10-18 GAP! 3 DAYS MISSING!
12 2022-10-19 NULL
24 2022-10-20 NULL
....
15 2022-11-18 GAP! 3 DAYS MISSING!
29 2022-11-21 GAP! 3 DAYS MISSING!
20 2022-11-22 NULL
Since the demo data is randomly selected your results may vary, but they should be similar.
I have a table in BQ that looks like this:
Row Field DateTime
1 one 10:00 AM
2 null 10:05 AM
3 null 10:10 AM
4 one 10:30 AM
5 null 11:00 AM
6 two 11:15 AM
7 two 11:30 AM
8 null 11:35 AM
9 null 11:40 AM
10 null 11:50 AM
11 null 12:00 AM
12 null 12:15 AM
13 two 12:30 AM
14 null 12:15 AM
15 null 12:25 AM
16 null 12:35 AM
17 three 12:55 AM
I want to create another column called prevField and fill it out with the last Field value that is not null, when the first and last entry around the null are the same. When the first and last entry around null are different, it should remain null. The result would look like the following:
Row Field DateTime prevField
1 one 10:00 AM null
2 null 10:05 AM one
3 null 10:10 AM one
4 one 10:30 AM one
5 null 11:00 AM null
6 two 11:15 AM two
7 two 11:30 AM two
8 null 11:35 AM two
9 null 11:40 AM two
10 null 11:50 AM two
11 null 12:00 AM two
12 null 12:15 AM two
13 two 12:30 AM two
14 null 12:15 AM null
15 null 12:15 AM null
16 null 12:15 AM null
17 three 12:15 AM three
So far i tried the following code variations for first part of the question (fill out prevField with the last Field value that is not null, when the first and last entry around the null are the same) but without success.
select Field, Datetime,
(1)--case when FieldName is null then LAG(FieldName) over (order by DateTime) else FieldName end as prevFieldName
(2)--LAST_VALUE(FieldName IGNORE NULLS) OVER (ORDER BY DateTime
(3)--ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS prevFieldName
(4)-- first_value(FieldName)over(order by DateTime) as prevFieldName
from table
EDIT: I added rows to the data and change row numbers
You can use following logic to achieve your goal.
Sample Data creation:
WITH
Base AS
(
SELECT *
FROM(
SELECT 123 Row, 'one' Field, '10:00 AM' DateTime
UNION ALL
SELECT 123, null, '10:05 AM'
UNION ALL
SELECT
123, null, '10:10 AM'
UNION ALL
SELECT
123 , 'one' , '10:30 AM'
UNION ALL
SELECT
456,null,'11:00 AM'
UNION ALL
SELECT
456,'two','11:15 AM'
UNION ALL
SELECT
789,'two','11:30 AM'))
Logic: The query grabs max and min for each field and also the lead and lag values for each row, based on that it determines the prevfield values.
SELECT a.Field,DateTime,
CASE WHEN a.DateTime = a.min_date THEN ''
WHEN a.lag_field IS NOT NULL and a.lead_field IS NULL THEN a.lag_field
WHEN a.lag_field IS NULL and a.lead_field IS NOT NULL THEN a.lead_field
WHEN a.lag_field != a.lead_field THEN a.lag_field
WHEN a.Field IS NOT NULL AND a.lag_field IS NULL AND a.lead_field IS NULL AND a.DateTime = a.Max_date THEN a.Field
ELSE ''
END as prevField
FROM(
SELECT Base.Field,DateTime,LAG(Base.Field) over (order by DateTime)lag_field,Lead(Base.Field) over (order by DateTime) lead_field,min_date,Max_date
From Base LEFT JOIN (SELECT Field,MIN(DateTime) min_date,MAX(DateTime) Max_date FROM Base Group by Field) b
ON Base.Field = b.Field
) a
This query partly solve my problem:
CREATE TEMP FUNCTION ToHex(x INT64) AS (
(SELECT STRING_AGG(FORMAT('%02x', x >> (byte * 8) & 0xff), '' ORDER BY byte DESC)
FROM UNNEST(GENERATE_ARRAY(0, 7)) AS byte)
);
SELECT
DateTime
Field
, SUBSTR(MAX( ToHex(row_n) || Field) OVER (ORDER BY row_n ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 17) AS previous
FROM (
SELECT *, ROW_NUMBER() over (ORDER BY DateTime) AS row_n
FROM `xx.yy.zz`
);
I am learning SQL and I was wondering how to select active users by month, depending on their starting and ending date (both timestamp(6)). My table looks like this:
Cust_Num | Start_Date | End_Date
1 | 2018-01-01 | 2019-01-01
2 | 2018-01-01 | NULL
3 | 2019-01-01 | 2019-06-01
4 | 2017-01-01 | 2019-03-01
So, counting the active users by month, I should have an output like:
As of. | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1
So far, I do a manual operation by entering each month:
Select
201906,
count(distinct a.cust_num)
From
active_users a
Where
to_date(‘20190630’,’yyyymmdd) between a.start_date and nvl (a.end_date, ‘31-dec-9999)
union all
Select
201905,
count(distinct a.cust_num)
From
active_users a
Where
to_date(‘20190531’,’yyyymmdd) between a.start_date and nvl (a.end_date, ‘31-dec-9999)
union all
...
Not very optimized and sustainable if I want to enter 10 years ao 120 months lol.
Any help is welcome. Thanks a lot!
This query shows the active-user-count effective as-of the end of the month.
How it works:
Convert each input row (with StartDate and EndDate value) into two rows that represent a point-in-time when the active-user-count incremented (on StartDate) and decremented (on EndDate). We need to convert NULL to a far-off date value because NULL values are sorted before instead of after non-NULL values:
This makes your data look like this:
OnThisDate Change
2018-01-01 1
2019-01-01 -1
2018-01-01 1
9999-12-31 -1
2019-01-01 1
2019-06-01 -1
2017-01-01 1
2019-03-01 -1
Then we simply SUM OVER the Change values (after sorting) to get the active-user-count as of that specific date:
So first, sort by OnThisDate:
OnThisDate Change
2017-01-01 1
2018-01-01 1
2018-01-01 1
2019-01-01 1
2019-01-01 -1
2019-03-01 -1
2019-06-01 -1
9999-12-31 -1
Then SUM OVER:
OnThisDate ActiveCount
2017-01-01 1
2018-01-01 2
2018-01-01 3
2019-01-01 4
2019-01-01 3
2019-03-01 2
2019-06-01 1
9999-12-31 0
Then we PARTITION (not group!) the rows by month and sort them by their date so we can identify the last ActiveCount row for that month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() for each month PARTITION):
OnThisDate ActiveCount IsLastInMonth
2017-01-01 1 1
2018-01-01 2 0
2018-01-01 3 1
2019-01-01 4 0
2019-01-01 3 1
2019-03-01 2 1
2019-06-01 1 1
9999-12-31 0 1
Then filter on that where IsLastInMonth = 1 (actually, where ROW_COUNT() = COUNT(*) inside each PARTITION) to give us the final output data:
At-end-of-month Active-count
2017-01 1
2018-01 3
2019-01 3
2019-03 2
2019-06 1
9999-12 0
This does result in "gaps" in the result-set because the At-end-of-month column only shows rows where the Active-count value actually changed rather than including all possible calendar months - but that's ideal (as far as I'm concerned) because it excludes redundant data. Filling in the gaps can be done inside your application code by simply repeating output rows for each additional month until it reaches the next At-end-of-month value.
Here's the query using T-SQL on SQL Server (I don't have access to Oracle right now). And here's the SQLFiddle I used to come to a solution: http://sqlfiddle.com/#!18/ad68b7/24
SELECT
OtdYear,
OtdMonth,
ActiveCount
FROM
(
-- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth == RowsInMonth )
SELECT
OnThisDate,
OtdYear,
OtdMonth,
ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
ActiveCount
FROM
(
SELECT
OnThisDate,
YEAR( OnThisDate ) AS OtdYear,
MONTH( OnThisDate ) AS OtdMonth,
SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
FROM
(
SELECT
StartDate AS [OnThisDate],
1 AS [Change]
FROM
tbl
UNION ALL
SELECT
ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate],
-1 AS [Change]
FROM
tbl
) AS sq1
) AS sq2
) AS sq3
WHERE
RowInMonth = RowsInMonth
ORDER BY
OtdYear,
OtdMonth
This query can be flattened into fewer nested queries by using aggregate and window functions directly instead of using aliases (like OtdYear, ActiveCount, etc) but that would make the query much harder to understand.
I have created the query which will give the result of all the months starting from the minimum start date in the table till maximum end date.
You can change it using adding one condition in WHERE clause.
-- table creation
CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE)
-- data creation
INSERT INTO ACTIVE_USERS
SELECT * FROM
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
)
-- data in the actual table
SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;
CUST_NUM START_DATE END_DATE
---------- ---------- ----------
1 2018-01-01 2019-01-01
2 2018-01-01
3 2019-01-01 2019-06-01
4 2017-01-01 2019-03-01
Query to fetch desired result
WITH CTE ( START_DATE, END_DATE ) AS
(
SELECT
ADD_MONTHS( START_DATE, LEVEL - 1 ),
ADD_MONTHS( START_DATE, LEVEL ) - 1
FROM
(
SELECT
MIN( START_DATE ) AS START_DATE,
MAX( END_DATE ) AS END_DATE
FROM
ACTIVE_USERS
)
CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
)
--
--
SELECT
C.START_DATE,
COUNT(1) AS CNT
FROM
CTE C
JOIN ACTIVE_USERS D ON
(
C.END_DATE BETWEEN
D.START_DATE
AND
CASE
WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
ELSE C.END_DATE
END
)
GROUP BY
C.START_DATE
ORDER BY
C.START_DATE;
-- output --
START_DATE CNT
---------- ----------
2017-01-01 1
2017-02-01 1
2017-03-01 1
2017-04-01 1
2017-05-01 1
2017-06-01 1
2017-07-01 1
2017-08-01 1
2017-09-01 1
2017-10-01 1
2017-11-01 1
START_DATE CNT
---------- ----------
2017-12-01 1
2018-01-01 3
2018-02-01 3
2018-03-01 3
2018-04-01 3
2018-05-01 3
2018-06-01 3
2018-07-01 3
2018-08-01 3
2018-09-01 3
2018-10-01 3
START_DATE CNT
---------- ----------
2018-11-01 3
2018-12-01 3
2019-01-01 3
2019-02-01 3
2019-03-01 2
2019-04-01 2
2019-05-01 2
2019-06-01 1
30 rows selected.
Cheers!!
I Have a OPL_Dates Table with Start Date and End Dates as Below:
dbo.OPL_Dates
ID Start_date End_date
--------------------------------------
12345 1975-01-01 2001-12-31
12345 1989-01-01 2004-12-31
12345 2005-01-01 NULL
12345 2007-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2013-02-07 NULL
12377 2010-01-01 2012-01-01
12489 2011-12-31 NULL
12489 2012-03-01 2012-04-01
The Output I am looking for is:
ID Start_date End_date
-------------------------------------
12345 1975-01-01 2004-12-31
12345 2005-01-01 NULL
12377 2009-06-01 2009-12-31
12377 2010-01-01 2012-01-01
12377 2013-02-07 NULL
12489 2011-12-31 NULL
Basically, I want to show the gap between the OPL periods(IF Any) else I need min of Start Date and Max of End Dates, for a particular ID.NULL means Open-Ended Date which can be converted to "9999-12-31".
The following pretty much does what you want:
with p as (
select v.*, sum(inc) over (partition by v.id order by v.dte) as running_inc
from t cross apply
(values (id, start_date, 1),
(id, coalesce(end_date, '2999-12-31'), -1)
) v(id, dte, inc)
)
select id, min(dte), max(dte)
from (select p.*, sum(case when running_inc = 0 then 1 else 0 end) over (partition by id order by dte desc) as grp
from p
) p
group by id, grp;
Note that it changes the "inifinite" end date from NULL to 2999-12-31. This is a convenience, because NULL orders first in SQL Server ascending sorts.
Here is a SQL Fiddle.
What is this doing? It is unpivoting the dates into a single column, with a 1/-1 flag (inc) indicating whether the record is a start or end. The running sum of this flag then indicates the groups that should be combined. When the running sum is 0, then a group has ended. To include the end date in the right group, a reverse running sum is needed -- but that's a detail.
please help me with getting streaks data. I have table of goal achievements
Table test
dt
2017-01-01
2017-01-02
2017-01-03. //3 days end of streak
2017-02-10 // 1 day
2017-02-15
2017-02-16
2017-02-17
2017-02-18 //4 days
I tried this in MySQL
Select dt, (select count(*) from test as t1 where t1.dt < t2.dt and datediff(t2.dt,t1.dt) =1) as str
from test as t2
And got
Dt str
2017-01-01 0
2017-01-02 1
2017-01-03 2
2017-02-10 0
2017-02-15 0
2017-02-16 1
2017-02-17 2
2017-02-18 3
Is it possible to get something like this
Dt. Str
2017-01-03 3
2017-02-10 1
2017-02-18 4
And get Max of it?
You can subtract row number (i.e the number of rows <= current row's date) from the current row's date to classify consecutive rows with one day difference into the same group. Then it is just a grouping operation to calculate the count.
select max(dt) as dt, count(*) as streak
from (select t1.dt
,date(t1.dt,-(select count(*) from t t2 where t2.dt<=t1.dt)||' day') as grp
from t t1
) t
group by grp
Run the inner query to see how groups are assigned.