Months between dates in the same column - SQL

May I ask for some help?
I need to calculate the months between the order dates for the same product ID.
I have the following data set
ORDER_NUM PRODUCT_ID ORDER_DATE
111111 222222 2015-05-20 18:30:38
111112 222223 2015-12-03 19:25:23
111113 222224 2015-12-30 18:16:25
111114 222225 2015-10-30 12:32:06
111115 222226 2015-12-26 16:14:33
111116 222227 2016-03-08 10:23:39
111117 222224 2015-10-01 09:04:56
111118 222223 2015-04-21 11:48:03
111119 222228 2015-11-14 10:00:38
111120 222229 2016-03-22 10:42:32
111121 222230 2015-11-10 12:14:41
111122 222231 2015-11-24 10:05:40
111123 222222 2015-12-05 12:18:28
111124 222232 2015-12-07 11:23:53
111125 222233 2015-07-17 10:47:54
111126 222234 2016-02-08 11:59:30
111127 222235 2015-11-08 15:40:08
111128 222223 2015-09-24 11:16:03
111129 222236 2015-11-09 12:30:04
where ORDER_NUM is unique, while PRODUCT_ID and ORDER_DATE may each appear many times.
I need the result to be like:
ORDER_NUM PRODUCT_ID MONTHS_BETWEEN
111111 222222 0
111112 222223 2
111113 222224 3
111114 222225 0
111115 222226 0
111116 222227 0
111117 222224 0
111118 222223 0
111119 222228 0
111120 222229 0
111121 222230 0
111122 222231 0
111123 222222 7
111124 222232 0
111125 222233 0
111126 222234 0
111127 222235 0
111128 222223 5
111129 222236 0
The first appearance of a PRODUCT_ID should have “0” in MONTHS_BETWEEN, and each subsequent one should have the number of months between the current order and the previous one.
I am not sure that I managed to explain very well …
Please help…

You can use months_between() and lead():
select t.*,
       months_between(lead(order_date) over (partition by product_id order by order_date),
                      order_date
                     ) as MonthsBetween
from t;
Notes:
This returns a number with decimal places. You might want to use trunc() or round() to get an integer.
This returns NULL when there is no "next" order. You can use COALESCE() to convert that to 0 (or something else) if you like.
To be honest, I can't tell if you want lead() or lag() (time to the next order or from the previous one). Your data is not ordered by date, making it hard to figure out the right ordering. But, you want one or the other.
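To check the expected output outside the database, here is a minimal Python sketch of the lag() variant (months since the previous order of the same product). The months_between helper is only an approximation of Oracle's MONTHS_BETWEEN (whole months plus the day difference over 31, ignoring time of day), and all names in it are made up for illustration.

```python
from datetime import date

def months_between(later, earlier):
    """Rough stand-in for Oracle's MONTHS_BETWEEN (assumption: time of day ignored)."""
    whole = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    return whole + (later.day - earlier.day) / 31

def months_since_previous(orders):
    """Map order_num -> rounded months since the previous order of the same product."""
    last_seen = {}   # product_id -> date of the most recent earlier order
    result = {}
    for order_num, product_id, order_date in sorted(orders, key=lambda o: o[2]):
        previous = last_seen.get(product_id)
        # First appearance of a product gets 0, like COALESCE(..., 0) in SQL.
        result[order_num] = 0 if previous is None else round(months_between(order_date, previous))
        last_seen[product_id] = order_date
    return result

# Rows for three of the products from the question (dates only).
ORDERS = [
    (111111, 222222, date(2015, 5, 20)),
    (111112, 222223, date(2015, 12, 3)),
    (111113, 222224, date(2015, 12, 30)),
    (111117, 222224, date(2015, 10, 1)),
    (111118, 222223, date(2015, 4, 21)),
    (111123, 222222, date(2015, 12, 5)),
    (111128, 222223, date(2015, 9, 24)),
]

MONTHS = months_since_previous(ORDERS)
```

With round(), this sample reproduces the MONTHS_BETWEEN values listed in the question for these products; trunc() would give different numbers for some rows (for example 6 instead of 7 for order 111123).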

Related

Rolling On-Hand Remainder column?

CONumber, LineNumber, PartNumber, OrderQty, ScheduleDate, OnHandQty columns are a pure SELECT query with no transformations. I am trying to recreate the RollingOnHand column in SQL.
The rules are
If a part only has one row, report the real [OnHandQty]
If a part has multiple rows, the oldest order consumes its [OrderQty] from [OnHandQty]
The next oldest order pulls its [OrderQty] from the remaining [OnHandQty], repeat until final row of the matching part
The last row of a given part will display the remaining [OnHandQty]
Is this possible to accomplish in an SQL query?
CONumber  LineNumber  PartNumber  OrderQty  ScheduleDate  OnHandQty  RollingOnHand
C02959    00002       Part 01     102       2022-04-01    0          0
C04017    00001       Part 02     2007      2022-04-01    5099       5099
C04107    00001       Part 03     1         2022-03-09    0          0
C04106    00001       Part 04     1         2022-03-09    0          0
C04108    00001       Part 05     1         2022-03-09    0          0
C03514    00002       Part 06     250       2022-03-11    310        250
C03514    00003       Part 06     250       2022-03-18    310        60
C03757    00001       Part 06     250       2022-04-06    310        0
C04225    00002       Part 07     40        2022-03-31    53         53
C03965    00002       Part 08     24        2022-04-04    0          0
C04034    00001       Part 09     88        2022-03-18    128        128
C04144    00002       Part 10     22        2022-04-04    0          0
C04141    00001       Part 10     100       2022-04-04    0          0
C03734    00003       Part 11     116       2022-03-29    103        103
C03379    00001       Part 12     128       2022-03-07    19         19
C03344    00003       Part 13     40        2022-03-11    5          5
C04058    00001       Part 14     407       2022-03-25    0          0
C03697    00002       Part 15     436       2022-04-04    235        235
C03689    00002       Part 16     111       2022-03-16    87         87
C03690    00001       Part 16     250       2022-03-23    87         0
C03690    00002       Part 16     250       2022-04-06    87         0
C03240    00004       Part 17     3         2022-03-16    30         3
C03725    00001       Part 17     250       2022-03-16    30         27
C03725    00002       Part 17     250       2022-03-23    30         0
C03726    00001       Part 17     250       2022-04-01    30         0
C03726    00002       Part 17     250       2022-04-06    30         0
C03596    00017       Part 18     56        2022-04-06    344        344
C03927    00001       Part 19     600       2022-04-04    1800       600
C03927    00002       Part 19     1000      2022-04-06    1800       1200
I think this basically does what you need (Fiddle)
WITH T AS
(
SELECT *,
AlreadyConsumed = SUM(OrderQty) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
PrevLineNumber = LAG([LineNumber]) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC),
NextLineNumber = LEAD([LineNumber]) OVER (PARTITION BY [PartNumber] ORDER BY ScheduleDate ASC)
FROM Demo
)
SELECT CONumber,
LineNumber,
PartNumber,
OrderQty,
ScheduleDate,
OnHandQty,
RollingOnHand = CASE
--If a part only has one row, report the real [OnHandQty]
WHEN PrevLineNumber IS NULL
AND NextLineNumber IS NULL THEN OnHandQty
--Not the last row and won't use all the remainder up
WHEN NextLineNumber IS NOT NULL AND Remainder > OrderQty THEN OrderQty
--otherwise use what's left
ELSE Remainder
END
FROM T
CROSS APPLY (SELECT CASE WHEN AlreadyConsumed > OnHandQty THEN 0 ELSE OnHandQty - ISNULL(AlreadyConsumed,0) END) C(Remainder)
The SUM ... PARTITION BY [PartNumber] ... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING computes the cumulative OrderQty for all rows before the current row (not including it).
The LAG/ LEAD results are used as indicators to determine whether we are in the first/last rows of a partition and special logic is needed.
I didn't quite follow the rationale behind the business logic, so I may have made some invalid simplifications, but it returns the desired results with the sample data, and the query should be easy to tweak if needed.
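The same per-part logic can be traced in plain Python, which may help when tweaking the rules. This is only a sketch that mirrors the CASE expression above; the tuple layout and function name are assumptions, and rows that share a ScheduleDate keep their input order (the SQL leaves that tie order undefined).

```python
from itertools import groupby

def rolling_on_hand(rows):
    """rows: (co_number, line_number, part, order_qty, schedule_date, on_hand).
    Returns (co_number, line_number, rolling_on_hand) per row, ordered by part and date."""
    ordered = sorted(rows, key=lambda r: (r[2], r[4]))  # stable sort keeps ties in input order
    out = []
    for _, part_rows in groupby(ordered, key=lambda r: r[2]):
        part_rows = list(part_rows)
        consumed = 0  # OrderQty of all earlier rows for this part (AlreadyConsumed)
        for i, (co, line, _part, qty, _date, on_hand) in enumerate(part_rows):
            remainder = max(on_hand - consumed, 0)
            is_last = i == len(part_rows) - 1
            if len(part_rows) == 1:
                value = on_hand        # single row: report the real on-hand quantity
            elif not is_last and remainder > qty:
                value = qty            # enough stock left to cover this order
            else:
                value = remainder      # otherwise report whatever is left
            out.append((co, line, value))
            consumed += qty
    return out

# Parts 06, 17 and 19 from the question.
SAMPLE = [
    ("C03514", "00002", "Part 06", 250, "2022-03-11", 310),
    ("C03514", "00003", "Part 06", 250, "2022-03-18", 310),
    ("C03757", "00001", "Part 06", 250, "2022-04-06", 310),
    ("C03240", "00004", "Part 17", 3, "2022-03-16", 30),
    ("C03725", "00001", "Part 17", 250, "2022-03-16", 30),
    ("C03725", "00002", "Part 17", 250, "2022-03-23", 30),
    ("C03927", "00001", "Part 19", 600, "2022-04-04", 1800),
    ("C03927", "00002", "Part 19", 1000, "2022-04-06", 1800),
]

ROLLING = [value for _, _, value in rolling_on_hand(SAMPLE)]
```

For these three parts the values match the RollingOnHand column in the question.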

combine two rows with 2 months into one row of one month, containing null values into one

I would like to have a dataframe where 1 row only contains one month of data.
month cust_id closed_deals cum_closed_deals checkout cum_checkout
2019-10-01 1 15 15 null null
2019-10-01 1 null 15 210 210
2019-11-01 1 27 42 null 210
2019-11-01 1 null 42 369 579
Expected result:
month cust_id closed_deals cum_closed_deals checkout cum_checkout
2019-10-01 1 15 15 210 210
2019-11-01 1 27 42 369 579
At first, I thought a normal group by would work, but when I try to group by only "month" and "cust_id", I get an error saying that closed_deals and checkout also need to be in the group by.
You may simply aggregate by the (first of the) month and cust_id and take the max of all other columns:
SELECT
month,
cust_id,
MAX(closed_deals) AS closed_deals,
MAX(cum_closed_deals) AS cum_closed_deals,
MAX(checkout) AS checkout,
MAX(cum_checkout) AS cum_checkout
FROM yourTable
GROUP BY
month,
cust_id;
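The reason MAX works here is that aggregates skip nulls, so each group keeps the one non-null value per column. A small Python sketch of the same idea, with None standing in for null (the function name and dict layout are illustrative only):

```python
def merge_rows(rows, keys=("month", "cust_id")):
    """Collapse rows sharing the same key columns, keeping the max non-None value per column."""
    merged = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        target = merged.setdefault(key, {})
        for column, value in row.items():
            current = target.get(column)
            if value is None:
                target.setdefault(column, None)   # keep the column, but never overwrite a value
            elif current is None or value > current:
                target[column] = value
    return list(merged.values())

ROWS = [
    {"month": "2019-10-01", "cust_id": 1, "closed_deals": 15, "cum_closed_deals": 15, "checkout": None, "cum_checkout": None},
    {"month": "2019-10-01", "cust_id": 1, "closed_deals": None, "cum_closed_deals": 15, "checkout": 210, "cum_checkout": 210},
    {"month": "2019-11-01", "cust_id": 1, "closed_deals": 27, "cum_closed_deals": 42, "checkout": None, "cum_checkout": 210},
    {"month": "2019-11-01", "cust_id": 1, "closed_deals": None, "cum_closed_deals": 42, "checkout": 369, "cum_checkout": 579},
]

MERGED = merge_rows(ROWS)
```

Note that taking the maximum of cum_checkout is only right because cumulative values never decrease within a month; for columns where that does not hold, a different aggregate would be needed.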

Getting a count by date based on the number of observations with encompassing date ranges

I am working with a table in Microsoft Access whereby I have 2 columns with a start and end date.
I want to get the count by date of the number of rows with date ranges that encompass the date in the output table.
Input Data
Start Date End Date
01/02/2017 03/02/2017
07/02/2017 19/02/2017
09/02/2017 19/02/2017
11/02/2017 12/02/2017
12/02/2017 17/02/2017
Desired Output
Date Count
01/02/2017 1
02/02/2017 1
03/02/2017 1
04/02/2017 0
05/02/2017 0
06/02/2017 0
07/02/2017 1
08/02/2017 1
09/02/2017 2
10/02/2017 2
11/02/2017 3
12/02/2017 4
13/02/2017 3
14/02/2017 3
15/02/2017 3
16/02/2017 3
17/02/2017 3
18/02/2017 2
19/02/2017 2
20/02/2017 0
For this project, I have to use Microsoft Access 2010, so a solution in either SQL code or design view input would be great.
Any help on this would be appreciated. Thanks!
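The query below counts the rows per END_DATE; change the column to suit your requirements. Note that it groups on a single column, so it does not count ranges that merely span a date, and it cannot return dates with a count of 0. For the desired output you would also need a table of calendar dates to compare against the start and end dates.
SELECT END_DATE AS [DATE], COUNT(*) AS [COUNT] FROM TABLE_NAME
GROUP BY END_DATE ORDER BY END_DATE;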
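The desired output needs one row per calendar date, including dates with a count of 0. As a sketch of that logic outside Access (not an Access solution, and with made-up names), each date in the reporting window is compared against every start/end pair:

```python
from datetime import date, timedelta

def count_covering_ranges(ranges, first_day, last_day):
    """For each day in [first_day, last_day], count the ranges with start <= day <= end."""
    counts = []
    day = first_day
    while day <= last_day:
        covering = sum(1 for start, end in ranges if start <= day <= end)
        counts.append((day, covering))
        day += timedelta(days=1)
    return counts

# Input data from the question (dd/mm/yyyy in the original).
RANGES = [
    (date(2017, 2, 1), date(2017, 2, 3)),
    (date(2017, 2, 7), date(2017, 2, 19)),
    (date(2017, 2, 9), date(2017, 2, 19)),
    (date(2017, 2, 11), date(2017, 2, 12)),
    (date(2017, 2, 12), date(2017, 2, 17)),
]

DAILY_COUNTS = count_covering_ranges(RANGES, date(2017, 2, 1), date(2017, 2, 20))
```

In a database the outer loop corresponds to a calendar table of dates, and the inner sum to a count of rows whose range contains that date.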

SQL - Creating a timeline for each ID (Vertica)

I am dealing with the following problem in SQL (using Vertica):
In short -- Create a timeline for each ID (in a table where I have multiple lines, orders in my example, per ID)
What I would like to achieve -- At my disposal I have a table on historical order date and I would like to compute new customer (first order ever in the past month), active customer- (>1 order in last 1-3 months), passive customer- (no order for last 3-6 months) and inactive customer (no order for >6 months) rates.
Which steps I have taken so far -- I was able to construct a table similar to the example presented below:
CustomerID Current order date Time between current/previous order First order date (all-time)
001 2015-04-30 12:06:58 (null) 2015-04-30 12:06:58
001 2015-09-24 17:30:59 147 05:24:01 2015-04-30 12:06:58
001 2016-02-11 13:21:10 139 19:50:11 2015-04-30 12:06:58
002 2015-10-21 10:38:29 (null) 2015-10-21 10:38:29
003 2015-05-22 12:13:01 (null) 2015-05-22 12:13:01
003 2015-07-09 01:04:51 47 12:51:50 2015-05-22 12:13:01
003 2015-10-23 00:23:48 105 23:18:57 2015-05-22 12:13:01
A little bit of intuition: customer 001 placed three orders from which the second one was 147 days after its first order. Customer 002 has only placed one order in total.
What I think that the next steps should be -- I would like to know for each date (also dates on which a certain user did not place an order), for each CustomerID, how long it has been since his/her last order. This would imply that I would create some sort of timeline for each CustomerID. In the example presented above I would get 287 (days between 1st of May 2015 and 11th of February 2016, the timespan of this table) lines for each CustomerID. I have difficulties solving this previous step. When I have performed this step I want to create a field which shows at each date the last order date, the period between the last order date and the current date, and what state someone is in at the current date. For the example presented earlier, this would look something like this:
CustomerID Last order date Current date Time between current date /last order State
001 2015-04-30 12:06:58 2015-05-01 00:00:00 0 00:00:00 New
...
001 2015-04-30 12:06:58 2015-06-30 00:00:00 60 11:53:02 Active
...
001 2015-09-24 17:30:59 2016-02-01 00:00:00 129 11:53:02 Passive
...
...
002 2015-10-21 17:30:59 2015-10-22 00:00:00 0 06:29:01 New
...
002 2015-10-21 17:30:59 2015-11-30 00:00:00 39 06:29:01 Active
...
...
003 2015-05-22 12:13:01 2015-06-23 00:00:00 31 11:46:59 Active
...
003 2015-07-09 01:04:51 2015-10-22 00:00:00 105 11:46:59 Inactive
...
At the dots there should be all the in-between dates, but for the sake of space I have left these out of the table.
When I know for each date what the state is of each customer (active/passive/inactive) my plan is to sum the states and group by date which should give me the sum of new, active, passive and inactive customers. From here on I can easily compute the rates at each date.
Anybody that knows how I can possibly achieve this task?
Note -- If anyone has other ideas how to achieve the goal presented above (using some other approach compared to the approach I had in mind) please let me know!
EDIT
Suppose you start from a table like this:
SQL> select * from ord order by custid, ord_date ;
custid | ord_date
--------+---------------------
1 | 2015-04-30 12:06:58
1 | 2015-09-24 17:30:59
1 | 2016-02-11 13:21:10
2 | 2015-10-21 10:38:29
3 | 2015-05-22 12:13:01
3 | 2015-07-09 01:04:51
3 | 2015-10-23 00:23:48
(7 rows)
You can use Vertica's time series analytic functions TS_FIRST_VALUE() and TS_LAST_VALUE() to fill gaps and carry the last order date forward to the current date.
Then you just have to combine this with a Vertica TIMESERIES clause over the same table, with an interval of one day, running from the day each customer placed his/her first order up to now (current_date):
select
custid,
status_dt,
last_order_dt,
case
when status_dt::date - last_order_dt::date < 30 then case
when nord = 1 then 'New' else 'Active' end
when status_dt::date - last_order_dt::date < 90 then 'Active'
when status_dt::date - last_order_dt::date < 180 then 'Passive'
else 'Inactive'
end as status
from (
select
custid,
last_order_dt,
status_dt,
conditional_true_event (first_order_dt is null or
last_order_dt > lag(last_order_dt))
over(partition by custid order by status_dt) as nord
from (
select
custid,
ts_first_value(ord_date) as first_order_dt ,
ts_last_value(ord_date) as last_order_dt ,
dt::date as status_dt
from
( select custid, ord_date from ord
union all
select distinct(custid) as custid, current_date + 1 as ord_date from ord
) z timeseries dt as '1 day' over (partition by custid order by ord_date)
) x
) y
where status_dt <= current_date
order by 1, 2
;
And you will get something like this:
custid | status_dt | last_order_dt | status
--------+------------+---------------------+---------
1 | 2015-04-30 | 2015-04-30 12:06:58 | New
1 | 2015-05-01 | 2015-04-30 12:06:58 | New
1 | 2015-05-02 | 2015-04-30 12:06:58 | New
...
1 | 2015-05-29 | 2015-04-30 12:06:58 | New
1 | 2015-05-30 | 2015-04-30 12:06:58 | Active
1 | 2015-05-31 | 2015-04-30 12:06:58 | Active
...
etc.
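To make the status rules easier to follow, here is a minimal Python sketch of the same day-by-day classification for one customer. It uses the 30/90/180-day thresholds from the CASE expression above; the function names are illustrative and it works on dates only, not timestamps.

```python
from bisect import bisect_right
from datetime import date, timedelta

def classify(days_since_last, orders_so_far):
    """Same thresholds as the CASE expression in the query."""
    if days_since_last < 30:
        return "New" if orders_so_far == 1 else "Active"
    if days_since_last < 90:
        return "Active"
    if days_since_last < 180:
        return "Passive"
    return "Inactive"

def daily_status(order_dates, end):
    """One (status_dt, last_order_dt, status) row per day from the first order to `end`."""
    order_dates = sorted(order_dates)
    timeline = []
    day = order_dates[0]
    while day <= end:
        n = bisect_right(order_dates, day)   # number of orders placed up to this day
        last_order = order_dates[n - 1]      # gap filling: carry the last order date forward
        timeline.append((day, last_order, classify((day - last_order).days, n)))
        day += timedelta(days=1)
    return timeline

# Customer 1 from the sample table.
CUSTOMER_1 = [date(2015, 4, 30), date(2015, 9, 24), date(2016, 2, 11)]
TIMELINE = daily_status(CUSTOMER_1, date(2016, 2, 29))
```

Counting customers per status per day is then a matter of building this timeline for every customer and grouping by the date and status.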

MDX to sum over months with non-empty measure value for the month

This has been stumping me and I'm not sure why it is so difficult.
I have a measure that has empty values up until a certain point in time, and then starts having values. I want to get a monthly average, but only on those months that actually have a non-empty value. I also want my time range to be fixed for the query regardless of which months have values (for example, across the whole year)
Here is one variation of MDX that I tried:
WITH
MEMBER Measures.MonthsWithSales AS
(IIF( IsEmpty(([Time].[Month].CurrentMember,[Measures].[ProductsSold])), 0, [Measures].[MonthCount]))
MEMBER Measures.AvgProductsSold AS
[Measures].[ProductsSold] /Measures.MonthsWithSales
SELECT
{
[Measures].[ProductsSold], [Measures].[MonthCount],
[Measures].[MonthsWithSales], [Measures].[AvgProductsSold]
} ON 0,
[Time].[Month].Members ON 1
FROM MyCube
WHERE [Time].[Year].&[2010-01-01T00:00:00]
which returns something like this:
ProductsSold MonthCount MonthsWithSales AvgProductsSold
All 1644 12 **12** **137**
2010-01-01 00:00:00.000 (null) 1 0 (null)
2010-02-01 00:00:00.000 (null) 1 0 (null)
2010-03-01 00:00:00.000 (null) 1 0 (null)
2010-04-01 00:00:00.000 (null) 1 0 (null)
2010-05-01 00:00:00.000 (null) 1 0 (null)
2010-06-01 00:00:00.000 234 1 1 234
2010-07-01 00:00:00.000 237 1 1 237
2010-08-01 00:00:00.000 236 1 1 236
2010-09-01 00:00:00.000 232 1 1 232
2010-10-01 00:00:00.000 232 1 1 232
2010-11-01 00:00:00.000 233 1 1 233
2010-12-01 00:00:00.000 240 1 1 240
The problem is on the ALL row.
I expect that the MonthsWithSales across the whole year returns 7 not 12
and that AvgProductsSold (per month with sales) is 234.86 not 137.
I realize that it's not doing what I want because it's using the MonthCount at the ALL level. But I do not know how to "sink into" the "per month dimension" to sum up the MonthCount only on the relevant months when it is calculating the "ALL".
I assumed you have 2 levels on the month hierarchy: one with the All member and one for the months.
MEMBER Measures.AvgProductsSold AS
IIf([Time].[Month].CurrentMember.Level.Ordinal = 0
, Avg([Time].[Month].CurrentMember.Children, [Measures].[ProductsSold])
, [Measures].[ProductsSold])
(You may have to replace [Time].[Month].CurrentMember.Children with [Time].[Month].Members)
The Avg function computes the average over the non-empty values only.
Here is the query I am likely to end up using, properly using my time hierarchy:
WITH
MEMBER [Measures].[MonthsWithSales]
AS
COUNT
(
FILTER
(
DESCENDANTS([Time].[YQMD].CurrentMember,[Time].[YQMD].[Month]),
NOT ISEMPTY([Measures].[ProductsSold])
)
)
MEMBER
[Measures].[AvgProductsSold]
AS
[Measures].[ProductsSold]/[Measures].[MonthsWithSales]
SELECT
{
[Measures].[ProductsSold],
[Measures].[MonthsWithSales],
[Measures].[AvgProductsSold]
} ON 0,
[Time].[Month].Members ON 1
FROM MyCube
The [YQMD] is a time hierarchy with Levels: 1 Year, 2 Quarter, 3 Month, 4 Date
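The arithmetic behind the expected ALL row can be checked with a few lines of Python. This is only an illustration of "count and average over non-empty months", using the monthly ProductsSold values from the question with None for the empty months; the names are made up.

```python
def count_non_empty(values):
    """Number of months that actually have a value (the MonthsWithSales idea)."""
    return sum(1 for v in values if v is not None)

def average_over_non_empty(values):
    """Total divided by the number of non-empty months; None if every month is empty."""
    months = count_non_empty(values)
    if months == 0:
        return None
    return sum(v for v in values if v is not None) / months

# January to December 2010; the first five months are empty.
MONTHLY_SOLD = [None, None, None, None, None, 234, 237, 236, 232, 232, 233, 240]

MONTHS_WITH_SALES = count_non_empty(MONTHLY_SOLD)
AVG_PRODUCTS_SOLD = average_over_non_empty(MONTHLY_SOLD)
```

This gives 7 months with sales and an average of about 234.86, the figures the question expects on the ALL row.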