Teradata SQL code to find count between eff start and end date

I have a dataset with 5 columns. Each account can have multiple rows. I need to group the data by C2 and month/year and find the counts:
ACC_ID, C1 , C2, EFF_START_DATE, EFF_END_DATE
111 , 0 , A , 2018-01-01, 2499-12-31
222 , 0 , A , 2018-02-15 , 2018-03-15
222 , 0 , B , 2018-03-16, 2499-12-31
333 , 0, A, 2000-01-01, 2499-12-31
I need to group this by month and find the count for each month. So if someone has 2018-01-01 as EFF_START_DATE and 2499-12-31 as EFF_END_DATE, they should be counted in every month starting from January 2018.
Similarly, if someone has 2018-02-15 as EFF_START_DATE and 2018-03-15 as EFF_END_DATE, they should be counted only in February and March 2018.
Also, I only want counts starting from 2018, even if EFF_START_DATE is earlier. So account 333 above should be counted once in every month from January 2018 onward.
I tried extracting the month/year from EFF_START_DATE and counting on that, but that gives incorrect results.
Expected output for the above data:
MONTH, C2, COUNT
JAN-18, A, 2 -- FOR ACCOUNTS 111, 333
FEB-18, A, 3 -- FOR ACCOUNTS 111, 222, 333
MARCH-18, A, 3 -- FOR ACCOUNTS 111, 222, 333
MARCH-18, B, 1 -- FOR ACCOUNT 222

The most efficient way utilizes Teradata's EXPAND ON extension to Standard SQL:
WITH cte AS
(
SELECT -- first of month
Trunc(BEGIN(pd), 'mon') AS mon
,C2
FROM tab
-- create a period on-the-fly, adjust the end date as periods exclude the end
EXPAND ON PERIOD(EFF_START_DATE, Next(EFF_END_DATE)) AS pd
-- return one row per month
BY ANCHOR PERIOD MONTH_BEGIN
-- restrict output to a specific range
FOR PERIOD (date '2018-01-01', date '2018-03-31')
)
SELECT mon, C2, Count(*)
FROM cte
GROUP BY 1,2
ORDER BY 1,2
;
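To sanity-check what the EXPAND ON query should return for the sample data, here is a minimal Python sketch (plain Python, not Teradata) of the same idea. Each row is expanded into one entry per calendar month that its EFF_START_DATE to EFF_END_DATE range touches. That range is clipped to a reporting window, and the entries are counted per month and C2. The function names and the January to March 2018 window are illustrative assumptions.

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (ACC_ID, C1, C2, EFF_START_DATE, EFF_END_DATE)
ROWS = [
    (111, 0, "A", date(2018, 1, 1), date(2499, 12, 31)),
    (222, 0, "A", date(2018, 2, 15), date(2018, 3, 15)),
    (222, 0, "B", date(2018, 3, 16), date(2499, 12, 31)),
    (333, 0, "A", date(2000, 1, 1), date(2499, 12, 31)),
]


def month_index(d: date) -> int:
    """Map a date to a running month number so month ranges are easy to iterate."""
    return d.year * 12 + (d.month - 1)


def monthly_counts(rows, window_start: date, window_end: date) -> Counter:
    """Count rows per (first-of-month, C2) for every month a row's validity
    range touches, restricted to the months between window_start and window_end."""
    counts = Counter()
    for _acc, _c1, c2, start, end in rows:
        # Clip the validity range to the reporting window (like FOR PERIOD).
        lo = max(month_index(start), month_index(window_start))
        hi = min(month_index(end), month_index(window_end))
        # One entry per touched month (like BY ANCHOR PERIOD MONTH_BEGIN).
        for m in range(lo, hi + 1):
            first_of_month = date(m // 12, m % 12 + 1, 1)
            counts[(first_of_month, c2)] += 1
    return counts


if __name__ == "__main__":
    result = monthly_counts(ROWS, date(2018, 1, 1), date(2018, 3, 31))
    for (mon, c2), n in sorted(result.items()):
        print(mon.isoformat(), c2, n)
```

With the sample rows this gives 2, 3 and 3 for C2 = A in January, February and March 2018, and 1 for C2 = B in March. Account 222's A row runs until 2018-03-15 and its B row starts on 2018-03-16, so 222 appears in March under both values.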

Related

create additional date after and before current row and create new column based on it

Let's say I have this kind of data:
create table example
(cust_id VARCHAR, product VARCHAR, price float, datetime varchar);
insert into example (cust_id, product, price, datetime)
VALUES
('1', 'scooter', 2000, '2022-01-10'),
('1', 'skateboard', 1500, '2022-01-20'),
('1', 'beefmeat', 300, '2022-06-08'),
('2', 'wallet', 200, '2022-02-25'),
('2', 'hairdryer', 250, '2022-04-28'),
('3', 'skateboard', 1600, '2022-03-29');
I want to generate additional rows (one per month for each customer) and then derive a new column from those rows.
My expected output looks like this:
cust_id  total_price  date     is_active
1        3500         2022-01  active
1        0            2022-02  active
1        0            2022-03  active
1        0            2022-04  inactive
1        0            2022-05  inactive
1        300          2022-06  active
1        0            2022-07  active
2        0            2022-01  inactive
2        200          2022-02  active
2        0            2022-03  active
2        250          2022-04  active
2        0            2022-05  active
2        0            2022-06  active
2        0            2022-07  inactive
3        0            2022-01  inactive
3        0            2022-02  inactive
3        1600         2022-03  active
3        0            2022-04  active
3        0            2022-05  active
3        0            2022-06  inactive
3        0            2022-07  inactive
The rules are:
The first month in which the customer makes a transaction is active; the months before it are inactive.
ex: if the first transaction is in month 2, then month 2 is active and month 1 is inactive (see cust_id 2 and 3).
If there has been no transaction for more than 2 months, the following months are inactive until a new transaction makes the month active again.
ex: if the last transaction is in month 1, then months 2 and 3 are still active, months 4 and 5 are inactive, and month 6 is active again because there is a new transaction in month 6 (see cust_id 1 and 3).
My first thought was to use this code, but I don't know what the next step is:
select *,
date_part('month', age(to_date(date, 'YYYY-MM'), to_date(lag(date) over (partition by cust_id order by date),'YYYY-MM')))date_diff
from(
select
cust_id,
sum(price)total_price,
to_char(to_date(datetime, 'YYYY-MM-DD'),'YYYY-MM')date
from example
group BY
cust_id,
date
order by
cust_id,
date)test
I'm open to any suggestions.
Try the following; the explanation is in the query comments:
/* use generate_series to generate a series of months,
starting from the month of the min datetime up to one
month after the month of the max datetime (so that the
trailing inactive month shows up), then do a cross join
with the distinct cust_id to map each cust_id to each
generated month. Truncating both ends to the first of
the month keeps the series from skipping or adding a
month depending on the day of the min/max dates.*/
WITH cust_dates AS
(
SELECT EX.cust_id, to_char(dts, 'YYYY-mm') dts
FROM generate_series
(
(SELECT date_trunc('month', MIN(datetime)::timestamp) FROM example),
(SELECT date_trunc('month', MAX(datetime)::timestamp) + '1 month'::interval FROM example),
'1 month'::interval
) dts
CROSS JOIN (SELECT DISTINCT cust_id FROM example) EX
),
/* do a left join with your table to find prices
for each cust_id/ month, and aggregate for cust_id, month_date
to find the sum of prices for each cust_id, month_date.
*/
monthly_price AS
(
SELECT CD.cust_id,
CD.dts AS month_date,
COALESCE(SUM(price), 0) total_price
FROM cust_dates CD LEFT JOIN example EX
ON CD.cust_id = EX.cust_id AND
CD.dts = to_char(EX.datetime::date, 'YYYY-mm')
GROUP BY CD.cust_id, CD.dts
)
/* Now, we have the sum of monthly prices for each cust_id,
we can use the max window function with "ROWS BETWEEN 2 PRECEDING AND CURRENT ROW"
to check if one of the (current month or the previous two months) has a sum of prices > 0.
*/
SELECT cust_id, month_date, total_price,
CASE MAX(total_price) OVER
(PARTITION BY cust_id ORDER BY month_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
WHEN 0 THEN 'inactive'
ELSE 'active'
END AS is_active
FROM monthly_price
ORDER BY cust_id, month_date
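The same rolling-window idea can be tried without a Postgres instance. The sketch below ports it to SQLite, run from Python's standard sqlite3 module; that choice of engine is an assumption, not part of the answer. SQLite has no built-in generate_series, so a recursive CTE produces the months, from the first transaction month through one month past the last. The window function requires SQLite 3.25 or newer. The function name activity_report is hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE example (cust_id TEXT, product TEXT, price REAL, datetime TEXT);
INSERT INTO example (cust_id, product, price, datetime) VALUES
  ('1', 'scooter', 2000, '2022-01-10'),
  ('1', 'skateboard', 1500, '2022-01-20'),
  ('1', 'beefmeat', 300, '2022-06-08'),
  ('2', 'wallet', 200, '2022-02-25'),
  ('2', 'hairdryer', 250, '2022-04-28'),
  ('3', 'skateboard', 1600, '2022-03-29');
"""

QUERY = """
WITH RECURSIVE months(m) AS (
  -- first day of the earliest transaction month
  SELECT date(MIN(datetime), 'start of month') FROM example
  UNION ALL
  -- one month at a time, up to one month past the last transaction month
  SELECT date(m, '+1 month') FROM months
  WHERE m < (SELECT date(MAX(datetime), 'start of month', '+1 month') FROM example)
),
cust_dates AS (
  -- every customer paired with every generated month
  SELECT c.cust_id, strftime('%Y-%m', months.m) AS month_date
  FROM months CROSS JOIN (SELECT DISTINCT cust_id FROM example) AS c
),
monthly_price AS (
  -- total spent per customer and month, 0 when there was no transaction
  SELECT cd.cust_id, cd.month_date, COALESCE(SUM(ex.price), 0) AS total_price
  FROM cust_dates AS cd
  LEFT JOIN example AS ex
    ON ex.cust_id = cd.cust_id
   AND strftime('%Y-%m', ex.datetime) = cd.month_date
  GROUP BY cd.cust_id, cd.month_date
)
-- active if the current month or either of the two previous months had spending
SELECT cust_id, month_date, total_price,
       CASE MAX(total_price) OVER (PARTITION BY cust_id ORDER BY month_date
                                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
         WHEN 0 THEN 'inactive' ELSE 'active' END AS is_active
FROM monthly_price
ORDER BY cust_id, month_date;
"""


def activity_report():
    """Return (cust_id, month, total_price, is_active) rows for the sample data."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in activity_report():
        print(row)
```

The three-row frame (current month plus two preceding) encodes the "more than 2 months without a transaction" rule. A month is inactive only when it and both months before it have a total of 0.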

Show ID repetition with respect to month in SQL -- assign ordinal numbers to rows

I want to display a running count (ordinal number) for each ID, ordered by date.
The ordinal number should only be assigned if the INV value is greater than zero.
If INV > 0 for a given row, that row gets an ordinal number.
The 1st row below has INV = 0, so its result is NULL. In the 2nd row, ID 1 appears for the first time with INV > 0, so its result is 1st; the following row for ID 1 shows 2nd, and so on.
The same applies to ID 2: its first row with INV > 0 shows 1st, and so on.
Could you please advise how to achieve this in SQL?
ID INV Dates ExpectedResult
1 0 2017/01/01 Null
1 1 2017/02/01 1st
1 2 2017/03/01 2nd
2 5 2016/05/01 1st
3 10 2017/01/01 1st
2 0 2016/04/01 Null
5 2 2017/01/01 1st
2 5 2017/01/01 2nd
2 2 2017/10/01 3rd
Insert into abc values(1,0,'2017/01/01');
Insert into abc values(1,1,'2017/02/01');
Insert into abc values(1,2,'2017/03/01');
Insert into abc values(2,5,'2016/05/01');
Insert into abc values(3,10,'2017/01/01');
Insert into abc values(2,0,'2016/04/01');
Insert into abc values(5,2,'2017/01/01');
Insert into abc values(2,5,'2017/01/01');
Insert into abc values(2,2,'2017/10/01');
Try this:
select ID, DATEPART(year, Dates), DATEPART(month, Dates), COUNT(*)
from MY_TABLE
where INV > 0
group by ID, DATEPART(year, Dates), DATEPART(month, Dates)
with abc as
(
select
*
from (values
(1,0,'2017/01/01')
,(1,1,'2017/02/01')
,(1,2,'2017/03/01')
,(2,5,'2016/05/01')
,(3,10,'2017/01/01')
,(2,0,'2016/04/01')
,(5,2,'2017/01/01')
,(2,5,'2017/01/01')
,(2,2,'2017/10/01')
) a(ID , INV , Dates )
)
select
*
,case when inv=0 then null else
row_number() over (partition by id,case when inv=0 then null else 1 end order by dates)
end nr
from abc
order by id,dates
You have to add a function to convert from 1,2,3 to 1st, 2nd, 3rd etc
see for instance:
How to create ordinal numbers (i.e. "1st" "2nd", etc.) in SQL
You can try this.
select *,
CASE WHEN INV > 0 THEN
ROW_NUMBER() OVER(PARTITION BY ( CASE WHEN INV > 0 THEN ID END ) ORDER BY Dates, INV )
END ExpectedResult
from abc
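As a minimal sketch, this combines the conditional ROW_NUMBER() from the last answer with a CASE expression that adds the English ordinal suffix, including the 11th/12th/13th exceptions. It assumes SQLite (through Python's sqlite3 module, version 3.25+ for window functions) rather than the asker's database. The helper name ordinal_rows is hypothetical.

```python
import sqlite3

QUERY = """
WITH abc(ID, INV, Dates) AS (
  VALUES (1, 0, '2017/01/01'), (1, 1, '2017/02/01'), (1, 2, '2017/03/01'),
         (2, 5, '2016/05/01'), (3, 10, '2017/01/01'), (2, 0, '2016/04/01'),
         (5, 2, '2017/01/01'), (2, 5, '2017/01/01'), (2, 2, '2017/10/01')
),
numbered AS (
  -- rows with INV = 0 share a NULL partition and get no number
  SELECT ID, INV, Dates,
         CASE WHEN INV > 0 THEN
           ROW_NUMBER() OVER (PARTITION BY CASE WHEN INV > 0 THEN ID END
                              ORDER BY Dates, INV)
         END AS n
  FROM abc
)
SELECT ID, INV, Dates,
       CASE
         WHEN n IS NULL THEN NULL
         WHEN n % 100 IN (11, 12, 13) THEN n || 'th'   -- 11th, 12th, 13th
         WHEN n % 10 = 1 THEN n || 'st'
         WHEN n % 10 = 2 THEN n || 'nd'
         WHEN n % 10 = 3 THEN n || 'rd'
         ELSE n || 'th'
       END AS ExpectedResult
FROM numbered
ORDER BY ID, Dates;
"""


def ordinal_rows():
    """Run the query in an in-memory SQLite database and return all rows."""
    con = sqlite3.connect(":memory:")
    try:
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in ordinal_rows():
        print(row)
```

The 'yyyy/mm/dd' strings sort chronologically as text, which is why ordering by Dates works here even though they are not real date values.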

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within a range of dates, but you can't book across gaps of days. Booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-04-30
For simplicity's sake, it is safe to assume the dates are consecutive: for contiguous rows, the previous row's To is always 1 day before the next row's From.
For the case there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2016-04-17 2016-05-21
Edit 1
After playing around, I came up with the following SQL, which seems to work, although I'm not sure whether there are better ways or issues with it:
WITH grouped_rates AS
(SELECT
from_date,
to_date,
SUM(grp_start) OVER (ORDER BY from_date, to_date) AS grp
FROM (SELECT
from_date,
to_date,
CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
OVER (ORDER BY from_date, to_date)
THEN 0
ELSE 1
END grp_start
FROM rates
GROUP BY from_date, to_date) AS start_groups)
SELECT
min(from_date) from_date,
max(to_date) to_date
FROM grouped_rates
GROUP BY grp;
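A minimal sketch of the same LAG-plus-running-SUM technique, run against SQLite from Python's sqlite3 module (an assumption; the question targets Postgres). SQLite's date(x, '-1 day') stands in for Postgres date arithmetic. gite_id and the inner GROUP BY are dropped, and the group column is called grp because GROUP is a reserved word. The helper bookable_ranges is hypothetical.

```python
import sqlite3

QUERY = """
WITH start_groups AS (
  -- 0 when this row starts exactly one day after the previous row ends, else 1
  SELECT from_date, to_date,
         CASE WHEN date(from_date, '-1 day') =
                   LAG(to_date) OVER (ORDER BY from_date, to_date)
              THEN 0 ELSE 1 END AS grp_start
  FROM rates
),
grouped_rates AS (
  -- running total of group starts gives every contiguous run its own number
  SELECT from_date, to_date,
         SUM(grp_start) OVER (ORDER BY from_date, to_date) AS grp
  FROM start_groups
)
SELECT MIN(from_date) AS from_date, MAX(to_date) AS to_date
FROM grouped_rates
GROUP BY grp
ORDER BY 1;
"""


def bookable_ranges(rates):
    """rates: iterable of (id, price, from_date, to_date) with ISO date strings."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE rates (id INTEGER PRIMARY KEY, price REAL,"
                    " from_date TEXT, to_date TEXT)")
        con.executemany("INSERT INTO rates VALUES (?, ?, ?, ?)", rates)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(bookable_ranges([
        (1, 75.00, "2015-04-12", "2016-04-15"),
        (2, 100.00, "2016-04-16", "2016-04-17"),
        (3, 50.00, "2016-04-18", "2016-04-30"),
    ]))
```

This only merges rows that touch exactly (next From = previous To + 1 day). Overlapping ranges would each start a new group.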
This is a problem of identifying groups of contiguous or overlapping date ranges. One approach is to find where each group begins and then take a cumulative sum. The following query adds a flag indicating whether a row starts a group:
select r.*,
(case when not exists (select 1
from rates r2
where (r2.from_date < r.from_date and r2.to_date >= r.from_date - 1) or
(r2.from_date = r.from_date and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r;
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
with r as (
select r.*,
(case when not exists (select 1
from rates r2
where (r2.from_date < r.from_date and r2.to_date >= r.from_date - 1) or
(r2.from_date = r.from_date and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r
)
select min(from_date) as from_date, max(to_date) as to_date
from (select r.*,
sum(r.StartFlag) over (order by r.from_date) as grp
from r
) r
group by grp;
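To check the start-flag approach against the question's second scenario and an overlapping case, here is a sketch in SQLite through Python's sqlite3 module; the engine is an assumption, not the asker's Postgres. The columns are assumed to be named from_date and to_date, as in the asker's own query. A row continues a group when some earlier row ends on or after the day before it starts, so both touching and overlapping ranges merge. The helper merged_ranges is hypothetical.

```python
import sqlite3

QUERY = """
WITH flagged AS (
  -- 1 when no earlier row touches or overlaps this one
  -- (ties on the start date go to the lowest id)
  SELECT r.*,
         CASE WHEN NOT EXISTS (
                SELECT 1 FROM rates r2
                WHERE (r2.from_date < r.from_date
                       AND r2.to_date >= date(r.from_date, '-1 day'))
                   OR (r2.from_date = r.from_date AND r2.id < r.id))
              THEN 1 ELSE 0 END AS start_flag
  FROM rates r
)
-- the running sum of start flags numbers the groups
SELECT MIN(from_date) AS from_date, MAX(to_date) AS to_date
FROM (SELECT f.*, SUM(f.start_flag) OVER (ORDER BY f.from_date, f.id) AS grp
      FROM flagged f)
GROUP BY grp
ORDER BY 1;
"""


def merged_ranges(rates):
    """rates: iterable of (id, price, from_date, to_date) with ISO date strings."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE rates (id INTEGER PRIMARY KEY, price REAL,"
                    " from_date TEXT, to_date TEXT)")
        con.executemany("INSERT INTO rates VALUES (?, ?, ?, ?)", rates)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

Unlike the LAG version, this also merges a range that overlaps an earlier one instead of only touching it.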
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)
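The recursive query ports almost unchanged to SQLite, which is handy for experimenting from Python's sqlite3 module (an assumption; the answer targets Postgres). Dates are stored as ISO text, and upper bounds are exclusive as in the answer. The helper chained_periods is hypothetical.

```python
import sqlite3

QUERY = """
WITH RECURSIVE rrr(date_from, date_upto, nperiod) AS (
  -- segments that no other segment ends on: the start of each chain
  SELECT date_from, date_upto, 1 FROM prices p0
  WHERE NOT EXISTS (SELECT 1 FROM prices nx WHERE nx.date_upto = p0.date_from)
  UNION ALL
  -- extend a chain with the segment that starts where it currently ends
  SELECT r.date_from, p1.date_upto, r.nperiod + 1
  FROM prices p1 JOIN rrr r ON p1.date_from = r.date_upto
)
-- keep only chains that no segment continues
SELECT date_from, date_upto, nperiod FROM rrr r
WHERE NOT EXISTS (SELECT 1 FROM prices nx WHERE nx.date_from = r.date_upto)
ORDER BY date_from;
"""

PRICES = [  # same rows as the answer; date_upto is exclusive
    (1, 75.00, "2015-04-12", "2016-04-16"),
    (2, 100.00, "2016-04-17", "2016-04-19"),
    (3, 50.00, "2016-04-19", "2016-05-01"),
    (4, 50.00, "2016-05-01", "2016-05-22"),
]


def chained_periods(prices):
    """Return (date_from, date_upto, nperiod) for every maximal chain."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE prices (id INTEGER PRIMARY KEY, price REAL,"
                    " date_from TEXT NOT NULL, date_upto TEXT NOT NULL)")
        con.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", prices)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

Because the upper bound is exclusive, "contiguous" simply means one row's date_upto equals the next row's date_from, so no day arithmetic is needed.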

Totals over rolling timeframe

I have my data arranged like this:
obj_id quantity date
1 3 2014-05-06
2 2 2014-03-12
3 5 2014-10-07
4 7 2014-05-09
2 8 2014-12-31
1 5 2014-01-16
4 1 2014-07-26
3 2 2014-09-15
...
What I need is to find the OBJ_IDs whose SUM(quantity) > MAX over a period of RANGE days.
In my case MAX is 18 and RANGE is 31 days.
In other words, every OBJ_ID receives some QUANTITY from time to time. I need to find the OBJ_IDs that received more than 18 in total on dates spanning less than 31 days.
I think I need to use LAG here, but not sure how the whole thing should be.
Thanks in advance.
This might need some tweaking as I didn't have the time to decently test it, but maybe it'll get you on the right track:
(I've assumed you want the records where the date is within the last 31 days)
SELECT obj_id, SUM(quantity)
FROM tblTable
WHERE date BETWEEN DATEADD(day, -31, GETDATE()) AND GETDATE()
GROUP BY obj_id
HAVING SUM(quantity) > 18
I'm currently testing a solution a colleague of mine has quickly put together:
SELECT A.*
FROM (
SELECT A.obj_id
, A.date
, A.in_month_date
, A.date - A.in_month_date AS in_month
, A.quantity
, A.in_month_quantity
FROM (
SELECT A.obj_id
, A.date
, FIRST_VALUE(A.date)
OVER (
PARTITION BY A.obj_id
ORDER BY A.date
RANGE BETWEEN 31 PRECEDING
AND CURRENT ROW
) AS in_month_date
, A.quantity
, SUM(A.quantity)
OVER (
PARTITION BY A.obj_id
ORDER BY A.date
RANGE BETWEEN 31 PRECEDING
AND CURRENT ROW
) AS in_month_quantity
FROM mytable A
) A
) A
WHERE A.in_month <= 31
AND A.in_month_quantity > 18
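A minimal sketch of the rolling-window idea from the colleague's query, using SQLite through Python's sqlite3 module (an assumption; the original looks like Oracle). Dates are turned into day numbers with julianday(), so RANGE BETWEEN 31 PRECEDING AND CURRENT ROW covers 31 calendar days; this needs SQLite 3.28 or newer. The helper heavy_windows and its threshold parameter are hypothetical; 18 is the MAX from the question.

```python
import sqlite3

SAMPLE = [  # (obj_id, quantity, date) rows from the question
    (1, 3, "2014-05-06"), (2, 2, "2014-03-12"), (3, 5, "2014-10-07"),
    (4, 7, "2014-05-09"), (2, 8, "2014-12-31"), (1, 5, "2014-01-16"),
    (4, 1, "2014-07-26"), (3, 2, "2014-09-15"),
]

QUERY = """
SELECT obj_id, date, quantity, window_start, window_qty
FROM (
  SELECT obj_id, date, quantity,
         FIRST_VALUE(date) OVER w AS window_start,   -- earliest date in the window
         SUM(quantity) OVER w AS window_qty          -- total received in the window
  FROM mytable
  -- this object's rows dated from 31 days before the current row up to its date
  WINDOW w AS (PARTITION BY obj_id ORDER BY julianday(date)
               RANGE BETWEEN 31 PRECEDING AND CURRENT ROW)
)
WHERE window_qty > ?
ORDER BY obj_id, date;
"""


def heavy_windows(rows, threshold=18):
    """Return rows whose trailing 31-day total for their obj_id exceeds threshold."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE mytable (obj_id INTEGER, quantity INTEGER, date TEXT)")
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, (threshold,)).fetchall()
    finally:
        con.close()
```

With only the eight sample rows, no object exceeds 18. Lowering the threshold to 6 returns object 3's 2014-10-07 row with a 31-day total of 7 (5 plus the 2 received on 2014-09-15). It also returns single large deliveries for objects 2 and 4.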

How to make a time dependent distribution in SQL?

I have an SQL Table in which I keep project information coming from primavera.
Suppose that I have columns for Start Date, End Date, Duration, and Total Qty, as shown below.
How can I distribute Total Qty over months using this information? What additional columns and SQL queries do I need to get a correct monthly distribution?
Thanks in Advance.
Columns in order:
itemname,quantity,startdate,duration,enddate
item1 -- 108 -- 2013-03-25 -- 720 -- 2013-07-26
item2 -- 640 -- 2013-03-25 -- 720 -- 2013-07-26
.
.
I think the key is to break the records apart by month. Here is an example of how to do it:
with months as (
select 1 as mon union all select 2 union all select 3 union all
select 4 as mon union all select 5 union all select 6 union all
select 7 as mon union all select 8 union all select 9 union all
select 10 as mon union all select 11 union all select 12
)
select t.itemname, m.mon, t.quantity * 1.0 / t.nummonths as monthly_quantity
from (select t.*, (month(enddate) - month(startdate) + 1) as nummonths
from t
) t join
months m
on month(t.startDate) <= m.mon and
month(t.endDate) >= m.mon;
This works because all the months are within the same year -- as in your example. You are quite vague on how the split should be calculated. So, I assumed that every month from the start to the end gets an equal amount.
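The same equal-split assumption can be extended to ranges that cross a year boundary by counting calendar months as (year, month) pairs instead of month numbers. Below is a minimal Python sketch. The function names are hypothetical, and like the answer it gives every month in the range the same share regardless of how many days fall in it.

```python
from datetime import date


def months_between(start: date, end: date):
    """Yield (year, month) for every calendar month from start to end, inclusive."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield y, m
        # Step one month, rolling over into January of the next year after December.
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def distribute_evenly(itemname: str, quantity: float, start: date, end: date):
    """Split quantity equally over the calendar months the item spans."""
    months = list(months_between(start, end))
    share = quantity / len(months)
    return [(itemname, f"{y:04d}-{m:02d}", share) for y, m in months]


if __name__ == "__main__":
    for row in distribute_evenly("item1", 108, date(2013, 3, 25), date(2013, 7, 26)):
        print(row)
```

For item1 (108 units, 2013-03-25 to 2013-07-26) this yields five months of 21.6 each. A range such as 2013-11-15 to 2014-02-10 is split over November, December, January and February without any special casing.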