I have a table where every row is a transaction, with a few columns: the client ID and the date of each transaction.
I am trying to write a query that returns a table in which column N shows how many of the clients whose first transaction happened in month N made transactions in months N, N+1, N+2, ...
For example, the desired table for three months of data:
1     2     3
100   90    78
80    80
60
The first row of column 1 shows the number of clients whose first transaction happened in month 1. The second row shows how many of those clients were still transacting one month later, the third row two months later, and so on.
My current query (Year is a column with the year of the date, e.g. 2017; Month is the month number, e.g. 1 for January):
WITH not_in AS (
    -- clients who already transacted before January 2017
    SELECT ID, Year, month
    FROM table
    WHERE trans_date < DATE '2017-01-01'
),
ID_in AS (
    -- clients who transacted in January 2017
    SELECT ID, Year, month
    FROM table
    WHERE trans_date BETWEEN DATE '2017-01-01' AND DATE '2017-01-31'
),
from_this AS (
    SELECT ID, Year, month
    FROM table
)
SELECT Year, Month, COUNT(DISTINCT ID)
FROM from_this
WHERE ID IN (SELECT ID FROM ID_in)
  AND ID NOT IN (SELECT ID FROM not_in)
GROUP BY 1, 2
ORDER BY 1, 2
But this gives only one column of the desired table (for January 2017), so I have to change the dates manually for every other month in 2017, 2018, and so on.
How can I avoid this?
I guess it should be looped somehow: perhaps I should create a volatile table, add columns to it inside the loop, and then SELECT * from it.
I also cannot find documentation on declaring variables and writing WHILE loops in Teradata; any clarification is appreciated.
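A set-based alternative that avoids loops altogether (this is not from the original thread): compute each client's first month once, then count distinct clients per (first month, months elapsed). The sketch below runs the idea in SQLite through Python's sqlite3 module so it can be tried locally. The table name transactions, its columns, and the sample rows are hypothetical, and on Teradata the month arithmetic would need Teradata's own date functions (for example EXTRACT) instead of SQLite's strftime.

```python
import sqlite3

# Hypothetical sample data: (client id, transaction date).
rows = [
    (1, "2017-01-05"), (1, "2017-02-10"), (1, "2017-03-01"),
    (2, "2017-01-20"), (2, "2017-03-15"),
    (3, "2017-02-02"), (3, "2017-03-03"),
    (4, "2017-02-28"),
    (5, "2017-03-09"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (id INTEGER, trans_date TEXT)")
con.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

query = """
WITH activity AS (        -- one row per client per month with a transaction
    SELECT DISTINCT id,
           strftime('%Y-%m', trans_date) AS ym,
           CAST(strftime('%Y', trans_date) AS INTEGER) * 12
             + CAST(strftime('%m', trans_date) AS INTEGER) AS month_idx
    FROM transactions
),
first_month AS (          -- each client's cohort: the month of the first transaction
    SELECT id, MIN(ym) AS cohort, MIN(month_idx) AS cohort_idx
    FROM activity
    GROUP BY id
),
offsets AS (              -- months elapsed since the cohort month
    SELECT f.cohort, a.month_idx - f.cohort_idx AS month_offset, a.id
    FROM first_month f
    JOIN activity a ON a.id = f.id
)
SELECT cohort, month_offset, COUNT(DISTINCT id) AS clients
FROM offsets
GROUP BY cohort, month_offset
ORDER BY cohort, month_offset
"""
result = con.execute(query).fetchall()
for cohort, month_offset, clients in result:
    print(cohort, month_offset, clients)
```

Each (cohort, month_offset, clients) row is one cell of the desired triangle: offset 0 is the top row of a column, offset 1 the second row, and so on. Turning it into one column per cohort can then be done with conditional aggregation or in the reporting tool, with no loop or volatile table.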
Context
Using Presto syntax, I'm trying to create an output table that has rolling totals of an 'amount' column value for each day in a month. In each row there will also be a column with a rolling total for the previous month, and also a column with the difference between the totals.
Output Requirements
1. Completed: create a month_to_date_amount column that stores the rolling total of the amount column. The range for the rolling total is from the 1st of the month to the current row's date, and the total restarts each month. I already have a working query, below, that creates this column.
SELECT
    *,
    SUM(amount) OVER (
        PARTITION BY
            team,
            month_id
        ORDER BY
            date ASC
    ) month_to_date_amount
FROM (
    SELECT -- this subquery is required to handle duplicate dates
        date,
        SUM(amount) AS amount,
        team,
        month_id
    FROM input_table
    GROUP BY
        date,
        team,
        month_id
) AS t
2. Create a prev_month_to_date_amount column that:
   a. stores the previous month's rolling amount for the current row's date (same day of month) and team, in the same output row;
   b. returns 0 if there is no record for the matching date in the previous month (e.g. the previous-month date for March 31 would be Feb 31, which does not exist). A record will also not exist for days that have no amount values. An example output table is below.
3. Create a movement column that stores the difference between the month_to_date_amount and prev_month_to_date_amount columns of the current row.
Question
Could someone help with my 2nd and 3rd requirements to achieve the desired output shown below, either by extending my current query or by writing a more efficient one if necessary? A solution with multiple queries is fine.
Input Table
team  date        amount  month_id
A     2022-04-01  1       2022-04
A     2022-04-01  1       2022-04
A     2022-04-02  1       2022-04
B     2022-04-01  3       2022-04
B     2022-04-02  3       2022-04
B     2022-05-01  4       2022-05
B     2022-05-02  4       2022-05
C     2022-05-01  1       2022-05
C     2022-05-02  1       2022-05
C     2022-06-01  5       2022-06
C     2022-06-02  5       2022-06
This answer is a good example of using the window function LAG. In summary, the query partitions the data by team and day of month, and uses LAG to get the previous month's amount and calculate the movement value.
For example, for the Team B data the window function creates two partitions: one with the Team B rows for 01/04/2022 and 01/05/2022, and one with the Team B rows for 02/04/2022 and 02/05/2022, each ordered by date. Then, for each row in each partition, LAG fetches the data from the previous row (if one exists), which gives the previous month's amount and allows the movement to be calculated.
I hope this helps.
with
totals as (
    select
        *,
        sum(amount) over (
            partition by team, month_id
            order by date
        ) as monthToDateAmount
    from (
        select
            date,
            sum(amount) as amount,
            team,
            month_id
        from input_table
        group by
            date,
            team,
            month_id
    ) as x
),
totalsWithMovement as (
    select
        *,
        -- partitions hold one team's rows for one day of month, ordered by date;
        -- lag returns the previous row's running total, or null (coalesced to 0)
        monthToDateAmount
            - coalesce(lag(monthToDateAmount) over (
                  partition by team, day(date(date))
                  order by date), 0) as movement,
        coalesce(lag(monthToDateAmount) over (
                  partition by team, day(date(date))
                  order by month_id), 0) as prevMonthToDateAmount
    from totals
)
select
    date, amount, team, monthToDateAmount,
    prevMonthToDateAmount, movement
from totalsWithMovement
order by team, date;
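LAG returns the previous row for the same team and day of month, which is the previous calendar month only when no month is missing in between. If requirement b should hold strictly (0 whenever the immediately preceding month has no matching day), a LEFT JOIN on team, day of month, and a month index is one alternative; this is not from the original answer. The sketch runs it in SQLite from Python on the question's sample input. SQLite's strftime stands in for Presto's date functions, and the helper columns dom and month_idx are hypothetical names.

```python
import sqlite3

# Sample input from the question: (team, date, amount, month_id).
rows = [
    ("A", "2022-04-01", 1, "2022-04"), ("A", "2022-04-01", 1, "2022-04"),
    ("A", "2022-04-02", 1, "2022-04"),
    ("B", "2022-04-01", 3, "2022-04"), ("B", "2022-04-02", 3, "2022-04"),
    ("B", "2022-05-01", 4, "2022-05"), ("B", "2022-05-02", 4, "2022-05"),
    ("C", "2022-05-01", 1, "2022-05"), ("C", "2022-05-02", 1, "2022-05"),
    ("C", "2022-06-01", 5, "2022-06"), ("C", "2022-06-02", 5, "2022-06"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE input_table (team TEXT, date TEXT, amount INTEGER, month_id TEXT)")
con.executemany("INSERT INTO input_table VALUES (?, ?, ?, ?)", rows)

query = """
WITH daily AS (  -- collapse duplicate dates, as in the question's subquery
    SELECT team, date, month_id, SUM(amount) AS amount
    FROM input_table
    GROUP BY team, date, month_id
),
totals AS (      -- running total that restarts each month, plus join keys
    SELECT team, date, amount,
           SUM(amount) OVER (PARTITION BY team, month_id ORDER BY date)
               AS month_to_date_amount,
           CAST(strftime('%d', date) AS INTEGER) AS dom,
           CAST(strftime('%Y', date) AS INTEGER) * 12
             + CAST(strftime('%m', date) AS INTEGER) AS month_idx
    FROM daily
)
SELECT cur.team, cur.date, cur.month_to_date_amount,
       COALESCE(prev.month_to_date_amount, 0) AS prev_month_to_date_amount,
       cur.month_to_date_amount
         - COALESCE(prev.month_to_date_amount, 0) AS movement
FROM totals cur
LEFT JOIN totals prev
       ON prev.team = cur.team
      AND prev.dom = cur.dom                  -- same day of month
      AND prev.month_idx = cur.month_idx - 1  -- strictly the previous month
ORDER BY cur.team, cur.date
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

Matching on day of month and month index, rather than shifting the date back by one month, means a 31st never matches a nonexistent Feb 31 and simply falls back to 0.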
I have a historical table of about 9000 records, each with a unique UserID and the date the account was created (CreatedDate), that looks like this:
UserID  CreatedDate
1       5/12/2019
2       1/1/2018
3       4/2/2015
4       8/9/2016
...     ...
I would like to know how many accounts were created UP TO a certain date, but for multiple months.
For example, how many accounts were there in Jan 2020, Feb 2020, Mar 2020, so on and so forth.
The manual way would be to run this for each month, but that would be tedious:
select count(*)
from SCHEMA
--KEEP REPLACING THE MONTH TO GET COUNTS
where CreatedDate <= '2020-01-31'
Is there a more efficient way? A GROUP BY alone doesn't work because it only gives the total for each month, whereas I want a cumulative (historical) count. Thanks!
You seem to need a running total per month. If so, use GROUP BY to compute the count for each month, then accumulate those counts with the analytic (window) SUM function.
This is how you would do it in Postgres (db fiddle). Other vendors may extract the month differently, but the principle is the same.
with schema(UserID, CreatedDate) as (values
    (1, date '2019-12-05'),
    (2, date '2018-01-01'),
    (3, date '2015-02-04'),
    (4, date '2016-09-08')
)
select month, sum(cnt) over (order by month) as accounts_to_date
from (
    select date_trunc('month', CreatedDate)::date as month, count(*) as cnt
    from schema
    group by date_trunc('month', CreatedDate)::date
) x
order by month;
Note: if the data has gaps in the month sequence and you want a continuous sequence (for example, all months between 2015-01 and 2019-12), you have to pregenerate a calendar (a relation containing all the months) and left join the schema table to it. (It is not in my example yet because of YAGNI.)
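The calendar-plus-LEFT JOIN idea from the note above can be sketched as follows. It uses SQLite through Python's sqlite3 module so it runs anywhere; the recursive CTE plays the role of the pregenerated calendar (in Postgres, generate_series is the usual tool for that). The table name accounts is an assumption, and the sample rows are the question's, read as day/month/year like the answer above.

```python
import sqlite3

# Sample rows from the question, with dates read as day/month/year.
accounts = [(1, "2019-12-05"), (2, "2018-01-01"), (3, "2015-02-04"), (4, "2016-09-08")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (UserID INTEGER, CreatedDate TEXT)")
con.executemany("INSERT INTO accounts VALUES (?, ?)", accounts)

query = """
WITH RECURSIVE
bounds AS (               -- first and last month that contain any account
    SELECT MIN(date(CreatedDate, 'start of month')) AS first_month,
           MAX(date(CreatedDate, 'start of month')) AS last_month
    FROM accounts
),
calendar(month) AS (      -- every month between the bounds, gaps included
    SELECT first_month FROM bounds
    UNION ALL
    SELECT date(c.month, '+1 month')
    FROM calendar c, bounds b
    WHERE c.month < b.last_month
),
monthly AS (              -- accounts created in each month
    SELECT date(CreatedDate, 'start of month') AS month, COUNT(*) AS cnt
    FROM accounts
    GROUP BY date(CreatedDate, 'start of month')
)
SELECT c.month,
       SUM(COALESCE(m.cnt, 0)) OVER (ORDER BY c.month) AS accounts_to_date
FROM calendar c
LEFT JOIN monthly m ON m.month = c.month
ORDER BY c.month
"""
result = con.execute(query).fetchall()
print(result[-1])  # ('2019-12-01', 4)
```

Months with no new accounts still appear, carrying the previous running total forward, which is what "how many accounts existed in month X" needs.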
How can I count, for each row (one row per month), how many of the months from the current row back through the previous 24 months have PrincipalReliefFlag = 'Y'?
Data I have:
Data I need:
Code:
-------------------------------------------------------------------------
----Identify customers who are on principal relief more than one month
-------------------------------------------------------------------------
IF OBJECT_ID('tempdb..##PRappliedmorethan_once') IS NOT NULL
DROP TABLE ##PRappliedmorethan_once
select *,
case when PrincipalReliefFlag='Y' then 1 else 0
end PR_applied_months
into ##PRappliedmorethan_once
from ##TL_details_dates2
IF OBJECT_ID('tempdb..##PRappliedmorethan_once1') IS NOT NULL
DROP TABLE ##PRappliedmorethan_once1
select *,
--Identify customers who have applied for principal relief within the past 24 months (current row plus the 23 preceding rows)
sum(PR_applied_months) over (partition by productitemcode, productcode
order by dim_snapshotdate_key
rows between 23 preceding and current row
)
abcd
into ##PRappliedmorethan_once1
from ##PRappliedmorethan_once
It works, but is there a better way?
Try using the DATEDIFF() function to check the last 24 months, e.g.:
SELECT COUNT(*) FROM YOUR_TABLE
-- current month plus the 23 months before it
WHERE DATEDIFF(month, date_to_be_compared, current_row_date) BETWEEN 0 AND 23
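The question's ROWS BETWEEN 23 PRECEDING frame counts the current row and the 23 rows before it, which matches 24 calendar months only when every month has a row. A DATEDIFF-style condition bounds the window by calendar months instead. The sketch below contrasts the two in SQLite via Python. The integer month counter month_no (standing in for dim_snapshotdate_key), the product code P1, and the data with months 5 to 9 missing are all hypothetical. In SQL Server, the correlated subquery would compare dates with DATEDIFF(month, ...) instead of subtracting month numbers.

```python
import sqlite3

# Hypothetical monthly snapshots for one product: month_no counts months 0..29,
# months 5-9 are missing, and every month present has PrincipalReliefFlag = 'Y'.
rows = [("P1", m, "Y") for m in range(30) if not 5 <= m <= 9]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE snapshots (productcode TEXT, month_no INTEGER, flag TEXT)")
con.executemany("INSERT INTO snapshots VALUES (?, ?, ?)", rows)

query = """
SELECT s.month_no,
       -- calendar-based: 'Y' rows in the current month or the 23 months before it
       (SELECT COUNT(*) FROM snapshots p
         WHERE p.productcode = s.productcode
           AND p.flag = 'Y'
           AND p.month_no BETWEEN s.month_no - 23 AND s.month_no) AS y_by_month,
       -- row-based, as in the question: the current row plus the 23 rows before it
       SUM(CASE WHEN s.flag = 'Y' THEN 1 ELSE 0 END) OVER (
           PARTITION BY s.productcode ORDER BY s.month_no
           ROWS BETWEEN 23 PRECEDING AND CURRENT ROW) AS y_by_rows
FROM snapshots s
ORDER BY s.month_no
"""
result = con.execute(query).fetchall()
print(result[-1])  # (29, 20, 24)
```

With no gaps the two columns agree. With five months missing, the ROWS frame at month 29 reaches back to month 1 and counts 24 flags, while the calendar window (months 6 to 29) counts 20.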
I have an audit table with a snapshot for every day. All added, modified, and deleted records are stored; when a record is deleted, it no longer shows up the next day. Something like this:
Date    Records
----    -------
15th    100
16th    102 - pick up all records present on the 15th that are not on the 16th
17th    110 - pick up all records present on the 16th that are not on the 17th
18th    150 - pick up all records present on the 17th that are not on the 18th
... and so on.
A deleted record is present in the previous day's snapshot but not in today's. I need to pick up all the deleted records between dates.
But I don't want to hard-code the dates; instead, it should work from a given date up to today().
How can I achieve this in a single SQL query? Using UNION works, but only with hard-coded dates. Is there a way to write a generic query that works as of today?
You can use two levels of aggregation. The first gets the maximum date for each id. The second counts the deletes on the following day:
select max_date + interval 1 day, count(*)
from (select a.id, max(date) as max_date
from audit a
group by a.id
) t
group by max_date
order by max_date;
You might want a WHERE clause that restricts max_date to dates before the latest date in the data (otherwise every record still present on the last day will look as if it is deleted on the following day).
An alternative method uses lead():
select date + interval 1 day, count(*)
from (select a.*,
lead(date) over (partition by id order by date) as next_date
from audit a
) t
where next_date <> date_add(date, INTERVAL 1 DAY) or next_date is null
group by date
order by date;
If records can be resurrected and you still want to count them as deleted when they disappear, this is the better method.
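The LEAD() approach can be tried locally with the sketch below, which returns the deleted ids rather than only counts, since the question asks to pick up the records themselves. It runs in SQLite through Python's sqlite3 module. The column name snapshot_date and the sample rows are hypothetical, SQLite's date(x, '+1 day') replaces MySQL's date_add, and it adds the filter on the latest date suggested for the first method.

```python
import sqlite3

# Hypothetical audit snapshots: (record id, snapshot date).
snapshots = [
    (1, "2021-03-15"), (1, "2021-03-16"), (1, "2021-03-17"), (1, "2021-03-18"),
    (2, "2021-03-15"),                                        # gone from the 16th
    (3, "2021-03-15"), (3, "2021-03-16"), (3, "2021-03-18"),  # gone on the 17th, back on the 18th
    (4, "2021-03-16"), (4, "2021-03-17"),                     # gone from the 18th
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE audit (id INTEGER, snapshot_date TEXT)")
con.executemany("INSERT INTO audit VALUES (?, ?)", snapshots)

query = """
SELECT date(snapshot_date, '+1 day') AS deleted_on, id
FROM (
    SELECT id, snapshot_date,
           LEAD(snapshot_date) OVER (PARTITION BY id ORDER BY snapshot_date) AS next_date
    FROM audit
) t
WHERE (next_date IS NULL OR next_date <> date(snapshot_date, '+1 day'))
  -- skip the latest snapshot: records still present there are not deleted yet
  AND snapshot_date < (SELECT MAX(snapshot_date) FROM audit)
ORDER BY deleted_on, id
"""
result = con.execute(query).fetchall()
for deleted_on, record_id in result:
    print(deleted_on, record_id)
```

Record 3 is reported as deleted on the 17th even though it reappears on the 18th, which is the resurrection case where LEAD() beats the max-date method.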
select
id
,id_name
, MAX(last_login_date)
, SUM(transaction_count)
, MAX(last_transaction_date)
from sales
group by id, id_name;
Hi, I want the results to include only the transaction count for sales made in the last 12 months. What can I do?
I use MAX and SUM because the same id appears in multiple rows, so the ids are not unique.
I don't have individual transaction dates; I only have a last transaction date field.
You can use the MONTHS_BETWEEN function to filter to the last 12 months directly:
select id, id_name, MAX(last_login_date), SUM(transaction_count), MAX(last_transaction_date)
from sales
where months_between(trunc(sysdate),last_transaction_date) <= 12
group by id, id_name;
If you want whole calendar months (the current month plus the previous 11), you can use this construction:
select id
, id_name
, Max(last_login_date)
, Sum(transaction_count)
, Max(last_transaction_date)
from sales
where last_transaction_date >= add_months(trunc(sysdate,'mm'),-11)
group by id, id_name;
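To make the date arithmetic of the last query concrete, here is a small Python sketch of the cutoff that add_months(trunc(sysdate, 'mm'), -11) computes: truncate to the first day of the current month, then step back 11 months. The function name is hypothetical and this is not Oracle code; it only illustrates which dates fall inside the window.

```python
from datetime import date


def last_12_calendar_months_start(today: date) -> date:
    """First day of the month 11 months before `today`'s month.

    Illustrates the intent of Oracle's add_months(trunc(sysdate, 'mm'), -11):
    the window is the current month plus the 11 full months before it.
    """
    # Number months continuously (year * 12 + zero-based month) so that
    # subtracting 11 handles the rollover into the previous year.
    month_index = today.year * 12 + (today.month - 1) - 11
    return date(month_index // 12, month_index % 12 + 1, 1)


print(last_12_calendar_months_start(date(2024, 3, 15)))  # 2023-04-01
```

For 15 March 2024 the cutoff is 1 April 2023, so the filter keeps April 2023 through March 2024: twelve calendar months, including the current one.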