How to do iterations in a historical table (SQL)

I need help with SQL.
I have a historical table named A. It has a month ID, a srvc key, etc.
I need to check whether a cust key is a new customer in table A. The logic is to see whether that cust key is present in the current month ID and does not exist in any prior month (any month ID less than the current one).
To illustrate:
My current month ID = Feb 2022.
The cust key MUST exist in Feb 2022 but NOT in Jan 2022, Dec 2021, and so on.
Also, is it possible to tag a cust key that exists in Feb 2022 and Jan 2022 but not in Dec 2021, and so on?
select A.*, B.level_1, B.level_2, B.level_3, B.LE,
case when cust_key in ('2100707688',
'1xxx4',
'1xxxx',
'28xxxx1',
'2xxxxxx'
) then 'New' else 'Old' end as Tag,
A.NET_AMT/(nullif(A.prod_cnt,0)*B.LE) as ARPU
-- FROM clause joining A and B not included in the question
Hi @NickW,
thanks for responding. What I need is, from the sample historical table below, to tag the CNumbers that are new for the current month (202202). They are new because they didn't appear in 202201, 202112, or 202111. I don't care whether they appeared in 202110 or earlier; I care only about CNumbers that didn't appear in the last 3 months.
Cnumber MonthID
1 202202
1 202201
1 202112
1 202111
2 202202
2 202105
2 202104
2 202103
2 202102
2 202101
3 202202
3 202201
3 202112
3 202111
3 202110
3 202109
Based on this sample, only CNumber 2 satisfies this rule, since it appears in 202202 but not in 202201, 202112, or 202111.
Next, I would also like to tag the CNumbers that are new for Jan 2022.
In this case, the current month ID = 202201, and the CNumber must not appear in 202112, 202111, or 202110 to be considered new.
Next, I also want to tag the CNumbers that are new for Dec 2021. Now, the CNumber must not appear in 202111, 202110, or 202109 to be considered new.
And so on.
My goal is to tag customers with the month in which they first appear in the historical table (by month ID). I am assuming that is their booking date, so in the output table I want to see a column named booking date.
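A minimal sketch of the three-month rule, assuming MonthID is stored as a YYYYMM integer and using Python's built-in sqlite3 as a stand-in database (the asker's actual engine isn't stated). Each row is tagged 'New' when the same Cnumber has no row in the three preceding months, and booking_month is the first MonthID the customer appears with. The table name A comes from the question; `tag_customers` is a hypothetical helper.

```python
import sqlite3

# Sample rows from the question: (Cnumber, MonthID), MonthID as a YYYYMM integer.
ROWS = [
    (1, 202202), (1, 202201), (1, 202112), (1, 202111),
    (2, 202202), (2, 202105), (2, 202104), (2, 202103), (2, 202102), (2, 202101),
    (3, 202202), (3, 202201), (3, 202112), (3, 202111), (3, 202110), (3, 202109),
]

QUERY = """
WITH m AS (
    -- Turn YYYYMM into a running month number (year * 12 + month) so that
    -- "three months back" also works across year boundaries.
    SELECT Cnumber, MonthID, (MonthID / 100) * 12 + (MonthID % 100) AS month_no
    FROM A
)
SELECT cur.Cnumber,
       cur.MonthID,
       -- New = no row for the same customer in the three preceding months.
       CASE WHEN NOT EXISTS (
                SELECT 1 FROM m AS prev
                WHERE prev.Cnumber = cur.Cnumber
                  AND prev.month_no BETWEEN cur.month_no - 3 AND cur.month_no - 1
            ) THEN 'New' ELSE 'Old' END AS Tag,
       -- Booking month = first month the customer appears in the table.
       (SELECT MIN(f.MonthID) FROM m AS f WHERE f.Cnumber = cur.Cnumber) AS booking_month
FROM m AS cur
ORDER BY cur.Cnumber, cur.MonthID
"""


def tag_customers(rows):
    """Load rows into an in-memory table A; return (Cnumber, MonthID, Tag, booking_month)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE A (Cnumber INTEGER, MonthID INTEGER)")
    con.executemany("INSERT INTO A VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in tag_customers(ROWS):
    print(row)
```

With this rule a customer can be tagged 'New' more than once (Cnumber 2 is new in both 202101 and 202202), which is why the booking month is computed separately as the first appearance.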

We can use a CTE to get the month of the first entry for each account. With that, we can compare and calculate as needed.
create table sales(
cnumber int,
salesDate date);
insert into sales values
(1,'2021-11-15'),
(1,'2021-12-15'),
(1,'2022-01-15'),
(1,'2022-02-15'),
(2,'2022-02-15');
with cre as (
select
cnumber cnum,
DATE_FORMAT(min(salesDate),
'%Y-%m-01') monCre
from sales
group by
cnumber),
salesMonth as(
select
DATE_FORMAT(salesDate,
'%Y-%m-01') as mon,
cnumber cust
from sales
group by
cnumber,
mon)
select
cust customer,
mon "month",
case when mon = monCre
then 'new' else 'existing' end
as "status",
TIMESTAMPDIFF(MONTH,monCre ,mon)
as "account Age"
from salesMonth
join cre on cust = cnum
order by cust, mon;
customer | month | status | account Age
-------: | :--------- | :------- | ----------:
1 | 2021-11-01 | new | 0
1 | 2021-12-01 | existing | 1
1 | 2022-01-01 | existing | 2
1 | 2022-02-01 | existing | 3
2 | 2022-02-01 | new | 0

Related

How to get the last day of the month without LAST_DAY() or EOMONTH()?

I have a table t with:
DATE | LOCATION | PRODUCT_ID | AMOUNT
:--------- | -------: | ---------: | -----:
2021-10-29 | 1 | 123 | 10
2021-10-30 | 1 | 123 | 9
2021-10-31 | 1 | 123 | 8
2021-10-29 | 1 | 456 | 100
2021-10-30 | 1 | 456 | 90
2021-10-31 | 1 | 456 | 80
2021-10-29 | 2 | 123 | 18
2021-10-30 | 2 | 123 | 17
2021-11-29 | 2 | 456 | 18
I need to find the AMOUNT on the last day of each month for each combination of LOCATION + PRODUCT_ID.
If a PRODUCT_ID has no entry for that day, the AMOUNT is NULL.
So the result should look like:
DATE | LOCATION | PRODUCT_ID | AMOUNT
:--------- | -------: | ---------: | -----:
2021-10-31 | 1 | 123 | 8
2021-10-31 | 1 | 456 | 80
2021-10-31 | 2 | 123 | NULL
2021-11-30 | 2 | 456 | NULL
Sadly EXASOL has no LAST_DAY() or EOMONTH() function. How can I solve this?
You can get to the last day of the month using a date_trunc function in combination with date_add:
case
when t.date = date_add('day', -1, date_add('month', 1, date_trunc('month', t.date)))
then 'Y' else 'N' end as end_of_month
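For comparison, SQLite expresses the same "truncate to the month, add a month, subtract a day" idea with date modifiers. A minimal sketch using Python's sqlite3; this is SQLite syntax, not Exasol's, and `last_day` is a hypothetical helper name.

```python
import sqlite3


def last_day(iso_date):
    """Return the last day of the month containing iso_date (YYYY-MM-DD)."""
    con = sqlite3.connect(":memory:")
    # Truncate to the 1st, move forward one month, then step back one day.
    (result,) = con.execute(
        "SELECT date(?, 'start of month', '+1 month', '-1 day')", (iso_date,)
    ).fetchone()
    con.close()
    return result


print(last_day("2021-10-29"))  # 2021-10-31
```

Truncating first matters: adding a month directly to a date such as the 31st can roll over into the following month, whereas the 1st of a month always has a valid "one month later".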
That being said, if you group your table by all combinations of locations and products, you will not get NULLs for products without sales on the last day of the month, as shown in your expected output.
When you group your data, any value that does not exist simply won't show up in the output. If you want to force NULLs to show up, you can create a new table that contains all combinations of products, locations, and hard-coded end-of-month dates.
Then, LEFT JOIN from this new hard-coded table to your original table on date, location, and product. This gives you the NULL values you expect.
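A minimal sketch of the left-join idea in SQLite via Python's sqlite3, using the sample rows from the question. One assumption differs from the answer: instead of hard-coding every product, location, and month-end combination, the skeleton is built from the month-ends in which each location/product pair has at least one row, which reproduces the expected output. `amounts_at_month_end` is a hypothetical helper.

```python
import sqlite3

# Sample rows from the question: (DATE, LOCATION, PRODUCT_ID, AMOUNT).
ROWS = [
    ("2021-10-29", 1, 123, 10), ("2021-10-30", 1, 123, 9), ("2021-10-31", 1, 123, 8),
    ("2021-10-29", 1, 456, 100), ("2021-10-30", 1, 456, 90), ("2021-10-31", 1, 456, 80),
    ("2021-10-29", 2, 123, 18), ("2021-10-30", 2, 123, 17),
    ("2021-11-29", 2, 456, 18),
]

QUERY = """
WITH skeleton AS (
    -- One row per (month end, location, product) that has any data in that month.
    SELECT DISTINCT date("date", 'start of month', '+1 month', '-1 day') AS month_end,
           location, product_id
    FROM t
)
SELECT s.month_end, s.location, s.product_id, t.amount
FROM skeleton AS s
-- The LEFT JOIN keeps skeleton rows with no entry on the last day; amount is NULL there.
LEFT JOIN t
  ON t."date" = s.month_end
 AND t.location = s.location
 AND t.product_id = s.product_id
ORDER BY s.month_end, s.location, s.product_id
"""


def amounts_at_month_end(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE t ("date" TEXT, location INTEGER, product_id INTEGER, amount INTEGER)'
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in amounts_at_month_end(ROWS):
    print(row)
```

Replacing the DISTINCT skeleton with a CROSS JOIN of all locations, products, and month-ends gives the "all combinations" variant described in the answer, at the cost of extra NULL rows.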

PostgreSQL - How to get month/year even if there are no records within that date?

What I'm trying to do in this case is get the "most future" record of a Bills table and get all the records from the 13 months prior to that last record, so what I've tried is something like this:
SELECT
users.name,
EXTRACT(month from priority_date) as month,
EXTRACT(year from priority_date) as year,
SUM("money_balance") as "money_balance"
FROM bills
JOIN users on users.id = bills.user_id
WHERE priority_date >= ( SELECT
DATE_TRUNC('month', MAX(bills.priority_date))
FROM bills
INNER JOIN users ON bills.property_id = users.id
WHERE users.company_id = 15
AND users.active = true
AND bills.paid = false ) - interval '13 month'
AND priority_date <= ( SELECT
MAX(bills.priority_date)
FROM bills
INNER JOIN users ON bills.property_id = users.id
WHERE users.company_id = 15
AND users.active = true
AND bills.paid = false )
AND users.company_id = 15
AND bills.paid = false
AND users.active = true
GROUP BY 1,2,3
ORDER BY year, month
So, for instance, let's say the most future date for a created bill is December 2022; this query gives me the info from November 2021 to December 2022.
The data looks something like this:
name | month | year | money_balance
:------- | ----: | ---: | ------------:
Joshua.. | 11 | 2021 | 300
Joshua.. | 1 | 2022 | 111
Mark.. | 1 | 2022 | 200
... | ... | ... | ...
John | 12 | 2022 | 399
In Joshua's case, because he had no bills to pay in December 2021, nothing is returned for that month/year.
Is it possible to return the months/year where there are no records for that month, for each user?
Something like:
name | month | year | money_balance
:---------- | ----: | ---: | ------------:
Joshua.. | 11 | 2021 | 300
Joshua.. | 12 | 2021 | 0
Joshua.. | 1 | 2022 | 111
other users | .... | ... | ...
Thank you so much!
We can use a CTE to create the list of months from the minimum and maximum dates in bills, and then cross join it onto users to get one row per user per month. We then left join onto bills to populate the last column.
The downside of this approach is that we can end up with a lot of rows with no value.
create table bills(user_id int,priority_date date, money_balance int);
create table users(id int, name varchar(25));
insert into users values(1,'Joshua'),(2,'Mark'),(3,'John');
insert into bills values(1,'2021-11-01',300),(1,'2022-01-01',111),(2,'2022-01-01',200),(3,'2021-12-01',399);
;with months as
(SELECT to_char(generate_series(min(priority_date), max(priority_date), '1 month'), 'Mon-YY') AS "Mon-YY"
from bills)
SELECT
u.name,
"Mon-YY",
--EXTRACT(month from "Mon-YY") as month,
--EXTRACT(year from "Mon-YY") as year,
SUM("money_balance") as "money_balance"
FROM months m
CROSS JOIN users u
LEFT JOIN bills b
ON u.id = b.user_id
AND to_char(priority_date,'Mon-YY') = m."Mon-YY"
GROUP BY
u.name,
"Mon-YY"
ORDER BY "Mon-YY", u.name
name | Mon-YY | money_balance
:----- | :----- | ------------:
John | Dec-21 | 399
Joshua | Dec-21 | null
Mark | Dec-21 | null
John | Jan-22 | null
Joshua | Jan-22 | 111
Mark | Jan-22 | 200
John | Nov-21 | null
Joshua | Nov-21 | 300
Mark | Nov-21 | null
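The "Mon-YY" strings sort alphabetically (Dec-21 before Nov-21), and SUM over no matching bills yields NULL rather than the 0 the question asks for. A minimal sketch addressing both, written for SQLite through Python's sqlite3 with the answer's sample data: a recursive CTE stands in for PostgreSQL's generate_series, months are kept as first-of-month dates so they sort chronologically, and COALESCE turns missing sums into 0. `monthly_balances` is a hypothetical helper.

```python
import sqlite3


def monthly_balances():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (id INTEGER, name TEXT);
        CREATE TABLE bills (user_id INTEGER, priority_date TEXT, money_balance INTEGER);
        INSERT INTO users VALUES (1, 'Joshua'), (2, 'Mark'), (3, 'John');
        INSERT INTO bills VALUES (1, '2021-11-01', 300), (1, '2022-01-01', 111),
                                 (2, '2022-01-01', 200), (3, '2021-12-01', 399);
    """)
    rows = con.execute("""
        -- Every month from the earliest to the latest bill, as first-of-month dates.
        WITH RECURSIVE months(month_start) AS (
            SELECT date(MIN(priority_date), 'start of month') FROM bills
            UNION ALL
            SELECT date(month_start, '+1 month') FROM months
            WHERE month_start < (SELECT date(MAX(priority_date), 'start of month') FROM bills)
        )
        SELECT u.name,
               CAST(strftime('%m', m.month_start) AS INTEGER) AS month,
               CAST(strftime('%Y', m.month_start) AS INTEGER) AS year,
               COALESCE(SUM(b.money_balance), 0) AS money_balance
        FROM months AS m
        CROSS JOIN users AS u
        LEFT JOIN bills AS b
          ON b.user_id = u.id
         AND date(b.priority_date, 'start of month') = m.month_start
        GROUP BY u.name, m.month_start
        ORDER BY m.month_start, u.name
    """).fetchall()
    con.close()
    return rows


for row in monthly_balances():
    print(row)
```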

How to calculate the average monthly number of some action in some period in Teradata SQL?

I have a table in Teradata SQL like the one below:
ID trans_date
------------------------
123 | 2021-01-01
887 | 2021-01-15
123 | 2021-02-10
45 | 2021-03-11
789 | 2021-10-01
45 | 2021-09-02
And I need to calculate the average monthly number of transactions made by customers in the period between 2021-01-01 and 2021-09-01, so the client with ID = 789 will not be counted because he made his transaction later.
In the first month (01) there were 2 transactions.
In the second month there was 1 transaction.
In the third month there was 1 transaction.
In the ninth month there was 1 transaction.
So the result should be (2+1+1+1) / 4 = 1.25, shouldn't it?
How can I calculate it in Teradata SQL? Of course, what I showed you is only a sample of my data.
SELECT ID, AVG(txns) FROM
(SELECT ID, TRUNC(trans_date,'MON') as mth, COUNT(*) as txns
FROM mytable
-- WHERE condition matches the question but likely want to
-- use end date 2021-09-30 or use mth instead of trans_date
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id, mth) mth_txn
GROUP BY id;
Your logic translated to SQL:
--(2+1+1+1) / 4
-- CAST avoids integer division, which would truncate the result
SELECT id, CAST(COUNT(*) AS DECIMAL(18,4)) / COUNT(DISTINCT TRUNC(trans_date,'MON')) AS avg_tx
FROM mytable
WHERE trans_date BETWEEN date'2021-01-01' and date'2021-09-01'
GROUP BY id;
You should compare this with Fred's answer to see which is more efficient on your data.
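The question's (2+1+1+1) / 4 is an overall figure (total transactions divided by the number of distinct months), not a per-ID one. A minimal sketch of that overall calculation using Python's sqlite3 as a stand-in for Teradata, with strftime('%Y-%m', ...) in place of TRUNC(trans_date, 'MON'). It also shows the end-date issue noted in the first answer: with an end date of 2021-09-01 the 2021-09-02 transaction drops out. `avg_monthly_transactions` is a hypothetical helper.

```python
import sqlite3

# Sample rows from the question: (ID, trans_date).
ROWS = [(123, "2021-01-01"), (887, "2021-01-15"), (123, "2021-02-10"),
        (45, "2021-03-11"), (789, "2021-10-01"), (45, "2021-09-02")]


def avg_monthly_transactions(start, end):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (ID INTEGER, trans_date TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
    # Total transactions / number of distinct months; CAST avoids integer division.
    (avg,) = con.execute("""
        SELECT CAST(COUNT(*) AS REAL) / COUNT(DISTINCT strftime('%Y-%m', trans_date))
        FROM mytable
        WHERE trans_date BETWEEN ? AND ?
    """, (start, end)).fetchone()
    con.close()
    return avg


print(avg_monthly_transactions("2021-01-01", "2021-09-30"))  # 1.25
```

With the question's literal end date of 2021-09-01 the same query gives 4 / 3 instead, because only three months remain.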

Sorting Master Child

Following is the table:
Groups Method RDate
1 master_6 Sales 2019-10-17
2 master_3 ITO 2017-12-22
3 child_6 SRT 2019-10-21
4 master_4 TO 2019-02-07
5 child_3 ITI 2019-03-09
6 child_6 SRT 2019-03-14
7 master_6 Sales 2019-03-14
8 child_4 TR 2019-03-14
9 master_6 Sales 2019-03-14
I want the output as follows:
Groups Method RDate
2 master_3 ITO 2017-12-22
5 child_3 ITI 2019-03-09
4 master_4 TO 2019-02-07
8 child_4 TR 2019-03-14
7 master_6 Sales 2019-03-14
6 child_6 SRT 2019-03-14
9 master_6 Sales 2019-03-14
3 child_6 SRT 2019-10-21
1 master_6 Sales 2019-10-17
The logic is:
Take all rows containing the word 'master' and sort them by date.
In the result, the first row shall be the master with the oldest date.
The next row shall be the child of that master (master_1's child is child_1, master_2's child is child_2, and so on).
Then take the next master (2nd lowest date), and then its child.
For example, in my case the master with the lowest date is rec #2, so it comes first in the result. Then, for the second row, find the child of that master_3, which is child_3 (if more than one record for child_3 is found, take the one with the lowest date and put it at row 2 of the result), and then the next master record, and so on.
I hope I explained everything well.
drop table if exists #A
CREATE TABLE #A(Groups varchar(15), Method varchar(15), RDate date)
insert into #A values
('master_6','Sales','2019/10/17'),
('master_3','ITO','2017/12/22'),
('child_6','SRT','2019/10/21'),
('master_4','TO','2019/02/07'),
('child_3','ITI','2019/03/09'),
('child_6','SRT','2019/03/14'),
('master_6','Sales','2019/03/14'),
('child_4','TR','2019/03/14'),
('master_6','Sales','2019/03/14');
Since there is some confusion, I'll try to explain it in other words:
In my case, any master (parent) has only one type of child, but can have multiple records of that same child. For example, say ParentA visited the theater 6 times in a year and their ChildA visited 5 times; ParentB visited 3 times and their child 3 times. All these records are stored in one table by date, but not in any order.
I want output that, in the background, takes all parents and their dates in ascending order, then takes the first parent from that list and finds its child's visit; if there are many visits, just take the child's first visit, because the parent's visit was also the first.
Then take the second record from the parent list and find its child's visit.
If this is the second visit of the same parent, find the second visit of the child; if nothing is found, go to the third row of the parent list and find its child's visit.
All remaining (extra) visits of parents or children go at the end of the list.
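To make the pairing rule concrete, here is a minimal procedural sketch in Python (not SQL) of the logic as described: sort master visits by date, pair the k-th visit of each master with the k-th date-ordered visit of its child, keep extra master visits in date order, and append unmatched child visits at the end. Where extra master visits belong is my reading of the rule; dates are the sample data in ISO form, and `order_master_child` is a hypothetical helper.

```python
from collections import defaultdict

# Sample rows from the question: (Groups, Method, RDate), dates in ISO form.
ROWS = [
    ("master_6", "Sales", "2019-10-17"), ("master_3", "ITO", "2017-12-22"),
    ("child_6", "SRT", "2019-10-21"), ("master_4", "TO", "2019-02-07"),
    ("child_3", "ITI", "2019-03-09"), ("child_6", "SRT", "2019-03-14"),
    ("master_6", "Sales", "2019-03-14"), ("child_4", "TR", "2019-03-14"),
    ("master_6", "Sales", "2019-03-14"),
]


def order_master_child(rows):
    """Return rows ordered as master, its matching child visit, next master, ..."""
    # Master visits, oldest first (ISO date strings sort chronologically).
    masters = sorted((r for r in rows if r[0].startswith("master_")),
                     key=lambda r: (r[2], r[0]))

    # Child visits grouped by the number after the underscore, oldest first.
    children = defaultdict(list)
    for r in rows:
        if r[0].startswith("child_"):
            children[r[0].split("_", 1)[1]].append(r)
    for visits in children.values():
        visits.sort(key=lambda r: r[2])

    result = []
    for m in masters:
        result.append(m)
        visits = children[m[0].split("_", 1)[1]]
        if visits:
            # The k-th visit of a master takes the k-th remaining visit of its child.
            result.append(visits.pop(0))

    # Child visits left without a master visit go at the end, oldest first.
    leftovers = sorted((r for visits in children.values() for r in visits),
                       key=lambda r: r[2])
    return result + leftovers


for row in order_master_child(ROWS):
    print(row)
```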
For your sample data, you can do this in the ORDER BY:
order by max(case when groups like 'master%' then RDate end) over (partition by right(groups, 1)) asc,
right(groups, 1),
(case when groups like 'master%' then 1 else 2 end)
This does three things:
Calculates the date for the entire group, based on the "master" date for the group.
Keeps all of a group together, in case there are ties.
Puts the master record before the child(ren).
Because a master shares its number with its child, the two can be matched on that number and kept together.
with CTE_MASTERS as
(
select Groups, Method, Rdate
, substring(Groups,patindex('%[_]%',Groups)+1,len(Groups)) as groupNr
, row_number() over (partition by Groups order by rdate) as groupRownum
, row_number() over (order by rdate, Groups) as masterRownum
from #A
where Groups like 'master%'
)
, CTE_CHILDS as
(
select Groups, Method, Rdate
, substring(Groups,patindex('%[_]%',Groups)+1,len(Groups)) as groupNr
, row_number() over (partition by Groups order by rdate) as groupRownum
from #A
where Groups like 'child%'
)
, CTE_MASTERS_AND_CHILDS as
(
select *
, cast(1 as bit) as isMaster
from CTE_MASTERS
UNION ALL
select c.*
, m.masterRownum
, 0
from CTE_CHILDS c
left join CTE_MASTERS m
on c.groupNr = m.groupNr
and c.groupRownum = m.groupRownum
)
SELECT Groups, Method, Rdate
FROM CTE_MASTERS_AND_CHILDS
ORDER BY
masterRownum,
isMaster desc,
groupNr,
groupRownum;
Groups | Method | Rdate
:------- | :----- | :------------------
master_3 | ITO | 22/12/2017 00:00:00
child_3 | ITI | 09/03/2019 00:00:00
master_4 | TO | 07/02/2019 00:00:00
child_4 | TR | 14/03/2019 00:00:00
master_6 | Sales | 14/03/2019 00:00:00
child_6 | SRT | 14/03/2019 00:00:00
master_6 | Sales | 14/03/2019 00:00:00
child_6 | SRT | 21/10/2019 00:00:00
master_6 | Sales | 17/10/2019 00:00:00

Sum only for employee IDs present in the latest snapshot

I have a database with one row per month for each employee working in our company. So, if employee A has been working for our company from July 2016 until now, this person has approx. 24 rows (one row for each month she was in service).
I'm trying to summarize the experience each current employee has in a particular function. So, if employee A has worked 6 months in Sales and 18 months in Marketing, I count the number of rows for this employee with Sales or Marketing in the function column.
I have written code that does seem to count the functional experience per employee, but it double counts data. It does not take the latest snapshot as its starting point.
SELECT A.EMPLOYEE_ID,
SUM(CASE WHEN A.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN A.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] AS A INNER JOIN [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] AS B ON A.EMPLOYEE_ID = B.EMPLOYEE_ID
WHERE B.WORKLEVEL_CODE > '1'
GROUP BY A.EMPLOYEE_ID
I expected the output for employee A to be EXP_CUS = 6 and EXP_MKT = 18. Instead, both values are much higher because rows are double counted. When I add the line AND B.SNAPSHOT_DATE = '2019-06-30', the output is correct. I'd rather not adjust the code manually every month, and instead refer to the latest snapshot date.
ADDED
The original table looks like this:
SNAPSHOT_DATE | EMPLOYEE_ID | FUNCTION_CODE
2019-06-30 | 000000001 | CUS
2019-06-30 | 000000002 | MKT
2019-05-31 | 000000001 | CUS
2019-05-31 | 000000002 | MKT
2019-04-30 | 000000001 | MKT
2019-04-30 | 000000002 | MKT
The desired output would be:
EMPLOYEE_ID | EXP_CUS | EXP_MKT
000000001 | 2 | 1
000000002 | 0 | 3
You can use PIVOT to get your desired result, as below:
SELECT EMPLOYEE_ID,
ISNULL([CUS],0) AS [EXP_CUS],
ISNULL([MKT],0) AS [EXP_MKT]
FROM
(
SELECT EMPLOYEE_ID,FUNCTION_CODE,COUNT(SNAPSHOT_DATE) T
FROM your_table
GROUP BY EMPLOYEE_ID,FUNCTION_CODE
)P
PIVOT(
SUM(T)
FOR FUNCTION_CODE IN ([CUS],[MKT])
)PVT
Output:
EMPLOYEE_ID EXP_CUS EXP_MKT
000000001 2 1
000000002 0 3
I don't understand why you are using a self join. This seems to do what you want:
SELECT ED.EMPLOYEE_ID,
SUM(CASE WHEN ED.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN ED.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] ed
WHERE ED.WORKLEVEL_CODE > '1'
GROUP BY ED.EMPLOYEE_ID;
If you only want employees present in the most recent snapshot, you can use window functions:
SELECT ED.EMPLOYEE_ID,
SUM(CASE WHEN ED.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN ED.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM (SELECT ED.*,
MAX(SNAPSHOT_DATE) OVER () as OVERALL_MAX_SNAPSHOT_DATE,
MAX(SNAPSHOT_DATE) OVER (PARTITION BY EMPLOYEE_ID) as EMPLOYEE_MAX_SNAPSHOT_DATE
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] ED
) ED
WHERE ED.WORKLEVEL_CODE > '1' AND
ED.EMPLOYEE_MAX_SNAPSHOT_DATE = ED.OVERALL_MAX_SNAPSHOT_DATE
GROUP BY ED.EMPLOYEE_ID;
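The same filter can also be written without window functions, by keeping only employees whose ID appears in the latest snapshot. A minimal sketch using Python's sqlite3 and the sample table from the question. The table name employee_detail and the helper functional_experience are hypothetical, WORKLEVEL_CODE is left out because the sample has no such column, and employee 000000003 is a made-up former employee added only to show the filter at work.

```python
import sqlite3

ROWS = [
    ("2019-06-30", "000000001", "CUS"),
    ("2019-06-30", "000000002", "MKT"),
    ("2019-05-31", "000000001", "CUS"),
    ("2019-05-31", "000000002", "MKT"),
    ("2019-04-30", "000000001", "MKT"),
    ("2019-04-30", "000000002", "MKT"),
    # Hypothetical former employee, absent from the latest snapshot.
    ("2019-04-30", "000000003", "CUS"),
]

QUERY = """
SELECT EMPLOYEE_ID,
       SUM(CASE WHEN FUNCTION_CODE = 'CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
       SUM(CASE WHEN FUNCTION_CODE = 'MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM employee_detail
-- Keep only employees who appear in the most recent snapshot.
WHERE EMPLOYEE_ID IN (
    SELECT EMPLOYEE_ID FROM employee_detail
    WHERE SNAPSHOT_DATE = (SELECT MAX(SNAPSHOT_DATE) FROM employee_detail)
)
GROUP BY EMPLOYEE_ID
ORDER BY EMPLOYEE_ID
"""


def functional_experience(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE employee_detail (SNAPSHOT_DATE TEXT, EMPLOYEE_ID TEXT, FUNCTION_CODE TEXT)"
    )
    con.executemany("INSERT INTO employee_detail VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in functional_experience(ROWS):
    print(row)
```

Because there is no self-join, each monthly row is counted once, and the latest snapshot date is looked up on every run instead of being hard-coded.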