Sum only for Employee IDs present in latest snapshot - sql

I have a database with a row per month for each employee working in our company. So, if employee A has been working for our company from July 2016 till now, this person has approx. 24 rows (one row for each month she was in service).
I'm trying to summarize the experience each current employee has in a particular function. So, if employee A has worked 6 months in Sales and 18 months in Marketing, I count the number of rows where this employee has Sales or Marketing in the column indicating the function.
I have written a query that seems to count the functional experience per employee, but it double counts data. It does not take the latest snapshot as its starting point.
SELECT A.EMPLOYEE_ID,
SUM(CASE WHEN A.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN A.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] AS A INNER JOIN [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] AS B ON A.EMPLOYEE_ID = B.EMPLOYEE_ID
WHERE B.WORKLEVEL_CODE > '1'
GROUP BY A.EMPLOYEE_ID
I expected the output for employee A to be EXP_CUS = 6 and EXP_MKT = 18. Instead, the output for both is much higher, because rows are double counted. When I add the line AND B.SNAPSHOT_DATE = '2019-06-30', the output is correct. I would rather not adjust the code manually every month, and would prefer to refer to the latest snapshot date instead.
ADDED
The original table looks like this
SNAPSHOT_DATE | EMPLOYEE_ID | FUNCTION_CODE
2019-06-30 | 000000001 | CUS
2019-06-30 | 000000002 | MKT
2019-05-31 | 000000001 | CUS
2019-05-31 | 000000002 | MKT
2019-04-30 | 000000001 | MKT
2019-04-30 | 000000002 | MKT
The desired output would be
EMPLOYEE_ID | EXP_CUS | EXP_MKT
000000001 | 2 | 1
000000002 | 0 | 3

You can use PIVOT to get your desired result, as below:
SELECT EMPLOYEE_ID,
ISNULL([CUS],0) AS [EXP_CUS],
ISNULL([MKT],0) AS [EXP_MKT]
FROM
(
SELECT EMPLOYEE_ID,FUNCTION_CODE,COUNT(SNAPSHOT_DATE) T
FROM your_table
GROUP BY EMPLOYEE_ID,FUNCTION_CODE
)P
PIVOT(
SUM(T)
FOR FUNCTION_CODE IN ([CUS],[MKT])
)PVT
Output:
EMPLOYEE_ID EXP_CUS EXP_MKT
000000001 2 1
000000002 0 3

I don't understand why you are using a self join. This seems to do what you want:
SELECT ED.EMPLOYEE_ID,
SUM(CASE WHEN ED.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN ED.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] ed
WHERE ED.WORKLEVEL_CODE > '1'
GROUP BY ED.EMPLOYEE_ID;
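To see why the self join inflates the counts, here is a small sketch that runs both shapes of the query against the sample rows from the question. It uses Python's sqlite3 module rather than SQL Server, the table name employee_detail is a stand-in for the real view, and the WORKLEVEL_CODE filter is left out because the sample data has no such column.

```python
import sqlite3

# In-memory stand-in for VW_HRA_EMPLOYEE_DETAIL (hypothetical table name).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_detail "
    "(snapshot_date TEXT, employee_id TEXT, function_code TEXT)"
)
conn.executemany(
    "INSERT INTO employee_detail VALUES (?, ?, ?)",
    [
        ("2019-06-30", "000000001", "CUS"),
        ("2019-06-30", "000000002", "MKT"),
        ("2019-05-31", "000000001", "CUS"),
        ("2019-05-31", "000000002", "MKT"),
        ("2019-04-30", "000000001", "MKT"),
        ("2019-04-30", "000000002", "MKT"),
    ],
)

# Self join on employee_id only: every row of an employee pairs with
# every other row of the same employee, so n rows become n * n rows.
self_join_sql = """
SELECT a.employee_id,
       SUM(CASE WHEN a.function_code = 'CUS' THEN 1 ELSE 0 END) AS exp_cus,
       SUM(CASE WHEN a.function_code = 'MKT' THEN 1 ELSE 0 END) AS exp_mkt
FROM employee_detail AS a
JOIN employee_detail AS b ON a.employee_id = b.employee_id
GROUP BY a.employee_id
ORDER BY a.employee_id
"""

# Single scan: each monthly row is counted exactly once.
single_scan_sql = """
SELECT employee_id,
       SUM(CASE WHEN function_code = 'CUS' THEN 1 ELSE 0 END) AS exp_cus,
       SUM(CASE WHEN function_code = 'MKT' THEN 1 ELSE 0 END) AS exp_mkt
FROM employee_detail
GROUP BY employee_id
ORDER BY employee_id
"""

inflated = conn.execute(self_join_sql).fetchall()
correct = conn.execute(single_scan_sql).fetchall()
print(inflated)
print(correct)
```

Each employee has three rows here, so the self join multiplies every count by three. Adding B.SNAPSHOT_DATE = '2019-06-30' worked in the question because it cut the B side down to one row per employee.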
If you only want employees with the most recent snapshot date, then you can use window functions:
SELECT ED.EMPLOYEE_ID,
SUM(CASE WHEN ED.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN ED.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM (SELECT ED.*,
MAX(SNAPSHOT_DATE) OVER () as OVERALL_MAX_SNAPSHOT_DATE,
MAX(SNAPSHOT_DATE) OVER (PARTITION BY EMPLOYEE_ID) as EMPLOYEE_MAX_SNAPSHOT_DATE
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] ED
) ED
WHERE ED.WORKLEVEL_CODE > '1' AND
EMPLOYEE_MAX_SNAPSHOT_DATE = OVERALL_MAX_SNAPSHOT_DATE
GROUP BY ED.EMPLOYEE_ID;
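A minimal sketch of the latest-snapshot filter, run with Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name employee_detail and employee 000000003 are assumptions added for illustration: that employee has no row in the latest snapshot, so they should drop out of the result. The WORKLEVEL_CODE filter is again omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_detail "
    "(snapshot_date TEXT, employee_id TEXT, function_code TEXT)"
)
conn.executemany(
    "INSERT INTO employee_detail VALUES (?, ?, ?)",
    [
        ("2019-06-30", "000000001", "CUS"),
        ("2019-06-30", "000000002", "MKT"),
        ("2019-05-31", "000000001", "CUS"),
        ("2019-05-31", "000000002", "MKT"),
        ("2019-05-31", "000000003", "CUS"),  # hypothetical leaver
        ("2019-04-30", "000000001", "MKT"),
        ("2019-04-30", "000000002", "MKT"),
        ("2019-04-30", "000000003", "CUS"),  # hypothetical leaver
    ],
)

# Keep only employees whose own latest snapshot equals the overall
# latest snapshot, then count all of their historical rows once each.
sql = """
SELECT ed.employee_id,
       SUM(CASE WHEN ed.function_code = 'CUS' THEN 1 ELSE 0 END) AS exp_cus,
       SUM(CASE WHEN ed.function_code = 'MKT' THEN 1 ELSE 0 END) AS exp_mkt
FROM (SELECT employee_id,
             function_code,
             MAX(snapshot_date) OVER () AS overall_max,
             MAX(snapshot_date) OVER (PARTITION BY employee_id) AS employee_max
      FROM employee_detail) AS ed
WHERE ed.employee_max = ed.overall_max
GROUP BY ed.employee_id
ORDER BY ed.employee_id
"""

rows = conn.execute(sql).fetchall()
print(rows)
```

No date is hard-coded, so nothing needs to change when next month's snapshot is loaded.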

Related

SQL Code How to do iterations in historical table

I need help on SQL
I have a historical table named A. It has month ID, srvc key, etc.
I need to check if a custkey is a new customer in that table A. The logic is - to see if that cust key is new for the current month ID and does not exist prior months (less than the current month ID).
To illustrate,
My current month ID = Feb2022
The cust key MUST exist in Feb2022 BUT not in Jan 2022, Dec2021,.., and so on..
Also, is it possible to tag if a cust key exist in Feb 2022 and Jan 2022 BUT not in Dec 2021, and so on..
select A.*,B.level_1, B.level_2, B.level_3, B.LE,
case when cust_key in ('2100707688',
'1xxx4',
'1xxxx',
'28xxxx1',
'2xxxxxx'
) then 'New' else 'Old' end as Tag,
A.NET_AMT/(nullif(A.prod_cnt,0)*B.LE) as ARPU
Hi @NickW,
thanks for responding. What I need, from the sample historical table below, is to tag each CNumber that is new for the current month (202202). For example, CNumber 2 is new because it didn't appear in 202201, 202112, or 202111. I don't care whether it appeared in 202110 or earlier; I only care about CNumbers that didn't appear in the last 3 months.
Cnumber MonthID
1 202202
1 202201
1 202112
1 202111
2 202202
2 202105
2 202104
2 202103
2 202102
2 202101
3 202202
3 202201
3 202112
3 202111
3 202110
3 202109
Based on this sample, only CNumber 2 satisfies this rule, since it appeared in 202202 but not in 202201, 202112, or 202111.
Next, I also want to tag any CNumber that is new for Jan 2022.
In this case, the current MonthID = 202201, so that CNumber must not appear in 202112, 202111, or 202110 to count as new.
Next, I also want to tag any CNumber that is new for Dec 2021. That CNumber must not appear in 202111, 202110, or 202109 to count as new.
And so on.
My goal is to tag customers by the MonthID in which they first appeared in the historical table. I am assuming that this is their booking date, so in the end I want a column named booking date.
We can use a CTE to get the month of the first entry for each account. With that, we can compare and calculate as needed.
create table sales(
cnumber int,
salesDate date);
insert into sales values
(1,'2021-11-15'),
(1,'2021-12-15'),
(1,'2022-01-15'),
(1,'2022-02-15'),
(2,'2022-02-15');
with cre as (
select
cnumber cnum,
DATE_FORMAT(min(salesDate),
'%Y-%m-01') monCre
from sales
group by
cnumber),
salesMonth as(
select
DATE_FORMAT(salesDate,
'%Y-%m-01') as mon,
cnumber cust
from sales
group by
cnumber,
mon)
select
cust customer,
mon "month",
case when mon = monCre
then 'new' else 'existing' end
as "status",
TIMESTAMPDIFF(MONTH,monCre ,mon)
as "account Age"
from salesMonth
join cre on cust = cnum
order by cust, mon;
customer | month | status | account Age
-------: | :--------- | :------- | ----------:
1 | 2021-11-01 | new | 0
1 | 2021-12-01 | existing | 1
1 | 2022-01-01 | existing | 2
1 | 2022-02-01 | existing | 3
2 | 2022-02-01 | new | 0
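The query above tags a customer as new only in the month of their very first entry. For the "not seen in the previous 3 months" rule from the question, one possible sketch uses LAG to find each customer's previous MonthID. It is run here with Python's sqlite3 module (SQLite 3.25 or later) on the sample rows from the question; the table name history and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (cnumber INTEGER, month_id INTEGER)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?)",
    [
        (1, 202202), (1, 202201), (1, 202112), (1, 202111),
        (2, 202202), (2, 202105), (2, 202104), (2, 202103),
        (2, 202102), (2, 202101),
        (3, 202202), (3, 202201), (3, 202112), (3, 202111),
        (3, 202110), (3, 202109),
    ],
)

# Convert YYYYMM to a running month number (year * 12 + month) so that
# gaps can be measured in months, then compare each row with the same
# customer's previous row. A gap of more than 3 months, or no previous
# row at all, means the customer was absent for the prior 3 months.
sql = """
SELECT cnumber,
       month_id,
       CASE WHEN prev_idx IS NULL OR month_idx - prev_idx > 3
            THEN 'New' ELSE 'Old' END AS tag
FROM (SELECT cnumber,
             month_id,
             (month_id / 100) * 12 + month_id % 100 AS month_idx,
             LAG((month_id / 100) * 12 + month_id % 100)
                 OVER (PARTITION BY cnumber ORDER BY month_id) AS prev_idx
      FROM history)
ORDER BY cnumber, month_id
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Because every month is tagged in one pass, there is no need to iterate over the current MonthID. Note that each customer's earliest row in the table is always tagged 'New', since nothing is known about the months before the history starts.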

Cumulative Sum Query in SQL table with distinct elements

I have a table like this, with columns for the date of sale, the insurance salesman's name, and the sale amount:
Date of Sale | Salesman Name | Sale Amount
2021-03-01 | Jack | 40
2021-03-02 | Mark | 60
2021-03-03 | Sam | 30
2021-03-03 | Mark | 70
2021-03-02 | Sam | 100
I want to group by the date of sale. The next column should display the cumulative count of the sellers who have made a sale up to that date, but the same seller shouldn't be counted again.
For example,
The following table is incorrect,
Date of Sale | Count(Salesman Name) | Sum(Sale Amount)
2021-03-01 | 1 | 40
2021-03-02 | 3 | 200
2021-03-03 | 5 | 300
The following table is correct,
Date of Sale | Count(Salesman Name) | Sum(Sale Amount)
2021-03-01 | 1 | 40
2021-03-02 | 3 | 200
2021-03-03 | 3 | 300
I am not sure how to frame the SQL query, because two conditions are involved: a cumulative count, and ignoring duplicates. I think the OVER clause with UNBOUNDED PRECEDING may be of some use here? I'd appreciate your help.
Edit - I have added the Sale Amount as a column. I also need the cumulative sum of the Sale Amount. In that case, all sale amounts should be included, unlike the salesman names, where only unique names are counted.
One approach uses a self join and aggregation:
WITH cte AS (
SELECT t1.SaleDate,
COUNT(CASE WHEN t2.Salesman IS NULL THEN 1 END) AS cnt,
SUM(t1.SaleAmount) AS amt
FROM yourTable t1
LEFT JOIN yourTable t2
ON t2.Salesman = t1.Salesman AND
t2.SaleDate < t1.SaleDate
GROUP BY t1.SaleDate
)
SELECT
SaleDate,
SUM(cnt) OVER (ORDER BY SaleDate) AS NumSalesman,
SUM(amt) OVER (ORDER BY SaleDate) AS TotalAmount
FROM cte
ORDER BY SaleDate;
The logic in the CTE is that we try to find, for each salesman, an earlier record for the same salesman. If we can't find such a record, then we assume the record in question is the first appearance. Then we aggregate by date to get the counts per day, and finally take a rolling sum of counts in the outer query.
The best way to do this uses window functions to determine the first time a sales person appears. Then, you just want cumulative sums:
select saledate,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by saledate) as num_salespersons,
sum(sum(sales)) over (order by saledate) as running_sales
from (select t.*,
row_number() over (partition by salesperson order by saledate) as seqnum
from t
) t
group by saledate
order by saledate;
Note that, in addition to being more concise, this should perform much better than a solution that uses a self-join.
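The first-appearance idea can be checked against the sample data from the question. This is only a sketch, run with Python's sqlite3 module (SQLite 3.25 or later); the table and column names are assumptions. It aggregates per day in a CTE first and then takes running sums, which sidesteps nesting an aggregate inside a window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, salesman TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2021-03-01", "Jack", 40),
        ("2021-03-02", "Mark", 60),
        ("2021-03-03", "Sam", 30),
        ("2021-03-03", "Mark", 70),
        ("2021-03-02", "Sam", 100),
    ],
)

# first_seen: the first sale date of each salesman.
# per_day:    sellers whose first sale falls on that day, plus the
#             day's total amount (every sale counts toward the amount).
# Outer query: running sums over the per-day figures.
sql = """
WITH first_seen AS (
    SELECT salesman, MIN(sale_date) AS first_date
    FROM sales
    GROUP BY salesman
),
per_day AS (
    SELECT s.sale_date,
           COUNT(DISTINCT CASE WHEN f.first_date = s.sale_date
                               THEN s.salesman END) AS new_sellers,
           SUM(s.amount) AS day_amount
    FROM sales AS s
    JOIN first_seen AS f ON f.salesman = s.salesman
    GROUP BY s.sale_date
)
SELECT sale_date,
       SUM(new_sellers) OVER (ORDER BY sale_date) AS num_salesmen,
       SUM(day_amount) OVER (ORDER BY sale_date) AS total_amount
FROM per_day
ORDER BY sale_date
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

COUNT(DISTINCT ...) keeps a seller from being counted twice if they happen to make several sales on their first day.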

SQL Query two values for each record

I'm trying to query a customers table to get the total number of accounts per rep grouped by whether they were created this year or before.
CUSTOMER NAME | ACCOUNT REP | DATE CREATED
The query I'm trying to return would look like.
REP | NEW_ACCOUNTS | OLD_ACCOUNTS | TOTAL
-----------------------------------------
Tom | 100 | 12 | 112
Ted | 15 | 1 | 16
The query I've written looks as follows.
SELECT REP, CASE WHEN YEAR(GETDATE()) > YEAR(DATE_CREATED) THEN 1 ELSE 0 END AS ThisYear
FROM CUSTOMERS
GROUP BY REP, DATE_CREATED
Unfortunately, this is giving me
REP | ThisYear
-----------------------------------------
Tom | 1
Ted | 0
Tom | 0
Ted | 1
Ted | 1
I think you want conditional aggregation:
SELECT REP,
SUM(CASE WHEN YEAR(GETDATE()) = YEAR(DATE_CREATED) THEN 1 ELSE 0 END) AS NEW_ACCOUNTS,
SUM(CASE WHEN YEAR(GETDATE()) > YEAR(DATE_CREATED) THEN 1 ELSE 0 END) AS OLD_ACCOUNTS,
COUNT(*) as TOTAL
FROM CUSTOMERS
GROUP BY REP;
This assumes that creation dates are not in the future -- a reasonable assumption.
If you want one row per REP, then the only column in the GROUP BY should be REP.
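Here is a small runnable sketch of the same conditional aggregation, using Python's sqlite3 module. The customer rows are made up for illustration, and the current year is passed in as a parameter instead of calling GETDATE(), so the result does not depend on when the script is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_name TEXT, rep TEXT, date_created TEXT)"
)
# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Acme", "Tom", "2024-02-10"),
        ("Bolt", "Tom", "2024-05-01"),
        ("Cord", "Tom", "2020-07-19"),
        ("Dune", "Ted", "2024-01-03"),
    ],
)

# One row per rep: each CASE contributes 1 only for the rows that fall
# in its bucket, and COUNT(*) gives the total across both buckets.
sql = """
SELECT rep,
       SUM(CASE WHEN CAST(strftime('%Y', date_created) AS INTEGER) = :this_year
                THEN 1 ELSE 0 END) AS new_accounts,
       SUM(CASE WHEN CAST(strftime('%Y', date_created) AS INTEGER) < :this_year
                THEN 1 ELSE 0 END) AS old_accounts,
       COUNT(*) AS total
FROM customers
GROUP BY rep
ORDER BY rep
"""

rows = conn.execute(sql, {"this_year": 2024}).fetchall()
print(rows)
```

Grouping by rep alone is what collapses the output to one row per rep; the original query also grouped by DATE_CREATED, which produced a row for every distinct creation date.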
You can use conditional aggregation:
SELECT REP,
SUM(CASE WHEN YEAR(GETDATE()) = YEAR(DATE_CREATED) THEN 1 ELSE 0 END) AS NEW_ACCOUNTS,
SUM(CASE WHEN YEAR(GETDATE()) > YEAR(DATE_CREATED) THEN 1 ELSE 0 END) AS OLD_ACCOUNTS, COUNT(*) AS TOTAL
FROM CUSTOMERS
GROUP BY REP;

How to classify or group values based on prior day values?

I have a data set that repeats daily and shows sales. If a product is released on Day 1 and has between 1-5 sales AND also if on Day 2 it has between 10-50 sales, I want to classify it as "Limited Sales."
If a product is released on Day 1 and has over 1,000 sales and also if on Day 2 it has over 1,000 sales, I want to classify it as "Wide Sales."
How would I go about doing this in standard SQL?
I've tried some workarounds using CASE WHEN, but I end up with issues: I can classify the 1st column with an output, but I can't get the 2nd column to produce an output that also depends on the 1st (e.g. Column 1 is TRUE but Column 2 is FALSE, when what I need is Column 1 = TRUE and Column 2 = TRUE).
Here's what a sample query would look like:
Table looks like this:
Columns: name, day_number, sales
1. Jack | 1 | 5
2. Jack | 2 | 10
3. Mary | 1 | 1250
4. Mary | 2 | 1500
SELECT name,
day_number,
sales,
CASE
WHEN day_number = 1
AND sales >= 1
AND sales <= 5
THEN "LIMITED SALES"
ELSE "WIDE SALES"
END AS status_1,
CASE
WHEN day_number = 2
AND sales >= 10
AND sales <= 50
THEN TRUE
ELSE FALSE
END AS status_2
FROM table
Unfortunately this isn't really going to get me what I want. At the end of the day, I would like to see results like:
1. Jack | 1 | 5 | LIMITED SALES
2. Jack | 2 | 10 | LIMITED SALES
3. Mary | 1 | 1250 | WIDE SALES
4. Mary | 2 | 1500 | WIDE SALES
Is this what you want?
select name,
(case when sum(case when day_number = 1 then sales end) between 1 and 5 and
sum(case when day_number = 2 then sales end) between 10 and 50
then 'Limited Sales'
when sum(case when day_number = 1 then sales end) > 1000 and
sum(case when day_number = 2 then sales end) > 1000
then 'Wide Sales'
else '???'
end) as sales_category
from t
group by name
If you want this on each of the original rows, then use window functions or a join.
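A sketch of the window-function variant, which keeps every original row and repeats the category on each one. It is run with Python's sqlite3 module (SQLite 3.25 or later) on the sample rows from the question; the table name product_sales is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sales (name TEXT, day_number INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO product_sales VALUES (?, ?, ?)",
    [("Jack", 1, 5), ("Jack", 2, 10), ("Mary", 1, 1250), ("Mary", 2, 1500)],
)

# The inner query attaches each product's day-1 and day-2 sales to all
# of its rows; the outer CASE can then test both days at once.
sql = """
SELECT name,
       day_number,
       sales,
       CASE WHEN day1 BETWEEN 1 AND 5 AND day2 BETWEEN 10 AND 50
                THEN 'Limited Sales'
            WHEN day1 > 1000 AND day2 > 1000
                THEN 'Wide Sales'
            ELSE '???'
       END AS sales_category
FROM (SELECT name,
             day_number,
             sales,
             SUM(CASE WHEN day_number = 1 THEN sales END)
                 OVER (PARTITION BY name) AS day1,
             SUM(CASE WHEN day_number = 2 THEN sales END)
                 OVER (PARTITION BY name) AS day2
      FROM product_sales)
ORDER BY name, day_number
"""

rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Products that match neither rule fall through to the '???' placeholder, as in the grouped query above.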

SQL to work out sales by product taking into account age

I want to work out sales by product grouped by release date, but also grouped by the age of that product when sold, something like this:
| 3 months | 6 months
2015-01 | 28.1 | 37.1
2015-02 | 29.3 | 35.6
So 28.1 is the average number of products sold of each type, 3 months after being released, for those products released in 2015-01. There are obviously more products sold 6 months after the release date, 37.1.
The following SQL gets a list of sales:
SELECT
d.item AS title,
d.quantity,
a.firstdate AS release_date,
i.date AS invoice_date,
i.date - a.firstdate AS age
FROM invoices i
JOIN invoice_details d ON i.id = d.invoice_id
JOIN (SELECT
d.item,
d.binding,
min(i.date) AS firstdate
FROM invoices i
JOIN invoice_details d ON i.id = d.invoice_id
GROUP BY d.item, d.binding) AS a ON a.item = d.item AND a.binding = d.binding
WHERE
i.discount != 100 AND d.price > 0
AND (d.binding != 'Hardback' OR d.binding != 'Ebooks')
ORDER BY title, invoice_date
And the result looks something like:
title | quantity | release date | invoice date | age
A | 1 | 2013-11-14 | 2013-11-14 | 0
A | 2 | 2013-11-14 | 2013-12-14 | 30
A | 3 | 2013-11-14 | 2014-01-14 | 60
A | 4 | 2013-11-14 | 2014-02-14 | 90
A | 5 | 2013-11-14 | 2014-03-14 | 120
B | 6 | 2013-11-14 | 2013-11-14 | 0
B | 7 | 2013-11-14 | 2013-12-14 | 30
B | 8 | 2013-11-14 | 2014-01-14 | 60
B | 9 | 2013-11-14 | 2014-02-14 | 90
B | 10 | 2013-11-14 | 2014-03-14 | 120
For product A, the total sales 3 months after the release date of 2013-11-14 are 1+2+3=6. For product B, total sales 3 months after are 6+7+8=21.
Average sales per title for the month of 2013-11, 3 months after are (6+21)/2=13.5
For 6 months after it's ((1+2+3+4+5) + (6+7+8+9+10)) / 2 = 27.5
The release date is just the first date the product was sold - this is what the joined sub-query is for. There is probably a better way of doing it.
I tried this to get the averages across 3, 6, 12 and 24 months:
SELECT
to_char(a.release_date, 'YYYY-MM') AS release_date,
avg(CASE WHEN i.date - a.release_date < 92
THEN d.quantity END) AS three_months,
avg(CASE WHEN i.date - a.release_date < 183
THEN d.quantity END) AS six_months,
avg(CASE WHEN i.date - a.release_date < 365
THEN d.quantity END) AS twelve_months,
avg(CASE WHEN i.date - a.release_date < 730
THEN d.quantity END) AS twentyfour_months
FROM invoices i
JOIN invoice_details d ON i.id = d.invoice_id
JOIN (SELECT
d.item,
d.binding,
min(i.date) AS release_date
FROM invoices i
JOIN invoice_details d ON i.id = d.invoice_id
GROUP BY d.item, d.binding) AS a ON a.item = d.item AND a.binding = d.binding
WHERE
i.discount != 100 AND d.price != 0
AND (d.binding != 'Hardback' OR d.binding != 'Ebooks')
GROUP BY release_date
ORDER BY release_date desc
Obviously it's totally wrong because it's not grouping the results by title. It's giving me the average items per order rather than the average items per title.
By the way I am stuck on Postgres 8.2.
If I understand you correctly, this is what you want:
SELECT
to_char(date, 'YYYY-MM') AS release_date,
avg(CASE WHEN age < 92 THEN quantity ELSE 0 END) AS three_months,
avg(CASE WHEN age < 183 THEN quantity ELSE 0 END) AS six_months,
avg(CASE WHEN age < 365 THEN quantity ELSE 0 END) AS twelve_months,
avg(CASE WHEN age < 730 THEN quantity ELSE 0 END) AS twentyfour_months
FROM (
SELECT d.item, d.quantity, (i.date - fr.date) AS age, fr.date
FROM invoice_details d
JOIN (
SELECT d.item, min(i.date) AS date
FROM invoice_details d
JOIN invoices i ON i.id = d.invoice_id
WHERE d.binding != 'Hardback' AND d.binding != 'Ebooks'
GROUP BY d.item) AS fr USING (item)
JOIN invoices i ON i.id = d.invoice_id
WHERE i.discount != 100 AND d.price > 0) AS foo
GROUP BY release_date
ORDER BY release_date;
This is quite obviously untested because I can't even remember when I last touched an 8.2 installation. Your version does not have common table expressions or lateral joins, to name two critical features in later releases that would have made this rather more intuitive.
Anyway, the trick is to first calculate the age of every invoice relative to the book release date for every book sold, then average it out over the various time periods. Look carefully at the filters as I moved them and slightly altered them ((d.binding != 'Hardback' OR d.binding != 'Ebooks') is very likely not what you want).
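To get the per-title averages worked out by hand in the question (13.5 and 27.5), one option is to total the quantities per title first and only then average across titles. The sketch below assumes a single flattened table named sales with hypothetical columns title, quantity and invoice_date, and runs in SQLite through Python's sqlite3 module rather than Postgres 8.2. It uses only derived tables, no CTEs or window functions, so the shape should carry over, with julianday() differences replaced by plain date subtraction and strftime() by to_char().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (title TEXT, quantity INTEGER, invoice_date TEXT)")
dates = ["2013-11-14", "2013-12-14", "2014-01-14", "2014-02-14", "2014-03-14"]
# Title A sells 1..5 and title B sells 6..10 on the five dates.
sample = [("A", q, d) for q, d in zip(range(1, 6), dates)]
sample += [("B", q, d) for q, d in zip(range(6, 11), dates)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sample)

# fr:        release date per title (its first invoice date).
# per_title: quantity sold per title within each age window.
# Outer:     average of those per-title totals, by release month.
sql = """
SELECT strftime('%Y-%m', release_date) AS release_month,
       AVG(q3) AS three_months,
       AVG(q6) AS six_months
FROM (SELECT s.title,
             fr.release_date,
             SUM(CASE WHEN julianday(s.invoice_date) - julianday(fr.release_date) < 92
                      THEN s.quantity ELSE 0 END) AS q3,
             SUM(CASE WHEN julianday(s.invoice_date) - julianday(fr.release_date) < 183
                      THEN s.quantity ELSE 0 END) AS q6
      FROM sales AS s
      JOIN (SELECT title, MIN(invoice_date) AS release_date
            FROM sales
            GROUP BY title) AS fr ON fr.title = s.title
      GROUP BY s.title, fr.release_date) AS per_title
GROUP BY release_month
ORDER BY release_month
"""

rows = conn.execute(sql).fetchall()
print(rows)
```

The sale on 2014-02-14 is exactly 92 days after release, so it falls outside the three-month window, which matches the 1+2+3 and 6+7+8 totals in the question.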