SQL Query two values for each record - sql

I'm trying to query a customers table to get the total number of accounts per rep grouped by whether they were created this year or before.
CUSTOMER NAME
ACCOUNT REP
DATE CREATED
The result I'm trying to return would look like this:
REP | NEW_ACCOUNTS | OLD_ACCOUNTS | TOTAL
-----------------------------------------
Tom | 100 | 12 | 112
Ted | 15 | 1 | 16
The query I've written looks as follows.
SELECT REP, CASE WHEN YEAR(GETDATE()) > YEAR(DATE_CREATED) THEN 1 ELSE 0 END AS ThisYear
FROM CUSTOMERS
GROUP BY REP, DATE_CREATED
Unfortunately, this is giving me
REP | ThisYear
-----------------------------------------
Tom | 1
Ted | 0
Tom | 0
Ted | 1
Ted | 1

I think you want conditional aggregation:
SELECT REP,
SUM(CASE WHEN YEAR(GETDATE()) = YEAR(DATE_CREATED) THEN 1 ELSE 0 END) AS NEW_ACCOUNTS,
SUM(CASE WHEN YEAR(GETDATE()) > YEAR(DATE_CREATED) THEN 1 ELSE 0 END) AS OLD_ACCOUNTS,
COUNT(*) as TOTAL
FROM CUSTOMERS
GROUP BY REP;
This assumes that creation dates are not in the future -- a reasonable assumption.
If you want one row per REP, then the only column in the GROUP BY should be REP.
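To try the conditional aggregation end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no YEAR() or GETDATE(), so the sketch swaps in strftime('%Y', ...) and takes the reference year as a parameter; the accounts_per_rep helper, the lowercase table and column names, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (customer_name, rep, date_created as ISO text).
SAMPLE_CUSTOMERS = [
    ("Acme", "Tom", "2019-02-01"),
    ("Bolt", "Tom", "2019-05-10"),
    ("Core", "Tom", "2017-08-20"),
    ("Dyno", "Ted", "2018-11-30"),
    ("Echo", "Ted", "2019-01-15"),
]


def accounts_per_rep(rows, current_year):
    """Return one (rep, new_accounts, old_accounts, total) tuple per rep."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_name TEXT, rep TEXT, date_created TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    # Grouping by rep only gives one row per rep; each SUM(CASE ...) counts
    # the rows that satisfy its own condition.
    query = """
        SELECT rep,
               SUM(CASE WHEN strftime('%Y', date_created) = :yr THEN 1 ELSE 0 END) AS new_accounts,
               SUM(CASE WHEN strftime('%Y', date_created) < :yr THEN 1 ELSE 0 END) AS old_accounts,
               COUNT(*) AS total
        FROM customers
        GROUP BY rep
        ORDER BY rep
    """
    result = conn.execute(query, {"yr": str(current_year)}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in accounts_per_rep(SAMPLE_CUSTOMERS, 2019):
        print(row)
```

Passing the year in, rather than reading the clock inside the query, keeps the sketch repeatable; on SQL Server the original YEAR(GETDATE()) form does the same job.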

You want conditional aggregation:
SELECT REP,
SUM(CASE WHEN YEAR(GETDATE()) = YEAR(DATE_CREATED) THEN 1 ELSE 0 END) AS NEW_ACCOUNTS,
SUM(CASE WHEN YEAR(GETDATE()) > YEAR(DATE_CREATED) THEN 1 ELSE 0 END) AS OLD_ACCOUNTS, COUNT(*) AS TOTAL
FROM CUSTOMERS
GROUP BY REP;

Related

Sum only for Employee ID's present in latest snapshot

I have a database with a row per month for each employee working in our company. So, if employee A has been working for our company from July 2016 till now, this person has approx. 24 rows (one row for each month she was in service).
I'm trying to summarize the experience each of the current employees have in a particular function. So, if employee A has worked 6 months in Sales and 18 months in Marketing, then I count the number of rows this employee has Sales or Marketing in the column indicating the function.
I have written a query that seems to count the functional experience per employee, but it double counts data. It does not take the latest snapshot as its starting point.
SELECT A.EMPLOYEE_ID,
SUM(CASE WHEN A.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN A.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] AS A INNER JOIN [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] AS B ON A.EMPLOYEE_ID = B.EMPLOYEE_ID
WHERE B.WORKLEVEL_CODE > '1'
GROUP BY A.EMPLOYEE_ID
I expected the output for employee A to be EXP_CUS = 6 and EXP_MKT = 18. Instead, both values are much higher because rows are double counted. When I add the line AND B.SNAPSHOT_DATE = '2019-06-30', the output is correct. I'd rather not adjust the code manually every month; I'd like it to refer to the latest snapshot date automatically.
ADDED
The original table looks like this
SNAPSHOT_DATE | EMPLOYEE_ID | FUNCTION_CODE
2019-06-30 | 000000001 | CUS
2019-06-30 | 000000002 | MKT
2019-05-31 | 000000001 | CUS
2019-05-31 | 000000002 | MKT
2019-04-30 | 000000001 | MKT
2019-04-30 | 000000002 | MKT
The desired output would be
EMPLOYEE_ID | EXP_CUS | EXP_MKT
000000001 | 2 | 1
000000002 | 0 | 3
You can use PIVOT to get your desired result, as below:
SELECT EMPLOYEE_ID,
ISNULL([CUS],0) AS [EXP_CUS],
ISNULL([MKT],0) AS [EXP_MKT]
FROM
(
SELECT EMPLOYEE_ID,FUNCTION_CODE,COUNT(SNAPSHOT_DATE) T
FROM your_table
GROUP BY EMPLOYEE_ID,FUNCTION_CODE
)P
PIVOT(
SUM(T)
FOR FUNCTION_CODE IN ([CUS],[MKT])
)PVT
The output is:
EMPLOYEE_ID EXP_CUS EXP_MKT
000000001 2 1
000000002 0 3
I don't understand why you are using a self join. This seems to do what you want:
SELECT ED.EMPLOYEE_ID,
SUM(CASE WHEN ED.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN ED.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] ed
WHERE ED.WORKLEVEL_CODE > '1'
GROUP BY ED.EMPLOYEE_ID;
If you only want employees with the most recent snapshot date, then you can use window functions:
SELECT ED.EMPLOYEE_ID,
SUM(CASE WHEN ED.FUNCTION_CODE ='CUS' THEN 1 ELSE 0 END) AS EXP_CUS,
SUM(CASE WHEN ED.FUNCTION_CODE ='MKT' THEN 1 ELSE 0 END) AS EXP_MKT
FROM (SELECT ED.*,
MAX(SNAPSHOT_DATE) OVER () as OVERALL_MAX_SNAPSHOT_DATE,
MAX(SNAPSHOT_DATE) OVER (PARTITION BY EMPLOYEE_ID) as EMPLOYEE_MAX_SNAPSHOT_DATE
FROM [dbname].[AGL_V_HRA_FE_R].[VW_HRA_EMPLOYEE_DETAIL] ED
) ED
WHERE ED.WORKLEVEL_CODE > '1' AND
EMPLOYEE_MAX_SNAPSHOT_DATE = OVERALL_MAX_SNAPSHOT_DATE
GROUP BY ED.EMPLOYEE_ID;
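As a small runnable check of the "latest snapshot only" idea, here is a sketch using Python's sqlite3 module. It swaps the window functions for a plain HAVING clause that compares each employee's latest snapshot with the overall latest one, which expresses the same filter. The employee_detail table name, the experience_for_current_employees helper, and employee 000000003 (who has no row in the latest snapshot) are hypothetical, and the WORKLEVEL_CODE filter is left out because the sample table has no such column.

```python
import sqlite3

# Rows from the question's sample table, plus one hypothetical former
# employee (000000003) who is missing from the latest snapshot.
SAMPLE_SNAPSHOTS = [
    ("2019-06-30", "000000001", "CUS"),
    ("2019-06-30", "000000002", "MKT"),
    ("2019-05-31", "000000001", "CUS"),
    ("2019-05-31", "000000002", "MKT"),
    ("2019-04-30", "000000001", "MKT"),
    ("2019-04-30", "000000002", "MKT"),
    ("2019-04-30", "000000003", "CUS"),
]


def experience_for_current_employees(rows):
    """Count months per function, only for employees in the latest snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_detail "
        "(snapshot_date TEXT, employee_id TEXT, function_code TEXT)"
    )
    conn.executemany("INSERT INTO employee_detail VALUES (?, ?, ?)", rows)
    # HAVING keeps an employee only if their most recent snapshot is also
    # the most recent snapshot in the whole table.
    query = """
        SELECT employee_id,
               SUM(CASE WHEN function_code = 'CUS' THEN 1 ELSE 0 END) AS exp_cus,
               SUM(CASE WHEN function_code = 'MKT' THEN 1 ELSE 0 END) AS exp_mkt
        FROM employee_detail
        GROUP BY employee_id
        HAVING MAX(snapshot_date) = (SELECT MAX(snapshot_date) FROM employee_detail)
        ORDER BY employee_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in experience_for_current_employees(SAMPLE_SNAPSHOTS):
        print(row)
```

Because the cutoff comes from MAX(snapshot_date), nothing needs to be edited when a new month is loaded.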

How to flag active customers who have at least one transaction per month?

Objective is to create a flag for active customers.
An active customer is someone who has at least one transaction every month.
Time frame - May 2018 to May 2019
Data is at transaction level
-------------------------------------
txn_id | txn_date | name | amount
-------------------------------------
101 2018-05-01 ABC 100
102 2018-05-02 ABC 200
-------------------------------------
output should be like this -
----------------
name | flag
----------------
ABC active
BCF inactive
You can use aggregation to get the active customers:
select name
from t
where txn_date >= '2018-05-01' and txn_date < '2019-06-01'
group by name
having count(distinct last_day(txn_date)) = 13 -- all months accounted for
EDIT:
If you want a flag, just move the condition to a case expression:
select name,
(case when count(distinct case when txn_date >= '2018-05-01' and txn_date < '2019-06-01' then last_day(txn_date) end) = 13
then 'active' else 'inactive'
end) as flag
from t
group by name;
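The flag logic can be tried on a small data set with Python's sqlite3 module. This is a sketch under a few assumptions: SQLite has no last_day(), so strftime('%Y-%m', ...) serves as the per-month key instead; the window bounds and the expected month count are parameters; and the txns table, the flag_customers helper, and the sample rows are hypothetical. The sample uses a three-month window to keep the data short; for May 2018 to May 2019 the count would be 13.

```python
import sqlite3

# Hypothetical transactions: (txn_id, txn_date, name, amount).
SAMPLE_TXNS = [
    (101, "2018-05-01", "ABC", 100),
    (102, "2018-05-02", "ABC", 200),
    (103, "2018-06-15", "ABC", 50),
    (104, "2018-07-20", "ABC", 75),
    (105, "2018-05-03", "BCF", 10),
    (106, "2018-07-04", "BCF", 20),
    (107, "2018-08-09", "BCF", 30),  # outside the window, must not count
]


def flag_customers(rows, start, end_exclusive, n_months):
    """Flag a customer 'active' if every month in the window has a txn."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txns (txn_id INTEGER, txn_date TEXT, name TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO txns VALUES (?, ?, ?, ?)", rows)
    # The inner CASE yields a month key only for in-window rows (NULL
    # otherwise), and COUNT(DISTINCT ...) ignores NULLs.
    query = """
        SELECT name,
               CASE WHEN COUNT(DISTINCT CASE
                                   WHEN txn_date >= :lo AND txn_date < :hi
                                   THEN strftime('%Y-%m', txn_date)
                               END) = :n
                    THEN 'active' ELSE 'inactive'
               END AS flag
        FROM txns
        GROUP BY name
        ORDER BY name
    """
    params = {"lo": start, "hi": end_exclusive, "n": n_months}
    result = conn.execute(query, params).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(flag_customers(SAMPLE_TXNS, "2018-05-01", "2018-08-01", 3))
```

Keeping the date filter inside the CASE rather than in a WHERE clause means customers with no transactions in the window still appear, flagged inactive.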

How to classify or group values based on prior day values?

I have a data set that repeats daily and shows sales. If a product is released on Day 1 and has between 1-5 sales AND also if on Day 2 it has between 10-50 sales, I want to classify it as "Limited Sales."
If a product is released on Day 1 and has over 1,000 sales and also if on Day 2 it has over 1,000 sales, I want to classify it as "Wide Sales."
How would I go about doing this in standard SQL?
I've tried some workarounds using CASE WHEN, but I ultimately run into issues: while I can classify the 1st column with an output, I can't get the 2nd column to have an output that is also based on the 1st output (e.g. Column 1 is TRUE, but Column 2 is FALSE). What I need is for Column 1 = TRUE and Column 2 = TRUE.
Here's what the table looks like, followed by a sample query:
Columns: name, day_number, sales
1. Jack | 1 | 5
2. Jack | 2 | 10
3. Mary | 1 | 1250
4. Mary | 2 | 1500
SELECT name,
day_number,
sales,
CASE
WHEN day_number = 1
AND sales >= 1
AND sales <= 5
THEN "LIMITED SALES"
ELSE "WIDE SALES"
END AS status_1,
CASE
WHEN day_number = 2
AND sales >= 10
AND sales <= 50
THEN TRUE
ELSE FALSE
END AS status_2
FROM table
Unfortunately this isn't really going to get me what I want. At the end of the day, I would like to see results like:
1. Jack | 1 | 5 | LIMITED SALES
2. Jack | 2 | 10 | LIMITED SALES
3. Mary | 1 | 1250 | WIDE SALES
4. Mary | 2 | 1500 | WIDE SALES
Is this what you want?
select name,
(case when sum(case when day_number = 1 then sales end) between 1 and 5 and
sum(case when day_number = 2 then sales end) between 10 and 50
then 'Limited Sales'
when sum(case when day_number = 1 then sales end) > 1000 and
sum(case when day_number = 2 then sales end) > 1000
then 'Wide Sales'
else '???'
end) as sales_category
from t
group by name
If you want this on each of the original rows, then use window functions or a join.
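One way the join variant might look, sketched with Python's sqlite3 module: the per-name classification is computed in a derived table and joined back to the detail rows. The sales table name, the classify_rows helper, and the use of SQLite are assumptions for illustration; the sample rows are the four from the question.

```python
import sqlite3

# The four sample rows from the question: (name, day_number, sales).
SAMPLE_SALES = [
    ("Jack", 1, 5),
    ("Jack", 2, 10),
    ("Mary", 1, 1250),
    ("Mary", 2, 1500),
]


def classify_rows(rows):
    """Attach the per-name sales category to every original row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (name TEXT, day_number INTEGER, sales INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # The derived table c holds one category per name; joining on name
    # repeats that category on each of the name's detail rows.
    query = """
        SELECT s.name, s.day_number, s.sales, c.sales_category
        FROM sales s
        JOIN (
            SELECT name,
                   CASE WHEN (SUM(CASE WHEN day_number = 1 THEN sales END) BETWEEN 1 AND 5)
                         AND (SUM(CASE WHEN day_number = 2 THEN sales END) BETWEEN 10 AND 50)
                        THEN 'Limited Sales'
                        WHEN SUM(CASE WHEN day_number = 1 THEN sales END) > 1000
                         AND SUM(CASE WHEN day_number = 2 THEN sales END) > 1000
                        THEN 'Wide Sales'
                        ELSE '???'
                   END AS sales_category
            FROM sales
            GROUP BY name
        ) c ON c.name = s.name
        ORDER BY s.name, s.day_number
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in classify_rows(SAMPLE_SALES):
        print(row)
```

A window-function version would replace the derived table with SUM(...) OVER (PARTITION BY name); the join form is shown here because it also works on engines without window functions.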

sql query to get difference of sum over two columns spread across two tables grouped by month [closed]

I want to write a query to get difference of sum over two columns spread across two tables grouped by month.
Schema:
TableA
mass numeric,
weight numeric,
sampleDt date
TableB
mass numeric,
weight numeric,
sampleDt date
Sample Data for Table A
100|200|2017-01-03
10 |20 |2017-01-05
200|400|2017-12-23
Sample Data for Table B
10 | 20 | 2017-01-20
10 | 20 | 2017-01-21
100 | 200 | 2017-12-12
2 | 4 | 2017-06-12
Expected Output
Month,Year |AMassTotal |AWeightTotal |BMassTotal |BWeightTotal |AMassTotal-BMassTotal
Jan,17 | 110 | 220 | 20 | 40 | 90
Jun,17 | 0 | 0 | 2 | 4 | -2
Dec,17 | 200 | 400 |100 |200 | 100
Use a full outer join and a group by:
select to_char(sampledt, 'yyyy-mm') as month_year,
coalesce(sum(a.mass),0) as a_mass_total,
coalesce(sum(a.weight),0) as a_weight_total,
coalesce(sum(b.mass),0) as b_mass_total,
coalesce(sum(b.weight),0) as b_weight_total,
coalesce(sum(a.mass),0) - coalesce(sum(b.mass),0) as mass_total_diff
from table_a a
full join table_b b using (sampledt)
group by to_char(sampledt, 'yyyy-mm');
If you want year and month in separate columns, you can use:
select extract(year from sampledt) as year,
extract(month from sampledt) as month,
coalesce(sum(a.mass),0) as a_mass_total,
coalesce(sum(a.weight),0) as a_weight_total,
coalesce(sum(b.mass),0) as b_mass_total,
coalesce(sum(b.weight),0) as b_weight_total,
coalesce(sum(a.mass),0) - coalesce(sum(b.mass),0) as mass_total_diff
from table_a a
full join table_b b using (sampledt)
group by extract(year from sampledt), extract(month from sampledt)
order by 1,2;
Except for the to_char() function the above is ANSI standard SQL.
Online example: http://rextester.com/YAN23912
For MySQL
SELECT TB.`Month,Year`,
IFNULL(AMassTotal,0),
IFNULL(AWeightTotal,0),
IFNULL(BMassTotal,0),
IFNULL(BWeightTotal,0),
(IFNULL(AMassTotal,0)-IFNULL(BMassTotal,0)) AS 'AMassTotal-BMassTotal'
FROM
(
SELECT DATE_FORMAT(sampleDt,'%M,%Y') AS `Month,Year`,
SUM(CASE WHEN mass IS NULL THEN 0 ELSE mass END) AS AMassTotal,
SUM(CASE WHEN weight IS NULL THEN 0 ELSE weight END) AS AWeightTotal
FROM TableA
GROUP BY DATE_FORMAT(sampleDt,'%M,%Y')
) AS TA
RIGHT JOIN
(
SELECT DATE_FORMAT(sampleDt,'%M,%Y') AS `Month,Year`,
SUM(CASE WHEN mass IS NULL THEN 0 ELSE mass END) AS BMassTotal,
SUM(CASE WHEN weight IS NULL THEN 0 ELSE weight END) AS BWeightTotal
FROM TableB
GROUP BY DATE_FORMAT(sampleDt,'%M,%Y')
) AS TB
ON TA.`Month,Year`=TB.`Month,Year`
Live Demo
http://sqlfiddle.com/#!9/2a6e24/22
Try this:
This works in SQL Server:
SELECT CONVERT(CHAR(3), sampleDt, 0)+','+CAST(DATEPART(YEAR,sampleDt) AS VARCHAR) [Month,Year]
,ISNULL(SUM(CASE WHEN D.Tab=1 THEN mass END),0) AMassTotal
,ISNULL(SUM(CASE WHEN D.Tab=1 THEN weight END),0) AWeightTotal
,ISNULL(SUM(CASE WHEN D.Tab=2 THEN mass END),0) BMassTotal
,ISNULL(SUM(CASE WHEN D.Tab=2 THEN weight END),0) BWeightTotal
,ISNULL(SUM(CASE WHEN D.Tab=1 THEN mass END),0)-ISNULL(SUM(CASE WHEN D.Tab=2 THEN mass END),0) [AMassTotal-BMassTotal]
FROM(
SELECT 1 AS Tab,* FROM TableA
UNION ALL
SELECT 2,* FROM TableB
)D
GROUP BY CONVERT(CHAR(3), sampleDt, 0)+','+CAST(DATEPART(YEAR,sampleDt) AS VARCHAR)
select CONVERT(CHAR(3), GETDATE(), 0)
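The UNION ALL approach can be checked against the question's sample data with a short sqlite3 sketch. Assumptions: SQLite stands in for SQL Server, so strftime('%Y-%m', ...) replaces the CONVERT/DATEPART month label and COALESCE replaces ISNULL; the snake_case table and column names and the monthly_mass_diff helper are hypothetical.

```python
import sqlite3

# Sample data from the question: (mass, weight, sample_dt).
TABLE_A_ROWS = [(100, 200, "2017-01-03"), (10, 20, "2017-01-05"), (200, 400, "2017-12-23")]
TABLE_B_ROWS = [
    (10, 20, "2017-01-20"),
    (10, 20, "2017-01-21"),
    (100, 200, "2017-12-12"),
    (2, 4, "2017-06-12"),
]


def monthly_mass_diff(a_rows, b_rows):
    """Return per-month totals for both tables plus the mass difference."""
    conn = sqlite3.connect(":memory:")
    for table in ("table_a", "table_b"):
        conn.execute(f"CREATE TABLE {table} (mass INTEGER, weight INTEGER, sample_dt TEXT)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", a_rows)
    conn.executemany("INSERT INTO table_b VALUES (?, ?, ?)", b_rows)
    # Stack both tables with a tag column, then aggregate each side with a
    # CASE on the tag. COALESCE is applied to each SUM separately so a month
    # missing from one table contributes 0 rather than NULL.
    query = """
        SELECT strftime('%Y-%m', sample_dt) AS month_year,
               COALESCE(SUM(CASE WHEN tab = 1 THEN mass END), 0)   AS a_mass_total,
               COALESCE(SUM(CASE WHEN tab = 1 THEN weight END), 0) AS a_weight_total,
               COALESCE(SUM(CASE WHEN tab = 2 THEN mass END), 0)   AS b_mass_total,
               COALESCE(SUM(CASE WHEN tab = 2 THEN weight END), 0) AS b_weight_total,
               COALESCE(SUM(CASE WHEN tab = 1 THEN mass END), 0)
                 - COALESCE(SUM(CASE WHEN tab = 2 THEN mass END), 0) AS mass_total_diff
        FROM (
            SELECT 1 AS tab, mass, weight, sample_dt FROM table_a
            UNION ALL
            SELECT 2, mass, weight, sample_dt FROM table_b
        ) d
        GROUP BY strftime('%Y-%m', sample_dt)
        ORDER BY month_year
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_mass_diff(TABLE_A_ROWS, TABLE_B_ROWS):
        print(row)
```

The June row is the interesting case: TableA has no data for that month, so the difference only comes out as -2 when each sum is defaulted to 0 before subtracting.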

SQL to work out sales by product taking into account age

I want to work out sales by product grouped by release date, but also grouped by the age of that product when sold, something like this:
| 3 months | 6 months
2015-01 | 28.1 | 37.1
2015-02 | 29.3 | 35.6
So 28.1 is the average number of products sold of each type, 3 months after being released, for those products released in 2015-01. There are obviously more products sold 6 months after the release date, 37.1.
The following SQL gets a list of sales:
SELECT
d.item AS title,
d.quantity,
a.firstdate AS release_date,
i.date AS invoice_date,
i.date - a.firstdate AS age
FROM invoices i
JOIN invoice_details d ON i.id = d.invoice_id
JOIN (SELECT
d.item,
d.binding,
min(i.date) AS firstdate
FROM invoices i
JOIN invoice_details d ON i.id = d.invoice_id
GROUP BY d.item, d.binding) AS a ON a.item = d.item AND a.binding = d.binding
WHERE
i.discount != 100 AND d.price > 0
AND (d.binding != 'Hardback' OR d.binding != 'Ebooks')
ORDER BY title, invoice_date
And the result looks something like:
title | quantity | release date | invoice date | age
A | 1 | 2013-11-14 | 2013-11-14 | 0
A | 2 | 2013-11-14 | 2013-12-14 | 30
A | 3 | 2013-11-14 | 2014-01-14 | 60
A | 4 | 2013-11-14 | 2014-02-14 | 90
A | 5 | 2013-11-14 | 2014-03-14 | 120
B | 6 | 2013-11-14 | 2013-11-14 | 0
B | 7 | 2013-11-14 | 2013-12-14 | 30
B | 8 | 2013-11-14 | 2014-01-14 | 60
B | 9 | 2013-11-14 | 2014-02-14 | 90
B | 10 | 2013-11-14 | 2014-03-14 | 120
For product A, the total sales 3 months after the release date of 2013-11-14 are 1+2+3=6. For product B, total sales 3 months after are 6+7+8=21.
Average sales per title for the month of 2013-11, 3 months after are (6+21)/2=13.5
For 6 months after it's ((1+2+3+4+5) + (6+7+8+9+10)) / 2 = 27.5
The release date is just the first date the product was sold - this is what the joined sub-query is for. There is probably a better way of doing it.
I tried this to get the averages across 3, 6, 12 and 24 months:
SELECT
to_char(a.release_date, 'YYYY-MM') AS release_date,
avg(CASE WHEN i.date - a.release_date < 92
THEN d.quantity END) AS three_months,
avg(CASE WHEN i.date - a.release_date < 183
THEN d.quantity END) AS six_months,
avg(CASE WHEN i.date - a.release_date < 365
THEN d.quantity END) AS twelve_months,
avg(CASE WHEN i.date - a.release_date < 730
THEN d.quantity END) AS twentyfour_months
FROM invoices i
JOIN invoice_details d ON i.id = d.invoice_id
JOIN (SELECT
d.item,
d.binding,
min(i.date) AS release_date
FROM invoices i
JOIN invoice_details d ON i.id = d.invoice_id
GROUP BY d.item, d.binding) AS a ON a.item = d.item AND a.binding = d.binding
WHERE
i.discount != 100 AND d.price != 0
AND (d.binding != 'Hardback' OR d.binding != 'Ebooks')
GROUP BY release_date
ORDER BY release_date desc
Obviously it's totally wrong because it's not grouping the results by title. It's giving me the average items per order rather than the average items per title.
By the way I am stuck on Postgres 8.2.
If I understand you correctly, this is what you want:
SELECT
to_char(date, 'YYYY-MM') AS release_date,
avg(CASE WHEN age < 92 THEN quantity ELSE 0 END) AS three_months,
avg(CASE WHEN age < 183 THEN quantity ELSE 0 END) AS six_months,
avg(CASE WHEN age < 365 THEN quantity ELSE 0 END) AS twelve_months,
avg(CASE WHEN age < 730 THEN quantity ELSE 0 END) AS twentyfour_months
FROM (
SELECT d.item, d.quantity, (i.date - fr.date) AS age, fr.date
FROM invoice_details d
JOIN (
SELECT d.item, min(i.date) AS date
FROM invoice_details d
JOIN invoices i ON i.id = d.invoice_id
WHERE d.binding != 'Hardback' AND d.binding != 'Ebooks'
GROUP BY d.item) AS fr USING (item)
JOIN invoices i ON i.id = d.invoice_id
WHERE i.discount != 100 AND d.price > 0) AS foo
GROUP BY release_date
ORDER BY release_date;
This is quite obviously untested because I can't even remember when I last touched an 8.2 installation. Your version does not have common table expressions or lateral joins, to name two critical features in later releases that would have made this rather more intuitive.
Anyway, the trick is to first calculate the age of every invoice relative to the book release date for every book sold, then average it out over the various time periods. Look carefully at the filters as I moved them and slightly altered them ((d.binding != 'Hardback' OR d.binding != 'Ebooks') is very likely not what you want).
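To get an average per title rather than per invoice row, one option is to sum the quantities per title first and only then average across titles, which is what the question's worked numbers (13.5 and 27.5) describe. The sketch below tries that with Python's sqlite3 module, using only derived tables (no common table expressions). It makes several assumptions: a single flattened sales table stands in for invoices and invoice_details, julianday() differences replace Postgres date subtraction, only the 92-day and 183-day cutoffs from the question are shown, and the average_sales_by_release_month helper is hypothetical.

```python
import sqlite3

# The question's sample: titles A and B, both first sold on 2013-11-14.
INVOICE_DATES = ["2013-11-14", "2013-12-14", "2014-01-14", "2014-02-14", "2014-03-14"]
SAMPLE_SALES = (
    [("A", qty, day) for qty, day in zip(range(1, 6), INVOICE_DATES)]
    + [("B", qty, day) for qty, day in zip(range(6, 11), INVOICE_DATES)]
)


def average_sales_by_release_month(rows):
    """Average per-title sales totals, grouped by release month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (title TEXT, quantity INTEGER, invoice_date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Inner query: one row per title with its total quantity inside each age
    # window (release date = first sale). Outer query: average those totals
    # over all titles released in the same month.
    query = """
        SELECT strftime('%Y-%m', release_date) AS release_month,
               AVG(three_months) AS three_months,
               AVG(six_months)   AS six_months
        FROM (
            SELECT s.title,
                   r.release_date,
                   SUM(CASE WHEN julianday(s.invoice_date) - julianday(r.release_date) < 92
                            THEN s.quantity ELSE 0 END) AS three_months,
                   SUM(CASE WHEN julianday(s.invoice_date) - julianday(r.release_date) < 183
                            THEN s.quantity ELSE 0 END) AS six_months
            FROM sales s
            JOIN (
                SELECT title, MIN(invoice_date) AS release_date
                FROM sales
                GROUP BY title
            ) r ON r.title = s.title
            GROUP BY s.title, r.release_date
        ) per_title
        GROUP BY strftime('%Y-%m', release_date)
        ORDER BY release_month
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(average_sales_by_release_month(SAMPLE_SALES))
```

With real calendar dates, the sale on 2014-02-14 falls exactly 92 days after release, so the strict < 92 cutoff excludes it and the three-month totals are 6 and 21, as in the question.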