Count how many times certain fields in records with repeating values are found (or not found) in another table - sql

First of all, I wish to say hi to the community here. The posts here have been a great help with VBA but this is my first question ever. I have a task that I need to solve in SQL (in MS Access) but it's sort of new to me and the task seems to be too complex.
I have a table in Access with the following structure (let's call it Tinvoices):
invoice | year | date | company | step | agent
5110001 | 2019 | 15/01/2019 | 1201 | 0 | John
5110001 | 2019 | 15/01/2019 | 1201 | 1 | Jack
5110002 | 2019 | 10/02/2019 | 1202 | 0 | John
5110002 | 2019 | 10/02/2019 | 1202 | 1 | Jack
5110002 | 2019 | 10/02/2019 | 1202 | 2 | Daniel
5110002 | 2019 | 10/02/2019 | 1202 | 3 | John
5110003 | 2019 | 12/03/2019 | 1205 | 0 | Jack
5110003 | 2019 | 12/03/2019 | 1205 | 1 | Daniel
5110003 | 2019 | 12/03/2019 | 1205 | 2 | David
This table relates to actions on invoices. Invoices and their related data are repeated with each step.
There is another table, which contains agents belonging to a certain department (let's call it Tdeptusers):
agent
John
Jack
What I need is one distinct line per invoice (the most unique key is the combination of invoice, year and company), with separate columns counting how many steps were done by users in the Tdeptusers table and how many by users who are not in Tdeptusers. Something like this:
invoice | year | month | company | actionsByOurDept | actionsByOthers
5110001 | 2019 | 1 | 1201 | 2 | 0
5110002 | 2019 | 2 | 1202 | 3 | 1
5110003 | 2019 | 3 | 1205 | 1 | 2
I'm a complete beginner, so please excuse the lack of usable code; I got stuck after the absolute basics. I have stuff like this:
SELECT
invoice,
year,
DatePart("m", Date) AS month,
company,
Sum(IIf(i.agent IN(d.agent), 1, 0)) AS actionsByOurDept,
Sum(IIf(i.agent IN(d.agent), 0, 1)) AS actionsByOthers
FROM Tinvoices AS i, Tdeptusers AS d
GROUP BY invoice, year, DatePart("m", Date), company;
This doesn't return the desired result, especially for actionsByOthers, where I get huge numbers instead. Maybe something similar to this solution might work but I haven't been able to do it.
Much appreciation for the help, folks.

Use proper, standard, explicit JOIN syntax:
SELECT i.invoice, i.year, DatePart("m", i.Date) AS month, i.company,
SUM( IIF(d.agent IS NOT NULL, 1, 0) ) AS actionsByOurDept,
SUM( IIf(d.agent IS NULL, 1, 0) ) AS actionsByOthers
FROM Tinvoices AS i LEFT JOIN
Tdeptusers AS d
ON d.agent = i.agent
GROUP BY i.invoice, i.year, DatePart("m", i.Date), i.company;

Use left join:
SELECT invoice, year, DatePart("m", Date) AS month, company,
COUNT(d.agent) AS actionsByOurDept,
SUM(IIF(d.agent IS NULL, 1, 0)) AS actionsByOthers
FROM Tinvoices AS i LEFT JOIN
Tdeptusers AS d
ON d.agent = i.agent
GROUP BY invoice, year, DatePart("m", Date), company;
You can directly count your department's users using COUNT().
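To check the LEFT JOIN approach against the sample data from the question, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for Access. The SQLite dialect differs, so this is an assumption-laden translation rather than the Access query itself: the dates are stored as ISO strings, strftime replaces DatePart, and CASE replaces IIf.

```python
import sqlite3

# In-memory SQLite stand-in for the two Access tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tinvoices (invoice INTEGER, year INTEGER, date TEXT,
                        company INTEGER, step INTEGER, agent TEXT);
CREATE TABLE Tdeptusers (agent TEXT);
INSERT INTO Tinvoices VALUES
  (5110001, 2019, '2019-01-15', 1201, 0, 'John'),
  (5110001, 2019, '2019-01-15', 1201, 1, 'Jack'),
  (5110002, 2019, '2019-02-10', 1202, 0, 'John'),
  (5110002, 2019, '2019-02-10', 1202, 1, 'Jack'),
  (5110002, 2019, '2019-02-10', 1202, 2, 'Daniel'),
  (5110002, 2019, '2019-02-10', 1202, 3, 'John'),
  (5110003, 2019, '2019-03-12', 1205, 0, 'Jack'),
  (5110003, 2019, '2019-03-12', 1205, 1, 'Daniel'),
  (5110003, 2019, '2019-03-12', 1205, 2, 'David');
INSERT INTO Tdeptusers VALUES ('John'), ('Jack');
""")

# LEFT JOIN keeps every invoice step; d.agent is NULL when the agent
# is not in Tdeptusers, so COUNT(d.agent) counts only department users.
rows = conn.execute("""
SELECT i.invoice, i.year,
       CAST(strftime('%m', i.date) AS INTEGER) AS month,
       i.company,
       COUNT(d.agent) AS actionsByOurDept,
       SUM(CASE WHEN d.agent IS NULL THEN 1 ELSE 0 END) AS actionsByOthers
FROM Tinvoices AS i
LEFT JOIN Tdeptusers AS d ON d.agent = i.agent
GROUP BY i.invoice, i.year, CAST(strftime('%m', i.date) AS INTEGER), i.company
ORDER BY i.invoice
""").fetchall()

for row in rows:
    print(row)
```

The join only produces one row per invoice step because each agent appears at most once in Tdeptusers; a duplicated agent there would inflate the counts.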

Related

SQLite: generating customer counts for a date range (months) using a normalized table

I have a sales funnel dataset in SQLite and each row represents a movement through the funnel. As there are quite a few ways a potential customer can move through the funnel (and possibly even go backwards), I wasn't planning on flattening/denormalizing the table. How could I calculate "the number of customers per month up to today"?
customer | opp_value | status_old | status_new | current_status | status_change_date | current_lead_status | lead_created_date
cust_8 | 22 | confirmed | paying | paying | 2020-01-01 | Customer | 2020-01-01
cust_9 | 23 | confirmed | paying | churned | 2020-01-03 | Customer | 2020-01-02
cust_9 | 23 | paying | churned | churned | 2020-03-24 | Customer | 2020-02-25
cust_13 | 30 | negotiation | lost | paying | 2020-04-03 | Lost | 2020-03-20
cust_14 | 45 | qualified | confirmed | paying | 2020-03-03 | Customer | 2020-02-28
cust_14 | 45 | confirmed | paying | paying | 2020-04-03 | Customer | 2020-02-28
... | ... | ... | ... | ... | ... | ... | ...
We're assuming we use end-of-month as definition for whether a customer is still with us.
The result, with the above data should be:
month | customers
Jan-2020 | 2 (cust_8, cust_9)
Feb-2020 | 2 (cust_8, cust_9)
Mar-2020 | 1 (cust_8) # cust_9 churned
Apr-2020 | 2 (cust_8, cust_14)
May-2020 | 2 (cust_8, cust_14)
The part I'd really like to understand is how to create the month column, as I can't rely on the dates of status_change_date as there might be missing records. Would one have to manually generate that column? I know I can generate dates manually using:
WITH RECURSIVE cnt (
x
) AS (
SELECT 0
UNION ALL
SELECT x + 1
FROM cnt
LIMIT (
SELECT
ROUND(((julianday ('2020-05-01') - julianday ('2020-01-01')) / 30) + 1))
)
SELECT
date(julianday ('2020-01-01'), '+' || x || ' month') AS month
FROM cnt
but wondering if there is a better way? Would it possibly be easier to create a snapshot table and generate the current state of each customer for each date?
If you have the dates, you can use a brute-force method. This determines the most recent status for each customer for each date:
select dc.date,
sum(as_of_status = 'paying') as customers
from (select distinct d.date, t.customer,
first_value(t.status_new) over (partition by d.date, t.customer order by t.status_change_date desc) as as_of_status
from dates d join
t
on t.status_change_date <= d.date
) dc
group by dc.date
order by dc.date;
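As a rough end-to-end check, the sketch below runs this idea through Python's sqlite3 module on the six sample rows from the question. It assumes the funnel table is called t and keeps only three of its columns. The dates table is generated with a recursive CTE as month-end dates (an assumption based on the end-of-month rule in the question), and the column is named month_end here instead of date. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (customer TEXT, status_new TEXT, status_change_date TEXT);
INSERT INTO t VALUES
  ('cust_8',  'paying',    '2020-01-01'),
  ('cust_9',  'paying',    '2020-01-03'),
  ('cust_9',  'churned',   '2020-03-24'),
  ('cust_13', 'lost',      '2020-04-03'),
  ('cust_14', 'confirmed', '2020-03-03'),
  ('cust_14', 'paying',    '2020-04-03');
""")

monthly = conn.execute("""
WITH RECURSIVE cnt(x) AS (
  SELECT 0
  UNION ALL
  SELECT x + 1 FROM cnt WHERE x < 4          -- five months: Jan..May 2020
),
dates(month_end) AS (
  -- first day of the following month minus one day = last day of the month
  SELECT date('2020-01-01', '+' || (x + 1) || ' months', '-1 day') FROM cnt
),
dc AS (
  -- latest status of each customer as of each month end
  SELECT DISTINCT d.month_end, t.customer,
         first_value(t.status_new) OVER (
           PARTITION BY d.month_end, t.customer
           ORDER BY t.status_change_date DESC
         ) AS as_of_status
  FROM dates AS d
  JOIN t ON t.status_change_date <= d.month_end
)
SELECT strftime('%Y-%m', month_end) AS month,
       SUM(as_of_status = 'paying') AS customers
FROM dc
GROUP BY month_end
ORDER BY month_end
""").fetchall()

for month, customers in monthly:
    print(month, customers)
```

Because the month list comes from the CTE and not from status_change_date, a month with no status changes (February or May here) still gets a row.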

Is it possible to find the MAX value of an already aggregated calculation inside the same view?

I have created a calculation in Microsoft SQL Server Management Studio that creates a running total per company and quarter, but at a monthly level and this part works fine.
So if company X sold 40 apples, hypothetically, in Jan and then 60 in Feb, then the running total in Feb would be 100 and if they sold 30 in March, then March's running total would be 130 and then in April it would reset for the new quarter.
What I need now is to find the MAX of these values, per month across all companies. So if Company 'X' sold 100 in Feb, but Company 'Y' sold 150, I want to return 150.
The calculation I use to get the rolling values per quarter calls on two functions to calculate the quarter each month falls into, as well as the relevant Fiscal Period / year ('GetQuarter' and 'GetFiscalPeriod' being the functions).
So my question is, is there any way to find the max at a different level of detail (in this case across ALL Companies) when the value you are looking at is already aggregated at Company level?
I'm told Stored Procedures would make this a lot simpler but the software I use can't call on Stored Procedures, only views and tables.
SELECT
cm.Company_Code,
cm.[Date],
cm.Measure,
SUM(cm.Actual) OVER (
PARTITION BY (
SELECT dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2))),
cm.Measure,
cm.Company_Code,
(LEFT((SELECT dbo.GetFiscalPeriod(cm.[Date])), 4))
ORDER BY cm.[Date]
) AS Current_QTD_Actual
FROM mytable cm
Desired Output would look like the "MAX" field below:
+--------------+--------+-----+-----+----------+---------+-----+------------+
| Company_Code | Actual | QTD | MAX | Date | Measure | QTR | FiscalYear |
| AAA | 40 | 40 | 40 | 20180701 | Bananas | Q1 | 2019 |
| BBB | 35 | 35 | 40 | 20180701 | Bananas | Q1 | 2019 |
| AAA | 60 | 100 | 105 | 20180801 | Bananas | Q1 | 2019 |
| BBB | 70 | 105 | 105 | 20180801 | Bananas | Q1 | 2019 |
| AAA | 30 | 130 | 150 | 20180901 | Bananas | Q1 | 2019 |
| BBB | 45 | 150 | 150 | 20180901 | Bananas | Q1 | 2019 |
| AAA | 25 | 25 | 45 | 20181001 | Bananas | Q2 | 2019 |
| BBB | 45 | 45 | 45 | 20181001 | Bananas | Q2 | 2019 |
| AAA | 30 | 55 | 85 | 20181101 | Bananas | Q2 | 2019 |
| BBB | 40 | 85 | 85 | 20181101 | Bananas | Q2 | 2019 |
+--------------+--------+-----+-----+----------+---------+-----+------------+
As the QTD calculation I currently have is already a rolled up SUM, simply wrapping this in a MAX function does not work for obvious reasons.
I tried creating a temporary table within the calculation using examples I've seen online, which I then call back into the original table and max that value but I think my syntax is wrong because it never comes out right (I'm still a novice so temporary table syntaxes still elude me quite a bit).
You seem to want the cumulative sum of the maximum values for each month. If this is correct, you can use two levels of window functions:
select measure, fiscalyear, qtr, date, actual,
sum(actual) over (partition by measure, fiscalyear, qtr order by date) as running_actual
from (select t.*,
row_number() over (partition by measure, date order by actual desc) as seqnum
from t
) t
where seqnum = 1;
You can't stack aggregates together in the same SELECT, with the only exception of applying a windowed aggregate (with an OVER clause) over a regular aggregate. For example:
SELECT
T.GroupedColumn,
RowsByGroup = COUNT(*), -- Regular aggregate
SumOfAllRows = SUM(COUNT(*)) OVER () -- Windowed aggregate of a regular one
FROM
MyTable AS T
GROUP BY
T.GroupedColumn
You can, however, apply them if you wrap the former in a subquery or CTE, which also makes the query more readable IMO. I believe you are looking for something like the following:
;WITH RunningSumPerQuarterPerCompany AS
(
SELECT
cm.Company_Code,
cm.[Date],
cm.Measure,
Current_QTD_Actual = SUM(cm.Actual) OVER (
PARTITION BY
dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2)),
cm.Measure,
cm.Company_Code,
LEFT(dbo.GetFiscalPeriod(cm.[Date]), 4)
ORDER BY
cm.[Date]),
-- Add additional PARTITION BY columns for the GROUP BY later on
Quarter = dbo.GetQuarter(SUBSTRING(cm.[Date], 5, 2)),
FiscalPeriod = LEFT(dbo.GetFiscalPeriod(cm.[Date]), 4)
FROM
mytable cm
),
MaxRunningSumPerQuarter AS
(
SELECT
R.Quarter,
R.FiscalPeriod,
Max_Current_QTD_Actual = MAX(R.Current_QTD_Actual)
FROM
RunningSumPerQuarterPerCompany AS R
GROUP BY
R.Quarter,
R.FiscalPeriod -- GROUP BY whichever dimension you need
)
SELECT
R.*,
M.Max_Current_QTD_Actual
FROM
RunningSumPerQuarterPerCompany AS R
LEFT JOIN MaxRunningSumPerQuarter AS M ON
R.Quarter = M.Quarter AND
R.FiscalPeriod = M.FiscalPeriod -- Join by the GROUP BY columns to display the MAX
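If the maximum is needed per month, as in the desired output, a different variation is to apply a windowed MAX over the running total once it sits in a CTE. The sketch below is only an illustration in Python's sqlite3 module (SQLite 3.25 or later), not the SQL Server view itself. It assumes QTR and FiscalYear are stored as plain columns in place of the dbo.GetQuarter and dbo.GetFiscalPeriod calls, and Max_QTD_Actual is a made-up column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (Company_Code TEXT, Actual INTEGER, [Date] TEXT,
                      Measure TEXT, QTR TEXT, FiscalYear INTEGER);
INSERT INTO mytable VALUES
  ('AAA', 40, '20180701', 'Bananas', 'Q1', 2019),
  ('BBB', 35, '20180701', 'Bananas', 'Q1', 2019),
  ('AAA', 60, '20180801', 'Bananas', 'Q1', 2019),
  ('BBB', 70, '20180801', 'Bananas', 'Q1', 2019),
  ('AAA', 30, '20180901', 'Bananas', 'Q1', 2019),
  ('BBB', 45, '20180901', 'Bananas', 'Q1', 2019),
  ('AAA', 25, '20181001', 'Bananas', 'Q2', 2019),
  ('BBB', 45, '20181001', 'Bananas', 'Q2', 2019),
  ('AAA', 30, '20181101', 'Bananas', 'Q2', 2019),
  ('BBB', 40, '20181101', 'Bananas', 'Q2', 2019);
""")

rows = conn.execute("""
WITH qtd AS (
  -- running total per company, measure and fiscal quarter
  SELECT Company_Code, [Date], Measure,
         SUM(Actual) OVER (
           PARTITION BY QTR, Measure, Company_Code, FiscalYear
           ORDER BY [Date]
         ) AS Current_QTD_Actual
  FROM mytable
)
-- second window: largest running total across all companies for that month
SELECT Company_Code, [Date], Current_QTD_Actual,
       MAX(Current_QTD_Actual) OVER (PARTITION BY Measure, [Date]) AS Max_QTD_Actual
FROM qtd
ORDER BY [Date], Company_Code
""").fetchall()

for row in rows:
    print(row)
```

The same two-step shape (CTE, then a window over its result) fits inside a view, since it needs no temporary table or stored procedure.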

Comparing SUM of values with different tables in SQL Server

I have two tables holding similar values, and I need to compare the two and find the differences between them:
SQL FIDDLE - http://sqlfiddle.com/#!6/7412e/9
As you can see, there is a difference between the two tables for the figures in Jun-17.
As a total for everyone, table 1 has £75 for June but table 2 has £125 for June.
The result I'm looking for is when amounts are summed together and compared between tables on a monthly basis, if there is a difference in amount between the two tables I want it listed under 'Unknown'.
| MonthYear | Person | Amount | Month total
+-----------+--------+--------+--------------
| Jun-17 | Sam | 25 | 75(Table1)
| Sep-17 | Ben | 50 | 50(Table2)
| Jun-17 | Tom | 50 | 75(Table1)
| Jun-17 | Sam | 25 | 125(Table2)
| Sep-17 | Ben | 50 | 50(Table2)
| Jun-17 | Tom | 50 | 125(Table2)
| Jun-17 | | 50 | 125(Table2)
Now when there is a difference between the amount total over a month I want the difference to be classed as unknown
e.g
| MonthYear | Person | Amount | Month total
+-----------+--------+--------+--------------
| Jun-17 | Sam | 25 | 75(Table1)
| Sep-17 | Ben | 50 | 50(Table2)
| Jun-17 | Tom | 50 | 75(Table1)
| Jun-17 | Sam | 25 | 125(Table2)
| Sep-17 | Ben | 50 | 50(Table2)
| Jun-17 | Tom | 50 | 125(Table2)
| Jun-17 | Unknown| 50 | 125(Table2)
I understand that you could use a CASE expression to display 'Unknown' when the person is null, but I need it to be calculated specifically from the monthly difference between the two tables.
Does this make sense to anyone? It's really hard to explain.
Generally, in any FROM clause a table name can be replaced with another SELECT as long as you give it a correlation name (t1 and t2 in this one):
SELECT t1.MonthYear, t1.AmountT1, t2.AmountT2, t1.AmountT1 - isnull(t2.AmountT2, 0) as Unknown
from
( SELECT
MonthYear,
SUM(Amount) AS [AmountT1]
FROM
Invoice
GROUP BY MonthYear) t1
left outer join
( SELECT
MonthYear,
SUM(Amount) AS [AmountT2]
FROM
Invoice2
GROUP BY MonthYear) t2 on t2.MonthYear = t1.MonthYear
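Here is a small sketch of the derived-table comparison, run through Python's sqlite3 module. The fiddle data is not reproduced in the question, so the rows below are an assumption chosen to match the stated totals (£75 vs £125 for June); COALESCE stands in for SQL Server's isnull.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoice  (MonthYear TEXT, Person TEXT, Amount INTEGER);
CREATE TABLE Invoice2 (MonthYear TEXT, Person TEXT, Amount INTEGER);
-- Hypothetical rows that reproduce the monthly totals from the question.
INSERT INTO Invoice VALUES
  ('Jun-17', 'Sam', 25), ('Jun-17', 'Tom', 50), ('Sep-17', 'Ben', 50);
INSERT INTO Invoice2 VALUES
  ('Jun-17', 'Sam', 25), ('Jun-17', 'Tom', 50), ('Jun-17', NULL, 50),
  ('Sep-17', 'Ben', 50);
""")

# Aggregate each table per month first, then join the two summaries.
diffs = conn.execute("""
SELECT t1.MonthYear, t1.AmountT1, t2.AmountT2,
       t1.AmountT1 - COALESCE(t2.AmountT2, 0) AS Unknown
FROM (SELECT MonthYear, SUM(Amount) AS AmountT1
      FROM Invoice GROUP BY MonthYear) AS t1
LEFT JOIN (SELECT MonthYear, SUM(Amount) AS AmountT2
           FROM Invoice2 GROUP BY MonthYear) AS t2
  ON t2.MonthYear = t1.MonthYear
ORDER BY t1.MonthYear
""").fetchall()

for row in diffs:
    print(row)
```

With this data June shows a difference of -50 because the subtraction is table 1 minus table 2; swap the operands if the unknown amount should be positive.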

MS Access: How to group by first column- summing some columns while keeping others the same

I have Data that looks like:
ID | Year | State | Cost
----+-----------+-----------+-----------
1 | 2012 | CA | 10
2 | 2009 | FL | 90
3 | 2005 | MA | 50
2 | 2009 | FL | 75
1 | 2012 | CA | 110
I need it to look like:
ID | Year | State | Cost
----+-----------+-----------+-----------
1 | 2012 | CA | 120
2 | 2009 | FL | 165
3 | 2005 | MA | 50
So I need the year to remain the same, the state to remain the same, but the cost to be summed for each ID.
I know how to do the summing, but I don't know how to make the year and state stay the same.
You use a GROUP BY or TOTALS Query. Something like,
SELECT
ID,
[Year],
State,
Sum(Cost) As TotalCost
FROM
yourTable
GROUP BY
ID,
[Year],
State;
A Group By clause GROUPS the records based on the common information. The Sum adds up the column specified to give you the right information.
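For a quick check of the totals, the same query can be run on the sample rows with Python's sqlite3 module. This is a sketch in SQLite, not Access, though the GROUP BY behaves the same way; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (ID INTEGER, [Year] INTEGER, State TEXT, Cost INTEGER);
INSERT INTO yourTable VALUES
  (1, 2012, 'CA', 10),
  (2, 2009, 'FL', 90),
  (3, 2005, 'MA', 50),
  (2, 2009, 'FL', 75),
  (1, 2012, 'CA', 110);
""")

# Year and State go into GROUP BY so they can appear unaggregated in SELECT.
totals = conn.execute("""
SELECT ID, [Year], State, SUM(Cost) AS TotalCost
FROM yourTable
GROUP BY ID, [Year], State
ORDER BY ID
""").fetchall()

for row in totals:
    print(row)
```

This collapses to one row per ID only because Year and State are constant within each ID; if they varied, each distinct combination would get its own row.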

Nested query - looking for better solution

Consider a sales application where we have two tables SALES and INTERNAL_SALES.
Table "SALES" references the number of transactions made by each sales person outside the company.
Table "INTERNAL_SALES" references the number of transactions made by each sales person inside the company to another sales person.
SALES Table:
Each date has one entry against each sales person even if transactions are zero.
id | day | sales_person | number_of_transactions
1 | 2011-08-01 | Tom | 1000
2 | 2011-08-01 | Ben | 500
3 | 2011-08-01 | Anne | 1500
4 | 2011-08-02 | Tom | 0
5 | 2011-08-02 | Ben | 800
6 | 2011-08-02 | Anne | 900
7 | 2011-08-03 | Tom | 3000
8 | 2011-08-03 | Ben | 0
9 | 2011-08-03 | Anne | 40
INTERNAL_SALES Table:
This table logs only the transactions that were actually made between sales persons.
id | day | sales_person_from | sales_person_to | number_of_transactions
0 | 2011-08-01 | Tom | Ben | 10
1 | 2011-08-01 | Tom | Anne | 20
2 | 2011-08-01 | Ben | Tom | 50
3 | 2011-08-03 | Anne | Tom | 30
4 | 2011-08-03 | Anne | Tom | 30
Now the problem is to come up with total transactions by each sales person on a daily basis. The way I did this is:
SELECT day, sales_person, sum(num_transactions) from
(
SELECT day, sales_person, number_of_transactions As num_transactions FROM sales
UNION
SELECT day, sales_person_from As sales_person, sum(number_of_transactions) As num_transactions FROM internal_sales GROUP BY day, sales_person_from
)
GROUP BY day, sales_person;
This is too slow and looks ugly. I am seeking a better solution. By the way, the database being used is Oracle, and I have no control over the database except that I can run queries against it.
There is no need to aggregate twice, and the UNION operator typically does an implicit unique sort which, again, is not necessary in your case.
SELECT day, sales_person, sum(num_transactions) from
(
SELECT day, sales_person, number_of_transactions As num_transactions FROM sales
UNION ALL
SELECT day, sales_person_from, number_of_transactions FROM internal_sales
)
GROUP BY day, sales_person;
Removing the intermediate aggregation and the unique sort should help a bit.
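The sketch below runs the single-aggregation query on the sample data with Python's sqlite3 module (SQLite standing in for Oracle, so only the logic is checked, not the performance). The daily_totals helper is made up for this illustration. It also shows that, once the inner GROUP BY is gone, UNION ALL is required for correctness: plain UNION would merge Anne's two identical internal rows on 2011-08-03.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER, day TEXT, sales_person TEXT,
                    number_of_transactions INTEGER);
CREATE TABLE internal_sales (id INTEGER, day TEXT, sales_person_from TEXT,
                             sales_person_to TEXT, number_of_transactions INTEGER);
INSERT INTO sales VALUES
  (1, '2011-08-01', 'Tom', 1000), (2, '2011-08-01', 'Ben', 500),
  (3, '2011-08-01', 'Anne', 1500), (4, '2011-08-02', 'Tom', 0),
  (5, '2011-08-02', 'Ben', 800), (6, '2011-08-02', 'Anne', 900),
  (7, '2011-08-03', 'Tom', 3000), (8, '2011-08-03', 'Ben', 0),
  (9, '2011-08-03', 'Anne', 40);
INSERT INTO internal_sales VALUES
  (0, '2011-08-01', 'Tom', 'Ben', 10), (1, '2011-08-01', 'Tom', 'Anne', 20),
  (2, '2011-08-01', 'Ben', 'Tom', 50), (3, '2011-08-03', 'Anne', 'Tom', 30),
  (4, '2011-08-03', 'Anne', 'Tom', 30);
""")


def daily_totals(set_operator):
    """Return {(day, sales_person): total} using UNION or UNION ALL."""
    query = f"""
    SELECT day, sales_person, SUM(num_transactions)
    FROM (
      SELECT day, sales_person, number_of_transactions AS num_transactions
      FROM sales
      {set_operator}
      SELECT day, sales_person_from, number_of_transactions
      FROM internal_sales
    )
    GROUP BY day, sales_person
    """
    return {(day, person): total for day, person, total in conn.execute(query)}


with_union_all = daily_totals("UNION ALL")
with_union = daily_totals("UNION")

print(with_union_all[("2011-08-03", "Anne")])  # both 30-transaction rows kept
print(with_union[("2011-08-03", "Anne")])      # duplicate row removed by UNION
```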