Comparing SUM of values with different tables in SQL Server - sql

I have two tables holding similar values, and I need to compare the two and find the differences between them:
SQL FIDDLE - http://sqlfiddle.com/#!6/7412e/9
Now you can see there is a difference between between the 2 tables for the figures in Jun-17.
AS you can see (as a total for everyone) table 1 has £75 for June but table 2 has £125 for june.
The result I'm looking for is when amounts are summed together and compared between tables on a monthly basis, if there is a difference in amount between the two tables I want it listed under 'Unknown'.
| MonthYear | Person | Amount | Month total
+-----------+--------+--------+--------------
| Jun-17 | Sam | 25 | 75(Table1)
| Sep-17 | Ben | 50 | 50(Table2)
| Jun-17 | Tom | 50 | 75(Table1)
| Jun-17 | Sam | 25 | 125(Table2)
| Sep-17 | Ben | 50 | 50(Table2)
| Jun-17 | Tom | 50 | 125(Table2)
| Jun-17 | | 50 | 125(Table2)
Now when there is a difference between the amount total over a month I want the difference to be classed as unknown
e.g
| MonthYear | Person | Amount | Month total
+-----------+--------+--------+--------------
| Jun-17 | Sam | 25 | 75(Table1)
| Sep-17 | Ben | 50 | 50(Table2)
| Jun-17 | Tom | 50 | 75(Table1)
| Jun-17 | Sam | 25 | 125(Table2)
| Sep-17 | Ben | 50 | 50(Table2)
| Jun-17 | Tom | 50 | 125(Table2)
| Jun-17 | Unknown| 50 | 125(Table2)
I understand that you could create a case when the person is null to display unknown but i need it to be specifically calculated on the difference between the 2 tables on a monthly calculation.
Does this make sense to anyone, its really hard to explain.

Generally, in any FROM clause a table name can be replaced with another SELECT as long as you give it a corelation name (t1 and t2 in this one):
SELECT t1.MonthYear, t1.AmountT1, t2.AmountT2, t1.amountT1 - isnull(t2.amountT2, 0) as Unknown'
from
( SELECT
MonthYear,
SUM(Amount) AS [AmountT1]
FROM
Invoice
GROUP BY MonthYear) t1
left outer join
( SELECT
MonthYear,
SUM(Amount) AS [AmountT2]
FROM
Invoice2
GROUP BY MonthYear) t2 on t2.MonthYear = t1.MonthYear

Related

Complex nested aggregations to get order totals

I have a system to track orders and related expenditures. This is a Rails app running on PostgreSQL. 99% of my app gets by with plain old Rails Active Record call etc. This one is ugly.
The expenditures table look like this:
+----+----------+-----------+------------------------+
| id | category | parent_id | note |
+----+----------+-----------+------------------------+
| 1 | order | nil | order with no invoices |
+----+----------+-----------+------------------------+
| 2 | order | nil | order with invoices |
+----+----------+-----------+------------------------+
| 3 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
| 4 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
Each expenditure has many expenditure_items and can the orders can be parents to the invoices. That table looks like this:
+----+----------------+-------------+-------+---------+
| id | expenditure_id | cbs_item_id | total | note |
+----+----------------+-------------+-------+---------+
| 1 | 1 | 1 | 5 | Fuit |
+----+----------------+-------------+-------+---------+
| 2 | 1 | 2 | 15 | Veggies |
+----+----------------+-------------+-------+---------+
| 3 | 2 | 1 | 123 | Fuit |
+----+----------------+-------------+-------+---------+
| 4 | 2 | 2 | 456 | Veggies |
+----+----------------+-------------+-------+---------+
| 5 | 3 | 1 | 34 | Fuit |
+----+----------------+-------------+-------+---------+
| 6 | 3 | 2 | 76 | Veggies |
+----+----------------+-------------+-------+---------+
| 7 | 4 | 1 | 26 | Fuit |
+----+----------------+-------------+-------+---------+
| 8 | 4 | 2 | 98 | Veggies |
+----+----------------+-------------+-------+---------+
I need to track a few things:
amounts left to be invoiced on orders (thats easy)
above but rolled up for each cbs_item_id (this is the ugly part)
The cbs_item_id is basically an accounting code to categorize the money spent etc. I have visualized what my end result would look like:
+-------------+----------------+-------------+---------------------------+-----------+
| cbs_item_id | expenditure_id | order_total | invoice_total | remaining |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 1 | 5 | 0 | 5 |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 2 | 123 | 60 | 63 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 1 | 68 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 1 | 15 | 0 | 15 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 2 | 456 | 174 | 282 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 2 | 297 |
+-------------+----------------+-------------+---------------------------+-----------+
order_total is the sum of total for all the expenditure_items of the given order ( category = 'order'). invoice_total is the sum of total for all the expenditure_items with parent_id = expenditures.id. Remaining is calculated as the difference (but not greater than 0). In real terms the idea here is you place and order for $1000 and $750 of invoices come in. I need to calculate that $250 left on the order (remaining) - broken down into each category (cbs_item_id). Then I need the roll-up of all the remaining values grouped by the cbs_item_id.
So for each cbs_item_id I need group by each order, find the total for the order, find the total invoiced against the order then subtract the two (also can't be negative). It has to be on a per order basis - the overall aggregate difference will not return the expected results.
In the end looking for a result something like this:
+-------------+-----------+
| cbs_item_id | remaining |
+-------------+-----------+
| 1 | 68 |
+-------------+-----------+
| 2 | 297 |
+-------------+-----------+
I am guessing this might be a combination of GROUP BY and perhaps a sub query or even CTE (voodoo to me). My SQL skills are not that great and this is WAY above my pay grade.
Here is a fiddle for the data above:
http://sqlfiddle.com/#!17/2fe3a
Alternate fiddle:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e9528042874206477efbe0f0e86326fb
This query produces the result you are looking for:
SELECT cbs_item_id, sum(order_total - invoice_total) AS remaining
FROM (
SELECT cbs_item_id
, COALESCE(e.parent_id, e.id) AS expenditure_id -- ①
, COALESCE(sum(total) FILTER (WHERE e.category = 'order' ), 0) AS order_total -- ②
, COALESCE(sum(total) FILTER (WHERE e.category = 'invoice'), 0) AS invoice_total
FROM expenditures e
JOIN expenditure_items i ON i.expenditure_id = e.id
GROUP BY 1, 2 -- ③
) sub
GROUP BY 1
ORDER BY 1;
db<>fiddle here
① Note how I assume a saner table definition with expenditures.parent_id being integer, and true NULL instead of the string 'nil'. This allows the simple use of COALESCE.
② About the aggregate FILTER clause:
Aggregate columns with additional (distinct) filters
③ Using short syntax with ordinal numbers of an SELECT list items. Example:
Select first row in each GROUP BY group?
can I get the total of all the remaining for all rows or do I need to wrap that into another sub select?
There is a very concise option with GROUPING SETS:
...
GROUP BY GROUPING SETS ((1), ()) -- that's all :)
db<>fiddle here
Related:
Converting rows to columns

Count how many times certain fields in records with repeating values are found (or not found) in another table

First of all, I wish to say hi to the community here. The posts here have been a great help with VBA but this is my first question ever. I have a task that I need to solve in SQL (in MS Access) but it's sort of new to me and the task seems to be too complex.
I have a table in Access with the following structure(let's call it Tinvoices):
invoice | year | date | company | step | agent
5110001 | 2019 | 15/01/2019 | 1201 | 0 | John
5110001 | 2019 | 15/01/2019 | 1201 | 1 | Jack
5110002 | 2019 | 10/02/2019 | 1202 | 0 | John
5110002 | 2019 | 10/02/2019 | 1202 | 1 | Jack
5110002 | 2019 | 10/02/2019 | 1202 | 2 | Daniel
5110002 | 2019 | 10/02/2019 | 1202 | 3 | John
5110003 | 2019 | 12/03/2019 | 1205 | 0 | Jack
5110003 | 2019 | 12/03/2019 | 1205 | 1 | Daniel
5110003 | 2019 | 12/03/2019 | 1205 | 2 | David
This table relates to actions on invoices. Invoices and their related data are repeated with each step.
There is another table, which contains agents belonging to a certain department (let's call it Tdeptusers):
agent
John
Jack
What I need to do is the following. Have distinct lines for the invoices (the most unique key is combining the invoice, year and company) and counting in separate steps have been done by users in the Tdeptusers table and how many by users who are not in Tdeptusers. Something like this:
invoice | year | month | company | actionsByOurDept | actionsByOthers
5110001 | 2019 | 1 | 1201 | 2 | 0
5110002 | 2019 | 2 | 1202 | 3 | 1
5110003 | 2019 | 3 | 1205 | 1 | 2
I'm kind of a beginner, so you'll have to excuse me in providing usable codes. Being a complete beginner, I got stuck after the absolute basics. I have stuff like this:
SELECT
invoice,
year,
DatePart("m", Date) AS month,
company,
Sum(IIf(i.agent IN(d.agent), 1, 0)) AS actionsByOurDept,
Sum(IIf(i.agent IN(d.agent), 0, 1)) AS actionsByOthers
FROM Tinvoices AS i, Tdeptusers AS d
GROUP BY invoice, year, DatePart("m", Date), company;
This doesn't give back the desired result, mostly not in actionsByOthers, instead I get huge numbers. Maybe something similar to this solution might work but I haven't been able to do it.
Much appreciation for the help, folks.
Use proper standard explicit JOIN syntax :
SELECT i.invoice, year, DatePart("m", i.Date) AS month, i.company,
SUM( IIF(d.agent IS NOT NULL, 1, 0) ) AS actionsByOurDept,
SUM( IIf(d.agent IS NULL, 1, 0) ) AS actionsByOthers
FROM Tinvoices AS i LEFT JOIN
Tdeptusers AS d
ON d.agent = i.agent
GROUP BY i.invoice, i.year, DatePart("m", i.Date), i.company;
Use left join:
SELECT invoice, year, DatePart("m", Date) AS month, company,
COUNT(d.agent) AS actionsByOurDept,
SUM(IIF(d.agent IS NULL, 1, 0)) AS actionsByOthers
FROM Tinvoices AS i LEFT JOIN
Tdeptusers AS d
ON d.agent = i.agent
GROUP BY invoice, year, DatePart("m", Date), company;
You can directly count your department's users using COUNT().

How to join transactional data with customer data tables and perform case-based operations in SQL

I'm trying to perform a query between two different tables and come up with a case by case scenario, coming up with a list of records of calls for a specific month.
Here are my tables:
Customer table:
+----+----------------+------------+
| id | name | number |
+----+----------------+------------+
| 1 | John Doe | 8973221232 |
| 2 | American Dad | 7165531212 |
| 3 | Michael Clean | 8884731234 |
| 4 | Samuel Gatsby | 9197543321 |
| 5 | Mike Chat | 8794029819 |
+----+----------------+------------+
Transaction data:
+----------+------------+------------+----------+---------------------+
| trans_id | incoming | outgoing | duration | date_time |
+----------+------------+------------+----------+---------------------+
| 1 | 8973221232 | 9197543321 | 64 | 2018-03-09 01:08:09 |
| 2 | 3729920490 | 7651113929 | 276 | 2018-07-20 05:53:10 |
| 3 | 8884731234 | 8973221232 | 382 | 2018-05-02 13:12:13 |
| 4 | 8973221232 | 9234759208 | 127 | 2018-07-07 15:32:30 |
| 5 | 7165531212 | 9197543321 | 852 | 2018-08-02 07:40:23 |
| 6 | 8884731234 | 9833823023 | 774 | 2018-07-03 14:27:52 |
| 7 | 8273820928 | 2374987349 | 120 | 2018-07-06 05:27:44 |
| 8 | 8973221232 | 9197543321 | 79 | 2018-07-30 12:51:55 |
| 9 | 7165531212 | 7651113929 | 392 | 2018-05-22 02:27:38 |
| 10 | 5423541524 | 7165531212 | 100 | 2018-07-21 22:12:20 |
| 11 | 9197543321 | 2983479820 | 377 | 2018-07-20 17:46:36 |
| 12 | 8973221232 | 7651113929 | 234 | 2018-07-09 03:32:53 |
| 13 | 7165531212 | 2309483932 | 88 | 2018-07-16 16:22:21 |
| 14 | 8973221232 | 8884731234 | 90 | 2018-09-03 13:10:00 |
| 15 | 3820838290 | 2093482348 | 238 | 2018-04-12 21:59:01 |
+----------+------------+------------+----------+---------------------+
What am I trying to accomplish?
I'm trying to compile a list of "costs" for each of the customers that made calls on July 2018. The costs are based on:
1) If the customer received a call (incoming), the cost of the call is equal to the duration;
2) if the customer made a call (outgoing), the cost of the call is 100 if the call is 30 or less in duration. If it exceeds 30 duration, then the cost is 100 plus 5 * duration of the exceeded period.
If the customer didn't make any calls during that month he shouldn't be on the list.
Examples:
1) Customer American Dad has 3 incoming calls and 1 outgoing call, however only trans_id 10 and 13 are for the month of July. He should be paying a total of 538:
for trans_id 10 = 450 (100 for the first 30s + 5 * 70 for the remaining)
for trans_id 13 = 88
2) Customer Samuel Gatsby has 1 incoming call and 3 outgoing calls, however only trans_id 8 and 11 are for the month of July. He should be paying a total of 722:
for trans_id 8 = 345 (100 for the first 30s + 5 * 49 for the remaining)
for trans_id 11 = 377
Considering only these two examples, the output would be:
+----+----------------+------------+------------+
| id | name | number | billable |
+----+----------------+------------+------------+
| 2 | American Dad | 7165531212 | 538 |
| 4 | Samuel Gatsby | 9197543321 | 722 |
+----+----------------+------------+------------+
Note: Mike Chat shouldn't be on the list as he didn't make or receive any calls for that specific month.
What have I tried so far?
I've been playing cat and mouse with this one, I'm using the number as uniqueID, already attempted both a full outer join and combining where incoming or outgoing is not null then applying rules by case, tried doing a left join and applying cases, but I'm circling around and I can't get to a final list. Whenever I get incoming or outgoing, I'm either not able to apply the case or not able to come with both together. Really appreciate the help!
select customer_name.name, customer_name.number, bill = (CASE
WHEN customer_name.number = transaction_data.incoming then 'sum bill'
else 'multiply and add'
end)
from customer_name
left join transaction_data on customer_name.number = transaction_data.incoming or customer_name.name = transaction_data.outgoing
where strftime('%Y-%m', transaction_data.date_time) = '2018-07'
Note: I'm using sqlite to try it out online but the database is on SQL Server 2012, so I know that I can use a date format much easier, that way, but I'd like to keep as close to T-SQL as possible.
Also tried creating a case to determine whether it's incoming call or outgoing, but I'm only getting incoming as a result, even though trans_id 10 is outgoing:
select name, number, duration, case
when customer_name.number = transaction_data.incoming then 'incoming'
when customer_name.number = transaction_data.outgoing then 'outgoing'
END direction
from customer_name
left join transaction_data on customer_name.number = transaction_data.incoming or customer_name.name = transaction_data.outgoing
where strftime('%Y-%m', transaction_data.date_time) = '2018-07'
Try this:
SELECT
c."name", c.number,
SUM(CASE c.number
WHEN t.incoming THEN t.duration
ELSE IIF(t.duration - 30 < 0, 0, t.duration - 30) * 5 + 100
END) AS billable
FROM Customer AS c INNER JOIN [Transaction] AS t
ON c.number IN(t.incoming, t.outgoing)
WHERE t.date_time >= '20180701' AND t.date_time < '20180801'
GROUP BY c."name", c.number
Output:
| name | number | billable |
+---------------+------------+----------+
| John Doe | 8973221232 | 440 |
| American Dad | 7165531212 | 538 |
| Michael Clean | 8884731234 | 774 |
| Samuel Gatsby | 9197543321 | 722 |
Test it online with SQL Fiddle.

Slicing account balance data in BigQuery to generate a debit report

I have a collection of account balances over time:
+-----------------+------------+-------------+-----------------------+
| account_balance | department | customer_id | timestamp |
+-----------------+------------+-------------+-----------------------+
| 5 | A | 1 | 2019-02-12T00:00:00 |
| -10 | A | 1 | 2019-02-13T00:00:00 |
| -35 | A | 1 | 2019-02-14T00:00:00 |
| 20 | A | 1 | 2019-02-15T00:00:00 |
+-----------------+------------+-------------+-----------------------+
Each record shows the total account balance of a customer at a specified timestamp. The account balance increases e.g. to 20 from -35, when a customer tops-up his account with 55. As a customer uses a services, his account balances decreases e.g. from 5 to -10.
I want to aggregate this data in two ways:
1) Get the debit, credit and balance (credit-debit) of a department per month and year. The results from April should be a summary of all previous months:
+---------+--------+-------+------------+-------+--------+
| balance | credit | debit | department | month | year |
+---------+--------+-------+------------+-------+--------+
| 5 | 10 | -5 | A | 1 | 2019 |
| 20 | 32 | -12 | A | 2 | 2019 |
| 35 | 52 | -17 | A | 3 | 2019 |
| 51 | 70 | -19 | A | 4 | 2019 |
+---------+--------+-------+------------+-------+--------+
A customer's account balance might not change every month. There might be account balance records of customer 1 in February, but not March.
Notes towards the solution:
use EXTRACT(MONTH from timestamp) month
use EXTRACT(YEAR from timestamp) year
GROUP BY month, year, department
2) Get the change of debit, credit and balance of a department by date.
+---------+--------+-------+------------+-------------+
| balance | credit | debit | department | date |
+---------+--------+-------+------------+-------------+
| 5 | 10 | -5 | A | 2019-01-15 |
| 15 | 22 | -7 | A | 2019-02-15 |
| 15 | 20 | -5 | A | 2019-03-15 |
| 16 | 18 | -2 | A | 2019-04-15 |
+---------+--------+-------+------------+-------------+
51 70 -19
When I create a SUM of the deltas, I should get the same values as the last row from results in 1).
Notes towards the solution:
use account_balance - LAG(account_balance) OVER(PARTITION BY department ORDER BY timestamp ASC) delta to compute deltas
Your question is unclear, but it sounds like you want to get the outstanding balance at any given point in time.
The following query does this for 1 point in time.
with calendar as (
select cast('2019-06-01' as timestamp) as balance_calc_ts
),
most_recent_balance as (
select customer_id, balance_calc_ts,max(timestamp) as most_recent_balance_ts
from <table>
cross join calendar
where timestamp < balance_calc_ts -- or <=
group by 1,2
)
select t.customer_id, t.account_balance, mrb.balance_calc_ts
from <table> t
inner join most_recent_balance mrb on t.customer_id = mrb.customer_id and t.timestamp = mrb.balance_calc_ts
If you need to calculate it at a series of points in time, you will need to modify the calendar CTE to return more dates. This is the beauty of CROSS JOINS in BQ!

Nested query - looking for better solution

Consider a sales application where we have two tables SALES and INTERNAL_SALES.
Table "SALES" references the number of transactions made by each sales person outside the company.
Table "INTERNAL_SALES" references the number of transactions made by each sales person inside the company to another sales person.
SALES Table:
Each date has one entry against each sales person even if transactions are zero.
id | day | sales_person | number_of_transactions
1 | 2011-08-01 | Tom | 1000
2 | 2011-08-01 | Ben | 500
3 | 2011-08-01 | Anne | 1500
4 | 2011-08-02 | Tom | 0
5 | 2011-08-02 | Ben | 800
6 | 2011-08-02 | Anne | 900
7 | 2011-08-03 | Tom | 3000
8 | 2011-08-03 | Ben | 0
9 | 2011-08-03 | Anne | 40
INTERNAL_SALES Table:
This table logs only the transactions that were actually made between sales persons.
id | day | sales_person_from | sales_person_to | number_of_transactions
0 | 2011-08-01 | Tom | Ben | 10
1 | 2011-08-01 | Tom | Anne | 20
2 | 2011-08-01 | Ben | Tom | 50
3 | 2011-08-03 | Anne | Tom | 30
4 | 2011-08-03 | Anne | Tom | 30
Now the problem is to come up with total transactions by each sales person on a daily basis. The way I did this is:
SELECT day, sales_person, sum(num_transactions) from
(
SELECT day, sales_person, number_of_transactions As num_transactions FROM sales;
UNION
SELECT day, sales_person_from As sales_person, sum(number_of_transactions) As num_transactions FROM internal_sales GROUP BY day, sales_person_from;
)
GROUP BY day, sales_person;
This is too slow and looks ugly. I am seeking a better solution. By the way the database being used in Oracle and I have no control over database except that I can run queries against it.
There is no need to aggregate twice, and the union operator typically does an implicit unique sort which, again, in not necessary in your case.
SELECT day, sales_person, sum(num_transactions) from
(
SELECT day, sales_person, number_of_transactions As num_transactions FROM sales;
UNION ALL
SELECT day, sales_person_from, number_of_transactions FROM internal_sales;
)
GROUP BY day, sales_person;
Removing the intermediate aggregation and the unique sort should help a bit.