How to determine date paid given billing and payment data in BigQuery - sql

I need to know when a bill was paid to determine how early or late it was paid. Unfortunately, I just have billing creation data and payment records.
Example 1
--raw data
WITH bill AS (
SELECT 'b1' AS id, DATE('2022-01-01') AS created, DATE('2022-01-15') AS due, 50 AS amount UNION ALL
SELECT 'b2', DATE('2022-01-01'), DATE('2022-01-30'), 50 UNION ALL
SELECT 'b3', DATE('2022-01-03'), DATE('2022-01-17'), 50 UNION ALL
SELECT 'b4', DATE('2022-01-03'), DATE('2022-02-01'), 50 UNION ALL
SELECT 'b5', DATE('2022-01-05'), DATE('2022-01-19'), 50 UNION ALL
SELECT 'b6', DATE('2022-01-05'), DATE('2022-02-04'), 50
),
payment AS (
SELECT 'p1' AS id, DATE('2022-01-10') AS made, 50 AS amount UNION ALL
SELECT 'p2', DATE('2022-01-11'), 50
)
-- setup
SELECT * FROM bill
I want a query that returns all of the data from the bill table plus the date that the bill was paid, which is theoretically derived from the payment table.
In the example above, the solution could be to sort the bill rows by the due date and apply the payments accordingly:
--raw data
WITH bill AS (
SELECT 'b1' AS id, DATE('2022-01-01') AS created, DATE('2022-01-15') AS due, 50 AS amount UNION ALL
SELECT 'b2', DATE('2022-01-01'), DATE('2022-01-30'), 50 UNION ALL
SELECT 'b3', DATE('2022-01-03'), DATE('2022-01-17'), 50 UNION ALL
SELECT 'b4', DATE('2022-01-03'), DATE('2022-02-01'), 50 UNION ALL
SELECT 'b5', DATE('2022-01-05'), DATE('2022-01-19'), 50 UNION ALL
SELECT 'b6', DATE('2022-01-05'), DATE('2022-02-04'), 50
),
payment AS (
SELECT 'p1' AS id, DATE('2022-01-10') AS made, 50 AS amount UNION ALL
SELECT 'p2', DATE('2022-01-11'), 50
),
--start solution
p AS (
SELECT
payment.id,
payment.made,
payment.amount,
SUM( payment.amount ) OVER (
ORDER BY payment.made
) AS amount_cumulative
FROM payment
),
b AS (
SELECT
bill.id,
bill.created,
bill.due,
bill.amount,
SUM( bill.amount ) OVER (
ORDER BY bill.due
) AS amount_cumulative
FROM bill
),
repayments AS (
SELECT
b.*,
p.made,
ROW_NUMBER() OVER (
PARTITION BY b.id
ORDER BY p.made ASC
) AS seq
FROM b
LEFT JOIN p
ON b.amount_cumulative <= p.amount_cumulative
WHERE p.made IS NOT NULL
)
SELECT
b.*,
repayments.made AS payment_date
FROM b
LEFT JOIN repayments
ON b.id = repayments.id
WHERE (repayments.seq = 1 OR repayments.seq IS NULL)
ORDER BY b.id
Example 2
However, that solution breaks down if we change some of the bill and payment dates (because payments must be applied to the bill with an outstanding balance and the earliest due date at the time of the payment). For example:
--raw data
WITH bill AS (
SELECT 'b1' AS id, DATE('2022-01-01') AS created, DATE('2022-01-15') AS due, 50 AS amount UNION ALL
SELECT 'b2', DATE('2022-01-01'), DATE('2022-01-30'), 50 UNION ALL
SELECT 'b3', DATE('2022-01-03'), DATE('2022-01-17'), 50 UNION ALL
SELECT 'b4', DATE('2022-01-03'), DATE('2022-02-01'), 50 UNION ALL
SELECT 'b5', DATE('2022-01-05'), DATE('2022-01-19'), 50 UNION ALL
SELECT 'b6', DATE('2022-01-05'), DATE('2022-02-04'), 50
),
payment AS (
SELECT 'p1' AS id, DATE('2022-01-02') AS made, 100 AS amount UNION ALL
SELECT 'p2', DATE('2022-01-11'), 50
),
--start solution
p AS (
SELECT
payment.id,
payment.made,
payment.amount,
SUM( payment.amount ) OVER (
ORDER BY payment.made
) AS amount_cumulative
FROM payment
),
b AS (
SELECT
bill.id,
bill.created,
bill.due,
bill.amount,
SUM( bill.amount ) OVER (
ORDER BY bill.due
) AS amount_cumulative
FROM bill
),
repayments AS (
SELECT
b.*,
p.made,
ROW_NUMBER() OVER (
PARTITION BY b.id
ORDER BY p.made ASC
) AS seq
FROM b
LEFT JOIN p
ON b.amount_cumulative <= p.amount_cumulative
WHERE p.made IS NOT NULL
)
SELECT
b.*,
repayments.made AS payment_date
FROM b
LEFT JOIN repayments
ON b.id = repayments.id
WHERE (repayments.seq = 1 OR repayments.seq IS NULL)
ORDER BY b.id
This example is even trickier if p1 is for an amount = 75 (in which case b2 should remain not completely paid, but b3 should be paid in full on 2022-01-11).
Trickiest Example
This is the most contrived, but it best illustrates the problem. Payment 1 goes toward Bill 1 because that is the only available bill at the time of the payment (the payment almost pays the first bill in full). At the time of Payment 2, the bill with the earliest unpaid balance is now Bill 2, so all of the payment goes toward Bill 2 (and again, the payment almost pays the second bill in full). However, at the end of this sequence, 99/100 is paid toward Bill 1 and 49/50 is paid toward Bill 2, but neither Bill 1 nor Bill 2 is paid in full, so the paid-in-full date should be NULL for both.
--raw data
WITH bill AS (
SELECT 'b1' AS id, DATE('2022-01-01') AS created, DATE('2022-01-15') AS due, 100 AS amount UNION ALL
SELECT 'b2', DATE('2022-01-02'), DATE('2022-01-14'), 50
),
payment AS (
SELECT 'p1' AS id, DATE('2022-01-10') AS made, 99 AS amount UNION ALL
SELECT 'p2', DATE('2022-01-11'), 49
)
-- setup
SELECT * FROM bill
Question
Can I get the payment_date with a query? If so, what does that look like?

Please refer to the query below. The idea is to distribute the payments over the range of bills.
--raw data
WITH bill AS (
SELECT 'b1' AS id, DATE('2022-01-01') AS created, DATE('2022-01-15') AS due, 50 AS amount UNION ALL
SELECT 'b2', DATE('2022-01-01'), DATE('2022-01-30'), 50 UNION ALL
SELECT 'b3', DATE('2022-01-03'), DATE('2022-01-17'), 50 UNION ALL
SELECT 'b4', DATE('2022-01-03'), DATE('2022-02-01'), 50 UNION ALL
SELECT 'b5', DATE('2022-01-05'), DATE('2022-01-19'), 50 UNION ALL
SELECT 'b6', DATE('2022-01-05'), DATE('2022-02-04'), 50
), payment AS (
SELECT 'p1' AS id, DATE('2022-01-10') AS made, 50 AS amount UNION ALL
SELECT 'p2', DATE('2022-01-11'), 50
), b as (
-- setup
SELECT
bill.id,
bill.created,
bill.due,
bill.amount,
SUM( bill.amount ) OVER (
ORDER BY bill.created, bill.due
) AS amount_cumulative_created,
SUM( bill.amount ) OVER (
ORDER BY bill.due
) as amount_cumulative_due,
row_number() over (order by bill.due) crn
FROM bill
order by created
), p as (
SELECT
p.id,
p.made,
p.amount,
case when
row_number() over (order by 0) =
max(un) over (partition by p.id) then
floor(p.amount / max(un) over (partition by p.id)) +
mod(p.amount, max(un) over (partition by p.id))
else
floor(p.amount / max(un) over (partition by p.id))
end new_amount,
row_number() over (order by 0) prn
FROM
(select *,
row_number() over (order by payment.made) prn
from payment) p join b
on p.prn = b.crn
,unnest(generate_array(1,
(select coalesce(max(b.crn),0)+1 from b
where p.amount > b.amount_cumulative_due
and p.made between b.created and b.due),
1)) un
), p_distrib as (
select *,
SUM( p.new_amount ) OVER (
ORDER BY p.prn
) AS payment_cumulative, from p
)
select b.*,
(select min(pd.made) from p_distrib pd
where
--- Adjust here for what date takes preference - created vs. due.
--- Replace amount_cumulative_created with amount_cumulative_due as needed
b.amount_cumulative_created <= pd.payment_cumulative
and pd.made between b.created and b.due) AS payment_date
from b;
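For checking any SQL answer against the rule stated in the question (each payment goes to the outstanding bill with the earliest due date among the bills that already exist when the payment is made), a procedural simulation can serve as a reference. The Python below is only a sketch and is not part of the original answer. Its assumptions: the function name `paid_dates` and the dict layout are hypothetical; a payment larger than one bill's balance spills over to the next bill by due date; ties on due date are broken by id; any surplus left when no open bill exists is dropped.

```python
from datetime import date


def paid_dates(bills, payments):
    """Return {bill id: date the bill was paid in full, or None}."""
    remaining = {b["id"]: b["amount"] for b in bills}
    paid_on = {b["id"]: None for b in bills}
    for p in sorted(payments, key=lambda x: x["made"]):
        left = p["amount"]
        # Bills that exist at payment time and still have a balance,
        # earliest due date first (id breaks ties; an assumption).
        open_bills = sorted(
            (b for b in bills
             if b["created"] <= p["made"] and remaining[b["id"]] > 0),
            key=lambda b: (b["due"], b["id"]),
        )
        for b in open_bills:
            if left == 0:
                break
            applied = min(left, remaining[b["id"]])
            remaining[b["id"]] -= applied
            left -= applied
            if remaining[b["id"]] == 0:
                paid_on[b["id"]] = p["made"]
        # Assumption: any surplus still in `left` is dropped here.
    return paid_on


# Example 2 from the question, with p1 changed to 75 as described there.
bills = [
    {"id": "b1", "created": date(2022, 1, 1), "due": date(2022, 1, 15), "amount": 50},
    {"id": "b2", "created": date(2022, 1, 1), "due": date(2022, 1, 30), "amount": 50},
    {"id": "b3", "created": date(2022, 1, 3), "due": date(2022, 1, 17), "amount": 50},
    {"id": "b4", "created": date(2022, 1, 3), "due": date(2022, 2, 1), "amount": 50},
    {"id": "b5", "created": date(2022, 1, 5), "due": date(2022, 1, 19), "amount": 50},
    {"id": "b6", "created": date(2022, 1, 5), "due": date(2022, 2, 4), "amount": 50},
]
payments = [
    {"id": "p1", "made": date(2022, 1, 2), "amount": 75},
    {"id": "p2", "made": date(2022, 1, 11), "amount": 50},
]
result = paid_dates(bills, payments)
print(result["b1"], result["b2"], result["b3"])
```

With this data, b1 is paid on 2022-01-02, b2 stays unpaid (None), and b3 is paid on 2022-01-11, which is the outcome the question expects for that variant.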

Related

Oracle SQL - Displaying only net effect (unmatched) rows

Following is my sample table structure
Name Amount
A 100
A 100
A -100
A 100
A 100
A -100
B 10
A 100
There is no Primary Key in this table.
Desired Output:
Name Amount
A 100
A 100
B 10
A 100
Explanation:
I need to cancel out matching rows, i.e., one -100 nullifies one +100.
Therefore I need to display only rows that are not offset / not nullified one to one.
This can be done in PL/SQL by populating the rows to a temporary table and deleting one positive for every one corresponding negative. However, I require to do this on the fly using SQL statements.
Regards,
Raghu
You can enumerate the rows using row_number() and then use that to "cancel" them:
select t.name, t.amount
from (select t.*,
sum(amount) over (partition by name, abs(amount), seqnum) as sum_amount
from (select t.*,
row_number() over (partition by name, amount order by name) as seqnum
from t
) t
) t
where sum_amount <> 0;
Here is a db<>fiddle.
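To make the cancelling logic easier to follow, here is a small Python sketch of the same idea: number each occurrence of a (name, amount) pair, then sum the amounts per (name, absolute amount, occurrence number) and keep only the rows whose group does not net to zero. The function name `net_rows` and the tuple layout are hypothetical, and the list order stands in for the arbitrary row order the database would use.

```python
from collections import defaultdict


def net_rows(rows):
    """Keep rows of (name, amount) that are not cancelled one to one."""
    seen = defaultdict(int)
    numbered = []
    for name, amount in rows:
        # Like ROW_NUMBER() OVER (PARTITION BY name, amount).
        seen[(name, amount)] += 1
        numbered.append((name, amount, seen[(name, amount)]))

    totals = defaultdict(int)
    for name, amount, seqnum in numbered:
        # Like SUM(amount) OVER (PARTITION BY name, ABS(amount), seqnum).
        totals[(name, abs(amount), seqnum)] += amount

    return [
        (name, amount)
        for name, amount, seqnum in numbered
        if totals[(name, abs(amount), seqnum)] != 0
    ]


sample = [
    ("A", 100), ("A", 100), ("A", -100), ("A", 100),
    ("A", 100), ("A", -100), ("B", 10), ("A", 100),
]
print(net_rows(sample))
```

For the sample data from the question this leaves three ("A", 100) rows and one ("B", 10) row.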
You can give each row a ROW_NUMBER unique to that name/amount pair and then count whether, for a name/ABS(amount) there are one or two values for each of those ROW_NUMBER and discard the rows where there are two (one positive and one negative):
SELECT name,
amount
FROM (
SELECT name,
amount,
COUNT( amount ) OVER ( PARTITION BY name, ABS( amount ), rn )
AS num_matches
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY name, amount ORDER BY ROWNUM ) AS rn
FROM table_name t
)
)
WHERE num_matches = 1
So, for your sample data:
CREATE TABLE table_name ( Name, Amount ) AS
SELECT 'A', +100 FROM DUAL UNION ALL
SELECT 'A', +100 FROM DUAL UNION ALL
SELECT 'A', -100 FROM DUAL UNION ALL
SELECT 'A', +100 FROM DUAL UNION ALL
SELECT 'A', +100 FROM DUAL UNION ALL
SELECT 'A', -100 FROM DUAL UNION ALL
SELECT 'B', +10 FROM DUAL UNION ALL
SELECT 'A', +100 FROM DUAL;
This outputs:
NAME | AMOUNT
:--- | -----:
A | 100
A | 100
A | 100
B | 10
db<>fiddle here
If there are never more negative values than positive it's a task for EXCEPT ALL. Oracle doesn't support it, but this is a rewrite:
select name, amount
from
(
select name, amount, row_number() over (partition by name, amount order by amount)
from tab
where amount > 0
minus
select name, -amount, row_number() over (partition by name, amount order by amount)
from tab
where amount < 0
) dt
or
with cte as
(
select name, amount, row_number() over (partition by name, amount order by amount) as rn
from tab
)
select name, amount
from
(
select name, amount, rn
from cte
where amount > 0
minus
select name, -amount, rn
from cte
where amount < 0
) dt

How can I select rows with MAX(Column) when another column has distinct values in oracle?

I have four columns like this.
Material Description Quantity Date
a 133 200 26-09-2016 12:33
a 133 400 27-09-2016 10:33
I need to take the quantity for each material from the row with Max(Date).
I tried this, but if the quantities are distinct it shows both rows.
Select material , description , quantity , max(date)
FROM materials
group by material, description , quantity
Use that condition in a WHERE clause, like this (note that the subquery returns the latest date in the whole table, not per material):
Select material , description , quantity
FROM materials
WHERE "Date" = (select max("Date") from materials)
Use the RANK() analytic function:
SELECT *
FROM (
SELECT material,
description,
quantity,
"Date",
RANK() OVER ( PARTITION BY material ORDER BY "Date" DESC ) AS rnk
FROM materials
)
WHERE rnk = 1;
This will get multiple rows if there are rows with the same materials and maximum date values - if you only want a single row then use ROW_NUMBER() instead of RANK().
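The difference between keeping ties and keeping a single row can be illustrated with a short Python sketch. It is only an analogy for RANK() versus ROW_NUMBER(), not Oracle code; the function name `latest_rows` and the `keep_ties` flag are hypothetical, and the data includes the extra material "b" used elsewhere in this thread.

```python
from datetime import datetime


def latest_rows(rows, keep_ties=True):
    """rows: (material, description, quantity, when) tuples."""
    newest = {}
    for material, _desc, _qty, when in rows:
        if material not in newest or when > newest[material]:
            newest[material] = when

    picked, seen = [], set()
    for row in rows:
        material, when = row[0], row[3]
        if when != newest[material]:
            continue
        # keep_ties=True behaves like RANK() = 1, False like ROW_NUMBER() = 1.
        if not keep_ties and material in seen:
            continue
        seen.add(material)
        picked.append(row)
    return picked


rows = [
    ("b", 133, 1200, datetime(2016, 9, 26, 12, 33)),
    ("b", 133, 2200, datetime(2016, 9, 29, 12, 33)),
    ("a", 133, 200, datetime(2016, 9, 26, 12, 33)),
    ("a", 133, 400, datetime(2016, 9, 27, 10, 33)),
]
for material, _desc, quantity, _when in latest_rows(rows):
    print(material, quantity)
```

Here material "a" yields quantity 400 and material "b" yields quantity 2200.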
You can use row_number() like this (I added material b to show how to find the quantity for every material, here "a" and "b"):
WITH a(Material, Description , Quantity , sDate) AS
(SELECT 'b', 133, 1200 , to_date('26-09-2016 12:33','dd-mm-yyyy hh24:mi') FROM dual UNION ALL
SELECT 'b', 133, 2200 , to_date('29-09-2016 12:33','dd-mm-yyyy hh24:mi') FROM dual UNION ALL
SELECT 'a', 133, 200 , to_date('26-09-2016 12:33','dd-mm-yyyy hh24:mi') FROM dual UNION ALL
SELECT 'a', 133, 400 , to_date('27-09-2016 10:33','dd-mm-yyyy hh24:mi') FROM dual )
SELECT *
FROM (SELECT a.*,
row_number() over(partition BY material order by sdate DESC) rn
FROM a)
WHERE rn = 1
MATERIAL DESCRIPTION QUANTITY SDATE RN
-------- ----------- ---------- --------- ----------
a 133 400 27-SEP-16 1
b 133 2200 29-SEP-16 1

How to use SUM and COUNT and add the results as a resulting column

CREATE TABLE my_table ( bank_account, bank_id, amount ) AS
SELECT 123, 600, 1500 FROM DUAL UNION ALL
SELECT 123, 600, 2500 FROM DUAL UNION ALL
SELECT 123, 600, 3500 FROM DUAL UNION ALL
SELECT 123, 700, 500 FROM DUAL UNION ALL
SELECT 123, 700, 1000 FROM DUAL UNION ALL
SELECT 456, 800, 2000 FROM DUAL UNION ALL
SELECT 456, 900, 2000 FROM DUAL UNION ALL
SELECT 456, 900, 4000 FROM DUAL;
I need to write the SQL code where the result would look like this:
Where:
total_amount - the sum of all transactions bank_account made in specific bank_id
number_of_transactions - number of transactions made by bank_account in specific bank_id
total_num_trans - total number of transactions made by bank_account
total_am_trans - total amount of transactions made by bank_account
I've only managed to get some of the results I need, but can't get them all.
This is with what I've started:
SELECT t.bank_account
, t.bank_id
, count(*) number_of_transactions
, sum(t.amount) total_amount
FROM my_table t
GROUP BY t.bank_account
, t.bank_id
ORDER BY t.bank_account
Thanks.
Oracle Setup:
CREATE TABLE my_table ( bank_account, bank_id, amount ) AS
SELECT 123, 600, 1500 FROM DUAL UNION ALL
SELECT 123, 600, 2500 FROM DUAL UNION ALL
SELECT 123, 600, 3500 FROM DUAL UNION ALL
SELECT 123, 700, 500 FROM DUAL UNION ALL
SELECT 123, 700, 1000 FROM DUAL UNION ALL
SELECT 456, 800, 2000 FROM DUAL UNION ALL
SELECT 456, 900, 2000 FROM DUAL UNION ALL
SELECT 456, 950, 4000 FROM DUAL;
Query:
SELECT bank_account,
bank_id,
total_amount,
number_of_transactions,
SUM( number_of_transactions ) OVER ( PARTITION BY bank_account ) AS total_num_trans,
SUM( total_amount ) OVER ( PARTITION BY bank_account ) AS total_am_trans,
number_of_transactions
/ SUM( number_of_transactions ) OVER ( PARTITION BY bank_account )
* 100 AS percentage_trans,
total_amount
/ SUM( total_amount ) OVER ( PARTITION BY bank_account )
* 100 AS percentage_trans
FROM (
SELECT bank_account,
bank_id,
count(*) AS number_of_transactions,
sum(amount) AS total_amount
FROM my_table
GROUP BY bank_account
, bank_id
)
Output:
BANK_ACCOUNT BANK_ID TOTAL_AMOUNT NUMBER_OF_TRANSACTIONS TOTAL_NUM_TRANS TOTAL_AM_TRANS PERCENTAGE_TRANS PERCENTAGE_TRANS
------------ ---------- ------------ ---------------------- --------------- -------------- ---------------- ----------------
123 600 7500 3 5 9000 60 83.3333333
123 700 1500 2 5 9000 40 16.6666667
456 800 2000 1 3 8000 33.3333333 25
456 900 2000 1 3 8000 33.3333333 25
456 950 4000 1 3 8000 33.3333333 50
Try this;)
select t1.*, t2.total_num_trans, t2.total_am_trans, (t1.number_of_transactions / t2.total_num_trans) * 100 as percentage_trans, (t1.total_amount / t2.total_am_trans) * 100 as percentage_amount
from (
select bank_account, bank_id, sum(amount) as total_amount, count(1) as number_of_transactions
from my_table
group by bank_account, bank_id) t1
left join (
select bank_account, sum(total_amount) as total_am_trans, sum(number_of_transactions) as total_num_trans
from (
select bank_account, bank_id, sum(amount) as total_amount, count(1) as number_of_transactions
from my_table
group by bank_account, bank_id ) t
group by bank_account ) t2 on t1.bank_account = t2.bank_account
order by t1.bank_account
Try to join 2 aggregations, the coarse one only groups by bank account, the finer one also by bank id.
SELECT tfine.bank_account
, tfine.bank_id
, tfine.total_amount
, tfine.number_of_transactions
, tcoarse.total_num_trans
, tcoarse.total_am_trans
FROM (
SELECT t1.bank_account
, t1.bank_id
, count(*) number_of_transactions
, sum(t1.amount) total_amount
FROM my_table t1
GROUP BY t1.bank_account
, t1.bank_id
) tfine
JOIN (
SELECT t2.bank_account
, count(*) total_num_trans
, sum(t2.amount) total_am_trans
FROM my_table t2
GROUP BY t2.bank_account
) tcoarse
ON tcoarse.bank_account = tfine.bank_account
ORDER BY tfine.bank_account
, tfine.bank_id
;
Online demo on ideone.
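The two levels of aggregation (per account and bank, and per account only) can also be sketched outside SQL. The Python below is an illustrative sketch, not one of the answers; the function name `summarize` is hypothetical, and the rows are taken from the "Oracle Setup" block above, which uses bank_id 950 for the last row.

```python
from collections import defaultdict


def summarize(transactions):
    """transactions: (bank_account, bank_id, amount) tuples."""
    fine = defaultdict(lambda: [0, 0])    # (account, bank) -> [count, amount]
    coarse = defaultdict(lambda: [0, 0])  # account -> [count, amount]
    for account, bank, amount in transactions:
        for bucket in (fine[(account, bank)], coarse[account]):
            bucket[0] += 1
            bucket[1] += amount
    # Columns: bank_account, bank_id, total_amount, number_of_transactions,
    # total_num_trans, total_am_trans.
    return [
        (account, bank, amt, cnt, coarse[account][0], coarse[account][1])
        for (account, bank), (cnt, amt) in sorted(fine.items())
    ]


my_table = [
    (123, 600, 1500), (123, 600, 2500), (123, 600, 3500),
    (123, 700, 500), (123, 700, 1000),
    (456, 800, 2000), (456, 900, 2000), (456, 950, 4000),
]
for row in summarize(my_table):
    print(row)
```

The per-account totals come out as 5 transactions and 9000 for account 123, and 3 transactions and 8000 for account 456, in line with the output table shown earlier.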

SQL: Earliest Date After Latest Null If Exists

Using T-Sql I am looking to return the min date after the latest null if one exists and simply the min date on any products where there are no nulls.
Table:
DateSold Product
12/31/2012 A
1/31/2013
2/28/2013 A
3/31/2013 A
4/30/2013 A
5/31/2013
6/30/2013 A
7/31/2013 A
8/31/2013 A
9/30/2013 A
12/31/2012 B
1/31/2013 B
2/28/2013 B
3/31/2013 B
4/30/2013 B
5/31/2013 B
6/30/2013 B
7/31/2013 B
8/31/2013 B
9/30/2013 B
For product “A” 6/30/2013 is the desired return while for product “B” 12/31/2012 is desired.
Result:
MinDateSold Product
6/30/2013 A
12/31/2012 B
Any solutions will greatly be appreciated. Thank you.
This does it for me, if there's a GROUP involved, otherwise how do you know whether the NULLs are in the run of A or B products? I realise this may not be exactly what you're after, but I hope it helps anyway.
WITH DATA_IN AS (
SELECT 1 as grp,
convert(DateTime,'12/31/2012') as d_Date,
'A' AS d_ch
UNION ALL
SELECT 1, '1/31/2013', NULL UNION ALL
SELECT 1, '2/28/2013', 'A' UNION ALL
SELECT 1, '3/31/2013', 'A' UNION ALL
SELECT 1, '4/30/2013', 'A' UNION ALL
SELECT 1, '5/31/2013', NULL UNION ALL
SELECT 1, '6/30/2013', 'A' UNION ALL
SELECT 1, '7/31/2013', 'A' UNION ALL
SELECT 1, '8/31/2013', 'A' UNION ALL
SELECT 1, '9/30/2013', 'A' UNION ALL
SELECT 2, '12/31/2012', 'B' UNION ALL
SELECT 2, '1/31/2013', 'B' UNION ALL
SELECT 2, '2/28/2013', 'B' UNION ALL
SELECT 2, '3/31/2013', 'B' UNION ALL
SELECT 2, '4/30/2013', 'B' UNION ALL
SELECT 2, '5/31/2013', 'B' UNION ALL
SELECT 2, '6/30/2013', 'B' UNION ALL
SELECT 2, '7/31/2013', 'B' UNION ALL
SELECT 2, '8/31/2013', 'B' UNION ALL
SELECT 2, '9/30/2013', 'B'
)
SELECT
grp as YourGroup,
(SELECT Min(d3.d_date) -- first date after...
FROM DATA_IN d3
WHERE d3.grp = d1.grp -- stay within the current group
AND d3.d_date >
Coalesce( -- either the latest NULL
(SELECT max(d_Date)
FROM DATA_IN d2
WHERE d2.grp=d1.grp AND d2.d_ch IS NULL
)
, '1/1/1901' -- or a base date if no NULLs
)
) as MinDateSold
FROM DATA_IN d1
GROUP BY grp
Results :
1 2013-06-30 00:00:00.000
2 2012-12-31 00:00:00.000
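The same logic (find the latest NULL date per group, then take the earliest date after it, or the earliest date overall when the group has no NULLs) can be written as a short Python sketch for comparison. The function name `min_date_after_latest_null` is hypothetical, and, as in the query above, an explicit group column is assumed so that the NULL rows can be attributed to a product.

```python
from datetime import date


def min_date_after_latest_null(rows):
    """rows: (group, date_sold, product or None). Returns {group: date}."""
    latest_null = {}
    for grp, sold, product in rows:
        if product is None:
            latest_null[grp] = max(latest_null.get(grp, date.min), sold)

    result = {}
    for grp, sold, _product in rows:
        # Groups without NULLs fall back to date.min as the base date.
        if sold > latest_null.get(grp, date.min):
            result[grp] = min(result.get(grp, date.max), sold)
    return result


month_ends = [
    date(2012, 12, 31), date(2013, 1, 31), date(2013, 2, 28),
    date(2013, 3, 31), date(2013, 4, 30), date(2013, 5, 31),
    date(2013, 6, 30), date(2013, 7, 31), date(2013, 8, 31),
    date(2013, 9, 30),
]
null_dates = {date(2013, 1, 31), date(2013, 5, 31)}
rows = [(1, d, None if d in null_dates else "A") for d in month_ends]
rows += [(2, d, "B") for d in month_ends]
print(min_date_after_latest_null(rows))
```

For the sample data this gives 2013-06-30 for group 1 and 2012-12-31 for group 2.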
One approach to this is to count the number of NULL values that appear before a given row for a given value. This divides the ranges into groups. For each group, take the minimum date. And, find the largest minimum date for each product:
select product, minDate
from (select product, NumNulls, min(DateSold) as minDate,
row_number() over (partition by product order by min(DateSold) desc
) as seqnum
from (select t.*,
(select count(*)
from table t2
where t2.product is null and t2.DateSold <= t.DateSold
) as NumNulls
from table t
) t
group by Product, NumNUlls
) t
where seqnum = 1;
In your data, there is no mixing of different products in a range, so this query sort of assumes that is true as well.

SQL Server: A Grouping question that's annoying me

I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:
Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribute their salaries on a monthly basis.
Month Employee PercentOfTotal
--------------------------------
1 Alice 25%
1 Barbara 65%
1 Claire 10%
2 Alice 25%
2 Barbara 50%
2 Claire 25%
3 Alice 25%
3 Barbara 65%
3 Claire 10%
As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.
What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.
I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)
ID Employee PercentOfTotal
--------------------------------
X Alice 25%
X Barbara 65%
X Claire 10%
Y Alice 25%
Y Barbara 50%
Y Claire 25%
Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.
WITH temp_ids (Month)
AS
(
SELECT DISTINCT MIN(Month)
FROM employees_paid
GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
FROM employees_paid EMP
JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal
Thanks y'all!
-Ricky
This gives you an answer in a slightly different format than you requested:
SELECT DISTINCT
T1.PercentOfTotal AS Alice,
T2.PercentOfTotal AS Barbara,
T3.PercentOfTotal AS Claire
FROM employees_paid T1
JOIN employees_paid T2
ON T1.Month = T2.Month AND T1.Employee = 'Alice' AND T2.Employee = 'Barbara'
JOIN employees_paid T3
ON T2.Month = T3.Month AND T3.Employee = 'Claire'
Result:
Alice Barbara Claire
25% 50% 25%
25% 65% 10%
If you want to, you can use UNPIVOT to turn this result set into the form you asked for.
SELECT rn AS ID, Employee, PercentOfTotal
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY Alice) AS rn
FROM (
SELECT DISTINCT
T1.PercentOfTotal AS Alice,
T2.PercentOfTotal AS Barbara,
T3.PercentOfTotal AS Claire
FROM employees_paid T1
JOIN employees_paid T2 ON T1.Month = T2.Month AND T1.Employee = 'Alice'
AND T2.Employee = 'Barbara'
JOIN employees_paid T3 ON T2.Month = T3.Month AND T3.Employee = 'Claire'
) T1
) p UNPIVOT (PercentOfTotal FOR Employee IN (Alice, Barbara, Claire)) AS unpvt
Result:
ID Employee PercentOfTotal
1 Alice 25%
1 Barbara 50%
1 Claire 25%
2 Alice 25%
2 Barbara 65%
2 Claire 10%
What you want is for each month's distribution to act as a signature or pattern of values that you can then look for in other months. What is not clear is whether the employee to whom the value went is as important as the breakdown of percentages. For example, would Alice=65%, Barbara=25%, Claire=10% be the same as Month 3 in your example? In my example, I presumed that it would not be the same. Similar to Martin Smith's solution, I build the signature by multiplying each percentage by a power of 10 determined by the employee's rank within the month and summing the results. This presumes that all percentage values are less than one. If someone could have a percentage of 110%, for example, that would create problems for this solution.
With Employees As
(
Select 1 As Month, 'Alice' As Employee, .25 As PercentOfTotal
Union All Select 1, 'Barbara', .65
Union All Select 1, 'Claire', .10
Union All Select 2, 'Alice', .25
Union All Select 2, 'Barbara', .50
Union All Select 2, 'Claire', .25
Union All Select 3, 'Alice', .25
Union All Select 3, 'Barbara', .65
Union All Select 3, 'Claire', .10
)
, EmployeeRanks As
(
Select Month, Employee, PercentOfTotal
, Row_Number() Over ( Partition By Month Order By Employee, PercentOfTotal ) As ItemRank
From Employees
)
, Signatures As
(
Select Month
, Sum( PercentOfTotal * Cast( Power( 10, ItemRank ) As bigint) ) As SignatureValue
From EmployeeRanks
Group By Month
)
, DistinctSignatures As
(
Select Min(Month) As MinMonth, SignatureValue
From Signatures
Group By SignatureValue
)
Select E.Month, E.Employee, E.PercentOfTotal
From Employees As E
Join DistinctSignatures As D
On D.MinMonth = E.Month
I'm assuming performance won't be great (because of the subquery):
SELECT * FROM employees_paid where Month not in (
SELECT
a.Month
FROM
employees_paid a
INNER JOIN employees_paid b ON
(a.employee = B.employee AND
a.PercentOfTotal = b.PercentOfTotal AND
a.Month > b.Month)
GROUP BY
a.Month,
b.Month
HAVING
Count(*) = (SELECT COUNT(*) FROM employees_paid c
where c.Month = a.Month)
)
The inner SELECT does a self join to identify matching employee and percentage combinations (except those for the same month).
The > in the JOIN ensures that only one set of matches is taken i.e. if a Month1 entry = Month3 entry, we get only the Month3-Month1 entry combination instead of Month1-Month3, Month3-Month1 and Month3-Month3.
We then GROUP BY each month-month combination and COUNT the matched entries
Then the HAVING excludes months that don't have as many matches as there are month entries
The outer SELECT gets all entries except the ones returned by the inner query (the ones with full set matches)
If I have understood you correctly then, for a general solution, I think you would need to concatenate the whole group together - e.g. to produce Alice:0.25, Barbara:0.50, Claire:0.25. Then select the distinct groups so something like the following would do it (rather clunkily).
WITH EmpSalaries
AS
(
SELECT 1 AS Month, 'Alice' AS Employee, 0.25 AS PercentOfTotal UNION ALL
SELECT 1 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
SELECT 1 AS Month, 'Claire' AS Employee, 0.10 UNION ALL
SELECT 2 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
SELECT 2 AS Month, 'Barbara' AS Employee, 0.50 UNION ALL
SELECT 2 AS Month, 'Claire' AS Employee, 0.25 UNION ALL
SELECT 3 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
SELECT 3 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
SELECT 3 AS Month, 'Claire' AS Employee, 0.10
),
Months AS
(
SELECT DISTINCT Month FROM EmpSalaries
),
MonthlySummary AS
(
SELECT Month,
Stuff(
(
Select ', ' + S1.Employee + ':' + cast(PercentOfTotal as varchar(20))
From EmpSalaries As S1
Where S1.Month = Months.Month
Order By S1.Employee
For Xml Path('')
), 1, 2, '') As Summary
FROM Months
)
SELECT * FROM EmpSalaries
WHERE Month IN (SELECT MIN(Month)
FROM MonthlySummary
GROUP BY Summary)
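The "concatenate the whole group" idea amounts to building one comparable signature per month and keeping the first month for each distinct signature. A minimal Python sketch of that idea follows; the function name `distinct_distributions` is hypothetical, percentages are written as whole numbers for simplicity, and the employee name is treated as part of the signature (an assumption that matches the query above).

```python
from collections import defaultdict


def distinct_distributions(rows):
    """rows: (month, employee, percent). Returns the first month of each
    distinct distribution, in ascending order."""
    by_month = defaultdict(list)
    for month, employee, percent in rows:
        by_month[month].append((employee, percent))

    first_month = {}
    for month in sorted(by_month):
        # Sorting makes the signature independent of row order.
        signature = tuple(sorted(by_month[month]))
        first_month.setdefault(signature, month)
    return sorted(first_month.values())


employees_paid = [
    (1, "Alice", 25), (1, "Barbara", 65), (1, "Claire", 10),
    (2, "Alice", 25), (2, "Barbara", 50), (2, "Claire", 25),
    (3, "Alice", 25), (3, "Barbara", 65), (3, "Claire", 10),
]
print(distinct_distributions(employees_paid))
```

For the three months in the question this returns months 1 and 2, since month 3 repeats the distribution of month 1.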
"I just put together this solution while writing this question, which seems to work"
I don't think it does work. Here I've added a further two groups (month = 4 and 5 respectively) which I would consider to be distinct yet the result is the same i.e. month = 1 and 2 only:
WITH employees_paid (Month, Employee, PercentOfTotal)
AS
(
SELECT 1, 'Alice', 0.25
UNION ALL
SELECT 1, 'Barbara', 0.65
UNION ALL
SELECT 1, 'Claire', 0.1
UNION ALL
SELECT 2, 'Alice', 0.25
UNION ALL
SELECT 2, 'Barbara', 0.5
UNION ALL
SELECT 2, 'Claire', 0.25
UNION ALL
SELECT 3, 'Alice', 0.25
UNION ALL
SELECT 3, 'Barbara', 0.65
UNION ALL
SELECT 3, 'Claire', 0.1
UNION ALL
SELECT 4, 'Barbara', 0.25
UNION ALL
SELECT 4, 'Claire', 0.65
UNION ALL
SELECT 4, 'Alice', 0.1
UNION ALL
SELECT 5, 'Diana', 0.25
UNION ALL
SELECT 5, 'Emma', 0.65
UNION ALL
SELECT 5, 'Fiona', 0.1
),
temp_ids (Month)
AS
(
SELECT DISTINCT MIN(Month)
FROM employees_paid
GROUP
BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
FROM employees_paid AS EMP
INNER JOIN temp_ids AS IDS
ON EMP.Month = IDS.Month
GROUP
BY EMP.Month, EMP.Employee, EMP.PercentOfTotal;