MDX Median query - SSAS

Could really use a hand from someone on this MDX query. I am trying to produce a rolling median over the last 365 days on a per-user and per-day basis. I need the median to be the median of the per-user response days. It seems like a simple computation, but I cannot see how to get it to work with the crossjoin in the mix. Any help would be very much appreciated! Even a suggestion on a direction to attack this from would be great.
WITH
SET [2Years] AS
'{[FirstOrderDate].[Full Date].&[2010-01-15T00:00:00]:[FirstOrderDate].[Full Date].&[2012-08-20T00:00:00]}'
MEMBER [Measures].[2YearMedianLag]
AS
median({[FirstOrderDate].[Full Date].currentmember.lag(365):[FirstOrderDate].[Full Date].currentmember} , [Measures].[Response Days])
SELECT {[Measures].[Response Days], [Measures].[2YearMedianLag]} ON 0,
NonEmpty(crossjoin( [2Years],
[User].[User ID].children),[Measures].[Response Days]) ON 1
FROM [UserRevenue]
Thank you in advance for your assistance. 
EDIT:
SampleData (UserName varchar(100) null, FirstOrderDate Datetime null, ResponseDays int null)
('Jim', '2001-01-03', 10)
('Fred', '2001-01-03', 80)
('Frank', '2001-01-04', 30)
('Sally', '2001-01-05', 18)
('Joan', '2001-01-06', 26)
('Bill', '2001-01-06', 15)
('Ted', '2001-01-08', 29)
('Sam', '2001-01-10', 9)
('Jane', '2001-01-17', 200)
SampleOutput (FirstOrderDate datetime null, MedianResponseDays int null)
('2001-01-03', 45)
('2001-01-04', 30)
('2001-01-05', 24)
('2001-01-06', 22)
('2001-01-07', 22)
('2001-01-08', 26)
('2001-01-09', 26)
('2001-01-10', 22)
('2001-01-11', 22)
('2001-01-12', 22)
('2001-01-13', 22)
('2001-01-14', 22)
('2001-01-15', 22)
('2001-01-16', 22)
('2001-01-17', 26)

It's tricky because you need to work with a different set of rolling dates for each day on rows. Are you sure you want 365 for the lag? Lag(365) through CurrentMember gives you 1 year plus 1 day. Anyway, this technique uses an inline named set to create a named set for each user/date combination and assigns it a unique number; you can then pull that named set back out with StrToSet to match it up with the current row's user and date. This version factors in each individual user:
with
set Users as [User].[User ID].Children
set UsersDates as NonEmpty((Users, [FirstOrderDate].[Full Date].children), [Measures].[Response Days])
set [Rolling Period] as
Generate(
UsersDates,
StrToSet(
"{[FirstOrderDate].[Full Date].currentmember.lag(364): [FirstOrderDate].[Full Date].currentmember} as RP" + CStr(UsersDates.CurrentOrdinal)
)
)
member [Measures].[Median Lag] as
median(
StrToSet("RP" +
CStr(Rank(([User].[User ID].CurrentMember, [FirstOrderDate].[Full Date].CurrentMember), UsersDates)))
, [Measures].[Response Days])
select
{
[measures].[Response Days]
, [measures].[Median Lag]
}
on columns,
UsersDates
on rows
from UserRevenue
UPDATE #1: This version ignores the individual user and instead uses the response for all users for the applicable set of dates:
with
set Users as [User].[User ID].Children
set Dates as NonEmpty([FirstOrderDate].[Full Date].children, [Measures].[Response Days])
set [Rolling Period] as
Generate(
Dates,
StrToSet(
"{[FirstOrderDate].[Full Date].currentmember.lag(364): [FirstOrderDate].[Full Date].currentmember} as RP"
+ CStr(Dates.CurrentOrdinal)
)
)
member [Measures].[Median Lag] as
median(
StrToSet("RP" +
CStr(Rank([FirstOrderDate].[Full Date].CurrentMember, Dates)))
, ([Measures].[Response Days], [User].[User ID].[All]))
select
{
[measures].[Response Days]
, [measures].[Median Lag]
}
on columns,
(Users, Dates)
on rows
from UserRevenue
UPDATE #2: Third time's a charm? Here's a query that gets me the results in your sample output. The key is that, for each date, the set needs to generate a tuple for every date/user combination in the rolling window ending on that date and store it as an inline named set, one per date, uniquely identified by rank. So the first date (1/3) is rank 1, the second date (1/4) is rank 2, and so on, when you look at the list of dates on rows. The first date, 1/3/2001, has two items in its set: Jim for 1/3 and Fred for 1/3. In the median calculation, the response days for each item in the related set are used: because 1/3 is rank 1 in the list of dates, the set called RP1 is retrieved, combined with the response days for the items in the set (Jim and Fred), and the median is calculated. The next date, 1/4, contains three items, the same as for 1/3 plus Frank for 1/4, so the median is recalculated, and so on.
with
set Users as [User].[User ID].Children
set Dates as [FirstOrderDate].[Full Date].children
set [Rolling Period] as
Generate(
Dates,
StrToSet(
"NonEmpty(({[FirstOrderDate].[Full Date].currentmember.lag(364): [FirstOrderDate].[Full Date].currentmember}
, Users), [Measures].[Response Days]) as RP"
+ CStr(Dates.CurrentOrdinal)
)
)
member [Measures].[Median Lag] as
median(
StrToSet("RP" +
CStr(Rank([FirstOrderDate].[Full Date].CurrentMember, Dates)))
, [Measures].[Response Days])
select
{[measures].[Median Lag]}
on columns,
Dates
on rows
from UserRevenue
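As a cross-check outside SSAS, here is a minimal Python sketch of the rolling median the query above computes. It assumes the question's sample rows (one Response Days value per user, keyed by first order date) and a 365-day window made of the current date plus the 364 days before it, mirroring `Lag(364)` through `CurrentMember`; the function name `rolling_median` is purely illustrative.

```python
from datetime import date, timedelta
from statistics import median

# Sample rows from the question: (user, first order date, response days).
SAMPLE = [
    ("Jim", date(2001, 1, 3), 10),
    ("Fred", date(2001, 1, 3), 80),
    ("Frank", date(2001, 1, 4), 30),
    ("Sally", date(2001, 1, 5), 18),
    ("Joan", date(2001, 1, 6), 26),
    ("Bill", date(2001, 1, 6), 15),
    ("Ted", date(2001, 1, 8), 29),
    ("Sam", date(2001, 1, 10), 9),
    ("Jane", date(2001, 1, 17), 200),
]


def rolling_median(rows, start, end, window_days=365):
    """Median of response days over the window ending on each date from start to end.

    The window is the current day plus the (window_days - 1) days before it,
    i.e. Lag(364):CurrentMember for a 365-day window.
    """
    result = {}
    day = start
    while day <= end:
        window_start = day - timedelta(days=window_days - 1)
        values = [rd for _, d, rd in rows if window_start <= d <= day]
        if values:  # dates with no users in the window are skipped
            result[day] = median(values)
        day += timedelta(days=1)
    return result


if __name__ == "__main__":
    for d, m in rolling_median(SAMPLE, date(2001, 1, 3), date(2001, 1, 17)).items():
        print(d.isoformat(), m)
```

For an even number of values, `statistics.median` averages the two middle ones, which is what the sample output expects (45 for Jim and Fred on 1/3).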

The MedianResponseDays measure iterates over the users to compute the median value of Response Days from a given date to the current date. I put the 365 days on the rows.
WITH MEMBER [Measures].[MedianResponseDays] AS
Median([User].[User ID].children * {[FirstOrderDate].[Full Date].CurrentMember:[FirstOrderDate].[Full Date].DefaultMember}, [Measures].[Response Days])
SELECT {[Measures].[MedianResponseDays]} ON 0,
NON EMPTY {[FirstOrderDate].[Full Date].currentmember.lag(364):[FirstOrderDate].[Full Date].currentmember} ON 1
FROM [UserRevenue]

I have a fact table fct_line_details with two columns, mtid and productivity:
mtid productivity
---- ------------
1 400
1 200
1 600
2 700
3 900
I want to calculate the median for each mtid in SSAS (e.g. the median for mtid 1 is 400).
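To make the expected result concrete, here is a small Python sketch (not SSAS/MDX) that groups the example rows by mtid and takes the median of each group; `median_per_group` is a hypothetical helper used only for illustration.

```python
from collections import defaultdict
from statistics import median

# Rows from the fct_line_details example: (mtid, productivity).
ROWS = [(1, 400), (1, 200), (1, 600), (2, 700), (3, 900)]


def median_per_group(rows):
    """Group productivity values by mtid and return the median of each group."""
    groups = defaultdict(list)
    for mtid, productivity in rows:
        groups[mtid].append(productivity)
    return {mtid: median(values) for mtid, values in groups.items()}


print(median_per_group(ROWS))
```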

Related

Need to add 3 months to each value within a column, based on the 1st '3 Months' calculated off the Admission Date column in T-SQL

I have a 14K-record table like the following (example of the data related to one particular client_id = 1002):
(my date format is mm/dd/yyyy, months come first)
ClientsEpisodes:
client_id adm_date disch_date
1002 3/11/2005 5/2/2005
1002 8/30/2005 2/16/2007
1002 3/16/2017 NULL
In SQL Server (T-SQL), I need to calculate a "+ 3 months" date into the new column [3Month Date], where the first "+ 3 months" value is calculated off my existing [adm_date] column. Then 3 more months should be added to that [3Month Date] value, then 3 more months to the next [3Month Date] value, and so on, while [3Month Date] <= [disch_date]. When [3Month Date] is later than [disch_date], the data shouldn't be populated. If my [disch_date] IS NULL, then the condition should be
[3Month Date] <= the current date (whatever it is), from the GETDATE() function.
Here is what I expect to see as a result:
(I highlighted my date offsets with different colors, for a better view)
Below is a more detailed explanation of each populated (or unpopulated) data set:
My first [adm_date] from the ClientsEpisodes table was 3/11/2005.
Adding 3 months:
3/11/2005 + 3 months = 6/11/2005 - falls AFTER the initial [disch_date] (5/2/2005) - not populated
The next [adm_date] from ClientsEpisodes is 8/30/2005 + 3 months = 11/30/2005;
then + 3 months must be added to 11/30/2005 = 2/30/2006;
then 2/30/2006 + 3 months = 5/30/2006;
then 5/30/2006 + 3 months = 8/30/2006;
then 8/30/2006 + 3 months = 11/30/2006;
then 11/30/2006 + 3 months = 3/2/2007 - falls AFTER my [disch_date]
(2/16/2007) - not populated
The same algorithm applies to the next [adm_date] - [disch_date] set, 11/5/2007 - 2/7/2009 (in dark blue).
Then, where [adm_date] = 3/16/2017, [disch_date] IS NULL, so the algorithm applies while
[3Month Date] <= the current date (10/15/2020 in this case).
You can use a recursive common table expression (CTE). Below is an example. Note that you can change the DATEADD part to something else (for example, add 90 days if you want); it's a matter of business logic.
DECLARE @DataSource TABLE
(
[client_id] INT
,[adm_date] DATE
,[disch_date] DATE
);
INSERT INTO @DataSource ([client_id], [adm_date], [disch_date])
VALUES (1002, '20050311', '20050502')
,(1002, '20050830', '20070216')
,(1002, '20170316', NULL);
WITH DataSource AS
(
SELECT ROW_NUMBER() OVER(ORDER BY [client_id]) AS [row_id]
,[client_id]
,[adm_date]
,DATEADD(MONTH, 3, [adm_date]) AS [3Month Date]
,ISNULL([disch_date], GETUTCDATE()) AS [disch_date]
FROM @DataSource
WHERE DATEADD(MONTH, 3, [adm_date]) <= ISNULL([disch_date], GETUTCDATE())
),
RecursiveDataSource AS
(
SELECT [row_id]
,[client_id]
,[adm_date]
,[3Month Date]
,[disch_date]
,0 AS [level]
FROM DataSource
UNION ALL
SELECT DS.[row_id]
,DS.[client_id]
,DS.[adm_date]
,DATEADD(MONTH, 3, RDS.[3Month Date])
,DS.[disch_date]
,[level] + 1
FROM RecursiveDataSource RDS
INNER JOIN DataSource DS
ON RDS.[row_id] = DS.[row_id]
AND DATEADD(MONTH, 3, RDS.[3Month Date]) <= DS.[disch_date]
)
SELECT *
FROM RecursiveDataSource
ORDER BY [row_id]
,[level];
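The recursive CTE above keeps adding 3 months to the previous [3Month Date]. The minimal Python sketch below mimics that chaining under two assumptions: the hypothetical helper `add_months` reproduces DATEADD(MONTH, ...) by clamping to the last day of a shorter target month, and the question's `<=` discharge-date condition applies at every step. `chained_dates` is likewise a hypothetical name.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add months, clamping the day to the end of a shorter target month
    (assumed to match DATEADD(MONTH, ...), e.g. Nov 30 + 3 months -> Feb 28)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def chained_dates(adm: date, disch: date, step: int = 3) -> list:
    """Repeatedly add `step` months to the previous result, like the recursive CTE."""
    dates = []
    current = add_months(adm, step)
    while current <= disch:
        dates.append(current)
        current = add_months(current, step)  # next step starts from the previous date
    return dates
```

Because each step starts from the previous, possibly clamped, date, the day of the month stays at 28 after February (5/28/2006, 8/28/2006, ...), whereas the question's walkthrough keeps the 30th (5/30/2006, 8/30/2006); that pattern corresponds to adding 3, 6, 9, ... months directly to [adm_date].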
This question already has an accepted answer, but you mention in the comments on it that you have performance problems. Try this instead; it's also a lot simpler.
A recursive CTE is really useful if the value of the next row depends on the value of the previous row.
Here, we don't need the answer to the previous row - we just add n x 3 months (e.g., 3 months, 6 months, 9 months) and filter the rows you want to keep.
Therefore, instead of doing a recursive CTE, just do it via set logic.
Here's some data setup:
CREATE TABLE #Datasource (client_id int, adm_date date, disch_date date);
INSERT INTO #Datasource (client_id, adm_date, disch_date) VALUES
(1002, '20050311', '20050502'),
(1002, '20050830', '20070216'),
(1002, '20170316', NULL),
(1002, '20071105', '20090207');
And here's the simple SELECT:
WITH DataSourceMod AS
(SELECT client_id, adm_date, disch_date, ISNULL(disch_date, getdate()) AS disc_date_mod
FROM #Datasource
),
Nums_One_to_OneHundred AS
(SELECT a * 10 + b AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) B(b)
)
SELECT ds.client_id, ds.adm_date, ds.disch_date, DATEADD(month, 3*Nums.n, ds.adm_date) AS ThreeMonthDate
FROM DataSourceMod ds
CROSS JOIN Nums_One_to_OneHundred Nums
WHERE DATEADD(month, 3* Nums.n, ds.adm_date) <= ds.disc_date_mod
ORDER BY ds.client_id, ds.adm_date;
This works by
Calculating the effective discharge date (the specified date, or today)
Calculating all possible rows for up to 300 months in the future (the CTE Nums_One_to_OneHundred holds all the values from 1 to 100, which are then multiplied by 3)
Only taking those that fulfil the date condition
You can further optimise this, if desired, by limiting the number of 3-month increments you need to add. Here's a rough version:
WITH DataSourceMod AS
(SELECT client_id, adm_date, disch_date, ISNULL(disch_date, getdate()) AS disc_date_mod,
FLOOR(DATEDIFF(month, adm_date, ISNULL(disch_date, getdate())) / 3) + 1 AS nMax
FROM #Datasource
),
Nums_One_to_OneHundred AS
(SELECT a * 10 + b AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) B(b)
)
SELECT ds.client_id, ds.adm_date, ds.disch_date, DATEADD(month, 3*Nums.n, ds.adm_date) AS ThreeMonthDate
FROM DataSourceMod ds
INNER JOIN Nums_One_to_OneHundred Nums ON Nums.n <= ds.nMax
WHERE DATEADD(month, 3* Nums.n, ds.adm_date) <= ds.disc_date_mod
ORDER BY ds.client_id, ds.adm_date;
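The set-based idea (add n x 3 months to [adm_date] and keep the dates on or before the effective discharge date) can be sketched in Python as follows. It is an illustration only: `add_months` is a hypothetical helper that clamps to the end of shorter months the way DATEADD(MONTH, ...) does, `max_n=100` plays the role of Nums_One_to_OneHundred, and `today` stands in for GETDATE().

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add months, clamping the day to the end of a shorter target month
    (assumed to match DATEADD(MONTH, ...), e.g. 2005-08-30 + 6 months -> 2006-02-28)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def three_month_dates(adm, disch=None, today=None, max_n=100):
    """Dates adm + 3*n months (n = 1..max_n) on or before the effective discharge date.

    A missing discharge date falls back to `today` (default: the real current date),
    like ISNULL(disch_date, GETDATE()).
    """
    end = disch if disch is not None else (today or date.today())
    candidates = (add_months(adm, 3 * n) for n in range(1, max_n + 1))
    return [d for d in candidates if d <= end]
```

Every date is computed from [adm_date] itself, so no row depends on the previous one, which is why the SQL version needs no recursion.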

Splitting out a cost dynamically across weeks

I’m creating an interim table in SQL Server for use with Power BI to query financial data.
I have a finance transactions table, TBLFinance:
CREATE TABLE TBLFinance
(ID int,
Value float,
EntryDate date,
ClientName varchar (250)
)
INSERT INTO TBLFinance(ID ,Value ,EntryDate ,ClientName)
VALUES(1,'1783.26','2018-10-31 00:00:00.000','Alpha')
, (2,'675.3','2018-11-30 00:00:00.000','Alpha')
, (3,'243.6','2018-12-31 00:00:00.000','Alpha')
, (4,'8.17','2019-01-31 00:00:00.000','Alpha')
, (5,'257.23','2019-01-31 00:00:00.000','Alpha')
, (6,'28','2019-02-28 00:00:00.000','Alpha')
, (7,'1470.61','2019-03-31 00:00:00.000','Bravo')
, (8,'1062.86','2019-04-30 00:00:00.000','Bravo')
, (9,'886.65','2019-05-31 00:00:00.000','Bravo')
, (10,'153.31','2019-05-31 00:00:00.000','Bravo')
, (11,'150.24','2019-06-30 00:00:00.000','Bravo')
, (12,'690.14','2019-07-31 00:00:00.000','Charlie')
, (13,'21.67','2019-08-31 00:00:00.000','Charlie')
, (14,'339.29','2018-10-31 00:00:00.000','Charlie')
, (15,'807.96','2018-11-30 00:00:00.000','Delta')
, (16,'48.94','2018-12-31 00:00:00.000','Delta')
I’m calculating transaction values that fall within a week. My week ends on a Sunday, so I have the following query:
INSERT INTO tblAnalysis
(WeekTotal
, WeekEnd
, Client
)
SELECT SUM (VALUE) AS WeekTotal
, dateadd (day, case when datepart (WEEKDAY, EntryDate) = 1 then 0 else 8 - datepart (WEEKDAY, EntryDate) end, EntryDate) AS WeekEnd
, ClientName as Client
FROM dbo.tblFinance
GROUP BY dateadd (day, case when datepart (WEEKDAY, EntryDate) = 1 then 0 else 8 - datepart (WEEKDAY, EntryDate) end, EntryDate), CLIENTNAME
I’ve now been informed that some of the costs incurred within a given week may be monthly, and therefore need to be split over 4 weeks, or annual, and so split over 52 weeks. I will write a CASE expression to update the costs based on ClientName, so assume there is an additional field called ‘Payfrequency’.
I want to avoid having to pull the affected values into a temp table and effectively write the following, because different splits will be applied depending on frequency:
SELECT *
INTO #MonthlyCosts
FROM
(
SELECT
client
, VALUE / 4 AS VALUE
, WEEKENDING
FROM tblAnalysis
UNION ALL
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,1,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
UNION ALL
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,2,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
UNION ALL
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,3,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
) AS A
I know I need a stored procedure to hold variables so the calculations can be carried out dynamically, but have no idea where to start.
You can use recursive CTEs to split the data:
with cte as (
select ID, Value, EntryDate, ClientName, payfrequency, 1 as n
from TBLFinance f
union all
select ID, Value, EntryDate, ClientName, payfrequency, n + 1
from cte
where n < payfrequency
)
select *
from cte;
Note that by default this is limited to 100 recursion steps. You can add OPTION (MAXRECURSION 0) at the end of the query to remove the limit.
The best solution would be to make use of a numbers table, if you can create a table on your server with one column holding a sequence of integers.
You can then use it like this for your weekly values:
SELECT
client
, VALUE / 52 AS VALUE
, DATEADD(WEEK,N.Number,WEEKENDING) AS WEEKENDING
FROM tblAnalysis AS A
CROSS JOIN tblNumbers AS N
WHERE N.Number <= 52
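Here is a minimal Python sketch of the splitting logic both answers describe. It assumes a hypothetical mapping from pay frequency to number of weekly shares (4 for monthly, 52 for annual, as in the question) and week offsets starting at 0, like the question's UNION query; `split_cost` and the frequency labels are illustrative names, not existing columns.

```python
from datetime import date, timedelta

# Hypothetical pay-frequency labels mapped to the number of weekly shares
# described in the question: monthly costs over 4 weeks, annual over 52.
SHARES = {"weekly": 1, "monthly": 4, "annual": 52}


def split_cost(client: str, value: float, week_ending: date, frequency: str):
    """Spread one cost evenly across consecutive week-ending dates.

    Offset 0 keeps the first share in the original week, like the question's
    UNION query (no offset, then +1, +2, +3 weeks).
    """
    shares = SHARES[frequency]
    return [
        (client, value / shares, week_ending + timedelta(weeks=i))
        for i in range(shares)
    ]
```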

Calculating Percentages with SUM and Group by

I am trying to create an overtime calculation based on a set of criteria. It goes as follows.
Overtime is posted on any day with more than 8 hrs, but an employee has to reach 40 total hrs first, and the calculation starts at the 1st day of the week and moves forward. The overtime is allocated using the percentage each cost code represents of the SUM total of the cost codes worked.
First, you have to find the percentage of each cost code worked for the entire week per employee ID. See the example below.
Then, for each day that is over 8 hrs, you take that day's time over 8 hrs and multiply it by each code's calculated percentage. At the end of the week, the regular hours must total 40 hrs if the employee has gone over 40 for the week. See the example below.
CREATE TABLE [Totals](
[Day] nvarchar (10) null,
[EmployeeID] [nvarchar](100) NULL,
[CostCode] [nvarchar](100) NULL,
[TotalTime] [real] NULL,)
INSERT Into Totals (day,employeeid, CostCode, TotalTime) VALUES
('1','1234','1', 2),
('1','1234','2', 7.5),
('2','1234','1', 1.5),
('2','1234','2', 8),
('3','1234','1', 1),
('3','1234','2', 6),
('4','1234','1', 2),
('4','1234','2', 8),
('5','1234','1', 2),
('5','1234','2', 8),
('1','4567','1', 2),
('1','4567','2', 8.5),
('2','4567','1', 1.5),
('2','4567','2', 7.6),
('3','4567','1', 1),
('3','4567','2', 5),
('4','4567','1', 2),
('4','4567','2', 8),
('5','4567','1', 2),
('5','4567','2', 8)
To get the percentage of each cost code worked, divide the SUM of total time for that cost code for the week by the SUM of total time for the entire week:
SELECT employeeid,CostCode,SUM(totaltime) As TotalTime ,
ROUND(SUM(Totaltime) / (select SUM(TotalTime) from Totals where employeeid = '1234') * 100,0) as Percentage
from Totals WHERE EmployeeID = '1234' group by EmployeeID, CostCode
Percentage Calculated for the Week by Cost = 18% worked on Cost 1 and 82% on Cost 2
I would like to take the percentage results for the week and use them to calculate the regular time and overtime for each day in the query.
Results Example Day 1: for EmployeeID 1234
Day CostCode RegTime OverTime
1 1 1.73 .27
1 2 6.27 1.23
After your edit, I get your expected result. Try this:
select calc.*
--select [day], CostCode, EmployeeID
--, CPr * DayEmpRT RegTime_old
, TotalTime - CPr * DayEmpOT RegTime
, CPr * DayEmpOT OverTime
from (
select Agr.*
--, round(EmpC1T / EmpT, 2) C1Pr
--, round(1 - (EmpC1T / EmpT), 2) C2Pr
, round(EmpCT / EmpT, 2) CPr
, case when DayEmpT > 8 then 8 else DayEmpT end DayEmpRT
, case when DayEmpT > 8 then DayEmpT - 8 else 0 end DayEmpOT
from (
select Totals.*
, SUM(TotalTime) over (partition by EmployeeID, [day]) DayEmpT
--, SUM(case when CostCode = 1 then TotalTime end) over (partition by EmployeeID) EmpC1T
, SUM(TotalTime) over (partition by EmployeeID, CostCode) EmpCT
, SUM(TotalTime) over (partition by EmployeeID) EmpT
from Totals
WHERE EmployeeID = '1234' ) Agr ) calc
order by 1,2,3
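The first query's allocation can be sketched in Python to check it against the expected day 1 results. This assumes the rows for employee 1234 from the question; as in the SQL, each code's weekly share is rounded to 2 decimals, a day's overtime is the time above 8 hours, and the 40-hour weekly threshold is not applied. Rounding the output to 2 decimals is an added display step, and `allocate_overtime` is a hypothetical name.

```python
from collections import defaultdict

# (day, employee_id, cost_code, total_time) rows for employee 1234 from the question.
ROWS_1234 = [
    (1, "1234", 1, 2.0), (1, "1234", 2, 7.5),
    (2, "1234", 1, 1.5), (2, "1234", 2, 8.0),
    (3, "1234", 1, 1.0), (3, "1234", 2, 6.0),
    (4, "1234", 1, 2.0), (4, "1234", 2, 8.0),
    (5, "1234", 1, 2.0), (5, "1234", 2, 8.0),
]


def allocate_overtime(rows, daily_limit=8.0):
    """Return (day, emp, code, regular, overtime) for each row."""
    week_total = defaultdict(float)  # like EmpT
    code_total = defaultdict(float)  # like EmpCT
    day_total = defaultdict(float)   # like DayEmpT
    for day, emp, code, t in rows:
        week_total[emp] += t
        code_total[(emp, code)] += t
        day_total[(emp, day)] += t

    result = []
    for day, emp, code, t in rows:
        share = round(code_total[(emp, code)] / week_total[emp], 2)  # like CPr
        day_ot = max(day_total[(emp, day)] - daily_limit, 0.0)       # like DayEmpOT
        overtime = share * day_ot
        result.append((day, emp, code, round(t - overtime, 2), round(overtime, 2)))
    return result
```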
Here is the simplest way to calculate this:
select calc.*
, TotalTime * pr4R RegTime
, TotalTime * pr4O OverTime
from(
select Agr.*
, case when EmpT > 40 then round(40/EmpT, 2) else 1 end pr4R
, case when EmpT > 40 then round(1 - 40/EmpT, 2) else 0 end pr4O
from (
select Totals.*
, SUM(TotalTime) over (partition by EmployeeID) EmpT
from Totals
WHERE EmployeeID = '1234' ) Agr ) calc
But watch out for day 3, because it has only 7 hours.
The first query calculates each day separately and leaves day 3 alone.
The second query scales all hours.
There could be another variant that calculates all of an employee's rows but scales RegTime and OverTime separately, except on days with fewer than 8 hours, which are topped up to 8 hours from OverTime.
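A minimal Python sketch of the second (scaling) approach, assuming every row of an employee is split by the same ratio: regular = 40 / weekly total and overtime = the remainder, each rounded to 2 decimals like pr4R and pr4O, with no overtime at all when the week is 40 hours or less. `scale_hours` is a hypothetical name.

```python
def scale_hours(rows, weekly_limit=40.0):
    """Split every row of an employee by the same regular/overtime ratio.

    rows are (day, employee_id, cost_code, total_time) tuples.
    """
    week_total = {}
    for _, emp, _, t in rows:
        week_total[emp] = week_total.get(emp, 0.0) + t

    result = []
    for day, emp, code, t in rows:
        total = week_total[emp]
        if total > weekly_limit:
            reg_ratio = round(weekly_limit / total, 2)     # like pr4R
            ot_ratio = round(1 - weekly_limit / total, 2)  # like pr4O
        else:
            reg_ratio, ot_ratio = 1.0, 0.0                 # no overtime this week
        result.append((day, emp, code, t * reg_ratio, t * ot_ratio))
    return result
```

Because the two ratios are rounded separately, regular time can come out slightly above 40 (for employee 1234, 46 x 0.87 = 40.02).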
This should help you get started...
-- % based on hours worked for each code on a DAILY basis (The original 21% in the question was based on this)
SELECT
T.EmployeeId,
T.Day,
T.CostCode,
T.TotalTime,
CAST(100.0 * T.TotalTime / X.DailyHours AS DECIMAL(10,2)) AS PercentageWorked
FROM Totals T
INNER JOIN (
SELECT
EmployeeId,
Day,
SUM(TotalTime) AS DailyHours
FROM Totals
GROUP BY EmployeeId, Day
) X ON X.EmployeeId = T.EmployeeId AND X.Day = T.Day
-- % based on hours worked for each code on a WEEKLY basis (The revised question)
SELECT
T.EmployeeId,
T.CostCode,
SUM(T.TotalTime) AS TotalTime,
CAST(100.0 * SUM(T.TotalTime) / X.WeeklyHours AS DECIMAL(10,2)) AS PercentageWorked
FROM Totals T
INNER JOIN (
SELECT
EmployeeId,
SUM(TotalTime) AS WeeklyHours
FROM Totals
GROUP BY EmployeeId
) X ON X.EmployeeId = T.EmployeeId
GROUP BY
T.EmployeeId,
T.CostCode,
X.WeeklyHours
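The weekly percentage from the second query can be checked with a short Python sketch. It uses `Decimal` and half-up rounding to 2 places as an approximation of the CAST(... AS DECIMAL(10,2)); `weekly_percentages` is a hypothetical helper, and the input rows are assumed to be (day, employee id, cost code, total time) tuples.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP


def weekly_percentages(rows):
    """Percentage of each employee's weekly hours spent on each cost code.

    Results are keyed by (employee_id, cost_code) and rounded half-up to 2 places.
    """
    per_code = defaultdict(Decimal)  # hours per (employee, cost code)
    per_emp = defaultdict(Decimal)   # weekly hours per employee
    for _, emp, code, hours in rows:
        h = Decimal(str(hours))  # str() avoids binary float artifacts
        per_code[(emp, code)] += h
        per_emp[emp] += h
    return {
        key: (100 * hours / per_emp[key[0]]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        for key, hours in per_code.items()
    }
```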

SSRS Expression - Subtract SUMS

I need some help with an SSRS expression that sums up amounts and then subtracts sums. I have a dataset that has accounts and corresponding money/amount values. I'm trying to write an expression that sums up the money/amount values for one group of accounts in a specified range, and then subtracts the money/amount totals of other ranges from it. For example:
(Sum(amt) where acct between 40000 and 49999) -
(Sum(amt) where (acct between 50000 and 59999) or (acct between 66000 and 69999)) -
(Sum(amt) where acct between 76000 and 79825) -
(Sum(amt) where acct between 89000 and 90399)
I could really use some help translating the SQL logic above into an expression to be used for a textbox in SSRS. Any advice would be really helpful! Thanks!
Try this:
=Sum(
iif(Fields!acct.Value >= 40000 and
Fields!acct.Value <= 49999,
Fields!amt.Value, 0
)
)
-
Sum(
iif(
(Fields!acct.Value >= 50000 and Fields!acct.Value <= 59999)
or (Fields!acct.Value >= 66000 and Fields!acct.Value <= 69999),
Fields!amt.Value, 0
)
)
-
Sum(
iif(
Fields!acct.Value >= 76000 and
Fields!acct.Value <= 79825,
Fields!amt.Value, 0
)
)
-
Sum(
iif(
Fields!acct.Value >= 89000 and
Fields!acct.Value <= 90399,
Fields!amt.Value, 0
)
)
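The intended logic (add one inclusive account range, subtract several others) can be sketched in Python to test the ranges against sample data before wiring up the SSRS expression. The ranges are taken from the question; `net_amount` and the sample rows are hypothetical.

```python
# Inclusive account ranges from the question (BETWEEN includes both ends).
ADD_RANGES = [(40000, 49999)]
SUBTRACT_RANGES = [(50000, 59999), (66000, 69999), (76000, 79825), (89000, 90399)]


def in_ranges(acct, ranges):
    """True if acct falls inside any (low, high) range, inclusive."""
    return any(low <= acct <= high for low, high in ranges)


def net_amount(rows):
    """Sum amt over the first range minus the sums over the other ranges.

    rows are hypothetical (acct, amt) pairs standing in for the dataset.
    """
    added = sum(amt for acct, amt in rows if in_ranges(acct, ADD_RANGES))
    subtracted = sum(amt for acct, amt in rows if in_ranges(acct, SUBTRACT_RANGES))
    return added - subtracted
```

Since the subtracted ranges do not overlap, subtracting their combined sum is the same as subtracting each range's sum separately, as the SSRS expression does.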

sum divided values problem (dealing with rounding error)

I have a product that costs 4€ and I need to divide this amount among 3 departments.
For the second value column, I need to get the number of rows for this product and divide the total by the number of departments.
My query:
select
department, totalvalue,
(totalvalue / (select count(*) from departments d2 where d2.department = p.product))
dividedvalue
from products p, departments d
where d.department = p.department
Department Total Value Divided Value
---------- ----------- -------------
A 4 1.3333333
B 4 1.3333333
C 4 1.3333333
But when I sum the values, I get 3.999999. Of course, with hundreds of rows I get big differences...
Is there any way to use 2 decimal places and round the last value? (My results would be 1.33, 1.33, 1.34.)
I mean, is there some way to adjust the last row?
In order to handle this, for each row you would have to do the following:
Perform the division
Round the result to the appropriate number of cents
Sum the difference between the rounded amount and the result of the division operation
When the sum of the differences exceeds the lowest decimal place (in this case, 0.01), add that amount to the results of the next division operation (after rounding).
This will distribute fractional amounts evenly across the rows. Unfortunately, there is no easy way to do this in SQL with simple queries; it's probably better to perform this in procedural code.
As for how important it is, when it comes to financial applications and institutions, things like this are very important, even if it's only by a penny, and even if it can only happen every X number of records; typically, the users want to see values tie to the penny (or whatever your unit of currency is) exactly.
Most importantly, you don't want to allow for an exploit like "Superman III" or "Office Space" to occur.
With six decimals of precision, you would need about 5,000 transactions to notice a difference of one cent, if you round the final number to two decimals. Increasing the number of decimals to an acceptable level would eliminate most issues; e.g., with 9 decimals you would need about 5,000,000 transactions to notice a difference of a cent.
Maybe you can make a fourth row that will be Total - sum(A, B, C).
But it depends on what you want to do: if you need the exact value, you can keep the fractions; otherwise, truncate and don't worry about the virtual loss.
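A minimal Python sketch of adjusting the last row, which produces the 1.33, 1.33, 1.34 split the question asks for: every share is rounded to cents and the final share absorbs whatever remains. It uses `Decimal` to avoid binary floating-point error; `split_adjust_last` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_adjust_last(total, parts):
    """Round each share to cents; the last share absorbs the remainder."""
    total = Decimal(total)
    share = (total / parts).quantize(CENT, rounding=ROUND_HALF_UP)
    return [share] * (parts - 1) + [total - share * (parts - 1)]
```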
This can also be done simply by adding the rounding difference of a particular value to the next number to be rounded (before rounding). This way the pile always stays the same size.
Here's a TSQL (Microsoft SQL Server) implementation of the algorithm provided by Martin:
-- Set parameters.
DECLARE @departments INTEGER = 3;
DECLARE @totalvalue DECIMAL(19, 7) = 4.0;
WITH
CTE1 AS
(
-- Create the data upon which to perform the calculation.
SELECT
1 AS Department
, @totalvalue AS [Total Value]
, CAST(@totalvalue / @departments AS DECIMAL(19, 7)) AS [Divided Value]
, CAST(ROUND(@totalvalue / @departments, 2) AS DECIMAL(19, 7)) AS [Rounded Value]
UNION ALL
SELECT
CTE1.Department + 1
, CTE1.[Total Value]
, CTE1.[Divided Value]
, CTE1.[Rounded Value]
FROM
CTE1
WHERE
Department < @departments
),
CTE2 AS
(
-- Perform the calculation for each row.
SELECT
Department
, [Total Value]
, [Divided Value]
, [Rounded Value]
, CAST([Divided Value] - [Rounded Value] AS DECIMAL(19, 7)) AS [Rounding Difference]
, [Rounded Value] AS [Calculated Value]
FROM
CTE1
WHERE
Department = 1
UNION ALL
SELECT
CTE1.Department
, CTE1.[Total Value]
, CTE1.[Divided Value]
, CTE1.[Rounded Value]
, CAST(CTE1.[Divided Value] + CTE2.[Rounding Difference] - ROUND(CTE1.[Divided Value] + CTE2.[Rounding Difference], 2) AS DECIMAL(19, 7))
, CAST(ROUND(CTE1.[Divided Value] + CTE2.[Rounding Difference], 2) AS DECIMAL(19, 7))
FROM
CTE2
INNER JOIN CTE1
ON CTE1.Department = CTE2.Department + 1
)
-- Display the results with totals.
SELECT
Department
, [Total Value]
, [Divided Value]
, [Rounded Value]
, [Rounding Difference]
, [Calculated Value]
FROM
CTE2
UNION ALL
SELECT
NULL
, NULL
, SUM([Divided Value])
, SUM([Rounded Value])
, NULL
, SUM([Calculated Value])
FROM
CTE2
;
You can plug in whatever numbers you want at the top. I'm not sure if there is a mathematical proof for this algorithm.
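The same carry-forward idea as the T-SQL above can be written as a short loop in procedural code. This Python sketch assumes half-up rounding to cents and uses `Decimal` (pass the total as an int, string, or Decimal) to avoid floating-point error; `split_with_carry` is a hypothetical name. For a total given in whole cents, the final carried difference is below half a cent, so the rounded amounts add back up to the total.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_with_carry(total, parts):
    """Split total into `parts` cent amounts, carrying each rounding
    difference into the next value before it is rounded."""
    share = Decimal(total) / parts
    carry = Decimal(0)
    amounts = []
    for _ in range(parts):
        exact = share + carry
        rounded = exact.quantize(CENT, rounding=ROUND_HALF_UP)
        carry = exact - rounded  # what this row gave up (or took) by rounding
        amounts.append(rounded)
    return amounts
```

For example, `split_with_carry(4, 3)` gives 1.33, 1.34, 1.33, matching the row-by-row logic of the CTE2 recursion.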