Power BI: Simple addition gives wrong result - sum

I'm counting the number of unique IDs per month in a given timeframe, and I've encountered two strange things:
1. Computing the same thing with two different approaches (a value for each month versus a cumulative value month by month) gives different values. See screenshot below.
2. When you add up the values in the first column (the monthly values) by hand, the result is 868, but when Power BI summarizes it, the total is 864 o_O
Any ideas?
DAX formulas below:
Y-1 Kandydaci = CALCULATE(
    DISTINCTCOUNT(getDataForTeb[ID_DANE_OSOBOWE]);
    DATESBETWEEN(
        getDataForTeb[Złożenie podania];
        DATE(YEAR(NOW())-1; 4; 1);
        IF(
            DATE(YEAR(NOW())-1; MONTH(NOW()); DAY(NOW())) <= DATE(YEAR(NOW())-1; 11; 30);
            DATE(YEAR(NOW())-1; MONTH(NOW()); DAY(NOW()));
            DATE(YEAR(NOW())-1; 11; 30)
        )
    );
    ISBLANK(getDataForTeb[REZYGNACJA_DATA])
)

Y-1 Kandydaci cumulative = CALCULATE(
    DISTINCTCOUNT(getDataForTeb[ID_DANE_OSOBOWE]);
    FILTER(
        ALL(getDataForTeb);
        AND(
            getDataForTeb[Złożenie podania] <= MAX(getDataForTeb[Złożenie podania])-364;
            AND(
                getDataForTeb[Złożenie podania] <= DATE(YEAR(NOW())-1; 11; 30);
                getDataForTeb[Złożenie podania] >= DATE(YEAR(NOW())-1; 4; 1)
            )
        )
    );
    ISBLANK(getDataForTeb[REZYGNACJA_DATA])
)
Another interesting example just from a while ago: different file, no DAX involved:

Yes! This is the magic of DISTINCTCOUNT(). It has counted the number of distinct values for the [ID_DANE_OSOBOWE] column in each month, but when the measure is evaluated for all months, it does not double count the values which appear in more than one month.
Simplified:
| ID | Month |
+----+-------+
| 1  | March |
| 1  | April |
When you have a measure My Measure = DISTINCTCOUNT(tbl[ID]) for each month the value will be 1, but when you do a distinct count for all months then the value will still be 1 because there is only one distinct value.
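The same effect can be reproduced outside Power BI. The following is a minimal Python sketch, not DAX: the `distinct_count` helper is a hypothetical stand-in for evaluating DISTINCTCOUNT under a month filter, and the rows are the two from the simplified table above.

```python
# Rows from the simplified table above: (ID, Month).
rows = [(1, "March"), (1, "April")]


def distinct_count(rows, month=None):
    """Count distinct IDs, optionally restricted to one month (the 'filter context')."""
    return len({id_ for id_, m in rows if month is None or m == month})


# Evaluated once per month, as a visual would do for each row.
per_month = {m: distinct_count(rows, m) for m in ("March", "April")}

# Adding the monthly values by hand counts ID 1 twice.
sum_of_months = sum(per_month.values())

# Evaluated once over all months, as the grand total is: ID 1 is counted once.
grand_total = distinct_count(rows)

print(per_month, sum_of_months, grand_total)
```

Here the hand-added sum is 2 while the grand total is 1, which is the same kind of gap as 868 versus 864 in the question.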

In general, when an automatically calculated grand total differs from the sum of the partial results, it is usually for one of two reasons. Either it is the case explained by mendosi above (DISTINCTCOUNT is evaluated in a different filter context for each row and for the total, so the results are not additive), or it is because DISTINCTCOUNT counts BLANK as one distinct value. The March 2019 update of Power BI introduced a new DAX function, DISTINCTCOUNTNOBLANK, which leaves BLANK values out of the count.

Related

Finding a lagged cumulative total from a table containing both cumulative and delta entries

I have a SQL table with a schema where a value is either a cumulative value for a particular category, or a delta on top of the previous value. While I appreciate this is not a particularly great design, it comes from an external source and thus I can't change it in any way.
The table looks something like the following:
Date   Category  AmountSoldType  AmountSold
-----------------------------------------------------
Jan 1  Apples    Cumulative      100
Jan 1  Bananas   Cumulative      50
Jan 2  Apples    Delta           20
Jan 2  Bananas   Delta           10
Jan 3  Apples    Delta           25
Jan 3  Bananas   Cumulative      75
For this example, I want to produce the total cumulative number of fruits sold by item at the beginning of each day:
Date   Category  AmountSold
--------------------------------
Jan 1  Apples    0
Jan 1  Bananas   0
Jan 2  Apples    100
Jan 2  Bananas   50
Jan 3  Apples    170
Jan 3  Bananas   60
Jan 4  Apples    195
Jan 4  Bananas   75
Intuitively, I want to take the most recent cumulative total, and add any deltas that have appeared since that entry.
I imagine something akin to
SELECT Date, Category,
       LEAD((subquery??), 1) OVER (PARTITION BY Category ORDER BY Date) AS Amt
FROM Fruits
GROUP BY Date, Category
ORDER BY Date ASC
is what I want, but I'm having trouble putting the right subquery together. Any suggestions?
You seem to want to add the deltas to the most recent cumulative -- all before the current date.
If so, I think this logic does what you want:
select f.*,
       (max(case when date = date_cumulative then amountsold else 0 end)
            over (partition by category) +
        sum(case when date > date_cumulative then amountsold else 0 end)
            over (partition by category order by date
                  rows between unbounded preceding and 1 preceding)
       ) amt
from (select f.*,
             max(case when AmountSoldType = 'Cumulative' then date end)
                 over (partition by category order by date
                       rows between unbounded preceding and current row
                      ) as date_cumulative
      from fruits f
     ) f
I'm a bit confused by this data set (notwithstanding the mistake in adding up the apples). I assume the raw data states end-of-day figures, so for example 20 apples were sold on Jan 2 (because there is a delta of 20 reported for that day).
In your example results, it does not appear valid to say that zero apples were sold on Jan 1. It isn't actually possible to say how many were sold on that day, because it is not clear whether the 100 cumulative apples were accrued during Jan 1 (and thus should be excluded from the start-of-day figure you seek) or whether they were accrued on previous days (and should be included), or some mix of the two. That day's data should thus be null.
It is also not clear whether all data sets must begin with a cumulative, or whether data sets can begin with a delta (which might require working backwards from a subsequent cumulative), and whether you potentially have access to multiple data sets from your external source which form a continuous consistent sequence, or whether "cumulatives" relate purely to a single data set received. I'm going to assume at least that all data sets begin with a cumulative.
All that said, this problem is a simple case of firstly converting all rows into either all deltas, or all cumulatives. Assuming we go for all cumulatives, then recursing through each row in order, it is a case of either selecting the AmountSold as-is (if the row is a cumulative), or adding the AmountSold to the result of the previous step (if it is a delta).
Once pre-processed like this, then for a start-of-day cumulative, it is all just a question of looking at the previous day's cumulative (which was an end-of-day cumulative, if my initial assumption was correct that all raw data relates to end-of-day figures).
Using the LAG function in this final step to get the previous day's cumulative, will also neatly produce a null for the first row.
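The two steps described above (convert every row to an end-of-day cumulative, then look at the previous day's value) can be sketched in Python rather than SQL. This is only an illustration under the stated assumptions: the raw figures are end-of-day, every category starts with a cumulative row, and days are represented as plain integers. The function name `start_of_day_totals` is made up for this example.

```python
from itertools import groupby

# Sample rows from the question: (day of January, category, type, amount).
rows = [
    (1, "Apples", "Cumulative", 100),
    (1, "Bananas", "Cumulative", 50),
    (2, "Apples", "Delta", 20),
    (2, "Bananas", "Delta", 10),
    (3, "Apples", "Delta", 25),
    (3, "Bananas", "Cumulative", 75),
]


def start_of_day_totals(rows):
    """Return {(day, category): start-of-day cumulative, or None if unknown}."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # by category, then day
    for category, group in groupby(ordered, key=lambda r: r[1]):
        running = None  # end-of-day cumulative of the previous row
        for day, _, kind, amount in group:
            # Start of day = previous end-of-day cumulative (the LAG step).
            result[(day, category)] = running
            if kind == "Cumulative":
                running = amount
            elif running is None:
                # Assumption from the answer: data sets begin with a cumulative.
                raise ValueError(f"{category} starts with a delta")
            else:
                running += amount
    return result


totals = start_of_day_totals(rows)
for key in sorted(totals):
    print(key, totals[key])
```

With the sample data this gives None for Jan 1, 100 and 50 for Jan 2, and 120 and 60 for Jan 3. The apples figure for Jan 3 is 120 rather than the 170 shown in the question, which is the adding-up mistake mentioned above.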

Performing math on SELECT result rows

I have a table that houses customer balances, and I need to see when an account's balance has dropped by a certain percentage compared with the previous month's balance.
My output consists of an account id, a year_month combination code, and the month-ending balance. So I want to see if February's balance dropped by X% from January's, and if January's dropped by the same percentage from December's. If it did drop, I would like to see which year_month code it dropped in. And yes, one account could have multiple drops, and I would want to see all of them.
Does anyone have ideas on how to do this in SQL?
EDIT: Adding some sample data as requested. The table I am looking at has year_month as a column, but I can also get the last business day of each month.
account_id | year_month | ending balance
1 | 2016-1 | 50000
1 | 2016-2 | 40000
1 | 2016-3 | 25
Output that I would like to see is the year_month code when the ending balance has at least a 50% decline from the previous month.
First I would recommend making Year_Month a yyyy-mm-dd format date for this calculation. Then take the current table and join it to itself, but the date that you join on will be the prior month. Then perform your calculation in the select. So you could do something like this below.
SELECT x.*,
       x.EndingBalance - y.EndingBalance AS BalanceChange
FROM Balances x
INNER JOIN Balances y ON x.AccountID = y.AccountID
    AND y.YearMonth = DATEADD(month, DATEDIFF(month, 0, x.YearMonth) - 1, 0)
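To make the comparison with the prior month concrete, here is a small Python sketch of the same idea applied to the sample data from the question. It is an illustration, not the SQL itself: `month_index` and `find_drops` are hypothetical helper names, and the 50% threshold is taken from the question.

```python
# Sample data from the question: (account_id, year_month, ending balance).
balances = [
    (1, "2016-1", 50000),
    (1, "2016-2", 40000),
    (1, "2016-3", 25),
]


def month_index(year_month):
    """Turn 'yyyy-m' into a single number so the prior month is index - 1."""
    year, month = year_month.split("-")
    return int(year) * 12 + int(month)


def find_drops(balances, threshold=0.5):
    """Return (account_id, year_month) pairs whose balance fell by at least
    `threshold` (as a fraction) relative to the same account's prior month."""
    by_key = {(acct, month_index(ym)): bal for acct, ym, bal in balances}
    drops = []
    for acct, ym, bal in balances:
        prev = by_key.get((acct, month_index(ym) - 1))
        # Skip months with no prior balance (or a zero one, to avoid dividing by 0).
        if prev and (prev - bal) / prev >= threshold:
            drops.append((acct, ym))
    return drops


drops = find_drops(balances)
print(drops)
```

For the sample rows only 2016-3 is reported: the fall from 50000 to 40000 is 20%, while the fall from 40000 to 25 is well over 50%.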

MDX: group by with LastPeriods group by, Mondrian Schema: levelType hours

I'm new to MDX and Mondrian and have two time related questions:
1.)
The MDX command
SELECT NON EMPTY {[Country].[Country].Members} ON COLUMNS, [Time].[2012].[Q1 2012].[2].[2012-02-08]:[Time].[2012].[Q4 2012].[11].[2012-11-08] ON ROWS FROM [MyCube] WHERE {[Measures].[Sales]}
prints the result grouped by days:
2012-02-08 | 2873 | 9829 | ...
2012-02-09 | ...
But I want to define the date range in days and get the result grouped by months:
2012-02 | 34298| ...
2012-03 | ...
2.)
The Mondrian schema documentation lists the time level types TimeYears, TimeQuarters, TimeMonths and TimeDays. Is it possible to define hours too?
Thanks a lot.
1)
The range operator in MDX returns members of the level you're using. In your case:
[Time].[2012].[Q1 2012].[2].[2012-02-08]:[Time].[2012].[Q4 2012].[11].[2012-11-08]
You're using days, which is why you get every day. Use months instead of days in your range. If you do not want the data before the 8th, one option is to use a subselect as a filter:
SELECT
  NON EMPTY {[Country].[Country].Members} ON COLUMNS,
  {[Time].[Your month level].Members} ON ROWS
FROM (
  SELECT
    {[Measures].[Sales]} ON 0,
    [Time].[2012].[Q1 2012].[2].[2012-02-08]:[Time].[2012].[Q4 2012].[11].[2012-11-08] ON 1
  FROM [MyCube] )
2) I don't know about Mondrian, but in any case you can create a time dimension based on an existing table.
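What the subselect achieves (restrict to a day range first, then report at month level) can be illustrated outside MDX. The Python sketch below uses made-up daily sales figures, since the question elides the real values, and a hypothetical helper `monthly_totals`.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_sales, start, end):
    """Sum daily values that fall inside [start, end], grouped by 'YYYY-MM'."""
    totals = defaultdict(int)
    for day, amount in daily_sales.items():
        if start <= day <= end:  # the day-level filter (the subselect's role)
            totals[day.strftime("%Y-%m")] += amount  # the month-level grouping
    return dict(sorted(totals.items()))


# Hypothetical data for illustration only.
daily_sales = {
    date(2012, 2, 7): 5,     # before the range, ignored
    date(2012, 2, 8): 10,
    date(2012, 2, 9): 20,
    date(2012, 3, 1): 7,
    date(2012, 11, 8): 3,
    date(2012, 11, 9): 100,  # after the range, ignored
}

result = monthly_totals(daily_sales, date(2012, 2, 8), date(2012, 11, 8))
print(result)
```

The first and last months only include the days inside the range, which is the behaviour the day-level filter is meant to give.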

Adjust date column for change over time

This is an easy enough problem, but wondering if anyone can provide a more elegant solution.
I've got a table that consists of a date column (month end dates over time) and several value columns--say the price on a variety of stocks over time, one column for each stock. I'd like to calculate the change in value columns for each period represented in the date column (eg, a daily return from a table filled with prices).
My current plan is to join the table to itself and simply create a new column for the return as ret = b.price/a.price - 1. Code as follows:
select b.Date, Ret = (b.stock1/a.stock1 - 1)
from #temp a, #temp b
where datediff(day, a.Date,b.Date) between 25 and 35
order by a.Date
This works fine, BUT:
(1) I need to do this for, say, dozens of stocks--is there a good way to replicate the calculation without copying and pasting the return calculation and replacing 'stock1' with each other stock name?
(2) Is there a better way to do this join? I'm effectively doing a cross join at this point and only keeping entries that are adjacent (as defined by the datediff and range), but wondering if there's a better way to join a table like this to itself.
EDIT: Per request, data is in the form (my data has multiple price columns though):
Date Price
7/1/1996 349.22
7/31/1996 337.72
8/30/1996 343.70
9/30/1996 357.23
10/31/1996 364.07
11/29/1996 385.04
12/31/1996 383.68
And from that, I'd like to calculate return, to generate a table like this (again, with additional columns for the extra price columns that exist in the actual table):
Date Ret
7/31/1996 -0.03
8/30/1996 0.02
9/30/1996 0.04
10/31/1996 0.02
11/29/1996 0.06
12/31/1996 0.00
I would do the following. First, use the month and year to do the self join. I would recommend you take the year * 12 + the month number to get a unique value for each month and year combination. So, Jan of 2011 would have a value of (2011 * 12 + 1 = 24133) and December of 2010 would have a value of (2010 * 12 + 12 = 24132). This will allow you to accurately compare months without having to mess with rolling over from December to January.
Next, you need to supply the calculations in the select clause. If you have the stock values in different columns then you will have to type them out as b.stock1/a.stock1 - 1, b.stock2/a.stock2 - 1, etc. The only way around that would be to massage the data so that there is only one stock value column, and add a stockname column that identifies which stock that value is for.
Using the Month and Year for the self join, the following query should work:
select b.Date, Ret = (b.stock1/a.stock1 - 1)
from #temp a
inner join #temp b on (YEAR(b.Date) * 12) + MONTH(b.Date) = (YEAR(a.Date) * 12) + MONTH(a.Date) + 1
order by b.Date
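The year * 12 + month trick is easy to check in isolation. The Python sketch below applies it to the month-end prices from the question; it assumes one row per calendar month, so the 7/1/1996 row is left out because it falls in the same month as 7/31/1996. The names `month_index` and `monthly_returns` are hypothetical.

```python
from datetime import date

# Month-end prices from the question (one row per calendar month).
prices = [
    (date(1996, 7, 31), 337.72),
    (date(1996, 8, 30), 343.70),
    (date(1996, 9, 30), 357.23),
    (date(1996, 10, 31), 364.07),
    (date(1996, 11, 29), 385.04),
    (date(1996, 12, 31), 383.68),
]


def month_index(d):
    """Unique, consecutive number per (year, month): December to January is +1."""
    return d.year * 12 + d.month


def monthly_returns(prices):
    """Pair each row with the previous month's row and compute price/prev - 1."""
    by_month = {month_index(d): p for d, p in prices}
    returns = []
    for d, p in sorted(prices):
        prev = by_month.get(month_index(d) - 1)
        if prev is not None:  # the first month has nothing to compare against
            returns.append((d, p / prev - 1))
    return returns


returns = monthly_returns(prices)
for d, r in returns:
    print(d.isoformat(), f"{r:.2f}")
```

Because the lookup is keyed on the month index rather than on a day count, it does not depend on month-end dates being 25 to 35 days apart.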

MDX last order date and last order value

I've googled this but I can't figure it out.
I have a fact table like this one:
fact_order
id  id_date  amount  id_supplier
1   1        100     4
2   3        200     4
where id_date is the primary key of a dimension that has
id  date        month
1   01/01/2011  january
2   02/01/2011  january
3
I would like to write a calculated member that gives me the last date and the last amount for the same supplier.
By last date and last amount, do you mean the maximum values for this supplier?
If yes, you can create two measures with the "max" aggregation on the id_date and amount fields.
Then convert the max id_date into a readable form as follows:
CREATE MEMBER CURRENTCUBE.[Measures].[Max Date]
AS
    IIF(ISEMPTY([Measures].[Max Date Key]), NULL,
        STRTOMEMBER("[Date].[Calendar].[Date].&[" + CSTR([Measures].[Max Date Key]) + "]").Name),
VISIBLE = 1;
This will work only if later dates in your dimension have larger IDs. In my opinion, you should use date keys such as 20110101, 20110102, etc. rather than 1, 2, 3...
If you don't want the max values, please provide more details and a small example.
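The point about key ordering can be shown with a short Python sketch. It is not MDX and makes its own assumptions: the orders are hypothetical, date keys use the yyyymmdd form suggested above (so a larger key always means a later date), and `last_order_per_supplier` is a made-up helper name.

```python
# Hypothetical orders: (order id, yyyymmdd date key, amount, supplier id).
orders = [
    (1, 20110101, 300, 4),
    (2, 20110103, 200, 4),
    (3, 20110102, 500, 7),
]


def last_order_per_supplier(orders):
    """Return {supplier: (latest date key, amount on that date)}."""
    latest = {}
    for _, date_key, amount, supplier in orders:
        # With yyyymmdd keys, comparing keys is the same as comparing dates.
        if supplier not in latest or date_key > latest[supplier][0]:
            latest[supplier] = (date_key, amount)
    return latest


latest = last_order_per_supplier(orders)
print(latest)
```

In this made-up data, supplier 4's last amount is 200 even though its largest amount is 300, so a "max" aggregation on amount only matches the last amount when the values happen to line up that way.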