Counting items with multiple criteria - powerpivot

I have a table (getECRs) in PowerPivot.
Right now, I've been able to create a calculated column that counts how many times the row's customer ID (BAN) occurs in the BAN column with the following formula:
=CALCULATE(COUNTROWS(getECRs),ALLEXCEPT(getECRs,getECRs[BAN]))
What I'm having difficulty with is adding multiple criteria to the CALCULATE formula in PowerPivot.
Each row has a column that gives the date the request was generated _CreateDateKey. I'm trying to include criteria that would only include multiple BANs if they fall within 7 days (before or after) the _CreateDateKey for the row.
For example for one BAN, there are the following dates and their expected counts:
_CreateDateKey Count Explanation
6/13/2014 3 Does not include 6/23
6/13/2014 3 Does not include 6/23
6/16/2014 4 Includes all
6/23/2014 2 Does not include the 2 items from 6/13
In Excel I would use a COUNTIFS statement, like below to get the desired result (using table structure naming)
=COUNTIFS([BAN],[#BAN],[_CreateDateKey],">="&[#[_CreateDateKey]]-7,[_CreateDateKey],"<="&[#[_CreateDateKey]]+7)
But I can't seem to figure out the relative criteria needed for the dates. I tried the following as a criteria to the CALCULATE function, but it resulted in an error:
getECRs[_CreateDateKey]>=[_CreateDateKey]-7
Error: Column '_CreateDateKey' cannot be found or may not be used in this expression.

This formula answers your specific question. It's a good pattern to get down as it's highly re-usable - the EARLIER() is referencing the value of the current row (slightly more complex than this but that is the end result):
=
CALCULATE (
COUNTROWS ( getECRs ),
FILTER (
getECRs,
getECRs[BAN] = EARLIER ( getECRs[BAN] )
&& getECRs[_CreateDateKey]
>= EARLIER ( getECRs[_CreateDateKey] ) - 7
&& getECRs[_CreateDateKey]
<= EARLIER ( getECRs[_CreateDateKey] ) + 7
)
)
Fundamentally you should probably be looking to get away from the 'Excel mindset' of using a calculated column and deal with this using a measure.
An adaptation of the above would look like this - it would use the filter context of the PIVOT in which you were using it (e.g. if BAN was rows then you would get the count for that BAN).
You may need to adjust the ALL() if is too 'open' for your real world context and you might have to deal with totals using HASONEVALUE():
=
CALCULATE (
COUNTROWS ( getECRs ),
FILTER (
ALL(getECRs),
getECRs[_CreateDateKey] >= MAX ( getECRs[_CreateDateKey] ) - 7 &&
getECRs[_CreateDateKey] <= MAX ( getECRs[_CreateDateKey] ) + 7
)
)

Related

How to pull rows from a SQL table until quotas for multiple columns are met?

I've been able to find a few examples of questions similar to this one, but most only involve a single column being checked.
SQL Select until Quantity Met
Select rows until condition met
I have a large table representing facilities, with columns for each type of resource available and the number of those specific resources available per facility. I want this stored procedure to be able to take integer values in as multiple parameters (representing each of these columns) and a Lat/Lon. Then it should iterate over the table sorted by distance, and return all rows (facilities) until the required quantity of available resources (specified by the parameters) are met.
Data source example:
Id
Lat
Long
Resource1
Resource2
...
1
50.123
4.23
5
12
...
2
61.234
5.34
0
9
...
3
50.634
4.67
21
18
...
Result Wanted:
#latQuery = 50.634
#LongQuery = 4.67
#res1Query = 10
#res2Query = 20
Id
Lat
Long
Resource1
Resource2
...
3
50.634
4.67
21
18
...
1
50.123
4.23
5
12
...
Result includes all rows that meet the queries individually. Result is also sorted by distance to the requested lat/lon
I'm able to sort the results by distance, and sum the total running values as suggested in other threads, but I'm having some trouble with the logic comparing the running values with the quota provided in the params.
First I have some CTEs to get most recent edits, order by distance and then sum the running totals
WITH cte1 AS (SELECT
#origin.STDistance(geography::Point(Facility.Lat, Facility.Long, 4326)) AS distance,
Facility.Resource1 as res1,
Facility.Resource2 as res2
-- ...etc
FROM Facility
),
cte2 AS (SELECT
distance,
res1,
SUM(res1) OVER (ORDER BY distance) AS totRes1,
res2,
SUM(res1) OVER (ORDER BY distance) AS totRes2
-- ...etc, there's 15-20 columns here
FROM cte1
)
Next, with the results of that CTE, I need to pull rows until all quotas are met. Having the issues here, where it works for one row but my logic with all the ANDs isn't exactly right.
SELECT * FROM cte2 WHERE (
(totRes1 <= #res1Query OR (totRes1 > #res1Query AND totRes1- res1 <= #totRes1)) AND
(totRes2 <= #res2Query OR (totRes2 > #res2Query AND totRes2- res2 <= #totRes2)) AND
-- ... I also feel like this method of pulling the next row once it's over may be convoluted as well?
)
As-is right now, it's mostly returning nothing, and I'm guessing it's because it's too strict? Essentially, I want to be able to let the total values go past the required values until they are all past the required values, and then return that list.
Has anyone come across a better method of searching using separate quotas for multiple columns?
See my update in the answers/comments
I think you are massively over-complicating this. This does not need any joins, just some running sum calculations, and the right OR logic.
The key to solving this is that you need all rows, where the running sum up to the previous row is less than the requirement for all requirements. This means that you include all rows where the requirement has not been met, and the first row for which the requirement has been met or exceeded.
To do this you can subtract the current row's value from the running sum.
You could utilize a ROWS specification of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. But then you need to deal with NULL on the first row.
In any event, even a regular running sum should always use ROWS UNBOUNDED PRECEDING, because the default is RANGE UNBOUNDED PRECEDING, which is subtly different and can cause incorrect results, as well as being slower.
You can also factor out the distance calculation into a CROSS APPLY (VALUES, avoiding the need for lots of CTEs or derived tables. You now only need one level of derivation.
DECLARE #origin geography = geography::Point(#latQuery, #LongQuery, 4326);
SELECT
f.Id,
f.Lat,
f.Long,
f.Resource1,
f.Resource2
FROM (
SELECT f.*,
SumRes1 = SUM(f.Resource1) OVER (ORDER BY v1.Distance ROWS UNBOUNDED PRECEDING) - f.Resource1,
SumRes2 = SUM(f.Resource2) OVER (ORDER BY v1.Distance ROWS UNBOUNDED PRECEDING) - f.Resource2
FROM Facility f
CROSS APPLY (VALUES(
#origin.STDistance(geography::Point(f.Lat, f.Long, 4326))
)) v1(Distance)
) f
WHERE (
f.SumRes1 < #res1Query
OR f.SumRes2 < #res2Query
);
db<>fiddle
Was able to figure out the problem on my own here. The primary issue I was running into was that I was comparing 25 different columns' running totals versus the 25 stored proc parameters (quotas of resources required by the search).
Changing the lines such as these
(totRes1 <= #res1Query OR (totRes1 > #res1Query AND totRes1- res1 <= #totRes1)) AND --...
to
(totRes1 <= #res1Query OR (totRes1 > #res1Query AND totRes1- res1 <= #totRes1) OR #res1Query = 0) AND --...
(adding in the OR #res1Query = 0)solved my issue.
In other words, the search is often only for one or two columns (types of resources) - leaving others as zero. The way my logic was set up caused it to skip over lots of rows because it was instantly marking them as having met the quota (value less than or equal to the quota). like #A Neon Tetra suggested, was pretty close to it already.
Update:
First attempt didn't exactly fix my own issue. Posting the stripped down version of my code that is now working for me.
DECLARE #Lat AS DECIMAL(12,6)
DECLARE #Lon AS DECIMAL(12,6)
DECLARE #res1Query AS INT
DECLARE #res2Query AS INT
-- repeat for Resource 3 through 25, etc...
DECLARE #origin geography = geography::Point(#Lat, #Lon, 4326);
-- CTE to be able to expose distance
cte AS (SELECT TOP(99999) -- --> this is hacky, it won't let me order by distance unless I'm selecting TOP(x) or some other fn?
dbo.Facility.FacilityGUID,
dbo.Facility.Lat,
dbo.Facility.Lon,
#origin.STDistance(geography::Point(dbo.Facility.Lat, dbo.Facility.Lon, 4326))
AS distance,
dbo.Facility.Resource1 AS res1,
dbo.Facility.Resource2 AS res2,
-- repeat for Resource 3 through 25, etc...
FROM dbo.Facility
ORDER BY distance),
-- third CTE - has access to distance so we can keep track of a running total ordered by distance
---> have to separate into two since you can't reference the same alias (distance) again within the same SELECT
fullCTE AS (SELECT
FacilityID,
Lat,
Long,
distance,
res1,
SUM(res1) OVER (ORDER BY distance)AS totRes1,
res2,
SUM(res2) OVER (ORDER BY distance)AS totRes2,
-- repeat for Resource 3 through 25, etc...
FROM cte)
SELECT * -- Customize what you're pulling here for your output as needed
FROM dbo.Facility INNER JOIN fullCTE ON (fullCTE.FacilityID = dbo.Facility.FacilityID)
WHERE EXISTS
(SELECT
FacilityID
FROM fullCTE WHERE (
FacilityID = dbo.Facility.FacilityID AND
-- Keep pulling rows until all conditions are met, as opposed to pulling rows while they're under the quota
NOT (
((totRes1 - res1 >= #res1Query AND #res1Query <> 0) OR (#res1Query = 0)) AND
((totRes2 - res2 >= #res2Query AND #res2Query <> 0) OR (#res2Query = 0)) AND
-- repeat for Resource 3 through 25, etc...
)
)
)

DAX Moving Average Based on 4 Level Index

I have trouble calculating MA for products' prices. Data is in the following format:
Region has municipalities, products are sold there falling into certain categories by dates on a weekly basis, and not all weeks are filled due to seasonality.
I found a ranking formula and adjusted based on these 4 criteria. Here is the DAX expression for ranking (calculated column):
index2 =
RANKX (
FILTER (
_2017,
EARLIER ( _2017[RegionName] ) = _2017[RegionName] &&
EARLIER ( _2017[MunicipalityName] ) = _2017[MunicipalityName] &&
EARLIER (_2017[ProductCategoryName] )= _2017[ProductCategoryName] &&
EARLIER ( _2017[ProductName] ) = _2017[ProductName]
),
_2017[StartDateTime],
,
ASC
)
After a change in product name, the index resets. All is good till here. But when I try to add any kind of running total according to this index, it seems to calculate prices' sum for all products, giving the same result at index reset and so on for every product.
Here are some measures I've tried:
cummulatives =
VAR ind = MAX(_2017[index2])-3
VAR m1=
CALCULATE(
SUM(_2017[SellingPrice]),
FILTER(
ALL(_2017),
_2017[index2]<=ind))
VAR m2=
CALCULATE(
COUNT(_2017[SellingPrice]),
FILTER(
ALL(_2017),
_2017[index2]<=ind))
RETURN m2
Attached is an image of the table. Any help would be much appreciated. Thanks!
If you want the cumulative sum to reset with a new ProductName, then that has to be part of your filter context. You've removed that context using the ALL() function.
You can either put it back into the filter context or else not remove it in the first place. I would suggest the latter by using ALLEXCEPT(_2017, _2017[ProductName]) instead of ALL(_2017)

If Statements For Power Pivot

I'm trying to figure out how to calculate my compliance % measures based on if statements.
If [alias]=company A, then the percentage should equal 100%. If it does not, then it should calculate the total complying spend/total overall spend.
However, when I tried to set up the if statement it gives me an error and says that the single value for "alias" column cannot be determined.
I have tried If(Values) statements, but I need it to return more than one value.
Measures always aggregate. The question is what you want the compliance calculation to be when you're looking at 2 companies? 3 companies? Right now, neither your question nor your formula accounts for this possibility at all, hence the error.
If you're thinking "Compliance % doesn't make sense if you're looking at more than one company", then you can write your formula to show BLANK() if there's more than one company:
IF (
HASONEVALUE ( 'Waste Hauling Extract'[Alias] ),
IF (
VALUES ( 'Waste Hauling Extract'[Alias] ) = "company A",
[PCT-Compliant],
[PCT Non-compliant]
),
BLANK ()
)
If you want something else to happen when there's more than one company, then DAX functions like CALCULATE, SUMX or AVERAGEX would allow you to do what you want to do.
The trick with DAX generally is that the formula has to make sense not just on individual rows of a table (where Alias has a unique value), but also on subtotals and grand totals (where Alias does not have a unique value).
Based on your comment that any inclusion of company A results in 100%, you could do something such as:
IF (
ISBLANK (
CALCULATE (
COUNTROWS ( 'Waste Hauling Extract' ),
FILTER ( 'Waste Hauling Extract', 'Waste Hauling Extract'[Alias] = "company A" )
)
),
[PCT Non-compliant],
[PCT-Compliant]
)
The new CALCULATE statement filters the Waste Hauling Extract table to just company A rows, and then counts those rows. If there are no company A rows, then after the filter it will be an empty table and the row count will be blank (rather than 0). I check for this with ISBLANK() and then display either the Non-Compliant or Compliant number accordingly.
Note: the FILTER to just company A only applies to the CALCULATE statement; it doesn't impact the PCT measures at all.

DAX formula for - MAX of COUNT

I have the below dataset:
using the measure:
BalanceCount := COUNT(Balances[Balance])
which gives me the result:
However, I want the Grand Total to show the maximum amount of the BalanceCount, which is 2.
NewMeasure:=
MAXX(
SUMMARIZE(
FactTable
,FactTable[Account]
,FactTable[MonthEnd]
)
,[BalanceCount]
)
SUMMARIZE() groups by the columns specified, and MAXX() iterates through the table specified, returning the maximum of the expression in the second argument evaluated for each row in its input table.
Since the filter context will limit the rows of the fact table, we'll only have the appropriate subsets in each column/row grand total.
I found a solution that works for this particular case. It will not work if columns other than Account and MonthEnd are included in the filter context.
MaxBalanceCount:=
MAXX ( SUMMARIZE (
Balances,
Balances[Account],
Balances[MonthEnd]
),
CALCULATE ( COUNTROWS ( Balances ) )
)

MDX Range with non existing members

I have a MDX query where the string is parsed from a front-end application with forms. End users can restrict a query based on a free text field where they can input a range from invoice number and to invoice number. The query gets build based on this 2 parameters:
SELECT
{[Measures].[Amount]} ON COLUMNS,
NON EMPTY
(
(
[Invoices].[Invoice Number].[Invoice Number].[100000000]:[Invoices].[Invoice Number].[Invoice Number].[222222222])
) ON ROWS
FROM [MyCube]
However the range operator fails if an end user types in a non existing member. I think that I need to convert these fields somehow to a decimal number and than do the check with > and <.
I already have some ideas. However I am not able to get it to work. Here I try to just filter on numbers > 0 (If this works I can fill the the parameter for > and add one for <.
SELECT
{[Measures].[Amount]} ON COLUMNS,
(
FILTER(
[Invoices].[Invoice Number].[Invoice Number].members
, Cdec([Invoices].[Invoice Number].Currentmember.Properties("Key")) > 0
)
) ON ROWS
FROM [MyCybe]
However after 5 minutes I still have no response.. so cancelled the query.
I've calculated this way (it's for dates, but the idea is the same):
with
member [Measures].[RD_Key] as CDec([Report Date].[Report Date ID].Currentmember.Member_Key)
member [Measures].[ResultFilter] as [RD_Key]>20130801 and [RD_Key]<20131013
select {[Measures].[Count],[Measures].[RD_Key]} on 0
,Filter([Report Date].[Report Date ID].members,[Measures].[ResultFilter]) on 1
FROM [DATA]
If FILTER is slow maybe try using the HAVING clause. I'm assuming that no conversion of the Member_Key is needed as keys are usually numeric:
WITH
MEMBER [Measures].[Inv_Key] as
[Invoices].[Invoice Number].Currentmember.Member_Key
SELECT
{
[Measures].[Amount]
,[Measures].[Inv_Key]
} ON COLUMNS
,[Invoices].[Invoice Number].[Invoice Number].members
HAVING
[Measures].[Inv_Key] > 100000000
AND
[Measures].[Inv_Key] < 222222222
ON ROWS
FROM [DATA]