How to filter rows by the result of a calculation - sql

The code below creates a new column, "ZScore":
SELECT [Cardholder Name], [Debit Amount], ([Debit Amount] - AVG([Debit Amount]) OVER ()) / (STDEV([Debit Amount]) OVER ()) as [ZScore]
FROM ['Card Data']
ORDER BY [ZScore] DESC;
I only want to display the rows where ZScore is >= 3. I have tried the following, but every variation throws an error:
SELECT [Cardholder Name], [Debit Amount], ([Debit Amount] - AVG([Debit Amount]) OVER ()) / (STDEV([Debit Amount]) OVER ()) as [ZScore]
FROM ['PCard Output']
HAVING (([Debit Amount] - AVG([Debit Amount]) OVER ()) / (STDEV([Debit Amount]) OVER ())) > 3
ORDER BY [ZScore] DESC;
What would be the correct way to only display rows where the calculated z score is >= 3?

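Window functions can only appear in the SELECT and ORDER BY clauses, so they cannot be used directly in WHERE or HAVING. Compute the value in a subquery and filter on it in the outer query: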
SELECT cd.*
FROM (SELECT [Cardholder Name], [Debit Amount], ([Debit Amount] - AVG([Debit Amount]) OVER ()) / (STDEV([Debit Amount]) OVER ()) as [ZScore]
FROM ['Card Data']
) cd
WHERE ZScore >= 3
ORDER BY [ZScore] DESC;
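
As a rough, self-contained illustration of the same pattern, here is a minimal Python sketch using the standard-library sqlite3 module. The table card_data, its columns, and the sample rows are made up for the example. Because SQLite has no STDEV function, it filters on the plain deviation from the average rather than a true z-score. It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
rows = [("Ann", 10.0), ("Bob", 12.0), ("Cy", 11.0), ("Dee", 95.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card_data (cardholder TEXT, debit_amount REAL)")
conn.executemany("INSERT INTO card_data VALUES (?, ?)", rows)

# Using the window function directly in WHERE is rejected by the engine.
try:
    conn.execute(
        "SELECT cardholder FROM card_data "
        "WHERE debit_amount - AVG(debit_amount) OVER () >= 30"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Window functions are evaluated after WHERE and HAVING, so the calculated
# column is produced in an inner query and filtered in the outer one.
query = """
SELECT cd.cardholder, cd.debit_amount, cd.deviation
FROM (SELECT cardholder,
             debit_amount,
             debit_amount - AVG(debit_amount) OVER () AS deviation
      FROM card_data) AS cd
WHERE cd.deviation >= 30
ORDER BY cd.deviation DESC
"""
outliers = conn.execute(query).fetchall()
print(rejected, outliers)
```

The inner query sees every row, so the average is taken over the whole table; only the outer WHERE narrows the output.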

Related

Performance issue when converting MDX to DAX queries

I have an MDX query that I want to convert to DAX to improve performance. However, the result is not what I expected: the MDX query took 11 seconds to complete, while the DAX query took 34 seconds. Any suggestions for improving the DAX query?
MDX Query:
SELECT
{
[Measures].[Internet Total Sales]
} ON COLUMNS,
ORDER(
NONEMPTY
(
{
[Product].[Model Name].[Model Name].AllMembers *
[Product].[Product Line].[Product Line].AllMembers *
[Product].[Product Name].[Product Name].AllMembers *
[Customer].[First Name].[First Name].AllMembers *
[Customer].[Last Name].[Last Name].AllMembers
},
{
[Measures].[Internet Total Sales]
}
),
[Product].[Model Name].CurrentMember.MemberValue, ASC
) ON ROWS
FROM [Model]
DAX Query:
EVALUATE
CALCULATETABLE
(
FILTER
(
SUMMARIZE
(
CROSSJOIN('Product', 'Customer'),
[Model Name],
[Product Line],
[Product Name],
[First Name],
[Last Name],
"Internet Total Sales",
[Internet Total Sales]
),
NOT ISBLANK([Internet Total Sales])
)
)
ORDER BY [Model Name] ASC
Thank you.
EVALUATE
SUMMARIZECOLUMNS(
'Product'[Model Name],
'Product'[Product Line],
'Product'[Product Name],
'Customer'[First Name],
'Customer'[Last Name],
"Internet Total Sales", [Internet Total Sales]
)
ORDER BY 'Product'[Model Name]

Why is my SQL LEFT JOIN query not working?

I am trying to summarize two different tables and then join them in a single query, but I keep getting the error "Syntax error in JOIN operation".
Any advice would be much appreciated. Thanks in advance.
This is my code:
SELECT [Payment File - Q4].[Accident Number] AS [Accident Number],
Count([Payment File - Q4].[Accident Number]) AS [Q4 Count],
Sum([Payment File - Q4].[PI Amount]) AS [Q4 SumOfPI Amount],
Sum([Payment File - Q4].[PD Amount]) AS [Q4 SumOfPD Amount]
FROM [Payment File - Q4] AS [Q4]
LEFT JOIN (
SELECT [Payment File - Q2].[Accident Number] AS [Accident Number],
Count([Payment File - Q2].[Accident Number]) AS [Q2 Count],
Sum([Payment File - Q2].[PI Amount]) AS [Q2 SumOfPI Amount],
Sum([Payment File - Q2].[PD Amount]) AS [Q2 SumOfPD Amount]
FROM [Payment File - Q2]
WHERE ((([Payment File - Q2].[Input Date]) < #1/7/2019#))
GROUP BY [Payment File - Q2].[Accident Number]
) AS [Q2]
ON [Q2].[Accident Number] = [Q4].[Accident Number]
WHERE ((([Payment File - Q4].[Input Date]) < #1/7/2019#))
GROUP BY [Q4].[Accident Number];
I agree 100% with Gordon's comment. To get this working, you'll need to do the following:
Your subquery uses the aggregate functions Sum() and Count(), so it needs a GROUP BY clause. In other words, your subquery must be able to run and produce a result set all on its own. Without a GROUP BY clause it will simply error.
Your subquery needs an alias. You can't just reference the table name again in the main query, because that table is out of scope there (it can only be referenced inside the same query or subquery whose FROM clause contains it).
You will need to put your Q4 logic inside its own subquery. Otherwise you will be doing a many-to-one join. Both tables need to be aggregated at the [Accident Number] level BEFORE being joined; otherwise you risk artificially inflating your Sum() and Count() results, because they will be computed AFTER the join.
You will need to add the columns from your subquery to your main query; otherwise it's pointless to JOIN it in.
Consider this rewrite:
SELECT Q4.[Accident Number],
[Q4 Count],
[Q4 SumOfPI Amount],
[Q4 SumOfPD Amount],
[Q2 Count],
[Q2 SumOfPI Amount],
[Q2 SumOfPD Amount]
FROM
(
SELECT [Payment File - Q4].[Accident Number] AS [Accident Number],
Count([Payment File - Q4].[Accident Number]) AS [Q4 Count],
Sum([Payment File - Q4].[PI Amount]) AS [Q4 SumOfPI Amount],
Sum([Payment File - Q4].[PD Amount]) AS [Q4 SumOfPD Amount]
FROM [Payment File - Q4]
WHERE ((([Payment File - Q4].[Input Date])<#1/7/2019#))
GROUP BY [Payment File - Q4].[Accident Number]
) AS Q4
LEFT JOIN
(
SELECT [Payment File - Q2].[Accident Number] AS [Accident Number],
Count([Payment File - Q2].[Accident Number]) AS [Q2 Count],
Sum([Payment File - Q2].[PI Amount]) AS [Q2 SumOfPI Amount],
Sum([Payment File - Q2].[PD Amount]) AS [Q2 SumOfPD Amount]
FROM [Payment File - Q2]
GROUP BY [Payment File - Q2].[Accident Number]
) AS Q2
ON Q2.[Accident Number]=Q4.[Accident Number]
This gets all accident records from your Q4 table with an input date before January 7th, 2019, and sums the PI and PD amounts for each accident number. Next, it goes to the Q2 table, takes every distinct accident number, and sums its PI and PD amounts. It then looks up each Q4 accident number among the summed Q2 accident numbers. The result shows every summed Q4 accident number (input date before January 7th, 2019), along with the summed Q2 values for any accident number that matched.
That may or may not be what you were looking for.
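
To see why aggregating before joining matters, here is a small sketch in Python with the standard-library sqlite3 module. The tables q4 and q2, their columns, and the amounts are hypothetical stand-ins for the payment files, not the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE q4 (accident_number TEXT, pi_amount REAL);
CREATE TABLE q2 (accident_number TEXT, pi_amount REAL);
INSERT INTO q4 VALUES ('A1', 100), ('A1', 50);
INSERT INTO q2 VALUES ('A1', 10), ('A1', 20), ('A1', 30);
""")

# Joining the raw tables first: each of the 2 Q4 rows pairs with each of the
# 3 Q2 rows, so every Q4 amount is counted 3 times.
join_first = conn.execute("""
SELECT q4.accident_number, COUNT(*), SUM(q4.pi_amount)
FROM q4 LEFT JOIN q2 ON q2.accident_number = q4.accident_number
GROUP BY q4.accident_number
""").fetchall()

# Aggregating each table to one row per accident number before joining.
aggregate_first = conn.execute("""
SELECT a.accident_number, a.q4_count, a.q4_pi, b.q2_count, b.q2_pi
FROM (SELECT accident_number, COUNT(*) AS q4_count, SUM(pi_amount) AS q4_pi
      FROM q4 GROUP BY accident_number) AS a
LEFT JOIN
     (SELECT accident_number, COUNT(*) AS q2_count, SUM(pi_amount) AS q2_pi
      FROM q2 GROUP BY accident_number) AS b
  ON b.accident_number = a.accident_number
""").fetchall()

print(join_first)       # inflated count and sum
print(aggregate_first)  # one correct row per accident number
```

With the join done first, the Q4 total comes out as 450 over 6 rows instead of 150 over 2 rows, which is the inflation described above.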

Adding averages to SQL query

I am trying to create a query that shows the average sales between different countries, but I also want to add a 4th column that shows the total average for that country.
This is my query to select the averages between countries which works ok:
SELECT Avg([Sales]), [From Country], [To Country]
FROM [DB]
GROUP by [From Country],[To Country]
But I also want to add a 4th column that gives the total average sales for [From Country], can this be done?
You have to be careful here. You can use window functions, but to get an unbiased average you need to divide the total sum by the total count:
SELECT [From Country], [To Country], Avg([Sales]),
SUM(SUM(Sales)) OVER (PARTITION BY [From Country]) / SUM(COUNT(*)) OVER (PARTITION BY [From Country])
FROM [DB]
GROUP by [From Country], [To Country];
Note that the results are different from:
AVG(AVG(SALES)) OVER (PARTITION BY [From Country])
This is an average of averages: it gives each [To Country] the same weight regardless of how many sales rows it has, so it is biased relative to the true per-row average.
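
A small worked example may make the difference concrete. The sketch below uses Python's standard-library sqlite3 module with a made-up sales table (the names and figures are assumptions for illustration). For portability it aggregates in a derived table first and applies the window functions in the outer query, rather than nesting SUM(SUM(...)) as above; the arithmetic is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (from_country TEXT, to_country TEXT, amount REAL)"
)
# Hypothetical data: three sales from A to B, one sale from A to C.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "B", 10.0), ("A", "B", 20.0), ("A", "B", 30.0), ("A", "C", 100.0)],
)

query = """
SELECT from_country, to_country, pair_avg,
       -- total sum divided by total row count: the per-row average
       SUM(pair_sum) OVER (PARTITION BY from_country)
         / SUM(pair_count) OVER (PARTITION BY from_country) AS overall_avg,
       -- average of the per-pair averages: each to_country counts once
       AVG(pair_avg) OVER (PARTITION BY from_country) AS avg_of_avgs
FROM (SELECT from_country, to_country,
             AVG(amount) AS pair_avg,
             SUM(amount) AS pair_sum,
             COUNT(*)    AS pair_count
      FROM sales
      GROUP BY from_country, to_country) AS g
ORDER BY to_country
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Here the four sales from country A total 160, so the per-row average is 40, while the average of the two pair averages (20 and 100) is 60.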
You could also make use of a ROLLUP:
SELECT [From Country], [To Country], [Avg], [Avg From]
FROM
(
SELECT [From Country], [To Country]
, AVG([Sales]) AS [Avg]
, MAX(CASE WHEN GROUPING_ID([From Country], [To Country]) = 1 THEN AVG([Sales]) END) OVER (PARTITION BY [From Country]) AS [Avg From]
, GROUPING_ID([From Country], [To Country]) AS GroupingId
FROM [DB]
GROUP BY [From Country], [To Country]
WITH ROLLUP
) q
WHERE GroupingId = 0;

MDX How do you create a variance and variance % in a report

Using AdventureWorksDW2008R, I have the following DataSet:
SELECT NON EMPTY {
[Measures].[Sales Amount], [Measures].[Total Product Cost], [Measures].[Internet Sales Count]
} ON COLUMNS, NON EMPTY
{
([Order Date].[Calendar Year].[Calendar Year].ALLMEMBERS )
} DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS
FROM [Adventure Works Cube]
Results are:
Calendar Year | Sales Amount | Total Product Cost | Internet Sales Count
2005 | 4342674.0296 | 2562584.6235 | 8949
2008 | 25016003.1911002 | 14715208.9522001 | 51449
Is there a way to calculate the variance of each in the report?
For example the Variance of Internet Sales Count would be:
51449 – 8949 = 42500
And the % variance would be
42500/51449 = 83%
I know I can use the following to get the Sum:
=Sum(Fields!Internet_Sales_Count.Value, "DataSet1")
Is there a way to get the 2008 value and subtract the 2005 value?
Here is one possibility:
WITH
MEMBER [Measures].[Internet Sales diff] AS
(
[Delivery Date].[Calendar Year].CurrentMember
,[Measures].[Internet Sales Amount]
)
-
(
[Delivery Date].[Calendar Year].CurrentMember.Lag(1)
,[Measures].[Internet Sales Amount]
), format_string = '#,###,###,##0.00'
SELECT
NON EMPTY
{
[Measures].[Sales Amount]
,[Measures].[Total Product Cost]
,[Measures].[Internet Sales Amount]
,[Measures].[Internet Sales diff]
} ON COLUMNS
,NON EMPTY
{[Delivery Date].[Calendar Year].[Calendar Year].ALLMEMBERS}
DIMENSION PROPERTIES
MEMBER_CAPTION
,MEMBER_UNIQUE_NAME
ON ROWS
FROM [Adventure Works];
A percentage measure could then be added like this:
WITH
MEMBER [Measures].[Internet Sales diff] AS
(
[Delivery Date].[Calendar Year].CurrentMember
,[Measures].[Internet Sales Amount]
)
-
(
[Delivery Date].[Calendar Year].CurrentMember.Lag(1)
,[Measures].[Internet Sales Amount]
)
,format_string = '#,###,###,##0.00'
MEMBER [Measures].[Internet Sales diff %] AS
IIF
(
[Measures].[Internet Sales Amount] = 0
,null
,
[Measures].[Internet Sales diff]
/
(
[Delivery Date].[Calendar Year].CurrentMember.Lag(1)
,[Measures].[Internet Sales Amount]
)
)
,format_string = '#,###,###,##0.00%'
SELECT
NON EMPTY
{
[Measures].[Sales Amount]
,[Measures].[Total Product Cost]
,[Measures].[Internet Sales Amount]
,[Measures].[Internet Sales diff]
,[Measures].[Internet Sales diff %]
} ON COLUMNS
,NON EMPTY
{[Delivery Date].[Calendar Year].[Calendar Year].ALLMEMBERS}
DIMENSION PROPERTIES
MEMBER_CAPTION
,MEMBER_UNIQUE_NAME
ON ROWS
FROM [Adventure Works];
Here is a better approach using the ParallelPeriod function:
WITH
MEMBER [Measures].[Internet Sales PrevYr] AS
IIF
(
[Measures].[Internet Sales Amount] = 0
,null
,(
[Measures].[Internet Sales Amount]
,ParallelPeriod
(
[Delivery Date].[Calendar Year].[Calendar Year]
,1
,[Delivery Date].[Calendar Year].CurrentMember
)
)
)
,format_string = '$#,###,###,##0.00'
MEMBER [Measures].[Internet Sales diff] AS
IIF
(
[Measures].[Internet Sales Amount] = 0
,null
,
[Measures].[Internet Sales Amount] - [Measures].[Internet Sales PrevYr]
)
,format_string = '$#,###,###,##0.00'
MEMBER [Measures].[Internet Sales diff %] AS
IIF
(
[Measures].[Internet Sales PrevYr] = 0
,null
,
[Measures].[Internet Sales diff] / [Measures].[Internet Sales PrevYr]
)
,format_string = '#,###,###,##0.00%'
SELECT
NON EMPTY
{
[Measures].[Internet Sales Amount]
,[Measures].[Internet Sales PrevYr]
,[Measures].[Internet Sales diff]
,[Measures].[Internet Sales diff %]
} ON COLUMNS
,NON EMPTY
{[Delivery Date].[Calendar Year].[Calendar Year].MEMBERS} ON ROWS
FROM [Adventure Works];
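
One thing to watch: the measures above divide the difference by the previous year's value, whereas the question divides it by the current year's value. The short Python sketch below works through both conventions using the two Internet Sales Count figures from the question. Treating the two rows as consecutive periods is an assumption made purely for illustration.

```python
# Values from the question's result set, keyed by calendar year.
internet_sales_count = {2005: 8949, 2008: 51449}

years = sorted(internet_sales_count)
previous, current = (internet_sales_count[y] for y in years)

diff = current - previous

# Convention used in the question: difference relative to the current value.
pct_of_current = diff / current

# Convention used by the calculated measures: difference relative to the
# previous period's value.
pct_of_previous = diff / previous

print(diff)
print(f"{pct_of_current:.0%} of current, {pct_of_previous:.0%} of previous")
```

Whichever convention the report needs, the divisor is the only thing that changes in the percentage measure.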

MDX, first SUM then AVERAGE

I have a fact model that contains the following data:
WorkOrderNumber | WorkOrderLineNumber | Cost
So I have a dimension WorkOrder:
[WorkOrder].[WorkOrderNumber]
[WorkOrder].[WorkOrderLineNumber]
And a Measure group with the following Measure:
[Measures].[Cost]
I am trying to create a calculated measure:
[Measures].[Average WorkOrder Cost]
It must be calculated by first summing the cost per work order and then taking the average of those per-work-order sums.
However, I cannot get it working:
CASE WHEN [WorkOrder].[WorkOrder].CurrentMember = [WorkOrder].[WorkOrder].[All]
THEN
/* the work order is not selected -> AVG*/
DIVIDE(SUM(),[Measures].[Cost]), Count()) ))
ELSE
/* the Work Order is selected -> SUM*/
SUM([Measures].[Cost])
END
Here is an example of taking an average per sales order using the Adventure Works cube. You could substitute your own [Cost] measure and [WorkOrderNumber] attribute:
with member [CalculatedAvg] as
AVG([Internet Sales Order Details].[Sales Order Number].[Sales Order Number], [Measures].[Internet Sales Amount])
, FORMAT_STRING = "$#,##0.00"
select
{
[Measures].[Internet Sales Amount],
[Measures].[Internet Order Count],
[Measures].[Internet Average Sales Amount],
[CalculatedAvg]
} on 0,
[Date].[Calendar].[Calendar Year].members on 1
from
[Direct Sales]
Adapted to your cube, the calculated measure would be something like:
AVG([WorkOrder].[WorkOrderNumber].[WorkOrderNumber], [Measures].[Cost])
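
To check what such a measure should return, the same "sum per work order, then average the sums" logic can be sketched in plain Python. The work order numbers and costs below are invented sample rows, not data from the cube.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact rows: (WorkOrderNumber, WorkOrderLineNumber, Cost).
lines = [
    ("WO1", 1, 100.0),
    ("WO1", 2, 50.0),
    ("WO2", 1, 30.0),
    ("WO3", 1, 20.0),
    ("WO3", 2, 40.0),
]

# Step 1: sum the cost per work order.
totals = defaultdict(float)
for work_order, _line_number, cost in lines:
    totals[work_order] += cost

# Step 2: average those per-work-order totals.
avg_per_work_order = mean(totals.values())

# For contrast: a plain average over the lines gives a different number.
avg_per_line = mean(cost for _, _, cost in lines)

print(dict(totals), avg_per_work_order, avg_per_line)
```

The per-work-order average here is 80 (totals of 150, 30, and 60), while averaging the five lines directly gives 48, which is why the set passed to AVG has to be at the work order level.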