I'm trying to write a DAX calculated measure which uses nested aggregates to perform a calculation based on the user specified query context, and I'm having trouble figuring out how to apply the query context to the inner aggregation as well as the outer. The simplified structure of my tabular data is below, where each Sale record represents the sale of a single widget by one user to another for the given sale price:
Schema
And some sample data:
Data
The calculated measure query itself is below, which is essentially attempting to calculate on average how good of a bargain each buyer gets for their purchases. It does this by performing the following calculations:
1) Inner 'Calculate': For each Sale record, Calculate the average Price the given Seller sells their widgets for (potentially filtered).
2) Outer 'Calculate': For each Sale record, Calculate the average of the Price minus the amount calculated in #1, essentially giving a differential in actual vs expected sale amounts.
Avg Actual/Expected Differential :=
CALCULATE (
AVERAGEX (
Sale,
Sale[Price]
- CALCULATE ( AVERAGEX ( Sale, Sale[Price] ), ALLEXCEPT ( Sale, Sale[Seller] ) )
)
)
This formula works in the standard case, where the only query filters applied are User-related ones. For example, the actual vs expected sales to Dale are:
Sale from Larry for $2, where Larry's average sale is $3.5. Differential = -$1.5
Sale from Bob for $5, where Bob's average sale is $3. Differential = $2
Sale from John for $4, where John's average sale is $4. Differential = $0
Thus, the Avg Actual/Expected Differential is $.5 / 3 = $.17.
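To double-check the arithmetic in the Dale example, here is a small Python sketch. It only uses the three prices and seller averages quoted above; the variable names are made up for illustration.

```python
# Dale's purchases from the worked example:
# (seller, price paid, that seller's average sale price)
dale_purchases = [
    ("Larry", 2.0, 3.5),
    ("Bob", 5.0, 3.0),
    ("John", 4.0, 4.0),
]

# Differential per sale = actual price - seller's average price
differentials = [price - seller_avg for _, price, seller_avg in dale_purchases]
avg_differential = sum(differentials) / len(differentials)

print(differentials)               # [-1.5, 2.0, 0.0]
print(round(avg_differential, 2))  # 0.17
```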
The problem I'm running into is applying a query filter on the IsCashSale field and having that filter apply to both the inner and outer Calculate functions.
For example, if I want both the inner and outer calculations to include only Sale records where IsCashSale is True, I can create the appropriate filter in the UI, and the outer Calculate function is filtered correctly. However, when the inner Calculate runs, ALLEXCEPT removes the filters from every column except Seller, so the IsCashSale filter is lost.
I tried adding Sale[IsCashSale] to the ALLEXCEPT column list. However, because of the current row context, the inner calculation then only includes records that have the same IsCashSale value as the record currently being iterated. This gives incorrect results when no filter has been applied to IsCashSale.
I believe I may be looking for a way to pass the selected Values from the outer Calculate to the inner one? If I can do that, I can include those values as filters on the inner Calculate. Is there a way to use something like the Earlier or AllSelected functions to do this? Or is there another way to do what I'm attempting to do? Thanks!
How about using the ISFILTERED function?
Avg Actual/Expected Differential 2:=IF(ISFILTERED(Sale[IsCashSale]),CALCULATE (
AVERAGEX (
Sale,
Sale[Price]
- CALCULATE ( AVERAGEX ( Sale, Sale[Price] ), ALLEXCEPT ( Sale, Sale[Seller], Sale[IsCashSale] ) )
)
),
CALCULATE (
AVERAGEX (
Sale,
Sale[Price]
- CALCULATE ( AVERAGEX ( Sale, Sale[Price] ), ALLEXCEPT ( Sale, Sale[Seller]))
)
)
)
With this measure you get the results below when the IsCashSale column is filtered; when it is not filtered, the measure behaves the same way as the original.
Are these the results that you wanted?
+----+--------+-------+------------+--------------+------------------------------------+
| Id | Seller | Buyer | IsCashSale | Sum of Price | Avg Actual/Expected Differential 2 |
+----+--------+-------+------------+--------------+------------------------------------+
| 1  | Bob    | John  | TRUE       | 1            | 0                                  |
| 2  | John   | Bob   | TRUE       | 2            | -1                                 |
| 3  | Dale   | Bob   | TRUE       | 1            | -0.5                               |
| 8  | Sue    | Bob   | TRUE       | 3            | 0                                  |
| 10 | John   | Dale  | TRUE       | 4            | 1                                  |
| 13 | Dale   | Kelly | TRUE       | 2            | 0.5                                |
+----+--------+-------+------------+--------------+------------------------------------+
Ok, I think I've found a solution, but it's a bit convoluted. It requires breaking out the sum of the individual purchases by the buyer from the average sales for each seller. I believe the big difference was that by using the VALUES(Sale[Seller]) set definition rather than just Sale it allows us to keep whatever filters are in place on the IsCashSale field in the outer Calculate function:
Avg Actual/Expected Differential:=CALCULATE(
(
SUM(Sale[Price]) --Sum of Buyer purchases
- CALCULATE (
SUMX(
VALUES(Sale[Seller]) --Calculate for each unique Seller
,CALCULATE(COUNTROWS(Sale)) --Need to multiply by number of purchases from this particular Seller
* CALCULATE ( --Get Seller's average sale price
AVERAGEX (Sale, Sale[Price])
,ALLEXCEPT(Sale, Sale[Seller], Sale[IsCashSale])
)
)
)
)
/ COUNTROWS(Sale) --Divide by total number of sales to get average
)
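The measure above relies on the fact that the average of (price minus seller average) equals (sum of prices minus the sum over sellers of purchase count times seller average) divided by the number of purchases. Here is a rough Python sketch of that identity, not of the DAX engine itself. The rows are hypothetical (the original sample data is not reproduced here), and the function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (seller, buyer, price, is_cash_sale).
sales = [
    ("Larry", "Dale", 2.0, True),
    ("Larry", "Sue", 5.0, False),
    ("Bob", "Dale", 5.0, True),
    ("Bob", "John", 1.0, True),
    ("John", "Dale", 4.0, False),
    ("John", "Bob", 2.0, True),
]

def seller_averages(rows):
    """Average price per seller over the given (already filtered) rows."""
    totals = defaultdict(lambda: [0.0, 0])
    for seller, _, price, _ in rows:
        totals[seller][0] += price
        totals[seller][1] += 1
    return {seller: total / n for seller, (total, n) in totals.items()}

def row_by_row(rows, buyer):
    """Average of (price - seller average) over the buyer's sale rows."""
    avg = seller_averages(rows)
    diffs = [price - avg[seller] for seller, b, price, _ in rows if b == buyer]
    return sum(diffs) / len(diffs)

def decomposed(rows, buyer):
    """(sum of prices - sum over sellers of count * average) / row count."""
    avg = seller_averages(rows)
    bought = [(seller, price) for seller, b, price, _ in rows if b == buyer]
    counts = defaultdict(int)
    for seller, _ in bought:
        counts[seller] += 1
    expected = sum(n * avg[seller] for seller, n in counts.items())
    return (sum(price for _, price in bought) - expected) / len(bought)

# The cash filter is applied before both steps, mirroring a query filter
# that reaches the inner and the outer calculation alike.
cash_only = [row for row in sales if row[3]]

print(row_by_row(sales, "Dale"), decomposed(sales, "Dale"))          # 0.5 0.5
print(row_by_row(cash_only, "Dale"), decomposed(cash_only, "Dale"))  # 1.0 1.0
```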
I am performing some data analysis on users who have made transactions over the course of three months.
What I would like to do is identify customers who made a specific transaction type (Credit) in every single month present in the data table over that period. As you can see in the data table below, User A has performed a Credit transaction in months 1, 2 and 3, and I would like a flag saying "Frequent" applied to that customer.
User B, however, has not performed a credit transaction every month (month 2 was Debit), and so I would like them to have a different flag name (e.g. "Infrequent").
How can I use SQL to identify if a user has made a specific transaction type each month?
| Date       | User | Amount | Transaction Type | Flag |
| 2022-01-15 | A    | $15.00 | Credit           | Flag |
...
| 2022-02-15 | A    | $15.00 | Credit           | Flag |
...
| 2022-03-15 | A    | $15.00 | Credit           | Flag |
...
...
| 2022-01-15 | B    | $15.00 | Credit           | Flag |
...
| 2022-02-15 | B    | $15.00 | Debit            | Flag |
...
| 2022-03-15 | B    | $15.00 | Credit           | Flag |
I have tried the following - hoping there is a better or more simple way.
SELECT
Date, User, Amount, Transaction_Type,
CASE WHEN Count(present) = 3 THEN 'Frequent' ELSE 'Infrequent' END AS Flag
FROM Transactions
LEFT JOIN (
SELECT
User,Month(Date),Count(Transaction_Type) as present
FROM
Transactions
WHERE
Transaction_Type = 'Credit'
GROUP BY
User,Month(Date)
Having
Count(Transaction_Type) > 0
) subquery
ON subquery.User = Transactions.User
GROUP BY
Date,User,Amount,Transaction_Type
That is the way I would approach it. Assuming you are using T-SQL, I would make the following changes. Instead of LEFT JOINing to a sub-query, I would make the sub-query a CTE and then join to that. I find it easier to grok when the main query is not full of sub-queries, and you can test the CTE on its own more easily. Plus, if performance becomes an issue, it is relatively trivial to convert the CTE to a temp table without affecting the main query too much.
You have a couple of problems, I think. The first is that your subquery returns the count of the credits in each month. If I make 3 credits in January, this is going to flag me as frequent because the total reaches 3. You probably want to do a
COUNT(DISTINCT Transaction_type) AS hasCredit
to identify if there is AT LEAST ONE credit transaction, then have another aggregation that
SUM(hasCredit)
to get the number of months in which a credit appears.
Using nested sub-queries means your LEFT JOIN would now be two sub-queries deep and disappearing off the right-hand side of your screen. Writing them as CTEs keeps the main logic clean and the script narrow.
I think this does what you need, but can't test it because I don't have any sample data.
WITH CTE_HasCredit AS
(
SELECT
User
,Month(Date) AS [TransactionMonth]
,Count(DISTINCT Transaction_Type) AS [hasCredit]
FROM
Transactions
WHERE
Transaction_Type = 'Credit'
GROUP BY
User
,Month(Date)
Having
Count(Transaction_Type) > 0
)
,
CTE_isFrequent AS
(
SELECT
User
,SUM(hasCredit) AS [TotalCredits]
FROM
CTE_HasCredit
GROUP BY
User
)
SELECT
TXN.Date
, TXN.User
, TXN.Amount
, TXN.Transaction_Type
,CASE
WHEN FRQ.TotalCredits >= 3 THEN 'Frequent'
ELSE 'Infrequent'
END AS [customerType]
FROM
Transactions AS TXN
LEFT JOIN
CTE_isFrequent AS FRQ ON FRQ.User = TXN.User
GROUP BY
TXN.Date
,TXN.User
,TXN.Amount
,TXN.Transaction_Type
I don't think you need the GROUP BY on the main query either; it would de-dupe transactions for the same day for the same amount.
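You might also want to look at window functions with an OVER() clause. These would allow you to do the calculation in the main query. Note that SQL Server does not allow COUNT(DISTINCT ...) together with OVER(), so the number of distinct months has to be derived another way, for example by adding an ascending and a descending DENSE_RANK(). It would look something like:
,CASE
WHEN DENSE_RANK() OVER(PARTITION BY TXN.User, TXN.Transaction_Type ORDER BY MONTH(TXN.Date))
+ DENSE_RANK() OVER(PARTITION BY TXN.User, TXN.Transaction_Type ORDER BY MONTH(TXN.Date) DESC) - 1 >= 3 THEN 'Frequent'
ELSE 'Infrequent'
END AS [customerType2]
This second way would give you a customer type for both the Debits and the Credits, because each row is classified by the months seen for its own transaction type. To classify on Credits alone, you would need to use the CTE method.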
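Since the answer above could not be tested, here is a small runnable sketch of the CTE idea, using Python's built-in sqlite3 module rather than SQL Server. It is a variant, not the exact query above: it counts distinct credit months per user and compares that with the number of months present in the table instead of a hard-coded 3. The rows are the six shown in the question; strftime stands in for MONTH(), and the CTE names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (Date TEXT, User TEXT, Amount REAL, Transaction_Type TEXT)"
)
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?, ?)",
    [
        ("2022-01-15", "A", 15.0, "Credit"),
        ("2022-02-15", "A", 15.0, "Credit"),
        ("2022-03-15", "A", 15.0, "Credit"),
        ("2022-01-15", "B", 15.0, "Credit"),
        ("2022-02-15", "B", 15.0, "Debit"),
        ("2022-03-15", "B", 15.0, "Credit"),
    ],
)

query = """
WITH CreditMonths AS (
    -- number of distinct year-months in which each user has a Credit
    SELECT User, COUNT(DISTINCT strftime('%Y-%m', Date)) AS months_with_credit
    FROM Transactions
    WHERE Transaction_Type = 'Credit'
    GROUP BY User
),
AllMonths AS (
    -- number of distinct year-months present in the table
    SELECT COUNT(DISTINCT strftime('%Y-%m', Date)) AS total_months
    FROM Transactions
)
SELECT T.Date, T.User, T.Amount, T.Transaction_Type,
       CASE WHEN C.months_with_credit = M.total_months
            THEN 'Frequent' ELSE 'Infrequent' END AS customerType
FROM Transactions AS T
CROSS JOIN AllMonths AS M
LEFT JOIN CreditMonths AS C ON C.User = T.User
ORDER BY T.User, T.Date
"""
rows = conn.execute(query).fetchall()
flags = {user: flag for _, user, _, _, flag in rows}
print(flags)  # {'A': 'Frequent', 'B': 'Infrequent'}
```

A user with no credits at all gets a NULL from the LEFT JOIN and so falls through to 'Infrequent'.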
Say I have 3 tables in a rails app:
invoices
id | customer_id | employee_id | notes
---------------------------------------------------------------
1 | 1 | 5 | An order with 2 items.
2 | 12 | 5 | An order with 1 item.
3 | 17 | 12 | An empty order.
4 | 17 | 12 | A brand new order.
invoice_items
id | invoice_id | price | name
---------------------------------------------------------
1 | 1 | 5.35 | widget
2 | 1 | 7.25 | thingy
3 | 2 | 1.25 | smaller thingy
4 | 2 | 1.25 | another smaller thingy
invoice_payments
id | invoice_id | amount | method | notes
---------------------------------------------------------
1 | 1 | 4.85 | credit card | Not enough
2 | 1 | 1.25 | credit card | Still not enough
3 | 2 | 1.25 | check | Paid in full
This represents 4 orders:
The first has 2 items, for a total of 12.60. It has two payments, for a total paid amount of 6.10. This order is partially paid.
The second has only one item, and one payment, both totaling 1.25. This order is paid in full.
The third order has no items or payments. This is important to us, as we sometimes use this case. It is considered paid in full as well.
The final order has one item again, for a total of 1.25, but no payments as of yet.
Now I need a query:
Show me all orders that are not paid in full yet; that is, all orders such that the total of the items is greater than the total of the payments.
I can do it in pure sql:
SELECT invoices.*,
invoice_payment_amounts.amount_paid AS amount_paid,
invoice_item_amounts.total_amount AS total_amount
FROM invoices
LEFT JOIN (
SELECT invoices.id AS invoice_id,
COALESCE(SUM(invoice_payments.amount), 0) AS amount_paid
FROM invoices
LEFT JOIN invoice_payments
ON invoices.id = invoice_payments.invoice_id
GROUP BY invoices.id
) AS invoice_payment_amounts
ON invoices.id = invoice_payment_amounts.invoice_id
LEFT JOIN (
SELECT invoices.id AS invoice_id,
COALESCE(SUM(invoice_items.price), 0) AS total_amount
FROM invoices
LEFT JOIN invoice_items
ON invoices.id = invoice_items.invoice_id
GROUP BY invoices.id
) AS invoice_item_amounts
ON invoices.id = invoice_item_amounts.invoice_id
WHERE amount_paid < total_amount
But now I need to get that into Rails (probably as a scope). I can use find_by_sql, but that returns an array rather than an ActiveRecord::Relation, which is not what I need, since I want to chain it with other scopes (there is, for example, an overdue scope, which uses this).
So raw SQL probably isn't the right way to go here, but what is? I've not been able to do this in ActiveRecord's query language.
The closest I've gotten so far was this:
Invoice.select('invoices.*, SUM(invoice_items.price) AS total, SUM(invoice_payments.amount) AS amount_paid').
joins(:invoice_payments, :invoice_items).
group('invoices.id').
where('amount_paid < total')
But that fails, since on orders like #1, with multiple payments, it incorrectly doubles the price of the order (due to multiple joins), showing it as still unpaid. I had the same problem in SQL, which is why I structured it in the way I did.
Any thoughts here?
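To make the double-counting concrete, here is a small sketch using Python's built-in sqlite3 module with the rows from the tables above. Prices are stored as integer cents (an assumption, to avoid floating-point noise), and the column names price_cents and amount_cents are made up for this example. It first shows the inflated sums from joining both child tables at once, then the per-table pre-aggregation that the pure SQL version uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY);
CREATE TABLE invoice_items (id INTEGER PRIMARY KEY, invoice_id INTEGER, price_cents INTEGER);
CREATE TABLE invoice_payments (id INTEGER PRIMARY KEY, invoice_id INTEGER, amount_cents INTEGER);
INSERT INTO invoices VALUES (1), (2), (3), (4);
INSERT INTO invoice_items VALUES (1, 1, 535), (2, 1, 725), (3, 2, 125), (4, 2, 125);
INSERT INTO invoice_payments VALUES (1, 1, 485), (2, 1, 125), (3, 2, 125);
""")

# Joining both child tables at once pairs every item row with every payment row.
fan_out = conn.execute("""
    SELECT SUM(i.price_cents), SUM(p.amount_cents)
    FROM invoices inv
    JOIN invoice_items i ON i.invoice_id = inv.id
    JOIN invoice_payments p ON p.invoice_id = inv.id
    WHERE inv.id = 1
""").fetchone()
print(fan_out)  # (2520, 1220): both sums are doubled for invoice 1

# Aggregating each child table on its own first keeps one row per invoice.
unpaid = conn.execute("""
    SELECT inv.id, COALESCE(it.total, 0), COALESCE(pay.paid, 0)
    FROM invoices inv
    LEFT JOIN (SELECT invoice_id, SUM(price_cents) AS total
               FROM invoice_items GROUP BY invoice_id) it ON it.invoice_id = inv.id
    LEFT JOIN (SELECT invoice_id, SUM(amount_cents) AS paid
               FROM invoice_payments GROUP BY invoice_id) pay ON pay.invoice_id = inv.id
    WHERE COALESCE(it.total, 0) > COALESCE(pay.paid, 0)
    ORDER BY inv.id
""").fetchall()
print(unpaid)  # [(1, 1260, 610), (2, 250, 125)]
```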
You can get your results using the GROUP BY and HAVING clauses of MySQL:
Pure MySQL Query:
SELECT `invoices`.* FROM `invoices`
INNER JOIN `invoice_items` ON
`invoice_items`.`invoice_id` = `invoices`.`id`
INNER JOIN `invoice_payments` ON
`invoice_payments`.`invoice_id` = `invoices`.`id`
GROUP BY invoices.id
HAVING sum(invoice_items.price) > sum(invoice_payments.amount)
ActiveRecord Query:
Invoice.joins(:invoice_items, :invoice_payments).group("invoices.id").having("sum(invoice_items.price) > sum(invoice_payments.amount)")
When building more complex queries in Rails, Arel ("Arel Really Exasperates Logicians") usually comes in handy.
Arel is a SQL AST manager for Ruby. It
simplifies the generation of complex SQL queries, and
adapts to various RDBMSes.
Here is a sample of what the Arel implementation would look like, based on the requirements:
invoice_table = Invoice.arel_table
# Define invoice_payment_amounts
payment_arel_table = InvoicePayment.arel_table
invoice_payment_amounts = Arel::Table.new(:invoice_payment_amounts)
payment_cte = Arel::Nodes::As.new(
invoice_payment_amounts,
payment_arel_table
.project(payment_arel_table[:invoice_id],
payment_arel_table[:amount].sum.as("amount_paid"))
.group(payment_arel_table[:invoice_id])
)
# Define invoice_item_amounts
item_arel_table = InvoiceItem.arel_table
invoice_item_amounts = Arel::Table.new(:invoice_item_amounts)
item_cte = Arel::Nodes::As.new(
invoice_item_amounts,
item_arel_table
.project(item_arel_table[:invoice_id],
item_arel_table[:price].sum.as("total"))
.group(item_arel_table[:invoice_id])
)
# Define main query
query = invoice_table
.project(
invoice_table[Arel.sql('*')],
invoice_payment_amounts[:amount_paid],
invoice_item_amounts[:total]
)
.join(invoice_payment_amounts).on(
invoice_table[:id].eq(invoice_payment_amounts[:invoice_id])
)
.join(invoice_item_amounts).on(
invoice_table[:id].eq(invoice_item_amounts[:invoice_id])
)
.where(invoice_item_amounts[:total].gt(invoice_payment_amounts[:amount_paid]))
.with(payment_cte, item_cte)
res = Invoice.find_by_sql(query.to_sql)
res.each do |r|
puts "---- Invoice #{r.id} -----"
p r
puts "total: #{r[:total]}"
puts "amount_paid: #{r[:amount_paid]}"
puts "----"
end
This will return the same output as your SQL query, using the sample data provided in the question.
Output:
<Invoice id: 2, notes: "An order with 1 items.", created_at: "2017-12-18 21:15:47", updated_at: "2017-12-18 21:15:47">
total: 2.5
amount_paid: 1.25
----
---- Invoice 1 -----
<Invoice id: 1, notes: "An order with 2 items.", created_at: "2017-12-18 21:15:47", updated_at: "2017-12-18 21:15:47">
total: 12.6
amount_paid: 6.1
----
Arel is quite flexible so you can use this as a base and refine the query conditions based on more specific requirements you might have.
I would strongly recommend that you consider creating cache columns (total, amount_paid) in the invoices table and maintaining them, so you can avoid this complex query. At the very least, the additional total column would be quite simple to create and populate.
I am using SQL Server 2014.
My table is called TF and this is what I have so far.
+-----------+------------+--------+--------------+
| IdProduct | Month | Sales | Accumulation |
+-----------+------------+--------+--------------+
| DSN101 | 01/01/2014 | 100 | ((1)) |
| DSN101 | 01/02/2014 | 50 | 50 |
| DSN101 | 01/03/2014 | 250 | 300 |
+-----------+------------+--------+--------------+
IdProduct is a string
Month is a Date
Sales and accumulation are float
The accumulation column was initially null and what I did next didn't work so I put the default value to 1.
This is how I update the table and fill it :
GO
MERGE INTO dbo.TF as A
USING dbo.TF as P
ON (A.IdProduct = P.IdProduct and MONTH(P.Month)=MONTH(A.Month)-1 and YEAR(P.Month)=YEAR(A.Month))
WHEN MATCHED THEN
UPDATE SET A.Accumulation = CASE
WHEN P.Accumulation Is not null then P.Accumulation+A.Sales-1
WHEN MONTH(A.Month)=1 and not exists(select P.Sales) then A.Sales
END;
So at first none of this would work obviously because of the first null that leads to a second null and then to third..
Now the first case works fine, the second doesn't and I just don't get why.
I tried many combinations with no success. What I need is that for the first month in every year the accumulation column equals simply the sales column.
I understand my code makes every line look for the previous one but I don't know how to make it stop when it's January.
Please help me !
It seems that you want Accumulation to contain the cumulative sum of the values in the Sales column, per product. You don't need to store this in the table; you can just get it from a query:
select tf.*, sum(sales) over (partition by idproduct order by month) - sales
from tf;
The - sales is because the accumulation appears to not include the current month's sales. If you only want it for the current year (which the merge statement suggests), then add the year to the partition by:
select tf.*, sum(sales) over (partition by idproduct, year(month) order by month) - sales
from tf;
And, if you really want to include this in the table, CTEs like this one are updatable:
with toupdate as (
select tf.*, sum(sales) over (partition by idproduct order by month) - sales as newval
from tf
)
update toupdate
set accumulation = newval;
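As a quick way to try the windowed sum, here is a sketch using Python's built-in sqlite3 module (it needs SQLite 3.25 or later for window functions). It shows the inclusive variant, where January's accumulation equals January's sales as the question asks; subtract Sales from the result to get the exclusive variant discussed above. Dates are rewritten in ISO format, strftime stands in for YEAR(), and the 2015 row is a hypothetical addition to show the yearly reset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TF (IdProduct TEXT, Month TEXT, Sales REAL)")
conn.executemany(
    "INSERT INTO TF VALUES (?, ?, ?)",
    [
        ("DSN101", "2014-01-01", 100),
        ("DSN101", "2014-02-01", 50),
        ("DSN101", "2014-03-01", 250),
        ("DSN101", "2015-01-01", 40),  # hypothetical row, not in the question
    ],
)

rows = conn.execute("""
    SELECT Month, Sales,
           SUM(Sales) OVER (
               PARTITION BY IdProduct, strftime('%Y', Month)
               ORDER BY Month
           ) AS Accumulation
    FROM TF
    ORDER BY IdProduct, Month
""").fetchall()

for row in rows:
    print(row)
# ('2014-01-01', 100.0, 100.0)
# ('2014-02-01', 50.0, 150.0)
# ('2014-03-01', 250.0, 400.0)
# ('2015-01-01', 40.0, 40.0)
```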
I am trying to create a SELECT statement to obtain the sum of all payments due from a table that contains payment schedule. Fields include PaymentID, PaymentAmount, NumberofMonths. So for example, there are three rows:
PaymentId | PaymentAmount | NumberofMonths
==========================================
1 | 500 | 12
2 | 2000 | 8
3 | 1000 | 6
The total amount of all payments due would be $28,000.00. I'm a bit of a SQL/Oracle novice so I'm not sure how to get started with this.
You can use the aggregate function SUM() and multiply the PaymentAmount by the NumberOfMonths:
select sum(PaymentAmount * NumberofMonths) Total
from yourtable
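For a quick check of the query against the three rows from the question, here is a minimal sketch using Python's built-in sqlite3 module (SQLite instead of Oracle, which is an assumption; the aggregate itself is standard SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (PaymentId INTEGER, PaymentAmount INTEGER, NumberofMonths INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 500, 12), (2, 2000, 8), (3, 1000, 6)],
)

# 500*12 + 2000*8 + 1000*6
(total,) = conn.execute(
    "SELECT SUM(PaymentAmount * NumberofMonths) AS Total FROM yourtable"
).fetchone()
print(total)  # 28000
```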
Consider the following data:
Insurance_Comp | 1To30DaysAgeing | 31To60DaysAgeing | 61To90DaysAgeing | TotalVehicles
=============================================================================
ABC | 30 | 5 | 20 | 55
XYZ | 10 | 35 | 5 | 50
I am calculating the number of vehicles aged for a particular group after a stock# is assigned to that vehicle. The number of vehicles for a group (e.g. 1 to 30 Days Ageing) is calculated using a complex query. I have written a stored procedure to get this result. What I want is to calculate the total number of vehicles while executing the same select query. For simplification, I have created functions in SQL to get the number of vehicles for each group.
Right now I am using a query like this:
Select Ins_Comp, dbo.fn_1To30Ageing(...), dbo.fn_31To60Ageing(...), dbo.fn_61To90Ageing(...) from Table Where ....
I am calculating the total using RowDataBound event of GridView in ASP.NET with C#. Is there any way to calculate the total within query itself? By the way I don't want total as dbo.fn_1To30Ageing(...)+ dbo.fn_31To60Ageing(...) + dbo.fn_61To90Ageing, because that requires double processing time.
I'm not familiar with ASP.NET, but as far as the SQL aspect goes, you can certainly use a sub-query:
SELECT Ins_Comp, A, B, C, A + B + C AS TotalVehicles FROM (
Select Ins_Comp, dbo.fn_1To30Ageing(...) AS A, dbo.fn_31To60Ageing(...) AS B, dbo.fn_61To90Ageing(...) AS C from Table Where ....
) TEMP
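Here is a minimal sketch of the derived-table shape, using Python's built-in sqlite3 module and the two rows from the question. Plain columns (d1_30, d31_60, d61_90, hypothetical names) stand in for the dbo.fn_... functions, so this only shows how the outer query reuses the aliases; it says nothing about how often SQL Server actually evaluates the functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ageing (Ins_Comp TEXT, d1_30 INTEGER, d31_60 INTEGER, d61_90 INTEGER)"
)
conn.executemany(
    "INSERT INTO ageing VALUES (?, ?, ?, ?)",
    [("ABC", 30, 5, 20), ("XYZ", 10, 35, 5)],
)

rows = conn.execute("""
    SELECT Ins_Comp, A, B, C, A + B + C AS TotalVehicles
    FROM (
        -- inner query names each group once; the outer query reuses the aliases
        SELECT Ins_Comp, d1_30 AS A, d31_60 AS B, d61_90 AS C
        FROM ageing
    ) AS TEMP
    ORDER BY Ins_Comp
""").fetchall()
print(rows)  # [('ABC', 30, 5, 20, 55), ('XYZ', 10, 35, 5, 50)]
```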