I'm looking for some guidance on how to approach an MDX query. In my situation, sales occurrences make up the grain of the fact table and are the measures. I have a products dimension and a customer dimension. I also have a date dimension and a time dimension; I made them separate to keep member counts low on the dimensions.
The query I'm trying to write asks for the first and last purchase per customer per product, so an example result set may look like this:
Car - Bob - 2008-12-10 - 15:39 - 2008-12-11 - 16:44
Car - Bill - 2008-12-12 - 09:16 - 2008-12-12 - 09:16
Van - Jim - 2008-12-11 - 14:02 - 2008-12-12 - 22:01
So Bob bought two cars, and we show his first and last purchases; Bill bought one car, so his first and last purchases are the same; Jim may have bought three vans, but we only show the first and last.
I've tried using TAIL, but can't seem to get the sets right to show the last purchase per customer. Even then, experiments with HEAD for the first purchase showed that I couldn't use the same dimension twice on the same axis. It's made harder still by the fact that there may be several purchases per day, so what I need is the last time on the last date for each customer and product, and the first time on the first date for each customer and product.
I'm not necessarily asking for an exact query, although that would help; I'm more interested in the approach and the best methods to use. The platform is SQL Server Analysis Services 2005.
Can't you just use the min and max aggregations on purchase date? Or have I completely missed the problem?
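To make the MIN/MAX suggestion concrete, here is a minimal relational sketch using SQLite through Python's sqlite3 module, not SSAS or MDX. The sales table and its columns are hypothetical stand-ins for the fact table; the rows follow the example above, with an invented middle purchase for Jim. Because date and time are stored apart (like the separate dimensions), the sketch concatenates ISO-formatted date and time into one sortable string, so a single MIN and MAX per product and customer also resolves several purchases on the same day.

```python
import sqlite3

# Hypothetical sales table: date and time in separate ISO-formatted text columns,
# mirroring the separate date and time dimensions in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, customer TEXT, purchase_date TEXT, purchase_time TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Car", "Bob", "2008-12-10", "15:39"),
        ("Car", "Bob", "2008-12-11", "16:44"),
        ("Car", "Bill", "2008-12-12", "09:16"),
        ("Van", "Jim", "2008-12-11", "14:02"),
        ("Van", "Jim", "2008-12-12", "08:30"),  # invented middle purchase
        ("Van", "Jim", "2008-12-12", "22:01"),
    ],
)

# 'YYYY-MM-DD HH:MM' sorts lexically in chronological order, so MIN/MAX over the
# concatenation returns the first/last purchase, including the time of day.
rows = conn.execute(
    """
    SELECT product, customer,
           MIN(purchase_date || ' ' || purchase_time) AS first_purchase,
           MAX(purchase_date || ' ' || purchase_time) AS last_purchase
    FROM sales
    GROUP BY product, customer
    ORDER BY product, customer
    """
).fetchall()

for row in rows:
    print(row)
```

In the cube, the analogous route would presumably be Min and Max measures over a combined date-time key column on the fact table; that column is an assumption, not part of the design described in the question.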
We are trying to calculate average stock from a movements table in a single SQL statement.
So far, there has been no problem with what we thought was a standard approach: instead of adding up the daily stock and dividing by the number of days (we don’t have daily stock), we simply add up each movement multiplied by its remaining days:
select sum(quantity*(END_DATE-move_date))/(END_DATE-START_DATE)
from move_table
where move_date<=END_DATE
This is a simplified example, in real life we already take care of the initial stock at the starting date. Let’s say there are no movements prior to start_date.
Quantity sign depends on move type (sale, purchase, inventory, etc).
Of course this is done grouping by product, warehouse, ... but you get the idea.
It works as expected and the calculation is fine.
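A quick numeric check of the movements formula, written in Python with integer day numbers standing in for dates. It assumes START_DATE is inclusive and END_DATE exclusive and, as stated above, that there are no movements before the start date; the movement data is invented. Under those assumptions a movement on day d is part of the stock for END_DATE - d days, which is why the weighted sum equals the sum of daily stock.

```python
# Day numbers stand in for dates. Assumption: START_DATE is inclusive and END_DATE
# exclusive, so a movement on day d is part of the stock for END_DATE - d days.
START_DATE, END_DATE = 0, 10

# Invented sample: (move_date, quantity); the sign depends on the move type.
moves = [(0, 5), (3, -5), (6, 4), (8, 2)]


def formula_average(moves, start, end):
    """sum(quantity * (END_DATE - move_date)) / (END_DATE - START_DATE), as in the query."""
    return sum(q * (end - d) for d, q in moves if d <= end) / (end - start)


def daily_average(moves, start, end):
    """Rebuild the stock for every day in [start, end) and average it."""
    daily = [sum(q for d, q in moves if d <= day) for day in range(start, end)]
    return sum(daily) / len(daily)


print(formula_average(moves, START_DATE, END_DATE))
print(daily_average(moves, START_DATE, END_DATE))
```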
But (there is always a “but”), our customer doesn’t like counting days when there is no stock (all stock sold out). So he doesn’t like
Sum of (daily_stock) / number_of_days (which is what we calculate, using different math)
Instead, he would like
Sum of (daily stock) / number_of_days_in_which_stock_is_not_zero
For sure we can do this in any programming language without much effort, but I was wondering how to do it using plain SQL ... and wasn’t able to come up with a solution.
Any suggestion?
Consider creating a new table called something like Stock_EndOfDay_History that has the following columns.
stock#
date
stock_count_eod
This table would get a new row for each stock item at the start of a new day for the prior day. Rows could then be purged from this table once the applicable date value went outside the date window of interest.
To get the "number_of_days_in_which_stock_is_not_zero", use this.
SELECT COUNT(*) AS 'Not_Zero_Stock_Days' FROM Stock_EndOfDay_History
WHERE stock# = <stock#_value>
AND stock_count_eod <> 0
AND <date_window_clause>
Other approaches might simply add a new column to the existing stock table to maintain a running total of the "number_of_days_in_which_stock_is_not_zero". But inevitably, someone will ask how that non-zero stock day count was calculated, and the new table answers that question better than the new column does.
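A minimal sketch of the end-of-day history approach, using SQLite via Python's sqlite3 as a stand-in for the real database. The table and column names follow the answer (stock#, date, stock_count_eod) and the question (move_table), but the sample movements and the date window are made up, and the recursive calendar CTE used to backfill the history is an assumption of this sketch rather than part of the answer. Once the history exists, the customer's figure is the sum of end-of-day stock divided by the number of days on which it was not zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE move_table ("stock#" TEXT, move_date TEXT, quantity INTEGER);
    CREATE TABLE Stock_EndOfDay_History ("stock#" TEXT, "date" TEXT, stock_count_eod INTEGER);
    """
)
# Invented movements for one stock item.
conn.executemany(
    "INSERT INTO move_table VALUES (?, ?, ?)",
    [("A", "2024-01-01", 5), ("A", "2024-01-04", -5),
     ("A", "2024-01-07", 4), ("A", "2024-01-09", 2)],
)

window = {"start_day": "2024-01-01", "end_day": "2024-01-11"}  # end_day is exclusive

# Backfill one row per stock item per day: end-of-day stock is the
# running sum of all movements up to and including that day.
conn.execute(
    """
    WITH RECURSIVE days(day) AS (
        SELECT :start_day
        UNION ALL
        SELECT date(day, '+1 day') FROM days WHERE day < date(:end_day, '-1 day')
    )
    INSERT INTO Stock_EndOfDay_History ("stock#", "date", stock_count_eod)
    SELECT m."stock#", d.day, SUM(m.quantity)
    FROM days AS d JOIN move_table AS m ON m.move_date <= d.day
    GROUP BY m."stock#", d.day
    """,
    window,
)

# Sum of daily stock divided by the number of days with non-zero stock.
averages = conn.execute(
    """
    SELECT "stock#",
           SUM(stock_count_eod) * 1.0
             / SUM(CASE WHEN stock_count_eod <> 0 THEN 1 ELSE 0 END) AS avg_non_zero
    FROM Stock_EndOfDay_History
    GROUP BY "stock#"
    """
).fetchall()
print(averages)
```

In production the history would be appended daily, as the answer describes; the calendar CTE is only a convenient way to seed it for a past window. Days before an item's first movement produce no row, which affects neither the sum nor the non-zero day count.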
I have a table with sales information at the transaction level. We want to institute a new model where we compensate sales reps if a customer makes a purchase after more than a year of dormancy. To figure out how much this would have cost historically, I want to add a column with a flag for whether or not each purchase was the buyer's first in the past 365 days. What I'd like to do is a row count in PowerPivot of all sales made by that customer in the past 365 days, and wrap it in an IF to set the result to 0 or 1.
Example:
Order Date | Buyer | First Purchase in Year?
1/1/2015 | 1 | 1
1/2/2015 | 2 | 1
2/1/2015 | 1 | 0
4/1/2015 | 2 | 0
3/1/2016 | 2 | 1
5/1/2017 | 2 | 1
Any assistance would be greatly appreciated.
Excellent business use case! It's quite relevant in the business world.
To break this down for you, I will create 3 columns: 2 with intermediate calculations, and 1 with the result. Once you understand how I did this, you can combine all 3 column formulas into a single column for your dataset, if you like.
Here's a picture of the results:
So here are the 3 columns that I created:
Last Purchase - in order to run this calculation, you need to know when the buyer made their last purchase.
CALCULATE(MAX([Order Date]),FILTER(Table1,[Order Date]<EARLIER([Order Date]) && [Buyer]=EARLIER([Buyer])))
Days Since Last Purchase - now you can compare the Last Purchase date to the current Order Date.
DATEDIFF([Last Purchase],[Order Date],DAY)
First Purchase in 1 Year - finally, the results column. This simply checks to see if it has been more than 365 days since the last purchase OR if the last purchase column is blank (which means it was the first purchase), and creates the flag you want.
IF([Days Since Last Purchase]>365 || ISBLANK([Days Since Last Purchase]),1,0)
Now, you can easily combine the logic of these 3 columns into a single column and get what you want. Hope this helps!
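For checking the logic outside PowerPivot, here is a Python sketch of the three steps combined into one function: find the buyer's previous order date, compute the days since then, and flag the order when there is no previous order or the gap is more than 365 days. It illustrates the calculated-column logic only (it is not DAX) and uses the sample rows from the question.

```python
from datetime import date

# Sample rows from the question: (order_date, buyer).
orders = [
    (date(2015, 1, 1), 1),
    (date(2015, 1, 2), 2),
    (date(2015, 2, 1), 1),
    (date(2015, 4, 1), 2),
    (date(2016, 3, 1), 2),
    (date(2017, 5, 1), 2),
]


def first_purchase_flags(orders, window_days=365):
    flags = []
    for order_date, buyer in orders:
        # "Last Purchase": latest strictly earlier order by the same buyer, if any.
        earlier = [d for d, b in orders if b == buyer and d < order_date]
        last_purchase = max(earlier) if earlier else None
        # "Days Since Last Purchase": None when there is no earlier order.
        days_since = (order_date - last_purchase).days if last_purchase is not None else None
        # Result: 1 for a buyer's first order or a gap longer than the window.
        flags.append(1 if days_since is None or days_since > window_days else 0)
    return flags


print(first_purchase_flags(orders))
```

With a strict 365-day gap, the 3/1/2016 order comes out as 0, because it is only 335 days after buyer 2's 4/1/2015 order, while the expected-result table in the question shows 1; it is worth confirming which rule is intended before costing the compensation model.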
One note I wanted to add is that for this type of analysis it's not a wise move to do row counts as you had originally suggested, as your dataset can easily expand later on (what if you wanted to add more attribute columns?) and then you would have problems. So this solution that I shared with you is much more robust.
I'm trying to build a cube which will contain a history of product prices by on-line sellers. So, it has one simple "fact" table and three dimension tables. The fact looks like this:
product_id
seller_id,
price_date,
product_price
and the dimensions are product, seller, and date. The product dimension rolls up into manufacturers (so products can be grouped by manufacturer). The seller dimension just has the seller name, and the date dimension has the normal complement of date levels.
I'd like the cube to respond to users by not displaying any data unless the user has drilled down to the SKU level and the individual seller level, although I wouldn't mind the aggregations being averages at the manufacturer level.
For the date dimension, though, I would like the cube to display the last non-empty value.
When I choose LastNonEmpty as the aggregation function, the prices get summed along the manufacturer and seller dimensions, which is wrong.
Here is a sample of what I'd like to see:
fact table:
date product manufacturer seller price
1/1/2000 sku1 manu1 seller1 $10.00
1/2/2000 sku1 manu1 seller1 $12.00
cube result:
manu1 -
    sku1 -
        Jan 2000 $12.00
            1/1/2000 $10.00
            1/2/2000 $12.00
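To pin down the arithmetic being asked for, here is a small Python illustration (not SSAS code): take the last non-empty price within the month separately for each product and seller, then compare summing those values across sellers with averaging them. The seller2 row is invented to show the difference. A LastNonEmpty measure in SSAS is semi-additive and sums across the non-time dimensions, which matches the symptom described in the question; the averaging branch is what the question wants above the SKU/seller level.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Fact rows: (price_date, product, manufacturer, seller, price).
# The first two rows are from the question; the seller2 row is made up.
facts = [
    (date(2000, 1, 1), "sku1", "manu1", "seller1", 10.00),
    (date(2000, 1, 2), "sku1", "manu1", "seller1", 12.00),
    (date(2000, 1, 1), "sku1", "manu1", "seller2", 14.00),
]


def last_non_empty(rows):
    """Price on the latest date that has a row (LastNonEmpty along time)."""
    return max(rows, key=lambda r: r[0])[4]


def month_value_per_leaf(facts, year, month):
    """Apply LastNonEmpty within one month, separately per (product, seller)."""
    groups = defaultdict(list)
    for row in facts:
        d, product, _manufacturer, seller, _price = row
        if (d.year, d.month) == (year, month):
            groups[(product, seller)].append(row)
    return {key: last_non_empty(rows) for key, rows in groups.items()}


leaf = month_value_per_leaf(facts, 2000, 1)
prices = list(leaf.values())
print(leaf)
print(sum(prices))   # what a semi-additive LastNonEmpty measure would show above the leaves
print(mean(prices))  # the average the question would accept at the manufacturer level
```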
Is this possible?
Thanks, --sw
Be careful actually nulling out subtotals since this makes it very difficult for users to even start a PivotTable. I blogged about this dilemma and a solution here:
http://www.artisconsulting.com/blogs/greggalloway/2012/6/8/na-for-subtotals
So it is possible. Try something like:
scope( [Product].[Product].[All], [Measures].[Price] );
this = IIf(IsEmpty([Measures].[Price]),null,0);
Format_String(this) = ";;"; //format zeros as blank
end scope;
Then repeat that code to blank out the manufacturer and seller subtotals.
You can switch the AggregateFunction on your Price measure to LastNonEmpty. But I tend to prefer LastChild for the reasons mentioned here and here. It does add a little more MDX to use LastChild as I explained in that second article. And you may be ok with LastNonEmpty if every product is snapshotted every day.
I have a problem and was wondering if anyone could help or if it is even possible to have an algorithm for something like this.
I need to create a predictive ordering wizard. Based on previous sales, we will determine that a certain amount of an item is required, e.g. 31 apples. Now I need to work out the number of cases that need to be ordered. If the cases come in, say, 60, 30, 15 or 10 apples, the order should be a case of 30 and a case of 10 apples.
The number of items that need to be ordered change in each row of the result set. The case sizes could also change for each item. So some items may have an option of 5 different cases and some items may land up with an option of only one case.
Other examples: I need 39 cans of Coke and the cases come only in 24s, so I need 2 cases. I need 2 shots of Baileys and bottles of Baileys come in 50cl or 70cl, so I need the 50cl.
The result set's columns are ItemName, ItemSize, QuantityRequired, PackSize and PackSizeMultiple.
ItemName is the item to be ordered. ItemSize is the size the item is used in, e.g. a can of Coke. QuantityRequired is how many of the item (in this case, cans of Coke) need to be ordered. PackSize is the size of the case. PackSizeMultiple is the number to multiply the item by to work out how many of the items are in the case.
P.S. This will be a query in SQL Server 2008.
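One way to read the apple, Coke and Baileys examples is as a greedy rule: repeatedly take the largest pack that still fits the remaining quantity, and once every pack is bigger than what is left, top up with the smallest pack. The Python sketch below (a hypothetical helper, pick_cases, not SQL Server code) reproduces all three examples under that assumption. Pack sizes must be in the same unit as QuantityRequired; how PackSize and PackSizeMultiple combine into that number is left open, and the 25 ml shot used for Baileys is also an assumption.

```python
def pick_cases(quantity_required, pack_sizes):
    """Greedy heuristic: take the largest pack that fits the remaining quantity;
    when every pack is larger than what is left, top up with the smallest pack."""
    if quantity_required <= 0:
        return []
    if not pack_sizes or min(pack_sizes) <= 0:
        raise ValueError("need at least one positive pack size")
    sizes = sorted(set(pack_sizes), reverse=True)
    chosen, remaining = [], quantity_required
    while remaining > 0:
        fitting = [s for s in sizes if s <= remaining]
        pack = fitting[0] if fitting else sizes[-1]  # largest fit, else smallest pack
        chosen.append(pack)
        remaining -= pack
    return chosen


print(pick_cases(31, [60, 30, 15, 10]))  # apples
print(pick_cases(39, [24]))              # cans of Coke
print(pick_cases(2 * 25, [500, 700]))    # 2 shots of Baileys in ml (25 ml shot assumed)
```

It is a heuristic rather than an optimizer: for 31 apples, 15 + 10 + 10 would leave 4 spare apples instead of 9, at the cost of a third case. Which trade-off is right (fewest cases, least surplus, lowest cost) is a business rule to settle before porting the logic to a set-based query or function in SQL Server 2008.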
Sounds like you need a UOM (unit of measure) table and a function to calculate the co-pack measure count and the unit count measure quantity, with the UOM type based on the time between orders. You would also need to create a cron cycle and a freeze table managed by week/time interval, to create a frozen view of the current quantity sold each week and the number of units since the last order. Based on the 2 orders preceding your prior order, you would set the current prediction from the minimum time between the last 2 freeze cycles containing an order and the number of days between them. Based on the average time between orders and the unit quantity in each order, you can create a unit decay ratio (a percentage per day) and store it in each slice going forward. With this data you will be able to create a prediction that triggers a notice to sales or a message to the client to reorder. In addition, if you capture response data from sales based on unit count feedback from the client, you can compare against actuals and tune your decay rate against your prediction. You should also consider managing and rolling up these freezes by month, so that you can view historical trends and forecast revenue based on reorder velocity and the same period last year. Basically this is similar to sales forecasting, with the opportunity's percentage-to-close swapped out for the predicted percentage of quantity remaining.
Can anyone help with an aggregate function: MIN?
I have a car table from which I want to return the minimum sale price and minimum year; the table has identical cars but with different years and prices...
Basically, if I remove Registration (which contains a year) from the GROUP BY and the SELECT, the query works, but if I leave it in, I get 3 cars returned that are exactly the same model, make, etc. but with different years.
But I am using MIN, so it should return 1 car with the year 2006 (the minimum year among the 3 cars).
MIN(SalePrice) is working perfectly; it's the registration that's not working.
Any ideas?
SELECT
MIN(datepart(year,[Registration])) AS YearRegistered,
MIN(SalePrice), Model, Make
FROM
[VehicleSales]
GROUP BY
datepart(year,[Registration]), Model, Make
If I have correctly understood what you are looking for, you should query:
SELECT Model, Make, MIN(datepart(year,[Registration])) AS YearRegistered, MIN(SalePrice)
FROM [VehicleSales]
GROUP BY Model, Make
Hope it helps.
Turro's answer will return the lowest registration year and the lowest price for each (Model, Make), but that doesn't mean the lowest price belongs to the car with the lowest year.
Is that what you need?
Or do you need one of these:
the lowest price among the cars having the lowest year
the lowest year among the cars having the lowest price
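Each of the two alternative readings can be written as a join back to a per-model MIN. A minimal sketch in SQLite via Python's sqlite3, using a hypothetical RegYear integer column in place of datepart(year, [Registration]) (SQLite has no DATEPART) and sample Porsche 911 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VehicleSales (Make TEXT, Model TEXT, RegYear INTEGER, SalePrice INTEGER)")
conn.executemany(
    "INSERT INTO VehicleSales VALUES (?, ?, ?, ?)",
    [("Porsche", "911", 2004, 2000), ("Porsche", "911", 2004, 3000),
     ("Porsche", "911", 2005, 1000), ("Porsche", "911", 2005, 5000)],
)

# Lowest price among the cars that have the lowest registration year.
cheapest_of_oldest = conn.execute(
    """
    SELECT v.Make, v.Model, v.RegYear, MIN(v.SalePrice)
    FROM VehicleSales AS v
    JOIN (SELECT Make, Model, MIN(RegYear) AS MinYear
          FROM VehicleSales GROUP BY Make, Model) AS m
      ON v.Make = m.Make AND v.Model = m.Model AND v.RegYear = m.MinYear
    GROUP BY v.Make, v.Model, v.RegYear
    """
).fetchall()

# Lowest registration year among the cars that have the lowest price.
oldest_of_cheapest = conn.execute(
    """
    SELECT v.Make, v.Model, MIN(v.RegYear), v.SalePrice
    FROM VehicleSales AS v
    JOIN (SELECT Make, Model, MIN(SalePrice) AS MinPrice
          FROM VehicleSales GROUP BY Make, Model) AS m
      ON v.Make = m.Make AND v.Model = m.Model AND v.SalePrice = m.MinPrice
    GROUP BY v.Make, v.Model, v.SalePrice
    """
).fetchall()

print(cheapest_of_oldest)
print(oldest_of_cheapest)
```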
-- EDITED ---
You are correct about the query, but I want to find the car make/model that gets cheaper the next year ;)
That's why I made a comment. Imagine the following situation (make, model, year, price):
Porsche 911 2004 2000
Porsche 911 2004 3000
Porsche 911 2005 1000
Porsche 911 2005 5000
You'll get a result that doesn't really tell you whether this car gets cheaper by year:
Porsche 911 2004 1000
I don't know how you'll tell whether a car gets cheaper the next year from a single row, without at least comparing it to the previous year.
P.S. I'd like to buy one of the cars above at the listed price :D
You're getting what you're asking for: the cars are put into different groups whenever their model, make, or year is different, and the (minimum, i.e. only) year and minimum price for each of those groups is returned.
Why are you using GROUP BY?
You are correct about the query, but I want to find the car make/model that gets cheaper the next year ;)
You should find cheapest (or average) make/model per year and compare with the cheapest (or average) from previous year (for the same make/model).
Then you can see which of them gets cheaper the next year (I suppose most of them)
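A minimal sketch of that comparison in SQLite via Python's sqlite3: compute the cheapest price per make, model and year, then self-join each year to the year before and keep the rows where the price dropped. It uses a hypothetical RegYear column instead of datepart(year, [Registration]); the Porsche rows come from the example above and the Ford rows are invented. The join only compares consecutive years, so a model with a missing year is skipped for the year after the gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VehicleSales (Make TEXT, Model TEXT, RegYear INTEGER, SalePrice INTEGER)")
conn.executemany(
    "INSERT INTO VehicleSales VALUES (?, ?, ?, ?)",
    [("Porsche", "911", 2004, 2000), ("Porsche", "911", 2004, 3000),
     ("Porsche", "911", 2005, 1000), ("Porsche", "911", 2005, 5000),
     ("Ford", "Focus", 2004, 5000), ("Ford", "Focus", 2005, 6000)],  # Ford rows invented
)

# Cheapest price per make/model/year, compared with the previous year's cheapest.
cheaper = conn.execute(
    """
    WITH yearly AS (
        SELECT Make, Model, RegYear, MIN(SalePrice) AS MinPrice
        FROM VehicleSales
        GROUP BY Make, Model, RegYear
    )
    SELECT cur.Make, cur.Model, cur.RegYear, prev.MinPrice, cur.MinPrice
    FROM yearly AS cur
    JOIN yearly AS prev
      ON prev.Make = cur.Make AND prev.Model = cur.Model
     AND prev.RegYear = cur.RegYear - 1
    WHERE cur.MinPrice < prev.MinPrice
    ORDER BY cur.Make, cur.Model, cur.RegYear
    """
).fetchall()

print(cheaper)
```

Swapping MIN for AVG in the yearly CTE gives the average-based variant the answer also mentions.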