SSAS Standard Edition - Getting Last Value from Fact Table at Transaction Grain

Background
Hi! I'm trying to work with a non-additive measure in SSAS 2008 R2 Standard Edition, knowing that semi- and non-additive measures don't work right out of the box in this edition.
I'm trying to calculate the last price invoiced from a fact table that looks something like this (sorry if this isn't the right way to create tables in these posts!):
| Invoice No | Invoice Line | DateId | ProductId | Unit Price |
|------------|--------------|----------|-----------|------------|
| 1 | 1 | 20160901 | 2 | 10 |
| 4 | 2 | 20160901 | 2 | 10 |
Unit Price is defined as a Last Child measure. Invoice No & Invoice Line are used to generate a Degenerate Dimension. Following Chris Webb's blog post, Last Ever Non Empty – a new, fast MDX approach, I can grab the last unit price across dates with no problem.
Problem
However, when one item has two records in the fact table for the same day--like the table above--the Unit Prices of each record still get aggregated when browsing the cube using the Date Dimension. Rather than show $10 on 2016-09-01, my cube is returning $10 + $10 = $20.
Solutions?
This post describes what sounds like the same problem and solves it by adding Hours / Seconds / Milliseconds to the Date dimension. Are there any other ways to handle this situation without modifying the Date dimension?
I'm a complete novice with MDX, but my hope is that it can help. Is there an MDX calculation that can somehow retrieve the unit price from the current Date member that has an [Invoice No] record, rather than perform an aggregation at the [ALL] level?
I could change the grain of my fact table to be sales by day and handle all the calculations in T-SQL, where I feel much more comfortable, but that runs counter to what I've read from Ralph Kimball about dimensional modeling and keeping the fact table at the lowest possible grain.
I could also handle this in the underlying SQL tables during the ETL process using ROW_NUMBER() and then creating a MIN or MAX measure in SSAS.
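A minimal sketch of what I mean by that, with hypothetical table and column names:
-- Flag the last invoice line per product per day during ETL, so a plain
-- MAX measure in SSAS returns that day's last price instead of a sum.
-- FactInvoiceLine and its columns are stand-ins for my actual fact table.
select InvoiceNo, InvoiceLine, DateId, ProductId, UnitPrice,
       case when row_number() over (partition by ProductId, DateId
                                    order by InvoiceNo desc, InvoiceLine desc) = 1
            then UnitPrice
       end as LastUnitPriceOfDay
from FactInvoiceLine;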
Finally, I could calculate the Unit Price as an average by dividing Extended Price by Quantity, but it would be great to retrieve the actual price of the last invoice on a day if possible.
Thank you!

If you want to get the last ever price:
Select
  Tail(
    NonEmpty(
      {
        [DateId].[DateId].[DateId].Members *
        [Invoice Line].[Invoice Line].[Invoice Line].Members *
        [Invoice No].[Invoice No].[Invoice No].Members *
        [ProductId].[ProductId].[ProductId].Members *
        [Unit Price].[Unit Price].[Unit Price].Members
      },
      [Measures].[Invoice Count]
    ),
    1
  ) on 1,
  [Measures].[Invoice Count] on 0
From [YourCube]
If you need the same per customer, you may want to use the Generate() function:
Select
  Generate(
    [CustomerId].[CustomerId].[CustomerId].Members,
    Tail(
      NonEmpty(
        {
          [CustomerId].[CustomerId].CurrentMember *
          [DateId].[DateId].[DateId].Members *
          [Invoice Line].[Invoice Line].[Invoice Line].Members *
          [Invoice No].[Invoice No].[Invoice No].Members *
          [ProductId].[ProductId].[ProductId].Members *
          [Unit Price].[Unit Price].[Unit Price].Members
        },
        [Measures].[Invoice Count]
      ),
      1
    )
  ) on 1,
  [Measures].[Invoice Count] on 0
From [YourCube]
For the sake of performance, I'd recommend adding a measure in YYYYMMDDPP style with the MAX aggregation, where YYYYMMDD is a date code and PP is the price value. For example:
| Invoice No | Invoice Line | DateId | ProductId | Unit Price | DatePrice |
|------------|--------------|----------|-----------|------------|------------|
| 1 | 1 | 20160901 | 2 | 10 | 2016090110 |
| 4 | 2 | 20160901 | 2 | 10 | 2016090110 |
It will return the max value; in your case that's 2016090110. In order to get only the price, you'll need an extra calculated measure:
With
Member [Measures].[Last Price] as
Right([Measures].[MaxDatePrice],2)
Select [Measures].[Last Price] on 0
From [BI Fake]
You may want to expand the PP size, depending on your price range.
Edit: Since your price has a fixed size of 7 digits, you need the following steps:
Create a DatePrice field in YYYYMMDDPPPPPPP format:
[DatePrice] =
  year([DateId]) * 100000000000 +
  month([DateId]) * 1000000000 +
  day([DateId]) * 10000000 +
  [Unit Price]
Create a calculated measure:
Member [Measures].[Last Price] as
  Cint(Right([Measures].[MaxDatePrice], 7))
Cint will convert '0000010' into 10.
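As a quick sanity check on the encoding, in plain T-SQL with the sample row (date 2016-09-01, unit price 10):
-- 2016 * 10^11 + 9 * 10^9 + 1 * 10^7 + 10
select 2016 * 100000000000 + 9 * 1000000000 + 1 * 10000000 + 10 as DatePrice;
-- returns 201609010000010; Right(..., 7) = '0000010', and Cint gives 10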

Related

How do I iterate through subsets of a table in SQL

I'm new to SQL and I would appreciate any advice!
I have a table that stores the history of an order. It includes the following columns: ORDERID, ORDERMILESTONE, NOTES, TIMESTAMP.
There is one TIMESTAMP for every ORDERMILESTONE in an ORDERID and vice versa.
What I want to do is compare the TIMESTAMPs for certain ORDERMILESTONEs to obtain the amount of time it takes to go from start to finish or from order to shipping, etc.
To get this, I have to gather all of the lines for a specific ORDERID and then somehow iterate through them. I was trying to do this by declaring a TVP for each ORDERID, but that is just going to take more time because some of my datasets are around 20,000 rows long.
What do you recommend? Thanks in advance.
EDIT:
In my problem, I want to find the number of days that the order spends in QA. For example, once an order is placed, we need to make the item requested and then send it to QA. So there's a milestone "Processing" and a milestone "QA". The item could be in "Processing" then "QA" once and get shipped out, or it could be sent back to QA several times, or back and forth between "Processing" and "Engineering". I want to find the total amount of time that the item spends in QA.
Here's some sample data:
ORDERID | ORDERMILESTONE | NOTES | TIMESTAMP
43 | Placed | newly ordered custom time machine | 07-11-2020 12:00:00
43 | Processing | first time assembling | 07-11-2020 13:00:05
43 | QA | sent to QA | 07-11-2020 13:30:12
43 | Engineering | Engineering is fixing the crank on the time machine that skips even years | 07-12-2020 13:00:02
43 | QA | Sent to QA to test the new crank. Time machine should no longer skip even years. | 07-13-2020 16:00:18
0332AT | Placed | lightsaber custom made with rainbow colors | 07-06-2020 01:00:09
0332AT | Processing | lightsaber being built | 07-06-2020 06:00:09
0332AT | QA | lightsaber being tested | 07-06-2020 06:00:09
I want the total number of days that each order spends with QA.
So I suppose I could create a lookup table that has each QA milestone and its next milestone. Then sum up the difference between each QA milestone and the one that follows. My main issue is that I don't necessarily know how many times the item will need to be sent to QA on each order...
To get the hours to complete a specific milestone for all orders, you can do:
select orderid,
       DATEDIFF(hh, min(TIMESTAMP), max(TIMESTAMP)) as hours_elapsed
from your_table
where ORDERMILESTONE = 'milestone name'
group by orderid
Assuming you are using SQL Server and your milestones are not repeated, then you can use:
select om.orderid,
       datediff(second, min(timestamp), max(timestamp))
from order_milestones om
where milestone in ('milestone1', 'milestone2')
group by om.orderid;
If you want to do this more generally on every row, you can use a cumulative aggregation function:
select om.*,
       datediff(second,
                timestamp,
                min(case when milestone = 'order' then timestamp end) over
                    (partition by orderid
                     order by timestamp
                     rows between current row and unbounded following)
               ) as time_to_order
from order_milestones om;
You can create a lookup table taking a milestone and giving you the previous milestone. Then you can left join to it and left join back to the original table to get the row for the same order at the previous milestone and compare the dates (sqlfiddle):
select om.*, datediff(minute, pm.TIMESTAMP, om.TIMESTAMP) as [Minutes]
from OrderMilestones om
left join MilestoneSequence ms on ms.ORDERMILESTONE = om.ORDERMILESTONE
left join OrderMilestones pm on pm.ORDERID = om.ORDERID
and pm.ORDERMILESTONE = ms.PREVIOUSMILESTONE
order by om.TIMESTAMP
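If you're on SQL Server 2012 or later, LEAD() can replace the lookup table entirely. A rough sketch under that assumption, using the question's OrderMilestones table; each QA row is paired with whatever milestone follows it, so repeated QA visits are summed automatically, and orders still sitting in QA are counted up to the current time:
-- Pair every milestone row with the timestamp of the next one for that order.
with NextStep as (
    select ORDERID,
           ORDERMILESTONE,
           [TIMESTAMP] as EnteredStep,
           lead([TIMESTAMP]) over (partition by ORDERID
                                   order by [TIMESTAMP]) as LeftStep
    from OrderMilestones
)
select ORDERID,
       sum(datediff(hour, EnteredStep, coalesce(LeftStep, getdate()))) / 24.0 as DaysInQA
from NextStep
where ORDERMILESTONE = 'QA'
group by ORDERID;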

Performing math on SELECT result rows

I have a table that houses customer balances, and I need to be able to see when an account's figures have dropped by a certain percentage from the previous month's balance.
My output consists of an account id, a year_month combination code, and the month-ending balance. So I want to see if February's balance dropped by X% from January's, and if January's dropped by the same % from December's. If it did drop, I would like to see which year_month code it dropped in; and yes, one account could have multiple drops, and I hope to see all of them.
Anyone have any ideas on how to perform this within SQL?
EDIT: Adding some sample data as requested. On the table I am looking at, I have year_month as a column, but I can also get the last business day date per month.
account_id | year_month | ending balance
1 | 2016-1 | 50000
1 | 2016-2 | 40000
1 | 2016-3 | 25
Output that I would like to see is the year_month code when the ending balance has at least a 50% decline from the previous month.
First, I would recommend making Year_Month a yyyy-mm-dd format date for this calculation. Then take the current table and join it to itself, where the row you join to is the prior month's row, and perform your calculation in the select. So you could do something like this:
SELECT x.*,
       x.EndingBalance - y.EndingBalance AS BalanceChange
FROM Balances x
INNER JOIN Balances y ON x.AccountID = y.AccountID
    AND y.YearMonth = DATEADD(month, DATEDIFF(month, 0, x.YearMonth) - 1, 0)
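On SQL Server 2012+, LAG() makes the "did it drop by X%" check direct. A sketch, assuming the same Balances table and that year_month sorts chronologically (a date, or a zero-padded string):
-- Compare each month's balance to the previous month's for the same account.
with b as (
    select account_id, year_month, ending_balance,
           lag(ending_balance) over (partition by account_id
                                     order by year_month) as prev_balance
    from Balances
)
select account_id, year_month, ending_balance, prev_balance
from b
where prev_balance > 0
  and ending_balance <= prev_balance * 0.5;  -- at least a 50% decline
For the sample data this returns the 2016-3 row (25 against 40000), and an account with several declines shows one row per drop.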

MS Access: Rank SUM() Values

I am working on an old web app that is still using MS Access as its data source, and I have run into an issue while trying to rank SUM() values.
Let's say I have 2 different account numbers, and each of those account numbers has an unknown number of invoices. I need to sum up the total of all the invoices, group it by account number, then add a rank (1-2).
RAW TABLE EXAMPLE...
Account | Sales | Invoice Number
001 | 400 | 123
002 | 150 | 456
001 | 300 | 789
DESIRED RESULTS...
Account | Sales | Rank
001 | 700 | 1
002 | 150 | 2
I tried...
SELECT Account, SUM(Sales) AS Sales,
(SELECT COUNT(*) FROM Invoices) AS RANK
FROM Invoices
ORDER BY Account
But that query keeps returning the number of records assigned to that account and not a rank.
This would be easier in a report, with a running count: Report - Running Count within a Group
This is not standard in a query, but you can do something with custom functions (it's elaborate, but possible):
http://support.microsoft.com/kb/94397/en-us
The easiest way is to break it up into 2 queries; the first one is this, and I've saved it as qryInvoices:
SELECT Invoices.Account, Sum(Invoices.Sales) AS Sales
FROM Invoices
GROUP BY Invoices.Account;
And then the second query uses the first as follows:
SELECT qryInvoices.Account, qryInvoices.Sales,
       (SELECT Count(*) FROM qryInvoices AS I
        WHERE I.Sales > qryInvoices.Sales) + 1 AS Rank
FROM qryInvoices
ORDER BY qryInvoices.Sales DESC;
I've tested this and got the desired results as outlined in the question.
Note: It may be possible to achieve this in one query using a defined table, but in this instance it was looking a little ugly.
If you need the answer in one query, it should be
SELECT inv.*, (
SELECT 1+COUNT(*) FROM (
SELECT Account, Sum(Sales) AS Sum_sales FROM Invoices GROUP BY Account
) WHERE Sum_sales > inv.Sum_sales
) AS Rank
FROM (
SELECT Account, Sum(Sales) AS Sum_sales FROM Invoices GROUP BY Account
) inv
I have tried it on Access and it works. You may also use different names for the two instances of "Sum_sales" above to avoid confusion (in which case you can drop the "inv." prefix).

Is there an established pattern for SQL queries which group by a range?

I've seen a lot of questions on SO concerning how to group data by a range in a SQL query.
The exact scenarios vary, but the general underlying problem in each is to group by a range of values rather than each discrete value in the GROUP BY column. In other words, to group by a less precise granularity than you're storing in the database table.
This crops up often in the real world when producing things like histograms, calendar representations, pivot tables and other bespoke reporting outputs.
Some example data (tables unrelated):
| OrderHistory | | Staff |
--------------------------- ------------------------
| Date | Quantity | | Age | Name |
--------------------------- ------------------------
|01-Jul-2012 | 2 | | 19 | Barry |
|02-Jul-2012 | 5 | | 53 | Nigel |
|08-Jul-2012 | 1 | | 29 | Donna |
|10-Jul-2012 | 3 | | 26 | James |
|14-Jul-2012 | 4 | | 44 | Helen |
|17-Jul-2012 | 2 | | 49 | Wendy |
|28-Jul-2012 | 6 | | 62 | Terry |
--------------------------- ------------------------
Now let's say we want to use the Date column of the OrderHistory table to group by weeks, i.e. 7-day ranges. Or perhaps group the Staff into 10-year age ranges:
| Week | QtyCount | | AgeGroup | NameCount |
-------------------------------- -------------------------
|01-Jul to 07-Jul | 7 | | 10-19 | 1 |
|08-Jul to 14-Jul | 8 | | 20-29 | 2 |
|15-Jul to 21-Jul | 2 | | 30-39 | 0 |
|22-Jul to 28-Jul | 6 | | 40-49 | 2 |
-------------------------------- | 50-59 | 1 |
| 60-69 | 1 |
-------------------------
GROUP BY Date and GROUP BY Age on their own won't do it.
The most common answers I see (none of which are consistently voted "correct") are to use one or more of:
a bunch of CASE statements, one per grouping
a bunch of UNION queries, with a different WHERE clause per grouping
as I'm working with SQL Server, PIVOT() and UNPIVOT()
a two-stage query using a sub-select, temp table or View construct
Is there an established generic pattern for dealing with such queries?
You can use some of the dimensional modeling techniques, such as fact tables and dimension tables. OrderHistory can act as a fact table with a DateKey foreign-key relation to a Date dimension.
The Date dimension holds one row per calendar date, keyed by DateKey, with pre-computed attributes such as CalendarWeek.
Note that the Date table is pre-filled with data up to N years out.
Using the example above, here is a sample query to get the result:
select CalendarWeek, sum(Quantity)
from OrderHistory a
join DimDate b
on a.DateKey = b.DateKey
group by CalendarWeek
For the Staff table, you can store a birth date key instead of the age and let the query calculate the age and ranges.
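A rough sketch of that idea, assuming a hypothetical BirthDate column on Staff (DATEDIFF(year, ...) counts year boundaries, so it only approximates age):
select (datediff(year, s.BirthDate, getdate()) / 10) * 10 as AgeGroupStart,
       count(*) as NameCount
from Staff s
group by (datediff(year, s.BirthDate, getdate()) / 10) * 10;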
Here is SQL Fiddle
Date dimension population script was taken from here.
As is often the case, this SQL problem requires using more than one pattern in composition.
In this case, the two you can use are:
NTILE
Numbers Table
You can use NTILE to create a set number of groups. However, since you don't have each member of the groups represented, you also need to use a numbers table. Since you're using SQL Server, you have it easy, as you don't have to simulate either.
Here's an example for the Staff problem
WITH g as (
SELECT
NTILE(6) OVER (ORDER BY number) grp,
NUMBER
FROM
master..spt_values
WHERE
TYPE = 'P'
and number >=10 and number <=69
)
SELECT
CAST(min(g.number) as varchar) + ' - ' +
CAST(max(g.number) as varchar) AgeGroup ,
COUNT(s.age) NameCount
FROM
g
LEFT JOIN Staff s
ON g.NUMBER = s.Age
GROUP BY
grp
DEMO
You can apply this to dates as well; it just requires some date manipulation.
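For example, a sketch that buckets the OrderHistory dates into 7-day ranges anchored at an assumed start date:
declare @anchor date = '20120701';  -- assumption: ranges start at the first order date
select dateadd(day, (datediff(day, @anchor, Date) / 7) * 7, @anchor) as WeekStart,
       sum(Quantity) as QtyCount
from OrderHistory
group by dateadd(day, (datediff(day, @anchor, Date) / 7) * 7, @anchor);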
Take a look at the OVER clause and its associated clauses: PARTITION BY, ORDER BY, ROWS, RANGE...
Determines the partitioning and ordering of a rowset before the
associated window function is applied. That is, the OVER clause
defines a window or user-specified set of rows within a query result
set. A window function then computes a value for each row in the
window. You can use the OVER clause with functions to compute
aggregated values such as moving averages, cumulative aggregates,
running totals, or a top N per group results.
My favorite case in this genre is where transactions must be grouped by fiscal quarter or fiscal year. The fiscal quarter or fiscal year boundaries of various enterprises can border on the bizarre.
My favorite way to implement this is to create a separate table for the attributes of a date. Let's call the table "Almanac". One of the columns in this table is the fiscal quarter, and another one is the fiscal year. The key to this table is of course the date. Ten years' worth of data fills up 3,650 rows, plus a few for leap years. You then need a program that can populate this table from scratch. All the enterprise calendar rules are built into this one program.
When you need to group transaction data by fiscal quarter, you just join with this table over date, and then group by fiscal quarter.
I figure this pattern could be extended to groupings by other kinds of ranges, but I've never done it myself.
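A sketch of that join, with hypothetical table and column names:
-- Almanac has one row per calendar date carrying its fiscal attributes.
select a.FiscalYear, a.FiscalQuarter, sum(t.Amount) as Total
from Transactions t
join Almanac a on a.CalendarDate = t.TransactionDate
group by a.FiscalYear, a.FiscalQuarter;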
In your first example your intervals are regular, so you can achieve the desired result simply by using functions. Below is an example that gets the data as you require it. The first query keeps the first column in date format (how I would preferably deal with it, doing any formatting outside of SQL); the second does the string conversion for you.
DECLARE @OrderHistory TABLE (Date DATE, Quantity INT)
INSERT @OrderHistory VALUES
('20120701', 2), ('20120702', 5), ('20120708', 1), ('20120710', 3),
('20120714', 4), ('20120717', 2), ('20120728', 6)
SET DATEFIRST 7
SELECT DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date) AS WeekStart,
       SUM(Quantity) AS Quantity
FROM @OrderHistory
GROUP BY DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date)
SELECT ws.WeekStart,
       SUM(oh.Quantity) AS Quantity
FROM @OrderHistory oh
CROSS APPLY
(   SELECT CONVERT(VARCHAR(6), DATEADD(DAY, 1 - DATEPART(WEEKDAY, oh.Date), oh.Date), 6) + ' to ' +
           CONVERT(VARCHAR(6), DATEADD(DAY, 7 - DATEPART(WEEKDAY, oh.Date), oh.Date), 6) AS WeekStart
) ws
GROUP BY ws.WeekStart
Something similar can be done for your age grouping using:
SELECT CAST(FLOOR(Age / 10.0) * 10 AS INT)
However this fails for 30-39 because there is no data for this group.
My stance on the matter would be: if you are running the query as a one-off, a temp table, CTE, or CASE statement should work just fine, and the same goes for reusing the query on small sets of data.
If you are likely to reuse the grouping, however, or you are querying significant amounts of data, then create a permanent table with the ranges defined and indices applied to any columns required. This is the basis of creating dimensions in OLAP.
Couldn't you treat the age (or date) as a foreign key in a new, tiny table that is just ages (or dates) and their corresponding ranges? A join statement could provide a new table with a column that contains AgeGroups. With the new table you could use the standard group-by method.
It does seem reckless to make a new table for grouping, but it would be easy to make programmatically, and I think it would be easier to maintain (or drop and recreate) than a CASE statement or a WHERE clause. If the result of this query is a one-off, a throwaway SQL statement would probably work best, but I think my method makes the most sense for long-term use.
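A sketch of what I mean, with hypothetical names; one lookup row per age means empty brackets still show up via the left join:
create table AgeRanges (Age int primary key, AgeGroup varchar(10));
-- populate programmatically, e.g. (10, '10-19') ... (19, '10-19'), (20, '20-29'), ...
select r.AgeGroup, count(s.Age) as NameCount
from AgeRanges r
left join Staff s on s.Age = r.Age
group by r.AgeGroup;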
Well, some years ago with Oracle DB we did it the following way:
We had two tables: Sessions and Ranges. Ranges had a foreign key that referenced Sessions.
When we needed to perform the SQL, we created a new record in Sessions and several new records in Ranges that referred to that session.
Our SQL joined Ranges with a filter by session:
select sum(t.Value), r.Name
from DataTable t
join Ranges r on (r.Session = ? and r.Start <= t.MyDate and r.End >= t.MyDate)
group by r.Name
After we got the results, we deleted that record from Sessions, and the records in Ranges were deleted by cascade.
We had a daemon job that purged Sessions of junk records that leaked in extraordinary situations (killed processes, etc.).
This worked perfectly. Since that time Oracle has added new SQL clauses, and maybe they could be used instead. But on other RDBMSes this is still a valid way.
Another approach is to create a number of functions such as GET_YEAR_BY_DATE, GET_QUARTER_BY_DATE, or GET_WEEK_BY_DATE (each returns the start date of the corresponding period; for example, for any date, the start date of its year). And then group by them:
select sum(Value), GET_YEAR_BY_DATE(MyDate) from DataTable
group by GET_YEAR_BY_DATE(MyDate)
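For illustration, a minimal T-SQL version of one such helper (assumes SQL Server 2012+ for DATEFROMPARTS):
create function dbo.GET_YEAR_BY_DATE (@d date)
returns date
as
begin
    -- start date of @d's year, e.g. 2012-07-14 -> 2012-01-01
    return datefromparts(year(@d), 1, 1);
end;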

How can I sum VAT in a column in SQL Server 2008?

Suppose I have the following table:
CREATE TABLE #ResultTable (NettAmount money, GrossAmount money, TotalVat money)
Given a gross amount, e.g. 250, I know that VAT is at 17.5%.
How do I calculate the TotalVat?
Thanks for any suggestions
INSERT #ResultTable
(NettAmount, GrossAmount, TotalVat)
SELECT
NettAmount, GrossAmount, GrossAmount * 17.5 /100
FROM
SourceTable
It's unclear what you want to do, sorry...
devnet247 - have a 2nd table that contains the valid, date-tracked VAT rate, along the lines of:
vat_rate | vat_type | stt_date | end_date
-----------------------------------------
0.175 | 1 | 20100101 | null
vat_type | description
-----------------------------------------
1 | standard rate
2 | reduced rate
3 | zero rate
and then join on that table where the invoice date is valid for the row. Your final SQL would be along the lines of:
SELECT SUM(NettAmount * vat_rate) AS total_vat
from #ResultTable r1, vat_table v1
where r1.invoice_date between v1.stt_date and v1.end_date
and r1.vat_type = v1.vat_type
anyway, if you were tracking the vat that is :)
jim
[edit] - if you were to use a second table, i'd suggest extending that to a 3rd - vat_type table, as vat rates vary across products as well as time. see http://www.hmrc.gov.uk/vat/forms-rates/rates/rates.htm#1
SELECT SUM(GrossAmount) * 17.5 /117.5 AS VATAmount
FROM SourceTable
Bearing in mind that (UK) VAT is due to increase to 20% from January 2011, it would be a good idea to follow Jim's suggestion of a date-tracked VAT rate lookup table.
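As a quick worked check of that extraction against the question's example (a VAT-inclusive gross of 250 at 17.5%):
-- VAT portion of a VAT-inclusive 250: 250 * 17.5 / 117.5, roughly 37.23
select cast(250 * 17.5 / 117.5 as money) as TotalVat,
       cast(250 * 100.0 / 117.5 as money) as NettAmount;  -- roughly 212.77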