SSAS & SCD2 - how to deal with IsActive row in Dim

I am using SQL Server 2014 and Visual Studio 2015.
I have an SCD2 for staff names, for example
SK | AltKey | Name | Gender | IsActive
1 | 15 | Sven Svensson | M | 1
2 | 16 | Jo Jonsson | M | 1
and in the fact table
SK | AgentSK | CallDuration | DateKey
100 | 1 | 335 | 20160808
101 | 2 | 235 | 20160809
So, you can see the cube is currently linked on FctAgentSK and DimSK. This works as planned. However, when Jo changes gender the SCD2 makes the row inactive (0) and inserts a new row with the new gender and IsActive of '1'.
The problem I face is that the factSK 101 still references the 'OLD' details for the Agent. How should I deal with this to be able to still report on the call, but also reference the "correct" details of the Agent, reflecting their current gender?
When a new fact is inserted it will have the 'NEW' SK assigned, but basically I would need to report on ALL calls that have happened either side of the gender change.
Any suggestions please?
Thank you.

As Nick.McDermaid suggested, if you don't want SCD2 functionality, you could remove it from the dimension design (I've often seen it over-implemented when it's not actually wanted: perhaps you've inherited that kind of setup?).
If you want to/must keep the SCD2 design, but want to report on current staff attributes (gender and any other SCD2 attributes), there are two options:
1. Kimball documents a "Type 6" (see "SCD types 0, 4, 5, 6, 7"). You add a "current" value of the attribute to an existing Type 2 design. You could then report on the "current" attributes only.
2. I'm assuming that the Staff Name "Alt Key" is the durable staff-member key that stays the same through changes in staff attributes? You could make a slightly different Employee dimension (or hierarchy inside the Employee dimension) that has Alt Key as its leaf-level key. If you don't still have SK as a dimension attribute, this will make the dimension table "collapse" into one member per AltKey, not one member per SK. Obviously, you can't add any SCD2 attributes to this Alt Key hierarchy, as there won't be a single value per key. This also raises special problems about what to call the durable "employee" (i.e. what the Name Column of the leaf level will be), since Employee Name is one of the most obvious SCD2 attributes that will not remain the same. Probably this approach is best combined with an underlying "Type 6" inclusion of the "current value" in the dimension data, as described in (1) above.
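To make the "current value" idea concrete, here is a minimal sketch using Python's sqlite3 rather than SQL Server. The table names DimStaff and FactCall, the view DimStaffType6, the new dimension row SK 3, and the fact row 102 are assumptions added for illustration; only the other rows come from the question. The view joins every version of a staff member to the active version via AltKey, so old facts can be reported against the current gender.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimStaff (SK INTEGER PRIMARY KEY, AltKey INTEGER, Name TEXT,
                       Gender TEXT, IsActive INTEGER);
CREATE TABLE FactCall (SK INTEGER PRIMARY KEY, AgentSK INTEGER,
                       CallDuration INTEGER, DateKey INTEGER);

-- Jo's original row (SK 2) is now inactive; SK 3 is the hypothetical new SCD2 row.
INSERT INTO DimStaff VALUES
  (1, 15, 'Sven Svensson', 'M', 1),
  (2, 16, 'Jo Jonsson',    'M', 0),
  (3, 16, 'Jo Jonsson',    'F', 1);

-- Fact 102 is a hypothetical call made after the change.
INSERT INTO FactCall VALUES
  (100, 1, 335, 20160808),
  (101, 2, 235, 20160809),
  (102, 3, 120, 20160810);

-- Every historical row also carries the attribute value of the active row.
CREATE VIEW DimStaffType6 AS
SELECT h.SK, h.AltKey, h.Name, h.Gender, h.IsActive,
       c.Gender AS CurrentGender
FROM DimStaff AS h
JOIN DimStaff AS c ON c.AltKey = h.AltKey AND c.IsActive = 1;
""")

# Report all calls per staff member against the current gender.
rows = conn.execute("""
SELECT d.AltKey, d.CurrentGender, COUNT(*) AS Calls,
       SUM(f.CallDuration) AS TotalDuration
FROM FactCall AS f
JOIN DimStaffType6 AS d ON d.SK = f.AgentSK
GROUP BY d.AltKey, d.CurrentGender
ORDER BY d.AltKey
""").fetchall()
print(rows)
```

In a cube, CurrentGender would be exposed as an extra dimension attribute alongside the historical Gender, so both views of the data remain available. This assumes exactly one active row per AltKey.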

Related

MS-Access 2021 Adding/Subtracting field entry from one table with the current value in another

I'm new to SQL and Access and am trying to take an entry from InventoryTransactions.Quantity and sum it with a field from another table, MasterInventory.QuantityOnHand. I know there is a way to do this with queries and forms, but I'm hitting a roadblock. Any help will be much appreciated!
Example
InventoryTransactions.Quantity (Table)
ID | TransactionItem | TransactionType | Quantity |
[ID] = Autonumber
[TransactionItem] = Lookup referencing [.ID], [.ItemName] (checking Y/N from [.Consumable]), all from [MasterInventory]
[TransactionType] = Addition, Removal (Add, Subtract) from [TransactionTypeTable]
[Quantity] = Number
MasterInventory
Here I want my record entry for Quantity in the above table to be added to or subtracted from .QuantityOnHand in this MasterInventory table, depending on the entry in the TransactionType field.
Say I've got 20 of something in [MasterInventory].[QuantityOnHand] and I enter 20 into [ItemTransactions].[Quantity], having selected "Addition" in [ItemTransactions].[TransactionType].
The new value in [MasterInventory].[QuantityOnHand] should now be updated to 40 for the corresponding [MasterInventory].[ID] field (or updated to 0 if I selected "Removal" in [ItemTransactions].[TransactionType]).
Let me know if you see this and need clarification please.
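The arithmetic described above can be sketched as follows. This is only an illustration in Python with sqlite3, not Access: the helper record_transaction, the item names, and storing TransactionType as plain text are all assumptions. The idea is to turn the quantity into a signed number and apply it to QuantityOnHand in the same transaction that records the entry; in Access the same idea would be an update query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterInventory (ID INTEGER PRIMARY KEY, ItemName TEXT,
                              QuantityOnHand INTEGER);
CREATE TABLE InventoryTransactions (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    TransactionItem INTEGER REFERENCES MasterInventory(ID),
    TransactionType TEXT CHECK (TransactionType IN ('Addition', 'Removal')),
    Quantity INTEGER
);
-- Two hypothetical items, each starting with 20 on hand.
INSERT INTO MasterInventory VALUES (1, 'Widget', 20), (2, 'Gadget', 20);
""")

def record_transaction(conn, item_id, transaction_type, quantity):
    """Log the transaction and apply it to QuantityOnHand atomically."""
    if transaction_type not in ("Addition", "Removal"):
        raise ValueError(f"unknown transaction type: {transaction_type}")
    signed = quantity if transaction_type == "Addition" else -quantity
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO InventoryTransactions "
            "(TransactionItem, TransactionType, Quantity) VALUES (?, ?, ?)",
            (item_id, transaction_type, quantity),
        )
        conn.execute(
            "UPDATE MasterInventory SET QuantityOnHand = QuantityOnHand + ? "
            "WHERE ID = ?",
            (signed, item_id),
        )

def on_hand(conn, item_id):
    row = conn.execute(
        "SELECT QuantityOnHand FROM MasterInventory WHERE ID = ?", (item_id,)
    ).fetchone()
    return row[0]

record_transaction(conn, 1, "Addition", 20)  # 20 + 20
record_transaction(conn, 2, "Removal", 20)   # 20 - 20
print(on_hand(conn, 1), on_hand(conn, 2))
```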

Ordering based on one value of many

I have three SQL tables. Users, Registration Field Values, and Registration Fields.
Name | zip code | favorite food
Sue | 55555 | sushi
Gary | 12345 | eggs
Where zip code and favorite food are different registration fields.
The relationship is a user has many registration field values, and those values belong to the registration field.
I'm wondering how I can order my table based on a certain registration field. For example, selecting "favorite food", I would want "eggs" before "sushi".
This is confusing to me because I've only seen ORDER BY for an individual column or series of columns. I can't just ORDER BY registration_field_value.value because it needs to be based on only one of those registration fields.
This is like "ORDER BY field value where the associated field id is 'favorite food'", although I don't want to filter anything out.
I'm using Postgres if that makes a difference.
You can use CASE to order based on a specific value.
For example:
ORDER BY
CASE "favorite food"
WHEN 'eggs' THEN 1
ELSE 2
END
The above clause moves rows with eggs to the start and all other values to the bottom.
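The CASE clause above presumes a column literally named "favorite food". With the three-table layout from the question, a different approach is to LEFT JOIN only the values of the chosen field and order by that joined value, which keeps users who have no value for the field. The sketch below uses Python's sqlite3 instead of Postgres, and every table name, column name, and the third user are assumptions, since the question does not give the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE registration_fields (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE registration_field_values (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    registration_field_id INTEGER REFERENCES registration_fields(id),
    value TEXT
);
INSERT INTO users VALUES (1, 'Sue'), (2, 'Gary'), (3, 'Ann');
INSERT INTO registration_fields VALUES (1, 'zip code'), (2, 'favorite food');
-- Ann (hypothetical) has a zip code but no favorite food.
INSERT INTO registration_field_values VALUES
  (1, 1, 1, '55555'), (2, 1, 2, 'sushi'),
  (3, 2, 1, '12345'), (4, 2, 2, 'eggs'),
  (5, 3, 1, '99999');
""")

def users_ordered_by(conn, field_name):
    """Return (name, value) for all users, sorted by one registration field."""
    return conn.execute("""
        SELECT u.name, v.value
        FROM users AS u
        LEFT JOIN registration_field_values AS v
               ON v.user_id = u.id
              AND v.registration_field_id =
                  (SELECT id FROM registration_fields WHERE name = ?)
        ORDER BY v.value IS NULL, v.value, u.name
    """, (field_name,)).fetchall()

rows = users_ordered_by(conn, "favorite food")
print(rows)
```

Because the field condition sits in the ON clause rather than in WHERE, nothing is filtered out; users without the field simply sort last. In Postgres, ORDER BY v.value NULLS LAST expresses the same ordering more directly.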

Many to many relationship with hierarchy in SSAS

I am new to SSAS and have a situation where I need help.
I have a many to many relationship table which contains info about the competitors of a property and the type of competitor it is at any given date.
So, something like this:
PID Type CompID Date
1 A 1 1/1/2001
1 A 2 1/1/2001
1 B 1 2/1/2001
1 B 1 3/1/2001
2 A 1 1/1/2001
2 B 1 1/1/2001
Now I need to include this in the cube and relate it to the main fact table. I have defined the relationship as a many to many but while writing the query to retrieve the information using MDX, I am stuck.
What I need is all the measures for a given property and all the aggregated measures of all its competitors of a given type at a given date.
So, given a property ID, I need to identify the list of its competitors of a given type on a given date, and then aggregate the measures for all these competitor properties.
I am stuck at this place where I have to identify all the competitors of a given property.
e.g. if I fire this query:
Select
{
TYNBC,YOYNBC_Improvement,LYNBC,TYADR,YOYADR_Improvement,LYADR
}
on 1,
Stay_DATE.Month.Month on 0
FROM Cube1
where {Hotel.Hotel_Key.&[480]}*{Stay_DATE.Hierarchy.Year.&[2015]}
The result would be the measures for a given property.
What I want in the result is all the above measures and the same measures for the competitors of property 480 for a given date and a given competitor type. The issue I am facing is in identifying the competitors of the property in MDX, because the competitor table is added as a factless fact with a many-to-many relationship in the cube. So, how do I retrieve the list of competitor properties when there is no hierarchy defined, as it is not a dimension?
Thanks for your help in advance.

Best way to filter data by criteria, and display description using SQL in MsAccess?

I have a table in MS Access of data containing results from a survey, and I have a look up table of Risk Ids and descriptions of the sort of risk based on the survey results.
What I've tried so far is selecting distinct entries from my survey table and adding a new field to my query for the Risk Code, whose number will depend on criteria that I determine and which I will then use to look up the risk.
My table for the survey looks like so:
Name | Location | Days spent eating IceCream | Icecream eating location
John Smith | London | 30 | Hull
My Risk ID table looks like so:
RiskID | RiskBool | Description
1 | Yes | At risk - This person eats too much icecream
2 | Yes | Risk - This person does not eat enough icecream
3 | No | Sensible amount of icecream eaten
4 | Yes | It is illegal to eat icecream in Hull
And my query looks (something) like this in access design view
Name | Location | Risk Code | RiskID | Description
I want to write SQL to change the Risk Code to 1, 2, 3, 4 (up to 15 in my real case) and then I will tell it to only display the person and the description for when the Risk ID and Code match. I haven't written this yet.
What is the best way to achieve this?
I see two possibilities:
1. Set up 15 queries, one for each risk ID, add the descriptions to those, and then join those 15 sets of results together. This is what I know how to do, but it could end up quite messy.
2. Set up some 'check' using if statements, and then somehow set the Risk Code field for that entry.
My current SQL looks like this, but it doesn't make any checks yet. I'm worried the if statement will be very, very long.
SELECT DISTINCT
[At Risk Employee List].Employee AS Name,
[At Risk Employee List].[DaysIceCream] AS [Days spent eating Icecream],
[At Risk Employee List].[Base Location],
[RiskCode] AS [Risk Code], <----is this where the check would need to go?
RiskDescLookup.RiskBoolean,
RiskDescLookup.RiskExplanation
FROM RiskDescLookup,
[Survey Raw Data]
INNER JOIN
[At Risk Employee List]
ON
[Survey Raw Data].ResID = [At Risk Employee List].[Staff ID]
GROUP BY
[At Risk Employee List].Employee,
[At Risk Employee List].[DaysIceCream],
[At Risk Employee List].[Base Location],
RiskDescLookup.RiskID,
[RiskCode] AS [Risk Code], <----is this where the check would need to go?
RiskDescLookup.RiskBoolean,
RiskDescLookup.RiskExplanation
I imagine the check done by if statements to be Very long and look something like (in pseudocode):
if ( [At Risk Employee List].[Base Location] = Hull, then [RiskCode]=4...., else if (DaysIceCream>42) then....
Is that the best way to do this? Do I even need to have a Risk Code?
I'm a bit lost as to how to produce this 'check' in the best possible way.
I am not entirely certain of your intent, but from what you've posted and the follow up comments it would appear that the process of joining the Risk Code to Risk ID is relatively simple once you have the Risk Code identified for each survey result.
The real issue it seems is how to encapsulate the logic to identify the Risk Code for each survey result. I would suggest "calculating" the risk code value for each survey result externally to your query and then join to those results before finally joining to the Risk ID.
For example, I might add a third table to the design SurveyRisk that contains Name and Risk Code.
Use whatever criteria and logic you need to use to identify the risk for each survey response. Enter these values into the SurveyRisk table. Then, you can simply join Survey to SurveyRisk to Risk to summarize your results.
Feel free to clarify where I'm misunderstanding what you are trying to accomplish and I'll edit my post accordingly.
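A minimal sketch of that SurveyRisk idea, using Python with sqlite3 instead of Access. The function risk_code, its rule order, the "fewer than 5 days" threshold, the mapping of "more than 42 days" to Risk ID 1, and the respondent Jane Doe are all assumptions for illustration; only the Hull rule, the number 42, and the Risk table rows come from the question.

```python
import sqlite3

def risk_code(days_ice_cream, eating_location):
    """Hypothetical rules; the real criteria would come from the survey owner."""
    if eating_location == "Hull":
        return 4            # illegal location takes priority (assumed)
    if days_ice_cream > 42:
        return 1            # too much ice cream (42 is from the pseudocode)
    if days_ice_cream < 5:
        return 2            # not enough ice cream (threshold assumed)
    return 3                # sensible amount

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Survey (Name TEXT PRIMARY KEY, Location TEXT,
                     DaysIceCream INTEGER, EatingLocation TEXT);
CREATE TABLE Risk (RiskID INTEGER PRIMARY KEY, RiskBool TEXT, Description TEXT);
CREATE TABLE SurveyRisk (Name TEXT REFERENCES Survey(Name),
                         RiskCode INTEGER REFERENCES Risk(RiskID));
INSERT INTO Survey VALUES
  ('John Smith', 'London', 30, 'Hull'),
  ('Jane Doe',   'Leeds',  60, 'Leeds');
INSERT INTO Risk VALUES
  (1, 'Yes', 'At risk - This person eats too much icecream'),
  (2, 'Yes', 'Risk - This person does not eat enough icecream'),
  (3, 'No',  'Sensible amount of icecream eaten'),
  (4, 'Yes', 'It is illegal to eat icecream in Hull');
""")

# Calculate the risk code outside the reporting query and store it.
survey_rows = conn.execute(
    "SELECT Name, DaysIceCream, EatingLocation FROM Survey").fetchall()
with conn:
    conn.executemany(
        "INSERT INTO SurveyRisk (Name, RiskCode) VALUES (?, ?)",
        [(name, risk_code(days, place)) for name, days, place in survey_rows],
    )

# The report is then a plain Survey -> SurveyRisk -> Risk join.
rows = conn.execute("""
SELECT s.Name, r.RiskID, r.Description
FROM Survey AS s
JOIN SurveyRisk AS sr ON sr.Name = s.Name
JOIN Risk AS r ON r.RiskID = sr.RiskCode
ORDER BY s.Name
""").fetchall()
print(rows)
```

Keeping the rules in one function means the long chain of checks lives in a single place instead of inside the query text.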
The best way to do this is to use a look up table that emulates the structure of your data.
Add a row for every 'case', and in MS Access link the corresponding fields together.
Here are a few of the links: [screenshot of the linked fields]
Then alter the SQL to pair up any options that need to go together. For instance, each of the checks I make is duplicated for two separate locations.
Here is an example:
FROM RiskDescLookupReg
INNER JOIN ([Survey Raw Data]
INNER JOIN [At Risk Employee List]
ON [Survey Raw Data].ResID=[At Risk Employee List].[Staff ID])
ON (RiskDescLookupReg.RegTravelChoice=[Survey Raw Data].RegTravelChoice)
And (RiskDescLookupReg.MonthChoice2=[Survey Raw Data].MonthChoice2
And RiskDescLookupReg.PercentageTimeChoice2=[Survey Raw Data].PercentageTimeChoice2
And RiskDescLookupReg.LimitedDurationChoice2=[Survey Raw Data].LimitedDurationChoice2
And RiskDescLookupReg.TemporaryPurposeChoice2=[Survey Raw Data].TemporaryPurposeChoice2)
Or (
RiskDescLookupReg.MonthChoice1=[Survey Raw Data].MonthChoice1
And RiskDescLookupReg.PercentageTimeChoice1=[Survey Raw Data].PercentageTimeChoice1
And RiskDescLookupReg.LimitedDurationChoice1=[Survey Raw Data].LimitedDurationChoice1
And RiskDescLookupReg.TemporaryPurposeChoice1=[Survey Raw Data].TemporaryPurposeChoice1)
Note how there are two blocks, one for each location. If I only had one location of interest, I could drop the last block.
If you get duplicates because of the way your lookup table is arranged, you need to specify that the parts from the lookup table are enclosed in a LAST, and the parts from the survey in FIRST. Here is an example:
SELECT
[At Risk Employee List].Number,
FIRST([At Risk Employee List].Employee) AS Name,
FIRST([At Risk Employee List].[Base Location]) AS BaseLocation,
LAST(RiskDescLookupReg.RiskBool) AS RiskBool,
LAST(RiskDescLookupReg.RiskDesc) AS RiskDesc,
The use of LAST ensures that if someone would come up as both at risk and not at risk, only the LAST at-risk case is displayed (those entries come later in the field). This runs counter to the fact that, when duplicates are displayed, the at-risk ones come first.

How should you separate dimension tables from fact tables if you are not building a data warehouse?

I realize that referring to these as dimension and fact tables is not exactly appropriate. I am at a loss for better terminology, so please excuse the categorization that I use in this post.
I am building an application for employee record keeping.
The database will contain organizational information. The information is mostly defined in three tables: Locations, Divisions, and Departments. However, there are others with similar problems. First, I need to store the available values for these tables. This will allow for available values in the application when managing an employee and for management of these values when adding/deleting departments and such. For instance, the Locations table may look like,
LocationId | LocationName | LocationStatus
1 | New York | Active
2 | Denver | Inactive
3 | New Orleans | Active
I then need to store these values for each employee and keep their history. My first thought was to create LocationHistory, DivisionHistory, and DepartmentHistory tables. I cannot pinpoint why, but this struck me as poor design. My next inclination was to create a DimLocation/FactLocation, DimDivision/FactDivision, DimDepartment/FactDepartment set of tables. I do not believe this makes sense either. I have also considered naming them as a combination of Employee, i.e. EmployeeLocations, EmployeeDivisions, etc. Regardless of the naming convention for these tables, I imagine that data would look similar to a simplified version I have below:
EmployeeId | LocationId | EffectiveDate | EndDate
1 | 3 | 2008-07-01 | NULL
1 | 2 | 2007-04-01 | 2008-06-30
I realize any of the imagined solutions I described above could work, but I am really looking to create a design that will be easy for others to maintain with an intuitive, familiar structure. I would like to receive this community's help, opinions, and experience with this matter. I am open to and would welcome any suggestion to consider. For instance, should I even store the available values for these three tables in the database? Should they be maintained in the application code/business logic layer? Do I just need to get over seeing the word History repeating three times?
Thanks!
Firstly, I see no issue in describing these as Dimension and Fact tables outside of a warehouse :)
In terms of conceptualising and understanding the relationships, I personally find start/end dates perfectly easy for people to understand. This allows Agent and Location fact tables, and then time-dependent mapping tables such as Agent_At_Location, etc. They do, however, have two issues worth noting:
1. If EndDate is 2008-08-30, was the employee in that location UP TO 30th August, or UP TO and INCLUDING 30th August?
2. Dealing with overlapping date periods can give messy queries and, more importantly, slow queries.
The first one seems simply a matter of convention, but it can have certain implications when dealing with other data. For example, consider that an EndDate of 2008-08-30 means that they ARE at that location UP TO and INCLUDING 30th August. Then you join on to their Daily Agent Data for that day (such as when they actually arrived at work, left for breaks, etc). You need to join ON AgentDailyData.EventTimeStamp < '2008-08-30' + 1 in order to include all the events that happened during that day.
This is because the data's EventTimeStamp isn't measured in days, but probably minutes or seconds.
If you consider that the EndDate of '2008-08-30' means that the Agent was at that Location UP TO but NOT INCLUDING 30th August, the join does not need the + 1. In fact, you don't need to know whether the date is DAY bound or can include a time component. You just need TimeStamp < EndDate.
By using EXCLUSIVE End markers, all of your queries simplify and never need + 1 day, or + 1 hour to deal with edge conditions.
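A small sketch of the exclusive-end convention, using Python's sqlite3 with ISO-formatted text dates (which compare correctly as strings). The table layouts, the EventName column, and the sample rows are assumptions for illustration. Each event timestamp matches exactly one location row with a plain `>= EffectiveDate AND < EndDate` test, and no "+ 1 day" arithmetic is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeLocations (
    EmployeeId INTEGER, LocationId INTEGER,
    EffectiveDate TEXT,   -- inclusive start
    EndDate TEXT          -- exclusive end; NULL means still current
);
CREATE TABLE AgentDailyData (EmployeeId INTEGER, EventTimeStamp TEXT,
                             EventName TEXT);

-- Location 2 ends at the instant location 3 begins: no gap, no overlap.
INSERT INTO EmployeeLocations VALUES
  (1, 2, '2007-04-01', '2008-07-01'),
  (1, 3, '2008-07-01', NULL);
INSERT INTO AgentDailyData VALUES
  (1, '2008-06-30 17:45:00', 'left work'),
  (1, '2008-07-01 08:00:00', 'arrived at work');
""")

rows = conn.execute("""
SELECT e.EventTimeStamp, l.LocationId
FROM AgentDailyData AS e
JOIN EmployeeLocations AS l
  ON l.EmployeeId = e.EmployeeId
 AND e.EventTimeStamp >= l.EffectiveDate
 AND (l.EndDate IS NULL OR e.EventTimeStamp < l.EndDate)
ORDER BY e.EventTimeStamp
""").fetchall()
print(rows)
```

The late-evening event on 30th June still lands in location 2, and the first event on 1st July lands in location 3, regardless of the time component.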
The second one is much harder to resolve. The simplest way of resolving an overlapping period is as follows:
SELECT
CASE WHEN TableA.InclusiveFrom > TableB.InclusiveFrom THEN TableA.InclusiveFrom ELSE TableB.InclusiveFrom END AS [NetInclusiveFrom],
CASE WHEN TableA.ExclusiveFrom < TableB.ExclusiveFrom THEN TableA.ExclusiveFrom ELSE TableB.ExclusiveFrom END AS [NetExclusiveFrom]
FROM
TableA
INNER JOIN
TableB
ON TableA.InclusiveFrom < TableB.ExclusiveFrom
AND TableA.ExclusiveFrom > TableB.InclusiveFrom
-- Where InclusiveFrom is the StartDate
-- And ExclusiveFrom is the EndDate, up to but NOT including that date
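To see how that overlap join behaves on half-open periods, here is a small check in Python with sqlite3. The sample rows are made up for illustration; the column names follow the query above. Periods that merely touch (one ends exactly where the other starts) are not treated as overlapping, which is the intended result with an exclusive end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (InclusiveFrom TEXT, ExclusiveFrom TEXT);
CREATE TABLE TableB (InclusiveFrom TEXT, ExclusiveFrom TEXT);
INSERT INTO TableA VALUES ('2007-04-01', '2007-06-25');
INSERT INTO TableB VALUES
  ('2007-01-01', '2007-04-01'),   -- ends exactly where A starts: no overlap
  ('2007-06-01', '2007-07-01'),   -- genuinely overlaps A
  ('2007-06-25', '2007-08-01');   -- starts exactly where A ends: no overlap
""")

rows = conn.execute("""
SELECT
  CASE WHEN TableA.InclusiveFrom > TableB.InclusiveFrom
       THEN TableA.InclusiveFrom ELSE TableB.InclusiveFrom END AS NetInclusiveFrom,
  CASE WHEN TableA.ExclusiveFrom < TableB.ExclusiveFrom
       THEN TableA.ExclusiveFrom ELSE TableB.ExclusiveFrom END AS NetExclusiveFrom
FROM TableA
INNER JOIN TableB
  ON TableA.InclusiveFrom < TableB.ExclusiveFrom
 AND TableA.ExclusiveFrom > TableB.InclusiveFrom
""").fetchall()
print(rows)
```

The net period is the later of the two starts and the earlier of the two ends.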
The problem with that query is one of indexing. The first condition TableA.InclusiveFrom < TableB.ExclusiveFrom could be resolved using an index, but it could give a massive range of dates. Then, for each of those records, the ExclusiveFrom values could be just about anything, and certainly not in an order that could help quickly resolve TableA.ExclusiveFrom > TableB.InclusiveFrom.
The solution I have previously used for that is to have a maximum allowed gap between InclusiveFrom and ExclusiveFrom. This allows something like...
ON TableA.InclusiveFrom < TableB.ExclusiveFrom
AND TableA.InclusiveFrom >= TableB.InclusiveFrom - 30
AND TableA.ExclusiveFrom > TableB.InclusiveFrom
The condition TableA.ExclusiveFrom > TableB.InclusiveFrom STILL can't benefit from indexes. But instead we've limited the number of rows that can be returned by searching TableA.InclusiveFrom. It's at most only ever 30 days of data, because we know that we restricted the duration to a maximum of 30 days.
An example of this is to break up the associations by calendar month (max duration of 31 days).
EmployeeId | LocationId | EffectiveDate | EndDate
1 | 2 | 2007-04-01 | 2007-05-01
1 | 2 | 2007-05-01 | 2007-06-01
1 | 2 | 2007-06-01 | 2007-06-25
(Representing Employee 1 being in Location 2 from 1st April to (but not including) 25th June.)
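Splitting a period at calendar-month boundaries can be sketched in a few lines of Python. The function names split_by_month and first_of_next_month are assumptions for illustration; both dates follow the inclusive-start, exclusive-end convention described earlier. Splitting 1st April 2007 up to (but not including) 25th June 2007 yields three rows, none longer than 31 days.

```python
from datetime import date

def first_of_next_month(d):
    """Return the first day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

def split_by_month(effective, end_exclusive):
    """Split [effective, end_exclusive) into pieces that never cross a month."""
    pieces = []
    start = effective
    while start < end_exclusive:
        stop = min(first_of_next_month(start), end_exclusive)
        pieces.append((start, stop))
        start = stop
    return pieces

rows = split_by_month(date(2007, 4, 1), date(2007, 6, 25))
for effective_date, end_date in rows:
    print(effective_date, end_date)
```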
It's effectively a trade off; using Disk Space to gain performance.
I've even seen this pushed to the extreme of not actually storing date Ranges, but storing the actual mapping for each and every day. Essentially, it's like restricting the maximum duration to 1 day...
EmployeeId | LocationId | EffectiveDate
1 | 2 | 2007-06-23
1 | 2 | 2007-06-24
1 | 3 | 2007-06-25
1 | 3 | 2007-06-26
Instinctively I initially rebelled against this. But in subsequent ETL, Warehousing, Reporting, etc, I actually found it Very powerful, adaptable, and maintainable. I actually saw people making fewer coding mistakes, writing code in less time, the code ending up running faster, and being much more able to adapt to clients' changing needs.
The only two down sides were:
1. More disk space taken (but trivial compared to the size of fact tables)
2. Inserts and Updates to this mapping were slower
The actual slowdown for Inserts and Updates only actually mattered once, where this model was being used to represent a constantly changing process net and the app wanted to change the mapping about 30 times a second. Even then it worked; it just chomped up more CPU time than was ideal.
If you want to be efficient and keep a history, do these things. There are multiple solutions to this problem, but this is the one that I keep going back to:
1. Remember that each row represents a single entity. If you make corrections to that entity, that's fine, but don't reuse an ID for a new Location. Set it up so that instead of deleting a Location, you mark it as deleted with a bit and hide it from the interface; that way, when it's referenced historically, it's still there.
2. Create a history table that includes the current value, or no records if a value isn't currently set. Have the foreign key tie back to the employee and tie to the location.
3. Create a column in the employee table that points to the current active location in the history. When you need to get the employee's location, you join in the history table based on this ID. When you need to get all of the history for an employee, you join from the history table.
This structure keeps it all normalized, and gives you an easy way to find the current value without having to do any date comparisons.
As far as using the word history, think of it in different terms: since it contains the current item as well as historical items, it's really just a junction table that keeps around the old item. As such you can name it something like EmployeeLocations.
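A rough sketch of this layout, using Python's sqlite3. The column names IsDeleted, HistoryId, and CurrentLocationHistoryId, the helper move_employee, and the employee name are assumptions for illustration; the locations and dates follow the question. The current location is found through the pointer column with no date comparison, while the full history comes from the junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations (LocationId INTEGER PRIMARY KEY, LocationName TEXT,
                        IsDeleted INTEGER DEFAULT 0);  -- soft-delete bit
CREATE TABLE Employees (EmployeeId INTEGER PRIMARY KEY, Name TEXT,
                        CurrentLocationHistoryId INTEGER);  -- pointer into history
CREATE TABLE EmployeeLocations (
    HistoryId INTEGER PRIMARY KEY AUTOINCREMENT,
    EmployeeId INTEGER REFERENCES Employees(EmployeeId),
    LocationId INTEGER REFERENCES Locations(LocationId),
    EffectiveDate TEXT
);
-- Denver is hidden from the interface but kept for historical references.
INSERT INTO Locations VALUES (1, 'New York', 0), (2, 'Denver', 1),
                             (3, 'New Orleans', 0);
INSERT INTO Employees VALUES (1, 'Example Employee', NULL);
""")

def move_employee(conn, employee_id, location_id, effective_date):
    """Add a history row and repoint the employee at it, atomically."""
    with conn:
        cur = conn.execute(
            "INSERT INTO EmployeeLocations (EmployeeId, LocationId, EffectiveDate) "
            "VALUES (?, ?, ?)",
            (employee_id, location_id, effective_date),
        )
        conn.execute(
            "UPDATE Employees SET CurrentLocationHistoryId = ? WHERE EmployeeId = ?",
            (cur.lastrowid, employee_id),
        )

move_employee(conn, 1, 2, "2007-04-01")
move_employee(conn, 1, 3, "2008-07-01")

# Current location: follow the pointer, no date logic required.
current = conn.execute("""
SELECT l.LocationName
FROM Employees AS e
JOIN EmployeeLocations AS h ON h.HistoryId = e.CurrentLocationHistoryId
JOIN Locations AS l ON l.LocationId = h.LocationId
WHERE e.EmployeeId = 1
""").fetchone()[0]

# Full history: join from the history table.
history = conn.execute("""
SELECT h.EffectiveDate, l.LocationName
FROM EmployeeLocations AS h
JOIN Locations AS l ON l.LocationId = h.LocationId
WHERE h.EmployeeId = 1
ORDER BY h.EffectiveDate
""").fetchall()
print(current, history)
```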