Looking for a way to implement a role-playing dimension in Azure Analysis Services (Tabular)

At the moment my fact table has around six date keys, all of which need to be linked to a single Date dimension.
The one possible solution I have on the table is:
Creating multiple date dimensions and linking each one to the fact table. Since this involves duplicating the data, I want to avoid it; I have around 10 date columns in my fact table.
I am looking for a way to reduce the number of redundant tables in my model.
Something like a role-playing setup.

Without increasing the number of tables, you can use whichever relationship is relevant to each measure. This is achieved by telling the data model which relationship to activate: a relationship can be switched from inactive to active in DAX using the USERELATIONSHIP function. For example, with one active and two inactive relationships between the fact table and the Date dimension, each measure can activate the relationship it needs.
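A minimal sketch of such measures, assuming a FactSales table whose active relationship to a single DimDate table is on OrderDateKey, with inactive relationships on ShipDateKey and DueDateKey (all table and column names here are illustrative, not from the question):

// Uses the active relationship (OrderDateKey)
Sales by Order Date := SUM ( FactSales[SalesAmount] )

// USERELATIONSHIP activates the inactive ShipDateKey relationship
// for this calculation only
Sales by Ship Date :=
CALCULATE (
    SUM ( FactSales[SalesAmount] ),
    USERELATIONSHIP ( FactSales[ShipDateKey], DimDate[DateKey] )
)

Sales by Due Date :=
CALCULATE (
    SUM ( FactSales[SalesAmount] ),
    USERELATIONSHIP ( FactSales[DueDateKey], DimDate[DateKey] )
)

All three measures slice by the same physical DimDate table, each through a different relationship, so no date dimension needs to be duplicated.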

Related

Capacity Management database

I am designing a database in Microsoft Access for capacity management. I have different types of equipment and need to track the power usage every month. For instance, I have an HVAC system and a Chiller system. Within each system there are different pieces of equipment, such as AHU_1, AHU_2, AHU_3, MAU_1, MAU_2, etc. in the HVAC system and CHWP_1, CHWP_2, CWP_1, CWP_2, etc. in the Chiller system.
I need to track the power usage every month. For each system I have a separate table containing its respective equipment. What would be a suitable way to track the usage? I believe there are three options:
Creating a main table called Chiller_usage that would hold all the equipment and dates with usage values. The problem I see is that each piece of equipment would be repeated for every date; the pro is that there aren't many tables.
Creating one table per piece of equipment, each holding dates and usage. The problem is that I have around 60 to 70 pieces of equipment across 5 major systems, which would lead to a massive number of tables and make queries and reports very difficult.
Creating one table per date, with equipment and usage values. This looks promising at first because I would have few tables initially, but as time goes on there would be 12 new tables each year, which is a lot in the long run.
What I'm leaning towards is the first option, since it is easy to manage when making custom queries; I need to perform calculations for costing and usage analysis of each piece of equipment, with graphs, etc. But I believe it will be clumsy because equipment names repeat for every date. Are there any other viable options? Thank you.
Assuming you need to store monthly energy usage for each piece of equipment: normalize the tables. Neither the person entering the data nor the manager asking for reports needs to see the complexity of the underlying tables. The person entering the data sees a form for adding systems/equipment and a form for entering energy usage per piece of equipment per month. The manager sees a report with just the information he wants, like energy costs per system per year. The normalized tables can be recombined into human-readable tables with queries. Access tries to make building the forms and reports as simple as clicking on the appropriate query and then clicking Create Form/Report; in practice some assembly is required. Behind the scenes the forms put the correct data in the correct tables, and the report shows only the data the manager wants. For instance, here is a normalized table structure based on the data you provided and some assumptions:
The tables are normalized and related as just described: each system has many pieces of equipment, and each piece of equipment has many monthly usage readings. On top of them, a query can report the total power each system used between any two dates, such as for a yearly report, turning the raw per-equipment readings into per-system totals.
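A minimal sketch of what that normalized structure and query might look like in Access-flavored SQL (all table and column names are assumptions; ReadingMonth would hold, say, the first day of the month being reported):

CREATE TABLE Systems (
    SystemID   AUTOINCREMENT PRIMARY KEY,
    SystemName TEXT(50)
);

CREATE TABLE Equipment (
    EquipmentID   AUTOINCREMENT PRIMARY KEY,
    SystemID      LONG REFERENCES Systems (SystemID),
    EquipmentName TEXT(50)
);

CREATE TABLE UsageReadings (
    ReadingID    AUTOINCREMENT PRIMARY KEY,
    EquipmentID  LONG REFERENCES Equipment (EquipmentID),
    ReadingMonth DATETIME,
    kWhUsed      DOUBLE
);

SELECT s.SystemName, SUM(u.kWhUsed) AS TotalkWh
FROM (Systems AS s
INNER JOIN Equipment AS e ON e.SystemID = s.SystemID)
INNER JOIN UsageReadings AS u ON u.EquipmentID = e.EquipmentID
WHERE u.ReadingMonth BETWEEN #2013-01-01# AND #2013-12-31#
GROUP BY s.SystemName;

Each piece of equipment appears exactly once in Equipment; only the readings repeat, which is the repetition a normalized design is supposed to have. The final query is the yearly report: total power per system between any two dates.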

Data Warehouse - Storing unique data over time

Basically we are building a reporting dashboard for our software. We are giving the Clients the ability to view basic reporting information.
Example: (I've removed 99% of the complexity of our actual system out of this example, as this should still get across what I'm trying to do)
One example metric would be the number of unique products viewed over a certain time period. That is, if 5 products were each viewed by customers 100 times over the course of a month and you run the report for that month, it should just say 5 for the number of products viewed.
Are there any recommendations on how to go about storing data in such a way that it can be queried for any time range and return a unique count of products viewed? For the sake of this example, let's say there is a rule that the application cannot query the source tables directly, and we have to store summary data in a different database and query it from there.
As a side note, we have tons of other metrics we are storing, which we store aggregated by day. But this particular metric is different because of the uniqueness issue.
I personally don't think it's possible. And our current solution is that we offer 4 pre-computed time ranges where metrics affected by uniqueness are available. If you use a custom time range, then that metric is no longer available because we don't have the data pre-computed.
Your problem is that you're trying to change the grain of the fact table. This can't be done.
Your best option is what I think you are doing now - define aggregate fact tables at the grain of day, week and month to support your performance constraint.
You can address the custom time range simply by advising your users that this will be slower than the standard aggregations. For example, a user wanting to know the counts of unique products sold on Tuesdays can write a query like this, at the expense of some performance loss:
-- count how many distinct products were sold on Tuesdays
select count(distinct dim_prod.pcode) as unique_products
from fact_sale
join dim_prod on dim_prod.pkey = fact_sale.pkey
join dim_date on dim_date.dkey = fact_sale.dkey
where dim_date.day_name = 'Tuesday'
The query could also be written against a daily aggregate rather than the transactional fact; since it would scan less data, it would run faster, maybe even fast enough to meet your need.
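For illustration, if there were a daily aggregate table such as agg_sale_day (one row per product per day; a hypothetical table, not one from the question), the same distinct count scans far fewer rows:

-- distinct is still needed: a product can appear on many Tuesdays
select count(distinct agg_sale_day.pcode) as unique_products
from agg_sale_day
join dim_date on dim_date.dkey = agg_sale_day.dkey
where dim_date.day_name = 'Tuesday'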
From the information you have provided, I think you are trying to measure the 'number of unique products viewed over a month (for example)'.
I'm not sure if you are using Kimball methodology to design your fact tables, but I believe that in the Kimball methodology an accumulating snapshot fact table would be recommended to meet such a requirement.
I might be preaching to the converted (apologies in that case), but if not, I would suggest you go through the following link, where the experts have explained the concept in detail:
http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
I have also included another link from Kimball, which explains different types of fact tables in detail:
http://www.kimballgroup.com/2014/06/design-tip-167-complementary-fact-table-types/
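For a concrete flavor of the accumulating snapshot pattern: it keeps one row per process instance, with one date key per milestone, and the row is updated in place as each milestone occurs. A generic sketch using the textbook order-pipeline example (not taken from the question):

create table fact_order_pipeline (
    order_key         int not null,
    product_key       int not null,
    order_date_key    int not null,   -- set when the order is placed
    ship_date_key     int null,       -- updated when the order ships
    delivery_date_key int null,       -- updated when the order is delivered
    order_amount      decimal(10,2)
)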
Hope that explains the concepts in detail. More than happy to answer any questions (to the best of my ability).
Cheers
Nithin

Two independent measures based on two time hierarchies

I've got a simple cube with a fact table that has a date field (among others), connected to a time dimension that has two hierarchies.
What I want to do is create one measure that is filtered only by the first time hierarchy, and a second measure filtered only by the second.
Basically this:
Measure1 ----> Cannot be affected by filtering of time_hierarchy2 and gets filtered only by time_hierarchy1
And the same for Measure2.
With what I've tried so far I can't do this: whenever I add a time hierarchy in the cube browser filter area, it affects both measures, while I want them to be independent.
Is this possible?
The idea is to create two instances (i.e. cube dimensions) of your database dimension and put one hierarchy in each of them. This concept is also known as a role-playing dimension.
You can then filter each measure independently using these role-playing dimensions.
With your data model as currently described, this is not possible. Within Analysis Services, if you review the Dimension Usage tab you will see the dimension-to-measure-group usage: for a single measure-to-dimension relationship, the measure is affected by all attributes/hierarchies of the related dimension when browsing the cube.
If a separate TimeKey in your fact is a viable option, you can establish a role-playing dimension and have multiple constraints from your fact to the Time dimension.
Another option is similar to what I did recently: I split this setup into multiple facts, each with a single reference to the Time dimension, so that I could plot separate measures on the same graph with a shared time axis (see: How to avoid Role Playing Dimension).

audit table vs. Type 2 Slowly Changing Dimension

In SQL Server 2008+, we'd like to enable tracking of historical changes to a "Customers" table in an operational database.
It's a new table and our app controls all writing to the database, so we don't need evil hacks like triggers. Instead we will build the change tracking into our business object layer, but we need to figure out the right database schema to use.
The number of rows will be under 100,000 and number of changes per record will average 1.5 per year.
There are at least two ways we've been looking at modelling this:
As a Type 2 Slowly Changing Dimension table called CustomersHistory, with columns for EffectiveStartDate, EffectiveEndDate (set to NULL for the current version of the customer), and auditing columns like ChangeReason and ChangedByUsername. Then we'd build a Customers view over that table which is filtered to EffectiveEndDate=NULL. Most parts of our app would query using that view, and only parts that need to be history-aware would query the underlying table. For performance, we could materialize the view and/or add a filtered index on EffectiveEndDate=NULL.
With a separate audit table. Every change to a Customer record writes once to the Customer table and again to a CustomerHistory audit table.
From a quick review of StackOverflow questions, #2 seems to be much more popular. But is this because most DB apps have to deal with legacy and rogue writers?
Given that we're starting from a blank slate, what are pros and cons of either approach? Which would you recommend?
In general, the issue with SCD Type II is that if the average number of changes in attribute values is very high, you end up with a very fat dimension table. This growing dimension table, joined to a huge fact table, gradually slows down query performance. It's like slow poisoning: initially you don't see the impact, and when you realize it, it's too late!
Now, I understand that you will create a separate materialized view with EffectiveEndDate = NULL and that it will be used in most of your joins. Additionally, your data volume is comparatively low (under 100,000 rows) and the average is only 1.5 changes per year, so I don't think data volume or query performance is going to be your problem in the near future.
In other words, your table is truly a slowly changing dimension (as opposed to a rapidly changing dimension, where your option #2 would be a better fit). In your case, I would prefer option #1.
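A minimal T-SQL sketch of option #1 (the column list beyond what the question mentions is illustrative):

create table dbo.CustomersHistory
(
    CustomerHistoryId  int identity primary key,
    CustomerId         int           not null,
    CustomerName       nvarchar(100) not null,
    EffectiveStartDate datetime2     not null,
    EffectiveEndDate   datetime2     null,  -- NULL marks the current version
    ChangeReason       nvarchar(200) null,
    ChangedByUsername  nvarchar(100) null
);

-- filtered index keeps "current row" lookups fast (SQL Server 2008+)
create index IX_CustomersHistory_Current
    on dbo.CustomersHistory (CustomerId)
    where EffectiveEndDate is null;
go

-- the view most of the app queries
create view dbo.Customers as
    select CustomerHistoryId, CustomerId, CustomerName, EffectiveStartDate
    from dbo.CustomersHistory
    where EffectiveEndDate is null;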

How to build a proper DB schema to have "periodic snapshots" of a table for a selected day? [closed]

Problem to be solved:
I'm new to databases and I'm trying to find the best way to store changes in a table that is a daily snapshot of some statuses: e.g. a "hotel_room_rentals" table (with 20 columns, every one of which can change).
I'd like to be able to regenerate that table for a selected day (the data changes in production, so I have to store the history somewhere else), or do some other transformations on it (e.g. the average number of days rented in a period).
My theoretical example - detailed:
Let's say that I'm creating a DB for a hotel.
In the production system I have a table that shows info for all 10,000 rooms in the hotel.
This is a daily snapshot - let's assume that the table is updated once per day.
Some attributes of a room change often: e.g. is_rented, customer_number, rate_usd.
Some attributes don't change often: e.g. disabled_room, room_color, type_of_furniture.
Room_number obviously does not change (it's the primary key).
Now I want to find the best way to track changes in this table, the best way to build statistics on top of it (e.g. the average number of days rented in a period), and a way to regenerate the table as of a selected date (e.g. 2013-01-01).
MY IDEA:
Since I have no clue about databases, my idea is to copy the whole table every day, with one extra column called "DB_dump_date" (holding the date). This is a pretty straightforward approach, but it will probably require a lot of space, since my 10k-room table would have to be copied 365 times a year.
OTHER SOLUTIONS:
On some other website, I was recommended to create two tables:
"Reservation" table with these columns: Startdate Enddate Room Rate Occupant_name
Then to transform this table into a FactReservations table: Date Room Is_occupied Rate Occupant_name
I do not understand how does this help me... in fact I assume I would have to make 20 intermediary tables and then 20 Fact tables (since I have 20 columns in my database).
QUESTIONS:
What are the recommended ways to deal with such problems?
Is there any DB schema that is prepared to deal with this without the user having to write magic ETL? (e.g. a DB that can optimize the problem by itself)
What are the alternatives?
How would you, smart people, do this? (preferably in MS Access... or some freeware technology)
edit:
One more thing - everything in the table can change, not only room reservations; everything. And I want to be able to track the changes.
Stop - slow down - and take a breath.
Do not - repeat, do not - make copies of tables each day. This approach is way off base.
Your problem is a normalization problem. As you indicate, you have other suggestions on how to normalize - this is the direction you want to go.
Your goal is to find a structure that accommodates the SQL statements that can answer your questions (and hopefully many more that you haven't thought up yet). This will be one static model where the tables do not change or get copied; they stay static, and the only thing that changes is the data inside them. (Ideally, to me, there will also be few to no updates, only inserts.)
You will certainly need a ROOM table and a CUSTOMER table, and then a relation between them - possibly RESERVATION.
These can then fill up, and you can get answers to all the questions you posed without any copying or materialization - just SQL.
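A minimal sketch of that structure and the kind of query it enables (generic SQL; all names are assumptions):

create table Room (
    RoomNumber      int primary key,
    RoomColor       varchar(20),
    TypeOfFurniture varchar(30),
    DisabledRoom    bit
);

create table Customer (
    CustomerNumber int primary key,
    CustomerName   varchar(100)
);

create table Reservation (
    ReservationId  int primary key,
    RoomNumber     int references Room (RoomNumber),
    CustomerNumber int references Customer (CustomerNumber),
    StartDate      date not null,
    EndDate        date not null,
    RateUsd        decimal(10,2)
);

-- which rooms were rented, and by whom, on a selected day?
select r.RoomNumber, c.CustomerName, res.RateUsd
from Reservation res
join Room r     on r.RoomNumber = res.RoomNumber
join Customer c on c.CustomerNumber = res.CustomerNumber
where res.StartDate <= '2013-01-01'
  and res.EndDate   >  '2013-01-01';

The snapshot for any date is derived on demand from the reservations that span it; nothing is ever copied.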
You need to focus on the requirements and start there. So far, the requirements I can see are:
- Regenerate the table for a selected day
- The average number of days rented in a period
If we consider two extremes of design, at the more complex end would be a datamart with SCD tables, tracking changes to rooms, and at the simple end would be some kind of log table, along the lines of what you have already mentioned.
Reading between the lines, I don't really see any requirement for knowing the attributes of a room on a given day, but I do see a requirement for analysis of historical transactions.
So my suggestion is have a good hard think about your requirements before you start designing the database.
There is no magic design to cover this automatically. Dimensional design is a standard way of modelling business data to allow for easy analysis, but it might be over the top for your requirement.
Welcome to the world of databases! With that in mind, take almost everything that you know about Excel and throw it out the window. In Excel it's difficult to define relationships between two sheets of a workbook and report off of those two sheets, so the majority of the time it's easier to simply copy the same data down a single sheet; in Access, or any other relational database, defining those relationships is trivially easy.
Typically what you’d want to do is create several normalized tables and define a relationship between them. Then, when querying the view, you can easily join between the tables to get the data that you need.
So, working off of the assumption that you’re building this for simple reporting and not to create a property management system (if you are looking at that – I’d recommend that you look at some of the players in the industry, like Micros or Agilysys), based on my experience working in the industry, I’d recommend the following table layout:
Reservations – this holds the reservation information (guest name, arrival date, departure date, check-in date, check-out date, rate if you use a blended rate, etc.)
Rooms – this holds information on your rack (number, wing code, max guests, # beds, smoking/non, view, type, etc.)
Room Status – only if you need to track whether a room is on reserve/hold/OOO/OTM (status type, date start, date end)
Room Status Types – types of room status holds and how they affect inventory (type, out-of-inventory flag)
Rates (if you don't use a blended rate) – one entry per reservation per night (guest, rate)
Personally, I'm a huge fan of using surrogate keys as the unique identifiers, because all too often I've been burned when something changes in the business process and a natural key that was previously unique can suddenly be duplicated. In that vein, each table would have a surrogate key and the joins would be as follows:
Reservations – Rooms (many to one)
Rooms – Room Status (one to many)
Room Status – Room Status Types (many to one)
Reservations – Rates (one to many)
If you define the relationships properly in Access (i.e. foreign key relationships in other DBMS), it should automatically use them to build your joins when creating your queries (called Views in just about every other DBMS) or reports.
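As a small illustration of those joins (Access-flavored SQL, which doesn't allow inline comments; RoomID here is the surrogate key, and all names are assumptions):

SELECT Rooms.RoomNumber, Reservations.GuestName,
       Reservations.ArrivalDate, Reservations.DepartureDate
FROM Rooms INNER JOIN Reservations
    ON Rooms.RoomID = Reservations.RoomID
WHERE Reservations.ArrivalDate >= #2013-01-01#;

Because the join runs on the surrogate RoomID, the business-visible RoomNumber can change later without breaking any relationships.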
For learning about databases I'd recommend that you review:
Wikipedia on join types
Wikipedia on Slowly Changing Dimensions (you could use some of these techniques to record changes in room information over time)
Wikipedia on relational databases
Office documentation on Access
Kimball Group Design Tips (great for data warehouse/datamart design)
If you need to use your existing table then the following is not applicable; if the data can be migrated to a new schema then this will readily address the challenge. TRE is an approach which uses the current-view paradigm for development but fully supports the time dimensions of data (system time = when the data goes into the DB, and valid time = the business time the data applies to). Working in TRE's current-view approach makes this sort of problem straightforward. Take a look at: http://youtu.be/V1EcsuJxUno
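TRE itself is a specific method (see the video), but as a rough, generic illustration of the two time dimensions it names, a bitemporal table carries both pairs of timestamps (this sketch is my own assumption, not TRE's actual schema):

create table RoomStateBitemporal (
    RoomNumber int not null,
    IsRented   bit not null,
    RateUsd    decimal(10,2),
    ValidFrom  date      not null,  -- valid time: when the state held in the business
    ValidTo    date      not null,
    SysFrom    datetime2 not null,  -- system time: when the row entered the database
    SysTo      datetime2 not null
);

-- the table as it stood for the business on 2013-01-01,
-- as the database recorded it at that same moment
select RoomNumber, IsRented, RateUsd
from RoomStateBitemporal
where ValidFrom <= '2013-01-01' and ValidTo > '2013-01-01'
  and SysFrom   <= '2013-01-01' and SysTo   > '2013-01-01';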