How to schedule an SSAS cube refresh only for new facts or updated dimensions?

Having built a few "test" datacubes using VS2017, my team is now ready to start working with them in a more production-like manner. As such there are a few basic tasks that we need to implement, but we are struggling to find useful resources for them.
How can we do a monthly refresh of the cube without regenerating all of our dimensions and fact tables?
Does VS2017 recognise/honour Slowly Changing Dimensions if we implement them in our Dimension design?
To have a guess at this:
In our ETL databases (bearing in mind we're using VS2017) we need to:
For the tables that are used in the DataSourceView and that will ultimately become the Dimensions in the cube:
Create "current" snapshots of our dimensions based on the raw source databases; i.e. what does the Customer dimension look like now?
Compare this with the slowly changing dimension table as held in the ETL from our last processing run.
Make the necessary row inserts and update the audit fields of any old entries.
For the Fact Tables:
For the period since the last refresh, add any additional entries to the tables. This should use the updated Dimensions.
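As a rough T-SQL sketch of what we think this step looks like (all table, column, and watermark names below are just our invented placeholders):
-- etl.LoadWatermark holds the high-water mark from the previous run (invented names)
DECLARE @LastLoaded datetime =
    (SELECT LastLoadedAt FROM etl.LoadWatermark WHERE TableName = 'FactSales');

-- Append only the rows that arrived since the last refresh,
-- joining to the current version of each dimension member
INSERT INTO dw.FactSales (CustomerKey, Amount, OrderDate)
SELECT d.CustomerKey, s.amount, s.order_date
FROM   src.Sales s
JOIN   dw.DimCustomer d
       ON d.CustomerBusinessKey = s.customer_id
      AND d.IsCurrent = 1
WHERE  s.order_date > @LastLoaded;

UPDATE etl.LoadWatermark
SET    LastLoadedAt = GETDATE()
WHERE  TableName = 'FactSales';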
When we refresh the datacube on the Analysis Server, what will this do?
Presumably the Dimension tables are refreshed in their entirety as they are usually relatively small; but will the Fact tables refresh completely, or just from the last place they were updated?
Apologies for the basic nature of this question, but we've moved beyond the idealised tutorial stage and are now wallowing in an abyss of jargon and our own ignorance :-(

How can we do a monthly refresh of the cube without regenerating all of our dimensions and fact tables?
You need to implement incremental loading inside your ETL logic. You can choose between two types of incremental loading:
Insert & Update only: you can use the Lookup Component (IncInsertUpdate)
Insert & Update & Delete: you'll have to implement somewhat more complex logic (IncInsertUpdateDelete)
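For the second variant, a single T-SQL MERGE can cover all three cases; here is a minimal sketch (table and column names are invented for illustration):
-- Insert + update + delete in one pass; dw.DimProduct and staging.Product are invented
MERGE dw.DimProduct AS tgt
USING staging.Product AS src
      ON tgt.ProductId = src.ProductId
WHEN MATCHED AND tgt.ProductName <> src.ProductName THEN
     UPDATE SET tgt.ProductName = src.ProductName
WHEN NOT MATCHED BY TARGET THEN
     INSERT (ProductId, ProductName)
     VALUES (src.ProductId, src.ProductName)
WHEN NOT MATCHED BY SOURCE THEN
     DELETE;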
Does VS2017 recognise/honour Slowly Changing Dimensions if we implement them in our Dimension design?
Yes, there is a Slowly Changing Dimension component that you can use to handle SCDs.
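If you would rather hand-roll a type 2 SCD in T-SQL instead of using the component, a minimal two-pass sketch looks like this (every table and column name below is invented):
-- Pass 1: expire the current rows whose attributes have changed
UPDATE d
SET    d.IsCurrent = 0,
       d.ValidTo   = GETDATE()
FROM   dbo.DimCustomer d
JOIN   staging.CustomerSnapshot s
       ON s.CustomerBusinessKey = d.CustomerBusinessKey
WHERE  d.IsCurrent = 1
  AND  d.CustomerName <> s.CustomerName;

-- Pass 2: insert brand-new members plus new versions of the rows just expired
INSERT INTO dbo.DimCustomer
       (CustomerBusinessKey, CustomerName, ValidFrom, ValidTo, IsCurrent)
SELECT s.CustomerBusinessKey, s.CustomerName, GETDATE(), NULL, 1
FROM   staging.CustomerSnapshot s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.DimCustomer d
                   WHERE  d.CustomerBusinessKey = s.CustomerBusinessKey
                     AND  d.IsCurrent = 1);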

Related

Slowly changing dimensions in HANA Views?

I am a newbie to HANA. Our org is planning to build a native data warehouse on top of SAP HANA. To date we have implemented SCD types using the ETL approach in SAP BODS. Wondering if some types of SCDs could be offloaded onto HANA Studio by utilising the views in HANA. Please help me in this regard.
This is a rather broad question that does not allow for a single correct answer.
With slowly changing dimensions (SCD) one tries to preserve data changes over time in a data warehouse. The idea is that changes to “master” data, e.g. which sales person is responsible for which sales region, can be correctly reflected in queries.
One approach (SCD type 2) uses validity timestamps for the records to indicate the time for which those were the valid information.
This approach can be implemented easily with HANA, as all one needs to do is add those validity timestamps to the dimension table. HANA 2 takes this a bit further by providing bi-temporal history tables (system and application time).
For this use case one could use the application time ranges in combination with the SELECT ... AS OF TIMESTAMP... feature. This will automatically filter the records that were valid at the provided point in time.
This is also supported with calculation views, CDS views and SQL views.
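To make the validity-timestamp approach concrete, here is a generic SQL sketch (table and column names are invented for illustration); the AS OF TIMESTAMP feature effectively applies this range filter for you:
-- Which sales person was responsible for each region on 2019-06-30?
SELECT region, sales_person
FROM   dim_sales_region
WHERE  valid_from <= '2019-06-30'
  AND  (valid_to IS NULL OR valid_to > '2019-06-30');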
Whether or not that is improving upon your existing setup is a different question altogether.

Is it possible to create a User-Defined hierarchy between dimension attributes that are unrelated?

I am working on spiking out a BI solution and I am trying to create a hierarchy in our 'Client' dimension, more or less to replicate what I know the end users will do.
The Client dimension table has 3 foreign key relationships with other tables; each of these relationships is standalone from the others. They are Role, Service Type, and Status.
Whenever this dimension is used, it will almost always be with the Role attribute first, so I tried to create hierarchies like Role -> Service Type -> Client. When I try to process with this setup, I get the error "The table that is required for a join cannot be reached based on the relationships in the data source view".
Is there any way to create a Hierarchy with disparate attributes like this?
It isn't possible to build a hierarchy across multiple dimensions that are not related to one another in that way.
One option would be to restructure your schema to a Star schema rather than a Snowflake schema. If that's not an option, depending on the details you might be able to build a named query in SSAS or a view in SQL Server, and then create a dimension in SSAS from that.
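For example, a view along these lines (every name here is hypothetical) would flatten the three snowflaked lookups into a single dimension source:
-- Flatten the snowflaked lookups into one Client dimension source
CREATE VIEW dbo.vwDimClient AS
SELECT c.ClientKey,
       c.ClientName,
       r.RoleName,
       st.ServiceTypeName,
       s.StatusName
FROM   dbo.Client c
JOIN   dbo.Role r         ON r.RoleKey         = c.RoleKey
JOIN   dbo.ServiceType st ON st.ServiceTypeKey = c.ServiceTypeKey
JOIN   dbo.Status s       ON s.StatusKey       = c.StatusKey;
SSAS could then build the Role -> Service Type -> Client hierarchy from this one table-like source.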
From the information given it's hard to judge whether Role, Service Type, and Status should be in separate tables, but have a good think about this. Should the attributes you want to use from them actually be attributes of Client? Would doing this allow you to get rid of some of those tables and/or reduce the amount of Snowflaking? Remember that dimensional models are very different from a normal relational model: the aim is to keep the number of joins low, and it can be acceptable to repeat attributes in different tables if that's going to be more efficient. Of course, sometimes restructuring just isn't a realistic prospect, which is where a query or view might come in useful to work around this issue.
I've been learning SSAS recently myself, and have found that when I'm struggling to get something working in SSAS, the problem is very often the underlying schema not being set up in the best way for what I'm trying to accomplish.
(Apologies if this is a little vague but it's difficult to be more detailed without knowing the nature of the data, and I don't yet have the ability to comment and ask for more information!)

Cube is already deployed, need to update friendly names in DSV, how to push those updates into the cube?

So, as the title reads, I need to update the friendly names of attributes and measures in the data source view (DSV). I have done so, which is fairly straightforward. The problem I am now running into is getting these changes into the cube. I have tried deploying the cube, processing the dimensions, and processing the cube (I have tried all of these in numerous sequences).
I would think that this would be something that happens a lot. I can't believe that one would have to create a new cube each time friendly names are changed.
OK, I completely overlooked this: you deploy/process your cube, then go into the Cube Structure tab, delete your old measure or attribute, and add the newly named one in its place. If you have a better way of doing this, please let me know.

Tree Structure With History in SQL Server

I'm not sure if it's a duplicate question, but I couldn't find anything on this subject.
What's the best way to store a tree structure in a table, with the ability to store a history of changes to that structure?
Thank you for any help.
UPDATE:
I have an Employees table and I need to store the structure of branches, departments, sections, sub-sections, etc. in a table.
I need to store historical information on employees' branches and departments, to be able to retrieve an employee's branch, department, section and sub-section even if the structure has been changed.
UPDATE 2:
One solution is to save the whole structure in a history table on every change to the structure, but is that the best approach?
UPDATE 3:
There's also an Orders table. I must store the employee's position, branch, department, section, sub-section and so on with every order. That's the main reason for storing history; it will be used very often. In other words, I should be able to show the DB data for every past day.
UPDATE 4:
Maybe using hierarchyid is an option?
What if a node is renamed? What should I do if I need the old name on old orders?
I think you are looking for something like this. It provides a complete tree structure. This is used for a directory, but it can be used for anything: divisions, departments, sections, etc.
The History is separate, and it is best if you get your head around the Node structure before contemplating the History. For the History of any table, all that is required is the addition of a DateTime or Timestamp column to the PK. The History stores before-images of the current rows.
Functions such as (a) resolving the path of the tree and (b) finding the relevant history rows that were current at some point in time are performed using pure SQL. With MS SQL Server you can use recursive CTEs for (a), or write a simpler form in a stored proc.
For (b), I would implement a derived version of the table (Node, Department, whatever) that returns the rows that were current at the relevant point in time; then use that for the historic version of (a).
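As a generic T-SQL sketch of (a) and (b), assuming a simple adjacency-list Node table and a NodeHistory table keyed on (NodeId, AuditedAt) rather than the exact tables in the linked model:
-- (a) resolve the full path of each node with a recursive CTE
WITH NodePath AS (
    SELECT NodeId, ParentId, Name,
           CAST('/' + Name AS varchar(1000)) AS Path
    FROM   Node
    WHERE  ParentId IS NULL
    UNION ALL
    SELECT n.NodeId, n.ParentId, n.Name,
           CAST(p.Path + '/' + n.Name AS varchar(1000))
    FROM   Node n
    JOIN   NodePath p ON p.NodeId = n.ParentId
)
SELECT NodeId, Path FROM NodePath;

-- (b) the rows that were current at a given point in time
DECLARE @AsOf datetime = '2010-06-30';
SELECT h.*
FROM   NodeHistory h
WHERE  h.AuditedAt = (SELECT MAX(h2.AuditedAt)
                      FROM   NodeHistory h2
                      WHERE  h2.NodeId = h.NodeId
                        AND  h2.AuditedAt <= @AsOf);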
It is not necessary to copy and save the entire tree structure every time it changes.
Feel free to ask questions if you need any clarification.
Data Model
Tree Structure with History Data Model
Readers who are unfamiliar with the Relational Modelling Standard may find IDEF1X Notation useful.
How the historical information will be used determines whether you need a temporal solution or simply an auditing solution. In a temporal solution, you would store a date range over which a given node applies in its given position, and the "current" hierarchy is derived by using the current date and time to query for active nodes in order to report on the hierarchy. To say that this is a complicated solution is an understatement.
Given that we are talking about an employee hierarchy, my bet is that an auditing solution will suffice. In an auditing solution, you have one table (or set of tables) for the current hierarchy and store changes somewhere else that is accessible. It should be noted that the "somewhere else" doesn't necessarily need to be data. If changes to the company hierarchy are infrequent, then you could even use a seriously low-tech solution of creating a report (or series of reports) of the company hierarchy and storing those in PDF format. When someone wants to know what the hierarchy looked like last May, they can go and find the corresponding PDF printout.
However, if it is desired to have the audit trail be queryable, then you could consider something like SQL Server 2008's Change Tracking feature or a third-party solution which does something similar.
Remember that there is more to the question of "best" than the structure itself. There is a cost-benefit analysis of how much effort is required vs. the functionality it provides. If stakeholders need to query for the hierarchy at any moment in time and it fluctuates frequently (wouldn't want to work there :)) then a temporal solution may be best but will be more costly to implement. If the hierarchy changes infrequently and granular auditing is not necessary, then I'd be inclined to simply store PDFs of the hierarchy on a periodic basis. If there is a real need to query for changes or granular auditing is needed, then I'd look at an auditing tool.
Change Tracking
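A minimal sketch of enabling it (database and table names are invented; the tracked table must have a primary key):
-- Enable Change Tracking at the database level, then per table (SQL Server 2008+)
ALTER DATABASE HRDemo
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 30 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.OrgNode
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- Later: read the net changes since a previously saved sync version
DECLARE @last_sync_version bigint = 0;
SELECT ct.NodeId, ct.SYS_CHANGE_OPERATION
FROM   CHANGETABLE(CHANGES dbo.OrgNode, @last_sync_version) AS ct;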
Doing a quick search, here's a couple of third-party solutions for auditing:
ApexSQL
Lumigent Technologies
There are several approaches; however, with the limited information given, I would say to use a single table with a parent relationship. For the history you can simply implement auditing of the table.
UPDATE: based on your new information I would not use a single-table data store. It looks like your hierarchical structure is much more complex than a simple tree, and it also looks like you have some well-defined nodes in the structure, especially at the top, before getting into the sections and sub-sections; so multi-table relations might be a better fit.
As far as the audit tables go, there are plenty of resources to help you find out what will work best for you; there are per-row and per-column audits, etc.
One thing to note is that you don't have to historize the tree structure. It only ever grows; it never gets smaller.
What changes is the USE of the nodes. Their data may change.
For example, a site:
/a/b/c
will be there forever. a may be a folder, b too, c a file. Later a may be a folder, b a tombstone (no data here at the moment), as is c. But the tree in itself never changes.
Then add a version number, and for every node a history / list of uses (node types) with a start and a possibly null end version.
The code for showing a version X can then build out the "real" tree for that moment easily.
To support moves of nodes, have a "rehook" type that indicates the node was moved to another item at this version.
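A minimal sketch of that idea (all names invented): the nodes themselves are immutable, and a separate "use" table records what each node was between which versions:
-- Nodes never disappear; only their "use" changes between versions
CREATE TABLE Node (
    NodeId   int PRIMARY KEY,
    ParentId int NULL REFERENCES Node(NodeId),
    Name     varchar(200) NOT NULL      -- path segment; rows are never deleted
);

CREATE TABLE NodeUse (
    NodeId       int NOT NULL REFERENCES Node(NodeId),
    NodeType     varchar(20) NOT NULL,  -- e.g. 'folder', 'file', 'tombstone', 'rehook'
    StartVersion int NOT NULL,
    EndVersion   int NULL,              -- NULL = still current
    PRIMARY KEY (NodeId, StartVersion)
);

-- Rebuild the "real" tree as it looked at version @X
DECLARE @X int = 42;
SELECT n.NodeId, n.ParentId, n.Name, u.NodeType
FROM   Node n
JOIN   NodeUse u ON u.NodeId = n.NodeId
WHERE  u.StartVersion <= @X
  AND  (u.EndVersion IS NULL OR u.EndVersion > @X);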
Voila.

Upgrade strategies for bad DB schema designs

I've shown up at a new job and discovered a database which is in dire need of some help. There are many, many things wrong with it, including:
No foreign keys...anywhere. They're faked by using ints and managing the relationship in code.
Practically every field can be NULL, even though most of them never should be
Naming conventions for tables and columns are practically non-existent
Varchars which are storing concatenated strings of relational information
Folks can argue, "It works", and it does. But moving forward it's a total pain to manage all of this in code, and it opens us up to bugs, IMO. Basically, the DB is being used as a flat file, since it's not doing a whole lot of work.
I want to fix this. The issues I see now are:
We have a lot of data (migration, possibly tricky)
All of the DB logic is in code (with migration comes big code changes)
I'm also tempted to do something "radical" like moving to a schema-free DB.
What are some good strategies when faced with an existing DB built upon a poorly designed schema?
Enforce Foreign Keys: If a relationship exists in the domain, then it should have a Foreign Key.
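For example (table and column names assumed), retrofitting a key onto one of those faked int relationships:
-- WITH CHECK validates the existing rows as the constraint is added
ALTER TABLE dbo.Orders WITH CHECK
ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerId) REFERENCES dbo.Customers (CustomerId);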
Renaming existing tables/columns is fraught with danger, especially if there are many systems accessing the Database directly. Gotchas include tasks that run only periodically; these are often missed.
Of Interest: Scott Ambler's article: Introduction To Database Refactoring
and Catalog of Database Refactorings
Views are commonly used to transition between changing data models because of the encapsulation they provide. A view looks like a table, but does not exist as a physical object in the database; you can change which column is returned for a given column alias as desired. This allows you to set up your codebase to use a view, so you can move from the old table structure to the new one without the application needing to be updated. But it means the view has to return the data in the existing format. For example, your current data model has:
SELECT t.column -- a list of concatenated strings, assuming comma separated
FROM OLD_TABLE t
...so the first version of the view would be the query above, but once you created the new table that uses 3NF, the query for the view would use:
SELECT GROUP_CONCAT(t.column SEPARATOR ',') -- rebuilds the old comma-separated format
FROM NEW_TABLE t
GROUP BY t.row_key -- illustrative grouping key: one output row per original row
...and the application code would never know that anything changed.
The problem with MySQL is that its view support is limited: you can't use variables within a view, nor can a view's SELECT contain a subquery in the FROM clause.
The reality of the changes you wish to make is effectively rewriting the application from the ground up. Moving logic from the codebase into the data model will drastically change how the application gets its data. Model-View-Controller (MVC) is ideal to implement alongside changes like these, to minimize the cost of similar changes in the future.
I'd say leave it alone until you really understand it. Then make sure you don't start with one of the Things You Should Never Do.
Read Scott Ambler's book on Refactoring Databases. It covers a good many techniques for how to go about improving a database - including the transitional measures needed to allow both old and new programs to work with the changing design.
Create a completely new schema and make sure that it is fully normalized and contains any unique, check and not null constraints etc that are required and that appropriate data types are used.
Prepopulate each table that fills the parent role in a foreign key relationship with a single 'Unknown' record.
Create an ETL (Extract, Transform, Load) process (I can recommend SSIS (SQL Server Integration Services), but there are plenty of others) that you can use to refill the new schema from the existing one on a regular basis. Use the 'Unknown' record as the parent of any orphaned records; there will be plenty ;) (see the sketch after this list). You will need to put some thought into how you will consolidate duplicate records; this will probably need to be handled on a case-by-case basis.
Use as many iterations as are necessary to refine your new schema (ensure that the ETL Process is maintained and run regularly).
Create views over the new schema that match the existing schema as closely as possible.
Incrementally modify any clients to use the new schema, making temporary use of the views where necessary. You should be able to gradually turn off parts of the ETL process and eventually disable it completely.
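Here is a fragment of what the 'Unknown' technique from the steps above might look like in the ETL (all names are illustrative):
-- Seed the 'Unknown' parent once, with a well-known surrogate key
INSERT INTO new_schema.Customer (CustomerKey, CustomerName)
VALUES (0, 'Unknown');

-- Refill the child table; orphaned rows fall back to the 'Unknown' parent
-- (assumes the surrogate keys were carried over from the old schema)
INSERT INTO new_schema.[Order] (OrderKey, CustomerKey, OrderDate)
SELECT o.id,
       COALESCE(c.CustomerKey, 0),
       o.order_date
FROM   old_schema.orders o
LEFT JOIN new_schema.Customer c ON c.CustomerKey = o.customer_id;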
First, see how bad the code related to the DB is. If it is all mixed in with no DAO layer, you shouldn't think about a rewrite; but if there is a DAO layer, then it would be time to rewrite that layer and the DB along with it. If possible, base the migration tool on using the two DAOs.
But my guess is there is no DAO, so you need to find which areas of the code you are going to be changing and which parts of the DB they relate to; hopefully you can cut it up into smaller parts that can be updated as you maintain. The biggest deal is to get FKs in there and to start checking for proper indexes; there is a good chance they aren't being done correctly.
I wouldn't worry too much about naming until the rest of the DB is under control. As for the NULLs: if the program chokes on a value being NULL, don't let it be NULL; but if the program can handle it, I wouldn't worry about it at this point. In the future, if the code is applying a default value, move that to the DB, but that is way down the line from the sound of things.
Do something about the varchars sooner rather than later. If anything, make that the first pure background fix to the program.
The other thing to do is estimate the effort of each area's change and then add that price to the cost of new development on that section of code. That way you can fix the parts as you add new features.