Slowly changing dimensions in HANA Views? - hana

I am a newbie to HANA. Our org is planning to build a native data warehouse on top of SAP HANA. To date we have implemented SCD types using the ETL approach in SAP BODS. I am wondering whether some types of SCDs could be offloaded onto HANA Studio by utilising the views in HANA. Please help me in this regard.

This is a rather broad question that does not allow for a single correct answer.
With slowly changing dimensions (SCD) one tries to preserve data changes over time in a data warehouse. The idea is that changing “master”-data, e.g. which sales person is responsible for which sales region, can be correctly reflected in queries.
One approach (SCD type 2) uses validity timestamps for the records to indicate the time for which those were the valid information.
This approach can be implemented easily with HANA as all one needs to do is to add those validity timestamps to the dimension table. HANA 2 takes this a bit further by providing bi-temporal history tables (system time and application time).
For this use case one could use the application time ranges in combination with the SELECT ... AS OF TIMESTAMP... feature. This will automatically filter the records that were valid at the provided point in time.
This is also supported with calculation views, CDS views and SQL views.
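As a minimal sketch of the validity-timestamp idea, assuming a hypothetical dimension table SALES_REGION_DIM with VALID_FROM/VALID_TO columns (all names here are illustrative, not from your system):

    -- classic SCD type 2: each dimension row carries its validity interval,
    -- and a query filters for the point in time it is interested in
    SELECT region, sales_person
      FROM sales_region_dim
     WHERE '2021-06-30' >= valid_from
       AND '2021-06-30' <  valid_to;

    -- with a HANA 2 application-time period table, the same filter can be
    -- expressed declaratively (exact syntax depends on your HANA revision):
    -- SELECT region, sales_person
    --   FROM sales_region_dim
    --   FOR APPLICATION_TIME AS OF '2021-06-30';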
Whether or not that is improving upon your existing setup is a different question altogether.

Related

Multiple Datamarts Architecture / Modeling on Snowflake cloud datawarehouse

Context:
Let's suppose we have multiple data marts (e.g. HR, Accounting, Marketing, ...) and all of them use a star schema as their dimensional model (the Kimball approach).
Question:
Snowflake's cloud data warehouse architecture eliminates the need to spin off separate physical data marts/databases in order to maintain performance. So, what's the best approach to building multiple data marts on Snowflake?
Create a database for each data mart? Or create one database (EDW) with multiple schemas, where each schema corresponds to a data mart?
Thank you!
Ron is correct - the answer depends on a few things:
If there are conformed dimensions, then one database and schema might be the way to go
If they are completely non-integrated data marts I would go with separate schemas or even separate databases. They are all logical containers in Snowflake (rather than physical) with full role based access control available to segregate users.
So really - how do you do it today? Does that work for you, or are there things you need or want to do that you cannot do today with your current physical setup? How is security set up with your BI tools? Do they reference a database name or just a schema name? If you can, minimize changes to your data pipeline and reporting so you have fewer things that might need refactoring (at least for your first POC or migration).
One thing to note is that with Snowflake you have the ability to easily do cross-database joins (i.e., database.schema.table) - all you need is SELECT access, so even if you separate the marts by database you can still do cross-mart reporting if needed.
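For illustration, a hedged sketch of such a cross-database join; the database, schema, and table names below are made up:

    -- fully qualified three-part names let you join across marts,
    -- provided the current role has SELECT on both objects
    SELECT f.order_id, f.amount, e.employee_name
      FROM sales_db.core.fact_orders AS f
      JOIN hr_db.core.dim_employee   AS e
        ON f.employee_id = e.employee_id;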
Hope that helps.
There is no specific need to separate star schemas at all.
If you're using shared / conformed dimensions across your marts, separation would actually be an anti-pattern.
If your issue is simplifying the segregation of users, schema per mart works well.
All of the approaches you've suggested (DB/mart, DW/schema,...) will work, I'm just not clear on the need.
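If it helps, here is a hedged sketch of the schema-per-mart layout with role-based segregation, using invented names:

    -- one database, one schema per mart, roles scoped to each schema
    CREATE DATABASE edw;
    CREATE SCHEMA edw.hr_mart;
    CREATE SCHEMA edw.marketing_mart;

    CREATE ROLE hr_analyst;
    GRANT USAGE  ON DATABASE edw         TO ROLE hr_analyst;
    GRANT USAGE  ON SCHEMA   edw.hr_mart TO ROLE hr_analyst;
    GRANT SELECT ON ALL TABLES IN SCHEMA edw.hr_mart TO ROLE hr_analyst;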
The goal of having separate data marts is more related to governance, to keep data organized and where it is expected to be found (i.e. sales transactions in the "sales data mart"), and less related to performance issues.
The advantage of having a single database acting as a data warehouse is that all your data for analytics will be stored in one place, making it more accessible and easier to find. In this case, you can use schemas to implement (logically) separate data marts. You can also use schemas within a database to keep development data separate from production data, for each data mart.
Snowflake is different from traditional relational databases; given its technical architecture, it has no issues with joining large tables between different databases/schemas so you can certainly build different data marts in separate databases and join their facts or dimensions with some other Snowflake database/data mart.
In your specific case, if you have a large number of data marts (e.g. 10 or more) and you're not using Snowflake for much more than data warehousing, I think the best path would be to implement each data mart in its own database and use schemas to manage prod/dev data within each database. This will help keep data organized, as opposed to quickly reaching a point where you'll have hundreds of tables (every data mart, and its dev/prod versions) in one database, which won't be a great development or maintenance experience.
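As a hedged sketch of that layout (all names hypothetical), each mart gets its own database with prod and dev schemas inside it:

    -- one database per data mart, prod/dev separated by schema
    CREATE DATABASE hr_mart;
    CREATE SCHEMA hr_mart.prod;
    CREATE SCHEMA hr_mart.dev;

    CREATE DATABASE accounting_mart;
    CREATE SCHEMA accounting_mart.prod;
    CREATE SCHEMA accounting_mart.dev;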
But, from a performance perspective, there's really no noticeable difference.

How to schedule an SSAS cube refresh only for new facts or updated dimensions?

Having built a few "test" data cubes using VS2017, my team is now ready to start working with them in a more production-like manner. As such there are a few basic tasks that we need to implement, but for which we are struggling to find useful resources.
How can we do a monthly refresh of the cube without regenerating all of our dimensions and fact tables?
Does VS2017 recognise/honour Slowly Changing Dimensions if we implement them in our Dimension design?
To have a guess at this:
In our ETL databases (bearing in mind we're using VS2017) we need to:
For the tables that are used in the DataSourceView, which will ultimately become the dimensions in the cube:
Create "current" snapshots of our dimensions based on the raw source databases; i.e. what does the Customer dimension look like now?
Compare this with the slowly changing dimension table as held in the ETL from our last processing run.
Make the necessary row inserts and update the audit fields of any old entries.
For the Fact Tables:
For the period since the last refresh add any additional entries to the tables. This should use the updated Dimensions.
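Something like the following (hedged) T-SQL is roughly what we have in mind for this step; the staging, warehouse, and watermark table names are made up:

    -- append only the rows that arrived since the last refresh,
    -- looking up surrogate keys against the current dimension rows
    INSERT INTO dw.FactSales (DateKey, CustomerKey, Amount)
    SELECT s.DateKey, d.CustomerKey, s.Amount
      FROM staging.Sales AS s
      JOIN dw.DimCustomer AS d
        ON d.CustomerBusinessKey = s.CustomerId
       AND d.IsCurrent = 1
     WHERE s.LoadDate > (SELECT MAX(LastLoadDate) FROM dw.EtlWatermark);

    -- afterwards, advance the watermark
    UPDATE dw.EtlWatermark SET LastLoadDate = GETDATE();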
When we refresh the data cube on the Analysis Server, what will this do?
Presumably the dimension tables are refreshed in their entirety, as they are usually relatively small; but will the fact tables refresh completely or just from the last place they were updated?
Apologies for the basic nature of this question, but we've moved beyond the idealised tutorial stage and are now wallowing in an abyss of jargon and our own ignorance :-(
How can we do a monthly refresh of the cube without regenerating all of our dimensions and fact tables?
You need to implement incremental loading inside your ETL logic. You can choose between two types of incremental loading:
Insert & Update only: You can use Lookup Component (IncInsertUpdate)
Insert & Update & Delete: You'll have to implement a bit more complex logic (IncInsertUpdateDelete)
Does VS2017 recognise/honour Slowly Changing Dimensions if we implement them in our Dimension design?
Yes, there is a Slowly Changing Dimension component that you can use to handle SCDs.

SAP HANA Analytical Views

I have been trying to learn HANA these past few days and have been running into some problems. As I understand it, SAP HANA is used for de-normalization of data (as per some tutorials that I have seen). So I build the analytic views and I have my data denormalized after making the analytical views. What next? How do I harness/use these views to create reports for business analysis? I need to generate several reports based on this de-normalized data (which I intend to ultimately use for a website-based product). Do I need to create different analytical views for different reports?
HANA is not for denormalization of your data. You don't have to create aggregate and denormalized tables to speed up your analytics. In a typical analytics scenario you might build these, but doing so results in duplicate data, double maintenance to keep them up to date, etc.
Instead, you can use your normal, normalized database tables as the master/transactional data foundation and then build analytic views on top of them. How many views have to be created for different reports depends on your actual business needs, because views expose data from many aspects and can therefore be reused. For more complex reports you can of course create calculation views to get the exact data you need.
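A hedged sketch of the idea with made-up tables; a plain SQL view is shown for brevity, whereas an analytic or calculation view would model the same join graphically in HANA Studio:

    -- base tables stay normalized; reporting goes through a view on top
    CREATE VIEW sales_report_v AS
    SELECT o.order_date,
           c.customer_name,
           p.product_name,
           o.quantity,
           o.amount
      FROM orders    AS o
      JOIN customers AS c ON c.customer_id = o.customer_id
      JOIN products  AS p ON p.product_id  = o.product_id;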
HTH

Data Aggregation - Daily SQL Script vs Data Warehouse

Pardon me if this has already been asked (I know very little about data warehousing/BI and have yet to master the keywords).
I have a table that grows by more than 100,000 rows per day, each row having a timestamp and several pieces of information about an item (dimensions, weight, color, etc.). Individual rows are useful for roughly a month; after that period we are only interested in aggregations. I have dedicated software that allows a more detailed visualisation of individual rows, and I mainly use PowerPivot for my reporting needs.
I could come up with an SQL query that would fill a new table daily:
In which I would have a row for each hour/item/batch and I would summarize the information (sum/average/stddev/etc.)
Within a day my script would be up and running and I could use PowerPivot against this new table. All this while staying where I'm comfortable: plain old SQL.
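For what it's worth, the daily script would look something along these lines (shown in T-SQL as an assumption; all table and column names are invented):

    -- summarize yesterday's rows into one row per hour/item/batch
    INSERT INTO item_hourly_summary
           (hour_start, item_id, batch_id, weight_avg, weight_stdev, row_count)
    SELECT DATEADD(hour, DATEDIFF(hour, 0, measured_at), 0) AS hour_start,
           item_id,
           batch_id,
           AVG(weight),
           STDEV(weight),
           COUNT(*)
      FROM item_measurements
     WHERE measured_at >= DATEADD(day, -1, CAST(GETDATE() AS date))
       AND measured_at <  CAST(GETDATE() AS date)
     GROUP BY DATEADD(hour, DATEDIFF(hour, 0, measured_at), 0),
              item_id,
              batch_id;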
From the little information I have gathered reading about data warehousing and BI, what I'm about to do sounds a lot like creating dimensions and facts. My question, therefore: is it worthwhile to investigate further in that direction (BI), or, since my problem is relatively simple, would I do better to stay in a relational database?
N.B. The reports that are produced are usually linked against another database to produce more meaningful information, a task that is very well accomplished by PowerPivot.
Data warehouses are normally implemented in relational databases, so your existing skills will still be usable.
Given that you have expressed an interest in the dimension/fact table approach to data warehousing, the canonical books on this approach are usually considered to be:
The Data Warehouse Toolkit (Kimball, Ross)
The Data Warehouse Lifecycle Toolkit (Kimball, Ross, Thornthwaite, Mundy, Becker)
(The former has more of a technical focus, while the latter approaches the subject from a wider lifecycle management viewpoint.)
Implementing DWHs can be time-consuming, so it may be worth continuing with your existing approach even if you decide to build a DWH.
Good news: it sounds like you already have a data warehouse. "Data warehouse" is a very generic term, with no real formal definition - it pretty much means whatever you want it to.
Commonly accepted characteristics are:
Data warehouses do not run on the operational databases
Data warehouse schemas are optimized for querying, not for "normal form" compliance
Data warehouses are populated by "Extract, Transform, Load" processes (ETL).
It sounds like you're already doing all of that. If there are no business requirements to change, I'd leave it as it is. If your business users are asking to create their own queries, using different levels of aggregation, filtering, or granularity, a star schema may be the way to go.
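If you do go that way, a minimal hedged sketch of what the star schema might look like for a scenario like this (names and types are only illustrative):

    -- dimensions describe the item and the point in time;
    -- the fact table holds one aggregated row per grain
    CREATE TABLE dim_item (
        item_key    INT PRIMARY KEY,
        item_code   VARCHAR(50),
        color       VARCHAR(30)
    );

    CREATE TABLE dim_time (
        time_key    INT PRIMARY KEY,
        date_value  DATE,
        hour_of_day INT
    );

    CREATE TABLE fact_item_measurement (
        time_key    INT NOT NULL REFERENCES dim_time (time_key),
        item_key    INT NOT NULL REFERENCES dim_item (item_key),
        weight_avg  DECIMAL(18, 4),
        weight_sum  DECIMAL(18, 4),
        row_count   INT
    );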
The most effective solutions are those which are simple, adequate to meet existing needs, and stay within available skillsets.
I agree that this approach works well for your situation, and if it provides the reports and information you need then it's worth starting this way. If you need more complex functionality later, you can go for more complex BI.

Is it possible to use Nhibernate with partition of an object over several tables?

We have a system that gathers large quantities of data each month and performs rather advanced calculations that increase the database size even more. The customer requires that data from the last three years be stored for fast access, and that older data (up to ten years) must remain accessible, though that access may be slower and require some extra work. We want to avoid performance issues where the database and its tables grow out of proportion.
After discussing using SQL Enterprise (VERY costly and full of traps, since we haven't got the know-how), and since our system has so many tables that reference each other, we are leaning towards creating some kind of history tables to which we move data on a monthly basis, and rewriting the SELECT queries we have so that, based on parameters, they search either the regular table, the history table, or both, depending on the situation.
Since we are also using NHibernate for mapping, I was wondering if it is possible to create a mapping file that handles this (almost) by itself, using some sort of polymorphism or inheritance, in which each object is stored in a different table based on parameters?
I know this sounds complicated and strange, and that there are other methods of doing this, but in this question I would rather have people answer the question asked and not offer other suggestions to use instead.
As far as I know NHibernate can't do that (each class can be mapped to only one table/view), but you can use SQL queries or stored procedures (depending on the version of NHibernate you are using) to populate mapped objects.
In your case you can have a combined view, created by making unions of the different tables. Then you can use a SQL query to populate your entity.
There's also another solution: create a summary object for your queries that uses that view, so that you can use both HQL and Criteria to query this object.
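A hedged sketch of that combined view, with invented table names; the current table and its history counterpart are assumed to share the same columns:

    -- one view over the current table and its history table
    CREATE VIEW measurement_all AS
    SELECT id, item_id, measured_at, measured_value
      FROM measurement
    UNION ALL
    SELECT id, item_id, measured_at, measured_value
      FROM measurement_history;

A read-only summary class can then be mapped to this view (or populated via a named SQL query) and queried with HQL or Criteria as usual.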
Short answer "no". I would not create views as you mention a lot of joining.
Personally I would create summary tables and map to these directly using a stateless session or a very least mutable=false on the class definition. Think of these summary tables as denormalised data for report only. The only drawback is if historic data changes on a regular basis then the summary tables also needs changing. If historical data never changes then this should be simple to achieve.
I would also most probably store these summary tables on another catalog rather than adding to the size of the current system.
Its not a quick win this one I am afraid.