Please could someone tell me why some people do this: after creating a data warehouse, we create reports (reporting) and OLAP analysis.
My question is: why do we do OLAP analysis and also create reports? What is the benefit of doing both? I think reporting is sufficient to help the client analyse the data, but still some clients ask for both.
I use Analysis Services models as the source for all reporting. In your case you may have transactional reporting (large amounts of row-level data), which doesn't lend itself to that technology; Analysis Services is better suited to data which is likely to be aggregated.
Tabular models are a great way to present data for users to interact with, as they can be designed in a way that makes them well suited to self-service data analytics.
I've also implemented the hybrid approach you mentioned. This can be useful if businesses have varying report requirements. For example, dashboarding could be done using Power BI connected to the tabular model, whereas transactional reporting such as large emailed spreadsheets could be run from the SQL Server (perhaps using SSRS or Power BI paginated reports).
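To make that split concrete, here is a rough T-SQL sketch (dbo.FactSales and dbo.DimDate are hypothetical table names, not from the question): an aggregated view that a tabular model could import for dashboarding, and a row-level query that an SSRS or Power BI paginated report could run directly against SQL Server.

```sql
-- Hypothetical warehouse tables (dbo.FactSales, dbo.DimDate) are assumed here.

-- 1) Aggregated view a tabular model could import for dashboarding in Power BI.
CREATE VIEW dbo.vSalesByMonth
AS
SELECT
    d.CalendarYear,
    d.CalendarMonth,
    f.CustomerKey,
    SUM(f.SalesAmount) AS SalesAmount,
    COUNT_BIG(*)       AS OrderLineCount
FROM dbo.FactSales AS f
JOIN dbo.DimDate   AS d ON d.DateKey = f.OrderDateKey
GROUP BY d.CalendarYear, d.CalendarMonth, f.CustomerKey;
GO

-- 2) Row-level query an SSRS / Power BI paginated report could run directly
--    against SQL Server for the "large emailed spreadsheet" case.
SELECT f.SalesOrderNumber, f.OrderDateKey, f.CustomerKey,
       f.ProductKey, f.OrderQuantity, f.SalesAmount
FROM dbo.FactSales AS f
WHERE f.OrderDateKey BETWEEN 20240101 AND 20240131;  -- one month of detail
```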
My company wants to speed up the process of delivering reports. Internally, we have a team of 12 people working on building reports. The company is large, with over 10,000 employees. We're asked to work on ad hoc reports quite frequently, but it takes us on average 1-2 weeks to deliver these reports. Senior execs have said that the time to deliver is too slow. An external consulting firm came in to do some discovery work, and they have advised that business users should have access to the Azure Data Warehouse so that they can directly build models in Azure Analysis Services and Power BI.
The design that they have suggested is as follows:
Load data from SAP, into the Azure Data Warehouse directly.
Build our data models in the Azure DW - this means all the transformation work is done directly in Azure DW (Staging, Cleansing, Star Schema build).
Build the models in Azure Analysis Services.
Consume in Power BI.
Does this seem like a good strategy? I am new to Azure Data Warehouse and our technical lead is on paternity leave, so we are unable to ask for his help.
I asked the external consultant what the impact would be of applying all the transformation workloads directly to Azure DW, and he said that 'it's MPP, so processing is super-fast'.
Can anyone help?
Azure is certainly a great platform for modern data warehousing and analytics, but whether Azure DW is the right choice requires more study. Generally speaking, you can consider two options:
Volume is not huge (< 10 TB):
SAP -> SSIS/ADF -> Azure SQL DB -> Azure Analysis Services (as semantic layer) tabular model with DAX -> Power BI
Volume is huge (> 10 TB):
SAP -> SSIS/ADF -> Azure SQL DW -> Azure Analysis Services semantic layer -> Power BI
Of course, volume is just one of many factors to consider when deciding on an architecture, but numerous real-world projects show it is an important one, and MPP may not actually be necessary. The actual architecture and sizing require much more research; the points above are very general, intended as something to start with and explore further.
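If you do end up on the Azure SQL DW path, the transformation work the consultant described (staging, cleansing, star schema build) is typically expressed as ELT in T-SQL. A minimal sketch, assuming hypothetical staging and dimension tables (stg.SalesOrder, dbo.DimCustomer):

```sql
-- Illustrative only: stg.SalesOrder and dbo.DimCustomer are assumed names,
-- not tables from the question.

-- A typical ELT step in Azure SQL DW: build a star-schema fact table from
-- staged SAP data with CTAS, letting the MPP engine do the work in place.
CREATE TABLE dbo.FactSales
WITH
(
    DISTRIBUTION = HASH(CustomerKey),   -- spread the fact table across distributions
    CLUSTERED COLUMNSTORE INDEX         -- good default for large analytic tables
)
AS
SELECT
    c.CustomerKey,
    CONVERT(int, CONVERT(char(8), s.OrderDate, 112)) AS OrderDateKey,
    s.Quantity,
    s.NetAmount
FROM stg.SalesOrder  AS s
JOIN dbo.DimCustomer AS c
    ON c.CustomerSourceId = s.CustomerNumber;  -- e.g. SAP customer number mapped during staging
```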
If you are looking for more technical detail on bringing SAP data to Azure, you can review our blog posts here: http://www.aecorsoft.com/blog/2018/2/18/extract-sap-data-to-azure-data-lake-for-scale-out-analytics-in-the-cloud and here: http://www.aecorsoft.com/blog/2018/4/26/use-azure-data-factory-to-bring-sap-data-to-azure.
I have several OLTP databases with APIs talking to them. I also have ETL jobs pushing data to an OLAP database every few hours.
I've been tasked with building a custom dashboard showing high-level data from the OLAP database. I want to build several APIs pointing to the OLAP database. Should I:
Add to my existing APIs and call the OLAP database, using a CQRS-type pattern so reads come from OLAP while writes go to OLTP. My concern here is that there could be a mismatch between the data that is read and the data that was written; how mismatched it is depends on how often the ETL jobs run (hours in my case).
Add to my existing APIs and call the OLAP database, then ask the client to choose whether they want OLAP or OLTP data where the APIs overlap. My concern here is that the client should not need to know about the implementation detail of where the data is coming from.
Write new APIs that only point to the OLAP database. This is a lot of extra work.
Don't use #1: when management talk about analytical reports, the data mismatch between ETL runs doesn't really matter; obviously you will generate a CEO report after the day's ETL has finished.
Don't use #2: this way you'll load the transactional system with analytic overhead and dissolve the isolation between the purposes of the two systems (not good for operation and maintenance).
Use #3, as it's the best way to fetch processed results. Modern tools like Excel, Power Query and Power BI let you create rich dashboards quickly instead of going into the tables and writing APIs yourself.
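For what #3 looks like in practice, here is a minimal sketch, assuming a conventional star schema in the OLAP database (dbo.FactOrders and dbo.DimDate are hypothetical names):

```sql
-- A minimal sketch, assuming a conventional star schema in the OLAP database
-- (dbo.FactOrders and dbo.DimDate are hypothetical names).

-- The dashboard (or the new read-only API) simply asks the OLAP database for
-- pre-aggregated results instead of re-aggregating OLTP rows itself.
SELECT
    d.CalendarYear,
    d.CalendarMonth,
    SUM(f.OrderAmount) AS TotalOrderAmount,
    COUNT_BIG(*)       AS OrderCount
FROM dbo.FactOrders AS f
JOIN dbo.DimDate    AS d ON d.DateKey = f.OrderDateKey
WHERE d.CalendarYear = YEAR(GETDATE())
GROUP BY d.CalendarYear, d.CalendarMonth
ORDER BY d.CalendarYear, d.CalendarMonth;
```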
We are looking to build a solution on GCP for campaign/ad analytics (ingesting DoubleClick and other ad-server data into a DW). Data is ingested as batches into a star schema, but updates will trickle in for up to a week, and we need trend analysis for multiple clients (advertisers) as well as reporting. I can't decide between Google Bigtable, which supports updates and time-series analysis, and BigQuery, which is ideal for star schemas and ad hoc analysis.
Any suggestions? Performance and flexibility are important.
You may find the following solution guide helpful for learning how to build an analysis pipeline using BigQuery and other GCP products and tools:
https://cloud.google.com/solutions/marketing-data-warehouse-on-gcp
Bigtable meanwhile is a good fit for building real-time bidding and other pieces of core ad serving infrastructure. See e.g.:
https://cloud.google.com/customers/mainad/
https://cloud.google.com/solutions/infrastructure-options-for-building-advertising-platforms
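For the updates that trickle in for up to a week, BigQuery's DML is usually sufficient, so that requirement alone shouldn't push you to Bigtable. A sketch, assuming hypothetical tables ads_dw.fact_campaign_daily and ads_dw.stg_doubleclick_daily:

```sql
-- Sketch only: ads_dw.fact_campaign_daily and ads_dw.stg_doubleclick_daily
-- are hypothetical table names.

-- Late-arriving rows (up to a week old) can be applied to a date-partitioned
-- BigQuery table with a DML MERGE:
MERGE `ads_dw.fact_campaign_daily` AS t
USING `ads_dw.stg_doubleclick_daily` AS s
ON  t.report_date   = s.report_date
AND t.advertiser_id = s.advertiser_id
AND t.campaign_id   = s.campaign_id
WHEN MATCHED THEN
  UPDATE SET t.impressions = s.impressions,
             t.clicks      = s.clicks,
             t.spend       = s.spend
WHEN NOT MATCHED THEN
  INSERT (report_date, advertiser_id, campaign_id, impressions, clicks, spend)
  VALUES (s.report_date, s.advertiser_id, s.campaign_id,
          s.impressions, s.clicks, s.spend);

-- Trend analysis per advertiser is then a plain aggregate query:
SELECT
    advertiser_id,
    DATE_TRUNC(report_date, WEEK) AS week_start,
    SAFE_DIVIDE(SUM(clicks), SUM(impressions)) AS ctr
FROM `ads_dw.fact_campaign_daily`
GROUP BY advertiser_id, week_start
ORDER BY advertiser_id, week_start;
```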
As far as I know, OLAP is used in Power Pivot to speed up interacting with data.
But big data databases like Google BigQuery and Amazon Redshift have appeared in the last few years. Do SQL-targeted BI solutions like Looker and Chart.io use OLAP, or do they rely on the speed of the databases?
Looker relies on the speed of the database but does model the data to help with speed. Mode and Periscope are similar to this. Not sure about Chartio.
OLAP was used to organize data to improve query speed. While it is used by many BI products like Power Pivot and Pentaho, several companies have built their own ways of organizing data for fast queries, sometimes storing it in their own data structures. Many cloud BI companies like Birst, Domo and GoodData do this.
Looker created a modeling language called LookML to model data stored in a data store. As databases are now faster than they were when OLAP was created, Looker took the approach of connecting directly to the data store (Redshift, BigQuery, Snowflake, MySQL, etc) to query the data. The LookML model allows the user to interface with the data and then run the query to get results in a table or visualization.
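For illustration only (this is not actual LookML output, and the table and column names are made up), the SQL such a tool sends to the warehouse at query time is typically a straightforward aggregate over the modelled tables:

```sql
-- Illustrative only: roughly the shape of SQL a BI tool generates against the
-- warehouse at query time; table/column names here are hypothetical.
SELECT
    orders.status                          AS orders_status,
    DATE_TRUNC('month', orders.created_at) AS orders_created_month,
    COUNT(*)                               AS orders_count,
    SUM(orders.total_amount)               AS orders_total_amount
FROM public.orders AS orders
GROUP BY 1, 2
ORDER BY 3 DESC;
```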
That depends. I have some experience with BI solutions (for example, we worked with Tableau), and such a tool can operate in two main modes: it can execute the query against your server, or it can collect the relevant data and store it on the user's machine (or on the server where the app is installed). When working with large volumes, we used to make Tableau query the SQL Server itself, because our SQL Server machine was very strong compared to the other machines we had.
In any case, even if you store the data locally and want to "refresh" it, the refresh has to retrieve the data from the database, which can also be an expensive operation (depending on how your data is built and organized).
You should also note that you are comparing two different families of products: Google BigQuery and Amazon Redshift are database engines used to store the data and query it, whereas most BI and reporting solutions are concerned with querying the data and visualizing it, and therefore (generally speaking) are less focused on having smart internal databases (at least in my experience).
Is my understanding correct that we can build SSAS cubes sourcing from the transaction systems? I mean not the live systems, but a copy of the live data.
I'm trying to see if there is any scope to address a few reporting needs without building a traditional data warehouse and then cubes on top of it, and instead build cubes for monthly aggregated financial reporting sourced from a backup copy of the transaction systems.
Alternatively, if you have a better way to proceed, please suggest it.
Regards,
KK
You can create a set of views on top of your transactional system tables and then build your SSAS cubes on top of those views. This would be less effort than creating a fully fledged data warehouse.
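A minimal sketch of what those views might look like, assuming hypothetical OLTP tables sales.Customers, sales.Orders and sales.OrderLines in the copied database:

```sql
-- A minimal sketch, assuming hypothetical OLTP tables sales.Customers,
-- sales.Orders and sales.OrderLines in the copied database.

-- Dimension-shaped view for the cube's data source
CREATE VIEW dbo.vDimCustomer
AS
SELECT c.CustomerId AS CustomerKey,
       c.CustomerName,
       c.Country
FROM sales.Customers AS c;
GO

-- Fact-shaped view at the grain the cube needs (order line)
CREATE VIEW dbo.vFactOrderLine
AS
SELECT ol.OrderLineId AS OrderLineKey,
       o.CustomerId   AS CustomerKey,
       CONVERT(int, CONVERT(char(8), o.OrderDate, 112)) AS OrderDateKey,
       ol.Quantity,
       ol.LineAmount
FROM sales.OrderLines AS ol
JOIN sales.Orders     AS o ON o.OrderId = ol.OrderId;
GO
```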
I am a data warehouse developer (and therefore believe in cubes), but not every reporting solution warrants the cost of building a cube. If your short to medium term reporting requirements are fixed and you don't have users requiring data to be sliced differently each week, then a series of fixed reports may suffice.
You can create a series of SQL Server Reporting Services reports (or extract to Excel) either directly against your copied transactional data, or against a series of summarised tables that are created periodically. If you decide to utilise a series of pre-formatted reporting tables, try to create tables that cover multiple similar reports (rather than 1 monthly report table = 1 report) for ease of ongoing maintenance.
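As a sketch of such a summarised table (sales.Orders and sales.OrderLines are hypothetical source tables), rebuilt periodically by whatever scheduler you already use:

```sql
-- Sketch only: sales.Orders and sales.OrderLines are hypothetical source
-- tables in the copied transactional database; the summary table would be
-- rebuilt periodically (e.g. nightly) by a scheduled job.
IF OBJECT_ID('dbo.SalesMonthlySummary', 'U') IS NOT NULL
    DROP TABLE dbo.SalesMonthlySummary;

-- One month/customer/product grain table can feed several similar monthly
-- reports, rather than one reporting table per report.
SELECT
    YEAR(o.OrderDate)  AS OrderYear,
    MONTH(o.OrderDate) AS OrderMonth,
    o.CustomerId,
    ol.ProductId,
    SUM(ol.LineAmount) AS SalesAmount,
    SUM(ol.Quantity)   AS Quantity
INTO dbo.SalesMonthlySummary
FROM sales.Orders     AS o
JOIN sales.OrderLines AS ol ON ol.OrderId = o.OrderId
GROUP BY YEAR(o.OrderDate), MONTH(o.OrderDate), o.CustomerId, ol.ProductId;
```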
There are many other important aspects to this that you may need to consider first, like how busy the transaction system is, the size of the data, and concurrency and availability issues.
It is absolutely fine to have a copy of your live data and then build a report on top of it. Bear in mind that the data you see in the report will not be the latest, and there will be a latency factor depending on the frequency of your data pull.