I finally decided to ask this (after a lot of google searching):
So we use Power BI for data visualization and thus in it are some calculated dashboards / data outputs which are used to monitor data quality etc. I want to be able to historical log these results so that over-time we can monitor progress i.e. was data quality improved. This is the end of the initial problem.
One approach to this problem was to connect to PowerBI from the MS-SQL side - hoping we can then set timed triggers to do the log by READING THE POWER-BI DASHBOARDS: So how do I query that (I have already developed a method to determine the connection using the Power-BI port as described here:
EXPORTING DATA FROM POWER BI DESKTOP TO MS-SQL
This is a screenshot from one of my MS-SQL connections through "Analysis Services":
I am assuming the objects named like "LocalDateTable_" are the actual BI analysis I want to query. "New Query" is an MDX type of Query. Should I go this route for my problem (logging powerbi analyses)?
At first this sounds crazy but on reflection I guess it was only a matter of time, and a sign of the maturity of Power BI solutions ...
I would use the SQL Server Profiler to capture the queries generated while you use your dashboard & report.
https://insightsquest.com/2017/05/07/profiler-trace-for-power-bi-desktop/
Then I would build an SSIS package to run the MDX queries and deliver the datasets to SQL Server, with extra columns e.g. StartTime.
Related
We have many models on our SSAS instance and we made it easy for users to start the model processing (through a bespoke Excel add-in).
Now, a side effect is that there are apparently several Tabular models being processed at the same time.
Is there a way to find which tabular models are currently being processed?
You can use the tool SSAS Activity Monitor. The nice part about this tool is you can just launch it and in seconds be looking at all activity in SSAS.
You connect to your SSAS instance then use the Current Queries Panel. There will be a row for every query. When you process, the COMMAND_TEXT will start with <Batch...
and further in the COMMAND_TEXT you'll find the DatabaseID and process Type.
You could also query the DMVs directly. This query from SSMS found it as well, but it's harder to see there:
select * from $SYSTEM.DISCOVER_SESSIONS;
But you could put this SQL into Power Query and have it parse and filter down nicely.
I am assisting my team troubleshoot an issue with a Power BI report we are developing. We have a rather complex data model in the source SQL database, so we have created 5-6 views to better manage the data. We have a requirement to use DirectQuery, as one key requirement for the report is that the most up-to-date data in the database is visible, rather than having a delay in loading/caching the data. We also have the single data source, just the one database.
When we run the report, we see a spike of 200-500 connections to the database from the specific user for the report data source, and those connections don't close. This is clearly an issue and unsustainable for any product. We have a ticket open with Microsoft premium support to address the connections not closing, but in the meantime, I'm wondering if we're doing something wrong inside the report?
When I view the queries in the query editor, we basically have one query for each view, and it's a simple:
let
Source = Sql.Database(Server, Database)
query_view_name = Source{[Schema ......]}[Data]
in
query_view_name
(I don't have the raw code in front of me, but that's the gist of it.)
It seems to me, based on analytics in the database, that "Sql.Database" is opening a new connection every time this view is called. And with 5-6 views, that's 5-6 connections at a minimum; then each time a filter is changed, it's more connections, and it's compounds from there until the database connection pool is maxed out.
Is there a way to populate all the tables using a single connection to the database? Why would Power BI be using so many connections? Can we populate multiple tables in the advanced query editor? Using DirectQuery, are there any suggestions for what we can look at/troubleshoot/change in the report?
Thanks!
Power BI establishes multiple connections to the database to load multiple tables in parallel. If you don't want this, you can turn it off from Options->Current file->Data Load->Enable parallel loading of tables:
Keep in mind, that turning this option off most likely will increase the model loading time.
You may want to take a look at Maximum connections per data source option in Options->Current file->Direct query and the whole section Query reduction beneat it. Turning on Slicer selection and Filter selection on this page is highly recommended for cases like yours, but you need to train your users that they need to click on apply to see the results.
Ok.
We have a rather complex data model in the source SQL database, so we have created 5-6 views to better manage the data.
That's fine.
We have a requirement to use DirectQuery,
But now you're going to have a bad time. DirectQuery + complex views is a recipe for poor performance. Queries against your views will add joins, potentially across the whole model for filter context, as well as Measure and Calculated Column expressions. And these queries will change dynamically, based on the user's interaction with the report. So it's very difficult to see and test all the possible queries.
Basic guidance is to use import mode against views, and only use DirectQuery against properly-indexed tables. To address data freshness, you can replace the views with tables you load and keep up-to-date from your application, or perhaps use an Indexed View, etc.
I am using PBI Desktop to create PBIX files which I later upload to Azure Power BI Embedded (PaaS solution). The PBIX gets data from Azure SQL server in Direct Query mode. I want to increase the efficiency of queries that Power BI Embedded sends to SQL for getting my data.
My pbix contains relationships between many tables and RLS (Row Level Security) configured and is taking a lot of time to load. Please advice if the following options will help me increase the efficiency of queries, thus reducing the time taken by the pbix to load: -
Using Advanced Options in the Get Data dialog box : Inserting a SQL statement here will get only specific data instead of the entire table. This will reduce the data I see in PBI Desktop, but will it really increase the efficiency of queries sent to SQL for the creation of charts? Eg: Say PBIX needs to create a join between two tables. If I use the advanced options, will the Join be done on reduced data?
Using Filters to filter out unwanted rows of the table : Again like above option, this will reduce the data I see in PBI Desktop, but will it really increase the efficiency of queries sent to SQL for the creation of charts? Eg: If I use filters, will the Join be done on reduced data?
[EDIT - I have added one more option below]
Are the queries for charts on different pages of a PBIX file sent to SQL only when the page is loaded? : If this is true then I can separate my charts into different pages to reduce the number of queries sent at once to SQL.
Is there anyone who has an idea how i can log user access in SSAS and Power Pivot?
The problem we face that we need to keep track of who is accessing what. it is not enough to save the question when the result set changes over time (SELECT * ... does not result in the same today as yesterday), so the whole or parts result set need to be logged. I can imagine that it is possible to solve for reports created in SSAS, as it is being in a SQL server. Power Pivot and self-service BI, I'm more thoughtful about how it should be done.
If your Power Pivots are stored in a BI-enabled SharePoint site then those Power Pivot workbooks are effectively on a regular SSAS tabular server also. So any logging that you can put in place for an SSAS tabular database you could probably adapt for PowerPivot workbooks. But in the SharePoint SQL Server instance you will also find a database called DefaultPowerPivotServiceApplication[GUID]. In this database you will have a table called [Usage].[Requests] that records who access what file and when.
I have been working with Pentaho for the last few days. I have been able to setup the Pentaho Report Designer to generate a sample report by follow their documentation. Then I follow this article http://www.robertomarchetto.com/www/how_to_use_pentaho_report_designer_tutorial and managed to export the report to Pentaho BI server.
All I don't understand is Pentaho workflow. What should be the process I should follow which means what's the purpose of exporting the export to Pentaho BI server? Why there is a Data Integration tool? Why there is a BI sever when I can export the report from the Designer tool?
Requirement
All I want to do is retrieve the data from the MYSQL DB. Put them into a data-mart. Then from the data-mart generate a report.(According to what I have read, creating a data mart is the efficient way).
How can I get it done?
Pentaho Data Integration can be used to make this report generation automated.
In report designer you will be passing a parameter or set of parameters to generate a single report output.
With Data integration you can generate the reports for different set of parameters. for eg: if reports are generated on daily basis, we can make it automated for the whole month, so that there is no need of generating reports daily and manually.
And using the Pentaho Business Intelligence server we can make all these operations scheduled.
To generate Data/Table(Fact tables/dimension table) in MYSQL DB From difference source like files/different DB - Data Integration tool comes in to picture .
To create Schema on top of Fact tables - Mondrian tool
To handle user/roles on top of created cubes -Meta data editor
To create simple reports on top of small tables - Report Designer
For sequential Execution (at a go) usage of DI jobs/transformation , Reports, Java script - Design Studio
thanks to user surya.thanuri # forums.pentaho.com
The Data Integration tool is mostly for ETL, it's a separate tool and you can ignore it unless you are doing complex analysis of data from multiple dissimilar data sources. You don't need to 'export' reports to the pentaho server, you can write them directly to a directory then refresh the repository from inside the Pentaho web application. Exporting them is just one workflow technique.
You're going to find that there are about a dozen ways to do any one thing with Pentaho. For instance I use the CDA datasources with my reports vice placing the sql code inside my report. Alternatively you can link up to a Data Integration server to execute the Data Integration scripts to view a result set.
Just to answer your datamart question. In general a datamart should probably be supported by either the Data Integration tool (depending on your situation I don't exactly recommend this) or database functions/replication streams (recommended).
Just to hazard a guess, it sounds like someone tossed you a project saying: We need a BI system, here's the database where the data is stored, here are the reports we're already getting. X looked at Pentaho and liked it. You should use that.
First thing you need to do is understand the shape of the data, volume, tables, interrelations. Figure out what the real questions they want to answer are. Determine whether they need real time reporting, etc..etc. Just getting the datamart together itself, if you even need one, can take quite awhile. I think you may have jumped the gun on Pentaho itself.
thanks to user flamierd # forums.pentaho.com