I have a stored procedure as a source connection in Tableau 8.1. It takes a long time to fetch and display ( about 1 min) 40000 records (there is no bar chart, pie charts etc).
What the stored proc does is it selects 40000 records with some 6-7 table joins.
However the same stored procedures executes and displays the records in sql server management studio within 3 seconds.
After using SQL Server Profiler, it shows that some 45000 inserts into a tableau temp table occurs which takes a long time. Also, it shows in the log file that it takes a high percentage of time for the inserts while the execution of stored proc itself takes about 4-5 seconds only.Is this the problem ?Any suggestion how to over come this issue?
Regards
Gautam S
A few of places to start:
First check out the Tableau log file in your Tableau repository directory after trying to access your data. There will be a lot of information in there, but you should be able to see the actual SQL that Tableau sends to your database -- and that may give you some clues about what it is doing that is taking so long. If nothing else, you can cut and paste the SQL into your database tools and try to isolate the problem SQL without Tableau in the mix
If the log file doesn't give you an idea about how to restructure your system to avoid the long query, then send it along with info about your schema to Tableau support. They may be able to help.
Simplify whatever you can to reduce the problem to its core, get rid of everything in your visualization but a total, and then slowly build it back up to see what causes the behavior. For example, make a test version and remove one table at a time from your query to see what causes the problem.
Avoid using quick filters if you see performance problems (or minimize them) Nice feature, but comes with a performance cost
Try the Tableau performance monitoring (record and analysis) features
Work with a smaller data set during testing so you can more quickly experiment with different approaches
Try replacing your stored procedure with a view. That's usually better if at all possible.
Add indices to speed the joins
If there is no way around the long operation and if updates are infrequent, make a Tableau extract so that you only pay that cost periodically
If none of these things help, cut the problem down to its simplest version and post a schema and the problem SQL Otherwise, people can only give you generic advice
Related
I would like to increated the performance of our pipelines.
The pipelines currently run from an integration runtime.
I am running a single copy activity on tables held on our Source which is a SQL Database. Tables contain just under a million rows, with about 15 columns.
Currently the time it takes to copy a table from Source to Sink(ADLS) is approximately 20mins.
Is there a way to increase the DIU to increase performance?
My current copy settings are as follows:
I'm thinking that if I made some changes to Settings, see below, I would improve performance, but I have never played around to settings before, any suggestions most welcomed.
The activity details for a pipeline run is as follows:
My link service is an Azure Synapse Link service, see below:
From the output window, we can see that almost all the wait time was "Time to first byte", which means your SQL server is slow to reply. It takes ~22 minutes for less than 90K rows. So changes on the ADF side will not help.
If your query is a simple "select * from table", then maybe your SQL server is low on resources. You can check that in your database portal in Azure. Try to add more resources and see if copy times improve.
If this is a query from a view or other complicated query, maybe it needs some improvement (indexes, improve code). You can test that by writing the query result to a table in your SQL database, use that table as the data factory source, and see if this improves copy time.
Quick check , is the Azure SQL and storage account in the same region ? Also I see that your copy activity is set as parraleism as 1 , you can play with number and see if that helps .
How to setyp parallelism please read here : https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#parallel-copy
Please see the snaphot below
As BigQuery is a shared resource, it is possible that one gets different values running the same code on BigQuery. OK one option that I always use is to turn off caching in Query Settings, Cache preference. This way queries will not be cached. The problem with this setting is that if you refresh the browser or leave it idle, that Cache Preference box will be ticked again.
Anyhow I had a discussion with some developers that are optimizing the code. In a nutshell, they take slow running code, run it 5 times and get an average, then following optimization then run the code again 5 times to get an average value for optimized SQL. Details are not clear to me. However, my preference would be (all in BQ console)
create a user session
turn off sql caching
On BQ console paste the slow running code;
On the same session paste the optimized code
Run the codes (separated by ";")
This will ensure that any systematics like BQ busy/overloaded, slow connection etc will affect "BOTH" SQL piece equally and the systematics will be cancelled out. In my option one only need to run it once as caching is turned off as well. Running 5 times to get an average looks excessive and superfluous?
Appreciate any suggestions/feedback
Thanks
Measuring the time is one way, the other way to see if the query has been optimized is the understanding of the query plan and how slots are used effectively.
I've been with BigQuery more than 6 years, and what you describe was never used by me. In BigQuery actually what matters is reducing the costs, and that can be done iteratively rewriting the query, and using partitioning/clustering/materialized views, caching/temporary tables.
I am fairly new to SSRS reports so I am looking for guidance. I have SSRS reports that have 3 visible parameters: Manager, Director, and VP. The report will display data based on the parameters selected. Initially, the report was taking a very long time to load and my research led me to creating a snapshot of the report.
The initial load of the report is really quick (~5 secs) but the parameters are set to "Select All" in all sections. When the report is later filtered to say, only 1 VP, the load time can vary anywhere between 20 to 90 seconds. Because this report will be used by all aspects of management within the organization, load time is critical.
Is it possible to load the filtered data quicker? Is there anything I can do?
Any help will be much appreciated.
Thank you!
This is a pretty broad efficiency issue. One of the big questions is whether or not the query takes a long time to run in the database or just in SSRS. Ideally you would start with optimizing the query and indexing, but that's not always enough. So the work has to be done somewhere, all you can do is shift the work to be done before the report is run. Here are a couple options:
Caching
Turn on caching for the report.
Schedule a subscription to run with each possible value for the parameter. This will cause the report to still load quickly once an individual is specified.
Intermediate Table
Schedule a SQL stored procedure to aggregate and index the data in a new table in your database.
Point the report to run from this data for quick reads.
Each option has it's pros and cons because you have to balance where the data preparation work is done. Sometimes you have to try a few options to see what works best for your situation.
I have a new idea and question about that I would like to ask you.
We have a CRM application on-premise / in house. We use that application kind of 24X7. We also do billing and payroll on the same CRM database which is OLTP and also same thing with SSRS reports.
It looks like whenever we do operation in front end which does inserts and updates to couple of entities at the same time, our application gets frozen until that process finishes. e.g. extracting payroll for 500 employees for their activities during last 2 weeks. Basically it summarize total working hours pulls that numbers from database and writes/updates that record where it says extract has been accomplished. so for 500 employees we are looking at around 40K-50K rows for Insert/Select/Update statements together.
Nobody can do anything while this process runs! We are considering the following options to take care of this issue.
Running this process in off-hours
OR make a copy of DB of Dyna. CRM and do this operations(extracting thousands of records and running multiple reports) on copy.
My questions are:
how to create first of all copy and where to create it (best practices)?
How to make it synchronize in real-time.
if we do select statement operation in copy DB than it's OK, but if we do any insert/update on copy how to reflect that on actual live db? , in short how to make sure both original and copy DB are synchronize to each other in real time.
I know I asked too many questions, but being SQL person, stepping into CRM team and providing suggestion, you know what I am trying to say.
Thanks folks for your any suggestion in advance.
Well to answer your question in regards to the live "copy" of a database a good solution is an alwayson availability group.
https://blogs.technet.microsoft.com/canitpro/2013/08/19/step-by-step-creating-a-sql-server-2012-alwayson-availability-group/
Though I dont think that is what you are going to want in this situation. Alwayson availability groups are typically for database instances that require very low failure time frames. For example: If the primary DB server goes down in the cluster it fails over to a secondary in a second or two at the most and the end users only notice a slight hiccup for a second.
What I think you would find better is to look at those insert statements that are hitting your database server and seeing why they are preventing you from pulling data. If they are truly locking the table maybe changing a large amount of your reads to "nolock" reads might help remedy your situation.
It would also be helpful to know what kind of resources you have allocated and also if you have proper indexing on the core tables for your DB. If you dont have proper indexing then a lot of the queries can take longer then normal causing the locking your seeing.
Finally I would recommend table partitioning if the tables you are pulling against are to large. This can help with a lot of disk speed issues potentially and also help optimize your querys if you partition by time segment (i.e. make a new partition every X months so when a query pulls from one time segment they only pull from that one data file).
https://msdn.microsoft.com/en-us/library/ms190787.aspx
I would say you need to focus on efficiency more then a "copy database" as your volumes arent very high to be needing anything like that from the sounds of it. I currently have a sql server transaction database running with 10 million+ inserts on it a day and I still have live reports hit against it. You just need the resources and proper indexing to accommodate.
I'm running the following setup:
Physical Server
Windows 2003 Standard Edition R2 SP2
IIS 6
ColdFusion 8
JDBC connection to iSeries AS400 using JT400 driver
I am running a simple SQL query against a file in the database:
SELECT
column1,
column2,
column3,
....
FROM LIB/MYFILE
No conditions.
The file has 81 columns - aplhanumeric and numeric - and about 16,000 records.
When I run the query in the emulator using the STRSQL command, the query comes back immediately.
When I run the query on my Web Server, it takes about 30 seconds.
Why is this happening, and is there any way to reduce this time?
While I cannot address whatever overhead might be involved in your web server, I can say there are several other factors to consider:
This may likely have to do primarily in the differences between the way the two system interfaces work.
Your interactive STRSQL session will start displaying results as quickly as it receives the first few pages of data. You are able to page down through that initial data, but generally at some point you will see a status message at the bottom of the screen indicating that it is now getting more data.
I assume your web server is waiting until it receives the entire result set. It wants to get all the data as it is building the HTML page, before it sends the page. Thus you will naturally wait longer.
If this is not how your web server application works, then it is likely to be a JT400 JDBC Properties issue.
If you have overridden any default settings, make sure that those are appropriate.
In some situations the OPTIMIZATION_GOAL settings might be a factor. But if you are reading the table (aka physical file or PF) directly, in its physical sequence, without any index or key, then that might not apply here.
Your interactive STRSQL session will default to a setting of *FIRSTIO, meaning that the query is optimized for returning the first pages of data quickly, which corresponds to the way it works.
Your JDBC connection will default to a "query optimize goal" of "0", which will translate to an OPTIMIZATION_GOAL setting of *ALLIO, unless you are using extended dynamic packages. *ALLIO means the optimizer will try to minimize the time needed to return the entire result set, not just the first pages.
Or, perhaps first try simply adding FOR READ ONLY onto the end of your SELECT statement.
Update: a more advanced solution
You may be able to bypass the delay caused by waiting for the entire result set as part of constructing the web page to be sent.
Send a web page out to the browser without any records, or limited records, but use AJAX code to load the remainder of the data behind the scenes.
Use large block fetches whenever feasible, to grab plenty of rows in one clip.
One thing you need to remember, the i saves the access paths it creates in the job in case they are needed again. Which means if you log out and log back in then run your query, it should take longer to run, then the second time you run the query it'll be faster. When running queries in a web application, you may or may not be reusing a job meaning the access paths have to be rebuilt.
If speed is important. I would:
Look into optimizing the query. I know there are better sources, but I can't find them right now.
Create a stored procedure. A stored procedure saves the access paths created.
With only 16000 rows and no WHERE or ORDER BY this thing should scream. Break the problem down to help diagnose where the bottleneck is. Go back to the IBM i, run your query in the SQL command line and then use the B, BOT or BOTTOM command to tell the database to show the last row. THAT will force the database to cough up the entire 16k result set, and give you a better idea of the raw performance on the IBM side. If that's poor, have the IBM administrators run Navigator and monitor the performance for you. It might be something unexpected, like the 'table' is really a view and the columns you are selecting might be user defined functions.
If the performance on the IBM side is OK, then look to what Cold Fusion is doing with the result set. Not being a CF programmer, I'm no help there. But generally, when I am tasked with solving multi-platform performance issues, the client side tends to consume the entire result set and then use program logic to choose what rows to display/work with. The server is MUCH faster than the client, and given the right hints, the database optimiser can make some very good decisions about how to get at those rows.