PostgreSQL: EXPLAIN ANALYZE a function

I am trying to understand query optimization in PostgreSQL and I have a function with some queries in it. Some of them are simple queries that save a value into a variable, and then the next query uses this variable to find something. Let's say:
function()...
select type into t
from tableA
where code = a_c;
select num into n
from tableB
where id = t;
end function...
and many more... If I want to EXPLAIN ANALYZE the whole function, I execute the command EXPLAIN ANALYZE SELECT function(); Is this the right way to do it, or should I EXPLAIN ANALYZE every query inside the function, and if so, with what values?

Consider using the auto_explain module:
The auto_explain module provides a means for logging execution plans
of slow statements automatically, without having to run EXPLAIN by
hand. This is especially helpful for tracking down un-optimized
queries in large applications.
with auto_explain.log_nested_statements turned on:
auto_explain.log_nested_statements (boolean)
auto_explain.log_nested_statements causes nested statements
(statements executed inside a function) to be considered for logging.
When it is off, only top-level query plans are logged. This parameter
is off by default. Only superusers can change this setting.
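For example, in a superuser session something along these lines should write the plans of the queries inside the function to the server log (the function name is a placeholder):
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_nested_statements = ON;  -- include statements run inside functions
SET auto_explain.log_analyze = ON;            -- include actual timings (adds overhead)
SELECT my_function();  -- the plans of the nested queries now appear in the log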

Related

Is dynamic SQL more performant than static SQL in SQL Server?

I have a work mate who claims that dynamic SQL performs faster than static SQL in many situations, so I constantly see DSQL all over the place. Aside from the obvious downsides like not being able to detect errors until running it and that it's harder to read, is this accurate or not? When I asked why he was using DSQL all the time, he said:
Static is preferred when it is not going to prevent cache reuse and dynamic is preferred when static will prevent cache reuse and reuse is desirable.
I asked why static SQL would prevent cache reuse and he said:
Apparently, when variables are passed to statement predicates it may prevent cache reuse of that execution plan, where DSQL will allow cache reuse in a stored procedure.
So, for example:
select * from mytable where myvar = @myvar
I'm not an expert in SQL Server execution plans, but this seems irrational to me. Why would the engine keep stats in a DSQL statement in a stored procedure, but not a static SQL statement?
Dynamic SQL has the advantage that a query is recompiled every time it is run. This means the execution plan can take advantage of the most recent statistics on the table and of the values of any parameters.
In addition to being more readable, static SQL has the advantage that it does not need to be recompiled -- saving a step in running the query (well, actually two if you count parsing-->compiling-->executing). This may or may not be a good thing.
You can force static SQL to be recompiled using the OPTION (RECOMPILE) query hint.
Sometimes, you need to use dynamic SQL. As a general rule, though, I would use compiler hints and other efforts to manage performance before depending on dynamic SQL.
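For instance, a statement-level recompile hint looks roughly like this (table, column, and variable names are placeholders):
SELECT *
FROM dbo.mytable
WHERE mycolumn = @myvar
OPTION (RECOMPILE);  -- the plan is built fresh for the current value of @myvar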
From a SQL Server perspective, I don't think you should be using dynamic SQL much.
Problems can include:
Table variables can only be passed to dynamic SQL as read-only.
With every different value concatenated into the string, the query is recompiled.
The execution plan will use table statistics and other database information either way, but do you really need to recompile the plan every time? You can add LOOP JOIN, HASH JOIN or MERGE JOIN hints yourself if you feel you know better than the SQL Server engine. Otherwise, just write procedures the way they are intended to be written and pass parameters.
In my opinion, do not use dynamic queries. Instead, use procedures in which every changeable value is passed as a parameter and nothing is hard-coded. Create proper (and fewer) indexes and triggers to deal with the data. You will then also be able to use table variables and memory-optimized tables directly, whereas a table variable passed into a dynamic query is read-only.
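To illustrate the cache-reuse point, here is a rough sketch (table, column, and variable names are made up): the concatenated form produces a different statement text per value, so plans are rarely reused, while the parameterised form reuses a single cached plan.
DECLARE @myvar int = 42;
DECLARE @sql nvarchar(max);

-- concatenation: a distinct statement text per value, so little or no plan reuse
SET @sql = N'SELECT * FROM dbo.mytable WHERE mycolumn = ' + CAST(@myvar AS nvarchar(20));
EXEC (@sql);

-- parameterised dynamic SQL: one cached plan, reused across values
EXEC sp_executesql
N'SELECT * FROM dbo.mytable WHERE mycolumn = @p',
N'@p int',
@p = @myvar;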

Can you run a portion of a script in parallel, based off the results of a select statement?

I have a portion of code which, when simplified, looks something like this:
select @mainlooptableid = min(uid)
from queueofids with (nolock)
while (@mainlooptableid is not null)
begin
-- A large block of code that does several things depending on the nature of @mainlooptableid
-- .
-- .
-- .
-- End of this block's main logic
delete from queueofids where uid = @mainlooptableid
select @mainlooptableid = min(uid)
from queueofids with (nolock)
end
I would like to be able to run the segment of code inside the while loop in parallel for all uids in the queueofids table. Based on what happens inside the loop, I can guarantee the iterations will not interfere with each other in any way if they run concurrently, so logically it seems perfectly safe to run it like this. The real question is whether there is any way to get SQL to run a portion of code for all values in the table at once.
NOTE: I did think about generating a temp table with a series of generated SQL statements stored as strings, where each one is identical except for the @mainlooptableid value. But even if I have this table of SQL statements ready to execute, I'm not sure how I would get all of them to execute concurrently.
I can't think of a way to do this within a single SQL script; scripts are procedural. If you want to explore this idea, you'd probably need to involve some form of multi-threaded application which would handle the looping aspect, and open a thread to hand off the parallelized portion of your current script. Not impossible, but it does introduce some complexity.
If you want to do this all in SQL, then you'll have to rewrite the code to eliminate the loop. As noted in the comments above, SQL Server is set-based, which means that it handles a certain amount of parallelization by doing work "all at once" against a set.
No, there is no way to get SQL statements in the same script to run in parallel.
The closest thing to it is to try to create a set-based way of handling them, instead of running them in a loop.
Be aware that running in parallel will not necessarily make it faster if the threads are competing for the same resources.
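As a rough illustration of the set-based idea (dbo.sometable and its processed column are made up, and this assumes the per-row work can be expressed as a join against the queue):
-- do the per-row work for every queued uid in one pass
update t
set t.processed = 1
from dbo.sometable as t
join dbo.queueofids as q on q.uid = t.uid;

-- then empty the queue in one statement
delete from dbo.queueofids;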
I don't think SQL Server will parallelize separate statements, but it will parallelize execution within a single statement.
Most programming frameworks have parallel constructs. For example, in .NET this would be rather straightforward: create a procedure that takes @mainlooptableid as a parameter and call it in parallel.

Create constant string for entire database

I'm still new to SQL, so I'm having some little issues to solve.
I'm running a Postgres database in Aqua Data Studio, with some queries that follow the same model.
Some values in these queries are the same, but may change in the future.
Thinking of keeping the database easy to maintain, it would be faster to change the value of one constant than to open 20+ queries and change the same value in all of them.
Example:
SELECT *
FROM Table AS Default_Configs
LEFT JOIN Table AS Test_Configs
ON Default_Configs.Column1 = 'BLABLABLA'
Imagining 'BLABLABLA' could be 'XXX', how could I make 'BLABLABLA' a constant to every View that is created following this pattern?
Create a tiny function that serves as "global constant":
CREATE OR REPLACE FUNCTION f_my_constant()
RETURNS text AS
$$SELECT text 'XXX'$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE; -- see below
And use that function instead of 'BLABLABLA' in your queries.
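For example, the query from the question then becomes:
SELECT *
FROM Table AS Default_Configs
LEFT JOIN Table AS Test_Configs
ON Default_Configs.Column1 = f_my_constant();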
Be sure to declare the data type correctly and make the function IMMUTABLE (because it is) for better performance with big queries.
In Postgres 9.6 or later add PARALLEL SAFE, so it won't block parallel query plans. The setting isn't valid in older versions.
To change the constant, replace the function by running an updated CREATE OR REPLACE FUNCTION statement. This automatically invalidates query plans that use it, so queries are re-planned. It should be safe for concurrent use: transactions starting after the change use the new function. But indexes involving the function have to be rebuilt manually.
Alternatively (especially in pg 9.2 or later), you could set a Customized Option as "global constant" for the whole cluster, a given DB, a given role etc, and retrieve the value with:
current_setting('constant.blabla')
One limitation: the value is always text and may have to be cast to a target type.
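A minimal sketch of that approach, reusing the setting name from above (the database name is a placeholder):
-- set it once for the whole database (or per role, or per session with SET)
ALTER DATABASE mydb SET constant.blabla = 'XXX';

-- new sessions can then read it; the value comes back as text, so cast if needed
SELECT *
FROM Table AS Default_Configs
LEFT JOIN Table AS Test_Configs
ON Default_Configs.Column1 = current_setting('constant.blabla');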
Related:
User defined variables in PostgreSQL
Many ways to set it:
How does the search_path influence identifier resolution and the "current schema"

How to make the PostgreSQL optimizer build the execution plan AFTER binding parameters?

I'm developing a PL/pgSQL function for PostgreSQL 9.1. When I use variables in a SQL query, the optimizer builds a bad execution plan. But if I replace a variable with its literal value, the plan is fine.
For instance:
v_param := 100;
select count(*)
into result
from <some tables>
where <some conditions>
and id = v_param
performed in 3s
and
select count(*)
into result
from <some tables>
where <some conditions>
and id = 100
performed in 300ms
In the first case the optimizer generates a generic plan regardless of the value of v_param.
In the second case the optimizer generates a plan based on the specified value, and it is significantly more efficient despite not using plan caching.
Is it possible to make the optimizer generate the plan using the actual parameter values, re-planning every time I execute the query?
This has been dramatically improved by Tom Lane in the just-released PostgreSQL 9.2; see What's new in PostgreSQL 9.2 particularly:
Prepared statements used to be optimized once, without any knowledge
of the parameters' values. With 9.2, the planner will use specific
plans regarding the parameters sent (the query will be planned at
execution), except if the query is executed several times and the
planner decides that the generic plan is not too much more expensive
than the specific plans.
This has been a long-standing and painful wart that's previously required SET enable_... params, the use of wrapper functions using EXECUTE, or other ugly hacks. Now it should "just work".
Upgrade.
For anyone else reading this, you can tell if this problem is biting you because auto_explain plans of parameterised / prepared queries will differ from those you get when you explain the query yourself. To verify, try PREPARE ... SELECT then EXPLAIN EXECUTE and see if you get a different plan to EXPLAIN SELECT.
See also this prior answer.
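A minimal sketch of that check (table and column names are placeholders):
PREPARE q(int) AS
SELECT count(*) FROM some_table WHERE id = $1;

EXPLAIN EXECUTE q(100);                                  -- plan for the parameterised form
EXPLAIN SELECT count(*) FROM some_table WHERE id = 100;  -- plan with the literal value

DEALLOCATE q;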
Dynamic queries don't use cached plans, so you can use an EXECUTE ... USING statement in 9.1 and older. 9.2 should work without this workaround, as Craig wrote.
v_param := 100;
EXECUTE 'select count(*) from <some tables> where <some conditions>
and id = $1'
INTO result
USING v_param;

SP taking 15 minutes, but the same query when executed returns results in 1-2 minutes

So basically I have this relatively long stored procedure. The basic execution flow is that it selects some data into temp tables declared with the # sign, and then runs a cursor through these tables to generate a 'running total' into a third temp table which is created using CREATE TABLE. This resulting temp table is then joined with other tables in the DB to generate the result after some grouping etc. The problem is, this SP had been running fine until now, returning results in 1-2 minutes. Now, suddenly, it's taking 12-15 minutes. If I extract the query from the SP and execute it in Management Studio, manually setting the same parameters, it returns results in 1-2 minutes, but the SP takes very long. Any idea what could be happening? I tried to generate the actual execution plans of both the query and the SP, but they couldn't be generated because of the cursor. Any idea why the SP takes so long while the query doesn't?
This is the footprint of parameter sniffing. See here for another discussion about it: SQL poor stored procedure execution plan performance - parameter sniffing
There are several possible fixes, including adding WITH RECOMPILE to your stored procedure which works about half the time.
The recommended fix for most situations (though it depends on the structure of your query and sproc) is to NOT use your parameters directly in your queries, but rather store them into local variables and then use those variables in your queries.
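A minimal sketch of the WITH RECOMPILE option mentioned above (procedure, table, and parameter names are placeholders):
ALTER PROCEDURE dbo.YourSproc
@param INT
WITH RECOMPILE  -- a fresh plan is compiled on every execution, so no stale sniffed plan is reused
AS
BEGIN
SELECT *
FROM dbo.YourTable
WHERE SomeColumn = @param
END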
It's due to parameter sniffing. First, declare a local variable, assign the incoming parameter value to it, and use that local variable throughout the procedure. Here is an example below:
ALTER PROCEDURE [dbo].[Sp_GetAllCustomerRecords]
@customerId INT
AS
BEGIN
DECLARE @customerIdTemp INT
SET @customerIdTemp = @customerId

SELECT *
FROM Customers e
WHERE CustomerId = @customerIdTemp
END
Try this approach.
Try recompiling the sproc to ditch any stored query plan
exec sp_recompile 'YourSproc'
Then run your sproc, taking care to use sensible parameters.
Also compare the actual execution plans between the two methods of executing the query.
It might also be worth recomputing any statistics.
I'd also look into parameter sniffing. It could be that the proc needs to handle the parameters slightly differently.
I usually start troubleshooting issues like that by adding statements such as PRINT CONVERT(varchar(23), GETDATE(), 121) + ' - step 1'. This helps me narrow down what's taking the most time. You can compare against a run from Query Analyzer and narrow down where the problem is.
I would guess it could possibly be down to caching. If you run the stored procedure twice, is it faster the second time?
To investigate further, you could run both the stored procedure and the query version from Management Studio with the execution plan option turned on, then compare which area takes longer in the stored procedure than when it is run as a query.
Alternatively, you could post the stored procedure here for people to suggest optimizations.
For a start it doesn't sound like the SQL is going to perform too well anyway based on the use of a number of temp tables (could be held in memory, or persisted to tempdb - whatever SQL Server decides is best), and the use of cursors.
My suggestion would be to see if you can rewrite the sproc as a set-based query instead of a cursor-approach which will give better performance and be a lot easier to tune and optimise. Obviously I don't know exactly what your sproc does, to give an indication as to how easy/viable this is for you.
As to why the SP is taking longer than the query - difficult to say. Is there the same load on the system when you try each approach? If you run the query itself when there's a light load, it will be better than when you run the SP during a heavy load.
Also, to ensure the query truly is quicker than the SP, you need to rule out data/execution plan caching which makes a query faster for subsequent runs. You can clear the cache out using:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
But only do this on a dev/test db server, not on production.
Then run the query, record the stats (e.g. from profiler). Clear the cache again. Run the SP and compare stats.
1) When you run the query for the first time it may take more time. Another point: if you are using a correlated subquery and you hard-code the values, it is evaluated only once; when you do not hard-code it, run it through the procedure, and derive the value from the input parameter, it can take more time.
2) In rare cases it can be due to network traffic, in which case you will not see consistent execution times for the same input data.
I too faced a problem where we had to create some temp tables, manipulate them to calculate some values based on rules, and finally insert the calculated values into a third table. All of this in a single SP was taking around 20-25 minutes. To optimize it we broke the SP into 3 different SPs, and the total time came down to around 6-8 minutes. Just identify the steps involved in the whole process and how to break them up into different SPs. By using this approach the overall time taken by the entire process should come down.
This is because of parameter sniffing. But how can you confirm it?
Whenever we set out to optimize an SP we look at its execution plan. But in your case you will see an optimized plan from SSMS, because it only takes more time when it is called through code.
For every SP and function, SQL Server can cache two different plans because of the ARITHABORT option: one for SSMS and one for external clients (ADO.NET).
ARITHABORT is ON by default in SSMS but OFF for connections from client libraries. So if you want to check the exact query plan your SP uses when it is called from code,
just disable the option in SSMS and execute your SP; you will see that the SP also takes 12-13 minutes from SSMS.
SET ARITHABORT OFF
EXEC YourSpName
SET ARITHABORT ON
To solve this problem you just need to update the estimated query plan.
There are a couple of ways to do that:
1. Update table statistics.
2. Recompile the SP.
3. SET ARITHABORT ON inside the SP so it always uses the same plan as SSMS (this option is not recommended).
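For example, the first two options might look like this (the table and procedure names are just placeholders):
UPDATE STATISTICS dbo.Customers;                  -- refresh statistics on the underlying table
EXEC sp_recompile 'dbo.Sp_GetAllCustomerRecords'; -- force a fresh plan on the next execution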
For more options please refer to this awesome article -
http://www.sommarskog.se/query-plan-mysteries.html
I would suggest the issue is related to the type of temp table (the # prefix). This temp table holds the data for that database session. When you run it through your app, the temp table is deleted and recreated.
You might find that when running in SSMS it keeps the session data and updates the table instead of recreating it.
Hope that helps :)