I'm writing a stored procedure on SQL Server 2000.
I've written a complicated select statement that takes 18 seconds to run.
The question is, how can I self-join the result correctly and efficiently?
Is it possible to manage it in a single select statement without repeating the big query statement that I currently have? Or should I save the result in a table variable or temp table?
Many thanks.
If the original query runs a long time, but the result is small, yes, you could consider a temporary / helper table, and run your second query (the self-join) on that table.
This may also help keep your approach modular / maintainable. And you'll be able to tune performance on that long-running query without your second one staring you in the face in the meantime.
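In case it helps to see the shape of it, here is a minimal sketch, assuming the 18-second select has been wrapped as a hypothetical view dbo.BigQuery, with made-up columns id, parent_id, amount:

-- Run the expensive query once, keeping the result in a temp table.
SELECT id, parent_id, amount
INTO #big_result
FROM dbo.BigQuery            -- placeholder for your complicated select

-- An index on the join column helps the self-join (optional).
CREATE CLUSTERED INDEX ix_big_result ON #big_result (parent_id)

-- Self-join the small cached result instead of re-running the big query.
SELECT p.id, c.id AS child_id, c.amount
FROM #big_result AS p
JOIN #big_result AS c ON c.parent_id = p.id

DROP TABLE #big_result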
Related
I have some big tables in an Oracle database, representing transactions etc.
Is there a way in Oracle/SQL to try a query on a subset of the data? Just to speed up the time it takes to try different queries and feel confident that you’ve got the logic right? Once I feel confident that my query is correct I want to run it on the full dataset.
You can limit your result set with the ROWNUM pseudocolumn.
In your where clause, add:
and rownum < 100 -- or any number you want
And this will put a hard stop at that number of rows.
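For example (the transactions table and its columns are made up):

SELECT t.txn_id, t.account_id, t.amount
FROM transactions t
WHERE t.amount > 0
AND ROWNUM < 100   -- hard stop after 99 rows

One caveat worth knowing: ROWNUM is assigned before ORDER BY is applied, so to test a sorted query on a subset, put the ORDER BY in a subquery and filter on ROWNUM outside it:

SELECT *
FROM (SELECT t.txn_id, t.amount
      FROM transactions t
      ORDER BY t.txn_date DESC)
WHERE ROWNUM < 100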
I have a web app that has a large number of tables and variables that the user can select (or not select) at run time. Something like this:
In the DB:
Table A
Table B
Table C
At run time the user can select any number of variables to return. Something like this:
Result Display = A.field1, A.Field3, B.field19
There are up to 100+ total fields spread across 15+ tables that can be returned in a single result set.
We have a query that currently works by creating a temp table to select and aggregate the desired fields, then selecting the desired variables from that table. However, this query takes quite some time to execute (30 seconds). I would like to find a more efficient way to return the desired results while still allowing the user to configure the variables to see. I know this can be done, as I have seen it done in other areas. Any suggestions?
Instead of using a temporary table, use a view and recompile the view each time you run the query (or just use a subquery or CTE instead of a view). SQL Server might be able to optimize the view based on the fields being selected.
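As a minimal sketch of the CTE variant, using the field names from the question (the table names are adapted and the join key is a guess):

WITH combined AS (
    SELECT a.field1, a.Field3, b.field19
    FROM TableA a
    JOIN TableB b ON b.a_id = a.id   -- join key is an assumption
)
SELECT field1, Field3, field19
FROM combined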
The best reason to use a temporary table would be when intra-day updates are not needed. Then you could create the "temporary" table at night and just select from that table.
The query optimization approach (whether through a view, CTE, or subquery) may not be sufficient. This is a rather hard problem to solve in general. Typically, though, groups of variables come from particular subqueries. If so, you can write a stored procedure that generates dynamic SQL containing just the requisite joins for the variables chosen for a given run. Then use that SQL for fetching from the database.
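A rough sketch of that idea (every name here is invented, and @fields must be validated against a whitelist before concatenation, or you open yourself to SQL injection):

CREATE PROCEDURE dbo.GetSelectedFields
    @fields nvarchar(500)   -- e.g. 'a.field1, a.Field3, b.field19', pre-validated
AS
BEGIN
    DECLARE @sql nvarchar(4000)
    -- Build only the joins the chosen fields actually require.
    SET @sql = N'SELECT ' + @fields
             + N' FROM TableA a JOIN TableB b ON b.a_id = a.id'
    EXEC sp_executesql @sql
END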
And finally, perhaps there are other ways to optimize the query regardless of the fields being chosen. If you think that might be the case, then simplify the query for human consumption and ask another question.
Good day, I have a query that utilizes a nested select to gather data from several tables... Is there a far better way to rewrite this query to speed up its process? The most time-consuming part is the batch insert... hope you can help...
Here is what I would do, assuming that your tables are indexed as you have said: I would rip out that select distinct statement and stick it into a separate SP; the data will end up in a temp table, which is indexed. I would then call this SP from within a main proc and join this temp table with the main insert statement. This allows the optimiser to know the distribution of the data in the temp table and make some optimisations. Let me know if that was not clear. I use this technique all the time. It also results in code that is easier to maintain and read.
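The shape of it, with made-up names (in SQL Server, a #temp table created in the caller is visible inside the called proc):

-- Separate SP that fills the temp table with the distinct rows.
CREATE PROCEDURE dbo.FillDistinctKeys
AS
BEGIN
    INSERT INTO #distinct_keys (key_id)
    SELECT DISTINCT key_id FROM dbo.SourceTable
END
GO
-- Main proc: create the indexed temp table, fill it, then join to it.
CREATE TABLE #distinct_keys (key_id int PRIMARY KEY)   -- the PK is the index
EXEC dbo.FillDistinctKeys
INSERT INTO dbo.TargetTable (key_id, total_amount)
SELECT d.key_id, SUM(s.amount)
FROM #distinct_keys d
JOIN dbo.SourceTable s ON s.key_id = d.key_id
GROUP BY d.key_id
DROP TABLE #distinct_keys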
Okay, given the givens, I think a good bet would be to use indexed views. This allows a lot of your joins and computations to be done at insert time and will seriously reduce the complexity of the actual insert SP.
see http://technet.microsoft.com/en-us/library/dd171921(v=sql.100).aspx
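For a flavor of what that looks like (the table and columns are invented; SCHEMABINDING and COUNT_BIG(*) are required before the clustered index can be created):

CREATE VIEW dbo.v_OrderTotals
WITH SCHEMABINDING
AS
SELECT customer_id,
       SUM(amount)  AS total_amount,
       COUNT_BIG(*) AS row_count   -- required in an indexed view with GROUP BY
FROM dbo.Orders
GROUP BY customer_id
GO
CREATE UNIQUE CLUSTERED INDEX ix_v_OrderTotals
    ON dbo.v_OrderTotals (customer_id)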
I have virtually the same join query, the difference between my (more than two) queries being one of the tables on which the join is made. Performance-wise, is it better to:
1) rewrite the queries (in one stored procedure?), OR
2) pass the table on which the join is made as a parameter in a stored procedure (written in plpgsql, BTW) and run the query using EXECUTE
I assume 2) is more elegant, but word is that using EXECUTE means one cannot benefit from query optimization.
Also, what about when I have a varying number of conditions? How can I make sure the query runs in optimal time? (I take it rewriting the query more than 10 times isn't the way to go :D)
If you want to benefit from the query optimization, you should definitely rewrite the queries.
It does result in longer, less elegant code that's harder to maintain, but that is sometimes the price you pay for performance.
There is some overhead to using EXECUTE, due to repeated planning of the executed query.
For best results and maintainability, write a function that writes the various functions you need. Example:
PostgreSQL trigger to generate codes for multiple tables dynamically
The EXECUTE is dynamic and requires a fresh parse at a minimum, so there is more overhead.
1) rewrite the queries (in one stored procedure?)
If you have the ability to cache the query plan, do so. Dynamically executing SQL means that the backend needs to re-plan the query every time. Check out PREPARE for more details on this.
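For example (table and columns invented):

-- Plan once per session...
PREPARE orders_by_customer (int) AS
    SELECT order_id, amount
    FROM orders
    WHERE customer_id = $1;

-- ...then execute as often as you like against the cached plan.
EXECUTE orders_by_customer(42);
EXECUTE orders_by_customer(97);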
2) pass the table on which the join is made as a parameter in a stored procedure (written in plpgsql BTW) and run the query using EXECUTE
Not necessary! Pl/PgSQL automatically does a PREPARE/EXECUTE for you. This is one of the primary speed gains that can be had from using Pl/PGSQL. Rhetorical: do you think generating the plan shown in EXPLAIN was cheap or easy? Cache that chunk of work if at all possible.
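To illustrate (function and table are made up): a static query inside a PL/pgSQL function is planned once and the plan is cached, whereas the EXECUTE form is re-parsed and re-planned on every call.

CREATE FUNCTION order_total(p_customer int) RETURNS numeric AS $$
BEGIN
    -- Static SQL: planned once, plan reused on later calls.
    RETURN (SELECT sum(amount) FROM orders WHERE customer_id = p_customer);
    -- (The dynamic equivalent, re-planned on every call, would be:
    --  EXECUTE 'SELECT sum(amount) FROM orders WHERE customer_id = $1'
    --      INTO some_declared_variable USING p_customer;)
END;
$$ LANGUAGE plpgsql;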
Also, what about when I have a varying number of conditions? How can I make sure the query runs in optimal time? (I take it rewriting the query more than 10 times isn't the way to go :D)
Using individual PREPAREed statements is one way, and the most "bare-metal" way, of optimizing the execution of queries. You could do exotic things like using a single set-returning PL function that you pass different arguments into and that conditionally executes different SQL, but I wouldn't recommend it. For optimal performance, stick to PREPARE/EXECUTE and manage the named statements inside your application.
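As a sketch, one named statement per common combination of conditions, with the application choosing which to execute (all names invented):

PREPARE txns_by_date (date) AS
    SELECT txn_id, amount FROM transactions WHERE txn_date = $1;

PREPARE txns_by_date_and_type (date, text) AS
    SELECT txn_id, amount FROM transactions
    WHERE txn_date = $1 AND txn_type = $2;

EXECUTE txns_by_date('2010-01-01');
EXECUTE txns_by_date_and_type('2010-01-01', 'refund');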
I am wondering which is a more efficient method to retrieve data from the database.
ex.
One particular part of my application can have well over 100 objects. Right now I have it set up to query the database twice for each object. This part of the application periodically refreshes itself, say every 2 minutes, and this application will probably end up being installed on 25-30 PCs. I am thinking that this is a large number of select statements to make against the database, and I am thinking about trying to optimize the procedure. I have to pull the information out of several tables, and both queries use join statements.
Would it be better to rewrite the queries so that I am only executing them twice per update instead of 200 times? For example, using a large WHERE clause to include every object, and then doing the processing of the data outside the objects rather than inside each one?
Using SQL Server and .NET. No indexes on the tables; the size of the database is less than 10 to the 5th.
All things being equal, many statements with few rows are usually worse than few statements with many rows.
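A sketch of the "few statements, many rows" shape (all names invented): one round trip fetches the rows for every object on screen, and the client splits them up afterwards.

SELECT o.object_id, o.name, d.detail_value
FROM Objects o
JOIN Details d ON d.object_id = o.object_id
WHERE o.object_id IN (1, 2, 3)   -- all object ids for this refresh

-- The client then groups the rows by object_id instead of issuing
-- two queries per object.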
Show the actual code and you'll get better answers.
The default answer for optimization must always be: don't optimize until you have a demonstrated need for it. The follow-up is: once you have a demonstrated need and an alternative approach, try both ways to determine which is faster. You can't predict which will be faster, and we certainly can't. KM's answer is good - fewer queries tend to be better - but the way to know is to test.