Is there a better way to rewrite this SQL query? - sql

Good day, I have a query that uses a nested select to gather data from several tables... Is there a better way to rewrite this query to speed it up? The most time-consuming part is the batch insert... hope you can help...

Here is what I would do, assuming your tables are indexed as you said: I would pull that SELECT DISTINCT out into a separate stored procedure that writes its result into an indexed temp table. I would then call this SP from the main proc and join the temp table into the main insert statement. This lets the optimiser see the distribution of the data in the temp table and make some optimisations. Let me know if that was not clear; I use this technique all the time. It also results in code that is easier to maintain and read.
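A minimal sketch of that shape, assuming SQL Server and made-up table and column names:

-- Hypothetical sketch: the sub-procedure fills a temp table created by the
-- caller (a child proc can see the caller's temp tables in SQL Server).
CREATE PROCEDURE dbo.FillDistinctCustomers
AS
BEGIN
    INSERT INTO #DistinctCustomers (CustomerId, Region)
    SELECT DISTINCT c.CustomerId, c.Region
    FROM dbo.Customers AS c
    JOIN dbo.Orders    AS o ON o.CustomerId = c.CustomerId;
END
GO

CREATE PROCEDURE dbo.MainInsert
AS
BEGIN
    CREATE TABLE #DistinctCustomers (CustomerId INT, Region VARCHAR(20));

    EXEC dbo.FillDistinctCustomers;

    -- Index the temp table so the optimiser has distribution statistics:
    CREATE INDEX IX_DistinctCustomers ON #DistinctCustomers (CustomerId);

    INSERT INTO dbo.Summary (CustomerId, Region, Total)
    SELECT d.CustomerId, d.Region, SUM(o.Amount)
    FROM #DistinctCustomers AS d
    JOIN dbo.Orders         AS o ON o.CustomerId = d.CustomerId
    GROUP BY d.CustomerId, d.Region;
END
GO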

Okay, given the givens, I think a good bet would be to use indexed views. This allows a lot of your joins and computations to be done at insert time and will seriously reduce the complexity of the actual insert SP.
see http://technet.microsoft.com/en-us/library/dd171921(v=sql.100).aspx
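A minimal sketch of an indexed view (SQL Server; the table and column names are made up, and Amount is assumed NOT NULL, which indexed views require for SUM):

-- Hypothetical: materialise a join/aggregate so the work happens at write time.
CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT o.CustomerId,
       COUNT_BIG(*)  AS OrderCount,   -- COUNT_BIG(*) is mandatory here
       SUM(o.Amount) AS TotalAmount   -- assumes Amount is NOT NULL
FROM dbo.Orders AS o
GROUP BY o.CustomerId;
GO

-- The unique clustered index is what makes the view an indexed view:
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON dbo.vOrderTotals (CustomerId);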

Related

PostgreSQL efficient count of number of rows WHERE parameter=TRUE

I have a very large PostgreSQL table. One of its columns stores a boolean parameter. What I want is a very simple query:
SELECT COUNT(*) FROM myTable WHERE myTable.isSthTrue;
The problem is that it is very slow, since it has to check every row to see whether it satisfies the criterion. Adding an index on this column roughly doubles the speed, but doesn't improve the complexity in general.
Some people on the internet suggest adding triggers that update a count stored in a separate table, but that feels like too much effort for a simple problem like this.
If you need an exact result, then yes, a trigger-based solution is likely the best way to go.
If an estimate is okay, consider materialized views (Postgres 9.3+).
Something like CREATE MATERIALIZED VIEW myCount AS SELECT COUNT(*) FROM myTable WHERE myTable.isSthTrue; would maintain a copy of the expensive query you reference. The only caveat is that this aggregate view is not updated automatically; to do that you need to call REFRESH MATERIALIZED VIEW, which could be done from cron or another scheduled task.
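For the exact-count route, a minimal trigger sketch (the table and column names follow the question; the boolean is assumed NOT NULL):

-- Hypothetical: keep a running count in a one-row side table.
CREATE TABLE mytable_counts (true_count bigint NOT NULL);
INSERT INTO mytable_counts SELECT COUNT(*) FROM myTable WHERE isSthTrue;

CREATE OR REPLACE FUNCTION adjust_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' AND NEW.isSthTrue THEN
        UPDATE mytable_counts SET true_count = true_count + 1;
    ELSIF TG_OP = 'DELETE' AND OLD.isSthTrue THEN
        UPDATE mytable_counts SET true_count = true_count - 1;
    ELSIF TG_OP = 'UPDATE' AND OLD.isSthTrue <> NEW.isSthTrue THEN
        UPDATE mytable_counts
        SET true_count = true_count + CASE WHEN NEW.isSthTrue THEN 1 ELSE -1 END;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_adjust_count
AFTER INSERT OR UPDATE OR DELETE ON myTable
FOR EACH ROW EXECUTE PROCEDURE adjust_count();

After that, SELECT true_count FROM mytable_counts; is effectively O(1).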

SQL alternative to a View

I don't really know how to even ask for what I need, so I'll try to explain my situation.
I have a rather simple SQL query that joins various tables, but I need to execute this query with slightly different conditions over and over again.
The execution time of the query is somewhere around 0.25 seconds, but all the queries I need to execute together easily take 15 seconds. This is way too long.
What I need is a table or view that holds the query results for me, so that I only need to select from this one table instead of joining large tables over and over again.
A view wouldn't really help, because as far as I know it would just execute the same query over and over again.
Is there a way to have something like a view which holds its data as long as its source tables don't change, and only re-executes the query when it is really necessary?
I think what you described fits very well with a materialized view with fast refresh on commit. However, your query needs to be eligible for fast refresh.
Another option is the result_cache, which is automatically invalidated when one of the source tables changes. I would try both to decide which one suits this particular task better.
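A minimal sketch of the materialized-view route (Oracle; the table and column names are assumptions):

-- Hypothetical: an aggregate MV that refreshes incrementally on commit.
-- A materialized view log on the base table is required for fast refresh.
CREATE MATERIALIZED VIEW LOG ON orders
    WITH SEQUENCE, ROWID (customer_id, amount) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW order_summary
    REFRESH FAST ON COMMIT
AS
SELECT customer_id,
       COUNT(*)      AS order_count,
       COUNT(amount) AS amount_count,  -- required for fast refresh of SUM
       SUM(amount)   AS total
FROM orders
GROUP BY customer_id;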
I would suggest table-valued functions for this purpose. Defining such a function requires coding in PL/SQL, but it is not that hard if the function is based on a single query.
You can think of such functions as a way of parameterizing views.
Here is a good place to start learning about them.
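For example, a minimal pipelined table function (Oracle PL/SQL; the orders table and its columns are made up for illustration):

-- Hypothetical: a table function that acts as a parameterized view.
CREATE OR REPLACE TYPE order_row AS OBJECT (order_id NUMBER, total NUMBER);
/
CREATE OR REPLACE TYPE order_tab AS TABLE OF order_row;
/
CREATE OR REPLACE FUNCTION orders_for_customer(p_customer_id NUMBER)
    RETURN order_tab PIPELINED
IS
BEGIN
    FOR r IN (SELECT order_id, total
              FROM orders
              WHERE customer_id = p_customer_id) LOOP
        PIPE ROW (order_row(r.order_id, r.total));
    END LOOP;
    RETURN;
END;
/
-- Query it like a view, passing the parameter:
SELECT * FROM TABLE(orders_for_customer(42));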

SQL performance for a returning stored procedure

I have been asked the following question: what would you look into when you want to improve a stored procedure's performance? The stored procedure returns some value and has three joins in it.
Other than making sure the joins are well written, what can one do to make it perform better? This was a general question and no code was provided.
Any ideas?
Check the indexes on the tables used in the joins. Particularly, are the columns used in the joins indexed?
Example -
SELECT *
FROM SomeTable a
JOIN SomeOtherTable b on a.ItemId = b.ItemId
If these tables are large, indexing ItemId in both tables will typically help performance a lot.
You should do the same thing for any columns that are used in the WHERE clause, if your query has one.
WHERE a.ProductId = @SomeVariableYouPassedToTheStoredProc
Indexing ProductId may help in this case.
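For instance, the supporting indexes might look like this (names invented to match the example above):

CREATE INDEX IX_SomeTable_ItemId      ON SomeTable (ItemId);
CREATE INDEX IX_SomeOtherTable_ItemId ON SomeOtherTable (ItemId);
CREATE INDEX IX_SomeTable_ProductId   ON SomeTable (ProductId);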
Query performance is something you could go down a rabbit hole on, but this is a logical (and quick) place to start.
There are a lot of things you can do to optimize procedures, but it sounds like your SQL statement is pretty simple. Some things to watch out for:
Inline functions. These can cause SQL to do a row-by-row evaluation and slow things down.
Data conversions in join statements. These can prevent indexes from being used (see the sketch after this list).
Make sure columns being joined on or used in the WHERE clause are indexed (for large data sets).
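To illustrate the data-conversion point, a sketch with made-up names, assuming OrderRef is VARCHAR and OrderId is INT:

-- The implicit conversion lands on the VARCHAR side (INT has higher
-- precedence), so an index on a.OrderRef cannot be seeked:
SELECT *
FROM OrdersText AS a
JOIN OrdersInt  AS b ON a.OrderRef = b.OrderId;

-- Better: keep the types identical, or convert the other side explicitly
-- so the indexed column is left untouched:
SELECT *
FROM OrdersText AS a
JOIN OrdersInt  AS b ON a.OrderRef = CAST(b.OrderId AS VARCHAR(20));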
You can check out this website for more performance tips, but I think I covered most of what you need for simple statements:
SQL Optimizations School
The fact that it's a stored procedure has little to nothing to do with it. Optimise the SQL inside.
As to how: all the usual suspects apply, and nobody can simply guess what's wrong without looking.
Copy the SQL from the proc into a suitable tool and prefix it with EXPLAIN to see what's going on.
I presume there are other options. For example:
1. Each of those joins could use restriction conditions that look like 'and permitted_user_name = (select user_name from user_list where ...)'. The value could be derived once at procedure start (I mean in the first statement of the procedure) so as not to overload the DB with many similar queries.
2. Starting from Oracle 11 you can declare a function as a function with cached results (i.e. the function is calculated once and isn't recalculated each time it is invoked), defining a set of tables whose changes invalidate the cache.
In any case the question is mostly DB-specific.
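A minimal sketch of point 2 (Oracle 11g+; the user_list table and columns are assumptions):

-- Hypothetical: the result is cached per input value and the cache is
-- invalidated when USER_LIST changes.
CREATE OR REPLACE FUNCTION permitted_user_name(p_id NUMBER)
    RETURN VARCHAR2
    RESULT_CACHE RELIES_ON (user_list)
IS
    v_name user_list.user_name%TYPE;
BEGIN
    SELECT user_name INTO v_name FROM user_list WHERE user_id = p_id;
    RETURN v_name;
END;
/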
Run the Query Analyser on the SQL statement

Best to use SQL + JOINS or Stored Proc to combine results from multiple queries?

I am creating a table to summarize data that is gathered from about 8 or so queries that have very light logic/WHERE clauses and all select against different tables.
I was wondering what the best option would be to fetch the summarized data:
One query with multiple JOINS to gather all relevant information
A stored proc that encapsulates the logic and maybe executes the 8 queries and does the "joining" in some other way? This seems more modular and maintainable to me...but I'm not sure.
I am using SQL Server 2008 for this. Any suggestions?
If you can, use the usual SQL methods; DBs are optimized to run them. This "joining in some other way" would probably require cursors, which slow everything down. Just let the DB do its job. If you need more performance, examine the execution plan and do what has to be done there (e.g. adding indexes).
Databases are pretty good at figuring out the optimal way of executing SQL. It is what they are designed to do. Using stored procedures to load the data in chunks and combining it yourself will be more complex to write, and likely to be less efficient than letting the database just do it for you.
If you are concerned about reusing a complex query in multiple places, consider creating a view of it instead.
Depending on the size of the tables, joining 8 of them could be pretty hairy. I would try it that way first - as others have said, the db is pretty good at figuring this stuff out. If the performance is not as good as you would like, I would try a stored proc which creates a table variable (or a temp table) and inserts the data from each of the 8 tables separately. Then you can return the contents of the table variable to your app.
This method also makes it a little easier to add the 9th, 10th, etc tables in the future. And it gives you an easy way to do any processing you may need on the summarized data before returning it to your app.
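A rough sketch of the table-variable approach (SQL Server; all names invented):

-- Hypothetical: collect each source query's rows into one table variable,
-- then return the combined result to the app.
DECLARE @Summary TABLE (
    SourceName VARCHAR(30),
    ItemId     INT,
    Amount     DECIMAL(18, 2)
);

INSERT INTO @Summary (SourceName, ItemId, Amount)
SELECT 'Orders', OrderId, Total FROM dbo.Orders WHERE Status = 'Open';

INSERT INTO @Summary (SourceName, ItemId, Amount)
SELECT 'Invoices', InvoiceId, Amount FROM dbo.Invoices WHERE Paid = 0;

-- ...repeat for the remaining source tables, then:
SELECT * FROM @Summary;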

What generic techniques can be applied to optimize SQL queries?

What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?
Use primary keys
Avoid select *
Be as specific as you can when building your conditional statements
De-normalisation can often be more efficient
Table variables and temporary tables (where available) will often be better than using a large source table
Partitioned views
Employ indices and constraints
Learn what's really going on under the hood - you should be able to understand the following concepts in detail:
Indexes (not just what they are but actually how they work).
Clustered indexes vs heap allocated tables.
Text and binary lookups and when they can be in-lined.
Fill factor.
How records are ghosted for update/delete.
When page splits happen and why.
Statistics, and how they affect various query speeds.
The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).
The biggest thing you can do is to look for table scans in SQL Server Query Analyzer (make sure you turn on "Show execution plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.
As an aside, when I started learning to optimize queries I ran SQL Server Profiler against a trace, looked at the generated SQL, and tried to figure out why it was an improvement. Profiler is far from optimal, but it's a decent start.
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
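For the fragmentation checks, a quick sketch (SQL Server 2005+; uses the built-in DMV):

-- Fragmentation per index in the current database:
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;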
Use a WITH statement to handle query filtering. Limit each subquery to the minimum number of rows possible, then join the subqueries.
WITH master AS
(
    SELECT SSN, FIRST_NAME, LAST_NAME
    FROM MASTER_SSN
    WHERE STATE = 'PA'
      AND GENDER = 'M'
),
taxReturns AS
(
    SELECT SSN, RETURN_ID, GROSS_PAY
    FROM MASTER_RETURNS
    WHERE YEAR < 2003
      AND YEAR > 2000
)
SELECT *
FROM master,
     taxReturns
WHERE master.ssn = taxReturns.ssn
Subqueries within a WITH statement may end up being the same as inline views or automatically generated temp tables. In the work I do (retail data), I find there is a performance benefit about 70-80% of the time. 100% of the time, there is a maintenance benefit.
I think using SQL query analyzer would be a good start.
In Oracle you can look at the explain plan to compare variations on your query
Make sure that you have the right indexes on the table. If you frequently use a column to order or limit your dataset, an index can make a big difference. I saw in a recent article that SELECT DISTINCT can really slow down a query, especially if you have no index.
The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.
Since adding indexes can slow down data writes, you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where a good query analysis tool can help you balance things accordingly.
Indexes
Statistics
On the Microsoft stack, the Database Engine Tuning Advisor
Some other points (mine are based on SQL Server; since each DB backend has its own implementation, they may or may not hold true for all databases):
Avoid correlated subqueries in the SELECT part of a statement; they are essentially cursors.
Design your tables to use the correct datatypes, to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your data as varchar, for instance.
If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.
If your WHERE or JOIN conditions include OR statements (which are slower) you may get better speed using a UNION statement.
UNION ALL is faster than UNION if (and only if) the two statements are mutually exclusive and return the same results either way.
NOT EXISTS is usually faster than NOT IN or using a LEFT JOIN with a WHERE clause of ID IS NULL.
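An illustration of the NOT EXISTS point, with invented table names:

-- Customers with no orders; NOT EXISTS usually optimizes well:
SELECT c.CustomerId
FROM Customers AS c
WHERE NOT EXISTS (SELECT 1 FROM Orders AS o WHERE o.CustomerId = c.CustomerId);

-- The LEFT JOIN pattern it is being compared against:
SELECT c.CustomerId
FROM Customers AS c
LEFT JOIN Orders AS o ON o.CustomerId = c.CustomerId
WHERE o.CustomerId IS NULL;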
In an UPDATE query add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!
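For example (made-up names):

-- Only touch rows whose value actually changes:
UPDATE Products
SET Discontinued = 1
WHERE CategoryId = 12
  AND Discontinued <> 1;  -- skips the rows that are already correct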
Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be done when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 orders in a report. Pre-calculations should be done in triggers so that they are always up-to-date as the underlying data changes. And it doesn't have to be just numbers either; we have a calculated field that concatenates names that we use in reports.
Be wary of scalar UDFs; they can be slower than putting the code inline.
Temp tables tend to be faster for large data sets, and table variables faster for small ones. In addition, you can index temp tables.
Formatting is usually faster in the user interface than in SQL.
Do not return more data than you actually need.
This one seems obvious, but you would not believe how often I end up fixing it: do not join to tables that you are not using either to filter the records or to supply a field in the SELECT part of the statement. Unnecessary joins can be very expensive.
It is a very bad idea to create views that call other views that call other views. You may find you are joining to the same table 6 times when you only need to join once, and creating 100,000,000 records in an underlying view in order to get the 6 that are in your final result.
In designing a database, think about reporting, not just the user interface for entering data. Data is useless if it is not used, so think about how it will be used after it is in the database, and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables; it only thinks about one use case for the data.) The most complex queries affecting the most data are in reporting, so design changes that help reporting can speed up queries (and simplify them) considerably.
Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.
And because it can't be said too often: use indexes correctly, not too many or too few. And make your WHERE clauses sargable (able to use indexes).