MS SQL Server optimizer and varying table and field aliases

MS SQL Server optimizer and varying table and field aliases - sql

We have a lot of queries for which we append a random alias at the end of field and table names (due to a custom ORM implementation that might be hard to change). The queries are like the following (though substantially more complex, most of the time):
SELECT fooA.field1 as field1B,
fooA.field2 as field1C
FROM foo as fooA
The suffixes A, B and C are randomly generated (and longer than one character). Will this hurt performance of our queries (i.e. will the optimizer not be able to recognize repeated queries due to the random part)? We mainly use SQL Server 2005.

Yes, the optimizer will need to reparse and recompile your query each time, since the query hash will change.

The Query Optimizer Engine uses the Execution plan which goes by ObjectId - the aliases are purely for programming purposes, but are not used during execution. So I do not think that performance will be affected by using different aliases or by using small or long aliases.
Quassnoi makes a good point about the rehashing of the query. Although the query performance itself is not impacted, the overall performance environment will be impacted.

If these are ad-hoc queries, then each will get compiled and cached as a separate query. In SQL' for a query to match to a cached plan, the text must be identical (down to case and white-space)
If you've got lots of queries that differ only in the table aliases then you're going to have very poor plan reuse, lots of compiles and a very big plan cache.
It's only stored procedures that match to cached plans by objectid

Related

Do SQL bind parameters affect performance?

Suppose I have a table called Projects with a column called Budget with a standard B-Tree index. The table has 50,000 projects, and only 1% of them have a Budget of over one million. If I ran the SQL Query:
SELECT * From Projects WHERE Budget > 1000000;
The planner will use an index range scan on Budget to get the rows off the heap table. However, if I use the query:
SELECT * From Projects WHERE Budget > 50;
The planner will most likely do a sequential scan on the table, as it will know this query will end up returning most or all rows anyway and there's no reason to load all the pages of the index into memory.
Now, let's say I run the query:
SELECT * From Projects WHERE Budget > :budget;
Where :budget is a bind parameter passed into my database. From what I've read, the query as above will be cached, and no data on cardinality can be inferred. In fact, most databases will just assume an even distribution and the cached query plan will reflect that. This surprised me, as usually when you read about the benefits of bind parameters it's on the subject of preventing SQL injection attacks.
Obviously, this could improve performance if the resulting query plan would be the same, as a new plan wouldn't have to be compiled, but could also hurt performance if the values of :budget greatly varied.
My Question: Why are bind parameters not resolved before the query plan is generated and cached? Shouldn't modern databases strive to generate the best plan for the query, which should mean looking at the value for each parameter and getting accurate index stats?
Note: This question probably doesn't apply to mySql as mySql doesn't cache SQL plans. However, I'm interested in why this is the case on Postgres, Oracle and MS SQL.

For Oracle specifically, it depends.
For quite some time (at least 9i), Oracle has supported bind variable peeking. That means that the first time a query is executed, the optimizer peeks at the value of the bind variable and bases its cardinality estimates on the value of that first bind variable. That makes sense in cases where most of the executions of a query are going to have bind variable values that return similarly sized results. If 99% of the queries are using small budget values, it is highly likely that the first execution will use a small value and thus the cached query plan will be appropriate for small bind variable values. Of course, that means that when you do specify a large bind variable value (or, worse, if you get lucky and the first execution is with a large value) you'll get less than optimal query plans.
If you are using 11g, Oracle can use adaptive cursor sharing. This allows the optimizer to maintain multiple query plans for a single query and to pick the appropriate plan based on the bind variable values. That can get rather complicated over time, though. If you have a query with N bind variables, the optimizer has to figure out how to partition that N-dimensional space into different query plans for different bind variable values in order to figure out when and whether to re-optimize a query for a new set of bind variable values and when to simply reuse an earlier plan. A lot of that work ends up being done at night during the nightly maintenance window in order to avoid incurring those costs during the productive day. But that also brings up issues about how much freedom the DBA wants to give the database to evolve plans over time vs how much the DBA wants to control plans so that the database doesn't suddenly start picking a poor plan that causes some major system to slow to a crawl on a random day.

This surprised me, as usually when you read about the benefits of bind parameters it's on the subject of preventing SQL injection attacks.
Don't confuse parameterized queries with prepared statements. Both offer parameterization, but prepared statements offer the additional caching of the query plan.
Why are bind parameters not resolved before the query plan is generated and cached?
Because sometimes generating the query plan is an expensive step. Prepared statements allow you to amortize the cost of query planning.
However, if all you're looking for is SQL injection protection, don't use prepared statements. Use parameterized queries.
For example, in PHP, you can use http://php.net/pg_query_params to execute a parameterized query WITHOUT caching the query plan; meanwhile http://php.net/pg_prepare and http://php.net/pg_execute are used to cache a plan for a prepared statement and later execute it.
Edit: 9.2 apparently changes the way prepared statements are planned

What aspects of a sql query are relatively costly to one another? Joins? Num of records? columns selected?

How costly would SELECT One, Two, Three be compared to SELECT One, Two, Three, ..... N-Column
If you have a sql query that has two or three tables joined together and is retrieving 100 rows of data, does performance have anything to say whether I should be selecting only the number of columns I need? Or should I write a query that just yanks all the columns..
If possible, could you help me understand what aspects of a query would be relatively costly compared to one another? Is it the joins? is it the large number of records pulled? is it the number of columns in the select statement?
Would 1 record vs 10 record vs 100 record matter?

As an extremely generalized version of ranking those factors you mention in terms of performance penalty and occurrence in the queries you write, I would say:
Joins - Especially when joining on tables with no indexes for the fields you're joining on and/or with tables that have a very large amount of data.
# of Rows / Amount of Data - Again, indexes mitigate this quite a bit, just make sure you have the right ones.
# of Fields - I would say the # of fields in the SELECT clause impact performance the least in most situations.
I would say any performance-driving property is always coupled with how much data you have - sure a join might be fast when your tables have 100 rows each, but when millions of rows are in the tables, you have to start thinking about more efficient design.

Several things impact the cost of a query.
First, are there appropriate indexes for it to use. Fields that are used in a join should almost always be indexed and foreign keys are not indexed by default, the designer of the database must create them. Fields used inthe the where clasues often need indexes as well.
Next, is the where clause sargable, in other words can it use the indexes even if you have the correct ones? A bad where clause can hurt a query far more than joins or extra columns. You can't get anything but a table scan if you use syntax that prevents the use of an index such as:
LIKE '%test'
Next, are you returning more data than you need? You should never return more columns than you need and you should not be using select * in production code as it has additional work to look up the columns as well as being very fragile and subject to create bad bugs as the structure changes with time.
Are you joining to tables you don't need to be joining to? If a table returns no columns in the select, is not used in the where and doesn't filter out any records if the join is removed, then you have an unnecessary join and it can be eliminated. Unnecessary joins are particularly prevalant when you use a lot of views, especially if you make the mistake of calling views from other views (which is a buig performance killer for may reasons) Sometimes if you trace through these views that call other views, you will see the same table joined to multiple times when it would not have been necessary if the query was written from scratch instead of using a view.
Not only does returning more data than you need cause the SQL Server to work harder, it causes the query to use up more of the network resources and more of the memory of the web server if you are holding the results in memory. It is an all arouns poor choice.
Finally are you using known poorly performing techniques when a better one is available. This would include the use of cursors when a set-based alternative is better, the use of correlated subqueries when a join would be better, the use of scalar User-defined functions, the use of views that call other views (especially if you nest more than one level. Most of these poor techniques involve processing row-by-agonizing-row which is generally the worst choice in a database. To properly query datbases you need to think in terms of data sets, not processing one row at a time.
There are plenty more things that affect performance of queries and the datbase, to truly get a grip onthis subject you need to read some books onthe subject. This is too complex a subject to fully discuss in a message board.

Or should I write a query that just yanks all the columns..
No. Just today there was another question about that.
If possible, could you help me understand what aspects of a query would be relatively costly compared to one another? Is it the joins? is it the large number of records pulled? is it the number of columns in the select statement?
Any useless join or data retrieval costs you time and should be avoided. Retrieving rows from a datastore is costly. Joins can be more or less costly depending on the context, amount of indexes defined... you can examine the query plan of each query to see the estimated cost for each step.

Selecting more columns/rows will have some performance impacts, but honestly why would you want to select more data than you are going to use anyway?
If possible, could you help me
understand what aspects of a query
would be relatively costly compared to
one another?
Build the query you need, THEN worry about optimizing it if the performance doesn't meet your expectations. You are putting the horse before the cart.

To answer the following:
How costly would SELECT One, Two,
Three be compared to SELECT One, Two,
Three, ..... N-Column
This is not a matter of the select performance but the amount of time it takes to fetch the data. Select * from Table and Select ID from Table preform the same but the fetch of the data will take longer. This goes hand in hand with the number of rows returned from a query.
As for understanding preformance here is a good link
http://www.dotnetheaven.com/UploadFile/skrishnasamy/SQLPerformanceTunning03112005044423AM/SQLPerformanceTunning.aspx
Or google tsql Performance

Joins have the potential to be expensive. In the worst case scenario, when no indexes can be used, they require O(M*N) time, where M and N are the number of records in the tables. To speed things up, you can CREATE INDEX on columns that are part of the join condition.
The number of columns has little effect on the time required to find rows, but slows things down by requiring more data to be sent.

What others are saying is all true.
But typically, if you are working with tables that already have good indexes, what's most important for performance is what goes into the WHERE statement. There you have to worry more about using a field that has no index or using a statement that can't me optimized.

The difference between SELECT One, Two, Three FROM ... and SELECT One,...,N FROM ... could be like the difference between day and night. To understand the problem, you need to understand the concept of a covering index:
A covering index is a special case
where the index itself contains the
required data field(s) and can return
the data.
As you add more unnecessary columns to the projection list you are forcing the query optimizer to lookup the newly added columns in the 'table' (really in the clustered index or in the heap). This can change an execution plan from an efficient narrow index range scan or seek into a bloated clustered index scan, which can result in differences of times from sub-second to +hours, depending on your data. So projecting unnecessary columns is often the most impacting factor of a query.
The number of records pulled is a more subtle issue. With a large number, a query can hit the index tipping point and choose, again, a clustered index scan over narrower index range scan and lookup. Now the fact that lookups into the clustered index are necessary to start with means the narrow index is not covering, which ultimately may be caused by projecting unnecessary column.
And finally, joins. The question here is joins, as opposed to what else? If a join is required, there is no alternative, and that's all there is to say about this.
Ultimately, query performance is driven by one factor alone: amount of IO. And the amount of IO is driven ultimately by the access paths available to satisfy the query. In other words, by the indexing of your data. It is impossible to write efficient queries on bad indexes. It is possible to write bad queries on good indexes, but more often than not the optimizer can compensate and come up with a good plan. You should spend all your effort in better understanding index design:
Designing Indexes
SQL Server Optimization

Short answer: Dont select more fields then you need - Search for "*" in both your sourcecode and your stored procedures ;)
You allways have to consider what parts of the query will cause which costs.
If you have a good DB design, joining a few tables is usually not expensive. (Make sure you have correct indices).
The main issue with "select *" is that it will cause unpredictable behavior in your results. If you write a query like that, AND access the fields with the columnindex, you will be locked into the DB-Schema forever.
Another thing to consider is the amount of data you have to consider. You might think its trivial, but the Version2.0 of your application suddenly adds a ProfilePicture to the User table. And now the query that will select 100 Users will suddenly use up several Megabyte of bandwith.
The second thing you should consider is the number of rows you return. SQL is very powerfull at sorting and grouping, so let SQL do his job, and dont move it to the client. Limit the amount of records you return. In most applications it makes no sense to return more then 100 rows to a user at once. You might let the user choose to load more, but make it a choice he has to make.
Finally, monitor your SQL Server. Run a profiler against it, and try to find your worst queries. A SQL Query should not take longer then half a second, if it does, something is most likely messed up (Yes... there are operation that can take much longer, but those should have a reason)
Edit:
Once you found the slow query, look at the execution plan... You will see which parts of the query are expensive, and which parts work well... The Optimizer is also a tool that can be used.

I suggest you consider your queries in terms of I/O first. Disk I/O on my SATA II system is 6Gb/sec. My DDR3 memory bandwidth is 12GB/sec. I can move items in memory 16 times faster than I can retrieve from disk. (Ref Wikipedia and Tom's hardware)
The difference between getting a few columns and all the columns for your 100 rows could be the dfference in getting a single 8K page from disk to getting two or more pages from disk. When the pages are finally in memory moving two columns or all columns to a hash table is faster than any measuring tool I have.
I value the advice of the others on this topic related to database design. The design of narrow indexes, using included columns to make covering indexes, avoiding table or index scans in favor of seeks by using an appropiate WHERE clause, narrow primary keys, etc is the diffenence between having a DBA title and being a DBA.

Cost of logic in a query

I have a query that looks something like this:
select xmlelement("rootNode",
(case
when XH.ID is not null then
xmlelement("xhID", XH.ID)
else
xmlelement("xhID", xmlattributes('true' AS "xsi:nil"), XH.ID)
end),
(case
when XH.SER_NUM is not null then
xmlelement("serialNumber", XH.SER_NUM)
else
xmlelement("serialNumber", xmlattributes('true' AS "xsi:nil"), XH.SER_NUM)
end),
/*repeat this pattern for many more columns from the same table...*/
FROM XH
WHERE XH.ID = 'SOMETHINGOROTHER'
It's ugly and I don't like it, and it is also the slowest executing query (there are others of similar form, but much smaller and they aren't causing any major problems - yet). Maintenance is relatively easy as this is mostly a generated query, but my concern now is for performance. I am wondering how much of an overhead there is for all of these case expressions.
To see if there was any difference, I wrote another version of this query as:
select xmlelement("rootNode",
xmlforest(XH.ID, XH.SER_NUM,...
(I know that this query does not produce exactly the same, thing, my plan was to move the logic for handling the renaming and xsi:nil attribute to XSL or maybe to PL/SQL)
I tried to get execution plans for both versions, but they are the same. I'm guessing that the logic does not get factored into the execution plan. My gut tells me the second version should execute faster, but I'd like some way to prove that (other than writing a PL/SQL test function with timing statements before and after the query and running that code over and over again to get a test sample).
Is it possible to get a good idea of how much the case-when will cost?
Also, I could write the case-when using the decode function instead. Would that perform better (than case-statements)?

Just about anything in your SELECT list, unless it is a user-defined function which reads a table or view, or a nested subselect, can usually be neglected for the purpose of analyzing your query's performance.
Open your connection properties and set the value SET STATISTICS IO on. Check out how many reads are happening. View the query plan. Are your indexes being used properly? Do you know how to analyze the plan to see?

For the purposes of performance tuning you are dealing with this statement:
SELECT *
FROM XH
WHERE XH.ID = 'SOMETHINGOROTHER'
How does that query perform? If it returns in markedly less time than the XML version then you need to consider the performance of the functions, but I would astonished if that were the case (oh ho!).
Does this return one row or several? If one row then you have only two things to work with:
is XH.ID indexed and, if so, is the index being used?
does the "many more columns from the same table" indicate a problem with chained rows?
If the query returns several rows then ... Well, actually you have the same two things to work with. It's just the emphasis is different with regards to indexes. If the index has a very poor clustering factor then it could be faster to avoid using the index in favour of a full table scan.
Beyond that you would need to look at physical problems - I/O bottlenecks, poor interconnects, a dodgy disk. The reason why your scope for tuning the query is so restricted is because - as presented - it is a single table, single column read. Most tuning is about efficient joining. Now if XH transpires to be a view over a complex query then it is a different matter.

You can use good old tkprof to analyze statistics. One of the many forms of ALTER SESSION that turn on stats gathering. The DBMS_PROFILER package also gathers statistics if your cursor is in a PL/SQL code block.

Is a dynamic sql stored procedure a bad thing for lots of records?

I have a table with almost 800,000 records and I am currently using dynamic sql to generate the query on the back end. The front end is a search page which takes about 20 parameters and depending on if a parameter was chosen, it adds an " AND ..." to the base query. I'm curious as to if dynamic sql is the right way to go ( doesn't seem like it because it runs slow). I am contemplating on just creating a denormalized table with all my data. Is this a good idea or should I just build the query all together instead of building it piece by piece using the dynamic sql. Last thing, is there a way to speed up dynamic sql?

It is more likely that your indexing (or lack thereof) is causing the slowness than the dynamic SQL.
What does the execution plan look like? Is the same query slow when executed in SSMS? What about when it's in a stored procedure?
If your table is an unindexed heap, it will perform poorly as the number of records grows - this is regardless of the query, and a dynamic query can actually perform better as the table nature changes because a dynamic query is more likely to have its query plan re-evaluated when it's not in the cache. This is not normally an issue (and I would not classify it as a design advantage of dynamic queries) except in the early stages of a system when SPs have not been recompiled, but statistics and query plans are out of date, but the volume of data has just drastically changed.
Not the static one yet. I have with the dynamic query, but it does not give any optimizations. If I ran it with the static query and it gave suggestions, would applying them affect the dynamic query? – Xaisoft (41 mins ago)
Yes, the dynamic query (EXEC (#sql)) is probably not going to be analyzed unless you analyzed a workload file. – Cade Roux (33 mins ago)
When you have a search query across multiple tables that are joined, the columns with indexes need to be the search columns as well as the primary key/foreign key columns - but it depends on the cardinality of the various tables. The tuning analyzer should show this. – Cade Roux (22 mins ago)

I'd just like to point out that if you use this style of optional parameters:
AND (#EarliestDate is Null OR PublishedDate < #EarliestDate)
The query optimizer will have no idea whether the parameter is there or not when it produces the query plan. I have seen cases where the optimizer makes bad choices in these cases. A better solution is to build the sql that uses only the parameters you need. The optimizer will make the most efficient execution plan in these cases. Be sure to use parameterized queries so that they are reusable in the plan cache.

As previous answer, check your indexes and plan.
The question is whether you are using a stored procedure. It's not obvious from the way you worded it. A stored procedure creates a query plan when run, and keeps that plan until recompiled. With varying SQL, you may be stuck with a bad query plan. You could do several things:
1) Add WITH RECOMPILE to the SP definition, which will cause a new plan to be generated with every execution. This includes some overhead, which may be acceptable.
2) Use separate SP's, depending on the parameters provided. This will allow better query plan caching
3) Use client generated SQL. This will create a query plan each time. If you use parameterized queries, this may allow you to use cached query plans.

The only difference between "dynamic" and "static" SQL is the parsing/optimization phase. Once those are done, the query will run identically.
For simple queries, this parsing phase plus the network traffic turns out to be a significant percentage of the total transaction time, so it's good practice to try and reduce these times.
But for large, complicated queries, this processing is overall insignificant compared to the actual path chosen by the optimizer.
I would focus on optimizing the query itself, including perhaps denormalization if you feel that it's appropriate, though I wouldn't do that on a first go around myself.
Sometimes the denormalization can be done at "run time" in the application using cached lookup tables, for example, rather than maintaining this o the database.

Not a fan of dynamic Sql but if you are stuck with it, you should probably read this article:
http://www.sommarskog.se/dynamic_sql.html
He really goes in depth on the best ways to use dynamic SQL and the isues using it can create.
As others have said, indexing is the most likely culprit. In indexing, one thing people often forget to do is put an index on the FK fields. Since a PK creates an index automatically, many assume an FK will as well. Unfortunately creating an FK does nto create an index. So make sure that any fields you join on are indexed.
There may be better ways to create your dynamic SQL but without seeing the code it is hard to say. I would at least look to see if it is using subqueries and replace them with derived table joins instead. Also any dynamic SQl that uses a cursor is bound to be slow.

If the parameters are optional, a trick that's often used is to create a procedure like this:
CREATE PROCEDURE GetArticlesByAuthor (
#AuthorId int,
#EarliestDate datetime = Null )
AS
SELECT * --not in production code!
FROM Articles
WHERE AuthorId = #AuthorId
AND (#EarliestDate is Null OR PublishedDate < #EarliestDate)

There are some good examples of queries with optional search criteria here: How do I create a stored procedure that will optionally search columns?

As noted, if you are doing a massive query, Indexes are the first bottleneck to look at. Make sure that heavily queried columns are indexed. Also, make sure that your query checks all indexed parameters before it checks un-indexed parameters. This makes sure that the results are filtered down using indexes first and then does the slow linear search only if it has to. So if col2 is indexed but col1 is not, it should look as follows:
WHERE col2 = #col2 AND col1 = #col1
You may be tempted to go overboard with indexes as well, but keep in mind that too many indexes can cause slow writes and massive disk usage, so don't go too too crazy.
I avoid dynamic queries if I can for two reasons. One, they do not save the query plan, so the statement gets compiled each time. The other is that they are hard to manipulate, test, and troubleshoot. (They just look ugly).
I like Dave Kemp's answer above.

I've had some success (in a limited number of instances) with the following logic:
CREATE PROCEDURE GetArticlesByAuthor (
#AuthorId int,
#EarliestDate datetime = Null
) AS
SELECT SomeColumn
FROM Articles
WHERE AuthorId = #AuthorId
AND #EarliestDate is Null
UNION
SELECT SomeColumn
FROM Articles
WHERE AuthorId = #AuthorId
AND PublishedDate < #EarliestDate

If you are trying to optimize to below the 1s range, it may be important to gauge approximately how long it takes to parse and compile the dynamic sql relative to the actual query execution time:
SET STATISTICS TIME ON;
and then execute the dynamic SQL string "statically" and check the "Messages" tab. I was surprised by these results for a ~10 line dynamic sql query that returns two rows from a 1M row table:
SQL Server parse and compile time:
CPU time = 199 ms, elapsed time = 199 ms.
(2 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
Index optimization will doubtfully move the 199ms barrier much (except perhaps due to some analyzation/optimization included within the compile time).
However if the dynamic SQL uses parameters or is repeating than the compile results may be cached according to: See Caching Query Plans which would eliminate the compile time. Would be interesting to know how long cache entries live, size, shared between sessions, etc.

What generic techniques can be applied to optimize SQL queries?

What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?

Use primary keys
Avoid select *
Be as specific as you can when building your conditional statements
De-normalisation can often be more efficient
Table variables and temporary tables (where available) will often be better than using a large source table
Partitioned views
Employ indices and constraints

Learn what's really going on under the hood - you should be able to understand the following concepts in detail:
Indexes (not just what they are but actually how they work).
Clustered indexes vs heap allocated tables.
Text and binary lookups and when they can be in-lined.
Fill factor.
How records are ghosted for update/delete.
When page splits happen and why.
Statistics, and how they effect various query speeds.
The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).

The biggest thing you can do is to look for table scans in sql server query analyzer (make sure you turn on "show execution plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.
As an aside, when I started learning to optimize queries I ran sql server query profiler against a trace, looked at the generated SQL, and tried to figure out why that was an improvement. Query profiler is far from optimal, but it's a decent start.

There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005

Use a with statment to handle query filtering.
Limit each subquery to the minimum number of rows possible.
then join the subqueries.
WITH
master AS
(
SELECT SSN, FIRST_NAME, LAST_NAME
FROM MASTER_SSN
WHERE STATE = 'PA' AND
GENDER = 'M'
),
taxReturns AS
(
SELECT SSN, RETURN_ID, GROSS_PAY
FROM MASTER_RETURNS
WHERE YEAR < 2003 AND
YEAR > 2000
)
SELECT *
FROM master,
taxReturns
WHERE master.ssn = taxReturns.ssn
A subqueries within a with statement may end up as being the same as inline views,
or automatically generated temp tables. I find in the work I do, retail data, that about 70-80% of the time, there is a performance benefit.
100% of the time, there is a maintenance benefit.

I think using SQL query analyzer would be a good start.

In Oracle you can look at the explain plan to compare variations on your query

Make sure that you have the right indexes on the table. if you frequently use a column as a way to order or limit your dataset an index can make a big difference. I saw in a recent article that select distinct can really slow down a query, especially if you have no index.

The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.
Since adding indexes can slow down data writes you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where using a good query analysis tool can help you balanace things accordingly.

Indexes
Statistics
on microsoft stack, Database Engine Tuning Advisor

Some other points (Mine are based on SQL server, since each db backend has it's own implementations they may or may not hold true for all databases):
Avoid correlated subqueries in the select part of a statement, they are essentially cursors.
Design your tables to use the correct datatypes to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your data as varchar for instance.
If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.
If your WHERE or JOIN conditions include OR statements (which are slower) you may get better speed using a UNION statement.
UNION ALL is faster than UNION if (And only if) the two statments are mutually exclusive and return the same results either way.
NOT EXISTS is usually faster than NOT IN or using a left join with a WHERE clause of ID = null
In an UPDATE query add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!
Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be done when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 million orders in a report. Pre-calculations should be done in triggers so that they are always up-to-date is the underlying data changes. And it doesn't have to be just numbers either, we havea calculated field that concatenates names that we use in reports.
Be wary of scalar UDFs, they can be slower than putting the code in line.
Temp table tend to be faster for large data set and table variables faster for small ones. In addition you can index temp tables.
Formatting is usually faster in the user interface than in SQL.
Do not return more data than you actually need.
This one seems obvious but you would not believe how often I end up fixing this. Do not join to tables that you are not using to filter the records or actually calling one of the fields in the select part of the statement. Unnecessary joins can be very expensive.
It is an very bad idea to create views that call other views that call other views. You may find you are joining to the same table 6 times when you only need to once and creating 100,000,00 records in an underlying view in order to get the 6 that are in your final result.
In designing a database, think about reporting not just the user interface to enter data. Data is useless if it is not used, so think about how it will be used after it is in the database and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables, it is only thinking about one use case for the data.) The most complex queries affecting the most data are in reporting, so designing changes to help reporting can speed up queries (and simplify them) considerably.
Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.
And because it can't be said too often, use indexes correctly, not too many or too few. And make your WHERE clauses sargable (Able to use indexes).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas