Alternatives to CTEs in Microsoft Access - vba

I work with MS Access 2010 on a daily basis and I want to know whether there are alternatives to a Common Table Expression as it is used in SQL Server, and how they affect performance.
For example, is it better to create a subquery, or is it better to call a saved query from another query? The two are essentially very similar.
Example:
SELECT A.field1, A.Date
FROM (SELECT * FROM TableB AS B WHERE B.Date = Restriction) AS A
or
SELECT A.field1, A.Date
FROM SavedQueryB AS A
SavedQueryB:
SELECT * FROM TableB AS B WHERE B.Date = Restriction
I feel having multiple queries makes it easier to debug and manage, but does it affect performance when the dataset is very large?
Also, I've seen some videos about implementing queries through VBA; however, I'm not very comfortable doing it that way yet.
Essentially: what is more efficient, and what is better practice? Any suggestions or recommendations?
I am mostly self-taught through videos and books, with some programming background (VB.NET).

For the simplest queries, such as those in your question, I doubt you would see a significant performance difference between a subquery and a "stacked query" (one which uses another saved query as its data source) approach. And perhaps the db engine would even use the same query plan for both. (If you're interested, you can use SHOWPLAN to examine the query plans.)
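(For anyone who wants to try that: Jet/ACE's SHOWPLAN is switched on with a registry value rather than a SQL command, and the exact key path varies by engine version, so treat the following as a hedged pointer rather than gospel. For Jet 4.0 the key is typically:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Debug
    JETSHOWPLAN = "ON"    (String value)
With that set, the engine appends each query's plan to a SHOWPLAN.OUT text file. For Access 2010's ACE engine, the equivalent Debug key lives under the Office 14.0 "Access Connectivity Engine\Engines" branch.)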
The primary performance driver for those two examples will be whether the db engine can use indexed retrieval to fetch the rows which satisfy the WHERE restriction. If TableB.Date is not indexed, the query will require a full table scan. That would suck badly with a very large dataset, and the performance impact from a full scan should far overshadow any difference between a subquery and a stacked query.
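To illustrate (a minimal sketch; the index name is my own, and Date is bracketed because it is a reserved word in Access):
CREATE INDEX idxTableBDate ON TableB ([Date]);
With that index in place, the engine can seek directly to the rows matching the WHERE restriction instead of scanning the whole table.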
The situation could be different with a complex subquery as Allen Browne explains:
Complex subqueries on tables with many records can be slow to run.
Among the potential fixes for subquery performance problems, he suggests ...
Use stacked queries instead of subqueries. Create a separate saved query for JET to execute first, and use it as an input "table" for your main query. This pre-processing is usually (but not always) faster than a subquery.
Likewise, try performing aggregation in one query, and then create another query that operates on the aggregated results. This post-processing can be orders of magnitude faster than a query that tries to do everything in a single query with subqueries.
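A minimal sketch of that pattern, with hypothetical table and field names (not from the question): save the aggregation as its own query, then treat it as a table in the main query.
SavedQueryTotals:
SELECT CustomerID, Sum(Amount) AS TotalAmount
FROM Orders
GROUP BY CustomerID;
Main query:
SELECT c.CompanyName, t.TotalAmount
FROM Customers AS c
INNER JOIN SavedQueryTotals AS t
ON c.CustomerID = t.CustomerID;
Jet/ACE can then evaluate the aggregation first, which is the pre-processing Browne describes.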
I think your best answer will come from testing more complex real world examples. The queries in your question are so simple that conclusions drawn from them will likely not apply to those real world queries.

This is really context-dependent, as a host of factors will decide the most efficient outcome, including data types, joined tables, indexes, and more. In essence, for simple queries like the posted SELECT statements, the two queries are equivalent, but the query optimizer of Jet/ACE (the underlying engine of MS Access) may still choose a different plan according to the structural needs of the query. Possibly, calling an external query adds a step to the execution plan, but then again subqueries can be executed as self-contained tables and then linked to the main tables.
Recall SQL's general order of operations, which differs from the typed order, as each step produces a virtual table (this is how SQL Server describes its logical query processing):
FROM clause --VT1
ON clause --VT2
WHERE clause --VT3
GROUP BY clause --VT4
HAVING clause --VT5
SELECT clause --VT6
ORDER BY clause --VT7
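One practical consequence of this order (a small illustration of my own, using SQL Server semantics and made-up names): because WHERE (VT3) is evaluated before SELECT (VT6), a column alias defined in SELECT cannot be referenced in WHERE.
-- fails in SQL Server: Profit does not exist yet when WHERE runs
SELECT Revenue - Cost AS Profit
FROM Sales
WHERE Profit > 0;
-- works: repeat the expression (or filter in an outer query)
SELECT Revenue - Cost AS Profit
FROM Sales
WHERE Revenue - Cost > 0;
(Access's own engine happens to be more forgiving about aliases in criteria, but relying on that makes queries less portable.)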
What can be said is that for stored query objects, MS Access analyzes and caches the optimized "best plan" version. This is often the argument for using stored queries over VBA string queries, since the latter are not optimized before execution. Further, Access's query object is similar to other RDBMSs' view object (though Jet/ACE does also have VIEW and PROCEDURE objects). A regular discussion in the SQL world involves your very question of efficiency and best practice: views vs. subqueries, and usually the answer comes back "it depends". So, experiment on a needs basis.
And here, CTEs are considered "inline views", denoted by the WITH clause (not yet supported in Jet/ACE). SQL programmers may use CTEs for readability and maintainability, since they avoid repeating the same statement multiple times in the body of a query. All in all, use what fits your coding practices and project requirements, then adjust as needed.
Resources
MS Access 2007 Query Performance Tips - with note on subquery performance
Intermediate Microsoft Jet SQL - with note on subqueries (just learned about the ANY/SOME/ALL)
Microsoft Jet 3.5 Performance White Paper - with note on query plans

I cannot recall where, but there are discussions out there about this topic of nested queries versus subqueries. Basically they all suggest creating saved queries and then referencing the saved query.
From personal experience, nested queries are extremely difficult to troubleshoot or modify later. Also, if they get too deep, I have experienced a performance hit.
Allen Browne has several tips and tricks listed out here
The one place I use nested queries a lot is in the criteria of action queries. That way I do not have any joins and can avoid some of the "cannot perform this operation" issues.
Finally, on using query strings in VBA: I have found it much easier to build parameter queries and then, in VBA, set a variable to the QueryDef and add in the parameters, rather than build up a query string in VBA. So much easier to troubleshoot and modify later.
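A minimal sketch of that pattern (the query, table, and parameter names here are hypothetical): declare the parameters inside the saved query itself, using Access SQL's PARAMETERS clause.
PARAMETERS [prmStart] DateTime, [prmEnd] DateTime;
SELECT *
FROM Orders
WHERE OrderDate BETWEEN [prmStart] AND [prmEnd];
Saved as, say, qryOrdersByDate, VBA can then grab it with CurrentDb.QueryDefs("qryOrdersByDate"), assign values to its Parameters collection, and open a recordset from it; only the parameter values ever change, never the SQL text.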
Just my two cents.

Related

How to Prove that using subselect queries in SQL is killing performance of server

One of my jobs is to maintain our database; we usually have trouble with poor performance when generating reports and working with that database.
When I started looking at the queries our ERP sends to the database, I saw a lot of totally needless subselect queries inside the main queries.
As I am not a member of the development team that created the program we use, they do not like it much when I criticize their code and work. Let's say they do not take my reviews seriously.
So I am asking you a few questions about subselects in SQL:
Do subselects take a lot more time than left outer joins?
Is there any blog, article, or other source that recommends against using subselects?
How can I prove that a query will be faster if we avoid the subselects?
Our database server is MSSQL 2005.
"Show, Don't Tell" - Examine and compare the query plans of the queries identified using SQL Profiler. Particularly look out for table scans and bookmark lookups (you want to see index seeks as often as possible). The 'goodness of fit' of query plans depends on up-to-date statistics, what indexes are defined, the holistic query workload.
Execution Plan Basics
Understanding More Complex Query Plans
Using SQL Server Profiler (2005 Version)
Run the queries in SQL Server Management Studio (SSMS) and turn on Query->Include Actual Execution Plan (CTRL+M)
Think yourself lucky they're only subselects (which in some cases the optimiser will convert into equivalent "join plans") and not correlated sub-queries!
Identify a query that is performing a high number of logical reads, re-write it using your preferred technique, and then show how few logical reads it does by comparison.
Here's a tip. To get the total number of logical reads performed, wrap the query in question with:
SET STATISTICS IO ON
GO
-- Run your query here
SET STATISTICS IO OFF
GO
Run your query, and switch to the messages tab in the results pane.
If you are interested in learning more, there is no better book than SQL Server 2008 Query Performance Tuning Distilled, which covers the essential techniques for monitoring, interpreting and fixing performance issues.
One thing you can do is to load SQL Profiler and show them the cost (in terms of CPU cycles, reads and writes) of the sub-queries. It's tough to argue with cold, hard statistics.
I would also check the query plan for these queries to make sure appropriate indexes are being used, and table/index scans are being held to a minimum.
In general, I wouldn't say sub-queries are bad, if used correctly and the appropriate indexes are in place.
I'm not very familiar with MSSQL, as we are using PostgreSQL in most of our applications. However, there should exist something like "EXPLAIN" which shows you the execution plan for the query. There you should be able to see the various steps that a query will produce in order to retrieve the needed data.
If you see a lot of table scans or loop joins there without any index usage, it is definitely a hint of slow query execution. With such a tool you should be able to compare the two queries (one with the join, the other without).
It is difficult to state which is the better way, because it depends heavily on which indexes the optimizer can use in each case, and, depending on the DBMS, the optimizer may be able to implicitly rewrite a subquery into a join and execute it that way.
If you really want to show which is better, you have to execute both and measure the time, CPU usage, and so on.
UPDATE:
For MSSQL it is probably this one --> QueryPlan
From my own experience both methods can be valid; for example, an EXISTS subselect can avoid a lot of processing by breaking out early.
But most of the time, queries with a lot of subselects are written by devs who do not really understand SQL and apply their classic procedural-programmer way of thinking to queries. They don't even think about joins, and produce some awful queries. So I prefer joins, and I always check subqueries. To be completely honest, I track slow queries, and my first attempt on slow queries containing subselects is to rewrite them with joins. It works much of the time.
But there are no rules establishing that subselects are bad or slower than joins; it's just that bad SQL programmers often write subselects :-)
Do subselects take a lot more time than left outer joins?
This depends on the subselect and the left outer joins.
Generally, this construct:
SELECT *
FROM mytable
WHERE mycol NOT IN
(
SELECT othercol
FROM othertable
)
is more efficient than this:
SELECT m.*
FROM mytable m
LEFT JOIN
othertable o
ON o.othercol = m.mycol
WHERE o.othercol IS NULL
See here:
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
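For reference, the third pattern from that article, NOT EXISTS, written against the same sample names:
SELECT m.*
FROM mytable m
WHERE NOT EXISTS
(
SELECT 1
FROM othertable o
WHERE o.othercol = m.mycol
)
One behavioral caveat worth knowing: if othercol can contain NULLs, NOT IN returns no rows at all, while NOT EXISTS behaves as you would expect; that alone can decide the choice regardless of speed.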
Is there any blog, article, or other source that recommends against using subselects?
I would steer clear of the blogs which blindly recommend to avoid subselects.
They are implemented for a reason and, believe it or not, the developers have put some effort into optimizing them.
How can I prove that a query will be faster if we avoid the subselects?
Write a query without the subselects which runs faster.
If you post your query here we possibly will be able to improve it. However, a version with the subselects may turn out to be faster.
Try rewriting some of the queries to eliminate the sub-select and compare runtimes.
Share and enjoy.

Correlated query vs inner join performance in SQL Server

Let's say that you want to select all rows from one table that have a corresponding row in another one (the data in the other table is not important; only the presence of a corresponding row is important). From what I know about DB2, this kind of query performs better when written as a correlated query with an EXISTS clause rather than an INNER JOIN. Is that the same for SQL Server? Or doesn't it make any difference whatsoever?
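To make the comparison concrete, here are the two forms under discussion (a sketch with made-up table and column names):
-- correlated EXISTS: asks only whether at least one match exists
SELECT t1.*
FROM Table1 AS t1
WHERE EXISTS (SELECT 1 FROM Table2 AS t2 WHERE t2.Id = t1.Id)
-- INNER JOIN: needs DISTINCT if Table2 can match a row more than once
SELECT DISTINCT t1.*
FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t2.Id = t1.Id
They are only interchangeable when the join cannot multiply rows, which is worth keeping in mind while reading the answers below.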
I just ran a test query and the two statements ended up with the exact same execution plan. Of course, for just about any performance question I would recommend running the test in your own environment; with SQL Server Management Studio this is easy (or SQL Query Analyzer if you're running 2000). Just type both statements into a query window, select Query | Include Actual Execution Plan, then run the query. Go to the results tab and you can easily see what the plans are and which one had the higher cost.
Odd: it's normally more natural for me to write these as a correlated query first, at which point I have to go back and refactor to use a join, because in my experience the SQL Server optimizer is more likely to get that right.
But don't take me too seriously. For all that I have 26K rep here and one of only 2 current SQL topic-specific badges, I'm actually pretty junior in terms of SQL knowledge (it's all about the volume! ;) ); certainly I'm no DBA. In practice, you will of course need to profile each method to gauge its actual performance. I would expect the optimizer to recognize what you're asking for and handle either query in the optimal way, but you never know until you check.
As everyone notes, it all boils down to the optimizer. I'd suggest writing it in whatever way feels more natural to you, then making sure the optimizer can figure out the most effective query plan (gather statistics, create an index, whatever). The SQL Server optimizer is pretty good overall, so long as you give it the information it needs to work with.
Use the join. It might not make much of a difference in performance if you have small tables, but if the "outer" table is very large then it will need to do the EXISTS sub-query for each row. If your tables are indexed on the common columns then it should be far quicker to do the INNER JOIN. BTW, if you want to find all rows that are NOT in the second table, use a LEFT JOIN and test for NULL in the second table--it is much faster than using EXISTS when you have very large tables and indexes.
Probably the best performance is with a join to a derived table. Exists would probably be next (and might be faster). The worst performance would be with a subquery inside the select as it would tend to run row by row instead of as a set.
However, all things being equal, and with database performance being very dependent on database design, I would try out all the possible methods and see which are faster in your circumstances.

Performance Tuning SQL - How?

How does one performance tune a SQL Query?
What tricks/tools/concepts can be used to change the performance of a SQL Query?
How can the benefits be Quantified?
What does one need to be careful of?
What tricks/tools/concepts can be used to change the performance of a SQL Query?
Using Indexes? How do they work in practice?
Normalised vs Denormalised Data? What are the performance vs design/maintenance trade offs?
Pre-processed intermediate tables? Created with triggers or batch jobs?
Restructure the query to use Temp Tables, Sub Queries, etc?
Separate complex queries into multiples and UNION the results?
Anything else?
How can performance be Quantified?
Reads?
CPU Time?
"% Query Cost" when different versions run together?
Anything else?
What does one need to be careful of?
Time to generate Execution Plans? (Stored Procs vs Inline Queries)
Stored Procs being forced to recompile
Testing on small data sets (Do the queries scale linearly, or square law, etc?)
Results of previous runs being cached
Optimising "normal case", but harming "worst case"
What is "Parameter Sniffing"?
Anything else?
Note to moderators:
This is a huge question, should I have split it up in to multiple questions?
Note To Responders:
Because this is a huge question please reference other questions/answers/articles rather than writing lengthy explanations.
I really like the book "Professional SQL Server 2005 Performance Tuning" to answer this. It's Wiley/Wrox, and no, I'm not an author, heh. But it explains a lot of the things you ask for here, plus hardware issues.
But yes, this question is way, way beyond the scope of something that can be answered in a comment box like this one.
Writing sargable queries is one of the things needed; if you don't write sargable queries, the optimizer can't take advantage of the indexes. Here is one example: Only In A Database Can You Get 1000% + Improvement By Changing A Few Lines Of Code. That query went from over 24 hours to 36 seconds.
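A small before/after sketch of sargability (the table, column, and dates here are made up):
-- non-sargable: wrapping the column in a function hides it from the index
SELECT OrderID FROM Orders WHERE YEAR(OrderDate) = 2008
-- sargable: the bare column allows an index seek on OrderDate
SELECT OrderID FROM Orders
WHERE OrderDate >= '20080101' AND OrderDate < '20090101'
Both return the same rows, but only the second can seek into an index on OrderDate.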
Of course you also need to know the difference between these 3 join types:
loop join,
hash join,
merge join
see here: http://msdn.microsoft.com/en-us/library/ms173815.aspx
Here are some basic steps to follow:
Define business requirements first
SELECT fields instead of using SELECT *
Avoid SELECT DISTINCT
Create joins with INNER JOIN, not WHERE (see the sketch after this list)
Use WHERE instead of HAVING to define filters
Proper indexing
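On the "INNER JOIN, not WHERE" point, a quick sketch with hypothetical tables; on modern SQL Server both forms typically produce the same plan, so the win is mainly clarity and avoiding accidental cross joins:
-- implicit join buried in the WHERE clause (older style)
SELECT c.Name, o.Total
FROM Customers AS c, Orders AS o
WHERE c.CustomerID = o.CustomerID
-- explicit INNER JOIN: the join condition stays separate from the filters
SELECT c.Name, o.Total
FROM Customers AS c
INNER JOIN Orders AS o ON o.CustomerID = c.CustomerID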
Here are some basic steps which we can follow to increase the performance:
Check for indexes on the primary and foreign keys of the tables involved; if the query is still slow, index the columns it actually uses.
Every index must be maintained after each write operation, so kindly do not index each and every column.
Before a batch insert, drop the indexes and then recreate them afterwards.
Select sparingly.
Use IF EXISTS instead of COUNT (a sketch follows below).
Before accusing the DBA, first check network connections.
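For the IF EXISTS point above, a minimal T-SQL sketch (the table and values are made up):
-- counts every matching row just to answer a yes/no question
IF (SELECT COUNT(*) FROM Orders WHERE CustomerID = 42) > 0
    PRINT 'customer has orders'
-- can stop at the first matching row it finds
IF EXISTS (SELECT 1 FROM Orders WHERE CustomerID = 42)
    PRINT 'customer has orders'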

Does the way you write sql queries affect performance?

Say I have a table:
Id int
Region int
Name nvarchar
select * from table1 where region = 1 and name = 'test'
select * from table1 where name = 'test' and region = 1
Will there be a difference in performance?
Assume no indexes.
Is it the same with LINQ?
Because your qualifiers are, in essence, the same (it doesn't matter what order the WHERE conditions are written in), no, there's no difference between those.
As for LINQ, you will need to know what query LINQ to SQL actually emits (you can use a SQL Profiler to find out). Sometimes the query will be the simplest query you can think of, sometimes it will be a convoluted variety of such without you realizing it, because of things like dependencies on FKs or other such constraints. LINQ also wouldn't use an * for select.
The only real way to know is to find out the SQL Server Query Execution plan of both queries. To read more on the topic, go here:
SQL Server Query Execution Plan Analysis
Should it? No. SQL is a relational algebra, and the DBMS should optimize irrespective of order within the statement.
Does it? Possibly. Some DBMSs may store data in a certain order (e.g., maintain a key of some sort) despite what they've been told. But, and here's the crux: you cannot rely on it.
You may need to switch DBMSs at some point in the future. Even a later version of the same DBMS may change its behavior. The only thing you should be relying on is what's in the SQL standard.
Regarding the query given: with no indexes or primary key on the two fields in question, you should assume that you'll need a full table scan for both cases. Hence they should run at the same speed.
I don't recommend SELECT *, because the engine has to look up the table schema before executing the query. Instead, list the table fields you want, to avoid unnecessary overhead.
And yes, the engine optimizes your queries, but help it :) with that.
Best Regards!
For simple queries, likely there is little or no difference, but yes indeed the way you write a query can have a huge impact on performance.
In SQL Server (performance issues are very database specific), a correlated subquery will usually have poor performance compared to doing the same thing in a join to a derived table.
Other things in a query that can affect performance include using sargable [1] WHERE clauses instead of non-sargable ones, selecting only the fields you need and never using SELECT * (especially not when doing a join, as at least one field is repeated), using a set-based query instead of a cursor, avoiding a wildcard as the first character in a LIKE clause, and on and on. There are very large books that devote chapters to more efficient ways to write queries.
[1] "Sargable" predicates, for those who don't know, are stage 1 predicates in DB2 parlance (and possibly other DBMSs'). Stage 1 predicates are more efficient since they can be satisfied from indexes, and DB2 applies them first.

What generic techniques can be applied to optimize SQL queries?

What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?
Use primary keys
Avoid select *
Be as specific as you can when building your conditional statements
De-normalisation can often be more efficient
Table variables and temporary tables (where available) will often be better than using a large source table
Partitioned views
Employ indices and constraints
Learn what's really going on under the hood - you should be able to understand the following concepts in detail:
Indexes (not just what they are but actually how they work).
Clustered indexes vs heap allocated tables.
Text and binary lookups and when they can be in-lined.
Fill factor.
How records are ghosted for update/delete.
When page splits happen and why.
Statistics, and how they affect various query speeds.
The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).
The biggest thing you can do is to look for table scans in SQL Server Query Analyzer (make sure you turn on "Show Execution Plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.
As an aside, when I started learning to optimize queries I ran SQL Server Profiler against a trace, looked at the generated SQL, and tried to figure out why that was an improvement. Profiler is far from optimal, but it's a decent start.
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
Use a WITH statement to handle query filtering.
Limit each subquery to the minimum number of rows possible, then join the subqueries.
WITH
master AS
(
SELECT SSN, FIRST_NAME, LAST_NAME
FROM MASTER_SSN
WHERE STATE = 'PA' AND
GENDER = 'M'
),
taxReturns AS
(
SELECT SSN, RETURN_ID, GROSS_PAY
FROM MASTER_RETURNS
WHERE YEAR < 2003 AND
YEAR > 2000
)
SELECT *
FROM master,
taxReturns
WHERE master.ssn = taxReturns.ssn
Subqueries within a WITH statement may end up being the same as inline views, or automatically generated temp tables. In the work I do (retail data), I find there is a performance benefit about 70-80% of the time.
100% of the time, there is a maintenance benefit.
I think using SQL Query Analyzer would be a good start.
In Oracle you can look at the explain plan to compare variations on your query
Make sure that you have the right indexes on the table. If you frequently use a column as a way to order or limit your dataset, an index can make a big difference. I saw in a recent article that SELECT DISTINCT can really slow down a query, especially if you have no index.
The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.
Since adding indexes can slow down data writes, you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where a good query analysis tool can help you balance things accordingly.
Indexes
Statistics
On the Microsoft stack, the Database Engine Tuning Advisor
Some other points (mine are based on SQL Server; since each database backend has its own implementation, these may or may not hold true for all databases):
Avoid correlated subqueries in the select part of a statement, they are essentially cursors.
Design your tables to use the correct datatypes, to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your dates as varchar, for instance.
If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.
If your WHERE or JOIN conditions include OR conditions (which are slower), you may get better speed using a UNION statement (see the sketch after the next point).
UNION ALL is faster than UNION if (and only if) the two statements are mutually exclusive and return the same results either way.
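A sketch of that OR-to-UNION rewrite (made-up names; UNION ALL is safe here only because the two branches cannot overlap):
-- OR can defeat index usage
SELECT OrderID FROM Orders
WHERE ShipCity = 'London' OR ShipCity IS NULL
-- the same rows as two index-friendly branches
SELECT OrderID FROM Orders WHERE ShipCity = 'London'
UNION ALL
SELECT OrderID FROM Orders WHERE ShipCity IS NULL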
NOT EXISTS is usually faster than NOT IN or using a left join with a WHERE clause testing for ID IS NULL.
In an UPDATE query add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!
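In practice that guard is just one extra predicate (hypothetical table; the last line is the whole trick):
-- rewrites every 'West' row, even the ones already Active
UPDATE Customers
SET Status = 'Active'
WHERE Region = 'West'
-- touches only the rows that actually change
UPDATE Customers
SET Status = 'Active'
WHERE Region = 'West'
AND Status <> 'Active'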
Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be computed when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 orders in a report. Pre-calculations should be done in triggers so that they are always up to date as the underlying data changes. And it doesn't have to be just numbers, either; we have a calculated field that concatenates names that we use in reports.
Be wary of scalar UDFs; they can be slower than putting the code inline.
Temp tables tend to be faster for large data sets, and table variables faster for small ones. In addition, you can index temp tables.
Formatting is usually faster in the user interface than in SQL.
Do not return more data than you actually need.
This one seems obvious, but you would not believe how often I end up fixing it: do not join to tables unless you are using them to filter the records or actually referencing one of their fields in the SELECT part of the statement. Unnecessary joins can be very expensive.
It is a very bad idea to create views that call other views that call other views. You may find you are joining to the same table six times when you only need to once, and creating millions of records in an underlying view in order to get the 6 that are in your final result.
In designing a database, think about reporting, not just the user interface for entering data. Data is useless if it is not used, so think about how it will be used after it is in the database and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables; it is only thinking about one use case for the data.) The most complex queries affecting the most data are in reporting, so designing changes to help reporting can speed up queries (and simplify them) considerably.
Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.
And because it can't be said too often, use indexes correctly, not too many or too few. And make your WHERE clauses sargable (able to use indexes).