Difference between two sql count and subquery count statements - sql

Is there a big performance difference between these two SQL count statements when performing large counts (large here means 100k+ records)?
first:
SELECT count(*) FROM table1 WHERE <some very complex conditions>
second:
SELECT count(*) FROM (SELECT * FROM table1 WHERE <some very complex conditions>) subquery_alias
I know that the first approach is right, but I want to know whether these statements will perform similarly.

The query optimizer will most likely transform your second query into the first one. There should be no measurable performance difference between those two queries.

The answer depends on the database being used. For MS SQL Server, the query optimizer will optimize the query and both will have similar performance. For other database systems, it depends on the intelligence of the query optimizer.
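If you want to measure this yourself rather than take it on faith, one minimal sketch (SQL Server syntax, reusing the placeholder predicate from the question) is to run both forms with statistics enabled and compare the reported reads and CPU time:
-- Substitute your real predicate for the placeholder before running.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT count(*) FROM table1 WHERE <some very complex conditions>;
SELECT count(*) FROM (SELECT * FROM table1 WHERE <some very complex conditions>) subquery_alias;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
If the optimizer really does collapse the derived table, the logical reads and elapsed times should come out essentially the same.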

Related

is IN(SELECT ...) bad for performance?

Suppose I have the following code:
SELECT *
FROM [myTable]
WHERE [myColumn] IN (SELECT [otherColumn] FROM [myOtherTable])
Will the subquery be executed again and again for every row?
If so, can I execute it and store its results and use them for every row instead? For example:
SELECT [otherColumn]
INTO #Results
FROM [myOtherTable]
SELECT *
FROM [myTable]
WHERE [myColumn] IN (SELECT [otherColumn] FROM #Results)
SQL Server's query optimizer is smart enough not to run the same subquery over and over again. If anything, the temp table is less optimal because of the additional steps after getting the results.
You can see this by looking at the SQL query execution plan.
Edit: After looking into this further, it can also run more than once. Apparently the query optimizer can also do a lot of interesting things, like converting your IN to a JOIN to increase performance. There's lots of information on it here: Number of times a nested query is executed
Nonetheless, view your execution plan to see what your RDBMS's query optimizer decided to do.
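One lightweight way to do that in SQL Server is to ask for the actual plan alongside the results; a sketch using the query from the question:
-- SET STATISTICS XML ON returns an XML showplan for each statement it executes,
-- so you can see whether the IN was turned into a join, a semi-join, etc.
SET STATISTICS XML ON;

SELECT *
FROM [myTable]
WHERE [myColumn] IN (SELECT [otherColumn] FROM [myOtherTable]);

SET STATISTICS XML OFF;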
Have you considered using a join instead? I think that could be best in terms of performance.
SELECT * FROM [myTable] INNER JOIN [myOtherTable]
ON ([myTable].[myColumn] = [myOtherTable].[otherColumn]);
This, however, will only work if you don't expect duplicates in myOtherTable.
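If duplicates in myOtherTable are a real possibility, a semi-join written with EXISTS gives the same filtering without multiplying rows; a sketch reusing the question's table names:
-- Semi-join: each row of myTable appears at most once,
-- even if otherColumn contains duplicate values.
SELECT t.*
FROM [myTable] t
WHERE EXISTS (
    SELECT 1
    FROM [myOtherTable] o
    WHERE o.[otherColumn] = t.[myColumn]
);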

Is there a performance or memory benefit : Select A, B, C into temp from query VS Select into temp from (Select A,B,C from table) as TTable

In SQL Server (2008 version),
I have two queries like this:
Select * into #tempTable from sampleTable
OR
Select * into #tempTable from (Select Query) as someTableName
Is there a performance or memory space benefit to either of these queries, or are both equally good?
It is known that each of them individually is better than
Insert into Temp
<Query>
But how do they compare to each other?
Updated text:
The two queries are:
Select A,B,C into #tempTable from TestTable
OR
Select * into #tempTable from (Select A,B,C from TestTable) as someTableName
All of those result in the same query plan.
SQL Server has a query optimizer, and optimizing away redundant columns is about the easiest and most basic optimization there is.
The best way to answer such questions for yourself is to look at the query plans and compare them. It is generally quite pointless to memorize the performance behavior of specific queries. It is a better approach to understand how queries are optimized and executed in general.
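A minimal sketch of that comparison in SQL Server (estimated plans only; the temp table names are renamed just to keep the two statements distinct):
-- With SHOWPLAN_TEXT on, the statements are not executed;
-- SQL Server returns the estimated plan text for each instead.
SET SHOWPLAN_TEXT ON;
GO
SELECT A, B, C INTO #tempTable1 FROM TestTable;
SELECT * INTO #tempTable2 FROM (SELECT A, B, C FROM TestTable) AS someTableName;
GO
SET SHOWPLAN_TEXT OFF;
GO
If the two plans are textually identical apart from the table names, you have your answer.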
Same performance
If you are working with very large datasets (say 10,000+ rows), the #tempTable slows the query down. If you are using this for paging or something similar, there is GROUP BY and, in newer versions, the FETCH NEXT ... ROWS syntax.
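For what it's worth, a sketch of that newer paging syntax (SQL Server 2012 and later; the ORDER BY column is just an example, since paging requires a deterministic order):
-- Page 3 at 50 rows per page: skip 100 rows, return the next 50.
SELECT A, B, C
FROM TestTable
ORDER BY A
OFFSET 100 ROWS
FETCH NEXT 50 ROWS ONLY;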

In which sequence are queries and sub-queries executed by the SQL engine?

Hello, I took a SQL test and am dubious/curious about one question:
In which sequence are queries and sub-queries executed by the SQL engine?
The answer choices were:
1. primary query -> sub query -> sub sub query, and so on
2. sub sub query -> sub query -> primary query
3. the whole query is interpreted at one time
4. there is no fixed sequence of interpretation; the query parser takes a decision on the fly
I chose the last answer (just supposing that it is the most reliable compared to the others).
Now the curiosity:
Where can I read about this, and briefly, what is the mechanism behind all of that?
Thank you.
I think answer 4 is correct. There are a few considerations:
The type of subquery - is it correlated or not? Consider:
SELECT *
FROM t1
WHERE id IN (
SELECT id
FROM t2
)
Here, the subquery is not correlated to the outer query. If the number of values in t2.id is small in comparison to t1.id, it is probably most efficient to first execute the subquery, and keep the result in memory, and then scan t1 or an index on t1.id, matching against the cached values.
But if the query is:
SELECT *
FROM t1
WHERE id IN (
SELECT id
FROM t2
WHERE t2.type = t1.type
)
here the subquery is correlated - there is no way to compute the subquery unless t1.type is known. Since the value for t1.type may vary for each row of the outer query, this subquery could be executed once for each row of the outer query.
Then again, the RDBMS may be really smart and realize there are only a few possible values for t2.type. In that case, it may still use the approach used for the uncorrelated subquery if it can guess that the cost of executing the subquery once will be cheaper than doing it for each row.
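Conceptually, one way an engine can avoid re-running the correlated subquery per row is to deduplicate it up front and join against the result. A sketch of that logical rewrite, not a claim about what any particular optimizer actually produces:
-- DISTINCT prevents row multiplication when t2 holds duplicate (id, type) pairs;
-- the result is the same set of t1 rows the correlated IN would return.
SELECT t1.*
FROM t1
JOIN (SELECT DISTINCT id, type FROM t2) d
  ON d.id = t1.id
 AND d.type = t1.type;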
Option 4 is close.
SQL is declarative: you tell the query optimiser what you want and it works out the best (subject to time/"cost" etc) way of doing it. This may vary for outwardly identical queries and tables depending on statistics, data distribution, row counts, parallelism and god knows what else.
This means there is no fixed order. But it's not quite "on the fly"
Even with identical servers, schema, queries, and data I've seen execution plans differ
The SQL engine tries to optimise the order in which (sub)queries are executed. The part deciding about that is called a query optimizer. The query optimizer knows how many rows are in each table, which tables have indexes and on what fields. It uses that information to decide what part to execute first.
If you want something to read up on these topics, get a copy of Inside SQL Server 2008: T-SQL Querying. It has two dedicated chapters on how queries are processed logically and physically in SQL Server.
It usually depends on your DBMS, but... I think the second answer is more plausible.
The primary query usually can't be calculated without the subquery results.

simple sql query

Which one is faster?
select * from parents p
inner join children c on p.id = c.pid
where p.x = 2
OR
select * from
(select * from parents where x = 2) p
inner join children c on p.id = c.pid
where p.x = 2
In MySQL, the first one is faster:
SELECT *
FROM parents p
INNER JOIN
children c
ON c.pid = p.id
WHERE p.x = 2
, since using an inline view implies generating and passing the records twice.
In other engines, they are usually optimized to use one execution plan.
MySQL is not very good at parallelizing and pipelining result streams.
Like this query:
SELECT *
FROM mytable
LIMIT 1
is instant, while this one (which is semantically identical):
SELECT *
FROM (
SELECT *
FROM mytable
) t
LIMIT 1
will first select all values from mytable, buffer them somewhere and then fetch the first record.
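A hedged way to check what your MySQL version actually does here is to look at the EXPLAIN output: older versions show a DERIVED step for the materialized inline view, while newer versions (5.7 and later) may merge the derived table away entirely.
-- Look for select_type = DERIVED (materialization) in the plan,
-- or its absence when the derived table is merged into the outer query.
EXPLAIN
SELECT *
FROM (
    SELECT *
    FROM mytable
) t
LIMIT 1;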
For Oracle, SQL Server and PostgreSQL, the queries above (and both of your queries) will most probably yield the same execution plans.
I know this is a simple case, but your first option is much more readable than the second one. As long as the two query plans are comparable I'd always opt for the more maintainable SQL code which your first example is for me.
It depends on how good the database is at optimising the query.
If the database manages to optimise the second one into the first one, they are equally fast, otherwise the first one is faster.
The first one gives more freedom for the database to optimise the query. The second one suggests a specific order of doing things. Either the database is able to see past this and optimise it into a single query, or it will run the query as two separate queries with the subquery as an intermediate result.
A database like SQL Server keeps statistics on what the database tables contain, which it uses to determine how to execute the query in the most efficient way. For example, depending on what will eliminate the most records, it can either start by joining the tables or by filtering the parents table on the condition. If you write a query that forces a specific order, that might not be the most efficient order.
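If you are curious what those statistics look like, SQL Server will show you the density and histogram information it consults; a sketch where the index name is purely hypothetical:
-- Density and histogram information the optimizer consults for parents.x;
-- 'IX_parents_x' is an assumed index name, substitute one that actually exists.
DBCC SHOW_STATISTICS ('dbo.parents', IX_parents_x);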
I'd think the first. I'm not sure if the optimizer would use any indexes on the derived table in the second query, or if it would copy all the matching rows into memory before joining back to the children.
This is why you have DBAs. It depends entirely on the DBMS, and how your tables and indexes are configured, as to which one runs the fastest.
Database tuning is not a set-and-forget operation; it should be done regularly, as the data changes, to ensure your database runs at peak performance. The question is not really meaningful without specifying:
which DBMS you are asking about.
what indexes you have on the tables.
a host of other possible configuration items (which may also depend on the DBMS, such as clustering).
You should run both those queries through the query optimizer to see which one is fastest, then start using that one. That's assuming the difference is noticeable in the first place. If the difference is minimal, go for the easiest to read/maintain.
For me, the second query is saying: I don't trust the optimizer to optimize this query, so I'll provide some 'hints'.
I'd say, trust the optimizer until it lets you down, and only then consider trying to do the optimizer's job for it.

Which is better: Distinct or Group By

Which is more efficient?
SELECT theField
FROM theTable
GROUP BY theField
or
SELECT DISTINCT theField
FROM theTable
In your example, both queries will generate the same execution plan so their performance will be the same.
However, they both have their own purpose. To make your code easier to understand, you should use distinct to eliminate duplicate rows and group by to apply aggregate operators (sum, count, max, ...).
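A small illustration of that convention, using the question's own names (COUNT is just one of the aggregates mentioned above):
-- DISTINCT when the goal is only to drop duplicates...
SELECT DISTINCT theField
FROM theTable;

-- ...GROUP BY when each group feeds an aggregate.
SELECT theField, COUNT(*) AS rowsPerValue
FROM theTable
GROUP BY theField;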
Doesn't matter; it results in the same execution plan (at least for these queries). These kinds of questions are easy to solve by enabling Query Analyzer or SSMS to show the execution plan, and perhaps the server trace statistics, after running the query.
In most cases, DISTINCT and GROUP BY generate the same plans, and their performance is usually identical
You can check the execution plan to look at the total cost of these statements. The answer may vary in different scenarios.
Hmmm...so far as I can see in the Execution Plan for running similar queries, they are identical.
In MySQL, DISTINCT seems a bit faster than GROUP BY if theField is not indexed. DISTINCT only eliminates duplicate rows, but GROUP BY seems to sort them in addition.
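As a hedged footnote on that last point: in MySQL versions before 8.0, GROUP BY implied an ordering by the grouped columns, which you could suppress explicitly; MySQL 8.0 removed the implicit sort, so the hint is unnecessary there.
-- Pre-8.0 MySQL: ORDER BY NULL tells the server not to sort the grouped result.
SELECT theField
FROM theTable
GROUP BY theField
ORDER BY NULL;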