Which of these queries is more efficient?
select 1 as newAndClosed
from sysibm.sysdummy1
where exists (
    select 1
    from items
    where new = 1
)
and not exists (
    select 1
    from status
    where open = 1
)
select 1 as newAndClosed
from items
where new = 1
and not exists (
    select 1
    from status
    where open = 1
)
Look at the explain plan and/or profiler output. Also, measure it. Measure it using bind variables and repeated runs.
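Since sysibm.sysdummy1 suggests DB2, here is a minimal sketch of capturing a plan, assuming the explain tables already exist (for example, created from EXPLAIN.DDL); table and column names are taken from the question:

-- Capture the access plan for the second query
EXPLAIN PLAN FOR
SELECT 1 AS newAndClosed
FROM items
WHERE new = 1
  AND NOT EXISTS (SELECT 1 FROM status WHERE open = 1);

Then format the captured plan from the command line with db2exfmt -d <dbname> -1, do the same for the first query, and compare both over repeated runs.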
I think the second one is faster because, in contrast to the first one, there is no reference to sysibm.sysdummy1 that needs to be parsed and resolved.
From a simplistic point of view I'd expect query 2 to run faster because it involves fewer queries, but as Hank points out, running it through a Profiler is the best way to be sure.
They will produce different results if items contains more than one row with new = 1: EXISTS only checks for the first record matching the condition, while the second query returns one row per match. So I'd vote for the first variant, if in your actual query you don't have a relation between items and status (as in your example).
P.S. Usually I use SELECT 1 WHERE 2 = 2 when I need only one result from nowhere. If I need more: SELECT 1 UNION SELECT 2.
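To see the difference concretely, a minimal sketch with hypothetical data (table and column names come from the question; quote them if they clash with reserved words in your dialect):

CREATE TABLE items  (new  SMALLINT);
CREATE TABLE status (open SMALLINT);
INSERT INTO items VALUES (1), (1);  -- two rows with new = 1
INSERT INTO status VALUES (0);      -- nothing open
-- The first query (driven by sysibm.sysdummy1) returns exactly one row;
-- the second (FROM items WHERE new = 1) returns two rows.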
I would personally say the second query.
First, it queries the Items table directly and filters with a WHERE clause on a field of that table.
Second, it uses only one subquery instead of two.
In the end, the second example runs only two queries, while the first example runs three.
More queries will generally be more expensive for the database engine to manage than fewer queries. The exception is when you perform EXISTS checks instead of table joins: an EXISTS clause can be cheaper than a join because it can stop at the first matching row.
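A sketch of that last point, using a hypothetical relationship between the two tables (the item_id and id columns are assumptions, not from the question):

-- Semi-join via EXISTS: the scan of status can stop at the first match
SELECT i.*
FROM items i
WHERE EXISTS (
    SELECT 1
    FROM status s
    WHERE s.item_id = i.id  -- hypothetical linking columns
);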
Related
I have two queries:
select * from PRE_DETAIL_REPORT a where item = (select item from apple_skus);
select * from PRE_DETAIL_REPORT a where item IN ('100299122');
the table: APPLE_SKUS
only has one item: 100299122
When I run the first query, it takes 2 minutes to execute
When I run the second query, it takes 3 seconds to execute
What can be the reason?
You can rewrite it this way:
select a.* from PRE_DETAIL_REPORT a
join apple_skus t on t.item = a.item;
It's the way SQL query syntax works.
You have hardcoded values in your second query, but in the first case you have a subquery, so again a FROM clause and then a SELECT. Querying a table will take more time than comparing against a hardcoded value, even when there's a single record.
You could try EXISTS, as it uses a correlated subquery, which should be much faster:
SELECT *
FROM PRE_DETAIL_REPORT t1
WHERE EXISTS (
    SELECT 1
    FROM apple_skus
    WHERE item = t1.item
);
It’s very likely that the difference is due to a different access to PRE_DETAIL_REPORT; and as mentioned earlier by someone, an explain plan (or SQL Monitor report) will tell you the answer.
But until you provide the diagnostic, this is just a guess…
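The SQL Monitor mention suggests Oracle; if so, a quick way to get that diagnostic is:

EXPLAIN PLAN FOR
SELECT * FROM PRE_DETAIL_REPORT a
WHERE item = (SELECT item FROM apple_skus);

-- Display the plan just captured
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);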
I have two views: current_campaign and last_campaign:
current_campaign always has one row or none.
last_campaign always has one row.
I have another view that needs to get information from one of these: contributors.
If current_campaign has one row, I need to give it the preference.
If current_campaign is empty, then I can get information from last_campaign.
Is that possible?
Since at most 1 row per view seems to be given, there is a very simple and cheap solution:
-- CREATE VIEW contributors AS
TABLE current_campaign
UNION ALL
TABLE last_campaign -- assuming matching row type
LIMIT 1; -- applies to the whole query
If that was an over-simplification:
-- CREATE VIEW contributors AS
SELECT * FROM current_campaign
WHERE ...
UNION ALL
SELECT * FROM last_campaign
WHERE ...
LIMIT 1;
It would be a waste of time to count rows in current_campaign or run an EXISTS semi-join, since LIMIT 1 does everything you need automatically. Postgres stops executing as soon as enough rows are found to satisfy the LIMIT (1 in this case). You'll see "(never executed)" in the output of EXPLAIN ANALYZE for any later SELECT in the list. See links below for more.
This is an implementation detail that only works for UNION ALL (not UNION) and without an outer ORDER BY or other clauses that would force Postgres to consider all rows. I would expect other RDBMS to behave the same, but I only know about Postgres. It's guaranteed to work in all versions up to the current 9.5.
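You can verify the early exit yourself by running EXPLAIN ANALYZE over the view's query:

EXPLAIN ANALYZE
SELECT * FROM current_campaign
UNION ALL
SELECT * FROM last_campaign
LIMIT 1;
-- When current_campaign yields a row, the node scanning last_campaign
-- is reported as "(never executed)".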
About the short syntax TABLE current_campaign:
Is there a shortcut for SELECT * FROM in psql?
Related, with more explanation, the same a bit more verbose:
Way to try multiple SELECTs till a result is available?
Sum results of a few queries and then find top 5 in SQL
I assume you mean view or table. Vision isn't really a SQL vocabulary term. Here is one way using union all and not exists:
select cc.*
from current_campaign cc
union all
select lc.*
from last_campaign lc
where not exists (select 1
from current_campaign cc
where cc.campaignId = lc.campaignId
);
How can I get distinct values from multiple fields within one table with just one request?
Option 1
SELECT WM_CONCAT(DISTINCT(FIELD1)) FIELD1S,WM_CONCAT(DISTINCT(FIELD2)) FIELD2S,..FIELD10S
FROM TABLE;
WM_CONCAT is LIMITED
Option 2
select DISTINCT(FIELD1) FIELDVALUE, 'FIELD1' FIELDNAME
FROM TABLE
UNION
select DISTINCT(FIELD2) FIELDVALUE, 'FIELD2' FIELDNAME
FROM TABLE
... FIELD 10
is just too slow
If you were scanning a small range of the data (not full-scanning the whole table), you could use WITH to optimise your query, e.g.:
WITH a AS
(SELECT field1,field2,field3..... FROM TABLE WHERE condition)
SELECT field1 FROM a
UNION
SELECT field2 FROM a
UNION
SELECT field3 FROM a
.....etc
For my problem, I had:

WL1   WL2   correlation
A     B     0.8
B     A     0.8
A     C     0.9
C     A     0.9

How can I eliminate the symmetry from this table?
select WL1, WL2, correlation
from table
where least(WL1, WL2) || greatest(WL1, WL2) = WL1 || WL2
order by WL1
this gives:

WL1   WL2   correlation
A     B     0.8
A     C     0.9
:)
The best option in SQL is the UNION, though you may be able to save some performance by taking out the distinct keywords:
select FIELD1 FROM TABLE
UNION
select FIELD2 FROM TABLE
UNION returns the unique set across the two selects, so distinct is redundant in this case. There simply isn't any way to write this query differently to make it perform faster. There's no magic formula that makes searching 200,000+ rows faster. It's got to search every row of the table twice and sort for uniqueness, which is exactly what UNION will do.
The only way you can make it faster is to create separate indexes on the two fields (maybe) or pare down the set of data that you're searching across.
Alternatively, if you're doing this a lot and adding new fields rarely, you could use a materialized view to store the result and only refresh it periodically.
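A sketch of the materialized-view idea in Oracle syntax (the view name, refresh schedule, and the my_table placeholder are assumptions):

CREATE MATERIALIZED VIEW distinct_fields_mv
  REFRESH COMPLETE
  START WITH SYSDATE NEXT SYSDATE + 1  -- refresh once a day
AS
SELECT FIELD1 AS fieldvalue FROM my_table
UNION
SELECT FIELD2 FROM my_table;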
Incidentally, your second query doesn't appear to do what you want it to. DISTINCT always applies to all of the columns in the select list, so your constants with the field names will cause the query to always return separate rows for the two columns.
I've come up with another method that, experimentally, seems to be a little faster. In effect, this allows us to trade one full-table scan for a Cartesian join. In most cases, I would still opt to use the union, as it's much more obvious what the query is doing.
SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END
FROM table
CROSS JOIN (SELECT LEVEL lvl
FROM DUAL
CONNECT BY LEVEL <= 2);
It's also worthwhile to add that I tested both queries on a table without useful indexes containing 800,000 rows and it took roughly 45 seconds (returning 145,000 rows). However, most of that time was spent actually fetching the records, not running the query (the query took 3-7 seconds). If you're getting a sizable number of rows back, it may simply be the number of rows that is causing the performance issue you're seeing.
When you get distinct values from multiple columns, it won't return a data table. Consider the following data:

Column A   Column B
10         50
30         50
10         50

When you take the distinct values, you get 2 rows from the first column and 1 row from the second column. It simply won't work.
What about something like this?
SELECT 'FIELD1',FIELD1, 'FIELD2',FIELD2,...
FROM TABLE
GROUP BY FIELD1,FIELD2,...
I'm using sql-server 2005 and ASP.NET with C#.
I have Users table with
userId(int),
userGender(tinyint),
userAge(tinyint),
userCity(tinyint)
(simplified version of course)
For a userID I pass to the query, I need to always select two users of the opposite gender, in an age range of -5 to +10 years, and from the same city.
The important fact is that it always must be two, so I created a condition: if @@ROWCOUNT < 2, re-select without the age and city filters.
Now the problem is that I sometimes have two returned result sets, because of that first @@ROWCOUNT check on the table. If I run the query, will it be a problem to use the DataReader object to always read from the second result set? Is there any other way to check how many rows were selected without performing a select that returns results?
Can you simplify it by using SELECT TOP 2?
Update: I would perform both selects all the time, union the results, and then select from them based on an order (using SELECT TOP 2), as the union may have added more than two. It's important that this next select takes the rows in order of importance, i.e. it prefers rows from your first select.
Alternatively, have the reader logic read the next result set if there is one, and leave the SQL alone.
To avoid getting two separate result sets you can do your first SELECT into a table variable and then do your @@ROWCOUNT check. If >= 2 then just select from the table variable on its own, otherwise select the results of the table variable UNION ALLed with the results of the second query.
Edit: There is a slight overhead to using table variables, so you'd need to balance whether this is cheaper than Adam's suggestion to just perform the 'UNION' as a matter of routine, by looking at the execution stats for both approaches:
SET STATISTICS IO ON
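A sketch of the table-variable approach (T-SQL; @userId, the self-join against Users, and the abbreviated column list are assumptions based on the question):

DECLARE @userId INT;
SET @userId = 42;  -- the userID passed to the query (hypothetical value)
DECLARE @matches TABLE (userId INT, userGender TINYINT, userAge TINYINT, userCity TINYINT);

INSERT INTO @matches
SELECT TOP 2 u.userId, u.userGender, u.userAge, u.userCity
FROM Users u
JOIN Users me ON me.userId = @userId
WHERE u.userId <> me.userId
  AND u.userGender <> me.userGender
  AND u.userAge BETWEEN me.userAge - 5 AND me.userAge + 10
  AND u.userCity = me.userCity;

IF @@ROWCOUNT >= 2
    SELECT * FROM @matches;
ELSE
    -- Deduplication and preference ordering are omitted for brevity
    SELECT TOP 2 * FROM
    (   SELECT * FROM @matches
        UNION ALL
        SELECT TOP 2 u.userId, u.userGender, u.userAge, u.userCity
        FROM Users u
        JOIN Users me ON me.userId = @userId
        WHERE u.userId <> me.userId
          AND u.userGender <> me.userGender
    ) AS fallback;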
Would something along the following lines be of use...
SELECT TOP 2 *
FROM (SELECT TOP 2 1 AS prio, M2.*
      FROM my_table M1
      JOIN my_table M2
        ON M1.userGender <> M2.userGender
       AND M2.userAge BETWEEN M1.userAge - 5 AND M1.userAge + 10
       AND M1.userCity = M2.userCity
      WHERE M1.userID = @supplied_user_id
      UNION
      SELECT TOP 2 2 AS prio, M2.*
      FROM my_table M1
      JOIN my_table M2
        ON M1.userGender <> M2.userGender
      WHERE M1.userID = @supplied_user_id
     ) AS candidates
ORDER BY prio;
I haven't tried it as I have no SQL Server and there may be dialect issues.
I just recently learned of the existence of the new "EXCEPT" clause in SQL Server (a bit late, I know...) through reading code written by a co-worker. It truly amazed me!
But then I have some questions regarding its usage: when is it recommended to be employed? Is there a difference, performance-wise, between using it versus a correlated query employing "AND NOT EXISTS..."?
After reading EXCEPT's article in the BOL I thought it was just a shorthand for the second option, but was surprised when I rewrote a couple of queries that used it (so they had the "AND NOT EXISTS" syntax much more familiar to me) and then checked the execution plans. Surprise! The EXCEPT version had a shorter execution plan, and executed faster, too. Is this always so?
So I'd like to know: what are the guidelines for using this powerful tool?
EXCEPT treats NULL values as matching.
This query:
WITH q (value) AS
(
SELECT NULL
UNION ALL
SELECT 1
),
p (value) AS
(
SELECT NULL
UNION ALL
SELECT 2
)
SELECT *
FROM q
WHERE value NOT IN
(
SELECT value
FROM p
)
will return an empty rowset.
This query:
WITH q (value) AS
(
SELECT NULL
UNION ALL
SELECT 1
),
p (value) AS
(
SELECT NULL
UNION ALL
SELECT 2
)
SELECT *
FROM q
WHERE NOT EXISTS
(
SELECT NULL
FROM p
WHERE p.value = q.value
)
will return
NULL
1
, and this one:
WITH q (value) AS
(
SELECT NULL
UNION ALL
SELECT 1
),
p (value) AS
(
SELECT NULL
UNION ALL
SELECT 2
)
SELECT *
FROM q
EXCEPT
SELECT *
FROM p
will return:
1
A recursive reference is also allowed in the EXCEPT clause of a recursive CTE, though it behaves in a strange way: it returns everything except the last row of the previous set, not everything except the whole previous set:
WITH q (value) AS
(
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
),
rec (value) AS
(
SELECT value
FROM q
UNION ALL
SELECT *
FROM (
SELECT value
FROM q
EXCEPT
SELECT value
FROM rec
) q2
)
SELECT TOP 10 *
FROM rec
---
1
2
3
-- original set
1
2
-- everything except the last row of the previous set, that is 3
1
3
-- everything except the last row of the previous set, that is 2
1
2
-- everything except the last row of the previous set, that is 3, etc.
1
SQL Server developers must just have forgotten to forbid it.
I have done a lot of analysis of EXCEPT, NOT EXISTS, NOT IN and LEFT OUTER JOIN. Generally the LEFT OUTER JOIN is fastest for finding missing rows, especially when joining on a primary key. NOT IN can be very fast if you know the list returned in the select will be small.
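The pattern in question, as a generic sketch (t1, t2, and id are hypothetical names):

-- Rows in t1 that have no matching row in t2
SELECT t1.*
FROM t1
LEFT OUTER JOIN t2 ON t2.id = t1.id
WHERE t2.id IS NULL;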
I use EXCEPT a lot to compare what is being returned when rewriting code. Run the old code, saving the results. Run the new code, saving the results, and then use EXCEPT to capture all the differences. It is a very quick and easy way to find differences, especially when you need all differences, including NULLs. Very good for easy coding on the fly.
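For instance, assuming the two runs were saved into hypothetical tables old_results and new_results, compare in both directions:

SELECT * FROM old_results
EXCEPT
SELECT * FROM new_results;  -- rows the rewrite lost

SELECT * FROM new_results
EXCEPT
SELECT * FROM old_results;  -- rows the rewrite added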
But every situation is different. As I say to every developer I have ever mentored: try it. Do timings all different ways. Try it, time it, do it.
EXCEPT compares all (paired) columns of two full-selects.
NOT EXISTS compares two or more tables according to the conditions specified in the WHERE clause of the subquery following the NOT EXISTS keyword.
EXCEPT can be rewritten by using NOT EXISTS.
(EXCEPT ALL can be rewritten by using ROW_NUMBER and NOT EXISTS.)
Got this from here
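A sketch of that rewrite, for hypothetical single-column tables t1 and t2 (the explicit NULL check is needed because EXCEPT treats NULLs as matching):

-- EXCEPT form
SELECT col FROM t1
EXCEPT
SELECT col FROM t2;

-- Equivalent NOT EXISTS form (DISTINCT because EXCEPT also removes duplicates)
SELECT DISTINCT t1.col
FROM t1
WHERE NOT EXISTS (
    SELECT 1
    FROM t2
    WHERE t2.col = t1.col
       OR (t2.col IS NULL AND t1.col IS NULL)
);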
There is no accounting for SQL Server's execution plans. When I have had performance issues, I have always found it utterly arbitrary (from a user's perspective; I'm sure the algorithm writers would understand why) when one syntax produced a better execution plan than another.
In this case, something about the query's column comparison allows SQL Server to figure out a shortcut that it couldn't from a straight select statement. I'm sure that is a deficiency in the algorithm. In other words, you could logically infer the same thing, but the algorithm doesn't make that translation on an EXISTS query. Sometimes that is because an algorithm that could reliably figure it out would take longer to execute than the query itself, or at least the algorithm designer thought so.
If your query is fine-tuned, then there is no performance difference between using an EXCEPT clause and NOT EXISTS/NOT IN. The first time I ran EXCEPT after converting my correlated query, I was surprised: it returned the result in just 7 seconds, while the correlated query took 22 seconds. Then I added a DISTINCT clause to my correlated query and reran it, and it also returned in 7 seconds. So EXCEPT is good when you don't know how, or don't have time, to fine-tune your query; otherwise both are the same performance-wise.