How to speed up sql query execution? - sql

The task is to execute the sql query:
select * from x where user in (select user from x where id = '1')
The subquery contains about 1000 id so it takes a long time.
Maybe this question was already there, but how can I speed it up? (if it is possible to speed up please write for PL SQL and T-SQL or at least one of them).

I would start by rewriting the in condition to exists:
select *
from x
where exists (select 1 from x x1 where x.user = x.user and x1.id = 1)
Then, consider an index on x(user, id) - or x(id, user) (you can try both and see if one offers better improvement that the other).
Another possibility is to use window functions:
select *
from (
select x.*, max(case when id = 1 then 1 else 0 end) over(partition by user) flag
from x
) x
where flag = 1
This might, or might not, perform better than the not exists solution, depending on various factors.

ids are usually unique. Is it sufficient to do this?
select x.*
from x
where id in ( . . . );
You would want an index on id, if it is not already the primary key of the table.

Related

SQL - How to get values from multiple tables without being ambiguous

Apologies if this question had been asked before (it probably did). I never used SQL before and the answers I've got only got me more confused.
I need to find out if an ID exists on different tables and get the total number from all tables.
Here is my query:
select * from public.ui1, public.ui2, public.ui3 where id = '123'
So if id 123 doesn't exist in ui1 and ui2 but does exist in ui3, I'd still like to get it. (I would obviously like to get it if it exists in the other tables)
I am currently getting an ambiguous error message as id exists in all tables but I am not sure how to construct this query in the appropriate manner. I tried join but failed miserably. Any help on how to reconstruct it and a stupid proof explanation would be highly appreciated!
EDIT: What I would finally like to find out is if id = 123 exists in any of the tables.
It's a bit unclear what the result is you expect. If you want the count then you can use a UNION ALL
select 'ui1' as source_table,
count(*) as num_rows
from public.ui1
where id = 123
union all
select 'ui2',
count(*)
from public.ui2
where id = 123
union all
select 'ui3',
count(*)
from public.ui3
where id = 123
If you only want to know if the id exists in at least one of the tables (so a true/false) result you can use:
select exists (select id from ui1 where id = 123
union all
select id from ui2 where id = 123
union all
select id from ui3 where id = 123)
What I would finally like to find out is if id = 123 exists in any of the tables.
The best way to do this is probably just using exists:
select v.id,
(exists (select 1 from public.ui1 t where t.id = v.id) or
exists (select 1 from public.ui2 t where t.id = v.id) or
exists (select 1 from public.ui3 t where t.id = v.id)
) as exists_flag
from (values (123)) v(id);
As written, this returns one row per id defined in values(), along with a flag of whether or not the id exists -- the question you are asking.
This can easily be tweaked if you want additional information, such as which tables the id exists in, or the number of times each appears.

Returning the lowest integer not in a list in SQL

Supposed you have a table T(A) with only positive integers allowed, like:
1,1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18
In the above example, the result is 10. We always can use ORDER BY and DISTINCT to sort and remove duplicates. However, to find the lowest integer not in the list, I came up with the following SQL query:
select list.x + 1
from (select x from (select distinct a as x from T order by a)) as list, T
where list.x + 1 not in T limit 1;
My idea is start a counter and 1, check if that counter is in list: if it is, return it, otherwise increment and look again. However, I have to start that counter as 1, and then increment. That query works most of the cases, by there are some corner cases like in 1. How can I accomplish that in SQL or should I go about a completely different direction to solve this problem?
Because SQL works on sets, the intermediate SELECT DISTINCT a AS x FROM t ORDER BY a is redundant.
The basic technique of looking for a gap in a column of integers is to find where the current entry plus 1 does not exist. This requires a self-join of some sort.
Your query is not far off, but I think it can be simplified to:
SELECT MIN(a) + 1
FROM t
WHERE a + 1 NOT IN (SELECT a FROM t)
The NOT IN acts as a sort of self-join. This won't produce anything from an empty table, but should be OK otherwise.
SQL Fiddle
select min(y.a) as a
from
t x
right join
(
select a + 1 as a from t
union
select 1
) y on y.a = x.a
where x.a is null
It will work even in an empty table
SELECT min(t.a) - 1
FROM t
LEFT JOIN t t1 ON t1.a = t.a - 1
WHERE t1.a IS NULL
AND t.a > 1; -- exclude 0
This finds the smallest number greater than 1, where the next-smaller number is not in the same table. That missing number is returned.
This works even for a missing 1. There are multiple answers checking in the opposite direction. All of them would fail with a missing 1.
SQL Fiddle.
You can do the following, although you may also want to define a range - in which case you might need a couple of UNIONs
SELECT x.id+1
FROM my_table x
LEFT
JOIN my_table y
ON x.id+1 = y.id
WHERE y.id IS NULL
ORDER
BY x.id LIMIT 1;
You can always create a table with all of the numbers from 1 to X and then join that table with the table you are comparing. Then just find the TOP value in your SELECT statement that isn't present in the table you are comparing
SELECT TOP 1 table_with_all_numbers.number, table_with_missing_numbers.number
FROM table_with_all_numbers
LEFT JOIN table_with_missing_numbers
ON table_with_missing_numbers.number = table_with_all_numbers.number
WHERE table_with_missing_numbers.number IS NULL
ORDER BY table_with_all_numbers.number ASC;
In SQLite 3.8.3 or later, you can use a recursive common table expression to create a counter.
Here, we stop counting when we find a value not in the table:
WITH RECURSIVE counter(c) AS (
SELECT 1
UNION ALL
SELECT c + 1 FROM counter WHERE c IN t)
SELECT max(c) FROM counter;
(This works for an empty table or a missing 1.)
This query ranks (starting from rank 1) each distinct number in ascending order and selects the lowest rank that's less than its number. If no rank is lower than its number (i.e. there are no gaps in the table) the query returns the max number + 1.
select coalesce(min(number),1) from (
select min(cnt) number
from (
select
number,
(select count(*) from (select distinct number from numbers) b where b.number <= a.number) as cnt
from (select distinct number from numbers) a
) t1 where number > cnt
union
select max(number) + 1 number from numbers
) t1
http://sqlfiddle.com/#!7/720cc/3
Just another method, using EXCEPT this time:
SELECT a + 1 AS missing FROM T
EXCEPT
SELECT a FROM T
ORDER BY missing
LIMIT 1;

Select nearest but not greater-than row

Given a table action(start:DATE, length:NUMBER, type:NUMBER), with all records unique, I need to select (e.g.) length of last action with type Y before time X:
select action.length
where action.type = Y
and action.start is the biggest, but not greater than X
Proposed solution (improved):
with actionView as (select * from action where type = Y and time <= X)
select length
from actionView
where time = (select max(time) from actionView)
But this still envolves two selects.
What I would like to ask is it possible to perform some analytical or hierarchical or any other oracle magic to this query, to improve it?
(Probably, something like this algo is what I need, but I don't know how to express it in SQL:
savedAction.time = MinimalTime
foreach action in actions
if action.type = y and savedAction.time < action.time <= X
savedAction = action
return savedAction;
)
Oracle has no LIMIT (PostgreSQL, MySQL) or TOP (SQL Server) clause like other RDBMS. But you can use ROWNUM for that:
SELECT *
FROM (
SELECT length
FROM action
WHERE type = Y
AND start < X
ORDER BY start DESC
)
WHERE rownum = 1;
This way, the table will be queried once only.
The details in the manual.
In reply to Dems comment I quote from the link above:
If you embed the ORDER BY clause in a subquery and place the ROWNUM
condition in the top-level query, then you can force the ROWNUM
condition to be applied after the ordering of the rows.
You can use ROW_NUMBER() to evaluate this in a single scan...
WITH
sequenced_data
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY x ORDER BY start DESC) AS sequence_id,
*
FROM
action
WHERE
type = Y
AND start < Z
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
You don't need the PARTITION BY, but it is used where you're getting the 'max' row per goup (such as per person, or item, in your database).
I don't know about any magic, but:
with (select length, time from action where type = Y and time <= X) as typed_action
select length from typed_action
where time = (select max(time) from typed_action)
Will give you less "where" clauses to execute and a (much?) smaller temporary typed_action table.

Query where two columns are in the result of nested query

I'm writing a query like this:
select * from myTable where X in (select X from Y) and XX in (select X from Y)
Values from columns X and XX has to be in the result of the same query: select X from Y.
I think that this query is invoked twice so its senseless. Is there any other option I can write this query more efficiently? Maybe temp table?
Actually no, there isn't a smarter way to write this (without visiting Y twice) given the X that myTable.X and myTable.YY matches to may not be from the same row.
As an alternative, the EXISTS form of the query is
select *
from myTable A
where exists (select * from Y where A.X = Y.X)
and exists (select * from Y where A.XX = Y.X)
If Y contains X values of 1,2,3,4,5, and x.x = 2 and x.xx = 4, they both exist (on different records in Y) and the record from myTable should be shown in output.
EDIT: This answer previously stated that You could rewrite this using _EXISTS_ clauses which will work faster than _IN_. AS Martin has pointed out, this is not true (certainly not for SQL Server 2005 and above). See links
http://explainextended.com/2009/06/16/in-vs-join-vs-exists/
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
It will probably not be particularly efficient to try to write this query by only referencing Y once. However, given that you are using SQL Server 2008, there are variations that can be used:
Select ...
From MyTable As T
Where Exists (
Select 1
From Y
Where Y.X = T.X
Intersect
Select 1
From Y
Where Y.X = T.XX
)
Addition
Actually, I can think of a way you could do it without using Y more than once (Nothing was said about using MyTable more than once). However, this is more for academic reasons as I think that using my first solution will likely perform better:
Select ...
From MyTable As T
Where Exists (
Select 1
From Y
Where Exists(
Select 1
From MyTable1 As T1
Where T1.X = Y.X
Intersect
Select 1
From MyTable1 As T2
Where T2.XX = Y.X
)
And Y.X In(T.X, T.XX)
)
WITH
w_tmp AS(
SELECT x
FROM y
)
SELECT *
FROM myTable
WHERE x IN (SELECT x FROM w_tmp)
AND xx IN (SELECT x FROM w_tmp)
(I've read this in Oracle docs, but I think MS able to do this optimizations too)
This way optimizer knows for sure that you are doing same query and can create temporary table to cash results (But it's still up to optimizer to decide whether it's worth it. For tiny queries, overhead of creating temp table can be too high).
Also (and actually this is way more important for me), when subquery is 50 lines, it's easier for human to see, that the same thing is used in both cases. Pretty much like factoring long functions into subroutines
Docs on MSDN
Not sure what the problem is but isn't simple JOIN an answer?
SELECT t.*
FROM myTable
JOIN Y y1 ON y1.X = myTable.X
JOIN Y y2 ON y2.X = myTable.XX
or
SELECT t.*
FROM myTable, Y y1, Y y2
WHERE y1.X = myTable.X AND y2.X = myTable.XX
ADDED: if there is a strong need to eliminate a second query for Y, let's reverse the logic:
;WITH A(X)
AS (
-- this will select all values that can be found in Y and myTable X and XX fields.
SELECT Y.X -- if there are a lot of dups, add DISTINCT
FROM Y, myTable
WHERE Y.X IN (myTable.X, myTableXX)
)
-- now join back to the orignal table and filter.
SELECT t.*
FROM myTable
-- similar to what has been mentioned before
WHERE EXISTS(SELECT TOP 1 * from A where A.X = myTable.X)
AND EXISTS(SELECT TOP 1 * from A where A.X = myTable.XX)
If you don't like WITH, you may use SELECT INTO clause and create in-memory table.

SQL Return Random Numbers Not In Table

I have a table with user_ids that we've gathered from a streaming datasource of active accounts. Now I'm looking to go through and fill in the information about the user_ids that don't do much of anything.
Is there a SQL (postgres if it matters) way to have a query return random numbers not present in the table?
Eg something like this:
SELECT RANDOM(count, lower_bound, upper_bound) as new_id
WHERE new_id NOT IN (SELECT user_id FROM user_table) AS user_id_table
Possible, or would it be best to generate a bunch of random numbers with a scripted wrapper and pass those into the DB to figure out non existant ones?
It is posible. If you want the IDs to be integers, try:
SELECT trunc((random() * (upper_bound - lower_bound)) + lower_bound) AS new_id
FROM generate_series(1,upper_bound)
WHERE new_id NOT IN (
SELECT user_id
FROM user_table)
You can wrap the query above in a subselect, i.e.
SELECT * FROM (SELECT trunc(random() * (upper - lower) + lower) AS new_id
FROM generate_series(1, count)) AS x
WHERE x.new_id NOT IN (SELECT user_id FROM user_table)
I suspect you want a random sampling. I would do something like:
SELECT s
FROM generate_series(1, (select max(user_id) from users) s
LEFT JOIN users ON s.s = user_id
WHERE user_id IS NULL
order by random() limit 5;
I haven't tested this but the idea should work. If you have a lot of users and not a lot of missing id's it will perform better than the other options, but performance no matter what you do may be a problem.
My pragmatic approach would be: generate 500 random numbers and then pick one which is not in the table:
WITH fivehundredrandoms AS ( RANDOM(count, lower_bound, upper_bound) AS onerandom
FROM (SELECT generate_series(1,500)) AS fivehundred )
SELECT onerandom FROM fivehundredrandoms
WHERE onerandom NOT IN (SELECT user_id FROM user_table WHERE user_id > 0) LIMIT 1;
There is way to do what you want with recursive queries, alas it is not nice.
Suppose that you have the following table:
CREATE TABLE test (a int)
To simplify, you want to insert random numbers from 0 to 4 (random() * 5)::int that are not in the table.
WITH RECURSIVE rand (i, r, is_new) AS (
SELECT
0,
null,
false
UNION ALL
SELECT
i + 1,
next_number.v,
NOT EXISTS (SELECT 1 FROM test WHERE test.a = next_number.v)
FROM
rand r,
(VALUES ((random() * 5)::int)) next_number(v)
-- safety check to make sure we do not go into an infinite loop
WHERE i < 500
)
SELECT * FROM rand WHERE rand.is_new LIMIT 1
I'm not super sure, but PostgreSQL should be able to stop iterating once it has one result, since it knows that the query has limit 1.
Nice thing about this query is that you can replace (random() * 5)::int with any id generating function that you want