I have a SQL statement like the following:
SELECT {SOME CASE WHEN STATEMENTS}
FROM {SUBQUERY 1} A, {SUBQUERY 2} B
WHERE {SOME JOIN CONDITIONS}
Background:
Both subqueries A and B can be executed in 15 seconds and each returns fewer than 20 rows.
The JOIN CONDITIONS are just three simple common fields bound together.
The execution plan looks fine, with a cost of only 25.
The problem is that the whole join operation takes 4 to 5 minutes to run.
Could anyone tell me what might be causing this?
Try the NO_PUSH_PRED hint:
http://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements006.htm#BABGJBJC
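For example, placed in the outer query and referencing the inline-view aliases, it could look roughly like this (just a sketch, reusing the placeholders from the question):
SELECT /*+ NO_PUSH_PRED(A) NO_PUSH_PRED(B) */
       {SOME CASE WHEN STATEMENTS}
FROM {SUBQUERY 1} A, {SUBQUERY 2} B
WHERE {SOME JOIN CONDITIONS}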
Alternatively try to rewrite the query into something like:
SELECT {SOME CASE WHEN STATEMENTS}
FROM (
SELECT * FROM (
{ SUBQUERY 1}
)
WHERE rownum > 0
) A,
(
SELECT * FROM (
{SUBQUERY 2}
)
WHERE rownum > 0
) B
WHERE {SOME JOIN CONDITIONS}
This will prevent the optimizer from pushing join predicates into the nested subqueries.
Then both subqueries should be executed using their "old" plans, and the total time should be 15 + 15 seconds plus a small amount of time to join the 40 rows from the two subqueries.
We would expect this Google BigQuery query to remove at most 10 rows of results, but it gives us zero results, even though table A has thousands of rows, all with unique ENCNTR_IDs.
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_B`
LIMIT 10
)
If we make the query self-referential, it behaves as expected: we get thousands of results with just 10 rows removed.
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_A` # <--- same table name
LIMIT 10
)
What are we doing wrong? Why does the first query give us zero results rather than just removing 10 rows of results?
Solution: Use NOT EXISTS instead of NOT IN when dealing with possible nulls:
SELECT *
FROM UNNEST([1,2,3]) i
WHERE NOT EXISTS (SELECT * FROM UNNEST([2,3,null]) i2 WHERE i=i2)
# 1
Previous guess - which turned out to be the cause:
SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3])
# 1
vs
SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3,null])
# This query returned no results.
Are there any nulls in that project.dataset.table_B?
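If there are, that explains it: a NULL returned by the subquery means the NOT IN condition can never evaluate to TRUE (it is FALSE or UNKNOWN for every row), so the outer query returns nothing. If you want to keep NOT IN anyway, one sketch is to filter the NULLs out of the subquery explicitly:
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
  SELECT ENCNTR_ID
  FROM `project.dataset.table_B`
  WHERE ENCNTR_ID IS NOT NULL
  LIMIT 10
)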
I have 2 tables: Documents and DocumentAttributes.
Documents, with relevant columns DocID and DelFlag
DocumentAttributes, with columns DocID, aID and aValue
Now I want all DocIDs and aValues with the following restriction:
SELECT
[o1].[docid]
, [o1].[aValue]
FROM [DocumentAttributes] [o1]
WHERE [o1].[aID] = 9
AND [o1].[DocID] >= 2356
AND [o1].[DocID] < 90000000
AND [o1].[DocID] NOT IN
(
SELECT
[o].[DocID]
FROM [DocumentAttributes] [o]
WHERE [o].[aID] = 2
)
AND [o1].[DocID] IN
(
SELECT
[d].[DocID]
FROM [DOCUMENTS] [d]
WHERE [d].[DELFLAG] != 2
);
So I want all IDs of documents that have no attribute with aID = 2 and that are not marked as deleted.
The SQL statement above works, but it's too slow, since I have about 1 million documents, each with at least about 10 attributes.
The three SELECTs themselves cost less than 1 second, so I guess the NOT IN is the problem.
Does anyone have an idea how to make it faster?
Thanks in advance
Rewrite the NOT IN as a LEFT JOIN:
select o1.docid, o1.aValue
from DocumentAttributes o1
left join DocumentAttributes o on o1.DocID = o.DocID and o.aID = 2
where o1.aID = 9 and o1.DocID >= 2356 and o1.DocID < 90000000
and o.DocID is null
and o1.DocID in (
select d.DocID
from DOCUMENTS d
where d.DELFLAG != 2)
Oracle supports the MINUS keyword. That means you can replace this sort of thing
where myField not in (
select someField
from etc
)
with this sort of thing:
where myField in (
select someField
from wherever
where they are available
minus
select someField
from the same tables
where I want to exclude them
)
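Applied to the question's tables, the MINUS variant might look roughly like this (an untested sketch, assuming an Oracle database):
SELECT o1.DocID, o1.aValue
FROM DocumentAttributes o1
WHERE o1.aID = 9
  AND o1.DocID >= 2356
  AND o1.DocID < 90000000
  AND o1.DocID IN
  (
      SELECT d.DocID
      FROM DOCUMENTS d
      WHERE d.DELFLAG != 2
      MINUS
      SELECT o.DocID
      FROM DocumentAttributes o
      WHERE o.aID = 2
  );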
I suggest trying both this and the left join method to see which performs better.
Subselects in NOT IN clauses can end up being executed for every row, so maybe that's one of the causes.
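Another thing worth benchmarking is a NOT EXISTS / EXISTS version, which optimizers often handle better than NOT IN; again only an untested sketch against the question's tables:
SELECT o1.DocID, o1.aValue
FROM DocumentAttributes o1
WHERE o1.aID = 9
  AND o1.DocID >= 2356
  AND o1.DocID < 90000000
  AND NOT EXISTS
  (
      SELECT 1
      FROM DocumentAttributes o
      WHERE o.DocID = o1.DocID
        AND o.aID = 2
  )
  AND EXISTS
  (
      SELECT 1
      FROM DOCUMENTS d
      WHERE d.DocID = o1.DocID
        AND d.DELFLAG != 2
  );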
Hi, I have two questions on SQL, please:
1. About this simple code:
1. with cte as (
2. select * from TABLE1)
3. select * from cte
When does the select * from TABLE1 get computed?
Is line 3 evaluated first, then line 1, then line 2, or are lines 1 and 2 evaluated first and then line 3?
2. When I do a LEFT/RIGHT JOIN I get some rows with NULL, which makes sense.
But how can I get 0 in those rows instead of NULL?
I mean for all the rows that are NULL because of the LEFT/RIGHT JOIN (with an INNER JOIN those rows would not show up at all).
Thanks!
The order of execution is up to the database; it depends on table statistics and other factors, and I've seen both orders of execution.
If you have a NULL value and you want to show zero instead, use NVL in Oracle, e.g. NVL(myColumn, 0): it returns myColumn if it is not null, otherwise 0. The equivalents are ISNULL in SQL Server and IFNULL in MySQL, or you can use the standard COALESCE, which works in all of them.
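For the join case specifically, a minimal sketch with made-up tables orders and payments (hypothetical names, just for illustration):
SELECT o.order_id,
       COALESCE(p.amount, 0) AS amount   -- shows 0 wherever the LEFT JOIN found no match
FROM orders o
LEFT JOIN payments p ON p.order_id = o.order_id;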
I don't understand what's functionally different about these two queries that would make them perform so differently. First, my initial query:
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking several seconds rather than several minutes, I played around with it and came up with this query, which is, at least in my eyes, equivalent:
SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT Scode FROM #TEMP1
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
DROP TABLE #TEMP1
The difference is that this query takes 2 seconds to execute vs. the 13 minutes above. What's going on here?
In both cases you're using a "correlated subquery", which executes for every row in XSales_Code that passes the Status = 1 AND Last_Mdt < '2014-01-01' conditions.
Think of it like this: XSales_Code is filtered by Status = 1 AND Last_Mdt < '2014-01-01', then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01... query to see if the row should be included.
Your second query executes the correlated subquery the same number of times, but it's faster because it's executing against a smaller table.
Generally, the fastest way to do a NOT IN query is to left join to the "not in" subquery and then omit any rows where the left-joined column is null. This gets rid of the correlated subquery.
SELECT * FROM XSales_Code SC
LEFT JOIN (
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
AND SC.Last_Mdt < '2014-01-01'
AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc
This is hard to explain, but try running the query above without the second-to-last line (AND whatevs.SCode IS NULL) and you'll see how whatevs.SCode has a value when the condition is "IN" and is NULL when the condition is "NOT IN".
Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN condition and plenty of other use cases, but for a NOT IN condition they tend to be slow.
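If you prefer to keep a subquery instead of a join, a NOT EXISTS version of the same query usually performs well too (an untested sketch):
SELECT *
FROM XSales_Code SC
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
  AND NOT EXISTS
  (
      SELECT 1
      FROM XTransactions_01 T
      WHERE T.SCode = SC.SCode
        AND T.Last_Mdt > '2012-01-01'
  )
ORDER BY SC.Last_Mdt DESC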
Given that I've got a table with the following, very simple content:
# select * from messages;
id | verbosity
----+-----------
1 | 20
2 | 20
3 | 20
4 | 30
5 | 100
(5 rows)
I would like to select N messages whose sum of verbosity is lower than Y (for testing purposes, let's say Y is 70; the correct result is then the messages with id 1, 2, 3).
It's really important to me that the solution be database-independent (it should work at least on Postgres and SQLite).
I was trying with something like:
SELECT * FROM messages GROUP BY id HAVING SUM(verbosity) < 70;
However, it doesn't work as expected: because id is unique, every GROUP BY id group contains a single row, so SUM(verbosity) never actually sums values across rows.
I would be very grateful for any hints/help.
SELECT m.id, sum(m1.verbosity) AS total
FROM messages m
JOIN messages m1 ON m1.id <= m.id
WHERE m.verbosity < 70 -- optional, to avoid pointless evaluation
GROUP BY m.id
HAVING SUM(m1.verbosity) < 70
ORDER BY total DESC
LIMIT 1;
This assumes a unique, ascending id like you have in your example.
In modern Postgres - or generally with modern standard SQL (but not in SQLite):
Simple CTE
WITH cte AS (
SELECT *, sum(verbosity) OVER (ORDER BY id) AS total
FROM messages
)
SELECT *
FROM cte
WHERE total < 70
ORDER BY id;
Recursive CTE
Should be faster for big tables where you only retrieve a small set.
WITH RECURSIVE cte AS (
( -- parentheses required
SELECT id, verbosity, verbosity AS total
FROM messages
ORDER BY id
LIMIT 1
)
UNION ALL
SELECT c1.id, c1.verbosity, c.total + c1.verbosity
FROM cte c
JOIN LATERAL (
SELECT *
FROM messages
WHERE id > c.id
ORDER BY id
LIMIT 1
) c1 ON c1.verbosity < 70 - c.total
WHERE c.total < 70
)
SELECT *
FROM cte
ORDER BY id;
All standard SQL, except for LIMIT.
Strictly speaking, there is no such thing as "database-independent". There are various SQL standards, but no RDBMS complies with any of them completely. LIMIT works for PostgreSQL and SQLite (and some others); use TOP 1 for SQL Server and rownum for Oracle. There is a comprehensive list on Wikipedia.
The SQL:2008 standard would be:
...
FETCH FIRST 1 ROWS ONLY
... which PostgreSQL supports - but hardly any other RDBMS.
The pure alternative that works with more systems would be to wrap it in a subquery and
SELECT max(total) FROM <subquery>
But that is slow and unwieldy.
This will work...
select *
from messages
where id<=
(
select MAX(id) from
(
select m2.id, SUM(m1.verbosity) sv
from messages m1
inner join messages m2 on m1.id <=m2.id
group by m2.id
) v
where sv<70
)
However, you should understand that SQL is designed as a set-based language rather than an iterative one, so it is designed to treat data as a set rather than on a row-by-row basis.