Will the partition be hit in an inner Union? - sql

I have the following SQL statement:
SELECT *
FROM (
SELECT eu_dupcheck AS dupcheck
, eu_date AS threshold
FROM WF_EU_EVENT_UNPROCESSED
WHERE eu_dupcheck IS NOT NULL
UNION
SELECT he_dupcheck AS dupcheck
, he_date AS threshold
FROM WF_HE_HISTORY_EVENT
WHERE he_dupcheck IS NOT NULL
)
WHERE threshold > sysdate - 30
The second table is partitioned by date but the first isn't. I need to know if the partition of the second table will be hit in this query, or will it do a full table scan?

I would be surprised if Oracle were smart enough to avoid a full table scan. Remember that UNION processes the data by removing duplicates. So, Oracle would have to recognize that:
The where clause is appropriate for the partitioning (this is actually easy).
That partitioning does not affect the duplicate removal (this is a bit harder, but true because the date is in the select).
Oracle has a smart optimizer, so perhaps it can recognize this situation (and it would probably avoid the full table scan for a UNION ALL). However, it is safer to move the condition into the subqueries:
SELECT *
FROM ((SELECT eu_dupcheck AS dupcheck, eu_date AS threshold
FROM WF_EU_EVENT_UNPROCESSED
WHERE eu_dupcheck IS NOT NULL AND eu_date > sysdate - 30
) UNION
(SELECT he_dupcheck AS dupcheck, he_date AS threshold
FROM WF_HE_HISTORY_EVENT
WHERE he_dupcheck IS NOT NULL AND he_date > sysdate - 30
)
) eh;

Related

SQLite sqlite3_step() hangs with big database

I'm writing a small Objective-C library that works with an embedded SQLite database.
The SQLite version I'm using is 3.7.13 (checked with SELECT sqlite_version())
My query is:
SELECT ROUND(AVG(difference), 5) as distance
FROM (
SELECT (
SELECT A.timestamp - B.timestamp
FROM ExampleTable as B
WHERE B.timestamp = (
SELECT MAX(timestamp)
FROM ExampleTable as C
WHERE C.timestamp < A.timestamp
)
) as difference
FROM ExampleTable as A
ORDER BY timestamp)
Basically it outputs the average timestamp difference between rows ordered by timestamp.
I tried the query on a sample database with 35k rows and it runs in around 100ms. So far so good.
I then tried the query on another sample database with 100k rows and it hangs at sqlite3_step() taking up 100% of CPU usage.
Since I cannot step into sqlite3_step() with the debugger, is there another way to find out where the function is hanging, or to get a debug log of what the issue is?
I also tried running other queries from my library on the 100k rows database and there is no issue, but it's also true that these are simple queries with no subquery. Maybe this is the issue?
Thanks
UPDATE
This is the output of EXPLAIN QUERY PLAN as requested:
"1","0","0","SCAN TABLE ExampleTable AS A"
"1","0","0","EXECUTE CORRELATED SCALAR SUBQUERY 2"
"2","0","0","SCAN TABLE ExampleTable AS B"
"2","0","0","EXECUTE CORRELATED SCALAR SUBQUERY 3"
"3","0","0","SEARCH TABLE ExampleTable AS C"
"1","0","0","USE TEMP B-TREE FOR ORDER BY"
"0","0","0","SCAN SUBQUERY 1"
Looking up rows by their timestamp value can be optimized with an index on this column:
CREATE INDEX whatever ON ExampleTable(timestamp);
And this query is inefficient: ORDER BY does not affect values that are averaged, and the timestamp values in B and C are always identical, so you can drop one of them:
SELECT ROUND(AVG(difference), 5) AS distance
FROM (
SELECT timestamp -
(SELECT MAX(timestamp)
FROM ExampleTable AS B
WHERE timestamp < A.timestamp)
AS difference
FROM ExampleTable AS A)
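One way to check that the simplification does not change the result is to run both forms against the same data. The sketch below uses Python's built-in sqlite3 module with an in-memory table; the sample timestamps and the index name idx_example_ts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExampleTable (timestamp INTEGER)")
# Hypothetical sample data: five timestamps with uneven gaps (3, 6, 1, 10).
conn.executemany("INSERT INTO ExampleTable VALUES (?)",
                 [(t,) for t in (10, 13, 19, 20, 30)])
# Index on the lookup column (the name is an assumption).
conn.execute("CREATE INDEX idx_example_ts ON ExampleTable(timestamp)")

# Query from the question: two nested correlated subqueries plus ORDER BY.
original = """
SELECT ROUND(AVG(difference), 5) FROM (
  SELECT (SELECT A.timestamp - B.timestamp
          FROM ExampleTable AS B
          WHERE B.timestamp = (SELECT MAX(timestamp)
                               FROM ExampleTable AS C
                               WHERE C.timestamp < A.timestamp)) AS difference
  FROM ExampleTable AS A
  ORDER BY timestamp)
"""

# Simplified query: a single correlated subquery, no ORDER BY.
simplified = """
SELECT ROUND(AVG(difference), 5) FROM (
  SELECT timestamp - (SELECT MAX(timestamp)
                      FROM ExampleTable AS B
                      WHERE timestamp < A.timestamp) AS difference
  FROM ExampleTable AS A)
"""

original_avg = conn.execute(original).fetchone()[0]
simplified_avg = conn.execute(simplified).fetchone()[0]
print(original_avg, simplified_avg)

# Print the plan so you can see whether the index is picked up.
for row in conn.execute("EXPLAIN QUERY PLAN " + simplified):
    print(row[-1])
```

The earliest row has no predecessor, so its difference is NULL and AVG skips it. The exact wording of the plan output varies between SQLite versions.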
I eventually went with this solution:
CREATE TABLE tmp AS SELECT timestamp FROM ExampleTable ORDER BY timestamp;
SELECT ROUND(AVG(difference), 5)
FROM (
SELECT (
SELECT A.timestamp - B.timestamp
FROM tmp as B
WHERE B.rowid = A.rowid-1
) as difference
FROM tmp as A
ORDER BY timestamp);
DROP TABLE tmp;
Actually I went further: I only use this strategy for tables with a large number of rows (> 40k), since the other strategy (single query) works better for "small" tables.
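A minimal sketch of the temporary-table approach, again using Python's sqlite3 module with made-up sample data. It relies on an assumption worth checking on your own database: that CREATE TABLE ... AS SELECT ... ORDER BY assigns consecutive rowids in the sorted order, so the previous row is always at rowid - 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExampleTable (timestamp INTEGER)")
# Hypothetical sample data, deliberately inserted out of order.
conn.executemany("INSERT INTO ExampleTable VALUES (?)",
                 [(t,) for t in (19, 10, 30, 13, 20)])

# Copy the timestamps in sorted order; rowids are expected to be 1..n.
conn.execute(
    "CREATE TABLE tmp AS SELECT timestamp FROM ExampleTable ORDER BY timestamp")
rowids = [r[0] for r in conn.execute("SELECT rowid FROM tmp ORDER BY timestamp")]

# Each row looks up its predecessor directly by rowid.
avg_gap = conn.execute("""
    SELECT ROUND(AVG(difference), 5)
    FROM (
      SELECT (SELECT A.timestamp - B.timestamp
              FROM tmp AS B
              WHERE B.rowid = A.rowid - 1) AS difference
      FROM tmp AS A)
""").fetchone()[0]

conn.execute("DROP TABLE tmp")
print(rowids, avg_gap)
```

The rowid lookup is a primary-key search, which is why this scales better than the correlated MAX() lookup on large tables.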

Query using Rownum and order by clause does not use the index

I am using Oracle (Enterprise Edition 10g) and I have a query like this:
SELECT * FROM (
SELECT * FROM MyTable
ORDER BY MyColumn
) WHERE rownum <= 10;
MyColumn is indexed, however, Oracle is for some reason doing a full table scan before it cuts the first 10 rows. So for a table with 4 million records the above takes around 15 seconds.
Now consider this equivalent query:
SELECT MyTable.*
FROM
(SELECT rid
FROM
(SELECT rowid as rid
FROM MyTable
ORDER BY MyColumn
)
WHERE rownum <= 10
)
INNER JOIN MyTable
ON MyTable.rowid = rid
ORDER BY MyColumn;
Here Oracle scans the index and finds the top 10 rowids, and then uses nested loops to find the 10 records by rowid. This takes less than a second for a 4 million table.
My first question is: why is the optimizer making such an apparently bad decision for the first query above?
And my second and most important question is: is it possible to make the first query perform better? I have a specific need to use the first query as unmodified as possible. I am looking for something simpler than my second query above. Thank you!
Please note that for particular reasons I am unable to use the /*+ FIRST_ROWS(n) */ hint, or the ROW_NUMBER() OVER (ORDER BY column) construct.
If this is acceptable in your case, adding a WHERE ... IS NOT NULL clause will help the optimizer to use the index instead of doing a full table scan when using an ORDER BY clause:
SELECT * FROM (
SELECT * FROM MyTable
WHERE MyColumn IS NOT NULL
-- ^^^^^^^^^^^^^^^^^^^^
ORDER BY MyColumn
) WHERE rownum <= 10;
The rationale is that Oracle does not store NULL values in the index (a B-tree index has no entry for a row whose indexed columns are all NULL). As your query was originally written, the optimizer chose a full table scan because, if there were fewer than 10 non-NULL values, it would have to retrieve some "NULL rows" to fill in the remaining rows. Apparently it is not smart enough to first check whether the index contains enough rows...
With the added WHERE MyColumn IS NOT NULL, you inform the optimizer that you don't want in any circumstances any row having NULL in MyColumn. So it can blindly use the index without worrying about hypothetical rows having NULL in MyColumn.
For the same reason, declaring the ORDER BY column as NOT NULL should prevent the optimizer from doing a full table scan. So, if you can change the schema, a cleaner option would be:
ALTER TABLE MyTable MODIFY (MyColumn NOT NULL);
See http://sqlfiddle.com/#!4/e3616/1 for various comparisons (click on view execution plan)

Oracle: SELECT where date is less if not equals null

I have a table of records, and one column holds the date on which a record becomes inactive.
Most of the records are still open, and therefore do not hold any value in the end_date column.
I want to select all of the records that are still active. One way to achieve this (off the top of my head):
select *
from table t
where nvl(t.end_date, to_date('2099-DEC-31', 'YYYY-MON-DD')) > sysdate
But it doesn't feel right. Is there a better way to achieve what I want?
EDIT: BTW, the table isn't huge, and isn't going to grow :)
select *
from table t
where nvl(t.end_date, to_date('2099-DEC-31', 'YYYY-MON-DD')) > sysdate
won't use a "normal", non function based index, so it may hurt performance.
You could query it like
select *
from table t
where t.end_date > sysdate OR t.end_date is null
instead
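To see that the two predicates select the same rows, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle. The assumptions are loud ones: COALESCE plays the role of NVL, dates are stored as ISO-8601 text, a fixed string stands in for sysdate, and the table name records and its rows are invented. It says nothing about index usage in Oracle; it only checks the logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, end_date TEXT)")
# Hypothetical rows: NULL end_date means the record is still open.
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(1, None), (2, "2010-01-31"), (3, "2030-06-30"), (4, None)])

today = "2020-01-01"  # fixed stand-in for sysdate

# Variant 1: replace NULL with a far-future default (NVL-style).
with_default = [r[0] for r in conn.execute(
    "SELECT id FROM records "
    "WHERE COALESCE(end_date, '2099-12-31') > ? ORDER BY id", (today,))]

# Variant 2: test the column directly and handle NULL explicitly.
with_or = [r[0] for r in conn.execute(
    "SELECT id FROM records "
    "WHERE end_date > ? OR end_date IS NULL ORDER BY id", (today,))]

print(with_default, with_or)
```

The second form leaves the column unwrapped, which is what allows an ordinary index on end_date to be considered.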

Minus or not equal to? Which is better?

I have a table with 3 columns cost, from_date and to_date. I have to select all the rows which do not have the dates from beginning of the month to the end of the month. That is, select rows which do not have the from_date as '1-NOV-2011' and to_date as '30-NOV-2011'. I've written 2 queries.
SELECT * FROM TABLE1 WHERE FROM_DATE <> '1-NOV-2011' OR TO_DATE <> '30-NOV-2011';
and
SELECT * FROM TABLE1 MINUS SELECT * FROM TABLE1 WHERE FROM_DATE = '1-NOV-2011' AND TO_DATE = '30-NOV-2011';
Which one will give a better performance?
Clarification
First off, the two queries are not equivalent. The following sets would produce the same results:
Set 1
Query 1
SELECT * FROM TABLE1
WHERE NOT (FROM_DATE = '1-NOV-2011' AND TO_DATE = '30-NOV-2011');
Query 2
SELECT * FROM TABLE1
MINUS SELECT * FROM TABLE1
WHERE FROM_DATE = '1-NOV-2011' AND TO_DATE = '30-NOV-2011';
Set 2
Query 1
SELECT * FROM TABLE1
WHERE FROM_DATE <> '1-NOV-2011' OR TO_DATE <> '30-NOV-2011';
Query 2
SELECT * FROM TABLE1
MINUS SELECT * FROM TABLE1
WHERE FROM_DATE = '1-NOV-2011' OR TO_DATE = '30-NOV-2011';
Answer
Now to the actual answer. The prima facie answer is that the first query (for either set) will be faster, because it involves only one table access, rather than two. However, that may not be true.
It's possible that the second query will be faster. In the first, the database will need to do a full table scan, then check each row for the disqualifying values. In the second case, it can do a full table scan without a filter to fulfill the first half of the query. For the second half, if there is an index on FROM_DATE and TO_DATE, it can use an index scan to get the disqualifying rows, then perform a set operation to remove those results from the first set.
Whether this is actually faster or not will likely depend a lot on your data. As always, the best way to determine which will be faster for your application is to perform your own benchmarks.
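Before benchmarking, it is worth confirming that the two forms return the same rows on your data. The sketch below checks Set 1 using Python's sqlite3 module as a stand-in: SQLite spells MINUS as EXCEPT, the dates are stored as ISO-8601 text, and the sample rows are invented. It also shows one difference to keep in mind: the set operation returns distinct rows, so exact duplicate rows collapse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (cost INTEGER, from_date TEXT, to_date TEXT)")
conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", [
    (100, "2011-11-01", "2011-11-30"),  # the full-month row to exclude
    (200, "2011-11-01", "2011-11-15"),
    (300, "2011-10-01", "2011-11-30"),
    (400, "2011-10-01", "2011-10-31"),
])

filter_query = """
SELECT * FROM TABLE1
WHERE NOT (from_date = '2011-11-01' AND to_date = '2011-11-30')
"""
except_query = """
SELECT * FROM TABLE1
EXCEPT
SELECT * FROM TABLE1
WHERE from_date = '2011-11-01' AND to_date = '2011-11-30'
"""

def run(sql):
    # Sort so the comparison ignores row order.
    return sorted(conn.execute(sql).fetchall())

same_without_duplicates = run(filter_query) == run(except_query)

# Add an exact duplicate of an existing row and compare row counts.
conn.execute("INSERT INTO TABLE1 VALUES (400, '2011-10-01', '2011-10-31')")
filter_count = len(run(filter_query))
except_count = len(run(except_query))
print(same_without_duplicates, filter_count, except_count)
```

If the table can contain duplicate rows or NULL dates, compare the results on real data before choosing on performance alone.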
The first one is better, since it involves only a single scan and does not contain any INs or NOT INs. Go for the first one.
I guess the first version will perform better than the second.
The SELECT happens twice in the second query.
The second one will definitely be slower. You're basically pulling two sets in the second one and doing a set difference. Only the smaller set can be pulled with an index (assuming you have indexes, and assuming Oracle doesn't do some magical optimization). The first query builds just one set and it is based on indexes.
Disclaimer: That's a simplified explanation, and I know nothing of the inner workings of Oracle, just how I would expect it to work.

Max and Min Time query

How can I show the max time in the first row and the min time in the second row, for Access using VB6?
What about:
SELECT time_value
FROM (SELECT MIN(time_column) AS time_value FROM SomeTable
UNION
SELECT MAX(time_column) AS time_value FROM SomeTable
)
ORDER BY time_value DESC;
That should do the job unless there are no rows in SomeTable (or your DBMS does not support the notation).
Simplifying per suggestion in comments - thanks!
SELECT MIN(time_column) AS time_value FROM SomeTable
UNION
SELECT MAX(time_column) AS time_value FROM SomeTable
ORDER BY time_value DESC;
If you can get two values from one query, you may improve the performance of the query using:
SELECT MIN(time_column) AS min_time,
MAX(time_column) AS max_time
FROM SomeTable;
A really good optimizer might be able to deal with both halves of the UNION version in one pass over the data (or index), but it is quite easy to imagine an optimizer tackling each half of the UNION separately and processing the data twice. If there is no index on the time column to speed things up, that could involve two table scans, which would be much slower than a single table scan for the two-value, one-row query (if the table is big enough for such things to matter).
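A small sketch comparing the two shapes of result, using Python's sqlite3 module as a stand-in for the target database (the sample times are invented). It also shows an edge case of the UNION form: when the table holds a single distinct time, MIN and MAX are equal and UNION collapses them into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (time_column TEXT)")
# Hypothetical sample data, stored as HH:MM:SS text so it sorts correctly.
conn.executemany("INSERT INTO SomeTable VALUES (?)",
                 [("09:15:00",), ("17:40:00",), ("12:05:00",)])

union_sql = """
    SELECT MIN(time_column) AS time_value FROM SomeTable
    UNION
    SELECT MAX(time_column) AS time_value FROM SomeTable
    ORDER BY time_value DESC
"""

# Two rows: max first, then min.
union_rows = conn.execute(union_sql).fetchall()

# One row with two columns, computed in a single statement.
min_time, max_time = conn.execute(
    "SELECT MIN(time_column), MAX(time_column) FROM SomeTable").fetchone()

# Edge case: only one distinct value left in the table.
conn.execute("DELETE FROM SomeTable WHERE time_column <> '09:15:00'")
single_value_rows = conn.execute(union_sql).fetchall()

print(union_rows, (min_time, max_time), single_value_rows)
```

If the caller always expects exactly two rows, UNION ALL would keep both rows in the single-value case.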