Please help me understand why a sub-query affects the main query's use of an index (SQL)

Here is the main query without a sub-query:
SELECT * FROM
mytable AS idx
WHERE
idx.ID IN (1,2,3)
AND idx.P1 = 'galleries';
The index on this table is id_path (ID,P1)
Everything is fine at this point: the index is used, 3 rows are examined, and 2 are returned. Without the index, 9 rows would have to be examined.
Now if I replace the list of IDs with a sub-query that returns exactly the same set of IDs,
the main query still returns the correct rows, but it stops using the index and examines 9 rows as if the index never even existed.
SELECT * FROM
mytable AS idx
WHERE
idx.ID IN (SELECT idxrev.ID FROM mytable AS idxrev WHERE idxrev.ID IN (1,2,3))
AND idx.P1 = 'galleries';
My question is: why does this happen, and what could I do to make the main query use the index as before? I tried adding USE INDEX (id_path), but that just made it even worse, doing a whole table scan.

SELECT *
FROM mytable AS idx
WHERE idx.ID IN
(
SELECT idxrev.ID
FROM mytable AS idxrev
WHERE idxrev.ID IN (1,2,3)
)
AND idx.P1 = 'galleries'
MySQL's only strategy for executing semi-joins is nested loops.
It needs to take every row of idx and check it against idxrev (using the indexes for that).
Of course a better method in this case would be a HASH SEMI JOIN or just reducing your query to the original one, but MySQL is just not capable of it.
To make the query use the index, just revert to your original query :)
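If you really do need the subquery form (say, because the ID list ultimately comes from another query), a common workaround is to turn the semi-join into an explicit join on a derived table, which MySQL materializes once and can then drive through the id_path index. A sketch, reusing the question's table and index names:
SELECT idx.*
FROM mytable AS idx
JOIN (
    -- Materialized once; DISTINCT guards against duplicate IDs
    -- multiplying the joined rows.
    SELECT DISTINCT idxrev.ID
    FROM mytable AS idxrev
    WHERE idxrev.ID IN (1,2,3)
) AS ids ON ids.ID = idx.ID
WHERE idx.P1 = 'galleries';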

That's one of the great mysteries of MySQL; it doesn't cope well with subqueries. You could try changing the IN to an EXISTS, which is sometimes faster. It looks a bit silly in this example because you still use the hardcoded list, but I think that's just for testing, right?
SELECT * FROM
mytable AS idx
WHERE
EXISTS
(SELECT idxrev.ID
FROM mytable AS idxrev
WHERE
idxrev.ID = idx.ID AND
idxrev.ID IN (1,2,3))
AND idx.P1 = 'galleries';
If this doesn't help, maybe you could run two queries. First you get all the IDs and put them in a comma-separated list (using GROUP_CONCAT if you like). Then you build the second query using that value.
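A minimal sketch of that two-query approach, with the ID list carried between the queries by the application:
-- Query 1: collect the IDs as a comma-separated string.
SELECT GROUP_CONCAT(idxrev.ID) AS id_list
FROM mytable AS idxrev
WHERE idxrev.ID IN (1,2,3);
-- Suppose this returns '1,2,3'.

-- Query 2: interpolate that list in application code, restoring the
-- original, index-friendly form of the main query.
SELECT * FROM mytable AS idx
WHERE idx.ID IN (1,2,3)
AND idx.P1 = 'galleries';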

Related

How to ignore 00 (two leading zeros) in Select query?

I am not sure whether this is possible or not. I have one DB table with a field refNumber. Some of this field's values contain two leading zeros; the following is an example.
id      refNumber
10001   123
10002   00456
Now I am trying to write a query which can select from this table with or without the leading zeros (exactly two, not fewer or more). Here is an example: refNumber=123 OR refNumber=00123 should return the row with id 10001, and refNumber=00456 OR refNumber=456 should return the row with id 10002. I cannot use the LIKE operator because other records might also be returned in that case. Is it possible through a query? If not, what would be the right way to select such records? I want to avoid looping over all the rows in my application.
You need to apply the TRIM function to both the column and the value you want to filter by:
SELECT * FROM MyTable
WHERE TRIM(LEADING '0' FROM refNumber) = TRIM(LEADING '0' FROM '00123') -- here you put your desired ref number
Use TRIM():
Select * from table where TRIM(LEADING '0' FROM refnumber) IN ('123','456')
Or REPLACE(), whichever is supported. Note that REPLACE strips every zero, including interior ones, so it is only safe if the values contain no zeros other than the leading ones:
Select * from table where
replace(refnumber, '0','') IN
('123','456')
While the currently accepted answer would work, be aware that at best it would cause Db2 to do a full index scan and at worst could result in a full table scan.
That is not a particularly efficient way to return 1 or 2 records out of perhaps millions. This happens any time you use an expression over a table column in the WHERE clause.
If you know there are only ever going to be 5 digits or fewer, a better solution would be something like the following:
SELECT * FROM MyTable
WHERE refNumber in ('00123','123')
That assumes you can build the two possibilities outside the query.
If you really want to have the query deal with the two possibilities..
SELECT * FROM MyTable
WHERE refNumber in (LPAD(:value,5,'0'),LTRIM(:value, '0'))
If '00123' or '123' is passed in as :value, LPAD(:value,5,'0') yields the zero-padded form and LTRIM(:value,'0') the unpadded one, so the above query finds records with either '00123' or '123' in refNumber.
And assuming you have an index on refNumber, it does so quickly and efficiently.
If there could be an unknown number of leading zeros, then you are stuck with
SELECT * FROM MyTable
WHERE LTRIM(refNumber,'0') = LTRIM(:value, '0')
However, if your platform/version of Db2 supports indexes over an expression, you'd want to create one for efficiency's sake, making sure the index expression matches the predicate exactly:
create index myidx
on MyTable (LTRIM(refNumber, '0'))

Query using Rownum and order by clause does not use the index

I am using Oracle (Enterprise Edition 10g) and I have a query like this:
SELECT * FROM (
SELECT * FROM MyTable
ORDER BY MyColumn
) WHERE rownum <= 10;
MyColumn is indexed; however, Oracle is for some reason doing a full table scan before it cuts off the first 10 rows. So for a table with 4 million records the above takes around 15 seconds.
Now consider this equivalent query:
SELECT MyTable.*
FROM
(SELECT rid
FROM
(SELECT rowid as rid
FROM MyTable
ORDER BY MyColumn
)
WHERE rownum <= 10
)
INNER JOIN MyTable
ON MyTable.rowid = rid
ORDER BY MyColumn;
Here Oracle scans the index and finds the top 10 rowids, and then uses nested loops to find the 10 records by rowid. This takes less than a second for a 4 million table.
My first question is: why is the optimizer making such an apparently bad decision for the first query above?
And my second and most important question is: is it possible to make the first query perform better? I have a specific need to use the first query as unmodified as possible, and I am looking for something simpler than my second query above. Thank you!
Please note that for particular reasons I am unable to use the /*+ FIRST_ROWS(n) */ hint, or the ROW_NUMBER() OVER (ORDER BY column) construct.
If this is acceptable in your case, adding a WHERE ... IS NOT NULL clause will help the optimizer to use the index instead of doing a full table scan when using an ORDER BY clause:
SELECT * FROM (
SELECT * FROM MyTable
WHERE MyColumn IS NOT NULL
-- ^^^^^^^^^^^^^^^^^^^^
ORDER BY MyColumn
) WHERE rownum <= 10;
The rationale is that Oracle does not store NULL values in the index. As your query was originally written, the optimizer decided to do a full table scan: if there were fewer than 10 non-NULL values, it would have to retrieve some "NULL rows" to fill in the remaining rows. Apparently it is not smart enough to check first whether the index contains enough rows...
With the added WHERE MyColumn IS NOT NULL, you inform the optimizer that you don't want, under any circumstances, any row having NULL in MyColumn. So it can blindly use the index without worrying about hypothetical rows having NULL in MyColumn.
For the same reason, declaring the ORDER BY column as NOT NULL should prevent the optimizer from doing a full table scan. So, if you can change the schema, a cleaner option would be:
ALTER TABLE MyTable MODIFY (MyColumn NOT NULL);
See http://sqlfiddle.com/#!4/e3616/1 for various comparisons (click on view execution plan)
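If you would rather verify the effect locally than on SQL Fiddle, a quick sketch using Oracle's standard plan tools:
EXPLAIN PLAN FOR
SELECT * FROM (
  SELECT * FROM MyTable
  WHERE MyColumn IS NOT NULL
  ORDER BY MyColumn
) WHERE rownum <= 10;

-- With the IS NOT NULL predicate you should see an index access path
-- instead of a full table scan.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);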

How to get all results, except one row based on a timestamp?

I have a simple question (?) about SQL. I have come across this problem a few times before and I have always solved it, but I'm looking for a more elegant and perhaps faster solution.
The problem is that I would like to select all rows in a table except the one with the max value in a timestamp column (in this case that row is a summary row, but it's not marked as such in any way, and it's not relevant to my result).
I could do something like this:
select * from [table] t
where loggedat < (select max(loggedat) from [table] where somecolumn='somevalue')
and somecolumn='somevalue'
But when working with large tables this seems kind of slow. Any suggestions?
If you don't want to change your DB structure, then your query (or one with a slight variation using <> instead of <) is the way to go.
You could add a column IsSummary bit to the table, and always mark the most recent row as true (and all others false). Then your query would change to:
Select * from [table] where IsSummary = 0 and somecolumn = 'somevalue'
This would trade slower inserts (since an insert would also trigger an update of the IsSummary value) for faster execution of the select query.
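A minimal sketch of maintaining that flag at insert time (table and column names follow the question; the two statements belong in one transaction):
-- Demote the previous summary row...
UPDATE [table]
SET IsSummary = 0
WHERE somecolumn = 'somevalue' AND IsSummary = 1;

-- ...then insert the new row as the current summary.
INSERT INTO [table] (loggedat, somecolumn, IsSummary)
VALUES (CURRENT_TIMESTAMP, 'somevalue', 1);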
If you don't mind one tiny (4 byte) extra column, then you might go like this:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY loggedat DESC) AS rownum
FROM [table] t
WHERE somecolumn = 'somevalue'
/* and all the other filters you want */
) s
WHERE rownum > 1
In case you do mind the extra column, you'll just have to list the necessary columns explicitly in the outer SELECT.
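For instance, with hypothetical columns id, loggedat and somecolumn, it would look like:
SELECT s.id, s.loggedat, s.somecolumn
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY loggedat DESC) AS rownum
FROM [table] t
WHERE somecolumn = 'somevalue'
) s
WHERE s.rownum > 1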
It may not be the elegant SQL query you're looking for, but it would be trivial to do it in Java, PHP, etc, after fetching the results. To make it as simple as possible, use ORDER BY timestamp DESC and discard the first row.
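In SQL terms that variant is just the filtered query ordered newest-first (column names follow the question), with the skipping done in code:
SELECT *
FROM [table]
WHERE somecolumn = 'somevalue'
ORDER BY loggedat DESC;
-- In application code: skip the first result row (the max-timestamp row)
-- and process the rest.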

What do you put in a subquery's Select part when it's preceded by Exists?

What do you put in a subquery's Select part when it's preceded by Exists?
Select *
From some_table
Where Exists (Select 1
From some_other_table
Where some_condition )
I usually use 1. I used to put *, but realized it could add some useless overhead.
What do you put? Is there a more efficient way than putting 1 or any other dummy value?
I think the efficiency depends on your platform.
In Oracle, SELECT * and SELECT 1 within an EXISTS clause generate identical explain plans, with identical memory costs. There is no difference. However, other platforms may vary.
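You can check this on your own tables; a sketch using Oracle's plan tools, with a hypothetical correlation column id:
EXPLAIN PLAN FOR
SELECT *
FROM some_table st
WHERE EXISTS (SELECT 1
              FROM some_other_table sot
              WHERE sot.id = st.id);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Run it again with SELECT * inside the EXISTS and compare:
-- the two plans should come out identical.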
As a matter of personal preference, I use
SELECT *
Because SELECTing a specific field could mislead a reader into thinking that I care about that specific field, and it also lets me copy / paste that subquery out and run it unmodified, to look at the output.
However, an EXISTS clause in a SQL statement is a bit of a code smell, IMO. There are times when it is the best and clearest way to get what you want, but it can almost always be expressed as a join, which will be a lot easier for the database engine to optimize.
SELECT *
FROM SOME_TABLE ST
WHERE EXISTS(
SELECT 1
FROM SOME_OTHER_TABLE SOT
WHERE SOT.KEY_VALUE1 = ST.KEY_VALUE1
AND SOT.KEY_VALUE2 = ST.KEY_VALUE2
)
Is logically identical (assuming KEY_VALUE1 and KEY_VALUE2 uniquely identify rows in SOME_OTHER_TABLE, so the join cannot multiply rows) to:
SELECT *
FROM
SOME_TABLE ST
INNER JOIN
SOME_OTHER_TABLE SOT
ON ST.KEY_VALUE1 = SOT.KEY_VALUE1
AND ST.KEY_VALUE2 = SOT.KEY_VALUE2
I also use 1. I've seen some devs who use NULL. I think 1 is efficient compared to selecting a field, as the query won't have to fetch the actual value from its physical location when it executes the SELECT clause of the subquery.
Use:
WHERE EXISTS (SELECT NULL
FROM some_other_table
WHERE ... )
EXISTS returns true if one or more of the specified criteria match - it doesn't matter if columns are actually returned in the SELECT clause. NULL just makes it explicit that there isn't a comparison while 1/etc could be a valid value previously used in an IN clause.

SQL - Use results of a query as basis for two other queries in one statement

I'm doing a probability calculation. I have a query to calculate the total number of times an event occurs. From these events, I want to get the number of times a sub-event occurs. The query to get the total events is 25 lines long and I don't want to just copy + paste it twice.
I want to do two things with this query: calculate the number of rows in it, and calculate the number of rows in the result of a query on this query. Right now, the only way I can think of doing that is this (replace #total# with the complicated query that gets all rows, and #conditions# with the less-complicated conditions that rows from #total# must satisfy to match the sub-event):
SELECT (SELECT COUNT(*) FROM (#total#) AS t1 WHERE #conditions#) AS suboccurs,
COUNT(*) AS totaloccurs FROM (#total#) as t2
As you can see, #total# is repeated twice. Is there any way around this? Is there a better way to do what I'm trying to do?
To re-emphasize: #conditions# does depend on what #total# returns (it does things like t1.foo = bar).
Some final notes: #total# by itself takes ~250 ms. The more complicated query takes ~300 ms, so Postgres is likely doing some optimization itself. Still, the query looks terribly ugly with #total# literally pasted in twice.
If your SQL dialect supports subquery factoring, then rewriting the query using the WITH clause is an option. It allows subqueries to be used more than once; in Oracle, WITH will create them as either an inline view or a temporary table.
Here is a contrived example.
WITH
x AS
(
SELECT this
FROM there
WHERE something is true
),
y AS
(
SELECT this_other_thing
FROM somewhereelse
WHERE something else is true
),
z AS
(
SELECT COUNT(*) AS k
FROM x
)
SELECT z.k, y.*, x.*
FROM x, y, z
WHERE x.abc = y.abc
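Applied to the shape of the question, the rewrite would look something like this (a sketch; #total# and #conditions# stand in for the real SQL, as in the question):
WITH total AS (
#total#
)
SELECT (SELECT COUNT(*) FROM total AS t1 WHERE #conditions#) AS suboccurs,
COUNT(*) AS totaloccurs
FROM total AS t2;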
SELECT COUNT(*) AS totaloccurs,
COUNT(CASE WHEN #conditions# THEN 1 END) AS suboccurs
FROM (#total#) AS t1
Put the reused sub-query into a temp table, then select what you need from the temp table.
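A minimal sketch of the temp-table route in Postgres (which the asker mentions; total_events is a hypothetical name, and #total#/#conditions# again stand in for the real SQL):
-- Materialize the expensive query once.
CREATE TEMP TABLE total_events AS
#total#;

SELECT COUNT(*) AS totaloccurs FROM total_events;
SELECT COUNT(*) AS suboccurs FROM total_events AS t1 WHERE #conditions#;

-- Temp tables vanish at session end, or drop explicitly:
DROP TABLE total_events;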
@EvilTeach:
I've not seen WITH before (probably not implemented in Sybase :-( ). I like it: it does what you need in one chunk and then goes away, with even less cruft than temp tables. Cool.