How to use indexes when trying to make null values come last?

I am using sqlite3 and I am trying to retrieve all rows ordered by some column row1, with null values coming last. At the moment I am using this kind of query:
select * from table order by row1 is null, row1 asc
As there are many rows in my table, the query ran quite slowly, so I decided to create an index on table(row1).
Creating the index dramatically improved the speed of queries like:
select * from table order by row1 asc
However, sqlite doesn't seem to use that index for "order by row1 is null" type queries.
Why can't sqlite use that index to simply move the rows with null values to the end?
Is there any way I can make null values come last without every row being evaluated again every time?

SQLite 3.9.0 and later support expressions in indexes:
> CREATE TABLE t(x);
> CREATE INDEX tnx ON t(x IS NULL, x);
> EXPLAIN QUERY PLAN SELECT * FROM t ORDER BY x IS NULL, x;
0|0|0|SCAN TABLE t USING COVERING INDEX tnx
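Applied to the question's own schema, that would look something like this (MyTable stands in for the question's table; the index name is illustrative):
-- Assumes SQLite 3.9.0 or later.
CREATE INDEX idx_row1_nulls_last ON MyTable(row1 IS NULL, row1);
-- The ORDER BY terms must match the indexed expressions exactly
-- for this index to satisfy the sort:
SELECT * FROM MyTable ORDER BY row1 IS NULL, row1;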
In earlier versions, you can split the query into two subqueries, each of which can use an index:
SELECT *
FROM (SELECT *
      FROM MyTable
      WHERE row1 IS NOT NULL
      ORDER BY row1)
UNION ALL
SELECT *
FROM MyTable
WHERE row1 IS NULL;
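Both halves of the UNION ALL can then be served by the ordinary single-column index from the question (index name illustrative):
CREATE INDEX idx_row1 ON MyTable(row1);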

You can use a conditional expression in the ORDER BY:
select * from table
order by case when row1 is null then 1 else 0 end, row1
The default sort order is ascending, so ASC has been omitted from the query above.
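Note that SQLite 3.30.0 and later (not available when this was asked) also accept the standard NULLS LAST syntax, which expresses the same ordering without a CASE expression; whether an index can satisfy it still depends on the query plan:
SELECT * FROM MyTable ORDER BY row1 ASC NULLS LAST;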

Related

Query with rownum got slow

I have a table with 21 million records, of which 20 million meet the criterion col1 = 'text'. I then started iteratively setting col2 to a non-NULL value. After I had updated 10 million records, the following query, which was fast in the beginning, became slow:
SELECT T_PK
FROM (SELECT T_PK
      FROM table
      WHERE col1 = 'text' AND col2 IS NULL
      ORDER BY T_PK DESC)
WHERE ROWNUM < 100;
I noticed that as soon as I remove the DESC keyword, the whole ORDER BY T_PK DESC clause, or the whole outer query with the WHERE ROWNUM < 100 condition, it is fast again (fast meaning a couple of seconds, < 10 s).
In the execution plan, an index full scan descending is performed on the PK of the table. Besides the index on the PK, I have an index defined on col2.
What could be the reason that the query was fast at first and then got slow? How can I make the query fast regardless of how many records have already been set to a non-NULL value?
For this query:
SELECT T_PK
FROM (SELECT T_PK
      FROM table
      WHERE col1 = 'text' AND col2 IS NULL
      ORDER BY T_PK DESC) t
WHERE ROWNUM < 100;
The optimal index is table(col1, col2, t_pk).
I think the problem is that the optimizer has a choice of two indexes: either one for the WHERE clause (on col1 and, probably, col2) or the one on t_pk. If you have a single index that handles both clauses, performance should improve.
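A sketch of that suggestion (the index name is illustrative, and my_table stands in for the question's table, since TABLE is a reserved word):
CREATE INDEX ix_col1_col2_tpk ON my_table (col1, col2, t_pk);
With this index, Oracle can range-scan the col1 = 'text' AND col2 IS NULL entries in T_PK order and stop after 100 rows, instead of choosing between two narrower indexes.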
One reason that the DESC might make a difference is where the matching rows lie. If all the matching rows are in the first 100,000 rows of the table, then when you order descending, the query might have to throw out 20.9 million rows before finding a match.
I think Burleson explained this quite nicely:
http://www.dba-oracle.com/t_sql_tuning_rownum_equals_one.htm
Beware!
This use of ROWNUM < n can cause performance problems. Using ROWNUM may change the ALL_ROWS optimizer mode for a query to FIRST_ROWS, causing unexpected sub-optimal execution plans. One solution is to always include an ALL_ROWS hint when using ROWNUM to perform a top-n query.
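A hedged sketch of that suggestion applied to this question's query (my_table stands in for the question's table; whether the hint helps depends on your data and Oracle version):
SELECT /*+ ALL_ROWS */ T_PK
FROM (SELECT T_PK
      FROM my_table
      WHERE col1 = 'text' AND col2 IS NULL
      ORDER BY T_PK DESC)
WHERE ROWNUM < 100;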

Query using Rownum and order by clause does not use the index

I am using Oracle (Enterprise Edition 10g) and I have a query like this:
SELECT * FROM (
    SELECT * FROM MyTable
    ORDER BY MyColumn
) WHERE rownum <= 10;
MyColumn is indexed; however, Oracle for some reason does a full table scan before it cuts off the first 10 rows. For a table with 4 million records, the above takes around 15 seconds.
Now consider this equivalent query:
SELECT MyTable.*
FROM (SELECT rid
      FROM (SELECT rowid AS rid
            FROM MyTable
            ORDER BY MyColumn)
      WHERE rownum <= 10)
INNER JOIN MyTable
        ON MyTable.rowid = rid
ORDER BY MyColumn;
Here Oracle scans the index, finds the top 10 rowids, and then uses nested loops to fetch the 10 records by rowid. This takes less than a second for a 4 million row table.
My first question is: why is the optimizer making such an apparently bad decision for the first query above?
And my second and most important question: is it possible to make the first query perform better? I have a specific need to use the first query as unmodified as possible, and I am looking for something simpler than my second query above. Thank you!
Please note that for particular reasons I am unable to use the /*+ FIRST_ROWS(n) */ hint or the ROW_NUMBER() OVER (ORDER BY column) construct.
If this is acceptable in your case, adding a WHERE ... IS NOT NULL clause will help the optimizer use the index instead of doing a full table scan with the ORDER BY:
SELECT * FROM (
    SELECT * FROM MyTable
    WHERE MyColumn IS NOT NULL
    -- ^^^^^^^^^^^^^^^^^^^^^^^
    ORDER BY MyColumn
) WHERE rownum <= 10;
The rationale is that Oracle does not store entries in the index when all of the indexed columns are NULL. As your query was originally written, the optimizer decided on a full table scan: if there were fewer than 10 non-NULL values, it would have to retrieve some "NULL rows" to fill in the remaining rows. Apparently it is not smart enough to check first whether the index contains enough rows...
With the added WHERE MyColumn IS NOT NULL, you inform the optimizer that you don't want, under any circumstances, any row having NULL in MyColumn. So it can blindly use the index without worrying about hypothetical rows having NULL in MyColumn.
For the same reason, declaring the ORDER BY column as NOT NULL should prevent the optimizer from doing a full table scan. So, if you can change the schema, a cleaner option would be:
ALTER TABLE MyTable MODIFY (MyColumn NOT NULL);
See http://sqlfiddle.com/#!4/e3616/1 for various comparisons (click on view execution plan)
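Another workaround sometimes used when the column must stay nullable (not from this answer, but a well-known Oracle trick): index the column together with a constant. The composite key is then never entirely NULL, so every row, including the ones with NULL in MyColumn, is stored in the index:
CREATE INDEX idx_mycolumn ON MyTable (MyColumn, 0);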

Conditional ORDER BY depending on column values

I need to write a query that does this:
SELECT TOP 1
FROM a list of tables (Joins, etc)
ORDER BY Column X, Column Y, Column Z
If Column X of that row is NOT NULL, then at the moment I reselect, using a slightly different ORDER BY.
So I run the same query twice. If the first one has a NULL in a certain column, I return that row from my procedure. However, if the value isn't NULL, I have to do another identical select, except ordered by a different column or two.
What I do now is select into a temp table the first time, then check the value of the column. If it's OK, I return the temp table; otherwise I redo the select and return that result set.
More details:
In English, the question I am asking the database is:
Return me all the results for a certain court appearance (by indexed foreign key); I expect around 1000 rows. Order them by the date of the appearance (a nullable, non-indexed column), latest appearance first. Then check an importId: if the import ID of that top 1 row is not NULL, run the same query again, but this time ordered by the import ID (latest first), and return that row. Otherwise, just return the top 1 row from the original query.
I'd say the BEST way to do this in a single query is with a CASE expression...
SELECT TOP 1 * FROM ... ORDER BY
    (CASE WHEN column1 IS NULL THEN column2 ELSE column1 END)
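A hypothetical concrete version of that pattern for the court-appearance scenario above (all names are invented, and an ImportDate column stands in for the question's ImportId so that both CASE branches are datetimes, since CASE branches must be type-compatible):
SELECT TOP 1 *
FROM dbo.CourtAppearance
ORDER BY CASE WHEN ImportDate IS NULL THEN AppearanceDate
              ELSE ImportDate
         END DESC;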
You could use the COALESCE function to turn nullable columns into ORDER BY-friendly values.
SELECT CAST(COALESCE(MyColumn, 0) AS money) AS Column1
FROM MyTable
ORDER BY Column1;
I used this in Firebird (the columns are numeric):
ORDER BY CASE <condition> WHEN <value> THEN <column1>*1000 + <column2> ELSE <column3>*1000 + <column4> END

Optimizing a simple SQLite query, if possible!

I would like to optimize this query using SQLite 3.
SELECT id FROM Table WHERE value = (SELECT max(value) FROM Table WHERE value < myvalue )
UNION
SELECT id FROM Table WHERE value = (SELECT min(value) FROM Table WHERE value > myvalue );
I want the two closest ids for a given value. Example: id 20 has value 50; the closest id below could be 3 with value 48 (the greatest smaller value), and the closest id above could be 4 with value 55 (the smallest greater value).
SQLite 3 doesn't have all the features of a full database server, so if you have something better I can use, thanks!
SELECT
    (SELECT id FROM test WHERE value < myvalue ORDER BY value DESC LIMIT 1) AS below,
    (SELECT id FROM test WHERE value > myvalue ORDER BY value ASC LIMIT 1) AS above;
Theoretically speaking, this should be faster because it uses two table scans instead of four.
Anyway, I would create a table with a few million records and test the different queries with the timer on (.timer ON in the sqlite3 console).
Also make sure to test with and without an index on value. Sometimes, especially when the index is bigger than your available memory, indexes are useless.
If speed is the real issue, consider an alternative lightweight storage engine, like Kyoto Cabinet.
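For example, a quick way to build such a test table with a recursive CTE (a sketch assuming SQLite 3.8.3 or later; all names are illustrative):
CREATE TABLE test(id INTEGER PRIMARY KEY, value INTEGER);

-- Populate with a million rows of random values.
WITH RECURSIVE n(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 1000000
)
INSERT INTO test(value)
SELECT abs(random() % 1000000) FROM n;

-- Then .timer ON in the shell, and compare timings with and without:
CREATE INDEX idx_test_value ON test(value);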
Here's another way to do it. I don't know if it's faster in sqlite, though; you can always try. Note that each half needs to be wrapped in a subquery, since SQLite only allows ORDER BY and LIMIT after the final SELECT of a compound query:
select * from (select id
               from "Table"                        -- quoted because TABLE is a keyword
               where value - myvalue > 0
               order by abs(value - myvalue) asc   -- closest value above
               limit 1)
union all
select * from (select id
               from "Table"
               where value - myvalue < 0
               order by abs(value - myvalue) asc   -- asc here too: smallest distance below
               limit 1);
SELECT id FROM "Table" WHERE value > myvalue ORDER BY value ASC LIMIT 1;
SELECT id FROM "Table" WHERE value < myvalue ORDER BY value DESC LIMIT 1;
This solution has no sub-selects, no table scans, and no extraneous grouping or math functions, but it needs two separate queries.
You should index Table.value.
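For instance (the index name is illustrative):
CREATE INDEX idx_value ON "Table"(value);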

SQL expression to remove duplicates from a calculation

I am trying to run a query that will give time averages, but some duplicate records end up in the calculation. How can I remove the duplicates?
ex.
Column 1 | Column 2
---------|---------
07-5794  | 00:59:59
07-5794  | 00:48:22
07-5766  | 00:42:48
07-8423  | 00:51:47
07-4259  | 00:52:12
I can get the average of column 2, but I don't want identical values in column 1 (e.g. 07-5794) to be counted twice.
To get the average of the minimum values for each incnum (the incident number shown in column 1), you could write this SQL:
select avg(min_time) as avg_time
from (select incnum, min(col2) as min_time
      from inc
      group by incnum)
using the correct average function for your brand of SQL.
If you're doing this in Access, you'll want to paste this into the SQL view; when you use a subquery, you can't do that directly in design view.
SELECT DISTINCT
or
GROUP BY
or
SELECT UNIQUE
... but usually averages have duplicates included. Just tell me this isn't for financial software; I won't stand for another abuse of statistics!
I believe you're looking for DISTINCT.
http://www.sql-tutorial.com/sql-distinct-sql-tutorial && http://www.w3schools.com/SQL/sql_distinct.asp
Do you want to eliminate all entries that are duplicates? That is, should neither of the rows with "07-5794" be included in the calculation? If that is the case, I think this would work (in Oracle, at least):
SELECT AVG(col2)
FROM (SELECT col1, MAX(col2) AS col2
      FROM table
      GROUP BY col1
      HAVING COUNT(*) = 1)
However, if you want to retain one of the duplicates, you need to specify how to pick which one to keep.
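For example, to keep just the smallest time per column-1 value, a hedged sketch in the same Oracle-flavored style (my_table stands in for the question's table):
SELECT AVG(col2)
FROM (SELECT MIN(col2) AS col2   -- keep the minimum time for each duplicate group
      FROM my_table
      GROUP BY col1)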
Anastasia,
Assuming you have the calculation for the average and a unique key in the table, you could do something like this to get just the latest occurrence of the timing for each unique 07-xxx result:
select Column1, Avg(Convert(decimal, Column2))
from Table1
where TableId in (select Max(TableId)
                  from Table1
                  group by Column1)
group by Column1
This was assuming the following table structure in MS SQL:
CREATE TABLE [dbo].[Table1] (
    [TableId] [int] IDENTITY (1, 1) NOT NULL,
    [Column1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [Column2] [int] NULL
) ON [PRIMARY]
Good Luck!