Informix - Efficiently find 10 most recent calls - sql

I have a table that holds data such as the calling number and a timestamp. I'd like to find the ten most recent unique calls. This SQL query works:
SELECT first 10 t.originatordn
FROM
(SELECT DISTINCT a.originatordn,a.startdatetime AS time
FROM contactcalldetail a
WHERE originatordn <> '') t
ORDER BY t.time DESC
The problem is this table has over 4 million records so it is very slow. Is there a better way to do this query?

Without an index I don't see how to make it fast. I suggest creating an index, if you don't have one already, and running a simpler query that the SQL optimizer can execute using an index-only scan.
create index ix1 on contactcalldetail (startdatetime, originatordn)
select
distinct originatordn as calling_number
from contactcalldetail
where originatordn <> ''
order by startdatetime desc
limit 10
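A DISTINCT combined with an ORDER BY on a column that is not selected is rejected or handled inconsistently by some engines, so a grouped variant may be easier to reason about: group by the calling number and order by each number's latest timestamp. The following is a minimal sketch only. It uses Python's sqlite3 as a stand-in for Informix (an assumption made so the example runs anywhere), the sample rows are invented, and `most_recent_callers` is a hypothetical helper name.

```python
import sqlite3

# SQLite stands in for Informix here (assumption); table and column names
# follow the question, the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contactcalldetail (originatordn TEXT, startdatetime TEXT)")
conn.execute("CREATE INDEX ix1 ON contactcalldetail (startdatetime, originatordn)")
conn.executemany(
    "INSERT INTO contactcalldetail VALUES (?, ?)",
    [
        ("1001", "2024-01-01 09:00:00"),
        ("1002", "2024-01-01 10:00:00"),
        ("1001", "2024-01-01 11:00:00"),  # repeat caller, newer call
        ("", "2024-01-01 12:00:00"),      # empty number, must be skipped
        ("1003", "2024-01-01 08:00:00"),
    ],
)


def most_recent_callers(conn, n=10):
    """Return up to n distinct calling numbers, most recent call first."""
    sql = """
        SELECT originatordn, MAX(startdatetime) AS last_call
        FROM contactcalldetail
        WHERE originatordn <> ''
        GROUP BY originatordn
        ORDER BY last_call DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (n,))]


recent = most_recent_callers(conn)
print(recent)
```

On Informix the same idea would be written with the row-limiting syntax the engine supports (the question uses `SELECT FIRST 10`); whether the optimizer can serve the grouping from the index depends on the engine and its statistics.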

Related

Remove case insensitive duplicates in sql (postgres)

I have a PostgreSQL database, and I'm trying to delete (or even just get the ids of) the older of the duplicates in my table, but only those that are duplicates because of case differences, for example helLo and hello.
The table is quite large and my nested query takes a really long time. Is there a better, more efficient way to do this in one go, without splitting it into multiple queries? There are a lot of ids in question.
SELECT * FROM some_table AS o
WHERE (SELECT count(*) FROM some_table AS i
WHERE o.text != i.text
AND LOWER(i.text) = LOWER(o.text)
AND i.created_at > o.created_at) > 1
Thanks!
Can you try this?
SELECT LOWER(text), ROW_NUMBER() OVER( PARTITION by LOWER(text) ORDER by created_at ) as rn
FROM some_table
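You can then filter on the rn column: within each group of case-insensitive duplicates, rn = 1 is the oldest row, because the rows are ordered by created_at ascending.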
To help this query, create an expression index on LOWER(text). Include created_at in the index to help the date comparisons.
CREATE INDEX text_lower ON some_table(LOWER(text), created_at);
It's hard to test this without your data, though.
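To make the "use rn as a filter" step concrete, here is a minimal sketch that returns the ids of every row except the newest one in each case-insensitive group. It runs on Python's sqlite3 as a stand-in for PostgreSQL (an assumption; it needs an SQLite build with window functions, 3.25 or later), it assumes the table has an `id` column, and the sample rows are invented. Two deliberate differences from the snippets above: the ordering is `created_at DESC` so that rn = 1 is the row to keep, and rows that are exact duplicates (same case) would be flagged too, so add a comparison on the original text if only case variants should count.

```python
import sqlite3

# SQLite stands in for PostgreSQL (assumption); sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (id INTEGER PRIMARY KEY, text TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO some_table (id, text, created_at) VALUES (?, ?, ?)",
    [
        (1, "helLo", "2020-01-01"),
        (2, "hello", "2021-01-01"),  # newest of the 'hello' group, kept
        (3, "world", "2020-06-01"),  # no duplicates
        (4, "HELLO", "2019-01-01"),
    ],
)


def older_duplicate_ids(conn):
    """Ids of all rows that have a newer row with the same LOWER(text)."""
    sql = """
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY LOWER(text)
                       ORDER BY created_at DESC
                   ) AS rn
            FROM some_table
        ) AS ranked
        WHERE rn > 1
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]


stale_ids = older_duplicate_ids(conn)
print(stale_ids)
```

In PostgreSQL the same inner query can feed a `DELETE ... WHERE id IN (...)` once the returned ids have been checked.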

SQL tuning, long running query + rownum

I have a million records in a database table with account number, address, and many more columns. I want the top 100 rows in descending order. I used ROWNUM for this, but the query takes a long time to execute, because it scans the full table, sorts it, and only then applies the ROWNUM filter.
What is the solution to minimize the query execution time?
For example:
select *
from
(select
acc_no, address
from
customer
order by
acc_no desc)
where
ROWNUM <= 100;
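From past experience, I found that TOP works best for this scenario where it is available (TOP is SQL Server syntax; Oracle, which the ROWNUM in the question implies, does not support it).
Also, you should select only the columns you need and avoid the wildcard (*).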
SELECT TOP 100 [acc_no], [address] FROM [customer] ORDER BY [acc_no] DESC
Useful resources about TOP, LIMIT and even ROWNUM.
https://www.w3schools.com/sql/sql_top.asp
Make sure you have an index on the acc_no column.
If an index on acc_no already exists, check the query execution plan to verify whether it is actually being used.
To create a new index if none is present, use one of the statements below:
Create index idx1 on customer(acc_no); -- If acc_no is not unique
Create unique index idx1 on customer(acc_no); -- If acc_no is unique. Note: Unique index is faster.
If the explain plan output shows "Full table scan", the optimizer is not using the index.
Try with a hint first:
select * from
(select /*+ index(customer idx1) */
acc_no, address
from
customer
order by
acc_no desc)
where
ROWNUM <= 100;
If the query with the hint returns results quickly, check why the optimizer is ignoring your index. One probable reason is outdated statistics; refresh them.
Hope this helps.
Consider getting your top account numbers in an inner query / in-line view such that you only perform the joins on those 100 customer records. Otherwise, you could be performing all the joins on the million+ rows, then sorting the million+ results to get the top 100. Something like this may work.
select .....
from customer
where customer.acc_no in (select acc_no from
(select inner_cust.acc_no
from customer inner_cust
order by inner_cust.acc_no desc
)
where rownum <= 100)
and ...
Or, if you are using 12c, you can use FETCH FIRST 100 ROWS ONLY:
select .....
from customer
where customer.acc_no in (select inner_cust.acc_no
from customer inner_cust
order by inner_cust.acc_no desc
fetch first 100 rows only
)
and ...
This will give the result within 100 ms, but MAKE SURE that there is an index on column ACC_NO. There can also be a combined index on ACC_NO plus other columns, but ACC_NO MUST be in the first position in the index. You have to see "range scan" in the execution plan, not "full table scan" and not "skip scan". You will probably see nested loops in the execution plan (these fetch the ADDRESS values from the table). You can improve speed even more by creating a combined index on ACC_NO, ADDRESS (in this order). In that case the Oracle engine does not have to read the table at all, because all the information is contained in the index. You can compare the two in the execution plan.
select top 100 acc_no, address
from customer
order by acc_no desc
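The claim that a combined index on ACC_NO, ADDRESS lets the engine answer the query from the index alone can be checked on a small scale. The sketch below uses Python's sqlite3 as a stand-in for Oracle (an assumption, so the plan text and the row-limiting syntax differ from Oracle's); the `notes` column, the index name `idx_acc_addr`, and the generated rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle (assumption); data and names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (acc_no INTEGER, address TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(i, f"{i} Main St", "n/a") for i in range(1, 1001)],
)
# Combined index with acc_no in the first position, as the answer requires.
conn.execute("CREATE INDEX idx_acc_addr ON customer (acc_no, address)")

query = "SELECT acc_no, address FROM customer ORDER BY acc_no DESC LIMIT 100"

# The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
plan = " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
top = conn.execute(query).fetchall()

print(plan)
print(top[0], top[-1])
```

If the plan mentions the index as a covering index and shows no separate sort step, the top 100 rows were read straight off the index in descending order, which is the behaviour the answer describes for Oracle.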

Query with rownum got slow

I have a table with 21 million records, of which 20 million meet the criterion col1 = 'text'. I then started iteratively setting col2 to a non-NULL value. After I had updated 10 million records, the following query, which was fast in the beginning, became slow:
SELECT T_PK
FROM (SELECT T_PK
FROM table
WHERE col1= 'text' AND col2 IS NULL
ORDER BY T_PK DESC)
WHERE ROWNUM < 100;
I noticed that it is fast again as soon as I remove the DESC, the whole ORDER BY T_PK DESC clause, or the whole outer query with the condition WHERE ROWNUM < 100 (fast means a couple of seconds, < 10s).
The execution plan shows an index full scan descending performed on the PK of the table. Besides the index on the PK, I have an index defined on col2.
What could be the reason that the query was fast and then got slow? How can I make the query fast regardless of how many records are already set to non-null value?
For this query:
SELECT T_PK
FROM (SELECT T_PK
FROM table
WHERE col1= 'text' AND col2 IS NULL
ORDER BY T_PK DESC
) t
WHERE ROWNUM < 100;
The optimal index is table(col1, col2, t_pk).
I think the problem is that the optimizer has a choice of two indexes -- either for the where clause (col1 and -- probably -- col2) or one on t_pk. If you have a single index that handles both clauses, then performance should improve.
One reason that the DESC might make a difference is where the matching rows lie. If all the matching rows are in the first 100,000 rows of the table, then when you order descending, the query might have to throw out 20.9 million rows before finding a match.
I think Burleson explained this quite nicely:
http://www.dba-oracle.com/t_sql_tuning_rownum_equals_one.htm
Beware!
This use of rownum< can cause performance problems. Using rownum may change the all_rows optimizer mode for a query to first_rows, causing unexpected sub-optimal execution plans. One solution is to always include an all_rows hint when using rownum to perform a top-n query.

Query using Rownum and order by clause does not use the index

I am using Oracle (Enterprise Edition 10g) and I have a query like this:
SELECT * FROM (
SELECT * FROM MyTable
ORDER BY MyColumn
) WHERE rownum <= 10;
MyColumn is indexed, however, Oracle is for some reason doing a full table scan before it cuts the first 10 rows. So for a table with 4 million records the above takes around 15 seconds.
Now consider this equivalent query:
SELECT MyTable.*
FROM
(SELECT rid
FROM
(SELECT rowid as rid
FROM MyTable
ORDER BY MyColumn
)
WHERE rownum <= 10
)
INNER JOIN MyTable
ON MyTable.rowid = rid
ORDER BY MyColumn;
Here Oracle scans the index and finds the top 10 rowids, and then uses nested loops to find the 10 records by rowid. This takes less than a second for a table with 4 million rows.
My first question is why is the optimizer taking such an apparently bad decision for the first query above?
And my second and most important question is: is it possible to make the first query perform better? I have a specific need to keep the first query as unmodified as possible. I am looking for something simpler than my second query above. Thank you!
Please note that for particular reasons I am unable to use the /*+ FIRST_ROWS(n) */ hint, or the ROW_NUMBER() OVER (ORDER BY column) construct.
If this is acceptable in your case, adding a WHERE ... IS NOT NULL clause will help the optimizer to use the index instead of doing a full table scan when using an ORDER BY clause:
SELECT * FROM (
SELECT * FROM MyTable
WHERE MyColumn IS NOT NULL
-- ^^^^^^^^^^^^^^^^^^^^
ORDER BY MyColumn
) WHERE rownum <= 10;
The rationale is that Oracle does not store NULL values in the index. As your query was originally written, the optimizer chose a full table scan because, if there were fewer than 10 non-NULL values, it would have to retrieve some "NULL rows" to "fill in" the remaining rows. Apparently it is not smart enough to check first whether the index contains enough rows...
With the added WHERE MyColumn IS NOT NULL, you inform the optimizer that you do not want any row having NULL in MyColumn under any circumstances. So it can blindly use the index without worrying about hypothetical rows having NULL in MyColumn.
For the same reason, declaring the ORDER BY column as NOT NULL should prevent the optimizer from doing a full table scan. So, if you can change the schema, a cleaner option would be:
ALTER TABLE MyTable MODIFY (MyColumn NOT NULL);
See http://sqlfiddle.com/#!4/e3616/1 for various comparisons (click on view execution plan)

mysql count performance

select count(*) from mytable;
select count(table_id) from mytable; -- table_id is the primary key
Both queries run slowly on a table with 10 million rows.
I am wondering why, since wouldn't it be easy for MySQL to keep a counter that gets updated on every insert, update and delete?
Is there a way to improve this query? I used EXPLAIN but it didn't help much.
Take a look at the following blog posts:
1) COUNT(*) vs COUNT(col)
2) Easy MySQL Performance Tips
3) Fast count(*) for InnoDB
btw, which engine do you use?
EDITED: A technique to speed up the count when you only need to know whether at least a certain number of rows exist. (Sorry, my earlier query was wrong.) When you only need to know whether there are, e.g., 300 rows matching a specific condition, you can try a subquery:
select count(*) FROM
( select 1 FROM _table_ WHERE _conditions_ LIMIT 300 ) AS result
First you cap the result set, then you count it. The query still scans the capped result set, but you control its size (again, this works when the question to the DB is "are there more or fewer than 300 rows?"). If the DB contains more than 300 rows that satisfy the condition, this query is faster.
Testing results (my table has 6.7 million rows):
1) SELECT count(*) FROM _table_ WHERE START_DATE > '2011-02-01'
returns 4.2 million in 65.4 seconds
2) SELECT count(*) FROM ( select 1 FROM _table_ WHERE START_DATE > '2011-02-01' LIMIT 100 ) AS result
returns 100 in 0.03 seconds
Below is the EXPLAIN statement for that query, to see what is going on there:
EXPLAIN SELECT count(*) FROM ( select 1 FROM _table_ WHERE START_DATE > '2011-02-01' LIMIT 100 ) AS result
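The capped count described above can be wrapped in a small helper that answers "are there at least n matching rows?". This is a minimal sketch, using Python's sqlite3 as a stand-in for MySQL (an assumption); the `events` table, the generated rows, and the names `capped_count` and `has_at_least` are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); table and data are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, start_date TEXT)")
conn.executemany(
    "INSERT INTO events (start_date) VALUES (?)",
    [("2011-03-%02d" % (i % 28 + 1),) for i in range(1000)]  # 1000 matching rows
    + [("2010-12-01",)] * 50,                                # 50 non-matching rows
)


def capped_count(conn, since, cap):
    """Count matching rows, but stop collecting after `cap` of them."""
    sql = """
        SELECT COUNT(*) FROM (
            SELECT 1 FROM events WHERE start_date > ? LIMIT ?
        ) AS result
    """
    return conn.execute(sql, (since, cap)).fetchone()[0]


def has_at_least(conn, since, n):
    """True if at least n rows match; never counts more than n rows."""
    return capped_count(conn, since, n) >= n


print(capped_count(conn, "2011-02-01", 300))
print(has_at_least(conn, "2011-02-01", 300))
```

The returned count is only a lower bound once it reaches the cap, so this suits threshold checks rather than exact totals.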
As cherouvim pointed out in the comments, it depends on the storage engine.
MyISAM does keep a count of the table rows, and can keep it accurate because the only lock MyISAM supports is a table lock.
InnoDB, however, supports transactions, so it has to scan the table (or an index) to count the rows.
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/