Why does my query take 2 minutes to run? - sql

Note - There are about 2-3 million records in the db.
SELECT
route_date, stop_exception_code, unique_id_no,
customer_reference, stop_name, stop_comment,
branch_id, customer_no, stop_expected_pieces,
datetime_updated, updated_by, route_code
FROM
cops_reporting.distribution_stop_information distribution_stop_information
WHERE
(stop_exception_code <> 'null') AND
(datetime_updated >= { ts '2011-01-25 00:00:01' })
ORDER BY datetime_updated DESC

If you posted the indexes you already have on this table, or a query execution plan, it would be easier to help. As it is, my guess is that you could improve performance by creating a combined index covering stop_exception_code and datetime_updated. I can't promise this will actually work, but it might be worth a shot. Without any other information, I can't say much more than that...
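For example, something like this (a sketch only; the index name is made up):

CREATE INDEX idx_dsi_exception_updated
ON cops_reporting.distribution_stop_information (stop_exception_code, datetime_updated);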

Some rules of thumb:
Index the columns you JOIN on.
Index the columns used in WHERE clauses.
A 'not equals' condition is generally slower than an 'equals' condition. Consider splitting the table into rows where the code is null and rows where it is not, or hiving the column off into a joined table that acts as an index.
Use explicit JOIN syntax, i.e. write INNER JOIN instead of listing tables with commas; this speeds things up on some databases (I've seen a 10min+ query get down to 30 secs on MySQL from this change alone; see the sketch after this list).
Use an alias for each table and prefix each column with its alias.
Store the query as a function/procedure; it will be precompiled and can get much quicker.
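As an illustration of the explicit-JOIN point (tables a and b here are hypothetical):

-- implicit join: the join condition is buried in the WHERE clause
SELECT a.id, b.name
FROM a, b
WHERE a.b_id = b.id;

-- explicit join: same result, but some planners handle it better
SELECT a.id, b.name
FROM a
INNER JOIN b ON a.b_id = b.id;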

stop_exception_code <> 'null'
Please tell me that 'null' isn't a string in your database. Standard SQL would be
stop_exception_code IS NOT NULL
or
stop_exception_code IS NULL
I'm not sure what a NULL stop_exception_code might mean to you. But if it means something like "I don't know", then using a specific value for "I don't know" might let your server use an index on that column, an index that it might not be able to use for NULL. (Or maybe you've already done that by using the string 'null'.)
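A sketch of that sentinel idea ('NONE' is a made-up placeholder; any reserved code would do):

UPDATE cops_reporting.distribution_stop_information
SET stop_exception_code = 'NONE'    -- replace NULLs with a real, indexable value
WHERE stop_exception_code IS NULL;

-- the query's filter then becomes:
-- WHERE stop_exception_code <> 'NONE' AND datetime_updated >= { ts '2011-01-25 00:00:01' }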
Without seeing your DDL, actual query, and execution plan, that's about all I can tell you.

Related

Is there performance impact when Non-Aggregate SQL functions are used in a SELECTed Column?

We have a report that uses a long and complex query whose SELECT statement looks like the one below:
SELECT
NVL(nazwawystawcy,'BRAK') supplier_name,
NVL(AdresDostawcy,'BRAK') supplier_address,
NVL(NrDostawcy,'BRAK') supplier_registration,
DowodZakupu document_number,
DataZakupu document_issue_date,
DataWplywu document_recording_date,
trx_id,
KodKrajuNadaniaTIN country_code,
DokumentZakupu document_type_code,
payment_split MPP,
box_number box_number,
box_amount box_amount,
box_type box_type,
display_order display_order
...
FROM table1 t1
,table2 t2
....
We recently modified this query, changing only the third SELECTed column to add a REGEXP_LIKE:
SELECT
NVL(nazwawystawcy,'BRAK') supplier_name,
NVL(AdresDostawcy,'BRAK') supplier_address,
--NVL(NrDostawcy,'BRAK') supplier_registration,
Case When (NrDostawcy is not null and regexp_like(substr(NrDostawcy,1,2),'^[a-zA-Z]*$')) Then substr(NrDostawcy,3) else NVL(NrDostawcy,'BRAK') End supplier_registration,
DowodZakupu document_number,
DataZakupu document_issue_date,
DataWplywu document_recording_date,
trx_id,
KodKrajuNadaniaTIN country_code,
DokumentZakupu document_type_code,
payment_split MPP,
box_number box_number,
box_amount box_amount,
box_type box_type,
display_order display_order
...
FROM table1 t1
,table2 t2
....
I checked the Explain Plans of both queries and they turned out to have the same Plan hash value.
Does this mean there's no impact on performance if I use seeded, non-aggregate SQL functions in SELECTed columns?
I believe there is a performance impact when they're used in the WHERE clause, but I wasn't sure if the same applies to SELECTed columns.
Apologies in advance, as I can't provide the exact query: it's proprietary, and it is very long and complex.
I also don't think I can create a good enough sample that would match the explain plan of the actual query, as it joins over 10 tables with thousands of rows of data.
Thank you!
Since you are running this query on Oracle, here's my advice: run the query with the Oracle hint /*+ gather_plan_statistics */, once without the regex and once with it. Then find the query in the shared pool (v$sql). The hint will give you the exact buffer gets, physical reads and also the time spent in each step of the plan. With that data you can analyze in detail how much more time the query with the regex needed to execute. I advise you to do this on data that returns more than, say, 10k rows; that way the difference should be visible (if you run this with 100 rows, no difference will be seen).
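A sketch of that workflow (DBMS_XPLAN.DISPLAY_CURSOR with NULL arguments reports on the last statement run in the session):

SELECT /*+ gather_plan_statistics */
NVL(nazwawystawcy,'BRAK') supplier_name,
...
FROM table1 t1
,table2 t2
....

-- then, in the same session:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));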
The execution plan is the same because the query reads exactly the same data from the same tables; you should also see the amount of data read (logical I/O) unchanged.
What will not be the same, however, is the execution time: the REGEXP_LIKE will consume more CPU even though the logical I/O is unchanged.
Note that if you had changed which columns are selected, the execution plan could have changed: if all selected columns were part of an index, the optimizer might skip the table access and read the data from the index alone.
It depends on the query and the I/O being done to get the data. Sometimes creating an Oracle function-based index can bring some improvement.
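For example, something along these lines (a sketch only; the index name is made up, and Oracle will only use a function-based index when the query's expression matches the indexed expression exactly):

CREATE INDEX t1_supplier_reg_fbi ON table1
(CASE WHEN NrDostawcy IS NOT NULL
AND REGEXP_LIKE(SUBSTR(NrDostawcy,1,2),'^[a-zA-Z]*$')
THEN SUBSTR(NrDostawcy,3)
ELSE NVL(NrDostawcy,'BRAK') END);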
Check this link, it could help you.
https://jeffkemponoracle.com/2007/11/will-oracle-use-my-regexp-function-based-index/
thanks

SQL query in dire need of optimization

I have this query, which works fine, except it takes a couple of minutes to load. I need help optimizing it so it runs faster and I don't know where to start:
SELECT
job_header.job,
job_header.suffix,
job_header.customer,
job_header.description,
job_header.comments_1,
job_header.date_due,
job_header.part,
job_header.customer_po,
job_header.date_closed,
job_header.flag_hold,
job_header.code_sort,
wo_user_flds.user_7,
wo_user_flds.user_3,
wo_user_flds.user_6,
wo_user_flds.user_5,
wo_user_flds.user_2,
quote_lines.user_2 as serialNo,
quote_lines.user_3 as unit,
quote_lines.user_4 as package
FROM job_header
LEFT JOIN wo_user_flds ON
(job_header.job = wo_user_flds.job) AND
(job_header.suffix = wo_user_flds.suffix)
LEFT JOIN quote_lines ON
(job_header.part = quote_lines.part)
WHERE job_header.date_closed = '000000'
AND LENGTH(job_header.job) > 5;
More information that might be of use:
Only the columns found in the select are the columns I need.
My query returns roughly 400 records.
Job_Header table has 97 columns and 6,300 records.
Wo_User_Flds table has 12 columns and 1,100 records.
Quote_Lines table has 198 columns and 46,000 records.
I could speculate on what I think I need to do, but I'm really just guessing at this point. I looked at similar questions and saw a lot of talk about 'indexes', so I checked, and these tables do have some indexes... if that helps? Thanks in advance.
[EDIT]
Thanks for the quick responses guys, really appreciate it. I'm going to look into everything everyone said, but here is the ddl for these tables: http://paste.ubuntu.com/13247664/
[EDIT 2]
My query takes 1 minute to load. My expectations of how much faster it can get may not be realistic. I might have to resort to breaking the query up into several and assembling the data on the client.
Without any other info, you'd need an index on job_header on either (job, date_closed) or (date_closed, job). But post the indexes on the table, e.g. via sp_helpindex, or better still the CREATE INDEX script (right-click the index in SSMS and script it).
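For instance (a sketch; the index name is illustrative):

CREATE INDEX IX_job_header_date_closed_job ON job_header (date_closed, job);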
First be sure you have indexes on the columns you JOIN on and on your WHERE-clause column. In this case, you should have indexes on these columns:
--Table job_header indexes, besides the unique index
job_header.job
job_header.suffix
job_header.part
job_header.date_closed
--Table wo_user_flds indexes, besides the unique index
wo_user_flds.job
wo_user_flds.suffix
--Table quote_lines indexes, besides the unique index
quote_lines.part
Then, avoid wrapping columns in functions (LENGTH, CAST, concatenation, etc.) in your WHERE clause, since that usually prevents the engine from using an index on the column. In this case, though, you can leave LENGTH there. So your query would stay the same; the indexes alone should improve the execution plan drastically.
Also, use the execution plan to see where you have an INDEX SCAN and where an INDEX SEEK. An INDEX SCAN somewhere is usually a sign that an index on that column could help.
This would be a start.
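A sketch of the corresponding DDL (index names are made up; the join pairs are combined into composite indexes, which often works better than separate single-column ones):

CREATE INDEX IX_job_header_date_closed ON job_header (date_closed);
CREATE INDEX IX_job_header_part ON job_header (part);
CREATE INDEX IX_wo_user_flds_job_suffix ON wo_user_flds (job, suffix);
CREATE INDEX IX_quote_lines_part ON quote_lines (part);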

How can I make large IN clauses more efficient in SQL Server?

My current query runs very slowly when accessing a DB with pretty large tables:
SELECT *
FROM table1
WHERE timestamp BETWEEN 635433140000000000 AND 635433150000000000
AND ID IN ('element1', 'element2', 'element3', ... , 'element 3002');
As you can see the IN clause has several thousand values. This query is executed roughly every second.
Is there another way to write it to improve performance?
Add the elements of the IN list to an indexed temporary table (if the elements change) or a permanent table (if they are static) and inner join against it.
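A sketch of the temporary-table variant (T-SQL; all names are illustrative):

CREATE TABLE #elements (ID varchar(50) PRIMARY KEY);   -- the PK gives you the index
INSERT INTO #elements (ID) VALUES ('element1'), ('element2'), ('element3');  -- ... and so on

SELECT t.*
FROM table1 t
INNER JOIN #elements e ON e.ID = t.ID
WHERE t.timestamp BETWEEN 635433140000000000 AND 635433150000000000;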
This is your query:
SELECT *
FROM table1
WHERE timestamp BETWEEN 635433140000000000 AND 635433150000000000 AND
ID IN ('element1', 'element2', 'element3', ... , 'element 3002');
The query is fine. Add an index on table1(id, timestamp).
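In T-SQL that would be something like (the index name is illustrative):

CREATE INDEX IX_table1_ID_timestamp ON table1 (ID, timestamp);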
The best answer depends on how those element ID listings are selected, but it all comes down to one thing: getting them into a table somewhere that you can join against. That will help performance tremendously. But again, the real question here is how best to get those items into a table, and that will depend on information not yet included in the question.
You should check your execution plan; I suspect you have a parameter sniffing problem caused by your BETWEEN. Check whether the actual row counts are way off from the estimated values. You can also rewrite your IN as an EXISTS, which internally works like an INNER JOIN.
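A sketch of the EXISTS form, assuming the IDs have been staged in a table (a hypothetical #elements, as suggested in another answer):

SELECT t.*
FROM table1 t
WHERE t.timestamp BETWEEN 635433140000000000 AND 635433150000000000
AND EXISTS (SELECT 1 FROM #elements e WHERE e.ID = t.ID);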

SQLite - select expression is very slow

I'm experiencing some heavy performance issues with a query in SQLite. Currently there are around 20,000 entries in the table activity_tbl and about 40 in the table activity_data_tbl. I have an index on both of the columns used in the query below, but it doesn't seem to have any effect on performance at all.
SELECT a._id, a.start_time + b.length AS time
FROM activity_tbl a INNER JOIN activity_data_tbl b
ON a.activity_data_id = b._data_id
WHERE time > ?
ORDER BY 2
LIMIT 1
As you can see, I select one column and a value created from adding two columns together. I guess this is what's causing the low performance, since the query is very fast if I just select a.start_time or b.length.
Do you guys have any suggestion for how I could optimize this?
Try putting an index on the time column. This should speed up the query.
This query is not optimizable using indexes for the filter part, since you are filtering and ordering on a calculated value. To optimize the query you will either need to filter on one of the actual table columns (start_time or length) or pre-compute the time values before querying.
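One way to filter on an actual column is to move the addition to the other side of the comparison (a sketch; the condition is algebraically identical, but start_time is now compared as a raw column, so an index on it at least has a chance of being used for each joined row):

SELECT a._id, a.start_time + b.length AS time
FROM activity_tbl a INNER JOIN activity_data_tbl b
ON a.activity_data_id = b._data_id
WHERE a.start_time > ? - b.length
ORDER BY 2
LIMIT 1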
The only place an index will help, and I assume you have one there, is on b._data_id.
A compound index may help. According to its docs, SQLite tries to avoid accessing the table if the index alone has enough information. So if the engine does its homework, it will recognize that the index is enough to compute the WHERE-clause value and save some time. If that does not work, only pre-computation will do.
If you are often confronted with similar tasks, please read this: http://www.sqlite.org/rtree.html

Optimize right joins

The following query works as expected, but I guess there is enough room for optimization. Any help?
SELECT a.cond_providentid,
b.flag1
FROM c_master a
WHERE a.cond_status = 'OnService'
ORDER BY a.cond_providentid,
a.rto_number;
May I suggest placing the query within your left join in a database view; that way, the code can be much cleaner and easier to maintain.
Also, check the columns that you use most often: they could be candidates for indexing, so that when you run your query it can be faster.
You also might check your column data types... I see that you have this type of code:
(CASE
WHEN b.tray_type IS NULL
THEN 1
ELSE 0
END) flag2
If you have a chance to change the design of your tables (e.g. change b.tray_type to a bit column, or use a computed column to determine the flag), the query would run faster because you wouldn't have to use CASE statements to determine the flag; you could just select it as another column.
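In SQL Server, for instance, a persisted computed column could look like this (a sketch; the table name is made up, since the query only shows the alias b):

ALTER TABLE tray_table
ADD flag2 AS (CASE WHEN tray_type IS NULL THEN 1 ELSE 0 END) PERSISTED;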
Hope this helps! :)
Ann