sql: how to improve this statement

I have the following SQL statement that I need to make quicker.
There are 500k rows, and I have an index on 'HARDWARE_ID', but this still takes up to a second to run.
Does anyone have any ideas?
select *
from DEVICE_MONITOR DM
where DM.DM_ID = (
    select max(DM_ID)
    from DEVICE_MONITOR
    where HARDWARE_ID = #value#
)
I've found the following index is also a great help...
CREATE INDEX DM_IX4 ON DEVICE_MONITOR (DM_ID, HARDWARE_ID);
In my test it drops the runtime from 26 seconds to 20 seconds.
Thanks for all your help.

The index for DM_ID should be created as ascending.
The problem might be that the matches for HARDWARE_ID are found very quickly, but those records then have to be sorted to fetch the max from them, and that operation is time-consuming.
Try comparing these two statements:
1 #result = select max(DM_ID) from DEVICE_MONITOR where HARDWARE_ID=#value#
2 select * from DEVICE_MONITOR DM where DM.DM_ID = #result
Query 1 is the problem; you should see that query 2 runs faster.
If the index is created and the query still runs slowly, you may need to update the statistics, though other queries will then probably only run slower.
If possible, instead of * use only the columns you really need.
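Building on that observation, a composite index with HARDWARE_ID leading avoids the sort entirely: the engine can seek straight to the matching range and read the max DM_ID off the end of the index. A minimal sketch (the index name is illustrative; the column order is the point):
-- HARDWARE_ID first so the seek narrows the range; DM_ID second so
-- max(DM_ID) can be read directly from the index without a sort.
CREATE INDEX DM_IX5 ON DEVICE_MONITOR (HARDWARE_ID, DM_ID);
With that in place, the subquery should reduce to a single index seek.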

Consider changing '*' to only the list of attributes you need. Very often this can give you a substantial increase in speed.

If you have a clustered index on DM_ID, then that looks like the fastest query.
Edit: Ack. Daniel has the correct answer. Missed that.

SQL query in dire need of optimization

I have this query, which works fine, except it takes a couple of minutes to load. I need help optimizing it so it runs faster and I don't know where to start:
SELECT
job_header.job,
job_header.suffix,
job_header.customer,
job_header.description,
job_header.comments_1,
job_header.date_due,
job_header.part,
job_header.customer_po,
job_header.date_closed,
job_header.flag_hold,
job_header.code_sort,
wo_user_flds.user_7,
wo_user_flds.user_3,
wo_user_flds.user_6,
wo_user_flds.user_5,
wo_user_flds.user_2,
quote_lines.user_2 as serialNo,
quote_lines.user_3 as unit,
quote_lines.user_4 as package
FROM job_header
LEFT JOIN wo_user_flds ON
(job_header.job = wo_user_flds.job) AND
(job_header.suffix = wo_user_flds.suffix)
LEFT JOIN quote_lines ON
(job_header.part = quote_lines.part)
WHERE job_header.date_closed = '000000'
AND LENGTH(job_header.job) > 5;
More information that might be of use:
Only the columns found in the select are the columns I need.
My query returns roughly 400 records.
Job_Header table has 97 columns and 6,300 records.
Wo_User_Flds table has 12 columns and 1,100 records.
Quote_Lines table has 198 columns and 46,000 records.
I could speculate on what I think I need to do, but I'm really just guessing at this point. I looked at similar questions and saw a lot of talk of 'indexes', so I checked and these tables do have some indexes... if that helps? Thanks in advance.
[EDIT]
Thanks for the quick responses guys, really appreciate it. I'm going to look into everything everyone said, but here is the ddl for these tables: http://paste.ubuntu.com/13247664/
[EDIT 2]
My query takes 1 minute to load. My expectations may not be realistic in how much faster it can be. I might have to resort to breaking up the query into more than one and then just assemble the data on the client.
Without any other info, you'd need an index on job_header on either (job, date_closed) or (date_closed, job). But post the indexes on the table, e.g. via sp_helpindex, or better still the CREATE INDEX script (right-click the index in SSMS and script it out).
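For instance, a sketch of the second option, assuming SQL Server (the index name is illustrative):
-- date_closed first, since the equality predicate narrows the range;
-- job second, so the LENGTH(job) filter is applied without a lookup.
CREATE INDEX IX_job_header_date_closed_job ON job_header (date_closed, job);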
First, be sure you have indexes on the columns you JOIN on and on your WHERE-clause columns. In this case, you should have indexes on these columns:
--Table job_header indexes, besides the unique index
job_header.job
job_header.suffix
job_header.part
job_header.date_closed
--Table wo_user_flds indexes, besides the unique index
wo_user_flds.job
wo_user_flds.suffix
--Table quote_lines indexes, besides the unique index
quote_lines.part
Then, avoid applying functions (like LENGTH, CAST, concatenation, etc.) to columns in your predicates, since that can prevent index use. In this case, though, you can leave LENGTH there. So your query would stay the same; the indexes alone should improve the execution plan drastically.
Also, use the execution plan to see where you have an INDEX SCAN and where an INDEX SEEK. An INDEX SCAN somewhere is usually a sign that you need an index on that column.
This should get you started.
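As a sketch, the CREATE INDEX statements for the list above might look like this, assuming SQL Server (the index names are illustrative):
-- Composite indexes matching the join predicates
CREATE INDEX IX_job_header_job_suffix ON job_header (job, suffix);
CREATE INDEX IX_wo_user_flds_job_suffix ON wo_user_flds (job, suffix);
CREATE INDEX IX_quote_lines_part ON quote_lines (part);
-- Single-column index matching the WHERE-clause filter
CREATE INDEX IX_job_header_date_closed ON job_header (date_closed);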

Query execution time reduction

How to reduce the execution time of this simple SQL query in SQL Server?
select *
from companybackup
where tiRecordStatus = 1
and (iAccessCode < 3 or chUpdateBy = SUSER_SNAME())
The table has 38,681 rows and 50 columns, and the query takes nearly 10 minutes 23 seconds. I created indexes on all the columns to reduce the time, but that didn't help; I also checked the NOLOCK option and all the other available solutions, but couldn't reduce the execution time.
What might be the issue?
If this is a critical query, you can create an index designed just for it:
CREATE INDEX ix_MyIndexNameHere ON CompanyBackup
(tiRecordStatus, iAccesscode, chUpdateBy)
This will still require a key lookup since you are returning all fields with *, but it's a lot better than a bunch of single-field indexes.
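And if the calling code can get by with a handful of columns instead of *, a covering index avoids the key lookup entirely. A sketch, where the INCLUDE column names are purely hypothetical since the real column list isn't shown:
-- Key columns drive the seek; INCLUDEd columns make the index covering.
CREATE INDEX ix_MyCoveringIndexHere ON CompanyBackup
(tiRecordStatus, iAccessCode, chUpdateBy)
INCLUDE (chCompanyName, dtUpdated);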
I am wondering what the SUSER_SNAME() function does; right now it gets executed for every row. Can you try capturing the function's return value in a variable first and then executing the query with that?
If the function is costly, this will reduce execution time by a huge margin.
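A sketch of what that might look like, assuming SQL Server 2008 or later (the variable name is illustrative):
-- Evaluate SUSER_SNAME() once up front instead of once per row.
DECLARE @CurrentUser nvarchar(128) = SUSER_SNAME();
SELECT *
FROM companybackup
WHERE tiRecordStatus = 1
AND (iAccessCode < 3 OR chUpdateBy = @CurrentUser);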
You haven't given much info, but splitting OR into a union of two queries can often help (optimisers have trouble with OR):
select *
from companybackup
where tiRecordStatus = 1
and iAccessCode < 3
union
select *
from companybackup
where tiRecordStatus = 1
and chUpdateBy = SUSER_SNAME()
If this doesn't help, try creating these indexes:
create index idx1 on companybackup(tiRecordStatus, iAccessCode);
create index idx2 on companybackup(chUpdateBy, tiRecordStatus);
Then force a recalculation of the distribution of index values:
update statistics companybackup;

SQL Select is very slow with CHARINDEX

I am using sql server and have a table with 2 columns
myId varchar(80)
cchunk varchar(max)
Basically it stores large chunks of text, so that's why I need varchar(max).
My problem is when I do a query like this:
select *
from tbchunks
where
(CHARINDEX('mystring1',tbchunks.cchunk)< CHARINDEX('mystring2',tbchunks.cchunk))
AND
CHARINDEX('mystring2',tbchunks.cchunk) - CHARINDEX('mystring1',tbchunks.cchunk) <=10
It takes about 3 seconds to complete, the table tbchunks has only about 500,000 records, and the data returned from the above query is anywhere between 0 and 800 rows at most.
I have a nonclustered index on the myId column; it made select count(*) fast but didn't help with the above query.
I tried using full-text search but it was slow. I tried splitting the text in cchunk into smaller parts and adding an id column to connect all those split chunks (I did that so I could add an index), but I ended up with a table of 2 million records of split chunks of text, and the query was even slower.
EDIT:
modified the table to include a primary key (int)
created a full-text catalog with "Accent Sensitive=true"
created a full-text index on my table on column "cchunk"
ran the same query as above and it ended up taking 22 seconds, which is much slower
UPDATE
Thanks everyone for suggesting full-text search (@Aaron Bertrand, thanks!). I converted my query to this:
SELECT * FROM tbchunks AS FT_TBL INNER JOIN
CONTAINSTABLE(tbchunks, cchunk, '(mystring1 NEAR mystring2)') AS KEY_TBL
ON FT_TBL.cID = KEY_TBL.[KEY]
By the way, cID is the primary key I added later.
Anyway, I am getting broad results, and I notice that the higher the returned RANK column, the better the results. My question is: when does RANK start to become accurate?
An index isn't going to help with CHARINDEX at all. An index on a particular column is only going to be able to quickly find rows where the value in the indexed field is exactly an indexed value. I'm actually quite surprised that query only takes 3 seconds given that it has to read every single row four times (or at the very least, twice).
As good as the ideas presented here were, nobody quite solved my problem outright, but the helpful tips led me to the solution, which I would love to share.
Using full-text search really was the answer, as many mentioned, but I managed to use CONTAINS in combination with NEAR so it can totally replace my current SQL query and provide awesome speed.
CONTAINS(cchunk, 'NEAR((mystring1, mystring2), 3, TRUE)')
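For context, a sketch of how that predicate might sit in a full query, assuming SQL Server 2012 or later (where this custom NEAR syntax is available) and the full-text index described above:
-- NEAR((a, b), 3, TRUE): mystring1 within 3 terms of mystring2, in that order.
SELECT *
FROM tbchunks
WHERE CONTAINS(cchunk, 'NEAR((mystring1, mystring2), 3, TRUE)');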

SQLite - select expression is very slow

I'm experiencing some heavy performance issues with a query in SQLite. Currently there are around 20,000 entries in the table activity_tbl and about 40 in the table activity_data_tbl. I have an index on both of the columns used in the query below, but it doesn't seem to have any effect on performance at all.
SELECT a._id, a.start_time + b.length AS time
FROM activity_tbl a INNER JOIN activity_data_tbl b
ON a.activity_data_id = b._data_id
WHERE time > ?
ORDER BY 2
LIMIT 1
As you can see, I select one column and a value created from adding two columns together. I guess this is what's causing the low performance, since the query is very fast if I just select a.start_time or b.length.
Do you guys have any suggestion for how I could optimize this?
Try putting an index on the time column. This should speed up the query.
This query is not optimizable using indexes for the filter part, since you are filtering and ordering on a calculated value. To optimize the query you will either need to filter on one of the actual table columns (start_time or length) or pre-compute the time values before querying.
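A sketch of the pre-computation route, assuming the schema above (column and index names are illustrative): since time = start_time + length spans two tables, the value has to be denormalized into one of them before it can be indexed.
-- Store the computed value once, then index it.
ALTER TABLE activity_tbl ADD COLUMN end_time INTEGER;
UPDATE activity_tbl
SET end_time = start_time + (SELECT length FROM activity_data_tbl
WHERE _data_id = activity_tbl.activity_data_id);
CREATE INDEX idx_activity_end_time ON activity_tbl(end_time);
-- The original query then becomes a plain indexed range scan:
SELECT _id, end_time AS time FROM activity_tbl
WHERE end_time > ? ORDER BY end_time LIMIT 1;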
The only place an index will help, and I assume you have one, is on b._data_id.
A compound index may help. According to its docs, SQLite tries to avoid accessing the table if the index has enough information. So if the engine does its homework, it will recognize that the index is enough to compute the WHERE-clause value and save some time. If that does not work, only the pre-computation will do.
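A sketch of such covering indexes, assuming _id is the usual INTEGER PRIMARY KEY rowid alias (the index names are illustrative):
-- Each index carries the join column plus the value needed for the
-- computation, so the tables themselves need not be touched; the rowid
-- (_id) rides along in every SQLite index implicitly.
CREATE INDEX idx_activity_cover ON activity_tbl(activity_data_id, start_time);
CREATE INDEX idx_activity_data_cover ON activity_data_tbl(_data_id, length);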
If you are more often confronted with similar tasks, please read this: http://www.sqlite.org/rtree.html

SQL massive performance difference using SELECT TOP x even when x is much higher than selected rows

I'm selecting some rows from a table valued function but have found an inexplicable massive performance difference by putting SELECT TOP in the query.
SELECT col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = #parameter
--ORDER BY col1
is taking upwards of 5 or 6 mins to complete.
However
SELECT TOP 6000 col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = #parameter
--ORDER BY col1
completes in about 4 or 5 seconds.
This wouldn't surprise me if the returned set of data were huge, but the particular query involved returns ~5000 rows out of 200,000.
So in both cases, the whole of the table is processed, as SQL Server continues to the end in search of 6000 rows which it will never get to. Why the massive difference then? Is this something to do with the way SQL Server allocates space in anticipation of the result set size (the TOP 6000 thereby giving it a low requirement which is more easily allocated in memory)?
Has anyone else witnessed something like this?
Thanks
Table valued functions can have a non-linear execution time.
Let's consider function equivalent for this query:
SELECT (
    SELECT SUM(mi.value)
    FROM mytable mi
    WHERE mi.id <= mo.id
)
FROM mytable mo
ORDER BY mo.value
This query (that calculates the running SUM) is fast at the beginning and slow at the end, since on each row from mo it should sum all the preceding values which requires rewinding the rowsource.
Time taken to calculate SUM for each row increases as the row numbers increase.
If you make mytable large enough (say, 100,000 rows, as in your example) and run this query you will see that it takes considerable time.
However, if you apply TOP 5000 to this query you will see that it completes in much less than 1/20 of the time required for the full table.
Most probably, something similar happens in your case too.
To say something more definitely, I need to see the function definition.
Update:
SQL Server can push predicates into the function.
For instance, I just created this TVF:
CREATE FUNCTION fn_test()
RETURNS TABLE
AS
RETURN (
SELECT *
FROM master.dbo.spt_values
);
These queries:
SELECT *
FROM fn_test()
WHERE name = #name
SELECT TOP 1000 *
FROM fn_test()
WHERE name = #name
yield different execution plans (the first one uses a clustered index scan, the second an index seek with a TOP).
I had the same problem: a simple query joining five tables and returning 1,000 rows took two minutes to complete. When I added "TOP 10000" to it, it completed in less than one second. It turned out that the clustered index on one of the tables was heavily fragmented.
After rebuilding the index, the query now completes in less than a second.
Your TOP has no ORDER BY, so it's simply the same as issuing SET ROWCOUNT 6000 first. An ORDER BY would require all rows to be evaluated first, and that would take a lot longer.
If dbo.some_table_function is an inline table-valued UDF, then it's simply a macro that gets expanded, so it returns the first 6000 rows, as mentioned, in no particular order.
If the UDF is a multi-statement one, then it's a black box and will always pull in the full dataset before filtering. I don't think this is happening here, though.
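To make the distinction concrete, a minimal sketch of the two forms (function names, columns, and bodies purely illustrative):
-- Inline TVF: a single RETURN (SELECT ...). The optimizer expands it
-- like a view, so outer predicates and TOP can be pushed inside.
CREATE FUNCTION dbo.fn_inline (@p int)
RETURNS TABLE
AS
RETURN (SELECT col1, col2 FROM dbo.some_table WHERE col1 = @p);
GO
-- Multi-statement TVF: populates a declared table variable. The
-- optimizer treats it as a black box and materializes it fully.
CREATE FUNCTION dbo.fn_multi (@p int)
RETURNS @result TABLE (col1 int, col2 int)
AS
BEGIN
    INSERT INTO @result
    SELECT col1, col2 FROM dbo.some_table WHERE col1 = @p;
    RETURN;
END;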
Not directly related, but another SO question on TVFs
You may be running into something as simple as caching here - perhaps (for whatever reason) the "TOP" query is cached? Using an index that the other isn't?
In any case the best way to quench your curiosity is to examine the full execution plan for both queries. You can do this right in SQL Server Management Studio and it'll tell you EXACTLY what operations are being performed and how long each is predicted to take.
All SQL implementations are quirky in their own way - SQL Server's no exception. These kind of "whaaaaaa?!" moments are pretty common. ;^)
It's not necessarily true that the whole table is processed if col1 has an index.
The SQL optimizer will choose whether or not to use an index. Perhaps your "TOP" is forcing it to use the index.
If you are using MSSQL Query Analyzer (the name escapes me), hit Ctrl-K. This will show the execution plan for the query instead of executing it. Mousing over the icons will show the IO/CPU usage, I believe.
I bet one is using an index seek, while the other isn't.
If you have a generic client:
SET SHOWPLAN_ALL ON;
GO
select ...;
go
see http://msdn.microsoft.com/en-us/library/ms187735.aspx for details.
I think Quassnoi's suggestion seems very plausible. By adding TOP 6000 you are implicitly giving the optimizer a hint that a fairly small subset of the 200,000 rows is going to be returned. The optimizer then uses an index seek instead of a clustered index scan or table scan.
Another possible explanation could be caching, as Jim Davis suggests. This is fairly easy to rule out by running the queries again. Try running the one with TOP 6000 first.