I have a problem with performance when retrieving data from SQL Server.
My SQL query looks something like this:
SELECT
table_1.id,
table_1.value,
table_2.id,
table_2.value,...,
table_20.id,
table_20.value
FROM table_1
INNER JOIN table_2
ON table_1.id = table_2.table_1_id
INNER JOIN table_3
ON table_2.id = table_3.table_2_id...
WHERE table_1.row_number BETWEEN 1 AND 20
So, I am fetching 20 results.
This query takes about 5 seconds to execute.
When I select only table_1.id, it returns results instantly.
Because of that, I suspect the problem is not in the JOINs but in retrieving the data from multiple tables.
Any suggestions on how I could speed up this query?
Assuming your tables are designed properly (have a useful primary key etc.), then the first thing I would check is this:
are there indices on each of the foreign key columns in the child tables?
SQL Server does not automatically create indices on the foreign key columns - yet those are indeed very helpful for speeding up your JOINs.
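For example, for the (hypothetical) table names from your question - just a sketch, with made-up index names:
CREATE INDEX IX_table_2_table_1_id ON table_2 (table_1_id);
CREATE INDEX IX_table_3_table_2_id ON table_3 (table_2_id);
-- ...and so on, one index per foreign key column in each child table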
Other than that: just look at the query plans! They should tell you everything about this query - what indices are being used (or not), what operations are being executed to get the results....
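In SSMS you can hit Ctrl+M ("Include Actual Execution Plan") before running the query, or turn on the standard I/O and timing counters:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run the query, then check the Messages tab for logical reads and CPU time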
Without knowing a lot more about your tables, their structure and the data they contain (how much? What kind of values? etc.), there's really not much we can do to help here....
BETWEEN can really slow a query down - what do you want to achieve with it?
Also:
Do you have an index on the columns you are joining on?
You could use WITH (NOLOCK) on the tables.
Check the execution plan to see what's taking so long.
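For what it's worth, the NOLOCK hint would look like this on your tables - but be aware it allows dirty reads, so only use it if slightly stale or uncommitted data is acceptable:
SELECT table_1.id, table_1.value
FROM table_1 WITH (NOLOCK)
INNER JOIN table_2 WITH (NOLOCK)
    ON table_1.id = table_2.table_1_id
WHERE table_1.row_number BETWEEN 1 AND 20;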
How about this one:
SELECT
table_1.id,
table_1.value,
table_2.id,
table_2.value,...,
table_20.id,
table_20.value
FROM
table_1
INNER JOIN table_2 ON table_1.id = table_2.table_1_id AND table_1.row_number BETWEEN 1 AND 20
INNER JOIN table_3 ON table_2.id = table_3.table_2_id
I mean: before joining to the other tables, you restrict the range of rows.
Related
I have an SQL query that has lots of INNER JOINs between tables.
I know that we can use joins to optimize, but as you can see this query already involves joins. I was thinking of adding a GROUP BY to this statement, which might be more realistic and run better. Does selecting more columns from one table after another make the query slow? If so, how could we optimize it? Below is my code:
SELECT /*+ PARALLEL(4) */
s.task_seq_num,
s.group_seq_num AS grp_seq_num,
g.source_type_cd,
d.doc_id,
d.doc_ref_id AS doc_ref_id,
dm.doc_priority_num AS doc_priority,
s.doc_seq_num AS doc_seq_num,
s.case_num AS case_num,
dm.doc_title_name AS doc_title,
s.task_status_cd AS task_status_cd,
d.received_dt AS received_dt,
nvl(b.first_name,d.first_name) AS first_name,
nvl(b.mid_name,d.mid_name) AS mid_name,
nvl(b.last_name,d.last_name) AS last_name,
tg.content_tag_cd AS content_tag_cd,
d.app_num AS app_num,
e.head_of_household_sw AS head_of_household_sw,
f.user_id AS user_id
FROM
dm_task_status s
INNER JOIN dm_task_tag tg ON s.task_seq_num = tg.task_seq_num
INNER JOIN dm_doc_group g ON g.group_seq_num = s.group_seq_num
INNER JOIN dm_doc d ON d.doc_seq_num = s.doc_seq_num
INNER JOIN dm_doc_master dm ON dm.doc_ref_id = d.doc_ref_id
LEFT JOIN mo_employees f ON f.emp_id = s.emp_id
LEFT JOIN ( dc_case_individual e
INNER JOIN dc_indv b ON b.indv_id = e.indv_id
AND e.head_of_household_sw = 'Y' ) ON e.case_num = s.case_num
WHERE
s.office_num = 38
AND s.eff_end_tms IS NULL
AND d.delete_sw IS NULL
ORDER BY s.group_seq_num ASC;
Any ideas are appreciated
First,
AND d.delete_sw IS NULL
should be up in the join and not in the WHERE clause. I don't THINK that matters for performance, but don't trust me on that one. What I do know is that it's good policy to have the final WHERE clause reference only the primary table itself.
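In other words, something like this (same tables as your query):
INNER JOIN dm_doc d ON d.doc_seq_num = s.doc_seq_num
    AND d.delete_sw IS NULL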
Second, since we don't know the data itself or its structure, I would say: on EACH join, ensure that you're using a table index wherever possible to prevent full table scans. That may seem insulting to suggest, but I've made this mistake many times in the past without realizing that there was in fact a place where I was NOT using an indexed field to limit the data scanned.
For example, are all of the fields I list below part of the primary key of their respective tables? If so, are they the FULL primary key? If they are not the full primary key, is there any way that you can get the rest of the primary key values from the table(s) that you're "coming from"? SQL will try to do the best job it can to make the most efficient plan, but it can only go so far; we always have to try to ensure we're using indexed fields, and preferably primary keys, so the database doesn't have to work harder than it has to.
Fields in question (a quick index check follows this list):
dm_task_tag.task_seq_num
dm_doc_group.group_seq_num
dm_doc.doc_seq_num
dm_doc_master.doc_ref_id
mo_employees.emp_id
dc_case_individual.indv_id
dc_case_individual.case_num
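If this is Oracle (the NVL calls and the PARALLEL hint suggest it is), a quick way to run that check is the standard USER_IND_COLUMNS dictionary view - a sketch only:
SELECT table_name, index_name, column_name, column_position
FROM   user_ind_columns
WHERE  table_name IN ('DM_TASK_TAG', 'DM_DOC_GROUP', 'DM_DOC',
                      'DM_DOC_MASTER', 'MO_EMPLOYEES', 'DC_CASE_INDIVIDUAL')
ORDER  BY table_name, index_name, column_position;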
I'm VERY suspicious of that second LEFT join, but I can't say much about it without knowing the tables. I'm also especially curious whether doc_ref_id is in fact the primary key of dm_doc_master, or if there is a seq_num that you're missing...
I have 3 tables and I don't want to define any foreign keys on them.
My table structure is like below:
tables diagram
I have written this query:
delete relativedata, crawls, stored
from relativedata
inner join crawls
    on relativedata.crawl_id = crawls.id
    and relativedata.id = ?
inner join stored
    on stored.crawl_id = crawls.id
This query works for me unless one of the tables has no records.
Now, how can I do this delete across all 3 tables in 1 query?
If it works when all tables have records, try using LEFT JOIN instead of INNER JOIN. Also, your join ON conditions were a bit of a mess. Try it like this:
delete
relativedata, crawls, stored
from
relativedata
LEFT join crawls on relativedata.crawl_id = crawls.id
LEFT join stored on relativedata.crawl_id = stored.crawl_id
WHERE
relativedata.id = ?
Also, foreign keys are a good thing, and not using them is generally a bad idea. Yes, they seem annoying at first, but try to focus on WHEN they annoy you. Most of the time they do so when you are meddling with data in a way you should not, and without them you would cause data inconsistency in your DB.
But that is just my opinion.
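For what it's worth, adding them later is one statement per relationship - a sketch assuming InnoDB tables, with column names guessed from your query:
ALTER TABLE relativedata ADD CONSTRAINT fk_relativedata_crawl FOREIGN KEY (crawl_id) REFERENCES crawls (id);
ALTER TABLE stored ADD CONSTRAINT fk_stored_crawl FOREIGN KEY (crawl_id) REFERENCES crawls (id);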
I have a SQL statement:
select
t3.item1,
t3.item2,
sum(t1.moneys)
from
table t1
inner join table t2 on t1.key = t2.key
inner join table t3 on t1.key2 = t3.key2
where
t2.type = 'thistype'
and t3.type2 = 'thistype'
group by
t3.item1, t3.item2
If I remove the GROUP BY, SUM, or WHERE clause it runs fine - but if I add back any of those, it hangs forever... Any ideas? This is in SQL Server Management Studio 2008 R2.
Thanks
Further Testing
so I created a view:
select
t3.item1,
t3.item2,
t1.moneys,
t2.type,
t3.type2
from
table t1
inner join table t2 on t1.key = t2.key
inner join table t3 on t1.key2 = t3.key2
and I can select the top 1000 rows from the view fine and see the type I specifically want to look at in the data, but when I add the WHERE type2 = 'thistype' it hangs again...
You're joining three tables together with millions of records; it is normal for that to take a while to run.
To answer your question about statistics: they are what the query optimizer uses to decide how to retrieve records from your tables - for example, whether an index would help at all. Without accurate, up-to-date statistics, the optimizer can pick plans that actually slow your queries down.
http://blogs.technet.com/b/rob/archive/2008/05/16/sql-server-statistics.aspx
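If they might be stale, refreshing them is cheap to try (t1 here standing in for your real table name):
-- refresh statistics for one table
UPDATE STATISTICS dbo.t1;
-- or for every table in the database
EXEC sp_updatestats;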
I think we'd need to see some table structure and know some more things about your DB before we can give a solid answer. First thing, though, is to run a trace on it and see what that tells you.
At first blush, I have found that issues with aggregate functions (sum, group by, etc) tend to stem from a) overly large data sets (that is: you're just trying to pull back too much data) or b) from overly-complicated structure or relationships on the joined tables.
However, those are just my general rules-of-thumb, and may not apply in a specific situation: run a trace and any other form of profiling you can and see what that tells you.
Have you looked at the execution plan you're getting? That will tell you where the problem is. Do you have covering indices on the columns you're joining and grouping on?
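Something along these lines, purely as a sketch - the right key and included columns depend on what the plan shows (key and type need bracket-quoting since they clash with T-SQL keywords):
CREATE INDEX ix_t2_key_type ON t2 ([key], [type]);
CREATE INDEX ix_t3_key2_type2 ON t3 (key2, type2) INCLUDE (item1, item2);
CREATE INDEX ix_t1_keys ON t1 ([key], key2) INCLUDE (moneys);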
Is it possible that the execution plan is corrupted?
http://msdn.microsoft.com/en-us/library/aa175244(v=sql.80).aspx
Try recompiling the plan using sp_recompile
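which marks the plans that reference the object for recompilation on their next run (t1 again standing in for your table):
EXEC sp_recompile N't1';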
My question is similar to this SQL order of operations but with a little twist, so I think it's fair to ask.
I'm using Teradata. And I have 2 tables: table1, table2.
table1 has only an id column.
table2 has the following columns: id, val
I might be wrong but I think these two statements give the same results.
Statement 1.
SELECT table1.id, table2.val
FROM table1
INNER JOIN table2
ON table1.id = table2.id
WHERE table2.val<100
Statement 2.
SELECT table1.id, table3.val
FROM table1
INNER JOIN (
SELECT *
FROM table2
WHERE val<100
) table3
ON table1.id=table3.id
My question is, will the query optimizer be smart enough to
- execute the WHERE clause first then JOIN later in Statement 1
- know that table 3 isn't actually needed in Statement 2
I'm pretty new to SQL, so please educate me if I'm misunderstanding anything.
This depends on many, many things (table size, indexes, key distribution, etc.); you should just check the execution plan.
Here is how on a few common databases:
MySql EXPLAIN
SQL Server SET SHOWPLAN_ALL (Transact-SQL)
Oracle EXPLAIN PLAN
what is explain in teradata?
Teradata Capture and compare plans faster with Visual Explain and XML plan logging
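In Teradata you just prefix the query with EXPLAIN; for example, for Statement 1:
EXPLAIN
SELECT table1.id, table2.val
FROM table1
INNER JOIN table2
ON table1.id = table2.id
WHERE table2.val < 100;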
Depending on the availability of statistics and indexes for the tables in question, the query rewrite mechanism in the optimizer may or may not opt to scan Table2 for records where val < 100 before scanning Table1.
In certain situations, based on data demographics, joins, indexing and statistics, you may find that the optimizer is not eliminating records in the query plan when you feel that it should, even if you have a derived table such as the one in your example. You can force the optimizer to process a derived table by simply placing a GROUP BY in the derived table. The optimizer is then obligated to resolve the GROUP BY aggregate before it can consider resolving the join between the two tables in your example.
SELECT table1.id, table3.val
FROM table1
INNER JOIN (
SELECT table2.id, table2.val
FROM table2
WHERE val<100
GROUP BY 1,2
) table3
ON table1.id=table3.id
This is not to say that your standard approach should be to run with this throughout your code. It is typically one of my last resorts, for when a query plan simply doesn't eliminate extraneous records early enough and too much data ends up being scanned and carried around through the various SPOOL files. It is simply a technique you can keep in your toolkit for when you encounter such a situation.
The query rewrite mechanism is continually being updated from one release to the next and the details about how it works can be found in the SQL Transaction Processing Manual for Teradata 13.0.
Unless I'm missing something, why do you even need table1?
Just query table2:
Select id, val
From table2
WHERE val<100
Or are you using the rows in table1 as a filter? I.e., does table1 only contain a subset of the ids in table2?
If so, then this will work as well ...
Select id, val
From table2
Where val<100
And id In (Select id
From table1)
But to answer your question: yes, the query optimizer should be intelligent enough to figure out the best order in which to execute the steps necessary to translate your logical instructions into a physical result. It uses the stored statistics that the database maintains on each table to determine what to do (what type of join logic to use, for example), as well as what order to perform the operations in, in order to minimize disk IOs and processing costs.
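Teradata (at least as of 13.x) does not collect those statistics automatically, so make sure they exist on the join and filter columns - a sketch using the legacy syntax:
COLLECT STATISTICS ON table1 COLUMN (id);
COLLECT STATISTICS ON table2 COLUMN (id);
COLLECT STATISTICS ON table2 COLUMN (val);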
Q1. execute the WHERE clause first then JOIN later in Statement 1
The thing is, if you switch the order of the inner join, i.e. table2 INNER JOIN table1, then I guess the WHERE clause can be processed before the JOIN operation, during the preparation phase. However, even if you don't change the original query, the optimizer should be able to switch the order if it thinks the join would be too expensive when fetching whole rows, and so apply the WHERE first. Just my guess.
Q2. know that table 3 isn't actually needed in Statement 2
Teradata will interpret your second query in such a way that the derived table is considered necessary, so it will keep processing the operations that involve table 3.
I've added a field to a MySQL table. I need to populate the new column with the value from another table. Here is the query that I'd like to run:
UPDATE table1 t1
SET t1.user_id =
(
SELECT t2.user_id
FROM table2 t2
WHERE t2.usr_id = t1.usr_id
)
I ran that query locally on 239K rows and it took about 10 minutes. Before I do that in the live environment, I wanted to ask whether what I am doing looks OK - i.e., does 10 minutes sound reasonable? Or should I do it another way: a PHP loop? A better query?
Use an UPDATE JOIN! This gives you a native inner join to update from, rather than running the subquery for every bloody row. It tends to be much faster.
update table1 t1
inner join table2 t2 on
t1.usr_id = t2.usr_id
set t1.user_id = t2.user_id
Ensure that you have an index on each of the usr_id columns, too. That will speed things up quite a bit.
If you have some rows that don't match up, and you want to set t1.user_id = null, you will need to do a left join in lieu of an inner join. If the column is null already, and you're just looking to update it to the values in t2, use an inner join, since it's faster.
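That variant is the same statement with the join type swapped; unmatched rows get NULL:
update table1 t1
left join table2 t2 on
    t1.usr_id = t2.usr_id
set t1.user_id = t2.user_id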
I should mention, for posterity, that this is MySQL-only syntax. Other RDBMSs have their own ways of doing an update join.
There are two rather important pieces of information missing:
What type of tables are they?
What indexes exist on them?
If table2 has an index that contains user_id and usr_id as the first two columns and table1 is indexed on user_id, it shouldn't be that bad.
You don't have an index on t2.usr_id.
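Creating it is a one-liner (the index name is up to you):
CREATE INDEX idx_table2_usr_id ON table2 (usr_id);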
Create that index and run your query again, or use the multiple-table UPDATE proposed by @Eric (with a LEFT JOIN, of course).
Note that MySQL has no JOIN methods other than NESTED LOOPS, so it's the index that matters, not the UPDATE syntax.
However, the multiple-table UPDATE is more readable.