Simple SQL query on small tables takes too long to execute - sql

I have a query that takes much too long time to execute. It is simple and tables are small. The simplified query (but still slow) is:
SELECT D.ID, C.Name, T.Name AS TownName
FROM Documents D, Companies C, Towns T
WHERE C.ID = D.Company AND T.ID = C.Town
ORDER BY C.Name
Primary keys and foreign keys between tables are properly set. Also, column Companies.Name is indexed.
I tried using JOINs, restarting SQL Server, rebuilding indices etc. but it still needs about 40 seconds to execute on my computer with SSD. Number of records in tables Documents and Companies is only 18K (currently, they are 1:1) and only about 20 records in table Towns.
On the other side, the following query returns completely the same records, but it takes practically no time to execute:
SELECT D.ID, C.Name, (SELECT Name FROM Towns WHERE ID = C.Town) AS TownName
FROM Documents D, Companies C
WHERE C.ID = D.Company
ORDER BY C.Name
In my opinion, the first query should be even faster, but I am obviously wrong. Does anybody have a clue what's happening here? It seems that indices are ignored when sorting by column in a table which is a master of one and detail of another one.

I can't explain why your subquery query is running faster but I would try something else to see if I could eliminate the subquery.
I usually go from least to greatest when i'm not using where conditions.. So my query would look like
Select t.Name TownName,
c.Name,
d.Id
From Towns t
Join Companies c ON t.Id = c.Town
Join Documents d ON c.Id = d.Company
Order By c.Name
Then I'd make sure that Companies has an Index on Town, and that Documents has and index on Company.. 18k records might take a little while to display in the output window but the query should be pretty quick

what happens when you use join statements?
SELECT D.ID, C.Name, T.Name AS TownName
FROM Documents D
inner join Companies C on C.
inner join Towns T on T.ID = C.Town
ORDER BY C.Name
also, try with and without the order

Related

How to join three tables having relation parent-child-child's child. And I want to access all records related to parent

I have three tables:
articles(id,title,message)
comments(id,article_id,commentedUser_id,comment)
comment_likes(id, likedUser_id, comment_id, action_like, action_dislike)
I want to show comments.id, comments.commentedUser_id, comments.comment, ( Select count(action_like) where action_like="like") as likes and comment_id=comments.id where comments.article_id=article.id
Actually I want to count all action_likes that related to any comment. And show all all comments of articles.
action_likes having only two values null or like
SELECT c.id , c.CommentedUser_id , c.comment , (cl.COUNT(action_like) WHERE action_like='like' AND comment_id='c.id') as likes
FROM comment_likes as cl
LEFT JOIN comments as c ON c.id=cl.comment_id
WHERE c.article_id=article.id
It shows nothing, I know I'm doing wrong way, that was just that I want say
I guess you are looking for something like below. This will return Article/Comment wise LIKE count.
SELECT
a.id article_id,
c.id comment_id,
c.CommentedUser_id ,
c.comment ,
COUNT (CASE WHEN action_like='like' THEN 1 ELSE NULL END) as likes
FROM article a
INNER JOIN comments C ON a.id = c.article_id
LEFT JOIN comment_likes as cl ON c.id=cl.comment_id
GROUP BY a.id,c.id , c.CommentedUser_id , c.comment
IF you need results for specific Article, you can add WHERE clause before the GROUP BY section like - WHERE a.id = N
I would recommend a correlated subquery for this:
SELECT a.id as article_id, c.id as comment_id,
c.CommentedUser_id, c.comment,
(SELECT COUNT(*)
FROM comment_likes cl
WHERE cl.comment_id = c.id AND
cl.action_like = 'like'
) as num_likes
FROM article a INNER JOIN
comments c
ON a.id = c.article_id;
This is a case where a correlated subquery often has noticeably better performance than an outer aggregation, particularly with the right index. The index you want is on comment_likes(comment_id, action_like).
Why is the performance better? Most databases will implement the group by by sorting the data. Sorting is an expensive operation that grows super-linearly -- that is, twice as much data takes more than twice as long to sort.
The correlated subquery breaks the problem down into smaller pieces. In fact, no sorting should be necessary -- just scanning the index and counting the matching rows.

SQL Server query perfomance tuning with group by and join clause

We have been experiencing performance concerns over job and I could fortunately find the query causing the slowness..
select name from Student a, Student_Temp b
where a.id = b.id and
a.name in (select name from Student
group by name having count(*) = #sno)
group by a.name having count(*) = #sno
OPTION (MERGE JOIN, LOOP JOIN)
This particular query is iteratively called many times slowing down the performance..
Student table has 8 Million records and Student_temp receives 5-20 records in the iteration process each time.
Student table has composite primary key on ( id and name)
and sno = No of records in Student_Temp.
My questions are below,
1) why does this query show performance issues.
2) could you guys give a more efficient way of writing this piece ?
Thanks in Advance !
It's repeating the same logic unnecessarily. You really just want:
Of the Student(s) who also exist in Student_temp
what names exist #sno times?
Try this:
SELECT
name
FROM
Student a JOIN
Student_Temp b ON a.id = b.id
GROUP BY
name
HAVING
count(*) = #sno
Your query returns the following result: Give me all names that are #sno times in the table Student and exactly once in Student_temp.
You can rewrite the query like this:
SELECT a.name
FROM Student a
INNER JOIN Student_temp b
ON a.id = b.id
GROUP BY a.name
HAVING COUNT(*) = #sno
You should omit the query hint unless you are absolutely sure that the query optimizer screws up.
EDIT: There is of course a difference between the queries: if for instance #sno=2 then a name that shows up once in Student but twice in Student_temp would be included in my query but not in the original. I depends on what you really want to achieve whether that needs to be adressed or not.
Here you go
select name
from Student a
inner join Student_Temp b
on a.id = b.id
group by a.name
HAVING COUNT(*) = #sno

comparison query taking ages

My query is quite simple:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a, COMPANIES b
where a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
Database: sql server 2008 r2
What I'm trying to do:
The table of COMPANIES contains double entries. I want to know the ones that are connected to the most amount of users. So I only have to change the foreign keys of those with the least. ( I already know the id's of the doubles)
Right now it's taking a lot of time to complete. I was wondering if if could be done faster
Try this version. It should be only a little faster. The COUNT is quite slow. I've added a.ID <> b.ID to avoid few cases earlier.
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a INNER JOIN COMPANIES b
ON
a.ID <> b.ID
and a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
The FROM ... INNER JOIN ... ON ... is a preferred SQL construct to join tables. It may be faster too.
One approach would be to pre-calculate the COMPANYID count before doing the join since you'll be repeatedly calculating it in the main query. i.e. something like:
insert into #CompanyCount (ID, IDCount)
select COMPANYID, COUNT(COMPANYID)
from USERS
group by COMPANYID
Then your main query:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a
inner join #CompanyCount aCount on aCount.ID = a.ID
inner join COMPANIES b on b.Postcode = a.Postcode and b.Adres = a.Adres
inner join #CompanyCount bCount on bCount.ID = b.ID and aCount.IDCount > bCount.IDCount
If you want all instances of a even though there is no corresponding b then you'd need to have left outer joins to b and bCount.
However you need to look at the query plan - which indexes are you using - you probably want to have them on the IDs and the Postcode and Adres fields as a minimum since you're joining on them.
Build an index on postcode and adres
The database probably executes the subselects for every row. (Just guessing here, veryfy it in the explain plan. If this is the case you can rewrite the query to join with the inline views (note this is how it would look in oracle hop it works in sql server as well):
select distinct a.ID, a.adres, a.place, a.postalcode
from
COMPANIES a,
COMPANIES b,
(
select COUNT(COMPANYID) cnt, companyid
from USERS
group by companyid) cntA,
(
select COUNT(COMPANYID) cnt, companyid
from USERS
group by companyid) cntb
where a.Postcode = b.Postcode
and a.Adres = b.Adres
and a.ID<>b.ID
and cnta.cnt>cntb.cnt

What is difference in performance of following 2 queries

What is the difference in performance of following 2 queries in SQL Server 2008
Query 1:
SELECT A.Id,A.Name,B.Class,B.Std,C.Result,D.Grade
FROM Student A
INNER JOIN Classes B ON B.ID = A.ID
INNER JOIN Results C ON C.ID = A.ID
INNER JOIN Grades D ON D.Name = A.Name
WHERE A.Name='Test' AND A.ID=3
Query 2:
SELECT A.Id,A.Name,B.Class,B.Std,C.Result,D.Grade
FROM Student A
INNER JOIN Classes B ON B.ID = A.ID AND A.Name='Test' AND A.ID=3
INNER JOIN Results C ON C.ID = A.ID
INNER JOIN Grades D ON D.Name = A.Name
Is there any best way to achieve the best perofermance in the above 2 queries
You can have a 100% guarantee that they execute the same, with the same plan.
The only time it does matter when splitting AND/WHERE clauses in an INNER JOIN is when options like FORCE ORDER is used.
For great performance at the expense of writes, create these indexes:
A (id, name)
B (id) includes (class, std)
C (id) includes (result)
D (name) includes (grade)
However, it still depends on your distribution of data and selectivity of indexes as to whether they will actually be used. e.g. if your Grade table contains only 5 entries A,B,C,D,E, then no index will be used, it will simply scan and buffer the table in memory.
If you look at the execution plans of each, you'll find they're in all likelihood going to have identical execution plans.
The way to best improve performance for these kinds of queries is to make sure that you've got the proper indexes set up. All of your ID columns should have indexes on them, and if Grades is a largeish table, consider putting an index on Name for both it and Student

SQL: Speed Improvement - Left Join on cond1 or cond2

SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
)
Two tables that are basically the same
I don't have access to the table structure or data input (thus no cleaning up primary keys)
Sometimes the user_id is populated in one and not the other
Sometimes names are equal, sometimes they are not
I've found that I can get the most of the data by matching on user_id or the first/last names. I'm using the ' ' between the names to avoid cases where one user has the same first name as another's last name and both are missing the other field (unlikely, but plausible).
This query runs in 33000ms, whereas individualized they are each about 200ms.
I've been up late and can't think straight right now
I'm thinking that I could do a UNION and only query by name where a user_id does not exist (the default join is the user_id, if a user_id doesn't exist then I want to join by the name)
Here is some free points to anyone that wants to help
Please don't ask for the execution plan.
Looks like you can easily avoid the string concatenation:
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
Change it to:
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
Rather than concatenating first and last name and comparing them, try comparing them individually instead. Assuming you have them (and you should create them if you don't), this should improve your chances of using indexes on the first name and last name columns.
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR (a.f_name = b.f_name and a.l_name = b.l_name)
)
If people's suggestions don't provide a major speed increase, there is a possibility that your real problem is that the best query plan for your two possible join conditions is different. For that situation you would want to do two queries and merge results in some way. This is likely to make your query much, much uglier.
One obscure trick that I have used for that kind of situation is to do a GROUP BY off of a UNION ALL query. The idea looks like this:
SELECT a_field1, a_field2, ...
MAX(b_field1) as b_field1, MAX(b_field2) as b_field2, ...
FROM (
SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
UNION ALL
SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.f_name = b.f_name AND a.l_name = b.l_name
)
GROUP BY a_field1, a_field2, ...
And now the database can do each of the two joins using the most efficient plan.
(Warning of a drawback in this approach. If a row in current_tbl joins to multiple rows in import_tbl, then you'll wind up merging data in a very odd way.)
Incidental random performance tip. Unless you have reason to believe that there are potential duplicate rows, avoid DISTINCT. It forces an implicit GROUP BY, which can be expensive.
I don't really understand why you're concatenating those strings. Seems like that's where your slowdown would be. Does this work instead?
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
)
Here is Yet Another Ugly Way To Do It.
SELECT a.*
, CASE WHEN b.user_id IS NULL THEN c.field1 ELSE b.field1 END as b_field1
, CASE WHEN b.user_id IS NULL THEN c.field2 ELSE b.field2 END as b_field2
...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
LEFT JOIN import_tbl c
ON a.f_name = c.f_name AND a.l_name = c.l_name;
This avoids any GROUP BY, and also handles conflicting matches in a somewhat reasonable way.
Try using JOIN hints:
http://msdn.microsoft.com/en-us/library/ms173815.aspx
We were encountering the same type of behavior with one of our queries. As a last resort we added the LOOP hint, and the query ran much much faster.
It's important to note that Microsoft says this about JOIN hints:
Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that hints, including , be used only as a last resort by experienced developers and database administrators.
my boss at my last job.. I swear.. he thought that using UNIONS was ALWAYS FASTER THAN OR.
For example.. instead of writing
Select * from employees Where Employee_id = 12 or employee_id = 47
he would write (and have me write)
Select * from employees Where employee_id = 12
UNION
Select * from employees Where employee_id = 47
SQL Sever optimizer said that this was the right thing to do in SOME situations.. I have a friend who works on the SQL Server team at Microsoft, I emailed him about this and he told me that my stats were out of date or something along those lines.
I never really got a good answer on WHY the unions are faster, it seems REALLY counter-intuitive.
I'm not recommending you DO this, but in some situations it can help.
Also two more things-- GET RID OF THE DISTINCT CLAUSE unless you absolutely need it.. n
and more importantly, you can easily get rid of the concatenation in your join, like this for example (pardon my lack of mySQL knowledge)
SELECT DISTINCT a., b.
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name and a.l_name = b.l_name)
)
I've had some tests at work in a similiar situation that show 10x performance improvement by getting rid of the simple concatenation in your join