Is there a better way than joining 3 tables? - sql

I have to join 3 tables somewhere in my project.
Here are the example tables and columns:
Table-1 : posts
Columns1: id,owner,title,post,keywords
Table-2 : sites
Columns2: id,name,url
Table-3 : views
Columns3: id,post,view
When I join all these tables, I end up with a rather large query:
SELECT posts.title, posts.post, posts.keywords, sites.name, sites.url, views.view
FROM posts
LEFT JOIN sites ON sites.id = posts.owner
LEFT JOIN views ON views.post = posts.id
WHERE posts.date BETWEEN '2010-10-10 00:00:00' AND '2010-11-11 00:00:00'
ORDER BY views.view DESC
LIMIT 0,10
Is it the only way or could I do something else to get better performance?
This is my current query's EXPLAIN. The query above is just an example.

That's not a huge query by any stretch of the imagination.
How slow is it really?
Maybe if views doesn't contain any more information than what you've shown, you should just have a view count be a field of posts. No need for it to be its own separate table unless you're actually storing some information about the views themselves, like user-agent strings or time.

That's not a particularly "huge" query. Have you run the query analyzer, checked where the slow point is, and then checked your indexes?
Re: Analyzer - Microsoft keeps moving it, but in 2008 Management Studio there are several options for showing the execution plan. Once you see the execution plan you can see where the problems are. Look for a single action taking 80+% of your time and focus on that. Things like Table Scans are an indication that you could speed it up by tweaking indexes. (There are downsides to indexes as well, but worry about that later).

If your relations are guaranteed, in other words non-nullable foreign keys, then the query will perform better if it uses inner joins instead of left joins. That said, this query does not appear to be large or complex enough to be suffering seriously from performance issues.

If your POSTS table is particularly large (>100K rows?) then one thing you could do is to load the time-filtered posts into a temporary table and join on that temp table.
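A minimal sketch of the filter-first idea, written in Python with the standard-library sqlite3 module as a stand-in for the real database (an assumption; the temp-table syntax differs slightly in MySQL and SQL Server). The sample rows, the temp table name recent_posts, and the column name view_count (used instead of view to avoid a keyword clash) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, owner INTEGER, title TEXT, date TEXT);
CREATE TABLE views (id INTEGER PRIMARY KEY, post INTEGER, view_count INTEGER);
INSERT INTO sites VALUES (1, 'Site A', 'http://a.example');
INSERT INTO posts VALUES
    (1, 1, 'in range',      '2010-10-15 12:00:00'),
    (2, 1, 'too old',       '2009-01-01 00:00:00'),
    (3, 1, 'also in range', '2010-11-01 08:30:00');
INSERT INTO views VALUES (1, 1, 50), (2, 2, 999), (3, 3, 70);
""")

# Step 1: materialize only the posts inside the date range.
conn.execute("""
CREATE TEMP TABLE recent_posts AS
SELECT id, owner, title
FROM posts
WHERE date BETWEEN '2010-10-10 00:00:00' AND '2010-11-11 00:00:00'
""")

# Step 2: join the (much smaller) temp table instead of the full posts table.
top_posts = conn.execute("""
SELECT rp.title, s.name, v.view_count
FROM recent_posts rp
LEFT JOIN sites s ON s.id = rp.owner
LEFT JOIN views v ON v.post = rp.id
ORDER BY v.view_count DESC
LIMIT 10
""").fetchall()

print(top_posts)
```

Whether this beats a single query depends on the optimizer and on the indexes available, so it is worth timing both variants on real data.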

Related

Performance for big query in SQL Server view

I have a big query for a view that takes a couple of hours to run, and I feel like it may be possible to work on its performance "a bit".
The problem is that I am not sure what I should do. The query SELECTs 39 values and LEFT OUTER JOINs 25 tables, and each table could have up to a couple of million rows.
Any tip is good. Is there a good way to attack this problem? I tried to look at the actual execution plan on a test with less data (it took about 10 minutes to run), but it's crazy big. Are there any general things I could do to make this faster? Do I have to tackle one small part at a time?
Maybe there is just one join that slows down everything? How do I detect it? In short, how do I work on a query like this?
As I said, all feedback is good. If there is more information I need to show, tell me!
The query looks something like this:
SELECT DISTINCT
A.something,
A.somethingElse,
B.something,
C.somethingElse,
ISNULL(C.somethingElseElse, ''),
C.somethingElseElseElse,
CASE WHEN *** THEN D.something ELSE 0 END,
E.something,
...
U.something
FROM
TableA A
JOIN
TableB B on ...
JOIN
TableC C on ...
JOIN
TableD D on ...
JOIN
TableE E on ...
JOIN
TableF F on ...
JOIN
TableG G on ...
...
JOIN
TableU U on ...
Break your problem into manageable pieces. If the execution plan is too large for you to analyze, start with a smaller part of the query, check its execution plan and optimize it.
There is no general answer on how to optimize a query, since there is a whole bunch of possible reasons why a query can be slow. You have to check the execution plan.
Generally the most promising ways to improve performance are:
Indexing:
When you see a Clustered Index Scan or, even worse (because then you don't have a clustered index), a Table Scan in your query plan for a table that you join, you need an index for your JOIN predicate. This is especially true if you have tables with millions of entries and you select only a small subset of those entries. Also check the index suggestions in the execution plan.
You see that the index works when your Clustered Index Scan turns into an Index Seek.
Index includes:
You probably are displaying columns from your joined tables that are different from the fields you use to join (otherwise, why would you need to join then?). SQL Server needs to get the fields that you need from the table, which you see in the execution plan as Key Lookup.
Since you are taking 39 values from 25 tables, there will be very few fields per table that you need to get (mostly one or two). SQL Server needs to load entire pages of the respective table and get the values from them.
In this case, you should INCLUDE the column(s) you want to display in your index to avoid the key lookups. This comes at the cost of an increased index size, but considering you only include a few columns, that cost should be negligible compared to the size of your tables.
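The effect of a covering index can be sketched with Python's standard-library sqlite3 module. This is only an illustration under assumptions: SQLite has no INCLUDE clause, so the displayed column is appended to the index key instead (in SQL Server the rough equivalent would be CREATE INDEX ... ON orders (customer_id) INCLUDE (status)), and the orders table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status, notes) VALUES (?, ?, ?)",
    [(i % 100, "open" if i % 2 else "closed", "x" * 50) for i in range(1000)],
)

query = "SELECT status FROM orders WHERE customer_id = ?"

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)

# 1. No index: the whole table is scanned.
plan_no_index = plan(query)

# 2. Index on the filter column only: index search, then a lookup into the table for status.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
plan_key_only = plan(query)

# 3. Index that also carries the displayed column: answered from the index alone.
conn.execute("DROP INDEX ix_orders_customer")
conn.execute("CREATE INDEX ix_orders_customer_status ON orders (customer_id, status)")
plan_covering = plan(query)

for label, text in [("no index", plan_no_index),
                    ("key only", plan_key_only),
                    ("covering", plan_covering)]:
    print(f"{label:9s}: {text}")
```

The third plan mentions a covering index, which is SQLite's way of saying that no lookup into the base table is needed, the same effect that removing a Key Lookup has in a SQL Server plan.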
Checking views that you join:
When you join VIEWs you should be aware that it basically means an extension to your query (which means also of the execution plan). Do the same performance optimizations for the view as you do for your main query. Also, check if you join tables in the view that you already join in the main query. These joins might be unnecessary.
Indexed views (maybe):
In general, you can add indexes to views you are joining to your query or create one or more indexed views for parts of your query. There are some caveats though:
Indexed views take storage space in your DB, because you store parts of the data multiple times.
There are a lot of restrictions to indexed views, most notably in your case that OUTER JOINs are forbidden. If you can transform at least some of your OUTER JOINs to INNER JOINs this might be an option.
When you join indexed views, don't forget to use WITH(NOEXPAND) in your join, otherwise they might be ignored.
Partitioned tables (maybe):
If you are running on the Enterprise Edition of SQL Server, you can partition your tables. That can be useful if the rows you join are always selected from a small subset of the available rows. You can make a partition for this subset and increase performance.
Summary:
Divide and conquer. Analyze your query bit by bit to optimize it. The most promising options are indexes and index includes. If you still have trouble, go from there.

Slow SQL Queries, Order Table by Date?

I have a SQL Server 2008 database that I query regularly and that has over 30 million entries (joy!). Unfortunately this database cannot be drastically changed because it is still in use for R/D.
When I query this database, it takes FOREVER. By that I mean I haven't been patient enough to wait for results (after 2 mins I have to cancel to avoid locking the R/D department out). Even if I use a short date range (more than a few months), it is basically impossible to get any results from it. I am querying with requirements on 4 of the columns and unfortunately have to use an inner join to another table (which I've been told is very costly in terms of query efficiency, but it is unavoidable). This inner joined table has less than 100k entries.
What I was wondering is whether it is possible to organize the table so that it is ordered by date by default, to reduce the number of rows it has to search through.
If this is not possible, is there anything I can do to reduce query times? Is there any other useful information that could assist me in coming up with a solution?
I have included a sample of the query that I use:
SELECT DISTINCT N.TestName
FROM [DalsaTE].[dbo].[ResultsUut] U
INNER JOIN [DalsaTE].[dbo].[ResultsNumeric] N
ON N.ModeDescription = 'Mode 8: Low Gain - Green-Blue'
AND N.ResultsUutId = U.ResultsUutId
WHERE U.DeviceName = 'BO-32-3HK60-00-R'
AND U.StartDateTime > '2011-11-25 01:10:10.001'
ORDER BY N.TestName
Any help or suggestions are appreciated!
It sounds like datetime may be a text based field and subsequently an index isn't being used?
Could you try the following to see if you have any speed improvement:
select distinct N.TestName
from [DalsaTE].[dbo].[ResultsUut] U
inner join [DalsaTE].[dbo].[ResultsNumeric] N
on N.ModeDescription = 'Mode 8: Low Gain - Green-Blue'
and N.ResultsUutId = U.ResultsUutId
where U.DeviceName = 'BO-32-3HK60-00-R'
and U.StartDateTime > cast('2011-11-25 01:10:10.001' as datetime)
order by N.TestName
It would also be worth trying to change your inner join to a left outer join, as those occasionally perform faster for no apparent reason (at least none that I'm aware of).
You can add an index on your date column, which should improve your query time. You can either use an ALTER TABLE command or use the table designer.
Is the sole purpose of the join to provide sorting? If so, a quick thing to try would be to remove this, and see how much of a difference it makes - at least then you'll know where to focus your attention.
Finally, SQL server management studio has some useful tools such as execution plans that can help diagnose performance issues. Good luck!
There are a number of problems which may be causing delays in the execution of your query.
Indexes (except the primary key) do not reorder the data, they merely create an index (think phonebook) which orders a number of values and points back to the primary key.
Without seeing the type of data or the existing indexes, it's difficult, but at the very least, the following ASCENDING indexes might help:
[DalsaTE].[dbo].[ResultsNumeric] ModeDescription and ResultsUutId and TestName
[DalsaTE].[dbo].[ResultsUut] StartDateTime and DeviceName and ResultsUutId
With the indexes above, the sample query you gave can be completed without performing a single lookup on the actual table data.

Is creating a view of multiple joined tables faster than getting that data directly using the joins in a query?

The following is the query that is run; every table it hits has over 100,000 records.
SELECT b.login as userEmail, imgateway_instance_id as img, u.id as userId
FROM buddy b
INNER JOIN `user` u ON b.username = u.login
INNER JOIN bot_to_buddy btb ON b.id = btb.buddy_id
INNER JOIN bot ON btb.bot_id = bot.id
WHERE u.id IN (14242)
Joins on tables with as many records as yours are often slow. Without a usable index, a join may have to go over every record in a table, which makes the query take a long time.
From personal experience, I would suggest that you cut down the rows as much as you can with WHERE first, and then join.
No, you cannot gain performance from using a view. Behind the scene, your original query is run when you query the view.
Sometimes using views can gain a small bit of performance, like it says in High Performance MySQL
On the other hand the author of the book has written this blog:
Views as performance trouble maker
Generally speaking, this depends on how you submit your query.
The view MAY be faster:
For example, in PHP it's common practice to submit the query "dynamically" (i.e. NOT as a prepared statement).
That means MySQL has to compile the query every time you call it. When using a view, this is done once when the view is created.
Regarding MySQL as a DBMS, I heard about performance issues with views in earlier versions. (I don't know what the current situation is, though.)
As a general rule in such questions, just benchmark your query to get real life results. Looks like you have already populated your database with a lot of data, so this should yield meaningful results. (Don't forget to disable caching in MySQL).
With MySQL, there's little reason why having a view run your query would be any faster than running the query yourself.
Views in MySQL are generally so poorly implemented that we had to back out of using them for many of our projects.
Check with EXPLAIN what your query does when you place it in a view. Looking at that query, it can probably still use the proper indexes even if it's part of a view, so it'll at least not be slower.
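A rough way to run that check, sketched in Python with the standard-library sqlite3 module standing in for MySQL (an assumption: in MySQL you would use EXPLAIN rather than EXPLAIN QUERY PLAN, and the optimizer behaves differently). The view name buddy_users and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (id INTEGER PRIMARY KEY, login TEXT);
CREATE TABLE buddy (id INTEGER PRIMARY KEY, username TEXT, login TEXT);
CREATE INDEX ix_buddy_username ON buddy (username);
INSERT INTO "user" VALUES (14242, 'ann@example.com'), (2, 'bob@example.com');
INSERT INTO buddy VALUES
    (1, 'ann@example.com', 'ann-buddy@example.com'),
    (2, 'bob@example.com', 'bob-buddy@example.com'),
    (3, 'ann@example.com', 'ann2@example.com');

-- Hypothetical view wrapping the same join.
CREATE VIEW buddy_users AS
SELECT b.login AS userEmail, u.id AS userId
FROM buddy b
INNER JOIN "user" u ON b.username = u.login;
""")

direct_sql = """
SELECT b.login AS userEmail, u.id AS userId
FROM buddy b
INNER JOIN "user" u ON b.username = u.login
WHERE u.id = ?
ORDER BY userEmail
"""
view_sql = "SELECT userEmail, userId FROM buddy_users WHERE userId = ? ORDER BY userEmail"

direct_rows = conn.execute(direct_sql, (14242,)).fetchall()
view_rows = conn.execute(view_sql, (14242,)).fetchall()

# Print both plans side by side to see whether the view changes index usage.
for label, sql in [("direct", direct_sql), ("view", view_sql)]:
    steps = conn.execute("EXPLAIN QUERY PLAN " + sql, (14242,)).fetchall()
    print(label, [step[-1] for step in steps])
```

Both statements return the same rows; what matters is whether the two plans use the same indexes, and that has to be checked on the actual database.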

SQL Joins Vs SQL Subqueries (Performance)?

I wish to know if I have a join query something like this -
Select E.Id,E.Name from Employee E join Dept D on E.DeptId=D.Id
and a subquery something like this -
Select E.Id,E.Name from Employee E Where E.DeptId in (Select Id from Dept)
When I consider performance which of the two queries would be faster and why ?
Also is there a time when I should prefer one over the other?
Sorry if this is too trivial or has been asked before, but I am confused about it. Also, it would be great if you could suggest tools I should use to measure the performance of the two queries. Thanks a lot!
Well, I believe it's an "Old but Gold" question. The answer is: "It depends!".
Performance is such a delicate subject that it would be silly to say: "Never use subqueries, always join".
In the following links, you'll find some basic best practices that I have found to be very helpful:
Optimizing Subqueries
Optimizing Subqueries with Semijoin Transformations
Rewriting Subqueries as Joins
I have a table with 50000 elements; the result I was looking for was 739 elements.
My query at first was this:
SELECT p.id,
p.fixedId,
p.azienda_id,
p.categoria_id,
p.linea,
p.tipo,
p.nome
FROM prodotto p
WHERE p.azienda_id = 2699 AND p.anno = (
SELECT MAX(p2.anno)
FROM prodotto p2
WHERE p2.fixedId = p.fixedId
)
and it took 7.9s to execute.
My query at last is this:
SELECT p.id,
p.fixedId,
p.azienda_id,
p.categoria_id,
p.linea,
p.tipo,
p.nome
FROM prodotto p
WHERE p.azienda_id = 2699 AND (p.fixedId, p.anno) IN
(
SELECT p2.fixedId, MAX(p2.anno)
FROM prodotto p2
WHERE p.azienda_id = p2.azienda_id
GROUP BY p2.fixedId
)
and it took 0.0256s
Good SQL, good.
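The two query shapes above can be compared on a toy prodotto table. This is a sketch in Python with the standard-library sqlite3 module (an assumption; the timings quoted above come from the poster's own database, and the row-value IN syntax needs SQLite 3.15 or newer). The sample rows are invented, and only a few of the columns are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prodotto (
    id INTEGER PRIMARY KEY, fixedId INTEGER, azienda_id INTEGER, anno INTEGER, nome TEXT
);
INSERT INTO prodotto VALUES
    (1, 10, 2699, 2018, 'A (old)'),
    (2, 10, 2699, 2020, 'A (latest)'),
    (3, 11, 2699, 2019, 'B (only)'),
    (4, 12,    1, 2020, 'other company (old)'),
    (5, 12,    1, 2021, 'other company (latest)');
""")

# Shape 1: correlated scalar subquery, conceptually evaluated per outer row.
correlated = conn.execute("""
SELECT p.id, p.fixedId, p.anno
FROM prodotto p
WHERE p.azienda_id = 2699
  AND p.anno = (SELECT MAX(p2.anno) FROM prodotto p2 WHERE p2.fixedId = p.fixedId)
ORDER BY p.id
""").fetchall()

# Shape 2: (fixedId, anno) pairs matched against a grouped subquery.
grouped = conn.execute("""
SELECT p.id, p.fixedId, p.anno
FROM prodotto p
WHERE p.azienda_id = 2699
  AND (p.fixedId, p.anno) IN (
      SELECT p2.fixedId, MAX(p2.anno)
      FROM prodotto p2
      WHERE p.azienda_id = p2.azienda_id
      GROUP BY p2.fixedId
  )
ORDER BY p.id
""").fetchall()

print(correlated)
print(grouped)
```

On this data both shapes return the latest row per fixedId. They are only guaranteed to agree when a fixedId never spans several azienda_id values, because the second query computes the maximum per company.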
I would EXPECT the first query to be quicker, mainly because you have an equivalence and an explicit JOIN. In my experience IN is a very slow operator, since SQL normally evaluates it as a series of WHERE clauses separated by "OR" (WHERE x=Y OR x=Z OR...).
As with ALL THINGS SQL though, your mileage may vary. The speed will depend a lot on indexes (do you have indexes on both ID columns? That will help a lot...) among other things.
The only REAL way to tell with 100% certainty which is faster is to turn on performance tracking (IO Statistics is especially useful) and run them both. Make sure to clear your cache between runs!
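A small harness for the "run them both" advice, sketched in Python with the standard-library sqlite3 module (an assumption: on SQL Server you would rather use SET STATISTICS IO/TIME and the actual execution plan). The table sizes and the helper name timed are made up, and the timings it prints say nothing about other engines.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER);
CREATE INDEX ix_employee_dept ON Employee (DeptId);
""")
conn.executemany("INSERT INTO Dept VALUES (?, ?)",
                 [(d, f"dept{d}") for d in range(50)])
# DeptId runs from 0 to 59, so some employees have no matching department.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(e, f"emp{e}", e % 60) for e in range(20000)])

join_sql = "SELECT E.Id, E.Name FROM Employee E JOIN Dept D ON E.DeptId = D.Id"
in_sql = "SELECT E.Id, E.Name FROM Employee E WHERE E.DeptId IN (SELECT Id FROM Dept)"

def timed(sql, repeats=5):
    """Run sql several times; return (best wall-clock seconds, rows of the last run)."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows

join_time, join_rows = timed(join_sql)
in_time, in_rows = timed(in_sql)

print(f"JOIN: {join_time * 1000:.2f} ms, {len(join_rows)} rows")
print(f"IN:   {in_time * 1000:.2f} ms, {len(in_rows)} rows")
```

Taking the best of several runs reduces noise from caching and other activity; comparing the row sets first confirms that the two queries really answer the same question.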
Performance depends on the amount of data you are querying.
With less data (around 20k rows), JOIN works better.
With more data (100k+ rows), IN works better.
If you do not need the data from the other table, IN is good, but it is always better to go for EXISTS.
I tested all these cases, and the tables had proper indexes.
Start by looking at the execution plans to see the differences in how SQL Server will interpret them. You can also use Profiler to actually run the queries multiple times and measure the difference.
I would not expect these to be so horribly different. Where you can get real, large performance gains from using joins instead of subqueries is when you use correlated subqueries.
EXISTS is often better than either of these two, and when you are talking about left joins where you want all records not in the left-joined table, NOT EXISTS is often a much better choice.
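The "records not in the other table" case can be written both ways. Here is a sketch in Python with the standard-library sqlite3 module (an assumption, used only so the example is runnable; which form is faster has to be measured on the real engine). The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER);
INSERT INTO Dept VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO Employee VALUES
    (1, 'Ann', 1),
    (2, 'Bob', 3),     -- points at a department that does not exist
    (3, 'Cy',  NULL),  -- no department at all
    (4, 'Di',  2);
""")

# Anti-join written as LEFT JOIN ... IS NULL.
left_join_rows = conn.execute("""
SELECT E.Id, E.Name
FROM Employee E
LEFT JOIN Dept D ON D.Id = E.DeptId
WHERE D.Id IS NULL
ORDER BY E.Id
""").fetchall()

# The same question written with NOT EXISTS.
not_exists_rows = conn.execute("""
SELECT E.Id, E.Name
FROM Employee E
WHERE NOT EXISTS (SELECT 1 FROM Dept D WHERE D.Id = E.DeptId)
ORDER BY E.Id
""").fetchall()

print(left_join_rows)
print(not_exists_rows)
```

Both forms return Bob and Cy here, including the row with a NULL DeptId.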
The performance should be the same; it's much more important to have the correct indexes and clustering applied on your tables (there exist some good resources on that topic).
(Edited to reflect the updated question)
I know this is an old post, but I think this is a very important topic, especially nowadays where we have 10M+ records and talk about terabytes of data.
I will also weigh in with the following observations. I have about 45M records in my table ([data]), and about 300 records in my [cats] table. I have extensive indexing for all of the queries I am about to talk about.
Consider Example 1:
UPDATE d set category = c.categoryname
FROM [data] d
JOIN [cats] c on c.id = d.catid
versus Example 2:
UPDATE d set category = (SELECT TOP(1) c.categoryname FROM [cats] c where c.id = d.catid)
FROM [data] d
Example 1 took about 23 mins to run. Example 2 took around 5 mins.
So I would conclude that the sub-query is much faster in this case. Of course, keep in mind that I am using M.2 SSD drives capable of I/O at 1GB/sec (that's bytes, not bits), so my indexes are really fast too. This may affect the speeds in your circumstances as well.
If it's a one-off data cleansing, it's probably best to just let it run and finish. I use TOP(10000), see how long it takes, and multiply by the number of records before I run the big query.
If you are optimizing production databases, I would strongly suggest pre-processing data, i.e. use triggers or job-broker to async update records, so that real-time access retrieves static data.
The two queries may not be semantically equivalent. If an employee works for more than one department (possible in the enterprise I work for; admittedly, this would imply your table is not fully normalized) then the first query would return duplicate rows whereas the second query would not. To make the queries equivalent in this case, the DISTINCT keyword would have to be added to the SELECT clause, which may have an impact on performance.
Note there is a design rule of thumb that states a table should model an entity/class or a relationship between entities/classes but not both. Therefore, I suggest you create a third table, say OrgChart, to model the relationship between employees and departments.
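The difference in row counts is easy to reproduce. The sketch below uses Python with the standard-library sqlite3 module and assumes, purely for illustration, a denormalized Dept table in which Id is not unique (one row per location); the JOIN multiplies rows whenever the joined side has more than one match, while IN only tests for membership.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical denormalized Dept: the same Id appears once per location.
CREATE TABLE Dept (Id INTEGER, Location TEXT);
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER);
INSERT INTO Dept VALUES (1, 'NY'), (1, 'LA'), (2, 'SF');
INSERT INTO Employee VALUES (1, 'Ann', 1), (2, 'Bob', 2);
""")

join_rows = conn.execute("""
SELECT E.Id, E.Name FROM Employee E JOIN Dept D ON E.DeptId = D.Id ORDER BY E.Id
""").fetchall()

in_rows = conn.execute("""
SELECT E.Id, E.Name FROM Employee E WHERE E.DeptId IN (SELECT Id FROM Dept) ORDER BY E.Id
""").fetchall()

distinct_join_rows = conn.execute("""
SELECT DISTINCT E.Id, E.Name FROM Employee E JOIN Dept D ON E.DeptId = D.Id ORDER BY E.Id
""").fetchall()

print(join_rows)           # Ann appears once per matching Dept row
print(in_rows)             # each employee at most once
print(distinct_join_rows)  # DISTINCT brings the join back in line with IN
```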
You can use an Explain Plan to get an objective answer.
For your problem, an Exists filter would probably perform the fastest.

TSQL Join efficiency

I'm developing an ASP.NET/C#/SQL application. I've created a query for a specific grid-view that involves a lot of joins to get the data needed. On the hosted server, the query has randomly started taking up to 20 seconds to process. I'm sure it's partly an overloaded host-server (because sometimes the query takes <1s), but I don't think the query (which is actually a view reference via a stored procedure) is at all optimal regardless.
I'm unsure how to improve the efficiency of the below query:
(There are about 1500 matching records to those joins, currently)
SELECT dbo.ca_Connections.ID,
dbo.ca_Connections.Date,
dbo.ca_Connections.ElectricityID,
dbo.ca_Connections.NaturalGasID,
dbo.ca_Connections.LPGID,
dbo.ca_Connections.EndUserID,
dbo.ca_Addrs.LotNumber,
dbo.ca_Addrs.UnitNumber,
dbo.ca_Addrs.StreetNumber,
dbo.ca_Addrs.Street1,
dbo.ca_Addrs.Street2,
dbo.ca_Addrs.Suburb,
dbo.ca_Addrs.Postcode,
dbo.ca_Addrs.LevelNumber,
dbo.ca_CompanyConnectors.ConnectorID,
dbo.ca_CompanyConnectors.CompanyID,
dbo.ca_Connections.HandOverDate,
dbo.ca_Companies.Name,
dbo.ca_States.State,
CONVERT(nchar, dbo.ca_Connections.Date, 103) AS DateView,
CONVERT(nchar, dbo.ca_Connections.HandOverDate, 103) AS HandOverDateView
FROM dbo.ca_CompanyConnections
INNER JOIN dbo.ca_CompanyConnectors ON dbo.ca_CompanyConnections.CompanyID = dbo.ca_CompanyConnectors.CompanyID
INNER JOIN dbo.ca_Connections ON dbo.ca_CompanyConnections.ConnectionID = dbo.ca_Connections.ID
INNER JOIN dbo.ca_Addrs ON dbo.ca_Connections.AddressID = dbo.ca_Addrs.ID
INNER JOIN dbo.ca_Companies ON dbo.ca_CompanyConnectors.CompanyID = dbo.ca_Companies.ID
INNER JOIN dbo.ca_States ON dbo.ca_Addrs.StateID = dbo.ca_States.ID
It may have nothing to do with your query and everything to do with the data transfer.
How fast does the query run in query analyzer?
How does this compare to the web page?
If you are bringing back the entire data set you may want to introduce paging, say 100 records per page.
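A sketch of the paging idea in Python with the standard-library sqlite3 module (an assumption; SQLite uses LIMIT/OFFSET, whereas on SQL Server 2012 and later the equivalent is ORDER BY ... OFFSET n ROWS FETCH NEXT m ROWS ONLY). The helper name fetch_page and the 250 sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ca_Connections (ID INTEGER PRIMARY KEY, Date TEXT)")
conn.executemany(
    "INSERT INTO ca_Connections (ID, Date) VALUES (?, ?)",
    [(i, f"2010-01-{(i % 28) + 1:02d}") for i in range(1, 251)],
)

def fetch_page(page, page_size=100):
    """Return one page of rows; pages are numbered from 1."""
    offset = (page - 1) * page_size
    return conn.execute(
        # A stable ORDER BY is required, otherwise pages can overlap or skip rows.
        "SELECT ID, Date FROM ca_Connections ORDER BY ID LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

first_page = fetch_page(1)
last_page = fetch_page(3)
print(len(first_page), len(last_page))
```

Only the requested page travels to the web tier, which helps most when the transfer rather than the query itself is the slow part.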
The first thing I normally suggest is to profile to look for potential indexes to help out. But when the problem is sporadic like this and the normal case is for the query to run in <1sec, it's more likely due to lock contention than to a missing index. That means the cause is something else in the system causing this query to take longer. Perhaps an insert or update. Perhaps another select query, one that you would normally expect to take a little longer, so the extra time on its end isn't noticed.
I would start with indexing, but I have a database that is a third-party application. Creating my own indexes is not an option. I read an article (sorry, can't find the reference) recommending breaking up the query into table variables or temp tables (depending on number of records) when you have multiple tables in your query (not sure what the magic number is).
Start with dbo.ca_CompanyConnections, dbo.ca_CompanyConnectors, dbo.ca_Connections. Include the fields you need. And then substitute these three joined tables with just the temp table.
Not sure what the issue is (would like to hear recommendations), but it seems like performance drops when you get over 5 tables.