SQL COUNT(col) vs extra logging column... efficiency?

I can't seem to find much information about this.
I have a table to log users' comments. I have another table to log likes / dislikes from other users for each comment.
Therefore, when selecting this data to be displayed on a web page, there is a complex query requiring joins and subqueries to count all likes / dislikes.
My example is a query someone kindly helped me with on here to achieve the required results:
SELECT comments.comment_id, comments.descr, comments.created, usrs.usr_name,
(SELECT COUNT(*) FROM comment_likers WHERE comment_id=comments.comment_id AND liker=1) AS likes,
(SELECT COUNT(*) FROM comment_likers WHERE comment_id=comments.comment_id AND liker=0) AS dislikes,
comment_likers.liker
FROM comments
INNER JOIN usrs ON ( comments.usr_id = usrs.usr_id )
LEFT JOIN comment_likers ON ( comments.comment_id = comment_likers.comment_id
AND comment_likers.usr_id = $usrID )
WHERE comments.topic_id=$tpcID
ORDER BY comments.created DESC;
However, if I added likes and dislikes columns to the COMMENTS table and created triggers to automatically increment / decrement these columns as rows are inserted / deleted / updated in the COMMENT_LIKERS table, then the SELECT statement would be simpler and more efficient than it is now. So what I am asking is: is it more efficient to run this complex query with the COUNTs, or to have the extra columns and triggers?
And to generalise: when the data is queried regularly, is it more efficient to COUNT on the fly or to maintain an extra counter column?
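Roughly what I have in mind for the trigger approach is something like this (just a sketch, assuming MySQL and the table / column names from the query above; an AFTER UPDATE trigger would also be needed to handle changed votes):
ALTER TABLE comments
  ADD COLUMN likes INT NOT NULL DEFAULT 0,
  ADD COLUMN dislikes INT NOT NULL DEFAULT 0;

DELIMITER //
CREATE TRIGGER comment_likers_ai AFTER INSERT ON comment_likers
FOR EACH ROW
BEGIN
  IF NEW.liker = 1 THEN
    UPDATE comments SET likes = likes + 1 WHERE comment_id = NEW.comment_id;
  ELSE
    UPDATE comments SET dislikes = dislikes + 1 WHERE comment_id = NEW.comment_id;
  END IF;
END//

CREATE TRIGGER comment_likers_ad AFTER DELETE ON comment_likers
FOR EACH ROW
BEGIN
  IF OLD.liker = 1 THEN
    UPDATE comments SET likes = likes - 1 WHERE comment_id = OLD.comment_id;
  ELSE
    UPDATE comments SET dislikes = dislikes - 1 WHERE comment_id = OLD.comment_id;
  END IF;
END//
DELIMITER ;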

Your query is very inefficient. You can easily eliminate those subqueries, which will dramatically increase performance:
Your two subqueries can be replaced by simply:
sum(liker) likes,
sum(abs(liker - 1)) dislikes,
Making the whole query this:
SELECT comments.comment_id, comments.descr, comments.created, usrs.usr_name,
sum(liker) likes,
sum(abs(liker - 1)) dislikes,
comment_likers.liker
FROM comments
INNER JOIN usrs ON comments.usr_id = usrs.usr_id
LEFT JOIN comment_likers ON comments.comment_id = comment_likers.comment_id
AND comment_likers.usr_id = $usrID
WHERE comments.topic_id=$tpcID
GROUP BY comments.comment_id
ORDER BY comments.created DESC;

Related

Adding ORDER BY on SQLite takes a huge amount of time

I've written the following query:
WITH m2 AS (
SELECT m.id, m.original_title, m.votes, l.name as lang
FROM movies m
JOIN movie_languages ml ON m.id = ml.movie_id
JOIN languages l ON l.id = ml.language_id
)
SELECT m.original_title
FROM movies m
WHERE NOT EXISTS (
SELECT 1
FROM m2
WHERE m.id = m2.id AND m2.lang <> 'English'
)
The results appear after 1.5 seconds.
After adding the following line at the end of the query, it takes at least 5 minutes to run it:
ORDER BY votes DESC;
It's not the size of the data; ORDER BY on the entire table returns results in no time.
What am I doing wrong?
Why does ORDER BY add so much time? (The query SELECT * FROM movies ORDER BY votes DESC returns immediately.)
An ORDER BY inside a CTE would be irrelevant in any case. But I would suggest aggregation for this purpose:
SELECT m.original_title
FROM movies m
JOIN movie_languages ml ON m.id = ml.movie_id
JOIN languages l ON l.id = ml.language_id
GROUP BY m.original_title, m.id
HAVING SUM(lang = 'English') = 0;
In order to examine your queries you can turn on the timer by entering .timer on at the sqlite3 prompt. More importantly, use EXPLAIN QUERY PLAN to see the details of how your query will be executed.
The query as initially written does seem to be rather more complex than necessary, as already pointed out above. It is not apparent why the movie_languages and languages tables are needed in general, and especially in this particular query. That would require more explanation on your part, but I believe at least one of them could be removed, which would speed up your query.
The ORDER BY clause in SQLite is handled as described below.
SQLite attempts to use an index to satisfy the ORDER BY clause of a query when possible. When faced with the choice of using an index to satisfy WHERE clause constraints or satisfying an ORDER BY clause, SQLite does the same cost analysis described above and chooses the index that it believes will result in the fastest answer.
SQLite will also attempt to use indices to help satisfy GROUP BY clauses and the DISTINCT keyword. If the nested loops of the join can be arranged such that rows that are equivalent for the GROUP BY or for the DISTINCT are consecutive, then the GROUP BY or DISTINCT logic can determine if the current row is part of the same group or if the current row is distinct simply by comparing the current row to the previous row. This can be much faster than the alternative of comparing each row to all prior rows.
Since no index on votes is mentioned, the logic above applies and SQLite chooses 'the index that it believes will result in the fastest answer'. With the over-complicated query and no index on the votes column being used for the ORDER BY, there is much more for it to work out than necessary. The simple query with ORDER BY executes quickly, so it is the complexity of the query that is giving SQLite so much extra work.
Additionally, the type of the column matters when sorting (and joining); votes should most likely be INTEGER. Sorting on a character type would give the wrong ordering once values go above single digits, so it would be the wrong type to use (I'm not assuming that is the case here, just mentioning it).
So simplify the query, make sure your PRIMARY KEYs are properly set, and test it. If it still does not return in reasonable time, try an index on votes. This will give you much better insight into what is going on and how different changes affect your queries.
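For example, at the sqlite3 prompt (prefix whichever query you are testing with EXPLAIN QUERY PLAN):
.timer on
EXPLAIN QUERY PLAN SELECT original_title FROM movies ORDER BY votes DESC;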
SQLite Documentation - read it all, and note section 6, Sorting, Grouping and Compound SELECTs
SQLite Documentation - see section 10, ORDER BY optimizations
You can do it with NOT EXISTS, without joins and aggregation (assuming that there is always at least 1 row for each movie in the table movie_languages):
SELECT m.*
FROM movies m
WHERE NOT EXISTS (
SELECT 1 FROM movie_languages ml
WHERE m.id = ml.movie_id
AND ml.language_id <> (SELECT l.id FROM languages l WHERE l.lang = 'English')
)
ORDER BY m.votes DESC
or with a LEFT join to languages to get the unmatched rows:
SELECT m.*
FROM movies m
INNER JOIN movie_languages ml ON m.id = ml.movie_id
LEFT JOIN languages l ON l.id = ml.language_id AND l.lang <> 'English'
WHERE l.id IS NULL
ORDER BY m.votes DESC
In a nutshell: when you include an ORDER BY clause, the database builds a list of the rows in the correct order and then returns the data in that order.
Building that sorted list takes extra processing, which translates into a longer execution time.
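You can see this in the query plan: without a usable index, SQLite has to build a temporary B-tree to do the sort, whereas with an index on votes (the index name here is made up) it can walk the index in order instead:
EXPLAIN QUERY PLAN SELECT * FROM movies ORDER BY votes DESC;
-- without an index on votes, the plan includes: USE TEMP B-TREE FOR ORDER BY
CREATE INDEX idx_movies_votes ON movies(votes);
-- with the index, the sort step disappears and the plan typically shows
-- something like: SCAN movies USING INDEX idx_movies_votes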

Issue with joins in a SQL query

SELECT
c.ConfigurationID AS RealflowID, c.companyname,
c.companyphone, c.ContactEmail, COUNT(k.caseid)
FROM
dbo.Configuration c
INNER JOIN
dbo.cases k ON k.SiteID = c.ConfigurationId
WHERE
EXISTS (SELECT * FROM dbo.RepairEstimates
WHERE caseid = k.caseid)
AND c.AccountStatus = 'Active'
AND c.domainid = 46
GROUP BY
c.configurationid,c.companyname, c.companyphone, c.ContactEmail
I have this query. I am using the Configuration table to get the SiteID for the cases in the cases table, and if a case exists in the RepairEstimates table I want to pull the listed company details and get a count of how many such cases there are for that SiteID.
I hope that description is clear enough.
The issue is that the count does not match the data being pulled. Is there something I could do differently? A different join? Remove the EXISTS and add another join? I am not sure; I have tried many different things.
I realized I was using the wrong table; the query itself was correct.

Specifying SELECT, then joining with another table

I just hit a wall with my SQL query fetching data from my MS SQL Server.
To simplify, say I have one table for sales and one table for customers. They each have a corresponding userId which I can use to join the tables.
I wish to first SELECT from the sales table where, say, the price is equal to 10, and then join on the userId in order to get access to the name, address etc. from the customer table.
In which order should I structure the query? Do I need some sort of subquery, or what do I do?
I have tried something like this
SELECT *
FROM Sales
WHERE price = 10
INNER JOIN Customers
ON Sales.userId = Customers.userId;
Needless to say this is very simplified and not my actual database schema, but it illustrates my problem.
Any suggestions? I am at a loss here.
The clauses of a SELECT have to appear in a certain order.
In the simple form this is:
What do I select: column list
From where: table name and joined tables
Are there filters: WHERE
How to sort: ORDER BY
So most likely it is enough to change your statement to:
SELECT *
FROM Sales
INNER JOIN Customers ON Sales.userId = Customers.userId
WHERE price = 10;
The WHERE clause must follow the joins:
SELECT * FROM Sales
INNER JOIN Customers
ON Sales.userId = Customers.userId
WHERE price = 10
This is simply the way SQL syntax works. You seem to be trying to put the clauses in the order in which you think they should be applied, but SQL is a declarative language, not a procedural one - you are defining what you want to occur, not how it will be done.
You could also write the same thing like this:
SELECT * FROM (
SELECT * FROM Sales WHERE price = 10
) AS filteredSales
INNER JOIN Customers
ON filteredSales.userId = Customers.userId
This may seem like it indicates a different order for the operations to occur, but it is logically identical to the first query, and in either case, the database engine may determine to do the join and filtering operations in either order, as long as the result is identical.
Sounds fine to me, did you run the query and check?
SELECT s.*, c.*
FROM Sales s
INNER JOIN Customers c
ON s.userId = c.userId
WHERE s.price = 10;

Improving performance on SQL query

I'm currently having performance problems with an expensive SQL query, and I'd like to improve it.
This is what the query looks like:
SELECT TOP 50 MovieID
FROM (SELECT [MovieID], COUNT(*) AS c
FROM [tblMovieTags]
WHERE [TagID] IN (SELECT TOP 7 [TagID]
FROM [tblMovieTags]
WHERE [MovieID]=12345
ORDER BY Relevance ASC)
GROUP BY [MovieID]
HAVING COUNT(*) > 1) a
INNER JOIN [tblMovies] m ON m.MovieID=a.MovieID
WHERE (Hidden=0) AND m.Active=1 AND m.Processed=1
ORDER BY c DESC, m.IMDB DESC
What I'm trying to find is movies that have at least 2 tags in common with MovieID 12345.
Basic database schema: each movie has 4 to 5 tags. I want a list of movies similar to a given movie based on its tags; a minimum of 2 tags must match.
This query is causing my server problems as I have hundreds of concurrent users at any given time.
I have already created indexes based on execution plan suggestions, and that has made it quicker, but it's still not enough.
Is there anything I could do to make this faster?
I like to use temp tables because they can speed up your queries (if used correctly) and make them easier to read. Try the query below and see if it speeds things up at all. There were a few fields (Hidden, IMDB) that weren't in your schema, so I left them out.
This query may or may not be exactly what you are looking for. The point of it is to show you how to use temp tables to increase performance and improve readability. Some minor tweaks may be necessary.
SELECT TOP 7 [TagID],[MovieTagID],[MovieID]
INTO #MovieTags
FROM [tblMovieTags]
WHERE [MovieID]=12345
ORDER BY Relevance ASC

SELECT mt.MovieID, COUNT(mt.MovieTagID) AS TagCount
INTO #Movies
FROM #MovieTags mt
INNER JOIN tblMovies m ON m.MovieID=mt.MovieID AND m.Active=1 AND m.Processed=1
GROUP BY mt.MovieID
HAVING COUNT(mt.MovieTagID) > 1
SELECT TOP 50 * FROM #Movies
DROP TABLE #MovieTags
DROP TABLE #Movies
Edit
Parameterized Queries
You will also want to use parameterized queries rather than concatenating values into your SQL string; among other things, this avoids SQL injection and lets the server reuse query plans. This, combined with the temp table method, should improve your performance significantly.
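For example, the tag lookup from the question can be parameterized with sp_executesql instead of concatenating the id into the string (just a sketch; the parameter name is arbitrary):
EXEC sp_executesql
    N'SELECT TOP 7 [TagID] FROM [tblMovieTags] WHERE [MovieID] = @MovieID ORDER BY Relevance ASC',
    N'@MovieID int',
    @MovieID = 12345;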
I want to see if there is some unnecessary processing happening in the query you wrote. Try the following query and let us know whether it is faster or slower, and whether it even returns the same data.
I just threw this together, so no guarantees on perfect syntax.
SELECT TOP 7 [TagID]
INTO #MovieTags
FROM [tblMovieTags]
WHERE [MovieID]=12345
ORDER BY TagID
;WITH cte_movies AS
(
SELECT
mt.MovieID
,mt.TagID
FROM
tblMovieTags mt
INNER JOIN #MovieTags t ON mt.TagId = t.TagId
INNER JOIN tblMovies m ON mt.MovieID = m.MovieID
WHERE
(Hidden=0) AND m.Active=1 AND m.Processed=1
),
cte_movietags AS
(
SELECT
MovieId
,COUNT(MovieId) AS TagCount
FROM
cte_movies
GROUP BY MovieId
)
SELECT
MovieId
FROM
cte_movietags
WHERE
TagCount > 1
ORDER BY
MovieId
GO
DROP TABLE #MovieTags

Select Users with more Items

Each user HAS MANY photos and HAS MANY comments. I would like to order users by SUM(number_of_photos, number_of_comments)
Can you suggest me the SQL query?
GROUP BY with JOINs works more efficiently than dependent subqueries (in all relational DBs I know):
Select * From Users
Left Join Photos On (Photos.user_id = Users.id)
Left Join Comments On (Comments.user_id = Users.id)
Group By Users.id
Order By (Count(Photos.id) + Count(Comments.id))
with some assumptions on the tables (e.g. an id primary key in each of them).
Select * From Users U
Order By (Select Count(*) From Photos
Where userId = U.UserId) +
(Select Count(*) From Comments
Where userId = U.UserId)
EDIT: although every query using subqueries can also be written using joins, which one will be faster is not a simple question, and it is irrelevant unless the system is experiencing performance problems.
1) Both constructions must be translated by the query optimizer into a query plan which includes some type of correlated join, be it a nested loop join, hash-join, merge join, or whatever. And it's entirely possible, (even likely), that they will both result in the same query plan.
NOTE: This is because the entire SQL Statement is translated into a single query plan. The subqueries do NOT get their own, individual query plans as though they were being executed in isolation.
What query plan and what type of joins are used will depend on the data structure and the data in each specific situation. The only way to tell which is faster is to try both, in controlled environments, and measure the performance... but,
2) Unless the system is experiencing a performance issue (unacceptably poor performance), clarity is more important. And for problems like the one described above (where none of the data attributes in the "other" tables are required in the output of the SQL statement), a subquery is much clearer in describing the function and purpose of the SQL than a join with GROUP BYs would be.
I think that the accepted solutions would be problematic from a performance standpoint, assuming you have many users, photos, and comments. Your query runs two separate select statements for every row in the user table.
What you want to do is synthesize a query using ActiveRecord that looks like this:
SELECT u.*, COUNT(c.id) + COUNT(p.id) AS total_count
FROM users u
LEFT JOIN photos p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id
ORDER BY total_count DESC
The join will be much, much more efficient. Using left joins ensures that even if a user has no comments or photos they will still be included in the results.
If I were to assume that you had a count of comments and a count of photos stored on the user (user.number_of_photos, user.number_of_comments, as mentioned above), it would be simple:
Select user_id From user Order By (number_of_photos + number_of_comments) DESC
In Ruby on Rails:
User.find(:all, :order => '((SELECT COUNT(*) FROM photos WHERE user_id=users.id) + (SELECT COUNT(*) FROM comments WHERE user_id=users.id)) DESC')