SQL picking same entry multiple times - sql

SELECT p.*
FROM StatusUpdates p
JOIN FriendRequests fr
ON (fr.From = p.AuthorId OR fr.To = p.AuthorId)
WHERE fr.To = ".$Id." OR fr.From = ".$Id."
AND fr.Accepted = 1
ORDER BY p.DatePosted DESC
I'm using this SQL code at the moment, which someone wrote for me on a different question. I'm using PHP, but that shouldn't make much difference, since the only thing I'm doing with it is concatenating a variable into it.
What it's meant to do is go through all your friends, get all their status posts, and order them. It works fine, except that it picks out "$Id"'s own posts either not at all or once per friend.
E.g., if you had 5 friends, it would pick out your posts 5 times. I only want it to do this once. How could I do this?

You need to use LEFT JOIN instead of JOIN if you want to display the posts regardless of the number of friends, and GROUP BY p.id so that the same post is not displayed more than once:
SELECT p.*
FROM StatusUpdates p
LEFT JOIN FriendRequests fr
ON ((fr.From = p.AuthorId OR fr.To = p.AuthorId) AND fr.Accepted = 1)
WHERE p.AuthorId = ".$Id."
GROUP BY p.id
ORDER BY p.DatePosted DESC

Either GROUP BY p.id or SELECT DISTINCT p.id, p.*
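The duplication and the DISTINCT fix can be reproduced on a small data set. The following is a minimal sketch in Python using an in-memory SQLite database rather than PHP and MySQL; the schema is assumed from the queries above and the sample rows are invented for illustration. The WHERE clause is parenthesised so that Accepted = 1 applies to both sides of the OR, and the id is passed as a bound parameter instead of being concatenated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StatusUpdates (id INTEGER PRIMARY KEY, AuthorId INTEGER, DatePosted TEXT);
CREATE TABLE FriendRequests ("From" INTEGER, "To" INTEGER, Accepted INTEGER);
-- Hypothetical sample data: user 1 is friends with 2 and 3; the request to 4 is pending.
INSERT INTO StatusUpdates VALUES
    (1, 1, '2024-01-03'), (2, 2, '2024-01-02'),
    (3, 3, '2024-01-01'), (4, 4, '2024-01-04');
INSERT INTO FriendRequests VALUES (1, 2, 1), (3, 1, 1), (1, 4, 0);
""")

# Shared FROM/WHERE part; "From" and "To" are quoted because they are keywords.
JOINED = """
FROM StatusUpdates p
JOIN FriendRequests fr
  ON (fr."From" = p.AuthorId OR fr."To" = p.AuthorId)
WHERE (fr."To" = :me OR fr."From" = :me)
  AND fr.Accepted = 1
ORDER BY p.DatePosted DESC
"""

def post_ids(select_clause, me):
    """Run the join with the given SELECT clause and return the post ids in order."""
    rows = conn.execute(select_clause + JOINED, {"me": me}).fetchall()
    return [row[0] for row in rows]

# Without DISTINCT, user 1's own post matches once per accepted friendship.
dup_ids = post_ids("SELECT p.id, p.AuthorId, p.DatePosted", 1)
# With DISTINCT, each post appears once.
unique_ids = post_ids("SELECT DISTINCT p.id, p.AuthorId, p.DatePosted", 1)
print(dup_ids)
print(unique_ids)
```

In this data set user 1 has two accepted friendships, so the plain join lists post 1 twice, while the DISTINCT version lists it once.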

Related

sql limit left side of left join

I've got a database on which I'm trying to execute the following query:
select blogs.name, blogs.createDate,
post.author, post.content,
comments.author, comments.createDate
from blogs
left join post on post.blogId = blogs.id
left join comments on comments.postId = post.id
limit 0,10;
I want the first 10 blogs, with all of their posts and all of those posts' comments. Instead, the query returns the first 10 joined rows, which is effectively the first 10 comments rather than the first 10 blogs. How do I achieve this?
You need to limit the number of blogs you are querying:
Select b.Name, b.CreateDate, p.author, p.content,
c.author, c.createDate
From Comments c
Join Post p on c.PostId = p.Id
Join (Select Id, Name, CreateDate From Blogs limit 0, 10) b on p.BlogId = b.Id
We join the blogs as a subquery that limits to the first 10 (you would add an Order By section there if you want to define how you want to retrieve that first 10), and then join to the other tables.
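To see the difference between limiting the joined rows and limiting the blogs, here is a small sketch in Python with an in-memory SQLite database. The table layout is assumed from the question, the rows are invented, and the limit is 2 instead of 10 to keep the data small. This variant uses LEFT JOIN from the limited subquery, an assumption on my part so that blogs without posts would still be listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blogs (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (id INTEGER PRIMARY KEY, blogId INTEGER, content TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, postId INTEGER, author TEXT);
""")
# Hypothetical data: 3 blogs, 2 posts per blog, 2 comments per post.
for blog_id in (1, 2, 3):
    conn.execute("INSERT INTO blogs VALUES (?, ?)", (blog_id, f"blog{blog_id}"))
    for n in (1, 2):
        post_id = blog_id * 10 + n
        conn.execute("INSERT INTO post VALUES (?, ?, ?)",
                     (post_id, blog_id, f"post{post_id}"))
        for author in ("ann", "bob"):
            conn.execute("INSERT INTO comments (postId, author) VALUES (?, ?)",
                         (post_id, author))

# LIMIT applied to the joined rows: only 2 rows come back, both from blog 1.
naive = conn.execute("""
    SELECT blogs.id
    FROM blogs
    LEFT JOIN post ON post.blogId = blogs.id
    LEFT JOIN comments ON comments.postId = post.id
    ORDER BY blogs.id
    LIMIT 2
""").fetchall()

# LIMIT applied to blogs first: every post and comment of the first 2 blogs.
limited = conn.execute("""
    SELECT b.id, p.id, c.author
    FROM (SELECT id, name FROM blogs ORDER BY id LIMIT 2) b
    LEFT JOIN post p ON p.blogId = b.id
    LEFT JOIN comments c ON c.postId = p.id
""").fetchall()

naive_blogs = {row[0] for row in naive}
limited_blogs = {row[0] for row in limited}
print(len(naive), naive_blogs)
print(len(limited), limited_blogs)
```

The ORDER BY inside the subquery is what makes "the first 10" well defined; without it the database is free to pick any 10 blogs.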

Order by join column but use distinct on another

I'm building a system in which there are the following tables:
Song
Broadcast
Station
Follow
User
A user follows stations, which have songs on them through broadcasts.
I'm building a "feed" of songs for a user based on the stations they follow.
Here's the query:
SELECT DISTINCT ON ("broadcasts"."created_at", "songs"."id") songs.*
FROM "songs"
INNER JOIN "broadcasts" ON "songs"."shared_id" = "broadcasts"."song_id"
INNER JOIN "stations" ON "broadcasts"."station_id" = "stations"."id"
INNER JOIN "follows" ON "stations"."id" = "follows"."station_id"
WHERE "follows"."user_id" = 2
ORDER BY broadcasts.created_at desc
LIMIT 18
Note: shared_id is the same as id.
As you can see I'm getting duplicate results, which I don't want. I found out from a previous question that this was due to selecting distinct on broadcasts.created_at.
My question is: How do I modify this query so it will return only unique songs based on their id but still order by broadcasts.created_at?
Try this solution:
SELECT a.maxcreated, b.*
FROM
(
SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
FROM follows aa
INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
WHERE aa.user_id = 2
GROUP BY bb.song_id
) a
INNER JOIN songs b ON a.song_id = b.id
ORDER BY a.maxcreated DESC
LIMIT 18
The FROM subselect retrieves distinct song_ids that are broadcasted by all stations the user follows; it also gets the latest broadcast date associated with each song. We have to encase this in a subquery because we have to GROUP BY on the columns we're selecting from, and we only want the unique song_id and the maxdate regardless of the station.
We then join that result in the outer query to the songs table to get the song information associated with each unique song_id
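A small runnable check of this approach, sketched in Python with an in-memory SQLite database instead of PostgreSQL. The column sets and sample rows are made up for illustration; station 12 is deliberately not followed by user 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE broadcasts (song_id INTEGER, station_id INTEGER, created_at TEXT);
CREATE TABLE follows (user_id INTEGER, station_id INTEGER);
-- Hypothetical data: song 1 is broadcast on both followed stations.
INSERT INTO songs VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO follows VALUES (2, 10), (2, 11);
INSERT INTO broadcasts VALUES
    (1, 10, '2024-01-01'), (1, 11, '2024-01-05'),
    (2, 10, '2024-01-03'), (3, 12, '2024-01-09');
""")

# Inner query: one row per song with its latest broadcast on a followed station.
# Outer query: attach the song details and order by that latest broadcast.
feed = conn.execute("""
    SELECT b.id, b.title, a.maxcreated
    FROM (
        SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
        FROM follows aa
        INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
        WHERE aa.user_id = ?
        GROUP BY bb.song_id
    ) a
    INNER JOIN songs b ON a.song_id = b.id
    ORDER BY a.maxcreated DESC
    LIMIT 18
""", (2,)).fetchall()
print(feed)
```

Song 1 appears once, dated by its most recent broadcast, and song 3 is absent because its only broadcast is on a station the user does not follow.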
You can use Common Table Expressions (CTE) if you want a cleaner query (nested queries make things harder to read).
It would look like this:
WITH a as (
SELECT bb.song_id, MAX(bb.created_at) AS maxcreated
FROM follows aa
INNER JOIN broadcasts bb ON aa.station_id = bb.station_id
INNER JOIN songs cc ON bb.song_id = cc.shared_id
WHERE aa.user_id = 2
GROUP BY bb.song_id
)
SELECT
a.maxcreated,
b.*
FROM a INNER JOIN
songs b ON a.song_id = b.id
ORDER BY
a.maxcreated DESC
LIMIT 18
Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated.
Try adding GROUP BY songs.id.
I had a very similar query I was doing between listens, tracks and albums and it took me a long while to figure it out (hours).
If you use GROUP BY songs.id, you can get it to work by ordering by MAX(broadcasts.created_at) DESC.
Here's what the full SQL looks like:
SELECT songs.* FROM "songs"
INNER JOIN "broadcasts" ON "songs"."shared_id" = "broadcasts"."song_id"
INNER JOIN "stations" ON "broadcasts"."station_id" = "stations"."id"
INNER JOIN "follows" ON "stations"."id" = "follows"."station_id"
WHERE "follows"."user_id" = 2
GROUP BY songs.id
ORDER BY MAX(broadcasts.created_at) desc
LIMIT 18;

Help with SQL Join on two tables

I have two tables, one is a table of forum threads. It has a last post date column.
Another table has PostID, UserId, and DateViewed.
I want to join these tables so I can compare DateViewed and LastPostDate for the current user. However, if they have never viewed the thread, there will not be a row in the 2nd table.
This seems easy but I can't wrap my head around it. Advice please.
Thanks in advance.
What is it that you're trying to do specifically - determine if there are unread posts?
You just need to use an outer join:
SELECT p.PostID, p.LastPostDate, ...,
CASE
WHEN v.DateViewed IS NULL OR v.DateViewed < p.LastPostDate THEN 1
ELSE 0
END AS Unread
FROM Posts p
LEFT JOIN PostViews v
ON v.PostID = p.PostID
AND v.UserID = @UserID
Note that I've placed the UserID test in the JOIN condition; if you put it in the WHERE predicate instead, threads the user has never viewed drop out of the result, because the NULL row produced by the outer join fails the test.
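The effect of where the UserID test is placed can be checked with a short sketch in Python and in-memory SQLite (not SQL Server). The table names follow the answer above; the rows and the user ids 7 and 8 are invented. Dates are stored as ISO-8601 strings, so plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (PostID INTEGER PRIMARY KEY, LastPostDate TEXT);
CREATE TABLE PostViews (PostID INTEGER, UserID INTEGER, DateViewed TEXT);
-- Hypothetical data: user 7 viewed posts 1 and 2; post 3 was viewed only by user 8.
INSERT INTO Posts VALUES (1, '2024-01-10'), (2, '2024-01-05'), (3, '2024-01-07');
INSERT INTO PostViews VALUES
    (1, 7, '2024-01-08'),
    (2, 7, '2024-01-06'),
    (3, 8, '2024-01-09');
""")

# User test in the JOIN condition: every post is kept, unviewed ones get NULLs.
in_join = conn.execute("""
    SELECT p.PostID,
           CASE WHEN v.DateViewed IS NULL OR v.DateViewed < p.LastPostDate
                THEN 1 ELSE 0 END AS Unread
    FROM Posts p
    LEFT JOIN PostViews v
      ON v.PostID = p.PostID
     AND v.UserID = ?
    ORDER BY p.PostID
""", (7,)).fetchall()

# User test in WHERE: rows without a view by this user are filtered out.
in_where = conn.execute("""
    SELECT p.PostID,
           CASE WHEN v.DateViewed IS NULL OR v.DateViewed < p.LastPostDate
                THEN 1 ELSE 0 END AS Unread
    FROM Posts p
    LEFT JOIN PostViews v
      ON v.PostID = p.PostID
    WHERE v.UserID = ?
    ORDER BY p.PostID
""", (7,)).fetchall()

print(in_join)
print(in_where)
```

With the test in the JOIN condition, post 3 is reported as unread for user 7; with the test in WHERE, post 3 disappears from the result entirely.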
So you're thinking something like:
SELECT t.UserID, t.PostID, t.LastPostDate, v.DateViewed
FROM dbo.Threads t
LEFT JOIN dbo.Views v ON v.PostID = t.PostID
AND v.UserID = t.UserID
WHERE t.UserID = @user;
v.DateViewed will be NULL if there's no corresponding row in Views.
If you have lots of rows in Views, you may prefer to do something like:
SELECT t.UserID, t.PostID, t.LastPostDate, v.DateViewed
FROM dbo.Threads t
CROSS APPLY (SELECT MAX(vw.DateViewed) as DateViewed
FROM dbo.Views vw
WHERE vw.PostID = t.PostID
AND vw.UserID = t.UserID
) v
WHERE t.UserID = @user;
The key is to use a LEFT JOIN, which will cause non-existent rows on the right side to come up as all NULL:
SELECT threads.lastpostdate, posts.dateviewed
FROM threads
LEFT JOIN posts
ON threads.id=posts.postid

Join two tables where all child records of first table match all child records of second table

I have four tables: Customer, CustomerCategory, Limit, and LimitCategory. A customer can be in multiple categories and a limit can also have multiple categories. I need to write a query that will return the customer name and limit amount where ALL the customers categories match ALL the limit categories.
I'm guessing it would be similar to the answer here, but I can't seem to get it right. Thanks!
Edit - Here's what the tables look like:
tblCustomer
customerId
name
tblCustomerCategory
customerId
categoryId
tblLimit
limitId
limit
tblLimitCategory
limitId
categoryId
I THINK you're looking for:
SELECT *
FROM CustomerCategory
LEFT OUTER JOIN Customer
ON CustomerCategory.CustomerId = Customer.Id
INNER JOIN LimitCategory
ON CustomerCategory.CategoryId = LimitCategory.CategoryId
LEFT OUTER JOIN Limit
ON Limit.Id = LimitCategory.LimitId
Updated!
Thanks to Felix for pointing out a flaw in my existing solution (3 years after I originally posted it, hehe). After looking at it again, I think this might be correct. Here I'm getting (1) the customers and limits with matching categories, plus the number of matching categories, (2) the number of categories per customer, (3) the number of categories per limit, (4) I then ensure the number of categories for customer and limits is the same as the number of the matches between the customers and limits:
UNTESTED!
select
matches.name,
matches.limit
from (
select
c.name,
c.customerId,
l.limit,
l.limitId,
count(*) over(partition by cc.customerId, lc.limitId) as matchCount
from tblCustomer c
join tblCustomerCategory cc on c.customerId = cc.customerId
join tblLimitCategory lc on cc.categoryId = lc.categoryId
join tblLimit l on lc.limitId = l.limitId
) as matches
join (
select
cc.customerId,
count(*) as categoryCount
from tblCustomerCategory cc
group by cc.customerId
) as customerCategories
on matches.customerId = customerCategories.customerId
join (
select
lc.limitId,
count(*) as categoryCount
from tblLimitCategory lc
group by lc.limitId
) as limitCategories
on matches.limitId = limitCategories.limitId
where matches.matchCount = customerCategories.categoryCount
and matches.matchCount = limitCategories.categoryCount
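The counting idea can be tried out on sample data. This is a sketch only, in Python with in-memory SQLite, and it swaps the window function for a plain GROUP BY / HAVING over the same joins so that each customer/limit pair comes back once. Assumptions: the limit column is renamed to amount (limit is a keyword), each (customerId, categoryId) and (limitId, categoryId) pair occurs at most once, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomer (customerId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tblCustomerCategory (customerId INTEGER, categoryId INTEGER);
CREATE TABLE tblLimit (limitId INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE tblLimitCategory (limitId INTEGER, categoryId INTEGER);
-- Hypothetical data: Alice has categories {1, 2}, Bob has {1}.
INSERT INTO tblCustomer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO tblCustomerCategory VALUES (1, 1), (1, 2), (2, 1);
-- Limit 1 covers {1, 2}, limit 2 covers {1}, limit 3 covers {1, 2, 3}.
INSERT INTO tblLimit VALUES (1, 100), (2, 50), (3, 75);
INSERT INTO tblLimitCategory VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
""")

# A customer/limit pair qualifies when the number of shared categories equals
# both the customer's category count and the limit's category count.
matches = conn.execute("""
    SELECT c.name, l.amount
    FROM tblCustomer c
    JOIN tblCustomerCategory cc ON cc.customerId = c.customerId
    JOIN tblLimitCategory lc ON lc.categoryId = cc.categoryId
    JOIN tblLimit l ON l.limitId = lc.limitId
    JOIN (SELECT customerId, COUNT(*) AS n
          FROM tblCustomerCategory GROUP BY customerId) ccount
      ON ccount.customerId = c.customerId
    JOIN (SELECT limitId, COUNT(*) AS n
          FROM tblLimitCategory GROUP BY limitId) lcount
      ON lcount.limitId = l.limitId
    GROUP BY c.customerId, c.name, l.limitId, l.amount, ccount.n, lcount.n
    HAVING COUNT(*) = ccount.n AND COUNT(*) = lcount.n
    ORDER BY c.name
""").fetchall()
print(matches)
```

Alice pairs only with the limit whose categories are exactly {1, 2}, and Bob only with the limit whose single category is 1; limit 3 is rejected for both because it has a category neither customer holds.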
I don't know if this will work or not; it's just a thought I had and I can't test it. I'm sure there's a nicer way! Don't be too harsh :)
SELECT
c.customerId
, l.limitId
FROM
tblCustomer c
CROSS JOIN
tblLimit l
WHERE NOT EXISTS
(
SELECT
lc.categoryId
FROM
tblLimitCategory lc
WHERE
lc.limitId = l.limitId
EXCEPT
SELECT
cc.categoryId
FROM
tblCustomerCategory cc
WHERE
cc.customerId = c.customerId
)

SQL query performance question (multiple sub-queries)

I have this query:
SELECT p.id, r.status, r.title
FROM page AS p
INNER JOIN page_revision as r ON r.pageId = p.id AND (
r.id = (SELECT MAX(r2.id) from page_revision as r2 WHERE r2.pageId = r.pageId AND r2.status = 'active')
OR r.id = (SELECT MAX(r2.id) from page_revision as r2 WHERE r2.pageId = r.pageId)
)
Which returns each page and the latest active revision for each, unless no active revision is available, in which case it simply returns the latest revision.
Is there any way this can be optimised to improve performance or just general readability? I'm not having any issues right now, but my worry is that when this gets into a production environment (where there could be a lot of pages) it's going to perform badly.
Also, are there any obvious problems I should be aware of? The use of sub-queries always bugs me, but to the best of my knowledge this can't be done without them.
Note:
The reason the conditions are in the JOIN rather than a WHERE clause is that in other queries (where this same logic is used) I'm LEFT JOINing from the "site" table to the "page" table, and if no pages exist I still want the site returned.
Jack
Edit: I'm using MySQL
If "active" is the first status in alphabetical order, you might be able to reduce the subqueries to:
SELECT p.id, r.status, r.title
FROM page AS p
INNER JOIN page_revision as r ON r.pageId = p.id AND
r.id = (SELECT r2.id
FROM page_revision as r2
WHERE r2.pageId = r.pageId
ORDER BY r2.status, r2.id DESC
LIMIT 1)
Otherwise you can replace ORDER BY line with
ORDER BY CASE r2.status WHEN 'active' THEN 0 ELSE 1 END, r2.id DESC
These all come from my assumptions on SQL Server, your mileage with MySQL may vary.
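As a quick check of the ORDER BY CASE variant, here is a sketch in Python with in-memory SQLite (neither SQL Server nor MySQL, so behaviour and performance there may differ). The column names come from the question; the 'draft' status and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (id INTEGER PRIMARY KEY);
CREATE TABLE page_revision (
    id INTEGER PRIMARY KEY, pageId INTEGER, status TEXT, title TEXT);
-- Hypothetical data: page 1 has active revisions, page 2 has only drafts.
INSERT INTO page VALUES (1), (2);
INSERT INTO page_revision VALUES
    (1, 1, 'active', 'v1'), (2, 1, 'active', 'v2'), (3, 1, 'draft', 'v3'),
    (4, 2, 'draft', 'd1'), (5, 2, 'draft', 'd2');
""")

# The subquery sorts active revisions first, then newest first, and keeps one id.
rows = conn.execute("""
    SELECT p.id, r.id, r.status, r.title
    FROM page AS p
    INNER JOIN page_revision AS r ON r.pageId = p.id AND
        r.id = (SELECT r2.id
                FROM page_revision AS r2
                WHERE r2.pageId = r.pageId
                ORDER BY CASE r2.status WHEN 'active' THEN 0 ELSE 1 END,
                         r2.id DESC
                LIMIT 1)
    ORDER BY p.id
""").fetchall()
print(rows)
```

Page 1 resolves to revision 2, its newest active revision, even though revision 3 is newer; page 2 has no active revision and falls back to revision 5.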
Maybe a little re-factoring is in order?
If you added a latest_revision_id column onto pages your problem would disappear, hopefully with only a couple of lines added to your page editor.
I know it's not normalized but it would simplify (and greatly speed up) the query, and sometimes you do have to denormalize for performance.
Your problem is a particular case of what is described in this question.
The best you can get using standard ANSI SQL seems to be:
SELECT p.id, r.status, r.title
FROM page AS p
INNER JOIN page_revision as r ON r.pageId = p.id
AND r.id = (SELECT MAX(r2.id) from page_revision as r2 WHERE r2.pageId = r.pageId)
Other approaches are available but dependent on what database you're using. I'm not really sure it can be improved much for MySQL.
In MS SQL 2005+ and Oracle:
SELECT id, status, title
FROM (
SELECT p.id, r.status, r.title,
ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY CASE WHEN r.status = 'active' THEN 0 ELSE 1 END, r.id DESC) AS rn
FROM page p
JOIN page_revision r ON r.pageId = p.id
) o
WHERE rn = 1
In MySQL that can become a problem, as subqueries cannot use the INDEX RANGE SCAN as the expression from the outer query is not considered constant.
You'll need to create two indexes and a function that returns the last page revision to use those indexes:
CREATE INDEX ix_revision_page_status_id ON page_revision (pageId, status, id);
CREATE INDEX ix_revision_page_id ON page_revision (pageId, id);
CREATE FUNCTION `fn_get_last_revision`(input_id INT) RETURNS int(11)
BEGIN
-- Named rev_id so the variable does not shadow the id column
DECLARE rev_id INT;
SELECT o.id
INTO rev_id
FROM (
-- Latest active revision, if any
(SELECT r.id
FROM page_revision r
FORCE INDEX (ix_revision_page_status_id)
WHERE r.pageId = input_id
AND r.status = 'active'
ORDER BY r.id DESC
LIMIT 1)
UNION ALL
-- Fallback: latest revision of any status
(SELECT r.id
FROM page_revision r
FORCE INDEX (ix_revision_page_id)
WHERE r.pageId = input_id
ORDER BY r.id DESC
LIMIT 1)
) o
LIMIT 1;
RETURN rev_id;
END;
SELECT po.id, r.status, r.title
FROM (
SELECT p.*, fn_get_last_revision(p.id) AS rev_id
FROM page p
) po, page_revision r
WHERE r.id = po.rev_id;
This will use the indexes efficiently to get the last revision of each page.
P.S. If you use numeric codes for statuses, with 0 for active, you can get rid of the second index and simplify the query.