MySQL Query, Join, and Myself, or how I always take the hard way through - sql

I'm creating a small forum.
Attempting to run SElECT... JOIN... query too pick up information on the individual posts, plus the last reply (if any). As part of my desire to do everything the hard way, this covers five tables (only columns revelant to this issue are being stated)
commentInfo referenceID | referenceType | authorID | create
postit id | title
postitInfo referencePostitID | create | authorID
user id | username | permission
userInfo referenceUserID | title
So, I run this query SELECT... JOIN... query to get the most recent topics and their last replies.
SELECT DISTINCT
t1.id, t1.title, t2.create, t2.lastEdit, t2.authorID, t3.username,
t4.title AS userTitle, t3.permission, t5.create AS commentCreate,
t5.authorID AS commentAuthor, t6.username AS commentUsername,
t6.permission AS commentPermission
FROM rantPostit AS t1
LEFT JOIN (rantPostitInfo AS t2)
ON ( t1.id = t2.referencePostitID)
LEFT OUTER JOIN (rantUser as t3, rantUserInfo as t4)
ON (t2.authorId = t3.id AND t4.referenceUserId = t2.authorId)
LEFT OUTER JOIN (rantCommentInfo as t5, rantUser as t6)
ON (t5.referenceType = 8 AND t5.referenceID = t1.id AND t6.id = t5.authorID)
ORDER BY t2.create DESC, t5.create DESC
Now, this returns the topic posts. Say I have two of them, it returns both of them fine. Say I have eight replies to the first, it will return 9 entries (one each for the topic + reply, and the individual one with no replies). So, I guess my issue is this: I don't know what to do to limit the number of returns in the final LEFT OUTER JOIN clause to just the most recent, or simply strike the least recent ones out of the window.
(Yes, I realize the ORDER BY... clause is messed up, as it'll first order it by the post create date, then by the comment create date. Yes, I realize I could simplify all my problems by adding two fields into postitInfo, lastCommentCreate and lastCommentCreateID, and have it update each time a reply is made, but... I like the hard way.)
So what am I doing wrong?
Or is this such an inane problem that I should be taken 'round the woodshed and beat with a hammer?

The splits between post and postInfo, and the user and userInfo tables, appear to be doing nothing much here except obfuscate things. To better see solutions, let's boil things down to their essence: a table Posts (with a primary key id, a creation date date, and other fields) and a table Comments (with a primary key id, a foreign key refId referencing Posts, a unique creation date date, and other fields); we want to see all posts, each with its most recent comment if any (the primary keys id of the table rows retrieved, and the other fields, can of course be contextually used in the SELECT to fetch and show more info yet, but that doesn't change the core structure, and simplifying things down to the core structure should help illustrate the solutions). I'm assuming the creation date of a comment is unique, otherwise "latest comment" can be ambiguous (of course, that ambiguity could be arbitrarily truncated in other ways, picking one item of the set of "latest comments" to a given post).
So, here's one approach:
SELECT Posts.id, Comments.id FROM Posts
LEFT OUTER JOIN Comments on (Posts.id = Comments.refId)
WHERE Comments.create IS NULL OR (
Comments.create = (SELECT create FROM Comments
WHERE refID = Posts.id
ORDER BY create DESC
LIMIT 1)
) /* add ORDER BY &c to taste;-) */
the idea: for each post, we want "a null comment" (when there have been no comment to it) or else the comment whose create date is the highest among those referencing the post; here, the inner SELECT takes care of finding that "highest" create date. So, in the same spirit, the inner select might be SELECT MAX(create) FROM Comments WHERE refID = Posts.id which is probably preferable (as shorter and more direct, & maybe faster).

It looks like the last LEFT JOIN is the only one that can return multiple rows. If that's true, you can just use LIMIT 5 to get the last five comments:
ORDER BY t5.create DESC
LIMIT 5
If not, a very simple solution would be to retrieve the comments with a separate query:
SELECT *
FROM rantCommentInfo t5
ON t5.referenceType = 8
AND t5.referenceid = t1.id
LEFT OUTER JOIN rantUser t6
ON t6.id = t5.authorID
ORDER BY CommentCreate
WHERE t5.referenceid = YourT1Id
LIMIT 5
Can't think of a way to do it in one query, without ROW_NUMBER, which MySQL does not support.

Related

SQL Server : join on array of ID's from previous join

I have 2 tables. One has been pruned to show only ID's which meet certain criteria. The second needs to be pruned to show only data that matches the previous "array" of id's. there can be multiple results.
Consider the following:
Query_1_final: Returns the ID's of users whom meet certain criteria:
select
t1.[user_id]
from
[SQLDB].[db].[meeting_parties] as t1
inner join
(select distinct
[user_id]
from
[SQLDB].[db].[meeting_parties]
group by
[user_id]
having
count([user_id]) = 1) as t2 on t1.user_id = t2.user_id
where
[type] = 'organiser'
This works great and returns:
user_id
--------------------
22
1255
9821
and so on...
It produces a single column with the ID's of everyone who is a "Meeting Organizer" and also in the active_meetings table. (note, there are multiple types/roles, this was the best way to grab them all)
Now, I need this data to filter another table, another join. Here is the start of my query
Query_2_PREP: returns 5 columns where the meeting has "started" already.
SELECT
[meeting_id]
,[meeting_style]
,[meeting_day]
,[address]
,[promos]
FROM
[SQLDB].[db].[all_meetings]
WHERE
[meeting_started] = 'TRUE'
This works as well
meeting_id | meeting_style | meeting_day ...
---------------------------------------------
23 open M,F,SA
23 discussion TU,TH
23 lead W,F
and so on...
and returns ALL 10,982 meetings that started, but I need it to return only the meetings that are from the distinct 'organiser's ID's from Query_1_final (which should be more like 1200 records or so)
Ideally, I need something "like" this below (but of course it does not work)
Query 2: needs to return all meetings that are from organiser ID's only.
SELECT
[meeting_party_id]
,[meeting_style]
,[meeting_day]
,[address]
,[promos]
FROM
[SQLDB].[db].[all_meetings]
WHERE
[meeting_started] = 'TRUE'
AND [meeting_party_id] = "ANY Query_1_final results, especially multiple"
I have tried nesting JOIN and INNER JOIN's but I think there is something fundamental I am missing here about SQL. In PHP I would use an array compare or just run another query... any help would be much appreciated.
Just use IN. Here is the structure of the logic:
with q1 as (
<first query here>
)
SELECT m.*
FROM [SQLDB].[db].[all_meetings] m
WHERE meeting_started = 'TRUE' AND
meeting_party_id IN (SELECT user_id FROM q1);

SQL JOIN returning multiple rows when I only want one row

I am having a slow brain day...
The tables I am joining:
Policy_Office:
PolicyNumber OfficeCode
1 A
2 B
3 C
4 D
5 A
Office_Info:
OfficeCode AgentCode OfficeName
A 123 Acme
A 456 Acme
A 789 Acme
B 111 Ace
B 222 Ace
B 333 Ace
... ... ....
I want to perform a search to return all policies that are affiliated with an office name. For example, if I search for "Acme", I should get two policies: 1 & 5.
My current query looks like this:
SELECT
*
FROM
Policy_Office P
INNER JOIN Office_Info O ON P.OfficeCode = O.OfficeCode
WHERE
O.OfficeName = 'Acme'
But this query returns multiple rows, which I know is because there are multiple matches from the second table.
How do I write the query to only return two rows?
SELECT DISTINCT a.PolicyNumber
FROM Policy_Office a
INNER JOIN Office_Info b
ON a.OfficeCode = b.OfficeCode
WHERE b.officeName = 'Acme'
SQLFiddle Demo
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
Simple join returns the Cartesian multiplication of the two sets and you have 2 A in the first table and 3 A in the second table and you probably get 6 results. If you want only the policy number then you should do a distinct on it.
(using MS-Sqlserver)
I know this thread is 10 years old, but I don't like distinct (in my head it means that the engine gathers all possible data, computes every selected row in each record into a hash and adds it to a tree ordered by that hash; I may be wrong, but it seems inefficient).
Instead, I use CTE and the function row_number(). The solution may very well be a much slower approach, but it's pretty, easy to maintain and I like it:
Given is a person and a telephone table tied together with a foreign key (in the telephone table). This construct means that a person can have more numbers, but I only want the first, so that each person only appears one time in the result set (I ought to be able concatenate multiple telephone numbers into one string (pivot, I think), but that's another issue).
; -- don't forget this one!
with telephonenumbers
as
(
select [id]
, [person_id]
, [number]
, row_number() over (partition by [person_id] order by [activestart] desc) as rowno
from [dbo].[telephone]
where ([activeuntil] is null or [activeuntil] > getdate()
)
select p.[id]
,p.[name]
,t.[number]
from [dbo].[person] p
left join telephonenumbers t on t.person_id = p.id
and t.rowno = 1
This does the trick (in fact the last line does), and the syntax is readable and easy to expand. The example is simple but when creating large scripts that joins tables left and right (literally), it is difficult to avoid that the result contains unwanted duplets - and difficult to identify which tables creates them. CTE works great for me.

Performance in SQL sentence containing ORDER BY, LIMIT and COUNT

I've searched the way of improving this dangerous combination of functions in one SQL sentence...
To put you in a context, i have a table with several information about articles (article_id, author, ...) and another one containing the article_id with one tag_id. As an article is able to have several tags, that second table could have 2 rows with the same article_id and different tag_id.
In order to get a list of the 8 articles that have more tags in common with the one that i want (in this case the 1354) I have written the following query:
SELECT articles.article_id, articles.author, count(articles_tags.article_id) as times
FROM articles
INNER JOIN articles_tags ON (articles.article_id=articles_tags.article_id)
WHERE id_tag IN
(SELECT article_id FROM articles_tags WHERE article_id=1354)
AND article_id <> 1354
GROUP BY article_id
ORDER BY times DESC
LIMIT 8
It is EXTREMELY slow... like 90 seconds for half million articles.
By deleting the "order by times" sentence, it works almost instantly, but if i do so, i won't get the most similar articles.
What can i do?
Thanks!!
a query on a sub-select is ALWAYS a time-killer... Also, as the query didn't really appear to be accurate, or missing, I am making an assumption that your articles_tags table has two columns... one for the actual article ID, and another for the tag_ID associated with it.
That said, I would pre-query just the TAG IDs for article 1354 (the on you are interested in). Use that as a Cartesian join to the article tags again on the tag IDs being the same. From that, you are grabbing the SECOND version of article tags alias and getting ITs article ID, and then the count that MATCH (via Join and not a left-join). Apply the group by on the article ID as you had, And for grins, join to the articles table to get the author.
Now, note. Some SQL engines require you to group by all non-aggregate fields, so you MAY have to either add the author to the group by (which will always be the same per article ID anyway), or change it to MAX( A.author ) as Author which would give the same results.
I would have an index on the (tag_id, article_id) so the tags are found from the "common" tags you are looking to find in common. You could have one article with 10 tags, and another article with 10 completely different tags resulting in 0 in common. This will prevent the other article from even appearing in the result set.
You STILL have the time associated with blowing through half-million articles as you described, which could be millions of actual tag entries.
select
AT2.article_id,
A.Author,
count(*) as Times
from
( select ATG.id_tag
from articles_tags ATG
where ATG.Article_ID = 1354
order by ATG.id_tag ) CommonTags
JOIN articles_tags AT2
on CommonTags.ID_Tag = AT2.ID_Tag
AND AT2.Article_ID <> 1354
JOIN articles A
on AT2.Article_ID = A.Article_ID
group by
AT2.article_id
order by
Times DESC
limit 8
It seems that it should be possible to do this without any subqueries, and then a quicker query may result.
Here the article of interest is joined to its tags, and then further to other articles having these tags. Then the number of tags for each article is counted and ordered:
SELECT a2.article_id, a2.author, COUNT(t2.tag_id) AS times
FROM articles a1
INNER JOIN articles_tags t1
ON t1.article_id = a1.article_id -- find tags for staring article
INNER JOIN tags t2
ON t2.tag_id = t1.tag_id -- find other instances of those tags
AND t2.articles_id <> t1.articles_id
INNER JOIN articles a2
ON a2.articles_id = t2.articles_id -- and the articles where they are used
WHERE a1.article_id = 1354
GROUP BY a2.article_id, a2.author -- count common tags by articles
ORDER BY times DESC
LIMIT 8
If you know a lower bound on the number of tags in common (e.g. 3), inserting HAVING times > 2 before ORDER BY times DESC could give a further speed improvement.

Modelling database for a small soccer league

The database is quite simple. Below there is a part of a schema relevant to this question
ROUND (round_id, round_number)
TEAM (team_id, team_name)
MATCH (match_id, match_date, round_id)
OUTCOME (team_id, match_id, score)
I have a problem with query to retrieve data for all matches played. The simple query below gives of course two rows for every match played.
select *
from round r
inner join match m on m.round_id = r.round_id
inner join outcome o on o.match_id = m.match_id
inner join team t on t.team_id = o.team_id
How should I write a query to have the match data in one row?
Or maybe should I redesign the database - drop the OUTCOME table and modify the MATCH table to look like this:
MATCH (match_id, match_date, team_away, team_home, score_away, score_home)?
You can almost generate the suggested change from the original tables using a self join on outcome table:
select o1.team_id team_id_1,
o2.team_id team_id_2,
o1.score score_1,
o2.score score_2,
o1.match_id match_id
from outcome o1
inner join outcome o2 on o1.match_id = o2.match_id and o1.team_id < o2.team_id
Of course, the information for home and away are not possible to generate, so your suggested alternative approach might be better after all. Also, take note of the condition o1.team_id < o2.team_id, which gets rid of the redundant symmetric match data (actually it gets rid of the same outcome row being joined with itself as well, which can be seen as the more important aspect).
In any case, using this select as part of your join, you can generate one row per match.
you fetch 2 rows for every matches played but team_id and team_name are differents :
- one for team home
- one for team away
so your query is good
Using the match table as you describe captures the logic of a game simply and naturally and additionally shows home and away teams which your initial model does not.
You might want to add the round id as a foreign key to round table and perhaps a flag to indicate a match abandoned situation.
drop outcome. it shouldn't be a separate table, because you have exactly one outcome per match.
you may consider how to handle matches that are cancelled - perhaps scores are null?

MySQL Find Related Articles

I'm trying to select a maximum of 10 related articles, where a related article is an article that has 3 or more of the same keywords as the other article.
My table structure is as follows:
articles[id, title, content, time]
tags[id, tag]
articles_tags[article_id, tag_id]
Can I select the related articles id and title all in one query?
Any help is greatly appreciated.
Assuming that title is also unique
SELECT fA.ID, fA.Title
from
Articles bA,
articles_tags bAT,
articles_tags fAT,
Articles fA
where
bA.title = 'some name' AND
bA.id = bAT.Article_Id AND
bAT.Tag_ID = fAT.Tag_ID AND
fAT.Article_ID = fA.ID AND
fA.title != 'some name'
GROUP BY
fA.ID, fA.Title
HAVING
count(*) >= 3
Where to exclude the 'seed' article
Because I don't care exactly WHICH tags I match on, just THAT I match on 3 tags, I only need tag_id and avoid the join to the tags table completely. So now I join the many-to-many table to itself to find the articles which have an overlap.
The problem is that the article will match itself 100% so we need to eliminate that from the results.
You can exclude that record in 3 ways. You can filter it from the table to before joining, you can have it fall out of the join, or you can filter it when you're finished.
If you eliminate it before you begin the join, you're not gaining much of an advantage. You've got thousands or millions of articles and you're only eliminating 1. I also believe this will not be useful based on the best index for the article_tag mapping table.
If you do it as part of the join, the inequality will prevent that clause from being part of the index scan and be applied as a filter after the index scan.
Consider the index on article_tags as (Tag_ID, Article_ID). If I join the index to itself on tag_id = tag_id then I'll immediately define the slice of the index to process by walking the index to each tag_id my 'seed' article has. If I add the clause article_id != article_id, that can't use the index to define the slice to be processed. That means it will be applied as a filter. e.g. Say my first tag is "BLUE". I walk the index to get all the articles which have "BLUE". (by ID of course). Say there are 50 rows. We know that 1 is my seed article and 49 are matches. If I don't include the inequality, I include all 50 records and move on. If I do include the inequality, I then have to check each of the 50 records to see which is my seed and which isn't. The next tag is "Jupiter" and it matches 20,000 articles. Again I have to check each row in that slice of the index to exclude my seed article. After I go through this 2,5,20 times (depends on tags for that seed article), I now have a completely clean set of articles to do the COUNT(*) and HAVING on. If I don't include the inequality as part of my join but instead just filter the SEED ID out after the group by and having then I only do that filer once on a very short list.
#updated to exclude the searched article itself!
Something along these lines
select *
from articles
inner join (
select at2.article_id, COUNT(*) cnt
from articles a
inner join articles_tags at on at.article_id = a.id
# find all matching tags to get the article ids
inner join articles_tags at2 on at2.tag_id = at.tag_id
and at2.article_id != at.article_id
where a.id = 1234 # the base article to find matches for
group by at2.article_id
having count(*) >= 3 # at least 3 matching keywords
) matches on matches.article_id = articles.id
order by matches.cnt desc
limit 10; # up to 10 matches required
If you can write a query to get ids of records that have matches, then you can certainly have that same query return you the titles. If your real question is 'how do I write the query to return the matches?', then please say so and I'll edit this answer with more details along those lines.