SQL: Gather right hand values from a join - sql

Let's say a question has many tags, via a join table called taggings. I do a join thus:
SELECT DISTINCT `questions`.id
FROM `questions`
LEFT OUTER JOIN `taggings`
ON `taggings`.taggable_id = `questions`.id
LEFT OUTER JOIN `tags`
ON `tags`.id = `taggings`.tag_id
I want to order the results according to a particular tag name, e.g. 'piano', so that questions tagged 'piano' are at the top, followed by the rest ordered alphabetically by tag name. Currently I'm using this order clause:
ORDER BY (tags.name = 'piano') desc, tags.name
This goes completely wrong: the first results I get back aren't even tagged with 'piano' at all. I think my problem is that I need to group the tag names somehow and do my ordering test against that. Ordering against the plain tags.name doesn't seem to work because of the structure of the joined result (it does work if I just do a simple select on the tags table), but I can't get my head around how to fix it.
grateful for any advice, max
EDIT - reply to Marcelo re COALESCE
Thanks a lot Marcelo - I hadn't seen this before. Must read the API docs more carefully.
This does actually help, but only if I select the COALESCE expression as well. I.e., this:
SELECT DISTINCT `questions`.id
FROM `questions`
LEFT OUTER JOIN `taggings`
ON `taggings`.taggable_id = `questions`.id
LEFT OUTER JOIN `tags`
ON `tags`.id = `taggings`.tag_id
ORDER BY (COALESCE(tags.name,'') = 'piano') desc, tags.name
still gives spurious results. However, this:
SELECT DISTINCT `questions`.id, COALESCE(tags.name,'')
FROM `questions`
LEFT OUTER JOIN `taggings`
ON `taggings`.taggable_id = `questions`.id
LEFT OUTER JOIN `tags`
ON `tags`.id = `taggings`.tag_id
ORDER BY (COALESCE(tags.name,'') = 'piano') desc, tags.name
returns the correct results. I'd like to still just select the question ids though. Definitely getting closer anyway...

Maybe because tags.name = 'piano' evaluates to NULL when tags.name is NULL. Try COALESCE(tags.name, '') = 'piano'.
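If the goal is to keep selecting only question ids, one possible approach is to group by the question and order on aggregates over its tags, rather than on the per-row tags.name. Below is a minimal sketch using Python's built-in sqlite3 module; the table contents are made up for illustration, and the ORDER BY assumes a database where a comparison evaluates to 1/0/NULL (true of SQLite and MySQL, not of every engine).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE taggings (taggable_id INTEGER, tag_id INTEGER);
    INSERT INTO questions VALUES (1), (2), (3), (4);
    INSERT INTO tags VALUES (1, 'guitar'), (2, 'piano'), (3, 'violin');
    -- question 1: guitar; 2: guitar + piano; 3: violin; 4: untagged
    INSERT INTO taggings VALUES (1, 1), (2, 1), (2, 2), (3, 3);
""")

rows = conn.execute("""
    SELECT questions.id
    FROM questions
    LEFT OUTER JOIN taggings ON taggings.taggable_id = questions.id
    LEFT OUTER JOIN tags ON tags.id = taggings.tag_id
    GROUP BY questions.id
    ORDER BY COALESCE(MAX(tags.name = 'piano'), 0) DESC,  -- any 'piano' tag first
             MIN(tags.name) IS NULL,                      -- untagged questions last
             MIN(tags.name),                              -- then by first tag name
             questions.id
""").fetchall()

question_ids = [row[0] for row in rows]
print(question_ids)
```

With GROUP BY there is exactly one row per question, so the sort keys are well defined for each id, which is not the case when DISTINCT collapses several joined rows after the fact.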

Related

SQLite GROUP_CONCAT from another table, multiple joins

Having trouble with my sql query. Not an SQL expert by any means.
SELECT
transactions.*,
categories.*,
GROUP_CONCAT(tags.tagName) as concatTags
FROM transactions
INNER JOIN categories
ON transactions.category = categories.categoryId
LEFT JOIN TransactionTagRelation AS ttr
ON transactions.transactionId = ttr.transactionId
LEFT JOIN tags
ON tags.tagId = ttr.tagId;
(There's also a where and group by, but didn't think it was relevant to the question).
I'm trying to get:
transactionId1, ...otherStuff..., "tagId1,tagId2,tagId3"
transactionId2, ...otherStuff..., "tagId1,tagId3"
What I have now seems to merge the tags into one transaction or something. I tried adding a GROUP BY transactionID at the end, but it gives a syntax error for some reason. I have a feeling my joins are incorrect, but I wasn't able to get anything better.
Do something like this:
SELECT t.*, c.*,
(SELECT GROUP_CONCAT(tg.tagName)
FROM TransactionTagRelation ttr JOIN
Tags tg
ON tg.tagId = ttr.tagId
WHERE t.transactionId = ttr.transactionId
) as concatTags
FROM transactions t JOIN
categories c
ON t.category = c.categoryId;
This eliminates the GROUP BY in the outer query and allows you to use t.* and c.* in the SELECT.
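A small runnable sketch of the correlated-subquery approach, using Python's built-in sqlite3 module. The sample rows and the amount column are invented for illustration, and the categories join is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (transactionId INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE tags (tagId INTEGER PRIMARY KEY, tagName TEXT);
    CREATE TABLE TransactionTagRelation (transactionId INTEGER, tagId INTEGER);
    INSERT INTO transactions VALUES (1, 9.5), (2, 20.0), (3, 3.0);
    INSERT INTO tags VALUES (1, 'food'), (2, 'travel'), (3, 'work');
    INSERT INTO TransactionTagRelation VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
""")

rows = conn.execute("""
    SELECT t.transactionId,
           (SELECT GROUP_CONCAT(tg.tagName)
            FROM TransactionTagRelation ttr
            JOIN tags tg ON tg.tagId = ttr.tagId
            WHERE ttr.transactionId = t.transactionId) AS concatTags
    FROM transactions t
    ORDER BY t.transactionId
""").fetchall()

# GROUP_CONCAT does not guarantee an order, so sort the names before comparing.
# A transaction with no tags gets NULL (None in Python) from the subquery.
tags_by_transaction = {
    tid: sorted(concat.split(",")) if concat is not None else []
    for tid, concat in rows
}
print(tags_by_transaction)
```

Each transaction keeps its own row, and the tag list is computed per row instead of being merged across the whole result.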

PostgreSQL or SQL in general: LEFT OUTER JOIN and WHERE condition behave unexpectedly (for me?)

Let's say I have two tables, which may or may not have a relation. For example, books and tags. So, let's say I want to select books, that don't have tag "Sci-Fi", I would write something like this:
SELECT
*
FROM
books
LEFT OUTER JOIN
tags ON books.id = tags.taggable_id
WHERE
tags.name NOT IN ('Sci-Fi')
I wasn't expecting that this will also exclude books with no tags at all.
I tried this:
WHERE
tags.name IN (NULL, 'Novel'...)
And ended up with this, which I'm pretty sure is not the best way to do it:
WHERE
tags.name NOT IN ('Sci-Fi') OR tags.id IS NULL
The question is why and is there another way?
P.S. don't ask about why tags are created this way, it's just for the sake of example and that's the best analogy I managed to squeeze out of myself :)
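All conditions in the where clause filter the joined result, so unmatched rows (where every tags column is NULL) fail a test on tags.name. Instead, match only Sci-Fi tags in the join clause and keep the books for which no such tag was found:
SELECT *
FROM books
LEFT OUTER JOIN tags ON books.id = tags.taggable_id
AND tags.name IN ('Sci-Fi')
WHERE tags.id IS NULL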
To get books that can have more than one tag and should not have the Sci-Fi tag at all then you can do
SELECT books.id, books.name
FROM books
LEFT OUTER JOIN tags ON books.id = tags.taggable_id
GROUP BY books.id, books.name
HAVING sum(case when tags.name = 'Sci-Fi' then 1 else 0 end) = 0
If you include a condition on the "outer" table in the where clause, this effectively turns the outer join into an inner join.
This is because for any row that is not matched in the outer table, all columns from that table are returned as null. So for any non-matched row, tags.name will be null.
Any comparison with null yields null which in this case means "not true" and thus those rows are removed by the where condition - which they would have been as well with an inner join.
You need to put that condition into the join clause, not the where clause.
ON books.id = tags.taggable_id and tags.name NOT IN ('Sci-Fi')
And as you are not interested in books that do not have that tag, you can change the outer join into an inner join:
select *
from books
join tags on books.id = tags.taggable_id
and tags.name NOT IN ('Sci-Fi')
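To see the NULL behaviour concretely, here is a minimal sketch with Python's built-in sqlite3 module. The three books and their tags are made-up sample data; the first query is the WHERE filter from the question, the second is the GROUP BY / HAVING form shown in one of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, taggable_id INTEGER, name TEXT);
    INSERT INTO books VALUES (1, 'Dune'), (2, 'Emma'), (3, 'Untagged');
    INSERT INTO tags VALUES (1, 1, 'Sci-Fi'), (2, 1, 'Classic'), (3, 2, 'Novel');
""")

def book_names(sql):
    """Run a query and return the distinct book names, sorted."""
    return sorted({row[0] for row in conn.execute(sql)})

# NULL NOT IN ('Sci-Fi') is NULL, so the untagged book is filtered out,
# while 'Dune' survives through its 'Classic' row.
where_filter = book_names("""
    SELECT books.name FROM books
    LEFT OUTER JOIN tags ON books.id = tags.taggable_id
    WHERE tags.name NOT IN ('Sci-Fi')
""")

# Counting Sci-Fi rows per book keeps untagged books and drops 'Dune'.
having_filter = book_names("""
    SELECT books.name FROM books
    LEFT OUTER JOIN tags ON books.id = tags.taggable_id
    GROUP BY books.id, books.name
    HAVING SUM(CASE WHEN tags.name = 'Sci-Fi' THEN 1 ELSE 0 END) = 0
""")

print(where_filter, having_filter)
```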

T-SQL Left-Join with 1 row (limit, subselect)

I have already read a lot on this topic, but I'm unable to get it to work for my case.
I have the following situation:
A list of orderitems (the main datasets I want to get)
Articles which have a 1:1 relation to an order item
An n:m join table "Articlesupplier" which creates a relation between an article and a partner
A Partner table with detailed information about partners.
Target:
One dataset per OrderItem, and from the suppliers I only want the first one found in the join. No prioritization required.
Tables:
Table IDX_ORDERITEM
id,article_id
Table IDX_ARTICLE
id,name
Table IDX_ARTICLESUPPLIER
article_id,partner_id
Table IDX_PARTNER
id,abbr
My actual statement (short version):
SELECT IDX_ORDERITEM.id
FROM
dbo.IDX_ORDERITEM AS IDX_ORDERITEM
-- ARTICLE --
INNER JOIN dbo.IDX_ARTICLE AS IDX_ARTICLE
ON IDX_ORDERITEM.article_id=IDX_ARTICLE.id
-- SUPPLIER VIA ARTICLE --
LEFT JOIN
(SELECT TOP(1) IDX_PARTNER.id, IDX_PARTNER.abbr
FROM IDX_PARTNER, IDX_ARTICLESUPPLIER
WHERE IDX_PARTNER.id = IDX_ARTICLESUPPLIER.partner_id
AND IDX_ARTICLESUPPLIER.article_id=IDX_ARTICLE.id) AS IDX_PARTNER_SUPPLIER
ON IDX_PARTNER_SUPPLIER.id=IDX_ARTICLE.supplier_partner_id
WHERE 1>0
ORDER BY orderitem.id DESC
But it seems I can't access IDX_ARTICLE.id in the subquery. I get the following error message:
The multi-part identifier "IDX_ARTICLE.id" could not be bound.
Is the problem that the Article alias has the same name as the table name?
Thanks a lot in advance for possible ideas,
Mike
Well, I changed your aliases and the subquery you were joining to (I also modified that subquery so it doesn't use implicit joins anymore), though these changes were mostly cosmetic. The actual important change was the use of OUTER APPLY instead of LEFT JOIN:
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
OUTER APPLY
(SELECT TOP(1) P.id, P.abbr
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
WHERE SUP.article_id = A.id
AND P.id = A.supplier_partner_id) AS PS
ORDER BY OI.id DESC
The error is thrown because the below piece of query
(SELECT TOP(1) IDX_PARTNER.id, IDX_PARTNER.abbr
FROM IDX_PARTNER, IDX_ARTICLESUPPLIER
WHERE IDX_PARTNER.id = IDX_ARTICLESUPPLIER.partner_id
AND IDX_ARTICLESUPPLIER.article_id=IDX_ARTICLE.id) AS IDX_PARTNER_SUPPLIER
cannot be treated as a correlated sub-query: it is a derived table in the FROM clause, yet IDX_ARTICLE.id is referenced in it the way a column of the outer query is referenced in a correlated sub-query.
I see two problems.
According to your DDLs there is no IDX_ARTICLE.supplier_partner_id which you refer to in the left join on clause.
Second, I'm quite sure you cannot use IDX_ARTICLE.id in your derived table. Simply add IDX_ARTICLESUPPLIER.article_id to your derived table selected fields and use it in your left join on clause against IDX_ARTICLE.id.
I prefer to avoid nested queries. If I can, I will always rewrite it using CTE.
WITH Part_Sup
AS (
SELECT TOP ( 1 ) P.id
,P.abbr
,SUP.article_id
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
)
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
LEFT OUTER JOIN Part_Sup AS PS
ON PS.article_id = A.Id
AND PS.id = A.supplier_partner_id
ORDER BY OI.id DESC;
Next, I rewrote the first query to use the ROW_NUMBER() function instead of TOP (1). With ROW_NUMBER you can control which rows you keep and which you discard.
WITH Part_Sup
AS (
SELECT P.id
,P.abbr
,SUP.article_id
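-- T-SQL requires ORDER BY inside OVER for ROW_NUMBER; number suppliers per article
,ROW_NUMBER() OVER ( PARTITION BY SUP.article_id ORDER BY P.id ) AS RowNum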
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
)
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
LEFT OUTER JOIN Part_Sup AS PS
ON PS.article_id = A.Id
AND PS.id = A.supplier_partner_id
AND RowNum = 1
ORDER BY OI.id DESC;
Thanks Lamak - you solved it :)
I used your input to extract the basic solution, to make it a bit easier to read for others who have the same problem:
Using OUTER APPLY (without ORDER_ITEM Table here):
SELECT IDX_ARTICLE.id AS AR_ID, IDX_PARTNER_SUPPLIER.id, IDX_PARTNER_SUPPLIER.abbr
FROM
dbo.IDX_ARTICLE AS IDX_ARTICLE
OUTER APPLY
(SELECT TOP(1) _PARTNER.id, _PARTNER.abbr
FROM IDX_PARTNER AS _PARTNER
INNER JOIN IDX_ARTICLESUPPLIER AS _ARTICLESUPPLIER
ON _PARTNER.id = _ARTICLESUPPLIER.partner_id
WHERE _ARTICLESUPPLIER.article_id=IDX_ARTICLE.id
AND _ARTICLESUPPLIER.deleted IS NULL) AS IDX_PARTNER_SUPPLIER
WHERE IDX_ARTICLE.id=67
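For readers who want to try the "first supplier per article" idea without SQL Server, here is a rough analogue using Python's built-in sqlite3 module. SQLite has no OUTER APPLY or TOP, so this sketch swaps in a correlated scalar subquery with LIMIT 1; it is not the T-SQL solution itself. The sample rows are invented, and ORDER BY P.id is an assumption added only to make "first" deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IDX_ARTICLE (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE IDX_PARTNER (id INTEGER PRIMARY KEY, abbr TEXT);
    CREATE TABLE IDX_ARTICLESUPPLIER (article_id INTEGER, partner_id INTEGER);
    INSERT INTO IDX_ARTICLE VALUES (1, 'bolt'), (2, 'nut'), (3, 'washer');
    INSERT INTO IDX_PARTNER VALUES (10, 'ACME'), (20, 'BETA');
    -- article 1 has two suppliers, article 2 has one, article 3 has none
    INSERT INTO IDX_ARTICLESUPPLIER VALUES (1, 20), (1, 10), (2, 20);
""")

first_supplier = conn.execute("""
    SELECT A.id,
           (SELECT P.abbr
            FROM IDX_PARTNER AS P
            JOIN IDX_ARTICLESUPPLIER AS SUP ON P.id = SUP.partner_id
            WHERE SUP.article_id = A.id   -- correlation with the outer row
            ORDER BY P.id
            LIMIT 1) AS supplier_abbr
    FROM IDX_ARTICLE AS A
    ORDER BY A.id
""").fetchall()

print(first_supplier)
```

As with OUTER APPLY, an article without any supplier still appears, with NULL in the supplier column. A scalar subquery can return only one column, which is where OUTER APPLY is more convenient.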

Mysql statement (syntax error on FULL JOIN)

What is wrong with my sql statement, it says that the problem is near the FULL JOIN, but I'm stumped:
SELECT `o`.`name` AS `offername`, `m`.`name` AS `merchantName`
FROM `offer` AS `o`
FULL JOIN `offerorder` AS `of` ON of.offerId = o.id
INNER JOIN `merchant` AS `m` ON o.merchantId = m.id
GROUP BY `of`.`merchantId`
Please be gentle, as I am not a sql fundi
MySQL doesn't offer full join, you can either use
a pair of LEFT+RIGHT and UNION; or
use a triplet of LEFT, RIGHT and INNER and UNION ALL
The query is also very wrong, because you have a GROUP BY but your SELECT columns are not aggregates.
After you convert this properly to LEFT + RIGHT + UNION, you still have the issue of getting an offername and merchantname from any random record per each distinct of.merchantid, and not even necessarily from the same record.
Because you have an INNER JOIN condition against o.merchantId, the FULL JOIN is not necessary, since "offerorder" records with no match in "offer" will fail the INNER JOIN. That turns it into a LEFT JOIN (optional). Because you are grouping on of.merchantid, any missing offerorder records will be grouped together under "NULL" as merchantid.
This is a query that will work, for each merchantid, it will show just one offer that the merchant made (the one with the first name when sorted in lexicographical order).
SELECT MIN(o.name) AS offername, m.name AS merchantName
FROM offer AS o
LEFT JOIN offerorder AS `of` ON `of`.offerId = o.id
INNER JOIN merchant AS m ON o.merchantId = m.id
GROUP BY `of`.merchantId, m.name
Note: The join o.merchantid = m.id is highly suspect. Did you mean of.merchantid = m.id? If that is the case, change the LEFT to RIGHT join.
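The LEFT + RIGHT + UNION emulation of a full join mentioned above can be sketched as follows, using Python's built-in sqlite3 module on made-up rows. Older SQLite versions have no RIGHT JOIN, so the second half is written as a LEFT JOIN with the tables swapped, which is equivalent; in MySQL you could write it as a RIGHT JOIN directly. The alias oo is used here instead of `of` purely to avoid quoting a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE offerorder (id INTEGER PRIMARY KEY, offerId INTEGER);
    INSERT INTO offer VALUES (1, 'spring sale'), (2, 'never ordered');
    -- order 101 points at offer 99, which does not exist
    INSERT INTO offerorder VALUES (100, 1), (101, 99);
""")

rows = conn.execute("""
    SELECT o.name, oo.id
    FROM offer AS o
    LEFT JOIN offerorder AS oo ON oo.offerId = o.id
    UNION
    SELECT o.name, oo.id
    FROM offerorder AS oo
    LEFT JOIN offer AS o ON oo.offerId = o.id
""").fetchall()

full_join_rows = set(rows)
print(full_join_rows)
```

UNION removes the matched pair that both halves produce. That also means genuinely duplicated rows would be collapsed, which is why the LEFT, RIGHT and INNER with UNION ALL variant exists.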

Help with SQL Join on two tables

I have two tables, one is a table of forum threads. It has a last post date column.
Another table has PostID, UserId, and DateViewed.
I want to join these tables so I can compare DateViewed and LastPostDate for the current user. However, if they have never viewed the thread, there will not be a row in the 2nd table.
This seems easy but I can't wrap my head around it. Advice please.
Thanks in advance.
What is it that you're trying to do specifically - determine if there are unread posts?
You just need to use an outer join:
SELECT p.PostID, p.LastPostDate, ...,
CASE
WHEN v.DateViewed IS NULL OR v.DateViewed < p.LastPostDate THEN 1
ELSE 0
END AS Unread
FROM Posts p
LEFT JOIN PostViews v
ON v.PostID = p.PostID
AND v.UserID = @UserID
Note that I've placed the UserID test in the JOIN condition; if you put it in the WHERE predicate instead, posts the user has never viewed will drop out of the results, because they have no matching row in the PostViews table for that user.
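The difference between testing UserID in the JOIN condition and in the WHERE clause can be checked with a short sketch using Python's built-in sqlite3 module. The rows, the user id 7, and the ISO-formatted text dates are assumptions made for illustration; the named parameter :user_id stands in for the T-SQL variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (PostID INTEGER PRIMARY KEY, LastPostDate TEXT);
    CREATE TABLE PostViews (PostID INTEGER, UserID INTEGER, DateViewed TEXT);
    INSERT INTO Posts VALUES (1, '2024-01-10'), (2, '2024-01-10'), (3, '2024-01-10');
    -- user 7 viewed post 1 after the last post and post 2 before it;
    -- post 3 was only ever viewed by user 8
    INSERT INTO PostViews VALUES
        (1, 7, '2024-01-11'), (2, 7, '2024-01-05'), (3, 8, '2024-01-12');
""")

# UserID tested in the JOIN condition: every post is kept.
unread = conn.execute("""
    SELECT p.PostID,
           CASE WHEN v.DateViewed IS NULL OR v.DateViewed < p.LastPostDate
                THEN 1 ELSE 0 END AS Unread
    FROM Posts p
    LEFT JOIN PostViews v ON v.PostID = p.PostID AND v.UserID = :user_id
    ORDER BY p.PostID
""", {"user_id": 7}).fetchall()

# UserID tested in WHERE: post 3 disappears for user 7.
in_where = conn.execute("""
    SELECT p.PostID
    FROM Posts p
    LEFT JOIN PostViews v ON v.PostID = p.PostID
    WHERE v.UserID = :user_id
    ORDER BY p.PostID
""", {"user_id": 7}).fetchall()

print(unread, in_where)
```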
So you're thinking something like:
SELECT t.UserID, t.PostID, t.LastPostDate, v.DateViewed
FROM dbo.Threads t
LEFT JOIN dbo.Views v ON v.PostID = t.PostID
AND v.UserID = t.UserID
WHERE t.UserID = @user;
v.DateViewed will be NULL if there's no corresponding row in Views.
If you have lots of rows in Views, you may prefer to do something like:
SELECT t.UserID, t.PostID, t.LastPostDate, v.DateViewed
FROM dbo.Threads t
CROSS APPLY (SELECT MAX(vw.DateViewed) as DateViewed
FROM dbo.Views vw
WHERE vw.PostID = t.PostID
AND vw.UserID = t.UserID
) v
WHERE t.UserID = @user;
The key is to use a LEFT JOIN, which will cause non-existent rows on the right side to come up as all NULL:
SELECT threads.lastpostdate, posts.dateviewed
FROM threads
LEFT JOIN posts
ON threads.id=posts.postid