Complex join with nested group-by/having clause? - sql

I ultimately need a list of "import" records that include "album"
records which only have one "song" each.
This is what I'm using now:
select i.id, i.created_at
from imports i
where i.id in (
select a.import_id
from albums a inner join songs s on a.id = s.album_id
group by a.id having 1 = count(s.id)
);
The nested select (with the join) is blazing fast, but the external
"in" clause is excruciatingly slow.
I tried to make the entire query a single (no nesting) join but ran
into problems with the group/having clauses. The best I could do was
a list of "import" records with dupes, which is not acceptable.
Is there a more elegant way to compose this query?

How's this?
SELECT i.id,
i.created_at
FROM imports i
INNER JOIN (SELECT a.import_id
FROM albums a
INNER JOIN songs s
ON a.id = s.album_id
GROUP BY a.id
HAVING Count(* ) = 1) AS TEMP
ON i.id = TEMP.import_id;
In most database systems, the JOIN works a lost faster than doing a WHERE ... IN.

SELECT i.id, i.created_at, COUNT(s.album_id)
FROM imports AS i
INNER JOIN albums AS a
ON i.id = a.import_id
INNER JOIN songs AS s
ON a.id = s.album_id
GROUP BY i.id, i.created_at
HAVING COUNT(s.album_id) = 1
(You might not need to include the COUNT in the SELECT list itself. SQL Server doesn't require it, but it's possible that a different RDBMS might.)

Untested:
select
i.id, i.created_at
from
imports i
where
exists (select *
from
albums a
join
songs s on a.id = s.album_id
where
a.import_id = i.id
group by
a.id
having
count(*) = 1)
OR
select
i.id, i.created_at
from
imports i
where
exists (select *
from
albums a
join
songs s on a.id = s.album_id
group by
a.import_id, a.id
having
count(*) = 1 AND a.import_id = i.id)

All three sugested techniques should be faster than your WHERE IN:
Exists with a related subquery (gbn)
Subquery that is inner joined (achinda99)
Inner Joining all three tables (luke)
(All should work, too ..., so +1 for all of them. Please let us know if one of them does not work!)
Which one actually turns out to be the fastest, depends on your data and the execution plan. But an interesting example of different ways for expressing the same thing in SQL.

I tried to make the entire query a
single (no nesting) join but ran into
problems with the group/having
clauses.
You can join subquery using CTE (Common Table Expression) if you are using SQL Server version 2005/2008
As far as I know, CTE is simply an expression that works like a virtual view that works only one a single select statement - So you will be able to do the following.
I usually find using CTE to improve query performance as well.
with AlbumSongs as (
select a.import_id
from albums a inner join songs s on a.id = s.album_id
group by a.id
having 1 = count(s.id)
)
select i.id, i.created_at
from imports i
inner join AlbumSongs A on A.import_id = i.import_id

Related

Is there any alternative way for following query in oracle

select
a.id,
a.name,
b.group,
a.accountno,
(
select ci.cardno
from taccount ac, tcardinfo ci
where ac.accountno = ci.accountno
) as card_no
from tstudent a, tgroup b
where a.id = b.id
And how to select more than one field from (select ci.cardno from taccount ac,tcardinfo ci where ac.accountno = ci.accountno) or any others way
Please note that the is not a relation in two queries (main and subquery). Sub-query value depends on the data of the main query. Main query is set of data by joining multiple table and sub-query is also a set of data by joining multiple table
In essence, you are describing a lateral join. This is available in oracle since version 12.
Your query is rather unclear about from which table each column comes from (I made assumptions, that you might need to review), and you seem to be missing a join condition in the subquery (I added question marks in that spot)... But the idea is:
select
s.id,
s.name,
g.group,
s.accountno,
x.*
from tstudent s
inner join tgroup g on g.id = s.id
outer apply (
select ci.cardno
from taccount ac
inner join tcardinfo ci on ????
where ac.accountno = s.accountno
) x
You can then return more columns to the subquery, and then will show up in the resultset.

SQLite GROUP_CONCAT from another table, multiple joins

Having trouble with my sql query. Not an SQL expert by any means.
SELECT
transactions.*,
categories.*,
GROUP_CONCAT(tags.tagName) as concatTags
FROM transactions
INNER JOIN categories
ON transactions.category = categories.categoryId
LEFT JOIN TransactionTagRelation AS ttr
ON transactions.transactionId = ttr.transactionId
LEFT JOIN tags
ON tags.tagId = ttr.tagId;
(There's also a where and group by, but didn't think it was relevant to the question).
I'm trying to get:
transactionId1, ...otherStuff..., "tagId1,tagId2,tagId3"
transactionId2, ...otherStuff..., "tagId1,tagId3"
What I have now seems to merge the tags into one transaction or something. I tried adding a GROUP BY transactionID at the end, but it gives a syntax error for some reason. I have a feeling my joins are incorrect, but I wasn't able to get anything better.
Do something like this:
SELECT t.*, c.*,
(SELECT GROUP_CONCAT(tg.tagName)
FROM TransactionTagRelation ttr JOIN
Tags tg
ON tg.tagId = ttr.tagId
WHERE t.transactionId = ttr.transactionId
) as concatTags
FROM transactions t JOIN
categories c
ON t.category = c.categoryId;
This eliminates the GROUP BY in the outer query and allows you to use t.* and c.* in the SELECT.

Returning count using multiple subqueries

Forgive my ignorance, but sql-server is not my strong suit. I'm trying to retrieve the count of rows from two mapping tables using subqueries (correlated subqueries?). If I remove one subquery, things work fine, but if I include both subqueries, I'm not getting the expected number of rows back.
SELECT
a.id,
a.name,
al_1.locationid as LocationCount,
af_1.FuncAreaId as FuncAreaCount
FROM
application as a inner join
applicationlocation as al_1 on a.id = al_1.appid inner join
applicationfuncarea as af_1 on a.id = af_1.AppId
WHERE
al_1.locationid in
(
SELECT count(locationid)
FROM applicationlocation as al_2
WHERE al_2.appid = al_1.appid
)
AND
af_1.funcareaid in
(
SELECT count(funcareaid)
FROM applicationfuncarea as af_2
WHERE af_2.appid = af_1.appid
)
If I'm not mistaken about what you want to achieve here, then this can be achieved by using grouping with COUNT and DISTINCT operators:
SELECT
a.id,
a.name,
COUNT(DISTINCT al_1.locationid) as LocationCount,
COUNT(DISTINCT af_1.FuncAreaId) as FuncAreaCount
FROM
application as a inner join
applicationlocation as al_1 on a.id = al_1.appid inner join
applicationfuncarea as af_1 on a.id = af_1.AppId
GROUP BY a.id, a.name
It is hard to tell if result is correct without seeing your data and expected result. If the above does not work, try removing DISTINCT operators and see if you get the results that you need.

T-SQL Left-Join with 1 row (limi, subselect)

I already read a lot on that topic but I´m unable to get it to work for my case.
I have the following situation:
A list of orderitems (the main datasets I want to get)
Articles which have a 1:1 relation to an order item
A n:m Jointable "Articlesupplier" which creates a relation between an article and a
partner
A Partner table with detailed information about partners.
Target:
One dataset per OrderItem and from the suppliers I only want to get the first one found in the join. No priorization required.
Tables:
Table IDX_ORDERITEM
id,article_id
Table IDX_ARTICLE
id,name
Table IDX_ARTICLESUPPLIER
article_id,partner_id
Table IDX_PARTNER
id,abbr
My actual statement (short version):
SELECT IDX_ORDERITEM.id
FROM
dbo.IDX_ORDERITEM AS IDX_ORDERITEM
-- ARTICLE --
INNER JOIN dbo.IDX_ARTICLE AS IDX_ARTICLE
ON IDX_ORDERITEM.article_id=IDX_ARTICLE.id
-- SUPPLIER VIA ARTICLE --
LEFT JOIN
(SELECT TOP(1) IDX_PARTNER.id, IDX_PARTNER.abbr
FROM IDX_PARTNER, IDX_ARTICLESUPPLIER
WHERE IDX_PARTNER.id = IDX_ARTICLESUPPLIER.partner_id
AND IDX_ARTICLESUPPLIER.article_id=IDX_ARTICLE.id) AS IDX_PARTNER_SUPPLIER
ON IDX_PARTNER_SUPPLIER.id=IDX_ARTICLE.supplier_partner_id
WHERE 1>0
ORDER BY orderitem.id DESC
But it seems I can´t access IDX_ARTICLE.id in the subquery. I get the following error message:
The multi-part identifier "IDX_ARTICLE.id" could not be bound.
Is the problem that the Article alias has the same name as the table name?
Thanks a lot in advance for possible ideas,
Mike
Well, I changed your aliases, and the subquery to which you were joining (I also modified that subquery so it doesn't use implicit joins anymore), though this changes where mostly cosmetics. The actual important change was the use of OUTER APPLY instead of LEFT JOIN:
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
OUTER APPLY
(SELECT TOP(1) P.id, P.abbr
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
WHERE SUP.article_id = A.id
AND P.id = A.supplier_partner_id) AS PS
ORDER BY OI.id DESC
The error is thrown because the below piece of query
(SELECT TOP(1) IDX_PARTNER.id, IDX_PARTNER.abbr
FROM IDX_PARTNER, IDX_ARTICLESUPPLIER
WHERE IDX_PARTNER.id = IDX_ARTICLESUPPLIER.partner_id
AND IDX_ARTICLESUPPLIER.article_id=IDX_ARTICLE.id) AS IDX_PARTNER_SUPPLIER
cannot be considered as a correlated sub-query and IDX_ARTICLE.id is referenced in it in the same manner we reference a field of outer query in a correlated sub-query.
I see two problems.
According to your DDLs there is no IDX_ARTICLE.supplier_partner_id which you refer to in the left join on clause.
Second, I'm quite sure you cannot use IDX_ARTICLE.id in your derived table. Simply add IDX_ARTICLESUPPLIER.article_id to your derived table selected fields and use it in your left join on clause against IDX_ARTICLE.id.
I prefer to avoid nested queries. If I can, I will always rewrite it using CTE.
WITH Part_Sup
AS (
SELECT TOP ( 1 ) P.id
,P.abbr
,SUP.article_id
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
)
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
LEFT OUTER JOIN Part_Sup AS PS
ON PS.article_id = A.Id
AND PS.id = A.supplier_partner_id
ORDER BY OI.id DESC;
Next I rewritten the first query to use ROW_NUMBER() function instead of using TOP (1) using ROW_NUMBER you can control which results you want and what you don't want.
WITH Part_Sup
AS (
SELECT P.id
,P.abbr
,SUP.article_id
,ROW_NUMBER() OVER ( PARTITION BY P.id, P.abbr ) AS RowNum
FROM IDX_PARTNER AS P
INNER JOIN IDX_ARTICLESUPPLIER AS SUP
ON P.id = SUP.partner_id
)
SELECT OI.id
FROM dbo.IDX_ORDERITEM AS OI
INNER JOIN dbo.IDX_ARTICLE AS A
ON OI.article_id = A.id
LEFT OUTER JOIN Part_Sup AS PS
ON PS.article_id = A.Id
AND PS.id = A.supplier_partner_id
AND RowNum = 1
ORDER BY OI.id DESC;
Thanks Lamak - you solved it :)
I used your input to extract the basic solution to make it a bit easier to read for others which have the same problem:
Using OUTER APPLY (without ORDER_ITEM Table here):
SELECT IDX_ARTICLE.id AS AR_ID, IDX_PARTNER_SUPPLIER.id, IDX_PARTNER_SUPPLIER.abbr
FROM
dbo.IDX_ARTICLE AS IDX_ARTICLE
OUTER APPLY
(SELECT TOP(1) _PARTNER.id, _PARTNER.abbr
FROM IDX_PARTNER AS _PARTNER
INNER JOIN IDX_ARTICLESUPPLIER AS _ARTICLESUPPLIER
ON _PARTNER.id = _ARTICLESUPPLIER.partner_id
WHERE _ARTICLESUPPLIER.article_id=IDX_ARTICLE.id
AND _ARTICLESUPPLIER.deleted IS NULL) AS IDX_PARTNER_SUPPLIER
WHERE IDX_ARTICLE.id=67

Unable to Group on MSAccess SQL multiple search query

please can you help me before I go out of my mind. I've spent a while on this now and resorted to asking you helpful wonderful people. I have a search query:
SELECT Groups.GroupID,
Groups.GroupName,
( SELECT Sum(SiteRates.SiteMonthlySalesValue)
FROM SiteRates
WHERE InvoiceSites.SiteID = SiteRates.SiteID
) AS SumOfSiteRates,
( SELECT Count(InvoiceSites.SiteID)
FROM InvoiceSites
WHERE SiteRates.SiteID = InvoiceSites.SiteID
) AS CountOfSites
FROM (InvoiceSites
INNER JOIN (Groups
INNER JOIN SitesAndGroups
ON Groups.GroupID = SitesAndGroups.GroupID
) ON InvoiceSites.SiteID = SitesAndGroups.SiteID)
INNER JOIN SiteRates
ON InvoiceSites.SiteID = SiteRates.SiteID
GROUP BY Groups.GroupID
With the following table relationship
http://m-ls.co.uk/ExtFiles/SQL-Relationship.jpg
Without the GROUP BY entry I can get a list of the entries I want but it drills the results down by SiteID where instead I want to GROUP BY the GroupID. I know this is possible but lack the expertise to complete this.
Any help would be massively appreciated.
I think all you need to do is add groups.Name to the GROUP BY clause, however I would adopt for a slightly different approach and try to avoid the subqueries if possible. Since you have already joined to all the required tables you can just use normal aggregate functions:
SELECT Groups.GroupID,
Groups.GroupName,
SUM(SiteRates.SiteMonthlySalesValue) AS SumOfSiteRates,
COUNT(InvoiceSites.SiteID) AS CountOfSites
FROM (InvoiceSites
INNER JOIN (Groups
INNER JOIN SitesAndGroups
ON Groups.GroupID = SitesAndGroups.GroupID
) ON InvoiceSites.SiteID = SitesAndGroups.SiteID)
INNER JOIN SiteRates
ON InvoiceSites.SiteID = SiteRates.SiteID
GROUP BY Groups.GroupID, Groups.GroupName;
I think what you are looking for is something like the following:
SELECT Groups.GroupID, Groups.GroupName, SumResults.SiteID, SumResults.SumOfSiteRates, SumResults.CountOfSites
FROM Groups INNER JOIN
(
SELECT SitesAndGroups.SiteID, Sum(SiteRates.SiteMonthlySalesValue) AS SumOfSiteRates, Count(InvoiceSites.SiteID) AS CountOfSites
FROM SitesAndGroups INNER JOIN (InvoiceSites INNER JOIN SiteRates ON InvoiceSites.SiteID = SiteRates.SiteID) ON SitesAndGroups.SiteID = InvoiceSites.SiteID
GROUP BY SitesAndGroups.SiteID
) AS SumResults ON Groups.SiteID = SumResults.SiteID
This query will group your information based on the SiteID like you want. That query is referenced in the from statement linking to the Groups table to pull the group information that you want.