how to select top row from group by with join

I have two SQL tables, news and newsSections. I want to display the top rows from a group by when selecting 4 different news sections. For example:
SELECT TOP (4) a.newsID, a.title, a.clicked, a.path, a.newsDate, c.sectionName, a.sectionID
FROM dbo.News a INNER JOIN
dbo.newsSection c
ON a.sectionID = c.SectionID
WHERE (c.SectionID = 21) OR (c.SectionID = 23) OR (c.SectionID = 36) OR (c.SectionID = 37)
GROUP BY c.sectionName, a.newsID, a.title, a.clicked, a.path, a.newsDate, a.sectionID
ORDER BY a.newsDate DESC

You can use APPLY:
SELECT n.*, ns.sectionName
FROM dbo.newsSection ns CROSS APPLY
(SELECT TOP 1 n.*
FROM dbo.News n
WHERE n.sectionID = ns.sectionID
ORDER BY n.newsDate DESC
) n
WHERE ns.SectionID IN (21, 23, 36, 37);

Your query returns the 4 most recent articles from ALL of the news sections pooled together, meaning you may get multiple articles from a single section and no articles from another section if there has been more recent activity in some sections than in others.
I'm guessing what you actually want is the most recent article from EACH of the sections. If so, then the reply by Gordon Linoff would do the trick, except that he left in the 'ON' clause in the query. (Gordon himself pointed this out.) It should look more like this:
SELECT n.*, ns.sectionName
FROM dbo.newsSection ns CROSS APPLY
(SELECT TOP 1 n.*
FROM dbo.News n
WHERE n.sectionID = ns.sectionID
ORDER BY n.newsDate DESC
) n
WHERE ns.SectionID IN (21, 23, 36, 37);
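For anyone who wants to try the "latest row per section" idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no CROSS APPLY or TOP, so this swaps in ROW_NUMBER() with PARTITION BY, which also works in SQL Server. The sample rows and the reduced column list are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsSection (sectionID INTEGER PRIMARY KEY, sectionName TEXT);
CREATE TABLE News (newsID INTEGER PRIMARY KEY, title TEXT, newsDate TEXT, sectionID INTEGER);
-- hypothetical sample data
INSERT INTO newsSection VALUES (21, 'World'), (23, 'Sport'), (36, 'Tech');
INSERT INTO News VALUES
  (1, 'w-old',  '2020-01-01', 21),
  (2, 'w-new',  '2020-03-01', 21),
  (3, 's-only', '2020-02-01', 23),
  (4, 't-old',  '2020-01-15', 36),
  (5, 't-new',  '2020-02-15', 36);
""")

# Number the articles inside each section, newest first, then keep row 1.
latest_per_section = conn.execute("""
SELECT sectionID, sectionName, title, newsDate
FROM (
    SELECT n.sectionID, ns.sectionName, n.title, n.newsDate,
           ROW_NUMBER() OVER (PARTITION BY n.sectionID
                              ORDER BY n.newsDate DESC) AS rn
    FROM News n
    JOIN newsSection ns ON ns.sectionID = n.sectionID
    WHERE ns.sectionID IN (21, 23, 36, 37)
) ranked
WHERE rn = 1
ORDER BY sectionID
""").fetchall()

for row in latest_per_section:
    print(row)
```

Each section contributes exactly one row here, and a section with no articles (37 in this sample) simply does not appear, which matches the behaviour of CROSS APPLY rather than OUTER APPLY.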

Related

Only one expression can be specified in the SELECT list where the subquery is not introduced with EXISTS

I have the following code - it's odd because I seem to only have one expression in my SELECT list at the top, but I am still getting an error stating that only one condition can be specified in the SELECT list where the query is not introduced with 'EXISTS'.
I am trying to get the recent games won from the last three games.
Thanks
DECLARE @RecentGamesWon INT
SET @RecentGamesWon = (SELECT COUNT(*)
FROM game g
JOIN inserted ON inserted.HomeTeamID = g.HomeTeamID
WHERE g.HomeTeamID IN (SELECT TOP 3 *
FROM game g
WHERE (g.hometeamid = inserted.HomeTeamID
AND g.HomeScore > g.AwayScore)
OR (g.awayteamid = inserted.HomeTeamID
AND g.AwayScore > g.HomeScore)
ORDER BY g.GameDate));

I am unable to reproduce your problem, because that would require the table structure and data. Please provide the tables and the data you are using along with your problem statement.
In the meantime, you can try correcting your existing query, i.e. use TOP 3 HomeTeamID rather than TOP 3 *.
If the problem still persists, use a CTE to hold the inner query results and then set your variable from a count over the CTE.

You can't return two (or multiple) columns in your subquery to do the comparison in the WHERE A_ID IN (subquery) clause - which column is it supposed to compare A_ID to? Your subquery must only return the one column needed for the comparison to the column on the other side of the IN. So the query needs to be of the form:
SELECT * From ThisTable WHERE ThisColumn IN (SELECT ThatColumn FROM ThatTable)
In your query, this is the portion failing -
... WHERE g.HomeTeamID IN (SELECT TOP 3 * FROM game g ...
To fix - Your query should be as follows -
SET @RecentGamesWon = (SELECT COUNT(*)
FROM game g
JOIN inserted ON inserted.HomeTeamID = g.HomeTeamID
WHERE g.HomeTeamID IN (SELECT TOP 3 g.HomeTeamID
FROM game g
WHERE (g.hometeamid = inserted.HomeTeamID
AND g.HomeScore > g.AwayScore)
OR (g.awayteamid = inserted.HomeTeamID
AND g.AwayScore > g.HomeScore)
ORDER BY g.GameDate));

I believe you want:
SELECT @RecentGamesWon = COUNT(*)
FROM inserted i CROSS APPLY
(SELECT TOP 3 g.*
FROM game g
WHERE i.HomeTeamID IN (g.hometeamid, g.awayteamid)
ORDER BY g.GameDate DESC
) g
WHERE (g.hometeamid = i.HomeTeamID AND g.HomeScore > g.AwayScore) OR
(g.awayteamid = i.HomeTeamID AND g.AwayScore > g.HomeScore);
This selects the 3 most recent games played by i.HomeTeamID and then counts the number of wins.
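The "take the last three games, then count the wins" logic can be checked on a small data set. This is only a sketch with Python's sqlite3: the trigger's inserted table is replaced by a plain parameter (:team), TOP 3 becomes LIMIT 3, and the game rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (GameDate TEXT, HomeTeamID INTEGER, AwayTeamID INTEGER,
                   HomeScore INTEGER, AwayScore INTEGER);
-- hypothetical sample data
INSERT INTO game VALUES
  ('2021-01-01', 1, 2, 3, 0),  -- team 1 wins at home, but it is the 4th most recent game
  ('2021-01-08', 2, 1, 1, 2),  -- team 1 wins away
  ('2021-01-15', 1, 3, 0, 1),  -- team 1 loses at home
  ('2021-01-22', 3, 1, 0, 4),  -- team 1 wins away
  ('2021-01-29', 2, 3, 5, 5);  -- team 1 not involved
""")

team_id = 1  # stands in for inserted.HomeTeamID
(recent_games_won,) = conn.execute("""
SELECT COUNT(*)
FROM (
    -- the 3 most recent games the team played, home or away
    SELECT * FROM game
    WHERE :team IN (HomeTeamID, AwayTeamID)
    ORDER BY GameDate DESC
    LIMIT 3
) g
WHERE (g.HomeTeamID = :team AND g.HomeScore > g.AwayScore)
   OR (g.AwayTeamID = :team AND g.AwayScore > g.HomeScore)
""", {"team": team_id}).fetchone()

print(recent_games_won)
```

The important design point is the order of operations: limit to three games first, filter for wins second. Filtering for wins inside the inner query would instead return the last three wins.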

Compare fields from different rows

First off I am using SQL Server.
I am joining a table on itself like in the example below:
SELECT t.theDate,
s.theDate,
t.bitField,
s.bitField,
t.NAME,
s.NAME
FROM table1 t
INNER JOIN table1 s ON t.NAME = s.NAME
If I take a random row (i.e. X) from the dataset produced.
Can I compare values in any field on row X to values in any field on row X-1 OR row X+1?
Example: I want to compare t.theDate on row 5 to s.theDate on row 4 or s.theDate on row 3.
Sample data looks like:
Desired results:
I want to pull all pairs of rows where the t.bitfield and s.bitfield are opposite and t.theDate and s.theDate are opposite.
From the image, that would be rows (3 & 4), (5 & 6), (7 & 8), etc.
I really appreciate any help!
Can it be done?

Variant 1: It looks like you would like to use a ranking function.
if object_id('tempdb..#TmpOrderedTable') is not null drop table #TmpOrderedTable
select *, row_number() over (order by columnlist, (select 0)) rn
into #TmpOrderedTable
from table1 t
select *
from #TmpOrderedTable t0
inner join #TmpOrderedTable tplus on tplus.rn = t0.rn + 1 -- next one
inner join #TmpOrderedTable tminus on tminus.rn = t0.rn - 1 -- previous one
Variant 2:
To get scalar values you can use the window functions LAG and LEAD, or a subquery.
Variant 3:
You can use a self-join, but you have to specify a unique, non-arbitrary key if you don't want duplicates.
Variant 4:
You can use APPLY.
Your question isn't too clear, so I hope this was your goal.
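As an illustration of the LAG/LEAD variant, here is a small sketch run through Python's sqlite3 (SQLite 3.25 or later is assumed for window function support). The three sample rows are hypothetical; the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (NAME TEXT, theDate TEXT, bitField INTEGER);
-- hypothetical sample data
INSERT INTO table1 VALUES
  ('a', '2020-01-01', 0),
  ('a', '2020-01-02', 1),
  ('a', '2020-01-03', 1);
""")

# LAG reads the previous row and LEAD the next row within each NAME,
# in theDate order; at the edges they return NULL (None in Python).
rows = conn.execute("""
SELECT theDate, bitField,
       LAG(bitField)  OVER (PARTITION BY NAME ORDER BY theDate) AS prevBit,
       LEAD(bitField) OVER (PARTITION BY NAME ORDER BY theDate) AS nextBit
FROM table1
ORDER BY theDate
""").fetchall()

for row in rows:
    print(row)
```

With the neighbouring values on the same row, a condition such as "bitField differs from the previous row" becomes an ordinary comparison in an outer query.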

How about this?
WITH ts as (
SELECT t.theDate as theDate1, s.theDate as theDate2,
t.bitField as bitField1, s.bitField as bitField2,
t.NAME -- there is only one name
FROM table1 t INNER JOIN
table1 s
ON t.NAME = s.NAME
)
SELECT ts.*
FROM ts
WHERE EXISTS (SELECT 1
FROM ts ts2
WHERE ts2.name = ts.name AND
ts2.theDate1 = ts.theDate2 AND
ts2.theDate2 = ts.theDate1 AND
ts2.bitField1 = ts.bitField2 AND
ts2.bitField2 = ts.bitField1
);

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.

SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.

Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
This selects the latest row per "EsnId" from "EsnsSalesOrderItems" based on the column "createdAt". You can use any column that allows you to define an order on the rows that suits you.
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an ORDER BY.

Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE

Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...
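To make the abstract example concrete, here is a sketch using Python's sqlite3 with invented table contents. One assumption is added on top of the answer: an ORDER BY inside the subquery (on a hypothetical createdAt column), because LIMIT 1 without an ordering picks an arbitrary matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, createdAt TEXT);
-- hypothetical sample data: table1 row 1 has two candidate matches
INSERT INTO table1 VALUES (1, 'first'), (2, 'second');
INSERT INTO table2 VALUES
  (10, 1, '2012-06-19'),
  (11, 1, '2012-07-19'),
  (12, 2, '2012-05-01');
""")

# The correlated subquery picks exactly one table2.id per table1 row,
# so the join can never multiply the table1 rows.
rows = conn.execute("""
SELECT t1.id, t1.label, t2.id, t2.createdAt
FROM table1 t1
JOIN table2 t2 ON t2.id = (
    SELECT x.id
    FROM table2 x
    WHERE x.table1_id = t1.id
    ORDER BY x.createdAt DESC
    LIMIT 1
)
ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

This relies on table2 having a unique id column to join back on. Without one, the ROW_NUMBER or LATERAL approaches from the other answers are the safer choice.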

JOIN on varchar column with subquery

I know that this is not the recommended way to join tables. But it's only relevant for one rarely used report for one person and I don't want to change my data model for it.
I have two tables Model and SparePart that are not directly linked with each other via foreign keys.
Model        SparePart
idModel      idSparePart
ModelName    SparePartDescription
             Price
In special cases a model is also a spare part (exchange unit). Then I need the price for this model from the SparePart table via its SparePartDescription column.
For example:
ModelName = C510
SparePartDescription = C510/Exchange Unit/Exch unit/Red
So I try to join both tables to get the price with the following SQL:
SELECT m.idModel, m.ModelName, sp.Price, sp.SparePartDescription
FROM modModel AS m INNER JOIN
tabSparePart AS sp ON m.ModelName =
(SELECT TOP 1 LEFT(sp.SparePartDescription, CHARINDEX('/', sp.SparePartDescription) - 1)order by price desc)
WHERE (CHARINDEX('/', sp.SparePartDescription) > 0)
AND (sp.fiSparePartCategory = 6)
ORDER BY m.ModelName, sp.SparePartDescription
But I get multiple records for one model:
idModel  ModelName  Price  SparePartDescription
569      C510       70,75  C510/Exchange Unit/Exch unit/Red
569      C510       70,75  C510/Exchange Unit/Latin/Generic/Black
569      C510       70,75  C510/Exchange Unit/Latin/Generic/Silver
433      C702       80,72  C702/Exchange Unit/Latin/Generic/Black
433      C702       NULL   C702/Exchange Unit/Latin/Generic/Cyan
433      C702       80,72  C702/Exchange Unit/Orange Global/Black
I only want to select one record if there are multiple spareparts with matching SparePartDescription.

SQL Server 2005 and later introduced the APPLY operator, which allows you to join against a subquery. Try this.
SELECT m.idModel, m.ModelName, sp.Price, sp.SparePartDescription
FROM modModel AS m
CROSS APPLY
(
SELECT TOP 1 * FROM tabSparePart
WHERE m.ModelName =
LEFT(SparePartDescription, LEN(ModelName))
ORDER BY Price DESC
) sp
WHERE (sp.fiSparePartCategory = 6)
ORDER BY m.ModelName, sp.SparePartDescription
It inner joins the 'modModel' table with the subquery 'only the top one matching tabSparePart'.
You can also use OUTER APPLY which will emulate a LEFT JOIN on the subquery. Documentation is here.

Try the ROW_NUMBER function. It guarantees that you'll only get one of each item as defined in the PARTITION BY clause.
SELECT a.idModel, a.ModelName, Price, SparePartDescription
FROM modModel a
LEFT JOIN
(
SELECT m.idModel, m.ModelName, sp.Price, sp.SparePartDescription
, ROW_NUMBER() OVER (PARTITION BY m.idModel, m.ModelName ORDER BY sp.price DESC) AS r
FROM modModel AS m
INNER JOIN tabSparePart AS sp
ON m.ModelName = LEFT(sp.SparePartDescription, CHARINDEX('/', sp.SparePartDescription) - 1)
WHERE (CHARINDEX('/', sp.SparePartDescription) > 0)
AND (sp.fiSparePartCategory = 6)
) b
ON a.idModel = b.idModel
AND b.r = 1
ORDER BY ModelName, SparePartDescription

First, your join condition can be simplified some, and then you can use ROW_NUMBER() to specify some kind of order to your results, allowing the first result (per model) to be selected. I also changed it to a LEFT JOIN in case there was no match. If that is not required, it's simple to change back to an INNER JOIN :)
WITH
ranked_results AS
(
SELECT
m.idModel, m.ModelName, sp.Price, sp.SparePartDescription,
ROW_NUMBER() OVER (PARTITION BY m.idModel ORDER BY sp.Price DESC) AS rank
FROM
modModel AS m
LEFT JOIN
tabSparePart AS sp
ON LEFT(sp.SparePartDescription, LEN(m.ModelName)) = m.ModelName
AND (CHARINDEX('/', sp.SparePartDescription) > 0)
AND (sp.fiSparePartCategory = 6)
)
SELECT
*
FROM
ranked_results
WHERE
rank = 1
ORDER BY
ModelName,
SparePartDescription
@MattMurrell's answer just appeared while I was typing this. One difference here is that the selection criteria are applied to the whole set, rather than separately in the CROSS APPLY. This may have a performance benefit; you'd have to try and see. CROSS APPLY with inline functions is normally more performant than correlated sub-queries, so I can't predict which is faster.

MS-Access -> SELECT AS + ORDER BY = error

I'm trying to make a query to retrieve the region which got the most sales for sweet products. 'grupo_produto' is the product type, and 'regiao' is the region. So I got this query:
SELECT TOP 1 r.nm_regiao, (SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND
cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao ORDER BY total DESC
Then when I run the query, MS-Access asks for the "total" parameter. Why doesn't it consider the newly created 'column' I made in the select clause?
Thanks in advance!

Old question, I know, but it may help someone to know that while you can't order by aliases, you can order by column index. For example, this will work without error:
SELECT
firstColumn,
IIF(secondColumn = '', thirdColumn, secondColumn) As yourAlias
FROM
yourTable
ORDER BY
2 ASC
The results would then be ordered by the values found in the second column, which is the alias "yourAlias".

Aliases are only usable in the query output. You can't use them in other parts of the query. Unfortunately, you'll have to copy and paste the entire subquery to make it work.

You can do it like this
select * from(
select a + b as c, * from table)
order by c
Access has some differences compared to Sql Server.

Why doesn't it consider the newly created 'column' I made in the select clause?
Because Access (ACE/Jet) is not compliant with the SQL-92 Standard.
Consider this example, which is valid SQL-92:
SELECT a AS x, c - b AS y
FROM MyTable
ORDER
BY x, y;
In fact, x and y are the only valid elements in the ORDER BY clause because all others are out of scope (ordinal numbers of columns in the SELECT clause are valid, though their use is deprecated).
However, Access chokes on the above syntax. The equivalent Access syntax is this:
SELECT a AS x, c - b AS y
FROM MyTable
ORDER
BY a, c - b;
However, I understand from @Remou's comments that a subquery in the ORDER BY clause is invalid in Access.

Try using a subquery and order the results in an outer query.
SELECT TOP 1 * FROM
(
SELECT
r.nm_regiao,
(SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
) T1
ORDER BY total DESC
(Not tested.)

How about:
SELECT TOP 1 r.nm_regiao
FROM (SELECT Dw_Empresa.cod_regiao,
Count(Dw_Empresa.cod_regiao) AS CountOfcod_regiao
FROM Dw_Empresa
WHERE Dw_Empresa.[grupo_produto]='1'
GROUP BY Dw_Empresa.cod_regiao
ORDER BY Count(Dw_Empresa.cod_regiao) DESC) d
INNER JOIN tb_regiao AS r
ON d.cod_regiao = r.cod_regiao

I suggest using an intermediate query.
SELECT r.nm_regiao, d.grupo_produto, COUNT(*) AS total
FROM Dw_Empresa d INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
GROUP BY r.nm_regiao, d.grupo_produto;
If you call that GroupTotalsByRegion, you can then do:
SELECT TOP 1 nm_regiao, total FROM GroupTotalsByRegion
WHERE grupo_produto = '1' ORDER BY total DESC
You may think it's extra work to create the intermediate query (and, in a sense, it is), but you will also find that many of your other queries will be based off of GroupTotalsByRegion. You want to avoid repeating that logic in many other queries. By keeping it in one view, you provide a simplified route to answering many other questions.
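A rough way to see the intermediate-query idea in action is to model the saved Access query as a view. The sketch below uses Python's sqlite3, so TOP 1 becomes LIMIT 1, and the region names and rows are invented. Note that SQLite, unlike Access, would also accept the alias directly in ORDER BY, so this shows the shape of the solution rather than reproducing the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_regiao (cod_regiao INTEGER PRIMARY KEY, nm_regiao TEXT);
CREATE TABLE Dw_Empresa (cod_regiao INTEGER, grupo_produto TEXT);
-- hypothetical sample data
INSERT INTO tb_regiao VALUES (1, 'Norte'), (2, 'Sul');
INSERT INTO Dw_Empresa VALUES
  (1, '1'), (2, '1'), (2, '1'),
  (2, '2'), (1, '2'), (1, '2');

-- the intermediate query, stored once and reused
CREATE VIEW GroupTotalsByRegion AS
SELECT r.nm_regiao, d.grupo_produto, COUNT(*) AS total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
GROUP BY r.nm_regiao, d.grupo_produto;
""")

# In the outer query, total is a real column of the view, not an alias.
top_region = conn.execute("""
SELECT nm_regiao, total
FROM GroupTotalsByRegion
WHERE grupo_produto = '1'
ORDER BY total DESC
LIMIT 1
""").fetchone()

print(top_region)
```

Because the outer query sees total as an ordinary column, the ORDER BY no longer depends on how the engine scopes SELECT-list aliases.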

How about using:
WITH xx AS
(
SELECT TOP 1 r.nm_regiao, (SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND
cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
) SELECT * FROM xx ORDER BY total
