Join to only the "latest" record with t-sql - sql

I've got two tables. Table "B" has a one to many relationship with Table "A", which means that there will be many records in table "B" for one record in table "A".
The records in table "B" are mainly differentiated by a date, I need to produce a resultset that includes the record in table "A" joined with only the latest record in table "B". For illustration purpose, here's a sample schema:
Table A
-------
ID
Table B
-------
ID
TableAID
RowDate
I'm having trouble formulating the query to give me the resultset I'm looking for any help would be greatly appreciated.

SELECT *
FROM tableA A
OUTER APPLY (SELECT TOP 1 *
FROM tableB B
WHERE A.ID = B.TableAID
ORDER BY B.RowDate DESC) as B

select a.*, bm.MaxRowDate
from (
select TableAID, max(RowDate) as MaxRowDate
from TableB
group by TableAID
) bm
inner join TableA a on bm.TableAID = a.ID
If you need more columns from TableB, do this:
select a.*, b.* --use explicit columns rather than * here
from (
select TableAID, max(RowDate) as MaxRowDate
from TableB
group by TableAID
) bm
inner join TableB b on bm.TableAID = b.TableAID
and bm.MaxRowDate = b.RowDate
inner join TableA a on bm.TableAID = a.ID

table B join is optional: it depends if there are other columns you want
SELECT
*
FROM
tableA A
JOIN
tableB B ON A.ID = B.TableAID
JOIN
(
SELECT Max(RowDate) AS MaxRowDate, TableAID
FROM tableB
GROUP BY TableAID
) foo ON B.TableAID = foo.TableAID AND B.RowDate= foo.MaxRowDate

With ABDateMap AS (
SELECT Max(RowDate) AS LastDate, TableAID FROM TableB GROUP BY TableAID
),
LatestBRow As (
SELECT MAX(ID) AS ID, TableAID FROM ABDateMap INNER JOIN TableB ON b.TableAID=a.ID AND b.RowDate = LastDate GROUP BY TableAID
)
SELECT columns
FROM TableA a
INNER JOIN LatestBRow m ON m.TableAID=a.ID
INNER JOIN TableB b on b.ID = m.ID

Just for the clarity's sake and to benefit those who will stumble upon this ancient question. The accepted answer would return duplicate rows if there are duplicate RowDate in Table B. A safer and more efficient way would be to utilize ROW_NUMBER():
Select a.*, b.* -- Use explicit column list rather than * here
From [Table A] a
Inner Join ( -- Use Left Join if the records missing from Table B are still required
Select *,
ROW_NUMBER() OVER (PARTITION BY TableAID ORDER BY RowDate DESC) As _RowNum
From [Table B]
) b
On b.TableAID = a.ID
Where b._RowNum = 1

Try using this:
BEGIN
DECLARE #TB1 AS TABLE (ID INT, NAME VARCHAR(30) )
DECLARE #TB2 AS TABLE (ID INT, ID_TB1 INT, PRICE DECIMAL(18,2))
INSERT INTO #TB1 (ID, NAME) VALUES (1, 'PRODUCT X')
INSERT INTO #TB1 (ID, NAME) VALUES (2, 'PRODUCT Y')
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (1, 1, 3.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (2, 1, 4.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (3, 1, 5.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (1, 2, 0.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (2, 2, 1.99)
INSERT INTO #TB2 (ID, ID_TB1, PRICE) VALUES (3, 2, 2.99)
SELECT A.ID, A.NAME, B.PRICE
FROM #TB1 A
INNER JOIN #TB2 B ON A.ID = B.ID_TB1 AND B.ID = (SELECT MAX(ID) FROM #TB2 WHERE ID_TB1 = A.ID)
END

This will fetch the latest record with JOIN. I think this will help someone
SELECT cmp.*, lr_entry.lr_no FROM
(SELECT * FROM lr_entry ORDER BY id DESC LIMIT 1)
lr_entry JOIN companies as cmp ON cmp.id = lr_entry.company_id

Related

Strategies to reduce redundant data on join

My production query has 10 joins with a select which has large text columns.
The problem is that some searches returns more that 8mb of data.. 7mb from duplicate data because of join. something like the image above but much bigger.
My example of tables structure:
create table A (
Id int primary key identity(1,1),
Value varchar(100)
)
create table B(
Id int primary key identity(1,1),
AId int,
Value varchar(100)
)
create table C(
Id int primary key identity(1,1),
BId int,
Value varchar(100)
)
Inserts:
insert into A values ('value A1')
insert into A values ('value A2')
insert into A values ('value A3')
insert into B values(1, 'value B1')
insert into B values(1, 'value B2')
insert into B values(1, 'value B3')
insert into C values(1, 'value C1')
insert into C values(1, 'value C2')
insert into C values(1, 'value C3')
How can I return only necessary data in a smart way with the same performance of join or better?
Command used to execute the query was:
select * from A
left join B on A.Id = B.AId
left join C on C.BId = B.Id
where A.Id = 1
You can use row_number() to return only the first appearance of the values:
select A.Id,
(case when row_number() over (partition by a.id order by b.id, c.id) = 1
then a.value
end) as a_value,
B.id,
(case when row_number() over (partition by a.id, b.id order by c.id) = 1
then b.value
end) as b_value,
C.id,
C.value
from A left join
B
on A.Id = B.AId left join
C
on C.BId = B.Id
where A.Id = 1
order by a.id, b.id, c.id;

How to use a bunch of clause in a having statement. Oracle

assume i have a query like this:
SELECT table1.id
FROM (
SELECT id, sum(column) as A
FROM table1
GROUP BY id
) a1
Left join (
SELECT id,
sum(column) as B
FROM table 2
GROUP BY Id
) a2
on table1.id=table2.id
.
.
.
.
Left join (
SELECT id, sum(column) as G
FROM table 7
GROUP BY id
) g1
on table1.id=table7.id
Having or where A+B - (C+D+E+F+G) >0
I tried both, none works.
Having return error on there's no group by in the first select and where doesn't return any rows.
First your question have some issues.
I'm going to guess you mean put alias a, b, c, d ....
instead of a1, a2, g1.
Also your left join should be something like a.id = b.id at the moment you create a subquery you have to use the alias instead of tablename.
If you fix that you should add a WHERE, I also guess you mean use the SUM() result
WHERE a.A + b.B - (c.C+ d.D+ e.E+ f.F+ g.G) > 0
.
SELECT a.id
FROM (
SELECT id, sum(column) as sumA
FROM table1
GROUP BY id
) a
Left join
(
SELECT id, sum(column) as sumB
FROM table 2
GROUP BY Id
) b
on a.id = b.id
.
.
.
.
Left join
(
SELECT id, sum(column) as sumG
FROM table 2
GROUP BY id
) g
on f.id = g.id
WHERE a.sumA + b.sumB - (c.sumC + d.sumD + e.sumE + f.sumF + g.sumG) >0
Juan has the right answer. I am just adding a SQLFiddle to help strengthen his answer. Please look at a smaller instance of the same solution here: http://sqlfiddle.com/#!4/81c275/1
Tables
create table table1(id int, col int);
insert into table1 values (1, 10);
insert into table1 values (2, 20);
insert into table1 values (2, 30);
create table table2(id int, col int);
insert into table2 values (1, 5);
insert into table2 values (2, 3);
insert into table2 values (2, 2);
create table table3(id int, col int);
insert into table3 values (1, 100);
insert into table3 values (2, 20);
insert into table3 values (2, 3);
SQL
select a1.id
from (select id, sum(col) as A from table1 group by id) a1
left join (select id, sum(col) as B from table2 group by id) a2
on a1.id = a2.id
left join (select id, sum(col) as C from table3 group by id) a3
on a1.id = a3.id
where A + B - (C) > 0
You can add more tables in the SQLFiddle with whatever values you please, and change the SQL accordingly by appending D, E, F, G etc after C in (C).
The above example will result in output of 2 since ID 2's A+B = 55 and C = 23. A+B-C > 0 for this record and therefore the output will be 2.
I believe that you need to take out the 'where' and move it up if you still need it.
So that it would look something like this,
select table1.id from(
...
...
...)
Having ((A+B)-(C+D+R+F+G)>0)
According to this site:
http://www.w3schools.com/sql/sql_having.asp

Parent Child Query without using SubQuery

Let say I have two tables,
Table A
ID Name
-- ----
1 A
2 B
Table B
AID Date
-- ----
1 1/1/2000
1 1/2/2000
2 1/1/2005
2 1/2/2005
Now I need this result without using sub query,
ID Name Date
-- ---- ----
1 A 1/2/2000
2 B 1/2/2005
I know how to do this using sub query but I want to avoid using sub query for some reason?
If I got your meaning right and you need the latest date from TableB, then the query below should do it:
select a.id,a.name,max(b.date)
from TableA a
join TableB b on b.aid = a.id
group by a.id,a.name
SELECT a.ID, a.Name, MAX(B.Date)
FROM TableA A
INNER JOIN TableB B
ON B.ID = A.ID
GROUP BY A.id, A.name
It's a simple aggregation. Looks like you want the highest date per id/name combo.
create table #t1 (id int, Name varchar(10))
create table #t2 (Aid int, Dt date)
insert #t1 values (1, 'A'), (2, 'B')
insert #t2 values (1, '1/1/2000'), (1, '1/2/2000'), (2, '1/1/2005'), (2, '1/2/2005')
;WITH cte (AId, MDt)
as
(
select Aid, MAX(Dt) from #t2 group by AiD
)
select #t1.Id, #t1.Name, cte.MDt
from #t1
join cte
on cte.AId = #t1.Id

SQL to resequence items by groups

Lets say I have a database that looks like this:
tblA:
ID, Name, Sequence, tblBID
1 a 5 14
2 b 3 15
3 c 3 16
4 d 3 17
tblB:
ID, Group
14 1
15 1
16 2
17 3
I would like to sequence A so that the sequences go 1...n for each group of B.
So in this case, the sequences going down should be 1,2,1,1.
The ordering needs to be consistent with the current ordering, but there are no guarantees as to the current ordering.
I am not exactly a sql master and I am sure there is a fairly easy way to do this, but I really don't know the right route to take. Any hints?
If you are using SQL Server 2005+ or higher, you can use a ranking function:
Select tblA.Id, tblA.Name
, Row_Number() Over ( Partition By tblB.[Group] Order By tblA.Id ) As Sequence
, tblA.tblBID
From tblA
Join tblB
On tblB.tblBID = tblB.ID
Row_Number ranking function.
Here's another solution that would work in SQL Server 2000 and prior.
Select A.Id, A.Name
, (Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID) + 1 As Sequence
, A.tblBID
From tblA As A
Join tblB As B
On B.Id = A.tblBID
EDIT
Also want to make it clear that I want to actually update tblA to reflect the proper sequences.
In SQL Server, you can use their proprietary From clause in an Update statement like so:
Update tblA
Set Sequence = (
Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID
) + 1
From tblA As A
Join tblB As B
On B.Id = A.tblBID
The Hoyle ANSI solution might be something like:
Update tblA
Set Sequence = (
Select (Select Count(*)
From tblB As B1
Where B1.[Group] = B.[Group]
And B1.Id < B.ID) + 1
From tblA As A
Join tblB As B
On B.Id = A.tblBID
Where A.Id = tblA.Id
)
EDIT
Can we do that [the inner group] comparison based on A.Sequence instead of B.ID?
Select A1.*
, (Select Count(*)
From tblB As B2
Join tblA As A2
On A2.tblBID = B2.Id
Where B2.[Group] = B1.[Group]
And A2.Sequence < A1.Sequence) + 1
From tblA As A1
Join tblB As B1
On B1.Id = A1.tblBID
Because it's SQL 2000, we can't use a windowing function. That's okay.
Thomas's queries are good and will work. However, they will get worse and worse as the number of rows increases—with different characteristics depending on how wide (the number of groups) and how deep (the number of items per group). This is because those queries use a partial cross-join, perhaps we could call it a "pyramidal cross-join" where the crossing part is limited to right side values less than left side values rather than left crossing to all right values.
What to do?
I think you will be surprised to find that the following long and painful-looking script will outperform the pyramidal join at a certain size of data (which may not be all that big) and eventually, with really large data sets must be considered a screaming performer:
CREATE TABLE #tblA (
ID int identity(1,1) NOT NULL,
Name varchar(1) NOT NULL,
Sequence int NOT NULL,
tblBID int NOT NULL,
PRIMARY KEY CLUSTERED (ID)
)
INSERT #tblA VALUES ('a', 5, 14)
INSERT #tblA VALUES ('b', 3, 15)
INSERT #tblA VALUES ('c', 3, 16)
INSERT #tblA VALUES ('d', 3, 17)
CREATE TABLE #tblB (
ID int NOT NULL PRIMARY KEY CLUSTERED,
GroupID int NOT NULL
)
INSERT #tblB VALUES (14, 1)
INSERT #tblB VALUES (15, 1)
INSERT #tblB VALUES (16, 2)
INSERT #tblB VALUES (17, 3)
CREATE TABLE #seq (
seq int identity(1,1) NOT NULL,
ID int NOT NULL,
GroupID int NOT NULL,
PRIMARY KEY CLUSTERED (ID)
)
INSERT #seq
SELECT
A.ID,
B.GroupID
FROM
#tblA A
INNER JOIN #tblB B ON A.tblBID = b.ID
ORDER BY B.GroupID, A.Sequence
UPDATE A
SET A.Sequence = S.seq - X.MinSeq + 1
FROM
#tblA A
INNER JOIN #seq S ON A.ID = S.ID
INNER JOIN (
SELECT GroupID, MinSeq = Min(seq)
FROM #seq
GROUP BY GroupID
) X ON S.GroupID = X.GroupID
SELECT * FROM #tblA
DROP TABLE #seq
DROP TABLE #tblB
DROP TABLE #tblA
If I understood you correctly, then ORDER BY B.GroupID, A.Sequence is correct. If not, you can switch A.Sequence to B.ID.
Also, my index on the temp table should be experimented with. For a certain quantity of rows, and also the width and depth characteristics of those rows, clustering on one of the other two columns in the #seq table could be helpful.
Last, there is a possible different data organization possible: leaving GroupID out of the #seq table and joining again. I suspect it would be worse, but am not 100% sure.
Something like:
SELECT a.id, a.name, row_number() over (partition by b.group order by a.id)
FROM tblA a
JOIN tblB on a.tblBID = b.ID;

sql - getting the id from a row based on a group by

Table A
tableAID
tableBID
grade
Table B
tableBID
name
description
Table A links to Table b from the tableBID found in both tables.
If I want to find the row in Table A, which has the highest grade, for each row in Table B, I would write my query like this:
select max(grade) from TableA group by tableBID
However, I don't just want the grade, I want the grade plus id of that row.
You could try something like
SELECT a.*
FROM TableA a INNER JOIN
(
SELECT tableBID,
MAX(grade) MaxGrade
FROM TableA
GROUP BY tableBID
) B ON a.tableBID = B.tableBID AND a.grade = B.MaxGrade
Using the Sql Server 2005 ROW_NUMBER function you could also try
SELECT tableAID,
tableBID,
grade
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY tableBID ORDER BY grade DESC) RowNum
FROM #TableA
) A
WHERE a.RowNum = 1
Assuming you will only have one Max() (HUGE ASSUMPTION) you can do min(tableaid) where grade = (select max(grade) etc....)
or if you want all matching ids
SELECT TableAID, Grade
FROM TableA INNER JOIN (SELECT tableBID,
MAX(grade) Grade
FROM TableA
GROUP BY tableBID) MaxGrade
ON TableA.Grade = MaxGrade.Grade
Just use a little self-referencing NOT EXISTS clause:
DECLARE #tableA TABLE (tableAID int IDENTITY(1,1), tableBID int, grade int)
INSERT INTO #tableA(tableBID, grade) VALUES(10, 1)
INSERT INTO #tableA(tableBID, grade) VALUES(10, 3)
INSERT INTO #tableA(tableBID, grade) VALUES(20, 4)
INSERT INTO #tableA(tableBID, grade) VALUES(20, 8)
INSERT INTO #tableA(tableBID, grade) VALUES(30, 10)
INSERT INTO #tableA(tableBID, grade) VALUES(30, 6)
SELECT tableAID, grade
FROM #tableA ta
WHERE NOT EXISTS (SELECT 1 FROM #tableA WHERE tableBID = ta.tableBID AND grade > ta.grade)