Last Record of a Join Table (how to optimize)


I have the same "problem" as described in (Last record of Join table): I need to join a "Master Table" with a "History Table", where I only want to join the latest (by date) record of the history table. So whenever I query a record from the master table, I also get the "latest" data from the history table.
Master Table
ID
FIRSTNAME
LASTNAME
...
History Table
ID
LASTACTION
DATE
This is possible by joining both tables and using a subselect to retrieve the latest history table record as described in the answer given in the link above.
My questions are:
How can I solve the problem that there might, in theory, be two history records with the same date?
Is this kind of join with a subselect really the best solution in terms of performance (and in general)? What do you think (I am NO expert in all this stuff) if I add a further attribute to the history table named "ISLATESTRECORD", a boolean flag that I manage manually (and that has a unique constraint)? This attribute would explicitly mark the latest record, and I would not need any subselects, as I could use this attribute directly in the WHERE clause of the join.
On the other hand, this of course makes inserting a new record a little more complicated: I first have to remove the "ISLATESTRECORD" flag from the current latest record, then insert the new history record with "ISLATESTRECORD" set, and commit the transaction.
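A minimal sketch of the ISLATESTRECORD idea, assuming SQL Server 2008 or later. The temp tables #MasterDemo and #HistoryDemo, their columns and the index name are hypothetical. One detail matters: a plain unique constraint on (MasterID, flag) would also allow only one non-latest row per master, so the sketch uses a filtered unique index that covers only the flagged rows.

```sql
-- Hypothetical demo tables; SQL Server 2008+ is needed for filtered indexes.
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;  -- required for DML on tables with filtered indexes

CREATE TABLE #MasterDemo (
    MasterID  int PRIMARY KEY,
    FirstName varchar(20) NOT NULL
);

CREATE TABLE #HistoryDemo (
    HistoryID      int IDENTITY(1,1) PRIMARY KEY,
    MasterID       int         NOT NULL,
    LastAction     varchar(20) NOT NULL,
    HistoryDate    datetime    NOT NULL,
    IsLatestRecord bit         NOT NULL DEFAULT 0
);

-- Uniqueness applies only to flagged rows: at most one "latest" row per master,
-- any number of older rows.
CREATE UNIQUE INDEX UX_HistoryDemo_Latest
    ON #HistoryDemo (MasterID)
    WHERE IsLatestRecord = 1;

INSERT INTO #MasterDemo VALUES (1, 'XYZ');
INSERT INTO #HistoryDemo (MasterID, LastAction, HistoryDate, IsLatestRecord)
VALUES (1, 'created', '20090101', 1);

-- Adding a history row: clear the old flag and insert the new row atomically.
BEGIN TRANSACTION;
    UPDATE #HistoryDemo
    SET IsLatestRecord = 0
    WHERE MasterID = 1 AND IsLatestRecord = 1;

    INSERT INTO #HistoryDemo (MasterID, LastAction, HistoryDate, IsLatestRecord)
    VALUES (1, 'changed_name', '20090202', 1);
COMMIT TRANSACTION;

-- The search from the question needs no subquery: the flag goes into the join.
SELECT m.MasterID, m.FirstName, h.LastAction, h.HistoryDate
FROM #MasterDemo AS m
INNER JOIN #HistoryDemo AS h
    ON h.MasterID = m.MasterID
   AND h.IsLatestRecord = 1
WHERE m.FirstName = 'XYZ'
  AND h.LastAction = 'changed_name';
```

If two sessions add history for the same master at the same time, the filtered index can turn that race into a constraint error instead of two flagged rows; the application still has to retry or serialize such writes.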
What do you think is the recommended solution? I have no idea about the performance impact of the subselects: I might have millions of master table records that I have to search for a specific record, also using attributes of the joined history table as search criteria, like: "Give me the master table record with FIRSTNAME XYZ whose LASTACTION (in the history table) was "changed_name"". So this subselect might be called millions of times.
Or is it better to work with a subselect to find the latest record, as subselects are very efficient and it's better to keep everything normalized?
Thank you very much

I solve your problem two ways: with a query on your existing tables, and with a query on your tables after adding an auto-incrementing identity column to the history table. The identity column gets around the problem of duplicate dates and makes the query easier.
To solve the problem with your tables (with SQL Server example code):
DECLARE @MasterTable table (MasterID int,FirstName varchar(20),LastName varchar(20))
DECLARE @HistoryTable table (MasterID int,LastAction char(1),HistoryDate datetime)
INSERT INTO @MasterTable VALUES (1,'AAA','aaa')
INSERT INTO @MasterTable VALUES (2,'BBB','bbb')
INSERT INTO @MasterTable VALUES (3,'CCC','ccc')
INSERT INTO @HistoryTable VALUES (1,'I','1/1/2009')
INSERT INTO @HistoryTable VALUES (1,'U','2/2/2009')
INSERT INTO @HistoryTable VALUES (1,'U','3/3/2009') --<<dups
INSERT INTO @HistoryTable VALUES (1,'U','3/3/2009') --<<dups
INSERT INTO @HistoryTable VALUES (2,'I','5/5/2009')
INSERT INTO @HistoryTable VALUES (3,'I','7/7/2009')
INSERT INTO @HistoryTable VALUES (3,'U','8/8/2009')
SELECT
MasterID,FirstName,LastName,LastAction,HistoryDate
FROM (SELECT
--every row here already has the max date, so RankValue=1 keeps one of any tied rows (which one is arbitrary)
m.MasterID,m.FirstName,m.LastName,h.LastAction,h.HistoryDate,ROW_NUMBER() OVER(PARTITION BY m.MasterID ORDER BY m.MasterID) AS RankValue
FROM @MasterTable m
INNER JOIN (SELECT
MasterID,MAX(HistoryDate) AS MaxDate
FROM @HistoryTable
GROUP BY MasterID
) dt ON m.MasterID=dt.MasterID
INNER JOIN @HistoryTable h ON dt.MasterID=h.MasterID AND dt.MaxDate=h.HistoryDate
) AllRows
WHERE RankValue=1
OUTPUT:
MasterID FirstName LastName LastAction HistoryDate
----------- --------- -------- ---------- -----------
1 AAA aaa U 2009-03-03
2 BBB bbb I 2009-05-05
3 CCC ccc U 2009-08-08
(3 row(s) affected)
To solve the problem with a better HistoryTable (with SQL Server example code):
It is better because it has an auto-incrementing HistoryID identity column.
DECLARE @MasterTable table (MasterID int,FirstName varchar(20),LastName varchar(20))
DECLARE @HistoryTableNEW table (HistoryID int identity(1,1), MasterID int,LastAction char(1),HistoryDate datetime)
INSERT INTO @MasterTable VALUES (1,'AAA','aaa')
INSERT INTO @MasterTable VALUES (2,'BBB','bbb')
INSERT INTO @MasterTable VALUES (3,'CCC','ccc')
INSERT INTO @HistoryTableNEW VALUES (1,'I','1/1/2009')
INSERT INTO @HistoryTableNEW VALUES (1,'U','2/2/2009')
INSERT INTO @HistoryTableNEW VALUES (1,'U','3/3/2009') --<<dups
INSERT INTO @HistoryTableNEW VALUES (1,'U','3/3/2009') --<<dups
INSERT INTO @HistoryTableNEW VALUES (2,'I','5/5/2009')
INSERT INTO @HistoryTableNEW VALUES (3,'I','7/7/2009')
INSERT INTO @HistoryTableNEW VALUES (3,'U','8/8/2009')
SELECT
m.MasterID,m.FirstName,m.LastName,h.LastAction,h.HistoryDate,h.HistoryID
FROM @MasterTable m
INNER JOIN (SELECT
MasterID,MAX(HistoryID) AS MaxHistoryID
FROM @HistoryTableNEW
GROUP BY MasterID
) dt ON m.MasterID=dt.MasterID
INNER JOIN @HistoryTableNEW h ON dt.MasterID=h.MasterID AND dt.MaxHistoryID=h.HistoryID
OUTPUT:
MasterID FirstName LastName LastAction HistoryDate HistoryID
----------- --------- -------- ---------- ----------------------- ---------
1 AAA aaa U 2009-03-03 00:00:00.000 4
2 BBB bbb I 2009-05-05 00:00:00.000 5
3 CCC ccc U 2009-08-08 00:00:00.000 7
(3 row(s) affected)
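Both queries above reference the history table twice (once for the MAX and once for the join). A hedged single-pass alternative, assuming SQL Server 2005 or later and the same layout as HistoryTableNEW in the second script: number each master's history rows newest first and keep row 1. Ordering by HistoryDate DESC and then HistoryID DESC makes the tie-break for duplicate dates explicit instead of arbitrary. The #Latest temp table is a hypothetical name used only to hold the result.

```sql
DECLARE @MasterTable table (MasterID int, FirstName varchar(20), LastName varchar(20));
DECLARE @HistoryTableNEW table (HistoryID int identity(1,1), MasterID int, LastAction char(1), HistoryDate datetime);

INSERT INTO @MasterTable VALUES (1,'AAA','aaa');
INSERT INTO @MasterTable VALUES (2,'BBB','bbb');
INSERT INTO @MasterTable VALUES (3,'CCC','ccc');
INSERT INTO @HistoryTableNEW VALUES (1,'I','20090101');
INSERT INTO @HistoryTableNEW VALUES (1,'U','20090202');
INSERT INTO @HistoryTableNEW VALUES (1,'U','20090303'); -- duplicate date
INSERT INTO @HistoryTableNEW VALUES (1,'U','20090303'); -- duplicate date
INSERT INTO @HistoryTableNEW VALUES (2,'I','20090505');
INSERT INTO @HistoryTableNEW VALUES (3,'I','20090707');
INSERT INTO @HistoryTableNEW VALUES (3,'U','20090808');

-- Number each master's rows newest first; HistoryID breaks ties between equal dates.
SELECT MasterID, FirstName, LastName, LastAction, HistoryDate, HistoryID
INTO #Latest
FROM (
    SELECT m.MasterID, m.FirstName, m.LastName,
           h.LastAction, h.HistoryDate, h.HistoryID,
           ROW_NUMBER() OVER (PARTITION BY h.MasterID
                              ORDER BY h.HistoryDate DESC, h.HistoryID DESC) AS RankValue
    FROM @MasterTable AS m
    INNER JOIN @HistoryTableNEW AS h ON h.MasterID = m.MasterID
) AS ranked
WHERE RankValue = 1;

SELECT * FROM #Latest ORDER BY MasterID;
```

Whether this beats the MAX() join depends on your indexes and data; the main gain is that the choice among rows with the same date is deterministic.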

If the history table has a primary key (and all tables should), you can modify the subselect to extract the record with either the larger (or the smaller) PK value among the multiple rows that match the date criteria:
Select M.*, H.*
From Master M
Join History H
On H.PK = (Select Max(PK) From History
Where FK = M.PK
And Date = (Select Max(Date) From History
Where FK = M.PK))
As to performance, that can be addressed by adding the appropriate indexes to these tables (on History.FK and History.Date, for example as one composite index), but in general, depending on the specific table data distribution patterns, subqueries can adversely affect performance.
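To make the PK tie-break and the indexing advice concrete, here is a hedged sketch for SQL Server 2008 or later. It replaces the nested subselects with CROSS APPLY and TOP (1), and adds a composite index with the foreign key first and the ordering columns after it. The table, column and index names are hypothetical, and whether the optimizer really does one seek per master depends on statistics and data distribution, so check the actual execution plan.

```sql
-- Hypothetical tables standing in for Master and History.
CREATE TABLE #Master (MasterID int PRIMARY KEY, FirstName varchar(20));
CREATE TABLE #History (
    HistoryID   int IDENTITY(1,1) PRIMARY KEY,
    MasterID    int         NOT NULL,
    LastAction  varchar(20) NOT NULL,
    HistoryDate datetime    NOT NULL
);

-- Equality column first, then the sort columns: the newest row of each
-- master is the first entry of that master's range in this index.
CREATE INDEX IX_History_Latest
    ON #History (MasterID, HistoryDate DESC, HistoryID DESC)
    INCLUDE (LastAction);

INSERT INTO #Master VALUES (1, 'AAA'), (2, 'BBB');
INSERT INTO #History (MasterID, LastAction, HistoryDate) VALUES (1, 'I', '20090101');
INSERT INTO #History (MasterID, LastAction, HistoryDate) VALUES (1, 'U', '20090303');
INSERT INTO #History (MasterID, LastAction, HistoryDate) VALUES (1, 'X', '20090303'); -- same date
INSERT INTO #History (MasterID, LastAction, HistoryDate) VALUES (2, 'I', '20090505');

-- For each master, take the single newest row; HistoryID breaks date ties.
SELECT m.MasterID, m.FirstName, h.LastAction, h.HistoryDate
INTO #LatestApply
FROM #Master AS m
CROSS APPLY (
    SELECT TOP (1) x.LastAction, x.HistoryDate
    FROM #History AS x
    WHERE x.MasterID = m.MasterID
    ORDER BY x.HistoryDate DESC, x.HistoryID DESC
) AS h;

SELECT * FROM #LatestApply ORDER BY MasterID;
```

Use OUTER APPLY instead of CROSS APPLY if masters without any history rows should still be returned.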

Related

Insert into table with Inner Join

I've been trying to insert into a table using an inner join with another table. I tried the INNER JOIN below, but it didn't work. I'm also not sure whether INNER JOIN or LEFT JOIN is more suitable.
INSERT INTO ticketChangeSet (Comments, createdBy, createdDateTime)
VALUES ('Test', 'system', CURRENT_TIMESTAMP)
INNER JOIN tickets ON ticketChangeSet.ticket_id = tickets.id
WHERE tickets.id BETWEEN '3' AND '5'
Sample data:
tickets table
id comment createdDateTime closeDateTime createdBy
2 NULL 2022-07-05 15:36:20 2022-07-05 16:21:03 system
3 NULL 2022-07-05 15:36:20 2022-07-05 16:21:03 system
4 NULL 2022-07-05 15:36:20 2022-07-05 16:21:03 system
5 NULL 2022-07-05 15:36:20 2022-07-05 16:21:03 system
ticketChangeSet table
id comments createdBy createdDateTime ticket_id
1 Ticket not resolved system 2022-07-05 15:59:01 2
Basically, I want to insert the values ('Ticket not resolved', 'system', '2022-07-05 15:59:01') into the ticketChangeSet table for ticket_id 3 to 5 from the tickets table.

Just select the rows directly from differIssue (or maybe from Tickets - not certain) and supply your constants as the column values.
insert dbo.differIssue (Comments, createdby, dateTime) -- why the strange casing?
select 'Test', 'system', CURRENT_TIMESTAMP
from dbo.differIssue where Tickets_id between 89 and 100 -- why underscores
;
Notice the statement terminator and the use of schema name - both best practices. I also assumed that the ID column is numeric and removed the string delimiters around those filter values. I left out the join because it did not seem required. Presumably the relationship between differIssue and Tickets is 1:1 so an inner join does nothing useful. But perhaps you need to include rows from Tickets for that range of ID values but which might not exist in differIssue? So try
insert dbo.differIssue (Comments, createdby, dateTime)
select 'Test', 'system', CURRENT_TIMESTAMP
from dbo.Tickets where id between 89 and 100
;
But this all seems highly suspicious. I think there is at least one key column missing from the logic - and perhaps more than one.
Update. Now you've changed the table names, added more columns, and changed the filter. You still use string constants for a numeric column - a bad habit.
insert dbo.ticketChangeSet (...)
select ...
from dbo.Tickets as TKT
where not exists (select * from dbo.ticketChangeSet as CHG where CHG.ticket_id = TKT.id)
;
I leave it to you to fill in the missing bits.
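Filling in those bits for the sample data in the question might look like the sketch below. It assumes ticketChangeSet.id is an IDENTITY column and uses the temp tables #tickets and #ticketChangeSet as stand-ins for the real tables. The constants repeat on every selected row and ticket_id comes from the tickets table, so no JOIN is needed; the NOT EXISTS check from the answer skips tickets that already have a change-set row.

```sql
-- Temp tables standing in for dbo.tickets and dbo.ticketChangeSet;
-- ticketChangeSet.id is assumed to be an IDENTITY column.
CREATE TABLE #tickets (id int PRIMARY KEY, comment varchar(100) NULL,
                       createdDateTime datetime, closeDateTime datetime,
                       createdBy varchar(20));
CREATE TABLE #ticketChangeSet (id int IDENTITY(1,1) PRIMARY KEY,
                               comments varchar(100), createdBy varchar(20),
                               createdDateTime datetime, ticket_id int);

INSERT INTO #tickets (id, createdDateTime, closeDateTime, createdBy) VALUES
    (2, '2022-07-05T15:36:20', '2022-07-05T16:21:03', 'system'),
    (3, '2022-07-05T15:36:20', '2022-07-05T16:21:03', 'system'),
    (4, '2022-07-05T15:36:20', '2022-07-05T16:21:03', 'system'),
    (5, '2022-07-05T15:36:20', '2022-07-05T16:21:03', 'system');
INSERT INTO #ticketChangeSet (comments, createdBy, createdDateTime, ticket_id)
VALUES ('Ticket not resolved', 'system', '2022-07-05T15:59:01', 2);

-- One new row per selected ticket; numeric ids without string delimiters.
INSERT INTO #ticketChangeSet (comments, createdBy, createdDateTime, ticket_id)
SELECT 'Ticket not resolved', 'system', '2022-07-05T15:59:01', TKT.id
FROM #tickets AS TKT
WHERE TKT.id BETWEEN 3 AND 5
  AND NOT EXISTS (SELECT * FROM #ticketChangeSet AS CHG
                  WHERE CHG.ticket_id = TKT.id);

SELECT * FROM #ticketChangeSet ORDER BY id;
```

Swap the date literal for CURRENT_TIMESTAMP if the change set should record the time of the insert.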

One SQL statement for counting the records in the master table based on matching records in the detail table?

I have the following master table called Master and sample data
ID---------------Date
1 2014-09-07
2 2014-09-07
3 2014-09-08
The following details table called Details
masterId-------------Name
1 John Walsh
1 John Jones
2 John Carney
1 Peter Lewis
3 John Wilson
Now I want to find out the count of Master records (grouped on the Date column) whose corresponding details record with Name having the value "John".
I cannot figure how to write a single SQL statement for this job.
Please note that a join is needed to find the master records to count. However, the join creates duplicate master records, and I need to keep those duplicates from being counted when grouping on the Date column in the Master table.
The correct results should be:
count: grouped on Date column
2 2014-09-07
1 2014-09-08
Thanks and regards!

This answer assumes the following:
The Name field is always FirstName LastName.
You are looking once and only once for the first name John. The search criteria would differ depending on what you need.
SELECT Date, Count(*)
FROM tblmaster
INNER JOIN tbldetails ON tblmaster.ID=tbldetails.masterId
WHERE NAME LIKE 'John%'
GROUP BY Date, tbldetails.masterId
What we're doing here is using a wildcard character in our string search to say "look for John followed by any characters of any length".
Also, here is a way to create table variables based on what we're working with
DECLARE @tblmaster as table(
ID int,
[date] datetime
)
DECLARE @tbldetails as table(
masterID int,
name varchar(50)
)
INSERT INTO @tblmaster (ID,[date])
VALUES
(1,'2014-09-07'),(2,'2014-09-07'),(3,'2014-09-08')
INSERT INTO @tbldetails(masterID, name) VALUES
(1,'John Walsh'),
(1,'John Jones'),
(2,'John Carney'),
(1,'Peter Lewis'),
(3,'John Wilson')
Based on all comments below, this SQL statement in its clunky glory should do the trick.
SELECT date,count(t1.ID) FROM @tblmaster mainTable INNER JOIN
(
SELECT ID, COUNT(*) as countOfAll
FROM @tblmaster t1
INNER JOIN @tbldetails t2 ON t1.ID=t2.masterId
WHERE NAME LIKE 'John%'
GROUP BY id)
as t1 on t1.ID = mainTable.id
GROUP BY mainTable.date

Is this what you want?
select date, count(distinct m.id)
from master m join
details d
on d.masterid = m.id
where name like '%John%'
group by date;
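An equivalent formulation avoids creating the duplicates in the first place: test for a matching detail row with EXISTS instead of joining, so each master row is counted at most once. This is a sketch on the question's sample data; the table variable names are assumptions, and the pattern 'John %' (John followed by a space) is an assumption about the name format that avoids matching names such as "Johnny".

```sql
-- Hypothetical table variables holding the question's sample data.
DECLARE @Master table (ID int PRIMARY KEY, [Date] date);
DECLARE @Details table (masterId int, Name varchar(50));

INSERT INTO @Master VALUES (1,'2014-09-07'), (2,'2014-09-07'), (3,'2014-09-08');
INSERT INTO @Details VALUES
    (1,'John Walsh'), (1,'John Jones'), (2,'John Carney'),
    (1,'Peter Lewis'), (3,'John Wilson');

-- EXISTS only asks "is there at least one John?", so master 1 counts once
-- even though it has two matching detail rows.
SELECT m.[Date], COUNT(*) AS MasterCount
INTO #JohnCounts
FROM @Master AS m
WHERE EXISTS (SELECT *
              FROM @Details AS d
              WHERE d.masterId = m.ID
                AND d.Name LIKE 'John %')
GROUP BY m.[Date];

SELECT * FROM #JohnCounts ORDER BY [Date];
```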

Help with SQL Grouping

A partial fragment of my output looks as follows:
CNEP P000000025 1
CNEP P000000029 1
NONMAT P000000029 1
CNEP P000000030 1
CWHCNP P000000030 1
MSN P000000030 1
Each row represents a term that a student is in a particular curriculum. Right now I am grouping the information to make sure that each UserID correlates to a particular curriculum only once.
Notice how "P000000029" and "P000000030" have multiple entries.
I would like to be able to show only those students who have multiple curriculum types within the system.

Assuming the columns are named curriculum and userid (no idea what the third column IS ;-), you can get the userids of interest via, e.g.:
select userid
from thetable
group by userid
having count(distinct curriculum) > 1
You can then get other info about the userids selected this way via IN, joins, and similar operations as usual.
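Put together with the sample rows from the question, the query above plus the follow-up lookup via IN could look like this. The table variable @Enrollment and the column name termcount are hypothetical, since the question does not name its columns.

```sql
-- @Enrollment and termcount are hypothetical names for the unnamed columns.
DECLARE @Enrollment table (curriculum varchar(10), userid char(10), termcount int);
INSERT INTO @Enrollment VALUES
    ('CNEP','P000000025',1), ('CNEP','P000000029',1), ('NONMAT','P000000029',1),
    ('CNEP','P000000030',1), ('CWHCNP','P000000030',1), ('MSN','P000000030',1);

-- Keep only users whose rows span more than one distinct curriculum,
-- then pull their full rows back in with IN.
SELECT e.userid, e.curriculum
INTO #MultiCurriculum
FROM @Enrollment AS e
WHERE e.userid IN (SELECT userid
                   FROM @Enrollment
                   GROUP BY userid
                   HAVING COUNT(DISTINCT curriculum) > 1);

SELECT * FROM #MultiCurriculum ORDER BY userid, curriculum;
```

COUNT(DISTINCT curriculum) rather than COUNT(*) keeps a student who repeats the same curriculum in several rows from being reported.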

I don't think you are showing any student info in your sample data. But you can still use this to find groups with multiples (SQL Server example code, but the query will work just about anywhere):
DECLARE @YourTable table (col1 varchar(10), col2 char(10), col3 int)
INSERT INTO @YourTable VALUES ('NEP','P000000025',1)
INSERT INTO @YourTable VALUES ('CNEP','P000000029',1)
INSERT INTO @YourTable VALUES ('NONMAT','P000000029',1)
INSERT INTO @YourTable VALUES ('CNEP','P000000030',1)
INSERT INTO @YourTable VALUES ('CWHCNP','P000000030',1)
INSERT INTO @YourTable VALUES ('MSN','P000000030',1)
SELECT
col1,COUNT(*) AS CountOf
FROM @YourTable
GROUP BY col1
HAVING COUNT(col2)>1
OUTPUT
col1 CountOf
---------- -----------
CNEP 2
(1 row(s) affected)

Help With SQL - Combining Two Rows Into One Row

I have an interesting SQL problem that I need help with.
Here is the sample dataset:
Warehouse DateStamp TimeStamp ItemNumber ID
A 8/1/2009 10001 abc 1
B 8/1/2009 10002 abc 1
A 8/3/2009 12144 qrs 5
C 8/3/2009 12143 qrs 5
D 8/5/2009 6754 xyz 6
B 8/5/2009 6755 xyz 6
This dataset represents inventory transfers between two warehouses. There are two records that represent each transfer, and these two transfer records always have the same ItemNumber, DateStamp, and ID. The TimeStamp values for the two transfer records always have a difference of 1, where the smaller TimeStamp represents the source warehouse record and the larger TimeStamp represents the destination warehouse record.
Using the sample dataset above, here is the query result set that I need:
Warehouse_Source Warehouse_Destination ItemNumber DateStamp
A B abc 8/1/2009
C A qrs 8/3/2009
D B xyz 8/5/2009
I can write code to produce the desired result set, but I was wondering if this record combination was possible through SQL. I am using SQL Server 2005 as my underlying database. I also need to add a WHERE clause to the SQL, so that for example, I could search on Warehouse_Source = A. And no, I can't change the data model ;).
Any advice is greatly appreciated!
Regards,
Mark

SELECT source.Warehouse as Warehouse_Source
, dest.Warehouse as Warehouse_Destination
, source.ItemNumber
, source.DateStamp
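FROM YourTable source -- YourTable is a placeholder ("table" itself is a reserved word)
JOIN YourTable dest ON source.ID = dest.ID
AND source.ItemNumber = dest.ItemNumber
AND source.DateStamp = dest.DateStamp
AND dest.TimeStamp = source.TimeStamp + 1 -- the destination row has the larger TimeStamp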
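With the sample data from the question and the Warehouse_Source = A filter the asker needs, a self-join sketch could look like this. The table variable @Transfers, the aliases src and dst, and the #FromA result table are assumptions. The destination row is the one whose TimeStamp is exactly one larger than the source row's, and because Warehouse_Source is just src.Warehouse, it can be filtered in an ordinary WHERE clause.

```sql
DECLARE @Transfers table (Warehouse char(1), DateStamp datetime, [TimeStamp] int,
                          ItemNumber varchar(10), ID int);
INSERT INTO @Transfers VALUES
    ('A','20090801',10001,'abc',1), ('B','20090801',10002,'abc',1),
    ('A','20090803',12144,'qrs',5), ('C','20090803',12143,'qrs',5),
    ('D','20090805',6754,'xyz',6),  ('B','20090805',6755,'xyz',6);

-- Pair each source row with the row one TimeStamp tick later,
-- then filter on the source warehouse.
SELECT src.Warehouse AS Warehouse_Source,
       dst.Warehouse AS Warehouse_Destination,
       src.ItemNumber,
       src.DateStamp
INTO #FromA
FROM @Transfers AS src
INNER JOIN @Transfers AS dst
    ON  dst.ID = src.ID
    AND dst.ItemNumber = src.ItemNumber
    AND dst.DateStamp = src.DateStamp
    AND dst.[TimeStamp] = src.[TimeStamp] + 1
WHERE src.Warehouse = 'A';

SELECT * FROM #FromA;
```

Only the abc transfer qualifies: for qrs, warehouse A holds the larger TimeStamp, so A is the destination there.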

Mark,
Here is how you can do this with row_number and PIVOT. With a clustered index or primary key on the columns as I suggest, it will use a straight-line query plan with no Sort operation, and thus be particularly efficient.
create table T(
Warehouse char,
DateStamp datetime,
TimeStamp int,
ItemNumber varchar(10),
ID int,
primary key(ItemNumber,DateStamp,ID,TimeStamp)
);
insert into T values ('A','20090801','10001','abc','1');
insert into T values ('B','20090801','10002','abc','1');
insert into T values ('A','20090803','12144','qrs','5');
insert into T values ('C','20090803','12143','qrs','5');
insert into T values ('D','20090805','6754','xyz','6');
insert into T values ('B','20090805','6755','xyz','6');
with Tpaired(Warehouse,DateStamp,TimeStamp,ItemNumber,ID,rk) as (
select
Warehouse,DateStamp,TimeStamp,ItemNumber,ID,
row_number() over (
partition by ItemNumber,DateStamp,ID
order by TimeStamp
)
from T
)
select
max([1]) as Warehouse_Source,
max([2]) as Warehouse_Destination,
ItemNumber,
DateStamp
from Tpaired
pivot (
max(Warehouse) for rk in ([1],[2])
) as P
group by ItemNumber, DateStamp, ID;
go
drop table T;

Updating Uncommitted data to a cell with in an UPDATE statement

I want to convert a table storing in Name-Value pair data to relational form in SQL Server 2008.
Source table
Strings
ID Type String
100 1 John
100 2 Milton
101 1 Johny
101 2 Gaddar
Target required
Customers
ID FirstName LastName
100 John Milton
101 Johny Gaddar
I am following the strategy given below,
Populate the Customer table with ID values in Strings Table
INSERT INTO Customers (ID) SELECT DISTINCT ID FROM Strings
You get the following
Customers
ID FirstName LastName
100 NULL NULL
101 NULL NULL
Update Customers with the rest of the attributes by joining it to Strings using ID column. This way each record in Customers will have corresponding 2 matching records.
UPDATE Customers
SET FirstName = (CASE WHEN S.Type=1 THEN S.String ELSE Customers.FirstName END),
LastName = (CASE WHEN S.Type=2 THEN S.String ELSE Customers.LastName END)
FROM Customers
INNER JOIN Strings S ON Customers.ID=S.ID
An intermediate state will be like:
ID FirstName LastName ID Type String
100 John NULL 100 1 John
100 NULL Milton 100 2 Milton
101 Johny NULL 101 1 Johny
101 NULL Gaddar 101 2 Gaddar
But this is not working as expected, because when assigning the values in the SET clause, it uses only the committed values instead of the uncommitted ones. Is there any way to use uncommitted values (within the processing time of the query) in an UPDATE statement?
PS: I am not looking for alternate solutions but make my approach work by telling SQL Server to use uncommitted data for UPDATE.

The easiest way to do it would be to split the update into two:
UPDATE Customers
SET FirstName = Strings.String
FROM Customers
INNER JOIN Strings ON Customers.ID=Strings.ID AND Strings.Type = 1
And then:
UPDATE Customers
SET LastName = Strings.String
FROM Customers
INNER JOIN Strings ON Customers.ID=Strings.ID AND Strings.Type = 2
There are probably ways to do it in one query such as a derived table, but unless that's a specific requirement I'd just use this approach.
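One way to do it in a single statement, sketched here with two aliased joins rather than a derived table: each alias matches at most one Strings row per customer, so every updated column has exactly one candidate value and the result is deterministic. It assumes at most one row per (ID, Type) in Strings, and uses the temp tables #Strings and #Customers as stand-ins for the real tables.

```sql
-- Temp tables standing in for the real Strings and Customers tables.
CREATE TABLE #Strings (ID int, [Type] int, String varchar(50));
CREATE TABLE #Customers (ID int PRIMARY KEY, FirstName varchar(50), LastName varchar(50));

INSERT INTO #Strings VALUES (100,1,'John'), (100,2,'Milton'), (101,1,'Johny'), (101,2,'Gaddar');
INSERT INTO #Customers (ID) SELECT DISTINCT ID FROM #Strings;

-- fn supplies the first name and ln the last name; each joins to one row at most.
UPDATE c
SET FirstName = fn.String,
    LastName  = ln.String
FROM #Customers AS c
LEFT JOIN #Strings AS fn ON fn.ID = c.ID AND fn.[Type] = 1
LEFT JOIN #Strings AS ln ON ln.ID = c.ID AND ln.[Type] = 2;

SELECT ID, FirstName, LastName FROM #Customers ORDER BY ID;
```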

Have a look at this, it should avoid all the steps you had
DECLARE @Table TABLE(
ID INT,
Type INT,
String VARCHAR(50)
)
INSERT INTO @Table (ID,[Type],String) SELECT 100 ,1 ,'John'
INSERT INTO @Table (ID,[Type],String) SELECT 100 ,2 ,'Milton'
INSERT INTO @Table (ID,[Type],String) SELECT 101 ,1 ,'Johny'
INSERT INTO @Table (ID,[Type],String) SELECT 101 ,2 ,'Gaddar'
SELECT IDs.ID,
tName.String NAME,
tSur.String Surname
FROM (
SELECT DISTINCT ID
FROM @Table
) IDs LEFT JOIN
@Table tName ON IDs.ID = tName.ID AND tName.[Type] = 1 LEFT JOIN
@Table tSur ON IDs.ID = tSur.ID AND tSur.[Type] = 2
OK, I do not think that you will find a solution to what you are looking for. The documentation for UPDATE (Transact-SQL) states:
Using UPDATE with the FROM Clause
The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic. For example, in the UPDATE statement in the following script, both rows in Table1 meet the qualifications of the FROM clause in the UPDATE statement; but it is undefined which row from Table1 is used to update the row in Table2.
USE AdventureWorks;
GO
IF OBJECT_ID ('dbo.Table1', 'U') IS NOT NULL
DROP TABLE dbo.Table1;
GO
IF OBJECT_ID ('dbo.Table2', 'U') IS NOT NULL
DROP TABLE dbo.Table2;
GO
CREATE TABLE dbo.Table1
(ColA int NOT NULL, ColB decimal(10,3) NOT NULL);
GO
CREATE TABLE dbo.Table2
(ColA int PRIMARY KEY NOT NULL, ColB decimal(10,3) NOT NULL);
GO
INSERT INTO dbo.Table1 VALUES(1, 10.0), (1, 20.0);
INSERT INTO dbo.Table2 VALUES(1, 0.0);
GO
UPDATE dbo.Table2
SET dbo.Table2.ColB = dbo.Table2.ColB + dbo.Table1.ColB
FROM dbo.Table2
INNER JOIN dbo.Table1
ON (dbo.Table2.ColA = dbo.Table1.ColA);
GO
SELECT ColA, ColB
FROM dbo.Table2;

Astander is correct (I am accepting his answer). The update does not work as expected not because of a READ UNCOMMITTED issue but because of the multiple rows returned by the JOIN. I have verified this. UPDATE uses only one of the rows generated from the multiple matching records to update the original table (per the documentation quoted above, which one is undefined). This is the behavior of MSSQL, Sybase and similar RDBMSs, but Oracle does not allow this kind of update and throws an error. I have verified this for MSSQL.
And again, MSSQL does not support updating a cell with UNCOMMITTED data. I don't know the status for other RDBMSs, and I have no idea whether any RDBMS provides isolation-level management within a query.
An alternative solution is to aggregate (pivot) the rows and insert the result in a single statement. This needs fewer scans than the methods given in the answers above.
INSERT INTO Customers
SELECT
ID
,MAX(CASE WHEN Type = 1 THEN String ELSE NULL END) AS FirstName
,MAX(CASE WHEN Type = 2 THEN String ELSE NULL END) AS LastName
FROM Strings
GROUP BY ID
Thanks to my friend Roji Thomas for helping me with this.
