Left Join with duplicate keys in the right table - sql

I'm trying merging 2 tables as follow
SELECT * FROM T1
LEFT JOIN T2 ON T1.EMPnum = T2.EMPnum
The above works well but I need the joining to use only the first appearance of the common key EMPnum record in table T2 so that the query returns exactly the same number of rows as T1
Thanks Avi

SQL tables are inherently unordered, so there is no such thing as a "first" key. In most databases, you can do something like this:
with t2 as (
select t2.*, row_number() over (partition by EMPnum order by id) as seqnum
from t2
)
select *
from t1 left join
t2
on t1.EMPnum = t2.EMPnum and t2.seqnum = 1;
Here id is just any column that specifies the ordering. If none exist, you can use EMPnum to get an arbitrary row.

Related

SQL Server query showing most recent distinct data

I am trying to build a SQL query to recover only the most young record of a table (it has a Timestamp column already) where the item by which I want to filter appears several times, as shown in my table example:
.
Basically, I have a table1 with Id, Millis, fkName and Price, and a table2 with Id and Name.
In table1, items can appear several times with the same fkName.
What I need to achieve is building up a single query where I can list the last record for every fkName, so that I can get the most actual price for every item.
What I have tried so far is a query with
SELECT DISTINCT [table1].[Millis], [table2].[Name], [table1].[Price]
FROM [table1]
JOIN [table2] ON [table2].[Id] = [table1].[fkName]
ORDER BY [table2].[Name]
But I don't get the correct listing.
Any advice on this? Thanks in advance,
A simple and portable approach to this greatest-n-per-group problem is to filter with a subquery:
select t1.millis, t2.name, t1.price
from table1 t1
inner join table2 t2 on t2.id = t1.fkName
where t1.millis = (select max(t11.millis) from table1 t11 where t11.fkName = t1.fkName)
order by t1.millis desc
using Common Table Expression:
;with [LastPrice] as (
select [Millis], [Price], ROW_NUMBER() over (Partition by [fkName] order by [Millis] desc) rn
from [table1]
)
SELECT DISTINCT [LastPrice].[Millis],[table2].[Name],[LastPrice].[Price]
FROM [LastPrice]
JOIN [table2] ON [table2].[Id] = [LastPrice].[fkName]
WHERE [LastPrice].rn = 1
ORDER BY [table2].[Name]

SQL - Update last occurrence with select in clause

In Oracle, I would like do an update on a table for the last occurrence based on select in list, something like :
UPDATE table t1
set t1.fieldA = 0
where t1.id in (
select t2.id, max(t2.TIMESTAMP)
from table t2
where t2.id in (1111,2222,33333)
group by t2.id
);
This query does not works, I received an error "too many values".
Any ideas?
Thanks
Your subselect has 2(too many) fields so it cant't be compared with the id.
compare the timestamp with the max timestamp.
UPDATE table t1 set t1.fieldA = 0
where t1.TIMESTAMP = (select max(t2.TIMESTAMP) from table t2 where t2.id = t1.id )
and id in (1111,2222,33333)
I believe oracle supports multiple columns IN, so you could try:
UPDATE table t1
set t1.fieldA = 0
where
(t1.id,t1.timestamp) in ( select t2.id, max(t2.TIMESTAMP) from table t2 where t2.id in (1111,2222,33333) group by t2.id );
Essentially if you're providing a subquery to the right side of the IN that has a result set with two columns, then you have to provide the names of both the columns in brackets on the left side of the IN
Incidentally, I've never liked updating based on a max value if the requirement is that only one row be updated, as two rows with the same max value will both be updated. This kind of query is better:
Update table
Set fieldA=0
Where rowid in(
select rowid from (
select rowid, row_number() over(partition by id order by timestamp desc
) as rown from table
) where rown=1)
If, however, your primary key on the table includes the time stamp as part of the compound, then there is no concern of here being two rows with the same ID/timestamp.. it's just rarely the case!

SQL Tables need to be appended, but not like JOINS

I've a scenario as below.
I've two tables, and a Common column/Key between the two.
I need Table2 data to be just appended to Table1 without repetition like JOINs.
If there are more rows in one table, other table rows can be NULL.
As shown in figure, Result Table has NULLs when there is no corresponding row count from Table2.
I tried using Joins, but I'm getting a result of 45 rows. But I should get 9 rows.
Thanks in advance.
Edit: Added my queries
SELECT DISTINCT
APPT.PRSN_ID
,APPT.SCHEDULEDAPPOINTMENTS
,APPT.OVERDUEAPPOINTMENTS
,Visit.ENCOUNTER_CATEGORY
,Visit.ENCOUNTER_TYPE
,Visit.ENCOUNTER_DATE
,Visit.ENCOUNTER_FOLLOWUP_DATE
FROM
APPOINTMENTS APPT
OUTER APPLY dbo.fn_GetVisitsOfAPerson(PROV.PRSN_ID) AS Visit
/***********************************************************/
--IN THE ABOVE QUERY, THE FUNCTION IS DEFINED AS BELOW
CREATE FUNCTION dbo.fn_GetVisitsOfAPerson(#PrsnID AS bigint)
RETURNS TABLE
AS
RETURN
(
SELECT
VISITS.PVISITS_PRSN_KEY
,DATA.HE_Category_Description AS 'ENCOUNTER_CATEGORY'
,DATA.HE_Type_Description AS 'ENCOUNTER_TYPE'
,VISITS.PVISITS_DATE AS 'ENCOUNTER_DATE'
,VISITS.PVISITS_FOLLOWUP_DATE AS 'ENCOUNTER_FOLLOWUP_DATE'
FROM
[HS_PRSN_HEALTH_VISITS] VISITS INNER JOIN
[HS_HealthEncounter_Table] DATA ON
ENCOUNTER.PVISITS_CATEGORY = DATA.HE_Category_Code AND
ENCOUNTER.PVISITS_TYPE = DATA.HE_Type_Code
WHERE
PVISITS_PRSN_KEY = #PrsnID
AND PVISITS_VOID = 0
)
GO
I cannot read your data tables, but I think you just need to introduce a row number.
It would be something like this:
select . . .
from (select t1.*, row_number() over (partition by key order by ??) as seqnum
from table1 t1
) full join
(select t2.*, row_number() over (partition by key order by ??) as seqnum
from table1 t2
) t2
on t1.key = t2.key and t1.seqnum = t2.seqnum;
Based on you wanting the 9 rows present in the Visit table it looks like you need an OUTER JOIN, something like:
SELECT DISTINCT
APPT.PRSN_ID
,APPT.SCHEDULEDAPPOINTMENTS
,APPT.OVERDUEAPPOINTMENTS
,Visit.ENCOUNTER_CATEGORY
,Visit.ENCOUNTER_TYPE
,Visit.ENCOUNTER_DATE
,Visit.ENCOUNTER_FOLLOWUP_DATE
FROM Visit
LEFT OUTER JOIN
APPOINTMENTS APPT
ON Visit.key = APP.key

SQL - remove duplicates from left join

I'm creating a joined view of two tables, but am getting unwanted duplicates from table2.
For example: table1 has 9000 records and I need the resulting view to contain exactly the same; table2 may have multiple records with the same FKID but I only want to return one record (random chosen is ok with my customer). I have the following code that works correctly, but performance is slower than desired (over 14 seconds).
SELECT
OBJECTID
, PKID
,(SELECT TOP (1) SUBDIVISIO
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS ProjectName
,(SELECT TOP (1) ASBUILT1
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS Asbuilt
FROM dbo.table1 AS t1
Is there a way to do something similar with joins to speed up performance?
I'm using SQL Server 2008 R2.
I got close with the following code (~.5 seconds), but 'Distinct' only filters out records when all columns are duplicate (rather than just the FKID).
SELECT
t1.OBJECTID
,t1.PKID
,t2.ProjectName
,t2.Asbuilt
FROM dbo.table1 AS t1
LEFT JOIN (SELECT
DISTINCT FKID
,ProjectName
,Asbuilt
FROM dbo.table2) t2
ON t1.PKID = t2.FKID
table examples
table1 table2
OID, PKID FKID, ProjectName, Asbuilt
1, id1 id1, P1, AB1
2, id2 id1, P5, AB5
3, id4 id2, P10, AB2
5, id5 id5, P4, AB4
In the above example returned records should be id5/P4/AB4, id2/P10/AB2, and (id1/P1/AB1 OR id1/P5/AB5)
My search came up with similar questions, but none that resolved my problem. link, link
Thanks in advance for your help. This is my first post so let me know if I've broken any rules.
This will give the results you requested and should have the best performance.
SELECT
OBJECTID
, PKID
, t2.SUBDIVISIO,
, t2.ASBUILT1
FROM dbo.table1 AS t1
OUTER APPLY (
SELECT TOP 1 *
FROM dbo.table2 AS t2
WHERE t1.PKID = t2.FKID
) AS t2
Your original query is producing arbitrary values for the two columns (the use of top with no order by). You can get the same effect with this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, min(ProjectName) as ProjectName, MIN(asBuilt) as AsBuilt
FROM dbo.table2
group by fkid
) t2
ON t1.PKID = t2.FKID
This version replaces the distinct with a group by.
To get a truly random row in SQL Server (which your syntax suggests you are using), try this:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, ProjectName, AsBuilt,
ROW_NUMBER() over (PARTITION by fkid order by newid()) as seqnum
FROM dbo.table2
) t2
ON t1.PKID = t2.FKID and t2.seqnum = 1
This assumes version 2005 or greater.
If you want described result, you need to use INNER JOIN and following query will satisfy your need:
SELECT
t1.OID,
t1.PKID,
MAX(t2.ProjectName) AS ProjectName,
MAX(t2.Asbuilt) AS Asbuilt
FROM table1 t1
JOIN table2 t2 ON t1.PKID = t2.FKID
GROUP BY
t1.OID,
t1.PKID
If you want to see all rows from left table (table1) whether it has pair in right table or not, then use LEFT JOIN and same query will gave you desired result.
EDITED
This construction has good performance, and you dont need to use subqueries.

SQL - using a value in a nested select

Hope the title makes some kind of sense - I'd basically like to do a nested select, based on a value in the original select, like so:
SELECT MAX(iteration) AS maxiteration,
(SELECT column
FROM table
WHERE id = 223652
AND iteration = maxiteration)
FROM table
WHERE id = 223652;
I get an ORA-00904 invalid identifier error.
Would really appreciate any advice on how to return this value, thanks!
It looks like this should be rewritten with a where clause:
select iteration,
col
from tbl
where id = 223652
and iteration = (select max(iteration) from tbl where id = 223652);
You can circumvent the problem alltogether by placing the subselect in an INNER JOIN of its own.
SELECT t.iteration
, t.column
FROM table t
INNER JOIN (
SELECT id, MAX(iteration) AS iteration
FROM table
WHERE id = 223652
) tm ON tm.id = t.id AND tm.iteration = t.iteration
Since you're using Oracle, I'd suggest using analytic functions for this:
SELECT * FROM (
SELECT col,
iteration,
row_number() over (partition by id order by iteration desc) rn
FROM tab
WHERE id = 223652
) WHERE rn = 1
do it like this:
with maxiteration as
(
SELECT MAX(iteration) AS maxiteration
FROM table
WHERE id = 223652
)
select
column,
iteration
from
table
where
id = 223652
AND iteration = maxiteration
;
Not 100% sure on Oracle syntax, but isn't it something like:
select iteration, column from table where id = 223652 order by iteration desc limit 1
I would approach this problem in a slightly different way. You're basically looking for the row that has no other iterations greater than it. There are at least 3 ways I can think of to do this:
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
WHERE
T1.id = 223652 AND
NOT EXISTS
(
SELECT *
FROM Table T2
WHERE
T2.id = 223652 AND
T2.iteration > T1.iteration
)
Or...
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
LEFT OUTER JOIN Table T2 ON
T2.id = T1.id AND
T2.iteration > T1.iteration
WHERE
T1.id = 223652 AND
T2.id IS NULL
Or...
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
INNER JOIN (SELECT id, MAX(iteration) AS maxiteration FROM Table T2 GROUP BY id) SQ ON
SQ.id = T1.id AND
SQ.maxiteration = T1.iteration
WHERE
T1.id = 223652
EDIT: I didn't see the ORA error the first time reading the question and it wasn't tagged as Oracle specific. I think that there may be some differences in the syntax and use of aliases in Oracle, so you may need to tweak some of the above queries.
The Oracle error is telling you that it doesn't know what maxiteration is, because the column alias isn't available yet inside the subquery. You need to refer to it by the table alias and column name instead of the column alias I believe.
You do something like
select maxiteration,column from table a join (select max(iteration) as maxiteration from table where id=1) b using (id) where b.maxiteration=a.iteration;
This could of course return multiple rows for one maxiteration unless your table has a constraint against it.