querying 2 tables with the same spec for the differences - sql

I recently had to solve this problem and find I've needed this info many times in the past so I thought I would post it. Assuming the following table def, how would you write a query to find all differences between the two?
table def:
CREATE TABLE feed_tbl
(
code varchar(15),
name varchar(40),
status char(1),
update char(1)
CONSTRAINT feed_tbl_PK PRIMARY KEY (code)
CREATE TABLE data_tbl
(
code varchar(15),
name varchar(40),
status char(1),
update char(1)
CONSTRAINT data_tbl_PK PRIMARY KEY (code)
Here is my solution, as a view using three queries joined by unions. The diff_type specified is how the record needs updated: deleted from _data(2), updated in _data(1), or added to _data(0)
CREATE VIEW delta_vw AS (
SELECT feed_tbl.code, feed_tbl.name, feed_tbl.status, feed_tbl.update, 0 as diff_type
FROM feed_tbl LEFT OUTER JOIN
data_tbl ON feed_tbl.code = data_tbl.code
WHERE (data_tbl.code IS NULL)
UNION
SELECT feed_tbl.code, feed_tbl.name, feed_tbl.status, feed_tbl.update, 1 as diff_type
FROM data_tbl RIGHT OUTER JOIN
feed_tbl ON data_tbl.code = feed_tbl.code
where (feed_tbl.name <> data_tbl.name) OR
(data_tbl.status <> feed_tbl.status) OR
(data_tbl.update <> feed_tbl.update)
UNION
SELECT data_tbl.code, data_tbl.name, data_tbl.status, data_tbl.update, 2 as diff_type
FROM feed_tbl LEFT OUTER JOIN
data_tbl ON data_tbl.code = feed_tbl.code
WHERE (feed_tbl.code IS NULL)
)

UNION will remove duplicates, so just UNION the two together, then search for anything with more than one entry. Given "code" as a primary key, you can say:
edit 0: modified to include differences in the PK field itself
edit 1: if you use this in real life, be sure to list the actual column names. Dont use dot-star, since the UNION operation requires result sets to have exactly matching columns. This example would break if you added / removed a column from one of the tables.
select dt.*
from
data_tbl dt
,(
select code
from
(
select * from feed_tbl
union
select * from data_tbl
)
group by code
having count(*) > 1
) diffs --"diffs" will return all differences *except* those in the primary key itself
where diffs.code = dt.code
union --plus the ones that are only in feed, but not in data
select * from feed_tbl ft where not exists(select code from data_tbl dt where dt.code = ft.code)
union --plus the ones that are only in data, but not in feed
select * from data_tbl dt where not exists(select code from feed_tbl ft where ft.code = dt.code)

I would use a minor variation in the second union:
where (ISNULL(feed_tbl.name, 'NONAME') <> ISNULL(data_tbl.name, 'NONAME')) OR
(ISNULL(data_tbl.status, 'NOSTATUS') <> ISNULL(feed_tbl.status, 'NOSTATUS')) OR
(ISNULL(data_tbl.update, '12/31/2039') <> ISNULL(feed_tbl.update, '12/31/2039'))
For reasons I have never understood, NULL does not equal NULL (at least in SQL Server).

You could also use a FULL OUTER JOIN and a CASE ... END statement on the diff_type column along with the aforementioned where clause in querying 2 tables with the same spec for the differences
That would probably achieve the same results, but in one query.

Related

How to get different data from two different tables in SQL query?

I have two table named Soft and Web, table containing multiple data in that which data is different that data I want. For Ex :
In soft table containing 5 data i.e.
Also in Web table containing 5 data i.e.
Now I want output i.e.
I have done query but unfortunately didnt succed, lets see my query i.e.
SELECT DISTINCT soft.GSTNo AS SoftGST
,web.GSTNo AS WebGST
,soft.InvoiceNumber AS SoftInvoice
,web.InvoiceNumber AS WebInvoice
,soft.Rate AS SoftRate
,web.Rate AS WebRate
FROM soft
LEFT OUTER JOIN web ON web.GstNo = soft.GSTNo
AND web.InvoiceNumber = soft.invoicenumber
AND web.rate = soft.rate
Also I apply inner join bt same thing didnt work.
You can achieve this by
;WITH cte_soft AS
(SELECT * FROM soft
EXCEPT
SELECT * FROM web)
,cte_web AS
(SELECT * FROM web
EXCEPT
SELECT * FROM soft)
SELECT *
FROM
(SELECT gst softgst, NULL webgst, invoice softinvoice, NULL webinvoice, rate softrate, NULL webrate
FROM cte_soft
UNION ALL
SELECT NULL, gst, NULL, invoice, NULL , rate
FROM cte_web) tbl
ORDER BY coalesce(softgst, webgst),coalesce(softinvoice,webinvoice)
Fiddle
You can use full join:
SELECT s.gst as softgst, w.gst as webgst,
s.invoice as softinvoice, w.invoice as webinvoice,
s.rate as softrate, w.rate as webrate
FROM soft s FULL JOIN
web w
ON s.gst = w.gst AND s.invoice = w.invoice AND s.rate = w.rate
WHERE s.gst IS NULL OR w.gst IS NULL
ORDER BY COALESCE(s.gst, w.gst), COALESCE(s.invoice, w.invoice);
No subqueries are CTEs are needed. This is really just a slight variant of your query.

Change existing sql to left join only on first match

Adding back some original info for historical purposes as I thought simplifying would help but it didn't. We have this stored procedure, in this part it is selecting records from table A (calldetail_reporting_agents) and doing a left join on table B (Intx_Participant). Apparently there are duplicate rows in table B being pulled that we DON'T want. Is there any easy way to change this up to only pick the first match on table B? Or will I need to rewrite the whole thing?
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
LEFT JOIN Intx_Participant ON calldetail_reporting_agents.CallID = Intx_Participant.CallIDKey
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN #LocStartDate AND #LocEndDate
AND (#LocQueue IS NULL OR AssignedWorkGroup = #LocQueue)
Simpler version: how to change below to select only first matching row from table B:
SELECT columnA, columnB FROM TableA LEFT JOIN TableB ON someColumn
I changed to this per the first answer and all data seems to look exactly as expected now. Thank you to everyone for the quick and attentive help.
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM Intx_Participant ip
WHERE calldetail_reporting_agents.CallID = ip.CallIDKey
AND calldetail_reporting_agents.RemoteNumber = ip.ConnValue
AND ip.HowEnded = '9'
AND ip.Recorded = '0'
AND ip.Duration > 0
AND ip.Role = '1') Intx_Participant
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN #LocStartDate AND #LocEndDate
AND (#LocQueue IS NULL OR AssignedWorkGroup = #LocQueue)
You can try to OUTER APPLY a subquery getting only one matching row.
...
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM intx_Participant ip
WHERE ip.callidkey = calldetail_reporting_agents.callid) intx_participant
WHERE ...
You should add an ORDER BY in the subquery. Otherwise it isn't deterministic which row is taken as the first. Or maybe that's not an issue.

Check if a combination of fields already exists in the table

My weakest area of SQL are self JOINS, currently struggling with an issue.
I need to find the latest entry in a table, I'm using a WHERE DATEFIELD IN (SELECT MAX(DATEFIELD) FROM TABLE) to do this. I then need to establish if 3 columns from that already exist in the same TABLE.
My latest attempt looks like this -
SELECT * FROM PART_TABLE
WHERE NOT EXISTS
(
SELECT
t1.DATEFIELD
t1.CODE1
t1.CODE2
t1.CODE3
FROM PART_TABLE t1
INNER JOIN PART_TABLE t2 ON t1.UNIQUE = t2.UNQIUE
)
WHERE t1.DATEFIELD IN
(
SELECT MAX(DATEFIELD)
FROM PARTTABLE
)
)
I think part of the issue is that I can't exclude the unique row from t1 when checking in t2 using this method.
Using MSSQL 2014.
The following query will return the latest record from your table and a bit flag whether a duplicate tuple {Code1, Code2, Code3} exists in it under a different identifier:
select top (1) p.*,
case when exists (
select 0 from dbo.Part_Table t where t.Unique != p.Unique
and t.Code1 = p.Code1 and t.Code2 = p.Code2 and t.Code3 = p.Code3
) then 1
else 0 end as [IsDuplicateExists]
from dbo.Part_Table p
order by p.DateField desc;
You can use this example as a template to address your specific needs, which unfortunately aren't immediately apparent from your explanation.

sql inner table substitution

Suppose I have an sql query like the following (I realize this query could be written better, just bear with me):
SELECT aT.NAME
FROM anothertable aT,
( SELECT ts.slot_id,
tgm.trans_id,
tagm.agent_id
FROM slots ts,
transactions tgm,
agents tagm
WHERE ts.slot_id = (12345, 678910)
and ts.slot_id = tagm.slot_id
AND ts.slot_id = tgm.slot_id) INNER
WHERE INNER.trans_id = aT.trans_id
AND INNER.agent_id = aT.trans_id
Now suppose that I need to break up this query into two parts...in the first I'll execute the inner query, do some processing on the results in code, and then pass back a reduced set to the outer part of the query. The question is, is there an easy way to emulate an inner table in sql?
For instance, if the results of the inner query returned 5 rows but my program deems to only need two of those rows, how can I write sql that will do what I am trying to do below? Is there a way, in sql, to declare a table for in memory in query use?
SELECT
at.Name
FROM
anotherTable aT,
(SLOT_ID, TRANS_ID, AGENT_ID
-------------------------
230743, 3270893, 2307203
078490, 230897, 237021) inner
WHERE
inner.trans_id = at.trans_id
AND INNER.agent_id = aT.trans_id
Just use a subquery:
SELECT at.Name
FROM anotherTable aT JOIN
(select 230743 as SLOT_ID, 3270893 as TRANS_ID, 2307203 as AGENT_ID from dual
select 078490, 230897, 237021 from dual
) i
on i.trans_id = at.trans_id AND i.agent_id = aT.trans_id;
Most systems will let you define a TEMP TABLE or TABLE VARIABLE: https://www.simple-talk.com/sql/t-sql-programming/temporary-tables-in-sql-server/
CREATE TABLE #temp (
SLOT_ID INT,
TRANS_ID INT,
AGENT_ID INT
);
INSERT INTO #temp(SLOT_ID, TRANS_ID, AGENT_ID)
(--inner query goes here)
--do your main query, then:
DROP TABLE #temp
IN MS SQL Server (not sure about other systems), you could possibly use a Common Table Expression (CTE): https://technet.microsoft.com/en-us/library/ms190766%28v=sql.105%29.aspx
WITH inner AS (
--inner query goes here
)
--main select goes here
Personally, since I generally work with MSSQL Server, I use CTE's quite a bit, as they can be created "on the fly", and can be a big help in organizing more complex queries.
The subquery method worked. Since this is Oracle, the syntax turned out to be:
SELECT aT.Name
FROM anotherTable aT,
(select 1907945 as SLOT_ID, 2732985 as TRANS_ID, 40157 as AGENT_ID FROM DUAL
union
select 1907945, 2732985, 40187 FROM DUAL
) inner
WHERE
inner.trans_id = aT.trans_id AND INNER.agent_id = aT.trans_id;

SQL Join With Fallback

Given
CREATE TABLE Addresses
Id INT NOT NULL
Zip NVARCHAR(5) NULL
ZipPlus4 NVARCHAR(9) NULL
CREATE TABLE ZipLookup
Zip NVARCHAR(5) NULL
Code NVARCHAR(10) NULL
CREATE TABLE ZipPlus4Lookup
ZipPlus4 NVARCHAR(9) NULL
Code NVARCHAR(10) NULL
And data like
Addresses
1 | 92123 | 921234444
ZipLookup
92123 | Type A
ZipPlus4Lookup
921234444 | Type B
Is it possible to construct a query such that:
A given row in Addresses is outer joined to ZipPlus4Lookup if there is a match
Addresses.ZipPlus4 = ZipPlus4Lookup.ZipPlus4
Otherwise, the given row in Addresses is outer joined to ZipLookup if there is a match
Addresses.Zip = ZipLookup.Zip
Otherwise neither table is outer joined
In plain English, the Addresses table has a Zip and a ZipPlus4 column and I need to look up a code using the most precise match. If there's a match on Zip+4, use the code from that match. Otherwise, use the code from a Zip match.
I wish I had an attempted query to share, but with this one I don't know where to start.
This basic query will work:
SELECT
A.*,
Code = IsNull(Z4.Code, Z.Code)
FROM
dbo.Addresses A
LEFT JOIN dbo.ZipPlus4Lookup Z4
ON A.ZipPlus4 = Z4.ZipPlus4
LEFT JOIN dbo.ZipLookup Z
ON A.Zip = Z.Zip
AND Z4.ZipPlus4 IS NULL;
Or you could try something like this:
SELECT
A.*,
Z.Code
FROM
dbo.Addresses A
OUTER APPLY (
SELECT TOP 1 Code
FROM (
SELECT 0, Code FROM dbo.ZipPlus4Lookup Z4
WHERE A.ZipPlus4 = Z4.ZipPlus4
UNION ALL
SELECT 1, Code FROM dbo.ZipLookup Z
WHERE A.Zip = Z.Zip
) X (Seq, Code)
ORDER BY X.Seq
) Z;
They may have different performance characteristics. It's worth testing. My guess is the second query is unnecessary but it's still conceptually possible to be better.
See these in action in a SQL Fiddle.