Insert rows from TableA that are not in TableB using concatenated keys - sql

I have 2 tables. TableA gets populated by a csv import and typically contains between 10k and 15k rows. TableB has the same structure, and has now grown to about 95k rows. In order to determine rows in TableA that are not in TableB, I need to compare a concatenation of 4 fields in TableA with the same concatenation in TableB.
The code below has been working as TableB has been growing, but is just taking so long that it needs to be cancelled and does not finish.
I strongly believe that the use of concatenated fields as a comparison is causing execution times to grow beyond usability.
Is there a better approach to the problem?
DELETE FROM billing..whse_Temp
BULK INSERT billing..whse_Temp
FROM '/mnt/ABC/ABC.csv'
WITH
(
FORMAT='csv',
FIRSTROW=2,
FIELDTERMINATOR=',',
ROWTERMINATOR='\r\n'
)
INSERT INTO billing..whse
SELECT * FROM billing..whse_Temp S
WHERE CONCAT(S.RunTimeStamp, S.CS_Datacenter,S.Customer, S.ServerName) NOT IN
(
SELECT CONCAT(RunTimeStamp, CS_Datacenter, Customer, ServerName)
FROM billing..whse
)

Simply use NOT EXISTS:
INSERT INTO billing..whse
SELECT * FROM billing..whse_temp S
WHERE NOT EXISTS
(
SELECT NULL
FROM billing..whse w
WHERE w.runtimestamp = s.runtimestamp
AND w.cs_datacenter = s.cs_datacenter
AND w.customer = s.customer
AND w.servername = s.servername
);
The appropriate index for this:
CREATE INDEX idx ON billing..whse (runtimestamp, cs_datacenter, customer, servername);

I'm sure there are ways to do it with a MERGE command, but I've never really used those. I'm sure there's ways with EXISTS across multiple columns, but I personally find it clearer to have the full join condition and then just test for where the join failed. (ie: no row on the right side):
INSERT INTO billing..whse
SELECT S.*
FROM billing..whse_Temp S
LEFT OUTER JOIN billing..whse W
ON S.RunTimeStamp = W.RunTimeStamp
AND S.CS_Datacenter = W.CS_Datacenter
AND S.Customer = W.Customer
AND S.ServerName = W.ServerName
WHERE W.RunTimeStamp IS NULL

Related

Rewrite query without using temp table

I have a query that is using a temp table to insert some data then another select from to extract distinct results. That query by it self was fine but now with entity-framework it is causing all kinds of unexpected errors at the wrong time.
Is there any way I can rewrite the query not to use a temp table? When this is converted into a stored procedure and in entity framework the result set is of type int which throws an error:
Could not find an implementation of the query pattern Select not found.
Here is the query
Drop Table IF EXISTS #Temp
SELECT
a.ReceiverID,
a.AntennaID,
a.AntennaName into #Temp
FROM RFIDReceiverAntenna a
full join Station b ON (a.ReceiverID = b.ReceiverID) and (a.AntennaID = b.AntennaID)
where (a.ReceiverID is NULL or b.ReceiverID is NULL)
and (a.AntennaID IS NULL or b.antennaID is NULL)
select distinct r.ReceiverID, r.ReceiverName, r.receiverdescription
from RFIDReceiver r
inner join #Temp t on r.ReceiverID = t.ReceiverID;
No need for anything fancy, you can just replace the reference to #temp with an inner sub-query containing the query that generates #temp e.g.
select distinct r.ReceiverID, r.ReceiverName, r.receiverdescription
from RFIDReceiver r
inner join (
select
a.ReceiverID,
a.AntennaID,
a.AntennaName
from RFIDReceiverAntenna a
full join Station b ON (a.ReceiverID = b.ReceiverID) and (a.AntennaID = b.AntennaID)
where (a.ReceiverID is NULL or b.ReceiverID is NULL)
and (a.AntennaID IS NULL or b.antennaID is NULL)
) t on r.ReceiverID = t.ReceiverID;
PS: I haven't made any effort to improve the query overall like Gordon has but do consider his suggestions.
First, a full join makes no sense in the first query. You are selecting only columns from the first table, so you need that.
Second, you can use a CTE.
Third, you should be able to get rid of the SELECT DISTINCT by using an EXISTS condition.
I would suggest:
WITH ra AS (
SELECT ra.*
FROM RFIDReceiverAntenna ra
Station s
ON s.ReceiverID = ra.ReceiverID AND
s.AntennaID = ra.AntennaID)
WHERE s.ReceiverID is NULL
)
SELECT r.ReceiverID, r.ReceiverName, r.receiverdescription
FROM RFIDReceiver r
WHERE EXISTS (SELECT 1
FROM ra
WHERE r.ReceiverID = ra.ReceiverID
);
You can use CTE instead of the temp table:
WITH
CTE
AS
(
SELECT
a.ReceiverID,
a.AntennaID,
a.AntennaName
FROM
RFIDReceiverAntenna a
full join Station b
ON (a.ReceiverID = b.ReceiverID)
and (a.AntennaID = b.AntennaID)
where
(a.ReceiverID is NULL or b.ReceiverID is NULL)
and (a.AntennaID IS NULL or b.antennaID is NULL)
)
select distinct
r.ReceiverID, r.ReceiverName, r.receiverdescription
from
RFIDReceiver r
inner join CTE t on r.ReceiverID = t.ReceiverID
;
This query will return the same results as your original query with the temp table, but its performance may be quite different; not necessarily slower, it can be faster. Just something that you should be aware about.

Problems shortening a SQL query

I am trying to make a query that works with a temp table, work without that temp table
I tried doing a join in the subquery without the temp table but I don't get the same results as the query with the temp table.
This is the query with the temp table that works as I want:
create table #results(
RowId id_t,
LastUpdatedAt date_T
)
insert into #results
select H.RowId, H.LastUpdatedAt from MemberCarrierMap M Join MemberCarrierMapHistory H on M.RowId = H.RowId
update MemberCarrierMap
set CreatedAt = (select MIN(LastUpdatedAt) from #results r where r.rowId = MemberCarrierMap.rowId)
Where CreatedAt is null;
and here is the query I tried without the temp table that doesn't work like the above:
update MemberCarrierMap
set CreatedAt = (select MIN(MH.LastUpdatedAt) from MemberCarrierMapHistory MH join MemberCarrierMap M on MH.RowId = M.RowId where MH.RowId = M.RowId )
Where CreatedAt is null;
I was expecting the 2nd query to work as the first but It is not. Any suggestions on how to achieve what the first query does without the temp table?
This should work:
update M
set M.CreatedAt = (select MIN(MH.LastUpdatedAt) from MemberCarrierMapHistory MH WHERE MH.RowId = M.RowId)
FROM MemberCarrierMap M
Where M.CreatedAt is null;
Your question is more or less a duplicate of this answer. There, you will find multiple solutions. But the ones that implement correlated subqueires are less performant than the one that simply uses an uncorrelated aggregation subquery inside a join.
Applying it to your situation, you will have this:
update m
set m.createdDate = hAgg.maxVal
from memberCarrierMap m
join (
select rowId, max(lastUpdatedAt) as maxVal
from memberCarrierMapHistory
group by rowId
) as hAgg
on m.rowId = hAgg.rowId
where m.createdAt is null;
Basically, it's more performant because it is more expensive to run aggregations and filterings on a row-by-row basis (which is what happens in a correlated subquery) than to just get the aggregations out of the way all at once (joins tend to happen early in processing) and perform the match afterwards.

Change existing sql to left join only on first match

Adding back some original info for historical purposes as I thought simplifying would help but it didn't. We have this stored procedure, in this part it is selecting records from table A (calldetail_reporting_agents) and doing a left join on table B (Intx_Participant). Apparently there are duplicate rows in table B being pulled that we DON'T want. Is there any easy way to change this up to only pick the first match on table B? Or will I need to rewrite the whole thing?
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
LEFT JOIN Intx_Participant ON calldetail_reporting_agents.CallID = Intx_Participant.CallIDKey
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN #LocStartDate AND #LocEndDate
AND (#LocQueue IS NULL OR AssignedWorkGroup = #LocQueue)
Simpler version: how to change below to select only first matching row from table B:
SELECT columnA, columnB FROM TableA LEFT JOIN TableB ON someColumn
I changed to this per the first answer and all data seems to look exactly as expected now. Thank you to everyone for the quick and attentive help.
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM Intx_Participant ip
WHERE calldetail_reporting_agents.CallID = ip.CallIDKey
AND calldetail_reporting_agents.RemoteNumber = ip.ConnValue
AND ip.HowEnded = '9'
AND ip.Recorded = '0'
AND ip.Duration > 0
AND ip.Role = '1') Intx_Participant
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN #LocStartDate AND #LocEndDate
AND (#LocQueue IS NULL OR AssignedWorkGroup = #LocQueue)
You can try to OUTER APPLY a subquery getting only one matching row.
...
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM intx_Participant ip
WHERE ip.callidkey = calldetail_reporting_agents.callid) intx_participant
WHERE ...
You should add an ORDER BY in the subquery. Otherwise it isn't deterministic which row is taken as the first. Or maybe that's not an issue.

Oracle SQL XOR condition with > 14 tables

I have a question on sql desgin.
Context:
I have a table called t_master and 13 other tables (lets call them a,b,c... for simplicity) where it needs to compared.
Logic:
t_master will be compared to table 'a' where t_master.gen_val =
a.value.
If record exist in t_master, retrieve t_master record, else retrieve 'a' record.
I do not need to retrieve the records if it exists in both tables (t_master and a) - XOR condition
Repeat this comparison with the remaining 12 tables.
I have some idea on doing this, using WITH to subquery the non-master tables (a,b,c...) first with their respective WHERE clause.
Then use XOR statement to retrieve the records.
Something like
WITH a AS (SELECT ...),
b AS (SELECT ...)
SELECT field1,field2...
FROM t_master FULL OUTER JOIN a FULL OUTER JOIN b FULL OUTER JOIN c...
ON t_master.gen_value = a.value
WHERE ((field1 = x OR field2 = y ) AND NOT (field1 = x AND field2 = y))
AND ....
.
.
.
.
Seeing that I have 13 tables that I need to full outer join, is there a better way/design to handle this?
Otherwise I would have at least 2*13 lines of WHERE clause which I'm not sure if that will have impact on the performance as t_master is sort of a log table.
**Assume I cant change any schema.
Currently I'm not sure if this SQL will working correctly yet, so I'm hoping someone can guide me in the right direction regarding this.
update from used_by_already's suggestion:
This is what I'm trying to do (comparison between 2 tables first, before I add more, but I am unable to get values from ATP_R.TBL_HI_HDR HI_HDR as it is in the NOT EXISTS subquery.
How do i overcome this?
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO JOIN ATP_R.TBL_HI_HDR HI_HDR ON LOG_REPO.GEN_VAL = HI_HDR.HI_NO
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_R.TBL_HI_HDR HI_HDR
WHERE LOG_REPO.GEN_VAL = HI_HDR.HI_NO
)
UNION ALL
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_R.TBL_HI_HDR HI_HDR JOIN ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO ON HI_HDR.HI_NO = LOG_REPO.GEN_VAL
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO
WHERE HI_HDR.HI_NO = LOG_REPO.GEN_VAL
)
Full outer joins used to exclude all matching rows can be an expensive query. You don't supply much detail, but perhaps using NOT EXISTS would be simpler and maybe it will produce a better explain plan. Something along these lines.
select
cola,colb,colc
from t_master m
where not exists (
select null from a where m.keycol = a.fk_to_m
)
and not exists (
select null from b where m.keycol = b.fk_to_m
)
and not exists (
select null from c where m.keycol = c.fk_to_m
)
union all
select
cola,colb,colc from a
where not exists (
select null from t_master m where a.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from b
where not exists (
select null from t_master m where b.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from c
where not exists (
select null from t_master m where c.fk_to_m = m.keycol
)
You could union the 13 a,b,c ... tables to simplify the coding, but that may not perform so well.

Select IN from multiple columns

So I have a table, lets call it MAIN table, it has the following example columns
Name,
Code_1,
Code_2,
Code_3,
Code_4,
Code_5
(in my real example there's 25 Code columns)
I have a set of 300 codes that I inserted into a temporary table, what would be the best way to get the rows from the MAIN table where it matches a code from the temporary table?
Here's what I have so far that works, but it seems extremely inefficient
SELECT * FROM MAIN WHERE (CODE_1 IN (SELECT CODE FROM TMP_TABLE)
OR CODE_2 IN(SELECT CODE FROM TMP_TABLE)
OR CODE_3 IN (SELECT CODE FROM TMP_TABLE)
OR CODE_4 IN (SELECT CODE FROM TMP_TABLE)
OR CODE_5 IN (SELECT CODE FROM TMP_TABLE))
One approach would be to use a correlated subquery:
SELECT *
FROM MAIN m
WHERE EXISTS (
SELECT *
FROM TMP_TABLE t
WHERE t.CODE = m.CODE_1 OR t.CODE = m.CODE_2 OR ...
)
A join would be faster
SELECT * FROM MAIN
inner join TMP_TABLE
on main.code_1 = tmp_table.code
or main.code_2 = tmp_table.code
or main.code_3 = tmp_table.code
or main.code_4 = tmp_table.code
or main.code_5 = tmp_table.code
But as mentioned in the comment, the join could potentially increase the number of rows if in the main table multiple code_## match the join criteria in the tmp_table