Select IN from multiple columns - sql

So I have a table, let's call it MAIN, with the following example columns:
Name,
Code_1,
Code_2,
Code_3,
Code_4,
Code_5
(In my real example there are 25 Code columns.)
I have a set of 300 codes that I inserted into a temporary table. What would be the best way to get the rows from the MAIN table where any Code column matches a code from the temporary table?
Here's what I have so far. It works, but it seems extremely inefficient:
SELECT * FROM MAIN WHERE (CODE_1 IN (SELECT CODE FROM TMP_TABLE)
OR CODE_2 IN(SELECT CODE FROM TMP_TABLE)
OR CODE_3 IN (SELECT CODE FROM TMP_TABLE)
OR CODE_4 IN (SELECT CODE FROM TMP_TABLE)
OR CODE_5 IN (SELECT CODE FROM TMP_TABLE))

One approach would be to use a correlated subquery:
SELECT *
FROM MAIN m
WHERE EXISTS (
SELECT *
FROM TMP_TABLE t
WHERE t.CODE = m.CODE_1 OR t.CODE = m.CODE_2 OR ...
)

A join may be faster:
SELECT * FROM MAIN
inner join TMP_TABLE
on main.code_1 = tmp_table.code
or main.code_2 = tmp_table.code
or main.code_3 = tmp_table.code
or main.code_4 = tmp_table.code
or main.code_5 = tmp_table.code
But as mentioned in the comment, the join can increase the number of rows: a MAIN row is returned once for every TMP_TABLE row that matches any of its code_N columns.
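The row-multiplication caveat is easy to reproduce on toy data. The following is only a sketch using Python's built-in sqlite3 module with two code columns instead of 25; the sample rows are invented for illustration and the SQL is the same shape as the answers above.

```python
import sqlite3

# In-memory toy data (hypothetical rows, only two code columns for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main (name TEXT, code_1 TEXT, code_2 TEXT);
    CREATE TABLE tmp_table (code TEXT);
    INSERT INTO main VALUES ('both', 'A', 'B'), ('one', 'A', 'Z'), ('none', 'Y', 'Z');
    INSERT INTO tmp_table VALUES ('A'), ('B');
""")

# JOIN with OR: one output row per matching tmp_table row.
join_names = [r[0] for r in conn.execute("""
    SELECT m.name
    FROM main m
    JOIN tmp_table t ON m.code_1 = t.code OR m.code_2 = t.code
    ORDER BY m.name
""")]

# EXISTS: each main row is returned at most once.
exists_names = [r[0] for r in conn.execute("""
    SELECT m.name
    FROM main m
    WHERE EXISTS (SELECT 1 FROM tmp_table t
                  WHERE t.code = m.code_1 OR t.code = m.code_2)
    ORDER BY m.name
""")]

print(join_names)    # ['both', 'both', 'one']
print(exists_names)  # ['both', 'one']
```

If the join form is preferred, SELECT DISTINCT on the MAIN columns would remove the repeats, at the cost of an extra de-duplication step.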

Related

Insert rows from TableA that are not in TableB using concatenated keys

I have 2 tables. TableA gets populated by a csv import and typically contains between 10k and 15k rows. TableB has the same structure, and has now grown to about 95k rows. In order to determine rows in TableA that are not in TableB, I need to compare a concatenation of 4 fields in TableA with the same concatenation in TableB.
The code below worked while TableB was small, but as TableB has grown it now takes so long that it has to be cancelled before it finishes.
I strongly believe that the use of concatenated fields as a comparison is causing execution times to grow beyond usability.
Is there a better approach to the problem?
DELETE FROM billing..whse_Temp
BULK INSERT billing..whse_Temp
FROM '/mnt/ABC/ABC.csv'
WITH
(
FORMAT='csv',
FIRSTROW=2,
FIELDTERMINATOR=',',
ROWTERMINATOR='\r\n'
)
INSERT INTO billing..whse
SELECT * FROM billing..whse_Temp S
WHERE CONCAT(S.RunTimeStamp, S.CS_Datacenter,S.Customer, S.ServerName) NOT IN
(
SELECT CONCAT(RunTimeStamp, CS_Datacenter, Customer, ServerName)
FROM billing..whse
)
Simply use NOT EXISTS:
INSERT INTO billing..whse
SELECT * FROM billing..whse_temp S
WHERE NOT EXISTS
(
SELECT NULL
FROM billing..whse w
WHERE w.runtimestamp = s.runtimestamp
AND w.cs_datacenter = s.cs_datacenter
AND w.customer = s.customer
AND w.servername = s.servername
);
The appropriate index for this:
CREATE INDEX idx ON billing..whse (runtimestamp, cs_datacenter, customer, servername);
I'm sure there are ways to do it with a MERGE command, but I've never really used those. I'm sure there are also ways with EXISTS across multiple columns, but I personally find it clearer to write the full join condition and then test for rows where the join failed (i.e. no row on the right side):
INSERT INTO billing..whse
SELECT S.*
FROM billing..whse_Temp S
LEFT OUTER JOIN billing..whse W
ON S.RunTimeStamp = W.RunTimeStamp
AND S.CS_Datacenter = W.CS_Datacenter
AND S.Customer = W.Customer
AND S.ServerName = W.ServerName
WHERE W.RunTimeStamp IS NULL
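Besides speed, comparing column by column avoids a correctness hazard of concatenated keys: different column values can concatenate to the same string. The sketch below uses Python's sqlite3 module, with SQLite's || operator standing in for CONCAT and the tables cut down to two columns; the data is made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE whse (customer TEXT, servername TEXT);
    CREATE TABLE whse_temp (customer TEXT, servername TEXT);
    INSERT INTO whse VALUES ('ab', 'c');
    -- ('a', 'bc') is a genuinely new row, but concatenates to 'abc' as well
    INSERT INTO whse_temp VALUES ('ab', 'c'), ('a', 'bc'), ('x', 'y');
""")

# Concatenated comparison: wrongly treats ('a', 'bc') as already present.
concat_new = conn.execute("""
    SELECT customer, servername FROM whse_temp s
    WHERE s.customer || s.servername NOT IN
          (SELECT customer || servername FROM whse)
    ORDER BY customer
""").fetchall()

# NOT EXISTS, column by column.
not_exists_new = conn.execute("""
    SELECT customer, servername FROM whse_temp s
    WHERE NOT EXISTS (SELECT 1 FROM whse w
                      WHERE w.customer = s.customer
                        AND w.servername = s.servername)
    ORDER BY customer
""").fetchall()

# LEFT JOIN ... IS NULL, the same anti-join written as a join.
left_join_new = conn.execute("""
    SELECT s.customer, s.servername
    FROM whse_temp s
    LEFT JOIN whse w ON w.customer = s.customer
                    AND w.servername = s.servername
    WHERE w.customer IS NULL
    ORDER BY s.customer
""").fetchall()

print(concat_new)      # [('x', 'y')]
print(not_exists_new)  # [('a', 'bc'), ('x', 'y')]
print(left_join_new)   # [('a', 'bc'), ('x', 'y')]
```

The two column-wise forms agree with each other here; which one the optimizer handles better on the real tables is something to check against the actual execution plan.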

Change existing sql to left join only on first match

Adding back some original info for historical purposes as I thought simplifying would help but it didn't. We have this stored procedure, in this part it is selecting records from table A (calldetail_reporting_agents) and doing a left join on table B (Intx_Participant). Apparently there are duplicate rows in table B being pulled that we DON'T want. Is there any easy way to change this up to only pick the first match on table B? Or will I need to rewrite the whole thing?
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
LEFT JOIN Intx_Participant ON calldetail_reporting_agents.CallID = Intx_Participant.CallIDKey
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN @LocStartDate AND @LocEndDate
AND (@LocQueue IS NULL OR AssignedWorkGroup = @LocQueue)
Simpler version: how do I change the query below to select only the first matching row from table B?
SELECT columnA, columnB FROM TableA LEFT JOIN TableB ON someColumn
I changed to this per the first answer and all data seems to look exactly as expected now. Thank you to everyone for the quick and attentive help.
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM Intx_Participant ip
WHERE calldetail_reporting_agents.CallID = ip.CallIDKey
AND calldetail_reporting_agents.RemoteNumber = ip.ConnValue
AND ip.HowEnded = '9'
AND ip.Recorded = '0'
AND ip.Duration > 0
AND ip.Role = '1') Intx_Participant
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN @LocStartDate AND @LocEndDate
AND (@LocQueue IS NULL OR AssignedWorkGroup = @LocQueue)
You can try to OUTER APPLY a subquery getting only one matching row.
...
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM intx_Participant ip
WHERE ip.callidkey = calldetail_reporting_agents.callid) intx_participant
WHERE ...
You should add an ORDER BY to the subquery; otherwise which row TOP 1 returns is not deterministic. If any matching row will do, that may not matter.
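The "first match only" idea, including the ORDER BY that makes it deterministic, can be tried outside SQL Server. SQLite has no OUTER APPLY, so this sketch (Python's sqlite3 module, invented table and column names) joins on a correlated subquery with ORDER BY ... LIMIT 1 as a stand-in for OUTER APPLY (SELECT TOP 1 ...); it assumes the right-hand table has a unique id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (call_id TEXT PRIMARY KEY);
    CREATE TABLE participants (
        id INTEGER PRIMARY KEY, call_id_key TEXT, duration INTEGER);
    INSERT INTO calls VALUES ('c1'), ('c2');
    -- two participant rows for the same call: the unwanted duplicate
    INSERT INTO participants (call_id_key, duration) VALUES ('c1', 30), ('c1', 45);
""")

# Plain LEFT JOIN: call c1 comes back twice.
plain = conn.execute("""
    SELECT c.call_id, p.duration
    FROM calls c
    LEFT JOIN participants p ON p.call_id_key = c.call_id
    ORDER BY c.call_id, p.duration
""").fetchall()

# Join only to one chosen row per call; the ORDER BY decides which one
# (here: longest duration, ties broken by id).
first_only = conn.execute("""
    SELECT c.call_id, p.duration
    FROM calls c
    LEFT JOIN participants p
      ON p.id = (SELECT p2.id
                 FROM participants p2
                 WHERE p2.call_id_key = c.call_id
                 ORDER BY p2.duration DESC, p2.id
                 LIMIT 1)
    ORDER BY c.call_id
""").fetchall()

print(plain)       # [('c1', 30), ('c1', 45), ('c2', None)]
print(first_only)  # [('c1', 45), ('c2', None)]
```

Calls without any participant row are still returned with NULL, which is the behaviour OUTER APPLY gives as well.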

Oracle SQL XOR condition with > 14 tables

I have a question on SQL design.
Context:
I have a table called t_master and 13 other tables (let's call them a, b, c... for simplicity) that it needs to be compared against.
Logic:
t_master will be compared to table 'a' where t_master.gen_val = a.value.
If the record exists only in t_master, retrieve the t_master record; if it exists only in 'a', retrieve the 'a' record.
I do not need to retrieve records that exist in both tables (t_master and a), i.e. an XOR condition.
Repeat this comparison with the remaining 12 tables.
My idea is to use WITH to subquery the non-master tables (a, b, c...) first, each with its own WHERE clause,
then use an XOR-style condition to retrieve the records.
Something like:
WITH a AS (SELECT ...),
b AS (SELECT ...)
SELECT field1,field2...
FROM t_master FULL OUTER JOIN a FULL OUTER JOIN b FULL OUTER JOIN c...
ON t_master.gen_value = a.value
WHERE ((field1 = x OR field2 = y ) AND NOT (field1 = x AND field2 = y))
AND ....
.
.
.
.
Seeing that I have 13 tables that I need to full outer join, is there a better way/design to handle this?
Otherwise I would have at least 2*13 lines of WHERE clause, and I'm not sure whether that will hurt performance, as t_master is a kind of log table.
Assume I can't change any schema.
I'm not yet sure whether this SQL works correctly, so I'm hoping someone can point me in the right direction.
Update, following used_by_already's suggestion:
This is what I'm trying to do (a comparison between 2 tables first, before I add more), but I am unable to get values from ATP_R.TBL_HI_HDR HI_HDR as it is in the NOT EXISTS subquery.
How do I overcome this?
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO JOIN ATP_R.TBL_HI_HDR HI_HDR ON LOG_REPO.GEN_VAL = HI_HDR.HI_NO
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_R.TBL_HI_HDR HI_HDR
WHERE LOG_REPO.GEN_VAL = HI_HDR.HI_NO
)
UNION ALL
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_R.TBL_HI_HDR HI_HDR JOIN ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO ON HI_HDR.HI_NO = LOG_REPO.GEN_VAL
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO
WHERE HI_HDR.HI_NO = LOG_REPO.GEN_VAL
)
Full outer joins used to exclude all matching rows can be an expensive query. You don't supply much detail, but perhaps using NOT EXISTS would be simpler and maybe it will produce a better explain plan. Something along these lines.
select
cola,colb,colc
from t_master m
where not exists (
select null from a where m.keycol = a.fk_to_m
)
and not exists (
select null from b where m.keycol = b.fk_to_m
)
and not exists (
select null from c where m.keycol = c.fk_to_m
)
union all
select
cola,colb,colc from a
where not exists (
select null from t_master m where a.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from b
where not exists (
select null from t_master m where b.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from c
where not exists (
select null from t_master m where c.fk_to_m = m.keycol
)
You could union the 13 a,b,c ... tables to simplify the coding, but that may not perform so well.
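A two-table version of this pattern can be run as a sanity check. The sketch below uses Python's sqlite3 module with invented rows; it also shows one possible way to return columns from both sides, namely selecting from a single table in each branch and padding the other side's columns with NULL, rather than joining to the table that the NOT EXISTS rules out. The column alias r_value is borrowed from the question; everything else is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_master (gen_val TEXT);
    CREATE TABLE a (value TEXT);
    INSERT INTO t_master VALUES ('m_only'), ('shared');
    INSERT INTO a VALUES ('shared'), ('a_only');
""")

xor_rows = conn.execute("""
    -- rows only in t_master: the 'a' side is padded with NULL
    SELECT m.gen_val AS gen_val, NULL AS r_value
    FROM t_master m
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.value = m.gen_val)
    UNION ALL
    -- rows only in a: the t_master side is padded with NULL
    SELECT NULL, a.value
    FROM a
    WHERE NOT EXISTS (SELECT 1 FROM t_master m WHERE m.gen_val = a.value)
""").fetchall()

# 'shared' exists on both sides and is therefore excluded.
for row in xor_rows:
    print(row)
```

Joining a table and, in the same branch, requiring NOT EXISTS against that same table on the same key can never return a row, which is why each branch here reads from one side only.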

How to join 100 random rows from table 1 multiple other tables in oracle

I have scrapped my previous question as I did not do a good job explaining. Maybe this will be simpler.
I have the following query.
Select * from comp_eval_hdr, comp_eval_pi_xref, core_pi, comp_eval_dtl
where comp_eval_hdr.START_DATE between TO_DATE('01-JAN-16' , 'DD-MON-YY')
and TO_DATE('12-DEC-17' , 'DD-MON-YY')
and comp_eval_hdr.COMP_EVAL_ID = comp_eval_dtl.COMP_EVAL_ID
and comp_eval_hdr.COMP_EVAL_ID = comp_eval_pi_xref.COMP_EVAL_ID
and core_pi.PI_ID = comp_eval_pi_xref.PI_ID
and core_pi.PROGRAM_CODE = 'PS'
Now if I only want a random 100 rows from the comp_eval_hdr table to join with the other tables how would I go about it? If it makes it easier you can disregard the comp_eval_dtl table.
I think you are pretty much there. You just need subqueries, table aliases, and JOIN conditions:
SELECT . . .
FROM (SELECT a.*
FROM (SELECT a.*
FROM a
WHERE a.START_DATE BETWEEN DATE '2016-01-01' AND DATE '2017-12-12'
ORDER BY DBMS_RANDOM.VALUE
) a
WHERE ROWNUM <= 100
) a JOIN
mapping m
ON a.? = m.? JOIN
b
ON m.? = b.?;
The ? is just a placeholder for the join columns.
It's a bit of a stretch to know what you want with the question as written but here's my attempt.
WITH rand_list AS
(SELECT * FROM comp_eval_hdr
WHERE comp_eval_hdr.START_DATE BETWEEN TO_DATE('01-JAN-16' , 'DD-MON-YY') AND TO_DATE('12-DEC-17' , 'DD-MON-YY')
ORDER BY DBMS_RANDOM.VALUE),
first_100 AS
(SELECT *
FROM rand_list
WHERE ROWNUM <=100)
SELECT md.col_1, t3.col_a
FROM first_100 md
INNER JOIN
table2 t2 ON md.id_column = t2.fk_comp_eval_hdr_id
INNER JOIN
table3 t3 ON t3.id_column = t2.fk_table3_id
You haven't given any indication how they join or the table names and obviously I haven't run this against any mock tables.
You've got a list of randomised records with RAND_LIST which you could, if you wanted, combine with the FIRST_100 query (your choice).
The main query then just joins that through your mapping table (T2) to your 'multiples' table (T3).
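The sample-first, join-second shape can be checked on generated data. This sketch uses Python's sqlite3 module, where ORDER BY RANDOM() LIMIT 100 plays the role of ORDER BY DBMS_RANDOM.VALUE followed by ROWNUM <= 100 in Oracle; the tables hdr and dtl and their contents are made up, with two detail rows per header.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hdr (comp_eval_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE dtl (comp_eval_id INTEGER, line_no INTEGER)")
conn.executemany("INSERT INTO hdr VALUES (?)", [(i,) for i in range(1, 501)])
conn.executemany(
    "INSERT INTO dtl VALUES (?, ?)",
    [(i, n) for i in range(1, 501) for n in (1, 2)],
)

# Pick 100 random header rows first, then join the detail rows to them.
rows = conn.execute("""
    SELECT h.comp_eval_id, d.line_no
    FROM (SELECT comp_eval_id
          FROM hdr
          ORDER BY RANDOM()
          LIMIT 100) h
    JOIN dtl d ON d.comp_eval_id = h.comp_eval_id
""").fetchall()

sampled_ids = {r[0] for r in rows}
print(len(sampled_ids))  # 100 distinct headers
print(len(rows))         # 200 joined rows (two details per header)
```

Limiting inside the subquery is what keeps the sample at 100 header rows; limiting after the join would instead cap the number of joined rows.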
What does table 2 look like? Let me give an example using a person table and an orders table:
select * from (
select * from person ps, orders o where ps.city = 'mumbai' and ps.id = o.purchasedby ) porder where rownum <= 100
I have not tested it, but it will look something like this.

Ms Access query to SQL Server - DistinctRow

What would the syntax be to convert this MS Access query to run in SQL Server, which doesn't have a DISTINCTROW keyword?
UPDATE DISTINCTROW [MyTable]
INNER JOIN [AnotherTable] ON ([MyTable].J5BINB = [AnotherTable].GKBINB)
AND ([MyTable].J5BHNB = [AnotherTable].GKBHNB)
AND ([MyTable].J5BDCD = [AnotherTable].GKBDCD)
SET [AnotherTable].TessereCorso = [MyTable].[J5F7NR];
DISTINCTROW [MyTable] removes duplicate MyTable entries from the results. Example:
select distinctrow items
items.item_number, items.name
from items
join orders on orders.item_id = items.id;
Although the join returns the same item_number and name multiple times when there is more than one order for an item, DISTINCTROW reduces this to one row per item. So the join merely ensures that you only select items for which at least one order exists. As far as I know you won't find DISTINCTROW in any other DBMS, probably because it is not needed: when checking for existence, we use EXISTS (or IN, for that matter).
You are joining MyTable and AnotherTable and for some reason expect to get the same MyTable record multiple times for one AnotherTable record, so you use DISTINCTROW to get it only once. Your query would (hopefully) fail if you got two different MyTable records for one AnotherTable record.
What the update does is:
update anothertable
set tesserecorso = (select top 1 j5f7nr from mytable where mytable.j5binb = anothertable.gkbinb and ...)
where exists (select * from mytable where mytable.j5binb = anothertable.gkbinb and ...)
But this uses about the same subquery twice. So we'd want to update from a query instead.
The easiest way to get one result record per <some columns> in a standard SQL query is to aggregate data:
select *
from anothertable a
join
(
select j5binb, j5bhnb, j5bdcd, max(j5f7nr) as j5f7nr
from mytable
group by j5binb, j5bhnb, j5bdcd
) m on m.j5binb = a.gkbinb and m.j5bhnb = a.gkbhnb and m.j5bdcd = a.gkbdcd;
How to write an updatable query differs from one DBMS to another. Here is the final update statement for SQL Server:
update a
set a.tesserecorso = m.j5f7nr
from anothertable a
join
(
select j5binb, j5bhnb, j5bdcd, max(j5f7nr) as j5f7nr
from mytable
group by j5binb, j5bhnb, j5bdcd
) m on m.j5binb = a.gkbinb and m.j5bhnb = a.gkbhnb and m.j5bdcd = a.gkbdcd;
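To see what collapsing duplicates with MAX does to the update, here is a small check using Python's sqlite3 module. It is only a sketch: it uses a single key column instead of three, invented values, and the correlated-subquery form of the update rather than SQL Server's UPDATE ... FROM, so that it runs on any SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE anothertable (gkbinb INTEGER, tesserecorso INTEGER);
    CREATE TABLE mytable (j5binb INTEGER, j5f7nr INTEGER);
    INSERT INTO anothertable VALUES (1, NULL), (2, NULL), (3, NULL);
    -- key 1: exact duplicate rows; key 2: two different values; key 3: no match
    INSERT INTO mytable VALUES (1, 10), (1, 10), (2, 20), (2, 25);
""")

# One value per key via MAX; rows without a match are left untouched.
conn.execute("""
    UPDATE anothertable
    SET tesserecorso = (SELECT MAX(m.j5f7nr)
                        FROM mytable m
                        WHERE m.j5binb = anothertable.gkbinb)
    WHERE EXISTS (SELECT 1
                  FROM mytable m
                  WHERE m.j5binb = anothertable.gkbinb)
""")

result = conn.execute(
    "SELECT gkbinb, tesserecorso FROM anothertable ORDER BY gkbinb"
).fetchall()
print(result)  # [(1, 10), (2, 25), (3, None)]
```

Note that for key 2 the aggregate silently picks the larger value, whereas the Access query was described above as (hopefully) failing when two different MyTable records match; whether MAX is an acceptable tie-break depends on the data.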
The DISTINCTROW predicate in MS Access SQL removes duplicates across all fields of a table in join statements, not just across the selected fields of the query (which is what DISTINCT does in practically all SQL dialects). So consider selecting all fields in a derived table with the DISTINCT predicate:
UPDATE [AnotherTable]
SET [AnotherTable].TessereCorso = main.[J5F7NR]
FROM
(SELECT DISTINCT m.* FROM [MyTable] m) As main
INNER JOIN [AnotherTable]
ON (main.J5BINB = [AnotherTable].GKBINB)
AND (main.J5BHNB = [AnotherTable].GKBHNB)
AND (main.J5BDCD = [AnotherTable].GKBDCD)
Another variant of the query (too lazy to get the original tables).
Like the query above, this one updates 35 rows:
UPDATE [Albi-Anagrafe-Associati]
SET
[Albi-Anagrafe-Associati].CRegDitte = [055- Registri ditte].[CRegDitte],
[Albi-Anagrafe-Associati].NIscrTribunale = [055- Registri ditte].[NIscrTribunale],
[Albi-Anagrafe-Associati].NRegImprese = [055- Registri ditte].[NRegImprese]
FROM [055- Registri ditte]
WHERE EXISTS(
SELECT *
FROM [055- Registri ditte]-- [Albi-Anagrafe-Associati]
WHERE ([055- Registri ditte].GIBINB = [Albi-Anagrafe-Associati].GKBINB)
AND ([055- Registri ditte].GIBHNB = [Albi-Anagrafe-Associati].GKBHNB)
AND ([055- Registri ditte].GIBDCD = [Albi-Anagrafe-Associati].GKBDCD))
Update [AnotherTable]
Set [AnotherTable].TessereCorso = MyTable.[J5F7NR]
From [AnotherTable]
Inner Join
(
Select Distinct [J5BINB],[J5BHNB],[J5BDCD]
,(Select Top 1 [J5F7NR] From MyTable) as [J5F7NR]
From MyTable
)as MyTable
On (MyTable.J5BINB = [AnotherTable].GKBINB)
AND (MyTable.J5BHNB = [AnotherTable].GKBHNB)
AND (MyTable.J5BDCD = [AnotherTable].GKBDCD)