SQL INNER JOIN vs. WHERE ID IN(...) not the same results - sql

I was surprised by the outcome of these two queries. I was expecting same from both. I have two tables that share a common field but there is not a relationship set up. The table (A) has a field EventID varchar(10) and table (B) has a field XXNumber varchar(15).
Values from table B column XXNumber are referenced in table A column EventID. Even though XXNumber can hold 15 chars, none of the 179K rows of data is longer than 10 chars.
So the requirement was:
"To avoid Duplicate table B and table A entries, if the XXNumber is contained in a table A >“Event ID” number, then it should not be counted."
To see how many common records I have I ran this query first - call it query alpha"
SELECT dbo.TableB.XXNumber FROM dbo.TableB WHERE dbo.TableB.XXNumber in
( select distinct dbo.TableA.EventId FROM dbo.TableA )
The result was 5322 rows.
The following query - call it query delta which looks like this:
SELECT DISTINCT dbo.TableB.XXNumber, dbo.TableB.EventId
FROM dbo.TableB INNER JOIN dbo.TableA ON dbo.TableB.XXNumber= dbo.TableB.EventId
haas returned 4308 rows.
Shouldn't the resulting number of rows be the same?

The WHERE ID IN () version will select all rows that match each distinct value in the list (regardless of whether you code DISTINCT indide the inner select or not - that's irrelevant). If a given value appears in the parent table more than once, you'll get multipke rows selected from the parent table for that single value found in the child table.
The INNER JOIN version will select each row from the parent table once for every successful join, so if there are 3 rows in the child table with the value, and 2 in the parent, then there will be 6 rows rows in the result for that value.
To make them "the same", add 'DISTINCT' to your main select.
To explain what you're seeing, we'd need to know more about your actual data.

Related

How do I update every row of a particular column with values from another column?

I am trying to update a column called software_id with a possible three values from a lookup table. So update user softwares software id column with the values from the softwares table of the id column. But there are only three rows in the softwares table. In the user softwares table there are over 1000. But when I run the query below only three rows get updated and the rest are left as null.
So the softwares id column is number and goes 1,2,3 and the user_sofwares software_id column is all null.
When I run the following query.
UPDATE user_software c
SET sotware_id = (
SELECT DISTINCT id
FROM softwares
WHERE id = c.id
);
Only the first three rows of the destination table get updated. I want every row on the user_softwares column to be updated with either 1,2 or 3. So it should be 1,2,3,1,2,3 for example.
I keep getting this error whenever i tried to update all rows
Error report - ORA-01427: single-row subquery returns more than one row
You can do this with a MERGE statement:
MERGE INTO (SELECT software_id, 1, ora_hash(ROWID, 2) + 1 AS fake_id
FROM user_software) u_soft
USING (SELECT DISTINCT id
FROM softwares) sftw
ON (sftw.id = u_soft.fake_id)
WHEN MATCHED THEN UPDATE SET u_soft.software_id = sftw.id;
(considering your matches are unique)

Get the "most" optimal row in a JOIN

Problem
I have a situation in which I have two tables in which I would like the entries from table 2 (lets call it table_2) to be matched up with the entries in table 1 (table_1) such that there are no duplicates rows of table_2 used in the match up.
Discussion
Specifically, in this case there are datetime stamps in each table (field is utcdatetime). For each row in table_1, I want to find the row in table_2 in which has the closed utcdatetime to the table 1 utcdatetime such that the table2.utcdatetime is older than the table_1 utcdatetime and within 30 minutes of the table 1 utcdatetime. Here is the catch, I do not want any repeats. If a row in table 2 gets gobbled up in a match on an earlier row in table 1, then I do not want it considered for a match later.
This has currently been implemented in a Python routine, but it is slow to iterate over all of the rows in table 1 as it is large. I thought I was there with a single SQL statement, but I found that my current SQL results in duplicate table 2 rows in the output data.
I would recommend using a nested select to get whatever results you're looking for.
For instance:
select *
from person p
where p.name_first = 'SCCJS'
and not exists (select 'x' from person p2 where p2.person_id != p.person_id
and p.name_first = 'SCCJS' and p.name_last = 'SC')

Suggestion to make a massive insert

I have the tables below:
tb_profile tb_mbx tb_profile_mbx tb_profile_cd
id id id_profile id_perfil
cod_mat mbx id_mbx id_cd (matches id_mbx)
concil bp
masc
I need to create a query that when validating that the id_cd 1,2,4,5
and 6 exists in tb_profile_cd, perform an insert in the
tb_profile_mbx table with the cod_matrix parameters of the tb_profile
table.
Remembering that each concil has its ID in the tb_mbx table and a
concil has many cod_mat.
Another point is that the concil id_mbx represents the id_cd of the
tb_profile_cd table.
One more point is that as I said above, that a concil has many
cod_mat. I have around 20 thousand records for each concil.
For my need, try to consult the query below, but Oracle returned an error:
insert into tb_profile_mbx values (seq_profile_mbx.nextval,
(select id from tb_profile where concil like '%NEXXERA%')
,(select id from tb_mbx where mbx like '%NEXXERA%')
,null
,null);
ORA-01427: single line subquery returns more than one line
Would there be another way to do this query?
Thanks in advance!
You can insert all matching combinations of matches using:
insert into tb_profile_mbx (id_profile, id_mbx)
select p.id, m.id
from tb_profile p join
tb_mbx m
on p.concil like '%NEXXERA%' and m.mbx like '%NEXXERA%';
I would recommend running the select to see if it returns the values you want.
You only show two columns for tb_profile_mbx, so I've only included two columns in the insert.

SQL Selecting and Comparing Data from 2 tables

I have 2 tables with 1 column in common. I want to select all data from table1 and want to restrict it with a condition. The thing I could not do is writing the where condition with the non-common column.
Here is the code:
$sql="SELECT kategoriler.adi as katadi, urunler.* FROM `urunler`
LEFT JOIN `kategoriler`
ON urunler.kategori_id=kategoriler.Id
WHERE urunler.kategori_id=$id
OR
kategoriler.ust_id = $id";
ust_id is a parent column of kategori_id and kategoriler.Id. I want to select the child values of it.
(Btw as you can see the common columns are urunler.kategori_id and kategoriler.Id)

SQL Query - Ensure a row exists for each value in ()

Currently struggling with finding a way to validate 2 tables (efficiently lots of rows for Table A)
I have two tables
Table A
ID
A
B
C
Table matched
ID Number
A 1
A 2
A 9
B 1
B 9
C 2
I am trying to write a SQL Server query that basically checks to make sure for every value in Table A there exists a row for a variable set of values ( 1, 2,9)
The example above is incorrect because t should have for every record in A a corresponding record in Table matched for each value (1,2,9). The end goal is:
Table matched
ID Number
A 1
A 2
A 9
B 1
B 2
B 9
C 1
C 2
C 9
I know its confusing, but in general for every X in ( some set ) there should be a corresponding record in Table matched. I have obviously simplified things.
Please let me know if you all need clarification.
Use:
SELECT a.id
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
WHERE b.number IN (1, 2, 9)
GROUP BY a.id
HAVING COUNT(DISTINCT b.number) = 3
The DISTINCT in the COUNT ensures that duplicates (IE: A having two records in TABLE_B with the value "2") from being falsely considered a correct record. It can be omitted if the number column either has a unique or primary key constraint on it.
The HAVING COUNT(...) must equal the number of values provided in the IN clause.
Create a temp table of values you want. You can do this dynamically if the values 1, 2 and 9 are in some table you can query from.
Then, SELECT FROM tempTable WHERE NOT IN (SELECT * FROM TableMatched)
I had this situation one time. My solution was as follows.
In addition to TableA and TableMatched, there was a table that defined the rows that should exist in TableMatched for each row in TableA. Let’s call it TableMatchedDomain.
The application then accessed TableMatched through a view that controlled the returned rows, like this:
create view TableMatchedView
select a.ID,
d.Number,
m.OtherValues
from TableA a
join TableMatchedDomain d
left join TableMatched m on m.ID = a.ID and m.Number = d.Number
This way, the rows returned were always correct. If there were missing rows from TableMatched, then the Numbers were still returned but with OtherValues as null. If there were extra values in TableMatched, then they were not returned at all, as though they didn't exist. By changing the rows in TableMatchedDomain, this behavior could be controlled very easily. If a value were removed TableMatchedDomain, then it would disappear from the view. If it were added back again in the future, then the corresponding OtherValues would appear again as they were before.
The reason I designed it this way was that I felt that establishing an invarient on the row configuration in TableMatched was too brittle and, even worse, introduced redundancy. So I removed the restriction from groups of rows (in TableMatched) and instead made the entire contents of another table (TableMatchedDomain) define the correct form of the data.