Should I JOIN or should I UNION - sql

I have four different tables I am trying to query. Most of the querying happens on the first table, but if there is no match in Cars, I need to check fields in the other tables for a match on a VIN parameter.
Example:
Select
c.id,
c.VIN,
c.uniqueNumber,
c.anotheruniqueNumber
FROM Cars c, Boat b
WHERE
c.VIN = #VIN(parameter),
b.SerialNumber = #VIN
Now say that I have no match in Cars, but there is a match in Boat. How would I pull the matching Boat record instead of the car record? I have tried to JOIN the tables, but they have no shared identifier to join on.
I am trying to figure out the best way to search all the tables by one parameter with the least amount of code. I thought about using UNION ALL, but I am not sure that is what I really want for this situation, seeing as the number of records could get extremely large.
I am currently using SQL Server 2012. Thanks in advance!
UPDATED:
CAR table
ID  VIN         UniqueIdentifier  AnotherUniqueIdentifier
1   2002034434  HH54545445        2016-A23
2   2002035555  TT4242424242      2016-A24
3   1999034534  AGH0000034        2016-A25
BOAT table
ID  SerialNumber  Miscellaneous
1   32424234243   545454545445
2   65656565656   FF24242424242
3   20023232323   AGH333333333
Expected Result if #VIN parameter matches a Boat identifier:
BOAT
ID  SerialNumber  Miscellaneous
2   65656565656   FF24242424242

Some sort of union all might be the best approach -- at least the fastest with the right indexes:
select c.id, c.VIN, c.uniqueNumber, c.anotheruniqueNumber
from Cars c
where c.VIN = #VIN
union all
select b.id, b.VIN, b.uniqueNumber, b.anotheruniqueNumber
from Boats b
where b.VIN = #VIN and
      not exists (select 1 from Cars c where c.VIN = #VIN);
This assumes that you have the corresponding columns in each of the tables (which your question implies is true).
The chain of not exists can get longer as you add more entity types. A simple way around is to do sorting instead -- assuming you want only one row:
select top 1 x.*
from (select c.id, c.VIN, c.uniqueNumber, c.anotheruniqueNumber, 1 as priority
      from Cars c
      where c.VIN = #VIN
      union all
      select b.id, b.VIN, b.uniqueNumber, b.anotheruniqueNumber, 2 as priority
      from Boats b
      where b.VIN = #VIN
     ) x
order by priority;
There is a slight overhead for the order by. But frankly speaking, ordering 1-4 rows is trivial from a performance perspective.
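A minimal runnable sketch of the priority idea, using Python's built-in sqlite3 module so it can be tried without a SQL Server instance. Several things here are assumptions rather than part of the answer: SQLite has no TOP, so LIMIT 1 stands in for it; the Boats columns are aliased to line up with the Cars columns; and the `source` label, the `find_vehicle` helper and the trimmed-down sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cars  (id INTEGER, VIN TEXT, uniqueNumber TEXT);
    CREATE TABLE Boats (id INTEGER, SerialNumber TEXT, Miscellaneous TEXT);
    INSERT INTO Cars  VALUES (1, '2002034434', 'HH54545445'),
                             (2, '2002035555', 'TT4242424242');
    INSERT INTO Boats VALUES (1, '32424234243', '545454545445'),
                             (2, '65656565656', 'FF24242424242');
""")

# LIMIT 1 plays the role of SQL Server's TOP 1 (SQLite has no TOP).
QUERY = """
    SELECT source, id, identifier, extra
    FROM (SELECT 'CAR' AS source, id, VIN AS identifier,
                 uniqueNumber AS extra, 1 AS priority
          FROM Cars WHERE VIN = :vin
          UNION ALL
          SELECT 'BOAT', id, SerialNumber, Miscellaneous, 2
          FROM Boats WHERE SerialNumber = :vin) AS x
    ORDER BY priority
    LIMIT 1
"""

def find_vehicle(vin):
    """Return the lowest-priority-number match for vin, or None if no table has it."""
    return conn.execute(QUERY, {"vin": vin}).fetchone()

boat_match = find_vehicle("65656565656")   # only Boats has this value
car_match = find_vehicle("2002034434")     # Cars wins whenever it matches
no_match = find_vehicle("0000000000")      # nothing matches anywhere
```

Adding a third or fourth table is then one more UNION ALL branch with the next priority number, rather than a longer chain of not exists checks.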

Related

How do I find the name of someone who has multiple entries in a different table

I'm trying to figure out how to get through this type of scenario:
I have a Table A with names of people along with their IDs. I have a Table B with the people's IDs and the type of car that they bought from this dealership. Multiple people can have multiple entries if they buy multiple cars.
Let's say that David with ID 789 has bought multiple cars from this dealership, so he has multiple entries in Table B:
(ID | Car)
(----|----)
(789 | Toyota)
(789 | Ford)
I want to query these tables so that my results show all of the people who have bought a Toyota, but not a Ford.
SELECT name, id
FROM TableA a
JOIN TableB b ON a.id = b.id
WHERE a.id IN (SELECT b.id where Car = 'Toyota') AND
a.id NOT IN (SELECT b.id where Car = 'Ford')
I want to understand why this code does not bring back the IDs of people who have bought a Toyota but not a Ford, even if they bought multiple cars. What part of the logic am I missing?
Thanks in advance.
To start with why it isn't working: the subqueries have no FROM clause, so b.id and Car inside them refer to the current row of the outer join (they are relative, or correlated, references). When SQL evaluates a.id IN (SELECT b.id where Car = 'Toyota'), it only asks whether the current row is itself a Toyota row, so every Toyota row passes. When it then evaluates a.id NOT IN (SELECT b.id where Car = 'Ford') on one of those rows, the row is not a Ford row, so it passes that test too. The person's other rows in TableB are never consulted.
This fiddle should help to illustrate what's happening - https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=085cdad84898f4e498aa9a2e76947d93
The quick fix to this would be to use absolute references to TableB instead of a relative reference in the subquery as shown in this fiddle - https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=f95e3ef0ddab5290ee7b9fa13e7451ab
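For readers who cannot open the fiddles, here is a small sketch of the absolute-reference version, run through Python's sqlite3 module. The rows for Alice and Bob are invented sample data; only David's two rows come from the question. It is meant as an illustration of the idea, not a copy of the fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, name TEXT);
    CREATE TABLE TableB (id INTEGER, Car TEXT);
    INSERT INTO TableA VALUES (789, 'David'), (123, 'Alice'), (456, 'Bob');
    INSERT INTO TableB VALUES (789, 'Toyota'), (789, 'Ford'),
                              (123, 'Toyota'), (456, 'Ford');
""")

# Each subquery has its own FROM TableB, so it scans the whole table
# instead of looking only at the current outer row.
toyota_not_ford = conn.execute("""
    SELECT a.name, a.id
    FROM TableA a
    WHERE a.id IN (SELECT id FROM TableB WHERE Car = 'Toyota')
      AND a.id NOT IN (SELECT id FROM TableB WHERE Car = 'Ford')
""").fetchall()
```

With this data David is excluded because one of his rows is a Ford, and only Alice remains. Note that the outer JOIN to TableB is no longer needed, which also avoids repeating a person once per car. If TableB.id can be NULL, NOT IN can filter out every row; NOT EXISTS is the usual alternative in that case.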

SQL queries: how to get the value which appears the most in the total of two different tables?

Context: I want to know which vehicle brand appears the most in different accidents.
I have the table vehicle (v_number, brand).
Problem is, I have two different accident tables:
One refers to driven cars involved in an accident, let's call it acc_drive (v_number, acc_number, driver) [v_number FK vehicle]
The other refers to parked cars which are involved in an accident, let's call it acc_park (v_number, acc_number) [v_number FK vehicle, acc_number FK acc_drive]
Now, I'm trying to get the vehicle brand which appears the most in the total of the two tables. For example, if Audi cars appeared 2 times in acc_drive and 3 times in acc_park, the total number of appearances would be 5.
I'm having a really hard time trying to figure this out, so a helping hand would be much appreciated!
UNION ALL can be used to bring the tables together for the JOIN:
select v.brand, count(a.v_number)
from vehicle v left join
     ((select v_number
       from acc_drive
      ) union all
      (select v_number
       from acc_park
      )
     ) a
     on v.v_number = a.v_number
group by v.brand
order by count(a.v_number) desc; -- put the biggest numbers first
Note that this uses a left join. So brands with no accidents will be included in the results.
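The following sketch runs the same shape of query against a tiny invented data set through Python's sqlite3 module. The vehicles, drivers and accident numbers are assumptions chosen so that Audi appears 2 times in acc_drive and 3 times in acc_park, matching the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle   (v_number INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE acc_drive (v_number INTEGER, acc_number INTEGER, driver TEXT);
    CREATE TABLE acc_park  (v_number INTEGER, acc_number INTEGER);
    INSERT INTO vehicle   VALUES (1, 'Audi'), (2, 'Audi'), (3, 'BMW'), (4, 'Fiat');
    INSERT INTO acc_drive VALUES (1, 10, 'Ann'), (2, 11, 'Ben'), (3, 12, 'Cy');
    INSERT INTO acc_park  VALUES (1, 11), (1, 12), (2, 10);
""")

# UNION ALL stacks the two accident tables; COUNT(a.v_number) ignores the
# NULL produced by the LEFT JOIN for brands with no accidents.
brand_counts = conn.execute("""
    SELECT v.brand, COUNT(a.v_number) AS appearances
    FROM vehicle v
    LEFT JOIN (SELECT v_number FROM acc_drive
               UNION ALL
               SELECT v_number FROM acc_park) a
           ON v.v_number = a.v_number
    GROUP BY v.brand
    ORDER BY appearances DESC, v.brand
""").fetchall()
```

The first row is the brand with the most appearances; Fiat still shows up with a count of 0 because of the left join.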
Try this-
SELECT TOP 1 brand,COUNT(*)
FROM vehicle A
INNER JOIN acc_drive B ON A.v_number = B.v_number
INNER JOIN acc_park C ON A.v_number = C.v_number
GROUP BY brand
ORDER BY COUNT(*) DESC
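One thing worth checking before relying on two chained inner joins: they pair every acc_drive row with every acc_park row for the same vehicle, so the count is a product rather than a sum, and vehicles missing from either table drop out. The sketch below compares both shapes on invented sample data using Python's sqlite3 module; the table contents are assumptions, not data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle   (v_number INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE acc_drive (v_number INTEGER, acc_number INTEGER, driver TEXT);
    CREATE TABLE acc_park  (v_number INTEGER, acc_number INTEGER);
    INSERT INTO vehicle   VALUES (1, 'Audi'), (2, 'Audi'), (3, 'BMW');
    INSERT INTO acc_drive VALUES (1, 10, 'Ann'), (2, 11, 'Ben'), (3, 12, 'Cy');
    INSERT INTO acc_park  VALUES (1, 11), (1, 12), (2, 10);
""")

# Two chained inner joins: counts drive/park row pairs per vehicle.
both_joins = conn.execute("""
    SELECT v.brand, COUNT(*)
    FROM vehicle v
    JOIN acc_drive d ON v.v_number = d.v_number
    JOIN acc_park  p ON v.v_number = p.v_number
    GROUP BY v.brand
    ORDER BY COUNT(*) DESC
""").fetchall()

# Stacking the tables with UNION ALL first: counts appearances.
summed = conn.execute("""
    SELECT v.brand, COUNT(*)
    FROM vehicle v
    JOIN (SELECT v_number FROM acc_drive
          UNION ALL
          SELECT v_number FROM acc_park) a ON v.v_number = a.v_number
    GROUP BY v.brand
    ORDER BY COUNT(*) DESC
""").fetchall()
```

Here Audi has 2 driven and 3 parked appearances. The chained joins report 3 for Audi and lose BMW entirely (it was never parked), while the UNION ALL version reports 5 for Audi and 1 for BMW.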

SQL Query to fetch information based on one or more condition. Getting combinations instead of exact number

I have two tables. Table 1 has about 750,000 rows and table 2 has 4 million rows. Table two has an extra ID field in which I am interested, so I want to write a query that will check if the 750,000 table 1 records exist in table 2. For all those rows in table 1 that exist in table 2, I want the respective ID based on same SSN. I tried the following query:
SELECT distinct b.UID, a.*
FROM [Analysis].[dbo].[Table1] A, [Proteus_8_2].dbo.Table2 B
where a.ssn = b.ssn
Instead of getting 750,000 rows in the output, I am getting 5.4 million records. Where am I going wrong?
Please help.
Because b.UID is in your select list, DISTINCT cannot collapse anything if UID is unique in table two: every matching row from table two produces its own output row.
Also, if SSN is not unique in table one, you can get a row count higher than the total row count of table two.
You need to reconsider what you want from table two.
EDIT
You can try this to return distinct combinations of ssn and uid when ssn is found in table 2 provided that ssn and uid have a cardinality of 1:1, i.e., every unique ssn has a single unique uid.
select distinct
a.ssn,b.[UID]
from [Analysis].[dbo].[Table1] a
cross apply
( select top 1 [uid] from [Proteus_8_2].[dbo].[Table2] where ssn = a.ssn ) b
where b.[UID] is not null
Try with LEFT JOIN
SELECT distinct b.UID, a.*
FROM [Analysis].[dbo].[Table1] A LEFT JOIN [Proteus_8_2].dbo.Table2 B
on a.ssn = b.ssn
Since the order detail table is in a one-to-many relationship to the order table, that is the expected result of any join. If you want something different, you need to define for us the business rule that tells us how to select only one record from the order detail table. You cannot effectively write SQL code without understanding the business rules of what you are trying to achieve. You should never just willy-nilly select one record out of the many; you need to understand which one you want.
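To see where the extra rows come from, it helps to count how many table 2 rows share each SSN. The sketch below does that on a tiny invented data set with Python's sqlite3 module. The table contents are made up, and taking the smallest UID per SSN is only a placeholder business rule, assumed here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ssn TEXT, name TEXT);
    CREATE TABLE Table2 (uid INTEGER, ssn TEXT);
    INSERT INTO Table1 VALUES ('111', 'Ann'), ('222', 'Ben'), ('333', 'Cy');
    INSERT INTO Table2 VALUES (1, '111'), (2, '111'), (3, '111'), (4, '222');
""")

# The original query shape: DISTINCT cannot help because uid differs per row.
joined = conn.execute("""
    SELECT DISTINCT b.uid, a.ssn, a.name
    FROM Table1 a JOIN Table2 b ON a.ssn = b.ssn
""").fetchall()

# SSNs that match more than one Table2 row; these inflate the join.
duplicates = conn.execute("""
    SELECT ssn, COUNT(*) FROM Table2 GROUP BY ssn HAVING COUNT(*) > 1
""").fetchall()

# One row per matching SSN, with MIN(uid) as an assumed tie-break rule.
one_per_ssn = conn.execute("""
    SELECT a.ssn, MIN(b.uid)
    FROM Table1 a JOIN Table2 b ON a.ssn = b.ssn
    GROUP BY a.ssn
    ORDER BY a.ssn
""").fetchall()
```

Three Table1 rows produce four joined rows here because SSN 111 appears three times in Table2; the same effect at scale turns 750,000 rows into millions.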

SQL JOIN returning multiple rows when I only want one row

I am having a slow brain day...
The tables I am joining:
Policy_Office:
PolicyNumber  OfficeCode
1             A
2             B
3             C
4             D
5             A
Office_Info:
OfficeCode  AgentCode  OfficeName
A           123        Acme
A           456        Acme
A           789        Acme
B           111        Ace
B           222        Ace
B           333        Ace
...         ...        ....
I want to perform a search to return all policies that are affiliated with an office name. For example, if I search for "Acme", I should get two policies: 1 & 5.
My current query looks like this:
SELECT
*
FROM
Policy_Office P
INNER JOIN Office_Info O ON P.OfficeCode = O.OfficeCode
WHERE
O.OfficeName = 'Acme'
But this query returns multiple rows, which I know is because there are multiple matches from the second table.
How do I write the query to only return two rows?
SELECT DISTINCT a.PolicyNumber
FROM Policy_Office a
INNER JOIN Office_Info b
ON a.OfficeCode = b.OfficeCode
WHERE b.officeName = 'Acme'
To learn more about joins, see the article "Visual Representation of SQL Joins".
A simple join returns every matching pair of rows from the two tables. You have 2 rows with office code A in the first table and 3 rows with A in the second, so you get 2 x 3 = 6 results. If you want only the policy number, then you should do a distinct on it.
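The row counts can be confirmed with the question's own sample data. The sketch below loads it into an in-memory SQLite database via Python's sqlite3 module. The EXISTS variant at the end is an extra not mentioned in the answers; it is one common way to get each policy once without DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Policy_Office (PolicyNumber INTEGER, OfficeCode TEXT);
    CREATE TABLE Office_Info (OfficeCode TEXT, AgentCode INTEGER, OfficeName TEXT);
    INSERT INTO Policy_Office VALUES (1,'A'), (2,'B'), (3,'C'), (4,'D'), (5,'A');
    INSERT INTO Office_Info VALUES
        ('A', 123, 'Acme'), ('A', 456, 'Acme'), ('A', 789, 'Acme'),
        ('B', 111, 'Ace'),  ('B', 222, 'Ace'),  ('B', 333, 'Ace');
""")

# Plain join: 2 policies with code A times 3 Acme agent rows.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Policy_Office p
    JOIN Office_Info o ON p.OfficeCode = o.OfficeCode
    WHERE o.OfficeName = 'Acme'
""").fetchone()[0]

# DISTINCT on the policy number collapses the repeats.
distinct_policies = conn.execute("""
    SELECT DISTINCT p.PolicyNumber
    FROM Policy_Office p
    JOIN Office_Info o ON p.OfficeCode = o.OfficeCode
    WHERE o.OfficeName = 'Acme'
    ORDER BY p.PolicyNumber
""").fetchall()

# EXISTS only asks whether at least one matching office row exists.
exists_policies = conn.execute("""
    SELECT p.PolicyNumber
    FROM Policy_Office p
    WHERE EXISTS (SELECT 1 FROM Office_Info o
                  WHERE o.OfficeCode = p.OfficeCode
                    AND o.OfficeName = 'Acme')
    ORDER BY p.PolicyNumber
""").fetchall()
```

The plain join yields 6 rows, while both the DISTINCT and the EXISTS queries return policies 1 and 5.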
(using MS-Sqlserver)
I know this thread is 10 years old, but I don't like distinct (in my head it means that the engine gathers all possible data, hashes the selected columns of every row and adds each hash to an ordered tree; I may be wrong, but it seems inefficient).
Instead, I use a CTE and the row_number() function. The solution may very well be a much slower approach, but it's pretty, easy to maintain and I like it:
Given are a person table and a telephone table tied together with a foreign key (in the telephone table). This construct means that a person can have several numbers, but I only want the first, so that each person appears only once in the result set (I ought to be able to concatenate multiple telephone numbers into one string (pivot, I think), but that's another issue).
; -- don't forget this one!
with telephonenumbers
as
(
    select [id]
         , [person_id]
         , [number]
         , row_number() over (partition by [person_id] order by [activestart] desc) as rowno
    from [dbo].[telephone]
    where ([activeuntil] is null or [activeuntil] > getdate())
)
select p.[id]
     , p.[name]
     , t.[number]
from [dbo].[person] p
left join telephonenumbers t on t.person_id = p.id
    and t.rowno = 1
This does the trick (in fact the last line does), and the syntax is readable and easy to expand. The example is simple, but in large scripts that join tables left and right (literally), it is difficult to keep unwanted duplicates out of the result, and difficult to identify which tables create them. CTEs work great for me.
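The same pattern can be tried outside SQL Server. The sketch below ports it to SQLite through Python's sqlite3 module, under a few assumptions: window functions need SQLite 3.25 or newer, date('now') stands in for getdate(), and the people and phone numbers are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE telephone (id INTEGER PRIMARY KEY, person_id INTEGER,
                            number TEXT, activestart TEXT, activeuntil TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO telephone VALUES
        (1, 1, '555-0100', '2020-01-01', NULL),
        (2, 1, '555-0101', '2022-06-01', NULL),
        (3, 2, '555-0102', '2019-03-01', '2020-03-01');  -- expired number
""")

# rowno = 1 marks the most recently started active number per person;
# the LEFT JOIN keeps people who have no active number at all.
first_numbers = conn.execute("""
    WITH telephonenumbers AS (
        SELECT id, person_id, number,
               ROW_NUMBER() OVER (PARTITION BY person_id
                                  ORDER BY activestart DESC) AS rowno
        FROM telephone
        WHERE activeuntil IS NULL OR activeuntil > date('now')
    )
    SELECT p.id, p.name, t.number
    FROM person p
    LEFT JOIN telephonenumbers t
           ON t.person_id = p.id AND t.rowno = 1
    ORDER BY p.id
""").fetchall()
```

Ann has two active numbers but appears once, with the newer one. Ben's only number has expired and Cy has none, so both come back with a NULL number instead of disappearing from the result.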

Best way to filter union of data from 2 tables by value in shared 3rd table

For sake of example, let's assume 3 tables:
PHYSICAL_ITEM
    ID
    SELLER_ID
    NAME
    COST
    DIMENSIONS
    WEIGHT
DIGITAL_ITEM
    ID
    SELLER_ID
    NAME
    COST
    DOWNLOAD_PATH
SELLER
    ID
    NAME
Item IDs are guaranteed unique across both item tables. I want to select, in order, with a type label, all item IDs for a given seller. I've come up with:
Query A
SELECT PI.ID AS ID, 'PHYSICAL' AS TYPE
FROM PHYSICAL_ITEM PI
JOIN SELLER S ON PI.SELLER_ID = S.ID
WHERE S.NAME = 'name'
UNION
SELECT DI.ID AS ID, 'DIGITAL' AS TYPE
FROM DIGITAL_ITEM DI
JOIN SELLER S ON DI.SELLER_ID = S.ID
WHERE S.NAME = 'name'
ORDER BY ID
Query B
SELECT ITEM.ID, ITEM.TYPE
FROM (SELECT ID, SELLER_ID, 'PHYSICAL' AS TYPE
FROM PHYSICAL_ITEM
UNION
SELECT ID, SELLER_ID, 'DIGITAL' AS TYPE
FROM DIGITAL_ITEM) AS ITEM
JOIN SELLER ON ITEM.SELLER_ID = SELLER.ID
WHERE SELLER.NAME = 'name'
ORDER BY ITEM.ID
Query A seems like it would be the most efficient, but it also looks unnecessarily duplicative (2 table joins to the same table, 2 where clauses on the same table column). Query B looks cleaner in a way to me (no duplication), but it also looks much less efficient, since it has a subquery. Is there a way to get the best of both worlds, so to speak?
In both cases, replace the union with union all. Union unnecessarily removes duplicates.
I would expect Query A to be more efficient, because the optimizer has more information when doing the join (although I think Oracle is pretty good with using indexes even after a union). In addition, the first query reduces the amount of data before the union.
This is, however, only an opinion. The real test is to time the two queries -- multiple times to avoid cache fill delays -- to see which is better.
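As a small check of the union all advice, the sketch below runs Query A both ways on invented data with Python's sqlite3 module. The item rows, the second seller and the `items_for` helper are assumptions for illustration, and SQLite is used here rather than Oracle, so it says nothing about timing; it only shows that the two forms return the same rows when item IDs are unique across both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SELLER        (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE PHYSICAL_ITEM (ID INTEGER PRIMARY KEY, SELLER_ID INTEGER, NAME TEXT);
    CREATE TABLE DIGITAL_ITEM  (ID INTEGER PRIMARY KEY, SELLER_ID INTEGER, NAME TEXT);
    INSERT INTO SELLER VALUES (1, 'name'), (2, 'other');
    INSERT INTO PHYSICAL_ITEM VALUES (10, 1, 'desk'), (12, 2, 'lamp'), (14, 1, 'chair');
    INSERT INTO DIGITAL_ITEM  VALUES (11, 1, 'ebook'), (13, 2, 'song');
""")

# Query A with the set operator left as a placeholder so both forms can run.
QUERY_A = """
    SELECT PI.ID AS ID, 'PHYSICAL' AS TYPE
    FROM PHYSICAL_ITEM PI JOIN SELLER S ON PI.SELLER_ID = S.ID
    WHERE S.NAME = :seller
    {op}
    SELECT DI.ID AS ID, 'DIGITAL' AS TYPE
    FROM DIGITAL_ITEM DI JOIN SELLER S ON DI.SELLER_ID = S.ID
    WHERE S.NAME = :seller
    ORDER BY ID
"""

def items_for(seller, op):
    """Run Query A for one seller with the given set operator."""
    return conn.execute(QUERY_A.format(op=op), {"seller": seller}).fetchall()

with_union = items_for("name", "UNION")
with_union_all = items_for("name", "UNION ALL")
```

Because the type labels differ and the IDs never collide, there are no duplicates for UNION to remove, so UNION ALL gives the same answer without the de-duplication step.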