Sub-query works but would a join or other alternative be better? - sql

I am trying to select rows from one table where the id referenced in those rows matches the unique id from another table that relates to it like so:
SELECT *
FROM booklet_tickets
WHERE bookletId = (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
With the bookletNum/seasonId/bookletTypeId being filled in by a user form and inserted into the query.
This works and returns what I want but seems messy. Is a join better to use in this type of scenario?

If there is any chance that your subquery returns more than one value, you should use IN instead of =:
SELECT *
FROM booklet_tickets
WHERE bookletId in (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
That said, I would prefer EXISTS over IN:
SELECT *
FROM booklet_tickets bt
WHERE EXISTS (SELECT 1
FROM booklets b
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3
AND b.id = bt.bookletId)

It is not possible to give a "Yes it's better" or "no it's not" answer for this type of scenario.
My personal rule of thumb: if a table has fewer than 1 million rows, I do not bother optimising "SELECT WHERE IN" types of queries, because the SQL Server query optimizer is smart enough to pick an appropriate plan for them.
In reality, however, you often need more values from the joined table in the final result set, so a JOIN with a filtering WHERE clause might make more sense, such as:
SELECT BT.*, B.SeasonId
FROM booklet_tickets BT
INNER JOIN booklets B ON BT.bookletId = B.id
WHERE B.bookletNum = 2000
AND B.seasonId = 9
AND B.bookletTypeId = 3
To me it comes down to style more than anything else: write your code so that it will be easy for you to understand months later. Pick a style and then stick to it :)
The question, however, is as old as time itself :)
SQL JOIN vs IN performance?
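As a rough check that the IN, EXISTS and JOIN forms above are interchangeable for this lookup, here is a minimal sketch that runs all three against an in-memory SQLite database from Python. The sample rows are made up for illustration; only the table and column names come from the question. It also passes the three form values as bound parameters instead of inserting them into the SQL text, which is an assumption about how the form input could be handled.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booklets (
        id INTEGER PRIMARY KEY,
        bookletNum INTEGER, seasonId INTEGER, bookletTypeId INTEGER
    );
    CREATE TABLE booklet_tickets (id INTEGER PRIMARY KEY, bookletId INTEGER);
    INSERT INTO booklets VALUES (1, 2000, 9, 3), (2, 2001, 9, 3);
    INSERT INTO booklet_tickets VALUES (10, 1), (11, 1), (12, 2);
""")

# The same three-column filter is reused by every variant.
FILTER = "b.bookletNum = ? AND b.seasonId = ? AND b.bookletTypeId = ?"

QUERIES = {
    "in": f"""SELECT bt.id FROM booklet_tickets bt
              WHERE bt.bookletId IN (SELECT b.id FROM booklets b WHERE {FILTER})""",
    "exists": f"""SELECT bt.id FROM booklet_tickets bt
                  WHERE EXISTS (SELECT 1 FROM booklets b
                                WHERE {FILTER} AND b.id = bt.bookletId)""",
    "join": f"""SELECT bt.id FROM booklet_tickets bt
                INNER JOIN booklets b ON bt.bookletId = b.id
                WHERE {FILTER}""",
}

def ticket_ids(sql, params=(2000, 9, 3)):
    """Run one variant with bound parameters and return the matching ticket ids."""
    return sorted(row[0] for row in conn.execute(sql, params))

results = {name: ticket_ids(sql) for name, sql in QUERIES.items()}
print(results)
```

All three variants return the same tickets here, which fits the point that the choice is mostly about readability unless you also need columns from booklets.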

Related

How to insert one column from a table into another based on a join/where clause

I have two tables, temp_am and amphibian. The relationship between the two tables comes from the lake_id and the survey_date column in both tables. Both tables have 24,109 entries.
temp_am:
id  lake_id  survey_date
1   10,001   7/25/2001
5   10,005   7/27/2001
6   10,006   7/29/2001
etc...
amphibian:
id  lake_id  survey_date  amphibian_survey_id
1   10,002   7/25/2001
2   10,005   7/27/2001
etc...
I want to copy temp_am.id into amphibian.amphibian_survey_id wherever both the lake_id and the survey_date match.
I have tried this sql query but it never worked. I canceled the query after 600 seconds as I figured a 29,000 observation table should not take that long. Please let me know if you see any issues in my query statement.
update amphibian
set amphibian_survey_id = tm.id
from amphibian a
inner join temp_am tm
on a.lake_id = tm.lake_id
and a.survey_date = tm.survey_date
This query worked in Microsoft Access but not in DBeaver:
UPDATE amphibian
INNER JOIN amphibian_survey_meta_data md
ON (amphibian.survey_date = md.survey_date) AND (amphibian.lake_id = md.lake_id)
SET amphibian.amphibian_survey_id = [md.id];
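Postgres does not need the target table repeated in the FROM clause of an UPDATE. Repeating it, as in your query, adds a second, independent reference to amphibian that is never tied to the row being updated, so every row is paired with the whole join result, which is why the statement runs for so long. In this case even the join is unnecessary: set <column> = ( select ... ) is sufficient.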
update amphibian a
set amphibian_survey_id =
( select tm.id
from temp_am tm
where (tm.lake_id, tm.survey_date) =
(a.lake_id, a.survey_date)
) ;
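To illustrate the correlated-subquery update, here is a minimal sketch run against an in-memory SQLite database from Python rather than Postgres. The rows mirror the sample rows in the question, with dates rewritten in ISO form as an assumption. The WHERE EXISTS guard is an addition that is not in the answer: without it, rows with no match in temp_am are set to NULL, which only matters if the column already holds values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_am (id INTEGER PRIMARY KEY, lake_id INTEGER, survey_date TEXT);
    CREATE TABLE amphibian (
        id INTEGER PRIMARY KEY, lake_id INTEGER,
        survey_date TEXT, amphibian_survey_id INTEGER
    );
    INSERT INTO temp_am VALUES
        (1, 10001, '2001-07-25'), (5, 10005, '2001-07-27'), (6, 10006, '2001-07-29');
    INSERT INTO amphibian VALUES
        (1, 10002, '2001-07-25', NULL), (2, 10005, '2001-07-27', NULL);
""")

# The subquery is correlated with the row being updated through the
# amphibian.lake_id and amphibian.survey_date references.
conn.execute("""
    UPDATE amphibian
    SET amphibian_survey_id = (
        SELECT tm.id
        FROM temp_am tm
        WHERE tm.lake_id = amphibian.lake_id
          AND tm.survey_date = amphibian.survey_date
    )
    WHERE EXISTS (            -- guard (assumption): leave unmatched rows untouched
        SELECT 1
        FROM temp_am tm
        WHERE tm.lake_id = amphibian.lake_id
          AND tm.survey_date = amphibian.survey_date
    )
""")

rows = conn.execute(
    "SELECT id, amphibian_survey_id FROM amphibian ORDER BY id"
).fetchall()
print(rows)
```

On tables of the size mentioned in the question, an index on temp_am(lake_id, survey_date) would probably help the subquery, though that is worth measuring rather than assuming.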

Determining what index to create given a query?

Given a SQL query:
SELECT *
FROM Database..Pizza pizza
JOIN Database..Toppings toppings ON pizza.ToppingId = toppings.Id
WHERE toppings.Name LIKE '%Mushroom%' AND
toppings.GlutenFree = 0 AND
toppings.ExtraFee = 1.25 AND
pizza.Location = 'Minneapolis, MN'
How do you determine what index to write to improve the performance of the query? (Assuming every value to the right of the equal is calculated at runtime)
Is there a built-in SQL command to suggest the proper index?
To me, it gets confusing when there are multiple JOINs that use fields from both tables.
For this query:
SELECT *
FROM Database..Pizza p JOIN
Database..Toppings t
ON p.ToppingId = t.Id
WHERE t.Name LIKE '%Mushroom%' AND
t.GlutenFree = 0 AND
t.ExtraFee = 1.25 AND
p.Location = 'Minneapolis, MN';
You basically have two options for indexes:
Pizza(location, ToppingId) and Toppings(id)
or:
Toppings(GlutenFree, ExtraFee, Name, id) and Pizza(ToppingId, location)
Which works better depends on how selective the different conditions are in the WHERE clause.
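One hedged way to compare the two options is to create a candidate index and look at the plan the engine reports before and after. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for the database in the question. The sample rows and the index name idx_pizza_location are made up, and plan text differs between engines and versions, so treat the printed plans as something to read rather than rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Toppings (Id INTEGER PRIMARY KEY, Name TEXT,
                           GlutenFree INTEGER, ExtraFee REAL);
    CREATE TABLE Pizza (Id INTEGER PRIMARY KEY, ToppingId INTEGER, Location TEXT);
    INSERT INTO Toppings VALUES (1, 'Wild Mushroom', 0, 1.25), (2, 'Olive', 1, 0.5);
    INSERT INTO Pizza VALUES
        (1, 1, 'Minneapolis, MN'), (2, 2, 'Minneapolis, MN'), (3, 1, 'St. Paul, MN');
""")

QUERY = """
    SELECT p.Id
    FROM Pizza p JOIN Toppings t ON p.ToppingId = t.Id
    WHERE t.Name LIKE '%Mushroom%' AND t.GlutenFree = 0
      AND t.ExtraFee = 1.25 AND p.Location = ?
"""
LOCATION = ("Minneapolis, MN",)

def query_plan():
    """Return the human-readable detail column of each plan step."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, LOCATION)]

plan_before = query_plan()
rows_before = conn.execute(QUERY, LOCATION).fetchall()

# First option from the answer: Pizza(Location, ToppingId).
# Toppings(Id) is already covered by the primary key.
conn.execute("CREATE INDEX idx_pizza_location ON Pizza (Location, ToppingId)")

plan_after = query_plan()
rows_after = conn.execute(QUERY, LOCATION).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
```

An index changes how the rows are found, not which rows come back, so the result should be identical before and after.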

How can I do a SQL join to get a value 4 tables farther from the value provided?

My title is probably not very clear, so I made a little schema to explain what I'm trying to achieve. The xxxx_uid labels are foreign keys linking two tables.
Goal: Retrieve a column from the grids table by giving a proj_uid value.
I'm not very good with SQL joins and I don't know how to build a single query that will achieve that.
Currently, I'm running 3 queries to perform the operation:
1) This gives me a res_uid to work with:
select res_uid from results where results.proj_uid = VALUE order by res_uid asc limit 1
2) This gives me a rec_uid to work with:
select rec_uid from receptor_results
inner join results on results.res_uid = receptor_results.res_uid
where receptor_results.res_uid = res_uid_VALUE order by rec_uid asc limit 1
3) Get the grid column I want from the grids table:
select grid_name from grids
inner join receptors on receptors.grid_uid = grids.grid_uid
where receptors.rec_uid = rec_uid_VALUE;
Is it possible to write a single SQL query that gives me the same result as the 3 I'm currently running?
You're not limited to one JOIN in a query:
select grids.grid_name
from grids
inner join receptors
on receptors.grid_uid = grids.grid_uid
inner join receptor_results
on receptor_results.rec_uid = receptors.rec_uid
inner join results
on results.res_uid = receptor_results.res_uid
where results.proj_uid = VALUE;
select g.grid_name
from results r
join receptor_results rr on r.res_uid = rr.res_uid
join receptors rec on rec.rec_uid = rr.rec_uid
join grids g on g.grid_uid = rec.grid_uid
where r.proj_uid = VALUE
A small note about names: in SQL, a table is typically named for a single item rather than the group, so "result" rather than "results" and "receptor" rather than "receptors". As you work with SQL this will make sense, and names like yours will start to seem strange. Also, it is one less character to type!
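The chained join can be tried on a toy version of the schema. This sketch runs it against an in-memory SQLite database from Python; the column lists and rows are assumptions based only on the xxxx_uid keys named in the question, since the original schema picture is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal columns; the real tables presumably have more.
    CREATE TABLE results (res_uid INTEGER PRIMARY KEY, proj_uid INTEGER);
    CREATE TABLE receptor_results (res_uid INTEGER, rec_uid INTEGER);
    CREATE TABLE receptors (rec_uid INTEGER PRIMARY KEY, grid_uid INTEGER);
    CREATE TABLE grids (grid_uid INTEGER PRIMARY KEY, grid_name TEXT);

    INSERT INTO results VALUES (1, 7), (2, 8);
    INSERT INTO receptor_results VALUES (1, 100), (2, 200);
    INSERT INTO receptors VALUES (100, 5), (200, 6);
    INSERT INTO grids VALUES (5, 'north'), (6, 'south');
""")

GRID_FOR_PROJECT = """
    SELECT g.grid_name
    FROM results r
    JOIN receptor_results rr ON r.res_uid = rr.res_uid
    JOIN receptors rec ON rec.rec_uid = rr.rec_uid
    JOIN grids g ON g.grid_uid = rec.grid_uid
    WHERE r.proj_uid = ?
"""

def grid_names(proj_uid):
    """Walk results -> receptor_results -> receptors -> grids for one project."""
    return [row[0] for row in conn.execute(GRID_FOR_PROJECT, (proj_uid,))]

names_for_7 = grid_names(7)
print(names_for_7)
```

Unlike the three separate queries with limit 1, the join returns one row per matching result and receptor, so a project with several results may yield repeated grid names.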

SQL query Optimisation JOIN multiple column

I have two tables on Microsoft Access: T_DATAS (about 200 000 rows) and T_REAF (about 1000 rows).
T_DATAS has a lot of columns (about 30 columns) and T_REAF has about 10 columns.
I have to tell you that I am not allowed to change those tables nor to create other tables. I have to work with it.
Both tables have 6 columns that are the same. I need to join the tables on these 6 columns, to select ALL the columns from T_DATAS AND the columns that are in T_REAF but not in T_DATAS.
My query is :
SELECT A.*, B.CARROS_NEW, B.SEGT_NEW, B.ATTR
INTO FINALTABLE
FROM T_DATAS A LEFT JOIN T_REAF B ON
A.REGION LIKE B.REGION AND
A.PAYS LIKE B.PAYS AND
A.MARQUE LIKE B.MARQUE AND
A.MODELE LIKE B.MODELE AND
A.CARROS LIKE B.CARROS AND
A.SEGT LIKE B.SEGT
I have the result I need but the problem is that this query is taking way too long to give the result (about 3 minutes).
I know that T_DATAS contains a lot of rows (200 000) but I think that 3 minutes is too long for this query.
Could you please tell me what is wrong with this query?
Thanks a lot for your help
Two steps for this. One is changing the query to use =. I'm not 100% sure if this is necessary, but it can't hurt. The second is to create an index.
So:
SELECT D.*, R.CARROS_NEW, R.SEGT_NEW, R.ATTR
INTO FINALTABLE
FROM T_DATAS D LEFT JOIN
T_REAF R
ON D.REGION = R.REGION AND
D.PAYS = R.PAYS AND
D.MARQUE = R.MARQUE AND
D.MODELE = R.MODELE AND
D.CARROS = R.CARROS AND
D.SEGT = R.SEGT;
Second, you want an index on T_REAF:
CREATE INDEX IDX_REAF_6 ON T_REAF(REGION, PAYS, MARQUE, MODELE, CARROS, SEGT);
MS Access can then use the index for the JOIN, speeding the query.
Note that I changed the table aliases to be abbreviations for the table names. This makes it easier to follow the logic in the query.
I assume that those 6 columns have the same data type in both tables.
Note: the equals (=) operator is a comparison operator that tests two values for equality. So replace LIKE with = in your query and check how long it takes.
SELECT A.*
,B.CARROS_NEW
,B.SEGT_NEW
,B.ATTR
INTO FINALTABLE
FROM T_DATAS A
LEFT JOIN T_REAF B
ON A.REGION = B.REGION
AND A.PAYS = B.PAYS
AND A.MARQUE = B.MARQUE
AND A.MODELE = B.MODELE
AND A.CARROS = B.CARROS
AND A.SEGT = B.SEGT

SQL outer join in combination with MAX function in right table

I have an SQL question based on below table structure.
Database is currently in MS Access, with plans to migrate to SQL Server. Query should work in both DBMS'es.
I want to get devName and the latest dswSW_Version, based on dswTimestamp, for the device in question. If no SW history exists, I want to just return the devName.
The closest I could get was:
SELECT dev.devname, dsw1.dswsw_version
FROM device_sw_history AS dsw1 RIGHT JOIN device AS dev
ON dsw1.dswdevid = dev.devid
WHERE dsw1.dswtimestamp = (SELECT MAX(dswtimestamp) FROM device_sw_history AS dsw2 WHERE dsw1.dswdevid = dsw2.dswdevid)
AND devid = #devid
But nothing is returned for devid = 2, due to MAX returning null. I want to return Apple, null.
Is there a way to construct this statement without using a UNION and still return devname even if no SW history exists ?
Device:
devid devname
1 Samsung
2 Apple
Device_SW_History:
dswid dswdevid dswtimestamp dswsw_version
1 1 5/dec/13 One
2 1 6/dec/13 Two
Thank you !
Just put your condition in the on clause:
SELECT dev.devname, dsw1.dswsw_version
FROM device_sw_history AS dsw1 RIGHT JOIN device AS dev
ON dsw1.dswdevid = dev.devid
AND dsw1.dswtimestamp = (SELECT MAX(dswtimestamp) FROM device_sw_history AS dsw2 WHERE dsw1.dswdevid = dsw2.dswdevid)
WHERE devid = #devid
For inner joins, a condition behaves the same in the on clause and in the where clause, so putting it in one or the other is merely a question of style and readability. Outer joins introduce a difference: the on clause only decides which rows of the optional table match, and rows of the other table that have no match are still returned with NULLs, while the where clause filters the joined result and can therefore remove those rows again.
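The difference can be seen on the two sample tables from the question. This is a minimal sketch against an in-memory SQLite database from Python, not MS Access or SQL Server. It uses LEFT JOIN with the tables swapped instead of RIGHT JOIN, stores the timestamps as ISO date strings, and binds devid as a parameter; all three are assumptions made to keep the example portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (devid INTEGER PRIMARY KEY, devname TEXT);
    CREATE TABLE device_sw_history (
        dswid INTEGER PRIMARY KEY, dswdevid INTEGER,
        dswtimestamp TEXT, dswsw_version TEXT
    );
    INSERT INTO device VALUES (1, 'Samsung'), (2, 'Apple');
    INSERT INTO device_sw_history VALUES
        (1, 1, '2013-12-05', 'One'),
        (2, 1, '2013-12-06', 'Two');
""")

# Latest timestamp for the device of the current history row.
LATEST = """(SELECT MAX(dsw2.dswtimestamp)
             FROM device_sw_history AS dsw2
             WHERE dsw2.dswdevid = dsw1.dswdevid)"""

# Condition in WHERE: applied after the join, so it discards the
# NULL-extended row of a device that has no history.
FILTER_IN_WHERE = f"""
    SELECT dev.devname, dsw1.dswsw_version
    FROM device AS dev
    LEFT JOIN device_sw_history AS dsw1 ON dsw1.dswdevid = dev.devid
    WHERE dsw1.dswtimestamp = {LATEST} AND dev.devid = ?
"""

# Condition in ON: only restricts which history rows match, so the
# device row survives with a NULL version.
FILTER_IN_ON = f"""
    SELECT dev.devname, dsw1.dswsw_version
    FROM device AS dev
    LEFT JOIN device_sw_history AS dsw1
           ON dsw1.dswdevid = dev.devid AND dsw1.dswtimestamp = {LATEST}
    WHERE dev.devid = ?
"""

def latest_version(sql, devid):
    return conn.execute(sql, (devid,)).fetchall()

apple_where = latest_version(FILTER_IN_WHERE, 2)
apple_on = latest_version(FILTER_IN_ON, 2)
samsung_on = latest_version(FILTER_IN_ON, 1)
print(apple_where, apple_on, samsung_on)
```

For the device without history, the WHERE variant returns nothing, while the ON variant returns the device name with a NULL version, which is the result the question asks for.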
On SQL Server, a simple subquery should do the trick:
SELECT
devname,
(SELECT TOP 1 dswsw_version FROM device_sw_history WHERE dswdevid = devid
ORDER BY dswtimestamp DESC)
FROM device
This will return all the device names from device, even those that do not have an entry in device_sw_history.