Faster way to compare changes between SQL sub-queries?


I need to check for the change in the color of cats between two dates.
I have the query below:
WITH
cats_prior AS (SELECT IDTAG,Color FROM CATS WHERE APPOINTMENT = '06/30/2019'),
cats_now AS (SELECT IDTAG,Color FROM CATS WHERE APPOINTMENT = '08/31/2019')
SELECT cats_prior.IDTAG, cats_prior.Color,cats_now.Color
FROM cats_prior
JOIN cats_now on cats_prior.IDTAG = cats_now.IDTAG
WHERE cats_prior.Color != cats_now.Color
It works, but it takes 11 minutes, and there are around 15 million cats in that table.
Is there another way to do this, or a way to make it faster?
This is SQL Server.

I would try aggregation with a HAVING clause:
SELECT IDTAG
FROM CATS
WHERE APPOINTMENT IN ('2019-08-31', '2019-06-30')
GROUP BY IDTAG
HAVING MIN(COLOR) <> MAX(COLOR);
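An index on CATS(APPOINTMENT, IDTAG, COLOR) would help. It contains every column the query uses, so the two dates can be read with an index seek instead of a scan of all 15 million rows. This index might also speed up your version of the query.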
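Below is a minimal sketch of the aggregation approach. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, with made-up rows and ISO date strings (both assumptions). Only cat 2 changes colour, so only its IDTAG should come back:

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server CATS table (toy data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CATS (IDTAG INTEGER, APPOINTMENT TEXT, COLOR TEXT);
    -- Index suggested in the answer: filter column, grouping column, compared column.
    CREATE INDEX ix_cats_appt ON CATS (APPOINTMENT, IDTAG, COLOR);
""")
conn.executemany(
    "INSERT INTO CATS VALUES (?, ?, ?)",
    [
        (1, "2019-06-30", "black"), (1, "2019-08-31", "black"),   # unchanged
        (2, "2019-06-30", "white"), (2, "2019-08-31", "ginger"),  # changed
        (3, "2019-06-30", "grey"),                                # only the first date
        (4, "2019-08-31", "tabby"),                               # only the second date
    ],
)

# One pass over both dates: an IDTAG whose two colours differ has MIN <> MAX.
changed = [row[0] for row in conn.execute("""
    SELECT IDTAG
    FROM CATS
    WHERE APPOINTMENT IN ('2019-08-31', '2019-06-30')
    GROUP BY IDTAG
    HAVING MIN(COLOR) <> MAX(COLOR)
    ORDER BY IDTAG
""")]
print(changed)  # [2]
```

The sketch assumes at most one row per IDTAG per date. A cat seen on only one of the dates has MIN = MAX and is excluded, just as the inner join in the question excludes it. MIN and MAX also ignore NULL colours.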

Just another way to try the same thing:
select distinct cp.IDTAG
from CATS cp
inner join CATS cn
on cp.IDTAG = cn.IDTAG
and cp.color <> cn.color
where cp.APPOINTMENT = '06/30/2019'
and cn.APPOINTMENT = '08/31/2019'
You can check performance against your data.
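A quick way to confirm that the self-join returns the same cats as the question's CTE query is to run both against the same rows. This is a sketch on hypothetical data in SQLite, with ISO date strings and the CTE result reduced to IDTAG so the two can be compared:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CATS (IDTAG INTEGER, APPOINTMENT TEXT, COLOR TEXT)")
conn.executemany("INSERT INTO CATS VALUES (?, ?, ?)", [
    (1, "2019-06-30", "black"), (1, "2019-08-31", "black"),
    (2, "2019-06-30", "white"), (2, "2019-08-31", "ginger"),
    (3, "2019-06-30", "grey"),
])

# Self-join version from the answer: the colour comparison sits in the join condition.
self_join = conn.execute("""
    SELECT DISTINCT cp.IDTAG
    FROM CATS cp
    INNER JOIN CATS cn
        ON cp.IDTAG = cn.IDTAG
       AND cp.COLOR <> cn.COLOR
    WHERE cp.APPOINTMENT = '2019-06-30'
      AND cn.APPOINTMENT = '2019-08-31'
    ORDER BY cp.IDTAG
""").fetchall()

# The question's CTE version, reduced to IDTAG for the comparison.
cte_join = conn.execute("""
    WITH cats_prior AS (SELECT IDTAG, COLOR FROM CATS WHERE APPOINTMENT = '2019-06-30'),
         cats_now   AS (SELECT IDTAG, COLOR FROM CATS WHERE APPOINTMENT = '2019-08-31')
    SELECT DISTINCT cats_prior.IDTAG
    FROM cats_prior
    JOIN cats_now ON cats_prior.IDTAG = cats_now.IDTAG
    WHERE cats_prior.COLOR <> cats_now.COLOR
    ORDER BY cats_prior.IDTAG
""").fetchall()

print(self_join, cte_join)  # [(2,)] [(2,)]
```

Matching results on toy data say nothing about speed. As the answer notes, the timing has to be measured on the real table.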

Related

Sub-query works but would a join or other alternative be better?

I am trying to select rows from one table where the id referenced in those rows matches the unique id from another table that relates to it like so:
SELECT *
FROM booklet_tickets
WHERE bookletId = (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
The bookletNum/seasonId/bookletTypeId values are filled in from a user form and inserted into the query.
This works and returns what I want but seems messy. Is a join better to use in this type of scenario?

If there is any possibility that your subquery returns multiple values, you should use IN instead (SQL Server raises an error when a subquery compared with = returns more than one row):
SELECT *
FROM booklet_tickets
WHERE bookletId in (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
But I would prefer EXISTS over IN:
SELECT *
FROM booklet_tickets bt
WHERE EXISTS (SELECT 1
FROM booklets b
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3
AND b.id = bt.bookletId)
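The IN and EXISTS forms return the same rows even when several booklets match the filter. The sketch below shows this in SQLite with hypothetical tables; ticketId is an invented column. SQLite does not raise SQL Server's "more than one value" error for `= (subquery)`, so only IN and EXISTS are shown. The user-supplied values are passed as query parameters rather than pasted into the SQL text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booklets (id INTEGER PRIMARY KEY, bookletNum INTEGER,
                           seasonId INTEGER, bookletTypeId INTEGER);
    CREATE TABLE booklet_tickets (ticketId INTEGER PRIMARY KEY, bookletId INTEGER);
    -- Two booklets match the filter, so "= (subquery)" would be ambiguous.
    INSERT INTO booklets VALUES (1, 2000, 9, 3), (2, 2000, 9, 3), (3, 2001, 9, 3);
    INSERT INTO booklet_tickets VALUES (10, 1), (11, 2), (12, 3);
""")

# Values that would come from the user form, bound as parameters.
params = (2000, 9, 3)

in_rows = conn.execute("""
    SELECT * FROM booklet_tickets
    WHERE bookletId IN (SELECT id FROM booklets
                        WHERE bookletNum = ? AND seasonId = ? AND bookletTypeId = ?)
    ORDER BY ticketId
""", params).fetchall()

exists_rows = conn.execute("""
    SELECT * FROM booklet_tickets bt
    WHERE EXISTS (SELECT 1 FROM booklets b
                  WHERE b.bookletNum = ? AND b.seasonId = ? AND b.bookletTypeId = ?
                    AND b.id = bt.bookletId)
    ORDER BY bt.ticketId
""", params).fetchall()

print(in_rows)      # [(10, 1), (11, 2)]
print(exists_rows)  # [(10, 1), (11, 2)]
```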

It is not possible to give a "Yes it's better" or "no it's not" answer for this type of scenario.
My personal rule of thumb: if a table has fewer than 1 million rows, I don't bother optimising "SELECT ... WHERE IN" queries, because the SQL Server Query Optimizer is smart enough to pick an appropriate plan.
In reality, however, you often need more values from the joined table in the final result set, so a JOIN with a WHERE filter might make more sense, for example:
SELECT BT.*, B.SeasonId
FROM booklet_tickets BT
INNER JOIN booklets B ON BT.bookletId = B.id
WHERE B.bookletNum = 2000
AND B.seasonId = 9
AND B.bookletTypeId = 3
To me it comes down to style more than anything else: write your code so that it will be easy for you to understand months later. So pick a style and stick to it :)
The question, however, is as old as time itself :)
SQL JOIN vs IN performance?
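The JOIN form shown in this answer returns the booklet columns alongside the ticket columns. Below is a SQLite sketch with hypothetical rows (ticketId is invented). Because booklets.id is assumed to be a primary key, each ticket matches at most one booklet, so the join cannot duplicate tickets:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booklets (id INTEGER PRIMARY KEY, bookletNum INTEGER,
                           seasonId INTEGER, bookletTypeId INTEGER);
    CREATE TABLE booklet_tickets (ticketId INTEGER PRIMARY KEY, bookletId INTEGER);
    INSERT INTO booklets VALUES (1, 2000, 9, 3), (2, 2001, 9, 3);
    INSERT INTO booklet_tickets VALUES (10, 1), (11, 1), (12, 2);
""")

# Ticket columns (BT.*) plus one column from the joined booklet.
rows = conn.execute("""
    SELECT BT.*, B.seasonId
    FROM booklet_tickets BT
    INNER JOIN booklets B ON BT.bookletId = B.id
    WHERE B.bookletNum = ? AND B.seasonId = ? AND B.bookletTypeId = ?
    ORDER BY BT.ticketId
""", (2000, 9, 3)).fetchall()

print(rows)  # [(10, 1, 9), (11, 1, 9)]
```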

Using NOT EXISTS with a subquery

I'm trying to create a query to retrieve all customers who purchased one SKU ('Red399') but did not also purchase a second SKU ('Red323'). It seems like the best way to do this is by using NOT EXISTS with a subquery to filter out people who bought 'Red323'.
My query isn't returning any errors, but it also isn't returning any results. I think it might be because I have too many conditions in the initial WHERE clause, but I'm not sure:
SELECT DISTINCT o."FirstName", o."LastName", o."Email", ol."SKU"
FROM flight_export_order o
JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" = 'Red399'
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020'
AND NOT EXISTS
(SELECT DISTINCT o."Email"
FROM flight_export_order o
JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" = 'Red323'
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020')

You don't need the subquery.
You can group by customer and set the conditions in the having clause:
SELECT o."FirstName", o."LastName", o."Email"
FROM flight_export_order o INNER JOIN flight_export_orderline ol
ON o."OrderDisplayID" = ol."OrderDisplayID"
WHERE ol."SKU" IN ('Red399', 'Red323')
AND o."OrderDate" BETWEEN '07/22/2020' AND '08/03/2020'
GROUP BY o."FirstName", o."LastName", o."Email"
HAVING SUM((ol."SKU" = 'Red399')::int) > 0
AND SUM((ol."SKU" = 'Red323')::int) = 0
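The HAVING trick counts matching lines per customer by summing boolean comparisons. Below is a sketch in SQLite with hypothetical customers. SQLite comparisons already evaluate to 0 or 1, so the PostgreSQL `::int` cast is dropped. Bob bought Red399 in one order and Red323 in another, so he is filtered out at the customer level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flight_export_order (
        "OrderDisplayID" TEXT PRIMARY KEY, "FirstName" TEXT, "LastName" TEXT,
        "Email" TEXT, "OrderDate" TEXT);
    CREATE TABLE flight_export_orderline ("OrderDisplayID" TEXT, "SKU" TEXT);
    INSERT INTO flight_export_order VALUES
        ('A1', 'Ann', 'Lee', 'ann@example.com', '2020-07-25'),
        ('B1', 'Bob', 'Ray', 'bob@example.com', '2020-07-26'),
        ('B2', 'Bob', 'Ray', 'bob@example.com', '2020-07-30');
    INSERT INTO flight_export_orderline VALUES
        ('A1', 'Red399'), ('B1', 'Red399'), ('B2', 'Red323');
""")

rows = conn.execute("""
    SELECT o."FirstName", o."LastName", o."Email"
    FROM flight_export_order o
    INNER JOIN flight_export_orderline ol ON o."OrderDisplayID" = ol."OrderDisplayID"
    WHERE ol."SKU" IN ('Red399', 'Red323')
      AND o."OrderDate" BETWEEN '2020-07-22' AND '2020-08-03'
    GROUP BY o."FirstName", o."LastName", o."Email"
    -- in SQLite a comparison is already 0 or 1, so the Postgres ::int cast is not needed
    HAVING SUM(ol."SKU" = 'Red399') > 0
       AND SUM(ol."SKU" = 'Red323') = 0
""").fetchall()

print(rows)  # [('Ann', 'Lee', 'ann@example.com')]
```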

You were on the right track to begin with. EXISTS / NOT EXISTS are the right tools:
SELECT o.* -- or just the columns you need
FROM flight_export_order o
WHERE o."OrderDate" BETWEEN '2020-07-22' AND '2020-08-03'
AND EXISTS (
SELECT FROM flight_export_orderline
WHERE "OrderDisplayID" = o."OrderDisplayID"
AND "SKU" = 'Red399'
)
AND NOT EXISTS (
SELECT FROM flight_export_orderline
WHERE "OrderDisplayID" = o."OrderDisplayID"
AND "SKU" = 'Red323'
);
With a multicolumn index on flight_export_orderline ("OrderDisplayID", "SKU"), this is as fast as it gets. An index on just ("OrderDisplayID") (like you probably have) goes a long way, too.
Plus an index on flight_export_order("OrderDate"), obviously.
I see no need for any expensive aggregating or DISTINCT. See:
Select rows which are not present in other table
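The question's query returns nothing because its NOT EXISTS subquery never refers to the outer row. It reuses the aliases o and ol, which shadow the outer ones. As a result the subquery is either true for every row or false for every row. The sketch below compares it with the correlated version in SQLite on hypothetical orders. SQLite needs `SELECT 1` where PostgreSQL accepts an empty `SELECT FROM`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flight_export_order (
        "OrderDisplayID" TEXT PRIMARY KEY, "Email" TEXT, "OrderDate" TEXT);
    CREATE TABLE flight_export_orderline ("OrderDisplayID" TEXT, "SKU" TEXT);
    INSERT INTO flight_export_order VALUES
        ('A1', 'ann@example.com', '2020-07-25'),
        ('B1', 'bob@example.com', '2020-07-26'),
        ('B2', 'bob@example.com', '2020-07-30');
    INSERT INTO flight_export_orderline VALUES
        ('A1', 'Red399'), ('B1', 'Red399'), ('B2', 'Red323');
""")

# Question's shape: the inner o/ol hide the outer ones, so nothing is correlated.
# One Red323 order anywhere in the date range makes NOT EXISTS false for all rows.
uncorrelated = conn.execute("""
    SELECT DISTINCT o."Email"
    FROM flight_export_order o
    JOIN flight_export_orderline ol ON o."OrderDisplayID" = ol."OrderDisplayID"
    WHERE ol."SKU" = 'Red399'
      AND o."OrderDate" BETWEEN '2020-07-22' AND '2020-08-03'
      AND NOT EXISTS (
          SELECT 1
          FROM flight_export_order o
          JOIN flight_export_orderline ol ON o."OrderDisplayID" = ol."OrderDisplayID"
          WHERE ol."SKU" = 'Red323'
            AND o."OrderDate" BETWEEN '2020-07-22' AND '2020-08-03')
""").fetchall()

# Answer's shape: each subquery is tied to the outer order through "OrderDisplayID".
correlated = conn.execute("""
    SELECT o."OrderDisplayID"
    FROM flight_export_order o
    WHERE o."OrderDate" BETWEEN '2020-07-22' AND '2020-08-03'
      AND EXISTS (SELECT 1 FROM flight_export_orderline
                  WHERE "OrderDisplayID" = o."OrderDisplayID" AND "SKU" = 'Red399')
      AND NOT EXISTS (SELECT 1 FROM flight_export_orderline
                      WHERE "OrderDisplayID" = o."OrderDisplayID" AND "SKU" = 'Red323')
    ORDER BY o."OrderDisplayID"
""").fetchall()

print(uncorrelated)  # []
print(correlated)    # [('A1',), ('B1',)]
```

Because this version correlates on "OrderDisplayID", it works per order. Bob's order B1 still qualifies, since his Red323 purchase is in a different order. Correlating on a customer column such as "Email" would apply the condition across all of a customer's orders instead.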
Aside 1: try to avoid quoted CaMeL-case identifiers in Postgres if you can. See:
Are PostgreSQL column names case-sensitive?
Aside 2: It's recommended to use the ISO 8601 date format (YYYY-MM-DD), which is always unambiguous and independent of locale and session settings.

Determining what index to create given a query?

Given a SQL query:
SELECT *
FROM Database..Pizza pizza
JOIN Database..Toppings toppings ON pizza.ToppingId = toppings.Id
WHERE toppings.Name LIKE '%Mushroom%' AND
toppings.GlutenFree = 0 AND
toppings.ExtraFee = 1.25 AND
pizza.Location = 'Minneapolis, MN'
How do you determine which index to create to improve the performance of the query? (Assume every value to the right of an equals sign is calculated at runtime.)
Is there a built-in SQL command that suggests the proper index?
To me, it gets confusing when there are multiple JOINs that use fields from both tables.

For this query:
SELECT *
FROM Database..Pizza p JOIN
Database..Toppings t
ON p.ToppingId = t.Id
WHERE t.Name LIKE '%Mushroom%' AND
t.GlutenFree = 0 AND
t.ExtraFee = 1.25 AND
p.Location = 'Minneapolis, MN';
You basically have two options for indexes:
Pizza(location, ToppingId) and Toppings(id)
or:
Toppings(GlutenFree, ExtraFee, Name, id) and Pizza(ToppingId, location)
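Which works better depends on how selective the different conditions in the WHERE clause are. Because Name LIKE '%Mushroom%' starts with a wildcard, it cannot drive an index seek. Placing Name after GlutenFree and ExtraFee in the second Toppings index only lets that filter be checked against the index entries.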
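Whichever option you try, the way to find out is to ask the engine for its plan. In SQL Server that is the estimated or actual execution plan, which can also show a "Missing Index" suggestion. The sketch below does the same with SQLite's EXPLAIN QUERY PLAN on empty stand-in tables, creating the first option's index (the index name is hypothetical). To compare, swap in the second option's indexes and rerun:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Toppings (Id INTEGER PRIMARY KEY, Name TEXT,
                           GlutenFree INTEGER, ExtraFee REAL);
    CREATE TABLE Pizza (Id INTEGER PRIMARY KEY, ToppingId INTEGER, Location TEXT);
    -- Option 1: Pizza(Location, ToppingId); Toppings(Id) is covered by the primary key.
    CREATE INDEX ix_pizza_location ON Pizza (Location, ToppingId);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT *
    FROM Pizza p
    JOIN Toppings t ON p.ToppingId = t.Id
    WHERE t.Name LIKE '%Mushroom%'
      AND t.GlutenFree = 0
      AND t.ExtraFee = 1.25
      AND p.Location = 'Minneapolis, MN'
""").fetchall()

# The last column of each plan row is the human-readable step description.
details = [row[-1] for row in plan]
for step in details:
    print(step)
```

The exact wording of the plan differs between SQLite versions. Without statistics (ANALYZE), the planner also falls back on default row estimates, so real tables may produce a different plan.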

SQL Math Operation In Correlated Subquery

I am working with three tables: a bill of materials, a part inventory, and a table of work orders (jobs). I am trying to find out whether a correlated subquery can perform a math operation using a value from the outer query. Here's an example of what I'm trying to do:
SELECT A.work_order,A.assembly,A.job_quantity,
(SELECT COUNT(X.part_number)
FROM bom X
WHERE X.assembly = A.assembly
AND (X.quantity_required * A.job_quantity) >= (SELECT Y.quantity_available FROM inventory Y WHERE
Y.part_number = X.part_number)) AS negatives
FROM work_orders A
ORDER BY A.assembly ASC
I am attempting to find out, for a given work order, if there are parts that we do not have enough of to build the assembly. I'm currently getting an "Error correlating fields" error. Is it possible to do this kind of operation in a single query?

Try moving the subquery to a join, something like this:
SELECT a.work_order, a.assembly, a.job_quantity, n.negatives -- NULL when no part is short
FROM work_orders a LEFT JOIN (SELECT b.work_order, COUNT(x.part_number) AS negatives
FROM bom x JOIN work_orders b
ON x.assembly = b.assembly
JOIN inventory y
ON y.part_number = x.part_number
WHERE (x.quantity_required * b.job_quantity) >= y.quantity_available
GROUP BY b.work_order) n
ON a.work_order = n.work_order
ORDER BY a.assembly ASC
Or create a temporary cursor with the subquery and then use it to join the main table.
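To check that a join-based rewrite computes the same shortage counts as the question's correlated query, both can be run side by side. The rewrite uses a derived table grouped by work_order and joins inventory directly instead of looking it up in a nested subquery. This sketch uses SQLite and made-up parts. It also assumes work_order is unique and keeps the question's `>=` comparison. SQLite itself accepts the original correlated form, so the "Error correlating fields" message likely comes from the specific engine in use:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_orders (work_order TEXT, assembly TEXT, job_quantity INTEGER);
    CREATE TABLE bom (assembly TEXT, part_number TEXT, quantity_required INTEGER);
    CREATE TABLE inventory (part_number TEXT, quantity_available INTEGER);
    INSERT INTO work_orders VALUES ('WO1', 'ASM-A', 10), ('WO2', 'ASM-A', 2), ('WO3', 'ASM-B', 1);
    INSERT INTO bom VALUES ('ASM-A', 'P1', 2), ('ASM-A', 'P2', 1), ('ASM-B', 'P1', 1);
    INSERT INTO inventory VALUES ('P1', 15), ('P2', 5);
""")

# The question's correlated form (work_order added to ORDER BY for a stable order).
original = conn.execute("""
    SELECT A.work_order, A.assembly, A.job_quantity,
           (SELECT COUNT(X.part_number)
            FROM bom X
            WHERE X.assembly = A.assembly
              AND (X.quantity_required * A.job_quantity) >=
                  (SELECT Y.quantity_available FROM inventory Y
                   WHERE Y.part_number = X.part_number)) AS negatives
    FROM work_orders A
    ORDER BY A.assembly, A.work_order
""").fetchall()

# Join rewrite: shortages counted per work order, then attached with a LEFT JOIN.
rewrite = conn.execute("""
    SELECT a.work_order, a.assembly, a.job_quantity, COALESCE(n.negatives, 0)
    FROM work_orders a
    LEFT JOIN (SELECT b.work_order, COUNT(x.part_number) AS negatives
               FROM bom x
               JOIN work_orders b ON x.assembly = b.assembly
               JOIN inventory y ON y.part_number = x.part_number
               WHERE x.quantity_required * b.job_quantity >= y.quantity_available
               GROUP BY b.work_order) n
      ON a.work_order = n.work_order
    ORDER BY a.assembly, a.work_order
""").fetchall()

print(original)  # [('WO1', 'ASM-A', 10, 2), ('WO2', 'ASM-A', 2, 0), ('WO3', 'ASM-B', 1, 0)]
print(rewrite)   # same rows
```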

Database (Oracle 11g) query optimization for joins

So I am trying to optimize a bunch of queries which are taking a lot of time. What I am trying to figure out is how to create an index on columns from different tables.
Here is a simple version of my problem.
What I did
After Googling, I looked into bitmap indexes, but I am not sure whether that is the right way to solve the issue.
Issue
There is a many-to-many relationship between Student(sid, ...) and Report(rid, year, isdeleted).
StudentReport(id, sid, rid) is the join table
Query
Select *
from Report
inner join StudentReport on Report.rid = StudentReport.rid
where Report.isdeleted = 0 and StudentReport.sid = x and Report.year = y
What is the best way to create an index?

Please try this:
with TMP_REP AS (
Select * from Report where Report.isdeleted = 0 AND Report.year = y
)
,TMP_ST_REP AS(
Select *
from StudentReport where StudentReport.sid = x
)
SELECT * FROM TMP_REP R INNER JOIN TMP_ST_REP S ON S.rid = R.rid
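The WITH version is logically the same query as the original join: it filters each table first, then joins on rid. The optimizer is free to inline such subqueries, so any speedup still depends on the plan and the available indexes. This SQLite sketch with hypothetical rows only confirms that both forms return the same result. The sid and year variables stand in for the question's x and y placeholders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Report (rid INTEGER PRIMARY KEY, year INTEGER, isdeleted INTEGER);
    CREATE TABLE StudentReport (id INTEGER PRIMARY KEY, sid INTEGER, rid INTEGER);
    INSERT INTO Report VALUES (1, 2020, 0), (2, 2020, 1), (3, 2021, 0);
    INSERT INTO StudentReport VALUES (100, 7, 1), (101, 7, 2), (102, 7, 3), (103, 8, 1);
""")
sid, year = 7, 2020  # stand-ins for x and y in the question

# Original query from the question.
join_rows = conn.execute("""
    SELECT * FROM Report
    INNER JOIN StudentReport ON Report.rid = StudentReport.rid
    WHERE Report.isdeleted = 0 AND StudentReport.sid = ? AND Report.year = ?
""", (sid, year)).fetchall()

# CTE form from the answer: filter each table, then join.
cte_rows = conn.execute("""
    WITH TMP_REP AS (SELECT * FROM Report WHERE isdeleted = 0 AND year = ?),
         TMP_ST_REP AS (SELECT * FROM StudentReport WHERE sid = ?)
    SELECT * FROM TMP_REP R INNER JOIN TMP_ST_REP S ON S.rid = R.rid
""", (year, sid)).fetchall()

print(join_rows)  # [(1, 2020, 0, 100, 7, 1)]
print(cte_rows)   # [(1, 2020, 0, 100, 7, 1)]
```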
