SQL: How to speed up a query using indexing

I am trying to speed up a query that finds all CUSTOMERs who have bought a MOTORCYCLE manufactured before 1970 AND another MOTORCYCLE manufactured after 2010. The query runs very slowly, so I think I need help finding better indexes. My attempts are documented below:
Tables
CREATE TABLE CUSTOMER (
id int PRIMARY KEY,
fname varchar(30),
lname varchar(30)
);
CREATE TABLE MOTORCYCLE (
id int PRIMARY KEY,
name varchar(30),
year int -- Manufactured year
);
CREATE TABLE SALES (
cid int,
mid int,
FOREIGN KEY(cid) REFERENCES CUSTOMER(id),
FOREIGN KEY(mid) REFERENCES MOTORCYCLE(id),
PRIMARY KEY(cid, mid)
);
Indexes
Here are my indexes (I am somewhat guessing with these, but this was my attempt):
CREATE UNIQUE INDEX customerID on CUSTOMER(id);
CREATE INDEX customerName on CUSTOMER(fname, lname);
CREATE UNIQUE INDEX motorcycleID on MOTORCYCLE(id);
CREATE INDEX motorcycleName on MOTORCYCLE(name);
CREATE INDEX motorcycleYear on MOTORCYCLE(year);
CREATE INDEX salesCustomerMotorcycleID on SALES(cid, mid);
CREATE INDEX salesCustomerID on SALES(cid);
CREATE INDEX castsMotorcycleID on SALES(mid);
Queries
My query to find the customers purchasing bikes manufactured before 1970 and after 2010 is here:
SELECT fname, lname
FROM (SALES INNER JOIN CUSTOMER ON SALES.cid=CUSTOMER.id) INNER JOIN MOTORCYCLE ON MOTORCYCLE.id=SALES.mid
GROUP BY CUSTOMER.id
HAVING MIN(MOTORCYCLE.year) < 1970 AND MAX(MOTORCYCLE.year) > 2010;
And here is another working query which avoids the GROUP BY and HAVING clauses:
SELECT DISTINCT C.id, fname, lname
FROM (CUSTOMER as C inner join (SALES as S1 INNER JOIN MOTORCYCLE as M1 ON M1.id=S1.mid) on C.id=S1.cid) inner join (SALES as S2 inner join MOTORCYCLE as M2 on S2.mid=M2.id) on C.id=S2.cid
WHERE (M1.year < 1970 AND M2.year > 2010);
Any suggestions on the kinds of indexes I can use to speed up my query? Or should I change my query?
UPDATE
I found another query that also works, but it is also too slow. It has been added above. Still, it might be helpful when finding an index to speed it up.

When you check out your queries with EXPLAIN QUERY PLAN, you see that in both cases, the database looks up many related records before it filters out unneeded records (with unwanted years).
The following queries look up the motorcycle IDs before matching; which one is faster depends on the details of your data and must be measured by you:
SELECT *
FROM Customer
WHERE EXISTS (SELECT 1
FROM Sales
WHERE cid = Customer.id
AND mid IN (SELECT id
FROM Motorcycle
WHERE year < 1970))
AND EXISTS (SELECT 1
FROM Sales
WHERE cid = Customer.id
AND mid IN (SELECT id
FROM Motorcycle
WHERE year > 2010));
SELECT *
FROM Customer
WHERE EXISTS (SELECT 1
FROM Sales AS s1
JOIN Sales AS s2 ON s1.cid = s2.cid
WHERE s1.cid = Customer.id
AND s1.mid IN (SELECT id
FROM Motorcycle
WHERE year < 1970)
AND s2.mid IN (SELECT id
FROM Motorcycle
WHERE year > 2010));
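To measure this on your own data, a small harness along these lines may help. It is only a sketch: it uses Python's standard sqlite3 module (EXPLAIN QUERY PLAN suggests SQLite), the sample rows are made up, and the index name idx_motorcycle_year is a hypothetical choice, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (id int PRIMARY KEY, fname varchar(30), lname varchar(30));
CREATE TABLE MOTORCYCLE (id int PRIMARY KEY, name varchar(30), year int);
CREATE TABLE SALES (
    cid int REFERENCES CUSTOMER(id),
    mid int REFERENCES MOTORCYCLE(id),
    PRIMARY KEY (cid, mid)  -- also gives an index usable for lookups by cid
);
-- Hypothetical index so the year filter can be a range scan.
CREATE INDEX idx_motorcycle_year ON MOTORCYCLE(year);

-- Made-up sample data: only customer 1 owns both an old and a new bike.
INSERT INTO CUSTOMER VALUES (1, 'Ann', 'Both'), (2, 'Bob', 'OldOnly'), (3, 'Cy', 'NewOnly');
INSERT INTO MOTORCYCLE VALUES (10, 'Vintage', 1965), (11, 'Modern', 2015), (12, 'Middle', 1990);
INSERT INTO SALES VALUES (1, 10), (1, 11), (2, 10), (2, 12), (3, 11);
""")

query = """
SELECT id, fname, lname
FROM CUSTOMER
WHERE EXISTS (SELECT 1 FROM SALES
              WHERE cid = CUSTOMER.id
                AND mid IN (SELECT id FROM MOTORCYCLE WHERE year < 1970))
  AND EXISTS (SELECT 1 FROM SALES
              WHERE cid = CUSTOMER.id
                AND mid IN (SELECT id FROM MOTORCYCLE WHERE year > 2010))
ORDER BY id
"""

matches = conn.execute(query).fetchall()
print(matches)

# The last column of each plan row is the readable description of that step,
# including which table or index SQLite chose.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row[-1])
```

Swap in your real data and the other query variants, then compare both the plans and the actual run times; the plan SQLite picks depends on the SQLite version and on the table statistics.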

Why use GROUP BY when no aggregate function is used in the query?
Use DISTINCT instead if you don't want to see duplicates.

Related

Is there a way to select this name from more than one table?

I need to select the item name and the vendor name for each item that belongs to a vendor with a rating greater than 4. I can't find a way to do it. I know it involves joins, but both tables have a column with the same name.
CREATE TABLE venedors(
id int PRIMARY KEY,
name varchar2(20),
rating int);
CREATE TABLE items(
id int PRIMARY KEY,
name varchar2(20),
venedorId int references venedors(id));

If I understood your problem correctly:
Select items.name as itemName, venedors.name as vendorName
from items
inner join venedors
on items.venedorId = venedors.id
where venedors.rating > 4

If you want to get all the vendors, regardless of whether any items are associated with them, then try a left join as shown below:
Select v.name as vendorName, i.name as itemName
from venedors v
left join items i
on i.venedorId = v.id
where v.rating > 4
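To see the difference between the two joins, here is a small sketch using Python's standard sqlite3 module. The vendor and item rows are invented for illustration, and varchar2 is written as varchar because the example runs on SQLite rather than Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE venedors (id int PRIMARY KEY, name varchar(20), rating int);
CREATE TABLE items (id int PRIMARY KEY, name varchar(20),
                    venedorId int REFERENCES venedors(id));
-- Made-up data: Bolt is highly rated but has no items, Cheapo is rated low.
INSERT INTO venedors VALUES (1, 'Acme', 5), (2, 'Bolt', 5), (3, 'Cheapo', 2);
INSERT INTO items VALUES (100, 'Anvil', 1), (101, 'Rocket', 1), (102, 'Widget', 3);
""")

# Aliases (vendorName, itemName) disambiguate the two "name" columns.
inner_rows = conn.execute("""
SELECT v.name AS vendorName, i.name AS itemName
FROM venedors v
INNER JOIN items i ON i.venedorId = v.id
WHERE v.rating > 4
ORDER BY v.name, i.name
""").fetchall()

left_rows = conn.execute("""
SELECT v.name AS vendorName, i.name AS itemName
FROM venedors v
LEFT JOIN items i ON i.venedorId = v.id
WHERE v.rating > 4
ORDER BY v.name, i.name
""").fetchall()

print(inner_rows)  # vendors without items are dropped
print(left_rows)   # vendors without items appear with a NULL (None) item name
```

The WHERE clause filters on the left-hand table (venedors), so it does not cancel out the left join.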

Select MAX price of a book with JOIN (2 tables) in SQL Server?

I have 2 tables
CREATE TABLE BOOKS
(
numbk INT PRIMARY KEY IDENTITY,
nombk NVARCHAR(60),
_numrub INT FOREIGN KEY REFERENCES CLASSIFICATION(numrub)
)
CREATE TABLE TARIFER
(
_numbk INT FOREIGN KEY REFERENCES BOOKS(numbk),
_nomed NVARCHAR(60) FOREIGN KEY REFERENCES EDITEURS(nomed),
_date DATE,
prix DECIMAL(20,2),
PRIMARY KEY (_numbk, _nomed)
)
The question is: how do I list all titles of books (nombk) that have the max price?
PS: TARIFER holds the price column and a foreign key to BOOKS, which is _numbk
I tried this:
select
o.nombk, max(prix)
from
TARIFER tr, books o
where
o.numbk = tr._numbk
group by
o.nombk
This lists all, but when I execute this:
select max(prix)
from TARIFER tr, books o
where o.numbk = tr._numbk
It returns only the max price. I don't know why. Could someone please explain?

In SQL Server, you can use TOP (1) WITH TIES:
select top (1) with ties b.nombk, t.prix
from books b join
TARIFER t
on b.numbk = t._numbk
order by t.prix desc;

Why not just use a subquery to get the max(prix), and then use that value to list all records with that prix:
select o.nombk ,prix
from TARIFER tr , books o
where o.numbk = tr._numbk
and tr.prix in (select max(prix) from TARIFER tr)

Both queries do aggregation, but not at the same level:
The first query has group by o.nombk, so it generates one record per book and gives you the maximum price of that book across all tarifers.
The second query has no group by clause, hence it gives you the maximum price of all books over all tarifers.
If you want the book with the highest price, there is no need to aggregate: you can join and sort the results by price:
select top (1) with ties b.*, t.*
from books b
inner join tarifer t on b.numbk = t._numbk
order by t.prix desc;
top (1) with ties gives you the first record; if there are several records with the same, top price, the query returns them all.
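The two aggregation levels can be tried out with a small sketch in Python's standard sqlite3 module. Note the assumptions: SQLite has no TOP (1) WITH TIES, so the subquery form is used for the last query instead, the tables are trimmed to the columns the queries need, and the prices are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BOOKS (numbk INTEGER PRIMARY KEY, nombk TEXT);
CREATE TABLE TARIFER (
    _numbk INTEGER REFERENCES BOOKS(numbk),
    _nomed TEXT,
    prix DECIMAL(20,2),
    PRIMARY KEY (_numbk, _nomed)
);
-- Made-up data: books A and B share the highest price.
INSERT INTO BOOKS VALUES (1, 'Book A'), (2, 'Book B'), (3, 'Book C');
INSERT INTO TARIFER VALUES (1, 'Ed1', 9.99), (1, 'Ed2', 29.99),
                           (2, 'Ed1', 29.99), (3, 'Ed1', 19.99);
""")

# GROUP BY nombk: one row per book, each with that book's own maximum.
per_book = conn.execute("""
SELECT o.nombk, MAX(tr.prix)
FROM TARIFER tr JOIN BOOKS o ON o.numbk = tr._numbk
GROUP BY o.nombk
ORDER BY o.nombk
""").fetchall()

# No GROUP BY: the whole table is one group, so a single maximum comes back.
overall = conn.execute("SELECT MAX(prix) FROM TARIFER").fetchall()

# Titles whose price equals the overall maximum (ties included).
top_books = conn.execute("""
SELECT o.nombk, tr.prix
FROM TARIFER tr JOIN BOOKS o ON o.numbk = tr._numbk
WHERE tr.prix = (SELECT MAX(prix) FROM TARIFER)
ORDER BY o.nombk
""").fetchall()

print(per_book)
print(overall)
print(top_books)
```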

SQL Query to check if a record does not exist in another table

I have a table which holds details of all Students currently enrolled in classes which looks like this:
CREATE TABLE studentInClass(
studentID int,
classID int,
FOREIGN KEY(studentID) references students(studentID),
foreign key(classID) references class(classID)
);
And another table which contains details of students who have paid for classes:
CREATE TABLE fees(
feesID INTEGER PRIMARY KEY AUTOINCREMENT,
StudentID INTEGER,
AmountPaid INT,
Date DATE,
FOREIGN KEY(StudentID) REFERENCES students(StudentID));
What I want to do is check whether a student who is in a class has not paid for that class. I am struggling to write a SQL query which does so. I have tried multiple queries such as:
Select studentInClass.StudentID
from fees, studentInClass
where fees.StudentID = studentInClass.StudentID;
But this returns no data. I'm not sure how to proceed from here. Any help will be appreciated.

You want an outer join:
select s.StudentID, (case when f.AmountPaid is not null
then 'Yes'
else 'No'
end) as Is_fees_paid
from studentInClass s left join
fees f
on f.StudentID = s.StudentID;

With NOT EXISTS:
select s.*
from studentInClass s
where not exists (
select 1 from fees
where studentid = s.studentid
)
With this you get all the rows from the table studentInClass whose studentid does not appear in the table fees.
It's not clear if you also need to check the date.
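A quick way to check the behaviour is a sketch like the following, using Python's standard sqlite3 module. The sample rows are invented, and the foreign keys to students and class are left out because those tables are not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE studentInClass (studentID int, classID int);
CREATE TABLE fees (
    feesID INTEGER PRIMARY KEY AUTOINCREMENT,
    StudentID INTEGER,
    AmountPaid INT,
    Date DATE
);
-- Made-up data: only student 1 has a payment on record.
INSERT INTO studentInClass VALUES (1, 101), (2, 101), (3, 102);
INSERT INTO fees (StudentID, AmountPaid, Date) VALUES (1, 50, '2020-01-15');
""")

# Enrolled students with no matching row in fees.
unpaid = conn.execute("""
SELECT s.studentID, s.classID
FROM studentInClass s
WHERE NOT EXISTS (SELECT 1 FROM fees f WHERE f.StudentID = s.studentID)
ORDER BY s.studentID
""").fetchall()

# The outer-join variant labels every enrolled student instead of filtering.
status = conn.execute("""
SELECT s.studentID,
       CASE WHEN f.AmountPaid IS NOT NULL THEN 'Yes' ELSE 'No' END AS Is_fees_paid
FROM studentInClass s
LEFT JOIN fees f ON f.StudentID = s.studentID
ORDER BY s.studentID
""").fetchall()

print(unpaid)
print(status)
```

Since fees has no classID column, either query can only tell whether a student has paid anything at all, not whether a particular class has been paid for.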

Please check this (an inner join returns only the students who do have a row in fees):
select studentInClass.StudentID
from studentInClass inner join fees ON fees.StudentID = studentInClass.StudentID

Oracle APEX Join and Count

I have two tables created with SQL code:
CREATE TABLE
TicketSales(
purchase# Number(10),
client# Integer CONSTRAINT fk1 REFERENCES Customers,
PRIMARY KEY(purchase#));
CREATE TABLE Customers(
client# Integer,
name Char(30),
Primary Key(client#));
Basically, table TicketSales holds ticket sales data, and client# is a foreign key that references the Customers table. I would like to count the names that are in the TicketSales table. I tried the code below with no success:
select Count(name)
From Customers
Where Customers.Client#=TicketSales.Client#
Group by Name;
Any help appreciated.
Thanks,

If you want a count by each name, then include name in the select and group by clauses
select c.Name, Count(*)
From Customers c
INNER JOIN TicketSales t ON c.Client# =t.Client#
Group by c.Name;
If you want just the count of names, not tickets, then use
select Count(*)
From Customers c
;
Or, for a count of individuals who have tickets recorded against them:
select Count(DISTINCT t.Client#)
From TicketSales t
;
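The three counts can be compared side by side with a small sketch in Python's standard sqlite3 module. As an adaptation for SQLite, the # suffix is dropped from the column names (client, purchase), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (client INTEGER PRIMARY KEY, name CHAR(30));
CREATE TABLE TicketSales (
    purchase INTEGER PRIMARY KEY,
    client INTEGER REFERENCES Customers(client)
);
-- Made-up data: Alice bought two tickets, Bob one, Carol none.
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO TicketSales VALUES (1000, 1), (1001, 1), (1002, 2);
""")

# Tickets per customer name (customers without tickets do not appear).
tickets_per_name = conn.execute("""
SELECT c.name, COUNT(*)
FROM Customers c
INNER JOIN TicketSales t ON c.client = t.client
GROUP BY c.name
ORDER BY c.name
""").fetchall()

# All customers, whether or not they bought anything.
customer_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

# Distinct customers who have at least one ticket.
buyer_count = conn.execute(
    "SELECT COUNT(DISTINCT client) FROM TicketSales"
).fetchone()[0]

print(tickets_per_name, customer_count, buyer_count)
```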

SQL multiple natural inner joins

Why does this correctly return the Order ID of an order, the Customer ID of the person who made the order, and the Last Name of the employee in charge of the transaction:
SELECT "OrderID", "CustomerID", "LastName"
FROM orders O
NATURAL INNER JOIN customers JOIN employees ON O."EmployeeID" = employees."EmployeeID";
while
SELECT "OrderID", "CustomerID", "LastName"
FROM orders O
NATURAL INNER JOIN customers NATURAL INNER JOIN employees;
returns 0 rows?
I am sure that they have common columns.
Table orders
OrderId
EmployeeID
CustomerID
...
Table employees
EmployeeID
...
Table customers
CustomerID
...

Without seeing your full, unedited schema it's hard to be sure, but I'd say there are more common columns than you intended.
E.g., as @ClockworkMuse suggested:
CREATE TABLE orders (
OrderId integer primary key,
EmployeeID integer not null,
CustomerID integer not null,
created_at timestamp not null default current_timestamp,
...
);
CREATE TABLE employees (
EmployeeID integer primary key,
created_at timestamp not null default current_timestamp,
...
);
then orders NATURAL JOIN employees will be equivalent to orders INNER JOIN employees USING (EmployeeID, created_at). Which surely isn't what you intended.
You should use INNER JOIN ... USING (colname) or INNER JOIN ... ON (condition).
NATURAL JOIN is a poorly thought out feature that should really be avoided except on quick and dirty ad-hoc queries, if even then. Even if it works now, if you later add an unrelated column to a table it might change the meaning of existing queries. That's ... well, avoid natural joins.
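The effect is easy to reproduce. The following sketch uses Python's standard sqlite3 module (SQLite also supports NATURAL JOIN and USING); the tables are cut down to a few columns, the LastName column and the single sample row per table are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    OrderID INTEGER PRIMARY KEY,
    EmployeeID INTEGER NOT NULL,
    CustomerID INTEGER NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE employees (
    EmployeeID INTEGER PRIMARY KEY,
    LastName TEXT,
    created_at TEXT NOT NULL
);
-- Made-up data: the order belongs to employee 7, but the two
-- created_at values naturally differ.
INSERT INTO employees VALUES (7, 'Smith', '2019-01-01 09:00:00');
INSERT INTO orders VALUES (1, 7, 42, '2020-06-01 12:00:00');
""")

# NATURAL JOIN silently matches on every shared column name,
# here both EmployeeID and created_at, so nothing lines up.
natural_rows = conn.execute("""
SELECT OrderID, LastName
FROM orders NATURAL JOIN employees
""").fetchall()

# USING names the join column explicitly and ignores created_at.
using_rows = conn.execute("""
SELECT OrderID, LastName
FROM orders INNER JOIN employees USING (EmployeeID)
""").fetchall()

print(natural_rows)
print(using_rows)
```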
