Optimizing SQL queries

Optimizing SQL queries - sql

I have created some MSSQL queries, all of them work well, but I think it could be done in a faster way. Can you help me to optimize them?
That's the database:
Create table Teachers
(TNO char(3) Primary key,
TNAME char(20),
TITLE char(6) check (TITLE in('Prof','PhD','MSc')),
CITY char(12),
SUPNO char(3) REFERENCES Teachers);
Create table Students
(SNO char(3) Primary key,
SNAME char(20),
SYEAR int,
CITY char(20));
Create table Courses
(CNO char(3) Primary key,
CNAME char(20),
STUDYEAR int);
Create table TSC
(TNO char(3) REFERENCES Teachers,
SNO char(3) REFERENCES Students,
CNO char(3) REFERENCES Courses,
HOURS int,
GRADE float,
PRIMARY KEY(TNO,SNO,CNO));
1:
On which study year there are most courses?
Problem: it looks like the result is being sorted while I only need the max element.
select
top 1 STUDYEAR
from
Courses
group by
STUDYEAR
order by COUNT(*) DESC
2:
Show the TNOs of those teachers who do NOT have courses with the 1st studyear
Problem: I'm using a subquery only to negate a select query
select
TNO
from
Teachers
where
TNO not in (
select distinct
tno
from
Courses, TSC
where tsc.CNO=Courses.CNO and STUDYEAR = 1)

Some ordering needs to be done to find the max or min value; maybe using ranking functions instead of a group by would be better but I frankly expect the query analyzer to be smart enough to find a good query plan for this specific query.
The subquery is performing well as long as it isn't using columns from the outer query (which may cause it to be performed for every row in many cases). However, I'd leave away the distinct, as it has no benefit. Also, I'd always use the explicit join syntax, but that's mostly a matter of personal preference (for inner joins - outer joins should always be done with the explicit syntax).
So all in all I think that these queries are simple and clear enough to be handled well in the query analyzer, thereby yielding good performance. Do you have a specific performance issue for asking this question? If yes, give us more info (query plan etc.), if no, just leave them - don't to premature optimization.

Related

Benefits of using an autogenerated primary key instead of a constant unique name

I've heard that having an autogenerated primary key is a convention. However, I'm trying to understand its benefits in the following scenario:
CREATE TABLE countries
(
countryID int(11) PRIMARY KEY AUTO_INCREMENT,
countryName varchar(128) NOT NULL
);
CREATE TABLE students
(
studentID int(11) PRIMARY KEY AUTO_INCREMENT,
studentName varchar(128) NOT NULL,
countryOfOrigin int(11) NOT NULL,
FOREIGN KEY (countryOfOrigin) REFERENCES countries (countryID)
);
INSERT INTO countries (countryName)
VALUES ('Germany'), ('Sweden'), ('Italy'), ('China');
If I want to insert something into the students table, I need to lookup the countryIDs in the countries table:
INSERT INTO students (studentName, countryOfOrigin)
VALUES ('Benjamin Schmidt', (SELECT countryID FROM countries WHERE countryName = 'Germany')),
('Erik Jakobsson', (SELECT countryID FROM countries WHERE countryName = 'Sweden')),
('Manuel Verdi', (SELECT countryID FROM countries WHERE countryName = 'Italy')),
('Min Lin', (SELECT countryID FROM countries WHERE countryName = 'China'));
In a different scenario, as I know that the countryNames in the countries table are unique and not null, I could to the following:
CREATE TABLE countries2
(
countryName varchar(128) PRIMARY KEY
);
CREATE TABLE students2
(
studentID int(11) PRIMARY KEY AUTO_INCREMENT,
studentName varchar(128) NOT NULL,
countryOfOrigin varchar(128) NOT NULL,
FOREIGN KEY (countryOfOrigin) REFERENCES countries2 (countryName)
);
INSERT INTO countries2 (countryName)
VALUES ('Germany'), ('Sweden'), ('Italy'), ('China');
Now, inserting data into the students2 table is simpler:
INSERT INTO students2 (studentName, countryOfOrigin)
VALUES ('Benjamin Schmidt', 'Germany'),
('Erik Jakobsson', 'Sweden'),
('Manuel Verdi', 'Italy'),
('Min Lin', 'China');
So why should the first option be the better one, given that countryNames are unique and are never going to change?

There are two apects involved here:
natural keys vs. surrogate keys
autoincremented values
You are wondering why to have to deal with some arbitrary number for a country, when a country can be uniquely identified by its name. Well, imagine you use the country names in several tables to relate rows to each other. Then at some point you are told that you misspelled a country. You want to correct this, but have to do this in every table the country occurs in. In big databases you usually don't have cascading updates in order to avoid updates that unexpectedly take hours instead of mere minutes or seconds. So you must do this manually, but the foreign key constraints get in your way. You cannot change the parent table's key, because there are child tables using this, and you cannot change the key in the child tables first, because that key has to exist in the parent table. You'll have to work with a new row in the parent table and start from there. Quite some task. And even if you have no spelling issue, at some point someone might say "we need the official country names; you have China, but it must be the People's Republic of China instead" and again you must look up and change that contry in all those tables. And what about partial backups? A table gets totally messed up due to some programming error and must be replaced by last week's backup, because this is the best you have. But suddenly some keys don't match any more. You never want a table's key to change.
You say "country names are unique and are never going to change". Think again :-)
It is easier to have your database use a technical arbitrary ID instead. Then the country name only exists in the country table. And if that name must get changed, you change it just in that one place, and all relations stay intact. This, however, doesn't mean that natural keys are worse than technical IDs. They are not. But it's more difficult with them to set up a database correctly. In case of countries a good natural key would be a country ISO code, because this uniquely identifies a country and doesn't change. This would be my choice here.
With students it's the same. Students usually have a student number or student ID in real world, so you can simply use this number to uniquely identifiy a student in the database. But then, how do we get these unique student IDs? At a large university, two secretaries may want to enter new students at the same time. They ask the system what the last student's ID was. It was #11223, so they both want to issue #11224, which causes a conflict of course, because only one student can be given that number. In order to avoid this, DBMS offer sequences of which numbers are taken. Thus one of the secretaries pulls #11224 and the other #11225. Auto-incremented IDs work this way. Both secretaries enter their new student, the rows get inserted into the student table and result in the two different IDs that get reported back to the secretaries. This makes sequences and auto incrementing IDs a great and safe tool to work with.

Convention can be a useful guide. It isn't necessarily the best option in all situations.
There are usually tradeoffs involved, like space, convenience, etc.
While you showed one method of resolving / inserting the proper country key value, there's a slightly less wordy option supported by standard SQL (and many databases).
INSERT INTO students (studentName, countryOfOrigin)
WITH list (name, country) AS (
SELECT *
FROM (
VALUES ('Benjamin Schmidt', 'Germany')
, ('Erik Jakobsson', 'Sweden')
, ('Manuel Verdi', 'Italy')
, ('Min Lin', 'China')
) AS x
)
SELECT name, countryID
FROM list AS l
JOIN countries AS c
ON c.countryName = l.country
;
and a little less wordy again:
INSERT INTO students (studentName, countryOfOrigin)
WITH list (name, country) AS (
VALUES ('Benjamin Schmidt', 'Germany')
, ('Erik Jakobsson', 'Sweden')
, ('Manuel Verdi', 'Italy')
, ('Min Lin', 'China')
)
SELECT name, countryID
FROM list AS l
JOIN countries AS c
ON c.countryName = l.country
;
Here's a test case with MariaDB 10.5:
Working test case (updated)

Restrict the number of entries in a relation based on conditions across several relations

I am using PostgreSQL and am trying to restrict the number of concurrent loans that a student can have. To do this, I have created a CTE that selects all unreturned loans grouped by StudentID, and counts the number of unreturned loans for each StudentID. Then, I am attempting to create a check constraint that uses that CTE to restrict the number of concurrent loans that a student can have to 7 at most.
The below code does not work because it is syntactically invalid, but hopefully it can communicate what I am trying to achieve. Does anyone know how I could implement my desired restriction on loans?
CREATE TABLE loan (
id SERIAL PRIMARY KEY,
copy_id INTEGER REFERENCES media_copies (copy_id),
account_id INT REFERENCES account (id),
loan_date DATE NOT NULL,
expiry_date DATE NOT NULL,
return_date DATE,
WITH currentStudentLoans (student_id, current_loans) AS
(
SELECT account_id, COUNT(*)
FROM loan
WHERE account_id IN (SELECT id FROM student)
AND return_date IS NULL
GROUP BY account_id
)
CONSTRAINT max_student_concurrent_loans CHECK(
(SELECT current_loans FROM currentStudentLoans) BETWEEN 0 AND 7
)
);
For additional (and optional) context, I include an ER diagram of my database schema.

You cannot do this using an in-line CTE like this. You have several choices.
The first is a UDF and check constraint. Essentially, the logic in the CTE is put in a UDF and then a check constraint validates the data.
The second is a trigger to do the check on this table. However, that is tricky because the counts are on the same table.
The third is storing the total number in another table -- probably accounts -- and keeping it up-to-date for inserts, updates, and deletes on this table. Keeping that value up-to-date requires triggers on loans. You can then put the check constraint on accounts.
I'm not sure which solution fits best in your overall schema. The first is closest to what you are doing now. The third "publishes" the count, so it is a bit clearer what is going on.

How do I make this SQL Query?

Quite simply, I need to make a query that prints out all the years (in increasing order) in which only 1 exam took place. I am thinking something along the lines of having to query for every distinct year, and then use the count function to test if there was only one exam in that year? But I can't quite seem to write it out. If it is of importance, the program is being written in Java so I can manipulate outputs.
The form of the EXAM table is:
CREATE TABLE EXAM
(Student_id char(5), NOT NULL
Module_code varchar(6), NOT NULL
Exam_year smallint, NOT NULL
Score smallint, NOT NULL
PRIMARY KEY (Student_id, Module_code) -- Creates a unique tuple
FOREIGN KEY (Student_id) REFERENCES STUDENT(Student_id) -- Enforces data integrity
FOREIGN KEY (Module_code) REFERENCES MODULE(Module_code) -- Enforces data integrity
);

You need to use HAVING, and use distinct Module_code just in case.
SELECT Exam_year
FROM EXAM
GROUP BY Exam_year
HAVING COUNT(distinct Module_code) = 1
ORDER BY Exam_year

select Exam_year
from EXAM
group by module_code
having count(module_code) = 1
order by Exam_year asc

One query should get the job done. Try this:
select Exam_year
from EXAM
group by Exam_year
having count(*) = 1
order by Exam_year
If you need more information than just Exam_year in the output, this can be modified to support additional info, but this example is the simplest way to do what is stated in your requirement.

SQL gym with less workers per district

I'm struggling to find how to make a specific query to a set of tables that I have on my local database.
CREATE TABLE Gym (
eid INT PRIMARY KEY,
name VARCHAR(127) UNIQUE,
district VARCHAR(127),
area INT);
CREATE TABLE Trainer (
id INT PRIMARY KEY,
name VARCHAR(127),
birth_year INT,
year_credentials_expiry INT
);
CREATE TABLE Works (
eid INT,
id INT,
since INT,
FOREIGN KEY (eid) REFERENCES Gym (eid),
FOREIGN KEY (id) REFERENCES Trainer (id),
PRIMARY KEY (eid,id));
The question is the following: I have several gyms and there are some cases where two or more gyms are located on the same district. How can I know, by district, which is the gym with less trainers on it?
The only thing I managed to get is the number of trainers per gym. Considering that, I can only get the gym with the minimum trainers from all districts...
NOTE: I am NOT allowed to use subqueries (SELECT inside SELECT's; SELECT inside FROM's)
Thank you in advance.

If you can have subqueries in the where or having clauseclause, then you can approach it this way.
First look at your query that counts the number of trainers by district.
Now, write a query (using that as a subquery) that calculates the minimum number of trainers in a gym in a district.
Next, take your first query, add a having clause and correlate it to the second by district and number of trainers.
This does use a subquery in a subquery, but it is in the having clause. I'm not writing the query for you, since you need to learn how to do that yourself.
By the way, if you have window/analytic functions there may be other solutions.

SQL Relation and Query

I am trying to create a database that contains two tables. I have included the create_tables.sql code if this helps. I am trying to set the relationship to make the STKEY the defining key so that a query can be used to search for thr key and show what issues this student has been having. At the moment when I search using:
SELECT *
FROM student, student_log
WHERE 'tilbun' like student.stkey
It shows all the issues in the table regardless of the STKEY. I think I may have the foreign key set incorrectly. I have included the create_tables.sql here.
CREATE TABLE `student`
(
`STKEY` VARCHAR(10),
`first_name` VARCHAR(15),
`surname` VARCHAR(15),
`year_group` VARCHAR(4),
PRIMARY KEY (STKEY)
)
;
CREATE TABLE `student_log`
(
`issue_number` int NOT NULL AUTO_INCREMENT,
`STKEY` VARCHAR(10),
`date_field` DATETIME,
`issue` VARCHAR(150),
PRIMARY KEY (issue_number),
INDEX (STKEY),
FOREIGN KEY (STKEY) REFERENCES student (STKEY)
)
;
Cheers for the help.

Though you have correctly defined the foreign key relationship in the tables, you must still specify a join condition when performing the query. Otherwise, you'll get a cartesian product of the two tables (all rows of one times all rows of the other)
SELECT
student.*,
student_log.*
FROM student INNER JOIN student_log ON student.STKEY = student_log.STKEY
WHERE student.STKEY LIKE 'tilbun'
And note that rather than using an implicit join (comma-separated list of tables), I have used an explicit INNER JOIN, which is the preferred modern syntax.
Finally, there's little use to using a LIKE clause instead of = unless you also use wildcard characters
WHERE student.STKEY LIKE '%tilbun%'

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas