SQL DML Query AVG and COUNT - sql

I am beginner at SQL and I am trying to create a query.
I have these tables:
CREATE TABLE Hospital (
hid INT PRIMARY KEY,
name VARCHAR(127) UNIQUE,
country VARCHAR(127),
area INT
);
CREATE TABLE Doctor (
ic INT PRIMARY KEY,
name VARCHAR(127),
date_of_birth INT,
);
CREATE TABLE Work (
hid INT,
ic INT,
since INT,
FOREIGN KEY (hid) REFERENCES Hospital (hid),
FOREIGN KEY (ic) REFERENCES Doctor (ic),
PRIMARY KEY (hid,ic)
);
The query is: What is the average in each country of the number of doctors working in hospitals of that country (1st column: each country, 2nd column: average)? Thanks.

You first need to write a query that counts the doctors per hospital
select w.hid, count(w.ic)
from work w
group by w.hid;
Based on that query, you can retrieve the average number of doctors per country:
with doctor_count as (
select w.hid, count(w.ic) as cnt
from work w
group by w.hid
)
select h.country, avg(dc.cnt)
from hospital h
join doctor_count dc on h.hid = dc.hid
group by h.country;
If you have an old DBMS that does not support common table expressions the above can be rewritten as:
select h.country, avg(dc.cnt)
from hospital h
join (
select w.hid, count(w.ic) as cnt
from work
group by w.hid
) dc on h.hid = dc.hid;
Here is an SQLFiddle demo: http://sqlfiddle.com/#!12/9ff79/1
Btw: storing date_of_birth as an integer is a bad choice. You should use a real DATE column.
And work is a reserved word in SQL. You shouldn't use that for a table name.

Related

How to join Views with aggregate functions?

My problem:
In #4, I'm having trouble joining two Views because the other has an aggregate function. Same with #5
Question:
Create a view name it as studentDetails, that would should show the student name, enrollment date, total price per unit and subject description of students who are enrolled on the subject Science or History.
Create a view, name it as BiggestPrice, that will show the subject id and highest total price per unit of all the subjects. The view should show only the highest total price per unit that are greater than 1000.
--4.) Create a view name it as studentDetails, that would should show the student name,
-- enrollment date the total price per unit and subject description of students who are
-- enrolled on the subject Science or History.
CREATE VIEW StudentDetails AS
SELECT StudName, EnrollmentDate
--5.) Create a view, name it as BiggestPrice, that will show the subject id and highest total
-- price per unit of all the subjects. The view should show only the highest total price per unit
-- that are greater than 1000.
CREATE VIEW BiggestPrice AS
SELECT SubjId, SUM(Max(Priceperunit)) FROM Student, Subject
GROUP BY Priceperunit
Here is my table:
CREATE TABLE Student(
StudentId char(5) not null,
StudName varchar2(50) not null,
Age NUMBER(3,0),
CONSTRAINT Student_StudentId PRIMARY KEY (StudentId)
);
CREATE table Enrollment(
EnrollmentId varchar2(10) not null,
EnrollmentDate date not null,
StudentId char(5) not null,
SubjId Number(5) not null,
constraint Enrollment_EnrollmentId primary key (EnrollmentId),
constraint Enrollment_StudentId_FK foreign key (StudentId) references Student(StudentId),
constraint Enrollment_SubjId_Fk foreign key (SubjId) references Subject(SubjId)
);
Create table Subject(
SubjId number(5,0) not null,
SubjDescription varchar2(200) not null,
Units number(3,0) not null,
Priceperunit number(9,0) not null,
Constraint Subject_SubjId_PK primary key (SubjId)
);
Since this appears to be a homework question.
You need to use JOINs. Your current query:
CREATE VIEW StudentDetails AS
SELECT StudName, EnrollmentDate
Does not have a FROM clause and the query you have for question 5 uses the legacy comma join syntax with no WHERE filter; this is the same as a CROSS JOIN and will connect every student to every subject and is not what you want.
Don't use the legacy comma join syntax and use ANSI joins and explicitly state the join condition.
SELECT <expression list>
FROM student s
INNER JOIN enrollment e ON ...
INNER JOIN subject j ON ...
Then you can fill in the ... based on the relationships between the tables (typically the primary key of one table = the foreign key of another table).
Then for the <expression list> you need to include the columns asked for in the question: student name and enrolment date and subject name would just be those columns from the appropriate tables; and total price-per-unit (which I assume is actually total-price-per-subject) would be a calculation.
Then for the last part of question 4.
who are enrolled on the subject Science or History.
Add a WHERE filter to only include rows for those subjects.
For question 5, you do not need any JOINS as the question only asks about details in the SUBJECT table.
You need to add a WHERE filter to show "only the highest total price per unit that are greater than 1000". This is a simple multiplication and then you can filter by comparing if it is > 1000.
Then you need to limit the query to return only the row with the "highest total price per unit of all the subjects". From Oracle 12, this would be done with an ORDER BY clause in descending order of total price and then using FETCH FIRST ROW ONLY or FETCH FIRST ROW WITH TIES.
Not sure if i get it fully, but i think its this :
Notes:
Always use Id's to filter records:
where su.SubjId in (1,2)
You can find max record using max() at subquery and join it with main query like this :
where su2.SubjId = su.SubjId
You cannot use alias as filter so you can filter it like:
( su.Units * su.Priceperunit ) > 1000
CREATE VIEW StudentDetails AS
select s.StudName,
e.EnrollmentDate,
su.SubjDescription,
su.Units * su.Priceperunit TotalPrice
from student s
inner join Enrollment e
on e.StudentId = s.StudentId
inner join Subject su
on su.SubjId = e.SubjId
where su.SubjId in (1,2)
CREATE VIEW BiggestPrice AS
select su.SubjId, ( su.Units * su.Priceperunit ) TotalPrice
from Subject su
where ( su.Units * su.Priceperunit ) =
(
select max(su2.Units * su2.Priceperunit)
from Subject su2
where su2.SubjId = su.SubjId
)
and ( su.Units * su.Priceperunit ) > 1000

For each ‘CptS’ course, find the percentage of the students who failed the course. Assume a passing grade is 2 or above

CREATE TABLE Course (
courseno VARCHAR(7),
credits INTEGER NOT NULL,
enroll_limit INTEGER,
classroom VARCHAR(10),
PRIMARY KEY(courseNo), );
CREATE TABLE Student (
sID CHAR(8),
sName VARCHAR(30),
major VARCHAR(10),
trackcode VARCHAR(10),
PRIMARY KEY(sID),
FOREIGN KEY (major,trackcode) REFERENCES Tracks(major,trackcode) );
CREATE TABLE Enroll (
courseno VARCHAR(7),
sID CHAR(8),
grade FLOAT NOT NULL,
PRIMARY KEY (courseNo, sID),
FOREIGN KEY (courseNo) REFERENCES Course(courseNo),
FOREIGN KEY (sID) REFERENCES Student(sID) );
So far I've been able to create two seperate queries, one that counts the number of people who failes. And the other counts the number of people who passed. I'm having trouble combining these to produce the number of people passed / number of people failed. For each course.
SELECT course.courseno, COUNT(*) FROM course inner join enroll on enroll.courseno = course.courseno
WHERE course.courseno LIKE 'CptS%' and enroll.grade < 2
GROUP BY course.courseno;
SELECT course.courseno, COUNT(*) FROM course inner join enroll on enroll.courseno = course.courseno
WHERE course.courseno LIKE 'CptS%' and enroll.grade > 2
GROUP BY course.courseno;
The end result should look something like
courseno passrate
CptS451 100
CptS323 100
CptS423 66
You can do a conditional average for this:
select
courseno,
avg(case when grade > 2 then 100.0 else 0 end) passrate
from enroll
where courseno like 'CptS%'

How to count missing rows in left table after right join?

There are two tables:
Table education_data (list of countries with values by year per measured indicator).
create table education_data
(country_id int,
indicator_id int,
year date,
value float
);
Table indicators (list of all indicators):
create table indicators
(id int PRIMARY KEY,
name varchar(200),
code varchar(25)
);
I want to find the indicators for which the highest number of countries lack information entirely
i.e. max (count of missing indicators by country)
I have solved the problem in excel (by counting blanks in a pivot table by country)
pivot table with count for missing indicators by country
I haven't figured our yet the SQL query to return the same results.
I am able to return the number of missing indicators for a set country , read query below, but not for all countries.
SELECT COUNT(*)
FROM education_data AS edu
RIGHT JOIN indicators AS ind ON
edu.indicator_id = ind.id and country_id = 10
WHERE value IS NULL
GROUP BY country_id
I have tried with a cross join without success so far.
You will have to join on the contries as well, otherwise you can not tell if a contry has no entry in education_data at all:
create table countries(id serial primary key, name varchar);
create table indicators
(id int PRIMARY KEY,
name varchar(200),
code varchar(25)
);
create table education_data
(country_id int references countries,
indicator_id int references indicators,
year date,
value float
);
insert into countries values (1,'USA');
insert into countries values (2,'Norway');
insert into countries values (3,'France');
insert into indicators values (1,'foo','xxx');
insert into indicators values (2,'bar', 'yyy');
insert into education_data values(1,1,'01-01-2020',1.1);
SELECT count (c.id), i.id, i.name
FROM countries c JOIN indicators i ON (true) LEFT JOIN education_data e ON(c.id = e.country_id AND i.id = e.indicator_id)
WHERE indicator_id IS NULL
GROUP BY i.id;
count | id | name
-------+----+------
3 | 2 | bar
2 | 1 | foo
(2 rows)
I want to find the indicators for which the highest number of countries lack information entirely i.e. max (count of missing indicators by country)
That's a logical contradiction. The ...
count of missing indicators by country
.. cannot be pinned on any specific indicators, since those countries don't have an indicator.
The counts per country with "missing indicator" (i.e. indicator_id IS NULL):
SELECT country_id, count(*) AS ct_indicator_null
FROM education_data
WHERE indicator_id IS NULL
GROUP BY country_id
ORDER BY count(*) DESC;
Or, more generally, without valid indicator, which also includes rows where indicator_id has no match in table indicators:
SELECT country_id, count(*) AS ct_no_valid_indicator
FROM education_data e
WHERE NOT EXISTS (
SELECT FROM indicators i
WHERE i.id = e.indicator_id
)
GROUP BY country_id
ORDER BY count(*) DESC;
NOT EXISTS is one of four basic techniques that apply here (LEFT / RIGHT JOIN, like you tried being another one). See:
Select rows which are not present in other table
You mentioned a country table. Countries without any indicator entries in education_data are not included in the result above. To find those, too:
SELECT *
FROM country c
WHERE NOT EXISTS (
SELECT
FROM education_data e
JOIN indicators i ON i.id = e.indicator_id -- INNER JOIN this time!
WHERE e.country_id = c.id
);
Reports countries without valid indicator (none, or not valid).
If every country should have a valid indicator, after cleaning up existing data, consider:
1: adding an FOREIGN KEY constraint to disallow invalid entries in education_data.indicator_id.
2: setting education_data.indicator_id NOT NULL to also disallow NULL entries.
Or add a PRIMARY KEY on (country_id, indicator_id), which makes both columns NOT NULL automatically.
.. which brings you closer to a valid many-to-many implementation. See:
How to implement a many-to-many relationship in PostgreSQL?

Find the rating that given to a certian game by each member who rated it in SQL

CREATE TABLE members
(
name varchar(40),
ID char(6) PRIMARY KEY
);
CREATE TABLE games
(
name varchar(100),
ID serial PRIMARY KEY
);
CREATE TABLE ratings
(
memberID char(6) REFERENCES members(ID),
rating SMALLINT CHECK(rating >= 1 AND rating <= 8),
gameID integer REFERENCES games(ID),
PRIMARY KEY (memberID, gameID)
);
I am trying to find all the ratings that were given to the game that has an ID of (2) following by each member who rated it.
I used:
SELECT rating, name
FROM ratings, members
WHERE gameID = 2;
Whenever i used this command, it gives me the correct rating value but it lists all the members even if the member did not rate the game. Can someone help to figure out how to solve the problem.
thanks all in advance
You're looking for JOIN:
SELECT rating, name
FROM ratings r
INNER JOIN members m
ON r.MemberID = m.ID
WHERE gameID = 2;
try:
SELECT rating, name
FROM ratings, members
WHERE gameID = 2 and ratings.memberID = members.id;

Join multiple tables, including one table twice, and sort by counting a group

I am an amateur just trying to finish his last question of his assignment (it is past due at this point, just looking for understanding) I sat and shot attempts at this for almost 5 hours now across two days, and have had no success.
I have tried looking through all the different types of joins, couldn't get grouping to work (ever) and have had little luck with the sorting as well. I can do all of these things one at a time, but the difficulty here was getting all of these things to work in union.
This is the question:
Write a SQL query to retrieve a list that has (source city, source code, destination city,
destination code, and number-of-flights) for all source-dest pairs with at least 2 flights. Order
by the number_of_flights. Note that the “dest”, and “source” attributes in the “flights” table
are both referenced to the “airportid” in the “airports” table.
Here are the tables I have to work with (also came with about 3000 lines of dummy entries)
create table airports (
airportid char(3) primary key,
city varchar(20)
);
create table airlines (
airlineid char(2) primary key,
name varchar(20),
hub char(3) references airports(airportid)
);
create table customers (
customerid char(10) primary key,
name varchar(25),
birthdate date,
frequentflieron char(2) references airlines(airlineid)
);
create table flights (
flightid char(6) primary key,
source char(3) references airports(airportid),
dest char(3) references airports(airportid),
airlineid char(2) references airlines(airlineid),
local_departing_time date,
local_arrival_time date
);
create table flown (
flightid char(6) references flights(flightid),
customerid char(10) references customers,
flightdate date
);
The first problem I ran in to was outputting airports.city twice in the same query but with different results. Not only that, but no matter what I tried when grouping I would always get the same result:
Not a GROUP BY expression
Normally I have fun trying to piece these together, but this has been frustrating. Help!
select source.airportid as source_airportid,
source.city source_city,
dest.airportid as dest_airportid,
dest.city as dest_city,
count(*) as flights
from flights
inner join airports source on source.airportid = flights.source
inner join airports dest on dest.airportid = flights.dest
group by
source.airportid,
source.city,
dest.airportid,
dest.city
having count(*) >= 2
order by 5;
Have you tried a subquery?
SELECT source_airports.city,
source_airports.airportid,
dest_airports.city,
dest_airports.airportid,
x.number_of_flights
FROM
(
SELECT source, dest, COUNT(*) as number_of_flights
FROM flights
GROUP BY source, dest
HAVING COUNT(*) > 1
) as x
INNER JOIN airports as dest_airports
ON dest_airports.airportid = x.dest
INNER JOIN airports as source_airports
ON source_airports.airportid = x.source
ORDER BY x.number_of_flights ASC