How to count missing rows in left table after right join?

There are two tables:
Table education_data (list of countries with values by year per measured indicator).
create table education_data
(country_id int,
indicator_id int,
year date,
value float
);
Table indicators (list of all indicators):
create table indicators
(id int PRIMARY KEY,
name varchar(200),
code varchar(25)
);
I want to find the indicators for which the highest number of countries lack information entirely,
i.e. max(count of missing indicators by country).
I have solved the problem in Excel by counting blanks in a pivot table by country (screenshot: pivot table with count for missing indicators by country).
I haven't yet figured out the SQL query that returns the same results.
I am able to return the number of missing indicators for a given country (see the query below), but not for all countries.
SELECT COUNT(*)
FROM education_data AS edu
RIGHT JOIN indicators AS ind ON
edu.indicator_id = ind.id and country_id = 10
WHERE value IS NULL
GROUP BY country_id
I have tried with a cross join without success so far.

You will have to join on the countries as well; otherwise you cannot tell whether a country has no entry in education_data at all:
create table countries(id serial primary key, name varchar);
create table indicators
(id int PRIMARY KEY,
name varchar(200),
code varchar(25)
);
create table education_data
(country_id int references countries,
indicator_id int references indicators,
year date,
value float
);
insert into countries values (1,'USA');
insert into countries values (2,'Norway');
insert into countries values (3,'France');
insert into indicators values (1,'foo','xxx');
insert into indicators values (2,'bar', 'yyy');
insert into education_data values(1,1,'01-01-2020',1.1);
SELECT count(c.id), i.id, i.name
FROM countries c
CROSS JOIN indicators i
LEFT JOIN education_data e ON (c.id = e.country_id AND i.id = e.indicator_id)
WHERE e.indicator_id IS NULL
GROUP BY i.id
ORDER BY count(c.id) DESC;
count | id | name
-------+----+------
3 | 2 | bar
2 | 1 | foo
(2 rows)
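The asker's single-country query can be extended to all countries by grouping the same cross join by country instead of by indicator. This corresponds to counting blanks per country in the Excel pivot. Below is a minimal sketch that runs the query in SQLite through Python's built-in sqlite3 module, used here as a stand-in for PostgreSQL. The data is the sample data from this answer, and the function name is hypothetical.

```python
import sqlite3

# Sample data from the answer above, loaded into SQLite (not PostgreSQL).
SETUP = """
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE indicators (id INTEGER PRIMARY KEY, name TEXT, code TEXT);
CREATE TABLE education_data (country_id INT, indicator_id INT, year DATE, value REAL);
INSERT INTO countries VALUES (1, 'USA'), (2, 'Norway'), (3, 'France');
INSERT INTO indicators VALUES (1, 'foo', 'xxx'), (2, 'bar', 'yyy');
INSERT INTO education_data VALUES (1, 1, '2020-01-01', 1.1);
"""

# The CROSS JOIN builds every (country, indicator) pair; pairs without a
# matching education_data row survive the LEFT JOIN with e.indicator_id NULL.
MISSING_PER_COUNTRY = """
SELECT c.id, c.name, COUNT(*) AS missing_indicators
FROM countries c
CROSS JOIN indicators i
LEFT JOIN education_data e
       ON e.country_id = c.id AND e.indicator_id = i.id
WHERE e.indicator_id IS NULL
GROUP BY c.id, c.name
ORDER BY missing_indicators DESC, c.id
"""


def missing_per_country():
    """Return (country id, country name, number of missing indicators) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(MISSING_PER_COUNTRY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in missing_per_country():
        print(row)
```

Here Norway and France each lack 2 indicators and the USA lacks 1. The first row therefore holds the country with the most missing indicators.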

I want to find the indicators for which the highest number of countries lack information entirely i.e. max (count of missing indicators by country)
That's a logical contradiction. The "count of missing indicators by country" cannot be pinned on any specific indicator, since those countries don't have an indicator.
The counts per country with "missing indicator" (i.e. indicator_id IS NULL):
SELECT country_id, count(*) AS ct_indicator_null
FROM education_data
WHERE indicator_id IS NULL
GROUP BY country_id
ORDER BY count(*) DESC;
Or, more generally, the counts per country without a valid indicator. This also includes rows where indicator_id has no match in table indicators:
SELECT country_id, count(*) AS ct_no_valid_indicator
FROM education_data e
WHERE NOT EXISTS (
SELECT FROM indicators i
WHERE i.id = e.indicator_id
)
GROUP BY country_id
ORDER BY count(*) DESC;
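The following small sketch shows how the two queries above differ. It uses made-up data and runs in SQLite via Python's sqlite3 rather than PostgreSQL. Country 1 has one row with a NULL indicator and one row pointing at a nonexistent indicator 99, while country 2 has only a valid row.

```python
import sqlite3

SETUP = """
CREATE TABLE indicators (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE education_data (country_id INT, indicator_id INT, value REAL);
INSERT INTO indicators VALUES (1, 'foo'), (2, 'bar');
-- country 1: one NULL indicator and one dangling indicator (99), country 2: valid only
INSERT INTO education_data VALUES (1, NULL, 1.0), (1, 99, 2.0), (2, 1, 3.0);
"""

# Counts only rows whose indicator_id is NULL.
NULL_ONLY = """
SELECT country_id, COUNT(*) FROM education_data
WHERE indicator_id IS NULL
GROUP BY country_id ORDER BY COUNT(*) DESC
"""

# SQLite needs a select list inside EXISTS, so "SELECT 1" replaces PostgreSQL's bare "SELECT".
NO_VALID = """
SELECT country_id, COUNT(*) FROM education_data e
WHERE NOT EXISTS (SELECT 1 FROM indicators i WHERE i.id = e.indicator_id)
GROUP BY country_id ORDER BY COUNT(*) DESC
"""


def compare_counts():
    """Return (IS NULL counts, NOT EXISTS counts) as two lists of rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    result = (conn.execute(NULL_ONLY).fetchall(), conn.execute(NO_VALID).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(compare_counts())
```

The IS NULL version counts only the NULL row for country 1. NOT EXISTS counts both of its rows, because a NULL indicator_id never matches i.id either.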
NOT EXISTS is one of four basic techniques that apply here (LEFT / RIGHT JOIN, as you tried, is another one). See:
Select rows which are not present in other table
You mentioned a country table. Countries without any indicator entries in education_data are not included in the result above. To find those, too:
SELECT *
FROM country c
WHERE NOT EXISTS (
SELECT
FROM education_data e
JOIN indicators i ON i.id = e.indicator_id -- INNER JOIN this time!
WHERE e.country_id = c.id
);
This reports countries without a valid indicator (either none at all, or only invalid ones).
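Here is a sketch of the last query on hypothetical data, again in SQLite through Python's sqlite3 module. France has no rows at all and Norway only has a row with a nonexistent indicator, so both should be reported.

```python
import sqlite3

SETUP = """
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE indicators (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE education_data (country_id INT, indicator_id INT, value REAL);
INSERT INTO country VALUES (1, 'USA'), (2, 'Norway'), (3, 'France');
INSERT INTO indicators VALUES (1, 'foo');
-- USA: valid indicator, Norway: only a dangling indicator, France: no rows at all
INSERT INTO education_data VALUES (1, 1, 1.0), (2, 99, 2.0);
"""

# The INNER JOIN keeps only rows whose indicator exists, so NOT EXISTS reports
# countries with no rows at all as well as countries whose rows are all invalid.
WITHOUT_VALID = """
SELECT c.id, c.name
FROM country c
WHERE NOT EXISTS (
    SELECT 1
    FROM education_data e
    JOIN indicators i ON i.id = e.indicator_id
    WHERE e.country_id = c.id
)
ORDER BY c.id
"""


def countries_without_valid_indicator():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(WITHOUT_VALID).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(countries_without_valid_indicator())
```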
If every country should have a valid indicator, after cleaning up existing data, consider:
1: adding a FOREIGN KEY constraint to disallow invalid entries in education_data.indicator_id.
2: setting education_data.indicator_id NOT NULL to also disallow NULL entries.
Or add a PRIMARY KEY on (country_id, indicator_id), which makes both columns NOT NULL automatically.
That also brings you closer to a valid many-to-many implementation. See:
How to implement a many-to-many relationship in PostgreSQL?
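The effect of the two suggested constraints can be tried out in SQLite with a small sketch. The table layout is simplified and the helper name try_insert is hypothetical. Unlike PostgreSQL, SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


def try_insert(conn, row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO education_data VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces FOREIGN KEY constraints when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE indicators (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE education_data (
        country_id   INT NOT NULL,
        indicator_id INT NOT NULL REFERENCES indicators (id),
        value        REAL
    );
    INSERT INTO indicators VALUES (1, 'foo');
    """)
    results = {
        "valid": try_insert(conn, (1, 1, 1.0)),      # indicator 1 exists
        "dangling": try_insert(conn, (1, 99, 2.0)),  # rejected by the FOREIGN KEY
        "null": try_insert(conn, (1, None, 3.0)),    # rejected by NOT NULL
    }
    conn.close()
    return results


if __name__ == "__main__":
    print(demo())
```

A FOREIGN KEY on its own still accepts NULL, which is why the NOT NULL is needed as well.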

Related

SQL query is saying too many columns in sub query

I have two tables: a candidates table with the candidate ID as its primary key, and an educations table linking each candidate ID with the schools they went to.
I want to filter schools that have 50 or more candidates, and I also want the candidate names.
select candidates.first_name, candidates.last_name
from candidates
where candidates.id IN (select e.candidate_id, e.school_name, count(e.school_name)
from educations e
group by e.candidate_id, e.school_name
having count(e.school_name) >= 50)
I'm getting an error that says:
Subquery has too many columns
When you are using a subquery inside an IN condition, your subquery can only return a single column.
As Stu already said in the comment, an EXISTS can be faster than an IN clause.
With IN, your subquery can only return as many columns as are listed before the IN.
This example query is for MySQL, but it should work on any database system,
and of course it is simplified.
CREATE TABLE candidates (id int, first_name varchar(10), last_name varchar(10));
INSERT INTO candidates VALUES (1,'a','an'),(2,'b','bn');
CREATE TABLE educations (id int, candidate_id int, school_name varchar(10));
INSERT INTO educations VALUES (1,1,'school A'),(2,1,'school B'),(3,1,'school C'),(4,1,'school D'),
(5,1,'school E'),(6,2,'school A'),(7,2,'school B'),(9,2,'school C');
select candidates.first_name, candidates.last_name
from candidates
where EXISTS (select 1
from educations e
WHERE e.candidate_id = candidates.id
having count(e.school_name) >= 5)
first_name | last_name
a          | an
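The original IN query also works once every subquery returns a single column. This sketch keeps the asker's IN approach and runs in SQLite via Python's sqlite3, using the sample data above. Because the sample is tiny, the threshold is a parameter, and the test uses 2 instead of the question's 50. Counting DISTINCT candidate_id is an assumption: a candidate listed twice for the same school counts once.

```python
import sqlite3

SETUP = """
CREATE TABLE candidates (id INT, first_name VARCHAR(10), last_name VARCHAR(10));
INSERT INTO candidates VALUES (1, 'a', 'an'), (2, 'b', 'bn');
CREATE TABLE educations (id INT, candidate_id INT, school_name VARCHAR(10));
INSERT INTO educations VALUES
  (1, 1, 'school A'), (2, 1, 'school B'), (3, 1, 'school C'), (4, 1, 'school D'),
  (5, 1, 'school E'), (6, 2, 'school A'), (7, 2, 'school B'), (9, 2, 'school C');
"""

# Each IN subquery returns exactly one column, which avoids "Subquery has too many columns".
# The innermost query finds schools with at least the given number of distinct candidates.
QUERY = """
SELECT c.first_name, c.last_name
FROM candidates c
WHERE c.id IN (
    SELECT e.candidate_id
    FROM educations e
    WHERE e.school_name IN (
        SELECT school_name
        FROM educations
        GROUP BY school_name
        HAVING COUNT(DISTINCT candidate_id) >= ?
    )
)
ORDER BY c.id
"""


def candidates_from_big_schools(min_candidates):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, (min_candidates,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(candidates_from_big_schools(2))
```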

How to join Views with aggregate functions?

My problem:
In #4, I'm having trouble joining two views because one of them has an aggregate function. Same with #5.
Question:
Create a view name it as studentDetails, that should show the student name, enrollment date, total price per unit and subject description of students who are enrolled on the subject Science or History.
Create a view, name it as BiggestPrice, that will show the subject id and highest total price per unit of all the subjects. The view should show only the highest total price per unit that are greater than 1000.
--4.) Create a view name it as studentDetails, that should show the student name,
-- enrollment date the total price per unit and subject description of students who are
-- enrolled on the subject Science or History.
CREATE VIEW StudentDetails AS
SELECT StudName, EnrollmentDate
--5.) Create a view, name it as BiggestPrice, that will show the subject id and highest total
-- price per unit of all the subjects. The view should show only the highest total price per unit
-- that are greater than 1000.
CREATE VIEW BiggestPrice AS
SELECT SubjId, SUM(Max(Priceperunit)) FROM Student, Subject
GROUP BY Priceperunit
Here is my table:
CREATE TABLE Student(
StudentId char(5) not null,
StudName varchar2(50) not null,
Age NUMBER(3,0),
CONSTRAINT Student_StudentId PRIMARY KEY (StudentId)
);
Create table Subject(
SubjId number(5,0) not null,
SubjDescription varchar2(200) not null,
Units number(3,0) not null,
Priceperunit number(9,0) not null,
Constraint Subject_SubjId_PK primary key (SubjId)
);
CREATE table Enrollment(
EnrollmentId varchar2(10) not null,
EnrollmentDate date not null,
StudentId char(5) not null,
SubjId Number(5) not null,
constraint Enrollment_EnrollmentId primary key (EnrollmentId),
constraint Enrollment_StudentId_FK foreign key (StudentId) references Student(StudentId),
constraint Enrollment_SubjId_Fk foreign key (SubjId) references Subject(SubjId)
);
This appears to be a homework question, so here are some pointers rather than a full solution.
You need to use JOINs. Your current query:
CREATE VIEW StudentDetails AS
SELECT StudName, EnrollmentDate
It does not have a FROM clause. The query you have for question 5 uses the legacy comma join syntax with no WHERE filter; this is the same as a CROSS JOIN, which connects every student to every subject and is not what you want.
Don't use the legacy comma join syntax; use ANSI joins and state the join condition explicitly.
SELECT <expression list>
FROM student s
INNER JOIN enrollment e ON ...
INNER JOIN subject j ON ...
Then you can fill in the ... based on the relationships between the tables (typically the primary key of one table = the foreign key of another table).
Then for the <expression list> you need to include the columns asked for in the question: student name and enrolment date and subject name would just be those columns from the appropriate tables; and total price-per-unit (which I assume is actually total-price-per-subject) would be a calculation.
Then, for the last part of question 4 ("who are enrolled on the subject Science or History"), add a WHERE filter to include only rows for those subjects.
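Putting the hints for question 4 together gives a minimal sketch like the following. It runs in SQLite through Python's sqlite3 rather than Oracle, and the rows are invented. It assumes the "total price per unit" is Units * Priceperunit, which is also the other answer's interpretation.

```python
import sqlite3

SETUP = """
CREATE TABLE Student (StudentId CHAR(5) PRIMARY KEY, StudName VARCHAR(50) NOT NULL, Age INT);
CREATE TABLE Subject (SubjId INT PRIMARY KEY, SubjDescription VARCHAR(200) NOT NULL,
                      Units INT NOT NULL, Priceperunit INT NOT NULL);
CREATE TABLE Enrollment (EnrollmentId VARCHAR(10) PRIMARY KEY, EnrollmentDate DATE NOT NULL,
                         StudentId CHAR(5) NOT NULL REFERENCES Student (StudentId),
                         SubjId INT NOT NULL REFERENCES Subject (SubjId));
INSERT INTO Student VALUES ('S1', 'Ann', 20), ('S2', 'Bob', 21);
INSERT INTO Subject VALUES (1, 'Science', 3, 400), (2, 'History', 2, 600), (3, 'Art', 1, 900);
INSERT INTO Enrollment VALUES ('E1', '2024-01-10', 'S1', 1),
                              ('E2', '2024-01-11', 'S2', 3),
                              ('E3', '2024-01-12', 'S2', 2);
-- Explicit ANSI joins: primary key of one table = foreign key of the next.
CREATE VIEW StudentDetails AS
SELECT s.StudName, e.EnrollmentDate, j.SubjDescription,
       j.Units * j.Priceperunit AS TotalPrice
FROM Student s
INNER JOIN Enrollment e ON e.StudentId = s.StudentId
INNER JOIN Subject j ON j.SubjId = e.SubjId
WHERE j.SubjDescription IN ('Science', 'History');
"""


def student_details():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute("SELECT * FROM StudentDetails ORDER BY StudName").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in student_details():
        print(row)
```

Bob's Art enrollment is dropped by the WHERE filter. Each remaining row keeps the enrollment date of that specific subject.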
For question 5, you do not need any JOINS as the question only asks about details in the SUBJECT table.
You need to add a WHERE filter to show "only the highest total price per unit that are greater than 1000". This is a simple multiplication and then you can filter by comparing if it is > 1000.
Then you need to limit the query to return only the row with the "highest total price per unit of all the subjects". From Oracle 12, this would be done with an ORDER BY clause in descending order of total price and then using FETCH FIRST ROW ONLY or FETCH FIRST ROW WITH TIES.
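For question 5, this sketch runs in SQLite through Python's sqlite3 as a stand-in for Oracle. SQLite does not support FETCH FIRST ... WITH TIES, so the sketch swaps in a comparison against the overall MAX(), which likewise returns every subject that shares the highest total. The subject rows are made up.

```python
import sqlite3

SETUP = """
CREATE TABLE Subject (SubjId INT PRIMARY KEY, SubjDescription VARCHAR(200) NOT NULL,
                      Units INT NOT NULL, Priceperunit INT NOT NULL);
INSERT INTO Subject VALUES (1, 'Science', 3, 400), (2, 'History', 2, 600),
                           (3, 'Art', 1, 900), (4, 'Music', 1, 500);
"""

# Comparing against the overall MAX() behaves like FETCH FIRST ROW WITH TIES:
# every subject sharing the top total is returned, but only if it exceeds the threshold.
BIGGEST = """
SELECT SubjId, Units * Priceperunit AS TotalPrice
FROM Subject
WHERE Units * Priceperunit > ?
  AND Units * Priceperunit = (SELECT MAX(Units * Priceperunit) FROM Subject)
ORDER BY SubjId
"""


def biggest_price(threshold=1000):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(BIGGEST, (threshold,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(biggest_price())
```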
I'm not sure I understand it fully, but I think it's this:
Notes:
Always use IDs to filter records (this assumes Science and History have SubjId 1 and 2):
where su.SubjId in (1,2)
You can find the max record by computing max() over all subjects in a subquery and comparing the main query's total with it:
where ( su.Units * su.Priceperunit ) = ( select max(su2.Units * su2.Priceperunit) from Subject su2 )
You cannot use a column alias in the WHERE filter, so repeat the expression instead:
( su.Units * su.Priceperunit ) > 1000
CREATE VIEW StudentDetails AS
select s.StudName,
e.EnrollmentDate,
su.SubjDescription,
su.Units * su.Priceperunit TotalPrice
from student s
inner join Enrollment e
on e.StudentId = s.StudentId
inner join Subject su
on su.SubjId = e.SubjId
where su.SubjId in (1,2);
CREATE VIEW BiggestPrice AS
select su.SubjId, ( su.Units * su.Priceperunit ) TotalPrice
from Subject su
-- compare against the highest total over ALL subjects (uncorrelated subquery);
-- correlating on SubjId would only compare each subject with itself
where ( su.Units * su.Priceperunit ) =
(
select max(su2.Units * su2.Priceperunit)
from Subject su2
)
and ( su.Units * su.Priceperunit ) > 1000;

How can I insert values from a nested table into another table?

I want to grab values from a nested table in one table and insert said values into another table
Here's the type for the nested table:
CREATE OR REPLACE TYPE type_val AS OBJECT
(
year DATE,
amount INTEGER
);
The nested table:
CREATE OR REPLACE TYPE nt_type_val IS
TABLE OF type_val;
Here's the table that contains the nested table:
CREATE TABLE country
(
id INTEGER NOT NULL,
name VARCHAR2(100) NOT NULL,
continent VARCHAR2(30) NOT NULL,
prod_an nt_type_val
)
NESTED TABLE prod_an STORE AS nt_prod_an;
Here's the table into which I want to insert
CREATE TABLE prod_country_ai
(
year DATE NOT NULL,
amount INTEGER NOT NULL,
country_fk INTEGER NOT NULL
)
I want to take the values from prod_an in the country table for each country and store them in the prod_country_ai table. The year and amount from the nested table (prod_an) should go into year and amount in prod_country_ai, and the primary key from country should go into country_fk in prod_country_ai.
I have the following piece for a procedure that would do that:
DECLARE
CURSOR inner_table IS
SELECT t.* FROM country p, TABLE(p.prod_an) t
WHERE p.name = 'Portugal';
BEGIN
FOR i IN inner_table LOOP
dbms_output.put_line(i.year || ' ' || i.amount);
END LOOP;
END;
This successfully outputs the year followed by the amount, but only for the country name I specify. The solution I thought of is an "outer loop" that cycles over the country table (by id or by country name; it doesn't change much, because each value is unique either way). I'm guessing I can use i.year and i.amount directly in an INSERT statement inside the "inner loop" to insert into prod_country_ai, but I'm not sure how to do this. Also, I think variables are treated as "local" inside a loop, so how could I insert the country primary key as a foreign key in the prod_country_ai table?
You don't need a procedure for this. You can do this with an INSERT ... SELECT from the countries cross joining the nested tables.
INSERT INTO prod_country_ai (year, amount, country_fk)
SELECT
p.year, p.amount, c.id
FROM
country c
CROSS JOIN TABLE(c.prod_an) p;

SQL DML Query AVG and COUNT

I am a beginner at SQL and I am trying to create a query.
I have these tables:
CREATE TABLE Hospital (
hid INT PRIMARY KEY,
name VARCHAR(127) UNIQUE,
country VARCHAR(127),
area INT
);
CREATE TABLE Doctor (
ic INT PRIMARY KEY,
name VARCHAR(127),
date_of_birth INT
);
CREATE TABLE Work (
hid INT,
ic INT,
since INT,
FOREIGN KEY (hid) REFERENCES Hospital (hid),
FOREIGN KEY (ic) REFERENCES Doctor (ic),
PRIMARY KEY (hid,ic)
);
The query is: What is the average in each country of the number of doctors working in hospitals of that country (1st column: each country, 2nd column: average)? Thanks.
You first need to write a query that counts the doctors per hospital:
select w.hid, count(w.ic)
from work w
group by w.hid;
Based on that query, you can retrieve the average number of doctors per country:
with doctor_count as (
select w.hid, count(w.ic) as cnt
from work w
group by w.hid
)
select h.country, avg(dc.cnt)
from hospital h
join doctor_count dc on h.hid = dc.hid
group by h.country;
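A quick way to check the CTE version is to run it on a few invented rows. The sketch below uses SQLite through Python's sqlite3 module and keeps only the columns the query touches.

```python
import sqlite3

SETUP = """
CREATE TABLE hospital (hid INT PRIMARY KEY, name VARCHAR(127) UNIQUE,
                       country VARCHAR(127), area INT);
CREATE TABLE work (hid INT, ic INT, since INT, PRIMARY KEY (hid, ic));
INSERT INTO hospital VALUES (1, 'H1', 'PT', 10), (2, 'H2', 'PT', 20), (3, 'H3', 'ES', 30);
-- H1 has 3 doctors, H2 has 1, H3 has 3
INSERT INTO work (hid, ic) VALUES (1, 1), (1, 2), (1, 3), (2, 4), (3, 5), (3, 6), (3, 7);
"""

# Step 1 (CTE): doctors per hospital. Step 2: average those counts per country.
QUERY = """
WITH doctor_count AS (
    SELECT w.hid, COUNT(w.ic) AS cnt
    FROM work w
    GROUP BY w.hid
)
SELECT h.country, AVG(dc.cnt) AS avg_doctors
FROM hospital h
JOIN doctor_count dc ON h.hid = dc.hid
GROUP BY h.country
ORDER BY h.country
"""


def avg_doctors_per_country():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(avg_doctors_per_country())
```

The PT hospitals have 3 and 1 doctors (average 2.0), and the single ES hospital has 3. Hospitals with no rows in work are left out by the inner join rather than counted as 0.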
If you have an old DBMS that does not support common table expressions the above can be rewritten as:
select h.country, avg(dc.cnt)
from hospital h
join (
select w.hid, count(w.ic) as cnt
from work w
group by w.hid
) dc on h.hid = dc.hid
group by h.country;
Here is an SQLFiddle demo: http://sqlfiddle.com/#!12/9ff79/1
Btw: storing date_of_birth as an integer is a bad choice. You should use a real DATE column.
And work is a keyword in SQL (as in COMMIT WORK), so you shouldn't use it as a table name.

Join multiple tables, including one table twice, and sort by counting a group

I am an amateur just trying to finish the last question of my assignment. It is past due at this point; I'm just looking for understanding. I have spent almost 5 hours across two days on attempts at this and have had no success.
I have tried looking through all the different types of joins, couldn't get grouping to work (ever), and have had little luck with the sorting as well. I can do each of these things on its own, but the difficulty here was getting all of them to work together.
This is the question:
Write a SQL query to retrieve a list that has (source city, source code, destination city, destination code, and number-of-flights) for all source-dest pairs with at least 2 flights. Order by the number_of_flights. Note that the “dest” and “source” attributes in the “flights” table are both referenced to the “airportid” in the “airports” table.
Here are the tables I have to work with (also came with about 3000 lines of dummy entries)
create table airports (
airportid char(3) primary key,
city varchar(20)
);
create table airlines (
airlineid char(2) primary key,
name varchar(20),
hub char(3) references airports(airportid)
);
create table customers (
customerid char(10) primary key,
name varchar(25),
birthdate date,
frequentflieron char(2) references airlines(airlineid)
);
create table flights (
flightid char(6) primary key,
source char(3) references airports(airportid),
dest char(3) references airports(airportid),
airlineid char(2) references airlines(airlineid),
local_departing_time date,
local_arrival_time date
);
create table flown (
flightid char(6) references flights(flightid),
customerid char(10) references customers,
flightdate date
);
The first problem I ran into was outputting airports.city twice in the same query, but with different results. Not only that, but no matter what I tried when grouping, I would always get the same error:
Not a GROUP BY expression
Normally I have fun trying to piece these together, but this has been frustrating. Help!
select source.airportid as source_airportid,
source.city source_city,
dest.airportid as dest_airportid,
dest.city as dest_city,
count(*) as flights
from flights
inner join airports source on source.airportid = flights.source
inner join airports dest on dest.airportid = flights.dest
group by
source.airportid,
source.city,
dest.airportid,
dest.city
having count(*) >= 2
order by 5;
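The key idea in the query above is joining airports twice under two different aliases, one for each role. Here is a small sketch with invented flights, run in SQLite through Python's sqlite3 rather than Oracle. It keeps only the columns the query needs.

```python
import sqlite3

SETUP = """
CREATE TABLE airports (airportid CHAR(3) PRIMARY KEY, city VARCHAR(20));
CREATE TABLE flights (flightid CHAR(6) PRIMARY KEY,
                      source CHAR(3) REFERENCES airports (airportid),
                      dest CHAR(3) REFERENCES airports (airportid));
INSERT INTO airports VALUES ('JFK', 'New York'), ('LAX', 'Los Angeles'), ('ORD', 'Chicago');
INSERT INTO flights VALUES ('F1', 'JFK', 'LAX'), ('F2', 'JFK', 'LAX'), ('F3', 'LAX', 'JFK'),
                           ('F4', 'ORD', 'LAX'), ('F5', 'ORD', 'LAX'), ('F6', 'ORD', 'LAX');
"""

# airports is joined twice under two aliases, one per role (source and destination);
# every non-aggregated column in the SELECT list also appears in GROUP BY.
QUERY = """
SELECT src.airportid, src.city, dst.airportid, dst.city, COUNT(*) AS number_of_flights
FROM flights f
INNER JOIN airports src ON src.airportid = f.source
INNER JOIN airports dst ON dst.airportid = f.dest
GROUP BY src.airportid, src.city, dst.airportid, dst.city
HAVING COUNT(*) >= 2
ORDER BY number_of_flights
"""


def busy_routes():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in busy_routes():
        print(row)
```

The single LAX to JFK flight is removed by the HAVING clause.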
Have you tried a subquery?
SELECT source_airports.city,
source_airports.airportid,
dest_airports.city,
dest_airports.airportid,
x.number_of_flights
FROM
(
SELECT source, dest, COUNT(*) as number_of_flights
FROM flights
GROUP BY source, dest
HAVING COUNT(*) > 1
) x -- no AS before table aliases: Oracle rejects it, and it is optional elsewhere
INNER JOIN airports dest_airports
ON dest_airports.airportid = x.dest
INNER JOIN airports source_airports
ON source_airports.airportid = x.source
ORDER BY x.number_of_flights ASC