Better idea to select unique data from table - sql

I am learning SQL. I want to select the employees (emp_name, emp_lname, project_name) who have exactly one project (no more, no fewer).
I have 3 tables in the database:
create table employee(
emp_id char (5) primary key,
emp_name nvarchar(15) not null,
emp_lname nvarchar(20)
);
create table project(
pr_id char(5) primary key,
project_name nvarchar(10) not null,
project_budjet int
);
create table employee_project(
emp_id char (5) foreign key references employee(emp_id),
pr_id char(5) foreign key references project(pr_id),
constraint premppk primary key(emp_id, pr_id)
);
I am trying to select only the emp_id values that appear exactly once in employee_project.
This code gives me those emp_id values:
select emp_id, count(pr_id) from employee_project
group by emp_id having count(pr_id) = 1
But I need both emp_id and pr_id to select emp_name, emp_lname and project_name. So I tried to select pr_id as well, using the emp_id values I already have. Code:
select ep.emp_id, ep.pr_id from employee_project as ep,
(
select emp_id, count(pr_id) from employee_project
group by emp_id having count(pr_id) = 1
) CT
where CT.emp_id = ep.emp_id
Now I have everything I need to select the details of these employees and their projects. Final code:
select employee.emp_name, employee.emp_lname, project.project_name
from employee, project,
(
select ep.emp_id, ep.pr_id from employee_project as ep,
(
select emp_id, count(pr_id) from employee_project
group by emp_id having count(pr_id) = 1
) CT
where CT.emp_id = ep.emp_id
) CK
where CK.emp_id = employee.emp_id and CK.pr_id = project.pr_id
Is there an easier way to do this?
Thanks for help.

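Since you are looking for employees with exactly one project, you can use an aggregate function in the grouped query to get the project too. I used min(pr_id), but max(pr_id) would work just as well, because with a single row per group both return that row's value. (avg() is not an option here, since pr_id is a character column.)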
After that you can join the tables to get all the other column values.
select e.*, p.*
from
(
select emp_id, min(pr_id) as pr_id
from employee_project
group by emp_id
having count(pr_id) = 1
) ep
join employee e on e.emp_id = ep.emp_id
join project p on p.pr_id = ep.pr_id
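To try the query above on a small data set, here is a minimal sketch that uses Python's built-in sqlite3 module instead of SQL Server. The sample rows and the simplified column types are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee(emp_id text primary key, emp_name text not null, emp_lname text);
    create table project(pr_id text primary key, project_name text not null, project_budjet int);
    create table employee_project(
        emp_id text references employee(emp_id),
        pr_id text references project(pr_id),
        primary key (emp_id, pr_id)
    );
    -- hypothetical sample data: Ann has one project, Bob has two, Cid has none
    insert into employee values ('e1', 'Ann', 'Lee'), ('e2', 'Bob', 'Ray'), ('e3', 'Cid', 'Fox');
    insert into project values ('p1', 'Alpha', 100), ('p2', 'Beta', 200);
    insert into employee_project values ('e1', 'p2'), ('e2', 'p1'), ('e2', 'p2');
""")

# Group first, keep only single-project employees, then join for the names.
rows = conn.execute("""
    select e.emp_name, e.emp_lname, p.project_name
    from (
        select emp_id, min(pr_id) as pr_id
        from employee_project
        group by emp_id
        having count(pr_id) = 1
    ) ep
    join employee e on e.emp_id = ep.emp_id
    join project p on p.pr_id = ep.pr_id
""").fetchall()

print(rows)
```

With this data only Ann should come back: Bob is filtered out by the HAVING clause and Cid never appears in employee_project.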

Related

How to return ties from aggregation functions in SQLite

I have a homework question which requires me to return the name of the oldest child of a person, and in case of ties return all the ties.
The schema for the database is:
create table persons (
fname char(12),
lname char(12),
bdate date,
bplace char(20),
address char(30),
phone char(12),
primary key (fname, lname)
);
create table births (
regno int,
fname char(12),
lname char(12),
regdate date,
regplace char(20),
gender char(1),
f_fname char(12),
f_lname char(12),
m_fname char(12),
m_lname char(12),
primary key (regno),
foreign key (fname,lname) references persons,
foreign key (f_fname,f_lname) references persons,
foreign key (m_fname,m_lname) references persons
);
My current query is
SELECT fname, lname, min(bdate)
from (SELECT *
from persons
JOIN births using (fname, lname)
WHERE f_fname='Michael' and f_lname='Fox');
Where Michael Fox is the person in question. The expected output is
Q4|MFOld
Q4|MFOld2
However, I am only able to get the first oldest child. I tried using a WITH statement, but we are not allowed to use views or temporary tables to answer this question. I also looked into using RANK(), but to my knowledge that was introduced in SQLite v3.25, and this question will be tested using v3.11. Any insight as to how the ties can be returned?
You can use RANK():
SELECT fname, lname, bdate
FROM (
SELECT
b.fname,
b.lname,
p.bdate,
RANK() OVER(ORDER BY bdate) rnk
from persons p
JOIN births b using (fname, lname)
WHERE b.f_fname='Michael' and b.f_lname='Fox'
) x
WHERE rnk = 1;
If you ever need to remove the WHERE clause in order to get the oldest child(ren) of each person, you would need to add a PARTITION BY clause:
SELECT fname, lname, bdate
FROM (
SELECT
b.fname,
b.lname,
p.bdate,
RANK() OVER(PARTITION BY b.f_fname, b.f_lname ORDER BY bdate) rnk
from persons p
JOIN births b using (fname, lname)
) x
WHERE rnk = 1;
In SQLite < 3.25, where window functions such as RANK() are not available, you can use a common table expression (available since version 3.8.3) to pull out all children of Michael Fox and use NOT EXISTS to keep only the oldest one(s):
WITH cte AS (
SELECT b.fname, b.lname, p.bdate
FROM persons p
JOIN births b using (fname, lname)
WHERE b.f_fname = 'Michael' AND b.f_lname = 'Fox'
)
SELECT *
FROM cte c
WHERE NOT EXISTS (
SELECT 1 FROM cte c1 WHERE c1.bdate < c.bdate
)
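If neither window functions nor WITH are allowed, the same filter can also be written with a plain scalar subquery that computes the earliest birth date. The following is only a sketch run through Python's sqlite3 module; the trimmed-down tables and the sample rows are assumptions, not the real homework data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table persons (fname text, lname text, bdate date, primary key (fname, lname));
    create table births (regno int primary key, fname text, lname text,
                         f_fname text, f_lname text);
    -- hypothetical data: two children share the earliest birth date
    insert into persons values
        ('Q4', 'MFOld',   '1990-01-01'),
        ('Q4', 'MFOld2',  '1990-01-01'),
        ('Q4', 'MFYoung', '1995-06-15'),
        ('Other', 'Kid',  '1980-03-03');
    insert into births values
        (1, 'Q4', 'MFOld',   'Michael', 'Fox'),
        (2, 'Q4', 'MFOld2',  'Michael', 'Fox'),
        (3, 'Q4', 'MFYoung', 'Michael', 'Fox'),
        (4, 'Other', 'Kid',  'John', 'Doe');
""")

oldest = conn.execute("""
    select p.fname, p.lname
    from persons p
    join births b on b.fname = p.fname and b.lname = p.lname
    where b.f_fname = 'Michael' and b.f_lname = 'Fox'
      and p.bdate = (
          -- earliest birth date among Michael Fox's children
          select min(p2.bdate)
          from persons p2
          join births b2 on b2.fname = p2.fname and b2.lname = p2.lname
          where b2.f_fname = 'Michael' and b2.f_lname = 'Fox'
      )
    order by p.lname
""").fetchall()

print(oldest)
```

Every child whose bdate equals the minimum is kept, so ties are returned without RANK() or a CTE.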
Why do you need min()? With no GROUP BY it collapses the result to a single row, which is why you only see one child. Just select the plain date instead:
SELECT fname, lname, bdate,
row_number() over (partition by bdate
order by fname, lname)
from (SELECT *
from persons
JOIN births using (fname, lname)
WHERE f_fname='Michael' and
f_lname='Fox');

Find the manager_id with most people under supervision

I am having trouble finding the manager_id that has the most people under it. The answer is manager_id = 100, but I can't seem to write a SQL statement that displays it. Below are the 2 tables that were created and given to me.
CREATE TABLE departments
( department_id NUMBER(4)
, department_name VARCHAR2(30)
CONSTRAINT dept_name_nn NOT NULL
, manager_id NUMBER(6)
, location_id NUMBER(4)
) ;
CREATE TABLE employees
( employee_id NUMBER(6)
, first_name VARCHAR2(20)
, last_name VARCHAR2(25)
CONSTRAINT emp_last_name_nn NOT NULL
, email VARCHAR2(25)
CONSTRAINT emp_email_nn NOT NULL
, phone_number VARCHAR2(20)
, hire_date DATE
CONSTRAINT emp_hire_date_nn NOT NULL
, job_id VARCHAR2(10)
CONSTRAINT emp_job_nn NOT NULL
, salary NUMBER(8,2)
, commission_pct NUMBER(2,2)
, manager_id NUMBER(6)
, department_id NUMBER(4)
, CONSTRAINT emp_salary_min
CHECK (salary > 0)
, CONSTRAINT emp_email_uk
UNIQUE (email)
) ;
Below is my code where I am trying to join the two tables employees and departments together to find manager_id between them with the most occurrence.
Every time I try to run my sql block it gives me an error like "ORA-00918: column ambiguously defined" or something is wrong with Limit 1
SELECT COUNT(Manager_id) into v_manager_id,
FROM departments d
RIGHT JOIN employees e
ON d.manager_id = e.manager_id
GROUP BY Manager_id
ORDER BY COUNT(Manager_id) DESC
LIMIT 1;
You should rather select count(employee_id) ... group by manager_id, so that the number of employees under each manager is returned, and then check which of those counts is the maximum.
The ORA-00918 error is an alias/qualifier issue: manager_id exists in both tables, so you have to qualify it with a table alias.
This will work:
select manager_id
from (select manager_id,count(*)
from employees
group by manager_id
order by
count(*) desc)
where rownum<=1 ;
You can also use a nested subquery, which returns all managers that tie for the maximum. An example with a test table:
create table ns_231(col1 number,col2 number);
insert into ns_231 values(1,1);
insert into ns_231 values(2,3);
insert into ns_231 values(3,3);
insert into ns_231 values(1,2);
insert into ns_231 values(2,5);
insert into ns_231 values(2,1);
insert into ns_231 values(3,1);
insert into ns_231 values(1,4);
SELECT * FROM ns_231;
commit;
select col1 from (select col1,count(*) from ns_231 group by col1 order by count(*) desc) where rownum<=1 ;
select col1 from ns_231 group by col1
having count(*)=(select max(total) from (select count(*) as total from
ns_231 group by col1));
Output of the last query (col1 values 1 and 2 tie with 3 rows each):
1
2
For your table the query is:
select manager_id from employees group by manager_id
having count(*)=(select max(total) from (select count(*) as total from
employees group by manager_id));
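The difference between the two approaches (take the first row after sorting versus compare against the maximum count) shows up as soon as there is a tie. Here is a rough sketch using Python's sqlite3 module with the ns_231 rows from above; note that SQLite's LIMIT 1 stands in for Oracle's rownum <= 1, which is a substitution on my part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ns_231 (col1 integer, col2 integer)")
conn.executemany(
    "insert into ns_231 values (?, ?)",
    [(1, 1), (2, 3), (3, 3), (1, 2), (2, 5), (2, 1), (3, 1), (1, 4)],
)

# Sort by count and keep one row: only one of the tied values survives.
top_only_one = conn.execute("""
    select col1 from (
        select col1, count(*) as total from ns_231
        group by col1
        order by total desc
        limit 1
    )
""").fetchall()

# Compare each group's count with the maximum count: all ties survive.
all_ties = conn.execute("""
    select col1 from ns_231
    group by col1
    having count(*) = (select max(total) from
        (select count(*) as total from ns_231 group by col1))
    order by col1
""").fetchall()

print(top_only_one, all_ties)
```

Which of the tied values the first query keeps is not defined, so use the HAVING form whenever ties matter.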
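I think you need to qualify manager_id, because it occurs in both tables. Also remove the trailing comma after the INTO clause, and note that Oracle has no LIMIT clause:
SELECT COUNT(d.manager_id) INTO v_manager_id
FROM departments d
RIGHT JOIN employees e
ON d.manager_id = e.manager_id
GROUP BY d.manager_id
ORDER BY COUNT(d.manager_id) DESC
FETCH FIRST 1 ROW ONLY; -- Oracle 12c and later; use rownum on older versions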

why i'm not getting error? how does database understand relevant column in nested subquery?

Here is the scenario:
I have two tables, department and employee. When I select a column that doesn't exist in a table, it throws an error as expected. However, when I select the same column from the same table inside a subquery, it works. I don't understand how the error can be ignored.
create table department
( DEPT_ID NUMBER(2),
DEPT_NAME VARCHAR2(6) );
insert into department values(1,'ch');
create table employee
( EMP_ID NUMBER(2),
EMP_NAME VARCHAR2(6),
EMP_DEPT_ID NUMBER(2)
);
insert into employee values(0,'ch',1);
--getting error for below (ORA-00904: "DEPT_ID": invalid identifier)
select dept_id
from employee;
-- No error for the statement below, and I can see the output. How can it accept dept_id, which is an invalid column for the employee table, in this query?
select *
from department
where dept_id in
(
-- Incorrect column name
select dept_id
from employee
);
I have tried this with 2 RDBMS oracle and MSSQL. Case is the same with both. I didn't check with others
Since you don't qualify the columns, your query
select *
from department
where dept_id in
(
-- Incorrect column name
select dept_id
from employee
);
will be "evaluated" as
select d.*
from department d
where d.dept_id in
(
select d.dept_id
from employee e
);
A sub-query can reference its outer query's columns. Always qualify all columns when more than one table is involved!
What you probably want is
select d.*
from department d
where d.dept_id in
(
select e.EMP_DEPT_ID
from employee e
);
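The name resolution can be reproduced in other engines too. Below is a minimal sketch with Python's sqlite3 module (SQLite rather than Oracle or MSSQL, and with a second, employee-less department added as hypothetical data) that shows all three cases side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (dept_id integer, dept_name text);
    create table employee (emp_id integer, emp_name text, emp_dept_id integer);
    -- hypothetical data: department 2 has no employees
    insert into department values (1, 'ch'), (2, 'hr');
    insert into employee values (0, 'ch', 1);
""")

# On its own, the column really is unknown in employee.
try:
    conn.execute("select dept_id from employee")
    standalone_error = None
except sqlite3.OperationalError as exc:
    standalone_error = str(exc)

# Unqualified: dept_id inside the subquery binds to the outer department row,
# so the condition is true for every department as long as employee is not empty.
unqualified = conn.execute("""
    select dept_id from department
    where dept_id in (select dept_id from employee)
    order by dept_id
""").fetchall()

# Qualified: the subquery now reads the employee column that was intended.
qualified = conn.execute("""
    select d.dept_id from department d
    where d.dept_id in (select e.emp_dept_id from employee e)
    order by d.dept_id
""").fetchall()

print(standalone_error, unqualified, qualified)
```

The unqualified version silently returns both departments, while the qualified one returns only the department that actually has an employee.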

Postgres most common value query

I am trying to figure out how to structure some queries, and I am a bit lost.
Tables:
CREATE TABLE dv_customer(
customer_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
address_id INTEGER,
active BOOLEAN
);
CREATE TABLE dv_address(
address_id INTEGER PRIMARY KEY,
address VARCHAR(50),
address2 VARCHAR(50),
district VARCHAR(50),
city_id INTEGER,
postal_code VARCHAR(50),
phone VARCHAR(50)
);
CREATE TYPE MPAA_RATING AS ENUM(
'G',
'PG',
'PG-13',
'R',
'NC-17'
);
CREATE TABLE dv_film(
film_id INTEGER PRIMARY KEY,
title VARCHAR(50),
description TEXT,
length SMALLINT,
rating MPAA_RATING,
release_year SMALLINT
);
CREATE TABLE cb_customers(
last_name VARCHAR(50),
first_name VARCHAR(50),
PRIMARY KEY (last_name, first_name)
);
CREATE TABLE cb_books(
title VARCHAR(50),
author_id INTEGER,
edition SMALLINT,
publisher VARCHAR(50),
PRIMARY KEY (title, author_id, edition)
);
CREATE TABLE cb_authors(
author_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50)
);
CREATE TABLE mg_customers(
customer_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
address_id INTEGER,
active BOOLEAN
);
I need to figure out the following Queries:
What are the first and last names of all customers who live in the district having the most customers?
So far:
SELECT x.first_name, x.last_name
FROM dv_customer x, dv_address y
WHERE x.address_id = y.address_id
AND (SELECT count(district)
FROM dv_address >= SELECT count(district) FROM dv_address
);
What are the first and last names of the top 10 authors when ranked by the number of books each has written? I want author name and book count, in descending order of book count.
So far:
SELECT x.first_name, x.last_name, count(y.title)
FROM cb_authors x, cb_books y
GROUP BY first_name, last_name
ORDER BY count(*) DESC
LIMIT 10;
I know these are a bit of a mess, but they are the only queries I can't seem to figure out. Any help would be appreciated. I am a Postgres noob and just trying to figure out how it works.
What are the first and last names of the top 10 authors when ranked by the number of books each has written
This kind of query is typically done using a window function:
select first_name, last_name, num_books
from (
SELECT x.first_name, x.last_name,
dense_rank() over (order by count(y.title) desc) as rnk,
count(*) as num_books
FROM cb_authors x
join cb_books y on x.author_id = y.author_id
GROUP BY x.author_id
) t
where rnk <= 10
Your from clause FROM cb_authors x, cb_books y is missing a join condition and thus creates a cartesian join between the two tables. It is a good example of why implicit joins in the where clause are a bad thing. If you get in the habit of using an explicit JOIN operator you will never accidentally miss a join condition.
The above also uses x.author_id which is sufficient for grouping as it is the primary key of the table and all other (non-grouped) columns in the select list are functionally dependent on it.
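The query below will give you the district with the most rows in dv_address, which is the district with the most customers as long as every address belongs to exactly one customer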
select district
from dv_address
group by district
order by count(*) desc
limit 1
Then you can select all customers living in that district with a sub query
select c.* from dv_customer c
join dv_address a on c.address_id = a.address_id
where a.district = (
select district
from dv_address
group by district
order by count(*) desc
limit 1
)
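If several customers can share an address, or some addresses have no customer at all, counting rows in dv_address is not the same as counting customers. A hedged variant is to count over the join instead. The sketch below runs it through Python's sqlite3 module; the cut-down tables and the sample rows are assumptions chosen so that the two counts disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dv_address (address_id integer primary key, district text);
    create table dv_customer (customer_id integer primary key, first_name text,
                              last_name text, address_id integer);
    -- hypothetical data: North has more addresses, South has more customers
    insert into dv_address values (1, 'North'), (2, 'North'), (3, 'North'), (4, 'South');
    insert into dv_customer values
        (1, 'Ann', 'Lee', 1),
        (2, 'Bob', 'Ray', 4),
        (3, 'Cid', 'Fox', 4);
""")

# Counting addresses only picks the district with the most address rows.
by_address = conn.execute("""
    select district from dv_address
    group by district order by count(*) desc limit 1
""").fetchone()[0]

# Counting over the join picks the district with the most customers.
customers = conn.execute("""
    select c.first_name, c.last_name
    from dv_customer c
    join dv_address a on a.address_id = c.address_id
    where a.district = (
        select a2.district
        from dv_customer c2
        join dv_address a2 on a2.address_id = c2.address_id
        group by a2.district
        order by count(*) desc
        limit 1
    )
    order by c.customer_id
""").fetchall()

print(by_address, customers)
```

Note that LIMIT 1 still keeps just one district if two districts tie for the most customers.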
Similarly you can get the top 10 author_id's with this query
select author_id
from cb_books
group by author_id
order by count(*) desc
limit 10
You can then join that result as a derived table to get the author names and the book counts
select a.*, t.cnt from cb_authors a
join (
select author_id, count(*) cnt
from cb_books
group by author_id
order by count(*) desc
limit 10
) t on t.author_id = a.author_id
order by t.cnt desc

SELECT rows from a table that don't have related entries in a second table

It's my first day with SQL, using PostgreSQL 9.4, and I'm lost with some things. I think I'm close, but not close enough:
Table definition:
CREATE TABLE DOCTOR (
Doc_Number INTEGER,
Name VARCHAR(50) NOT NULL,
Specialty VARCHAR(50) NOT NULL,
Address VARCHAR(50) NOT NULL,
City VARCHAR(30) NOT NULL,
Phone VARCHAR(10) NOT NULL,
Salary DECIMAL(8,2) NOT NULL,
DNI VARCHAR(10) UNIQUE,
CONSTRAINT pk_Doctor PRIMARY KEY (Doc_Number)
);
CREATE TABLE VISIT (
Doc_Number INTEGER,
Pat_Number INTEGER,
Visit_Date DATE,
Price DECIMAL(7,2),
CONSTRAINT Visit_pk PRIMARY KEY (Doc_Number, Pat_Number, Visit_Date),
CONSTRAINT Visit_Doctor_fk FOREIGN KEY (Doc_Number) REFERENCES DOCTOR(Doc_Number),
CONSTRAINT Visit_PATIENT_fk FOREIGN KEY (Pat_Number) REFERENCES PATIENT(Pat_Number)
);
I need to know how to combine these two queries into one:
SELECT d.City, d.Name
FROM DOCTOR d, VISIT v
WHERE d.Specialty = 'family and comunity'
ORDER BY d.Name;
SELECT * FROM VISIT
WHERE DATE (Visit_Date)<'01/01/2012'
OR DATE(Visit_Date)>'31/12/2013';
I tried something like this but it doesn't work. I need the doctors of that specialty that didn't do any visit in 2012 and 2013.
SELECT City, Name
FROM DOCTOR d
WHERE d.Specialty = 'family and comunity'
AND NOT IN(SELECT *
FROM VISIT
WHERE Visit_Date BETWEEN '2012-01-01' and '2013-12-31')
ORDER BY d.Name;
Can anyone help?
An alternative to the LEFT JOIN ... WHERE ... IS NULL construct is the plain WHERE NOT EXISTS(...) anti-join. [It is completely equivalent to Erwin's query.]
SELECT d.name, d.city
FROM doctor d
WHERE d.specialty = 'family and community'
AND NOT EXISTS (
SELECT 13
FROM visit v WHERE v.doc_number = d.doc_number
AND v.visit_date BETWEEN '2012-01-01' AND '2013-12-31'
)
ORDER BY d.name;
SELECT d.name, d.city
FROM doctor d
LEFT JOIN visit v ON v.doc_number = d.doc_number
AND v.visit_date BETWEEN '2012-01-01' AND '2013-12-31'
WHERE d.specialty = 'family and community' -- or 'family and comunity'?
AND v.doc_number IS NULL
ORDER BY d.name;
As commented, you need a join condition. How are visits connected to doctors? Typically, you would have a visit.doctor_id referencing doctor.doctor_id.
Using LEFT JOIN / IS NULL to rule out doctors who have visits in said period. This is one of several possible techniques (see "Select rows which are not present in other table").
Dates must be greater than the lower bound AND smaller than the upper bound. OR would be wrong here.
It's better to use the ISO 8601 date format, which is unambiguous regardless of your locale (see "field value between date range").
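To see that the NOT EXISTS and LEFT JOIN / IS NULL forms return the same rows, here is a small sketch using Python's sqlite3 module rather than Postgres. The reduced tables, the sample doctors and visits, and the dates stored as ISO 8601 text are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table doctor (doc_number integer primary key, name text, city text, specialty text);
    create table visit (doc_number integer, pat_number integer, visit_date text,
                        primary key (doc_number, pat_number, visit_date));
    -- hypothetical data: only Adams has a visit inside 2012-2013
    insert into doctor values
        (1, 'Adams', 'Leeds', 'family and community'),
        (2, 'Baker', 'York',  'family and community'),
        (3, 'Clark', 'Hull',  'cardiology');
    insert into visit values
        (1, 10, '2012-05-01'),
        (2, 11, '2011-12-31'),
        (2, 12, '2014-01-01');
""")

not_exists = conn.execute("""
    select d.name, d.city from doctor d
    where d.specialty = 'family and community'
      and not exists (
          select 1 from visit v
          where v.doc_number = d.doc_number
            and v.visit_date between '2012-01-01' and '2013-12-31')
    order by d.name
""").fetchall()

left_join = conn.execute("""
    select d.name, d.city from doctor d
    left join visit v on v.doc_number = d.doc_number
        and v.visit_date between '2012-01-01' and '2013-12-31'
    where d.specialty = 'family and community'
      and v.doc_number is null
    order by d.name
""").fetchall()

print(not_exists, left_join)
```

Baker has visits, but only outside the period, so Baker is the one doctor both queries keep.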
You were almost there... Instead of
SELECT City, Name
FROM DOCTOR d
WHERE d.Specialty = 'family and comunity'
AND NOT IN(SELECT *
FROM VISIT
WHERE Visit_Date BETWEEN '2012-01-01' and '2013-12-31')
ORDER BY d.Name;
try
SELECT City, Name
FROM DOCTOR d
WHERE d.Specialty = 'family and comunity'
AND doc_number NOT IN(SELECT doc_number -- or SELECT DISTINCT doc_number, to get fewer rows from the subquery
FROM VISIT
WHERE Visit_Date BETWEEN '2012-01-01' and '2013-12-31')
ORDER BY d.Name;
Just in case: unquoted table/column names are case-insensitive in Postgres (they are folded to lower case). If you want them to be case-sensitive, you have to write them in double quotes.
I finally found the solution. It is quite similar to your solutions; I'm posting it here to help other people with similar problems
SELECT City, Name
FROM DOCTOR d, VISIT v
WHERE d.Specialty = 'family and comunity'
AND not exists(SELECT *
FROM visit v WHERE v.doc_number = d.doc_number
AND v.visit_date BETWEEN '2012-01-01' AND '2013-12-31')
GROUP BY name, city
ORDER BY d.Name;
Thank you all for your help!