Postgres most common value query - sql

I am trying to figure out how to structure some queries, and I am a bit lost.
Tables:
CREATE TABLE dv_customer(
customer_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
address_id INTEGER,
active BOOLEAN
);
CREATE TABLE dv_address(
address_id INTEGER PRIMARY KEY,
address VARCHAR(50),
address2 VARCHAR(50),
district VARCHAR(50),
city_id INTEGER,
postal_code VARCHAR(50),
phone VARCHAR(50)
);
CREATE TYPE MPAA_RATING AS ENUM(
'G',
'PG',
'PG-13',
'R',
'NC-17'
);
CREATE TABLE dv_film(
film_id INTEGER PRIMARY KEY,
title VARCHAR(50),
description TEXT,
length SMALLINT,
rating MPAA_RATING,
release_year SMALLINT
);
CREATE TABLE cb_customers(
last_name VARCHAR(50),
first_name VARCHAR(50),
PRIMARY KEY (last_name, first_name)
);
CREATE TABLE cb_books(
title VARCHAR(50),
author_id INTEGER,
edition SMALLINT,
publisher VARCHAR(50),
PRIMARY KEY (title, author_id, edition)
);
CREATE TABLE cb_authors(
author_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50)
);
CREATE TABLE mg_customers(
customer_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
address_id INTEGER,
active BOOLEAN
);
I need to figure out the following Queries:
What are the first and last names of all customers who live in the district having the most customers?
So far:
SELECT x.first_name, x.last_name
FROM dv_customer x, dv_address y
WHERE x.address_id = y.address_id
AND (SELECT count(district)
FROM dv_address >= SELECT count(district) FROM dv_address
);
What are the first and last names of the top 10 authors when ranked by the number of books each has written? I want author name and book count, in descending order of book count.
So far:
SELECT x.first_name, x.last_name, count(y.title)
FROM cb_authors x, cb_books y
GROUP BY first_name, last_name
ORDER BY count(*) DESC
LIMIT 10;
I know these are a bit of a mess, but they are the only queries I can't seem to figure out. Any help would be appreciated. I am a Postgres noob and just trying to figure out how it works.

What are the first and last names of the top 10 authors when ranked by the number of books each has written
This kind of query is typically done using a window function:
select first_name, last_name, num_books
from (
SELECT x.first_name, x.last_name,
dense_rank() over (order by count(y.title) desc) as rnk,
count(*) as num_books
FROM cb_authors x
join cb_books y on x.author_id = y.author_id
GROUP BY x.author_id
) t
where rnk <= 10
order by num_books desc
Your from clause FROM cb_authors x, cb_books y is missing a join condition and thus creates a cartesian join between the two tables. It is a good example of why implicit joins in the where clause are a bad thing. If you get in the habit of using an explicit JOIN operator you will never accidentally miss a join condition.
The above also groups by x.author_id alone, which is sufficient because it is the primary key of the table and all other (non-grouped) columns in the select list are functionally dependent on it. Note that dense_rank() keeps ties, so the result can contain more than 10 authors.
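To see the effect of the missing join condition, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres. The two authors and three books are made-up sample rows, not data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cb_authors(author_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE cb_books(title TEXT, author_id INTEGER, edition INTEGER, publisher TEXT,
                      PRIMARY KEY (title, author_id, edition));
-- hypothetical sample data: Lee wrote two books, Ray wrote one
INSERT INTO cb_authors VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO cb_books VALUES ('T1', 1, 1, 'P'), ('T2', 1, 1, 'P'), ('T3', 2, 1, 'P');
""")

# Implicit join without a condition: every author is paired with every book.
cartesian = conn.execute("""
    SELECT x.last_name, count(y.title)
    FROM cb_authors x, cb_books y
    GROUP BY x.author_id
    ORDER BY x.author_id
""").fetchall()

# Explicit join: each author is paired only with their own books.
joined = conn.execute("""
    SELECT x.last_name, count(y.title)
    FROM cb_authors x
    JOIN cb_books y ON x.author_id = y.author_id
    GROUP BY x.author_id
    ORDER BY x.author_id
""").fetchall()

print(cartesian)  # [('Lee', 3), ('Ray', 3)]
print(joined)     # [('Lee', 2), ('Ray', 1)]
```

With the cartesian join every author gets the total number of books, which is why the original query cannot rank authors.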

The query below will give you the district with the most customers
select district
from dv_address
group by district
order by count(*) desc
limit 1
Then you can select all customers living in that district with a sub query
select c.* from dv_customer c
join dv_address a on c.address_id = a.address_id
where a.district = (
select district
from dv_address
group by district
order by count(*) desc
limit 1
)
Similarly you can get the top 10 author_id's with this query
select author_id
from cb_books
group by author_id
order by count(*) desc
limit 10
Then join that query as a derived table to get the author names and book counts
select a.*, t.cnt from cb_authors a
join (
select author_id, count(*) cnt
from cb_books
group by author_id
order by count(*) desc
limit 10
) t on t.author_id = a.author_id
order by t.cnt desc
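The district query above counts rows in dv_address and keeps exactly one district because of limit 1. If several customers can share an address, or two districts can tie, a variant that counts customers and keeps every tied district may be closer to the question. This is a sketch under those assumptions, run through Python's sqlite3 with made-up sample rows; the alias per_district is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dv_address(address_id INTEGER PRIMARY KEY, district TEXT);
CREATE TABLE dv_customer(customer_id INTEGER PRIMARY KEY, first_name TEXT,
                         last_name TEXT, address_id INTEGER);
-- hypothetical sample data: North and South tie with two customers each
INSERT INTO dv_address VALUES (1,'North'),(2,'North'),(3,'South'),(4,'South'),(5,'East');
INSERT INTO dv_customer VALUES (1,'A','One',1),(2,'B','Two',2),(3,'C','Three',3),
                               (4,'D','Four',4),(5,'E','Five',5);
""")

rows = conn.execute("""
    SELECT c.first_name, c.last_name
    FROM dv_customer c
    JOIN dv_address a ON c.address_id = a.address_id
    WHERE a.district IN (
        -- every district whose customer count equals the highest count
        SELECT a2.district
        FROM dv_customer c2
        JOIN dv_address a2 ON c2.address_id = a2.address_id
        GROUP BY a2.district
        HAVING count(*) = (
            SELECT max(cnt)
            FROM (
                SELECT count(*) AS cnt
                FROM dv_customer c3
                JOIN dv_address a3 ON c3.address_id = a3.address_id
                GROUP BY a3.district
            ) AS per_district
        )
    )
    ORDER BY c.customer_id
""").fetchall()

print(rows)  # [('A', 'One'), ('B', 'Two'), ('C', 'Three'), ('D', 'Four')]
```

The limit 1 version would return only the customers of one of the two tied districts, and which one is not defined.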

Related

How to return ties from aggregation functions in SQLite

I have a homework question which requires me to return the name of the oldest child of a person, and in case of ties return all the ties.
The schema for the database is:
create table persons (
fname char(12),
lname char(12),
bdate date,
bplace char(20),
address char(30),
phone char(12),
primary key (fname, lname)
);
create table births (
regno int,
fname char(12),
lname char(12),
regdate date,
regplace char(20),
gender char(1),
f_fname char(12),
f_lname char(12),
m_fname char(12),
m_lname char(12),
primary key (regno),
foreign key (fname,lname) references persons,
foreign key (f_fname,f_lname) references persons,
foreign key (m_fname,m_lname) references persons
);
My current query is
SELECT fname, lname, min(bdate)
from (SELECT *
from persons
JOIN births using (fname, lname)
WHERE f_fname='Michael' and f_lname='Fox');
Where Michael Fox is the person in question. The expected output is
Q4|MFOld
Q4|MFOld2
however I am only able to get the first oldest child. I tried using a WITH statement, but we are not allowed to use views or temporary tables to answer this question. I also looked into using RANK(), but to my knowledge that was introduced in SQLite v3.25, and this question will be tested using v3.11. Any insight as to how the ties can be returned?
You can use RANK():
SELECT fname, lname, bdate
FROM (
SELECT
b.fname,
b.lname,
p.bdate,
RANK() OVER(ORDER BY bdate) rnk
from persons p
JOIN births b using (fname, lname)
WHERE b.f_fname='Michael' and b.f_lname='Fox'
) x
WHERE rnk = 1;
If you ever need to remove the where clause in order to get the oldest child(ren) for each person, then you would need to add a PARTITION BY clause:
SELECT fname, lname, bdate
FROM (
SELECT
b.fname,
b.lname,
p.bdate,
RANK() OVER(PARTITION BY b.f_fname, b.f_lname ORDER BY bdate) rnk
from persons p
JOIN births b using (fname, lname)
) x
WHERE rnk = 1;
In SQLite < 3.25, where window functions such as RANK() are not available, you can use a common table expression (available since version 3.8.3) to pull out all children of Michael Fox and use NOT EXISTS to filter on the oldest one(s):
WITH cte AS (
SELECT b.fname, b.lname, p.bdate
FROM persons p
JOIN births b using (fname, lname)
WHERE b.f_fname = 'Michael' AND b.f_lname = 'Fox'
)
SELECT *
FROM cte c
WHERE NOT EXISTS (
SELECT 1 FROM cte c1 WHERE c1.bdate < c.bdate
)
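If WITH is also ruled out, the same idea can be written with a scalar subquery that computes the earliest birth date and is compared against each child. A minimal sketch using Python's sqlite3 follows; the tables are trimmed to the columns the query needs, and the birth dates are made-up sample values (only the names MFOld and MFOld2 come from the expected output in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table persons (fname char(12), lname char(12), bdate date,
                      primary key (fname, lname));
create table births (regno int primary key, fname char(12), lname char(12),
                     f_fname char(12), f_lname char(12));
-- hypothetical sample data: two children share the earliest birth date
insert into persons values
  ('Q4', 'MFOld',   '1990-01-01'),
  ('Q4', 'MFOld2',  '1990-01-01'),
  ('Q4', 'MFYoung', '1995-06-01'),
  ('Michael', 'Fox', '1960-01-01');
insert into births values
  (1, 'Q4', 'MFOld',   'Michael', 'Fox'),
  (2, 'Q4', 'MFOld2',  'Michael', 'Fox'),
  (3, 'Q4', 'MFYoung', 'Michael', 'Fox');
""")

# No WITH and no window functions: keep the children whose bdate
# equals the minimum bdate among Michael Fox's children.
oldest = conn.execute("""
    SELECT p.fname, p.lname
    FROM persons p
    JOIN births b ON b.fname = p.fname AND b.lname = p.lname
    WHERE b.f_fname = 'Michael' AND b.f_lname = 'Fox'
      AND p.bdate = (
          SELECT min(p2.bdate)
          FROM persons p2
          JOIN births b2 ON b2.fname = p2.fname AND b2.lname = p2.lname
          WHERE b2.f_fname = 'Michael' AND b2.f_lname = 'Fox'
      )
    ORDER BY p.lname
""").fetchall()

print(oldest)  # [('Q4', 'MFOld'), ('Q4', 'MFOld2')]
```

Because the comparison is an equality against the minimum, every child tied for the earliest date is returned.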
Why do you need min? Without a GROUP BY, min(bdate) collapses the result to a single row, which is why only one child comes back. Select the plain bdate instead:
SELECT fname, lname, bdate,
row_number() over (partition by bdate
order by fname, lname)
from (SELECT *
from persons
JOIN births using (fname, lname)
WHERE f_fname='Michael' and
f_lname='Fox');

For each activity, show the volunteer highest number of points

CREATE DATABASE E_volunteerSy;
USE E_volunteerSy;
CREATE TABLE Participate_In
( V_ID INT(4) PRIMARY KEY,
Ename CHAR(20),
POINTS INT(255),
AName CHAR(20)
);
INSERT INTO Participate_In
values
(1001,'Name1',10,'A'),
(1002,'Name2',3,'A'),
(1003,'Name3',11,'B'),
(1004,'Name4',3,'B'),
(1005,'Name5',4,'B');
How can I write a query that returns the highest points for AName A and AName B?
AName means activity name.
You can use the following using MAX and GROUP BY:
SELECT AName, MAX(POINTS) AS Points FROM Participate_In GROUP BY AName
demo: http://www.sqlfiddle.com/#!9/7b8ea6/1/0
This query will display the highest points for all the activities
SELECT Aname, MAX(points)
FROM Participate_In
GROUP BY Aname
This will select the top record by points for each AName value (it needs window function support, which MySQL has from version 8.0)
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY AName ORDER BY POINTS DESC) AS RowNum
FROM Participate_In
) Sub
WHERE RowNum = 1
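ROW_NUMBER() keeps exactly one row per activity, and MAX with GROUP BY returns the points without the volunteer. If the volunteer name is wanted and ties should all be shown, one option is to join the table back to the per-activity maximum. A sketch using Python's sqlite3 as a stand-in for MySQL; row 1006 is an extra, made-up row added only to create a tie, and max_points is an illustrative alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Participate_In (V_ID INT PRIMARY KEY, Ename CHAR(20),
                             POINTS INT, AName CHAR(20));
INSERT INTO Participate_In VALUES
  (1001,'Name1',10,'A'),
  (1002,'Name2',3,'A'),
  (1003,'Name3',11,'B'),
  (1004,'Name4',3,'B'),
  (1005,'Name5',4,'B'),
  (1006,'Name6',10,'A');
""")

# Join each row to the maximum of its activity; tied volunteers all match.
top = conn.execute("""
    SELECT p.AName, p.Ename, p.POINTS
    FROM Participate_In p
    JOIN (
        SELECT AName, MAX(POINTS) AS max_points
        FROM Participate_In
        GROUP BY AName
    ) m ON m.AName = p.AName AND m.max_points = p.POINTS
    ORDER BY p.AName, p.V_ID
""").fetchall()

print(top)  # [('A', 'Name1', 10), ('A', 'Name6', 10), ('B', 'Name3', 11)]
```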

Better idea to select unique data from table

I am learning SQL. I want to select the employees (emp_name, emp_lname, project_name) who have exactly one project (no more, no less).
I have 3 tables in the database:
Tables:
create table employee(
emp_id char (5) primary key,
emp_name nvarchar(15) not null,
emp_lname nvarchar(20)
);
create table project(
pr_id char(5) primary key,
project_name nvarchar(10) not null,
project_budjet int
);
create table employee_project(
emp_id char (5) foreign key references employee(emp_id),
pr_id char(5) foreign key references project(pr_id),
constraint premppk primary key(emp_id, pr_id)
);
I am trying to select only unique emp_id from employee_project.
This code gives me unique emp_id from employee_project
select emp_id, count(pr_id) from employee_project
group by emp_id having count(pr_id) = 1
But I need both emp_id and pr_id to select emp_name, emp_lname and project_name. So I tried to select pr_id as well, using the emp_id values I already have. Code:
select ep.emp_id, ep.pr_id from employee_project as ep,
(
select emp_id, count(pr_id) from employee_project
group by emp_id having count(pr_id) = 1
) CT
where CT.emp_id = ep.emp_id
Now I have everything I need to select the details of these employees and their projects. Final code:
select employee.emp_name, employee.emp_lname, project.project_name
from employee, project,
(
select ep.emp_id, ep.pr_id from employee_project as ep,
(
select emp_id, count(pr_id) from employee_project
group by emp_id having count(pr_id) = 1
) CT
where CT.emp_id = ep.emp_id
) CK
where CK.emp_id = employee.emp_id and CK.pr_id = project.pr_id
Is there an easier way to do this?
Thanks for help.
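Since there is only one project per employee you are looking for, you can use an aggregate function in the group to get the project too. I used min(pr_id) but you could also use max(), since with a single row both return the same value (avg() would not work here because pr_id is a char column).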
After that you can join the tables to get all the other column values.
select e.*, p.*
from
(
select emp_id, min(pr_id) as pr_id
from employee_project
group by emp_id
having count(pr_id) = 1
) ep
join employee e on e.emp_id = ep.emp_id
join project p on p.pr_id = ep.pr_id
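As a quick check of the approach, here is a sketch that runs the query through Python's sqlite3 as a stand-in for SQL Server. The column types are simplified and the three employees and two projects are made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table employee(emp_id char(5) primary key, emp_name text not null, emp_lname text);
create table project(pr_id char(5) primary key, project_name text not null, project_budjet int);
create table employee_project(
    emp_id char(5) references employee(emp_id),
    pr_id char(5) references project(pr_id),
    primary key(emp_id, pr_id)
);
-- hypothetical sample data: E1 has two projects, E2 has one, E3 has none
insert into employee values ('E1','Ann','Lee'), ('E2','Bob','Ray'), ('E3','Cy','Oz');
insert into project values ('P1','Alpha',100), ('P2','Beta',200);
insert into employee_project values ('E1','P1'), ('E1','P2'), ('E2','P2');
""")

rows = conn.execute("""
    select e.emp_name, e.emp_lname, p.project_name
    from (
        -- groups with exactly one row, so min(pr_id) is that row's pr_id
        select emp_id, min(pr_id) as pr_id
        from employee_project
        group by emp_id
        having count(pr_id) = 1
    ) ep
    join employee e on e.emp_id = ep.emp_id
    join project p on p.pr_id = ep.pr_id
""").fetchall()

print(rows)  # [('Bob', 'Ray', 'Beta')]
```

Employees with no project never appear in employee_project, so they are excluded along with those who have two or more.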

create view selecting the max in one column related with another column

I need to create a view in PostgreSQL 9.4 about this table:
CREATE TABLE DOCTOR (
Doc_Number INTEGER,
Name VARCHAR(50) NOT NULL,
Specialty VARCHAR(50) NOT NULL,
Address VARCHAR(50) NOT NULL,
City VARCHAR(30) NOT NULL,
Phone VARCHAR(10) NOT NULL,
Salary DECIMAL(8,2) NOT NULL,
DNI VARCHAR(10) NOT NULL,
CONSTRAINT pk_Doctor PRIMARY KEY (Doc_Number)
);
The view should show the doctor with the highest salary for each specialty. I tried this code but it shows all of the doctors for each specialty:
CREATE VIEW top_specialty_doctors
AS (Select MAX(Salary), name, specialty from DOCTOR
where specialty = 'family and community'
or specialty = 'psychiatry'
or specialty = 'Rheumatology'
group by name, salary, specialty);
How can I make the view show only the doctor with the highest salary for each specialty?
DISTINCT ON is a simple Postgres specific technique to get one winner per group. Details:
Select first row in each GROUP BY group?
CREATE VIEW top_specialty_doctors AS
SELECT DISTINCT ON (specialty)
salary, name, specialty
FROM doctor
WHERE specialty IN ('family and community', 'psychiatry', 'Rheumatology')
ORDER BY specialty, salary DESC, doc_number -- as tiebreaker
And you do not need parentheses around the query for CREATE VIEW.
If multiple docs tie for the highest salary, the one with the smallest doc_number is selected.
If salary can be NULL, use DESC NULLS LAST:
PostgreSQL sort by datetime asc, null first?
For big tables and certain data distributions other query techniques are superior:
Optimize GROUP BY query to retrieve latest record per user
Here's a query that shows the best doctor by salary for each of the specialties:
with specialty_ranks as (
select
Salary, name, specialty,
rank() over (
partition by specialty
order by salary desc
) as rank
from DOCTOR
where specialty in ('family and community', 'psychiatry', 'Rheumatology')
)
select specialty, name, salary
from specialty_ranks
where rank = 1;
The query uses a CTE and the RANK() window function to do the job. You might want to read their docs if you haven't used them before.
Without using Common Table Expressions or analytics, you can use an inline view/virtual table:
Create View top_specialty_doctors as
Select m.MaxSalary, d.Name, d.Specialty
From Doctor d
Join( -- Expose largest salary of each specialty
Select Specialty, Max( Salary) as MaxSalary
From Doctor
Group by Specialty
) as m
on m.Specialty = d.Specialty
and m.MaxSalary = d.Salary
Where d.Specialty in( 'family and community', 'psychiatry', 'Rheumatology' );
Using a CTE instead of the inline view can make the query more readable, and CTEs are really easy to learn. Do not expect better performance from it in PostgreSQL 9.4, though: before version 12 a CTE is an optimization fence there, so the inline view is usually at least as fast.

SQL query to find the average number of employees per company in a database

The goal of this query is to find the average number of employees per company in my database. Here is what I currently have for my query, along with my result:
create view Totals as
((select distinct count(company_name) as TotalCompanies, company_name
from company
group by company_name)
union
(select distinct count(Lastname) as TotalEmps, company_name
from Works
group by company_name));
select avg(TotalCompanies) from Totals;
--RESULT:
AVG(TOTALCOMPANIES)
2.777777777778
While I did get a result (so I do not believe there are any syntax errors), based on the actual data I've inputted into my tables, I do not believe this value is correct.
Is what I'm doing in my view creation even getting me to an appropriate point where I can just call the average function of the TotalCompanies field of that view? My intent was to count all the employees per company name in the view, and then average those values...
For an FYI, I am using SQL for Oracle 11g R2 and here is my initial schema I'm creating queries for:
create table Employee(
Lastname varchar(10),
FirstName varchar(10),
MidInitial char(1),
gender char(1),
street varchar(10),
city varchar(10),
primary key(Lastname, FirstName, MidInitial));
create table company(
company_name varchar(20),
city varchar(10),
primary key(company_name));
create table Works(
Lastname varchar(10),
FirstName varchar(10),
MidInitial char(1),
company_name varchar(20),
salary numeric(8,2),
primary key(Lastname, FirstName, MidInitial, company_name),
foreign key(Lastname, FirstName, MidInitial) references Employee,
foreign key(company_name) references company);
Thanks for the help!
Leaving aside issues about the schema, I believe the following will work:
Select
Avg(employeeCount)
From (
Select
c.company_name,
count(coalesce(w.LastName, w.FirstName, w.MidInitial)) employeeCount
From
company c
left outer join
works w
on c.company_name = w.company_name
Group By
c.company_name
) x
It's slightly tricky to deal with companies with 0 employees, which is why the query uses a left outer join: those companies then count as 0 in the average. The coalesce guards against people with null as one of their names, although Oracle does not allow nulls in primary key columns, so count(w.LastName) would be enough.
Create a view to store the total number of employees working in each company.
create view Totals as
(select count(*) as No_of_employees,company_name from Employee,Works
where Employee.Lastname = Works.Lastname and
Employee.FirstName = Works.FirstName and
Employee.MidInitial = Works.MidInitial
group by Works.company_name);
Now write a query to find the average number of employees working in each company:
select avg(No_of_employees) from Totals;
This should give you the correct answer. I tried this using MySQL. Pardon me for any typos.
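The two answers differ in how they treat companies without employees: the left outer join counts them as 0, while a query that starts from Works never sees them. A small sketch of the difference using Python's sqlite3 as a stand-in for Oracle; the three companies and six employees are made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table company(company_name varchar(20) primary key, city varchar(10));
create table Works(
    Lastname varchar(10), FirstName varchar(10), MidInitial char(1),
    company_name varchar(20), salary numeric(8,2),
    primary key(Lastname, FirstName, MidInitial, company_name)
);
-- hypothetical sample data: Acme has 4 employees, Globex 2, Initech none
insert into company values ('Acme','X'), ('Globex','Y'), ('Initech','Z');
insert into Works values
  ('A','a','x','Acme',1), ('B','b','x','Acme',1),
  ('C','c','x','Acme',1), ('D','d','x','Acme',1),
  ('E','e','x','Globex',1), ('F','f','x','Globex',1);
""")

# Left outer join from company: Initech contributes a count of 0.
with_empty = conn.execute("""
    select avg(employeeCount)
    from (
        select c.company_name, count(w.Lastname) as employeeCount
        from company c
        left outer join Works w on c.company_name = w.company_name
        group by c.company_name
    ) x
""").fetchone()[0]

# Grouping Works alone: Initech is missing from the average.
staffed_only = conn.execute("""
    select avg(employeeCount)
    from (
        select company_name, count(*) as employeeCount
        from Works
        group by company_name
    ) x
""").fetchone()[0]

print(with_empty)    # 2.0  (4 + 2 + 0) / 3
print(staffed_only)  # 3.0  (4 + 2) / 2
```

Which number is the right one depends on whether companies with no employees should count toward the average.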