create view selecting the max in one column related with another column - sql

I need to create a view in PostgreSQL 9.4 on this table:
CREATE TABLE DOCTOR (
Doc_Number INTEGER,
Name VARCHAR(50) NOT NULL,
Specialty VARCHAR(50) NOT NULL,
Address VARCHAR(50) NOT NULL,
City VARCHAR(30) NOT NULL,
Phone VARCHAR(10) NOT NULL,
Salary DECIMAL(8,2) NOT NULL,
DNI VARCHAR(10) NOT NULL,
CONSTRAINT pk_Doctor PRIMARY KEY (Doc_Number)
);
The view will show the ranking of the doctors with the highest salary for each specialty. I tried this code, but it shows all of the doctors for each specialty:
CREATE VIEW top_specialty_doctors
AS (Select MAX(Salary), name, specialty from DOCTOR
where specialty = 'family and community'
or specialty = 'psychiatry'
or specialty = 'Rheumatology'
group by name, salary, specialty);
How can I make the view show only the doctor with the highest salary for each specialty?

DISTINCT ON is a simple Postgres specific technique to get one winner per group. Details:
Select first row in each GROUP BY group?
CREATE VIEW top_specialty_doctors AS
SELECT DISTINCT ON (specialty)
salary, name, specialty
FROM doctor
WHERE specialty IN ('family and community', 'psychiatry', 'Rheumatology')
ORDER BY specialty, salary DESC, doc_number -- as tiebreaker
And you do not need parentheses around the query for CREATE VIEW.
If multiple docs tie for the highest salary, the one with the smallest doc_number is selected.
If salary can be NULL, use DESC NULLS LAST:
PostgreSQL sort by datetime asc, null first?
For big tables and certain data distributions other query techniques are superior:
Optimize GROUP BY query to retrieve latest record per user

Here's a query that shows the best doctor by salary for each of the specialties:
with specialty_ranks as (
select
Salary, name, specialty,
rank() over (
partition by specialty
order by salary desc
) as rank
from DOCTOR
where specialty in ('family and community', 'psychiatry', 'Rheumatology')
)
select specialty, name, salary
from specialty_ranks
where rank = 1;
The query uses a CTE and the RANK() window function to do the job. You might want to read their docs if you haven't used them before.

Without using Common Table Expressions or analytics, you can use an inline view/virtual table:
Create View top_specialty_doctors as
Select m.MaxSalary, d.Name, d.Specialty
From Doctor d
Join( -- Expose largest salary of each specialty
Select Specialty, Max( Salary) as MaxSalary
From Doctor
Group by Specialty
) as m
on m.Specialty = d.Specialty
and m.MaxSalary = d.Salary
Where d.Specialty in( 'family and community', 'psychiatry', 'Rheumatology' ); -- qualified: both d and m expose Specialty
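Using a CTE instead of the inline view can make the query more readable. It is not necessarily faster, though: before PostgreSQL 12 a CTE is always materialized and acts as an optimization fence, so on 9.4 the inline view may perform as well or better. CTEs are really easy to learn.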
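One difference between the approaches is how ties are handled: the join on MAX(salary) keeps every doctor who shares the top salary, whereas DISTINCT ON keeps exactly one row per specialty. The sketch below illustrates the join behaviour. It uses Python's sqlite3 with an in-memory database as a stand-in for PostgreSQL, and the sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctor (
    doc_number INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    specialty  TEXT NOT NULL,
    salary     INTEGER NOT NULL
);
INSERT INTO doctor VALUES
    (1, 'Ana',  'psychiatry',   5000),
    (2, 'Ben',  'psychiatry',   5000),  -- ties with Ana
    (3, 'Cleo', 'psychiatry',   4000),
    (4, 'Dan',  'Rheumatology', 6000),
    (5, 'Eve',  'Rheumatology', 5500);
""")

# Same shape as the inline-view query above: join each doctor to the
# maximum salary of his or her specialty.
top_rows = conn.execute("""
    SELECT d.specialty, d.name, d.salary
    FROM doctor d
    JOIN (SELECT specialty, MAX(salary) AS max_salary
          FROM doctor
          GROUP BY specialty) m
      ON m.specialty = d.specialty
     AND m.max_salary = d.salary
    ORDER BY d.doc_number
""").fetchall()

for row in top_rows:
    print(row)
```

With this data, both psychiatry doctors earning 5000 come back, so the view can contain more than one row per specialty. If exactly one row per specialty is required, a tiebreaker such as doc_number is needed.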

Related

How to count last names in a table without duplicating employee ID

I have an employee table with duplicate instances of employees. For instance the last name Baba may show up 2 times with the same employee ID. I have to count last names from the table, but do not want to count the same one twice.
I am writing SQL in Postgres. Here is the table from which I draw my query:
CREATE TABLE Employee (
emp_no int NOT NULL,
birth_date date NOT NULL,
first_name varchar(100) NOT NULL,
last_name varchar(100) NOT NULL,
gender varchar(100) NOT NULL,
hire_date date NOT NULL,
CONSTRAINT pk_Salaries PRIMARY KEY (
emp_no
)
);
The data was given and contained duplicates. I cannot remove the duplicates but do not want to count them. Here is my query statement:
SELECT Employee.last_name, COUNT(Employee.last_name) AS "Last Name Count"
FROM Employee
GROUP BY Employee.last_name
ORDER BY "Last Name Count" DESC;
The output works well but I am sure it is counting some last names more than once.
I have tried adding a WHERE clause to get a count of last names where the emp_no is distinct, but it does not work.
You want to count last names from the table without counting the same one twice.
So try this:
SELECT Employee.last_name, COUNT(DISTINCT Employee.last_name) AS "Last Name Count"
FROM Employee
GROUP BY Employee.last_name;
The emp_no is a primary key, so it has to be unique and a where clause with distinct would have no impact. The query seems to be accurate, I'd be surprised if it's counting last names more than once.
Just use the DISTINCT keyword when applying the COUNT() aggregation:
SELECT e.last_name, COUNT(distinct e.last_name) AS "Last Name Count"
FROM Employee e
GROUP BY e.last_name
ORDER BY "Last Name Count" DESC;
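It may help to see what each variant of COUNT actually returns. Within a group formed by GROUP BY last_name, every row has the same last name, so COUNT(DISTINCT last_name) is always 1. Counting distinct emp_no values per last name is probably closer to what the question asks for. The sketch below uses Python's sqlite3 and a hypothetical employee_raw table without a primary key, so that duplicate rows can exist at all. The table name and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical staging table: no primary key, so duplicates are possible.
CREATE TABLE employee_raw (emp_no INTEGER, last_name TEXT NOT NULL);
INSERT INTO employee_raw VALUES
    (1, 'Baba'), (1, 'Baba'),  -- the same employee loaded twice
    (2, 'Baba'),
    (3, 'Smith');
""")

def counts(count_expr):
    """Run the GROUP BY query with the given COUNT expression."""
    sql = f"""
        SELECT last_name, {count_expr}
        FROM employee_raw
        GROUP BY last_name
        ORDER BY last_name
    """
    return conn.execute(sql).fetchall()

plain = counts("COUNT(last_name)")                   # counts every row
distinct_name = counts("COUNT(DISTINCT last_name)")  # always 1 per group
distinct_emp = counts("COUNT(DISTINCT emp_no)")      # one per employee

print(plain, distinct_name, distinct_emp)
```

Here the plain count reports 3 for Baba, the DISTINCT last_name count reports 1 for every name, and the DISTINCT emp_no count reports 2, which is the number of different employees called Baba.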
You should check whether each first name is counted only once per last name, with something like this:
SELECT Employee.last_name, COUNT(distinct Employee.first_name) AS "Last Name Count"
FROM Employee
GROUP BY Employee.last_name
ORDER BY "Last Name Count" DESC;
see fiddle
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=f0a9568e6cb5fb5e0247d2f2c5e95114
Or, if necessary, check whether more columns repeat across rows, with something like:
select distinct * from (
SELECT Employee.last_name,
COUNT(*) over (partition by first_name, birth_date, last_name, gender) AS n
FROM Employee
) V
where n > 1
see the fiddle
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=223143f0d603abf30d99ad87fa07781e
Thank you all for your quick responses. They were all very good and helpful!
I ran the following code to find that I was wrong and each individual had only one instance in the table and had only one unique employee ID (emp_no).
SELECT Employee.emp_no, COUNT(Employee.emp_no) AS "Employee ID Count"
FROM Employee
GROUP BY Employee.emp_no
ORDER BY "Employee ID Count" ASC;
Again, thank you all very much!

Postgres most common value query

I am trying to figure out how to structure some queries, and I am a bit lost.
Tables:
CREATE TABLE dv_customer(
customer_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
address_id INTEGER,
active BOOLEAN
);
CREATE TABLE dv_address(
address_id INTEGER PRIMARY KEY,
address VARCHAR(50),
address2 VARCHAR(50),
district VARCHAR(50),
city_id INTEGER,
postal_code VARCHAR(50),
phone VARCHAR(50)
);
CREATE TYPE MPAA_RATING AS ENUM(
'G',
'PG',
'PG-13',
'R',
'NC-17'
);
CREATE TABLE dv_film(
film_id INTEGER PRIMARY KEY,
title VARCHAR(50),
description TEXT,
length SMALLINT,
rating MPAA_RATING,
release_year SMALLINT
);
CREATE TABLE cb_customers(
last_name VARCHAR(50),
first_name VARCHAR(50),
PRIMARY KEY (last_name, first_name)
);
CREATE TABLE cb_books(
title VARCHAR(50),
author_id INTEGER,
edition SMALLINT,
publisher VARCHAR(50),
PRIMARY KEY (title, author_id, edition)
);
CREATE TABLE cb_authors(
author_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50)
);
CREATE TABLE mg_customers(
customer_id INTEGER PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(50),
address_id INTEGER,
active BOOLEAN
);
I need to figure out the following Queries:
What are the first and last names of all customers who live in the district having the most customers?
So far:
SELECT x.first_name, x.last_name
FROM dv_customer x, dv_address y
WHERE x.address_id = y.address_id
AND (SELECT count(district)
FROM dv_address >= SELECT count(district) FROM dv_address
);
What are the first and last names of the top 10 authors when ranked by the number of books each has written? I want author name and book count, in descending order of book count.
So far:
SELECT x.first_name, x.last_name, count(y.title)
FROM cb_authors x, cb_books y
GROUP BY first_name, last_name
ORDER BY count(*) DESC
LIMIT 10;
I know these are a bit of a mess, but they are the only queries I can't seem to figure out. Any help would be appreciated. I am a Postgres noob and just trying to figure out how it works.
What are the first and last names of the top 10 authors when ranked by the number of books each has written
This kind of query is typically done using a window function:
select first_name, last_name, num_books
from (
SELECT x.first_name, x.last_name,
dense_rank() over (order by count(y.title) desc) as rnk,
count(*) as num_books
FROM cb_authors x
join cb_books y on x.author_id = y.author_id
GROUP BY x.author_id
) t
where rnk <= 10
Your FROM clause, FROM cb_authors x, cb_books y, is missing a join condition and thus creates a Cartesian join between the two tables. It is a good example of why implicit joins in the WHERE clause are a bad thing. If you get in the habit of using an explicit JOIN operator, you will never accidentally miss a join condition.
The above also groups by x.author_id alone, which is sufficient because it is the primary key of the table, so all other (non-grouped) columns in the select list are functionally dependent on it.
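The effect of the missing join condition is easy to reproduce. The sketch below uses Python's sqlite3 with trimmed-down versions of the two tables and invented rows. Without a join condition every author is paired with every book, so each author gets the total number of books as the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cb_authors (
    author_id  INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT
);
CREATE TABLE cb_books (
    title     TEXT,
    author_id INTEGER,
    edition   INTEGER,
    PRIMARY KEY (title, author_id, edition)
);
-- Invented sample data: Ames wrote two books, Byrd wrote one.
INSERT INTO cb_authors VALUES (1, 'Ada', 'Ames'), (2, 'Bo', 'Byrd');
INSERT INTO cb_books VALUES ('T1', 1, 1), ('T2', 1, 1), ('T3', 2, 1);
""")

# Implicit join with no condition: 2 authors x 3 books = 6 rows.
cross = conn.execute("""
    SELECT x.last_name, COUNT(y.title)
    FROM cb_authors x, cb_books y
    GROUP BY x.author_id
    ORDER BY x.author_id
""").fetchall()

# Explicit JOIN with the condition spelled out.
joined = conn.execute("""
    SELECT x.last_name, COUNT(y.title)
    FROM cb_authors x
    JOIN cb_books y ON x.author_id = y.author_id
    GROUP BY x.author_id
    ORDER BY x.author_id
""").fetchall()

print(cross)
print(joined)
```

The first query reports 3 books for both authors, while the second reports 2 and 1.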
The query below will give you the district with the most customers
select district
from dv_address
group by district
order by count(*) desc
limit 1
Then you can select all customers living in that district with a sub query
select c.* from dv_customer c
join dv_address a on c.address_id = a.address_id
where a.district = (
select district
from dv_address
group by district
order by count(*) desc
limit 1
)
Similarly you can get the top 10 author_id's with this query
select author_id
from cb_books
group by author_id
order by count(*) desc
limit 10
Similarly, with a derived table:
select a.*, t.cnt from cb_authors a
join (
select author_id, count(*) cnt
from cb_books
group by author_id
order by count(*) desc
limit 10
) t on t.author_id = a.author_id
order by t.cnt desc

Calculate salary in exist

I have a table employee and a table family. Let's say that employee has a name column and a salary column.
Then I have to calculate their salary: add 3% to the salary of employees who have a family and 2% to all who don't.
Do you have any idea how to do this? I know that I have to use EXISTS, but I don't know how to calculate the salary.
This is table employee:
CREATE TABLE employee (
employeeID int,
Name varchar(10),
PhoneNumber varchar(20),
ICNumber varchar(15),
Salary decimal(5,2),
primary key(employeeId));
This is table family:
CREATE TABLE family (
familyId int,
name varchar(20),
family varchar(20),
address varchar(25),
phoneNumber varchar(20),
employeeID int,
primary key (employeeID),
FOREIGN KEY (employeeID) REFERENCES Employee(employeeID))
I'm assuming you wanted to select the +3% or +2% value rather than update?
SELECT
employeeid,
salary,
salary * loading_factor AS loading
FROM (
SELECT
employeeid,
salary,
CASE
WHEN EXISTS (SELECT 1 FROM family f WHERE f.employeeID = e.employeeID) THEN 0.03
ELSE 0.02
END AS loading_factor
FROM employee e
) t -- most databases require an alias for a derived table
You can do it without the sub-select, but I thought it would read easier. If you want the total new salary amount rather than the increment, change the 0.02 / 0.03 to 1.02 / 1.03.
The SQL EXISTS condition is used in combination with a subquery and is considered to be met, if the subquery returns at least one row. It can be used in a SELECT, INSERT, UPDATE, or DELETE statement.
http://www.techonthenet.com/sql/exists.php
So let's see how we can use that to determine whether to update a particular salary by 3% or 2%.
For employees with a family:
UPDATE Employee
SET Salary = 1.03 * Salary
WHERE EXISTS (
SELECT 1 FROM Family WHERE Employee.employeeId = Family.employeeId
)
I'll leave the other case as an exercise to the reader (hint).
UPDATE
If you just want to select what the new salary should be
SELECT employeeId, 1.03 * Salary As NewSalary FROM Employee
WHERE EXISTS (
SELECT 1 FROM Family WHERE Employee.employeeId = Family.employeeId
)
Personally, I try not to use exists or not exists clauses when possible. In your case, you can get your desired result with a left outer join:
select
emp.name,
emp.salary,
case when fam.employeeID is null
then 0.02
else 0.03
end * emp.salary salary_adjusted
from
employee emp left join
family fam on emp.employeeID = fam.employeeID;
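To check that the EXISTS form and the LEFT JOIN form agree, the sketch below runs both against a tiny data set using Python's sqlite3. The tables are cut down to the columns that matter and the rows are invented. The two forms agree here because family has at most one row per employee (employeeID is its primary key). With several family rows per employee, the LEFT JOIN would return duplicates and EXISTS would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employeeID INTEGER PRIMARY KEY,
    name       TEXT,
    salary     REAL
);
CREATE TABLE family (
    familyId   INTEGER,
    employeeID INTEGER PRIMARY KEY REFERENCES employee(employeeID)
);
-- Invented rows: Ali has a family entry, Bea does not.
INSERT INTO employee VALUES (1, 'Ali', 500.0), (2, 'Bea', 800.0);
INSERT INTO family VALUES (10, 1);
""")

with_exists = conn.execute("""
    SELECT e.name,
           CASE WHEN EXISTS (SELECT 1 FROM family f
                             WHERE f.employeeID = e.employeeID)
                THEN 0.03 ELSE 0.02 END AS rate
    FROM employee e
    ORDER BY e.employeeID
""").fetchall()

with_join = conn.execute("""
    SELECT e.name,
           CASE WHEN f.employeeID IS NULL THEN 0.02 ELSE 0.03 END AS rate
    FROM employee e
    LEFT JOIN family f ON f.employeeID = e.employeeID
    ORDER BY e.employeeID
""").fetchall()

print(with_exists)
print(with_join)
```

Both queries assign the 3% rate to Ali and the 2% rate to Bea. Multiplying the rate by salary then gives the increment, as in the answers above.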

SQL query to find the average number of employees per company in a database

The goal of this query is to find the average number of employees per company in my database. Here is what I currently have for my query, along with my result:
create view Totals as
((select distinct count(company_name) as TotalCompanies, company_name
from company
group by company_name)
union
(select distinct count(Lastname) as TotalEmps, company_name
from Works
group by company_name));
select avg(TotalCompanies) from Totals;
--RESULT:
AVG(TOTALCOMPANIES)
2.777777777778
While I did get a result (so I do not believe there are any syntax errors), based on the actual data I've inputted into my tables, I do not believe this value is correct.
Is what I'm doing in my view creation even getting me to an appropriate point where I can just call the average function of the TotalCompanies field of that view? My intent was to count all the employees per company name in the view, and then average those values...
For an FYI, I am using SQL for Oracle 11g R2 and here is my initial schema I'm creating queries for:
create table Employee(
Lastname varchar(10),
FirstName varchar(10),
MidInitial char(1),
gender char(1),
street varchar(10),
city varchar(10),
primary key(Lastname, FirstName, MidInitial));
create table company(
company_name varchar(20),
city varchar(10),
primary key(company_name));
create table Works(
Lastname varchar(10),
FirstName varchar(10),
MidInitial char(1),
company_name varchar(20),
salary numeric(8,2),
primary key(Lastname, FirstName, MidInitial, company_name),
foreign key(Lastname, FirstName, MidInitial) references Employee,
foreign key(company_name) references company);
Thanks for the help!
Leaving aside issues about the schema, I believe the following will work:
Select
Avg(employeeCount)
From (
Select
c.company_name,
count(coalesce(w.LastName, w.FirstName, w.MidInitial)) employeeCount
From
company c
left outer join
works w
on c.company_name = w.company_name
Group By
c.company_name
) x
It's slightly tricky to deal with companies with 0 employees, and people with null as one of their names (I'm not sure if Oracle actually allows this in primary key columns, but the definition seems to imply it)
Create a view to store the total number of employees working in each company.
create view Totals as
(select count(*) as No_of_employees,company_name from Employee,Works
where Employee.Lastname = Works.Lastname and
Employee.FirstName = Works.FirstName and
Employee.MidInitial = Works.MidInitial
group by Works.company_name);
Now write a query to find the average number of employees working in each company:
select avg(No_of_employees) from Totals;
This should give you the correct answer. I tried this using MySQL. Pardon me for any typos.
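The two answers can give different averages, because the view built from Works alone never sees a company that has no employees, while the LEFT OUTER JOIN from company counts it as zero. The sketch below shows the difference using Python's sqlite3 as a stand-in for Oracle, with a simplified schema and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (company_name TEXT PRIMARY KEY);
CREATE TABLE works (
    lastname     TEXT,
    company_name TEXT REFERENCES company(company_name)
);
-- Invented rows: Acme has 2 employees, Bolt has 1, Core has none.
INSERT INTO company VALUES ('Acme'), ('Bolt'), ('Core');
INSERT INTO works VALUES ('Lee', 'Acme'), ('Kim', 'Acme'), ('Roy', 'Bolt');
""")

# LEFT JOIN from company: companies without employees count as 0.
avg_all = conn.execute("""
    SELECT AVG(employee_count)
    FROM (SELECT c.company_name, COUNT(w.lastname) AS employee_count
          FROM company c
          LEFT JOIN works w ON w.company_name = c.company_name
          GROUP BY c.company_name) AS x
""").fetchone()[0]

# Grouping works alone: companies without employees are missing.
avg_staffed = conn.execute("""
    SELECT AVG(employee_count)
    FROM (SELECT company_name, COUNT(*) AS employee_count
          FROM works
          GROUP BY company_name) AS x
""").fetchone()[0]

print(avg_all, avg_staffed)
```

With this data the first average is 1.0 (3 employees over 3 companies) and the second is 1.5 (3 employees over the 2 companies that have any). Which one is right depends on whether empty companies should count.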

sql - find average salary for each department with more than five members

Not quite sure how to get this one. I have a staff table and I need to find the average salary. I know I can use avg(). But the trick is I need to find the average for departments that have more than 5 staff members. I'm not sure if I should use group by or how to use it. Thanks!
CREATE TABLE STAFF (STAFF_ID CHAR(3),
STAFF_NAME CHAR(20),
GENDER CHAR(6),
DEPARTMENT CHAR(20),
BOSS_ID CHAR(3),
SALARY NUMBER(8,2));
select DEPARTMENT,count(STAFF_ID) as CountStaff, avg(SALARY) as AVGSalary
from STAFF
group by DEPARTMENT
having count(STAFF_ID) > 5
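The key point is that HAVING filters whole groups after aggregation, whereas WHERE filters individual rows before it. A minimal sketch of the query above, run through Python's sqlite3 on a reduced staff table with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_id TEXT, department TEXT, salary REAL)"
)

# Invented data: six people in Sales, two in IT.
rows = [(f"S{i}", "Sales", 1000 + 100 * i) for i in range(6)]
rows += [("I0", "IT", 3000), ("I1", "IT", 5000)]
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)

# Only departments with more than five staff members survive HAVING.
result = conn.execute("""
    SELECT department, COUNT(staff_id) AS count_staff, AVG(salary) AS avg_salary
    FROM staff
    GROUP BY department
    HAVING COUNT(staff_id) > 5
""").fetchall()

print(result)
```

Only Sales is returned, with 6 staff members and an average salary of 1250.0. IT is dropped because it has just two members.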