how to remove duplicate results - sql

given the following schema:
CREATE TABLE IF NOT EXISTS companies (
id serial,
name text NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS cars (
id serial,
make text NOT NULL,
year integer NOT NULL,
company_id INTEGER REFERENCES companies(id),
PRIMARY KEY (id)
);
INSERT INTO companies (id, name) VALUES
(1, 'toyota'),
(2, 'chevy');
INSERT INTO cars (make, year, company_id) VALUES
('silverado', 1995, 2),
('malibu', 1999, 2),
('tacoma', 2017, 1),
('custom truck', 2010, null),
('van custom', 2005, null);
how do i select the rows for cars, only showing the newest car for a given company?
e.g.
select make, companies.name as model, year from cars
left join companies
on companies.id = cars.company_id
order by make;
outputs
make | model | year
--------------+--------+------
custom truck | | 2010
malibu | chevy | 1999
silverado | chevy | 1995
tacoma | toyota | 2017
van custom | | 2005
but i only want to show the newest "chevy", e.g.
make | model | year
--------------+--------+------
custom truck | | 2010
malibu | chevy | 1999
tacoma | toyota | 2017
van custom | | 2005
and still be able to sort by "make", and still show the cars whose company_id is null.
fiddle link:
https://www.db-fiddle.com/f/5Vh1sFXvEvnbnUJsCYhCHf/0

SQL queries can be reasoned about with set math (discrete math). You want the set of all cars minus the set of cars whose year is less than the maximum year for their company id.
The set of all cars:
select * from cars
The set of all cars whose year is less than the maximum year for a given company id:
select a.id from cars a, cars b where a.company_id = b.company_id and a.year < b.year
One set minus the other:
select * from cars where id not in (select a.id from cars a, cars b where a.company_id = b.company_id and a.year < b.year)
The result includes the cars with a null company_id, because a null never satisfies a.company_id = b.company_id, so those rows never end up in the subquery:
make | model | year
--------------+--------+------
custom truck | | 2010
malibu | chevy | 1999
tacoma | toyota | 2017
van custom | | 2005
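As a rough check of the set-difference idea above, here is a minimal sketch that loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module. Using SQLite instead of Postgres is an assumption made only so the example is self-contained, and the join to companies is added to reproduce the make / model / year columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cars (
    id INTEGER PRIMARY KEY,
    make TEXT NOT NULL,
    year INTEGER NOT NULL,
    company_id INTEGER REFERENCES companies(id)
);
INSERT INTO companies (id, name) VALUES (1, 'toyota'), (2, 'chevy');
INSERT INTO cars (make, year, company_id) VALUES
    ('silverado', 1995, 2),
    ('malibu', 1999, 2),
    ('tacoma', 2017, 1),
    ('custom truck', 2010, NULL),
    ('van custom', 2005, NULL);
""")

# All cars minus the cars that have a newer sibling in the same company.
newest_cars = conn.execute("""
    SELECT cars.make, companies.name AS model, cars.year
    FROM cars
    LEFT JOIN companies ON companies.id = cars.company_id
    WHERE cars.id NOT IN (
        SELECT a.id
        FROM cars a
        JOIN cars b ON a.company_id = b.company_id AND a.year < b.year
    )
    ORDER BY cars.make
""").fetchall()

for row in newest_cars:
    print(row)
```

Only the silverado is removed here, since it is the only car with a newer car in the same company.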

With a common table expression and the row_number() function, the following query gives the desired output. Cars without a company are partitioned by make, so each of them is kept (assuming their makes are distinct).
WITH temp AS
(SELECT
make
, companies.name AS model
, year
, row_number() over(PARTITION BY coalesce(companies.name, make) ORDER BY year desc) as rnk
FROM
cars
left join
companies
ON
companies.id = cars.company_id
)
SELECT
make
, model
, year
FROM
temp
WHERE
rnk = 1
;

In Postgres, this is best done using distinct on:
select distinct on (co.id) ca.*, co.name as model
from cars ca left join
companies co
on ca.company_id = co.id
order by co.id, ca.year desc;
DISTINCT ON is very handy Postgres syntax. It keeps one row for each combination in parentheses. The specific row is determined by the ORDER BY clause.
However, you have a twist, because co.id can be null. In that case, you seem to want to keep all the cars with no company.
So:
select distinct on (co.id, case when co.id is null then ca.id end) ca.*, co.name
from cars ca left join
companies co
on ca.company_id = co.id
order by co.id, case when co.id is null then ca.id end, ca.year desc;
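Or perhaps more simply using union all. The first branch needs its own order by (inside parentheses) so that distinct on keeps the newest car per company:
-- get the ones with a company
(select distinct on (co.id) ca.*, co.name
from cars ca join
companies co
on ca.company_id = co.id
order by co.id, ca.year desc)
union all
-- get the ones with no company
select ca.*, null
from cars ca
where ca.company_id is null
order by year desc;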
In other databases, this would typically be done using row_number():
select ca.*
from (select ca.*, co.name as model,
row_number() over (partition by co.id,
case when co.id is null then ca.id end
order by year desc
) as seqnum
from cars ca left join
companies co
on ca.company_id = co.id
) ca
where seqnum = 1
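To try the row_number() variant in a database that has no DISTINCT ON, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions) and reuses the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cars (
    id INTEGER PRIMARY KEY,
    make TEXT NOT NULL,
    year INTEGER NOT NULL,
    company_id INTEGER REFERENCES companies(id)
);
INSERT INTO companies (id, name) VALUES (1, 'toyota'), (2, 'chevy');
INSERT INTO cars (make, year, company_id) VALUES
    ('silverado', 1995, 2),
    ('malibu', 1999, 2),
    ('tacoma', 2017, 1),
    ('custom truck', 2010, NULL),
    ('van custom', 2005, NULL);
""")

# Cars with a company share one partition per company; cars without a
# company each get their own partition because ca.id is unique.
ranked_cars = conn.execute("""
    SELECT make, model, year
    FROM (
        SELECT ca.make, co.name AS model, ca.year,
               ROW_NUMBER() OVER (
                   PARTITION BY co.id,
                                CASE WHEN co.id IS NULL THEN ca.id END
                   ORDER BY ca.year DESC
               ) AS seqnum
        FROM cars ca
        LEFT JOIN companies co ON ca.company_id = co.id
    ) ranked
    WHERE seqnum = 1
    ORDER BY make
""").fetchall()

for row in ranked_cars:
    print(row)
```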

Related

SQL Max Value for a Specified Limit

I'm trying to return a list of years when certain conditions are met but I am having trouble with the MAX function and having it work with the rest of my logic.
For the following two tables:
coach
coach | team | wins | year
------+------+------+------
nk01 | a | 4 | 2000
vx92 | b | 1 | 2000
nk01 | b | 5 | 2003
vx92 | a | 2 | 2003
team
team | worldcupwin | year
-----+-------------+------
a | Y | 2000
b | N | 2000
a | Y | 2003
b | N | 2003
I want to get the following output:
years
-----
2000
Where the years printed are where the coaches' team with most wins during that year also won the world cup.
I decided to use the MAX function but quickly ran into the problem of not knowing how to use it to only be looking for max values for a certain year. This is what I've got so far:
SELECT y.year
FROM (SELECT c.year, MAX(c.wins), c.team
FROM coach AS c
WHERE c.year >= 1999
GROUP BY c.year, c.team) AS y, teams AS t
WHERE y.year = t.year AND t.worldcupwin = 'Y' AND y.team = t.team;
This query outputs all years greater than 1999 for me, rather than just those where a coach with the most wins also won the world cup.
(Using postgresql)
Any help is appreciated!
You can use a correlated subquery:
SELECT c.year, c.team
FROM coach AS c inner join team t on c.team = t.team and c.year=t.year
WHERE c.year >= 1999 and exists (select 1 from coach c1 where c.year=c1.year
having max(c1.wins)=c.wins)
and t.worldcupwin = 'Y'
OUTPUT:
year team
2000 a
The following query uses DISTINCT ON:
SELECT DISTINCT ON (year) c.year, wins, worldcupwin, c.team
FROM coach AS c
INNER JOIN team AS t ON c.team = t.team AND c.year = t.year
WHERE c.year > 1999
ORDER BY year, wins DESC
in order to return the records having the biggest number of wins per year
year wins worldcupwin team
---------------------------------
2000 4 Y a
2003 5 N b
Filtering out teams that didn't win the world cup:
SELECT year, team
FROM (
SELECT DISTINCT ON (year) c.year, wins, worldcupwin, c.team
FROM coach AS c
INNER JOIN team AS t ON c.team = t.team AND c.year = t.year
WHERE c.year > 1999
ORDER BY year, wins DESC) AS t
WHERE t.worldcupwin = 'Y'
ORDER BY year, wins DESC
gives the expected result:
year team
-------------
2000 a
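You can use the query below to get the desired result for the sample data. Note that it returns a single row overall, not one row per year, and that TOP 1 is SQL Server syntax, so in PostgreSQL use LIMIT 1 instead:
EASY METHOD
SELECT c.year
FROM coach AS c INNER JOIN team AS t ON c.team = t.team AND c.year = t.year
WHERE t.worldcupwin = 'Y'
ORDER BY c.wins DESC
LIMIT 1;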
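Use the row_number() window function to number the coaches within each year by wins, then keep the top row only if its team won the world cup:
select a.coach, a.team, a.wins, a.year from
(select c.coach, c.team, c.wins, c.year, t.worldcupwin,
row_number() over (partition by c.year order by c.wins desc) rn
from coach c join team t on c.team = t.team and c.year = t.year
) a where a.rn = 1 and a.worldcupwin = 'Y'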
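The per-year comparison the question asks for can be sketched with a scalar subquery that computes the maximum wins for the same year. This is a minimal illustration run against SQLite through Python's sqlite3 module (an assumption for portability; the question uses PostgreSQL), with the two sample tables from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coach (coach TEXT, team TEXT, wins INTEGER, year INTEGER);
CREATE TABLE team (team TEXT, worldcupwin TEXT, year INTEGER);
INSERT INTO coach VALUES
    ('nk01', 'a', 4, 2000),
    ('vx92', 'b', 1, 2000),
    ('nk01', 'b', 5, 2003),
    ('vx92', 'a', 2, 2003);
INSERT INTO team VALUES
    ('a', 'Y', 2000),
    ('b', 'N', 2000),
    ('a', 'Y', 2003),
    ('b', 'N', 2003);
""")

# Keep a year only when the world cup winner's coach also has the
# highest win count among all coaches in that same year.
years = [row[0] for row in conn.execute("""
    SELECT c.year
    FROM coach AS c
    JOIN team AS t ON t.team = c.team AND t.year = c.year
    WHERE t.worldcupwin = 'Y'
      AND c.wins = (SELECT MAX(c1.wins) FROM coach AS c1 WHERE c1.year = c.year)
    ORDER BY c.year
""")]

print(years)
```

In 2003 team a won the world cup with 2 wins, but the maximum that year is 5, so only 2000 is returned.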

Max function returning multiple values [SQL]

I have 3 tables: money, student, faculty. This query returns each faculty and highest stipend in each one of them.
select
f.name as "FACULTY_NAME",
max(stipend) as "MAX_STIPEND"
from
money m, student s
inner join
faculty f on f.id_faculty = s.faculty_id
where
m.student_id = s.id_student
group by
f.id_faculty, f.name;
Query works fine:
FACULTY_NAME | MAX_STIPEND
-----------------+---------------
IT Faculty | 50
Architecture | 60
Journalism | 40
However when I add s.name to original query to also show the name of the student who received max_stipend, query is not working like it used to - it returns all of the students
select
f.name as "FACULTY_NAME",s.name,
max(stipend) as "MAX_STIPEND"
from
money m, student s
inner join
faculty f on f.id_faculty = s.faculty_id
where
m.student_id = s.id_student
group by
f.id_faculty, f.name, s.name;
Query result:
FACULTY_NAME | s.name | MAX_STIPEND
----------------+-----------+---------------
IT Faculty | Joe | 50
IT Faculty | Lisa | 10
Architecture | Bob | 60
Journalism | Fred | 5
Architecture | Susan | 5
Journalism | Tom | 40
It does the same thing using right, left and inner joins. Can someone tell where the problem is?
First, you should be using proper JOIN syntax for all your joins.
Second, you can use Oracle's keep syntax:
select f.name as FACULTY_NAME,
max(stipend) as MAX_STIPEND,
max(s.name) keep (dense_rank first order by stipend desc)
from money m join
student s
on m.student_id = s.id_student join
faculty f
on f.id_faculty = s.faculty_id
group by f.id_faculty, f.name;
Regarding "However when I add s.name to original query to also show the name of the student who received max_stipend, query is not working like it used to - it returns all of the students":
When you add s.name to the GROUP BY, the maximum is computed for each faculty and student pair, so every student comes back with their own stipend.
If you want the name of the student who has the MAX_STIPEND, you should move to window functions, for example DENSE_RANK in MS SQL Server.
with cte as
(select
f.name as "FACULTY_NAME",
s.name as "STUDENT_NAME",
stipend as "MAX_STIPEND",
DENSE_RANK() OVER
(PARTITION BY f.id_faculty ORDER BY m.stipend DESC) AS rnk
from
money m
inner join student s on m.student_id = s.id_student
inner join
faculty f on f.id_faculty = s.faculty_id
)
select "FACULTY_NAME", "STUDENT_NAME", "MAX_STIPEND"
from cte
where rnk = 1
Not all SQL dialects support window functions, but DENSE_RANK is available in Oracle and in MySQL 8.0 and later.
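Here is a small sketch of the ranking approach, partitioned by faculty only. The table layouts are assumptions inferred from the column names in the question's queries, the rows mirror the question's second result table, and SQLite 3.25 or later (via Python's sqlite3 module) stands in for the original database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema, inferred from the joins in the question.
CREATE TABLE faculty (id_faculty INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (id_student INTEGER PRIMARY KEY, name TEXT, faculty_id INTEGER);
CREATE TABLE money (student_id INTEGER, stipend INTEGER);
INSERT INTO faculty VALUES (1, 'IT Faculty'), (2, 'Architecture'), (3, 'Journalism');
INSERT INTO student VALUES
    (1, 'Joe', 1), (2, 'Lisa', 1),
    (3, 'Bob', 2), (4, 'Susan', 2),
    (5, 'Fred', 3), (6, 'Tom', 3);
INSERT INTO money VALUES (1, 50), (2, 10), (3, 60), (4, 5), (5, 5), (6, 40);
""")

# Rank students inside each faculty by stipend; rank 1 is the top earner.
# DENSE_RANK keeps ties, so two students with the same top stipend both appear.
top_students = conn.execute("""
    WITH ranked AS (
        SELECT f.name AS faculty_name,
               s.name AS student_name,
               m.stipend,
               DENSE_RANK() OVER (
                   PARTITION BY f.id_faculty ORDER BY m.stipend DESC
               ) AS rnk
        FROM money m
        JOIN student s ON m.student_id = s.id_student
        JOIN faculty f ON f.id_faculty = s.faculty_id
    )
    SELECT faculty_name, student_name, stipend
    FROM ranked
    WHERE rnk = 1
    ORDER BY faculty_name
""").fetchall()

for row in top_students:
    print(row)
```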

select the highest record between two table

I have two tables. One contains graduation records and the other contains postgraduation records. A candidate must have a graduation record, but does not necessarily have a postgraduation record.
My question is how to select the postgraduation record if the candidate has one, and otherwise only the graduation record.
table 1 graduation_table
rollno | degree | division
--------------------------
001 | B.tech | 1st
002 | B.sc | 1st
003 | BA | 1st
table 2 postgraduation_table
rollno | degree | division
--------------------------
002 | M.sc | 1st
the result must be
rollno | degree | division
--------------------------
001 | B.tech | 1st
002 | M.sc | 1st
003 | BA | 1st
You want all rows from graduation_table which do not have a row in postgraduation_table plus those in postgraduation_table. This can be expressed with a not exists and union query:
select gt.rollno, gt.degree, gt.division
from graduation_table gt
where not exists (select *
from postgraduation_table pg
where pg.rollno = gt.rollno)
union all
select rollno, degree, division
from postgraduation_table
order by rollno;
Online example: http://rextester.com/IFCQR67320
select
rollno,
case when p.degree is null then g.degree else p.degree end as degree,
case when p.division is null then g.division else p.division end as division
from
grad g
left join
post p using (rollno)
Or better as suggested in the comments:
select
rollno,
coalesce (p.degree, g.degree) as degree,
coalesce (p.division, g.division) as division
from
grad g
left join
post p using (rollno)
Take a union of both tables, and introduce a position column, to rank the relative importance of the two tables. The postgraduate table has a pos value of 1, and the graduate table has a value of 2. Then, apply ROW_NUMBER() over this union query and assign a row number to each rollno group of records (presumed to be either one or at most two records). Finally, perform one more outer subquery to retain the most important record, postgraduate first, graduate second.
SELECT rollno, degree, division
FROM
(
SELECT
rollno, degree, division,
ROW_NUMBER() OVER (PARTITION BY rollno ORDER BY pos) rn
FROM
(
SELECT p.*, 1 AS pos FROM postgraduation_table p
UNION ALL
SELECT p.*, 2 FROM graduation_table p
) t
) t
WHERE t.rn = 1;
This should meet your needs:
SELECT gd.rollno,
CASE WHEN pg.rollno IS NOT NULL THEN pg.degree ELSE gd.degree END AS degree,
CASE WHEN pg.rollno IS NOT NULL THEN pg.division ELSE gd.division END AS division
FROM graduation_table AS gd
LEFT OUTER JOIN postgraduation_table AS pg ON pg.rollno = gd.rollno;
Hope this helps.
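For a quick way to check the left join plus coalesce idea against the sample rows, here is a minimal sketch using Python's sqlite3 module. SQLite is an assumption chosen for portability; the table names are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graduation_table (rollno TEXT, degree TEXT, division TEXT);
CREATE TABLE postgraduation_table (rollno TEXT, degree TEXT, division TEXT);
INSERT INTO graduation_table VALUES
    ('001', 'B.tech', '1st'),
    ('002', 'B.sc', '1st'),
    ('003', 'BA', '1st');
INSERT INTO postgraduation_table VALUES ('002', 'M.sc', '1st');
""")

# Prefer the postgraduation values; fall back to graduation when the
# left join finds no postgraduation row (p.* is NULL in that case).
highest = conn.execute("""
    SELECT g.rollno,
           COALESCE(p.degree, g.degree) AS degree,
           COALESCE(p.division, g.division) AS division
    FROM graduation_table g
    LEFT JOIN postgraduation_table p ON p.rollno = g.rollno
    ORDER BY g.rollno
""").fetchall()

for row in highest:
    print(row)
```

This relies on degree and division being non-null in postgraduation_table, and on at most one postgraduation row per rollno.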

How to remove duplicate columns from join in SQL

I have the following code
SELECT *
FROM customer
INNER JOIN
(SELECT
customerid, newspapername, enddate, n.publishedby
FROM
newspapersubscription ns, newspaper n
WHERE
publishedby IN (SELECT publishedby
FROM newspaper
WHERE ns.newspapername = n.NewspaperName)
UNION
SELECT
customerid, Magazinename, enddate, m.publishedby
FROM
magazinesubscription ms, magazine m
WHERE
publishedby IN (SELECT publishedby
FROM magazine
WHERE ms.Magazinename = m.MagazineName)) ON customer.customerid = customerid
ORDER BY
customer.customerid;
The customer table has the following:
customerid | customername | customersaddress
This query returns the following result:
customerid | customername | customersaddress | customerid | newspapername | enddate| publishedby
What I actually want is
customerid | customername | customersaddress | newspapername | magazinename | enddate| publishedby
Here, the newspapername field should be blank if the magazinename is present and vice versa. Also, the duplicate field of customerid from the union operations should not be present, while in my result, the value of both the newspapername and the magazinename are put under newspapername title.
How can I do that?
Since you are querying the table with '*', you will always get all the columns in both tables. In order to omit this column, you will have to manually name all columns you DO want to query. To address your other need, you need to simply insert a dummy column to each clause in the union query. Below is an example that should work to allow for what you want -
SELECT customer.customerid, customer.customername, customer.customersaddress, newspapername, magazinename, enddate, publishedby
FROM customer
INNER JOIN
(select customerid, newspapername, null Magazinename, enddate, n.publishedby
from newspapersubscription ns, newspaper n
where publishedby in(select publishedby
from newspaper
where ns.newspapername = n.NewspaperName)
UNION
select customerid, null newspapername, Magazinename, enddate, m.publishedby
from magazinesubscription ms, magazine m
where publishedby in(select publishedby
from magazine
where ms.Magazinename = m.MagazineName)) sub
on customer.customerid = sub.customerid
ORDER BY customer.customerid;
To get the projection you want, build sub-queries of the right shape and UNION them to get the result set. UNION ALL is better than UNION because it avoids a sort: you know you'll get a distinct set because you're joining on two different tables.
select * from (
select customer.*
, n.newspapername
, null as magazinename
, ns.enddate
, n.publishedby
from customer
join newspapersubscription ns
on ns.customerid = customer.customerid
join newspaper n
on n.newspapername = ns.newspapername
union all
select customer.*
, null as newspapername
, m.magazinename
, ms.enddate
, m.publishedby
from customer
join magazinesubscription ms
on ms.customerid = customer.customerid
join magazine m
on m.magazinename = ms.magazinename
)
order by customerid, newspapername nulls last, magazinename ;
Here is the output from my toy data set (which lacks publishedby columns):
CUSTOMERID CUSTOMERNAME NEWSPAPERNAME MAGAZINENAME ENDDATE
---------- -------------------- ---------------------- ---------------------- ---------
10 DAISY-HEAD MAISIE THE DAILY BUGLE 30-SEP-17
30 FOX-IN-SOCKS THE DAILY BUGLE 30-SEP-17
30 FOX-IN-SOCKS THE WHOVILLE TIMES 30-SEP-16
30 FOX-IN-SOCKS GREEN NEWS 31-DEC-17
30 FOX-IN-SOCKS TWEETLE BEETLE MONTHLY 31-DEC-16
40 THE LORAX GREEN NEWS 31-DEC-18
6 rows selected.
SQL>
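The shape of this UNION ALL, with a NULL placeholder for whichever name column does not apply, can be sketched as follows. The customers and subscriptions here are hypothetical rows invented for illustration, the newspaper and magazine lookup tables are left out for brevity, and SQLite (via Python's sqlite3 module) stands in for the original database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables and rows (not from the original post).
CREATE TABLE customer (customerid INTEGER PRIMARY KEY, customername TEXT);
CREATE TABLE newspapersubscription (customerid INTEGER, newspapername TEXT, enddate TEXT);
CREATE TABLE magazinesubscription (customerid INTEGER, magazinename TEXT, enddate TEXT);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO newspapersubscription VALUES (1, 'Daily News', '2017-09-30');
INSERT INTO magazinesubscription VALUES
    (1, 'Tech Monthly', '2017-12-31'),
    (2, 'Tech Monthly', '2018-12-31');
""")

# Both branches expose the same five columns; each fills the name column
# that does not apply to it with NULL.
cursor = conn.execute("""
    SELECT c.customerid AS customerid,
           c.customername AS customername,
           ns.newspapername AS newspapername,
           NULL AS magazinename,
           ns.enddate AS enddate
    FROM customer c
    JOIN newspapersubscription ns ON ns.customerid = c.customerid
    UNION ALL
    SELECT c.customerid, c.customername, NULL, ms.magazinename, ms.enddate
    FROM customer c
    JOIN magazinesubscription ms ON ms.customerid = c.customerid
    ORDER BY customerid, enddate
""")
columns = [d[0] for d in cursor.description]
subscriptions = cursor.fetchall()

print(columns)
for row in subscriptions:
    print(row)
```

The column names of the combined result come from the first branch, which is why the aliases are spelled out there.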

Specific Ordering in SQL

I have a SQL Server 2008 database. In this database, I have a result set that looks like the following:
ID Name Department LastOrderDate
-- ---- ---------- -------------
1 Golf Balls Sports 01/01/2015
2 Compact Disc Electronics 02/01/2015
3 Tires Automotive 01/15/2015
4 T-Shirt Clothing 01/10/2015
5 DVD Electronics 01/07/2015
6 Tennis Balls Sports 01/09/2015
7 Sweatshirt Clothing 01/04/2015
...
For some reason, my users want to get the results ordered by department, then last order date. However, not by department name. Instead, the departments will be in a specific order. For example, they want to see the results ordered by Electronics, Automotive, Sports, then Clothing. To throw another kink in the works, I cannot update the table schema.
Is there a way to do this with a SQL Query? If so, how? Currently, I'm stuck at
SELECT *
FROM
vOrders o
ORDER BY
o.LastOrderDate
Thank you!
You can use a case expression:
order by case when department = 'Electronics' then 1
when department = 'Automotive' then 2
when department = 'Sports' then 3
when department = 'Clothing' then 4
else 5 end
Create a table for the departments that has the name (or better, the id) of the department and the display order. Then join to that table and order by the display order column.
Alternatively, you can order by a case expression:
ORDER BY CASE WHEN Department = 'Electronics' THEN 1
WHEN Department = 'Automotive' THEN 2
...
END
(that is not recommended for larger tables)
Here is a solution with a CTE:
with c (iOrder, dept)
as (
Select 1, 'Electronics'
Union
Select 2, 'Automotive'
Union
Select 3, 'Sports'
Union
Select 4, 'Clothing'
)
SELECT o.*
FROM
vOrders o join c
on c.dept = o.Department
ORDER BY
c.iOrder, o.LastOrderDate
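The lookup-list approach can be tried without touching the schema by keeping the department order in a CTE. This is a minimal sketch on SQLite through Python's sqlite3 module (an assumption; the question uses SQL Server 2008). The dates are stored as ISO strings on the assumption that the question's dates are MM/DD/YYYY, and dept_order and sort_key are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vOrders (ID INTEGER, Name TEXT, Department TEXT, LastOrderDate TEXT);
INSERT INTO vOrders VALUES
    (1, 'Golf Balls', 'Sports', '2015-01-01'),
    (2, 'Compact Disc', 'Electronics', '2015-02-01'),
    (3, 'Tires', 'Automotive', '2015-01-15'),
    (4, 'T-Shirt', 'Clothing', '2015-01-10'),
    (5, 'DVD', 'Electronics', '2015-01-07'),
    (6, 'Tennis Balls', 'Sports', '2015-01-09'),
    (7, 'Sweatshirt', 'Clothing', '2015-01-04');
""")

# The CTE plays the role of a department lookup table with a display order.
ordered_names = [row[0] for row in conn.execute("""
    WITH dept_order (sort_key, dept) AS (
        VALUES (1, 'Electronics'), (2, 'Automotive'), (3, 'Sports'), (4, 'Clothing')
    )
    SELECT o.Name
    FROM vOrders o
    JOIN dept_order d ON d.dept = o.Department
    ORDER BY d.sort_key, o.LastOrderDate
""")]

print(ordered_names)
```

Because this is an inner join, a department missing from the list would drop out of the result; a left join with a fallback sort key would keep such rows.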