How to remove duplicate columns from join in SQL

I have the following code:
SELECT *
FROM customer
INNER JOIN
(SELECT
customerid, newspapername, enddate, n.publishedby
FROM
newspapersubscription ns, newspaper n
WHERE
publishedby IN (SELECT publishedby
FROM newspaper
WHERE ns.newspapername = n.NewspaperName)
UNION
SELECT
customerid, Magazinename, enddate, m.publishedby
FROM
magazinesubscription ms, magazine m
WHERE
publishedby IN (SELECT publishedby
FROM magazine
WHERE ms.Magazinename = m.MagazineName)) ON customer.customerid = customerid
ORDER BY
customer.customerid;
The customer table has the following:
customerid | customername | customersaddress
This query returns the following result:
customerid | customername | customersaddress | customerid | newspapername | enddate| publishedby
What I actually want is
customerid | customername | customersaddress | newspapername | magazinename | enddate| publishedby
Here, newspapername should be blank when magazinename is present, and vice versa. The duplicate customerid column coming from the union subquery should also not appear. Currently, both the newspaper names and the magazine names end up under the newspapername heading.
How can I do that?

Because you are selecting with '*', you always get every column from both tables, including the second customerid. To omit it, you have to explicitly name every column you DO want. For your other requirement, add a dummy (NULL) column to each SELECT in the union query: a UNION takes its column names from the first SELECT, so without a placeholder the magazine names land under newspapername. Below is an example that should give you what you want:
SELECT customer.customerid, customer.customername, customer.customersaddress, s.newspapername, s.magazinename, s.enddate, s.publishedby
FROM customer
INNER JOIN
(select customerid, newspapername, null AS magazinename, enddate, n.publishedby
from newspapersubscription ns, newspaper n
where publishedby in(select publishedby
from newspaper
where ns.newspapername = n.NewspaperName)
UNION
select customerid, null AS newspapername, Magazinename, enddate, m.publishedby
from magazinesubscription ms, magazine m
where publishedby in(select publishedby
from magazine
where ms.Magazinename = m.MagazineName)) s
on customer.customerid = s.customerid
ORDER BY customer.customerid;
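To see why the NULL placeholder matters, here is a small sketch using Python's built-in sqlite3 module with hypothetical toy data (the publishedby column and the newspaper/magazine lookup tables are left out). It runs the union once without placeholders and once with them; the result's column names come from the first SELECT in both cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data; publishedby and the lookup tables are omitted.
conn.executescript("""
CREATE TABLE newspapersubscription (customerid INTEGER, newspapername TEXT, enddate TEXT);
CREATE TABLE magazinesubscription (customerid INTEGER, magazinename TEXT, enddate TEXT);
INSERT INTO newspapersubscription VALUES (1, 'Daily Bugle', '2017-09-30');
INSERT INTO magazinesubscription VALUES (1, 'Green News', '2017-12-31');
""")

def run(sql):
    """Return (column names, rows) for a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

# Without placeholders: the UNION names its columns after the first SELECT,
# so the magazine name ends up under "newspapername".
before = run("""
SELECT customerid, newspapername, enddate FROM newspapersubscription
UNION ALL
SELECT customerid, magazinename, enddate FROM magazinesubscription
ORDER BY enddate
""")

# With a NULL placeholder in each branch, each name gets its own column.
after = run("""
SELECT customerid, newspapername, NULL AS magazinename, enddate FROM newspapersubscription
UNION ALL
SELECT customerid, NULL, magazinename, enddate FROM magazinesubscription
ORDER BY enddate
""")

print(before)
print(after)
```

Only the position of each column matters in the later branches; aliases there are ignored, which is why the second SELECT can use a bare NULL.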

To get the projection you want, build sub-queries of the right shape and UNION them to get the result set. UNION ALL is better than UNION here because it skips the duplicate-elimination step (typically a sort): the two branches cannot produce identical rows, since one always has a NULL magazinename and the other always has a NULL newspapername.
select * from (
select customer.*
, n.newspapername
, null as magazinename
, ns.enddate
, n.publishedby
from customer
join newspapersubscription ns
on ns.customerid = customer.customerid
join newspaper n
on n.newspapername = ns.newspapername
union all
select customer.*
, null as newspapername
, m.magazinename
, ms.enddate
, m.publishedby
from customer
join magazinesubscription ms
on ms.customerid = customer.customerid
join magazine m
on m.magazinename = ms.magazinename
)
order by customerid, newspapername nulls last, magazinename ;
Here is the output from my toy data set (which lacks the publishedby column):
CUSTOMERID CUSTOMERNAME         NEWSPAPERNAME          MAGAZINENAME           ENDDATE
---------- -------------------- ---------------------- ---------------------- ---------
        10 DAISY-HEAD MAISIE    THE DAILY BUGLE                               30-SEP-17
        30 FOX-IN-SOCKS         THE DAILY BUGLE                               30-SEP-17
        30 FOX-IN-SOCKS         THE WHOVILLE TIMES                            30-SEP-16
        30 FOX-IN-SOCKS                                GREEN NEWS             31-DEC-17
        30 FOX-IN-SOCKS                                TWEETLE BEETLE MONTHLY 31-DEC-16
        40 THE LORAX                                   GREEN NEWS             31-DEC-18
6 rows selected.
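The same approach can be checked with a sketch in Python's sqlite3 module, loading the toy data read off the output above (dates rewritten as ISO-8601 text, which is an assumption, and publishedby left out as in the toy set). Older SQLite versions lack NULLS LAST, so the sketch sorts on `newspapername IS NULL` first to get the same effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data taken from the output above; dates as ISO-8601 text (assumption).
conn.executescript("""
CREATE TABLE customer (customerid INTEGER, customername TEXT);
CREATE TABLE newspaper (newspapername TEXT);
CREATE TABLE magazine (magazinename TEXT);
CREATE TABLE newspapersubscription (customerid INTEGER, newspapername TEXT, enddate TEXT);
CREATE TABLE magazinesubscription (customerid INTEGER, magazinename TEXT, enddate TEXT);
INSERT INTO customer VALUES (10, 'DAISY-HEAD MAISIE'), (30, 'FOX-IN-SOCKS'), (40, 'THE LORAX');
INSERT INTO newspaper VALUES ('THE DAILY BUGLE'), ('THE WHOVILLE TIMES');
INSERT INTO magazine VALUES ('GREEN NEWS'), ('TWEETLE BEETLE MONTHLY');
INSERT INTO newspapersubscription VALUES
  (10, 'THE DAILY BUGLE', '2017-09-30'), (30, 'THE DAILY BUGLE', '2017-09-30'),
  (30, 'THE WHOVILLE TIMES', '2016-09-30');
INSERT INTO magazinesubscription VALUES
  (30, 'GREEN NEWS', '2017-12-31'), (30, 'TWEETLE BEETLE MONTHLY', '2016-12-31'),
  (40, 'GREEN NEWS', '2018-12-31');
""")

rows = conn.execute("""
SELECT * FROM (
    -- newspaper branch: magazinename is a NULL placeholder
    SELECT customer.*, n.newspapername, NULL AS magazinename, ns.enddate
    FROM customer
    JOIN newspapersubscription ns ON ns.customerid = customer.customerid
    JOIN newspaper n ON n.newspapername = ns.newspapername
    UNION ALL
    -- magazine branch: newspapername is a NULL placeholder
    SELECT customer.*, NULL AS newspapername, m.magazinename, ms.enddate
    FROM customer
    JOIN magazinesubscription ms ON ms.customerid = customer.customerid
    JOIN magazine m ON m.magazinename = ms.magazinename
) AS sub
-- "newspapername IS NULL" puts NULL newspapernames last, like NULLS LAST
ORDER BY customerid, newspapername IS NULL, newspapername, magazinename
""").fetchall()

for row in rows:
    print(row)
```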


SQL MAX aggregate function not bringing the latest date

Purpose: I am trying to find, for each teacher, the most recent purchase date along with its order type.
Orders table
ID | Ordertype | Status | TeacherID | PurchaseDate | SchoolID | TeacherassistantID
1 | Pencils | Completed | 1 | 1/1/2021 | 1 | 1
2 | Paper | Completed | 1 | 3/5/2021 | 1 | 1
3 | Notebooks | Completed | 1 | 4/1/2021 | 1 | 1
4 | Erasers | Completed | 2 | 2/1/2021 | 2 | 2
Teachers table
TeacherID | Teachername
1 | Mary Smith
2 | Jason Crane
School table
ID | schoolname
1 | ABC school
2 | PS1
3 | PS2
Here is my attempted code:
SELECT o.ordertype, o.status, t.Teachername, s.schoolname
,MAX(o.Purchasedate) OVER (PARTITION by t.ID) last_purchase
FROM orders o
INNER JOIN teachers t ON t.ID=o.TeacherID
INNER JOIN schools s ON s.ID=o.schoolID
WHERE o.status in ('Completed','In-progress')
AND o.ordertype not like 'notebook'
It should look like this:
Ordertype | Status | teachername | last_purchase | schoolname
Paper | Completed | Mary Smith | 3/5/2021 | ABC School
Erasers | Completed | PS1 | 2/1/2021 | ABC school
It returns multiple rows per teacher instead of just the row with the latest purchase date. I think I need a subquery.
Aggregation functions are not appropriate for what you are trying to do: their purpose is to summarize values across multiple rows, not to choose a particular row.
A window function on its own does not filter any rows either; it only adds a computed value to every row.
You want to use a window function and then filter on its result:
SELECT ordertype, status, Teachername, schoolname, Purchasedate AS last_purchase
FROM (SELECT o.ordertype, o.status, t.Teachername, s.schoolname,
             o.Purchasedate,
             ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY o.PurchaseDate DESC) AS seqnum
      FROM orders o JOIN
           teachers t
           ON t.ID = o.TeacherID JOIN
           schools s
           ON s.ID = o.schoolID
      WHERE o.status IN ('Completed', 'In-progress') AND
            o.ordertype NOT LIKE 'Notebook%'
     ) o
WHERE seqnum = 1;
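A minimal sketch of the ROW_NUMBER() approach, run against the question's sample data in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). Dates are stored as ISO-8601 text so they sort chronologically, which the question's m/d/yyyy strings would not; the teachers table is joined on TeacherID, as in the question's table listing, rather than t.ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question; dates as ISO-8601 text (assumption).
conn.executescript("""
CREATE TABLE orders (ID INTEGER, ordertype TEXT, status TEXT, TeacherID INTEGER,
                     PurchaseDate TEXT, SchoolID INTEGER, TeacherassistantID INTEGER);
CREATE TABLE teachers (TeacherID INTEGER, Teachername TEXT);
CREATE TABLE schools (ID INTEGER, schoolname TEXT);
INSERT INTO orders VALUES
  (1, 'Pencils',   'Completed', 1, '2021-01-01', 1, 1),
  (2, 'Paper',     'Completed', 1, '2021-03-05', 1, 1),
  (3, 'Notebooks', 'Completed', 1, '2021-04-01', 1, 1),
  (4, 'Erasers',   'Completed', 2, '2021-02-01', 2, 2);
INSERT INTO teachers VALUES (1, 'Mary Smith'), (2, 'Jason Crane');
INSERT INTO schools VALUES (1, 'ABC school'), (2, 'PS1'), (3, 'PS2');
""")

# Number each teacher's orders from newest to oldest, then keep number 1.
rows = conn.execute("""
SELECT ordertype, status, Teachername, PurchaseDate AS last_purchase, schoolname
FROM (
    SELECT o.ordertype, o.status, t.Teachername, s.schoolname, o.PurchaseDate,
           ROW_NUMBER() OVER (PARTITION BY t.TeacherID
                              ORDER BY o.PurchaseDate DESC) AS seqnum
    FROM orders o
    JOIN teachers t ON t.TeacherID = o.TeacherID
    JOIN schools s ON s.ID = o.SchoolID
    WHERE o.status IN ('Completed', 'In-progress')
      AND o.ordertype NOT LIKE 'Notebook%'
) AS x
WHERE seqnum = 1
ORDER BY Teachername DESC
""").fetchall()

for row in rows:
    print(row)
```

The Notebooks order is filtered out before numbering, so Mary Smith's newest remaining purchase is the Paper order.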
You can also do it a different way: use GROUP BY on the other columns, take MAX(Purchasedate), and then use ORDER BY to sort all the records, as below. Note that TOP 1 returns only the single most recent row overall, not one row per teacher.
SELECT TOP 1 o.ordertype, o.status, t.Teachername, s.schoolname
,MAX(o.Purchasedate) AS last_purchase
FROM orders o
INNER JOIN teachers t ON t.ID=o.TeacherID
INNER JOIN schools s ON s.ID=o.schoolID
WHERE o.status in ('Completed','In-progress')
AND o.ordertype not like 'Notebook%'
group by o.ordertype, o.status, t.Teachername, s.schoolname
order by last_purchase Desc
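The TOP 1 caveat can be seen in a small SQLite sketch of the same query. SQLite has no TOP, so LIMIT 1 stands in for it; the data and ISO-8601 dates are the question's, with the same assumptions as before. Only one row comes back, so Jason Crane's purchase is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question; dates as ISO-8601 text (assumption).
conn.executescript("""
CREATE TABLE orders (ID INTEGER, ordertype TEXT, status TEXT, TeacherID INTEGER,
                     PurchaseDate TEXT, SchoolID INTEGER, TeacherassistantID INTEGER);
CREATE TABLE teachers (TeacherID INTEGER, Teachername TEXT);
CREATE TABLE schools (ID INTEGER, schoolname TEXT);
INSERT INTO orders VALUES
  (1, 'Pencils',   'Completed', 1, '2021-01-01', 1, 1),
  (2, 'Paper',     'Completed', 1, '2021-03-05', 1, 1),
  (3, 'Notebooks', 'Completed', 1, '2021-04-01', 1, 1),
  (4, 'Erasers',   'Completed', 2, '2021-02-01', 2, 2);
INSERT INTO teachers VALUES (1, 'Mary Smith'), (2, 'Jason Crane');
INSERT INTO schools VALUES (1, 'ABC school'), (2, 'PS1'), (3, 'PS2');
""")

# Group, take MAX per group, sort newest first, keep one row (LIMIT 1 ~ TOP 1).
rows = conn.execute("""
SELECT o.ordertype, o.status, t.Teachername, s.schoolname,
       MAX(o.PurchaseDate) AS last_purchase
FROM orders o
JOIN teachers t ON t.TeacherID = o.TeacherID
JOIN schools s ON s.ID = o.SchoolID
WHERE o.status IN ('Completed', 'In-progress')
  AND o.ordertype NOT LIKE 'Notebook%'
GROUP BY o.ordertype, o.status, t.Teachername, s.schoolname
ORDER BY last_purchase DESC
LIMIT 1
""").fetchall()

print(rows)
```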

how to remove duplicate results

Given the following schema:
CREATE TABLE IF NOT EXISTS companies (
id serial,
name text NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS cars (
id serial,
make text NOT NULL,
year integer NOT NULL,
company_id INTEGER REFERENCES companies(id),
PRIMARY KEY (id)
);
INSERT INTO companies (id, name) VALUES
(1, 'toyota'),
(2, 'chevy');
INSERT INTO cars (make, year, company_id) VALUES
('silverado', 1995, 2),
('malibu', 1999, 2),
('tacoma', 2017, 1),
('custom truck', 2010, null),
('van custom', 2005, null);
How do I select the rows from cars, showing only the newest car for each company?
For example, this query:
select make, companies.name as model, year from cars
left join companies
on companies.id = cars.company_id
order by make;
outputs
make | model | year
--------------+--------+------
custom truck | | 2010
malibu | chevy | 1999
silverado | chevy | 1995
tacoma | toyota | 2017
van custom | | 2005
But I only want to show the newest "chevy", e.g.:
make | model | year
--------------+--------+------
custom truck | | 2010
malibu | chevy | 1999
tacoma | toyota | 2017
van custom | | 2005
and still be able to sort by "make", and still show the cars that have a null company_id.
fiddle link:
https://www.db-fiddle.com/f/5Vh1sFXvEvnbnUJsCYhCHf/0
SQL is grounded in set theory (discrete math). So you want the set of all cars minus the set of cars whose year is less than the maximum year for their company_id.
The set of all cars:
select * from cars
The set of all cars whose year is less than the maximum year for a given company id:
select a.id from cars a, cars b where a.company_id = b.company_id and a.year < b.year
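One set minus the other (joined to companies to show the model, as in the question):
select make, companies.name as model, year
from cars
left join companies on companies.id = cars.company_id
where cars.id not in (select a.id from cars a, cars b where a.company_id = b.company_id and a.year < b.year)
order by make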
The result still includes the cars with a null company_id, because a.company_id = b.company_id is never true for NULL, so those cars never end up in the subtracted set:
make | model | year
--------------+--------+------
custom truck | | 2010
malibu | chevy | 1999
tacoma | toyota | 2017
van custom | | 2005
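A quick way to check this set difference is to run it in SQLite through Python's sqlite3 module, using the question's schema and data. `serial` is replaced by `INTEGER PRIMARY KEY`, an SQLite-specific adaptation, so the cars get ids 1 to 5 in insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema and data from the question, adapted to SQLite (no serial type).
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cars (id INTEGER PRIMARY KEY, make TEXT NOT NULL, year INTEGER NOT NULL,
                   company_id INTEGER REFERENCES companies(id));
INSERT INTO companies (id, name) VALUES (1, 'toyota'), (2, 'chevy');
INSERT INTO cars (make, year, company_id) VALUES
  ('silverado', 1995, 2), ('malibu', 1999, 2), ('tacoma', 2017, 1),
  ('custom truck', 2010, NULL), ('van custom', 2005, NULL);
""")

# All cars minus the cars that are older than another car of the same company.
rows = conn.execute("""
SELECT make, companies.name AS model, year
FROM cars
LEFT JOIN companies ON companies.id = cars.company_id
WHERE cars.id NOT IN (
    SELECT a.id FROM cars a, cars b
    WHERE a.company_id = b.company_id AND a.year < b.year
)
ORDER BY make
""").fetchall()

for row in rows:
    print(row)
```

Only silverado lands in the subtracted set, because malibu is a newer chevy; the two company-less cars never match the self-join.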
With a common table expression and the row_number function, we can get the desired output:
WITH temp AS
(SELECT
make
, companies.name AS model
, year
, row_number() over(PARTITION BY coalesce(companies.name, make) ORDER BY year desc) as rnk
FROM
cars
left join
companies
ON
companies.id = cars.company_id
)
SELECT
make
, model
, year
FROM
temp
WHERE
rnk = 1
ORDER BY
make
;
In Postgres, this is best done using distinct on:
select distinct on (co.id) ca.*, co.name as model
from cars ca left join
companies co
on ca.company_id = co.id
order by co.id, ca.year desc;
DISTINCT ON is very handy Postgres syntax. It keeps one row for each distinct combination of the expressions in parentheses; which row is kept is determined by the ORDER BY clause.
However, you have a twist, because co.id can be null. In that case, you seem to want to keep all the cars with no company.
So:
select distinct on (co.id, case when co.id is null then ca.id end) ca.*, co.name
from cars ca left join
companies co
on ca.company_id = co.id
order by co.id, case when co.id is null then ca.id end, ca.year desc;
Or perhaps more simply using union all:
-- get the ones with a company (DISTINCT ON needs its own ORDER BY to pick the newest)
(select distinct on (co.id) ca.*, co.name
from cars ca join
companies co
on ca.company_id = co.id
order by co.id, ca.year desc)
union all
-- get the ones with no company
(select ca.*, null
from cars ca
where ca.company_id is null)
order by year desc;
In other databases, this would typically be done using row_number():
select ca.*
from (select ca.*, co.name as model,
row_number() over (partition by co.id,
case when co.id is null then ca.id end
order by year desc
) as seqnum
from cars ca left join
companies co
on ca.company_id = co.id
) ca
where seqnum = 1
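The row_number() version can be sketched the same way in SQLite (window functions need SQLite 3.25 or later), again with the question's data and `INTEGER PRIMARY KEY` in place of `serial`. The CASE expression gives each company-less car its own partition, so none of them is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema and data from the question, adapted to SQLite (no serial type).
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cars (id INTEGER PRIMARY KEY, make TEXT NOT NULL, year INTEGER NOT NULL,
                   company_id INTEGER REFERENCES companies(id));
INSERT INTO companies (id, name) VALUES (1, 'toyota'), (2, 'chevy');
INSERT INTO cars (make, year, company_id) VALUES
  ('silverado', 1995, 2), ('malibu', 1999, 2), ('tacoma', 2017, 1),
  ('custom truck', 2010, NULL), ('van custom', 2005, NULL);
""")

rows = conn.execute("""
SELECT make, model, year
FROM (
    SELECT ca.make, co.name AS model, ca.year,
           ROW_NUMBER() OVER (
               -- cars with a company share a partition per company;
               -- cars without one each get a partition keyed by their own id
               PARTITION BY co.id, CASE WHEN co.id IS NULL THEN ca.id END
               ORDER BY ca.year DESC
           ) AS seqnum
    FROM cars ca
    LEFT JOIN companies co ON ca.company_id = co.id
) AS ranked
WHERE seqnum = 1
ORDER BY make
""").fetchall()

for row in rows:
    print(row)
```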

SQL - Return group of records from different tables not including specific row

I am new at programming and SQL, so sorry if I do not include enough info.
I have these 2 tables that are linked by an OrderID. Table1 includes OrderIDs and customer information such as FirstName, LastName, and Address. Table2 includes OrderIDs and order details such as ItemName, Price, and Quantity.
Each OrderID in Table2 might have multiple ItemName entries with the same OrderID.
CustInfo
OrderID FirstName LastName Address
1 Bob Pratt 123
2 Jane Doe 456
3 John Smith 789
4 Barbara Walters 147
Orders
OrderID ItemName Price Quantity
1 Milk 4.00 1
1 Eggs 5.00 2
2 Cheese 5.00 1
2 Bread 5.00 1
3 Milk 4.00 2
4 Yogurt 5.00 2
I'm trying to make a query that will send back a list of every Order, listing the OrderID and ItemName among other info, as long as the order doesn't include a specific type of item (which would be in ItemName). So if an OrderID contains 2 ItemName, one of which is the one I do not want, the entire order (OrderID) should not show up in my result.
For example, based on the tables above, if I wanted to show all orders that do not have Milk as an ItemName, the result should only show OrderIDs 2 and 4:
2 Cheese 5.00 1
2 Bread 5.00 1
4 Yogurt 5.00 2
This is what I have tried, but it still returns the other rows of orders that contain Milk; it only drops the Milk rows themselves.
SELECT OrderID, FirstName, LastName, ItemName, Price, Quantity
FROM CustInfo
JOIN Orders
ON CustInfo.OrderID = Orders.OrderID
WHERE ItemName != 'Milk'
Can you help?
select o.OrderID, o.ItemName, c.FirstName, c.LastName -- include other fields if needed
from Orders o
left join CustInfo c on o.OrderID = c.OrderID
where o.OrderID not in (
select OrderID from Orders where ItemName = 'Milk'
)
If you want the whole order to be excluded, rather than just the individual Milk rows, you can use WHERE NOT EXISTS (written here against the question's CustInfo and Orders tables):
SELECT c.OrderID, o.ItemName, o.Price, o.Quantity
FROM CustInfo c
JOIN Orders o ON c.OrderID = o.OrderID
WHERE NOT EXISTS
(
SELECT * FROM Orders o2
WHERE o2.OrderID = c.OrderID
AND o2.ItemName = 'Milk'
)
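A small SQLite sketch, using the question's CustInfo and Orders data, that runs a NOT IN filter and a NOT EXISTS filter side by side and checks that they return the same rows. Both queries use an inner join to CustInfo, which is an adaptation of the answers here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Data from the question.
conn.executescript("""
CREATE TABLE CustInfo (OrderID INTEGER, FirstName TEXT, LastName TEXT, Address TEXT);
CREATE TABLE Orders (OrderID INTEGER, ItemName TEXT, Price REAL, Quantity INTEGER);
INSERT INTO CustInfo VALUES (1, 'Bob', 'Pratt', '123'), (2, 'Jane', 'Doe', '456'),
                            (3, 'John', 'Smith', '789'), (4, 'Barbara', 'Walters', '147');
INSERT INTO Orders VALUES (1, 'Milk', 4.00, 1), (1, 'Eggs', 5.00, 2),
                          (2, 'Cheese', 5.00, 1), (2, 'Bread', 5.00, 1),
                          (3, 'Milk', 4.00, 2), (4, 'Yogurt', 5.00, 2);
""")

# Drop every order whose OrderID appears in the set of Milk orders.
not_in = conn.execute("""
SELECT o.OrderID, o.ItemName, o.Price, o.Quantity
FROM Orders o
JOIN CustInfo c ON c.OrderID = o.OrderID
WHERE o.OrderID NOT IN (SELECT OrderID FROM Orders WHERE ItemName = 'Milk')
ORDER BY o.OrderID, o.ItemName
""").fetchall()

# Same filter as a correlated NOT EXISTS.
not_exists = conn.execute("""
SELECT o.OrderID, o.ItemName, o.Price, o.Quantity
FROM Orders o
JOIN CustInfo c ON c.OrderID = o.OrderID
WHERE NOT EXISTS (
    SELECT 1 FROM Orders o2
    WHERE o2.OrderID = o.OrderID AND o2.ItemName = 'Milk'
)
ORDER BY o.OrderID, o.ItemName
""").fetchall()

print(not_in)
print(not_in == not_exists)
```

One difference worth knowing: if the subquery could return a NULL OrderID, NOT IN would return no rows at all, while NOT EXISTS is unaffected.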
SELECT T1.OrderID, T1.FirstName, T1.LastName, T1.Address, T2.OrderID, T2.ItemName, T2.Price, T2.Quantity
FROM Table2 as T2
LEFT JOIN Table1 as T1 ON T1.OrderID = T2.OrderID
WHERE T2.OrderID NOT IN (SELECT OrderID FROM Table2 WHERE ItemName='Milk');

SQL Server joining two tables, order by and display one record

I am having trouble with a SQL Server statement. Ideally, I want to join the Contact table to another table (job), sort by date created, and display the contact information in descending order. Currently I can get the script to show all records; however, if a contact has more than one job, they are displayed more than once.
SELECT
c.*,
p.date_created
FROM
[db].[dbo].[Contact] AS c
JOIN
[db].[dbo].[job] AS p ON p.contact_id = c.contact_id
UNION
SELECT
*,
0 as date_created
FROM
[db].[dbo].[Contact]
ORDER BY
p.date_created DESC
The output
contact_id| date_created | contact_name
1 | 8/29/2016 1:07:18 PM | sam
1 | 8/26/2016 1:04:01 PM | sam
14 | 8/24/2016 5:07:22 PM | steve
The final output should show only the newest date_created, with one row per contact. Help is much appreciated.
The columns in each SELECT of a UNION must match in number and type, so convert 0 to a proper date:
SELECT
c.contact_id
,max(p.date_created) as date_created
,c.contact_name
FROM [db].[dbo].[Contact] AS c
JOIN [db].[dbo].[job] AS p
ON p.contact_id = c.contact_id
GROUP BY c.contact_id, c.contact_name
union
select
c.contact_id
, convert(datetime, '01/01/1970', 101) as date_created
, c.contact_name
from [db].[dbo].[Contact] AS c
ORDER BY date_created desc
The result you need, however, should be obtainable with just a LEFT JOIN and GROUP BY:
SELECT
c.contact_id
,max(p.date_created) as max_date_created
,c.contact_name
FROM [db].[dbo].[Contact] AS c
LEFT JOIN [db].[dbo].[job] AS p
ON p.contact_id = c.contact_id
GROUP BY c.contact_id, c.contact_name
ORDER BY c.contact_id, c.contact_name, max_date_created
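A sketch of the LEFT JOIN + GROUP BY version in SQLite, with the contacts and job dates from the question's output (stored as ISO-8601 text) plus a hypothetical contact 20, 'pat', who has no jobs. The LEFT JOIN keeps that contact, with a NULL date, and each contact appears exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Contacts and dates from the question's output; contact 20 'pat' is hypothetical.
conn.executescript("""
CREATE TABLE Contact (contact_id INTEGER, contact_name TEXT);
CREATE TABLE job (contact_id INTEGER, date_created TEXT);
INSERT INTO Contact VALUES (1, 'sam'), (14, 'steve'), (20, 'pat');
INSERT INTO job VALUES (1, '2016-08-29 13:07:18'), (1, '2016-08-26 13:04:01'),
                       (14, '2016-08-24 17:07:22');
""")

# One row per contact; MAX picks the newest job date (NULL if no jobs).
rows = conn.execute("""
SELECT c.contact_id, MAX(p.date_created) AS max_date_created, c.contact_name
FROM Contact AS c
LEFT JOIN job AS p ON p.contact_id = c.contact_id
GROUP BY c.contact_id, c.contact_name
ORDER BY c.contact_id, c.contact_name, max_date_created
""").fetchall()

for row in rows:
    print(row)
```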

Cross-multiplying table

I have this SQL code, and I want to show, for each item, the total quantity on its receipts and on its charge slips:
select item_description, sum(receipt_qty) as Samp1, sum(chargeSlip_qty) as Samp2
from Items inner join Receipt_Detail on (Receipt_Detail.item_number =
Items.item_number)
inner join ChargeSlip_Detail on (ChargeSlip_Detail.item_number =
Items.item_number)
group by item_description
It produces this output:
Acetazolamide 2681 1730
Ascorbic Acid 1512 651
Paracetamol 1370 742
Silk 576 952
But it should be:
Acetazolamide 383 173
Ascorbic Acid 216 93
Paracetamol 274 106
Silk 96 238
What's wrong with my code?
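Because an item can have several Receipt_Detail rows and several ChargeSlip_Detail rows, joining both to Items pairs every receipt row with every charge-slip row for that item, so each sum() is multiplied by the number of matching rows in the other table. Your output fits this pattern: every wrong total is an exact multiple of the right one (for Acetazolamide, 2681 = 383 × 7 and 1730 = 173 × 10). So you can use subqueries instead. This gets the sum() for the receipts and charge slips for each item_number, and then joins that back to your Items table to get the final result: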
select i.item_description,
r.Samp1,
c.Samp2
from Items i
inner join
(
select sum(receipt_qty) Samp1,
item_number
from Receipt_Detail
group by item_number
) r
on r.item_number = i.item_number
inner join
(
select sum(chargeSlip_qty) Samp2,
item_number
from ChargeSlip_Detail
group by item_number
) c
on c.item_number = i.item_number
Do the GROUP BYs first, per item_number, so you don't multiply out rows from Receipt_Detail and ChargeSlip_Detail. That is, compute the SUM values per item_number before JOINing back to Items:
select
I.item_description,
R.Samp1,
C.Samp2
from
Items I
inner join
(SELECT item_number, sum(receipt_qty) as Samp1
FROM Receipt_Detail
GROUP BY item_number
) R
on (R.item_number = I.item_number)
inner join
(SELECT item_number, sum(chargeSlip_qty) as Samp2
FROM ChargeSlip_Detail
GROUP BY item_number
) C
on (C.item_number = I.item_number)
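To see the row multiplication directly, here is a sketch with hypothetical data in SQLite: one item with two receipt rows and three charge-slip rows. The direct join sums each quantity over all 2 × 3 = 6 paired rows, while the pre-aggregated version sums each table on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: item 1 has 2 receipt rows and 3 charge-slip rows.
conn.executescript("""
CREATE TABLE Items (item_number INTEGER, item_description TEXT);
CREATE TABLE Receipt_Detail (item_number INTEGER, receipt_qty INTEGER);
CREATE TABLE ChargeSlip_Detail (item_number INTEGER, chargeSlip_qty INTEGER);
INSERT INTO Items VALUES (1, 'Paracetamol');
INSERT INTO Receipt_Detail VALUES (1, 10), (1, 20);
INSERT INTO ChargeSlip_Detail VALUES (1, 1), (1, 2), (1, 3);
""")

# Joining both detail tables first: each receipt row is repeated 3 times,
# each charge-slip row 2 times, and the sums are inflated accordingly.
naive = conn.execute("""
SELECT item_description, SUM(receipt_qty), SUM(chargeSlip_qty)
FROM Items
JOIN Receipt_Detail ON Receipt_Detail.item_number = Items.item_number
JOIN ChargeSlip_Detail ON ChargeSlip_Detail.item_number = Items.item_number
GROUP BY item_description
""").fetchall()

# Aggregating each detail table per item before joining: one row per side.
pre_aggregated = conn.execute("""
SELECT i.item_description, r.Samp1, c.Samp2
FROM Items i
JOIN (SELECT item_number, SUM(receipt_qty) AS Samp1
      FROM Receipt_Detail GROUP BY item_number) r ON r.item_number = i.item_number
JOIN (SELECT item_number, SUM(chargeSlip_qty) AS Samp2
      FROM ChargeSlip_Detail GROUP BY item_number) c ON c.item_number = i.item_number
""").fetchall()

print(naive)
print(pre_aggregated)
```

Here the naive receipt total is 30 × 3 = 90 and the charge-slip total is 6 × 2 = 12, the same kind of exact multiples seen in the question's output.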
A left join returns rows from the left table, and for each row in the left table, all matching rows in the right table.
So for example:
create table Customers (name varchar(50));
insert Customers values
('Tim'),
('John'),
('Spike');
create table Orders (customer_name varchar(50), product varchar(50));
insert Orders values
('Tim', 'Guitar'),
('John', 'Drums'),
('John', 'Trumpet');
create table Addresses (customer_name varchar(50), address varchar(50));
insert Addresses values
('Tim', 'Penny Lane 1'),
('John', 'Abbey Road 1'),
('John', 'Abbey Road 2');
Then if you run:
select c.name
, count(o.product) as Products
, count(a.address) as Addresses
from Customers c
left join Orders o on o.customer_name = c.name
left join Addresses a on a.customer_name = c.name
group by name
You get:
name Products Addresses
Tim 1 1
John 4 4
Spike 0 0
But John doesn't have 4 products!
If you run without the group by, you can see why the counts are off:
select *
from Customers c
left join Orders o on o.customer_name = c.name
left join Addresses a on a.customer_name = c.name
You get:
name customer_name product customer_name address
Tim Tim Guitar Tim Penny Lane 1
John John Drums John Abbey Road 1
John John Drums John Abbey Road 2
John John Trumpet John Abbey Road 1
John John Trumpet John Abbey Road 2
Spike NULL NULL NULL NULL
As you can see, the two joins multiply each other: for each of John's products, his whole list of addresses is repeated, so 2 products × 2 addresses gives 4 rows. That gives you the wrong counts. To solve this problem, use one of the excellent other answers and aggregate each table before joining:
select c.name
, o.order_count
, a.address_count
from Customers c
left join
(
select customer_name
, count(*) as order_count
from Orders
group by
customer_name
) o
on o.customer_name = c.name
left join
(
select customer_name
, count(*) as address_count
from Addresses
group by
customer_name
) a
on a.customer_name = c.name
The subqueries ensure only one row is joined per customer. The result is much better:
name order_count address_count
Tim 1 1
John 2 2
Spike NULL NULL