Getting first line of a LEFT OUTER JOIN - sql

I have 3 tables:
(SELECT DISTINCT ID
FROM IDS)a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ADDRESS FROM ADDRESSES
WHERE ROWNUM <2
ORDER BY UPDATED_DATE DESC)c
ON a.ID = c.ID
An ID can have only one name but can have multiple addresses, and I only want the latest one. This query returns the address as NULL even when there is an address. I guess this is because it fetches only the first address from the whole table and then tries to LEFT JOIN it on an ID it cannot find. What is the correct way of writing this query?
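The suspicion in the question can be reproduced with a small sketch. It uses Python's built-in sqlite3 module and hypothetical miniature versions of IDS and ADDRESSES (the rows are made up). SQLite has no ROWNUM, so LIMIT 1 stands in for the "first row only" subquery, and the subquery also selects ID, which the join condition needs. In Oracle, ROWNUM is assigned before ORDER BY is applied, so ROWNUM < 2 picks an arbitrary row rather than the latest one. Either way, the subquery is evaluated once for the whole table, not once per ID.

```python
import sqlite3

# Hypothetical miniature versions of the question's IDS and ADDRESSES tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ids (id INTEGER);
CREATE TABLE addresses (id INTEGER, address TEXT, updated_date TEXT);
INSERT INTO ids VALUES (1), (2);
INSERT INTO addresses VALUES
    (1, 'old street', '2020-01-01'),
    (1, 'new street', '2021-01-01'),
    (2, 'only street', '2019-06-01');
""")

# The derived table is not correlated with a.id: it returns exactly one row
# (the newest address in the whole table), so only one ID can ever match.
rows = conn.execute("""
    SELECT a.id, c.address
    FROM ids a
    LEFT OUTER JOIN (
        SELECT id, address
        FROM addresses
        ORDER BY updated_date DESC
        LIMIT 1
    ) c ON a.id = c.id
    ORDER BY a.id
""").fetchall()

print(rows)  # [(1, 'new street'), (2, None)]
```

ID 2 comes back without an address even though it has one, which matches the behaviour described in the question.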

Try KEEP DENSE_RANK
Data source:
CREATE TABLE person
(person_id int primary key, firstname varchar2(4), lastname varchar2(9))
/
INSERT ALL
INTO person (person_id, firstname, lastname)
VALUES (1, 'john', 'lennon')
INTO person (person_id, firstname, lastname)
VALUES (2, 'paul', 'mccartney')
SELECT * FROM dual;
CREATE TABLE address
(person_id int, address_id int primary key, city varchar2(8))
/
INSERT ALL
INTO address (person_id, address_id, city)
VALUES (1, 1, 'new york')
INTO address (person_id, address_id, city)
VALUES (1, 2, 'england')
INTO address (person_id, address_id, city)
VALUES (1, 3, 'japan')
INTO address (person_id, address_id, city)
VALUES (2, 4, 'london')
SELECT * FROM dual;
Query:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select person_id,
min(city) -- can be changed to max(city); it works either way, since only one address per person has the last address_id
-- this is the important part: KEEP (DENSE_RANK LAST ...) keeps only the most recent address
keep(dense_rank last order by address_id)
as recent_city
from address
group by person_id
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/2
Not all databases have an equivalent of Oracle's KEEP (DENSE_RANK LAST ...) aggregate, so you can use a plain window function instead:
select
p.person_id, p.firstname, p.lastname,
x.recent_city, x.pick_one_only
from person p
left join (
select
person_id,
row_number() over(partition by person_id order by address_id desc) as pick_one_only,
city as recent_city
from address
) x on x.person_id = p.person_id and x.pick_one_only = 1
Live test: http://www.sqlfiddle.com/#!4/7b1c9/48
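To try the ROW_NUMBER version without an Oracle instance, here is a sketch using Python's sqlite3 module (SQLite 3.25 or later supports window functions). The sample data is the same as above, with Oracle's INSERT ALL rewritten as plain multi-row INSERTs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same sample data as the Oracle script, as plain multi-row INSERTs.
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE address (person_id INTEGER, address_id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO person VALUES (1, 'john', 'lennon'), (2, 'paul', 'mccartney');
INSERT INTO address VALUES
    (1, 1, 'new york'), (1, 2, 'england'), (1, 3, 'japan'), (2, 4, 'london');
""")

# Number each person's addresses from newest (highest address_id) down and
# keep only row 1; the filter sits in the ON clause so the join stays outer.
rows = conn.execute("""
    SELECT p.person_id, p.firstname, p.lastname, x.recent_city
    FROM person p
    LEFT JOIN (
        SELECT person_id,
               ROW_NUMBER() OVER (PARTITION BY person_id
                                  ORDER BY address_id DESC) AS pick_one_only,
               city AS recent_city
        FROM address
    ) x ON x.person_id = p.person_id AND x.pick_one_only = 1
    ORDER BY p.person_id
""").fetchall()

for row in rows:
    print(row)
# (1, 'john', 'lennon', 'japan')
# (2, 'paul', 'mccartney', 'london')
```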
Or use tuple testing, which works on databases that don't support window functions:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select
person_id,city as recent_city
from address
where (person_id,address_id) in
(select person_id, max(address_id)
from address
group by person_id)
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/21
Not all databases support tuple testing as in the preceding code, though. You can use a JOIN instead:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select
address.person_id,address.city as recent_city
from address
join
(
select person_id, max(address_id) as recent_id
from address
group by person_id
) r
ON address.person_id = r.person_id
AND address.address_id = r.recent_id
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/24
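Both non-window variants can be checked side by side in SQLite, which supports row-value IN since 3.15. This sketch adds a hypothetical third person with no address rows, to show that the outer LEFT JOIN still keeps that person with a NULL city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE address (person_id INTEGER, address_id INTEGER PRIMARY KEY, city TEXT);
-- person 3 is hypothetical and has no address rows
INSERT INTO person VALUES (1, 'john', 'lennon'), (2, 'paul', 'mccartney'),
                          (3, 'ringo', 'starr');
INSERT INTO address VALUES
    (1, 1, 'new york'), (1, 2, 'england'), (1, 3, 'japan'), (2, 4, 'london');
""")

# Tuple testing: keep the (person_id, address_id) pairs that match each
# person's highest address_id.
TUPLE_IN = """
    SELECT p.person_id, p.firstname, p.lastname, x.recent_city
    FROM person p
    LEFT JOIN (
        SELECT person_id, city AS recent_city
        FROM address
        WHERE (person_id, address_id) IN (SELECT person_id, MAX(address_id)
                                          FROM address
                                          GROUP BY person_id)
    ) x ON x.person_id = p.person_id
    ORDER BY p.person_id
"""

# Same idea expressed as a join against the per-person maximum.
JOIN_MAX = """
    SELECT p.person_id, p.firstname, p.lastname, x.recent_city
    FROM person p
    LEFT JOIN (
        SELECT address.person_id, address.city AS recent_city
        FROM address
        JOIN (SELECT person_id, MAX(address_id) AS recent_id
              FROM address
              GROUP BY person_id) r
          ON address.person_id = r.person_id
         AND address.address_id = r.recent_id
    ) x ON x.person_id = p.person_id
    ORDER BY p.person_id
"""

tuple_rows = conn.execute(TUPLE_IN).fetchall()
join_rows = conn.execute(JOIN_MAX).fetchall()
print(tuple_rows == join_rows)  # True
print(join_rows[-1])            # (3, 'ringo', 'starr', None)
```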

You can use the analytic function RANK
(SELECT DISTINCT ID
FROM IDS) a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES) b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ID, ADDRESS,
rank() over (partition by id
order by updated_date desc) rnk
FROM ADDRESSES) c
ON ( a.ID = c.ID
and c.rnk = 1)
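One caveat with RANK: if an ID has two addresses sharing the same latest UPDATED_DATE, both get rank 1 and the ID appears twice in the result. ROW_NUMBER always numbers exactly one row as 1 per partition, choosing arbitrarily among ties unless the ORDER BY includes a tie-breaker. A minimal sketch with Python's sqlite3 module and hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: ID 1 has two addresses updated on the same day,
# ID 3 has no address at all.
conn.executescript("""
CREATE TABLE ids (id INTEGER);
CREATE TABLE addresses (id INTEGER, address TEXT, updated_date TEXT);
INSERT INTO ids VALUES (1), (2), (3);
INSERT INTO addresses VALUES
    (1, 'a street', '2021-05-01'),
    (1, 'b street', '2021-05-01'),
    (1, 'old street', '2020-01-01'),
    (2, 'only street', '2019-06-01');
""")

# {fn} is filled in with the ranking function to compare.
QUERY = """
    SELECT a.id, c.address
    FROM ids a
    LEFT OUTER JOIN (
        SELECT id, address,
               {fn}() OVER (PARTITION BY id ORDER BY updated_date DESC) AS rnk
        FROM addresses
    ) c ON a.id = c.id AND c.rnk = 1
    ORDER BY a.id, c.address
"""

rank_rows = conn.execute(QUERY.format(fn="RANK")).fetchall()
row_number_rows = conn.execute(QUERY.format(fn="ROW_NUMBER")).fetchall()

print(rank_rows)             # ID 1 appears twice because of the tie
print(len(row_number_rows))  # 3: exactly one row per ID
```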

Without having access to any database at the moment, you should be able to do
(SELECT DISTINCT ID
FROM IDS) a LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b ON a.ID = b.ID LEFT OUTER JOIN
(SELECT TOP 1 ADDRESS
FROM ADDRESSES
ORDER BY UPDATED_DATE DESC) c ON a.ID = c.ID
As you can see, the "TOP 1" in the address subquery will only return the first row of its result set.
Also, are you sure that a.ID and c.ID are the same?
I would imagine you need something like .... c ON a.ID = c.AddressID
If not, I'm not entirely sure how you link multiple addresses to a single ID.

(SELECT DISTINCT ID
FROM IDS)a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ID, ADDRESS, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY UPDATED_DATE DESC) RN
FROM ADDRESSES
)c
ON a.ID = c.ID
AND c.RN = 1
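Where the RN = 1 filter goes matters for an outer join. In the WHERE clause it is applied after the join, so IDs without any address (whose RN is NULL) are filtered out and the LEFT OUTER JOIN behaves like an inner join. In the ON clause, those IDs are kept with a NULL address. A small sketch with Python's sqlite3 module and hypothetical data shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: ID 2 has no address.
conn.executescript("""
CREATE TABLE ids (id INTEGER);
CREATE TABLE addresses (id INTEGER, address TEXT, updated_date TEXT);
INSERT INTO ids VALUES (1), (2);
INSERT INTO addresses VALUES
    (1, 'old street', '2020-01-01'),
    (1, 'new street', '2021-01-01');
""")

BASE = """
    SELECT a.id, c.address
    FROM ids a
    LEFT OUTER JOIN (
        SELECT id, address,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_date DESC) AS rn
        FROM addresses
    ) c ON a.id = c.id
"""

# Filter after the join: c.rn is NULL for ID 2, so ID 2 disappears.
in_where = conn.execute(BASE + " WHERE c.rn = 1 ORDER BY a.id").fetchall()
# Filter as part of the join condition: ID 2 is kept with a NULL address.
in_on = conn.execute(BASE + " AND c.rn = 1 ORDER BY a.id").fetchall()

print(in_where)  # [(1, 'new street')]
print(in_on)     # [(1, 'new street'), (2, None)]
```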

Related

Oracle SQL: show max when record in table exists

If a record exists in the table, I need to bring the data from the highest-numbered address record linked to the person.
Example:
If John Doe has no address at all, the report still needs to bring John Doe's name, with nothing as the address.
If John Doe has 3 addresses with increasing address numbers, the report needs to bring John Doe's name and only the address with the highest address number.
Code I tried:
Select *
from person p left join address a on p.id = a.person and a.addressnumber = (select max(a2.addressnumber) from address a2 where a2.person = p.id)
Oracle returns the error:
ORA-01799: a column may not be outer-joined to a subquery
01799. 00000 - "a column may not be outer-joined to a subquery"
I also tried
Select *
from person p left join address a1 on p.id = a1.person
inner join (select a.person, max(a.addressnumber) MaxAdd, a.postcode, a.country from address a group by a.person, a.postcode, a.country) main on main.person = p.id and main.MaxAdd = a1.addressnumber
This doesn't work either, because of the grouping.
I could probably get this done with subqueries in the SELECT list together with a CASE expression, but I would like to avoid that: I will be pulling a lot of columns from the address table, so this would mean a CASE expression with a subquery for every single column.
Oracle 11g - 11.2
Any idea? :D
You can use ROW_NUMBER to rank your rows and only keep the last one:
select *
from person p
left join
(
select
ad.*,
row_number() over (partition by person order by addressnumber desc) as rn
from address ad
) a on a.person = p.id and a.rn = 1;
Please try this:
select * from
person p left join
(select a.person, max(a.addressnumber) MaxAdd, a.postcode, a.country from address a
group by a.person, a.postcode, a.country) A
on p.id=A.person
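Note that grouping by postcode and country as well as person returns one row per distinct (person, postcode, country) combination, so a person with several different addresses still gets several rows, each with its own MaxAdd. A minimal sketch with Python's sqlite3 module compares this with the ROW_NUMBER approach; the data and the name column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: person 1 has three addresses, person 2 has none.
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (person INTEGER, addressnumber INTEGER,
                      postcode TEXT, country TEXT);
INSERT INTO person VALUES (1, 'John Doe'), (2, 'Jane Roe');
INSERT INTO address VALUES
    (1, 1, 'AB1', 'UK'), (1, 2, 'CD2', 'UK'), (1, 3, 'EF3', 'FR');
""")

# GROUP BY person, postcode, country: one group per distinct address.
grouped = conn.execute("""
    SELECT p.id, g.MaxAdd
    FROM person p
    LEFT JOIN (SELECT a.person, MAX(a.addressnumber) AS MaxAdd, a.postcode, a.country
               FROM address a
               GROUP BY a.person, a.postcode, a.country) g
      ON p.id = g.person
    ORDER BY p.id, g.MaxAdd
""").fetchall()

# ROW_NUMBER: exactly one (highest-numbered) address per person, or NULL.
ranked = conn.execute("""
    SELECT p.id, a.addressnumber
    FROM person p
    LEFT JOIN (SELECT ad.*,
                      ROW_NUMBER() OVER (PARTITION BY person
                                         ORDER BY addressnumber DESC) AS rn
               FROM address ad) a
      ON a.person = p.id AND a.rn = 1
    ORDER BY p.id
""").fetchall()

print(grouped)  # [(1, 1), (1, 2), (1, 3), (2, None)]
print(ranked)   # [(1, 3), (2, None)]
```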
Use row_number():
Select *
from person p left join
(select a.*,
row_number() over (partition by a.person order by a.addressnumber desc) as seqnum
from address a
) a
on p.id = a.person and seqnum = 1;

Figure out the total number of people in an overlapping ER database

I am trying to find:
the total number of doctors which aren't patients
the total number of patients which aren't doctors
the total number of people who are both patients and doctors
I can't seem to get the correct answer.
SQL:
CREATE TABLE persons (
id integer primary key,
name text
);
CREATE TABLE doctors (
id integer primary key,
type text,
FOREIGN KEY (id) REFERENCES persons(id)
);
CREATE TABLE patients (
id integer primary key,
suffering_from text,
FOREIGN KEY (id) REFERENCES persons(id)
);
INSERT INTO persons (id, name) VALUES
(1, 'bob'), (2, 'james'), (3, 'bill'), (4, 'mark'), (5, 'chloe');
INSERT INTO doctors (id, type) VALUES
(2, 'family doctor'), (3, 'eye doctor'), (5, 'family doctor');
INSERT INTO patients (id, suffering_from) VALUES
(1, 'flu'), (2, 'diabetes');
Select statement:
select count(d.id) as total_doctors, count(pa.id) as total_patients, count(d.id) + count(pa.id) as both_doctor_and_patient
from persons p
JOIN doctors d
ON p.id = d.id
JOIN patients pa
ON p.id = pa.id;
http://www.sqlfiddle.com/#!17/98ae9/2
One option uses left joins from persons and conditional aggregation:
select
count(dr.id) filter(where pa.id is null) cnt_doctor_not_patient,
count(pa.id) filter(where dr.id is null) cnt_patient_not_doctor,
count(pa.id) filter(where dr.id is not null) cnt_patient_and_doctor,
count(*) filter(where dr.id is null and pa.id is null) cnt_persons_not_doctor_nor_patient
from persons pe
left join doctors dr on dr.id = pe.id
left join patients pa on pa.id = pe.id
As a bonus, this gives you an opportunity to count the persons who are neither patient nor doctor. If you don't need that information, a full join is simpler and does not require the persons table at all:
select
count(dr.id) filter(where pa.id is null) cnt_doctor_not_patient,
count(pa.id) filter(where dr.id is null) cnt_patient_not_doctor,
count(pa.id) filter(where dr.id is not null) cnt_patient_and_doctor
from doctors dr
full join patients pa using (id)
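The first (left join) query can be checked against the question's data with a short sketch using Python's sqlite3 module. It assumes SQLite 3.30 or later, which supports the aggregate FILTER clause; the FULL JOIN variant is left out because SQLite only gained FULL OUTER JOIN in 3.39. With person 2 being both, persons 3 and 5 doctors only, person 1 a patient only and person 4 neither, the expected counts are 2, 1, 1 and 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema and data from the question.
conn.executescript("""
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE doctors (id INTEGER PRIMARY KEY, type TEXT,
                      FOREIGN KEY (id) REFERENCES persons(id));
CREATE TABLE patients (id INTEGER PRIMARY KEY, suffering_from TEXT,
                       FOREIGN KEY (id) REFERENCES persons(id));
INSERT INTO persons (id, name) VALUES
    (1, 'bob'), (2, 'james'), (3, 'bill'), (4, 'mark'), (5, 'chloe');
INSERT INTO doctors (id, type) VALUES
    (2, 'family doctor'), (3, 'eye doctor'), (5, 'family doctor');
INSERT INTO patients (id, suffering_from) VALUES
    (1, 'flu'), (2, 'diabetes');
""")

# One row per person; each FILTER clause decides which rows a count sees.
counts = conn.execute("""
    SELECT
        COUNT(dr.id) FILTER (WHERE pa.id IS NULL)                   AS cnt_doctor_not_patient,
        COUNT(pa.id) FILTER (WHERE dr.id IS NULL)                   AS cnt_patient_not_doctor,
        COUNT(pa.id) FILTER (WHERE dr.id IS NOT NULL)               AS cnt_patient_and_doctor,
        COUNT(*)     FILTER (WHERE dr.id IS NULL AND pa.id IS NULL) AS cnt_neither
    FROM persons pe
    LEFT JOIN doctors dr ON dr.id = pe.id
    LEFT JOIN patients pa ON pa.id = pe.id
""").fetchone()

print(counts)  # (2, 1, 1, 1)
```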
You can simply solve this using LEFT JOIN anti-joins, starting from the table whose rows you want to count:
--Doctors who aren't patients:
SELECT count(*) from doctors as A left join patients as B on A.id=B.id where B.id is null
--Patients who aren't doctors:
SELECT count(*) from patients as A left join doctors as B on A.id=B.id where B.id is null
--Both:
SELECT count(*) from patients as A left join doctors as B on A.id=B.id where B.id is not null
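A LEFT JOIN ... WHERE B.id IS NULL anti-join counts the rows of the left table that have no match, so the starting table decides what is counted. Starting from persons counts everyone who is not a doctor, including people who are neither doctor nor patient; starting from patients counts only patients who are not doctors. A short sketch with Python's sqlite3 module on the question's data (foreign keys omitted for brevity; the helper `scalar` is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE doctors (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE patients (id INTEGER PRIMARY KEY, suffering_from TEXT);
INSERT INTO persons VALUES (1, 'bob'), (2, 'james'), (3, 'bill'), (4, 'mark'), (5, 'chloe');
INSERT INTO doctors VALUES (2, 'family doctor'), (3, 'eye doctor'), (5, 'family doctor');
INSERT INTO patients VALUES (1, 'flu'), (2, 'diabetes');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Persons with no doctors row: bob (1) and mark (4, neither doctor nor patient).
persons_not_doctors = scalar("""
    SELECT count(*) FROM persons AS A LEFT JOIN doctors AS B ON A.id = B.id
    WHERE B.id IS NULL""")
# Patients with no doctors row: only bob (1).
patients_not_doctors = scalar("""
    SELECT count(*) FROM patients AS A LEFT JOIN doctors AS B ON A.id = B.id
    WHERE B.id IS NULL""")
# Patients that do have a doctors row: only james (2).
patients_and_doctors = scalar("""
    SELECT count(*) FROM patients AS A LEFT JOIN doctors AS B ON A.id = B.id
    WHERE B.id IS NOT NULL""")

print(persons_not_doctors, patients_not_doctors, patients_and_doctors)  # 2 1 1
```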
Here is a CTE alternative:
with doc_not_pat
as(
select count(*) as Doc_Not_Pat
from doctors d
where not exists (select 1 from patients p where p.id = d.id)
),
pat_not_doc as(
select count(*) as Pat_Not_Doc
from patients p
where not exists ( select 1 from doctors d where d.id = p.id)
),
pat_and_doc as(
select count(*) as Pat_And_Doc
from patients p
where exists (select 1 from doctors d where d.id = p.id)
)
select (select Doc_Not_Pat
from doc_not_pat dcp) as Doc_Not_Pat,
(select Pat_Not_Doc
from pat_not_doc) as Pat_Not_Doc,
(select Pat_And_Doc
from pat_and_doc) as Pat_And_Doc

How to find columns that only have one value - Postgresql

I have 2 tables, person(email, first_name, last_name, postcode, place_name) and location(postcode, place_name). I am trying to find people that live in places where only one person lives. I tried using SELECT COUNT() but failed because I couldn't figure out what to count in this situation.
SELECT DISTINCT email,
first_name,
last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
WHERE 1 <=
(SELECT COUNT(?))
A filter on an aggregate goes in HAVING, not WHERE. Group by the full location (postcode and place_name) and keep the groups that contain exactly one person:
SELECT min(email) AS email,
min(first_name) AS first_name,
min(last_name) AS last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
GROUP BY postcode, place_name
HAVING count(*) = 1
Because each remaining group contains exactly one person, min() simply returns that person's values.
For more about the window functions (like first_value) check out this tutorial.
I would do this as follows. I find it plain and simple.
select p1.* from
person p1
join
(
select p.postcode, p.place_name, count(*) cnt from
person p
group by p.postcode, p.place_name
) t on p1.postcode = t.postcode and p1.place_name = t.place_name and t.cnt = 1
How does it work?
In the inner query (aliased t) we just count how many people live in each location.
Then we join the result of it (t) with the table person (aliased p1) and in the join we require t.cnt = 1. This is probably the most natural way of doing it, I think.
Thanks to the help of people here, I found this answer:
SELECT first_name,
last_name,
email
FROM person
WHERE (postcode, place_name) IN
(SELECT postcode,
place_name
FROM person
GROUP BY postcode,
place_name
HAVING COUNT(*)=1)
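Testing postcode and place_name in two separate IN lists checks each column on its own, so a person can match a unique postcode from one location and a unique place_name from another. Testing the (postcode, place_name) pair as a row value avoids that. A sketch with Python's sqlite3 module and hypothetical people:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical people: alice and bob live alone; carol and dave live together
# and share postcode 1000 with alice and place_name Beta with bob.
conn.executescript("""
CREATE TABLE person (email TEXT, first_name TEXT, last_name TEXT,
                     postcode TEXT, place_name TEXT);
INSERT INTO person VALUES
    ('a@example.com', 'alice', 'a', '1000', 'Alpha'),
    ('b@example.com', 'bob',   'b', '2000', 'Beta'),
    ('c@example.com', 'carol', 'c', '1000', 'Beta'),
    ('d@example.com', 'dave',  'd', '1000', 'Beta');
""")

# Two independent IN checks: each column is tested separately.
separate = conn.execute("""
    SELECT first_name FROM person
    WHERE postcode IN (SELECT postcode FROM person
                       GROUP BY postcode, place_name HAVING COUNT(*) = 1)
      AND place_name IN (SELECT place_name FROM person
                         GROUP BY postcode, place_name HAVING COUNT(*) = 1)
    ORDER BY first_name
""").fetchall()

# Row-value IN: the (postcode, place_name) pair is tested as a unit.
paired = conn.execute("""
    SELECT first_name FROM person
    WHERE (postcode, place_name) IN (SELECT postcode, place_name FROM person
                                     GROUP BY postcode, place_name
                                     HAVING COUNT(*) = 1)
    ORDER BY first_name
""").fetchall()

print(separate)  # [('alice',), ('bob',), ('carol',), ('dave',)]
print(paired)    # [('alice',), ('bob',)]
```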

Using SQL Group By while keeping same varchar values

I have a query that is returning two values. I want the largest value, so I do a GROUP BY and then MAX. However, I have three other columns (varchar) that I would like to stay consistent with the row whose id is picked by MAX.
Example.
OId CId FName LName BName
18477 110 Hubba Bubba whoa
158 110 Test2 Person2 leee
What I want is
OId CId FName LName BName
18477 110 Hubba Bubba whoa
So I want to group them by CId and keep the row with the largest OId. I can't use MIN or MAX for FName, LName, or BName, because I want them to come from the row with the selected OId. I don't even want or need the FName, LName and BName from the other row.
I tried using SELECT TOP, but that only pulls in literally one row and I need multiple.
SQL
INSERT INTO #CustomerInfoAll(FName, LName, BName, OwnerId, CustomerId)
SELECT
-- what goes here --(o.FirstName) AS FName,
-- what goes here --(o.LastName) AS LName,
-- what goes here --(o.BusinessName) AS BName,
MAX(o.OId) AS OId,
(r.CId) AS CId
FROM Owner o
INNER JOIN Report r
ON o.ReportId = r.ReportId
WHERE r.CId IN (SELECT CId FROM #ThisReportAll)
AND r.Completed IS NOT NULL
GROUP BY r.CId
ORDER BY OId DESC;
Assuming you have SQL Server 2005 or higher:
INSERT INTO #CustomerInfoAll (FName, LName, BName, OwnerId, CustomerId)
SELECT
FirstName,
LastName,
BusinessName,
OId,
CId
FROM
(
SELECT
Seq = ROW_NUMBER() OVER (PARTITION BY r.CId ORDER BY o.OId DESC),
o.OId,
r.CId,
o.FirstName,
o.LastName,
o.BusinessName
FROM
dbo.Owner o
INNER JOIN dbo.Report r
ON o.ReportId = r.ReportId
WHERE
EXISTS ( -- can be INNER JOIN instead if `CId` is unique in temp table
SELECT *
FROM #ThisReportAll tra
WHERE r.CId = tra.CId
)
AND r.Completed IS NOT NULL
GROUP BY
o.OId,
r.CId,
o.FirstName,
o.LastName,
o.BusinessName
) x
WHERE
x.Seq = 1;
DO use schema-qualified names on all your objects (dbo.Owner and dbo.Report).
DO use a semi-join (an EXISTS clause) or INNER JOIN instead of IN when possible.
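The ROW_NUMBER pattern above can be tried on the example rows from the question with Python's sqlite3 module. This is a hypothetical reconstruction: the ReportId and Completed values are made up, SQLite has no #temp-table prefix (so ThisReportAll is a plain table), T-SQL's `Seq = ...` alias syntax becomes `AS Seq`, and the inner GROUP BY, which only removes exact duplicate rows, is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical reconstruction of the example: two owners on reports for CId 110.
conn.executescript("""
CREATE TABLE Owner (OId INTEGER, ReportId INTEGER,
                    FirstName TEXT, LastName TEXT, BusinessName TEXT);
CREATE TABLE Report (ReportId INTEGER, CId INTEGER, Completed TEXT);
CREATE TABLE ThisReportAll (CId INTEGER);
INSERT INTO Report VALUES (1, 110, '2020-01-01'), (2, 110, '2020-02-01');
INSERT INTO Owner VALUES
    (18477, 1, 'Hubba', 'Bubba', 'whoa'),
    (158,   2, 'Test2', 'Person2', 'leee');
INSERT INTO ThisReportAll VALUES (110);
""")

# Number the owners per CId from the highest OId down and keep row 1, so the
# name columns always come from the same row as the selected OId.
rows = conn.execute("""
    SELECT FirstName, LastName, BusinessName, OId, CId
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY r.CId ORDER BY o.OId DESC) AS Seq,
               o.OId, r.CId, o.FirstName, o.LastName, o.BusinessName
        FROM Owner o
        INNER JOIN Report r ON o.ReportId = r.ReportId
        WHERE EXISTS (SELECT * FROM ThisReportAll tra WHERE r.CId = tra.CId)
          AND r.Completed IS NOT NULL
    ) x
    WHERE x.Seq = 1
""").fetchall()

print(rows)  # [('Hubba', 'Bubba', 'whoa', 18477, 110)]
```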

Joining two tables with aggregates

I've got two tables described below:
CREATE TABLE categories
(
id integer NOT NULL,
category integer NOT NULL,
name text,
CONSTRAINT kjhfskfew PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE products_
(
id integer NOT NULL,
date date,
id_employee integer,
CONSTRAINT grh PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
Now I have to produce a report that needs the following information:
categories.category and categories.name (all of the names, so string_agg is fine, since many names can be assigned to one category), plus products_.id_employee: not comma-separated like the names, but only the one with the newest date (and this is my problem).
I've already tried constructions like this:
SELECT
DISTINCT ON (category ) category,
string_agg(name, ','),
(SELECT
id_employee
FROM products_
WHERE date = (SELECT
max(date)
FROM products_
WHERE id IN (SELECT
id
FROM categories
WHERE id = c.id)))
FROM categories c
ORDER BY category;
But PostgreSQL says that the subquery returns too many rows...
Please help!
EXAMPLE INSERTS:
INSERT INTO categories(
id, category, name)
VALUES (1,22,'car'),(2,22,'bike'),(3,22,'boat'),(4,33,'soap'),(5,44,'chicken');
INSERT INTO products_(
id, date, id_employee)
VALUES (1,'2009-11-09',11),(2,'2010-09-09',2),(3,'2013-01-01',4),(5,'2014-09-01',90);
OK, I've solved this problem.
This one works just fine:
WITH max_date AS (
SELECT
category,
max(date) AS date,
string_agg(name, ',') AS names
FROM test.products_
JOIN test.categories c
USING (id)
GROUP BY c.category
)
SELECT
max(id_employee) AS id_employee,
md.category,
names
FROM test.products_ p
LEFT JOIN max_date md
USING (date)
LEFT JOIN test.categories
USING (category)
WHERE p.date = md.date AND p.id IN (SELECT
id
FROM test.categories
WHERE category = md.category)
GROUP BY category, names;
It seems that id is used to join the two tables, which strikes me as strange.
In any case, the base query for the category names is:
SELECT c.category, string_agg(c.name, ',')
FROM categories c
group by c.category;
The question is: how to get the most recent employee? This approach uses the row_number() function:
SELECT c.category, string_agg(c.name, ','), cp.id_employee
FROM categories c left outer join
(select c.category, c.name, p.id_employee,
row_number() over (partition by c.category order by p.date desc nulls last) as seqnum
from categories c left outer join
products_ p
on c.id = p.id
) cp
on cp.category = c.category and
cp.seqnum = 1
group by c.category, cp.id_employee;
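A sketch of the row_number() query against the question's example rows, using Python's sqlite3 module. Two swaps are assumptions for SQLite: group_concat stands in for PostgreSQL's string_agg (SQLite only added string_agg in 3.44), and NULLS LAST needs SQLite 3.30 or later. NULLS LAST matters in PostgreSQL in particular, where DESC sorts NULLs first by default, so a category in which some ids have no product row could otherwise pick a NULL id_employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tables and example rows from the question; dates stored as ISO text.
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, category INTEGER NOT NULL, name TEXT);
CREATE TABLE products_ (id INTEGER PRIMARY KEY, date TEXT, id_employee INTEGER);
INSERT INTO categories (id, category, name)
VALUES (1,22,'car'),(2,22,'bike'),(3,22,'boat'),(4,33,'soap'),(5,44,'chicken');
INSERT INTO products_ (id, date, id_employee)
VALUES (1,'2009-11-09',11),(2,'2010-09-09',2),(3,'2013-01-01',4),(5,'2014-09-01',90);
""")

# cp holds one row per category: the employee on the newest product date.
rows = conn.execute("""
    SELECT c.category, group_concat(c.name, ','), cp.id_employee
    FROM categories c
    LEFT OUTER JOIN (
        SELECT c.category, p.id_employee,
               ROW_NUMBER() OVER (PARTITION BY c.category
                                  ORDER BY p.date DESC NULLS LAST) AS seqnum
        FROM categories c
        LEFT OUTER JOIN products_ p ON c.id = p.id
    ) cp ON cp.category = c.category AND cp.seqnum = 1
    GROUP BY c.category, cp.id_employee
    ORDER BY c.category
""").fetchall()

# group_concat order is not guaranteed, so sort the names for display.
for category, names, employee in rows:
    print(category, sorted(names.split(",")), employee)
# 22 ['bike', 'boat', 'car'] 4
# 33 ['soap'] None
# 44 ['chicken'] 90
```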