Oracle SQL: Joining at most one associated entity - sql

I have tables Building and Address, where each Building is associated with 0..n Addresses.
I'd like to list Buildings with an associated Address. If a Building has several entrances, and thus several Addresses, I don't care which one is displayed. If a Building has no known addresses, the address fields should be null.
That is, I want something like a left join that joins each row at most once.
How can I express this in Oracle SQL?
PS: My query will include rather involved restrictions on both tables. Therefore, I'd like to avoid repeating those restrictions in the query text.

I would consider querying the address in the SELECT clause, e.g.:
SELECT b.*
,(SELECT a.text
FROM addresses a
WHERE a.buildingid = b.id
AND ROWNUM=1) as atext
FROM building b;
The ROWNUM=1 means "just get one if there are any, don't care which".
The advantage of this approach is that it will probably perform better than most alternatives, as long as a suitable index on addresses.buildingid exists. It will stop looking for more addresses as soon as it finds one for each building queried.
The downside of this approach is that a scalar subquery can return only one column, so you cannot fetch multiple columns from the address table this way, although you can concatenate them together into one string.
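To make the concatenation workaround concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It is not Oracle: LIMIT 1 stands in for ROWNUM = 1, and the name, street and city columns and the sample rows are assumptions made up for the illustration.

```python
import sqlite3

# Hypothetical sample data; only building.id and addresses.buildingid
# come from the query above, the other columns are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, buildingid INTEGER,
                            street TEXT, city TEXT);
    INSERT INTO building  VALUES (1, 'Library'), (2, 'Shed');
    INSERT INTO addresses VALUES (10, 1, 'Main St 1', 'Bern'),
                                 (11, 1, 'Side St 2', 'Bern');
""")

# LIMIT 1 plays the role of Oracle's ROWNUM = 1 here. The || operator
# concatenates two address columns into one value, because a scalar
# subquery may return only a single column.
rows = conn.execute("""
    SELECT b.id,
           b.name,
           (SELECT a.street || ', ' || a.city
              FROM addresses a
             WHERE a.buildingid = b.id
             LIMIT 1) AS atext
      FROM building b
     ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
```

Each building comes back exactly once, and the building without any address gets a null in atext.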

Because you don't care which of many addresses is displayed:
Oracle 9i+:
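WITH summary AS (
SELECT b.*,
a.*, -- b.* and a.* both contain building_id; list the address columns explicitly to avoid ORA-00918
ROW_NUMBER() OVER (PARTITION BY b.building_id ORDER BY NULL) rn -- Oracle requires an ORDER BY in the window
FROM BUILDINGS b
LEFT JOIN ADDRESSES a ON a.building_id = b.building_id)
SELECT s.*
FROM summary s
WHERE s.rn = 1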
Non-Subquery Factoring Equivalent:
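SELECT s.*
FROM (SELECT b.*,
a.*, -- b.* and a.* both contain building_id; list the address columns explicitly to avoid ORA-00918
ROW_NUMBER() OVER (PARTITION BY b.building_id ORDER BY NULL) rn -- Oracle requires an ORDER BY in the window
FROM BUILDINGS b
LEFT JOIN ADDRESSES a ON a.building_id = b.building_id) s
WHERE s.rn = 1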
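For anyone who wants to try the ROW_NUMBER idea without an Oracle instance, here is a rough sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later). The address_id, name and street columns and the sample rows are assumptions; the columns are listed explicitly so that building_id does not appear twice, and the window is ordered by address_id only to make the pick repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buildings (building_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY,
                            building_id INTEGER, street TEXT);
    INSERT INTO buildings VALUES (1, 'Library'), (2, 'Shed');
    INSERT INTO addresses VALUES (10, 1, 'Main St 1'), (11, 1, 'Side St 2');
""")

# Number the joined rows per building and keep only the first one.
# A building without addresses yields one row with NULL address columns,
# which also gets rn = 1 and therefore survives the filter.
rows = conn.execute("""
    SELECT s.building_id, s.name, s.address_id, s.street
      FROM (SELECT b.building_id, b.name, a.address_id, a.street,
                   ROW_NUMBER() OVER (PARTITION BY b.building_id
                                      ORDER BY a.address_id) AS rn
              FROM buildings b
              LEFT JOIN addresses a ON a.building_id = b.building_id) s
     WHERE s.rn = 1
     ORDER BY s.building_id
""").fetchall()

for row in rows:
    print(row)
```

Unlike the scalar subquery, this returns as many address columns as you like.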

What you could do is add a restriction on the addresses that you join.
For instance by requiring that there is no address with a lower id:
select *
from building b
left join addresses a on (a.buildingid = b.id)
where not exists (select 1 from addresses a2
where a2.buildingid = b.id and a2.id < a.id)
In this case you will get at most 1 address per building.
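A quick way to convince yourself that the NOT EXISTS filter keeps buildings without addresses is to run it on toy data. The sketch below uses Python's sqlite3 module as a stand-in for Oracle; the name and street columns and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, buildingid INTEGER,
                            street TEXT);
    INSERT INTO building  VALUES (1, 'Library'), (2, 'Shed');
    INSERT INTO addresses VALUES (10, 1, 'Main St 1'), (11, 1, 'Side St 2');
""")

# Keep a joined row only if no address of the same building has a lower id.
# For a building without addresses a.id is NULL, the comparison
# a2.id < a.id is never true, and the outer-joined row is kept.
rows = conn.execute("""
    SELECT b.id, b.name, a.id, a.street
      FROM building b
      LEFT JOIN addresses a ON a.buildingid = b.id
     WHERE NOT EXISTS (SELECT 1
                         FROM addresses a2
                        WHERE a2.buildingid = b.id
                          AND a2.id < a.id)
     ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
```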

select b.id, max(a.id) as aid
from building b
left outer join addresses a on (a.buildingid = b.id)
group by b.id
or
select b.*, maxid
from building b
left outer join
(
select buildingid, max(id) as maxid
from addresses
group by buildingid
) a on (a.buildingid = b.id)

Meriton,
This approach uses nested inline views. I've proven this approach on large data sets, and it performs very well.
The best way to understand the query is to start from the inner-most "M" inline view. I added the count for the sake of debugging and clarity. This identifies the maximum (i.e. presumably the most recent) address id for each building:
select maxa.b_id, max(maxa.a_id) a_id, count(*) c
from address maxa
group by maxa.b_id;
The next "A" inline view uses the above "M" inline view to decide which address to get, then joins to that address id to return a set of address fields:
select ma.b_id, ma.a_id, ma.addr1, ma.addr2, ma.addr3, m.c
from address ma,
( select maxa.b_id, max(maxa.a_id) a_id, count(*) c
from address maxa
group by maxa.b_id ) m
where ma.a_id = m.a_id;
The above "A" inline view delivers a transformed set of addresses to the final query. Whereas the relationship between BUILDING and ADDRESS is 1 to 0..n, the relationship between BUILDING and "A" is 1 to 0..1, a basic outer-join:
select b.b_id, b.b_code, b.b_name, a.*
from building b,
( select ma.b_id, ma.a_id, ma.addr1, ma.addr2, ma.addr3, m.c
from address ma,
( select maxa.b_id, max(maxa.a_id) a_id, count(*) c
from address maxa
group by maxa.b_id ) m
where ma.a_id = m.a_id ) a
where b.b_id = a.b_id (+);
The key advantages with this approach are:
Delivers any number of address columns.
Deterministic, returns exactly the same results each time it is run.
Does not place undue complexities on your final query, which will surely be more complex than this one.
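As a sanity check of the points above, here is a small sketch of the same nested inline views run through Python's sqlite3 module. SQLite has no (+) operator, so the outer join is written as LEFT JOIN; the sample rows are assumptions and only addr1 is carried along to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (b_id INTEGER PRIMARY KEY, b_code TEXT, b_name TEXT);
    CREATE TABLE address  (a_id INTEGER PRIMARY KEY, b_id INTEGER, addr1 TEXT);
    INSERT INTO building VALUES (1, 'LIB', 'Library'), (2, 'SHD', 'Shed');
    INSERT INTO address  VALUES (10, 1, 'Main St 1'), (11, 1, 'Side St 2');
""")

# Inner view "m": highest address id (and address count) per building.
# View "a": the full address row for that id.
# Outer query: 1 to 0..1 outer join from building to "a".
rows = conn.execute("""
    SELECT b.b_id, b.b_name, a.a_id, a.addr1, a.c
      FROM building b
      LEFT JOIN (SELECT ma.b_id, ma.a_id, ma.addr1, m.c
                   FROM address ma
                   JOIN (SELECT maxa.b_id,
                                MAX(maxa.a_id) AS a_id,
                                COUNT(*)       AS c
                           FROM address maxa
                          GROUP BY maxa.b_id) m
                     ON ma.a_id = m.a_id) a
        ON a.b_id = b.b_id
     ORDER BY b.b_id
""").fetchall()

for row in rows:
    print(row)
```

Because the choice is pinned to MAX(a_id), repeated runs pick the same address for building 1.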
The "A" inline view can be easily encapsulated within a database view, perhaps call it the LATEST_ADDRESS view:
create view latest_address (b_id, a_id, addr1, addr2, addr3, c) as
select ma.b_id, ma.a_id, ma.addr1, ma.addr2, ma.addr3, m.c
from address ma,
( select maxa.b_id, max(maxa.a_id) a_id, count(*) c
from address maxa
group by maxa.b_id ) m
where ma.a_id = m.a_id;
select b.b_id, b.b_code, b.b_name, a.*
from building b, latest_address a
where b.b_id = a.b_id (+);
Enjoy!
Matthew

Related

SQL Left Join - OR clause

I am trying to join two tables. I want to join where all three identifiers (contract id, company code and book id) match in both tables; if there is no such match, join on contract id and company code; and as a last step join on contract id alone.
Can this be done in a single join that first tries all three parameters, then falls back to two parameters, and finally to just the contract id?
Code:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
LEFT JOIN #claim_total C
ON ( ( C.contract_id_2 = A.contract_id
AND C.company_cd_2 = A.company_cd
AND C.book_id_2 = A.book_id )
OR ( C.contract_id_2 = A.contract_id
AND C.company_cd_2 = A.company_cd )
OR ( C.contract_id_2 = A.contract_id ) )
Your ON clause boils down to C.contract_id_2 = A.contract_id. This gets you all matches, no matter whether the most precise match including company and book or a lesser one. What you want is a ranking. Two methods come to mind:
Join on C.contract_id_2 = A.contract_id, then rank the rows with ROW_NUMBER and keep the best ranked ones.
Use a lateral join in order to only join the best match with TOP.
Here is the second option. You forgot to tell us which DBMS you are using. SELECT INTO looks like SQL Server. I hope I got the syntax right:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
OUTER APPLY
(
SELECT TOP(1) *
FROM #claim_total C
WHERE C.contract_id_2 = A.contract_id
ORDER BY
CASE
WHEN C.company_cd_2 = A.company_cd AND C.book_id_2 = A.book_id THEN 1
WHEN C.company_cd_2 = A.company_cd THEN 2
ELSE 3
END
) C;
If you want to join all rows in case of ties (e.g. many rows matching contract, company and book), then make this TOP(1) WITH TIES.
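The first option mentioned above (join on the contract id, rank the rows, keep the best) is not spelled out, so here is a rough sketch of it. It runs on SQLite via Python rather than SQL Server, the temp tables are replaced by ordinary tables without the # prefix, and the amount column and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contract_detail (contract_id INTEGER, company_cd TEXT,
                                  book_id TEXT);
    CREATE TABLE claim_total (contract_id_2 INTEGER, company_cd_2 TEXT,
                              book_id_2 TEXT, amount INTEGER);
    INSERT INTO contract_detail VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z');
    INSERT INTO claim_total VALUES
        (1, 'A', 'X', 100),   -- matches contract, company and book
        (1, 'A', 'Q', 200),   -- matches contract and company
        (1, 'Z', 'Q', 300),   -- matches contract only
        (2, 'B', 'Q', 400),   -- matches contract and company
        (2, 'Z', 'Q', 500);   -- matches contract only
""")

# Step 1 (ranked): join on contract id and grade each match 1, 2 or 3.
# Step 2 (numbered): number the matches per contract_detail row, best first.
# Step 3: keep only the best match; contracts without claims keep NULLs.
rows = conn.execute("""
    WITH ranked AS (
        SELECT a.contract_id, a.company_cd, a.book_id, c.amount,
               CASE
                   WHEN c.company_cd_2 = a.company_cd
                        AND c.book_id_2 = a.book_id THEN 1
                   WHEN c.company_cd_2 = a.company_cd THEN 2
                   ELSE 3
               END AS match_rank
          FROM contract_detail a
          LEFT JOIN claim_total c ON c.contract_id_2 = a.contract_id
    ),
    numbered AS (
        SELECT r.*,
               ROW_NUMBER() OVER (PARTITION BY r.contract_id, r.company_cd,
                                               r.book_id
                                  ORDER BY r.match_rank) AS rn
          FROM ranked r
    )
    SELECT contract_id, amount
      FROM numbered
     WHERE rn = 1
     ORDER BY contract_id
""").fetchall()

for row in rows:
    print(row)
```

Swapping ROW_NUMBER for RANK would keep all tied best matches, which corresponds to the WITH TIES variant.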

How to join three tables having relation parent-child-child's child. And I want to access all records related to parent

I have three tables:
articles(id,title,message)
comments(id,article_id,commentedUser_id,comment)
comment_likes(id, likedUser_id, comment_id, action_like, action_dislike)
I want to show comments.id, comments.commentedUser_id, comments.comment, ( Select count(action_like) where action_like="like") as likes and comment_id=comments.id where comments.article_id=article.id
Actually, I want to count all action_like values related to each comment, and show all comments of the article.
action_like has only two values: null or like.
SELECT c.id , c.CommentedUser_id , c.comment , (cl.COUNT(action_like) WHERE action_like='like' AND comment_id='c.id') as likes
FROM comment_likes as cl
LEFT JOIN comments as c ON c.id=cl.comment_id
WHERE c.article_id=article.id
It shows nothing. I know I'm doing it the wrong way; the query is only meant to illustrate what I want.
I guess you are looking for something like below. This will return Article/Comment wise LIKE count.
SELECT
a.id article_id,
c.id comment_id,
c.CommentedUser_id ,
c.comment ,
COUNT (CASE WHEN action_like='like' THEN 1 ELSE NULL END) as likes
FROM articles a
INNER JOIN comments C ON a.id = c.article_id
LEFT JOIN comment_likes as cl ON c.id=cl.comment_id
GROUP BY a.id,c.id , c.CommentedUser_id , c.comment
If you need results for a specific article, you can add a WHERE clause before the GROUP BY section, e.g. WHERE a.id = N
I would recommend a correlated subquery for this:
SELECT a.id as article_id, c.id as comment_id,
c.CommentedUser_id, c.comment,
(SELECT COUNT(*)
FROM comment_likes cl
WHERE cl.comment_id = c.id AND
cl.action_like = 'like'
) as num_likes
FROM articles a INNER JOIN
comments c
ON a.id = c.article_id;
This is a case where a correlated subquery often has noticeably better performance than an outer aggregation, particularly with the right index. The index you want is on comment_likes(comment_id, action_like).
Why is the performance better? Many databases implement the group by by sorting (or hashing) all of the joined data. Sorting is an expensive operation that grows super-linearly -- that is, twice as much data takes more than twice as long to sort.
The correlated subquery breaks the problem down into smaller pieces. In fact, no sorting should be necessary -- just scanning the index and counting the matching rows.
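To check that the correlated subquery and the outer aggregation really return the same counts, here is a small sketch on SQLite via Python. Table and column names follow the table list in the question; the rows are invented, and nothing here measures performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, message TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER,
                           commentedUser_id INTEGER, comment TEXT);
    CREATE TABLE comment_likes (id INTEGER PRIMARY KEY, likedUser_id INTEGER,
                                comment_id INTEGER, action_like TEXT,
                                action_dislike TEXT);
    -- The index suggested above for the correlated subquery.
    CREATE INDEX idx_comment_likes ON comment_likes (comment_id, action_like);

    INSERT INTO articles VALUES (1, 'title', 'message');
    INSERT INTO comments VALUES (1, 1, 5, 'first'), (2, 1, 6, 'second');
    INSERT INTO comment_likes VALUES (1, 7, 1, 'like', NULL),
                                     (2, 8, 1, 'like', NULL),
                                     (3, 9, 1, NULL, 'dislike');
""")

# Variant 1: outer join plus GROUP BY with conditional counting.
grouped = conn.execute("""
    SELECT c.id,
           COUNT(CASE WHEN cl.action_like = 'like' THEN 1 END) AS likes
      FROM articles a
      JOIN comments c ON a.id = c.article_id
      LEFT JOIN comment_likes cl ON c.id = cl.comment_id
     GROUP BY c.id
     ORDER BY c.id
""").fetchall()

# Variant 2: correlated subquery, one small count per comment.
correlated = conn.execute("""
    SELECT c.id,
           (SELECT COUNT(*)
              FROM comment_likes cl
             WHERE cl.comment_id = c.id
               AND cl.action_like = 'like') AS num_likes
      FROM articles a
      JOIN comments c ON a.id = c.article_id
     ORDER BY c.id
""").fetchall()

print(grouped)
print(correlated)
```

Both variants report zero likes for a comment nobody has liked, so the choice between them is about the execution plan, not the result.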

PostgreSQL - SELECT DISTINCT, ORDER BY expressions must appear in select list

I'm new to SQL.
I guess I've misunderstood the concept of how to use DISTINCT keyword.
Here's my code:
SELECT DISTINCT(e.id), e.text, e.priority, CAST(e.order_number AS integer), s.name AS source, e.modified_time, e.creation_time, (SELECT string_agg(DISTINCT text, '|') FROM definitions WHERE entry_id = d.entry_id) AS definitions
FROM entries AS e
LEFT JOIN definitions d ON d.entry_id = e.id
INNER JOIN sources s ON e.source_id = s.id
WHERE vocabulary_id = 22
ORDER BY e.order_number
The error is as follows:
ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 6: ORDER BY e.order_number
Just trying to understand what my SELECT statement should look like.
It appears to me that you are trying to distinct on a single column and not on others - which is bound to fail.
For example, select distinct a,b,c from x returns the unique combinations of a,b and c, not unique a but normal b and c
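The point about combinations is easy to see on a toy table. This is a sketch on SQLite via Python (plain SELECT DISTINCT behaves the same way in PostgreSQL); the table x and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (a INTEGER, b TEXT, c TEXT);
    INSERT INTO x VALUES (1, 'p', 'q'),
                         (1, 'p', 'q'),   -- exact duplicate, removed
                         (1, 'r', 'q'),   -- same a, different b, kept
                         (2, 'p', 'q');
""")

# DISTINCT applies to the whole row (a, b, c), not to a alone,
# even if you write it as DISTINCT(a), b, c.
rows = conn.execute(
    "SELECT DISTINCT a, b, c FROM x ORDER BY a, b, c"
).fetchall()

for row in rows:
    print(row)
```

The value a = 1 still shows up twice, because the two rows differ in b.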
If you want one row per distinct e.id, then you are looking for distinct on. It is very important that the order by be consistent with the distinct on keys:
SELECT DISTINCT ON (e.id) e.id, e.text, e.priority, CAST(e.order_number AS integer),
s.name AS source, e.modified_time, e.creation_time,
(SELECT string_agg(DISTINCT d2.text, '|') FROM definitions d2 WHERE d2.entry_id = d.entry_id) AS definitions
FROM entries e LEFT JOIN
definitions d
ON d.entry_id = e.id INNER JOIN
sources s
ON e.source_id = s.id
WHERE vocabulary_id = 22
ORDER BY e.id, e.order_number;
Given the subquery, I suspect that there are better ways to write the query. If that is of interest, ask another question, provide sample data, desired results, and a description of the logic.

SDO_NN cannot be evaluated without using index

I know this is a frequently discussed error, but I have not been able to solve it even after trying really hard.
I have the following query that works fine
SELECT b.BID
FROM STUDENT s,
BUILDINGS b
WHERE sdo_nn(b.LOC, s.LOC, 'sdo_num_res=1', 1) = 'TRUE'
and shows the nearest neighbor of each s. But what I want is to display the BID of the 2 buildings that appear most often, so I changed my query to this:
SELECT b.BID, count(b.BID)
FROM STUDENT s,
BUILDINGS b
WHERE sdo_nn(b.LOC, s.LOC, 'sdo_num_res=1', 1) = 'TRUE'
GROUP BY b.BID
and then it fails with the error SDO_NN cannot be evaluated without using index.
Can you please help with this problem or tell me an alternate way to do it.
You can try using a subquery:
SELECT BID, COUNT(*)
FROM (SELECT b.BID
FROM STUDENT s,
BUILDINGS b
WHERE sdo_nn(b.LOC, s.LOC, 'sdo_num_res=1', 1) = 'TRUE'
) b
GROUP BY BID;
I'm not sure why the subquery is needed, but if the first query works, this one should as well.
Note: I'd be inclined to write this using an explicit join (because I abhor commas in the from clause):
SELECT BID, COUNT(*)
FROM (SELECT b.BID
FROM STUDENT s JOIN
BUILDINGS b
ON sdo_nn(b.LOC, s.LOC, 'sdo_num_res=1', 1) = 'TRUE'
) b
GROUP BY BID;

comparison query taking ages

My query is quite simple:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a, COMPANIES b
where a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
Database: sql server 2008 r2
What I'm trying to do:
The COMPANIES table contains duplicate entries. I want to know which of the duplicates are connected to the largest number of users, so that I only have to change the foreign keys of those with the fewest. (I already know the ids of the duplicates.)
Right now it's taking a lot of time to complete. I was wondering if it could be done faster.
Try this version. It will probably be only a little faster, because the COUNT subqueries are quite slow. I've added a.ID <> b.ID so that a row is never compared with itself.
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a INNER JOIN COMPANIES b
ON
a.ID <> b.ID
and a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
The FROM ... INNER JOIN ... ON ... is the preferred SQL construct to join tables, although the optimizer will usually produce the same plan for both forms.
One approach would be to pre-calculate the COMPANYID count before doing the join since you'll be repeatedly calculating it in the main query. i.e. something like:
-- select ... into creates the temp table, so no separate create table is needed
select COMPANYID as ID, COUNT(COMPANYID) as IDCount
into #CompanyCount
from USERS
group by COMPANYID
Then your main query:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a
inner join #CompanyCount aCount on aCount.ID = a.ID
inner join COMPANIES b on b.Postcode = a.Postcode and b.Adres = a.Adres
inner join #CompanyCount bCount on bCount.ID = b.ID and aCount.IDCount > bCount.IDCount
If you want all instances of a even though there is no corresponding b then you'd need to have left outer joins to b and bCount.
However you need to look at the query plan - which indexes are you using - you probably want to have them on the IDs and the Postcode and Adres fields as a minimum since you're joining on them.
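Here is a rough sketch of the pre-calculated count idea, run on SQLite via Python instead of SQL Server. A temporary table company_count takes the place of #CompanyCount, the column list is trimmed, and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, adres TEXT, postcode TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, companyid INTEGER);

    -- Companies 1 and 2 are duplicates of each other; company 3 is unique.
    INSERT INTO companies VALUES (1, 'Main St 1', '1000'),
                                 (2, 'Main St 1', '1000'),
                                 (3, 'Other 5',   '2000');
    INSERT INTO users (companyid) VALUES (1), (1), (1), (2), (3), (3);

    -- Count the users per company once, instead of once per compared pair.
    CREATE TEMP TABLE company_count AS
        SELECT companyid AS id, COUNT(*) AS idcount
          FROM users
         GROUP BY companyid;
""")

# Return each company that has more users than a duplicate of it.
rows = conn.execute("""
    SELECT a.id, a.adres, a.postcode
      FROM companies a
      JOIN company_count acount ON acount.id = a.id
      JOIN companies b ON b.postcode = a.postcode AND b.adres = a.adres
      JOIN company_count bcount ON bcount.id = b.id
                               AND acount.idcount > bcount.idcount
     ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

One thing to keep in mind: a company with no users at all has no row in the count table, so with inner joins it can never show up as the smaller duplicate.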
Build an index on postcode and adres
The database probably executes the subselects for every row. (I'm just guessing here; verify it in the explain plan.) If this is the case, you can rewrite the query to join with inline views (note this is how it would look in Oracle; I hope it works in SQL Server as well):
select distinct a.ID, a.adres, a.place, a.postalcode
from
COMPANIES a,
COMPANIES b,
(
select COUNT(COMPANYID) cnt, companyid
from USERS
group by companyid) cntA,
(
select COUNT(COMPANYID) cnt, companyid
from USERS
group by companyid) cntb
where a.Postcode = b.Postcode
and a.Adres = b.Adres
and a.ID<>b.ID
and cntA.companyid = a.ID
and cntb.companyid = b.ID
and cntA.cnt>cntb.cnt