SQL: how to limit a join on the first found row? - sql

How to make a join between two tables but limiting to the first row that meets the join condition ?
In this simple example, I would like to get for every row in table_A the first row from table_B that satisfies the condition :
select table_A.id, table_A.name, table_B.city
from table_A join table_B
on table_A.id = table_B.id2
where ..
table_A (id, name)
1, John
2, Marc
table_B (id2, city)
1, New York
1, Toronto
2, Boston
The output would be:
1, John, New York
2, Marc, Boston
May be Oracle provides such a function (performance is a concern).

The key word here is FIRST. You can use analytic function FIRST_VALUE or aggregate construct FIRST.
For FIRST or LAST the performance is never worse and frequently better than the equivalent FIRST_VALUE or LAST_VALUE construct because we don't have a superfluous window sort and as a consequence a lower execution cost:
select table_A.id, table_A.name, firstFromB.city
from table_A
join (
select table_B.id2, max(table_B.city) keep (dense_rank first order by table_B.city) city
from table_b
group by table_B.id2
) firstFromB on firstFromB.id2 = table_A.id
where 1=1 /* some conditions here */
;
Since 12c introduced operator LATERAL, as well as CROSS/OUTER APPLY joins, make it possible to use a correlated subquery on right side of JOIN clause:
select table_A.id, table_A.name, firstFromB.city
from table_A
cross apply (
select max(table_B.city) keep (dense_rank first order by table_B.city) city
from table_b
where table_B.id2 = table_A.id
) firstFromB
where 1=1 /* some conditions here */
;

If you want just single value a scalar subquery can be used:
SELECT
id, name, (SELECT city FROM table_B WHERE id2 = table_A.id AND ROWNUM = 1) city
FROM
table_A

Query:
SELECT a.id,
a.name,
b.city
FROM table_A a
INNER JOIN
( SELECT id2,
city
FROM (
SELECT id2,
city,
ROW_NUMBER() OVER ( PARTITION BY id2 ORDER BY NULL ) rn
FROM Table_B
)
WHERE rn = 1
) b
ON ( a.id = b.id2 )
--WHERE ...
Outputs:
ID NAME CITY
---------- ---- --------
1 John New York
2 Marc Boston

select table_A.id, table_A.name,
FIRST_VALUE(table_B.city) IGNORE NULLS
OVER (PARTITION BY table_B.id2 ORDER BY table_B.city) AS "city"
from table_A join table_B
on table_A.id = table_B.id2
where ..

On Oracle12c there finally is the new cross/outer apply operator that will allow what you asked for without any workaround.
the following is an example that looks on dictionary views for just one of the (probably)many objects owned by those users having their name starting with 'SYS':
select *
from (
select USERNAME
from ALL_USERS
where USERNAME like 'SYS%'
) U
cross apply (
select OBJECT_NAME
from ALL_OBJECTS O
where O.OWNER = U.USERNAME
and ROWNUM = 1
)
On Oracle 11g and prior versions you should only use workarounds that generally full scan the second table based on IDs of the second table to get the same results, but for testing puposes you may enable the lateral operator (also available on 12c without need of enabling new stuff) and use this other one
-- Enables some new features
alter session set events '22829 trace name context forever';
select *
from (
select USERNAME
from ALL_USERS
where USERNAME like 'SYS%'
) U,
lateral (
select OBJECT_NAME
from ALL_OBJECTS O
where O.OWNER = U.USERNAME
and ROWNUM = 1
);

This solution uses the whole table, like in a regular join, but limits to the first row. I am posting this because for me the other solutions were not sufficient because they use one field only, or they have performance issues with large tables. I am no expert at Oracle so if someone can improve this please do so, I will be happy to use your version.
select *
from tableA A
cross apply (
select *
from (
select B.*,
ROW_NUMBER() OVER (
-- replace this by your own partition/order statement
partition by B.ITEM_ID order by B.DELIVERYDATE desc
) as ROW_NUM
from tableB B
where
A.ITEM_ID=B.ITEM_ID
)
where ROW_NUM=1
) B

I use the partition to separate the id2 and then just take the r_num = 1.
SELECT A.ID, A.NAME, B.CITY
FROM TABLE_A A,
(SELECT ID2, CITY,
ROW_NUMBER() OVER (PARTITION BY ID2 ORDER BY ID2) AS R_NUM
FROM TABLE_B) B
WHERE A.ID = B.ID2
AND R_NUM = 1;

Related

SQL Server Join - With INFO_SCHEMA information

I have the first table:
select COLUMN_NAME
from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = N'tbl_Client_List_Pricing'
Don't mind the numbering in the Column_Name. I was doing this while testing because I need the order to remain as they are in the table. Not by ASC, DESC.
Anyhow, I don't know how to use the row numbers on the left that the system provides to JOIN another table without a condition.
Here is Table 2:
You can see that the left row numbers are my linking value but I don't know how to use that system index value as a condition in my JOIN.
Or if there is another way to join these two tables without a condition while keeping the Table 1 information in it's correct position and not affecting it by ORDER would be much appreciated.
Thank you!
-Chase
I guess you are looking for row_number. Use row_number to order result of two queries then join by matching order nums. Your query would be something like
with query_1 as (
select COLUMN_NAME
, rn = row_number() over (order by cast(left(COLUMN_NAME, 3) as int))
from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = N'tbl_Client_List_Pricing'
)
, query_2 as (
select
*, rn = row_number() over (order by (select null))
from
Table_2
)
select
*
from
query_1 q1
join query_2 q2 on q1.rn = q2.rn
select COLUMN_NAME from Emerald_Data.INFORMATION_SCHEMA.COLUMNS
inner join with Table_2 on Num=cast(LEFT(COLUMN_NAME,CHARINDEX('-', COLUMN_NAME)) AS int)
where TABLE_NAME = N'tbl_Client_List_Pricing'
You could also use sys.all_columns object which could able to state the index for your desired column & JOIN them with table2
SELECT *
FROM sys.all_columns c
INNER JOIN Table2 t ON t.Num = c.column_id
WHERE OBJECT_NAME(object_id) = 'tbl_Client_List_Pricing'

Only one expression can be specified in the select list when the subquery is not introduced with EXISTS. in subquery sqlserver

I want to execute this query in my database.As you can see both tables A and B has one-many relations ,but i need the latest record in B.so i here is my query :
select *,(select top 1 ResultTest ,ResultState2 from B where GasReceptionId=A.Id order by Id desc)
from A where OrganizationGasId= 4212
But i get this error
Msg 116, Level 16, State 1, Line 2
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
You can rephrase this query as a basic join which uses an analytic function (e.g. row number) to identify the correct row's data from B to include with each record coming from the A table.
SELECT *
FROM
(
SELECT a.*, b.ResultTest, b.ResultState2,
ROW_NUMBER() OVER (PARTITION BY a.Id ORDER BY a.ID DESC) rn
FROM A a
LEFT JOIN B b
ON a.Id = b.GasReceptionId
WHERE
a.OrganizationGasId = 4212
) t
WHERE t.rn = 1;
A subquery in the SELECT clause must return exactly one column (and one or zero rows). So you can either have two subqueries:
select
a.*,
(select top 1 resulttest from b where gasreceptionid = a.id order by id desc) as test,
(select top 1 resultstate2 from b where gasreceptionid = a.id order by id desc) as state
from a
where a.organizationgasid = 4212;
Or, much better, move the subquery to the FROM clause. One way is OUTER APPLY:
select
a.*, r.resulttest, r.resultstate2
from a
outer apply
(
select top 1 resulttest, resultstate2
from b
where gasreceptionid = a.id
order by id desc
) r
where a.organizationgasid = 4212;

PostgreSQL: Error in left join

I am trying to join my master table to some sub-tables in PostgreSQL in a single select query. I am getting a syntax error and I have the feeling I am making a terrible mistake or doing something which is not allowed. The code:
Select
id,
length,
other_stuff
from my_table tbl1
Left join
(
Select
id,
height
from my_table2 tbl2) tbl2 using (id)
left join
-- I get syntax error here
(
With a as (select id from some_table),
b as (Select value from other_table)
Select id, value from a, b) tbl3 using (id)
order by tbl1.id
Can we use WITH clause in left joins sub or nested queries and Is there a better way to do this?
UPDATE1
Well, I would like to add some more details. I have three select queries like this (having unique ID) and I want to join them based on ID.
Query1:
With a as (Select id, my_other records... from postgres_table1)
b as (select id, my_records... from postgres_table2)
c as (select id, my_record.. from postgres_table3, b)
Select
id,
my_records
from a left join c on some_condtion_with_a
order by 1
Second query:
Select
id, my_records
from
(
multiple_sub_queries_by_getting_records_from_c
)
Third Query:
With d as (select id, records.. from b),
e as (select id, records.. from d),
f as (select id, records.. from e)
select
id,
records..
from f
I tried to join them using left join. The first two queries were joined successfully. While, joining third query I got the syntax error. Maybe, I am complicating things thus I asked is there a better way to do it.
You are over complicating things. There is no need to use a derived table to outer join my_table2. And there is no need for a CTE plus a derived table to join the tbl3 alias:
Select id,
length,
other_stuff
from my_table tbl1
Left join my_table2 tbl2 using (id)
left join (
select st.id, ot.value
from some_table st
cross join other_table ot
) tbl3 using (id)
order by tbl1.id;
This assumes that the cross join you create with Select id, value from a, b is intended.
Not tested, but I think you need this. try:
with a as (select id from some_table),
b as (Select value from other_table)
Select
id,
length,
other_stuff
from my_table tbl1
Left join
(
Select
id,
height
from my_table2 tbl2
)
tbl2 using (id)
left join
(
Select id, value from a, b
)
tbl3 using (id)
order by tbl1.id
I've only ever seen/used WITH in the following format:
WITH
temptablename(columns) as (query),
temptablename2(columns) as (query),
...
temptablenameX(columns) as (query)
SELECT ...
i.e. they come first
You'll probably find it easier to write queries if you use indentation to describe nesting levels. I like to make my SELECT FROM WHERE GROUPBY ORDERBY at one indent level, and then tablename INNER JOIN ON etc more indented:
SELECT
column
FROM
table
INNER JOIN
(
SELECT subcolumn FROM subtable WHERE subclause
) myalias
ON
table.id = myalias.whatever
WHERE
blah
Organising your indents every time you nest down a layer really helps. By making everything that is "a table or a block of data like a table (i.e. a subquery)" indented the same amount you can easily see the notional order that the DB should retrieve
Move your WITHs to the top of the statement, you will still use the alias names in place in the sub sub query of course
Looking at your query, there isn't much point in your subqueries.. You don't do any grouping or particularly complex processing of the data, you just select an ID and another column and then join it in. Your query will be simpler if you don't do this:
SELECT
column
FROM
table
INNER JOIN
(
SELECT subcolumn FROM subtable WHERE subclause
) myalias
ON
table.id = myalias.whatever
WHERE
blah
Instead, do this:
SELECT
column
FROM
table
INNER JOIN
subtable
ON
table.id = subtable.id
WHERE
blah
Re your updated requirements, following the same pattern.
look for --my comments
With a as (Select id, my_other records... from postgres_table1)
b as (select id, my_records... from postgres_table2)
c as (select id, my_record.. from postgres_table3, b)
d as (select id, records.. from b),
e as (select id, records.. from d),
f as (select id, records.. from e)
SELECT * FROM
(
--your first
Select
id,
my_records
from a left join c on some_condtion_with_a
) Q1
LEFT OUTER JOIN
(
--your second
Select
id, my_records
from
(
multiple_sub_queries_by_getting_records_from_c
)
) Q2
ON Q1.XXXX = Q2.XXXX --fill this in !!!!!!!!!!!!!!!!!!!
LEFT OUTER JOIN
(
--your third
select
id,
records..
from f
) Q3
ON QX.XXXXX = Q3.XXXX --fill this in !!!!!!!!!!!!!!!!!!!
It'll work, but it might not be the prettiest or most necessary SQL arrangement. As both i and HWNN have said, you can rewrite a lot of these queries where you're just doing some simple selecting in your WITH.. But likely that theyre simple enough that the database optimizer can also see this and rerwite the query for you when it runs it
Just remember to code clearly, and lay your indentation out nicely to stop it tunring into a massive, unmaintainable, undebuggable spaghetti mess

SQL joined by last date

This is a question asked here before more than once, however I couldn't find what I was looking for. I am looking for join two tables, where the joined table is set by the last register ordered by date time, until here all is ok.
My trouble start on having more than two records on the joined table, let me show you a sample
table_a
-------
id
name
description
created
updated
table_b
-------
id
table_a_id
name
description
created
updated
What I have done at the beginning was:
SELECT a.id, b.updated
FROM table_a AS a
LEFT JOIN (SELECT table_a_id, max (updated) as updated
FROM table_b GROUP BY table_a_id ) AS b
ON a.id = b.table_a_id
Until here I was getting cols, a.id and b.updated. I need the full table_b cols, but when I try to add a new col to my query, Postgres tells me that I need to add my col to a GROUP BY criteria in order to complete the query, and the result is not what I am looking for.
I am trying to find a way to have this list.
DISTINCT ON or is your friend. Here is a solution with correct syntax:
SELECT a.id, b.updated, b.col1, b.col2
FROM table_a as a
LEFT JOIN (
SELECT DISTINCT ON (table_a_id)
table_a_id, updated, col1, col2
FROM table_b
ORDER BY table_a_id, updated DESC
) b ON a.id = b.table_a_id;
Or, to get the whole row from table_b:
SELECT a.id, b.*
FROM table_a as a
LEFT JOIN (
SELECT DISTINCT ON (table_a_id)
*
FROM table_b
ORDER BY table_a_id, updated DESC
) b ON a.id = b.table_a_id;
Detailed explanation for this technique as well as alternative solutions under this closely related question:
Select first row in each GROUP BY group?
Try:
SELECT a.id, b.*
FROM table_a AS a
LEFT JOIN (SELECT t.*,
row_number() over (partition by table_a_id
order by updated desc) rn
FROM table_b t) AS b
ON a.id = b.table_a_id and b.rn=1
You can use Postgres's distinct on syntax:
select a.id, b.*
from table_a as a left join
(select distinct on (table_a_id) table_a_id, . . .
from table_b
order by table_a_id, updated desc
) b
on a.id = b.table_a_id
Where the . . . is, you should put in the columns that you want.

SQL: Turn a subquery into a join: How to refer to outside table in nested join where clause?

I am trying to change my sub-query in to a join where it selects only one record in the sub-query. It seems to run the sub-query for each found record, taking over a minute to execute:
select afield1, afield2, (
select top 1 b.field1
from anothertable as b
where b.aForeignKey = a.id
order by field1
) as bfield1
from sometable as a
If I try to only select related records, it doesn't know how to bind a.id in the nested select.
select afield1, afield2, bfield1
from sometable a left join (
select top 1 id, bfield, aForeignKey
from anothertable
where anothertable.aForeignKey = a.id
order by bfield) b on
b.aForeignKey = a.id
-- Results in the multi-part identifier "a.id" could not be bound
If I hard code values in the nested where clause, the select duration drops from 60 seconds to under five. Anyone have any suggestions on how to join the two tables while not processing every record in the inner table?
EDIT:
I ended up adding
left outer join (
select *, row_number() over (partition by / order by) as rank) b on
b.aforeignkey = a.id and b.rank = 1
went from ~50 seconds to 8 for 22M rows.
Try this:
WITH qry AS
(
SELECT afield1,
afield2,
b.field1 AS bfield1,
ROW_NUMBER() OVER(PARTITION BY a.id ORDER BY field1) rn
FROM sometable a LEFT JOIN anothertable b
ON b.aForeignKey = a.id
)
SELECT *
FROM qry
WHERE rn = 1
Try this
select afield1,
afield2,
bfield1
from sometable a
left join
(select top 1 id, bfield, aForeignKey from anothertable where aForeignKey in(a.id) order by bfield) b on b.aForeignKey = a.id