The efficient way to use left join to find the closest value - sql

I need to find, for each row of the left table, the closest value in the right table and return them together.
To do that, my left join query runs over every combination of rows, which takes a lot of resources to compute (my base tables are huge).
For example, my table A looks like:
<A,1>
<A,2>
<A,10>
And table B looks like:
<A,4>
<A,5>
<A,6>
<A,7>
For this example the result will be:
<A,1,4>
<A,2,4>
<A,10,7>
This is how I thought to do it:
select * from (
select a.a, a.b as a_val, b.b as b_val,
row_number() over (partition by a.rown order by abs(b.b - a.b) asc) diff
from (select a, b, row_number() over () rown from x) a
JOIN (select a, b from x) b
on a.a = b.a
) where diff = 1
Is there a better, more efficient way to do this?

Consider the example below:
select id, a.val a_val, b.val b_val
from tableA a
left join tableB b
using(id)
where true
qualify row_number() over(partition by id, a.val order by abs(a.val - b.val)) = 1
Applied to the sample data in your question, the output is the expected result: <A,1,4>, <A,2,4>, <A,10,7>.
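To check the query against the sample data, here is a minimal sketch that runs the same idea in SQLite from Python. The table and column names (table_a, table_b, id, val) are assumptions chosen for illustration, and because SQLite has no QUALIFY clause the row_number() filter is moved into an outer query.

```python
import sqlite3

# Hypothetical table/column names; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id TEXT, val INTEGER);
    CREATE TABLE table_b (id TEXT, val INTEGER);
    INSERT INTO table_a VALUES ('A', 1), ('A', 2), ('A', 10);
    INSERT INTO table_b VALUES ('A', 4), ('A', 5), ('A', 6), ('A', 7);
""")

# Rank every B value per (id, A value) by distance, then keep rank 1.
closest = conn.execute("""
    SELECT id, a_val, b_val
    FROM (
        SELECT a.id AS id, a.val AS a_val, b.val AS b_val,
               ROW_NUMBER() OVER (
                   PARTITION BY a.id, a.val
                   ORDER BY ABS(a.val - b.val)
               ) AS rn
        FROM table_a a
        LEFT JOIN table_b b ON a.id = b.id
    )
    WHERE rn = 1
    ORDER BY a_val
""").fetchall()

print(closest)  # [('A', 1, 4), ('A', 2, 4), ('A', 10, 7)]
```

Note that the join still pairs every A row with every B row that shares the same id; the window function only picks the winner from those pairs. If two B values are equally close, which one is returned is not defined unless you add a tie-breaker to the ORDER BY.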

Related

How can I display two columns together in SQL?

I have 2 queries that return data in the form:
query 1:
column 1
a
b
c
query 2:
column 2
d
e
How can I combine the 2 queries to get output as:
column 1 column 2
a d
b e
c
The order of data in the columns does not matter.
Possibly something with joins?
Thanks
Use row_number():
select t1.col1,t2.col2 from
(
select *,row_number() over(order by col1) rn from query1
) t1 full outer join
(
select *,row_number() over(order by col2) rn from query2
) t2 on t1.rn=t2.rn
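For queries returning n and m rows (n not necessarily equal to m), the full outer join keeps the unmatched rows from either side.
A possible solution is to select both columns with row_number() and join them on the row number. With a left join you would need to select first from the table with more rows; the full outer join below does not depend on that order. Example: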
select
col_1,
col_2
from (
select
a.col_1,
row_number() over () rn
from a
) s1
FULL OUTER JOIN (
select
b.col_2,
row_number() over () rn
from b
) s2 on s1.rn = s2.rn
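The row-number pairing can be tried out with a small sketch in SQLite from Python. The tables query1 and query2 stand in for the two queries and hold the sample values from the question. Since older SQLite versions lack FULL OUTER JOIN, this sketch emulates it (an assumption of this example, not part of the answers above) by left-joining both sides onto the union of all row numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE query1 (col1 TEXT);
    CREATE TABLE query2 (col2 TEXT);
    INSERT INTO query1 VALUES ('a'), ('b'), ('c');
    INSERT INTO query2 VALUES ('d'), ('e');
""")

# Number the rows of each side, then attach both sides to every row number seen.
side_by_side = conn.execute("""
    WITH t1 AS (SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) AS rn FROM query1),
         t2 AS (SELECT col2, ROW_NUMBER() OVER (ORDER BY col2) AS rn FROM query2),
         all_rn AS (SELECT rn FROM t1 UNION SELECT rn FROM t2)
    SELECT t1.col1, t2.col2
    FROM all_rn
    LEFT JOIN t1 ON t1.rn = all_rn.rn
    LEFT JOIN t2 ON t2.rn = all_rn.rn
    ORDER BY all_rn.rn
""").fetchall()

print(side_by_side)  # [('a', 'd'), ('b', 'e'), ('c', None)]
```

The shorter side is padded with NULL (None in Python), which matches the desired output in the question.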

SQL Select first that matches the criteria

I have a query in which I have to sort a table and retrieve the first matching value for every ID.
What I want is, for each ID of Table A, the match on the first ID_2 from the sorted Table B.
I have a rough idea of the code:
select A.ID, A.COL1, B.COL1, B.COL2
from A, B
where A.ID = B.ID
and B.ID_2 = (select ID_2
from (select ID_2
from B B2
where B2.ID = A.ID
order by (case when B2.PRIO ...))
where rownum = 1)
The problem here is that A.ID is not accessible inside the nested select in the where clause.
Another way that I found is to use an analytic function:
select ID, COL1, COL2
from (select A.ID, A.COL1, B.COL2,
row_number() over (partition by A.ID order by (case when B.PRIO ...)) row_num
from A, B
where A.ID = B.ID)
where row_num = 1
The problem with this code is that I don't think it performs well.
Can anyone help me? =)
row_number() is not a statistic function. It is an analytic or window function. It is probably your best bet. I would do:
select a.*
from A join
(select b.*,
row_number() over (partition by b.ID order by (case when b.PRIO ...)) as seqnum
from b
) b
on A.ID = B.ID and b.seqnum = 1;
If you really only want A.ID, then you don't need A at all: the information is in B.ID (assuming A is not being used for filtering). The above then simplifies to:
select b.id
from (select b.*,
row_number() over (partition by b.ID order by (case when b.PRIO ...)) as seqnum
from b
) b
where b.seqnum = 1;
You don't need a correlated sub-subquery (which is invalid in Oracle), and you don't need an analytic function either. You need the aggregate first/last function.
... and b.id_2 = (select max(id_2) keep (dense_rank first order by case.....)
from b b2
where b2.id = a.id
) .....
Even this is probably too complicated. If you would describe your requirement (instead of just posting some incomplete code), the community may be able to help you simplify the query even further.
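For readers who want to see the row_number() approach run end to end, here is a minimal sketch in SQLite from Python. The priority rule is elided in the question, so the CASE expression below (HIGH before MEDIUM before everything else) is purely hypothetical, as are the sample rows and the column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, col1 TEXT);
    CREATE TABLE b (id INTEGER, id_2 INTEGER, prio TEXT, col2 TEXT);
    INSERT INTO a VALUES (1, 'first'), (2, 'second');
    INSERT INTO b VALUES
        (1, 10, 'LOW',    'x'),
        (1, 11, 'HIGH',   'y'),
        (2, 20, 'MEDIUM', 'z'),
        (2, 21, 'LOW',    'w');
""")

# Number the B rows per id by the (hypothetical) priority rule and join only row 1.
first_match = conn.execute("""
    SELECT a.id, a.col1, b.id_2, b.col2
    FROM a
    JOIN (
        SELECT b.*,
               ROW_NUMBER() OVER (
                   PARTITION BY b.id
                   ORDER BY CASE b.prio WHEN 'HIGH' THEN 1
                                        WHEN 'MEDIUM' THEN 2
                                        ELSE 3 END
               ) AS seqnum
        FROM b
    ) b ON a.id = b.id AND b.seqnum = 1
    ORDER BY a.id
""").fetchall()

print(first_match)  # [(1, 'first', 11, 'y'), (2, 'second', 20, 'z')]
```

Because the numbering happens on B alone, before the join, each A row is matched against at most one B row.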

SQL Server 2012 writing duplicate entries into table from CTE

I am writing the output of a few sequential CTEs to a table. When I changed a join in one of the CTEs from an inner join to a left join, duplicated entries started to appear in the table, although they do not show up if I just run the query without the insert.
Is there something I need to understand about creating and inserting into a table with regard to joins in a CTE?
EDIT
create table MYTABLE
(
ID int,
Date smalldatetime,
Val1 int,
Val2 int
)
; with cte1 as (
select
a.ID,
a.Date,
a.Val1,
b.Val2
from table1 a
left join table2 b
on a.ID = b.ID
and a.Date = b.Date
)
insert into MYTABLE
(ID, Date, Val1, Val2)
select * from cte1
With the inner join there is no problem with duplicates; with the left join (as shown above), rows containing NULLs appear to be duplicated many times.
Check your right table (table2); my guess is that it has more than one record with the same ID and Date.
If that is the case, the records are not technically duplicates: if you do a select all (*) in the CTE, you will see that the other fields differ.
If you do not care about the rest of the fields being different, though, try adding a Row_Number to your CTE and selecting where the Row_Number = 1 outside of the CTE.
For instance:
create table MYTABLE
(
ID int,
Date smalldatetime,
Val1 int,
Val2 int
)
; with cte1 as (
select
a.ID,
a.Date,
a.Val1,
b.Val2,
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID, a.Date, a.Val1, b.Val2 ORDER BY a.ID)
from table1 a
left join table2 b
on a.ID = b.ID
and a.Date = b.Date
)
insert into MYTABLE
(ID, Date, Val1, Val2)
select ID, Date, Val1, Val2 from cte1
where Rnum = 1
The row_number acts as a "distinct" and depending on what combination of fields you want to not duplicate, you will get different results.
For instance, if you do not want the IDs to duplicate, then
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID ORDER BY a.ID)
If you do not care about the IDs duplicating, but you do not want the same ID on the same date, then
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID, a.Date ORDER BY a.ID)
and so on. It just depends on which combination of fields you do not want duplicated.
Hope this helps
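To see why the left join multiplies rows and how the row number removes the extras, here is a small sketch in SQLite from Python. The sample rows are made up for illustration, dates are stored as text, and the ORDER BY b.Val2 inside the window is an assumption added so that the surviving row is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, Date TEXT, Val1 INTEGER);
    CREATE TABLE table2 (ID INTEGER, Date TEXT, Val2 INTEGER);
    INSERT INTO table1 VALUES (1, '2020-01-01', 5), (2, '2020-01-02', 6);
    -- Two table2 rows share the same ID and Date: this is what multiplies the output.
    INSERT INTO table2 VALUES (1, '2020-01-01', 7), (1, '2020-01-01', 8);
""")

joined = conn.execute("""
    SELECT a.ID, a.Date, a.Val1, b.Val2
    FROM table1 a
    LEFT JOIN table2 b ON a.ID = b.ID AND a.Date = b.Date
    ORDER BY a.ID, b.Val2
""").fetchall()

deduped = conn.execute("""
    WITH cte1 AS (
        SELECT a.ID, a.Date, a.Val1, b.Val2,
               ROW_NUMBER() OVER (PARTITION BY a.ID, a.Date ORDER BY b.Val2) AS Rnum
        FROM table1 a
        LEFT JOIN table2 b ON a.ID = b.ID AND a.Date = b.Date
    )
    SELECT ID, Date, Val1, Val2 FROM cte1 WHERE Rnum = 1 ORDER BY ID
""").fetchall()

print(joined)   # [(1, '2020-01-01', 5, 7), (1, '2020-01-01', 5, 8), (2, '2020-01-02', 6, None)]
print(deduped)  # [(1, '2020-01-01', 5, 7), (2, '2020-01-02', 6, None)]
```

The table1 row with ID 1 appears twice in the plain join because it matches two table2 rows; keeping Rnum = 1 leaves one row per (ID, Date).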

How to replace TOP 1000 rows of select columns indiscriminately

Basically, I have a table that contains 1000 rows with three columns (TABLE A).
I have ANOTHER table with 200 columns and 1 million+ records (TABLE B).
I am trying to replace THREE COLUMNS in 1000 rows of TABLE B with those of TABLE A. I've read a lot of solutions where you INSERT into TABLE B from TABLE A, but that's useless because I'd get NULLs in the remaining 197 columns that I need data for.
So the task is to overwrite certain columns in rows of one table with the selected columns of another table. There are NO conditions; just the top rows, or whatever order you can think of, is fine. If you can give an answer that takes ORDER BY into account, that'd be a bonus! Thank you so much!
If I understood your requirements correctly:
WITH TA
AS (SELECT *,
ROW_NUMBER()
OVER (
ORDER BY col1) AS RN
FROM TableA),
TB
AS (SELECT *,
ROW_NUMBER()
OVER (
ORDER BY col1) AS RN
FROM TableB)
UPDATE TB
SET TB.col1 = TA.col1,
TB.col2 = TA.col2,
TB.col3 = TA.col3
FROM TB
JOIN TA
ON TB.RN = TA.RN
Try something like this:
WITH topB AS (
SELECT TOP 1000 row_number() OVER(ORDER BY field_x) rn, b.* FROM table_b b
ORDER BY field_x),
topA AS (
SELECT row_number() OVER(ORDER BY field_m) rn, a.*
FROM table_a a)
UPDATE b
SET
b.Field_1 = a.Field_1,
b.Field_2 = a.Field_2,
b.Field_3 = a.Field_3
FROM
TopB b JOIN TopA a ON b.rn = a.rn
The idea here is to assign row numbers in both tables, join them on these numbers, and update the B part of the join with values from A.
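The same pairing idea can be sketched in SQLite from Python. This is not the SQL Server statement from the answers: to stay portable it first selects the (A values, B key) pairs by row number and then applies them with a parameterized UPDATE. The key column k on table_b, the extra column other, and the sample rows are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (field_1 TEXT, field_2 TEXT, field_3 TEXT);
    -- k is a hypothetical key used to address individual rows of table_b.
    CREATE TABLE table_b (k INTEGER PRIMARY KEY, field_1 TEXT, field_2 TEXT,
                          field_3 TEXT, other TEXT);
    INSERT INTO table_a VALUES ('a1', 'a2', 'a3'), ('b1', 'b2', 'b3');
    INSERT INTO table_b VALUES
        (1, 'old', 'old', 'old', 'keep-1'),
        (2, 'old', 'old', 'old', 'keep-2'),
        (3, 'old', 'old', 'old', 'keep-3');
""")

# Pair the n-th row of A with the n-th row of B; B rows beyond A's row count drop out.
pairs = conn.execute("""
    WITH top_a AS (
        SELECT field_1, field_2, field_3,
               ROW_NUMBER() OVER (ORDER BY field_1) AS rn
        FROM table_a
    ),
    top_b AS (
        SELECT k, ROW_NUMBER() OVER (ORDER BY k) AS rn
        FROM table_b
    )
    SELECT a.field_1, a.field_2, a.field_3, b.k
    FROM top_b b
    JOIN top_a a ON a.rn = b.rn
""").fetchall()

# Overwrite only the three columns; every other column of table_b is untouched.
conn.executemany(
    "UPDATE table_b SET field_1 = ?, field_2 = ?, field_3 = ? WHERE k = ?",
    pairs,
)

updated = conn.execute("SELECT * FROM table_b ORDER BY k").fetchall()
print(updated)
```

The inner join on the row number limits the update to as many B rows as A has, so no explicit TOP is needed in this variant.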

SQL with multiple select statements for distinct entities

I am trying to learn some SQL and what I want to do is simply select some rows from a table according to some criteria.
So, I am trying something like:
Select * from mytable where id=1090 as A, Select * from mytable where id=1075 as B;
I need to keep them as distinct entities (A and B) in my example, so that I can do something like:
Select A.col, B.row from A, B where <some criteria>
I am unable to figure out how to put all this together in a SQL query.
If I understand you correctly, you can do the following. It returns the rows of the two selects side by side, paired by row number:
;with a as(select *, row_number() over(order by(select null)) rn
from mytable where id = 1090),
b as(select *, row_number() over(order by(select null)) rn
from mytable where id = 1075)
select a.*, b.*
from a
full join b on a.rn = b.rn
If the first select returns 4 rows and the second returns 2, the output will look something like:
A(rn, cols) B(rn, cols)
1 ......... 1 .........
2 ......... 2 .........
3 ......... NULL
4 ......... NULL
select A.*, B.*
from
(Select * from mytable where id=1090) as A
join
(Select * from mytable where id=1075) as B ON <some criteria>
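As a concrete illustration of the derived-table form, here is a minimal sketch in SQLite from Python. The join criterion is left open in the question, so the columns k and v and the condition A.k = B.k are hypothetical stand-ins for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- k and v are made-up columns; only id comes from the question.
    CREATE TABLE mytable (id INTEGER, k TEXT, v INTEGER);
    INSERT INTO mytable VALUES
        (1090, 'x', 1), (1090, 'y', 2),
        (1075, 'x', 10), (1075, 'z', 30);
""")

# Each filtered select becomes a named derived table that can be joined like a table.
matched = conn.execute("""
    SELECT A.k, A.v, B.v
    FROM (SELECT * FROM mytable WHERE id = 1090) AS A
    JOIN (SELECT * FROM mytable WHERE id = 1075) AS B
      ON A.k = B.k
""").fetchall()

print(matched)  # [('x', 1, 10)]
```

Only the key 'x' exists in both subsets, so a single combined row comes back; A and B stay addressable as separate entities in the select list.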