I'm looking for, for lack of a better word, an "exclusive join". I want each row on the left to match only one row from the right, and a right row that has already been joined to a preceding left row cannot be used again. The solution should work on MS SQL 2012.
Example: I have two tables A and B, each with 10 rows. Row A1 matches B3 and B5, and row A3 matches B3 and B5. The result set should include 2 rows: A1 joined to B3 (because it's the first match on the right), and A3 joined to B5 (because it's the first match on the right that hasn't already been used).
Obviously, I'm trying to avoid cursors. Perhaps a recursive CTE is the only other way to go about this?
One way to do this is to apply a row_number call to the join column, and then add a condition on it when you join:
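-- SQL Server rejects an integer constant such as ORDER BY 1 inside OVER();
-- replace (SELECT NULL) with a real column if you need a deterministic order.
SELECT a.*, b.*
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY join_col ORDER BY (SELECT NULL)) rn
      FROM table_a) a
JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY join_col ORDER BY (SELECT NULL)) rn
      FROM table_b) b ON a.join_col = b.join_col AND a.rn = b.rn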
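To make the pairing concrete, here is a minimal sketch of the same row-number join. It runs against SQLite through Python's built-in sqlite3 module rather than SQL Server, so the syntax is not identical to T-SQL. The id columns, the sample rows and the key value 'k' are assumptions made up for illustration, and the approach only covers the case where "matching" means equality on a single join column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id TEXT, join_col TEXT);
    CREATE TABLE table_b (id TEXT, join_col TEXT);
    -- Hypothetical data: A1 and A3 both match B3 and B5 through key 'k'
    INSERT INTO table_a VALUES ('A1', 'k'), ('A2', 'other'), ('A3', 'k');
    INSERT INTO table_b VALUES ('B3', 'k'), ('B4', 'unmatched'), ('B5', 'k');
""")

# Number the rows inside each key on both sides, then join on key AND number,
# so the n-th left row of a key can only pair with the n-th right row of it.
pairs = conn.execute("""
    SELECT a.id, b.id
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY join_col ORDER BY id) AS rn
          FROM table_a) AS a
    JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY join_col ORDER BY id) AS rn
          FROM table_b) AS b
      ON a.join_col = b.join_col AND a.rn = b.rn
    ORDER BY a.id
""").fetchall()

print(pairs)
```

With this sample data the query pairs A1 with B3 and A3 with B5, and each right-hand row is used at most once.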
See the script below. As I understand it, using a CTE or a cursor is not a hard requirement. Anyway, I hope it leads to your answer.
Since your question does not provide an identifier to join on, I think a cross join is needed, and PARTITION BY will do the trick.
Something like this:
create table #t (a nvarchar(2),rna int, b nvarchar(2), rnb int)
insert into #t select * from
(select a
,row_number() over (partition by a order by a) rna
,b
,row_number() over (partition by b order by b) rnb
from table_a
cross join
table_b
) as x
where rnb=1
select a, b from #t
drop table #t
And here is a script to create the tables:
create table table_a (a nvarchar(2))
insert into table_a values ('A1'),('A3');
create table table_b (b nvarchar(2))
insert into table_b values ('B3'),('B5');
I need to find, for each row in the left table, the closest value in the right table, and combine them all.
But to do so, my left join query runs over all the permutations, and it takes a lot of resources to calculate (my base tables are huge).
For example, my table A looks like:
<A,1>
<A,2>
<A,10>
And table B looks like :
<A,4>
<A,5>
<A,6>
<A,7>
For this example the result will be:
<A,1,4>
<A,2,4>
<A,10,7>
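This is how I thought to do it:
select * from (
  select *, row_number() over (partition by rown order by abs(b_val - a_val) asc) diff
  from (
    select l.a, l.b as a_val, r.b as b_val, l.rown
    from (select a, b, row_number() over () rown from x) l
    join (select a, b from x) r  -- plain join: in most dialects CROSS JOIN cannot take an ON clause
      on l.a = r.a
  )
) where diff = 1
Is there a better and more efficient way to do so?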
Consider the example below:
select id, a.val a_val, b.val b_val
from tableA a
left join tableB b
using(id)
where true
qualify row_number() over(partition by id, a.val order by abs(a.val - b.val)) = 1
If applied to the sample data in your question, the output is the expected <A,1,4>, <A,2,4>, <A,10,7>.
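QUALIFY is not available in every dialect, so here is a rough equivalent that filters on the row number in an outer query. It is a sketch run against SQLite through Python's built-in sqlite3 module. The table and column names (tableA, tableB, id, val) follow the query above, and the extra b.val tie-breaker in the ORDER BY is my own assumption, added so that ties resolve the same way on every run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id TEXT, val INTEGER);
    CREATE TABLE tableB (id TEXT, val INTEGER);
    -- Sample data from the question
    INSERT INTO tableA VALUES ('A', 1), ('A', 2), ('A', 10);
    INSERT INTO tableB VALUES ('A', 4), ('A', 5), ('A', 6), ('A', 7);
""")

# Rank every candidate b.val for each (id, a.val) by distance, keep the nearest.
closest = conn.execute("""
    SELECT id, a_val, b_val
    FROM (
        SELECT a.id AS id, a.val AS a_val, b.val AS b_val,
               ROW_NUMBER() OVER (
                   PARTITION BY a.id, a.val
                   ORDER BY ABS(a.val - b.val), b.val
               ) AS rn
        FROM tableA AS a
        LEFT JOIN tableB AS b ON a.id = b.id
    )
    WHERE rn = 1
    ORDER BY a_val
""").fetchall()

print(closest)
```

Note that this still compares every row of A with every row of B that shares the same id, so it does not by itself reduce the amount of work on huge tables.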
Table A has a single field.
Table B has many fields.
I want to copy each value from Table A to a row on Table B but I have no values with which to join the two tables.
It doesn't matter which row in B each value in A goes into as long as each value in Table A only appears once in Table B.
I do not want to use loops.
You could assign a row number to each of the two tables, and then do an update join:
WITH cte1 AS (
    SELECT col, ROW_NUMBER() OVER (ORDER BY col) rn
    FROM TableA
),
cte2 AS (  -- only the first CTE takes the WITH keyword
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY some_col) rn
    FROM TableB
)
UPDATE t2
SET col = t1.col
FROM cte2 t2
INNER JOIN cte1 t1
    ON t1.rn = t2.rn
This solution makes several assumptions, including that TableB already has a destination column col for the data coming from TableA's single column, and that the types match in both tables. It also assumes that TableB has at least as many rows as TableA, so that every value from TableA has a row to go into. If not, the extra values from TableA would be lost.
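Here is a small runnable sketch of the same idea, using SQLite through Python's built-in sqlite3 module instead of SQL Server. Because updating through a CTE is T-SQL specific, this version updates TableB directly and looks the value up by row number in a subquery. The id key column and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (col TEXT);
    -- id is a hypothetical key, needed here to address the row being updated
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, some_col TEXT, col TEXT);
    INSERT INTO TableA VALUES ('x'), ('y');
    INSERT INTO TableB (id, some_col) VALUES (1, 'p'), (2, 'q'), (3, 'r');
""")

# Number both tables, then copy the value whose row number lines up.
conn.execute("""
    WITH cte1 AS (
        SELECT col, ROW_NUMBER() OVER (ORDER BY col) AS rn FROM TableA
    ),
    cte2 AS (
        SELECT id, ROW_NUMBER() OVER (ORDER BY some_col) AS rn FROM TableB
    )
    UPDATE TableB
    SET col = (SELECT cte1.col
               FROM cte1 JOIN cte2 ON cte1.rn = cte2.rn
               WHERE cte2.id = TableB.id)
    WHERE id IN (SELECT cte2.id FROM cte1 JOIN cte2 ON cte1.rn = cte2.rn)
""")

rows = conn.execute("SELECT id, some_col, col FROM TableB ORDER BY id").fetchall()
print(rows)
```

Each value from TableA lands in exactly one row of TableB, and the third row of TableB, which has no counterpart in TableA, keeps its NULL.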
So I am writing the output of a few sequential CTEs to a table, and when I changed a join in one of the CTEs from an inner join to a left join, duplicated entries appeared in the table that do not show up if I just run the query without the insert.
Is there something I need to understand about creating and inserting into a table with regard to joins in a CTE?
EDIT
create table MYTABLE
(
ID int,
Date smalldatetime,
Val1 int,
Val2 int
)
; with cte1 as (
select
a.ID,
a.Date,
a.Val1,
b.Val2
from table1 a
left join table2 b
on a.ID = b.ID
and a.Date = b.Date
)
insert into MYTABLE
(ID, Date, Val1, Val2)
select * from cte1
When creating the table on the inner join there is no problem with duplicates; on the left join (as shown above), rows where there are NULLs appear to be duplicated many times.
Check your right table (table2); my guess is that more than one record has the same ID and Date.
If that is the case, the records are not technically duplicated. If you do a select all (*) in the CTE, you will see the other fields that differ.
If you do not care about the rest of the fields being different, though, just try adding a Row_Number to your CTE and select where the Row_Number = 1 outside of the CTE.
For instance:
create table MYTABLE
(
ID int,
Date smalldatetime,
Val1 int,
Val2 int
)
; with cte1 as (
select
a.ID,
a.Date,
a.Val1,
b.Val2,
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID, a.Date, a.Val1, b.Val2 ORDER BY a.ID)
from table1 a
left join table2 b
on a.ID = b.ID
and a.Date = b.Date
)
insert into MYTABLE
(ID, Date, Val1, Val2)
select ID, Date, Val1, Val2 from cte1
where Rnum = 1
The row_number acts as a "distinct", and depending on which combination of fields you do not want duplicated, you will get different results.
For instance, if you do not want the IDs to duplicate, then
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID ORDER BY a.ID)
If you do not care about the IDs duplicating, but you do not want the same ID on the same date, then
Rnum = ROW_NUMBER() OVER(PARTITION BY a.ID, a.Date ORDER BY a.ID)
and so on. It just depends on which columns you do not want to see duplicated.
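To see the effect, here is a minimal sketch using SQLite through Python's built-in sqlite3 module, not SQL Server. The table and column names come from the question. The sample rows are invented, and they assume my guess above is right, namely that table2 holds two rows for the same ID and Date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, Date TEXT, Val1 INTEGER);
    CREATE TABLE table2 (ID INTEGER, Date TEXT, Val2 INTEGER);
    INSERT INTO table1 VALUES (1, '2020-01-01', 10), (2, '2020-01-01', 20);
    -- Two table2 rows share the key (1, '2020-01-01'); ID 2 has no match at all
    INSERT INTO table2 VALUES (1, '2020-01-01', 100), (1, '2020-01-01', 100);
""")

# The left join plus a row number that restarts for each distinct output row.
joined = """
    SELECT a.ID AS ID, a.Date AS Date, a.Val1 AS Val1, b.Val2 AS Val2,
           ROW_NUMBER() OVER (
               PARTITION BY a.ID, a.Date, a.Val1, b.Val2 ORDER BY a.ID
           ) AS Rnum
    FROM table1 AS a
    LEFT JOIN table2 AS b ON a.ID = b.ID AND a.Date = b.Date
"""

raw = conn.execute(f"SELECT ID, Val1, Val2 FROM ({joined}) ORDER BY ID").fetchall()
deduped = conn.execute(
    f"SELECT ID, Val1, Val2 FROM ({joined}) WHERE Rnum = 1 ORDER BY ID"
).fetchall()

print(raw)
print(deduped)
```

Here raw contains the row for ID 1 twice, because both table2 rows match it, while deduped keeps one row per ID, including the unmatched ID 2 with a NULL Val2.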
Hope this helps
Basically I have a table that contains 1000 rows with three columns. (TABLE A)
I have ANOTHER table with 200 columns with 1million+ records. (TABLE B)
I am trying to replace the THREE COLUMNS OF 1000 rows of TABLE B with those of TABLE A. I've read a lot of solutions where you can INSERT into table B from TABLE A.. but that's useless because I'll get NULLs in the remaining 197 columns that I need data for.
So the task is to overwrite certain columns in some rows of one table with the columns of another table. There are NO conditions; just the top rows or whatever order you can think of is fine. If you can give an answer that takes ORDER BY something into account, that'd be a bonus! Thank you so much!
If I understood your requirements correctly:
WITH TA
AS (SELECT *,
ROW_NUMBER()
OVER (
ORDER BY col1) AS RN
FROM TableA),
TB
AS (SELECT *,
ROW_NUMBER()
OVER (
ORDER BY col1) AS RN
FROM TableB)
UPDATE TB
SET TB.col1 = TA.col1,
TB.col2 = TA.col2,
TB.col3 = TA.col3
FROM TB
JOIN TA
ON TB.RN = TA.RN
Try something like this:
WITH topB AS (
SELECT TOP 1000 row_number() OVER(ORDER BY field_n) rn, b.* FROM table_b b
ORDER BY field_x),
topA AS (
SELECT row_number() OVER(ORDER BY field_m) rn, a.*
FROM table_a a)
UPDATE b
SET
b.Field_1 = a.Field_1,
b.Field_2 = a.Field_2,
b.Field_3 = a.Field_3
FROM
TopB b JOIN TopA a ON b.rn = a.rn
The idea here is to assign row numbers in both tables, join them by these numbers, and update the B part of the join with values from A.
I have two tables:
A:
id code
1 A1
2 A1
3 B1
4 B1
5 C1
6 C1
=====================
B:
id Aid
1 1
2 4
(B doesn't contain the Aid which link to code C1)
Let me explain the overall flow:
I want each row in table A to have a different code (by deleting duplicates), and I want to keep the rows whose id appears as Aid in table B. If none of the ids for a code is saved in table B, I keep the row with the largest id.
So I cannot just do something like this:
DELETE FROM A
WHERE id NOT IN (SELECT MAX(id)
FROM A
GROUP BY code
)
I can get each duplicate_code_groups by below sql statement:
SELECT code
FROM A
GROUP BY code
HAVING COUNT(*) > 1
Is there some code in sql like
for (var ids in duplicate_code_groups){
for (var id in ids) {
if (id in B){
return id
}
}
return max(ids)
}
and put the returned id into an idtable? I just don't know how to write such code in SQL.
Then I can do
DELETE FROM A
WHERE id NOT IN (SELECT id FROM idtable)
Using ROW_NUMBER() inside CTE (or sub-query) you can assign numbers for each Code based on your ordering and then just join the result-set with your table A to make a delete.
WITH CTE AS
(
SELECT A.*, ROW_NUMBER() OVER (PARTITION BY A.Code ORDER BY COALESCE(B.ID,0) DESC, A.ID desc) RN
FROM A
LEFT JOIN B ON A.ID = B.Aid
)
DELETE A FROM A
INNER JOIN CTE C ON A.ID = C.ID
WHERE RN > 1;
SELECT * FROM A;
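For anyone who wants to run this without a SQL Server instance, here is a sketch of the same ranking against the sample data from the question, using SQLite through Python's built-in sqlite3 module. SQLite has no DELETE ... FROM ... JOIN, so the join is rewritten as an IN subquery. That rewrite is my adaptation, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, Aid INTEGER);
    -- Sample data from the question
    INSERT INTO A VALUES (1,'A1'),(2,'A1'),(3,'B1'),(4,'B1'),(5,'C1'),(6,'C1');
    INSERT INTO B VALUES (1, 1), (2, 4);
""")

# Within each code, rows referenced from B sort first, then the larger id.
# Everything ranked after the first row of its code is deleted.
conn.execute("""
    DELETE FROM A
    WHERE id IN (
        SELECT id FROM (
            SELECT A.id AS id,
                   ROW_NUMBER() OVER (
                       PARTITION BY A.code
                       ORDER BY COALESCE(B.id, 0) DESC, A.id DESC
                   ) AS rn
            FROM A LEFT JOIN B ON A.id = B.Aid
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute("SELECT id, code FROM A ORDER BY id").fetchall()
print(remaining)
```

Ids 1 and 4 survive because B references them, and id 6 survives as the largest id for code C1, which B does not reference at all.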
The first select gives you all A.id that are in B - you don't want to delete them. The second select takes A, selects all codes without an id that appears in B, and from this subset takes the maximum id. These two sets of ids are the ones you want to keep, so the delete deletes the ones not in the sets.
DELETE from A where A.id not in
(
select aid from B
union
select MAX(A.id) from A left outer join B on B.Aid=A.id group by code having COUNT(B.id)=0
)
The actual execution plan on MS SQL Server 2008 R2 shows that this solution performs quite well; it's 5-6 times faster than Nenad's solution :).
Try this Solution
DELETE FROM A
WHERE NOT id IN
(
SELECT MAX(B.AId)
FROM A INNER JOIN B ON A.id = B.aId
)