SQLServer join two tables - sql

I have a question: I'm having a hard time combining two tables and can't find the correct query.
I have two tables:
T1: 1 column, has X records
T2: 1 column, has Y records
Note: Y can never be greater than X, but it is often smaller.
I want to join those tables in order to get a table with two columns:
t3: ColumnFromT1, ColumnFromT2.
When Y is less than X, the T2 values get repeated and spread over all my other values, but I want to get NULL once all the rows from T2 have been used.
How can I achieve that?
Thanks

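You could give each table a row number in a subquery and then left join on that row number; matching the row numbers directly leaves NULL once the second table runs out. To recycle rows from the second table instead, take the modulus (%) of the first table's row number by the second table's row count.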
Example:
select Sub1.col1
, Sub2.col1
from (
select row_number() over (order by col1) as rn
, *
from #T1
) Sub1
left join
(
select row_number() over (order by col1) as rn
, *
from #T2
) Sub2
on (Sub1.rn - 1) % (select count(*) from #T2) + 1 = Sub2.rn
Test data:
create table #t1 (col1 int)
create table #t2 (col1 datetime)
insert #t1 values (1), (2), (3), (4), (5)
insert #t2 values ('2010-01-01'), ('2012-02-02')
This prints:
1 2010-01-01
2 2012-02-02
3 2010-01-01
4 2012-02-02
5 2010-01-01
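As a rough, runnable sketch of both variants (NULL padding versus recycling), here is the same idea using Python's built-in sqlite3 module. SQLite is an assumption here, standing in for SQL Server, and it needs a build with window-function support (3.25 or later). The table names t1 and t2 mirror the test data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 INTEGER);
    CREATE TABLE t2 (col1 TEXT);
    INSERT INTO t1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO t2 VALUES ('2010-01-01'), ('2012-02-02');
""")

# Number the rows of each table, then LEFT JOIN on a row-number condition.
numbered = """
    SELECT s1.col1, s2.col1
    FROM (SELECT row_number() OVER (ORDER BY col1) AS rn, col1 FROM t1) AS s1
    LEFT JOIN (SELECT row_number() OVER (ORDER BY col1) AS rn, col1 FROM t2) AS s2
      ON {condition}
    ORDER BY s1.rn
"""

# Direct match: each t2 row is used once, then the t2 column is NULL (None).
padded = conn.execute(numbered.format(condition="s1.rn = s2.rn")).fetchall()

# Modulus match: t2 rows are recycled, as in the answer above.
recycled = conn.execute(numbered.format(
    condition="(s1.rn - 1) % (SELECT count(*) FROM t2) + 1 = s2.rn"
)).fetchall()

for row in padded:
    print("padded  ", row)
for row in recycled:
    print("recycled", row)
```

The only difference between the two results is the join condition, so switching between "NULL when T2 is used up" and "repeat T2" is a one-line change.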

You are looking for a LEFT JOIN (http://www.w3schools.com/sql/sql_join_left.asp), e.g. T1 LEFT JOIN T2.
Say they both have a CustomerID column in common:
SELECT *
FROM T1
LEFT JOIN
T2 on t1.CustomerId = T2.CustomerId
This will return all records in T1 and those that match in T2 with nulls for the T2 values where they do not match.
Make sure you are joining the tables on a common column (or a common set of columns if more than one is necessary to perform the join). If not, you are doing a Cartesian join ( http://ezinearticles.com/?What-is-a-Cartesian-Join?&id=3560672 )
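To make the difference concrete, here is a small sketch using Python's sqlite3 module (an assumption; the thread is about SQL Server). The Name and City columns and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (CustomerId INTEGER, Name TEXT);
    CREATE TABLE T2 (CustomerId INTEGER, City TEXT);
    INSERT INTO T1 VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO T2 VALUES (1, 'Oslo'), (3, 'Rome');
""")

# LEFT JOIN on the common column: every T1 row appears once,
# with NULL (None) for City where T2 has no matching CustomerId.
left_rows = conn.execute("""
    SELECT T1.CustomerId, T1.Name, T2.City
    FROM T1
    LEFT JOIN T2 ON T1.CustomerId = T2.CustomerId
    ORDER BY T1.CustomerId
""").fetchall()

# No join condition: every T1 row is paired with every T2 row.
cross_rows = conn.execute(
    "SELECT T1.Name, T2.City FROM T1 CROSS JOIN T2"
).fetchall()

print(left_rows)
print(len(cross_rows), "rows from the Cartesian join")
```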

Related

SQL server - Finding duplicates on linked servers

I need to find duplicates of the same database/table/column on two linked SQL servers.
Note that the column may also have duplicates inside the table itself on each individual SQL server!
i.e.
server1.tableName.columnName:
john
john
mary
kate
kate
server2.tableName.columnName:
kate
I want the result in this case to be kate, as it is the only entry that exists in both.
I tried this:
select table1.columnName, table2.columnName, count(*)
from [server1].[dbName].[dbo].[tableName] table1
inner join [server2].[dbName].[dbo].[tableName] table2
ON table1.columnName = table2.columnName
group by table1.columnName, table2.columnName having count(table1.columnName) > 1
Which gives a set of results
My question is: is this correct? Will I get an entry for any value in columnName that exists in dbName.tableName on both server1 and server2?
Will I get an entry for any value in columnName that exists in dbName.tableName on both server1 and server2?
Not exactly. This would do what you want without the having clause; the join already determines whether anything matches.
If you can leave the count out entirely, an alternative formulation uses exists:
select t1.columnName
from [server1].[dbName].[dbo].[tableName] t1
where exists (select 1
from [server2].[dbName].[dbo].[tableName] t2
where t1.columnName = t2.columnName
);
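Here is a quick sketch of the EXISTS version against the sample data from the question, using Python's sqlite3 module. Two local tables, server1_table and server2_table, are hypothetical stand-ins for the linked-server tables. One thing it shows is that EXISTS returns one row per matching row of the first table, so a value duplicated on server1 comes back more than once unless you add DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE server1_table (columnName TEXT);
    CREATE TABLE server2_table (columnName TEXT);
    INSERT INTO server1_table VALUES ('john'), ('john'), ('mary'), ('kate'), ('kate');
    INSERT INTO server2_table VALUES ('kate');
""")

exists_sql = """
    SELECT {distinct} t1.columnName
    FROM server1_table t1
    WHERE EXISTS (SELECT 1
                  FROM server2_table t2
                  WHERE t1.columnName = t2.columnName)
"""

# One row per matching server1 row: kate appears twice.
exists_rows = conn.execute(exists_sql.format(distinct="")).fetchall()

# DISTINCT collapses the repeats to a single kate.
distinct_rows = conn.execute(exists_sql.format(distinct="DISTINCT")).fetchall()

print(exists_rows)
print(distinct_rows)
```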
No, you'd only get results that are duplicated within one of the two servers and are present in both.
The linked servers are irrelevant here; your question works the same on any two tables. I believe you're asking for one of the last two queries below.
CREATE TABLE #table1 (Id INT)
CREATE TABLE #table2 (Id INT)
INSERT INTO #table1 (Id)
VALUES (1), (2), (3)
INSERT INTO #table2 (Id)
VALUES (2), (2), (3), (4), (5), (5)
-- original query - 1 result
select table1.Id, table2.Id, count(*)
from #table1 table1
inner join #table2 table2
ON table1.Id = table2.Id
group by table1.Id, table2.Id having count(table1.Id) > 1
-- cross table duplicates - 2 results
select table1.id
from #table1 table1
where exists (select 1 from #table2 table2 where table1.Id = table2.Id)
-- cross/within table duplicates - 3 results
select unioned.Id
from (
select table1.Id
from #table1 table1
union all
select table2.Id
from #table2 table2
) unioned
group by unioned.Id
having count(*) > 1
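To check the result counts given in the comments above, here is the same trio of queries as a sketch in Python's sqlite3 module (SQLite is assumed here instead of SQL Server, and plain tables replace the temp tables).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER);
    CREATE TABLE table2 (Id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (2), (3), (4), (5), (5);
""")

# Original query: only values whose joined row count exceeds 1.
original = conn.execute("""
    SELECT a.Id, b.Id, count(*)
    FROM table1 a
    INNER JOIN table2 b ON a.Id = b.Id
    GROUP BY a.Id, b.Id
    HAVING count(a.Id) > 1
""").fetchall()

# Values of table1 that also exist in table2.
cross_table = conn.execute("""
    SELECT a.Id
    FROM table1 a
    WHERE EXISTS (SELECT 1 FROM table2 b WHERE a.Id = b.Id)
    ORDER BY a.Id
""").fetchall()

# Values that occur more than once across both tables combined.
cross_or_within = conn.execute("""
    SELECT u.Id
    FROM (SELECT Id FROM table1 UNION ALL SELECT Id FROM table2) u
    GROUP BY u.Id
    HAVING count(*) > 1
    ORDER BY u.Id
""").fetchall()

print(original, cross_table, cross_or_within, sep="\n")
```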
You can left join the two tables on the column name, then select the column name from each table and exclude the null values. This will show duplicates in the output if they exist; otherwise, use DISTINCT to remove the duplicates.

Is there a linear interpolation SQL query?

I have two tables. Table1 has data recorded at 10 sec intervals, and the data in Table2 was recorded at 1 or 2 sec intervals. I want to join these two tables so that the query selects all the data from Table1, joined with the row of Table2 whose recording time matches or, failing that, is nearest to the recording time in Table1.
For example, one row in Table1 was recorded at 21:11:20. This row should be joined with a row in Table2 recorded at 21:11:20 if it exists; otherwise, it should select the nearest row, let's say a row at 21:11:19.
Thank you.
You could try:
select t1.* from table1 t1 inner join table2 t2 on date_trunc('sec',t1.val) = t2.val;
Databases that allow for function indexes will do slightly better with this query than those without.
It takes a little effort, but it is clearly doable (here: SQL-Server):
-- set up a test environment with two tables:
create table tbl1 ( i1 int identity primary key, t1 time, v1 float);
create table tbl10 (i10 int identity primary key, t10 time, v10 float);
-- fill them with some test values:
insert into tbl1 (t1,v1) VALUES ('12:00:06',1),('12:00:07',2),('12:00:08',3),('12:00:09',3),('12:00:10',2),('12:00:11',1),('12:00:12',0);
insert into tbl10 (t10,v10) VALUES ('12:00:00',99),('12:00:10',100),('12:00:20',98),('12:00:30',110);
-- and "join" them with interpolation
WITH t10s AS (
SELECT i10, t10, v10, DATEDIFF(second,0,t10) s10 FROM tbl10
)
SELECT t1,v1, v10a*f+v10b*(1.-f) v10int FROM (
SELECT t1, v1, CAST(b-DATEDIFF(second,0.,t1) AS FLOAT)/(b-a) f, ta.v10 v10a, tb.v10 v10b
FROM (
select t1, v1,
(SELECT max(s10) FROM t10s WHERE t10<=t1) a,
(SELECT min(s10) FROM t10s WHERE t10> t1) b
FROM tbl1
) tmp1
INNER JOIN t10s ta ON ta.s10=a
INNER JOIN t10s tb ON tb.s10=b
) tmp2
-- output:
t1 v1 v10int
12:00:06 1 99.6
12:00:07 2 99.7
12:00:08 3 99.8
12:00:09 3 99.9
12:00:10 2 100.0
12:00:11 1 99.8
12:00:12 0 99.6
see the little demo here: https://rextester.com/PQDH85753
In my rudimentary script, every t1 value had one t10 value at or below it and another above it. To protect against going out of range, you could use the COALESCE() function with a default value that applies outside the range.
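The arithmetic behind the query can be sketched in plain Python, which may make the weighting factor f and the out-of-range case easier to follow. The helper names to_seconds and interpolate are made up for this illustration; the data is the tbl10 test data from above.

```python
from bisect import bisect_right


def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s


def interpolate(coarse, t, default=None):
    """Linearly interpolate the coarse series at time t (in seconds).

    coarse is a list of (seconds, value) pairs sorted by seconds.
    Returns default when t has no sample at or below it, or none above it.
    """
    times = [s for s, _ in coarse]
    i = bisect_right(times, t)          # index of first sample with time > t
    if i == 0 or i == len(coarse):
        return default                  # out of range, like COALESCE(..., default)
    (a, va), (b, vb) = coarse[i - 1], coarse[i]
    f = (b - t) / (b - a)               # weight of the earlier sample
    return va * f + vb * (1.0 - f)


coarse = [(to_seconds(t), v) for t, v in
          [("12:00:00", 99), ("12:00:10", 100), ("12:00:20", 98), ("12:00:30", 110)]]

for t in ["12:00:06", "12:00:10", "12:00:11", "12:00:45"]:
    print(t, interpolate(coarse, to_seconds(t), default=-1.0))
```

As in the SQL, a is the latest coarse time at or before t and b is the earliest one after it, so a time equal to the last coarse sample also counts as out of range.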

If one join works per rep id, don't join next

I am matching two datasets that I imported into a Redshift DB: both are at rep id level.
This is my initial query to match the two datasets:
select *
from #t t
join #t2 t2
on lower(trim(t.unique_id))=lower(trim(t2.unique_id))
or lower(trim(t.email))=lower(trim(t2.email))
or lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1)))
#t is the source of truth I am matching to, and unique_id is supposedly the universal identifier for rep id (the internal identifier), though it only matches about 60% of the time. However, in some cases the #t2 table (incorrectly) has multiple unique_ids per rep, and (incorrectly) multiple emails.
How can I change it so that it is more restrictive, i.e. when a match is found by unique_id, don't match the next record for that rep; when matching by email, don't match the next record for that rep; and lastly join by first name/last name?
Thank you!
I think there are a few ways to skin this cat. As one option you could add a rank for each join as a CASE statement, and then pick out the one that has the min rank:
SELECT *
FROM
(
SELECT *,
min(ranktest) OVER (PARTITION BY src_unique_id) as minrank
FROM
(
select *,
t.unique_id as src_unique_id, -- unambiguous name for #t's key
CASE WHEN lower(trim(t.unique_id))=lower(trim(t2.unique_id)) THEN 1
WHEN lower(trim(t.email))=lower(trim(t2.email)) THEN 2
WHEN lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1))) THEN 3
END as ranktest
from #t t
join #t2 t2
on lower(trim(t.unique_id))=lower(trim(t2.unique_id))
or lower(trim(t.email))=lower(trim(t2.email))
or lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1)))
) sub1
) sub2
WHERE ranktest = minrank;
You could also do this by querying twice, once to get your data and once to get the min(ranktest). It will almost certainly be slower, but it's a little prettier:
WITH subquery AS
(
select *,
t.unique_id as src_unique_id, -- unambiguous name for #t's key
CASE WHEN lower(trim(t.unique_id))=lower(trim(t2.unique_id)) THEN 1
WHEN lower(trim(t.email))=lower(trim(t2.email)) THEN 2
WHEN lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1))) THEN 3
END as ranktest
from #t t
join #t2 t2
on lower(trim(t.unique_id))=lower(trim(t2.unique_id))
or lower(trim(t.email))=lower(trim(t2.email))
or lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1)))
)
SELECT *
FROM subquery t1
WHERE t1.ranktest = (SELECT min(s.ranktest) FROM subquery s WHERE s.src_unique_id = t1.src_unique_id)
Alternatively, you could run this as a UNION ALL, testing for the join differently each time to avoid repeats and only allowing the top most ranked join through:
select *
from #t t
join #t2 t2
on lower(trim(t.unique_id))=lower(trim(t2.unique_id))
UNION ALL
select *
from #t t
join #t2 t2
on lower(trim(t.unique_id))<>lower(trim(t2.unique_id))
AND lower(trim(t.email))=lower(trim(t2.email))
UNION ALL
select *
FROM #t t
join #t2 t2
ON lower(trim(t.unique_id))<>lower(trim(t2.unique_id))
AND lower(trim(t.email))<>lower(trim(t2.email))
AND lower(trim(split_part(t.first_name,',',1))||trim(split_part(t.last_name,',',1)))=lower(trim(split_part(t2.first_name,',',1))||trim(split_part(t2.last_name,',',1)))
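To see the "keep only the best-ranked match per rep" idea run end to end, here is a small sketch using Python's sqlite3 module. This is a simplification under stated assumptions: SQLite stands in for Redshift, split_part is dropped because SQLite does not have it, the CTE names candidates and ranked are made up, and row_id is a hypothetical column added only to tell the #t2 rows apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (unique_id TEXT, email TEXT, first_name TEXT, last_name TEXT);
    CREATE TABLE t2 (row_id INTEGER, unique_id TEXT, email TEXT,
                     first_name TEXT, last_name TEXT);
    INSERT INTO t VALUES ('U1', 'a@x.com', 'Ann', 'Lee'),
                         ('U2', 'b@x.com', 'Bob', 'Ray');
    INSERT INTO t2 VALUES
        (1, 'u1 ', 'other@x.com', 'Zed', 'Zed'),  -- matches U1 by unique_id
        (2, 'U9',  'A@X.com',     'Ann', 'Lee'),  -- matches U1 by email only
        (3, 'U8',  'zzz@x.com',   'bob', 'ray');  -- matches U2 by name only
""")

best_matches = conn.execute("""
    WITH candidates AS (
        SELECT t.unique_id AS src_unique_id,
               t2.row_id   AS t2_row,
               CASE WHEN lower(trim(t.unique_id)) = lower(trim(t2.unique_id)) THEN 1
                    WHEN lower(trim(t.email)) = lower(trim(t2.email)) THEN 2
                    ELSE 3
               END AS ranktest
        FROM t
        JOIN t2
          ON lower(trim(t.unique_id)) = lower(trim(t2.unique_id))
          OR lower(trim(t.email)) = lower(trim(t2.email))
          OR lower(trim(t.first_name) || trim(t.last_name))
             = lower(trim(t2.first_name) || trim(t2.last_name))
    ),
    ranked AS (
        SELECT candidates.*,
               min(ranktest) OVER (PARTITION BY src_unique_id) AS minrank
        FROM candidates
    )
    SELECT src_unique_id, t2_row, ranktest
    FROM ranked
    WHERE ranktest = minrank
    ORDER BY src_unique_id, t2_row
""").fetchall()

print(best_matches)
```

Rep U1 has both a unique_id match and an email match, and only the unique_id match survives; rep U2 has only a name match, so that one is kept.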

Equivalent Subquery for a Join

I am looking for an answer to the following question:
Is it possible to rewrite every join as an equivalent subquery?
I know that subquery columns cannot be selected in the outer query.
I ran these queries in SQL Server:
select DISTINCT A.*, B.ParentProductCategoryID
from [SalesLT].[Product] as A
inner join [SalesLT].[ProductCategory] as B
on A.ProductCategoryID=B.ProductCategoryID
select A.*
from [SalesLT].[Product] as A
where EXISTS(select B.ParentProductCategoryID
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID)
Both of these queries give me 293 rows of output, which is what I expected.
Now the problem is: how do I select the [SalesLT].[ProductCategory] column in the second case?
Do I need to correlate this subquery in the select clause to get this column shown in the output?
Is it possible to rewrite every join as an equivalent subquery?
No, because joins can 1) remove rows or 2) multiply rows.
Ex. 1)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3)
SELECT t1.num AS t1num, t2.num AS t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
The row containing value 1 from t1 was removed. This does not happen with a subquery in the SELECT list.
ex 2)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3), (3), (3), (3)
SELECT t1.num AS t1num, t2.num as t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
3 3
3 3
3 3
A subquery in the SELECT list would not change the number of rows returned from the table being queried.
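The row-count behaviour from the two examples can be checked side by side with a short sketch in Python's sqlite3 module (SQLite is an assumption here; the thread is about SQL Server). The scalar subquery uses max() so that it always yields a single value; without an aggregate, SQL Server raises an error when such a subquery returns more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (num INTEGER);
    CREATE TABLE t2 (num INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (3), (3), (3);
""")

# INNER JOIN: the row for 1 disappears and the row for 3 is multiplied.
join_rows = conn.execute("""
    SELECT t1.num AS t1num, t2.num AS t2num
    FROM t1 INNER JOIN t2 ON t1.num = t2.num
    ORDER BY t1.num
""").fetchall()

# EXISTS in the WHERE clause: filters t1 but never multiplies its rows.
exists_rows = conn.execute("""
    SELECT t1.num
    FROM t1
    WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.num = t1.num)
    ORDER BY t1.num
""").fetchall()

# Scalar subquery in the SELECT list: always one output row per t1 row,
# with NULL (None) where nothing matches.
scalar_rows = conn.execute("""
    SELECT t1.num,
           (SELECT max(t2.num) FROM t2 WHERE t2.num = t1.num) AS t2num
    FROM t1
    ORDER BY t1.num
""").fetchall()

print(len(join_rows), len(exists_rows), len(scalar_rows))
```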
In your example, you use EXISTS, which is not going to return the value from the second table.
This is how I would subquery:
select A.*
,(SELECT B.ParentProductCategoryID
FROM [SalesLT].[ProductCategory] B
WHERE B.ProductCategoryID = A.ProductCategoryID) AS [2nd table ProductCategoryID]
from [SalesLT].[Product] as A
You might use:
select A.*,
(
select B.ParentProductCategoryID
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID
) ParentProductCategoryID
from [SalesLT].[Product] as A
where EXISTS(select 1
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID)
However, I find the JOIN version much more intuitive.
There is no way for you to use any data from the EXISTS subquery in the outer query. The only purpose of the subquery is to evaluate whether the EXISTS is true or false for each product.

SQL Server Simple Problem

EDIT: I worded it a bit wrong; I will change my question.
I'm a newbie with SQL and I have a question.
I made 2 temp tables.
Each has 25 rows (date values).
I want to combine these 2 tables into a third table.
The first table is [From].
The second table is [To].
Both tables have different values.
I want to get it like this:
From| To |
1111|2222
2222|3333
3333|4444
etc..
I use this simple Query
Create Table #T3
(
[From] Datetime
,[To] Datetime
)
INSERT Into #T3
SELECT Distinct #T1.[From], #T2.[To]
From #T1,#T2
Where #T1.[From] is not null
And #T2.[To] is not null
Select * from #T3
Drop Table #T3
Drop Table #T2
Drop Table #T1
But my results are like this
From| To |
1111|1111
1111|2222
1111|3333
2222|1111
2222|2222
2222|3333
It multiplies the first field with the second, which gives me a lot more records back.
Any help?
Thanks!
After the OP's edit
This may work as you want (which is not entirely clear):
INSERT INTO #T3
SELECT #T1.[From]
, MIN(#T2.[To])
FROM #T1
JOIN #T2
ON #T1.[From] < #T2.[To]
GROUP BY #T1.[From]
Using
FROM T1, T2
results in all combinations of rows of T1 and T2. This is called a cross product and is (properly) written with CROSS JOIN, like this:
FROM T1 CROSS JOIN T2
When you want to join the two tables based on a condition (and not get the cross product), you use a JOIN or INNER JOIN (these two are the same thing):
FROM T1 JOIN T2
ON T1.[From] = T2.[To]
will get you all row combinations where T1.From matches T2.To (on equality). I suppose you wanted to match every row of T1 with the row of T2 where T2.To was just larger than T1.From, so I used the "less than" operator < instead of the equality operator =.
The GROUP BY and MIN() were added to get only the one with smallest T2.To from those rows.
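Here is a small sketch of the difference, using Python's sqlite3 module. SQLite is an assumption here, standing in for SQL Server, and integers stand in for the datetime values, using the 1111/2222/3333 placeholders from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 ([From] INTEGER);
    CREATE TABLE T2 ([To] INTEGER);
    INSERT INTO T1 VALUES (1111), (2222), (3333);
    INSERT INTO T2 VALUES (2222), (3333), (4444);
""")

# FROM T1, T2 with no join condition: every combination of rows.
cross_count = conn.execute("SELECT count(*) FROM T1, T2").fetchone()[0]

# Join each From to the smallest To that is larger than it.
paired = conn.execute("""
    SELECT T1.[From], MIN(T2.[To])
    FROM T1
    JOIN T2 ON T1.[From] < T2.[To]
    GROUP BY T1.[From]
    ORDER BY T1.[From]
""").fetchall()

print(cross_count)
print(paired)
```

With three rows in each table the cross product already has nine rows, while the grouped join returns one row per From value.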
It would do that. It inserts a copy of table 2 for each row of table 1, because you didn't say how the rows should be matched to extract what you want.
Now, assuming the From and To values are the same, you can do:
INSERT Into #T3
SELECT Distinct #T1.[From], #T2.[To]
From #T1 left join #T2 on #T1.[From]=#T2.[To]
Where #T1.[From] is not null
If this isn't what you mean (having the same value in both columns would seem counterproductive), what other fields have you got, and how would you tie the rows together?