LEFT JOIN on multiple columns with unwanted duplicates - sql

I have been running in circles with a query that is driving me nuts.
The background:
I have two tables, and unfortunately, both have duplicate records. (I am dealing with activity logs, if that puts it into perspective.) Each table comes from a different system and I am trying to join the data together to get a pseudo-full picture. (I realize that I won't get a perfect view because there is no "event key" shared between the two systems; I am attempting to match on a composite of metadata.)
Here is what I am working with:
Table1
------------
JobID CustID Name ActionDate IsDuplicate
12345 11111 Ryan 1/1/2015 01:20:20 False
12345 11112 Bob 1/1/2015 02:10:20 False
12345 11111 Ryan 1/1/2015 04:15:35 True
12346 11113 Jim 1/1/2015 05:10:40 False
12346 11114 Jeb 1/1/2015 06:10:40 False
12346 11111 Ryan 1/1/2015 07:10:30 False
Table2
------------
ResponseID CustID ActionDate Browser
11123 10110 12/1/2014 23:32:15 IE
12345 11111 1/1/2015 03:20:20 IE
12345 11112 1/1/2015 05:10:20 Firefox
12345 11111 1/1/2015 06:15:35 Firefox
12346 11113 1/1/2015 07:10:40 Chrome
12346 11114 1/1/2015 08:10:40 Chrome
12346 11111 1/1/2015 10:10:30 Safari
12213 11123 2/1/2015 01:10:30 Chrome
Please note a few things:
- JobID and ResponseID are the same thing
- JobID and ResponseID are indicators of an event on the site (people are responding to an event)
- Action date does not match (system 2 has an inconsistent delay of about 2 hours, but never more than 3 hours)
- Note Table2 doesn't have a duplicate flag
- table 1 (~2,000 records) is significantly smaller than table 2 (~16,000 records)
- Note Cust 11111 is bopping around on browsers, taking the same action twice on job 12345 at different times and only taking action once on job 12346
What I am looking for:
Result (ideal)
------------
t1.JobID t1.CustID t1.Name t1.ActionDate t2.Browser
12345 11111 Ryan 1/1/2015 01:20:20 IE
12345 11112 Bob 1/1/2015 02:10:20 Firefox
12345 11111 Ryan 1/1/2015 04:15:35 Firefox
12346 11113 Jim 1/1/2015 05:10:40 Chrome
12346 11114 Jeb 1/1/2015 06:10:40 Chrome
12346 11111 Ryan 1/1/2015 07:10:30 Safari
Note that I JUST want matches for records in Table1. I am getting tons of duplicates because of the join, which is frustrating.
Here is what I have so far (which, I can humbly say, isn't really close):
SELECT
t1.JobID,
t1.CustID,
t1.Name,
t1.ActionDate,
t2.Browser
FROM
Table1 t1
LEFT OUTER JOIN
Table2 t2
ON
t1.JobID=t2.ResponseID AND
t1.CustID=t2.CustID AND
DATEPART(dd,t1.ActionDate)=DATEPART(dd,t2.ActionDate)

Try changing the date part of the join condition so that it checks that t2.ActionDate fulfills t1.ActionDate <= t2.ActionDate <= t1.ActionDate + 3 hours:
SELECT
t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
FROM
Table1 t1
LEFT JOIN Table2 t2
ON t1.JobID = t2.ResponseID
AND t1.CustID = t2.CustID
AND t2.ActionDate >= t1.ActionDate
AND t2.ActionDate <= DATEADD(hour, 3, t1.ActionDate)
ORDER BY t1.JobID , t1.ActionDate;
With your sample data the result of this query matches your desired result.
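For anyone who wants to try the date-window join without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the original T-SQL: SQLite's datetime(x, '+3 hours') stands in for DATEADD(hour, 3, x), the sample dates are rewritten as ISO strings on the assumption that they are month/day/year, and the function name match_within_three_hours is made up for illustration.

```python
import sqlite3

def match_within_three_hours():
    """Return Table1 rows with the Browser of a Table2 row logged 0-3 hours later."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (JobID INT, CustID INT, Name TEXT, ActionDate TEXT);
        CREATE TABLE Table2 (ResponseID INT, CustID INT, ActionDate TEXT, Browser TEXT);
        INSERT INTO Table1 VALUES
            (12345, 11111, 'Ryan', '2015-01-01 01:20:20'),
            (12345, 11112, 'Bob',  '2015-01-01 02:10:20'),
            (12345, 11111, 'Ryan', '2015-01-01 04:15:35'),
            (12346, 11113, 'Jim',  '2015-01-01 05:10:40'),
            (12346, 11114, 'Jeb',  '2015-01-01 06:10:40'),
            (12346, 11111, 'Ryan', '2015-01-01 07:10:30');
        INSERT INTO Table2 VALUES
            (11123, 10110, '2014-12-01 23:32:15', 'IE'),
            (12345, 11111, '2015-01-01 03:20:20', 'IE'),
            (12345, 11112, '2015-01-01 05:10:20', 'Firefox'),
            (12345, 11111, '2015-01-01 06:15:35', 'Firefox'),
            (12346, 11113, '2015-01-01 07:10:40', 'Chrome'),
            (12346, 11114, '2015-01-01 08:10:40', 'Chrome'),
            (12346, 11111, '2015-01-01 10:10:30', 'Safari'),
            (12213, 11123, '2015-02-01 01:10:30', 'Chrome');
    """)
    # ISO-formatted text sorts chronologically, so plain string comparison
    # is enough for the lower and upper bound of the window.
    rows = con.execute("""
        SELECT t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
        FROM Table1 t1
        LEFT JOIN Table2 t2
          ON t1.JobID = t2.ResponseID
         AND t1.CustID = t2.CustID
         AND t2.ActionDate >= t1.ActionDate
         AND t2.ActionDate <= datetime(t1.ActionDate, '+3 hours')
        ORDER BY t1.JobID, t1.ActionDate
    """).fetchall()
    con.close()
    return rows

for row in match_within_three_hours():
    print(row)
```

On the sample data this returns one row per Table1 record. If the same customer acted twice on the same job within three hours, a Table1 row could still match more than one Table2 row, so the window is a heuristic rather than a guarantee.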

One method is to enumerate each table using row_number() and match on the sequence numbers as well:
select t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
from (select t1.*,
row_number() over (partition by JobId, CustId order by ActionDate) as seqnum
from Table1 t1
) t1 join
(select t2.*,
row_number() over (partition by ResponseId, CustId order by ActionDate) as seqnum
from Table2 t2
) t2
on t1.JobId = t2.ResponseId and
t1.CustId = t2.CustId and
t1.seqnum = t2.seqnum;
This works for your sample data. However, if there is not a response for every job, then the alignment might get out of whack. If that is a possibility, then date arithmetic might be the better solution.
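The alignment caveat can be seen on a cut-down version of the sample data. The sketch below is an illustration under some assumptions: it runs on Python's built-in sqlite3 (window functions need SQLite 3.25 or later), it uses a LEFT JOIN instead of the inner join so unmatched Table1 rows stay visible, and the names match_by_sequence, aligned and shifted are invented for this example.

```python
import sqlite3

# Job 12345 only, dates rewritten as ISO strings (assumed month/day/year).
T1 = [
    (12345, 11111, "Ryan", "2015-01-01 01:20:20"),
    (12345, 11112, "Bob", "2015-01-01 02:10:20"),
    (12345, 11111, "Ryan", "2015-01-01 04:15:35"),
]
T2 = [
    (12345, 11111, "2015-01-01 03:20:20", "IE"),
    (12345, 11112, "2015-01-01 05:10:20", "Firefox"),
    (12345, 11111, "2015-01-01 06:15:35", "Firefox"),
]

QUERY = """
SELECT t1.Name, t1.ActionDate, t2.Browser
FROM (SELECT Table1.*,
             ROW_NUMBER() OVER (PARTITION BY JobID, CustID
                                ORDER BY ActionDate) AS seqnum
      FROM Table1) t1
LEFT JOIN (SELECT Table2.*,
                  ROW_NUMBER() OVER (PARTITION BY ResponseID, CustID
                                     ORDER BY ActionDate) AS seqnum
           FROM Table2) t2
  ON t1.JobID = t2.ResponseID
 AND t1.CustID = t2.CustID
 AND t1.seqnum = t2.seqnum
ORDER BY t1.ActionDate
"""

def match_by_sequence(table2_rows):
    """Pair the n-th Table1 event with the n-th Table2 event per job and customer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (JobID INT, CustID INT, Name TEXT, ActionDate TEXT)")
    con.execute("CREATE TABLE Table2 (ResponseID INT, CustID INT, ActionDate TEXT, Browser TEXT)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", T1)
    con.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?)", table2_rows)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

aligned = match_by_sequence(T2)      # every event has a response
shifted = match_by_sequence(T2[1:])  # Ryan's first response is missing
print(aligned)
print(shifted)
```

With the full data each event gets the expected browser. Once Ryan's first response is dropped, his first event picks up the Firefox response that belongs to his second event, and the second event is left without a match, which is the "out of whack" case described above.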

Related

Types of joins and expected output

I have a table that has wholesale data and retail data.
the data is structured as
Channel    Serial#  Date
WS-Build   12345    1/1/2019
WS-Dealer  34567    1/5/2021
Retail     12345    1/1/2020
Retail     34567    3/5/2021
I would like the output to match on serial#
Each serial # will appear twice in the table. I am trying to get a count of # of units sold via builder or dealer.
Serial#  Channel    WholesaleDate  Retail Date
12345    WS-Build   1/1/2019       1/1/2020
34567    WS-Dealer  1/5/2021       3/5/2021
How can I achieve that by joining the table to itself?
Try joining on serial and channel:
select t1.serial#, t2.WholesaleDate, t2."Retail Date", count(*) from table1 t1
join table2 t2 on t1.serial# = t2.serial# and t1.channel = t2.channel
group by t1.serial#, t2.WholesaleDate, t2."Retail Date";
As long as the retail date is after the wholesale date, you can use the self-join below, although I don't see where the counts come in:
SELECT
t1."Serial#",t1."Channel", t1."Date" as WholesaleDate, t2."Date" as "Retail Date"
FROM tab1 t1 JOIN tab1 t2 ON t1."Serial#" = t2."Serial#" AND t1."Date" < t2."Date"
Serial#  Channel    wholesaledate        Retail Date
12345    WS-Build   2019-01-01 00:00:00  2020-01-01 00:00:00
34567    WS-Dealer  2021-05-01 00:00:00  2021-05-03 00:00:00
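As for where the counts could come in, one possible reading is "number of units per wholesale channel". The sketch below pairs each wholesale row with its retail row by matching on the channel name instead of comparing dates, then groups by channel. It runs on Python's built-in sqlite3; the table name tab1 follows the answer above, the ISO dates assume the question's dates are month/day/year, and the function name units_per_channel is invented for illustration.

```python
import sqlite3

def units_per_channel():
    """Pair wholesale and retail rows per serial, then count units per channel."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tab1 ("Channel" TEXT, "Serial#" INT, "Date" TEXT);
        INSERT INTO tab1 VALUES
            ('WS-Build',  12345, '2019-01-01'),
            ('WS-Dealer', 34567, '2021-01-05'),
            ('Retail',    12345, '2020-01-01'),
            ('Retail',    34567, '2021-03-05');
    """)
    # w is the wholesale row, r is the retail row for the same serial number.
    pairs = con.execute("""
        SELECT w."Serial#", w."Channel", w."Date" AS WholesaleDate, r."Date" AS RetailDate
        FROM tab1 w
        JOIN tab1 r
          ON r."Serial#" = w."Serial#"
         AND r."Channel" = 'Retail'
        WHERE w."Channel" <> 'Retail'
        ORDER BY w."Serial#"
    """).fetchall()
    # Same pairing, aggregated to one row per wholesale channel.
    counts = con.execute("""
        SELECT w."Channel", COUNT(*) AS units
        FROM tab1 w
        JOIN tab1 r
          ON r."Serial#" = w."Serial#"
         AND r."Channel" = 'Retail'
        WHERE w."Channel" <> 'Retail'
        GROUP BY w."Channel"
        ORDER BY w."Channel"
    """).fetchall()
    con.close()
    return pairs, counts

pairs, counts = units_per_channel()
print(pairs)
print(counts)
```

Matching on the channel avoids relying on the retail date always being later than the wholesale date, at the cost of hard-coding the 'Retail' label.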

Join records only on first match

I'm trying to join two tables. I only want the first matching row to be joined; the others have to be null.
One of the tables contains daily records per User and the second table contains the goal for each user and day.
The joined result table should only join the first occurrence of User and Day and set the others to null. The Goal in the joined table can be interpreted as DailyGoal.
Example:
Table1 Table2
Id Day User Value Id Day User Goal
================================ ============================
01 01/01/2020 Bob 100 01 01/01/2020 Bob 300
02 01/01/2020 Bob 150 02 02/01/2020 Carl 170
03 01/01/2020 Bob 50
04 02/01/2020 Carl 200
05 02/01/2020 Carl 30
ResultTable
Day User Value Goal
============================================
01/01/2020 Bob 100 300
01/01/2020 Bob 150 (null)
01/01/2020 Bob 50 (null)
02/01/2020 Carl 200 170
02/01/2020 Carl 30 (null)
I tried TOP 1, DISTINCT, and subqueries, but I can't find a way to do it. Is this possible?
One option uses window functions:
select t1.*, t2.goal
from (
select t1.*,
row_number() over(partition by day, user order by id) as rn
from table1 t1
) t1
left join table2 t2 on t2.day = t1.day and t2.user = t1.user and t1.rn = 1
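A case expression is even simpler (it still needs the join to table2):
select t1.*,
case when row_number() over(partition by t1.day, t1.user order by t1.id) = 1
then t2.goal
end as goal
from table1 t1
left join table2 t2 on t2.day = t1.day and t2.user = t1.user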
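To check the case-expression variant against the sample tables, here is a small sketch on Python's built-in sqlite3 (window functions need SQLite 3.25 or later). The column is quoted as "User" only because that word is reserved in some other databases; the function name first_match_only is an assumption made for this example.

```python
import sqlite3

def first_match_only():
    """Attach the daily goal to the first row per (Day, User) and NULL to the rest."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (Id INT, Day TEXT, "User" TEXT, Value INT);
        CREATE TABLE table2 (Id INT, Day TEXT, "User" TEXT, Goal INT);
        INSERT INTO table1 VALUES
            (1, '01/01/2020', 'Bob',  100),
            (2, '01/01/2020', 'Bob',  150),
            (3, '01/01/2020', 'Bob',   50),
            (4, '02/01/2020', 'Carl', 200),
            (5, '02/01/2020', 'Carl',  30);
        INSERT INTO table2 VALUES
            (1, '01/01/2020', 'Bob',  300),
            (2, '02/01/2020', 'Carl', 170);
    """)
    # Every table1 row joins to its goal; the CASE keeps the goal only on
    # the row numbered 1 within each (Day, User) group.
    rows = con.execute("""
        SELECT t1.Day, t1."User", t1.Value,
               CASE WHEN ROW_NUMBER() OVER (PARTITION BY t1.Day, t1."User"
                                            ORDER BY t1.Id) = 1
                    THEN t2.Goal
               END AS Goal
        FROM table1 t1
        LEFT JOIN table2 t2
          ON t2.Day = t1.Day
         AND t2."User" = t1."User"
        ORDER BY t1.Id
    """).fetchall()
    con.close()
    return rows

for row in first_match_only():
    print(row)
```

This relies on table2 holding at most one goal per user and day; with several goals per day the join itself would multiply rows before the CASE is applied.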

Sql query to assign value to a column having null value from other row based on different scenarios

I have the below real production data scenario and I am trying to get the desired output. I have to populate all the NULL values for the Worker from other rows (next or previous based on data).
Sample Input
PK Id Status Worker Created Date
--- --- ----------- ----------- -------------
1 101 Waiting NULL 1/1/2019 8:00
2 101 Assigned Jon Doe 1/1/2019 8:10
3 101 Initiated Jon Doe 1/1/2019 8:15
4 102 Waiting NULL 1/1/2019 8:00
5 102 Waiting NULL 1/1/2019 8:12
6 102 Assigned Jane Doe 1/1/2019 8:15
7 103 Waiting NULL 1/1/2019 8:00
9 103 Initiated Jon Doe 1/1/2019 8:15
11 103 Waiting NULL 1/1/2019 8:17
12 103 Assigned Jane Doe 1/1/2019 8:20
13 103 Assigned NULL 1/1/2019 8:22
14 103 Initiated NULL 1/1/2019 8:25
Desired Output
PK Id Status Worker Created Date
--- --- ----------- ----------- -------------
1 101 Waiting Jon Doe 1/1/2019 8:00
2 101 Assigned Jon Doe 1/1/2019 8:10
3 101 Initiated Jon Doe 1/1/2019 8:15
4 102 Waiting Jane Doe 1/1/2019 8:00
5 102 Waiting Jane Doe 1/1/2019 8:12
6 102 Assigned Jane Doe 1/1/2019 8:15
7 103 Waiting Jon Doe 1/1/2019 8:00
9 103 Initiated Jon Doe 1/1/2019 8:15
11 103 Waiting Jane Doe 1/1/2019 8:17
12 103 Assigned Jane Doe 1/1/2019 8:20
13 103 Assigned Jane Doe 1/1/2019 8:22
14 103 Initiated Jane Doe 1/1/2019 8:25
SQL:
select tl.*, RANK() OVER (ORDER BY tl.[Id],tl.[Created Date]) rnk
into #temp
from table tl
select tl.*,
case when tl.[Worker] is null then t2.[Worker] else tl.[Worker] end as [Worker Updated]
from #temp tl
left join #temp t2 on tl.[Id]=t2.[Id] and tl.rnk=t2.rnk-1
I am only able to get the correct result for scenario Id 101 in the Input Data Sample. I am not sure how to handle scenario 102 (two consecutive rows having NULL on Worker column) and 103 (Last 2 rows having NULL on Worker).
Can someone please help me on this?
I think what you need is ISNULL() with MAX() OVER(), so your query would look something like this:
SELECT
t1.PK
, t1.Id
, t1.Status
, ISNULL(t1.Worker, MAX(t1.Worker) OVER(PARTITION BY t1.Id) ) Worker
, t1.[Created Date]
FROM #temp t1
ISNULL() checks the value and, if it is NULL, replaces it with the second argument; it does the same job as the CASE expression in your query.
MAX(t1.Worker) OVER(PARTITION BY t1.Id)
Aggregate functions ignore NULLs, so we can take advantage of that and use one with an OVER() clause to partition the rows by Id and pick up a non-null Worker for each partition. Note that this only reproduces the desired output when each Id has a single worker: for Id 103, MAX() returns Jon Doe for every NULL row, whereas rows 11, 13 and 14 should be Jane Doe.
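Possibly the simplest way is outer apply (here t stands for your table, and rows are ordered by PK within each Id):
select t.pk, t.id, t.status, t2.worker, t.[Created Date]
from t outer apply
(select top (1) t2.*
from t t2
where t2.worker is not null and t2.id = t.id and t2.pk >= t.pk
order by t2.pk asc
) t2;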
What you really want is the IGNORE NULLS option on LEAD(). However, SQL Server versions before 2022 do not support that.
If you also want to fall back to the preceding value when there is no later one (as with the trailing NULLs for Id 103), follow the same logic with another apply:
select t.pk, t.id, t.status,
coalesce(tnext.worker, tprev.worker) as worker, t.[Created Date]
from t outer apply
(select top (1) t2.*
from t t2
where t2.worker is not null and t2.id = t.id and t2.pk >= t.pk
order by t2.pk asc
) tnext outer apply
(select top (1) t2.*
from t t2
where t2.worker is not null and t2.id = t.id and t2.pk <= t.pk
order by t2.pk desc
) tprev;
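The "next non-null, else previous non-null" idea can be tried on the sample input with the sketch below. It is only an approximation of the OUTER APPLY version: it runs on Python's built-in sqlite3, where correlated subqueries with LIMIT 1 stand in for OUTER APPLY with TOP (1), it orders by PK on the assumption that PK order matches Created Date order (true for the sample), and the names t and fill_workers are placeholders.

```python
import sqlite3

# (PK, Id, Status, Worker) from the sample input; Created Date is left out
# because PK already gives the same ordering within each Id.
ROWS = [
    (1, 101, "Waiting", None),
    (2, 101, "Assigned", "Jon Doe"),
    (3, 101, "Initiated", "Jon Doe"),
    (4, 102, "Waiting", None),
    (5, 102, "Waiting", None),
    (6, 102, "Assigned", "Jane Doe"),
    (7, 103, "Waiting", None),
    (9, 103, "Initiated", "Jon Doe"),
    (11, 103, "Waiting", None),
    (12, 103, "Assigned", "Jane Doe"),
    (13, 103, "Assigned", None),
    (14, 103, "Initiated", None),
]

def fill_workers():
    """Return (PK, Worker) with NULL workers filled from the next, else previous, row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (PK INT, Id INT, Status TEXT, Worker TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute("""
        SELECT t.PK,
               COALESCE(
                   t.Worker,
                   -- nearest later row of the same Id that has a worker
                   (SELECT n.Worker FROM t n
                     WHERE n.Id = t.Id AND n.PK > t.PK AND n.Worker IS NOT NULL
                     ORDER BY n.PK ASC LIMIT 1),
                   -- otherwise the nearest earlier row that has a worker
                   (SELECT p.Worker FROM t p
                     WHERE p.Id = t.Id AND p.PK < t.PK AND p.Worker IS NOT NULL
                     ORDER BY p.PK DESC LIMIT 1)
               ) AS Worker
        FROM t
        ORDER BY t.PK
    """).fetchall()
    con.close()
    return rows

for pk, worker in fill_workers():
    print(pk, worker)
```

On the sample this gives Jon Doe for rows 1 and 7 and Jane Doe for rows 4, 5, 11, 13 and 14, in line with the desired output.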

Retrieve all distinct records from table and if any changes happen between two similar distinct record then need to consider both. Using select query

I want to convert table1 into table2. I need to find all distinct records, ignoring MisDate, and the most important condition is that if a change happens between two otherwise identical records, I want both of them kept as separate records.
Example:
i/p
empId Empname Pancard MisDate
123 alex ads234 31/11/2012
123 alex ads234 31/12/2012
123 alex ads234 31/01/2013
123 alex dds124 29/02/2013
123 alex ads234 31/03/2013
123 alex ads234 31/04/2013
123 alex dds124 30/05/2013
Expected o/p
empId Empname Pancard MisDate
123 alex ads234 31/11/2012
123 alex dds124 29/02/2013
123 alex ads234 31/03/2013
123 alex dds124 30/05/2013
Assuming there's only one row for each MisDate (otherwise you'll have to find another way to specify ordering):
SELECT t1.empId, t1.Empname, t1.Pancard, t1.MisDate
FROM Table1 t1
LEFT OUTER JOIN Table1 t2
ON t2.MisDate = (SELECT MAX(MisDate) FROM Table1 t3 WHERE t3.MisDate < t1.MisDate)
WHERE t2.empId IS NULL
OR t2.empId <> t1.empId OR t2.Empname <> t1.Empname OR t2.Pancard <> t1.Pancard
SQL Fiddle example
This performs a self-join on the previous record, as ordered by MisDate, outputting if it is different or if there is no previous record (it is the first row).
Note: You've got some funky dates. I assume these are just transcription errors and have corrected them in the fiddle.
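The same "compare with the previous row" idea can also be written with LAG() where window functions are available. This is a different technique from the self-join above, shown only as a sketch: it runs on Python's built-in sqlite3 (3.25 or later), the table name emp and function name changed_rows are invented, and the impossible dates are replaced by month-end ISO dates as an assumption about what was meant.

```python
import sqlite3

def changed_rows():
    """Keep the first row per employee and every row that differs from its predecessor."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE emp (empId INT, Empname TEXT, Pancard TEXT, MisDate TEXT);
        INSERT INTO emp VALUES
            (123, 'alex', 'ads234', '2012-11-30'),
            (123, 'alex', 'ads234', '2012-12-31'),
            (123, 'alex', 'ads234', '2013-01-31'),
            (123, 'alex', 'dds124', '2013-02-28'),
            (123, 'alex', 'ads234', '2013-03-31'),
            (123, 'alex', 'ads234', '2013-04-30'),
            (123, 'alex', 'dds124', '2013-05-30');
    """)
    # rn = 1 marks the first row of each employee; the LAG columns carry the
    # previous row's values so a change can be detected with a plain comparison.
    rows = con.execute("""
        SELECT empId, Empname, Pancard, MisDate
        FROM (
            SELECT e.*,
                   LAG(Empname) OVER (PARTITION BY empId ORDER BY MisDate) AS prev_name,
                   LAG(Pancard) OVER (PARTITION BY empId ORDER BY MisDate) AS prev_pan,
                   ROW_NUMBER() OVER (PARTITION BY empId ORDER BY MisDate) AS rn
            FROM emp e
        ) AS x
        WHERE rn = 1
           OR prev_name <> Empname
           OR prev_pan <> Pancard
        ORDER BY empId, MisDate
    """).fetchall()
    con.close()
    return rows

for row in changed_rows():
    print(row)
```

Partitioning by empId keeps different employees from being compared with each other, which the single-employee sample does not exercise.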

SQL Join Ignore multiple matches (fuzzy results ok)

I don't even know what my problem is called, so I'm just going to post some sample data. I don't mind fuzzy results on this (that is the best way I can think to express it: I don't mind if I overlook some data, since this is for approximate evaluation, not detailed accounting). But I do need every record in TABLE 1, and I would like to avoid the nulls case indicated below.
IS THIS POSSIBLE?
TABLE 1
acctnum sub fname lname phone
12345 1 john doe xxx-xxx-xxxx
12346 0 jane doe xxx-xxx-xxxx
12347 0 rob roy xxx-xxx-xxxx
12348 0 paul smith xxx-xxx-xxxx
TABLE 2
acctnum sub division
12345 1 EAST
12345 2 WEST
12345 3 NORTH
12346 1 TOP
12346 2 BOTTOM
12347 2 BALLOON
12348 1 NORTH
So if we do a "regular outer" join, we'd get some results like this, since the sub 0's don't match the second table:
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12346 0 jane doe xxx-xxx-xxxx null
12347 0 rob roy xxx-xxx-xxxx null
12348 0 paul smith xxx-xxx-xxxx null
But I would rather get
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12346 0 jane doe xxx-xxx-xxxx TOP
12347 0 rob roy xxx-xxx-xxxx BALLOON
12348 0 paul smith xxx-xxx-xxxx NORTH
And I'm trying to avoid:
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12345 1 john doe xxx-xxx-xxxx WEST
12345 1 john doe xxx-xxx-xxxx NORTH
12346 0 jane doe xxx-xxx-xxxx TOP
12346 0 jane doe xxx-xxx-xxxx BOTTOM
12347 0 rob roy xxx-xxx-xxxx BALLOON
12348 0 paul smith xxx-xxx-xxxx NORTH
So I decided to go with using a union and two if conditions. I'll accept a null for conditions where the sub account is defined in table 1 but not in table 2, and for everything else, I'll just match against the min.
If I'm understanding correctly, it looks like you're trying to join on the sub column if it matches. If there's no match on sub, then you want it to select the "first" row for that acctnum. Is this correct?
If so, you'll need to left join on the full match, then perform another left join on a select statement that determines the division that corresponds to the lowest sub value for that acctnum. The row_number() function can help you with this, like this:
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
isnull(t2_match.division, t2_first.division) as division
from table1 t1
left join table2 t2_match on t2_match.acctnum = t1.acctnum and t2_match.sub = t1.sub
left join
(
select
acctnum,
sub,
division,
row_number() over (partition by acctnum order by sub) as rownum
from table2
) t2_first on t2_first.acctnum = t1.acctnum and t2_first.rownum = 1
EDIT
If you don't care at all about which record you get back from table 2 when a matching sub doesn't exist, you could combine two different queries (one that matches the sub and one that just takes the min or max division) with a union.
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
t2.division
from table1 t1
join table2 t2 on t2.acctnum = t1.acctnum and t2.sub = t1.sub
union
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
min(t2.division)
from table1 t1
join table2 t2 on t2.acctnum = t1.acctnum
left join table2 t2_match on t2_match.acctnum = t1.acctnum and t2_match.sub = t1.sub
where t2_match.acctnum is null
group by t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone
Personally, I don't find the union syntax any more compelling and you now have to maintain the query in two places. For this reason, I'd favor the row_number() approach.
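For reference, the row_number() approach (exact match on sub, otherwise the division with the lowest sub for that account) can be tried on the sample tables with the sketch below. It runs on Python's built-in sqlite3 (3.25 or later), uses COALESCE in place of T-SQL's ISNULL, drops the name and phone columns for brevity, and the function name division_with_fallback is made up for this example.

```python
import sqlite3

def division_with_fallback():
    """One row per table1 record: matching division, else the lowest-sub division."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (acctnum INT, sub INT);
        CREATE TABLE table2 (acctnum INT, sub INT, division TEXT);
        INSERT INTO table1 VALUES (12345, 1), (12346, 0), (12347, 0), (12348, 0);
        INSERT INTO table2 VALUES
            (12345, 1, 'EAST'), (12345, 2, 'WEST'), (12345, 3, 'NORTH'),
            (12346, 1, 'TOP'),  (12346, 2, 'BOTTOM'),
            (12347, 2, 'BALLOON'),
            (12348, 1, 'NORTH');
    """)
    # The rownum = 1 filter in the join keeps exactly one fallback row per
    # account, so neither join can multiply the table1 rows.
    rows = con.execute("""
        SELECT t1.acctnum, t1.sub,
               COALESCE(t2_match.division, t2_first.division) AS division
        FROM table1 t1
        LEFT JOIN table2 t2_match
          ON t2_match.acctnum = t1.acctnum
         AND t2_match.sub = t1.sub
        LEFT JOIN (
            SELECT acctnum, division,
                   ROW_NUMBER() OVER (PARTITION BY acctnum ORDER BY sub) AS rownum
            FROM table2
        ) t2_first
          ON t2_first.acctnum = t1.acctnum
         AND t2_first.rownum = 1
        ORDER BY t1.acctnum
    """).fetchall()
    con.close()
    return rows

for row in division_with_fallback():
    print(row)
```

This assumes (acctnum, sub) is unique in table 2; if it were not, the exact-match join could still return more than one row per account.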
Try this:
SELECT MIN(Table_1.acctnum) as acctnum , MIN(Table_1.sub) as sub,MIN( Table_1.fname) as fname, MIN(Table_1.lname) as name, MIN(Table_1.phone) as phone, MIN(Table_2.division) as division
FROM Table_1 INNER JOIN Table_2 ON Table_1.acctnum = Table_2.acctnum AND Table_1.sub = Table_2.sub
where Table_1.sub>0
group by Table_1.acctnum
union
SELECT MIN(Table_1.acctnum) as acctnum , MIN(Table_1.sub) as sub,MIN( Table_1.fname) as fname, MIN(Table_1.lname) as name, MIN(Table_1.phone) as phone, MIN(Table_2.division) as division
FROM Table_1 INNER JOIN Table_2 ON Table_1.acctnum = Table_2.acctnum
where Table_1.sub=0
group by Table_1.acctnum
This is the result:
12345 1 john doe xxxxxxxxxx EAST
12346 0 jane doe xxxxxxxxxx BOTTOM
12347 0 rob roy xxxxxxxxxx BALLOON
12348 0 paul smith xxxxxxxxxx NORTH
If you change MIN to MAX, TOP will appear instead of BOTTOM on the second row.
It may also work for you:
SELECT t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone,
ISNULL(MAX(t2.division),MAX(t3.division)) as division
FROM table_1 t1
LEFT JOIN table_2 t2 ON (t2.acctnum = t1.acctnum AND t1.sub = t2.sub)
LEFT JOIN table_2 t3 ON (t3.acctnum = t1.acctnum)
GROUP BY t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone
This will give your desired result, exactly (for the shown data):
Updated to not assume there is always a sub==1 value:
SELECT
T1.acctnum,
T1.sub,
T1.fname,
T1.lname,
T1.phone,
T2.division
FROM
TABLE_1 T1
LEFT JOIN
TABLE_2 T2 ON T1.acctnum = T2.acctnum
AND
T2.sub = (SELECT MIN(T3.sub) FROM TABLE_2 T3 WHERE T1.acctnum = T3.acctnum)
ORDER BY
T1.lname,
T1.fname,
T1.acctnum