Join records only on first match - sql

I'm trying to join two tables. I only want the first matching row to be joined; the others should be null.
One of the tables contains daily records per User and the second table contains the goal for each user and day.
The joined result table should only join the first occurrence of each User and Day and set the others to null. The Goal in the joined table can be interpreted as DailyGoal.
Example:
Table1
Id  Day         User  Value
===========================
01  01/01/2020  Bob   100
02  01/01/2020  Bob   150
03  01/01/2020  Bob   50
04  02/01/2020  Carl  200
05  02/01/2020  Carl  30

Table2
Id  Day         User  Goal
==========================
01  01/01/2020  Bob   300
02  02/01/2020  Carl  170

ResultTable
Day         User  Value  Goal
=============================
01/01/2020  Bob   100    300
01/01/2020  Bob   150    (null)
01/01/2020  Bob   50     (null)
02/01/2020  Carl  200    170
02/01/2020  Carl  30     (null)
I tried TOP 1, DISTINCT, and subqueries, but I can't find a way to do it. Is this possible?

One option uses window functions:
select t1.*, t2.goal
from (
    select t1.*,
           row_number() over (partition by day, user order by id) as rn
    from table1 t1
) t1
left join table2 t2 on t2.day = t1.day and t2.user = t1.user and t1.rn = 1
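A case expression is even simpler (it still needs the left join to table2):
select t1.*,
       case when row_number() over (partition by t1.day, t1.user order by t1.id) = 1
            then t2.goal
       end as goal
from table1 t1
left join table2 t2 on t2.day = t1.day and t2.user = t1.user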
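A minimal, runnable sketch of the window-function answer, using Python's built-in `sqlite3` module with an in-memory database rather than the asker's database. Assumptions: SQLite 3.25 or later (for `row_number()`), dates rewritten as ISO text, and 02/01/2020 read as 2 January 2020; none of these choices affects how the join works.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO 'YYYY-MM-DD' text.
TABLE1 = [
    (1, "2020-01-01", "Bob", 100),
    (2, "2020-01-01", "Bob", 150),
    (3, "2020-01-01", "Bob", 50),
    (4, "2020-01-02", "Carl", 200),
    (5, "2020-01-02", "Carl", 30),
]
TABLE2 = [
    (1, "2020-01-01", "Bob", 300),
    (2, "2020-01-02", "Carl", 170),
]

# Number the rows of each (day, user) group by id; only rn = 1 can satisfy
# the LEFT JOIN condition, so every other row gets a NULL goal.
QUERY = """
SELECT t1.day, t1.user, t1.value, t2.goal
FROM (
    SELECT t1.*,
           row_number() OVER (PARTITION BY day, user ORDER BY id) AS rn
    FROM table1 t1
) t1
LEFT JOIN table2 t2
       ON t2.day = t1.day AND t2.user = t1.user AND t1.rn = 1
ORDER BY t1.id
"""


def first_match_join():
    """Build both tables in memory and return the joined rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, day TEXT, user TEXT, value INTEGER)")
    conn.execute("CREATE TABLE table2 (id INTEGER, day TEXT, user TEXT, goal INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", TABLE2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in first_match_join():
    print(row)
```

Python prints SQL NULL as `None`, so the second and third Bob rows and the second Carl row show `None` as their goal, as in the ResultTable.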

Related

subquery calculate days between dates

Sub query, SQL, Oracle
I'm new to subqueries and hoping to get some assistance. My thought was that the subquery would run first and then the outer query would execute based on the subquery's filter of trans_code = 'ABC'. The query works, but it pulls all dates from all transaction codes: trans_code 'ABC', 'DEF', etc.
The end goal is to calculate the number of days between dates.
The table structure is:
acct_num  effective_date
1234      01/01/2020
1234      02/01/2020
1234      03/01/2020
1234      04/01/2021
I want the query output to look like this:
account  Effective_Date  Effective_Date_2  Days_Diff
1234     01/01/2020      02/01/2020        31
1234     02/01/2020      03/01/2020        29
1234     03/01/2020      04/01/2021        396
1234     04/01/2021                        0
Query:
SELECT t3.acct_num,
       t3.trans_code,
       t3.effective_date,
       MIN(t2.effective_date) AS effective_date2,
       MIN(t2.effective_date) - t3.effective_date AS days_diff
FROM (SELECT t1.acct_num, t1.trans_code, t1.effective_date
      FROM lawd.trans t1
      WHERE t1.trans_code = 'ABC') t3
LEFT JOIN lawd.trans t2 ON t3.acct_num = t2.acct_num
WHERE t3.acct_num = '1234' AND t2.effective_date > t3.effective_date
GROUP BY t3.acct_num, t3.effective_date, t3.trans_code
ORDER BY t3.effective_date ASC
TIA!
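Use lead(), filtering on trans_code before the window function runs:
select t.*,
       lead(effective_date) over (partition by acct_num order by effective_date) as next_effective_date,
       lead(effective_date) over (partition by acct_num order by effective_date) - effective_date as diff
from lawd.trans t
where trans_code = 'ABC'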
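A runnable sketch of the lead() approach, using Python's `sqlite3` module so it runs without an Oracle instance. Assumptions: the table is called `trans` instead of `lawd.trans`, dates are ISO text, the 'DEF' row is invented to show that the trans_code filter is respected, and SQLite needs `julianday()` to turn dates into day numbers, whereas Oracle subtracts DATE values directly.

```python
import sqlite3

# Hypothetical rows for lawd.trans (named "trans" here); the 'DEF' row
# should be ignored because the query keeps only trans_code = 'ABC'.
ROWS = [
    ("1234", "ABC", "2020-01-01"),
    ("1234", "DEF", "2020-01-15"),
    ("1234", "ABC", "2020-02-01"),
    ("1234", "ABC", "2020-03-01"),
    ("1234", "ABC", "2021-04-01"),
]

# WHERE is applied before window functions, so lead() only sees 'ABC' rows.
# julianday() converts ISO dates to day numbers, so the difference is in days;
# the last row per account has no next date, so both new columns are NULL.
QUERY = """
SELECT acct_num,
       effective_date,
       lead(effective_date) OVER (PARTITION BY acct_num ORDER BY effective_date)
           AS next_effective_date,
       CAST(julianday(lead(effective_date) OVER (PARTITION BY acct_num ORDER BY effective_date))
            - julianday(effective_date) AS INTEGER) AS days_diff
FROM trans
WHERE trans_code = 'ABC'
ORDER BY acct_num, effective_date
"""


def days_between():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trans (acct_num TEXT, trans_code TEXT, effective_date TEXT)")
    conn.executemany("INSERT INTO trans VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in days_between():
    print(row)
```

If you want 0 instead of NULL on the last row, as in the desired output, wrap the difference in `coalesce(..., 0)`.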

Select all users that have a specific date and have been recorded only once

I have this table.
My MySQL table (in SQL Fiddle):
ID    Date      Value
____  ________  _____
3241  01/01/00  15456
3241  9/17/12   5
3241  9/16/12   100
3241  9/15/12   20
4355  01/01/00  01
4355  9/16/12   12
4355  9/15/12   132
4355  9/14/12   4
1001  01/01/00  456
1001  9/16/12   125
5555  01/01/00  01
1234  01/01/00  01
1234  9/16/12   45
2236  01/01/00  879
2236  9/15/12   128
2236  9/14/12   323
2002  01/01/00  567
I would like to select all the records that have 01-01-00 as the date and appear only once.
The result I'm trying to get is the table below.
ID    Date      Value
____  ________  _____
5555  01/01/00  01
2002  01/01/00  567
I tried using a HAVING clause, but because of my GROUP BY the result is wrong: one of the IDs it returns has more than one record, which isn't what I want.
My Wrong Attempt:
SELECT * FROM
(SELECT *
FROM table1
GROUP BY id, date, value
HAVING count(Id)=1) t1
WHERE date='01-01-00'
I would use:
select id, max(date) as date, max(value) as value
from t1
group by id
having max(date) = '01-01-00' and count(*) = 1;
A somewhat faster method might be:
select t1.*
from t1
where date = '01-01-00' and
not exists (select 1 from t1 tt1 where tt1.id = t1.id and tt1.date <> '01-01-00');
This can take advantage of indexes on t1(date) and t1(id, date).
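A small sketch that runs both of these queries against SQLite through Python's `sqlite3` module. Assumptions: the rows are a trimmed, hypothetical subset of the question's table, and dates are stored as ISO text, so '01-01-00' becomes '2000-01-01'.

```python
import sqlite3

# A trimmed subset of the question's rows, with dates as ISO text.
ROWS = [
    (3241, "2000-01-01", 15456),
    (3241, "2012-09-17", 5),
    (1001, "2000-01-01", 456),
    (1001, "2012-09-16", 125),
    (5555, "2000-01-01", 1),
    (2002, "2000-01-01", 567),
]

# Aggregate version: keep IDs with exactly one row, and that row's date is the target.
GROUP_BY_QUERY = """
SELECT id, max(date) AS date, max(value) AS value
FROM t1
GROUP BY id
HAVING max(date) = '2000-01-01' AND count(*) = 1
ORDER BY id
"""

# NOT EXISTS version: keep target-date rows whose ID has no row on another date.
NOT_EXISTS_QUERY = """
SELECT t1.*
FROM t1
WHERE date = '2000-01-01'
  AND NOT EXISTS (SELECT 1 FROM t1 tt1
                  WHERE tt1.id = t1.id AND tt1.date <> '2000-01-01')
ORDER BY id
"""


def run_query(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, date TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run_query(GROUP_BY_QUERY))
print(run_query(NOT_EXISTS_QUERY))
```

Both return IDs 2002 and 5555 here. They are not identical in general: if an ID had two rows both dated 2000-01-01, the NOT EXISTS version would return both rows, while `count(*) = 1` would drop that ID.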
Use IN
SELECT *
FROM table1
WHERE id IN (
SELECT id
FROM table1
GROUP BY id
HAVING count(Id)=1
) and date='01-01-00'
I hadn't noticed that I made an error in my GROUP BY: instead of grouping only by ID, I grouped by all the columns.
SELECT * FROM
(SELECT *
FROM Table1
GROUP BY `ID`
HAVING count(`ID`)=1) t1
WHERE `Date`='2000-01-01 00:00:00'
However, for my problem I went with Gordon Linoff's solution because it seems better suited.
PS: I have 2 million records.
Change your query to:
SELECT *
FROM
(SELECT *
FROM table1
GROUP BY id
HAVING count(id)=1) t1
WHERE t1.date='01-01-00'

LEFT JOIN on multiple columns with unwanted duplicates

I have been running in circles with a query that is driving me nuts.
The background:
I have two tables, and unfortunately, both have duplicate records (they are activity logs, if that puts it into perspective). Each table comes from a different system, and I am trying to join the data together to get a pseudo-full picture (I realize that I won't get a perfect view because there is no "event key" shared between the two systems; I am attempting to match on a composite of metadata).
Here is what I am working with:
Table1
------------
JobID  CustID  Name  ActionDate         IsDuplicate
12345  11111   Ryan  1/1/2015 01:20:20  False
12345  11112   Bob   1/1/2015 02:10:20  False
12345  11111   Ryan  1/1/2015 04:15:35  True
12346  11113   Jim   1/1/2015 05:10:40  False
12346  11114   Jeb   1/1/2015 06:10:40  False
12346  11111   Ryan  1/1/2015 07:10:30  False
Table2
------------
ResponseID  CustID  ActionDate          Browser
11123       10110   12/1/2014 23:32:15  IE
12345       11111   1/1/2015 03:20:20   IE
12345       11112   1/1/2015 05:10:20   Firefox
12345       11111   1/1/2015 06:15:35   Firefox
12346       11113   1/1/2015 07:10:40   Chrome
12346       11114   1/1/2015 08:10:40   Chrome
12346       11111   1/1/2015 10:10:30   Safari
12213       11123   2/1/2015 01:10:30   Chrome
Please note a few things:
- JobID and ResponseID are the same thing.
- JobID and ResponseID identify an event on the site (people are responding to an event).
- ActionDate does not match: system 2 has an inconsistent delay of about 2 hours, but never more than 3 hours.
- Table2 doesn't have a duplicate flag.
- Table1 (~2,000 records) is significantly smaller than Table2 (~16,000 records).
- Customer 11111 switches between browsers, taking the same action twice on job 12345 at different times and only once on job 12346.
What I am looking for:
Result (ideal)
------------
t1.JobID  t1.CustID  t1.Name  t1.ActionDate      t2.Browser
12345     11111      Ryan     1/1/2015 01:20:20  IE
12345     11112      Bob      1/1/2015 02:10:20  Firefox
12345     11111      Ryan     1/1/2015 04:15:35  Firefox
12346     11113      Jim      1/1/2015 05:10:40  Chrome
12346     11114      Jeb      1/1/2015 06:10:40  Chrome
12346     11111      Ryan     1/1/2015 07:10:30  Safari
Note that I JUST want matches for records in Table1. I am getting tons of duplicates because of the join, which is frustrating.
Here is what I have so far (which, I can humbly say, isn't really close):
SELECT
t1.JobID,
t1.CustID,
t1.Name,
t1.ActionDate,
t2.Browser
FROM
Table1 t1
LEFT OUTER JOIN
Table2 t2
ON
t1.JobID=t2.ResponseID AND
t1.CustID=t2.CustID AND
DATEPART(dd,t1.ActionDate)=DATEPART(dd,t2.ActionDate)
Try changing the date part of the join condition so that t2.ActionDate satisfies t1.ActionDate <= t2.ActionDate <= t1.ActionDate + 3 hours:
SELECT
t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
FROM
Table1 t1
LEFT JOIN Table2 t2
ON t1.JobID = t2.ResponseID
AND t1.CustID = t2.CustID
AND t2.ActionDate >= t1.ActionDate
AND t2.ActionDate <= DATEADD(hour, 3, t1.ActionDate)
ORDER BY t1.JobID , t1.ActionDate;
With your sample data the result of this query matches your desired result.
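A runnable sketch of this time-window join in SQLite via Python's `sqlite3` module. Assumptions: timestamps are rewritten as ISO 'YYYY-MM-DD HH:MM:SS' text (reading the source dates as month/day/year), the IsDuplicate column is omitted because the join does not use it, and SQLite's `datetime(x, '+3 hours')` stands in for SQL Server's `DATEADD(hour, 3, x)`.

```python
import sqlite3

# Question's data with ISO timestamps (assumption: source dates are month/day/year).
TABLE1 = [
    (12345, 11111, "Ryan", "2015-01-01 01:20:20"),
    (12345, 11112, "Bob", "2015-01-01 02:10:20"),
    (12345, 11111, "Ryan", "2015-01-01 04:15:35"),
    (12346, 11113, "Jim", "2015-01-01 05:10:40"),
    (12346, 11114, "Jeb", "2015-01-01 06:10:40"),
    (12346, 11111, "Ryan", "2015-01-01 07:10:30"),
]
TABLE2 = [
    (11123, 10110, "2014-12-01 23:32:15", "IE"),
    (12345, 11111, "2015-01-01 03:20:20", "IE"),
    (12345, 11112, "2015-01-01 05:10:20", "Firefox"),
    (12345, 11111, "2015-01-01 06:15:35", "Firefox"),
    (12346, 11113, "2015-01-01 07:10:40", "Chrome"),
    (12346, 11114, "2015-01-01 08:10:40", "Chrome"),
    (12346, 11111, "2015-01-01 10:10:30", "Safari"),
    (12213, 11123, "2015-02-01 01:10:30", "Chrome"),
]

# A Table2 row matches only if it falls in [t1.ActionDate, t1.ActionDate + 3h].
# ISO text compares correctly as strings, so >= and <= work on the raw values.
QUERY = """
SELECT t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
FROM Table1 t1
LEFT JOIN Table2 t2
       ON t1.JobID = t2.ResponseID
      AND t1.CustID = t2.CustID
      AND t2.ActionDate >= t1.ActionDate
      AND t2.ActionDate <= datetime(t1.ActionDate, '+3 hours')
ORDER BY t1.JobID, t1.ActionDate
"""


def window_join():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (JobID INTEGER, CustID INTEGER, Name TEXT, ActionDate TEXT)")
    conn.execute("CREATE TABLE Table2 (ResponseID INTEGER, CustID INTEGER, ActionDate TEXT, Browser TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?)", TABLE2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in window_join():
    print(row)
```

Every Table1 row gets exactly one browser here because no two Table2 rows for the same job and customer fall inside the same 3-hour window. With denser data that assumption could fail and duplicates would return.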
One method is to enumerate each table using row_number() and match on the sequence numbers as well:
select t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
from (select t1.*,
             row_number() over (partition by JobId, CustId order by ActionDate) as seqnum
      from Table1 t1
     ) t1 join
     (select t2.*,
             row_number() over (partition by ResponseId, CustId order by ActionDate) as seqnum
      from Table2 t2
     ) t2
     on t1.JobId = t2.ResponseId and
        t1.CustId = t2.CustId and
        t1.seqnum = t2.seqnum;
This works for your sample data. However, if there is not a response for every job, then the alignment might get out of whack. If that is a possibility, then date arithmetic might be the better solution.
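A minimal sketch of the sequence-number alignment in SQLite via Python's `sqlite3` module (SQLite 3.25+ assumed for `row_number()`). To keep it short, it uses only the job 12345 rows from the question, with timestamps rewritten as ISO text.

```python
import sqlite3

# Job 12345 rows from the question (ISO timestamps); other jobs omitted for brevity.
TABLE1 = [
    (12345, 11111, "Ryan", "2015-01-01 01:20:20"),
    (12345, 11112, "Bob", "2015-01-01 02:10:20"),
    (12345, 11111, "Ryan", "2015-01-01 04:15:35"),
]
TABLE2 = [
    (12345, 11111, "2015-01-01 03:20:20", "IE"),
    (12345, 11112, "2015-01-01 05:10:20", "Firefox"),
    (12345, 11111, "2015-01-01 06:15:35", "Firefox"),
]

# The n-th action of a customer on a job in Table1 is paired with the n-th
# response by that customer to the same job in Table2.
QUERY = """
SELECT t1.JobID, t1.CustID, t1.Name, t1.ActionDate, t2.Browser
FROM (SELECT t1.*,
             row_number() OVER (PARTITION BY JobID, CustID ORDER BY ActionDate) AS seqnum
      FROM Table1 t1) t1
JOIN (SELECT t2.*,
             row_number() OVER (PARTITION BY ResponseID, CustID ORDER BY ActionDate) AS seqnum
      FROM Table2 t2) t2
  ON t1.JobID = t2.ResponseID
 AND t1.CustID = t2.CustID
 AND t1.seqnum = t2.seqnum
ORDER BY t1.ActionDate
"""


def seq_join():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (JobID INTEGER, CustID INTEGER, Name TEXT, ActionDate TEXT)")
    conn.execute("CREATE TABLE Table2 (ResponseID INTEGER, CustID INTEGER, ActionDate TEXT, Browser TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?)", TABLE2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in seq_join():
    print(row)
```

If one of Ryan's Table2 responses were missing, his second action would be paired with nothing and the pairing would no longer reflect the real events. That is the misalignment the answer warns about.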

Select Most Recent Entry in SQL

I'm trying to select the most recent non-zero entry from my data set in SQL. Most examples of this are satisfied with returning only the date and the GROUP BY columns, but I would also like to return the corresponding Value. For example:
ID   Date        Value
----------------------------
001  2014-10-01  32
001  2014-10-05  10
001  2014-10-17  0
002  2014-10-03  17
002  2014-10-20  60
003  2014-09-30  90
003  2014-10-10  7
004  2014-10-06  150
005  2014-10-17  0
005  2014-10-18  9
Using
SELECT ID, MAX(Date) AS MDate FROM Table WHERE Value > 0 GROUP BY ID
Returns:
ID   Date
-------------------
001  2014-10-05
002  2014-10-20
003  2014-10-10
004  2014-10-06
005  2014-10-18
But whenever I try to include Value as one of the selected columns, SQL Server raises an error:
"Column 'Value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
My desired result would be:
ID   Date        Value
----------------------------
001  2014-10-05  10
002  2014-10-20  60
003  2014-10-10  7
004  2014-10-06  150
005  2014-10-18  9
One solution I have thought of is to look the results back up in the original table and return the Value that corresponds to the relevant ID and Date (I have already trimmed the data down, so I know these are unique), but this seems like a messy solution. Any help would be appreciated.
NOTE: I do not want to group by Value, as Value is the result I am trying to pull out in the end (i.e. for each ID, I want the most recent Value). Another example:
ID   Date        Value
----------------------------
001  2014-10-05  10
001  2014-10-06  10
001  2014-10-10  10
001  2014-10-12  8
001  2014-10-18  0
Here, I only want the last non zero entry. (001, 2014-10-12, 8)
SELECT ID, MAX(Date) AS MDate, Value FROM Table WHERE Value > 0 GROUP BY ID, Value
Would return:
ID   Date        Value
----------------------------
001  2014-10-10  10
001  2014-10-12  8
This can also be done using a window function, which is often faster than a join on a grouped query. Filter out the zero values first so that the newest remaining row is the most recent non-zero entry:
select id, date, value
from (
  select id,
         date,
         value,
         row_number() over (partition by id order by date desc) as rn
  from the_table
  where value > 0
) t
where rn = 1
order by id;
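A runnable sketch of the window-function approach in SQLite via Python's `sqlite3` module, including the `value > 0` filter the question requires. Assumptions: SQLite 3.25+ for `row_number()`, the table is named `the_table` as in the answer, and IDs are stored as text to keep their leading zeros.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("001", "2014-10-01", 32),
    ("001", "2014-10-05", 10),
    ("001", "2014-10-17", 0),
    ("002", "2014-10-03", 17),
    ("002", "2014-10-20", 60),
    ("003", "2014-09-30", 90),
    ("003", "2014-10-10", 7),
    ("004", "2014-10-06", 150),
    ("005", "2014-10-17", 0),
    ("005", "2014-10-18", 9),
]

# Drop zero values first, then keep the newest remaining row per id.
QUERY = """
SELECT id, date, value
FROM (SELECT id, date, value,
             row_number() OVER (PARTITION BY id ORDER BY date DESC) AS rn
      FROM the_table
      WHERE value > 0) t
WHERE rn = 1
ORDER BY id
"""


def latest_nonzero():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (id TEXT, date TEXT, value INTEGER)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in latest_nonzero():
    print(row)
```

Without the `WHERE value > 0` line, ID 001 would come back as 2014-10-17 with a value of 0, which is exactly the row the question wants to skip.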
Assuming you don't have repeated dates for the same ID in the table, this should work:
SELECT A.ID, A.Date, A.Value
FROM
T1 AS A
INNER JOIN (SELECT ID,MAX(Date) AS Date FROM T1 WHERE Value > 0 GROUP BY ID) AS B
ON A.ID = B.ID AND A.Date = B.Date
select a.id, a.date, a.value from Table1 a inner join (
select id, max(date) mydate from table1
where Value>0 group by ID) b on a.ID=b.ID and a.Date=b.mydate
Using a correlated subquery:
SELECT ID, Date AS MDate, VALUE
FROM table t1
where date = (Select max(date)
from table t2
where Value >0
and t1.id = t2.id
)
The answers provided are perfectly adequate, but here is a version using a CTE:
;WITH cteTable
AS
(
    SELECT
        Table.ID [ID], MAX(Date) [MaxDate]
    FROM
        Table
    WHERE
        Table.Value > 0
    GROUP BY
        Table.ID
)
SELECT
    cteTable.ID, cteTable.MaxDate, Table.Value
FROM
    Table INNER JOIN cteTable
        ON (Table.ID = cteTable.ID AND Table.Date = cteTable.MaxDate)

tsql proc logic help

I am weak in SQL and need some help working through some logic with my proc.
Three pieces: a stored procedure, table1, and table2.
Table1 stores the most recent data for specific customer IDs:
Customer_id  status_dte  status_cde  app_dte
001          2010-04-19  Y           2010-04-19
Table 2 stores history of data for specific customer IDs:
For example:
Log_id  customer_Id  status_dte  status_cde
01      001          2010-04-20  N
02      001          2010-04-19  Y
03      001          2010-04-19  N
04      001          2010-04-19  Y
The stored procedure currently throws an error if the status date from table1 is earlier than app_dte in table1:
If #status_dte < app_date
Error
Note: #status_dte is a variable stored as the status_dte from table1
However, I want it to throw an error when the EARLIEST status_dte in table2 with a status_cde of 'Y' is earlier than the app_dte column in table1.
Keep in mind that this earliest date is not stored anywhere, and the history differs per customer. Another customer might have the following history:
Log_id  customer_Id  status_dte  status_cde
01      002          2010-04-20  N
02      002          2010-04-18  N
03      002          2010-04-19  Y
04      002          2010-04-19  Y
Any ideas on how I can approach this?
You can test all customers in one go, finding any whose earliest 'Y' status date is earlier than app_dte, using this construct:
IF EXISTS (SELECT *
FROM
mytable M
JOIN
HistoryTable H ON M.customer_Id = H.customer_Id
WHERE
H.status_cde = 'Y'
GROUP BY
H.customer_Id, M.app_dte
HAVING
MIN(H.status_dte) < M.app_dte)
...error...
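A runnable sketch of the EXISTS check in SQLite via Python's `sqlite3` module, using the answer's table names `mytable` and `HistoryTable`. Assumptions: the table1 row for customer 002 is hypothetical (the question only shows customer 001 in table1), and dates are ISO text. The function returns whether the proc would raise its error instead of raising it.

```python
import sqlite3

# Customer 001 is from the question; the customer 002 row is hypothetical.
TABLE1 = [
    ("001", "2010-04-19", "Y", "2010-04-19"),
    ("002", "2010-04-20", "Y", "2010-04-20"),
]
# Both customers' history rows from the question.
HISTORY = [
    (1, "001", "2010-04-20", "N"), (2, "001", "2010-04-19", "Y"),
    (3, "001", "2010-04-19", "N"), (4, "001", "2010-04-19", "Y"),
    (1, "002", "2010-04-20", "N"), (2, "002", "2010-04-18", "N"),
    (3, "002", "2010-04-19", "Y"), (4, "002", "2010-04-19", "Y"),
]

# EXISTS is true when at least one customer's earliest 'Y' date is before app_dte.
QUERY = """
SELECT EXISTS (
    SELECT *
    FROM mytable M
    JOIN HistoryTable H ON M.customer_Id = H.customer_Id
    WHERE H.status_cde = 'Y'
    GROUP BY H.customer_Id, M.app_dte
    HAVING MIN(H.status_dte) < M.app_dte
)
"""


def any_violation(table1_rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (customer_Id TEXT, status_dte TEXT, status_cde TEXT, app_dte TEXT)")
    conn.execute("CREATE TABLE HistoryTable (Log_id INTEGER, customer_Id TEXT, status_dte TEXT, status_cde TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", table1_rows)
    conn.executemany("INSERT INTO HistoryTable VALUES (?, ?, ?, ?)", HISTORY)
    (flag,) = conn.execute(QUERY).fetchone()
    conn.close()
    return bool(flag)


print(any_violation(TABLE1[:1]))  # customer 001 only
print(any_violation(TABLE1))      # customers 001 and 002
```

Customer 001's earliest 'Y' date equals its app_dte, so it does not trigger the check on its own; the hypothetical customer 002 does, because 2010-04-19 is earlier than 2010-04-20.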
If instead of a single customer, you wanted a list of customers with their earliest status date prior to the app_date, you could do something like:
;With
CustomerStatusDates As
(
Select T2.customer_id, Min(T2.status_dte) As MinDate
From Table2 As T2
Where status_cde = 'Y'
Group By T2.customer_id
)
Select ....
From Table1 As T1
Join CustomerStatusDates As T2
On T2.Customer_Id = T1.Customer_Id
Where T2.MinDate < T1.app_dte
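A runnable sketch of the CTE version in SQLite via Python's `sqlite3` module. The answer leaves the select list as `Select ....`; the columns chosen here (customer_id, app_dte, MinDate) are my own illustrative choice. The table1 row for customer 002 is hypothetical, and only a subset of the history rows is used.

```python
import sqlite3

# Customer 001 is from the question; the customer 002 row is hypothetical.
TABLE1 = [
    ("001", "2010-04-19", "Y", "2010-04-19"),
    ("002", "2010-04-20", "Y", "2010-04-20"),
]
# Subset of the question's history rows.
TABLE2 = [
    (2, "001", "2010-04-19", "Y"), (4, "001", "2010-04-19", "Y"),
    (2, "002", "2010-04-18", "N"),  # earlier, but not 'Y', so it is filtered out
    (3, "002", "2010-04-19", "Y"), (4, "002", "2010-04-19", "Y"),
]

# The CTE computes each customer's earliest 'Y' date; the outer query keeps
# customers whose earliest 'Y' date is before their app_dte.
QUERY = """
WITH CustomerStatusDates AS (
    SELECT T2.customer_id, MIN(T2.status_dte) AS MinDate
    FROM Table2 AS T2
    WHERE T2.status_cde = 'Y'
    GROUP BY T2.customer_id
)
SELECT T1.customer_id, T1.app_dte, T2.MinDate
FROM Table1 AS T1
JOIN CustomerStatusDates AS T2
  ON T2.customer_id = T1.customer_id
WHERE T2.MinDate < T1.app_dte
ORDER BY T1.customer_id
"""


def early_customers():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (customer_id TEXT, status_dte TEXT, status_cde TEXT, app_dte TEXT)")
    conn.execute("CREATE TABLE Table2 (Log_id INTEGER, customer_id TEXT, status_dte TEXT, status_cde TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?)", TABLE2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(early_customers())
```

The leading semicolon from the T-SQL version (`;With`) is dropped here, since it only guards against a missing terminator on the previous T-SQL statement.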