SQL (BigQuery): Join two tables with lag() function

Suppose I have two tables.
First one:
user_id  name      timestamp1
1        purchase  12
1        purchase  14
2        purchase  22
2        purchase  14
Second one:
user_id  event_name  timestamp2
1        event1      10
1        event2      11
2        event12     20
2        event10     12
I want to add some fields (event_name, timestamp2) from table two to table one: for every row in table one, take the closest previous values from table two for the same user_id, ordered by timestamp.
Desired table should look like this
user_id  name      timestamp1  event_name  timestamp2
1        purchase  12          event2      11
1        purchase  14          event2      11
2        purchase  22          event12     20
2        purchase  14          event10     12
Help me please with sql query!
Thanks.

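You can join on user_id and then use row_number() ordered by the distance between timestamp1 and timestamp2 to pick the closest row from table2. Note that ABS() picks the closest row in either direction; to restrict the match to earlier events only, add t2.timestamp2 < t1.timestamp1 to the join condition: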
SELECT user_id, name, timestamp1, event_name, timestamp2
FROM (
SELECT t1.*, t2.event_name, t2.timestamp2,
ROW_NUMBER() OVER(PARTITION BY t1.user_id, t1.timestamp1 ORDER BY ABS(t1.timestamp1 - t2.timestamp2)) AS rn
FROM table1 t1
INNER JOIN table2 t2
ON t1.user_id = t2.user_id
)
WHERE rn = 1

select any_value(t1).*,
array_agg(struct(event_name,timestamp2) order by timestamp2 desc limit 1)[offset(0)].*
from `project.dataset.table1` t1
cross join `project.dataset.table2` t2
where t2.user_id = t1.user_id and timestamp2 < timestamp1
group by format('%t', t1)
Applied to the sample data in your question, the output matches the desired table.
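The "closest previous row per user" logic can be checked outside BigQuery as well. The following is a minimal sketch, assuming an in-memory SQLite database (3.25 or newer, for window functions) loaded with the sample rows from the question. The function name `closest_previous_events` is hypothetical, and the query is a variant of the answers above rather than either author's exact query: it filters on `timestamp2 < timestamp1` in the join and ranks by `timestamp2` descending.

```python
import sqlite3


def closest_previous_events():
    """Pair every table1 row with the latest earlier table2 row of the same user."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (user_id INTEGER, name TEXT, timestamp1 INTEGER);
        CREATE TABLE table2 (user_id INTEGER, event_name TEXT, timestamp2 INTEGER);
        INSERT INTO table1 VALUES
            (1, 'purchase', 12), (1, 'purchase', 14),
            (2, 'purchase', 22), (2, 'purchase', 14);
        INSERT INTO table2 VALUES
            (1, 'event1', 10), (1, 'event2', 11),
            (2, 'event12', 20), (2, 'event10', 12);
    """)
    query = """
        SELECT user_id, name, timestamp1, event_name, timestamp2
        FROM (
            SELECT t1.user_id, t1.name, t1.timestamp1,
                   t2.event_name, t2.timestamp2,
                   ROW_NUMBER() OVER (
                       PARTITION BY t1.user_id, t1.timestamp1
                       ORDER BY t2.timestamp2 DESC
                   ) AS rn
            FROM table1 AS t1
            -- LEFT JOIN keeps purchases that have no earlier event (NULL columns)
            LEFT JOIN table2 AS t2
              ON t2.user_id = t1.user_id
             AND t2.timestamp2 < t1.timestamp1
        )
        WHERE rn = 1
        ORDER BY user_id, timestamp1
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in closest_previous_events():
    print(row)
```

Filtering in the join condition rather than ranking by absolute distance means a later event can never be matched, even when it is nearer in time than any earlier one.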

Related

sql: max value by 2 columns in another table

I have 2 tables and for every id in the first table I need to find max value in the date_2 column that would be lower than a value in the date_1 column.
Tables:
table 1
id  date_1
1   01.01.2020
1   11.01.2020
2   02.11.2020
2   02.12.2020
3   12.12.2020
3   31.01.2021
table 2
id  date_2
1   30.12.2019
1   05.01.2020
2   01.11.2020
2   30.10.2020
3   10.11.2020
3   31.12.2020
outcome needed:
id  date_1      max(date_2) within id,date_1
1   01.01.2020  30.12.2019
1   11.01.2020  05.01.2020
2   02.11.2020  01.11.2020
2   02.12.2020  01.11.2020
3   12.12.2020  10.11.2020
3   31.01.2021  31.12.2020
appreciate your help with this!
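You could rank each row (here with the row_number() function) and then match on the id and the ranking. Note that this only gives the expected result when the rows of the two tables pair up one-to-one in date order within each id; in the sample data, id 2 does not (both of its date_1 values should map to 01.11.2020).
with t1 as (select id, date_1,
row_number() over (partition by id order by date_1) as rn
from table1),
t2 as (select id, date_2,
row_number() over (partition by id order by date_2) as rn
from table2)
select t1.id, t1.date_1, t2.date_2
from t1 inner join t2 on t1.id = t2.id and t1.rn = t2.rn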
You can write a simple correlated subquery that mirrors the English narrative:
select id, date_1, (
select Max(date_2) /* find max value in the date_2 column */
from t2
where t2.id = t1.id /* for every id in the first table */
and t2.date_2 < t1.date_1 /* lower than a value in the date_1 column */
) as "max(date_2) within id,date_1"
from t1;
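The correlated-subquery approach can be tried on the sample data with a small script. This is a sketch under a few assumptions: it uses an in-memory SQLite database, the dates are stored as ISO `YYYY-MM-DD` text so that string comparison orders them correctly (the question shows `DD.MM.YYYY`), and the function name `max_earlier_date` and the column alias `max_date_2` are made up for illustration.

```python
import sqlite3


def max_earlier_date():
    """For every (id, date_1) row, find the largest date_2 below date_1 for that id."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE t1 (id INTEGER, date_1 TEXT);
        CREATE TABLE t2 (id INTEGER, date_2 TEXT);
        -- ISO dates (assumption) so that text comparison matches date order
        INSERT INTO t1 VALUES
            (1, '2020-01-01'), (1, '2020-01-11'),
            (2, '2020-11-02'), (2, '2020-12-02'),
            (3, '2020-12-12'), (3, '2021-01-31');
        INSERT INTO t2 VALUES
            (1, '2019-12-30'), (1, '2020-01-05'),
            (2, '2020-11-01'), (2, '2020-10-30'),
            (3, '2020-11-10'), (3, '2020-12-31');
    """)
    query = """
        SELECT t1.id,
               t1.date_1,
               (SELECT MAX(t2.date_2)
                FROM t2
                WHERE t2.id = t1.id
                  AND t2.date_2 < t1.date_1) AS max_date_2
        FROM t1
        ORDER BY t1.id, t1.date_1
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in max_earlier_date():
    print(row)
```

For id 2, both date_1 values resolve to 2020-11-01, which is the case a rank-to-rank match between the two tables would get wrong.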

Get rows based on the MAX value of one of the columns in Db2 SQL

I want to get a row based on the MAX value of one of its columns in Db2 SQL.
TABLE_1
ID ORG DEST AccountNumber Amount Status
----------------------------------------------------
11 1224 6778 32345678 458.00 Accepted
12 1225 6779 12345678 958.00 Rejected
4 1226 6780 22345678 478.00 Rejected
6 1227 6781 21345678 408.00 Accepted
TABLE_2
ID NAME VERSION
---------------------------
1224 BankA 1
1224 BankA1 2
1225 BankB 1
1226 BankC 1
1227 BankD 1
1227 BankD1 2
6778 TestBankA 1
6778 TestBankA1 2
6778 TestBankA1 3
6779 TestBankB 1
6779 TestBankB1 2
6779 TestBankB2 3
6779 TestBankB3 4
6780 TestBankC 1
6781 TestBankD 1
Expected Output
ID AccountNumber Amount Status Origin Destination
----------------------------------------------------------
11 32345678 458.00 Accepted BankA1 TestBankA1
12 12345678 958.00 Rejected BankB TestBankB3
4 22345678 478.00 Rejected BankC TestBankC
6 21345678 408.00 Accepted BankD1 TestBankD
The query below does not show the bank name for the latest version.
SELECT *
FROM TABLE_1 AS T1
INNER JOIN (SELECT ID, MAX(VERSION) FROM TABLE GROUP BY ID) AS T2
ON T2.ID = T1.ORG
INNER JOIN (SELECT ID, MAX(VERSION) FROM TABLE GROUP BY ID) AS T3
ON T3.ID = T1.DEST
WHERE Status <> 'Failed'
The ROW_NUMBER analytic function provides one option here:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VERSION DESC) rn
FROM TABLE_2 t
)
SELECT
t1.ID,
t1.AccountNumber,
t1.Amount,
t1.Status,
t2org.NAME AS Origin,
t2dest.NAME AS Destination
FROM TABLE_1 t1
LEFT JOIN cte t2org
ON t2org.ID = t1.ORG AND t2org.rn = 1
LEFT JOIN cte t2dest
ON t2dest.ID = t1.DEST AND t2dest.rn = 1;
Tim's option of using a CTE and the ROW_NUMBER() OLAP function is a good approach.
Since you only want a single column (NAME) from TABLE_2, you could also retrieve it from a correlated subquery, although it might not perform as well if there are lots of qualifying rows in TABLE_1.
SELECT t1.ID, t1.AccountNumber, t1.Amount, t1.Status,
(SELECT t2r.NAME FROM TABLE_2 AS t2r
WHERE t2r.ID = t1.ORG
ORDER BY t2r.VERSION DESC FETCH FIRST ROW ONLY
) AS Origin,
(SELECT t2d.NAME FROM TABLE_2 AS t2d
WHERE t2d.ID = t1.DEST
ORDER BY t2d.VERSION DESC FETCH FIRST ROW ONLY
) AS Destination
FROM TABLE_1 AS t1
WHERE t1.Status <> 'Failed';
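The "latest version per ID" lookup can be reproduced on the sample rows to confirm the expected output. The sketch below is not Db2: it assumes an in-memory SQLite database (3.25 or newer) and uses the ROW_NUMBER() variant, since SQLite does not accept `FETCH FIRST ROW ONLY`. The function name `latest_bank_names` is hypothetical.

```python
import sqlite3


def latest_bank_names():
    """Resolve ORG and DEST to the NAME of the highest VERSION in TABLE_2."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_1 (ID INTEGER, ORG INTEGER, DEST INTEGER,
                              AccountNumber TEXT, Amount REAL, Status TEXT);
        CREATE TABLE table_2 (ID INTEGER, NAME TEXT, VERSION INTEGER);
        INSERT INTO table_1 VALUES
            (11, 1224, 6778, '32345678', 458.00, 'Accepted'),
            (12, 1225, 6779, '12345678', 958.00, 'Rejected'),
            (4,  1226, 6780, '22345678', 478.00, 'Rejected'),
            (6,  1227, 6781, '21345678', 408.00, 'Accepted');
        INSERT INTO table_2 VALUES
            (1224, 'BankA', 1), (1224, 'BankA1', 2), (1225, 'BankB', 1),
            (1226, 'BankC', 1), (1227, 'BankD', 1), (1227, 'BankD1', 2),
            (6778, 'TestBankA', 1), (6778, 'TestBankA1', 2), (6778, 'TestBankA1', 3),
            (6779, 'TestBankB', 1), (6779, 'TestBankB1', 2),
            (6779, 'TestBankB2', 3), (6779, 'TestBankB3', 4),
            (6780, 'TestBankC', 1), (6781, 'TestBankD', 1);
    """)
    query = """
        WITH latest AS (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VERSION DESC) AS rn
            FROM table_2 AS t
        )
        SELECT t1.ID, t1.AccountNumber, t1.Amount, t1.Status,
               o.NAME AS Origin, d.NAME AS Destination
        FROM table_1 AS t1
        LEFT JOIN latest AS o ON o.ID = t1.ORG  AND o.rn = 1
        LEFT JOIN latest AS d ON d.ID = t1.DEST AND d.rn = 1
        WHERE t1.Status <> 'Failed'
        ORDER BY t1.ID
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in latest_bank_names():
    print(row)
```

Joining the same ranked CTE twice, once for ORG and once for DEST, ranks TABLE_2 once and reuses that ranking for both lookups.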

SQL: How do I display all records per unique id, but not the first record ever recorded in SQL

Example:
id Pricemoney time/date
1 100 01/20/2017
1 10 01/21/2017
1 1000 01/21/2017
2 10 01/23/2017
2 100 01/24/2017
3 1000 01/19/2017
3 100 01/22/2017
3 10 01/24/2017
I want to run a SQL query that displays every id and its pricemoney BUT does NOT include the first record (based on time/date) per unique id.
Just to clarify what I do not want to be displayed:
userid Pricemoney issuedate
1 100 01/20/2017 -- not included
1 10 01/21/2017
1 1000 01/21/2017
2 10 01/23/2017 -- not included
2 100 01/24/2017
3 1000 01/19/2017 -- not included
3 100 01/22/2017
3 10 01/24/2017
Expected result:
id Pricemoney time/date
1 10 01/21/2017
1 1000 01/21/2017
2 100 01/24/2017
3 100 01/22/2017
3 10 01/24/2017
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by time_date asc) as seqnum
from <tablename> t
) t
where seqnum > 1;
If you want to keep ids that have only a single row, you can do:
select t.*
from (select t.*,
row_number() over (partition by id order by time_date asc) as seqnum,
count(*) over (partition by id) as cnt
from <tablename> t
) t
where seqnum > 1 or cnt = 1;
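The difference between dropping every first row and additionally keeping ids that only have one row can be seen with a short script. This is a sketch, assuming an in-memory SQLite database (3.25 or newer), ISO-formatted dates, and a hypothetical table name `prices`. The row for id 4 is not in the question; it is added here only to show the single-row case. The `keep_single_rows` flag and the function name are likewise made up for illustration.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, 100, "2017-01-20"), (1, 10, "2017-01-21"), (1, 1000, "2017-01-21"),
    (2, 10, "2017-01-23"), (2, 100, "2017-01-24"),
    (3, 1000, "2017-01-19"), (3, 100, "2017-01-22"), (3, 10, "2017-01-24"),
    (4, 50, "2017-01-25"),  # hypothetical id with a single record
]


def skip_first_per_id(keep_single_rows=False):
    """Return all rows except the earliest one per id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE prices (id INTEGER, pricemoney INTEGER, time_date TEXT)")
    con.executemany("INSERT INTO prices VALUES (?, ?, ?)", SAMPLE_ROWS)
    # An id with one row has seqnum = 1 only, so it needs its own condition.
    condition = "seqnum > 1 OR cnt = 1" if keep_single_rows else "seqnum > 1"
    query = f"""
        SELECT id, pricemoney, time_date
        FROM (
            SELECT p.*,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_date) AS seqnum,
                   COUNT(*) OVER (PARTITION BY id) AS cnt
            FROM prices AS p
        )
        WHERE {condition}
        ORDER BY id, time_date, pricemoney
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(skip_first_per_id())
print(skip_first_per_id(keep_single_rows=True))
```

With the default flag, id 4 disappears entirely because its only record is also its first one; with `keep_single_rows=True` it is retained.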
You may use EXISTS
select t1.*
from data t1
where exists (
select 1
from data t2
where t1.id = t2.id and t2.time_date < t1.time_date
)
You can try this:
select data1.id,data1.Date,data1.Pricemoney from data1
left join (
select id ,min(Date) date from data1
group by id
) as t
on data1.date= t.date and t.id = data1.id
where t.id is null
group by data1.id,data1.Date,data1.Pricemoney
Note that the above query also drops ids that have only a single record. If you want to keep those, add having count(id) > 1 to the left-joined subquery, e.g.:
select data1.id,data1.Date,data1.Pricemoney from data1
left join (
select id ,min(Date) date from data1
group by id
having COUNT(id) > 1
) as t
on data1.date= t.date and t.id = data1.id
where t.id is null
group by data1.id,data1.Date,data1.Pricemoney

SQL query with a GROUP BY

I have a table like
Id WID AID DateValue
1 1 12 2015-07-10 15:14:46.770
2 1 13 2015-07-10 14:14:46.770
3 2 13 2015-07-10 13:14:46.770
4 2 13 2015-07-10 12:14:46.770
5 2 13 2015-07-10 11:14:46.770
Now, I want to get the Id value by grouping on WID and AID, then taking the row with the MAX value of DateValue.
The desired output is
Output:
Id
1
2
3
I tried something like this
SELECT Id, MAX(DateValue)
FROM Table1
GROUP BY WID, AID
I don't actually need DateValue in the select, but it is fine if it is there.
Can anyone help me with this?
I think you want a query like this:
SELECT Id --or *
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY WID, AID ORDER BY DateValue DESC) AS seqNum
FROM yourTable) dt
WHERE (SeqNum =1);
You can use a correlated subquery like so:
SELECT Id FROM Table1 t1
WHERE NOT EXISTS (
SELECT 1 FROM Table1 t2
WHERE t1.WID = t2.WID AND t1.AID = t2.AID AND t1.DateValue < t2.DateValue
)
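The NOT EXISTS form of this "row with the max value per group" query can be run against the sample table as a quick check. This is a sketch, assuming an in-memory SQLite database and DateValue stored as text in a sortable `YYYY-MM-DD HH:MM:SS.mmm` layout; the function name `ids_with_latest_date` is hypothetical.

```python
import sqlite3


def ids_with_latest_date():
    """Return the Id of the row with the latest DateValue in each (WID, AID) group."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (Id INTEGER, WID INTEGER, AID INTEGER, DateValue TEXT);
        INSERT INTO Table1 VALUES
            (1, 1, 12, '2015-07-10 15:14:46.770'),
            (2, 1, 13, '2015-07-10 14:14:46.770'),
            (3, 2, 13, '2015-07-10 13:14:46.770'),
            (4, 2, 13, '2015-07-10 12:14:46.770'),
            (5, 2, 13, '2015-07-10 11:14:46.770');
    """)
    query = """
        SELECT t1.Id
        FROM Table1 AS t1
        WHERE NOT EXISTS (
            -- a row is kept only if no row in the same group is newer
            SELECT 1
            FROM Table1 AS t2
            WHERE t2.WID = t1.WID
              AND t2.AID = t1.AID
              AND t2.DateValue > t1.DateValue
        )
        ORDER BY t1.Id
    """
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids


print(ids_with_latest_date())
```

Unlike the ROW_NUMBER() version, this form returns every tied row when two rows in a group share the same latest DateValue.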

SQL select columns group by

If I have a table which is of the following format:
ID NAME NUM TIMESTAMP BOOL
1 A 5 09:50 TRUE
1 B 6 13:01 TRUE
1 A 1 10:18 FALSE
2 A 3 12:20 FALSE
1 A 1 05:30 TRUE
1 A 12 06:00 TRUE
How can I get the ID, NAME and NUM for each unique ID, NAME pair with the latest Timestamp and BOOL=TRUE.
So for the above table the output should be:
ID NAME NUM
1 A 5
1 B 6
I tried using Group By but I cannot seem to get around that either I need to put an aggregator function around num (max, min will not work when applied to this example) or specifying it in group by (which will end up matching on ID, NAME, and NUM combined). Both as far as I can think will break in some case.
PS: I am using SQL Developer (that is the SQL developed by Oracle I think, sorry I am a newbie at this)
If you're using at least SQL-Server 2005 you can use the ROW_NUMBER function:
WITH CTE AS
(
SELECT ID, NAME, NUM,
RN = ROW_NUMBER()OVER(PARTITION BY ID, NAME ORDER BY TIMESTAMP DESC)
FROM Table
WHERE BOOL='TRUE'
)
SELECT ID, NAME, NUM FROM CTE
WHERE RN = 1
Result:
ID NAME NUM
1 A 5
1 B 6
Here's the fiddle: http://sqlfiddle.com/#!3/a1dc9/10/0
select t1.* from table as t1 inner join
(
select ID, NAME, max(TIMESTAMP) as TIMESTAMP from table
where BOOL='TRUE'
group by ID, NAME
) as t2
on t1.id=t2.id and t1.name=t2.name and t1.timestamp=t2.timestamp
where t1.BOOL='TRUE'
select t1.*
from TABLE1 as t1
left join
TABLE1 as t2
on t1.id=t2.id and t1.name=t2.name and t2.BOOL='TRUE' and t1.TIMESTAMP<t2.TIMESTAMP
where t1.BOOL='TRUE' and t2.id is null
should do it for you.
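The self-join (anti-join) idea can be checked against the sample rows. The sketch below assumes an in-memory SQLite database, a hypothetical table name `readings`, the columns renamed to `ts` and `flag` to avoid reserved words, `flag` stored as 1/0 instead of TRUE/FALSE, and times kept as zero-padded `HH:MM` text so that string comparison orders them correctly.

```python
import sqlite3


def latest_true_rows():
    """Return (id, name, num) of the latest flagged row per (id, name) pair."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE readings (id INTEGER, name TEXT, num INTEGER, ts TEXT, flag INTEGER);
        INSERT INTO readings VALUES
            (1, 'A', 5,  '09:50', 1),
            (1, 'B', 6,  '13:01', 1),
            (1, 'A', 1,  '10:18', 0),
            (2, 'A', 3,  '12:20', 0),
            (1, 'A', 1,  '05:30', 1),
            (1, 'A', 12, '06:00', 1);
    """)
    query = """
        SELECT t1.id, t1.name, t1.num
        FROM readings AS t1
        LEFT JOIN readings AS t2
          ON t2.id = t1.id
         AND t2.name = t1.name
         AND t2.flag = 1          -- only flagged rows may outrank t1
         AND t2.ts > t1.ts        -- look for a later row in the same pair
        WHERE t1.flag = 1
          AND t2.id IS NULL       -- keep t1 when no later flagged row exists
        ORDER BY t1.id, t1.name
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(latest_true_rows())
```

The `t2.flag = 1` condition belongs in the join rather than the WHERE clause; otherwise the unflagged 10:18 row for (1, A) would count as a later row and suppress the 09:50 row.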