Join 2 tables and keep only the first closest event

I have the following current tables:
table_1
id | timestamp | origin | info
table_2
id | timestamp | origin | type
My aim is to find, for each row in table_2, the originating event in table_1: the row with the same origin and the closest earlier timestamp. I want to keep only that one row.
For instance:
table 1
1 | 1000 | "o1" | "i1"
2 | 2000 | "o2" | "i2"
3 | 2010 | "o2" | "i2"
table 2
1 | 1010 | "o1" | "t1"
2 | 2100 | "o2" | "t2"
My expected result is:
table_2.id | table_2.timestamp | table_2.origin | table_2.type | table_1.info | table_1.timestamp
1 | 1010 | "o1" | "t1" | "i1" | 1000
2 | 2100 | "o2" | "t2" | "i2" | 2010
Currently I'm just using a simple join on origin and table_2.timestamp > table_1.timestamp, which gives me:
table_2.id | table_2.timestamp | table_2.origin | table_2.type | table_1.info | table_1.timestamp
1 | 1010 | "o1" | "t1" | "i1" | 1000
2 | 2100 | "o2" | "t2" | "i2" | 2000
2 | 2100 | "o2" | "t2" | "i2" | 2010
As you can see, I don't want the second row above (the one with table_1.timestamp = 2000), because I only want the closest event in table_1.
Any ideas?

A cross-database solution is to join and filter with a correlated subquery:
select
t2.*,
t1.info,
t1.timestamp t1_timestamp
from
table_2 t2
inner join table_1 t1
on t1.origin = t2.origin
and t1.timestamp = (
select max(t11.timestamp)
from table_1 t11
where t11.origin = t2.origin and t11.timestamp < t2.timestamp
)
order by t2.id
Since you are using Postgres, you can use the handy distinct on syntax; this might actually perform better:
select
distinct on(t2.id)
t2.*,
t1.info,
t1.timestamp t1_timestamp
from
table_2 t2
inner join table_1 t1
on t1.origin = t2.origin and t1.timestamp < t2.timestamp
order by t2.id, t1.timestamp desc
Both queries yield:
id | timestamp | origin | type | info | t1_timestamp
-: | --------: | :----- | :--- | :--- | -----------:
1 | 1010 | o1 | t1 | i1 | 1000
2 | 2100 | o2 | t2 | i2 | 2010
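
For anyone who wants to try the correlated-subquery version locally, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The column types are my assumption (the question does not give them), and only the first query is exercised, since distinct on is Postgres-specific and not available in SQLite.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Column types are assumed; the question only shows the values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, timestamp INTEGER, origin TEXT, info TEXT);
    CREATE TABLE table_2 (id INTEGER, timestamp INTEGER, origin TEXT, type TEXT);
    INSERT INTO table_1 VALUES
        (1, 1000, 'o1', 'i1'), (2, 2000, 'o2', 'i2'), (3, 2010, 'o2', 'i2');
    INSERT INTO table_2 VALUES
        (1, 1010, 'o1', 't1'), (2, 2100, 'o2', 't2');
""")

# For each table_2 row, keep only the table_1 row with the same origin
# whose timestamp is the latest one strictly before the table_2 timestamp.
query = """
    SELECT t2.id, t2.timestamp, t2.origin, t2.type,
           t1.info, t1.timestamp AS t1_timestamp
    FROM table_2 t2
    INNER JOIN table_1 t1
        ON t1.origin = t2.origin
        AND t1.timestamp = (
            SELECT MAX(t11.timestamp)
            FROM table_1 t11
            WHERE t11.origin = t2.origin AND t11.timestamp < t2.timestamp
        )
    ORDER BY t2.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that because this is an inner join, a table_2 row with no earlier table_1 event for its origin would drop out of the result.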

Related

SELECT DISTINCT on multiple columns with new value

I have these two tables.
Table #1:
+----+------+-----+
| ID | Y | AGE |
+----+------+-----+
| 1 | 2022 | a |
| 1 | 2022 | b |
| 3 | 2021 | a |
| 4 | 2021 | a |
| 4 | 2021 | b |
| 4 | 2021 | c |
| 7 | 2021 | a |
| 7 | 2022 | b |
+----+------+-----+
Table #2:
+----+------+-----------+
| ID | num | something |
+----+------+-----------+
| 1 | 10 | a1221 |
| 3 | 30 | a4342 |
| 4 | 40 | bdssd |
| 7 | 70 | asdsds |
+----+------+-----------+
and I would like to merge them into this result set:
+----+-------+-----+
| ID | Y | num |
+----+-------+-----+
| 1 | 2022 | 10 |
| 3 | 2021 | 30 |
| 4 | 2021 | 40 |
| 7 | 2021 | 70 |
| 7 | 2022 | 70 |
+----+-------+-----+
That means I would like to take the unique pairs of T1.ID and T1.Y, ignoring column AGE, and then INNER JOIN the resulting table with T2 on T1.ID = T2.ID.
I know I can do this in steps as
WITH cte AS
(
SELECT ID, Y
FROM T1
GROUP BY ID, Y
)
SELECT cte.ID, cte.Y, T2.num
FROM cte
INNER JOIN T2 ON cte.ID = T2.ID
but is there a better way, without creating a temporary table? Because a simple
SELECT T1.ID, T1.Y, T2.num
FROM T1
INNER JOIN T2 ON T1.ID = T2.ID
will produce duplicates that come from T1.AGE, even though I'm not using it.

I think it's better to use views for this:
CREATE VIEW dbo.view1
AS
SELECT
ID
,Y
FROM T1
GROUP BY ID
,Y
GO
And call it wherever needed like tables:
SELECT v1.ID, v1.Y, T2.num
FROM view1 v1
INNER JOIN T2 ON v1.ID = T2.ID
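
If creating a view feels like too much, one alternative (not from the answer above, just a sketch) is to put DISTINCT on the joined result directly. The snippet below runs that against the question's sample data with Python's sqlite3 module; the column types are assumed. It relies on T2.ID being unique, as it is in the sample, because DISTINCT applies to the whole selected row.

```python
import sqlite3

# Sample data from the question; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, Y INTEGER, AGE TEXT);
    CREATE TABLE T2 (ID INTEGER, num INTEGER, something TEXT);
    INSERT INTO T1 VALUES
        (1, 2022, 'a'), (1, 2022, 'b'), (3, 2021, 'a'), (4, 2021, 'a'),
        (4, 2021, 'b'), (4, 2021, 'c'), (7, 2021, 'a'), (7, 2022, 'b');
    INSERT INTO T2 VALUES
        (1, 10, 'a1221'), (3, 30, 'a4342'), (4, 40, 'bdssd'), (7, 70, 'asdsds');
""")

# DISTINCT collapses the rows that differ only in the unselected AGE column.
query = """
    SELECT DISTINCT T1.ID, T1.Y, T2.num
    FROM T1
    INNER JOIN T2 ON T1.ID = T2.ID
    ORDER BY T1.ID, T1.Y
"""
pairs = conn.execute(query).fetchall()
for pair in pairs:
    print(pair)
```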

How to join a grouped table in sql?

Novice in SQL here, but hopefully someone can help. I have two tables. For simplicity, here is how the tables are structured.
Table 1:
+------------+-------+-----------+------------+
| department | sales | date | sales_code |
+------------+-------+-----------+------------+
| 1 | 50 | 5/26/2021 | A |
+------------+-------+-----------+------------+
| 2 | 150 | 5/26/2021 | B |
+------------+-------+-----------+------------+
| 1 | 200 | 5/25/2021 | C |
+------------+-------+-----------+------------+
| 2 | 250 | 5/24/2021 | D |
+------------+-------+-----------+------------+
Table 2:
+------+------------+-------+-----------+-----------------------+
| item | department | sales | date | column I want to join |
+------+------------+-------+-----------+-----------------------+
| 31 | 1 | 50 | 5/26/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 30 | 2 | 150 | 5/26/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 29 | 1 | 200 | 5/25/2021 | x |
+------+------------+-------+-----------+-----------------------+
| 28 | 2 | 250 | 5/24/2021 | x |
+------+------------+-------+-----------+-----------------------+
I need to join table 2 to table 1. However, it needs to be aggregated by department sales first, because table 2 is already aggregated by department sales. Here is what I was thinking, but I cannot seem to get it to work.
SELECT t1.*, t2.*
FROM table1 as t1
JOIN (
SELECT department, date, column_i_want, sum(sales)
FROM table2
GROUP BY department ) as t2
ON t2.department = t1.department AND t1.date = t2.date
Desired Output:
+------------+-------+-----------+------------+-----------------------+
| department | sales | date | sales_code | column I want to join |
+------------+-------+-----------+------------+-----------------------+
| 1 | 50 | 5/26/2021 | A | x |
+------------+-------+-----------+------------+-----------------------+
| 2 | 150 | 5/26/2021 | B | x |
+------------+-------+-----------+------------+-----------------------+
| 1 | 200 | 5/25/2021 | C | x |
+------------+-------+-----------+------------+-----------------------+
| 2 | 250 | 5/24/2021 | D | x |
+------------+-------+-----------+------------+-----------------------+
Any help would be appreciated.

There are several ways to go about doing that; the easiest one is to create a view:
CREATE VIEW t2 AS
SELECT department, date, column_i_want, sum(sales) AS sales
FROM table2
GROUP BY department, date, column_i_want;
Then it's easier to join them (you can also use a WITH clause instead of a view, but it can get messy):
SELECT *
FROM table1 NATURAL JOIN t2

Here is what you want:
select t2.*, t1.sales_code
from table2 t2
join table1 t1
on t1.department = t2.department
and t1.date = t2.date
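
To show the aggregate-then-join shape the question was originally aiming for, here is a rough sketch in Python's sqlite3 module. The dates are rewritten in ISO format, the column types are assumed, and total_sales is an alias I made up for illustration; every non-aggregated column in the subquery is listed in GROUP BY.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (department INTEGER, sales INTEGER, date TEXT, sales_code TEXT);
    CREATE TABLE table2 (item INTEGER, department INTEGER, sales INTEGER,
                         date TEXT, column_i_want TEXT);
    INSERT INTO table1 VALUES
        (1, 50, '2021-05-26', 'A'), (2, 150, '2021-05-26', 'B'),
        (1, 200, '2021-05-25', 'C'), (2, 250, '2021-05-24', 'D');
    INSERT INTO table2 VALUES
        (31, 1, 50, '2021-05-26', 'x'), (30, 2, 150, '2021-05-26', 'x'),
        (29, 1, 200, '2021-05-25', 'x'), (28, 2, 250, '2021-05-24', 'x');
""")

# Aggregate table2 per department and date first, then join on both keys.
# total_sales is a hypothetical alias, not a name from the question.
query = """
    SELECT t1.department, t1.sales, t1.date, t1.sales_code, t2.column_i_want
    FROM table1 AS t1
    JOIN (
        SELECT department, date, column_i_want, SUM(sales) AS total_sales
        FROM table2
        GROUP BY department, date, column_i_want
    ) AS t2
        ON t2.department = t1.department AND t2.date = t1.date
    ORDER BY t1.date DESC, t1.department
"""
joined = conn.execute(query).fetchall()
for row in joined:
    print(row)
```

With this sample data each (department, date) pair has a single row in table2, so the grouping changes nothing; it only matters once table2 holds several items per department and date.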

Greatest-n-per-group query including rows that don't have 'n'

I know that this is a common question but I haven't been able to find an answer to this specific problem:
I have two tables:
+--------------+ +-------------------------------+
| temp1 | | temp2 |
+----+---------+ +----+----+---------------------+
| id | name | | id | fk | ts |
+----+---------+ +----+--------------------------+
| 1 | first | | 1 | 1 | 2020-06-19 23:56:46 |
| 2 | second | | 2 | 1 | 2020-06-19 22:56:46 |
| 3 | third | | 3 | 2 | 2020-06-19 21:56:46 |
+----+---------+ | 4 | 2 | 2020-06-19 20:56:46 |
+----+--------------------------+
In order to get for each entry in temp1 the corresponding entry from temp2 with the newest timestamp I am running the following query:
SELECT
t1.id AS id,
t1.name AS name,
t2.ts AS ts
FROM
(temp2 t2
JOIN temp1 t1)
WHERE
t2.ts = (SELECT MAX(t3.ts)
FROM
temp2 t3
WHERE
t2.fk = t3.fk)
AND t2.fk = t1.id
This results in:
+----+--------+---------------------+
| id | name | ts |
+----+------------------------------+
| 1 | first | 2020-06-19 23:56:46 |
| 2 | second | 2020-06-19 21:56:46 |
+----+------------------------------+
Is it possible to alter this query in order to include the rows from temp1 that do not have a corresponding value in temp2?
The desired result would be:
+----+--------+---------------------+
| id | name | ts |
+----+------------------------------+
| 1 | first | 2020-06-19 23:56:46 |
| 2 | second | 2020-06-19 21:56:46 |
| 3 | third | NULL |
+----+------------------------------+

Use window functions:
SELECT t1.id, t1.name, t2.ts
FROM temp1 t1 LEFT JOIN
(SELECT t2.*, ROW_NUMBER() OVER (PARTITION BY t2.fk ORDER BY t2.ts DESC) as seqnum
FROM temp2 t2
) t2
ON t2.fk = t1.id AND t2.seqnum = 1;

The simplest way to do this is with a correlated subquery in the SELECT list:
SELECT t1.*,
(SELECT MAX(t2.ts) FROM temp2 t2 WHERE t2.fk = t1.id) ts
FROM temp1 t1
Results:
| id | name | ts |
| --- | ------ | ------------------- |
| 1 | first | 2020-06-19 23:56:46 |
| 2 | second | 2020-06-19 21:56:46 |
| 3 | third | |
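
A third variation, in case it helps: LEFT JOIN temp1 to a grouped subquery that holds the newest ts per fk. This is only a sketch run through Python's sqlite3 module on the question's data; the alias latest is my own name and the column types are assumed.

```python
import sqlite3

# Sample data from the question; timestamps stored as text (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp1 (id INTEGER, name TEXT);
    CREATE TABLE temp2 (id INTEGER, fk INTEGER, ts TEXT);
    INSERT INTO temp1 VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO temp2 VALUES
        (1, 1, '2020-06-19 23:56:46'), (2, 1, '2020-06-19 22:56:46'),
        (3, 2, '2020-06-19 21:56:46'), (4, 2, '2020-06-19 20:56:46');
""")

# The derived table keeps one row per fk (its newest ts); the LEFT JOIN
# preserves temp1 rows that have no match, leaving ts as NULL.
query = """
    SELECT t1.id, t1.name, latest.ts
    FROM temp1 t1
    LEFT JOIN (
        SELECT fk, MAX(ts) AS ts
        FROM temp2
        GROUP BY fk
    ) latest ON latest.fk = t1.id
    ORDER BY t1.id
"""
newest = conn.execute(query).fetchall()
for row in newest:
    print(row)
```

This only returns the timestamp itself. If other columns from the newest temp2 row are needed, the window-function answer is the better fit.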

Join two tables on a condition of ID and a date from first table is between two other dates in another table

I'm trying to add time-related data from table 2 to table 1. Table 1 has ID and date; table 2 has ID, DateFrom, and DateTo. IDs and dates repeat.
t1 for example:
+-----+------------+------+-------+-------+
| ID | day | Type | data1 | data2 |
+-----+------------+------+-------+-------+
| 111 | 21.07.2019 | - | … | … |
| 111 | 01.08.2019 | - | … | … |
| 111 | 14.08.2019 | - | … | … |
| 112 | 21.07.2019 | - | … | … |
| … | … | | .. | … |
+-----+------------+------+-------+-------+
t2:
+-----+------------+------------+------+
| ID | date_from | date_to | Type |
+-----+------------+------------+------+
| 111 | 01.07.2019 | 03.08.2019 | AAA |
| 111 | 04.08.2019 | 29.09.2019 | BBB |
| 111 | 30.09.2019 | 01.12.2019 | CCC |
| 111 | … | … | … |
+-----+------------+------------+------+
What I want is to fill Type with the matching data from t2:
+-----+------------+------+-------+-------+
| ID | day | Type | data1 | data2 |
+-----+------------+------+-------+-------+
| 111 | 21.07.2019 | AAA | … | … |
| 111 | 01.08.2019 | AAA | … | … |
| 111 | 14.08.2019 | BBB | … | … |
| 112 | 21.07.2019 | BBB | … | … |
| … | … | … | .. | … |
+-----+------------+------+-------+-------+
What I have done so far:
SELECT TOP 100
t1.ID,
t1.day,
t2.type
FROM t1 LEFT OUTER JOIN t2 ON ( (t1.day >= t2.date_from) AND (t1.day <= t2.date_to)
AND (t1.ID = t2.ID) )
Is it correct?

A join seems the relevant approach here.
The parentheses around the conditions are not necessary. Whether you want an inner join or a left join depends on the possibility of orphan records and how you want to handle them: an inner join removes records in t1 that have no match in t2, while a left join keeps them (the resulting type will be null):
select t1.*, t2.type
from t1
inner join t2 on t1.day between t2.date_from and t2.date_to and t2.id = t1.id
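
To make the inner-versus-left difference concrete, here is a small sketch using Python's sqlite3 module. It keeps only the id and day columns of t1, stores dates as ISO strings so that BETWEEN compares them correctly, and assumes ID 112 has no rows in t2 (the question elides that part of the data).

```python
import sqlite3

# Reduced sample: ISO date strings, and no t2 rows for ID 112 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, day TEXT);
    CREATE TABLE t2 (id INTEGER, date_from TEXT, date_to TEXT, type TEXT);
    INSERT INTO t1 VALUES
        (111, '2019-07-21'), (111, '2019-08-01'),
        (111, '2019-08-14'), (112, '2019-07-21');
    INSERT INTO t2 VALUES
        (111, '2019-07-01', '2019-08-03', 'AAA'),
        (111, '2019-08-04', '2019-09-29', 'BBB'),
        (111, '2019-09-30', '2019-12-01', 'CCC');
""")

# Same join condition in both queries; only the join type differs.
template = """
    SELECT t1.id, t1.day, t2.type
    FROM t1
    {join_type} JOIN t2
        ON t2.id = t1.id AND t1.day BETWEEN t2.date_from AND t2.date_to
    ORDER BY t1.id, t1.day
"""
inner_rows = conn.execute(template.format(join_type="INNER")).fetchall()
left_rows = conn.execute(template.format(join_type="LEFT")).fetchall()

print(inner_rows)  # the orphan row for ID 112 is dropped
print(left_rows)   # the orphan row for ID 112 is kept, with type None
```

If the periods in t2 can overlap for the same ID, either join returns one row per matching period, so a t1 row may appear more than once.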

SQL Query to get results that match between three tables, or a single result for no match

Is there a way to use a where clause to check if there were zero matches between tables for a record from the first table, and produce one row of results reflecting that?
I'm trying to get results that look like this:
+----------+----------+-----------+----------+-------------+
| Results |
+----------+----------+-----------+----------+-------------+
| Date | Queue ID | From Date | To Date | Campaign ID |
| 3/1/2014 | 1 | 2/24/2014 | 3/2/2014 | 1 |
| 3/1/2014 | 2 | (NULL) | (NULL) | (NULL) |
+----------+----------+-----------+----------+-------------+
From a combination of tables that look like this:
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| Table 1 | | Table 2 | | Table 3 |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| Date | Queue | | Queue | SP | | SP | From Date | To Date | Campaign |
| | ID | | ID | ID | | ID | | | ID |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| 3/1/2014 | 1 | | 1 | 1 | | 1 | 2/24/2014 | 3/2/2014 | 1 |
| 3/1/2014 | 2 | | 1 | 2 | | 2 | 3/3/2014 | 3/9/2014 | 5 |
| | | | 1 | 3 | | 3 | 3/10/2014 | 3/16/2014 | 1 |
| | | | 1 | 4 | | 4 | 3/17/2014 | 3/23/2014 | 1 |
| | | | 1 | 5 | | 5 | 3/24/2014 | 3/30/2014 | 4 |
| | | | 2 | 6 | | 6 | 3/3/2014 | 3/9/2014 | 5 |
| | | | 2 | 7 | | 7 | 3/10/2014 | 3/16/2014 | 5 |
| | | | 2 | 8 | | 8 | 3/17/2014 | 3/23/2014 | 5 |
| | | | 2 | 9 | | 9 | 3/24/2014 | 3/30/2014 | 5 |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
I'm joining Table 1 to Table 2 on QUEUE ID,
and Table 2 to Table 3 on SP ID,
and DATE from Table 1 should fall between Table 3's FROM DATE and TO DATE.
I want a single record returned for each queue, including if there were no date matches.
Unfortunately, every combination of joins or where clauses I've tried so far results in either a single record for Queue ID 1 only or multiple records for each Queue ID.

I would suggest this:
SELECT
t1.Date,
t1.QueueID,
s.FromDate,
s.ToDate,
s.CampaignID
FROM
Table1 t1
LEFT JOIN
(
SELECT
t2.QueueID,
t3.FromDate,
t3.ToDate,
t3.CampaignID
FROM
Table2 t2
INNER JOIN
Table3 t3 ON
t2.SPID = t3.SPID
) s ON
t1.QueueID = s.QueueID AND
t1.Date BETWEEN s.FromDate AND s.ToDate

A trivial amendment to AHiggins's code. Using a CTE perhaps makes it a little easier to read.
With AllDates as
(
SELECT
t2.QueueID,
t3.FromDate,
t3.ToDate,
t3.CampaignID
FROM Table2 t2
INNER JOIN Table3 t3 ON
t2.SPID = t3.SPID
)
SELECT
t1.Date,
t1.QueueID,
s.FromDate,
s.ToDate,
s.CampaignID
FROM Table1 t1
LEFT JOIN AllDates s ON
t1.QueueID = s.QueueID AND
t1.Date BETWEEN s.FromDate AND s.ToDate

You want something like:
select distinct t1.date, t1.queue_id, IFNULL(t3.from_date,'NULL'),
IFNULL(t3.to_date,'NULL'), IFNULL(t3.campaign,'NULL')
FROM table1 t1
LEFT OUTER JOIN (table2 t2
inner join table3 t3 on t2.sp_id = t3.sp_id)
on t1.queue_id = t2.queue_id
AND t3.from_date <= t1.date
AND t3.to_date >= t1.date
This will select distinct records from the tables, returning one row of 'NULL' placeholders for a queue that has no matching date range. The date conditions sit in the ON clause of the outer join because, in a WHERE clause, they would filter out the unmatched rows that the outer join is meant to keep.

SELECT t1.[Date], t1.[Queue ID], s.[From Date], s.[To Date], s.[Campaign ID]
FROM table1 t1
LEFT JOIN (SELECT t3.*, t2.[Queue ID] FROM table3 t3 JOIN table2 t2 ON t2.[SP ID] = t3.[SP ID]) s
ON s.[Queue ID] = t1.[Queue ID] AND t1.[Date] BETWEEN s.[From Date] AND s.[To Date]
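
Here is a quick way to check the LEFT JOIN against a derived table on the question's data, sketched with Python's sqlite3 module. Dates are stored as ISO strings so that BETWEEN works on text, and the column names without spaces (QueueID, SPID, and so on) follow the earlier answers rather than the question's headers.

```python
import sqlite3

# Data from the question, with dates rewritten as ISO strings (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Date TEXT, QueueID INTEGER);
    CREATE TABLE Table2 (QueueID INTEGER, SPID INTEGER);
    CREATE TABLE Table3 (SPID INTEGER, FromDate TEXT, ToDate TEXT, CampaignID INTEGER);
    INSERT INTO Table1 VALUES ('2014-03-01', 1), ('2014-03-01', 2);
    INSERT INTO Table2 VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (2, 8), (2, 9);
    INSERT INTO Table3 VALUES
        (1, '2014-02-24', '2014-03-02', 1), (2, '2014-03-03', '2014-03-09', 5),
        (3, '2014-03-10', '2014-03-16', 1), (4, '2014-03-17', '2014-03-23', 1),
        (5, '2014-03-24', '2014-03-30', 4), (6, '2014-03-03', '2014-03-09', 5),
        (7, '2014-03-10', '2014-03-16', 5), (8, '2014-03-17', '2014-03-23', 5),
        (9, '2014-03-24', '2014-03-30', 5);
""")

# Table2 and Table3 are inner-joined first; the date test lives in the ON
# clause of the LEFT JOIN so queues without a matching range are kept.
query = """
    SELECT t1.Date, t1.QueueID, s.FromDate, s.ToDate, s.CampaignID
    FROM Table1 t1
    LEFT JOIN (
        SELECT t2.QueueID, t3.FromDate, t3.ToDate, t3.CampaignID
        FROM Table2 t2
        INNER JOIN Table3 t3 ON t2.SPID = t3.SPID
    ) s
        ON t1.QueueID = s.QueueID
        AND t1.Date BETWEEN s.FromDate AND s.ToDate
    ORDER BY t1.QueueID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Queue 2 comes back once with empty date and campaign columns. A queue whose date falls inside several ranges would still produce one row per matching range.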
