Greatest-n-per-group query including rows that don't have 'n'


I know that this is a common question but I haven't been able to find an answer to this specific problem:
I have two tables:
temp1
+----+--------+
| id | name   |
+----+--------+
| 1  | first  |
| 2  | second |
| 3  | third  |
+----+--------+
temp2
+----+----+---------------------+
| id | fk | ts                  |
+----+----+---------------------+
| 1  | 1  | 2020-06-19 23:56:46 |
| 2  | 1  | 2020-06-19 22:56:46 |
| 3  | 2  | 2020-06-19 21:56:46 |
| 4  | 2  | 2020-06-19 20:56:46 |
+----+----+---------------------+
In order to get for each entry in temp1 the corresponding entry from temp2 with the newest timestamp I am running the following query:
SELECT
t1.id AS id,
t1.name AS name,
t2.ts AS ts
FROM
(temp2 t2
JOIN temp1 t1)
WHERE
t2.ts = (SELECT MAX(t3.ts)
FROM
temp2 t3
WHERE
t2.fk = t3.fk)
AND t2.fk = t1.id
This results in:
+----+--------+---------------------+
| id | name   | ts                  |
+----+--------+---------------------+
| 1  | first  | 2020-06-19 23:56:46 |
| 2  | second | 2020-06-19 21:56:46 |
+----+--------+---------------------+
Is it possible to alter this query in order to include the rows from temp1 that do not have a corresponding value in temp2?
The desired result would be:
+----+--------+---------------------+
| id | name   | ts                  |
+----+--------+---------------------+
| 1  | first  | 2020-06-19 23:56:46 |
| 2  | second | 2020-06-19 21:56:46 |
| 3  | third  | NULL                |
+----+--------+---------------------+

Use window functions:
SELECT t1.id, t1.name, t2.ts
FROM temp1 t1 LEFT JOIN
     (SELECT t2.*, ROW_NUMBER() OVER (PARTITION BY t2.fk ORDER BY t2.ts DESC) AS seqnum
      FROM temp2 t2
     ) t2
     ON t2.fk = t1.id AND t2.seqnum = 1;
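To try the window-function query against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption on my part (the question does not name a database), and it needs SQLite 3.25 or newer for ROW_NUMBER(). Timestamps are stored as text, which sorts correctly in this format.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE temp2 (id INTEGER PRIMARY KEY, fk INTEGER, ts TEXT);
    INSERT INTO temp1 VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO temp2 VALUES
        (1, 1, '2020-06-19 23:56:46'),
        (2, 1, '2020-06-19 22:56:46'),
        (3, 2, '2020-06-19 21:56:46'),
        (4, 2, '2020-06-19 20:56:46');
""")

# Number the temp2 rows per fk, newest first, and LEFT JOIN only row 1.
# Keeping "seqnum = 1" in the ON clause (not in WHERE) preserves temp1
# rows that have no match in temp2.
query = """
    SELECT t1.id, t1.name, t2.ts
    FROM temp1 t1
    LEFT JOIN (
        SELECT t2.*,
               ROW_NUMBER() OVER (PARTITION BY t2.fk ORDER BY t2.ts DESC) AS seqnum
        FROM temp2 t2
    ) t2
      ON t2.fk = t1.id AND t2.seqnum = 1
    ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The row for 'third' comes back with None (NULL) in the ts position, because the filter on seqnum is part of the join condition rather than a WHERE clause.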

The simplest way to do this is with a correlated subquery in the SELECT list:
SELECT t1.*,
(SELECT MAX(t2.ts) FROM temp2 t2 WHERE t2.fk = t1.id) ts
FROM temp1 t1
Results:
| id | name | ts |
| --- | ------ | ------------------- |
| 1 | first | 2020-06-19 23:56:46 |
| 2 | second | 2020-06-19 21:56:46 |
| 3 | third | |
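A scalar subquery like this returns a single column, which is enough when only the timestamp is needed. As a rough check, the sketch below (Python's sqlite3 module, an assumed setup) runs it next to an equivalent LEFT JOIN with GROUP BY, which is my own addition and not part of the answer, and compares the two results.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE temp2 (id INTEGER PRIMARY KEY, fk INTEGER, ts TEXT);
    INSERT INTO temp1 VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO temp2 VALUES
        (1, 1, '2020-06-19 23:56:46'),
        (2, 1, '2020-06-19 22:56:46'),
        (3, 2, '2020-06-19 21:56:46'),
        (4, 2, '2020-06-19 20:56:46');
""")

# Correlated scalar subquery: MAX() over zero rows yields NULL,
# so unmatched temp1 rows are kept automatically.
scalar_rows = conn.execute("""
    SELECT t1.id, t1.name,
           (SELECT MAX(t2.ts) FROM temp2 t2 WHERE t2.fk = t1.id) AS ts
    FROM temp1 t1
    ORDER BY t1.id
""").fetchall()

# Alternative formulation (not from the answer): LEFT JOIN plus GROUP BY.
grouped_rows = conn.execute("""
    SELECT t1.id, t1.name, MAX(t2.ts) AS ts
    FROM temp1 t1
    LEFT JOIN temp2 t2 ON t2.fk = t1.id
    GROUP BY t1.id, t1.name
    ORDER BY t1.id
""").fetchall()

print(scalar_rows == grouped_rows)
```

If more columns from the newest temp2 row are needed, neither form extends well, and the ROW_NUMBER() approach is probably the better fit.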

Related

Join 2 tables and keep only the first closest event

I have the following current tables:
table_1
id | timestamp | origin | info
table_2
id | timestamp | origin | type
My aim is to find, for each row in table_2, the originating event in table_1. If several events match, I want to keep only the closest one.
For instance:
table 1
1 | 1000 | "o1" | "i1"
2 | 2000 | "o2" | "i2"
3 | 2010 | "o2" | "i2"
table 2
1 | 1010 | "o1" | "t1"
2 | 2100 | "o2" | "t2"
My expected result is:
table_2.id | table_2.timestamp | table_2.origin | table_2.type | table_1.info | table_1.timestamp
1 | 1010 | "o1" | "t1" | "i1" | 1000
2 | 2100 | "o2" | "t2" | "i2" | 2010
Currently I'm just using a simple join on origin and table_2.timestamp > table_1.timestamp, which gives me:
table_2.id | table_2.timestamp | table_2.origin | table_2.type | table_1.info | table_1.timestamp
1 | 1010 | "o1" | "t1" | "i1" | 1000
2 | 2100 | "o2" | "t2" | "i2" | 2000
2 | 2100 | "o2" | "t2" | "i2" | 2010
As you can see, I don't want the second row above, because I only want the closest event in table_1.
Any ideas?

A cross-database solution is to join and filter with a correlated subquery:
select
t2.*,
t1.info,
t1.timestamp t1_timestamp
from
table_2 t2
inner join table_1 t1
on t1.origin = t2.origin
and t1.timestamp = (
select max(t11.timestamp)
from table_1 t11
where t11.origin = t2.origin and t11.timestamp < t2.timestamp
)
order by t2.id
Since you are using Postgres, you can use the handy distinct on syntax; this might actually perform better:
select
distinct on(t2.id)
t2.*,
t1.info,
t1.timestamp t1_timestamp
from
table_2 t2
inner join table_1 t1
on t1.origin = t2.origin and t1.timestamp < t2.timestamp
order by t2.id, t1.timestamp desc
Both queries yield:
id | timestamp | origin | type | info | t1_timestamp
-: | --------: | :----- | :--- | :--- | -----------:
1 | 1010 | o1 | t1 | i1 | 1000
2 | 2100 | o2 | t2 | i2 | 2010
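For engines without distinct on, the same "closest earlier event" selection can be sketched with ROW_NUMBER(). The example below is an illustration only: it uses Python's sqlite3 module (SQLite 3.25 or newer) in place of Postgres, and the ranked alias and rn column are names I made up.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, timestamp INTEGER, origin TEXT, info TEXT);
    CREATE TABLE table_2 (id INTEGER, timestamp INTEGER, origin TEXT, type TEXT);
    INSERT INTO table_1 VALUES
        (1, 1000, 'o1', 'i1'),
        (2, 2000, 'o2', 'i2'),
        (3, 2010, 'o2', 'i2');
    INSERT INTO table_2 VALUES
        (1, 1010, 'o1', 't1'),
        (2, 2100, 'o2', 't2');
""")

# Rank the earlier table_1 events per table_2 row, latest first,
# then keep only the top-ranked (closest) one.
query = """
    SELECT id, timestamp, origin, type, info, t1_timestamp
    FROM (
        SELECT t2.id AS id, t2.timestamp AS timestamp,
               t2.origin AS origin, t2.type AS type,
               t1.info AS info, t1.timestamp AS t1_timestamp,
               ROW_NUMBER() OVER (PARTITION BY t2.id
                                  ORDER BY t1.timestamp DESC) AS rn
        FROM table_2 t2
        INNER JOIN table_1 t1
            ON t1.origin = t2.origin AND t1.timestamp < t2.timestamp
    ) ranked
    WHERE rn = 1
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because this uses an inner join, table_2 rows with no earlier event are dropped, just as in the two queries above.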

Hive window functions: last value of previous partition

Using Hive window functions, I would like to get the last value of the previous partition:
| name | rank | type |
| one | 1 | T1 |
| two | 2 | T2 |
| thr | 3 | T2 |
| fou | 4 | T1 |
| fiv | 5 | T2 |
| six | 6 | T2 |
| sev | 7 | T2 |
The following query:
SELECT
name,
rank,
type,
first_value(rank) over (partition by type order by rank) as new_rank
FROM my_table
Would give:
| name | rank | type | new_rank |
| one | 1 | T1 | 1 |
| two | 2 | T2 | 2 |
| thr | 3 | T2 | 2 |
| fou | 4 | T1 | 4 |
| fiv | 5 | T2 | 5 |
| six | 6 | T2 | 5 |
| sev | 7 | T2 | 5 |
But what I need is "the last value of the previous partition":
| name | rank | type | new_rank |
| one | 1 | T1 | NULL |
| two | 2 | T2 | 1 |
| thr | 3 | T2 | 1 |
| fou | 4 | T1 | 3 |
| fiv | 5 | T2 | 4 |
| six | 6 | T2 | 4 |
| sev | 7 | T2 | 4 |

This seems quite tricky. It is a variant of gaps-and-islands. Here is the idea:
Identify the "islands" where type is the same (using a difference of row numbers).
Then use lag() to bring the previous row's rank into each island.
Take the minimum of those values over each island. That minimum belongs to the island's first row, so it is the last rank of the preceding island.
So:
with gi as (
select t.*,
(seqnum - seqnum_t) as grp
from (select t.*,
row_number() over (partition by type order by rank) as seqnum_t,
row_number() over (order by rank) as seqnum
from t
) t
),
gi2 as (
select gi.*, lag(rank) over (order by gi.rank) as prev_rank
from gi
)
select gi2.*,
min(prev_rank) over (partition by type, grp) as new_rank
from gi2
order by rank;
Here is a SQL Fiddle (albeit using Postgres).
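The three steps can also be checked locally. The sketch below runs essentially the same query through Python's sqlite3 module. SQLite stands in for Hive here, which is an assumption, and the inner alias numbered is my own name; the table is called t as in the answer.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (name TEXT, rank INTEGER, type TEXT);
    INSERT INTO t VALUES
        ('one', 1, 'T1'), ('two', 2, 'T2'), ('thr', 3, 'T2'),
        ('fou', 4, 'T1'), ('fiv', 5, 'T2'), ('six', 6, 'T2'),
        ('sev', 7, 'T2');
""")

query = """
    WITH gi AS (
        -- Step 1: rows in the same island share (seqnum - seqnum_t).
        SELECT numbered.*, (seqnum - seqnum_t) AS grp
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY type ORDER BY rank) AS seqnum_t,
                     ROW_NUMBER() OVER (ORDER BY rank) AS seqnum
              FROM t) numbered
    ),
    gi2 AS (
        -- Step 2: attach the rank of the preceding row.
        SELECT gi.*, LAG(rank) OVER (ORDER BY rank) AS prev_rank
        FROM gi
    )
    -- Step 3: the smallest prev_rank in an island is the last rank
    -- of the island before it.
    SELECT name, rank, type,
           MIN(prev_rank) OVER (PARTITION BY type, grp) AS new_rank
    FROM gi2
    ORDER BY rank
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that grp on its own is not unique across types, which is why the final window partitions by both type and grp.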

Select most recent inspection

I have a ROAD_INSPECTION table:
+----+------------------------+-----------+
| ID | DATE | CONDITION |
+----+------------------------+-----------+
| 1 | 01/01/2009 | 20 |
| 1 | 05/01/2013 | 16 |
| 1 | 04/29/2016 10:02:52 AM | 15 |
+----+------------------------+-----------+
| 2 | 01/01/2009 | 8 |
| 2 | 06/06/2012 9:55:13 AM | 8 |
| 2 | 04/28/2015 | 11 |
+----+------------------------+-----------+
| 3 | 06/11/2012 | 10 |
| 3 | 04/21/2015 | 19 |
+----+------------------------+-----------+
What is the most efficient way to select the most recent inspection? The query would need to include the ID and CONDITION columns, despite the fact that they wouldn't group by cleanly:
+----+------------------------+-----------+
| ID | DATE | CONDITION |
+----+------------------------+-----------+
| 1 | 04/29/2016 10:02:52 AM | 15 |
+----+------------------------+-----------+
| 2 | 04/28/2015 | 11 |
+----+------------------------+-----------+
| 3 | 04/21/2015 | 19 |
+----+------------------------+-----------+

One way is to compute the latest date per id in a derived table and join it back to the main table to pick up the matching condition value (the queries below use table1, date1 and condition1 in place of ROAD_INSPECTION, DATE and CONDITION):
SELECT t1.id,
t1.date1,
t2.CONDITION1
FROM
(SELECT id,
max(date1) AS date1
FROM table1
GROUP BY id) t1
JOIN table1 t2 ON t1.id = t2.id
AND t1.date1 = t2.date1;
Result:
id date1 CONDITION1
-------------------------------------
1 29.04.2016 10:02:52 15
2 28.04.2015 00:00:00 11
3 21.04.2015 00:00:00 19
Or, if your RDBMS supports window functions, use the following:
SELECT id,
date1,
condition1
FROM
(SELECT id,
date1,
condition1,
row_number() over(PARTITION BY id
ORDER BY date1 DESC) AS rn
FROM table1 ) t1
WHERE rn = 1;
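Both variants can be compared side by side with a small sketch in Python's sqlite3 module. This is an assumed setup: the dates are stored as ISO-8601 text so that MAX() and ORDER BY sort them chronologically, which the MM/DD/YYYY strings shown in the question would not do.

```python
import sqlite3

# In-memory database loaded with the sample rows, dates rewritten as ISO-8601.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, date1 TEXT, condition1 INTEGER);
    INSERT INTO table1 VALUES
        (1, '2009-01-01 00:00:00', 20),
        (1, '2013-05-01 00:00:00', 16),
        (1, '2016-04-29 10:02:52', 15),
        (2, '2009-01-01 00:00:00', 8),
        (2, '2012-06-06 09:55:13', 8),
        (2, '2015-04-28 00:00:00', 11),
        (3, '2012-06-11 00:00:00', 10),
        (3, '2015-04-21 00:00:00', 19);
""")

# Variant 1: latest date per id in a derived table, joined back.
join_rows = conn.execute("""
    SELECT t1.id, t1.date1, t2.condition1
    FROM (SELECT id, MAX(date1) AS date1 FROM table1 GROUP BY id) t1
    JOIN table1 t2 ON t1.id = t2.id AND t1.date1 = t2.date1
    ORDER BY t1.id
""").fetchall()

# Variant 2: number the rows per id, newest first, and keep row 1.
window_rows = conn.execute("""
    SELECT id, date1, condition1
    FROM (SELECT id, date1, condition1,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY date1 DESC) AS rn
          FROM table1) ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(join_rows == window_rows)
```

One difference to keep in mind: if two inspections for the same id share the latest date, the join variant returns both rows, while ROW_NUMBER() keeps exactly one of them.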

combine two select query

I have two select queries:
select name1,age1 from table1
Output:
+--------+------+
| name1 | age1 |
+--------+------+
| ravi | 25 |
| rakhav | 12 |
| joil | 10 |
+--------+------+
select color,no from table2
Output:
+--------+----+
| color | no |
+--------+----+
| red | 3 |
| yellow | 4 |
+--------+----+
I want the output to look like this:
+--------+------+--------+----+
| name1 | age1 | color | no |
+--------+------+--------+----+
| ravi | 25 | red | 3 |
| rakhav | 12 | yellow | 4 |
| joil | 10 | | |
+--------+------+--------+----+

Try this:
select * from
(select name1, age1, row_number() over(order by age1 desc) as rn from table1) as t1
left join
(select color, no, row_number() over(order by no) as rn from table2) as t2
on t1.rn = t2.rn
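Here is a small sketch of that row-number pairing, run through Python's sqlite3 module (an assumed setup that needs SQLite 3.25 or newer). The column no is double-quoted because NO is a keyword in SQLite, and the explicit column list avoids returning rn twice.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name1 TEXT, age1 INTEGER);
    CREATE TABLE table2 (color TEXT, "no" INTEGER);
    INSERT INTO table1 VALUES ('ravi', 25), ('rakhav', 12), ('joil', 10);
    INSERT INTO table2 VALUES ('red', 3), ('yellow', 4);
""")

# Give each side its own row number, then pair rows with equal numbers.
# The LEFT JOIN keeps table1 rows that have no partner in table2.
query = """
    SELECT t1.name1, t1.age1, t2.color, t2."no"
    FROM (SELECT name1, age1,
                 ROW_NUMBER() OVER (ORDER BY age1 DESC) AS rn
          FROM table1) AS t1
    LEFT JOIN
         (SELECT color, "no",
                 ROW_NUMBER() OVER (ORDER BY "no") AS rn
          FROM table2) AS t2
      ON t1.rn = t2.rn
    ORDER BY t1.rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The pairing depends entirely on the two ORDER BY clauses, since the tables have no shared key. If table2 could have more rows than table1, a LEFT JOIN would drop the extras.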

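Try:
select t1.*, t2.* from table1 t1, table2 t2
Note that this is a cross join: it returns every combination of rows from the two tables (3 x 2 = 6 rows for the sample data) rather than the row-by-row pairing shown in the question.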

SQL Query to get results that match between three tables, or a single result for no match

Is there a way to use a where clause to check if there were zero matches between tables for a record from the first table, and produce one row of results reflecting that?
I'm trying to get results that look like this:
+----------+----------+-----------+----------+-------------+
| Results |
+----------+----------+-----------+----------+-------------+
| Date | Queue ID | From Date | To Date | Campaign ID |
| 3/1/2014 | 1 | 2/24/2014 | 3/2/2014 | 1 |
| 3/1/2014 | 2 | (NULL) | (NULL) | (NULL) |
+----------+----------+-----------+----------+-------------+
From a combination of tables that look like this:
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| Table 1 | | Table 2 | | Table 3 |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| Date | Queue | | Queue | SP | | SP | From Date | To Date | Campaign |
| | ID | | ID | ID | | ID | | | ID |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| 3/1/2014 | 1 | | 1 | 1 | | 1 | 2/24/2014 | 3/2/2014 | 1 |
| 3/1/2014 | 2 | | 1 | 2 | | 2 | 3/3/2014 | 3/9/2014 | 5 |
| | | | 1 | 3 | | 3 | 3/10/2014 | 3/16/2014 | 1 |
| | | | 1 | 4 | | 4 | 3/17/2014 | 3/23/2014 | 1 |
| | | | 1 | 5 | | 5 | 3/24/2014 | 3/30/2014 | 4 |
| | | | 2 | 6 | | 6 | 3/3/2014 | 3/9/2014 | 5 |
| | | | 2 | 7 | | 7 | 3/10/2014 | 3/16/2014 | 5 |
| | | | 2 | 8 | | 8 | 3/17/2014 | 3/23/2014 | 5 |
| | | | 2 | 9 | | 9 | 3/24/2014 | 3/30/2014 | 5 |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
I'm joining Table 1 to Table 2 on QUEUE ID,
and Table 2 to Table 3 on SP ID,
and DATE from Table 1 should fall between Table 3's FROM DATE and TO DATE.
I want a single record returned for each queue, including if there were no date matches.
Unfortunately any combinations of joins or where clauses I've tried so far only result in either one record for Queue ID 1 or multiple records for each Queue ID.

I would suggest this:
SELECT
t1.Date,
t1.QueueID,
s.FromDate,
s.ToDate,
s.CampaignID
FROM
Table1 t1
LEFT JOIN
(
SELECT
t2.QueueID,
t3.FromDate,
t3.ToDate,
t3.CampaignID
FROM
Table2 t2
INNER JOIN
Table3 t3 ON
t2.SPID = t3.SPID
) s ON
t1.QueueID = s.QueueID AND
t1.Date BETWEEN s.FromDate AND s.ToDate
SQL Fiddle here with an abbreviated dataset
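To see why the date test belongs in the ON clause, here is a minimal sketch in Python's sqlite3 module (an assumed setup; the question does not name a database). Dates are stored as ISO-8601 text so that BETWEEN compares them chronologically. The second query, with the filter moved to WHERE, is my own addition for contrast.

```python
import sqlite3

# In-memory database loaded with the sample rows, dates rewritten as ISO-8601.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Date TEXT, QueueID INTEGER);
    CREATE TABLE Table2 (QueueID INTEGER, SPID INTEGER);
    CREATE TABLE Table3 (SPID INTEGER, FromDate TEXT, ToDate TEXT, CampaignID INTEGER);
    INSERT INTO Table1 VALUES ('2014-03-01', 1), ('2014-03-01', 2);
    INSERT INTO Table2 VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
                              (2, 6), (2, 7), (2, 8), (2, 9);
    INSERT INTO Table3 VALUES
        (1, '2014-02-24', '2014-03-02', 1), (2, '2014-03-03', '2014-03-09', 5),
        (3, '2014-03-10', '2014-03-16', 1), (4, '2014-03-17', '2014-03-23', 1),
        (5, '2014-03-24', '2014-03-30', 4), (6, '2014-03-03', '2014-03-09', 5),
        (7, '2014-03-10', '2014-03-16', 5), (8, '2014-03-17', '2014-03-23', 5),
        (9, '2014-03-24', '2014-03-30', 5);
""")

derived = """
    (SELECT t2.QueueID, t3.FromDate, t3.ToDate, t3.CampaignID
     FROM Table2 t2
     INNER JOIN Table3 t3 ON t2.SPID = t3.SPID) s
"""
columns = "t1.Date, t1.QueueID, s.FromDate, s.ToDate, s.CampaignID"

# Date test inside ON: queues with no matching range are kept with NULLs.
on_rows = conn.execute(f"""
    SELECT {columns}
    FROM Table1 t1
    LEFT JOIN {derived}
      ON t1.QueueID = s.QueueID
     AND t1.Date BETWEEN s.FromDate AND s.ToDate
    ORDER BY t1.QueueID
""").fetchall()

# Date test in WHERE (for contrast): the NULL rows fail it and disappear.
where_rows = conn.execute(f"""
    SELECT {columns}
    FROM Table1 t1
    LEFT JOIN {derived}
      ON t1.QueueID = s.QueueID
    WHERE t1.Date BETWEEN s.FromDate AND s.ToDate
    ORDER BY t1.QueueID
""").fetchall()

print(on_rows)
print(where_rows)
```

With the sample data the first query returns one row per queue, while the second returns only the row for queue 1, because the WHERE filter turns the outer join back into an inner join.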

A trivial amendment to AHiggins's code. Using a CTE perhaps makes it a little easier to read.
With AllDates as
(
SELECT
t2.QueueID,
t3.FromDate,
t3.ToDate,
t3.CampaignID
FROM Table2 t2
INNER JOIN Table3 t3 ON
t2.SPID = t3.SPID
)
SELECT
t1.Date,
t1.QueueID,
s.FromDate,
s.ToDate,
s.CampaignID
FROM Table1 t1
LEFT JOIN AllDates s ON
t1.QueueID = s.QueueID AND
t1.Date BETWEEN s.FromDate AND s.ToDate

You want something like:
select distinct t1.date, t1.queue_id, IFNULL(t3.from_date,'NULL'),
IFNULL(t3.to_date,'NULL'), IFNULL(t3.campaign,'NULL')
FROM table1 t1
LEFT OUTER JOIN (table2 t2
INNER JOIN table3 t3 on t2.sp_id = t3.sp_id)
on t1.queue_id = t2.queue_id
AND t3.from_date <= t1.date
AND t3.to_date >= t1.date
This selects distinct records. The date conditions belong in the join condition rather than in a WHERE clause, because a WHERE filter on t3 would discard the queues that have no matching date range. IFNULL replaces the missing values with the string 'NULL'.

SELECT t1.[Date], t1.[Queue ID], s.[From Date], s.[To Date], s.[Campaign ID]
FROM table1 t1
LEFT JOIN (SELECT t3.*, t2.[Queue ID] FROM table3 t3 JOIN table2 t2 ON t2.[SP ID] = t3.[SP ID]) s
ON s.[Queue ID] = t1.[Queue ID] AND t1.[Date] BETWEEN s.[From Date] AND s.[To Date]
