How to "correlate" columns in a GROUP BY query? (SQL)

Say the input is:
Table T1
row_num_unimportant indicator
1 111
2 222
Table T2
row_num_unimportant indicator val_timestamp val_of_interest2
1 112 timestamp2 value1
2 113 timestamp1 value3
3 114 timestamp3 value2
4 223 timestamp4 value5
5 224 timestamp5 value4
I'd like the JOIN to return:
indicator min_timestamp val_of_interest2
111 timestamp1 value3
222 timestamp4 value5
The difficulty is getting val_of_interest2 to come from the same row as min_timestamp.
For example, in a naive JOIN:
SELECT
indicator,
MIN(val_timestamp) AS min_timestamp,
???? AS val_of_interest2
FROM (
SELECT
t1.indicator,
t2.val_timestamp,
t2.val_of_interest2
FROM
T1 t1
JOIN T2 t2
ON (t2.indicator >= t1.indicator)
) AS joined
GROUP BY
indicator
Basically, what do I put in the ???? part? (Or do I need a different query altogether?)
Thanks!

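You would not use GROUP BY for this, because an aggregate such as MIN() returns only the minimum value, not the row it came from. One option is a window function that numbers the rows per indicator and keeps the first one: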
SELECT indicator, val_timestamp, val_of_interest2
FROM (SELECT t1.indicator, t2.val_timestamp, t2.val_of_interest2,
ROW_NUMBER() OVER (PARTITION BY t1.indicator ORDER BY t2.val_timestamp) as seqnum
FROM T1 t1 JOIN
T2 t2
ON t2.indicator >= t1.indicator
) t
WHERE seqnum = 1;
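The window-function answer can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the sample rows are typed in by hand, the placeholder timestamps are stored as text (they sort correctly only because the digits are single characters), and SQLite 3.25 or later is needed for ROW_NUMBER().

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (indicator INTEGER);
CREATE TABLE T2 (indicator INTEGER, val_timestamp TEXT, val_of_interest2 TEXT);
INSERT INTO T1 VALUES (111), (222);
INSERT INTO T2 VALUES
  (112, 'timestamp2', 'value1'),
  (113, 'timestamp1', 'value3'),
  (114, 'timestamp3', 'value2'),
  (223, 'timestamp4', 'value5'),
  (224, 'timestamp5', 'value4');
""")

# ROW_NUMBER() numbers each indicator's joined rows by timestamp.
# Keeping seqnum = 1 keeps the whole earliest row, so val_of_interest2
# always comes from the same row as the minimum timestamp.
QUERY = """
SELECT indicator, val_timestamp AS min_timestamp, val_of_interest2
FROM (SELECT t1.indicator, t2.val_timestamp, t2.val_of_interest2,
             ROW_NUMBER() OVER (PARTITION BY t1.indicator
                                ORDER BY t2.val_timestamp) AS seqnum
      FROM T1 t1
      JOIN T2 t2 ON t2.indicator >= t1.indicator) t
WHERE seqnum = 1
ORDER BY indicator
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the filter keeps an entire row rather than aggregating columns independently, there is no "????" column to fill in.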

Related

sql: max value by 2 columns in another table

I have two tables, and for every id and date_1 in the first table I need to find the maximum value in the date_2 column (for the same id) that is lower than the date_1 value.
Tables:
table 1
id | date_1
1 | 01.01.2020
1 | 11.01.2020
2 | 02.11.2020
2 | 02.12.2020
3 | 12.12.2020
3 | 31.01.2021
table 2
id | date_2
1 | 30.12.2019
1 | 05.01.2020
2 | 01.11.2020
2 | 30.10.2020
3 | 10.11.2020
3 | 31.12.2020
Outcome needed:
id | date_1 | max(date_2) within id,date_1
1 | 01.01.2020 | 30.12.2019
1 | 11.01.2020 | 05.01.2020
2 | 02.11.2020 | 01.11.2020
2 | 02.12.2020 | 01.11.2020
3 | 12.12.2020 | 10.11.2020
3 | 31.01.2021 | 31.12.2020
I'd appreciate your help with this!
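You could rank each row (here with the row_number() function) and then match on the id and the rank. Note that this pairs rows by position rather than by comparing dates, so it only gives the right answer when the nth date_2 happens to be the match for the nth date_1; with the sample data it pairs 02.11.2020 with 30.10.2020 instead of 01.11.2020.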
with t1 as (select id, date_1,
row_number() over (partition by id order by date_1) as rn
from table1),
t2 as (select id, date_2,
row_number() over (partition by id order by date_2) as rn
from table2)
select t1.id, t1.date_1, t2.date_2
from t1 inner join t2 on t1.id = t2.id and t1.rn = t2.rn
You can write a simple correlated subquery that mirrors the English description almost word for word:
select id, date_1, (
select Max(date_2) /* find max value in the date_2 column */
from t2
where t2.id = t1.id /* for every id in the first table */
and t2.date_2 < t1.date_1 /* lower than a value in the date_1 column */
) as "max(date_2) within id,date_1"
from t1;
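A minimal check of the correlated subquery, run through Python's sqlite3 module. The dates are rewritten as ISO 'YYYY-MM-DD' text; that is an assumption made because SQLite has no DATE type and 'DD.MM.YYYY' strings would not compare in date order. Otherwise the rows are the question's sample data.

```python
import sqlite3

# Sample data from the question, with dates as ISO text (assumption)
# so that plain string comparison follows date order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER, date_1 TEXT);
CREATE TABLE t2 (id INTEGER, date_2 TEXT);
INSERT INTO t1 VALUES
  (1, '2020-01-01'), (1, '2020-01-11'),
  (2, '2020-11-02'), (2, '2020-12-02'),
  (3, '2020-12-12'), (3, '2021-01-31');
INSERT INTO t2 VALUES
  (1, '2019-12-30'), (1, '2020-01-05'),
  (2, '2020-11-01'), (2, '2020-10-30'),
  (3, '2020-11-10'), (3, '2020-12-31');
""")

# For each (id, date_1) row, the scalar subquery returns the largest
# date_2 for the same id that is strictly lower than date_1.
QUERY = """
SELECT t1.id, t1.date_1,
       (SELECT MAX(t2.date_2)
        FROM t2
        WHERE t2.id = t1.id
          AND t2.date_2 < t1.date_1) AS max_date_2
FROM t1
ORDER BY t1.id, t1.date_1
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The result matches the "outcome needed" table row for row, including id 2, where both date_1 values map to 01.11.2020.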

Get rows based on the MAX value of one of the columns in Db2 SQL

I want to get a row based on the MAX value of one of its columns in Db2 SQL.
TABLE_1
ID ORG DEST AccountNumber Amount Status
----------------------------------------------------
11 1224 6778 32345678 458.00 Accepted
12 1225 6779 12345678 958.00 Rejected
4 1226 6780 22345678 478.00 Rejected
6 1227 6781 21345678 408.00 Accepted
TABLE_2
ID NAME VERSION
---------------------------
1224 BankA 1
1224 BankA1 2
1225 BankB 1
1226 BankC 1
1227 BankD 1
1227 BankD1 2
6778 TestBankA 1
6778 TestBankA1 2
6778 TestBankA1 3
6779 TestBankB 1
6779 TestBankB1 2
6779 TestBankB2 3
6779 TestBankB3 4
6780 TestBankC 1
6781 TestBankD 1
Expected Output
ID AccountNumber Amount Status Origin Destination
----------------------------------------------------------
11 32345678 458.00 Accepted BankA1 TestBankA1
12 12345678 958.00 Rejected BankB TestBankB3
4 22345678 478.00 Rejected BankC TestBankC
6 21345678 408.00 Accepted BankD1 TestBankD
The query below does not show the bank name for the latest version.
SELECT *
FROM TABLE_1 AS T1
INNER JOIN (SELECT ID, MAX(VERSION) FROM TABLE_2 GROUP BY ID) AS T2
ON T2.ID = T1.ORG
INNER JOIN (SELECT ID, MAX(VERSION) FROM TABLE_2 GROUP BY ID) AS T3
ON T3.ID = T1.DEST
WHERE Status <> 'Failed'
The ROW_NUMBER analytic function provides one option here:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VERSION DESC) rn
FROM TABLE_2 t
)
SELECT
t1.ID,
t1.AccountNumber,
t1.Amount,
t1.Status,
t2org.NAME AS Origin,
t2dest.NAME AS Destination
FROM TABLE_1 t1
LEFT JOIN cte t2org
ON t2org.ID = t1.ORG AND t2org.rn = 1
LEFT JOIN cte t2dest
ON t2dest.ID = t1.DEST AND t2dest.rn = 1;
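A sketch that runs the ROW_NUMBER() approach against the sample data in SQLite (through Python's sqlite3 module, SQLite 3.25+ assumed) rather than Db2. It qualifies the star as t.* in the CTE and selects t1.ID so the result can be compared with the expected output; storing Amount as text is an assumption that keeps the "458.00" formatting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE_1 (ID INTEGER, ORG INTEGER, DEST INTEGER,
                      AccountNumber INTEGER, Amount TEXT, Status TEXT);
CREATE TABLE TABLE_2 (ID INTEGER, NAME TEXT, VERSION INTEGER);
INSERT INTO TABLE_1 VALUES
  (11, 1224, 6778, 32345678, '458.00', 'Accepted'),
  (12, 1225, 6779, 12345678, '958.00', 'Rejected'),
  (4,  1226, 6780, 22345678, '478.00', 'Rejected'),
  (6,  1227, 6781, 21345678, '408.00', 'Accepted');
INSERT INTO TABLE_2 VALUES
  (1224, 'BankA', 1), (1224, 'BankA1', 2),
  (1225, 'BankB', 1), (1226, 'BankC', 1),
  (1227, 'BankD', 1), (1227, 'BankD1', 2),
  (6778, 'TestBankA', 1), (6778, 'TestBankA1', 2), (6778, 'TestBankA1', 3),
  (6779, 'TestBankB', 1), (6779, 'TestBankB1', 2),
  (6779, 'TestBankB2', 3), (6779, 'TestBankB3', 4),
  (6780, 'TestBankC', 1), (6781, 'TestBankD', 1);
""")

# rn = 1 marks the highest VERSION per bank ID; the CTE is joined twice,
# once for the origin bank and once for the destination bank.
QUERY = """
WITH cte AS (
  SELECT t.*, ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.VERSION DESC) AS rn
  FROM TABLE_2 t
)
SELECT t1.ID, t1.AccountNumber, t1.Amount, t1.Status,
       t2org.NAME AS Origin, t2dest.NAME AS Destination
FROM TABLE_1 t1
LEFT JOIN cte t2org ON t2org.ID = t1.ORG AND t2org.rn = 1
LEFT JOIN cte t2dest ON t2dest.ID = t1.DEST AND t2dest.rn = 1
ORDER BY t1.ID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The LEFT JOINs mean a transaction whose bank ID is missing from TABLE_2 would still appear, with a NULL name.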
Tim's option of using a CTE and the ROW_NUMBER() OLAP function is a good approach.
Since you only want a single column (NAME) from TABLE_2, you could also retrieve it from a correlated subquery, although it might not perform as well if there are lots of qualifying rows in TABLE_1.
SELECT t1.ID, t1.AccountNumber, t1.Amount, t1.Status,
(SELECT t2r.NAME FROM TABLE_2 AS t2r
WHERE t2r.ID = t1.ORG
ORDER BY t2r.VERSION DESC FETCH FIRST ROW ONLY
) AS Origin,
(SELECT t2d.NAME FROM TABLE_2 AS t2d
WHERE t2d.ID = t1.DEST
ORDER BY t2d.VERSION DESC FETCH FIRST ROW ONLY
) AS Destination
FROM TABLE_1 AS t1
WHERE t1.Status <> 'Failed';
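The correlated-subquery version can be checked the same way. SQLite has no FETCH FIRST ROW ONLY, so this sketch substitutes ORDER BY ... LIMIT 1, which SQLite does support; everything else follows the Db2 query above, with the same hand-typed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE_1 (ID INTEGER, ORG INTEGER, DEST INTEGER,
                      AccountNumber INTEGER, Amount TEXT, Status TEXT);
CREATE TABLE TABLE_2 (ID INTEGER, NAME TEXT, VERSION INTEGER);
INSERT INTO TABLE_1 VALUES
  (11, 1224, 6778, 32345678, '458.00', 'Accepted'),
  (12, 1225, 6779, 12345678, '958.00', 'Rejected'),
  (4,  1226, 6780, 22345678, '478.00', 'Rejected'),
  (6,  1227, 6781, 21345678, '408.00', 'Accepted');
INSERT INTO TABLE_2 VALUES
  (1224, 'BankA', 1), (1224, 'BankA1', 2),
  (1225, 'BankB', 1), (1226, 'BankC', 1),
  (1227, 'BankD', 1), (1227, 'BankD1', 2),
  (6778, 'TestBankA', 1), (6778, 'TestBankA1', 2), (6778, 'TestBankA1', 3),
  (6779, 'TestBankB', 1), (6779, 'TestBankB1', 2),
  (6779, 'TestBankB2', 3), (6779, 'TestBankB3', 4),
  (6780, 'TestBankC', 1), (6781, 'TestBankD', 1);
""")

# Each scalar subquery sorts one bank's versions newest first and keeps
# only the first NAME (LIMIT 1 stands in for Db2's FETCH FIRST ROW ONLY).
QUERY = """
SELECT t1.ID, t1.AccountNumber, t1.Amount, t1.Status,
       (SELECT t2r.NAME FROM TABLE_2 AS t2r
        WHERE t2r.ID = t1.ORG
        ORDER BY t2r.VERSION DESC LIMIT 1) AS Origin,
       (SELECT t2d.NAME FROM TABLE_2 AS t2d
        WHERE t2d.ID = t1.DEST
        ORDER BY t2d.VERSION DESC LIMIT 1) AS Destination
FROM TABLE_1 AS t1
WHERE t1.Status <> 'Failed'
ORDER BY t1.ID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```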

Join with a second table containing multiple records, take the latest

I have two tables:
person_id | name
1 name1
2 name2
3 name3
and a second table:
person_id | date | balance
1 2016-03 1200 ---- \
1 2016-04 700 ---- > same person
1 2016-05 400 ---- /
3 2016-05 4000
Given that person_id 1 has three records in the second table, how can I join the first table using only the latest record (that is, balance 400, corresponding to date 2016-05)?
E.g.: query output:
person_id | name | balance
1 name1 400
2 name2 ---
3 name3 4000
If possible, I'd prefer a simple solution over a complex one.
A query that works on all DB engines (with LEFT JOINs, so that people without any balance rows, such as person 2, are kept) is:
select t1.person_id, t1.name, t2.balance
from table1 t1
left join
(
select person_id, max(date) as mdate
from table2
group by person_id
) t3 on t3.person_id = t1.person_id
left join table2 t2 on t2.person_id = t3.person_id and t2.date = t3.mdate
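The "latest date per person, then join back" approach can be checked with Python's sqlite3 module. This sketch uses LEFT JOINs for both joins so that person 2, who has no rows in the second table, still appears with a NULL balance; the column name date is double-quoted defensively, and the sample rows are typed in from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (person_id INTEGER, name TEXT);
CREATE TABLE table2 (person_id INTEGER, "date" TEXT, balance INTEGER);
INSERT INTO table1 VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
INSERT INTO table2 VALUES
  (1, '2016-03', 1200), (1, '2016-04', 700), (1, '2016-05', 400),
  (3, '2016-05', 4000);
""")

# t3 holds each person's latest date; joining table2 back on
# (person_id, date) fetches the balance recorded on that date.
QUERY = """
SELECT t1.person_id, t1.name, t2.balance
FROM table1 t1
LEFT JOIN (SELECT person_id, MAX("date") AS mdate
           FROM table2
           GROUP BY person_id) t3
       ON t3.person_id = t1.person_id
LEFT JOIN table2 t2
       ON t2.person_id = t3.person_id AND t2."date" = t3.mdate
ORDER BY t1.person_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If a person had two rows with the same latest date, this approach would return both; the window-function version below picks exactly one.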
The best way to do this in any database that supports the ANSI standard window functions (which is most of them) is:
select t1.*, t2.balance
from table1 t1 left join
(select t2.*,
row_number() over (partition by person_id order by date desc) as seqnum
from table2 t2
) t2
on t1.person_id = t2.person_id and seqnum = 1;
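The ROW_NUMBER() answer can be run the same way in SQLite (3.25+ assumed for window functions). Putting seqnum = 1 in the ON clause rather than in a WHERE clause is what keeps person 2 in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (person_id INTEGER, name TEXT);
CREATE TABLE table2 (person_id INTEGER, "date" TEXT, balance INTEGER);
INSERT INTO table1 VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
INSERT INTO table2 VALUES
  (1, '2016-03', 1200), (1, '2016-04', 700), (1, '2016-05', 400),
  (3, '2016-05', 4000);
""")

# seqnum = 1 is each person's most recent row; the condition sits in the
# ON clause so unmatched people survive the LEFT JOIN with a NULL balance.
QUERY = """
SELECT t1.*, t2.balance
FROM table1 t1
LEFT JOIN (SELECT t2.*,
                  ROW_NUMBER() OVER (PARTITION BY person_id
                                     ORDER BY "date" DESC) AS seqnum
           FROM table2 t2) t2
  ON t1.person_id = t2.person_id AND t2.seqnum = 1
ORDER BY t1.person_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```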

Consolidate, Combine, Merge Rows

Every search I do leads me to results for people using array_agg to combine values from multiple rows into a single column. That's not what I am trying to figure out here, and maybe I am not using the right search terms (e.g., consolidate, combine, merge).
I am trying to combine rows by filling in values in fields. I am not sure how best to describe this other than with an example:
Current:
--------------------------------
id num_1 num_2 num_3 num_4
--------------------------------
1 111 222 0 0
2 111 333 0 0
3 111 0 0 444
4 0 222 555 0
5 777 999 0 0
6 0 999 888 0
After Processing:
--------------------------------
id num_1 num_2 num_3 num_4
--------------------------------
1 111 222 555 444
2 111 333 555 444
3 111 333 555 444
4 111 222 555 444
5 777 999 888 0
6 777 999 888 0
After Deleting Duplicate Rows:
--------------------------------
id num_1 num_2 num_3 num_4
--------------------------------
1 111 222 555 444
2 111 333 555 444
3 777 999 888 0
This will likely be a two-step process: first fill in the blanks, then find and delete the duplicates. I can do the second step, but I'm having trouble figuring out how to populate the 0 values with values from another row when one column can have two different values (ids 1 and 2 in the num_2 column) but another has only one (e.g., 111 in num_1).
I can do it in PHP, but would like to figure out how to do it using only Postgres.
EDIT: My example table is a relations table. I have multiple datasets with similar information (e.g., username) but different registration ID numbers. So I do an inner join on table 1 and table 2 (for example) where the username is the same. Then I take the registration IDs (which are different) from each table and insert them as a row into my relations table. In my example tables above, row 1 has two different registration IDs from the two tables I joined: the values 111 (num_1) and 222 (num_2) are inserted into the table, and zeros are inserted for num_3 and num_4. Then I compare table 1 and table 4, and the values 111 (num_1) and 444 (num_4) get inserted into the relations table, with zeros for num_2 and num_3. Since registration ID 111 is related to registration ID 222, and registration ID 111 is related to registration ID 444, registration IDs 111, 222, and 444 are all related (meaning the username is the same for each of those registration IDs). Does that help to clarify?
EDIT 2: I corrected Tables 2 and 3. Hopefully now it makes sense. The username column is not unique. So, I have 4 tables like this:
Table 1:
bob - 111
mary - 777
Table 2:
bob - 222
bob - 333
mary - 999
Table 3:
bob - 555
mary - 888
Table 4:
bob - 444 -- mary does not exist in this table
So, in my relations table I should end up with 3 rows, as in the "After Deleting Duplicate Rows" example above.
It seems like you started in the middle of a presumed solution, forgetting to present the initial problem. Based on your added information I suggest a completely different, much simpler solution. You have:
CREATE TABLE table1 (username text, registration_id int);
CREATE TABLE table2 (LIKE table1);
CREATE TABLE table3 (LIKE table1);
CREATE TABLE table4 (LIKE table1);
INSERT INTO table1 VALUES ('bob', 111), ('mary', 777);
INSERT INTO table2 VALUES ('bob', 222), ('bob', 333), ('mary', 999);
INSERT INTO table3 VALUES ('bob', 555), ('mary', 888);
INSERT INTO table4 VALUES ('bob', 444); -- no mary
Solution
What you really seem to need is FULL [OUTER] JOIN. Details in the manual on FROM and JOIN.
-- CREATE TABLE relations AS
SELECT username
, t1.registration_id AS reg1
, t2.registration_id AS reg2
, t3.registration_id AS reg3
, t4.registration_id AS reg4
FROM table1 t1
FULL JOIN table2 t2 USING (username)
FULL JOIN table3 t3 USING (username)
FULL JOIN table4 t4 USING (username)
ORDER BY username;
That's all. Produces your desired result directly.
username reg1 reg2 reg3 reg4
---------------------------------
bob 111 222 555 444
bob 111 333 555 444
mary 777 999 888 (null)
Your given example would work with LEFT JOIN as well, since all missing entries are to the right. But that would fail in other configurations (for example, a username that is missing from table1 but present in the others). I added some more revealing test cases in the SQL Fiddle.
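The remark that LEFT JOIN also works for this particular example can be checked with a small sketch in SQLite through Python's sqlite3 module. Assumptions: the tables are declared explicitly because CREATE TABLE ... (LIKE ...) is Postgres syntax, each join is written with ON against t1 instead of USING (equivalent here, since every row keeps t1's username), and LEFT JOIN is used because FULL JOIN only arrived in SQLite 3.39.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (username TEXT, registration_id INTEGER);
CREATE TABLE table2 (username TEXT, registration_id INTEGER);
CREATE TABLE table3 (username TEXT, registration_id INTEGER);
CREATE TABLE table4 (username TEXT, registration_id INTEGER);
INSERT INTO table1 VALUES ('bob', 111), ('mary', 777);
INSERT INTO table2 VALUES ('bob', 222), ('bob', 333), ('mary', 999);
INSERT INTO table3 VALUES ('bob', 555), ('mary', 888);
INSERT INTO table4 VALUES ('bob', 444);  -- no mary
""")

# Every table is matched to table1 by username; a user missing from a
# table gets NULL in that column, and bob's two table2 rows give two rows.
QUERY = """
SELECT t1.username,
       t1.registration_id AS reg1,
       t2.registration_id AS reg2,
       t3.registration_id AS reg3,
       t4.registration_id AS reg4
FROM table1 t1
LEFT JOIN table2 t2 ON t2.username = t1.username
LEFT JOIN table3 t3 ON t3.username = t1.username
LEFT JOIN table4 t4 ON t4.username = t1.username
ORDER BY t1.username, t2.registration_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This only holds because every username appears in table1; the FULL JOIN form does not need that assumption.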
I assume you are aware that multiple entries for the same username in several tables multiply the number of output rows:
Two SQL LEFT JOINS produce incorrect result
If your values are always increasing (as in the example), then just use a cumulative maximum and then keep the distinct combinations:
select row_number() over (order by min(id)) as id,
t.num1, t.num2, t.num3, t.num4
from (select id,
max(num1) over (order by id) as num1,
max(num2) over (order by id) as num2,
max(num3) over (order by id) as num3,
max(num4) over (order by id) as num4
from t
) t
group by t.num1, t.num2, t.num3, t.num4;
If max() doesn't work, then what you really want is lag(... ignore nulls), which Postgres does not support at the time of writing. Perhaps the simplest method then is a correlated subquery for each column:
select row_number() over (order by min(id)) as id,
t.num1, t.num2, t.num3, t.num4
from (select id,
(select t2.num1 from t t2 where t2.id <= t.id and t2.num1 <> 0 order by t2.id desc limit 1
) as num1,
(select t2.num2 from t t2 where t2.id <= t.id and t2.num2 <> 0 order by t2.id desc limit 1
) as num2,
(select t2.num3 from t t2 where t2.id <= t.id and t2.num3 <> 0 order by t2.id desc limit 1
) as num3,
(select t2.num4 from t t2 where t2.id <= t.id and t2.num4 <> 0 order by t2.id desc limit 1
) as num4
from t
) t
group by t.num1, t.num2, t.num3, t.num4;
This version would not be very efficient even on medium-sized tables.
A more efficient version is more complicated:
select row_number() over (order by t.id) as id,
t1.num1, t2.num2, t3.num3, t4.num4
from (select min(id) as id, num1_id, num2_id, num3_id, num4_id
from (select id,
max(case when num1 > 0 then id end) over (order by id) as num1_id,
max(case when num2 > 0 then id end) over (order by id) as num2_id,
max(case when num3 > 0 then id end) over (order by id) as num3_id,
max(case when num4 > 0 then id end) over (order by id) as num4_id
from t
) t
group by num1_id, num2_id, num3_id, num4_id
) t left join
t t1
on t1.id = t.num1_id left join
t t2
on t2.id = t.num2_id left join
t t3
on t3.id = t.num3_id left join
t t4
on t4.id = t.num4_id;
EDIT:
That was a little silly. There is an easier way using first_value() (which Postgres unfortunately does not support as an aggregation function):
select row_number() over (order by min(id)) as id,
num1, num2, num3, num4
from (select id,
first_value(num1) over (order by (case when num1 is not null then id end) nulls last
) as num1,
first_value(num2) over (order by (case when num2 is not null then id end) nulls last
) as num2,
first_value(num3) over (order by (case when num3 is not null then id end) nulls last
) as num3,
first_value(num4) over (order by (case when num4 is not null then id end) nulls last
) as num4
from t
) t
group by num1, num2, num3, num4;

SQL Nearest Past Date

Hi, I have an issue handling some data in SQL: I need to return values by the nearest date. I have two tables:
Table 1
ID Content Date
--------------------------------------------
123 X 2013-11-18
123 ZE 2013-11-29
233 YX 2013-12-30
233 XX 2013-12-28
444 Z 2014-02-24
Table 2
ID Value Validation Date
--------------------------------------------
123 0.54 2013-11-11
123 0.42 2013-11-18
123 0.32 2013-11-27
233 1.2 2013-12-4
233 1.1 2013-12-28
233 1.0 2013-12-29
444 4 2014-02-11
444 3 2014-02-15
444 2 2014-02-23
The output I want is something like:
ID Content Date Value Validation Date
------------------------------------------------------------------------
123 X 2013-11-18 0.42 2013-11-18
123 ZE 2013-11-29 0.32 2013-11-27
233 YX 2013-12-30 1.0 2013-12-29
233 XX 2013-12-28 1.1 2013-12-28
444 Z 2014-02-24 2 2014-02-23
So I would like to return the value whose validation date is nearest to the date, where the validation date must never be later than the date (it can be equal, as in the first and fourth rows). Can you please help me? The ID is not unique in either table.
You can use the following query:
SELECT ID, Content, [Date], Value, [Validation Date]
FROM (
SELECT t1.ID, Content, [Date], Value, [Validation Date],
ROW_NUMBER() OVER (PARTITION BY t1.ID, Content
ORDER BY DATEDIFF(d, [Validation Date], [Date])) AS rn
FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.ID = t2.ID AND [Validation Date] <= [Date]
) t
WHERE t.rn = 1
ROW_NUMBER() picks the record with the smallest [Date] - [Validation Date] difference for each (ID, Content) pair.
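A sketch of the ROW_NUMBER() answer in SQLite, run through Python's sqlite3 module. Assumptions: the columns are renamed to id, content, date, value and validation_date (no spaces), dates are ISO text (so 2013-12-4 becomes 2013-12-04), julianday() differences stand in for SQL Server's DATEDIFF, and SQLite 3.25+ is available for window functions. Values are bound as Python floats so the comparison in the check is exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (id INTEGER, content TEXT, "date" TEXT)')
conn.execute("CREATE TABLE table2 (id INTEGER, value REAL, validation_date TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (123, "X", "2013-11-18"), (123, "ZE", "2013-11-29"),
    (233, "YX", "2013-12-30"), (233, "XX", "2013-12-28"),
    (444, "Z", "2014-02-24"),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (123, 0.54, "2013-11-11"), (123, 0.42, "2013-11-18"), (123, 0.32, "2013-11-27"),
    (233, 1.2, "2013-12-04"), (233, 1.1, "2013-12-28"), (233, 1.0, "2013-12-29"),
    (444, 4.0, "2014-02-11"), (444, 3.0, "2014-02-15"), (444, 2.0, "2014-02-23"),
])

# The join keeps only validation dates on or before each row's date;
# ROW_NUMBER() then ranks them by distance in days, closest first.
QUERY = """
SELECT id, content, "date", value, validation_date
FROM (SELECT t1.id, t1.content, t1."date", t2.value, t2.validation_date,
             ROW_NUMBER() OVER (
               PARTITION BY t1.id, t1.content
               ORDER BY julianday(t1."date") - julianday(t2.validation_date)
             ) AS rn
      FROM table1 t1
      JOIN table2 t2
        ON t1.id = t2.id AND t2.validation_date <= t1."date") t
WHERE rn = 1
ORDER BY id, "date"
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```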
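Try this (it pairs rows by their rank within each ID rather than by comparing dates, so it only works when the nth most recent Date lines up with the nth most recent Validation Date, as it happens to in the sample):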
SELECT a.id,
a.content,
a.date,
b.value,
b.validationdate
FROM (select tt.id,
tt.content,
tt.date,
row_number() over(partition by tt.id order by tt.date desc) rn
from table1 tt) a
JOIN (select t.id,
t.value,
t.validationdate,
row_number() over(partition by t.id order by t.validationdate desc) rn
from table2 t) b
on a.id=b.id and a.rn=b.rn
I think the only way to do this is with a correlated subquery, something like this:
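SELECT a.id, a.content, a.date,
-- a scalar subquery can return only one column, so value and validate each get their own
(SELECT TOP 1 b.value FROM table2 b WHERE b.id = a.id AND b.validate <= a.date
ORDER BY b.validate DESC) AS value,
(SELECT TOP 1 b.validate FROM table2 b WHERE b.id = a.id AND b.validate <= a.date
ORDER BY b.validate DESC) AS validate
FROM table1 a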
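The correlated-subquery idea can be sketched the same way in SQLite through Python's sqlite3 module. SQLite has no TOP or OUTER APPLY, so this assumed translation uses ORDER BY ... LIMIT 1 inside each scalar subquery, filters on validation_date <= date so only past (or same-day) validations qualify, and uses one subquery per returned column. Column names and ISO dates are the same assumptions as in the ROW_NUMBER() sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (id INTEGER, content TEXT, "date" TEXT)')
conn.execute("CREATE TABLE table2 (id INTEGER, value REAL, validation_date TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (123, "X", "2013-11-18"), (123, "ZE", "2013-11-29"),
    (233, "YX", "2013-12-30"), (233, "XX", "2013-12-28"),
    (444, "Z", "2014-02-24"),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (123, 0.54, "2013-11-11"), (123, 0.42, "2013-11-18"), (123, 0.32, "2013-11-27"),
    (233, 1.2, "2013-12-04"), (233, 1.1, "2013-12-28"), (233, 1.0, "2013-12-29"),
    (444, 4.0, "2014-02-11"), (444, 3.0, "2014-02-15"), (444, 2.0, "2014-02-23"),
])

# Each subquery looks at the same id, discards validations after the
# row's date, and keeps the most recent remaining one (LIMIT 1 ~ TOP 1).
QUERY = """
SELECT a.id, a.content, a."date",
       (SELECT b.value FROM table2 b
        WHERE b.id = a.id AND b.validation_date <= a."date"
        ORDER BY b.validation_date DESC LIMIT 1) AS value,
       (SELECT b.validation_date FROM table2 b
        WHERE b.id = a.id AND b.validation_date <= a."date"
        ORDER BY b.validation_date DESC LIMIT 1) AS validation_date
FROM table1 a
ORDER BY a.id, a."date"
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Unlike the inner join in the ROW_NUMBER() version, a table1 row with no earlier validation would still appear here, with NULLs.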
I think the best approach is to use outer apply:
select t1.id, t1.content, t1.date, t2.value, t2.validdate
from table1 t1 outer apply
(SELECT TOP 1 t2.value, t2.validdate
FROM table2 t2
WHERE t2.id = t1.id AND t2.validdate <= t1.date
ORDER BY t2.validdate DESC
) t2;