PostgreSQL fill in the blanks in an outer join

Outer Join 'fill-in-the blanks'
I have a pair of master-detail tables in a PostgreSQL database. The master table 'samples' holds samples, each with a timestamp.
The detail table 'sample_values' holds values for some parameters at some of the sample timestamps.
My query:
SELECT s.sample_id, s.sample_time, v.parameter_id, v.sample_value
FROM samples s LEFT OUTER JOIN sample_values v ON v.sample_id=s.sample_id
ORDER BY s.sample_id, v.parameter_id;
returns (as expected):
sample_id  sample_time               parameter_id  sample_value
1          2023-01-13T01:00:00.000Z  1             1.23
1          2023-01-13T01:00:00.000Z  2             4.98
2          2023-01-13T01:01:00.000Z
3          2023-01-13T01:02:00.000Z
4          2023-01-13T01:03:00.000Z
5          2023-01-13T01:04:00.000Z  2             6.08
6          2023-01-13T01:05:00.000Z
7          2023-01-13T01:06:00.000Z  1             1.89
8          2023-01-13T01:07:00.000Z
9          2023-01-13T01:08:00.000Z
10         2023-01-13T01:09:00.000Z
11         2023-01-13T01:10:00.000Z
12         2023-01-13T01:11:00.000Z
13         2023-01-13T01:12:00.000Z
14         2023-01-13T01:13:00.000Z
15         2023-01-13T01:14:00.000Z  1             2.11
16         2023-01-13T01:15:00.000Z
17         2023-01-13T01:16:00.000Z
18         2023-01-13T01:17:00.000Z
19         2023-01-13T01:18:00.000Z  2             3.57
20         2023-01-13T01:19:00.000Z
21         2023-01-13T01:20:00.000Z
22         2023-01-13T01:21:00.000Z
23         2023-01-13T01:22:00.000Z  1             3.21
23         2023-01-13T01:22:00.000Z  2             5.31
How do I write a query that returns one row per timestamp per parameter, where sample_value is the 'latest known' sample_value for that parameter like this:
sample_id  sample_time               parameter_id  sample_value
1          2023-01-13T01:00:00.000Z  1             1.23
1          2023-01-13T01:00:00.000Z  2             4.98
2          2023-01-13T01:01:00.000Z  1             1.23
2          2023-01-13T01:01:00.000Z  2             4.98
3          2023-01-13T01:02:00.000Z  1             1.23
3          2023-01-13T01:02:00.000Z  2             4.98
4          2023-01-13T01:03:00.000Z  1             1.23
4          2023-01-13T01:03:00.000Z  2             4.98
5          2023-01-13T01:04:00.000Z  1             1.23
5          2023-01-13T01:04:00.000Z  2             6.08
6          2023-01-13T01:05:00.000Z  1             1.23
6          2023-01-13T01:05:00.000Z  2             6.08
7          2023-01-13T01:06:00.000Z  1             1.89
7          2023-01-13T01:06:00.000Z  2             6.08
8          2023-01-13T01:07:00.000Z  1             1.89
8          2023-01-13T01:07:00.000Z  2             6.08
I cannot get my head around the LAST_VALUE function (if that is even the right tool for this?):
LAST_VALUE ( expression )
OVER (
[PARTITION BY partition_expression, ... ]
ORDER BY sort_expression [ASC | DESC], ...
)

First of all, you need one row per parameter for each of your sample ids (two rows per sample here). You can achieve this by cross joining your samples with the distinct parameter ids, and by adding the parameter condition to the left join as well.
...
FROM samples s
CROSS JOIN (SELECT DISTINCT parameter_id FROM sample_values) p
LEFT JOIN sample_values v
ON v.sample_id = s.sample_id AND v.parameter_id = p.parameter_id
...
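In addition to this, your intuition about the LAST_VALUE window function was correct. The problem is that PostgreSQL, at the time of writing, does not support IGNORE NULLS in window functions, so LAST_VALUE cannot skip the null rows. A common workaround is to compute a running count of non-null values per parameter_id: COUNT(v.sample_value) only increases on rows that have a value, so each non-null row and the null rows that follow it share the same count. Partitioning by parameter_id and that count gives groups that contain exactly one non-null value, and taking the maximum of each group returns it.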
WITH cte AS (
    SELECT s.sample_id, s.sample_time, p.parameter_id, v.sample_value,
           COUNT(v.sample_value) OVER (
               PARTITION BY p.parameter_id
               ORDER BY s.sample_id
           ) AS partitions
    FROM samples s
    CROSS JOIN (SELECT DISTINCT parameter_id FROM sample_values) p
    LEFT JOIN sample_values v
           ON v.sample_id = s.sample_id AND v.parameter_id = p.parameter_id
)
SELECT sample_id, sample_time, parameter_id,
       COALESCE(sample_value,
                MAX(sample_value) OVER (PARTITION BY parameter_id, partitions)
       ) AS sample_value
FROM cte
ORDER BY sample_id, parameter_id;
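The running-count technique described above can be tried out with a small script. This is only a sketch under assumptions: it uses Python's sqlite3 module with an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for PostgreSQL, loads just the first five samples from the question, and renames the helper column to grp.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only samples 1-5 are loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, sample_time TEXT);
    CREATE TABLE sample_values (sample_id INTEGER, parameter_id INTEGER, sample_value REAL);
    INSERT INTO samples VALUES
        (1, '2023-01-13T01:00:00Z'), (2, '2023-01-13T01:01:00Z'),
        (3, '2023-01-13T01:02:00Z'), (4, '2023-01-13T01:03:00Z'),
        (5, '2023-01-13T01:04:00Z');
    INSERT INTO sample_values VALUES (1, 1, 1.23), (1, 2, 4.98), (5, 2, 6.08);
""")

query = """
WITH cte AS (
    SELECT s.sample_id, p.parameter_id, v.sample_value,
           -- Running count of non-null values: it only grows on rows with a value.
           COUNT(v.sample_value) OVER (
               PARTITION BY p.parameter_id ORDER BY s.sample_id
           ) AS grp
    FROM samples s
    CROSS JOIN (SELECT DISTINCT parameter_id FROM sample_values) p
    LEFT JOIN sample_values v
           ON v.sample_id = s.sample_id AND v.parameter_id = p.parameter_id
)
SELECT sample_id, parameter_id,
       -- Each (parameter_id, grp) group holds at most one non-null value.
       MAX(sample_value) OVER (PARTITION BY parameter_id, grp) AS sample_value
FROM cte
ORDER BY sample_id, parameter_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows that precede the first known value of a parameter get a count of 0 and stay null, since there is nothing earlier to carry forward.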

Related

Counting SUM(VALUE) from previous cell

I have the following table:
A        Sum(Tickets)
01-2022  5
02-2022  2
03-2022  8
04-2022  1
05-2022  3
06-2022  3
07-2022  4
08-2022  1
09-2022  5
10-2022  5
11-2022  3
I would like to add the following extra column 'TotalSum(Tickets)', a running total of Sum(Tickets), but I am stuck.
Can anyone help out?
A        Sum(Tickets)  TotalSum(Tickets)
01-2022  5             5
02-2022  2             7
03-2022  8             15
04-2022  1             16
05-2022  3             19
06-2022  3             22
07-2022  4             26
08-2022  1             27
09-2022  5             32
10-2022  5             37
11-2022  3             40

You may use SUM() as a window function here:
SELECT A, SumTickets, SUM(SumTickets) OVER (ORDER BY A) AS TotalSumTickets
FROM yourTable
ORDER BY A;
But this assumes that you actually have a bona-fide column SumTickets which contains the sums. Assuming you really showed us the intermediate result of some aggregation query, you should use:
SELECT A, SUM(Tickets) AS SumTickets,
SUM(SUM(Tickets)) OVER (ORDER BY A) AS TotalSumTickets
FROM yourTable
GROUP BY A
ORDER BY A;
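To see the nested SUM(SUM(...)) OVER form at work, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) in place of the asker's unnamed database, and the raw ticket rows are hypothetical, chosen so that they aggregate to the first three months of the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (A TEXT, Tickets INTEGER)")
# Hypothetical raw rows: they sum to 5, 2 and 8 for the first three months.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [("01-2022", 2), ("01-2022", 3), ("02-2022", 2), ("03-2022", 5), ("03-2022", 3)],
)

rows = conn.execute("""
    SELECT A,
           SUM(Tickets) AS SumTickets,                             -- per-month total
           SUM(SUM(Tickets)) OVER (ORDER BY A) AS TotalSumTickets  -- running total
    FROM yourTable
    GROUP BY A
    ORDER BY A
""").fetchall()
print(rows)
```

The inner SUM is the GROUP BY aggregate and the outer SUM is the window function applied to the grouped rows. Note that ordering by a text value such as '01-2022' sorts by month before year, so data spanning several years would need a real date or a 'YYYY-MM' key.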

Left join the table to itself on all rows whose date is not later than the current row's date, then sum the tickets for every date:
select
table1.date,
sum(t.tickets)
from
table1
left join table1 t
on t.date<= table1.date
group by
table1.date;

Get max record for each group of records, link multiple tables

I seek to find the maximum timestamp (ob.create_ts) for each group of marketid's (ob.marketid), joining tables obe (ob.orderbookid = obe.orderbookid) and market (ob.marketid = m.marketid). Although there are a number of solutions posted like this for a single table, when I join multiple tables, I get redundant results. Sample table and desired results below:
table: ob
orderbookid  marketid  create_ts
1            1         1664635255298
2            1         1664635255299
3            1         1664635255300
4            2         1664635255301
5            2         1664635255302
6            2         1664635255303
table: obe
orderbookentryid  orderbookid  entryname
1                 1            'entry-1'
2                 1            'entry-2'
3                 1            'entry-3'
4                 2            'entry-4'
5                 2            'entry-5'
6                 3            'entry-6'
7                 3            'entry-7'
8                 4            'entry-8'
9                 5            'entry-9'
10                6            'entry-10'
table: m
marketid  marketname
1         'market-1'
2         'market-2'
desired results
ob.orderbookid  ob.marketid  obe.orderbookentryid  obe.entryname  m.marketname
3               1            6                     'entry-6'      'market-1'
3               1            7                     'entry-7'      'market-1'
6               2            10                    'entry-10'     'market-2'

Use ROW_NUMBER() to get a properly filtered ob table. Then JOIN the other tables onto that!
WITH
ob_filtered AS (
    SELECT
        orderbookid,
        marketid
    FROM
    (
        SELECT
            *,
            ROW_NUMBER() OVER (
                PARTITION BY
                    marketid
                ORDER BY
                    create_ts DESC
            ) AS create_ts_rownumber
        FROM
            ob
    ) ob_with_rownumber
    WHERE
        create_ts_rownumber = 1
)
SELECT
    ob_filtered.orderbookid,
    ob_filtered.marketid,
    obe.orderbookentryid,
    obe.entryname,
    m.marketname
FROM
    ob_filtered
JOIN m
    ON m.marketid = ob_filtered.marketid
JOIN obe
    ON ob_filtered.orderbookid = obe.orderbookid;
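The filter-first, join-afterwards idea can be checked against the sample tables from the question. The sketch below assumes SQLite via Python's sqlite3 module rather than the asker's database, and shortens the helper names (ranked, rn, f), which are not from the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ob (orderbookid INTEGER, marketid INTEGER, create_ts INTEGER);
    CREATE TABLE obe (orderbookentryid INTEGER, orderbookid INTEGER, entryname TEXT);
    CREATE TABLE m (marketid INTEGER, marketname TEXT);
    INSERT INTO ob VALUES
        (1, 1, 1664635255298), (2, 1, 1664635255299), (3, 1, 1664635255300),
        (4, 2, 1664635255301), (5, 2, 1664635255302), (6, 2, 1664635255303);
    INSERT INTO obe VALUES
        (1, 1, 'entry-1'), (2, 1, 'entry-2'), (3, 1, 'entry-3'), (4, 2, 'entry-4'),
        (5, 2, 'entry-5'), (6, 3, 'entry-6'), (7, 3, 'entry-7'), (8, 4, 'entry-8'),
        (9, 5, 'entry-9'), (10, 6, 'entry-10');
    INSERT INTO m VALUES (1, 'market-1'), (2, 'market-2');
""")

rows = conn.execute("""
    WITH ob_filtered AS (
        -- Keep only the newest order book per market before joining anything.
        SELECT orderbookid, marketid
        FROM (
            SELECT ob.*,
                   ROW_NUMBER() OVER (PARTITION BY marketid
                                      ORDER BY create_ts DESC) AS rn
            FROM ob
        ) AS ranked
        WHERE rn = 1
    )
    SELECT f.orderbookid, f.marketid, obe.orderbookentryid, obe.entryname, m.marketname
    FROM ob_filtered AS f
    JOIN m   ON m.marketid = f.marketid
    JOIN obe ON obe.orderbookid = f.orderbookid
    ORDER BY obe.orderbookentryid
""").fetchall()
for row in rows:
    print(row)
```

Because ob is reduced to one row per market before the joins, the one-to-many join to obe can only multiply rows of the chosen order book, which avoids the redundant rows mentioned in the question.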

T-SQL How to configure Group by so that specific values would be correctly shown

My current T-SQL query provides the following results:
Query:
WITH CTE AS
(
SELECT SubscriberID, sum(valueMB) as ValuesMB
FROM dbo.InternetNetwork
GROUP BY SubscriberID
),
CTE2 AS (
SELECT ab.planID, a.SubscriberID, MAX(ValuesMB) as MaximumValue
FROM CTE AS a
left join
Subscriber as ab on a.SubscriberID= ab.SubscriberID
GROUP BY ab.planID, a.SubscriberID
)
select *
FROM CTE2 as b
ORDER BY b.MaximumValue desc
Output:
planID | SubscriberID | MaxValue
19 1555 97536.00
18 3528 97478.00
2 4029 93413.00
Query #2:
WITH CTE AS
(
SELECT SubscriberID, sum(valueMB) as ValuesMB
FROM dbo.InternetNetwork
GROUP BY SubscriberID
),
CTE2 AS(
SELECT ab.planID, MAX(ValuesMB) as MaximumValue
FROM CTE AS a
left join
Subscriber as ab on a.SubscriberID= ab.SubscriberID
GROUP BY ab.planID
)
SELECT pl.OperatorID, MAX(b.MaximumValue) as Super
FROM CTE2 as b
left join
Plan as pl on b.planID= pl.planID
GROUP BY pl.operatorID
ORDER BY pl.operatorID
Output #2:
OperatorID | Value
1 93413.00
2 86017.00
3 97536.00
I would also like to include the SubscriberID, but I can't figure out how. The only way I see is to add it to the last SELECT and to the GROUP BY, and doing so produces a messy result that is not accurate.
My desired output:
OperatorID | Value | SubscriberID
1 93413.00 4029
2 86017.00 164
3 97536.00 1544
internet network data:
SubscriberID ValuesMB
1 28
1 27
2 27
2 27
2 27
3 29
3 28
3 27
3 27
4 27
4 27
4 29
Subscriber Data:
SubscriberID PersonID PlanID
1 1 3
2 2 10
3 2 6
4 3 14
5 3 1
6 4 18
7 5 5
8 5 1
9 5 9
10 5 16
11 6 13
12 6 13
13 6 20
14 6 16
15 7 4
Plan data
PlanID OperatorID
1 1
2 1
3 2
4 2
5 2
6 2
7 2
8 2
9 2
10 2
11 2
12 3
13 3
14 3
15 3
16 3
17 3
18 3
19 3
20 3
The tables are related roughly like this: InternetNetwork -> Subscriber -> Plan. InternetNetwork records how much data each subscriber has used. Each subscriber has a plan associated with them. Each plan belongs to an operator; there are only three operators. I wish to list all three operators, each with the largest amount of data transferred by a subscriber on one of that operator's plans, together with that subscriber's SubscriberID.

Window functions let you select non-aggregated columns alongside aggregate values. You can do something like this:
;WITH CTE AS
(
    SELECT I.SubscriberID,
           S.PlanID,
           SUM(ValuesMB) OVER (PARTITION BY I.SubscriberID) AS ValuesMB
    FROM dbo.InternetNetwork I
    JOIN Subscriber S
        ON I.SubscriberID = S.SubscriberID
),
CTE2 AS
(
    SELECT P.operatorID,
           a.SubscriberID,
           a.ValuesMB,
           ROW_NUMBER() OVER (PARTITION BY P.operatorID ORDER BY a.ValuesMB DESC) AS rn
    FROM CTE a
    JOIN [Plan] P
        ON a.planID = P.planID
)
SELECT operatorID,
       ValuesMB,
       SubscriberID
FROM CTE2
WHERE rn = 1
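A runnable variation on the same idea is sketched below, with some stated assumptions: it uses SQLite through Python's sqlite3 module instead of SQL Server, it computes the per-subscriber totals with GROUP BY rather than a windowed SUM, and it loads only the sample rows from the question that actually join (subscribers 1 to 4 and their plans). The CTE names totals and ranked are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InternetNetwork (SubscriberID INTEGER, ValuesMB REAL);
    CREATE TABLE Subscriber (SubscriberID INTEGER, PersonID INTEGER, PlanID INTEGER);
    CREATE TABLE [Plan] (PlanID INTEGER, OperatorID INTEGER);
    INSERT INTO InternetNetwork VALUES
        (1, 28), (1, 27), (2, 27), (2, 27), (2, 27), (3, 29),
        (3, 28), (3, 27), (3, 27), (4, 27), (4, 27), (4, 29);
    INSERT INTO Subscriber VALUES (1, 1, 3), (2, 2, 10), (3, 2, 6), (4, 3, 14);
    INSERT INTO [Plan] VALUES (3, 2), (6, 2), (10, 2), (14, 3);
""")

rows = conn.execute("""
    WITH totals AS (
        -- One row per subscriber with the total usage.
        SELECT SubscriberID, SUM(ValuesMB) AS ValuesMB
        FROM InternetNetwork
        GROUP BY SubscriberID
    ),
    ranked AS (
        -- Rank subscribers inside each operator by total usage.
        SELECT p.OperatorID, t.SubscriberID, t.ValuesMB,
               ROW_NUMBER() OVER (PARTITION BY p.OperatorID
                                  ORDER BY t.ValuesMB DESC) AS rn
        FROM totals t
        JOIN Subscriber s ON s.SubscriberID = t.SubscriberID
        JOIN [Plan] p ON p.PlanID = s.PlanID
    )
    SELECT OperatorID, ValuesMB, SubscriberID
    FROM ranked
    WHERE rn = 1
    ORDER BY OperatorID
""").fetchall()
for row in rows:
    print(row)
```

ROW_NUMBER picks exactly one subscriber per operator even when two subscribers tie on the total; RANK could be swapped in if tied subscribers should all be listed.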

filling running total over all month although its null

I have 2 tables: one containing only the periods, the other containing account, amount and period.
I want to build a view that lists the cumulative amount per period and account. If there is no fact for a period in my table, the period should still appear in the view with the last amount.
select distinct
account, b.periode,
SUM(amount) OVER (PARTITION BY account ORDER BY b.periode RANGE UNBOUNDED PRECEDING)
from
fakten a
full join
perioden b on a.periode = b.periode
order by b.periode
It returns this:
1 1 6
2 1 4
1 2 13
2 2 3
NULL 3 NULL
1 4 46
2 5 48
NULL 6 NULL
NULL 7 NULL
1 8 147
NULL 9 NULL
NULL 10 NULL
NULL 11 NULL
NULL 12 NULL
I need it like:
1 1 6
2 1 4
1 2 13
2 2 3
1 3 13
2 3 3
1 4 46
2 4 3
1 5 46
2 5 48
1 6 46
2 6 46
and so on...
Any ideas?

full join is not the right approach. Instead, generate the rows you want using a cross join. Then use left join and group by to do the calculation.
select a.account, p.periode,
       -- the inner SUM is the group by aggregate, the outer SUM is the running total
       SUM(SUM(f.amount)) OVER (PARTITION BY a.account ORDER BY p.periode)
from (select distinct account from fakten) a cross join -- you probably have an account table, use it
     perioden p left join -- a cross join takes no ON clause
     fakten f
     on f.account = a.account and f.periode = p.periode
group by a.account, p.periode
order by a.account, p.periode;
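The cross join, left join and grouped running sum described above can be sketched as follows. The sketch assumes SQLite via Python's sqlite3 module instead of the asker's database, only four periods, and hypothetical amounts chosen so that the running totals resemble the first rows of the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE perioden (periode INTEGER);
    CREATE TABLE fakten (account INTEGER, periode INTEGER, amount INTEGER);
    INSERT INTO perioden VALUES (1), (2), (3), (4);
    -- Hypothetical facts: account 1 has no row for period 3,
    -- account 2 has no rows for periods 3 and 4.
    INSERT INTO fakten VALUES (1, 1, 6), (1, 2, 7), (1, 4, 33), (2, 1, 4), (2, 2, -1);
""")

rows = conn.execute("""
    SELECT a.account, p.periode,
           -- Inner SUM: amount per (account, periode); outer SUM: running total.
           SUM(SUM(f.amount)) OVER (PARTITION BY a.account
                                    ORDER BY p.periode) AS running_amount
    FROM (SELECT DISTINCT account FROM fakten) a
    CROSS JOIN perioden p                -- every account paired with every period
    LEFT JOIN fakten f
           ON f.account = a.account AND f.periode = p.periode
    GROUP BY a.account, p.periode
    ORDER BY p.periode, a.account
""").fetchall()
for row in rows:
    print(row)
```

Periods without a fact contribute NULL to the running sum, so the previous total is simply carried forward for that account.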

How to do grouping by a date span?

Consider this table structure.
Key ID VISITDATE
1 1 2011-01-07
2 1 2011-01-09
3 2 2011-01-10
4 1 2011-01-12
5 3 2011-01-12
6 1 2011-01-15
7 2 2011-01-21
9 1 2011-02-28
10 2 2011-03-21
11 1 2011-01-06
I need to get the ID, Key and min(VisitDate) for visits that fall within a 10-day span. If an ID has two visits within 10 days, only one row should appear in the result.
Result
KEY ID VISITDATE
11 1 2011-01-06
3 2 2011-01-10
5 3 2011-01-12
7 2 2011-01-21
9 1 2011-02-28
10 2 2011-03-21
Can this be done without a self join? I have a query which self-joins the table on ID and checks the DATEDIFF. Is there a better solution? Can we use a recursive CTE here?
EDIT
I would prefer a solution which can use the index on the date column.

Yes a CTE would work nicely for this (everything with me is CTEs lately)...
;WITH TenDayVisits
AS (
    SELECT
        ID
        ,MIN(VisitDate) AS VisitDate
    FROM Visits
    GROUP BY ID
    UNION ALL
    SELECT
        t.ID
        ,v.VisitDate
    FROM Visits AS v
    JOIN TenDayVisits AS t ON v.ID = t.ID
        AND DATEDIFF(dd,t.Visitdate,v.VisitDate) > 10
)
SELECT
    DISTINCT
    v.[key]
    ,t.id
    ,t.VisitDate
FROM TenDayVisits as T
JOIN Visits AS v ON t.id = v.id
    AND t.VisitDate = v.VisitDate
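To try the recursive CTE against the question's data, here is a sketch that assumes SQLite (via Python's sqlite3 module) in place of SQL Server. SQLite has no DATEDIFF, so the day difference is computed with julianday instead; that substitution is mine, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visits ([key] INTEGER, id INTEGER, visitdate TEXT)")
conn.executemany("INSERT INTO Visits VALUES (?, ?, ?)", [
    (1, 1, "2011-01-07"), (2, 1, "2011-01-09"), (3, 2, "2011-01-10"),
    (4, 1, "2011-01-12"), (5, 3, "2011-01-12"), (6, 1, "2011-01-15"),
    (7, 2, "2011-01-21"), (9, 1, "2011-02-28"), (10, 2, "2011-03-21"),
    (11, 1, "2011-01-06"),
])

rows = conn.execute("""
    WITH RECURSIVE TenDayVisits(id, visitdate) AS (
        -- Anchor: the first visit of every ID.
        SELECT id, MIN(visitdate) FROM Visits GROUP BY id
        UNION ALL
        -- Recursive step: visits more than 10 days after a retained visit.
        SELECT t.id, v.visitdate
        FROM Visits AS v
        JOIN TenDayVisits AS t
          ON v.id = t.id
         AND julianday(v.visitdate) - julianday(t.visitdate) > 10
    )
    SELECT DISTINCT v.[key], t.id, t.visitdate
    FROM TenDayVisits AS t
    JOIN Visits AS v ON v.id = t.id AND v.visitdate = t.visitdate
    ORDER BY t.visitdate, t.id
""").fetchall()
for row in rows:
    print(row)
```

On this data the query reproduces the result table from the question. Be aware that the recursive step keeps every visit more than 10 days after a retained one, not only the next one, so on denser data (for example visits on days 0, 11 and 12) it can return two visits that lie within 10 days of each other.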
