SQL/Power BI: How to expand a table according to dates - sql

I have a table like below, where a new record is created when there is a change in the status of a task.
task  status  last update
A     1       28/04/2022
A     3       01/05/2022
A     5       05/05/2022
B     1       28/04/2022
B     3       03/05/2022
B     4       05/05/2022
The problem is that I need to plot a graph over a time range in which I know the status of each task on every date, regardless of the date on which the status was changed/created. So I think the easiest approach is to transform it into the table below:
task  status  last update
A     1       28/04/2022
A     1       29/04/2022
A     1       30/04/2022
A     3       01/05/2022
A     3       02/05/2022
A     3       03/05/2022
A     3       04/05/2022
A     5       05/05/2022
B     1       28/04/2022
B     1       29/04/2022
B     1       30/04/2022
B     1       01/05/2022
B     1       02/05/2022
B     3       03/05/2022
B     3       04/05/2022
B     4       05/05/2022
However, I can't think of a way to do it, either directly in Power BI or in SQL, since I'm connecting to a Redshift database through a SQL query.
Could you please help me?
Thanks

You can build this chart with the standard line chart visualization. In the visualization settings, go to the "Shapes" menu and turn the "Stepped" view on.
While not necessary, it may be best practice to create a date dimension table with daily values spanning from the minimum update date to the maximum update date.
Dates = CALENDAR(MIN(Tasks[last update]),MAX(Tasks[last update]))
You can then create a one-to-many relationship between Dates and Tasks.

demo
A very similar question: How to do forward fill as a PL/pgSQL function.
I don't know the actual differences between Amazon Redshift and PostgreSQL. The demo is based on PostgreSQL 14; it may not work on Redshift.
Basic idea: for every distinct task, get the min and max last_update dates, then use the generate_series function to expand the dates for that task between its min and max last_update. The key point is first_value(status): once you expand the dates, the status value is obviously null on some dates, so you use a partition to fill the gap. If you want to dig deeper, you can read the manual: https://www.postgresql.org/docs/14/plpgsql.html
CREATE OR REPLACE FUNCTION test_expand ()
RETURNS TABLE (
    _date1 date,
    _first_ctask text,
    _first_cstatus bigint
)
AS $$
DECLARE
    distinct_task record;
    max_last_update date;
    min_last_update date;
    _sql text;
BEGIN
    FOR distinct_task IN
        SELECT DISTINCT task FROM test_1 ORDER BY 1
    LOOP
        min_last_update := (SELECT min(last_update) FROM test_1 WHERE task = distinct_task.task);
        max_last_update := (SELECT max(last_update) FROM test_1 WHERE task = distinct_task.task);
        _sql := format($dml$
            WITH cte AS (
                SELECT date1::date,
                       $task$%1$s$task$ AS _task,
                       status,
                       count(status) OVER (ORDER BY date1) AS c_s
                FROM (SELECT generate_series($a$%2$s$a$::date, $b$%3$s$b$::date, interval '1 day')) g (date1)
                -- also match on the task, so tasks sharing an update date do not mix
                LEFT JOIN test_1 ON date1 = last_update AND task = $task$%1$s$task$
            )
            SELECT date1, _task,
                   first_value(status) OVER (PARTITION BY c_s ORDER BY date1, status)
            FROM cte
        $dml$, distinct_task.task, min_last_update, max_last_update);
        RETURN QUERY EXECUTE _sql;
    END LOOP;
    RETURN;
END;
$$
LANGUAGE plpgsql;
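For reference, the same forward-fill idea can be written as a single set-based query, with no PL/pgSQL at all. The sketch below is my own illustration, not part of the original answer: it runs on PostgreSQL against the demo table test_1(task, status, last_update). On Redshift, where generate_series is only available on the leader node, you would substitute a date dimension table (like the one suggested in the first answer) for the series.
-- Hedged set-based sketch; assumes test_1(task, status, last_update)
WITH bounds AS (
    -- first and last update per task
    SELECT task, min(last_update) AS d_min, max(last_update) AS d_max
    FROM test_1
    GROUP BY task
), spine AS (
    -- one row per task per day between its first and last update
    SELECT b.task, d.date1::date AS date1
    FROM bounds b
    CROSS JOIN LATERAL generate_series(b.d_min, b.d_max, interval '1 day') AS d (date1)
), cte AS (
    SELECT s.task, s.date1, t.status,
           count(t.status) OVER (PARTITION BY s.task ORDER BY s.date1) AS c_s
    FROM spine s
    LEFT JOIN test_1 t ON t.task = s.task AND t.last_update = s.date1
)
SELECT task, date1 AS last_update,
       first_value(status) OVER (PARTITION BY task, c_s ORDER BY date1) AS status
FROM cte
ORDER BY task, date1;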

Related

SAS Proc SQL - ranking the nth (3rd) highest price for a group of, say, universities? (HW to be honest)

(This is homework, not going to lie.)
I have an ANSI SQL query I wrote that produces the required 3rd-highest prices correctly (a table sample is further below):
select unique uni, price
from (
    select unique uni, price
    from (
        select unique uni, price
        from table1
        group by uni
        having price < max(price)
    )
    group by uni
    having price < max(price)
)
group by uni
having price < max(price)
Now I need to list the 1st, 2nd and 3rd into one table, but make it such that it could be used n times.
example:
Col1 Col2
uni1 10
uni1 20
uni2 20
uni2 10
uni3 30
uni3 20
uni1 30
(Sorry for the formatting, I haven't been here for a very long time; I appreciate any assistance. I will supply a link to the uni. I asked the tutor if I can post this and he said yes, but not the whole code, something like 10%.)
In SAS you can use the proprietary option OUTOBS to restrict how many rows of a result set are output.
Example:
Use OUTOBS=3 to create top 3 table. Then use that table in a subsequent query.
data have;
  input x @@;   /* @@ holds the input line so several values per line are read */
  datalines;
10 9 8 7 6 5 4 3 2 1 0
;
proc sql;
  reset outobs=3;
  create table top3x as
  select * from have
  order by x desc;
  reset outobs=max;
  * another query;
quit;
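OUTOBS caps the whole result set, so for the per-university part of the question you still need a per-group ranking. Below is a hedged sketch of one portable way to do that inside PROC SQL with a correlated subquery; it is my own illustration (not part of the original answer) and assumes table1(uni, price) from the question.
/* 3rd-highest price per uni: a row qualifies when exactly 2 distinct
   higher prices exist for the same uni (generalise by changing 2 to n-1) */
proc sql;
  create table third_highest as
  select uni, price
  from table1 as a
  where 2 = (select count(distinct b.price)
             from table1 as b
             where b.uni = a.uni
               and b.price > a.price);
quit;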

Combine data from multiple rows in Oracle

I have Oracle 12c, so please answer my question using Oracle syntax. I want to combine data from multiple rows into one row; please see the expected result below for an example. I tried using the PIVOT function, but it did not work for me because I want to pivot Call_day from the previous row alongside the latest row, with the list of columns shown in "Expected result" below. Thank you for your help.
Data in the table:
Acct_num Call_day Call_code Start_day_To_Call
1 04/23/2018 AA 04/02/2018
1 04/24/2018 NULL 04/02/2018
1 04/25/2018 CC 04/02/2018
2 04/26/2018 ZZ 05/02/2018
2 04/27/2018 CC 05/02/2018
If multiple calls were made within the Start_day_To_Call date, then I want the last two calls pivoted as shown below:
Expected result:
Acct_num Call_day1 Call_day2 Call_code1 Call_code2 Start_day_To_Call
1 04/24/2018 04/25/2018 NULL CC 04/02/2018
2 04/26/2018 04/27/2018 ZZ CC 05/02/2018
If you want only the last two days, you can use this query:
First get the last call for each acct_num, then find the previous call, and then fill in the data accordingly. You can add an index to improve performance if needed.
select p.acct_num,
p.prev_last_day,
(select z.call_code
from test_tbl z
where z.acct_num = p.acct_num
and z.call_day = p.prev_last_day) prev_call_code,
last_day,
(select z.call_code
from test_tbl z
where z.acct_num = p.acct_num
and z.call_day = p.last_day) last_call_code,
p.start_day_to_call
from (select x.acct_num,
max(x.call_day) last_day,
max((select max(y.call_day)
from test_tbl y
where y.acct_num = x.acct_num
and y.call_day < x.call_day)) prev_last_day,
min(x.start_day_to_call) start_day_to_call
from test_tbl x
group by x.acct_num) p
order by p.acct_num
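For comparison, here is a hedged sketch of the same result using ROW_NUMBER() with conditional aggregation, which avoids the repeated correlated lookups. This is my own illustration (not from the original answer) and assumes a table test_tbl(acct_num, call_day, call_code, start_day_to_call) as in the question.
-- rank calls per account, newest first, then pivot the last two
select acct_num,
       max(case when rn = 2 then call_day  end) as call_day1,
       max(case when rn = 1 then call_day  end) as call_day2,
       max(case when rn = 2 then call_code end) as call_code1,
       max(case when rn = 1 then call_code end) as call_code2,
       min(start_day_to_call)                   as start_day_to_call
from (select t.*,
             row_number() over (partition by acct_num
                                order by call_day desc) as rn
      from test_tbl t)
where rn <= 2
group by acct_num
order by acct_num;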

MonetDB: Enumerate groups of rows based on a given "boundary" condition

Consider the following table:
id gap groupID
0 0 1
2 3 1
3 7 2
4 1 2
5 5 2
6 7 3
7 3 3
8 8 4
9 2 4
Here groupID is the desired computed column, such that its value is incremented whenever the gap column is greater than a threshold (in this case 6). The id column defines the sequential order of appearance of the rows (and is already given).
Can you please help me figure out how to dynamically fill out the appropriate values for groupID?
I have looked at several other entries here on Stack Overflow, and I've seen sum used as an aggregate in a window function. I can't use sum because it's not supported in MonetDB window functions (only rank, dense_rank, and row_number). I can't use triggers (to modify the record insertion before it takes place) either, because I need to keep the data mentioned above in a local temporary table within a stored function, and trigger declarations are not supported in MonetDB function definitions.
I have also tried filling in the groupID column by reading the previous table (id and gap) into another temporary table (id, gap, groupID), hoping that this would force a row-by-row operation. But this failed as well, because it assigns groupID 0 to all records:
declare threshold int;
set threshold = 6;
insert into newTable( id, gap, groupID )
select A.id, A.gap,
case when A.gap > threshold then
(select case when max(groupID) is null then 0 else max(groupID)+1 end from newTable)
else
(select case when max(groupID) is null then 0 else max(groupID) end from newTable)
end
from A
order by A.id asc;
Any help, tip, or reference is greatly appreciated. I've been trying to figure this out for a long time.
BTW: cursors are not supported in MonetDB either.
You can assign the group using a correlated subquery. Simply count the number of previous values that exceed 6:
select id, gap,
(select 1 + count(*)
from t as t2
where t2.id <= t.id and t2.gap > 6
) as Groupid
from t;
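For comparison, on engines that do allow SUM as a window function (which, per the question, MonetDB does not), the same grouping is usually written as a running sum over a boundary flag. A hedged sketch:
-- each gap > 6 starts a new group; the running count of such
-- boundaries plus 1 is the group id
select id, gap,
       1 + sum(case when gap > 6 then 1 else 0 end)
               over (order by id) as groupID
from t;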

How to find continuous series using PL/SQL

I am a PL/SQL programmer, and I'm facing a problem finding continuity in a series for the same date.
Suppose I have a series like:
1000,1001,
1002,1003,
1004,1005,
1016,1017,
1018,1019,
1020,1021,
1035,1036,
1037,1038,
1039,1040
and I am looking for output like:
from_series ------------- to_series
1000 ------------- 1005
1016 ------------- 1021
1035 ------------- 1040
I tried it with the two queries below, but ran into a problem:
SELECT *
FROM retort_t r
WHERE NOT EXISTS
      ( SELECT 'X'
        FROM retort_t
        WHERE r.series_NO - ISSUE_NO = 1 );

SELECT *
FROM retort_t r
WHERE NOT EXISTS
      ( SELECT 'X'
        FROM retort_t
        WHERE ISSUE_NO = r.series_NO + 1 );
I get the result by joining the above two queries in alignment. That's OK for a few records, but my records number in the lakhs (hundreds of thousands), and it takes a long time to fetch the data by joining these two queries.
Please let me know the appropriate way to sort the data into the correct intervals.
Assuming a simple table structure such as:
CREATE TABLE T (x INT);
INSERT INTO T (x) VALUES
(1000), (1001), (1002), (1003),
(1004), (1005), (1016), (1017),
(1018), (1019), (1020), (1021),
(1035), (1036), (1037), (1038),
(1039), (1040);
You can use ROW_NUMBER() to derive a value that stays constant across each run of sequential numbers; you can then group by this value to get the min and max values in each range:
SELECT MIN(x) AS RangeStart, MAX(x) AS RangeEnd
FROM ( SELECT X,
X - ROW_NUMBER() OVER(ORDER BY x) AS GroupBy
FROM T
) t
GROUP BY GroupBy;
Example On SQL Fiddle
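To see why the grouping works, it may help to inspect the intermediate value: within each run of consecutive numbers, x - ROW_NUMBER() is constant. A hedged illustration against the sample table T above:
SELECT x,
       ROW_NUMBER() OVER(ORDER BY x) AS rn,
       x - ROW_NUMBER() OVER(ORDER BY x) AS GroupBy
FROM T;
-- x = 1000..1005 -> GroupBy = 999
-- x = 1016..1021 -> GroupBy = 1009
-- x = 1035..1040 -> GroupBy = 1022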

Moving Average based on Timestamps in PostgreSQL

I want to perform a moving average over timestamps.
I have two columns, temperature and timestamps (date-time), and I want to compute the moving average over the successive temperature observations falling within each 15-minute window. In other words, the data for each average is selected by a 15-minute time interval. Note that different windows may contain different numbers of observations: all windows are the same length (15 minutes), but the number of observations in each can differ.
For example:
For the first window we may have to average n observations, and for the second window n + 5 observations.
Data Sample:
ID Timestamps Temperature
1 2007-09-14 22:56:12 5.39
2 2007-09-14 22:58:12 5.34
3 2007-09-14 23:00:12 5.16
4 2007-09-14 23:02:12 5.54
5 2007-09-14 23:04:12 5.30
6 2007-09-14 23:06:12 5.20
7 2007-09-14 23:10:12 5.39
8 2007-09-14 23:12:12 5.34
9 2007-09-14 23:20:12 5.16
10 2007-09-14 23:24:12 5.54
11 2007-09-14 23:30:12 5.30
12 2007-09-14 23:33:12 5.20
13 2007-09-14 23:40:12 5.39
14 2007-09-14 23:42:12 5.34
15 2007-09-14 23:44:12 5.16
16 2007-09-14 23:50:12 5.54
17 2007-09-14 23:52:12 5.30
18 2007-09-14 23:57:12 5.20
Main challenge:
How can the code discriminate each 15-minute window when the intervals between observations are not exactly 15 minutes, due to the varying sampling frequency?
You can join your table with itself:
select l1.id, avg( l2.Temperature )
from l l1
inner join l l2
on l2.id <= l1.id and
l2.Timestamps + interval '15 minutes' > l1.Timestamps
group by l1.id
order by id
;
Results:
| ID | AVG |
-----------------------
| 1 | 5.39 |
| 2 | 5.365 |
| 3 | 5.296666666667 |
| 4 | 5.3575 |
| 5 | 5.346 |
| 6 | 5.321666666667 |
| 7 | 5.331428571429 |
Notice: only the 'hard work' is done here. You should join this result with the original table, or append new columns to the query; I don't know the final query you need. Adapt this solution or ask for more help.
Assuming you want to restart the rolling average after each 15 minute interval:
select id,
temp,
avg(temp) over (partition by group_nr order by time_read) as rolling_avg
from (
select id,
temp,
time_read,
interval_group,
id - row_number() over (partition by interval_group order by time_read) as group_nr
from (
select id,
time_read,
'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group,
temp
from readings
) t1
) t2
order by time_read;
It is based on Depesz's solution to group by "time ranges":
Here is an SQLFiddle example: http://sqlfiddle.com/#!1/0f3f0/2
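The bucketing expression may be easier to follow in isolation: it floors each timestamp to its 15-minute (900-second) slot. A hedged illustration:
-- 23:04:12 falls into the 23:00:00 slot
select 'epoch'::timestamp
       + '900 seconds'::interval
         * (extract(epoch from timestamp '2007-09-14 23:04:12')::int4 / 900)
       as interval_group;
-- -> 2007-09-14 23:00:00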
Here's an approach that utilises the facility to use an aggregation function as a window function. The aggregate function keeps the last 15 minutes' worth of observations in an array, along with the current running total. The state transition function shifts elements off the array that have fallen behind the 15-minute window, and pushes on the latest observation. The final function simply computes the mean temperature in the array.
Now, as to whether this is a benefit or not... it depends. It focuses on the plpgsql-execution part of PostgreSQL rather than the database-access part, and my own experience is that plpgsql is not fast. If you can easily do lookups back to the table to find the previous 15 minutes' rows for each observation, a self-join (as in @danihp's answer) will do well. However, this approach can deal with the observations coming from some more complex source, where those lookups aren't practical. As ever, trial and compare on your own system.
-- based on using this table definition
create table observation(id int primary key, timestamps timestamp not null unique,
temperature numeric(5,2) not null);
-- note that I'm reusing the table structure as a type for the state here
create type rollavg_state as (memory observation[], total numeric(5,2));
create function rollavg_func(state rollavg_state, next_in observation) returns rollavg_state immutable language plpgsql as $$
declare
cutoff timestamp;
i int;
updated_memory observation[];
begin
raise debug 'rollavg_func: state=%, next_in=%', state, next_in;
cutoff := next_in.timestamps - '15 minutes'::interval;
i := array_lower(state.memory, 1);
raise debug 'cutoff is %', cutoff;
while i <= array_upper(state.memory, 1) and state.memory[i].timestamps < cutoff loop
raise debug 'shifting %', state.memory[i].timestamps;
-- subtract the expired observation before advancing the index,
-- so the removed temperature is the one that fell out of the window
state.total := state.total - state.memory[i].temperature;
i := i + 1;
end loop;
state.memory := array_append(state.memory[i:array_upper(state.memory, 1)], next_in);
state.total := coalesce(state.total, 0) + next_in.temperature;
return state;
end
$$;
create function rollavg_output(state rollavg_state) returns float8 immutable language plpgsql as $$
begin
raise debug 'rollavg_output: state=% len=%', state, array_length(state.memory, 1);
if array_length(state.memory, 1) > 0 then
return state.total / array_length(state.memory, 1);
else
return null;
end if;
end
$$;
create aggregate rollavg(observation) (sfunc = rollavg_func, finalfunc = rollavg_output, stype = rollavg_state);
-- referring to just a table name means a tuple value of the row as a whole, whose type is the table type
-- the aggregate relies on inputs arriving in ascending timestamp order
select rollavg(observation) over (order by timestamps) from observation;
Based on dani herrera's answer:
select l1.id,
l1.time_read,
l1.temp ,
avg( l2.Temp ) as rolling_avg
from readings l1
inner join readings l2
on l2.id <= l1.id and
l2.time_read + interval '15 minutes' > l1.time_read
group by l1.id
order by time_read;
Here is an SQLFiddle: http://sqlfiddle.com/#!17/9db74/161