How to fill missing values in certain time interval - sql

I have a table in the format below:
user timestamp count total_count
xyz 01-01-2020 00:12:00 45 45
xyz 01-01-2020 00:27:00 12 57
xyz 01-01-2020 00:29:00 11 68
xyz 01-01-2020 00:53:00 32 100
I want the data in 5-minute intervals, like below (expected output):
user timestamp count total_count
xyz 01-01-2020 00:05:00 0 0
xyz 01-01-2020 00:10:00 0 0
xyz 01-01-2020 00:15:00 45 45
xyz 01-01-2020 00:20:00 0 45
xyz 01-01-2020 00:25:00 0 45
xyz 01-01-2020 00:30:00 23 68
xyz 01-01-2020 00:35:00 0 68
xyz 01-01-2020 00:40:00 0 68
xyz 01-01-2020 00:45:00 0 68
xyz 01-01-2020 00:50:00 0 68
xyz 01-01-2020 00:55:00 32 100
I tried
SELECT
TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(timestamp), 5*60)) timekey,
SUM(count) AS count,
MAX(total_count) as total_count
FROM db.table
WHERE
timestamp BETWEEN {{ start_date }}
AND {{ end_date }}
AND user = {{ user_id }}
GROUP BY
timekey
ORDER BY
timekey
Result of the above query:
user timestamp count total_count
xyz 01-01-2020 00:15:00 45 45
xyz 01-01-2020 00:30:00 23 68
xyz 01-01-2020 00:55:00 32 100
How can I add the missing timestamps to the above query, filling count with zeros and total_count with the previous non-null value?

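Use GENERATE_TIMESTAMP_ARRAY() to generate every 5-minute bucket and LEFT JOIN the table to it. COALESCE turns empty buckets into zeros, and LAST_VALUE(... IGNORE NULLS) carries the previous total_count forward:
SELECT ts,
COALESCE(SUM(t.count), 0) AS count,
-- carry the last non-null total forward; 0 before the first event
COALESCE(LAST_VALUE(MAX(t.total_count) IGNORE NULLS) OVER (ORDER BY ts), 0) AS total_count
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY( {{start_date}}, {{end_date}}, INTERVAL 5 minute)) ts LEFT JOIN
db.table t
ON t.timestamp >= ts AND
t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 minute) AND
t.user = {{ user_id }}
GROUP BY ts
ORDER BY ts;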
If you need to filter the table first (for example, to prune partitions on timestamp), you can slightly modify the query:
SELECT ts,
COALESCE(SUM(t.count), 0) AS count,
COALESCE(LAST_VALUE(MAX(t.total_count) IGNORE NULLS) OVER (ORDER BY ts), 0) AS total_count
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY( {{start_date}}, {{end_date}}, INTERVAL 5 minute)) ts LEFT JOIN
(SELECT t.*
FROM db.table t
WHERE timestamp BETWEEN {{ start_date }} AND {{ end_date }}
) t
ON t.timestamp >= ts AND
t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 minute) AND
t.user = {{ user_id }}
GROUP BY ts
ORDER BY ts;
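The gap-filling idea can be tried locally. The sketch below is not BigQuery: it assumes SQLite (through Python's sqlite3 module), builds the bucket list with a recursive CTE instead of GENERATE_TIMESTAMP_ARRAY(), and uses a running MAX() as the carry-forward, which only works because total_count never decreases. The table name events and the columns user_id and cnt are made up for the example. Buckets are labelled by their start time, as in the queries above, so they are shifted by one bucket relative to the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id TEXT, ts TEXT, cnt INTEGER, total_count INTEGER)"
)
# Sample rows from the question, with timestamps rewritten as ISO strings.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("xyz", "2020-01-01 00:12:00", 45, 45),
        ("xyz", "2020-01-01 00:27:00", 12, 57),
        ("xyz", "2020-01-01 00:29:00", 11, 68),
        ("xyz", "2020-01-01 00:53:00", 32, 100),
    ],
)

QUERY = """
WITH RECURSIVE buckets(ts) AS (
    -- one row per 5-minute bucket start, 00:00 through 00:55
    SELECT '2020-01-01 00:00:00'
    UNION ALL
    SELECT datetime(ts, '+5 minutes') FROM buckets
    WHERE ts < '2020-01-01 00:55:00'
),
agg AS (
    SELECT b.ts,
           COALESCE(SUM(e.cnt), 0) AS cnt,      -- empty bucket -> 0
           MAX(e.total_count) AS total_count    -- empty bucket -> NULL
    FROM buckets b
    LEFT JOIN events e
           ON e.ts >= b.ts
          AND e.ts < datetime(b.ts, '+5 minutes')
          AND e.user_id = 'xyz'
    GROUP BY b.ts
)
SELECT ts,
       cnt,
       -- running MAX ignores NULLs, so it carries the last total forward
       COALESCE(MAX(total_count) OVER (ORDER BY ts), 0) AS total_count
FROM agg
ORDER BY ts
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If total_count could decrease, the running MAX() would no longer be a valid substitute for LAST_VALUE(... IGNORE NULLS).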

Related

(bigquery) how number of hours event is happening within multiple dates

So my data looks like this:
DATE TEMPERATURE
2012-01-13 23:15:00 UTC 0
2012-01-14 01:35:00 UTC 5
2012-01-14 02:15:00 UTC 6
2012-01-14 03:15:00 UTC 8
2012-01-14 04:15:00 UTC 0
2012-01-14 04:55:00 UTC 0
2012-01-14 05:15:00 UTC -2
2012-01-14 05:35:00 UTC 0
I am trying to calculate the amount of time a zip code's temperature stays at 0 or below on any given day. On the 13th, it only happens for a very short amount of time, so we don't really care. I want to know how to calculate the number of minutes this happens on the 14th, since it looks like a significantly (and consistently) cold day.
I want the query to add two more columns.
The first column would be the time difference between consecutive rows. So row 3 - row 2 = 40 mins and row 4 - row 3 = 60 mins.
The second column would be a running total, for a whole day, of the minutes the temperature has been at 0 or below. Here rows 2-4 would be ignored. From rows 5-8, the total time that the temperature was 0 or below would be about 90 mins.
It should end up looking like this:
DATE TEMPERATURE MINUTES_DIFFERENCE TOTAL_MINUTES
2012-01-13 23:15:00 UTC 0 0 0
2012-01-14 01:35:00 UTC 5 140 0
2012-01-14 02:15:00 UTC 6 40 0
2012-01-14 03:15:00 UTC 8 60 0
2012-01-14 04:15:00 UTC 0 60 60
2012-01-14 04:55:00 UTC 0 40 100
2012-01-14 05:15:00 UTC -2 20 120
2012-01-14 05:35:00 UTC 0 20 140
Use the query below:
select *,
sum(minutes_difference) over(order by date) total_minutes
from (
select *,
ifnull(timestamp_diff(timestamp(date), lag(timestamp(date)) over(order by date), minute), 0) as minutes_difference
from your_table
)
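Applied to the sample data in your question, total_minutes becomes a running sum of minutes_difference over all rows.
Update for the updated question (count only the minutes where the temperature is 0 or below, restarting the total whenever the temperature crosses 0):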
select * except(new_grp, grp),
sum(if(temperature > 0, 0, minutes_difference)) over(partition by grp order by date) total_minutes
from (
select *, countif(new_grp) over(order by date) as grp
from (
select *,
ifnull(timestamp_diff(timestamp(date), lag(timestamp(date)) over(order by date), minute), 0) as minutes_difference,
ifnull(((temperature <= 0) and (lag(temperature) over(order by date) > 0)) or
((temperature > 0) and (lag(temperature) over(order by date) <= 0)), true) as new_grp
from your_table
)
)
On the sample data, total_minutes is 0 for the first four rows and then 60, 100, 120, 140 for the last four.
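The grouping trick in the updated query (flag each row where the temperature crosses 0, turn the flags into a group number with a running count, then sum within each group) can be checked outside BigQuery. This is a rough sketch assuming SQLite through Python's sqlite3 module; the table name readings and column name ts are invented for the example, and the " UTC" suffix is dropped because SQLite's date functions do not parse it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, temperature INTEGER)")
# Sample rows from the question (UTC suffix removed).
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2012-01-13 23:15:00", 0),
        ("2012-01-14 01:35:00", 5),
        ("2012-01-14 02:15:00", 6),
        ("2012-01-14 03:15:00", 8),
        ("2012-01-14 04:15:00", 0),
        ("2012-01-14 04:55:00", 0),
        ("2012-01-14 05:15:00", -2),
        ("2012-01-14 05:35:00", 0),
    ],
)

QUERY = """
WITH diffs AS (
    SELECT ts, temperature,
           -- minutes since the previous row; 0 for the first row
           COALESCE(
               (CAST(strftime('%s', ts) AS INTEGER)
                - CAST(strftime('%s', LAG(ts) OVER (ORDER BY ts)) AS INTEGER)) / 60,
               0) AS minutes_difference,
           LAG(temperature) OVER (ORDER BY ts) AS prev_temp
    FROM readings
),
flagged AS (
    SELECT *,
           -- 1 when this row starts a new run (first row, or sign change)
           CASE WHEN prev_temp IS NULL
                  OR (temperature <= 0) <> (prev_temp <= 0)
                THEN 1 ELSE 0 END AS new_grp
    FROM diffs
),
grouped AS (
    -- running count of run starts = run number
    SELECT *, SUM(new_grp) OVER (ORDER BY ts) AS grp FROM flagged
)
SELECT ts, temperature, minutes_difference,
       SUM(CASE WHEN temperature > 0 THEN 0 ELSE minutes_difference END)
           OVER (PARTITION BY grp ORDER BY ts) AS total_minutes
FROM grouped
ORDER BY ts
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the total restarts with every run, a warm reading in the middle of the day resets the count; summing per calendar day instead would need the date in the PARTITION BY.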

Selecting first element in Group by object Postgres

I have the following table and I want to get the specific Amount per Loan_ID that corresponds to the earliest observation with dpd greater than or equal to 10 in each month.
Loan_ID date dpd Amount
1 1/1/2017 1 55
1 1/2/2017 2 100
1 1/3/2017 3 5000
1 1/4/2017 5 6000
1 1/5/2017 10 50000
1 1/6/2017 15 50001
1 1/9/2017 31 50004
1 1/10/2017 55 50005
1 1/11/2017 59 50006
1 1/12/2017 65 50007
1 1/13/2017 70 80000
1 1/20/2017 85 900000
1 1/29/2017 92 100000
1 1/30/2017 93 10000
2 1/1/2017 0 522
2 1/2/2017 8 5444
2 1/3/2017 12 8784
2 1/6/2017 15 6221
2 1/12/2017 18 2220
2 1/13/2017 20 177
2 1/29/2017 35 5151
2 1/30/2017 60 40000
2 1/31/2017 61 5500
The expected output:
Loan_ID Month Amount
1 1 50000
2 1 8784
SELECT DISTINCT ON ("Loan_ID", date_trunc('month', "date"))
"Loan_ID",
date_trunc('month', "date")::date as month,
"Amount"
FROM
loans
WHERE
dpd >= 10
ORDER BY
"Loan_ID",
date_trunc('month', "date"),
"date"
;
Returns:
Loan_ID month Amount
1 2017-01-01 50000
2 2017-01-01 8784
You can find test case in db<>fiddle
Hmmm . . . if you want the amount per month and the first date that matches the condition, then you want conditional aggregation:
select loan_id, date_trunc('month', date) as mon,
sum(dpd),
min(case when dpd >= 10 then dpd end) as first_dpd_10
from t
group by loan_id, mon;
Edit: Based on your comment, you can use distinct on:
select distinct on (loan_id, date_trunc('month', date)) t.*
from t
where dpd >= 10
order by loan_id, date_trunc('month', date), date;
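DISTINCT ON is specific to Postgres. As a portable alternative (a swapped-in technique, not what the answers above use), the same "first row per group" result can be sketched with ROW_NUMBER(). The example assumes SQLite through Python's sqlite3 module, a hypothetical loans table with an obs_date column, and only a subset of the question's rows, with dates rewritten in ISO format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (loan_id INTEGER, obs_date TEXT, dpd INTEGER, amount INTEGER)"
)
# A subset of the question's rows around the first dpd >= 10 of each loan.
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        (1, "2017-01-04", 5, 6000),
        (1, "2017-01-05", 10, 50000),
        (1, "2017-01-06", 15, 50001),
        (2, "2017-01-02", 8, 5444),
        (2, "2017-01-03", 12, 8784),
        (2, "2017-01-06", 15, 6221),
    ],
)

QUERY = """
SELECT loan_id, month, amount
FROM (
    SELECT loan_id,
           strftime('%Y-%m', obs_date) AS month,
           amount,
           -- number the qualifying rows of each loan and month, earliest first
           ROW_NUMBER() OVER (
               PARTITION BY loan_id, strftime('%Y-%m', obs_date)
               ORDER BY obs_date
           ) AS rn
    FROM loans
    WHERE dpd >= 10
)
WHERE rn = 1
ORDER BY loan_id, month
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The WHERE dpd >= 10 filter has to sit inside the subquery, before the numbering, so that row 1 is the earliest qualifying observation rather than the earliest observation overall.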

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
db<>fiddle
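The lead()-then-median logic can be reproduced on the sample data without Oracle. The sketch below assumes SQLite through Python's sqlite3 module for the lead() step and Python's statistics.median for the aggregate, since SQLite has no built-in median(). The table name balances is made up, only the columns the calculation needs are loaded, and dates are rewritten in ISO format.

```python
import sqlite3
from collections import defaultdict
from statistics import median

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (id INTEGER, ts TEXT, running_amount REAL)")
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?)",
    [
        (10, "2019-06-27 14:30", 100), (10, "2019-06-29 15:26", 0),
        (10, "2019-07-03 01:56", 83), (10, "2019-07-04 17:53", 98),
        (10, "2019-07-05 15:09", 0), (10, "2019-07-05 15:53", 98.98),
        (10, "2019-07-05 19:54", 0), (10, "2019-07-07 01:36", 90.97),
        (10, "2019-07-07 13:02", 0), (10, "2019-07-07 16:32", 39.88),
        (10, "2019-07-08 13:41", 89.88),
        (20, "2019-01-08 09:03", 890.97), (20, "2019-01-09 14:47", 799.88),
        (20, "2019-01-09 14:53", 899.88), (20, "2019-01-09 14:59", 500.88),
        (20, "2019-01-09 18:24", 811.88), (20, "2019-01-09 23:25", 861.88),
        (20, "2019-01-10 16:18", 0), (20, "2019-01-12 16:46", 894.49),
        (20, "2019-01-25 05:40", 23.44),
    ],
)

QUERY = """
SELECT id, diff_days
FROM (
    SELECT id, running_amount,
           -- days until the next transaction of the same id
           (CAST(strftime('%s', LEAD(ts) OVER (PARTITION BY id ORDER BY ts)) AS INTEGER)
            - CAST(strftime('%s', ts) AS INTEGER)) / 86400.0 AS diff_days
    FROM balances
)
WHERE running_amount = 0 AND diff_days IS NOT NULL
"""

# Collect the gaps per id, then take the median in Python.
gaps = defaultdict(list)
for id_, diff_days in conn.execute(QUERY):
    gaps[id_].append(diff_days)

medians = {id_: median(values) for id_, values in gaps.items()}
for id_, value in sorted(medians.items()):
    print(id_, round(value, 6), round(value))
```

As in the Oracle query, the lead() has to be computed over all rows and filtered afterwards; filtering on running_amount = 0 first would measure the gap between consecutive zeros instead.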

My SQL Query is working on one date, but I want start date to end date

I am using SQL Server 2005
I have two tables:
CheckInOut
TR BadgeNum USERID Dated Time CHECKTYPE
------- --------- ------ ----------------------- ----------------------- ----------
2337334 4 1 2018-04-01 00:00:00.000 2018-04-14 10:10:58.000 I
2337334 4 1 2018-04-01 00:00:00.000 2018-04-14 18:10:00.000 O
2337334 4 1 2018-04-02 00:00:00.000 2018-04-14 10:00:10.000 I
2337335 4 1 2018-04-02 00:00:00.000 2018-04-14 18:14:27.000 O
2337336 4 1 2018-04-03 00:00:00.000 2018-04-14 10:22:10.000 I
2337334 4 1 2018-04-03 00:00:00.000 2018-04-14 18:03:11.000 O
2337337 44 5 2018-04-01 00:00:00.000 2018-04-14 09:27:03.000 I
2337337 44 5 2018-04-01 00:00:00.000 2018-04-14 18:27:42.000 O
2337337 44 5 2018-04-02 00:00:00.000 2018-04-14 10:00:50.000 I
2337337 44 5 2018-04-02 00:00:00.000 2018-04-14 18:02:25.000 O
2337337 44 5 2018-04-03 00:00:00.000 2018-04-14 08:58:36.000 I
2337337 44 5 2018-04-03 00:00:00.000 2018-04-14 18:12:18.000 O
UserInfo
Tr UserID BadgeNumber Name
----- ------- ----------- --------------
13652 44 5 SAMIA NAZ
13653 4 1 Waqar Yousufzai
I need to calculate presence hours for each day for each user. My query below works fine for a single day, but I need to calculate for a given date range. How do I get the expected result?
Select isnull(max(ch.userid), 0)As 'ID'
,isnull(max(ch.badgenum), 0)as 'Badge#'
,isnull(max(convert(Char(10), ch.dated, 103)), '00:00')as 'Date'
,isnull(max(ui.name),'Empty')as 'Name'
,isnull(min(convert(VARCHAR(26), ch.time, 108)), '00:00') as 'Time In'
,case when min(ch.time) = max(ch.time) then '' else isnull(max(convert(VARCHAR(26), ch.time, 108)), '00:00') end as 'TimeOut'
,case when min(ch.time) = max(ch.time) then 'Absent' else 'Present' end as 'Status'
,isnull(CONVERT(varchar(3),DATEDIFF(minute,min(ch.time), max(ch.time))/60) + ' hrs and ' +
RIGHT('0' + CONVERT(varchar(2),DATEDIFF(minute,min(ch.time),max(ch.time))%60),2) + 'Min' , 0) as 'Total Hrs'
From CHECKINOUT ch left Join userinfo ui on ch.badgenum = ui.badgenumber
Where ch.Dated between '2018-04-01' and '2018-04-03' GROUP BY ch.badgenum
Query result
ID Badge# Date Name Time In TimeOut Status Total Hrs
--- ------ ---------- --------------- -------- ---------- -------- -----------------
4 1 03/04/2018 Waqar Yousufzai 11:33:34 18:24:23 Present 30 hrs and 14Min
82 3 03/04/2018 TANVEER ANSARI 09:37:14 19:18:22 Present 32 hrs and 37Min
13 4 03/04/2018 07:19:26 09:30:17 Present 21 hrs and 49Min
44 5 03/04/2018 SAMIA NAZ 08:53:15 18:25:21 Present 33 hrs and 24Min
28 7 03/04/2018 Anees Ahmad 08:34:57 22:00:38 Present 61 hrs and 25Min
46 8 03/04/2018 Shazia - OT 08:10:41 16:15:05 Present 32 hrs and 01Min
Expected result
ID Badge# Date Name Time In TimeOut Status Total Hrs
--- ------ ---------- --------------- -------- ---------- -------- -----------------
4 1 01/04/2018 Waqar Yousufzai 10:30:00 18:00:00 Present 7 hrs and 30Min
4 1 02/04/2018 Waqar Yousufzai 10:30:00 18:00:00 Present 7 hrs and 30Min
4 1 03/04/2018 Waqar Yousufzai 10:00:00 18:00:00 Present 8 hrs and 00Min
44 5 01/04/2018 SAMIA 08:00:00 18:00:00 Present 10 hrs and 00Min
44 5 02/04/2018 SAMIA 08:30:00 18:00:00 Present 9 hrs and 30Min
44 5 03/04/2018 SAMIA 08:00:00 18:00:00 Present 10 hrs and 00Min
Don't aggregate the date value; it must be part of the grouping. Get the time in and time out with conditional aggregation, then compute the total hours worked. Your query should be something like:
select
BadgeNum, USERID, Dated, Name
, right('0' + cast(datediff(mi, [in], [out]) / 60 as varchar(10)), 2) + ':'
+ right('0' + cast(datediff(mi, [in], [out]) % 60 as varchar(10)), 2) as [Total Hrs]
from (
select
-- SQL Server 2005 has no date type, so strip the time part with dateadd/datediff
ch.BadgeNum, ch.USERID, dated = dateadd(day, datediff(day, 0, ch.Dated), 0), ui.Name
-- first check-in and last check-out of the day
, [in] = min(case when ch.CHECKTYPE = 'I' then ch.Time end)
, [out] = max(case when ch.CHECKTYPE = 'O' then ch.Time end)
from
CheckInOut ch
left join UserInfo ui on ch.USERID = ui.badgenumber
where
ch.Dated >= '20180401'
and ch.Dated < '20180404'
group by ch.BadgeNum, ch.USERID, dateadd(day, datediff(day, 0, ch.Dated), 0), ui.Name
) t
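The per-day grouping with conditional aggregation can be tried on the sample rows. This is a minimal sketch assuming SQLite through Python's sqlite3 module rather than SQL Server; the column name check_time is a stand-in for Time, and the minutes here are whole elapsed minutes computed from seconds, which can differ by one from T-SQL's datediff(mi, ...) because that function counts minute boundaries crossed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkinout "
    "(badgenum INTEGER, userid INTEGER, dated TEXT, check_time TEXT, checktype TEXT)"
)
# Sample rows from the question for both badges.
conn.executemany(
    "INSERT INTO checkinout VALUES (?, ?, ?, ?, ?)",
    [
        (4, 1, "2018-04-01", "2018-04-14 10:10:58", "I"),
        (4, 1, "2018-04-01", "2018-04-14 18:10:00", "O"),
        (4, 1, "2018-04-02", "2018-04-14 10:00:10", "I"),
        (4, 1, "2018-04-02", "2018-04-14 18:14:27", "O"),
        (4, 1, "2018-04-03", "2018-04-14 10:22:10", "I"),
        (4, 1, "2018-04-03", "2018-04-14 18:03:11", "O"),
        (44, 5, "2018-04-01", "2018-04-14 09:27:03", "I"),
        (44, 5, "2018-04-01", "2018-04-14 18:27:42", "O"),
    ],
)

QUERY = """
SELECT badgenum, dated, time_in, time_out,
       (CAST(strftime('%s', time_out) AS INTEGER)
        - CAST(strftime('%s', time_in) AS INTEGER)) / 60 AS minutes
FROM (
    SELECT badgenum, dated,
           -- conditional aggregation: first check-in, last check-out per day
           MIN(CASE WHEN checktype = 'I' THEN check_time END) AS time_in,
           MAX(CASE WHEN checktype = 'O' THEN check_time END) AS time_out
    FROM checkinout
    WHERE dated >= '2018-04-01' AND dated < '2018-04-04'
    GROUP BY badgenum, userid, dated   -- the date is part of the grouping
)
ORDER BY badgenum, dated
"""

report = [
    (badge, dated, f"{minutes // 60} hrs and {minutes % 60:02d} Min")
    for badge, dated, _time_in, _time_out, minutes in conn.execute(QUERY)
]
for line in report:
    print(line)
```

Grouping by the date is what turns the single row per badge in the original query into one row per badge per day.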

Oracle, SQL, how to get intervals between dates

I need help with a problem. Actually, I do not know if it will be possible to solve it directly in SQL.
I have a list of works. Each work has a start date and ending date, with this format
YYYY/MM/DD HH24:MI:SS
I need to calculate the cost of those jobs, the hour price depends on the time intervals in which the work has been done:
Night time: 22:00 to 6:00, for example: 20 €/h
Normal time: the rest 17 €/h
So, if I have a sample like this:
wo start end
21 2017/11/16 21:25:00 2017/11/16 22:55:00
22 2017/11/17 05:45:00 2017/11/17 07:05:00
23 2017/11/18 23:00:00 2017/11/19 1:10:00
24 2017/11/17 18:00:00 2017/11/17 19:00:00
I would need to calculate, for each job, the minutes that fall between 22:00 and 6:00 and the remaining minutes, and multiply each by its corresponding price:
wo rest(minutes) night(minutes)
21 35 55
22 65 15
23 0 130
24 60 0
Thanks in advance for your help.
Heh. If you really wish it :)
A fifth record (starting at 2016-10-30) has been added for testing purposes.
with
src as (select timestamp '2017-11-16 21:25:00' b, timestamp '2017-11-16 22:55:00' f from dual union all
select timestamp '2017-11-17 05:45:00' b, timestamp '2017-11-17 07:05:00' f from dual union all
select timestamp '2017-11-18 23:00:00' b, timestamp '2017-11-19 1:10:00' f from dual union all
select timestamp '2017-11-17 18:00:00' b, timestamp '2017-11-17 19:00:00' f from dual union all
select timestamp '2016-10-30 00:00:00' b, timestamp '2016-11-03 23:00:00' f from dual),
srd as (select b, f, f - b t from src),
mmm as (select min(trunc(b)) b, max(trunc(f)) f from src),
rws as (select b + 6/24 + rownum - 1 b, b + 22/24 + rownum - 1 f from mmm connect by level <= f - b + 1),
mix as (select s.b, s.f, s.t, r.b rb, r.f rf from srd s, rws r where s.f >= r.b (+) and r.f (+) >= s.b),
clc as (select b, f, t, nvl(numtodsinterval(sum((least(f, rf) + 0) - (greatest(b, rb) + 0)), 'DAY'), interval '0' second) d from mix group by b, f, t)
select
to_char(b, 'dd.mm.yyyy hh24:mi') as "datetime begin",
to_char(f, 'dd.mm.yyyy hh24:mi') as "datetime finish",
cast(t as interval day to second(0)) as "total time",
cast(d as interval day to second(0)) as "daytime",
cast(t - d as interval day to second(0)) as "nighttime"
from
clc
order by
1, 2;
datetime begin datetime finish total time daytime nighttime
------------------ ------------------ -------------- -------------- --------------
16.11.2017 21:25 16.11.2017 22:55 +00 01:30:00 +00 00:35:00 +00 00:55:00
17.11.2017 05:45 17.11.2017 07:05 +00 01:20:00 +00 01:05:00 +00 00:15:00
17.11.2017 18:00 17.11.2017 19:00 +00 01:00:00 +00 01:00:00 +00 00:00:00
18.11.2017 23:00 19.11.2017 01:10 +00 02:10:00 +00 00:00:00 +00 02:10:00
30.10.2016 00:00 03.11.2016 23:00 +04 23:00:00 +03 08:00:00 +01 15:00:00
A different approach is more brute-force, but it lets you separate the interval configuration from the reporting.
It goes in three steps:
1) Define the rate type for each minute of the day (change the granularity if required)
create table day_config as
with helper as (
select
rownum -1 minute_id
from dual connect by level <= 24*60),
helper2 as (
select
minute_id,
trunc(minute_id/60) hour_no,
mod(minute_id,60) minute_no
from helper)
select
minute_id,hour_no, minute_no,
case when hour_no >= 22 or hour_no <= 5 then 0 else 1 end rate_id
from helper2;
select * from day_config order by minute_id;
MINUTE_ID HOUR_NO MINUTE_NO RATE_ID
---------- ---------- ---------- ----------
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
4 0 4 0
5 0 5 0
6 0 6 0
7 0 7 0
8 0 8 0
9 0 9 0
Here rate_id 0 means night and rate_id 1 means day.
The advantage is that you can introduce as many rate types as required.
2) Expand the configuration over the required interval, e.g. a whole year.
Now we have, for each minute of the year, the rate that is to be applied.
create or replace view year_config as
select my_date + MINUTE_ID / (24*60) minute_ts , MINUTE_ID, HOUR_NO, MINUTE_NO, RATE_ID from day_config
cross join
(select DATE '2017-01-01' + rownum -1 as my_date from dual connect by level <= 365)
order by 1,2;
select * from (
select * from year_config
order by 1)
where rownum <= 5;
MINUTE_TS MINUTE_ID HOUR_NO MINUTE_NO RATE_ID
------------------- ---------- ---------- ---------- ----------
01-01-2017 00:00:00 0 0 0 0
01-01-2017 00:01:00 1 0 1 0
01-01-2017 00:02:00 2 0 2 0
01-01-2017 00:03:00 3 0 3 0
01-01-2017 00:04:00 4 0 4 0
3) The reporting is then as easy as joining to the config view, constraining to the (half-open) work interval, and grouping by RATE_ID.
select b, f,RATE_ID, count(*) minute_cnt
from tst join year_config c on c.MINUTE_TS >= tst.b and c.MINUTE_TS < tst.f
group by b, f,RATE_ID
order by b, f,RATE_ID;
B F RATE_ID MINUTE_CNT
------------------- ------------------- ---------- ----------
16-11-2017 21:25:00 16-11-2017 22:55:00 0 55
16-11-2017 21:25:00 16-11-2017 22:55:00 1 35
17-11-2017 05:45:00 17-11-2017 07:05:00 0 15
17-11-2017 05:45:00 17-11-2017 07:05:00 1 65
17-11-2017 18:00:00 17-11-2017 19:00:00 1 60
18-11-2017 23:00:00 19-11-2017 01:10:00 0 130
The easiest way is probably to generate all minutes worked in a recursive WITH clause and then see which time range each minute falls in. As Oracle unfortunately doesn't have a TIME datatype, we'll have to work with time strings ('00:00' through '23:59').
with shifts as
(
select 'night' as shift, '00:00' as starttime, '05:59' as endtime, 20 as cost from dual
union all
select 'normal' as shift, '06:00' as starttime, '21:59' as endtime, 17 as cost from dual
union all
select 'night' as shift, '22:00' as starttime, '23:59' as endtime, 20 as cost from dual
)
, workminutes(wo, workminute, thetime, endtime) as
(
select wo, to_char(starttime, 'hh24:mi') as workminute, starttime as thetime, endtime
from mytable
union all
select
wo,
to_char(thetime + interval '1' minute, 'hh24:mi') as workminute,
thetime + interval '1' minute as thetime,
endtime
from workminutes
where thetime + interval '1' minute < endtime
)
select
wo,
count(case when s.shift = 'normal' then 1 end) as normal_time,
coalesce(sum(case when m.workminute between '06:00' and '21:59' then s.cost end), 0)
as normal_cost,
count(case when s.shift = 'night' then 1 end) as night_time,
coalesce(sum(case when m.workminute not between '06:00' and '21:59' then s.cost end), 0)
as night_cost,
count(*) as total_time,
coalesce(sum(s.cost), 0)
as total_cost
from workminutes m
join shifts s on m.workminute between s.starttime and s.endtime
group by wo
order by wo;
Output:
WO NORMAL_TIME NORMAL_COST NIGHT_TIME NIGHT_COST TOTAL_TIME TOTAL_COST
21 35 595 55 1100 90 1695
22 65 1105 15 300 80 1405
23 0 0 130 2600 130 2600
24 60 1020 0 0 60 1020
25 4800 81600 2340 46800 7140 128400
(This query looks a lot nicer of course, if you have a real shifts table and don't have to make one up on-the-fly. Also, you may not need all those seven columns I have in my result.)
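The minute-by-minute idea can be tried on the four work orders from the question. The sketch below assumes SQLite through Python's sqlite3 module instead of Oracle, a hypothetical works table with start_ts and end_ts columns, and classifies each minute by its hour rather than joining to a shifts table. The cost line converts minutes to hours using the example rates from the question (17 €/h normal, 20 €/h night).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (wo INTEGER, start_ts TEXT, end_ts TEXT)")
# Work orders from the question, as zero-padded ISO timestamps.
conn.executemany(
    "INSERT INTO works VALUES (?, ?, ?)",
    [
        (21, "2017-11-16 21:25:00", "2017-11-16 22:55:00"),
        (22, "2017-11-17 05:45:00", "2017-11-17 07:05:00"),
        (23, "2017-11-18 23:00:00", "2017-11-19 01:10:00"),
        (24, "2017-11-17 18:00:00", "2017-11-17 19:00:00"),
    ],
)

QUERY = """
WITH RECURSIVE minutes(wo, ts, end_ts) AS (
    -- one row per minute worked, over the half-open interval [start, end)
    SELECT wo, start_ts, end_ts FROM works
    UNION ALL
    SELECT wo, datetime(ts, '+1 minute'), end_ts
    FROM minutes
    WHERE datetime(ts, '+1 minute') < end_ts
)
SELECT wo,
       SUM(CASE WHEN CAST(strftime('%H', ts) AS INTEGER) BETWEEN 6 AND 21
                THEN 1 ELSE 0 END) AS normal_minutes,
       SUM(CASE WHEN CAST(strftime('%H', ts) AS INTEGER) BETWEEN 6 AND 21
                THEN 0 ELSE 1 END) AS night_minutes
FROM minutes
GROUP BY wo
ORDER BY wo
"""

NORMAL_RATE = 17  # EUR per hour, example rate from the question
NIGHT_RATE = 20   # EUR per hour, example rate from the question

result = {}
for wo, normal, night in conn.execute(QUERY):
    cost = round(normal / 60 * NORMAL_RATE + night / 60 * NIGHT_RATE, 2)
    result[wo] = (normal, night, cost)
    print(wo, normal, night, cost)
```

Generating one row per minute is simple but grows with the length of the intervals, so for long jobs the interval-overlap arithmetic of the first answer is the cheaper option.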