Extract the number of daily users from table - sql

Given a start date and end date for every user I would like to count the daily number of users on the platform:
ID | START      | END
---+------------+-----------
1  | 2022-12-01 | 2022-12-03
2  | 2022-12-01 | 2022-12-01
I want to get an output like this:
DATE       | NUMBER
-----------+-------
2022-12-01 | 2
2022-12-02 | 1
2022-12-03 | 1

Generate the list of all dates with generate_series and, for each date, count the users whose start/end range contains it.
with the_table(id, dstart, dend) as
(
values
(1, '2022-12-01'::date, '2022-12-03'::date),
(2, '2022-12-01', '2022-12-01')
)
select d::date as "DATE",
(select count(*) from the_table where d between dstart and dend) as "NUMBER"
from generate_series('2022-12-01'::date,'2022-12-03'::date,interval '1 day') as d;
Alternatively, expand each user's range into one row per day and group by day:
with the_table(id,dstart,dend) as
(
values
(1, '2022-12-01'::date, '2022-12-03'::date),
(2, '2022-12-01', '2022-12-01')
),
d (id, dlogged) as
(
select id, generate_series(dstart,dend,interval '1 day')::date
from the_table
)
select dlogged as "DATE", count(*) as "NUMBER"
from d group by dlogged;
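To sanity-check either query outside the database, the same counting logic can be sketched in plain Python. This is only an illustration: the daily_counts helper and the hard-coded rows are assumptions that mirror the sample data in the question, not part of the original answer.

```python
from collections import Counter
from datetime import date, timedelta

def daily_counts(rows):
    """Count active users per day; rows are (id, start, end) with end inclusive."""
    counts = Counter()
    for _id, start, end in rows:
        day = start
        while day <= end:
            counts[day] += 1
            day += timedelta(days=1)
    return dict(sorted(counts.items()))

# Hypothetical rows copied from the question's sample table.
rows = [
    (1, date(2022, 12, 1), date(2022, 12, 3)),
    (2, date(2022, 12, 1), date(2022, 12, 1)),
]

for day, n in daily_counts(rows).items():
    print(day, n)
```

Like the grouped query, this sketch only reports days on which at least one user was active; the generate_series query over a fixed date range also returns days with a count of zero.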

Add missing month in result with values from previous month

I have a result set with the month as the first column. Some months are missing from the result. For each missing month, I need to add a copy of the previous month's record, up to the last month.
Current data:
Desired Output:
I have a query, but instead of filling in only the missing months, it takes every row into account and repeats it.
select
to_char(generate_series(date_trunc('MONTH',to_date(period,'YYYYMMDD')+interval '1' month),
date_trunc('MONTH',now()+interval '1' day),
interval '1' month) - interval '1 day','YYYYMMDD') as period,
name,age,salary,rating
from( values ('20201205','Alex',35,100,'A+'),
('20210110','Alex',35,110,'A'),
('20210512','Alex',35,999,'A+'),
('20210625','Jhon',20,175,'B-'),
('20210922','Jhon',20,200,'B+')) v (period,name,age,salary,rating) order by 2,3,4,5,1;
Output of this query:
Can someone help me get the desired output?
Regards!!
You can achieve this with a recursive cte like this:
with RECURSIVE ctetest as (SELECT * FROM (values ('2020-12-31'::date,'Alex',35,100,'A+'),
('2021-01-31'::date,'Alex',35,110,'A'),
('2021-05-31'::date,'Alex',35,999,'A+'),
('2021-06-30'::date,'Jhon',20,175,'B-'),
('2021-09-30'::date,'Jhon',20,200,'B+')) v (mth, emp, age, salary, rating)),
cte AS (
SELECT MIN(mth) AS mth, emp, age, salary, rating
FROM ctetest
GROUP BY emp, age, salary, rating
UNION
SELECT COALESCE(n.mth, (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date), COALESCE(n.emp, l.emp),
COALESCE(n.age, l.age), COALESCE(n.salary, l.salary), COALESCE(n.rating, l.rating)
FROM cte l
LEFT OUTER JOIN ctetest n ON n.mth = (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date
AND n.emp = l.emp
WHERE (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date <= (SELECT MAX(mth) FROM ctetest)
)
SELECT * FROM cte order by 2, 1;
Note that although ctetest is not itself recursive (it only supplies the test data), if any CTE among several is recursive, the RECURSIVE keyword must follow WITH.
You can use cross join lateral to fill the gaps and then union all with the original data.
WITH the_table (period, name, age, salary, rating) as ( values
('2020-12-01'::date, 'Alex', 35, 100, 'A+'),
('2021-01-01'::date, 'Alex', 35, 110, 'A'),
('2021-05-01'::date, 'Alex', 35, 999, 'A+'),
('2021-06-01'::date, 'Jhon', 20, 100, 'B-'),
('2021-09-01'::date, 'Jhon', 20, 200, 'B+')
),
t as (
select *, coalesce(
lead(period) over (partition by name order by period) - interval 'P1M',
max(period) over ()
) last_period
from the_table
)
SELECT lat::date period, name, age, salary, rating
from t
cross join lateral generate_series
(period + interval 'P1M', last_period, interval 'P1M') lat
UNION ALL
SELECT * from the_table
ORDER BY name, period;
Please note that storing a date in a YYYYMMDD text or integer column is sub-optimal. Consider reviewing your data design and using the date data type instead. You can still present it in YYYYMMDD form if necessary.
period     | name | age | salary | rating
-----------+------+-----+--------+-------
2020-12-01 | Alex | 35  | 100    | A+
2021-01-01 | Alex | 35  | 110    | A
2021-02-01 | Alex | 35  | 110    | A
2021-03-01 | Alex | 35  | 110    | A
2021-04-01 | Alex | 35  | 110    | A
2021-05-01 | Alex | 35  | 999    | A+
2021-06-01 | Alex | 35  | 999    | A+
2021-07-01 | Alex | 35  | 999    | A+
2021-08-01 | Alex | 35  | 999    | A+
2021-09-01 | Alex | 35  | 999    | A+
2021-06-01 | Jhon | 20  | 100    | B-
2021-07-01 | Jhon | 20  | 100    | B-
2021-08-01 | Jhon | 20  | 100    | B-
2021-09-01 | Jhon | 20  | 200    | B+
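The gap-filling rule used by the lateral-join query (carry each row forward month by month until the next row for the same name, or until the overall latest period) can be checked with a small Python sketch. The helper names add_month and fill_gaps are hypothetical, and the rows are copied from the_table in the query; periods are assumed to be first-of-month dates.

```python
from datetime import date

def add_month(d):
    """First day of the month after d (d is assumed to be a first-of-month date)."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)

def fill_gaps(rows):
    """rows: (period, name, age, salary, rating). Returns rows plus carried-forward months."""
    last = max(r[0] for r in rows)  # overall latest period, like max(period) over ()
    by_name = {}
    for r in sorted(rows, key=lambda r: (r[1], r[0])):
        by_name.setdefault(r[1], []).append(r)
    out = []
    for group in by_name.values():
        for i, row in enumerate(group):
            out.append(row)
            # Stop before the next row of the same name, or after the overall last period.
            stop = group[i + 1][0] if i + 1 < len(group) else add_month(last)
            period = add_month(row[0])
            while period < stop:
                out.append((period,) + row[1:])
                period = add_month(period)
    return out

rows = [
    (date(2020, 12, 1), "Alex", 35, 100, "A+"),
    (date(2021, 1, 1), "Alex", 35, 110, "A"),
    (date(2021, 5, 1), "Alex", 35, 999, "A+"),
    (date(2021, 6, 1), "Jhon", 20, 100, "B-"),
    (date(2021, 9, 1), "Jhon", 20, 200, "B+"),
]

for r in fill_gaps(rows):
    print(*r)
```

Because the fallback end point is the latest period over all names, Alex's last row is carried forward to 2021-09-01 even though Alex has no data after May.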

How to get minimum and maximum value by using partition in sql

My table looks like this:
type  value  prod  date
a     20     2     2019-07-08
a     20     3     2019-07-08
b     30     2     2019-07-08
b     35     1     2019-07-08
a     40     4     2019-07-09
a     20     4     2019-07-09
b     32     3     2019-07-09
b     31     3     2019-07-09
b     30     2     2019-07-09
b     33     2     2019-07-09
b     12     1     2019-07-10
b     23     1     2019-07-10
b     20     2     2019-07-10
b     22     2     2019-07-10
First, I want to get the result of prod / value as util for each type and date, but each result also needs to include the sums from the previous dates.
On top of that, I need to know the minimum and the maximum util for each type and date.
What I have done so far:
select *, t1.value / t1.prod as util
from (
select
type, date, sum(value) as value, sum(prod) as prod
from table1
where true
and event_date <= '2019-07-11'
group by type, date) t1
How can I get the minimum and the maximum util, given that the util calculation should also sum over the previous dates? I assume I need to use a partition, but I am not sure how.
Thanks in advance
I'm not sure whether this is what you are looking for. It gives you the running min, max, and sum of the value column, partitioned by type and ordered by date.
Check this:
drop table tmp_table10;
create table tmp_table10
(
type nvarchar(5) null,
value float null,
prod nvarchar(255) null,
date nvarchar(255) null
);
insert into tmp_table10
values('a', '20' ,2 , '2019-07-08'),
('a', '20' ,3 , '2019-07-08'),
('b', '30' ,2 , '2019-07-08'),
('b', '35' ,1 , '2019-07-08'),
('a', '40' ,4 , '2019-07-09'),
('a', '20' ,4 , '2019-07-09'),
('b', '32' ,3 , '2019-07-09'),
('b', '31' ,3 , '2019-07-09'),
('b', '30' ,2 , '2019-07-09'),
('b', '33' ,2 , '2019-07-09'),
('b', '12' ,1 , '2019-07-10'),
('b', '23' ,1 , '2019-07-10'),
('b', '20' ,2 , '2019-07-10'),
('b', '22' ,2 , '2019-07-10')
select
*
, max(value) over(partition by type order by date) maxValueByType
, min(value) over(partition by type order by date) minValueByType
, sum(value) over(partition by type order by date) sumValue
from tmp_table10
order by type, date
If I understand correctly, you want the cumulative min and max of util, where util is calculated like this:
select type, date, sum(value) / sum(prod) as util
from table1
where event_date <= '2019-07-11'
group by type, date;
Then you can use window functions:
select type, date, sum(value) / sum(prod) as util,
min(sum(value) / sum(prod)) over (partition by type order by date) as min_running_util,
max(sum(value) / sum(prod)) over (partition by type order by date) as max_running_util
from table1
where event_date <= '2019-07-11'
group by type, date;
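As a rough cross-check of what the windowed query computes, here is a Python sketch that groups the question's sample rows by type and date, derives util as sum(value) / sum(prod), and then takes the running min and max within each type. The running_util function is a hypothetical helper, and the sketch uses floating-point division, which may differ from the database if the columns are integers.

```python
from collections import defaultdict
from itertools import accumulate

# Sample rows (type, value, prod, date) copied from the question.
rows = [
    ("a", 20, 2, "2019-07-08"), ("a", 20, 3, "2019-07-08"),
    ("b", 30, 2, "2019-07-08"), ("b", 35, 1, "2019-07-08"),
    ("a", 40, 4, "2019-07-09"), ("a", 20, 4, "2019-07-09"),
    ("b", 32, 3, "2019-07-09"), ("b", 31, 3, "2019-07-09"),
    ("b", 30, 2, "2019-07-09"), ("b", 33, 2, "2019-07-09"),
    ("b", 12, 1, "2019-07-10"), ("b", 23, 1, "2019-07-10"),
    ("b", 20, 2, "2019-07-10"), ("b", 22, 2, "2019-07-10"),
]

def running_util(rows):
    """Return (type, date, util, running_min, running_max) tuples."""
    sums = defaultdict(lambda: [0, 0])
    for typ, value, prod, day in rows:  # GROUP BY type, date
        sums[(typ, day)][0] += value
        sums[(typ, day)][1] += prod
    by_type = defaultdict(list)
    for (typ, day), (v, p) in sorted(sums.items()):  # ORDER BY date within each type
        by_type[typ].append((day, v / p))
    out = []
    for typ, series in by_type.items():  # PARTITION BY type
        utils = [u for _, u in series]
        mins = accumulate(utils, min)
        maxs = accumulate(utils, max)
        for (day, u), lo, hi in zip(series, mins, maxs):
            out.append((typ, day, u, lo, hi))
    return out

for row in running_util(rows):
    print(row)
```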

Calculate total time without vacations in postgres

I have a database table that represents activities and for each activity, how long it took.
It looks something like this :
activity_id | name | status | start_date | end_date
=================================================================
1 | name1 | WIP | 2019-07-24 ... | 2019-07-24 ...
start_date and end_date are timestamps. I use a view with a column total_time that is defined like this:
date_part('day'::text,
COALESCE(sprint_activity.end_date::timestamp with time zone, CURRENT_TIMESTAMP)
- sprint_activity.start_date::timestamp with time zone
) + date_part('hour'::text,
COALESCE(sprint_activity.end_date::timestamp with time zone, CURRENT_TIMESTAMP)
- sprint_activity.start_date::timestamp with time zone
) / 24::double precision AS total_time
I would like to create a table for vacation or half day vacations that looks like:
date | work_percentage
=================================================
2019-07-24 | 0.4
2019-07-23 | 0.7
And then, I would like to calculate total_time in a way that uses this vacations table such that:
If a date is not in the column it's considered to have work_percentage==1
For every date that is in the table, reduce the relative percentage from the total_time query.
So let's take an example:
Activity - "Write report" started at 11-July-2019 14:00 and ended at 15-July-2019 19:00 - so the time diff is 4 days and 5 hours.
The 13th and 14th were a weekend, so I'd like to have a row in the vacations table that holds 2019-07-13 with work_percentage == 0, and the same for the 14th.
Deducting those vacations, the time diff would be 2 days and 5 hours as the 13th and 14th are not workdays.
Hope this example explains it better.
I think you can take this example and adapt it to your database.
DDL statements to test the script:
create table activities (
user_id int,
activity_id int,
name text,
status text,
start_date timestamp,
end_date timestamp
);
create table vacations (
user_id int,
date date,
work_percentage numeric
);
insert into activities
values
(1, 1, 'name1', 'WIP', timestamp'2019-07-20 10:00:00', timestamp'2019-07-25 8:00:00'),
(2, 2, 'name2', 'DONE', timestamp'2019-07-28 19:00:00', timestamp'2019-08-01 7:00:00'),
(1, 3, 'name3', 'DONE', timestamp'2019-07-21 12:00:00', timestamp'2019-07-21 15:00:00'),
(-1, 4, 'Write report', 'DONE', timestamp'2019-07-11 14:00:00', timestamp'2019-07-15 19:00:00');
insert into vacations
values
(1, date'2019-07-21', 0.5),
(1, date'2019-07-22', 0),
(1, date'2019-07-23', 0.25),
(2, date'2019-07-29', 0),
(2, date'2019-07-30', 0),
(-1, date'2019-07-13', 0),
(-1, date'2019-07-14', 0);
SQL script:
with
daily_activity as (
select
*,
date(
generate_series(
date(start_date),
date(end_date),
interval'1 day')
) as date_key
from
activities
),
raw_data as (
select
da.*,
v.work_percentage,
case
when date(start_date) = date(end_date)
then (end_date - start_date) * coalesce(work_percentage, 1)
when date(start_date) = date_key
then (date(start_date) + 1 - start_date) * coalesce(work_percentage, 1)
when date(end_date) = date_key
then (end_date - date(end_date)) * coalesce(work_percentage, 1)
else interval'24 hours' * coalesce(work_percentage, 1)
end as activity_coverage
from
daily_activity as da
left join vacations as v on da.user_id = v.user_id
and da.date_key = v.date
)
select
user_id,
activity_id,
name,
status,
start_date,
end_date,
justify_interval(sum(activity_coverage)) as total_activity_time
from
raw_data
group by
1, 2, 3, 4, 5, 6
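The per-day weighting in the script can be verified by hand with a short Python sketch: split the activity into calendar days, clip each day to the activity's start and end, and scale it by that day's work_percentage (defaulting to 1). The working_time function and the dictionary of vacations are assumptions made for illustration; the figures come from the "Write report" example.

```python
from datetime import datetime, date, time, timedelta

def working_time(start, end, work_pct):
    """Sum the per-day coverage between start and end, scaled by work_pct[day] (default 1)."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        # Clip the calendar day to the activity's own start and end.
        day_start = max(start, datetime.combine(day, time.min))
        day_end = min(end, datetime.combine(day + timedelta(days=1), time.min))
        total += (day_end - day_start) * work_pct.get(day, 1)
        day += timedelta(days=1)
    return total

# "Write report": 11 July 14:00 to 15 July 19:00, with the weekend marked as non-working.
vacations = {date(2019, 7, 13): 0, date(2019, 7, 14): 0}
total = working_time(datetime(2019, 7, 11, 14), datetime(2019, 7, 15, 19), vacations)
print(total)  # 10 h + 24 h + 0 + 0 + 19 h
```

For this example the result is 2 days and 5 hours, which matches the expectation stated in the question.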

Analytical function range window for max date interval

I am trying to get 10 minute interval data from latest date of each group or partition.
Pseudo code SQL:
Select
count(1) Over( partition by col1, col2, col3
Order by Col_Date Desc
Range Max(Col_Date) Between Max(Col_Date) - 10(24*60) ) col_upd
From
Table_1;
Values outside this range need to be assigned a number that marks them for deletion.
2014-01-05 01:20:00 -- Max date
2014-01-05 01:15:13
2014-01-05 01:12:13
2014-01-05 01:07:13 -- 1) these last two rows should be set for
2014-01-05 01:06:13 -- 2) delete or assign same id
Is there any analytical function way to approach this?
You haven't given table structures, but if I make up a dummy table like:
create table t42 (id number, grp_id number, dt date);
insert into t42 values (1, 1, timestamp '2014-01-05 01:20:00');
insert into t42 values (2, 1, timestamp '2014-01-05 01:15:13');
insert into t42 values (3, 1, timestamp '2014-01-05 01:12:13');
insert into t42 values (4, 1, timestamp '2014-01-05 01:07:13');
insert into t42 values (5, 1, timestamp '2014-01-05 01:06:13');
Then this will give you the age of each row in the group compared to its (analytic) max:
select grp_id, id, dt, max(dt) over (partition by grp_id) - dt as age
from t42
order by id;
    GRP_ID         ID DT                           AGE
---------- ---------- ------------------- ------------
         1          1 2014-01-05 01:20:00            0
         1          2 2014-01-05 01:15:13 .00332175926
         1          3 2014-01-05 01:12:13 .00540509259
         1          4 2014-01-05 01:07:13 .00887731481
         1          5 2014-01-05 01:06:13 .00957175926
And you can use that as an inner query to keep only the records that are more than 10 minutes older than the latest one in their group:
select grp_id, id, dt
from (
select grp_id, id, dt, max(dt) over (partition by grp_id) - dt as age
from t42
)
where age > (10*60)/(24*60*60)
order by id;
    GRP_ID         ID DT
---------- ---------- -------------------
         1          4 2014-01-05 01:07:13
         1          5 2014-01-05 01:06:13
You can then use those rows to delete or update as needed. It's not clear from your question whether your group/partition is already being calculated in an inner query; if so, you can use that instead of my t42 table (changing column names etc., of course).
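The "age relative to the group's latest row" idea can also be sketched in Python, which makes the 10-minute cutoff explicit instead of expressing it as a fraction of a day. The stale_rows helper is hypothetical, and the rows mirror the dummy t42 data.

```python
from datetime import datetime, timedelta

def stale_rows(rows, max_age=timedelta(minutes=10)):
    """rows: (id, grp_id, dt). Return rows older than max_age relative to their group's latest dt."""
    latest = {}
    for _id, grp, dt in rows:  # like max(dt) over (partition by grp_id)
        latest[grp] = max(latest.get(grp, dt), dt)
    return [r for r in rows if latest[r[1]] - r[2] > max_age]

rows = [
    (1, 1, datetime(2014, 1, 5, 1, 20, 0)),
    (2, 1, datetime(2014, 1, 5, 1, 15, 13)),
    (3, 1, datetime(2014, 1, 5, 1, 12, 13)),
    (4, 1, datetime(2014, 1, 5, 1, 7, 13)),
    (5, 1, datetime(2014, 1, 5, 1, 6, 13)),
]

for r in stale_rows(rows):
    print(r)
```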

Get time difference between row values grouped by event

I am using Postgres 9.3.3
I have a table with multiple events, two of them are "AVAILABLE" and "UNAVAILABLE". These events are assigned to a specific object. There are also other object ids in this table (removed for clarity):
What I need is the "available" time per day, something like this:
SQL Fiddle
select
object_id, day,
sum(upper(available) - lower(available)) as available
from (
select
g.object_id, date_trunc('day', d) as day,
(
available *
tsrange(date_trunc('day', d), date_trunc('day', d)::date + 1, '[)')
) as available
from
(
select
object_id, event,
tsrange(
timestamp,
lead(timestamp) over(
partition by object_id order by timestamp
),
'[)'
) as available
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
) s
right join
(
generate_series(
(select min(timestamp) from events),
(select max(timestamp) from events),
'1 day'
) g (d)
cross join
(select distinct object_id from events) s
) g on
tsrange(date_trunc('day', d), date_trunc('day', d)::date + 1, '[)') && available and
(event = 'AVAILABLE' or event is null) and
g.object_id = s.object_id
) s
group by 1, 2
order by 1, 2
psql output
object_id | day | available
-----------+---------------------+-----------
1 | 1970-01-02 00:00:00 | 12:00:00
1 | 1970-01-03 00:00:00 | 12:00:00
1 | 1970-01-04 00:00:00 |
1 | 1970-01-05 00:00:00 | 1 day
1 | 1970-01-06 00:00:00 | 1 day
1 | 1970-01-07 00:00:00 | 12:00:00
Table DDL
create table events (
object_id int,
event text,
timestamp timestamp
);
insert into events (object_id, event, timestamp) values
(1, 'AVAILABLE', '1970-01-02 12:00:00'),
(1, 'UNAVAILABLE', '1970-01-03 12:00:00'),
(1, 'AVAILABLE', '1970-01-05 00:00:00'),
(1, 'UNAVAILABLE', '1970-01-07 12:00:00');
Your example output suggests that you want all your objects to be returned, but grouped. If that is the case, this query can do that
select object_id, day, sum(upper(tsrange) - lower(tsrange))
from (
select object_id, date(day) as day, e.tsrange * tsrange(day, day + interval '1' day) tsrange
from generate_series(timestamp '1970-01-01', '1970-01-07', interval '1' day) day
left join (
select object_id,
case event
when 'AVAILABLE' then tsrange(timestamp, lead(timestamp) over (partition by object_id order by timestamp))
else null
end tsrange
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
) e on e.tsrange && tsrange(day, day + interval '1' day)
) d
group by object_id, day
order by day, object_id
But that will output something like this (if you have multiple object_ids):
object_id | day | sum
-----------+--------------+-----------
| '1970-01-01' |
1 | '1970-01-02' | '12:00:00'
1 | '1970-01-03' | '12:00:00'
| '1970-01-04' |
1 | '1970-01-05' | '1 day'
1 | '1970-01-06' | '1 day'
2 | '1970-01-06' | '12:00:00'
1 | '1970-01-07' | '12:00:00'
In my opinion, it would make much more sense to query just one object at a time:
select day, sum(upper(tsrange) - lower(tsrange))
from (
select date(day) as day, e.tsrange * tsrange(day, day + interval '1' day) tsrange
from generate_series(timestamp '1970-01-01', '1970-01-07', interval '1' day) day
left join (
select case event
when 'AVAILABLE' then tsrange(timestamp, lead(timestamp) over (partition by object_id order by timestamp))
else null
end tsrange
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
and object_id = 1
) e on e.tsrange && tsrange(day, day + interval '1' day)
) d
group by day
order by day
This will output something like:
day | sum
--------------+----------
'1970-01-01' |
'1970-01-02' | '12:00:00'
'1970-01-03' | '12:00:00'
'1970-01-04' |
'1970-01-05' | '1 day'
'1970-01-06' | '1 day'
'1970-01-07' | '12:00:00'
I used this schema/data for my outputs:
create table events (
object_id int,
event text,
timestamp timestamp
);
insert into events (object_id, event, timestamp)
values (1, 'AVAILABLE', '1970-01-02 12:00:00'),
(1, 'UNAVAILABLE', '1970-01-03 12:00:00'),
(1, 'AVAILABLE', '1970-01-05 00:00:00'),
(1, 'UNAVAILABLE', '1970-01-07 12:00:00'),
(2, 'AVAILABLE', '1970-01-06 00:00:00'),
(2, 'UNAVAILABLE', '1970-01-06 06:00:00'),
(2, 'AVAILABLE', '1970-01-06 12:00:00'),
(2, 'UNAVAILABLE', '1970-01-06 18:00:00');
This is a partial answer. If we assume that the next event after available is unavailable, then lead() comes to the rescue and the following is a start:
select object_id, to_char(timestamp, 'YYYY-MM-DD') as day,
to_char(sum(nextts - timestamp), 'HH24:MI') as interval
from (select t.*,
lead(timestamp) over (partition by object_id order by timestamp) as nextts
from events t
where event in ('AVAILABLE', 'UNAVAILABLE')
) t
where event = 'AVAILABLE'
group by object_id, to_char(timestamp, 'YYYY-MM-DD');
I suspect, though, that when the interval spans multiple days, you want to split it into separate per-day parts. This becomes more of a challenge.
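To illustrate the per-day splitting that makes this harder, here is a Python sketch that pairs each AVAILABLE event with the event that follows it and cuts the resulting interval at every midnight. The available_per_day function is a hypothetical helper for a single object; it assumes AVAILABLE and UNAVAILABLE events alternate and ignores a trailing AVAILABLE with no following event.

```python
from collections import defaultdict
from datetime import datetime, date, time, timedelta

def available_per_day(events):
    """events: (timestamp, event) pairs for one object. Returns {day: available timedelta}."""
    per_day = defaultdict(timedelta)
    events = sorted(events)
    # Pair each event with the next one, like lead(timestamp) over (order by timestamp).
    for (start, kind), (end, _next_kind) in zip(events, events[1:]):
        if kind != "AVAILABLE":
            continue
        cur = start
        while cur < end:
            next_midnight = datetime.combine(cur.date() + timedelta(days=1), time.min)
            piece_end = min(end, next_midnight)  # cut the interval at midnight
            per_day[cur.date()] += piece_end - cur
            cur = piece_end
    return dict(per_day)

# Events for object_id 1, copied from the table DDL in the earlier answer.
events = [
    (datetime(1970, 1, 2, 12), "AVAILABLE"),
    (datetime(1970, 1, 3, 12), "UNAVAILABLE"),
    (datetime(1970, 1, 5, 0), "AVAILABLE"),
    (datetime(1970, 1, 7, 12), "UNAVAILABLE"),
]

for day, total in sorted(available_per_day(events).items()):
    print(day, total)
```

Days without any availability (1970-01-04 here) are simply absent from the result, whereas the generate_series-based queries return them with a null sum.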