SQL Count by Active Date - sql

If I have a table of records with active/inactive dates, is there a simple way to count active records by month? For example:
tbl_a
id dt_active dt_inactive
a 2013-01-01 2013-08-24
b 2013-01-01 2013-07-05
c 2012-02-01 2012-01-01
If I have to generate an output of active records by month like this:
active: dt_active <= first_day_of_month <= dt_inactive
month count
2013-01 2
2013-02 2
2013-03 2
2013-04 2
2013-05 2
2013-06 2
2013-07 2
2013-08 1
2013-09 0
Is there any clever way to do this besides uploading a temp table of dates and using subqueries?

Here is one method that gives the count of actives on the beginning of the month. It creates a list of all the months and then joins this information to tbl_a.
with dates as (
select cast('2013-01-01' as date) as month
union all
select dateadd(month, 1, dates.month)
from dates
where month < cast('2013-09-01' as date)
)
select convert(varchar(7), month, 121), count(a.id)
from dates m left outer join
tbl_a a
on m.month between a.dt_active and a.dt_inactive
group by convert(varchar(7), month, 121)
order by 1;
Note: if dt_inactive is the first date of inactivity, then the on clause should be:
on m.month >= a.dt_active and m.month < a.dt_inactive
Here is a SQL Fiddle with the working query.
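The same idea can be tried locally without a database server. The following is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) instead of SQL Server, so date() and substr() stand in for dateadd() and convert(). The table and sample rows are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (id TEXT, dt_active TEXT, dt_inactive TEXT);
    INSERT INTO tbl_a VALUES
        ('a', '2013-01-01', '2013-08-24'),
        ('b', '2013-01-01', '2013-07-05'),
        ('c', '2012-02-01', '2012-01-01');
""")

query = """
WITH RECURSIVE dates(month) AS (
    -- anchor: first month of the report
    SELECT '2013-01-01'
    UNION ALL
    -- recursive step: add one month until the last month is reached
    SELECT date(month, '+1 month') FROM dates WHERE month < '2013-09-01'
)
SELECT substr(m.month, 1, 7) AS ym, count(a.id) AS active
FROM dates m
LEFT JOIN tbl_a a
       ON m.month BETWEEN a.dt_active AND a.dt_inactive
GROUP BY ym
ORDER BY ym
"""

# ISO-formatted date strings compare correctly as text in SQLite.
rows = conn.execute(query).fetchall()
for ym, active in rows:
    print(ym, active)
```

count(a.id) counts only matched rows, so months with no active record still appear with 0 because of the left join.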

Related

SQL active users by month

I would like to know the number of active users by month, I use SQL Server 2017.
I have an AuditLog table like:
- UserID: int
- DateTime: datetime
- AuditType: int
UserID DateTime AuditType
------------------------------
1 2022-01-01 1
1 2022-01-15 4
1 2022-02-20 3
2 2022-01-10 8
2 2022-03-10 1
3 2022-03-20 1
If someone has at least one entry in a given month then he/she is treated as active.
I would like to have a result like:
Date Count
2022-01 2
2022-02 1
2022-03 2
I think you can combine YEAR() and MONTH() of the DateTime column in the GROUP BY with COUNT(DISTINCT UserID), so that a user with several entries in one month is counted once:
SELECT CAST(YEAR(C.[DateTime]) AS CHAR(4)) + '-' + RIGHT('0' + CAST(MONTH(C.[DateTime]) AS VARCHAR(2)), 2) AS YEAR_MONTH,
       COUNT(DISTINCT C.UserID) AS CNTT
FROM AuditLog AS C
GROUP BY CAST(YEAR(C.[DateTime]) AS CHAR(4)) + '-' + RIGHT('0' + CAST(MONTH(C.[DateTime]) AS VARCHAR(2)), 2)
ORDER BY YEAR_MONTH;
Here is a solution:
Select [Date],count(1) as Count From (
select Cast(cast(d.DateTime as date) as varchar(7)) as [Date],UserId
from AuditLog d
Group by Cast(cast(d.DateTime as date) as varchar(7)),UserId
) as q1 Group by [Date]
Order by 1
Hope it works.
Group by the date's year and month (either combined or as separate columns) and count the distinct UserID values:
SELECT CONVERT(VARCHAR(7), [DateTime], 126)[Date], COUNT(DISTINCT UserID)[Count]
FROM AuditLog
GROUP BY CONVERT(VARCHAR(7), [DateTime], 126)
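To see why DISTINCT matters here, the following minimal sketch runs the question's sample data through SQLite via Python's sqlite3 module. This is an assumption made for portability: strftime('%Y-%m', ...) replaces the SQL Server CONVERT call, and the column aliases are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AuditLog (UserID INTEGER, DateTime TEXT, AuditType INTEGER);
    INSERT INTO AuditLog VALUES
        (1, '2022-01-01', 1), (1, '2022-01-15', 4), (1, '2022-02-20', 3),
        (2, '2022-01-10', 8), (2, '2022-03-10', 1), (3, '2022-03-20', 1);
""")

# entries counts every log row; active_users counts each user once per month.
rows = conn.execute("""
    SELECT strftime('%Y-%m', DateTime) AS ym,
           count(UserID)               AS entries,
           count(DISTINCT UserID)      AS active_users
    FROM AuditLog
    GROUP BY ym
    ORDER BY ym
""").fetchall()

for ym, entries, active_users in rows:
    print(ym, entries, active_users)
```

For January the plain count is 3 because user 1 has two entries, while the distinct count gives the 2 active users the question expects.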

Get count of subscribers for each month in current year even if count is 0

I need to get the count of new subscribers each month of the current year.
DB Structure: Subscriber(subscriber_id, create_timestamp, ...)
Expected result:
date | count
-----------+------
2021-01-01 | 3
2021-02-01 | 12
2021-03-01 | 0
2021-04-01 | 8
2021-05-01 | 0
I wrote the following query:
SELECT
DATE_TRUNC('month',create_timestamp)
AS create_timestamp,
COUNT(subscriber_id) AS count
FROM subscriber
GROUP BY DATE_TRUNC('month',create_timestamp);
Which works but does not include months where the count is 0. It's only returning the ones that are existing in the table. Like:
"2021-09-01 00:00:00" 3
"2021-08-01 00:00:00" 9
The first subquery generates one row per month of the year. It is LEFT JOINed to a second subquery that computes total_count per month, and COALESCE() replaces NULL with 0 for months without subscribers.
-- PostgreSQL (v11)
SELECT t.cdate
, COALESCE(p.total_count, 0) total_count
FROM (select generate_series('2021-01-01'::timestamp, '2021-12-15', '1 month') as cdate) t
LEFT JOIN (SELECT DATE_TRUNC('month',create_timestamp) create_timestamp
, COUNT(subscriber_id) total_count
FROM subscriber
GROUP BY DATE_TRUNC('month',create_timestamp)) p
ON t.cdate = p.create_timestamp
You can check the result at https://dbfiddle.uk/?rdbms=postgres_11&fiddle=20dcf6c1784ed0d9c5772f2487bcc221
To get the count of new subscribers for each month of the current year:
SELECT month::date, COALESCE(s.count, 0) AS count
FROM generate_series(date_trunc('year', LOCALTIMESTAMP)
, date_trunc('year', LOCALTIMESTAMP) + interval '11 month'
, interval '1 month') m(month)
LEFT JOIN (
SELECT date_trunc('month', create_timestamp) AS month
, count(*) AS count
FROM subscriber
GROUP BY 1
) s USING (month);
db<>fiddle here
That's assuming every row is a "new subscriber". So count(*) is simplest and fastest.
See:
Join a count query on generate_series() and retrieve Null values as '0'
Generating time series between two dates in PostgreSQL
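The pattern both answers share (generate every month, aggregate separately, left join, then replace NULL with 0) can be sketched as follows. This assumes SQLite through Python's sqlite3 module, where a recursive CTE stands in for generate_series() and date(..., 'start of month') for date_trunc(). The subscriber rows and the fixed 2021 range are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriber (subscriber_id INTEGER, create_timestamp TEXT);
    -- hypothetical sample rows
    INSERT INTO subscriber VALUES
        (10, '2021-01-05 09:00:00'), (11, '2021-01-20 12:30:00'),
        (12, '2021-02-03 08:15:00'), (13, '2021-04-11 17:45:00');
""")

rows = conn.execute("""
    WITH RECURSIVE months(month) AS (
        SELECT '2021-01-01'
        UNION ALL
        SELECT date(month, '+1 month') FROM months WHERE month < '2021-05-01'
    )
    SELECT m.month, COALESCE(s.cnt, 0) AS cnt
    FROM months m
    LEFT JOIN (
        -- aggregate first, so the join sees at most one row per month
        SELECT date(create_timestamp, 'start of month') AS month,
               count(*) AS cnt
        FROM subscriber
        GROUP BY 1
    ) s USING (month)
    ORDER BY m.month
""").fetchall()

for month, cnt in rows:
    print(month, cnt)
```

Months without subscribers (March and May in this sample) come back from the left join with a NULL count, which COALESCE turns into 0.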

SQL query group by using day startdatetime and end datetime

I have the following table Jobs:
|Id | StartDateTime | EndDateTime
+----+---------------------+----------------------
|1 | 2020-10-20 23:00:00 | 2020-10-21 05:00:00
|2 | 2020-10-21 10:00:00 | 2020-10-21 11:00:00
Note job id 1 spans October 20 and 21.
I am using the following query
SELECT DAY(StartDateTime), COUNT(id)
FROM Jobs
GROUP BY DAY(StartDateTime)
To get the following output. But the problem I am facing is that day 21 is not including job id 1. Since the job spans two days I want to include it in both days 20 and 21.
Day | TotalJobs
----+----------
20 | 1
21 | 1
I am struggling to get the following expected output:
Day | TotalJobs
----+----------
20 | 1
21 | 2
One method is to generate the days that you want and then count overlaps:
with days as (
select convert(date, min(j.startdatetime)) as startd,
convert(date, max(j.enddatetime)) as endd
from jobs j
union all
select dateadd(day, 1, startd), endd
from days
where startd < endd
)
select days.startd, count(j.id)
from days left join
jobs j
on j.startdatetime < dateadd(day, 1, startd) and
j.enddatetime >= startd
group by days.startd;
Here is a db<>fiddle.
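The overlap test is the key step: a job belongs to a day if it starts before the next day begins and ends on or after the day's start. A minimal sketch of that logic, assuming SQLite via Python's sqlite3 module rather than SQL Server, with the two rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Jobs (Id INTEGER, StartDateTime TEXT, EndDateTime TEXT);
    INSERT INTO Jobs VALUES
        (1, '2020-10-20 23:00:00', '2020-10-21 05:00:00'),
        (2, '2020-10-21 10:00:00', '2020-10-21 11:00:00');
""")

rows = conn.execute("""
    WITH RECURSIVE days(d, last_d) AS (
        -- anchor: first and last calendar day covered by any job
        SELECT date(min(StartDateTime)), date(max(EndDateTime)) FROM Jobs
        UNION ALL
        SELECT date(d, '+1 day'), last_d FROM days WHERE d < last_d
    )
    SELECT days.d, count(j.Id) AS total_jobs
    FROM days
    LEFT JOIN Jobs j
           ON j.StartDateTime < date(days.d, '+1 day')  -- starts before the day ends
          AND j.EndDateTime >= days.d                   -- ends on or after the day starts
    GROUP BY days.d
    ORDER BY days.d
""").fetchall()

for day, total_jobs in rows:
    print(day, total_jobs)
```

Because the join condition works on whole intervals, a job spanning three or more days would be counted on every day it touches.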
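You can first count the jobs that start and end on the same day, then count the jobs whose start and end days differ twice, once under the end day and once under the start day, and finally sum the counts per day. Note that this only covers jobs spanning at most two days, and DAY() alone does not distinguish between months: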
SELECT a.date, SUM(counts) from (
SELECT DAY(StartDateTime) as date, COUNT(id) counts
FROM Table1
WHERE DAY(StartDateTime) = DAY(EndDateTime)
GROUP BY StartDateTime
UNION ALL
SELECT DAY(EndDateTime), COUNT(id)
FROM Table1
WHERE DAY(StartDateTime) != DAY(EndDateTime)
GROUP BY EndDateTime
UNION ALL
SELECT DAY(StartDateTime), COUNT(id)
FROM Table1
WHERE DAY(StartDateTime) != DAY(EndDateTime)
GROUP BY StartDateTime) a
GROUP BY a.date
Here is a SQL Fiddle.
Replace Table1 with Jobs when running this against your own database.

SQL - How to group/count items by age and status on every date of a year?

I am trying to build a query from a multi-year data set (tickets table) of support tickets, with relevant columns of ticket_id, status, created_on date and closed_on date for each ticket. There is also a generic dates table I can join or query to get a list of dates.
I'd like to create a "burn down" chart for this year that displays the number of open tickets that were at least a year old on any given date this year. I have been able to create tables that use a sum(case... statement to group by a date - for example to show how many tickets were created on a given week - but I can't figure out how to group by every day or week this year the number of tickets that were open on that day and at least a year old.
Any help is appreciated.
Example Data:
ticket_id | status | created_on | closed_on
--------------------------------------------
1 open 1/5/2019
2 open 1/26/2019
3 closed 1/28/2019 2/1/2020
4 open 6/1/2019
5 closed 6/5/2019 1/1/2020
Example Results I Seek:
Date (2020) | Count of Year+ Aged Tickets
------------------------------------------------
1/1/2020 0
1/2/2020 0
1/3/2020 0
1/4/2020 0
1/5/2020 1
1/6/2020 1
... (skipping dates here but want all dates in results)...
1/25/2020 1
1/26/2020 2
1/27/2020 2
1/28/2020 3
1/29/2020 3
1/30/2020 3
1/31/2020 3
2/1/2020 2
... (skipping dates here but want all dates up to current date in results)...
ticket_id 1 reached one year of age on 1/5/2020 and is still open (remains in count)
ticket_id 2 reached one year of age on 1/26/2020 and is still open (remains in count)
ticket_id 3 reached one year of age on 1/28/2020 and was still open, adding to the count, but was closed on 2/1/2020, reducing the count
ticket_id 4 will only add to the count if it is still open on 6/1/2020, but not if it is closed before then
ticket_id 5 will never appear in the count because it never reached one year of age and is closed
One option is to build a sequential list of dates, then bring in the table with a left join and conditional logic, and finally aggregate.
This would give the results you want for year 2020.
select d.dt, count(t.ticket_id) no_tickets
from (
select date '2020-01-01' + i * interval '1 day' dt
from generate_series(0, 365) i
) d
left join mytable t
on t.created_on + interval '1 year' <= d.dt
and (
t.closed_on is null
or t.closed_on > d.dt
)
group by d.dt
If your version of Redshift does not support generate_series(), you can emulate it with a custom number table, or with row_number() against a large table (say mylargetable):
select d.dt, count(t.ticket_id) no_tickets
from (
select date '2020-01-01' + (row_number() over(order by 1) - 1) * interval '1 day' dt
from mylargetable
) d
left join mytable t
on t.created_on + interval '1 year' <= d.dt
and (
t.closed_on is null
or t.closed_on > d.dt
)
where d.dt < date '2021-01-01'
group by d.dt
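The join condition can be checked against the expected results from the question. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module (a recursive CTE replaces generate_series(), and date(created_on, '+1 year') replaces the interval arithmetic). The table name tickets and the shortened date range (January and February 2020) are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_id INTEGER, status TEXT,
                          created_on TEXT, closed_on TEXT);
    INSERT INTO tickets VALUES
        (1, 'open',   '2019-01-05', NULL),
        (2, 'open',   '2019-01-26', NULL),
        (3, 'closed', '2019-01-28', '2020-02-01'),
        (4, 'open',   '2019-06-01', NULL),
        (5, 'closed', '2019-06-05', '2020-01-01');
""")

counts = dict(conn.execute("""
    WITH RECURSIVE d(dt) AS (
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(dt, '+1 day') FROM d WHERE dt < '2020-02-29'
    )
    SELECT d.dt, count(t.ticket_id)
    FROM d
    LEFT JOIN tickets t
           ON date(t.created_on, '+1 year') <= d.dt          -- at least a year old
          AND (t.closed_on IS NULL OR t.closed_on > d.dt)    -- still open on that day
    GROUP BY d.dt
""").fetchall())

for day in ("2020-01-04", "2020-01-05", "2020-01-26", "2020-01-28", "2020-02-01"):
    print(day, counts[day])
```

The count rises as tickets 1, 2 and 3 each turn one year old and drops again on the day ticket 3 is closed, matching the example results in the question.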
If ticket_id is unique, you can list all open tickets that are at least one year old like this (assuming the table is named tickets, as in the question):
select ticket_id, created_on, status from tickets where status = 'open' and created_on <= dateadd(year, -1, getdate())
If you want to count these tickets per creation month:
select count(ticket_id), month(created_on) from tickets where status = 'open' and created_on <= dateadd(year, -1, getdate())
group by month(created_on)

Total Number of Records per Week

I have a Postgres 9.1 database. I am trying to generate the number of records per week (for a given date range) and compare it to the previous year.
I have the following code used to generate the series:
select generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
However, I am not sure how to join the counted records to the dates generated.
So, using the following records as an example:
Pt_ID exam_date
====== =========
1 2012-01-02
2 2012-01-02
3 2012-01-08
4 2012-01-08
1 2013-01-02
2 2013-01-02
3 2013-01-03
4 2013-01-04
1 2013-01-08
2 2013-01-10
3 2013-01-15
4 2013-01-24
I wanted to have the records return as:
series thisyr lastyr
=========== ===== =====
2013-01-01 4 2
2013-01-08 3 2
2013-01-15 1 0
2013-01-22 1 0
2013-01-29 0 0
Not sure how to reference the date range in the subsearch. Thanks for any assistance.
The simple approach would be to solve this with a CROSS JOIN as demonstrated by @jpw. However, there are some hidden problems:
The performance of an unconditional CROSS JOIN deteriorates quickly with growing number of rows. The total number of rows is multiplied by the number of weeks you are testing for, before this huge derived table can be processed in the aggregation. Indexes can't help.
Starting weeks with January 1st leads to inconsistencies. ISO weeks might be an alternative. See below.
All of the following queries make heavy use of an index on exam_date. Be sure to have one.
Only join to relevant rows
Should be much faster:
SELECT d.day, d.thisyr
, count(t.exam_date) AS lastyr
FROM (
SELECT d.day::date, (d.day - '1 year'::interval)::date AS day0 -- for 2nd join
, count(t.exam_date) AS thisyr
FROM generate_series('2013-01-01'::date
, '2013-01-31'::date -- last week overlaps with Feb.
, '7 days'::interval) d(day) -- returns timestamp
LEFT JOIN tbl t ON t.exam_date >= d.day::date
AND t.exam_date < d.day::date + 7
GROUP BY d.day
) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
AND t.exam_date < d.day0 + 7
GROUP BY d.day, d.thisyr
ORDER BY d.day;
This is with weeks starting from Jan. 1st like in your original. As commented, this produces a couple of inconsistencies: Weeks start on a different day each year and since we cut off at the end of the year, the last week of the year consists of just 1 or 2 days (leap year).
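The principle of joining only to relevant rows can also be tried in a single pass. The sketch below is a variant, not the query above: it assumes SQLite via Python's sqlite3 module, joins each week start only to rows falling in that week or in the same week one year earlier, and separates the two with conditional counts. The table name tbl follows the answer and the data comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (pt_id INTEGER, exam_date TEXT);
    INSERT INTO tbl VALUES
        (1, '2012-01-02'), (2, '2012-01-02'), (3, '2012-01-08'), (4, '2012-01-08'),
        (1, '2013-01-02'), (2, '2013-01-02'), (3, '2013-01-03'), (4, '2013-01-04'),
        (1, '2013-01-08'), (2, '2013-01-10'), (3, '2013-01-15'), (4, '2013-01-24');
""")

rows = conn.execute("""
    WITH RECURSIVE series(day) AS (
        SELECT '2013-01-01'
        UNION ALL
        SELECT date(day, '+7 days') FROM series
        WHERE date(day, '+7 days') <= '2013-01-31'
    )
    SELECT s.day,
           -- matched rows on or after the week start belong to this year
           count(CASE WHEN t.exam_date >= s.day THEN 1 END) AS thisyr,
           -- matched rows before the week start belong to last year's window
           count(CASE WHEN t.exam_date <  s.day THEN 1 END) AS lastyr
    FROM series s
    LEFT JOIN tbl t
           ON (t.exam_date >= s.day
               AND t.exam_date < date(s.day, '+7 days'))
           OR (t.exam_date >= date(s.day, '-1 year')
               AND t.exam_date < date(s.day, '-1 year', '+7 days'))
    GROUP BY s.day
    ORDER BY s.day
""").fetchall()

for day, thisyr, lastyr in rows:
    print(day, thisyr, lastyr)
```

Half-open ranges (>= start, < start + 7 days) avoid counting a row in two adjacent weeks.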
The same with ISO weeks
Depending on requirements, consider ISO weeks instead, which start on Mondays and always span 7 days. But they cross the border between years. Per documentation on EXTRACT():
week
The number of the week of the year that the day is in. By definition (ISO 8601), weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.
In the ISO definition, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.
Above query rewritten with ISO weeks:
SELECT w AS isoweek
, day::text AS thisyr_monday, thisyr_ct
, day0::text AS lastyr_monday, count(t.exam_date) AS lastyr_ct
FROM (
SELECT w, day
, date_trunc('week', '2012-01-04'::date)::date + 7 * w AS day0
, count(t.exam_date) AS thisyr_ct
FROM (
SELECT w
, date_trunc('week', '2013-01-04'::date)::date + 7 * w AS day
FROM generate_series(0, 4) w
) d
LEFT JOIN tbl t ON t.exam_date >= d.day
AND t.exam_date < d.day + 7
GROUP BY d.w, d.day
) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
AND t.exam_date < d.day0 + 7
GROUP BY d.w, d.day, d.day0, d.thisyr_ct
ORDER BY d.w, d.day;
January 4th is always in the first ISO week of the year. So this expression gets the date of Monday of the first ISO week of the given year:
date_trunc('week', '2012-01-04'::date)::date
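The same calculation can be checked outside the database. This is a small sketch in Python using only the standard datetime module. The helper name first_iso_monday is made up for this example.

```python
from datetime import date, timedelta


def first_iso_monday(year: int) -> date:
    """Return the Monday that starts ISO week 1 of the given year."""
    jan4 = date(year, 1, 4)  # January 4 is always in ISO week 1
    # weekday() is 0 for Monday, so subtracting it truncates to the week start,
    # which is what date_trunc('week', ...) does in PostgreSQL.
    return jan4 - timedelta(days=jan4.weekday())


for year in (2012, 2013):
    monday = first_iso_monday(year)
    print(year, monday, tuple(monday.isocalendar()))
```

For 2013 this yields 2012-12-31, which agrees with the documentation excerpt stating that 2012-12-31 is part of the first week of 2013.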
Simplify with EXTRACT()
Since ISO weeks coincide with the week numbers returned by EXTRACT(), we can simplify the query. First, a short and simple form:
SELECT w AS isoweek
, COALESCE(thisyr_ct, 0) AS thisyr_ct
, COALESCE(lastyr_ct, 0) AS lastyr_ct
FROM generate_series(1, 5) w
LEFT JOIN (
SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS thisyr_ct
FROM tbl
WHERE EXTRACT(isoyear FROM exam_date)::int = 2013
GROUP BY 1
) t13 USING (w)
LEFT JOIN (
SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS lastyr_ct
FROM tbl
WHERE EXTRACT(isoyear FROM exam_date)::int = 2012
GROUP BY 1
) t12 USING (w);
Optimized query
The same with more details and optimized for performance:
WITH params AS ( -- enter parameters here, once
SELECT date_trunc('week', '2012-01-04'::date)::date AS last_start
, date_trunc('week', '2013-01-04'::date)::date AS this_start
, date_trunc('week', '2014-01-04'::date)::date AS next_start
, 1 AS week_1
, 5 AS week_n -- show weeks 1 - 5
)
SELECT w.w AS isoweek
, p.this_start + 7 * (w - 1) AS thisyr_monday
, COALESCE(t13.ct, 0) AS thisyr_ct
, p.last_start + 7 * (w - 1) AS lastyr_monday
, COALESCE(t12.ct, 0) AS lastyr_ct
FROM params p
, generate_series(p.week_1, p.week_n) w(w)
LEFT JOIN (
SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
FROM tbl t, params p
WHERE t.exam_date >= p.this_start -- only relevant dates
AND t.exam_date < p.this_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND t.exam_date < p.next_start -- don't cross over into next year
GROUP BY 1
) t13 USING (w)
LEFT JOIN ( -- same for last year
SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
FROM tbl t, params p
WHERE t.exam_date >= p.last_start
AND t.exam_date < p.last_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND t.exam_date < p.this_start
GROUP BY 1
) t12 USING (w);
This should be very fast with index support and can easily be adapted to intervals of choice.
The implicit JOIN LATERAL for generate_series() in the last query requires Postgres 9.3.
SQL Fiddle.
Using a cross join should work. I'm just going to paste the markdown output from SQL Fiddle below. It would seem that your sample output is incorrect for series 2013-01-08: thisyr should be 2, not 3. This might not be the best way to do this, though; my PostgreSQL knowledge leaves a lot to be desired.
SQL Fiddle
PostgreSQL 9.2.4 Schema Setup:
CREATE TABLE Table1
("Pt_ID" varchar(6), "exam_date" date);
INSERT INTO Table1
("Pt_ID", "exam_date")
VALUES
('1', '2012-01-02'),('2', '2012-01-02'),
('3', '2012-01-08'),('4', '2012-01-08'),
('1', '2013-01-02'),('2', '2013-01-02'),
('3', '2013-01-03'),('4', '2013-01-04'),
('1', '2013-01-08'),('2', '2013-01-10'),
('3', '2013-01-15'),('4', '2013-01-24');
Query 1:
select
series,
sum (
case
when exam_date
between series and series + '6 day'::interval
then 1
else 0
end
) as thisyr,
sum (
case
when exam_date + '1 year'::interval
between series and series + '6 day'::interval
then 1 else 0
end
) as lastyr
from table1
cross join generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
group by series
order by series
Results:
| SERIES | THISYR | LASTYR |
|--------------------------------|--------|--------|
| January, 01 2013 00:00:00+0000 | 4 | 2 |
| January, 08 2013 00:00:00+0000 | 2 | 2 |
| January, 15 2013 00:00:00+0000 | 1 | 0 |
| January, 22 2013 00:00:00+0000 | 1 | 0 |
| January, 29 2013 00:00:00+0000 | 0 | 0 |