PL-SQL query to calculate customers per period from start and stop dates - sql

I have a PL-SQL table with a structure as shown in the example below:
I have customers (customer_number) with insurance cover start and stop dates (cover_start_date and cover_stop_date). I also have dates of accidents for those customers (accident_date). These customers may have more than one row in the table if they have had more than one accident. They may also have no accidents. And they may also have a blank entry for the cover stop date if their cover is ongoing. Sorry I did not design the data format, but I am stuck with it.
I am looking to calculate the number of accidents (num_accidents) and number of customers (num_customers) in a given time period (period_start), and from that the number of accidents-per-customer (which will be easy once I've got those two pieces of information).
Any ideas on how to design a PL-SQL function to do this in a simple way? Ideally with the time periods not being fixed to monthly (for example, weekly or fortnightly too)? Ideally I will end up with a table like this shown below:
Many thanks for any pointers...

You seem to need a list of dates. You can generate one in the query and then use correlated subqueries to calculate the columns you want:
select d.*,
(select count(distinct customer_id)
from t
where t.cover_start_date <= d.dte and
(t.cover_end_date > d.date + interval '1' month or t.cover_end_date is null)
) as num_customers,
(select count(*)
from t
where t.accident_date >= d.dte and
t.accident_date < d.date + interval '1' month
) as accidents,
(select count(distinct customer_id)
from t
where t.accident_date >= d.dte and
t.accident_date < d.date + interval '1' month
) as num_customers_with_accident
from (select date '2020-01-01' as dte from dual union all
select date '2020-02-01' as dte from dual union all
. . .
) d;
If you want to do arithmetic on the columns, you can use this as a subquery or CTE.

Related

Data value on a given date

This time I have a table on a PostgreSQL database that contains the employee name, the date that he started working and the date that he leaves the company, in the cases of the employee still remains in the company, this field has null value.
Knowing this, I would like to know how many people was working on a predetermined date, ex:
I would like to know how many people works on the company in January 2021.
I don't know where to start, in some attempts I got the number of hires and layoffs per month, but I need to show this accumulated value per month, in another column.
I hope I made myself understood, I'll leave the last SQL I got here.
select reference, sum(hires) from
(
select
date_trunc('month', date_hires) as reference,
count(*) as hires
from
ponto_mais_relatorio_colaboradores
group by
date_hires
union all
select
date_trunc('month', date_layoff) as reference,
count(*)*-1 as layoffs
from
ponto_mais_relatorio_colaboradores
group by
date_layoff
) as reference
join calendar_aux on calendar_aux.ano_mes = reference
group by reference
order by reference
Break the requirement down. The question: how many are employed on any given date? That would include all hired before that date and do not have a layoff date plus all hired before with a layoff date later then the date your interested period. I.e you are interested in Jan so you still want to count an employee with a layoff date in Feb. With that in place convert into SQL. The preceding is available from select comparing dates. other issue is that Jan is not a date, it is a range of dates, so you need each date. You can use generate series to create each day in Jan. Then Join the generated dates with and selection from your table. Resulting query:
with jan_dates( jdate ) as
( select generate_series( date '2021-01-01'
, date '2021-01-31'
, interval '1' day
)::date
)
select jdate "Date", count(*) "Employees"
from jan_dates j
join employees e
on ( e.date_hires <= j.jdate
and ( e.date_layoff is null
or e.date_layoff > j.jdate
)
)
group by j.jdate
order by j.jdate;
Note: Not tested.

PostgreSQL - how to loop over a parameter to make union of different queries of the same table

I need to count the number of events still occurring in a given year, provided these events started before the given year and ended after it. So, the query I'm using now is just a pile of unparametrized queries and I'd like to write it out with a loop on a parameter:
select
count("starting_date" ) as "number_of_events" ,'2020' "year"
from public.table_of_events
where "starting_date" < '2020-01-01' and ("closing_date" > '2020-12-31')
union all
select
count("starting_date" ) as "number_of_events" ,'2019' "year"
from public.table_of_events
where "starting_date" < '2019-01-01' and ("closing_date" > '2019-12-31')
union all
select
count("starting_date" ) as "number_of_events" ,'2018' "year"
from public.table_of_events
where "starting_date" < '2018-01-01' and ("closing_date" > '2018-12-31')
...
...
...
and so on for N years
So, I have ONE table that must be filtered according to one parameter that is both part of the select statement and the where clause
The general aspect of the "atomic" query is then
select
count("starting_date" ) as "number_of_events" , **PARAMETER** "year"
from public.table_of_events
where "starting_date" < **PARAMETER** and ("closing_date" > **PARAMETER** )
union all
Can anyone help me put this in a more formal loop?
Thanks a lot, I hope I was clear enough.
You seem to want events that span entire years. Perhaps a simple way is to generate the years, then use join and aggregate:
select gs.yyyy, count(e.starting_date)
from public.table_of_events e left join
generate_series('2015-01-01'::date,
'2021-01-01'::date,
interval '1 year'
) as gs(yyyy)
on e.starting_date < gs.yyyy and
e.closing_date >= gs.yyyy + interval '1 year'
group by gs.yyyy;

SQL Loop Count of people in program during specified duration

I'm not sure if there should be a loop for this or what the easiest approach would be.
My data consists of a list of people participating in our program. They have various start and end dates, but the following equation is able to capture the number of people who participated on a specific date:
DECLARE #PopulationDate DATETIME = '2018-06-01 05:00:00';
select count(People)
FROM Program_Log
WHERE
START_TIME <= #PopulationDate
AND (END_TIME >= #PopulationDate OR END_TIME IS NULL)`
Is there a way I can loop in different date values to get the number of program participants each day for an entire year?
Multiple years?
One simple way is to use a CTE to generate the dates and then a left join to bring in the data. For instance, the following gets the counts as of the first of the month for this year:
with dates as (
select cast('2018-01-01' as date) as dte
union all
select dateadd(month, 1, dte)
from dates
where dte < getdate()
)
select d.dte, count(pl.people)
from dates d left join
program_log pl
on pl.start_time <= d.dte and (pl.end_time >= d.dte or pl.end_time is null)
group by d.dte
order by d.dte;
Note that this will work best for a handful of dates. If you want more than 100, you need to add option (maxrecursion 0) to the end of the query.
Also, count(people) is highly suspicious. Perhaps you mean sum(people) or something similar.

Add one for every row that fulfills where criteria between period

I have a Postgres table that I'm trying to analyze based on some date columns.
I'm basically trying to count the number of rows in my table that fulfill this requirement, and then group them by month and year. Instead of my query looking like this:
SELECT * FROM $TABLE WHERE date1::date <= '2012-05-31'
and date2::date > '2012-05-31';
it should be able to display this for the months available in my data so that I don't have to change the months manually every time I add new data, and so I can get everything with one query.
In the case above I'd like it to group the sum of rows which fit the criteria into the year 2012 and month 05. Similarly, if my WHERE clause looked like this:
date1::date <= '2012-06-31' and date2::date > '2012-06-31'
I'd like it to group this sum into the year 2012 and month 06.
This isn't entirely clear to me:
I'd like it to group the sum of rows
I'll interpret it this way: you want to list all rows "per month" matching the criteria:
WITH x AS (
SELECT date_trunc('month', min(date1)) AS start
,date_trunc('month', max(date2)) + interval '1 month' AS stop
FROM tbl
)
SELECT to_char(y.mon, 'YYYY-MM') AS mon, t.*
FROM (
SELECT generate_series(x.start, x.stop, '1 month') AS mon
FROM x
) y
LEFT JOIN tbl t ON t.date1::date <= y.mon
AND t.date2::date > y.mon -- why the explicit cast to date?
ORDER BY y.mon, t.date1, t.date2;
Assuming date2 >= date1.
Compute lower and upper border of time period and truncate to month (adding 1 to upper border to include the last row, too.
Use generate_series() to create the set of months in question
LEFT JOIN rows from your table with the declared criteria and sort by month.
You could also GROUP BY at this stage to calculate aggregates ..
Here is the reasoning. First, create a list of all possible dates. Then get the cumulative number of date1 up to a given date. Then get the cumulative number of date2 after the date and subtract the results. The following query does this using correlated subqueries (not my favorite construct, but handy in this case):
select thedate,
(select count(*) from t where date1::date <= d.thedate) -
(select count(*) from t where date2::date > d.thedate)
from (select distinct thedate
from ((select date1::date as thedate from t) union all
(select date2::date as thedate from t)
) d
) d
This is assuming that date2 occurs after date1. My model is start and stop dates of customers. If this isn't the case, the query might not work.
It sounds like you could benefit from the DATEPART T-SQL method. If I understand you correctly, you could do something like this:
SELECT DATEPART(year, date1) Year, DATEPART(month, date1) Month, SUM(value_col)
FROM $Table
-- WHERE CLAUSE ?
GROUP BY DATEPART(year, date1),
DATEPART(month, date1)

in sql, calculating date parts versus date lookup table in group queries

many queries are by week, month or quarter when the base table date is either date or timestamp.
in general, in group by queries, does it matter whether using
- functions on the date
- a day table that has extraction pre-calculated
note: similar question as DATE lookup table (1990/01/01:2041/12/31)
for example, in postgresql
create table sale(
tran_id serial primary key,
tran_dt date not null default current_date,
sale_amt decimal(8,2) not null,
...
);
create table days(
day date primary key,
week date not null,
month date not null,
quarter date non null
);
-- week query 1: group using funcs
select
date_trunc('week',tran_dt)::date - 1 as week,
count(1) as sale_ct,
sum(sale_amt) as sale_amt
from sale
where date_trunc('week',tran_dt)::date - 1 between '2012-1-1' and '2011-12-31'
group by date_trunc('week',tran_dt)::date - 1
order by 1;
-- query 2: group using days
select
days.week,
count(1) as sale_ct,
sum(sale_amt) as sale_amt
from sale
join days on( days.day = sale.tran_dt )
where week between '2011-1-1'::date and '2011-12-31'::date
group by week
order by week;
to me, whereas the date_trunc() function seems more organic, the the days table is easier to use.
is there anything here more than a matter of taste?
-- query 3: group using instant "immediate" calendar table
WITH calender AS (
SELECT ser::date AS dd
, date_trunc('week', ser)::date AS wk
-- , date_trunc('month', ser)::date AS mon
-- , date_trunc('quarter', ser)::date AS qq
FROM generate_series( '2012-1-1' , '2012-12-31', '1 day'::interval) ser
)
SELECT
cal.wk
, count(1) as sale_ct
, sum(sa.sale_amt) as sale_amt
FROM sale sa
JOIN calender cal ON cal.dd = sa.tran_dt
-- WHERE week between '2012-1-1' and '2011-12-31'
GROUP BY cal.wk
ORDER BY cal.wk
;
Note: I fixed an apparent typo in the BETWEEN range.
UPDATE: I used Erwin's recursive CTE to squeeze out the duplicated date_trunc(). Nested CTE galore:
WITH calendar AS (
WITH RECURSIVE montag AS (
SELECT '2011-01-01'::date AS dd
UNION ALL
SELECT dd + 1 AS dd
FROM montag
WHERE dd < '2012-1-1'::date
)
SELECT mo.dd, date_trunc('week', mo.dd + 1)::date AS wk
FROM montag mo
)
SELECT
cal.wk
, count(1) as sale_ct
, sum(sa.sale_amt) as sale_amt
FROM sale sa
JOIN calendar cal ON cal.dd = sa.tran_dt
-- WHERE week between '2012-1-1' and '2011-12-31'
GROUP BY cal.wk
ORDER BY cal.wk
;
Yes, it is more than a matter of taste. The performance of the query depends on the method.
As a first approximation, the functions should be faster. They don't require joins, doing the read in a single table scan.
However, a good optimizer could make effective use of a lookup table. It would know the distribution of the target values. And, an in memory join could be quite fast.
As a database design, I think having a calendar table is very useful. Some information such as holidays just isn't going to work as a function. However, for most ad hoc queries the date functions are fine.
1. Your expression:
... between '2012-1-1' and '2011-12-31'
doesn't work. Basic BETWEEN requires the left argument to be less than or equal to the right argument. Would have to be:
... BETWEEN SYMMETRIC '2012-1-1' and '2011-12-31'
Or it's just a typo and you mean something like:
... BETWEEN '2011-1-1' and '2011-12-31'
It's unclear to me, what your queries are supposed to retrieve. I'll assume you want all weeks (Monday to Sunday) that start in 2011 for the rest of this answer. This expression generates exactly that in less than a microsecond on modern hardware (works for any year):
SELECT generate_series(
date_trunc('week','2010-12-31'::date) + interval '7d'
,date_trunc('week','2011-12-31'::date) + interval '6d'
, '1d')::date
*Note that the ISO 8601 definition of the "first week of a year is slightly different.
2. Your second query does not work at all. No GROUP BY?
3. The question you link to did not deal with PostgreSQL, which has outstanding date / timestamp support. And it has generate_series() which can obviate the need for a separate "days" table in most cases - as demonstrated above. Your query would look like this:
In the meantime #wildplasser provided an example query that was supposed to go here.
By popular* demand, a recursive CTE version - which is actually not that far from being a serious alternative!
* and by "popular" I mean #wildplasser's very serious request.
WITH RECURSIVE days AS (
SELECT '2011-01-01'::date AS dd
,date_trunc('week', '2011-01-01'::date )::date AS wk
UNION ALL
SELECT dd + 1
,date_trunc('week', dd + 1)::date AS wk
FROM days
WHERE dd < '2011-12-31'::date
)
SELECT d.wk
,count(*) AS sale_ct
,sum(s.sale_amt) AS sale_amt
FROM days d
JOIN sale s ON s.tran_dt = d.dd
-- WHERE d.wk between '2011-01-01' and '2011-12-31'
GROUP BY 1
ORDER BY 1;
Could also be written as (compare to #wildplasser's version):
WITH RECURSIVE d AS (
SELECT '2011-01-01'::date AS dd
UNION ALL
SELECT dd + 1 FROM d WHERE dd < '2011-12-31'::date
), days AS (
SELECT dd, date_trunc('week', dd + 1)::date AS wk
FROM d
)
SELECT ...
4. If performance is of the essence, just make sure, that you do not apply functions or calculations to the values of your table. This prohibits the use of indexes and is generally very slow, because every row has to be processed. That's why your first query is going to suck with big table. When ever possible, apply calculations to the values you filter with, instead.
Indexes on expressions are one way around this. If you had an index like
CREATE INDEX sale_tran_dt_week_idx ON sale (date_trunc('week', tran_dt)::date);
.. your first query could be very fast again - at some cost for write operations for index maintenance.