Best way to sum() values between two dates (where the second date is stored as a value in the next row) - postgres - sql

I am a little bit stuck with an issue I'm having with a psql query. I know how I'd approach this using a loop etc, but have hit a wall in SQL as I'm no expert.
If we imagine that there are 3 terms per school year. Each child may recieve an allowance for lunches each month. I would like to SUM() the allowance for each month, from the start of the term up until the next term.
Where I am stuck is that the date of the next term is a value in the next row of data in the terms table.
I have something like this:
SELECT
terms."startDate",
COALESCE((
SELECT
SUM("lunch")
FROM
"allowance"
WHERE
TO_CHAR(terms."startDate", 'YYYY-MM') >= TO_CHAR("date", 'YYYY-MM')
AND
TO_CHAR(terms."startDate", 'YYYY-MM') < TO_CHAR(??? HELP ???, 'YYYY-MM')
),0) AS "lunchMoney"
FROM "schoolTerms" AS terms
...
Where I have put TO_CHAR(??? HELP ???, 'YYYY-MM') I would like to reference the start date of the childs next term. I have looked into using a LEAD() method but couldn't figure it out.
Any help would be greatly appreciated.

I think you want and join and aggregation:
select t.startdate, t.enddate, sum(a.lunch) as lunch_money
from schoolterms t
inner join allowance a on a.date >= t.startdate and a.date < t.enddate
group by t.startdate, t.enddate
This puts each allowance in the terms it belongs, and then aggregate by term. You might want a left join, if there may be terms without any allowance.
Your current query gives no clue about what a "child" is. Presumably, that should be a column in allowance, that you might want to put in the select and group by clauses.
If you want to compute the end_date as the "next" start_date, then use lead():
select t.startdate, t.enddate, sum(a.lunch) as lunch_money
from (
select start_date,
lead(startdate) over(order by startdate) enddate
from schoolterms
) t
inner join allowance a
on a.date >= t.startdate
and (a.date < t.enddate or t.enddate is null)
group by t.startdate, t.enddate

Related

PostgreSQL - how to loop over a parameter to make union of different queries of the same table

I need to count the number of events still occurring in a given year, provided these events started before the given year and ended after it. So, the query I'm using now is just a pile of unparametrized queries and I'd like to write it out with a loop on a parameter:
select
count("starting_date" ) as "number_of_events" ,'2020' "year"
from public.table_of_events
where "starting_date" < '2020-01-01' and ("closing_date" > '2020-12-31')
union all
select
count("starting_date" ) as "number_of_events" ,'2019' "year"
from public.table_of_events
where "starting_date" < '2019-01-01' and ("closing_date" > '2019-12-31')
union all
select
count("starting_date" ) as "number_of_events" ,'2018' "year"
from public.table_of_events
where "starting_date" < '2018-01-01' and ("closing_date" > '2018-12-31')
...
...
...
and so on for N years
So, I have ONE table that must be filtered according to one parameter that is both part of the select statement and the where clause
The general aspect of the "atomic" query is then
select
count("starting_date" ) as "number_of_events" , **PARAMETER** "year"
from public.table_of_events
where "starting_date" < **PARAMETER** and ("closing_date" > **PARAMETER** )
union all
Can anyone help me put this in a more formal loop?
Thanks a lot, I hope I was clear enough.
You seem to want events that span entire years. Perhaps a simple way is to generate the years, then use join and aggregate:
select gs.yyyy, count(e.starting_date)
from public.table_of_events e left join
generate_series('2015-01-01'::date,
'2021-01-01'::date,
interval '1 year'
) as gs(yyyy)
on e.starting_date < gs.yyyy and
e.closing_date >= gs.yyyy + interval '1 year'
group by gs.yyyy;

PL-SQL query to calculate customers per period from start and stop dates

I have a PL-SQL table with a structure as shown in the example below:
I have customers (customer_number) with insurance cover start and stop dates (cover_start_date and cover_stop_date). I also have dates of accidents for those customers (accident_date). These customers may have more than one row in the table if they have had more than one accident. They may also have no accidents. And they may also have a blank entry for the cover stop date if their cover is ongoing. Sorry I did not design the data format, but I am stuck with it.
I am looking to calculate the number of accidents (num_accidents) and number of customers (num_customers) in a given time period (period_start), and from that the number of accidents-per-customer (which will be easy once I've got those two pieces of information).
Any ideas on how to design a PL-SQL function to do this in a simple way? Ideally with the time periods not being fixed to monthly (for example, weekly or fortnightly too)? Ideally I will end up with a table like this shown below:
Many thanks for any pointers...
You seem to need a list of dates. You can generate one in the query and then use correlated subqueries to calculate the columns you want:
select d.*,
(select count(distinct customer_id)
from t
where t.cover_start_date <= d.dte and
(t.cover_end_date > d.date + interval '1' month or t.cover_end_date is null)
) as num_customers,
(select count(*)
from t
where t.accident_date >= d.dte and
t.accident_date < d.date + interval '1' month
) as accidents,
(select count(distinct customer_id)
from t
where t.accident_date >= d.dte and
t.accident_date < d.date + interval '1' month
) as num_customers_with_accident
from (select date '2020-01-01' as dte from dual union all
select date '2020-02-01' as dte from dual union all
. . .
) d;
If you want to do arithmetic on the columns, you can use this as a subquery or CTE.

SQL Loop Count of people in program during specified duration

I'm not sure if there should be a loop for this or what the easiest approach would be.
My data consists of a list of people participating in our program. They have various start and end dates, but the following equation is able to capture the number of people who participated on a specific date:
DECLARE #PopulationDate DATETIME = '2018-06-01 05:00:00';
select count(People)
FROM Program_Log
WHERE
START_TIME <= #PopulationDate
AND (END_TIME >= #PopulationDate OR END_TIME IS NULL)`
Is there a way I can loop in different date values to get the number of program participants each day for an entire year?
Multiple years?
One simple way is to use a CTE to generate the dates and then a left join to bring in the data. For instance, the following gets the counts as of the first of the month for this year:
with dates as (
select cast('2018-01-01' as date) as dte
union all
select dateadd(month, 1, dte)
from dates
where dte < getdate()
)
select d.dte, count(pl.people)
from dates d left join
program_log pl
on pl.start_time <= d.dte and (pl.end_time >= d.dte or pl.end_time is null)
group by d.dte
order by d.dte;
Note that this will work best for a handful of dates. If you want more than 100, you need to add option (maxrecursion 0) to the end of the query.
Also, count(people) is highly suspicious. Perhaps you mean sum(people) or something similar.

Optimizing GROUP BY performance

Is there some tricky way to GROUP BY a variable which has been defined by alias or which is a result of calculation? I think that the following code makes a double dip by calculating MyMonth in Select statement and then again in Group statement. It may be unnecessary waste. It is not possible by simple GROUP BY MyMonth. Is it possible to force only one calculation of month([MyDate])?
Update of code. Aggregate function is added.
SELECT month([MyDate]) AS MyMonth, count([MyDate]) AS HowMany
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-01' AND '2014-12-31'
GROUP BY month([MyDate])
ORDER BY MyMonth
Your real problem likely stems from calling MONTH(...) on every row. This prevents the optimizer from using an index to fulfill the count (it can use it for the WHERE clause, but this will still be many rows).
Instead, you should turn this into a range query, that the optimizer could use for comparisons against an index. First we build a simple range table:
WITH Months as (SELECT MONTH(d) AS month,
d AS monthStart, DATEADD(month, 1, d) AS monthEnd
FROM (VALUES(CAST('20140101' AS DATE))) t(d)
UNION ALL
SELECT MONTH(monthEnd),
monthEnd, DATEADD(month, 1, monthEnd)
FROM Months
WHERE monthEnd < CAST('20150101' AS DATE))
SQL Fiddle Example
(if you have an existing calendar table, you can base your query on that, but sometimes a simple ad-hoc one works best)
Once we have the range-table, you can then use it to constrain and bucket your data, like so:
SELECT Months.month, COUNT(*)
FROM TableA
JOIN Months
ON TableA.MyDate >= Months.monthStart
AND TableA.MyDate < Months.monthEnd
GROUP BY Months.month
Note: The start of the date range was changed to 2014-01-01, as it seems strange that you'd only include one day from January, when aggregating months...
No, you can't use column alias directly in the GROUP BY clause. Instead do a select in the from list, and use the result column in your group by.
select MyMonth, MAX(someothercolumn)
from
(
SELECT month([MyDate]) AS MyMonth,
someothercolumn
FROM tableA
WHERE [MyDate] BETWEEN '2014-01-31' AND '2014-12-31'
)
GROUP BY MyMonth
ORDER BY MyMonth

Find employee tenure for a company

I have written the following query to get the employees tenure yearwise.
Ie. grouped by "less than 1 year", "1-2 years", "2-3 years" and "greater than 3 years".
To get this, I compare with employee staffed end_date.
But I am not able to get the correct result when comparing with staffed end_date.
I have pasted the complete code below, but the count I am getting is not correct.
Some employee who worked for more than 2 years is falling under <1 year column.
DECLARE #Project_Id Varchar(10)='ITS-004275';
With Cte_Dates(Period,End_date,Start_date,Project_Id)
As
(
SELECT '<1 Year' AS Period, GETDATE() AS End_Date,DATEADD(YY,-1,GETDATE()) AS Start_date,#Project_Id AS Project_Id
UNION
SELECT '1-2 Years', DATEADD(YY,-1,GETDATE()),DATEADD(YY,-2,GETDATE()),#Project_Id
UNION
SELECT '2-3 Years', DATEADD(YY,-2,GETDATE()),DATEADD(YY,-3,GETDATE()),#Project_Id
UNION
SELECT '>3 Years', DATEADD(YY,-3,GETDATE()),'',#Project_Id
),
--select * from Cte_Dates
--ORDER BY Start_date DESC
Cte_Staffing(PROJECT_ID,EMP_ID,END_DATE) AS
(
SELECT FK_Project_ID,EMP_ID,MAX(End_Date)AS END_DATE FROM DP_Project_Staffing
WHERE FK_Project_ID=#Project_Id
GROUP BY FK_Project_ID,Emp_ID
)
SELECT D.PROJECT_ID,D.Start_date,D.End_date,COUNT(S.EMP_ID) AS Count,D.Period
FROM Cte_Staffing S
RIGHT JOIN Cte_Dates D
ON D.Project_Id=S.PROJECT_ID
AND S.END_DATE<D.End_date AND S.END_DATE>D.Start_date
GROUP BY D.PROJECT_ID,D.Start_date,D.End_date,D.Period
i think this will solve the problem
as you can see, you should use is like this:
DATEADD(year, -1, GETDATE())
you should also get the GETDATE() to a parameter
I find your query logic a little bit messy. Why don't you just compute the total period for every employee and use CASE clause? I can help you with code if you'll give me DP_Project_Staffing table structure. Do you have begin_date field in it?
You are taking the MAX(End_date) of the CTE staffing table. In that case, when an employee has several entries, only the most recent will apply. You want to use MIN instead.
Like this:
Cte_Staffing(PROJECT_ID,EMP_ID,END_DATE) AS
(
SELECT FK_Project_ID, EMP_ID, MIN(End_Date)AS END_DATE
FROM DP_Project_Staffing
...
Re-reading your question, you probably don't want the staffing end_date for tenure calculation; you'd want to use the start_date. (Or whatever the column is called in DP_Project_Staffing)
I would also change the WHERE/JOIN clause to be inclusive on one of the sides, so you have either
AND S.END_DATE <= D.End_date AND S.END_DATE > D.Start_date
or
AND S.END_DATE < D.End_date AND S.END_DATE >= D.Start_date
Since you are using miliseconds in the date-comparison it won't make any difference in this case. However, should you change the granularity to be only the date, which would make more sense, you would lose all records where the employee started exactly 1 year, 2 years, etc. ago.
SELECT FK_Project_ID,E.Emp_ID,MIN(Start_Date) AS Emp_Start_Date ,MAX(End_Date) AS Emp_End_Date,
E.Competency,E.First_Name+' '+E.Last_Name+' ('+E.Emp_Id+')' as Name,'Period'=
CASE
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=12 THEN '<1 Year'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>12 AND DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=24 THEN '1-2 Years'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>24 AND DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=36 THEN '2-3 Years'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>36 THEN '>3 Years'
ELSE 'NA'
END
FROM DP_Project_Staffing PS
LEFT OUTER JOIN DP_Ext_Emp_Master E
ON E.Emp_Id=PS.Emp_ID
WHERE FK_Project_ID=#PROJ_ID
GROUP BY FK_Project_ID,E.Emp_ID,E.Competency,First_Name,Last_Name