How to find sum of a column between a given date range, where the table has only start date and end date - sql

I have a postgresql table userDistributions like this :
user_id, start_date, end_date, project_id, distribution
I need to write a query that, for a given date range and user id, outputs the sum of all distributions for every day for that user.
So the output should be like this for input : '2-2-2012' - '2-4-2012', some user id :
Date SUM(Distribution)
2-2-2012 12
2-3-2012 15
2-4-2012 34
A user has distribution in many projects, so I need to sum the distributions in all projects for each day and output that sum against that day.
My problem is what I should group by. If I had a single date field (instead of start_date and end_date), then I could just write something like
select date, SUM(distributions) from userDistributions group by date;
but in this case I am stumped as what to do. Thanks for the help.

Use generate_series to produce your dates, something like this:
select dt.d::date, sum(u.distributions)
from userdistributions u
join generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') as dt(d)
on dt.d::date between u.start_date and u.end_date
group by dt.d::date
Your date format is ambiguous, so I had to guess when converting it to ISO 8601.
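To try the idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no generate_series by default, so a recursive CTE stands in for it. The sample rows, the user id 1, and the added user_id filter are assumptions made up for illustration, and the column is named distribution as in the question's column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userdistributions (
    user_id INTEGER, start_date TEXT, end_date TEXT,
    project_id INTEGER, distribution INTEGER
);
-- Hypothetical sample data: user 1 works on three projects.
INSERT INTO userdistributions VALUES
    (1, '2012-02-01', '2012-02-03', 10, 5),
    (1, '2012-02-02', '2012-02-04', 11, 7),
    (1, '2012-02-04', '2012-02-10', 12, 20),
    (2, '2012-02-01', '2012-02-28', 10, 99);
""")

query = """
WITH RECURSIVE dt(d) AS (          -- stand-in for generate_series
    SELECT :first
    UNION ALL
    SELECT date(d, '+1 day') FROM dt WHERE d < :last
)
SELECT dt.d, SUM(u.distribution)
FROM dt
JOIN userdistributions u
  ON dt.d BETWEEN u.start_date AND u.end_date
WHERE u.user_id = :uid
GROUP BY dt.d
ORDER BY dt.d
"""
params = {"first": "2012-02-02", "last": "2012-02-04", "uid": 1}
rows = conn.execute(query, params).fetchall()
for day, total in rows:
    print(day, total)
```

Each generated day is joined to every range that contains it, so grouping by the day sums across projects.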

This is much like @mu's answer.
However, to cover days with no matches you should use LEFT JOIN:
SELECT d.d::date, sum(u.distributions) AS dist_sum
FROM generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') AS d(d)
LEFT JOIN userdistributions u ON d.d::date BETWEEN u.start_date AND u.end_date
GROUP BY 1
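One detail worth checking when a user filter is added to the LEFT JOIN version: the filter has to go into the ON clause, otherwise days without a match disappear again. The sketch below shows this with Python's sqlite3 module, where a recursive CTE replaces generate_series. The table contents and the user id are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userdistributions (
    user_id INTEGER, start_date TEXT, end_date TEXT,
    project_id INTEGER, distribution INTEGER
);
-- Hypothetical data: user 1 only has a distribution on 2012-02-02.
INSERT INTO userdistributions VALUES
    (1, '2012-02-02', '2012-02-02', 10, 5),
    (2, '2012-02-01', '2012-02-28', 10, 99);
""")

DAYS = """
WITH RECURSIVE days(d) AS (
    SELECT '2012-02-02'
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < '2012-02-04'
)
"""

# User filter inside ON: every generated day survives, unmatched days get NULL.
in_on = conn.execute(DAYS + """
SELECT days.d, SUM(u.distribution)
FROM days
LEFT JOIN userdistributions u
  ON days.d BETWEEN u.start_date AND u.end_date
 AND u.user_id = 1
GROUP BY days.d ORDER BY days.d
""").fetchall()

# User filter in WHERE: rows for days without a match for user 1 are filtered out.
in_where = conn.execute(DAYS + """
SELECT days.d, SUM(u.distribution)
FROM days
LEFT JOIN userdistributions u
  ON days.d BETWEEN u.start_date AND u.end_date
WHERE u.user_id = 1
GROUP BY days.d ORDER BY days.d
""").fetchall()

print(in_on)
print(in_where)
```

If a zero is preferred over NULL for empty days, wrapping the sum in COALESCE(..., 0) is a common choice.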

Related

PrestoDB: select all dates between two dates

I need to form a report which provides some information per each date within dates interval.
I need to have it within a single query (can't create any functions or supporting tables).
How can I achieve that in PrestoDB?
Note: There are lots of vendor-specific solutions here, here and even here. But none of them satisfies my need, as they either don't work in Presto or use tables/functions.
To be more precise here is an example of query:
WITH ( query to select all dates between 2017.01.01 and 2018.01.01 ) AS dates
SELECT
date date,
count(*) number_of_orders
FROM dates dates
LEFT JOIN order order
ON order.created_at = dates.date
You can use the Presto SEQUENCE() function to generate a sequence of days as an array, and then use UNNEST to explode that array as a result set.
Something like this should work for you:
SELECT date_array AS DAY
FROM UNNEST(
SEQUENCE(
cast('2017-01-01' AS date),
cast('2018-01-01' AS date),
INTERVAL '1' DAY
)
) AS t1(date_array)
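As a rough illustration of what the SEQUENCE plus UNNEST combination returns, the Python sketch below builds the same list of days, one per row, with both endpoints included. The helper name date_sequence is made up for this example.

```python
from datetime import date, timedelta

def date_sequence(start, stop, step=timedelta(days=1)):
    """Yield every date from start to stop, both ends inclusive."""
    current = start
    while current <= stop:
        yield current
        current += step

days = list(date_sequence(date(2017, 1, 1), date(2018, 1, 1)))
print(days[0], days[-1], len(days))
```

The result can then serve as the left side of a LEFT JOIN against the orders, so that days without orders still appear in the report.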

sql select number divided aggregate sum function

I have this schema
and I want to have a query to calculate the cost per consultant per hour per month. In other words, a consultant has a salary per month, I want to divide the amount of the salary between the hours that he/she worked that month.
SELECT
concat_ws(' ', consultants.first_name::text, consultants.last_name::text) as name,
EXTRACT(MONTH FROM tasks.init_time) as task_month,
SUM(tasks.finish_time::timestamp::time - tasks.init_time::timestamp::time) as duration,
EXTRACT(MONTH FROM salaries.payment_date) as salary_month,
salaries.payment
FROM consultants
INNER JOIN tasks ON consultants.id = tasks.consultant_id
INNER JOIN salaries ON consultants.id = salaries.consultant_id
WHERE EXTRACT(MONTH FROM tasks.init_time) = EXTRACT(MONTH FROM salaries.payment_date)
GROUP BY (consultants.id, EXTRACT(MONTH FROM tasks.init_time), EXTRACT(MONTH FROM salaries.payment_date), salaries.payment);
It is not possible to do this in the select
salaries.payment / SUM(tasks.finish_time::timestamp::time - tasks.init_time::timestamp::time)
Is there another way to do it? Is it possible to solve it in one query?
Assumptions made for this answer:
The model is not entirely clear to me, so I am assuming the following:
you are using PostgreSQL
salaries.payment_date is defined as a date column that stores the day when a consultant was paid
tasks.init_time and tasks.finish_time are defined as timestamp columns storing the date & time when a consultant started and finished work on a specific task.
Your join on only the month is wrong as far as I can tell. For one, because it would also include months from different years, but more importantly because this would lead to a result where the same row from salaries appeared several times. I think you need to join on the complete date:
FROM consultants c
JOIN tasks t ON c.id = t.consultant_id
JOIN salaries s ON c.id = s.consultant_id
AND t.init_time::date = s.payment_date --<< here
If my assumptions about the data types are correct, the cast to a timestamp and then back to a time is useless and wrong. It is useless because you can simply subtract two timestamps, and wrong because you are ignoring the actual date in the timestamp: if init_time and finish_time are not on the same day (although that is unlikely), the result is wrong.
So the calculation of the duration can be simplified to:
t.finish_time - t.init_time
To get the cost per hour per month, you need to convert the interval (the result of subtracting one timestamp from another) to a decimal number of hours. You can do this by extracting the seconds from the interval and dividing that by 3600, e.g.
extract(epoch from sum(t.finish_time - t.init_time)) / 3600
If you divide the sum of the payments by that number you get your cost per hour per month:
SELECT concat_ws(' ', c.first_name, c.last_name) as name,
to_char(s.payment_date, 'yyyy-mm') as salary_month,
extract(epoch from sum(t.finish_time - t.init_time)) / 3600 as worked_hours,
sum(s.payment) / (extract(epoch from sum(t.finish_time - t.init_time)) / 3600) as cost_per_hour
FROM consultants c
JOIN tasks t ON c.id = t.consultant_id
JOIN salaries s ON c.id = s.consultant_id AND t.init_time::date = s.payment_date
GROUP BY c.id, to_char(s.payment_date, 'yyyy-mm') --<< no parentheses!
order by name, salary_month;
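The interval-to-hours conversion used in the query can be checked outside the database. The Python sketch below sums a few durations, converts the total to seconds and divides by 3600, which mirrors extract(epoch from ...) / 3600. The task times and the payment of 2500 are invented sample values.

```python
from datetime import datetime, timedelta

# Hypothetical tasks as (init_time, finish_time) pairs.
tasks = [
    (datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 1, 17, 30)),  # 8.5 hours
    (datetime(2017, 3, 2, 22, 0), datetime(2017, 3, 3, 2, 0)),   # 4 hours, crosses midnight
]

# Subtracting full timestamps keeps the date part, so the second task counts correctly.
total = sum((finish - init for init, finish in tasks), timedelta())

worked_hours = total.total_seconds() / 3600
payment = 2500  # hypothetical salary for the month
cost_per_hour = payment / worked_hours
print(worked_hours, cost_per_hour)
```

Subtracting only the time-of-day parts would give a negative duration for the second task, which is the problem with the original cast to time.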
As you want the report broken down by month you should convert the month into something that contains the year as well. I used to_char() to get a string with only year and month. You also need to remove salaries.payment from the group by clause.
You also don't need the "payment month" and "salary month" because both will always be the same as that is the join condition.
And finally you don't need the cast to ::text for the name columns because they are most certainly defined as varchar or text anyway.
The sample data I made up for this: http://sqlfiddle.com/#!15/ae0c9
Somewhat unrelated, but:
You should also not put the column list of the group by in parentheses. Putting a column list in parentheses in Postgres creates an anonymous record, which is something completely different than having multiple columns. This is also true for the columns in the select list.
If the goal is just to do it in one query, have you tried using a CTE?
Like
WITH cte_pymt
AS
(
-- Your existing query 1
)
SELECT <your required data> FROM cte_pymt

How to generate Month list in PostgreSQL?

I have a table A with a startdate column which is TIMESTAMP WITHOUT TIME ZONE. I need to write a query/function that generates a list of months from the MIN value of the column to the MAX value of the column.
For example:
startdate
2014-12-08
2015-06-16
2015-02-17
will generate a list of: (Dec-14,Jan-15,Feb-15,Mar-15,Apr-15,May-15,Jun-15)
How do I do that? I never used PostgreSQL to generate data that wasn't there... it always has been finding the correct data in the DB... any ideas how to do that? Is it doable in a query?
For people looking for an unformatted list of months:
select * from generate_series('2017-01-01', now(), '1 month')
You can generate sequences of data with the generate_series() function:
SELECT to_char(generate_series(min, max, '1 month'), 'Mon-YY') AS "Mon-YY"
FROM (
SELECT date_trunc('month', min(startdate)) AS min,
date_trunc('month', max(startdate)) AS max
FROM a) sub;
This generates a row for every month, in a pretty format. If you want to have it like a list, you can aggregate them all in an outer query:
SELECT string_agg("Mon-YY", ', ') AS "Mon-YY list"
FROM (
-- Query above
) subsub;
SQLFiddle here
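For comparison, the same month list can be sketched in plain Python: take the minimum and maximum date, then step one month at a time and format each step as Mon-YY. The function name month_list and the hard-coded English month abbreviations are choices made for this example.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def month_list(dates):
    """Return 'Mon-YY' labels for every month from min(dates) to max(dates)."""
    first, last = min(dates), max(dates)
    year, month = first.year, first.month
    labels = []
    while (year, month) <= (last.year, last.month):
        labels.append(f"{MONTHS[month - 1]}-{year % 100:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return labels

startdates = [date(2014, 12, 8), date(2015, 6, 16), date(2015, 2, 17)]
labels = month_list(startdates)
print(", ".join(labels))
```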

View data by date after Format 'mmyy'

I'm trying to answer questions like, how many POs per month do we have? Or, how many lines are there in every PO by month, etc. The original PO dates are all formatted #1/1/2013#. So my first step was to Format each PO record date into 'mmyy' so I could group and COUNT them.
This worked well, but now I cannot view the data by date. For example, I cannot ask 'How many POs did we get after December?' I think this is because SQL does not recognize mm/yy as a comparable date.
Any ideas how I could restructure this?
There are 2 queries I wrote. This is the query to format the dates. This is also the query I was trying to add the date filter to (ex: >#3/14#)
SELECT qryALL_PO.POLN, Format([PO CREATE DATE],"mm/yy") AS [Date]
FROM qryALL_PO
GROUP BY qryALL_PO.POLN, Format([PO CREATE DATE],"mm/yy");
My group and counting query is:
SELECT qryALL_PO.POLN, Sum(qryALL_PO.[LINE QUANTITY]) AS SUM_QTY_PO
FROM qryALL_PO
GROUP BY qryALL_PO.POLN;
You can still count and group dates, as long as you have a way to determine the part of the date you are looking for.
In Access you can use year and month for example to get the year and month part of the date:
select year(mydate)
, month(mydate)
, count(*)
from tableX
group
by year(mydate)
, month(mydate)
You can also format the date as 'yyyy-mm', which sorts chronologically as text, and then use '>' to filter for 'after' a given month.
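The difference between the two approaches can be illustrated with a short Python sketch: grouping on a (year, month) pair keeps the values comparable, and a 'yyyy-mm' label compares correctly as text, while an 'mm/yy' label does not. The PO dates below are invented sample data.

```python
from collections import Counter
from datetime import date

# Hypothetical PO creation dates.
po_dates = [
    date(2013, 1, 1), date(2013, 1, 15), date(2013, 12, 3),
    date(2014, 1, 20), date(2014, 3, 14), date(2014, 3, 30),
]

# Count POs per (year, month); tuples compare in chronological order.
counts = Counter((d.year, d.month) for d in po_dates)

# "After December 2013" becomes a plain comparison on the tuple.
after_december = {key: n for key, n in sorted(counts.items()) if key > (2013, 12)}

# The same counts labelled as 'yyyy-mm' text.
labelled = {f"{y:04d}-{m:02d}": n for (y, m), n in after_december.items()}
print(labelled)

# Text comparison: 'yyyy-mm' orders correctly, 'mm/yy' does not.
iso_ok = "2014-01" > "2013-12"   # January 2014 is after December 2013
mmyy_ok = "01/14" > "12/13"      # same two months, wrong answer as text
print(iso_ok, mmyy_ok)
```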

can I use the to_char function in where clause in a query with PL/SQL?

I am trying to find out average of electricity volume of certain day in a week with the following query:
SELECT avg(volume)
FROM v_nem_rm16
WHERE to_char(day, 'day') = 'monday';
Here v_nem_rm16 is a table and volume and day are its columns. My query returns null whatever day value I use ('monday', 'tuesday', ...).
Is this query wrong?
Actually 'DAY' is returned with padding spaces on the right side.
If you use 'RTRIM' then you can avoid the null values.
SELECT avg(volume)
FROM v_nem_rm16
WHERE RTRIM(to_char(day, 'day')) = 'monday';
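Why the comparison fails without RTRIM can be mimicked in a few lines of Python. The sketch assumes the day name is padded with blanks to nine characters, the length of 'wednesday', which is the widest English day name. The function padded_day_name is a made-up stand-in for to_char(day, 'day'), not an Oracle API.

```python
from datetime import date

DAY_NAMES = ("monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday")

def padded_day_name(d, width=9):
    """Mimic a blank-padded day name (assumed width: 9 characters)."""
    return DAY_NAMES[d.weekday()].ljust(width)

value = padded_day_name(date(2024, 1, 1))  # 1 January 2024 was a Monday

matches_raw = value == "monday"               # padding makes this comparison fail
matches_trimmed = value.rstrip() == "monday"  # same idea as RTRIM in the query
print(repr(value), matches_raw, matches_trimmed)
```

Since no row ever equals the unpadded string, the WHERE clause filters everything out and avg returns null.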
I would rather use a different date format. The output of to_char with DAY is NLS-dependent, which is bad (for instance, your software will fail in Spain). D returns the day of the week as a number, so in your case the query should look like this (which number maps to Monday depends on the NLS_TERRITORY setting):
SELECT avg(volume)
FROM v_nem_rm16
WHERE to_char(day, 'd') = '1';