How to generate a month list in PostgreSQL?

I have a table A with a startdate column of type TIMESTAMP WITHOUT TIME ZONE. I need to write a query/function that generates a list of months from the MIN value of the column to the MAX value of the column.
For example:
startdate
2014-12-08
2015-06-16
2015-02-17
will generate a list of: (Dec-14,Jan-15,Feb-15,Mar-15,Apr-15,May-15,Jun-15)
How do I do that? I never used PostgreSQL to generate data that wasn't there... it always has been finding the correct data in the DB... any ideas how to do that? Is it doable in a query?

For people looking for an unformatted list of months:
select * from generate_series('2017-01-01', now(), '1 month')

You can generate sequences of data with the generate_series() function:
SELECT to_char(generate_series(min, max, '1 month'), 'Mon-YY') AS "Mon-YY"
FROM (
SELECT date_trunc('month', min(startdate)) AS min,
date_trunc('month', max(startdate)) AS max
FROM a) sub;
This generates a row for every month, in a pretty format. If you want to have it like a list, you can aggregate them all in an outer query:
SELECT string_agg("Mon-YY", ', ') AS "Mon-YY list"
FROM (
-- Query above
) subsub;
SQLFiddle here
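For readers who want to check the expected output outside the database, here is a minimal Python sketch of the same idea: truncate the min and max dates to their month and step one month at a time. The function name month_list and the hard-coded month abbreviations are assumptions made for illustration, not part of the answer above.

```python
from datetime import date

# English month abbreviations, hard-coded so the output does not depend on locale.
ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_list(start: date, end: date) -> list[str]:
    """Return 'Mon-YY' labels for every month from start's month to end's month."""
    labels = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        labels.append(f"{ABBR[month - 1]}-{year % 100:02d}")
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return labels


# The sample startdate values from the question.
startdates = [date(2014, 12, 8), date(2015, 6, 16), date(2015, 2, 17)]
months = month_list(min(startdates), max(startdates))
print(", ".join(months))
```

Comparing (year, month) pairs plays the role of date_trunc('month', ...): the day of the month never influences which months are included.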

Related

SQL: Apply an aggregate result per day using window functions

Consider a time-series table that contains three fields time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports the total of the balance column for all days up to and including each given date.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above to the entire dataset up to and including the desired day. For example, for day 2009-01-31 the result is 97.13522530000000000000, and for day 2009-01-15, when we filter time as time < '2009-01-16 00:00:00', it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The differences between the result sets of the queries were caused by the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total via a window function after aggregating the data to day level. Since you aggregate with a single condition, the FILTER condition can be converted to a basic WHERE clause:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
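The two steps of that query (aggregate per day, then accumulate over the ordered days) can be sketched in plain Python to sanity-check results on a small sample. The rows below are hypothetical, not taken from the linked dataset.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal
from itertools import accumulate

# Hypothetical rows: (time, balance, is_spent_column)
rows = [
    (datetime(2009, 1, 9, 3, 0), Decimal("10.5"), None),
    (datetime(2009, 1, 9, 17, 30), Decimal("4.5"), "spent"),
    (datetime(2009, 1, 10, 8, 0), Decimal("2.0"), None),
    (datetime(2009, 1, 12, 12, 0), Decimal("7.5"), None),
]

# Step 1 (inner query): WHERE is_spent_column IS NULL, then SUM(balance) per day.
daily = defaultdict(Decimal)
for ts, balance, spent in rows:
    if spent is None:
        daily[ts.date()] += balance

# Step 2 (outer query): SUM(total_balance) OVER (ORDER BY daily).
days = sorted(daily)
running = list(zip(days, accumulate(daily[d] for d in days)))

for day, total in running:
    print(day, total)
```

Days without any qualifying row (2009-01-11 here) do not appear in the output, which matches the SQL version: neither generates rows for missing days.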

PrestoDB: select all dates between two dates

I need to build a report that provides some information for each date within a date interval.
I need to have it within a single query (can't create any functions or supporting tables).
How can I achieve that in PrestoDB?
Note: There are lots of vendor-specific solutions here, here and even here, but none of them satisfies my need, as they either don't work in Presto or use tables/functions.
To be more precise here is an example of query:
WITH ( query to select all dates between 2017.01.01 and 2018.01.01 ) AS dates
SELECT
date date,
count(*) number_of_orders
FROM dates dates
LEFT JOIN order order
ON order.created_at = dates.date
You can use the Presto SEQUENCE() function to generate a sequence of days as an array, and then use UNNEST to explode that array as a result set.
Something like this should work for you:
SELECT date_array AS DAY
FROM UNNEST(
SEQUENCE(
cast('2017-01-01' AS date),
cast('2018-01-01' AS date),
INTERVAL '1' DAY
)
) AS t1(date_array)
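To make the shape of the result concrete, here is a small Python sketch of the same pattern: build an inclusive sequence of days, then attach a per-day count so that days without orders still appear. The helper date_sequence and the sample order dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def date_sequence(start: date, stop: date, step: timedelta = timedelta(days=1)):
    """Yield dates from start to stop inclusive, one step apart."""
    current = start
    while current <= stop:
        yield current
        current += step


days = list(date_sequence(date(2017, 1, 1), date(2018, 1, 1)))

# Hypothetical order creation dates, standing in for order.created_at.
orders_created = [date(2017, 1, 1), date(2017, 1, 1), date(2017, 1, 3)]
counts = Counter(orders_created)

# One row per generated day; days without orders get a count of 0.
report = [(d, counts.get(d, 0)) for d in days]
print(report[:3])
```

Both endpoints are part of the sequence, so the interval from 2017-01-01 to 2018-01-01 yields 366 rows.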

Can I use the to_char function in a WHERE clause in a query with PL/SQL?

I am trying to find the average electricity volume for a certain day of the week with the following query:
SELECT avg(volume)
FROM v_nem_rm16
WHERE to_char(day, 'day') = 'monday';
Here v_nem_rm16 is a table, and volume and day are its columns. My query returns null whichever day value I use ('monday', 'tuesday', ...).
Is this query wrong?
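Actually, the 'DAY' format returns the day name padded with spaces on the right, so the comparison with 'monday' never matches and avg() over zero rows returns null.
If you use RTRIM to strip the padding, you avoid the null result.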
SELECT avg(volume)
FROM v_nem_rm16
WHERE RTRIM(to_char(day, 'day')) = 'monday';
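I would rather use a different date format. to_char with DAY is NLS-dependent, which is bad (for instance, your software will fail in Spain). D returns the day of the week as a number, so in your case the query should look like the one below. Note that which number D assigns to Monday depends on the NLS_TERRITORY setting, so check the value for your environment.
SELECT avg(volume)
FROM v_nem_rm16
WHERE to_char(day, 'd') = '1';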
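The padding problem is easy to reproduce outside the database. The Python sketch below only emulates the behaviour described above (day names padded to the width of the longest name); padded_day is a hypothetical helper, not a database function.

```python
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]

# Width of the longest English day name ("wednesday"), i.e. 9 characters.
WIDTH = max(len(name) for name in DAY_NAMES)


def padded_day(name: str) -> str:
    """Emulate a day name that is blank-padded on the right to a fixed width."""
    return name.ljust(WIDTH)


value = padded_day("monday")
print(repr(value))                 # the padded string, trailing spaces included
print(value == "monday")           # plain comparison fails because of the padding
print(value.rstrip() == "monday")  # trimming the right side makes it match
```

Only the longest name survives a plain comparison unchanged, which is why the original query could appear to work for one day and fail for the others.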

PostgreSQL "nested"? distincts and count

I need to get the count of the distinct names per hour in one query in PostgreSQL 9.1
The relevant columns(generalized for question) in my table are:
occurred timestamp with time zone and
name character varying(250)
And the table name for the sake of the question is just table
The occurred timestamps will all be within a midnight to midnight(exclusive) range for one day. So far my query looks like:
'SELECT COUNT(DISTINCT ON (name)) FROM table'
It would be nice if I could get the output formatted as a list of 24 integers(one for each hour of the day), the names aren't required to be returned.
If I understand correctly what you want, you can write:
SELECT EXTRACT(HOUR FROM occurred),
COUNT(DISTINCT name)
FROM ...
WHERE ...
GROUP
BY EXTRACT(HOUR FROM occurred)
ORDER
BY EXTRACT(HOUR FROM occurred)
;
SELECT date_trunc('hour', occurred) AS hour_slice
,count(DISTINCT name) AS name_ct
FROM mytable
GROUP BY 1
ORDER BY 1;
DISTINCT ON is a different feature.
date_trunc() gives you a count for every distinct hour, while EXTRACT counts per hour-of-day across longer periods of time. The two results do not add up, because the sum of multiple count(DISTINCT x) values is equal to or greater than a single count(DISTINCT x) over the same rows.
You want this by hour:
select extract(hour from occurred) as hr, count(distinct name)
from table t
group by extract(hour from occurred)
order by 1
This assumes there is data for only one day. Otherwise, hours from different days would be combined. To get around this, you would need to include date information as well.
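As a quick illustration of the per-hour grouping, and of why the hourly distinct counts can add up to more than the overall distinct count, here is a minimal Python sketch on hypothetical data for a single day. It also produces the list of 24 integers the question asked for.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (occurred, name), all within one day.
events = [
    (datetime(2012, 5, 1, 9, 5), "alice"),
    (datetime(2012, 5, 1, 9, 40), "alice"),
    (datetime(2012, 5, 1, 9, 50), "bob"),
    (datetime(2012, 5, 1, 10, 15), "alice"),
]

# GROUP BY EXTRACT(HOUR FROM occurred), collecting the distinct names per hour.
names_by_hour = defaultdict(set)
for occurred, name in events:
    names_by_hour[occurred.hour].add(name)

# One integer per hour of the day; hours without rows count as 0.
counts = [len(names_by_hour.get(hour, ())) for hour in range(24)]
print(counts)

# alice is counted in hour 9 and again in hour 10, so the hourly counts
# sum to more than the number of distinct names over the whole day.
overall_distinct = len({name for _, name in events})
print(sum(counts), overall_distinct)
```

Unlike the SQL queries, this sketch emits a 0 for hours with no rows; in SQL you would need to join against a generated series of hours to get the same effect.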

How to find sum of a column between a given date range, where the table has only start date and end date

I have a postgresql table userDistributions like this :
user_id, start_date, end_date, project_id, distribution
I need to write a query that, for a given date range and user id, outputs the sum of all distributions for each day for that user.
So the output should look like this for the input '2-2-2012' - '2-4-2012' and some user id:
Date SUM(Distribution)
2-2-2012 12
2-3-2012 15
2-4-2012 34
A user has distribution in many projects, so I need to sum the distributions in all projects for each day and output that sum against that day.
My problem is what I should group by. If I had a single date field (instead of start_date and end_date), I could just write something like
select date, SUM(distributions) from userDistributions group by date;
but in this case I am stumped as what to do. Thanks for the help.
Use generate_series to produce your dates, something like this:
select dt.d::date, sum(u.distributions)
from userdistributions u
join generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') as dt(d)
on dt.d::date between u.start_date and u.end_date
group by dt.d::date
Your date format is ambiguous, so I guessed while converting it to ISO 8601.
This is much like @mu's answer.
However, to cover days with no matches you should use LEFT JOIN:
SELECT d.d::date, sum(u.distributions) AS dist_sum
FROM generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') AS d(d)
LEFT JOIN userdistributions u ON d.d::date BETWEEN u.start_date AND u.end_date
GROUP BY 1
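The join condition (each generated day matched against every row whose start_date/end_date range contains it) can be sketched in Python as follows. The rows and the helper daily_sums are hypothetical; the sketch also filters by user id, which the question asks for but the queries above leave out.

```python
from datetime import date, timedelta

# Hypothetical rows: (user_id, start_date, end_date, project_id, distribution)
rows = [
    (1, date(2012, 2, 1), date(2012, 2, 3), 10, 12),
    (1, date(2012, 2, 3), date(2012, 2, 4), 11, 3),
    (2, date(2012, 2, 2), date(2012, 2, 4), 10, 99),
]


def daily_sums(rows, user_id, first: date, last: date) -> dict:
    """Sum distribution per day over all rows of user_id whose range covers the day."""
    result = {}
    day = first
    while day <= last:
        result[day] = sum(
            dist
            for uid, start, end, _project, dist in rows
            if uid == user_id and start <= day <= end
        )
        day += timedelta(days=1)
    return result


sums = daily_sums(rows, 1, date(2012, 2, 2), date(2012, 2, 5))
for day, total in sums.items():
    print(day, total)
```

A day with no matching rows (2012-02-05 here) shows up with 0 in this sketch; with the LEFT JOIN query the row also appears, but sum() returns NULL unless you wrap it in COALESCE.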