I am trying to group the rows in a table fortnightly, but can't seem to work out how to do it, especially as the date_part function does not have a 'fortnight' keyword argument.
This is what I have so far:
CREATE TABLE foo(
dt DATE NOT NULL,
f1 REAL NOT NULL,
f2 REAL NOT NULL,
f3 REAL NOT NULL,
f4 REAL NOT NULL
);
SELECT AVG((f1+f2+f3+f4)/4) as fld_avg FROM
(
SELECT date_part('year', dt) AS year_part,
date_part('fortnight', dt) AS fortnight_part,
f1, f2, f3, f4
FROM foo
WHERE dt >= date_trunc('day', NOW() - '3 month')
) foo
GROUP BY year_part, fortnight_part
How may I rewrite (or modify) the query above so as to group data fortnightly?
Basic idea
What we need to do is take intervals of 14 consecutive days and map them to unique buckets, then group by those buckets. The buckets can be of any type (int, char, timestamp), as long as each fortnight gets a unique value.
Division
A simple way to accomplish this is division. Divide by 14 days and truncate the result to date precision.
For example, we can extract the number of seconds since 1970-01-01, the UNIX epoch, and divide by the number of seconds in a fortnight: 14 * 24 * 60 * 60 = 14 * 86400 = 1209600. (I'll use Vao Tsun's example data)
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT (EXTRACT(EPOCH FROM d)::int/86400)/14 fortnight FROM c
which yields fortnights since 1970-01-01 (a Thursday):
fortnight
-----------
1251
1252
1254
1254
(4 rows)
The integer values we get represent the number of fortnights since 1970-01-01, but we don't have to care about that. The important thing is that each value uniquely identifies a fortnight.
Since 1970-01-01 was a Thursday, all these fortnights start on a Thursday. We might want to move the starting point of our fortnight to a different day of the week (e.g. Monday) by adding an offset:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT (EXTRACT(EPOCH FROM d)::int/86400 + 4)/14 fortnight FROM c
By adding four days to Thursday we end up at Monday.
If you would rather have fortnights relative to the beginning of the year, instead of some arbitrary absolute date such as 1970-01-01, use the day of the year instead:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT EXTRACT(year FROM d) * 26 + EXTRACT(doy FROM d)::int/14 AS fortnight FROM c;
which yields
fortnight
-----------
52467
52468
52469
52470
(4 rows)
We need to multiply the extracted year by 26, because there are 26.1… fortnights in a year. Strictly speaking, doy/14 can reach 26 at the very end of a long year (day 365 or 366), which collides with bucket 0 of the following year; multiplying by 27 avoids that.
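A collision-free variant (my sketch, not part of the output above) simply multiplies by 27:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT EXTRACT(year FROM d)::int * 27 + EXTRACT(doy FROM d)::int/14 AS fortnight FROM c;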
Truncation
Instead of division another approach is truncation. We map each day of a specific fortnight to the first timestamp of that fortnight.
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT d - make_interval(secs => EXTRACT(EPOCH FROM d)::int % (86400 * 14)) AS fortnight FROM c;
which yields
fortnight
---------------------
2017-12-14 00:00:00
2017-12-28 00:00:00
2018-01-25 00:00:00
2018-01-25 00:00:00
(4 rows)
This might seem a bit more complicated, but it has benefits. The result is still a date/time type, and other code does not need to know that we used fortnights.
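As an aside, applied to the original question this could look as follows (a sketch only, assuming the foo table from the question):
-- Group foo fortnightly by truncating each dt to the start of its fortnight:
SELECT (dt - make_interval(secs => EXTRACT(EPOCH FROM dt)::int % (86400 * 14)))::date AS fortnight
     , AVG((f1 + f2 + f3 + f4) / 4) AS fld_avg
FROM foo
WHERE dt >= date_trunc('day', NOW() - interval '3 months')
GROUP BY 1
ORDER BY 1;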
Again, instead of absolute fortnights, we can calculate this with respect to the beginning of the year:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT d - make_interval(days => (EXTRACT(doy FROM d)::int - 1) % 14) AS fortnight FROM c;
which yields
fortnight
---------------------
2017-12-17 00:00:00
2017-12-31 00:00:00
2018-01-15 00:00:00
2018-01-29 00:00:00
(4 rows)
The result is of type timestamp; you might want date instead. This can be addressed by casting:
(d - make_interval(days => (EXTRACT(doy FROM d)::int - 1) % 14))::date
or subtracting int instead of interval from date:
d - ((EXTRACT(doy FROM d)::int - 1) % 14)
There are many more possibilities. With this scheme, we can calculate the fortnight (or any other interval) with respect to the beginning of the month, some arbitrary date, etc.
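For example, fortnights anchored at an arbitrary reference date could look like this (my sketch; 1996-01-01 is an assumed anchor, chosen because it is a Monday; the anchor must not be later than the earliest date, or the % turns negative):
-- Truncate each date to the start of a fortnight counted from the anchor:
WITH c(d) AS (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
SELECT d - (d - DATE '1996-01-01') % 14 AS fortnight_start FROM c;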
Update
A fortnight is a two-week period: one even week and one odd week, e.g. weeks 1 and 2, 3 and 4, 5 and 6.
Closer look: 2 is even, mod(2,2)=0, and 1 is odd, mod(1,2)=1;
4 is even, mod(4,2)=0, and 3 is odd, mod(3,2)=1;
6 is even, mod(6,2)=0, and 5 is odd, mod(5,2)=1.
Thus you can assume that each odd week number in the year leaves remainder 1 when divided by two, and the next (even) week number leaves remainder 0.
The general idea is to use the sequential number of the week within the year. To avoid Jan 1st always being week 1 while Dec 31st can be week 53 (and thus two odd weeks in a row), I use IW:
week number of ISO 8601 week-numbering year (01-53; the first Thursday
of the year is in week 1)
Then, since an odd week number is always followed by an even one, we can divide all time into parts of two weeks: even + odd.
SQL Example:
o=# with c(d) as (values('2017.12.21'::date),('2017.12.31'),('2018.01.26'),('2018.02.01'))
select d,to_char(d,'IW'),right(to_char(d,'IW'),1)::int,mod(right(to_char(d,'IW'),1)::int, 2) from c;
d | to_char | right | mod
------------+---------+-------+-----
2017-12-21 | 51 | 1 | 1
2017-12-31 | 52 | 2 | 0
2018-01-26 | 04 | 4 | 0
2018-02-01 | 05 | 5 | 1
(4 rows)
mod is either 0 or 1 - group by this column
https://www.postgresql.org/docs/current/static/functions-math.html
https://www.postgresql.org/docs/current/static/functions-formatting.html
Of course, you would need to add an outer join on generate_series if you want data without gaps...
I'm posting another answer to explain how I was wrong and why my "smart-n-neat" way failed...
the schema build and queries are at:
https://www.db-fiddle.com/f/j5i2Td8CvxCVXQQYePKzCe/0
the first (and correct) query:
select distinct w2, avg(c) over (partition by w2)
from d
join generate_series('2016.11.28'::date,'2017.02.23'::date,'2 weeks'::interval) w2
on gs >= w2 and gs < w2 + '2 weeks'::interval
order by w2;
is a long, simple, and correct approach. The idea is to join on a two-week interval. It works, and is reliable and all good.
Now the second query:
select distinct div(to_char(gs,'IW')::int,2), min(gs) over w, avg(c) over w
from d
window w as (partition by div(to_char(gs,'IW')::int,2))
order by min;
is much shorter, neater, and smarter, yet it has a huge limitation and is unusable. Here's why:
My approach splits the next-to-last two-week interval into two parts: the last week of 2016 and the first week of 2017, thus cutting that result in half. If you multiply the sum of averages for those two weeks by a half, the results of both queries match. Alas, introducing CASE WHEN logic for the edge weeks of the year makes the neat solution heavy and full of overhead, and thus the very point of it is lost.
TL;DR: the neat and lightweight solution works only within a single year, farther than two weeks from the end or start of the year, and only if our fortnightly interval starts on a Monday.
Now the idea behind the lightweight solution: div(2,2)=1 and div(3,2)=1, so you can divide the year into intervals of two weeks and use that for grouping.
Also, I deliberately did not take the most recent New Year switch, because 2018 Jan 1 is a Monday, so IW is the same as WW, which usually is not the case.
Lastly, my first answer with odd and even weeks is not viable at all. It divides the year not into two-week intervals, but rather into two parts: even weeks and odd weeks... I deceived myself with a "something close" idea and worked with the remainder, while I should have done the opposite and used the whole value of the division...
Related
There is a column with dates. I would like to calculate the number of each weekday (Monday to Sunday) from those dates to the present date. On Stack Overflow and elsewhere, I found answers that involved creating functions; I was hoping there is some built-in function that would do it. I found another solution here, which mentions DATEPART('day', start - stop) AS days, but that didn't work. If this is a recent addition to PostgreSQL, it won't work for me, because the tool we use at work doesn't accept some of the recent features (for example, PostgreSQL now accepts negative indexing but the tool doesn't).
What I want:
 start_date | day_of_week | no_of_days
------------+-------------+------------
 2022-04-01 |           1 |         10
 2022-04-01 |           2 |          9
 2022-05-15 |           2 |          3
 2022-06-01 |           5 |          1
start_date is the column of dates; subtracting it from current_date should give the number of each weekday between those two days. There were 10 Mondays between April 1st 2022 and June 6th 2022 (today), and that's the number I want for each day of the week.
How can I achieve this in PostgreSQL? I am on version 12.8.
This "simple" but optimized solution counts the number of occurrences for every weekday in the interval between start_date and the current date:
WITH cte(start_date) AS (
VALUES
('2022-04-01'::date)
, ('2022-05-15')
, ('2022-06-01')
)
SELECT c.start_date, sub.dow, sub.no_of_days
FROM cte c
CROSS JOIN LATERAL (
SELECT dow, COALESCE(ct, 0) AS no_of_days
FROM (
SELECT EXTRACT('isodow' FROM g)::int AS dow, count(*) AS ct
FROM generate_series(start_date, current_date, interval '1 day') g
GROUP BY 1
) g
RIGHT JOIN generate_series(1, 7) dow USING (dow)
) sub
ORDER BY 1, 2;
db<>fiddle here
The upper bound (current_date) is included.
Every weekday is included, even when no_of_days is 0.
For very old dates (resulting in long intervals), an arithmetic solution will be cheaper than simply counting generated days. A bit more challenging, but not that hard.
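Such an arithmetic solution could look like this (my sketch, not part of the original answer): each weekday occurs floor(total_days / 7) times, plus one more if it falls within the leading total_days % 7 remainder days.
-- Sketch: per-weekday counts without generating a row for every day
WITH cte(start_date) AS (
   VALUES
     ('2022-04-01'::date)
   , ('2022-05-15')
   , ('2022-06-01')
)
SELECT c.start_date, dow.dow
     , (current_date - c.start_date + 1) / 7
     + CASE WHEN (dow.dow - EXTRACT('isodow' FROM c.start_date)::int + 7) % 7
                 < (current_date - c.start_date + 1) % 7
            THEN 1 ELSE 0 END AS no_of_days
FROM cte c
CROSS JOIN generate_series(1, 7) AS dow(dow)
ORDER BY 1, 2;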
I am using the DATEDIFF function to calculate the difference between my two timestamps.
payment_time = 2021-10-29 07:06:32.097332
trigger_time = 2021-10-10 14:11:13
What I have written is : date_diff('minute',payment_time,trigger_time) <= 15
I basically want the count of users who paid within 15 mins of the triggered time
thus I have also done count(s.user_id) as count
However, it returns a count of 1 even in the above case: the minute parts are within 15, but the dates, October 10th and October 29th, are 19 days apart, so it should return 0, i.e. not count this row in my query.
How do I compare the dates in my both columns and then count users who have paid within 15 mins?
This also works to calculate the minutes between two timestamps: it first finds the interval (by subtraction), then converts that to seconds (by extracting the EPOCH), and divides by 60:
extract(epoch from (payment_time-trigger_time))/60
In PostgreSQL, I prefer to subtract the two timestamps from each other, and extract the epoch from the resulting interval:
Like here:
WITH
indata(payment_time,trigger_time) AS (
SELECT TIMESTAMP '2021-10-29 07:06:32.097332',TIMESTAMP '2021-10-10 14:11:13'
UNION ALL SELECT TIMESTAMP '2021-10-29 00:00:14' ,TIMESTAMP '2021-10-29 00:00:00'
)
SELECT
EXTRACT(EPOCH FROM payment_time-trigger_time) AS epdiff
, (EXTRACT(EPOCH FROM payment_time-trigger_time) <= 15) AS filter_matches
FROM indata;
-- out epdiff | filter_matches
-- out ----------------+----------------
-- out 1616119.097332 | false
-- out 14.000000 | true
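Note that the demo filter above compares seconds; for the question's "within 15 minutes" you would compare against 15 * 60 seconds. A sketch (the table alias s and the column names are assumed from the question):
-- Count users who paid within 15 minutes of the trigger;
-- BETWEEN 0 AND ... also excludes payments made before the trigger.
SELECT count(s.user_id) AS count
FROM s
WHERE EXTRACT(EPOCH FROM s.payment_time - s.trigger_time) BETWEEN 0 AND 15 * 60;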
I am wondering if there is a way to convert arbitrary string values (such as the examples below) to something that can be interpreted as a timestamp, perhaps in days.
 Dropdown_values      | Desired output (days)
----------------------+-----------------------
 12 weeks             |                    84
 1 Week 4 Days        |                    11
 1 Year               |                   365
 1 Year 1 Week 2 Days |                   374
The idea I had was to split out the values, since they are all separated by spaces, and then do the addition in a separate column. Are there other (better) ways to do this? Thank you.
To expand on my comment as an answer:
select extract(epoch from '12 Weeks'::interval)/86400;              -- 84
select extract(epoch from '1 Week 4 Days'::interval)/86400;        -- 11
select extract(epoch from '1 Year 1 Week 2 Days'::interval)/86400; -- 374.25
The above is how I usually deal with this sort of thing: extract the epoch value from the interval and then divide by the number of seconds in a day. It would be a good idea to read the docs on interval input and interval output to understand how an interval is constructed and returned, and the assumptions involved. Note: the queries return a float value, not a timestamp; a value like 84 cannot be a timestamp. You could turn it back into an interval, though: 84 * '1 day'::interval yields 84 days. If at all possible, it is a good idea to store data as actual timestamps (start and end) and derive intervals from those.
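For illustration, the whole conversion in one query might look like this (a sketch; the CTE stands in for the real dropdown column):
-- Convert the dropdown strings to a number of days in one pass:
WITH dropdown(val) AS (
   VALUES ('12 weeks'), ('1 Week 4 Days'), ('1 Year'), ('1 Year 1 Week 2 Days')
)
SELECT val, extract(epoch from val::interval)/86400 AS days
FROM dropdown;
-- Note: '1 Year' yields 365.25 here, because extracting the epoch from an
-- interval given in years assumes 365.25 days per year.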
I am working with an existing E-commerce database. Actually, this process is usually done in Excel, but we want to try it directly with a query in PostgreSQL (version 10.6).
We define as an active customer a person who has bought at least once within 1 year. This means, if I analyze week 22 in 2020, an active customer will be the one that has bought at least once since week 22, 2019.
I want the output for each week of the year (2020). Basically what I need is ...
select
email,
orderdate,
id
from
orders_table
where
paid = true;
|------------------|---------------------|------------|
| email            | orderdate           | id         |
|------------------|---------------------|------------|
| email1@email.com | 2020-06-02 05:04:32 | Order-2736 |
|------------------|---------------------|------------|
I can't create new tables. And I would like to see the output like this:
Year| Week | Active customers
2020| 25 | 6978
2020| 24 | 3948
Depending on whether there are year and week columns, you can use an OVER (PARTITION BY ...) window with extract:
SELECT DISTINCT
    extract(year FROM orderdate) AS year,
    extract(week FROM orderdate) AS week,
    count(*) OVER (PARTITION BY extract(year FROM orderdate),
                                extract(week FROM orderdate)) AS customer_count_in_week
FROM ordertable
WHERE paid = true;
Which should bucket all orders by year and week, thus showing the total count per week in a year where paid is true.
references:
https://www.postgresql.org/docs/9.1/tutorial-window.html
https://www.postgresql.org/docs/8.1/functions-datetime.html
if I analyze week 22 in 2020, an active customer will be the one that has bought at least once since week 22, 2019.
Problems on your side
This method has some corner case ambiguities / issues:
Do you include or exclude "week 22 in 2020"? (I exclude it below to stay closer to "a year".)
A year can have 52 or 53 full weeks. Depending on the current date, the calculation is based on 52 or 53 weeks, causing a possible bias of almost 2 %!
If you start the time range on "the same date last year", then the margin of error is only 1 / 365 or ~ 0.3 %, due to leap years.
A fixed "period of 365 days" (or 366) would eliminate the bias altogether.
Problems on the SQL side
Unfortunately, window functions do not currently allow the DISTINCT keyword (for good reasons). So something of the form:
SELECT count(DISTINCT email) OVER (ORDER BY year, week
GROUPS BETWEEN 52 PRECEDING AND 1 PRECEDING)
FROM ...
.. triggers:
ERROR: DISTINCT is not implemented for window functions
The GROUPS keyword was only added in Postgres 11 (you are on 10.6) and would otherwise be just what we need.
What's more, that frame definition wouldn't even be exact, since the number of weeks to consider is not always 52, as discussed above.
So we have to roll our own.
Solution
The following simply generates all weeks of interest, and computes the distinct count of customers for each. Simple, except that date math is never entirely simple. But, depending on details of your setup, there may be faster solutions. (I had several other ideas.)
The time range for which to report may change. Here is an auxiliary function to generate weeks of a given year:
CREATE OR REPLACE FUNCTION f_weeks_of_year(_year int)
RETURNS TABLE(year int, week int, week_start timestamp)
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
ROWS 52 COST 10 AS
$func$
SELECT _year, d.week::int, d.week_start
FROM generate_series(date_trunc('week', make_date(_year, 01, 04)::timestamp) -- first day of first week
, LEAST(date_trunc('week', localtimestamp), make_date(_year, 12, 28)::timestamp) -- latest possible start of week
, interval '1 week') WITH ORDINALITY d(week_start, week)
$func$;
Call:
SELECT * FROM f_weeks_of_year(2020);
It returns 1 row per week, but stops at the current week for the current year. (Empty set for future years.)
The calculation is based on these facts:
The first ISO week of the year always contains January 04.
The last ISO week cannot start after December 28.
Actual week numbers are computed on the fly using WITH ORDINALITY. See:
PostgreSQL unnest() with element number
As an aside, I stick to timestamp and avoid timestamptz for this purpose. See:
Generating time series between two dates in PostgreSQL
The function also returns the timestamp of the start of the week (week_start), which we don't need for the problem at hand. But I left it in to make the function more useful in general.
That makes the main query simpler:
WITH weekly_customer AS (
SELECT DISTINCT
EXTRACT(YEAR FROM orderdate)::int AS year
, EXTRACT(WEEK FROM orderdate)::int AS week
, email
FROM orders_table
WHERE paid
AND orderdate >= date_trunc('week', timestamp '2019-01-04') -- max range for 2020!
ORDER BY 1, 2, 3 -- optional, might improve performance
)
SELECT d.year, d.week
, (SELECT count(DISTINCT email)
FROM weekly_customer w
WHERE (w.year, w.week) >= (d.year - 1, d.week) -- row values, see below
AND (w.year, w.week) < (d.year , d.week) -- exclude current week
) AS active_customers
FROM f_weeks_of_year(2020) d; -- (year int, week int, week_start timestamp)
db<>fiddle here
The CTE weekly_customer folds to unique customers per calendar week once, as duplicate entries are just noise for our calculation. It's used many times in the main query. The cut-off condition is based on Jan 04 once more. Adjust to your actual reporting period.
The actual count is done with a lowly correlated subquery. Could be a LEFT JOIN LATERAL ... ON true instead. See:
What is the difference between LATERAL and a subquery in PostgreSQL?
Using row value comparison to make the range definition simple. See:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
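For completeness, the fixed "period of 365 days" mentioned at the top could look like this (a sketch, not part of the original solution; it also puts the otherwise unused week_start column to work):
-- Count distinct customers in a fixed 365-day window before each week's start:
SELECT d.year, d.week
     , (SELECT count(DISTINCT o.email)
        FROM   orders_table o
        WHERE  o.paid
        AND    o.orderdate >= d.week_start - interval '365 days'
        AND    o.orderdate <  d.week_start) AS active_customers
FROM f_weeks_of_year(2020) d;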
I am using Oracle's to_char() function to convert a date to a week number (1-53):
select pat_id,
pat_enc_csn_id,
contact_date,
to_char(contact_date,'ww') week,
...
the 'ww' switch gives me these values for dates in January of this year:
Date Week
1-Jan-10 1
2-Jan-10 1
3-Jan-10 1
4-Jan-10 1
5-Jan-10 1
6-Jan-10 1
7-Jan-10 1
8-Jan-10 2
9-Jan-10 2
10-Jan-10 2
11-Jan-10 2
12-Jan-10 2
a quick look at the calendar indicates that these values should be:
Date Week
1-Jan-10 1
2-Jan-10 1
3-Jan-10 2
4-Jan-10 2
5-Jan-10 2
6-Jan-10 2
7-Jan-10 2
8-Jan-10 2
9-Jan-10 2
10-Jan-10 3
11-Jan-10 3
12-Jan-10 3
if I use the 'iw' switch instead of 'ww', the outcome is less desirable:
Date Week
1-Jan-10 53
2-Jan-10 53
3-Jan-10 53
4-Jan-10 1
5-Jan-10 1
6-Jan-10 1
7-Jan-10 1
8-Jan-10 1
9-Jan-10 1
10-Jan-10 1
11-Jan-10 2
12-Jan-10 2
Is there another Oracle function that will calculate weeks as I would expect or do I need to write my own?
EDIT
I'm trying to match the logic used by Crystal Reports. Each full week starts on a Sunday; the first week of the year starts on whichever day is represented by January 1st (e.g. in 2010, January 1st is a Friday).
When using IW, Oracle follows the ISO 8601 standard for week numbers (see http://en.wikipedia.org/wiki/ISO_8601). That is the same standard as the one we generally use here in Europe.
Your problem is also mentioned on the Oracle forum: http://forums.oracle.com/forums/thread.jspa?threadID=947291 and http://forums.oracle.com/forums/message.jspa?messageID=3318715#3318715. Maybe you can find a solution there.
I know this is old, but still a common question.
This should give you the correct results in the smallest amount of effort:
select pat_id,
pat_enc_csn_id,
contact_date,
to_char(contact_date + 1,'IW') week,
...
Since it looks like you are using your own special definition of the week number, you'll need to write your own function.
It might be helpful to know that NLS_TERRITORY affects the day on which a week starts, as used by the D format model.
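A quick illustration of that territory dependence (a sketch; 2010-01-04 is a Monday):
-- 'D' for the same Monday differs by NLS_TERRITORY:
ALTER SESSION SET NLS_TERRITORY = 'AMERICA';
SELECT to_char(DATE '2010-01-04', 'D') FROM dual;  -- 2 (weeks start on Sunday)
ALTER SESSION SET NLS_TERRITORY = 'GERMANY';
SELECT to_char(DATE '2010-01-04', 'D') FROM dual;  -- 1 (weeks start on Monday)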
see also:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements004.htm#SQLRF00210
and
http://www.adp-gmbh.ch/ora/sql/to_char.html
Based on this question, How do I calculate the week number given a date?, I wrote the following Oracle logic:
CASE
--if [date field]'s day-of-week (e.g. Monday) is earlier than 1/1/YYYY's day-of-week
WHEN to_char(to_date('01/01/' || to_char([date field],'YYYY'),'mm/dd/yyyy'), 'D') - to_char([date field], 'D') > 1 THEN
--adjust the week
trunc(to_char([date field], 'DDD') / 7) + 1 + 1 --'+ 1 + 1' used for clarity
ELSE trunc(to_char([date field], 'DDD') / 7) + 1
END calendar_week
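As a quick sanity check of that logic against the question's January 2010 expectations (a sketch; [date field] replaced with a literal, and an NLS_TERRITORY where Sunday is day 1, e.g. AMERICA, is assumed):
-- 2010-01-10 is a Sunday; the question expects week 3:
SELECT CASE
         WHEN to_char(to_date('01/01/' || to_char(d, 'YYYY'), 'mm/dd/yyyy'), 'D')
              - to_char(d, 'D') > 1
         THEN trunc(to_char(d, 'DDD') / 7) + 2
         ELSE trunc(to_char(d, 'DDD') / 7) + 1
       END AS calendar_week   -- returns 3
FROM (SELECT DATE '2010-01-10' AS d FROM dual);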