Convert Year+WeekOfYear+DayOfWeek to a date - sql

I have date values identified by a year, the week number within that year and the weekday and want to convert those into simple dates.
I couldn't find a function or another simple way to combine those, so I came up with a workaround using generate_series to get all dates in a range and JOIN the extracted values of those with my data:
SELECT data.*, days.d result
FROM ( VALUES (2017, 33, 3) ) data(d_year, d_week, d_weekday)
JOIN (
SELECT
-- the potential castdate
d::date d
-- year-week-dayofweek combination for JOINing
, EXTRACT('year' FROM d) d_year, EXTRACT('week' FROM d) d_week, EXTRACT('dow' FROM d) d_weekday
FROM generate_series('2015-01-01', '2019-12-31', INTERVAL '1day') AS days(d)
) days
USING(d_year, d_week, d_weekday)
Result is:
+--------+--------+-----------+------------+
| d_year | d_week | d_weekday | result |
+--------+--------+-----------+------------+
| 2017 | 33 | 3 | 16.08.2017 |
+--------+--------+-----------+------------+
While this works, this seems like overkill for such a simple task. Moreover, if one doesn't have a fixed range, this might not even work.
Is there an easier way to do this?

demo:db<>fiddle
You can use the to_date() function, which takes a date string and a format pattern as arguments. If the date string is '2017-33-3', this pattern identifies each date part:
'IYYY-IW-ID'
'ID': The tricky part is whether your week starts on Sunday or on Monday. This matters because the wrong choice shifts the week numbers in unexpected ways. Your expected output shows that you need 'ID' (ISO day of the week, weeks start on Monday) instead of 'D' (day of the week, weeks start on Sunday).
'IW': Because we are using the ISO day of the week, we need the ISO week of the year as well (instead of 'WW', week of year).
'IYYY': For the same reason as 'IW', use the ISO year instead of 'YYYY'.
More information about date patterns (especially the ISO thing): Postgres documentation
SELECT to_date(d_year || '-' || d_week || '-' || d_weekday, 'IYYY-IW-ID')
If you used the standard week pattern: 'YYYY-WW-D', your result would be 2017-08-13 (see fiddle)
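Of course, this also works without the - characters, but it is less readable, and the week then has to be zero-padded to two digits so that the parts stay unambiguous:
SELECT to_date(d_year::text || lpad(d_week::text, 2, '0') || d_weekday, 'IYYYIWID')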
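As a cross-check outside the database, the same ISO year / ISO week / ISO weekday triple can be resolved with Python's standard library. This is only a sketch for verifying expected results, assuming Python 3.8 or later (for date.fromisocalendar); the helper name iso_parts_to_date is made up for this example.

```python
from datetime import date

def iso_parts_to_date(iso_year: int, iso_week: int, iso_weekday: int) -> date:
    """Resolve an ISO year, ISO week and ISO weekday (1 = Monday .. 7 = Sunday)."""
    return date.fromisocalendar(iso_year, iso_week, iso_weekday)

# The row from the question: year 2017, week 33, weekday 3 (Wednesday).
print(iso_parts_to_date(2017, 33, 3))  # 2017-08-16
```

The result matches the 16.08.2017 shown in the question's output.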

Related

Compare date field with month and year in Postgres

I have a date field in one of my tables and the column name is from_dt. Now I have to compare a month and year combination against this from_dt field and check whether the month has already passed. The current database function uses separate conditions for the month and the year, but this is wrong as it will compare month and year separately. The current code is like this
SELECT bill_rate, currency FROM table_name WHERE
emp_id = employee_id_param
AND EXTRACT(MONTH FROM from_dt) <= month_param
AND EXTRACT(YEAR FROM from_dt) <= year_param
Now the from_dt field has the value 2021-10-11. If I give month_param as 01 and year_param as 2022, this condition will not work, because the month 10 is greater than the 1 I have given. Basically, I need to check whether 01-2022 (Jan 2022) is greater than or equal to 2021-10-01 (October 1st, 2021). It would be very helpful if someone could shed some light here.
If you just want to check whether one date is >= than another:
# select '2022-01-01'::date >= '2021-10-11'::date;
?column?
----------
t
If you want to restrict to year/month then:
select date_trunc('month','2022-01-01'::date) >= date_trunc('month', '2021-10-11'::date);
?column?
----------
t
Where the date_trunc components are:
select date_trunc('month','2022-01-01'::date) ;
date_trunc
------------------------
2022-01-01 00:00:00-08
select date_trunc('month','2021-10-11'::date) ;
date_trunc
------------------------
2021-10-01 00:00:00-07
See Postgres date_trunc for more information.
Assuming the given year_param and month_param are integers, you can use the make_date function to create the first day of that month, and date_trunc to get the first of the month from the table column. Then just compare those values (see date functions). So:
select bill_rate, currency
from table_name
where emp_id = employee_id_param
-- <= rather than =, because the question asks whether the month has been reached or passed
and date_trunc('month', from_dt) <=
make_date(year_param, month_param, 1);
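The reason the two separate conditions fail is that month and year have to be compared as one unit. A small Python sketch (not part of the database function; the name month_not_after is hypothetical) shows the difference using the values from the question:

```python
from datetime import date

def month_not_after(d: date, year: int, month: int) -> bool:
    """True if the month of d is the given year/month or an earlier one."""
    # Tuples compare element by element: the year decides first,
    # the month only breaks ties.
    return (d.year, d.month) <= (year, month)

from_dt = date(2021, 10, 11)

# Separate conditions, as in the original WHERE clause: month 10 <= 1 is False.
separate = from_dt.month <= 1 and from_dt.year <= 2022
# Combined comparison: (2021, 10) <= (2022, 1) is True.
combined = month_not_after(from_dt, 2022, 1)
print(separate, combined)  # False True
```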

Count distinct customers, active within a year, for every week of the year

I am working with an existing E-commerce database. Actually, this process is usually done in Excel, but we want to try it directly with a query in PostgreSQL (version 10.6).
We define as an active customer a person who has bought at least once within 1 year. This means, if I analyze week 22 in 2020, an active customer will be the one that has bought at least once since week 22, 2019.
I want the output for each week of the year (2020). Basically what I need is ...
select
email,
orderdate,
id
from
orders_table
where
paid = true;
|---------------------|-------------------|-----------------|
| email | orderdate | id |
|---------------------|-------------------|-----------------|
| email1#email.com |2020-06-02 05:04:32| Order-2736 |
|---------------------|-------------------|-----------------|
I can't create new tables. And I would like to see the output like this:
Year| Week | Active customers
2020| 25 | 6978
2020| 24 | 3948
Depending on whether there is a year and week column, you can use an OVER (PARTITION BY ...) clause with extract:
-- DISTINCT collapses the per-order rows to one row per year and week
SELECT DISTINCT
extract(year from orderdate) AS year,
extract(week from orderdate) AS week,
-- the OVER clause belongs directly after the aggregate it applies to
sum(1) OVER (PARTITION BY extract(year from orderdate),
extract(week from orderdate)) AS customer_count_in_week
FROM orders_table
WHERE paid = true;
Which should bucket all orders by year and week, thus showing the total count per week in a year where paid is true.
references:
https://www.postgresql.org/docs/9.1/tutorial-window.html
https://www.postgresql.org/docs/8.1/functions-datetime.html
if I analyze week 22 in 2020, an active customer will be the one that has bought at least once since week 22, 2019.
Problems on your side
This method has some corner case ambiguities / issues:
Do you include or exclude "week 22 in 2020"? (I exclude it below to stay closer to "a year".)
A year can have 52 or 53 full weeks. Depending on the current date, the calculation is based on 52 or 53 weeks, causing a possible bias of almost 2 %!
If you start the time range on "the same date last year", then the margin of error is only 1 / 365 or ~ 0.3 %, due to leap years.
A fixed "period of 365 days" (or 366) would eliminate the bias altogether.
Problems on the SQL side
Unfortunately, window functions do not currently allow the DISTINCT key word (for good reasons). So something of the form:
SELECT count(DISTINCT email) OVER (ORDER BY year, week
GROUPS BETWEEN 52 PRECEDING AND 1 PRECEDING)
FROM ...
.. triggers:
ERROR: DISTINCT is not implemented for window functions
The GROUPS keyword was only added in Postgres 11 and would otherwise be just what we need.
What's more, your odd frame definition wouldn't even work exactly, since the number of weeks to consider is not always 52, as discussed above.
So we have to roll our own.
Solution
The following simply generates all weeks of interest, and computes the distinct count of customers for each. Simple, except that date math is never entirely simple. But, depending on details of your setup, there may be faster solutions. (I had several other ideas.)
The time range for which to report may change. Here is an auxiliary function to generate weeks of a given year:
CREATE OR REPLACE FUNCTION f_weeks_of_year(_year int)
RETURNS TABLE(year int, week int, week_start timestamp)
LANGUAGE sql STABLE STRICT PARALLEL SAFE -- STABLE, not IMMUTABLE: the result depends on localtimestamp
ROWS 52 COST 10 AS
$func$
SELECT _year, d.week::int, d.week_start
FROM generate_series(date_trunc('week', make_date(_year, 01, 04)::timestamp) -- first day of first week
, LEAST(date_trunc('week', localtimestamp), make_date(_year, 12, 28)::timestamp) -- latest possible start of week
, interval '1 week') WITH ORDINALITY d(week_start, week)
$func$;
Call:
SELECT * FROM f_weeks_of_year(2020);
It returns 1 row per week, but stops at the current week for the current year. (Empty set for future years.)
The calculation is based on these facts:
The first ISO week of the year always contains January 04.
The last ISO week cannot start after December 28.
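These two facts can be checked mechanically. The following Python sketch is an independent check rather than part of the SQL solution; weeks_in_iso_year is a hypothetical helper name, and the range of years tested is arbitrary.

```python
from datetime import date

def weeks_in_iso_year(year: int) -> int:
    """Number of ISO weeks in a year: December 28 always lies in the last one."""
    return date(year, 12, 28).isocalendar()[1]

for y in range(1990, 2101):
    iso_year, iso_week, _ = date(y, 1, 4).isocalendar()
    # January 4 is always in ISO week 1 of its own year.
    assert (iso_year, iso_week) == (y, 1)
    # The week containing December 28 belongs to the same ISO year
    # and is week 52 or 53.
    assert date(y, 12, 28).isocalendar()[0] == y
    assert weeks_in_iso_year(y) in (52, 53)

print(weeks_in_iso_year(2019), weeks_in_iso_year(2020))  # 52 53
```

The 52 versus 53 result for 2019 and 2020 is also the source of the bias discussed earlier in this answer.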
Actual week numbers are computed on the fly using WITH ORDINALITY. See:
PostgreSQL unnest() with element number
Aside, I stick to timestamp and avoid timestamptz for this purpose. See:
Generating time series between two dates in PostgreSQL
The function also returns the timestamp of the start of the week (week_start), which we don't need for the problem at hand. But I left it in to make the function more useful in general.
Makes the main query simpler:
WITH weekly_customer AS (
SELECT DISTINCT
EXTRACT(YEAR FROM orderdate)::int AS year
, EXTRACT(WEEK FROM orderdate)::int AS week
, email
FROM orders_table
WHERE paid
AND orderdate >= date_trunc('week', timestamp '2019-01-04') -- max range for 2020!
ORDER BY 1, 2, 3 -- optional, might improve performance
)
SELECT d.year, d.week
, (SELECT count(DISTINCT email)
FROM weekly_customer w
WHERE (w.year, w.week) >= (d.year - 1, d.week) -- row values, see below
AND (w.year, w.week) < (d.year , d.week) -- exclude current week
) AS active_customers
FROM f_weeks_of_year(2020) d; -- (year int, week int, week_start timestamp)
db<>fiddle here
The CTE weekly_customer folds to unique customers per calendar week once, as duplicate entries are just noise for our calculation. It's used many times in the main query. The cut-off condition is based on Jan 04 once more. Adjust to your actual reporting period.
The actual count is done with a lowly correlated subquery. Could be a LEFT JOIN LATERAL ... ON true instead. See:
What is the difference between LATERAL and a subquery in PostgreSQL?
Using row value comparison to make the range definition simple. See:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
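The row value comparison behaves like tuple comparison in most programming languages, which makes the windowing logic easy to try out on a few rows. Below is a Python sketch with invented sample data; active_customers is a hypothetical name, and the rows merely stand in for the weekly_customer CTE.

```python
# (year, week, email) rows, standing in for the weekly_customer CTE (made-up data).
weekly_customer = [
    (2019, 10, "a@example.com"),
    (2019, 30, "b@example.com"),
    (2020, 5, "a@example.com"),
    (2020, 22, "c@example.com"),
]

def active_customers(rows, year, week):
    """Distinct emails from the same week of the previous year up to,
    but excluding, the given week."""
    lower, upper = (year - 1, week), (year, week)
    return len({email for y, w, email in rows if lower <= (y, w) < upper})

print(active_customers(weekly_customer, 2020, 22))  # 2
```

For week 22 of 2020 the purchase in week 10 of 2019 is too old and the purchase in week 22 itself is excluded, so two distinct customers remain.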

Presto how to find start date given week

I want to find start date from given ISO week (which can range from 1-53, Monday as starting day) and year using Presto SQL query.
i.e. year - 2020 and week - 2 should return 06/01/2020
Is there a built-in function for this?
Table structure:
select year, week from table1; -- returns year and week from table1
There's no direct way for constructing a date from a year + week (there is an issue for this: https://github.com/trinodb/trino/issues/2287), but you can achieve what you want with the date_parse function.
For example:
WITH data (year, week) AS (VALUES (2020, 2))
SELECT CAST(date_parse(CAST(year AS varchar) || ':' || CAST(week AS varchar), '%x:%v') AS date)
FROM data
produces:
_col0
------------
2020-01-06
(1 row)
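The expected value can be confirmed independently of Presto. A minimal Python sketch (3.8 or later), where iso_week_start is a name chosen for this example:

```python
from datetime import date

def iso_week_start(iso_year: int, iso_week: int) -> date:
    """Monday (ISO weekday 1) of the given ISO week."""
    return date.fromisocalendar(iso_year, iso_week, 1)

print(iso_week_start(2020, 2))  # 2020-01-06
```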
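Using DATE_ADD and MAKEDATE (MySQL functions, not Presto ones) you can get a date counted in whole weeks from January 1, which is not necessarily the Monday of the ISO week: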
select DATE_ADD(MAKEDATE(year, 1), INTERVAL (week-1) WEEK) as start_date from <table_name>;
Martin's answer is almost there. Instead of the calendar year, you should use year_of_week. With this change you still have to watch out for weeks that bleed into the previous or following year, i.e. the last days of the previous year or the first days of the next year.
year_of_week returns the year of the ISO week.
Here's an example:
WITH data (year_of_week, week) AS (VALUES (2020, 2))
SELECT CAST(date_parse(CAST(year_of_week AS varchar) || ':' || CAST(week AS varchar), '%x:%v') AS date)
FROM data
References:
https://prestodb.io/docs/current/functions/datetime.html#year_of_week
https://en.wikipedia.org/wiki/ISO_week_date
I think this does not work for partial weeks (when the first days of a new year still count as week 53 of the previous year):
Query failed (#20210624_142222_02859_zf75v): Cannot parse "2021:53": Value 53 for weekOfWeekyear must be in the range [1,52]
I tested it with the formula below. The date was Jan 2, 2021, which still belongs to week 53, while year() already returns 2021:
CAST(date_parse(CAST(year(date(service_order_creation_date)) AS varchar) || ':' || CAST(week(date(service_order_creation_date)) AS varchar), '%x:%v') AS date)
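The failure comes from pairing the calendar year with the ISO week number. A Python sketch (3.8 or later, used here only to illustrate the calendar arithmetic, not Presto itself) shows the mismatch for that date:

```python
from datetime import date

d = date(2021, 1, 2)
iso_year, iso_week, iso_weekday = d.isocalendar()

# Calendar year and ISO year differ for this date.
print(d.year, iso_year, iso_week)  # 2021 2020 53

# Pairing the ISO week with the ISO year resolves to the Monday of that week.
week_start = date.fromisocalendar(iso_year, iso_week, 1)
print(week_start)  # 2020-12-28

# Pairing it with the calendar year asks for a week 53 that 2021 does not have.
try:
    date.fromisocalendar(d.year, iso_week, 1)
    mismatch_is_valid = True
except ValueError:
    mismatch_is_valid = False
print(mismatch_is_valid)  # False
```

This is the same situation that year_of_week is meant to handle on the Presto side.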

How to query checking if a month is between two dates?

I'm looking for a query in PostgreSQL to check whether a month falls between two dates.
I know how to check whether a date is between two dates; Postgres also has a function for that.
Let's say I have a table a_table with these rows:
ID | start_date (timestamp) | end_date (timestamp)
1 | 2019-07-20 00:00 | 2020-03-20 00:00
2 | 2019-08-20 00:00 | 2020-08-30 00:00
I have to return the rows whose period from start_date to end_date includes a given month.
Let's say I have the month 2019-08.
So when I count
Select count(*) from a_table
Where [some where clause]
it returns 2 (the rows with ID 1 and ID 2).
And when I have the month 2020-01, it only returns ID 1.
You can use date range for this.
It's not clear to me what should happen if the start/end date in the table only covers part of a month.
If you only want to consider the full month, use the "contains" operator
select count(*)
from the_table
where daterange(start_date::date, end_date::date, '[]') @> daterange('2019-08-01'::date, '2019-09-01'::date, '[)');
The @> is the "contains" operator, which tests whether the left range (the values from the table) contains the right-hand range (the month you want to test). The month is given as a half-open interval, which means '2019-09-01' is excluded from it. The above would not consider rows that do not contain the full August.
If you want to include partial matches as well, use the "overlaps" operator && instead:
select count(*)
from the_table
where daterange(start_date::date, end_date::date, '[]') && daterange('2019-08-01'::date, '2019-09-01'::date, '[)');
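The difference between the two operators can be seen on the sample rows from the question. This Python sketch mimics the range semantics with plain date comparisons (closed table ranges, half-open month); contains_month and overlaps_month are hypothetical helper names, not PostgreSQL functions.

```python
from datetime import date, timedelta

# Sample rows from the question: id -> (start_date, end_date).
rows = {
    1: (date(2019, 7, 20), date(2020, 3, 20)),
    2: (date(2019, 8, 20), date(2020, 8, 30)),
}
month_start, next_month_start = date(2019, 8, 1), date(2019, 9, 1)

def contains_month(start: date, end: date) -> bool:
    """Like @>: the closed range [start, end] covers every day of the month."""
    last_day = next_month_start - timedelta(days=1)
    return start <= month_start and end >= last_day

def overlaps_month(start: date, end: date) -> bool:
    """Like &&: the closed range [start, end] shares at least one day with the month."""
    return start < next_month_start and end >= month_start

full = [i for i, (s, e) in rows.items() if contains_month(s, e)]
partial = [i for i, (s, e) in rows.items() if overlaps_month(s, e)]
print(full, partial)  # [1] [1, 2]
```

Row 2 starts on 2019-08-20, so it only overlaps August and does not contain it.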
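You can use to_date() in your WHERE clause to turn the month into a date, and compare it against start_date truncated to the month and end_date. Your WHERE clause would then look like this:
WHERE to_date('2019-08', 'YYYY-MM') BETWEEN date_trunc('month', start_date) AND end_date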
I would recommend writing this as:
WHERE end_date >= TO_DATE('2019-08', 'YYYY-MM') AND
start_date < TO_DATE('2019-08', 'YYYY-MM') + INTERVAL '1 MONTH'
That is, the period includes at least one day of the month, because it starts before the end of the month and ends after the start of the month.
In addition, this has no functions on the columns in the table. So, if an index is available on either column, then it can be used. If you define the start/end as a range, then that provides other opportunities for using indexes.

Choose active employees per month with dates formatted dd/mm/yyyy

I'm having a hard time explaining this through writing, so please be patient.
I'm working on a project in which I have to choose a month and a year to see all the employees who were active during that month. In my database I'm storing the dates when they started and when they finished in dd/mm/yyyy format.
So if I have an employee who worked for 4 months, e.g. from 01/01/2013 to 01/05/2013, I'll have him in four months. I'd need to make him appear in 4 tables (one for every active month) together with the other employees who were active during those months. In this case those will be January, February, March and April of 2013.
The problem is I have no idea how to make a query here or php processing to achieve this.
All I can think is something like (I'd run this query for every month, passing the year and month as argument)
pg_query= "SELECT employee_name FROM employees
WHERE month_and_year between start_date AND finish_date"
But that can't be done, mainly because month_and_year must be a column not a variable.
Ideas anyone?
UPDATE
Yes, I'm very sorry that I forgot to say I was using DATE as data type.
The easiest solution I found was to use EXTRACT
select * from employees where extract (year FROM start_date)>='2013'
AND extract (month FROM start_date)='06' AND extract (month FROM finish_date)<='07'
This gives me all records from June of 2013. You can of course substitute the literals with any variables you like.
There is no need to create a range to make an overlap:
select to_char(d, 'YYYY-MM') as "Month", e.name
from
(
select generate_series(
'2013-01-01'::date, '2013-05-01', '1 month'
)::date
) s(d)
inner join
employee e on
date_trunc('month', e.start_date)::date <= s.d
and coalesce(e.finish_date, 'infinity') > s.d
order by 1, 2
SQL Fiddle
If you want the months with no active employees to show then change the inner for a left join
Erwin, about your comment:
the second expression would have to be coalesce(e.finish_date, 'infinity') >= s.d
Notice the requirement:
So if I have an employee who worked for 4 months eg. from 01/01/2013 to 01/05/2013 I'll have him in four months
From that I understand that the last active day is indeed the previous day from finish.
If I use your "fix" I will include employee f in month 05 from my example. He finished in 2013-05-01:
('f', '2013-04-17', '2013-05-01'),
SQL Fiddle with your fix
Assuming that you really are not storing dates as character strings, but are only outputting them that way, then you can do:
SELECT employee_name
FROM employees
WHERE start_date <= <last date of month> and
(finish_date >= <first date of month> or finish_date is null)
If you are storing them in this format, then you can do some fiddling with years and months.
This version turns the "dates" into strings of the form "YYYYMM". Just express the month you want like this and you can do the comparison:
select employee_name
from employees e
where right(start_date, 4)||substr(start_date, 4, 2) <= 'YYYYMM' and
(right(finish_date, 4)||substr(finish_date, 4, 2) >= 'YYYYMM' or finish_date is null)
NOTE: the expression 'YYYYMM' is meant to be the month/year you are looking for.
First, you can generate multiple date intervals easily with generate_series(). To get lower and upper bound add an interval of 1 month to the start:
SELECT g::date AS d_lower
, (g + interval '1 month')::date AS d_upper
FROM generate_series('2013-01-01'::date, '2013-04-01', '1 month') g;
Produces:
d_lower | d_upper
------------+------------
2013-01-01 | 2013-02-01
2013-02-01 | 2013-03-01
2013-03-01 | 2013-04-01
2013-04-01 | 2013-05-01
The upper border of the time range is the first of the next month. This is on purpose, since we are going to use the standard SQL OVERLAPS operator further down. Quoting the manual at said location:
Each time period is considered to represent the half-open interval
start <= time < end [...]
Next, you use a LEFT [OUTER] JOIN to connect employees to these date ranges:
SELECT to_char(m.d_lower, 'YYYY-MM') AS month_and_year, e.*
FROM (
SELECT g::date AS d_lower
, (g + interval '1 month')::date AS d_upper
FROM generate_series('2013-01-01'::date, '2013-04-01', '1 month') g
) m
LEFT JOIN employees e ON (m.d_lower, m.d_upper)
OVERLAPS (e.start_date, COALESCE(e.finish_date, 'infinity'))
ORDER BY 1;
The LEFT JOIN includes date ranges even if no matching employees are found.
Use COALESCE(e.finish_date, 'infinity') for employees without a finish_date. They are considered to be still employed. Or maybe use current_date in place of infinity.
Use to_char() to get a nicely formatted month_and_year value.
You can easily select any columns you need from employees. In my example I take all columns with e.*.
The 1 in ORDER BY 1 is a positional reference to simplify the code. It orders by the first column, month_and_year.
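The half-open semantics explain why the employee from the question, employed from 2013-01-01 to 2013-05-01, shows up in exactly four months. The following Python sketch reproduces the logic outside the database; month_ranges and overlaps are hypothetical helpers, and overlaps assumes non-empty periods, whereas the SQL operator has extra rules for zero-length ones.

```python
from datetime import date

def month_ranges(year: int, first_month: int, last_month: int):
    """Yield (lower, upper) pairs; upper is the first day of the following month."""
    for m in range(first_month, last_month + 1):
        upper = date(year + 1, 1, 1) if m == 12 else date(year, m + 1, 1)
        yield date(year, m, 1), upper

def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """Half-open overlap test: start <= time < end for both periods."""
    return a_start < b_end and b_start < a_end

start_date, finish_date = date(2013, 1, 1), date(2013, 5, 1)
active = [lo.strftime("%Y-%m")
          for lo, hi in month_ranges(2013, 1, 6)
          if overlaps(lo, hi, start_date, finish_date)]
print(active)  # ['2013-01', '2013-02', '2013-03', '2013-04']
```

May is not listed because the employment period ends exactly where the May range begins.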
To make this fast, create a multicolumn index on these expressions, like:
CREATE INDEX employees_start_finish_idx
ON employees (start_date, COALESCE(finish_date, 'infinity') DESC);
Note the descending order on the second index-column.
If you should have committed the folly of storing temporal data as string types (text or varchar) with the pattern 'DD/MM/YYYY' instead of date or timestamp or timestamptz, convert the string to date with to_date(). Example:
SELECT to_date('01/03/2013'::text, 'DD/MM/YYYY')
Change the last line of the query to:
...
OVERLAPS (to_date(e.start_date, 'DD/MM/YYYY')
,COALESCE(to_date(e.finish_date, 'DD/MM/YYYY'), 'infinity'))
You can even have a functional index like that. But really, you should use a date or timestamp column.