Calculate closest working day in Postgres - sql

I need to schedule some items in a postgres query based on a requested delivery date for an order. So for example, the order has a requested delivery on a Monday (20120319 for example), and the order needs to be prepared on the prior working day (20120316).
Thoughts on the most direct method? I'm open to adding a dates table. I'm thinking there's got to be a better way than a long set of case statements using:
SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');

This gets you the previous business day.
SELECT
CASE (EXTRACT(ISODOW FROM current_date)::integer) % 7
WHEN 1 THEN current_date-3
WHEN 0 THEN current_date-2
ELSE current_date-1
END AS previous_business_day
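The weekday arithmetic in that CASE expression can be checked outside the database. Here is a minimal Python sketch of the same logic; the function name previous_business_day is made up for illustration, and holidays are ignored, just as in the query:

```python
from datetime import date, timedelta

def previous_business_day(d: date) -> date:
    """Mirror the CASE expression: ISO weekday modulo 7 maps Sunday to 0 and Monday to 1."""
    dow = d.isoweekday() % 7
    if dow == 1:   # Monday -> previous Friday
        return d - timedelta(days=3)
    if dow == 0:   # Sunday -> previous Friday
        return d - timedelta(days=2)
    return d - timedelta(days=1)  # Tuesday..Saturday -> the day before

print(previous_business_day(date(2012, 3, 19)))  # delivery on a Monday, prints 2012-03-16
```

Saturday falls into the last branch, which is correct because the day before a Saturday is a Friday.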

To have the previous work day:
select max(s.a) as work_day
from (
select s.a::date
from generate_series('2012-01-02'::date, '2050-12-31', '1 day') s(a)
where extract(dow from s.a) between 1 and 5
except
select holiday_date
from holiday_table
) s
where s.a < '2012-03-19'
;
If you want the next work day instead, invert the query: use min(s.a) and the condition s.a > '2012-03-19'.

SELECT y.d AS prep_day
FROM (
SELECT generate_series(dday - 8, dday - 1, interval '1d')::date AS d
FROM (SELECT '2012-03-19'::date AS dday) x
) y
LEFT JOIN holiday h USING (d)
WHERE h.d IS NULL
AND extract(isodow from y.d) < 6
ORDER BY y.d DESC
LIMIT 1;
It should be faster to generate only as many days as necessary. I generate the eight days before the delivery date, which covers a weekend plus several consecutive holidays; widen the range if your holiday table can contain longer runs.
isodow as extract parameter is more convenient than dow to test for workdays.
min() / max(), ORDER BY / LIMIT 1, that's a matter of taste with the few rows in my query.
To get several candidate days in descending order, not just the top pick, change the LIMIT 1.
I put the dday (delivery day) in a subquery so you only have to input it once. You can enter any date or timestamp literal. It is cast to date either way.
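For readers who want to trace the idea step by step, here is a rough Python sketch of the same approach: walk backwards over a small window, skip weekends and holidays, and take the first hit. The function name prep_day, the lookback parameter and the sample holiday are assumptions made for illustration, not part of the query.

```python
from datetime import date, timedelta
from typing import Optional, Set

def prep_day(delivery: date, holidays: Set[date], lookback: int = 8) -> Optional[date]:
    """Return the latest working day before `delivery`, scanning back at most `lookback` days."""
    for offset in range(1, lookback + 1):
        candidate = delivery - timedelta(days=offset)
        # isoweekday() < 6 keeps Monday..Friday, like extract(isodow ...) < 6
        if candidate.isoweekday() < 6 and candidate not in holidays:
            return candidate
    return None  # no working day inside the window

holidays = {date(2012, 3, 16)}  # hypothetical holiday, for illustration only
print(prep_day(date(2012, 3, 19), holidays))  # prints 2012-03-15
```

Returning None corresponds to the query returning no row when every day in the window is a weekend day or a holiday.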

CREATE TABLE Holidays (Holiday, PrecedingBusinessDay) AS VALUES
('2012-12-25'::DATE, '2012-12-24'::DATE),
('2012-12-26'::DATE, '2012-12-24'::DATE);
SELECT Day, COALESCE(PrecedingBusinessDay, PrecedingMondayToFriday)
FROM
(SELECT Day, Day - CASE DATE_PART('DOW', Day)
WHEN 0 THEN 2
WHEN 1 THEN 3
ELSE 1
END AS PrecedingMondayToFriday
FROM TestDays) AS PrecedingMondaysToFridays
LEFT JOIN Holidays ON PrecedingMondayToFriday = Holiday;
You might want to rename some of the identifiers :-).

Related

How can I get a user's activity count for today and this month in a single SELECT query

In my table I have:
Activity : Date
---------------
doSomething1 : June 1, 2020
doSomething2 : June 14, 2020
I want to be able to make a query so that I can get the following result (assuming today is June 1, 2020):
Today : ThisMonth
1 : 2
I looked at group by but I wasn't sure how to do that without a lot of additional code and I think there's very likely a much simpler solution that I'm missing. Something that will just return a single row with two results. Is this possible and if so how?
You can use scalar subqueries to get both counts in a single row:
SELECT
( query to get today's count ) AS today,
( query to get this month's count ) AS month;
Alternatively, you can group by date to get today's and the month's counts.
Hopefully this gives you a starting point.
Is this what you want?
select array_agg(activity) filter (where date = current_date) as today,
array_agg(activity) filter (where date <> current_date) as rest_of_month
from t
where date_trunc('month', date) = date_trunc('month', current_date);
This uses arrays so it can handle more than one activity in either category.
Assume you want to query based on a particular date -
select count(case when d.date = :p_query_date then 0 end) day_count
,count(0) month_count
from d -- your table name
where d.date between date_trunc('month', :p_query_date)
and date_trunc('month', :p_query_date + interval '1 month') - interval '1 day'
The above query assumes you have an index defined on the d.date column. If you have an index defined on date_trunc('month', date) instead, the query condition can be simplified to:
date_trunc('month', d.date) = date_trunc('month', :p_query_date)
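The conditional-count pattern can be tried without a Postgres server. Below is a small sketch that runs the same shape of query through Python's built-in sqlite3 module; SQLite stands in for Postgres here, the column name activity_date and the extra May row are assumptions added for illustration, and the month bounds are computed in Python instead of with date_trunc().

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (activity TEXT, activity_date TEXT)")  # ISO dates stored as text
conn.executemany(
    "INSERT INTO d VALUES (?, ?)",
    [("doSomething1", "2020-06-01"), ("doSomething2", "2020-06-14"), ("older", "2020-05-31")],
)

query_date = date(2020, 6, 1)
month_start = query_date.replace(day=1)
# First day of the next month: jump past the end of this month, then snap back to day 1.
next_month_start = (month_start + timedelta(days=32)).replace(day=1)

day_count, month_count = conn.execute(
    """
    SELECT count(CASE WHEN activity_date = :q THEN 0 END) AS day_count,
           count(0)                                       AS month_count
    FROM d
    WHERE activity_date >= :lo AND activity_date < :hi
    """,
    {"q": query_date.isoformat(), "lo": month_start.isoformat(), "hi": next_month_start.isoformat()},
).fetchone()
print(day_count, month_count)  # prints 1 2
```

count() skips NULLs, so the CASE without an ELSE branch counts only the rows for the query date, while count(0) counts every row that survives the month filter.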

Postgres: Return zero as default for rows where there is no match

I am trying to get all the paid contracts from my contracts table and group them by month. I can get the data but for months where there is no new paid contract I want to get a zero instead of missing month. I have tried coalesce and generate_series but I cannot seem to get the missing row.
Here is my query:
with months as (
select generate_series(
'2019-01-01', current_date, interval '1 month'
) as series )
select date(months.series) as day, SUM(contracts.price) from months
left JOIN contracts on date(date_trunc('month', contracts.to)) = months.series
where contracts.tier='paid' and contracts.trial=false and (contracts.to is not NULL) group by day;
I want the results to look like:
|Contract Value| Month|
| 20 | 01-2020|
| 10 | 02-2020|
| 0 | 03-2020|
I can get the rows where there is a contract but cannot get the zero row.
Postgres Version 10.9
I think that you want:
with months as (
select generate_series('2019-01-01', current_date, interval '1 month' ) as series
)
select m.series as day, coalesce(sum(c.price), 0) sum_price
from months m
left join contracts c
on c.to >= m.series
and c.to < m.series + interval '1' month
and c.tier = 'paid'
and not c.trial
group by m.series;
That is:
you want the conditions on the left joined table in the on clause of the join rather than in the where clause; otherwise they become mandatory and evict rows where the left join came back empty
the filter on the date can be optimized to avoid using date functions; this makes the query SARGable, i.e. the database may take advantage of an index on the date column
table aliases make the query easier to read and write
You need to move conditions to the on clause:
with months as (
select generate_series( '2019-01-01'::date, current_date, interval '1 month') as series
)
select m.series as day, coalesce(sum(c.price), 0)
from months m left join
contracts c
on c.to >= m.series and
c.to < m.series + interval '1 month' and
c.tier = 'paid' and
c.trial = false
group by day;
Note some changes to the query:
The conditions on c that were in the where clause are in the on clause.
The date comparison uses simple date comparisons, rather than truncating to the month. This helps the optimizer and makes it easier to use an index.
Table aliases make the query easier to write and to read.
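There is no need to wrap day in date() for the join. Note that generate_series() with an interval step returns timestamps rather than dates, so cast the output (m.series::date) if you need a plain date.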
to is a bad choice for a column name because it is reserved. However, I did not change it.
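The difference between filtering in WHERE and filtering in ON is easy to reproduce on a toy data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for Postgres; the table layout (months stored as text, a paid_month column, a 'free' contract in March) is invented for illustration and is much simpler than the real contracts table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE months (series TEXT);
    CREATE TABLE contracts (paid_month TEXT, price INTEGER, tier TEXT);
    INSERT INTO months VALUES ('2020-01'), ('2020-02'), ('2020-03');
    INSERT INTO contracts VALUES
        ('2020-01', 20, 'paid'), ('2020-02', 10, 'paid'), ('2020-03', 5, 'free');
""")

# Filter in WHERE: rows without a paid contract are removed after the join.
filter_in_where = conn.execute("""
    SELECT m.series, coalesce(sum(c.price), 0)
    FROM months m LEFT JOIN contracts c ON c.paid_month = m.series
    WHERE c.tier = 'paid'
    GROUP BY m.series ORDER BY m.series
""").fetchall()

# Filter in ON: the month row survives and the missing sum is replaced by 0.
filter_in_on = conn.execute("""
    SELECT m.series, coalesce(sum(c.price), 0)
    FROM months m LEFT JOIN contracts c ON c.paid_month = m.series AND c.tier = 'paid'
    GROUP BY m.series ORDER BY m.series
""").fetchall()

print(filter_in_where)  # March is missing
print(filter_in_on)     # March is kept with a total of 0
```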

Find Distinct IDs when the due date is always on the last day of each month

I have to find distinct IDs throughout the whole history of each ID whose due dates are always on the last day of each month.
Suppose I have the following dataset:
ID DUE_DT
1 1/31/2014
1 2/28/2014
1 3/31/2014
1 6/30/2014
2 1/30/2014
2 2/28/2014
3 1/29/2016
3 2/29/2016
I want to write SQL that returns ID = 1, because for this specific ID the due date is always on the last day of the given month.
What would be the easiest way to approach it?
You can do:
select id
from t
group by id
having sum(case when extract(day from due_dt + interval '1 day') = 1 then 1 else 0 end) = count(*);
This uses ANSI/ISO standard functions for date arithmetic. These tend to vary by database, but the idea is the same in all databases -- add one day and see if the day of the month is 1 for all the rows.
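To see the "add one day and check for day 1" test at work on the sample data from the question, here is a short Python sketch. It only models the logic; the helper name is_month_end is made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

rows = [
    (1, date(2014, 1, 31)), (1, date(2014, 2, 28)), (1, date(2014, 3, 31)), (1, date(2014, 6, 30)),
    (2, date(2014, 1, 30)), (2, date(2014, 2, 28)),
    (3, date(2016, 1, 29)), (3, date(2016, 2, 29)),
]

def is_month_end(d: date) -> bool:
    """A date is the last day of its month exactly when the next day is day 1."""
    return (d + timedelta(days=1)).day == 1

# Per ID: [rows that fall on a month end, total rows], like SUM(CASE ...) and COUNT(*).
tally = defaultdict(lambda: [0, 0])
for id_, due in rows:
    tally[id_][0] += is_month_end(due)
    tally[id_][1] += 1

always_month_end = sorted(i for i, (hits, total) in tally.items() if hits == total)
print(always_month_end)  # prints [1]
```

IDs 2 and 3 each have one month-end date, but they are excluded because not all of their rows pass the test.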
If you're using SQL Server 2012+, you can use the EOMONTH() function to achieve this:
SELECT ID FROM [table]
GROUP BY ID
HAVING MIN(CASE WHEN DUE_DT = EOMONTH(DUE_DT) THEN 1 ELSE 0 END) = 1
http://rextester.com/VSPQR78701
The idea is quite simple:
you are on the last day of the month if (the month of due date) is not the same as (the month of due date + 1 day). This covers all cases across year, leap year and so on.
from there on, if (the count of rows for one id) is the same as (the count of rows for this id which are the last day of the month) you have a winner.
I tried to write an example (not tested). You do not specify which DB, so I will assume that CTEs (common table expressions) are available. If not, just put the CTE in as a subquery.
Likewise, I am not sure that date_add and interval work the same in all dialects; the example uses MySQL syntax.
with addlastdayofmonth as (
select
id
-- adding a 'virtual column': 1 if last day of month, 0 otherwise
, if(month(date_add(due_dt, interval 1 day)) != month(due_dt), 1, 0) as onlastday
from
<table>
)
select
id
, count(*) - sum(onlastday) as alwayslastday
from
addlastdayofmonth
group by
id
having
-- if count(rows) == count(rows with last day) we have a winner
count(*) - sum(onlastday) = 0
MySQL version (credits to @Gordon Linoff)
SELECT
ID
FROM
<table>
GROUP BY
ID
HAVING
SUM(IF(day(DUE_DT + interval 1 Day) = 1, 1, 0)) = COUNT(ID);
Original Answer:
SELECT MAX(DUE_DT) FROM <table> WHERE ID = <the desired ID>
or if you want all MAX(DUE_DT) for each unique ID
SELECT ID, MAX(DUE_DT) FROM <table> GROUP BY ID

Oracle - Split a record into multiple records

I have a schedule table for each month schedule. And this table also has days off within that month. I need a result set that will tell working days and off days for that month.
Eg.
CREATE TABLE SCHEDULE(sch_yyyymm varchar2(6), sch varchar2(20), sch_start_date date, sch_end_date date);
INSERT INTO SCHEDULE VALUES('201703','Working Days', to_date('03/01/2017','mm/dd/yyyy'), to_date('03/31/2017','mm/dd/yyyy'));
INSERT INTO SCHEDULE VALUES('201703','Off Day', to_date('03/05/2017','mm/dd/yyyy'), to_date('03/07/2017','mm/dd/yyyy'));
INSERT INTO SCHEDULE VALUES('201703','off Days', to_date('03/08/2017','mm/dd/yyyy'), to_date('03/10/2017','mm/dd/yyyy'));
INSERT INTO SCHEDULE VALUES('201703','off Days', to_date('03/15/2017','mm/dd/yyyy'), to_date('03/15/2017','mm/dd/yyyy'));
Using SQL or PL/SQL I need to split the record with Working Days and Off Days.
From above records I need result set as:
201703 Working Days 03/01/2017 - 03/04/2017
201703 Off Days 03/05/2017 - 03/10/2017
201703 Working Days 03/11/2017 - 03/14/2017
201703 Off Days 03/15/2017 - 03/15/2017
201703 Working Days 03/16/2017 - 03/31/2017
Thank You for your help.
Edit: I've had a bit more of a think, and this approach works fine for your insert records above - however, it misses records where there are not continuous "off day" periods. I need to have a bit more of a think and will then make some changes
I've put together a test using the lead and lag functions and a self join.
The upshot is you self-join the "Off Days" onto the existing tables to find the overlaps. Then calculate the start/end dates on either side of each record. A bit of logic then lets us work out which date to use as the final start/end dates.
SQL fiddle here - I used Postgres as the Oracle function wasn't working but it should translate ok.
select sch,
/* Work out which date to use as this record's Start date */
case when prev_end_date is null then sch_start_date
else off_end_date + 1
end as final_start_date,
/* Work out which date to use as this record's end date */
case when next_start_date is null then sch_end_date
when next_start_date is not null and prev_end_date is not null then next_start_date - 1
else off_start_date - 1
end as final_end_date
from (
select a.*,
b.*,
/* Get the start/end dates for the records on either side of each working day record */
lead( b.off_start_date ) over( partition by a.sch_start_date order by b.off_start_date ) as next_start_date,
lag( b.off_end_date ) over( partition by a.sch_start_date order by b.off_start_date ) as prev_end_date
from (
/* Get all schedule records */
select sch,
sch_start_date,
sch_end_date
from schedule
) as a
left join
(
/* Get all non-working day schedule records */
select sch as off_sch,
sch_start_date as off_start_date,
sch_end_date as off_end_date
from schedule
where sch <> 'Working Days'
) as b
/* Join on "Off Days" that overlap "Working Days" */
on a.sch_start_date <= b.off_end_date
and a.sch_end_date >= b.off_start_date
and a.sch <> b.off_sch
) as c
order by final_start_date
If you had a dates table this would have been easier.
You can construct a dates table using a recursive cte and join on to it. Then use the difference of row number approach to classify rows with same schedules on consecutive dates into one group and then get the min and max of each group which would be the start and end dates for a given sch. I assume there are only 2 sch values Working Days and Off Day.
with dates(dt) as (select date '2017-03-01' from dual
union all
select dt+1 from dates where dt < date '2017-03-31')
,groups as (select sch_yyyymm,dt,sch,
row_number() over(partition by sch_yyyymm order by dt)
- row_number() over(partition by sch_yyyymm,sch order by dt) as grp
from (select s.sch_yyyymm,d.dt,
/*This condition is to avoid a given date with 2 sch values, as 03-01-2017 - 03-31-2017 are working days
on one row and there is an Off Day status for some of these days.
In such cases Off Day would be picked up as sch*/
case when count(*) over(partition by d.dt) > 1 then min(s.sch) over(partition by d.dt) else s.sch end as sch
from dates d
join schedule s on d.dt >= s.sch_start_date and d.dt <= s.sch_end_date
) t
)
select sch_yyyymm,sch,min(dt) as start_date,max(dt) as end_date
from groups
group by sch_yyyymm,sch,grp
I couldn't get the recursive cte running in Oracle. Here is a demo using SQL Server.
Sample Demo in SQL Server
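Since the row-number-difference trick is not obvious at first sight, here is a small Python sketch of it applied to the question's March 2017 data. It is only a model of the technique, not Oracle code; the labels are normalised to 'Off Days' and the variable names are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# (label, start, end) taken from the question's INSERT statements, labels normalised.
schedule = [
    ("Working Days", date(2017, 3, 1), date(2017, 3, 31)),
    ("Off Days", date(2017, 3, 5), date(2017, 3, 7)),
    ("Off Days", date(2017, 3, 8), date(2017, 3, 10)),
    ("Off Days", date(2017, 3, 15), date(2017, 3, 15)),
]

# One status per calendar day. The working-day row comes first in the list,
# so the off-day rows that follow overwrite it on the days they cover.
status = {}
for label, start, end in schedule:
    d = start
    while d <= end:
        status[d] = label
        d += timedelta(days=1)

islands = defaultdict(list)
seen = defaultdict(int)  # running row number per status
for rn, d in enumerate(sorted(status), start=1):  # row number over all days
    seen[status[d]] += 1
    grp = rn - seen[status[d]]  # stays constant within a run of consecutive equal statuses
    islands[(status[d], grp)].append(d)

result = sorted((min(days), max(days), label) for (label, _), days in islands.items())
for start, end, label in result:
    print(label, start, end)
```

Grouping by the pair (status, difference) and taking min and max of each group gives the five date ranges requested in the question.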

in sql, calculating date parts versus date lookup table in group queries

many queries are by week, month or quarter when the base table date is either date or timestamp.
in general, in group by queries, does it matter whether using
- functions on the date
- a day table that has extraction pre-calculated
note: similar question as DATE lookup table (1990/01/01:2041/12/31)
for example, in postgresql
create table sale(
tran_id serial primary key,
tran_dt date not null default current_date,
sale_amt decimal(8,2) not null,
...
);
create table days(
day date primary key,
week date not null,
month date not null,
quarter date not null
);
-- week query 1: group using funcs
select
date_trunc('week',tran_dt)::date - 1 as week,
count(1) as sale_ct,
sum(sale_amt) as sale_amt
from sale
where date_trunc('week',tran_dt)::date - 1 between '2012-1-1' and '2011-12-31'
group by date_trunc('week',tran_dt)::date - 1
order by 1;
-- query 2: group using days
select
days.week,
count(1) as sale_ct,
sum(sale_amt) as sale_amt
from sale
join days on( days.day = sale.tran_dt )
where week between '2011-1-1'::date and '2011-12-31'::date
group by week
order by week;
to me, whereas the date_trunc() function seems more organic, the days table is easier to use.
is there anything here more than a matter of taste?
-- query 3: group using instant "immediate" calendar table
WITH calender AS (
SELECT ser::date AS dd
, date_trunc('week', ser)::date AS wk
-- , date_trunc('month', ser)::date AS mon
-- , date_trunc('quarter', ser)::date AS qq
FROM generate_series( '2012-1-1' , '2012-12-31', '1 day'::interval) ser
)
SELECT
cal.wk
, count(1) as sale_ct
, sum(sa.sale_amt) as sale_amt
FROM sale sa
JOIN calender cal ON cal.dd = sa.tran_dt
-- WHERE week between '2012-1-1' and '2011-12-31'
GROUP BY cal.wk
ORDER BY cal.wk
;
Note: I fixed an apparent typo in the BETWEEN range.
UPDATE: I used Erwin's recursive CTE to squeeze out the duplicated date_trunc(). Nested CTE galore:
WITH calendar AS (
WITH RECURSIVE montag AS (
SELECT '2011-01-01'::date AS dd
UNION ALL
SELECT dd + 1 AS dd
FROM montag
WHERE dd < '2012-1-1'::date
)
SELECT mo.dd, date_trunc('week', mo.dd + 1)::date AS wk
FROM montag mo
)
SELECT
cal.wk
, count(1) as sale_ct
, sum(sa.sale_amt) as sale_amt
FROM sale sa
JOIN calendar cal ON cal.dd = sa.tran_dt
-- WHERE week between '2012-1-1' and '2011-12-31'
GROUP BY cal.wk
ORDER BY cal.wk
;
Yes, it is more than a matter of taste. The performance of the query depends on the method.
As a first approximation, the functions should be faster. They don't require joins, doing the read in a single table scan.
However, a good optimizer could make effective use of a lookup table. It would know the distribution of the target values. And, an in memory join could be quite fast.
As a database design, I think having a calendar table is very useful. Some information such as holidays just isn't going to work as a function. However, for most ad hoc queries the date functions are fine.
1. Your expression:
... between '2012-1-1' and '2011-12-31'
doesn't work. Basic BETWEEN requires the left argument to be less than or equal to the right argument. Would have to be:
... BETWEEN SYMMETRIC '2012-1-1' and '2011-12-31'
Or it's just a typo and you mean something like:
... BETWEEN '2011-1-1' and '2011-12-31'
It's unclear to me, what your queries are supposed to retrieve. I'll assume you want all weeks (Monday to Sunday) that start in 2011 for the rest of this answer. This expression generates exactly that in less than a microsecond on modern hardware (works for any year):
SELECT generate_series(
date_trunc('week','2010-12-31'::date) + interval '7d'
,date_trunc('week','2011-12-31'::date) + interval '6d'
, '1d')::date
* Note that the ISO 8601 definition of the "first week of a year" is slightly different.
2. Your second query does not work at all. No GROUP BY?
3. The question you link to did not deal with PostgreSQL, which has outstanding date / timestamp support. And it has generate_series() which can obviate the need for a separate "days" table in most cases - as demonstrated above. Your query would look like this:
In the meantime @wildplasser provided an example query that was supposed to go here.
By popular* demand, a recursive CTE version - which is actually not that far from being a serious alternative!
* and by "popular" I mean @wildplasser's very serious request.
WITH RECURSIVE days AS (
SELECT '2011-01-01'::date AS dd
,date_trunc('week', '2011-01-01'::date )::date AS wk
UNION ALL
SELECT dd + 1
,date_trunc('week', dd + 1)::date AS wk
FROM days
WHERE dd < '2011-12-31'::date
)
SELECT d.wk
,count(*) AS sale_ct
,sum(s.sale_amt) AS sale_amt
FROM days d
JOIN sale s ON s.tran_dt = d.dd
-- WHERE d.wk between '2011-01-01' and '2011-12-31'
GROUP BY 1
ORDER BY 1;
Could also be written as (compare to @wildplasser's version):
WITH RECURSIVE d AS (
SELECT '2011-01-01'::date AS dd
UNION ALL
SELECT dd + 1 FROM d WHERE dd < '2011-12-31'::date
), days AS (
SELECT dd, date_trunc('week', dd + 1)::date AS wk
FROM d
)
SELECT ...
4. If performance is of the essence, make sure that you do not apply functions or calculations to the column values of your table in the filter. This prevents the use of plain indexes and is generally very slow, because the expression has to be evaluated for every row. That's why your first query is going to be slow with a big table. Whenever possible, apply calculations to the values you filter with, instead.
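Indexes on expressions are one way around this. If you had an index like
CREATE INDEX sale_tran_dt_week_idx ON sale ((date_trunc('week', tran_dt::timestamp)::date));
.. a query filtering on exactly that expression could be very fast again - at some cost for write operations for index maintenance. Note the extra pair of parentheses, which is required when the index expression is more than a plain function call, and the cast to timestamp, which keeps the expression immutable.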
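The advice to move the calculation from the column to the filter values can be illustrated with a small Python sketch. It checks that "truncate every row to its week and compare" selects the same days as a plain range comparison on the raw values, which is the form a regular index on tran_dt can serve. The helper week_start and the chosen target week are assumptions for illustration.

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Monday of the ISO week containing d, like date_trunc('week', d)::date."""
    return d - timedelta(days=d.weekday())

target = date(2011, 6, 6)  # a Monday, chosen arbitrarily
lo, hi = target, target + timedelta(days=7)

days_2011 = [date(2011, 1, 1) + timedelta(days=n) for n in range(365)]
by_function = [d for d in days_2011 if week_start(d) == target]  # computes on every row
by_range = [d for d in days_2011 if lo <= d < hi]                # compares raw values only
print(by_function == by_range, len(by_range))  # prints True 7
```

In SQL terms the second form corresponds to tran_dt >= lo AND tran_dt < hi, with lo and hi computed once from the week you are interested in.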