PGSQL query to get a list of sequential dates from today

I have a calendar table listing the dates on which no action should be performed.
The table is as follows; the date format is YYYY-MM-DD.
date
2021-01-01
2021-04-05
2021-04-06
2021-04-07
2021-08-10
2021-11-22
2021-11-23
2021-11-24
2021-12-25
2021-12-31
Considering today is 2021-11-24.
The expected output is
date
2021-11-24
2021-11-23
2021-11-22
And Considering today is 2021-12-25
then the expected output is
date
2021-12-25
And Considering today is 2021-12-27
then the output should contain no data.
date
It should return the unbroken run of consecutive dates ending with today's date, in descending order.
I searched various posts and found some related to my question, but the queries were fairly complex, with nested subqueries. Is there a way to achieve this output in a simpler, more optimized way? I am new to pgsql.

Create example table:
CREATE TABLE calendar (d date);
INSERT INTO calendar VALUES ('2021-11-23'),('2021-11-20');
Query:
SELECT * FROM
(SELECT CURRENT_DATE - '1 day'::interval * generate_series(0,10) AS d) a
LEFT JOIN calendar c ON (c.d=a.d)
ORDER BY a.d;
a.d | c.d
---------------------+------------
2021-11-14 00:00:00 | Null
2021-11-15 00:00:00 | Null
2021-11-16 00:00:00 | Null
2021-11-17 00:00:00 | Null
2021-11-18 00:00:00 | Null
2021-11-19 00:00:00 | Null
2021-11-20 00:00:00 | 2021-11-20
2021-11-21 00:00:00 | Null
2021-11-22 00:00:00 | Null
2021-11-23 00:00:00 | 2021-11-23
2021-11-24 00:00:00 | Null
Subquery "a" generates a date series, which is then left-joined to the table.
You can add conditions, for example "WHERE c.d IS NULL" or "WHERE c.d IS NOT NULL", depending on the filtering you want.

You can simply filter by a date range, building it by subtracting 2 days from today:
select "date"
from maintenance_dates_70099898
where "date" <= now()::date --you want to see today and 2 days prior; Last 3 days total
and "date" >= now()::date - '2 days'::interval
order by 1 desc;
With a runnable test:
drop table if exists maintenance_dates_70099898;
create table maintenance_dates_70099898 ("date" date);
insert into maintenance_dates_70099898
("date")
values
('2021-01-01'),
('2021-04-05'),
('2021-04-06'),
('2021-04-07'),
('2021-08-10'),
('2021-11-22'),
('2021-11-23'),
('2021-11-24'),
('2021-12-25'),
('2021-12-31');
select "date"
from maintenance_dates_70099898
where "date" <= now()::date --you want to see today and 2 days prior; Last 3 days total
and "date" >= now()::date - '2 days'::interval
order by 1 desc;
-- date
--------------
-- 2021-11-24
-- 2021-11-23
-- 2021-11-22
--(3 rows)
select "date"
from maintenance_dates_70099898
where "date" >= '2021-12-25'::date - '2 days'::interval
and "date" <= '2021-12-25'::date
order by 1 desc;
-- date
--------------
-- 2021-12-25
--(1 row)
I assume that for 2021-12-27 you do want to see 2021-12-25, as it's within the 3-day range; the first date that returns nothing is 2021-12-28:
select "date"
from maintenance_dates_70099898
where "date" >= '2021-12-28'::date - '2 days'::interval
and "date" <= '2021-12-28'::date
order by 1 desc;
-- date
--------
--(0 rows)

The main issue appears to be that the number of days is not known in advance, which rules out a simple range filter. A RECURSIVE CTE comes to the rescue: starting from the given date, it plucks off each previous date that is exactly 1 day before the last one found, and terminates as soon as that no longer holds.
with recursive no_action(no_act_dt) as
( select no_act_dt
from no_action_calendar
where no_act_dt = :parm_date::date
union all
select c.no_act_dt
from no_action_calendar c
join no_action a
on (c.no_act_dt = a.no_act_dt - 1)
)
select *
from no_action
order by no_act_dt desc;
If you use this often or from several points, you can parametrize it with a SQL function. (see demo for both).
create or replace
function consective_no_action_dates (date_in date)
returns setof date
language sql
as $$
with recursive no_action(no_act_dt) as
( select no_act_dt
from no_action_calendar
where no_act_dt = date_in
union all
select c.no_act_dt
from no_action_calendar c
join no_action a
on (c.no_act_dt = a.no_act_dt - 1)
)
select *
from no_action
order by no_act_dt desc;
$$;
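For trying the recursive approach without creating any tables, here is a minimal self-contained sketch against the sample dates from the question. The CTE names calendar, params and streak and the column d are made up for illustration, and the fixed date in params is a stand-in for current_date so that the result is reproducible.

```sql
with recursive calendar(d) as (
    -- sample data from the question
    values (date '2021-01-01'), (date '2021-04-05'), (date '2021-04-06'),
           (date '2021-04-07'), (date '2021-08-10'), (date '2021-11-22'),
           (date '2021-11-23'), (date '2021-11-24'), (date '2021-12-25'),
           (date '2021-12-31')
), params(today) as (
    -- hypothetical stand-in for current_date
    values (date '2021-11-24')
), streak(d) as (
    -- anchor: "today", but only if it is in the calendar
    select c.d
    from calendar c
    join params p on c.d = p.today
    union all
    -- step: the calendar date exactly one day before the last one found
    select c.d
    from calendar c
    join streak s on c.d = s.d - 1
)
select d as "date"
from streak
order by d desc;
-- returns 2021-11-24, 2021-11-23, 2021-11-22
```

Changing the date in params to 2021-12-25 returns a single row, and 2021-12-27 returns no rows, because the anchor finds nothing to start from.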

Related

How to average values in one table based on the condition involving another table in SQL?

I have two tables. One defines time intervals (beginning and end). Time intervals are not equal in length. Another contains product ID, start and end date of the product.
TableOne:
Interval StartDateTime EndDateTime
202020201 2020-01-01 00:00:00 2020-02-10 00:00:00
202020202 2020-02-10 00:00:00 2020-02-20 00:00:00
TableTwo
ProductID ProductStartDateTime ProductEndDateTime
ASSDWE1 2018-01-04 00:12:00 2020-04-10 20:00:30
ADFGHER 2020-01-05 00:11:30 2020-01-19 00:00:00
ASDFVBN 2017-10-10 00:12:10 2020-02-23 00:23:23
I need to compute the average length of the products from TableTwo that existed during the time intervals defined in TableOne. If the product existed throughout the time interval from TableOne, then the length of the product during this time interval is defined as its length from its start date until the end of the time interval.
I tried the following
select
a.*,
(select
AVG(datediff(day, b.ProductStartDateTime, IIF (b.ProductEndDateTime> a.EndDateTime, a.EndDateTime
,b.ProductEndDateTime))) --compute average length of the products
FROM #TableTwo b
WHERE ( not (b.ProductEndDateTime <= a.StartDateTime ) and not (b.ProductStartDateTime >= a.EndDateTime) )
-- select products that existed during interval from #TableOne
) as AverageProductLength
from #TableOne a
I get the error "Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression."
The result I want:
Interval StartDateTime EndDateTime AverageProductLength
202020201 2020-01-01 00:00:00 2020-02-10 00:00:00 23
202020202 2020-02-10 00:00:00 2020-02-20 00:00:00 34.5
Is there a way I can do the averaging?

SQL Find Last 30 Days records count grouped by

I am trying to retrieve the count of customers daily per each status in a dynamic window - Last 30 days.
The result of the query should show each day how many customers there are per each customer status (A,B,C) for the Last 30 days (i.e today() - 29 days). Every customer can have one status at a time but change from one status to another within the customer lifetime. The purpose of this query is to show customer 'movement' across their lifetime. I've generated a series of date ranging from the first date a customer was created until today.
I've put together the following query but it appears that something I'm doing is incorrect because the results depict most days as having the same count across all statuses which is not possible, each day new customers are created. We checked with another simple query and confirmed that the split between statuses is not equal.
I tried to depict below the data and the SQL I use to reach the optimal result.
Starting point (example table customer_statuses):
customer_id | status | created_at
---------------------------------------------------
abcdefg1234 B 2019-08-22
abcdefg1234 C 2019-01-17
...
abcdefg1234 A 2018-01-18
bcdefgh2232 A 2017-09-02
ghijklm4950 B 2018-06-06
statuses - A,B,C
There is no sequential order for the statuses, a customer can have any status at the start of the business relationship and switch between statuses throughout their lifetime.
table customers:
id | f_name | country | created_at |
---------------------------------------------------------------------
abcdefg1234 Michael FR 2018-01-18
bcdefgh2232 Sandy DE 2017-09-02
....
ghijklm4950 Daniel NL 2018-06-06
SQL - current version:
WITH customer_list AS (
SELECT
DISTINCT a.id,
a.created_at
FROM
customers a
),
dates AS (
SELECT
generate_series(
MIN(DATE_TRUNC('day', created_at)::DATE),
MAX(DATE_TRUNC('day', now())::DATE),
'1d'
)::date AS day
FROM customers a
),
customer_statuses AS (
SELECT
customer_id,
status,
created_at,
ROW_NUMBER() OVER
(
PARTITION BY customer_id
ORDER BY created_at DESC
) col
FROM
customer_status
)
SELECT
day,
(
SELECT
COUNT(DISTINCT id) AS accounts
FROM customers
WHERE created_at::date BETWEEN day - 29 AND day
),
status
FROM dates d
LEFT JOIN customer_list cus
ON d.day = cus.created_at
LEFT JOIN customer_statuses cs
ON cus.id = cs.customer_id
WHERE
cs.col = 1
GROUP BY 1,3
ORDER BY 1 DESC,3 ASC
Currently what the results from the query look like:
day | count | status
-------------------------
2020-01-24 1230 C
2020-01-24 1230 B
2020-01-24 1230 A
2020-01-23 1200 C
2020-01-23 1200 B
2020-02-23 1200 A
2020-02-22 1150 C
2020-02-22 1150 B
...
2017-01-01 50 C
2017-01-01 50 B
2017-01-01 50 A
Two things I've noticed from the results above. First, most of the time the results show the same count across all statuses on a given day. Second, there are days on which only two statuses appear, which should not be the case. If no new accounts are created on a given day with a certain status, the count of the previous day should be carried over, right? Or is the problem with the query I created, or with the logic I have in mind?
Perhaps I'm expecting a result that cannot logically happen?
Required result:
day | count | status
-------------------------
2020-01-24 1230 C
2020-01-24 1000 B
2020-01-24 2500 A
2020-01-23 1200 C
2020-01-23 1050 B
2020-02-23 2450 A
2020-02-22 1160 C
2020-02-22 1020 B
2020-02-22 2400 A
...
2017-01-01 10 C
2017-01-01 4 B
2017-01-01 50 A
Thank You!
Your query seems overly complicated. Here is another approach:
Use lead() to get when the status ends for each customer status record.
Use generate_series() to generate the days.
The rest is just filtering and aggregation:
select gs.dte, cs.status, count(*)
from (select cs.*,
lead(cs.created_at, 1, now()::date) over (partition by cs.customer_id order by cs.created_at) as next_ca
from customer_statuses cs
) cs cross join lateral
generate_series(cs.created_at, cs.next_ca - interval '1 day', interval '1 day') gs(dte)
where gs.dte >= now()::date - interval '30 day'
group by gs.dte, cs.status
I've altered the query a bit because I noticed that I get duplicate records on the days a customer changes status: one record with the old status and one record with the new status.
For example, output with @Gordon's query:
dte | status
---------------------------
2020-02-12 B
... ...
01.02.2020 A
01.02.2020 B
31.01.2020 A
30.01.2020 A
I've adapted the query (see below). The results now depict the changes between statuses correctly (no duplicate records on the day of change), but the records only run up to now()::date - interval '1 day' and do not include now()::date (today). I'm not sure why, and I can't find the correct logic to get both.
What I want: dates that correctly depict the status of each customer, with the returned rows including today.
Adjusted query:
select gs.dte, cs.status, count(*)
from (select cs.*,
lead(cs.created_at, 1, now()::date) over (partition by cs.customer_id order by cs.created_at) - INTERVAL '1day' as next_ca
from customer_statuses cs
) cs cross join lateral
generate_series(cs.created_at, cs.next_ca, interval '1 day') gs(dte)
where gs.dte >= now()::date - interval '30 day'
group by gs.dte, cs.status
The two adjustments (they seem counter-intuitive, since I'm taking the one-day interval away in one part of the query only to apply it in another, which to me should yield the same result):
a - subtracted 1 day from the result of the lead function (line 3):
lead(cs.created_at, 1, now()::date) over (partition by cs.customer_id order by cs.created_at) - INTERVAL '1 day' as next_ca
b - removed the 1-day subtraction from next_ca in generate_series (line 6), which previously read:
generate_series(cs.created_at, cs.next_ca - interval '1 day', interval '1 day')
Example of the output with the adjusted query:
dte | status
---------------------------
2020-02-11 B
... ...
01.02.2020 B
31.01.2020 A
30.01.2020 A
Thanks for your help!
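One possible explanation for the missing day, offered as a sketch rather than a verified fix: the upper bound of generate_series is inclusive, and the current status row gets now()::date from the lead() default, so subtracting one day makes every open-ended series stop at yesterday. Defaulting lead() to tomorrow instead would keep today in the series while still ending each earlier status the day before the next one starts. The sample rows and the fixed "today" value below are invented for illustration.

```sql
with customer_statuses(customer_id, status, created_at) as (
    -- hypothetical sample rows
    values ('c1', 'A', date '2020-01-30'),
           ('c1', 'B', date '2020-02-01')
), today(d) as (
    -- hypothetical stand-in for now()::date
    values (date '2020-02-03')
), spans as (
    select cs.customer_id,
           cs.status,
           cs.created_at,
           -- the default is tomorrow, so the open-ended status still covers today
           lead(cs.created_at, 1, t.d + 1)
               over (partition by cs.customer_id order by cs.created_at) as next_ca
    from customer_statuses cs
    cross join today t
)
select gs.dte::date as dte, s.status
from spans s
cross join lateral
     generate_series(s.created_at::timestamp,
                     s.next_ca - interval '1 day',
                     interval '1 day') as gs(dte)
order by 1;
-- 2020-01-30 A
-- 2020-01-31 A
-- 2020-02-01 B
-- 2020-02-02 B
-- 2020-02-03 B
```

Each day appears once per customer, the change day carries only the new status, and the last row is the assumed "today".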

what is the difference between setting date condition with extract date and date between d1 and d2 in sql

I have written two queries which I expected would give me the same data.
Query 1
select transaction, count(*)
from table
where create_date between to_Date('02/11/2017','MM/DD/YYYY') and to_date('02/17/2017','MM/DD/YYYY')
group by transaction
Query 2
select transaction, count(*)
from table
where extract(day from create_date) between 11 and 17
and extract(month from create_date)=2
and extract(year from create_date)=2017
group by transaction
Results from query 1
Transaction1 1155
Transaction2 333
Transaction3 5188
Results from query 2
Transaction1 1422
Transaction2 415
Transaction3 6155
Why am I getting different results?
The first query gets the values where the values are between 2017-02-11 00:00:00 and 2017-02-17 00:00:00.
The second query gets the values where the values are between 2017-02-11 00:00:00 and 2017-02-17 23:59:59.
So, if there are values between 2017-02-17 00:00:01 and 2017-02-17 23:59:59 then they will be included in the COUNT of the second query but not the first.
Try:
select transaction, count(*)
from table
where create_date >= DATE '2017-02-11'
AND create_date < DATE '2017-02-18'
group by transaction
or
select transaction, count(*)
from table
where TRUNC( create_date ) BETWEEN DATE '2017-02-11' AND DATE '2017-02-17'
group by transaction
(Note: the latter query will not use indexes on create_date and would need a function-based index on TRUNC( create_date ) instead.)
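The boundary behaviour can be checked on two literal values. This is a small sketch in PostgreSQL syntax, using timestamp as a stand-in for Oracle's DATE (which also carries a time of day); the column aliases are made up for illustration.

```sql
select ts,
       -- BETWEEN stops at midnight at the start of the 17th
       (ts between date '2017-02-11' and date '2017-02-17') as in_between,
       -- the half-open range covers the whole of the 17th
       (ts >= date '2017-02-11' and ts < date '2017-02-18') as in_half_open
from (values (timestamp '2017-02-17 00:00:00'),
             (timestamp '2017-02-17 09:30:00')) as t(ts);
-- 2017-02-17 00:00:00 | true  | true
-- 2017-02-17 09:30:00 | false | true
```

The 09:30 row is exactly the kind of record that the extract-based query counts and the BETWEEN query misses.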
TO_DATE converts a string to a datetime, so the BETWEEN clause in your first query also compares the time-of-day part (HH:MI:SS). To get the same result from both queries, you need to account for that time component.

How to identify and aggregate sequence from start and end dates

I'm trying to identify a consecutive sequence in dates, per person, as well as sum amount for that sequence. My records table looks like this:
person start_date end_date amount
1 2015-09-10 2015-09-11 500
1 2015-09-11 2015-09-12 100
1 2015-09-13 2015-09-14 200
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-05 300
2 2015-10-06 2015-10-06 1000
3 2015-04-23 2015-04-23 900
The resulting query should be this:
person sequence_start_date sequence_end_date amount
1 2015-09-10 2015-09-14 800
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-06 1400
3 2015-04-23 2015-04-23 900
Below, I can use LAG and LEAD to identify the sequence start_date and end_date, but I don't have a way to aggregate the amount. I'm assuming the answer will involve some sort of ROW_NUMBER() window function that will partition by sequence, I just can't figure out how to make the sequence identifiable to the function.
SELECT
person
,COALESCE(sequence_start_date, LAG(sequence_start_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_start_date"
,COALESCE(sequence_end_date, LEAD(sequence_end_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_end_date"
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' = start_date
THEN NULL
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' = end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
) sq
Even your updated (sub)query still isn't quite right for the data you've presented, which is inconsistent about whether the start date of the second and subsequent rows in a sequence should be equal to their previous rows' end date or one day later. The query can be updated pretty easily to accommodate both, if that's needed.
In any case, you cannot use COALESCE as a window function. Aggregate functions may be used as window functions by providing an OVER clause, but not ordinary functions. There are nevertheless ways to apply window function to this task. Here's a way to identify the sequences in your data (as presented):
SELECT
person
,MAX(sequence_start_date)
OVER (
PARTITION BY person
ORDER BY start_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS "sequence_start_date"
,MIN(sequence_end_date)
OVER (
PARTITION BY person
ORDER BY start_date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AS "sequence_end_date"
,amount
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' >= start_date
THEN date '0001-01-01'
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' <= end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
order by person, start_date
) sq_part
ORDER BY person, sequence_start_date
That relies on MAX() and MIN() instead of COALESCE(), and it applies window framing to get the appropriate scope for each of those within each partition. Results:
person sequence_start_date sequence_end_date amount
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 500
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 100
1 October, 05 2015 00:00:00 October, 07 2015 00:00:00 2000
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 300
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 1000
3 April, 23 2015 00:00:00 April, 23 2015 00:00:00 900
Do note that that does not require an exact match of end date with subsequent start date; all rows for each person that abut or overlap will be assigned to the same sequence. If (person, start_date) cannot be relied upon to be unique, however, then you probably need to order the partitions by end date as well.
And now you have a way to identify the sequences: they are characterized by the triple person, sequence_start_date, sequence_end_date. (Or actually, you need only the person and one of those dates for identification purposes, but read on.) You can wrap the above query as an inline view of an outer aggregate query to produce your desired result:
SELECT
person,
sequence_start_date,
sequence_end_date,
SUM(amount) AS "amount"
FROM ( <above query> ) sq
GROUP BY person, sequence_start_date, sequence_end_date
Of course you need both dates as grouping columns if you're going to select them.
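As an alternative sketch of the same idea, the sequences can be numbered with a running sum of "new sequence" flags and then aggregated in one pass. This assumes PostgreSQL, assumes a row continues a sequence when it starts no later than one day after the previous row's end date, and uses made-up names (flagged, numbered, is_new_seq, seq_no). The rows are the sample data from the question.

```sql
with records(person, start_date, end_date, amount) as (
    values (1, date '2015-09-10', date '2015-09-11', 500),
           (1, date '2015-09-11', date '2015-09-12', 100),
           (1, date '2015-09-13', date '2015-09-14', 200),
           (1, date '2015-10-05', date '2015-10-07', 2000),
           (2, date '2015-10-05', date '2015-10-05', 300),
           (2, date '2015-10-06', date '2015-10-06', 1000),
           (3, date '2015-04-23', date '2015-04-23', 900)
), flagged as (
    select r.*,
           -- 1 marks the first row of a sequence, 0 a continuation
           case when start_date <= lag(end_date) over (partition by person
                                                       order by start_date) + 1
                then 0 else 1 end as is_new_seq
    from records r
), numbered as (
    select f.*,
           -- the running sum of flags gives each sequence its own number
           sum(is_new_seq) over (partition by person
                                 order by start_date) as seq_no
    from flagged f
)
select person,
       min(start_date) as sequence_start_date,
       max(end_date)   as sequence_end_date,
       sum(amount)     as amount
from numbered
group by person, seq_no
order by person, sequence_start_date;
-- 1 | 2015-09-10 | 2015-09-14 |  800
-- 1 | 2015-10-05 | 2015-10-07 | 2000
-- 2 | 2015-10-05 | 2015-10-06 | 1300
-- 3 | 2015-04-23 | 2015-04-23 |  900
```

Person 2 comes out as 1300 because the sample rows are 300 and 1000; the 1400 in the question's expected output does not match its own input. If one row can fully contain a later one, comparing against the running maximum of end_date rather than only the previous row would be safer.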
Why not:
select a1.person, a1.sequence_start_date, a1.sequence_end_date,
sum(rx.amount)
as amount
from (EXISTING_QUERY) a1
left join records rx
on rx.person = a1.person
and rx.start_date >= a1.sequence_start_date
and rx.end_date <= a1.sequence_end_date
group by a1.person, a1.sequence_start_date, a1.sequence_end_date

PostgreSQL query for multiple update

I have a table in which I have 4 columns: emp_no,desig_name,from_date and to_date:
emp_no desig_name from_date to_date
1001 engineer 2004-08-01 00:00:00
1001 sr.engineer 2010-08-01 00:00:00
1001 chief.engineer 2013-08-01 00:00:00
So my question is: how do I update the to_date column of the first row to one day before the from_date of the second row, and likewise for the second row?
After update it should look like:
emp_no desig_name from_date to_date
1001 engineer 2004-08-01 00:00:00 2010-07-31 00:00:00
1001 sr.engineer 2010-08-01 00:00:00 2013-07-31 00:00:00
1001 chief.engineer 2013-08-01 00:00:00
You can calculate the "next" date using the lead() function.
This calculated value can then be used to update the table (the statement below assumes the table has a unique promotion_id column to match rows on):
with calc as (
select promotion_id,
emp_no,
from_date,
lead(from_date) over (partition by emp_no order by from_date) as next_date
from emp
)
update emp
set to_date = c.next_date - interval '1' day
from calc c
where c.promotion_id = emp.promotion_id;
As you can see getting that value is quite easy, and storing derived information is very often not a good idea. You might want to consider a view that calculates this information on the fly so you don't need to update your table each time you insert a new row.
SQLFiddle example: http://sqlfiddle.com/#!15/31665/1
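The view suggested above could look roughly like the following sketch. The names emp_demo and emp_demo_designations are hypothetical, and temporary objects are used so the example can be run without touching a real emp table.

```sql
-- hypothetical copy of the table from the question, without a stored to_date
create temporary table emp_demo (
    emp_no     int,
    desig_name text,
    from_date  timestamp
);

insert into emp_demo values
    (1001, 'engineer',       '2004-08-01'),
    (1001, 'sr.engineer',    '2010-08-01'),
    (1001, 'chief.engineer', '2013-08-01');

-- to_date is derived on the fly: one day before the next from_date
create temporary view emp_demo_designations as
select emp_no,
       desig_name,
       from_date,
       lead(from_date) over (partition by emp_no order by from_date)
           - interval '1 day' as to_date
from emp_demo;

select * from emp_demo_designations order by emp_no, from_date;
-- 1001 | engineer       | 2004-08-01 00:00:00 | 2010-07-31 00:00:00
-- 1001 | sr.engineer    | 2010-08-01 00:00:00 | 2013-07-31 00:00:00
-- 1001 | chief.engineer | 2013-08-01 00:00:00 | (null)
```

With this shape, inserting a new designation row changes the derived to_date of the previous row automatically, with no UPDATE needed.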