Count over rows in previous time range partitioned by a specific column - sql

My dataset consists of daily (actually business days) timeseries for different companies from different industries and I work with PostgreSQL. I have an indicator variable in my dataset taking values 1, -1 and most of the times 0. For better readability of the question I refer to days where the indicator variable is unequal to zero as indicator event.
So for all indicator events that are preceded by another indicator event for the same industry in the previous three business days, the indicator variable shall be updated to zero.
We can think of the following example dataset:
day company industry indicator
2012-01-12 A financial 1
2012-01-12 B consumer 0
2012-01-13 A financial 1
2012-01-13 B consumer -1
2012-01-16 A financial 0
2012-01-16 B consumer 0
2012-01-17 A financial 0
2012-01-17 B consumer 0
2012-01-17 C consumer 0
2012-01-18 A financial 0
2012-01-18 B consumer 0
2012-01-18 C consumer 1
So the indicator values that shall be updated to zero are on 2012-01-13 the entry for company A, and on 2012-01-18 the entry for company C, because they are preceded by another indicator event in the same industry within 3 business days.
I tried to accomplish it in the following way:
UPDATE test SET indicator = 0
WHERE (day, industry) IN (
SELECT day, industry
FROM (
SELECT industry, day,
COUNT(CASE WHEN indicator <> 0 THEN 1 END)
OVER (PARTITION BY industry ORDER BY day
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) As cnt
FROM test
) alias
WHERE cnt >= 2)
My idea was to count the indicator events for the current day and the 3 preceding days partitioned by industry. If it counts more than 1, it updates the indicator value to zero.
The weak spot is, that so far it counts over the three preceding rows (partitioned by industry) instead of the three preceding business days. So in the example data, it is not able to update company C on 2012-01-18, because it counts over the last three rows where industry = consumer instead of counting over all rows where industry=consumer for the last three business days.
I tried different methods like adding another subquery in the third last line of the code or adding a WHERE EXISTS - clause after the third last line, to ensure that the code counts over the three preceding dates. But nothing worked. I really don't know out how to do that (I just learn to work with PostgreSQL).
Do you have any ideas how to fix it?
Or maybe I am thinking in a completely wrong direction and you know another approach how to solve my problem?

DB design
Fist off, your table should be normalized. industry should be a small foreign key column (typically integer) referencing industry_id of an industry table. Maybe you have that already and only simplified for the sake of the question. Your actual table definition would go a long way.
Since rows with an indicator are rare but highly interesting, create a (possibly "covering") partial index to make any solution faster:
CREATE INDEX tbl_indicator_idx ON tbl (industry, day)
WHERE indicator <> 0;
Equality first, range last.
Assuming that indicator is defined NOT NULL. If industry was an integer, this index would be perfectly efficient.
Query
This query identifies rows to be reset:
WITH x AS ( -- only with indicator
SELECT DISTINCT industry, day
FROM tbl t
WHERE indicator <> 0
)
SELECT industry, day
FROM (
SELECT i.industry, d.day, x.day IS NOT NULL AS incident
, count(x.day) OVER (PARTITION BY industry ORDER BY day_nr
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS ct
FROM (
SELECT *, row_number() OVER (ORDER BY d.day) AS day_nr
FROM (
SELECT generate_series(min(day), max(day), interval '1d')::date AS day
FROM x
) d
WHERE extract('ISODOW' FROM d.day) < 6
) d
CROSS JOIN (SELECT DISTINCT industry FROM x) i
LEFT JOIN x USING (industry, day)
) sub
WHERE incident
AND ct > 1
ORDER BY 1, 2;
SQL Fiddle.
ISODOW as extract() parameter is convenient to truncate weekends.
Integrate this in your UPDATE:
WITH x AS ( -- only with indicator
SELECT DISTINCT industry, day
FROM tbl t
WHERE indicator <> 0
)
UPDATE tbl t
SET indicator = 0
FROM (
SELECT i.industry, d.day, x.day IS NOT NULL AS incident
, count(x.day) OVER (PARTITION BY industry ORDER BY day_nr
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS ct
FROM (
SELECT *, row_number() OVER (ORDER BY d.day) AS day_nr
FROM (
SELECT generate_series(min(day), max(day), interval '1d')::date AS day
FROM x
) d
WHERE extract('isodow' FROM d.day) < 6
) d
CROSS JOIN (SELECT DISTINCT industry FROM x) i
LEFT JOIN x USING (industry, day)
) u
WHERE u.incident
AND u.ct > 1
AND t.industry = u.industry
AND t.day = u.day;
This should be substantially faster than your solution with correlated subqueries and a function call for every row. Even if that's based on my own previous answer, it's not perfect for this case.

In the meantime I found one possible solution myself (I hope that this isn't against the etiquette of the forum).
Please note that this is only one possible solution. You are very welcome to comment it or to develop
improvements if you want to.
For the first part, the function addbusinessdays which can add (or subtract) business day to
a given date, I am referring to:
http://osssmb.wordpress.com/2009/12/02/business-days-working-days-sql-for-postgres-2/
(I just slightly modified it because I don't care for holidays, just for weekends)
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$BODY$
with alldates as (
SELECT i,
$1 + (i * case when $2 < 0 then -1 else 1 end) AS date
FROM generate_series(0,(abs($2) + 5)*2) i
),
days as (
select i, date, extract('dow' from date) as dow
from alldates
),
businessdays as (
select i, date, d.dow from days d
where d.dow between 1 and 5
order by i
)
select date from businessdays where
case when $2 > 0 then date >=$1 when $2 < 0 then date <=$1 else date =$1 end
limit 1
offset abs($2)
$BODY$
LANGUAGE 'sql' VOLATILE
COST 100;
ALTER FUNCTION addbusinessdays(date, integer) OWNER TO postgres;
For the second part, I am referring to this related question, where I am applying Erwin Brandstetter's correlated subquery approach: Window Functions or Common Table Expressions: count previous rows within range
UPDATE test SET indicator = 0
WHERE (day, industry) IN (
SELECT day, industry
FROM (
SELECT industry, day,
(SELECT COUNT(CASE WHEN indicator <> 0 THEN 1 END)
FROM test t1
WHERE t1.industry = t.industry
AND t1.day between addbusinessdays(t.day,-3) and t.day) As cnt
FROM test t
) alias
WHERE cnt >= 2)

Related

Select rows for last n days after event occurs

I have the following table and data:
PatientID PatientName Diagnosed ReportDate ...
1 0
1 0
1 0
1 1
So there are multiple rows for each patient, as the reports come few times a day.
Whenever the diagnosed field is changed to 1, for that patient, I'd like to get the past 3 days of data . So when Diagnosed ==1, get report time -3 days of data for each patient.
SELECT Patients.ReportDate
FROM Patients
WHERE Diagnosed = 1 and date > ReportDate - interval '3' day;
So getting the past 3 days of data, can be done with ReportDate - interval time, but how do I specify that for every patient (since multiple ids can be for that patient) based on the diagnosed field?
I usually do this filtering after getting csvs in python, but the data set is too large, so I'd like to filter before I convert them to dataframes.
You can look at this another way, which is whether diagnosed = 1 in the next three days -- and take all rows where that is true:
select p.*
from (select p.*,
count(*) filter (where diagnosed = 1) over (partition by patientId order by reportDate range between interval '0 day' following and interval '3 day' following) as cnt_diagnosed_3
from patients p
) p
where cnt_diagnosed_3 > 0
order by patientId, reportDate;
Whenever the diagnosed field is changed to 1, for that patient, I'd like to get the past 3 days of data.
SELECT (p).*
FROM (
SELECT p
, diagnosed
, bool_or(diagnosed = 1) OVER (w RANGE BETWEEN CURRENT ROW AND '3 days' FOLLOWING) AS in_range
, lag(diagnosed) OVER w AS last_diagnosed
FROM patients p
WINDOW w AS (PARTITION BY patientid ORDER BY reportdate)
) sub
WHERE diagnosed = 0 AND in_range
OR diagnosed = 1 AND last_diagnosed = 0
ORDER BY patientid, reportdate;
db<>fiddle here
Returns the "past 3 days of data" where the "field is changed to 1" (previous row had "0").
The WINDOW clause is just syntactic sugar to avoid spelling out the same window definition repeatedly. (No additional benefit for performance.)
SELECT p in the innermost subquery is a neat way to get the whole row. The outer SELECT (p).* returns complete rows without auxiliary columns added in the subquery. This way we get whole rows without spelling out all columns (or even needing to know all of them).
RANGE distance PRECEDING/FOLLOWING requires Postgres 11 or later.
Here is a slower alternative that also works for older versions:
SELECT p.*
FROM (
SELECT patientid, reportdate
FROM (
SELECT patientid, reportdate, diagnosed
, lag(diagnosed) OVER (PARTITION BY patientid ORDER BY reportdate) AS last_diagnosed
FROM patients
) p0
WHERE diagnosed = 1
AND last_diagnosed = 0
) d
JOIN patients p USING (patientid)
WHERE p.reportdate BETWEEN d.reportdate - interval '3 days' AND d.reportdate
ORDER BY p.patientid, p.reportdate;
Subquery d select rows where Diagnosed just switched to 1. Then self-join to select your time frame.
For gaps-and-islands basics, see:
Select longest continuous sequence
You also added:
So when Diagnosed ==1, get report time -3 days of data for each patient.
That's a wider definition, and that's what Gordon's query does. Goes to show the importance of an exact definition of requirements.

Compare prior row using time values

I have this set of data
What I want to do is compare the Start time to the prior row and if the start time falls between the Start and end time of the prior row then flag it. Whether that flag is binary or x doesn't matter, just needs to be counted.
So that the new column calls out the instances where the start time of the current row is between the Start and End time of the prior row. My results should look like this.
My thoughts are that LAG and/or LEAD need to be used here but I'm horribly novice at both of those. I'm also thinking I need to create a ROW() for these to make it work. Either way, looking for some guidance on this. I need to be able to track conversation times to see how many times an individual is handling simultaneous conversations (usually no more than 2).
Assuming you have a primary key like ID in the example below you can do something like the below
WITH data
AS (SELECT * FROM YOUR_TABLE),
d1
AS (SELECT d.*,
Lead(start_date)
over (
ORDER BY id) lead_start_date
FROM data d)
SELECT id,
start_date,
end_date,
CASE
WHEN lead_start_date BETWEEN start_date AND end_date THEN 1
ELSE 0
END marker
FROM d1;
One method is exists:
select t.*,
(case when exists (select 1
from t t2
where t2.starttime <= t.starttime and
t2.endtime >= t.starttime
)
then 1 else 0
end) as dual_convo
from t;
If I understand correctly, I think you can also use a cumulative maximum:
select t.*,
(case when max(endtime) over (order by starttime, endtime
rows between unbounded preceding and 1 preceding
) > starttime
then 1 else 0
end) as dual_convo
from t;
Your data only has examples where the previous row overlaps. But presumably you could have overlaps on earlier rows, such as:
1 9
2 3
4 5
8 12
All but the first overlap, and only the first with the "previous" row.

Expected payments by day given start and end date

I'm trying to create a SQL view that gives me the expected amount to be received by calendar day for recurring transactions. I have a table containing recurring commitments data, with the following columns:
id,
start_date,
end_date (null if still active),
payment day (1,2,3,etc.),
frequency (monthly, quarterly, semi-annually, annually),
commitment amount
For now, I do not need to worry about business days vs calendar days.
In its simplest form, the end result would contain every historical calendar day as well as future dates for the next year, and produce how much was/is expected to be received in those particular days.
I've done quite a bit of researching, but cannot seem to find an answer that addresses the specific problem. Any direction on where to start would be greatly appreciated.
The expect output would look something like this:
| Date | Expected Amount |
|1/1/18 | 100 |
|1/2/18 | 200 |
|1/3/18 | 150 |
Thank you ahead of time!
Link to data table in db-fiddle
Expected Output Spreadsheet
It's something like this, but I've never used Netezza
SELECT
cal.d, sum(r.amount) as expected_amount
FROM
(
SELECT MIN(a.start_date) + ROW_NUMBER() OVER(ORDER BY NULL) as d
FROM recurring a, recurring b, recurring c
) cal
LEFT JOIN
recurring r
ON
(
(r.frequency = 'monthly' AND r.payment_day = DATE_PART('DAY', cal.d)) OR
(r.frequency = 'annually' AND DATE_PART('MONTH', cal.d) = DATE_PART('MONTH', r.start_date) AND r.payment_day = DATE_PART('DAY', cal.d))
) AND
r.start_date >= cal.d AND
(r.end_date <= cal.d OR r.end_date IS NULL)
GROUP BY cal.d
In essence, we cartesian join our recurring table together a few times to generate a load of rows, number them and add the number onto the min date to get an incrementing date series.
The payments data table is left joined onto this incrementing date series on:
(the day of the date from the series) = (the payment day) for monthlies
(the month-day of the date from the series) = (the month and payment day of the start_date)
Finally, the whole lot is grouped and summed
I don't have a test instance of Netezza so if you encounter some minor syntax errors, do please have a stab at fixing them up yourself (to make it faster for you to get a solution). If you reach a point where you can't work out what the query is doing, let me know
Disclaimer: I'm no expert on Netezza, so I decided to write you a standard SQL that may need some tweaking to run on Netezza.
with
digit as (select 0 as x union select 1 union select 2 union select 3 union select 4
union select 5 union select 6 union select 7 union select 8 union select 9
),
number as ( -- produces numbers from 0 to 9999 (28 years)
select d1.x + d2.x * 10 + d3.x * 100 + d4.x * 1000 as n
from digit d1
cross join digit d2
cross join digit d3
cross join digit d4
),
expected_payment as ( -- expands all expected payments
select
c.start_date + nb.n as day,
c.committed_amount
from recurring_commitement c
cross join number nb
where c.start_date + nb.n <= c.end_data
and c.frequency ... -- add logic for monthly, quarterly, etc. here
)
select
day,
sum(committed_amout) as expected_amount
from expected_payment
group by day
order by day
This solution is valid for commitments that do not exceed 28 years, since the number CTE (Common Table Expression) is producing up to a maximum of 9999 days. Expand with a fifth digit if you need longer commitments.
Note: I think the way I'm adding days to a day to a date is not correct in Netezza's SQL. The expression c.start_date + nb.n may need to be rephrased.

Find Intersection Between Date Ranges In PostgreSQL

I have records with a two dates check_in and check_out, I want to know the ranges when more than one person was checked in at the same time.
So if I have the following checkin / checkouts:
Person A: 1PM - 6PM
Person B: 3PM - 10PM
Person C: 9PM - 11PM
I would want to get 3PM - 6PM (Overlap of person A and B) and 9PM - 10PM (overlap of person B and C).
I can write an algorithm to do this in linear time with code, is it possible to do this via a relational query in linear time with PostgreSQL as well?
It needs to have a minimal response, meaning no overlapping ranges. So if there were a result which gave the range 6PM - 9PM and 8PM - 10PM it would be incorrect. It should instead return 6PM - 10pm.
Assumptions
The solution heavily depends on the exact table definition including all constraints. For lack of information in the question I'll assume this table:
CREATE TABLE booking (
booking_id serial PRIMARY KEY
, check_in timestamptz NOT NULL
, check_out timestamptz NOT NULL
, CONSTRAINT valid_range CHECK (check_out > check_in)
);
So, no NULL values, only valid ranges with inclusive lower and exclusive upper bound, and we don't really care who checks in.
Also assuming a current version of Postgres, at least 9.2.
Query
One way to do it with only SQL using a UNION ALL and window functions:
SELECT ts AS check_id, next_ts As check_out
FROM (
SELECT *, lead(ts) OVER (ORDER BY ts) AS next_ts
FROM (
SELECT *, lag(people_ct, 1 , 0) OVER (ORDER BY ts) AS prev_ct
FROM (
SELECT ts, sum(sum(change)) OVER (ORDER BY ts)::int AS people_ct
FROM (
SELECT check_in AS ts, 1 AS change FROM booking
UNION ALL
SELECT check_out, -1 FROM booking
) sub1
GROUP BY 1
) sub2
) sub3
WHERE people_ct > 1 AND prev_ct < 2 OR -- start overlap
people_ct < 2 AND prev_ct > 1 -- end overlap
) sub4
WHERE people_ct > 1 AND prev_ct < 2;
SQL Fiddle.
Explanation
In subquery sub1 derive a table of check_in and check_out in one column. check_in adds one to the crowd, check_out subtracts one.
In sub2 sum all events for the same point in time and compute a running count with a window function: that's the window function sum() over an aggregate sum() - and cast to integer or we get numeric from this:
sum(sum(change)) OVER (ORDER BY ts)::int
In sub3 look at the count of the previous row
In sub4 only keep rows where overlapping time ranges start and end, and pull the end of the time range into the same row with lead().
Finally, only keep rows, where time ranges start.
To optimize performance I would walk through the table once in a plpgsql function like demonstrated in this related answer on dba.SE:
Calculate Difference in Overlapping Time in PostgreSQL / SSRS
Idea is to divide time in periods and save them as bit values with specified granularity.
0 - nobody is checked in one grain
1 - somebody is checked in one grain
Let's assume that granularity is 1 hour and period is 1 day.
000000000000000000000000 means nobody is checked in that day
000000000000000000000110 means somebody is checked between 21 and 23
000000000000011111000000 means somebody is checked between 13 and 18
000000000000000111111100 means somebody is checked between 15 and 22
After that we do binary OR on the each value in the range and we have our answer.
000000000000011111111110
It can be done in linear time. Here is an example from Oracle but it can be transformed to PostgreSQL easily.
with rec (checkin, checkout)
as ( select 13, 18 from dual
union all
select 15, 22 from dual
union all
select 21, 23 from dual )
,spanempty ( empt)
as ( select '000000000000000000000000' from dual) ,
spanfull( full)
as ( select '111111111111111111111111' from dual)
, bookingbin( binbook) as ( select substr(empt, 1, checkin) ||
substr(full, checkin, checkout-checkin) ||
substr(empt, checkout, 24-checkout)
from rec
cross join spanempty
cross join spanfull ),
bookingInt (rn, intbook) as
( select rownum, bin2dec(binbook) from bookingbin),
bitAndSum (bitAndSumm) as (
select sum(bitand(b1.intbook, b2.intbook)) from bookingInt b1
join bookingInt b2
on b1.rn = b2.rn -1 ) ,
SumAll (sumall) as (
select sum(bin2dec(binbook)) from bookingBin )
select lpad(dec2bin(sumall - bitAndSumm), 24, '0')
from SumAll, bitAndSum
Result:
000000000000011111111110

Count occurrences of combinations of columns

I have daily time series (actually business days) for different companies and I work with PostgreSQL. There is also an indicator variable (called flag) taking the value 0 most of the time, and 1 on some rare event days. If the indicator variable takes the value 1 for a company, I want to further investigate the entries from two days before to one day after that event for the corresponding company. Let me refer to that as [-2,1] window with the event day being day 0.
I am using the following query
CREATE TABLE test AS
WITH cte AS (
SELECT *
, MAX(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN 1 preceding AND 2 following) Lead1
FROM mytable)
SELECT *
FROM cte
WHERE Lead1 = 1
ORDER BY day,company
The query takes the entries ranging from 2 days before the event to one day after the event, for the company experiencing the event.
The query does that for all events.
This is a small section of the resulting table.
day company flag
2012-01-23 A 0
2012-01-24 A 0
2012-01-25 A 1
2012-01-25 B 0
2012-01-26 A 0
2012-01-26 B 0
2012-01-27 B 1
2012-01-30 B 0
2013-01-10 A 0
2013-01-11 A 0
2013-01-14 A 1
Now I want to do further calculations for every [-2,1] window separately. So I need a variable that allows me to identify each [-2,1] window. The idea is that I count the number of windows for every company with the variable "occur", so that in further calculations I can use the clause
GROUP BY company, occur
Therefore my desired output looks like that:
day company flag occur
2012-01-23 A 0 1
2012-01-24 A 0 1
2012-01-25 A 1 1
2012-01-25 B 0 1
2012-01-26 A 0 1
2012-01-26 B 0 1
2012-01-27 B 1 1
2012-01-30 B 0 1
2013-01-10 A 0 2
2013-01-11 A 0 2
2013-01-14 A 1 2
In the example, the company B only occurs once (occur = 1). But the company A occurs two times. For the first time from 2012-01-23 to 2012-01-26. And for the second time from 2013-01-10 to 2013-01-14. The second time range of company A does not consist of all four days surrounding the event day (-2,-1,0,1) since the company leaves the dataset before the end of that time range.
As I said I am working with business days. I don't care for holidays, I have data from monday to friday. Earlier I wrote the following function:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$BODY$
WITH alldates AS (
SELECT i,
$1 + (i * CASE WHEN $2 < 0 THEN -1 ELSE 1 END) AS date
FROM generate_series(0,(ABS($2) + 5)*2) i
),
days AS (
SELECT i, date, EXTRACT('dow' FROM date) AS dow
FROM alldates
),
businessdays AS (
SELECT i, date, d.dow FROM days d
WHERE d.dow BETWEEN 1 AND 5
ORDER BY i
)
-- adding business days to a date --
SELECT date FROM businessdays WHERE
CASE WHEN $2 > 0 THEN date >=$1 WHEN $2 < 0
THEN date <=$1 ELSE date =$1 END
LIMIT 1
offset ABS($2)
$BODY$
LANGUAGE 'sql' VOLATILE;
It can add/substract business days from a given date and works like that:
select * from addbusinessdays('2013-01-14',-2)
delivers the result 2013-01-10. So in Jakub's approach we can change the second and third last line to
w.day BETWEEN addbusinessdays(t1.day, -2) AND addbusinessdays(t1.day, 1)
and can deal with the business days.
Function
While using the function addbusinessdays(), consider this instead:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$func$
SELECT day
FROM (
SELECT i, $1 + i * sign($2)::int AS day
FROM generate_series(0, ((abs($2) * 7) / 5) + 3) i
) sub
WHERE EXTRACT(ISODOW FROM day) < 6 -- truncate weekend
ORDER BY i
OFFSET abs($2)
LIMIT 1
$func$ LANGUAGE sql IMMUTABLE;
Major points
Never quote the language name sql. It's an identifier, not a string.
Why was the function VOLATILE? Make it IMMUTABLE for better performance in repeated use and more options (like using it in a functional index).
(ABS($2) + 5)*2) is way too much padding. Replace with ((abs($2) * 7) / 5) + 3).
Multiple levels of CTEs were useless cruft.
ORDER BY in last CTE was useless, too.
As mentioned in my previous answer, extract(ISODOW FROM ...) is more convenient to truncate weekends.
Query
That said, I wouldn't use above function for this query at all. Build a complete grid of relevant days once instead of calculating the range of days for every single row.
Based on this assertion in a comment (should be in the question, really!):
two subsequent windows of the same firm can never overlap.
WITH range AS ( -- only with flag
SELECT company
, min(day) - 2 AS r_start
, max(day) + 1 AS r_stop
FROM tbl t
WHERE flag <> 0
GROUP BY 1
)
, grid AS (
SELECT company, day::date
FROM range r
,generate_series(r.r_start, r.r_stop, interval '1d') d(day)
WHERE extract('ISODOW' FROM d.day) < 6
)
SELECT *, sum(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING
AND 2 following) AS window_nr
FROM (
SELECT t.*, max(t.flag) OVER(PARTITION BY g.company ORDER BY g.day
ROWS BETWEEN 1 preceding
AND 2 following) in_window
FROM grid g
LEFT JOIN tbl t USING (company, day)
) sub
WHERE in_window > 0 -- only rows in [-2,1] window
AND day IS NOT NULL -- exclude missing days in [-2,1] window
ORDER BY company, day;
How?
Build a grid of all business days: CTE grid.
To keep the grid to its smallest possible size, extract minimum and maximum (plus buffer) day per company: CTE range.
LEFT JOIN actual rows to it. Now the frames for ensuing window functions works with static numbers.
To get distinct numbers per flag and company (window_nr), just count flags from the start of the grid (taking buffers into account).
Only keep days inside your [-2,1] windows (in_window > 0).
Only keep days with actual rows in the table.
Voilá.
SQL Fiddle.
Basically the strategy is to first enumarate the flag days and then join others with them:
WITH windows AS(
SELECT t1.day
,t1.company
,rank() OVER (PARTITION BY company ORDER BY day) as rank
FROM table1 t1
WHERE flag =1)
SELECT t1.day
,t1.company
,t1.flag
,w.rank
FROM table1 AS t1
JOIN windows AS w
ON
t1.company = w.company
AND
w.day BETWEEN
t1.day - interval '2 day' AND t1.day + interval '1 day'
ORDER BY t1.day, t1.company;
Fiddle.
However there is a problem with work days as those can mean whatever (do holidays count?).