Date jm Text
-------- ---- ----
6/3/2015 ne Good
6/4/2015 ne Good
6/5/2015 ne Same
6/8/2015 ne Same
I want to count how often the "Same" value occurs in a run of consecutive days ending on the current date.
I don't want to count the value over the whole table. In the example above, the count on the current date (6/8/2015) is 2.
It is very important for me that "Same" never occurs...
The query has to ignore the weekend (June 6 and 7).
Date jm Text
-------- ---- ----
6/3/2015 ne Same
6/4/2015 ne Same
6/5/2015 ne Good
6/8/2015 ne Good
In this example the count is zero
Okay, I'm starting to get the picture, although at first I thought you wanted to count by jm, and now it seems you want to count by Text = 'Same'. Anyway, that's what this query should do. It gets the row for the current date, connects all previous rows that have the same Text, and counts them. It also shows the current text (which is the same for all connected rows).
So the query will return one row (if there is one for today), which will show the date, jm and Text of the current date, the number of consecutive days for which the Text has been the same (just in case you want to know how many days it is 'Good'), and the number of days (either 0 or the same as the other count) for which the Text has been 'Same'.
I hope this query is right, or at least it gives you an idea of how to solve the problem using CONNECT BY. I should mention I based the 'Friday-detection' on this question.
Also, I don't have Oracle at hand, so please forgive me for any minor syntax errors.
WITH
  VW_SAMESTATUSES AS
  ( SELECT t.*
    FROM YourTable t
    START WITH -- Start with the row for today
      t.Date = trunc(sysdate)
    CONNECT BY -- Connect to the previous row, which has a lower date.
      -- Note that PRIOR refers to the prior record in the hierarchy,
      -- which is actually the NEXT day. :)
      PRIOR t.Date = t.Date +
        CASE MOD(TO_CHAR(t.Date, 'J'), 7) + 1
          WHEN 5 THEN 3 -- Friday, so the next row is 3 days later (Monday)
          ELSE 1 -- Other days, so add one
        END
      -- And the Text also has to match the one of the next day.
      AND t.Text = PRIOR t.Text)
SELECT MAX(s.Date) AS CurrentDate, -- The latest date is the start row (today)
       s.jm,
       MAX(s.Text) AS CurrentText, -- Not really MAX, they are actually all the same
       COUNT(*) AS ConsecutiveDays,
       COUNT(CASE WHEN s.Text = 'Same' THEN 1 END) AS SameCount
FROM VW_SAMESTATUSES s
GROUP BY s.jm -- Grouping by Date as well would return one row per day
This recursive query (recursive subquery factoring is available from Oracle 11g Release 2) might be useful:
with s(tcode, tdate) as (
select tcode, tdate from test where tdate = date '2015-06-08'
union all
select t.tcode, t.tdate from test t, s
where s.tcode = t.tcode
and t.tdate = s.tdate - decode(s.tdate-trunc(s.tdate, 'iw'), 0, 3, 1) )
select count(1) cnt from s
SQLFiddle
I prepared sample data according to your original question, not including the later edits; you can see it in the attached SQLFiddle. Additional conditions on the Text column are simple to add: put something like ... and Text = 'Same' in the where clauses.
In its current version the query counts the number of previous days, starting from a given date (change it in line 2), where the dates are consecutive (excluding weekend days) and the value in column tcode is the same for all days.
The part decode(s.tdate-trunc(s.tdate, 'iw'), 0, 3, 1) subtracts 3 days if the date is a Monday and 1 day otherwise, and should work independently of NLS settings.
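For readers who want to check the expected counts outside the database, here is a minimal Python sketch of the same walk: start at the current date, step back one business day at a time, and stop at the first day whose Text differs. The function names and the dict layout are assumptions made for illustration, and weekends are taken to be Saturday and Sunday.

```python
from datetime import date, timedelta

def previous_business_day(d):
    """Return the closest earlier date that is not a Saturday or Sunday."""
    d -= timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

def count_consecutive(rows, today, text="Same"):
    """Count business days ending at `today` whose Text equals `text`.

    `rows` maps a date to its Text value (hypothetical layout). The walk
    stops at the first missing day or the first day with another Text.
    """
    count = 0
    d = today
    while rows.get(d) == text:
        count += 1
        d = previous_business_day(d)
    return count

# The two examples from the question.
first = {date(2015, 6, 3): "Good", date(2015, 6, 4): "Good",
         date(2015, 6, 5): "Same", date(2015, 6, 8): "Same"}
second = {date(2015, 6, 3): "Same", date(2015, 6, 4): "Same",
          date(2015, 6, 5): "Good", date(2015, 6, 8): "Good"}

print(count_consecutive(first, date(2015, 6, 8)))   # 2
print(count_consecutive(second, date(2015, 6, 8)))  # 0
```

The weekend of June 6 and 7 is skipped by previous_business_day, so Friday and Monday count as consecutive.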
I have a dataset like this called data_per_day:
instructional_day points
----------------- ------
2023-01-24        2
2023-01-23        2
2023-01-20        1
2023-01-19        0
And so on. The table contains only instructional days (weekdays minus holidays) and the number of points someone has earned. 1 is the start of a streak and 0 is the end of a streak; 2 is the maximum number of points once a streak has started.
I need to find the length of the latest streak, so in this case the result should be 3.
I created a recursive CTE, but the query returns 2 as the streak count because the join steps back by calendar days. Instead I need to step back by instructional days rather than by all dates.
WITH RECURSIVE cte AS (
SELECT
student_unique_id,
instructional_day,
points,
1 AS cnt
FROM
`data_per_day`
WHERE
instructional_day = DATE_ADD(CURRENT_DATE('America/Chicago'), INTERVAL -1 DAY)
UNION ALL
SELECT
a.student_unique_id,
a.instructional_day,
a.points,
c.cnt+1
FROM (
SELECT
*
FROM
`data_per_day`
WHERE
points > 0 ) a
INNER JOIN
cte c
ON
a.student_unique_id = c.student_unique_id
AND a.instructional_day = c.instructional_day - INTERVAL '1' day )
SELECT
student_unique_id,
MAX(cnt) AS streak
FROM
cte --
WHERE
student_unique_id = "419"
GROUP BY
student_unique_id
How do I adjust the query?
This is not a trivial coding exercise, so I won't actually write the code and provide it.
What you have here is a gaps and islands question. You want to identify the largest "island" of days with points within a date range. Depending upon what dates are contained in your data, you may need to generate a list of sequential dates that meet your criteria.
One problem I see is that you are trying to combine the steps to generate the date range (the recursive CTE) with the points. You'll need to separate those steps.
1. Define the date range.
2. Generate the dates within the range.
3. Filter the dates down to instructional days, that is, weekdays that are not holidays (for example with calendar flags such as isweekday and isholiday). You will probably want to add a row number during this step.
4. [left] join the dates to your data, including coalesce(points, 0).
5. Filter the data to points > 0.
6. Identify the islands.
7. Identify the largest island per student.
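To make the idea concrete for the simplest case, here is a small Python sketch that finds only the latest island for one student. It assumes, as the question states, that the table already contains instructional days only, so neighbouring rows in date order are consecutive instructional days regardless of the calendar gap between them. The function name and row layout are made up for illustration.

```python
from datetime import date

def latest_streak(rows):
    """Length of the most recent run of days with points > 0.

    `rows` is a list of (instructional_day, points) tuples for one student.
    Adjacency in sorted order, not calendar distance, defines consecutive
    days, because weekends and holidays are simply absent from the data.
    """
    streak = 0
    for _, points in sorted(rows, reverse=True):  # newest day first
        if points <= 0:  # a 0 ends the streak
            break
        streak += 1
    return streak

data_per_day = [
    (date(2023, 1, 24), 2),
    (date(2023, 1, 23), 2),
    (date(2023, 1, 20), 1),  # Friday: still adjacent to Monday the 23rd
    (date(2023, 1, 19), 0),
]

print(latest_streak(data_per_day))  # 3
```

In SQL the same effect comes from numbering the instructional days (step 3 above) and comparing row numbers instead of subtracting one calendar day.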
I'm building a scheduling system where I store an initial appointment and how often it repeats. My table looks something like this:
CREATE TABLE tbl (
id serial primary key,
initial_timestamp timestamp not null,
recurring interval
);
id initial_timestamp recurring
27 2020-06-02 3 weeks
24 2020-06-03 10 days
Assuming I can handle the time component, and that the only intervals we'll run across are days and weeks, how can I find when those two appointments will overlap? For example, the two appointments above will overlap on June 23rd. That is 3 weeks from June 2nd and 20 days from June 3rd, so the first appointment will have repeated once by that day, and the second appointment will repeat on the 13th and then the 23rd.
In my program, I have another date, say June 7th with a recurring interval of 12 days. What query can I use to find the time it will take for a recurring appointment starting on June 7th to overlap with every existing recurring appointment? So for example, this appointment will repeat on June 19, July 1, and July 13. Appointment #24 from the table above will repeat on June 13, June 23, July 3, and July 13, if my math is right. I'd like my query comparing this appointment to appointment #24 to return, first of all, July 13th, then also how long it would take to repeat again, which I assume would be like finding the least common multiple of the two intervals, in this case, 60 days (LCM of 12 and 10). So I could expect it to repeat again on July 13 + 60 days = Sept 11.
I tried using generate_series, but since I don't know the size of the intervals, the series would have to continue infinitely, right? It's probably not the best choice here. I assume the answer would have more to do with the math of multiplying intervals somehow.
Note that recurring can be null, so I'd assume there has to be something like WHERE recurring IS NOT NULL in there somewhere. Another thing to note: no initial appointments overlap. I've already guarded against that. The search term doesn't overlap with any of the appointment's initial times either.
If it helps at all, I'm using PHP 5.3 to send queries to Postgres 9.4 (I know, it's an ancient setup). I'd prefer to do most of this in SQL just because most of the other logic is in SQL right now, so I can just run the query and start manipulating the results with PHP.
So in summary, if my math is right, what Postgres query should I use with the table above to compare a given date and interval with every date and interval pair from the table to find the next date those two overlap and how far apart each overlap instance would be?
This was hard.
WITH RECURSIVE moving_target(initial_timestamp, recurring) AS (
VALUES (timestamp '2020-06-07', interval '12 days') -- search term
)
, x AS ( -- advance to the closest day before or at moving target
SELECT t.id
, t_date + ((m_date - t_date) / t_step) * t_step AS t_date
, t_step
, m.*
FROM ( -- normalize table data
SELECT id
, initial_timestamp::date AS t_date
, EXTRACT ('days' FROM recurring)::int AS t_step
FROM tbl
WHERE recurring IS NOT NULL -- exclude!
) t
CROSS JOIN ( -- normalize input
SELECT initial_timestamp::date AS m_date
, EXTRACT ('days' FROM recurring)::int AS m_step
FROM moving_target
) m
)
, rcte AS ( -- recursive CTE
SELECT id, t_date, t_step, m_date, m_step
, ARRAY[m_date - t_date] AS gaps -- keep track of gaps
, CASE
WHEN t_date = m_date THEN true -- found match
WHEN m_step % t_step = 0 -- m moves in whole multiples of t_step ...
AND (m_date - t_date) % t_step <> 0 THEN false -- ... so this gap can never close
WHEN (m_date - t_date) % 2 = 1 -- odd gap ...
AND t_step % 2 = 0 -- ... but even steps
AND m_step % 2 = 0 THEN false -- can never match
-- WHEN <stop conditions?> THEN false -- hard to determine!
-- ELSE null -- keep searching
END AS match
FROM x
UNION ALL
SELECT id, t_date, t_step, m_date, m_step
, gaps || m_date - t_date
, CASE
WHEN t_date = m_date THEN true
WHEN (m_date - t_date) = ANY (gaps) THEN false -- gap repeated!
-- ELSE null -- keep searching
END AS match
FROM (
SELECT id
, t_date + (((m_date + m_step) - t_date) / t_step) * t_step AS t_date
, t_step
, m_date + m_step AS m_date -- + 1 step
, m_step
, gaps
FROM rcte
WHERE match IS NULL
) sub
)
SELECT id, t.initial_timestamp, t.recurring
, CASE WHEN r.match THEN r.t_date END AS match_date
FROM rcte r
JOIN tbl t USING (id)
WHERE r.match IS NOT NULL;
db<>fiddle here - with more test rows
There may be potential to improve further. The core problem is in the realm of prime factorization. As it seems reasonable to expect fairly small intervals, I solved it by testing for cycles: If, while incrementally stepping forward, a gap between dates is detected that we have seen before, and dates didn't overlap yet, they will never overlap and we can stop. This loops at most GREATEST(m_step, t_step) times (the number of days in the bigger interval), so it shouldn't scale terribly.
I identified some basic mathematical stop conditions to avoid looping in hopeless cases a priori. There may be more ...
Explaining everything that's going on here is more work than devising the query. I added comments that should explain basics ...
Then again, while intervals are small, a "brute force" approach based on generate_series() may still be faster.
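As a cross-check on the numbers in the question, here is a short Python sketch. It is not the recursive CTE above but a different, simpler technique: two schedules with whole-day steps can only meet if the gap between their start dates is divisible by the greatest common divisor of the steps, and once they meet they meet again every LCM days. The function name and the tuple return value are assumptions for illustration.

```python
from datetime import date, timedelta
from math import gcd

def first_overlap(start_a, step_a, start_b, step_b):
    """Return (first common date, days between overlaps) or (None, None).

    Steps are whole days. A common date exists only if the gap between
    the two start dates is a multiple of gcd(step_a, step_b).
    """
    g = gcd(step_a, step_b)
    if (start_b - start_a).days % g != 0:
        return None, None  # the schedules can never meet
    period = step_a * step_b // g  # least common multiple of the steps
    a, b = start_a, start_b
    while a != b:  # always advance whichever schedule is behind
        if a < b:
            a += timedelta(days=step_a)
        else:
            b += timedelta(days=step_b)
    return a, period

# Appointment #24 (June 3, every 10 days) against the search term
# (June 7, every 12 days).
print(first_overlap(date(2020, 6, 3), 10, date(2020, 6, 7), 12))
# (datetime.date(2020, 7, 13), 60)

# Appointment #27 (June 2, every 21 days) against the same search term:
# the gap of 5 days is not divisible by gcd(21, 12) = 3.
print(first_overlap(date(2020, 6, 2), 21, date(2020, 6, 7), 12))
# (None, None)
```

The loop terminates because the divisibility check guarantees a common date, and advancing the schedule that is behind can never jump past it.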
I have two date fields, DATE_FIELD_ONE = 8/30/2018 and DATE_FIELD_TWO = DATE_FIELD_ONE + 20. I need to find what DATE_FIELD_TWO should be if I'm only adding 20 business days. How would I accomplish this? I thought about using the 'DY' format, but I'm not sure how to get it to work. Thanks.
CASE WHEN TO_CHAR(TO_DATE(DATE_FIELD_ONE),'DY')='SAT' THEN 1 ELSE 0 END
CASE WHEN TO_CHAR(TO_DATE(DATE_FIELD_ONE),'DY')='SUN' THEN 1 ELSE 0 END
You may try this :
select max(date_field_two) as date_field_two
  from
  (
    -- 'DY' with an explicit NLS_DATE_LANGUAGE is used because the numeric
    -- 'D' format depends on NLS_TERRITORY (6 and 7 are not always Sat/Sun).
    select date'2018-08-30' +
           cast(case when to_char(date'2018-08-30' + level, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
                          in ('SAT','SUN') then
                  0
                else
                  level
                end as int) as date_field_two,
           sum(cast(case when to_char(date'2018-08-30' + level, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
                              in ('SAT','SUN') then
                      0
                    else
                      1
                    end as int)) over (order by level) as next_day
      from dual
   connect by level <= 20*1.5
   -- 20 is the number of business days to add. A week has 7 days, 5 of them
   -- business days, so 20 business days span about 20*7/5 = 28 calendar days.
   -- The factor 1.5 > 7/5 therefore generates enough rows (30); a larger
   -- factor such as 2 works as well.
  )
 where next_day = 20;
DATE_FIELD_TWO
-----------------
27.09.2018
This uses a CONNECT BY query on dual to generate the candidate days.
P.S. Public holidays are ignored here: they differ from one country to another, and the question only concerns weekends.
Rextester Demo
Edit: Assume you have national holidays on '2018-09-25' and '2018-09-26' (within this set of days); then consider the following:
select max(date_field_two) as date_field_two
  from
  (
    -- 'DY' with an explicit NLS_DATE_LANGUAGE is used because the numeric
    -- 'D' format depends on NLS_TERRITORY (6 and 7 are not always Sat/Sun).
    select date'2018-08-30' +
           (case when to_char(date'2018-08-30' + level, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
                      in ('SAT','SUN') then
              0
            when date'2018-08-30' + level in (date'2018-09-25', date'2018-09-26') then
              0
            else
              level
            end) as date_field_two,
           sum(cast(case when to_char(date'2018-08-30' + level, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH')
                              in ('SAT','SUN') then
                      0
                    when date'2018-08-30' + level in (date'2018-09-25', date'2018-09-26') then
                      0
                    else
                      1
                    end as int)) over (order by level) as next_day
      from dual
   connect by level <= 20*2
  )
 where next_day = 20;
DATE_FIELD_TWO
-----------------
01.10.2018
Each holiday moves the result one working day later, as in this case, unless the holiday coincides with a weekend.
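For comparison, the same counting logic can be written as a plain loop. This is only a sketch in Python, assuming Saturday and Sunday are the non-working days and that holidays are passed in as a set; the function name is made up for illustration.

```python
from datetime import date, timedelta

def add_business_days(start, n, holidays=frozenset()):
    """Return the date n business days after `start`.

    Saturdays, Sundays and any date in `holidays` are skipped and
    do not count towards n.
    """
    d = start
    remaining = n
    while remaining > 0:
        d += timedelta(days=1)
        if d.weekday() < 5 and d not in holidays:  # Monday..Friday, no holiday
            remaining -= 1
    return d

print(add_business_days(date(2018, 8, 30), 20))
# 2018-09-27

national_holidays = {date(2018, 9, 25), date(2018, 9, 26)}
print(add_business_days(date(2018, 8, 30), 20, national_holidays))
# 2018-10-01
```

Both results match the output of the two queries above.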
You can define workdays to be whatever you like if you use a PL/SQL function.
Here is a simple prototype, without any holidays, but it could be adapted for that purpose using the same kind of logic.
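create or replace function add_business_days (from_date IN date, bd IN integer) return date as
  fd date := trunc(from_date,'iw'); -- Monday of the ISO week containing from_date
  -- working days from that Monday to the target; a weekend start counts as Friday
  -- (assumes bd >= 0)
  cnt int := least(trunc(from_date)-fd, 4)+bd;
  ww int := trunc(cnt/5); -- whole working weeks to skip
  wd int := mod(cnt,5); -- remaining working days past Monday
begin
  return fd + (ww*7)+wd;
end;
/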
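The week arithmetic behind such a function is easy to get wrong by one, so here is a small Python sketch of one way to do it, with a few spot checks. It anchors on the Monday of the starting week, converts the target into whole working weeks plus leftover working days, and treats a weekend start as if it were the preceding Friday. These choices and the function name are assumptions for illustration; holidays are not handled.

```python
from datetime import date, timedelta

def add_business_days_arith(start, n):
    """Add n >= 0 business days to `start` without looping over days."""
    monday = start - timedelta(days=start.weekday())  # Monday of start's week
    offset = min(start.weekday(), 4)    # Sat/Sun are treated like Friday
    weeks, days = divmod(offset + n, 5)  # whole working weeks, leftover days
    return monday + timedelta(days=7 * weeks + days)

print(add_business_days_arith(date(2018, 8, 30), 20))  # 2018-09-27
print(add_business_days_arith(date(2018, 8, 31), 1))   # 2018-09-03
print(add_business_days_arith(date(2018, 9, 1), 1))    # 2018-09-03
```

Because the result is computed from the Monday anchor rather than from the start date, the leftover days never land on a weekend.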
I realize you already have an answer, but for what it's worth this is something we deal with all the time and have what has turned out to be a very good solution.
In effect, we maintain a separate table called "work days" that has every conceivable date we would ever compare (and that definition will vary from application to application, of course -- but in any case it will never be "huge" by RDBMS standards). There is a boolean flag that dictates if the date is a work day or a weekend/holiday, but more importantly there is an index value that only increments on work days. The table looks like this:
The advantage to this is transparency and scalability. If you want the difference between two dates in work days:
select
  h.sales_order_entry_dte, h.invoice_dte, wd2.workday_index - wd1.workday_index as delta
from
  sales_order_data h
  join util.work_days wd1 on h.sales_order_entry_dte = wd1.cal_date
  join util.work_days wd2 on h.invoice_dte = wd2.cal_date
If you need to take a date in a table and add 20 work days (like your original problem statement):
select
h.date_field_1, wd2.cal_date as date_field_1_plus_20
from
my_table h
join util.work_days wd1 on h.date_field_1 = wd1.cal_date
join util.work_days wd2 on
wd1.workday_index + 20 = wd2.workday_index and
wd2.is_workday
(Disclaimer: this is in PostgreSQL, which is why I have the boolean. In Oracle, I'm guessing you would need to change that to an integer and say = 1.)
Also, for the bonus question, this gives two different options for defining "work day," one that rolls forward and another that rolls backwards (hence the workday_index and workday_index_back). For example, if you need something on a Saturday, and Saturday is not a work day, that means you need it on Friday. Conversely, if something is to be delivered on Saturday, and Saturday is not a work day, then that means it will be available on Monday. The context of how to handle non-workdays differs, and this method affords you the option of choosing the right one.
As a final selling point, this approach lets you treat holidays as non-work days as well, or not; it's up to you. You could in theory add two more index columns that skip weekends only, which would give you both options.
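Since the table itself is not shown, here is a rough Python sketch of how such a work-day index could be built and used. The column names follow the queries above (cal_date, is_workday, workday_index), but the exact rule is my assumption: the index increases by one on every work day, and a non-work day keeps the index of the most recent work day. The function names are hypothetical.

```python
from datetime import date, timedelta

def build_work_days(start, end, holidays=frozenset()):
    """Return {cal_date: (is_workday, workday_index)} for start..end inclusive."""
    table = {}
    index = 0
    d = start
    while d <= end:
        is_workday = d.weekday() < 5 and d not in holidays
        if is_workday:
            index += 1  # the index only moves on work days
        table[d] = (is_workday, index)
        d += timedelta(days=1)
    return table

def workday_delta(table, d1, d2):
    """Difference between two dates measured in work days."""
    return table[d2][1] - table[d1][1]

def add_work_days(table, d, n):
    """Find the work day whose index is n higher than that of d."""
    target = table[d][1] + n
    for cal_date, (is_workday, index) in table.items():
        if is_workday and index == target:
            return cal_date
    return None  # target lies outside the generated calendar

work_days = build_work_days(date(2018, 8, 1), date(2018, 10, 31))
print(add_work_days(work_days, date(2018, 8, 30), 20))                # 2018-09-27
print(workday_delta(work_days, date(2018, 8, 30), date(2018, 9, 27)))  # 20
```

The lookup in add_work_days mirrors the second query above: match on the index plus 20 and require is_workday so that a weekend sharing the same index is not returned.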
I am looking for some general advice rather than a solution. My problem is that I have a list of dates per person where due to administrative procedures, a person may have multiple records stored for this one instance, yet the date recorded is when the data was entered in as this person is passed through the paper trail. I understand this is quite difficult to explain so I'll give an example:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2000-01-01 B
1 2000-01-02 C
1 2003-04-01 A
1 2003-04-03 A
where I want to know how many valid records a person has by removing annoying audits that have recorded the date as the day the data was entered, rather than the date the person first arrives in the dataset. So for the above person I am only interested in:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2003-04-01 A
What makes this problem difficult is that I do not have the luxury of an audit column (the audit column here is just to show how the data is collected). I merely have dates. So one way I could crudely count real events (and remove repeated audit data) is to look at individual weeks within a person's history and, if one or more records exist for a given week, add 1 to my counter. This way, even though there are multiple records spread over a few days, I am only counting the succession of dates as one record (since, after all, I am counting by date).
So does anyone know of any db2 functions that could help me solve this problem?
If you can live with standard weeks it's pretty simple:
select
person, year(dt), week(dt), min(dt), min(audit)
from
blah
group by
person, year(dt), week(dt)
If you need seven-day ranges starting with the first date you'd need to generate your own week numbers, a calendar of sorts, e.g. like so:
with minmax(mindt, maxdt) as ( -- date range of the "calendar"
select min(dt), max(dt)
from blah
),
cal(dt,i) as ( -- fill the range with every date, count days
select mindt, 0
from minmax
union all
select dt+1 day , i+1
from cal
where dt < (select maxdt from minmax) and i < 100000
)
select
person, year(blah.dt), wk, min(blah.dt), min(audit)
from
(select dt, int(i/7)+1 as wk from cal) t -- generate week numbers
inner join
blah
on t.dt = blah.dt
group by person, year(blah.dt), wk
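To illustrate the bucketing idea, here is a small Python sketch that counts events per person using seven-day windows. Unlike the query above, which numbers weeks from the earliest date in the whole table, this sketch starts the windows at each person's own first date; that variation, the function name and the record layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

def count_events(records):
    """Count distinct seven-day windows with at least one record per person.

    `records` is a list of (person, date) tuples. Windows are anchored at
    each person's earliest date, so window 0 covers days 0..6 after it.
    """
    first = {}
    for person, d in records:
        if person not in first or d < first[person]:
            first[person] = d
    windows = defaultdict(set)
    for person, d in records:
        windows[person].add((d - first[person]).days // 7)
    return {person: len(w) for person, w in windows.items()}

records = [
    (1, date(2000, 1, 1)),
    (1, date(2000, 1, 1)),
    (1, date(2000, 1, 2)),
    (1, date(2003, 4, 1)),
    (1, date(2003, 4, 3)),
]

print(count_events(records))  # {1: 2}
```

As with any fixed window, two audit records that straddle a window boundary would still be counted as two events.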
I have basic knowledge of SQL and have a question:
I am trying to select data from a time series (date and windspeed). I want to select the original wind speed value if it lies between hours 7 and 21. If the hour is outside this range I would like to assign the wind speed to the previous wind speed at hour 21. There is also a concern that there is the occasional point where hour 21 does not exist and would like to assign the windspeed as hour 20... 19 etc until it finds the next available hour.
SELECT
date,
CASE WHEN DATEPART(HH,date) < 7 OR DATEPART(HH,date) > 21
THEN <WIND SPEED AT HOUR 21> ELSE <WIND SPEED> END AS ModifiedWindspeed
,WindSpeed, winddirection
from TerrainCorrectedHourlyWind w
This might make things clearer. If the hour is in the specified range, select windspeed. If not then select the wind speed from the prior day at 21 hours.
Though you've tagged the question mysql, I'm guessing this is actually SQL Server because of the DATEPART() function used. Try the following, which uses an OUTER APPLY to get your alternate value:
SELECT w.Date
     , CASE
         WHEN DATEPART(HOUR, w.Date) BETWEEN 7 AND 21 THEN w.WindSpeed
         ELSE m.WindSpeed
       END AS ModifiedWindSpeed
     , w.WindSpeed
     , w.WindDirection
FROM TerrainCorrectedHourlyWind AS w
OUTER APPLY (SELECT TOP 1 WindSpeed
             FROM TerrainCorrectedHourlyWind
             WHERE DATEPART(HOUR, Date) BETWEEN 7 AND 21
               AND Date < w.Date
             ORDER BY Date DESC) AS m;
Just to explain what this is doing--the OUTER APPLY will get the single most recent record (TOP 1 and ORDER BY Date DESC) for dates prior to the record in question (Date < w.Date) as well as within the hours specified. The CASE near the top chooses whether to use the current value or this alternate one based on the hour.
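If it helps to see the logic without SQL, here is a minimal Python sketch of the same rule: rows inside the 7 to 21 hour window keep their own value, and rows outside it take the most recent earlier in-window value, which also covers the case where hour 21 is missing. The function name and the list-of-tuples layout are assumptions for illustration.

```python
from datetime import datetime

def modified_wind_speeds(readings, first_hour=7, last_hour=21):
    """Return (timestamp, modified_speed) pairs.

    `readings` is a list of (timestamp, wind_speed) sorted by timestamp.
    Out-of-window rows get the latest earlier in-window speed, or None
    if there is no such reading yet (NULL in the SQL version).
    """
    result = []
    last_in_window = None
    for ts, speed in readings:
        if first_hour <= ts.hour <= last_hour:
            last_in_window = speed
            result.append((ts, speed))
        else:
            result.append((ts, last_in_window))
    return result

readings = [
    (datetime(2020, 1, 1, 20), 5.0),  # hour 21 is missing on this day
    (datetime(2020, 1, 1, 22), 9.0),
    (datetime(2020, 1, 2, 3), 8.0),
    (datetime(2020, 1, 2, 7), 4.0),
]

print([speed for _, speed in modified_wind_speeds(readings)])
# [5.0, 5.0, 5.0, 4.0]
```

A single pass is enough here because the input is sorted, whereas the OUTER APPLY performs one lookup per row.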