Populate null records scenario


I have a sql table the following columns:
FirstName, LastName, Points, StartTime
I have the data right now with the StartTime populated for the person with the highest points. StartTime is Null for everyone else.
I want to do the following in a stored procedure:
Populate StartTime at 30-minute intervals. Right now only one person has a StartTime. The next person in line by points gets a StartTime 30 minutes after the previous one. There are some conditions. The StartTime for the person with the highest points is currently set to a working day at 8:00AM. The next one should be 8:30AM and so on. The last time for a working day is 5:00PM; after that it should move to the next working day (skipping weekends) and continue assigning dates and times from 8:00AM to 5:00PM in 30-minute increments, working days only, until the StartTime is populated for every row.
So there should be 18 people with a StartTime on the same working day.
Any ideas?
Thank you

Because of the fairly simple repeating pattern, you can generate the start times with some simple maths. Assuming a working week of Monday to Friday, this SQL generates the available slot date/times:
declare @start_point datetime = '20180521 08:00' -- this MUST be a Monday date at 8am
declare @slot_duration int = 30 -- minutes
declare @slots_per_day int = 18
declare @slots_per_week int = @slots_per_day * 5
;with numbers as (select row_number() over (order by so1.object_id) - 1 as rn from sys.objects so1 cross join sys.objects so2)
select
    rn,
    rn / @slots_per_week as weeks, -- zero based number
    (rn % @slots_per_week) / @slots_per_day days, -- zero based number
    rn % @slots_per_day as slots, -- zero based number
    dateadd(minute, (rn % @slots_per_day) * @slot_duration,
        dateadd(day, (rn % @slots_per_week) / @slots_per_day,
            dateadd(week, rn / @slots_per_week, @start_point)
        )
    ) as StartTime
from
    numbers
order by rn
Now you can join to this data (subquery, temp table etc) to update your original table.
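To sanity-check the slot arithmetic outside the database, here is a minimal Python sketch of the same integer maths. The function name slot_start and the sample Monday are my own choices for illustration, not part of the answer above, and it assumes 18 slots per day starting from a Monday at 8:00.

```python
from datetime import datetime, timedelta

def slot_start(rn, start_point, slot_minutes=30, slots_per_day=18):
    """Map a zero-based row number to its slot, mirroring the SQL expressions."""
    slots_per_week = slots_per_day * 5            # Monday to Friday
    weeks = rn // slots_per_week                  # whole working weeks elapsed
    days = (rn % slots_per_week) // slots_per_day # working day within the week
    slots = rn % slots_per_day                    # slot within the day
    return start_point + timedelta(weeks=weeks, days=days,
                                   minutes=slots * slot_minutes)

# Hypothetical starting point: a Monday at 08:00.
monday = datetime(2018, 5, 21, 8, 0)

for rn in (0, 17, 18, 89, 90):
    print(rn, slot_start(rn, monday))
```

Row 17 lands on Monday 16:30, row 18 rolls over to Tuesday 08:00, and row 90 skips the weekend to the following Monday 08:00.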

Related

Between numerical values with no lower limit in Oracle SQL

My data looks something like this:
days  weight  start date  end date
180   1       01/01/2020  null
365   0.75    01/01/2020  null
I want to select the weight for the matching row: if the days value is 0-180 it should be row 1, if it is 181-365 it should be row 2, and if it is over 365 it should also be row 2. I have already found out that I can use the BETWEEN syntax for the dates.
My initial code tries to do this:
select weight from (select * from table where days >= #DAYS order by days ASC) where rownum =1
But if the value is larger than the last days entry, it returns nothing, so I then tried to introduce a maximum element, finding the maximum value and saying
>= #DAYS
or
>= MAX(#DAYS)
Is there a simpler way to do this?
Thanks.

select weight
from (select t.*, max(days) over () as max_day from table t) v
where days >= least(#DAY,max_day)
order by days asc
fetch first 1 row only
I'd suggest this option. When #DAY becomes larger than the largest days entry, we use max_day instead.
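The clamping idea is easy to check with a small Python sketch. The function name lookup_weight and the in-memory rows are assumptions for illustration only; the rows mirror the two sample entries from the question.

```python
def lookup_weight(rows, days):
    """Return the weight of the first threshold at or above `days`.

    `rows` is a list of (days, weight) pairs. Values beyond the largest
    threshold are clamped to it, which is the job least() does in the SQL.
    """
    max_day = max(d for d, _ in rows)
    target = min(days, max_day)                       # least(days, max_day)
    candidates = [(d, w) for d, w in rows if d >= target]
    return min(candidates, key=lambda r: r[0])[1]     # smallest matching threshold

rows = [(180, 1.0), (365, 0.75)]
for days in (100, 180, 181, 365, 400):
    print(days, lookup_weight(rows, days))
```

Without the clamp, an input of 400 would match no row at all, which is the empty result described in the question.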

Select max(weight) from table
Where days=(Select min(days) from table
Where days >= #DAYS)
The max() function is defensive in case your table has 2 entries with the same days number.

How to get six weeks data from a week column?

I have a legacy query in which I pull data for the past six weeks, as shown below. The AND condition below worked fine in the middle and at the end of 2020, but since 2021 started it has stopped working because of the subtraction of 6 from the week number.
AND data.week_col::integer BETWEEN DATE_PART(w, CURRENT_DATE) - 6 AND DATE_PART(w, CURRENT_DATE) - 1
There is a bug in above query because of which it stopped working in 2021. How can I change above condition so that it can work entire year without any issues and give me data for past 6 weeks.
Update
Below is my query which I am running:
select *,
dateadd(d, - datepart(dow, trunc(CONVERT_TIMEZONE('UTC','PST8PDT',client_date))), trunc(CONVERT_TIMEZONE('UTC','PST8PDT',client_date)) + 6) as day,
date_part(week, day) as week_col
from holder data
where data.week_col::integer BETWEEN DATE_PART(w, CURRENT_DATE) - 6 AND DATE_PART(w, CURRENT_DATE) - 1
The client_date column has values like 2021-01-15 21:30:00.0. From that I derive the day column, and from the day column I derive the week_col column as shown above.
week_col column has values like 53, 52 .... It's a week number in general.
Because of my AND condition I am getting data for week 1 only but technically I want data for 49, 50, 51, 52, 53 and 1 as it is past six weeks. Can I use day column here to get correct past six weeks?

Would this serve as a solution? I do not know much about Redshift syntax, but I read that it supports dateadd(). If you are normalizing client_date to a time-zone-converted day with no time, why not simply compare that to the current date converted to the same time zone?
WHERE
client_date BETWEEN
DATEADD(WEEK,-6,trunc(CONVERT_TIMEZONE('UTC','PST8PDT',CURRENT_DATE)))
AND
DATEADD(WEEK,-1,trunc(CONVERT_TIMEZONE('UTC','PST8PDT',CURRENT_DATE)))
If the above logic works out then you may want to convert the -6 and -1 week to variables, if that is supported.
Solution 2
This is a bit more verbose, but it involves virtualizing a calendar table and then joining your current date parameter into the calendar data as a marker. Finally, you can join your data against the calendar, which has been normalized into chronological weeks.
This is SQL Server syntax; however, I am certain it can be converted to Redshift.
DECLARE @D TABLE(client_date DATETIME)
INSERT @D VALUES
('11/20/2020'),('11/27/2020'),
('12/4/2020'),('12/11/2020'),('12/18/2020'),('12/25/2020'),
('01/8/2021'),('01/8/2021'),('1/15/2021'),('1/22/2021'),('1/29/2021')
DECLARE @Date DATETIME = '1/23/2021'
DECLARE @StartDate DATETIME = '01/01/2010'
DECLARE @NumberOfDays INT = 6000
;WITH R1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
R2(N) AS (SELECT 1 FROM R1 a, R1 b),
R3(N) AS (SELECT 1 FROM R2 a, R2 b),
Tally(Number) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM R3)
,WithTally AS
(
    SELECT CalendarDate = DATEADD(DAY,T.Number,@StartDate)
    FROM Tally T
    WHERE T.Number < @NumberOfDays
)
,Calendar AS
(
    SELECT
        CalendarDate,
        WeekIndex = DENSE_RANK() OVER(ORDER BY DATEPART(YEAR, CalendarDate), DATEPART(WEEK, CalendarDate))
    FROM
        WithTally
),
CalendarAlignedWithCurrentDateParamater AS
(
    SELECT *
    FROM
        Calendar
        CROSS JOIN (SELECT WeekIndexForToday=WeekIndex FROM Calendar WHERE Calendar.CalendarDate=@Date ) AS X
)
SELECT
    D.*,
    C.WeekIndex,
    C.WeekIndexForToday
FROM
    CalendarAlignedWithCurrentDateParamater C
    INNER JOIN @D D ON D.client_date = C.CalendarDate
WHERE
    C.WeekIndex BETWEEN C.WeekIndexForToday-6 AND C.WeekIndexForToday-1
OPTION (MAXRECURSION 0)
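To see why plain week-number subtraction breaks in January, and why comparing dates avoids it, here is a rough Python sketch. The helper names week_start and in_past_six_weeks are hypothetical, and it assumes weeks start on Monday, which may differ from how the day column in the question is aligned.

```python
from datetime import date, timedelta

def week_start(d):
    """Return the Monday of the week containing d (assumed week alignment)."""
    return d - timedelta(days=d.weekday())

def in_past_six_weeks(client_day, today):
    """True when client_day falls in one of the six full weeks before today's week."""
    this_week = week_start(today)
    return this_week - timedelta(weeks=6) <= client_day < this_week

today = date(2021, 1, 15)

# Week-number arithmetic goes negative at the start of a year.
print(today.isocalendar()[1] - 6)

# Comparing dates works across the year boundary.
print(in_past_six_weeks(date(2020, 12, 25), today))
```

The same comparison can be expressed in SQL by filtering on the day column against a date range instead of filtering on week_col.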

How to implement loops in SQL?

I am trying to calculate a KPI for each patient called "Initial prescription start date (IPST)".
The definition of IPST is: if the patient has a negative history of using that particular medication for 60 days before a start date, that start date is an IPST.
For example- See screen shot below, for patient with ID=101, I will start with IPST as 4/15/2019 , the difference in days between 4/15/2019 and 4/1/2019 is 14 <60 thus I will change my IPST to 4/1/2019.
Continuing with this iteration IPST for 101 is 3/17/2019 and 102 is 3/18/2018 as shown on the right hand side table.
I tried to build a UDF as below, where I am passing id of a patient and UDF is returning the IPST.
CREATE FUNCTION [Initial_Prescription_Date]
(
@id Uniqueidentifier
)
RETURNS date
AS
BEGIN
{
I am failing to implement this code here
}
I can get a list of Start_dates for a patient from a medication table like this
Select id, start_date from patient_medication
I will have to iterate through this list to get to the IPST for a patient.

I'll answer in order to start a dialog that we can work on.
The issue that I have is that the difference in days for ID = 102 between the last record and the one you've picked as the IPST is 29 days, but the IPST you've picked for 102 is 393 days, is that correct?
You don't need to loop to solve this problem. If you're comparing all of your dates only to your most recent, you can simply use MAX:
DECLARE @PatientRecords TABLE
(
    ID INTEGER,
    StartDate DATE,
    Medicine VARCHAR(100)
)
INSERT INTO @PatientRecords VALUES
(101,'20181201','XYZ'),
(101,'20190115','XYZ'),
(101,'20190317','XYZ'),
(101,'20190401','XYZ'),
(101,'20190415','XYZ'),
(102,'20190401','XYZ'),
(102,'20190415','XYZ'),
(102,'20190315','XYZ'),
(102,'20180318','XYZ');
With maxCTE AS
(
    SELECT *, DATEDIFF(DAY, StartDate, MAX(StartDate) OVER (PARTITION BY ID, MEDICINE ORDER BY StartDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) [IPSTDateDifference]
    FROM @PatientRecords
)
SELECT m.ID, m.Medicine, MIN(m.StartDate) [IPST]
FROM maxCTE m
WHERE [IPSTDateDifference] < 60
GROUP BY m.ID, m.Medicine
ORDER BY 1,3;
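For comparison, the iterative rule described in the question (step back through the start dates while each gap is under 60 days) can be sketched in Python. This is only an illustration of that reading of the definition; the function name is hypothetical, and treating a gap of exactly 60 days as a break is an assumption.

```python
from datetime import date

def initial_prescription_start(start_dates, gap_days=60):
    """Walk back from the latest start date while consecutive gaps are < gap_days."""
    dates = sorted(set(start_dates), reverse=True)
    ipst = dates[0]
    for earlier in dates[1:]:
        if (ipst - earlier).days >= gap_days:
            break              # at least gap_days without the medication
        ipst = earlier
    return ipst

# Start dates for patient 101, taken from the sample data above.
patient_101 = [date(2018, 12, 1), date(2019, 1, 15), date(2019, 3, 17),
               date(2019, 4, 1), date(2019, 4, 15)]
print(initial_prescription_start(patient_101))
```

For patient 101 this stops at 2019-03-17, since the previous start date (2019-01-15) is 61 days earlier. Unlike the MAX-based query, it compares each date to its neighbour rather than to the most recent date, so the two can differ on longer chains of prescriptions.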

Set week number based on first Monday of Year

I have a requirement to set the week number of a table from the first day of the year to the first Monday, then to the next Monday, and so on. I can easily get the first day and the first Monday of the year, but I do not know how to step through the table in 7-day intervals from the first Monday so that I can set the week number.
I have something like this:
UPDATE table
SET weeknumberofyear = @WeekNumber + 1
WHERE datefield = DATEADD(Day,7,(SELECT DATEADD(DAY, (@@DATEFIRST - DATEPART(WEEKDAY, @Date) + (8 - @@DATEFIRST) * 2) % 7, @Date)))

Since the datepart(week,datefield) function gets week number based on Sunday as the first day of the week, all you have to do is check datepart(weekday,datefield) and if it is 1 (Sunday) or 2 (Monday), subtract 1 from the datepart(week,datefield) function:
update table
set weeknumberofyear = datepart(week,datefield) -
case when datepart(weekday,datefield) in(1,2) then 1 else 0 end
EDIT This doesn't account for years in which Sunday or Monday is the first day of the year. In those cases, you would get 0 for weeknumberofyear. To fix this, perform a second update to your table. Even though this takes two updates, I still think it is more efficient than cycling through all the records.
update table
set weeknumberofyear = weeknumberofyear + 1
where year(datefield) in(
    select distinct year(datefield)
    from table
    where weeknumberofyear = 0
)
EDIT WeekNumberOfTheMonth Update - Now that we have the weeknumberofyear value, we can use a ranking function on that field to update the weeknumberofthemonth column without any recursion.
update t
set t.weeknumberofthemonth = u.weeknumberofthemonth
from table t
inner join (
    select distinct weeknumberofyear,
    dense_rank() over(partition by month(datefield)
    order by weeknumberofyear) weeknumberofthemonth
    from table ) u
on u.weeknumberofyear = t.weeknumberofyear
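One way to pin down the intended numbering before writing the UPDATE is to compute it for a few dates by hand. The Python sketch below assumes one reading of the question: week 1 runs from January 1 up to the day before the first Monday, and each Monday after that starts a new week. The function name is hypothetical.

```python
from datetime import date, timedelta

def monday_week_number(d):
    """Week number where every Monday starts a new week (assumed rule)."""
    jan1 = date(d.year, 1, 1)
    # weekday() is 0 for Monday, so this is 0 days when Jan 1 is a Monday.
    first_monday = jan1 + timedelta(days=(7 - jan1.weekday()) % 7)
    if d < first_monday:
        return 1
    # If the year starts on a Monday there is no partial first week.
    offset = 1 if first_monday == jan1 else 2
    return (d - first_monday).days // 7 + offset

for d in (date(2021, 1, 3), date(2021, 1, 4), date(2021, 1, 11), date(2018, 1, 1)):
    print(d, monday_week_number(d))
```

Comparing these values with the output of the UPDATE statements for the same dates is a quick way to check the edge cases around the first days of the year.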

I'm not sure what you mean by all the talk about "Monday", but if you're looking to get the week number from a date, you can do something like this:
UPDATE table
SET weeknumberofyear = DATEPART(wk, datefield)

Count occurrences of combinations of columns

I have daily time series (actually business days) for different companies and I work with PostgreSQL. There is also an indicator variable (called flag) taking the value 0 most of the time, and 1 on some rare event days. If the indicator variable takes the value 1 for a company, I want to further investigate the entries from two days before to one day after that event for the corresponding company. Let me refer to that as [-2,1] window with the event day being day 0.
I am using the following query
CREATE TABLE test AS
WITH cte AS (
SELECT *
, MAX(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN 1 preceding AND 2 following) Lead1
FROM mytable)
SELECT *
FROM cte
WHERE Lead1 = 1
ORDER BY day,company
The query takes the entries ranging from 2 days before the event to one day after the event, for the company experiencing the event.
The query does that for all events.
This is a small section of the resulting table.
day company flag
2012-01-23 A 0
2012-01-24 A 0
2012-01-25 A 1
2012-01-25 B 0
2012-01-26 A 0
2012-01-26 B 0
2012-01-27 B 1
2012-01-30 B 0
2013-01-10 A 0
2013-01-11 A 0
2013-01-14 A 1
Now I want to do further calculations for every [-2,1] window separately. So I need a variable that allows me to identify each [-2,1] window. The idea is that I count the number of windows for every company with the variable "occur", so that in further calculations I can use the clause
GROUP BY company, occur
Therefore my desired output looks like that:
day company flag occur
2012-01-23 A 0 1
2012-01-24 A 0 1
2012-01-25 A 1 1
2012-01-25 B 0 1
2012-01-26 A 0 1
2012-01-26 B 0 1
2012-01-27 B 1 1
2012-01-30 B 0 1
2013-01-10 A 0 2
2013-01-11 A 0 2
2013-01-14 A 1 2
In the example, the company B only occurs once (occur = 1). But the company A occurs two times. For the first time from 2012-01-23 to 2012-01-26. And for the second time from 2013-01-10 to 2013-01-14. The second time range of company A does not consist of all four days surrounding the event day (-2,-1,0,1) since the company leaves the dataset before the end of that time range.
As I said, I am working with business days. I don't care about holidays; I have data from Monday to Friday. Earlier I wrote the following function:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$BODY$
WITH alldates AS (
SELECT i,
$1 + (i * CASE WHEN $2 < 0 THEN -1 ELSE 1 END) AS date
FROM generate_series(0,(ABS($2) + 5)*2) i
),
days AS (
SELECT i, date, EXTRACT('dow' FROM date) AS dow
FROM alldates
),
businessdays AS (
SELECT i, date, d.dow FROM days d
WHERE d.dow BETWEEN 1 AND 5
ORDER BY i
)
-- adding business days to a date --
SELECT date FROM businessdays WHERE
CASE WHEN $2 > 0 THEN date >=$1 WHEN $2 < 0
THEN date <=$1 ELSE date =$1 END
LIMIT 1
offset ABS($2)
$BODY$
LANGUAGE 'sql' VOLATILE;
It can add/subtract business days from a given date and works like this:
select * from addbusinessdays('2013-01-14',-2)
delivers the result 2013-01-10. So in Jakub's approach we can change the second and third last line to
w.day BETWEEN addbusinessdays(t1.day, -2) AND addbusinessdays(t1.day, 1)
and can deal with the business days.

Function
While using the function addbusinessdays(), consider this instead:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$func$
SELECT day
FROM (
SELECT i, $1 + i * sign($2)::int AS day
FROM generate_series(0, ((abs($2) * 7) / 5) + 3) i
) sub
WHERE EXTRACT(ISODOW FROM day) < 6 -- truncate weekend
ORDER BY i
OFFSET abs($2)
LIMIT 1
$func$ LANGUAGE sql IMMUTABLE;
Major points
Never quote the language name sql. It's an identifier, not a string.
Why was the function VOLATILE? Make it IMMUTABLE for better performance in repeated use and more options (like using it in a functional index).
(ABS($2) + 5)*2) is way too much padding. Replace with ((abs($2) * 7) / 5) + 3).
Multiple levels of CTEs were useless cruft.
ORDER BY in last CTE was useless, too.
As mentioned in my previous answer, extract(ISODOW FROM ...) is more convenient to truncate weekends.
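As a cross-check of the padding and offset logic, here is a small Python sketch that follows the same steps as the rewritten function: generate a padded run of calendar days, drop weekends, then pick the entry at position abs(n). The name add_business_days is my own, and holidays are ignored just as in the SQL.

```python
from datetime import date, timedelta

def add_business_days(start, n):
    """Mirror of the SQL: padded day series, weekends removed, offset abs(n)."""
    step = (n > 0) - (n < 0)                 # sign(n): -1, 0 or 1
    horizon = abs(n) * 7 // 5 + 3            # same padding as the SQL
    candidates = [start + timedelta(days=i * step) for i in range(horizon + 1)]
    weekdays = [d for d in candidates if d.isoweekday() < 6]  # ISODOW < 6
    # Like OFFSET ... LIMIT 1, yield nothing when the offset runs off the end.
    return weekdays[abs(n)] if len(weekdays) > abs(n) else None

print(add_business_days(date(2013, 1, 14), -2))
print(add_business_days(date(2013, 1, 11), 1))
```

Two business days before Monday 2013-01-14 is Thursday 2013-01-10, matching the example in the question, and one business day after Friday 2013-01-11 is Monday 2013-01-14.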
Query
That said, I wouldn't use above function for this query at all. Build a complete grid of relevant days once instead of calculating the range of days for every single row.
Based on this assertion in a comment (should be in the question, really!):
two subsequent windows of the same firm can never overlap.
WITH range AS ( -- only with flag
SELECT company
, min(day) - 2 AS r_start
, max(day) + 1 AS r_stop
FROM tbl t
WHERE flag <> 0
GROUP BY 1
)
, grid AS (
SELECT company, day::date
FROM range r
,generate_series(r.r_start, r.r_stop, interval '1d') d(day)
WHERE extract('ISODOW' FROM d.day) < 6
)
SELECT *, sum(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING
AND 2 following) AS window_nr
FROM (
SELECT t.*, max(t.flag) OVER(PARTITION BY g.company ORDER BY g.day
ROWS BETWEEN 1 preceding
AND 2 following) in_window
FROM grid g
LEFT JOIN tbl t USING (company, day)
) sub
WHERE in_window > 0 -- only rows in [-2,1] window
AND day IS NOT NULL -- exclude missing days in [-2,1] window
ORDER BY company, day;
How?
Build a grid of all business days: CTE grid.
To keep the grid to its smallest possible size, extract minimum and maximum (plus buffer) day per company: CTE range.
LEFT JOIN actual rows to it. Now the frames for ensuing window functions work with static numbers.
To get distinct numbers per flag and company (window_nr), just count flags from the start of the grid (taking buffers into account).
Only keep days inside your [-2,1] windows (in_window > 0).
Only keep days with actual rows in the table.
Voilà.
SQL Fiddle.

Basically, the strategy is to first enumerate the flag days and then join the other rows to them:
WITH windows AS(
SELECT t1.day
,t1.company
,rank() OVER (PARTITION BY company ORDER BY day) as rank
FROM table1 t1
WHERE flag =1)
SELECT t1.day
,t1.company
,t1.flag
,w.rank
FROM table1 AS t1
JOIN windows AS w
ON
t1.company = w.company
AND
w.day BETWEEN
t1.day - interval '2 day' AND t1.day + interval '1 day'
ORDER BY t1.day, t1.company;
Fiddle.
However there is a problem with work days as those can mean whatever (do holidays count?).
