Group data by weeks since the start of event in sql - sql

I’m a data analyst in the insurance industry and we currently have a program in SAS EG that tracks catastrophe development week by week since the start of the event for all of the catastrophic events that are reported.
(I.E week 1 is catastrophe start date + 7 days, week 2 would be end of week 1 + 7 days and so on) then all transaction amounts (dollars) for the specific catastrophes would be grouped into the respective weeks based on the date each transaction was made.
Problem that we’re faced with is we are moving away from SAS EG to GCP big query and the current process of calculating those weeks is a manually read in list which isn’t very efficient and not easily translated to BigQuery.
Curious if anybody has an idea that would allow me to calculate each week number in periods of 7 days since the start of an event in SQL or has an idea specific for BigQuery? There would be different start dates for each event.
It is complex, I know and I’m willing to give more explanation as needed. Open to any ideas for this as I haven’t been able to find anything.

Related

Organizing per-week and daily data in SQL

Problem overview
I'm working on a simple app for reminding the user of weekly goals. Let's say the goal is to do 30 minutes of exercise on specific days of the week.
Sample goal: do exercise on Mon, Wed, Fri.
The app also needs to track past record, i.e. dates when the user did exercise. It could be just dates, e.g.: 2019-09-02, 2019-09-05, 2019-09-11 means the user did exercise on these days and did not on the others (doesn't need to be on "exercise goal" days of the week).
The goal can change in time. Let's say today is 2019-09-11 and the goal for this week ([2019-09-09, 2019-09-15]) is Mon, Wed, Fri but from 2019-08-05 to 2019-09-08 it was Mon, Thu (repeatedly for all these weeks).
I need to store these week-oriented goals and historic exercise of data and be able to retrieve the following:
The goal days for the current week (or any week, let's say I can compute start and end day for any week given a date).
Exercise history for a larger range of days together with goal days for that range (e.g. to show when the user was supposed to exercise and when they actually did in the last month).
Question
How to best store this data in SQL.
This is a little bit academic because I'm working on a small Android app and the data is just for a single user. So there will be little data and I can successfully use any approach, even a very clumsy one will be efficient enough.
However, I'd like to explore the topic and maybe learn a thing or two.
Possible solutions
Here are two approaches that come to my mind.
In both cases I would store exercise history as a table of dates. If there is an entry for that date it means the user did exercise on that day.
It's the goal storage that is interesting.
Approach 1
Store the goals per-week (it's SQLite so dates are stored as strings - all dates are just 'YEAR-MONTH-DAY'):
CREATE TABLE goals (
start_date TEXT,
exercise_days TEXT);
"start_date" is the first day of the week,
"exercise_days" is a comma-separated list of weekdays (let's say numbers 1-7).
So for the example above we might have two rows:
'2019-08-05', '1,4'
'2019-09-09', '1,3,5'
meaning that since 2019-08-05 the goal is Mon, Thu for all weeks until 2019-09-09, when the goal becomes Mon, Wed, Fri. So there is a gap in the data. I wouldn't want to generate data for weeks starting on 2019-08-12, 2019-08-19, 2019-08-26.
With this approach it is easy to work with the data week-wise. The current goal is the one with MAX('start_date'). The goal for a week for a given date is MAX('start_date') WHERE 'start_date' <= :date.
However it gets cumbersome when I want to get data for the last 3 months and show the user their progress.
Or maybe I want to show the user the percentage of actual exercise days to what they set as their goal in a year.
In this case it seems the best approach is to fetch the data separately and merge it in the application (or maybe write some complex queries), processing week by week. This is ok performance-wise because the amount of data is small and I rarely need more than a handful of weeks.
Approach 2
Store goals in such a way that each goal day is a record:
CREATE TABLE goals (
day TEXT,
);
"day" is a day when the user should exercise. So for the week starting 2019-09-09 (Mon, Wed, Fri) we would have:
'2019-09-09'
'2019-09-11'
'2019-09-13'
and for the week starting 2019-08-05 (Mon, Thu) we would have:
'2019-08-05'
'2019-08-09'
but what for the weeks in-between?
If my app could fill all the weeks in-between then it would be easy to merge this data with the exercise history and display days when the user was supposed to exercise and when they actually did. Extracting the goal for any given week would also be easy.
The problem is: this requires the app to generate data for the "gap" weeks even if the user doesn't tweak the goal. This can be implemented as a transaction that is run each time the app process starts. In some cases it could take noticeable time for occasional users of the app (think progress bar for a second).
Maybe there a smart way to generate the data in-between when making a SELECT query?
I don't like the fact that it requires generating data. I do like the fact that I can just join the tables and then process that (e.g. compute how many exercise days there were supposed to be in August and how many days the user did actually exercise and then show them percentage like "you did 85% of your goal" - in fact I can do this without joining the tables).
Also, it seems this approach gives me more flexibility for analysis in the future.
But is there a third way? Or maybe I am overthinking this? :)
(I am asking mostly for the way of organizing the data, there's no need for exact SQL queries)
Perhaps I'm over-thinking this, but if a goal can have multiple components to it, and can change over time I'd have a goal header record, with the ID, name and other data about the goal as a whole, and then a separate table linked with the components of that goal which are time-boxed, for example:
CREATE TABLE goal_days (goal_day_ID INT,
goal_ID INT,
day_ID INT,
target_minutes INT,
start_date TEXT,
end_date TEXT)
I'd have thought that allows you to easily check against the history to map against each day of the goal - e.g. they got 100% of the Mondays, but kept missing Thursday - however when the goal was changed to Friday instead they got better.

Q > SQL Business days for specific month/region

I am trying to write a sql query that would automate my reporting, I am pulling total renewal volume for month X and dividing it by the # of business days for that specific month (manual process for now). The issue that I am having and cannot seem to solve is that I report on two centers (Toronto & Montreal) and these two cities do not share the same holidays so depending on the month the # of business days will vary between cities.
The solution that I can think of is making a table within sql manually with a year or two ahead of the holidays for these 2 cities then sourcing the table, unless anyone else can suggest something better :).
Thanks in advance

Bigquery - Table decorators changed weirdly

I used to have a number of queries running on the past 40 days of data using a decorator with [dataset.table#-4123456789-].
However, since September 15 all the decorators return maximum 10 days of data.
By the way [dataset.table#0] returns the whole table and not the past 7 days as told in the documentation.
Does anyone know what is going on. Do I have to move my table to partition in order to receive data for a limited period of time but more the a week?
Thanks

How to store availability information in SQL, including recurring items

So I'm developing a database for an agency that manages many relief staff.
Relief workers set their availability for each day in one of three categories (day, evening, night).
We also need to be able to set some part-time relief workers as busy on weekly, biweekly, and in one instance, on a 9-week rotation. Since we're already developing recurring patterns of availability here, we might as well also give the relief workers the option of setting recurring availability days.
We also need to be able to query the database, and determine if an employee is available for a given day.
But here's the gotcha - we need to be able to use change data capture. So I'm not sure if calculating availability is the best option.
My SQL prototype table looks like this:
TABLE Availability Day
employee_id_fk | workday (DATETIME) | day | eve | night (all booleans)| worksite_code_fk (can be null)
I'm really struggling how to wrap my head around recurring events. I could create say, a years worth, of availability days following a pattern in 'x' day cycle. But how far ahead of time do we store information? I can see running into problems when we reach the end of the data set.
I was thinking of storing say, 6 months of information, then adding a server side task that runs monthly to keep the tables updated with 6 months of data, but my intuition is telling me this is a bad fix.
For absolutely flexibility in the future and keeping data from bloating my first thought would be something like
Calendar Dimension Table - Make it for like 100 years or Whatever you Want make it include day of week information etc.
Time Dimension Table - Hour, Minutes, every 15 what ever but only for 24 hour period
Shifts Table - 1 record per shift e.g. Day, Evening, and Night
Specific Availability Table - Relationship to Calendar & Time with Start & Stops recommend 1 record per day so even if they choose a range of 7 days split that to 1 record perday and 1 record per shift.
Recurring Availability Table - for day of week (1-7),Month,WeekOfYear, whatever you can think of. But again I am thinking 1 record per value so if they are available Mondays and Tuesday's that would be 2 rows. and if multiple shifts then it would be multiple rows.
Now and here is the perhaps the weird part, I would put a Available Column on the Specific and Recurring Availability Tables, maybe make it a tiny int and store something like 0 not available, 1 available, 2 maybe available, 3 available with notice.
If you want to take into account Availability with Notice you could add columns for that too such as x # of days. If you want full flexibility maybe that becomes a related table too.
The queries would be complex but you could use a stored procedure or a table valued function to handle it fairly routinely.

how to model (mvc) an abstract time period for employe shift management

I've started thinking about an employee shift management application to handle the shifts (who works when, trading, etc) at my current workplace (that uses pen and paper and hasn't got anyway for us employees to communicate about changes without going through the boss and be on site).
Currently the shifts are modeled loosely as:
There is a recurring 4 week period (from Monday week 1 to Sunday week 4)
There is a template for placing employees in this 4 week period
Every 4 months (ie 3 times a year) the 4 week template is projected over the next 4 month period
The shifts have been the same for a long time and it seems many employees would prefer to have them changed (I can say this by the requests for change that come in every time a new 4 month is set).
What I'm aiming at are the models:
Shift_group_tpl (the 4 week period above)
Shift_tpl (a single shift in the 4 week period, including info on who defaults to work this shift)
Shift_group (a set period of time whit actual shifts)
Shift (a set shift whit a real time period and an employee - and the possibility to be changed both in start_time, end_time and employee)
I've thought of a way to do this with recurring iCalendar events: Creating RRULE's (without an endtime) and then calculate (using temporary start and end times) if that specific Shift_group_tpl could be used within a real Shift_group. (The problem with this approach is that I can't figure out how to trim the Shift_group_tpl's to fit into the start or end of a Shift_group.)
What I'm looking for are some other perspectives or ways of doing it or even just a pat on the shoulder letting me know that I'm on the right track (and then giving advice on the trimming problem).
/iole1
What I'm aiming at are the models:
Shift_group_tpl (the 4 week period above)
Shift_tpl (a single shift in the 4 week period, including info on who defaults to work this shift)
Shift_group (a set period of time whit actual shifts)
Shift (a set shift whit a real time period and an employee - and the possibility to be changed both in start_time, end_time and employee)
You have "sql" as a tag for this post? So im guessing you want these as SQL tables?
By the sounds, the problem is that your considering the data you have, rather than the abstract concepts you need to store that data. Which is what you'd need to do to create an application. (Most likely a "Shifts" table, rather than the four tables above).
There is little information here to help, Consider refining your thoughts and ask another question.