Q > SQL Business days for specific month/region - sql

I am trying to write a sql query that would automate my reporting, I am pulling total renewal volume for month X and dividing it by the # of business days for that specific month (manual process for now). The issue that I am having and cannot seem to solve is that I report on two centers (Toronto & Montreal) and these two cities do not share the same holidays so depending on the month the # of business days will vary between cities.
The solution that I can think of is making a table within sql manually with a year or two ahead of the holidays for these 2 cities then sourcing the table, unless anyone else can suggest something better :).
Thanks in advance

Related

Group data by weeks since the start of event in sql

I’m a data analyst in the insurance industry and we currently have a program in SAS EG that tracks catastrophe development week by week since the start of the event for all of the catastrophic events that are reported.
(I.E week 1 is catastrophe start date + 7 days, week 2 would be end of week 1 + 7 days and so on) then all transaction amounts (dollars) for the specific catastrophes would be grouped into the respective weeks based on the date each transaction was made.
Problem that we’re faced with is we are moving away from SAS EG to GCP big query and the current process of calculating those weeks is a manually read in list which isn’t very efficient and not easily translated to BigQuery.
Curious if anybody has an idea that would allow me to calculate each week number in periods of 7 days since the start of an event in SQL or has an idea specific for BigQuery? There would be different start dates for each event.
It is complex, I know and I’m willing to give more explanation as needed. Open to any ideas for this as I haven’t been able to find anything.

Organizing per-week and daily data in SQL

Problem overview
I'm working on a simple app for reminding the user of weekly goals. Let's say the goal is to do 30 minutes of exercise on specific days of the week.
Sample goal: do exercise on Mon, Wed, Fri.
The app also needs to track past record, i.e. dates when the user did exercise. It could be just dates, e.g.: 2019-09-02, 2019-09-05, 2019-09-11 means the user did exercise on these days and did not on the others (doesn't need to be on "exercise goal" days of the week).
The goal can change in time. Let's say today is 2019-09-11 and the goal for this week ([2019-09-09, 2019-09-15]) is Mon, Wed, Fri but from 2019-08-05 to 2019-09-08 it was Mon, Thu (repeatedly for all these weeks).
I need to store these week-oriented goals and historic exercise of data and be able to retrieve the following:
The goal days for the current week (or any week, let's say I can compute start and end day for any week given a date).
Exercise history for a larger range of days together with goal days for that range (e.g. to show when the user was supposed to exercise and when they actually did in the last month).
Question
How to best store this data in SQL.
This is a little bit academic because I'm working on a small Android app and the data is just for a single user. So there will be little data and I can successfully use any approach, even a very clumsy one will be efficient enough.
However, I'd like to explore the topic and maybe learn a thing or two.
Possible solutions
Here are two approaches that come to my mind.
In both cases I would store exercise history as a table of dates. If there is an entry for that date it means the user did exercise on that day.
It's the goal storage that is interesting.
Approach 1
Store the goals per-week (it's SQLite so dates are stored as strings - all dates are just 'YEAR-MONTH-DAY'):
CREATE TABLE goals (
start_date TEXT,
exercise_days TEXT);
"start_date" is the first day of the week,
"exercise_days" is a comma-separated list of weekdays (let's say numbers 1-7).
So for the example above we might have two rows:
'2019-08-05', '1,4'
'2019-09-09', '1,3,5'
meaning that since 2019-08-05 the goal is Mon, Thu for all weeks until 2019-09-09, when the goal becomes Mon, Wed, Fri. So there is a gap in the data. I wouldn't want to generate data for weeks starting on 2019-08-12, 2019-08-19, 2019-08-26.
With this approach it is easy to work with the data week-wise. The current goal is the one with MAX('start_date'). The goal for a week for a given date is MAX('start_date') WHERE 'start_date' <= :date.
However it gets cumbersome when I want to get data for the last 3 months and show the user their progress.
Or maybe I want to show the user the percentage of actual exercise days to what they set as their goal in a year.
In this case it seems the best approach is to fetch the data separately and merge it in the application (or maybe write some complex queries), processing week by week. This is ok performance-wise because the amount of data is small and I rarely need more than a handful of weeks.
Approach 2
Store goals in such a way that each goal day is a record:
CREATE TABLE goals (
day TEXT,
);
"day" is a day when the user should exercise. So for the week starting 2019-09-09 (Mon, Wed, Fri) we would have:
'2019-09-09'
'2019-09-11'
'2019-09-13'
and for the week starting 2019-08-05 (Mon, Thu) we would have:
'2019-08-05'
'2019-08-09'
but what for the weeks in-between?
If my app could fill all the weeks in-between then it would be easy to merge this data with the exercise history and display days when the user was supposed to exercise and when they actually did. Extracting the goal for any given week would also be easy.
The problem is: this requires the app to generate data for the "gap" weeks even if the user doesn't tweak the goal. This can be implemented as a transaction that is run each time the app process starts. In some cases it could take noticeable time for occasional users of the app (think progress bar for a second).
Maybe there a smart way to generate the data in-between when making a SELECT query?
I don't like the fact that it requires generating data. I do like the fact that I can just join the tables and then process that (e.g. compute how many exercise days there were supposed to be in August and how many days the user did actually exercise and then show them percentage like "you did 85% of your goal" - in fact I can do this without joining the tables).
Also, it seems this approach gives me more flexibility for analysis in the future.
But is there a third way? Or maybe I am overthinking this? :)
(I am asking mostly for the way of organizing the data, there's no need for exact SQL queries)
Perhaps I'm over-thinking this, but if a goal can have multiple components to it, and can change over time I'd have a goal header record, with the ID, name and other data about the goal as a whole, and then a separate table linked with the components of that goal which are time-boxed, for example:
CREATE TABLE goal_days (goal_day_ID INT,
goal_ID INT,
day_ID INT,
target_minutes INT,
start_date TEXT,
end_date TEXT)
I'd have thought that allows you to easily check against the history to map against each day of the goal - e.g. they got 100% of the Mondays, but kept missing Thursday - however when the goal was changed to Friday instead they got better.

Add column of customer's past purchase total at time of current purchase and find rate of purchases that are from a returning customer - SQL

I am working with a table containing the purchase history for a shop. There is a purchase id, a date column and a customer id. I am trying (without much success so far) to do two things:
Add a column which for each purchase tells how many purchases the customer made before this (in the last month). I started by joining the table on itself but haven't got much further. I know I'll need to somehow filter the date so it only counts purchases before this date and not more than a month ago. Any suggestions on a simple way to tackle this?
The second thing I would like to see is what the weekly rate of returning customer transactions is. That is, what proportion of the purchases are by someone who purchased recently (in the last month). Ideally I would be able to graph this so from my sql queries I would like to end up with a date, weekly total (the 7 days up to the date) and weekly rate. I have been reading up on rolling windows and to be honest am having a bit of trouble getting my head around it. My SQL level is still quite low unfortunately. Any tips on a relatively simple way to do this would be much appreciated.
Thanks
I would need to see your data structure for the table(s) to better answer your question. But right off the top of my head is seems like you just need a simple SELECT COUNT.
So something like this would return all transactions from a single customer made in the past month:
SELECT COUNT(purchase_id)
FROM purchases
WHERE customer_id='some_customer_id'
AND date >= DATEADD(m, -1, GETDATE());
As for your second question you would probably want to setup a job (jenkins, ect..) that would run a query every month. The results of which you would plot. Checkout https://oss.oetiker.ch/rrdtool/ for graphing

How to store availability information in SQL, including recurring items

So I'm developing a database for an agency that manages many relief staff.
Relief workers set their availability for each day in one of three categories (day, evening, night).
We also need to be able to set some part-time relief workers as busy on weekly, biweekly, and in one instance, on a 9-week rotation. Since we're already developing recurring patterns of availability here, we might as well also give the relief workers the option of setting recurring availability days.
We also need to be able to query the database, and determine if an employee is available for a given day.
But here's the gotcha - we need to be able to use change data capture. So I'm not sure if calculating availability is the best option.
My SQL prototype table looks like this:
TABLE Availability Day
employee_id_fk | workday (DATETIME) | day | eve | night (all booleans)| worksite_code_fk (can be null)
I'm really struggling how to wrap my head around recurring events. I could create say, a years worth, of availability days following a pattern in 'x' day cycle. But how far ahead of time do we store information? I can see running into problems when we reach the end of the data set.
I was thinking of storing say, 6 months of information, then adding a server side task that runs monthly to keep the tables updated with 6 months of data, but my intuition is telling me this is a bad fix.
For absolutely flexibility in the future and keeping data from bloating my first thought would be something like
Calendar Dimension Table - Make it for like 100 years or Whatever you Want make it include day of week information etc.
Time Dimension Table - Hour, Minutes, every 15 what ever but only for 24 hour period
Shifts Table - 1 record per shift e.g. Day, Evening, and Night
Specific Availability Table - Relationship to Calendar & Time with Start & Stops recommend 1 record per day so even if they choose a range of 7 days split that to 1 record perday and 1 record per shift.
Recurring Availability Table - for day of week (1-7),Month,WeekOfYear, whatever you can think of. But again I am thinking 1 record per value so if they are available Mondays and Tuesday's that would be 2 rows. and if multiple shifts then it would be multiple rows.
Now and here is the perhaps the weird part, I would put a Available Column on the Specific and Recurring Availability Tables, maybe make it a tiny int and store something like 0 not available, 1 available, 2 maybe available, 3 available with notice.
If you want to take into account Availability with Notice you could add columns for that too such as x # of days. If you want full flexibility maybe that becomes a related table too.
The queries would be complex but you could use a stored procedure or a table valued function to handle it fairly routinely.

DAX sum different DateTime

I have a problem here, i would like to sum the work time from my employee based on the data (time2 - time 1) daily and here is my query:
Effective Minute Work Time = 24. * 60 * (LASTNONBLANK(time2,0) -FIRSTNONBLANK(time1,0))
It works daily, but if i drill up to weekly / monthly data it show the wrong sum as it shown below :
What i want is summary of minute between daily different times (time2-time1)
Thanks for your help :)
You have several approaches you can take: the hard way or the easier way :). The harder (at least for me :)) is to use DAX to do this. You would:
1) create a date table,
2) Use the DAX calculate function to evaluate your last non-blank and first non-blank values (you might need to use calculate table, but I'm not sure; DAX experts jump in). Then subtract one vs. the other.
This will give you correct values for a given day for a given person. You can enforce the latter condition by putting a 'has one value' guard on the person name so that your measure informs the report author if they're not using it right.
Doing the same for dates is a little trickier. In the example you show you are including the date in the row grouping. But if you change your mind and want instead to have 'total hours worked by person' or 'total hours worked by everyone' you're not done with modelling yet.
Your next step is to use calculate table in combination with calculate to create a measure that returns the total. You'll use calculate table so you evaluate each date and the hours worked on that date by person. Then you'll use calculate to summarize that all down to a single number. If you're not careful with your DAX (or report authoring) you might mix which person you're summarizing for so that your first/last non blank are not at the person level. It gets intense quickly.
Your easier solution, though it might be more limited in its application - depends really on your scenario - is to use the query to transform the data into a summary by day and person using the group by command. This will give you a row per person per day with their start and end times. Then you can quickly calculate the hours worked on that day. Then you can quite easily build visuals on top of the summary data. Of course you give up some of the flexibility of the having a proper data model. However if you have a date table, a person table, and your summary table and then setup your relationships correctly you can achieve answers to the most common questions.