T-SQL query for getting the sum of active datasets per day if I have start and end date for every dataset - sql

I have the following SQL Server database table with a starting and end date for every dataset:
Task        StartDate   EndDate
FirstTask   2022-12-02  2022-12-06
SecondTask  2022-12-03  2022-12-06
ThirdTask   2022-12-06  2022-12-07
Now I am looking for a query which gives me for every date between the lowest start and the highest end date the number of active tasks for every day:
Day         NumberOfActiveTasks
2022-12-02  1
2022-12-03  2
2022-12-04  2
2022-12-05  2
2022-12-06  3
2022-12-07  1
Can someone please point me in the right direction? I guess with the standard SQL functions I can not do this :-(

As I mentioned in the comment, use your Calendar Table. If you don't have one, invest in one. There are hundreds (possibly thousands) of articles out there on how to both create and populate one, so I'm not going to cover that here, and every person's or business's calendar table tends to be a "little" different for bespoke needs.
Once you have your Calendar table, you can just JOIN to it and then aggregate on the calendar date:
SELECT CT.CalendarDate,
COUNT(*) AS ActiveTasks
FROM (VALUES('FirstTask','2022-12-02','2022-12-06'),
('SecondTask','2022-12-03','2022-12-06'),
('ThirdTask','2022-12-06','2022-12-07'))V(Task,StartDate,EndDate)
JOIN dbo.CalendarTable CT ON CT.CalendarDate BETWEEN V.StartDate AND V.EndDate
GROUP BY CT.CalendarDate
ORDER BY CT.CalendarDate;
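To try the join above without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The Tasks table name and the December 2022 calendar range are assumptions made for the demo, and the calendar is populated from Python rather than from a real dbo.CalendarTable.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tasks (Task TEXT, StartDate TEXT, EndDate TEXT)")
conn.executemany(
    "INSERT INTO Tasks VALUES (?, ?, ?)",
    [
        ("FirstTask", "2022-12-02", "2022-12-06"),
        ("SecondTask", "2022-12-03", "2022-12-06"),
        ("ThirdTask", "2022-12-06", "2022-12-07"),
    ],
)

# Assumed stand-in for a calendar table: one row per day of December 2022.
conn.execute("CREATE TABLE CalendarTable (CalendarDate TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO CalendarTable VALUES (?)",
    [((date(2022, 12, 1) + timedelta(days=i)).isoformat(),) for i in range(31)],
)

# Same shape as the T-SQL query: join each task to every calendar day it covers,
# then count tasks per day. ISO date strings compare correctly as text.
rows = conn.execute(
    """
    SELECT CT.CalendarDate, COUNT(*) AS ActiveTasks
    FROM Tasks T
    JOIN CalendarTable CT
      ON CT.CalendarDate BETWEEN T.StartDate AND T.EndDate
    GROUP BY CT.CalendarDate
    ORDER BY CT.CalendarDate
    """
).fetchall()

for day, active in rows:
    print(day, active)
```

Because this is an inner join, days with no active task do not appear at all. If you need them as zero rows, a LEFT JOIN from the calendar table with COUNT(T.Task) would be one way to get that.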

Related

How to get the set of dates between date_from and date_to

I have a table:
good_id  from_date   to_date
1        2021-10-01  2021-10-03
I want to get a data table like this:
good_id  all_date
1        2021-10-01
1        2021-10-02
1        2021-10-03
I tried using a CROSS JOIN with an all_date table containing all the dates in October, but it didn't work. Do you have any ideas for this problem?
Actually, the solution for your problem typically is done in the other direction. We usually start with a calendar table looking like:
dates (dt)
----------
2021-10-01
2021-10-02
2021-10-03
And then left join this table to your table containing the date ranges, e.g.
SELECT t.good_id, d.dt AS all_date
FROM dates d
LEFT JOIN yourTable t
ON d.dt BETWEEN t.from_date AND t.to_date;
Note that SQL usually is not so good at generating new data. Mainly, it is used for extracting or altering data which already exists. Using a calendar table as shown above is a standard way of handling your problem. In practice, you might include more dates to cover whatever data you expect in your table.
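As a rough illustration of the calendar-table join, here is a sketch in Python with the built-in sqlite3 module. The table names goods and dates are placeholders for this demo, and the calendar is assumed to cover October 2021. It uses an inner join so that only dates inside a range are returned.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goods (good_id INTEGER, from_date TEXT, to_date TEXT)")
conn.execute("INSERT INTO goods VALUES (1, '2021-10-01', '2021-10-03')")

# Assumed calendar table: every day of October 2021.
conn.execute("CREATE TABLE dates (dt TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO dates VALUES (?)",
    [((date(2021, 10, 1) + timedelta(days=i)).isoformat(),) for i in range(31)],
)

# One output row per (good, calendar day) where the day falls inside the range.
expanded = conn.execute(
    """
    SELECT g.good_id, d.dt AS all_date
    FROM dates d
    JOIN goods g
      ON d.dt BETWEEN g.from_date AND g.to_date
    ORDER BY g.good_id, d.dt
    """
).fetchall()

for good_id, all_date in expanded:
    print(good_id, all_date)
```

With a LEFT JOIN instead, the remaining October dates would also be returned, with good_id as NULL.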

Cross apply historical date range in BigQuery

I have a growing table of orders which looks something like this:
units_sold  timestamp
1           2021-03-02 10:00:00
2           2021-03-02 11:00:00
4           2021-03-02 12:00:00
3           2021-03-03 13:00:00
9           2021-03-03 14:00:00
I am trying to partition the table into each day, and gather statistics on units sold on the day, and on the day before. I can pretty easily get the units sold today and yesterday for just today, but I need to cross apply a date range for every date in my orders table.
The expected result would look like this:
units_sold_yesterday  units_sold_today  date_measured
12                    7                 2021-03-02
NULL                  12                2021-03-03
One way of doing it is to create a new table and append the order data to it every day. However, this table could grow very large, and I need historical data as well.
In my mind's eye, I know I have to cascade the data so that BigQuery compares it to "today's date", which would shift across all the dates in the table.
I'm thinking this shift could come from a cross apply of all the distinct dates in the table, so I would get a copy of the orders table for each date, but with a different "today's date" column. I could then derive units_sold_today by date-diffing the sale date against that column.
This would still create a massive amount of data to process, though, and I guess there may be a simple function for this in BigQuery or standard SQL syntax.
This sounds like aggregation and lag():
select timestamp_trunc(timestamp, day) as date_measured, sum(units_sold) as sold_today,
lag(sum(units_sold)) over (order by min(timestamp)) as sold_yesterday
from t
group by 1
order by 1;
Note: This assumes that you have data for every day.
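If some days have no orders, lag() returns the previous row rather than the previous calendar day. One possible way around that is to aggregate per day and then left-join each day to the day before it. The sketch below runs that idea in SQLite through Python's sqlite3 module, not in BigQuery. The orders table name, the ts column name, and the extra 2021-03-05 row are assumptions added only to show a gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (units_sold INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2021-03-02 10:00:00"),
        (2, "2021-03-02 11:00:00"),
        (4, "2021-03-02 12:00:00"),
        (3, "2021-03-03 13:00:00"),
        (9, "2021-03-03 14:00:00"),
        (5, "2021-03-05 09:00:00"),  # hypothetical row: no orders on 2021-03-04
    ],
)

# Aggregate once per day, then look up the previous calendar day explicitly.
daily = conn.execute(
    """
    WITH daily AS (
        SELECT date(ts) AS date_measured,
               SUM(units_sold) AS units_sold_today
        FROM orders
        GROUP BY date(ts)
    )
    SELECT cur.date_measured,
           cur.units_sold_today,
           prev.units_sold_today AS units_sold_yesterday
    FROM daily cur
    LEFT JOIN daily prev
      ON prev.date_measured = date(cur.date_measured, '-1 day')
    ORDER BY cur.date_measured
    """
).fetchall()

for row in daily:
    print(row)
```

Here 2021-03-05 gets NULL (None in Python) for units_sold_yesterday, because nothing was sold on 2021-03-04, whereas lag() would have reported the 2021-03-03 total.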
Consider below
select date_measured, units_sold_today,
lag(units_sold_today) over(order by date_measured) units_sold_yesterday,
from (
select date(timestamp) date_measured,
sum(units_sold) units_sold_today
from `project.dataset.table`
group by date_measured
)
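Applied to the sample data in your question, the output is one row per day: 2021-03-02 with units_sold_today = 7 and units_sold_yesterday = NULL, and 2021-03-03 with units_sold_today = 12 and units_sold_yesterday = 7.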

Time between date. (More advanced than just Datediff)

I have a table that contains Guest_ID and Trip_Date. I have been tasked with trying to find out for each Guest_ID how many times they have had over 365 days between trips. I know that for the time between the dates I can use datediff formula but I am unsure of how to get the dates plugged in properly. I think if I can get help with this part I can do the rest.
For each time this happened I need to report back Guest_ID, Prior_Last_Trip, New_Trip, days between. This data goes back for over a decade so it is possible for a Guest to have multiple periods of over a year between visits.
I was thinking of just loading a table with that data that can be queried later. That way, once I figure out how to make this work the first time, I can set up a stored procedure or trigger to check for new occurrences of this and populate the table.
I was not sure where to begin on this code. I was thinking recursion might be the answer, but I do not know recursion, just that it exists.
This table is quite large. Around 1.5 million unique Guest_ID's with over 30 million trips.
I am using SQL Server 2012. If there is anything else I can add to help this let me know. I will edit and update this as I have ideas on how to make this work myself.
Edit 1: Sample Data and Desired Results
Guest_ID Trip_Date
1 1/1/2013
1 2/5/2013
1 12/5/2013
1 1/1/2015
1 6/5/2015
1 8/1/2017
1 10/2/2017
1 1/6/2018
1 6/7/2018
1 7/1/2018
1 7/5/2018
2 1/1/2018
2 2/6/2018
2 4/2/2018
2 7/3/2018
3 1/1/2014
3 6/5/2014
3 9/4/2014
Guest_ID Prior_Last_Trip New_Trip DaysBetween
1 12/5/2013 1/1/2015 392
1 6/5/2015 8/1/2017 788
So you can see that Guest 1 had 2 different times where they did not have a trip for over a year, and that those two instances are recorded in the results. Guest 2 never had a gap of over a year and therefore has no records in the results. Guest 3 has not had a trip in over a year, but without a return trip they currently do not qualify for the result set. Should Guest 3 ever make another trip, they would then be added to the result set.
Edit 2: Working Query
Thanks to @Code4ml I got this working. Here is the complete query.
Select
Guest_ID, CurrentTrip, DaysBetween, Lasttrip
From (
Select
Guest_ID
,Lag(Trip_Date,1) Over(Partition by Guest_ID Order by Trip_Date) as LastTrip
,Trip_Date as CurrentTrip
,DATEDIFF(d,Lag(Trip_Date,1) Over(Partition by Guest_ID Order by Trip_Date),Trip_Date) as DaysBetween
From UCS
) as A
Where DaysBetween > 365
You may try the SQL LAG function to access the previous trip date, like below. Note the ascending ORDER BY: with a descending order, LAG would return the next trip instead.
SELECT guest_id, trip_date,
LAG (trip_date,1) OVER (PARTITION BY guest_id ORDER BY trip_date) AS prev_trip_date
FROM tripsTable
Now you can use this as a subquery to calculate the number of days between trips and filter the data as required.
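The LAG-as-subquery idea can be checked against the sample data with a small sketch like the one below. It runs in SQLite through Python's sqlite3 module, so julianday() stands in for DATEDIFF, and window functions need SQLite 3.25 or newer. The trips table name is a placeholder, and the sample dates are assumed to be month/day/year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (guest_id INTEGER, trip_date TEXT)")

# Sample data from the question, converted to ISO dates (assuming M/D/YYYY input).
sample = {
    1: ["2013-01-01", "2013-02-05", "2013-12-05", "2015-01-01", "2015-06-05",
        "2017-08-01", "2017-10-02", "2018-01-06", "2018-06-07", "2018-07-01",
        "2018-07-05"],
    2: ["2018-01-01", "2018-02-06", "2018-04-02", "2018-07-03"],
    3: ["2014-01-01", "2014-06-05", "2014-09-04"],
}
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(guest, d) for guest, dates in sample.items() for d in dates],
)

# Inner query pairs each trip with the guest's previous trip;
# outer query keeps only pairs more than 365 days apart.
gaps = conn.execute(
    """
    SELECT guest_id, prior_last_trip, new_trip,
           CAST(julianday(new_trip) - julianday(prior_last_trip) AS INTEGER)
               AS days_between
    FROM (
        SELECT guest_id,
               LAG(trip_date) OVER (PARTITION BY guest_id ORDER BY trip_date)
                   AS prior_last_trip,
               trip_date AS new_trip
        FROM trips
    ) AS a
    WHERE julianday(new_trip) - julianday(prior_last_trip) > 365
    ORDER BY guest_id, new_trip
    """
).fetchall()

for row in gaps:
    print(row)
```

The first trip of each guest has no previous row, so its gap is NULL and the WHERE clause drops it without any special handling.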

Get latest cumulative sales amount for various evaluation dates in SAS

I have a list of evaluation dates stored in a table, datelist. It's technically two columns, start_date and end_date, for each evaluation period. The end_date will definitely need to be used, but the start_date may not. I only care about periods that are completed, so, for example, the period from 2016-01-01 to 2016-07-01 is in progress but not complete. So, it's not in the table.
start_date end_date
2012-01-01 2012-07-01
2012-07-01 2013-01-01
2013-01-01 2013-07-01
2013-07-01 2014-01-01
2014-01-01 2014-07-01
2014-07-01 2015-01-01
2015-01-01 2015-07-01
2015-07-01 2016-01-01
I have a separate table that lists cumulative sales by customer, sales_table with three columns, customer_ID, cumul_sales, transaction_date. For example, let's say customer 4793 bought $100 worth of stuff on 2/14/2014 and $200 worth of stuff on 3/30/2014 and $75 on 7/27/2014, the table will have the following rows:
customer_ID cumul_sales transaction_date
4793 100 2014-02-14
4793 300 2014-03-30
4793 375 2014-07-27
Now, for each evaluation date and for each customer, I want to know what's the cumulative sales as of the evaluation date for that customer? If a customer hadn't purchased anything by an evaluation date, then I wouldn't want a row for that customer at all corresponding to said evaluation date. This would be stored in a new table, called sales_by_eval, with columns customer_ID, cumul_sales, eval_date. For the example customer above, I'd have the following rows:
customer_ID cumul_sales eval_date
4793 300 2014-07-01
4793 375 2015-01-01
4793 375 2015-07-01
4793 375 2016-01-01
I can do this, but I'm looking to do it in an efficient way so I don't have to read through the data once for each evaluation date. If there are a lot of rows in the sales_table and 40 evaluation dates, that would be a large waste to read through the data 40 times, once for each evaluation date. Would it be possible with only one read through the data, for example?
The basic idea of the current process is a macro loop that loops once per evaluation period. Each loop has a data step that creates a new table (one table per loop) to check each transaction to see if it has occurred before or on the end_date of that corresponding evaluation period. That is, each table has all the transactions that occur before or on that evaluation date but none of the ones that occur after. Then, a later data step uses "last." to get only the last transaction for each customer before that evaluation date. And, finally, all the various tables created are put back together in another data step where they are all listed in the SET statement.
This is in SAS, so anything SAS can do, including SQL and macros, is fine with me.
In SAS PROC SQL, when you use a GROUP BY clause, you can still select columns that are neither grouped nor aggregated, like this:
proc sql;
create table sales_by_eval as
select s.customer_ID, s.cumul_sales, d.end_date as eval_date
from datelist d
join sales_table s
on d.end_date >= s.transaction_date
group by s.customer_ID, d.end_date
having max(s.transaction_date) = s.transaction_date
;
quit;
This means that for each combination of the selected variables, SAS returns records with the measures summarized within the defined group. To limit the result to the latest transaction value, use a HAVING condition that keeps only the records whose transaction_date equals max(transaction_date) within the s.customer_ID, d.end_date group.
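To sanity-check the expected output outside SAS, the same "latest transaction on or before each evaluation date" logic can be sketched in plain Python. This is not the PROC SQL approach above: it sorts each customer's transactions once and then uses a binary search per evaluation date, and it treats a transaction on the evaluation date itself as included. All variable names here are made up for the illustration.

```python
from bisect import bisect_right
from collections import defaultdict

eval_dates = [
    "2012-07-01", "2013-01-01", "2013-07-01", "2014-01-01",
    "2014-07-01", "2015-01-01", "2015-07-01", "2016-01-01",
]

# (customer_ID, cumul_sales, transaction_date) rows from the question.
sales_table = [
    (4793, 100, "2014-02-14"),
    (4793, 300, "2014-03-30"),
    (4793, 375, "2014-07-27"),
]

# Group transactions per customer, ordered by date (ISO strings sort correctly).
history = defaultdict(list)
for customer, cumul, tdate in sorted(sales_table, key=lambda r: (r[0], r[2])):
    history[customer].append((tdate, cumul))

sales_by_eval = []
for customer, rows in history.items():
    dates = [tdate for tdate, _ in rows]
    for eval_date in eval_dates:
        # Number of transactions on or before eval_date.
        n = bisect_right(dates, eval_date)
        if n:  # skip customers with no purchase yet
            sales_by_eval.append((customer, rows[n - 1][1], eval_date))

for row in sales_by_eval:
    print(row)
```

The sales data is read once, and each evaluation date costs only a logarithmic lookup per customer, which is the kind of saving the question asks about.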

Join to Calendar Table - 5 Business Days

So this is somewhat of a common question on here, but I haven't found an answer that really suits my specific needs. I have 2 tables. One has a list of ProjectClosedDates. The other is a calendar table that runs through about 2025 and has one column that flags whether the row date is a weekend day and another that flags whether it is a holiday.
My end goal is to find out based on the ProjectClosedDate, what date is 5 business days post that date. My idea was that I was going to use the Calendar table and join it to itself so I could then insert a column into the calendar table that was 5 Business days away from the row-date. Then I was going to join the Project table to that table based on ProjectClosedDate = RowDate.
If I was just going to check the actual business-date table for one record, I could use this:
SELECT actual_date from
(
SELECT actual_date, ROW_NUMBER() OVER(ORDER BY actual_date) AS Row
FROM DateTable
WHERE is_holiday= 0 and actual_date > '2013-12-01'
-- no ORDER BY here: SQL Server rejects it inside a derived table without TOP
) X
WHERE row = 65
from here:
sql working days holidays
However, this is just one date and I need a column of dates based off of each row. Any thoughts of what the best way to do this would be? I'm using SQL-Server Management Studio.
Completely untested and not thought through:
If the concept of "business days" is common and important in your system, you could add a column "Business Day Sequence" to your table. The column would be a simple unique sequence, incremented by one for every business day and null for every day not counting as a business day.
The data would look something like this:
Date BDAY_SEQ
========== ========
2014-03-03 1
2014-03-04 2
2014-03-05 3
2014-03-06 4
2014-03-07 5
2014-03-08
2014-03-09
2014-03-10 6
Now it's a simple task to find the Nth business day from any date.
You simply do a self join with the calendar table, adding the offset in the join condition.
select a.actual_date
,b.actual_date as nth_bussines_day
from DateTable a
join DateTable b on(
b.bday_seq = a.bday_seq + 5
);
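Since the answer is marked as untested, here is a small sketch that exercises the idea in SQLite through Python's sqlite3 module. It assumes business days are simply Monday to Friday (no holidays), covers only a few weeks of March 2014, and uses a made-up Projects table with two sample closed dates.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DateTable (actual_date TEXT PRIMARY KEY, bday_seq INTEGER)"
)

# Populate the calendar: business days get a running sequence, weekends get NULL.
calendar_rows = []
seq = 0
day = date(2014, 3, 3)
while day <= date(2014, 3, 21):
    if day.weekday() < 5:  # Monday=0 .. Friday=4
        seq += 1
        calendar_rows.append((day.isoformat(), seq))
    else:
        calendar_rows.append((day.isoformat(), None))
    day += timedelta(days=1)
conn.executemany("INSERT INTO DateTable VALUES (?, ?)", calendar_rows)

# Hypothetical project table with two closed dates.
conn.execute("CREATE TABLE Projects (ProjectClosedDate TEXT)")
conn.executemany(
    "INSERT INTO Projects VALUES (?)", [("2014-03-03",), ("2014-03-07",)]
)

# Look up the closed date's sequence number, then find the row 5 steps later.
due = conn.execute(
    """
    SELECT p.ProjectClosedDate, b.actual_date AS five_business_days_later
    FROM Projects p
    JOIN DateTable a ON a.actual_date = p.ProjectClosedDate
    JOIN DateTable b ON b.bday_seq = a.bday_seq + 5
    ORDER BY p.ProjectClosedDate
    """
).fetchall()

for row in due:
    print(row)
```

One thing to watch: a project closed on a weekend or holiday has a NULL bday_seq, so this join returns no row for it. If that can happen, you would need a rule for which business day such dates should count from.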