Daily Partitioning and archiving - sql

I'm trying to partition a table that holds several months of data, say Jan, Feb, Mar. The column I'll partition on is a datetime column, with values in ISO format ('20190101', '20190201', etc).
For example, I have sales data for Jan, Feb and Mar, and I'd like that data to be split into daily partitions ('20190101', '20190201', '20190301', etc).
For example:
Jan, Feb, Mar, etc. I also want to keep the data volume small, so I'd like to delete data day by day and keep at most one month of data. For example, I would create 31 daily partitions for Jan, 28 for Feb, 31 for Mar and 30 for Apr. How do I manage the partitions dynamically, given that some months have 31 days and others have 28 or 30? I need to retain only one month of data: if it's the 1st of Sep, I need to keep the 31 days of Aug data and can delete the data for the 31st of Jul; on the 2nd of Sep I can delete the data for the 1st of Aug. So I need to delete data daily and keep only about 30 days of data.
My question is: is it even possible? If it is, how can I automate the process using SSIS?

You may try this. Since you want to remove data older than 30 days relative to the current date (or any specific date), you can calculate the cutoff by subtracting 30 days from that date.
For 30 days:
--- Instead of getdate() you may use any other date value or date column as the reference
delete from yourtable where yourdate < DATEADD(day, -30, getdate()) ---- replace 30 with the number of days you want to keep
For 1 month:
--- Instead of getdate() you may use any other date value or date column as the reference
delete from yourtable where yourdate < DATEADD(month, -1, getdate()) ---- replace 1 with the number of months you want to keep
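The two cutoffs are not the same date. As a rough illustration of the arithmetic only (a Python sketch, not SQL; the helper names are hypothetical, and the month-end clamping is written to mirror how DATEADD(month, ...) falls back to the last day of a shorter month), on 1 Sep the 30-day cutoff lands on 2 Aug while the one-month cutoff lands on 1 Aug, which keeps all of August:

```python
import calendar
from datetime import date, timedelta


def minus_days(ref: date, days: int) -> date:
    """Cutoff that lies a fixed number of days before ref."""
    return ref - timedelta(days=days)


def minus_one_month(ref: date) -> date:
    """Same day in the previous month, clamped to that month's last day."""
    if ref.month > 1:
        year, month = ref.year, ref.month - 1
    else:
        year, month = ref.year - 1, 12
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(ref.day, last_day))


today = date(2019, 9, 1)
print(minus_days(today, 30))   # 2019-08-02
print(minus_one_month(today))  # 2019-08-01
```

Rows with a date strictly before the cutoff are the ones the delete would remove.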

Instead of having your partition boundaries be the end of the month, have them be the beginning. That is, do something like:
[2019-01-01, 2019-02-01),
[2019-02-01, 2019-03-01),
[2019-03-01, 2019-04-01),
[2019-04-01, 2019-05-01),
[2019-05-01, 2019-06-01),
etc
That is, the left-hand boundary is in the partition and the right-hand boundary isn't. If you're using actual Partitioning, you'd define your partition function as a RANGE RIGHT function, so that each boundary value belongs to the partition on its right. See the documentation for more details.
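To see why start-of-month boundaries sidestep the 28/30/31-day problem, here is a small Python sketch of looking up a date in half-open ranges (illustration only; `boundaries` and `partition_index` are hypothetical names, not part of any partitioning API). No month length appears anywhere in it:

```python
from bisect import bisect_right
from datetime import date

# Month-start boundaries; range i covers [boundaries[i], boundaries[i + 1]).
boundaries = [date(2019, m, 1) for m in range(1, 7)]


def partition_index(d: date) -> int:
    """Index of the half-open range containing d (-1 if d is before the first boundary)."""
    return bisect_right(boundaries, d) - 1


print(partition_index(date(2019, 1, 31)))  # 0
print(partition_index(date(2019, 2, 1)))   # 1
```

The last day of January and the first day of February fall into different ranges, and a date equal to a boundary always goes to the range that starts there.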

Related

Custom month numbers that take last 30 days instead of Number of month (SQL Server)

I am trying to create a lag function to return current month and last month streams for an artist.
Instead of returning streams for Feb vs Jan, I want the function to use the last 30 days as the current month, and the 30 days before that as the previous month.
The query that I am currently using is this:
SELECT
DATEPART(month, date) AS month,
artist,
SUM([Streams]) AS streams,
LAG(SUM([Streams])) OVER (PARTITION BY artist ORDER BY DATEPART(month, date)) AS previous_month_streams
FROM combined_artist
WHERE date > DATEADD(m, -2, DATEADD(DAY, 2 - DATEPART(WEEKDAY, GETDATE()-7), CAST(GETDATE()-7 AS DATE)))
GROUP BY DATEPART(month, date), artist;
While this works, it does not give me the data I need. It returns the sum of streams for February vs. the streams for January, and February looks very low because we only have one week's worth of data for February.
My goal is to get the last 30 days counting back from the max date in the table, using a lag function. So if the max date is Feb. 7 2023, I want the current month to include data from Jan. 7 2023 - Feb. 7 2023, and the previous month to include data from Dec. 7 2022 - Jan. 7 2023. I am thinking of creating a custom month number that starts from the max date and numbers each 30-day period (2 for Jan 7 - Feb 7, 1 for Dec 7 - Jan 7, ...). I am not sure how to go about this. This is in SQL Server and I am looking to use the lag function for performance reasons.
I think you could probably use something like datediff(d, date_you_care_about, max_date)/30 in your group by and partition by clauses.
The basic idea is that integer division rounds down, so if the difference between the dates is < 30, dividing it by 30 is 0. If the difference is >=30 but less than 60, dividing it by 30 is 1. And so forth.
You can see a proof of concept in this Fiddle.
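The bucketing arithmetic can be sketched outside SQL as well. This is a minimal Python illustration of the integer-division idea, assuming every date is on or before the max date; `period_bucket` is a hypothetical name:

```python
from datetime import date


def period_bucket(d: date, max_date: date, days: int = 30) -> int:
    """0 for the most recent `days`-day window, 1 for the one before it, and so on."""
    return (max_date - d).days // days


max_date = date(2023, 2, 7)
print(period_bucket(date(2023, 2, 1), max_date))   # 0
print(period_bucket(date(2023, 1, 8), max_date))   # 1
print(period_bucket(date(2022, 12, 1), max_date))  # 2
```

Note that bucket 0 covers differences of 0 to 29 days, so with a max date of 7 Feb it runs from 9 Jan through 7 Feb, and 8 Jan already belongs to bucket 1.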

After certain week of 2022 and continue with this new year

I would like some advice on how to write a WHERE condition that only matches rows after a certain week.
What I mean is:
I have dirty data before a specific week of 2022, so I wrote this:
DATEPART(WK, SA.FECHAE) >= 44
AND
YEAR(SA.FECHAE) >= 2022
But we're now in 2023, so I need to include the data from the new year in the query as well.
The query result only goes up to 12-31-2022, and I need everything from week 44 of 2022 through today.
...
WHERE (
DATEPART(WEEK, SA.FECHAE) >= 44
AND YEAR(SA.FECHAE) = 2022
)
OR (
YEAR(SA.FECHAE) >= 2023
)
In the OP's question, they ask how to add an additional date range to their WHERE clause. Adding this OR lets a second date range (in this case anything where the year is greater than or equal to 2023) match the predicate and be returned, without affecting the original condition.
Plain English definition of the amended where clause:
Week 44 of 2022, or any week of any year from 2023 forward.
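The logic of that predicate can be checked with a short Python sketch. It takes the year and week number as plain integers, on the assumption that they were already computed the same way the query computes them; both function names are hypothetical. It also shows that the condition is the same as comparing the (year, week) pair against (2022, 44):

```python
def keep_row(year: int, week: int) -> bool:
    """Week 44 or later of 2022, or any week of 2023 onward."""
    return (year == 2022 and week >= 44) or year >= 2023


def keep_row_pair(year: int, week: int) -> bool:
    """Same rule written as one lexicographic comparison."""
    return (year, week) >= (2022, 44)


print(keep_row(2022, 43))  # False
print(keep_row(2022, 44))  # True
print(keep_row(2023, 1))   # True
```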

SQL - Count month difference in non-consecutive date period

I am trying to work out how many months of membership a member has accumulated to date. As the picture shows, this member has had four years of subscription since 2018. However, she stopped the subscription for a year ending in 2019, and then restarted the membership in 2020.
Each membership lasts for 12 months. The last membership, starting on 2022-05-08, will end on 2023-05-08. However, I only want the total month count up to today (getdate() is 2022-09-14).
Please advise how I could approach this matter. Thanks!
Assuming you were planning to apply sum(monthcount), you could wrap monthcount in a CASE expression that checks whether vip_end is later than today's date:
sum(case when vip_end > getdate() then ... else monthcount end)
What you do within that ... depends on whether you want to just count the month boundaries crossed within the date range (e.g. 31st Jan -> 1st Feb is counted as a whole month because it only considers Jan -> Feb):
datediff(month, datecreated, getdate())
or perhaps calculate the number of months based on the average days in a month:
datediff(day, datecreated, getdate())*12.0/365.25
or maybe something else... it really depends on what level of detail you want to achieve.
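To compare the two options on the dates from the question, here is a rough Python sketch of the same arithmetic (illustration only; the function names are hypothetical, and the first function mimics the month-boundary counting described above rather than calling any SQL):

```python
from datetime import date


def month_boundaries_crossed(start: date, end: date) -> int:
    """Count of month boundaries between the two dates, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def average_months(start: date, end: date) -> float:
    """Elapsed days converted to months using an average year of 365.25 days."""
    return (end - start).days * 12.0 / 365.25


start, today = date(2022, 5, 8), date(2022, 9, 14)
print(month_boundaries_crossed(start, today))                         # 4
print(round(average_months(start, today), 2))                         # 4.24
print(month_boundaries_crossed(date(2022, 1, 31), date(2022, 2, 1)))  # 1
```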

Compare 2 weeks data from this year to 2 weeks data last year

Imagine a simple parameterized query:
SELECT * FROM table
WHERE date BETWEEN #DS_START_DATE and #DS_END_DATE
This is useful for seeing data for today, yesterday, one week or one year. Now I need to compare two weeks of data from this year with the same two weeks last year. For example, today is the 30th of August. I have to compare 16-29 Aug 2021 data to 16-29 Aug 2020 data. Can anyone help with this?
You can return both periods using OR:
SELECT t.*
FROM table t
WHERE t.date BETWEEN #DS_START_DATE and #DS_END_DATE OR
t.date BETWEEN DATE_ADD(#DS_START_DATE, INTERVAL -1 YEAR) AND DATE_ADD(#DS_END_DATE, INTERVAL -1 YEAR);
One caution is that the comparison periods might be off if they include leap days. Your question doesn't specify how to handle that.
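One possible way to handle the leap-day case, sketched in Python purely as an illustration: shift each endpoint back a year and fall back to 28 Feb when the source date is 29 Feb. The fallback rule and the name `one_year_earlier` are assumptions, since the question does not say how leap days should be treated:

```python
from datetime import date


def one_year_earlier(d: date) -> date:
    """Same month and day in the previous year; 29 Feb falls back to 28 Feb (assumed rule)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # 29 Feb has no counterpart in the previous year
        return d.replace(year=d.year - 1, day=28)


start, end = date(2021, 8, 16), date(2021, 8, 29)
print(one_year_earlier(start), one_year_earlier(end))  # 2020-08-16 2020-08-29
print(one_year_earlier(date(2020, 2, 29)))             # 2019-02-28
```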

Snowflake retrieving data of past month on a particular date of current month

I am new to Snowflake and my manager wants me to retrieve the previous month's data when it is the 5th of the current month. For example, if today is 5th April, then Snowflake should retrieve the data for the previous month, i.e. from 1st March 2021 to 31st March 2021, and similarly for all the other months.
He wants to update last month's data on the 5th of the following month because that is the day when we get the data.
I tried to use the DATEADD function, but it is more complicated than just using this function.
Thanks in advance!
PS: All rows for a given month are stored with the same date. For example, April 20th will be stored in the database as "2021-4-01", and April 25th will also be stored as "2021-4-01".
The day doesn't change in the database, just the month and year.
The prior-month window can be computed with DATE_TRUNC and DATEADD:
select
current_date as cd
,date_trunc('month', cd) as end_range
,dateadd('month', -1, end_range) as start_range
;
gives:
CD END_RANGE START_RANGE
2021-04-21 2021-04-01 2021-03-01
The other half of the question, only doing it on the 5th (if you have a task that runs daily, etc.), can be solved via
,day(current_date) = 5 as is_the_fifth
or, inline:
iff(day(current_date) = 5, <do true stuff>, <do false stuff>)
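The same window logic, including the January-to-December rollover, can be sketched in Python as a sanity check (illustration only, not Snowflake code; `prior_month_window` and `is_the_fifth` are hypothetical helpers that mirror the column names above). The end of the range is the first day of the current month and is meant to be exclusive:

```python
from datetime import date
from typing import Tuple


def prior_month_window(today: date) -> Tuple[date, date]:
    """Return (start_range, end_range) for the previous month; end_range is exclusive."""
    end_range = today.replace(day=1)
    if end_range.month == 1:
        start_range = end_range.replace(year=end_range.year - 1, month=12)
    else:
        start_range = end_range.replace(month=end_range.month - 1)
    return start_range, end_range


def is_the_fifth(today: date) -> bool:
    """True only when the task runs on the 5th of the month."""
    return today.day == 5


print(prior_month_window(date(2021, 4, 21)))  # (datetime.date(2021, 3, 1), datetime.date(2021, 4, 1))
print(is_the_fifth(date(2021, 4, 5)))         # True
```

Because every row in a month is stored with the first day of that month, filtering on the stored date being equal to start_range would select the whole prior month.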