How to use pd.Grouper with an offset? - pandas

I have a DataFrame df with a DatetimeIndex spanning multiple years. I want to group the data so that each group starts on the 16th of a month and ends on the 15th of the next month. How do I do this?
I tried df.groupby(pd.Grouper(freq="MS", offset="16D")), but the offset doesn't seem to have any effect: the groups still start on the 1st of each month.
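One possible workaround, offered as a sketch rather than a confirmed answer: as far as I can tell, the offset argument of pd.Grouper only affects fixed (tick-like) frequencies such as "D" or "H", which would explain why it does nothing with "MS". An alternative is to shift every timestamp back by 15 days and group by the calendar month of the shifted value, so that the 16th maps to the 1st and the 15th of the next month maps to the last days of the same month. The sample data, the value column, and the helper name period_from_16th are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: one row per day, each with value 1.
idx = pd.date_range("2021-01-01", "2021-03-31", freq="D")
df = pd.DataFrame({"value": 1}, index=idx)


def period_from_16th(index):
    """Map each timestamp to the month in which its 16th-to-15th window starts."""
    # Shifting back 15 days moves the 16th onto the 1st, and the 15th of the
    # following month onto one of the last days of the starting month.
    return (index - pd.Timedelta(days=15)).to_period("M")


# Group by the shifted monthly period instead of pd.Grouper(freq="MS").
result = df.groupby(period_from_16th(df.index))["value"].sum()
print(result)
```

Each group is labelled by the month in which it starts, so the label 2021-01 covers 16 Jan 2021 through 15 Feb 2021. The first and last groups are partial because the sample data starts and ends mid-window.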

Related

Use existing datetime data to create Week tags and week numbers

I have the following code, which creates a range of dates starting from 1st Jan 2021 and running until 15th May 2022.
import pandas as pd
# 500 consecutive days starting on 2021-01-01 (named dates to avoid shadowing the built-in range)
dates = pd.date_range('2021-01-01', periods=500, freq='D')
df = pd.DataFrame({'Date': dates})
df.head()
I would like to use the existing datetime data to create another column that tags each row with its week, where weeks are counted from every Thursday to Friday. Each such span counts as one week.
For example, the column should show the day, month and year range that each row of the 'Date' column falls into, like:
10.03.21 to 18.03.21, for 10th March to 18th March (repeated for all rows whose date falls in that Thursday-to-Friday range).
Basically, each week starts on a Thursday and ends on the Friday of the following week.
For each completed week, another column should indicate the count, e.g. Week 1, Week 2, etc.
How can this be achieved with pandas datetime functions, so that the extra columns for week categorisation are produced easily?
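A rough sketch of one way to approach this. A span from Thursday to the Friday of the following week overlaps the next span, so the sketch assumes non-overlapping seven-day weeks that start on Thursday and end on the following Wednesday; that is an assumption, not something the question states. The column names "Week Tag" and "Week" are also made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.date_range("2021-01-01", periods=500, freq="D")})

# dayofweek is 0 for Monday, so Thursday is 3.
# days_since_thu is how many days each date lies after the most recent Thursday.
days_since_thu = (df["Date"].dt.dayofweek - 3) % 7
week_start = df["Date"] - pd.to_timedelta(days_since_thu, unit="D")
week_end = week_start + pd.Timedelta(days=6)  # assumed 7-day week, Thursday to Wednesday

# Tag such as "31.12.20 to 06.01.21", repeated for every row in that week.
df["Week Tag"] = (
    week_start.dt.strftime("%d.%m.%y") + " to " + week_end.dt.strftime("%d.%m.%y")
)

# Running week counter, starting at 1 for the week containing the first date.
week_no = (week_start - week_start.min()).dt.days // 7 + 1
df["Week"] = "Week " + week_no.astype(str)

print(df.head(8))
```

If the weeks really should run through the following Friday, week_end can be moved accordingly, but the rows would then need a rule for which of the two overlapping weeks they belong to.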

SQL calculate number of days in month excluding weekends and holidays days

I have approximately the same table (excluding count column). I want to calculate the number of working days (Mon-Fri) and exclude public holidays.
I tried the following query:
SELECT count(distinct(date)) from MYDB where dummy <> 1
However, it only gives the total number of days, including weekends. Additionally, this query counts distinct dates, but my dates do not cover a full month, so different logic is needed. Could you help me figure out which query to use?
There should be a function in Vertica that extracts the weekday from a date, so to exclude weekends you'll need to add another condition like
extract(dow from date) not in (6,0)
(6 is Sat, 0 is Sun in this case)
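To cross-check a result from a query like this outside the database, the same count can be reproduced in Python. This is only a sketch: it uses NumPy's business-day functions rather than Vertica, the month is picked arbitrarily, and the holiday list is a hypothetical placeholder for your own holiday table.

```python
import numpy as np

# Hypothetical public holidays; replace with the dates from your own table.
holidays = ["2021-05-31"]

# Count Mon-Fri days in May 2021 (the end date is exclusive), skipping holidays.
working_days = np.busday_count("2021-05-01", "2021-06-01", holidays=holidays)

# The same month without the holiday list, for comparison.
weekdays_only = np.busday_count("2021-05-01", "2021-06-01")

print(working_days, weekdays_only)
```

Because the count is derived from the calendar rather than from the rows present in the table, it does not depend on every date of the month appearing in the data.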

BigQuery partition on calendar week

Every week, I get a new dataset that I need to insert in BigQuery. The data can arrive on any day of the week. Once the data is ingested, I want to query data that arrived last week.
One option is to use date as partitioning when the data arrived but then the developers would need to know the exact date when data arrived to query the partition.
Instead, during ingestion, I want to create an INTEGER column that represents the calendar week of the year. The format will be 202005 or 202153, where the former represents the fifth week of 2020 and the latter represents the second last week of 2021.
Since this is an integer, the only option for partition seems to be range partitioning. For it, BigQuery is asking for a start, end and interval. What values should I define?
I can define the following, but as you can imagine, this seems wrong:
start 202001
end 203054
interval 1
Update:
It seems that BigQuery will only create partitions for which it has data. I checked that by executing
#legacySQL
SELECT
project_id, dataset_id, table_id, partition_id, TIMESTAMP(creation_time/1000) AS creation_time
FROM [PROJECT_ID:DATASET_ID.TABLE_ID$__PARTITIONS_SUMMARY__]
Another option would be to still partition by date, but instead of the ingestion date (or whatever date you have in mind), use the start date of the respective week, with the help of the DATE_TRUNC function:
DATE_TRUNC(your_date, WEEK)
Note: You can even define the start day of the week:
WEEK(<WEEKDAY>): Truncates date_expression to the preceding week boundary, where weeks begin on WEEKDAY. Valid values for WEEKDAY are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.

BigQuery, Sum by week

I am using standard SQL and am trying to add the weekly sum of product usage.
Using the code below, I was able to add to each row the week and year it falls into. How would I go about summing the totals for an item by week and outputting them in columns, say for up to the last 8 weeks?
extract(week from Metrics_Date) as week, EXTRACT(YEAR FROM Metrics_Date) AS year
The image shows my raw data with the week and year next to each item:
This image shows the raw data above after further analysis (grouping the rows together). Here is where I would want to add columns for the current week, the date of the first day of the week, and a sum of that week's totals.
Any help would be appreciated.
You don't need the extract(), by the way. You can use DATE_TRUNC(your_date, WEEK) to truncate the date to the week, which is usually easier.
Also, because the result of the truncation is a date, you already have the first day of the week.
I believe you have the rest figured out already, but just in case:
SELECT DATE_TRUNC(your_date_field, WEEK) AS week, SUM(message_count) AS total_messages FROM your_table GROUP BY 1

How do I summarise month to date forecast for current month in Power Pivot?

I have a table in a data model that has forecast figures for the next 3 months. What I want to do is to show what the forecast number for the current month to date is.
When I use the DATESMTD function like this:
=CALCULATE(SUM(InternetSales_USD[SalesAmount_USD]),DATESMTD(DateTime[DateKey]))
I get the last month of my data summarised as a total. I assume that is because the DATESMTD function takes the last date in the column, which is 3 months away.
How do I make sure I get the current month's MTD total rather than the one at the end of the calendar? The formula should be clever enough to realise that I am in May and want the May MTD, not the August MTD.
Any ideas?
One way to do this is:
Forecast_Transaction_MTD:=CALCULATE(sum('ATO Online'[2017 Transaction Forecast]), DATESINPERIOD('ATO Online'[Current Year],TODAY(),-day(TODAY()),day))
The -day(TODAY()) argument gets the day number of the current date and goes back that many days from today. So, if today is 25 May, day(TODAY()) extracts the day (25), and going back that far from the current date gets me to 1 May.
The rest of the formula just sums the values for those dates.