Changing the year of a slice/series obtained from a pandas dataframe - pandas

I have a large dataset: the imported schedule (spanning multiple years) for my team. I cleaned the data (reshaped it from wide to long); however, I have run into a problem.
First an explanation of the data:
'year' and 'period' are obtained by splitting the sheet name. Both are strings.
'week': the week of the year, obtained from the roster. Float.
'date': converted from a string by a function I wrote, since the dates were in Dutch and needed to be normalized; no year was defined, so the year from the first column is used. After processing it is in datetime format.
'shift': the type of shift it belongs to. S1 = early, S2 = late, S3 = night.
Each row is assigned to one of my employees; those names are erased for privacy reasons.
I've written a class with several methods that apply rules our government enforces on schedules.
Now my problem:
As you can see: entries 1137 and 1138 should belong to the year 2022. But how do I change this easily? I tried:
for week, date in prepocessed_data_merged[['week', 'date']].values:
    # There are always more than 52 weeks in a year.
    # If the month of the date in week 52 is 1 (Jan), then something is wrong.
    if (week == 52) & (date.month == 1):
        prepocessed_data_merged.loc[(prepocessed_data_merged['week'] == week)
                                    & (prepocessed_data_merged['date']), 'date'] = ???
But, as you might expect, this returns a Series, since there are three shifts per day, so there are three entries of each date that need their year changed. So how does one change the year of a selected series/slice while simultaneously changing it in the dataframe?
I know I can use dt.replace(year=current_year + 1), but how do I apply this replacement to the selected series in the preprocessed data DataFrame? Thanks in advance!

Have you tried:
cond = prepocessed_data_merged['week'].eq(52) & prepocessed_data_merged['date'].dt.month.eq(1)
prepocessed_data_merged.loc[cond, 'date'] += pd.DateOffset(years=1)
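To make that concrete, here is a minimal, self-contained sketch of the approach; the column names follow the question, but the sample rows are made up for illustration:
import pandas as pd

# Hypothetical rows mimicking the question's layout: week 52 dates that were
# parsed with the wrong year and therefore fall in January.
prepocessed_data_merged = pd.DataFrame({
    'week': [52.0, 52.0, 52.0, 1.0],
    'date': pd.to_datetime(['2021-01-01', '2021-01-01', '2021-01-01', '2022-01-03']),
    'shift': ['S1', 'S2', 'S3', 'S1'],
})

# Week 52 dates should never fall in January; those rows belong to the next year.
cond = prepocessed_data_merged['week'].eq(52) & prepocessed_data_merged['date'].dt.month.eq(1)

# Adding a one-year DateOffset shifts only the year and writes straight back into
# the dataframe via .loc, which is the part dt.replace on an extracted series lacks.
prepocessed_data_merged.loc[cond, 'date'] += pd.DateOffset(years=1)

print(prepocessed_data_merged)
The dt.replace route from the question also works if you assign the result back through the same .loc selection, e.g. .apply(lambda d: d.replace(year=d.year + 1)), but the DateOffset version stays vectorised.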

Related

Need column comprised of data from date two weeks ago for comparison

Let me start by saying that I am somewhat new to SQL/Snowflake and have been putting together queries for roughly 2 months. Some of my query language may not be ideal and I fully understand if there's a better, more efficient way to execute this query. Any and all input is appreciated. Also, this particular query is being developed in Snowflake.
My current query is pulling customer volumes by department and date based on a 45 day window with a 24 day lookback from current date and a 21 day look forward based on scheduled appointments. Each date is grouped based on where it falls within that 45 day window: current week (today through next 7 days), Week 1 (forward-looking days 8-14), and Week 2 (forward-looking days 15-21). I have been working to try and build out a comparison column that, for any date that lands within either the Week 1 or Week 2 group, will pull in prior period volumes from either 14 days prior (Week 1) or 21 days prior (Week 2) but am getting nowhere. Is there a best-practice for this type of column? Generic example of the current output is attached. Please note that the 'Prior Wk' column in the sample output was manually populated in an effort to illustrate the way this column should ideally work.
I have tried several different iterations of count(case...) similar to that listed below; however, the 'Prior Wk' column returns the count of encounters/scheduled encounters for the same day rather than those that occurred 14 or 21 days ago.
Count(Case When datediff(dd, SCHED_DTTM, getdate())
               between -21 and -7 then 1 else null end
     ) as "Prior Wk"
I've tried to use an IFF statement as shown below, but no values return.
(IFF(ENCOUNTER_DATE > dateadd(dd, 8, getdate()),
     count(case when ENC_STATUS in ('Phone', 'InPerson') AND
                     datediff(dd, ENCOUNTER_Date, getdate()) between 7 and 14 then 1
                else null end), '0')
) as "Prior Wk"
Also have attempted creating and using a temporary table (example included) but have not managed to successfully pull information from the temp table that didn't completely disrupt my encounter/scheduled counts. Please note for this approach I've only focused on the 14 day group and have not begun to look at the 21 day/Week 2 group. My attempt to use the temp table to resolve the problem centered around the following clause (temp table alias: "Date1"):
CASE when AHS.GL_Number = "DATEVISIT1"."GL_NUMBER" AND
          datevisit1.lookback14 = dateadd(dd, 14, PE.CONTACT_Date)
     then "DATEVISIT1"."ENC_Count"
     else null end
     as "Prior Wk"
I am extremely appreciative of any insight on the current best practices around pulling prior period data into a column alongside current period data. Any misuse of terminology on my part is not deliberate.
I'm struggling to understand your requirement, but it sounds like you need to use window functions (https://docs.snowflake.com/en/sql-reference/functions-analytic.html), in this case likely a SUM window function. The LAG window function (https://docs.snowflake.com/en/sql-reference/functions/lag.html) might also be of some help.

Calculating mean of a specific column by specific rows

I have a dataframe that looks like the one in the pictures.
Now, I want to add a new column that will show the average of power for each day (the data is sampled every 5 minutes), but separately for day and night (day = 0 in the day_or_night column, night = 1). I've gotten this far:
train['avg_by_day'][train['day_or_night']==1] = train['power'][train['day_or_night']==1].mean()
train['avg_by_day'][train['day_or_night']==0] = train['power'][train['day_or_night']==0].mean()
but this just adds the average of all the power values that correspond to day (or, similarly, night), which isn't what I'm after: a separate average for each individual day and night.
I need something like: train['avg_by_day'] == train.power.mean() when day == 1 and day_or_night == 1, and this for each day.
So you want to group the dataframe by day and day_or_night and create a new column with mean power values for each group:
train['avg_by_day'] = train.groupby(['day', 'day_or_night'])['power'] \
                           .transform('mean')
Maybe you should also include year and month in the grouping columns because otherwise it's going to group the 1st day of every month together, same for the 2nd day and so on.
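To illustrate how transform behaves here (the values below are made up, and I'm assuming the frame already has a numeric day column as in the groupby above):
import pandas as pd

# Made-up samples: two calendar days, each with day (0) and night (1) readings.
train = pd.DataFrame({
    'day':          [1, 1, 1, 2, 2, 2],
    'day_or_night': [0, 0, 1, 0, 1, 1],
    'power':        [10.0, 30.0, 5.0, 40.0, 7.0, 9.0],
})

# transform('mean') returns one value per row, aligned with the original index,
# so every row receives the mean power of its own (day, day_or_night) group.
train['avg_by_day'] = train.groupby(['day', 'day_or_night'])['power'].transform('mean')

print(train)
# The day-1 daytime rows get 20.0, the day-2 night rows get 8.0, and so on.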

Creating a DAX pattern that counts days between a date field and a month value on a chart's x-axis

I am struggling with a DAX pattern to allow me to plot an average duration value on a chart.
Here is the problem: My dataset has a field called dtOpened which is a date value describing when something started, and I want to be able to calculate the duration in days since that date.
I then want to be able to create an average duration since that date over a time period.
It is very easy to do when thinking about the value as it is now, but I want to be able to show a chart that describes what that average value would have been over various time periods on the x-axis (month/quarter/year).
The problem that I am facing is that if I create a calculated column to find the current age (NOW() - [dtOpened]), then it always uses the NOW() function - which is no use for historic time spans. Maybe I need a Measure for this, rather than a calculated column, but I cannot work out how to do it.
I have thought about using LASTDATE (rather than NOW) to work out what the last date would be in the filter context of any single month/quarter/year, but if the current month is only half way through, then it would probably need to consider today's date as the value from which to subtract the dtOpened value.
I would appreciate any help or pointers that you can give me!
It looks like you have a table (let's call it Cases) storing your cases with one record per case with fields like the following:
casename, dtOpened, OpenClosedFlag
You should create a date table with one record per day spanning your date range. The date table should have a month-ending date field identifying the last day of the month (and the same for quarter and year). But this will be a disconnected date table: don't create a relationship between the Date on the date table and your case open date.
Then use the iterative AVERAGEX function to average the date differences.
Average Duration (days) :=
CALCULATE (
    AVERAGEX ( Cases, MAX ( DateTable[Month Ending] ) - Cases[dtOpened] ),
    FILTER ( Cases, Cases[OpenClosedFlag] = "Open" ),
    FILTER ( Cases, Cases[dtOpened] <= MAX ( DateTable[Month Ending] ) )
)
Once you plot the measure against your Month you should see the average values represented correctly. You can do something similar for quarter & year.
You're a genius, Rory; Thanks.
In my example, I had a dtClosed field rather than an Opened/Closed flag, so there was one extra piece of filtering to do to test if the Case was closed at that point in time. So my measure ended up looking like this:
Average Duration := CALCULATE (
    AVERAGEX ( CasesOnly, MAX ( DT[LastDateM] ) - CasesOnly[Owner Opened dtOnly] ),
    FILTER ( CasesOnly, OR ( ISBLANK ( CasesOnly[Owner Resolution dtOnly] ),
                             CasesOnly[Owner Resolution dtOnly] > MAX ( DT[LastDateM] ) ) ),
    FILTER ( CasesOnly, CasesOnly[Owner Opened dtOnly] <= MAX ( DT[LastDateM] ) )
)
And to get the chart, I plotted the DT[Date] field on the x-axis.
Thanks very much again.

Need NetSuite search formula to display employee time records by projects (in rows) and day of week (in columns)

I'm trying to create a NetSuite Time search that emulates the chart style display on an employee's weekly time record, with projects listed in rows and days of the week listed in columns, with totals by day and by project. The goal is to have a search auto filtered by "Last Week" that can be used with a drop down selector filter for employees. I know there are better ways, but this is a very specific demand from someone above who believes the NS time record is a "query" and wants it to act like one.
I'm good with NS searches but know next to nothing about coding. I tried some basic sum formulas using CASE WHEN but am having two issues:
1) Can't figure out how to get CASE WHEN to sort by the weekday output from DAY of the {date} and subsequently total the hours.
2) Not sure how to total hh:mm formatted time in searches, and can't figure out what the system name of the "Duration (Decimal)" field is.
Just need one line of a sum formula to total time data from one day of the week, and a way to solve the hh:mm issue and I am good to go from there.
CASE WHEN to_char({date}, 'D') LIKE 1 THEN {durationdecimal} ELSE 0 END
SUN = 1, MON = 2, etc.

DAX formula to calculate a Sum between 2 dates

I have a couple of tables in PowerPivot:
A Stock table - WKRelStrength whose fields are:
Ticker, Date, StockvsMarket% (values are percentages), RS+- (values can be 0 or 1)
A Calendar Table - Cal with a Date field.
There is a many to one relationship between the tables.
I am trying to aggregate RS+- against each row for dates from 3 months ago to the date for that row, i.e. a 3-month-to-date sum. I have tried numerous calculations, but the best I can return is a circular reference error. Here is my formula:
=calculate(sum([RS+-]),DATESINPERIOD(Cal[Date],LASTDATE(Cal[Date]),-3,Month))
Here is the xlsx file.
I couldn't download the file but what you are after is what Rob Collie calls the 'Greatest Formula in the World' (GFITW). This is untested but try:
= CALCULATE (
    SUM ( WKRelStrength[RS+-] ),
    FILTER (
        ALL ( Cal ),
        Cal[Date] <= MAX ( Cal[Date] )
            && Cal[Date] >= MAX ( Cal[Date] ) - 90
    )
)
Note, this will give you the previous 90 days, which is approximately 3 months. Getting exactly the prior 3 calendar months may be possible, but it is arguably less optimal, as you would be comparing slightly different lengths of time (personal choice, I guess).
Also, this will behave 'strangely' if you have a total, in that it will use the last date in your selection.
First of all, the formula that you are using is designed to work as a measure; it may not work well as a calculated column. Secondly, it is better to do such aggregations at the measure level than at the level of individual records.
Then again, I do not fully understand your situation, but if it is absolutely important for you to do this at a Record level, you may want to use the "Earlier" Function.
If you want to filter a function based on a value in the corresponding row, you just have to wrap your column name in the EARLIER function. Try changing LASTDATE to EARLIER in your formula.