Pandas: create a graph from Date and Time while they are in different columns

My data looks like this:
   Creation Day  Time St1  Time St2
0  28.01.2022    14:18:00  15:12:00
1  28.01.2022    14:35:00  16:01:00
2  29.01.2022    00:07:00  03:04:00
3  30.01.2022    17:03:00  22:12:00
It represents parts being at a given station. What I need now is something that counts how many rows share the same day and hour, e.g. how many parts were at the same station during a given hour. Here, 2 parts were at Station 1 on the 28th during the 14-15 timespan.
In the end I want a bar graph that shows production speed. Additionally, later in the project I want to highlight parts that haven't moved for more than 2 hours.
Is it practical to create a datetime object for every station (I have 5 in total), or is there a much simpler way to do this?
FYI, I import this data from an Excel sheet.

I found the solution. As they are just strings, I can simply concatenate them and convert the result with pd.to_datetime().
Example:
df["Time St1"] = pd.to_datetime(
df["Creation Day"] + ' ' + df["Time St1"],
infer_datetime_format=False, format='%d.%m.%Y %H:%M:%S'
)
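As a follow-up (my addition, not part of the original answer): once the columns are real datetimes, the per-station, per-hour counts the question asks about, and the bar graph, can be obtained by grouping on the day/hour part of the df used above, for example:
import pandas as pd
import matplotlib.pyplot as plt

# number of parts that were at Station 1 in each day/hour bucket
per_hour = df.groupby(df["Time St1"].dt.strftime("%d.%m.%Y %H")).size()
print(per_hour)
# 28.01.2022 14    2
# 29.01.2022 00    1
# 30.01.2022 17    1
per_hour.plot(kind="bar")  # production speed per hour at Station 1
plt.show()
The same grouping can be repeated for the other station columns.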

Related

How to set Custom Business Day End Frequency in Pandas

I have a pandas dataframe with an unusual DatetimeIndex. The frame contains daily data (end of each day) from 1985 to 1990 but some "random" days are missing:
DatetimeIndex(['1985-01-02', '1985-01-03', '1985-01-04', '1985-01-07',
'1985-01-08', '1985-01-09', '1985-01-10', '1985-01-11',
'1985-01-14', '1985-01-15',
...
'1990-12-17', '1990-12-18', '1990-12-19', '1990-12-20',
'1990-12-21', '1990-12-24', '1990-12-26', '1990-12-27',
'1990-12-28', '1990-12-31'],
dtype='datetime64[ns]', name='date', length=1516, freq=None)
I often need operations like shifting an entire column such that a value on the last day of a month (which in my DatetimeIndex could be, e.g., '1985-05-30') is shifted to the last day of the next month (which in my DatetimeIndex could be, e.g., '1985-06-27').
While looking for a smart way to perform such shifts, I stumbled upon the Offset Aliases provided by pandas.tseries.offsets. Among them are the custom business day frequency (C) and the custom business month end frequency (CBM). Looking at an example, it seems like this could provide exactly what I need:
from pandas.tseries.holiday import USFederalHolidayCalendar

mth_us = pd.offsets.CustomBusinessMonthEnd(calendar=USFederalHolidayCalendar())
day_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
df['Col1_shifted'] = df['Col1'].shift(periods=1, freq=mth_us)  # shifted by 1 month
df['Col2_shifted'] = df['Col2'].shift(periods=1, freq=day_us)  # shifted by 1 day
The problem is that my DatetimeIndex does not correspond to USFederalHolidayCalendar(). Can someone please tell me how I can use pd.offsets.CustomBusinessMonthEnd (and also pd.offsets.CustomBusinessDay) with my own custom DatetimeIndex?
If not, does anyone have an idea how to tackle this issue in a different way?
Thanks a lot for your help!
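One possible direction (my addition, a hedged sketch rather than an answer from this thread): besides a calendar class, both custom offsets also accept an explicit holidays list, so you could declare every weekday that is missing from your own DatetimeIndex to be a "holiday". Using the first few dates of the index above as stand-in data:
import pandas as pd

idx = pd.DatetimeIndex(['1985-01-02', '1985-01-03', '1985-01-04', '1985-01-07',
                        '1985-01-08', '1985-01-09', '1985-01-10', '1985-01-11',
                        '1985-01-14', '1985-01-15'], name='date')
df = pd.DataFrame({'Col1': range(len(idx)), 'Col2': range(len(idx))}, index=idx)

# every Mon-Fri in the covered span; the ones absent from the data become "holidays"
all_weekdays = pd.bdate_range(idx.min(), idx.max())
holidays = list(all_weekdays.difference(idx))

day_cust = pd.offsets.CustomBusinessDay(holidays=holidays)
mth_cust = pd.offsets.CustomBusinessMonthEnd(holidays=holidays)

df['Col2_shifted'] = df['Col2'].shift(periods=1, freq=day_cust)  # shifted by 1 custom business day
print((idx + mth_cust)[:3])  # each date rolled to its custom business month end
The weekmask argument of CustomBusinessDay works the same way if your data also skips particular weekdays.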

Make a for loop for a dataframe to subtract dates and put it in a variable

I have a dataframe with a lot of products that looks like this:
Product  Start Date
00001    2021/08/10
00002    2021/01/10
I want to make a loop that goes from product to product, subtracting three months from the date and then putting the result in a variable, something like this:
date = {}
for i in dataframe:
    date['3monthsbefore'] = i['start date'] - 3 months
    date['3monthsafter'] = i['start date'] + 3 months
    date['product'] = i['product']
    "Another process with those variables"
And then concat all this data into a dataframe. I'm a little bit lost; I want to do this because I need to use those variables in another process, so is it possible to do this?
Using pandas, you usually don't need to loop over your DataFrame. In this case, you can get the 3 months before/after for all rows pretty simply using pd.DateOffset:
df["Start Date"] = pd.to_datetime(df["Start Date"])
df["3monthsbefore"] = df["Start Date"] - pd.DateOffset(months=3)
df["3monthsafter"] = df["Start Date"] + pd.DateOffset(months=3)
This gives:
  Product Start Date 3monthsbefore 3monthsafter
0   00001 2021-08-10    2021-05-10   2021-11-10
1   00002 2021-01-10    2020-10-10   2021-04-10
Data:
df = pd.DataFrame({"Product": ["00001", "00002"], "Start Date": ["2021/08/10", "2021/01/10"]})

Pandas - Extract all content based on certain keywords

I am trying to extract all the content from a DataFrame column up until a specific word appears, i.e. keep the entire content up to the point where one of the following words occurs:
high, medium, low
Sample view of the text in the Dataframe:
text
Ticket creation dropped in last 24 hours medium range for cust_a
Calls dropped in last 3 months high range for cust_x
Expected output:
text, new_text
Ticket creation dropped in last 24 hours medium range for cust_a, Ticket creation dropped in last 24 hours
Calls dropped in last 3 months high range for cust_x, Calls dropped in last 3 months
You need str.replace with a regex.
The idea is to match any of the words from your list and then replace it, along with anything after it.
We use .* to match everything up to the end of the string:
words = 'high, medium, low'
match_words = '|'.join(words.split(', '))
#'high|medium|low'
df['new_text'] = df['text'].str.replace(f"({match_words}).*", '', regex=True)
print(df['new_text'])
0    Ticket creation dropped in last 24 hours
1              Calls dropped in last 3 months
Name: new_text, dtype: object
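One small caveat (my addition, not from the original answer): because the match starts at the keyword itself, the space just before the removed keyword survives the replacement. If that matters, strip it afterwards:
df['new_text'] = df['new_text'].str.strip()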

Mapping column values to a combination of another csv file's information

I have a dataset that indicates date & time in 5-digit format: ddd + hm
The ddd part starts from 2009 Jan 1. Since the data was collected over a 2-year period starting then, its [min, max] is [1, 365 x 2 = 730].
Data is observed at 30-minute intervals, so a 24-hour day is split into at most 48 slots, which puts [min, max] for hm at [1, 48].
Following is an excerpt of the daycode.csv file, which maps the ddd part of the daycode to a date and the hm part of the daycode to a time.
I think I agreed not to show the dataset, which is from ISSDA, so I will just describe it: a daycode in the File1.txt file reads like '63317'.
This link gave me a glimpse of how to approach the problem, and I was in the middle of putting this code together, which of course won't work at this point:
import pandas as pd
import matplotlib.pyplot as plt

consume = pd.read_csv("data/File1.txt", sep=' ', encoding="utf-8", names=['meter', 'daycode', 'val'])
df1 = pd.read_csv("data/daycode.csv", encoding="cp1252", names=['code', 'print'])
test = consume[consume['meter'] == 1048]
test['daycode'] = test['daycode'].map(df1.set_index('code')['print'])
plt.plot(test['daycode'], test['val'], '.')
plt.title('test of meter 1048')
plt.xlabel('daycode')
plt.ylabel('energy consumption [kWh]')
plt.show()
Not all units (there are thousands) have been observed for the full length, but 730 x 48 is too large a combination to lay out in Excel by hand. To be honest it's not an elegant solution, but I tried doing it by dragging, and it doesn't quite work.
If I could read the first 3 digits of the column values and match them with another file's column, and the last 2 digits with another column, then combine them... is there a way?
For the last 2 lines you can just do something like this:
df['first_3_digits'] = df['col1'].map(lambda x: str(x)[:3])
df['last_2_digits'] = df['col1'].map(lambda x: str(x)[-2:])
For joining the 2 dataframes:
df3 = df.merge(df2, left_on=['first_3_digits', 'last_2_digits'],
               right_on=['col1_df2', 'col2_df2'], how='left')
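If it helps, here is a hedged sketch (my addition, based only on the encoding described in the question: ddd counts days from 2009-01-01 starting at 1, and hm counts 30-minute slots starting at 1) that converts the 5-digit daycode straight into a timestamp, without needing the daycode.csv lookup:
import pandas as pd

def daycode_to_timestamp(daycode):
    s = str(daycode)
    ddd = int(s[:-2])   # day number since 2009-01-01 (1-based), e.g. 633
    hm = int(s[-2:])    # half-hour slot within the day (1-based), e.g. 17
    return (pd.Timestamp("2009-01-01")
            + pd.Timedelta(days=ddd - 1)
            + pd.Timedelta(minutes=30 * (hm - 1)))

print(daycode_to_timestamp(63317))   # 2010-09-25 08:00:00

# applied to the consumption frame from the question:
# consume['timestamp'] = consume['daycode'].map(daycode_to_timestamp)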

How to calculate a rolling average based on a column in Spotfire

I have a data set with a Document Property that selects "items"; each "item" has a particular number of "usage days". I want to calculate a "Moving Average" output for one or more selected items. The data for the moving average lives in a column named "usage days".
How do I calculate this, taking into account the "selected date of my choice" and a rolling-average number of days of my choice?
Do you have particular ideas for how I can perform the calculation, i.e. in a calculated column or a text field?
Car  Trip   Start Date  End Date    Days on trip
1    AB123  2           6/07/2013
1    AB234  29/07/2013  6/09/2013   42
1    AB345  6/09/2013   28/09/2013  22
1    AB456  29/09/2013  21/10/2013  23
2    AB567  26/10/2013  12/11/2013  22
2    AB678  12/11/2013  8/12/2013   26
[The rows above are an example of the problem (sorry, I couldn't paste an image because I'm new). I want to calculate the % usage of the car and/or cars for a selected range of time, e.g. select the date range July to August, then (# of days on trip for cars 1 and 2) / (# of days in that period) / 2 * 100.]
As phiver said, it is still difficult to see what you expect as a result... but I think I have something that might work. First, I slightly altered the dataset you provided, like so:
car  trip   startDate   endDate     daysOnTrip
1    AB123  7/6/2013    7/29/2013   23
1    AB234  7/29/2013   9/6/2013    42
1    AB345  9/6/2013    9/28/2013   22
1    AB456  9/29/2013   10/21/2013  23
2    AB567  10/26/2013  11/12/2013  22
2    AB678  11/12/2013  12/8/2013   26
I then added 2 document properties, "DateRangeFirst" and "DateRangeLast", to allow the user to select beginning and ending dates.
Next I made input box property controls for each of the aforementioned document properties in a text area so the user can alter the date range.
I then added a data table visualization with a "Limit data using expression:" of "[startDate] >= Date(${DateRangeFirst}) and [endDate] <= Date(${DateRangeLast})" so we could see the trips selected.
Finally, to get the average you appear to be looking for, a bar chart set to % of total (daysOnTrip) / car with the same data-limiting expression as above.
The screenshot below should have everything you need to reproduce my results. I hope this gives you what you need.
NOTE: With this method if you select a date in the middle of a trip, an entire row and all of the days on that trip will be ignored.
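For what it's worth, here is a hedged sketch of the same percentage in pandas (my addition, in pandas rather than Spotfire, using the altered dataset above; unlike the limiting expression, trips that only partly overlap the window are clipped rather than dropped):
import pandas as pd

trips = pd.DataFrame({
    "car":   [1, 1, 1, 1, 2, 2],
    "trip":  ["AB123", "AB234", "AB345", "AB456", "AB567", "AB678"],
    "start": pd.to_datetime(["2013-07-06", "2013-07-29", "2013-09-06",
                             "2013-09-29", "2013-10-26", "2013-11-12"]),
    "end":   pd.to_datetime(["2013-07-29", "2013-09-06", "2013-09-28",
                             "2013-10-21", "2013-11-12", "2013-12-08"]),
})

range_first = pd.Timestamp("2013-07-01")   # stand-ins for DateRangeFirst / DateRangeLast
range_last = pd.Timestamp("2013-08-31")

# days of each trip that fall inside the selected window (negative overlaps become 0)
overlap = (trips["end"].clip(upper=range_last)
           - trips["start"].clip(lower=range_first)).dt.days.clip(lower=0)

window_days = (range_last - range_first).days
n_cars = trips["car"].nunique()
usage_pct = overlap.sum() / (window_days * n_cars) * 100
print(round(usage_pct, 1))   # ~45.9 for the July-August 2013 window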