Trouble converting "Excel time" to "R time" - POSIXct

I have an Excel column that consists of numbers and times that were supposed to all be entered as time values only. Some are in number form (915) and some are in time form (9:15, which appear as decimals in R). I managed to get them all into the same format in Excel (year-month-day hh:mm:ss), although the dates are incorrect - which doesn't really matter, I just need the time. However, I can't seem to convert this new column (time - new) back to the correct time value in R (in either character or time format).
I'm sure this answer already exists somewhere, I just can't find one that works...
# Returns incorrect time
x$new_time <- times(strftime(x$`time - new`,"%H:%M:%S"))
# Returns all NA
x$new_time2 <- as.POSIXct(as.character(x$`time - new`),
                          format = '%H:%M:%S', origin = '2011-07-15 13:00:00')
> head(x)
# A tibble: 6 x 8
Year Month Day `Zone - matched with coordinate tab` Time `time - new` new_time new_time2
<dbl> <dbl> <dbl> <chr> <dbl> <dttm> <times> <dttm>
1 2017 7 17 Crocodile 103 1899-12-31 01:03:00 20:03:00 NA
2 2017 7 17 Crocodile 113 1899-12-31 01:13:00 20:13:00 NA
3 2017 7 16 Crocodile 118 1899-12-31 01:18:00 20:18:00 NA
4 2017 7 17 Crocodile 123 1899-12-31 01:23:00 20:23:00 NA
5 2017 7 17 Crocodile 125 1899-12-31 01:25:00 20:25:00 NA
6 2017 7 16 West 135 1899-12-31 01:35:00 20:35:00 NA

Found this answer here:
Extract time from timestamp?
library(lubridate)
# Adding new column to verify times are correct
x$works <- format(ymd_hms(x$`time - new`), "%H:%M:%S")
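If an actual time-of-day class is needed rather than a character column, one further option (a sketch, assuming the hms package is installed) is to parse that formatted string:
library(hms)
# Turn the "HH:MM:SS" text into an hms (time-of-day) object so it sorts
# and plots as a time rather than as plain text
x$works_hms <- as_hms(x$works)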

Related

Date dependent calculation from 2 dataframes - average 6-month return

I am working with the following dataframe. I have data for multiple companies, and each row is associated with a specific datadate, so there are many rows per company, with IPO dates ranging from 2009 to 2022.
index ID price daily_return datadate daily_market_return mean_daily_market_return ipodate
0 1 27.50 0.008 01-09-2010 0.0023 0.03345 01-12-2009
1 2 33.75 0.0745 05-02-2017 0.00458 0.0895 06-12-2012
2 3 29.20 0.00006 08-06-2020 0.0582 0.0045 01-05-2013
3 4 20.54 0.00486 09-06-2018 0.0009 0.0006 27-11-2013
4 1 21.50 0.009 02-09-2021 0.0846 0.04345 04-05-2009
5 4 22.75 0.00539 06-12-2019 0.0003 0.0006 21-09-2012
...
26074 rows
I also have a dataframe containing the Market yield on US Treasury securities at 10-year constant maturity, measured daily. Each row represents the yield for a specific day, covering every day from 2009 to 2022.
date dgs10
1 2009-01-02 2.46
2 2009-01-05 2.49
3 2009-01-06 2.51
4 2009-01-07 2.52
5 2009-01-08 2.47
6 2009-01-09 2.43
7 2009-01-12 2.34
8 2009-01-13 2.33
...
date dgs10
3570 2022-09-08 3.29
3571 2022-09-09 3.33
3572 2022-09-12 3.37
3573 2022-09-13 3.42
3574 2022-09-14 3.41
My goal is to calculate, for each ipodate (from dataframe 1), the average of the Market yield on US Treasury securities at 10-year constant maturity (from dataframe 2) over the previous 6 months. The result should go either in a new dataframe or in an additional column in dataframe 1. The two dataframes are not the same length. I tried using rolling(), but it doesn't seem to be working. Does anyone know how to fix this?
import numpy as np
import pandas as pd

# Make sure that all date columns are of type Timestamp. They are a lot easier
# to work with
df1["ipodate"] = pd.to_datetime(df1["ipodate"], dayfirst=True)
df2["date"] = pd.to_datetime(df2["date"])
# Calculate the mean market yield of the previous 6 months. Six months is not a
# fixed length of time, so it is approximated here as 180 days.
tmp = df2.rolling("180D", on="date").mean()
# The values of the first 180 days are invalid, because we have insufficient
# data to calculate the rolling mean. You may consider extending df2 further
# back to 2008. (You may come up with other rules for this period.)
is_invalid = (tmp["date"] - tmp["date"].min()) / pd.Timedelta(1, "D") < 180
tmp.loc[is_invalid, "dgs10"] = np.nan
# Result: left-join the 180-day rolling mean onto df1 by IPO date
result = df1.merge(tmp, left_on="ipodate", right_on="date", how="left")

How to calculate the slope of a dataframe, up to a specific row number?

I have this data frame that looks like this:
PE CE time
0 362.30 304.70 09:42
1 365.30 303.60 09:43
2 367.20 302.30 09:44
3 360.30 309.80 09:45
4 356.70 310.25 09:46
5 355.30 311.70 09:47
6 354.40 312.98 09:48
7 350.80 316.70 09:49
8 349.10 318.95 09:50
9 350.05 317.45 09:51
10 352.05 315.95 09:52
11 350.25 316.65 09:53
12 348.63 318.35 09:54
13 349.05 315.95 09:55
14 345.65 320.15 09:56
15 346.85 319.95 09:57
16 348.55 317.20 09:58
17 349.55 316.26 09:59
18 348.25 317.10 10:00
19 347.30 318.50 10:01
In this data frame, I would like to calculate the slope of the first and second columns separately, over the time period starting from the first time (09:42 in this case, which is not fixed and can vary) up to 12:00.
Please help me write it.
Computing the slope can be accomplished by use of the equation:
Slope = Rise/Run
Given that you want to compute the slope between two time entries, all you need to find is:
the Run = the number of rows (time steps) between the start and end entries (with the 1-minute rows in the example below, this equals the elapsed minutes)
the Rise = the difference between the cell entries at the start and end.
The tricky part of these calculations is making sure you handle the time values properly:
import pandas as pd
from datetime import datetime
Thus you can define a function:
def computeSelectedSlope(df: pd.DataFrame, start: str, end: str, timecol: str, datacol: str) -> float:
    assert timecol in df.columns  # prove timecol exists
    assert datacol in df.columns  # prove datacol exists
    rise = (df[datacol][df[timecol] == datetime.strptime(end, '%H:%M:%S').time()].values[0] -
            df[datacol][df[timecol] == datetime.strptime(start, '%H:%M:%S').time()].values[0])
    run = (int(df.index[df[timecol] == datetime.strptime(end, '%H:%M:%S').time()].values) -
           int(df.index[df[timecol] == datetime.strptime(start, '%H:%M:%S').time()].values))
    return rise / run
Now given a dataframe df of the form:
A B T
0 2.632 231.229 00:00:00
1 2.732 239.026 00:01:00
2 2.748 251.310 00:02:00
3 3.018 285.330 00:03:00
4 3.090 308.925 00:04:00
5 3.366 312.702 00:05:00
6 3.369 326.912 00:06:00
7 3.562 330.703 00:07:00
8 3.590 379.575 00:08:00
9 3.867 422.262 00:09:00
10 4.030 428.148 00:10:00
11 4.210 442.521 00:11:00
12 4.266 443.631 00:12:00
13 4.335 444.991 00:13:00
14 4.380 453.531 00:14:00
15 4.402 462.531 00:15:00
16 4.499 464.170 00:16:00
17 4.553 471.770 00:17:00
18 4.572 495.285 00:18:00
19 4.665 513.009 00:19:00
You can find the slope for any time difference by:
computeSelectedSlope(df, '00:01:00', '00:15:00', 'T', 'B')
Which yields 15.964642857142858

Tallying events within specific time prior to a current event

I am trying to tally the number of events that happened in specific periods of time previous to each of my events (day/week/month) in a data frame.
I have a data frame with 50 individuals, each of whom has events scattered throughout different periods of time (days/weeks/months) in the dataframe. Every row in the data frame is an event, and I'm trying to understand how the number of events in the previous day/week/month affected the way the individual responded to the current event. Every event is marked with an individual ID (ID.2) and has a date and time associated with it (Datetime). I have already created columns for day (epd), week (epw), and month (epm) and want to populate them, for each event, with the number of events for that specific individual in the previous day, week, and month respectively.
My data looks like this:
> head(ACss)
Date Datetime ID.2 month day year epd epw epm
1 2019-05-25 2019-05-25 11:57 139 5 25 2019 NA NA NA
2 2019-06-09 2019-06-09 19:42 43 6 9 2019 NA NA NA
3 2019-07-05 2019-07-05 20:12 139 7 5 2019 NA NA NA
4 2019-07-27 2019-07-27 17:27 152 7 27 2019 NA NA NA
5 2019-08-04 2019-08-04 9:13 152 8 4 2019 NA NA NA
6 2019-08-04 2019-08-04 16:18 139 8 4 2019 NA NA NA
I have no idea how to go about doing this so haven't tried anything yet! Any and all suggestions are greatly appreciated!
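A minimal sketch of one possible approach, assuming Datetime is still a character column like "2019-05-25 11:57" and taking "previous month" as a 30-day window (both of these are assumptions, not part of the question):
library(dplyr)
library(lubridate)

# Parse Datetime, then for each event count that individual's earlier events
# inside fixed look-back windows of 1 day, 1 week, and (approximately) 1 month
ACss <- ACss %>%
  mutate(Datetime = ymd_hm(Datetime)) %>%
  group_by(ID.2) %>%
  mutate(
    epd = sapply(Datetime, function(t) sum(Datetime < t & Datetime >= t - days(1))),
    epw = sapply(Datetime, function(t) sum(Datetime < t & Datetime >= t - weeks(1))),
    epm = sapply(Datetime, function(t) sum(Datetime < t & Datetime >= t - days(30)))
  ) %>%
  ungroup()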

Change date of POSIXct variable based on other columns in R

Is there a way to change the date of a dttm column based on the values from other columns? The time in the "Date_Time" column is correct, but the dates need to be changed to match those in the column "Date" (or from all three columns "Year", "Month", and "Day").
This is likely close to what I need to do, but it gives me this error:
library(lubridate)
df$new <- with(df, ymd_hm(sprintf('%04d%02d%02d', Year, Month, day, Time))) #'Time' is new character column of just time component from 'Date_Time'
# Not sure what this means..
invalid format '%04d'; use format %s for character objects
> head(df,5)
# A tibble: 5 x 5
Date Year Month Day Date_Time
<chr> <fct> <dbl> <dbl> <dttm>
1 2020-11-14 2020 11 14 1899-12-31 10:46:00
2 2020-11-14 2020 11 14 1899-12-31 10:57:00
3 2020-11-14 2020 11 14 1899-12-31 09:16:00
4 2012-8-11 2012 8 11 1899-12-31 14:59:00
5 2012-8-11 2012 8 11 1899-12-31 13:59:00
First update the Date column to be a date. Then use lubridate to assign that date to the Date_Time column:
df$Date <- as.Date(df$Date)
lubridate::date(df$Date_Time) <- df$Date
And if necessary, update the timezone to whatever it needs to be:
attr(df$Date_Time, "tzone") <- "Europe/Paris" # Update timezone
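Alternatively, the sprintf() route from the question can be repaired: the error comes from Year being a factor (not numeric) and from passing four values to a format string with only three placeholders. A sketch of that route, assuming Time is the "HH:MM" character column mentioned in the question's comment:
library(lubridate)
# Build "YYYY-M-D HH:MM" strings from the components, then parse them;
# ymd_hm() copes with single-digit months and days
df$new <- ymd_hm(paste0(as.character(df$Year), "-", df$Month, "-", df$Day, " ", df$Time))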

Build the difference between values in the same column

Let's say I have the following data table, which has one column giving the first of each month from 2000 until 2005 and a second column giving some values that are positive or negative.
What I want to do is that I want to build the difference between two observations from the same month but from different years.
So for example:
I want to calculate the difference between 2001-01-01 and 2000-01-01 and write the value in a new column in the same row where my 2001-01-01 date stands.
I want to do this for all my observations, and for the ones that do not have a value in the previous year to compare to, just return NA.
Thank you for your time and help :)
If there are no gaps in your data, you could use the lag function:
library(dplyr)
df <- data.frame(Date = as.Date(sapply(2000:2005, function(x) paste(x, 1:12, 1, sep = "-"))),
                 Value = runif(72, 0, 1))
df$Difference <- df$Value-lag(df$Value, 12)
> df[1:24,]
Date Value Difference
1 2000-01-01 0.83038968 NA
2 2000-02-01 0.85557483 NA
3 2000-03-01 0.41463862 NA
4 2000-04-01 0.16500688 NA
5 2000-05-01 0.89260904 NA
6 2000-06-01 0.21735933 NA
7 2000-07-01 0.96691686 NA
8 2000-08-01 0.99877057 NA
9 2000-09-01 0.96518311 NA
10 2000-10-01 0.68122410 NA
11 2000-11-01 0.85688662 NA
12 2000-12-01 0.97282720 NA
13 2001-01-01 0.83614146 0.005751778
14 2001-02-01 0.07967273 -0.775902097
15 2001-03-01 0.44373647 0.029097852
16 2001-04-01 0.35088593 0.185879052
17 2001-05-01 0.46240321 -0.430205836
18 2001-06-01 0.73177425 0.514414912
19 2001-07-01 0.52017554 -0.446741315
20 2001-08-01 0.52986486 -0.468905713
21 2001-09-01 0.14921003 -0.815973080
22 2001-10-01 0.25427134 -0.426952761
23 2001-11-01 0.36032777 -0.496558857
24 2001-12-01 0.20862578 -0.764201423
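If there may be gaps (missing months), a join on the date shifted back one year is more robust than lag(); this is just a sketch along those lines, using hypothetical helper columns prev_year and prev_Value:
library(dplyr)
library(lubridate)

# Match each row to the observation exactly one year earlier;
# rows with no match in the previous year get NA automatically
df %>%
  mutate(prev_year = Date - years(1)) %>%
  left_join(df %>% select(prev_year = Date, prev_Value = Value),
            by = "prev_year") %>%
  mutate(Difference = Value - prev_Value)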
I think you should try the lubridate package; it is very useful for working with dates.
https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html