Change date of POSIXct variable based on other columns in R

Is there a way to change the date of a dttm column based on the values from other columns? The time in the "Date_Time" column is correct, but the dates need to be changed to match those in the column "Date" (or from all three columns "Year", "Month", and "Day").
This is likely close to what I need to do, but it gives me this error:
library(lubridate)
# 'Time' is a new character column holding just the time component of 'Date_Time'
df$new <- with(df, ymd_hm(sprintf('%04d%02d%02d', Year, Month, Day, Time)))
# Not sure what this error means:
# invalid format '%04d'; use format %s for character objects
> head(df, 5)
# A tibble: 5 x 5
  Date       Year  Month   Day Date_Time
  <chr>      <fct> <dbl> <dbl> <dttm>
1 2020-11-14 2020     11    14 1899-12-31 10:46:00
2 2020-11-14 2020     11    14 1899-12-31 10:57:00
3 2020-11-14 2020     11    14 1899-12-31 09:16:00
4 2012-8-11  2012      8    11 1899-12-31 14:59:00
5 2012-8-11  2012      8    11 1899-12-31 13:59:00

First convert the Date column to class Date, then use lubridate to assign that date to the Date_Time column. (The sprintf error above most likely occurs because Year is a factor, which sprintf treats as character data, so the numeric format %04d fails.)
df$Date <- as.Date(df$Date)
lubridate::date(df$Date_Time) <- df$Date
And if necessary, update the timezone to whatever it needs to be:
attr(df$Date_Time, "tzone") <- "Europe/Paris" # Update timezone


Trouble converting "Excel time" to "R time"

I have an Excel column that consists of numbers and times that were all supposed to be entered as time values only. Some are in number form (915) and some are in time form (9:15, which appear as decimals in R). I managed to get them all into the same format in Excel (year-month-day hh:mm:ss), although the dates are incorrect - which doesn't really matter, since I only need the time. However, I can't seem to convert this new column (time - new) back to the correct time value in R (in character or time format).
I'm sure this answer already exists somewhere, I just can't find one that works...
# Returns incorrect time (times() is from the chron package)
x$new_time <- times(strftime(x$`time - new`, "%H:%M:%S"))
# Returns all NA
x$new_time2 <- as.POSIXct(as.character(x$`time - new`),
                          format = '%H:%M:%S', origin = '2011-07-15 13:00:00')
> head(x)
# A tibble: 6 x 8
   Year Month   Day `Zone - matched with coordinate tab`  Time `time - new`        new_time new_time2
  <dbl> <dbl> <dbl> <chr>                                <dbl> <dttm>              <times>  <dttm>
1  2017     7    17 Crocodile                              103 1899-12-31 01:03:00 20:03:00 NA
2  2017     7    17 Crocodile                              113 1899-12-31 01:13:00 20:13:00 NA
3  2017     7    16 Crocodile                              118 1899-12-31 01:18:00 20:18:00 NA
4  2017     7    17 Crocodile                              123 1899-12-31 01:23:00 20:23:00 NA
5  2017     7    17 Crocodile                              125 1899-12-31 01:25:00 20:25:00 NA
6  2017     7    16 West                                   135 1899-12-31 01:35:00 20:35:00 NA
Found this answer here:
Extract time from timestamp?
library(lubridate)
# Adding new column to verify times are correct
x$works <- format(ymd_hms(x$`time - new`), "%H:%M:%S")

Compare Cumulative Sales per Year-End

Using this sample dataframe:
import numpy as np
import pandas as pd

np.random.seed(1111)
df = pd.DataFrame({
    'Category': np.random.choice(['Group A','Group B','Group C','Group D'], 10000),
    'Sub-Category': np.random.choice(['X','Y','Z'], 10000),
    'Sub-Category-2': np.random.choice(['G','F','I'], 10000),
    'Product': np.random.choice(['Product 1','Product 2','Product 3'], 10000),
    'Units_Sold': np.random.randint(1, 100, size=10000),
    'Dollars_Sold': np.random.randint(100, 1000, size=10000),
    'Customer': np.random.choice(pd.util.testing.rands_array(10, 25, dtype='str'), 10000),
    'Date': np.random.choice(pd.date_range('1/1/2016', '12/31/2020', freq='M'), 10000)})
I am trying to compare 12-month time frames with seaborn plots for a sub-grouping of category. For example, I'd like to compare the cumulative 12 months for each year ending 4-30 against the same time period for other years. I cannot wrap my head around how to get a running total of data for each respective year (5/1/17-4/30/18, 5/1/18-4/30/19, 5/1/19-4/30/20). The dates are just examples - I'd like to be able to compare different year-end data points; even better would be the ability to compare arbitrary 365-day windows. For instance, I'd love to compare 3/15/19-3/14/20 to 3/15/18-3/14/19, etc.
I envision a graph for each 'Category' (A,B,C,D) with lines for each respective year representing the running total starting with zero on May 1, building through April 30 of the next year. The x axis would be the month (starting with May 1) & y axis would be 'Units_Sold' as it grows.
Any help would be greatly appreciated!
One way is to convert the date to fiscal quarters and extract the fiscal year:
df = pd.DataFrame({'Date': pd.date_range('2019-01-01', '2019-12-31', freq='M'),
                   'Values': np.arange(12)})
df['fiscal_year'] = df.Date.dt.to_period('Q-APR').dt.qyear
Output:
         Date  Values  fiscal_year
0  2019-01-31       0         2019
1  2019-02-28       1         2019
2  2019-03-31       2         2019
3  2019-04-30       3         2019
4  2019-05-31       4         2020
5  2019-06-30       5         2020
6  2019-07-31       6         2020
7  2019-08-31       7         2020
8  2019-09-30       8         2020
9  2019-10-31       9         2020
10 2019-11-30      10         2020
11 2019-12-31      11         2020
And now you can group by fiscal_year to your heart's content.
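Building on that trick, here is a minimal sketch (using a hypothetical two-year `Units_Sold` series in place of the real data) of the running total the question asks for - a cumulative sum that restarts at the start of each fiscal year:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data spanning two fiscal years (the real data would be the df above)
df = pd.DataFrame({'Date': pd.date_range('2019-01-01', '2020-12-31', freq='M'),
                   'Units_Sold': np.arange(24)})

# Fiscal year ending in April: May 2019 onward is labelled fiscal year 2020
df['fiscal_year'] = df['Date'].dt.to_period('Q-APR').dt.qyear

# Running total that resets at each fiscal-year boundary,
# ready to plot one line per fiscal_year with the month on the x axis
df['cum_units'] = df.groupby('fiscal_year')['Units_Sold'].cumsum()
```

Plotting is then a matter of one seaborn lineplot per 'Category' with `hue='fiscal_year'`; the same `groupby(...).cumsum()` pattern works for `Dollars_Sold` as well.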

Select all rows of dataframe that have a minimum value for a group

I have a dataframe of dates, times, and values, and I would like to create a new dataframe with the date and the value of the earliest time for each date (think of it like an opening stock price).
For example,
date   time  value
1/12   9:07     10
1/12   9:03     13
1/13  10:35      8
1/13  11:02     15
1/13  11:54      6
I would want:
date  value
1/12     13
1/13      8
Since those values correspond to the earliest time for each date.
So far I got:
timegroup = (result.groupby('date')['time'].min()).to_dict()
But can't figure out where to go from here.
Use DataFrame.sort_values + DataFrame.drop_duplicates:
df.sort_values(['date','time']).drop_duplicates(subset='date')[['date','value']]
# date value
#1 1/12 13
#2 1/13 8
or
df.sort_values(['date','time']).groupby('date',as_index=False).first()[['date','value']]
# date value
# 0 1/12 13
# 1 1/13 8
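The asker's `groupby('date')['time'].min()` attempt was close; a third option, sketched here with the sample data from the question, is `groupby().idxmin()`, with the times parsed via `pd.to_datetime` so comparisons are chronological rather than lexicographic:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'date':  ['1/12', '1/12', '1/13', '1/13', '1/13'],
                   'time':  ['9:07', '9:03', '10:35', '11:02', '11:54'],
                   'value': [10, 13, 8, 15, 6]})

# Parse times so '9:03' compares as earlier than '10:35'
# (as plain strings, '10:35' would sort before '9:03')
t = pd.to_datetime(df['time'], format='%H:%M')

# idxmin returns the row label of the earliest time within each date
result = df.loc[t.groupby(df['date']).idxmin(), ['date', 'value']]
```

Parsing the time column matters for all three approaches: sorting string times works on this sample only because each day's earliest time also happens to sort first lexicographically.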

How to group by date and count the unique values in each group with pandas

How can I group by date and count the unique values in each group with pandas? I need to count the number of unique MAC addresses each day.
pd.concat([df[['date','Client MAC']],
           df8[['date','MAC address']].rename(columns={'MAC address':'Client MAC'})]).groupby(['date'])
One of the columns, as a data example:
Association Time
Mon May 14 19:41:20 HKT 2018
Mon May 14 19:43:22 HKT 2018
Tue May 15 09:24:57 HKT 2018
Mon May 14 19:53:33 HKT 2018
I use:
starttime = datetime.datetime.now()
dff4 = (df4[['Association Time','Client MAC Address']]
        .groupby(pd.to_datetime(df4['Association Time']).dt.date
                 .apply(lambda x: dt.datetime.strftime(x, '%Y-%m-%d')))
        .nunique())
print(datetime.datetime.now() - starttime)
It runs for 2 minutes, and the result also counts the Association Time column, which is wrong - I do not need Association Time in the output:
                  Association Time  Client MAC Address
Association Time
2017-06-21                       1                   3
2018-02-21                       2                   8
2018-02-27                       1                   1
2018-03-07                       3                   3
I believe you need to add ['Client MAC'].nunique():
df = (pd.concat([df[['date','Client MAC']],
                 df8[['date','MAC address']].rename(columns={'MAC address':'Client MAC'})])
        .groupby(['date'])['Client MAC']
        .nunique())
If the dates are datetimes:
df = pd.concat([df[['date','Client MAC']],
                df8[['date','MAC address']].rename(columns={'MAC address':'Client MAC'})])
df = df['Client MAC'].groupby(df['date'].dt.date).nunique()
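Here is a minimal, self-contained sketch of the core idea (with made-up MAC values standing in for the real data; the actual input would be the concatenated frames above) - `nunique` collapses each day's rows to a count of distinct addresses:

```python
import pandas as pd

# Hypothetical association log: one row per connection event
df = pd.DataFrame({'date': ['2018-05-14', '2018-05-14', '2018-05-14', '2018-05-15'],
                   'Client MAC': ['aa:bb:01', 'aa:bb:01', 'cc:dd:02', 'aa:bb:01']})

# Count distinct MAC addresses per day; repeat associations are counted only once
unique_per_day = df.groupby('date')['Client MAC'].nunique()
```

This is also why the asker's version was slow and wrong: calling .nunique() on the whole frame counts every selected column, including Association Time, instead of just the MAC column.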

Build the difference between values in the same column

Let's say I have the following data table: one column gives the first of each month from 2000 to 2005, and the second column holds values that can be positive or negative.
What I want to do is build the difference between two observations from the same month but from different years.
So, for example:
I want to calculate the difference between 2001-01-01 and 2000-01-01 and write the value into a new column in the same row as my 2001-01-01 date.
I want to do this for all my observations; for the ones that have no value in the previous year to compare against, just return NA.
Thank you for your time and help :)
If there are no gaps in your data, you could use the lag function:
library(dplyr)
df <- data.frame(Date = as.Date(sapply(2000:2005, function(x) paste(x, 1:12, 1, sep = "-"))),
                 Value = runif(72, 0, 1))
df$Difference <- df$Value - lag(df$Value, 12)
> df[1:24,]
Date Value Difference
1 2000-01-01 0.83038968 NA
2 2000-02-01 0.85557483 NA
3 2000-03-01 0.41463862 NA
4 2000-04-01 0.16500688 NA
5 2000-05-01 0.89260904 NA
6 2000-06-01 0.21735933 NA
7 2000-07-01 0.96691686 NA
8 2000-08-01 0.99877057 NA
9 2000-09-01 0.96518311 NA
10 2000-10-01 0.68122410 NA
11 2000-11-01 0.85688662 NA
12 2000-12-01 0.97282720 NA
13 2001-01-01 0.83614146 0.005751778
14 2001-02-01 0.07967273 -0.775902097
15 2001-03-01 0.44373647 0.029097852
16 2001-04-01 0.35088593 0.185879052
17 2001-05-01 0.46240321 -0.430205836
18 2001-06-01 0.73177425 0.514414912
19 2001-07-01 0.52017554 -0.446741315
20 2001-08-01 0.52986486 -0.468905713
21 2001-09-01 0.14921003 -0.815973080
22 2001-10-01 0.25427134 -0.426952761
23 2001-11-01 0.36032777 -0.496558857
24 2001-12-01 0.20862578 -0.764201423
I think you should also try the lubridate package; it is very useful for working with dates.
https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html