Derive a pandas datetime from a mixed-length integer format

I want to derive a DateTime column from an integer column of mixed lengths in a pandas DataFrame. As you can see in the input column below, the integers have different numbers of digits. I want to map them like this:
180000 = 18:00:00
60000 = 06:00:00
0 = 00:00:00
id | value
13 | 180000
14 | 0
15 | 60000
16 | 100000
17 | 0
18 | 60000
Thanks,
Pedram.

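Use to_datetime. The 0 values are first replaced with '000000', because a bare '0' does not match the %H%M%S format:
import pandas as pd
df['Time'] = pd.to_datetime(df['value'].replace(0, '0'*6), format='%H%M%S', errors='coerce').dt.time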
Result:
id value Time
0 13 180000 18:00:00
1 14 0 00:00:00
2 15 60000 06:00:00
3 16 100000 10:00:00
4 17 0 00:00:00
5 18 60000 06:00:00
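A variant that pads every value instead of special-casing 0 is to left-pad the string form to six digits before parsing, so shorter values such as 500 (00:05:00) are also read unambiguously. A minimal sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'id': [13, 14, 15, 16, 17, 18],
                   'value': [180000, 0, 60000, 100000, 0, 60000]})

# Left-pad every integer to six digits: 0 -> '000000', 60000 -> '060000'
hhmmss = df['value'].astype(str).str.zfill(6)

# Parse as HHMMSS and keep only the time-of-day part
df['Time'] = pd.to_datetime(hhmmss, format='%H%M%S').dt.time
print(df)
```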


How to merge two DataFrames based on dates where the date difference is one day?

Input
df1
id A
2020-01-01 10
2020-02-07 20
2020-04-09 30
df2
id B
2019-12-31 50
2020-02-06 20
2020-02-07 70
2020-04-08 34
2020-04-09 44
Goal
df
id A B
2020-01-01 10 50
2020-02-07 20 20
2020-04-09 30 34
The details are as follows:
df1 is merged with df2 on id, adding the columns from df2.
The type of id is datetime.
Merge rule: each df1 row is matched with the df2 row from the previous day (yesterday).
Could you simply add 1 day to df2's ID column before merging?
df1.merge(df2.assign(id=df2['id'] + pd.Timedelta(days=1)), on='id')
id A B
0 2020-01-01 10 50
1 2020-02-07 20 20
2 2020-04-09 30 34
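A self-contained version of the shift-then-merge idea, rebuilding df1 and df2 from the question. Using how='left' is a choice of this sketch (not of the answer above): it keeps df1 rows that have no previous-day match, with NaN in B, instead of dropping them.

```python
import pandas as pd

df1 = pd.DataFrame({'id': pd.to_datetime(['2020-01-01', '2020-02-07', '2020-04-09']),
                    'A': [10, 20, 30]})
df2 = pd.DataFrame({'id': pd.to_datetime(['2019-12-31', '2020-02-06', '2020-02-07',
                                          '2020-04-08', '2020-04-09']),
                    'B': [50, 20, 70, 34, 44]})

# Move df2's dates forward one day so df2's "yesterday" lines up with df1's "today"
shifted = df2.assign(id=df2['id'] + pd.Timedelta(days=1))

# Exact-key merge on the shifted dates
df = df1.merge(shifted, on='id', how='left')
print(df)
```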
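Try pd.merge_asof. With allow_exact_matches=False it matches each df1 row to the latest strictly earlier df2 row, and tolerance limits that to at most one day back: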
df = pd.merge_asof(df1,df2,on='id',tolerance=pd.Timedelta('1 day'),allow_exact_matches=False)
id A B
0 2020-01-01 10 50
1 2020-02-07 20 20
2 2020-04-09 30 34
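A self-contained sketch of the merge_asof approach. merge_asof requires both frames to be sorted by the key, so the sort is made explicit here. The extra df1 row dated 2020-05-01 is hypothetical, added only to show that a row with no df2 date within one day before it gets NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'id': pd.to_datetime(['2020-01-01', '2020-02-07', '2020-04-09', '2020-05-01']),
                    'A': [10, 20, 30, 40]})  # last row is a hypothetical no-match case
df2 = pd.DataFrame({'id': pd.to_datetime(['2019-12-31', '2020-02-06', '2020-02-07',
                                          '2020-04-08', '2020-04-09']),
                    'B': [50, 20, 70, 34, 44]})

# Both inputs must be sorted on the 'on' key
df = pd.merge_asof(df1.sort_values('id'), df2.sort_values('id'), on='id',
                   tolerance=pd.Timedelta('1 day'),  # look back at most one day
                   allow_exact_matches=False)        # ignore same-day rows in df2
print(df)
```

Unlike the exact-key merge on shifted dates, this also works when the timestamps carry a time-of-day component, because it looks for the nearest earlier row rather than an identical key.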

Transform data by time & Class

I have a DataFrame nf as follows:
DateTime Class Count
0 2017-10-01 00:00:00 1 0
1 2017-10-01 00:00:00 2 240
2 2017-10-01 00:00:00 3 17
3 2017-10-01 00:00:00 4 0
4 2017-10-01 00:00:00 5 1
5 2017-10-01 00:00:00 6 0
6 2017-10-01 00:00:00 7 0
7 2017-10-01 00:00:00 8 0
8 2017-10-01 00:00:00 9 0
9 2017-10-01 00:00:00 10 0
10 2017-10-01 00:00:00 11 0
11 2017-10-01 00:00:00 12 0
12 2017-10-01 00:00:00 13 0
13 2017-10-01 00:00:00 14 0
14 2017-10-01 00:00:00 15 0
..............................
30 2017-10-01 01:00:00 1 0
31 2017-10-01 01:00:00 2 209
32 2017-10-01 01:00:00 3 14
33 2017-10-01 01:00:00 4 0
34 2017-10-01 01:00:00 5 4
35 2017-10-01 01:00:00 6 0
36 2017-10-01 01:00:00 7 0
37 2017-10-01 01:00:00 8 0
38 2017-10-01 01:00:00 9 0
39 2017-10-01 01:00:00 10 0
40 2017-10-01 01:00:00 11 0
41 2017-10-01 01:00:00 12 0
42 2017-10-01 01:00:00 13 0
43 2017-10-01 01:00:00 14 0
44 2017-10-01 01:00:00 15 0
....... and so on
There are 15 classes in total, with a count for each class for each hour.
I want to reshape the data so that each hour is one row and the counts for each class are in separate columns, as follows:
Required output:
DateTime Class1 Class2 Class3 Class4.........Class15
2017-10-01 00:00:00 0 240 17 0 ......... 0
2017-10-01 01:00:00 0 209 14 0 ......... 0
....
and so on
You can use pandas to read the data into a pd.DataFrame(), select the counts for each class by slicing the DataFrame with conditions, and then combine the data using the DateTime as the index:
import pandas as pd
# create dataframe from file ('fname' is a placeholder path)
df = pd.read_csv('fname')
# or from a numpy array:
# df = pd.DataFrame(data=np_array, columns=['DateTime', 'Class', 'Count'])
# select the counts for each class, indexed by DateTime so the rows line up
df_c1 = df[df.Class == 1].set_index('DateTime')
df_c2 = df[df.Class == 2].set_index('DateTime')
df_c3 = df[df.Class == 3].set_index('DateTime')
df_c4 = df[df.Class == 4].set_index('DateTime')
# assignments align on the shared DateTime index
df_new = pd.DataFrame(index=df_c1.index)
df_new['Class1'] = df_c1['Count']
df_new['Class2'] = df_c2['Count']
df_new['Class3'] = df_c3['Count']
df_new['Class4'] = df_c4['Count']
The code example is really dirty and I'm probably missing a lot, but maybe it gives you some inspiration. I would also recommend checking the pandas documentation for concat() and DataFrame().
I'm going to review and refactor my example code tomorrow, in case the problem has not been solved by then. Meanwhile, you could fix the layout of the data in your question; it's not readable.
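The slicing idea above can be generalized to all classes with a loop and pd.concat, which the answer recommends looking into. A sketch on a small hypothetical sample (3 classes, 2 hours) in the question's long format:

```python
import pandas as pd

# Hypothetical sample in the question's long format
df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2017-10-01 00:00:00'] * 3 + ['2017-10-01 01:00:00'] * 3),
    'Class': [1, 2, 3] * 2,
    'Count': [0, 240, 17, 0, 209, 14],
})

# One Series of counts per class, indexed by DateTime so pd.concat aligns the rows
per_class = [
    g.set_index('DateTime')['Count'].rename(f'Class{c}')
    for c, g in df.groupby('Class')
]
df_new = pd.concat(per_class, axis=1)
print(df_new)
```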
Try pivot_table:
(df.pivot_table(index='DateTime', columns='Class',
                values='Count',
                aggfunc='sum')
   .add_prefix('Class'))
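A runnable version of the pivot_table approach on a small hypothetical sample (3 of the 15 classes, 2 hours). With aggfunc='sum', any duplicate (hour, class) rows would be added together rather than raising an error, which is one reason to prefer pivot_table over pivot here.

```python
import pandas as pd

# Hypothetical sample in the question's long format
df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2017-10-01 00:00:00'] * 3 + ['2017-10-01 01:00:00'] * 3),
    'Class': [1, 2, 3] * 2,
    'Count': [0, 240, 17, 0, 209, 14],
})

# One row per hour, one column per class
wide = (df.pivot_table(index='DateTime', columns='Class', values='Count', aggfunc='sum')
          .add_prefix('Class'))
wide.columns.name = None  # drop the leftover 'Class' axis label
print(wide)
```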

Subtract day column from date column in pandas data frame

I have two columns in my data frame. One column is a date (df["Start_date"]) and the other is a number of days (df["days"]). I want to subtract the number of days from the date column.
I tried something like this:
df["new_date"]=df["Start_date"]-datetime.timedelta(days=df["days"])
I think you need to_timedelta:
df["new_date"]=df["Start_date"]-pd.to_timedelta(df["days"], unit='D')
Sample:
import numpy as np
import pandas as pd
np.random.seed(120)
start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=10)
df = pd.DataFrame({'Start_date': rng, 'days': np.random.choice(np.arange(10), size=10)})
print (df)
Start_date days
0 2015-02-24 7
1 2015-02-25 0
2 2015-02-26 8
3 2015-02-27 4
4 2015-02-28 1
5 2015-03-01 7
6 2015-03-02 1
7 2015-03-03 3
8 2015-03-04 8
9 2015-03-05 9
df["new_date"]=df["Start_date"]-pd.to_timedelta(df["days"], unit='D')
print (df)
Start_date days new_date
0 2015-02-24 7 2015-02-17
1 2015-02-25 0 2015-02-25
2 2015-02-26 8 2015-02-18
3 2015-02-27 4 2015-02-23
4 2015-02-28 1 2015-02-27
5 2015-03-01 7 2015-02-22
6 2015-03-02 1 2015-03-01
7 2015-03-03 3 2015-02-28
8 2015-03-04 8 2015-02-24
9 2015-03-05 9 2015-02-24
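The original attempt fails because datetime.timedelta only accepts plain numbers, not a whole column; pd.to_timedelta converts the column element-wise instead. A small deterministic sketch (two rows taken from the sample above) showing both:

```python
import datetime
import pandas as pd

df = pd.DataFrame({'Start_date': pd.to_datetime(['2015-02-24', '2015-03-01']),
                   'days': [7, 7]})

# datetime.timedelta needs a scalar, so passing the Series raises TypeError
try:
    datetime.timedelta(days=df['days'])
except TypeError as exc:
    print('TypeError:', exc)

# Vectorized: convert the column to timedeltas, then subtract row by row
df['new_date'] = df['Start_date'] - pd.to_timedelta(df['days'], unit='D')
print(df)
```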

Setting the day in a pandas DataFrame column from a string list containing only the hours

I wonder if anyone could please help me with this issue: I have a pandas data frame (generated from a text file) which should have a structure similar to this one:
import pandas as pd
data = {'Objtype' : ['bias', 'bias', 'flat', 'flat', 'StdStar', 'flat', 'Arc', 'Target1', 'Arc', 'Flat', 'Flat', 'Flat', 'bias', 'bias'],
'UT' : pd.date_range("23:00", periods=14, freq="5min").strftime("%H:%M"),  # '23:00' ... '00:05' as strings
'Position' : ['P0', 'P0', 'P0', 'P0', 'P1', 'P1','P1', 'P2','P2','P2', 'P0', 'P0', 'P0', 'P0']}
df = pd.DataFrame(data=data)
I would like to do some operations that take the time of the observation into account, so I convert the UT column from strings to numpy datetime64:
df['UT'] = pd.to_datetime(df['UT'])
Which gives me something like this:
Objtype Position UT
0 bias P0 2016-08-31 23:45:00
1 bias P0 2016-08-31 23:50:00
2 flat P0 2016-08-31 23:55:00
3 flat P0 2016-08-31 00:00:00
4 StdStar P1 2016-08-31 00:05:00
5 flat P1 2016-08-31 00:10:00
6 Arc P1 2016-08-31 00:15:00
7 Target1 P1 2016-08-31 00:20:00
However, there are two issues here:
First, the year/month/day is set to the current date.
Second, the day does not roll over from 23:59 to 00:00; instead, the time goes backwards.
If we know the true date of the first row, and we know that all the entries are sequential (they always go from sunset to sunrise), how can we correct these issues?
To find the time delta between 2 rows:
df.UT - df.UT.shift()
Out[48]:
0 NaT
1 00:05:00
2 00:05:00
3 -1 days +00:05:00
4 00:05:00
5 00:05:00
6 00:05:00
7 00:05:00
Name: UT, dtype: timedelta64[ns]
To find when time goes backwards:
df.UT - df.UT.shift() < pd.Timedelta(0)
Out[49]:
0 False
1 False
2 False
3 True
4 False
5 False
6 False
7 False
Name: UT, dtype: bool
To add one day for each row where the time goes backwards:
((df.UT - df.UT.shift() < pd.Timedelta(0))*pd.Timedelta(1, 'D'))
Out[50]:
0 0 days
1 0 days
2 0 days
3 1 days
4 0 days
5 0 days
6 0 days
7 0 days
Name: UT, dtype: timedelta64[ns]
To carry the additional days forward down the series, use the cumsum pattern:
((df.UT - df.UT.shift() < pd.Timedelta(0))*pd.Timedelta(1, 'D')).cumsum()
Out[53]:
0 0 days
1 0 days
2 0 days
3 1 days
4 1 days
5 1 days
6 1 days
7 1 days
Name: UT, dtype: timedelta64[ns]
Add this correction vector back to your original UT column:
df.UT + ((df.UT - df.UT.shift() < pd.Timedelta(0))*pd.Timedelta(1, 'D')).cumsum()
Out[51]:
0 2016-08-31 23:45:00
1 2016-08-31 23:50:00
2 2016-08-31 23:55:00
3 2016-09-01 00:00:00
4 2016-09-01 00:05:00
5 2016-09-01 00:10:00
6 2016-09-01 00:15:00
7 2016-09-01 00:20:00
Name: UT, dtype: datetime64[ns]
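The steps above fix the rollover (the second issue). The first issue can be handled by building the timestamps from the known date of the first row instead of letting to_datetime pick one. A sketch, assuming UT holds 'HH:MM' strings and that the first row's date is known (2016-08-31 is used here only to match the output above); Series.diff() is the same as df.UT - df.UT.shift():

```python
import pandas as pd

# Hypothetical sample: 'HH:MM' strings that cross midnight
df = pd.DataFrame({'UT': ['23:45', '23:50', '23:55', '00:00', '00:05', '00:10']})

# Assumption: the true date of the first row is known
first_date = pd.Timestamp('2016-08-31')

# Time of day as a timedelta since midnight ('23:45' -> 23:45:00)
time_of_day = pd.to_timedelta(df['UT'] + ':00')

# Put every row on the known date, then add one day per backwards step (cumsum pattern)
ut = first_date + time_of_day
rollover = (ut.diff() < pd.Timedelta(0)).cumsum() * pd.Timedelta(days=1)
df['UT'] = ut + rollover
print(df)
```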

Convert hourly time periods into 15-minute time periods

I have a DataFrame like this:
df = pd.read_csv("fileA.csv", dtype=str, delimiter=";", skiprows = None, parse_dates=['Date'])
Date Buy Sell
0 01.08.2009 01:00 15 25
1 01.08.2009 02:00 0 30
2 01.08.2009 03:00 10 18
But I need that one (in 15-min-periods):
Date Buy Sell
0 01.08.2009 01:00 15 25
1 01.08.2009 01:15 15 25
2 01.08.2009 01:30 15 25
3 01.08.2009 01:45 15 25
4 01.08.2009 02:00 0 30
5 01.08.2009 02:15 0 30
6 01.08.2009 02:30 0 30
7 01.08.2009 02:45 0 30
8 01.08.2009 03:00 10 18
....and so on.
I have tried df.resample(), but it did not work. Does anyone know a good pandas method?
If fileA.csv looks like this:
Date;Buy;Sell
01.08.2009 01:00;15;25
01.08.2009 02:00;0;30
01.08.2009 03:00;10;18
then you could parse the data with
df = pd.read_csv("fileA.csv", delimiter=";", parse_dates=['Date'])
so that df will look like this:
In [41]: df
Out[41]:
Date Buy Sell
0 2009-01-08 01:00:00 15 25
1 2009-01-08 02:00:00 0 30
2 2009-01-08 03:00:00 10 18
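Note that the parsed dates above are 8 January 2009, i.e. "01.08.2009" was read month-first. If the file uses day-first dates (likely with a ';' delimiter, but that is an assumption about the data), an explicit format avoids the ambiguity. A sketch reading the sample lines from a string:

```python
import io
import pandas as pd

# The sample lines from fileA.csv, inlined for the example
csv_text = """Date;Buy;Sell
01.08.2009 01:00;15;25
01.08.2009 02:00;0;30
01.08.2009 03:00;10;18
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=";")
# Day.Month.Year, so "01.08.2009" becomes 1 August 2009
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y %H:%M')
print(df)
```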
You might want to check df.info() to make sure you successfully parsed your data into a DataFrame with three columns, and that the Date column has dtype datetime64[ns]. Since the repr(df) you posted prints the date in a different format and the column headers do not align with the data, there is a good chance that the data has not been parsed properly yet. If so, post some sample lines from the CSV and we should be able to help you parse the data into a DataFrame.
In [51]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 3 columns):
Date 3 non-null datetime64[ns]
Buy 3 non-null int64
Sell 3 non-null int64
dtypes: datetime64[ns](1), int64(2)
memory usage: 96.0 bytes
Once the DataFrame is parsed correctly, resampling to 15-minute periods can be done with asfreq, forward-filling the missing values:
In [50]: df.set_index('Date').asfreq('15min', method='ffill')
Out[50]:
Buy Sell
2009-01-08 01:00:00 15 25
2009-01-08 01:15:00 15 25
2009-01-08 01:30:00 15 25
2009-01-08 01:45:00 15 25
2009-01-08 02:00:00 0 30
2009-01-08 02:15:00 0 30
2009-01-08 02:30:00 0 30
2009-01-08 02:45:00 0 30
2009-01-08 03:00:00 10 18
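Since the question mentions trying df.resample(): upsampling with resample also works once Date is a datetime index, as long as it is followed by a fill method. A sketch on the sample values (dated 1 August 2009 here, assuming day-first dates):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2009-08-01 01:00', '2009-08-01 02:00', '2009-08-01 03:00']),
    'Buy': [15, 0, 10],
    'Sell': [25, 30, 18],
})

# Upsample to 15-minute bins; each new bin takes the last known hourly value
out = df.set_index('Date').resample('15min').ffill()
print(out)
```

Without a fill method, resample only creates the grouping; that is probably why calling df.resample() on its own did not seem to work.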