What's the most pandas-appropriate way of achieving this? I want to create a column of datetime objects from the 'year', 'month' and 'day' columns, but all I came up with is this code, which looks far too cumbersome:
import datetime as dt
import pandas as pd

myList = []
for _, row in df_orders.iterrows():  # df_orders is the dataframe
    # year, month and day are the 0th, 1st and 2nd columns
    myList.append(dt.datetime(row.iloc[0], row.iloc[1], row.iloc[2]))
mySeries = pd.Series(myList, index=df_orders.index)
df_orders['myDateFormat'] = mySeries
Thanks a lot for any help.
Try this:
In [1]: df = pd.DataFrame(dict(yyyy=[2000, 2000, 2000, 2000],
mm=[1, 2, 3, 4], day=[1, 1, 1, 1]))
Combine the three columns into a single integer of the form YYYYMMDD:
In [2]: df['date'] = df['yyyy'] * 10000 + df['mm'] * 100 + df['day']
Convert that integer to a string and then to a datetime (passing the integer directly would not work, because pd.to_datetime interprets a bare integer as nanoseconds since the Unix epoch):
In [3]: df['date'] = pd.to_datetime(df['date'].apply(str))
In [4]: df
Out[4]:
day mm yyyy date
0 1 1 2000 2000-01-01 00:00:00
1 1 2 2000 2000-02-01 00:00:00
2 1 3 2000 2000-03-01 00:00:00
3 1 4 2000 2000-04-01 00:00:00
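As an alternative to the integer/string round-trip, `pd.to_datetime` also accepts a DataFrame whose columns are named after date components (`year`, `month`, `day`) and assembles them directly. A minimal sketch, reusing the `yyyy`/`mm`/`day` columns from the example above and assuming they can be renamed for the call:

```python
import pandas as pd

df = pd.DataFrame(dict(yyyy=[2000, 2000, 2000, 2000],
                       mm=[1, 2, 3, 4], day=[1, 1, 1, 1]))

# to_datetime assembles a datetime64 column from a DataFrame whose
# columns are named year/month/day, so rename the two that differ
parts = df[['yyyy', 'mm', 'day']].rename(columns={'yyyy': 'year', 'mm': 'month'})
df['date'] = pd.to_datetime(parts)
print(df)
```

For df_orders in the question, whose first three columns hold year, month and day, the same idea would be `pd.to_datetime(df_orders.iloc[:, :3].set_axis(['year', 'month', 'day'], axis=1))`, assuming those columns are integers.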
The Date column of a DataFrame contains three distinct dates, sorted in ascending order. How can I write 'Short' for the nearest date, 'Mid' for the next date and 'Long' for the farthest date in a new column next to the Date column? That is, 2021-04-23 = Short, 2021-05-11 = Mid and 2021-10-08 = Long.
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers"],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50],
"No_Of_Units":[5,5, 10, 20, 20, 8],
"Available_Quantity":[5,6,10,1,3,2],
"Date":['11-05-2021', '23-04-2021', '08-10-2021','23-04-2021', '08-10-2021','11-05-2021']
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'], format = '%d-%m-%Y')
df = df.sort_values(by='Date')
Convert with to_datetime, rank the dates with method='dense' (equal dates share a rank and the ranks have no gaps), then map the ranks to the labels in the desired order:
df['New'] = (pd.to_datetime(df['Date']).rank(method='dense')
.map(dict(enumerate(['Short', 'Mid', 'Long'], start=1)))
)
Output:
product_name Unit_Price No_Of_Units Available_Quantity Date New
1 Mouse 200.000 5 6 2021-04-23 Short
3 CPU 10000.550 20 1 2021-04-23 Short
0 Keyboard 500.000 5 5 2021-05-11 Mid
5 Speakers 250.500 8 2 2021-05-11 Mid
2 Monitor 5000.235 10 10 2021-10-08 Long
4 CPU 10000.550 20 3 2021-10-08 Long
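The same rank-then-map approach as a self-contained script, trimmed to two columns for brevity. Here the float ranks are cast to `int` so they match the dictionary keys exactly; the hard-coded three labels assume exactly three distinct dates, and any further date would get NaN:

```python
import pandas as pd

data = {"product_name": ["Keyboard", "Mouse", "Monitor", "CPU", "CPU", "Speakers"],
        "Date": ['11-05-2021', '23-04-2021', '08-10-2021',
                 '23-04-2021', '08-10-2021', '11-05-2021']}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

# dense ranking: equal dates share a rank and ranks have no gaps (1, 2, 3)
ranks = df['Date'].rank(method='dense').astype(int)

# map rank -> label; a fourth distinct date would map to NaN
labels = {1: 'Short', 2: 'Mid', 3: 'Long'}
df['New'] = ranks.map(labels)

# sort for display only; the labels do not depend on row order
df = df.sort_values('Date')
print(df)
```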
I have a column with dates in string format '2017-01-01'. Is there a way to extract day and month from it using pandas?
I have converted the column to the datetime dtype but haven't figured out the latter part:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df.dtypes:
Date datetime64[ns]
print(df)
Date
0 2017-05-11
1 2017-05-12
2 2017-05-13
Use the Series.dt accessor, with dt.day and dt.month:
df = pd.DataFrame({'date':pd.date_range(start='2017-01-01',periods=5)})
df.date.dt.month
Out[164]:
0 1
1 1
2 1
3 1
4 1
Name: date, dtype: int64
df.date.dt.day
Out[165]:
0 1
1 2
2 3
3 4
4 5
Name: date, dtype: int64
You can also use dt.strftime, which returns zero-padded strings instead of integers:
df.date.dt.strftime('%m')
Out[166]:
0 01
1 01
2 01
3 01
4 01
Name: date, dtype: object
A simple way to get month and day together as one string:
df['MM-DD'] = df['date'].dt.strftime('%m-%d')
Use dt to get the datetime attributes of the column.
In [60]: df = pd.DataFrame({'date': [datetime.datetime(2018,1,1),datetime.datetime(2018,1,2),datetime.datetime(2018,1,3),]})
In [61]: df
Out[61]:
date
0 2018-01-01
1 2018-01-02
2 2018-01-03
In [63]: df['day'] = df.date.dt.day
In [64]: df['month'] = df.date.dt.month
In [65]: df
Out[65]:
date day month
0 2018-01-01 1 1
1 2018-01-02 2 1
2 2018-01-03 3 1
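Putting the pieces together for the data in the question, starting from the date strings rather than an existing datetime column. This is a sketch: the column name `Date` and the three sample dates are taken from the question's printout:

```python
import pandas as pd

# the dates from the question, still as strings
df = pd.DataFrame({'Date': ['2017-05-11', '2017-05-12', '2017-05-13']})
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

# integer components via the .dt accessor
df['day'] = df['Date'].dt.day
df['month'] = df['Date'].dt.month

# zero-padded 'MM-DD' strings, if text is preferred over integers
df['MM-DD'] = df['Date'].dt.strftime('%m-%d')
print(df)
```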
Timing the methods provided:
Using apply:
In [217]: %timeit(df['date'].apply(lambda d: d.day))
The slowest run took 33.66 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 210 µs per loop
Using dt.day:
In [218]: %timeit(df.date.dt.day)
10000 loops, best of 3: 127 µs per loop
Using dt.strftime:
In [219]: %timeit(df.date.dt.strftime('%d'))
The slowest run took 40.92 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 284 µs per loop
In this run, dt.day is the fastest of the three.
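Timings on a tiny frame are dominated by per-call overhead, so a comparison on more rows may be more telling. A minimal sketch using the standard `timeit` module on a hypothetical 10,000-row frame; the absolute numbers will vary by machine and pandas version, and the script also checks that the three methods agree:

```python
import timeit
import pandas as pd

# a larger frame than the three-row example, so per-call overhead matters less
df = pd.DataFrame({'date': pd.date_range(start='2018-01-01', periods=10_000, freq='D')})

methods = {
    'apply': lambda: df['date'].apply(lambda d: d.day),
    'dt.day': lambda: df['date'].dt.day,
    'dt.strftime': lambda: df['date'].dt.strftime('%d').astype(int),
}

for name, fn in methods.items():
    seconds = timeit.timeit(fn, number=20) / 20
    print(f'{name:12s} {seconds * 1e3:.3f} ms per call')

# all three give the same day numbers (strftime only after casting back to int)
results = {name: fn().tolist() for name, fn in methods.items()}
```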
This should do it:
df['day'] = df['Date'].apply(lambda r:r.day)
df['month'] = df['Date'].apply(lambda r:r.month)
I have a DataFrame with two columns, Date and value.
I want to add a new column named week_number that counts how many weeks back each date is from a given date.
import pandas as pd
df = pd.DataFrame(columns=['Date','value'])
df['Date'] = [ '04-02-2019','03-02-2019','28-01-2019','20-01-2019']
df['value'] = [10,20,30,40]
df
Date value
0 04-02-2019 10
1 03-02-2019 20
2 28-01-2019 30
3 20-01-2019 40
Suppose the given date is 05-02-2019 (day first, i.e. 5 February 2019).
Then I need to add a week_number column that says how many weeks before the given date each date in the Date column falls.
The output should be
Date value week_number
0 04-02-2019 10 1
1 03-02-2019 20 1
2 28-01-2019 30 2
3 20-01-2019 40 3
How can I do this in pandas?
First convert the column to datetimes with to_datetime and dayfirst=True, then subtract it from the given date with rsub (reverse subtraction), convert the resulting timedeltas to days, floor-divide by 7 and add 1:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['week_number'] = df['Date'].rsub(pd.Timestamp('2019-02-05')).dt.days // 7 + 1
#alternative
#df['week_number'] = (pd.Timestamp('2019-02-05') - df['Date']).dt.days // 7 + 1
print (df)
Date value week_number
0 2019-02-04 10 1
1 2019-02-03 20 1
2 2019-01-28 30 2
3 2019-01-20 40 3
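The same computation as a self-contained script, using an explicit `format='%d-%m-%Y'` instead of `dayfirst=True` so the day-first parsing cannot be guessed differently. With this formula, 0 to 6 days back is week 1, 7 to 13 days back is week 2, and so on, so a date exactly 7 days earlier counts as week 2:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['04-02-2019', '03-02-2019', '28-01-2019', '20-01-2019'],
                   'value': [10, 20, 30, 40]})
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

reference = pd.Timestamp('2019-02-05')
days_back = (reference - df['Date']).dt.days   # 1, 2, 8, 16

# days 0-6 back -> week 1, days 7-13 -> week 2, and so on
df['week_number'] = days_back // 7 + 1
print(df)
```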
I have two DataFrames in pandas. One of them has data every month, the other one has data every year. I need to do some computation where the yearly value is added to the monthly value.
Something like this:
df1, monthly:
2013-01-01 1
2013-02-01 1
...
2014-01-01 1
2014-02-01 1
...
2015-01-01 1
df2, yearly:
2013-01-01 1
2014-01-01 2
2015-01-01 3
And I want to produce something like this:
2013-01-01 (1+1) = 2
2013-02-01 (1+1) = 2
...
2014-01-01 (1+2) = 3
2014-02-01 (1+2) = 3
...
2015-01-01 (1+3) = 4
That is, each monthly value is added to the yearly value for the same year (in the parentheses, the first value is the monthly data and the second is the yearly data).
Assuming the date column of your monthly DataFrame df is called date, you can obtain the year through the dt accessor:
pd.to_datetime(df.date).dt.year
Add a column like that to your monthly DataFrame and call it year.
Do the same for the yearly DataFrame.
Merge the monthly and yearly DataFrames on year, specifying how='left'.
The resulting DataFrame has both value columns; now just add them.
Example
month_df = pd.DataFrame({
'date': ['2013-01-01', '2013-02-01', '2014-02-01'],
'amount': [1, 2, 3]})
year_df = pd.DataFrame({
'date': ['2013-01-01', '2014-02-01', '2015-01-01'],
'amount': [7, 8, 9]})
month_df['year'] = pd.to_datetime(month_df.date).dt.year
year_df['year'] = pd.to_datetime(year_df.date).dt.year
>>> pd.merge(month_df, year_df, on='year', how='left')
amount_x date_x year amount_y date_y
0 1 2013-01-01 2013 7 2013-01-01
1 2 2013-02-01 2013 7 2013-01-01
2 3 2014-02-01 2014 8 2014-02-01
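The example above stops at the merge. A sketch of the final addition step, using the data from the question (monthly values all 1, yearly values 1, 2, 3); the `suffixes` names are chosen here for readability and are not part of the original answer:

```python
import pandas as pd

# the data from the question (monthly values all 1; yearly values 1, 2, 3)
month_df = pd.DataFrame({'date': ['2013-01-01', '2013-02-01', '2014-01-01',
                                  '2014-02-01', '2015-01-01'],
                         'amount': [1, 1, 1, 1, 1]})
year_df = pd.DataFrame({'date': ['2013-01-01', '2014-01-01', '2015-01-01'],
                        'amount': [1, 2, 3]})

month_df['year'] = pd.to_datetime(month_df['date']).dt.year
year_df['year'] = pd.to_datetime(year_df['date']).dt.year

# left merge on year, then add the two amounts
merged = pd.merge(month_df, year_df[['year', 'amount']], on='year',
                  how='left', suffixes=('_month', '_year'))
merged['total'] = merged['amount_month'] + merged['amount_year']
print(merged[['date', 'amount_month', 'amount_year', 'total']])
```

Since each year appears only once in year_df, `month_df['year'].map(year_df.set_index('year')['amount'])` would perform the same lookup without a merge.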
I have a pandas dataframe like the following
Year Month Day Securtiy Trade Value NewDate
2011 1 10 AAPL Buy 1500 0
How can I combine the Year, Month and Day columns into the NewDate column, so that NewDate looks like the following?
2011-1-10
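The best way is to parse the dates while reading the CSV (note that combining several columns through a nested list in parse_dates is deprecated as of pandas 2.2):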
In [1]: df = pd.read_csv('foo.csv', sep=r'\s+', parse_dates=[['Year', 'Month', 'Day']])
In [2]: df
Out[2]:
Year_Month_Day Securtiy Trade Value NewDate
0 2011-01-10 00:00:00 AAPL Buy 1500 0
If the file has no header row, you can supply the column names while reading:
pd.read_csv(input_file, header=None, names=['Year', 'Month', 'Day', 'Security', 'Trade', 'Value'], parse_dates=[['Year', 'Month', 'Day']])
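A sketch that avoids combining columns in `parse_dates` altogether: read the file as-is, then assemble the date with `pd.to_datetime`, which accepts columns named year/month/day. The in-memory `io.StringIO` text is a hypothetical stand-in for foo.csv, with the column layout from the question:

```python
import io
import pandas as pd

# hypothetical stand-in for foo.csv: whitespace-separated, with a header row
raw = io.StringIO(
    "Year Month Day Security Trade Value NewDate\n"
    "2011 1 10 AAPL Buy 1500 0\n"
)
df = pd.read_csv(raw, sep=r'\s+')

# assemble a datetime column from the three integer columns;
# lower-casing the names gives the year/month/day that to_datetime expects
parts = df[['Year', 'Month', 'Day']].rename(columns=str.lower)
df['NewDate'] = pd.to_datetime(parts)
print(df)
```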
If it's already in your DataFrame, you could use an apply:
In [11]: df['Date'] = df.apply(lambda s: pd.Timestamp('%s-%s-%s' % (s['Year'], s['Month'], s['Day'])), axis=1)
In [12]: df
Out[12]:
Year Month Day Securtiy Trade Value NewDate Date
0 2011 1 10 AAPL Buy 1500 0 2011-01-10 00:00:00
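Or build the string directly. The columns hold integers, so convert them to strings first; this gives the unpadded form 2011-1-10:
df['NewDate'] = df['Year'].astype(str) + '-' + df['Month'].astype(str) + '-' + df['Day'].astype(str)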
You can create a new Timestamp as follows:
df['newDate'] = df.apply(lambda x: pd.Timestamp('{0}-{1}-{2}'.format(x.Year, x.Month, x.Day)),
                         axis=1)
>>> df
Year Month Day Securtiy Trade Value NewDate newDate
0 2011 1 10 AAPL Buy 1500 0 2011-01-10