dataframe python converting to weekday from year, month, day - pandas

I am trying to add a new DataFrame column by manipulating other columns.
import pandas as pd
import numpy as np
from pandas import DataFrame, read_csv
import datetime
df = pd.read_csv('PRSA_data_2010.1.1-2014.12.31.csv')
df.head()
When I try to compute the weekday with
df['weekday']= np.int(datetime.datetime(df.year, df.month, df.day).weekday())
I keep getting the error "cannot convert the series to class 'int'".
Can anyone tell me the reason behind this and how I can fix it?
Thanks in advance!

Convert columns to datetimes and then to weekdays by Series.dt.weekday:
df['weekday'] = pd.to_datetime(df[['year', 'month', 'day']]).dt.weekday
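Or build the datetime column directly in read_csv (note that date_parser and dict-valued parse_dates are deprecated in recent pandas releases, so this form applies to older versions):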
df = pd.read_csv('PRSA_data_2010.1.1-2014.12.31.csv',
date_parser=lambda y,m,d: y + '-' + m + '-' + d,
parse_dates={'datetimes':['year','month','day']})
df['weekday'] = df['datetimes'].dt.weekday
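The error in the question arises because datetime.datetime expects scalar integers, while df.year, df.month and df.day are whole Series. The following is a minimal sketch of the first approach on a small in-memory frame; the three sample rows are made up for illustration and stand in for the CSV file, which is not available here.

```python
import pandas as pd

# Made-up rows standing in for the year/month/day columns of the CSV.
df = pd.DataFrame({
    "year": [2010, 2010, 2014],
    "month": [1, 1, 12],
    "day": [1, 2, 31],
})

# pd.to_datetime assembles one datetime per row from columns
# named year, month and day.
dates = pd.to_datetime(df[["year", "month", "day"]])

# Monday is 0 and Sunday is 6.
df["weekday"] = dates.dt.weekday
df["weekday_name"] = dates.dt.day_name()

print(df)
```

The extra weekday_name column is only there to make the numeric codes easier to check by eye.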

Related

How to subtract sales for month 1 and month 2 for every customer in my dataframe using pandas?

This is my data frame:
c = pd.DataFrame({"Product":["p1","p1","p2","p2","p3","p3","p4","p4"],
"sales":[10000,20000,30000,40000,10000,24000,13000,20000],
"Month":["M1","M2","M1","M2","M1","M2","M1","M2"]})
The answer should be another dataframe.
I tried using boolean masking, but I am not sure how to work with both columns.
Is this what you are looking for?
import pandas as pd
import numpy as np
c = pd.DataFrame({"Product":["p1","p1","p2","p2","p3","p3","p4","p4"],
"sales":[10000,20000,30000,40000,10000,24000,13000,20000],
"Month":["M1","M2","M1","M2","M1","M2","M1","M2"]})
c['sales'] = np.where(c['Month'] == "M2", c['sales'] * -1, c['sales'])
c.groupby('Product').sum()
This gives M1 minus M2 per product, and it works only when the Month column contains nothing but 'M1' and 'M2'.
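An alternative that does not rely on flipping signs is to pivot the months into columns and subtract them. This is only a sketch: it assumes each product has exactly one row per month (pivot raises an error on duplicates), and the column name M1_minus_M2 is chosen here for illustration.

```python
import pandas as pd

c = pd.DataFrame({
    "Product": ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"],
    "sales": [10000, 20000, 30000, 40000, 10000, 24000, 13000, 20000],
    "Month": ["M1", "M2", "M1", "M2", "M1", "M2", "M1", "M2"],
})

# One row per product, one column per month.
wide = c.pivot(index="Product", columns="Month", values="sales")

# Subtract the two month columns; the original frame is left untouched.
result = (wide["M1"] - wide["M2"]).rename("M1_minus_M2").reset_index()

print(result)
```

Unlike the np.where version, this does not overwrite the sales column, and other months in the data would simply become additional columns.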

Xarray datetime to ordinal

In pandas there is a toordinal function to convert a datetime to an ordinal, as discussed in "Convert date to ordinal python?" and "Pandas datetime column to ordinal". I have an xarray DataArray with a time coordinate that I want to convert to ordinals. Is there an xarray equivalent of pandas' toordinal?
Sample:
Coordinates:
time
array(['2019-07-31T10:00:00.000000000', '2019-07-31T10:15:00.000000000',
'2019-07-31T10:30:00.000000000', '2019-07-31T10:45:00.000000000',
'2019-07-31T11:00:00.000000000', '2019-07-31T11:15:00.000000000',
'2019-07-31T11:30:00.000000000', '2019-07-31T11:45:00.000000000',
'2019-07-31T12:00:00.000000000'], dtype='datetime64[ns]')
I didn't find an xarray-native way to do it.
However, you can work around it by converting the time values to datetime objects, on which you can then call toordinal:
import pandas as pd
import xarray as xr
ds = xr.tutorial.open_dataset("air_temperature")
time_ordinal = [pd.to_datetime(x).toordinal() for x in ds.time.values]
print(time_ordinal[:5])
# [734869, 734869, 734869, 734869, 734870]
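The same idea can be sketched without downloading the tutorial dataset by working on a pandas DatetimeIndex, which is what I would expect something like da["time"].to_index() to return for a datetime64 coordinate (treat that call as an assumption to verify against your xarray version). The timestamps below are taken from the sample in the question.

```python
from datetime import date

import pandas as pd

# A few of the time values from the question's sample coordinate.
times = pd.DatetimeIndex([
    "2019-07-31T10:00:00",
    "2019-07-31T10:15:00",
    "2019-07-31T12:00:00",
])

# Element-wise: each item of a DatetimeIndex is a pd.Timestamp,
# which inherits toordinal() from datetime.
ordinals = [ts.toordinal() for ts in times]

# Vectorized: whole days since 1970-01-01 plus the ordinal of that date.
epoch_ordinal = date(1970, 1, 1).toordinal()
vectorized = (times.normalize() - pd.Timestamp("1970-01-01")).days + epoch_ordinal

print(ordinals)
print(list(vectorized))
```

Both variants drop the time of day, as toordinal itself does; the vectorized form avoids the Python-level loop for long coordinates.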

How can I get an interpolated value from a Pandas data frame?

I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
# 59.6
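If you prefer to stay within pandas, one possible sketch is to add the requested angle to the index and let interpolate(method="index") fill the gap linearly. The helper name interpolate_rff is hypothetical and introduced only for this example; like np.interp, it assumes the Angle values are sorted in increasing order.

```python
import numpy as np
import pandas as pd

data = [[1.0, 45.0], [2, 56], [3, 58], [4, 62], [5, 70]]  # sample data
s = pd.DataFrame(data, columns=["Angle", "rff"]).set_index("Angle")


def interpolate_rff(frame, angle):
    """Linearly interpolate 'rff' at an angle between two index values."""
    series = frame["rff"]
    # Insert the requested angle into the index; its value starts as NaN.
    extended = series.reindex(series.index.union([angle]))
    # method="index" weights the interpolation by the index values.
    return extended.interpolate(method="index").loc[angle]


value = interpolate_rff(s, 3.4)
print(value)
print(np.interp(3.4, s.index, s["rff"]))  # same result via numpy
```

For a single lookup np.interp is shorter; the pandas version is mainly useful when you want the interpolated row to remain part of a Series.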

How do you append a column and drop a column with pandas dataframes? Can't figure out why it won't print the dataframe afterwards

The DataFrame that I am working with has a datetime object that I changed to a date object. I attempted to append the date object to be the last column in the DataFrame. I also wanted to drop the datetime object column.
Neither the append nor the drop operation works as expected. Nothing prints out afterwards. It should print the entire DataFrame (output shortened because it is long).
My code:
import pandas as pd
import numpy as np
df7=pd.read_csv('kc_house_data.csv')
print(df7)
mydates = pd.to_datetime(df7['date']).dt.date
print(mydates)
df7.append(mydates)
df7.drop(['date'], axis=1)
print(df7)
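Why drop and append at all? DataFrame.append adds rows rather than columns (and was removed in pandas 2.0), and both append and drop return a new DataFrame instead of modifying df7 unless the result is assigned back. You can simply overwrite the column: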
df7['date'] = pd.to_datetime(df7['date']).dt.date
import pandas as pd
import numpy as np
# read csv, convert column type
df7=pd.read_csv('kc_house_data.csv')
df7['date'] = pd.to_datetime(df7['date']).dt.date
print(df7)
Drop a column using df7.drop('date', axis=1, inplace=True).
Append a column using df7['date'] = mydates.
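A small sketch of why the original code appeared to do nothing, and of moving the converted column to the end. The two-row frame is made up to stand in for kc_house_data.csv, which is not available here.

```python
import pandas as pd

# Made-up stand-in for the CSV; only the 'date' column matters here.
df7 = pd.DataFrame({
    "date": ["2014-10-13", "2014-12-09"],
    "price": [221900.0, 538000.0],
})

mydates = pd.to_datetime(df7["date"]).dt.date

# drop returns a new frame; df7 itself still has the 'date' column.
without_date = df7.drop(columns=["date"])
print("date" in df7.columns)           # the original is unchanged
print("date" in without_date.columns)  # only the returned copy lost it

# Assign the result back, then add the converted dates as the last column.
df7 = df7.drop(columns=["date"])
df7["date"] = mydates
print(df7)
```

Assigning the result back is an alternative to inplace=True and makes it explicit which object has changed.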

Pandas: conditional information retrieval within a date range

I'm still fairly new to pandas, and the script I wrote to accomplish a seemingly easy task seems needlessly complicated. If you know of an easier way to accomplish this, I would be extremely grateful.
Task:
I have two spreadsheets (df1 and df2), each with an identifier (mrn) and a date. My task is to retrieve a value from df2 for each row in df1 if the following conditions are met:
the identifier for a given row in df1 exists in df2
if the above is true, retrieve the value in df2 only if the associated date is within a +/-5 day range of the date in df1.
I have written the following code, which accomplishes this:
#%% housekeeping
import pandas as pd
from datetime import timedelta
from io import StringIO
#%% dataframe import
df1=',mrn,date,foo\n0,1,2015-03-06,n/a\n1,11,2009-08-14,n/a\n2,14,2009-05-18,n/a\n3,20,2010-06-19,n/a\n'
df2=',mrn,collection Date,Report\n0,1,2015-03-06,report to import1\n1,11,2009-08-12,report to import11\n2,14,2009-05-21,report to import14\n3,20,2010-06-25,report to import20\n'
df1 = pd.read_csv(StringIO(df1))
df2 = pd.read_csv(StringIO(df2))
# converting to date-time format
df1['date']=pd.to_datetime(df1['date'])
df2['collection Date']=pd.to_datetime(df2['collection Date'])
#%% mask()
def mask(df2, rangeTime):
    # True where a df2 date lies within 5 days of rangeTime
    mask= (df2> rangeTime -timedelta(days=5)) & (df2 <= rangeTime + timedelta(days=5))
    return mask
#%% detailLoop()
i=0
for element in df1["mrn"]:
    # .ix was removed in pandas 1.0; .loc is the label-based replacement
    df1DateIter = df1.loc[i, 'date']
    df2MRNmatch= df2.loc[df2['mrn']==element, ['collection Date', 'Report']]
    df2Date= df2MRNmatch['collection Date']
    df2Report= df2MRNmatch['Report']
    maskOut= mask(df2Date, df1DateIter)
    dateBoolean= maskOut.iloc[0]
    if dateBoolean:
        df1.loc[i, 'foo'] = df2Report.iloc[0]
    i+=1
Once the script has been run, df1 looks like:
Out[824]:
   mrn        date                 foo
0    1  2015-03-06   report to import1
1   11  2009-08-14  report to import11
2   14  2009-05-18  report to import14
3   20  2010-06-19                 NaN
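One way the loop could be replaced is a left merge on mrn followed by a vectorized date comparison. This is a sketch under the assumption that each mrn appears at most once in df2 (otherwise the merge produces several rows per df1 row); the CSV strings are trimmed versions of the ones above, and the window uses the same bounds as mask(), i.e. later than 5 days before and no later than 5 days after.

```python
from io import StringIO

import pandas as pd

df1 = pd.read_csv(
    StringIO("mrn,date\n1,2015-03-06\n11,2009-08-14\n14,2009-05-18\n20,2010-06-19\n"),
    parse_dates=["date"],
)
df2 = pd.read_csv(
    StringIO(
        "mrn,collection Date,Report\n"
        "1,2015-03-06,report to import1\n"
        "11,2009-08-12,report to import11\n"
        "14,2009-05-21,report to import14\n"
        "20,2010-06-25,report to import20\n"
    ),
    parse_dates=["collection Date"],
)

# Keep every df1 row; rows without a matching mrn get NaT/NaN from df2.
merged = df1.merge(df2, on="mrn", how="left")

# Signed distance between the two dates for each row.
delta = merged["collection Date"] - merged["date"]
in_window = (delta > pd.Timedelta(days=-5)) & (delta <= pd.Timedelta(days=5))

# Keep the report only where the date falls inside the window.
merged["foo"] = merged["Report"].where(in_window)
result = merged[["mrn", "date", "foo"]]

print(result)
```

Rows with no match in df2 compare as False in both conditions, so they end up with NaN in foo just like rows whose dates are too far apart.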