I'm trying to rename column headings, but pandas won't cooperate: it renamed one of the columns I asked for, but not the other.
new_df2 = new_df2.rename(columns={'value': 'Euro to Dollar rate','date':'exchange rate date'})
output:
  exchange rate date   value
0         2020-12-31  1.2216
1         2020-12-31  1.2216
2         2020-12-31  1.2216
3         2020-12-31  1.2216
4                NaT     NaN
The goal is to rename the 'value' column to 'Euro to Dollar rate'.
print(new_df2.columns.tolist()) returns: ['exchange rate date', ' value']
That's how I spotted the stray leading space in ' value'. Once I removed it, the rename worked.
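For anyone who hits the same thing: rather than hunting the stray character by eye, you can normalise the headers before renaming by stripping whitespace from every column label. A minimal sketch (the small frame below is a stand-in for new_df2):

```python
import pandas as pd

# a frame whose second header carries a stray leading space
new_df2 = pd.DataFrame({'date': ['2020-12-31'], ' value': [1.2216]})

# strip leading/trailing whitespace from every column label
new_df2.columns = new_df2.columns.str.strip()

# now the rename finds both keys
new_df2 = new_df2.rename(columns={'value': 'Euro to Dollar rate',
                                  'date': 'exchange rate date'})
print(new_df2.columns.tolist())  # ['exchange rate date', 'Euro to Dollar rate']
```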
I am working with a datetime index. Is there any way to get the value from n months before?
For example, the data look like:
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    np.random.randn(100, 1),
    columns=["A"],
    index=pd.date_range("20130101", periods=100, freq="M"),
)
dft
Then:
For July of each year, take the December value from the previous year and carry it through to June of the next year.
For every other month (August this year through June next year), take the previous month's value.
For example, the value from Jul-2000 through Jun-2001 is constant and equal to the Dec-1999 value.
What I've been trying to do is:
dft['B'] = np.where(dft.index.month == 7,
                    dft['A'].shift(7, freq='M'),
                    dft['A'].shift(1, freq='M'))
However, the result is simply a copy of column A, and I don't know why. Yet when I try a single line on its own:
dft['C'] = dft['A'].shift(7, freq='M')
everything is shifted as expected. What is the issue here?
The issue is index alignment. The shift you performed acts on the index, but numpy.where converts its inputs to plain arrays, so the shifted index is lost.
Use pandas' where or mask instead; everything remains a Series and the index is preserved:
dft['B'] = (dft['A'].shift(1, freq='M')
                    .mask(dft.index.month == 7, dft['A'].shift(7, freq='M'))
            )
output:
                   A         B
2013-01-31 -2.202668       NaN
2013-02-28  0.878792 -2.202668
2013-03-31 -0.982540  0.878792
2013-04-30  0.119029 -0.982540
2013-05-31 -0.119644  0.119029
2013-06-30 -1.038124 -0.119644
2013-07-31  0.177794 -1.038124
2013-08-31  0.206593 -2.202668  <- correct
2013-09-30  0.188426  0.206593
2013-10-31  0.764086  0.188426
...              ...       ...
2020-12-31  1.382249 -1.413214
2021-01-31 -0.303696  1.382249
2021-02-28 -1.622287 -0.303696
2021-03-31 -0.763898 -1.622287
2021-04-30  0.420844 -0.763898

[100 rows x 2 columns]
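To make the alignment point concrete, here is a small sketch (a toy three-point series, not the dft above) showing why the np.where version degenerated into a copy of A: shift(..., freq='M') moves only the index, and np.where throws that index away.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range('2013-01-31', periods=3, freq='M'))

# a freq-based shift moves the index, not the values
shifted = s.shift(1, freq='M')

# np.where strips both indexes and pairs values positionally,
# so the "shifted" branch is just the original values again
arr = np.where([True, True, True], shifted, s)
print(arr)  # [1. 2. 3.] -- identical to s, the shift is invisible

# index alignment (what pandas operations do) makes the shift visible
aligned = shifted.reindex(s.index)
print(aligned.tolist())  # [nan, 1.0, 2.0]
```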
I'm having great difficulty. I read a CSV file and set the index to the "Timestamp" column like this:
df = pd.read_csv(csv_file, quotechar="'", decimal=".", delimiter=";",
                 parse_dates=True, index_col="Timestamp")
df
                     XYZ  PRICE  position  nrLots posText
Timestamp
2014-10-14 10:00:29   30    140      -1.0    -1.0     buy
2014-10-14 10:00:30   21     90      -1.0    -5.0     buy
2014-10-14 10:00:31    3    110       1.0     2.0    sell
2014-10-14 10:00:32   31    120       1.0     1.0    sell
2014-10-14 10:00:33    4     70      -1.0    -5.0     buy
So if I want to get the price in the second row, I would like to write:
df.loc[2, "PRICE"]
But that does not work. To use the df.loc[] accessor, I have to pass a Timestamp, like this:
df.loc["2014-10-14 10:00:31", "PRICE"]
To use row numbers, I have to write this instead:
df["PRICE"].iloc[2]
which sucks; the syntax is ugly. However, it works: I can get the value and I can set the value, which is what I want.
If I want to find the Timestamp from a row, I can do like this:
df.index[row]
Question) Is there a more elegant syntax to get and set the value when you always work with a row number? I always iterate over row numbers, never over Timestamps, and I never use a Timestamp to access values.
Bonus question) If I have a Timestamp, how can I find the corresponding row number?
There is a way to do this.
First use df = df.reset_index(). "Timestamp" becomes an ordinary column of df, and you get a new integer index.
You can then access any row element with df.loc[] or df.iat[], and you can find the row that holds a specific element.
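If you'd rather not reset the index, a sketch of both lookups using get_loc (the small frame below mimics the question's data): df.iat with columns.get_loc gives positional get and set, and index.get_loc answers the bonus question.

```python
import pandas as pd

df = pd.DataFrame(
    {'PRICE': [140, 90, 110, 120, 70]},
    index=pd.to_datetime(['2014-10-14 10:00:29', '2014-10-14 10:00:30',
                          '2014-10-14 10:00:31', '2014-10-14 10:00:32',
                          '2014-10-14 10:00:33']))
df.index.name = 'Timestamp'

# get and set by row number: iat is the fast positional scalar accessor
col = df.columns.get_loc('PRICE')
print(df.iat[2, col])   # 110
df.iat[2, col] = 115    # setting works the same way

# bonus question: Timestamp -> row number
row = df.index.get_loc(pd.Timestamp('2014-10-14 10:00:31'))
print(row)              # 2
```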
I have struggled with this even after looking at various past answers, to no avail.
My data consists of numeric and non-numeric columns. I'd like to average the numeric columns and display the result in my GUI together with the information in the non-numeric columns. The non-numeric columns hold info such as name, roll number, and stream, while the numeric columns contain students' marks for various subjects. This works well with a single dataframe, but it fails when I combine two or more dataframes: only the average of the numeric columns is returned and displayed, and the non-numeric columns are left out. Below is one of the approaches I've tried so far.
df = pd.concat((df3, df5))
dfs = df.groupby(df.index, level=0).mean()
headers = list(dfs)
self.marks_table.setRowCount(dfs.shape[0])
self.marks_table.setColumnCount(dfs.shape[1])
self.marks_table.setHorizontalHeaderLabels(headers)
df_array = dfs.values
for row in range(dfs.shape[0]):
    for col in range(dfs.shape[1]):
        self.marks_table.setItem(row, col, QTableWidgetItem(str(df_array[row, col])))
Working code should return the averages in something like this:
  STREAM  ADM        NAME  KCPE  ENG  KIS
0  EAGLE  663  FLOYCE ATI   250   43    5
1  EAGLE  664    VERONICA   252   32   33
2  EAGLE  665   MACREEN A   341   23   23
3  EAGLE  666     BRIDGIT   286   23    2
Rather than
     ADM   KCPE   ENG   KIS
0  663.0  250.0  27.5  18.5
1  664.0  252.0  26.5  33.0
2  665.0  341.0  17.5  22.5
3  666.0  286.0  38.5  23.5
Sample data
Df1 = pd.DataFrame({
    'STREAM':[NORTH,SOUTH],
    'ADM':[437,238,439],
    'NAME':[JAMES,MARK,PETER],
    'KCPE':[233,168,349],
    'ENG':[70,28,79],
    'KIS':[37,82,79],
    'MAT':[67,38,29]})
Df2 = pd.DataFrame({
    'STREAM':[NORTH,SOUTH],
    'ADM':[437,238,439],
    'NAME':[JAMES,MARK,PETER],
    'KCPE':[233,168,349],
    'ENG':[40,12,56],
    'KIS':[33,43,43],
    'MAT':[22,58,23]})
Your question is not entirely clear, but guessing its intent from the content: I have modified your dataframes, which were not well formed (the strings were unquoted and the lists had unequal lengths), by adding a stream called 'CENTRAL'; see:
Df1 = pd.DataFrame({'STREAM': ['NORTH', 'SOUTH', 'CENTRAL'],
                    'ADM': [437, 238, 439],
                    'NAME': ['JAMES', 'MARK', 'PETER'],
                    'KCPE': [233, 168, 349],
                    'ENG': [70, 28, 79],
                    'KIS': [37, 82, 79],
                    'MAT': [67, 38, 29]})
Df2 = pd.DataFrame({'STREAM': ['NORTH', 'SOUTH', 'CENTRAL'],
                    'ADM': [437, 238, 439],
                    'NAME': ['JAMES', 'MARK', 'PETER'],
                    'KCPE': [233, 168, 349],
                    'ENG': [40, 12, 56],
                    'KIS': [33, 43, 43],
                    'MAT': [22, 58, 23]})
I have assumed you want to merge the two dataframes and find the average:
# DataFrame.append is deprecated; concatenate, then average within each group
df3 = pd.concat([Df2, Df1])
df3.groupby(['STREAM', 'ADM', 'NAME'], as_index=False).mean()
Outcome
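As a minimal, self-contained sketch of that approach (hypothetical two-student frames with a single mark column): grouping on every non-numeric column keeps the names and streams in the averaged result.

```python
import pandas as pd

# hypothetical miniature versions of Df1/Df2: same students, two terms of marks
Df1 = pd.DataFrame({'STREAM': ['NORTH', 'SOUTH'], 'ADM': [437, 238],
                    'NAME': ['JAMES', 'MARK'], 'ENG': [70, 28]})
Df2 = pd.DataFrame({'STREAM': ['NORTH', 'SOUTH'], 'ADM': [437, 238],
                    'NAME': ['JAMES', 'MARK'], 'ENG': [40, 12]})

# group on the non-numeric identifier columns and average the marks;
# as_index=False keeps STREAM/ADM/NAME as ordinary columns for display
out = (pd.concat([Df1, Df2])
         .groupby(['STREAM', 'ADM', 'NAME'], as_index=False)
         .mean())
print(out)
```

The identifier columns survive because they are group keys rather than inputs to mean().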
I have a column of type object holding amounts in different currencies. I want to create a new column that converts those amounts into rupees. I also have a dictionary with the conversion rate of each currency to Indian rupees, i.e. what 1$ etc. converts to (example: 1$: 70, 1€: 79, 1€: 90, 1¥: 0.654).
The dictionary looks like this:
d1 = {'1$':70, '1€':79,'1€':90,'1¥':0.654}
The dataframe looks like this:
Currency
'20$'
'30€'
'40€'
'35¥'
I want to get:
Currency  Rup_convrtd
'20$'            1400
'30€'            2700
'40€'            3600
'35¥'          22.855
Could somebody please help me produce the Rup_convrtd column using pandas?
Use:
d1 = {'1$':70, '1€':79,'1€':90,'1¥':0.654}

# first remove the leading '1' from each key by slicing
d = {k[1:]: v for k, v in d1.items()}
print(d)
{'$': 70, '€': 90, '¥': 0.654}

# map the last character (the currency symbol) through the dict and
# multiply by the numeric part (everything except the last character)
df['Rup_convrtd'] = df['Currency'].str[-1].map(d).mul(df['Currency'].str[:-1].astype(int))
print (df)
  Currency  Rup_convrtd
0      20$      1400.00
1      30€      2700.00
2      40€      3600.00
3      35¥        22.89
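If the numeric part can be wider than two characters, or the strings carry stray quotes or whitespace, a slightly more defensive variant (a sketch with the same hypothetical data, not a replacement for the answer above) splits amount and symbol with a regex before mapping:

```python
import pandas as pd

d = {'$': 70, '€': 90, '¥': 0.654}
df = pd.DataFrame({'Currency': ['20$', '30€', '40€', '35¥']})

# capture the leading digits and the trailing currency symbol separately
parts = df['Currency'].str.extract(r'(?P<amount>\d+)(?P<symbol>\D)')
df['Rup_convrtd'] = parts['amount'].astype(float) * parts['symbol'].map(d)
print(df)
```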
A column of my dataframe contains negative numbers in a string format with a trailing sign, like this: "500.00-". I need to convert every number in the column to numeric format. I'm sure there's an easy way to do this, but I have struggled to find one specific to a pandas dataframe. Any help would be greatly appreciated.
I have tried the basic to_numeric function shown below, but it doesn't parse these values. Also, only some of the numbers in the column are negative, so I can't simply strip all the minus signs and multiply the whole column by -1.
Q1['Credit'] = pd.to_numeric(Q1['Credit'])
Sample data:
df:
      num
0   50.00
1  60.00-
2  70.00+
3  -80.00
Use the Series str accessor to check the last character; if it is '-' or '+', move it to the front. Series.mask applies the change only to rows that end in -/+; finally, astype the column to float:
df.num.mask(df.num.str[-1].isin(['-','+']), df.num.str[-1].str.cat(df.num.str[:-1])).astype('float')
Out[1941]:
0    50.0
1   -60.0
2    70.0
3   -80.0
Name: num, dtype: float64
Possibly a bit explicit, but it works:
# build a mask of negative numbers
m_neg = Q1["Credit"].str.endswith("-")
# remove - signs
Q1["Credit"] = Q1["Credit"].str.rstrip("-")
# convert to number
Q1["Credit"] = pd.to_numeric(Q1["Credit"])
# Apply the mask to create the negatives
Q1.loc[m_neg, "Credit"] *= -1
Let us consider the following example dataframe:
Q1 = pd.DataFrame({'Credit':['500.00-', '100.00', '300.00-']})
    Credit
0  500.00-
1   100.00
2  300.00-
We can use str.endswith to create a mask which indicates the negative numbers. Then we use np.where to conditionally convert the numbers to negative:
m1 = Q1['Credit'].str.endswith('-')
# rstrip removes a trailing '-' only where present, so values like
# '100.00' keep their last digit
m2 = Q1['Credit'].str.rstrip('-').astype(float)
Q1['Credit'] = np.where(m1, -m2, m2)
Output
   Credit
0  -500.0
1   100.0
2  -300.0