Pandas help!
I have a specific column like this,
Mpg
0 18
1 17
2 19
3 21
4 16
5 15
Mpg is mile per gallon,
Now I need to replace that 'MPG' column to 'litre per 100 km' and change those values to litre per 100 km' at the same time. Any help? Thanks beforehand.
-Tom
I changed the name of the column but doing both simultaneously,i could not.
Use pop to return and delete the column at the same time and rdiv to perform the conversion (1 mpg = 1/235.15 liter/100km):
df['litre per 100 km'] = df.pop('Mpg').rdiv(235.15)
If you want to insert the column in the same position:
df.insert(df.columns.get_loc('Mpg'), 'litre per 100 km',
df.pop('Mpg').rdiv(235.15))
Output:
litre per 100 km
0 13.063889
1 13.832353
2 12.376316
3 11.197619
4 14.696875
5 15.676667
An alternative to pop would be to store the result in another dataframe. This way you can perform the two steps at the same time. In my code below, I first reproduce your dataframe, then store the constant for conversion and perform it on all entries using the apply method.
df = pd.DataFrame({'Mpg':[18,17,19,21,16,15]})
cc = 235.214583 # constant for conversion from mpg to L/100km
df2 = pd.DataFrame()
df2['litre per 100 km'] = df['Mpg'].apply(lambda x: cc/x)
print(df2)
The output of this code is:
litre per 100 km
0 13.067477
1 13.836152
2 12.379715
3 11.200694
4 14.700911
5 15.680972
as expected.
I have the following dataset
resulted_by
follow_up_result
follow_up_number
#
%
0
User 1
good
1
30
30
1
User 2
good
2
65
65
2
User 3
bad
3
5
0.05
I want to Pivot:
follow up result and resulted by as indexes
follow up number as a column
# and % as values
pivot = df.head(3).pivot(columns=['follow_up_number'], values=["#", '%'], index=['follow_up_result', 'resulted_by'])
However, I want the follow up number to be above the values, here is how I achieved that:
pivot = df.head(3).pivot(columns=['follow_up_result', 'resulted_by'], values=["#", '%'], index=['follow_up_number'])
pivot = pivot.stack(level=0).T
Notice how I switch columns and indexes.
I want the column names to be at the same level as the values.
Is there a way to do that?
Is there a better way to achieve what I need without switching between columns and indexes?
Code Snippet:
https://onecompiler.com/python/3y5gzm7hu
I need some help with my Data Wrangling in Python, using pandas.
I am appending dataframes in a loop (one or two rows per iteration). For an unknown reason I have duplicated columns when appending which occur only in a few iterations, resulting in the following dataframe:
ID Nom. value 0 1 2 3 ...
1001 -857 -856.989 -856.989 -857.042 -857.042 ...
1001 335 334.987 334.987 335.006 335.006 ...
...
Is there a way to do this systematically in the loop (dataframe is large and consists of many iterations)? I need this dataframe instead:
ID Nom. value 0 1 ...
1001 -857 -856.989 -857.042 ...
1001 335 334.987 335.006 ...
...
Thanks!
I have one table like this:
id status time days ...
1 optimal 60 21
2 optimal 50 21
3 no solution 60 30
4 optimal 21 31
5 no solution 34 12
.
.
.
There are many more rows and columns.
I need to make a query that will return which columns have different information, given two IDs.
Rephrasing it, I'll provide two IDs, for example 1 and 5 and I need to know if these two rows have any columns with different values. In this case, the result should be something like:
id status time days
1 optimal 60 21
5 no solution 34 12
If I provide IDs 1 and 2, for example, the result should be:
id time
1 60
2 50
The output format doesn't need to be like this, it only needs to show clearly which columns are different and their values
I can tell you off the bat that processing this data in some sort of programming language will greatly help you out in terms of simplicity and readability for this type of solution, but here a thread of how it can be done in SQL.
Compare two rows and identify columns whose values are different
If you are looking for the solution in R. Here is my solution:
df <- read.csv(file = "sf.csv", header = TRUE)
diff.eval <- function(first.id, second.id, eval.df) {
res <- eval.df[c(first.id, second.id), ]
cols <- colnames(eval.df)
for (col in cols) {
if (res[1, col] == res[2, col]) {
res[, col] <- NULL
}
}
return(res)
}
print(diff.eval(1, 5, df))
print(diff.eval(1, 2, df))
You just need to create a dataframe out of table. I just created a .csv for ease locally and used the data by importing into a dataframe.
I have struggled with this even after looking at the various past answers to no avail.
My data consists of columns numeric and non numeric. I'd like to average the numeric columns and display my data on the GUI together with the information on the non-numeric columns.The non numeric columns have info such as names,rollno,stream while the numeric columns contain students marks for various subjects. It works well when dealing with one dataframe but fails when I combine two or more dataframes in which it returms only the average of the numeric columns and displays it leaving the non numeric columns undisplayed. Below is one of the codes I've tried so far.
df=pd.concat((df3,df5))
dfs =df.groupby(df.index,level=0).mean()
headers = list(dfs)
self.marks_table.setRowCount(dfs.shape[0])
self.marks_table.setColumnCount(dfs.shape[1])
self.marks_table.setHorizontalHeaderLabels(headers)
df_array = dfs.values
for row in range(dfs.shape[0]):
for col in range(dfs.shape[1]):
self.marks_table.setItem(row, col,QTableWidgetItem(str(df_array[row,col])))
A working code should return averages in something like this
STREAM ADM NAME KCPE ENG KIS
0 EAGLE 663 FLOYCE ATI 250 43 5
1 EAGLE 664 VERONICA 252 32 33
2 EAGLE 665 MACREEN A 341 23 23
3 EAGLE 666 BRIDGIT 286 23 2
Rather than
ADM KCPE ENG KIS
0 663.0 250.0 27.5 18.5
1 664.0 252.0 26.5 33.0
2 665.0 341.0 17.5 22.5
3 666.0 286.0 38.5 23.5
Sample data
Df1 = pd.DataFrame({
'STREAM':[NORTH,SOUTH],
'ADM':[437,238,439],
'NAME':[JAMES,MARK,PETER],
'KCPE':[233,168,349],
'ENG':[70,28,79],
'KIS':[37,82,79],
'MAT':[67,38,29]})
Df2 = pd.DataFrame({
'STREAM':[NORTH,SOUTH],
'ADM':[437,238,439],
'NAME':[JAMES,MARK,PETER],
'KCPE':[233,168,349],
'ENG':[40,12,56],
'KIS':[33,43,43],
'MAT':[22,58,23]})
Your question not clear. However guessing the origin of question based on content. I have modified your datframes which were not well done by adding a stream called 'CENTRAL', see
Df1 = pd.DataFrame({'STREAM':['NORTH','SOUTH', 'CENTRAL'],'ADM':[437,238,439], 'NAME':['JAMES','MARK','PETER'],'KCPE':[233,168,349],'ENG':[70,28,79],'KIS':[37,82,79],'MAT':[67,38,29]})
Df2 = pd.DataFrame({ 'STREAM':['NORTH','SOUTH','CENTRAL'],'ADM':[437,238,439], 'NAME':['JAMES','MARK','PETER'],'KCPE':[233,168,349],'ENG':[40,12,56],'KIS':[33,43,43],'MAT':[22,58,23]})
I have assumed you want to merge the two dataframes and find avarage
df3=Df2.append(Df1)
df3.groupby(['STREAM','ADM','NAME'],as_index=False).sum()
Outcome