Having trouble with and Excel spreadsheet, in google colab and a column is missing - pandas

Yes this is homework and no I don't want an answer to the question, but for some reason the column I would like to move using pandas is missing yet I can still see it on my end result. Why is this happening. This is what I have done:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns
#read xlsx file
df = pd.read_excel("https://docs.google.com/spreadsheets/d/e/2PACX-
1vTd9TqybCunAe9HPPdb5mOW5uFn5m5fXO-mecfsn0TEk10_l8Bz1Kc7k13AFWoyvC1t3A7A27zozfTd/pub?
output=xlsx")
df
#removes last 2 rows
df.iloc[0:, 0:21]
#columns grouped by type float
df.iloc[0:, [0,2,4,9,10,11,12,13,14,15,16,17,18,19,20]]
#columns grouped by type object
df.iloc[0:, [1,3,5,6,7]]
#gets dummies and stores them in variables
type_float = df.iloc[0:, [0,2,4,9,10,11,12,13,14,15,16,17,18,19,20]]
type_object = df.iloc[0:, [1,3,5,6,7]]
#concatonates the dummies to orignal dataframe
df = pd.concat([type_float, type_object], axis='columns')
df
#rename
df.rename(columns = {'Attrition_Flag':'Target'}, inplace = True)
df
#Replaceing target with 0/1
df['Target'].replace(['Existing Customer', 'Attrited Customer'],[0, 1], inplace=True)
df
'''
This is where im having trouble
When I try to move column "target" I cant. Ive tried to pop it, and then move it to the back
and when I try using "df.iloc[0:, [15]]" which is its column, it just goes to the next column. Why is this column non-existent? anymore

Not sure if I understand correctly what you need to do but if you want to change the order of columns (make 'Target' the last column) you can use:
all_columns_in_new_order = list(df.columns.drop('Target')) + ['Target']
and then:
df = df.reindex(all_columns_in_new_order, axis=1)

Related

How to include an if condition after merging two dataframes?

currently in my code I'm merging two dataframes from my desktop, dropping some duplicates and some column and the final output is converted in a picture to be sent via telegram.
import pandas as pd
import dataframe_image as di
import telepot
df = pd.read_csv('a.csv', delimiter=';')
df1 = pd.read_csv('b.csv', delimiter=';')
total = pd.merge(df, df1, on="Conc", how="inner")
total = total.drop_duplicates(subset=["A"], keep="first")
total = total.drop(['A','B','C', 'D', 'E','Conc'], 1)
di.export(total, 'total.png')
bot = telepot.Bot('token')
bot.sendPhoto(chatid, photo=open('total.png', 'rb'))
This is the good path, in case the merging row is giving me a new dataframe with text on it. How can I manage the situation if the merging task as an output an empty df so I can send "NA" via telegram?
Many thanks

How can I get an interpolated value from a Pandas data frame?

I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
>>> 59.6

How do you append a column and drop a column with pandas dataframes? Can't figure out why it won't print the dataframe afterwards

The DataFrame that I am working with has a datetime object that I changed to a date object. I attempted to append the date object to be the last column in the DataFrame. I also wanted to drop the datetime object column.
Both the append and drop operations don't work as expected. Nothing prints out afterwards. It should print the entire DataFrame (shortened it is long).
My code:
import pandas as pd
import numpy as np
df7=pd.read_csv('kc_house_data.csv')
print(df7)
mydates = pd.to_datetime(df7['date']).dt.date
print(mydates)
df7.append(mydates)
df7.drop(['date'], axis=1)
print(df7)
Why drop/append? You can overwrite
df7['date'] = pd.to_datetime(df7['date']).dt.date
import pandas as pd
import numpy as np
# read csv, convert column type
df7=pd.read_csv('kc_house_data.csv')
df7['date'] = pd.to_datetime(df7['date']).dt.date
print(df7)
Drop a column using df7.drop('date', axis=1, inplace=True).
Append a column using df7['date'] = mydates.

How to access dask dataframe index value in map_paritions?

I am trying to use dask dataframe map_partition to apply a function which access the value in the dataframe index, rowise and create a new column.
Below is the code I tried.
import dask.dataframe as dd
import pandas as pd
df = pd.DataFrame(index = ["row0" , "row1","row2","row3","row4"])
df
ddf = dd.from_pandas(df, npartitions=2)
res = ddf.map_partitions(lambda df: df.assign(index_copy= str(df.index)),meta={'index_copy': 'U' })
res.compute()
I am expecting df.index to be the value in the row index, not the entire partition index which it seems to refer to. From the doc here, this work well for columns but not the index.
what you want to do is this
df.index = ['row'+str(x) for x in df.index]
and for that first create your pandas dataframe and then run this code after you will have your expected result.
let me know if this works for you.

when reading an html (pandas.read_html), how to select dataframe and set_ index in one line

I'm reading an html which brings back a list of dataframes. I want to be able to choose the dataframe from the list and set my index (index_col) in the least amount of lines.
Here is what I have right now:
import pandas as pd
df =pd.read_html('http://finviz.com/insidertrading.ashx?or=-10&tv=100000&tc=1&o=-transactionvalue', header = 0)
df2 =df[4] #here I'm assigning df2 to dataframe#4 from the list of dataframes I read
df2.set_index('Date', inplace =True)
Is it possible to do all this in one line? Do I need to create another dataframe (df2) to assign one dataframe from a list, or is it possible I can assign the dataframe as soon as I read the list of dataframes (df).
Thanks.
Anyway:
import pandas as pd
df = pd.read_html('http://finviz.com/insidertrading.ashx?or=-10&tv=100000&tc=1&o=-transactionvalue', header = 0)[4].set_index('Date')