Applying a random distribution to each row of a data frame - pandas

I have the following dataframe:
import numpy as np
import pandas as pd
import scipy as sc
import scipy.stats as sct
d = {'col1': [1, 2, 5, 0.6], 'col2': [3, 4, 1, 0.8]}
df = pd.DataFrame(data=d)
I want to add two new columns to that dataframe, where each element of the new columns is a random Poisson draw whose mean is the corresponding value in col1 or col2.
I used the following code to generate the new columns (col3 and col4):
df ['col3'] = int(sct.poisson.rvs(df.col1,size=1))
df ['col4'] = int(sct.poisson.rvs(df.col2,size=1))
This is the closest example to my real dataframe, which is huge: it contains 3,800,000 rows.
I can generate the columns using a for loop, but it takes too long.
How can I generate Poisson-distributed random values from the dataframe columns without using a loop?
Thanks
Zep

Try just using:
df['col3'] = sct.poisson.rvs(df.col1)
df['col4'] = sct.poisson.rvs(df.col2)
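For what it's worth, here is a minimal sketch of the same vectorized idea using NumPy's random Generator instead of scipy.stats (a swapped-in alternative, not what the answer above uses). The seed value is an arbitrary assumption, only there to make the draws reproducible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 5, 0.6], "col2": [3, 4, 1, 0.8]})

# Seed chosen arbitrarily so the example is reproducible (assumption).
rng = np.random.default_rng(seed=0)

# Passing an array as `lam` draws one Poisson sample per row,
# each with its own mean, without any Python-level loop.
df["col3"] = rng.poisson(lam=df["col1"].to_numpy())
df["col4"] = rng.poisson(lam=df["col2"].to_numpy())

print(df)
```

Because the whole column is sampled in one call, this should scale to millions of rows far better than a row-by-row loop.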


Having trouble with an Excel spreadsheet in Google Colab: a column is missing

Yes, this is homework, and no, I don't want an answer to the question, but for some reason the column I would like to move using pandas is missing, yet I can still see it in my end result. Why is this happening? This is what I have done:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns
#read xlsx file
df = pd.read_excel("https://docs.google.com/spreadsheets/d/e/2PACX-1vTd9TqybCunAe9HPPdb5mOW5uFn5m5fXO-mecfsn0TEk10_l8Bz1Kc7k13AFWoyvC1t3A7A27zozfTd/pub?output=xlsx")
df
#keeps the first 21 columns (this slices columns, not rows)
df.iloc[0:, 0:21]
#columns grouped by type float
df.iloc[0:, [0,2,4,9,10,11,12,13,14,15,16,17,18,19,20]]
#columns grouped by type object
df.iloc[0:, [1,3,5,6,7]]
#gets dummies and stores them in variables
type_float = df.iloc[0:, [0,2,4,9,10,11,12,13,14,15,16,17,18,19,20]]
type_object = df.iloc[0:, [1,3,5,6,7]]
#concatenates the dummies to original dataframe
df = pd.concat([type_float, type_object], axis='columns')
df
#rename
df.rename(columns = {'Attrition_Flag':'Target'}, inplace = True)
df
#Replacing target with 0/1
df['Target'].replace(['Existing Customer', 'Attrited Customer'],[0, 1], inplace=True)
df
'''
This is where I'm having trouble.
When I try to move the column "Target", I can't. I've tried to pop it and then move it to the back,
and when I try using "df.iloc[0:, [15]]", which is its column, it just goes to the next column. Why does this column no longer exist?
'''
I'm not sure I understand correctly what you need to do, but if you want to change the order of the columns (make 'Target' the last column), you can use:
all_columns_in_new_order = list(df.columns.drop('Target')) + ['Target']
and then:
df = df.reindex(all_columns_in_new_order, axis=1)
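Here is a small self-contained sketch of that reordering on a toy dataframe. The column names "a" and "b" and the values are made up for illustration; only 'Target' comes from the question. It also shows the pop-and-reassign variant the asker mentioned trying.

```python
import pandas as pd

# Toy data; "a" and "b" are hypothetical stand-ins for the real columns.
df = pd.DataFrame({"Target": [0, 1], "a": [1.0, 2.0], "b": ["x", "y"]})

# Variant 1: build the new column order explicitly and reindex.
new_order = list(df.columns.drop("Target")) + ["Target"]
reordered = df.reindex(new_order, axis=1)

# Variant 2: pop the column and assign it back; a newly assigned
# column is appended at the end.
popped = df.copy()
popped["Target"] = popped.pop("Target")

print(list(reordered.columns))
print(list(popped.columns))
```

Note that both variants return or modify a dataframe you have to keep using; an expression such as df.iloc[0:, 0:21] on its own line does not change df unless the result is assigned back.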

Display dataframe index name with Streamlit

The following code does not display the name of the index:
import pandas as pd
import streamlit as st
df = pd.DataFrame(['row1', 'row2'], index=pd.Index([1, 2], name='my_index'))
st.write(df)
Is there a way to have my_index displayed like you would do in a jupyter notebook?
According to the Streamlit docs, st.write displays a dataframe as a table, so the index name is not shown.
To show the my_index name, reset the index so that my_index becomes a normal column. Add the following before st.write():
df.reset_index(inplace=True)
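A quick way to check what reset_index does here, without Streamlit involved, is to look at the resulting columns. This is only a sketch in plain pandas; the variable name flat is my own.

```python
import pandas as pd

df = pd.DataFrame(['row1', 'row2'], index=pd.Index([1, 2], name='my_index'))

# reset_index() moves the named index into a regular column
# and replaces it with a default 0..n-1 index.
flat = df.reset_index()

print(flat.columns.tolist())
```

The former index now shows up as an ordinary column called my_index, which is why it becomes visible in the rendered table.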
I found a solution using the pandas DataFrame.to_html() method:
import pandas as pd
import streamlit as st
df = pd.DataFrame(['row1', 'row2'], index=pd.Index([1, 2], name='my_index'))
st.write(df.to_html(), unsafe_allow_html=True)
This renders the dataframe as an HTML table, with the index name my_index shown in its own header row.
If you want the index name and the column names to be in the same header row, you can use the following code:
import pandas as pd
import streamlit as st
df = pd.DataFrame(['row1', 'row2'], index=pd.Index([1, 2], name='my_index'))
df.columns.name = df.index.name
df.index.name = None
st.write(df.to_html(), unsafe_allow_html=True)
This renders a table in which my_index appears in the same header row as the column labels.
Note: if you have a large dataset and want to limit the number of rows, use df.to_html(max_rows=N) instead, where N is the number of rows you want to display.
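To see what to_html() actually produces before handing it to st.write, you can inspect the string in plain pandas. This is a sketch only; the column name "value" and the five sample rows are assumptions made for the example.

```python
import pandas as pd

# Five sample rows; "value" is a made-up column name.
df = pd.DataFrame(
    {"value": [f"row{i}" for i in range(1, 6)]},
    index=pd.Index(range(1, 6), name="my_index"),
)

# Full table: the index name is part of the generated header.
html_full = df.to_html()

# Truncated table: only the first and last rows are kept,
# with an ellipsis row in between.
html_short = df.to_html(max_rows=2)

print("my_index" in html_full)
print("row3" in html_short)
```

Either string can then be passed to st.write(..., unsafe_allow_html=True) as in the answer above.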

Lambda function on multiple columns

I am trying to extract only the numbers from multiple columns in my pandas DataFrame.
I am able to do so one column at a time; however, I would like to perform this operation on multiple columns simultaneously.
My reproducible example:
import pandas as pd
import re
import numpy as np
import seaborn as sns
df = sns.load_dataset('diamonds')
# Duplicate the column so there are two columns to work on
df['clarity2'] = df['clarity']
df.head()
df[['clarity', 'clarity2']].apply(lambda x: x.str.extract(r'(\d+)'))
If you want a tuple:
cols = ['clarity', 'clarity2']
tuple(df[col].str.extract(r'(\d+)') for col in cols)
If you want a list:
cols = ['clarity', 'clarity2']
[df[col].str.extract(r'(\d+)') for col in cols]
Adding them to the original data:
df['digit1'], df['digit2'] = [df[col].str.extract(r'(\d+)') for col in cols]
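If you would rather stay with apply and a lambda, one option is to pass expand=False so that each column comes back as a Series. The sketch below uses a tiny hand-made dataframe instead of the seaborn dataset so it runs offline; the sample values and the "_digit" suffix are assumptions for illustration.

```python
import pandas as pd

# Hand-made stand-in for the two diamond clarity columns.
df = pd.DataFrame({
    "clarity": ["SI2", "VS1", "IF"],
    "clarity2": ["VVS2", "I1", "SI1"],
})
cols = ["clarity", "clarity2"]

# expand=False makes str.extract return a Series for a single
# capture group, so apply() can assemble a DataFrame column by column.
digits = df[cols].apply(lambda s: s.str.extract(r"(\d+)", expand=False))

# Attach the extracted digits next to the original columns.
df = df.join(digits.add_suffix("_digit"))

print(df)
```

Values without any digit (such as "IF") come back as NaN, and the extracted digits are strings, so convert them afterwards if you need numbers.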

How can I get an interpolated value from a Pandas data frame?

I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
# 59.6
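One caveat with np.interp is that the x values must be in increasing order. A small wrapper that sorts first can guard against that; this is just a sketch, and the function name interpolate_column is my own invention.

```python
import numpy as np
import pandas as pd

def interpolate_column(frame, x_col, y_col, x):
    """Linearly interpolate y_col at position x along x_col."""
    # np.interp expects increasing x values, so sort defensively.
    ordered = frame.sort_values(x_col)
    return float(np.interp(x, ordered[x_col].to_numpy(), ordered[y_col].to_numpy()))

data = [[1.0, 45.0], [2, 56], [3, 58], [4, 62], [5, 70]]  # Sample data
s = pd.DataFrame(data, columns=["Angle", "rff"])

# Shuffle the rows to show that row order does not matter here.
shuffled = s.sample(frac=1, random_state=0)
value = interpolate_column(shuffled, "Angle", "rff", 3.4)

print(round(value, 6))
```

Keep in mind that np.interp does not extrapolate: an angle outside the table's range returns the nearest endpoint value rather than raising an error.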

plotting a pandas dataframe row by row

I have the following dataframe:
I want to create one pie chart for each row. The thing is that I am having trouble with the order of the charts: I want each chart to have a figsize of, let's say, (5, 5), and every row in my dataframe to be a row of plots in my subplots, with the index as the title.
I tried many combinations and played with pyplot.subplots, but with no success.
I would be glad for some help.
Thanks
You can either transpose your dataframe and use the pandas pie kind for plotting, i.e. df.transpose().plot(kind='pie', subplots=True), or iterate through the rows while subplotting.
An example using subplots:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Recreate a similar dataframe
rows = ['rows {}'.format(i) for i in range(5)]
columns = ['hits', 'misses']
col1 = np.random.random(5)
col2 = 1 - col1
data = zip(col1, col2)
df = pd.DataFrame(data=data, index=rows, columns=columns)
# Plotting
fig = plt.figure(figsize=(15,10))
for i, (name, row) in enumerate(df.iterrows()):
    ax = plt.subplot(2,3, i+1)
    ax.set_title(row.name)
    ax.set_aspect('equal')
    ax.pie(row, labels=row.index)
plt.show()
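If you want exactly one chart per subplot row, each roughly 5 by 5 inches as asked in the question, a variation with plt.subplots might look like the sketch below. The three sample rows and the seed are assumptions; adjust them to your own data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Recreate a small dataframe similar to the one above (sample data).
rng = np.random.default_rng(0)
hits = rng.random(3)
df = pd.DataFrame(
    {"hits": hits, "misses": 1 - hits},
    index=["row 0", "row 1", "row 2"],
)

# One subplot row per dataframe row; 5 inches wide, 5 inches per chart.
fig, axes = plt.subplots(
    nrows=len(df), ncols=1, figsize=(5, 5 * len(df)), squeeze=False
)

for ax, (name, row) in zip(axes[:, 0], df.iterrows()):
    ax.pie(row, labels=row.index)
    ax.set_title(name)  # use the dataframe index as the title

fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to display or store the figure. Passing squeeze=False keeps axes two-dimensional, so the loop also works when the dataframe has a single row.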