The following code does not display the name of the index:
import pandas as pd
import streamlit as st
df = pd.DataFrame(['row1', 'row2'], index=pd.Index([1, 2], name='my_index'))
st.write(df)
Is there a way to have my_index displayed like you would do in a jupyter notebook?
According to the streamlit doc it will write dataframe as a table. So the index name is not shown.
To show the my_index name, reset the index to default and as a result the my_index will become a normal column. Add the following before st.write().
df.reset_index(inplace=True)
Output
I found a solution using pandas dataframe to_html() method:
import pandas as pd
import streamlit as st
df = pd.DataFrame(['row1', 'row2'], index=pd.Index([1, 2], name='my_index'))
st.write(df.to_html(), unsafe_allow_html=True)
This results with the following output:
If you want the index and columns names to be in the same header row you can use the following code:
import pandas as pd
import streamlit as st
df = pd.DataFrame(['row1', 'row2'], index=pd.Index([1, 2], name='my_index'))
df.columns.name = df.index.name
df.index.name = None
st.write(df.to_html(), unsafe_allow_html=True)
This results with the following output:
Note - if you have a large dataset and want to limit the number of rows use df.to_html(max_rows=N) instead where N is the number of rows you want to dispplay.
Related
Yes this is homework and no I don't want an answer to the question, but for some reason the column I would like to move using pandas is missing yet I can still see it on my end result. Why is this happening. This is what I have done:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns
#read xlsx file
df = pd.read_excel("https://docs.google.com/spreadsheets/d/e/2PACX-
1vTd9TqybCunAe9HPPdb5mOW5uFn5m5fXO-mecfsn0TEk10_l8Bz1Kc7k13AFWoyvC1t3A7A27zozfTd/pub?
output=xlsx")
df
#removes last 2 rows
df.iloc[0:, 0:21]
#columns grouped by type float
df.iloc[0:, [0,2,4,9,10,11,12,13,14,15,16,17,18,19,20]]
#columns grouped by type object
df.iloc[0:, [1,3,5,6,7]]
#gets dummies and stores them in variables
type_float = df.iloc[0:, [0,2,4,9,10,11,12,13,14,15,16,17,18,19,20]]
type_object = df.iloc[0:, [1,3,5,6,7]]
#concatonates the dummies to orignal dataframe
df = pd.concat([type_float, type_object], axis='columns')
df
#rename
df.rename(columns = {'Attrition_Flag':'Target'}, inplace = True)
df
#Replaceing target with 0/1
df['Target'].replace(['Existing Customer', 'Attrited Customer'],[0, 1], inplace=True)
df
'''
This is where im having trouble
When I try to move column "target" I cant. Ive tried to pop it, and then move it to the back
and when I try using "df.iloc[0:, [15]]" which is its column, it just goes to the next column. Why is this column non-existent? anymore
Not sure if I understand correctly what you need to do but if you want to change the order of columns (make 'Target' the last column) you can use:
all_columns_in_new_order = list(df.columns.drop('Target')) + ['Target']
and then:
df = df.reindex(all_columns_in_new_order, axis=1)
I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
>>> 59.6
I am trying to use dask dataframe map_partition to apply a function which access the value in the dataframe index, rowise and create a new column.
Below is the code I tried.
import dask.dataframe as dd
import pandas as pd
df = pd.DataFrame(index = ["row0" , "row1","row2","row3","row4"])
df
ddf = dd.from_pandas(df, npartitions=2)
res = ddf.map_partitions(lambda df: df.assign(index_copy= str(df.index)),meta={'index_copy': 'U' })
res.compute()
I am expecting df.index to be the value in the row index, not the entire partition index which it seems to refer to. From the doc here, this work well for columns but not the index.
what you want to do is this
df.index = ['row'+str(x) for x in df.index]
and for that first create your pandas dataframe and then run this code after you will have your expected result.
let me know if this works for you.
I have the following dataframe
import numpy as np
import pandas as pd
import scipy as sc
import scipy.stats as sct
d= {'col1': [1, 2,5,0.6], 'col2': [3, 4,1,0.8]}
df = pd. DataFrame(data=d)
I want to add two new column in that dataframe but the element of two new columns are the random poisson distribution of col1 and col2
I used the following coding to generate the new columns (col3 and col4).
df ['col3'] = int(sct.poisson.rvs(df.col1,size=1))
df ['col4'] = int(sct.poisson.rvs(df.col2,size=1))
This is the closet example of my dataframe which is quite huge and it contains 3,800,000 rows.
I can generate it using for loop. it took me too long time.
How can generate random poisson distribution based on dataframe without using loop?
Thanks
Zep
Try just using:
df['col3'] = sct.poisson.rvs(df.col1)
df['col4'] = sct.poisson.rvs(df.col2)
I am new to pandas.I have a simple question:
how to extract the unique values and its count of a column and store in data frame with index key
I have tried to:
df = df1['Genre'].value_counts()
and I am getting a series but I don't know how to convert it to data frame object.
Pandas series has a .to_frame() function. Try it:
df = df1['Genre'].value_counts().to_frame()
And if you wanna "switch" the rows to columns:
df = df1['Genre'].value_counts().to_frame().T
Update: Full example if you want them as columns:
import pandas as pd
import numpy as np
np.random.seed(400) # To reproduce random variables
df1 = pd.DataFrame({
'Genre': np.random.choice(['Comedy','Drama','Thriller'], size=10)
})
df = df1['Genre'].value_counts().to_frame().T
print(df)
Returns:
Thriller Comedy Drama
Genre 5 3 2
try
df = pd.DataFrame(df1['Genre'].value_counts())