How to show truncated form of large pandas dataframe after style.apply?

Normally, a relatively long dataframe like
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 10, (100, 2)))
df
is displayed in truncated form in a Jupyter notebook, with the head, the tail, an ellipsis in between, and the row and column count at the end.
However, after style.apply
def highlight_max(x):
    return ['background-color: yellow' if v == x.max() else '' for v in x]
df.style.apply(highlight_max)
all rows are displayed.
Is it possible to still display the truncated form of the dataframe after style.apply?

Something simple like this?
def display_df(dataframe, function):
    display(dataframe.head().style.apply(function))
    display(dataframe.tail().style.apply(function))
    print(f'{dataframe.shape[0]} rows x {dataframe.shape[1]} columns')
display_df(df, highlight_max)
**** EDIT ****
def display_df(dataframe, function):
    display(pd.concat([dataframe.iloc[:5, :],
                       pd.DataFrame(index=['...'], columns=dataframe.columns),
                       dataframe.iloc[-5:, :]]).style.apply(function))
    print(f'{dataframe.shape[0]} rows x {dataframe.shape[1]} columns')
display_df(df, highlight_max)
The Jupyter preview is basically something like this:
def display_df(dataframe):
    display(pd.concat([dataframe.iloc[:5, :],
                       pd.DataFrame(index=['...'], columns=dataframe.columns, data={0: '...', 1: '...'}),
                       dataframe.iloc[-5:, :]]))
but if you try to apply a style to it, you get an error (TypeError: '>=' not supported between instances of 'int' and 'str') because the styling function tries to compare the string values '...' with the integers in order to highlight them.

Styler objects do not have head, tail or sample methods, so select the rows on the dataframe first and then apply the style. This gives you more control over what you display each time.
output = df.head(10).style.apply(highlight_max)  # 10 -> number of rows to display
output
If you want to see more varied data, you can also use sample, which picks random rows:
df.sample(10).style.apply(highlight_max)
Note that the highlight is then computed over the selected rows only.
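One caveat with styling only the head and tail is that the maximum is then taken over the displayed rows rather than over the whole frame. A minimal sketch of one way around that, assuming the index holds unique labels and the styling function works column by column: compute the CSS on the full frame first, then show only the first and last rows with that precomputed CSS. The helper names head_tail_css and truncated_styler are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def highlight_max(x):
    # Same styling function as in the question.
    return ['background-color: yellow' if v == x.max() else '' for v in x]


def head_tail_css(dataframe, function, n=5):
    """Return the first and last n rows plus their CSS, computed on the full frame."""
    # Evaluate the column-wise styling function on every column of the full frame.
    css = pd.DataFrame(
        {col: function(dataframe[col]) for col in dataframe.columns},
        index=dataframe.index,
    )
    if len(dataframe) <= 2 * n:
        rows = dataframe.index  # short frame: nothing to truncate
    else:
        rows = dataframe.index[:n].append(dataframe.index[-n:])
    return dataframe.loc[rows], css.loc[rows]


def truncated_styler(dataframe, function, n=5):
    subset, css = head_tail_css(dataframe, function, n)
    # axis=None passes the whole frame to the function, which must return
    # a frame of CSS strings with the same index and columns.
    return subset.style.apply(lambda _: css, axis=None)


df = pd.DataFrame(np.random.randint(0, 10, (100, 2)))
subset, css = head_tail_css(df, highlight_max)
print(f'{df.shape[0]} rows x {df.shape[1]} columns')
# In a notebook: display(truncated_styler(df, highlight_max))
```

This does not insert an ellipsis row between head and tail; the gap in the index labels is the only hint that rows were skipped.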

Related

create n dataframes in a for loop with an extra column with a specific number in it

Hi all, I have a dataframe like the one shown in the picture:
I am trying to create 2 different dataframes with the same "hour" and "minute" columns plus "value" (and "value.1", respectively), each with an added column holding the number 0 and 1, respectively. I would like to do it in a for loop, as I want to create n dataframes (not just the 2 shown here).
I tried something like this, but it's not working (error: KeyError: "['value.i'] not in index"):
for i in range(1):
    series[i] = df_new[['hour', 'minute', 'value.i']]
    series[i].insert(0, 'number', 'i')
Can you help me?
Thanks

From what I understand, you want 'value.i' to resolve to value.1, value.2, and so on:
for i in range(1):
    # the f prefix makes this an f-string, so {i} is replaced by the value of i
    series[i] = df_new[['hour', 'minute', f'value.{i}']]
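A fuller sketch of the loop, under the assumption (taken from the question text, since the picture is not available) that the first value column is named "value" and the following ones "value.1", "value.2", and so on. The sample data and the helper value_column are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for the dataframe in the question.
df_new = pd.DataFrame({
    "hour": [0, 0],
    "minute": [0, 15],
    "value": [1.0, 2.0],
    "value.1": [3.0, 4.0],
})


def value_column(i):
    # Assumed naming scheme: "value", "value.1", "value.2", ...
    return "value" if i == 0 else f"value.{i}"


n = 2  # number of dataframes to create
series = {}
for i in range(n):
    # .copy() so that insert() does not act on a slice of df_new
    series[i] = df_new[["hour", "minute", value_column(i)]].copy()
    # pass the integer i, not the string 'i'
    series[i].insert(0, "number", i)
```

Note that range(1) only yields 0, so the loop needs range(n) to produce n dataframes.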

Get count vectorizer vocabulary in new dataframe column by applying vectorizer on existing dataframe column using pandas

I have a dataframe column 'review' with content like 'Food was Awesome', and I want a new column that counts the number of repetitions of each word.
name      The First Years Massaging Action Teether
review    A favorite in our house!
rating    5
Name: 269, dtype: object
Expecting output like ['Food':1,'was':1,'Awesome':1]
I tried a for loop, but it's taking too long to execute:
for row in range(products.shape[0]):
    try:
        count_vect.fit_transform([products['review_without_punctuation'][row]])
        products['word_count'][row] = count_vect.vocabulary_
    except:
        print(row)
I would like to do it without a for loop.

I found a solution for this.
I defined a function like this:
def Vectorize(text):
    try:
        count_vect.fit_transform([text])
        return count_vect.vocabulary_
    except:
        return -1
and applied it:
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
products['word_count'] = products['review_without_punctuation'].apply(Vectorize)
This solution worked and I got the vocabulary in a new column.

You can get the count vector for all docs like this:
cv = CountVectorizer()
count_vectors = cv.fit_transform(products['review_without_punctuation'])
To get the count vector in array format for a particular document by index, say, the 1st doc,
count_vectors[0].toarray()
The vocabulary is in
cv.vocabulary_
To get the words that make up a count vector, say, for the 1st doc, use
cv.inverse_transform(count_vectors[0])
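Keep in mind that vocabulary_ maps each word to its column index in the count matrix, not to the number of times it occurs. If the goal is a per-row dictionary of word counts, a minimal sketch using collections.Counter instead of CountVectorizer is shown below. It lowercases the text and uses the same token pattern as CountVectorizer's default (words of two or more characters); the sample reviews and the helper word_counts are made up for illustration.

```python
import re
from collections import Counter

import pandas as pd

# Made-up sample data using the column name from the question.
products = pd.DataFrame({
    "review_without_punctuation": [
        "Food was Awesome",
        "A favorite in our house",
        "good good food",
    ]
})

# Words of two or more characters, like CountVectorizer's default pattern.
TOKEN = re.compile(r"\b\w\w+\b")


def word_counts(text):
    # Lowercase, tokenize, and count occurrences of each word.
    return dict(Counter(TOKEN.findall(str(text).lower())))


products["word_count"] = products["review_without_punctuation"].apply(word_counts)
```

With this sample, the third row yields a count of 2 for 'good', which a vocabulary_ lookup would not report.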

Delete rows from a pandas data frame that has a list column when one of the list values matches a value in another list column

I want to delete rows from a pandas data frame that has a list in one of its columns, when one of the list values matches a value in a list column of another data frame.
The first data frame column and the other data frame column were posted as images.
I have tried a lot of code, such as
Revdf=Revdf.drop(lambda x: [i for i in Revdf.AffiliationHistory if i in Authdf.Affiliations.values], axis=1)
or
Revdf=Revdf[~(Revdf.AffiliationHistory.isin(Authdf.Affiliations.values))]
but neither works.

There has to be an easier way, but I wrote a function for it and it works:
def remove_row(df1, x1, y1, df2, x2, y2):
    assert type(df1.loc[x1, y1]) == list, "type have to be list"
    assert type(df2.loc[x2, y2]) == list, "type have to be list"
    flag = False
    l1 = df1.loc[x1, y1]
    print(l1)
    l2 = df2.loc[x2, y2]
    print(l2)
    # drop the row as soon as the two lists share one element
    for i in l1:
        if i in l2:
            flag = True
            break
    if flag:
        return df1.drop(x1)
    else:
        return df1
Here x is the row index and y is the column name. I tried it on synthetic data and it works:
df1 = pd.DataFrame({'col1': [0, 0, 0, 0, 1],
                    'col2': [[1, 2, 3, 4], 0, 0, 0, 0]})
df2 = pd.DataFrame({'col1': [0, 0, 0, 0],
                    'col2': [[0, 0, 0, 4], 0, 0, 0]})
remove_row(df1, 0, 'col2', df2, 0, 'col2')
Also, I think a mistake you're making is this:
[1,2,3,4] in [0,1,2,3,4]
returns False, because it asks whether the second list contains the first list as a single element.
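To filter the whole frame at once rather than one row at a time, a minimal sketch follows. It assumes every cell of Revdf.AffiliationHistory and Authdf.Affiliations holds a list, and that a row should be dropped if it shares any value with any list in the other frame. The sample data and the Reviewer column are made up, since the original columns were only posted as images.

```python
import pandas as pd

# Made-up sample data; only the two list column names come from the question.
Revdf = pd.DataFrame({
    "Reviewer": ["r1", "r2", "r3"],
    "AffiliationHistory": [["MIT", "ETH"], ["Oxford"], ["CMU", "MIT"]],
})
Authdf = pd.DataFrame({"Affiliations": [["MIT"], ["Stanford"]]})

# Flatten all author affiliations into one set for fast membership tests.
blocked = set(Authdf["Affiliations"].explode().dropna())

# Keep a row only if its list shares no element with the blocked set.
keep = Revdf["AffiliationHistory"].apply(lambda lst: blocked.isdisjoint(lst))
filtered = Revdf[keep]
```

isin does not help here because it compares each cell as a whole, while the check needed is element-wise inside each list.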

Python3.4 Pandas DataFrame from function

I wrote a function that outputs selected data from a parsing function. I am trying to put this information into a DataFrame using pandas.DataFrame, but I am having trouble.
The headers are listed below, as well as the function.head() data output.
QUESTION
How can I place the function output in a pandas DataFrame so that the headers are linked to the output?
HEADERS
--TICK---------NI----------CAPEXP----------GW---------------OE---------------RE-------
OUTPUT
['MMM', ['4,956,000'], ['(1,493,000)'], ['7,050,000'], ['13,109,000'], ['34,317,000']]
['ABT', ['2,284,000'], ['(1,077,000)'], ['10,067,000'], ['21,526,000'], ['22,874,000']]
['ABBV', ['1,774,000'], ['(612,000)'], ['5,862,000'], ['1,742,000'], ['535,000']]

- Loop through each item (I'm assuming data is a list with each element being one of the lists shown above)
- Take the first element as the ticker and convert the rest into numbers, using translate to undo the string formatting
- Make a DataFrame per row, concat them all at the end, then transpose
- Set the columns by parsing the header string (I've called it headers)
dflist = list()
for x in data:
    h = x[0]
    # map '(' to '-' and drop ')' and ',' so '(1,493,000)' becomes '-1493000'
    rest = [float(z[0].translate(str.maketrans('(', '-', '),'))) for z in x[1:]]
    dflist.append(pd.DataFrame([h] + rest))
df = pd.concat(dflist, axis=1).T
df.columns = [x for x in headers.split('-') if len(x) > 0]
But this might be a bit slow. It would be easier if you could get your input into a more consistent format.
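A possibly simpler variant is to build plain Python rows first and construct the DataFrame once, which avoids creating one DataFrame per row. This is a sketch using the sample output from the question; the names data, headers and to_number are assumptions for illustration.

```python
import pandas as pd

headers = "--TICK---------NI----------CAPEXP----------GW---------------OE---------------RE-------"
data = [
    ['MMM', ['4,956,000'], ['(1,493,000)'], ['7,050,000'], ['13,109,000'], ['34,317,000']],
    ['ABT', ['2,284,000'], ['(1,077,000)'], ['10,067,000'], ['21,526,000'], ['22,874,000']],
    ['ABBV', ['1,774,000'], ['(612,000)'], ['5,862,000'], ['1,742,000'], ['535,000']],
]


def to_number(text):
    # '(' becomes '-', while ')' and ',' are removed: '(1,493,000)' -> -1493000.0
    return float(text.translate(str.maketrans('(', '-', '),')))


# One flat row per ticker: the symbol followed by its numeric values.
rows = [[row[0]] + [to_number(cell[0]) for cell in row[1:]] for row in data]
columns = [name for name in headers.split('-') if name]
df = pd.DataFrame(rows, columns=columns)
```

Built this way, the numeric columns come out as float dtype instead of object, so they can be used in arithmetic directly.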

Apply to each element in a Pandas dataframe

Each series in the data frame holds tuples, and I need to convert each of them into a single number. Basically I have something like this:
price_table['Col1'].apply(lambda x: x[0])
But I actually need to do this for each column. x itself is a tuple, but it has only 1 number inside, so I need to return x[0] to get its "value", which is a float instead of a tuple.
In R, I would put axis = c(1,2), but here putting 2 numbers in axis doesn't seem to work:
price_table.apply(lambda x: x[0],axis = 1)
TypeError: <lambda>() got an unexpected keyword argument 'axis'
Is there any way to apply this simple function to each element in the data frame?
Thanks in advance.

For me the following works well:
price_table['Col1'].apply(lambda x: x[0], 1)
I do not use the axis keyword, though I do not know the reason it works.
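To cover every column rather than just 'Col1', one option is to map the element-wise function over each column in turn. A minimal sketch, with a made-up price_table in which every cell is a one-element tuple:

```python
import pandas as pd

# Made-up sample: every cell is a one-element tuple, as in the question.
price_table = pd.DataFrame({
    "Col1": [(1.5,), (2.5,)],
    "Col2": [(10.0,), (20.0,)],
})

# DataFrame.apply passes each column as a Series;
# Series.map then applies the function to every element of that column.
unwrapped = price_table.apply(lambda column: column.map(lambda cell: cell[0]))
```

The result has the same index and columns as price_table, with plain floats in place of the tuples.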
