Jupyter Notebook still truncating pandas columns

So, I have a pandas data frame with two fields:

partition | account_list
1         | [id1,id2,id3,...]
2         | [id4,id6,id5,...]

Since the list is quite long and I want to see the complete content, I'm using:
import pandas as pd

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)  # negative values are deprecated since pandas 1.0; use None instead
louvain_communities.limit(tot_communities).toPandas()
Nevertheless, as you can see, Jupyter (I think) still truncates the column. (I had to edit out the data for privacy.)
Is there a way to fix this? I really need the complete list to be shown, not truncated.

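Setting max_colwidth and max_seq_items together works. display.max_seq_items (default 100) caps how many items of a list-like cell value are printed, with "..." marking the rest, so raising max_colwidth alone is not enough for long lists. The code below synthesises a list with 500 items; change the range() to test however many you want.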
import pandas as pd
pd.set_option("max_colwidth", None)
pd.set_option("max_seq_items", None)
pd.DataFrame([{"partition":i, "account_list":[f"id{j}" for j in range(500)]} for i in range(2)])
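If you would rather not change the global settings, pandas' built-in pd.option_context applies the same two options only inside a with-block. A minimal sketch reusing the synthetic data above; in a notebook you would call display(df) inside the block, whereas here to_string() is used so the effect can be checked as plain text.

```python
import pandas as pd

# Same synthetic data as above: two rows, each holding a 500-item list.
df = pd.DataFrame(
    [{"partition": i, "account_list": [f"id{j}" for j in range(500)]} for i in range(2)]
)

# option_context applies the settings only inside the with-block,
# so the global display options are left untouched afterwards.
with pd.option_context("display.max_colwidth", None, "display.max_seq_items", None):
    full_text = df.to_string()

# Outside the block the default max_seq_items (100) applies again,
# so each list is cut off with "..." after its first 100 items.
truncated_text = df.to_string()

print("id499" in full_text)
print("id499" in truncated_text)
```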


In Jupyter, how do you update a displayed pandas table in place (hiding the previous one) as you are updating it?

In Jupyter, I am running a long-running computation.
I want to show a Pandas table with the top 25 rows. The top 25 may update each iteration.
However, I don't want to show many Pandas tables. I want to delete / update the existing displayed Pandas table.
How is this possible?
This approach seems to work for matplotlib plots but not for pandas pretty tables.
You can use clear_output and display the dataframe:
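import time  # only used to simulate the delay

import numpy as np
import pandas as pd
from IPython.display import display, clear_output

i = 1
while i < 10:
    time.sleep(1)
    df = pd.DataFrame(np.random.rand(4, 3))
    clear_output(wait=True)  # wait=True: clear only when the new output arrives, reducing flicker
    display(df)
    i += 1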
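clear_output(wait=True) wipes everything the cell has printed so far. If the cell also prints progress messages you want to keep, IPython's display() accepts display_id=True and then returns a DisplayHandle whose update() method redraws just that one output. A sketch under these assumptions: a recent IPython, a hypothetical numeric "score" column that ranks the rows, and random data standing in for the real computation.

```python
import time

import numpy as np
import pandas as pd


def top_n(results: pd.DataFrame, n: int = 25) -> pd.DataFrame:
    """Return the n highest-scoring rows ('score' is a hypothetical column)."""
    return results.nlargest(n, "score")


def run_with_live_table(iterations: int = 10, delay: float = 1.0) -> None:
    # Imported here so top_n() can be used without IPython installed.
    from IPython.display import display

    rng = np.random.default_rng()
    results = None
    handle = None
    for i in range(iterations):
        time.sleep(delay)  # stand-in for one step of the long-running computation
        batch = pd.DataFrame({
            "item": np.arange(i * 10, i * 10 + 10),
            "score": rng.random(10),
        })
        results = batch if results is None else pd.concat([results, batch], ignore_index=True)
        table = top_n(results)
        if handle is None:
            # First iteration: create the output and keep a handle to it.
            handle = display(table, display_id=True)
        else:
            # Later iterations: replace that same output in place.
            handle.update(table)
```

In a notebook cell, call run_with_live_table(). The `handle is None` check also keeps the loop from failing where display() does not return a handle.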

Display pandas dataframe output in RStudio

One question please.
I like to use RStudio to code in Python and R, but when I print a pandas dataframe the output doesn't use all the available space. It is not very readable, and it gets worse with more variables.
As shown in the attached image.
Is there a way to display the columns to the right, like we do with a tibble in R?
Thanks!
I have tried using these options, but they don't work:
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
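No answer was given here, but one thing worth trying, assuming the narrow output comes from pandas' own line wrapping: the pandas documentation notes that display.width = None means "auto-detect", which only works reliably when Python runs in a real terminal, so in RStudio the detected width may be much narrower than the pane. Passing an explicit, larger integer makes pandas keep more columns side by side. The sketch compares two widths; 60 and 250 are arbitrary example values.

```python
import numpy as np
import pandas as pd

# Ten columns of numbers: wider than 80 characters when printed.
df = pd.DataFrame(
    np.arange(30).reshape(3, 10) * 1000,
    columns=[f"variable_{i}" for i in range(10)],
)

# Narrow width: pandas splits the columns into stacked blocks.
with pd.option_context("display.width", 60, "display.max_columns", None):
    narrow = repr(df)

# Explicit, larger width: each row stays on one line.
with pd.option_context("display.width", 250, "display.max_columns", None):
    wide = repr(df)

print(narrow)
print(wide)
```

To make it stick for the session, call pd.set_option('display.width', 250) (or whatever fits the console) instead of using option_context.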

Pandas Dataframe column width setting does not work

I used these settings to display the entire dataframe:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
However, the columns with text are displayed in a very narrow mode, around 20 characters wide. The complete text is displayed, with no truncation. I formerly used pd.set_option('display.max_colwidth', -1) or set a character limit, e.g. pd.set_option('display.max_colwidth', 450), which worked fine and adjusted the column width accordingly. Any ideas what went wrong this time?
Help is very much appreciated, because the results I get are VERY long and narrow columns :(
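The question was left unanswered. If the full text is present but the notebook wraps it inside narrow cells, that wrapping is done by the browser's HTML rendering, which display.max_colwidth does not control. One possible workaround, sketched under that assumption: render the frame with to_html() and prepend a CSS rule, scoped to that one table through to_html's table_id argument, that turns off wrapping. The id "nowrap-table" and the helper nowrap_html are hypothetical names.

```python
import pandas as pd


def nowrap_html(df: pd.DataFrame, table_id: str = "nowrap-table") -> str:
    """Return df as an HTML table plus a CSS rule, scoped to that table, that disables wrapping."""
    style = f"<style>#{table_id} td, #{table_id} th {{ white-space: nowrap; }}</style>"
    return style + df.to_html(table_id=table_id)


df = pd.DataFrame({"text": ["a fairly long sentence " * 5, "short"]})
html = nowrap_html(df)

# In a notebook cell:
# from IPython.display import HTML, display
# display(HTML(html))
```

Give each table its own table_id if several are shown in the same notebook, so the rules do not collide.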

Holoviz panel will not print pandas dataframe row in Jupyter notebook

I'm trying to recreate the first panel.interact example in the Holoviz tutorial using a Pandas dataframe instead of a Dask dataframe. I get the slider, but the pandas dataframe row does not show.
See the original example at: http://holoviz.org/tutorial/Building_Panels.html
I've tried using Dask as in the Holoviz example. Dask rows print out just fine, which suggests that panel treats Dask dataframe rows differently from pandas dataframe rows when printing. Here's my minimal code:
import pandas as pd
import panel
l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})
def select_row(rowno=0):
    row = df.loc[rowno]
    return row
panel.extension()
panel.extension('katex')
panel.interact(select_row, rowno=(0, 5))
I've included a line with the katex extension, because without it, I get a warning that it is needed. Without it, I don't even get the slider.
I can call the select_row(rowno=0) function separately in a Jupyter cell and get a nice printout of the row, so it appears the function is working as it should.
Any help in getting this to work would be most appreciated. Thanks.
Got a solution. With pandas, loc[rowno:rowno] returns a pandas.core.frame.DataFrame object of length 1, which works fine with panel, while loc[rowno] returns a pandas.core.series.Series object, which does not work as well. (A loc slice includes both end labels, which is why rowno:rowno selects exactly one row.) Modifying the select_row() function like this makes it all work:
def select_row(rowno=0):
    row = df.loc[rowno:rowno]  # label slice, both ends inclusive: a one-row DataFrame
    return row
I'm still not sure, however, why panel prints the DataFrame object but not the Series object.
Note: if you use iloc, you need to add 1 to the end of the slice, i.e. df.iloc[rowno:rowno+1], because iloc slices exclude the end position.
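The difference comes down to what each indexer returns. A small check reusing the question's data (with its default integer index, labels and positions coincide). Passing a list of labels, df.loc[[rowno]], is another standard way to get a one-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "b", "c", "d", "a", "b"], "val": [1, 2, 3, 4, 5, 6]})
rowno = 2

as_series = df.loc[rowno]               # single label -> Series (one entry per column)
by_label_slice = df.loc[rowno:rowno]    # label slice, end inclusive -> 1-row DataFrame
by_position = df.iloc[rowno:rowno + 1]  # positional slice, end exclusive -> 1-row DataFrame
by_label_list = df.loc[[rowno]]         # list of labels -> 1-row DataFrame

for name, obj in [("loc[r]", as_series), ("loc[r:r]", by_label_slice),
                  ("iloc[r:r+1]", by_position), ("loc[[r]]", by_label_list)]:
    print(f"{name:12} {type(obj).__name__}")
```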

Can I request that all pandas columns be displayed in an IPython notebook?

By default, the number of columns displayed by pandas commands is limited to display.max_columns. Is there something like df.showall() that can be used to override this on a per-command basis?
The simplest way I have found to do this is the following:
from IPython.display import HTML  # public import path for the HTML display class
HTML(df.to_html())
This displays the whole table, all rows and columns, in the IPython notebook output cell. Scrollbars will appear for large tables.
To display all columns but no more than N rows, use:
HTML(df.head(N).to_html())
You can define a context manager:
from contextlib import contextmanager

import pandas as pd

@contextmanager
def temp_option(option, value):
    old_value = pd.get_option(option)
    pd.set_option(option, value)
    try:
        yield
    finally:
        # restore the previous value even if the body raises
        pd.set_option(option, old_value)
Then use it for whatever you want, something like:
with temp_option('display.max_rows', 200):
    print(df)
I thought pandas already had this feature, but I couldn't find it.
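pandas does in fact ship this as pd.option_context, which takes option/value pairs and restores the previous values when the block exits, even if an exception is raised. A minimal sketch for the original per-command question, using a hypothetical 30-column frame; in a notebook, call display(df) inside the block, because a bare df on the last line of a with-block is not auto-displayed.

```python
import numpy as np
import pandas as pd

# A hypothetical frame with more columns than pandas shows by default.
df = pd.DataFrame(np.zeros((3, 30)), columns=[f"c{i}" for i in range(30)])

before = pd.get_option("display.max_columns")
with pd.option_context("display.max_columns", None):
    # Inside the block every column is printed; in a notebook use display(df) here.
    inside = pd.get_option("display.max_columns")
    text = repr(df)
after = pd.get_option("display.max_columns")  # back to the previous value
```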