Text gets truncated while using streamlit.write(df) - pandas

I'm using streamlit.write(df) to display a DataFrame, but long text in the cells is not fully displayed. Here is a short example of the situation:
import pandas as pd
import streamlit as st
df = pd.DataFrame({'col1': [1, 2, 3],
                   'col2': ['This is some large text that will not be completely displayed, '
                            'need to add line breaks or something.',
                            'short text',
                            'another piece of text.']})
st.write(df)
In the output, the long string is cut off. Ideally I would add line breaks, but that did not work for me.

You can use st.table instead. It renders a static table that shows the full contents of each cell (https://docs.streamlit.io/library/api-reference/data/st.table):
st.table(df)

Related

Using ipysheet, how do I adjust the column width for an index column?

I am trying to work through the simple ipysheet example below. I have not been able to find a way to widen the column that holds the date index in the resulting sheet in a Jupyter Notebook.
import ipysheet as ip
import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
sheet = ip.from_dataframe(df)
display(sheet)
I'm having the same problem, and don't see anything in the code that indicates this is possible.
You can see in the ipysheet code that Sheet extends ipywidgets.DOMWidget, which in turn extends ipywidgets.Widget. You might be able to pass some custom CSS into the constructor to make it render the way you want, but that's going to take a lot of research.
The workaround is to reset the index on your DataFrame so that dates is just a column.
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df = df.reset_index()
sheet = ip.from_dataframe(df)
Not exactly what you want, but for my purposes it's definitely good enough.
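The sketch below uses pandas only, since ipysheet may not be installed. It shows what the workaround does to the frame. reset_index() moves the DatetimeIndex into an ordinary column named "index". Naming the axis first with rename_axis gives that column a friendlier header; "date" is an arbitrary name chosen for illustration. Either frame can then be passed to ip.from_dataframe as above. This only changes the frame and does not change ipysheet's column widths.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# reset_index() turns the unnamed DatetimeIndex into a regular column called "index"
flat = df.reset_index()
print(list(flat.columns))  # ['index', 'A', 'B', 'C', 'D']

# Naming the index first gives the new column a more readable header
named = df.rename_axis("date").reset_index()
print(list(named.columns))  # ['date', 'A', 'B', 'C', 'D']
```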

How to show the titles of subplots on box plots done using Groupby?

I just created box plots using a groupby, but I'm having trouble including the title of each box plot. To clarify, I don't want to set the subplot titles manually. I would like them to be displayed automatically, because right now I get all of the plots but have no idea which one belongs to which group.
Here's the code I'm using:
gt_venta_precio_zona = gt_venta[['Precio USD','Zona']]
gt_venta_precio_zona.groupby('Zona').plot.box(fontsize=20,rot=90,figsize=(12,8),return_type='axes',patch_artist=True)
Any help will be highly appreciated.
Thank you in advance!
You can save the grouped data and then iterate over the groups, supplying the group name (key in groups.keys()) as title to the plot function:
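import numpy as np
import pandas as pd
# pd.np was removed in pandas 2.0, so import numpy directly
df = pd.DataFrame({'col1': np.repeat(list('ABCD'), 50), 'col2': np.random.random(200)})
grp = df.groupby('col1')
for key in grp.groups.keys():
    # one box plot per group, titled with the group name
    grp.get_group(key).plot.box(title=key)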
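The loop above opens a separate figure for each group. If you would rather keep all groups side by side in one figure, as in the question, you can draw each group into its own subplot and title it with the group key. The sketch below uses the example frame from this answer. For the question's data you would group by 'Zona' and box-plot 'Precio USD'. The Agg backend line is only there so the script runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": np.repeat(list("ABCD"), 50), "col2": np.random.random(200)})
groups = df.groupby("col1")

# One subplot per group in a single row; squeeze=False keeps axes 2-D even for one group
fig, axes = plt.subplots(1, groups.ngroups, figsize=(12, 4), sharey=True, squeeze=False)
for ax, (key, group) in zip(axes[0], groups):
    group["col2"].plot.box(ax=ax)
    ax.set_title(key)  # the group name becomes the subplot title
```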

Cannot create bar plot with pandas

I am trying to create a bar plot using pandas. I have the following code:
import pandas as pd
indexes = ['Strongly agree', 'Agree', 'Neutral', 'Disagree', 'Strongly disagree']
df = pd.DataFrame({'Q7': [10, 11, 1, 0, 0]}, index=indexes)
df.plot.bar(indexes, df['Q7'].values)
By my reckoning this should work but I get a weird KeyError: 'Strongly agree' thrown at me. I can't figure out why this won't work.
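When you call plot as a pandas method, the x and y arguments are interpreted as column labels (or positions) in df, not as arrays of values. Passing the list indexes as x therefore makes pandas look for columns named 'Strongly agree', 'Agree', and so on. Those columns don't exist, hence the KeyError.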
The way you have it set up, with index=indexes, your bar plot's x values are stored in df.index. That's why Wen's suggestion in the comments to just use df.plot.bar() will work, as Pandas automatically looks to use df.index as the x-axis in this case.
Alternately, you can specify column names for x and y. In this case, you can move indexes into a column with reset_index() and then call the new index column explicitly:
df.reset_index().plot.bar(x="index", y="Q7")
Either approach yields the correct plot.
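Here are both approaches from this answer as one self-contained script. The Agg backend is only set so it runs without a display. The first call lets pandas use the index as the x-axis. The second names the columns explicitly after reset_index(), which moves the unnamed index into a column called "index".

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import pandas as pd

indexes = ['Strongly agree', 'Agree', 'Neutral', 'Disagree', 'Strongly disagree']
df = pd.DataFrame({'Q7': [10, 11, 1, 0, 0]}, index=indexes)

# Approach 1: pandas uses df.index for the x-axis automatically
ax1 = df.plot.bar()

# Approach 2: move the index into a column and name x and y as column labels
ax2 = df.reset_index().plot.bar(x="index", y="Q7")
```

Both calls return a matplotlib Axes. Each Axes holds one bar per answer category.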

matplotlib scatter plot: How to use the data= argument

The matplotlib documentation for scatter() states:
In addition to the above described arguments, this function can take a data keyword argument. If such a data argument is given, the following arguments are replaced by data[<arg>]:
All arguments with the following names: ‘s’, ‘color’, ‘y’, ‘c’, ‘linewidths’, ‘facecolor’, ‘facecolors’, ‘x’, ‘edgecolors’.
However, I cannot figure out how to get this to work.
A minimal example:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.random(size=(3, 2))
props = {'c': ['r', 'g', 'b'],
's': [50, 100, 20],
'edgecolor': ['b', 'g', 'r']}
plt.scatter(data[:, 0], data[:, 1], data=props)
plt.show()
produces a plot with the default colors and sizes instead of the supplied ones.
Has anyone used this functionality?
This seems to be an overlooked feature added about two years ago. The release notes have a short example (https://matplotlib.org/users/prev_whats_new/whats_new_1.5.html#working-with-labeled-data-like-pandas-dataframes). Besides this question and a short blog post (https://tomaugspurger.github.io/modern-6-visualization.html), that's all I could find.
Basically, any dict-like object ("labeled data", as the docs call it) is passed in the data argument, and plot parameters are specified by its keys. For example, you can create a structured array with fields a, b, and c:
coords = np.random.randn(250, 3).view(dtype=[('a', float), ('b', float), ('c', float)])
You would normally create a plot of a vs b using
pyplot.plot(coords['a'], coords['b'], 'x')
but using the data argument it can be done with
pyplot.plot('a', 'b','x', data=coords)
The label b can be confused with a style string that sets the line color to blue, but the third argument clears up that ambiguity. It's not limited to x and y data either:
pyplot.scatter(x='a', y='b', c='c', data=coords)
This sets the point color based on field 'c'.
It looks like this feature was added for pandas DataFrames, and it handles them better than other objects. It also seems to be poorly documented and somewhat unstable: using x and y keyword arguments fails with the plot command but works fine with scatter, and the error messages are not helpful. That being said, it gives a nice shorthand when the data you want to plot has labels.
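Applied to the question, only string arguments are looked up in data. In the original call, x and y are arrays and s and c are not passed at all, so there is nothing to replace and props has no effect. Below is a minimal sketch of the intended usage, with every argument given as a key into the dict. The key names "x", "y", "s", "c" and "ec" are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script
import matplotlib.pyplot as plt
import numpy as np

xy = np.random.random(size=(3, 2))

# Labeled data: each key names one "column" of per-point values
props = {
    "x": xy[:, 0],
    "y": xy[:, 1],
    "s": [50, 100, 20],
    "c": ["r", "g", "b"],
    "ec": ["b", "g", "r"],
}

# Each argument is a *string*; scatter replaces it with props[<string>]
pc = plt.scatter("x", "y", s="s", c="c", edgecolors="ec", data=props)
```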
In reference to your example, I think the following does what you want:
plt.scatter(data[:, 0], data[:, 1], **props)
That bit in the docs is confusing to me, and looking at the sources, scatter in axes/_axes.py seems to do nothing with this data argument. Remaining kwargs end up as arguments to a PathCollection, maybe there is a bug there.
You could also set these parameters after calling scatter, using the various set methods of PathCollection, e.g.:
pc = plt.scatter(data[:, 0], data[:, 1])
pc.set_sizes([500,100,200])
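Here is the setter approach as a self-contained script. It applies all three properties from the question's props dict to the PathCollection that scatter returns. The values are copied from the question, and the Agg backend line is only there to run headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script
import matplotlib.pyplot as plt
import numpy as np

data = np.random.random(size=(3, 2))
pc = plt.scatter(data[:, 0], data[:, 1])  # starts with the default size and color

# Apply the per-point styling afterwards with PathCollection setters
pc.set_sizes([50, 100, 20])
pc.set_facecolor(["r", "g", "b"])
pc.set_edgecolor(["b", "g", "r"])
```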

Adding Arbitrary points on pandas time series using Dataframe.plot function

I have been trying to plot some time series graphs using the pandas DataFrame plot function. I want to add markers at some arbitrary points on the plot to show anomalous points. The code I used:
df1 = pd.DataFrame({'Entropy Values' : MeanValues}, index=DateRange)
df1.plot(linestyle = '-')
I have a list of dates at which I need to add markers, such as:
Dates = ['15:45:00', '15:50:00', '15:55:00', '16:00:00']
I had a look at matplotlib: Set markers for individual points on a line. Does DataFrame.plot have similar functionality?
I really appreciate the help. Thanks!
DataFrame.plot passes all keyword arguments it does not recognize to the matplotlib plotting method. To put markers at a few points in the plot you can use the markevery argument. Here is an example:
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': range(10)}).set_index('A')
df.plot(linestyle='-', markevery=[1, 5, 7, 8], marker='o', markerfacecolor='r')
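In your case, note that markevery takes positions (for example a list of integers), not index labels, so the times first need to be converted to row positions. Assuming df1 has a DatetimeIndex, something like this should work:
positions = [i for i, t in enumerate(df1.index.strftime('%H:%M:%S')) if t in Dates]
df1.plot(linestyle='-', markevery=positions, marker='o', markerfacecolor='r')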
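Here is a self-contained sketch of the same idea. The date, the 5-minute spacing and the random values are hypothetical stand-ins for the question's DateRange and MeanValues. Each time-of-day string in Dates is mapped to its row position, and that list of positions is what markevery receives.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script
import numpy as np
import pandas as pd

# Hypothetical data: one value every 5 minutes on an arbitrary day
DateRange = pd.date_range("2018-01-01 15:30", periods=12, freq="5min")
MeanValues = np.random.default_rng(0).random(12)
df1 = pd.DataFrame({"Entropy Values": MeanValues}, index=DateRange)

Dates = ["15:45:00", "15:50:00", "15:55:00", "16:00:00"]

# markevery indexes the plotted points, so translate the times into row positions
times = df1.index.strftime("%H:%M:%S")
positions = [i for i, t in enumerate(times) if t in Dates]

ax = df1.plot(linestyle="-", markevery=positions, marker="o", markerfacecolor="r")
```

A plain list of ints is used rather than a NumPy boolean mask. Some older matplotlib versions compare the new markevery value with the old one when it is set, and that comparison fails for arrays.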