Problem after groupby (pandas), the grouped column is not accessible - pandas

I have a problem after groupby and receive this error message:
Traceback (most recent call last):
File "C:\Users\User\PycharmProjects\HashTag_Curso\venv\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Ano'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/User/PycharmProjects/Bibliotecas/Exemplo.py", line 11, in <module>
x = dfg['Ano']
File "C:\Users\User\PycharmProjects\HashTag_Curso\venv\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\User\PycharmProjects\HashTag_Curso\venv\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'Ano'
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
from astropy.stats import biweight_midcorrelation as bw_cor
df = pd.read_csv(r'Bases_dados\D_1_4M\Tudo/combined.csv').iloc[:100000]
df['Ano'] = df['Data decimal']//1
dfg = df.groupby(by=["Ano"]).mean()
print(dfg)
x = dfg['Ano']
y = dfg['Lances']
r = np.corrcoef(x, y)[0][1]
bwr = bw_cor(x, y)
print(bwr, r)
plt.scatter(x, y)
plt.show()
If I use
x = df['Ano']
y = df['Lances']
it works fine, but with dfg (grouped by 'Ano') I get the error message above.
When I print(dfg), the column "Ano" appears normally.

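groupby moves 'Ano' into the index of the result, so it is no longer a column and dfg['Ano'] raises KeyError (print(dfg) still shows it, but as the index label). You can either call reset_index() on the result or pass as_index=False to groupby to begin with: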
dfg = df.groupby(by="Ano", as_index=False).mean()
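To make the difference visible, here is a minimal sketch on a tiny made-up frame. Only the column names come from the question; the values are hypothetical stand-ins for the CSV.

```python
import pandas as pd

# Hypothetical stand-in for combined.csv; only the column names match the question.
df = pd.DataFrame({
    "Data decimal": [2000.1, 2000.7, 2001.3, 2001.9],
    "Lances": [10.0, 20.0, 30.0, 50.0],
})
df["Ano"] = df["Data decimal"] // 1

# Default behaviour: the group key becomes the index, not a column.
dfg = df.groupby("Ano").mean()
print("Ano" in dfg.columns)  # False
print(dfg.index.name)        # Ano

# Option 1: turn the index back into an ordinary column.
dfg_reset = dfg.reset_index()

# Option 2: keep the key as a column from the start.
dfg_flat = df.groupby("Ano", as_index=False).mean()

x = dfg_flat["Ano"]
y = dfg_flat["Lances"]
```

If you would rather keep the default result, dfg.index gives the same values as the 'Ano' column would.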

Related

What do I need in order to save animation videos from matplotlib in mp3 format?

I am using python3.8 on Linux Mint 19.3, and I am trying to save an animation created by a cellular automata model in matplotlib. My actual code for the model is private, but it uses the same code for saving the animation as the code shown below, which is a slight modification of one of the examples shown in the official matplotlib documentation:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
fig, ax = plt.subplots()
def f(x, y):
    return np.sin(x) + np.cos(y)
x = np.linspace(0, 2 * np.pi, 120)
y = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
fig, ax = plt.subplots()
ims = []
for i in range(60):
    x += np.pi / 15.
    y += np.pi / 20.
    im = ax.imshow(f(x, y), animated=True)
    if i == 0:
        ax.imshow(f(x, y))  # show an initial one first
    ims.append([im])
ani = animation.ArtistAnimation(fig, ims, interval=50, blit=True,
                                repeat_delay=1000)
# To save the animation, use e.g.
#
# ani.save("movie.mp4")
#
# or
#
writer = animation.FFMpegWriter(fps=15, metadata=dict(artist='Me'), bitrate=1800)
ani.save("movie.mp3", writer=writer)
When executed, the code produces this error:
MovieWriter stderr:
Output file #0 does not contain any stream
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/matplotlib/animation.py", line 234, in saving
yield self
File "/usr/local/lib/python3.8/dist-packages/matplotlib/animation.py", line 1093, in save
writer.grab_frame(**savefig_kwargs)
File "/usr/local/lib/python3.8/dist-packages/matplotlib/animation.py", line 351, in grab_frame
self.fig.savefig(self._proc.stdin, format=self.frame_format,
File "/usr/local/lib/python3.8/dist-packages/matplotlib/figure.py", line 3046, in savefig
self.canvas.print_figure(fname, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/matplotlib/backend_bases.py", line 2319, in print_figure
result = print_method(
File "/usr/local/lib/python3.8/dist-packages/matplotlib/backend_bases.py", line 1648, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/matplotlib/_api/deprecation.py", line 415, in wrapper
return func(*inner_args, **inner_kwargs)
File "/usr/local/lib/python3.8/dist-packages/matplotlib/backends/backend_agg.py", line 486, in print_raw
fh.write(renderer.buffer_rgba())
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/justin/animation_test.py", line 36, in <module>
ani.save("movie.mp3", writer=writer)
File "/usr/local/lib/python3.8/dist-packages/matplotlib/animation.py", line 1093, in save
writer.grab_frame(**savefig_kwargs)
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.8/dist-packages/matplotlib/animation.py", line 236, in saving
self.finish()
File "/usr/local/lib/python3.8/dist-packages/matplotlib/animation.py", line 342, in finish
self._cleanup() # Inline _cleanup() once cleanup() is removed.
File "/usr/local/lib/python3.8/dist-packages/matplotlib/animation.py", line 373, in _cleanup
raise subprocess.CalledProcessError(
subprocess.CalledProcessError: Command '['ffmpeg', '-f', 'rawvideo', '-vcodec', 'rawvideo', '-s', '640x480', '-pix_fmt', 'rgba', '-r', '15', '-loglevel', 'error', '-i', 'pipe:', '-vcodec', 'h264', '-pix_fmt', 'yuv420p', '-b', '1800k', '-metadata', 'artist=Me', '-y', 'movie.mp3']' returned non-zero exit status 1.
I have looked at posts on similar queries concerning matplotlib animations, but none have specifically included the error Output file #0 does not contain any stream. I have little experience with ffmpeg, so I am wondering what might be missing.

Can anyone tell me the solution of this ValueError in MatPlotLib?

I get an error when plotting a random scatter plot. Can someone help me solve it?
The code is:
# Date 27-06-2021
import time
import matplotlib.pyplot as plt
import numpy as np
import random as rd
def rd_color():
    random_number = rd.randint(0, 16777215)
    hex_number = str(hex(random_number))
    hex_number = '#' + hex_number[2:]
    return hex_number
arr1_for_x = np.linspace(10, 99, 1000)
arr1_for_y = np.random.uniform(10, 99, 1000)
# print(rd_color())
for i in range(1000):
    plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
                linewidths=0, color=rd_color())
plt.show()
and the ValueError is
ValueError: 'color' kwarg must be a color or sequence of color specs. For a sequence of values to be color-mapped, use the 'c' argument instead.
Traceback (most recent call last):
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4289, in _parse_scatter_color_args
mcolors.to_rgba_array(kwcolor)
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\colors.py", line 367, in to_rgba_array
raise ValueError("Using a string of single character colors as "
ValueError: Using a string of single character colors as a color sequence is not supported. The colors can be passed as an explicit list instead.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Google Drive\tp\Programming\Python\tp2.py", line 24, in <module>
plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\pyplot.py", line 3068, in scatter
__ret = gca().scatter(
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\__init__.py", line 1361, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4516, in scatter
self._parse_scatter_color_args(
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4291, in _parse_scatter_color_args
raise ValueError(
ValueError: 'color' kwarg must be an color or sequence of color specs. For a sequence of values to be color-mapped, use the 'c' argument instead.
After this error I also tried c in place of color:
# Date 27-06-2021
import time
import matplotlib.pyplot as plt
import numpy as np
import random as rd
def rd_color():
    random_number = rd.randint(0, 16777215)
    hex_number = str(hex(random_number))
    hex_number = '#' + hex_number[2:]
    return str(hex_number)
arr1_for_x = np.linspace(10, 99, 1000)
arr1_for_y = np.random.uniform(10, 99, 1000)
# print(rd_color())
for i in range(1000):
    plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
                linewidths=0, c=rd_color())
plt.show()
but an error occurred again. This time the error is:
Traceback (most recent call last):
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4350, in _parse_scatter_color_args
colors = mcolors.to_rgba_array(c)
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\colors.py", line 367, in to_rgba_array
raise ValueError("Using a string of single character colors as "
ValueError: Using a string of single character colors as a color sequence is not supported. The colors can be passed as an explicit list instead.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Google Drive\tp\Programming\Python\tp2.py", line 22, in <module>
plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\pyplot.py", line 3068, in scatter
__ret = gca().scatter(
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\__init__.py", line 1361, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4516, in scatter
self._parse_scatter_color_args(
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4359, in _parse_scatter_color_args
raise ValueError(
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not #814aa
Could anyone resolve this or spell it out for me?
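The way I read Matplotlib's Tutorial on Specifying Colours, hex literals need exactly 6 or 3 hex digits (or 8 or 4 with an alpha channel). hex() drops leading zeros, so rd_color() sometimes returns a shorter string, such as the '#814aa' in your last error.
Passing the color as a one-element list also works, as long as the string is a valid hex color:
for i in range(1000):
    myColor = rd_color()
    plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
                linewidths=0, color=[myColor])
plt.show()
Calling a function inside the argument is not the problem. When a bare string cannot be parsed as a single color, Matplotlib reports it as an unsupported string of single-character colors, which is the first error message. Wrapping it in a list makes the intent explicit, but a malformed string like '#814aa' is still rejected.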
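One way to guarantee six digits is to zero-pad the hex string. A minimal sketch, assuming only that rd_color() should keep returning a '#rrggbb' string; the format spec :06x is my choice and not something from the question:

```python
import random

import matplotlib.colors as mcolors


def rd_color():
    """Return a random color as a 7-character '#rrggbb' string."""
    n = random.randint(0, 0xFFFFFF)
    # :06x pads with leading zeros, so 0x0814aa gives '#0814aa' rather than '#814aa'.
    return f"#{n:06x}"


colors = [rd_color() for _ in range(1000)]

# Check that Matplotlib accepts every generated string as a color.
all_valid = all(mcolors.is_color_like(c) for c in colors)
print(all_valid)
```

With a list like this you can also drop the loop and draw everything in one call, e.g. plt.scatter(arr1_for_x, arr1_for_y, s=5, linewidths=0, c=colors), which should be much faster than 1000 separate scatter calls.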

Jython ValueError: chr() arg not in range(256)

I am using Jython (jython2.7.0) to send a string value from a Java program to a Python method and then return the value to the Java program, but I get this error: ValueError: chr() arg not in range(256). Do you know what causes the problem and how I can solve it?
Exception in thread "main" Traceback (most recent call last):
File "PageRanking.py", line 9, in <module>
from bs4 import BeautifulSoup
File "C:\jython2.7.0\Lib\bs4\__init__.py", line 35, in <module>
from .builder import builder_registry, ParserRejectedMarkup
File "C:\jython2.7.0\Lib\bs4\builder\__init__.py", line 7, in <module>
from bs4.element import (
File "C:\jython2.7.0\Lib\bs4\element.py", line 10, in <module>
from bs4.dammit import EntitySubstitution
File "C:\jython2.7.0\Lib\bs4\dammit.py", line 14, in <module>
from html.entities import codepoint2name
File "C:\jython2.7.0\Lib\html\__init__.py", line 6, in <module>
from html.entities import html5 as _html5
File "C:\jython2.7.0\Lib\html\entities.py", line 2507, in <module>
entitydefs[name] = chr(codepoint)
This is my Python code
from __future__ import with_statement
from bs4 import BeautifulSoup
import requests
def pageRank(link):
    url = "https://checkpagerank.net/"
    payload = {'name':link}
    r = requests.post(url, payload)
    with open("requests_results.html", "wb") as f:
        f.write(r.content)
    with open(r'requests_results.html', "r", encoding='utf-8') as f:
        text= f.read()
    soup = BeautifulSoup(r.text, 'html.parser')
    results = soup.find_all('h2')
    SResult = results[1]
    first= SResult.contents[0]
    rankerName = first.find('b').text
    second= SResult.contents[2]
    rankervalue = second.find('b').text
    x = rankervalue[:1]
    x = int(x)
    x= x*100/10
    return x

all input arrays must have the same dimensions - python using append

import numpy as np
x = [1,2,3,4]
y = [[5,6,7,8],[9,0,1,2]]
j = np.append(x,y,axis=0)
Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\site-packages\numpy\lib\function_base.py", line 5147, in append
return concatenate((arr, values), axis=axis)
ValueError: all the input arrays must have same number of dimensions
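The error comes from the shapes: x is 1-D with shape (4,), y is 2-D with shape (2, 4), and when axis is given np.append passes both to concatenate, which requires the same number of dimensions. A minimal sketch of some ways around it, assuming the goal is to add x as an extra row on top of y (the question does not say):

```python
import numpy as np

x = [1, 2, 3, 4]
y = [[5, 6, 7, 8], [9, 0, 1, 2]]

# Wrap x in a list so it becomes a 1x4 array with the same number of dimensions as y.
j = np.append([x], y, axis=0)
print(j.shape)  # (3, 4)

# np.vstack promotes 1-D inputs to rows on its own.
k = np.vstack([x, y])

# Without axis, np.append flattens both inputs before joining them.
flat = np.append(x, y)
print(flat.shape)  # (12,)
```

Here j and k hold the same 3x4 array; which spelling to use is mostly a matter of taste.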

Pandas HDF5 append time series fails

Going through the documentation of pandas HDF5 usability (http://pandas.pydata.org/pandas-docs/stable/io.html#io-hdf5) the given example raises an error:
import pandas as pd
import numpy as np
store = pd.HDFStore('store.h5')
np.random.seed(1234)
index = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 3), index=index)
store['df'] = df
df1 = df[0:4]
df2 = df[4:]
store.append('df', df1)
store.append('df', df2)
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-225-ef7f2e059c6a>", line 1, in <module>
store.append('df', df1)
File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 919, in append
**kwargs)
File "C:\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 1252, in _write_to_group
raise ValueError('Can only append to Tables')
ValueError: Can only append to Tables
Has something changed here? Or am I doing something wrong?
store['df'] = df writes the frame in the default fixed format, which behaves like a plain DataFrame dump and cannot be appended to. To have the store use the appendable table format by default, set the following option at the beginning:
pd.set_option('io.hdf.default_format','table')