TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid. It was supposed to work with df.append - pandas

df = pd.DataFrame(columns=['locale', 'description'])
for text in texts:
    df = pd.concat(
        dict(
            locale=text.locale,
            description=text.description
        ),
        ignore_index=True
    )
Is there a workaround for this? It was supposed to work with df.append, but that now emits FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
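One possible workaround, sketched under assumptions: pd.concat only accepts Series and DataFrame objects (or a list of them), so a plain dict has to be turned into a DataFrame first. The Text class and the texts list below are hypothetical stand-ins for the objects in the question.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Text:
    # Hypothetical stand-in for the objects iterated over in the question.
    locale: str
    description: str


texts = [Text("en", "hello"), Text("fr", "bonjour")]

# Option 1: collect plain dicts, then build the DataFrame in one go.
rows = [{"locale": t.locale, "description": t.description} for t in texts]
df = pd.DataFrame(rows, columns=["locale", "description"])

# Option 2: wrap each dict in a one-row DataFrame and pass the
# list of frames to pd.concat in a single call.
frames = [
    pd.DataFrame([{"locale": t.locale, "description": t.description}])
    for t in texts
]
df_concat = pd.concat(frames, ignore_index=True)
```

Option 1 is usually the simpler choice when all rows are known up front, since it avoids creating many small intermediate frames.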

Related

How to work around the frame.append method is deprecated use pandas.concat instead pandas error

df = pd.DataFrame()
c = WebSocketClient(api_key='APIKEYHERE', feed='socket.polygon.io', market='crypto', subscriptions=["XT.BTC-USD"])
def handle_msg(msgs: List[WebSocketMessage]):
    global df
    df = df.append(msgs, ignore_index=True)
    print(df)
c.run(handle_msg)
I have a WebSocket client open through polygon.io. When I run this I get exactly what I want, but I also get a warning that frame.append is deprecated and that I should use pandas.concat instead. Unfortunately, my little fragile brain has no idea how to do this.
I tried doing df = pd.concat(msgs, ignore_index=True) but get TypeError: cannot concatenate object of type '<class 'polygon.websocket.models.models.CryptoTrade'>';
Thanks for any help
To use pandas.concat instead of DataFrame.append, you need to convert the WebSocketMessage objects in the msgs list to a DataFrame and then concatenate them. Here's an example:
def handle_msg(msgs: List[WebSocketMessage]):
    global df
    msgs_df = pd.DataFrame([msg.to_dict() for msg in msgs])
    df = pd.concat([df, msgs_df], ignore_index=True)
    print(df)
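This code converts each WebSocketMessage object in the msgs list to a dictionary using msg.to_dict(), builds a DataFrame from the list of dictionaries, and then concatenates that DataFrame with the existing df using pd.concat. It assumes the message objects provide a to_dict() method; if they do not, build each dictionary from the object's attributes instead.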
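Here is a self-contained sketch of the same idea that runs without a WebSocket connection. FakeTrade is a hypothetical dataclass, not the polygon message class, and dataclasses.asdict takes the place of a to_dict() method. As a variation on the code above, it stores each batch in a list and concatenates on demand, which avoids copying the whole frame on every message.

```python
from dataclasses import asdict, dataclass
from typing import List

import pandas as pd


@dataclass
class FakeTrade:
    # Hypothetical stand-in for a WebSocket message object.
    pair: str
    price: float
    size: float


# Each callback invocation stores one DataFrame here.
batches: List[pd.DataFrame] = []


def handle_msg(msgs: List[FakeTrade]) -> None:
    """Convert a batch of messages to a DataFrame and store it."""
    batches.append(pd.DataFrame([asdict(m) for m in msgs]))


def current_frame() -> pd.DataFrame:
    """Concatenate all stored batches into a single DataFrame."""
    if not batches:
        return pd.DataFrame()
    # pd.concat takes a list of DataFrames, not raw message objects.
    return pd.concat(batches, ignore_index=True)


handle_msg([FakeTrade("BTC-USD", 42000.0, 0.1)])
handle_msg([FakeTrade("BTC-USD", 42010.5, 0.2), FakeTrade("BTC-USD", 42005.0, 0.05)])
df = current_frame()
```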

Convert type object column to float

I have a table with a column named "price". The column has dtype object: it contains numbers stored as strings, plus NaN values and ? characters. I want to find the mean of this column, but first I have to remove the NaN and ? values and convert the column to float.
I am using the following code:
import pandas as pd
import numpy as np
df = pd.read_csv('Automobile_data.csv', sep = ',')
df = df.dropna('price', inplace=True)
df['price'] = df['price'].astype('int')
df['price'].mean()
But, this doesn't work. The error says:
ValueError: No axis named price for object type DataFrame
How can I solve this problem?
edit: in pandas 1.3 and earlier, you need subset=[col] wrapped in a list/array. In version 1.4 and later, you can pass a single column as a string.
You've got a few problems:
The first positional argument of df.dropna() is axis, not a column name, which is why 'price' was rejected as an axis. The axis says whether to drop rows or columns, and subset says which labels to inspect. So you want this to be (I think) df.dropna(axis='rows', subset='price').
Using inplace=True makes the call return None, so you have set df = None. You don't want to do that. If you are using inplace=True, don't assign the result; the whole line would just be df.dropna(..., inplace=True).
Better still, don't use inplace=True and just do the assignment. That is, use df = df.dropna(axis='rows', subset='price').
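The points above do not yet deal with the ? entries or the float conversion. A minimal sketch of one way to handle both, assuming a small inline sample in place of Automobile_data.csv (the sample rows are made up for illustration): pd.to_numeric with errors="coerce" turns anything non-numeric into NaN, after which dropna can remove those rows.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Automobile_data.csv.
csv_text = "make,price\naudi,13950\nbmw,?\nvolvo,\nsaab,16500\n"
df = pd.read_csv(io.StringIO(csv_text))

# Non-numeric entries such as "?" become NaN instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Drop rows whose price is missing; a list for subset works on
# older and newer pandas versions alike.
df = df.dropna(subset=["price"])

mean_price = df["price"].mean()
```

Note that mean() skips NaN by default, so the dropna step is only needed if the rows should actually be removed from the table.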

Replacing .loc method of pandas

When I try this statement, I get an error...
spy_daily.loc[fomc_events.index, "FOMC"] = spy_daily.loc[fomc_events.index, "days_since_fomc"]
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: DatetimeIndex(['2020-03-15', '2008-03-16'], dtype='datetime64[ns]', name='Date', freq=None).
Not sure how to correct it. The complete code is available here...
https://www.wrighters.io/analyzing-stock-data-events-with-pandas/
Try converting the index of your other dataframe to a list and using that list to subset your first dataframe:
rows = fomc_events.index.tolist()
spy_daily.loc[rows, "FOMC"] = spy_daily.loc[rows, "days_since_fomc"]
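If some event dates are genuinely absent from spy_daily's index (the two dates in the error message appear to fall on weekends), a list of labels may still raise the same KeyError. A hedged alternative is to keep only the labels present in both indexes with Index.intersection. The frames below are hypothetical miniatures of the ones in the question.

```python
import pandas as pd

# Hypothetical daily data; the weekend dates are absent from the index.
spy_daily = pd.DataFrame(
    {"days_since_fomc": [0, 1, 2]},
    index=pd.DatetimeIndex(["2020-03-13", "2020-03-16", "2020-03-17"], name="Date"),
)
# Hypothetical event dates; 2020-03-15 is not in spy_daily.
fomc_events = pd.DataFrame(
    {"event": ["a", "b"]},
    index=pd.DatetimeIndex(["2020-03-15", "2020-03-17"], name="Date"),
)

# Keep only the event dates that actually exist in spy_daily's index.
present = spy_daily.index.intersection(fomc_events.index)
spy_daily.loc[present, "FOMC"] = spy_daily.loc[present, "days_since_fomc"]
```

Rows that are not event dates end up with NaN in the new FOMC column. Whether silently skipping the missing dates is acceptable depends on the analysis.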

Workaround for Pandas FutureWarning when sorting a DateTimeIndex

As described here, DataFrame.sort_index() sometimes emits a FutureWarning when sorting on a DatetimeIndex. That question isn't actionable, since it contains no MCVE. Here's one:
import pandas as pd
idx = pd.DatetimeIndex(['2017-07-05 07:00:00', '2018-07-05 07:15:00','2017-07-05 07:30:00'])
df = pd.DataFrame({'C1':['a','b','c']},index=idx)
df = df.tz_localize('UTC')
df.sort_index()
The warning looks like:
FutureWarning: Converting timezone-aware DatetimeArray to
timezone-naive ndarray with 'datetime64[ns]' dtype
The stack (Pandas 0.24.1) is:
__array__, datetimes.py:358
asanyarray, numeric.py:544
nargsort, sorting.py:257
sort_index, frame.py:4795
The warning is emitted from datetimes.py, which asks to be called with a dtype argument. However, there's no way to pass that argument all the way up through nargsort; it looks like obeying datetimes.py's request would require changes to both pandas and numpy.
Reported here. In the meantime, can you think of a workaround that I've missed?
Issue confirmed for the 0.24.2 milestone. Workaround is to filter the warning, thus:
import warnings
with warnings.catch_warnings():
    # Pandas 0.24.1 emits useless warning when sorting tz-aware index
    warnings.simplefilter("ignore")
    ds = df.sort_index()
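A slightly narrower variant, as a sketch: passing category=FutureWarning to warnings.simplefilter suppresses only that category inside the with block, so other warnings raised during the sort still surface. The data is the MCVE from the question.

```python
import warnings

import pandas as pd

idx = pd.DatetimeIndex(
    ["2017-07-05 07:00:00", "2018-07-05 07:15:00", "2017-07-05 07:30:00"]
)
df = pd.DataFrame({"C1": ["a", "b", "c"]}, index=idx).tz_localize("UTC")

with warnings.catch_warnings():
    # Ignore only FutureWarning; other warning categories are unaffected.
    warnings.simplefilter("ignore", category=FutureWarning)
    ds = df.sort_index()
```

The filter is restored when the with block exits, so the suppression does not leak into the rest of the program.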

reshape is deprecated issue when I pick series from pandas Dataframe

When I try to take one Series from a DataFrame, I get this warning:
anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py:52: FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead
return getattr(obj, method)(*args, **kwds)
This is the code snippet
for idx, categories in enumerate(categorical_columns):
    ax = plt.subplot(3,3,idx+1)
    ax.set_xlabel(categories[0])
    box = [df[df[categories[0]] == atype].price for atype in categories[1]]
    ax.boxplot(box)
To avoid chained indexing, use DataFrame.loc:
box = [df.loc[df[categories[0]] == atype, 'price'] for atype in categories[1]]
To get rid of the FutureWarning, you need to upgrade pandas and matplotlib.
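A small sketch of the .loc selection on made-up data, with plotting left out so it runs with pandas alone. The column name fuel and its levels are hypothetical. If upgrading is not an option, handing plain NumPy arrays (via .values) to the plotting call may also avoid the warning, on the assumption that it is triggered by reshape being called on a Series.

```python
import pandas as pd

# Hypothetical data standing in for the question's DataFrame.
df = pd.DataFrame(
    {
        "fuel": ["gas", "diesel", "gas", "diesel"],
        "price": [10.0, 20.0, 30.0, 40.0],
    }
)
column, levels = "fuel", ["gas", "diesel"]

# One price Series per category level, selected in a single .loc call.
box = [df.loc[df[column] == level, "price"] for level in levels]

# Plain NumPy arrays, which a call such as ax.boxplot also accepts.
box_arrays = [s.values for s in box]
```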