When I try this statement, I get an error...
spy_daily.loc[fomc_events.index, "FOMC"] = spy_daily.loc[fomc_events.index, "days_since_fomc"]
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: DatetimeIndex(['2020-03-15', '2008-03-16'], dtype='datetime64[ns]', name='Date', freq=None).
Not sure how to correct it. The complete code is available here...
https://www.wrighters.io/analyzing-stock-data-events-with-pandas/
The labels pandas reports as missing ('2020-03-15' and '2008-03-16') are weekend dates, so they genuinely do not exist in spy_daily's index; converting the index to a list won't help, because .loc rejects any missing labels either way. Restrict the lookup to the labels the two frames share, e.g. with Index.intersection:
rows = fomc_events.index.intersection(spy_daily.index)
spy_daily.loc[rows, "FOMC"] = spy_daily.loc[rows, "days_since_fomc"]
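A minimal, self-contained sketch of the fix with made-up dates (the weekend date is deliberately absent from the daily frame):
import pandas as pd

spy_daily = pd.DataFrame(
    {"days_since_fomc": [0, 1, 2]},
    index=pd.DatetimeIndex(["2020-03-13", "2020-03-16", "2020-03-17"], name="Date"),
)
fomc_events = pd.DataFrame(
    {"event": ["emergency cut", "scheduled"]},
    index=pd.DatetimeIndex(["2020-03-15", "2020-03-16"], name="Date"),  # 2020-03-15 is a Sunday
)

# Keep only the event dates that actually exist in the daily index
rows = fomc_events.index.intersection(spy_daily.index)
spy_daily.loc[rows, "FOMC"] = spy_daily.loc[rows, "days_since_fomc"]
print(spy_daily)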
I have a dataframe df.
type(df) # pandas.core.frame.DataFrame
df[0] # KeyError 0
df[0:1] # Gives row 0 as expected
What is going on? I come from an R background and have done some work with Python in the past, so I thought this was possible. What am I missing?
You're close! If you use .iloc it will return what you are expecting. So in this case you would write:
first_row = df.iloc[0]
When you do df[0], pandas looks for a column named 0; a slice like df[0:1] selects rows instead, which is why the second expression worked.
reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
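A quick illustrative sketch of the distinction, on a throwaway frame:
import pandas as pd

df = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

print(df.iloc[0])  # first row by position, returned as a Series
print(df[0:1])     # first row as a one-row DataFrame: slices in [] select rows
print(df["a"])     # a scalar key in [] selects a column
# print(df[0])     # KeyError: there is no column named 0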
I'm using Pingouin.jl to test normality.
In their docs, we have
dataset = Pingouin.read_dataset("mediation")
Pingouin.normality(dataset, method="jarque_bera")
Which should return a DataFrame with normality true or false for each name in the dataset.
Currently this whole-DataFrame call is deprecated, and I'm unable to concatenate the per-column outputs (each of which works and returns a DataFrame) into a single DataFrame.
So, this is what I have so far:
function var_norm(df)
    norm = DataFrame([])
    for i in 1:length(names(df))
        push!(norm, Pingouin.normality(df[!, names(df)[i]], method="jarque_bera"))
    end
    return norm
end
The error I get:
julia> push!(norm, Pingouin.normality(df[!,names(df)[1]], method="jarque_bera"))
ERROR: ArgumentError: `push!` does not allow passing collections of type DataFrame to be pushed into a DataFrame. Only `Tuple`, `AbstractArray`, `AbstractDict`, `DataFrameRow` and `NamedTuple` are allowed.
Stacktrace:
 [1] push!(df::DataFrame, row::DataFrame; promote::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1603
 [2] push!(df::DataFrame, row::DataFrame)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:1601
 [3] top-level scope
   @ REPL[163]:1
EDIT: the push! call was not written properly in the first version of this post, but the error persists after the change. How can I reformat Pingouin's DataFrame output into a DataFrameRow?
As Pingouin.normality returns a DataFrame, you will have to iterate over its rows and push them one by one:
df = Pingouin.normality(…)
for row in eachrow(df)
    push!(norms, row)
end
If you are sure Pingouin.normality returns a DataFrame with exactly one row, you can simply write
push!(norms, only(Pingouin.normality(…)))
I received the error
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
on my code df['mean'] = df.groupby('col1').transform('mean')
I want to add the calculation as a new column.
I understood the error message and learned that I need to use .loc to solve the issue, but I don't know how to incorporate .loc into my current code.
Try the following:
df.loc[:, 'mean'] = df.groupby('col1').transform('mean')
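If df was itself produced by slicing another DataFrame, the warning can persist even with .loc. A minimal sketch (with made-up data) that makes an explicit copy of the slice first, and selects the value column so transform returns a Series:
import pandas as pd

raw = pd.DataFrame({"col1": ["a", "a", "b"], "val": [1.0, 3.0, 5.0]})

# .copy() makes the sliced frame independent of raw, so later
# assignments cannot trigger SettingWithCopyWarning
df = raw[raw["val"] > 0].copy()
df.loc[:, "mean"] = df.groupby("col1")["val"].transform("mean")
print(df)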
I am creating a dask dataframe from a pandas dataframe using the from_pandas() function. When I try to select two columns from the dask dataframe using the square brackets [[ ]], I am getting a KeyError.
According to dask documentation, the dask dataframe supports the square bracket column selection like the pandas dataframe.
import dask.dataframe as ddf  # assumption: ddf is the dask.dataframe module

# data is a pandas dataframe
dask_df = ddf.from_pandas(data, 30)
data = data[dask_df[['length', 'country']].apply(
    lambda x: myfunc(x, countries),
    meta=('Boolean'),
    axis=1
).compute()].reset_index(drop=True)
This is the error I am getting:
KeyError: "None of [Index(['length', 'country'], dtype='object')] are in the [columns]"
I was thinking that this might be something to do with providing the correct meta for the apply, but from the error it seems like the dask dataframe is not able to select the two columns, which should happen before the apply.
This works perfectly if I replace "dask_df" with "data" (the pandas df) in the apply line.
Is the index not being preserved when I am doing the from_pandas?
Try loading less data at once.
I had the same issue, but when I loaded only a subset of my data, it worked.
With the large dataset, I was able to run print(dask_df.columns) and see e.g.
Index(['apple', 'orange', 'pear'], dtype='object', name='fruit').
But when I ran dask_df.compute() I would get KeyError: "None of [Index(['apple', 'orange', 'pear'], dtype='object')] are in the [columns]".
I knew that the data set was too big for my memory, and was trying dask hoping it would just figure it out for me =) I guess I have more work to do, but in any case I am glad to be in dask!
As the error states, the columns ['length', 'country'] do not exist in dask_df. Create them first, then run your function.
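A minimal, self-contained sketch of the pattern (the column names, the predicate, and the partition count are all stand-ins for the question's myfunc setup) that checks the columns exist on the dask frame before applying:
import dask.dataframe as dd
import pandas as pd

data = pd.DataFrame({"length": [1, 2, 3], "country": ["US", "DE", "FR"]})
dask_df = dd.from_pandas(data, npartitions=2)

# Confirm the expected columns actually exist before selecting them
missing = [c for c in ("length", "country") if c not in dask_df.columns]
assert not missing, f"missing columns: {missing}"

mask = dask_df[["length", "country"]].apply(
    lambda row: row["length"] > 1,  # stand-in for the real myfunc
    axis=1,
    meta=("mask", "bool"),
).compute()
data = data[mask].reset_index(drop=True)
print(data)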
As described here, pandas' sort_index() sometimes emits a FutureWarning when sorting on a DatetimeIndex. That question isn't actionable, since it contains no MCVE. Here's one:
import pandas as pd
idx = pd.DatetimeIndex(['2017-07-05 07:00:00', '2018-07-05 07:15:00','2017-07-05 07:30:00'])
df = pd.DataFrame({'C1':['a','b','c']},index=idx)
df = df.tz_localize('UTC')
df.sort_index()
The warning looks like:
FutureWarning: Converting timezone-aware DatetimeArray to
timezone-naive ndarray with 'datetime64[ns]' dtype
The stack (Pandas 0.24.1) is:
__array__, datetimes.py:358
asanyarray, numeric.py:544
nargsort, sorting.py:257
sort_index, frame.py:4795
The warning is emitted from datetimes.py, which requests that it be called with a dtype argument. However, there's no way to force that all the way up through nargsort; it looks like obeying datetimes.py's request would require changes to both pandas and numpy.
Reported here. In the meantime, can you think of a workaround that I've missed?
Issue confirmed for the 0.24.2 milestone. In the meantime, the workaround is to filter the warning:
import warnings

with warnings.catch_warnings():
    # Pandas 0.24.1 emits a useless warning when sorting a tz-aware index
    warnings.simplefilter("ignore")
    ds = df.sort_index()
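Note that warnings.catch_warnings() restores the previous filter state when the with block exits, so the suppression is scoped to that block and warnings raised elsewhere are unaffected.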