Indexing in Function, Pandas

Indexing in Function, Pandas - pandas

I work with pandas dataframes and want to create a function with indexing.
def func(L,ra,N):
df_N = df[T_N].diff()
df[T_N] = df[T_N].iloc[df_N.index.min():df_N.index.max()]
df[T_N] = df[T_N].mean()
Value = [L,df[TR[0]].iloc[1]-(df[T_N[0]].iloc[1]-df[T_N[0]].iloc[1]),1]
return Value
For the line
df[T_N] = df[T_N].iloc[df_N.index.min():df_N.index.max()]
The error
TypeError: cannot do positional indexing on Int64Index with these indexers [nan] of type float
occurs. Does anyone know a way how I can avoid that? Is it even possible to do it that way? It works at other lines, just this one seems to be a problem.

Related

Can't convert Matrix to DataFrame JULIA

How can i convert a matrix to DataFrame in Julia?
I have an 10×2 Matrix{Any}, and when i try to convert it to a dataframe, using this:
df2 = convert(DataFrame,Xt2)
i get this error:
MethodError: Cannot `convert` an object of type Matrix{Any} to an object of type DataFrame

Try instead
df2 = DataFrame(Xt2,:auto)
You cannot use convert for this; you can use the DataFrame constructor, but then as the documentation (simply type ? DataFrame in the Julia REPL) will tell you, you need to either provide a vector of column names, or :auto to auto-generate column names.
Tangentially, I would also strongly recommend avoiding Matrix{Any} (or really anything involving Any) for any scenario where performance is at all important.

Writing data frame with object dtype to HDF5 only works after converting to string

I have a big data dataframe and I want to write it to disk for quick retrieval. I believe to_hdf(...) infers the data type of the columns and sometimes gets it wrong. I wonder what the correct way is to cope with this.
import pandas as pd
import numpy as np
length = 10
df = pd.DataFrame({"a": np.random.randint(1e7, 1e8, length),})
# df.loc[1, "a"] = "abc"
# df["a"] = df["a"].astype(str)
print(df.dtypes)
df.to_hdf("df.hdf5", key="data", format="table")
Uncommenting various lines leads me to the following.
Just filling the column with numbers will lead to a data type int32 and stores without problem
Setting one element to abc changes the data to object, but it seems that to_hdf internally infers another data type and throws an error: TypeError: object of type 'int' has no len()
Explicitely converting the column to str leads to success, and to_hdf stores the data.
Now I am wondering what is happening in the second case, and is there a way to prevent this? The only way I found was to go through all columns, check if they are dtype('O') and explicitely convert them to str.

Instead of using hdf5, I have found a generic pickling library which seems to be perfect for the job: jiblib
Storing and loading data is straight forward:
import joblib
joblib.dump(df, "file.jl")
df2 = joblib.load("file.jl")

DataFrame.quantile() function didn't work when use apply function

I made a function to get the middle part of a DataFrame, but the quantile function didn't work well. So I can't get the part of dataframe I want.
I'm using pandas 0.23.4
Here is the func body.
def func(df,key_name,tail):
df_new = df[df[key_name]>df[key_name].quantile(tail)]
return df_new
but it works good when I try it in this way
func(clean_video,'all_vv',0.1)
when I apply it like
clean_video.apply(func,axis = 1,args = ('all_vv',0.1))
I got the Error below.
AttributeError: ("'int' object has no attribute 'quantile'", 'occurred at index 0')
Thanks for all help.

Vectorizing text from data frame column using pandas

I have a Data Frame which looks like this:
I am trying to vectorize every row, but only from the text column. I wrote this code:
vectorizerCount = CountVectorizer(stop_words='english')
# tokenize and build vocab
allDataVectorized = allData.apply(vectorizerCount.fit_transform(allData.iloc[:]['headline_text']), axis=1)
The error says:
TypeError: ("'csr_matrix' object is not callable", 'occurred at index 0')
Doing some research and trying changes I found out the fit_transform function returns a scipy.sparse.csr.csr_matrix and that is not callable.
Is there another way to do this?
Thanks!

There are a number of problems with your code. You probably need something like
allDataVectorized = pd.DataFrame(vectorizerCount.fit_transform(allData[['headline_text']]))
allData[['headline_text']]) (with the double brackets) is a DataFrame, which transforms to a numpy 2d array.
fit_transform returns a csr matrix.
pd.DataFrame(...) creates a DataFrame from a csr matrix.

DataFrame.apply(func, raw=True) doesn't seem to take effect?

I am trying to hash together only a few columns of my dataframe df so I do
temp = df['field1', 'field2']
df["hash"] = temp.apply(lambda x: hash(x), raw=True, axis=1)
I set raw to true because the doc (I am using 0.22) says it will pass a numpy array instead of a mutable Series but even with raw=True I am getting a Series, why?
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/frame.py", line 4973, in _apply_standard
results[i] = func(v)
File "/home/teto/mptcpanalyzer/mptcpanalyzer/data.py", line 190, in _hash_row
return hash(x)
File "/nix/store/9ampki9dbq0imhhm7i27qkh56788cjpz-python3.6-pandas-0.22.0/lib/python3.6/site-packages/pandas/core/generic.py", line 1045, in __hash__
' hashed'.format(self.__class__.__name__))
TypeError: ("'Series' objects are mutable, thus they cannot be hashed", 'occurred at index 1')

It's strange, as I can't reproduce your exact error (that is, by me, raw=True indeed results in an np.ndarray being passed). In any case, neither a Series nor a np.ndarray are hashable. The following works, though:
temp.apply(lambda x: hash(tuple(x)), axis=1)
A tuple is hashable.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Indexing in Function, Pandas - pandas

Related

Can't convert Matrix to DataFrame JULIA

Writing data frame with object dtype to HDF5 only works after converting to string

DataFrame.quantile() function didn't work when use apply function

Vectorizing text from data frame column using pandas

DataFrame.apply(func, raw=True) doesn't seem to take effect?

Categories

Resources