Pandas apply function is not resolving

Names=jobs[['Company Name']]
F = lambda x: x.split("\n")
Names.apply(F , axis=1)
AttributeError: 'Series' object has no attribute 'split'
When I run the following code instead, it works. Why am I facing this issue? I have never had this kind of problem before. PS: Unlike my earlier data, this data came from scraping websites, so I am hoping it has something to do with that.
Names=jobs[['Company Name']]
F = lambda x: x.str.split("\n")
Names.apply(F , axis=1)
When I try it this way:
Ratings = jobs['Company Name'].apply(lambda x:x.split("\n")[1] , axis=1)
I get this error
TypeError: <lambda>() got an unexpected keyword argument 'axis'

You do not need apply here; str.split is vectorized:
jobs['Company Name'].str.split('\n')
should do the job.
I cannot tell you for certain why it did not work before, but I suspect it is due to the double brackets in [['Company Name']]. Single brackets collapse the selection to a Series, while double brackets keep the two-dimensional structure of a DataFrame. With a DataFrame, apply(..., axis=1) passes each row to the function as a Series, and a Series has no split method (only the .str accessor does). See e.g. "Python pandas: Keep selected column as DataFrame instead of Series" for more details.
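A minimal sketch of the difference between the two selections. The two-row jobs frame is a made-up stand-in for the scraped data, assuming each entry holds a name and a rating separated by a newline:

```python
import pandas as pd

# Hypothetical stand-in for the scraped data (not the asker's real frame).
jobs = pd.DataFrame({"Company Name": ["Acme\n3.8", "Globex\n4.1"]})

as_series = jobs["Company Name"]    # single brackets -> Series
as_frame = jobs[["Company Name"]]   # double brackets -> one-column DataFrame

# Vectorized split on the Series; no apply needed.
parts = as_series.str.split("\n")

# Series.apply calls the function once per string, so plain str.split works,
# and no axis argument is passed.
ratings = as_series.apply(lambda s: s.split("\n")[1])

# DataFrame.apply with axis=1 hands each row to the function as a Series.
row_types = as_frame.apply(lambda row: type(row).__name__, axis=1)

print(parts.tolist())
print(ratings.tolist())
print(row_types.tolist())
```

Extra keyword arguments given to Series.apply are forwarded to the function itself, which would explain the "unexpected keyword argument 'axis'" error in the last attempt.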

Related

Holoviz panel will not print pandas dataframe row in Jupyter notebook

I'm trying to recreate the first panel.interact example in the Holoviz tutorial using a Pandas dataframe instead of a Dask dataframe. I get the slider, but the pandas dataframe row does not show.
See the original example at: http://holoviz.org/tutorial/Building_Panels.html
I've tried using Dask as in the Holoviz example. Dask rows print out just fine, which suggests that panel treats Dask dataframe rows differently from Pandas dataframe rows when printing. Here's my minimal code:
import pandas as pd
import panel
l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})
def select_row(rowno=0):
    row = df.loc[rowno]
    return row
panel.extension()
panel.extension('katex')
panel.interact(select_row, rowno=(0, 5))
I've included a line with the katex extension, because without it, I get a warning that it is needed. Without it, I don't even get the slider.
I can call the select_row(rowno=0) function separately in a Jupyter cell and get a nice printout of the row, so it appears the function is working as it should.
Any help in getting this to work would be most appreciated. Thanks.
Got a solution. With Pandas, loc[rowno:rowno] returns a pandas.core.frame.DataFrame object of length 1 which works fine with panel while loc[rowno] returns a pandas.core.series.Series object which does not work so well. Thus modifying the select_row() function like this makes it all work:
def select_row(rowno=0):
    row = df.loc[rowno:rowno]
    return row
Still not sure, however, why panel will print out the Dataframe object and not the Series object.
Note: if you use iloc, the end of the slice is exclusive, so you need to add 1, i.e., df.iloc[rowno:rowno+1].
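The following sketch shows only the pandas side of this (panel is not involved): which indexing forms return a Series and which return a one-row DataFrame. The frame is the one from the question; the df.loc[[rowno]] form is an additional option not mentioned above.

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "b", "c", "d", "a", "b"],
                   "val": [1, 2, 3, 4, 5, 6]})
rowno = 2

as_series = df.loc[rowno]                # scalar label -> Series
frame_loc = df.loc[rowno:rowno]          # label slice, end inclusive -> DataFrame
frame_iloc = df.iloc[rowno:rowno + 1]    # position slice, end exclusive -> DataFrame
frame_list = df.loc[[rowno]]             # list of labels -> DataFrame

print(type(as_series).__name__)
print(type(frame_loc).__name__, frame_loc.shape)
```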

DataFrame.quantile() function doesn't work when used with the apply function

I made a function to get the middle part of a DataFrame, but the quantile function didn't work well. So I can't get the part of dataframe I want.
I'm using pandas 0.23.4
Here is the func body.
def func(df,key_name,tail):
    df_new = df[df[key_name]>df[key_name].quantile(tail)]
    return df_new
It works well when I call it directly:
func(clean_video,'all_vv',0.1)
But when I apply it like this:
clean_video.apply(func,axis = 1,args = ('all_vv',0.1))
I get the error below.
AttributeError: ("'int' object has no attribute 'quantile'", 'occurred at index 0')
Thanks for all help.
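A possible explanation, sketched under assumptions: with axis=1, apply calls func once per row and passes that row as a Series, so df[key_name] inside func is a single number rather than a column. The five-row clean_video frame below is a made-up stand-in, and using pipe is one suggested alternative, not something from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's clean_video frame.
clean_video = pd.DataFrame({"all_vv": [10, 20, 30, 40, 50]})

def func(df, key_name, tail):
    # Keep the rows whose value lies above the given quantile of the column.
    return df[df[key_name] > df[key_name].quantile(tail)]

# Calling the function on the whole frame works.
trimmed = func(clean_video, "all_vv", 0.1)

# pipe also passes the whole frame, which allows method chaining.
piped = clean_video.pipe(func, "all_vv", 0.1)

# What apply(axis=1) would hand to func: one row, so the lookup is a scalar.
first_row = clean_video.iloc[0]
scalar = first_row["all_vv"]
has_quantile = hasattr(scalar, "quantile")

print(trimmed["all_vv"].tolist())
print(has_quantile)
```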

Vectorizing text from data frame column using pandas

I have a Data Frame which looks like this:
I am trying to vectorize every row, but only from the text column. I wrote this code:
vectorizerCount = CountVectorizer(stop_words='english')
# tokenize and build vocab
allDataVectorized = allData.apply(vectorizerCount.fit_transform(allData.iloc[:]['headline_text']), axis=1)
The error says:
TypeError: ("'csr_matrix' object is not callable", 'occurred at index 0')
Doing some research and trying changes I found out the fit_transform function returns a scipy.sparse.csr.csr_matrix and that is not callable.
Is there another way to do this?
Thanks!
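There are a number of problems with your code. DataFrame.apply expects a function as its first argument, but you are passing it the result of fit_transform, which is a sparse matrix; hence the "not callable" error. You probably need something like
allDataVectorized = pd.DataFrame(vectorizerCount.fit_transform(allData['headline_text']).toarray())
allData['headline_text'] (with single brackets) is a Series of strings, which is the iterable of documents that fit_transform expects.
fit_transform returns a csr matrix.
.toarray() converts it to a dense NumPy 2d array, and pd.DataFrame(...) creates a DataFrame from that array.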

square elements of a non square pandas dataframe

I want to square the elements of a non-square (m*n) pandas DataFrame, but each time I try either of the following
1) np.power(errorR, 2)
2) errorR**2
I get an error that says
ValueError: input must be a square array
Is there a good solution for this?
Try df.applymap(lambda x: x**2)
I was using this in the jupyter environment and it worked after restarting the workspace.
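For reference, a small sketch of elementwise squaring on a non-square DataFrame. The 2x3 errorR frame is a made-up example, and it assumes errorR really is a plain pandas DataFrame; on such a frame all three forms below operate element by element.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 frame standing in for errorR.
errorR = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])

squared_op = errorR ** 2            # arithmetic operator, elementwise
squared_np = np.power(errorR, 2)    # NumPy ufunc, returns a DataFrame
squared_pow = errorR.pow(2)         # DataFrame method, same result

print(squared_op)
```

All three avoid a Python-level call per element, unlike the lambda-based approach.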

Tensorflow classifier.predict write to csv

I've not used Tensorflow for a while, and updating it seems to have broken my old code, as many of the old functions are deprecated. I replaced them with their newer equivalents and everything seems to run, except when I write out the results:
y_predicted = classifier.predict(X_test)
There is an "as iterable" option as well, which I don't think I need.
I used to write out the results of the predictions using:
pandas.DataFrame(y_predicted).to_csv(/dir/)
but now I am getting an error that not all elements can be converted into String type. Is there a class in y_predicted I am supposed to be calling instead of the whole thing?
Anyways, I found a solution using np.array instead of a pandas dataframe:
result = np.asarray(y_predicted)
formatInt = result.astype(int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
np.savetxt("dir",formatInt,delimiter=",")
You can also try,
df = pandas.DataFrame({'Prediction':list(y_predicted)})
df.to_csv('filename.csv')
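A self-contained sketch of that second approach, assuming y_predicted is an iterable (for example a generator) of integer class labels. The fake_predict generator is a hypothetical stand-in for classifier.predict(X_test), and the CSV is written to an in-memory buffer instead of a file so the example runs anywhere.

```python
import io
import pandas as pd

def fake_predict():
    # Hypothetical stand-in for classifier.predict(X_test): yields labels lazily.
    for label in [0, 2, 1]:
        yield label

y_predicted = fake_predict()

# Materialize the iterable first so pandas sees concrete values.
df = pd.DataFrame({"Prediction": list(y_predicted)})

# Write to an in-memory buffer; pass a file path instead to write to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```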