From one line iteration to loop to include exception management - pandas

I have two columns and I need to check whether the value in one column, all_news['Query'], is contained in another column, all_news['description'].
I found the following solution:
all_news['C'] = on.apply(lambda x: x.Query in x.description, axis=1)
but I get the following error:
TypeError: ("argument of type 'float' is not iterable", 'occurred at index 737')
This is likely because there are some unusual characters the iteration cannot handle, and it seems I cannot add any exception handling to a one-line iteration.
How can I unfold this one-line iteration into a for loop?
Result for index 737:
Query = 'medike international'
description = 'po ketvirtadienį praūžusios liūties nukentėjo ne tik kauno miestas, bet ir rajonas. pliaupiant lietui prie vilkijos vydūno alėjos esančioje apžvalgos aikštelėje ...'
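A minimal sketch of how the one-liner could be unfolded, under stated assumptions. The error message says the right-hand operand of `in` was a float, and pandas stores missing values as NaN, which is a float, so a row with an empty description would raise exactly this TypeError. Whether that is the cause here is a guess. The small `all_news` frame and the helper name `query_in_description` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the second row has a missing description (NaN is a float).
all_news = pd.DataFrame({
    "Query": ["medike international", "kaunas", "vilnius"],
    "description": ["about medike international today", float("nan"), "news about kaunas"],
})

def query_in_description(row):
    """Return True if Query occurs in description, False if the check cannot be made."""
    try:
        return row["Query"] in row["description"]
    except TypeError:
        # Raised when description is not a string, e.g. NaN.
        return False

# Option 1: keep apply, but pass a named function so try/except is possible.
all_news["C"] = all_news.apply(query_in_description, axis=1)

# Option 2: the same logic as an explicit for loop, reporting the offending rows.
results = []
for idx, row in all_news.iterrows():
    try:
        results.append(row["Query"] in row["description"])
    except TypeError:
        print(f"row {idx}: description is {row['description']!r}")
        results.append(False)
```

Both variants treat a row that cannot be checked as "not found"; returning `None` instead would keep such rows distinguishable from genuine misses.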

Related

Error while transforming series with series.map() function

def locate(code):
    string1 = str(code)
    floor = string1[3]
    if floor == '1':
        return 'Ground Floor'
    else:
        if int(string1[5]) < 1:
            lobby = 'G'
        elif int(string1[5]) < 2:
            lobby = 'F'
        else:
            lobby = 'E'
        return floor + lobby
print(locate('S191009'))
print(locate('S087525'))
This function works fine for individual input codes, as above, with the output:
Ground Floor
7E
But when I use it to map a series in a data frame, it raises an error.
error_data1['location'] = error_data1['status'].map(locate)
Error message: string index out of range.
How can I fix this?
Your problem is with your series values:
se = pd.Series(['S191009', 'rt'])
se.map(locate)
produces the same error you reported. If dropping these rows is acceptable, you can skip them with a try...except inside the function.
The problem is that you are indexing a position in the string that doesn't exist (i.e. the string is shorter than you expect). As the other answer mentioned, if you try
my_string="foo"
print(my_string[5])
You will get the same error. To solve this, add a try/except statement or, for simplicity, an initial if statement that returns "NotValid" or something similar. Your data probably contains strings that do not follow the standard form you expect.
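The try/except suggestion could be sketched as follows. The wrapper name `safe_locate` and the sample series are assumptions for illustration, and "NotValid" is simply the placeholder proposed above. `locate` is repeated from the question so the example runs on its own.

```python
import pandas as pd

def locate(code):
    # Function from the question, unchanged in behaviour.
    string1 = str(code)
    floor = string1[3]
    if floor == '1':
        return 'Ground Floor'
    if int(string1[5]) < 1:
        lobby = 'G'
    elif int(string1[5]) < 2:
        lobby = 'F'
    else:
        lobby = 'E'
    return floor + lobby

def safe_locate(code):
    """Call locate, but return a placeholder for codes that do not fit the pattern."""
    try:
        return locate(code)
    except (IndexError, ValueError):
        # IndexError: string too short; ValueError: position 5 is not a digit.
        return 'NotValid'

# Hypothetical series containing one malformed code.
se = pd.Series(['S191009', 'S087525', 'rt'])
located = se.map(safe_locate)
```

Afterwards, filtering on `located == 'NotValid'` shows which input rows need cleaning.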

Pandas str split. Can I skip line which gives troubles?

I have a dataframe (all5) that includes one column with dates ('CREATIE_DATUM'). Sometimes the notation is 01/JAN/2015, sometimes it is written as 01-JAN-15.
I only need the year, so I wrote the following code line:
all5[['Day','Month','Year']]=all5['CREATIE_DATUM'].str.split('-/',expand=True)
but I get the following error:
columns must be same length as key
so I assume somewhere in my dataframe (>100.000 lines) a value has more than two '/' signs.
How can I make my code skip this line?
You can try pd.to_datetime and then use the .dt accessor to get the day, month and year:
x = pd.to_datetime(all5["CREATIE_DATUM"])
all5["Day"] = x.dt.day
all5["Month"] = x.dt.month
all5["Year"] = x.dt.year

Using to_datetime several columns names

I am working with several CSVs in which the first N columns hold general information and the next M columns (M is large) each hold information for a date.
I need to convert only the names of columns N+1 through N+M-1 to date format.
I tried this; in this case N+1 = 5, regardless of M, and I assume I can use -1 to leave the last column name unaffected.
ContDiarios.columns[5:-1] = pd.to_datetime(ContDiarios.columns[5:-1])
but I get the following error:
TypeError: Index does not support mutable operations
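Assigning to a slice of the columns is not possible, because a pandas Index is immutable. Build the converted names first and assign them all at once:
def convert(name):
    # Return the label as a Timestamp if it parses as a date, otherwise unchanged.
    try:
        return pd.to_datetime(name)
    except Exception:
        return name
ContDiarios.columns = [convert(c) for c in ContDiarios.columns]
You can also use the df.rename method to convert the names.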
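To touch only the positions 5 to -1, as the question intended, one option is to copy the labels into a plain list, convert that slice, and assign the whole list back. This is a sketch with an invented frame `df`; its column names are assumptions, chosen so that positions 5 and 6 hold date strings.

```python
import pandas as pd

# Hypothetical frame: five info columns, two date columns, one trailing column.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7, 8]],
    columns=["a", "b", "c", "d", "e", "2020-01-01", "2020-01-02", "total"],
)

# A list is mutable, unlike the Index, so slice assignment works here.
new_cols = list(df.columns)
new_cols[5:-1] = pd.to_datetime(df.columns[5:-1])

# Replace all labels in one step.
df.columns = new_cols
```

The resulting column index mixes strings and Timestamps, so it is stored with object dtype.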

Iterate on OrientRecord object

I am trying to increment twice in a loop and print the OrientRecord Objects using Python.
Following is my code -
for items in iteritems:
    x = items.oRecordData
    print(x['attribute1'])
    y = (next(items)).oRecordData  # Here is the error
    print(y['attribute2'])
Here, iteritems is a list of OrientRecord objects. I have to print attributes of two consecutive objects in one loop.
I am getting the following error -
TypeError: 'OrientRecord' object is not an iterator
Try using a different approach to it:
for i in range(0, len(iteritems), 2):
    x = iteritems[i].oRecordData
    print(x['attribute1'])
    y = iteritems[i+1].oRecordData
    print(y['attribute2'])
The range() function starts at 0 and advances in steps of 2.
However, this works properly only if the number of records is even; otherwise it raises:
IndexError: list index out of range
I hope this helps.
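A variant that avoids the IndexError for an odd number of records is to pair the even-indexed and odd-indexed items with zip, which stops at the shorter slice. This is only a sketch: `SimpleNamespace` objects stand in for OrientRecord objects, and the attribute values are invented.

```python
from types import SimpleNamespace

# Stand-ins for OrientRecord objects; only the oRecordData attribute is imitated.
iteritems = [
    SimpleNamespace(oRecordData={"attribute1": i, "attribute2": i * 10})
    for i in range(5)  # deliberately an odd number of records
]

pairs = []
# zip stops when the shorter slice is exhausted, so a trailing unpaired record is skipped.
for first, second in zip(iteritems[0::2], iteritems[1::2]):
    x = first.oRecordData
    y = second.oRecordData
    pairs.append((x["attribute1"], y["attribute2"]))
    print(x["attribute1"], y["attribute2"])
```

With five records, the fifth has no partner and is silently left out. If it must still be processed, handle it separately after the loop.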

FDR Error fdrtool in R

I am using fdrtool for my p-values, but I get the following error:
Error in if (max(x) > 1 | min(x) < 0) stop("input p-values must all be in the range 0 to 1!") : missing value where TRUE/FALSE needed
The p-values are neither less than 0 nor greater than 1; they all lie in the range [0, 1]. The code is:
n=40000
pval1<-vector(length=n)
pval1[1:n]= pv1list[["Pvalue"]]
fdr<-fdrtool(pval1,statistic="pvalue")
I ran your code without a problem (although I can't reproduce your exact case because I don't have the object "pv1list").
Since you're getting a missing-value error, my guess is that something goes wrong when the csv file is read into R. I recommend the "read.table" function since, in my experience, it usually reads data from a csv file without errors:
pvlist <- read.table("c:/pvslit.csv", header=TRUE,
                     sep=",", row.names="id")
Next, check the number of rows and how many of them contain missing values:
nrow(pvlist) # is this what you expect?
nrow(na.omit(pvlist)) # how many non-missing rows are there?
Additionally you want to make sure that your "p-value" column is not a character or factor:
str(pvlist) # examining the structure of the dataframe
pvlist[,2] <- as.numeric(as.character(pvlist[,2])) # assuming the 2nd column is the p-value; as.character first, so a factor yields its labels rather than its integer codes
In short, you most likely have a problem with reading in the data or the class of the data in the dataframe.