Iterate on OrientRecord object - iteration

I am trying to advance the iteration twice within a single loop pass and print the OrientRecord objects using Python.
Here is my code:
for items in iteritems:
    x = items.oRecordData
    print(x['attribute1'])
    y = (next(items)).oRecordData  # Here is the error
    print(y['attribute2'])
Here, iteritems is a list of OrientRecord objects. I have to print attributes of two consecutive objects in one loop.
I am getting the following error -
TypeError: 'OrientRecord' object is not an iterator

next() expects an iterator, but items is a single OrientRecord, which is why the TypeError is raised. Try indexing into the list instead:
for i in range(0, len(iteritems), 2):
    x = iteritems[i].oRecordData
    print(x['attribute1'])
    y = iteritems[i+1].oRecordData
    print(y['attribute2'])
range(0, len(iteritems), 2) starts at 0 and advances in steps of 2, so each pass handles records i and i+1.
However, this works only if the number of records is even; otherwise the last pass raises:
IndexError: list index out of range
I hope this helps.
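A variation that avoids the IndexError on an odd-length list is to pair records by zipping one iterator with itself. This is only a sketch: the SimpleNamespace objects are hypothetical stand-ins for OrientRecord objects (modelling just the oRecordData attribute from the question), and the helper name pairwise_records is made up for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for OrientRecord objects; only oRecordData is modelled.
iteritems = [
    SimpleNamespace(oRecordData={"attribute1": f"a1-{n}", "attribute2": f"a2-{n}"})
    for n in range(5)  # odd count on purpose
]


def pairwise_records(records):
    """Return an iterator of non-overlapping (first, second) pairs.

    A trailing record without a partner is silently dropped.
    """
    it = iter(records)   # one shared iterator
    return zip(it, it)   # each zip step pulls two consecutive items from it


printed = []
for first, second in pairwise_records(iteritems):
    x = first.oRecordData
    y = second.oRecordData
    printed.append((x["attribute1"], y["attribute2"]))
    print(x["attribute1"], y["attribute2"])
```

Because both arguments to zip are the same iterator, the loop never indexes past the end of the list; whether dropping the unpaired last record is acceptable depends on the use case.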

Related

Understanding Pandas Series Data Structure

I am trying to get my head around the Pandas module and started learning about the Series data structure.
I have created the following Series in Spyder :-
songs = pd.Series(data = [145,142,38,13], name = "Count")
I can obtain information about the Series index using the code:-
songs.index
The output of the above code is as follows:-
RangeIndex(start=0, stop=4, step=1)
My question is: where it states start=0 and stop=4, what do these refer to?
I have interpreted start=0 as meaning that the first element in the Series is in row 0.
But I am not sure what the stop value refers to, as there is no element in row 4 of the Series.
Can someone explain?
Thank you.
This concept, as already explained in the comments, is prevalent in many places: the stop value is exclusive, so the last valid index is one less than the number of items. Here stop=4 means the index labels run 0, 1, 2, 3.
For instance, take the list data structure-
z = songs.to_list()
[145, 142, 38, 13]
len(z)
4 # length is four
# however, the last valid index is i-1, where i is the length (count of items) of the list
z[4] # this will raise an IndexError
# you will have to start at index 0 going till only index 3 (i.e. 4 items)
z[0], z[1], z[2], z[-1] # notice how -1 can be used to directly access the last element
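The same half-open convention can be checked on the index object itself. The following is a small sketch that rebuilds the Series from the question and compares its index with a built-in range made from the same start, stop and step values; the variable names are chosen here for illustration.

```python
import pandas as pd

songs = pd.Series(data=[145, 142, 38, 13], name="Count")
idx = songs.index            # RangeIndex(start=0, stop=4, step=1)

labels = list(idx)           # the labels the index actually contains
same_as_range = list(range(idx.start, idx.stop, idx.step))

print(labels)
print(idx.stop in idx)       # is the stop value itself a label?
print(len(songs) == idx.stop)
```

For a default index that starts at 0 with step 1, stop equals the number of rows, which is why it reads 4 for a four-element Series.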

Confused Beginner learning Python

I am working on a problem in Python and don't understand the answer.
for number in range(1, 10):
    if number % 2 == 0:
        print(number)
The answer to this problem is 2,4,6,8
Can anyone explain this answer?
range is a built-in in Python that generates a sequence of integers. For example:
r = range(3)
returns an iterable object, range(0, 3), which generates the integers from 0 to 3-1 (that is, 2). In order to see the elements in it, you can loop through it:
for i in r:
    print(i)
# prints the numbers from 0 to 3-1
Or, wrap it in a list:
list(range(3))  # returns [0, 1, 2]
range takes up to three parameters: start, end and, optionally, step. start and end are the lower and upper bounds of the sequence. In the above example only one integer was given, so range uses 0 as start and 3 as end. range(start, end[, step]) generates integers as follows: start, start+1, ..., end-1; for the above example, 0, 0+1, ..., 3-1.
If you give both start and end, the function generates integers from start up to but not including end. Example:
for i in range(3, 8):
    print(i)  # prints the numbers from 3 to 8-1
If you give the third parameter, step (which defaults to 1), range adds that number to get each next element of the sequence:
list(range(3, 8))     # same as list(range(3, 8, 1)); returns [3, 4, 5, 6, 7]: 3, 3+1, (3+1)+1, ...
list(range(3, 8, 2))  # returns [3, 5, 7]: 3, 3+2, (3+2)+2, ...
So, coming to your question now:
for number in range(1, 10):
    if number % 2 == 0:
        print(number)
In the above code you are telling Python to loop over the integers from 1 to 9 and print those that are divisible by 2, which prints 2, 4, 6, 8.
Hope this helped you :)
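To check the explanation, here is a short sketch that collects the numbers into lists instead of printing them one by one. The second line reaches the same result with the step parameter alone; the variable names are made up for illustration.

```python
# Collect instead of printing so the result can be inspected.
evens_filtered = [number for number in range(1, 10) if number % 2 == 0]

# Same numbers without the modulo test: start at 2 and step by 2.
evens_stepped = list(range(2, 10, 2))

print(evens_filtered)  # [2, 4, 6, 8]
print(evens_stepped)   # [2, 4, 6, 8]
```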

Using to_datetime several columns names

I am working with several CSVs whose first N columns are information and whose next M columns (M is big) hold information regarding a date.
This is the dataframe picture
I need to convert only the names of columns N+1 through N+M-1 to date format.
I tried the following. In this case N+1 = 5; whatever M is, I suppose I can use -1 to leave the last column name untouched.
ContDiarios.columns[5:-1] = pd.to_datetime(ContDiarios.columns[5:-1])
but I get the following error:
TypeError: Index does not support mutable operations
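Assigning to a slice of df.columns is not possible because an Index is immutable. Build the new names and assign the whole set at once:
def convert(x):
    # Parse a column name as a date; keep it unchanged if parsing fails
    try:
        return pd.to_datetime(x)
    except Exception:
        return x
ContDiarios.columns = [convert(c) for c in ContDiarios.columns]
Alternatively, you can use the df.rename method to convert the names.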
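If only a known slice of the column names should be converted, as in the question, one option is to copy the names into a plain list, replace the slice there, and assign the list back. This is a sketch under assumptions: the small dataframe, its column names, and the variable n_info (the number of leading information columns) are invented for illustration.

```python
import pandas as pd

# Made-up frame: one information column, two date-named columns, one trailing column.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["id", "2020-01-01", "2020-01-02", "total"],
)

n_info = 1                      # hypothetical count of leading information columns

cols = list(df.columns)         # a list is mutable, unlike the Index
cols[n_info:-1] = pd.to_datetime(cols[n_info:-1])   # convert only the middle names
df.columns = cols               # replace the whole Index in one assignment

print(df.columns.tolist())
```

Unlike the try/except approach, this raises if any name inside the slice is not a parseable date, which can be useful for catching unexpected columns early.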

From one line iteration to loop to include exception management

I have two columns and I need to check whether the value in one column, all_news['Query'], is contained in another column, all_news['description'].
I found the following solution:
all_news['C'] = on.apply(lambda x: x.Query in x.description, axis=1)
but I get the following error:
TypeError: ("argument of type 'float' is not iterable", 'occurred at index 737')
Likely because there are some weird characters the iteration cannot decipher, and it seems I cannot add any exception handling to a one-line iteration.
How can I unfold this one-line iteration into a for loop?
Result for index 737:
Query = 'medike international'
description = 'po ketvirtadienį praūžusios liūties nukentėjo ne tik kauno miestas, bet ir rajonas. pliaupiant lietui prie vilkijos vydūno alėjos esančioje apžvalgos aikštelėje ...'
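The error message says a float was found where a string was expected, which often means a missing value (NaN) in one of the columns; that is an assumption here, since the full data is not shown. Under that assumption, one way to unfold the one-liner into an explicit loop with exception handling might look like the sketch below. The sample rows and the helper name query_in_description are invented for illustration.

```python
import pandas as pd

# Made-up sample; the last description is NaN to reproduce the TypeError.
all_news = pd.DataFrame({
    "Query": ["medike international", "kauno", "x"],
    "description": [
        "news about medike international today",
        "kauno miestas",
        float("nan"),
    ],
})


def query_in_description(row):
    """Return True if Query is a substring of description, False on bad types."""
    try:
        return row["Query"] in row["description"]
    except TypeError:
        # Raised when description (or Query) is not a string, e.g. a NaN float.
        return False


results = []
for _, row in all_news.iterrows():   # explicit loop instead of apply + lambda
    results.append(query_in_description(row))

all_news["C"] = results
print(all_news["C"].tolist())
```

The helper can also be passed to apply with axis=1 unchanged, since the try/except lives inside a named function rather than a lambda.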

Apply function with pandas dataframe - POS tagger computation time

I'm very confused about the apply function in pandas. I have a big dataframe where one column is a column of strings. I'm using a function to count part-of-speech occurrences, but I'm not sure how to set up my apply statement or my function.
def noun_count(row):
    x = tagger(df['string'][row].split())
    # array flattening and filtering out all but nouns, then summing them
    return num
So basically I have a function similar to the above where I use a POS tagger on a column that outputs a single number (number of nouns). I may possibly rewrite it to output multiple numbers for different parts of speech, but I can't wrap my head around apply.
I'm pretty sure I don't really have either part arranged correctly. For instance, I can run noun_count(row) and get the correct value for any index, but I can't figure out how to make it work with apply the way I have it set up. Basically, I don't know how to pass the row value to the function within the apply statement.
df['num_nouns'] = df.apply(noun_count(??),1)
Sorry this question is all over the place. So what can I do to get a simple result like
   string      num_nouns
0  'cat'       1
1  'two cats'  1
EDIT:
So I've managed to get something working by using list comprehension (someone posted an answer, but they've deleted it).
df['string'].apply(lambda row: noun_count(row),1)
which required an adjustment to my function:
from collections import Counter
def tagger_nouns(x):
    # st is the Stanford tagger instance created elsewhere
    list_of_lists = st.tag(x.split())
    flat = [y for z in list_of_lists for y in z]
    parts_of_speech = [row[1] for row in flat]
    c = Counter(parts_of_speech)
    nouns = c['NN'] + c['NNS'] + c['NNP'] + c['NNPS']
    return nouns
I'm using the Stanford tagger, but I have a big problem with computation time, and I'm using the left 3 words model. I'm noticing that it's calling the .jar file again and again (java keeps opening and closing in the task manager) and maybe that's unavoidable, but it's really taking far too long to run. Any way I can speed it up?
I don't know what 'tagger' is but here's a simple example with a word count that ought to work more or less the same way:
f = lambda x: len(x.split())
df['num_words'] = df['string'].apply(f)
   string      num_words
0  'cat'       1
1  'two cats'  2
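On the speed question, calling the tagger once per row is probably what launches Java repeatedly. If the tagger wrapper in use offers a batch method (NLTK's Stanford wrapper has tag_sents, assuming that is what st is), the whole column can be tagged in a single call and the counting done afterwards. The sketch below shows only that pattern: fake_tag_sents is a hypothetical stand-in with a hard-coded lookup, not the real tagger, and count_nouns is a name made up for illustration.

```python
from collections import Counter

import pandas as pd

NOUN_TAGS = ("NN", "NNS", "NNP", "NNPS")


def fake_tag_sents(sentences):
    """Hypothetical stand-in for a batch tagger: tags all sentences in one call."""
    lookup = {"cat": "NN", "cats": "NNS", "two": "CD"}
    return [[(tok, lookup.get(tok, "X")) for tok in sent] for sent in sentences]


def count_nouns(tagged_sentence):
    """Count noun tags in one sentence given as (token, tag) pairs."""
    tags = Counter(tag for _, tag in tagged_sentence)
    return sum(tags[t] for t in NOUN_TAGS)


df = pd.DataFrame({"string": ["cat", "two cats"]})

tokenized = [s.split() for s in df["string"]]
tagged = fake_tag_sents(tokenized)            # one tagger call for the whole column
df["num_nouns"] = [count_nouns(sent) for sent in tagged]

print(df)
```

With a real tagger, only the fake_tag_sents call would change; the counting step stays in plain Python and does not touch the external process.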