Error during insertion of a new row in a DataFrame - pandas

I made a DataFrame from a dictionary and set one of its columns as the index. While inserting a new row, I get this error:
docdf=docdf.loc[sno_value]=[name_value,age_value,special_value,contact_value,fees_value,sal_value]
AttributeError: 'list' object has no attribute 'loc'
This is my code:
import pandas as pd
dict={"S.NO":[1,2,3,4,5],
"NAME":["John Sharon","Steven Sufjans","Ram Charan","Krishna Kumar","James Chacko"],
"AGE":[30,29,44,35,45],
"SPECIALISATION":["Neuro","Psych","Cardio","General","Immunology"],
"CONTACT":[9000401199,9947227405,9985258207,9982458204,8976517744],
"FEES":[1200,2100,3450,4500,3425],
"SAL":[20000,30000,40000,50000,45800]}
docdf=pd.DataFrame(dict)
docdf= docdf.set_index("S.NO")
#INSERT A ROW
sno_value=int(input('S.NO: '))
name_value = input('NAME: ')
age_value = int(input('AGE: '))
special_value = input('SPECIALISATION: ')
contact_value = int(input('CONTACT: '))
fees_value = int(input('FEES: '))
sal_value = int(input('SAL: '))
docdf=docdf.loc[sno_value]=[name_value,age_value,special_value,contact_value,fees_value,sal_value]
print(docdf)
I tried inserting each value separately and then tried to insert a new row using loc. I expected the entered S.NO to become the index of the new row, and then to print the whole DataFrame with the new row.
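In a chained assignment such as `a = b[k] = value`, Python evaluates the right-hand side once and then assigns to the targets from left to right. Here that means `docdf` is rebound to the list before `docdf.loc[...]` is evaluated, which would explain the AttributeError. A minimal sketch of the in-place form, assuming a cut-down frame and hard-coded values (`sno_value`, `name_value` and `age_value` below are stand-ins for the `input()` calls):

```python
import pandas as pd

# Cut-down version of the frame from the question (assumption: two columns only)
docdf = pd.DataFrame({
    "S.NO": [1, 2],
    "NAME": ["John Sharon", "Steven Sufjans"],
    "AGE": [30, 29],
}).set_index("S.NO")

# Hypothetical values standing in for the input() calls
sno_value, name_value, age_value = 6, "New Doctor", 41

# Assign through .loc only; do not rebind docdf on the same line
docdf.loc[sno_value] = [name_value, age_value]
print(docdf)
```

Because `.loc` assignment with a new label enlarges the frame in place, no `docdf = ...` is needed on that line.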

Related

numpy/pandas - why is the element selected from a list by random.choice the same for every row

There is a list that contains integer values.
list=[1,2,3,.....]
Then I use the np.random.choice function to select a random element and append it to an existing DataFrame column, as in the code below:
df.message = df.message.astype(str) + "rowNumber=" + '"' + str(np.random.choice(list)) + '"'
But the element selected by np.random.choice and appended to the message column is always the same for every row.
What is the issue here?
The expected result is that the selected element differs from row to row.
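np.random.choice(list) without size returns a single value, which is then used for every row. Pass the size parameter to np.random.choice to draw one value per row, and convert the values to strings: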
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'message' : ['aa','bb','cc']})
L = [1,2,3,4,5]
df.message = (df.message.astype(str) + "rowNumber=" + '"' +
              np.random.choice(L, size=len(df)).astype(str) + '"')
print (df)
           message
0  aarowNumber="4"
1  bbrowNumber="2"
2  ccrowNumber="5"
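To make the difference visible side by side, here is a small sketch that contrasts a single scalar draw with one draw per row. It swaps in NumPy's `default_rng` generator with a fixed seed purely so the run is repeatable; that choice, and the names `single`, `same`, `draws` and `varied`, are assumptions for illustration rather than part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"message": ["aa", "bb", "cc"]})
L = [1, 2, 3, 4, 5]
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# One scalar draw: the same value is concatenated onto every row
single = str(rng.choice(L))
same = df["message"] + 'rowNumber="' + single + '"'

# One draw per row: len(df) values, aligned on the frame's index
draws = pd.Series(rng.choice(L, size=len(df)), index=df.index).astype(str)
varied = df["message"] + 'rowNumber="' + draws + '"'

print(same)
print(varied)
```

The per-row values can still coincide by chance, since each draw is independent.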

proximity search on different rows

For data that is indexed into Elasticsearch from a DataFrame like this:
import json
import pandas as pd

# `es` is assumed to be an existing Elasticsearch client
mycolumns = ['name']
df = pd.DataFrame(columns=mycolumns)
rows = [["John Abraham"],["Lincoln Smith"]]
for row in rows:
    df.loc[len(df)] = row
print(df)
jsons = json.loads(df.to_json(orient='records'))
n = 0
for j in jsons:
    j['injection_timestamp'] = pd.to_datetime('now')
    es.index(index="prox", doc_type='record', body=j)
    if n%1000==0:
        print (n/1000),
    n+=1
I am trying to run a match_phrase search for a phrase that is spread across two rows, as described here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_multivalue_fields_2.html#_multivalue_fields_2
es.search(index="prox", body={"query": {"match_phrase":{"name": "Abraham Lincoln"}}})
I expected to get 1 hit because of the way arrays are indexed, but I don't get any hits.
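The linked guide page describes an array of values inside a single document, whereas the loop above indexes each row as its own document, and a match_phrase query is evaluated per document. One possible reading, not confirmed in this thread, is that the names would need to live in one document as an array. A minimal sketch of building such a body, with the Elasticsearch call left commented out and assuming the same `es` client and `doc_type` as in the question:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John Abraham", "Lincoln Smith"]})

# One document whose "name" field holds every value as an array
doc = {"name": df["name"].tolist()}
print(doc)

# Not run here; assumes an existing client named `es`:
# es.index(index="prox", doc_type="record", body=doc)
```

Whether the phrase then matches across array values also depends on the field's position_increment_gap mapping, which the same guide page discusses.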

Make a dictionary from csv columns in pandas in python

I have a 3-column CSV file with the column headings {id, ingredients, recipe}.
Now I want to create a dictionary in which id is the key and the combined ingredients and recipe are the value.
When combining ingredients and recipe, I also need to insert a full stop and a space between them.
e.g., <ingredient>. <recipe>
My current code is as follows.
input_data = pd.read_csv( input_file, header=0, delimiter="\t", quoting=3 )
L= input_data["ingredient"] + '. ' + input_data["recipe"]
my_d = input_data.set_index('id')[L].to_dict()
Please help me!!
Use zip with dict:
my_d = dict(zip(input_data['id'], input_data["ingredient"] + '. ' + input_data["recipe"]))
Sample:
input_data = pd.DataFrame({'ingredient':list('abg'),
'id':[1,2,4],
'recipe':list('rth')})
print (input_data)
   id ingredient recipe
0   1          a      r
1   2          b      t
2   4          g      h
my_d = dict(zip(input_data['id'], input_data["ingredient"] + '. ' + input_data["recipe"]))
print (my_d)
{1: 'a. r', 2: 'b. t', 4: 'g. h'}
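The attempt in the question fails because `[L]` asks the frame for columns named after the values in `L`, rather than using `L` as the data. As an alternative to zip, the combined strings can be re-indexed by id and converted directly. This is a sketch assuming the same sample frame as above; `combined` is a name introduced here for illustration.

```python
import pandas as pd

input_data = pd.DataFrame({
    "id": [1, 2, 4],
    "ingredient": list("abg"),
    "recipe": list("rth"),
})

# Build "<ingredient>. <recipe>" for every row
combined = input_data["ingredient"] + ". " + input_data["recipe"]

# Re-index the combined strings by id, then convert to a dict
my_d = pd.Series(combined.to_numpy(), index=input_data["id"]).to_dict()
print(my_d)
```

Both approaches assume the id values are unique; a repeated id would keep only the last entry in the dictionary.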

Python 3.4: Loop and Append: Why Not Working With cx_Oracle and Pandas?

For some reason, the following code only returns a dataframe of 10 rows instead of 20 (there are millions of rows in the SQL view).
When I viewed the output from print(data2), it showed the first 10 rows as a DataFrame, but the next DataFrame was empty.
import cx_Oracle as cx
import pandas as pd
conn = cx.Connection("username/pwd@server")
data = pd.DataFrame([])
SQL1 = '''SELECT * FROM TABLE_MV where rownum between '''
for i in range(1, 20, 10):
    lower = i
    upper = i+9
    SQL3 = SQL1 + str(lower) + ' and ' + str(upper)
    data2 = pd.read_sql(SQL3, conn)
    print(data2)
    data = data.append(data2)
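In Oracle, ROWNUM is assigned to a row only as it passes the WHERE clause, so a predicate like `rownum between 11 and 20` never finds a first row to number and returns nothing, which matches the empty second DataFrame. A common workaround is to number the rows in a subquery and filter on the alias outside it. The following is a sketch under assumptions: `build_page_query` and the alias `rn` are hypothetical names, and the database calls are left commented out because they need a live connection.

```python
def build_page_query(lower, upper, table="TABLE_MV"):
    """Return SQL selecting rows lower..upper (1-based, inclusive)."""
    lower, upper = int(lower), int(upper)  # keep only integers in the SQL text
    return (
        "SELECT * FROM ("
        f"SELECT t.*, ROWNUM rn FROM {table} t WHERE ROWNUM <= {upper}"
        f") WHERE rn >= {lower}"
    )

# Same paging as the loop in the question: rows 1-10, then 11-20
queries = [build_page_query(i, i + 9) for i in range(1, 20, 10)]
for q in queries:
    print(q)

# With a live connection (not run here), each page could be read and combined:
# frames = [pd.read_sql(q, conn) for q in queries]
# data = pd.concat(frames, ignore_index=True)
```

Without an ORDER BY in the inner query, which rows land on which page is not guaranteed to be stable between runs.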

Nested Cell to string

I have the following problem:
Objective (high-level):
I would like to convert ESRI Shapefiles into SQL spatial data. For that purpose, I need to adapt the syntax.
Current status / problem:
I constructed the following cell array:
'MULTIPOLYGON(' {1x2332 cell} ',' {1x916 cell} ',' {1x391 cell} ',' {1x265 cell} ')'
with in total 9 fields. This cell array contains the following 'nested' cell arrays: {1x2332 cell}, {1x916 cell}, {1x391 cell}, {1x265 cell}. As an example, 'nested' cell {1x2332 cell} has the following form:
'((' [12.714606000000000] [42.155628000000000] ',' [12.702529999999999] [42.152873999999997] ',' ... ',' [12.714606000000000] [42.155628000000000] '))'
However, I would like to have the entire cell array (including all 'nested cells') as one string without any spaces (except the space between the numbers (coordinates)). Would you have an idea how I could get to a solution?
Thank you in advance.
You probably need loops for this.
Consider a smaller example:
innerCell1 = {'((' [12.714606000000000] [42.155628000000000] ',' [12.702529999999999] [42.152873999999997] ',' [12.714606000000000] [42.155628000000000] '))'};
outerCell = {'MULTIPOLYGON(' innerCell1 ',' innerCell1 ')'};
You can go along these lines:
outer = outerCell; %// will be overwritten
ind_outer = find(cellfun(@iscell, outer)); %// positions of inner cell arrays in `outer`
for m = ind_outer
    inner = outer{m};
    ind_inner = cellfun(@isnumeric, inner); %// positions of numbers in `inner`
    ind_inner_space = find(ind_inner(1:end-1) & ind_inner(2:end)); %// need space
    ind_inner_nospace = setdiff(find(ind_inner), ind_inner_space); %// don't need
    for n = ind_inner_space
        inner{n} = [num2str(inner{n}) ' ']; %// convert to string with space
    end
    for n = ind_inner_nospace
        inner{n} = num2str(inner{n}); %// convert to string, without space
    end
    outer{m} = [inner{:}]; %// concatenate all parts of `inner`
end
str = [outer{:}]; %// concatenate all parts of `outer`
This results in the string
str =
MULTIPOLYGON(((12.7146 42.1556,12.7025 42.1529,12.7146 42.1556)),((12.7146 42.1556,12.7025 42.1529,12.7146 42.1556)))