Make a dictionary from CSV columns in pandas (Python)

I have a 3-column CSV file with the column headings {id, ingredients, recipe}.
Now I want to create a dictionary where id is the key and the combined ingredients and recipe is the value.
When combining ingredients and recipe, I also need to insert a full stop and a whitespace between them,
e.g. <ingredient>. <recipe>
My current code is as follows:
input_data = pd.read_csv(input_file, header=0, delimiter="\t", quoting=3)
L = input_data["ingredient"] + '. ' + input_data["recipe"]
my_d = input_data.set_index('id')[L].to_dict()
Please help me!

Use zip with dict:
my_d = dict(zip(input_data['id'], input_data["ingredient"] + '. ' + input_data["recipe"]))
Sample:
input_data = pd.DataFrame({'ingredient': list('abg'),
                           'id': [1, 2, 4],
                           'recipe': list('rth')})
print(input_data)
   id ingredient recipe
0   1          a      r
1   2          b      t
2   4          g      h

my_d = dict(zip(input_data['id'], input_data["ingredient"] + '. ' + input_data["recipe"]))
print(my_d)
{1: 'a. r', 2: 'b. t', 4: 'g. h'}
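An alternative sketch that keeps the set_index idea from the question (assuming the same input_data and that the column really is named "ingredient"): build the combined Series first, give it the id column as its index, then convert to a dict.
s = input_data["ingredient"] + '. ' + input_data["recipe"]
s.index = input_data['id']
my_d = s.to_dict()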

Related

numpy/pandas - why is the element selected from the list by random.choice always the same?

There is a list which contains integer values:
list=[1,2,3,.....]
Then I use the np.random.choice function to select a random element and append it to an existing dataframe column; please refer to the code below:
df.message = df.message.astype(str) + "rowNumber=" + '"' + str(np.random.choice(list)) + '"'
But the element selected by np.random.choice and appended to the message column is always the same for every row of message.
What is the issue here?
The expected result is that the selected element from the list is not the same for every row.
str(np.random.choice(list)) is evaluated only once, so the same scalar string is broadcast to every row. Instead, call np.random.choice with the size parameter to get one draw per row, and convert the values to strings:
df = pd.DataFrame({'message': ['aa', 'bb', 'cc']})
L = [1, 2, 3, 4, 5]
df.message = (df.message.astype(str) + "rowNumber=" + '"' +
              np.random.choice(L, size=len(df)).astype(str) + '"')
print(df)
           message
0  aarowNumber="4"
1  bbrowNumber="2"
2  ccrowNumber="5"
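If you prefer to draw row by row instead, a rough alternative (assuming the same df and L as above) is a list comprehension, which calls np.random.choice once per row:
df['message'] = [m + 'rowNumber="' + str(np.random.choice(L)) + '"'
                 for m in df['message'].astype(str)]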

Error during insertion of a new row in a DataFrame

I made a dataframe from a dictionary and set one of its columns as my index. While inserting a new row, I get this error:
docdf=docdf.loc[sno_value]=[name_value,age_value,special_value,contact_value,fees_value,sal_value]
AttributeError: 'list' object has no attribute 'loc'
This is my code:
import pandas as pd
dict={"S.NO":[1,2,3,4,5],
      "NAME":["John Sharon","Steven Sufjans","Ram Charan","Krishna Kumar","James Chacko"],
      "AGE":[30,29,44,35,45],
      "SPECIALISATION":["Neuro","Psych","Cardio","General","Immunology"],
      "CONTACT":[9000401199,9947227405,9985258207,9982458204,8976517744],
      "FEES":[1200,2100,3450,4500,3425],
      "SAL":[20000,30000,40000,50000,45800]}
docdf=pd.DataFrame(dict)
docdf= docdf.set_index("S.NO")
#INSERT A ROW
sno_value=int(input('S.NO: '))
name_value = input('NAME: ')
age_value = int(input('AGE: '))
special_value = input('SPECIALISATION: ')
contact_value = int(input('CONTACT: '))
fees_value = int(input('FEES: '))
sal_value = int(input('SAL: '))
docdf=docdf.loc[sno_value]=[name_value,age_value,special_value,contact_value,fees_value,sal_value]
print(docdf)
I tried reading each value separately and then tried to insert the new row using the loc function. I was expecting the input S.NO to become the index of the new row and the whole DataFrame, including the new row, to be printed.
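The traceback comes from the chained assignment docdf=docdf.loc[sno_value]=[...]: Python binds the list to docdf first, so docdf is no longer a DataFrame when the .loc assignment runs. A minimal sketch of the fix, assuming the values are listed in the same order as the DataFrame columns:
# assign the new row in place; do not rebind docdf to the list
docdf.loc[sno_value] = [name_value, age_value, special_value,
                        contact_value, fees_value, sal_value]
print(docdf)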

Iterate over an xpath (table row) python

I have xpaths as follows:
/html/body/div[1]/table[3]/tbody/tr[1]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[2]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[4]/td[3]/a
As you can see, the tr[] index changes. I want to iterate over these values.
Below is the code I have used:
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a')
Please let me know how I can iterate over them.
This may not be the exact solution you are looking for, but this is the idea:
tableRows = driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr")
for e in tableRows:
    e.find_element_by_xpath(".//td[3]/a")
If you want the third td of every row, use this:
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr/td[3]/a')
If you want only the first 3 rows, use this:
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr[position() < 4]/td[3]/a')
For looping through the tds, see e.g. this answer.
Another alternative, assuming 4 elements:
for elem in range(1, 5):
    element = f"/html/body/div[1]/table[3]/tbody/tr[{elem}]/td[3]/a"
    #e = driver.find_element_by_xpath(element)
    #e.click()
    print(element)
Prints:
/html/body/div[1]/table[3]/tbody/tr[1]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[2]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[4]/td[3]/a
You could do whatever you want with the elements inside the loop; I just printed the values to show them.
Option 1: a fixed column number (like 3) needs to be iterated:
rows = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr"))
for row in range(1, rows + 1):
    local_xpath = "/html/body/div[1]/table[3]/tbody/tr[" + str(row) + "]/td[3]"
    # do something with the element
    # cell_text = self.driver.find_element_by_xpath(local_xpath).text
Option 2: both row and column need to be iterated:
rows = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr"))
columns = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr/td"))
for row in range(1, rows + 1):
    for column in range(1, columns + 1):
        local_xpath = "/html/body/div[1]/table[3]/tbody/tr[" + str(row) + "]/td[" + str(column) + "]"
        # do something with the element
        # cell_text = self.driver.find_element_by_xpath(local_xpath).text
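Note that recent Selenium releases removed the find_element_by_* helpers. A rough equivalent of Option 1 using the current find_element(By.XPATH, ...) API, assuming the same page structure:
from selenium.webdriver.common.by import By

# count the rows, then fetch the third cell of each row by its absolute xpath
rows = len(driver.find_elements(By.XPATH, "/html/body/div[1]/table[3]/tbody/tr"))
for row in range(1, rows + 1):
    local_xpath = "/html/body/div[1]/table[3]/tbody/tr[" + str(row) + "]/td[3]"
    cell_text = driver.find_element(By.XPATH, local_xpath).text
    print(cell_text)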

Replace column values by other values conditionally in Pandas

I would like to remove the description in a data frame if the value is identical to the caption:
m[m.Description == m.Caption].Description = \
    m[m.Description == m.Caption].Description.map(lambda x: '')
I feel this pattern is quite boilerplate:
df[condition][columns] = df[condition][columns].map(lambda x: value)
Is there a better syntax to do the same thing? I imagine something like:
df[condition][columns].map(lambda x: value, inplace=True)
You need loc with boolean indexing:
m.loc[m.Description == m.Caption, 'Description'] = ' '
Sample:
m = pd.DataFrame({'Description': ['a', 'b', 'f'],
                  'Caption': ['a', 'c', ''],
                  'C': [7, 8, 9]})
print(m)
   C Caption Description
0  7       a           a
1  8       c           b
2  9                   f

m.loc[m.Description == m.Caption, 'Description'] = ' '
print(m)
   C Caption Description
0  7       a
1  8       c           b
2  9                   f

Alternatively, use mask:
m['Description'] = m['Description'].mask(m.Description == m.Caption, ' ')
print(m)
   C Caption Description
0  7       a
1  8       c           b
2  9                   f
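Another short option, a sketch with numpy.where (assuming the same m as above and numpy imported as np):
import numpy as np

# where the condition holds, write ' ', otherwise keep the existing value
m['Description'] = np.where(m.Description == m.Caption, ' ', m['Description'])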

How to define a 2-Column Array A = 1, B = 2... ZZZ =?

I need to create a 2-column array in ABAP so that a program can look up a record item (defined by the letters A - ZZZ) and then return the number associated with it.
For example:
A = 1
B = 2
C = 3
...
Z = 26
AA = 27
AB = 28
...
AZ =
BA =
...
BZ =
CA =
...
...
ZZZ =
Can you please suggest how I can code this?
Is there a better option than writing an array?
Thanks.
You don't need to look up the value in a table; it can be calculated:
parameters: p_input(3) type c default 'AAA'.

data: len        type i value 0,
      multiplier type i value 1,
      result     type i value 0,
      idx        type i,
      ch(1)      type c.

* how many characters are there?
len = strlen( p_input ).
* offset of the last character (offsets are zero-based)
idx = len - 1.

* compute the value for every character, starting at the end:
* in 'ABC' the C is multiplied by 1, the B by 26 and the A by 26^2
do len times.
* ch is the current character; we look it up in sy-abcde
  ch = p_input+idx(1).
  search sy-abcde for ch.
* if ch was 'A' then sy-fdpos is now 0, that is why we add 1
  result = result + ( sy-fdpos + 1 ) * multiplier.
  idx = idx - 1.
  multiplier = multiplier * 26.
enddo.

write: / result.
I didn't test the program, so it may still contain syntax errors, but the algorithm behind it should work.
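For comparison, here is the same bijective base-26 calculation sketched in Python (just to illustrate the algorithm, not ABAP):
def col_to_num(s):
    # 'A' -> 1, 'Z' -> 26, 'AA' -> 27, ..., 'ZZZ' -> 18278
    result = 0
    for ch in s.upper():
        result = result * 26 + (ord(ch) - ord('A') + 1)
    return result

print(col_to_num('A'), col_to_num('AA'), col_to_num('ZZZ'))   # 1 27 18278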
Perhaps I'm misunderstanding, but don't you want something like this?
types: begin of t_lookup,
         rec_key type string,
         value   type i,
       end of t_lookup.

data: it_lookup type hashed table of t_lookup with unique key rec_key.
Then, once it's populated, read it back:
read table it_lookup with key rec_key = [value] assigning <s>.
if sy-subrc eq 0.
  " got something
else.
  " didn't
endif.
Unfortunately, arrays don't exist in ABAP, but a hashed table is designed for exactly this kind of lookup (fast access, unique keys).
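If you'd rather pre-build the lookup table, the keys A to ZZZ are easy to generate programmatically; a quick sketch in Python, for illustration only:
from itertools import product
from string import ascii_uppercase

# enumerate A..Z, AA..ZZ, AAA..ZZZ in order and number them from 1
lookup = {}
n = 0
for length in (1, 2, 3):
    for letters in product(ascii_uppercase, repeat=length):
        n += 1
        lookup[''.join(letters)] = n

print(lookup['A'], lookup['AA'], lookup['ZZZ'])   # 1 27 18278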
DATA: STR TYPE STRING, I TYPE I, J TYPE I, K TYPE I, CH TYPE C,
      RES TYPE INT2, FLAG TYPE I.
PARAMETERS: S(3).

START-OF-SELECTION.
  I = STRLEN( S ).
  STR = S.
  DO I TIMES.
    I = I - 1.
*   CH has length 1, so this takes the first (leftmost) character of S
    CH = S.
    IF CH CO '1234567890.' OR CH CN SY-ABCDE.
      FLAG = 0.
      EXIT.
    ELSE.
      FLAG = 1.
    ENDIF.
    SEARCH SY-ABCDE FOR CH.
*   weight of the current position: 26 ** I
    J = I.
    K = 1.
    WHILE J > 0.
      K = K * 26.
      J = J - 1.
    ENDWHILE.
    K = K * ( SY-FDPOS + 1 ).
    RES = RES + K.
*   drop the processed character so the next one moves to the front
    REPLACE SUBSTRING CH IN S WITH ''.
  ENDDO.
* RES = RES + SY-FDPOS.
  IF FLAG = 0.
    MESSAGE 'String is not valid.' TYPE 'S'.
  ELSE.
    WRITE: /, RES.
  ENDIF.
Run the program and enter a string such as 'ABC' to see the result.
I did a similar implementation some time back.
Check if it works for you.
DATA:
  lv_char             TYPE char1,
  lv_len              TYPE i,
  lv_len_minus_1      TYPE i,
  lv_partial_index1   TYPE i,
  lv_partial_index2   TYPE i,
  lv_number           TYPE i,
  result_tab          TYPE match_result_tab,
  lv_col_index_substr TYPE string,
  lv_result           TYPE i.

FIELD-SYMBOLS:
  <match> LIKE LINE OF result_tab.

lv_len  = strlen( iv_col_index ).
lv_char = iv_col_index(1).

FIND FIRST OCCURRENCE OF lv_char IN co_char RESULTS result_tab.
READ TABLE result_tab ASSIGNING <match> INDEX 1.
lv_number = <match>-offset.
lv_number = lv_number + 1.

IF lv_len EQ 1.
  ev_col = ( ( 26 ** ( lv_len - 1 ) ) * lv_number ).
ELSE.
  lv_len_minus_1      = lv_len - 1.
  lv_col_index_substr = iv_col_index+1(lv_len_minus_1).

  CALL METHOD get_col_index
    EXPORTING
      iv_col_index = lv_col_index_substr
    IMPORTING
      ev_col       = lv_partial_index2.

  lv_partial_index1 = ( ( 26 ** ( lv_len - 1 ) ) * lv_number ) + lv_partial_index2.
  ev_col = lv_partial_index1.
ENDIF.
The algorithm uses recursive logic to determine the column index as a number.
This is not my algorithm, but I have adapted it to be used in ABAP.
The original algorithm is used in Open Excel; I can't find any links right now.