Create Dataframe name from 2 strings or variables pandas

I am extracting selected pages from a PDF file and want to name each DataFrame after the page it was extracted from:
file = "abc"
selected_pages = ['10','11'] #can be any combination eg ['6','14','20']
for i in selected_pages():
    df{str(i)} = read_pdf(path + file + ".pdf",encoding = 'ISO-8859-1', stream = True,area = [100,10,740,950],pages= (i), index = False)
    print (df{str(i)} )
Ultimately, as in the example above, the idea is to end up with DataFrames named df10 and df11. I have tried "df" + str(i), "df" & str(i) and df{str(i)}, but all of them give the error: SyntaxError: invalid syntax
Any better way of doing this is also welcome. Thanks.

This is where a dictionary would be a much better option.
Also note the error at the start of your loop: selected_pages is a list, so you can't call it as selected_pages().
file = "abc"
selected_pages = ['10', '11']  # can be any combination, e.g. ['6', '14', '20']
df = {}  # one entry per page, keyed by the page string
for i in selected_pages:
    df[i] = read_pdf(path + file + ".pdf", encoding='ISO-8859-1', stream=True, area=[100, 10, 740, 950], pages=i, index=False)

i = int(i) - 1  # after the loop i is '11'; this brings it to 10
dfB = df[str(i)]
# drop the rows at positions 0-3 (the first four rows)
dfB.drop(dfB.index[0:4], axis=0, inplace=True)
dfB.columns = ['col1', 'col2', 'col3', 'col4', 'col5']
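A minimal sketch of the dictionary approach end to end. read_pdf is replaced here by a hypothetical load_page() that returns a dummy table, so the example runs with pandas alone; the row dropping uses iloc slicing instead of drop(..., inplace=True), which leaves the DataFrame stored in the dict untouched.

```python
import pandas as pd


def load_page(page: str) -> pd.DataFrame:
    # Hypothetical stand-in for read_pdf(...): returns a small dummy table for the page.
    return pd.DataFrame({"x": range(6)}).assign(page=page)


selected_pages = ["10", "11"]

# One dict entry per page instead of dynamically named variables df10, df11.
dfs = {page: load_page(page) for page in selected_pages}

# Take page 10, drop its first four rows by position, renumber, and rename the columns.
df10 = dfs["10"].iloc[4:].reset_index(drop=True)
df10.columns = ["col1", "col2"]
print(df10)
```

Looking a frame up by key (dfs["10"]) gives the same convenience as a variable named df10, while still letting you loop over all pages with dfs.items().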

Related

Force single line of string in VObject

I am trying to create vCards (email contacts) using the vobject library and pandas in Python.
When serializing, I get line breaks in the NOTE field of my output, although the source has no line breaks. Each new line created by .serialize() also begins with a space. I need to get rid of both.
Example of output:
BEGIN:VCARD
VERSION:3.0
EMAIL;TYPE=INTERNET:test#example.at
FN:Valentina test
N:Valentina;test;;;
NOTE:PelletiererIn Mitglieder 2 Preiserhebung Aussendung 2 Pressespiegelver
sand 2 GeschäftsführerIn PPA_PelletiererInnen GeschäftsführerIn_Pellet
iererIn
ORG:Test Company
TEL;TYPE=CELL:
TEL;TYPE=CELL:
TEL;TYPE=CELL:
END:VCARD
Is there a way to force each value onto a single line?
import vobject

output = ""
for _, row in df.iterrows():
    j = vobject.vCard()
    j.add('n')
    j.n.value = vobject.vcard.Name(row["First Name"], row["Last Name"])
    j.add('fn')
    j.fn.value = str(row["First Name"]) + " " + row["Last Name"]
    o = j.add("email")
    o.value = str(row["E-mail Address"])
    o.type_param = "INTERNET"
    # o = j.add("email")
    # o.value = str(row["E-mail 2 Address"])
    # o.type_param = "INTERNET"
    j.add('org')
    j.org.value = [row["Organization"]]
    k = j.add("tel")
    k.value = str(row["Home Phone"])
    k.type_param = "CELL"
    k = j.add("tel")
    k.value = str(row["Business Phone"])
    k.type_param = "CELL"
    k = j.add("tel")
    k.value = str(row["Mobile Phone"])
    k.type_param = "CELL"
    j.add("note")
    j.note.value = row["Notiz für Kontaktexport"]
    output += j.serialize()
print(output)
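The wrapping in the NOTE line is vCard line folding: the vCard specifications recommend lines of at most 75 octets and fold longer ones by inserting a line break followed by a single space or tab, which is what serialize() produces. If single-line output is needed anyway, one option is to unfold the serialized text afterwards. A minimal sketch; unfold_vcard is a hypothetical helper name, and its output no longer follows the 75-octet recommendation.

```python
import re

# A folded line break is CRLF (or a bare LF) followed by exactly one space or tab.
_FOLD = re.compile(r"\r?\n[ \t]")


def unfold_vcard(text: str) -> str:
    """Join folded vCard lines back into single logical lines."""
    # Removing the break plus that single whitespace character restores the original value.
    return _FOLD.sub("", text)
```

In the loop above this would be used as output += unfold_vcard(j.serialize()); line breaks that are not followed by whitespace (the real property boundaries) are left alone.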

to_string(index = False) results in non empty string even when dataframe is empty

In my Python script I want to hide the index column when I print the DataFrame, so I use .to_string(index = False) and then check with len() whether the result is zero. However, when the DataFrame is empty, len() of the to_string() result is still not zero; printing procinject1 shows "Empty DataFrame". Any help fixing this would be greatly appreciated.
procinject1 = dfmalfind[dfmalfind["Hexdump"].str.contains("MZ") == True].to_string(index=False)
if len(procinject1) == 0:
    print(Fore.GREEN + "[✓]No MZ header detected in malfind preview output")
else:
    print(Fore.RED + "[!]MZ header detected within malfind preview (Process Injection indicator)")
    print(procinject1)

That's the expected behaviour for a pandas DataFrame.
In your case, procinject1 stores the string representation of the dataframe, which is non-empty even if the corresponding dataframe is empty.
For example, the snippet below creates an empty DataFrame df and prints its string representation:
import pandas as pd

df = pd.DataFrame()
print(df.to_string(index=False))
print(df.to_string(index=True))
In both the index = False and index = True cases the output is the same, shown below. So len() of the result will always be non-zero, even for an empty DataFrame.
Empty DataFrame
Columns: []
Index: []
For a non-empty DataFrame, however, the index = False and index = True outputs differ:
data = [{'A': 10, 'B': 20, 'C':30}, {'A':5, 'B': 10, 'C': 15}]
df = pd.DataFrame(data)
print(df.to_string(index = False))
print(df.to_string(index = True))
The outputs for index = False and index = True, respectively, are:
A B C
10 20 30
5 10 15
A B C
0 10 20 30
1 5 10 15
Since pandas renders an empty DataFrame as this placeholder text, first check whether the filtered DataFrame itself is empty, using pandas.DataFrame.empty.
Then, only if it is non-empty, print its string representation with index = False to hide the index column.
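Putting that together, a minimal sketch: mz_report is a hypothetical helper name, the colorama Fore colours are left out so the example runs with pandas alone, and na=False (a real str.contains parameter) treats missing Hexdump values as non-matches.

```python
import pandas as pd


def mz_report(dfmalfind: pd.DataFrame) -> str:
    # Keep only rows whose Hexdump contains "MZ"; missing values count as no match.
    matches = dfmalfind[dfmalfind["Hexdump"].str.contains("MZ", na=False)]
    # Test emptiness on the DataFrame itself, not on its string representation.
    if matches.empty:
        return "[✓]No MZ header detected in malfind preview output"
    return (
        "[!]MZ header detected within malfind preview (Process Injection indicator)\n"
        + matches.to_string(index=False)
    )
```

The string conversion happens only after the emptiness check, so the "Empty DataFrame" placeholder can never be mistaken for a match.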

Iterate over an xpath (table row) python

I have XPaths as follows:
/html/body/div[1]/table[3]/tbody/tr[1]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[2]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[4]/td[3]/a
As you can see, only the tr[] index changes. I want to iterate over these values.
Below is the code I have used:
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a')
How can I iterate over them?

This may not be the exact solution you are looking for, but it shows the idea:
tableRows = driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr")
for e in tableRows:
    # relative XPath: the link in the third cell of this row
    link = e.find_element_by_xpath(".//td[3]/a")
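The find_element_by_xpath family used on this page was deprecated in Selenium 4 and removed in later 4.x releases, where find_elements(By.XPATH, ...) is used instead. A minimal sketch of the same row-by-row idea in that style; third_column_links and TABLE_BODY are hypothetical names, and the string "xpath" is used directly because it is the value of By.XPATH, which keeps the sketch importable without Selenium installed.

```python
# "xpath" is the value of selenium.webdriver.common.by.By.XPATH in Selenium 4.
XPATH = "xpath"

TABLE_BODY = "/html/body/div[1]/table[3]/tbody"


def third_column_links(driver):
    """Return the <a> element in the third cell of every row of the table body."""
    links = []
    for row in driver.find_elements(XPATH, TABLE_BODY + "/tr"):
        # "./" keeps the lookup relative to this row; find_elements returns []
        # instead of raising when a row has no link in its third cell.
        links.extend(row.find_elements(XPATH, "./td[3]/a"))
    return links
```

With a live driver you would then call, for example, [a.text for a in third_column_links(driver)].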

If you want the link in the third td of every row, use this:
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr/td[3]/a')
If you want only the first three rows, use this:
search_input = driver.find_elements_by_xpath('/html/body/div[1]/table[3]/tbody/tr[position() < 4]/td[3]/a')
For looping through the tds, see e.g. this answer.

Another alternative, assuming 4 elements:
for elem in range(1, 5):
    element = f"/html/body/div[1]/table[3]/tbody/tr[{elem}]/td[3]/a"
    # e = driver.find_element_by_xpath(element)
    # e.click()
    print(element)
Prints:
/html/body/div[1]/table[3]/tbody/tr[1]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[2]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[3]/td[3]/a
/html/body/div[1]/table[3]/tbody/tr[4]/td[3]/a
You can do whatever you want with the elements in the loop; I just printed them to show the values.

Option 1: iterate over the rows for a fixed column number (e.g. 3):
rows = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr"))
for row in range(1, rows + 1):
    # XPath positions are 1-based, hence range(1, rows + 1)
    local_xpath = "/html/body/div[1]/table[3]/tbody/tr[" + str(row) + "]/td[3]"
    # do something with the element, e.g.:
    # cell_text = self.driver.find_element_by_xpath(local_xpath).text
Option 2: iterate over both rows and columns:
rows = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr"))
# count the cells of the first row only; tr/td would count every cell in the table
columns = len(self.driver.find_elements_by_xpath("/html/body/div[1]/table[3]/tbody/tr[1]/td"))
for row in range(1, rows + 1):
    for column in range(1, columns + 1):
        local_xpath = "/html/body/div[1]/table[3]/tbody/tr[" + str(row) + "]/td[" + str(column) + "]"
        # do something with the element, e.g.:
        # cell_text = self.driver.find_element_by_xpath(local_xpath).text

Dataframe index rows all 0's

I'm iterating through PDFs to obtain the text entered in their form fields. When I write the rows to a CSV file, only the last row is exported. When I print the DataFrame, every row index is 0: what should be 0, 1, 2, 3, etc. comes out as 0, 0, 0, 0, etc. I have tried various solutions from Stack Overflow, but I can't get anything to work.
Here is what I get when printing the results; only the last row is exported to the CSV file:
0
0 1938282828
0
0 1938282828
0
0 22222222
infile = glob.glob('./*.pdf')
for i in infile:
    if i.endswith('.pdf'):
        pdreader = PdfFileReader(open(i,'rb'))
        diction = pdreader.getFormTextFields()
        myfieldvalue2 = str(diction['ID'])
        df = pd.DataFrame([myfieldvalue2])
        print(df)
Thank you for any help!

You are replacing the same dataframe each time:
infile = glob.glob('./*.pdf')
for i in infile:
    if i.endswith('.pdf'):
        pdreader = PdfFileReader(open(i,'rb'))
        diction = pdreader.getFormTextFields()
        myfieldvalue2 = str(diction['ID'])
        df = pd.DataFrame([myfieldvalue2])  # this creates a new df each time
        print(df)
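Corrected code (collect the values in a list and build the DataFrame once, after the loop, so the index runs 0, 1, 2, ...):
import glob
import pandas as pd
from PyPDF2 import PdfFileReader

infile = glob.glob('./*.pdf')
values = []  # one ID per PDF
for i in infile:
    if i.endswith('.pdf'):
        with open(i, 'rb') as fh:
            pdreader = PdfFileReader(fh)
            diction = pdreader.getFormTextFields()
            values.append(str(diction['ID']))
df = pd.DataFrame(values)  # built once: index 0, 1, 2, ...
print(df)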
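Why every row shows index 0: each pd.DataFrame([value]) built inside the loop gets the default index label 0, and stacking such frames keeps their labels. A small sketch with pd.concat (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0), using the ID values from the question's output:

```python
import pandas as pd

ids = ["1938282828", "1938282828", "22222222"]

# Each one-row frame has index label 0, so stacking them keeps 0, 0, 0.
frames = [pd.DataFrame([value]) for value in ids]
stacked = pd.concat(frames)

# ignore_index=True discards the old labels and renumbers the result 0, 1, 2.
renumbered = pd.concat(frames, ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Building the DataFrame once from a list, as in the corrected loop, avoids the problem without needing ignore_index.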

Find line matching an expression, copy entire line, insert in new line and replace the expression

I have several files (yml, tf, xml) in which I need to find a string, e.g. var1, and insert a new line below it that is a copy of that line with var1 replaced by foo2, leaving the rest of the line unchanged.
Example:
variable "my_vars" {
  type = "map"
  default = {
    var1 = "10.48.225.160/28"
    var2 = "10.48.225.160/28"
    var3 = "10.48.225.160/28"
    var4 = "10.48.225.160/28"
  }
}
I tried the code below, but I need the file to be edited in place.
import sys
import string
def find(substr, replstr, infile):
    f = open(infile,"rw")
    lines = f.readlines()
    for i in range(len(lines)):
        if substr in lines[i]:
            j = string.replace(lines[i], substr, replstr)
            lines.insert(i + 1, j)
    print "\n".join(lines)
old_env = sys.argv[1]
new_env = sys.argv[2]
file = sys.argv[3]
find(old_env, new_env, file)

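import sys

def find(substr, replstr, infile):
    with open(infile, "r") as f:
        lines = f.readlines()
    out = []
    # build a new list instead of inserting into lines while looping over it
    for line in lines:
        out.append(line)
        if substr in line:
            if not line.endswith("\n"):
                out[-1] = line + "\n"  # last line had no newline: put the copy on its own line
            out.append(line.replace(substr, replstr))
    print("".join(out))
    with open(infile, "w") as f:
        f.writelines(out)

old_env = sys.argv[1]
new_env = sys.argv[2]
file = sys.argv[3]
find(old_env, new_env, file)
Writing into a separate output list avoids a pitfall of inserting into lines while iterating over range(len(lines)): that range is fixed before the loop starts, so every inserted copy pushes one original line at the end of the file out of reach, and a match on the last line can be missed.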
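Another route is Python's standard fileinput module, which can edit a file in place: with inplace=True, anything written with print() while iterating goes into the file instead of standard output. A minimal sketch, assuming Python 3; duplicate_with_replacement is a hypothetical name.

```python
import fileinput


def duplicate_with_replacement(path, substr, replstr):
    """Below every line containing substr, insert a copy with substr replaced by replstr."""
    # inplace=True redirects print() output into the file being read.
    with fileinput.input(files=(path,), inplace=True) as f:
        for line in f:
            if substr in line and not line.endswith("\n"):
                line += "\n"  # last line without newline: keep the copy on its own line
            print(line, end="")
            if substr in line:
                print(line.replace(substr, replstr), end="")
```

Called as duplicate_with_replacement("vars.tf", "var1", "foo2") (file name illustrative), it produces the var1 line followed by an identical foo2 line, with the rest of the file unchanged.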
