What to do for a non-present DIV tag, using Selenium & Python

I'm trying to extract information from the "PROJECT INFORMATION" container on this page with the help of Selenium and Python: //www.rera.mp.gov.in/view_project_details.php?id=aDRYYk82L2hhV0R0WHFDSnJRK3FYZz09
but while doing this I get this error:
Unable to locate element:
{"method":"xpath","selector":"/html/body/div/article/div[2]/div/div[2]/div[2]/div[2]"}
After studying it, I found that the highlighted div is missing, and there are many places in this container where a div is missing. How am I supposed to handle that? I only want the information from the right side of the table.
MY CODE:
for c in [c for c in range(1, 13) if (c == True)]:
    row = driver.find_element_by_xpath("/html/body/div/article/div[2]/div/div[2]/div[" + str(c) + "]/div[2]").text
    print(row, end=" ")
    print(" ")
else:
    print('NoN')
error:
no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div/article/div[2]/div/div[2]/div[2]/div[2]"}
(Session info: chrome=83.0.4103.106)

The fields highlighted are two different cases. While for "Registration Number" the required div does not exist, for "Proposed End Date" it exists but contains only white space.
Give this a try instead of the for c... loop. It should handle both cases.
# find the parent element
proj_info = driver.find_element_by_xpath("//div[@class='col-md-12 box']")
# find all rows in the parent element
proj_info_rows = proj_info.find_elements_by_class_name('row')
for row in proj_info_rows:
    try:
        if row.find_element_by_class_name('col-md-8').text.strip() == "":
            print(f"{row.find_element_by_class_name('col-md-4').text} contains only whitespace {row.find_element_by_class_name('col-md-8').text}")
            print('NaN')
        else:
            print(row.find_element_by_class_name('col-md-8').text)
    except SE.NoSuchElementException:
        print('NaN')
You need this import:
from selenium.common import exceptions as SE
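A minimal variation on the same idea, as a sketch assuming the same col-md-4 / col-md-8 row structure: the plural find_elements_* calls return an empty list instead of raising, so the try/except can be avoided entirely.
proj_info = driver.find_element_by_xpath("//div[@class='col-md-12 box']")
for row in proj_info.find_elements_by_class_name('row'):
    # find_elements_* (plural) returns [] when the div is absent, so no exception handling is needed
    values = row.find_elements_by_class_name('col-md-8')
    if not values or values[0].text.strip() == "":
        print('NaN')  # div missing, or present but containing only whitespace
    else:
        print(values[0].text)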

Related

Add Proxy to Selenium & export dataframe to CSV

I'm trying to make a scraper for capterra. I'm having issues getting blocked, so I think I need a proxy for my driver.get. Also, I am having trouble exporting a dataframe to a CSV. The first half of my code (not attached) is able to get all the links and store them in a list that I am trying to access with Selenium to get the information I want, but the second part is where I am having trouble.
For example, these are the types of links I am storing in the plinks list for the driver to access:
https://www.capterra.com/p/212448/Blackbaud-Altru/
https://www.capterra.com/p/80509/Volgistics-Volunteer-Management/
https://www.capterra.com/p/179048/One-Earth/
# imports assumed from the first half of the script (not attached):
# from bs4 import BeautifulSoup as bs
# from selenium.webdriver.common.by import By
# from selenium.common.exceptions import NoSuchElementException
# import pandas as pd

for link in plinks:
    driver.get(link)
    #driver.implicitly_wait(20)
    companyProfile = bs(driver.page_source, 'html.parser')
    try:
        name = companyProfile.find("h1", class_="sm:nb-type-2xl nb-type-xl").text
    except AttributeError:
        name = "couldn't find"
    try:
        reviews = companyProfile.find("div", class_="nb-ml-3xs").text
    except AttributeError:
        reviews = "couldn't find"
    try:
        location = driver.find_element(By.XPATH, "//*[starts-with(., 'Located in')]").text
    except NoSuchElementException:
        location = "couldn't find"
    try:
        url = driver.find_element(By.XPATH, "//*[starts-with(., 'http')]").text
    except NoSuchElementException:
        url = "couldn't find"
    try:
        features = [x.get_text() for x in companyProfile.select('[id="LoadableProductFeaturesSection"] li span')]
    except AttributeError:
        features = "couldn't find"
    companyInfo.append([name, reviews, location, url, features])

companydf = pd.DataFrame(companyInfo, columns=["Name", "Reviews", "Location", "URL", "Features"])
companydf.to_csv("wmtest.csv", sep='\t')  # the filename must be a quoted string
driver.close()
I am using Mozilla for the webdriver, and I am happy to change to Chrome if it works better, but is it possible to have the webdriver pick from a random set of proxies for each get request?
Thanks!
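One possible sketch for the proxy part (not a tested answer): plain Selenium cannot change the proxy of a running session, so a common workaround is to build a fresh Firefox driver with a randomly chosen proxy for each request. The proxies list below is hypothetical; substitute real host:port pairs.
import random
from selenium import webdriver

proxies = ["203.0.113.10:8080", "198.51.100.7:3128"]  # hypothetical pool of proxies

def make_driver(proxy):
    host, port = proxy.split(":")
    opts = webdriver.FirefoxOptions()
    # route HTTP and HTTPS traffic through the chosen proxy
    opts.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
    opts.set_preference("network.proxy.http", host)
    opts.set_preference("network.proxy.http_port", int(port))
    opts.set_preference("network.proxy.ssl", host)
    opts.set_preference("network.proxy.ssl_port", int(port))
    return webdriver.Firefox(options=opts)

for link in plinks:
    driver = make_driver(random.choice(proxies))
    driver.get(link)
    # ... scrape as above ...
    driver.quit()
Restarting the browser for every request is slow; if that becomes a problem, the third-party selenium-wire package reportedly allows swapping the proxy on a live driver, but that is an extra dependency.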

Can Beautifulsoup be used to find elements hidden by other wrapped elements?

I would like to extract the text data of the author affiliations on this page using Beautiful Soup.
I know of a workaround using Selenium: simply click on the 'show more' link and scan the page again. I'm not sure what kind of elements these are (hidden?), as they only appear in the inspector after clicking the button.
Is there a way to extract this info using just Beautiful Soup, or do I need Selenium or something equivalent to reveal the elements in the HTML code?
from bs4 import BeautifulSoup
import requests

url = 'https://www.sciencedirect.com/science/article/abs/pii/S0920379621007596'
r = requests.get(url)
sp = BeautifulSoup(r.content, 'html.parser')
author_data = sp.find('div', id='author-group')
affiliations = author_data.find('dl', class_='affiliation').text
print(affiliations)
That info is within a script tag, though you need to map the letters for affiliations to the actual affiliations. The code below extracts the JavaScript object housing the info you want and parses it with the json library.
There is then a series of steps to dynamically determine which indices hold the info of interest, and then a constructed mapping of the letters to affiliations is used to assign the correct affiliation to each author.
The author first and last names are also dynamically ascertained and joined together with a space.
The intention was to avoid hardcoding indices which might change over time.
import re
import json
import requests

r = requests.get('https://www.sciencedirect.com/science/article/abs/pii/S0920379621007596',
                 headers={'User-Agent': 'Mozilla/5.0'})
data = json.loads(re.search(r'(\{"abstracts".*})', r.text).group(1))
base = [i for i in data['authors']['content']
        if i.get('#name') == 'author-group'][0]['$$']
affiliation_data = [i for i in base if i['#name'] == 'affiliation']
author_data = [i for i in base if i['#name'] == 'author']
name_info = [i['_'] for author in author_data for i in author['$$']
             if i['#name'] in ['given-name', 'surname']]
affiliations = dict(zip([j['_'] for i in affiliation_data for j in i['$$'] if j['#name'] == 'label'],
                        [j['_'] for i in affiliation_data for j in i['$$']
                         if isinstance(j, dict) and '_' in j and j['_'][0].isupper()]))
# print(affiliations)
author_affiliations = dict(zip([' '.join([i[0], i[1]]) for i in zip(name_info[0::2], name_info[1::2])],
                               [affiliations[j['_']] for author in author_data for i in author['$$']
                                if i['#name'] == 'cross-ref' for j in i['$$'] if j['_'] != '⁎']))
print(author_affiliations)

How to access the details (i.e. sub-fields) of the second/third elements that have the same class name in Selenium using Python?

I have 3 elements with a particular instance (e.g. there are three <div class="sc-1xo2hia-0 TegxE"> under each <div direction="vertical" class="sc-1fp9csv-0 iFnncD"> on the website https://www.blockchain.com/btc/block/00000000000000000004b91bad9ecfa8c0e57c256d0007cca6f0a2a9e54a2ccc ; click 'Inspect element' on the first transaction to view the specific DOM tree).
Now I want to access some elements from the 2nd and 3rd instances of the first tag (sc-1xo2hia-0 TegxE).
How do I do this efficiently?
PS: This code :
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.blockchain.com/btc/block/00000000000000000004b91bad9ecfa8c0e57c256d0007cca6f0a2a9e54a2ccc')
Txn_elements = driver.find_elements_by_xpath('//div[@class="sc-1fp9csv-0 iFnncD"]')
length = len(Txn_elements)
for i in range(0, length):
    element = Txn_elements[i]
    data = element.find_elements_by_xpath(".//div[@class='sc-1xo2hia-0 TegxE'][1]")
    print(data[0].text)
still prints details of the 0th <div class="sc-1xo2hia-0 TegxE"> only
i.e. it still prints:
Hash
fc1630ec40d95da3fcca40d499c4be616ea6591dda6f0d3d85a678d47c91ae62
2019-11-06 8:37 PM
whereas it should have printed:
17A16QmavnUfCW11DAApiJxp7ARnxN5pGX
2.62352930 BTC
xpath = "(.//div[@class='ge5wha-0 bLrlXr']/a)[1]"     # to get 17A16QmavnUfCW11DAApiJxp7ARnxN5pGX
xpath = "(.//div[@class='ge5wha-1 bWdiuU']/span)[1]"  # to get 2.62352930 BTC
Try with these XPaths.
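As a sketch of how those XPaths slot into the original loop (reusing the question's find_elements_by_xpath calls; changing [1] to [2] or [3] selects the later instances):
driver.get('https://www.blockchain.com/btc/block/00000000000000000004b91bad9ecfa8c0e57c256d0007cca6f0a2a9e54a2ccc')
for element in driver.find_elements_by_xpath('//div[@class="sc-1fp9csv-0 iFnncD"]'):
    # (...)[1] takes the first node of the whole matched set, unlike .//div[...][1],
    # which keeps every div that is the first such child of its own parent
    address = element.find_elements_by_xpath("(.//div[@class='ge5wha-0 bLrlXr']/a)[1]")
    amount = element.find_elements_by_xpath("(.//div[@class='ge5wha-1 bWdiuU']/span)[1]")
    if address and amount:
        print(address[0].text, amount[0].text)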
Please check the solution below; it works, but I am not sure why you are using a for loop if you just want to print two elements in //div[@class="sc-1fp9csv-0 iFnncD"].
If you want to print only one, remove the for loop and try executing your code.
driver.get('https://www.blockchain.com/btc/block/00000000000000000004b91bad9ecfa8c0e57c256d0007cca6f0a2a9e54a2ccc')
Txn_elements = driver.find_elements_by_xpath('//div[@class="sc-1fp9csv-0 iFnncD"]')
length = len(Txn_elements)
for i in range(0, length):
    element = Txn_elements[i]
    data = element.find_elements_by_xpath("//body/div[@id='__next']/div[@class='sc-1myx216-0 iygrgv']/div[@class='p5q4id-0 fasJHc sc-5vnaz6-1 doVOgS']/div[@class='fieq4h-0 klQmUt']/div[@class='xoxfsb-0 bmukdK']/div[3]/div[2]/div[1]/div[2]/div[1]/div[1]/div[1]/a[1]")
    print(data[0].text)
    data1 = element.find_elements_by_xpath("//body/div[@id='__next']/div[@class='sc-1myx216-0 iygrgv']/div[@class='p5q4id-0 fasJHc sc-5vnaz6-1 doVOgS']/div[@class='fieq4h-0 klQmUt']/div[@class='xoxfsb-0 bmukdK']/div[3]/div[2]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/span")
    print(data1[0].text)
Try the solution below for the transaction IDs:
driver.get('https://www.blockchain.com/btc/block/00000000000000000004b91bad9ecfa8c0e57c256d0007cca6f0a2a9e54a2ccc')
List1 = driver.find_elements_by_xpath("//body/div[@id='__next']/div[@class='sc-1myx216-0 iygrgv']/div[@class='p5q4id-0 fasJHc sc-5vnaz6-1 doVOgS']/div[@class='fieq4h-0 klQmUt']/div[@class='xoxfsb-0 bmukdK']/div[*]/div[2]/div[1]/div[2]/div/div/div/a")
for items in List1:
    print(items.text)
List2 = driver.find_elements_by_xpath("//body/div[@id='__next']/div[@class='sc-1myx216-0 iygrgv']/div[@class='p5q4id-0 fasJHc sc-5vnaz6-1 doVOgS']/div[@class='fieq4h-0 klQmUt']/div[@class='xoxfsb-0 bmukdK']/div[*]/div[2]/div[1]/div[2]/div/div/div/div/span")
for items in List2:
    print(items.text)

REBOL layout: How to create layout words automatically - word has no context?

Using the REBOL/View 2.7.8 Core, I would like to prepare a view layout beforehand by automatically assigning words to various layout items, as in the following example.
Instead of
prepared-view: [across
    cb1: check
    label "Checkbox 1"
    cb2: check
    label "Checkbox 2"
    cb3: check
    label "Checkbox 3"
    cb4: check
    label "Checkbox 4"
]
view layout prepared-view
I would thus like the words cb1 thru cb4 to be created automatically, e.g.:
prepared-view2: [ across ]
for i 1 4 1 [
    cbi: join "cb" i
    cbi: join cbi ":"
    cbi: join cbi " check"
    append prepared-view2 to-block cbi
    append prepared-view2 [ label ]
    append prepared-view2 to-string join "Checkbox " i
]
view layout prepared-view2
However, while difference prepared-view prepared-view2 shows no differences in the block being parsed (== []), the second script leads to an error:
** Script Error: cb1 word has no context
** Where: forever
** Near: new/var: bind to-word :var :var
I've spent hours trying to understand why, and I think somehow the new words need to be bound to the specific context, but I have not yet found any solution to the problem.
What do I need to do?
bind prepared-view2 'view
view layout prepared-view2
creates the correct bindings.
And here's another way to dynamically create layouts
>> l: [ across ]
== [across]
>> append l to-set-word 'check
== [across check:]
>> append l 'check
== [across check: check]
>> append l "test"
== [across check: check "test"]
>> view layout l
And then you can use loops to create different variables to add to your layout.
When you use TO-BLOCK to convert a string to a block, that's a low-level operation that doesn't go through the "ordinary" binding to "default" contexts. All words will be unbound:
>> x: 10
== 10
>> code: to-block "print [x]"
== [print [x]]
>> do code
** Script Error: print word has no context
** Where: halt-view
** Near: print [x]
So when you want to build code from raw strings at runtime whose lookups will work, one option is to use LOAD and it will do something default-ish, and that might work for some code (the loader is how the bindings were made for the code you're running that came from source):
>> x: 10
== 10
>> code: load "print [x]"
== [print [x]]
>> do code
10
Or you can name the contexts/objects explicitly (or by way of an exemplar word bound into that context) and use BIND.

Adding Dynamic Text as Text Element in ArcMap 10.2

I am creating an Add-in button in ArcMap 10.2 that adds a floating concatenated dynamic text box to the map layout. I'm having a hard time with my script and am hoping someone will have an answer.
Here is my code:
def onClick(self):
    mxd = arcpy.mapping.MapDocument("CURRENT")
    for elm in arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT", " ")[0]:
        elmWidth = 4.0
        x = 100
        elm.text = 'User: <dyn type="user"/> Date: <dyn type="date" format="short"/> Document Path: <dyn type="document" property="path"/>'
        elm.fontSize = x
        while elm.elementWidth > float(elmWidth):
            elm.fontSize = x
            x = x - 1
    arcpy.RefreshActiveView()
    del mxd
I'm getting the errors UnboundLocalError: local variable 'mxd' referenced before assignment and IndexError: list index out of range.
I'm stuck and need help.
Thank you.
First, you can't add a new text element to an mxd layout, you can only modify or copy existing ones.
Second, write either:
for elm in arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT", " "):
    ...
or
elm = arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT", " ")[0]
This supposes there is a text element named " " in your mxd.
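Putting both fixes together, a minimal sketch of a corrected handler; it only reuses the calls already present in the question and still assumes a text element named " " exists in the mxd:
import arcpy

def onClick(self):
    mxd = arcpy.mapping.MapDocument("CURRENT")
    elements = arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT", " ")
    if not elements:
        return  # no text element named " " in this mxd
    elm = elements[0]  # take the first match instead of iterating over it
    elm.text = ('User: <dyn type="user"/> Date: <dyn type="date" format="short"/> '
                'Document Path: <dyn type="document" property="path"/>')
    # shrink the font until the element fits within the 4-inch target width
    x = 100
    elm.fontSize = x
    while elm.elementWidth > 4.0 and x > 1:
        x = x - 1
        elm.fontSize = x
    arcpy.RefreshActiveView()
    del mxd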