Scraping data inside an iframe with splinter - selenium

Currently using splinter and trying to get some data from this page:
https://ecomm.one-line.com/one-ecom/manage-shipment/cargo-tracking?ctrack-field=CAAU4023030&trakNoParam=CAAU4023030
Since the data I'm looking for is inside an iframe, I'm using get_iframe() as described in the docs.
My attempt:
>>> iframe = browser.get_iframe('IframeCurrentEcom')
>>> iframe
<contextlib._GeneratorContextManager object at 0x00000210118653A0>
>>> iframe.find_by_xpath("/html/body/div[3]/div[2]/div[1]/form/div[3]/div[1]/div/div/div/div[2]/span[2]")
Output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: '_GeneratorContextManager' object has no attribute 'find_by_xpath'
I have also tried:
>>> with browser.get_iframe('IframeCurrentEcom') as iframe:
... iframe.find_by_xpath('/html/body/div[3]/div[2]/div[1]/form/div[3]/div[1]/div/div/div/div[2]/span[2]')
Output:
Traceback (most recent call last):
File "C:\Users\bonva\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\switch_to.py", line 87, in frame
frame_reference = self._driver.find_element(By.ID, frame_reference)
File "C:\Users\bonva\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 857, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "C:\Users\bonva\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "C:\Users\bonva\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: [id="IframeCurrentEcom"]
Stacktrace:
WebDriverError#chrome://remote/content/shared/webdriver/Errors.jsm:186:5
NoSuchElementError#chrome://remote/content/shared/webdriver/Errors.jsm:398:5
element.find/</<#chrome://remote/content/marionette/element.js:300:16
Cannot find an element with that id? Really? Here is proof that it should be able to find it:
Example (run it yourself):
>>> from splinter import Browser
>>> browser = Browser()
>>> browser.visit("https://www.one-line.com/en")
>>> browser.execute_script("window.scrollTo(0,500)")
>>> input = browser.find_by_xpath('//*[@id="ctrack-field"]')
>>> input.fill("CAAU4023030")
>>> search = browser.find_by_xpath('/html/body/div[1]/div/main/div[3]/div/div/div/div/div[2]/div/div/div[2]/form/div[2]/button')
>>> search.click()
>>> browser.get_iframe('IframeCurrentEcom')
<contextlib._GeneratorContextManager object at 0x000002100F991C40>
>>> iframe = browser.get_iframe('IframeCurrentEcom')
>>> iframe.html
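A minimal sketch of the intended usage (an assumption based on the context-manager repr above; the iframe id and XPath are taken from the question): get_iframe() returns a context manager, so it has to be entered with a with statement, and the frame is worth waiting for before switching into it.

# Wait until the iframe exists in the DOM, then enter it.
if browser.is_element_present_by_id('IframeCurrentEcom', wait_time=15):
    with browser.get_iframe('IframeCurrentEcom') as iframe:
        span = iframe.find_by_xpath(
            '/html/body/div[3]/div[2]/div[1]/form/div[3]'
            '/div[1]/div/div/div/div[2]/span[2]'
        ).first
        print(span.text)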

Related

send file to serial port in python

I am trying to send the file '105.8k' below to my energy meter.
I am using the XMODEM example from PyPI, but I get the following error:
Traceback (most recent call last):
File "C:\Py\mainpy.py", line 68, in <module>
status = modem.send(f, retry=3)
File "C:\Users\admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\xmodem\__init__.py", line 270, in send
char = self.getc(1)
File "C:\py\mainpy.py", line 62, in getc
return ser.read(size) or None
AttributeError: 'str' object has no attribute 'read'
The code I use:
### send file to port ###
ser = serialPortCombobox.get().split(" ")[0]

def getc(size, timeout=1):
    return ser.read(size) or None

def putc(data, timeout=1):
    return ser.write(data)

modem = XMODEM(getc, putc)
f = open('105.8k', 'rb')
status = modem.send(f, retry=3)
ser.close()
stream.close()
Thank you for your help.
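The traceback suggests ser is a plain string: the combobox returns the port name, not an open connection. A hedged sketch of a fix (assuming pyserial and guessing a 9600 baud rate; serialPortCombobox is the GUI widget from the question):

import serial
from xmodem import XMODEM

# The combobox yields the port *name* (e.g. "COM3"); it must be opened as a
# pyserial Serial object before it has read()/write() methods.
port_name = serialPortCombobox.get().split(" ")[0]
ser = serial.Serial(port_name, baudrate=9600, timeout=1)  # baud rate is an assumption

def getc(size, timeout=1):
    return ser.read(size) or None

def putc(data, timeout=1):
    return ser.write(data)

modem = XMODEM(getc, putc)
with open('105.8k', 'rb') as f:
    status = modem.send(f, retry=3)
ser.close()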

Error downloading PDF files

I have the following (simplified) code:
import os
import scrapy

class TestSpider(scrapy.Spider):
    name = 'test_spider'
    start_urls = ['http://www.pdf995.com/samples/pdf.pdf', ]

    def parse(self, response):
        save_path = 'test'
        file_name = 'test.pdf'
        self.save_page(response, save_path, file_name)

    def save_page(self, response, save_dir, file_name):
        os.makedirs(save_dir, exist_ok=True)
        with open(os.path.join(save_dir, file_name), 'wb') as afile:
            afile.write(response.body)
When I run it, I get this error:
[scrapy.core.scraper] ERROR: Error downloading <GET http://www.pdf995.com/samples/pdf.pdf>
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\twisted\internet\defer.py", line 1301, in _inlineCallbacks
result = g.send(result)
File "C:\Python36\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "C:\Python36\lib\site-packages\twisted\internet\defer.py", line 1278, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 http://www.pdf995.com/samples/pdf.pdf>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\twisted\internet\defer.py", line 1301, in _inlineCallbacks
result = g.send(result)
File "C:\Python36\lib\site-packages\scrapy\core\downloader\middleware.py", line 53, in process_response
spider=spider)
File "C:\Python36\lib\site-packages\scrapy_beautifulsoup\middleware.py", line 16, in process_response
return response.replace(body=str(BeautifulSoup(response.body, self.parser)))
File "C:\Python36\lib\site-packages\scrapy\http\response\__init__.py", line 79, in replace
return cls(*args, **kwargs)
File "C:\Python36\lib\site-packages\scrapy\http\response\__init__.py", line 20, in __init__
self._set_body(body)
File "C:\Python36\lib\site-packages\scrapy\http\response\__init__.py", line 55, in _set_body
"Response body must be bytes. "
TypeError: Response body must be bytes. If you want to pass unicode body use TextResponse or HtmlResponse.
Do I need to introduce a middleware or something to handle this? It looks like it should be valid, at least judging by other examples.
Note: at the moment I'm not using a pipeline, because in my real spider I have a lot of checks: whether the related item has been scraped, whether this PDF belongs to the item, and whether a PDF with its custom name was already downloaded. And as mentioned, many samples do what I'm doing here, so I thought it would be easier and would work.
The issue is caused by your own scrapy_beautifulsoup\middleware.py, which replaces the response body with a str: return response.replace(body=str(BeautifulSoup(response.body, self.parser))). A Response body must be bytes, so this fails on binary downloads such as PDFs.
You need to correct that, and that should fix the issue.
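A hedged sketch of one possible correction (the class layout here is an assumption; only the BeautifulSoup line comes from the traceback): run BeautifulSoup on HTML responses only, and hand binary bodies such as PDFs back unchanged.

from bs4 import BeautifulSoup
from scrapy.http import HtmlResponse

class BeautifulSoupMiddleware:
    def __init__(self, parser='html.parser'):
        self.parser = parser

    def process_response(self, request, response, spider):
        # Only text/HTML responses can round-trip through BeautifulSoup;
        # re-encode to bytes because a Response body must be bytes.
        if isinstance(response, HtmlResponse):
            soup = BeautifulSoup(response.body, self.parser)
            return response.replace(body=str(soup).encode(response.encoding))
        return response  # binary responses (e.g. PDFs) pass through untouched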

Python alternative for devices that do not support SSH exec_command

I am new to Python and want to know an alternative way of doing the following.
I am having an issue with exec_command() in paramiko.
Here is the code:
sshdell = paramiko.SSHClient()
sshdell.set_missing_host_key_policy(paramiko.AutoAddPolicy())
sshdell.connect('ip', port=22, username='user', password='pwd')
stdin, stdout, stderr = sshdell.exec_command("ping 4.2.2.2 interface X1")
ping_check = stdout.readlines()
for line in ping_check:
    print(line)
The following error is thrown:
Traceback (most recent call last):
File "delltest.py", line 36, in <module>
stdin,stdout,stderr = sshdell.exec_command("ping 4.2.2.2 interface X1")
File "C:\python35\lib\site-packages\paramiko\client.py", line 441, in exec_command
chan.exec_command(command)
File "C:\python35\lib\site-packages\paramiko\channel.py", line 60, in _check
return func(self, *args, **kwds)
File "C:\python35\lib\site-packages\paramiko\channel.py", line 234, in exec_command
self._wait_for_event()
File "C:\python35\lib\site-packages\paramiko\channel.py", line 1161, in _wait_for_event
raise e
paramiko.ssh_exception.SSHException: Channel closed.
Please suggest an alternative, as my device may not support exec_command().
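If the device refuses exec_command(), a common alternative is paramiko's interactive shell. A sketch (untested against the asker's hardware; the crude sleep-based read is an assumption):

import time
import paramiko

sshdell = paramiko.SSHClient()
sshdell.set_missing_host_key_policy(paramiko.AutoAddPolicy())
sshdell.connect('ip', port=22, username='user', password='pwd')

# invoke_shell() opens an interactive session, which many network
# appliances accept even when they reject exec_command().
chan = sshdell.invoke_shell()
chan.send(b"ping 4.2.2.2 interface X1\n")
time.sleep(5)  # crude wait for the command to finish; tune for the device
output = chan.recv(65535).decode('utf-8', errors='replace')
print(output)
sshdell.close()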

Kazoo package using Jython

Kazoo works fine under Python (CPython), but the project I'm working on requires it to run under Jython.
Here is the issue:
>>> from kazoo.client import KazooClient
>>> zk = KazooClient('127.0.0.1')
>>> zk.start()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\jython2.7.0\Lib\site-packages\kazoo\client.py", line 541, in start
event = self.start_async()
File "C:\jython2.7.0\Lib\site-packages\kazoo\client.py", line 576, in start_async
self._connection.start()
File "C:\jython2.7.0\Lib\site-packages\kazoo\protocol\connection.py", line 170, in start
rw_sockets = self.handler.create_socket_pair()
File "C:\jython2.7.0\Lib\site-packages\kazoo\handlers\threading.py", line 165, in create_socket_pair
return utils.create_socket_pair(socket)
File "C:\jython2.7.0\Lib\site-packages\kazoo\handlers\utils.py", line 148, in create_socket_pair
temp_srv_sock.bind(('', port))
File "C:\jython2.7.0\Lib\_socket.py", line 1367, in meth
return getattr(self._sock,name)(*args)
File "C:\jython2.7.0\Lib\_socket.py", line 812, in bind
self.bind_addr = _get_jsockaddr(address, self.family, self.type, self.proto, AI_PASSIVE)
File "C:\jython2.7.0\Lib\_socket.py", line 1565, in _get_jsockaddr
addr = _get_jsockaddr2(address_object, family, sock_type, proto, flags)
File "C:\jython2.7.0\Lib\_socket.py", line 1594, in _get_jsockaddr2
hostname = {AF_INET: INADDR_ANY, AF_INET6: IN6ADDR_ANY_INIT}[family]
KeyError: 0
As I already said, this issue does not occur under CPython.
I'm pretty sure it is connected with the Jython version of the _socket.py file, but I don't know of a workaround.
What can you recommend?

FeedparserDict object doesn't have 'content' attribute

I am trying to get familiar with the feedparser library, but I don't seem to be able to access the content attribute of entries in the feedparser object:
d = feedparser.parse('http://www.reddit.com/r/python/.rss')
post = d.entries[2]
post.content
The above code block gives me this error:
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\feedparser.py", line 414, in __getattr__
return self.__getitem__(key)
File "C:\Python34\lib\site-packages\feedparser.py", line 375, in __getitem__
return dict.__getitem__(self, key)
KeyError: 'content'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#87>", line 1, in <module>
content = post.content[0].value
File "C:\Python34\lib\site-packages\feedparser.py", line 416, in __getattr__
raise AttributeError("object has no attribute '%s'" % key)
AttributeError: object has no attribute 'content'
Just do a print(post) and you will probably see that it doesn't have a content attribute.
RSS feeds are not guaranteed to include one.
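A defensive sketch (the fallback to summary is an assumption; FeedParserDict is a dict subclass, so dict-style access works):

import feedparser

d = feedparser.parse('http://www.reddit.com/r/python/.rss')
for post in d.entries:
    # Not every feed supplies 'content'; fall back to 'summary' if absent.
    if 'content' in post:
        body = post.content[0].value
    else:
        body = post.get('summary', '')
    print(body[:80])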