How to set a limit on page scrolling while scraping Instagram? - selenium

scrolldown = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var scrolldown=document.body.scrollHeight;return scrolldown;")
match = False
while match == False:
    last_count = scrolldown
    time.sleep(3)
    scrolldown = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var scrolldown=document.body.scrollHeight;return scrolldown;")
    if last_count == scrolldown:
        match = True
I want to scrape data from an Instagram profile with Selenium, but I don't know how to set a limit on scrolling the page. With the code above, the page keeps scrolling and I can't tell when it will stop. I just want to scroll through that account's posts until I find the one I'm looking for.

As you mentioned "to scroll through that account's posts until I find the one I'm looking for", presumably the specific element has a unique attribute, such as one of:
id
classname
aria-label
innerText
or can be identified uniquely within the HTML DOM with a combination of its attributes. Once you have constructed a locator strategy that identifies the element uniquely, you can use the scrollIntoView() method as follows:
element = driver.find_element(By.XPATH, "//unique_xpath_locator")
driver.execute_script("return arguments[0].scrollIntoView();", element)

Probably the best and safest way to scroll is to use
element = driver.find_element(...)
driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', element)
This command scrolls smoothly so that the element ends up vertically centered on the page. In your case I suggest scrolling to the oldest loaded post (it should be located at the bottom of the screen) so that new ones are loaded, and repeating the process until you find the post you are looking for. You can do this with the following code:
while True:
    loaded_posts = driver.find_elements(By.CSS_SELECTOR, 'article > div > div > div > div')
    # scroll to last loaded post
    driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', loaded_posts[-1])
    post_found = ...
    if post_found:
        break
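
Neither loop above has an upper bound, which is what the question asks for. A minimal sketch of a bounded version follows, assuming a hypothetical helper name scroll_until, a caller-supplied found() check, and illustrative defaults for max_scrolls and pause (none of these come from the answers above). It stops when the post is found, when the page height stops growing, or when the scroll limit is reached, whichever comes first.

```python
import time


def scroll_until(driver, found, max_scrolls=20, pause=3.0):
    """Scroll to the bottom until found() is true or a limit is hit.

    driver      -- any object with a Selenium-style execute_script() method
    found       -- hypothetical callable returning True once the post is loaded
    max_scrolls -- upper bound on the number of scroll steps (assumed default)
    pause       -- seconds to wait after each scroll so new posts can load

    Returns "found", "end_of_page", or "limit_reached".
    """
    last_height = driver.execute_script("return document.body.scrollHeight;")
    for _ in range(max_scrolls):
        if found():
            return "found"
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            # The page did not grow, so nothing more was loaded.
            return "found" if found() else "end_of_page"
        last_height = new_height
    # Scroll budget used up; check one last time before giving up.
    return "found" if found() else "limit_reached"
```

With a real driver, found could be something like lambda: len(driver.find_elements(By.XPATH, "//unique_xpath_locator")) > 0, reusing the placeholder locator from the first answer.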

Related

Can't get element generated by JavaScript with Selenium

I want to get past a website that uses Cloudflare's bot detection and DDoS protection. I am using Selenium. When the page load request is sent, the page starts loading and an element appears on the screen.
The HTML looks like this:
The white box is a span named 'mark' and the hidden blue box is an 'input' tag with type 'checkbox'.
I tried to get the elements in the usual ways, such as driver.find_elements(By.CLASS_NAME, "class"), as well as XPath, CSS selector, and JS path, but none of them worked. Note that I waited manually for the element to fully appear on the screen, so the problem is not about waiting for the element to load.
Because the element was generated by JavaScript, I also tried the pattern element = driver.execute_script("return document.querySelector('$cssSelector')"), along with XPath and JS path variants. These did not work either, and the elements were not found.
The code:
markSpanJsPath = driver.execute_script("return document.querySelector('#cf-stage > div.ctp-checkbox-container > label > span')")
if markSpanJsPath:
    print('found markSpanJsPath')
    driver.execute_script("arguments[0].click();", markSpanJsPath)
    print('found markSpanJsPath js click')
markSpanXpath = driver.find_elements(By.XPATH,'//*[@id="cf-stage"]/div[6]/label/span')
if markSpanXpath:
    print('found markSpanXpath')
    driver.execute_script("arguments[0].click();", markSpanXpath)
    print('found markSpanXpath js click')
Nothing was printed.
So how can I click either the 'mark' span or the checkbox to pass the human verification?
First of all, it should be find_element(), not find_elements().
Use the XPath below to identify and click the element.
markSpanXpath = driver.find_element(By.XPATH,'//label[@class="ctp-checkbox-label"]//span[@class="mark"]')
markSpanXpath.click()
If the above click doesn't work, try a JavaScript click.
driver.execute_script("arguments[0].click();", markSpanXpath)

WebDriverWait.until.expected_conditions.presence_of_element_located not waiting for reloaded DOM

I have an app with 2 buttons. Button A takes the app to a new page with no buttons, then returns to the page with 2 buttons. My automated test is meant to click on Button A, wait while the app goes to the new page and returns, then click on Button B.
The code:
el05a = WebDriverWait(driver, 120).until(
    expected_conditions.presence_of_element_located((By.ID, "id_of_button_a"))
)
el05a.click()
el05b = WebDriverWait(driver, 120).until(
    expected_conditions.presence_of_element_located((By.ID, "id_of_button_b"))
)
el05b.click()
However, I receive a StaleElementReferenceException about button B not being in DOM anymore.
Obviously, button B is not gonna be in the DOM while the app is at the new page, but why does my code not know to wait until the presence of button B is located? I thought presence_of_element_located means the code would be on hold until the element is located.
I know this could "technically" be patched with a time.sleep module but I'm trying to avoid that.
From your question, it seems you are checking presence_of_element_located, which only checks for the element's presence and not its visibility.
Try replacing presence_of_element_located with visibility_of_element_located.
There is a difference between visibility_of_element_located and presence_of_element_located.
1) visibility_of_element_located
Checking that an element is present on the DOM of a page and visible. Basically it tests if the element we are looking for is present as well as visible on the page.
2) presence_of_element_located
Checking that an element is present on the DOM of a page. Basically it tests if the element we are looking for is present somewhere on the page.
Code:
el05a = WebDriverWait(driver, 120).until(
    expected_conditions.visibility_of_element_located((By.ID, "id_of_button_a"))
)
el05a.click()
el05b = WebDriverWait(driver, 120).until(
    expected_conditions.visibility_of_element_located((By.ID, "id_of_button_b"))
)
el05b.click()
visibility_of_element_located: Returns the WebElement once it is located and visible.
An expectation for checking that an element is present on the DOM of a page and visible. Visibility means that the element is not only displayed but also has a height and width that is greater than 0.
presence_of_element_located: Returns the WebElement if the element is present in the DOM, even if it is not visible.
An expectation for checking that an element is present on the DOM of a page. This does not necessarily mean that the element is visible.
Please change it from
expected_conditions.presence_of_element_located((By.ID, "id_of_button_a"))
to
expected_conditions.visibility_of_element_located((By.ID, "id_of_button_a"))
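
For intuition about what "waiting" means here: WebDriverWait(...).until(...) calls the condition repeatedly until it returns a truthy value or the timeout passes. The sketch below is a simplified stand-in for that polling idea, not Selenium's actual implementation; wait_until, condition, and poll are illustrative names. One thing it makes visible is that a condition which is already true on the first check returns immediately, without any waiting.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout elapses.

    condition -- hypothetical zero-argument callable, e.g. one that looks up
                 an element and returns it (or None if it is not there yet)
    timeout   -- maximum number of seconds to keep trying
    poll      -- seconds to sleep between attempts
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # The condition held on this attempt; stop waiting right away.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

A presence check and a visibility check differ only in what the condition tests on each attempt; the polling loop around them is the same.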

Selenium Python: how to find and click an element that changes every time

I'm trying to find an element with dynamic values, for example <span class="ms-Button-label label-175" id="id__177">Save</span> in the element inspector. The id and class values tend to change on every refresh, so how can I find the element in Selenium in this case? I tried XPath, but it doesn't seem to work because the path cannot be found. I was thinking of always finding the word "Save" by XPath, but I don't know if I'm doing it right: driver.find_element_by_xpath(//span(@.... But then what? How can I specify the element if it changes every time? Thanks!
Something like this may work:
driver.find_element_by_xpath('//span[text()="Save"]')
But this will fail if there is more than one element with the text "Save" on the page.
In that case you may try to find some specific outer element (div, form, etc.) which does not change and contains the button. Then find the button inside of it.
With a few calls to the driver:
specific_div = driver.find_element_by_id("my_specific_div")
button = specific_div.find_element_by_tag_name("span") # e.g. there is only one span in that div
Or with more specific xpath:
button = driver.find_element_by_xpath('//div[@class="some-specific-class"]/span[text()="Save"]')
If needed, search for more nested elements before the button to narrow the search further.
More examples in the docs.
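
The find_element_by_* helpers used above were removed in Selenium 4.3, where the equivalent call is driver.find_element(by, value). A minimal sketch of the same idea in that form follows. The helper names xpath_for_text and find_by_text are hypothetical, and the sketch swaps text()="Save" for normalize-space()="Save" so that surrounding whitespace in the label does not break the match; it also searches descendants (//) of the container rather than direct children only.

```python
XPATH = "xpath"  # same string value as selenium's By.XPATH


def xpath_for_text(tag, text, container_xpath=""):
    """Build an XPath for <tag> elements whose trimmed text equals `text`.

    container_xpath -- optional XPath of a stable outer element to search in
    """
    if '"' in text:
        # Quote escaping in XPath 1.0 needs concat(); left out of this sketch.
        raise ValueError("this sketch does not handle double quotes in text")
    return f'{container_xpath}//{tag}[normalize-space()="{text}"]'


def find_by_text(driver, tag, text, container_xpath=""):
    """Locate the element through any Selenium-style driver object."""
    return driver.find_element(XPATH, xpath_for_text(tag, text, container_xpath))
```

For the example in the question, find_by_text(driver, "span", "Save") ignores the changing id and class values entirely and relies only on the visible label.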

wait.Until(ExpectedConditions.VisibilityOfAllElementsLocatedBy(By.ClassName(className)) doesn't return any element

I need to find IReadOnlyCollection<IWebElement> using WebDriverWait to make sure that elements had been rendered on page.
This is my code
WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(timeout));
return wait.Until(ExpectedConditions.VisibilityOfAllElementsLocatedBy(By.ClassName("TextInput")));
This code fails with a timeout,
meaning that it could not find any elements on the page with the given class name.
I added this line of code BEFORE my original code just to make sure that elements are present
var allInputs1 = container.FindElements(By.ClassName("textInput"));
And that line returns elements as expected.
So my conclusion is that
wait.Until(ExpectedConditions.VisibilityOfAllElementsLocatedBy(By.ClassName("TextInput")))
doesn't work as expected, since it couldn't find elements that are definitely present on the page.
What is the best way to find array of elements using WebDriverWait?
Your conclusion is wrong. With FindElements you just make sure that elements are present.
The API documentation for VisibilityOfAllElementsLocatedBy states:
An expectation for checking that all elements present on the web page
that match the locator are visible. Visibility means that the elements
are not only displayed but also have a height and width that is
greater than 0.
And obviously, present does not mean visible.
I think you should try ExpectedConditions.PresenceOfAllElementsLocatedBy

HTML code - how to find out absolute position on the page?

Is there any browser engine or plugin that would give the user information about the position of a given HTML element? I want to know where the element is located, e.g. in the left corner or the center of the page.
It should not be a huge problem, as Firefox and Chrome highlight elements within the page as you go through the HTML code in Developer Tools > "Element tab".
Example of a highlighted element: http://imgur.com/mUHd51q. We see that the selected element is currently in the centre of the screen. How can I get this information programmatically?
Selenium-webdriver can give you information about any DOM element you want:
require "selenium-webdriver"
d = Selenium::WebDriver.for :chrome
d.get "http://www.google.com"
elem = d.find_element(:name, "btnI")
elem.location
=> #<struct Selenium::WebDriver::Point x=532, y=356>
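
To turn raw coordinates like these into an answer such as "left corner" or "center", one option is to compare the element's center against the viewport size. The sketch below is in Python and assumes dictionaries shaped like the Python bindings' element.rect (x, y, width, height) and driver.get_window_size() (width, height). The function name describe_position and the split into thirds are illustrative choices. Element coordinates are measured from the top-left of the page, so the result is only approximate once the page has been scrolled.

```python
def describe_position(rect, viewport):
    """Classify an element's center point within the viewport.

    rect     -- dict with "x", "y", "width", "height" (like element.rect)
    viewport -- dict with "width", "height" (like driver.get_window_size())
    """
    center_x = rect["x"] + rect["width"] / 2
    center_y = rect["y"] + rect["height"] / 2

    def third(value, size, names):
        # Split the axis into three equal bands and pick the matching name.
        if value < size / 3:
            return names[0]
        if value < 2 * size / 3:
            return names[1]
        return names[2]

    horizontal = third(center_x, viewport["width"], ("left", "center", "right"))
    vertical = third(center_y, viewport["height"], ("top", "middle", "bottom"))
    return f"{vertical}-{horizontal}"
```

With a real driver this would be called as describe_position(elem.rect, driver.get_window_size()).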