Selenium: creating bulk email addresses

I want to use Selenium to create several email addresses at once. They could be random, but I already have a list of the account names I want to create.
I know how to create one email address using webdriver, but how would I go about signing up several, one after the other, automatically, without having to change the code every time?
Simple code for creating 1 email:
from selenium import webdriver
import time

url = 'https://hotmail.com/'
driver = webdriver.Chrome('C:/Users/Desktop/chromedriver')
driver.get(url)
# open the sign-up page
driver.find_element_by_xpath("//a[contains(@class, 'linkButtonSigninHeader')]").click()
time.sleep(2)
driver.find_element_by_id('MemberName').send_keys('usernameexample')
time.sleep(1)
driver.find_element_by_id('iSignupAction').click()
time.sleep(2)
driver.find_element_by_id('PasswordInput').send_keys('Passwordexample1')
time.sleep(1)
driver.find_element_by_id('iSignupAction').click()
time.sleep(2)
driver.find_element_by_id('FirstName').send_keys('john')
time.sleep(1)
driver.find_element_by_id('LastName').send_keys('wayne')
time.sleep(1)
driver.find_element_by_id('iSignupAction').click()

As others have pointed out, you can iterate over a collection of usernames, such as a list:
usernames = ['username_one', 'username_two']

for username in usernames:
    url = 'https://hotmail.com/'
    driver = webdriver.Chrome('C:/Users/Desktop/chromedriver')
    driver.get(url)
    driver.find_element_by_xpath("//a[contains(@class, 'linkButtonSigninHeader')]").click()
    driver.find_element_by_id('MemberName').send_keys(username)  # loop variable used here
    driver.find_element_by_id('iSignupAction').click()
    driver.find_element_by_id('PasswordInput').send_keys('Passwordexample1')
    driver.find_element_by_id('iSignupAction').click()
    driver.find_element_by_id('FirstName').send_keys('john')
    driver.find_element_by_id('LastName').send_keys('wayne')
    driver.find_element_by_id('iSignupAction').click()
    # some step to log out so that the next username can register
If you aren't familiar with iterating over a collection, I'd suggest looking at the docs to get your head around the idea (the same concept in Ruby is Array#each: https://ruby-doc.org/core-2.6.1/Array.html#method-i-each).
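A pattern that avoids the logout step entirely is to start a fresh browser session for each username and quit it at the end of the iteration. A rough sketch only: create_account is a hypothetical helper that would contain the signup steps from the snippet above, and the chromedriver path needs adjusting for your machine.
from selenium import webdriver

def create_account(driver, username):
    # hypothetical helper: the MemberName / PasswordInput / iSignupAction steps
    # from the snippet above go here, using `username` instead of 'usernameexample'
    ...

usernames = ['username_one', 'username_two']

for username in usernames:
    # one fresh browser session per account, so no logout step is needed
    driver = webdriver.Chrome('C:/Users/Desktop/chromedriver')
    driver.get('https://hotmail.com/')
    create_account(driver, username)
    driver.quit()  # end this session before the next signup starts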

Related

Rotating Proxies and Selenium - Close driver and re-run script with new IP

I am running an automated script that needs to change IP every 5-6 refreshes. I understand the IP cannot be changed dynamically, since the webdriver has to be re-opened with new options. I have a list of proxies in a text file and I want to use one of these IPs, chosen at random, every 5-6 driver refreshes. The script normally sits in a try/except loop that looks for a keyword and, if it does not find the keyword, refreshes every 5 seconds until the word is found.
def find_link_by_word_in_href(driver, words):
    for word in words:
        try:
            return driver.find_element(By.XPATH, f"//*[contains(@href,'{word}')]")
        except NoSuchElementException:
            pass

while True:
    element = find_link_by_word_in_href(driver, ['dadsa', 'daasd', 'asdsad'])
    if element is not None:
        element.click()
        break
    else:
        driver.refresh()
        time.sleep(5)
I wish to break this once it hits 5 refreshes and restart the script choosing a new IP from the text file. Can anybody help or point me in the correct direction to solve this?
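One way to approach this (a sketch only, reusing find_link_by_word_in_href from the snippet above, and assuming the proxy file holds one ip:port per line; the URL and keywords are placeholders) is to cap the refresh loop at five attempts and then restart Chrome with a randomly chosen --proxy-server option:
import random
import time
from selenium import webdriver

def new_driver_with_random_proxy(proxy_file="proxies.txt"):
    # pick a random ip:port line from the text file and pass it to Chrome
    with open(proxy_file) as f:
        proxies = [line.strip() for line in f if line.strip()]
    options = webdriver.ChromeOptions()
    options.add_argument("--proxy-server={}".format(random.choice(proxies)))
    return webdriver.Chrome(options=options)

driver = new_driver_with_random_proxy()
driver.get("https://example.com/")  # placeholder for the page being refreshed

element = None
while element is None:
    for _ in range(5):  # at most 5 refreshes on the current IP
        element = find_link_by_word_in_href(driver, ['dadsa', 'daasd', 'asdsad'])
        if element is not None:
            break
        driver.refresh()
        time.sleep(5)
    else:
        # keyword still not found after 5 refreshes: switch to a new proxy
        driver.quit()
        driver = new_driver_with_random_proxy()
        driver.get("https://example.com/")

element.click()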

Scraping: scrape multiple pages in a loop (BeautifulSoup)

I am trying to scrape real estate data using BeautifulSoup, but when I save the result of the scrape to a .csv file, it only contains the information from the first page. I would like to scrape the number of pages I have set in the "pages_number" variable.
# How many pages
pages_number = int(input('How many pages? '))
# start the execution timer
tic = time.time()
# Chromedriver
chromedriver = "./chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
# initial link
link = 'https://www.vivareal.com.br/aluguel/sp/sao-paulo/?__vt=lnv:a&page=1'
driver.get(link)
# loop over the pages
for page in range(1, pages_number + 1):
    time.sleep(15)
    data = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
    soup_complete_source = BeautifulSoup(data.encode('utf-8'), "lxml")
I already tried this solution but got an error:
link = 'https://www.vivareal.com.br/aluguel/sp/sao-paulo/?__vt=lnv:a&page={}.format(page)'
Does anyone have any idea what can be done?
COMPLETE CODE
https://github.com/arturlunardi/webscraping_vivareal/blob/main/scrap_vivareal.ipynb
I see that the URL you are using points to page 1 only:
https://www.vivareal.com.br/aluguel/sp/sao-paulo/?__vt=lnv:a&page=1
Are you changing it anywhere in your code? If not, then no matter what you fetch, it will always come from page 1.
You should do something like this:
for page in range(1, pages_number + 1):
    chromedriver = "./chromedriver"
    os.environ["webdriver.chrome.driver"] = chromedriver
    driver = webdriver.Chrome(chromedriver)
    # build the link for the current page
    link = f"https://www.vivareal.com.br/aluguel/sp/sao-paulo/?__vt=lnv:a&page={page}"
    driver.get(link)
    time.sleep(15)
    data = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
    soup_complete_source = BeautifulSoup(data.encode('utf-8'), "lxml")
    driver.close()
Test output (not the soup part) for pages_number = 3 (URLs stored in a list, for easy viewing):
['https://www.vivareal.com.br/aluguel/sp/sao-paulo/?__vt=lnv:a&page=1', 'https://www.vivareal.com.br/aluguel/sp/sao-paulo/?__vt=lnv:a&page=2', 'https://www.vivareal.com.br/aluguel/sp/sao-paulo/?__vt=lnv:a&page=3']
Process finished with exit code 0
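A small variation on the same idea (just a sketch, keeping the question's variable names) reuses a single driver and only changes the URL each iteration, collecting one soup per page:
from bs4 import BeautifulSoup
from selenium import webdriver
import time

pages_number = 3  # or int(input('How many pages? ')) as in the question
driver = webdriver.Chrome("./chromedriver")

soups = []
for page in range(1, pages_number + 1):
    link = f"https://www.vivareal.com.br/aluguel/sp/sao-paulo/?__vt=lnv:a&page={page}"
    driver.get(link)
    time.sleep(15)  # same fixed wait as the original; an explicit wait would be more robust
    data = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
    soups.append(BeautifulSoup(data, "lxml"))

driver.quit()
If the .csv still only contains one page afterwards, check that whatever is extracted from each soup is appended to a running list before being written out, rather than overwritten on each iteration.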

How to use time.sleep to make Selenium output consistent

This might be the stupidest question I've asked yet, but this is driving me nuts...
Basically, I want to get all the links from the profiles, but for some reason Selenium returns a different number of links on most runs (sometimes all of them, sometimes only a tenth).
I experimented with time.sleep and I know it affects the output somehow, but I don't understand where the problem is.
(That's just my hypothesis; maybe it's wrong.)
I have no other explanation for the inconsistent output. Since I do get all the profile links from time to time, the program is clearly able to find all relevant profiles.
Here is what the output should be (for different GUI inputs):
input: anlagenbau, output: 3070
input: Fahrzeugbau, output: 4065
input: laserschneiden, output: 1311
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from urllib.request import urlopen
from datetime import date
from datetime import datetime
import easygui
import re
import time

# input window for the search term (Suchbegriff)
suchbegriff = easygui.enterbox("Suchbegriff eingeben | Hinweis: suchbegriff sollte kein '/' enthalten")

# get date and time
now = datetime.now()
current_time = now.strftime("%H-%M-%S")
today = date.today()
date = today.strftime("%Y-%m-%d")

def get_profile_url(label_element):
    # get the url from a result element
    onclick = label_element.get_attribute("onclick")
    # some regex magic
    return re.search(r"(?<=open\(\')(.*?)(?=\')", onclick).group()

def load_more_results():
    # load more results if needed // use only on the search page!
    button_wrapper = wd.find_element_by_class_name("loadNextBtn")
    button_wrapper.find_element_by_tag_name("span").click()

#### Script starts here ####

# Set some Selenium options
options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

# Webdriver
wd = webdriver.Chrome(options=options)

# Load URL
wd.get("https://www.techpilot.de/zulieferer-suchen?" + str(suchbegriff))

# let's first wait for the iframe
iframe = WebDriverWait(wd, 5).until(
    EC.frame_to_be_available_and_switch_to_it("efficientSearchIframe")
)

# the result parent
result_pane = WebDriverWait(wd, 5).until(
    EC.presence_of_element_located((By.ID, "resultPane"))
)

# get all profile links as a list
time.sleep(5)
href_list = []
wait = WebDriverWait(wd, 15)
while True:
    try:
        #time.sleep(1)
        wd.execute_script("loadFollowing();")
        #time.sleep(1)
        try:
            wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fancyCompLabel")))
        except TimeoutException:
            break
        #time.sleep(1)  # somehow influences how many results are found
        result_elements = wd.find_elements_by_class_name("fancyCompLabel")
        #time.sleep(1)
        for element in result_elements:
            url = get_profile_url(element)
            href_list.append(url)
        #time.sleep(2)
        while True:
            try:
                element = wd.find_element_by_class_name('fancyNewProfile')
                wd.execute_script("""var element = arguments[0];element.parentNode.removeChild(element);""", element)
            except NoSuchElementException:
                break
    except NoSuchElementException:
        break

wd.close()  # does not work yet
print("#### links secured: " + str(len(href_list)))
Since you say that the sleep affects the number of results, it sounds like they're loading asynchronously and populating as they load, instead of all at once.
The first question is whether you can ask the web site's developers to change this, so that the results are only shown once they're all loaded.
Assuming you don't work for the same company as them, consider:
Is there something else on the page that shows up when they're all loaded? It could be a button or a status message, for instance. Can you wait for that item to appear, and then get the list?
How frequently do new items appear? You could poll for the number of results relatively infrequently, for example only every 2 or 3 seconds, and consider the results complete when you get the same number twice in a row.
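That second idea could look roughly like this (a sketch only; .fancyCompLabel is the result selector from the question, and the poll interval and timeout are arbitrary):
import time

def wait_for_stable_count(driver, css_selector, poll_seconds=3, timeout=120):
    # Poll the number of matching elements every poll_seconds and return it once
    # two consecutive polls report the same non-zero count.
    last_count = -1
    deadline = time.time() + timeout
    while time.time() < deadline:
        count = len(driver.find_elements_by_css_selector(css_selector))
        if count > 0 and count == last_count:
            return count
        last_count = count
        time.sleep(poll_seconds)
    return last_count  # timed out; return the last count seen

# usage with the question's result elements:
# total = wait_for_stable_count(wd, ".fancyCompLabel")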
The issue is that the method presence_of_all_elements_located doesn't wait for all elements matching the passed locator. It waits for the presence of at least one element matching the locator and then returns a list of the elements found on the page at that moment.
In Java we have
wait.until(ExpectedConditions.numberOfElementsToBeMoreThan(locator, expectedElementsAmount));
and
wait.until(ExpectedConditions.numberOfElementsToBe(locator, expectedElementsAmount));
With these methods you can wait for a predefined number of elements to appear, etc.
Selenium with Python doesn't provide these methods.
The only option with Selenium in Python is to build a custom method or condition that does the same thing.
So if you are expecting some number of elements/links to be present on the page, you can use such a method.
This will make your test stable and avoid hardcoded sleeps.
UPD
I have found a solution: the following appear to be Python equivalents of the methods mentioned above.
This seems to be the Python equivalent of wait.until(ExpectedConditions.numberOfElementsToBeMoreThan(locator, expectedElementsAmount)):
myLength = 9
WebDriverWait(browser, 20).until(lambda browser: len(browser.find_elements_by_xpath("//img[@data-blabla]")) > int(myLength))
And this:
myLength = 10
WebDriverWait(browser, 20).until(lambda browser: len(browser.find_elements_by_xpath("//img[@data-blabla]")) == int(myLength))
is the equivalent of the Java wait.until(ExpectedConditions.numberOfElementsToBe(locator, expectedElementsAmount));
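A reusable way to package those lambdas (just a sketch mirroring the Java conditions above; the locator and threshold are example values) is a small callable class that WebDriverWait can poll:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

class number_of_elements_to_be_more_than:
    # custom expected condition: truthy once more than `count` elements match the locator
    def __init__(self, locator, count):
        self.locator = locator
        self.count = count

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        return elements if len(elements) > self.count else False

# usage with the question's result locator (100 is an arbitrary threshold):
# results = WebDriverWait(wd, 20).until(
#     number_of_elements_to_be_more_than((By.CSS_SELECTOR, ".fancyCompLabel"), 100)
# )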

Does using selenium change the token of the session?

I am writing a tool to check a MAC address online with Selenium. I managed to find the input field and the submit button, but when I ask for the results it prints the session id and an element token.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

## set up options
options = Options()
options.headless = True
browser = webdriver.Firefox(options=options, executable_path=r"geckodriver_path")
browser.get("site-URL")

## mac address sent to site
elem = browser.find_element_by_id('result')
elemnt = browser.find_element_by_css_selector('#results-log')
print(elem)
print(elemnt)
The output is just some session info:
<selenium.webdriver.remote.webelement.WebElement (session="289e304328d8a7900f7003d4ed6530be", element="f807a2e7-8895-4e8d-b7af-ce3d27fbf897")>
I need to get the actual result that is shown on the site.
You saw it right.
The variable elem is a WebElement identified through browser.find_element_by_id('result')
The variable elemnt is a WebElement identified through browser.find_element_by_css_selector('#results-log')
Printing an element produces output in the following format:
<selenium.webdriver.remote.webelement.WebElement (session="289e304328d8a7900f7003d4ed6530be", element="f807a2e7-8895-4e8d-b7af-ce3d27fbf897")>
You can find a relevant discussion in Are element IDs numbers in Webdrivers?
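To get the actual lookup result rather than the element object, read the element's text instead of printing the element itself. A minimal sketch, assuming the #results-log element from the question eventually contains the result as visible text, and reusing the question's browser object:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

# wait until the results element is present, then read its text
wait = WebDriverWait(browser, 10)
result_element = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, '#results-log'))
)

print(result_element.text)                        # visible text of the element
print(result_element.get_attribute('innerText'))  # alternative if .text comes back empty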

Threading and Selenium

I'm trying to make multiple tabs in Selenium and open a page on each tab simultaneously. Here is the code.
CHROME_DRIVER_PATH = "C:/chromedriver.exe"
from selenium import webdriver
import threading

driver = webdriver.Chrome(CHROME_DRIVER_PATH)
links = ["https://www.google.com/",
         "https://stackoverflow.com/",
         "https://www.reddit.com/",
         "https://edition.cnn.com/"]

def open_page(url, tab_index):
    driver.switch_to_window(handles[tab_index])
    driver.get(url)
    return

# open a blank tab for every link in the list
for link in range(len(links) - 1):  # one less because the first tab is already open
    driver.execute_script("window.open();")

handles = driver.window_handles  # get handles

all_threads = []
for i in range(0, len(links)):
    current_thread = threading.Thread(target=open_page, args=(links[i], i,))
    all_threads.append(current_thread)
    current_thread.start()

for thr in all_threads:
    thr.join()
Execution completes without errors, and from what I understand this should logically work. But the program does not behave as I expected: it only opens one page at a time, and sometimes it doesn't even switch tabs... Is there a problem in my code that I'm not aware of, or does threading not work with Selenium?
There is no need to switch to a new window to load a URL; you can try the code below to open each URL in a new tab, one by one:
links = ["https://www.google.com/",
         "https://stackoverflow.com/",
         "https://www.reddit.com/",
         "https://edition.cnn.com/"]

# Open all URLs in new tabs
for link in links:
    driver.execute_script("window.open('{}');".format(link))

# Close the main (empty) tab
driver.close()
Now you can handle all of the windows from driver.window_handles as usual, if you want.
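For example (a rough sketch), once the tabs are open you can walk through the handles, switch to each one, and work with that tab:
# visit every remaining tab and print its title
for handle in driver.window_handles:
    driver.switch_to.window(handle)
    print(driver.title)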