Extract complete URL from a link - scrapy

I am scraping Amazon India (amazon.in) using scrapy-playwright. I can extract the description, rating, and price of the desired item. To go to the next page, however, I want to extract the href of the Next Page button at the bottom of the page.
With my scrapy-playwright Python code I can extract the href of the Next button as: href="/s?k=Soap+for+men&page=2"
When I copy the URL from the browser, it looks like this: https://www.amazon.in/s?k=soap+for+men&page=2&crid=1A43B14UY65X0&qid=1671472636&sprefix=soap+for+men%2Caps%2C262&ref=sr_pg_1
How do I generate the complete URL, including crid, from the link extracted by the code?
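One possible starting point, as a minimal sketch: a relative href can be resolved against the URL of the page it came from with the standard library's urljoin. The page_url value below is an assumption for illustration. This only produces an absolute URL; it does not recover crid, qid, or the other query parameters, because they are not part of the extracted href.

```python
from urllib.parse import urljoin

# Assumption: the URL of the results page the href was scraped from.
page_url = "https://www.amazon.in/s?k=soap+for+men"
# The relative href extracted by the spider, as quoted in the question.
next_href = "/s?k=Soap+for+men&page=2"

# Resolve the relative link against the page URL to get an absolute URL.
next_url = urljoin(page_url, next_href)
print(next_url)
```

Inside a Scrapy callback, response.urljoin(next_href) does the same join against the URL of the response.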

Related

Selenium XPath for WebTable using parent & sibling

I am trying to automate the web table on the demoqa site https://demoqa.com/webtables, where I want to click the Edit and Delete buttons for a specific user.
I am trying to click the Edit and Delete buttons for the user 'Cierra', so I wrote the custom XPath '//div[contains(text(),'cierra@example.com')]//following::div[text()='Insurance']//following::div//div//span[@title='Edit']'
I locate the row by the email text 'cierra@example.com', but the XPath returns four results even though the email is unique. Could anyone help me with this?
You can enclose the expression in parentheses and append [1] to get the first match:
(//div[contains(text(),'cierra@example.com')]//following::div[text()='Insurance']//following::div//div//span[@title='Edit'])[1]
But you don't have to overcomplicate it. Just locate the email cell, go up to its parent row, and get the span under that parent:
//div[contains(text(),'cierra@example.com')]/..//span[@title="Edit"]
If you still want to use a fancier XPath locator, use:
//div[contains(text(),'cierra@example.com')]/following-sibling::div[contains(text(),'Insurance')]/following-sibling::div//span[@title='Edit']
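The "email cell, then parent row, then span" idea can be tried offline. The sketch below uses Python's xml.etree.ElementTree, which supports only a subset of XPath, so an exact text match ([.='...']) stands in for contains(text(), ...). The markup and the id values are hypothetical and much simpler than the real demoqa page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for two table rows (not the real demoqa page).
TABLE = """
<div class="table">
  <div class="row">
    <div>Cierra</div><div>cierra@example.com</div><div>Insurance</div>
    <div><div><span id="edit-1" title="Edit"/><span id="delete-1" title="Delete"/></div></div>
  </div>
  <div class="row">
    <div>Alden</div><div>alden@example.com</div><div>Compliance</div>
    <div><div><span id="edit-2" title="Edit"/><span id="delete-2" title="Delete"/></div></div>
  </div>
</div>
""".strip()

root = ET.fromstring(TABLE)

# Email cell -> parent row (..) -> Edit span anywhere under that row.
path = ".//div[.='cierra@example.com']/..//span[@title='Edit']"
matches = root.findall(path)
print([span.get("id") for span in matches])
```

Because the search for the span starts from the row that contains the email, only that row's Edit button is matched, which is why the parent-based locator avoids the extra results.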

Is there a way to click on anchor link

Please help me as the anchor tag looks like the below
<a title ="excel" class="activelink"style="Text-Decoration: none; onclick="$find('ReportViewerControl').exportReport('Excelopenxml');" href ="javascript:void(0)" alt="Excel" _selected="true"> Excel</a>
This doesn't have any document id or class. Any help would be highly appreciated.
I checked your HTML and found that it does contain a class, but many other elements on the page may use the same class. If you access the link through that class, you might click the wrong element.
We can see that the link contains the text ' Excel'. We can loop through all the links on the page and match the innerHTML to find that specific link.
Example:
' Below is code to loop through anchor tags, find the specific link, and click it.
Set elems = IE.document.getElementsByTagName("a")
For Each elem In elems
    If elem.innerHTML = " Excel" Then
        elem.Click
        Exit For
    End If
Next elem
In a similar way, you can match other attributes of the anchor tag to find and click the link.
Note: This code example only clicks the specific link on the page. It may not help you automate the file download.
You can modify the code example to suit your own requirements.
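The matching logic itself (by link text, or by another attribute such as title) can be sketched outside the browser. The example below uses Python's standard html.parser on a hypothetical page fragment. It only finds the anchor and does not click anything. The AnchorCollector class and the PAGE markup are illustrative names, not part of the original page.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect each <a> element's attributes together with its text."""

    def __init__(self):
        super().__init__()
        self.anchors = []    # list of (attrs dict, text) pairs
        self._attrs = None   # attributes of the anchor currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._attrs = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._attrs is not None:
            self.anchors.append((self._attrs, "".join(self._text)))
            self._attrs = None


# Hypothetical fragment: two links share the class, only one is "Excel".
PAGE = """
<a class="activelink" title="pdf" href="javascript:void(0)"> PDF</a>
<a class="activelink" title="excel" href="javascript:void(0)"> Excel</a>
"""

collector = AnchorCollector()
collector.feed(PAGE)

# Match on trimmed text, so the leading space in " Excel" does not matter.
by_text = [attrs for attrs, text in collector.anchors if text.strip() == "Excel"]
# Match on another attribute instead, here title.
by_title = [attrs for attrs, _ in collector.anchors if attrs.get("title") == "excel"]
print(by_text == by_title)
```

Trimming the text before comparing is a small design choice: it keeps the match from depending on the exact whitespace inside the tag.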

VBA Macro - How to click a link in java web page

I want to get data from a web page.
Web address:
https://intvrg.gib.gov.tr/intvrg_side/main.jsp?token=d1078f5e3dc646b78d5d4e5842f21e97feb48d366bc7617458b6679dec12675154a01fccc42292bb04d926bc259dbc75e39dd8e202535fd70a7098396c74a6f7
On this page I click "Diger Sorgulamalar" in the box on the right, then try to click "Vergi Kimlik Numarası Dogrulama".
After navigating to the page with VBA, I can click the first link with this code:
IE.document.getElementById("gen__1155").Click
However, I cannot navigate to "Vergi Kimlik Numarası Dogrulama". I tried:
IE.document.getElementById("H7d190dfed4bed-faf6170603664e").Click
But this does not work. The page source is shown below.
How can I get to that page?
The issue is that H7d190dfed4bed-faf6170603664e does not appear in the HTML you posted, so the element cannot be found. The ID probably changes every time the page is accessed, so you cannot hardcode it.
You need something else to identify the correct link. Is the link text Vergi Kimlik Numarası Dogrulama always the same?
If so, you can loop through all links and check the text of each one:
For Each lnk In IE.document.getElementsByTagName("a")
    If lnk.innerHTML = "Vergi Kimlik Numarası Dogrulama" Then
        lnk.Click
        Exit For ' if there is only one link with that name you can exit here.
    End If
Next lnk

AppleScript - accented characters

I have a problem with special characters in AppleScript (a service in Automator).
The selected text (the title of a book) is the input (titre in the script), and the goal is to display in Safari the result of an advanced search for this book on noosfere.org.
It works when there are no accented characters in the selected text.
But if titre is sphère d'influence:
In the dialog box (only used for testing), "sphère d'influence" is correctly written with the "è".
But in Safari, in the search field on the website, I get "sphère d''influence".
on run {titre, parameters}
    set url_noosfere_titre to "https://www.noosfere.org/livres/noosearch.asp?Mots=" & titre & "&Envoyer=Envoyer&livres=livres&ModeRecherche=AND&ModeMoteur=MOTSCLEFS&recherche=1"
    display dialog (url_noosfere_titre as text) buttons {"OK", "annulé"}
    set retour to button returned of result
    if retour is equal to "OK" then
        open location url_noosfere_titre
    end if
end run
URLs with accents are very messy.
Basically, special characters in a URL should be percent-encoded: each byte becomes the escape character '%' followed by two hexadecimal digits. For instance, "è" in UTF-8 should be converted to %C3%A8.
However, the result depends on the web browser (IE, Safari, Chrome, Firefox) and its version. Sometimes the browser does its own conversion, which you can't avoid.
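To make the percent-encoding concrete, here is a small Python sketch using urllib.parse.quote from the standard library. It encodes the example title twice, once as UTF-8 and once as Latin-1, to show how the same character can end up as different escapes. Which encoding noosfere.org expects is not stated here, so treat the choice as an assumption to verify.

```python
from urllib.parse import quote

title = "sphère d'influence"

# UTF-8 (the default): "è" becomes two bytes, so two %XX escapes.
utf8_encoded = quote(title)
# Latin-1: "è" is a single byte, so a single %XX escape.
latin1_encoded = quote(title, encoding="latin-1")

print(utf8_encoded)    # sph%C3%A8re%20d%27influence
print(latin1_encoded)  # sph%E8re%20d%27influence
```

The space and the apostrophe are escaped as well (%20 and %27), which matters for titles like this one.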
In your case, it is probably more efficient to change approach: don't try to put the search term into the URL. Instead, just display the web page, set your term directly in the search field of the page, then click the submit button ("Envoyer" in French).
In AppleScript, you can manipulate a web page by using JavaScript commands.
The script below displays your web page, puts your search term in the correct field, selects the search area, and submits the request by clicking the submit button.
That way, there is no accent issue at all!
set BaseURL to "https://www.noosfere.org/livres/noosearch.asp?Mots"
set MyWord to "sphère"
tell application "Safari"
    open location BaseURL
    activate
    delay 1 -- time to open the new window. Could be replaced by checking javascript load = complete
    tell document 1
        -- fill the search field of the page with expression to search
        do JavaScript "document.getElementsByClassName('liste')[0].firstElementChild.children[1].value = '" & MyWord & "'"
        -- check boxes for search area are 1, 3, 5,...17, 19 in "littérature" block
        -- in this example, select the 2nd check box, so index is 3 (="Livres")
        do JavaScript "document.getElementById('litt').children[3].click()"
        -- click on the search button of the page (="Envoyer")
        do JavaScript "document.getElementsByClassName('liste')[0].firstElementChild.children[2].click()"
    end tell
end tell
FYI, I had to add the selection of a search area (here I click the check box for all books, 'Livres') because at least one selection in that area is mandatory.

How to validate a dynamic xpath/href of edit/delete button of a list using selenium webdriver

I am testing an application where, after I create a link, it appears in a list together with all other links of the same type. When I inspect the links in the list, each has an href with the same pattern, except that the number at the end is dynamic. My question is: how can I locate and click the new link with Selenium WebDriver after creating it?
Example of the href pattern: /admin/dashboard/quicklink/edit/2. When I create another new link, its href becomes /admin/dashboard/quicklink/edit/3. How can I locate this dynamic href?
My goal is to click the newly created link right after creating it when I run the code, without having to change the code.
Option 1: if the number of the newly created link is always greater than the numbers of the existing links
Steps:
1. After creating the new link, find all links and save them into a List.
2. Sort the List by the trailing number.
3. Take the first or last link in the sorted List, depending on the sort order.
Option 2: if you cannot rely on the newly created link having the greatest number
Steps:
1. Before creating the new link, find all links and save their href values into List1.
2. Create the new link.
3. Find all links again and save their href values into List2.
4. Compare List1 and List2 to find the href that differs, then use that href to find the link to click.
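Both options can be sketched as plain Python functions that work on lists of href strings. The function names are hypothetical, and collecting the hrefs from the page with Selenium is left out so the logic can be run on its own. The sample hrefs follow the pattern quoted in the question.

```python
import re


def trailing_number(href):
    """Return the number at the end of an href such as .../edit/3."""
    match = re.search(r"(\d+)/?$", href)
    if match is None:
        raise ValueError(f"no trailing number in {href!r}")
    return int(match.group(1))


def newest_by_number(hrefs):
    """Option 1: the new link is the one with the greatest trailing number."""
    return max(hrefs, key=trailing_number)


def newest_by_diff(before, after):
    """Option 2: the new link is the href present only in the second snapshot."""
    added = set(after) - set(before)
    if len(added) != 1:
        raise ValueError(f"expected exactly one new href, found {len(added)}")
    return added.pop()


# Snapshots taken before and after creating the new link.
before = ["/admin/dashboard/quicklink/edit/1", "/admin/dashboard/quicklink/edit/2"]
after = before + ["/admin/dashboard/quicklink/edit/3"]

print(newest_by_number(after))
print(newest_by_diff(before, after))
```

Comparing the trailing numbers as integers rather than as strings matters once the numbers reach two digits, since "10" sorts before "9" as text.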