XPath selector returns empty list - scrapy

I'm trying to scrape data from store: https://www.tibia.com/charactertrade/?subtopic=currentcharactertrades&page=details&auctionid=12140&source=overview
There is no problem getting data from the 1st and 2nd tables, but for the tables further down the page, XPath returns only empty lists.
I even tried saving the response to a file:
scrapy fetch --nolog "https://www.tibia.com/charactertrade/?subtopic=currentcharactertrades&page=details&auctionid=3475&source=overview" > response.html
For the table with skills, everything works fine:
sword = response.xpath('//div[@class="AuctionHeader"]/a/text()').get()
But when it comes to getting, for example, the gold value, I get only an empty result:
gold = response.xpath('/html/body/div[3]/div[1]/div[2]/div/div[2]/div/div[1]/div[2]/div[5]/div/div/div[3]/div[2]/div[2]/table/tbody/tr/td/div/table/tbody/tr[2]/td/div[2]/div/table/tbody/tr[3]/td/div/text()').get()
In Chrome/Firefox both selectors work smoothly, but in Scrapy only the 1st one does.
I know there can be problems with data updated by JavaScript, but this doesn't look like such a case.

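It doesn't look like a JavaScript problem. I think your XPath selectors are not correct. It's best to be as specific as possible and to avoid long absolute paths through many nested nodes. XPaths copied from browser dev tools also tend to include tbody elements that the browser adds to the DOM, which may not exist in the raw HTML that Scrapy receives. Here we can use the class attribute TableContent to get the tables you want, and then pick out each individual table as needed.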
Code Example
table = response.xpath('//table[@class="TableContent"]')[3]
gold_title = table.xpath('tr/td/span/text()')[2].get()
gold_value = table.xpath('tr/td/div/text()')[2].get()
Output
'Gold: '
'31,030'
Explanation
Using the class attribute TableContent, you can select the table you want. Here I've selected the table with the gold values (index 3 in the list of matching tables), then each row and the specific element that holds the gold value. The labels sit inside span elements and the values inside div elements. get() returns a string (or None if nothing matches); getall() returns a list.
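A minimal sketch of the same idea, using the standard library's ElementTree rather than Scrapy so that it runs without a network request. The markup is a made-up, simplified stand-in for the real page (an assumption, not a copy of it); the point is only that a short, class-based path works where a browser-style path containing tbody finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the raw source has no <tbody> elements,
# unlike the DOM that browser dev tools display.
HTML = """
<html><body>
  <table class="TableContent">
    <tr><td><span>Level: </span><div>120</div></td></tr>
  </table>
  <table class="TableContent">
    <tr><td><span>Gold: </span><div>31,030</div></td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(HTML)

# Select tables by class, then index into the list to pick the one we need.
tables = root.findall(".//table[@class='TableContent']")
gold_table = tables[1]

# Short paths relative to the chosen table.
gold_title = gold_table.find("tr/td/span").text
gold_value = gold_table.find("tr/td/div").text

# A path that assumes <tbody> matches nothing in this raw markup.
browser_style = gold_table.find("tbody/tr/td/div")

print(repr(gold_title), repr(gold_value), browser_style)
```

In Scrapy the equivalent selection would go through response.xpath(), as in the answer above.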

Related

Selenium finding elements returns incorrect elements

I'm using Selenium to try and get some elements on a web page but I'm having trouble getting the ones I want. I'm getting some, but they're not the ones I want.
So what I have on my page are five divs that look like this:
<div class="membershipDetails">
Inside each one is something like this:
<div class="membershipDetail">
<h3>
VIP Membership
</h3>
</div>
They DO all have this same link, but they don't have the same text ('VIP Membership' would be replaced by something else)
So the first thing was to get all the divs above in a list. This is the line I use:
listElementsMembership = driver.find_elements_by_css_selector(div[class^='membershipDetail'])
This gives me five elements, just as I would expect. I checked the 'class' attribute name and they are what I would expect. At this point I should say that they aren't all EXACTLY the same name 'membershipDetail'. Some have variations. But I can see that I have all five.
The next thing is to go through these elements and try and get that element which contains the href ('VIP Membership').
So I did that like this:
for elem in listElementsMembership:
    elemDetailsLink = elem.find_element_by_xpath('//a[contains(@href,"EditMembership")]')
Now this does return something, but it always got me the element from the FIRST of the five elements. It's as if the 'elem.find_element_by_xpath' line is going up a level first before finding these hrefs. I kind of confirmed this by switching this to a 'find_elements_by_xpath' (plural) and getting, you guessed it, five elements.
So is this line:
elemDetailsLink = elem.find_element_by_xpath('//a[contains(@href,"EditMembership")]')
going up a level before getting its results? If it is, how can I make it not do that and just restrict itself to the children?
If you are trying to find an element within an element, use a leading . in the XPath, like below:
listElementsMembership = driver.find_elements_by_css_selector("div[class^='membershipDetail']")
for elem in listElementsMembership:
    elemDetailsLink = elem.find_element_by_xpath('.//a')  # Finds the "a" tag relative to "elem"
Suppose you are looking for VIP Membership:
listElementsMembership = driver.find_elements_by_css_selector("div[class^='membershipDetail']")
for elem in listElementsMembership:
    value = elem.find_element_by_xpath('.//a').get_attribute("innerText")
    if "VIP Membership" in value:
        print(value)
And if you don't want to iterate over all five elements, try an XPath like the ones below (based on the HTML you have shared):
//div[@class='membershipDetail']//a[text()='VIP Membership']
Or
//div[@class='membershipDetail']//a[contains(text(),'VIP Membership')]
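The difference between a document-wide search and a search scoped to one element can be sketched with the standard library's ElementTree. This is an illustration under assumptions, not Selenium code: the markup and the href values are invented, and ElementTree has no direct equivalent of calling '//a' on an element, so the sketch contrasts searching from the root with searching from each div.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled loosely on the question; the hrefs are invented.
PAGE = """
<body>
  <div class="membershipDetail">
    <h3><a href="/EditMembership?id=1">VIP Membership</a></h3>
  </div>
  <div class="membershipDetail">
    <h3><a href="/EditMembership?id=2">Basic Membership</a></h3>
  </div>
</body>
"""

root = ET.fromstring(PAGE)
divs = root.findall(".//div[@class='membershipDetail']")

# Searching from the root sees every link, whichever div we care about.
all_links = root.findall(".//a")
first_from_root = all_links[0].text

# Searching from each div with a "." prefix only sees that div's descendants.
scoped_texts = [div.find(".//a").text for div in divs]

print(first_from_root, scoped_texts)
```

The scoped search yields one distinct link per div, which is the behaviour the question was after.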
You have a few mistakes in that CSS selector.
The quotes are missing.
^ means starts-with; I'm not sure you really need that. For partial (substring) matching, use * instead of ^.
Also, I do not see any logic in your code attempt for the statement below:
"The next thing is to go through these elements and try and get that element which contains the href ('VIP Membership')."
Code:
from selenium.webdriver.common.by import By

listElementsMembership = driver.find_elements_by_css_selector("div[class*='membershipDetail']")
for ele in listElementsMembership:
    # The leading "." keeps the search inside the current element
    e = ele.find_element(By.XPATH, ".//descendant::a")
    if "VIP Membership" in e.get_attribute('href'):
        print(e.text, e.get_attribute('href'))
You can give an index using square brackets, like this:
elemDetailsLink = elem.find_element_by_xpath('(//a[contains(@href,"EditMembership")])[1]')
When you get an element using XPath, the index starts at 1, not 0.

How to Select Choices input field having same class, type, Xpath everything is same

I have two input fields for entering choices, and they have the same class and type. The ids differ, but they are dynamic and created at run time, so I can't use them. I tried indexing, but it's not working properly.
driver.findElement(By.xpath("//input[@type='text'][@placeholder='Provide a response entry that customers can select'][1]")).click();
driver.findElement(By.xpath("//input[@type='text'][@placeholder='Provide a response entry that customers can select'][1]")).sendKeys("Iphone 6");
driver.findElement(By.xpath("//input[@type='text'][@placeholder='Provide a response entry that customers can select'][2]")).click();
driver.findElement(By.xpath("//input[@type='text'][@placeholder='Provide a response entry that customers can select'][2]")).sendKeys("Iphone 7");
Index 1 works in this case, but index 2 cannot be found.
(The original post linked to screenshots of this code and of the inspected HTML for input fields 1 and 2.)
If these two inputs always appear in this order (so the first input is always first and the second always second), you can use:
driver.findElement(By.xpath("(//input[@type='text'][@placeholder='Provide a response entry that customers can select'])[1]")).click();
driver.findElement(By.xpath("(//input[@type='text'][@placeholder='Provide a response entry that customers can select'])[2]")).click();
I have also corrected the indexing syntax: without the parentheses, [2] selects an input that is the second matching child of its own parent, whereas (...)[2] takes the second element of the whole result set.
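Why index 1 matched but index 2 did not can be sketched with the standard library's ElementTree. The markup below is an assumption (each input wrapped in its own div, which is a common layout but not confirmed by the question), and the predicate is simplified to a bare position because ElementTree only supports a subset of XPath. Taking the second element of the full result list plays the role of the parenthesised (...)[2] form.

```python
import xml.etree.ElementTree as ET

# Hypothetical form: each input is the only <input> inside its own wrapper.
FORM = """
<form>
  <div><input type="text" name="choice_a"/></div>
  <div><input type="text" name="choice_b"/></div>
</form>
"""

root = ET.fromstring(FORM)

# Position predicate: "the input that is the Nth input child of its parent".
per_parent_first = [e.get("name") for e in root.findall(".//input[1]")]
per_parent_second = [e.get("name") for e in root.findall(".//input[2]")]

# Equivalent of (//input[...])[2]: index into the whole result set instead.
all_inputs = root.findall(".//input[@type='text']")
second_overall = all_inputs[1].get("name")

print(per_parent_first, per_parent_second, second_overall)
```

With this layout both inputs are "first" within their parent, so the per-parent index 2 matches nothing, while the second element of the overall result is the second field.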
Building on @Anand's answer, you can simplify a little:
WebElement button1 = driver.findElement(By.xpath("(//input[@type='text' and @placeholder='Provide a response entry that customers can select'])[1]"));
WebElement button2 = driver.findElement(By.xpath("(//input[@type='text' and @placeholder='Provide a response entry that customers can select'])[2]"));
I think it's a little easier to read using and instead of stacking brackets.
I use it similarly for widgets:
WebElement header = driver.findElement(By.xpath("//div[contains(@class,'panel')]/div[contains(@class,'panel-heading') and text()[contains(.,'News Feed')]]"));

How do I pre-select rows in a DataTable based on the value in a column?

Situation:
I have a pandas dataframe which I convert into an html table via df.to_html(). I then add the DataTables class to the table. This DataTables-table has the following columns:
ID | X | Y | Val |...More columns...| Selection_Criteria |...More columns...
The values in Selection_Criteria can be either 1 or 0. I know that with:
$('#ProductList').DataTable( {
    ...
    "fnInitComplete": function(oSettings, json) { $('#ProductList tbody tr:eq(0)').click(); }
});
(Source: http://code.datatables.net/forums/discussion/38171/automatic-select-of-the-first-row-on-reload)
...it is theoretically possible to select the first row. (In practice, I have not been able to simulate a click on the first row.)
But my question goes more towards: How do I automatically pre-select ALL rows where the value is 1 in Selection_Criteria? What is the best approach? Should this be done client/server side?
In pandas, "selecting" means filtering: keeping the rows that match and screening out the rest. In a table on a web page, "selected" can instead mean highlighted to stand out from the others. There are a couple of ways you can approach this on the server side. You could display two tables, one for each state of Selection_Criteria. This would save you the hassle of selecting individual rows in a table in the first place (which would be done with JavaScript, not pandas). While pandas can add a class to the resulting HTML, the class is applied to the <table> element, not to individual rows.
If you are using jQuery, you will need the following pieces. As you haven't posted example data, I can't be exact.
Replace x in the next line with the position of the Selection_Criteria column, counting columns from 1 at the left of the table:
$( "tr td:nth-child(x):contains('1')" ).addClass('selected');
There are also backend solutions using BeautifulSoup with CSS selectors, or lxml.etree with XPath selectors, but jQuery is going to be the most concise for this problem.
@Aliester: Thank you for the pointer!
This helped me find the solution to my own question. What I did:
1.) Identify row index that I want to select when the table loads.
2.) Pass the index to js.
3.) Loop over the indices and apply the following command to each index entry:
table.row(':eq('+hit_index_row+')').select();
So I am using the API to select each individual row. This works for me and hopefully could be helpful to others as well. It may be a bit hacky, so more elegant suggestions are welcome!
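Steps 1 and 2 above could look roughly like this on the Python side. It is only a sketch: a plain list of dicts stands in for the DataFrame, the names rows and hit_indices are made up for illustration, and it assumes the table is rendered in the same row order as the data.

```python
import json

# Stand-in for the DataFrame rows; only the relevant columns are shown.
rows = [
    {"ID": 101, "Selection_Criteria": 1},
    {"ID": 102, "Selection_Criteria": 0},
    {"ID": 103, "Selection_Criteria": 1},
]

# Step 1: positions of the rows that should be pre-selected.
hit_indices = [i for i, row in enumerate(rows) if row["Selection_Criteria"] == 1]

# Step 2: serialise the positions so they can be embedded in the page for JS.
hit_indices_json = json.dumps(hit_indices)

print(hit_indices_json)
```

The JavaScript loop from step 3 would then iterate over the parsed list and call the row selection API for each entry.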
You can do this by providing a function for the "rowCallback" option when initializing the DataTable. https://datatables.net/reference/option/rowCallback
Also it is generally better to use the API methods to select rows instead of just changing the class. I found that the DataTable + Select libraries keep an internal collection of selected row indexes (just current page if serverside processing is on) instead of using the class to resolve selected items.
So while the display will look right, if you just change the class, if you rely on any of the API methods to get selected items later on there will be issues. Additionally just changing the class on the row will not fire any of the "select" events on the table so you can't rely on those either.

Unable to select an element in the table

I have been testing Selenium recently to see whether it can recognize my web app better than QTP. So far it seems to be doing quite well. I ran into a problem trying to find an element within a table, though: I was able to find the master table, but not the rows within it.
This is what the table looks like:
The code below works fine...
WebElement BaseTable = driver.findElement(By.id("table_simpleBrowser|type=TradingInstrumentReport|!browser"));
Whereas the code below does not...
BaseTable = driver.findElement(By.id("table_simpleBrowser|type=TradingInstrumentReport|!browser_tr_1"));
or
BaseTable = driver.findElement(By.className("even status_DEFAULT"));
or
WebElement BaseTable = driver.findElement(By.id("table_simpleBrowser|type=TradingInstrumentReport|!browser"));
BaseTable = BaseTable.findElement(By.className("even status_DEFAULT"));
Can someone please show me how to retrieve a certain value in the table by finding the element in a certain row/column?
Thanks.
even and status_DEFAULT are actually two classes of this web element. By.className() receives only one class as parameter. It should be
findElement(By.className("even"));
// or
findElement(By.className("status_DEFAULT"));
To find element by the two classes use By.cssSelector()
findElement(By.cssSelector(".even.status_DEFAULT")); // note the dot before each class name
However, it seems that this is not unique enough. I recommend you search by an id which contains browser_tr_1:
findElement(By.cssSelector("[id*='browser_tr_1']"));
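The reason a compound value fails as a single class name can be sketched in a few lines of Python. This is not how Selenium is implemented; has_classes is a hypothetical helper that only illustrates that the class attribute is a space-separated list of names, so "even status_DEFAULT" is two classes rather than one.

```python
def has_classes(class_attr: str, *wanted: str) -> bool:
    """Return True if every wanted name is one of the space-separated classes."""
    tokens = set(class_attr.split())
    return all(name in tokens for name in wanted)


row_class = "even status_DEFAULT"

# One class at a time, or both as separate names, matches.
single = has_classes(row_class, "even")
both = has_classes(row_class, "even", "status_DEFAULT")

# Treating the whole string as one class name does not match any token.
as_one_name = has_classes(row_class, "even status_DEFAULT")

print(single, both, as_one_name)
```

The CSS selector .even.status_DEFAULT expresses the "both as separate names" case.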

Is there a shorter syntax than soup.select("#visitor_stats")[0]?

I'm using BeautifulSoup (import bs4) to read some information from a web page. Several lines in my script look like
stats = soup.select("#visitor_stats")[0]
Is there a shorter syntax for this?
select() lets you select HTML elements using CSS selectors (for example by id or class). In this case you are looking for all elements whose id attribute is visitor_stats, and then taking the first element of the returned list.
The BeautifulSoup method find() returns the first match for the search criteria, so the list index [0] can be dropped by using find():
stats = soup.find(attrs={'id':'visitor_stats'})
But I am not sure if this is any shorter :)
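For comparison, here is a small sketch of three ways to get the same element. It assumes a reasonably recent bs4 that provides select_one(), and the HTML string is made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment containing the element of interest.
HTML = '<div id="visitor_stats">42 visitors</div><div id="other">x</div>'
soup = BeautifulSoup(HTML, "html.parser")

# Original form: CSS selector, then take the first item of the list.
via_select = soup.select("#visitor_stats")[0]

# select_one() returns the first match directly (or None if there is none).
via_select_one = soup.select_one("#visitor_stats")

# find() also accepts the id as a keyword argument.
via_find = soup.find(id="visitor_stats")

print(via_select_one.get_text())
```

Note that the indexed select() form raises IndexError when nothing matches, whereas select_one() and find() return None.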