selenium scraping content from website into an array [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am interested in scraping content from a website and putting it into an array. Specifically, I want to scrape plain text into an array by identifying the HTML element the text sits under. I am using Selenium with Java, and I was hoping someone could shed some light on the best way to do this. I would be scanning multiple plain-text elements and putting them into an array in sequential order. The plain text lives in HTML tables, and I would need to take only the specific section of each table that contains the text I am interested in.

This is a rather broad question, but I'm still hoping I can help. I've used Selenium with the Scrapy library (Python) for scraping and it all worked very well. If your question is what the best way to find text in the HTML is, it is pretty much safe to say that the answer is XPath. It is a very simple language designed to extract elements from HTML/XML. Just google for examples and I'm sure you'll get the hang of it. Selenium has quite a few built-in functions for XPath; you'll find plenty of examples.
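To get a feel for XPath itself, independent of any live browser, here is a minimal sketch using Java's built-in javax.xml.xpath against a small well-formed table. The table content and the `name` class are made up for the example; with Selenium you would point the same expression at `driver.findElements`.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class XPathTableDemo {

    // Pulls the text of every matching table cell, in document order.
    static List<String> extractNames() throws Exception {
        // A small well-formed table standing in for the page Selenium would load.
        String html = "<table>"
                + "<tr><td class='name'>Alpha</td><td>1</td></tr>"
                + "<tr><td class='name'>Beta</td><td>2</td></tr>"
                + "</table>";

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));

        // Select only the first cell of each row, by its class attribute.
        NodeList cells = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//tr/td[@class='name']", doc, XPathConstants.NODESET);

        List<String> texts = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            texts.add(cells.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractNames()); // [Alpha, Beta]
        // The Selenium equivalent against a live page would be roughly:
        // List<WebElement> cells = driver.findElements(By.xpath("//tr/td[@class='name']"));
    }
}
```

The NODESET result preserves document order, which is what gives you the "sequential order" the question asks for.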

Related

How is structured data cross-checked with the page content by search engines? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 4 years ago.
I've just started learning structured data and I'm still trying to wrap my head around the concept.
First I started out with Microdata using schema.org vocabulary and now I'm learning JSON-LD.
The thing that is bugging me is: how do search engines know that the structured data I'm providing in the head matches the content of the page? Because in the specific case of JSON-LD, I'm not specifying which element contains what information.
This was not the case with Microdata, where I provide the structured data on the element itself. And to add to my confusion, I've read in multiple articles that Microdata & JSON-LD produce the same result, which means my understanding of Microdata must be wrong too!
Please help me with this.
Thank you
Think of JSON-LD and Microdata as working in complement. Where there's a lot of content on your page, go with Microdata, as the work is already there in the markup: you're familiar with it, and you know how to structure your page to work with it. JSON-LD is basically an easy shortcut for identification.
(This is not intended as an 'expert' answer, but a simple answer for someone still learning the ropes.)
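A minimal sketch of the same fact marked up both ways may make the difference concrete (the name and values are placeholders):

```html
<!-- Microdata: annotations live on the visible elements themselves -->
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
</div>

<!-- JSON-LD: a separate block, not tied to any element; search engines
     compare its claims against the rendered page content -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe"
}
</script>
```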

Find element by xpath or find element by css which one is better in Selenium Webdriver? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I use find element by XPath to locate elements and find it useful, but it is suggested to use find element by CSS when using Selenium WebDriver. I would like to know which one is better.
As with anything and everything, when it comes to such choices, things are subjective. Following are the points that I would like to convey from my own knowledge.
CSS is faster compared to Xpath
CSS has a better cross browser compatibility
CSS is easy to understand, learn, implement
Xpath is a bit slow compared to CSS
Xpath implementation in each browser might differ giving rise to cross browser issues
Xpath might not work properly with old version of IE
Xpath is difficult to understand and implement
From the above points it may look clear that CSS is better than XPath.
BUT a question might arise: if CSS is better than XPath, why is XPath still present and widely used?
The answer to that big BUT, AFAIK, is that XPath is more powerful than CSS. I have faced situations where certain element selections were possible only with XPath and not with CSS. For example:
selecting elements based on text
parent/child/sibling selections
index-based selection (refer to CSS index-based selection)
etc.
Hope this gives you some idea of where to use CSS and where to use XPath. :)
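One concrete case from the list above, selecting an element by its visible text, can be tried without a browser using Java's built-in XPath engine (the link texts and hrefs here are made up for the sketch):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class XPathTextSelectDemo {

    // Selects a link by its visible text -- something plain CSS selectors cannot express.
    static String hrefOfLink(String linkText) throws Exception {
        String html = "<ul>"
                + "<li><a href='/home'>Home</a></li>"
                + "<li><a href='/login'>Log in</a></li>"
                + "</ul>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));

        // Text-based selection, then a jump to an attribute of the matched node.
        return XPathFactory.newInstance().newXPath()
                .evaluate("//a[text()='" + linkText + "']/@href", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hrefOfLink("Log in")); // /login
        // In Selenium: driver.findElement(By.xpath("//a[text()='Log in']"))
    }
}
```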

My webpage has infinite scrolling, how should I test it? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I am testing infinite scroll feature of my web application on a variety of pages. What aspects should I keep in mind?
What should be automated, using WebDriver for example.
The website is primarily targeting iOS/Android devices.
I would recommend NOT automating any UI feature, at least the visual part. I have found that testing functionality is fantastic with WebDriver, but that UI is harder, and generally not useful with automation.
For example, you could write a test that scrolls the page down and verifies that new content is loaded. However, you would have no idea whether the layout of the content is any good, or even whether it's on the part of the page you are looking at, unless you thought of each possible scenario where the UI could be wrong and tested for each.
That being said, I would still use the UI to make sure it functions. I would scroll down the page, and click on a link to make sure that that link works. Rather than going to a URL, I would use the menu at the top.
In summation, I would test that the features function properly, but not that they look good. That should be done in manual testing.
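The "scroll and verify new content loads" check boils down to a loop like the one below. An item-count supplier and a scroll action stand in for the real WebDriver calls (roughly `driver.findElements(...).size()` and a JavascriptExecutor scroll), so the logic runs without a browser; the names and the 10-items-per-scroll feed are made up for the sketch.

```java
import java.util.function.IntSupplier;

public class InfiniteScrollCheck {

    // Scroll repeatedly; stop when a scroll no longer increases the item
    // count (end of feed) or when the safety cap is hit. Returns the final count.
    static int scrollUntilExhausted(IntSupplier itemCount, Runnable scroll, int maxScrolls) {
        int before = itemCount.getAsInt();
        for (int i = 0; i < maxScrolls; i++) {
            scroll.run();
            int after = itemCount.getAsInt();
            if (after <= before) {
                return before; // no new content loaded
            }
            before = after;
        }
        return before;
    }

    public static void main(String[] args) {
        // Simulate a feed that loads 10 items per scroll, capped at 35 total.
        int[] loaded = {10};
        int total = scrollUntilExhausted(
                () -> loaded[0],
                () -> loaded[0] = Math.min(loaded[0] + 10, 35),
                100);
        System.out.println(total); // 35
    }
}
```

In a real WebDriver test you would also add an explicit wait between the scroll and the recount, since the new items load asynchronously.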
For UI testing (design) you can try Sikuli.
Here is the Sikuli API, which you can integrate with WebDriver.
The only issue is that Sikuli does not compare pictures pixel by pixel... try it and you'll see the pros and cons.

testing content of a webpage [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am new to software testing and was wondering what the correct way to test the content of a web page is. For example, if a page has 10 labels, should I first test the header "Selenium Training and Video Tutorials", then the details given below the header, and so on, creating a separate test step for each piece of text? Or could I use a div tag that gives me the complete content of the page at once and test everything in one step? I can do it in one step or divide it into steps, but I want to do it the correct way. I am using Selenium WebDriver (Java).
Adding to Arran's answer: it is better to split the 10 labels into 10 assert statements so that you can easily tell which one went wrong, and to use TestNG or JUnit for the assertions. Since you are new to this, TestNG has methods like
assertEquals(String actual, String expected);
So in your code it might look like
String header1 = headerElement.getText(); // programmatically get the value using Selenium
assertEquals(header1, "Selenium Training and Video Tutorials");
TestNG also gives you a clear report.
Writing a separate test for each label, as you describe, is a great option for:
making the tests easier to understand
tracking down any errors
fixing the script (if required in future)
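Sketching that advice as runnable code: one assertion per label, so a failure pinpoints the exact mismatch. The label values are hypothetical, and in a real test the actual strings would come from `driver.findElements` rather than a hard-coded list.

```java
import java.util.Arrays;
import java.util.List;

public class PageLabelsTest {

    // Expected labels in page order; these values are made-up examples.
    static final List<String> EXPECTED = Arrays.asList(
            "Selenium Training and Video Tutorials",
            "Course Overview",
            "Contact Us");

    // One check per label, mirroring one assertEquals per label in TestNG/JUnit,
    // so the error message names exactly which label went wrong.
    static void checkLabels(List<String> actual) {
        for (int i = 0; i < EXPECTED.size(); i++) {
            String got = actual.get(i);
            if (!EXPECTED.get(i).equals(got)) {
                throw new AssertionError("Label " + i + ": expected '"
                        + EXPECTED.get(i) + "' but was '" + got + "'");
            }
        }
    }

    public static void main(String[] args) {
        // In a real test this list would be built from the page, e.g.
        // driver.findElements(By.cssSelector("label")) and getText() on each.
        checkLabels(Arrays.asList(
                "Selenium Training and Video Tutorials",
                "Course Overview",
                "Contact Us"));
        System.out.println("all labels match");
    }
}
```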

SEO: things to consider/implement for your website's content [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Let's say I have a website that I am developing.
The site may have wallpapers, questions & answers, info pages (e.g. IMDb, Wikipedia, etcetera).
What do I need to do so that when a search engine analyzes a particular page of my website for, let's say, 'XYZ', it finds the 'XYZ' content if it is present on that page?
Please pardon my non-techy jargon; I am new to this.
The most important tips in SEO revolve around what not to do:
Keep Java applets and Flash to a minimum, since web crawlers can't parse them. JavaScript can accomplish the vast majority of Flash-like animations, but it's generally best to avoid them altogether.
Avoid using images to replace text or headings. Remember that any text inside an image won't be parsed. If necessary, there are SEO-friendly ways of replacing text with images, but any time you have text not visible to the user, you risk the crawler thinking you're trying to cheat the system.
Don't try to be too clever. The best way to optimize your search results is to have quality content which engages your audience. Be wary of anyone who claims they can improve your results artificially; Google is usually smarter than they are.
Search engines (like Google) usually use the content in <h1> tags to work out what your page is about, and determine how relevant your page is to that topic partly by the number of sites that link to your page.
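A minimal sketch of crawler-friendly markup along those lines (the title, description, and heading values are placeholders):

```html
<head>
  <title>XYZ Wallpapers and Q&amp;A</title>
  <meta name="description" content="High-quality XYZ wallpapers plus questions and answers about XYZ.">
</head>
<body>
  <!-- Real text in an <h1>, not an image of text, so crawlers can parse it -->
  <h1>XYZ Wallpapers</h1>
</body>
```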