How to get a plain-text version of a rendered page?

I am running PHP PhantomJS and I want to get a plain-text version of a web page.
Right now I am using:
echo $response->getContent();
This returns the whole HTML source of the rendered page, which I don't need. I need the rendered plain text of the page.
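One way to go from rendered HTML to plain text is to walk the markup and keep only the text nodes. Here is a minimal sketch of that idea in Python rather than PHP, assuming the markup returned by getContent() is available as a string. TextExtractor and html_to_text are hypothetical names for this illustration and are not part of PHP PhantomJS. The sketch drops the contents of script, style and head, and puts each text node on its own line, so inline elements such as <b> produce extra line breaks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping elements whose content is not displayed."""

    # Elements whose text should not appear in the plain-text output.
    SKIPPED = {"script", "style", "head"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    # Stand-in for the rendered page source (assumption for this example).
    page = "<html><body><h1>Hello</h1><p>World &amp; more</p></body></html>"
    print(html_to_text(page))
```

This only approximates what a browser shows: it knows nothing about CSS, so text hidden with display:none is still included.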

Related

How to access comment section of a page source using selenium?

How can we access the content after class="hidden_elem" using Selenium? It does not get loaded into the page source when we access the page. The content in the comments is what we actually need to extract, as it is part of the script that gets run.

How do I render a PDF from HTML with working named anchors?

Is there a way to make a bunch of named anchors in a large HTML file clickable within a PhantomJS-generated PDF file?
Say I have a table of contents or a list of FAQ questions. Clicking on a question or title takes me to its answer or content within the same HTML file, which is great. But when the same HTML is rendered into a PDF, each named anchor becomes an absolute URL (e.g. http://example.com/render.html#anchor_1), so clicking on it opens a browser with that URL instead of jumping to its content within the PDF file.
So, basically, is it possible (and how?) for markup like this - https://fiddle.jshell.net/jyjuaaog/ - to work within the generated PDF?
BTW, this works great when "printing as a PDF file" in Google Chrome, but the links end up broken when rendered in PhantomJS, so there must be something I'm missing that I can't seem to find in the docs.
Any ideas?
Thanks!

Apparently there's a bug in PhantomJS preventing this. As suggested by PhantomJsCloud, a quick-and-dirty workaround would be to replace the links with page links.

Page not displayed on my website

I'm trying to put together my website, but I'm experiencing some very weird behaviour. I have an HTML file named y6.html in the www directory at the root of the website. It worked fine until yesterday, when it suddenly started returning a blank page with an empty head and an empty body (not a 404).
I also noticed at some point that when I changed the CSS, the changes were in the right place on the FTP server, but the website still displayed the old, unmodified version, even after I emptied the cache.
The page is : http://www.dronecontrast.com/y6.html
Any clue on what's causing this?
Thanks

This is an HTML error. Your <title> element is not closed properly. You must use </title> to close it.

</title> is missing. Add the slash and try again.

Just looking through the page source of that web page, you have made an error with the title tag. Your closing tag is missing its "/". Put that in and see if it works.

Your HTML markup is wrong. Please check the nesting of title, head and body inside your html tag. Consider using a text editor like Notepad++ or Sublime Text to check whether the tags are closed and nested properly.
In your markup, the title tag should be closed.
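To illustrate why a missing slash matters: when the closing tag is written as <title> instead of </title>, the title element never ends, so the browser treats the rest of the document as title text and nothing is left for the body. Here is a rough sketch of a check for this particular mistake, written in Python. unclosed_title_count is a hypothetical helper name, and the regular expressions are only a crude heuristic, not an HTML validator.

```python
import re


def unclosed_title_count(html):
    """Return how many <title> opening tags lack a matching </title>."""
    # Opening tags: "<title" followed by optional attributes and ">".
    opens = len(re.findall(r"<title\b[^>]*>", html, flags=re.IGNORECASE))
    # Closing tags: "</title>" with optional whitespace before ">".
    closes = len(re.findall(r"</title\s*>", html, flags=re.IGNORECASE))
    return opens - closes


if __name__ == "__main__":
    # Made-up markup reproducing the mistake described above.
    broken = "<html><head><title>My page<title></head><body>Hi</body></html>"
    fixed = "<html><head><title>My page</title></head><body>Hi</body></html>"
    print(unclosed_title_count(broken), unclosed_title_count(fixed))
```

A result other than 0 suggests the title element is not closed. A full validator, such as the W3C one, would catch this and other nesting problems.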

Checking the contents of an embed tag using Selenium

We generate a PDF document via a call to a web service that returns the path to the generated document.
We use an embed HTML tag to display the PDF inline, i.e.
<div id="ctl00_ContentPlaceHolder2_ctl01_embedArea">
<embed wmode="transparent" src="http://www.company.com/vdir/folder/Pdfs/file.pdf" width="710" height="400"/>
I'd like to use Selenium to check that the PDF is actually being displayed and, if possible, save the path (i.e. the src link) into a variable.
Does anyone know how to do this? Ideally we'd then like to compare this PDF to a reference one, but that's a question for another day.

As far as inspecting the PDF from Selenium goes, you're more or less out of luck. The embed tag just drops a plugin into the page, and because a plugin isn't well represented in the DOM, Selenium can't get a very good handle on it.
However, if you're using Selenium-RC, you may want to consider getting the src of the embed element, then requesting that URL directly and evaluating the resulting PDF in code. Assuming your embed element looks like this: <embed id="embedded" src="http://example.com/static/pdf123.pdf" />, you can try something like this:
String pdfSrc = selenium.getAttribute("embedded@src");
Note that Selenium-RC attribute locators take the form locator@attributeName, hence the "@" before src. Then make a web request to the pdfSrc URL and (somehow) validate that it's the one you want. It may be enough to just check that it's not a 404.
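The "request the URL and check it" step could look roughly like the following Python sketch. It assumes the src value has already been read from the page and is passed in as a string. looks_like_pdf and fetch_and_check_pdf are hypothetical helper names. Beyond the "not a 404" check, it also looks at the first bytes of the response, relying on the fact that PDF files begin with the marker %PDF-.

```python
import urllib.request

PDF_MAGIC = b"%PDF-"  # PDF files start with this header marker


def looks_like_pdf(data):
    """Return True if the given bytes begin with the PDF header marker."""
    return data.startswith(PDF_MAGIC)


def fetch_and_check_pdf(url, timeout=10):
    """Request the URL and report whether it returned something PDF-like.

    Returns False on HTTP errors (such as 404), network errors and timeouts.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            head = response.read(len(PDF_MAGIC))
            return response.status == 200 and looks_like_pdf(head)
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, as are timeouts.
        return False


if __name__ == "__main__":
    # Offline demonstration on in-memory bytes (no network access needed).
    print(looks_like_pdf(b"%PDF-1.4\n..."))
    print(looks_like_pdf(b"<html>Not found</html>"))
```

This only confirms that some PDF is served at that URL. Comparing it against a reference document would need a further step, for example comparing file hashes or extracted text.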

Link a Blog into scrolling text of SWF file

I'm working on an entirely flash-based site for a client who has already been using Blogspot for his News/Homepage updates. He wants to continue updating through Blogspot, but wants the blog to automatically fill in the text box on the flash site Homepage. I'm not sure if this is possible, or how I would go about doing it.
Here is the blogspot page:
http://atmarsamps.blogspot.com/
Here is an example of what the scrolling SWF text box will be like:
http://eloquentcreative.com/
Is this possible? Any help would be absolutely amazing!

You can use URLLoader to load the page as text. I'm not sure of the best way to parse it, though.
Maybe you can look for the tag or CSS class that wraps the text in question and then grab the text between those tags? There might be better ways to do this, though.
Note that you can assign values to the htmlText property of a text box, which will allow Flex to maintain some of the styles specified in the loaded page.
