Download infinitely scrolling webpage chunk by chunk (Selenium/PhantomJS) - selenium

I am using Selenium with PhantomJS to scroll to the bottom of an infinitely scrolling page of Twitter search results.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
I manually set the number of times it loops (I try to estimate how many reloads the web driver can take before crashing). When done, I grab the raw HTML:
text_file.write(driver.page_source.encode("utf-8"))
This works ok, but I am looking for a way to keep the program going without the 'browser cache', or whatever it is called, filling up. Any ideas on how to achieve the steps below?
1) Run the driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") command X times
2) Then dump the loaded raw HTML to a text file
3) Then run the command another X times
4) Then dump the loaded raw HTML into another text file, but not the content loaded in step 1, only the new content loaded in step 3
This would empty the browser/driver memory into several output text files and make it possible for the loop to practically go on forever. Any ideas?
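One way the steps above could be sketched: after each batch of scrolls, ask the page for the HTML of the result elements, write it to its own file, and remove those elements from the DOM so they are not dumped again. This is a rough sketch, not a tested solution for Twitter. The selector "li.stream-item", the function name dump_in_chunks and the file name pattern are assumptions made for illustration, and removing nodes may interfere with the page's own scroll handling or may not free as much memory as hoped.

```python
import io
import time

# Assumption: every search result is an element matching this CSS selector.
# Inspect the real page and adjust it.
ITEM_SELECTOR = "li.stream-item"

SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"

# Collects the outerHTML of the matching elements and removes them from the DOM.
# arguments[1] is the number of trailing elements to leave in place (they are
# picked up by the next chunk), so the page is never emptied completely.
GRAB_JS = """
var items = document.querySelectorAll(arguments[0]);
var html = [];
for (var i = 0; i < items.length - arguments[1]; i++) {
    html.push(items[i].outerHTML);
    items[i].parentNode.removeChild(items[i]);
}
return html;
"""


def dump_in_chunks(driver, chunks, scrolls_per_chunk,
                   path_pattern="chunk_{:04d}.html", pause=1.0):
    """Scroll, dump only the newly loaded items, drop them, and repeat."""
    paths = []
    for n in range(chunks):
        for _ in range(scrolls_per_chunk):
            driver.execute_script(SCROLL_JS)
            time.sleep(pause)  # crude wait for the next batch to load
        # Leave one item behind except on the final chunk.
        keep = 0 if n == chunks - 1 else 1
        fragments = driver.execute_script(GRAB_JS, ITEM_SELECTOR, keep)
        path = path_pattern.format(n)
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(u"\n".join(fragments))
        paths.append(path)
    return paths
```

With the existing driver object, a call such as dump_in_chunks(driver, chunks=50, scrolls_per_chunk=20) would write chunk_0000.html, chunk_0001.html and so on, each holding only the items loaded since the previous dump.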

Related

Loading big files into GtkSourceView is slow

I am trying to load files more than 1,000 lines long into GtkSourceView, using GTK3 in Python (PyGObject).
Whenever I set the text, it takes 2-3 seconds to fully appear (it slowly scrolls while adding new lines at the bottom). I have connected a profiler and it shows 99.5% of CPU time in Gtk.main.
Basically I am using this for setting text:
txt_sourceview.get_buffer().set_text(new_text)
Am I doing something wrong here? Is there a way to speed this up?

How to break a SSRS Report page only in PDF?

I am using BIDS to develop a report that needs to be displayed as a single page when viewed in Internet Explorer but still needs page breaks to display well when exported to PDF. The problem is that when I insert page breaks between tables, another page is added to the report, and I only want that in the PDF. Is there any way I can insert page breaks in the PDF but not in the report view itself?
You can achieve this by setting the InteractiveSize property to the wanted value (0 can be used to make it infinite).
If you wish to keep the same width, then you should only change the Height.
Edit: Defining specific page breaks will always force the viewer to use paging as well. This was implemented to improve the performance of large reports by allowing users to begin viewing the initial pages of the report while waiting for additional pages to become available.
HTML and Excel output shows a report as a single page if there are no
page breaks. If you do specify InteractiveHeight and InteractiveWidth,
the HTML and Excel output formats render reports using soft page
breaks. Soft page breaks are placed on a page using an estimated page
size, which makes the size of the reports less exact than reports
produced by an output format that supports page size. Soft page breaks
are calculated at run time by the control. Although it is not
recommended, you can disable soft page breaks by setting
InteractiveHeight to 0.
Source: Defining Page Size and Page Breaks in a ReportViewer Report
As described in the quote above, the InteractiveHeight is used to apply soft page breaking when using the report viewer. So the solution is to only use soft page breaks.
You can make your report break correctly when paged, without using hard page breaks, by wrapping the wanted blocks inside rectangles. You then resize these rectangles to the size of a page and set the KeepTogether property to true.
This will try to fit the content of the rectangles on the same page, adding a break when you reach the next rectangle. Because your InteractiveSize has no limit, this will not be displayed in the report viewer.

Print output split out to multiple pages when using phantomjs to print a long webpage

I have been able to print a webpage to a single-page PDF in different sizes. There is a way to assign the width and height of the single page. But if my browser window is long, with let's say 10000 rows of data, I want the output split up into multiple pages. Is that possible using this utility?
It was a non-issue. When I increased the browser viewport height to something way more than what I had specified for the PDF page dimensions, it did cut it up into multiple pages. Thank God!

Print preview of my web page to pdf and save it on server side programmatically

I have a deceptively difficult problem. I thought it was an easy one at the beginning, yet I have spent more than 3 days on it, on and off.
What I simply need to do is save the print preview of the page to a PDF file on the server side, from code-behind, initiated by a button click.
I was expecting to use an open-source library and thought there would be a call like xyzopencode.savepasgeaspdf(path), but I could not find one. I got really close to a solution by saving the PDF, but then I realized it did not save the picture; it only saved the text.
I tried PDFsharp, but as far as I can see it draws the whole thing from scratch, and I am not sure I can do it that way.
The reason I need a solution that handles pictures is that I have a third-party signature control on my page. A couple of my attempts worked without it or any picture, but when I added pictures they either failed to show the picture or did not create the PDF at all. The perfect solution would be to save whatever shows up in the print preview as a PDF, just like the built-in feature of Google Chrome (but on the server side).

ABCPDF.NET not rendering image

I am using ABCPDF.net to generate my PDFs from HTML files. There are 3 images on the HTML page, of which 2 are rendered, but the third one is missing from the PDF. I have tried moving the logo around, without any luck. This is my first time using this product, which I am impressed with, but this one problem is throwing me off.
From memory, there could be a few things causing this.
1) If your image happens to cross over the boundary of the page even by 1 pixel it will not be rendered.
2) You have to make sure that all of your elements are added in order, by which I mean that whatever you want displayed on top must be added to the page last. If you don't do this, you may find that, for example, a rectangle you declared to contain some text is drawn over the top of your image and obscures it.
3) Sounds daft, but just double check all syntax and the image's location, and ensure that you have referenced it correctly.
If all else fails, could you provide some source code or a more in-depth explanation?
EDIT: Also double check that you have set the correct size on the image when importing it as you may find it reverts back to the actual dimensions that it is on the disk which could make it too big for the page thus not rendering it.
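Points 1 and the EDIT above both come down to simple geometry, which can be checked before the image is added. The sketch below is plain Python arithmetic, not ABCpdf API; the function names are hypothetical, and it assumes the page origin is at (0, 0) and that all values are in the same units.

```python
def fits_on_page(x, y, width, height, page_width, page_height):
    """Return True if the rectangle lies entirely inside the page."""
    return (x >= 0 and y >= 0
            and x + width <= page_width
            and y + height <= page_height)


def scale_to_fit(width, height, max_width, max_height):
    """Shrink (never enlarge) a size to fit a box, keeping the aspect ratio."""
    factor = min(1.0, max_width / float(width), max_height / float(height))
    return width * factor, height * factor
```

For example, an image placed at x = 1 with the full width of the page fails fits_on_page by one unit, which is the situation described in point 1; scale_to_fit gives a size that could be applied instead of the image's dimensions on disk.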