How to turn an entire website into a PDF file?

My law practice involves a government agency that has decided to turn a 400+ page manual into a website. I need to periodically turn the website back into a manual so that I can see what rules governed at a particular time. Is there a way to automatically turn the website into a PDF file?

There is a tool called wkhtmltopdf that can help. You will need to do the scraping yourself, though, probably in a way that is specific to the site you want to convert.
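A minimal sketch of that approach, assuming the site publishes a standard sitemap.xml that lists every page (an assumption; the agency's site may not have one, in which case the URL list has to come from a site-specific crawl). It extracts the page URLs and builds a single wkhtmltopdf command, which accepts several input pages followed by one output file. The function names and example URLs are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org sitemap format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml):
    """Return the <loc> URLs from a sitemap document, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def build_pdf_command(urls, output_pdf):
    """Build the wkhtmltopdf argument list: input pages first, output file last."""
    if not urls:
        raise ValueError("no pages to convert")
    return ["wkhtmltopdf", *urls, output_pdf]


def convert(urls, output_pdf):
    """Run wkhtmltopdf, which must be installed and on PATH."""
    subprocess.run(build_pdf_command(urls, output_pdf), check=True)
```

Putting the date in the output file name (for example manual-2024-01-31.pdf) and running the script on a schedule would give a dated snapshot for each run.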

Related

How to compare content between two web pages in different environments?

We are in the process of rebuilding an existing website from scratch. The new site is meant to be an identical copy, and as it contains many pages we need a way to compare content between the two sites. It is of course possible to do this manually, but that takes a lot of time and entails a risk of human error.
I have seen services that do this: you input two URLs, they are analyzed, and the discrepancies are presented. However, we cannot use these because our test environment is local (built in Sitecore).
Is there a way to solve this without making our test environment available online (which is not possible)? For example, does software exist for this, or alternatively some service where you can compare a web page that is online with one that is local?
Note that we're only looking for content comparison (not visual).

(Un)fortunately there are many ways to do this, but fortunately some of them are simple.
What I would do is:
Get a list of URLs for each site. If the sitemap is exhaustive, you could use that; if it's not, you might want to run some Sitecore PowerShell to get the lists.
Given the lists (from files, the Sitecore API, or something else), write a program to visit each URL, get the text of the page after it's done rendering, and save it to disk (something like Selenium is good for this, and you can use any language). You'll want a folder structure like host/urlpart/urlpart/pagename.txt, basically the same as your content tree.
Use a filesystem diff program like WinMerge to compare the two folders.
This is quick and dirty, but a good place to start.
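The steps above can be sketched as follows. This is only an outline under assumptions: the page text has already been fetched (the rendering step, for example with Selenium, is left out), each site gets its own root folder so the relative paths line up, and all helper names are hypothetical.

```python
import filecmp
from pathlib import Path
from urllib.parse import urlparse


def url_to_relpath(url):
    """Map a URL to urlpart/urlpart/pagename.txt; the host is left to the root folder."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if not parts:
        parts = ["index"]  # the site's front page
    return Path(*parts[:-1], parts[-1] + ".txt")


def save_page(site_root, url, text):
    """Write the page text under site_root, mirroring the URL structure."""
    target = Path(site_root) / url_to_relpath(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


def differing_pages(root_a, root_b):
    """Return (pages whose content differs, pages present on only one side)."""
    root_a, root_b = Path(root_a), Path(root_b)
    files_a = {p.relative_to(root_a) for p in root_a.rglob("*.txt")}
    files_b = {p.relative_to(root_b) for p in root_b.rglob("*.txt")}
    changed = sorted(
        rel
        for rel in files_a & files_b
        if not filecmp.cmp(root_a / rel, root_b / rel, shallow=False)
    )
    only_one_side = sorted(files_a ^ files_b)
    return changed, only_one_side
```

A visual tool such as WinMerge is still useful for reading the actual differences; a script like this mainly narrows down which pages to open.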

Automating web page population

I have data in a CSV file and want to do the following with it:
Log into the web site
Populate the fields of the page with the CSV data
Navigate to the next page
Input the rest of the data
Click submit
Repeat for the next line
I can do this using UiPath, but it's an expensive option for a relatively simple use case.
Does anyone have suggestions on how to do this using a different method?
Thanks,
EddieT

If you're looking for alternatives, you would probably want to investigate APIs or webhooks. But that all depends on the access rights you have for that particular website.
Try messaging the developers of the website, as they might already have this service available.
UiPath may appear expensive but if you calculate the amount of time saved for this one process then you will see the money savings too.
If you can find a couple of other processes you want to automate then I'd highly recommend it.
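If the site offers no API, a scripted browser is another lower-cost route for the workflow in the question. The sketch below is hypothetical throughout: the URLs, the CSV column names, and the form field names (username, password, login, first_name, next, email, submit) are placeholders that would have to be replaced with the real site's. It assumes Selenium 4 and a matching browser driver are installed.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def log_in(driver, username, password):
    """Log into the site (URL and field names are placeholders)."""
    from selenium.webdriver.common.by import By  # lazy import: load_rows works without Selenium

    driver.get("https://example.com/login")
    driver.find_element(By.NAME, "username").send_keys(username)
    driver.find_element(By.NAME, "password").send_keys(password)
    driver.find_element(By.NAME, "login").click()


def submit_row(driver, row):
    """Fill page one, go to page two, fill the rest, and submit."""
    from selenium.webdriver.common.by import By

    driver.get("https://example.com/form")
    driver.find_element(By.NAME, "first_name").send_keys(row["first_name"])
    driver.find_element(By.NAME, "next").click()
    driver.find_element(By.NAME, "email").send_keys(row["email"])
    driver.find_element(By.NAME, "submit").click()


def run(csv_path, username, password):
    """Log in once, then repeat the form steps for every line of the CSV file."""
    from selenium import webdriver

    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = load_rows(handle.read())
    driver = webdriver.Chrome()
    try:
        log_in(driver, username, password)
        for row in rows:
            submit_row(driver, row)
    finally:
        driver.quit()
```

Real pages often need explicit waits between steps (Selenium's WebDriverWait) before the next field is available; those are omitted here for brevity.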

Generate and Save the files automatically to my local disk using Selenium

I have a Report Generator, an intranet web application that generates reports. There are about 100 reports, in PDF and Excel format, and I want to ensure that all of them are generated without any issue. This is a daily job.
Each report takes an average of 2 min, so the manual checking process takes 200 min.
As this is a testing process and I am not concerned with the contents of the files, I want to automate it.
We are using Selenium test cases to test our web application.
Is there any way to save these reports to my local disk using Selenium?

To answer your question, no. Browsers won't allow it, unless a user chooses to upload. But even if there is a way, I would advise against using it.

Even if you can do this by some means, it's HIGHLY NOT RECOMMENDED.
This would be a huge security threat and it won't be allowed. JavaScript runs inside a security sandbox and won't allow this kind of thing.
What if the server is sending a potentially dangerous file that might affect the client system?
See JavaScript security

At best, you could display the file download prompt. The browser's security (and common sense :)) won't allow you to do anything more. If you absolutely must do unsupervised file downloads, you could use some kind of ActiveX, or a Java applet.

How to compare test website and live website

We have our production server running our website. Then we have a test server which has exact same data but with changes to code to do some new functionality. This web app has over 500 pages.
Is there any program that can
Log in to the test site
Crawl through each page and then save the page as html
Compare with the same page saved with live site?
This way we can make sure that new features that we add to our test site will not break the live site when code updates are applied to production.
I am currently trying to use the WinHTTrack website copier and then comparing the test and live folders with a comparison tool like Beyond Compare. This works OK, but a lot of files show up as changed because of the domain name differences.
Looking forward to ideas / solutions for this problem.
Regards

Have you looked at using Watir for this? It's not exactly the thing you are looking for, but it might allow you more granularity in your tests and ensure the site is functionally identical. That avoids getting caught up on changing GUIDs, timestamps, and all the other things that tend to change from day to day across any website of significant size as part of its standard functionality.
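If you do stay with the saved-HTML comparison, the noise sources mentioned here (domain names, GUIDs, timestamps) can be masked before diffing. A minimal sketch, assuming GUIDs in the usual 8-4-4-4-12 hexadecimal form and timestamps like 2024-01-31 10:15:00; both patterns, the placeholder tokens, and the host names in the usage are assumptions to adapt to the real pages.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal GUID/UUID form.
GUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
# ISO-like timestamps: date, then "T" or a space, then HH:MM with optional :SS.
TIMESTAMP_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2})?\b")


def normalize(html, hosts):
    """Mask values expected to differ between environments before comparing."""
    for host in hosts:
        html = html.replace(host, "HOST")
    html = GUID_RE.sub("GUID", html)
    html = TIMESTAMP_RE.sub("TIMESTAMP", html)
    return html
```

Running both saved copies of a page through `normalize` with the same host list, for example `["www.example.com", "test.example.com"]`, leaves only differences that are not explained by these patterns. If one host name is a substring of another, list the longer one first.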

Apparently you can't make consistent, reproducible builds in your project, can you? I would recommend moving towards that in the long run; it will save you a lot of headaches. That way you would know exactly what was deployed to which server and when, so there would be no more need to bend over backwards to get the deployed sources back like this...
I know this is not a direct solution to your problem... but maybe it is worth considering whether you would save more in the long run by investing the effort in your build process now, instead of implementing this workaround (and then improving your build process anyway, because one day you will almost surely need to do that).

wget has a --convert-links option. There are also some options to preserve cookies that might let you do it while logged in: http://drupal.org/node/118759#comment-664498
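A hedged sketch of such a wget run, driven from Python. `--mirror`, `--convert-links`, `--load-cookies`, and `--directory-prefix` are existing wget options; the cookies file is assumed to have been exported from a logged-in browser session in the cookies.txt format wget reads, and the function names, URL, and paths are placeholders.

```python
import subprocess


def build_wget_command(start_url, cookies_file, dest_dir):
    """Build a wget command that mirrors a site while sending saved cookies."""
    return [
        "wget",
        "--mirror",                      # recursive download with timestamping
        "--convert-links",               # rewrite links to point at the local copies
        "--load-cookies", cookies_file,  # reuse an existing logged-in session
        "--directory-prefix", dest_dir,  # where the mirrored files are written
        start_url,
    ]


def mirror(start_url, cookies_file, dest_dir):
    """Run wget, which must be installed and on PATH."""
    subprocess.run(build_wget_command(start_url, cookies_file, dest_dir), check=True)
```

Mirroring the live and test sites into two separate destination folders this way gives two trees that a folder comparison tool can then be pointed at.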

Use an offline downloader to download all files from both sources to your computer, then compare the folder contents using a free tool like Total Commander.
EDIT
Load both of your sources into a CVS, and compare it there.

Browser plugin to test a site's look when migrating

I'm thinking I need a browser plugin that does the following, and if it doesn't exist, it should. I may as well say FF for now, but it could be any browser.
The problem: when moving a website from one server to another, you need migration testing. It is a pain to click on every link by hand and compare it to the old host. You really need 2 machines or have to constantly thrash your hosts file.
The plugin:
Would allow you to specify an alternate hosts entry for a website. 2 entries would make it clear, one for live, one for test.
The plugin would crawl every link on the site, and render the page in the browser, and save an image of the entire page.
It would switch hosts and repeat, and save images in a second folder. Since the rendering engines match, the images should match. We need to switch hosts (like /etc/hosts) so all absolute links are the same for the site.
The last step could be part of the plugin or external: now that we have 2 folders of identically named images, we run an image-diff program on the whole batch. A quick test would be a bdiff or hash, or we could get more sophisticated and determine how different each image is.
This would save so much time. So can it be done with existing tools, or do I need to go write it?
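The quick hash test described above could look roughly like this, assuming the two folders already hold identically named PNG screenshots (capturing them is out of scope here). The function names and the `*.png` pattern are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def mismatched_images(live_dir, test_dir, pattern="*.png"):
    """Return names of images missing on one side or whose bytes differ."""
    live = {p.name: p for p in Path(live_dir).glob(pattern)}
    test = {p.name: p for p in Path(test_dir).glob(pattern)}
    mismatched = []
    for name in sorted(live.keys() | test.keys()):
        if name not in live or name not in test:
            mismatched.append(name)  # page captured on only one host
        elif file_digest(live[name]) != file_digest(test[name]):
            mismatched.append(name)  # same page, different rendering
    return mismatched
```

A hash comparison is all-or-nothing: a single changed pixel marks the image as different. Measuring how different two images are would need an image library and is not shown.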

Have a look at Selenium, it allows you to script interactions with the browser and verify content.

That is overengineered. What kind of website is it? How big? Which framework (PHP, JSP, Rails, etc.)? Why not copy the website onto the new server and grep the code for specific ties to the old server?

I'd concentrate on why you think the site would differ between two servers, and focus on testing those specific cases rather than the whole site. When a site is moved to a new machine the issues are generally very obvious from looking at a couple of pages.
Presumably they are both looking at the same data source, assuming there is a data source, otherwise a folder diff on the two installations would suffice. This being the case, it should be a simple task to identify which areas of the site are likely to be affected by a server migration.
Also, I wouldn't personally trust a machine matching two images to sign off a system as ready to go live. There just isn't a substitute for real human testing. Yes, it's time-consuming, but how important is your site?

Try http://www.browsercam.com/ - the free trial should allow you to specify a main page and follow links to automatically make screenshots of the sub-pages as well.
