I want to use an application that checks for broken links, and I learned that Xenu is one such tool. I do not have access to the internal aspx/HTTP files on a drive. The problem I am facing is that the website requires the user to be authenticated; after logging in, I need to crawl the site to determine which links are broken.
As an example, take mail.google.com. We type in the username and password, after which we are served different URLs. If I give Xenu (or a similar program) a link such as mail.google.com, it will not be able to fetch the URLs inside mail.google.com, which are of the form /mail/u/0/?shva=1#inbox/ etc. There lies the problem.
With minimal scripting, how can I give Xenu (or another similar app) the ability to log in via the external URL (mail.google.com in this example) so that it can do whatever it needs to do?
Thanks
Balaji S
Xenu can be used with an authenticated user as long as the cookies are persistent. You will need to enable cookies in Xenu and log in once yourself using Internet Explorer.
From their FAQ:
By default, cookies are disabled, and Xenu rejects all cookies. If you need cookies because
- you have used Internet Explorer to authenticate yourself before starting a run
- to prevent the server from delivering URLs with a session ID
then you can enable the cookies in the advanced options dialog. (This has been available since Version 1.2g)
Warning: You should not use this option if you have links that delete data, e.g. a database or a shop - you are risking data loss!!!
You can enable cookies in the Options menu. Click Preferences and switch to the Advanced tab.
For single-page applications (like Gmail) you will also need to configure Xenu to parse JavaScript.
This is done by modifying the ini file (traditionally at C:\Program Files (x86)\Xenu135\Xenu.ini) and adding a line under [Options]:
Javascript=[Jj]ava[Ss]cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]
There are several variations provided in their FAQ, but I didn't get them to work perfectly.
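If Internet Explorer is not an option, or you want the "minimal scripting" route the question asks about, a small script that reuses a logged-in session cookie can do a basic crawl and report broken links. This is only a sketch, not part of Xenu: the start URL and cookie name/value below are placeholders you would copy from your own browser's developer tools, and it will not follow JavaScript-generated links the way the ini tweak above does.
```python
# Minimal sketch of an authenticated broken-link check (not Xenu itself).
# The URL and cookie below are placeholders copied from a logged-in browser session.
import requests
from urllib.parse import urljoin
from html.parser import HTMLParser

START_URL = "https://intranet.example.com/"                # placeholder
COOKIES = {"SESSIONID": "paste-your-session-cookie-here"}  # placeholder


class LinkCollector(HTMLParser):
    """Collects href values from anchor tags on a single page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


session = requests.Session()
session.cookies.update(COOKIES)

# Fetch the start page with the borrowed session cookie and collect its links.
page = session.get(START_URL, timeout=10)
collector = LinkCollector()
collector.feed(page.text)

# Check each link and report anything that errors out or returns 4xx/5xx.
for href in collector.links:
    url = urljoin(START_URL, href)
    try:
        status = session.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException:
        status = None
    if status is None or status >= 400:
        print(f"BROKEN: {url} (status {status})")
```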
What is the best way to password-protect a folder on IIS with a single set of credentials to be shared by a group of users?
Our hosting service offers Plesk, which in turn offers a "password-protected directory" function, but some of our clients have HTTP authorization disabled, so they get an automatic 401.4 error with no prompt for credentials.
I've looked into Forms authentication but this seems cumbersome to set up for the numerous separate domains at issue.
The protected content is not super sensitive; we just don't want it easily accessible to the public. Many of our users do not use the site frequently, and we don't want to implement individual credentialing for everyone (we do have that in place for more sensitive sections) just so they can view current project reports or meeting minutes.
On sites where I'm just a user rather than the administrator, but which work the same way as mine, it is a big pain to have to look up a username and password twice a year just to view a meeting agenda (yes, the browser could remember them, but they also have a 4-month expiration and lots of us are on different devices all the time).
Is Forms authentication the way to go? It took me several hours to get it set up and working, with all sorts of settings not well documented in any single place.
(I had previously asked how to disable Basic Auth on the client side and was told more than once that it's not possible - but it is, via client/browser registry keys.)
Thanks.
It's perfectly fine to use Forms authentication. All you need to do is navigate to the folder or file you want to protect, then go to Authorization Rules and add a deny rule for anonymous users. When users who are not logged in try to open any file in that folder, they will be redirected to your login page. You can find a lot of guides on Forms authentication via Google; for example, you can refer to the following:
https://learn.microsoft.com/zh-CN/troubleshoot/developer/webapps/aspnet/development/forms-based-authentication
https://learn.microsoft.com/en-us/iis/application-frameworks/building-and-running-aspnet-applications/how-to-take-advantage-of-the-iis-integrated-pipeline
I'm trying to add a test coverage badge to the Readme of a private repository on GitHub. Our continuous integration process saves out the image to a secured Google Cloud Storage bucket that's not accessible to the public, and should remain that way.
Google's authorization layer is smart enough that if I go to the URL for the image, I'm automatically redirected to the resource with a valid auto-generated signed URL.
E.g., if I go to http://storage.cloud.google.com/secret-files/mysecretfile.png, then if I'm logged in and allowed to view it, I'm automatically redirected to something like https://blahblah-apidata.googleusercontent.com/download/storage/v1/b/secret-files/o/mysecretfile.png?key=verylongkey, where I can load the image.
This seemed perfect. Reference the canonical path in the GitHub Readme, authenticated users see the image, unauthenticated users are still blocked, we don't have to make the file public, and we don't have to do anything complicated.
Except that GitHub is proxying the image request, meaning that it will always be unauthenticated. My browser is loading something like https://camo.githubusercontent.com/mysecretimage.png.
Is there a clever way to work around this? Or do I need to go back to the drawing board?
All images on github.com are proxied using the Camo image proxy. There are a couple of reasons for this:
It preserves the privacy of users: a document can't track users by directing them to a different site or by setting cookies.
It means images can be cached and served at an appropriate size.
GitHub can have a very strict content security policy that does not allow loading from untrusted sites, which means that any sort of accidental security problem (like an XSS) is a lot less likely to work.
Note the last part. Even if you found some sneaky way to get another image URL to render properly in the website, your browser wouldn't load it because it violates the Content-Security-Policy header the site sent, and moreover, your browser would tattle about that to the reporting URL that GitHub provided.
So any image URL you provide will need to be readable by GitHub's image proxy and it won't be possible to serve different content to different users.
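A quick way to confirm this for your own badge is to request it the way Camo does: anonymously, with no Google session attached. This is only a diagnostic sketch; the bucket URL below is a placeholder standing in for your badge's path.
```python
# Sketch: simulate what GitHub's Camo proxy sees by fetching the badge URL
# without any credentials. A 403, or a redirect to a Google login page,
# means the badge can never render in the README. The URL is a placeholder.
import requests

BADGE_URL = "https://storage.cloud.google.com/secret-files/mysecretfile.png"  # placeholder

resp = requests.get(BADGE_URL, allow_redirects=False)
print(resp.status_code, resp.headers.get("Location"))
```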
I'm using a Google API (e.g., for Maps Embed) with a key that is restricted via a list of HTTP referrers. In this case, the map is embedded in my.site.com, so within the Google API -> Credentials page, I allow access for the referrer .site.com/. When I visit my.site.com from most browsers, Google Maps displays correctly, as the browser sets the referrer field to my.site.com. The Brave browser, however, sets the referrer field to the origin of the request target rather than to my page, and the API reports an error:
Request received from IP address 98.229.177.122, with referrer: https://www.google.com/
Of course I could add google.com to the list of allowed referrers, but that defeats the purpose of limiting the use of the API key to my own website - anyone could "borrow" the API key, add it to their site for the same API, and anyone using Brave would be able to access the feature. Now that each access costs $, I'd rather not do this. Any ideas for a work-around?
Note: @geocodezip - thanks for the reference. Indeed, I forgot to add that when I set the site-specific shield to "All cookies allowed", or even turn shields off completely for the site, the behavior is still the same (error). However, when I set the cookies field to "All cookies allowed" in the default shield settings, it works as intended (maps are displayed), even though the default settings section states:
These are the default Shields settings. They apply to all websites unless you change something in the Shields panel on a particular site. Changing these won't affect your existing per-site settings.
which I interpret to mean that the site-specific settings take precedence over the defaults.
So I'm thinking this (the site-specific cookies setting not overriding the default) is a Brave bug, though that is a bit separate from my initial hope for an approach that doesn't require manual intervention on the user's part.
I can modify HTTP headers with a plugin in FF and IE, and I know that capability doesn't exist in Chrome yet. Does anybody know if you can do this in a Safari plugin? I can't find anything in the documentation that says one way or the other.
Edit (November 2021): as pointed out in the comments, ParosProxy seems to no longer exist (and was last released ~2006 from what I can see). There are more modern options for debugging on Mac (outside of browser plugins on non-Safari browsers) like Proxyman. Rather than adding another list of links that might expire, I'll instead advise people to search for "debugging proxy" on their platform of choice instead.
Original Answer (2012):
The Safari "Develop" menu in advanced preferences allows you to partially customize headers (like the user agent), but it is quite limited.
However, if a particular browser or app does not allow you to alter the headers, just take it out of the equation. You can use things like Fiddler or ParosProxy (and many others) to alter the requests regardless of the application sending the request.
They also have the advantage of allowing you to make sure that you are sending the same headers regardless of the application in question and (depending on your requirements) potentially work across multiple browsers and apps without modification.
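If you end up with a scriptable debugging proxy rather than a GUI tool, header rewriting usually comes down to a few lines. The sketch below uses mitmproxy purely as an illustration of the category of proxies described above; the header names and values are placeholders, not something required by any particular site.
```python
# rewrite_headers.py - sketch of a header-rewriting addon for mitmproxy.
# Run with: mitmproxy -s rewrite_headers.py  (or mitmdump -s rewrite_headers.py)
# Header names and values below are placeholders.
from mitmproxy import http


def request(flow: http.HTTPFlow) -> None:
    # Add or overwrite a header on every outgoing request, regardless of
    # which browser or app sent it.
    flow.request.headers["X-Custom-Header"] = "example-value"


def response(flow: http.HTTPFlow) -> None:
    # Adjust a header on the response before the browser sees it.
    flow.response.headers["Cache-Control"] = "no-store"
```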
Safari has added extension support, but its APIs don't give you granular control over requests and responses compared to Chrome/Firefox/Edge.
To get granular control over your requests and responses, you need to set up a system-wide proxy instead.
The Requestly Desktop App does this for you automatically, and on top of that you can apply various kinds of modifications, such as:
Modify Request/Response Headers
Redirect URLs
Modify Response
Delay Network request
Insert Custom Scripts
Change User-Agent
Here's an article about header modification using Requestly:
https://requestly.io/feature/modify-request-response-headers/
Disclaimer: I work at Requestly
I was googling for tools that check for broken links on a remote web page. The W3C validator seemed a good one, but I am still unsure how to check pages which are restricted, i.e. pages which I can only access by logging in to the site. Can we do that using the W3C validator? If not, is there any other tool for the same?
For basic authentication, the online validator will proxy it and prompt you to log on; alternatively, see this post.
Sometimes you can specify the login details in the URL: username:password@url.to.the.site. I believe this will only work if the site uses a .htaccess file (Basic authentication) for logins.
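If the online validator can't reach the protected pages at all, a small script that sends Basic credentials is another option, similar in spirit to the cookie-based sketch earlier but using Basic auth instead. This is only a sketch assuming the site really does use Basic authentication; the URL and credentials are placeholders.
```python
# Sketch: fetch a Basic-auth-protected page and check each link's status.
# URL and credentials below are placeholders.
import re
import requests
from requests.auth import HTTPBasicAuth
from urllib.parse import urljoin

PAGE = "https://protected.example.com/reports/"   # placeholder
AUTH = HTTPBasicAuth("username", "password")      # placeholder

html = requests.get(PAGE, auth=AUTH, timeout=10).text
for href in re.findall(r'href="([^"]+)"', html):
    url = urljoin(PAGE, href)
    try:
        status = requests.head(url, auth=AUTH, allow_redirects=True, timeout=10).status_code
    except requests.RequestException:
        status = "unreachable"
    print(status, url)
```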