I have a NodeJS web server and am changing things around a good deal. With Test all internal links, I can test the validity of all internal links and ensure no stale links. I would like to do the same with images and ensure that the user never sees a missing image.
The hyperlink package says it also checks assets:
Hyperlink is known to:
Detect broken links to internal assets
But I just checked that it does not, because I deleted one file and the report came out without errors.
How can I use a web-crawler, similar to hyperlink, to test that no internal website page has an asset giving 404?
Related
When I check my website URL on Google URL inspection tool it shows that page resources could not be loaded i.e image, stylesheet and script files. However, my website is working perfectly on a live server and the website is not rendered properly by Googlebot smartphone. I have tried everything to remove these errors but nothing helped. I have also checked that these resources are not blocked in robots.txt file.
Screenshot of page resources error
I've been struggling with this for a couple of days now, and finally reached the only solution that has worked for me. In my case, it wasn't a robots.txt problem, as I believe that you've already checked before posting this.
The problem has to do with the number of resources Googlebot is willing to fetch before giving up. If your CSS and JS files are too many, or too big, Googlebot gives up before fetching all of the resources needed to render the page properly.
You can solve it by minifying your files via a server mod, or via plugins like WP Rocket or Autoptimize. If you have too many CSS and JS files and the problem persists after minifying, try combining these files as well by using the same plugins.
Using MS Edge and apache w/ php, I just discovered via access.log that when I have the JavaScript debug panel (i.e. developer panel) open, it is making every http call twice. When I closed this panel, it has fixed the issue of all insert statements getting called twice.
Question: Does this doubling of http calls happen on every / most browsers that I need to look out for, or is this something special/unique with MS Edge?
I can't speak for all browsers and all developer tools. But, for IE and Edge the first time you open the tools and then open a JS file in the sources view it will try to request the file again. That request will be served from the local browser cache, sometimes not, depending on the cache settings for the file being requested.
The reason browser tools need to make this request is that browsers will often throw out the original source file as it doesn't need it to execute the page, as the source has been parsed it into something else that it can work with.
However, after you've opened the developer tools the browser will keep around sources in future navigations, either in the tools front end or elsewhere. Not keeping sources is an optimization for the first time use case, to save browsers keeping around source on the very low odds of the tool being used on any given navigation.
Of course some files are never cached by the browser and will need to be downloaded when requested by the tools, for example sourcemapped files.
In general any resources on your site that can be accessed by HTTP GET should be idempotent. That is, a GET shouldn't change the resource being requested (or generall the state of your site), so hopefully making additional requests shouldn't be an issue.
I have a PHP application running on a Micrisoft IIS 7 server. The application shows PDF files on an iFrame, which contains user's sensitive data that I wouldn't like to be directly accessed by anyone that knows the file address.
So basically, I'm looking for a way to protect files from direct browser access or download, but still be able to show it on the application's iFrame.
I made some research with Rewrite rules, but since the "HTTP_REFERER" of an iFrame is empty, I couldn't find a good solution
Any suggestions for this?
Thanks in advance
Without seeing any of your code, or how your application works, I can only give suggestions based on how I think your app works.
Rather than showing the files themselves, with links directly to those files, you should consider changing your application so that the PHP reads in the directory, displays the file names (however you want them to appear), with links that go to a download.php page. The download page (after checking whether the user has permission to download the file) then loads the file into memory and serves it out as a response (with appropriate Content-Disposition and Content-Type headers).
Since your PHP application can read files directly within the web directory, you can set up rewrite rules to prevent accessing those files from the web; that way, the files can only be accessed by the PHP application, which doesn't rely on rewrite rules to access the drive.
This is how places like Source Forge can display an advertisement with a countdown that your file download will begin in 5 seconds.
My server has been compromised recently. This morning, I have discovered that the intruder is injecting an iframe into each of my HTML pages. After testing, I have found out that the way he does that is by getting Apache (?) to replace every instance of
<body>
by
<iframe link to malware></iframe></body>
For example if I browse a file residing on the server consisting of:
</body>
</body>
Then my browser sees a file consisting of:
<iframe link to malware></iframe></body>
<iframe link to malware></iframe></body>
I have immediately stopped Apache to protect my visitors, but so far I have not been able to find what the intruder has changed on the server to perform the attack. I presume he has modified an Apache config file, but I have no idea which one. In particular, I have looked for recently modified files by time-stamp, but did not find anything noteworthy.
Thanks for any help.
Tuan.
PS: I am in the process of rebuilding a new server from scratch, but in the while, I would like to keep the old one running, since this is a business site.
I don't know the details of your compromised server. While this is a fairly standard drive-by attack against Apache that you can, ideally, resolve by rolling back to a previous version of your web content and server configuration (if you have a colo, contact the technical team responsible for your backups), let's presume you're entirely on your own and need to fix the problem yourself.
Pulling from StopBadware.org's documentation on the most common drive-by scenarios and resolution cases:
Malicious scripts
Malicious scripts are often used to redirect site visitors to a
different website and/or load badware from another source. These
scripts will often be injected by an attacker into the content of your
web pages, or sometimes into other files on your server, such as
images and PDFs. Sometimes, instead of injecting the entire script
into your web pages, the attacker will only inject a pointer to a .js
or other file that the attacker saves in a directory on your web
server.
Many malicious scripts use obfuscation to make them more difficult for
anti-virus scanners to detect:
Some malicious scripts use names that look like they’re coming from
legitimate sites (note the misspelling of “analytics”):
.htaccess redirects
The Apache web server, which is used by many hosting providers, uses a
hidden server file called .htaccess to configure certain access
settings for directories on the website. Attackers will sometimes
modify an existing .htaccess file on your web server or upload new
.htaccess files to your web server containing instructions to redirect
users to other websites, often ones that lead to badware downloads or
fraudulent product sales.
Hidden iframes
An iframe is a section of a web page that loads content from another
page or site. Attackers will often inject malicious iframes into a web
page or other file on your server. Often, these iframes will be
configured so they don’t show up on the web page when someone visits
the page, but the malicious content they are loading will still load,
hidden from the visitor’s view.
How to look for it
If your site was reported as a badware site by Google, you can use
Google’s Webmaster Tools to get more information about what was
detected. This includes a sampling of pages on which the badware was
detected and, using a Labs feature, possibly even a sample of the bad
code that was found on your site. Certain information can also be
found on the Google Diagnostics page, which can be found by replacing
example.com in the following URL with your own site’s URL:
www.google.com/safebrowsing/diagnostic?site=example.com
There exist several free and paid website scanning services on the
Internet that can help you zero in on specific badware on your site.
There are also tools that you can use on your web server and/or on a
downloaded copy of the files from your website to search for specific
text. StopBadware does not list or recommend such services, but the
volunteers in our online community will be glad to point you to their
favorites.
In short, use the stock-standard tools and scanners provided by Google first. If the threat can't otherwise be identified, you'll need to backpath through the code of your CMS, Apache configuration, SQL setup, and remaining content of your website to determine where you were compromised and what the right remediation steps should be.
Best of luck handling your issue!
I'm a web application developer, who runs a site http://myfav.es. We've been struggling with this issue for about a month now.
We use the HTML application cache spec - www.w3.org/TR/offline-webapps/ - with dynamically generated manifest files - myfav.es/personal.manifest - to speed page delivery. These dynamically generated manifest files use proper headers, and PHP to serve up custom manifests for users.
We also use gzip compression to serve the site from a linux/apache host.
For the life-cycle of our site, users report getting a err_failed similar to this screenshot in chrome. twitpic.com/272237.
This error is intermittent, occuring once every 200-300 visits, but will persists on every page refresh, including hard refreshes, which presumably means that an error using app cache is causing them to continuously load a failed version of the site. However, mysteriously JUST clearing cookies causes the error to fix itself.
I'm completely out of ideas on how to approach this error, and googling the error message appears to get a ton of confused users with voodoo-ish approaches to solving it. I've personally seen the error, along with a number of complaint from other users of chrome, so I'm fairly certain it cannot be caused by a particular user having abnormal settings or browser preferences.
Does anyone have any insight into the cause of this browser error and its origins? Whether its likely server-side or a byproduct of app design?