Some sites return a 404 error temporarily, but when I paste the same URL into a browser it works.
How can I tell Scrapy to retry links that return a 404 status code up to 5 times?
There are two Scrapy settings relevant to what you need:
RETRY_HTTP_CODES: you should override the default value in your project to include 404
RETRY_TIMES: just set it to 5
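As a rough sketch, the two settings could look like this in a project's settings.py. The list assumes the default RETRY_HTTP_CODES of recent Scrapy releases with 404 appended; check the defaults for your version before copying it.

```python
# settings.py (sketch): retry temporary 404s up to 5 times

# The retry middleware is enabled by default; shown here only for clarity.
RETRY_ENABLED = True

# Number of retries in addition to the first download attempt.
RETRY_TIMES = 5

# Assumed default list from recent Scrapy releases, plus 404.
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 404]
```

Overriding RETRY_HTTP_CODES replaces the default list rather than extending it, which is why the other codes are repeated here.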
Related
Is there a way to ignore 4xx error codes while recrawling a domain that is partially in the cache?
I had crawled a large part of the site before running into issues, then I adjusted the settings to not cache 4xx codes, because the crawler stopped:
Crawled (403) <GET https:/... ['cached']:
Changed cache setting to: HTTPCACHE_IGNORE_HTTP_CODES = [401, 403, 404]
Unfortunately, this seems to force me to recrawl the page without the cache, as I now get this message in the logs:
INFO: Ignoring response <403 https://www...>: HTTP status code is not handled or not allowed.
Either way, the crawler stops at the same position because it retrieves the cached 403 responses, even though those pages now return 200 when fetched without the cache.
How can I adapt the settings in order to continue crawling the page?
Or, as an alternative, how can the cache be emptied or saved? Otherwise I would need to run without the cache setting, as far as I understand the docs.
The best solution I could find was to change the name of the crawler and start crawling afresh. This worked because it uses a new cache folder, but it did not answer my original question, and I had to recrawl pages I had already downloaded to the cache.
Once a page is cached, Scrapy serves every identical request from the cached data. If the page was cached as a 403 (or any other status), Scrapy does not fetch it again. So either remove that page from the cached data or turn the cache off to fetch the page again.
Use the HTTPCACHE_IGNORE_HTTP_CODES setting.
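To remove only the bad entries from an existing cache instead of starting over, something like the following sketch might work. It assumes the default filesystem cache storage, where each cached response is a directory containing a pickled_meta file with a "status" key; the function name purge_cached_statuses is hypothetical, and the layout should be checked against your Scrapy version before running this on a cache you care about.

```python
import pickle
import shutil
from pathlib import Path


def purge_cached_statuses(cache_dir, statuses=(401, 403, 404)):
    """Delete cache entries whose stored status is in `statuses`.

    Assumes each entry is a directory holding a `pickled_meta` file
    (a pickled dict with a "status" key). Returns the number of
    entries removed.
    """
    removed = 0
    # Collect the paths first so deleting directories does not disturb the walk.
    for meta_path in list(Path(cache_dir).rglob("pickled_meta")):
        with meta_path.open("rb") as fh:
            meta = pickle.load(fh)
        if meta.get("status") in statuses:
            shutil.rmtree(meta_path.parent)  # drop the whole cached entry
            removed += 1
    return removed
```

It would be called with the spider's cache folder, typically something like .scrapy/httpcache/<spider name>. After that, the removed URLs are fetched from the network again on the next run, while the remaining pages are still served from the cache.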
I am writing a web application and trying to use font-awesome icons.
The fonts are imported to the right directories but when I load the page the icons appear as blank squares. (Example in attached picture).
When I press F12 this is the error I get: (Also in attached picture)
fontawesome-webfont.woff2: Failed to load resource: the server responded with a status of 404 (Not Found)
fontawesome-webfont.woff: Failed to load resource: the server responded with a status of 404 (Not Found)
fontawesome-webfont.ttf: Failed to load resource: the server responded with a status of 404 (Not Found)
Those files do exist and I can see them.
I managed to find some solutions that involve changing the web.config file, but they are not helping with Laravel.
I can't find any appropriate solution for Laravel 5.5 online. I would appreciate help!
PS - I don't know yet if this is relevant, but I use the icons in Vue.js components.
IIS will not serve files of unknown type.
In IIS, go to MIME Types.
Right click in a blank area and select Add.
File name extension: .woff2
MIME type: font/woff2
Click OK. If .woff or .ttf is missing from the list, add it the same way (font/woff, font/ttf).
After a fresh install of Apache 2.4.27 on my Unix system I discovered something really odd. My local server gives me a 500 error with this entry in the error log:
AH02429: Response header name 'Content-Length' contains invalid characters, aborting request
I tried all my installed browsers (Chrome, Firefox, Safari) and the problem is the same. As far as I can see, my browsers are not sending this header, so I am not sure why I am encountering this error. I tried with curl and the result is the same. I added some tweaks to my httpd.conf file, but nothing changed.
I Googled and found that this error comes from Apache's new Filter module. My Apache is compiled with all modules. Is there any way to tell the Filter module not to check headers, or something similar?
Any kind of help will be life saving.
Thanks in advance
Some time ago I changed my 404 error page to the main page of my site. (Silly of me, but I'm new at this and it seemed like a good idea at the time.)
Trouble is, I forget how I accomplished it. I was trying the TextPattern CMS at one time, and I think I did it in the CMS, but I don't remember how.
Nowadays, I'm not using the CMS (but the database is still there), and I've created a more descriptive error page, and I've updated my .htaccess file, but the behaviour doesn't change: the old (bad) error page still comes back. I've tried the following in my .htaccess file, all to no avail:
ErrorDocument 404 default
ErrorDocument 404 http:/www.mysite.com/404.shtml
ErrorDocument 404 /404.shtml
Any help would be greatly appreciated. Google Webmaster Tools reports 54 "soft 404" errors, which I gotta fix!
-Thanks
I expect that these soft 404s are error pages served with response code 200 instead of 404.
It depends a little on the software you are using on the server. I think your 404.shtml script currently returns the wrong response code, so you need to edit that file so that it returns response code 404.
In PHP you can do that with:
header("HTTP/1.1 404 Not Found", true, 404);
Please note that this line must come before any output!
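To confirm whether a missing page really answers with 404 rather than 200, the status code can be checked from a script. The following is a minimal sketch using only the Python standard library; the function name fetch_status and the example URL are hypothetical.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the final HTTP status code for `url`.

    urlopen raises HTTPError for 4xx/5xx responses, so the code is
    read from the exception in that case.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling fetch_status("http://www.mysite.com/this-page-does-not-exist") should give 404 once the error page is configured correctly. Because urlopen follows redirects, a result of 200 for a URL that does not exist suggests the soft 404 behaviour described above.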
I would appreciate a minute of someone's time to help me with a server issue.
I have a site that serves .php pages; by that I mean there is a WordPress PHP include tag at the top of every page that pulls in the blog snippets.
These pages are served as .php and render fine in the browser. However, when you run a crawl test on them, they show a 404.
I run an Apache server, which I hope I have set up properly.
Any ideas would be greatly appreciated.
Thank you for your time in advance.
Haydyn
How are you crawling the pages? Please specify the commands.
If you are using wget, run wget -S http://sitename to print the server headers. It may help with debugging.
Do you have the http_proxy environment variable set?