Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
I would like to add an Expires header to my image files stored in S3. I have just found Cyberduck, which easily adds metadata. However, I would like to set Expires to 1 month after the request (like I do for static files on my web server with Nginx). I don't know if this is possible. Otherwise, I can set Expires to a fixed date, e.g. 2018-06-20, but I think that when that date arrives I will need to update all my files with a new date in the future. I would like to set this header "dynamically" to one month later. Is it possible? Any other approach?
Set Cache-Control: public, max-age=2592000.
This will tell the client that the object can be cached for up to 30 days from the time of download.
Setting Expires is no longer considered best practice, and in any event S3 only supports a static value for it.
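The point of max-age is that it is relative to each download, not to a calendar date. A minimal sketch of the arithmetic, assuming a 30-day window; the helper names are hypothetical. With boto3, the resulting string is what you would pass as the `CacheControl` argument of `put_object` (or enter as Cache-Control metadata in Cyberduck).

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def cache_control_for_days(days: int) -> str:
    """Build a Cache-Control value that lets clients cache for `days` days (hypothetical helper)."""
    return f"public, max-age={days * SECONDS_PER_DAY}"


def fresh_until(downloaded_at: datetime, cache_control: str) -> datetime:
    """Return when a copy downloaded at `downloaded_at` stops being fresh.

    Only the max-age directive is considered; other directives are ignored.
    """
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return downloaded_at + timedelta(seconds=int(value))
    raise ValueError("Cache-Control value has no max-age directive")


header = cache_control_for_days(30)
print(header)  # public, max-age=2592000
```

The same header gives different expiry moments for different downloads: a copy fetched on 2018-06-20 stays fresh until 2018-07-20, while one fetched on 2018-07-01 stays fresh until 2018-07-31, so no object ever needs to be re-tagged.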
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 3 years ago.
Due to a deployment error, I pushed to production a robots.txt file that was intended for a test server. As a result, production ended up with this robots.txt:
User-Agent: *
Disallow: /
That was 10 days ago, and I now have more than 7,000 URLs flagged with either the error "Submitted URL blocked by robots.txt" or the warning "Indexed, though blocked by robots.txt".
Yesterday, of course, I corrected the robots.txt file.
What can I do to speed up the correction by Google or any other search engine?
You could use the robots.txt test feature. https://www.google.com/webmasters/tools/robots-testing-tool
Once the robots.txt test passes, click the "Submit" button; a popup window should appear. Then click the "Submit" button for option #3:
Ask Google to update
Submit a request to let Google know your robots.txt file has been updated.
Other than that, I think you'll have to wait for Googlebot to crawl the site again.
Best of luck :).
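Independently of Google's tester, you can sanity-check a robots.txt file locally with Python's standard `urllib.robotparser`. A minimal sketch: the corrected file contents and the URL are hypothetical, since the actual fixed file isn't shown.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt content from a string and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The file that was accidentally deployed: blocks every path for every crawler.
BROKEN = "User-Agent: *\nDisallow: /\n"

# Hypothetical corrected file: an empty Disallow value permits everything.
FIXED = "User-Agent: *\nDisallow:\n"

url = "https://www.example.com/some/page"  # hypothetical URL
print(can_crawl(BROKEN, url))  # False
print(can_crawl(FIXED, url))   # True
```

This only confirms the rules themselves; it does not make Google re-crawl any sooner.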
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am writing a Scrapy spider for this website:
https://www.garageclothing.com/ca/
The website uses a jsessionid, and I want to obtain it in my spider code.
Can anybody guide me on how to get the jsessionid in my code?
Currently I just copy and paste the jsessionid from the browser's inspection tools after visiting the website in the browser.
This site uses JavaScript to set JSESSIONID. But if you disable JavaScript and load the page, you'll see that it requests the following URL:
https://www.dynamiteclothing.com/?postSessionRedirect=https%3A//www.garageclothing.com/ca&noRedirectJavaScript=true (1)
which redirects you to this URL:
https://www.garageclothing.com/ca;jsessionid=YOUR_SESSION_ID (2)
So you can do the following:
start requests with the URL (1)
in callback, extract session ID from URL (2) (which will be stored in response.url)
make the requests you want with the extracted session ID in cookies
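Step 2 (pulling the session ID out of `response.url`) can be sketched with a small stdlib helper. This is an illustration, not part of Scrapy: the function name is hypothetical, and it assumes the ID appears as a `;jsessionid=` path parameter exactly as in URL (2).

```python
import re
from typing import Optional

# Matches the ";jsessionid=..." path parameter that Java servlet containers
# append to URLs when the session is tracked in the URL rather than a cookie.
_JSESSIONID_RE = re.compile(r";jsessionid=([^?#;/]+)", re.IGNORECASE)


def extract_jsessionid(url: str) -> Optional[str]:
    """Return the session ID embedded in `url`, or None if it is absent."""
    match = _JSESSIONID_RE.search(url)
    return match.group(1) if match else None


print(extract_jsessionid("https://www.garageclothing.com/ca;jsessionid=ABC123"))  # ABC123
```

In the spider's callback you would call `extract_jsessionid(response.url)` and pass the result on, e.g. `scrapy.Request(next_url, cookies={"JSESSIONID": session_id}, callback=self.parse_page)`, where `next_url` and `parse_page` are placeholders and the cookie name `JSESSIONID` is assumed from the site's behavior.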
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I'm going to have a site where content remains on the site for a period of 15 days and then gets removed.
I don't know much about SEO, but I am concerned about the SEO implications of having content indexed by the search engines and then, one day, having it suddenly disappear and return a 404.
What is the best thing I can do to cope with content that comes and goes in the most SEO friendly way possible?
The best approach is to respond with HTTP status code 410 (Gone).
From the W3C:
The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.

The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
more about status codes here
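Applied to content that lives for 15 days, the rule in the quote boils down to a small decision. A minimal sketch, assuming the server keeps a publication timestamp for every URL it has ever served (`published_at` is None for URLs it has no record of); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from http import HTTPStatus
from typing import Optional

# Lifetime taken from the question: content stays up for 15 days.
CONTENT_LIFETIME = timedelta(days=15)


def status_for(published_at: Optional[datetime], now: datetime) -> HTTPStatus:
    """Pick the status code for a content URL (hypothetical helper)."""
    if published_at is None:
        # No record of this URL: we can't say it was removed on purpose, so 404.
        return HTTPStatus.NOT_FOUND
    if now - published_at < CONTENT_LIFETIME:
        return HTTPStatus.OK
    # The content existed and was intentionally retired: tell crawlers it is gone.
    return HTTPStatus.GONE
```

The 404 branch reflects the quote's own caveat: 410 is only appropriate when the server knows the removal is intentional and permanent.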
To keep the traffic, one option is to archive the old content instead of deleting it. That way it remains accessible at its old URL but is linked only from deeper points in the archive on your site.
If you really want to delete it, it is perfectly fine to return a 404 or 410. Spiders understand that the resource is no longer available.
Most search engines honor a file called robots.txt, in which you can specify which URLs and paths you want them to ignore. So if all of your content is at www.domain.com/content/*, you can have Google ignore that whole branch of your site.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
One of my sites has a lot of restricted pages that are only available to logged-in users; everyone else gets a default "you have to be logged in ..." view.
The problem is that a lot of these pages are listed on Google with the not-logged-in view, and it looks pretty bad when 80% of the pages in the list have the same title and description/preview.
Would it be a good choice to send a 401 Unauthorized header along with my default not-logged-in view? And would this stop Google (and other engines) from indexing these pages?
Thanks!
(and if you have another (better?) solution I would love to hear about it!)
Use a robots.txt file to tell search engines not to crawl the not-logged-in pages.
http://www.robotstxt.org/
Example:
User-agent: *
Disallow: /error/notloggedin.html
401 Unauthorized is the response code for requests that require user authentication, so this is exactly the response code you want to send. See Status Code Definitions.
EDIT: Your previous suggestion, response code 403, is for requests where authentication makes no difference, e.g. disabled directory browsing.
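The distinction above can be sketched as a tiny request handler decision. This is a hypothetical helper, assuming the site resolves the logged-in user (or None) before rendering. Note that the HTTP specification also requires a 401 response to carry a WWW-Authenticate header; which scheme to advertise depends on how the site authenticates, so it is left out here.

```python
from http import HTTPStatus
from typing import Optional, Tuple

LOGIN_REQUIRED_BODY = "You have to be logged in to view this page."


def restricted_page_response(user: Optional[str], content: str) -> Tuple[HTTPStatus, str]:
    """Return (status, body) for a members-only page (hypothetical helper)."""
    if user is None:
        # 401: the request lacks authentication, and logging in would change the outcome.
        return HTTPStatus.UNAUTHORIZED, LOGIN_REQUIRED_BODY
    return HTTPStatus.OK, content
```

A 403 branch would only make sense for users who are logged in but still not allowed to see the page, since there authenticating again makes no difference.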
Here are the status codes Googlebot understands and recommends:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40132
In your case, an HTTP 403 would be the right one.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I am unable to see the following headers in e-mails received on my Postfix e-mail receiving server:
Return-Path
Received: from
Similar to this header on Gmail:
Received: from dev16 ([123.123.123.123])
by mx.google.com with SMTP id xxxxxxxxxxxxxxxx;
Tue, 27 Oct 2009 05:52:56 -0700 (PDT)
Return-To:
Please suggest what I should do to add these headers to the received e-mails.
Thanks in advance.
Ashish
Actually, the e-mails do contain these headers, but the mail client needs to be configured to show them.
Actually, I needed to parse the e-mail headers for anti-spoofing, and I couldn't see these headers in the mail client I was using, so I assumed they were not present.
But once I checked the actual mbox file, all my doubts were cleared.
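Reading the headers directly, rather than through a mail client, can be sketched with Python's standard `email` package. The raw message below is hypothetical, shaped like the Gmail headers quoted in the question.

```python
from email import message_from_string
from email.message import Message
from typing import Dict, List

# A hypothetical raw message, shaped like the headers quoted in the question.
RAW = (
    "Return-Path: <sender@example.com>\n"
    "Received: from dev16 ([123.123.123.123])\n"
    "        by mx.google.com with SMTP id xxxxxxxxxxxxxxxx;\n"
    "        Tue, 27 Oct 2009 05:52:56 -0700 (PDT)\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def trace_headers(msg: Message) -> Dict[str, List[str]]:
    """Collect the headers that are useful for checking where a message came from."""
    return {
        # Normally a single Return-Path, added at final delivery.
        "Return-Path": msg.get_all("Return-Path", []),
        # Each relay prepends its own Received header, so there may be several.
        "Received": msg.get_all("Received", []),
    }


headers = trace_headers(message_from_string(RAW))
print(headers["Return-Path"])  # ['<sender@example.com>']
```

To scan a real mailbox file, you could iterate `mailbox.mbox(path)` (with `path` pointing at your mbox file) and call `trace_headers` on each message, since the mailbox module yields `Message` subclasses.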
Also, to append custom headers to e-mails received by Postfix, you can use the milter protocol implemented in Postfix, via the 'libmilter' library provided by Sendmail.
For a Java implementation, you can use 'jilter'.