I have implemented cloaking for search engines: my category pages display all of the products when the visiting user agent is a search engine.
But how do I test it? Can I pretend to be a search engine? Do I wait until Google indexes it and look at the cached result?
You could try the User Agent Switcher extension in Firefox, or spider the site yourself with wget and specify the user agent with the -U option.
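For example, you could fetch the same category page once with a Googlebot-style user agent and once with a normal browser string, then diff the results. This is only a sketch: example.com, the output filenames, and the exact user-agent strings are placeholders (check Google's documentation for their current Googlebot string).

    # Fetch one category page pretending to be Googlebot...
    wget -U "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
         -O page-as-googlebot.html "http://www.example.com/category/widgets"

    # ...then fetch the same page as an ordinary browser
    wget -U "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0" \
         -O page-as-browser.html "http://www.example.com/category/widgets"

    # Any difference here is what the cloaking is serving to the crawler
    diff page-as-googlebot.html page-as-browser.html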
It's worth noting that delivering different pages to some search engines (e.g. Google) can lead to the entire site being removed from their index.
There are add-ons for Firefox that let you set the User Agent. Here's one:
https://addons.mozilla.org/en-US/firefox/addon/59
I think the new Safari has this built in when you activate the Developer menu.
Hello Stack Overflow community. I am asking for help with downloading DVD covers from a DVD shop website (dvdempire.com). I am using wget for Windows.
So the syntax would be wget -r -A .jpg https://www.dvdempire.com/all-movies.html
But the problem is that it doesn't want to connect over SSL; the handshake fails.
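If the failure is a TLS version mismatch (older wget builds for Windows only speak older protocols), one way to narrow it down might be something like the following. This is only a sketch, assuming an openssl client is available:

    # See whether the server will negotiate TLS 1.2
    # (on the Windows cmd prompt, use NUL instead of /dev/null)
    openssl s_client -connect www.dvdempire.com:443 -tls1_2 < /dev/null

    # Recent wget builds let you pick the protocol explicitly
    wget --secure-protocol=TLSv1_2 -r -A .jpg https://www.dvdempire.com/all-movies.html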
Maybe the website has disabled mass downloading of DVD covers for bandwidth or copyright reasons?
The covers can be manually downloaded by clicking each link, but it would be much faster to do it with a batch program.
There are some 115,000 covers in total.
The Terms of Use page for the site includes the following:
"Read these terms carefully before you ("You") accept these Terms by: (a) placing an order through DVDEmpire or (b) otherwise using the Websites."
"You agree, further, not to use or attempt to use any engine, software, tool, agent or other device or mechanism (including without limitation browsers, spiders, robots, avatars or intelligent agents) to navigate or search the Websites other than the search engine and search agents available from DVDEmpire on the Websites and other than generally available third party web browsers (e.g., Netscape Navigator, Microsoft Explorer)."
I suggest that you contact the site maintainers directly about what you want to do.
We have a test website and an equivalent live website. What we are finding is that, through carelessness, our testers sometimes use the wrong site (e.g. testing on the live site!).
We have total control over both sites, but since the test site is used for acceptance testing by the users, we don't want to make them different in any way. The sites must look the same, and there is also a layer of management that will kick up a storm if the test and live sites differ at all.
How have other people solved this problem? I am thinking of a browser plugin to make the browser look different somehow (e.g. changing the colour of the location bar when on the test website). Does anyone know of a plugin or a technique that would work? (We primarily use Firefox and Chrome)
Thanks,
Phil
UPDATE
We eventually settled on a combination of different credentials for the test and live sites (this was not popular!) and making a series of plugins available for those who wanted them (tab-colourizing extensions for Chrome and Firefox users; we never did find a good plugin for IE).
Thanks to those who answered.
In our company we use different site names:
www.dev.site.com - for developers
www.qa.site.com - for QA's
www.site.com - production site
Another good practice is to use different user credentials for the dev/QA and prod sites.
I am using wget to download URLs that could be used on Linux, OS X, or Windows. My question is whether server behavior could be affected by the user-agent string (the -U option). According to this MS link, a web server can use this information to provide content that is tailored to your specific browser. According to the Apache docs (access control section), you can use these directives to deny access to a particular browser (User-Agent). So I am wondering whether I need to download the links with a different user agent for each OS, or whether one download would suffice.
Is this actually done? I tried a bunch of servers but did not really see different behavior across user agents.
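One rough way to check a particular site is to fetch the same URL with a few different user-agent strings and compare the downloads. A sketch, where example.com and the user-agent strings are placeholders:

    # Same URL, three different user-agent strings; -q keeps wget quiet
    wget -q -U "Wget/1.21"                            -O ua-default.html "http://www.example.com/page"
    wget -q -U "Mozilla/5.0 (Windows NT 10.0; Win64)" -O ua-windows.html "http://www.example.com/page"
    wget -q -U "Mozilla/5.0 (X11; Linux x86_64)"      -O ua-linux.html   "http://www.example.com/page"

    # Identical checksums suggest the server ignored the user agent for this URL
    md5sum ua-*.html

Keep in mind that differences can also come from timestamps, ads, or rotating content, so a mismatch is only a hint, not proof of user-agent sniffing.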
There are sites that prevent scraping by returning an error response when they detect you're hitting their servers with an automation tool instead of a browser, and the user agent is one of the signals used to detect that difference.
Other than that, not much useful can be said, as we don't know which sites you want to target, what HTTP server they run, and what code runs on top of it.
I have a Domino site that is getting high-severity findings for cross-site scripting in AppScan.
We don't have a license to run AppScan; another group needs to do that (yeah, big corporations :) ). But I have noticed that the IE browser will also complain with a URL such as:
http://myserver.com/cld/cldg.nsf/vwTOC?OpenView&Start=28
(IE will warn you about cross-site scripting with such a URL.)
I noticed the notes.net forum site does not come up with such an error in IE when I try to inject script tags. I guess it must scrub the URL before the page is rendered? How is this being done on the notes.net forum? Is it done at the server level or the database level?
I did find this thread:
How to avoid a XSP/Domino Cross-Site Scripting Vulnerability?
where Steve mentions his blog and web rules, but the blog says they are not needed in 8.5.4 and above. Am I understanding that right? If so, we are at 8.5.4. Is there something I still need to do to scrub my URL?
Edit: We are at 8.5.3, not 8.5.4. I was mistaken. Our admin is going to try Steve's suggestions.
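Once the changes are in place, one crude way to re-check a single parameter from the command line is to append a harmless marker to the URL and see whether it comes back unescaped. The URL below is just the example view from above with an encoded <b>XSSTEST</b> marker appended, and this only covers simple reflection on that one parameter:

    # Request the view with a harmless HTML marker in the query string
    curl -s "http://myserver.com/cld/cldg.nsf/vwTOC?OpenView&Start=28%3Cb%3EXSSTEST%3C/b%3E" \
      | grep -c "<b>XSSTEST</b>"

    # 0 means the markup was escaped or stripped;
    # anything else means it was echoed back verbatim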
I need to know if there is a crawler/downloader that can crawl and download an entire website to a link depth of at least 4 pages. The site I am trying to download has JavaScript hyperlinks that are rendered only by a browser, so a crawler is unable to follow these hyperlinks unless it renders them itself.
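For the statically linked part of such a site, a plain recursive wget with a depth limit is about as close as a non-rendering tool gets; it will not follow the JavaScript-generated links, which is exactly the limitation described above. A sketch, with example.com as a placeholder:

    # Mirror up to 4 links deep, pulling page requisites and rewriting links
    # for offline browsing; JavaScript-generated hyperlinks are NOT followed
    wget --recursive --level=4 \
         --page-requisites --convert-links --adjust-extension \
         --no-parent \
         http://www.example.com/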
I've used Teleport Pro and it works well.
MetaProducts Offline Explorer claims to do what you need.