Different results locally and in staging - google-custom-search

I'm using Google Custom Search with an engine that searches specific sites and excludes some URL patterns within those sites.
I'm testing the API locally and receive 12 results. The exact same call in staging (Heroku, US region) returns 410 results.
Does Google personalise the results when using a Custom Search Engine?
If yes, how do I turn it off? If no, do you have any idea why I am seeing this difference?
Update
OK, I did a test. I issued the exact same request both through a proxy and directly, and the results are vastly different.
Now, the question is, can this behaviour be disabled?

OK, found it. By specifying the userIp param (https://developers.google.com/custom-search/json-api/v1/using_rest), Google is forced to apply the same behaviour regardless of location.
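For reference, a minimal sketch of such a call against the JSON API (PHP with plain file_get_contents; the key, cx and IP values are placeholders):

<?php
// Minimal sketch: the key, cx and IP values below are placeholders.
$params = http_build_query([
    'key'    => 'YOUR_API_KEY',
    'cx'     => 'YOUR_SEARCH_ENGINE_ID',
    'q'      => 'test query',
    // Pass the end user's IP so Google applies the same behaviour
    // no matter which server (local or Heroku) sends the request.
    'userIp' => '203.0.113.17',
]);

$response = file_get_contents('https://www.googleapis.com/customsearch/v1?' . $params);
$results  = json_decode($response, true);
echo count($results['items'] ?? []) . " results\n";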

Is there a quick way to detect redirections?

I am migrating a website and it has many redirections. I would like to generate a list of all redirects, with source and target.
I tried using Cyotek WebCopy, but it doesn't seem able to give me the data I need. Is there a crawling method to do that? Or can this perhaps be extracted from the Apache logs?
Of course you can do it by crawling the website, but I advise against it in this specific situation, because there is an easier solution.
You use Apache, so you are (probably) working with the HTTP/HTTPS protocol. You could rely on the HTTP referrer: if you use PHP, you can get the previous page via $_SERVER['HTTP_REFERER']. So, you will need to do the following (see the sketch after this list):
figure out a way to store previous-next page pairs
at the start of each request, store such a pair, knowing what the current URL is and what the previous one was
maybe you will need to group your URLs and do some aggregation
load the output somewhere and analyze it
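A rough sketch of the first two steps in PHP (the log file name and CSV format are just placeholders; a real migration might write to a database instead):

<?php
// Include this at the start of every request (e.g. from a common bootstrap file).
function log_page_pair(): void
{
    $previous = $_SERVER['HTTP_REFERER'] ?? '';
    $current  = ($_SERVER['HTTP_HOST'] ?? '') . ($_SERVER['REQUEST_URI'] ?? '');

    // Append a "previous page, current page" pair for later aggregation.
    $fh = fopen(__DIR__ . '/page-pairs.csv', 'a');
    if ($fh !== false) {
        fputcsv($fh, [date('c'), $previous, $current]);
        fclose($fh);
    }
}

log_page_pair();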

ZAP Vaadin setup issues

Being absolutely new to security testing, I am trying to run through the basic steps and then run a spider and an active scan. I have seen a couple of videos from OWASP and YouTube and tried to make sense of the included ZAP documentation. However, I don't think ZAP is logged in while running the spider crawl or the active scans.
Mine is a Java + Vaadin + Spring based application, and the POST URLs don't change for any kind of request; only the request parameters change. The POST URLs are always
http://example.com/UIDL/?v-uiId=0
I have:
used "Set Active Session" under HTTP Sessions before running the spider/active scan,
set up the context (though very few URLs appear under Sites),
tried to update the structure parameter with words like "page", "UIDL", etc. which I could see in the URLs,
tried to exclude the logout URL, but since all the POST URLs are the same, the spider and active scan don't really do anything in that case.
On right-clicking I got an option to flag the request as Form-based Authentication.
I have tried that as well; however, it populates the "Login Request POST data" with values like
12929ddf-5d7f-4264-b810-2fa8f38eca6f[["597","v","v",["pagelength",["i","28"]]],["597","v","v",["firstToBeRendered",["i","0"]]],["597","v","v",["lastToBeRendered",["i","26"]]],["597","v","v",["reqfirstrow",["i","15"]]],["597","v","v",["reqrows",["i","12"]]]]
and fills the username/password fields with exactly the same data.
One last thing: do we really need to disable XSRF to be able to use ZAP properly with Java/Vaadin applications?
I am simply trying to get it working and then continue learning from there. I would appreciate it if anyone could help me with the correct setup.

How long does Google continue polling a linked CSE specification file after it's requested?

When you create a Google Custom Search Engine (CSE) with a linked specification file on your server, Google's "FeedFetcher-Google-CoOp" bot requests that file in order to build the CSE. It appears that even after results have been returned to the user and the specification file is no longer used, Google continues polling it regularly for at least several days.
My question is how long Google will continue polling the file after it has stopped being requested by your CSE code, and if there is any way to force it to stop immediately.
(We created a dynamic linked CSE that was unique to each query, which meant many, many specification files (the same script with different GET arguments each time) were requested. Now that we are no longer using them, FeedFetcher-Google-CoOp continues to request this script with various past arguments.)
FeedFetcher-Google-CoOp ignores robots.txt. We are now returning 410 Gone for all requests, but it is difficult to tell whether this is having an effect, since there are so many different versions being requested (i.e. /script.php?query=). Ideally there would be some way to tell Google that script.php does not exist regardless of arguments, but without robots.txt I can't find a way to do so.
TL;DR:
1) Will Google stop requesting this script on its own eventually? If so, when?
2) Is there a way to stop it requesting immediately?
If left alone, it appears Google will continue requesting these files indefinitely (at least for months). It ignores 410 (gone) responses, but it appears that it respects 301 redirects! So to stop Google trying to request outdated CSE specifications, you can 301 redirect them to a null file. Google will likely still try to access the file again for every set of arguments it has cached, but should stop trying after that.
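For example, if the outdated specifications are served by a PHP script, a sketch like the following at the top of script.php sends the 301 no matter what query string is attached (the /gone.html target is just a placeholder for your null file):

<?php
// script.php — the old CSE specification endpoint.
// Send a 301 for every request, whatever the GET arguments,
// pointing at an empty placeholder file.
header('Location: /gone.html', true, 301);
exit;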

Normal Google Custom Search

I'm writing an application that analyses search engine results.
With the Google Search API now deprecated and limited to 1000 queries/day, they are forcing developers to move to the AJAX APIs and to use the Custom Search API to do a Google search.
The thing is, I don't need a custom search; I need a general search, not one that is filtered by site (OK, maybe filtered by USA/UK, i.e. google.com/google.co.uk).
Does anyone know how to just do a regular Google search using the AJAX APIs? Is the Custom Search the right thing to be using?
I don't want to hit the 1000/day limit using the old service, but this is exactly what I need.
I did find: How do I create a CSE that searches the entire web?
http://www.google.com/support/customsearch/bin/answer.py?hl=en&answer=1210656
But by the sounds of it this will distort the search results.
Thank you.
OK. Here's how I think it is done.
Create a Custom Search Engine.
Add a site such as *.com. When this is created, go to the Advanced tab and download the context XML.
Remove the Background Label associated with the site.
Upload the XML to replace the previous context.
This seems to work just fine and is returning the same values as far as I can see.
Yes, you are right in theory*, and this should let you get 100 results a day on the fly. Just this Saturday, though, Google confirmed how here -
(* so far, though, we can't get it working...)

What's the best way to test a site which displays differently depending on the client location?

I am using an IP location lookup to display localised prices to customers depending on whether they are visiting from the UK, the US or the general EU, defaulting to the US price if the location can't be determined.
I could easily force the system to believe I'm from a specific country for testing, but there is still no way of knowing for sure that it's displaying correctly when a visitor from abroad accesses my site. Is using a proxy the only viable way of testing a site like this? If so, how would I go about tracking down proxies I can use to test my site from various countries of origin?
You should be able to achieve that by using proxies. http://www.proxy4free.com/page1.html has a bunch. That site just came from a Google search; I've never used proxies like this before though, so there may be better sites out there.
This is not about how to test, but rather how you identify your visitors.
Instead of using an IP lookup to determine their geographical location, you could grab the information about the locale they use from the headers the browser sends.
For instance, I'm Norwegian, and when I go to useragent.org I see that my browser sends "nb-NO" as the language my machine uses.
You can easily use that to customize currency, dates, etc. on your site.
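In PHP that preference arrives in the Accept-Language request header, exposed as $_SERVER['HTTP_ACCEPT_LANGUAGE']; a minimal sketch (the region-to-currency mapping is made up for illustration):

<?php
// Hypothetical region-to-currency mapping, for illustration only.
$currencyByRegion = [
    'GB' => 'GBP',
    'US' => 'USD',
    'NO' => 'NOK',
];

// e.g. "nb-NO,nb;q=0.9,en;q=0.8" — take the first (most preferred) entry.
$acceptLanguage = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? 'en-US';
$primary = trim(explode(';', explode(',', $acceptLanguage)[0])[0]);  // "nb-NO"
$region  = strtoupper(substr(strrchr($primary, '-') ?: '-US', 1));   // "NO"

$currency = $currencyByRegion[$region] ?? 'USD';  // default to USD, as in the question
echo "Showing prices in $currency\n";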
If the website is indexed in Google's cache, you can visit Google with the appropriate country URL, e.g. http://www.google.co.uk/,
and see if it's displaying properly in the cache.
#Frode:
Checking the system locale sent by the browser might be misleading.
Say I go to Canada and set my system locale to French: the site might then show that user EU prices instead of the US price. Many such cases are possible where the locale won't give accurate info about the end user's desired "price class" in this particular application.
-AD
If you want to use geo-IP location to detect a user's language, using a proxy is probably the best way to test it.
There are a lot of lists of open proxies on the web, mostly sorted by country. Google has quite a lot of search results on this topic. Of the top results, I have used SamAir to test some stuff before.
Finding a working open proxy with acceptable speed in the correct country can be a tedious task. Also keep in mind that you should not use any of these proxy servers to submit sensitive data, because you never know who runs them. It could be a somewhat trustworthy ISP (i.e. not from GB ;D), a honeypot to collect data, or an illegal open proxy hosted by some trojan.