Rightmove API and scraping: technical and legal

I'm looking to build an app using property data. Nestoria has a free API with rules of use, and Zoopla has an API you register for. OnTheMarket and Rightmove have identical terms of use, word for word (bizarre for competitors?). Rightmove advertises an API for upload but not download; I can't find anything for OnTheMarket.
I've discovered that Rightmove does have an API, although the postcode search is obfuscated by their own outcode mappings...
https://api.rightmove.co.uk/api/sale/find?index=0&sortType=1&numberOfPropertiesRequested=2&locationIdentifier=OUTCODE%5E1&apiApplication=IPAD
I'm wary of using an API that's not promoted. The alternative is scraping, which is technically harder and legally questionable, although from what I read the data is in the public domain and so free to use.
I've contacted Rightmove but got no response.
Is anyone using the Rightmove API, and have they had it authorised by Rightmove? It seems most strange that it's open and available but barely mentioned when searching for it.
Can anyone clarify what rules/laws/ethics apply to scraping data?

Don't query their hidden API. But you can run a web crawler on the rightmove.co.uk website, and it is perfectly legal, as set out in their Terms of Service under section 3.3:
You must not use or attempt to use any automated program unless the automated program identifies itself uniquely in the User Agent field and is fully compliant with the Robots Exclusion Protocol
A web crawler like Apache Nutch follows the Robots Exclusion Protocol to the letter. Their robots.txt file points to elaborate nested sitemap.xml files, so they actually seem to encourage organised but polite crawling of their website. I wanted their data myself, so I am starting to crawl them with my own resources; let me know if you need access to this data.
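As an illustration of the kind of crawler those terms permit, here is a minimal TypeScript sketch (Node 18+, so fetch is global) that identifies itself with a unique User-Agent and skips paths disallowed by robots.txt. The bot name is a made-up placeholder, and the wildcard-only robots.txt parsing is deliberately simplified; a production crawler should rely on a full Robots Exclusion Protocol implementation such as the one built into Nutch.

// Hypothetical polite crawler: unique User-Agent plus a simplified robots.txt check.
const BOT_UA = "MyPropertyResearchBot/1.0 (+https://example.com/bot-info)"; // placeholder identity

// Fetch robots.txt and collect Disallow prefixes from the wildcard (*) group.
// Simplified on purpose: real crawlers should use a full REP parser.
async function disallowedPrefixes(origin: string): Promise<string[]> {
  const res = await fetch(`${origin}/robots.txt`);
  const prefixes: string[] = [];
  let inWildcardGroup = false;
  for (const raw of (await res.text()).split("\n")) {
    const line = raw.split("#")[0].trim();
    const colon = line.indexOf(":");
    if (colon < 0) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") inWildcardGroup = value === "*";
    else if (inWildcardGroup && field === "disallow" && value) prefixes.push(value);
  }
  return prefixes;
}

// Fetch one page, identifying ourselves and honouring the disallow list.
async function politeFetch(url: string, blocked: string[]): Promise<string | null> {
  const path = new URL(url).pathname;
  if (blocked.some((prefix) => path.startsWith(prefix))) return null; // robots.txt says no
  await new Promise((resolve) => setTimeout(resolve, 2000)); // pause between requests
  const res = await fetch(url, { headers: { "User-Agent": BOT_UA } });
  return res.ok ? res.text() : null;
}

// Usage:
// const blocked = await disallowedPrefixes("https://www.rightmove.co.uk");
// const page = await politeFetch("https://www.rightmove.co.uk/sitemap.xml", blocked);

The two-second pause between requests is an arbitrary politeness choice; tune it to whatever crawl delay the site's robots.txt or your own judgement suggests.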

You are not allowed to scrape their data. Here is what their terms and conditions say about it:
"You must not use or attempt to use any automated program (including, without limitation, any spider or other web crawler) to access our system or this Site. You must not use any scraping technology on the Site. Any such use or attempted use of an automated program shall be a misuse of our system and this Site. Obtaining access to any part of our system or this Site by means of any such automated programs is strictly unauthorised."

Related

How to call the Google NLP API from a Google Chrome extension

My aim is to select some text on a web page, launch a Google Chrome extension, and pass the text to a Google Cloud API (the Natural Language API, in my case).
I want to do some sentiment analysis and then get back the result to mark/highlight positive sentences in green and negative ones in red.
I am new to this and do not know how to start.
The extension consists of a manifest, a popup, etc. How should I call an API that does natural language processing from there?
Should I create a Google Cloud application with an API key to call? In that case I would have to upload my credentials, right?
Sorry, I know this sounds a bit confusing, but I just don't know how to bring these two things together and would be more than happy about any help.
The best way to authenticate will depend on your application's specific needs and use cases. You can see an overview of all the different methods here.
If you are not planning on identifying users nor on using a back end server that handles authenticating (as I assume to be your case), the best option would indeed be to use API keys. They do not identify the user, but are enough for the Natural Language APIs.
To do this you will need to create an API key for the services you want and add the necessary restrictions to make the key as secure as possible. Detailed instructions on how to do this and how to use the key in a url can be found here.
The API call can be made from within the Chrome extension with any JavaScript method capable of performing POST requests, for example XMLHttpRequest or the Fetch API. You can find an example of the parameters that need to be included in the request here.
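As a rough sketch of such a request (the endpoint and request body follow the analyzeSentiment REST reference; the key placeholder, error handling, and logging are my own):

// Sketch: POST selected text to the Natural Language API's analyzeSentiment method.
// YOUR_API_KEY is a placeholder for the restricted key created in the Cloud Console.
async function analyzeSentiment(text: string): Promise<void> {
  const url =
    "https://language.googleapis.com/v1/documents:analyzeSentiment?key=YOUR_API_KEY";
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      document: { type: "PLAIN_TEXT", content: text },
      encodingType: "UTF8",
    }),
  });
  if (!res.ok) throw new Error(`Natural Language API returned ${res.status}`);
  const data = await res.json();
  // Each sentence comes back with a sentiment score in [-1, 1]:
  // positive scores can be highlighted green, negative ones red.
  for (const sentence of data.sentences ?? []) {
    console.log(sentence.sentiment.score, sentence.text.content);
  }
}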
You may run into CORS issues when making the request directly from the extension. I recommend reading this answer, where a couple of workarounds for these issues are suggested.

Regarding the use of APIs

My app is a personal assistant whose main job is to redirect the user to something that complies with his/her wishes. I realise, for example, that AllRecipes.com has no API. My question is: can I, say, open the browser app with a URL such as
http://allrecipes.com/search/results/?wt=QUERY&sort=re
Is this considered using their API? Not just AllRecipes, but numerous other such services. If I am using this method, do I have to request an API key, etc.? I am not retrieving anything; I am simply redirecting the user to their page with the query pre-written. Does this require all the licensing fees, API keys, etc.?
Do I have to agree to these fees (if they ask), request an API key, etc.?
The particular URL in question is simply an HTML web server URL rather than a web API as such. You can still get data out of it, but you'd have to parse the HTML response yourself to extract what you want.
They may have an API that you can use to access data more directly as JSON, XML, etc, but you'll have to look into that yourself. And you will possibly require an API key to access it. But perhaps not, if it's publicly available and they don't care how many calls they get to it by anonymous users.
You may find this resource useful. It contains a lot of open APIs and code snippets to access them: http://www.programmableweb.com/
If you are simply hitting a URL, or directing a user to a particular URL that you already know and that is static (meaning you always hit the same URL without changing parameters), then this is not considered an API call and will not require an API key.
However, if they have some APIs exposed, you will need to go through their documentation, and using an API most likely requires an API key (although this is not always true). Usually, platforms have a bunch of APIs available for different scenarios, and these are called with user-specific parameters and requirements.
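To make the distinction concrete, here is a small TypeScript sketch of the "no API involved" case from the question: building the search URL with an encoded query and handing the user off to it. The wt parameter comes straight from the URL in the question; everything else is plain string handling.

// No API involved: just the question's search URL with the user's query encoded.
function recipeSearchUrl(query: string): string {
  return `http://allrecipes.com/search/results/?wt=${encodeURIComponent(query)}`;
}

// e.g. hand the user off to the page from a browser context:
// window.open(recipeSearchUrl("chicken pot pie"));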

Is scraping Google+ pages/comments/notifications via cURL legal?

The limitations of the Google+ API have just put a hold on a little project I am working on.
I can achieve what I need with a basic cURL script (log in to the Google+ page, scrape the page, parse the data), but I was just wondering whether this is allowed.
(Yes, the script will break whenever they update G+; I can live with that.)
A search for "are you allowed to login to google with curl" produces lots of results, so it seems plenty of people are doing it; I'm just wondering whether anyone knows if it is "really" allowed.
I am not an attorney; always seek advice from legal professionals.
However, my take on this is that it is legal. Nothing restricts you from crawling websites and performing data mining, as long as you don't waste the server's resources (such as a DDoS would) and don't use any illegal means to obtain the information, such as exploiting the server software or a vulnerability that might expose user data.
If the information is publicly available online, it belongs to the public domain, and as long as you are not selling it, it can be considered fair use.
On the other hand, you are violating an awful lot of user agreements.

What should a developer know before building an API for a community-based website?

What should a developer designing and implementing an API for a community-based website know before starting the heavy coding? There are a bunch of APIs out there, like the Twitter API, Facebook API, Flickr API, etc., which are all good examples. But how would you build your own API?
What technologies would you use? I think it's a good idea to use a REST-like interface so that the API is accessible from different platforms/clients/browsers/command-line tools (like curl). Am I right? I know that all the usual principles of web development should be met: caching, availability, scalability, security, protection against potential DoS attacks, validation, etc. And when it comes to APIs, some of the most important things are backward compatibility and documentation. Am I missing something?
On the other hand, thinking from the user's point of view (I mean the developer who is going to use your API), what would you look for in an API? Good documentation? Lots of code samples?
This question was inspired by Joel Coehoorn's question "What should a developer know before building a public web site?".
This question is a community wiki, so I hope you will help me gather in one place all the things that should be addressed when building an API for a community-based website.
If you really want to define a REST API, then do the following:
Forget all technology issues other than HTTP and media types.
Identify the major use cases where a client will interact with the API.
Write client code that performs those use cases against a hypothetical HTTP server (see the sketch after this list). The only information the client should start with is the response from a GET request to the root API URL. The client should identify the media type of the response from the HTTP Content-Type header, and it should parse the response. That response should contain links to other resources that allow the client to perform all of the API's required operations.
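As a sketch of what such a client might look like, assuming a hypothetical root URL and a hypothetical link-carrying JSON media type (both are illustrative, not a standard):

// Hypothetical hypermedia client: it knows only the root URL and a media type;
// every other URL is discovered from links in the responses.
interface Link { rel: string; href: string; }
interface ApiDocument { links: Link[]; [key: string]: unknown; }

const MEDIA_TYPE = "application/vnd.example+json"; // made-up media type for the sketch

async function getDocument(url: string): Promise<ApiDocument> {
  const res = await fetch(url, { headers: { Accept: MEDIA_TYPE } });
  const contentType = res.headers.get("content-type") ?? "";
  if (!contentType.includes(MEDIA_TYPE)) {
    throw new Error(`Unexpected media type: ${contentType}`);
  }
  return res.json() as Promise<ApiDocument>;
}

async function follow(doc: ApiDocument, rel: string): Promise<ApiDocument> {
  const link = doc.links.find((l) => l.rel === rel);
  if (!link) throw new Error(`No link with rel "${rel}"`);
  return getDocument(link.href); // the client never builds URLs from its own knowledge
}

// Use case: list the community's latest posts, starting only from the root URL.
// const root = await getDocument("https://api.example.com/");
// const posts = await follow(root, "latest-posts");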
When creating a REST API, it is easier to think of it as a "user interface" for a machine rather than as an exposed object model or process model. Imagine the machine navigating the API programmatically by retrieving a response, following a link, processing the response, and following the next link. The client should never construct a URL based on its knowledge of how the server organizes resources.
How those links are formatted and identified is critical. The most important decision you will make in defining a REST API is your choice of media types. You either need to find standard ways of representing that link information (think Atom, microformats, Atom link relations, HTML5 link relations), or, if you have specialized needs and don't need really wide reach to many clients, you could create your own media types.
Document how those media types are structured and what links/link relations they may contain. Specific information about media types is critical to the client. Having a server return Content-Type: application/xml is useless to a client that wants to do anything more than parse the response; the client cannot know what is contained in a response of type application/xml. Some people believe you can use XML Schema to define this, but that has several disadvantages and violates the REST "self-descriptive message" constraint.
Remember that what the URL looks like has absolutely no bearing on how the client should operate. The only exception is that a media type may specify the use of templated URIs and may define the parameters of those templates. The structure of the URL becomes significant when it comes to choosing a server-side framework: the server controls the URL structure, and the client should not care. However, do not let the server-side framework dictate how the client interacts with the API, and be very cautious about choosing a framework that requires you to change your API. HTTP should be the only constraint on the client/server interaction.
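If your media type does define templated URIs, the client fills in only the parameters the media type documents. A minimal sketch, supporting a simplified {name} form rather than full RFC 6570 templates:

// Minimal expansion of the simplified {name} template form; full RFC 6570
// templates need a proper library.
function expandTemplate(template: string, params: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
    if (!(name in params)) throw new Error(`Missing template parameter: ${name}`);
    return encodeURIComponent(params[name]);
  });
}

// e.g. expandTemplate("/properties/{id}", { id: "42" }) === "/properties/42"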

How to find inbound links to a given URL on the fly?

Technorati's got their Cosmos API, which works fairly well but limits you to non-commercial use and no more than 500 queries a day.
Yahoo's got a Site Explorer InLink Data API, but it defines the task very literally, returning links from sidebar widgets in blogs rather than just links from inside blog content.
Is there any other alternative for tracking who's linking to a given URL (think of the discussion links that run below stories on Techmeme.com)? Or will I have to roll my own?
Well, it's not an API, but if you google, for example, "link:nytimes.com", the search results that come back show inbound links to that site.
I haven't tried to implement what you want yet, but the Google search API almost certainly has that functionality built in.
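I can't vouch for the link: operator still being honoured (Google has deprecated it over the years), but as a sketch of the request shape against Google's current offering in this space, the Custom Search JSON API, with placeholder credentials:

// Sketch only: whether Google still honours the link: operator is uncertain.
async function inboundLinks(site: string): Promise<string[]> {
  const url = new URL("https://www.googleapis.com/customsearch/v1");
  url.searchParams.set("key", "YOUR_API_KEY"); // placeholder
  url.searchParams.set("cx", "YOUR_SEARCH_ENGINE_ID"); // placeholder
  url.searchParams.set("q", `link:${site}`);
  const res = await fetch(url);
  const data = await res.json();
  return (data.items ?? []).map((item: { link: string }) => item.link);
}

// e.g. const links = await inboundLinks("nytimes.com");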
Is this for links to URLs under your control?
If so, you could whip up something quick that logs entries from the Referer HTTP header.
If you wanted to do this for an entire web site without altering application code, you could implement it as an ISAPI filter or the equivalent for your web server of choice.
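A sketch of the referrer-logging idea as a plain Node HTTP server in TypeScript (the log format and port are arbitrary choices; note the header name is spelled "Referer" in HTTP):

// Log the Referer header of incoming requests to discover inbound links.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  const referer = req.headers["referer"]; // Node lower-cases header names
  if (referer) {
    console.log(`${new Date().toISOString()} ${req.url} <- ${referer}`);
  }
  res.end("ok");
});

server.listen(8080); // arbitrary port; in practice this sits in or in front of the real app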
Information available publicly from web crawlers is always going to be incomplete and unreliable (not that my solution isn't...).