How to download all extractors along with their endpoints for RESTful requests? - import.io

I've been using import.io to extract lots of data from hundreds of web pages. I've already created extractors for those URLs and am still adding more.
I've designed an automated process that sends an HTTP request to the import.io RESTful API for each of my extractors in turn.
Every time I create a new extractor, I have to perform a manual task to insert the endpoint of the newly created extractor into my database. This approach is very time-consuming and error prone, since there is a chance of a copy/paste mistake.
Although import.io maintains this list for me, I'd love to download all the extractors along with their RESTful request endpoints so that the data can be stored in my database.
Is there a way to download or extract all of this data into Excel format or some other format?

At this time there is no way to bulk-download the API endpoints for all your extractors, I'm afraid.
It is possible to get the GUIDs of your connectors using this method:
http://api.docs.import.io/legacy/#ConnectorMethods "search connectors"
You could write a small script, in Python for example, to parse this response and get the GUIDs.
Potentially you could add this to your automated process.
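For illustration, a small Python sketch of that step might look like the one below. It assumes the legacy "search connectors" endpoint lives at api.import.io/store/connector/_search and that each hit in the response carries a guid field; both details are assumptions, so check them against the legacy docs linked above before relying on this.

import requests

API_KEY = "YOUR_API_KEY"
# Assumed location of the legacy "search connectors" endpoint; verify against the docs above.
SEARCH_URL = "https://api.import.io/store/connector/_search"

def list_connector_guids(api_key):
    """Return the GUIDs of all connectors visible to this account."""
    resp = requests.get(SEARCH_URL, params={"_apikey": api_key})
    resp.raise_for_status()
    results = resp.json()
    # Assumed response shape: a list of hits, each carrying a "guid" field.
    return [hit.get("guid") for hit in results.get("hits", [])]

if __name__ == "__main__":
    for guid in list_connector_guids(API_KEY):
        # Insert each GUID/endpoint into your own database instead of printing it.
        print(guid)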

Related

Bulk Extractor via API

I have built an Extractor with the UI on Windows. I have multiple pages, so I chose the bulk extractor. I fed it around 10+ URLs and clicked query. Great, everything works and comes out beautifully.
However, when I go over to export and want to start learning how to export this bulk extraction of URLs to the API, I hit Simple API, which only displays the results of the first bulk URL.
How can I make a call to multiple URLs within the same API call, so the results are all in one page rather than in multiple API calls?
At some point I am trying to bring all this data into my MySQL database via PHP periodically (daily).
Thanks.
Keelan,
The bad news: at the moment it is not possible to run a bulk extract as an API.
The good news: we are working on it right now! Actually, if you want to get in touch with us, we'd love to add you to the beta users for the new feature.
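In the meantime, one possible stopgap (a sketch, not an official bulk API) is to script the existing Simple API yourself: take the Simple API URL the export screen gives you for the first URL, substitute each of your other URLs into its url parameter, and merge the responses before loading them into MySQL. The endpoint template and the "results" field below are assumptions; copy the real URL from your own export screen.

import requests

API_KEY = "YOUR_API_KEY"
# Assumption: paste the Simple API URL from your export screen here,
# keeping {url} as a placeholder where the queried page URL appears.
SIMPLE_API_TEMPLATE = (
    "https://api.import.io/store/data/YOUR_EXTRACTOR_GUID/_query"
    "?input/webpage/url={url}&_apikey={key}"
)

urls = [
    "http://example.com/page1",
    "http://example.com/page2",
    # ... the rest of your 10+ URLs
]

all_rows = []
for url in urls:
    resp = requests.get(SIMPLE_API_TEMPLATE.format(url=url, key=API_KEY))
    resp.raise_for_status()
    # Assumption: each response carries its extracted rows under "results".
    all_rows.extend(resp.json().get("results", []))

print(len(all_rows), "rows collected from", len(urls), "pages")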

Update keyword list for streaming API on-the-fly using HBC

I'm working on a project accessing Twitter's Streaming API with HBC.
I'm storing keywords for Twitter's Streaming API (filter) in a file, and now I'm looking for a way to close and reconnect to Twitter each time the file changes.
I googled but found no useful results.
Any idea how I could manage this task?
Don't do this. Twitter doesn't like reconnects to the Streaming API; they will ban your application.
If you have to change filter parameters often, it is better to use the REST API search/tweets endpoint.
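As a rough Python sketch of that suggestion (not production code), you could re-read the keyword file on every poll and query the search/tweets endpoint instead of holding a streaming connection open. The file name, polling interval and OAuth credentials below are placeholders.

import time
import requests
from requests_oauthlib import OAuth1

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

def load_keywords(path="keywords.txt"):
    # Re-read the keyword file so edits take effect on the next poll, no reconnect needed.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

while True:
    query = " OR ".join(load_keywords())
    resp = requests.get(SEARCH_URL, auth=auth, params={"q": query, "count": 100})
    resp.raise_for_status()
    for tweet in resp.json().get("statuses", []):
        print(tweet["id"], tweet["text"])
    time.sleep(60)  # stay well inside the search API rate limits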

Can an API be used without HTTP by applications on the same server?

I am about to build a CRM application in CakePHP. The CRM will require an API for other applications to extract the data.
Applications that are not on the same server will obviously access this via some sort of API caller object that makes an HTTP request to the CRM.
As I understand it, the HTTP request time will add to the overall load time. For applications on the same server, is it possible, or common practice, to access the API by some other means, i.e. including the controller and using the actions ... (I'm not even sure if this would work, as the framework wouldn't be bootstrapped, right?). By doing this I am looking to: 1) reduce load times, and 2) return data as PHP arrays instead of JSON/XML objects.
In order to return HTTP responses you need some kind of HTTP server. APIs are designed for exactly the reason you are describing above: they allow multiple applications to access the data in a database by creating an interface to it in a way that can be controlled and standardised.
If you design your API well and have it return JSON objects, the average response shouldn't be a problem, as you will only be returning a small amount of text.
Getting a PHP array from a JSON object is as simple as json_decode($json_object, true);
If you are using CakePHP to create an API, you should look at using the awesome CRUD plugin by Friends of Cake.

When to use ProtoRPC or REST

I primarily deal with REST JSON APIs at my work, so I am familiar with them. But for my own project I am trying out App Engine, as I believe it is a great fit.
I had already started writing my logic in Python (using Ferris), and in reading more on App Engine I came across ProtoRPC and Cloud Endpoints. But in most of the examples I have read, they seem to be doing just the same as I would do in a REST API: make a request with JSON, and get a JSON response back. Or an error.
The main difference I see is that in REST, the endpoints are based around a resource and the HTTP verbs around them, such as GET, POST, PUT, DELETE, etc., whereas in RPC each request would have its own endpoint rather than being based around a resource or model.
Both are requested with, and respond with, JSON. So I think I am failing to see the benefit of using Google Endpoints. Do they offer some other kind of benefit? I could maybe see better analytics tracking with Endpoints. But wouldn't I still need to use the POST HTTP verb for modifying content?
Would any of this change if I was using certain frameworks, like Django? I am currently testing out Ferris, which has a messaging system that deals with ProtoRPC, although I have not been able to test that out yet.
So what am I missing? What are the benefits of Endpoints over just making my object methods handle a JSON request? Do socket connections play into this at all?
You can do both by using endpoints to make a RESTful API.
protorpc/endpoints doesn't magically make your API RESTful. You have to make your API RESTful within the endpoints framework, which uses protorpc.
The benefit of using endpoints is that you can get a lot of the authentication work done easily (assuming you're going to use Google accounts), the serialization/deserialization is done for you, your client libraries can be generated for you, and more than just a JavaScript client can be supported. If these things sound good, endpoints may be right for you. If these things don't apply, then maybe be more direct and use RequestHandlers yourself.
I was able to make a simple API using endpoints. It has 3 main resources: records, datasets and moves. A Record object is the smallest unit of data. The Dataset object is a group of records. And the Move object represents the event of a Dataset changing location.
My RESTful URIs look like this:
GET /records # Get list of records
GET /records/<id> # Get single record
POST /records # Create records
GET /datasets # Get list of datasets
GET /datasets/<id> # Get single dataset
POST /datasets # Create dataset
GET /moves # Get list of moves
GET /moves/<id> # Get single move
POST /moves # Create move
With this I can create data, organize it and move it around. I was able to do this completely within the endpoints framework on App Engine and it is RESTful.
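As a rough illustration of what that looks like (a trimmed-down sketch, not the exact code from my project; the message fields and API name are placeholders), the records resource can be written with endpoints and protorpc like this:

import endpoints
from protorpc import message_types, messages, remote

class Record(messages.Message):
    id = messages.StringField(1)
    value = messages.StringField(2)

class RecordCollection(messages.Message):
    items = messages.MessageField(Record, 1, repeated=True)

@endpoints.api(name='datasetapi', version='v1')
class DatasetApi(remote.Service):

    @endpoints.method(message_types.VoidMessage, RecordCollection,
                      path='records', http_method='GET', name='records.list')
    def list_records(self, request):
        # Fetch records from the datastore here; hard-coded for brevity.
        return RecordCollection(items=[Record(id='1', value='example')])

    @endpoints.method(Record, Record,
                      path='records', http_method='POST', name='records.create')
    def create_record(self, request):
        # Persist the record here, then echo it back to the caller.
        return request

APPLICATION = endpoints.api_server([DatasetApi])

The datasets and moves resources follow the same pattern with their own message classes and paths.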

Best way to upload multiple files as part of a REST API? Single or multiple POST requests?

I am trying to create a REST API for my web service.
I want users of the API to be able to initiate a new request with my service. This involves uploading one or two zip files along with some other parameters.
How can I combine all of this into one request? Or is it better to do it as multiple requests somehow?
I don't have a lot of familiarity with making REST APIs so I don't know how people usually do it.
I'm using PHP for my site if that matters.
To do this, you'd need your client to upload in mime/multipart format. I don't know PHP, but I'm sure there's a library out there that will support receiving/parsing the multipart messages you get.
As for whether it's a good idea: if initiating the request is the creation of a single resource, it's not unreasonable to accept mime/multipart. If the parts being sent are themselves full-fledged resources, it would probably be better to make the client send them up separately and reference them in the initiation request. Also note that mime/multipart is going to be a bit harder for your clients to deal with than simple requests.
This post seems to be related to what you're trying to accomplish.
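To make the single-request option concrete, a mime/multipart call is just a POST whose body carries the zip files as file parts and the other parameters as form fields. The sketch below uses Python's requests library purely to show the shape of that request (the endpoint path, field names and file names are made up); on the PHP side the parts arrive in $_FILES and $_POST as with any form upload.

import requests

# Hypothetical endpoint that creates the new request resource in one call.
CREATE_URL = "https://example.com/api/requests"

files = {
    # Each zip file becomes one part of the multipart/form-data body.
    "archive_one": open("data1.zip", "rb"),
    "archive_two": open("data2.zip", "rb"),
}
data = {
    # The remaining parameters travel as ordinary form fields.
    "title": "My new request",
    "priority": "high",
}

resp = requests.post(CREATE_URL, files=files, data=data)
resp.raise_for_status()
print(resp.status_code, resp.json())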