Apache Nutch REST API

I'm trying to launch a crawl via the REST API. A crawl starts with injecting URLs. Using the Chrome developer tool "Advanced Rest Client", I'm trying to build up this POST payload, but the response I get is a 400 Bad Request.
POST - http://localhost:8081/job/create
Payload
{
"crawl-id":"crawl-01",
"type":"INJECT",
"config-id":"default",
"args":{ "path/to/seedlist/directory"}
}
My problem is in the args, I think more is needed but I'm not sure. In the NutchRESTAPI page this is the sample it gives for creating a job.
POST /job/create
{
"crawlId":"crawl-01",
"type":"FETCH",
"confId":"default",
"args":{"someParam":"someValue"}
}
POST /job/create
{
"crawlId":"crawl-01",
"jobClassName":"org.apache.nutch.fetcher.FetcherJob",
"confId":"default",
"args":{"someParam":"someValue"}
}
I'm not sure what param or value to give each of the commands to complete a job (e.g. Inject, Generate, Fetch, Parse, and UpdateDb). Can someone clear this up? How do I tell the API where to look for the seedlist?
UPDATE
When trying to complete the Generate command I ran into a ClassCastException: the value for the topN key must be of type long, but the API reads it as either a string or an int. I found a fix that is supposed to be included in the 2.3.1 release (release date: TBA), applied it, and recompiled my code. It now works.

At the time of this posting, the REST API is not yet complete. A much more detailed document exists, though it's still not comprehensive. It is linked to in the following email from the user mailing list (which you might want to consider joining):
http://www.mail-archive.com/user%40nutch.apache.org/msg13652.html
But to answer your question about the seedlist, you can create the seedlist through REST, or you can use the argument "seedDir":
{
"args":{
"seedDir":"/path/to/seed/directory"
},
"confId":"default",
"crawlId":"sample-crawl-01",
"type":"INJECT"
}
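As a rough illustration, the INJECT request above could be assembled from a script. This is a minimal Python sketch, assuming the server listens on http://localhost:8081 as in the question; the helper names build_inject_job and make_request are made up for this example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

NUTCH_URL = "http://localhost:8081/job/create"  # host and port taken from the question


def build_inject_job(crawl_id, seed_dir, conf_id="default"):
    """Return the JSON body for an INJECT job (hypothetical helper)."""
    return {
        "args": {"seedDir": seed_dir},
        "confId": conf_id,
        "crawlId": crawl_id,
        "type": "INJECT",
    }


def make_request(job, url=NUTCH_URL):
    """Wrap the job in a POST request with a JSON content type."""
    data = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = make_request(build_inject_job("sample-crawl-01", "/path/to/seed/directory"))
# To actually submit the job: urllib.request.urlopen(request)
```

Note that "args" is a JSON object of key/value pairs, which is why the bare string in the original payload was rejected.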

Kotlin for Volley, how can I check the JSON request for newer data in the API?

I'm working on an app that gets a list of documents/source URLs from an API. I'd like to periodically check for new or updated content within that API so users can update saved items in the database. I'm at a loss for the correct wording to search, so Google and Stack Overflow have both failed me. My fetching function is below:
The URL for the API is https://api.afiexplorer.com
private fun fetchPubs() {
    _binding.contentMain.loading.visibility = View.VISIBLE
    request = JsonArrayRequest(
        Request.Method.GET,
        Config.BASE_URL,
        JSONArray(),
        { response ->
            // Deserialize the JSON array into a list of Pubs and sort it by Number
            val items: List<Pubs> =
                Gson().fromJson(response.toString(), object : TypeToken<List<Pubs>>() {}.type)
            val sortedItems = items.sortedWith(compareBy { it.Number })
            pubsList?.clear()
            pubsList?.addAll(sortedItems)
            // Hardcoded pubs moved to Publications Gitlab Repo
            // https://gitlab.com/afi-explorer/pubs
            _binding.contentMain.recyclerView.recycledViewPool.clear()
            adapter?.notifyDataSetChanged()
            _binding.contentMain.loading.visibility = View.GONE
            setupData()
            Log.i("LENGTH OF DATA", "${items.size}")
        },
        { error ->
            // printStackTrace() returns Unit, so call it directly instead of printing its result
            error.printStackTrace()
            Toasty.error(applicationContext, getString(string.no_internet), Toast.LENGTH_SHORT, true).show()
        }
    )
    MyApplication.instance.addToRequestQueue(request!!)
}
private fun setupData() {
    adapter = MainAdapter(applicationContext, pubsList!!, this)
    _binding.contentMain.recyclerView.adapter = adapter
}
I tried using ChatGPT to get started, and that failed miserably. I also searched Google, Reddit and Stack Overflow for similar projects, but mine seems to be a unique scenario. I'm just a hobbyist and intermediate dev, and this is my first time working with Volley. Everything works, but I would like to find a way to send a notification (preferably not via Firebase) if there is updated info within the API listed above. I'm not sure if this is actually doable.
Are you asking if you can somehow find if the remote API has changed its content? If so, how would that service advise you? If the service provider provides a web hook or similar callback you could write a server-based program to send a push notification to your Android app.
Perhaps you intend to poll the API periodically, and then you want to know if there is a change?
If you use a tool such as Postman or curl to easily see the headers of the API https://api.afiexplorer.com you will see, unfortunately, there is no Last-Modified header or ETag header which would allow you to easily determine if there was a change.
Next looking at the content of the API, the author does not provide an obvious version/change date, so no luck there.
What you could do is receive the content as a String and perform a checksum operation on it; if the checksum differs from the previous one, you know there has been a change.
Alternatively, if you are deserialising the received JSON into Kotlin data classes, then out of the box Kotlin lets you compare the result for equality with a previous copy of the data to know if there was a change.
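The checksum idea can be sketched in a few lines. The example below is in Python purely for illustration (in the Kotlin app the same thing could be done with java.security.MessageDigest); the function names and the sample response bodies are made up. It hashes a canonical form of the JSON so that key order and whitespace alone do not register as a change.

```python
import hashlib
import json


def fingerprint(body: str) -> str:
    """Hash the JSON content in a canonical form (sorted keys, no extra whitespace)."""
    canonical = json.dumps(json.loads(body), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, body):
    """Return (changed, new_fingerprint) for a freshly fetched response body."""
    current = fingerprint(body)
    return current != previous_fingerprint, current


# Sample bodies (made up) standing in for successive API responses.
first = '[{"Number": "AFI 1-1", "Title": "Standards"}]'
same_reordered = '[{"Title": "Standards", "Number": "AFI 1-1"}]'
updated = '[{"Number": "AFI 1-1", "Title": "Standards (rev 2)"}]'

# The first fetch has nothing to compare against; store its fingerprint.
_, saved = has_changed(None, first)
```

On each later poll, call has_changed(saved, new_body) and, when it reports a change, persist the new fingerprint and notify the user.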
This looks like an android app; if so, why don't you create a background service that makes requests to the API and updates the data as needed? You can use an AlarmManager class to set the interval threshold for polling by using the setInexactRepeating() method.
Most apps are updated in this fashion; sometimes, a separate table is created to catalog changesets.
Let me know if this helps.

Creating API - general question about verbs

I decided to move my application to a new level by creating a RESTful API.
I think I understand the general principles, I have read some tutorials.
My model is pretty simple. I have Projects and Tasks.
So to get the lists of Tasks for a Project you call:
GET /project/:id/tasks
to get a single Task:
GET /task/:id
To create a Task in a Project
CREATE /task
payload: { projectId: :id }
To edit a Task
PATCH /task/:taskId
payload: { data to be changed }
etc...
So far, so good.
But now I want to implement an operation that moves a Task from one Project to another.
My first guess was to do:
PATCH /task/:taskId
payload: { projectId: :projectId }
but I do not feel comfortable with revealing the internal structure of my backend to the frontend.
Of course, it is just a convention and has nothing to do with security, but I would feel better with something like:
PATCH /task/:taskId
payload: { newProject: :projectId }
where there is no direct relation between the 'newProject' and the real column in the database.
But then, the next operation comes.
I want to copy ALL tasks from Project A to Project B with one API call.
PUT /task
payload: { fromProject: :projectA, toProject: :projectB }
Is it a correct RESTful approach? If not - what is the correct one?
What is missing here is "a second verb".
You can see that we are creating new tasks, hence 'PUT', but we are also copying, which is implied by fromProject and toProject.
Is it a correct RESTful approach? If not - what is the correct one?
To begin, think about how you would do it in a web browser: the world wide web is the reference implementation for the REST architectural style.
One of the first things that you will notice: on the web, we are almost always using POST to make changes to the server. You fill in a form in a browser, submit the form, the browser takes information from the input controls of the form to create the HTTP request body, the server figures out how to do the work that is described.
What we have in HTTP is a standardized semantics for messages that manipulate individual documents ("resources"); doing useful work is a side effect of manipulating documents (see Webber 2011).
The trick of POST is that it is the method whose standardized meaning includes the case where "this method isn't worth standardizing" (see Fielding 2009).
POST /2cc3e500-77d5-4d6d-b3ac-e384fca9fb8d
Content-Type: text/plain

Bob,
Please copy all of the tasks from project A to project B
The request line and headers here are metadata in the transfer of documents over a network domain. That is to say, that's the information we are sharing with the general purpose HTTP application.
The actual underlying business semantics of the changes we are making to documents is not something that the HTTP application cares about -- that's the whole point, after all.
That said, if you are really trying to manipulate document hierarchies in a general-purpose and standardized way, then you should see whether your problem is a close match to the WebDAV specifications (RFC 2291, RFC 4918, RFC 3253, etc).
If the constraints described by those documents are acceptable to you, then you may find that a lot of the work has already been done.
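To make the POST idea above more concrete: the client submits a document describing the copy it wants, and the server works out how to perform it. The following is only a sketch in Python; the /task-copies path, the in-memory task store and the handler name are hypothetical, and the field names are borrowed from the question.

```python
import itertools
import json

# In-memory stand-in for the database: task id -> task record.
tasks = {
    1: {"project": "A", "title": "Write spec"},
    2: {"project": "A", "title": "Review spec"},
    3: {"project": "B", "title": "Ship"},
}
_next_id = itertools.count(max(tasks) + 1)


def handle_post(path, body):
    """Dispatch a POST request; returns (status_code, response_dict)."""
    if path != "/task-copies":  # hypothetical resource that accepts copy requests
        return 404, {"error": "not found"}
    request = json.loads(body)
    source, target = request["fromProject"], request["toProject"]
    created = []
    # Snapshot the matching tasks first so the store can be modified safely.
    for task in [t for t in tasks.values() if t["project"] == source]:
        new_id = next(_next_id)
        tasks[new_id] = {"project": target, "title": task["title"]}
        created.append(new_id)
    return 201, {"created": created}


status, result = handle_post("/task-copies", '{"fromProject": "A", "toProject": "B"}')
```

The request body names no database columns; it only describes the intent, and the server is free to change how tasks are stored.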

Making PUT and GET Responses in Katalon Studio

I’m new to working with RESTful APIs. In my case I want to make a PUT request and then GET the result.
I made a PUT request and it worked. [1]: https://imgur.com/a/zlUTzYB
But now I want to make a GET request. Can I somehow make the GET request automatically take the PUT request's statementId and bind it into the URL? [2]: https://imgur.com/a/qqBd5nR
I have watched a lot of videos and read documentation about APIs, but I still don't understand how to do it. I’m really new to this, so sorry if this is a dumb question.
These steps might help to solve your case:
- Execute the PUT request
- Get the Response, parse it using JsonSlurper, and take the value out.
- Save the value into a variable, e.g. a GlobalVariable
- Create the GET request with parameter taken from the GlobalVariable
- Execute the GET request.
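The data flow in these steps can be sketched outside Katalon as follows. This is Python for illustration only (in Katalon the parsing would be done with Groovy's JsonSlurper and the value kept in a GlobalVariable); the response body, the URL template and the function names are made up, and the field name statementId is taken from the question.

```python
import json


def extract_statement_id(response_body: str) -> str:
    """Parse the PUT response and pull out the id (field name assumed)."""
    return str(json.loads(response_body)["statementId"])


def build_get_url(url_template: str, statement_id: str) -> str:
    """Fill the id into the GET URL template."""
    return url_template.format(id=statement_id)


# Made-up response body and URL template, for illustration only.
put_response = '{"statementId": "abc-123", "status": "stored"}'
statement_id = extract_statement_id(put_response)
get_url = build_get_url("https://example.test/statements/{id}", statement_id)
```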
Found an answer:
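// The URL stored in the "GET_request" test object must contain a format placeholder (e.g. %s)
RequestObject ro = findTestObject("GET_request")
// Substitute the id taken from the PUT response (idValue2) into the URL
ro.setRestUrl(String.format(ro.getRestUrl(), idValue2))
// Send the GET request
ResponseObject resp = WS.sendRequest(ro)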

How to use a Postman Mock Server

I have followed the guide here to create a Postman mock for a Postman collection. The mock seems to have been created successfully, but I have no idea how to use the mock service.
I've been given a url for the mock, but how do I specify one of my requests? If I issue a GET request to https://{{mockid}}.mock.pstmn.io I get the following response:
{
"error": {
"name": "mockRequestNotFoundError",
"message": "We were unable to find any matching requests for the mock path (i.e. undefined) in your collection."
}
}
According to the same guide mentioned above, the URL to "run the mock" is https://{{mockId}}.mock.pstmn.io/{{mockPath}}, but what exactly is mockPath?
Within my collection I have plenty of folders, and inside one of these folders I have a request with an example response. How do I access this example response through the mock? Thanks for all help in advance!
Here's the Postman Pro API, which doesn't mention much more than creating and reading mocks.
I had the same issue seeing an irrelevant error but finally I found the solution. Unfortunately I cannot find a reference in Postman website. But here is my solution:
When you create a Mock server you define your first request (like GET api/v1/about). So the Mock server will be created but even when you obtain your API key and put it in the header of request (as x-api-key) it still returns an error. It doesn't make sense but it turned out that defining the request is not enough. For me it only started returning a response when I added an Example for the request.
So I suggest that for each request you create, you also create at least one example. The request you send will be matched against the examples you have created, and the matched response will be returned. You can define the body, headers and HTTP status code of the example response.
I have no Pro Postman subscription and it worked for me using my free subscription.
[Screenshots not preserved: the menu for adding an example or selecting one for editing; the UI for defining the example (body, headers and status); how to go back to the request page; the correct reply returned for the example.]
If your request in the example is a GET on api.domain.com/api/foo, then the mockPath is /api/foo and your mock endpoint is a GET call to https://{{mockid}}.mock.pstmn.io/api/foo.
The HTTP request method and the pathname together constitute a mock.
For ease of use, the mock server is designed to be used on top of collections. The request in each example is used as is, along with the response attached to it. The name of the folder or collection is not part of the pathname and is not factored in anywhere when using a mock. Mocking a collection means mocking all the examples within your collection. An example is a tuple of request and response.
An optional response status code if specified lets you fetch the appropriate response for the same path. This can be specified with the x-mock-response-code header. So passing x-mock-response-code as 404 will return the example that matches the pathname and has a response with status code of 404.
Currently if there are examples with the same path but different domains, and mock is unable to distinguish between them it will deterministically return the first one.
Also if you have several examples for the same query :
Mock requests accept another optional header, x-mock-response-code, which specifies the integer response code that the returned response should match. For example, 500 will return only a 500 response. If this header is not provided, the closest match of any response code will be returned.
Optional headers like x-mock-response-name or x-mock-response-id allow you to further specify the exact response you want by the name or by the uid of the saved example respectively.
Here's the documentation for more details.
{{mockPath}} is simply the path for your request. You should start by adding an example for any of your requests.
Example:
Request: https://www.google.com/path/to/my/api
After adding your mock server, you can access your examples at:
https://{{mockId}}.mock.pstmn.io/path/to/my/api
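Putting the pieces from the answers together, a call to the mock needs only the mock URL, the request path from the saved example, and optionally the headers mentioned above. A minimal Python sketch, where the helper name mock_request and the mock id are placeholders:

```python
def mock_request(mock_id, path, response_code=None, api_key=None):
    """Return (url, headers) for calling a Postman mock server."""
    # The path is the pathname of the saved example; folder names are not part of it.
    url = "https://{}.mock.pstmn.io/{}".format(mock_id, path.lstrip("/"))
    headers = {}
    if api_key is not None:
        headers["x-api-key"] = api_key
    if response_code is not None:
        # Ask for the example whose response has this status code.
        headers["x-mock-response-code"] = str(response_code)
    return url, headers


url, headers = mock_request("my-mock-id", "/path/to/my/api", response_code=404)
```

The resulting URL and headers can then be passed to any HTTP client using the same method (GET, POST, ...) as the saved example.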

CCNet API, docs?

I want to query my CCNet server to find out the status of the builds. I've heard rumor that there's a (ReST?) API of sorts, but I can't seem to find any documentation for it.
Is there any documentation for it, or do I need to download the CCNet source code and start reading?
EDIT: I found the endpoint /XmlStatusReport.aspx, which gives an XML overview of all projects. The same filename in any folder gives exactly the same response, though, so I'm afraid that might be the only API there is.
As an alternative to the XML you already mentioned yourself, you could use remoting as the CCTray app does. If you reference ThoughtWorks.CruiseControl.Remote.dll from the CruiseControl.NET\server folder, you can instantiate CruiseServerRemotingClient and use it to retrieve information from the server.
The following snippet prints out the list of projects on the server and their build statuses:
CruiseServerRemotingClient client = new CruiseServerRemotingClient("tcp://ccnetserver:21234/CruiseManager.rem");
ProjectStatus[] statusList = client.GetProjectStatus();
foreach (ProjectStatus status in statusList)
{
    Console.WriteLine("{0}: {1}", status.Name, status.BuildStatus);
}
You could also retrieve the log for the latest build in XML format as follows:
string buildName = client.GetLatestBuildName("Jasenje");
Console.WriteLine(client.GetLog("Jasenje", buildName));
I haven't managed to find any real documentation for the API but at least there are XML comments with brief descriptions of methods and parameters.
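For completeness, the XML report from /XmlStatusReport.aspx mentioned in the question can also be consumed from a script. The Python sketch below parses a made-up sample; the element name Project and the attributes name and lastBuildStatus are assumptions about the report format and should be checked against what your server actually returns.

```python
import xml.etree.ElementTree as ET

# Made-up sample; element and attribute names are assumptions about the report format.
SAMPLE = """<Projects>
  <Project name="Jasenje" activity="Sleeping" lastBuildStatus="Success" />
  <Project name="Other" activity="Building" lastBuildStatus="Failure" />
</Projects>"""


def project_statuses(xml_text):
    """Map each project name to its last build status."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("lastBuildStatus") for p in root.iter("Project")}


statuses = project_statuses(SAMPLE)
```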