Enable Full Text Search in Apache CouchDB - lucene

I have followed the blog entry here to enable full text search https://developer.ibm.com/dwblog/2015/text-search-apache-couchdb/#.Vly24SCrQbV
I have everything correctly set up, and have also tried with other peoples docker images.
How do you set up a search? What documents are needed.
I have created a database called cats with one document
{
"_id": "6f35d75b476517e2fc0b3eb12c000e72",
"_rev": "1-c9a6b4734c83287499e8bbd6d1339050",
"name": "tibbles"
}
And a design/view
{
"_id": "_design/cat_look",
"_rev": "1-aae457e6edf5e4a3f69357e5a2160fcc",
"views": {
"kitty_name": {
"map": "function (doc) {\n index(\"kittyName\", doc.name, {\"store\": true});\n}"
}
},
"language": "javascript"
}
If I go to http://localhost:15984/cats/_design/cat_look/_search/kitty_name?q="*"
I get
{"error":"not_found","reason":"kitty_name not found."}
Thanks for any help on this, I am very lost.

A Lucene search index is set up differently to how a Map Reduce view is done. In your code, it looks like you've tried to use a Map Reduce view. For Lucene, first you need to set up an index:
{
"_id": "_design/Cat_look",
"indexes": {
"kitty_name": {
"index": "function(doc){ ... }"
}
}
}
Consult Cloudant's docs on the subject: https://console.bluemix.net/docs/services/Cloudant/api/search.html#search

Thanks for your help, you are right I set up the Lucene search index incorrectly. Here is the code to get a simple example working for anyone else lost.
If you have docker setup
docker run -d -p 15984:15984 ncheaz/couchdb:search
to get couchdb search on local port 15984
The document to search
{
"_id": "6f35d75b476517e2fc0b3eb12c000e72",
"_rev": "1-c9a6b4734c83287499e8bbd6d1339050",
"name": "tibbles"
}
The Search Index.
Create a new document, not a new view.
{
"_id": "_design/cat_look",
"_rev": "2-23f6ab0606a603cbef04653d167585d4",
"views": {},
"language": "javascript",
"indexes": {
"kitty_name": {
"analyzer": "simple",
"index": "function (doc) {if (doc.name) {index(\"name\", doc.name, {\"store\":true}); }}"
}
}
}
The url to search for the cats name is
http://localhost:15984/cats/_design/cat_look/_search/kitty_name?q=name:tibbl*
note that kitty_name is the name of the _search and name is the index name.
I recommend anyone struggling to get this working to create a free trial account on IBM Cloudant as the documentation directly relates to their product and it is a lot easier to follow.

Related

jira res-api create issue with bullets list in description

Hi i want to create a jira issue with bullet list and other formating from rest-api
Right know i am testing this code, but it returns
{
"errorMessages": [],
"errors": {
"description": "Operation value must be a string"
}
}
Body looks like this
{
"fields": {
"project": {
"key": "DWH"
},
"issuetype": {
"name": "Story"
},
"summary": "auto created datatask",
"description": "description": ["Pro:<fil pro and activity number provided>.",
"Tablename:<fill table name>.",
"Table view:<fill table view name>.",
"endpoint:<fill knudepunkt>.",
"Task to do.",
"*Copy attached document to metadata/new folder.",
"* Run job to create the table.",
"* Test the data in Production when ready.",
"* Create extra userstory"],
"assignee": {
"key": "bdmdwhdata",
"name": "bdmdwhdata",
"emailAddress": "dwhdata#bankdata.dk"
},
"labels": ["DATA"]
}
}
Is it possible to create multiline decription from REST-API in jira issues
I i jost write a single line in description it works fine.
For adding new lines, try using \n in the description. For more information, kindly check this documentation.

Accessing a Word(.docx) file's content with Microsoft Graph REST API?

Is there a way to obtain the content of a Word document stored in the cloud through the Microsoft Graph API without having to download the file locally?
The goal is to build an app that analyzes a Word document's inner content and produce some interesting data from it. However after searching through Microsoft's Dev Center, Graph Explorer, and their API's documentation repository, I can't find any API endpoints that can serve me that data.
I can find some endpoints that deal with manipulating Excel's contents, but not one that deals with Word. Does Microsoft Graph not support retrieving a Word document's content?
EDIT: For example, I know I can read the contents of a "message" and even apply a search on it through query parameters, as demonstrated by one of Microsoft's samples. But I can't seem to find how to do this with Word documents.
Well, it's possible to download the content of the document.
See: Download the contents of a DriveItem.
For example:
GET /v1.0/me/drive/root:/some-folder/document.docx:/content
But you'll get the entire docx, with embedded images and all. Don't know if this is what you are looking for.
As an example, see the helix-word2md project that fetches a docx and converts it to markdown.
I'm afraid you can't direly access word content. What you can do is use web URL property of a DriveItem opening a document the associated Word Online or native world if it is installed.
You can use this below to show specific item or all items:
GET /users/{userId}/drive/items/{itemId}
GET me/drive/root/children/
This is the result below:
{
"#microsoft.graph.downloadUrl": "",
"createdDateTime": "2018-08-10T01:43:00Z",
"eTag": "\"{00000000-3E94-4161-9B82-0000000},2\"",
"id": "00000000IOJA4ONFB6MFAZXARX7L7RU4NV",
"lastModifiedDateTime": "2018-08-10T01:43:00Z",
"name": "daily check.docx",
"webUrl": "https://xxxxxxx",
"cTag": "\"c:{00000000-3E94-4161-9B82-37FAFF1A71B5},2\"",
"size": 26330,
"createdBy": {
"user": {
"email": "000000.onmicrosoft.com",
"id": "000000-93dc-41b7-b89b-760c4128455a",
"displayName": "Chris"
}
},
"lastModifiedBy": {
"user": {
"email": "0000#0000.onmicrosoft.com",
"id": "00000000-93dc-41b7-b89b-00000000",
"displayName": "Chris"
}
},
"parentReference": {
"driveId":
"b!000000000gdQMtns72t31yqWMhnFCjmCqO3tR5ypOf17NKl2USqo1bNqhOzrZ",
"driveType": "business",
"id": "00000VN6Y2GOVW7725BZO354PWSELRRZ",
"path": "/drive/root:"
},
"file": {
"mimeType": "application/vnd.openxmlformats-
officedocument.wordprocessingml.document",
"hashes": {
"quickXorHash": "OSOK7r2hIVSeY1+FjaCnlOxn2p8="
}
},
"fileSystemInfo": {
"createdDateTime": "2018-08-10T01:43:00Z",
"lastModifiedDateTime": "2018-08-10T01:43:00Z"
}
}

Dropbox API V2 list_file_members/batch empty results

I'm currently trying to work with the Dropbox list_file_members API endpoint, as it appears to me to be the only place to find out who owns a file (
see follow example result taken from the documentation page )
{
"users": [
{
"access_type": {
".tag": "owner"
},
"user": {
"account_id": "dbid:AAH4f99T0taONIb-OurWxbNQ6ywGRopQngc",
"same_team": true,
"team_member_id": "dbmid:abcd1234"
},
"permissions": [],
"is_inherited": false
}
],
"groups":[...]
...
}
However, when I call the API on a single file I get the follow
{
"users": [],
"groups": [
{
"access_type": {
".tag": "editor"
},
"permissions": [],
"is_inherited": true,
"group": {
"group_name": "Everyone at TEAM_NAME_HERE",
"group_id": "g:GROUP_ID_HERE",
"member_count": 6,
"group_management_type": {
".tag": "company_managed"
},
"group_type": {
".tag": "team"
},
"is_owner": false,
"same_team": true
}
}
],
"invitees": []
}
This result contains no owner information, so I'm assuming this is because everyone has the same access levels ??
The problem worsens when I try to call files in batches using the sharing_list_file_members/batch endpoint, I get the following result
[
{
"file": "id:THIS_IS_MY_FILE_ID",
"result": {
".tag": "result",
"members": {
"users": [],
"groups": [],
"invitees": []
},
"member_count": 0
}
}
]
Obviously this is even less helpful, this is the same when I access the API via my own PHP, as well as the API explorer, could anyone tell me where I'm going wrong and why I'm getting no results from users and even groups when done in batches ?
The /2/sharing/list_file_members endpoint is documented as:
Use to obtain the members who have been invited to a file, both inherited and uninherited members.
The /2/sharing/list_file_members/batch endpoint is documented as:
Get members of multiple files at once. The arguments to this route are more limited, and the limit on query result size per file is more strict. To customize the results more, use the individual file endpoint.
Inherited users are not included in the result, and permissions are not returned for this endpoint.
It sounds like the file for your example is in a team folder, and so the group listed for your non-batch example is the team group, i.e., an inherited group. The documentation indicates that this group isn't expected when using the batch endpoint.

is there a wikipedia api call that can retrieve restriction status of the article?

Otherwise I must do querySelector on the page content to find if there is a some kind of padlock and by try and error check what (id or class) is unique to that icon.
Other source to find is this info is to go on information page by adding $action=info to the url params. But then another problem comes in that the protection status is written in that's particular wiki language.
Using the API is the right way to do it, but you need to use action=query. The padlocks icons are inconsistent across wikis, and most wikis probably don't even have them.
If you use the right parameters for your API query, you should be getting the results you're looking for.
Example for the English Wikipedia:
https://en.wikipedia.org/w/api.php?action=query&prop=info&format=json&inprop=protection&titles=Elton%20John gives you this result:
{
"batchcomplete": "",
"query": {
"pages": {
"5052197": {
"pageid": 5052197,
"ns": 0,
"title": "Elton John",
"contentmodel": "wikitext",
"pagelanguage": "en",
"touched": "2015-10-02T03:49:24Z",
"lastrevid": 683730854,
"length": 115931,
"protection": [
{
"type": "edit",
"level": "autoconfirmed",
"expiry": "infinity"
},
{
"type": "move",
"level": "sysop",
"expiry": "infinity"
}
],
"restrictiontypes": [
"edit",
"move"
]
}
}
}
}
Here the protection array tells you that only sysops can move the page, and only autoconfirmed users can edit it.
If you make a similar query on another wiki, say the French Wikipedia: https://fr.wikipedia.org/w/api.php?action=query&prop=info&format=json&inprop=protection&titles=Malia%20Obama , you get this in response (trimmed):
"protection": [
{
"type": "edit",
"level": "sysop",
"expiry": "infinity"
},
{
"type": "move",
"level": "sysop",
"expiry": "infinity"
}
],
"restrictiontypes": [
"edit",
"move"
]
In this case, sysops are the only one who can move and edit the page.

API Pagination Standards

I have been working on an API and pagination is required. Only 25 elements will be returned in each request. I was looking around for standards and I seem to see 2 different things going on.
The Link Header
Link: https://www.rfc-editor.org/rfc/rfc5988
Example:
Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next",
<https://api.github.com/user/repos?page=50&per_page=100>; rel="last"
In the JSON response
Link: API pagination best practices
Example:
"paging": {
"previous": "http://api.example.com/foo?since=TIMESTAMP"
"next": "http://api.example.com/foo?since=TIMESTAMP2"
}
Question:
Should I do both? and that being said; is the key "paging" the correct key? or "links" or "pagination"
I would say it depends on the structure of data you return (and may return in the future).
If you never have nested objects that need their own links, then using the Link header is (mildly) preferable, because it's more correct. The issue with nested objects is that you can't nest Link headers.
Consider the following collection entity:
{
"links": {
"collection": "/cards?offset=0&limit=25"
},
"data": [
{
"cardName": "Island of Wak-Wak",
"type": "Land",
"links": {
"set": "/cards?set=Arabian Knights"
}
},
{
"cardName": "Mana Drain",
"type": "Interrupt",
"links": {
"set": "/cards?set=Legends"
}
}
]
}
There's no good way to include links for the cards in the headers.