Reading through:
https://www.microsoft.com/cognitive-services/en-us/Academic-Knowledge-API/documentation/GraphSearchMethod
The meaning of "path" is a bit obscure:
"path": "/paper/AuthorIDs/author" - I don't see an AuthorIDs object in the returned results.
# post data query
{
"path": "/paper/AuthorIDs/author",
"paper": {
"type": "Paper",
"NormalizedTitle": "graph engine",
"select": [
"OriginalTitle"
]
},
"author": {
"return": {
"type": "Author",
"Name": "bin shao"
}
}
}
#results
{
"Results": [
[
{
"CellID": 2160459668,
"OriginalTitle": "Trinity: a distributed graph engine on a memory cloud"
},
{
"CellID": 2093502026
}
],
[
{
"CellID": 2171539317,
"OriginalTitle": "A distributed graph engine for web scale RDF data"
},
{
"CellID": 2093502026
}
],
[
{
"CellID": 2411554868,
"OriginalTitle": "A distributed graph engine for web scale RDF data"
},
{
"CellID": 2093502026
}
],
[
{
"CellID": 73304046,
"OriginalTitle": "The Trinity graph engine"
},
{
"CellID": 2093502026
}
]
]
}
What is the correct path (or POST data) to query for the citations and co-citations of an article, and to paginate the results?
You will find AuthorIDs in the graph schema from Microsoft Academic Search.
Assuming you know the ID of the source paper (2118322263 in the following example), here is the POST part of the request:
{
"path": "/paper/CitationIDs/citation",
"paper": {
"type": "Paper",
"id": [ 2118322263 ],
"select": [
"OriginalTitle"
]
},
"citation": {
"return": {
"type": "Paper"
},
"select": [
"OriginalTitle"
]
}
}
This returns 634 results in one response, while a query to the paper itself shows a citation count of 732. I have no idea why there is a difference, nor how to do pagination.
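As a small sketch of how the request body above could be assembled and how the nested "Results" arrays could be read, here is a Python helper. The function names are hypothetical, the response shape is assumed to match the sample results in the question (each result is a list with one node per step of the path), and no network call is made.

```python
import json


def build_citation_query(paper_id):
    """Build the POST body shown above for one source paper ID."""
    return {
        "path": "/paper/CitationIDs/citation",
        "paper": {"type": "Paper", "id": [paper_id], "select": ["OriginalTitle"]},
        "citation": {"return": {"type": "Paper"}, "select": ["OriginalTitle"]},
    }


def last_node_titles(response):
    """Collect OriginalTitle from the last node of each result path, if present."""
    titles = []
    for path_nodes in response.get("Results", []):
        node = path_nodes[-1]
        if "OriginalTitle" in node:
            titles.append(node["OriginalTitle"])
    return titles


body = build_citation_query(2118322263)
print(json.dumps(body, indent=2))
```

Counting the entries in "Results" on the client side is one way to compare against the citation count reported for the paper itself.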
I ran into an issue when searching for several words together with a special character (the section sign "§").
Example: AB § 32.
I want the words "AB" and "32" and the symbol "§" all to be present in the documents found.
In some cases the document is found, in others it is not.
If my document contains the following text, the search finds it:
Lagrum: 32 § 1 mom. första stycket a) kommunalskattelagen (1928:370) AB
But if the document contains this text, the search doesn't find it:
Lagrum: 32 § 1 mom. första stycket AB
For the symbol "§" I use the UTF-8 encoding "\xc2\xa7".
The index uses the "lucene.swedish" analyzer.
"Content": [
{
"analyzer": "lucene.swedish",
"minGrams": 4,
"tokenization": "nGram",
"type": "autocomplete"
},
{
"analyzer": "lucene.swedish",
"type": "string"
}
]
The query looks like this:
{
"index": "test_index",
"compound": {
"filter": [
{
"text": {
"query": [
"111111111111"
],
"path": "ProductId"
}
},
],
"must": [
{
"autocomplete": {
"query": [
"AB"
],
"path": "Content"
}
},
{
"autocomplete": {
"query": [
"\xc2\xa7",
],
"path": "Content"
}
},
{
"autocomplete": {
"query": [
"32"
],
"path": "Content"
}
}
],
},
"count": {
"type": "lowerBound",
"threshold": 500
}
}
What is wrong with the search, and how can I get the correct result (both of the documents above returned)?
Focusing only on the content field, here is an index definition that should work for your requirements. The docs are here. Let me know if this works for you.
{
"mappings": {
"dynamic": false,
"fields": {
"content": [
{
"type": "autocomplete",
"tokenization": "nGram",
"minGrams": 4,
"maxGrams": 7,
"foldDiacritics": false,
"analyzer": "lucene.whitespace"
},
{
"analyzer": "lucene.swedish",
"type": "string"
}
]
}
}
}
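To illustrate what "minGrams": 4 and "maxGrams": 7 mean, here is a simplified Python model of character n-gram generation. This is only a sketch of the general technique, not Atlas Search's actual tokenizer, and the function name is hypothetical. In this simple model a string shorter than minGrams yields no grams at all; whether short query terms such as "AB" or "§" behave the same way in Atlas Search is an assumption to verify against the docs.

```python
def char_ngrams(text, min_grams=4, max_grams=7):
    """Return every substring of text whose length is between min_grams and max_grams."""
    grams = []
    for start in range(len(text)):
        for size in range(min_grams, max_grams + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


# Whitespace is kept in this model, so "§" only appears inside longer grams.
print(char_ngrams("32 § 1"))
# A two-character string is shorter than min_grams.
print(char_ngrams("AB"))
```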
This is my API response. I want to extract the value of Id based on displayNumber. The display number comes from a list of values in the Examples table or a CSV file.
{
"Acc": [
{
"Id": "2b765368696b3441673633325",
"code": "SGD",
"val": 406030.83,
"displayNumber": "8957",
"curval": 406030.83
},
{
"Id": "4e676269685a73787472355776764b50717a4",
"code": "GBP",
"val": 22.68,
"displayNumber": "1881",
"curval": 22.68
},
{
"Id": "526e666d65366e67626244626e6266467",
"code": "SGD",
"val": 38404.44,
"displayNumber": "1004",
"curval": 38404.44
}
],
"combinations": [
{
"displayNumber": "3444",
"Code": "SGD",
"Ids": [
{
"Id": "2b765368696b34416736333254462"
},
{
"Id": "4e676269685a7378747235577"
},
{
"Id": "526e666d65366e6762624d"
}
],
"destId": "3678434b643530456962435272d",
"curval": 3.85
},
{
"displayNumber": "8957",
"code": "SGD",
"Ids": [
{
"Id": "3678434b6435304569624357"
},
{
"Id": "4e676269685a73787472355776764b50717a4"
},
{
"Id": "526e666d65366e67626244626e62664679"
}
],
"destId": "2b765368696b344167363332544",
"curval": 406030.83
},
{
"displayNumber": "1881",
"code": "GBP",
"Ids": [
{
"Id": "3678434b643530456962435275"
},
{
"Id": "2b765368696b3441673"
},
{
"Id": "526e666d65366e67626244626e626"
}
],
"destId": "4e676269685a7378747d",
"curval": 22.68
}
]
}
Examples
|displayNumber|
|8957|
|3498|
|4943|
The expression below works if I hard-code the value:
* def tempid = response
* def fromAccount = get[0] tempid.Acc[?(@.displayNumber==8957)].Id
I'm not sure how to turn this comparison value (e.g. 1881) into a variable that can be read from Examples (Scenario Outline) or a CSV file. I went through the documentation, which recommends Karate filters or maps, but I couldn't work out how to implement that.
You almost got it :-). This is the way you want to solve this:
Scenario Outline: Testing SO question for Navneeth
* def tempid = response
* def fromAccount = get[0] tempid.Acc[?(@.displayNumber == <displayNumber>)]
* print fromAccount
Examples:
|displayNumber|
|8957|
|1881|
|3444|
You need to pass the placeholder from Examples in quotes, like this:
'<displayNumber>'
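For comparison, here is the same lookup as a plain Python sketch, outside Karate. The function name is hypothetical and the data is a trimmed copy of the "Acc" array from the question. Note that displayNumber is a string in the response, which is why the sketch converts the value before comparing.

```python
def find_account_id(response, display_number):
    """Return the Id of the first account whose displayNumber matches, else None."""
    wanted = str(display_number)  # displayNumber is a string in the response
    for account in response.get("Acc", []):
        if account.get("displayNumber") == wanted:
            return account["Id"]
    return None


response = {
    "Acc": [
        {"Id": "2b765368696b3441673633325", "code": "SGD", "displayNumber": "8957"},
        {"Id": "4e676269685a73787472355776764b50717a4", "code": "GBP", "displayNumber": "1881"},
        {"Id": "526e666d65366e67626244626e6266467", "code": "SGD", "displayNumber": "1004"},
    ]
}

# The values play the role of the Examples table rows.
for display_number in ["8957", "1881", "3444"]:
    print(display_number, find_account_id(response, display_number))
```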
I've successfully created my knowledge base using the API.
But I forgot to add some alternative questions and metadata for one of the pairs.
I noticed the PATCH method in the API for updating a knowledge base, so updating a KB is supported.
I've created a payload which looked like this:
{
"add": {
},
"delete": {
},
"update": {
"qnaList": [
{
"id": 1,
"answer": "Answer",
"source": "link_to_source",
"questions": [
"Question 1?",
"Question 2?"
],
"metadata": [
{
"name": "oldMetadata",
"value": "oldMetadata"
},
{
"name": "newlyAddedMetaData",
"value": "newlyAddedMetaData"
}
]
}]}
}
I get back the following response HTTP 202 Accepted:
{
"operationState": "NotStarted",
"createdTimestamp": "2018-05-21T07:46:52Z",
"lastActionTimestamp": "2018-05-21T07:46:52Z",
"userId": "user_uuid",
"operationId": "operation_uuid"
}
So it looks like it worked, but in reality the request doesn't have any effect.
When I check the operation details, I get the following:
{
"operationState": "Succeeded",
"createdTimestamp": "2018-05-21T07:46:52Z",
"lastActionTimestamp": "2018-05-21T07:46:54Z",
"resourceLocation": "/knowledgebases/kb_uuid",
"userId": "user_uuid",
"operationId": "operation_uuid"
}
What am I doing wrong? And how should I update my kb via API properly?
Please help
I had the same problem and discovered that the JSON must contain all the fields, even the ones you don't use.
In your case you need "name" and "urls" in the "update" section, and "delete" in the "update/qnaList/questions" section:
{
"add": {},
"delete": {},
"update": {
"name": "nameofKbBase", //this
"qnaList": [
{
"id": 2370,
"answer": "DemoAnswerEdit",
"source": "CustomSource",
"questions": {
"add": [
"DemoQuestionEdit"
],
"delete": [] //this
},
"metadata": { }
}
],
"urls": [] //this
}
}
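A minimal sketch of building that payload in Python, so that the otherwise-empty keys are never forgotten. The function name and parameters are hypothetical, the payload shape is taken from the JSON above, and the sketch only builds and prints the body without calling the API.

```python
import json


def build_update_payload(kb_name, qna_id, answer, source, add_questions):
    """Build an update body that always includes the empty-but-required keys."""
    return {
        "add": {},
        "delete": {},
        "update": {
            "name": kb_name,
            "qnaList": [
                {
                    "id": qna_id,
                    "answer": answer,
                    "source": source,
                    # "delete" is present even when nothing is deleted
                    "questions": {"add": list(add_questions), "delete": []},
                    "metadata": {},
                }
            ],
            "urls": [],
        },
    }


payload = build_update_payload(
    "nameofKbBase", 2370, "DemoAnswerEdit", "CustomSource", ["DemoQuestionEdit"]
)
print(json.dumps(payload, indent=2))
```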
I'm trying to query the output of a Natural Language Processing (NLP) call in Big Query (BQ) but I'm struggling to get the output in the right format for BQ.
I understand that BQ takes JSON files (newline delimited), but I'm not sure whether (a) the NLP output is newline-delimited JSON and (b) my schema is correct.
Here's the json output I'm working with:
{
"entities": [
{
"name": "Rowling",
"type": "PERSON",
"metadata": {
"wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
},
"salience": 0.65751493,
"mentions": [
{
"text": {
"content": " J.",
"beginOffset": -1
}
},
{
"text": {
"content": "K. Rowl",
"beginOffset": -1
}
}
]
},
{
"name": "LONDON",
"type": "LOCATION",
"metadata": {
"wikipedia_url": "http://en.wikipedia.org/wiki/London"
},
"salience": 0.14284456,
"mentions": [
{
"text": {
"content": "\ufeffLON",
"beginOffset": -1
}
}
]
},
{
"name": "Harry Potter",
"type": "WORK_OF_ART",
"metadata": {
"wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter"
},
"salience": 0.0726779,
"mentions": [
{
"text": {
"content": "th Harry Pot",
"beginOffset": -1
}
},
{
"text": {
"content": "‘Harry Pot",
"beginOffset": -1
}
}
]
},
{
"name": "Deathly Hallows",
"type": "WORK_OF_ART",
"metadata": {
"wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter_and_the_Deathly_Hallows"
},
"salience": 0.022565609,
"mentions": [
{
"text": {
"content": "he Deathly Hall",
"beginOffset": -1
}
}
]
}
],
"language": "en"
}
Is there a way to send the output directly to big query via the command line in Google Cloud shell?
Any information would be greatly appreciated!
Thanks
Glad you found my Harry Potter blog post! I'd recommend storing the NL API's JSON response as a string in BigQuery and then using a user-defined function to query it. You should be able to run the following (the table is publicly viewable) to get a count of how often each entity appears in the JSON you posted:
SELECT
COUNT(*) as entity_count, entity
FROM
JS(
(SELECT entities FROM [sara-bigquery:samples.hp_udf]),
entities,
"[{ name: 'entity', type: 'string'}]",
"function(row, emit) {
try {
x = JSON.parse(row.entities);
entities = x['entities'];
entities.forEach(function(data) {
emit({ entity: data.name });
});
} catch (e) {}
}"
)
GROUP BY entity
ORDER BY entity_count DESC
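The logic of the UDF above can be sketched locally in Python, which may help when checking the result on a small sample before running it in BigQuery. The function name is hypothetical, and each row is assumed to be one NL API response stored as a JSON string, as in the query.

```python
import json
from collections import Counter


def count_entities(rows):
    """Count entity names across JSON strings, skipping rows that fail to parse."""
    counts = Counter()
    for raw in rows:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # mirrors the empty catch block in the UDF
        for entity in parsed.get("entities", []):
            counts[entity["name"]] += 1
    return counts


rows = [
    '{"entities": [{"name": "Rowling"}, {"name": "LONDON"}], "language": "en"}',
    '{"entities": [{"name": "Rowling"}]}',
    "not json",
]
print(count_entities(rows).most_common())
```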
Regarding "send the output directly to big query via the command line in Google Cloud shell":
Look at this page and search for "bq load":
https://cloud.google.com/bigquery/bq-command-line-tool
It has some examples of JSON schemas.
See also: Schema to load json data to google big query
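On the newline-delimited point: the NL API response in the question is a single pretty-printed JSON object, so it would need to be rewritten as one JSON object per line before loading. Here is a minimal Python sketch that emits one line per entity. The function name and the flattened column choice (name, type, salience, language) are assumptions for illustration, not a required schema.

```python
import json


def to_ndjson(nl_response):
    """Flatten an NL API response into newline-delimited JSON, one entity per line."""
    lines = []
    for entity in nl_response.get("entities", []):
        row = {
            "name": entity["name"],
            "type": entity["type"],
            "salience": entity["salience"],
            "language": nl_response.get("language"),
        }
        lines.append(json.dumps(row))
    return "\n".join(lines)


nl_response = {
    "entities": [
        {"name": "Rowling", "type": "PERSON", "salience": 0.65751493},
        {"name": "LONDON", "type": "LOCATION", "salience": 0.14284456},
    ],
    "language": "en",
}
ndjson = to_ndjson(nl_response)
print(ndjson)
```

The resulting file should then be loadable with bq load using --source_format=NEWLINE_DELIMITED_JSON and a schema that matches the chosen columns.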
I can't find very much documentation on how to properly define the index function such that I can do a full text search on the information that I need.
I've used the Alchemy API to add "entities" json to my documents.
For instance, I have a document with the following:
"_id": "redacted",
"_rev": "redacted",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
I haven't been able to find any code snippets showing how to construct a search index function that looks at nested json.
My stab at indexing the whole object appears to be incorrect.
This is the full design document:
{
"_id": "_design/entities",
"_rev": "redacted",
"views": {},
"language": "javascript",
"indexes": {
"entities": {
"analyzer": "standard",
"index": "function (doc) {\n if (doc.entities.relevance > 0.5){\n index(\"default\", doc.entities.text, {\"store\":\"yes\"});\n }\n\n}"
}
}
}
And here is the search index function, formatted a little more clearly:
function (doc) {
if (doc.entities.relevance > 0.5){
index("default", doc.entities.text, {"store":"yes"});
}
}
Adding the for loop as suggested below makes a lot of sense.
However, I still am not able to return any results.
My query is
"https://user.cloudant.com/calbills/_design/entities/_search/entities?q=Governors"
Server response is:
{"total_rows":0,"bookmark":"g2o","rows":[]}
The "for..in" style loop doesn't seem to work.
However, I do get results using a more standard for loop:
function (doc) {
if(doc.entities){
var arrayLength = doc.entities.length;
for (var i = 0; i < arrayLength; i++) {
if (parseFloat(doc.entities[i].relevance) > 0.5)
index("default", doc.entities[i].text);
}
}
}
Cheers!
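You need to loop over the elements in the doc.entities array.
function (doc) {
  // for..in yields array indexes, so look up each element by its index
  for (var i in doc.entities) {
    var entity = doc.entities[i];
    if (parseFloat(entity.relevance) > 0.5) {
      index("default", entity.text, {"store":"yes"});
    }
  }
}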
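To check which texts an index function like this should emit for a given document, the filtering logic can be simulated in Python. This is only a local sketch with a hypothetical function name; it does not talk to Cloudant. Two details are worth noting: "relevance" is stored as a string in the sample document, so it is converted before comparing, and a Python for loop iterates over the elements themselves, whereas JavaScript's for..in iterates over keys (array indexes).

```python
def indexed_texts(doc, threshold=0.5):
    """Return the entity texts whose relevance exceeds the threshold."""
    texts = []
    for entity in doc.get("entities", []):
        # relevance is a string such as "0.797773" in the sample document
        if float(entity["relevance"]) > threshold:
            texts.append(entity["text"])
    return texts


doc = {
    "session": "20152016",
    "entities": [
        {"relevance": "0.797773", "count": "3", "type": "Organization",
         "text": "California Constitution"},
        {"relevance": "0.690092", "count": "1", "type": "Organization",
         "text": "Governors Highway Safety Association"},
    ],
}
print(indexed_texts(doc))
```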
This is what I tried:
function(doc){
if(doc.entities){
for( var p in doc.entities ){
if (doc.entities[p].relevance > 0.5)
{
index("entitiestext", doc.entities[p].text, {"store":"yes"});
}
}
}
}
Query string used: "q=entitiestext:California Constitution&include_docs=true"
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
},
"doc": {
"_id": "redacted",
"_rev": "4-7f6e6db246abcf2f884dc0b91451272a",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
}
}
]
}
Query string used: "q=entitiestext:California Constitution"
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
}
}
]
}
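One caveat about the query strings above: as far as I know, in Lucene query syntax only the first word of an unquoted multi-word value is tied to the named field, so quoting the phrase may be what you want. Here is a minimal Python sketch that builds a URL-encoded search URL with the phrase in quotes. The base URL is the one from the question, the function name is hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base URL copied from the question; account and database names are the asker's.
BASE_URL = "https://user.cloudant.com/calbills/_design/entities/_search/entities"


def build_search_url(field, phrase, include_docs=False):
    """Build a search URL whose q parameter is field:"phrase", URL-encoded."""
    params = {"q": '{}:"{}"'.format(field, phrase)}
    if include_docs:
        params["include_docs"] = "true"
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("entitiestext", "California Constitution", include_docs=True))
```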