Query Wikidata REST API with related identifier - wikidata-api

I am attempting to do alignments for a set of known VIAF IDs. I would like to query the Wikidata REST API with a VIAF ID (P214) and get back a set of one or more Wikidata entity IDs (QXXXXX) that correspond to that VIAF entity. I am unable to find any examples of this either in the Wikidata API documentation or otherwise online.
I've noodled around with various permutations of queries using action=wbsearchentities and action=query, all to no avail.
Could anyone kindly point me to a set of docs or example code that enumerates the correct query parameters for such a search?

Let's suppose you want to find the item whose VIAF ID is "113230702" (i.e. Douglas Adams).
Solution 1
Use action=query:
https://www.wikidata.org/w/api.php?action=query&format=json&list=search&srsearch=haswbstatement:P214=113230702
URL response:
{"batchcomplete":"","query":{"searchinfo":{"totalhits":1},"search":[{"ns":0,"title":"Q42","pageid":138,"size":319024,"wordcount":1204,"snippet":"Douglas Adams\nDouglas Adams\n\u0414\u0443\u0433\u043b\u0430\u0441 \u0410\u0434\u0430\u043c\u0441\nDouglas Adams\nDouglas Adams\nDouglas Adams\nDouglas Adams\nDouglas Adams\nDouglas Adams\nDouglas Adams\nDouglas Adams","timestamp":"2021-08-13T22:04:39Z"}]}}
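As a minimal sketch of Solution 1 in Python: the helper names `build_search_url` and `extract_entity_ids` are made up for illustration, and the sample response is a trimmed copy of the one shown above. The actual HTTP call is left out; any HTTP client can fetch the URL.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(property_id, value):
    """Build an action=query URL that searches by a haswbstatement filter."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": f"haswbstatement:{property_id}={value}",
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_entity_ids(response):
    """Return the entity IDs (the 'title' of each hit) from a search response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]


# Trimmed copy of the response shown above, used here instead of a live request.
sample_response = {
    "batchcomplete": "",
    "query": {
        "searchinfo": {"totalhits": 1},
        "search": [{"ns": 0, "title": "Q42", "pageid": 138}],
    },
}

url = build_search_url("P214", "113230702")
entity_ids = extract_entity_ids(sample_response)
print(url)
print(entity_ids)
```

If several items share the same VIAF ID, each one appears as its own entry in the `search` list, so the helper returns all of them.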
Solution 2
Use Wikidata Query Service:
https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT%20DISTINCT%20%3Fx%0AWHERE%20{%0A%20%20%3Fx%20wdt%3AP214%20%22113230702%22%0A}
This last URL comes from the following SPARQL query:
SELECT DISTINCT ?x
WHERE {
?x wdt:P214 "113230702"
}
URL response:
{
"head" : {
"vars" : [ "x" ]
},
"results" : {
"bindings" : [ {
"x" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q42"
}
} ]
}
}
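The same lookup through the Query Service can be sketched in Python as follows. The function names are hypothetical, the sample result is the response shown above, and the sketch assumes the VIAF ID contains only digits (it is pasted into the query string without escaping).

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


def build_sparql_url(viaf_id):
    """Build the GET URL for the SPARQL query shown above."""
    query = 'SELECT DISTINCT ?x WHERE { ?x wdt:P214 "%s" }' % viaf_id
    return SPARQL_ENDPOINT + "?" + urlencode({"format": "json", "query": query})


def extract_qids(result):
    """Strip the entity URI prefix from each binding of ?x, leaving the Q-ID."""
    qids = []
    for binding in result["results"]["bindings"]:
        uri = binding["x"]["value"]
        qids.append(uri.rsplit("/", 1)[-1])
    return qids


# The response shown above, used here instead of a live request.
sample_result = {
    "head": {"vars": ["x"]},
    "results": {
        "bindings": [
            {"x": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"}}
        ]
    },
}

sparql_url = build_sparql_url("113230702")
qids = extract_qids(sample_result)
print(sparql_url)
print(qids)
```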

Related

Check if an array contains a value Query in Cumulocity REST API

I have a managed object of type “ABC” with a fragment “A”, that has sub-structure as following:
{
"type": "ABC",
"A": {
"value": ["B", "C"]
}
}
How would one create a filter/query that would check if "A" fragment contains “C” in the "value" array?
That query fails:
{{url}}/inventory/managedObjects?query=$filter=(type+eq+'ABC'+and+A.value+has+‘C‘)
With
{
"error": "inventory/Invalid Data",
"message": "Find by filter query failed : Query '$filter=(type eq 'ABC' and A.value has ‘C‘)' could not be understood. Please try again.",
"info": "https://www.cumulocity.com/guides/reference-guide/#error_reporting"
}
Cumulocity doc about querying REST API.
Solution:
Use eq instead of has:
{{url}}/inventory/managedObjects?query=$filter=(type+eq+'ABC'+and+A.value+eq+'C')
I couldn't find a source but the following is working for me with the expected result:
{{url}}/inventory/managedObjects?query=$filter=(type+eq+'ABC'+and+A.value+eq+'C')
So basically you need to use the eq operator for your use case.
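For building such a request programmatically, here is a small Python sketch. The helper name `build_inventory_query` and the base URL are placeholders, not part of any Cumulocity client library; the filter expression itself is the one from the answer above.

```python
from urllib.parse import quote, unquote


def build_inventory_query(base_url, object_type, fragment_path, value):
    """Build an inventory query URL that uses 'eq' on an array-valued property."""
    filter_expr = (
        f"$filter=(type eq '{object_type}' and {fragment_path} eq '{value}')"
    )
    # Percent-encode the whole expression so spaces and quotes survive transport.
    return f"{base_url}/inventory/managedObjects?query={quote(filter_expr)}"


# "https://tenant.example.com" is a placeholder for {{url}}.
query_url = build_inventory_query("https://tenant.example.com", "ABC", "A.value", "C")
decoded_filter = unquote(query_url.split("query=", 1)[1])
print(query_url)
print(decoded_filter)
```

Note that the quotes around the value are plain ASCII apostrophes; typographic quotes copied from a word processor are a common reason for a filter being rejected.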

Cloudant search document by attributes of nested objects

My documents in cloudant have the following structure
{
"_id" : "1234",
"name" : "test",
"objects" : [
{
"type" : "TYPE1",
"time" : "1215"
},
{
"type" : "TYPE2",
"time" : "1115"
}
]
}
Now I need to query my documents by a list of types.
Examples
1) If I would query with TYPE1 then all the documents where there is an object with this type would return. (The example doc would return)
2) If I would query with TYPE1 and TYPE3 it would return all documents which contain either of them (The example doc would return)
3) If I would query with TYPE3, TYPE4 and TYPE5 it would return all documents which contain either of them (The example doc would not return)
What would the code in the _design document look like, and what would my API request look like?
One option is to use Cloudant Search.
Sample design document named types, which indexes each type property in your objects array:
{
"_id": "_design/types",
"views": {},
"language": "javascript",
"indexes": {
"one-of": {
"analyzer": "standard",
"index": "function (doc) {\n for(var i in doc.objects) {\n index(\"type\", doc.objects[i].type); \n }\n}"
}
}
}
Query examples:
Search for one key (type=val)
GET https://$HOST/$DATABASE/_design/$DDOC/_search/one-of?q=type%3ATYPE1
Search for multiple keys (type=val1 OR type=val2)
GET https://$HOST/$DATABASE/_design/$DDOC/_search/one-of?q=type%3ATYPE1%20OR%20type%3ATYPE2
Search for multiple keys (type=val1 AND type=val2)
GET https://$HOST/$DATABASE/_design/$DDOC/_search/one-of?q=type%3ATYPE1%20AND%20type%3ATYPE2
To include the documents in the response append &include_docs=true.
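To query by a list of types, as in the question, the OR form can be generated from the list. A rough Python sketch follows; the function name, host and database are placeholders, while the design document name `types` and index name `one-of` come from the sample above.

```python
from urllib.parse import quote, urlencode


def build_type_search_url(host, database, types, include_docs=False):
    """Build a search URL matching documents that contain any of the given types."""
    lucene_query = " OR ".join(f"type:{t}" for t in types)
    params = {"q": lucene_query}
    if include_docs:
        params["include_docs"] = "true"
    # quote_via=quote encodes spaces as %20, matching the examples above.
    query_string = urlencode(params, quote_via=quote)
    return f"https://{host}/{database}/_design/types/_search/one-of?{query_string}"


# Host and database names are placeholders.
search_url = build_type_search_url("account.cloudant.com", "mydb", ["TYPE1", "TYPE3"])
print(search_url)
```

With the example document from the question, a search for TYPE1 and TYPE3 would match (because of TYPE1), while TYPE3, TYPE4 and TYPE5 would not.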

Elastic Search: Ordering based on custom logic

I am passing the pattern "Master Servant" to the Elasticsearch search API.
It returns all the documents that contain at least one of them (Master OR Servant).
It shows the results in descending order of score.
However, I want to change that ordering to my own logic, i.e. if a document contains both words (Master AND Servant), show that document first.
Can this be achieved?
Use the bool query.
From the 'ES definitive Guide'
The bool query takes a more-matches-is-better approach, so the score from each match clause will be added together to provide the final _score for each document. Documents that match both clauses will score higher than documents that match just one clause.
EDIT based on comment:
To clarify, I believe you want something like this:
{
"query": {
"bool": {
"should": [
{ "match": { "field": "Master" }},
{ "match": { "field": "Servant" }}
]
}
}
}
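If the search phrase arrives as a single string, the should clauses can be generated one per word. A minimal Python sketch, where `build_should_query` is a made-up helper and "field" stands in for the real field name:

```python
import json


def build_should_query(field, phrase):
    """Build a bool query with one 'should' match clause per word in the phrase."""
    clauses = [{"match": {field: term}} for term in phrase.split()]
    return {"query": {"bool": {"should": clauses}}}


body = build_should_query("field", "Master Servant")
print(json.dumps(body, indent=2))
```

The resulting body is the same structure as the query above, ready to be sent as the JSON payload of a search request.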

Rally Lookback: help fetching all history based on future state

Probably a lookback newbie question, but how do I return all of the history for stories based on an attribute that gets set later in their history?
Specifically, I want to load all of the history for all stories/defects in my project that have an accepted date in the last two weeks.
The following query (below) doesn't work because it (of course) only returns those history records where accepted date matches the query. What I actually want is all of the history records for any defect/story that is eventually accepted after that date...
filters :
[
{
property: "_TypeHierarchy",
value: { $nin: [ -51009, -51012, -51031, -51078 ] }
},
{
property: "_ProjectHierarchy",
value: this.getContext().getProject().ObjectID
},
{
property: "AcceptedDate",
value: { $gt: Ext.Date.format(twoWeeksBack, 'Y-m-d') }
}
]
Thanks to Nick's help, I divided this into two queries. The first grabs the final history record for stories/defects with an accepted date. I accumulate the object ids from that list, then kick off the second query, which finds the entire history for each object returned from the first query.
Note that I'm caching some variables in the "window" scope - that's my lame workaround to the fact that I can't ever quite figure out the context of "this" when I need it...
window.projectId = this.getContext().getProject().ObjectID;
I also end up flushing window.objectIds (where I store the results from the first query) when I execute the query, so I don't accumulate results across reloads. I'm sure there's a better way to do this, but I struggle with scope in JavaScript.
Filter for first query:
filters : [ {
property : "_TypeHierarchy",
value : {
$nin : [ -51009, -51012, -51031, -51078 ]
}
}, {
property : "_ProjectHierarchy",
value : window.projectId
}, {
property : "AcceptedDate",
value : {
$gt : Ext.Date.format(monthBack, 'Y-m-d')
}
}, {
property : "_ValidTo",
value : {
$gt : '3000-01-01'
}
} ]
Filter for second query:
filters : [ {
property : "_TypeHierarchy",
value : {
$nin : [ -51009, -51012, -51031, -51078 ]
}
}, {
property : "_ProjectHierarchy",
value : window.projectId
}, {
property : "ObjectID",
value : {
$in : window.objectIds
}
}, {
property : "c_Kanban",
value : {
$exists : true
}
} ]
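The hand-off between the two queries can be sketched independently of the Rally SDK. The Python below is only an illustration with made-up snapshot data and hypothetical helper names: it collects the unique ObjectIDs returned by the first query and builds the `$in` clause used by the second.

```python
def collect_object_ids(snapshots):
    """Return unique ObjectIDs from first-query snapshots, preserving order."""
    seen = set()
    ordered = []
    for snap in snapshots:
        oid = snap["ObjectID"]
        if oid not in seen:
            seen.add(oid)
            ordered.append(oid)
    return ordered


def build_history_clause(object_ids):
    """Build the ObjectID clause for the second (full-history) query."""
    return {"ObjectID": {"$in": object_ids}}


# Made-up result of the first query: current snapshots of accepted items.
first_query_snapshots = [
    {"ObjectID": 101, "AcceptedDate": "2014-02-03"},
    {"ObjectID": 102, "AcceptedDate": "2014-02-05"},
    {"ObjectID": 101, "AcceptedDate": "2014-02-03"},
]

object_ids = collect_object_ids(first_query_snapshots)
history_clause = build_history_clause(object_ids)
print(history_clause)
```

Building a fresh list on every run also avoids accumulating IDs across reloads, which is what the manual flushing of window.objectIds is working around.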
Here's an alternative query that will return only the snapshots that represent transition into the Accepted state.
find:{
_TypeHierarchy: { $in : [ -51038, -51006 ] },
_ProjectHierarchy: 999999,
ScheduleState: { $gte: "Accepted" },
"_PreviousValues.ScheduleState": {$lt: "Accepted", $exists: true},
AcceptedDate: { $gte: "2014-02-01TZ" }
}
A second query is still required if you need the full history of the stories/defects. This should at least give you a cleaner initial list. Also note that Project: 999999 limits to the given project, while _ProjectHierarchy finds stories/defects in the child projects, as well.
In case you are interested, the query is similar to scenario #5 in the Lookback API documentation at https://rally1.rallydev.com/analytics/doc/.
If I understand the question, you want to get stories that are currently accepted, but you want the returned results to include snapshots from the time when they were not yet accepted. Before you write code, you can test an equivalent query in the browser and see if the results look as expected.
Here is an example - you will have to change OIDs.
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"_ProjectHierarchy":12352608219,"_TypeHierarchy":"HierarchicalRequirement","ScheduleState":"Accepted",_ValidFrom:{$gte: "2013-11-01",$lt: "2014-01-01"}}&sort=[{"ObjectID": 1},{_ValidFrom: 1}]&fields=["Name","ScheduleState","PlanEstimate"]&hydrate=["ScheduleState"]
You are correct that a query like this: find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}}
will return one snapshot per story that satisfies it.
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}}&fields=true&start=0&pagesize=1000
but a query like this: find={"ObjectID":{$in:[16483705391,16437964257,14943067452]}}
will return the whole history of the 3 artifacts:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12352608129/artifact/snapshot/query.js?find={"ObjectID":{$in:[16483705391,16437964257,14943067452]}}&fields=true&start=0&pagesize=1000
To illustrate, here are some numbers: the last query returns 17 results for me. I check each story's revision history, and the number of revisions per story are 5, 5, 7 respectively, sum of which is equal to the total result count returned by the query.
On the other hand the number of stories that meet find={"AcceptedDate":{$gt:"2014-01-01T00:00:00.000Z"}} is 13. And the query based on the accepted date returns 13 results, one snapshot per story.
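The arithmetic above can be reproduced on synthetic data. In this Python sketch the snapshots are fabricated purely for illustration (only the ObjectIDs and the per-story revision counts 5, 5 and 7 come from the text; the `seq` field is hypothetical), and grouping by ObjectID shows how the full-history result breaks down per story.

```python
from collections import Counter

# Revision counts per story as reported above; the snapshots are synthetic.
revision_counts = {16483705391: 5, 16437964257: 5, 14943067452: 7}
snapshots = [
    {"ObjectID": oid, "seq": n}  # "seq" is a hypothetical stand-in field
    for oid, count in revision_counts.items()
    for n in range(count)
]

# Group the full-history result by story.
per_story = Counter(snap["ObjectID"] for snap in snapshots)
total_snapshots = sum(per_story.values())

print(dict(per_story))
print(total_snapshots)  # 5 + 5 + 7 = 17
```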

How to prevent Facet Terms from tokenizing

I am using Facet Terms to get all the unique values and their count for a field. And I am getting wrong results.
term: web
Count: 1191979
term: misc
Count: 1191979
term: passwd
Count: 1191979
term: etc
Count: 1191979
While the actual result should be:
term: WEB-MISC /etc/passwd
Count: 1191979
Here is my sample query:
{
"facets": {
"terms1": {
"terms": {
"field": "message"
}
}
}
}
If reindexing is an option, the best approach is to change the mapping and mark this field as not_analyzed:
"your_field" : { "type": "string", "index" : "not_analyzed" }
You can use multi field type if keeping an analyzed version of the field is desired:
"your_field" : {
"type" : "multi_field",
"fields" : {
"your_field" : {"type" : "string", "index" : "analyzed"},
"untouched" : {"type" : "string", "index" : "not_analyzed"}
}
}
This way, you can continue using your_field in the queries, while running facet searches using your_field.untouched.
Alternatively, if this field is stored, you can use a script field facet instead:
"facets" : {
"term" : {
"terms" : {
"script_field" : "_fields.your_field.value"
}
}
}
As the last resort, if this field is not stored, but record source is stored in the index, you can try this:
"facets" : {
"term" : {
"terms" : {
"script_field" : "_source.your_field"
}
}
}
The first solution is the most efficient. The last solution is the least efficient and may take a lot of time on a large index.
I ran into the same issue today while running a terms aggregation on a recent Elasticsearch version. After some searching, I worked out how the indexing works (it is quite simple).
Queries can find only terms that actually exist in the inverted index
When you index the following string
"WEB-MISC /etc/passwd"
it will be passed to an analyzer. The analyzer might tokenize it into
"WEB", "MISC", "etc" and "passwd"
along with position details. These tokens might then be filtered to lowercase, giving
"web", "misc", "etc" and "passwd"
So, after indexing, a search query can see only the four tokens above, not the complete string "WEB-MISC /etc/passwd". For your requirement, these are the options:
1. Change the default analyzer used by Elasticsearch.
2. If analysis is not needed, turn it off by setting 'not_analyzed' on the fields you need.
3. To make already-indexed data searchable this way, re-indexing is the only option.
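The tokenization described above can be imitated in a few lines of Python. This is a rough approximation, not the actual Elasticsearch analyzer: `rough_analyze` is a made-up function that splits on non-alphanumeric characters and lowercases, and the three documents are invented. It shows why a facet over an analyzed field counts each token separately, while an unanalyzed field counts the whole string.

```python
import re
from collections import Counter


def rough_analyze(text):
    """Approximate analysis: split on non-alphanumerics, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


# Invented documents that all carry the same message.
messages = ["WEB-MISC /etc/passwd"] * 3

tokens = rough_analyze(messages[0])

# Facet over an analyzed field: every token is counted on its own.
analyzed_counts = Counter(tok for msg in messages for tok in rough_analyze(msg))

# Facet over a not_analyzed field: the whole string is a single term.
raw_counts = Counter(messages)

print(tokens)
print(dict(analyzed_counts))
print(dict(raw_counts))
```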
I have briefly explained this problem and proposed two solutions here.
I have talked about multiple approaches here.
One is to use not_analyzed to preserve the string as it is. But since that has the drawback of being case sensitive, a better approach would be to use the keyword tokenizer with a lowercase filter.
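A sketch of what the keyword tokenizer plus lowercase filter combination could look like, written as a Python dict that would be serialized to JSON for the index settings. The analyzer name `lowercase_keyword` is arbitrary, and `simulate_lowercase_keyword` only imitates the effect locally; neither comes from the original answer.

```python
import json

# Index settings defining a custom analyzer; the name "lowercase_keyword"
# is an arbitrary choice for this sketch.
analysis_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


def simulate_lowercase_keyword(text):
    """Imitate the analyzer: keep the input as one token, lowercased."""
    return [text.lower()]


single_token = simulate_lowercase_keyword("WEB-MISC /etc/passwd")
print(json.dumps(analysis_settings, indent=2))
print(single_token)
```

The field would then be mapped to use this analyzer, so the whole message stays one term while matching no longer depends on case.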