Indexing policy is not being created along with collection in Cosmos DB - indexing

I'm writing a database creation script using Java SDK and the indexing policy is not being created as expected (and documented).
JAVA SDK used: com.microsoft.azure:azure-documentdb:2.4.0
Azure Cosmos DB emulator 2.2.2 for windows
Current Cosmos DB installation in Azure Portal with SQL account
I construct the collection creation request with a JAVA library and the result (before the actual request) looks like this (DocumentCollection::toJson()):
{
"uniqueKeyPolicy": {},
"partitionKey":
{
"kind": "Hash",
"paths": ["/playerId"]
},
"indexingPolicy":
{
"indexingMode": "Consistent",
"automatic": true,
"includedPaths": [
{
"path": "/gameId/?",
"indexes": [
{
"kind": "Range",
"dataType": "String"
}
]
},
{
"path": "/playerId/?",
"indexes": [
{
"kind": "Range",
"dataType": "String"
}
]
},
{
"path": "/date/*",
"indexes": [
{
"kind": "Range",
"dataType": "String"
}
]
}
],
"excludedPaths": [
{
"path": "/*"
}
]
},
"id": "Games"
}
The request completes successful but if I check the actual indexing policy with data explorer or DocumentClient.readCollection it looks like this:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/gameId/?",
"indexes": []
},
{
"path": "/playerId/?",
"indexes": []
},
{
"path": "/date/*",
"indexes": []
}
],
"excludedPaths": [
{
"path": "/*"
},
{
"path": "/\"_etag\"/?"
}
]
}
As you can see the arrays for index definitions are empty.
Then if I copy the indexing policy from SDK-generated output and manually paste it in Emulator or Portal 'Scale and settings' window for created collection the update outcome becomes the following:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/gameId/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Range",
"dataType": "Number",
"precision": -1
}
]
},
{
"path": "/playerId/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Range",
"dataType": "Number",
"precision": -1
}
]
},
{
"path": "/date/*",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
},
{
"kind": "Range",
"dataType": "Number",
"precision": -1
}
]
}
],
"excludedPaths": [
{
"path": "/*"
},
{
"path": "/\"_etag\"/?"
}
]
}
So the indexes are being created (although with extra Number entry as mentioned here).
Am I doing something wrong with a creation script?

Please follow the github source code and refer to below code which works for me.
import com.microsoft.azure.documentdb.*;
import java.util.Collection;
import java.util.List;
public class CreateCollectionTest {
//
static private String YOUR_COSMOS_DB_ENDPOINT = "https://***.documents.azure.com:443/";
static private String YOUR_COSMOS_DB_MASTER_KEY = "***";
public static void main(String[] args) throws DocumentClientException {
DocumentClient client = new DocumentClient(
YOUR_COSMOS_DB_ENDPOINT,
YOUR_COSMOS_DB_MASTER_KEY,
new ConnectionPolicy(),
ConsistencyLevel.Session);
DocumentCollection collection = new DocumentCollection();
collection.set("id", "game");
IndexingPolicy indexingPolicy = new IndexingPolicy();
Collection<IncludedPath> includedPaths = new ArrayList<IncludedPath>();
IncludedPath includedPath = new IncludedPath();
includedPath.setPath("/gameId/?");
Collection<Index> indexes = new ArrayList<Index>();
Index stringIndex = Index.Range(DataType.String);
stringIndex.set("precision", -1);
indexes.add(stringIndex);
Index numberIndex = Index.Range(DataType.Number);
numberIndex.set("precision", -1);
indexes.add(numberIndex);
includedPath.setIndexes(indexes);
includedPaths.add(includedPath);
indexingPolicy.setIncludedPaths(includedPaths);
collection.setIndexingPolicy(indexingPolicy);
ResourceResponse<DocumentCollection> createColl = client.createCollection("dbs/db", collection, null);
}
}

Related

REST dataset for Copy Activity Source give me error Invalid PaginationRule

My Copy Activity is setup to use a REST Get API call as my source. I keep getting Error Code 2200 Invalid PaginationRule RuleKey=supportRFC5988.
I can call the GET Rest URL using the Web Activity, but this isn't optimal as I then have to pass the output to a stored procedure to load the data to the table. I would much rather use the Copy Activity.
Any ideas why I would get an Invalid PaginationRule error on a call?
I'm using a REST Linked Service with the following properties:
Name: Workday
Connect via integration runtime: link-unknown-self-hosted-ir
Base URL: https://wd2-impl-services1.workday.com/ccx/service
Authentication type: Basic
User name: Not telling
Azure Key Vault for password
Server Certificate Validation is enabled
Parameters: Name:format Type:String Default value:json
Datasource:
"name": "Workday_Test_REST_Report",
"properties": {
"linkedServiceName": {
"referenceName": "Workday",
"type": "LinkedServiceReference",
"parameters": {
"format": "json"
}
},
"folder": {
"name": "Workday"
},
"annotations": [],
"type": "RestResource",
"typeProperties": {
"relativeUrl": "/customreport2/company1/person%40company.com/HIDDEN_BI_RaaS_Test_Outbound"
},
"schema": []
}
}
Copy Activity
{
"name": "Copy Test Workday REST API output to a table",
"properties": {
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "RestSource",
"httpRequestTimeout": "00:01:40",
"requestInterval": "00.00:00:00.010",
"requestMethod": "GET",
"paginationRules": {
"supportRFC5988": "true"
}
},
"sink": {
"type": "SqlMISink",
"tableOption": "autoCreate"
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "Workday_Test_REST_Report",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "Destination_db",
"type": "DatasetReference",
"parameters": {
"schema": "ELT",
"tableName": "WorkdayTestReportData"
}
}
]
}
],
"folder": {
"name": "Workday"
},
"annotations": []
}
}
Well after posting this, I noticed that in the copy activity code there is a nugget about "supportRFC5988": "true" I switched the true to false, and everything just worked for me. I don't see a way to change this in the Copy Activity GUI
Editing source code and setting this option to false helped!

How to avoid the duplicated data entry after parsing json in kusto?

I have following sample json data.
{
"data": {
"type": "ABC",
"id": "17495500314",
"attributes": {
[!["event": "update",
"gps_vali][1]][1]d": true,
"gps": {
"distance_diff": 6.48,
"total_distance": 848.6
},
"hdop": 79,
"fuel_level": 46.8,
"total_fuel_used": 60443.9,
"location": {
"latitude": 411.372618,
"longitude": -1.254931,
"relative_position": {
"distance": "37",
}
},
"idle_periods": []
},
"relationships": {
"assets": {
"data": [
{
"type": "ABCDFTTG",
"id": "1589799143500003",
"attributes": {
"external_id": "ABCDFTTG",
"hardware_id": "ABCDFTTG"
}
}
]
},
"devices": {
"data": [
{
"type": "ABCDFTTG",
"id": "1585231172900341",
"attributes": {
"serial": "5572016191"
}
},
{
"type": "tablet",
"id": "1587893062600175",
"attributes": {
"serial": "ABCDFTTG"
}
}
]
},
"users": {
"data": [
{
"type": "user",
"id": "ABCDFTTG",
"attributes": {
"external_id": "ABCDFTTG"
}
}
]
}
}
},
"meta": {
"message_id": "11eb-8c75-0b3f87aedbb5",
"consumer_version": "1.2.0",
"origin_version": null,
"timestamp": "2021-06-14T17:42:29Z"
}
}
I want only one row instead of this two. Here is my kusto query which is used for parsing json data into table columns.
Test
|where messageId =="123"
//|mv-expand message=message.data.attributes
|mv-expand message
|mv-expand Value=message.data.relationships.assets.['data']
|mv-expand value_devices=message.data.relationships.devices.['data']
|mv-expand value_user=message.data.relationships.users.['data']
| project type=message.data.type,id=message.data.id,
event=tostring(message.data.attributes.event),
logged_at=tostring(message.data.attributes.logged_at),
distance=toint(message.data.attributes.location.relative_position.distance),
// Value=message.data.relationships.assets.['data'],//.['data']
type_asset=Value.type,asset_id=Value.id,
device_type=value_devices.type,device_id=value_devices.id,
device_attr_serial=value_devices.attributes.serial,
user_type=value_user.type,user_id=value_user.id,
user_external_id=value_user.attributes.external_id
This duplicate row appeared after adding user tag this tag is array so how to handle this array with single id.
I have parse my json data any got the following output.
Expected output should be like
check device_type and device_id columns

How to update existing Knowledgebase using QnA Maker API v4.0?

I've successfully created my Knowledgebase using API.
But I forgot to add some alternative questions and metadata for one of the pairs.
I've noticed PATH method in the API to update the Knowledebase, so updating kb is supported.
I've created a payload which looked like this:
{
"add": {
},
"delete": {
},
"update": {
"qnaList": [
{
"id": 1,
"answer": "Answer",
"source": "link_to_source",
"questions": [
"Question 1?",
"Question 2?"
],
"metadata": [
{
"name": "oldMetadata",
"value": "oldMetadata"
},
{
"name": "newlyAddedMetaData",
"value": "newlyAddedMetaData"
}
]
}]}
}
I get back the following response HTTP 202 Accepted:
{
"operationState": "NotStarted",
"createdTimestamp": "2018-05-21T07:46:52Z",
"lastActionTimestamp": "2018-05-21T07:46:52Z",
"userId": "user_uuid",
"operationId": "operation_uuid"
}
So, looks like it worked. But in reality, this request doesn't take any affect.
When I check operation details, it returns me the following:
{
"operationState": "Succeeded",
"createdTimestamp": "2018-05-21T07:46:52Z",
"lastActionTimestamp": "2018-05-21T07:46:54Z",
"resourceLocation": "/knowledgebases/kb_uuid",
"userId": "user_uuid",
"operationId": "operation_uuid"
}
What am I doing wrong? And how should I update my kb via API properly?
Please help
I had the same problem, I discovered that it was necessary to have all the data of the json even if they were not used.
In your case you need "name" and "urls" in the "update" section and "Delete" in "update/qnaList/questions" section:
{
"add": {},
"delete": {},
"update": {
"name": "nameofKbBase", //this
"qnaList": [
{
"id": 2370,
"answer": "DemoAnswerEdit",
"source": "CustomSource",
"questions": {
"add": [
"DemoQuestionEdit"
],
"delete": [] //this
},
"metadata": { }
}
],
"urls": [] //this
}
}

elasticsearch Not_analyzed and analyzed

Hello For certain requirement i have made all the index not_analyzed
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"my_template": {
"match_mapping_type": "string",
"mapping": {
"index": "not_analyzed"
}
}
}
]
}
}
}
But now as per our requirement i have to make certain field as analyzed . and keep rest of the field as not analyzed
My Data is of type :
{ "field1":"Value1",
"field2":"Value2",
"field3":"Value3",
"field4":"Value3",
"field5":"Value4",
"field6":"Value5",
"field7":"Value6",
"field8":"",
"field9":"ce-3sdfa773-7sdaf2-989e-5dasdsdf",
"field10":"12345678",
"field11":"ertyu12345ffd",
"field12":"A",
"field13":"Value7",
"field14":"Value8",
"field15":"Value9",
"field16":"Value10",
"field17":"Value11",
"field18":"Value12",
"field19":{
"field20":"Value13",
"field21":"Value14"
},
"field22":"Value15",
"field23":"ipaddr",
"field24":"datwithtime",
"field25":"Value6",
"field26":"0",
"field20":"0",
"field28":"0"
}
If i change my template as per recommendation to something like this
{
"template": "*",
"mappings": {
"_default_": {
"properties": {
"filed6": {
"type": "string",
"analyzer": "keyword",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}}},
"dynamic_templates": [
{
"my_template": {
"match_mapping_type": "*",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Then i get error while i insert data stating
{"error":"MapperParsingException[failed to parse [field19]]; nested: ElasticsearchIllegalArgumentException[unknown property [field20 ]]; ","status":400}
In short you want to change the mapping of your index.
If your index does not contain any data(which I suppose, is not the
case), then you can simply delete the index and create it again with
new mapping.
If your index contains data, you will have to reindex it.
Steps for reindexing:
Put all data from existing index to dummy index.
Delete existing index. Generate new mapping.
Transfer data from dummy index to newly created index.
You can also give a look to elastic search alias here
This link might also be useful.
If you want to use the same field as analysed and not analysed at the same time you have to use multifield using
"title": {
"type": "multi_field",
"fields": {
"title": { "type": "string" },
"raw": { "type": "string", "index": "not_analyzed" }
}
}
This is for your reference.
For defining multifield in dynamic_templates use:
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"my_template": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
]
}
}
}
Refer this for more info on this.
You can either write multiple templates or have separate properties depending on your requirements. Both will work smoothly.
1) Multiple Templates
{
"mappings": {
"your_doctype": {
"dynamic_templates": [
{
"analyzed_values": {
"match_mapping_type": "*",
"match_pattern": "regex",
"match": "title|summary",
"mapping": {
"type": "string",
"analyzer": "keyword"
}
}
},
{
"date_values": {
"match_mapping_type": "date",
"match": "*_date",
"mapping": {
"type": "date"
}
}
},
{
"exact_values": {
"match_mapping_type": "*",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Here title and summary are analyzed by keyword analyzer. I have also added date field which is optional, it will map create_date etc as date. Last one will match anything and it will not_analyzed which will fulfill your requirements.
2) Add analyzed field as properties.
{
"mappings": {
"your_doctype": {
"properties": {
"title": {
"type": "string",
"analyzer": "keyword"
},
"summary": {
"type": "string",
"analyzer": "keyword"
}
},
"dynamic_templates": [
{
"any_values": {
"match_mapping_type": "*",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Here title and summary fields are analyzed while rest will be not_analyzed
You would have to reindex the data no matter which solution you take.
EDIT 1 : After looking at your data and mapping, there is one slight problem in it. Your data contains object structure and you are mapping everything apart from filed6 as string and filed19 is an Object and not string and hence ES is throwing the error. The solution is to let ES decide which datatype the field is with dynamic_type. Change your mapping to this
{
"template": "*",
"mappings": {
"_default_": {
"properties": {
"filed6": {
"type": "string",
"analyzer": "keyword",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
},
"dynamic_templates": [
{
"my_template": {
"match_mapping_type": "*",
"mapping": {
"type": "{dynamic_type}", <--- this will decide if field is either object or string.
"index": "not_analyzed"
}
}
}
]
}
}
}
Hope this helps!!

Cloudant Search Queries Index Function

I can't find very much documentation on how to properly define the index function such that I can do a full text search on the information that I need.
I've used the Alchemy API to add "entities" json to my documents.
For instance, I have a document with the following:
"_id": "redacted",
"_rev": "redacted",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
I haven't been able to find any code snippets showing how to construct a search index function that looks at nested json.
My stab at indexing the whole object appears to be incorrect.
This is the full design document:
{
"_id": "_design/entities",
"_rev": "redacted",
"views": {},
"language": "javascript",
"indexes": {
"entities": {
"analyzer": "standard",
"index": "function (doc) {\n if (doc.entities.relevance > 0.5){\n index(\"default\", doc.entities.text, {\"store\":\"yes\"});\n }\n\n}"
}
}
}
And the search index formatted a little bit more clearly is
function (doc) {
if (doc.entities.relevance > 0.5){
index("default", doc.entities.text, {"store":"yes"});
}
}
Adding the for loop as suggested below makes a lot of sense.
However, I still am not able to return any results.
My query is
"https://user.cloudant.com/calbills/_design/entities/_search/entities?q=Governors"
Server response is:
{"total_rows":0,"bookmark":"g2o","rows":[]}
The "for..in" style loop doesn't seem to work.
However, I do get results using the more standard for loop loops.
function (doc) {
if(doc.entities){
var arrayLength = doc.entities.length;
for (var i = 0; i < arrayLength; i++) {
if (parseFloat(doc.entities[i].relevance) > 0.5)
index("default", doc.entities[i].text);
}
}
}
Cheers!
Your need to loop on the elements in the doc.entities array.
function (doc) {
for(entity in doc.entities){
if (parseFloat(entity.relevance) > 0.5){
index("default", entity.text, {"store":"yes"});
}
}
}
This is what I tried :
function(doc){
if(doc.entities){
for( var p in doc.entities ){
if (doc.entities[p].relevance > 0.5)
{
index("entitiestext", doc.entities[p].text, {"store":"yes"});
}
}
}
}
Query String used :"q=entitiestext:California Constitution&include_docs=true"
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
},
"doc": {
"_id": "redacted",
"_rev": "4-7f6e6db246abcf2f884dc0b91451272a",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
}
}
]
}
Query String used: q=entitiestext:California Constitution
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
}
}
]
}