Creating a Collection in Azure Search Service using an Indexer - sql

I am using an indexer to sync data from my SQL database to Azure Search Service. I have a field in my SQL view which contains XML data; the column contains a list of strings. The corresponding field in my Azure Search index is a Collection(Edm.String).
On checking the documentation, I found that the indexer does not convert XML (SQL) into a Collection (Azure Search).
Is there any workaround to create the Collection from the XML data?
P.S. I am extracting the data from a view, so I can change the XML to JSON if needed.

UPDATE on October 17, 2016: Azure Search now automatically converts a string coming from a database to a Collection(Edm.String) field if the data represents a JSON string array, for example: ["blue", "white", "red"]
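For reference, the target field in the index is simply declared as a Collection(Edm.String); a minimal sketch (the field name and attributes here are only an example) would be:
{ "name" : "YOUR_TARGET_FIELD", "type" : "Collection(Edm.String)", "searchable" : true, "filterable" : true, "facetable" : true }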
Old response: Great timing, we just added a new "field mappings" feature that allows you to do this. This feature will be deployed sometime early next week. I will post a comment on this thread when it is rolled out in all datacenters.
To use it, you indeed need to use JSON. Make sure your source column contains a JSON array, for example ["hello", "world"]. Then, update your indexer definition to contain the new fieldMappings property:
"fieldMappings" : [ { "sourceFieldName" : "YOUR_SOURCE_FIELD", "targetFieldName" : "YOUR_TARGET_FIELD", "mappingFunction" : { "name" : "jsonArrayToStringCollection" } } ]
NOTE: You'll need to use API version 2015-02-28-Preview to add fieldMappings.
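For context, a complete indexer definition with this mapping would look roughly like the sketch below (the indexer, data source, and index names are placeholders):
{
    "name" : "my-sql-indexer",
    "dataSourceName" : "my-sql-datasource",
    "targetIndexName" : "my-index",
    "fieldMappings" : [
        {
            "sourceFieldName" : "YOUR_SOURCE_FIELD",
            "targetFieldName" : "YOUR_TARGET_FIELD",
            "mappingFunction" : { "name" : "jsonArrayToStringCollection" }
        }
    ]
}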
HTH,
Eugene

Related

How to achieve generic Audit.NET json data processing?

I am using the Audit.NET library to log Entity Framework actions into a database (currently everything goes into one AuditEventLogs table), where the JsonData column stores the data in the following JSON format:
{
    "EventType":"MyDbContext:test_database",
    "StartDate":"2021-06-24T12:11:59.4578873Z",
    "EndDate":"2021-06-24T12:11:59.4862278Z",
    "Duration":28,
    "EntityFrameworkEvent":{
        "Database":"test_database",
        "Entries":[
            {
                "Table":"Offices",
                "Name":"Office",
                "Action":"Update",
                "PrimaryKey":{
                    "Id":"40b5egc7-46ca-429b-86cb-3b0781d360c8"
                },
                "Changes":[
                    {
                        "ColumnName":"Address",
                        "OriginalValue":"test_address",
                        "NewValue":"test_address"
                    },
                    {
                        "ColumnName":"Contact",
                        "OriginalValue":"test_contact",
                        "NewValue":"test_contact"
                    },
                    {
                        "ColumnName":"Email",
                        "OriginalValue":"test_email",
                        "NewValue":"test_email2"
                    },
                    {
                        "ColumnName":"Name",
                        "OriginalValue":"test_name",
                        "NewValue":"test_name"
                    },
                    {
                        "ColumnName":"OfficeSector",
                        "OriginalValue":1,
                        "NewValue":1
                    },
                    {
                        "ColumnName":"PhoneNumber",
                        "OriginalValue":"test_phoneNumber",
                        "NewValue":"test_phoneNumber"
                    }
                ],
                "ColumnValues":{
                    "Id":"40b5egc7-46ca-429b-86cb-3b0781d360c8",
                    "Address":"test_address",
                    "Contact":"test_contact",
                    "Email":"test_email2",
                    "Name":"test_name",
                    "OfficeSector":1,
                    "PhoneNumber":"test_phoneNumber"
                },
                "Valid":true
            }
        ],
        "Result":1,
        "Success":true
    }
}
My team and I have one main goal to achieve:
Being able to create a search page where administrators are able to tell:
who made a change
what they changed
when the change happened
They can give a time period to reduce the number of audit records, and here comes the interesting part:
There should be a text input field that lets them search within the values of the "ColumnValues" section.
The problems I encountered:
Even if I map the JSON structure into relational rows, I am unable to search every column while keeping things generic.
If I don't map, I can search the JSON string with the MSSQL LIKE function, but with a few hundred thousand records the query takes an eternity to finish, so that is probably not the way to go.
Keeping things generic is important, so we don't need to modify the audit search page every time we create or modify an entity.
I only know MSSQL, but is it possible that storing the audit logs in a document-oriented database like Cosmos DB (or anything else, it was just an example) would solve my problem? Or can I achieve the desired behaviour using a relational database like MSSQL?
It looks like you're asking for an opinion; in that case I would strongly recommend a document-oriented DB.
CosmosDB could be a great option since it supports SQL queries.
There is an extension to log to Cosmos DB from Audit.NET: Audit.AzureCosmos.
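For example, a minimal Audit.NET setup targeting Cosmos DB could look like the sketch below (the endpoint, key, database, and container names are placeholders, and the exact fluent API may vary slightly between Audit.AzureCosmos versions):
// Register the Cosmos DB data provider for Audit.NET at application startup.
// All connection values below are placeholders.
Audit.Core.Configuration.Setup()
    .UseAzureCosmos(config => config
        .Endpoint("https://your-account.documents.azure.com:443/")
        .AuthKey("your-auth-key")
        .Database("Audit")
        .Container("AuditEvents"));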
A sample query:
SELECT c.EventType, e.Table, e.Action, ch.ColumnName, ch.OriginalValue, ch.NewValue
FROM c
JOIN e IN c.EntityFrameworkEvent.Entries
JOIN ch IN e.Changes
WHERE ch.ColumnName = "Address" AND ch.OriginalValue = "test_address"
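If you also need the time-period filter mentioned in the question, the same query can be narrowed by the event dates stored at the root of the document (property names follow the JSON shown above; the dates are just example values):
SELECT c.EventType, e.Table, e.Action, ch.ColumnName, ch.OriginalValue, ch.NewValue
FROM c
JOIN e IN c.EntityFrameworkEvent.Entries
JOIN ch IN e.Changes
WHERE ch.ColumnName = "Address"
AND c.StartDate >= "2021-06-01T00:00:00Z"
AND c.StartDate < "2021-07-01T00:00:00Z"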
Here is a nice post with lots of examples of complex SQL queries on Cosmos DB.

Converting String to JSON in Data Factory

I am moving data from SQL Server to Cosmos DB in a Copy Activity of Data Factory v2. One of the columns in SQL Server holds a JSON object (although its data type is varchar(MAX)), and I have mapped it to one column in the Cosmos collection. The issue is that it gets added as a string, NOT as a JSON object. How can we set up the Copy Activity so that the data for that one particular column gets added as a JSON object rather than a string?
It gets added as follows:
MyObject:"{SomeField: "Value" }"
However I want this to be:
MyObject:{SomeField: "Value" } // Without quotes so that it appears as json object rather than string
Use the JSON conversion function available in Data Factory.
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions#json
MyObject: json('{ "SomeField": "Value" }')
It will result in:
MyObject: { "SomeField": "Value" }

SharePoint change column id for REST requests

I recently started experimenting with the REST API for SharePoint 2013 Foundation and I am trying to return all entries in a list. My GET request returns the data I am looking for, but the IDs used to identify the columns in the list are not helpful for identifying what the information is (see images below). The column IDs between 'Title' and 'ID', in the second image, are a jumble of characters.
SharePoint List View
Response Data
Is there any way to configure the list to use the column names as IDs? Also, is there some significance to the characters currently used as IDs?
You will need to make a second request to get a listing of columns that includes the InternalName and the Title, which is what you are trying to reference:
You can use this REST call:
_api/web/lists/GetByTitle('Project Details')/fields
or you can use CSOM:
using (ClientContext context = new ClientContext(url))
{
    List list = context.Web.Lists.GetByTitle("Project Details");
    context.Load(list, l => l.Fields);
    context.ExecuteQuery();
    foreach (Field field in list.Fields)
    {
        Console.WriteLine(field.Title);
        Console.WriteLine(field.InternalName);
    }
}
SharePoint automatically generates the InternalName, and it is a read-only field, at least via REST. It will be easier to fetch the field data and correlate the InternalName to the Title than to change the values.
The column you are referring to, between Title and Id, is the ID of the content type associated with the item. It is not a column ID.
The SharePoint REST API is OData compliant, so you can use the $select parameter to query for the necessary fields.
http://server/site/_api/web/lists('guid')/items?$select=Column1,Column2
Please be aware, though, that lookup fields need to be expanded as well; otherwise you only get the Id of the lookup item.
http://server/site/_api/web/lists('guid')/items?$select=LookupColumn&$expand=LookupColumn/Title

Delete a field and its contents in all records and recreate it with a new mapping

I have a field, field10, which was created by accident when I updated a particular record in my index. I want to remove this field and all its contents from my index, and recreate it with the mapping below:
"mytype":{
"properties":{
"field10":{
"type":"string",
"index":"not_analyzed",
"include_in_all":"false",
"null_value":"null"
}
}
}
When I try to create this mapping using the Put Mapping API, I get an error: {"error":"MergeMappingException[Merge failed with failures {[mapper [field10] has different index values, mapper [field10] has different index_analyzer, mapper [field10] has different search_analyzer]}]","status":400}.
How do I change the mapping of this field? I don't want to reindex millions of records just for this small accident.
Thanks
AFAIK, you can't remove a single field and recreate it.
Nor can you just modify a mapping and have everything reindexed automagically. Imagine that you don't store _source: how could Elasticsearch know what your data looked like before it was indexed?
But you can probably modify your mapping using a multi-field, with field10.field10 using the old mapping and field10.new using the new analyzer.
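For illustration, with the legacy multi_field mapping syntax of that Elasticsearch generation, the idea could be sketched roughly like this (the sub-field name "new" is just an example, not a requirement):
"mytype" : {
    "properties" : {
        "field10" : {
            "type" : "multi_field",
            "fields" : {
                "field10" : { "type" : "string" },
                "new" : { "type" : "string", "index" : "not_analyzed", "include_in_all" : false, "null_value" : "null" }
            }
        }
    }
}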
If you don't reindex, only new documents will have content in field10.new.
If you want to manage old documents, you have to either:
Send all your docs again (it will update everything), i.e. reindex (you can use the scan & scroll API to fetch your old documents), or
Try to update your docs with the Update API.
You can probably try to run a query like:
curl -XPOST localhost:9200/crunchbase/person/1/_update -d '{
"script" : "ctx._source.field10 = ctx._source.field10"
}'
But, as you can see, you have to run it document by document, and I think it will take more time than reindexing everything with the Bulk API.
Does it help?

What are the resources or tools used to manage temporal data in key-value stores?

I'm considering using MongoDB or CouchDB on a project that needs to maintain historical records. But I'm not sure how difficult it will be to store historical data in these databases.
For example, in his book "Developing Time-Oriented Database Applications in SQL," Richard Snodgrass points out tools for retrieving the state of data as of a particular instant, and he points out how to create schemas that allow for robust data manipulation (i.e. data manipulation that makes invalid data entry difficult).
Are there tools or libraries out there that make it easier to query, manipulate, or define temporal/historical structures for key-value stores?
edit:
Note that from what I hear, the 'version' data that CouchDB stores is erased during normal use, and since I would need to maintain historical data, I don't think that's a viable solution.
P.S. Here's a similar question that was never answered: key-value-store-for-time-series-data
There are a couple of options if you want to store the data in MongoDB. You could just store each version as a separate document; then you can query to get the object at a certain time, the object at all times, objects over ranges of time, etc. Each document would look something like:
{
    object : whatever,
    date : new Date()
}
You could store all the versions of a document in the document itself, as mikeal suggested, using updates to push the object itself into a history array. In Mongo, this would look like:
db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})
// make changes to obj
...
db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})
A cooler (I think) and more space-efficient way, although less time-efficient, might be to store a history in the object itself about what changed in the object at each time. Then you could replay the history to build the object at a certain time. For instance, you could have:
{
    object : startingObj,
    history : [
        { date : d1, addField : { x : 3 } },
        { date : d2, changeField : { z : 7 } },
        { date : d3, removeField : "x" },
        ...
    ]
}
Then, if you wanted to see what the object looked like between time d2 and d3, you could take the startingObj, add the field x with the value 3, set the field z to the value of 7, and that would be the object at that time.
Whenever the object changed, you could atomically push actions to the history array:
db.foo.update({object : startingObj}, {$push : {history : {date : new Date(), removeField : "x"}}})
Yes, in CouchDB the revisions of a document are there for replication and are usually lost during compaction. I think UbuntuOne did something to keep them around longer but I'm not sure exactly what they did.
I have a document that I need the historical data on and this is what I do.
In CouchDB I have an _update function. The document has a "history" attribute which is an array. Each time I call the _update function to update the document, I append the current document (minus the history attribute) to the history array, then I update the document with the changes in the request body. This way I have the entire revision history of the document.
This is a little heavy for large documents; there are some JavaScript diff tools I was investigating, and I was thinking about only storing the diff between documents, but I haven't done that yet.
http://wiki.apache.org/couchdb/How_to_intercept_document_updates_and_perform_additional_server-side_processing
Hope that helps.
I can't speak for MongoDB, but for CouchDB it all really hinges on how you write your views.
I don't know the specifics of what you need, but if you have a unique id for a document throughout its lifetime and store a timestamp in that document, then you have everything you need for robust querying of that document.
For instance:
document structure:
{ "docid" : "doc1", "ts" : <unix epoch> ...<set of key value pairs> }
map function:
function (doc) {
    if (doc.docid && doc.ts) {
        emit([doc.docid, doc.ts], doc);
    }
}
The view will now output each doc and its revisions in historical order like so:
["doc1", 1234567], ["doc1", 1234568], ["doc2", 1234567], ["doc2", 1234568]
You can use view collation and start_key or end_key to restrict the returned documents.
start_key=["doc1", 1] end_key=["doc1", 9999999999999]
will return all historical copies of doc1
start_key=["doc2", 1234567] end_key=["doc2", 123456715]
will return all historical copies of doc2 between 1234567 and 123456715 unix epoch times.
See ViewCollation for more details.