What are the resources or tools used to manage temporal data in key-value stores?

I'm considering using MongoDB or CouchDB on a project that needs to maintain historical records. But I'm not sure how difficult it will be to store historical data in these databases.
For example, in his book "Developing Time-Oriented Database Applications in SQL," Richard Snodgrass points out tools for retrieving the state of data as of a particular instant, and he points out how to create schemas that allow for robust data manipulation (i.e. data manipulation that makes invalid data entry difficult).
Are there tools or libraries out there that make it easier to query, manipulate, or define temporal/historical structures for key-value stores?
edit:
Note that from what I hear, the 'version' data that CouchDB stores is erased during normal use, and since I would need to maintain historical data, I don't think that's a viable solution.
P.S. Here's a similar question that was never answered: key-value-store-for-time-series-data

There are a couple of options if you want to store the data in MongoDB. You could store each version as a separate document; then you can query to get the object at a certain time, the object at all times, objects over ranges of time, and so on. Each document would look something like:
{
    object : whatever,
    date : new Date()
}
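To read the state as of a particular instant with this layout, a minimal shell sketch (assuming a collection named versions and that each version document also carries an objId field identifying the logical object; both names are hypothetical):

// newest version at or before time t
db.versions.find({ objId : someId, date : { $lte : t } })
           .sort({ date : -1 })
           .limit(1)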
You could store all the versions of a document in the document itself, as mikeal suggested, using updates to push the object itself into a history array. In Mongo, this would look like:
db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})
// make changes to obj
...
db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})
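Reading a version back out of that embedded array could look like this (a sketch; it assumes history is kept in chronological order and t is the instant of interest):

var doc = db.foo.findOne({ object : obj._id });
var asOf = null;
doc.history.forEach(function (entry) {
    if (entry.date <= t) { asOf = entry.object; }   // last snapshot at or before t
});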
A cooler (I think) and more space-efficient, though less time-efficient, approach might be to store in the object itself a history of what changed at each point in time. Then you could replay the history to build the object as of a certain time. For instance, you could have:
{
    object : startingObj,
    history : [
        { date : d1, addField : { x : 3 } },
        { date : d2, changeField : { z : 7 } },
        { date : d3, removeField : "x" },
        ...
    ]
}
Then, if you wanted to see what the object looked like between time d2 and d3, you could take the startingObj, add the field x with the value 3, set the field z to the value of 7, and that would be the object at that time.
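A replay helper along those lines might look like this (a sketch against the schema above; the three action names are just the ones used in the example):

// rebuild the object as it looked at time t by replaying history entries
function objectAt(doc, t) {
    var obj = JSON.parse(JSON.stringify(doc.object));   // start from startingObj
    doc.history.forEach(function (entry) {
        if (entry.date > t) return;                     // skip changes made after t
        var k;
        if (entry.addField)    for (k in entry.addField)    obj[k] = entry.addField[k];
        if (entry.changeField) for (k in entry.changeField) obj[k] = entry.changeField[k];
        if (entry.removeField) delete obj[entry.removeField];
    });
    return obj;
}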
Whenever the object changed, you could atomically push actions to the history array:
db.foo.update({object : startingObj}, {$push : {history : {date : new Date(), removeField : "x"}}})

Yes, in CouchDB the revisions of a document are there for replication and are usually lost during compaction. I think UbuntuOne did something to keep them around longer but I'm not sure exactly what they did.
I have a document whose history I need to keep, and this is what I do.
In CouchDB I have an _update function. The document has a "history" attribute which is an array. Each time I call the _update function, I append the current document (minus its history attribute) to the history array, then apply the changes from the request body. This way I have the entire revision history of the document.
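A minimal sketch of such an _update handler (the history attribute is the one described above; the exact request format and validation are up to you):

// design-doc update function: snapshot the current doc into history, then apply changes
function (doc, req) {
    if (!doc) {
        return [null, 'document not found'];
    }
    var snapshot = JSON.parse(JSON.stringify(doc));
    delete snapshot.history;                  // don't nest histories inside histories
    doc.history = doc.history || [];
    doc.history.push(snapshot);
    var changes = JSON.parse(req.body);       // assumes the body is a JSON object of changes
    for (var key in changes) {
        if (key !== '_id' && key !== '_rev' && key !== 'history') {
            doc[key] = changes[key];
        }
    }
    return [doc, 'updated'];
}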
Storing the full document each time is a little heavy for large documents. There are some JavaScript diff tools I was investigating, and I was thinking about storing only the diff between revisions, but I haven't done that yet.
http://wiki.apache.org/couchdb/How_to_intercept_document_updates_and_perform_additional_server-side_processing
Hope that helps.

I can't speak for MongoDB, but for CouchDB it all really hinges on how you write your views.
I don't know the specifics of what you need but if you have a unique id for a document throughout its lifetime and store a timestamp in that document then you have everything you need for robust querying of that document.
For instance:
document structure:
{ "docid" : "doc1", "ts" : <unix epoch> ...<set of key value pairs> }
map function:
function (doc) {
    if (doc.docid && doc.ts) {
        emit([doc.docid, doc.ts], doc);
    }
}
The view will now output each doc and its revisions in historical order like so:
["doc1", 1234567], ["doc1", 1234568], ["doc2", 1234567], ["doc2", 1234568]
You can use view collation and start_key or end_key to restrict the returned documents.
start_key=["doc1", 1] end_key=["doc1", 9999999999999]
will return all historical copies of doc1
start_key=["doc2", 1234567] end_key=["doc2", 123456715]
will return all historical copies of doc2 between 1234567 and 123456715 unix epoch times.
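If you only need the most recent revision of a document, reversing the collation order should work (a sketch; note that CouchDB expects start_key and end_key to be swapped when descending=true):
start_key=["doc1", 9999999999999] end_key=["doc1", 1] descending=true limit=1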
see ViewCollation for more details

Related

How to achieve generic Audit.NET json data processing?

I am using the Audit.NET library to log Entity Framework actions into a database. Currently everything goes into one AuditEventLogs table, where the JsonData column stores the data in the following JSON format:
{
    "EventType":"MyDbContext:test_database",
    "StartDate":"2021-06-24T12:11:59.4578873Z",
    "EndDate":"2021-06-24T12:11:59.4862278Z",
    "Duration":28,
    "EntityFrameworkEvent":{
        "Database":"test_database",
        "Entries":[
            {
                "Table":"Offices",
                "Name":"Office",
                "Action":"Update",
                "PrimaryKey":{
                    "Id":"40b5egc7-46ca-429b-86cb-3b0781d360c8"
                },
                "Changes":[
                    {
                        "ColumnName":"Address",
                        "OriginalValue":"test_address",
                        "NewValue":"test_address"
                    },
                    {
                        "ColumnName":"Contact",
                        "OriginalValue":"test_contact",
                        "NewValue":"test_contact"
                    },
                    {
                        "ColumnName":"Email",
                        "OriginalValue":"test_email",
                        "NewValue":"test_email2"
                    },
                    {
                        "ColumnName":"Name",
                        "OriginalValue":"test_name",
                        "NewValue":"test_name"
                    },
                    {
                        "ColumnName":"OfficeSector",
                        "OriginalValue":1,
                        "NewValue":1
                    },
                    {
                        "ColumnName":"PhoneNumber",
                        "OriginalValue":"test_phoneNumber",
                        "NewValue":"test_phoneNumber"
                    }
                ],
                "ColumnValues":{
                    "Id":"40b5egc7-46ca-429b-86cb-3b0781d360c8",
                    "Address":"test_address",
                    "Contact":"test_contact",
                    "Email":"test_email2",
                    "Name":"test_name",
                    "OfficeSector":1,
                    "PhoneNumber":"test_phoneNumber"
                },
                "Valid":true
            }
        ],
        "Result":1,
        "Success":true
    }
}
My team and I have one main goal to achieve:
Being able to create a search page where administrators can tell
who made a change
what they changed
when the change happened
They can give a time period to reduce the number of audit records, and here comes the interesting part:
There should be a text input field that lets them search within the values of the "ColumnValues" section.
The problems I encountered:
Even if I map the JSON structure into relational rows, I am unable to search in every column while keeping things generic.
If I don't map, I could search the JSON string with the MSSQL LIKE operator, but with a few hundred thousand records the query takes an eternity to finish, so that is probably not the way.
Keeping things generic is important, so that we don't need to modify the audit search page every time we create or modify an entity.
I only know MSSQL, but is it possible that storing the audit logs in a document-oriented database like Cosmos DB (or anything else, it was just an example) would solve my problem? Or can I reach the desired behaviour using a relational database like MSSQL?
It looks like you're asking for an opinion; in that case I would strongly recommend a document-oriented DB.
CosmosDB could be a great option since it supports SQL queries.
There is an extension to log to CosmosDB from Audit.NET: Audit.AzureCosmos
A sample query:
SELECT c.EventType, e.Table, e.Action, ch.ColumnName, ch.OriginalValue, ch.NewValue
FROM c
JOIN e IN c.EntityFrameworkEvent.Entries
JOIN ch IN e.Changes
WHERE ch.ColumnName = "Address" AND ch.OriginalValue = "test_address"
Here is a nice post with lots of examples of complex SQL queries on CosmosDB.

Unable to use Ember data with JSONAPI and fragments to support nested JSON data

Overview
I'm using Ember data and have a JSONAPI. Everything works fine until I have a more complex object (let's say an invoice for a generic concept) with an array of items called lineEntries. The line entries are not mapped directly to a table, so they need to be stored as raw JSON object data. The line entry model also contains default and computed values. I wish to store the list data as a JSON object and then, when it is loaded back from the store, manipulate it as normal in Ember as an array of my model.
What I've tried
I've looked at and tried several approaches, the best appear to be (open to suggestions here!):
Fragments
Replace problem models with fragments
I've tried making the line entry model a fragment and then referencing the fragment on the invoice model as a fragmentArray. Line entries add to the array as normal, but default values don't work (should they?). It creates the object and I can store it in the backend, but when I return it, it fails with either a normalisation issue or a serialiser issue. Can anyone state the format the data should be returned in? It's confusing, as normalising the data seems to require JSONAPI but the fragment requires the JSON serialiser. I've tried several combinations but no luck so far. My line entries don't have actual ids, as the data is saved and loaded as a block. Is this an issue?
DS.EmbeddedRecordsMixin
Although not supported in JSONAPI, it sounds possible to use JSONAPI and then switch to JSONSerializer or RESTSerializer for the problem models. If this is possible, could someone give me a working example and the JSON format that should be returned by the API? I have header authorisation and other such data, so would I still be able to set this at the application level for all requests not using my JSONAPI?
Ember-data-save-relationships
I found an add-on here that provides a way to do this. It seems more involved than the other approaches, but when I tried it I could send the data up by setting the data as embedded. Great! But although it saves, it doesn't unwrap correctly, and I'm back with the same issues.
Custom serialiser
Replace the model's serialiser with something that takes the data, sends it as plain JSON, and then deserialises it back into something Ember can use. This sounds similar to the above, except that I do the heavy lifting. The only reason to do this is that all the examples for the above solutions are quite light and don't really show how to set this up with an actual JSONAPI setup that would need it.
Where I am and what I need
Basically, all approaches lead to saving the JSON fine, but the JSON returned from the server is not in the correct format, or the deserialisation fails, and it's unclear what it should be or what needs to change without breaking the existing JSONAPI models that work fine.
If anyone knows the format for returned API data, it may resolve this. I've tried JSONAPI with lineEntries returning the same format as it was saved in. I've tried placing relationship sections as the add-on suggested, and I've also tried placing relationship-only data against the entries with an include section containing all the references. Any help on this would be great, as I've learned a lot through this, but deadlines are looming and I can't see a viable solution that doesn't break as much as it fixes.
If you are looking for the return format for relational data from the API server, you need to make sure of the following:
Make sure the relationship is defined in the ember model
Return all successes with a status code of 200
From there you need to make sure you return relational data correctly. If you've set the ember model for the relationship to {async: true} you need only return the id of the relational model - which should also be defined in ember. If you do not set {async: true}, ember expects all relational data to be included.
return data with relationships in JSON API specification
Example:
models\unicorn.js in ember:
import DS from 'ember-data';

export default DS.Model.extend({
    user: DS.belongsTo('user', {async: true}),
    staticrace: DS.belongsTo('staticrace', {async: true}),
    unicornName: DS.attr('string'),
    unicornLevel: DS.attr('number'),
    experience: DS.attr('number'),
    hatchesAt: DS.attr('number'),
    isHatched: DS.attr('boolean'),
    raceEndsAt: DS.attr('number'),
    isRacing: DS.attr('boolean'),
});
in routes\unicorns.js on the api server on GET/:id:
var jsonObject = {
    "data": {
        "type": "unicorn",
        "id": unicorn.dataValues.id,
        "attributes": {
            // the default JSONAPISerializer expects dasherized attribute names
            "unicorn-name" : unicorn.dataValues.unicornName,
            "unicorn-level" : unicorn.dataValues.unicornLevel,
            "experience" : unicorn.dataValues.experience,
            "hatches-at" : unicorn.dataValues.hatchesAt,
            "is-hatched" : unicorn.dataValues.isHatched,
            "race-ends-at" : unicorn.dataValues.raceEndsAt,
            "is-racing" : unicorn.dataValues.isRacing
        },
        "relationships": {
            "staticrace": {
                "data": { "type": "staticrace", "id" : unicorn.dataValues.staticRaceId }
            },
            "user": {
                "data": { "type": "user", "id" : unicorn.dataValues.userId }
            }
        }
    }
};
res.status(200).json(jsonObject);
In ember, you can call this by chaining model functions. For example when this unicorn goes to race in controllers\unicornracer.js:
raceUnicorn() {
    if (this.get('unicornId') === '') {
        return false;
    } else {
        // the options hash belongs to findRecord, not to this.get
        return this.store.findRecord('unicorn', this.get('unicornId'), { backgroundReload: false }).then(unicorn => {
            return this.store.findRecord('staticrace', this.get('raceId')).then(staticrace => {
                if (unicorn.get('unicornLevel') >= staticrace.get('raceMinimumLevel')) {
                    unicorn.set('isRacing', true);
                    unicorn.set('staticrace', staticrace);
                    unicorn.set('raceEndsAt', Math.floor(Date.now() / 1000) + staticrace.get('duration'));
                    this.set('unicornId', '');
                    return unicorn.save();
                } else {
                    return false;
                }
            });
        });
    }
}
The above code sends a PATCH to the api server route unicorns/:id
Final note about GET,POST,DELETE,PATCH:
GET assumes you are getting ALL of the information associated with a model (the example above shows a GET response). This is associated with model.findRecord (GET /:id, expects one record), model.findAll (GET /, expects an array of records), model.query (GET /?query=string, expects an array of records), and model.queryRecord (GET /?query=string, expects one record).
POST assumes you return at least what you POST to the api server from Ember, but you can also return additional information created on the api server side, such as createdAt dates. If the data returned is different from what you used to create the model, it'll update the created model with the returned information. This is associated with model.createRecord (POST /, expects one record).
DELETE assumes you return the type and the id of the deleted object, not data or relationships. This is associated with model.deleteRecord (DELETE /:id, expects one record).
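For example, a DELETE response under the convention just described might be nothing more than (a sketch following this answer's description, with a hypothetical id):

{
    "data": {
        "type": "unicorn",
        "id": "1"
    }
}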
PATCH assumes you return at least what information was changed. If you only change one field, for instance in my unicorn model, the unicornName, it would only PATCH the following:
{
    "data": {
        "type": "unicorn",
        "id": req.params.id,
        "attributes": {
            "unicorn-name" : "This is a new name!"
        }
    }
}
So it only expects a returned response of at least that, but like POST, you can return other changed items!
I hope this answers your questions about the JSON API adapter. Most of this information was originally gleaned from reading the specification at http://jsonapi.org/format/ and the Ember implementation documentation at https://emberjs.com/api/data/classes/DS.JSONAPIAdapter.html

Creating Collection in Azure Search Service using Indexer

I am using an indexer to sync data from my SQL database to Azure Search Service. I have a field in my SQL view which contains XML data. The column contains a list of strings. The corresponding field in my Azure Search index is a Collection(Edm.String).
On checking some documentation, I found that the indexer does not convert XML (SQL) to Collection (Azure Search).
Is there any workaround to create the Collection from the XML data?
P.S. I am extracting the data from a view, so I can change the XML to JSON if needed.
UPDATE on October 17, 2016: Azure Search now automatically converts a string coming from a database to a Collection(Edm.String) field if the data represents a JSON string array, for example: ["blue", "white", "red"]
Old response: great timing, we just added a new "field mappings" feature that allows you to do this. This feature will be deployed sometime early next week. I will post a comment on this thread when this is rolled out in all datacenters.
To use it, you indeed need to use JSON. Make sure your source column contains a JSON array, for example ["hello", "world"]. Then, update your indexer definition to contain the new fieldMappings property:
"fieldMappings" : [ { "sourceFieldName" : "YOUR_SOURCE_FIELD", "targetFieldName" : "YOUR_TARGET_FIELD", "mappingFunction" : { "name" : "jsonArrayToStringCollection" } } ]
NOTE: You'll need to use API version 2015-02-28-Preview to add fieldMappings.
HTH,
Eugene

Data structure to use in Sencha Touch similar to Vector in Blackberry

I am a beginner at Sencha Touch; basically I am a BlackBerry developer. Currently we are migrating our application to Sencha Touch 1.1. I have some business requirements, such as storing the selected values in a local database. I have multiple screens; once the user selects a value in each screen, the data should be saved in the following format:
[{'key1': "value1", 'key2': "value2", 'key3': "value3" ,'key4': "value4", 'key5': "value5"}]
1. The values need to be saved in key-value pairs.
2. The keys should play the role of primary key; a key shouldn't be duplicated.
3. The data only needs to live for the application life cycle or session; it doesn't need to be saved permanently.
I have come across concepts like LocalStorageProxy, JsonStore and some others, but I don't understand which one fits my specific requirements.
Maybe my question is a bit confusing. I have achieved the same thing using a Vector in BlackBerry Java, so any similar data structure could help me. I need the basic operations like
Create
Add
Remove
Remove all
Fetch elements based on key
Please suggest some samples or code snippets which may help me achieve this.
Edit 1:
I have made the changes as per #Ilya139's answer. Now I am able to add the data with a key:
// this is my Object declared in App.js
NSDictionary: {},
// adding the data to object with key
MyApp.NSDictionary['PROD'] = 'SONY JUKE BOX';
//trying to retrieve the elements from vector
var prod = MyApp.NSDictionary['PROD'];
But I am not able to retrieve the elements using the above syntax.
If you don't need to save the data permanently, then you can just have a global object with the properties you need. First define the object like this:
new Ext.Application({
    name: 'MyApp',
    vectorYouNeed: {},

    launch: function () { ...
Then add the key-value pairs to the object like this
MyApp.vectorYouNeed[key] = value;
And fetch them like this
value = MyApp.vectorYouNeed[key];
Note that key is a string, i.e. var key = 'key1';, and value can be any type of object.
To remove one value, MyApp.vectorYouNeed[key] = null; (or delete MyApp.vectorYouNeed[key]; to drop the key entirely), and to remove all of them, MyApp.vectorYouNeed = {};
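Putting it together, the five operations requested above map onto the plain object like this (a sketch; the key and value are hypothetical):

MyApp.vectorYouNeed = {};                         // create (also: remove all)
MyApp.vectorYouNeed['PROD'] = 'SONY JUKE BOX';    // add
var prod = MyApp.vectorYouNeed['PROD'];           // fetch element by key
delete MyApp.vectorYouNeed['PROD'];               // remove one entry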

MongoDB Update / Upsert Question - Schema Related

I have a problem representing data in MongoDB. I was using this schema design, where a combination of date and word is unique.
{ 'date' : '2-1-2011',
  'word' : 'word1',
  'users' : ['user1', 'user2', 'user3', 'user4'] }

{ 'date' : '1-1-2011',
  'word' : 'word2',
  'users' : ['user1', 'user2'] }
There are a fixed number of dates, approximately 200; potentially 100k+ words for each date; and 100k+ users.
I inserted records with an algorithm like so:
while records exist:
    message, user, date = pop a record off a list
    words = set(tokenise(message))
    for word in words:
        collection1.insert({'date' : date, 'word' : word}, {'user' : user})
        collection2.insert('something similar')
        collection3.insert('something similar again')
        collection4.insert('something similar again')
However, this schema resulted in extremely large collections and terrible performance. I am inserting different information into each of the four collections, so it is an extremely large number of operations on the database.
I'm considering representing the data in a format like the following, where the word and user collections behave as sets:
{ 'date' : '26-6-2011',
  'words' : {
      'word1' : ['user1', 'user2'],
      'word2' : ['user1'],
      'word3' : ['user1', 'user2', 'user3']
  } }
The idea behind this was to cut down on the number of database operations, so that each loop of the algorithm performs just one update per collection. However, I am unsure how to perform an update/upsert on this, because with each loop of the algorithm I may need to insert a new word, a new user, or both.
Could anyone recommend either a way to update this document, or could anyone suggest an alternative schema?
Thanks
Upsert is well suited for dynamically extending documents. Unfortunately, I only found it to work properly if you have an atomic modifier operation in your update object, like the $addToSet here (mongo shell code):
db.words is empty. Add the first document for a given date with an upsert:
var query = { 'date' : 'date1' }
var update = { $addToSet: { 'words.word1' : 'user1' } }
db.words.update(query,update,true,false)
Check the object:
db.words.find();
{ "_id" : ObjectId("4e3bd4eccf7604a2180c4905"), "date" : "date1", "words" : { "word1" : [ "user1" ] } }
Now add some more users to the first word, and another word, in one update:
var update = { $addToSet: { 'words.word1' : { $each : ['user2', 'user4', 'user5'] }, 'words.word2': 'user3' } }
db.words.update(query,update,true,false)
Again, check the object:
db.words.find()
{ "_id" : ObjectId("4e3bd7e9cf7604a2180c4907"), "date" : "date1", "words" : { "word1" : [ "user1", "user2", "user4", "user5" ], "word2" : [ "user3" ] } }
I'm using MongoDB to insert 105 million records with ~10 attributes each. Instead of updating this dataset with changes, I just delete and re-insert everything. I found this method to be faster than individually touching each row to see if it was one I needed to update. You will get better insert speeds if you create JSON-formatted text files and use MongoDB's mongoimport tool:
format your data into JSON txt files (one file per collection)
mongoimport each file and specify the collection you want it inserted into
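An example invocation (database, collection, and file names are hypothetical; add --jsonArray if the file holds one JSON array rather than one document per line):

mongoimport --db mydb --collection words --file words.json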