How to achieve generic Audit.NET JSON data processing?

I am using the Audit.NET library to log Entity Framework actions into a database (currently everything goes into one AuditEventLogs table), where the JsonData column stores the data in the following JSON format:
{
  "EventType": "MyDbContext:test_database",
  "StartDate": "2021-06-24T12:11:59.4578873Z",
  "EndDate": "2021-06-24T12:11:59.4862278Z",
  "Duration": 28,
  "EntityFrameworkEvent": {
    "Database": "test_database",
    "Entries": [
      {
        "Table": "Offices",
        "Name": "Office",
        "Action": "Update",
        "PrimaryKey": {
          "Id": "40b5egc7-46ca-429b-86cb-3b0781d360c8"
        },
        "Changes": [
          { "ColumnName": "Address", "OriginalValue": "test_address", "NewValue": "test_address" },
          { "ColumnName": "Contact", "OriginalValue": "test_contact", "NewValue": "test_contact" },
          { "ColumnName": "Email", "OriginalValue": "test_email", "NewValue": "test_email2" },
          { "ColumnName": "Name", "OriginalValue": "test_name", "NewValue": "test_name" },
          { "ColumnName": "OfficeSector", "OriginalValue": 1, "NewValue": 1 },
          { "ColumnName": "PhoneNumber", "OriginalValue": "test_phoneNumber", "NewValue": "test_phoneNumber" }
        ],
        "ColumnValues": {
          "Id": "40b5egc7-46ca-429b-86cb-3b0781d360c8",
          "Address": "test_address",
          "Contact": "test_contact",
          "Email": "test_email2",
          "Name": "test_name",
          "OfficeSector": 1,
          "PhoneNumber": "test_phoneNumber"
        },
        "Valid": true
      }
    ],
    "Result": 1,
    "Success": true
  }
}
My team and I have one main goal to achieve:
Being able to create a search page where administrators can tell:
who made a change,
what they changed,
and when the change happened.
They can give a time period to reduce the number of audit records, and here comes the interesting part:
There should be a text input field that lets them search within the values of the "ColumnValues" section.
The problems I encountered:
Even if I map the JSON structure into relational rows, I am unable to search every column while keeping things generic.
If I don't map, I could search the JSON string with the SQL Server LIKE function, but with a few hundred thousand records the query takes an eternity to finish, so that is probably not the way.
Keeping things generic is important, so we don't need to modify the audit search page every time we create or modify an entity.
I only know MSSQL, but is it possible that storing the audit logs in a document-oriented database like Cosmos DB (or anything else; it was just an example) would solve my problem? Or can I achieve the desired behaviour using a relational database like MSSQL?

It looks like you're asking for an opinion; in that case, I would strongly recommend a document-oriented DB.
Cosmos DB could be a great option, since it supports SQL queries.
There is an extension to log to Cosmos DB from Audit.NET: Audit.AzureCosmos.
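Switching the data provider is a one-time configuration; a minimal sketch, assuming the fluent setup API of the Audit.NET.AzureCosmos package (endpoint, key, and database/container names are placeholders):

using Audit.Core;

// Route all audit events to a Cosmos DB container instead of the SQL table.
Audit.Core.Configuration.Setup()
    .UseAzureCosmos(config => config
        .Endpoint("https://myaccount.documents.azure.com:443/")  // placeholder account
        .AuthKey("<auth-key>")                                   // placeholder key
        .Database("Audit")
        .Container("AuditLogs"));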
A sample query:
SELECT c.EventType, e.Table, e.Action, ch.ColumnName, ch.OriginalValue, ch.NewValue
FROM c
JOIN e IN c.EntityFrameworkEvent.Entries
JOIN ch IN e.Changes
WHERE ch.ColumnName = "Address" AND ch.OriginalValue = "test_address"
Here is a nice post with a lot of examples of complex SQL queries on Cosmos DB.

Related

Select * Except particular properties in Cosmos DB with SQL API

Consider the following: I have a document that looks something like this:
"id": 2
"properties": {
"desired": {
"Property1": 10,
"Property2": 1,
"Property3": 1,
"$metadata": {
...
},
"$version": 53
}
},
I want to get everything from the document EXCEPT $metadata and $version. The obvious solution would be:
SELECT c["Property1"], c["Property2"] .... FROM c where c["id"] = "2"
However, my document may expand dynamically, so the above is suboptimal. I therefore figured that it may be better to exclude just $metadata and $version. I looked at different "interesting" solutions here on Stack Overflow, one of which suggests creating a temporary table.
Unfortunately, the query needs to be very efficient, because I want to reduce the number of RUs used. I also really want to avoid handling the exclusion in the code.
Therefore, how do I exclude particular "columns" from my document without writing an excessively long query or creating temporary tables?
Cosmos DB does not support "Project Away". You will need to specify the properties to project, or use * and return all of them.
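For the document above, the explicit-projection version would look like this (a sketch; it has to be extended whenever new properties appear):

SELECT c.id,
       c.properties.desired.Property1,
       c.properties.desired.Property2,
       c.properties.desired.Property3
FROM c
WHERE c.id = "2"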

Product attributes db structure for e-commerce

Backstory:
I'm building an e-commerce web app (online store)
Now I've got to the point of choosing a database system and an appropriate design.
I'm stuck on developing a design for product attributes.
I've been considering NoSQL (MongoDB) and SQL database systems.
I need your advice and help.
The problem:
When you choose a product type (e.g. table), it should show you the corresponding filters for that type (e.g. height, material, etc.). When you choose another type, say "car", it provides you with the car-specific filter attributes (e.g. fuel, engine volume).
For example, on one popular online store, if you choose a data storage type you get a filter for this type's attributes, such as hard drive size or connection type.
Question
Which approach is best for such a problem? I described some below, but maybe you have your own thoughts on it.
MongoDB
Possible solution:
You can implement such a product attrs structure pretty easily.
You can create one collection with an attrs field for each product and put there whatever you want, like they suggest here (field "details"):
https://docs.mongodb.com/ecosystem/use-cases/product-catalog/#non-relational-data-model
The structure will be:
Problem:
With such a solution you don't have product types at all, so you can't filter the products by their types. Each product contains its own arbitrary structure in the attrs field and doesn't follow any pattern.
Or maybe I can somehow go with this approach?
SQL
There are solutions like a single table, where all the products are stored in one table and you end up with as many fields as the total number of attributes across all the products taken together.
Or you create a new table for every product type.
But I won't consider these. One is very bulky, and the other isn't very flexible and requires a dynamic schema design.
Possible solution
There is one pretty flexible solution called EAV https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
Our schema would be:
(EAV schema diagram)
Such a design could be done in MongoDB, but I'm not sure it was made for such a normalised structure.
Problem
The schema is going to get really huge and really hard to query and grasp
If you choose an SQL database, take a look at PostgreSQL, which supports JSON features. You don't necessarily need to follow database normalization.
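For illustration, a sketch of that PostgreSQL route (table and column names are made up): keep the attributes in a JSONB column, index it with GIN, and filter with the @> containment operator.

-- Hypothetical schema: products with a JSONB attribute bag
CREATE TABLE products (
    id    serial PRIMARY KEY,
    type  text   NOT NULL,
    attrs jsonb  NOT NULL
);

-- A GIN index accelerates containment (@>) queries over the JSONB column
CREATE INDEX idx_products_attrs ON products USING GIN (attrs);

-- All prime wooden tables
SELECT id, attrs
FROM products
WHERE type = 'table'
  AND attrs @> '{"material": "wood", "prime": true}';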
If you choose MongoDB, you can store an attrs array with generic {key: "field", value: "value"} pairs:
{id:1, attrs:[{key: "prime", value: true}, {key:"height", value:2}, {key:"material", value:"wood"},{key:"color", "value":"brown"}]}
{id:2, attrs:[{key: "prime", value: true}, {key:"fuel", value:"gas"}, {key:"volume", "value":3}]}
{id:3, attrs:[{key: "prime", value: true}, {key:"fuel", value:"diesel"}, {key:"volume", "value":1.5}]}
Then you define a multikey index like this:
db.collection.createIndex({"attrs.key":1, "attrs.value":1})
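A single filter can then be expressed with a plain find using $elemMatch, which this index serves:

// Find all products whose attrs array contains {key: "material", value: "wood"}
db.collection.find({
  attrs: { $elemMatch: { key: "material", value: "wood" } }
})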
If you want to apply step-by-step filters, use MongoDB aggregation with the $elemMatch operator:
☑ Prime
☑ Fuel
☐ Other
...
☑ Volume 3
☐ Volume 1.5
The query's representation:
db.collection.aggregate([
  {
    $match: {
      $and: [
        { attrs: { $elemMatch: { key: "prime", value: true } } },
        { attrs: { $elemMatch: { key: "fuel" } } },
        { attrs: { $elemMatch: { key: "volume", value: 3 } } }
      ]
    }
  }
])
MongoPlayground

Unable to use Ember data with JSONAPI and fragments to support nested JSON data

Overview
I'm using Ember Data and have a JSONAPI. Everything works fine until I have a more complex object (let's say an invoice for a generic concept) with an array of items called lineEntries. The line entries are not mapped directly to a table, so they need to be stored as a raw JSON object. The line entry model also contains default and computed values. I wish to store the list data as a JSON object and then, when it is loaded back from the store, manipulate it as normal in Ember as an array of my model.
What I've tried
I've looked at and tried several approaches; the best appear to be (open to suggestions here!):
Fragments
Replace problem models with fragments
I've tried making the line entry model a fragment and then referencing the fragment on the invoice model as a fragmentArray. Line entries add to the array as normal, but default values don't work (should they?). It creates the object and I can store it in the backend, but when I return it, it fails with either a normalisation issue or a serialiser issue. Can anyone state the format the data should be returned in? It's confusing, as normalising the data seems to require JSONAPI but the fragment requires the JSON serialiser. I've tried several combinations but no luck so far. My line entries don't have actual ids as the data is saved and loaded as a block. Is this an issue?
DS.EmbeddedRecordsMixin
Although not supported in JSONAPI, it sounds possible to use JSONAPI and then switch to JSONSerializer or RESTSerializer for the problem models. If this is possible, could someone give me a working example and the JSON format that should be returned by the API? I have header authorisation and other such data, so would I still be able to set this at the application level for all requests not using my JSONAPI?
Ember-data-save-relationships
I found an add-on here that does this. It seems more involved than the other approaches, but when I tried it I could send the data up by setting the data as embedded. Great! But although it saves, it doesn't unwrap it correctly and I'm back with the same issues.
Custom serialiser
Replace the model's serialiser with something that takes the data, sends it as plain JSON data, and then deserialises it back into something Ember can use. This sounds similar to the above, but I do the heavy lifting. The only reason to do this is that all examples for the above solutions are quite light and don't really show how to set this up with an actual JSONAPI setup that would need it.
Where I am and what I need
Basically, all approaches save the JSON fine, but the JSON returned from the server is not in the correct format, or the deserialisation fails, and it's unclear what the format should be or what needs to change without breaking the existing JSONAPI models that work fine.
If anyone knows the format for the returned API data, it may resolve this. I've tried JSONAPI with lineEntries returning the same format as it was saved in. I've tried placing relationship sections like the add-on suggested, and I've also tried placing relationship-only data against the entries with an include section holding all the references. Any help on this would be great, as I've learned a lot through this, but deadlines are looming and I can't see a viable solution that doesn't break as much as it fixes.
If you are looking for the return format for relational data from the API server, you need to make sure of the following:
Make sure the relationship is defined in the ember model
Return all successes with a status code of 200
From there you need to make sure you return the relational data correctly. If you've set the ember model for the relationship to {async: true}, you need only return the id of the related model, which should also be defined in ember. If you do not set {async: true}, ember expects all relational data to be included.
return data with relationships in JSON API specification
Example:
models\unicorn.js in ember:
import DS from 'ember-data';

export default DS.Model.extend({
  user: DS.belongsTo('user', { async: true }),
  staticrace: DS.belongsTo('staticrace', { async: true }),
  unicornName: DS.attr('string'),
  unicornLevel: DS.attr('number'),
  experience: DS.attr('number'),
  hatchesAt: DS.attr('number'),
  isHatched: DS.attr('boolean'),
  raceEndsAt: DS.attr('number'),
  isRacing: DS.attr('boolean'),
});
in routes\unicorns.js on the api server on GET/:id:
var jsonObject = {
  "data": {
    "type": "unicorn",
    "id": unicorn.dataValues.id,
    "attributes": {
      "unicorn-name": unicorn.dataValues.unicornName,
      "unicorn-level": unicorn.dataValues.unicornLevel,
      "experience": unicorn.dataValues.experience,
      "hatches-at": unicorn.dataValues.hatchesAt,
      "is-hatched": unicorn.dataValues.isHatched,
      "race-ends-at": unicorn.dataValues.raceEndsAt,
      "is-racing": unicorn.dataValues.isRacing
    },
    "relationships": {
      "staticrace": {
        "data": { "type": "staticrace", "id": unicorn.dataValues.staticRaceId }
      },
      "user": {
        "data": { "type": "user", "id": unicorn.dataValues.userId }
      }
    }
  }
};
res.status(200).json(jsonObject);
In ember, you can call this by chaining model functions. For example when this unicorn goes to race in controllers\unicornracer.js:
raceUnicorn() {
  if (this.get('unicornId') === '') { return false; }
  else {
    return this.store.findRecord('unicorn', this.get('unicornId'), { backgroundReload: false }).then(unicorn => {
      return this.store.findRecord('staticrace', this.get('raceId')).then(staticrace => {
        if (unicorn.getProperties('unicornLevel').unicornLevel >= staticrace.getProperties('raceMinimumLevel').raceMinimumLevel) {
          unicorn.set('isRacing', true);
          unicorn.set('staticrace', staticrace);
          unicorn.set('raceEndsAt', Math.floor(Date.now() / 1000) + staticrace.get('duration'));
          this.set('unicornId', '');
          return unicorn.save();
        }
        else { return false; }
      });
    });
  }
}
The above code sends a PATCH to the api server route unicorns/:id
Final note about GET, POST, DELETE, and PATCH:
GET assumes you are getting ALL of the information associated with a model (the example above shows a GET response). This is associated with model.findRecord (GET /:id) (expects one record), model.findAll (GET /) (expects an array of records), model.query (GET /?query=&string=) (expects an array of records), and model.queryRecord (GET /?query=&string=) (expects one record).
POST assumes you return at least what you POST to the api server from ember, but you can also return additional information you created on the apiServer side, such as createdAt dates. If the data returned is different from what you used to create the model, it'll update the created model with the returned information. This is associated with model.createRecord (POST /) (expects one record).
DELETE assumes you return the type and the id of the deleted object, not its data or relationships. This is associated with model.deleteRecord (DELETE /:id) (expects one record).
PATCH assumes you return at least the information that was changed. If you only change one field, for instance the unicornName in my unicorn model, it would only PATCH the following:
{
  "data": {
    "type": "unicorn",
    "id": req.params.id,
    "attributes": {
      "unicorn-name": "This is a new name!"
    }
  }
}
So it only expects a returned response of at least that, but like POST, you can return other changed items!
I hope this answers your questions about the JSON API adapter. Most of this information was originally gleaned by reading over the specification at http://jsonapi.org/format/ and the ember implementation documentation at https://emberjs.com/api/data/classes/DS.JSONAPIAdapter.html

Best design approach to query documents for 'labels'

I am storing documents, and each document has a collection of 'labels', like this. Labels are user-defined and could be any plain text.
{
  "FeedOwner": "4ca44f7d-b3e0-4831-b0c7-59fd9e5bd30d",
  "MessageBody": "blablabla",
  "Labels": [
    {
      "IsUser": false,
      "Text": "Mine"
    },
    {
      "IsUser": false,
      "Text": "Incomplete"
    }
  ],
  "CreationDate": "2012-04-30T15:35:20.8588704"
}
I need to allow the user to query for any combination of labels, e.g.
"Mine" OR "Incomplete"
"Incomplete" only
or
"Mine" AND NOT "Incomplete"
This results in Raven queries like this:
Query: (FeedOwner:25eb541c\-b04a\-4f08\-b468\-65714f259ac2) AND (Labels,Text:Mine) AND (Labels,Text:Incomplete)
I realise that Raven will generate a 'dynamic index' for queries it has not seen before. I can see that this could result in a lot of indexes.
What would be the best approach to achieving this functionality with Raven?
[EDIT]
This is my LINQ, but I get an error from Raven: "All is not supported"
var result = from candidateAnnouncement in session.Query<FeedAnnouncement>()
             where listOfRequiredLabels.All(
                 requiredLabel => candidateAnnouncement.Labels.Any(
                     candidateLabel => candidateLabel.Text == requiredLabel))
             select candidateAnnouncement;
[EDIT]
I had a similar question, and the answer for that resolved both questions: Raven query returns 0 results for collection contains
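For reference, a common way to express an AND over labels without All (which Raven's LINQ provider cannot translate) is to chain one Where clause per required label; this is a sketch, not necessarily the exact fix from the linked answer:

// Each chained Where becomes an AND clause in the generated query.
IQueryable<FeedAnnouncement> query = session.Query<FeedAnnouncement>();
foreach (var requiredLabel in listOfRequiredLabels)
{
    var label = requiredLabel; // copy for the closure
    query = query.Where(a => a.Labels.Any(l => l.Text == label));
}
var result = query.ToList();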
Please note that if FeedOwner is a unique property of your documents, the query doesn't make a lot of sense at all. In that case, you should do it on the client using standard LINQ to Objects.
Now, given that FeedOwner is not unique, your query is basically correct. However, depending on what you actually want to return, you may need to create a static index instead:
If you're using the dynamically generated indexes, then you will always get the documents as the return value, and you can't get the particular labels which matched the query. If this is ok for you, then just go with that approach and let the query optimizer do its job (build the index upfront only if you really have a lot of documents).
In the other case, where you want to use the actual labels as the query result, you have to build a simple map index upfront which covers the fields you want to query upon; in your sample this would be FeedOwner and the Text of every label. You will have to use FieldStorage.Yes on the fields you want to return from a query, so enable that on the Text property of your labels. However, there's no need to do so with the FeedOwner property, because it is part of the actual document, which Raven will give you as part of any query results. Please refer to Raven's documentation to see how you can build a static index and use field storage.
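A sketch of such a map index (assuming the classic AbstractIndexCreationTask API; the index and result class names are illustrative):

// Hypothetical static index: one index entry per (FeedOwner, Label.Text) pair.
public class Announcements_ByOwnerAndLabel : AbstractIndexCreationTask<FeedAnnouncement, Announcements_ByOwnerAndLabel.Result>
{
    public class Result
    {
        public string FeedOwner { get; set; }
        public string Text { get; set; }
    }

    public Announcements_ByOwnerAndLabel()
    {
        Map = announcements =>
            from announcement in announcements
            from label in announcement.Labels
            select new { announcement.FeedOwner, label.Text };

        // Store the label text so queries can project it back as a result.
        Store(x => x.Text, FieldStorage.Yes);
    }
}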

What are the resources or tools used to manage temporal data in key-value stores?

I'm considering using MongoDB or CouchDB on a project that needs to maintain historical records. But I'm not sure how difficult it will be to store historical data in these databases.
For example, in his book "Developing Time-Oriented Database Applications in SQL," Richard Snodgrass points out tools for retrieving the state of data as of a particular instant, and he points out how to create schemas that allow for robust data manipulation (i.e. data manipulation that makes invalid data entry difficult).
Are there tools or libraries out there that make it easier to query, manipulate, or define temporal/historical structures for key-value stores?
edit:
Note that from what I hear, the 'version' data that CouchDB stores is erased during normal use, and since I would need to maintain historical data, I don't think that's a viable solution.
P.S. Here's a similar question that was never answered: key-value-store-for-time-series-data
There are a couple of options if you want to store the data in MongoDB. You could just store each version as a separate document; then you can query to get the object at a certain time, the object at all times, objects over ranges of time, etc. Each document would look something like:
{
  object : whatever,
  date : new Date()
}
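Reading a version back is then just a date-range query; for example, the latest version at or before a cutoff (a sketch; the object field carries the object's id, as in the updates below):

// Latest version of the object at or before a given cutoff date
db.foo.find({ object: objId, date: { $lte: cutoff } })
      .sort({ date: -1 })
      .limit(1)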
You could store all the versions of a document in the document itself, as mikeal suggested, using updates to push the object itself into a history array. In Mongo, this would look like:
db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})
// make changes to obj
...
db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})
A cooler (I think) and more space-efficient way, although less time-efficient, might be to store in the object itself a history of what changed at each time. Then you could replay the history to build the object at a certain time. For instance, you could have:
{
  object : startingObj,
  history : [
    { date : d1, addField : { x : 3 } },
    { date : d2, changeField : { z : 7 } },
    { date : d3, removeField : "x" },
    ...
  ]
}
Then, if you wanted to see what the object looked like between times d2 and d3, you could take the startingObj, add the field x with the value 3, set the field z to the value 7, and that would be the object at that time.
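A sketch of that replay in plain JavaScript (the helper name is made up):

// Rebuild the object as it looked at or before the cutoff date.
function objectAt(doc, cutoff) {
  var obj = Object.assign({}, doc.object);
  doc.history.forEach(function (entry) {
    if (entry.date > cutoff) return;          // ignore later changes
    if (entry.addField)    Object.assign(obj, entry.addField);
    if (entry.changeField) Object.assign(obj, entry.changeField);
    if (entry.removeField) delete obj[entry.removeField];
  });
  return obj;
}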
Whenever the object changed, you could atomically push actions to the history array:
db.foo.update({object : startingObj}, {$push : {history : {date : new Date(), removeField : "x"}}})
Yes, in CouchDB the revisions of a document are there for replication and are usually lost during compaction. I think UbuntuOne did something to keep them around longer but I'm not sure exactly what they did.
I have a document that I need the historical data on and this is what I do.
In CouchDB I have an _update function. The document has a "history" attribute, which is an array. Each time I call the _update function to update the document, I append the current document (minus the history attribute) to the history array, then update the document with the changes in the request body. This way I have the entire revision history of the document.
This is a little heavy for large documents; there are some javascript diff tools I was investigating, and I was thinking about storing only the diff between the documents, but I haven't done it yet.
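A sketch of such an _update handler (it lives in the design document's updates section; the change format is simplified to a flat JSON body):

function (doc, req) {
  var changes = JSON.parse(req.body);
  // Snapshot the current state, minus the history itself
  var snapshot = {};
  for (var key in doc) {
    if (key !== 'history') snapshot[key] = doc[key];
  }
  doc.history = doc.history || [];
  doc.history.push(snapshot);
  // Apply the incoming changes on top
  for (var field in changes) {
    doc[field] = changes[field];
  }
  return [doc, 'updated'];
}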
http://wiki.apache.org/couchdb/How_to_intercept_document_updates_and_perform_additional_server-side_processing
Hope that helps.
I can't speak for MongoDB, but for CouchDB it all really hinges on how you write your views.
I don't know the specifics of what you need, but if you have a unique id for a document throughout its lifetime and store a timestamp in that document, then you have everything you need for robust querying of that document.
For instance:
document structure:
{ "docid" : "doc1", "ts" : <unix epoch> ...<set of key value pairs> }
map function:
function (doc) {
  if (doc.docid && doc.ts) {
    emit([doc.docid, doc.ts], doc);
  }
}
The view will now output each doc and its revisions in historical order like so:
["doc1", 1234567], ["doc1", 1234568], ["doc2", 1234567], ["doc2", 1234568]
You can use view collation and start_key or end_key to restrict the returned documents.
start_key=["doc1", 1] end_key=["doc1", 9999999999999]
will return all historical copies of doc1
start_key=["doc2", 1234567] end_key=["doc2", 123456715]
will return all historical copies of doc2 between 1234567 and 123456715 unix epoch times.
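In practice these are query-string parameters on the view URL; for example (database, design document, and view names are made up, and the classic parameter spellings are startkey/endkey):

curl -g 'http://localhost:5984/mydb/_design/docs/_view/history?startkey=["doc1",1]&endkey=["doc1",9999999999999]'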
See ViewCollation for more details.