Retrieve last_visited value in Aerospike

According to the Aerospike docs, you can
use a UDF to compare the last_visited value of a record.
How do you access the last_visited value of a record? It is not included in the record's metadata.

I believe it's referring to the last time the record was "touched". It has to be an update, not merely a read: a read does not change the record in any way, whereas a touch operation updates its metadata. In the metadata, look for last_update_time (LUT), introduced in version 3.8.3. You can access a record's last_update_time only inside a UDF (a Lua User Defined Function), as record.last_update_time.
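For illustration, a minimal Lua UDF sketch (the module and function names here are my own, not from the docs):

function get_lut(rec)
    -- record.last_update_time is available inside UDFs on server 3.8.3+
    return record.last_update_time(rec)
end

Register the Lua module with the server and invoke the function on a key from your client to read the record's LUT.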

Related

AWS DMS CDC to S3 target

So I was playing around seeing what could be achieved using Database Migration Service (DMS) Change Data Capture, to take data from MSSQL to S3 and also Redshift.
The Redshift testing was fine: if I delete a record in my source DB, a second or two later the record disappears from Redshift. Same with insert/update etc.
But S3 ...
You get the original record from the first full load.
Then if you update a record in source, S3 receives a new copy of the record, marked with an 'I'.
If I delete a record, I get another copy of the record marked with a 'D'.
So my question is - what do I do with all this?
How would I query my S3 bucket to see the 'current' state of my data set as reflecting the source database?
Do I have to script some code myself to pick up all these files and process them, performing the inserts/updates and deletes until I finally resolve back to a 'normal' data set?
Any insight welcomed!
The records containing 'I', 'D' or 'U' are CDC data (change data capture). This is sometimes called "history" or "historical data". This type of data has applications in data warehousing and can also be used in many machine learning use cases.
Now coming to the next point: in order to get the 'current' state of the data set, you have to script/code it yourself. You can use AWS Glue to perform the task. For example, this post explains something similar.
If you do not want to maintain the Glue code, then a shortcut is not to use the S3 target with DMS directly, but to use the Redshift target, and once all CDC is applied, offload the final copy to S3 using the Redshift UNLOAD command.
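For example, a minimal UNLOAD sketch (schema, table, bucket path and IAM role below are placeholders):

UNLOAD ('SELECT * FROM my_schema.my_table')
TO 's3://my-bucket/current-state/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-unload-role'
FORMAT AS PARQUET;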
What 'I', 'U' and 'D' mean is explained here.
So what do we do to get the current state of the database? An alternative is to first add this additional column to the full-load files as well, i.e. the initially loaded files from before CDC should also carry it. How? DMS can do this for you if you enable the includeOpForFullLoad setting on the S3 target endpoint.
Now query the data in Athena in such a way that you keep only the records where Op NOT IN ('D', 'U'), or AR_H_OPERATION NOT IN ('DELETE', 'UPDATE'). That way you get the correct count (ONLY the count, since a 'U' only ever appears where there is already an 'I' for that entry).
SELECT count(*) FROM "database"."table_name"
WHERE Op NOT IN ('D','U')
Also, to get all the records (not just the count), you need a more complex SQL in Athena: exclude any key whose latest row is a 'D', and where a key appears more than once (an 'I' plus one or more 'U's), pick the latest one.
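A sketch of that idea, assuming your files carry a primary key column id and a change timestamp column ts (both names depend on your DMS settings):

WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM "database"."table_name"
)
SELECT *
FROM ranked
WHERE rn = 1      -- newest row per key
  AND Op <> 'D';  -- drop keys whose newest row is a delete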

Rally Lookback, Snapshot with empty custom field

I am trying to get a snapshot of a deleted user story to get the value of a custom field (c_Dep). I get the snapshot, but the custom field is empty. It had a value in it. Does Lookback not save values for customer-created custom fields?
findConfig: {
    _TypeHierarchy: 'HierarchicalRequirement',
    "ObjectID": 12345,
    "_ValidFrom": {
        "$lte": "2017-01-25T19:00:57.475Z"
    }
}
Sarita, it is hard to tell from the information you have given what is going on precisely. However, I can give you some pointers.
The Lookback API will store changes in values for custom fields. The selection you have shown is valid from 24th Jan to 25th Jan. During this period, was the custom field set? Probably not, because the array is only one long and I think it is showing the creation event.
Was the custom field updated to contain something after this time period?
The reason for asking is that a common misunderstanding is that the records stored in the Lookback database hold the current value of fields - they don't. They hold the changes in fields. If c_Dependencies didn't change during that time period, you may not see an entry returned in the array. The next entry in the database might be the record where the c_Dependencies field was set (changed from null to something), and that might be 'after' your time period filter.
It looks like your query is requesting snapshots earlier than 2017/1/25 ($lte). Since there's only one, it's probably the creation snapshot. If you get all snapshots for the ObjectID by removing the _ValidFrom parameter, you should see the changes made to c_Dep after artifact creation.
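For example, something along these lines (same ObjectID as above, with the time filter removed):

findConfig: {
    _TypeHierarchy: 'HierarchicalRequirement',
    "ObjectID": 12345
}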
As I am not allowed to comment, I have to post a new answer.
I think William Scott meant remove the ValidTo filter. The one you have is the creation change. The update will be afterwards.

How to implement a key lookup for a generated keys table in Pentaho Kettle

I just started to use Pentaho Kettle for integration. Seems great so far, quite intuitive compared to Talend, which I was also investigating.
I am trying to migrate some customers without their keys, so what I have is their email addresses.
The customer may already exist in the database, so what I need to do is:
If the customer exists, add its id to the imported field and continue.
But if the customer doesn't exist I need to get the next Hibernate key from the table Hibernate_Sequences and set it as the id.
But I don't want to always allocate a key, so I want to conditionally execute a step to allocate the next key.
So what I want to do in the flow is execute the DB procedure which allocates the next key and returns it, but only if there is no value in id from the "lookup id" step.
Is this possible?
Just posting my updated flow - so the answer was to use a Filter Rows component, which splits the stream on true/false. I really had trouble getting the id out of the database stored procedure because of a bug, so I had to use a decimal and then convert back to an integer (which I also couldn't figure out how to do, so I used a JavaScript component).
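For reference, the decimal-to-integer conversion in a Modified Java Script Value step can be as small as the line below (the incoming field name id is an assumption; add id_int as an Integer output field of the step):

// Round the decimal id returned by the stored procedure back to a whole number
var id_int = Math.round(id);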
Yes it is. As per the official documentation (I kept only the relevant part), "Lookup values are added as new fields onto the stream". So you just need to put a "Filter rows" step from the Flow section after the lookup and check for the "id" field, which should have been added in the "Existing Id Lookup" step.

What should be set as id in DB, if there is no proper field in id-less data

I am building a database from XML. A subtree of the XML at a specific level should be the equivalent of a single record.
The problem is that in this specific subtree there is no id, or any other value, that is unique across the whole XML. There is also no group of fields that would let me create an id based on several values.
Why do I need this id? Because later I will get an updated version of this XML, and I want to prepare a mechanism which will detect whether the record with a specific id changed, and update it.
I thought about a hash function like MD5, but this will fail: if the specific XML subtree changes, the MD5 will also change, so the new id will be useless for asking about the old record in the database.
I am not able to tell which fields in XML subtree can be changed.
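A minimal Python sketch of that failure mode (the XML strings are made up):

import hashlib

v1 = b"<customer><name>Ann</name><city>Oslo</city></customer>"
v2 = b"<customer><name>Ann</name><city>Bergen</city></customer>"

print(hashlib.md5(v1).hexdigest())  # id stored for the original record
print(hashlib.md5(v2).hexdigest())  # differs after the change, so the old row can't be found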

What do I gain by adding a timestamp column called recordversion to a table in ms-sql?

What do I gain by adding a timestamp column called recordversion to a table in ms-sql?
You can use that column to make sure your users don't overwrite data from another user.
Let's say user A pulls up record 1 and at the same time user B pulls up record 1. User A edits the record and saves it. Five minutes later, user B edits the record - but doesn't know about user A's changes. When he saves his changes, you use the recordversion column in your UPDATE's WHERE clause, which will prevent user B from overwriting what user A did. You can detect this out-of-date condition and throw some kind of "data out of date" error.
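A hedged T-SQL sketch of that check (table and column names are placeholders):

UPDATE dbo.MyTable
SET    SomeColumn = @NewValue
WHERE  Id = @Id
  AND  RecordVersion = @OriginalRecordVersion;  -- value read when the record was loaded

IF @@ROWCOUNT = 0
    RAISERROR('Record was changed by another user.', 16, 1);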
Nothing that I'm aware of, or that Google seems to find quickly.
You don't get anything inherent by using that name for a column. Sure, you can create a column and do the record versioning as described in the other answer, but there's nothing special about the column name. You could call the column anything you want and do versioning, and you could call any column RecordVersion and nothing special would happen.
Timestamp is mainly used for replication. I have also used it successfully to determine whether the data had been updated since the last feed to the client (when I needed to send a delta feed), and thus to pick out only the records which had changed since then. This does require another table that stores the value of the timestamp (in a varbinary field) at the time you run the report, so you can compare against it on the next run.
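A sketch of that delta-feed pattern (FeedLog and the column names are placeholders):

DECLARE @LastRun varbinary(8);
SELECT @LastRun = LastTimestamp FROM dbo.FeedLog WHERE FeedName = 'client-feed';

SELECT *
FROM   dbo.MyTable
WHERE  RecordVersion > @LastRun;  -- rows changed since the last feed

UPDATE dbo.FeedLog                -- remember the new high-water mark
SET    LastTimestamp = @@DBTS
WHERE  FeedName = 'client-feed';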
If you think that timestamp records the date or time of the last update, it does not do that; you would need datetime fields plus constraints (to set the original datetime) and triggers (to maintain it on update) to store that information.
Also, keep in mind if you want to keep track of your data, it's a good idea to add these four columns to every table:
CreatedBy(varchar) | CreatedOn(date) | ModifiedBy(varchar) | ModifiedOn(date)
While it doesn't give you full history, it lets you know who created an entry and when, and who last modified it and when. Those four columns create pretty powerful tracking abilities without any serious overhead for your DB.
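For example (dbo.MyTable is a placeholder):

ALTER TABLE dbo.MyTable ADD
    CreatedBy  varchar(50) NOT NULL DEFAULT SUSER_SNAME(),
    CreatedOn  datetime    NOT NULL DEFAULT GETDATE(),
    ModifiedBy varchar(50) NULL,
    ModifiedOn datetime    NULL;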
Obviously, you could create a full-blown logging system that tracks every change and gives you complete history, but that's not the solution for the issue I think you are describing.