How to implement a key lookup for generated keys table in pentaho Kettle - pentaho

I just started to use Pentaho Kettle for integration. Seems great so far, quite intuitive compared to Talend, which I was also investigating.
I am trying to migrate some customers without their keys. So I have their email addresses.
The customer may already exist in the database, so what I need to do is:
If the customer exists, add it's id to the imported field and continue.
But if the customer doesn't exist I need to get the next Hibernate key from the table Hibernate_Sequences and set it as the id.
But I don't want to always allocate a key, so I want to conditionally execute a step to allocate the next key.
So what I want to do, is in the flow execute the db procedure, which allocates the next key and returns it, only if there's no value in id from the "lookup id" step.
Is this possible?
Just posting my updated flow - so the answer was to use a filter rows component which splits the data on true/false. I really had trouble getting the id out of the database stored proc because of a bug, so I had to use decimal and then convert back to integer (which I also couldn't figure out how to do, so used a javascript component).

Yes it is. As per official documentation (i left only valuable information) "Lookup values are added as new fields onto the stream". So u need just to put step "Filter row" in Flow section and check for "id" which suppose to be added in "Existing Id Lookup" step.

Related

SSIS Inserting incrementing ID with starting range into multiple tables at a time

Is there are one or some reliable variants to solve easy task?
I've got a number of XML files which will be converting into 6 SQL tables (via SSIS).
Before the end of this process i need to add a new (in fact - common for all tables) column (or field) into each of them.
This column represents ID with assigning range and +1 incrementing step. Like (350000, 1)
Yes, i know how to solve it on SSMS SQL stage. But i need a solution at SSIS's pre-SQL converting lvl.
I'm sure there should be well-known pattern-solutions to deal with it.
I am going to take a stab at this. Just to be clear, I don't have a lot of information in your question to go on.
Most XML files that I have dealt with have a common element (let's call it a customer) with one to many attributes (this can be invoices, addresses, email, contacts, etc).
So your table structure will be somewhat star shaped around the customer.
So your XML will have a core customer information on a 1 to 1 basis that can be loaded into a single main table, and will have array information of invoices and an array of addresses etc. Those arrays would be their own tables referencing the customer as a key.
I think you are asking how to create that key.
Load the customer data first and return the identity column to be used as a foreign key when loading the other tables.
I find it easiest to do so in script component. I'm only going to explain how to get the key back. I personally would handle the whole process in C# (deserializing and all).
Add this to Using Block:
Using System.Data.OleDB;
Add this into your main or row processing depending on where the script task / component is:
string SQL = #"INSERT INTO Customer(CustName,field1, field2,...)
values(?,?,?,...); Select cast(scope_identity() as int);";
OleDBCommanad cmd = new OleDBCommand();
cmd.CommandType = System.Data.CommandType.Text;
cmd.CommandText = SQL;
cmd.Parameters.AddWithValue("#p1",[CustName]);
...
cmd.Connection.Open();
int CustomerKey = (int)cmd.ExecuteScalar(); //ExecuteScalar returns the value in first row / first column which in our case is scope_identity
cmd.Connection.Close();
Now you can use CustomerKey for all of the other tables.

How can i control the multiple users accessing the same database

I am making my first project using vb.net and access. I am trying to develop a project in which the data of patients of the is added from different counters. what I developed is working fine when only one counter adds data in database but when two counters access the same database and try to save the record it gives error primary key can not be duplicate.
I doing what, first I am generating a primary key number i.e. patient no that is unique to every patient. patient no. is one increment to the last saved record. then the user enter (counter data entry operator) adds the patient details and then hit the save button.
In multi-user environment both the operators generate the same patient no. when they hit the new record button as both see the same last saved record. while saving the record one operator save the record successfully but other operator get the duplicate primary key error.
Pessimistic and optimistic locks are not working for me or i am not understanding how to use them.
rsS As New ADODB.Recordset
rsS.Open(str, conn, ADODB.CursorTypeEnum.adOpenDynamic, ADODB.LockTypeEnum.adLockPessimistic)
What I have tried:
I also tried to solve this problem by saving the patient no. in another variable oldPatientno and beore saving the record checked if there is any change in the database if so then regenerate the patient no. but this is not working.

Redgate SQL Data Generator -> Reviewing sqlgen project -> What does "Same as mapped data" mean?

I am reviewing a coworkers sqlgen job and I am unable to figure out what this means in the table generation settings.
Specify number of rows by: "Same as mapped data"
My coworker has this selected on each table, I just need to know what is meant by this I have looked through documentation and been unable to find a definition for this.
I am on version 2 at the moment. Probably not the best question but I need an answer and he is gone for a long period of time and our data is not working correctly with this tool.
The "Same as mapped data" option is only available when you're using an existing table or view as a data source - it just means that the generator will insert all the rows from the source table or view. The other options are:
Numeric value - a set number of rows
Proportion of table - a proportion of the source table/view
Generation time - as much data as the tool can generate in a set time
There's a little more about using an existing table/view as a data source here on the website, but it doesn't have much else useful in it.

Check if a given DB object used in Oracle?

Hi does any one know how to check if a given DB object (Table/View/SP/Function) is used inside Oracle.
For example to check if the table "A" is used in any SP/Function or View definitions. I am trying to cleanup unused objects in the database.
I tried the query select * from all_source WHERE TEXT like '%A%' (A is the table name). Do you thing it is safe to assume it is not being used if it does not return any results?
From this ASKTOM question:
You'll have to enable auditing and then come back in 3 months to see.
We don't track this information by default -- also, even with auditing, it may be very
possible to have an object that is INDIRECTLY accessed (eg: via a foreign key for
example) that won't show up.
You can try USER_DEPENDENCIES but that won't tell you about objects referenced by code in
client apps or via dynamic sql
There's code in the thread for checking ALL_SOURCE, but it's highlighted that this isn't a silver bullet.

What do I gain by adding a timestamp column called recordversion to a table in ms-sql?

What do I gain by adding a timestamp column called recordversion to a table in ms-sql?
You can use that column to make sure your users don't overwrite data from another user.
Lets say user A pulls up record 1 and at the same time user B pulls up record 1. User A edits the record and saves it. 5 minutes later, User B edits the record - but doesn't know about user A's changes. When he saves his changes, you use the recordversion column in your update where clause which will prevent User B from overwriting what User A did. You could detect this invalid condition and throw some kind of data out of date error.
Nothing that I'm aware of, or that Google seems to find quickly.
You con't get anything inherent by using that name for a column. Sure, you can create a column and do the record versioning as described in the next response, but there's nothing special about the column name. You could call the column anything you want and do versioning, and you could call any column RecordVersion and nothing special would happen.
Timestamp is mainly used for replication. I have also used it successfully to determine if the data has been updated since the last feed to the client (when I needed to send a delta feed) and thus pick out only the records which have changed since then. This does require having another table that stores the values of the timestamp (in a varbinary field) at the time you run the report so you can use it compare on the next run.
If you think that timestamp is recording the date or time of the last update, it does not do that, you would need dateTime fields and constraints (To get the orginal datetime)and triggers (to update) to store that information.
Also, keep in mind if you want to keep track of your data, it's a good idea to add these four columns to every table:
CreatedBy(varchar) | CreatedOn(date) | ModifiedBy(varchar) | ModifiedOn(date)
While it doesn't give you full history, it lets you know who and when created an entry, and who and when last modified it. Those 4 columns create pretty powerful tracking abilities without any serious overhead to your DB.
Obviously, you could create a full-blown logging system that tracks every change and gives you full-blown history, but that's not the solution for the issue I think you are proposing.