SSIS: inserting an incrementing ID with a starting range into multiple tables at a time

Is there a reliable way (or several) to solve this simple task?
I have a number of XML files that will be converted into 6 SQL tables (via SSIS).
Before the end of this process I need to add a new column (the same one for all tables) to each of them.
This column holds an ID with a given starting value and an increment of 1, like (350000, 1).
Yes, I know how to solve it at the SQL stage in SSMS. But I need a solution at the SSIS level, before the data reaches SQL.
I'm sure there must be well-known patterns for this.

I am going to take a stab at this. Just to be clear, I don't have a lot of information in your question to go on.
Most XML files that I have dealt with have a common element (let's call it a customer) with one to many attributes (this can be invoices, addresses, email, contacts, etc).
So your table structure will be somewhat star shaped around the customer.
So your XML will have core customer information on a 1-to-1 basis that can be loaded into a single main table, plus arrays of invoices, addresses, and so on. Each of those arrays would be its own table referencing the customer as a key.
I think you are asking how to create that key.
Load the customer data first and return the identity column to be used as a foreign key when loading the other tables.
I find it easiest to do this in a script component. I'm only going to explain how to get the key back. Personally, I would handle the whole process in C# (deserializing and all).
Add this to the using block:
using System.Data.OleDb;
Add this to your main method or row-processing method, depending on where the script task / component is:
string SQL = @"INSERT INTO Customer(CustName,field1, field2,...)
values(?,?,?,...); Select cast(scope_identity() as int);";
OleDbCommand cmd = new OleDbCommand();
cmd.CommandType = System.Data.CommandType.Text;
cmd.CommandText = SQL;
cmd.Parameters.AddWithValue("@p1",[CustName]); // OLE DB binds the ? markers by position, so the name is only a label
...
// cmd.Connection must be set to an OleDbConnection for the target database before this point
cmd.Connection.Open();
int CustomerKey = (int)cmd.ExecuteScalar(); // ExecuteScalar returns the first column of the first row, which here is scope_identity()
cmd.Connection.Close();
Now you can use CustomerKey for all of the other tables.
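The same "insert the parent, read back the generated key, reuse it for the children" pattern can be sketched in Python. This is only an illustration under assumptions: it uses the standard-library sqlite3 module as a stand-in for SQL Server, `cursor.lastrowid` plays the role of scope_identity(), and the Invoice table, its columns, and the `load_customer` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real target (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerKey INTEGER PRIMARY KEY, CustName TEXT)")
conn.execute(
    "CREATE TABLE Invoice ("
    "InvoiceKey INTEGER PRIMARY KEY, "
    "CustomerKey INTEGER REFERENCES Customer(CustomerKey), "
    "Amount REAL)"
)

def load_customer(conn, cust_name, invoice_amounts):
    """Insert one customer, then its invoices keyed by the generated id."""
    cur = conn.execute("INSERT INTO Customer (CustName) VALUES (?)", (cust_name,))
    customer_key = cur.lastrowid  # generated key of the row just inserted
    conn.executemany(
        "INSERT INTO Invoice (CustomerKey, Amount) VALUES (?, ?)",
        [(customer_key, amount) for amount in invoice_amounts],
    )
    return customer_key

key_a = load_customer(conn, "Acme", [10.0, 20.0])
key_b = load_customer(conn, "Globex", [5.0])
```

Every child table loaded afterwards can take the returned key as its foreign key in the same way.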

Related

Implementing Pure SCD Type 6 in Pentaho

I have an interesting task: creating a Kettle transformation that loads a table which is a pure Type 6 dimension. This is driving me crazy.
Assume the below table structure
|CustomerId|Name|Value|startdate|enddate|
|1|A|value1|01-01-2001|31-12-2199|
|2|B|value2|01-01-2001|31-12-2199|
Then comes my input file
Name,Value,startdate
A,value4,01-01-2010
C,value3,01-01-2010
After the kettle transformation the data must look like
|CustomerId|Name|Value|startdate|enddate|
|1|A|value1|01-01-2001|31-12-2009|
|1|A|value4|01-01-2010|31-12-2199|
|2|B|value2|01-01-2001|31-12-2199|
|3|C|value3|01-01-2010|31-12-2199|
Check for existing data and find if the incoming record is insert/update
Then generate Surrogate keys only for the insert records & perform inserts.
Retain the surrogate keys for the update records: insert each one as a new record with an open end date (a very high value), and close the previous corresponding record by setting its end date to the new record's start date - 1.
Can someone please suggest the best way of doing this? I could only find Type 1 and Type 2 handling in the Dimension Lookup/Update step.
I did this using a mixed approach of ETLT.
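The three steps listed in the question can be sketched in plain Python to make the expected result concrete. This is not a Kettle transformation, just a minimal illustration; the function name `apply_changes`, the dict-based rows, and the "max id + 1" key generator are assumptions.

```python
from datetime import date, timedelta

OPEN_END = date(2199, 12, 31)  # the "very high" open end date from the example

def apply_changes(dimension, incoming):
    """Return the dimension rows after applying the incoming file rows."""
    rows = [dict(r) for r in dimension]  # work on copies
    next_id = max((r["CustomerId"] for r in rows), default=0) + 1
    for new in incoming:
        # Find the currently open record for this name, if any.
        current = next(
            (r for r in rows if r["Name"] == new["Name"] and r["enddate"] == OPEN_END),
            None,
        )
        if current is None:
            # Insert case: generate a new surrogate key.
            customer_id = next_id
            next_id += 1
        else:
            # Update case: keep the key and close the previous record.
            customer_id = current["CustomerId"]
            current["enddate"] = new["startdate"] - timedelta(days=1)
        rows.append({
            "CustomerId": customer_id,
            "Name": new["Name"],
            "Value": new["Value"],
            "startdate": new["startdate"],
            "enddate": OPEN_END,
        })
    return sorted(rows, key=lambda r: (r["CustomerId"], r["startdate"]))

dimension = [
    {"CustomerId": 1, "Name": "A", "Value": "value1", "startdate": date(2001, 1, 1), "enddate": OPEN_END},
    {"CustomerId": 2, "Name": "B", "Value": "value2", "startdate": date(2001, 1, 1), "enddate": OPEN_END},
]
incoming = [
    {"Name": "A", "Value": "value4", "startdate": date(2010, 1, 1)},
    {"Name": "C", "Value": "value3", "startdate": date(2010, 1, 1)},
]
result = apply_changes(dimension, incoming)
```

With the sample data from the question, `result` holds four rows: the closed record for A ending 31-12-2009, the new open record for A with key 1, the untouched record for B, and a new record for C with key 3.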

How to implement a key lookup for generated keys table in pentaho Kettle

I just started to use Pentaho Kettle for integration. Seems great so far, quite intuitive compared to Talend, which I was also investigating.
I am trying to migrate some customers without their keys. So I have their email addresses.
The customer may already exist in the database, so what I need to do is:
If the customer exists, add its id to the imported field and continue.
But if the customer doesn't exist I need to get the next Hibernate key from the table Hibernate_Sequences and set it as the id.
But I don't want to always allocate a key, so I want to conditionally execute a step to allocate the next key.
So what I want to do in the flow is execute the db procedure, which allocates the next key and returns it, only if there is no value in id after the "lookup id" step.
Is this possible?
Just posting my updated flow: the answer was to use a Filter rows step, which splits the data on true/false. I had real trouble getting the id out of the database stored procedure because of a bug, so I had to use a decimal and then convert it back to an integer (which I also couldn't figure out how to do, so I used a JavaScript step).
Yes, it is. As the official documentation says (I kept only the relevant part), "Lookup values are added as new fields onto the stream". So you just need to add a "Filter rows" step from the Flow section and check the "id" field, which should have been added by the "Existing Id Lookup" step.
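The lookup-then-filter flow can be illustrated in Python. This is a sketch of the logic only, not of Kettle itself: `assign_ids`, the `existing_ids` dict (standing in for the lookup step), and the `allocate_key` callable (standing in for the stored procedure) are all hypothetical names.

```python
from itertools import count

def assign_ids(rows, existing_ids, allocate_key):
    """Attach an id to every row, allocating a new key only when needed."""
    result = []
    for row in rows:
        found = existing_ids.get(row["email"])  # the "lookup id" step
        if found is None:                       # the filter: id is null
            found = allocate_key()              # call the procedure only here
        result.append({**row, "id": found})
    return result

# A simple counter stands in for the sequence table (assumption).
sequence = count(1000)
rows = [{"email": "known@example.com"}, {"email": "new@example.com"}]
existing_ids = {"known@example.com": 7}
assigned = assign_ids(rows, existing_ids, lambda: next(sequence))
```

Only the second row triggers an allocation, so no keys are consumed for customers that already exist.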

SQL UPDATE same table

I know this has been posted before, but I am not sure I have got my head around the logic, let alone how to get it into JET-friendly syntax.
Here is what I am trying to do.
I have a bunch of records that relate to documents, and I am planning on renaming the documents with GUIDs. However, some records point to the same document, and here lies the problem.
Table
ID, LegacyFullPathNme, GUID, isDuplicate
My code loops through and assigns each record a GUID. Then I want to update the duplicate document records with the same GUID.
Below is my attempt at it, but it doesn't work: "Operation must use an updateable query".
UPDATE [IO Documents] a
set a.codedFileName = (SELECT B.codedFileName
FROM [IO Documents] b
WHERE b.LegacyFullPathName = a.LegacyFullPathName)
Currently I use a macro to go through it RBAR (row by agonizing row).
I'm a little confused on why you would do it this way since now your globally unique id column isn't unique in that multiple rows will have it.
I think a better method would be to simply create a new table from your old one with a row for each file path.
SELECT LegacyFullPathNme
INTO newtable
FROM oldtable
GROUP BY LegacyFullPathNme;
and then add the guid into the new table afterwards. (note that I didn't test that sql snippet so that might not be proper syntax but I think it gets the point across).
I believe you are looking for something like this:
UPDATE [IO Documents] SET
codedFileName = DMin("codedFileName","IO Documents","LegacyFullPathName='" & LegacyFullPathName & "'")
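The intent of the question (one GUID per distinct path, shared by all duplicate records) can also be sketched outside the database. The following Python is an illustration only; it assumes the records are dicts with the columns named in the question, and `assign_guids` is a hypothetical helper.

```python
import uuid

def assign_guids(records):
    """Give every record a GUID, reusing it for records with the same path."""
    guid_by_path = {}
    for rec in records:
        path = rec["LegacyFullPathName"]
        rec["isDuplicate"] = path in guid_by_path  # seen this document before?
        if path not in guid_by_path:
            guid_by_path[path] = str(uuid.uuid4())
        rec["GUID"] = guid_by_path[path]
    return records

records = assign_guids([
    {"ID": 1, "LegacyFullPathName": r"C:\docs\a.pdf"},
    {"ID": 2, "LegacyFullPathName": r"C:\docs\b.pdf"},
    {"ID": 3, "LegacyFullPathName": r"C:\docs\a.pdf"},
])
```

Records 1 and 3 end up with the same GUID, and only record 3 is flagged as a duplicate.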

SQL Server: Remove substrings from field data by iterating through a table of city names

I have two databases, Database A and Database B.
Database A contains some data which needs to be placed in a table in Database B. However, before that can happen, some of that data must be “cleaned up” in the following way:
The table in Database A which contains the data to be placed in Database B has a field called “Desc.” Every now and then the users of the system put city names in with the data they enter into the “Desc” field. For example: a user may type in “Move furniture to new cubicle. New York. Add electric.”
Before that data can be imported into Database B the word “New York” needs to be removed from that data so that it only reads “Move furniture to new cubicle. Add electric.” However—and this is important—the original data in Database A must remain untouched. In other words, Database A’s data will still read “Move furniture to new cubicle. New York. Add electric,” while the data in Database B will read “Move furniture to new cubicle. Add electric.”
Database B contains a table which has a list of the city names which need to be removed from the “Desc” field data from Database A before being placed in Database B.
How do I construct a stored procedure or function which will grab the data from Database A, then iterate through the Cities table in Database B and if it finds a city name in the “Desc” field will remove it while keeping the rest of the information in that field thus creating a recordset which I can then use to populate the appropriate table in Database B?
I have tried several things but still haven’t cracked it. Yet I’m sure this is probably fairly easy. Any help is greatly appreciated!
Thanks.
EDIT:
The latest thing I have tried to solve this problem is this:
DECLARE @cityName VarChar(50)
While (Select COUNT(*) From ABCScanSQL.dbo.tblDiscardCitiesList) > 0
Begin
Select @cityName = ABCScanSQL.dbo.tblDiscardCitiesList.CityName FROM ABCScanSQL.dbo.tblDiscardCitiesList
SELECT JOB_NO, LTRIM(RTRIM(SUBSTRING(JOB_NO, (LEN(job_no) -2), 5))) AS LOCATION
,JOB_DESC, [Date_End] , REPLACE(Job_Desc,@cityName,' ') AS NoCity
FROM fmcs_tables.dbo.Jobt WHERE Job_No like '%loc%'
End
"Job_Desc" is the field which needs to have the city names removed.
This is a data quality issue. You can always make a copy of the [description] in Database A and call it [cleaned_desc].
One simple solution is to write a function that does the following.
1 - Read data from [tbl_remove_these_words]. These are the phrases you want removed.
2 - Compare the input, @var_description, to the rows in the table.
3 - Upon a match, replace the matched phrase with an empty string.
This solution depends upon a cleansing table that you maintain and update.
Run an update query that passes [description] to [fn_remove_these_words] and sets [cleaned_desc] to the output.
Another solution is to look at products like the Melissa Data (DQ) product for SSIS, or Data Quality Services in the SQL Server stack, which give you an application framework for solving the problem.
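The logic of such a cleansing function can be sketched in Python before writing it in T-SQL. This is a minimal illustration, not the function from the answer: `remove_phrases` is a hypothetical name, the phrase list stands in for the cleansing table, and removing one trailing period and collapsing extra spaces are added assumptions about how the result should look.

```python
import re

def remove_phrases(description, phrases):
    """Return a copy of description with every listed phrase removed."""
    cleaned = description
    # Longest phrases first, so "New York" is removed before "York".
    for phrase in sorted(phrases, key=len, reverse=True):
        # Whole-phrase match, ignoring case, plus one trailing period if present.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b\.?", re.IGNORECASE)
        cleaned = pattern.sub("", cleaned)
    # Collapse the gaps left behind by the removals.
    return re.sub(r"\s{2,}", " ", cleaned).strip()

original = "Move furniture to new cubicle. New York. Add electric."
cleaned = remove_phrases(original, ["York", "New York"])
# cleaned == "Move furniture to new cubicle. Add electric."
```

The input string is not modified, which mirrors the requirement that the data in Database A stays untouched.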

Duplicate a record and its references in web2py

In my web2py application I have a requirement to duplicate a record and all its references.
For example
One user has a product (sponserid is the user), and this product has many features stored in other tables (which reference the product id).
My requirement is that when another user copies this product, a new record is generated in the product table with a new productid and the new sponserid. All the records in the referencing tables are also duplicated with the new product id. Effectively, a duplicate entry is created in every table, and the only changes are the product id and sponserid.
The product table's fields will change over time, so I have to write a dynamic query.
I would like to write code like the following:
product = db(db.tbl_product.id==productid).select(db.tbl_product.ALL).first()
newproduct = db.tbl_product.insert(sponserid=newsponserid)
for field,value in product.iteritems():
    if field!='sponserid':
        db(db.tbl_product.id==newproduct).update(field=value)
But I cannot refer to a field name like this in the update function.
Also I would like to know if there is any other better logic to achieve this requirement.
I would greatly appreciate any suggestions.
For the specific problem of using the .update() method when the field name is stored in a variable, you can do:
db(db.tbl_product.id==newproduct).update(**{field: value})
But an easier approach altogether would be something like this:
product = db(db.tbl_product.id==productid).select(db.tbl_product.ALL).first()
product.update(sponserid=newsponserid)
db.tbl_product.insert(**db.tbl_product._filter_fields(product))
The .update() method applied to the Row object updates only the Row object, not the original record in the db. The ._filter_fields() method of the table takes a record (Row, Storage, or plain dict) and returns a dict including only the fields that belong to the table (it also filters out the id field, which the db will auto-generate).
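The answer covers the product record itself; copying the referencing records follows the same idea. The sketch below shows the whole copy with the standard-library sqlite3 module rather than the web2py DAL, so it is an illustration under assumptions: `copy_rows` is a hypothetical helper, and the tbl_feature table and the name, productid, and label columns are invented for the example.

```python
import sqlite3

def copy_rows(conn, table, where_col, where_val, overrides):
    """Copy matching rows of `table`, replacing the columns in `overrides`.

    Column names are read from the table itself, so the copy keeps working
    when fields are added. The id column is left for the db to generate.
    """
    cols = [info[1] for info in conn.execute(f"PRAGMA table_info({table})") if info[1] != "id"]
    col_list = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    source = conn.execute(
        f"SELECT {col_list} FROM {table} WHERE {where_col} = ?", (where_val,)
    ).fetchall()
    new_ids = []
    for row in source:
        values = dict(zip(cols, row))
        values.update(overrides)
        cur = conn.execute(
            f"INSERT INTO {table} ({col_list}) VALUES ({marks})",
            [values[c] for c in cols],
        )
        new_ids.append(cur.lastrowid)
    return new_ids

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_product (id INTEGER PRIMARY KEY, name TEXT, sponserid INTEGER)")
conn.execute("CREATE TABLE tbl_feature (id INTEGER PRIMARY KEY, productid INTEGER, label TEXT)")
conn.execute("INSERT INTO tbl_product (name, sponserid) VALUES ('widget', 10)")
conn.executemany("INSERT INTO tbl_feature (productid, label) VALUES (?, ?)", [(1, "red"), (1, "large")])

# Copy product 1 for sponsor 20, then copy its features to the new product id.
new_product_id = copy_rows(conn, "tbl_product", "id", 1, {"sponserid": 20})[0]
copy_rows(conn, "tbl_feature", "productid", 1, {"productid": new_product_id})
```

In web2py the same loop could be written over the referencing tables with `insert(**...)`, using the generated product id as the override.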