Loading a fact table using Pentaho Data Integration

I am using Pentaho DI to insert data into a fact table. The table from which I populate the fact table contains 10,000 records and changes frequently. Using database lookups and insert/update I am able to load the fact table correctly once. But when new records are added to my source table (say it grows to 15,000) and I run the load again, all 15,000 records are added to the fact table again. What I want is to add only the 5,000 new records that do not yet exist in the fact table. Please suggest which transformations I need to perform to achieve this.

Try an upsert instead of a plain insert: if the row already exists, update it; if not, insert it.
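The set-based idea behind "add only what is missing" can be sketched as follows. This is a minimal illustration, not a Pentaho transformation: it uses Python's built-in sqlite3 module as a stand-in database, and the names fact_sales, source_orders, order_id and amount are hypothetical.

```python
import sqlite3


def load_new_rows(conn: sqlite3.Connection) -> int:
    """Insert only the source rows whose key is not yet in the fact table."""
    cur = conn.execute(
        "INSERT INTO fact_sales (order_id, amount) "
        "SELECT s.order_id, s.amount "
        "FROM source_orders AS s "
        "WHERE NOT EXISTS ("
        "    SELECT 1 FROM fact_sales AS f WHERE f.order_id = s.order_id"
        ")"
    )
    conn.commit()
    return cur.rowcount  # number of rows actually inserted


# Hypothetical schema: order_id is the business key shared by both tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, amount REAL)")

conn.executemany(
    "INSERT INTO source_orders VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 11)]
)
first_load = load_new_rows(conn)   # initial load: every source row is new

conn.executemany(
    "INSERT INTO source_orders VALUES (?, ?)", [(i, i * 1.5) for i in range(11, 16)]
)
second_load = load_new_rows(conn)  # only the rows added since the first load
third_load = load_new_rows(conn)   # nothing new, so nothing is inserted

fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

The primary key on the fact table is a second line of defence: even if the filter were wrong, a duplicate key would raise an error instead of silently doubling the data.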

You can also let the database do the work.
SQL Server 2008 introduced the MERGE statement, which solves this type of problem.
Here is an example for SQL Server 2008:
MERGE Production.UnitMeasure AS target
USING (SELECT @UnitMeasureCode, @Name) AS source (UnitMeasureCode, Name)
ON (target.UnitMeasureCode = source.UnitMeasureCode)
WHEN MATCHED THEN
    UPDATE SET Name = source.Name
WHEN NOT MATCHED THEN
    INSERT (UnitMeasureCode, Name)
    VALUES (source.UnitMeasureCode, source.Name)
OUTPUT deleted.*, $action, inserted.* INTO #MyTempTable;

Related

How to update oracle table with long string in where clause

I am using bulk copy to insert data from a DataTable (filled from an Oracle database) into a SQL table. That part works without any problem. Once the data has been inserted correctly, I try to update a field in the Oracle table for every key in that DataTable. Schematically, my approach looks like this:
update table1 set column1=1 where id in ( all keys of above datatable)
It does not work: Oracle refuses to run the statement because the string literal is too long.
How can I solve this? I do not want to create a temp table in Oracle because this service runs all the time.
I'd consider using a subquery instead, e.g.
update table1 set
column1 = 1
where id in (select key
from above_datatable
)
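A subquery only helps when the keys are in a table Oracle can read. If they exist only in the client-side DataTable, one alternative (a different technique from the subquery above) is to send the keys in batches as bind parameters, so no long literal is ever built. The sketch below uses Python's sqlite3 module as a stand-in; the helper names and the batch size of 500 are assumptions, chosen to stay under Oracle's limit of 1000 expressions in an IN list. Oracle drivers use a different placeholder syntax than the `?` shown here.

```python
import sqlite3
from typing import Iterator, List, Sequence


def chunks(keys: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of keys, each at most `size` long."""
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


def mark_rows(conn: sqlite3.Connection, keys: Sequence[int], chunk_size: int = 500) -> int:
    """Set column1 = 1 for every id in keys, one bounded IN list per batch."""
    updated = 0
    for batch in chunks(keys, chunk_size):
        # Only "?" markers are interpolated; the key values travel as parameters.
        placeholders = ", ".join("?" for _ in batch)
        cur = conn.execute(
            f"UPDATE table1 SET column1 = 1 WHERE id IN ({placeholders})", batch
        )
        updated += cur.rowcount
    conn.commit()
    return updated


# Stand-in data: 1200 rows, of which every second id is to be marked.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, column1 INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO table1 (id) VALUES (?)", [(i,) for i in range(1, 1201)])

keys = list(range(1, 1201, 2))
updated = mark_rows(conn, keys)
marked = conn.execute("SELECT COUNT(*) FROM table1 WHERE column1 = 1").fetchone()[0]
```

All batches are committed together at the end, so a failure partway through leaves the table unchanged once the transaction is rolled back.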

Using MERGE in SQL Server 2012 to insert/update data

I am using SQL Server 2012 and have two tables with identical structure. I want to insert new records from table 1 to table 2 if they don't already exist in table 2.
If they already exist, I want to update all of the existing records in table 2.
There are some 30 columns in my tables and I want to update all of them.
Can someone please help with this? I had a look at various links posted on the internet, but I don't quite understand what my statement should look like.
It's really not that hard....
You need:
a source table (or query) to provide data
a target table to merge it into
a condition on which those two tables are checked
a statement saying what to do if a match (on that condition) is found
a statement saying what to do if NO match (on that condition) is found
So basically, it's something like:
-- this is your TARGET table - this is where the data goes into
MERGE dbo.SomeTable AS target
-- this is your SOURCE table where the data comes from
USING dbo.AnotherTable AS source
-- this is the CONDITION they have to "meet" on
ON (target.SomeColumn = source.AnotherColumn)
-- if there's a match, so if that row already exists in the target table,
-- then just UPDATE whatever columns in the existing row you want to update
WHEN MATCHED THEN
UPDATE SET Name = source.Name,
OtherCol = source.SomeCol
-- if there's NO match, that is the row in the SOURCE does *NOT* exist in the TARGET yet,
-- then typically INSERT the new row with whichever columns you're interested in
WHEN NOT MATCHED THEN
INSERT (Col1, Col2, ...., ColN)
VALUES (source.Val1, source.Val2, ...., source.ValN);
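To make the two branches concrete, here is a minimal sketch of what MERGE does, written as two separate statements: an UPDATE for the matched rows and an INSERT for the unmatched ones. It is an emulation, not T-SQL: it runs on Python's sqlite3 module, which has no MERGE, and the tables and columns (target, source, id, name, other_col) are hypothetical.

```python
import sqlite3
from typing import Tuple

# WHEN MATCHED: copy the source values onto rows that exist in both tables.
UPDATE_MATCHED = (
    "UPDATE target "
    "SET name = (SELECT s.name FROM source AS s WHERE s.id = target.id), "
    "    other_col = (SELECT s.other_col FROM source AS s WHERE s.id = target.id) "
    "WHERE EXISTS (SELECT 1 FROM source AS s WHERE s.id = target.id)"
)

# WHEN NOT MATCHED: add the source rows that the target does not have yet.
INSERT_NOT_MATCHED = (
    "INSERT INTO target (id, name, other_col) "
    "SELECT s.id, s.name, s.other_col FROM source AS s "
    "WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)"
)


def merge_tables(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Run both branches in one transaction; return (updated, inserted)."""
    with conn:  # commits on success, rolls back if either statement fails
        updated = conn.execute(UPDATE_MATCHED).rowcount
        inserted = conn.execute(INSERT_NOT_MATCHED).rowcount
    return updated, inserted


conn = sqlite3.connect(":memory:")
for table in ("target", "source"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, other_col TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?, ?)", [(1, "old", "x"), (2, "keep", "y")])
conn.executemany("INSERT INTO source VALUES (?, ?, ?)", [(1, "new", "a"), (3, "added", "c")])
conn.commit()

counts = merge_tables(conn)
rows = conn.execute("SELECT id, name, other_col FROM target ORDER BY id").fetchall()
```

The UPDATE runs first on purpose: if the INSERT ran first, the freshly inserted rows would also match and be updated a second time for no benefit.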

How to overwrite matching key in "load from" statement in informix sql

I have a table (in Informix) which stores two columns: empId and status (Y/N). There are some other scripts which, when run, update the status of these employee IDs based on certain conditions.
The task at hand is this: a user provides a path to a file containing employee IDs. I have a script which then reads this file and does a "load from user_supplied_file insert into employeeStatusTable".
All the employee IDs mentioned in this file are to be inserted into this table with the status 'N'. The real issue is that the user-supplied file may contain an employee ID that is already present in the table with the status updated to 'Y' (by some other script or job). In this case, the existing entry should be overwritten. In short, the entry in the table should read "empId", "N".
Is there any way to achieve this? Thanks in advance.
As far as I know, the LOAD statement can only be used together with an INSERT statement.
I'm pretty sure there are many ways to do this; I will suggest two.
Both are supported only on database versions >= 11.50 and have certain limitations:
The MERGE works only if the two tables have a 1-to-1 relationship
The external table is limited to the database server's file system; it cannot access anything on the client machine
SUGGESTION 1
Load into a temporary table and then use the MERGE statement.
create temp table tp01 ( cols.... ) with no log ;
load from xyz.txt insert into tp01 ;
merge into destTable as A
using tp01 as B
ON A.empID = B.empID
WHEN MATCHED THEN UPDATE SET status = 'N'
WHEN NOT MATCHED THEN INSERT (empid, status) values ( b.empid, 'N');
drop table tp01;
SUGGESTION 2
Create an external table over your TXT file and then just use MERGE or UPDATE against this table when needed.
create external table ex01 .... using ( datafile('file:/tmp/your.txt'), delimited ...);
merge into destTable as A
using ex01 as B
ON A.empID = B.empID
WHEN MATCHED THEN UPDATE SET status = 'N'
WHEN NOT MATCHED THEN INSERT (empid, status) values ( b.empid, 'N');
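The stage-then-overwrite flow of Suggestion 1 can be sketched end to end as below. This is an illustration on Python's sqlite3 module, not Informix: SQLite's INSERT OR REPLACE stands in for the MERGE, the file format (one employee ID per line) is an assumption, and the function name is hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def reset_status_from_file(conn: sqlite3.Connection, id_file: Path) -> None:
    """Give every employee ID listed in id_file (one per line) the status 'N'."""
    ids = [(line.strip(),) for line in id_file.read_text().splitlines() if line.strip()]
    conn.execute("CREATE TEMP TABLE tp01 (empId TEXT PRIMARY KEY)")
    try:
        with conn:  # staging load and overwrite share one transaction
            # OR IGNORE skips IDs that appear more than once in the file.
            conn.executemany("INSERT OR IGNORE INTO tp01 (empId) VALUES (?)", ids)
            # OR REPLACE overwrites an existing row that has the same primary key.
            conn.execute(
                "INSERT OR REPLACE INTO employeeStatusTable (empId, status) "
                "SELECT empId, 'N' FROM tp01"
            )
    finally:
        conn.execute("DROP TABLE tp01")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employeeStatusTable (empId TEXT PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO employeeStatusTable VALUES (?, ?)",
    [("E1", "Y"), ("E2", "N"), ("E9", "Y")],
)
conn.commit()

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ids.txt"
    path.write_text("E1\nE3\n")  # E1 already exists with 'Y'; E3 is new
    reset_status_from_file(conn, path)

rows = conn.execute(
    "SELECT empId, status FROM employeeStatusTable ORDER BY empId"
).fetchall()
```

INSERT OR REPLACE deletes the old row and inserts a new one, which is harmless for a two-column table like this but would discard any other columns that the new row does not supply.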

Using OUTPUT with joined tables

Why doesn't the following work?
INSERT INTO dbo.Pending_Break
(Pending_Pod_ID, Break_Date_Time, Break_Length, Booked_Length)
OUTPUT INSERTED.Pending_BH_ID -- This is the inserted identity
, INSERTED.Pending_Pod_ID
, INSERTED.Break_Name
, INSERTED.Break_Date_Time
, pb.PENDING_BH_ID -- complains on this one
INTO #InsertedPendingBreaks
SELECT ippod.Pending_Pod_ID,
pb.Break_Date_Time,
pb.break_length,
0
FROM PendingBreak pb
JOIN InsertedPod ippod ON ...
Can I not use anything other than Inserted or Deleted in the OUTPUT clause?
No, you can't. At least not with an INSERT. In SQL Server 2008 you can convert your INSERT to a MERGE statement instead, and there you can use values from the source table in the OUTPUT clause.
For how to do that in SQL Server 2008, have a look at this question: "Using merge..output to get mapping between source.id and target.id".
The inserted and deleted tables can be queried as tables only inside DML triggers. In a standalone batch they are available only as column prefixes in an OUTPUT clause, which is how your statement uses them.
Also, there is no updated table. For this purpose an UPDATE is treated as a delete followed by an insert: deleted contains the old data and inserted contains the new data.

Using SQL Server DTS Package to Conditionally Insert / Update Rows in Destination Table

I want to create a DTS Package to pull data from an Oracle table into a SQL2K
table. How can I insert rows that are not already in the SQL2K table and
update rows that already exist in the SQL2K table?
I guess I could truncate and repopulate the entire table or create a
temporary table and then do updates/inserts from the temp table into the
destination table.
Is there any easier way using DTS?
Thanks,
Rokal
You can do that in a DTS package using two data driven query tasks: one for the inserts and one for the updates. The data driven query tasks are a bit of a pain to use, but they work. I've also done this (a "merge") in SQL Server 2000 with an AS/400 database using dynamic T-SQL. You'd write a T-SQL script that outputs psql and runs it against a linked server to the Oracle database.
UPDATE:
A DTS "data driven query task" will let you insert|update data from the sql server connection in DTS to an oracle server connection in DTS w/o a temp table or a linked server.
Update2; here's some more info on what I mean:
http://www.databasejournal.com/features/mssql/article.php/3315951
http://msdn.microsoft.com/en-us/library/aa933507(SQL.80).aspx
Are you keeping the same primary key values?
If you are, you have a number of options. Some versions of SQL support the MERGE statement, which will update or insert just as you require.
Or you can write your own.
Something along the lines of loading all the rows into a staging table in your SQL database and checking, row by row, for the existence of your primary key in your main SQL table. If the key exists, update the row; if not, insert it.
Yes, the primary key values in the source and destination will match.
I was hoping to accomplish this task without the use of a temporary (staging) table.
Also, I am using sql server 2000 so the MERGE statement is not available.
Try:
DELETE FROM dbo.WhateverTable WHERE WhateverTableID IN (SELECT WhateverTableID FROM MySource)
It might be pretty slow; use a join instead:
DELETE a
FROM firstTable a JOIN secondTable b ON a.id = b.id
In SQL Server 2000 there's no way with T-SQL to do an INSERT or UPDATE in the same statement, but you could very easily do it in two statements (as you mentioned above).
Statement 1:
DELETE FROM dbo.WhateverTable
WHERE WhateverTableID IN (SELECT WhateverTableID FROM MySource)
Statement 2:
INSERT INTO dbo.WhateverTable
SELECT * FROM MySource
Also, is there any reason why you don't want to use a temp table?
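If the two statements above are used, it is worth running them in a single transaction so that other sessions never see the matching rows missing between the DELETE and the INSERT. Here is a minimal sketch of that pattern using Python's sqlite3 module as a stand-in for SQL Server; the Name column and the function name are hypothetical, while the table and key names follow the statements above.

```python
import sqlite3


def refresh_from_source(conn: sqlite3.Connection) -> None:
    """Replace matching rows and add new ones, all in one transaction."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        # Statement 1: remove target rows that the source is about to resupply.
        conn.execute(
            "DELETE FROM WhateverTable "
            "WHERE WhateverTableID IN (SELECT WhateverTableID FROM MySource)"
        )
        # Statement 2: copy every source row (both tables share one column layout).
        conn.execute("INSERT INTO WhateverTable SELECT * FROM MySource")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WhateverTable (WhateverTableID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE MySource (WhateverTableID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO WhateverTable VALUES (?, ?)", [(1, "old"), (2, "untouched")])
conn.executemany("INSERT INTO MySource VALUES (?, ?)", [(1, "new"), (3, "added")])
conn.commit()

refresh_from_source(conn)
rows = conn.execute(
    "SELECT WhateverTableID, Name FROM WhateverTable ORDER BY WhateverTableID"
).fetchall()
```

Rows that exist only in the destination (ID 2 here) are left alone, which is what distinguishes this from truncating and repopulating the whole table.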