How to overwrite a matching key in a "LOAD FROM" statement in Informix SQL

I have a table (in Informix) that stores two columns: empId and status (Y/N). Other scripts, when run, update the status of these employee IDs based on certain conditions.
The task at hand: a user provides the path to a file containing employee IDs. My script reads this file and runs "load from user_supplied_file insert into employeeStatusTable".
All the employee IDs in this file are to be inserted into the table with status 'N'. The real issue is that the user-supplied file may contain an employee ID that is already in the table with its status set to 'Y' (by some other script or job). In that case the existing entry should be overwritten, so that the row in the table reads "empId", "N".
Is there any way to achieve this? Thanks in advance.

As far as I know, the LOAD statement can only be used together with an INSERT statement.
I'm pretty sure there are many ways to do this; I will suggest two.
Both are supported only on database version 11.50 and later, and both have certain limitations:
MERGE works only if the two tables have a 1-to-1 relationship.
The external table is limited to the database server's file system; it cannot access anything on the client machine.
SUGGESTION 1
Load into a temporary table and then use the MERGE statement.
create temp table tp01 ( cols.... ) with no log ;
load from xyz.txt insert into tp01 ;
merge into destTable as A
using tp01 as B
ON A.empID = B.empID
WHEN MATCHED THEN UPDATE SET status = 'N'
WHEN NOT MATCHED THEN INSERT (empid, status) values ( b.empid, 'N');
drop table tp01;
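Suggestion 1 can be tried out locally before running it on Informix. The sketch below uses Python's built-in sqlite3 module as a stand-in: SQLite has no MERGE, so its INSERT ... ON CONFLICT upsert clause (SQLite 3.24 or later) plays the role of both WHEN branches. dest_table and tp01 mirror the names above; the helper load_and_reset and the sample data are hypothetical.

```python
import sqlite3


def load_and_reset(conn, emp_ids):
    """Stage the IDs, then force status 'N' for each one (update or insert)."""
    # Staging table, analogous to "create temp table tp01 ... with no log"
    conn.execute("CREATE TEMP TABLE tp01 (empid INTEGER)")
    # Analogous to "load from xyz.txt insert into tp01"
    conn.executemany("INSERT INTO tp01 (empid) VALUES (?)", [(e,) for e in emp_ids])
    # Upsert: insert new IDs, overwrite the status of existing ones.
    # DISTINCT collapses IDs repeated in the file; WHERE 1 avoids SQLite's
    # documented parsing ambiguity between a SELECT and the ON CONFLICT clause.
    conn.execute(
        """
        INSERT INTO dest_table (empid, status)
        SELECT DISTINCT empid, 'N' FROM tp01 WHERE 1
        ON CONFLICT(empid) DO UPDATE SET status = 'N'
        """
    )
    conn.execute("DROP TABLE tp01")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest_table (empid INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO dest_table VALUES (?, ?)", [(1, "Y"), (2, "Y"), (3, "N")])
conn.commit()

load_and_reset(conn, [1, 4])
print(conn.execute("SELECT empid, status FROM dest_table ORDER BY empid").fetchall())
# [(1, 'N'), (2, 'Y'), (3, 'N'), (4, 'N')]
```

Employee 1 had status 'Y' and is overwritten, employee 4 is new, and IDs not in the file are left alone, which is the behaviour the question asks for.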
SUGGESTION 2
Create an external table over your TXT file and then use MERGE (or UPDATE) against that table whenever needed.
create external table ex01 ( cols.... ) using ( datafiles('DISK:/tmp/your.txt'), format 'delimited' ... );
merge into destTable as A
using ex01 as B
ON A.empID = B.empID
WHEN MATCHED THEN UPDATE SET status = 'N'
WHEN NOT MATCHED THEN INSERT (empid, status) values ( b.empid, 'N');

Related

How to append unique values from temp_tbl into original_tbl (SQL Server)?

I have a table that I'm trying to append unique values to. Every month I get a list of user logins to import into this table. I would like to keep all the original values and just append the new, unique values to the existing table. Both the table and the flat file have a single column with unique values, built like this:
_____
login
abcde001
abcde002
...
_____
I'm bulk ingesting the flat file into a temp table, with this:
IF OBJECT_ID('tempdb..#FLAT_FILE_TBL') IS NOT NULL
DROP TABLE #FLAT_FILE_TBL
CREATE TABLE #FLAT_FILE_TBL
(
ntlogin2 nvarchar(15)
)
BULK INSERT #FLAT_FILE_TBL
FROM 'C:\ImportFiles\logins_Dec2021.csv'
WITH (FIELDTERMINATOR = ' ');
Is there a join that would give me the table with existing values + new unique values appended? I'd rather not hard code a loop to evaluate it line by line.
Something like (pseudocode):
append unique {login} from temp_tbl into original_tbl
Hopefully it's an easy answer for someone out there.
Thanks!
A poster on Reddit's r/sql provided this answer, which I'm pursuing:
Merge statement?
It looks like using a merge statement will do exactly what I want. Thanks for those who already posted replies.
You can check whether a record exists using an EXISTS clause and insert it if it doesn't exist in the target table. You can also use a MERGE statement to achieve the same thing. Depending on what you want to do with the existing records in the target table, you can modify the MERGE statement. Since you only want to insert new records here, you only need to specify what happens when a new record comes in. Here is an example:
MERGE original_tbl T
USING temp_tbl S
ON T.login = S.login
WHEN NOT MATCHED THEN
INSERT (login)
VALUES (S.login);
Another solution is to left join the temp table to the target table and insert only the rows that have no match in the target.
INSERT INTO original_tbl(login)
SELECT S.Login
FROM temp_tbl S
LEFT JOIN original_tbl T
ON S.Login = T.Login
WHERE T.Login IS NULL
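A small sketch of the LEFT JOIN variant, run against Python's sqlite3 module instead of SQL Server so it can be tried locally. original_tbl and temp_tbl follow the names above; the DISTINCT is an addition (an assumption, not part of the answer) that guards against the same login appearing twice in one month's file.

```python
import sqlite3


def append_unique(conn):
    # Anti-join: keep only staged logins with no match in the target table.
    # DISTINCT guards against duplicate logins inside the flat file itself.
    conn.execute(
        """
        INSERT INTO original_tbl (login)
        SELECT DISTINCT S.login
        FROM temp_tbl S
        LEFT JOIN original_tbl T ON S.login = T.login
        WHERE T.login IS NULL
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_tbl (login TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE temp_tbl (login TEXT)")
conn.executemany("INSERT INTO original_tbl VALUES (?)", [("abcde001",), ("abcde002",)])
conn.executemany("INSERT INTO temp_tbl VALUES (?)", [("abcde002",), ("abcde003",), ("abcde003",)])

append_unique(conn)
print([row[0] for row in conn.execute("SELECT login FROM original_tbl ORDER BY login")])
# ['abcde001', 'abcde002', 'abcde003']
```

Running append_unique a second time inserts nothing, because every staged login now has a match.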

Merge update records in a final table

I have a user table in Hive of the form:
User:
Id String,
Name String,
Col1 String,
UpdateTimestamp Timestamp
I'm inserting data in this table from a file which has the following format:
I/U,Timestamp when record was written to file, Id, Name, Col1, UpdateTimestamp
e.g. for inserting a user with Id 1:
I,2019-08-21 14:18:41.002947,1,Bob,stuff,123456
and updating col1 for the same user with Id 1:
U,2019-08-21 14:18:45.000000,1,,updatedstuff,123457
The columns that are not updated are written as null.
Simple insertion is easy in Hive: LOAD DATA INPATH into a staging table, then ignore the first two fields of the staging table.
However, how would I handle the update records, so that my final row in Hive looks like this:
1,Bob,updatedstuff,123457
I was thinking to insert all rows in a staging table and then perform some sort of merge query. Any ideas?
Typically with a merge statement your "file" would still be unique on ID and the merge statement would determine whether it needs to insert this as a new record, or update values from that record.
However, if the file is non-negotiable and will always have the I/U format, you could break the process up into two steps, the insert, then the updates, as you suggested.
In order to perform updates in Hive, you will need the users table to be stored as ORC and have ACID enabled on your cluster. For my example, I would create the users table with a cluster key, and the transactional table property:
create table test.orc_acid_example_users
(
id int
,name string
,col1 string
,updatetimestamp timestamp
)
clustered by (id) into 5 buckets
stored as ORC
tblproperties('transactional'='true');
After your insert statements, your Bob record would say "stuff" in col1.
As for the updates, you could tackle these with an UPDATE or MERGE statement. I think the key here is the null values: it's important to keep the original name, col1, or whatever field, if the staging table from the file has a null value. Here's a merge example that coalesces the staging table's fields. Basically, if there is a value in the staging table, take it; otherwise fall back to the original value.
merge into test.orc_acid_example_users as t
using test.orc_acid_example_staging as s
on t.id = s.id
and s.type = 'U'
when matched
then update set name = coalesce(s.name, t.name), col1 = coalesce(s.col1, t.col1), updatetimestamp = coalesce(s.updatetimestamp, t.updatetimestamp);
Now Bob will show "updatedstuff", along with the newer update timestamp.
Quick disclaimer - if you have more than one update for Bob in the staging table, things will get messy. You will need to have a pre-processing step to get the latest non-null values of all the updates prior to doing the update/merge. Hive isn't really a complete transactional DB - it would be preferred for the source to send full user records any time there's an update, instead of just the changed fields only.
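To see the two-step approach end to end, here is a sketch using Python's sqlite3 module instead of Hive. SQLite has no MERGE, so the update step is written as an UPDATE with correlated subqueries, each wrapped in COALESCE exactly like the Hive merge above. The table and column names (users, staging, written_at) are hypothetical, and it assumes at most one 'U' row per id, as the disclaimer above warns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, col1 TEXT, updatetimestamp INTEGER)")
conn.execute(
    "CREATE TABLE staging (type TEXT, written_at TEXT, id INTEGER, "
    "name TEXT, col1 TEXT, updatetimestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("I", "2019-08-21 14:18:41.002947", 1, "Bob", "stuff", 123456),
        ("U", "2019-08-21 14:18:45.000000", 1, None, "updatedstuff", 123457),
    ],
)

# Step 1: inserts, ignoring the first two staging fields
conn.execute("INSERT INTO users SELECT id, name, col1, updatetimestamp FROM staging WHERE type = 'I'")

# Step 2: updates. A NULL in staging means "unchanged", so fall back to the
# current value. Assumes at most one 'U' row per id.
conn.execute(
    """
    UPDATE users SET
      name = COALESCE((SELECT s.name FROM staging s
                       WHERE s.id = users.id AND s.type = 'U'), name),
      col1 = COALESCE((SELECT s.col1 FROM staging s
                       WHERE s.id = users.id AND s.type = 'U'), col1),
      updatetimestamp = COALESCE((SELECT s.updatetimestamp FROM staging s
                                  WHERE s.id = users.id AND s.type = 'U'), updatetimestamp)
    WHERE id IN (SELECT id FROM staging WHERE type = 'U')
    """
)
conn.commit()
print(conn.execute("SELECT * FROM users").fetchall())
# [(1, 'Bob', 'updatedstuff', 123457)]
```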
You can reconstruct each record in the table using last_value() with the ignore-nulls option (second argument true):
select h.id,
coalesce(h.name, last_value(h.name, true) over (partition by h.id order by h.timestamp)) as name,
coalesce(h.col1, last_value(h.col1, true) over (partition by h.id order by h.timestamp)) as col1,
update_timestamp
from history h;
You can use row_number() and a subquery if you want the most recent record.
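The last_value(..., true) idea is essentially a per-id forward fill. The plain-Python sketch below (no Hive involved) processes each id's rows in timestamp order and keeps the latest non-NULL value of each column; it yields the most recent reconstructed record per id, which is also the pre-processing step the disclaimer above calls for. The dict-based row format and the function name reconstruct are assumptions for illustration.

```python
def reconstruct(history):
    """Return {id: (name, col1, updatetimestamp)} built from the latest non-NULL values."""
    state = {}
    # Like "partition by id order by timestamp": handle each id's rows in time order
    for row in sorted(history, key=lambda r: (r["id"], r["ts"])):
        current = state.setdefault(row["id"], {})
        for col in ("name", "col1", "updatetimestamp"):
            if row[col] is not None:  # skip NULLs, like last_value(col, true)
                current[col] = row[col]
    return {
        emp_id: (cols.get("name"), cols.get("col1"), cols.get("updatetimestamp"))
        for emp_id, cols in state.items()
    }


history = [
    {"id": 1, "ts": "2019-08-21 14:18:41.002947", "name": "Bob", "col1": "stuff", "updatetimestamp": 123456},
    {"id": 1, "ts": "2019-08-21 14:18:45.000000", "name": None, "col1": "updatedstuff", "updatetimestamp": 123457},
]
print(reconstruct(history))  # {1: ('Bob', 'updatedstuff', 123457)}
```

Because rows are sorted first, the input order does not matter, and several partial updates for the same id are folded together correctly.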

SQL check if record exists in table before bulk insert

I currently have a stored procedure that performs bulk insert into a table named "TomorrowPatients" from a .csv file. When performing the bulk insert, I need to determine if the record being added already exists within the table and if so DO NOT add the record. If the record does not exist then I need to APPEND it to the table. What is the most efficient way to go about this? Any help will be greatly appreciated.
EDIT: I have created a temp table called "TomorrowPatients_Temp". I am trying to use this table to determine which records to insert.
Insert the whole data set into a temporary table, say #TempTable. Then use the following code:
INSERT INTO TomorrowPatients
SELECT TT.* FROM #TempTable TT
LEFT JOIN TomorrowPatients TP ON TT.PatientId = TP.PatientId
AND TT.PatientName = TP.PatientName
AND TT.PatientSSN = TP.PatientSSN
WHERE TP.PatientId IS NULL
where PatientId is your primary key for the TomorrowPatients table.
Do NOT add the "RoomNumber" column to the LEFT JOIN condition (for example TT.RoomNo = TP.RoomNo). Because the join uses only patient-specific data, a patient whose room number changes will not be inserted a second time.

How to fix this stored procedure problem

I have 2 tables. The following are just a stripped down version of these tables.
TableA
Id <pk> incrementing
Name varchar(50)
TableB
TableAId <pk> non incrementing
Name varchar(50)
Now these tables have a relationship to each other.
Scenario
User 1 comes to my site and performs some actions (in this case, adding rows to Table A). I use SqlBulkCopy to write all this data into Table A.
However, I also need to add the data to Table B, but I don't know the newly created IDs from Table A because SqlBulkCopy doesn't return them.
So I am thinking of having a stored procedure that finds all the IDs that don't exist in Table B and then inserts them.
INSERT INTO TableB (TableAId , Name)
SELECT Id,Name FROM TableA as tableA
WHERE not exists( ...)
However, this comes with a problem. A user can delete something from TableB at any time. If a user deletes a row and then another user (or the same user) does something to Table A, my stored procedure will bring back the deleted row in Table B, since it still exists in Table A but not in Table B and therefore satisfies the stored procedure's condition.
So is there a better way of dealing with two tables that need to be updated when using bulk insert?
SqlBulkCopy complicates this, so I'd consider using a staging table and an OUTPUT clause.
Example, in a mix of client pseudocode and SQL:
create SQLConnection
Create #temptable
Bulkcopy to #temptable
Call proc on same SQLConnection
proc:
INSERT tableA (..)
OUTPUT INSERTED.key, .. INTO TableB
SELECT .. FROM #temptable
close connection
Notes:
temptable will be local to the connection and be isolated
the writes to A and B will be atomic
overlapping or later writes don't care about what happens later to A and B
emphasising the last point, A and B will only ever be populated from the set of rows in #temptable
Alternative:
Add another column to A and B called sessionid and use that to identify row batches.
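The sessionid alternative can be sketched as follows, using Python's sqlite3 module in place of SQL Server and SqlBulkCopy. The names table_a, table_b and bulk_add are hypothetical. TableB is filled only from the current batch, so a row a user deleted from TableB is not brought back by later batches. It assumes each batch gets a unique session id (for example a UUID), and both inserts are committed together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, sessionid TEXT);
    CREATE TABLE table_b (table_a_id INTEGER PRIMARY KEY, name TEXT, sessionid TEXT);
    """
)


def bulk_add(conn, session_id, names):
    # Tag every row of this batch with the same session id
    conn.executemany(
        "INSERT INTO table_a (name, sessionid) VALUES (?, ?)",
        [(n, session_id) for n in names],
    )
    # Copy only this batch into table_b, picking up the generated ids
    conn.execute(
        "INSERT INTO table_b (table_a_id, name, sessionid) "
        "SELECT id, name, sessionid FROM table_a WHERE sessionid = ?",
        (session_id,),
    )
    conn.commit()  # both inserts land in the same transaction


bulk_add(conn, "s1", ["alpha", "beta"])
conn.execute("DELETE FROM table_b WHERE name = 'alpha'")  # a user deletes a row
conn.commit()
bulk_add(conn, "s2", ["gamma"])
print(conn.execute("SELECT name FROM table_b ORDER BY table_a_id").fetchall())
# [('beta',), ('gamma',)]
```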
One option would be to use SQL Server's OUTPUT clause:
INSERT YourTable (name)
OUTPUT INSERTED.*
VALUES ('NewName')
This returns the id and name of the inserted rows to the client, so you can use them when inserting into the second table.
Just as an alternative solution you could use database triggers to update the second table.

Does DB2 have an "insert or update" statement?

From my code (Java) I want to ensure that a row exists in the database (DB2) after my code is executed.
My code now does a select and if no result is returned it does an insert. I really don't like this code since it exposes me to concurrency issues when running in a multi-threaded environment.
What I would like to do is to put this logic in DB2 instead of in my Java code.
Does DB2 have an insert-or-update statement? Or anything like it that I can use?
For example:
insertupdate into mytable values ('myid')
Another way of doing it would probably be to always do the insert and catch "SQL-code -803 primary key already exists", but I would like to avoid that if possible.
Yes, DB2 has the MERGE statement, which will do an UPSERT (update or insert).
MERGE INTO target_table USING source_table ON match-condition
{WHEN [NOT] MATCHED
THEN [UPDATE SET ...|DELETE|INSERT VALUES ....|SIGNAL ...]}
[ELSE IGNORE]
See:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0010873.htm
https://www.ibm.com/support/knowledgecenter/en/SS6NHC/com.ibm.swg.im.dashdb.sql.ref.doc/doc/r0010873.html
https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/merge?lang=en
I found this thread because I really needed a one-liner for DB2 INSERT OR UPDATE.
The following syntax seems to work, without requiring a separate temp table.
It works by using VALUES() to create a table structure. The SELECT * seems surplus IMHO, but without it I get syntax errors.
MERGE INTO mytable AS mt USING (
SELECT * FROM TABLE (
VALUES
(123, 'text')
)
) AS vt(id, val) ON (mt.id = vt.id)
WHEN MATCHED THEN
UPDATE SET val = vt.val
WHEN NOT MATCHED THEN
INSERT (id, val) VALUES (vt.id, vt.val)
;
If you have to insert more than one row, the VALUES part can be repeated without duplicating the rest:
VALUES
(123, 'text'),
(456, 'more')
The result is a single statement that can INSERT OR UPDATE one or many rows presumably as an atomic operation.
This response is to hopefully fully answer the query MrSimpleMind had in use-update-and-insert-in-same-query and to provide a working simple example of the DB2 MERGE statement with a scenario of inserting AND updating in one go (record with ID 2 is updated and record ID 3 inserted).
CREATE TABLE TEST_TAB ( ID INTEGER, DATE DATE, STATUS VARCHAR(10) );
COMMIT;
INSERT INTO TEST_TAB VALUES (1, '2013-04-14', NULL), (2, '2013-04-15', NULL); COMMIT;
MERGE INTO TEST_TAB T USING (
SELECT
3 NEW_ID,
CURRENT_DATE NEW_DATE,
'NEW' NEW_STATUS
FROM
SYSIBM.DUAL
UNION ALL
SELECT
2 NEW_ID,
NULL NEW_DATE,
'OLD' NEW_STATUS
FROM
SYSIBM.DUAL
) AS S
ON
S.NEW_ID = T.ID
WHEN MATCHED THEN
UPDATE SET
(T.STATUS) = (S.NEW_STATUS)
WHEN NOT MATCHED THEN
INSERT
(T.ID, T.DATE, T.STATUS) VALUES (S.NEW_ID, S.NEW_DATE, S.NEW_STATUS);
COMMIT;
Another way is to execute these two statements, which is simpler than writing a MERGE statement:
update TABLE_NAME set FIELD_NAME=xxxxx where MyID=XXX;
INSERT INTO TABLE_NAME (MyField1,MyField2)
SELECT xxx, xxxxx FROM SYSIBM.SYSDUMMY1
WHERE NOT EXISTS(select 1 from TABLE_NAME where MyId=xxxx);
The first statement updates the field you need if the MyId exists.
The second inserts the row if MyId does not exist.
Both statements run, but only one of them actually changes a row.
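A runnable sketch of the update-then-insert pattern, using Python's sqlite3 module as a stand-in for DB2. An INSERT ... VALUES statement cannot take a WHERE clause, so the insert is written as INSERT ... SELECT; SQLite accepts a SELECT without FROM, whereas on DB2 you would select FROM SYSIBM.SYSDUMMY1 instead. The names table_name, my_id, field_name and update_then_insert are hypothetical. Two concurrent sessions can still both pass the NOT EXISTS check, so keeping a primary key on my_id makes the second insert fail rather than create a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key on my_id prevents duplicates if two sessions race
conn.execute("CREATE TABLE table_name (my_id INTEGER PRIMARY KEY, field_name TEXT)")


def update_then_insert(conn, my_id, value):
    """Hypothetical helper: update the row if it exists, otherwise insert it."""
    # 1) Changes the row if my_id already exists; affects 0 rows otherwise
    conn.execute("UPDATE table_name SET field_name = ? WHERE my_id = ?", (value, my_id))
    # 2) INSERT ... SELECT (not INSERT ... VALUES) is what allows the WHERE clause
    conn.execute(
        "INSERT INTO table_name (my_id, field_name) "
        "SELECT ?, ? WHERE NOT EXISTS (SELECT 1 FROM table_name WHERE my_id = ?)",
        (my_id, value, my_id),
    )
    conn.commit()  # commit both statements together


update_then_insert(conn, 1, "first")
update_then_insert(conn, 1, "second")  # updates the existing row
update_then_insert(conn, 2, "other")   # inserts a new row
print(conn.execute("SELECT * FROM table_name ORDER BY my_id").fetchall())
# [(1, 'second'), (2, 'other')]
```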
I started with a Hibernate project, where Hibernate lets you call saveOrUpdate().
When I converted that project to plain JDBC, the problem was with save and update.
I wanted to save and update at the same time using JDBC.
So I did some research and came across MySQL's ON DUPLICATE KEY UPDATE:
String sql = "INSERT INTO tblstudent (firstName, lastName, gender) VALUES (?, ?, ?) "
        + "ON DUPLICATE KEY UPDATE "
        + "firstName = VALUES(firstName), "
        + "lastName = VALUES(lastName), "
        + "gender = VALUES(gender)";
The issue with the above code was that it appeared to update the primary key twice, which matches the MySQL documentation:
The affected rows is just a return code. 1 row means you inserted, 2 means you updated, 0 means nothing happened.
I introduced an id column and incremented it by 1 myself, so now I was incrementing the value of id rather than MySQL.
String sql = "INSERT INTO tblstudent (id, firstName, lastName, gender) VALUES (?, ?, ?, ?) "
        + "ON DUPLICATE KEY UPDATE "
        + "id = id + 1, "
        + "firstName = VALUES(firstName), "
        + "lastName = VALUES(lastName), "
        + "gender = VALUES(gender)";
The above code worked for me for both insert and update.
Hope it works for you as well.