Kettle: Filling a field from a sequence without conflicts

I have a data stream with the following structure
user_id (integer)
user_name (string)
The user_id is anything between 100 and 65536. I want to add a target_user_id (integer) field according to the following logic:
If user_id is in the range 1000..9999, then let target_user_id be equal to user_id.
If not, then fill target_user_id with a value in the range 1000..9999 that does not conflict with any other row, preferably the lowest available.
The length of the stream is under 9000. The user_id field is unique in the original stream.

I am not sure what Kettle environment you are using, but a general procedure could be as follows:
Create a temporary database table, maybe an in-memory database table (an illustrative SQL sketch follows the steps below).
Initialise it with records for user_id 1000..9999 and user_name = null (use TableOutput).
Open the input stream and process the records with user_id in 1000..9999 by updating the corresponding database record with the user_name (use Update). Ignore all other records.
Close and reopen the input stream.
Process each input-stream record with user_id not in 1000..9999 by:
getting the lowest unused user_id with a SQL query (DBLookup):
SELECT MIN(user_id) FROM temporary_table WHERE user_name IS NULL;
updating that record with the current user_name (use Update).
Read each record in the temporary table with a non-null user_name (use TableInput) and write it to the output stream.
Delete the temporary database table.
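The temporary table itself could look something like the sketch below (illustrative SQL only; the answer does not prescribe a particular database, and generate_series is PostgreSQL syntax; in Kettle a Generate Rows step feeding TableOutput would do the same seeding):

-- One row per assignable id in 1000..9999, user_name still unassigned (NULL).
CREATE TABLE temporary_table (
    user_id   integer PRIMARY KEY,
    user_name varchar(255)
);

INSERT INTO temporary_table (user_id)
SELECT n FROM generate_series(1000, 9999) AS n;

-- Lookup used for the out-of-range records: the lowest id not yet taken.
SELECT MIN(user_id) FROM temporary_table WHERE user_name IS NULL;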
Hope this helps

Related

Postgresql update sequence after importing data from csv

I have a database with 10 tables. I'm using a PostgreSQL database in pgAdmin 4 (Symfony back end). I imported the data from 10 different CSV files, and in each data file I added an ID column and set its values myself, for the simple reason that there are foreign keys: let's say TABLE_1_ID is a foreign key in TABLE_2, TABLE_2_ID is a foreign key in TABLE_3, TABLE_3_ID is a foreign key in TABLE_4, and so on until the last table.
The import of the CSV files worked, and I can display my data in my front end.
Now that I'm trying to insert new data into any table via the platform's user interface, I'm getting a constraint error:
SQLSTATE[23505]: Unique violation: 7 ERROR: duplicate key value violates unique constraint "theme_pkey" DETAIL: Key (id)=(4) already exists."
Let's just say I'm trying to create a new record in TABLE_2: the Doctrine ORM throws an error saying the id key already exists. It seems like my database didn't update itself to follow the data that is already in the database, as if the ID isn't being incremented.
I looked around and it seems this is common when you import your database from an outside source, but I can't manage to find something that would get me past this error message so I can move forward with my dev. Everything I find talks about updating the sequence, but nothing seems to fit my problem.
After importing my data from a CSV file into my PostgreSQL database using pgAdmin 4, the data was successfully imported: querying the data worked and each row had an id. In my mind everything was fine until I tried to insert a new object into my database, but it seemed like my database was stuck at primary key id 1; it wasn't taking into account the data that had been imported, or the fact that the table already showed an object with an id of 1. I didn't fully understand what was going on, because I'm not an expert with SQL and the conventions and restrictions that apply to it; I knew enough to get things working with an ORM.
I came to the conclusion, after hours of research and reading documentation, that a thing called a sequence exists and that each table has its own sequence; for example, tableA and tableB have a tableA_id_seq and a tableB_id_seq.
All I had to do was get the max id from my table, check the next value of the sequence, and set the sequence so that its next value follows the max id.
---GETTING THE MAX ID VALUE FROM MY TARGETED TABLE
SELECT MAX(id) FROM my_table;
--- GETTING THE NEXT ID VALUE FROM THAT TARGETED TABLE
SELECT NEXTVAL('my_table_id_seq');
Let's just say the max id value in my table was 90, so in theory the next value is supposed to be 91, but when I ran the second query the next value was 1.
So I had to set the next value based on my max value + 1:
---- SETTING THE NEXT VALUE THAT FOLLOWS MY MAX VALUE
SELECT setval('my_table_id_seq', (SELECT MAX(id) FROM my_table) + 1);
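A variant of the same fix, sketched with the my_table placeholder from above and assuming id is a serial column: pg_get_serial_sequence looks up the sequence name for you, and the three-argument setval makes the next nextval() return MAX(id) + 1 (or 1 if the table is empty) instead of skipping a value.

-- Resolve the sequence attached to my_table.id and align it with the data.
-- The third argument marks the value as "already used" only when rows exist.
SELECT setval(pg_get_serial_sequence('my_table', 'id'),
              COALESCE(MAX(id), 1),
              MAX(id) IS NOT NULL)
FROM my_table;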
Hope this helps out the next person that runs into a similar issue.

Querying a SQL table and only transferring updated rows to a different database

I have a database table which constantly gets updated. I am looking to query only the changes/additions that have been made to rows with a specific attribute in a column, e.g. get the rows which have been changed or added whose 'description' column is "xyz". My end goal is to copy these rows to a table in another database. Is this even possible? The reason for not just querying and overwriting the rows in the other database is to avoid inefficiency.
What have I tried so far?
I am able to run a select query on the table to get the rows, but it gives me all the rows, not just the ones that have been changed or recently added. If I add these rows to the table in the other database, the only option I have is to overwrite the existing rows.
A log table records the changes in a table, but I can't apply additional filters in SQL to tell me which of these changes are associated with the 'description' column being "xyz".
Write your update statements to make use of the OUTPUT clause to capture the before and after values and log them to a table of your choice.
Here is a really simple update example that uses OUTPUT to store the RowID and the before and after values of the ActivityType column:
DECLARE @MyTableVar table (
    SummaryBefore nvarchar(max),
    SummaryAfter nvarchar(max),
    RowID int
);

UPDATE DBA.dbo.dtest SET ActivityType = 3
OUTPUT deleted.ActivityType,   -- value before the update
       inserted.ActivityType,  -- value after the update
       inserted.RowID
INTO @MyTableVar;

SELECT * FROM @MyTableVar;
You can do it in two ways:
Add date fields/columns like update_time and/or create_time (they can be defaulted if needed). These fields indicate the status of the record. Save your previous_run_time; your select query then looks for records with update_time/create_time greater than previous_run_time, and you move those records to the new DB (see the sketch after this list).
Turn on CDC (Change Data Capture) on the source table, which is available out of the box in SQL Server, and then move only those records that have been impacted.
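A minimal sketch of the first option, with hypothetical table and column names. Note that a DEFAULT only stamps inserts; updates need the application or a trigger to refresh update_time.

-- One-time setup on the source table.
ALTER TABLE dbo.SourceTable
    ADD update_time datetime2 NOT NULL DEFAULT SYSUTCDATETIME();

-- On each transfer run: @previous_run_time is whatever the job stored
-- at the end of its last run.
DECLARE @previous_run_time datetime2 = '2024-01-01T00:00:00';

SELECT *
FROM dbo.SourceTable
WHERE [description] = 'xyz'
  AND update_time > @previous_run_time;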

PostgreSQL: find out if a row has been created by the current transaction

Is it possible to find out if a row in a table has been created by the current transaction (and therefore is not yet visible for other transactions, because the current transaction is still active)?
My use case: I am adding event logging to the database. This is done in plpgsql triggers. A row in the event table looks like this: (event_id serial, event_action text, count integer default 1).
Now, the reasoning behind my question: If a certain row has been created by this transaction (most likely in another trigger), I could increment the count instead of creating a new row in the event table.
You could just look for logging entries like this:
SELECT ...
FROM tablename
WHERE xmin::text::bigint = txid_current() % (2^32)::bigint;
That will find all rows added or modified in the current transaction.
The downside is that this will force a sequential scan of the whole table, and you cannot avoid that since you cannot have an index on a system column.
So you could add an extra column xid to your table that is filled with txid_current()::bigint whenever a row is inserted or updated. Such a column can be indexed and efficiently used in a search:
SELECT ...
FROM tablename
WHERE xid = txid_current();
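A minimal sketch of that approach, reusing the event table from the question (the DEFAULT covers inserts; if updated rows should also be found, a trigger would have to set the column on UPDATE as well):

CREATE TABLE event (
    event_id     serial  PRIMARY KEY,
    event_action text    NOT NULL,
    count        integer NOT NULL DEFAULT 1,
    xid          bigint  NOT NULL DEFAULT txid_current()
);

-- The xid column, unlike the system column xmin, can be indexed.
CREATE INDEX event_xid_idx ON event (xid);

-- Inside the logging trigger: bump the counter if this transaction has
-- already logged the same action ('user_created' is just an example),
-- otherwise insert a new row.
UPDATE event
   SET count = count + 1
 WHERE event_action = 'user_created'
   AND xid = txid_current();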
You might consider something like this:
create table ConnectionCurrentAction (
connectionID int primary key,
currentActionID uuid
)
then at the beginning of the transaction:
delete from ConnectionCurrentAction where connectionID = pg_backend_pid();
insert into ConnectionCurrentAction(connectionID, currentActionID)
select pg_backend_pid(), uuid_generate_v4();
You can wrap this in a proc called, say, audit_action_begin.
Note: You may instead choose to enforce the requirement that an "action" be created explicitly by removing the delete here.
At the end of a transaction, do audit_action_end:
delete from ConnectionCurrentAction where connectionID = pg_backend_pid();
Whenever you want to know the current action:
select currentActionID from ConnectionCurrentAction where connectionID = pg_backend_pid();
You can wrap that in a function audit_action_current()
You can then put the currentActionID into your log which will enable you to identify whether a row was created in the current action or not. This will also allow you to identify where rows in different audit tables were created in the current logical action.
If you don't want to use a uuid a sequence would do just as well here. I like uuids.
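Sketched as plain SQL functions, the begin/current/end helpers described above could look like this (uuid_generate_v4() comes from the uuid-ossp extension; on PostgreSQL 13+ the built-in gen_random_uuid() would work just as well):

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE OR REPLACE FUNCTION audit_action_begin() RETURNS uuid AS $$
    -- start a fresh action for this connection and return its id
    DELETE FROM ConnectionCurrentAction WHERE connectionID = pg_backend_pid();
    INSERT INTO ConnectionCurrentAction (connectionID, currentActionID)
    VALUES (pg_backend_pid(), uuid_generate_v4())
    RETURNING currentActionID;
$$ LANGUAGE sql;

CREATE OR REPLACE FUNCTION audit_action_current() RETURNS uuid AS $$
    SELECT currentActionID
    FROM ConnectionCurrentAction
    WHERE connectionID = pg_backend_pid();
$$ LANGUAGE sql;

CREATE OR REPLACE FUNCTION audit_action_end() RETURNS void AS $$
    DELETE FROM ConnectionCurrentAction WHERE connectionID = pg_backend_pid();
$$ LANGUAGE sql;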

Trying to copy a table from one database to another in SQL Server 2008 R2

I am trying to copy table information from a backup dummy database to our live SQL database (an accident happened in our program, Visma Business, where someone managed to overwrite 1300 customer names), but I am having a hard time figuring out the right code for this. I've looked around and yes, there are several similar problems, but I just can't get this to work even though I've tried different solutions.
Here is the simple code I used last time. In theory all I need is the equivalent of MySQL's ON DUPLICATE KEY, which would be MERGE in SQL Server? I just didn't quite know what to write to get that MERGE to work.
INSERT [F0001].[dbo].[Actor]
SELECT * FROM [FDummy].[dbo].[Actor]
The error message I get with this is:
Violation of PRIMARY KEY constraint 'PK__Actor'. Cannot insert duplicate key in object 'dbo.Actor'.
What the error message says is simply "You can't add the same value if an attribute has a PK constraint". If you already have all the information in your backup table, what you should do is TRUNCATE TABLE, which removes all rows from a table while the table structure and its columns, constraints, indexes, and so on remain.
After that step you should follow this answer. Or, alternatively, I recommend a tool called Kettle, which is open source and easy to use for these kinds of data movements. That will save you a lot of work.
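A sketch of that truncate-and-reload approach (only a sketch: TRUNCATE will fail if other tables reference Actor via foreign keys, and it discards everything currently in the live table, so only do this if the backup really contains all rows):

BEGIN TRANSACTION;

-- Empty the live table.
TRUNCATE TABLE [F0001].[dbo].[Actor];

-- Reload everything from the backup copy. If Actor has an IDENTITY column,
-- SET IDENTITY_INSERT ... ON plus an explicit column list is needed here.
INSERT INTO [F0001].[dbo].[Actor]
SELECT * FROM [FDummy].[dbo].[Actor];

-- Check the row counts, then change ROLLBACK to COMMIT.
ROLLBACK TRANSACTION;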
Here are the things which can be the reason:
You have multiple rows in [FDummy].[dbo].[Actor] with the same data in the column which is going to be inserted into the primary key column of [F0001].[dbo].[Actor].
You have rows in [FDummy].[dbo].[Actor] with some value x in the column which is going to be inserted into the primary key column, and [F0001].[dbo].[Actor] already has row(s) with the same value x in its primary key column.
-- to check first point. if it returns row then you have some problem
SELECT ColumnGoingToBeMappedWithPK,
Count(*)
FROM [FDummy].[dbo].[Actor]
GROUP BY ColumnGoingToBeMappedWithPK
HAVING Count(*) > 1
-- to check second point. if count is greater than 0 then you have some problem
SELECT Count(*)
FROM [FDummy].[dbo].[Actor] a
JOIN [F0001].[dbo].[Actor] b
ON a.ColumnGoingToBeMappedWithPK = b.PrimaryKeyColumn
The MERGE statement will possibly be the best option for you here, unless the primary key of the Actor table is reused after a previous record is deleted (i.e. it is not auto-incremented), so that, say, the record with id 13 in F0001.dbo.Actor is not the same "actor" as the one in FDummy.dbo.Actor.
To use the statement with your code, it will look something like this:
begin transaction
merge [F0001].[dbo].[Actor] as t -- the destination
using [FDummy].[dbo].[Actor] as s -- the source
on (t.[PRIMARYKEY] = s.[PRIMARYKEY]) -- update with your primary keys
when matched then
update set t.columnname1 = s.columnname1,
t.columnname2 = s.columnname2,
t.columnname3 = s.columnname3
-- repeat for all your columns that you want to update
output $action,
Inserted.*,
Deleted.*;
rollback transaction -- change to commit after testing
Further reading can be done at the sources below:
MERGE (Transact-SQL)
Inserting, Updating, and Deleting Data by Using MERGE
Using MERGE in SQL Server to insert, update and delete at the same time

Deleting a record from a SQL table and updating the table

I am trying to delete a record in SQLite. I have four records, record1, record2, record3 and record4,
with id as the primary key,
so it auto-increments for each record that I insert. Now when I delete record3, the primary key is not decremented. What can I do to decrement the ids based on the records that I delete?
I want the ids to be 1, 2, 3 when I delete record3 from the database; right now they are 1, 2, 4. Is there any SQL query to change this? I tried this one:
DELETE FROM TABLE_NAME WHERE name = ?
Note: I am implementing this in Xcode.
I don't know why you want this, but I would recommend leaving these IDs as they are.
What is wrong with having the IDs as 1, 2, 4?
Also, you can potentially break referential integrity if you use these ID values as foreign keys somewhere else.
Also, please refer to this page to get a better understanding of how autoincrement fields work:
http://sqlite.org/autoinc.html
The point of auto increment is always to create a new unique ID, not to fill the gaps created by deleting records.
EDIT
You can achieve this with a special table design: records are never actually deleted; instead a field "del" marks them as deleted.
For example, a SELECT ... WHERE del = 0 will find all active records, and WHERE del > 0 the deleted ones.
Or select all records without the WHERE clause; the IDs then remain unaffected, and when looping through the array you skip deleted rows with "if del > 0 continue", so the array always stays in consecutive order.
It's very flexible. Depending on the SELECT, you get (see the sketch after this list):
all active records
all the deleted records
all records
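A minimal sketch of that design in SQLite (table and column names are only examples):

CREATE TABLE records (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    del  INTEGER NOT NULL DEFAULT 0   -- 0 = active, > 0 = deleted
);

-- "Delete" by flagging the row instead of removing it, so the ids never change.
UPDATE records SET del = 1 WHERE name = ?;

SELECT * FROM records WHERE del = 0;  -- all active records
SELECT * FROM records WHERE del > 0;  -- all the deleted records
SELECT * FROM records;                -- all records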