MySQL query to import data from a dump into an existing database and overwrite changed entries - sql

I want to import a new dump file into my old database, which has the same schema. If a record in the dump differs from the existing record, I want the dump's version to overwrite it, and I want new records to be added. I don't want the existing records in the database I'm importing into to be deleted.

If you create the dump file using mysqldump, you can give it the option --replace, which generates a dump file containing REPLACE statements instead of INSERT statements.
When this dump file is loaded into MySQL, records that match primary or unique keys in the database will replace the old ones, while records not matching existing keys will be inserted.
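For example, a minimal sketch, assuming the database is called mydb and the dump is written to dump.sql (user, database and file names are placeholders):
mysqldump --replace -u user -p mydb > dump.sql
mysql -u user -p mydb < dump.sql
The dump then contains statements like REPLACE INTO `mytable` VALUES (...); loading it replaces rows whose keys already exist and inserts the rest, while rows that exist only in the old database are left untouched.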

Related

Import CSV file to SQLite database (without headers)

How to load a CSV file into the table using the console? The problem is that I have to somehow omit the headers from the CSV file (I cannot delete them manually).
From the sqlite3 doc on CSV Import:
There are two cases to consider: (1) Table "tab1" does not previously exist and (2) table "tab1" does already exist.
In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input CSV file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the CSV file is interpreted to be column names and the actual data starts on the second row of the CSV file.
For the second case, when the table already exists, every row of the CSV file, including the first row, is assumed to be actual content. If the CSV file contains an initial row of column labels, that row will be read as data and inserted into the table. To avoid this, make sure that table does not previously exist.
It is either/or. You will have to outsmart it.
Assuming "I can not delete them manually" means from the csv, not from the table, you could possibly sql delete the header line after the import.
Or: Import into a temp table in the target database, insert into target table from the temp table, drop the temp table.
Or:
connect to an in-memory database
import the CSV into a table
attach the target database
insert into target table from the imported in-memory table
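A rough sketch of that approach in the sqlite3 shell, assuming the CSV is data.csv, the target database file is target.db and the target table is tab1 (all placeholder names). Because tab1_tmp does not exist in the in-memory database, the header row is consumed as column names and only the data rows are copied over:
sqlite3 :memory:
.mode csv
.import data.csv tab1_tmp
ATTACH DATABASE 'target.db' AS target;
INSERT INTO target.tab1 SELECT * FROM tab1_tmp;
DETACH DATABASE target;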
Just add the option --skip 1; see https://www.sqlite.org/cli.html#importing_csv_files
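With a recent sqlite3 CLI (the option was added in 3.32.0), a minimal example, again assuming data.csv and tab1:
.import --csv --skip 1 data.csv tab1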

Using SSIS Package, How to validate the source records for duplicate before inserting?

SQL Server 2012: using a SSIS package, how to validate the source records for duplicate before inserting?
Our source file is a .csv. We are facing duplicate records loaded in the staging table.
At present, we are following a manual process of loading the data.
How can we validate the source file data against the destination table before loading, and load only the valid records? Duplicate records can be loaded not only because the source file contains duplicates, but also because the same file can be reloaded into the staging table.
We are not truncating the staging table; we are keeping the records as is.
Second question: how do we pick up the name of the source file and pass it into the load? Possibly by having a derived column, "FileName", which gets loaded along with the raw data into the staging table.
The typical load pattern I use in this case is:
Prepare a staging table that matches the source file
In SSIS run a SQL Task with TRUNCATE StagingTable; (which clears it out)
Then, run a data flow task that loads the entire data file into the staging table
Lastly, merge the staging table into the final table.
I prefer to do this last step in a SQL Task also:
INSERT INTO FinalTable
    (PrimaryKey, Column1, Column2, Column3)
SELECT
    PrimaryKey, Column1, Column2, Column3
FROM StagingTable SRC
WHERE NOT EXISTS (
    SELECT * FROM FinalTable TGT WHERE TGT.PrimaryKey = SRC.PrimaryKey
);
If you prefer a graphical UI, and you don't mind the extra network traffic and slower processing time, you can do the same type of merge operation using lookups. You could even use the SCD component, but I strongly discourage its use.
Whether you do it in T-SQL or the UI, you need a key that can be used to uniquely identify the records (referred to as PrimaryKey in my example). If you don't have this key, there is no way to 'deduplicate'.
Note that in this example you have a 'real' staging table whose only purpose is to get the data file into the database. Then you have a final table that contains the final, consistent result.
Also note that this pattern only adds new rows - it will not update existing rows if they change in the data file.
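If you do need updates as well, one option is to swap the INSERT above for a T-SQL MERGE; a sketch using the same hypothetical column names:
MERGE FinalTable AS TGT
USING StagingTable AS SRC
    ON TGT.PrimaryKey = SRC.PrimaryKey
WHEN MATCHED THEN
    UPDATE SET TGT.Column1 = SRC.Column1,
               TGT.Column2 = SRC.Column2,
               TGT.Column3 = SRC.Column3
WHEN NOT MATCHED THEN
    INSERT (PrimaryKey, Column1, Column2, Column3)
    VALUES (SRC.PrimaryKey, SRC.Column1, SRC.Column2, SRC.Column3);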
Given your exact scenario (of loading the same file again), I would first check whether the data has already been loaded into the staging table. If you do that, you don't have to worry about checking for duplicates at the record level.
How are you setting the connection to the file? In most of the data loads I have dealt with, I designed a For Each Loop container where the file name/path is populated into a user variable. As you said, you could just use a Derived Column transform to add a new column that gets its value from that variable. If you don't have the file name in a user variable, you could use an Expression Task in the control flow to populate it.
To cover your exact requirement, I would use the step above to populate the file name in the table. You could even normalize it into a separate table instead of storing a long file name on every data record. Once you have all the file names in the database, you can run an Execute SQL task at the beginning to see whether that file name is already there.
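As a sketch, assuming the staging table is called StagingTable and has a FileName column populated by the derived column (both names are assumptions), the Execute SQL task could run something like the following, with the parameter mapped to the file-name variable (the ? marker is for an OLE DB connection):
SELECT COUNT(*) AS AlreadyLoaded
FROM StagingTable
WHERE FileName = ?;
If AlreadyLoaded comes back greater than zero, skip the data flow for that file.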
Two years back I faced the same problem importing TSV files.
I tried many other solutions, but the best I could come up with was a C# script task that does this validation.
What I did as a solution:
Create a C# DataTable object in memory with a primary key constraint,
like:
DataColumn[] keyColumn = new DataColumn[1];
keyColumn[0] = dtFilterdPK.Columns["Column name"]; // the column(s) that make up the key
dtFilterdPK.PrimaryKey = keyColumn;                // duplicates now raise a ConstraintException on Rows.Add
Then try to add the rows from your CSV to this DataTable one by one.
Whenever a row duplicates an existing primary key value, adding it raises an error.
Handle that error in a try..catch block and log the duplication as your logging requires.
That keeps the error records out of the DataTable object.
At last, import the contents of the DataTable into your table with a bulk import.
Like:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(myConnection))
{
    bulkCopy.DestinationTableName = "Your DB Table Name"; // assign destination table name
    bulkCopy.WriteToServer(dtToBeImport);                 // write the DataTable into the actual table
}
Hope this will help you.

What is the best way to update more than 1 million rows in a table in Oracle using a CSV file

I am trying to update only one column of the 1 million records in a table based on the value in the CSV file.
Sample CSV file:
1,Apple
2,Orange
3,Mango
The first column in the file is the PK I will use to filter the records, and the second column is the new value of the column I want to update in the table. The PK in the CSV file may or may not exist in the DB, though. I was thinking of creating a script to generate a million UPDATE statements based on the file. I would like to know if there is a better way to do this.
Personally I would:
load the CSV file into a new table using sqlldr
make sure the correct indexes are on the new and existing tables
write ONE update statement to update the existing table from the new one
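A sketch of that single statement, assuming sqlldr loaded the CSV into a table csv_stage(id, new_value) and the target is big_table(id, some_column) - all placeholder names:
UPDATE big_table t
SET    t.some_column = (SELECT s.new_value
                        FROM   csv_stage s
                        WHERE  s.id = t.id)
WHERE  EXISTS (SELECT 1
               FROM   csv_stage s
               WHERE  s.id = t.id);
The WHERE EXISTS clause keeps rows whose PK is not in the CSV from being overwritten with NULL.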
I would:
Create an external table using the csv
Update existing table from the new external table in just one update
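A rough sketch of the external-table variant; the directory object (data_dir), file name and column names are assumptions, and the directory object must already exist and point at the folder holding the CSV:
CREATE TABLE csv_ext (
  id        NUMBER,
  new_value VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('data.csv')
);

MERGE INTO big_table t
USING csv_ext s
   ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.some_column = s.new_value;
Because there is no WHEN NOT MATCHED clause, PKs in the CSV that don't exist in the table are simply ignored, which matches the question.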

Import Oracle data dump and overwrite existing data

I have an oracle dmp file and I need to import data into a table.
The data in the dump contains new rows and few updated rows.
I am using the import (imp) command with IGNORE=Y, so it imports all the new rows fine. But it doesn't import/overwrite the existing rows (it shows a warning that a unique key constraint was violated).
Is there some option to make the import UPDATE the existing rows with new data?
No. If you were using data pump then you could use the TABLE_EXISTS_ACTION=TRUNCATE option to remove all existing rows and import everything from the dump file, but as you want to update existing rows and leave any rows not in the new file alone - i.e. not delete them (I think, since you only mention updating, though that isn't clear) - that might not be appropriate. And as your dump file is from the old exp tool rather than expdp that's moot anyway, unless you can re-export the data.
If you do want to delete existing rows that are not in the dump then you could truncate all the affected tables before importing. But that would be a separate step that you'd have to perform yourself; it's not something imp will do for you, and the tables would be empty for a while, so you'd have to have downtime to do it.
Alternatively you could import into new staging tables - in a different schema, since imp doesn't support renaming either - and then use those to merge the new data into the real tables. That may be the least disruptive approach. You'd still have to design and write all the merge statements though. There's no built-in way to do this automatically.
You can import into a temp table and then reconcile the records by joining with it.
Use the impdp option REMAP_TABLE to load the dump file into a temp table:
impdp .... REMAP_TABLE=TMP_TABLE_NAME
When the load is done, run a MERGE statement on the existing table from the temp table.
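A sketch of that merge, assuming the dump was remapped into TMP_TABLE_NAME and the real table is MY_TABLE with primary key ID (table and column names are placeholders):
MERGE INTO my_table t
USING tmp_table_name s
   ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.col1 = s.col1, t.col2 = s.col2
WHEN NOT MATCHED THEN
  INSERT (id, col1, col2) VALUES (s.id, s.col1, s.col2);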

Extract specific data from full mysqldump backup

I am making regular backups of my MySQL database with mysqldump. This gives me a .sql file with CREATE TABLE and INSERT statements, allowing me to restore my database on demand. However, I have yet to find a good way to extract specific data from this backup, e.g. extract all rows from a certain table matching certain conditions.
Thus, my current approach is to restore the entire file into a new temporary database, extract the data I actually want with a new mysqldump call, delete the temporary database and then import the extracted lines into my real database.
Is this really the best way to do this? Is there some sort of script that can directly parse the .sql file and extract the relevant lines? I don't think there is an easy solution with grep and friends unfortunately, as mysqldump generates INSERT statements that insert many values per line.
The solution to this just ended up being to import the whole file, extract the data I needed and drop the database again.
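For reference, a rough sketch of that round trip; database, table, user and WHERE condition are all placeholders:
mysql -u user -p -e "CREATE DATABASE tmp_restore"
mysql -u user -p tmp_restore < backup.sql
mysqldump -u user -p tmp_restore mytable --where="status='active'" --no-create-info > extracted.sql
mysql -u user -p realdb < extracted.sql
mysql -u user -p -e "DROP DATABASE tmp_restore"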