Best way to import a large dataset into an Oracle SQL database?

I'm new to Oracle Database and I'm making a very simple bilingual dictionary (2 tables: English and French).
A dictionary often contains thousands of words. Instead of writing thousands of [INSERT INTO ... VALUES] commands, is there a better way to automate the process, such as an import form? The book my school provides only mentions the INSERT command.

You can use a combination of a spreadsheet and an INSERT ALL command to quickly generate an SQL INSERT command to insert your data.
If you have your data in a spreadsheet, you can use formulas to construct an INSERT statement for each row.
Rather than generating one single-row INSERT per row, though, you can generate the INTO clauses of a single INSERT ALL statement.
To do this:
Add a column into your spreadsheet with the formula that looks like this: "INTO table (col1, col2, col3) VALUES ('val1', 'val2', 'val3')". You would need to use concatenation to add the values into this formula.
So your formula may look like this (assuming columns of A, B, and C):
="INTO table (col1, col2, col3) VALUES ('"&A2&"', '"&B2&"', '"&C2&"')"
Copy the formula to each row.
Copy and paste all of these formulas to a query window inside your IDE.
Add the words INSERT ALL at the start of your command
Add the words SELECT * FROM dual; at the end of your command
Your command would then look like this:
INSERT ALL
INTO table (col1, col2, col3) VALUES ('val1', 'val2', 'val3')
INTO table (col1, col2, col3) VALUES ('val1a', 'val2a', 'val3a')
INTO table (col1, col2, col3) VALUES ('val1b', 'val2b', 'val3b')
SELECT * FROM dual;
This will insert all records in a single statement and is likely to be much faster than hundreds or thousands of INSERT statements.
Alternatively, you could use a tool like Data Pump Import and Export, but I have limited experience with that so perhaps another user could elaborate on that.
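If the data is easier to export as CSV than to edit in a spreadsheet, the same INSERT ALL text can be generated by a short script. This is only a sketch: the table name english_french, its two columns, and the sample rows are made up for illustration. Unlike the spreadsheet formula, it doubles embedded single quotes so that words like aujourd'hui do not break the statement.

```python
import csv
import io


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_insert_all(table: str, columns: list[str], rows: list[list[str]]) -> str:
    """Build one Oracle INSERT ALL statement covering every row."""
    column_list = ", ".join(columns)
    lines = ["INSERT ALL"]
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        lines.append(f"INTO {table} ({column_list}) VALUES ({values})")
    lines.append("SELECT * FROM dual;")
    return "\n".join(lines)


# Hypothetical sample data standing in for an exported spreadsheet.
sample_csv = io.StringIO("hello,bonjour\ncat,chat\ntoday,aujourd'hui\n")
rows = list(csv.reader(sample_csv))
statement = build_insert_all("english_french", ["english", "french"], rows)
print(statement)
```

The printed text can be pasted into a query window exactly like the spreadsheet output. For a real file, replace the io.StringIO object with an opened CSV file.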

If you have your data in a file, you can use the UTL_FILE package to read, parse and load your data.
Oracle UTL_FILE package
You can use Oracle's external table feature as well.
Oracle External tables

Oracle's SQL*Loader utility is great for quickly loading large amounts of data.

Related

Avoid Duplicates with INSERT INTO TABLE VALUES from csv file

I have a .csv file with 600 million plus rows. I need to upload this into a database. It will have 3 columns assigned as primary keys.
I use pandas to read the file in chunks of 1000 lines.
At each chunk iteration I use the following syntax with pyodbc in Python to upload the data, 1000 lines at a time:
query = """
INSERT INTO db_name.dbo.table_name("col1", "col2", "col3", "col4")
VALUES (?,?,?,?)
"""
cursor.executemany(query, df.values.tolist())
Unfortunately, there are apparently some duplicate rows present. When the duplicate row is encountered the uploading stops with an error from SQL Server.
Question: how can I upload data such that whenever a duplicate is encountered instead of stopping it will just skip that line and upload the rest? I found some questions and answers on insert into table from another table, or insert into table from variables declared, but nothing on reading from a file and using insert into table col_names values() command.
Based on those answers one idea might be:
At each iteration of chunks:
Upload to a temp table
Do the insertion from the temp table into the final table
Delete the rows in the temp table
However, with such a large file each second counts, and I was looking for an answer with better efficiency.
I also tried to deal with duplicates using python, however, since the file is too large to fit into the memory I could not find a way to do that.
Question 2: if I were to use bulk insert, how would I skip over the duplicates?
Thank you
You can try to use a CTE and an INSERT ... SELECT ... WHERE NOT EXISTS.
WITH cte
AS
(
SELECT ? col1,
? col2,
? col3,
? col4
)
INSERT INTO db_name.dbo.table_name
(col1,
col2,
col3,
col4)
SELECT col1,
col2,
col3,
col4
FROM cte
WHERE NOT EXISTS (SELECT *
FROM db_name.dbo.table_name
WHERE table_name.col1 = cte.col1
AND table_name.col2 = cte.col2
AND table_name.col3 = cte.col3
AND table_name.col4 = cte.col4);
Remove any table_name.col<n> = cte.col<n> comparison whose column isn't part of the primary key.
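To see the NOT EXISTS filter at work without a SQL Server instance, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. The three-column key follows the question; the sample rows are invented. With pyodbc the placeholders would be ? rather than named parameters, so each key value would probably have to be passed twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the SQL Server target
conn.execute(
    "CREATE TABLE table_name ("
    "col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, "
    "PRIMARY KEY (col1, col2, col3))"
)

# Insert a row only if no existing row has the same three key columns.
query = """
INSERT INTO table_name (col1, col2, col3, col4)
SELECT :col1, :col2, :col3, :col4
WHERE NOT EXISTS (
    SELECT 1 FROM table_name
    WHERE col1 = :col1 AND col2 = :col2 AND col3 = :col3
)
"""

# Hypothetical chunk; the second row repeats the key of the first.
chunk = [
    {"col1": "a", "col2": "b", "col3": "c", "col4": "first"},
    {"col1": "a", "col2": "b", "col3": "c", "col4": "duplicate key"},
    {"col1": "x", "col2": "y", "col3": "z", "col4": "second"},
]
conn.executemany(query, chunk)
conn.commit()

stored = conn.execute(
    "SELECT col1, col2, col3, col4 FROM table_name ORDER BY col1"
).fetchall()
print(stored)
```

Because the statement runs once per row, a duplicate inside the same chunk is skipped as well, at the cost of one existence check per row.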
I would always load into a temporary load table first, one that has no unique or PK constraint on those columns. That way you can confirm that the whole file has loaded, which is an invaluable check in any ETL work, and you can easily run other analysis on the source data.
After that, use an insert such as the one suggested by an earlier answer, or, if you know that the target table is empty, simply
INSERT INTO db_name.dbo.table_name(col1,col2,col3,col4)
SELECT distinct col1,col2,col3,col4 from load_table
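A minimal sketch of that load-table flow, again with sqlite3 standing in for SQL Server and with invented sample rows: load everything, check the row count against the file, then copy the distinct rows into the empty target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; table names follow the answer
conn.execute("CREATE TABLE load_table (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.execute(
    "CREATE TABLE table_name (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, "
    "PRIMARY KEY (col1, col2, col3))"
)

# Hypothetical file contents, including one exact duplicate line.
file_rows = [
    ("a", "b", "c", "v1"),
    ("a", "b", "c", "v1"),
    ("x", "y", "z", "v2"),
]
conn.executemany("INSERT INTO load_table VALUES (?, ?, ?, ?)", file_rows)

# Check that every line of the file arrived in the load table.
loaded = conn.execute("SELECT COUNT(*) FROM load_table").fetchone()[0]

# The target is empty here, so SELECT DISTINCT is enough to drop exact duplicates.
conn.execute(
    "INSERT INTO table_name (col1, col2, col3, col4) "
    "SELECT DISTINCT col1, col2, col3, col4 FROM load_table"
)
inserted = conn.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
print(loaded, inserted)
```

SELECT DISTINCT only removes rows that are identical in every column. Two rows that share a key but differ in col4 would still collide on the primary key.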
The best approach is to use a temporary table and execute a MERGE-INSERT statement. You can do something like this (not tested):
CREATE TABLE #MyTempTable (col1 VARCHAR(50), col2, col3...);
INSERT INTO #MyTempTable(col1, col2, col3, col4)
VALUES (?,?,?,?)
CREATE CLUSTERED INDEX ix_tempCol1 ON #MyTempTable (col1);
MERGE INTO db_name.dbo.table_name AS TARGET
USING #MyTempTable AS SOURCE ON TARGET.COL1 = SOURCE.COL1 AND TARGET.COL2 = SOURCE.COL2 ...
WHEN NOT MATCHED THEN
INSERT(col1, col2, col3, col4)
VALUES(source.col1, source.col2, source.col3, source.col4);
You need to consider the best indexes for your temporary table to make the MERGE faster. The WHEN NOT MATCHED clause inserts only those rows that have no match in the target according to the ON clause, which is how duplicates are avoided.
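Since the ON clause has to list every key column, it can be less error-prone to build the MERGE text from the column lists. The helper below is a hypothetical convenience, not part of pyodbc, and it only produces the SQL string; the key columns col1 to col3 are an assumption taken from the question's three-column primary key.

```python
def build_merge_sql(
    target: str, source: str, key_columns: list[str], columns: list[str]
) -> str:
    """Build a MERGE that inserts source rows with no key match in the target."""
    on_clause = " AND ".join(f"TARGET.{c} = SOURCE.{c}" for c in key_columns)
    column_list = ", ".join(columns)
    source_values = ", ".join(f"SOURCE.{c}" for c in columns)
    return (
        f"MERGE INTO {target} AS TARGET\n"
        f"USING {source} AS SOURCE ON {on_clause}\n"
        "WHEN NOT MATCHED THEN\n"
        f"    INSERT ({column_list})\n"
        f"    VALUES ({source_values});"
    )


merge_sql = build_merge_sql(
    "db_name.dbo.table_name",
    "#MyTempTable",
    key_columns=["col1", "col2", "col3"],  # assumed primary key columns
    columns=["col1", "col2", "col3", "col4"],
)
print(merge_sql)
```

One caveat, as far as I know: the MERGE compares source rows against the target only, so if the temporary table itself contains two rows with the same key, both count as not matched and the insert can still fail on the primary key. Deduplicating the temporary table first avoids that.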
SQL Server Integration Services offers one method that can read data from a source (via a Dataflow task), then remove duplicates using its Sort control (a checkbox to remove duplicates).
https://www.mssqltips.com/sqlservertip/3036/removing-duplicates-rows-with-ssis-sort-transformation/
Of course the data has to be sorted, and with 600 million+ rows that isn't going to be fast.
If you want to use pure SQL Server then you need a staging table (without a pk constraint). After importing your data into Staging, you would insert into your target table using filtering for the composite PK combination. For example,
Insert into dbo.RealTable (KeyCol1, KeyCol2, KeyCol3, Col4)
Select Col1, Col2, Col3, Col4
from dbo.Staging S
where not exists (Select *
from dbo.RealTable RT
where RT.KeyCol1 = S.Col1
AND RT.KeyCol2 = S.Col2
AND RT.KeyCol3 = S.Col3
)
In theory you could also use the set operator EXCEPT since it takes the distinct values from both tables. For example:
INSERT INTO RealTable
SELECT * FROM Staging
EXCEPT
SELECT * FROM RealTable
This would insert the distinct rows from Staging that don't already exist in RealTable. The method compares whole rows rather than the composite PK, so rows that share a key but differ in another column are all kept. An insert error would therefore indicate that the csv assigns different values to the same composite key.
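Both behaviours of EXCEPT can be reproduced in a few lines. This sketch uses sqlite3 as a stand-in for SQL Server (SQLite also supports EXCEPT), with invented rows: the first insert silently drops duplicates, while the second one fails because a staged row reuses an existing key with a different Col4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for SQL Server
conn.execute("CREATE TABLE Staging (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 TEXT)")
conn.execute(
    "CREATE TABLE RealTable (KeyCol1 TEXT, KeyCol2 TEXT, KeyCol3 TEXT, Col4 TEXT, "
    "PRIMARY KEY (KeyCol1, KeyCol2, KeyCol3))"
)
conn.execute("INSERT INTO RealTable VALUES ('a', 'b', 'c', 'v1')")
conn.executemany(
    "INSERT INTO Staging VALUES (?, ?, ?, ?)",
    [
        ("a", "b", "c", "v1"),  # already present in RealTable
        ("x", "y", "z", "v2"),
        ("x", "y", "z", "v2"),  # exact duplicate within the file
    ],
)

except_insert = (
    "INSERT INTO RealTable SELECT * FROM Staging EXCEPT SELECT * FROM RealTable"
)
conn.execute(except_insert)
row_count = conn.execute("SELECT COUNT(*) FROM RealTable").fetchone()[0]

# Same key as an existing row but a different Col4: EXCEPT keeps it,
# so the insert runs into the primary key constraint.
conn.execute("INSERT INTO Staging VALUES ('a', 'b', 'c', 'changed')")
try:
    conn.execute(except_insert)
    conflict = False
except sqlite3.IntegrityError:
    conflict = True
print(row_count, conflict)
```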

Insert with select statement gives error Only one expression in select without exists

I am new to SQL Server and have a problem with an insert statement. I am converting an old database to a SQL Server relational database and transferring the old data into new tables. The old records are incomplete, which causes problems because the fields in the new tables do not allow null values. So I am trying to insert n/a into the missing fields and, in the same statement, use a select to retrieve the available data from the old table, so that I don't get a null value not allowed error. Instead I get the error Only one expression can be specified in the select list when the subquery is not introduced with EXISTS, along with an error saying the insert statement has more columns than the values statement.
I am sure there is a way to do this but I can't figure it out; I hope someone can help. Below is an abbreviated version of the statement.
insert into database1.dbo.table (col1, col2, .....col10)
values('n/a','n/a',(select col3, col4...col10 from database2.dbo.table)
You can try to use INSERT INTO ... SELECT
INSERT INTO database1.dbo.table (col1, col2, .....col10)
SELECT 'n/a',
'n/a',
col3,
col4,
...col10
FROM database2.dbo.table
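Here is a small runnable illustration of mixing literal values with selected columns, using sqlite3 as a stand-in for SQL Server. The table names old_table and new_table and the four-column layout are simplifications made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for both databases
conn.execute("CREATE TABLE old_table (col3 TEXT, col4 TEXT)")
conn.execute(
    "CREATE TABLE new_table ("
    "col1 TEXT NOT NULL, col2 TEXT NOT NULL, col3 TEXT NOT NULL, col4 TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?)", [("x3", "x4"), ("y3", "y4")]
)

# Constants and source columns sit side by side in one SELECT list,
# so every old row yields one complete new row.
conn.execute(
    "INSERT INTO new_table (col1, col2, col3, col4) "
    "SELECT 'n/a', 'n/a', col3, col4 FROM old_table"
)
rows = conn.execute("SELECT * FROM new_table ORDER BY col3").fetchall()
print(rows)
```

The original statement failed because a subquery inside VALUES has to return a single value, whereas a SELECT list can return as many columns as the insert needs.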

Partial copy of table and insert values at same time?

So this is going to be an odd question but I'm going to try and explain it as best as I can in order to assist anybody trying to help me here...
I am presented with a situation in which I am trying to copy data from one database to another to similar tables, however there is a slight difference which makes a world of difference. db1.table1 allows null values in col3 and does in fact have a number of rows which have null values but db2.table1 does not allow null values in col3 but I still need to copy the values over. Furthermore, db1.table1.col3 is a GUID while db2.table1.col3 is VARCHAR which is part of the issue. If db1.table1.col3 weren't of type GUID I was simply going to UPDATE the column with the text that I need to insert there that I am going to need in db2.table1.col3.
So, to summarize: I am looking for a way to
INSERT INTO db2.table1 (col1, col2, col3...) SELECT col1, col2, col3... FROM db1.table1 but at places where col3 is null, I need to insert text/varchar so that it's not null.
Is there any simpler way to do this than building a temporary table that anybody knows of?
Use COALESCE or ISNULL with the replacement text that you want,
for example ISNULL(Col3, 'Sometext').
If the target column were a GUID, you could use the NEWID() function instead, since you can't insert regular text into a uniqueidentifier data type.
The NEWID() function returns a GUID, for example:
SELECT NEWID()
26C064EF-0AB6-4DBE-91B3-C2EE40DE7AD6
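A quick way to try the COALESCE approach is the sqlite3 sketch below; the table names and rows are invented. SQLite has no GUID type, so col3 is plain text here. On SQL Server I would expect the GUID column to need a cast to text first, something like ISNULL(CAST(col3 AS VARCHAR(36)), 'Sometext'), because otherwise the replacement string is likely to be converted to uniqueidentifier and fail. Treat that as an assumption to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for db1 and db2
conn.execute("CREATE TABLE source_table (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.execute(
    "CREATE TABLE target_table (col1 INTEGER, col2 TEXT, col3 TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [
        (1, "a", "26C064EF-0AB6-4DBE-91B3-C2EE40DE7AD6"),
        (2, "b", None),  # the problematic NULL
    ],
)

# Cast to text, then substitute a placeholder wherever col3 is NULL.
conn.execute(
    "INSERT INTO target_table (col1, col2, col3) "
    "SELECT col1, col2, COALESCE(CAST(col3 AS TEXT), 'Sometext') "
    "FROM source_table"
)
copied = conn.execute(
    "SELECT col1, col2, col3 FROM target_table ORDER BY col1"
).fetchall()
print(copied)
```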

How to manually move rows from one database to the next?

I have two databases on the same server. One db is newer than the other and has had its schema modified quite a bit. I want to transfer data from a table in the old db to a table in the new db but I need total control over the process so I can mold the old data to fit the new schema.
[NewDB].[dbo].[Aliases]
[OldDB].[Terminal].[Alias]
I'm not very adept at SQL yet. Is there a way I can loop through all the records in the old table and then on each iteration of the loop craft a custom insert statement for the new table?
Try an INSERT SELECT statement.
INSERT INTO [NewDB].[dbo].[Aliases]
SELECT columns
FROM [OldDB].[Terminal].[Alias]
http://msdn.microsoft.com/en-us/library/ms174335(v=sql.100).aspx - derived table section
http://msdn.microsoft.com/en-us/library/ms189499(v=sql.100).aspx
Do it the same way as you would if both tables were in the same database. Just fully qualify the names of the tables. Example:
INSERT INTO [NewDB].[dbo].[Aliases] (col1, col2, col3)
SELECT LEFT(col1,3), col2, col3 FROM [OldDB].[Terminal].[Alias]
Depending on the data size you wish to transfer, you might want to consider using BULK INSERT: http://msdn.microsoft.com/en-us/library/ms188365.aspx
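The cross-database INSERT ... SELECT can be tried locally with sqlite3, where ATTACH DATABASE plays the role of a second database on the same server. The table layouts and rows are invented, and substr(col1, 1, 3) is SQLite's counterpart to LEFT(col1, 3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # "main" plays the role of NewDB
conn.execute("ATTACH DATABASE ':memory:' AS olddb")  # stands in for OldDB
conn.execute("CREATE TABLE olddb.alias (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.execute("CREATE TABLE main.aliases (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO olddb.alias VALUES (?, ?, ?)",
    [("terminal-01", "x", "y"), ("printer-02", "p", "q")],
)

# Reshape the data in the SELECT list while copying between databases.
conn.execute(
    "INSERT INTO main.aliases (col1, col2, col3) "
    "SELECT substr(col1, 1, 3), col2, col3 FROM olddb.alias"
)
migrated = conn.execute(
    "SELECT col1, col2, col3 FROM main.aliases ORDER BY col1"
).fetchall()
print(migrated)
```

Any expression allowed in a SELECT list can be used to mold the old data, so an explicit row-by-row loop is rarely needed.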
INSERT INTO NEWDB..TABLENAME( *fieldlist* )
SELECT *fieldlist* FROM OLDDB..TABLENAME

Exporting Result Set to Another Table?

How can you export a result set given by a query to another table using SQL Server 2005?
I'd like to accomplish this without exporting to CSV?
INSERT INTO TargetTable(Col1, Col2, Col3)
SELECT Col1, Col2, Col3
FROM SourceTable
insert into table(column1, columns, etc) select columns from sourcetable
You can omit the column list in the INSERT if the columns returned by the SELECT match the table definition. Columns are matched by position, so the column names in the SELECT are ignored, but an explicit list is recommended for readability.
SELECT INTO is possible too, but it creates a new table. It is sometimes useful for selecting into a temporary table, but be aware of the tempdb locking that SELECT INTO can cause.
SELECT col1, col2, ...
INTO dbo.newtable
FROM (SELECT ...query...) AS x;