I have a huge XML file, about 60 GB. I've managed to create an XML reader with a StringBuilder where I can extract the specific elements I need. This works great. I've also connected my application to a LAN Microsoft SQL Server.
How do I use bulk insert to insert all my rows most efficiently? I have about 8,400,000 rows which I need to insert the fastest way possible.
My StringBuilder sets up the string like this:
"FirstVariable, SecondVariable, ThirdVariable;
FirstVariable, SecondVariable, ThirdVariable;
FirstVariable, SecondVariable, ThirdVariable;"
I need to import this like a CSV file with bulk insert :) Please help
If I understand you correctly, you convert your huge XML into a CSV file.
With this syntax you can insert a CSV file in one go:
CREATE TABLE testTable(int1 INT,int2 INT, int3 INT);
--Assuming a file with the following content: 1, 2, 3; 4, 5, 6; 7, 8, 9;
BULK INSERT testTable FROM 'F:\testCSV.txt' WITH (FIELDTERMINATOR=',', ROWTERMINATOR=';');
SELECT * FROM testTable;
/*
Result
int1 int2 int3
1 2 3
4 5 6
7 8 9
*/
DROP TABLE testTable;
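One caveat, assuming your StringBuilder writes each row on its own line as shown in the question: the line break after the ';' would then bleed into the next row's first field. Either keep the whole output on one line, or include the line break in the terminator, roughly like this (use ';\r\n' or ';\n' depending on how the file is written):
BULK INSERT testTable FROM 'F:\testCSV.txt' WITH (FIELDTERMINATOR=',', ROWTERMINATOR=';\n');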
You might try to avoid the CSV conversion and import the XML directly, but this will probably load the XML in one piece, and 60 GB is very much...
CREATE TABLE testTable(int1 INT,int2 INT, int3 INT);
/*
Assuming a file with the following content:
<data>
<row>
<a>1</a>
<b>2</b>
<c>3</c>
</row>
<row>
<a>4</a>
<b>5</b>
<c>6</c>
</row>
<row>
<a>7</a>
<b>8</b>
<c>9</c>
</row>
</data>
*/
INSERT INTO testTable
SELECT RowData.value('a[1]','int')
      ,RowData.value('b[1]','int')
      ,RowData.value('c[1]','int')
FROM
(
    SELECT CAST(x.y AS XML) AS XmlData
    FROM OPENROWSET(BULK 'F:\testXML.xml',SINGLE_CLOB) AS x(y)
) AS XmlData
CROSS APPLY XmlData.nodes('/data/row') AS x(RowData);
SELECT * FROM testTable;
/*
Result
int1 int2 int3
1 2 3
4 5 6
7 8 9
*/
DROP TABLE testTable;
Last but not least, you'll find explanations of how to use BULK INSERT directly against an XML file using an explicitly specified format file here: https://msdn.microsoft.com/en-us/library/ms191184.aspx
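The same mechanism works for plain CSV too. As a rough illustration, a non-XML format file for the CSV sample above might look like the following sketch (the file name testCSV.fmt and the 12.0 version header are assumptions, not from the linked page):
12.0
3
1  SQLCHAR  0  12  ","  1  int1  ""
2  SQLCHAR  0  12  ","  2  int2  ""
3  SQLCHAR  0  12  ";"  3  int3  ""
BULK INSERT testTable FROM 'F:\testCSV.txt' WITH (FORMATFILE = 'F:\testCSV.fmt');
Each line of the format file maps one field in the data file (field order, host data type, prefix length, max length, terminator) to one column in the target table (column order, column name, collation).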
I finally figured it out. I created a DataTable before the while loop and then added each element to the DataTable as the data was being extracted. I also made a counter in the while loop which would connect to the database every 5,000 elements, bulk insert them, and empty the DataTable afterwards. This made it possible to use only a few MB of RAM, and I'm able to run through the entire 60 GB file and parse all 8,400,000 elements into my database in about 12 minutes. The bulk insert code I used was pretty standard; here is part of my solution:
Using bulkCopy As SqlBulkCopy =
    New SqlBulkCopy("Server=testserver;Database=test1;User=test1;Pwd=12345;")
    bulkCopy.DestinationTableName = "dbo.testtable"
    bulkCopy.BatchSize = 5000
    Try
        ' Write from the source to the destination.
        bulkCopy.WriteToServer(table)
    Catch ex As Exception
        MsgBox(ex.Message)
    End Try
End Using
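For context: SqlBulkCopy maps DataTable columns to destination columns by ordinal position unless ColumnMappings is set, so the destination table has to line up with the DataTable. A minimal sketch of what dbo.testtable could look like (column names and types are assumptions based on the question, not part of the original answer):
CREATE TABLE dbo.testtable (
    FirstVariable  NVARCHAR(255),  -- assumed name and type
    SecondVariable NVARCHAR(255),  -- assumed name and type
    ThirdVariable  NVARCHAR(255)   -- assumed name and type
);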
Try using SqlBulkCopy for inserting bulk records from CSV.
I need to insert multiple rows into a database table from a single string.
Here is my string; it contains comma-separated values.
Current string:
batch 1 45665 987655,1228857 76554738,12390 8885858,301297 38998798
What I want is for 'batch 1' to be ignored or removed, and the remaining parts to be added to the SQL Server database as a separate row for each comma-separated value, like this:
Table name dbo.MSISDNData
Data
------------------
45665 987655
1228857 76554738
12390 8885858
301297 38998798
and when I query the table, it should return the results like this:
Query:
Select data
from dbo.MSISDNData
Results
Data
---------------------
45665 987655
1228857 76554738
12390 8885858
301297 38998798
Try this:
DECLARE @Data NVARCHAR(MAX) = N'batch 1 45665 987655,1228857 76554738,12390 8885858,301297 38998798';
DECLARE @DataXML XML;
SET @Data = '<a>' + REPLACE(REPLACE(@Data, 'batch 1 ', ''), ',', '</a><a>') + '</a>';
SET @DataXML = @Data;
SELECT LTRIM(RTRIM(T.c.value('.', 'VARCHAR(MAX)'))) AS [Data]
FROM @DataXML.nodes('./a') T(c);
It demonstrates how to split the data. You may need to sanitize it further, too - remove the 'batch 1' prefix, perform trimming, etc.
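To actually load the split values into the table from the question, the same SELECT can feed an INSERT (a sketch, assuming dbo.MSISDNData exists with a single Data column as described):
INSERT INTO dbo.MSISDNData ([Data])
SELECT LTRIM(RTRIM(T.c.value('.', 'VARCHAR(MAX)'))) AS [Data]
FROM @DataXML.nodes('./a') T(c);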
I'm trying to insert a very large CSV file into a table on SQL Server.
On the table itself the fields are defined as nvarchar, but when I try to use the BULK INSERT statement to load that file, all the Hebrew fields come out as gibberish.
When I use the INSERT statement everything is OK, but the BULK one gets it all wrong. I even tried to put the strings in the CSV file with the N'string' syntax, but they just ended up in the table as N'gibberish'.
The reason I'm not using just INSERT is that the file contains more than 250K rows.
This is the statement I'm using. The delimiter is '|' on purpose:
BULK INSERT [dbo].[SomeTable]
FROM 'C:\Desktop\csvfilesaved.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\Desktop\Error.csv',
TABLOCK
)
And this is a two-row sample of the CSV file:
2017-03|"מחוז ש""ת דן"|בני 18 עד 24|זכר|א. לא למד|ב. קלה|יהודים|ב. בין 31 ל-180 יום||הנדסאים, טכנאים, סוכנים ובעלי משלח יד נלווה|1|0|0|1|0|0
2017-03|"מחוז ש""ת דן"|בני 18 עד 24|זכר|א. לא למד|ג. בינונית|יהודים|ב. בין 31 ל-180 יום||עובדי מכירות ושירותים|1|0|0|1|0|0
Thanks!
I have a SQL script and a '.csv' file. I want the SQL script to read the data from the '.csv' file instead of manually entering the data in the script. Is that possible?
....
.....
......
and SP_F.trade_id = SP_R.trade_id
and SP_R.iSINCode IN (here is where I can manually enter the data)
PS: I am new to SQL and I am still learning.
Here is a good solution.
BULK INSERT CSVTest
FROM 'c:\csvtest.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
In more detail:
1) We have a CSV file named test.csv with the following content:
'JE000DT', 'BE000DT2J', 'DE000DT2'
1, 2, 3
2, 3, 4
4, 5, 6
2) We need to create a table for this file in the DB:
CREATE TABLE CSVTest ([columnOne] int, [columnTwo] int, [columnThree] int)
3) Insert your data with BULK INSERT. The column count and types must match your CSV; FIRSTROW = 2 skips the header row.
BULK INSERT CSVTest
FROM 'C:\test.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
4) Use this table in your subquery:
Select
SP_F.trade_id, -- as 'Trade ID',
SP_F.issuer_id, --as 'Issuer ID',
SP_R.iSINCode --as 'ISIN'
from t_SP_Fundamentals SP_F
JOIN t_SP_References SP_R ON SP_F.trade_id = SP_R.trade_id
where
(SP_F.issuer_id = 3608 or SP_F.issuer_id = 3607)
and SP_R.iSINCode IN (SELECT [columnOne] FROM CSVTest)
There is another solution with the OPENROWSET statement, which allows reading directly from the file, but I strongly recommend you use the solution above. Reading directly from the file in a query is not a great choice.
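For completeness, the OPENROWSET variant would look roughly like this (a sketch; OPENROWSET(BULK ...) requires a format file, here assumed to be C:\test.fmt describing the three integer columns):
SELECT csv.*
FROM OPENROWSET(
    BULK 'C:\test.csv',
    FORMATFILE = 'C:\test.fmt',
    FIRSTROW = 2
) AS csv;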
I'm trying to build a script that inserts random data into my table.
My current script looks like this:
INSERT INTO Utilisateurs (id_utilisateur, Uti_nom, Uti_prenom, Uti_role, Uti_mdp, Uti_Statut)
SELECT
-- here I want the id (a number that increments each time)
dbms_random.string('A', trunc(dbms_random.value(5, 50))), -- data for uti_nom
dbms_random.string('A', trunc(dbms_random.value(5, 100))), -- data for uti_prenom
-- randomly get 'Administrateur' or 'Utilisateur'
dbms_random.string('X', 10), -- data for uti_mdp
trunc(dbms_random.value(0, 1)) -- data for uti_status
FROM dual
CONNECT BY LEVEL < 100;
So could someone help me with the two commented lines?
Here's a sample. What I really need is the ID that increments and Uti_role ('Administrateur'/'Utilisateur'); the other fields can be generated and can look like "dsjhadakj".
id_utilisateur  Uti_nom  Uti_prenom  Uti_role        Uti_mdp    Uti_Statut
--------------  -------  ----------  --------------  ---------  ----------
1               Elche    Marco       Administrateur  Haj432Hgn  1
2               Babo     Jules       Utilisateur     Haj432Hgn  0
3               Ghale    Alex        Administrateur  Haj432Hgn  1
For a self-incrementing ID you can use LEVEL.
For Uti_role, something like this:
CASE WHEN dbms_random.value(0, 1) > 0.5 THEN 'Administrateur' ELSE 'Utilisateur' END
Here's SQL Fiddle for just the SELECT part.
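Putting both pieces into the original script, the whole INSERT might look like this (a sketch; note that trunc(dbms_random.value(0, 1)) always yields 0 because value() returns a number below the upper bound, so round(...) is used to get a 0/1 mix for Uti_Statut):
INSERT INTO Utilisateurs (id_utilisateur, Uti_nom, Uti_prenom, Uti_role, Uti_mdp, Uti_Statut)
SELECT
    LEVEL,                                                       -- incrementing ID
    dbms_random.string('A', trunc(dbms_random.value(5, 50))),   -- Uti_nom
    dbms_random.string('A', trunc(dbms_random.value(5, 100))),  -- Uti_prenom
    CASE WHEN dbms_random.value(0, 1) > 0.5
         THEN 'Administrateur' ELSE 'Utilisateur' END,          -- Uti_role
    dbms_random.string('X', 10),                                 -- Uti_mdp
    round(dbms_random.value(0, 1))                               -- Uti_Statut: 0 or 1
FROM dual
CONNECT BY LEVEL < 100;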
I'm trying to turn this XML string into a SELECT.
I have @Schedule XML = '<days><day enabled="0">0</day><day enabled="1">1</day><day enabled="1">2</day><day enabled="1">3</day><day enabled="1">4</day><day enabled="1">5</day><day enabled="0">6</day></days>'
What I'm trying to see at the end is:
DayNumber DayEnabled
0 0
1 1
2 1
3 1
4 1
5 1
6 0
I've tried a few ways, and so far nothing is working right. I am handling this as an XML data type. I'd prefer not to use a function, as this will just be in a stored procedure.
Update: Maybe I didn't explain it correctly.
I have a stored procedure, and XML is one of the parameters passed to it. I need to send it to a table to be inserted, so I'm trying to do the following:
INSERT INTO tblDays (DayNumber, DayEnabled)
SELECT @XMLParsedOrTempTableWithResults
I just can't figure out how to parse the parameter.
DECLARE @myXML AS XML = '<days><day enabled="0">0</day><day enabled="1">1</day><day enabled="1">2</day><day enabled="1">3</day><day enabled="1">4</day><day enabled="1">5</day><day enabled="0">6</day></days>'
DECLARE @XMLDataTable TABLE
(
    DayNumber int
   ,DayEnabled int
)
INSERT INTO @XMLDataTable
SELECT d.value('text()[1]','int') AS [DayNumber]
      ,d.value('(@enabled)[1]','int') AS [DayEnabled]
FROM @myXML.nodes('/days/*') ds(d)
SELECT * FROM @XMLDataTable
Refer:
http://beyondrelational.com/modules/2/blogs/28/posts/10279/xquery-labs-a-collection-of-xquery-sample-scripts.aspx
The XMLTABLE function is how most XML-enabled DBMSes shred an XML document into a relational result set.
This example uses DB2's syntax for XMLTABLE and an input parameter passed into a stored procedure:
INSERT INTO tblDays (DayNumber, DayEnabled)
SELECT X.*
FROM XMLTABLE ('$d/days/day' PASSING XMLPARSE( DOCUMENT SPinputParm ) AS "d"
    COLUMNS
        dayNumber  INTEGER  PATH '.',
        dayEnabled SMALLINT PATH '@enabled'
) AS X;