How to bulk load JSON with values into Synapse SQL dedicated pool - azure-synapse

I'm attempting to bulk load JSON files, along with their filenames and paths, into a Synapse Analytics dedicated SQL pool table, but I'm stumped on how to accomplish it. I can load the JSON files on their own without a problem, but I really need the additional values as well.
This is what I'm trying, but it doesn't work:
COPY INTO dbo.PolicyStagingJsonOnly
SELECT jsonContent,
[result].filename() AS fn,
[result].filepath() AS fp
FROM
OPENROWSET(
BULK 'https://datalakexxxx.blob.core.windows.net/staging/policy/*.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR = '0x0b',
ROWTERMINATOR = '0x0c'
)
WITH (
jsonContent varchar(MAX)
) AS [result]
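For contrast, the JSON-only load that already works would presumably look something like the sketch below. The single jsonContent varchar(MAX) column and the 0x0b/0x0c terminators (chosen so no real character in the files ever acts as a delimiter) are assumptions carried over from the attempt above:
COPY INTO dbo.PolicyStagingJsonOnly (jsonContent)
FROM 'https://datalakexxxx.blob.core.windows.net/staging/policy/*.json'
WITH (
FILE_TYPE = 'CSV', -- each whole file/row lands in the one varchar(MAX) column
FIELDQUOTE = '0x0b',
FIELDTERMINATOR = '0x0b',
ROWTERMINATOR = '0x0c'
);
Note that COPY INTO only takes a storage location as its source (it does not accept a SELECT), and filename()/filepath() are OPENROWSET helpers from the serverless SQL pool, which is likely why the statement above is rejected.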

Related

SQL; csv import with semicolons in data and double quotes

I want to import a CSV file that has some values like this:
123;456;"78;9";1011
Simply put, one of the values contains the field delimiter (a semicolon), but that value is wrapped in double quotes. When I use a bulk import, the value '"78' is put into one column and '9"' into the next. How can I prevent this?
I am using the query below:
BULK INSERT CSVTest
FROM 'c:\csvtest.csv'
WITH
(
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n'
)
GO
I'm using SQL Server!
In a test environment I've set up the new SQL Server, but FIELDQUOTE seems to be ignored in the statement and the fields are still split up. What am I doing wrong? I'm doing:
BULK INSERT CSVTest
FROM 'c:\csvtest.csv'
WITH
(
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
FIELDQUOTE='"'
)
GO
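Not part of the original exchange, but worth checking: as far as I can tell from the BULK INSERT documentation, FIELDQUOTE is only honored when FORMAT = 'CSV' is also specified (SQL Server 2017 and later), so a variant like this may behave differently:
BULK INSERT CSVTest
FROM 'c:\csvtest.csv'
WITH
(
FORMAT = 'CSV', -- enables RFC 4180-style handling of quoted fields
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
FIELDQUOTE = '"'
)
GO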

Import CSV into SQL (CODE)

I want to import several CSV files automatically using SQL code (i.e. without using the GUI). Normally, I know the dimensions of my CSV file, so in many cases I create an empty table with, let's say, x columns with the corresponding data types. Then, I import the CSV file into this table using BULK INSERT. However, in this case I don't know much about my files, i.e. information about data types and dimensions is not given.
To summarize the problem:
I receive a file path, e.g. C:...\DATA.csv. Then, I want to use this path in SQL code to import the file to a table without knowing anything about it.
Any ideas on how to solve this problem?
Use something like this:
BULK INSERT tbl
FROM 'csv_full_path'
WITH
(
FIRSTROW = 2, --Second row if header row in file
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
ERRORFILE = 'error_file_path',
TABLOCK
)
If columns are not known, you could try with:
select * from OpenRowset
Or, do a bulk insert with only the first row as one big column, then parse it to create the dynamic main insert. Or bulk insert the whole file into a table with just one column, then parse that...
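A rough sketch of that last single-column staging idea (the path and the comma delimiter are placeholders, and STRING_SPLIT needs SQL Server 2016 or later):
CREATE TABLE #raw (line nvarchar(max));

-- load every line of the file into the single column
BULK INSERT #raw
FROM 'C:\data\DATA.csv'
WITH (ROWTERMINATOR = '\n');

-- split each line on the delimiter to inspect the values and build the real insert
SELECT r.line, s.value
FROM #raw AS r
CROSS APPLY STRING_SPLIT(r.line, ',') AS s;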
You can use OPENROWSET (documentation).
SELECT *
INTO dbo.MyTable
FROM
OPENROWSET(
BULK 'C:\...\mycsvfile.csv',
SINGLE_CLOB) AS DATA;
In addition, you can use dynamic SQL to parameterize the table name and the location of the CSV file.
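A minimal sketch of that dynamic-SQL idea, assuming the target table already exists (the table name and path below are placeholders):
DECLARE @path nvarchar(260) = N'C:\data\DATA.csv'; -- placeholder file path
DECLARE @table sysname = N'MyStagingTable'; -- placeholder target table
DECLARE @sql nvarchar(max) =
N'BULK INSERT ' + QUOTENAME(@table) +
N' FROM ''' + REPLACE(@path, N'''', N'''''') + N''' ' +
N'WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'');';

EXEC sys.sp_executesql @sql;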

Bulk insert from txt in SQL table

I need to do some bulk inserts in SQL Table from a txt file.
bulk insert [dbo].[TempSample]
from 'D:\sqls\sample.txt'
with (fieldterminator = ',', rowterminator = '\n')
go
In the txt file I have descriptions like 'Hörsching'. After the insert is made, I find descriptions in my table like 'H÷rsching'. How can I deal with that? The collation of the table is set to Latin1_General_CI_AS.
How is the file encoded?
Have you tried using the CODEPAGE parameter to specify the file's encoding?
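Not from the original thread, but a hedged guess: 'H÷rsching' is the typical result of Windows ANSI (code page 1252) text being read through the default OEM code page, so pointing BULK INSERT at the right code page may fix it:
bulk insert [dbo].[TempSample]
from 'D:\sqls\sample.txt'
with (fieldterminator = ',', rowterminator = '\n', codepage = 'ACP') -- 'ACP' if the file is Windows ANSI; '65001' if it is UTF-8
go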

Special characters displaying incorrectly after BULK INSERT

I'm using BULK INSERT to import a CSV file. One of the columns in the CSV file contains some values that contain fractions (e.g. 1m½f).
I don't need to do any mathematical operations on the fractions, as the values will just be used for display purposes, so I have set the column as nvarchar. The BULK INSERT works, but when I view the records within SQL the fraction has been replaced with a cent symbol (¢), so the displayed text is 1m¢f.
I'm interested to understand why this is happening and any thoughts on how to resolve the issue. The BULK INSERT command is:
BULK INSERT dbo.temp FROM 'C:\Temp\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' );
You need to BULK INSERT using CODEPAGE = 'ACP', which converts string data from Windows code page 1252 to the SQL Server code page.
BULK INSERT dbo.temp FROM 'C:\Temp\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', CODEPAGE = 'ACP');
If you are bringing in UTF-8 data on a new enough version of SQL Server, use code page 65001 instead:
BULK INSERT dbo.temp FROM 'C:\Temp\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', CODEPAGE = '65001');
You may also need to specify DATAFILETYPE = 'char|native|widechar|widenative'.
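For example, if the file were saved as UTF-16 ("Unicode" in Notepad terms), a hedged sketch would be:
BULK INSERT dbo.temp FROM 'C:\Temp\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', DATAFILETYPE = 'widechar');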

Bulk insert with a different schema

I am trying to get data from a CSV file with the following contents:
Station code;DateBegin;DateEnd
01;20100214;20100214
02;20100214;20100214
03;20100214;20100214
I am trying a bulk insert like this:
BULK INSERT dbo.#tmp_station_details
FROM 'C:\station.csv'
WITH (
FIELDTERMINATOR = ';',
FIRSTROW = 2,
ROWTERMINATOR = '\n'
)
But the table tmp_station_details has one extra column, Priority.
Its schema is like
[Station code] [Priority] [DateBegin] [DateEnd]
Now, is it possible to bulk insert without altering the schema of the table?
Add FORMATFILE = 'format_file_path' to your WITH block. Refer to BOL, "Use a Format File to Skip a Table Column", for an example.
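A hedged sketch of what that might look like for this table. The non-XML format file below maps the three data-file fields to table columns 1, 3, and 4 so that column 2 (Priority) is skipped; the version number, field lengths, column names, and file paths are illustrative guesses:
station.fmt:
12.0
3
1 SQLCHAR 0 100 ";" 1 StationCode ""
2 SQLCHAR 0 100 ";" 3 DateBegin ""
3 SQLCHAR 0 100 "\r\n" 4 DateEnd ""
and the load itself:
BULK INSERT dbo.#tmp_station_details
FROM 'C:\station.csv'
WITH (FORMATFILE = 'C:\station.fmt', FIRSTROW = 2)
With a format file in play, the field and row terminators come from the format file, so they no longer need to appear in the WITH block.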