Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 4 in Azure Synapse - azure-synapse

I have a Spotify CSV file in my Azure Data Lake. I am trying to create an external table using the serverless SQL pool in Azure Synapse.
I am getting the error message below:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 4 (Track_popularity) in data file https://test.dfs.core.windows.net/data/folder/updated.csv.
I am using the below script
IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseDelimitedTextFormat')
    CREATE EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat]
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (
            FIELD_TERMINATOR = ',',
            USE_TYPE_DEFAULT = FALSE
        )
    )
GO
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'data_test_dfs_core_windows_net')
    CREATE EXTERNAL DATA SOURCE [data_test_dfs_core_windows_net]
    WITH (
        LOCATION = 'abfss://data@test.dfs.core.windows.net'
    )
GO
CREATE EXTERNAL TABLE updated (
[Artist] nvarchar(4000),
[Track] nvarchar(4000),
[Track_id] nvarchar(4000),
[Track_popularity] bigint,
[Artist_id] nvarchar(4000),
[Artist_Popularity] bigint,
[Genres] nvarchar(4000),
[Followers] bigint,
[danceability] float,
[energy] float,
[key] bigint,
[loudness] float,
[mode] bigint,
[speechiness] float,
[acousticness] float,
[instrumentalness] float,
[liveness] float,
[valence] float,
[tempo] float,
[duration_ms] bigint,
[time_signature] bigint
)
WITH (
LOCATION = 'data/updated.csv',
DATA_SOURCE = [data_test_dfs_core_windows_net],
FILE_FORMAT = [SynapseDelimitedTextFormat]
)
GO
SELECT TOP 100 * FROM dbo.updated
GO
Below is the data sample
My CSV is UTF-8 encoded, so I am not sure what the issue is. The error points at the Track_popularity column. Please advise.

I’m guessing you may have a header row that should be skipped. Drop your external table and then drop and recreate the external file format as follows:
CREATE EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat]
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        USE_TYPE_DEFAULT = FALSE,
        FIRST_ROW = 2
    )
)
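Put together, the full sequence would look roughly like this (a sketch using the object names from the question; the external table is recreated exactly as before, with its column list shortened here for brevity):
-- Drop the external table first, since it depends on the file format
DROP EXTERNAL TABLE dbo.updated;
GO
-- Drop and recreate the file format so FIRST_ROW = 2 skips the header row
DROP EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat];
GO
CREATE EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat]
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        USE_TYPE_DEFAULT = FALSE,
        FIRST_ROW = 2
    )
);
GO
-- Recreate the external table as in the question (remaining columns omitted here)
CREATE EXTERNAL TABLE dbo.updated (
    [Artist] nvarchar(4000),
    [Track] nvarchar(4000),
    [Track_id] nvarchar(4000),
    [Track_popularity] bigint
    -- ...
)
WITH (
    LOCATION = 'data/updated.csv',
    DATA_SOURCE = [data_test_dfs_core_windows_net],
    FILE_FORMAT = [SynapseDelimitedTextFormat]
);
GO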

Related

Azure Synapse Delta Table Creation and Import Data From ADLS delta lake

We have a requirement to load Delta data from ADLS into a Synapse table. We are writing data in Delta format to ADLS Gen2 from Databricks, and now we want to load that ADLS Gen2 Delta table into a Synapse table. We followed the steps below to create the table, but we are getting issues.
CREATE EXTERNAL FILE FORMAT DeltaFileFormat
WITH (  
     FORMAT_TYPE = DELTA  
);   
CREATE EXTERNAL DATA SOURCE test_data_source
WITH
(     LOCATION = 'abfss://container@storage.dfs.core.windows.net/table_metadata/testtable'
        --,CREDENTIAL = <database scoped credential>
); 
CREATE EXTERNAL TABLE testtable (
     job_id int,
     source_type varchar(10),
     server_name varchar(10),
     database_name varchar(15),
     table_name varchar(20),
     custom_query varchar(100),
     source_location varchar(500),
     job_timestamp datetime2,
     job_user varchar(50)
) WITH (
        LOCATION = 'abfss://targetcontainer@targetstorage.dfs.core.windows.net/table_metadata/testtable',
        data_source = test_data_source,
        FILE_FORMAT = DeltaFileFormat
);
select * from testtable; 
When running the select statement, the exception below is thrown:
Content of directory on path 'https://container@storage.dfs.core.windows.net_delta_log/.' cannot be listed.
I also tried this and got a similar error:
Content of directory on path 'https://container@storage.dfs.core.windows.net_delta_log/.' cannot be listed.
This error message typically occurs when there are no data or Delta log files present in the _delta_log directory of the specified ADLS Gen2 storage account. The _delta_log directory is created automatically when you create a Delta table, and it contains the transaction log files for the Delta table.
In the code below, the Delta table files are located under /demodata/, and the external data source for this example is test_data_source18, which contains the file path information and credentials. LOCATION can be the Delta table folder or the absolute file itself, which in this example would be /demodata/_delta_log/00000000000000000000.json.
CREATE EXTERNAL FILE FORMAT DeltaFileFormat
WITH (
    FORMAT_TYPE = DELTA
);
CREATE EXTERNAL DATA SOURCE test_data_source18
WITH (
    LOCATION = 'abfss://demo@dlsg2p.dfs.core.windows.net'
);
CREATE EXTERNAL TABLE testtable24 (
    Id varchar(20),
    Name varchar(20)
) WITH (
    LOCATION = '/demodata/',
    data_source = test_data_source18,
    FILE_FORMAT = DeltaFileFormat
);
select * from testtable24;
Output:
Reference: delta table with PolyBase

Azure SQL bulk insert from blob storage failure: Referenced external data source "MyAzureBlobStorage" not found

I keep getting this error (Referenced external data source "MyAzureBlobStorage" not found.) when loading a CSV from blob storage to Azure SQL. I am following this example and I set my blob to be public, but the following just does not work:
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://test.blob.core.windows.net/test'
);
BULK INSERT SubscriberQueue
FROM 'inputs.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage', FORMAT='CSV');
Any ideas what I am missing here?
If you want to bulk insert from an Azure blob, please refer to the following script.
My CSV file:
1,Peter,Jackson,pjackson@hotmail.com
2,Jason,Smith,jsmith@gmail.com
3,Joe,Raasi,jraasi@hotmail.com
Script:
create table listcustomer
(id int,
firstname varchar(60),
lastname varchar(60),
email varchar(60))
Go
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://****.blob.core.windows.net/test'
);
Go
BULK INSERT listcustomer
FROM 'mycustomers.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage', FORMAT='CSV');
Go
select * from listcustomer;

Export result of a query to a text file by Teradata Parallel transporter

I need to automate an export of a large amount of data (from Teradata via an SQL query) to a pipe-delimited text file. I used PowerShell (the ConvertTo-CSV cmdlet), but it was very slow.
I was advised to use TPT for the export, but I have never used this tool, and all I have found is how to export a single table to a flat file, not a complex query involving more than one table.
Does someone know how to proceed with TPT, or have a sample script for this?
Edit:
I am using this script for TPT, but I think it has a lot of errors:
DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from a Teradata table to a delimited file'
(
DEFINE SCHEMA select EXPORT_DELIMITED_FILE from DELIMITED OF OPERATOR SQL_SELECTOR
DEFINE OPERATOR SQL_SELECTOR
TYPE SELECTOR
SCHEMA *
ATTRIBUTES
(
VARCHAR PrivateLogName = 'selector_log',
VARCHAR TdpId = 'Server',
VARCHAR UserName = 'user',
VARCHAR UserPassword = 'password',
VARCHAR SelectStmt = 'E:\PowerShell\SQL\file.sql',
VARCHAR ReportModeOn
);
DEFINE OPERATOR FILE_WRITER
TYPE DATACONNECTOR CONSUMER
SCHEMA *
ATTRIBUTES
(
VARCHAR PrivateLogName = 'dataconnector_log',
VARCHAR DirectoryPath = 'E:\PowerShell\Output',
VARCHAR FileName = 'test_file.txt',
VARCHAR Format = 'Delimited',
VARCHAR OpenMode = 'Write',
VARCHAR TextDelimiter = '|'
);
APPLY TO OPERATOR (FILE_WRITER)
SELECT * FROM OPERATOR (SQL_SELECTOR);
);
Your schema definition is wrong. A schema defines what your data set looks like in terms of field names and data types.
Producer operators (in your case the SQL Selector) always need to have a defined schema. A deferred schema (SCHEMA *) can only be used with consumer operators (in your case the Data Connector), which lets the consumer use the same schema as the related producer.
A schema definition looks something like this:
DEFINE SCHEMA FILE_SCHEMA
(
Column1 VARCHAR(255),
Column2 VARCHAR(255),
Column3 VARCHAR(255),
Column4 VARCHAR(255)
);
Remember that the Data Connector operator accepts only character data in the schema. If you specify any other data type, it will result in an error.
Also, SelectStmt must contain an actual SQL query, not the path to a file containing the query. To export data to a flat file through the Data Connector operator, you will need to cast everything to VARCHAR in your query:
SelectStmt = 'SELECT CAST(ColumnA AS VARCHAR(100)), CAST(ColumnB AS VARCHAR(100)), CAST(ColumnC AS VARCHAR(100)), CAST(COUNT(*) AS VARCHAR(100)) FROM MyTable GROUP BY 1,2,3;'
Note that the number of columns returned by SelectStmt must be the same as in the defined schema.
Also, set ReportModeOn = 'Y' instead of leaving it at its default value.
Always remember to indent the code. With indentation, the script now looks like:
DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from a Teradata table to a delimited file'
(
    DEFINE SCHEMA FILE_SCHEMA
    (
        Column1 VARCHAR(100),
        Column2 VARCHAR(100),
        Column3 VARCHAR(100),
        Column4 VARCHAR(100)
    );

    DEFINE OPERATOR SQL_SELECTOR
    TYPE SELECTOR
    SCHEMA FILE_SCHEMA
    ATTRIBUTES
    (
        VARCHAR PrivateLogName = 'selector_log',
        VARCHAR TdpId = 'Server',
        VARCHAR UserName = 'user',
        VARCHAR UserPassword = 'password',
        VARCHAR SelectStmt = 'SELECT CAST(ColumnA AS VARCHAR(100)), CAST(ColumnB AS VARCHAR(100)), CAST(ColumnC AS VARCHAR(100)), CAST(COUNT(*) AS VARCHAR(100)) FROM MyTable GROUP BY 1,2,3;',
        VARCHAR ReportModeOn = 'Y'
    );

    DEFINE OPERATOR FILE_WRITER
    TYPE DATACONNECTOR CONSUMER
    SCHEMA *
    ATTRIBUTES
    (
        VARCHAR PrivateLogName = 'dataconnector_log',
        VARCHAR DirectoryPath = 'E:\PowerShell\Output',
        VARCHAR FileName = 'test_file.txt',
        VARCHAR Format = 'Delimited',
        VARCHAR OpenMode = 'Write',
        VARCHAR TextDelimiter = '|'
    );

    APPLY TO OPERATOR (FILE_WRITER)
    SELECT * FROM OPERATOR (SQL_SELECTOR);
);

SQL Server 2016 with Azure Blob Container , nvarchar(max) is not supported for external tables?

I'm trying to create an external table as described in
https://msdn.microsoft.com/en-us/library/mt652315.aspx
using the following structure:
CREATE EXTERNAL TABLE dbo.Blob2 (
ID int NOT NULL,
Description nvarchar(max) NULL,
Title nvarchar(max) NULL
)
WITH (LOCATION='/Blob/',
DATA_SOURCE = AzureStorage,
FILE_FORMAT = TextFileFormat
);
and getting this error:
Msg 46518, Level 16, State 12, Line 49
The type 'nvarchar(max)' is not supported with external tables.
Am I missing something? I see that in the documentation for external tables, nvarchar/string/text are in the list:
https://msdn.microsoft.com/en-us/library/dn935021.aspx
Is there any chance that I can store text data in that container?
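For reference, a workaround sketch, assuming the text values fit within 4,000 characters: nvarchar(max) is rejected for external tables, but nvarchar with an explicit length is accepted, so the same table could be declared like this (names taken from the question):
CREATE EXTERNAL TABLE dbo.Blob2 (
    ID int NOT NULL,
    -- nvarchar(max) is not supported for external tables; use a bounded length instead
    Description nvarchar(4000) NULL,
    Title nvarchar(4000) NULL
)
WITH (
    LOCATION = '/Blob/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFileFormat
);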

Inserting data from a text file to a table

I have a text file with the following content:
Output.txt:
windows 32708
linux 30996
macos 32811
I am trying with:
CREATE TABLE data
(
Name varchar (100) NOT NULL,
Memory number (20) NOT NULL,
)
BULK INSERT data FROM 'C:\users\Output.txt' WITH
(
FIELDTERMINATOR = '|'
ROWTERMINATOR = '|\n'
)
I see two options:
1. Use SQL*Loader to load the data into a staging table and use a procedure to do the data changes (conversion from bytes to GB).
2. Use external tables to refer to the file, populate your table, and do the conversion there (see the sketch below).
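For the second option, a minimal sketch, assuming an Oracle database (SQL*Loader is mentioned in the first option); the directory object data_dir, the staging table data_ext, and the file path are hypothetical:
-- Directory object pointing at the folder that holds Output.txt (hypothetical path)
CREATE OR REPLACE DIRECTORY data_dir AS 'C:\users';

-- External table that reads the whitespace-separated file in place
CREATE TABLE data_ext (
    name   VARCHAR2(100),
    memory NUMBER(20)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY WHITESPACE
    )
    LOCATION ('Output.txt')
);

-- Populate the target table, converting bytes to GB on the way in
-- (the target column may need a scale, e.g. NUMBER(20,6), to hold fractional GB)
INSERT INTO data (name, memory)
SELECT name, memory / POWER(1024, 3)
FROM data_ext;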