I need to upload a CSV file to BigQuery via the UI. After I select the file from my local drive, I tell BigQuery to automatically detect the schema and run the job. It fails with the following message:
"Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2; errors: 1. Please look into the
errors[] collection for more details."
I have tried removing the comma in the last column, and I have tried changing options in the advanced section, but it always results in the same error.
The error log is not helping me understand where the problem is. This is an example of an error log entry:
2019-04-03 23:03:50.261 CLST  Bigquery jobcompleted  bquxjob_6b9eae1_169e6166db0  frank#xxxxxxxxx.nn  INVALID_ARGUMENT
and:
"Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2; errors: 1. Please look into the
errors[] collection for more details."
and:
"Error while reading data, error message: Error detected while parsing
row starting at position: 46. Error: Data between close double quote
(") and field separator."
The strange thing is that the sample CSV data contains no double quotes at all:
2019-01-02 00:00:00,326,1,,292,0,,294,0,,-28,0,,262,0,,109,0,,372,0,,453,0,,536,0,,136,0,,2609,0,,1450,0,,352,0,,-123,0,,17852,0,,8528,0
2019-01-02 00:02:29,289,1,,402,0,,165,0,,-218,0,,150,0,,90,0,,263,0,,327,0,,275,0,,67,0,,4863,0,,2808,0,,124,0,,454,0,,21880,0,,6410,0
2019-01-02 00:07:29,622,1,,135,0,,228,0,,-147,0,,130,0,,51,0,,381,0,,428,0,,276,0,,67,0,,2672,0,,1623,0,,346,0,,-140,0,,23962,0,,10759,0
2019-01-02 00:12:29,206,1,,118,0,,431,0,,106,0,,133,0,,50,0,,380,0,,426,0,,272,0,,63,0,,1224,0,,740,0,,371,0,,-127,0,,27758,0,,12187,0
2019-01-02 00:17:29,174,1,,119,0,,363,0,,59,0,,157,0,,67,0,,381,0,,426,0,,344,0,,161,0,,923,0,,595,0,,372,0,,-128,0,,22249,0,,9278,0
2019-01-02 00:22:29,175,1,,119,0,,301,0,,7,0,,124,0,,46,0,,382,0,,425,0,,431,0,,339,0,,1622,0,,1344,0,,379,0,,-126,0,,23888,0,,8963,0
I shared an example of a few lines of CSV data. I expect BigQuery to be able to detect the schema and load the data into a new table.
Using the new BigQuery web UI and your input data, I did the following:
Selected a dataset
Clicked on Create table
Filled in the create table form as follows:
The table was created and I was able to SELECT 6 rows as expected
SELECT * FROM projectId.datasetId.SO LIMIT 1000
I tried to load CSV files into a BigQuery table. There are columns whose type is INTEGER, but some values are missing and appear as null. When I use the bq load command to load the file, I get the following error:
Could not parse 'null' as int for field
So I am wondering what the best solution is for dealing with this. Do I have to reprocess the data first for bq load to work?
You'll need to transform the data in order to end up with the expected schema and data. Instead of INTEGER, specify the column as having type STRING. Load the CSV file into a table that you don't plan to use long-term, e.g. YourTempTable. In the BigQuery UI, click "Show Options", then select a destination table with the table name that you want. Now run the query:
#standardSQL
SELECT * REPLACE(SAFE_CAST(x AS INT64) AS x)
FROM YourTempTable;
This will convert the string values to integers where 'null' is treated as null.
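If you prefer not to set the destination table through the UI options, the same conversion can be written as a CREATE TABLE AS SELECT statement; a minimal sketch, where YourDataset and YourFinalTable are placeholder names:
#standardSQL
CREATE OR REPLACE TABLE YourDataset.YourFinalTable AS
SELECT * REPLACE(SAFE_CAST(x AS INT64) AS x)
FROM YourDataset.YourTempTable;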
Please try setting the null marker in the job configuration:
job_config.null_marker = 'NULL'
configuration.load.nullMarker (string)
[Optional] Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load
The BigQuery Console has its limitations and doesn't allow you to specify a null marker while loading data from a CSV. However, it can easily be done by using the BigQuery command-line tool's bq load command. We can use the --null_marker flag to specify the marker, which is simply null in this case.
bq load --source_format=CSV \
--null_marker=null \
--skip_leading_rows=1 \
dataset.table_name \
./data.csv \
./schema.json
Setting the null_marker to null does the trick here. You can omit the schema.json part if the table already exists with a valid schema. --skip_leading_rows=1 is used because my first row was a header.
You can learn more about the bq load command in the BigQuery documentation.
The load command, however, lets you create and load a table in one go. The schema needs to be specified in a JSON file in the following format:
[
  {
    "description": "[DESCRIPTION]",
    "name": "[NAME]",
    "type": "[TYPE]",
    "mode": "[MODE]"
  },
  {
    "description": "[DESCRIPTION]",
    "name": "[NAME]",
    "type": "[TYPE]",
    "mode": "[MODE]"
  }
]
I am trying to insert the data from this link into my SQL Server:
https://www.ian.com/affiliatecenter/include/V2/CityCoordinatesList.zip
I created the table
CREATE TABLE [dbo].[tblCityCoordinatesList](
[RegionID] [int] NOT NULL,
[RegionName] [nvarchar](255) NULL,
[Coordinates] [nvarchar](4000) NULL
) ON [PRIMARY]
And I am running the following script to do the bulk insert
BULK INSERT tblCityCoordinatesList
FROM 'C:\data\CityCoordinatesList.txt'
WITH
(
FIRSTROW = 2,
MAXERRORS = 0,
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\n'
)
But the bulk insert fails with following error
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
When I googled, I found several articles saying the issue may be with the ROWTERMINATOR, but I tried everything like \n\r, \n, etc., and nothing is working.
Could anyone please help me to insert this data into my database?
Try ROWTERMINATOR = '0x0a'. It should work.
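Applied to the statement from the question, the fix looks like this (0x0a is the hex code for the line feed character):
BULK INSERT tblCityCoordinatesList
FROM 'C:\data\CityCoordinatesList.txt'
WITH
(
    FIRSTROW = 2,
    MAXERRORS = 0,
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '0x0a'
)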
This can also happen if the number of columns doesn't match between the table and the imported file.
I got the same error message, and as you mentioned, it was related to an unexpected line ending.
In my case the line ending was specified in a .fmt file as a Windows line ending (CRLF), written as \r\n, while the data file to process had a classic Mac one (CR).
I solved it with an editor that can show the current line ending and change it. I used EditPad Lite, which shows the opened file's line ending in the bottom bar; clicking it lets you replace it with the expected one.
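If you cannot edit the data file, the terminator can also be adjusted on the load side. A minimal sketch, assuming no format file is in use and the rows end in a bare carriage return (table and path are placeholders):
BULK INSERT dbo.YourTable
FROM 'C:\data\your_file.txt'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '0x0d'  -- hex code for a bare carriage return (classic Mac line ending)
)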
I had this on SQL Server 2019 when the FORMAT = 'CSV' option was used and there was a comma at the end of each line in the source file. The table you're BULK inserting into then needs an extra dummy field to cater for the fact that each record essentially has a blank trailing field in the source file.
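A minimal sketch of that workaround, with hypothetical table and column names; the trailing comma in each source row yields an empty extra field, which the dummy column absorbs:
CREATE TABLE dbo.tblImport
(
    [Col1] int NULL,
    [Col2] nvarchar(255) NULL,
    [TrailingDummy] nvarchar(1) NULL  -- hypothetical; absorbs the blank field created by the trailing comma
);

BULK INSERT dbo.tblImport
FROM 'C:\data\source.csv'
WITH
(
    FORMAT = 'CSV',
    FIRSTROW = 2
);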
I got the same error, probably from a file encoding problem. I fixed it by opening the problem CSV file in Notepad++, selecting everything, and copying it to the clipboard. Next, create a new text file (making sure it has the .csv file extension), open it in Notepad++, then paste the text into the new file. Save and close all files. You should then be able to load the new CSV file into SQL Server successfully.
You need to run the BULK INSERT command from a Windows login (not from a SQL login). I don't have an example at the moment.