Unable to load CSV file from GCS into BigQuery

I am unable to load a 500 MB CSV file from Google Cloud Storage into BigQuery; I get this error:
Errors:
Too many errors encountered. (error code: invalid)
Job ID xxxx-xxxx-xxxx:bquijob_59e9ec3a_155fe16096e
Start Time Jul 18, 2016, 6:28:27 PM
End Time Jul 18, 2016, 6:28:28 PM
Destination Table xxxx-xxxx-xxxx:DEV.VIS24_2014_TO_2017
Write Preference Write if empty
Source Format CSV
Delimiter ,
Skip Leading Rows 1
Source URI gs://xxxx-xxxx-xxxx-dev/VIS24 2014 to 2017.csv.gz
I gzipped the 500 MB CSV file to .csv.gz before uploading it to GCS. Please help me solve this issue.

The internal details for your job show that there was an error reading row #1 of your CSV file. You'll need to investigate further, but it could be that you have a header row that doesn't conform to the schema of the rest of the file, so we're trying to parse a string in the header as an integer or boolean or something like that. You can set the skipLeadingRows property to skip such a row.
Other than that, I'd check that the first row of your data matches the schema you're attempting to import with.
Also, the error message you received is unfortunately very unhelpful, so I've filed a bug internally to make the error you received in this case more helpful.
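If the load is being scripted rather than run from the console, skipLeadingRows can be set directly on the load configuration. Here is a minimal sketch using the Python client; the bucket, file, and table names are taken from the job details above, and autodetect is an assumption used in place of an explicit schema. Inspecting the errors[] collection shows exactly which row and field failed.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row that doesn't match the schema
    autodetect=True,      # assumption: or pass an explicit schema instead
    write_disposition=bigquery.WriteDisposition.WRITE_EMPTY,
)

load_job = client.load_table_from_uri(
    "gs://xxxx-xxxx-xxxx-dev/VIS24 2014 to 2017.csv.gz",
    "xxxx-xxxx-xxxx.DEV.VIS24_2014_TO_2017",
    job_config=job_config,
)

try:
    load_job.result()
except Exception:
    # the errors[] collection holds the per-row details behind "Too many errors"
    for err in load_job.errors or []:
        print(err)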

Related

SSIS Flat File Import errors

I have an SSIS job that imports flat file data into my database, with a data conversion step.
The issue is that I keep getting errors on the "Violations" field; see below:
[Flat File Source [37]] Error: Data conversion failed. The data
conversion for column "Violations" returned status value 4 and status
text "Text was truncated or one or more characters had no match in the
target code page.".
[Flat File Source [37]] Error: The "Flat File Source.Outputs[Flat File
Source Output].Columns[Violations]" failed because truncation
occurred, and the truncation row disposition on "Flat File
Source.Outputs[Flat File Source Output].Columns[Violations]" specifies
failure on truncation. A truncation error occurred on the specified
object of the specified component.
[Flat File Source [37]] Error: An error occurred while processing file
"C:\Users\XXXX\XXXX\XXXX\XXXX\XXXX\XXXX\XXXX\Food_Inspections.csv"
on data row 25.
In line 25 of the CSV file, this field is over 4000 characters long.
In data conversion, I currently have the Data Type set to string [DT_STR] of length 8000, coding 65001.
Row delimiter {LF},
Column delimiter Semicolon {;}
I have already looked at other suggested solutions, e.g. increasing OutputColumnWidth to 5000, but it did not help. Please advise how to solve this.
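For what it's worth, a quick way to confirm which rows blow past the configured width is to scan the file before the SSIS run. A rough sketch in Python, assuming the header row names the field "Violations" and using the semicolon delimiter from the question:

import csv

# Report rows whose "Violations" field exceeds the 8000-character width
# configured in the data conversion step (the field name is an assumption).
with open("Food_Inspections.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter=";")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        violations = row.get("Violations") or ""
        if len(violations) > 8000:
            print(line_no, len(violations))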

"Error while reading data" error received when uploading CSV file into BigQuery via console UI

I need to upload a CSV file to BigQuery via the UI. After I select the file from my local drive, I tell BigQuery to automatically detect the schema and run the job. It fails with the following message:
"Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2; errors: 1. Please look into the
errors[] collection for more details."
I have tried removing the comma in the last column and changing options in the advanced section, but it always results in the same error.
The error log is not helping me understand where the problem is, this is example of the error log entry:
2019-04-03 23:03:50.261 CLST Bigquery jobcompleted
bquxjob_6b9eae1_169e6166db0 frank#xxxxxxxxx.nn INVALID_ARGUMENT
and:
"Error while reading data, error message: CSV table encountered too
many errors, giving up. Rows: 2; errors: 1. Please look into the
errors[] collection for more details."
and:
"Error while reading data, error message: Error detected while parsing
row starting at position: 46. Error: Data between close double quote
(") and field separator."
The strange thing is that the sample CSV data contains no double quotes at all!
2019-01-02 00:00:00,326,1,,292,0,,294,0,,-28,0,,262,0,,109,0,,372,0,,453,0,,536,0,,136,0,,2609,0,,1450,0,,352,0,,-123,0,,17852,0,,8528,0
2019-01-02 00:02:29,289,1,,402,0,,165,0,,-218,0,,150,0,,90,0,,263,0,,327,0,,275,0,,67,0,,4863,0,,2808,0,,124,0,,454,0,,21880,0,,6410,0
2019-01-02 00:07:29,622,1,,135,0,,228,0,,-147,0,,130,0,,51,0,,381,0,,428,0,,276,0,,67,0,,2672,0,,1623,0,,346,0,,-140,0,,23962,0,,10759,0
2019-01-02 00:12:29,206,1,,118,0,,431,0,,106,0,,133,0,,50,0,,380,0,,426,0,,272,0,,63,0,,1224,0,,740,0,,371,0,,-127,0,,27758,0,,12187,0
2019-01-02 00:17:29,174,1,,119,0,,363,0,,59,0,,157,0,,67,0,,381,0,,426,0,,344,0,,161,0,,923,0,,595,0,,372,0,,-128,0,,22249,0,,9278,0
2019-01-02 00:22:29,175,1,,119,0,,301,0,,7,0,,124,0,,46,0,,382,0,,425,0,,431,0,,339,0,,1622,0,,1344,0,,379,0,,-126,0,,23888,0,,8963,0
I shared an example of a few lines of CSV data. I expect BigQuery to be able to detect the schema and load the data into a new table.
Using the new BigQuery web UI and your input data, I did the following:
Selected a dataset
Clicked on Create table
Filled in the create table form
The table was created and I was able to SELECT the 6 rows as expected:
SELECT * FROM projectId.datasetId.SO LIMIT 1000
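For completeness, the same load can be scripted with the Python client. A minimal sketch, assuming the sample rows are saved locally as sample.csv (a hypothetical file name) and using schema auto-detection as in the UI steps above:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,  # let BigQuery infer the schema, as in the UI
)

with open("sample.csv", "rb") as source_file:
    load_job = client.load_table_from_file(
        source_file, "projectId.datasetId.SO", job_config=job_config
    )
load_job.result()

print(client.get_table("projectId.datasetId.SO").num_rows)  # expect 6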

Getting Internal Server Error on pgSQL

I'm trying to import data from a Windows CSV (comma-delimited) file into the pgSQL faxtest1 table, but I keep getting an error saying "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."
The following is my code:
COPY faxtest1
FROM 'C:\Users\David\Desktop\test3.csv'
WITH DELIMITER AS ',' CSV ;
The CSV file is like:
Status,Fax ID
Fax to Email,2104
Fax to Email,2108
It is a bug in pgAdmin 4; hopefully they will fix it in the future.
In version 14, the Import/Export Data function has two sections, "Options" and "Columns." Try manually selecting the columns one at a time, separated by a comma, and see if this bypasses the error.
It worked for me.
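If the pgAdmin dialog keeps failing, the same COPY can also be driven from a client library instead. A minimal sketch with psycopg2, assuming the table and file path from the question and that the first line of the file is a header; the connection parameters are placeholders:

import psycopg2

conn = psycopg2.connect("dbname=yourdb user=youruser password=yourpassword")
cur = conn.cursor()

with open(r"C:\Users\David\Desktop\test3.csv", "r", encoding="utf-8") as f:
    # COPY ... FROM STDIN streams the file from the client machine,
    # so the server does not need access to the local path.
    cur.copy_expert("COPY faxtest1 FROM STDIN WITH (FORMAT csv, HEADER)", f)

conn.commit()
cur.close()
conn.close()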

Uploading job fails on the same file that was uploaded successfully before

I'm running a regular upload job that loads a CSV into BigQuery. The job runs every hour. According to the most recent failure log, it says:
Error: [REASON] invalid [MESSAGE] Invalid argument: service.geotab.com [LOCATION] File: 0 / Offset:268436098 / Line:218637 / Field:2
Error: [REASON] invalid [MESSAGE] Too many errors encountered. Limit is: 0. [LOCATION]
I went to line 218638 (the original CSV has a header line, so I assume 218638 should be the actual failing line; let me know if I'm wrong) but it seems fine. I checked the corresponding table in BigQuery, and it has that line too, which means I actually uploaded this line successfully before.
Then why does it cause a failure now?
project id: red-road-574
Job ID: Job_Upload-7EDCB180-2A2E-492B-9143-BEFFB36E5BB5
This indicates that there was a problem with the data in your file, where it didn't match the schema.
The error message says it occurred at File: 0 / Offset:268436098 / Line:218637 / Field:2. This means the first file (it looks like you just had one), and then the chunk of the file starting at 268436098 bytes from the beginning of the file, then the 218637th line from that file offset.
The reason for the offset portion is that BigQuery processes large files in parallel across multiple workers. Each file worker starts at an offset from the beginning of the file. The offset that we include is the offset that the worker started from.
From the rest of the error message, it looks like the string service.geotab.com showed up in the second field, but the second field was a number, and service.geotab.com isn't a valid number. Perhaps there was a stray newline?
You can see what the lines looked like around the error by doing:
cat <yourfile> | tail -c +268436098 | tail -n +218636 | head -3
This will print out three lines... the one before the error (since I used -n +218636 instead of +218637), the one that had the error, and the next line as well.
Note that if this is just one line in the file that has a problem, you may be able to work around the issue by specifying maxBadRecords.
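As a sketch of that last point, maxBadRecords can be set on the load configuration. The source URI and the dataset and table names below are placeholders, since the job's exact destination isn't shown in the question:

from google.cloud import bigquery

client = bigquery.Client(project="red-road-574")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # the file has a header line
    max_bad_records=1,     # tolerate a single bad row instead of failing the job
)

load_job = client.load_table_from_uri(
    "gs://your-bucket/your-file.csv",          # placeholder source URI
    "red-road-574.your_dataset.your_table",    # placeholder destination
    job_config=job_config,
)
load_job.result()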

Bad character in the file

I tried to load the data from Cloud Storage and it failed three times.
Job ID: job_2ed0ded6ce1d4837873e0ab498b0bc1b
Start Time: 9:10pm, 1 Aug 2012
End Time: 10:55pm, 1 Aug 2012
Destination Table: 567402616005:company.ox_data_summary_ad_hourly
Source URI: gs://daily_log/ox_data_summary_ad_hourly.txt.gz
Delimiter:
Max Bad Records: 30000
Job ID: job_47447ab60d2a40f588c89dfe638aa438
Line:176073205 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.
Too many errors encountered. Limit is: 0.
Should I try again, or is there an issue with the source file?
This is a known bug dealing with gzipped files. The only workaround currently is just to use an uncompressed file.
There are changes coming soon that should make it easier to handle large, uncompressed files (imports will be faster, and file size limits will increase).
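As a sketch of that workaround, and assuming the local file names match the source URI above, the export can be decompressed and re-uploaded uncompressed before pointing the load job at the new URI:

import gzip
import shutil
from google.cloud import storage

# Decompress the export locally.
with gzip.open("ox_data_summary_ad_hourly.txt.gz", "rb") as src, \
        open("ox_data_summary_ad_hourly.txt", "wb") as dst:
    shutil.copyfileobj(src, dst)

# Re-upload the uncompressed file; the load job can then use the new gs:// URI.
bucket = storage.Client().bucket("daily_log")
bucket.blob("ox_data_summary_ad_hourly.txt").upload_from_filename(
    "ox_data_summary_ad_hourly.txt"
)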