Can't upload CSV file to BigQuery - google-bigquery

I tried to manually upload this data sample to BigQuery (after trying to upload from Google Cloud, I extracted some rows to pin down the problem). I got these errors:
Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Error while reading data, error message: CSV table references column position 77, but line starting at position:0 contains only 56 columns.
My sample data is: https://drive.google.com/file/d/1v8jcIKSY7HiOpdc40BFJXACvgX8prWm0/view?usp=sharing

Please use the following steps to resolve the issue:
Download the file from Google Drive
Open the file and save it as "CSV UTF-8 (Comma delimited) (*.csv)"
Open BigQuery and upload the CSV file with the "Auto detect" schema option
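
If you prefer to script the load instead of using the console, a minimal sketch with the google-cloud-bigquery Python client could look like this (the local file name and the dataset/table IDs are placeholders):

# Sketch: load a local CSV into BigQuery with schema auto-detection.
# "my_dataset.my_table" and "sample.csv" are placeholder names.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,        # let BigQuery infer the schema
    skip_leading_rows=1,    # skip the header row
    encoding="UTF-8",       # matches the "CSV UTF-8" save step above
)

with open("sample.csv", "rb") as f:
    job = client.load_table_from_file(f, "my_dataset.my_table", job_config=job_config)

job.result()  # waits for completion and raises if the load failed
print(client.get_table("my_dataset.my_table").num_rows, "rows loaded")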

Related

Reading .mpr/.mps/.mpt files with pd.read_csv

I'm trying to read some .mpr, .mps and .mpt files with Pandas and my professor told us to use pd.read_csv to open them. When I try to do this using: pd.read_csv(path+filename, delimiter='\t', header=55) I get the following error consistently:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 9, saw 2
If I use skiprows I can circumvent this problem and display the column names of the file, but no data appears. I know the file isn't empty, so I'm wondering why this is happening.
Thanks!
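
One way to narrow this down, assuming the files really are tab-separated text, is to parse them tolerantly so pandas reports the malformed rows instead of aborting (the file path is a placeholder; header=55 is the value from the question):

# Sketch: tolerant read of a tab-delimited .mpt file; "data.mpt" is a placeholder path.
import pandas as pd

df = pd.read_csv(
    "data.mpt",
    sep="\t",
    header=55,              # header row from the question
    engine="python",
    on_bad_lines="warn",    # pandas >= 1.3: report bad rows instead of raising ParserError
    encoding="latin-1",     # assumption: these instrument exports are often not UTF-8
)
print(df.shape)
print(df.head())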

I am getting the error message "ERROR: invalid byte sequence for encoding "WIN1251": 0x00"

I am trying to load a CSV into my Postgres database, but I am getting the above error message.
My query is :
COPY dbo.tbl(col1,col2)
FROM 'C:\Data\dbo.tbl.csv' DELIMITER ',' null as 'null' encoding 'windows-1251' CSV;
I used this link for reference: https://www.postgresqltutorial.com/import-csv-file-into-posgresql-table/
Can someone please help me figure out what the issue is? I am new to Postgres.
In my experience, I have a CSV file with 'WIN874' encoding while the Postgres client encoding is 'UTF-8', and I can use the COPY command without problems.
To make sure that your file's encoding really is 'WIN1251', you can open the CSV file in Visual Studio Code; it shows the file encoding in the bottom-right pane.
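
Before changing the COPY options, it can also help to inspect the file itself. The small sketch below (the path is the one from the question; chardet is an optional third-party package) guesses the encoding and checks for 0x00 bytes, which Postgres rejects in text columns no matter which encoding you specify; a file full of NUL bytes is often really UTF-16 rather than Windows-1251.

# Sketch: inspect the CSV before running COPY.
import chardet  # third-party: pip install chardet

with open(r"C:\Data\dbo.tbl.csv", "rb") as f:
    raw = f.read(100_000)  # sample the first ~100 KB

print(chardet.detect(raw))                   # e.g. {'encoding': 'windows-1251', 'confidence': ...}
print("NUL bytes present:", b"\x00" in raw)  # 0x00 is never valid in Postgres text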

Load compressed data from Amazon S3 to Postgres using datastage

I am trying to load data stored in .gz format in S3 into a PostgreSQL server using DataStage, with the ODBC connector on the target (database) side. I can load uncompressed data from S3 into PostgreSQL, but I have had no luck with compressed data so far. I have tried the Expand stage, but it isn't helping, or I am not using it correctly. Without Expand, the data does come through, but the stage tries to parse the compressed bytes directly, fails, and throws this error:
Amazon_S3_0,1: com.ascential.e2.common.CC_Exception: Failed to initialize the parser: The row delimiter was not found within the first 132 bytes of the file. Ensure that the Row delimiter property matches the row delimiter of the file.
at com.ibm.iis.cc.cloud.CloudLogger.createCCException (CloudLogger.java: 196)
at com.ibm.iis.cc.cloud.CloudStage.processReadAndParse (CloudStage.java: 1591)
at com.ibm.iis.cc.cloud.CloudStage.process (CloudStage.java: 680)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.run (CC_JavaAdapter.java: 443)
Amazon_S3_0,1: Failed to initialize the parser: The row delimiter was not found within the first 132 bytes of the file. Ensure that the Row delimiter property matches the row delimiter of the file. (com.ibm.iis.cc.cloud.CloudLogger::createCCException, file CloudLogger.java, line 196)
If someone has come across this, please share your valuable inputs.
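
One quick sanity check before tuning the Expand stage is whether the objects are really gzip at all; the short sketch below (assuming a copy of one object has been downloaded locally, with a placeholder file name) just looks at the two-byte gzip magic number:

# Sketch: verify that a ".gz" file really is gzip; "part-00000.gz" is a placeholder name.
def is_gzip(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"  # gzip files start with these magic bytes

print(is_gzip("part-00000.gz"))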

Proper CSV export from SQL Server

I have a table in SQL Server Management Studio that I want to export as a CSV and then import into WEKA.
I queried all data from the table, selected it, then right-clicked and chose "Save results as"->CSV.
When I try to import this CSV into WEKA, I get the following error message:
File <path> not recognized as an 'CSV data files' here.
Reason:
wrong number of values. READ 27, expected 26, read Token[EOL], line 1023
I assume I need to escape a string at line 1023, but what if another 100 or more such errors follow?
Is there any way to automatically escape all characters to get a proper CSV file, without post-processing?
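
If a scripted export is acceptable, one option is to pull the table with Python and write a fully quoted CSV, which keeps embedded commas and line breaks from shifting the column count; the connection string, table name, and output path below are placeholders:

# Sketch: export a SQL Server table to a fully quoted CSV.
import csv
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=MyDb;Trusted_Connection=yes;"
)
df = pd.read_sql("SELECT * FROM dbo.MyTable", conn)

# QUOTE_ALL wraps every value in double quotes and escapes embedded quotes,
# so WEKA no longer miscounts the columns.
df.to_csv("export.csv", index=False, quoting=csv.QUOTE_ALL)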

Internal error while loading to Bigquery table

I ran this command to load 11 files into a BigQuery table:
bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part* /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
I got this error:
Waiting on bqjob_r46f38146351d545_00000147ef890755_1 ... (11s) Current status: DONE
BigQuery error in load operation: Error processing job 'ardent-course-601:bqjob_r46f38146351d545_00000147ef890755_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 5: Unexpected. Please try again.
I tried many times after that and still got the same error.
To debug what went wrong, I instead loaded each file one by one into the BigQuery table. For example:
/usr/local/bin/bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part-m-00011.gz /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
There are 11 files in total, and each one loaded fine.
Could someone please help? Is this a bug on the BigQuery side?
Thank you.
There was an error reading one of the files: gs://...part-m-00005.gz
Looking at the import logs, it appears that the gzip reader encountered an error decompressing the file.
It looks like that file may not actually be compressed. BigQuery samples the header of the first file in the list to determine whether it is dealing with compressed or uncompressed files and to determine the compression type. When you import all of the files at once, it only samples the first file.
When you load the files individually, BigQuery reads the header of each file and determines that it isn't actually compressed (despite having the '.gz' suffix), so it imports it as a normal flat file.
If you run a load that doesn't mix compressed and uncompressed files, it should work successfully.
Please let me know if you think this is not the case and I'll dig in some more.
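
To confirm which objects are mislabelled before re-running the load, a sketch along these lines (using the bucket and prefix from the question) reads just the first two bytes of each object and splits genuinely gzipped files from plain ones, so the two groups can be loaded in separate bq load runs:

# Sketch: split a GCS prefix into genuinely gzipped vs. plain objects.
from google.cloud import storage

client = storage.Client(project="ardent-course-601")
prefix = "queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/"

gzipped, plain = [], []
for blob in client.list_blobs("sm-uk-hadoop", prefix=prefix):
    head = blob.download_as_bytes(start=0, end=1)  # first two bytes of the object
    (gzipped if head == b"\x1f\x8b" else plain).append(blob.name)

print("gzipped:", gzipped)
print("plain:", plain)  # load these in a separate job from the gzipped ones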