This seems to be a common problem with a common solution: uploading a CSV via S3 and getting the "Missing newline: Unexpected character" error? Just add ESCAPE ACCEPTINVCHARS to your COPY statement!
So I did that and still get the error.
My CSV looks like this:
email, step1_timestamp, step2_timestamp, step3_timestamp, step4_timestamp, url, type
fake#email.gov, 2015-01-28 12:1I:05, 2015-01-28 12:1I:05, NULL, NULL, notasite.gov, M Final
wrong#email.net, 2015-01-28 12:7I:19, NULL, NULL, NULL, notasite.gov/landing, M
I successfully upload the file to S3 and run the following COPY:
COPY <my_table> FROM 's3://<my_bucket>/<my_folder>/uploadaws.csv'
CREDENTIALS 'aws_access_key_id=<my_id>;aws_secret_access_key=<'
REGION 'us-west-1'
DELIMITER ','
null as '\00'
IGNOREHEADER 1
ESCAPE ACCEPTINVCHARS;
My error code:
Missing newline: Unexpected character 0x6e found at location 4194303
The first characters of the error:
:05,,,,,M Final
xxxx#yyyyy.com,2015-01-28 12:1I:05,,,,,M Final
xxx.xxx#yyyy.com,2015-01-28 12:1I:05,,,,,M Final
xxxx
Your file probably just needs a newline at the end of the very last row.
ACCEPTINVCHARS won't help, as it is for files that contain invalid UTF-8 code points or control characters.
ESCAPE is for loading embedded quotes in files with quoted data. Your file would have to be specially prepared for that.
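If that is the case, a small pre-upload check will fix it. This is only a sketch in Python, assuming the uploadaws.csv file from the question and that the file is not empty:
# Sketch: make sure the last record ends with a newline before uploading to S3.
with open("uploadaws.csv", "rb+") as f:
    f.seek(-1, 2)                # jump to the last byte of the file
    if f.read(1) != b"\n":       # last row has no newline terminator
        f.write(b"\n")           # append one so COPY sees a complete record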
I'm trying to import data from a CSV file into PostgreSQL, but I keep getting this error:
\copy owner (owner_id, owner_name, owner_surname) FROM 'C:\Users\Documents\owners.csv' DELIMITER ',';
ERROR: invalid input syntax for type integer: "0"
CONTEXT: COPY owner, line 1, column owner_id: "0"
Here's what owners.csv looks like
I understand that the error has to do with the encoding and that I should change the encoding to UTF-8 BOM, which I have done, but the error still persists.
That "0" is actually the byte sequence 0xEF 0xBB 0xBF 0x30, which would be a UTF-8 byte order mark followed by a 0; PostgreSQL does not skip the BOM, so the field is not a valid integer.
Remove that BOM from the file, and you will get better results.
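If you would rather strip it in a pre-processing step than re-save the file, here is a minimal Python sketch; owners.csv is the file from the question and owners_nobom.csv is a made-up output name:
# Sketch: remove a leading UTF-8 byte order mark (0xEF 0xBB 0xBF) from the CSV.
with open("owners.csv", "rb") as src:
    data = src.read()
if data.startswith(b"\xef\xbb\xbf"):
    data = data[3:]              # drop the 3-byte BOM
with open("owners_nobom.csv", "wb") as dst:
    dst.write(data)
Then point \copy at the cleaned file.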
I tried using --null_marker="null" in the bq load command below, expecting the null marker to replace blank strings with NULL, but it did not work. Any suggestions, please?
$gcloud_dir/bq load $svc_ac --max_bad_records=10 --replace --source_format=CSV --null_marker="null" --field_delimiter=',' table source
The empty strings did not get replaced with NULL.
Google Cloud Support here!
After reading through the documentation, the description for the --null_marker flag states:
Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string.
Therefore, setting --null_marker="null" will not replace empty strings with NULL; it will only treat the literal string "null" as a null value. At this point you should either:
Replace empty strings before uploading the CSV file (a sketch of this is shown right after this list).
Once you have uploaded the CSV file, run a query that uses the REPLACE function.
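A minimal Python sketch of the first option, rewriting empty fields to the literal marker "null" so that --null_marker="null" picks them up (the file names here are placeholders, not from your command):
import csv
# Sketch: turn empty CSV fields into the string "null" before running bq load.
with open("source.csv", newline="") as src, \
     open("source_with_nulls.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow([field if field != "" else "null" for field in row])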
I am receiving data from a .DAT file into my Teradata table using Informatica. However, it is failing on a junk character issue.
My solution -
Remove the junk characters using a REPLACE function. I tried opening the .dat file in Notepad++ to see what the junk/bad character is, but I see this (a few samples):
Creave Cloud
Mulple
Image of how it looks in NOTEPAD++
The text it shows is xED xAF x80 xED xB6 x9F
My ask -
I don't know what these characters mean. Can anyone tell me the ASCII code, or how to put this in a REPLACE function so I can replace them with another character?
EDIT -
Target column_name - COLUMN_NAME VARCHAR(240) CHARACTER SET UNICODE NOT CASESPECIFIC [Teradata Database]
Source Column_name - VARCHAR2(240) [ORACLE Database]
Data in Oracle -
You can pass the data through an XML parser or write it to an XML target, and the junk character will get converted to its hex representation, thus not erroring out. However, Nico has provided a simpler solution here:
https://network.informatica.com/thread/20642
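If you would rather scrub the file before Informatica reads it (instead of the XML approach above), a short Python sketch can drop anything that is not valid UTF-8, including the xED xAF x80 / xED xB6 x9F sequences you are seeing; the file names are placeholders:
# Sketch: decode the .dat file as UTF-8, replacing invalid sequences, then drop the markers.
with open("source.dat", "rb") as src:
    raw = src.read()
clean = raw.decode("utf-8", errors="replace").replace("\ufffd", "")
with open("source_clean.dat", "w", encoding="utf-8") as dst:
    dst.write(clean)
Those bytes look like UTF-8-encoded UTF-16 surrogates, which are never valid in strict UTF-8, so a strict decoder will always flag them.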
I have a 30 GB text file. The encoding of the file is UTF-8, but it also contains some Windows-1252 characters. So when I try to import it, I get the following error:
ERROR: invalid byte sequence for encoding "UTF8": 0x9b
How can I fix this?
The file is already in UTF-8 format; when I run the 'file' command on it, it says the encoding is UTF-8. But it also contains some byte sequences that are not valid UTF-8. For example, when I run the \copy command, after a while it gives the above-mentioned error for this row:
0B012234 Basic study of <img src="/fulltext-image.asp?format=htmlnonpaginated&src=323K744431152658_html\233_2 basic study of img src fulltext image asp format htmlnonpaginated src 323k744431152658_html 233_2 1975 Semigroup Forum semigroup forum 04861B53 19555
The issue is caused by the backslash (\). In the default text format, COPY treats \233 as an octal escape and turns it into the single byte 0x9B, which is not valid UTF-8.
Use the CSV format, which does not treat backslash as a special character, e.g. -
\copy t from myfile.txt with csv quote E'\x1' delimiter E'\x2'
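If you need to run the same load from a script rather than psql, here is a sketch using psycopg2; the connection string, table name and file path are placeholders, and the quote/delimiter are control characters that should never occur in the data:
import psycopg2
# Sketch: the same CSV-format COPY driven from Python.
# FORMAT csv keeps the backslash literal, so no 0x9B byte is produced.
conn = psycopg2.connect("dbname=mydb")
with conn, conn.cursor() as cur, open("myfile.txt", "rb") as f:
    cur.copy_expert(
        r"COPY t FROM STDIN WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02')",
        f,
    )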
Errors were reported when my program tried to upload a .csv file to BigQuery via a load job:
Job failed while writing to Bigquery. invalid: Too many errors encountered. Limit is: 0. at
Error: [REASON] invalid [MESSAGE] Data between close double quote (") and field separator: field starts with: <N> [LOCATION] File: 0 / Line:21470 / Field:2
Error: [REASON] invalid [MESSAGE] Too many errors encountered. Limit is: 0. [LOCATION]
I traced back to my file and did find the specified line like:
3D0F92F8-C892-4E6B-9930-6FA254809E58~"N" STYLE TOWING~1~0~5.7.1512.441~10.20.10.25:62342~MSSqlServer: N_STYLE on localhost~3~2015-12-17 01:56:41.720~1~<?xml version="1
The delimiter was set to ~, so why is the double quote (or maybe the <N>) a problem?
The specification for csv says that if there is a quote in the field, then the entire field should be quoted. As in a,b,"c,d", which would have only three fields, since the third comma is quoted. The csv parser gets confused when there is data after a closing quote but before the next delimiter, as in a,b,"c,d"e.
You can fix this by specifying a custom quote character, since it sounds like you don't need a quote char at all, so you could just set it to something that you'll never see, like \0 or |. You're already setting configuration.load.delimiter, just set configuration.load.quote as well.
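For example, with the BigQuery Python client both settings live on the load job configuration. This is only a sketch with made-up project, dataset, table and GCS URI names, using an empty quote character to disable quote handling entirely:
from google.cloud import bigquery
# Sketch: load a ~-delimited file with quote handling turned off.
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="~",    # configuration.load.fieldDelimiter
    quote_character="",     # configuration.load.quote; "" means no quote character
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/my-file.csv",
    "my-project.my_dataset.my_table",
    job_config=job_config,
)
load_job.result()           # wait for the load job to finish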