How to escape double quotes within data when it is already enclosed by double quotes - sql

I have comma-separated CSV data like the example below, which has to be imported into a Snowflake table using the COPY command.
"1","2","3","2"In stick"
Since I am already passing the parameter OPTIONALLY_ENCLOSED_BY = '"' to the COPY command, I can't escape the " (double quote) within the data ("2"In stick").
The imported data that I want to see in the table is as below:
1,2,3,2"In stick
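For reference, the COPY command I am running looks roughly like this (table, stage and file names are placeholders):
COPY INTO my_table
FROM @my_stage/my_file.csv
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"');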
Can someone please help here? Thanks!

If you are on Windows, I have a funny solution for that. Open the CSV file in MS Excel. Excel consumes the correct enclosing double quotes to show the data in cell format and leaves the extra ones in the middle of a cell (as long as each cell is properly separated by commas). Then choose 'replace' and replace the double quotes with something else (like two single quotes, or nothing to remove them). Then save it again as a CSV. I assume other spreadsheet programs would do the same.

If you have an unescaped quote inside a field that is surrounded by quotes, that isn't really valid CSV. For example, here is an excerpt from the RFC 4180 spec:
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with another double quote.
For example:
"aaa","b""bb","ccc"
I think that whatever is generating the CSV file is doing it incorrectly and needs to be fixed before you will be able to load it into Snowflake. I don't think any file_format option will be able to solve this for you since it's not valid CSV.
The CSV row should either look like this:
"1","2","3","2""In stick"
or this:
"1","2","3","2\"In stick"

I had this same problem, and while writing up the question, I found an answer:
Import RFC4180 files (CSV spec) into snowflake? (Unable to create file format that matches CSV RFC spec)
Essentially, set the following (Name: Value):
Column Separator: Comma
Row Separator: New Line
Header lines to skip: {you have to decide what to put here}
Field optionally enclosed by: Double Quote
Escape Character: None
Escape Unenclosed Field: None
Here is my ALTER statement:
ALTER FILE FORMAT "DB_NAME"."SCHEMA_NAME"."CSV_SPEC3" SET
  COMPRESSION = 'NONE'
  FIELD_DELIMITER = ','
  RECORD_DELIMITER = '\n'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '\042'
  TRIM_SPACE = FALSE
  ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
  ESCAPE = 'NONE'
  ESCAPE_UNENCLOSED_FIELD = 'NONE'
  DATE_FORMAT = 'AUTO'
  TIMESTAMP_FORMAT = 'AUTO'
  NULL_IF = ('\\N');
As I mention in the answer, I don't know why the above works, but it is working for me. Go figure.
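For completeness, the load itself then just references that named format; the stage and table names below are placeholders:
COPY INTO "DB_NAME"."SCHEMA_NAME"."MY_TABLE"
FROM @my_stage/my_file.csv
FILE_FORMAT = (FORMAT_NAME = 'DB_NAME.SCHEMA_NAME.CSV_SPEC3');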

Related

Remove double double quotes before copy into snowflake

I am trying to load some CSV data into a Snowflake table. However, I am facing some issues with double double quotes in some rows of the file.
This is the file format I was using inside COPY INTO command:
file_format=(TYPE=CSV,
FIELD_DELIMITER = '|',
FIELD_OPTIONALLY_ENCLOSED_BY='"',
SKIP_HEADER =1);
As you can see in the example below, I have double double quotes around ID, which is a data quality problem, but I have to deal with it because I cannot change it at its source:
Column1|Column2|Column3|Column4|Column5|Column6|Column7|""ID""|Column9
I tried to replace the double double quotes ("") with a single double quote ("), as the example below depicts:
Column1|Column2|Column3|Column4|Column5|Column6|Column7|"ID"|Column9
However, Snowflake is still returning the same error:
Found character 'I' instead of field delimiter '|' File 'XXXX', line 709, character 75 Row 708, column "Column8"["$8":8] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
Do you know how I can deal with this, allowing the file content to be properly loaded into the Snowflake table?
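One workaround I am considering (I am not sure it is the right approach) is to parse the file without treating quotes as enclosures and strip them while loading, roughly like this; the stage and table names are placeholders, and it assumes '|' never appears inside a field:
COPY INTO my_table
FROM (
  SELECT t.$1, t.$2, t.$3, t.$4, t.$5, t.$6, t.$7,
         REPLACE(t.$8, '"', ''),  -- turns ""ID"" (or "ID") into ID; assumes no legitimate quotes inside values
         t.$9
  FROM @my_stage/my_file.csv t
)
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1);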

Custom delimiter while exporting Google Cloud SQL to CSV

I've been successfully exporting GCloud SQL to CSV with its default delimiter ",". I want to import this CSV into Google BigQuery and I've succeeded in doing this.
However, I'm experiencing a little problem. There is a "," in some of my cells/fields, which causes the BigQuery import process to not work properly. For example:
"Budi", "19", "Want to be hero, and knight"
My questions are:
Is it possible to export Google Cloud SQL with custom delimiter e.g. "|"?
If not, how to make above sample data to be imported in Google Big Query and become 3 field/cell?
Cheers.
Is it possible to export Google Cloud SQL with custom delimiter e.g. "|"?
Yes it is. See the BigQuery documentation page on how to set load options, provided in this link.
You will need to add --field_delimiter = '|' to your command
From the documentation:
(Optional) The separator for fields in a CSV file. The separator can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF8. BigQuery converts the string to ISO-8859-1 encoding, and uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is a comma (,).
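If you prefer to set this in SQL rather than on the bq command line, BigQuery's LOAD DATA statement takes the same option; the dataset, table and bucket names below are placeholders:
LOAD DATA INTO mydataset.mytable
FROM FILES (
  format = 'CSV',
  field_delimiter = '|',
  uris = ['gs://mybucket/export.csv']
);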
As far as I know there's no way of setting a custom delimiter when exporting from CloudSQL to CSV. I attempted to introduce my own delimiter by formulating my select query like so:
select column_1||'|'||column_2 from foo
But this only results in Cloud SQL enclosing the whole result in double quotes in the resulting CSV. This also aligns with the documentation, which states:
Exporting in CSV format is equivalent to running the following SQL statement:
SELECT <query> INTO OUTFILE ... CHARACTER SET 'utf8mb4'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
ESCAPED BY '\\' LINES TERMINATED BY '\n'
https://cloud.google.com/sql/docs/mysql/import-export/exporting

skipLeadingRows=1 in external table definition

In the below example, how can I set the skip leading row option?
bq --location=US query --external_table_definition=sales::Region:STRING,Quarter:STRING,Total_sales:INTEGER#CSV=gs://mybucket/sales.csv 'SELECT Region,Total_sales FROM sales;'
Regards,
Sreekanth
The flag options can be found under the installation home folder (the flag you are looking for is --skip_leading_rows):
/google-cloud-sdk/platform/bq/bq.py:
--[no]allow_jagged_rows: Whether to allow missing trailing optional columns in
CSV import data.
--[no]allow_quoted_newlines: Whether to allow quoted newlines in CSV import
data.
-E,--encoding: The character encoding used by the input
file. Options include:
ISO-8859-1 (also known as Latin-1)
UTF-8
-F,--field_delimiter: The character that indicates the boundary between
columns in the input file. "\t" and "tab" are accepted names for tab.
--[no]ignore_unknown_values: Whether to allow and ignore extra, unrecognized
values in CSV or JSON import data.
--max_bad_records: Maximum number of bad records allowed before the entire job
fails.
(default: '0')
(an integer)
--quote: Quote character to use to enclose records. Default is ". To indicate
no quote character at all, use an empty string.
--[no]replace: If true erase existing contents before loading new data.
(default: 'false')
--schema: Either a filename or a comma-separated list of fields in the form
name[:type].
--skip_leading_rows: The number of rows at the beginning of the source file to
skip.
(an integer)
--source_format: Format of
source data. Options include:
CSV
NEWLINE_DELIMITED_JSON
DATASTORE_BACKUP
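Alternatively, if the flag route doesn't suit your external-table query, the same setting can be baked into a standard-SQL external table definition; this is only a sketch, with placeholder dataset and bucket names:
CREATE EXTERNAL TABLE mydataset.sales (
  Region STRING,
  Quarter STRING,
  Total_sales INT64
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://mybucket/sales.csv'],
  skip_leading_rows = 1  -- skips the header row when the table is queried
);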

I was wondering if there is any way to treat delimiters inside quotes as merely characters and not delimiters

I have a massive amount of files that are all made using the same schema. They are put into a format where they are space delimited. A sample file row looks like this:
1 2 abc def "g h" 3
And when I try to use the schema INT, INT, STRING, STRING, STRING, INT, it fails for me because of the space inside the quotation marks.
I know this is where the error is because if I make a sample tab-separated instead of space-separated, no such error occurs, but that is not feasible for me to do with all of my data. I was wondering if there is any way to indicate in a file upload that delimiters inside quotes should not be treated as delimiters but rather as characters? (That is, all quoted text should be treated as one string.)
I know this feature exists for new line characters, and so I was wondering about delimiters.
Thank you!
I figured it out. The error was that there was an extra delimiter character at the end of the file. Now I just need to trim each line of the file before uploading.

How to import data from CSV where some fields contain " " by using LOAD DATA INFILE

I have some columns I want to insert into an existing table. Some columns have contents like "how, using,list, file", and each column is separated by ",". So how do I use LOAD DATA INFILE to import them?
You didn't indicate which DB you're using, so I'll answer for MySQL.
Please see this.
If the lines in such a file are terminated by carriage return/newline pairs, the statement shown here illustrates the field- and line-handling options you would use to load the file:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
And further down the page:
Occurrences of the ENCLOSED BY character within a field value are escaped by prefixing them with the ESCAPED BY character. Also note that if you specify an empty ESCAPED BY value, it is possible to inadvertently generate output that cannot be read properly by LOAD DATA INFILE. For example, the preceding output just shown would appear as follows if the escape character is empty. Observe that the second field in the fourth line contains a comma following the quote, which (erroneously) appears to terminate the field:
1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20
So, if you have unescaped " characters inside columns, that data cannot be imported in the general case, and you'll either have to export the data using a correct ENCLOSED BY character, or preprocess the file to escape the " first.
If you just have , inside your columns, then it's easy: you'll have to use ENCLOSED BY together with TERMINATED BY, as shown below.
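For the sample in the question, a minimal sketch would be the following; the file and table names are placeholders:
-- commas inside the quoted fields are kept as data, not treated as delimiters
LOAD DATA INFILE 'data.csv' INTO TABLE my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';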