Inserting quotes in BigQuery - google-bigquery

I can easily upload a file delimited by ^.
It looks something like this:
CN^others^2012-05-03 00:02:25^^^^^Mozilla/5.0^generic web browser^^^^^^^^
CN^others^2012-05-03 00:02:26^^^^^Mozilla/5.0^generic web browser^^^^^^^^
But if I have a double quote somewhere, it fails with an error message...
Line:1 / Field:, Data between close double quote (") and field separator: field starts with:
Too many errors encountered. Limit is: 0.
CN^others^2012-05-03 00:02:25^^^^^"Mozilla/5.0^generic web browser^^^^^^^^
I regularly get files with "Mozilla as the browser name; how do I load data that contains double quotes?

Quotes can be escaped with another quote. For example, the field: This field has "internal quotes". would become This field has ""internal quotes"".
sed 's/\"/\"\"/g' should do the trick.
Note that in order to import data that contains quoted newlines, you need to set the allow_quoted_newlines flag to true on the import configuration. This means the import cannot be processed in parallel, and so may be slower than importing data without that flag set.
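For reference, here is a minimal sketch of that pre-processing step plus the flag, using the Python client library; the file, dataset and table names are placeholders, and the replace simply mirrors the sed command above:

from google.cloud import bigquery

# Double every quote, mirroring sed 's/"/""/g' (hypothetical file names).
with open("raw.csv", encoding="utf-8") as src, open("escaped.csv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.replace('"', '""'))

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="^",
    allow_quoted_newlines=True,  # required if any quoted field contains newlines
)
with open("escaped.csv", "rb") as f:
    client.load_table_from_file(f, "my_dataset.my_table", job_config=job_config).result()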

Related

Remove double double quotes before copy into snowflake

I am trying to load some CSV data into a Snowflake table. However, I am facing issues with double double quotes in some rows of the file.
This is the file format I was using inside COPY INTO command:
file_format=(TYPE=CSV,
FIELD_DELIMITER = '|',
FIELD_OPTIONALLY_ENCLOSED_BY='"',
SKIP_HEADER =1);
As you can see in the example below, I have double double quotes around ID. This is a data quality problem, but I have to deal with it because I cannot change it at the source:
Column1|Column2|Column3|Column4|Column5|Column6|Column7|""ID""|Column9
I tried to replace the double double quotes ("") with a single double quote ("), as the example below depicts:
However, Snowflake is still returning the same error:
Found character 'I' instead of field delimiter '|' File 'XXXX', line 709, character 75 Row 708, column "Column8"["$8":8] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
Do you know how I can deal with this so that the file content is properly loaded into the Snowflake table?
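One way to script the replacement described in the question is to rewrite the file before staging it. This is only a sketch with made-up file names; it simply performs the "" -> " substitution the asker already attempted:

# Rewrite doubled double quotes as single double quotes before PUT / COPY INTO
# (hypothetical file names).
with open("source.csv", encoding="utf-8") as src, open("fixed.csv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.replace('""', '"'))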

Escaping a custom character in Objective-C

I am working on macOS, not iOS, with Xcode 11.
My app allows the user to enter text in a specific place. This text can be anything. Once done, it exports a CSV which is passed to an external process I cannot influence.
The issue: the external process uses the semicolon ";" as a separator (the CSV itself is separated differently). If the user types a semicolon, the external process will fail.
If I manually add an escaping backslash before each semicolon in the CSV and then pass it to the external app, it works.
What I need: each semicolon escaped with ONE backslash in the final CSV.
What I tried:
Escaping the whole text with quotation marks - fail.
Escaping semicolons in Objective-C before writing the CSV by trying
stringByReplacingOccurrencesOfString (look for @";", replace with @"\;") - the compiler throws a warning that the escape character is unknown - fail.
Appreciate any help
UPDATE:
I also tried to set a double backslash like @Corbell mentioned, but this results in a double backslash in the exported CSV -> fail.
I also tried to set a single backslash by using its Unicode code point:
[NSString stringWithFormat:@"%C;", 0x5C]; --> "\\;"
That also failed and produces two backslashes in the final CSV (where I need ONE only).
In your stringByReplacingOccurrencesOfString call, second parameter, try escaping your backslash with a backslash to make it a literal character to insert, i.e. @"\\;" - otherwise the compiler thinks you're trying to specify @"\;" as an escape sequence (backslash-semicolon), which is invalid.
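The same two-characters-in-source, one-character-at-runtime distinction can be checked quickly outside Objective-C; a small Python illustration (not part of the original answer):

# "\\;" is two characters at runtime: one backslash and one semicolon.
text = "a;b;c"
escaped = text.replace(";", "\\;")
print(escaped)      # a\;b\;c  -- exactly one backslash before each semicolon
print(len("\\;"))   # 2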
Solved. It was the CSV parser that added the additional escaping characters. Once I fixed that, it worked like a charm.

Why do double quotes and <N> cause errors when uploading to BigQuery?

Errors were reported when my program tried to upload a .csv file, via job upload to BigQuery:
Job failed while writing to Bigquery. invalid: Too many errors encountered. Limit is: 0. at
Error: [REASON] invalid [MESSAGE] Data between close double quote (") and field separator: field starts with: <N> [LOCATION] File: 0 / Line:21470 / Field:2
Error: [REASON] invalid [MESSAGE] Too many errors encountered. Limit is: 0. [LOCATION]
I traced back to my file and found the specified line, which looks like this:
3D0F92F8-C892-4E6B-9930-6FA254809E58~"N" STYLE TOWING~1~0~5.7.1512.441~10.20.10.25:62342~MSSqlServer: N_STYLE on localhost~3~2015-12-17 01:56:41.720~1~<?xml version="1
The delimiter was set to ~, so why is the double quote, or maybe the <N>, a problem?
The specification for CSV says that if there is a quote in the field, then the entire field should be quoted, as in a,b,"c,d", which has only three fields, since the third comma is quoted. The CSV parser gets confused when there is data after a closing quote but before the next delimiter, as in a,b,"c,d"e.
You can fix this by specifying a custom quote character, since it sounds like you don't need a quote char at all, so you could just set it to something that you'll never see, like \0 or |. You're already setting configuration.load.delimiter, just set configuration.load.quote as well.
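In the Python client library, configuration.load.quote corresponds to the quote_character property of LoadJobConfig. A minimal sketch, with placeholder bucket, dataset and table names:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="~",
    quote_character="",  # empty string = no quote character; a never-used character like "|" also works
)
client.load_table_from_uri(
    "gs://my-bucket/data.csv", "my_dataset.my_table", job_config=job_config
).result()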

Access VBA: importing a CSV file via TransferText with comma as decimal separator and semicolon as delimiter

I'm having some problems importing double-precision numbers from CSV files. The files have a semicolon delimiter and a comma as the decimal separator.
I can't set up import specs, since the order of the fields in the CSV often changes and it would be a disaster if the data went into the wrong field.
Also, the CSV files will have to be written to a temporary table first. Don't hate me for it, but since I have to process the data and set some information fields for later data processing, this is by far the easiest, fastest and safest way to achieve it.
Here is the problem itself:
When using TransferText it will import, but of course it interprets the comma as a delimiter. Not good ...
When replacing the comma with a full stop and the semicolon with a comma, it works. But it ignores full stops, so 1.2 becomes 12 and 1.333 becomes 1333. The field will be of type Double.
I've tested numerous things. Besides TransferText, I've tried:
DoCmd.RunSQL ("INSERT INTO Tabelle1 SELECT cdbl(a1) as aa FROM[TEXT;FMT=Delimited;HDR=YES;CharacterSet=437;DATABASE=C:\SPOT].[test.csv]")
But nothing seems to work, even when I create a new table with field type DOUBLE before using TransferText ... decimals are still ignored.
So, I would be happy if you could tell me either how to use TransferText (with or without replacing semicolons and commas in a first step) or how to get the INSERT INTO approach to work.
Thank you very much!
Ok, I think I got it!
The problem was the regional settings and the fact that my Access uses a comma as the decimal separator. I was also not able to create an import spec via a manual import, since that requires defining which fields have to be imported.
What I did now was this:
Open the table MSysIMEXSpecs, which contains the import specs, via a query:
select * from MSysIMEXSpecs
Then add a new row and set SpecName = "Whatever", DecimalPoint = "," and FieldSeparator = ";", plus whatever other settings have to be made.
Since this is only a workaround: isn't there an easier way to do this?

Valid CSV file import fails with Data between close double quote (") and field separator: field starts with

I am trying to import a CSV file into BQ from GS.
The cmd I use is:
$ bq load --field_delimiter=^ --quote='"' --allow_quoted_newlines
--allow_jagged_rows --ignore_unknown_values wr_dev.drupal_user_profile gs://fls_csv_files/user_profileA.csv
uid:string,first_name:string,last_name:string,category_id:string,logo_type:string,country_id:string,phone:string,phone_2:string,address:string,address_2:string,city:string,state:string,zip:string,company_name:string,created:string,updated:string,subscription:string
the reported error is
File: 0 / Line:1409 / Field:14, Data between close double quote (")
and field separator: field starts with: <Moreno L>
sample data is:
$ sed -n '1409,1409p' user_profileA.csv
$ 1893^"Moreno"^"Jackson"^17^0^1^"517-977-1133"^"517-303-3717"^""^""^""^""^""^"Moreno L Jackson \"THE MOTIVATOR!\" "^0^1282240785^1
which was generated from MySQL with:
SELECT * INTO OUTFILE '/opt/mysql_exports/user_profileA.csv'
FIELDS TERMINATED BY '^'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM p;
Why do I get the error message in BQ? How do I properly export CSV files from MySQL that have newlines (CR and LF mixed, as it was user input from Windows or Mac)?
Couple of job IDs:
Job ID: aerobic-forge-504:bqjob_r75d28c332a179207_0000014710c6969d_1
Job ID: aerobic-forge-504:bqjob_r732cb544f96e3d8d_0000014710f8ffe1_1
Update
Apparently there is more to this. I used 5.5.34-MariaDB-wsrep-log INTO OUTFILE, and either it is a bug or something else is wrong, but I get invalid CSV exports. I had to use another tool to export proper CSV (tool: SQLyog).
It has problems with double quotes; for example, field 14 here causes the error:
3819^Ron ^Wolbert^6^0^1^6123103169^^^^^^^""Lil"" Ron's^0^1282689026^1
UPDATE 2019:
Try this as an alternative:
Load the MySQL backup files into a Cloud SQL instance.
Read the data in BigQuery straight out of MySQL.
Longer how-to:
https://medium.com/google-cloud/loading-mysql-backup-files-into-bigquery-straight-from-cloud-sql-d40a98281229
The proper way to encode a double quote in CSV is to put another double quote in front of it.
So instead of:
"Moreno L Jackson \"THE MOTIVATOR!\"...
Have:
"Moreno L Jackson ""THE MOTIVATOR!""...