I was wondering if there is any way to treat delimiters inside quotes as merely characters and not delimiters - google-bigquery

I have a massive amount of files that are all made using the same schema. They are put into a format where they are space delimited. A sample file row looks like this:
1 2 abc def "g h" 3
And when I try to use the schema INT, INT, STRING, STRING, STRING, INT, it fails for me because of the space inside the quotation marks.
I know this is where the error is because if I make a sample tab separated instead of space separated, no such error occurs, but that is not feasible for me to do with all of my data. I was wondering if there is any way to be able to indicate in a file upload that delimiters in quotes should not be treated as delimiters but rather as characters? (Rather that all quoted text should be treated as one string.)
I know this feature exists for new line characters, and so I was wondering about delimiters.
Thank you!

I figured it out. The error was there was an extra delimiter character at the end of the file. Now I just need to trim each line of the file before uploading.

Related

How to escape double quotes within a data when it is already enclosed by double quotes

I have CSV data separated by comma like below which has to be imported into snowflake table using copy command .
"1","2","3","2"In stick"
Since I am already passing the parameter OPTIONALLY_ENCLOSED_BY = '"' to copy command I couldn't escape the " (double quotes) within the data ("2"In stick") .
The imported data that I want to see in the table is like below
1,2,3,2"In stick
Can someone please help here ? Thanks !
If you are on Windows, I have a funny solution for that. Open this CSV file in MS Excel. Excel consumes correct double quotes to show data in the cellular format and leaves the extra in the middle of a cell (if each cell is separated properly by commas). Then choose 'replace' and replace double quotes with something else (like two single quotes or replace by nothing to remove them). Then save it again as a CSV. I assume other spreadsheet programs should do the same.
If you have an un-escaped quote inside a field which is surrounded by quotes that isn't really valid CSV. For example, here is an excerpt from the RFC4180 spec
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with another double quote.
For example:
"aaa","b""bb","ccc"
I think that whatever is generating the CSV file is doing it incorrectly and needs to be fixed before you will be able to load it into Snowflake. I don't think any file_format option will be able to solve this for you since it's not valid CSV.
The CSV row should either look like this:
"1","2","3","2""In stick"
or this:
"1","2","3","2\"In stick"
I had this same problem, and while writing up the question, I found an answer:
Import RFC4180 files (CSV spec) into snowflake? (Unable to create file format that matches CSV RFC spec)
Essentially, set:
Name
Value
Column Separator
Comma
Row Separator
New Line
Header lines to skip
{you have to decide what to put here}
Field optionally enclosed by
Double Quote
Escape Character
None
Escape Unenclosed Field
None
Here is my ALTER statement:
ALTER FILE FORMAT "DB_NAME"."SCHEMA_NAME"."CSV_SPEC3" SET COMPRESSION = 'NONE' FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '\042' TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = 'NONE' ESCAPE_UNENCLOSED_FIELD = 'NONE' DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
As I mention in the answer, I don't know why the above works, but it is working for me. Go figure.

Escape all commas in line except first and last

I have a CSV file which I'm trying to import to a SQL Server table. The file contains lines of 3 columns each, separated by a comma. The only problem is that some of the data in the second column contains an arbitrary number of commas. For example:
1281,I enjoy hunting, fishing, and boating,smith317
I would like to escape all occurrences of commas in each line except the first and the last, such that the result of this line would be:
1281,I enjoy hunting\, fishing\, and boating,smith317
I know I will need some type of regular expression to accomplish this task, but my knowledge of regular expressions is very limited. Currently, I'm trying to use Notepad++ find/replace with regex, but I am open to other ideas.
Any help would be greatly appreciated :-)
Okay, could be a manual stuff. Do this:
Normal find all the , and replace it with \,. Escape everything.
Regex find ^(.*)(\\,) and replace it with $1,.
Regex find (\\,)(.*)$ and replace it with ,$2.
Worked for me in Sublime Text 2.

Access VBA, importing csv file via TransferText with commata as decimal separator and semicolon as delimiter

I'm having some problems importing double numbers from csv files. The files have a semicolon delimiter and comma as decimal separator.
I can't set up import specs since the order of the fields in the csv often changes and it would be a desaster if the data goes into the wrong field.
Also the csv files will have to written to a temporary table first. Don't hate me for it, but since I have to process data and set some information fields for later data processing this is by far the easiest, fastest and safest way to achieve it.
Here is the problem itself:
When using TransferText it will import, but of course interpret the comma as delimiter. Not good ...
When replacing comma by full stop and semicolon by comma it works. But it will ignore full stops, so 1.2 becomes 12, 1.333 becomes 1333. The field will be of type double.
I've tests numerous things. Besides TransferText I've tried:
DoCmd.RunSQL ("INSERT INTO Tabelle1 SELECT cdbl(a1) as aa FROM[TEXT;FMT=Delimited;HDR=YES;CharacterSet=437;DATABASE=C:\SPOT].[test.csv]")
But nothing seems to work, even when I create a new table with field type DOUBLE before using TransferText ... decimals are still ignored.
So, I would be happy if you could tell me either how to use TransferText with or without replacing semicolon and comma in a first step or how to use the INSERT INTO stuff.
Thank you very much!
Ok, I think I got it!
The problem where the regional settings and that my Access uses comma as decimal separator. I was also not able to create a Import Spec via manual import, since it needs to have defined which fields will have to be imported.
What I did now was this:
Open the table MSysIMEXSpecsthat contains the import specs via query:
select * from MSysIMEXSpecs
Then add a new row and set SpecName = "Whatever", DecimalPoint= "," and 'FieldSeparator` = ";" and whatever other settings have to be made.
Since there is this workaround, isn't there a way to do this easier?

How can I write special character in VB code

I have a Sql statament using special character (ex: ('), (/), (&)) and I don't know how to write them in my VB.NET code. Please help me. Thanks.
Find out the Unicode code point for the character (from http://www.unicode.org) and then use ChrW to convert from the code point to the character. (To put this in another string, use concatenation. I'm somewhat surprised that VB doesn't have an escape sequence, but there we go.)
For example, for the Euro sign (U+20AC) you'd write:
Dim euro as Char = ChrW(&H20AC)
The advantage of this over putting the character directly into source code is that your source code stays "just pure ASCII" - which means you won't have any strange issues with any other program trying to read it, diff it, etc. The disadvantage is that it's harder to see the symbol in the code, of course.
The most common way seems to be to append a character of the form Chr(34)... 34 represents a double quote character. The character codes can be found from the windows program "charmap"... just windows/Run... and type charmap
If you are passing strings to be processed as SQL statement try doubling the characters for example.
"SELECT * FROM MyRecords WHERE MyRecords.MyKeyField = ""With a "" Quote"" "
The '' double works with the other special characters as well.
The ' character can be doubled up to allow it into a string e.g
lSQLSTatement = "Select * from temp where name = 'fred''s'"
Will search for all records where name = fred's
Three points:
1) The example characters you've given are not special characters. They're directly available on your keyboard. Just press the corresponding key.
2) To type characters that don't have a corresponding key on the keyboard, use this:
Alt + (the ASCII code number of the special character)
For example, to type ¿, press Alt and key in 168, which is the ASCII code for that special character.
You can use this method to type a special character in practically any program not just a VB.Net text editor.
3) What you probably looking for is what is called 'escaping' characters in a string. In your SQL query string, just place a \ before each of those characters. That should do.
Chr() is probably the most popular.
ChrW() can be used if you want to generate unicode characters
The ControlChars class contains some special and 'invisible' characters, plus the quote - for example, ControlChars.Quote

CSV Carriage Return Character

I have a CSV output on one of my applications. This produces a file from of web form data.
In some cases I am getting a carriage return character in my notes field. This causes an error when importing the file. I would like to remove this character.
The issue appears to be happening when users paste information into the form from word documents or holding down the shift key and pressing enter.
The field is ntext and populated in a multi line text box control.
I have been trying to remove this with a replace function but some carriage return characters seem to be getting through.
SQL
REPLACE(Fieldname), CHAR(13) + CHAR(10), ' ') AS new_Fieldname
It may be best to replace the characters separately, as they do not always occur together or in that order:
REPLACE(REPLACE(Fieldname, CHAR(13),' '), CHAR(10), ' ') AS new_Fieldname
Note that you may have a carriage return + line feed, or just a carriage return (depending on the source platform, the source of the data etc.). So you will probably need to handle both cases.
You can read CSVs with carriage return in them. The carriage return should be in a string represented field (i.e. surrounded by quotes). This allows you to read lines and incldue them in your field. If you are reading your CSV one line at a time, you need to maintain state between lines and append the data as necessary.
In .Net, the easiest way to read a CSV is using the Microsoft.VisualBasic.FileIO.textFileParser object (yes, you can use this in C# if you add a reference). This reads even the nastiest CSVs I've thrown at it with ease.
In Word, there are different kinds of new-line characters. Maybe you should also search/replace the other ones.
I'm not sure which are all the different possibilities, at least the paragraph mark is one that I know of.