Delimiter after a quoted field, how to escape quote - sql

I have a file that looks like this:
info1;info2;info3";info4;info5
After parsing it I get this error:
Error: [42636] ETL-2106: Error while parsing row=0 (starting from 0) [CSV Parser found at byte 5 (starting with 0 at the beginning of the row) of 5 a field delimiter after an quoted field (with an additional whitespace) in file '~/path'. Please check for correct enclosed fields, valid field separators and e.g. unescaped field delimiters that are contained in the data (these have to be escaped)]
I'm sure the cause is the info3"; part, but I have no idea how to solve this.
I also can't get rid of the quote, because it has to appear in the report.
The main part of the Python code is:
# Transform data to valid CSV format: remove BOM, remove '=' sign, remove repeating quotes in Size column
decoded_csv = r.content.decode('utf-8').replace(u'\ufeff', '').replace('=', '')
print(decoded_csv)
cr = csv.reader(decoded_csv.splitlines(), delimiter=';')
lst = list(cr)[1:]
f = csv.writer(open(base_folder + 'txt/' + shop, "w+"), delimiter=';')
for row in lst:
    f.writerow(row[:-2])
After this code runs I get a file like this:
info1;info2;"info3""";info4;info5
And it is not what I need
And it is not what I need. But when I change the code a little by adding quoting=csv.QUOTE_NONE, quotechar='':
# Transform data to valid CSV format: remove BOM, remove '=' sign, remove repeating quotes in Size column
decoded_csv = r.content.decode('utf-8').replace(u'\ufeff', '').replace('=', '')
print(decoded_csv)
cr = csv.reader(decoded_csv.splitlines(), delimiter=';')
lst = list(cr)[1:]
f = csv.writer(open(base_folder + 'txt/' + shop, "w+"), delimiter=';', quoting=csv.QUOTE_NONE, quotechar='')
for row in lst:
    f.writerow(row[:-2])
I get what I need
info1;info2;info3";info4;info5
But the second step (an Exasol IMPORT) then returns the error shown above. The statement is:
MERGE INTO hst AS dst
USING (
    SELECT DISTINCT
        ar,
        ar_na
    FROM (
        IMPORT INTO (
            ar VARCHAR(100) UTF8 COMMENT IS 'ar',
            ar_na VARCHAR(100) UTF8 COMMENT IS 'ar na'
        )
        FROM CSV /*SS:R*/
        AT '&1'
        USER '&2'
        IDENTIFIED BY '&3'
        FILE '~/path'
        SKIP = 0
        ROW SEPARATOR = 'CRLF'
        COLUMN SEPARATOR = ';'
        TRIM
    )
    GROUP BY
        ar,
        ar_na
) src ON src.ar = dst.ar
WHEN MATCHED THEN UPDATE SET
    dst.ar_na = src.ar_na
WHEN NOT MATCHED THEN
    INSERT (
        ar,
        ar_na
    )
    VALUES (
        src.ar,
        src.ar_na
    );
If the file looks like info1;info2;info3;info4;info5 (no stray quote), everything works fine and all scripts run.

By default, Exasol treats the double quote (") as the column delimiter, i.e. the quote character. This enables you to specify values that contain the column separator (in your case, the semicolon). See the entry "Special characters" in the documentation.
You have two options here:
Disable the column delimiter by passing COLUMN DELIMITER = '' to the IMPORT statement.
Duplicate all double quotes in the CSV file; Exasol treats two consecutive column delimiters as an escaped literal quote.
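The second option can also be applied at the Python writing step from the question: double every literal quote before writing with QUOTE_NONE. A minimal sketch, with in-memory data (the rows and the io buffer are illustrative, not the asker's actual file):

```python
import csv
import io

# Illustrative row containing a stray double quote, as in the question.
rows = [["info1", "info2", 'info3"', "info4", "info5"]]

buf = io.StringIO()
# QUOTE_NONE with quotechar=None disables all quote handling in the writer.
writer = csv.writer(buf, delimiter=';', quoting=csv.QUOTE_NONE, quotechar=None)
for row in rows:
    # Double each quote so Exasol reads "" as one escaped literal quote.
    writer.writerow([field.replace('"', '""') for field in row])

print(buf.getvalue().strip())  # info1;info2;info3"";info4;info5
```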

Related

Query columns containing special characters or starting with digits with SQL statement in DolphinDB

Server version: 2.00.8 2022.09.28
I obtain a table re with pivot by in DolphinDB and the generated column names contain special characters.
date = take(2021.08.01 2021.08.02 2021.08.03, 12)
sym = take(["IBM N", "_MSFTN", "3_GOOGS", ""], 12).sort()
value = 1..12
t=table(date, sym, value)
re = select value from t pivot by date, sym
When I query table re with a select statement,
select 3_GOOGS from re
An error message "Can't recognize token 3_GOOGS" is raised. How can I fix the query?
When column names that contain special characters or start with digits are used in DolphinDB SQL statements, they should be enclosed in double quotes and prefixed with an underscore identifier, for example: _"IBM.N", _"000001.SH". So your query can be modified as:
select _"3_GOOGS" from re

How to read TSV file without text delimiters where text contains single and double quotes

I have an input text file where fields are tab separated. Some fields contain text with single quotes (') and some fields contain text with double quotes ("). Some fields contain both single and double quotes. Here is an example:
Theme from Bram Stoker's Dracula (From Bram Stoker's Dracula"") Soundtrack & Theme Orchestra
Is there any way to tell OPENROWSET to not try to parse the fields?
I have found that I can set the FIELDQUOTE to either a single quote or a double quote but not to both (using FIELDQUOTE = '''"' gives error Multi-byte field quote is not supported)
Here's an example of a query I try to use:
SELECT TOP 10 *
FROM OPENROWSET
(
BULK 'files/*.txt',
DATA_SOURCE = 'files',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIELDTERMINATOR = '\t',
FIELDQUOTE = ''''
)
AS r
and I can also use FIELDQUOTE = '"' but not the two at the same time...
Any suggestions on how to fix this? (without changing the source files)
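Not an OPENROWSET answer, but for comparison: the "parse nothing" behavior the asker wants is what Python's csv module gives with quoting=csv.QUOTE_NONE, which may help when post-processing such files outside SQL. A sketch using the sample line from the question:

```python
import csv

# The asker's sample line: tab-separated, all quotes are literal data.
line = ('Theme from Bram Stoker\'s Dracula (From Bram Stoker\'s Dracula"")'
        '\tSoundtrack & Theme Orchestra')

# QUOTE_NONE plus quotechar=None turns off quote interpretation entirely,
# so single and double quotes pass through as ordinary characters.
reader = csv.reader([line], delimiter='\t',
                    quoting=csv.QUOTE_NONE, quotechar=None)
fields = next(reader)
print(fields)
```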

Remove carriage return / Line feeds in string and output with quotes

I have the following SQL that outputs a string with double quotes around it:
SELECT '"'+A.DESCR+'"'
FROM MyTable
Results:
"Stackable Storage Basket"
I have found that some of the data contains carriage returns / line feeds and want to remove these while retaining the quotes. I have modified the code as below; however, this adds an extra quote onto the end of the string and does not retain the beginning quote either.
SELECT REPLACE(REPLACE(DESCR + '"'+DESCR+'"' + '''', char(10),'"'), char(13), '"')
FROM MyTable
Results:
Stackable Storage Basket"'
How do I need to modify this to get the expected behavior?
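The string operation being attempted — strip CR/LF from the value, then wrap the cleaned value in quotes — can be sketched in Python to make the intended order of operations explicit (the helper name and sample value are illustrative):

```python
def quote_clean(descr: str) -> str:
    # Remove carriage returns and line feeds first, then add the
    # surrounding double quotes, so the wrapper quotes are never touched.
    cleaned = descr.replace('\r', '').replace('\n', '')
    return '"' + cleaned + '"'

print(quote_clean('Stackable Storage Basket\r\n'))  # "Stackable Storage Basket"
```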

How to break a string into comma-delimited columns when the quote character also appears inside the fields

I am doing a data load where each line has the " character at the beginning and end of each field, and a comma as the delimiter, as below:
"sU92", "eRouter1.0"
"sU92" "," eRouter1.0 "
"sU9.2", "eRouter1.0"
Note that in the second line there are 2 double quotes (2 ") and that in the third row there is a comma between numbers 9 and 2 (9,2).
Whenever I try to create the table with the delimiter set to comma and quotechar='\"', the records break.
Create the table without un-quoting enabled, using LazySimpleSerDe (the default):
create table mytable(
col1 string,
col2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
Then un-quote the strings and remove extra spaces in the SELECT using, for example, regexp_replace:
trim(regexp_replace(str, '\\"',''))
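The effect of that trim(regexp_replace(...)) step can be sketched in Python for the sample fields from the question (the field values below come from the asker's data):

```python
import re

# Raw field values as LazySimpleSerDe would load them (quotes kept).
raw_fields = ['"sU92"', ' "eRouter1.0"', ' "," eRouter1.0 "']

def unquote(field: str) -> str:
    # Mirror trim(regexp_replace(str, '\\"', '')): drop every double
    # quote, then strip leftover surrounding whitespace.
    return re.sub(r'"', '', field).strip()

print([unquote(f) for f in raw_fields])  # ['sU92', 'eRouter1.0', ', eRouter1.0']
```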

Escaping an apostrophe in Sqldf with R

I have a dataframe mergeTest which has a column Name with apostrophes in a few of its values.
I'm looping through it to divide values whose denominators are contained in the table nbrToDivide:
test1 <- sqldf(c(paste('UPDATE mergeTest SET Value = Value/', nbrToDivide[i],
                       ' WHERE `Year` = ', nbrToDivide$`Year`[i],
                       ' AND UPPER(Name) = \'', nbrToDivide$Name[i], '\'', sep=""),
                 'SELECT * from mergeTest'))
The problem is that when the value of UPPER(Name) contains an apostrophe, sqldf interprets it and returns an error.
I tried to use gsub with grepl, but it adds two backslashes to my names, so I don't know if there is a way to deal with this or whether I should just remove the apostrophes from my two dataframes.
Double the single quote. Here is an example:
> sqldf("select 'O''Brian' Name")
Name
1 O'Brian
Replacing one single quote with two single quotes will fix this as in :
"SELECT * FROM TableName WHERE FieldName = 'QueryString''s Value'"
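The same doubling rule can be applied programmatically before interpolating a value into the query string; a Python sketch (the helper and table names are illustrative, and a parameterized query is generally safer when the driver supports it):

```python
def sql_quote(value: str) -> str:
    # Double every embedded single quote, then wrap the whole value
    # in single quotes, as SQL string literals require.
    return "'" + value.replace("'", "''") + "'"

name = "O'Brian"
query = "SELECT * FROM mergeTest WHERE UPPER(Name) = " + sql_quote(name.upper())
print(query)  # SELECT * FROM mergeTest WHERE UPPER(Name) = 'O''BRIAN'
```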