Undesirable Double Quotes When Exporting a Tab Delimited File - sql

When I export my SQL results into a tab delimited file, double quotes surround a handful of my records. I don't know why this is. I am assuming it is because some of the record names have a special character breaking something?
A simple solution would be to go in and do a find and replace on double quotes, but that seems a bit improper.

I encountered this with fields that contained commas, even when loading a tab delimited file. A field qualifier and "Always Qualify Fields" prevented the fields with commas being quoted in the DB.
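
The selective quoting described in the question can be reproduced in a small sketch. It uses Python's csv module as a stand-in for the export tool, which is an assumption since the question does not name one, and the sample rows are invented. With minimal quoting, the writer wraps a field only when it contains a character that would otherwise be ambiguous, such as the quote character itself.

```python
import csv
import io

def export_tsv(rows):
    """Write rows as tab-delimited text, quoting only where needed."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="\t",
        quoting=csv.QUOTE_MINIMAL,  # quote a field only if it needs it
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sample data: only the second row contains a double quote.
sample_rows = [["1", "plain value"], ["2", 'say "hello"']]
exported = export_tsv(sample_rows)
exported_lines = exported.splitlines()

for line in exported_lines:
    print(repr(line))
```

Only the row whose value contains a double quote comes out wrapped, which matches the "handful of records" symptom. Reading the file back with a quote-aware parser is usually safer than a blanket find and replace on double quotes.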

Related

How to discard (trim) double inverted quotes in CSV file on data prep?

I'm dealing with a CSV file that contains double quotes (since the data has commas in it). Google Sheets does not show the double quotes, so I'm able to split the column by a delimiter (a space in this case) and retrieve its first word. The initial column values can be seen in the picture below.
However, when I upload the CSV file to Data Prep, the column's double quotes are treated as part of the value, which causes extra work (if it is possible at all) in handling the resulting split data.
Is there a way in Data Prep to have the data displayed as it is in Google Sheets?
You can "ignore" these double quotes and, when running a job, untick the "include quotes" box; the original quotes will then remain.
Not perfect, but could be a working workaround.
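
For comparison, here is a rough sketch of what a quote-aware reader does with such a file, which is presumably the behavior Google Sheets shows. It uses Python's csv module rather than Data Prep, and the column name and sample values are hypothetical.

```python
import csv
import io

# Hypothetical file contents: the first data row is quoted because it has a comma.
raw = 'id,description\n1,"Acme Corp, Inc. widgets"\n2,plain text here\n'

def first_words(csv_text, column):
    """Parse with quote handling, then take the first space-separated word."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].split(" ")[0] for row in reader]

def naive_first_words(csv_text):
    """Split on commas without honoring quotes, for contrast."""
    data_lines = csv_text.splitlines()[1:]
    return [line.split(",")[1].split(" ")[0] for line in data_lines]

print(first_words(raw, "description"))  # quotes are consumed by the parser
print(naive_first_words(raw))           # the quote leaks into the value
```

The quote-aware version yields clean first words, while the naive split leaves a leading double quote on the first value.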

SQL Developer interprets strings incorrectly from google sheets

Version 11.2g
I have a lot of strings to copy over from google docs that need to be inserted into rows in SQL Developer.
How come if I copy a string from google docs into Oracle SQL Developer, then SQL Developer will change the single quotation marks into 2 different characters with different Unicode values? Oracle will then proceed to say that the ASCII symbols are out of range. Seems like a quirk.
I need it to be like the second row in the image (without manually changing every single quotation mark), but Oracle interprets it as the first row from google docs.
So you don't have to manually change every quote, you can do a find-replace on:
opening single quote ‘ (Windows-1252 code 145);
closing single quote ’ (Windows-1252 code 146);
left single quotation mark ‘ (Unicode 8216);
right single quotation mark ’ (Unicode 8217); or
single high-reversed quotation mark ‛ (Unicode 8219).
Changing all to a straight single quote ' (ASCII 39).
You could even do a replacement using the regular expression [‘’‘’‛].
Then verify that the document you are copying from actually uses straight single quotes, or that the copied value is correct. Alternatively, disable smart quotes in the Google document (thanks to @Alex Poole's comment).
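
If the strings can be cleaned before pasting, the find-and-replace above might be scripted along these lines. This is a minimal Python sketch; the function name is made up, and the two C1 code points are included only on the assumption that Windows-1252 bytes may have been mis-decoded somewhere along the way.

```python
import re

# Curly single-quote variants listed above, written as escapes for clarity:
# U+2018 left, U+2019 right, U+201B high-reversed, plus U+0091/U+0092, which
# appear when Windows-1252 bytes 145/146 are decoded as Latin-1 (assumption).
CURLY_SINGLE_QUOTES = re.compile("[\u2018\u2019\u201b\u0091\u0092]")

def straighten_quotes(text):
    """Replace every curly single-quote variant with a straight apostrophe."""
    return CURLY_SINGLE_QUOTES.sub("'", text)

print(straighten_quotes("\u2018it\u2019s fine\u2019"))
```

A straight quote inside a SQL string literal still has to be escaped for the database, so this only addresses the out-of-range characters.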

SSIS - Text Qualifier Purpose

What is the purpose of adding a text qualifier to a SSIS flat text file output?
I'm pulling data out of a SQL database that has quotes/commas/pipes/and many other common delimiters in the data.
Extreme example of a data point in a column:
"Johnson"|Smith,Jones
I set up the export as a comma delimited, with a double quote " text qualifier. I assumed it would export the data like so, and it did:
,""Johnson"|Smith,Jones",
Now I'm testing re-importing the data as comma delimited, with a double quote text qualifier. I got errors saying SSIS couldn't find the delimiter. I thought it would recognize the combination of comma and double quote, essentially as a more complex delimiter.
If adding a text qualifier to the output doesn't help with the problem of having the characters in the actual data, what does it do?
Assuming the person receiving the data might use a tool like Excel to process the data, which doesn't seem to be able to handle a complex multi character delimiter like |", is the best way to handle this by removing the most common delimiter from my data, and using that as the delimiter? Probably pipe in my case, instead of comma.
Text qualifier is used in the event that delimiters are contained within the row cell. Typically, the text qualifier is a double quote. In the event that the cell contains a delimiter and a text qualifier is not used, then the data that occurs after the delimiter will spill into the next column. From there, the data row can potentially blow up and none of the columns will line up afterwards. It can be a real mess.
Additionally, you will not see the text qualifier in applications, like Excel. However, if you open the file in Notepad++, then you will see the text qualifiers. There can be a lot of data (e.g., text qualifiers, new line characters, column delimiters, etc.) that is contained within a file but is not displayed in certain applications. This data typically is used to define the structure of the data as opposed to being the actual data.
For your problem, you will need to remove the double quotes from the source data or use a different text qualifier. You could use a single quote, but what if you have data like Jones's? The idea here is that the text qualifier should be unique in defining the data structure, which, as I understand it, means that you cannot have a text qualifier that is actually a part of the data (see note from Microsoft below - emphasis mine).
Per Microsoft:
Specify a text qualifier character. Each column can be configured to
recognize a text qualifier.
The use of a qualifier character to embed a qualifier character into a
qualified string is supported by the Flat File Connection Manager. The
double instance of a text qualifier is interpreted as a literal,
single instance of that string. For example, if the text qualifier is
a single quote and the input data is 'abc', 'def', 'g''hi', the output
data is abc, def, g'hi. However, an instance of a qualifier embedded
in a qualified string causes the Flat File Source to fail with the
error DTS_E_PRIMEOUTPUTFAILED.
References
Flat File Connection Manager official documentation
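
The doubling rule in the Microsoft note can be illustrated with a short sketch. It uses Python's csv module, not SSIS, so it only shows the convention itself; whether a given SSIS export doubles embedded qualifiers is not something this demonstrates. The surrounding "a" and "b" columns are placeholders.

```python
import csv
import io

# The extreme data point from the question.
value = '"Johnson"|Smith,Jones'

# Write it with a double-quote qualifier, doubling any embedded qualifier.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quotechar='"',
                    doublequote=True, lineterminator="\n")
writer.writerow(["a", value, "b"])
line = buf.getvalue()
print(line, end="")

# Read it back: doubled qualifiers collapse to a single literal quote.
parsed = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))
print(parsed)
```

Compare the written line with the export shown in the question: there the embedded quotes were not doubled, so a reader sees the qualified string end right after the opening quotes and cannot find the next delimiter where it expects one.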

Is it possible to change the column and record delimiter in sqoop2?

Is it possible to change the default column delimiter (comma) to a different character in Sqoop2? I read in some mail archives that it is not supported yet.
If not, how can we specify the enclosed-by and escaped-by characters so that Sqoop properly extracts columns with values containing commas and quotes? Does this work by default, or does it need to be turned on by setting an option?
As I suspected, changing the delimiter is not yet supported in Sqoop2. Refer to the link below:
http://grokbase.com/t/cloudera/cdh-user/137q954ffz/sqoop2-import-field-delimiter
By default, the column delimiter is comma
String fields should be enclosed within single quotes (this takes care of fields with comma in them)
If the field contains single quote in itself then escape it with backslash \
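
The three rules above can be sketched as a small encoder. This is an illustration in Python of the format as described here, not Sqoop2's own code; the function names are hypothetical, and escaping the backslash itself is an added assumption so that the output stays unambiguous.

```python
def encode_field(value):
    """Encode one value following the rules listed above."""
    if isinstance(value, str):
        # Assumption: escape backslashes first so they cannot be misread.
        escaped = value.replace("\\", "\\\\")
        # Rule from the answer: escape an embedded single quote with a backslash.
        escaped = escaped.replace("'", "\\'")
        # Rule from the answer: enclose string fields in single quotes.
        return "'" + escaped + "'"
    return str(value)

def encode_row(values):
    """Join encoded fields with the default comma delimiter."""
    return ",".join(encode_field(v) for v in values)

print(encode_row([42, "O'Brien, Pat"]))
```

The comma inside the name is harmless because it sits inside the single-quoted field, and the embedded quote is escaped rather than ending the field.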

Comma delimited flat file source

I have a text file that is split using commas
Simple enough to do in SSIS, but I have the following row in my source flat file:
Desc,Curr,Desc,ID,Quantity
05969A105 ,CU,BANCORP INC, THE DEL COMMON ,1,2126
There is a comma in my Desc column and I'm not sure how I can ignore that comma.
AFAIK, you can't do anything in SSIS (or any other app that I have ever used) to handle this, because it is simply bad data. If you need to persist with comma delimiters, then you will need to get the data provider to wrap the data in a text qualifier, e.g. double quotes. SSIS can be told what the text qualifier is and will strip these characters off the data automatically.
Of course this may raise the issue of 'but the text may need to contain a double-quote!', in which case you would be better off getting the delimiter changed to something else, such as a tab or pipe.
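
To make the problem concrete, here is a small Python sketch that counts the fields in the row from the question, with and without a text qualifier. It uses Python's csv module rather than SSIS, and which field the provider would wrap in quotes is an assumption made for illustration.

```python
import csv
import io

header = "Desc,Curr,Desc,ID,Quantity"
unquoted = "05969A105 ,CU,BANCORP INC, THE DEL COMMON ,1,2126"
# Assumed fix from the data provider: the description is wrapped in quotes.
quoted = '05969A105 ,CU,"BANCORP INC, THE DEL COMMON ",1,2126'

def parse_line(line):
    """Parse one comma-delimited line with a double-quote text qualifier."""
    return next(csv.reader(io.StringIO(line)))

print(len(parse_line(header)))    # number of columns expected
print(len(parse_line(unquoted)))  # one extra field: the comma split the description
print(parse_line(quoted))         # the qualifier keeps the description together
```

Without the qualifier, the parser has no way to tell the comma inside the description from a column boundary, so every later column shifts by one.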