How to escape double quotes (") while reading data in Pentaho

I have a CSV source file with a comma (,) delimiter and values enclosed in double quotes ("), and I am using the Text file input step to read the data in PDI 8.3. On the Content tab I have set Separator to , and Enclosure to ".
However, one field contains double quotes inside the enclosed value itself. See the example below:
"abc","cde",
"abc" - 1st col
"cde" - 2nd col
"ef"A"gh" - 3rd col
"ijk" - 4th col and so on..
The problem is in the 3rd col: the output reads "ef" as the 3rd col and passes the remaining text on to the subsequent columns. I hope that clarifies the issue; I only want to escape the " inside the values.
I have tried " in the Escape option, but it does not work. Can someone please suggest how to handle this?
Thanks!

You can just leave the Enclosure attribute empty. That way the string will only be divided into columns by the Delimiter.
See CSV File Input Doc and Text File Input Doc
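To illustrate what reading without an enclosure amounts to, here is a minimal Python sketch (not PDI code). It splits on the delimiter only and then trims one pair of outer quotes from each field, which is roughly what a follow-up string step would have to do, since the quotes stay in the values when Enclosure is empty. The function name and the sample row are assumptions built from the columns listed in the question.

```python
def split_without_enclosure(line, delimiter=","):
    """Split on the delimiter only, then trim one pair of outer quotes per field."""
    fields = line.rstrip("\r\n").split(delimiter)
    cleaned = []
    for field in fields:
        # Remove only the outermost pair of double quotes, if present,
        # so quotes inside the value (as in the 3rd column) are kept.
        if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
            field = field[1:-1]
        cleaned.append(field)
    return cleaned


# Hypothetical full row assembled from the columns in the question.
row = split_without_enclosure('"abc","cde","ef"A"gh","ijk"')
print(row)
```

Note that this approach only holds up if no value contains the delimiter itself, because nothing protects a comma inside a value once the enclosure is ignored.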

Related

Using Find and Replace function together in VBA

I have a problem with a .CSV file.
Some of the values are prices with a comma to separate units and decimals, all the other fields are separated with a comma too.
So, as expected, it is impossible to convert my csv file like this. (If there is a way, please tell me)
Therefore, I am trying to write a VBA macro that will replace the comma with a dot.
More specifically, I need to replace the 9th occurrence of "," with a "." IF AND ONLY IF the character next to the comma meets a specific condition. That is why I need a macro to do so.
In Excel, I was using the following formula to find the position of my comma:
=FIND(CHAR(160);SUBSTITUTE(A2;",";CHAR(160);9))
This formula gives me the position of the 9th comma, which is exactly what I need. I would like to know how to code this in VBA.
Thanks in advance !
Alex
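The logic described above (locate the 9th comma, then replace it only if the next character passes a test) can be sketched in Python as follows. This is an illustration of the approach rather than a VBA solution. The function name is hypothetical, and because the question does not state the actual condition, the check for a digit is purely an assumption.

```python
def replace_nth_comma(line, n=9, condition=str.isdigit):
    """Replace the n-th comma with a dot if the following character passes `condition`."""
    pos = -1
    for _ in range(n):
        # Search for the next comma after the previous match.
        pos = line.find(",", pos + 1)
        if pos == -1:
            return line  # fewer than n commas: leave the line unchanged
    next_char = line[pos + 1:pos + 2]  # empty string if the comma is the last character
    if next_char and condition(next_char):
        return line[:pos] + "." + line[pos + 1:]
    return line


print(replace_nth_comma("a,b,c,d,e,f,g,h,12,50,z"))
```

The same loop translates fairly directly to VBA by calling InStr repeatedly with a moving start position.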

Pentaho spoon + redoing field enclosures in output file

I'm new to Pentaho 8.3 CE (Spoon) and am trying to add an extra column to a CSV file by concatenating 3 other text fields together. I have tried 2 options: the Calculator step and the built-in 'Concat fields' step.
The issue I'm facing is that some rows are enclosed by " " while others aren't... e.g.
Field A = "One thing, another thing"
Field B = Yet another thing
Field C = Final thing
Ideally, I want,
New field = "One thing, another thing Yet another thing Final thing",
I find I can't get the final " to enclose each line, so it looks like "One thing, another... Final thing
How do I get Pentaho to add that final " on? I've set to force the enclosure on.
First strip the double quotes with a String operations step or a Replace in String step (the latter allows regexp search and replace).
Then use a Concat Fields step to join them all together, comma separated.
Finally, either prepend & append double quotes, or when writing out with e.g. a text file output, add the enclosure character.
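The three steps above (strip the quotes, join, re-enclose) can be sketched in Python to show the intended data flow. This is not PDI code, and the function name is hypothetical. The separator defaults to a space to match the output the question asks for; pass "," if you want the comma-separated variant.

```python
def concat_enclosed(fields, separator=" ", enclosure='"'):
    """Strip existing enclosures, join the fields, then wrap the result once."""
    # Step 1: remove surrounding whitespace and any enclosure characters at the ends.
    stripped = [field.strip().strip(enclosure) for field in fields]
    # Steps 2 and 3: join the bare values and add a single pair of enclosures.
    return enclosure + separator.join(stripped) + enclosure


new_field = concat_enclosed([
    '"One thing, another thing"',
    "Yet another thing",
    "Final thing",
])
print(new_field)
```

Stripping first is what avoids the unbalanced quote: the result never inherits a stray opening " from Field A.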

Using single, double and triple quotes in SQL for text qualifiers

I am processing some CSV data from a client and one of the headers is 'booktitle'. The values of 'booktitle' are text qualified with double quotes, and there are quotes in some of the titles, such as:
"How to draw the "Marvel" way"
I asked the client to escape the quotes in quotes with double quotes, and they sent me back this
'"How to draw the """Marvel""" way "'
So single, double, then triple quote. My question is will this work? I have not seen it done this way before for escaping text qualifiers.
How are you saving the data?
If you are saving it directly from SQL, double quotes are not a problem, as SQL uses single quotes for string literals.
If you are using a program, use SqlParameters. They will take care of the quoting for you.
Oracle Setup:
CREATE TABLE BOOKS(
ID INT,
TITLE VARCHAR2(4000)
);
CSV:
ID,TITLE
1,"How to draw the ""Marvel"" Way"
2,"Test ""Someone's Data"""
(One double quote at start and end, two double quotes to escape a double quote in the middle of the string and single quotes don't need to be escaped.)
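As a quick check of this convention outside Oracle, Python's standard csv module reads and writes doubled quotes the same way by default. A minimal sketch using the sample data above (the variable names are only for illustration):

```python
import csv
import io

data = (
    'ID,TITLE\n'
    '1,"How to draw the ""Marvel"" Way"\n'
    '2,"Test ""Someone\'s Data"""\n'
)

# Reading: a doubled quote inside a quoted field becomes one literal quote.
rows = list(csv.reader(io.StringIO(data)))
for row in rows:
    print(row)

# Writing: the writer doubles embedded quotes again on the way out.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
round_trip = buffer.getvalue()
print(round_trip)
```

If a file survives this round trip unchanged, it follows the usual doubled-quote convention.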
Import via Oracle SQL Developer
In the connections pane, right click the table name and select "Import Data..."
Browse to and select the CSV file. The default values will be sufficient, so click "Next >".
Set the Import Method to "Insert" and deselect "Send Create Script to SQL Worksheet" then click "Next >" three times.
Click "Finish" and the data will be imported.

SQL Server Procedure Output CR and LF replaced with Spaces on Copy and Paste

I have a curious problem for which I can find nothing on the net and hope someone here can provide some insight:
In a stored procedure, I am traversing a table using a cursor and collating all the values of a single text field (varchar) into a single string variable. Each row I am separating with a Carriage Return / Line Feed (characters 13 and 10).
If I print the variable to the screen, the CR and LF characters are clearly there and in the output in the message window are showing as I would expect by formatting the string into separate lines.
If I check the ASCII values in the string variable at the appropriate positions, the correct ASCII values are showing (13 and 10 respectively)
If I insert the string into a temporary table, I can also see that the CR and LF characters have been retained (by again inspecting the ASCII values in code).
BUT if I 'select' the variable so it appears in a grid and then copy the output and paste into an editor (notepad, word, SQL Query Window etc) the CR and LF have been replaced with spaces.
I don't think this has anything to do with normal Copy/Paste functionality as copying from any other environment retains all the characters. Is there something peculiar to the copy facility when copying a cell in SQL Query Analyser grid output?
While this isn't a show-stopper for me, it does mean I have to jump through some hoops I feel I shouldn't have to - but also, I'm just curious.
Cheers, J
Assuming you are using SQL Server Management Studio, use "Results to Text" (Ctrl+T). That should maintain the line breaks.
If I run this in management studio then copy and paste the resulting cell into notepad it appears correctly:
select 'line1' + char(13) + char(10) + 'line2'
alternatively you could try this:
select 'line1
line2'
in case it's a Unicode problem you could try using the Unicode CR/LF equivalent:
select N'line1' + nchar(0x000D) + nchar(0x000A) + N'line2'

VBA: How to convert word ASCII A1 to web quote when read from doc

e.g.
In doc is:
I am from "Moon"
When read via VBA, the " comes back as ASCII A1. I use Replace to change Chr(161) to a web-displayed quote, but it has no effect.
VBA Code:
Value = .Rows(i).Cells(2).Range.Text
Value = Replace(Value, Chr(161), "&quot;")
Anyone know how?
Chr(161) is ¡, not a quote mark. If you want to match a Windows-1252 curly quote, you want Chr(147) and Chr(148).
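The code points mentioned here can be checked with a short Python sketch that decodes the same byte values as Windows-1252. This is not VBA, and the names are hypothetical. Using &quot; as the replacement is an assumption about what "web-displayed quote" means in the question.

```python
# Decode the byte values discussed above using the Windows-1252 code page.
CURLY_OPEN = bytes([147]).decode("cp1252")            # left double quotation mark
CURLY_CLOSE = bytes([148]).decode("cp1252")           # right double quotation mark
INVERTED_EXCLAMATION = bytes([161]).decode("cp1252")  # what byte 161 (0xA1) really is


def to_web_quotes(text, replacement="&quot;"):
    """Replace both curly double quotes with an HTML entity (assumed target)."""
    return text.replace(CURLY_OPEN, replacement).replace(CURLY_CLOSE, replacement)


sample = "I am from \u201cMoon\u201d"
print(to_web_quotes(sample))
```

In VBA the equivalent would be two Replace calls, one for Chr(147) and one for Chr(148).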