How to solve parse issues when a CSV has field content escaped with double quotes - mule

The input is received from a Salesforce Bulk API query.
INPUT:
"RecordTypeId","Name","Description"
"AAA","Talent 2022 - Skills Renewal - ABC","DF - 14/03 - Monty affirmed that the ""mastercard approach"" would best fit in this situation. I will connect (abc, def, ghi) and the confirm booking tomorrow (15/03)"
SCRIPT:
%dw 2.0
output application/csv separator=",", ignoreEmptyLine=false, quoteValues=true, quoteHeader=true, lineSeparator="\r\n"
---
payload
OUTPUT:
"RecordTypeId","Name","Description"
"AAA","Talent 2022 - Skills Renewal - ABC","DF - 14/03 - Monty affirmed that the , def, ghi) and the confirm booking tomorrow (15/03)"
Expected OUTPUT:
The Description column has " and , in it, so some of the description content is getting lost and some is getting shifted into different columns. I need the entire description value in one column.

The escape character has to be set to a double quote (") for DataWeave to recognize that "" is an escaped quote and not the end of a string. You cannot use replace() or any other string operation because they are executed after the input is parsed.
You need to configure the reader properties in the source of that payload, for example in the SFTP or HTTP listener, or whatever connector or operation reads the CSV. There you can add the outputMimeType attribute to set the input type and its properties. Note that because the flow is in an XML file you need to be mindful of XML escaping to use double quotes, and you also need to escape the double quote as DataWeave expects it, with a backslash (\).
Example:
outputMimeType="application/csv; escape=&quot;\&quot;&quot;"
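A minimal sketch of where the attribute goes, assuming the CSV is read with the File connector's read operation (substitute whatever connector or operation actually produces the payload):

<!-- the attribute value, XML-unescaped, is: application/csv; escape="\"" -->
<file:read path="input.csv"
    outputMimeType="application/csv; escape=&quot;\&quot;&quot;" />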

It looks like your payload is using " as the escape character. By default DataWeave expects \ as the escape character for CSV, so you will need to specify the escape character explicitly while reading your input, after which DataWeave should be able to read the complete Description as a single value.
For example, the DataWeave below shows how you can use the input directive to read your CSV correctly. I do not know what exactly your expected output is, so I am just giving an example that writes the value of Description as text:
%dw 2.0
input payload application/csv escape='"'
output text
---
payload[0].Description
The output of this will be:
DF - 14/03 - Monty affirmed that the "mastercard approach" would best fit in this situation. I will connect (abc, def, ghi) and the confirm booking tomorrow (15/03)
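If the goal is to write the CSV back out with the quoting intact (as in the original script), the same reader property can be combined with the writer properties. A minimal sketch, assuming the same input:
%dw 2.0
// read "" as an escaped quote instead of expecting \-escapes
input payload application/csv escape='"'
// write the quotes back out the same way
output application/csv quoteValues=true, quoteHeader=true, escape='"', lineSeparator="\r\n"
---
payload
With the reader property set, the doubled quotes survive the round trip instead of being split across columns.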

Related

How to take in a sample DAT file and break each record by its header in Mule?

I am trying to take a sample DAT file into a Mule application.
SEC.DAT
D1030325 ADFSA 12321.00 XXXX
A1354610 AEWTF 94332.00 AAAA
V1030325 ADFSA 12321.00 XXXX
I am fairly new to the platform and have been somewhat lost on how to structure the flow, but my goal is to break each record out by its beginning value.
Example:
Where D, A, and V are the conditions.
Expected outputs:
SEC1.DAT
D1030325 ADFSA 12321.00 XXXX
SEC2.DAT
A1354610 AEWTF 94332.00 AAAA
SEC3.DAT
V1030325 ADFSA 12321.00 XXXX
This is about a Mule application processing a file. The rest of the platform has no impact on this particular question.
Assuming it is a fixed-length file, but without using DataWeave's fixed-length support, you can treat the input file as a single string (i.e. format text/plain).
First you can separate it into records using a DataWeave transform that splits on the end-of-line character:
%dw 2.0
output application/java
---
payload splitBy "\n"
Following that, have a <foreach> to loop over each record. Inside the body of the <foreach>, have a <choice> router to select which output file to write to based on the first character of the record. Example of a condition: #[payload[0] == "A"]. Then inside that branch of the <choice>, either append to a list of records or write directly to the file (with append mode).
Note that writing directly will not work as expected if you add any kind of concurrency, because overlapping writes may corrupt the output.
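Alternatively, the grouping can be done entirely in DataWeave before any files are written. A minimal sketch, assuming every record starts with its type character:
%dw 2.0
output application/java
---
// split into records, then group them by their first character,
// yielding { "D": [...], "A": [...], "V": [...] }
(payload splitBy "\n") groupBy (record) -> record[0]
Each entry of the resulting object can then be written to its own file (SEC1.DAT, SEC2.DAT, and so on), one write per group.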

" replaced by ""

The Redshift UNLOAD command is replacing " with "".
Example:
UNLOAD($$ select '"Jane"' as name $$)
TO 's3://s3-bucket/test_'
iam_role 'arn:aws:iam::xxxxxx:role/xxxxxx'
HEADER
CSV
DELIMITER ','
ALLOWOVERWRITE
The output looks like: ""Jane""
If I run the same command with select 'Jane' as name, the output shows no quotes at all, like Jane. But I need the output to be "Jane".
You are asking for the unloaded file to be in CSV format, and the CSV format says that if you want a double quote in your data you need to escape it with another double quote. See RFC 4180: https://datatracker.ietf.org/doc/html/rfc4180
So Redshift is doing exactly as you requested. If you just want a comma-delimited file then you don't want to use "CSV", as this option adds all the characters necessary to make the file fully compliant with the CSV specification.
This choice will come down to which tool or tools are reading the file and whether they expect an RFC-compliant CSV or just a simple file where fields are separated by commas.
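To illustrate, any RFC-compliant reader recovers the quoted value from the doubled quotes. A quick sketch using DataWeave as the reader, assuming the unloaded file is the payload:
%dw 2.0
// treat "" as an escaped quote, per RFC 4180
input payload application/csv escape='"'
output text
---
payload[0].name
The result is "Jane", quotes included.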
This is a gripe of mine: tools that say they read CSV but don't follow the spec. If you say CSV, then follow the format. Or call what you read something different, like CDV - comma-delimited values.

Azure Data Factory: Reading doubles with a comma as decimal separator instead of a dot

I'm currently trying to read from CSV files (separated with a semicolon ';') with decimal numbers formatted with a comma (,) as the decimal separator instead of a dot (.).
i.e. the number 12356.12 is stored as 12356,12.
In the source's projection, what would be the correct format to read the value correctly?
The format should be in Java DecimalFormat.
If your CSV file's columnDelimiter were a comma (','), your first concern would be how to avoid your number data being split into different columns. Since your number data is stored as 12356,12, my suggestions are as below:
1. Change the columnDelimiter to | or another special character.
2. Set an escape character.
In addition, 12356,12 can't be identified as a Decimal format by ADF automatically, and there is no mechanism to turn ',' into '.'. So I think you need to transfer the data temporarily as a string, then convert it into a Decimal in your destination with Java code.
The true answer is in the comments: in the copy job the culture can be defined, which influences the decimal separator. Go to "Mapping" > "Type conversion settings" > "Culture" and choose en-US, de-DE, or whatever works for you. Be aware that this will also influence other types, like dates.

Extract a substring using XPath where there might be no trailing delimiter in the field

I'm trying to parse an XML file where the users (in their infinite wisdom) type a key value into a free-form field, <Description>. The values are normally typed in with returns (BR's?) between them. For instance:
<Description>
% Increase: 27%
Completion Date: 10-Aug-2015
</Description>
I need to look for and extract the date following the string "Completion Date:". Looking around here on SO I found something similar and adapted it to:
compdate = deal.SelectSingleNode("./Terms/Description[substring-before(substring-after(.,'Completion Date:'),'/')]")
The problem is that in the original question there was a trailing character that could be used to delimit the text, a /. In my case, there might be a BR of some sort, or it might be the last (as in this case) or only item on the line and thus there's no delimiter.
So... suggestions on how to extract the date? I can do it on the VB side, but I'd like to remain in the XPath world for code clarity - unless of course the resulting XPath is unreadable.
If an XPath 2.0 solution is acceptable, try
./Terms/Description/tokenize(substring-after(.,'Completion Date: '), '\n')[1]
If not, and the date format is always DD-Mon-YYYY (e.g. 01-Dec-2018), try
./Terms/Description/substring(substring-after(.,'Completion Date: '), 1, 11)

Escape character in DataWeave string

We are getting a value from a DB that contains a backslash (\). After going through DataWeave, we get two backslashes in the output.
How can we have only one backslash in the end? Can we use the replace function somehow? I tried and could not make it work.
I believe the reason you see two backslashes is that the backslash is a reserved character in JSON (see the JSON spec), so DataWeave automatically escapes it, which is necessary so that your DB value is not corrupted.
In my opinion the double backslash is not a problem. You should get the right content upon consuming the JSON object.
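A quick sketch to illustrate, using a made-up value C:\temp standing in for the DB value:
%dw 2.0
output application/json
---
// one real backslash in the value (written as \\ in DataWeave source)
{ path: "C:\\temp" }
The serialized output is {"path": "C:\\temp"}, but any JSON consumer that parses it gets the single backslash back: C:\temp.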
You could try to set an escape character of your choice.
Eg: %output application/csv escape=" "
This should ideally replace the escape character (\) with a space.
Hope this helps.