mule3 sftp:input-endpoint encoding - activemq

I'm wondering, is it possible to force Mule to "convert" the encoding of the file it is reading? Let's say the file is put on some SFTP area and is later read by an:
<sftp:inbound-endpoint connector-ref= .... encoding="Cp1252">
This does not really work...
If the file on the SFTP area is UTF-8 without BOM ("ANSI as UTF-8"), is it possible to convert it to plain old ANSI encoding when reading the file?
Thanks in advance.

Mule won't do this conversion for you.
The payload of a message received by an sftp:inbound-endpoint is a java.io.InputStream, so you can create a custom transformer that reads this stream, performs the encoding conversion, and outputs either another input stream or a byte array.
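A Mule transformer itself would be written in Java, but the conversion it has to perform is simple to sketch. Here it is in Python for illustration; the function name and the replace-on-error policy are my own assumptions, not anything Mule provides:

```python
import io

def convert_utf8_to_cp1252(stream):
    """Read a UTF-8 byte stream and return the same text as Cp1252 bytes.

    Characters with no Cp1252 mapping are replaced with '?' so the
    conversion never fails (an assumption; you may prefer 'strict').
    """
    text = stream.read().decode("utf-8")
    return text.encode("cp1252", errors="replace")

# Example: 'é' is two bytes in UTF-8 but a single byte (0xE9) in Cp1252.
payload = io.BytesIO("héllo".encode("utf-8"))
print(convert_utf8_to_cp1252(payload))  # b'h\xe9llo'
```

In a real transformer you would do the equivalent with Java's InputStreamReader/Charset classes and return the converted bytes as the new payload.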

Related

Reading .Rda file in Pandas throws an LibrdataError error

I have data stored in .Rda format that I want to load into a pandas data frame. I am using the pyreadr library to do it. However, it throws an error:
LibrdataError: Unable to convert string to the requested encoding (invalid byte sequence)
Your data is in an encoding different from UTF-8, and pyreadr supports only UTF-8 data. Unfortunately, nothing can be done to fix this at the moment. The limitation is explained in the README and is also tracked in this issue.

Fix Unicode Decode Error Without Specifying Encoding='UTF-8'

I am getting the following error:
'ascii' codec can't decode byte 0xf4 in position 560: ordinal not in range(128)
I find this very weird given that my .csv file doesn't have special characters. Perhaps it has special characters that specify header rows and whatnot, I don't know.
But the main problem is that I don't actually have access to the source code that reads in the file, so I cannot simply add the keyword argument encoding='UTF-8'. I need to figure out which encoding is compatible with codecs.ascii_decode(...). I DO have access to the .csv file that I'm trying to read, and I can adjust its encoding, but not the source file that reads it.
I have already tried exporting my .csv file into Western (ASCII) and Unicode (UTF-8) formats, but neither of those worked.
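Since the error names an exact byte and position, one quick sanity check is to scan the raw file and see whether it really is ASCII-clean. A minimal sketch (the helper name is illustrative):

```python
def find_non_ascii(data: bytes):
    """Return (offset, byte) pairs for every non-ASCII byte in data."""
    return [(i, hex(b)) for i, b in enumerate(data) if b >= 0x80]

# Usage with a file:
# with open("data.csv", "rb") as f:
#     print(find_non_ascii(f.read()))
# An entry like (560, '0xf4') would match the error message above.
print(find_non_ascii(b"abc\xf4def"))  # [(3, '0xf4')]
```

If the list is empty, the file is pure ASCII and the decode error must be coming from somewhere else.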
Fixed. It had nothing to do with unicode shenanigans; my script was writing a parquet file when my CloudFormation template was expecting a csv file. Thanks for the help.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 990: invalid continuation byte

I am comparing two csv files to each other to produce a final file with the gathered differences, and it's giving me this error message. I have re-saved all the files as UTF-8 encoded csv and tried running again, but it does not work. Can someone help me?
The problem is that your file is not in UTF-8 format. Many tools will refuse to handle data that is claimed to be UTF-8, but isn’t. I’d check first if that file is actually UTF-8 or is stored in some different encoding.
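Checking whether a file actually is UTF-8 is easy to do directly: try to decode the raw bytes and see where decoding fails. A small sketch (the helper name is mine):

```python
def utf8_check(data: bytes):
    """Return None if data is valid UTF-8, else (offset, offending byte)."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as e:
        return e.start, hex(data[e.start])

# with open("file1.csv", "rb") as f:
#     print(utf8_check(f.read()))
print(utf8_check(b"caf\xe9 latte"))  # (3, '0xe9') -- not valid UTF-8
```

If this reports an offset, the file was saved in some other encoding (0xf1 at that position would be consistent with a Latin-1/Cp1252 'ñ', for example).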

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 3: invalid continuation byte

I'm trying to load a csv file using pd.read_csv but I get the following unicode error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 3: invalid continuation byte
Unfortunately, CSV files have no built-in method of signalling character encoding.
read_csv defaults to guessing that the bytes in the CSV file represent text encoded in the UTF-8 encoding. This results in UnicodeDecodeError if the file is using some other encoding that results in bytes that don't happen to be a valid UTF-8 sequence. (If they by luck did also happen to be valid UTF-8, you wouldn't get the error, but you'd still get wrong input for non-ASCII characters, which would be worse really.)
It's up to you to specify what encoding is in play, which requires some knowledge (or guessing) of where it came from. For example if it came from MS Excel on a western install of Windows, it would probably be Windows code page 1252 and you could read it with:
pd.read_csv('../filename.csv', encoding='cp1252')
I got the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 51: invalid continuation byte
This was because I had made changes to the file, which changed its encoding. You could try changing the file's encoding back to UTF-8, either in code or with the Notepadqq editor on Ubuntu, which provides an option to change a file's encoding. If the problem remains, try undoing all the changes made to the file.
Hope this helps.
I had this same issue recently. This is what I did: copy the code below into a new .py file, run it, and save.
import pandas as pd
data = pd.read_csv(filename, encoding= 'unicode_escape')
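A caution on this approach: encoding='unicode_escape' suppresses the error, but it reinterprets the bytes as Python string-escape sequences and usually mangles any non-ASCII text. A safer approach is to try a short list of likely encodings first; this sketch works on the raw bytes (the candidate list is an assumption, adjust it to where your data came from):

```python
def sniff_encoding(raw: bytes, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes raw without error.

    'latin-1' is the final fallback: it maps all 256 byte values, so it
    never raises, but it can silently misread characters.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "latin-1"

# with open(filename, "rb") as f:
#     enc = sniff_encoding(f.read())
# data = pd.read_csv(filename, encoding=enc)
```

This at least tells you which plausible encoding the file survives, instead of silently accepting garbage.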

How can I add and remove bytes on from the start of a file?

I'm trying to open an existing file and save some bytes at the start of it, to read them back later.
How can I do that? The "&" operator isn't working for this type of data.
I'm using Encoding.UTF8.GetBytes("text") to convert the info to bytes before adding them.
Help please.
You cannot add to or remove from the beginning of a file. It just doesn’t work. Instead, you need to read the whole file, and then write a new file with the modified data. (You can, however, replace individual bytes or chunks of bytes in a file without needing to touch the whole file.)
Secondly,
I'm using Encoding.UTF8.GetBytes("text") to convert info to bytes and then add them.
You’re doing something wrong here. Apparently you’ve read text data from the file and are now trying to convert it to bytes. This is the wrong way to do it. Do not read text from the file; read the bytes directly (e.g. via My.Computer.FileSystem.ReadAllBytes). Raw byte data and text (i.e. String) are two fundamentally different concepts; do not confuse them, and do not convert needlessly back and forth.
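The read-everything-then-rewrite approach described above can be sketched as follows (in Python rather than VB.NET, purely for illustration; the function names are mine):

```python
def prepend_bytes(path, header: bytes):
    """Rewrite the file with `header` placed before the old content."""
    with open(path, "rb") as f:
        original = f.read()          # whole file as raw bytes, no decoding
    with open(path, "wb") as f:
        f.write(header + original)   # header first, old content after

def read_header(path, size: int):
    """Read back the first `size` bytes of the file."""
    with open(path, "rb") as f:
        return f.read(size)
```

The VB.NET equivalent would use My.Computer.FileSystem.ReadAllBytes / WriteAllBytes and array concatenation; the point is the same: no text decoding anywhere, only raw bytes.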