Encoding type for Polish characters - Mule

I have a JSON string that contains Polish characters, for example:
"Reno Truck Lachowski & Łuczak - NAPRAWY CHŁODNI,IZOTERM,ZABUDÓW POJAZDÓW CIĘŻAROWYCH"
or
"RENO TRUCK Lachowski & Łuczak s.c. SERWIS POJAZDÓW UZYTKOWYCH"
I need to update this value in a database.
Can anyone tell me which encoding I need to set?
I tried UTF-8 and ISO-8859-1, but neither works.
When I set ISO-8859-1, the value comes out differently, as below:
"RENO TRUCK Lachowski & ?uczak s.c. SERWIS POJAZDÓW UZYTKOWYCH"
The character Ł doesn't get updated.
Can anyone help, please?

JSON values are expected to be encoded in UTF-8. The string you quoted seems to be encoded in something else. You are expected to know the encoding of your data; note that it may not be valid JSON if it is not UTF-8. Once you know the encoding, you can use DataWeave to convert it to what your database expects. Based on the JDBC URL, it seems that the database connection is expecting ISO-8859-1. Note that ISO-8859-1 has no code point for Ł, which is why that character comes out as ? in your output.
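As an illustration only, here is a small Python sketch (not Mule or DataWeave code) of how the sample value from the question behaves under different target encodings. Treating ISO-8859-2 as a candidate is an assumption; the database's real character set is not stated in the question.

```python
# Sketch: how the sample value behaves under different target encodings.
# Assumption: the JSON payload itself is valid UTF-8, as the answer expects.
name = "RENO TRUCK Lachowski & Łuczak s.c. SERWIS POJAZDÓW UZYTKOWYCH"

# UTF-8 can represent every character, so the round trip is lossless.
utf8_roundtrip = name.encode("utf-8").decode("utf-8")

# ISO-8859-1 (Latin-1) cannot represent Ł, so a lenient encoder
# substitutes "?" for it.
latin1_lossy = name.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

# A strict encoder reports the problem instead of silently replacing it.
try:
    name.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# ISO-8859-2 (Latin-2) covers Polish letters such as Ł and Ó
# (assumed candidate, see the note above).
latin2_roundtrip = name.encode("iso-8859-2").decode("iso-8859-2")
```

Here `latin1_lossy` reproduces the `?uczak` value shown in the question, while the UTF-8 and ISO-8859-2 round trips keep the Ł. Whichever encoding is chosen, it has to match what the database connection actually uses.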

Related

Fix corrupted characters (e.g. umlauts) in a string using ORACLE SQL and convert it to proper UTF-8

I currently have an ORACLE table which, in one column, contains obviously corrupted strings like the following: Pachtvertrag Ã¼ber eine GaststÃ¤tte.
At some point, a wrong encoding was probably used for the string. Is there a way of fixing the "wrong" encoding even when the string is already stored in this corrupted form?
I tried the following:
SELECT CONVERT('Pachtvertrag Ã¼ber eine GaststÃ¤tte', 'UTF8', 'US7ASCII') FROM DUAL;
But this leads to: Pachtvertrag ����ber eine Gastst����tte, while it should actually be Pachtvertrag über eine Gaststätte.
Another idea of mine was to somehow convert the string to bytes first (e.g. by using TO_SINGLE_BYTE), but this didn't lead to the desired result either.
The character set US7ASCII does not support special characters, and you must swap the character sets: CONVERT takes the destination character set first and the source character set second.
So the corrected statement should look like
CONVERT('Pachtvertrag Ã¼ber eine GaststÃ¤tte', 'WE8ISO8859P1', 'AL32UTF8')
Just a note: ISO-8859-1, ISO-8859-15 and Windows-CP1252 (WE8MSWIN1252) are very similar. See ISO 8859-15 vs. -1 vs. Windows-1252 vs. Unicode and pick the correct encoding.
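The same repair can be sketched outside the database. The Python snippet below assumes the corruption came from UTF-8 bytes being read as ISO-8859-1, which turns ü into Ã¼ and ä into Ã¤. The helper name `repair_mojibake` is made up for this illustration, and the `wrong_encoding` parameter can be set to "cp1252" or "iso-8859-15" if one of those similar encodings was the real culprit.

```python
# Sketch: undo "UTF-8 bytes read as Latin-1" mojibake.
# repair_mojibake is a hypothetical helper, not part of any library.
corrupted = "Pachtvertrag Ã¼ber eine GaststÃ¤tte"


def repair_mojibake(text: str, wrong_encoding: str = "iso-8859-1") -> str:
    """Re-encode with the wrongly assumed encoding, then decode as UTF-8.

    Returns the input unchanged if it does not fit that corruption pattern.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


repaired = repair_mojibake(corrupted)
# A string that is already correct is left alone, because its Latin-1
# bytes do not form valid UTF-8.
untouched = repair_mojibake("Pachtvertrag über eine Gaststätte")
```

Falling back to the original text keeps the helper safe to run over a column that mixes corrupted and already-correct rows.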

How to render a DICOM file's header unreadable

Kind of a strange question, but I'm doing some testing to handle errors when a DICOM file's tags can't be read.
Unfortunately I don't have a damaged DICOM file available.
Specifically, can anyone advise how to write an incorrectly encoded text tag or an invalid numeric data tag into the file, such that it can't be read by Python's pydicom package?
You could have a look at the dcmodify tool from the DCMTK. It can be used to insert, modify and delete attributes. I doubt that it is possible to specify invalid attribute values through the command line, but you could surely modify the source code to accomplish that. The one exception is length: you can definitely write attribute values that exceed the maximum length allowed by the Value Representation.
My approach would be to create a buffer of characters and write binary data to it, then pass it to the method that writes the value to the attribute.
Examples:
write UTF-8 byte sequences that do not form a valid Unicode character
write characters that are not covered by the character set specified by (0008,0005) - not sure whether pydicom would run into problems, but it would be wrong from the DICOM perspective
write non-numeric characters to attributes with Value Representation "Decimal String" or "Integer String"
write formats other than YYYYMMDD for VR "Date"
write formats other than HHMMSS.FFFFFF for VR "Time"
write characters other than ['0'-'9'], '.' for VR "Unique Identifier"
[edit]: DCMTK, dcmodify: http://dicom.offis.de/dcmtk.php.en
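The examples above can be sketched as a small table of deliberately invalid raw values, one per Value Representation. This Python snippet uses only the standard library and does not call pydicom or DCMTK. The names `INVALID_VALUES`, `VALIDATORS` and `looks_valid` are invented for this illustration, the regular expressions are simplified format checks rather than the full DICOM rules, and LO is used only as a placeholder for a text VR. Writing the bytes into an actual file is left out.

```python
import re

# Hypothetical table of deliberately invalid raw values, keyed by DICOM VR.
INVALID_VALUES = {
    "DS": b"12.5abc",      # non-numeric characters in a Decimal String
    "IS": b"forty-two",    # non-numeric characters in an Integer String
    "DA": b"2024-01-31",   # Date that is not YYYYMMDD
    "TM": b"12:30:00",     # Time that is not HHMMSS.FFFFFF
    "UI": b"1.2.840.ABC",  # characters other than 0-9 and "."
    "LO": b"\xc3\x28",     # byte pair that is not valid UTF-8
}

# Simplified format checks (an assumption, not the complete DICOM rules).
VALIDATORS = {
    "DS": re.compile(rb"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?"),
    "IS": re.compile(rb"[+-]?\d+"),
    "DA": re.compile(rb"\d{8}"),
    "TM": re.compile(rb"\d{2}(\d{2}(\d{2}(\.\d{1,6})?)?)?"),
    "UI": re.compile(rb"[0-9.]+"),
}


def looks_valid(vr: str, raw: bytes) -> bool:
    """Return True if raw passes the simplified check for the given VR."""
    if vr in VALIDATORS:
        return VALIDATORS[vr].fullmatch(raw.strip()) is not None
    # For text VRs, only check that the bytes decode as UTF-8
    # (assumes the Specific Character Set declares UTF-8).
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Every entry in the table should fail its check.
results = {vr: looks_valid(vr, raw) for vr, raw in INVALID_VALUES.items()}
```

Checking each test value against a validator first makes it easier to tell apart "the reader tolerated a bad value" from "the test value was accidentally valid".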

Illegal XML character on import to SQL (Mac Roman)

I have an XML file that declares its encoding as UTF-8. When I use OPENXML to import the data into SQL, I always get "XML parsing: line xxxxxx, character xx, illegal xml character".
Right now I can go to each line and replace the offending character with a legal one, and then the import goes through. Sometimes there may be more than 5 Mac Roman characters, and replacing them becomes tedious. I am currently using Notepad++, and there is probably a way to do this there.
Can anyone suggest whether anything can be done at the SQL level, or does the file have to be checked before it is run in SQL?
So far, most of the characters found are x95, x92, x96, xbc, xbd, xbo.
Thanks.
In your question, you did not specify whether the illegal characters you had to remove were Unicode or not, or whether the file was really expected to contain UTF-8 characters. Unlike in a single-byte encoding such as ISO-8859-1, some byte combinations are illegal in UTF-8, so if you declare the text file to be encoded in UTF-8, you might not be able to read it successfully to the end (this cannot happen with ISO-8859-1, where every byte maps to a character).
So it is possible that by removing <?xml version="1.0" encoding="UTF-8"?> you effectively switched the file from the previously declared UTF-8 to some non-Unicode encoding, which is why reading the data succeeded. You did not have many foreign characters like ľťčý in the file, did you? Normally, you must check what happened to those after the import. It might happen that your import passes without error, but the city name Čadca becomes äadca, and somebody will thank your company for rendering their address unreadable.
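One way to check a file before it reaches SQL is to scan it for bytes that are not valid UTF-8. The Python sketch below is only an illustration: `find_invalid_utf8` is a made-up helper, the sample bytes are invented, and Windows-1252 is just one candidate for the file's real encoding (Python also ships a "mac_roman" codec that can be tried the same way).

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, offending bytes) for every invalid UTF-8 sequence."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice we decoded.
            start, end = pos + exc.start, pos + exc.end
            bad.append((start, data[start:end]))
            pos = end
    return bad


# Invented sample: an element containing the stray bytes 0x95 and 0x96.
sample = b"<city>\x95 Main St \x96 Apt 5</city>"
offenders = find_invalid_utf8(sample)

# If the file is really Windows-1252 (an assumption), decoding it with
# that codec and re-encoding as UTF-8 gives bytes the parser can accept.
as_text = sample.decode("cp1252")
clean_utf8 = as_text.encode("utf-8")
```

After a conversion like this, it is still worth spot-checking a few non-ASCII values, since a wrong guess at the source encoding converts without error but produces the wrong characters.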

NSJSONSerialization parsing special characters

I am parsing some data using NSJSONSerialization. After parsing, I get strings like &auml; and &#339;, which I think has something to do with encoding. But NSJSONSerialization doesn't ask which encoding it should use; I guess it detects it by itself. So my question is, how can I get proper strings instead of these weird &auml; and &#339;?
NSJSONSerialization assumes the encoding is one of the Unicode encodings. Make sure the data you pass to it is in UTF-8 (or UTF-16). ä is C3 A4 in UTF-8 or 00 E4 in big-endian UTF-16.
Note that the default encoding for HTTP if none is specified is ISO-8859-1, so it may be that you are passing ISO-8859-1 data instead of UTF-8.
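The byte values mentioned above, and the effect of decoding UTF-8 data as ISO-8859-1, can be checked with a short Python sketch. It does not use NSJSONSerialization; the `payload` variable is a made-up stand-in for the response body.

```python
import json

# Hypothetical payload standing in for the HTTP response body.
payload = json.dumps({"name": "ä"}, ensure_ascii=False).encode("utf-8")

# ä is the byte pair C3 A4 in UTF-8 and the code unit 00E4 in UTF-16.
utf8_hex = "ä".encode("utf-8").hex()
utf16_hex = "ä".encode("utf-16-be").hex()

# Decoding the bytes as UTF-8 before parsing gives the intended string.
good = json.loads(payload.decode("utf-8"))["name"]

# Decoding the same bytes as ISO-8859-1 first produces two wrong
# characters in place of the single ä.
bad = json.loads(payload.decode("iso-8859-1"))["name"]
```

Here `good` is the single character ä, while `bad` is the two-character sequence Ã¤, which is the typical symptom of a UTF-8 versus ISO-8859-1 mix-up.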
In the options, try NSJSONReadingMutableLeaves; it should return NSMutableString leaves. For more, take a look at the docs.

Unicode escapes in Objective-C

I have a string "Artîsté". I use json_encode from PHP on it and I get "Art\u00eest\u00e9".
How do I convert that to an NSString? I have tried many things and none of them work; I always end up getting ArtÃ®stÃ©.
For example:
[NSString stringWithUTF8String:"Art\u00c3\u00aest\u00c3\u00a9"]; // ArtÃ®stÃ©
@"Art\u00c3\u00aest\u00c3\u00a9"; // ArtÃ®stÃ©
You can use CFStringCreateFromExternalRepresentation with the kCFStringEncodingNonLossyASCII encoding to parse the \uXXXX escape sequences. Check out my answer here:
Converting escaped UTF8 characters back to their original form
The problem is your input string:
"Art\u00c3\u00aest\u00c3\u00a9"
does in fact literally mean "ArtÃ®stÃ©". \u00c3 is 'Ã', \u00ae is '®', and \u00a9 is '©'.
Whatever is producing your input string is receiving UTF-8 input but expecting something else (e.g., cp1252, ISO-8859-1, or ISO-8859-15).
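The difference between the two escaped strings can be reproduced with Python's `json` module, used here purely as a neutral \uXXXX decoder rather than as Objective-C code. The repair step assumes the producer treated UTF-8 bytes as ISO-8859-1; with cp1252 or ISO-8859-15 the codec name would change accordingly.

```python
import json

# What PHP's json_encode produced, according to the question.
correct_json = '"Art\\u00eest\\u00e9"'
# The string that was actually passed to NSString in the attempts.
double_encoded_json = '"Art\\u00c3\\u00aest\\u00c3\\u00a9"'

correct = json.loads(correct_json)         # escapes for î and é
mangled = json.loads(double_encoded_json)  # escapes for Ã ® and Ã ©

# Each escaped code point in the mangled string is really one UTF-8 byte.
# Mapping the characters back to bytes via ISO-8859-1 and decoding those
# bytes as UTF-8 recovers the intended text (assumed corruption pattern).
repaired = mangled.encode("iso-8859-1").decode("utf-8")
```

The first string decodes directly to Artîsté. The second decodes to nine characters instead of seven and only matches the intended text after the repair step, which is why fixing the producer is preferable to repairing on the client.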