Deserialize and serialize, preserving Java-escaped emoticons - serialization

This is for handling Facebook webhooks.
An event string arrives like this:
{"object":"page","entry":[{"id":"222222222","time":1536713510549,"messaging":[{"sender":{"id":"1111111111"},"recipient":{"id":"355433484576638"},"timestamp":1536713509901,"message":{"mid":"VnOoUhb2FUTyfnkXtmKgqDCfJlgJPB_n1gj-8aC6ka4-Oo2GjMXS82vHH9ChydJrPX_5Zu3sJ6skCv8JToF1IA","seq":206765,"text":"Jeg m\u00e5 bare si at jeg elsker Obosbladet! St\u00e5 p\u00e5 videre! \ud83d\ude00\ud83d\ude2c\ud83d\ude01\ud83d\ude02\ud83d\ude03\ud83d\ude04"}}]}]}
This is deserialized using
Dim TestObj As RealTimeEvent = JsonConvert.DeserializeObject(Of RealTimeEvent)(eventStr)
At this point, if I view the TestObj message in the debugger, I see
"Jeg må bare si at jeg elsker Obosbladet! Stå på videre! 😀😬😁😂😃😄"
Note that the Swedish characters have been handled correctly, but the Java-escaped emoticons have not (\ud83d\ude00\ud83d\ude2c\ud83d\ude01\ud83d\ude02\ud83d\ude03\ud83d\ude04).
If I then try to serialize the object
JsonConvert.SerializeObject(TestObj)
I get
{""RawEvent"":"""",""object"":""page"",""entry"":[{""id"":""355433484576638"",""time"":""1536713510549"",""changes"":null,""messaging"":[{""optin"":null,""read"":null,""postback"":null,""sender"":{""id"":""975511412531391""},""recipient"":{""id"":""355433484576638""},""timestamp"":""1536713509901"",""message"":{""mid"":""VnOoUhb2FUTyfnkXtmKgqDCfJlgJPB_n1gj-8aC6ka4-Oo2GjMXS82vHH9ChydJrPX_5Zu3sJ6skCv8JToF1IA"",""seq"":""206765"",""text"":""Jeg må bare si at jeg elsker Obosbladet! Stå på videre! 😀😬😁😂😃😄"",""attachments"":null}}]}]}
The Swedish characters are converted, which is what I want, but I have no chance of handling the emoticons.
Is there any way I can preserve everything that is not understood by the Newtonsoft deserializing process but keep the conversion of Swedish and other characters?
---Edit-- Adding explanation of what I am trying to achieve---
I need to be able to access the original definition of the emoticons: "\ud83d\ude00\ud83d\ude2c\ud83d\ude01\ud83d\ude02\ud83d\ude03\ud83d\ude04"
I am integrating with another system that cannot handle emoticons at all. I have written a 'translator' which parses the message text looking for the Java-escaped data. I take the whole emoticon definition (all pairs) and reduce it until I find a matching definition.
Perhaps there is a way to tell the serializer not to convert any escaped values and to keep the message text 'raw'? (I have tried various JsonSerializerSettings but have not found any.)
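One possible angle, sketched here in Python rather than VB.NET: instead of stopping the deserializer from decoding the escapes, rebuild them afterwards. Every character above U+FFFF corresponds to exactly one UTF-16 surrogate pair, so the `\ud83d\ude00` form can be recomputed from the decoded text while characters such as å are left alone. The helper name `escape_astral` is made up for this illustration and is not part of any library.

```python
def escape_astral(text: str) -> str:
    """Re-escape characters above U+FFFF as \\uXXXX\\uXXXX surrogate pairs.

    Characters in the Basic Multilingual Plane (for example 'å') are kept as-is.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            out.append(f"\\u{high:04x}\\u{low:04x}")
        else:
            out.append(ch)
    return "".join(out)


message = "St\u00e5 p\u00e5 videre! \U0001F600\U0001F62C"
print(escape_astral(message))
```

A .NET string is already a sequence of UTF-16 code units, so the same idea there would amount to walking the characters and formatting each surrogate as four hex digits.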

Related

Sabre Web/ .NET - Special Characters In SabreCommandLLSRQ Response Not Handed Properly

I'm using VB.NET to consume Sabre Web Services, primarily using SabreCommandLLSRQ to send native Sabre commands. Sending special characters without any special encoding works fine, but when I try to manipulate any response that contains the Cross of Lorraine using the Response element of SabreCommandLLSRS, all of the Cross of Lorraine characters are missing if I display my string in a MsgBox or try to manipulate it.
If I push that string into my clipboard and view it in Notepad++, the characters are there but they seem to be encoded improperly: they come through as something like "Â‑". I'm pretty new to Unicode encoding so that's all a bit above my head.
I've tried using the Replace method of StringBuilder to change those characters to something visible, to no avail. Does anyone have a way around this issue?
Strangely, the other special characters (e.g. "¤") seem to come through just fine.
This section in Dev Studio includes references to special character hex codes:
https://developer.sabre.com/docs/read/soap_apis/management/utility/Send_Sabre_Command
Does this help?
This is a pain in the behind due to the invisible characters.
String replace does work; you just need to make sure you capture the invisible character after the Â.
In the SabreCommandSend function, before you send the string to Sabre, put something like the code below.
Hopefully this will copy and paste straight out, including the invisible character.
if (tempCommand.Contains("‑"))
{
    tempCommand = tempCommand.Replace("‑", "Â");
}
I figured out how to get this to work, but it's not pretty, so if anyone has a better way to do it, I'm all ears.
I couldn't figure out what char to use for a simple string Replace, so instead I'm converting the string to a byte array, iterating through the array and replacing any strange characters I find, converting the byte array back into a string, and doing the string replace on that:
Imports System.Text
' Encoding to ASCII turns every non-ASCII character into "?" (byte 63)
Dim byteArray() As Byte = System.Text.Encoding.ASCII.GetBytes(sabreResponse)
For i = 0 To byteArray.Length - 1
    If byteArray(i) = 63 Then 'this is a question mark char
        byteArray(i) = 94 'caret that doesn't exist in native Sabre
    End If
Next
MyClass.respString = System.Text.ASCIIEncoding.ASCII.GetString(byteArray)
MyClass.respString = MyClass.respString.Replace("^", "¥")
For whatever reason, the string replace method works after I swap out the offending byte with a dummy character but not before.
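A "Â" followed by a stray or invisible character is the typical signature of UTF-8 bytes being decoded with a single-byte codec such as Latin-1. Whether that is what happens between Sabre and the .NET client is an assumption; the thread does not confirm it. The Python sketch below only reproduces the symptom and shows the round trip that undoes it. `misdecode` and `repair` are names made up for this illustration.

```python
def misdecode(text: str) -> str:
    """Encode as UTF-8, then decode the bytes with the wrong codec (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo misdecode(): recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


# U+00A5 (yen sign) becomes two characters: "Â" followed by "¥".
print(repr(misdecode("\u00a5")))
# U+0087 is a control character, so after the "Â" nothing visible is shown.
print(repr(misdecode("\u0087")))
print(repair(misdecode("\u00a5")) == "\u00a5")
```

If this is the cause, decoding the response bytes with the right encoding in the first place would be cleaner than patching the string afterwards.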

Jackson Json UTF16

I'm new to Java. In C# this stuff is pretty straightforward, but I'm struggling with it in Java.
I'm entering some Chinese characters in a text box on the form, but when Jackson JSON serialises the object, it converts the Chinese characters into random bits of text. Does anyone have any idea what I need to do with Jackson JSON to preserve the characters so that I can pass them to the C# Web API service?
The code I'm using is below:
ObjectMapper mapper = new ObjectMapper();
String json = mapper.writeValueAsString(userAddress);
When the mapper serialises the userAddress object, which contains the Chinese characters, it converts them to random characters within the JSON string before invoking the C# Web API. How do I preserve them, or do I need to encode them as bytes and then decode them in the C# Web API?
Thanks
It probably has more to do with encoding than with Jackson. One of the advantages of UTF-8 is that it supports Chinese characters. I tested exactly what you reported and Jackson converted the characters just fine. You should check which encoding your JVM is running with; if it is one that doesn't support Chinese, you might get exactly this problem.
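To make the encoding point concrete, here is a small Python sketch (not Jackson) showing two ways the same object can go on the wire: as raw UTF-8, which the receiver must decode as UTF-8, or with every non-ASCII character written as a `\uXXXX` escape, which survives any ASCII-compatible charset. The function name `to_wire` and the sample address are made up for this example. As far as I recall, Jackson has a comparable switch (`JsonGenerator.Feature.ESCAPE_NON_ASCII`), but check that against the version you use.

```python
import json


def to_wire(obj):
    """Return (ascii_json, utf8_bytes) for the same object."""
    ascii_json = json.dumps(obj)  # default: non-ASCII written as \uXXXX
    utf8_bytes = json.dumps(obj, ensure_ascii=False).encode("utf-8")
    return ascii_json, utf8_bytes


address = {"city": "\u5317\u4eac"}  # "Beijing" in Chinese characters
ascii_json, utf8_bytes = to_wire(address)

print(ascii_json)                     # escapes only, plain ASCII
print(utf8_bytes.decode("utf-8"))     # correct when decoded as UTF-8
print(utf8_bytes.decode("latin-1"))   # "random chars" when decoded wrongly
```

Both forms parse back to the same object, so the escaped form is a reasonable fallback when the charset on either side cannot be controlled.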

Parse streaming JSON in Objective C

I am using JSON-RPC over TCP. The problem is that I could not find any JSON parser capable of parsing multiple JSON objects correctly, and it would be relatively hard to split them, since there is no delimiter used.
Does anyone know how I could handle, for example, this:
{"foo":false, "bar": true, "baz": "cool"}{"ba
Somehow I need to split it so I end up just with the first, complete JSON object. The remaining string needs to stay in buffer until I have enough data to parse it properly.
XBMC's JSON-RPC doc does give a hint:
As such, your client needs to be able to deal with this, eg. by counting and matching curly braces ({}).
Update: As Jody Hagins pointed out, beware of curly braces inside JSON strings when using this approach.
Another possible and probably much better solution would be using a streaming JSON parser like yajl (or its Objective-C wrapper yajl-objc). You can feed the parser with data until it says the current object is done and then restart parsing.
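As a rough illustration of the "parse one object, keep the rest in the buffer" idea, here is a Python sketch. It does not use yajl; it swaps in the standard library's `json.JSONDecoder.raw_decode`, which parses a single value and reports where it ended. The function name `pop_complete_objects` is made up for this example.

```python
import json


def pop_complete_objects(buffer: str):
    """Parse as many complete JSON values as possible from the front of buffer.

    Returns (objects, remainder); the remainder stays buffered until more
    data arrives.
    """
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1
        if pos == len(buffer):
            break
        try:
            obj, end = decoder.raw_decode(buffer, pos)
        except json.JSONDecodeError:
            break  # incomplete tail: keep it for the next read
        objects.append(obj)
        pos = end
    return objects, buffer[pos:]


objs, rest = pop_complete_objects('{"foo": false, "bar": true}{"ba')
print(objs, repr(rest))
```

One limitation of this shortcut: a genuinely malformed object looks the same as an incomplete one, so a real client would want a size limit or timeout on the buffer.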
@ePirat, if someone just concatenates multiple JSON dictionaries without delimiters, they should be shot.
For parsing: JSONSerialization parses NSData, which could come in any encoding. Fortunately, if you have multiple JSON dictionaries concatenated, they are quite easy to take apart. All you need to do is look at the bytes and check for the characters \ " { and }.
If you find a { then increase the counter for "open brackets".
If you find a } then decrease the counter for "open brackets". If the counter is at zero, you've found the end of a dictionary.
If you find a ", then repeatedly look at the next character. If the next character is a " then skip it and go to the normal processing (you've found the end of a string). If the next character is a \ then skip that character and the following character. If the next character is anything else, skip it.
If you reach the end of the data, then your JSON data is incomplete. You could remember which state you were in (count of open brackets, whether you are parsing a string, and if parsing a string whether you just encountered a backslash character) and continue right where you left off.
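The steps above can be sketched as a small byte-level scanner. This is a Python illustration, not Objective-C, and it assumes the data is UTF-8 or another ASCII-compatible encoding. The function name `split_objects` is made up. For simplicity it rescans the leftover bytes on the next call instead of saving the parser state between calls.

```python
def split_objects(data: bytes):
    """Split concatenated JSON dictionaries; return (complete, remainder)."""
    objects = []
    depth = 0          # count of open curly brackets
    in_string = False  # inside a "..." string?
    escaped = False    # was the previous byte a backslash inside a string?
    start = 0
    for i, byte in enumerate(data):
        if in_string:
            if escaped:
                escaped = False        # skip the character after a backslash
            elif byte == 0x5C:         # backslash
                escaped = True
            elif byte == 0x22:         # closing quote
                in_string = False
        elif byte == 0x22:             # opening quote
            in_string = True
        elif byte == 0x7B:             # {
            depth += 1
        elif byte == 0x7D:             # }
            depth -= 1
            if depth == 0:             # end of one top-level dictionary
                objects.append(data[start:i + 1])
                start = i + 1
    return objects, data[start:]


complete, remainder = split_objects(b'{"a":"}{"}{"b":{"c":1}}{"ba')
print(complete, remainder)
```

Note that the braces inside the string value `"}{"` do not confuse the counter, which is the pitfall mentioned for the plain brace-matching approach.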
No need to convert the NSData to a string until you've separated it into dictionaries. If you suspect that you might be given UTF-16 or UTF-32, check whether bytes 0, 1, 2 or 1, 2, 3 are zero (UTF-32), then check whether bytes 0 and 2 or 1 and 3 are zero (UTF-16). But in that case, if the server sends non-standard JSON in UTF-16 or UTF-32, change "the person responsible should be shot" to "the person responsible must be shot".
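The zero-byte check can be written down in a few lines. This Python sketch assumes there is no byte order mark and that the first two characters of the data are ASCII (true for anything starting with `{"`). The function name `guess_json_encoding` is made up for the illustration.

```python
def guess_json_encoding(data: bytes) -> str:
    """Guess the encoding of JSON data from the zero bytes in its first four bytes."""
    if len(data) >= 4:
        b0, b1, b2, b3 = data[:4]
        if b0 == 0 and b1 == 0 and b2 == 0:
            return "utf-32-be"   # 00 00 00 xx
        if b1 == 0 and b2 == 0 and b3 == 0:
            return "utf-32-le"   # xx 00 00 00
        if b0 == 0 and b2 == 0:
            return "utf-16-be"   # 00 xx 00 xx
        if b1 == 0 and b3 == 0:
            return "utf-16-le"   # xx 00 xx 00
    return "utf-8"


sample = '{"a":1}'
for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    print(codec, guess_json_encoding(sample.encode(codec)))
```

The UTF-32 checks have to come first, because UTF-32 data also satisfies the UTF-16 conditions.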

lucene highlighter: how can I get locations of fragments?

I know how to get relevant highlighted fragments together with some surrounding text using Lucene highlighter, namely, using
Highlighter highlighter = new Highlighter(scorer);
String[] fragments = highlighter.getBestFragments(stream, fieldContents, fragmentNumber);
But can I instead get pointers to these fragments in the original contents? In other words, I need to know where these fragments start and, if possible, end.
If you use the GetBestTextFragments method instead, you will get back an array of TextFragments. These have properties textStartPos and textEndPos.
(They are marked internal in Lucene.NET, which will require you to make some code changes to get access to them. I'm not sure about Java Lucene.)
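If the fragment positions are not reachable through the API, one fallback is to search for each returned fragment in the original contents. The Python sketch below shows the idea only. It is not Lucene code, and it assumes that the fragments are verbatim slices of the contents with `<B>`/`</B>` as the only inserted markup and that the first match is the right one. `locate_fragments` is a made-up name.

```python
import re


def locate_fragments(contents: str, fragments):
    """Return a (start, end) span in contents for each fragment, or None."""
    spans = []
    for fragment in fragments:
        plain = re.sub(r"</?B>", "", fragment)   # strip highlight markup
        start = contents.find(plain)             # first occurrence only
        spans.append((start, start + len(plain)) if start != -1 else None)
    return spans


text = "the quick brown fox jumps"
print(locate_fragments(text, ["quick <B>brown</B> fox", "missing"]))
```

This breaks down when the same text occurs several times or when the highlighter encodes or merges fragments, so the offsets from TextFragment are preferable where available.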

Dealing with whitespace when parsing XML

I have a problem with parsing XML.
I parsed data of cities, Amsterdam & Den Bosch.
Amsterdam works fine but Den Bosch does not.
No doubt it is due to space problem.
Den Bosch has a white space.
Should I trim the whitespace in my application or the web service?
Which would be the best to handle the space problem?
EDIT:
The OP and @PeterMurray-Rust seem to agree that the problem is that the third-party app returns URL-escaped strings of the form:
"Den%20Bosch"
%20 is not recognized by XML as anything special, so it will be necessary to replace each occurrence with a space. A typical scripting approach would be
s/%20/ /g
This is likely to be quite a common problem although I'm not clear why content should be URL-encoded.
[OP please comment if I have got this wrong]
From your update I assume that the data is something like:
<city>Den%20Bosch</city>
The string "%20" is three characters which XML does not regard as having any specific meaning. Depending on your language or whether you use XSLT you will need to replace them. In Java and the XOM library I might write
String value = cityNode.getValue().replaceAll("%20", " ");
I can't help with the specifics of Cocoa - I think you'll have to investigate the API to find how to get content values.
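For comparison, here is the same idea as a Python sketch. Instead of replacing only "%20", it decodes every percent-escape in the element text with `urllib.parse.unquote`. The element name `city` follows the example above; the wrapping `<cities>` element and the function name `city_names` are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def city_names(xml_text: str):
    """Return the text of every <city> element with URL escapes decoded."""
    root = ET.fromstring(xml_text)
    return [unquote(node.text or "") for node in root.iter("city")]


doc = "<cities><city>Amsterdam</city><city>Den%20Bosch</city></cities>"
print(city_names(doc))
```

Decoding all escapes at once also covers names containing other escaped characters, which a single "%20" replacement would miss.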
I assume that you are parsing XML at the application level, and that by whitespace you mean trailing whitespace and not the space between the words "Den" and "Bosch". In any case, I think you can trim the spaces at the web service level; then any other application calling this web service will not have to trim the spaces itself, since the web service handles that internally. This would be a one-point change for you.
I don't know much about Cocoa or your XML. Are the city names the inner text of a node, or a tag name? If they are in a tag name or in attributes without quotes, parsing will fail. If they are in inner text, it should work. There is also the CDATA section, which tells the parser to ignore the contents.
This is the code I implemented; it's working fine.
if ([appDelegate.cityListArray count] > 0) {
    aDJInfo = [appDelegate.cityListArray objectAtIndex:indexPath.row];
    // Example: http://compliantbox.com/party_temperature/citysearch.php?city=Amsterdam&latitude=52.366125&longitude=4.899171
    url = @"http://compliantbox.com/party_temperature/citysearch.php?city=";
    // Percent-encode spaces so that "Den Bosch" becomes "Den%20Bosch"
    NSString *string = [aDJInfo.city_Name stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
    url = [url stringByAppendingString:string];
    NSLog(@"abbbbbbbbbbb %@", string);
    url = [url stringByAppendingString:@"&latitude=52.366125&longitude=4.899171"];
    [self parseEventName:[[NSURL alloc] initWithString:url]];
}
}
@All thanks a lot.
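The code above escapes only spaces by hand. A more general variant is to let a URL library percent-encode the whole query string, which also covers characters such as "&" or non-ASCII letters in a city name. The Python sketch below shows that idea. The URL and parameter names are taken from the code above, while the function name `build_search_url` is made up.

```python
from urllib.parse import quote, urlencode


def build_search_url(city: str, latitude: str, longitude: str) -> str:
    """Build the city search URL with every query value percent-encoded."""
    base = "http://compliantbox.com/party_temperature/citysearch.php"
    query = urlencode(
        {"city": city, "latitude": latitude, "longitude": longitude},
        quote_via=quote,  # encode spaces as %20 instead of "+"
    )
    return base + "?" + query


print(build_search_url("Den Bosch", "52.366125", "4.899171"))
```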