DataContractSerializer encode \n - serialization

I am using the following code to serialize my object
DataContractSerializer ser = new DataContractSerializer(obj.GetType());
String text;
using (MemoryStream memoryStream = new MemoryStream())
{
ser.WriteObject(memoryStream, obj);
byte[] data = new byte[memoryStream.Length];
Array.Copy(memoryStream.GetBuffer(), data, data.Length);
text = Encoding.UTF8.GetString(data);
}
My object is serializing like this:
<Meta xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\"><Description>This is my new file
\n
\nMore Data</Description><Title>My Other Test Document</Title></Meta>
Notice that my \n was not escaped. Why is that? What is the best way to send \r\n through XML?
I searched and I don't see any articles about this. Am I missing some attribute in my serialization code?

Babel, not at all: you're not missing any special attributes here. The \n is getting serialized. A literal line break is valid inside XML element content, so it needs no escaping. If it had been treated as an ordinary character sequence rather than a newline, you would see the two characters \n in the string, not the line break itself. Are you not catching the newline on the client end, and have you verified it by printing the text to stdout?


Is it possible to parse only the second half of a file by starting at some rule other than the first rule?

I have generated a JSON parser with ANTLR. To parse a JSON file, I must call the first rule in the JSON grammar:
parser = new JSONParser(...);
parser.json();
If I only want to parse a JSON array, can I skip all the JSON tokens up until the first token that starts an array and then call:
parser.array();
My guess is it won't parse until the end of the file, but instead it will stop at the end of the array in the JSON file. That's ok. But I'm not sure if it is allowed to call array() without any context, such as the correct JSON lexer mode.
Is there an example that describes how to do this?
You could call seek() on your TokenStream before initiating the parse. (I haven't directly tested it, but that should advance your TokenStream to the token where you wish to begin parsing.)
Your intuition regarding calling the parser.array() method, and that it will parse until it has completed parsing the array that starts at your position in the token stream, is correct. It will not parse to the end of the input.
If you don't know the index to seek to, and you want to start at the first array, write a loop that calls LA() to get the next token type (LA is short for "look ahead"). If it's not the '[' token, consume() it. If it is, you're at the right position and can call parser.array(), which will parse just that array and return.
Following up on Mike's answer: the lexer works independently of the parser, and you can't (easily) back up the lexer. But you can use CommonTokenStream to position the stream where you want to start parsing:
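var lexer = new JSONLexer(str);
var tokens = new CommonTokenStream(lexer);
for (int i = 0; ; ++i) {
// LT(k) is a 1-based lookahead from the current position (index 0 here)
var token = tokens.LT(i+1);
// Stop at end of input so the loop cannot spin forever when there is no '['
if (token.Type == TokenConstants.EOF) break;
if (token.Text == "[")
{
// Token i is the '[' itself, so parsing will start there
tokens.Seek(i);
break;
}
}
var parser = new JSONParser(tokens);
var tree = parser.arr();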
This code does a string comparison, but if you define a lexer symbol for '[', then you can break on the Type instead.

WCF Change message encoding from Utf-16 to Utf-8

I have a WCF connected service in a .NET Core application. I'm using the code that is autogenerated from the WSDL definition.
Currently the top of the request XML includes this line:
<?xml version="1.0" encoding="utf-16"?>
I can't find a simple way to change this encoding to UTF-8 when sending the request.
Since I couldn't find a configuration option on the request/client objects, I've tried to change the message with the following code in IClientMessageInspector.BeforeSendRequest:
public object BeforeSendRequest(ref Message request, IClientChannel channel)
{
// Load a new xml document from current request
var xmlDocument = new XmlDocument();
xmlDocument.LoadXml(request.ToString());
((XmlDeclaration)xmlDocument.FirstChild).Encoding = Encoding.UTF8.HeaderName;
// Create streams to copy
var memoryStream = new MemoryStream();
var xmlWriter = XmlWriter.Create(memoryStream);
xmlDocument.Save(xmlWriter);
xmlWriter.Flush();
xmlWriter.Close();
memoryStream.Position = 0;
var xmlReader = XmlReader.Create(memoryStream);
// Create a new message
var newMessage = Message.CreateMessage(request.Version, null, xmlReader);
newMessage.Headers.CopyHeadersFrom(request);
newMessage.Properties.CopyProperties(request.Properties);
return null;
}
But the newMessage object still writes the XML declaration using utf-16. I can see it in the watch window while debugging.
Any idea on how to accomplish this (what should be) simple change would be very much appreciated.
Which binding do you use to create the communication channel? The textMessageEncoding element contained in a CustomBinding has a writeEncoding attribute.
https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/wcf/textmessageencoding
The writeEncoding attribute specifies the character set encoding to be used for emitting messages on the binding. Valid values are:
UnicodeFffeTextEncoding: Unicode BigEndian encoding
Utf16TextEncoding: Unicode encoding
Utf8TextEncoding: 8-bit encoding
The default is Utf8TextEncoding. This attribute is of type Encoding.
Specific bindings such as BasicHttpBinding expose a TextEncoding property as well.
https://learn.microsoft.com/en-us/dotnet/api/system.servicemodel.basichttpbinding.textencoding?view=netframework-4.0
Feel free to let me know if there is anything I can help with.

newtonsoft SerializeXmlNode trailing nulls

I am creating an XmlDoc in C# and using Newtonsoft to serialize to JSON. It works, but I am getting a bunch of what appear to be "NUL"'s at the end of the JSON. No idea why. Anyone seen this before?
CODE:
XmlDocument xmlDoc = BuildTranslationXML(allTrans, applicationName, language);
// Convert the xml doc to json
// the conversion inserts \" instead of using a single quote, so we need to replace it
string charToReplace = "\"";
string jsonText = JsonConvert.SerializeXmlNode(xmlDoc);
// json to a stream
MemoryStream memoryStream = new MemoryStream();
TextWriter tw = new StreamWriter(memoryStream);
tw.Write(jsonText);
tw.Flush();
tw.Close();
// output the stream as a file
string fileName = string.Format("{0}_{1}.json", applicationName, language);
return File(memoryStream.GetBuffer(), "text/json", fileName);
The file is served up to the calling web page and the browser prompts the user to save the file. When opening the file, it displays the correct JSON but also has all the trailing nulls. See image below (hopefully the stackoverflow link works):
file screenshot
The GetBuffer() method returns the MemoryStream's internal buffer, including the unused capacity beyond the written length, which is where the trailing NULs come from. Use ToArray() instead to get only the part of that internal array that actually holds the data written to the stream.

Uncompressing a Gzip format?

I am facing a problem with Gzip uncompressing.
The situation is like this. I have some text in UTF-8 format. This text is compressed using the gzdeflate() function in PHP and then stored in a blob column in MySQL.
I then tried to retrieve the blob and used Java's GZIP stream to uncompress it. But it throws an error saying that it is not in GZIP format.
I even used Inflater in Java to do the same but now I get "DataFormatException:incorrect header check". The code for the inflater is as below.
//rs is the resultset
//blobAsBytes is the byte array
while(rs.next()){
blob = rs.getBlob("old_text");
int blobLength = (int) blob.length();
blobAsBytes = blob.getBytes(1, blobLength);
}
Inflater decompresser = new Inflater();
decompresser.setInput(blobAsBytes);
byte[] result = new byte[100];
int resultLength = decompresser.inflate(result); // this is the line where the exception is occurring.
decompresser.end();
// Decode the bytes into a String
String outputString = new String(result, 0, resultLength, "UTF-8");
System.out.println(outputString);
I have to do this using Java and get all the text back that is stored in the database.
Can someone please help me with this?
Use gzencode(), not gzdeflate(). The latter does not produce the gzip format, it produces the deflate format. The former does produce the gzip format. The PHP functions are horribly and confusingly named.
Alternatively, use the java.util.zip.Inflater class with nowrap true in the Inflater constructor. That will decode raw deflate data on the Java end.
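A minimal sketch of that second option, assuming the blob really holds raw deflate output from gzdeflate(). The class name RawInflateExample and the helpers inflateRaw and deflateRaw are hypothetical names chosen for illustration; deflateRaw only stands in for the PHP side so the example runs on its own. Unlike the fixed 100-byte buffer in the question, it loops until the inflater reports that it has finished.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class RawInflateExample {

    // Inflates raw deflate data (no zlib or gzip header), the format gzdeflate() emits.
    static byte[] inflateRaw(byte[] compressed) {
        Inflater inflater = new Inflater(true); // nowrap = true: expect raw deflate
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read means the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported deflate data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("input is not raw deflate data", e);
        } finally {
            inflater.end();
        }
    }

    // Stand-in for PHP's gzdeflate(), present only to make the example self-contained.
    static byte[] deflateRaw(byte[] plain) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] blobAsBytes = deflateRaw("Gr\u00fc\u00dfe aus der Datenbank".getBytes(StandardCharsets.UTF_8));
        String text = new String(inflateRaw(blobAsBytes), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

In the question's code, blobAsBytes read from the result set would be passed to inflateRaw and the returned bytes decoded as UTF-8.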

OutgoingWebResponseContext does not display non-english characters

We have implemented a REST-style GET service using WCF in .NET 3.5. This service retrieves research documents. The string 'synopsis' in the code below contains non-English characters, which the browser displays as "????????".
private void ReturnSynopsisInfo(IApiWebOperationContext context, OutgoingWebResponseContext outgoingResp, string synopsis)
{
SetResponseHeaders(outgoingResp, HttpStatusCode.OK);
outgoingResp.ContentType = "text/html; charset=UTF-8";
context.Result = new MemoryStream(Encoding.ASCII.GetBytes(synopsis));
}
Any advice is much appreciated.
Thank You.
It seems you are declaring the encoding as UTF-8 in the Content-Type header, but actually encoding the stream as ASCII. The ASCII encoder silently replaces every non-ASCII character with a question mark.
You probably want to use UTF8Encoding rather than ASCIIEncoding, for example Encoding.UTF8.GetBytes(synopsis).
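The substitution is easy to reproduce outside WCF. The sketch below is in Java rather than C#, so treat it as an analogy and not as the service code: EncodingMismatchExample and sendAndDecode are hypothetical names, and the method mimics a server that encodes with one charset while the client decodes as UTF-8, as the Content-Type header promises.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchExample {

    // Encodes with the given charset, then decodes as UTF-8 like a client
    // that trusts a "charset=UTF-8" Content-Type header.
    static String sendAndDecode(String text, Charset serverCharset) {
        byte[] body = text.getBytes(serverCharset);
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String synopsis = "caf\u00e9"; // contains one non-ASCII character

        // ASCII cannot represent the last character, so it becomes '?'
        System.out.println(sendAndDecode(synopsis, StandardCharsets.US_ASCII)); // caf?

        // Encoding as UTF-8 matches the declared charset and keeps the text intact
        System.out.println(sendAndDecode(synopsis, StandardCharsets.UTF_8).equals(synopsis)); // true
    }
}
```

The bytes and the declared charset only agree in the second call, which corresponds to replacing Encoding.ASCII with Encoding.UTF8 in the question's method.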