I am wondering about the type of encoded string like that:
a:2{s:3{"xyz"}, s:0{""}}
and is there any viewer for it like JSON viewer etc...
That looks extremely similar to PHP's native serialization format. The program:
print serialize([ "xyz" => "" ]);
will output:
a:1:{s:3:"xyz";s:0:"";}
Your sample has a couple of extra braces, and uses different punctuation in a few places. These don't make any sense; they would not be accepted by deserialize().
Related
I am trying create a grammar for a format that follows a type-length-value convention. Can ANTLR4 read in a length value and then parse that many characters?
NO ...
From your question (which is very short so I could miss something ...) I gather you are mixing grammar and encoding rules.
When you say type-length-value, it sounds like an encoding rule to me (how to serialize a data). In my experience, you write this code yourself.
A grammar is at a higher level: it's a piece of text that describes something. Antlr will help you breaking this text into tokens and then into a tree that you can navigate.
This step only handles text: if you were going that way to solve your problem, you would still have to handle type, length and value yourself.
EDIT:
with a bit of googling I found this https://github.com/NickstaDB/SerializationDumper
I'm working in IOS and trying to pass some content to a web server via an NSURLRequest. On the server I have a PHP script setup to accept the request string and convert it into an JSON object using the Zend_JSON framework. The issue I am having is whenever the character "ø" is in any part of the request parameters, then the request string is cut short by one character.
Request string before going to server.
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Østfold","udid":"bed164974ea0d436a43f3cdee0e005a1"}]
Request string on server before any parsing
[{"description":"Blah blah","type":"Russebuss","name":"Roscoe Simulator","appVersion":"1.0.20","osVersion":"IOS 5.1","phone":"5555555","country":"Nord-Trøndelag","udid":"bed164974ea0d436a43f3cdee0e005a1"}
Everything looks exactly the same except the final closing ] is missing. I'm thinking it's having an issue when converting the string to UTF-8, but not sure the correct way to fix this issue.
Does anyone have any ideas why this is happening?
first of all do not trust the xcode console in such cases. you never know which coding the console is actually using.
second, escape the invalid characters before you build you json string. easiest way would probably to make sure you are using the same unicode representation, like utf-8, all the time.
third, if there are still invalid characters use a json lib with a parser (does the encoding). validate the output by parsing back to e.g. NSString. or validate the output manually by using a web form like http://jsonformatter.curiousconcept.com/
the badest way is to replace the single characters in the string, build your json and convert back. one way to do this could be to replace e.g an german ä with its unicode representaion U+00E4 (http://www.utf8-chartable.de/).
Thats the way I do it. I am glad that I nerver needed to go further than step three and this is the step you should do anyway to keep your code simple.
Please try to use Zends internal json Encoding:
Zend_Json::$useBuiltinEncoderDecoder = true;
should fix your issue.
According to MSDN vb.net uses this extended character set. In my experience it actually uses this:
What am I missing? Why does it say it uses the one and uses the other?
Am I doing something wrong?
Is there some sort of conversion tool to the original character set?
This behaviour is defined in the documentation of the Chr command:
The returned value depends on the code page for the current thread, which is contained in the ANSICodePage property of the TextInfo class in the System.Globalization namespace. You can obtain ANSICodePage by specifying System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage.
So, the output of Chr for values greater than 127 is system-dependent. If you want reproducible results, create the desired instance of Encoding by calling Encoding.GetEncoding(String), then use Encoding.GetChars(Byte()) to convert your numeric values into characters.
If you go up one level in the chart linked in your question, you will see that they do not claim that this chart is always the output of the Chr command:
The characters that appear in Windows above 127 depend on the selected typeface.
The charts in this section show the default character set for a console application.
Your application is a WinForm application, not a console application. Even in the console, the character set used can be changed (for example, by using the chcp command), hence the word "default".
For detailed information about the encodings used in .net, I recommend the following MSDN article: Character Encoding in the .NET Framework.
The first character set is Code Page 437 (CP437), the second looks like Code Page 1252 (CP1252) also known as Windows Latin-1.
I'd guess VB.Net is simply picking up the default encoding for the PC.
How did you write all this? Because usually, when you use a output stream function, you can specify the encoding going with it.
Edit: I know this is not C#, but you can see the idea...
You'd have to set the encoding of your filestream, by doing something like this:
Setting the encoding when creating the filestream
I'm converting a WinForms app to Silverlight (VB.NET). What should I use instead of Char.ConvertFromUtf32 as it's not available to use in Silverlight?
UTF-32 is currently not part of Silverlight, so you have to find a way around the limitation. I think you should stop a moment and think exactly why you need to read UTF32-encoded text.
If you are reading such text from a database or a file on the server, I would perform the conversion server-side (if possible I would convert everything to UTF-8 and get rid of the UTF-32 data in one shot).
If you are parsing a user-provided file on the client side, I would detect the UTF-32 encoding and gently tell the user that the file encoding is not supported. UTF32 is pretty rare nowadays, so I guess it should not be a very common case (but I could be wrong not knowing your exact situation).
In order to detect the file encoding you have to look at the first few bytes (byte order mark) -more information here, if they are not present the task becomes much harder and involves some kind of heuristics based on character frequency.
From: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/types/how-to-convert-between-hexadecimal-strings-and-numeric-types
You can use a direct cast, like:
// Get the character corresponding to the integral value.
string stringValue = Char.ConvertFromUtf32(value);
char charValue = (char)value;
Small warning, it will only work up to 0xffff. It will not work for high range Unicode from 0x10000 to 0x10ffff.
Also, if you need to parse \uXXXX, try this other question: How do I convert Unicode escape sequences to Unicode characters in a .NET string?
All of the examples which I see read in a file the lex & parse it.
I need to have a function which takes a string (char *, I'm generating C code) as parameter, and acts upon that.
How can I do that best? I thought of writing the string to a stream, then feeding that to the lexer, but it doesn't feel right. Is there any better way?
Thanks in advance
You would need to use the antlr3NewAsciiStringInPlaceStream method.
You didn't say what version of Antlr you were using so I'll assume Antlr v3.
The inputs to this method are the string to parse, it's length and then you can probably use NULL for the last input.
This produces an input stream similar to the antlr3AsciiFileStreamNew that you would use to parse a file.
I see that you mentioned writing the input to a stream. If you can use C++ then that's the best method you'll probably come by.
This is the barebones code I normally use:
std::istringstream issInput(std::cin); // make this an ifstream if you want to parse a file
Lexer oLexer(issInput);
Parser oParser(oLexer);
oFactory("CommonASTWithHiddenTokens",&antlr::CommonASTWithHiddenTokens::factory);
antlr::ASTFactory oFactory;
oParser.initializeASTFactory(oFactory);
oParser.setASTFactory(&oFactory);
oParser.main();
antlr::RefAST ast = oParser.getAST();
if (ast)
{
TreeWalker oTreeWalker;
oTreeWalker.main(ast, rPCode);
}
I think you should feed it to a stream. You could feed it to stdin if you'd like. That way, your code shouldn't differ too much from reading strings from a file.