Parse streaming JSON in Objective C - objective-c

I am using JSON-RPC over TCP. The problem is that I could not find any JSON parser capable of parsing multiple concatenated JSON objects correctly, and it would be relatively hard to split the stream myself, since no delimiter is used.
Does anyone know how I could handle input like this, for example:
{"foo": false, "bar": true, "baz": "cool"}{"ba
Somehow I need to split it so I end up with just the first, complete JSON object. The remaining string needs to stay in the buffer until I have enough data to parse it properly.

XBMC's JSON-RPC doc does give a hint:
As such, your client needs to be able to deal with this, eg. by counting and matching curly braces ({}).
Update: As Jody Hagins pointed out, beware of curly braces inside JSON strings when using this approach.
Another possible and probably much better solution would be using a streaming JSON parser like yajl (or its Objective-C wrapper yajl-objc). You can feed the parser with data until it says the current object is done and then restart parsing.

@ePirat, if someone just concatenates multiple JSON dictionaries without delimiters, they should be shot.
For parsing: NSJSONSerialization parses NSData, which could come in any encoding. Fortunately, if you have multiple JSON dictionaries concatenated, they are quite easy to take apart. All you need to do is look at the bytes and check for the characters \ " { and }.
If you find a { then increase the counter for "open brackets".
If you find a } then decrease the counter for "open brackets". If the counter is at zero, you've found the end of a dictionary.
If you find a ", then repeatedly look at the next character. If the next character is a " then skip it and go to the normal processing (you've found the end of a string). If the next character is a \ then skip that character and the following character. If the next character is anything else, skip it.
If you reach the end of the data, then your JSON data is incomplete. You could remember which state you were in (count of open brackets, whether you are parsing a string, and if parsing a string whether you just encountered a backslash character) and continue right where you left off.
No need to convert the NSData to a string until you've separated it into dictionaries. If you suspect that you might be given UTF-16 or UTF-32, check whether bytes 0, 1, 2 or 1, 2, 3 are zero (UTF-32), then check whether bytes 0 and 2 or 1 and 3 are zero (UTF-16). But in that case, if the server sends non-standard JSON in UTF-16 or UTF-32, change "the person responsible should be shot" to "the person responsible must be shot".
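The byte-scanning steps above can be sketched as a small resumable state machine. This is a minimal illustration in Python rather than Objective-C, and it assumes an ASCII-compatible encoding such as UTF-8 and that every top-level value is an object; the class name JSONObjectSplitter and its feed method are hypothetical names chosen for this sketch, not part of any library.

```python
import json


class JSONObjectSplitter:
    """Incrementally splits a byte stream of concatenated JSON objects."""

    def __init__(self):
        self.buffer = bytearray()  # bytes not yet emitted as a complete object
        self.pos = 0               # next index in the buffer to scan
        self.depth = 0             # count of currently open curly braces
        self.in_string = False     # are we inside a JSON string literal?
        self.escaped = False       # was the previous string byte a backslash?

    def feed(self, data):
        """Append data and return the complete objects found so far, as bytes."""
        self.buffer.extend(data)
        complete = []
        while self.pos < len(self.buffer):
            byte = self.buffer[self.pos]
            self.pos += 1
            if self.in_string:
                if self.escaped:
                    self.escaped = False      # skip the escaped character
                elif byte == 0x5C:            # backslash starts an escape
                    self.escaped = True
                elif byte == 0x22:            # closing double quote
                    self.in_string = False
            elif byte == 0x22:                # opening double quote
                self.in_string = True
            elif byte == 0x7B:                # {
                self.depth += 1
            elif byte == 0x7D:                # }
                self.depth -= 1
                if self.depth == 0:
                    # A full object ends here; cut it off the buffer.
                    complete.append(bytes(self.buffer[:self.pos]))
                    del self.buffer[:self.pos]
                    self.pos = 0
        return complete


# Usage: the second object arrives in two pieces, with braces inside strings.
splitter = JSONObjectSplitter()
first = splitter.feed(b'{"foo": false, "bar": "a } b"}{"ba')
second = splitter.feed(b'z": "cool \\" {"}')
print([json.loads(chunk) for chunk in first + second])
```

Because the scanner state lives on the object, a partial message simply stays in the buffer until the next call to feed supplies the rest.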

Related

Why is my resource pack saying "Unable to parse pack manifest with stack: * Line 9, Column 5 Missing '}' or object member name" [duplicate]

When manually generating a JSON object or array, it's often easier to leave a trailing comma on the last item in the object or array. For example, code to output from an array of strings might look like (in a C++ like pseudocode):
s.append("[");
for (i = 0; i < 5; ++i) {
s.appendF("\"%d\",", i);
}
s.append("]");
giving you a string like
["0","1","2","3","4",]
Is this allowed?
Unfortunately the JSON specification does not allow a trailing comma. There are a few browsers that will allow it, but generally you need to worry about all browsers.
In general I try to turn the problem around and add the comma before the actual value, so you end up with code that looks like this:
s.append("[");
for (i = 0; i < 5; ++i) {
if (i) s.append(","); // add the comma only if this isn't the first entry
s.appendF("\"%d\"", i);
}
s.append("]");
That extra one line of code in your for loop is hardly expensive...
Another alternative I've used when outputting a structure to JSON from a dictionary of some form is to always append a comma after each entry (as you are doing above) and then add a dummy entry at the end that has no trailing comma (but that is just lazy ;->).
Doesn't work well with an array unfortunately.
No. The JSON spec, as maintained at http://json.org, does not allow trailing commas. From what I've seen, some parsers may silently allow them when reading a JSON string, while others will throw errors. For interoperability, you shouldn't include it.
The code above could be restructured, either to remove the trailing comma when adding the array terminator or to add the comma before items, skipping that for the first one.
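As a quick check of the interoperability point, here is a minimal sketch using Python's standard json module as one example of a strict parser (other parsers may behave differently). The helper name parses is hypothetical; the join call shows the "commas only between items" approach.

```python
import json

items = [0, 1, 2, 3, 4]

# Joining with a separator puts commas only between items, never after the last.
text = "[" + ",".join(str(n) for n in items) + "]"


def parses(candidate):
    """Return True if Python's json module accepts the candidate text."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


print(text, parses(text))
print("[0,1,2,3,4,]", parses("[0,1,2,3,4,]"))
```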
Simple, cheap, easy to read, and always works regardless of the specs.
$delimiter = '';
for .... {
print $delimiter.$whatever
$delimiter = ',';
}
The redundant assignment to $delimiter is a very small price to pay.
Also works just as well if there is no explicit loop but separate code fragments.
Trailing commas are allowed in JavaScript, but don't work in IE. Douglas Crockford's versionless JSON spec didn't allow them, and because it was versionless this wasn't supposed to change. The ES5 JSON spec allowed them as an extension, but Crockford's RFC 4627 didn't, and ES5 reverted to disallowing them. Firefox followed suit. Internet Explorer is why we can't have nice things.
As has already been said, the JSON spec (based on ECMAScript 3) doesn't allow a trailing comma. ES >= 5 allows it, so you can actually use that notation in pure JS. It's been argued about, and some parsers did support it (http://bolinfest.com/essays/json.html, http://whereswalden.com/2010/09/08/spidermonkey-json-change-trailing-commas-no-longer-accepted/), but it's a fact of the spec (as shown on http://json.org/) that it shouldn't work in JSON. That said...
... I'm wondering why no one pointed out that you can actually split the loop at the 0th iteration and use a leading comma instead of a trailing one to get rid of the comparison code smell and any actual performance overhead in the loop, resulting in code that's actually shorter, simpler and faster (due to no branching/conditionals in the loop) than the other solutions proposed.
E.g. (in a C-style pseudocode similar to OP's proposed code):
s.append("[");
// MAX == 5 here. if it's constant, you can inline it below and get rid of the comparison
if ( MAX > 0 ) {
s.appendF("\"%d\"", 0); // 0-th iteration
for( int i = 1; i < MAX; ++i ) {
s.appendF(",\"%d\"", i); // i-th iteration
}
}
s.append("]");
PHP coders may want to check out implode(). It takes an array and joins its elements into one string, using a separator string.
From the docs...
$array = array('lastname', 'email', 'phone');
echo implode(",", $array); // lastname,email,phone
Interestingly, both C & C++ (and I think C#, but I'm not sure) specifically allow the trailing comma -- for exactly the reason given: it makes programmatically generating lists much easier. Not sure why JavaScript didn't follow their lead.
Rather than engage in a debating club, I would adhere to the principle of Defensive Programming by combining both simple techniques in order to simplify interfacing with others:
As a developer of an app that receives json data, I'd be relaxed and allow the trailing comma.
When developing an app that writes json, I'd be strict and use one of the clever techniques of the other answers to only add commas between items and avoid the trailing comma.
There are bigger problems to be solved...
Use JSON5. Don't use JSON.
Objects and arrays can have trailing commas
Object keys can be unquoted if they're valid identifiers
Strings can be single-quoted
Strings can be split across multiple lines
Numbers can be hexadecimal (base 16)
Numbers can begin or end with a (leading or trailing) decimal point.
Numbers can include Infinity and -Infinity.
Numbers can begin with an explicit plus (+) sign.
Both inline (single-line) and block (multi-line) comments are allowed.
http://json5.org/
https://github.com/aseemk/json5
No. The "railroad diagrams" in https://json.org are an exact translation of the spec and make it clear a , always comes before a value, never directly before ] or }.
There is a possible way to avoid an if-branch in the loop.
s.append("[ "); // there is a space after the left bracket
for (i = 0; i < 5; ++i) {
s.appendF("\"%d\",", i); // always add comma
}
s.back() = ']'; // modify last comma (or the space) to right bracket
According to the Class JSONArray specification:
An extra , (comma) may appear just before the closing bracket.
The null value will be inserted when there is , (comma) elision.
So, as I understand it, it should be allowed to write:
[0,1,2,3,4,5,]
But it could happen that some parsers will return 7 as the item count (like IE8, as Daniel Earwicker pointed out) instead of the expected 6.
Edited:
I found this JSON Validator that validates a JSON string against RFC 4627 (The application/json media type for JavaScript Object Notation) and against the JavaScript language specification. Actually here an array with a trailing comma is considered valid just for JavaScript and not for the RFC 4627 specification.
However, the RFC 4627 specification states that:
2.3. Arrays
An array structure is represented as square brackets surrounding zero
or more values (or elements). Elements are separated by commas.
array = begin-array [ value *( value-separator value ) ] end-array
To me this is again an interpretation problem. If you write that Elements are separated by commas (without stating something about special cases, like the last element), it could be understood in both ways.
P.S. RFC 4627 isn't a standard (as explicitly stated), and is already obsoleted by RFC 7159 (which is a proposed standard).
It is not recommended, but you can still do something like this to parse it.
jsonStr = '[0,1,2,3,4,5,]';
let data;
eval('data = ' + jsonStr);
console.log(data)
With Relaxed JSON, you can have trailing commas, or just leave the commas out. They are optional.
There is no reason at all commas need to be present to parse a JSON-like document.
Take a look at the Relaxed JSON spec and you will see how 'noisy' the original JSON spec is. Way too many commas and quotes...
http://www.relaxedjson.org
You can also try out your example using this online RJSON parser and see it get parsed correctly.
http://www.relaxedjson.org/docs/converter.html?source=%5B0%2C1%2C2%2C3%2C4%2C5%2C%5D
As stated it is not allowed. But in JavaScript this is:
var a = Array()
for(let i=1; i<=5; i++) {
a.push(i)
}
var s = "[" + a.join(",") + "]"
(works fine in Firefox, Chrome, Edge, IE11, and without the let in IE9, 8, 7, 5)
From my past experience, I found that different browsers deal with trailing commas in JSON differently.
Both Firefox and Chrome handle it just fine. But IE (all versions) seems to break. I mean really break and stop reading the rest of the script.
Keeping that in mind, and also the fact that it's always nice to write compliant code, I suggest spending the extra effort of making sure that there's no trailing comma.
:)
I keep a current count and compare it to a total count. If the current count is less than the total count, I display the comma.
May not work if you don't have a total count prior to executing the JSON generation.
Then again, if you're using PHP 5.2.0 or better, you can just format your response using the built-in JSON API.
Since a for-loop is used to iterate over an array, or similar iterable data structure, we can use the length of the array as shown,
awk -v header="FirstName,LastName,DOB" '
BEGIN {
FS = ",";
print("[");
columns = split(header, column_names, ",");
}
{ print(" {");
for (i = 1; i < columns; i++) {
printf(" \"%s\":\"%s\",\n", column_names[i], $(i));
}
printf(" \"%s\":\"%s\"\n", column_names[i], $(i));
print(" }");
}
END { print("]"); } ' datafile.txt
With datafile.txt containing,
Angela,Baker,2010-05-23
Betty,Crockett,1990-12-07
David,Done,2003-10-31
String l = "[" + List<int>.generate(5, (i) => i + 1).join(",") + "]";
Using a trailing comma is not allowed for json. A solution I like, which you could do if you're not writing for an external recipient but for your own project, is to just strip (or replace by whitespace) the trailing comma on the receiving end before feeding it to the json parser. I do this for the trailing comma in the outermost json object. The convenient thing is then if you add an object at the end, you don't have to add a comma to the now second last object. This also makes for cleaner diffs if your config file is in a version control system, since it will only show the lines of the stuff you actually added.
char* str = readFile("myConfig.json");
char* chr = strrchr(str, '}') - 1;
int i = 0;
while( chr[i] == ' ' || chr[i] == '\n' ){
i--;
}
if( chr[i] == ',' ) chr[i] = ' ';
JsonParser parser;
parser.parse(str);
I usually loop over the array and attach a comma after every entry in the string. After the loop I delete the last comma again.
Maybe not the best way, but less expensive than checking every time if it's the last object in the loop I guess.

Possible to write an Express GET route that accepts an array of unknown length?

Like the title says, is it possible to write an Express GET route that accepts an array of unknown length?
I know I can use a POST request and just include an array in the body, but it isn't posting something so much as getting something!
I need to know how to encode the url. Most of what I am seeing is for arrays of particular length. This could be for 2 or 20, or more.
YES!
You can use the query parameter and use a delimiter, as search engines of old where your search string was actually in the url with spaces demarcated with +. This allows for an array of indeterminate length.
Did not get any feedback for HOW to encode an array to accept a url, so this is the approach I am going with. Even in my googling I couldn't find much on HOW to encode a URL to have an array in the params object, rather than the query object, possibly because I was searching for how to do it for an array of indeterminate length?
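The delimiter approach can be sketched as a round trip: join the array into one query parameter on the client, split it again on the server. This is a minimal illustration in Python rather than Express; the path /items, the parameter name ids, and the helper names build_url and read_ids are all hypothetical. In an Express handler the same joined string would be read from req.query and split the same way.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base, ids):
    """Encode a list of any length as one comma-delimited query parameter."""
    return base + "?" + urlencode({"ids": ",".join(str(i) for i in ids)})


def read_ids(url):
    """Decode the comma-delimited parameter back into a list of strings."""
    query = parse_qs(urlsplit(url).query)
    raw = query.get("ids", [""])[0]
    return [part for part in raw.split(",") if part]


url = build_url("/items", [1, 2, 3])
print(url)
print(read_ids(url))
```

The values themselves must not contain the delimiter, so a comma works for numeric IDs but free-form text would need a different separator or escaping.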

Objective C: Parsing JSON string

I have a string data which I need to parse into a dictionary object. Here is my code:
NSString *barcode = [NSString stringWithString:@"{\"OTP\": 24923313, \"Person\": 100000000000112, \"Coupons\": [ 54900012445, 499030000003, 00000005662 ] }"];
NSLog(@"%@",[barcode objectFromJSONString]);
In this log, I get a NULL result. But if I pass only one value in Coupons, I get the results. How can I get all three values?
00000005662 might not be a proper integer number as it's prefixed by zeroes (which means it's octal, IIRC). Try removing them.
Cyrille is right; here is the authoritative answer:
The application/json Media Type for JavaScript Object Notation (JSON): 2.4 Numbers
The representation of numbers is similar to that used in most programming languages. A number contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part.
Octal and hex forms are not allowed. Leading zeros are not allowed.
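The rule is easy to confirm with any strict parser. The sketch below uses Python's standard json module as a stand-in for the Objective-C library in the question; the helper name try_parse is hypothetical. It also shows two ways out: drop the leading zeros if the value is really a number, or send it as a string if the zeros are significant (as they often are for coupon codes).

```python
import json


def try_parse(text):
    """Return the parsed value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


bad = '{"Coupons": [54900012445, 499030000003, 00000005662]}'
as_number = '{"Coupons": [54900012445, 499030000003, 5662]}'
as_string = '{"Coupons": ["54900012445", "499030000003", "00000005662"]}'

for candidate in (bad, as_number, as_string):
    print(try_parse(candidate))
```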

Explain a piece of Smalltalk code?

I cannot understand this piece of Smalltalk code:
[(line := self upTo: Character cr) size = 0] whileTrue.
Can anybody help explain it?
One easy thing to do, if you have the image where the code came from, is run a debugger on it and step through.
If you came across the code out of context, like a mailing list post, then you could browse implementers of one of the messages and see what it does. For example, #size and #whileTrue are pretty standard, so we'll skip those for now, but #upTo: sounds interesting. It reminds me of the stream methods, and bringing up implementors on it confirms that (in Pharo 1.1.1), ReadStream defines it. There is no method comment, but OmniBrowser shows a little arrow next to the method name indicating that it is defined in a superclass. If we check the immediate superclass, PositionableStream, there is a good method comment explaining what the method does, which is draw from the stream until reaching the object specified by the argument.
Now, if we parse the code logically, it seems that it:
reads a line from the stream (i.e. up to a cr)
if it is empty (size = 0), the loop continues
if it is not, it is returned
So, the code skips all empty lines and returns the first non-empty one. To confirm, we could pass it a stream on a multi-line string and run it like so:
line := nil.
paragraph := '
this is a line of text.
this is another line
line number three' readStream.
[(line := paragraph upTo: Character cr) size = 0] whileTrue.
line. "Returns 'this is a line of text.'"
Is this more readable:
while(!strlen(line=gets(self)))
Above expression has a flaw if feof or any other error, line==NULL
So has the Smalltalk expression, if end of stream is encountered, upTo: will answer an empty collection, and you'll have an infinite loop, unless you have a special stream that raises an Error on end of stream... Try
String new readStream upTo: Character cr
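To make the end-of-stream problem concrete, here is a rough Python analogue of the loop with an explicit guard. It is a sketch, not a translation of any Smalltalk implementation: up_to imitates what upTo: is described as doing above, and the names up_to, at_end and next_non_empty_line are hypothetical.

```python
import io


def up_to(stream, delimiter):
    """Read up to (not including) the delimiter, consuming the delimiter.

    At end of stream this returns whatever was read, possibly an empty string.
    """
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delimiter:
            return "".join(chars)
        chars.append(ch)


def at_end(stream):
    """Peek one character ahead without consuming it."""
    here = stream.tell()
    ch = stream.read(1)
    stream.seek(here)
    return ch == ""


def next_non_empty_line(stream, delimiter="\r"):
    """Skip empty lines and return the first non-empty one, or None at the end."""
    while not at_end(stream):
        line = up_to(stream, delimiter)
        if line:
            return line
    return None  # without the at_end check, an exhausted stream would loop forever


paragraph = io.StringIO("\r\rthis is a line of text.\rthis is another line")
print(next_non_empty_line(paragraph))
```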
The precedence rules of Smalltalk are
first: unary messages
second: binary messages
third: keyword messages
last: left to right
This order of left to right, can be changed by using parenthesis i.e. ( ) brackets. The expression within the pair of brackets is evaluated first.
Where brackets are nested, the inner-most bracket is evaluated first, then you work outwards towards the outer bracket, and finally evaluate the remains of the expression outside the brackets.
Because of the strong left-to-right tendency, I often find it useful to read the expression from right to left.
So for [(line := self upTo: Character cr) size = 0] whileTrue.
Approaching it from the end back to beginning gives us the following interpretation.
. End the expression. Equivalent to ; in C or Java
whileTrue What's immediately to the left of it? ] the closure of a block object.
So whileTrue is a unary message being sent to the block [ ... ]
i.e. keep doing this block, while the block evaluates to true
A block returns the result of the last expression evaluated in the block.
The last expression in the block is size = 0 a comparison. And a binary message.
size is generally a unary message sent to a receiver. So we're checking the size of something, to see if it is 0. If the something has a size of 0, keep going.
What is it we are checking the size of? The expression immediately to the left of the message name. To the left of size is
(line := self upTo: Character cr)
That's what we want to know the size of.
So, time to put this expression under the knife.
(line := self upTo: Character cr) is an assignment. line is going to have the result of
self upTo: Character cr assigned to it.
What's at the right-hand end of that expression? cr It's a unary message, so has highest precedence. What does it get sent to. i.e. what is the receiver for the cr message?
Immediately to its left is Character. So send the Character class the message cr This evaluates to an instance of class Character with the value 13 - i.e. a carriage return character.
So now we're down to self upTo: aCarriageReturn
If self - the object receiving the self upTo: aCarriageReturn message - does not understand the message name upTo: it will raise an exception.
So if this is code from a working system, we can infer that self has to be an object that understands upTo:. At this point, I am often tempted to search for the message name to see which classes have a message named upTo: in the list of message names they know and understand (i.e. their message protocol).
(In this case, it did me no good - it's not a method in any of the classes in my Smalltalk system).
But self appears to be being asked to deal with a character string that contains (potentially) many many carriage returns.
So, return the first part of aCharacterString, as far as the first carriage-return.
If the length of aCharacterString from the start to the first carriage return is zero, keep going and do it all again.
So it seems to be we're dealing with a concatenation of multiple cr-terminated strings, and processing each one in turn until we find one that's not empty (apart from its carriage- return), and assigning it to line
One thing about Smalltalk that I'm personally not a huge fan of is that, while message passing is used consistently to do nearly everything, it can sometimes be difficult to determine what message is being sent to what receiver. This is because Smalltalk doesn't have any delimiters around message sends (as Objective-C does, for example) and instead allows you to chain message sends while following a set of precedence rules which go something like "message sends are interpreted from left to right, and unless delimited by parentheses, unary messages are evaluated first, then binary messages, then keyword messages." Of course using temporary variables or even just parentheses to make the order of the messages explicit can reduce the number of situations where you have to think about this order of operations. Here is an example of the above code, split up into multiple lines, using temp variables and parentheses for explicit message ordering for readability. I think this is a bit clearer about the intent of the code:
[line := (self upTo: (Character cr)).
((line size) = 0)] whileTrue.
So basically, line is the string made up of the characters read from self up to (but not including) the carriage return character (Character cr).
Then we take line's size in characters and check whether it equals 0. Because both steps sit inside a block (brackets), we can send the block whileTrue, which re-evaluates the whole block for as long as it returns true, so a new line is read on every pass. So, yeah, whileTrue really would be clearer if it was called doWhileTrue or something like that.
Hope that helps.

Is it safe to convert a mysqlpp::sql_blob to a std::string?

I'm grabbing some binary data out of my MySQL database. It comes out as a mysqlpp::sql_blob type.
It just so happens that this BLOB is a serialized Google Protobuf. I need to de-serialize it so that I can access it normally.
This gives a compile error, since ParseFromString() is not intended for mysqlpp::sql_blob types:
protobuf.ParseFromString( record.data );
However, if I force the cast, it compiles OK:
protobuf.ParseFromString( (std::string) record.data );
Is this safe? I'm particularly worried because of this snippet from the mysqlpp documentation:
"Because C++ strings handle binary data just fine, you might think you can use std::string instead of sql_blob, but the current design of String converts to std::string via a C string. As a result, the BLOB data is truncated at the first embedded null character during population of the SSQLS. There’s no way to fix that without completely redesigning either String or the SSQLS mechanism."
Thanks for your assistance!
It doesn't look like it would be a problem judging by that quote (it's basically saying that if a null character is found in the blob it will stop the string there; however, ASCII strings won't have random nulls in the middle of them). However, this might present a problem for internationalization (multibyte charsets may have nulls in the middle).