Unescaped control characters in NSJSONSerialization - objective-c

I have this JSON http://www.progmic.com/ielts/retrive.php that I need to parse. When I do it with NSJSONSerialization, I get an "Unescaped control character around character 1981" error.
I need to know:
What the heck are the unescaped control characters? Is there a list or something?
How do I get rid of this error? The easiest way?
Thanks in advance.

I added this method to remove the unescaped characters from retrieved string:
- (NSString *)stringByRemovingControlCharacters:(NSString *)inputString
{
    NSCharacterSet *controlChars = [NSCharacterSet controlCharacterSet];
    NSRange range = [inputString rangeOfCharacterFromSet:controlChars];
    if (range.location != NSNotFound) {
        NSMutableString *mutable = [NSMutableString stringWithString:inputString];
        while (range.location != NSNotFound) {
            [mutable deleteCharactersInRange:range];
            range = [mutable rangeOfCharacterFromSet:controlChars];
        }
        return mutable;
    }
    return inputString;
}
After receiving the NSData, I convert it to an NSString, call the method above to get a new string with the control characters removed, and then convert the new NSString back to NSData for further processing.
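On the "is there a list?" part of the question: the JSON specification (RFC 8259) requires the characters U+0000 through U+001F to be escaped inside strings, so a raw tab or line break inside a string value is the usual trigger for this kind of error. The sketch below shows the same strip-then-parse idea in Python rather than Objective-C, purely as an illustration. The helper name `strip_control_chars` and the sample string are made up for the example, and the character range is an assumption that is narrower than what `controlCharacterSet` covers.

```python
import json
import re

# JSON (RFC 8259) requires U+0000..U+001F to be escaped inside strings.
# Assumption: only that range is stripped here.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f]")


def strip_control_chars(text: str) -> str:
    """Delete raw control characters (hypothetical helper for this sketch)."""
    return _CONTROL_CHARS.sub("", text)


# Made-up sample: a raw tab character sits inside the string value.
raw = '[{"title": "Royal College\tOf Arts"}]'

try:
    json.loads(raw)
    parsed_raw = True
except json.JSONDecodeError:
    parsed_raw = False  # a strict parser rejects the unescaped tab

# After stripping, the same text parses.
cleaned = json.loads(strip_control_chars(raw))
```

Note that deleting the characters also drops intended tabs or line breaks from the text. Replacing them with a space or with their escaped form (`\t`, `\n`) may be preferable if the content matters.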

What are you doing before using NSJSONSerialization? Check the encodings, maybe the issue lies there.
I quickly tried and it worked. The source JSON is valid. This is what I got when I serialize the array back to JSON with the pretty print thingy:
[
{
"url2": "http://ielts.progmic.com/app/i_COFANewSouthWhales.html",
"title": "COFA New South Whales ",
"img": "http://ielts.progmic.com/images/uni/1340407904.jpg",
"url": "http://ielts.progmic.com/app/COFANewSouthWhales.html",
"desc": "The College offers nine undergraduate degrees including some double degrees associated with other UNSW faculties.",
"img2": "http://ielts.progmic.com/images/uni/thumb/1340407904.jpg"
},
{
"url2": "http://ielts.progmic.com/app/i_RoyalCollegeOfArts.html",
"title": "Royal College Of Arts ",
"img": "http://ielts.progmic.com/images/uni/1340542224.jpg",
"url": "http://ielts.progmic.com/app/RoyalCollegeOfArts.html",
"desc": "The Royal College of Art (informally the RCA) is a public research university specialised in art and design located in London, United Kingdom. It is the world's only wholly postgraduate university of art and design, offering the degrees of Master of Arts (M.A.), Master of Philosophy (M.Phil.) and Doctor of Philosophy (Ph.D.). It was founded in 1837",
"img2": "http://ielts.progmic.com/images/uni/thumb/1340542224.jpg"
},
{
"url2": "http://ielts.progmic.com/app/i_MIDDLESEXUNIVERSITY.html",
"title": "MIDDLESEX UNIVERSITY ",
"img": "http://ielts.progmic.com/images/uni/1340410005.jpg",
"url": "http://ielts.progmic.com/app/MIDDLESEXUNIVERSITY.html",
"desc": "We have a reputation for the highest quality teaching, research that makes a real difference to people’s lives and a practical, innovative approach to working with businesses to develop staff potential and provide solutions to business issues. Our expertise is wide ranging, from engineering, information, health and social sciences, to business, arts and education - and we’re national leaders in work based learning solutions.",
"img2": "http://ielts.progmic.com/images/uni/thumb/1340410005.jpg"
},
{
"url2": "http://ielts.progmic.com/app/i_UNIVERSITYOFSCOTLAND.html",
"title": "UNIVERSITY OF SCOTLAND ",
"img": "http://ielts.progmic.com/images/uni/1340410189.jpg",
"url": "http://ielts.progmic.com/app/UNIVERSITYOFSCOTLAND.html",
"desc": " Founded in 1451, Glasgow is the fourth-oldest university in the English-speaking world. Over the last five centuries and more, we’ve constantly worked to push the boundaries of what’s possible. We’ve fostered the talents of seven Nobel laureates, one Prime Minister and Scotland’s inaugural First Minister. We’ve welcomed Albert Einstein to give a lecture on the origins of the general theory of relativity. Scotland’s first female graduates completed their degrees here in 1894 and the world’s first ultrasound images of a foetus were published by Glasgow Professor Ian Donald in 1958. In 1840 we became the first university in the UK to appoint a Professor of Engineering, and in 1957, the first in Scotland to have an electronic computer. All of this means that if you choose to work or study here, you’ll be walking in the footsteps of some of the world’s most renowned innovators, from scientist Lord Kelvin and economist Adam Smith, to the pioneer of television John Logie Baird.",
"img2": "http://ielts.progmic.com/images/uni/thumb/1340410189.jpg"
}
]
You can copy and paste the JSON in a file, save it in various encodings and see what's going on with your code. UTF-8 should be the way to go.

Related

Natural Text Generation based on key-value pairs

I have 40 entities forming a table. These entities consist of numerical or alphabetical key:value pairs in the context of real estate.
A table could look like this:
- Address: 10066 Cielo Dr BEVERLY HILLS, CA 90210 UNITED STATES
- Price: $69,995,000
- Interior: 21,000 Sq Ft.
- Property Type: Single Family Home
- Year Home Built: 1996
Based on this table, which is generated synthetically, I would like to develop a grammatically correct natural text that considers the entire set of key:value pairs.
In our example, something like this:
"The property is located at 10066 Cielo Dr BEVERLY HILLS, CA 90210 UNITED STATES and costs
$69,995,000. The beautiful interior of the Single Family home sums up to 21,000 Sq Ft. The home was built in 1996."
I learned that GAN BERT might be the approach for this problem. Would it be possible to make that work or do you know a better approach?
Thanks for any sort of help, I really appreciate it!
NLPChoppa

Google Reverse Geocoding - how to choose between a street_address or route?

I have been using Google's reverse geocoding APIs in a vehicle tracking application to convert lat/lon information into an "address" for at least 5 years. Recently, this conversion has started yielding some surprising results.
For example, the lat/lon pair, 36.7653111,-121.74852, when plugged into Google Maps, yields "CA-156, Castroville, CA 95012" as the address. This is the desirable answer.
The tracking application yields "11298 Haight St, Castroville, Monterey County, CA, 95012, US". The problem is that the JSON result contains two "street_address" types and one "route" type. The dumb algorithm of choosing the first street_address or route occurring in the result no longer works. The question now is how to decide which of the possibilities is the better match for the given lat/lon. The lat/lon is clearly on route CA-156. Haight St. does not cross CA-156 at all.
What is special about this case is that the vehicle is not travelling on either of the streets in the two "street_address" types but is on the street in the route. In this case, the route should have been given priority over the two street_address types.
I have now examined the results of hundreds of reverse geocodings. There does not appear to be any simple algorithmic way of choosing the best result. For example, reverse geocoding 37.31674,-122.0472125 returns only two results:
Type: premise
Address: Child Development Center, Cupertino, Santa Clara County, CA 95014, US
location_type: ROOFTOP
37.316425,-122.0460558 Distance: 354.7286202778164 Feet
Type: route
Address: CA-85, Cupertino, Santa Clara County, CA 95014, US
location_type: GEOMETRIC_CENTER
37.3145586,-122.0461306 Distance: 855.4738140974437 Feet
The vehicle is travelling on CA-85. Choosing the first result (premise) or the result with least distance, does not yield the best result.
The fundamental problem here is that for "route" types, the distance to the GEOMETRIC_CENTER does not tell you whether you are "on the route" (0 distance) or, if you are "off the route", how far off.
I have filed a case with Google. If I get a useful response, I will post it here.
If you are reverse geocoding lat/lon information coming from in-vehicle devices, here are two approaches that significantly improve the results. The discussion assumes you have limited the results to the types "premise", "street_address" or "route". If you are interested in other types, you may have to experiment a bit.
First, if the in-vehicle device returns the speed along with lat/lon, then choose the "route" result, if one is present, when the speed is above a certain threshold. Otherwise, choose the "street_address" or "premise" with the least distance to the lat/lon. You may have to experiment a bit with the speed threshold to find a reasonable value. For me, 25 MPH seemed to do a decent job.
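A minimal sketch of this speed-based rule, written in Python for illustration. The `Candidate` type, the `pick_by_speed` name and the fallback to a route when no "street_address" or "premise" exists are assumptions made for the example. The distances are the ones from the Cupertino example above, rounded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    kind: str           # "route", "street_address" or "premise"
    address: str
    distance_ft: float  # distance from the reported lat/lon


SPEED_THRESHOLD_MPH = 25.0  # the value that worked for the author; tune as needed


def pick_by_speed(candidates: List[Candidate], speed_mph: float,
                  threshold: float = SPEED_THRESHOLD_MPH) -> Optional[Candidate]:
    routes = [c for c in candidates if c.kind == "route"]
    # Moving fast enough: prefer the route, if one is present.
    if speed_mph > threshold and routes:
        return routes[0]
    # Otherwise take the closest street_address or premise.
    nearby = [c for c in candidates if c.kind in ("street_address", "premise")]
    if nearby:
        return min(nearby, key=lambda c: c.distance_ft)
    # Assumption: fall back to a route when nothing else is available.
    return routes[0] if routes else None


cupertino = [
    Candidate("premise", "Child Development Center, Cupertino, Santa Clara County, CA 95014, US", 354.73),
    Candidate("route", "CA-85, Cupertino, Santa Clara County, CA 95014, US", 855.47),
]
```

With these inputs, a vehicle reported at highway speed would get the CA-85 route, while a slow or stopped vehicle would get the premise.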
Second, if you don't have speed or another indication that the vehicle is stopped or moving, then try the following "hack".
Scan the results up to the first occurrence of a "route" and determine amongst "premise" or "street_address" types the one with the least distance to the lat/lon. Remember the "route", if one is found.
Then
1. If no "route" result exists, return the "premise" or "street_address" with the least distance to the lat/lon.
2. Else:
   a. If the "route" has a name that matches the regex "[A-Z]+-[0-9]+", return the route as the best result.
   b. Else, if a least-distance "premise" or "street_address" exists, return that as the best result.
   c. Otherwise, return the "route" as the best result.
This is far from perfect, but seems to work well enough for the US which is all I care about right now. As route names differ significantly from country to country some enhancement will likely be necessary.
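The steps of the "hack" above can be sketched roughly as follows, again in Python for illustration. The tuple layout, the `pick_best` name, taking the route name as the first comma-separated part of the address, and requiring the regex to match the whole name are all assumptions. The second sample uses made-up addresses.

```python
import re
from typing import List, Optional, Tuple

# (type, address, distance in feet) -- a hypothetical layout for this sketch
Result = Tuple[str, str, float]

HIGHWAY_NAME = re.compile(r"[A-Z]+-[0-9]+")


def pick_best(results: List[Result]) -> Optional[Result]:
    route = None
    nearest = None
    # Scan up to the first "route", tracking the closest premise/street_address.
    for result in results:
        kind, _address, distance = result
        if kind == "route":
            route = result
            break
        if kind in ("premise", "street_address"):
            if nearest is None or distance < nearest[2]:
                nearest = result
    # Step 1: no route at all.
    if route is None:
        return nearest
    # Step 2a: highway-style route names such as "CA-85" win.
    # Assumption: the route name is the first comma-separated part of the address.
    route_name = route[1].split(",")[0].strip()
    if HIGHWAY_NAME.fullmatch(route_name):
        return route
    # Step 2b: otherwise prefer the closest premise/street_address, if any.
    if nearest is not None:
        return nearest
    # Step 2c: only the route is left.
    return route


cupertino = [
    ("premise", "Child Development Center, Cupertino, Santa Clara County, CA 95014, US", 354.73),
    ("route", "CA-85, Cupertino, Santa Clara County, CA 95014, US", 855.47),
]
# Made-up data: an ordinary street name does not match the highway pattern.
side_street = [
    ("street_address", "12 Example St, Sampletown, CA, US", 100.0),
    ("route", "Example Rd, Sampletown, CA, US", 50.0),
]
```

For the Cupertino results this returns the CA-85 route even though the premise is closer, which is the behaviour the answer is after.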

Objective-C algorithm to find largest common subsets of arrays?

I'm currently in need of an efficient solution to finding the largest common subsets of multiple arrays.
For example:
Let's say a user, Chris, wants to find other users with common interests (from most common to least common); we'd have to compare his array of interests with other users' arrays and find the largest common subset to the smallest common subset.
Chris {bowling, gaming, skating, running}
And other users in database.
Brad {bowling, jumping, walking, sitting}
John {bowling, gaming, skating, eating}
Sarah {bowling, gaming, drawing, coding}
So Chris has the most common interests, respectively, with John, then Sarah, then Brad.
How would I, in Objective-C, be able to do this? Any pointers would be great.
You are looking for an algorithm to find the cardinality of a set intersection.
Depending on your set representation, you could choose different ways of doing it. The most performant representation for this would be using bits in an integer, but if the number of possible interests exceeds 64 this may not be easy to implement.
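To illustrate the bit representation, here is a small Python sketch. The `ALL_INTERESTS` catalogue and the helper names are assumptions made for the example. Python integers are arbitrary-precision, so the 64-interest limit mentioned above applies to fixed-width integers (as in Objective-C), not to this sketch.

```python
# Hypothetical catalogue: each known interest gets one bit position.
ALL_INTERESTS = ["bowling", "gaming", "skating", "running", "jumping",
                 "walking", "sitting", "eating", "drawing", "coding"]
BIT = {name: 1 << i for i, name in enumerate(ALL_INTERESTS)}


def to_mask(interests):
    """Encode a list of interests as an integer bitmask."""
    mask = 0
    for name in interests:
        mask |= BIT[name]
    return mask


def common_count(a, b):
    """Cardinality of the intersection: count the bits set in both masks."""
    return bin(a & b).count("1")


chris = to_mask(["bowling", "gaming", "skating", "running"])
brad = to_mask(["bowling", "jumping", "walking", "sitting"])
john = to_mask(["bowling", "gaming", "skating", "eating"])
sarah = to_mask(["bowling", "gaming", "drawing", "coding"])
```

The intersection becomes a single bitwise AND, and the count of set bits gives the number of shared interests.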
A straightforward way of implementing it would be with NSMutableSet, like this:
// Prepare the individual lists
NSArray *chris = @[@"bowling", @"gaming", @"skating", @"running"];
NSArray *brad = @[@"bowling", @"jumping", @"walking", @"sitting"];
// Obtain the intersection
NSMutableSet *common = [NSMutableSet setWithArray:chris];
[common intersectSet:[NSSet setWithArray:brad]];
// count is an NSUInteger, so cast it for the format specifier
NSLog(@"Common interest count: %lu", (unsigned long)common.count);
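To get the ordering the question asks for (most to least in common), the same intersection count can be computed per user and then sorted. A minimal Python sketch of that idea, using the names from the question; `rank_by_common_interests` is a hypothetical helper:

```python
def rank_by_common_interests(target, others):
    """Return (name, shared_count) pairs, most shared interests first."""
    scored = [(name, len(target & interests)) for name, interests in others.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


chris = {"bowling", "gaming", "skating", "running"}
others = {
    "Brad": {"bowling", "jumping", "walking", "sitting"},
    "John": {"bowling", "gaming", "skating", "eating"},
    "Sarah": {"bowling", "gaming", "drawing", "coding"},
}

ranking = rank_by_common_interests(chris, others)
```

In Objective-C the equivalent would presumably be to build one intersected NSMutableSet per user, as above, and sort the users by the resulting counts.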

In ElasticSearch, removed stop words continue to have a small effect on scoring

Base Match Query: Billy Sue
Test Match Query #1: Billy Sue and
Test Match Query #2: Billy and Sue
We end up with identical scores between Base and #1, but Base and #2 have similar yet different scores.
Using the analyze API, the stop word "and" is removed in both test queries, but the start_offset and end_offset token properties differ for Sue between the Base query and Test Query #2.
Essentially, the pre-stop-word-removal distance between the remaining tokens is recorded and has a small yet finite impact on scoring.
The Question
Is there a way to delay the calculation of the start_offset and end_offset properties of tokens until after stop-words are removed, or otherwise prevent removed stop-words from influencing scoring in any fashion?
Perhaps disable position increments on the stop word filter and see if that helps? Especially if your mapping has some kind of filter after the stop word filter, you'll get strange artifacts from the position increments.
E.g. something like this:
"analyzer": {
"analyzer_example":{
"tokenizer":"standard",
"filter":["standard", "lowercase", "filter_stop"]
}
},
"filter": {
"filter_stop":{
"type":"stop",
"enable_position_increments":"false"
}
}
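To see what position increments do, here is a toy model in Python. It is not Elasticsearch's actual analysis chain and says nothing about how scores are computed. It only imitates the way a removed stop word can leave a gap in token positions. The `analyze` function and its flag are made up for the illustration.

```python
STOP_WORDS = {"and"}


def analyze(text, enable_position_increments=True):
    """Toy analyzer: lowercase, split on whitespace, drop stop words.

    Returns (token, position) pairs. With position increments enabled,
    a removed stop word still consumes a position, leaving a gap.
    """
    tokens = []
    position = -1
    for word in text.lower().split():
        if word in STOP_WORDS:
            if enable_position_increments:
                position += 1  # the removed word still takes up a slot
            continue
        position += 1
        tokens.append((word, position))
    return tokens


base = analyze("Billy Sue")
test_1 = analyze("Billy Sue and")
test_2 = analyze("Billy and Sue")
test_2_no_gap = analyze("Billy and Sue", enable_position_increments=False)
```

In this model the Base query and Test Query #1 produce identical tokens, Test Query #2 puts "sue" one position further along, and turning the increments off removes that gap. That matches the pattern described in the question, though whether it fully explains the score difference would need checking against the real cluster.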

Calculating group row at run time of a NSTableView

I have a NSTableView, filled with song titles.
The songs are ordered by artists, so I can display the artist name as a row view.
Code
Until now, I had to iterate through the table data and manually add the artists to the array.
NSMutableArray *songsAndArtists = [NSMutableArray array];
Artist *artist = nil;
for (Song *song in self.songs) {
    // Insert the artist once, in front of that artist's first song
    if (artist != song.artist) {
        artist = song.artist;
        [songsAndArtists addObject:artist];
    }
    [songsAndArtists addObject:song];
}
This can be really slow if you have 1'000 - 5'000 songs.
I haven't really figured out how I could speed up this process.
Does anybody have an idea how to calculate this at runtime?
EDIT
I'll try to clarify what I meant:
NSTableView supports group rows. My table view displays all the songs in CoreData.
I'd like to use the group rows to display the artist of the songs, like in the print screen I added above.
To do this I have to provide an array with an artist, followed by its songs, followed by the next artist, and so on.
The code above shows how to insert the artists, but it's a pretty time-consuming process.
So my question, how can I speed this up?
HINT
Like nielsbot and Feloneous Cat suggested, iterating through all the artists won't work for me.
The user also has the option to search through the library.
Therefore, not all the songs should actually appear in the list.
Solution
I just want to let you know what the problem was:
The problem was that I was actually comparing via NSString in the previous version.
A pretty stupid mistake...
It takes about 0.1 seconds or less which is great:
tableData = [self addGroupRowsToArray:[self allSongs] withKeyPath:@"artist"];
- (NSArray *)addGroupRowsToArray:(NSArray *)array withKeyPath:(NSString *)keyPath {
    NSMutableArray *mixedArray = [NSMutableArray array];
    id groupRowItem = nil;
    for (id arrayItem in array) {
        // Pointer comparison: a new group starts whenever the key-path object changes
        if (groupRowItem != [arrayItem valueForKeyPath:keyPath]) {
            groupRowItem = [arrayItem valueForKeyPath:keyPath];
            [mixedArray addObject:groupRowItem];
        }
        [mixedArray addObject:arrayItem];
    }
    return mixedArray;
}
Well, I've tried to reproduce your problem. I don't know where you get your data from, so I faked it just for a performance test. I also tested the code on a 4th-generation iPod, and it works without any lag. (I even tried 50,000 rows; it only hung at startup, then worked perfectly.)
In general, my approach differs from yours in its data structures: you use arrays and I use dictionaries. Still, I think there should be no issues even if I used arrays. Maybe I've misunderstood what you asked?
Still, my approach seems better suited to making search work (sorry, I couldn't be bothered to implement it). Indeed, when a search takes place, whole sections may be excluded without checking their songs, unless you need to search the songs themselves, of course. Either way, this gives more flexibility.
Okay, here is a link: https://github.com/igorpakushin/BigList
I will be glad to hear some feedback,
thank you
regards,
Igor
Is this faster?
-(NSArray*)songsAndArtists:(NSArray*)allArtists
{
    NSMutableArray * result = [ NSMutableArray array ] ;
    for( Artist * artist in allArtists )
    {
        [ result addObject:artist ] ;
        [ result addObjectsFromArray:artist.songs ] ;
    }
    return result ;
}
If you are getting "all artists" from Core Data, you can tell it to prefetch the objects in the songs relationship, which will speed things up further.
-(NSArray*)allArtists
{
    NSFetchRequest * request = [ NSFetchRequest fetchRequestWithEntityName:@"Artist" ] ;
    [ request setRelationshipKeyPathsForPrefetching:@[ @"songs" ] ] ;
    ...
    return results ;
}
So, let's back up and look at what you REALLY have. You already have all the information that you need residing in songs. Why are you in essence duplicating it merely to break out the artist?
Think of it this way, if you have the following songs (song/artist)
(0) "Death Eater", "Raging Machine Code"
(1) "Interrupt", "Raging Machine Code"
(2) "Panic", "Times Square Revolution"
(3) "New Years", "Times Square Revolution"
(4) "Toast", "Ed & Billy's Time Machine"
(5) "Surge", "Quiet Cat"
(6) "Surveil", "Quiet Cat"
What is wrong with this picture? We have the same information duplicated. Ideally, we want to have something like this:
"Raging Machine Code" -> has an array that contains
"Death Eater"
"Interrupt"
"Times Square Revolution" -> has an array that contains
"Panic"
"New Years"
"Ed & Billy's Time Machine" -> array that contains
"Toast"
"Quiet Cat" -> array that contains
"Surge"
"Surveil"
For UITableView this makes things trivial - we tell it how many sections (four) and for each section how many songs. To generate the cell, we get an NSIndexPath that tells us the section and the row (section would be the artist and the row would be the song for that artist).
NSTableView doesn't do that. It gives us a row. However, if we do some work on the front end (i.e. the song list) then we can assure that life will be beautiful and fast (or at least faster). The key is to pre-calculate the number of songs per artist and store it.
So assume we are asked to display row 5. "Raging Machine Code" is 0-2 (artist, song, song). "Times Square Revolution" is 3-5. Ah! 5 is the last song, so we display "New Years"!
Try another one, assume we are to display row 6. "Raging..." was 0-2, "Times..." was 3-5, "Ed & Billy's" is 6-7. Bingo, we need to display the artist, "Ed & Billy's Time Machine"!
The idea is that we do much of this prework AHEAD of actually displaying the data. Looping through artists will be far faster than looping through ALL the songs. Plus now you are only doing simple math - no need to move things.
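The row arithmetic above can be sketched like this, in Python for brevity. The `artists` structure and the `item_for_row` name are assumptions made for the example, and the binary search is one possible way to do the lookup. In the real table view the same lookup would sit in the data source methods.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical pre-grouped data: (artist, songs) pairs from the example above.
artists = [
    ("Raging Machine Code", ["Death Eater", "Interrupt"]),
    ("Times Square Revolution", ["Panic", "New Years"]),
    ("Ed & Billy's Time Machine", ["Toast"]),
    ("Quiet Cat", ["Surge", "Surveil"]),
]

# Pre-calculate once: each group takes one row for the artist plus one per song.
sizes = [1 + len(songs) for _, songs in artists]
starts = [0] + list(accumulate(sizes))[:-1]  # first row of each group
total_rows = sum(sizes)


def item_for_row(row):
    """Map a flat table row to either the artist header or one of the songs."""
    group = bisect_right(starts, row) - 1  # last group starting at or before row
    name, songs = artists[group]
    offset = row - starts[group]
    if offset == 0:
        return ("artist", name)
    return ("song", songs[offset - 1])
```

Only the per-artist counts are stored, so looking up a row costs a binary search over the artists rather than a pass over all the songs.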
Sometimes how you store your data means the difference between success and failure. Whenever you see yourself having to "redefine a data structure" that usually means your data structure is faulty - it may be simple, but it causes more effort.
Hopefully this was helpful and didn't end up being "TLTR" (too long to read).