Wiki API - Parsing sentences from JSON extracts in JavaScript? - api

Is there a way to have wiki display extracts in an array of sentences?
Or does anyone have any ideas other than using string.split(".") to parse? There are cases where the sentence may include a . and I don't want to split if it occurs mid-sentence.
For example, "The Eagles were No. 1 in the U.S. in 1970" would be split into 4 sentences using str.split(), and that's not what I want.
Wiki must have some sort of determination of what defines a sentence as it works when you limit the number of existence in a call (they don't break a sentence on an in-line period). Is there a way to get them individually?
Looking for a solution in JavaScript to parse a JSON excerpt string.

I ended up figuring out a work-around. Using exsentences, I made 10 calls, each with one more sentence than the previous call. I stored the results of each call in an array. So when the 10 calls were complete, I had 10 strings, ranging from one sentence in position 0, up to 10 sentences in the 9th position. Then I just iterated through the array, from 0 to length - 2, subtracting the string in the current position from the string at position [i + 1] (with string[i + 1].slice(string[i].length)), to get the nth string.

Related

Freemarker: Removing Items from One Sequence from Another Sequence

This may be something really simple, but I couldn't figure it out and been trying to find an example online to no avail. I'm basically trying to remove items found in one sequence from another sequence.
Example #1
Items added to the cart is in one sequence; items removed from cart is in another sequence:
<#assign Added_Items_to_Cart = "AAAA,BBBB,CCCC,DDDD,EEEE,FFFF">
<#assign Deleted_Items_from_Cart = "BBBB,DDDD">
The result I'm looking for is: AAAA,CCCC,EEEE,FFFF
Example #2
What if the all items added to and deleted from cart are in the same sequence?
<#assign Cart_Activity = "AAAA,BBBB,BBBB,CCCC,DDDD,EEEE,DDDD,FFFF,Add,Add,Delete,Add,Add,Add,Delete,Add">
The result I'm looking for is the same: AAAA,CCCC,EEEE,FFFF
First things first: You ask about sequence but the data you are dealing with are strings.
I know you are using the string to work as a sequence (and it works), but sequences are sequences and strings are strings, and they have diferente ways of dealing with. I just felt this was important to clarify if someone who is starting to learn how to program get to this answer.
Some assumptions since you're providing strings with data separated by comma:
You want a string with data separated by comma as a result.
You know how to properly create strings with data separated by comma.
You dont have commas in your items names.
Observations:
I'll give you the logic but not the code donne, as this can be a great chance for you to learn/practice freemarker (stackoverflow spirit, you know...)
You question is not about something specific of freemaker (it just happens to be the language you want to work with). Think about adding the logic tag to you question. :-)
Now to the answer on how to do what you want on a "string that is working as a sequence":
Example #1
Change your string to a real sequence :-)
1 - Use a built-in to split your string on commas. Do it for both Added_Items_to_Cart and Deleted_Items_from_Cart. Now you have two real sequences to work with.
2 - Create a new string tha twill be your result .
3 - Iterate over the sequence of added itens.
4 - For each item of the added list, you will check if the deleted list also contains this item.
4.1 - If the deleted list contains the item you do nothing.
4.2 - If the deleted list do not contains the item, you add that item to your string result
At the end of this nested iteration (thats another hint) you should get the result you're looking for.
Example #2
There are many ways of doing it and i'll just share the one that pops out of my mind right now.
I think it's noteworthy that in this approach you will always have an even sized list, as you always insert 2 infos each time: item and action.
So always the first half will be the 'item list' and the second half will be the 'action list'.
1 - Change that string to a sequence (yes, like on the other example).
2 - Get half of its size (in your example size = 16 so half of it is 8)
3 - Iterate over a range from 0 to half-1 (in your example 0 to 7)
4 - At each iteration you'll have a number. Lets call it num (yes I'm very creative):
4.1 - If at the position num + half you have the word "Add" you add the item of position num in your result string
4.2 - If at the position num + half you have the word "Delete" you remove the item of position num from your result string
And for the grand finale, some really usefull links that will help you in your freemarker life forever!!!
All built-ins from freemarker:
https://freemarker.apache.org/docs/ref_builtins.html
All directives from freemarker:
https://freemarker.apache.org/docs/ref_directive_alphaidx.html
Freemarekr cheatsheet :
https://freemarker.apache.org/docs/dgui_template_exp.html#exp_cheatsheet

How to treat numbers inside text strings when vectorizing words?

If I have a text string to be vectorized, how should I handle numbers inside it? Or if I feed a Neural Network with numbers and words, how can I keep the numbers as numbers?
I am planning on making a dictionary of all my words (as suggested here). In this case all strings will become arrays of numbers. How should I handle characters that are numbers? how to output a vector that does not mix the word index with the number character?
Does converting numbers to strings weakens the information i feed the network?
Expanding your discussion with #user1735003 - Lets consider both ways of representing numbers:
Treating it as string and considering it as another word and assign an ID to it when forming a dictionary. Or
Converting the numbers to actual words : '1' becomes 'one', '2' as 'two' and so on.
Does the second one change the context in anyway?. To verify it we can find similarity of two representations using word2vec. The scores will be high if they have similar context.
For example,
1 and one have a similarity score of 0.17, 2 and two have a similarity score of 0.23. They seem to suggest that the context of how they are used is totally different.
By treating the numbers as another word, you are not changing the
context but by doing any other transformation on those numbers, you
can't guarantee its for better. So, its better to leave it untouched and treat it as another word.
Note: Both word-2-vec and glove were trained by treating the numbers as strings (case 1).
The link you provide suggests that everything resulting from a .split(' ') is indexed -- words, but also numbers, possibly smileys, aso. (I would still take care of punctuation marks). Unless you have more prior knowledge about your data or your problem you could start with that.
EDIT
Example literally using your string and their code:
corpus = {'my car number 3'}
dictionary = {}
i = 1
for tweet in corpus:
for word in tweet.split(" "):
if word not in dictionary: dictionary[word] = i
i += 1
print(dictionary)
# {'my': 1, '3': 4, 'car': 2, 'number': 3}
The following paper can be helpful: http://people.csail.mit.edu/mcollins/6864/slides/bikel.pdf
Specifically, page 7.
Before they use an <unknown> tag they try to replace alphanumeric symbol combination with common pattern names tags, such as:
FourDigits (good for years)
I've tried to implement it and it gave great results.

Parse streaming JSON in Objective C

I am using JSON-RPC over TCP, the problem is that I could not find any JSON parse capable of parsing multiple JSON objects correctly, and it would be relatively hard to split it, since there is no delimiter used.
Anyone knows a way how I could handle i.e. this:
{"foo":false, "bar: true, "baz": "cool"}{"ba
Somehow I need to split it so I end up just with the first, complete JSON object. The remaining string needs to stay in buffer until I have enough data to parse it properly.
XBMC's JSON-RPC doc does give a hint:
As such, your client needs to be able to deal with this, eg. by counting and matching curly braces ({}).
Update: As Jody Hagins pointed out, beware of curly braces inside JSON strings when using this approach.
Another possible and probably much better solution would be using a streaming JSON parser like yajl (or its Objective-C wrapper yajl-objc). You can feed the parser with data until it says the current object is done and then restart parsing.
#ePirat, if someone just concatenates multiple JSON dictionaries without delimiters, they should be shot.
For parsing: JSONSerialization parses NSData which could come in any encoding. Fortunately, if you have multiple JSON dictionaries concatenated, they are quite easy to take apart. All you need is looking at the bytes and check for the characters \ " { and }.
If you find a { then increase the counter for "open brackets".
If you find a } then decrease the counter for "open brackets". If the counter is at zero, you've found the end of a dictionary.
If you find a ", then repeatedly look at the next character. If the next character is a " then skip it and go to the normal processing (you've found the end of a string). If the next character is a \ then skip that character and the following character. If the next character is anything else, skip it.
If you reach the end of the data, then your JSON data is incomplete. You could remember which state you were in (count of open brackets, whether you are parsing a string, and if parsing a string whether you just encountered a backlash character) and continue right where you left off.
No need to convert the NSData to a string until you've separated it into dictionaries. If you suspect that you might be given UTF-16 or UTF-32, check whether bytes 0, 1, 2 or 1, 2, 3 are zero (UTF-32), then check whether bytes 0 and 2 or 1 and 3 are zero (UTF-16). But in that case, if the server sends non-standard JSON in UTF-16 or UTF-32, change "the person responsible should be shot" to "the person responsible must be shot".

Why does the rebol interpreter return different results?

Consider:
>> print max 5 6 7 8
6
== 8
The documentation states that max only takes two arguments, so I understand the first line. But from the second line it looks like the interpreter is still able to find the max of an arbitrary number of args.
What's going on here? What is the difference between the two results returned? Is there a way to capture the second one?
I don't really know Rebol but what I do notice is that you're using print inside of th REPL. The first output is from print, which is outputting the result of max 5 6. The second output is from the REPL, which is outputting the value of your whole expression — which is maybe just the last item in the list? If you changed the order of your inputs, I bet you would see a different result.
max is an abbreviation for maximum. As #hobbs correctly guessed, it takes two arguments, and what you're seeing is just the evaluator's logic of turning the crank...and becoming equal to the last value in the expression. In this case you're not using that result, so the interpreter shows it to you with "==". But you could have assigned that whole expression to a variable (for instance).
What you were intending is something that gets the maximum value out of a series. In the DO dialect all functions have fixed arity, and the right way to design such a beast would be to make it take one argument...the series.
Such a thing does exist, though there isn't an abbreviation...
>> print maximum-of [5 6 7 8]
8

Strip first 2 characters and replace with another

This seems like quite a silly basic question but it seems something I have completely passed over in my knowledge.
Basically I have a string representing a phone number (0 being the area code): 0828889988
And would like to replace the first zero with a 27 (South African dialing code) as I am pretty sure it will always be so, my SMS api requires it in international format but I want the user to enter it in local format, so should be: 27828889988
Is there a line or two of code I can call to replace that first character with the two others?
As is - I can think around a workaround solution but as I am not sure of the direct syntax will be quite a few lines long.
Dim number as String = "0828889988"
number = "27" + number.SubString(1)
return number //returns "27828889988"