Google Custom Search automatic spell checking

We're having a problem with the automatic spell checking on queries in the XML results of the Google Custom Search.
Queries that are misspelled return results for the corrected spelling, e.g. socer becomes soccer and returns results for soccer. On Google.com you can then search for the original query by adding nfpr=1 to the query string. However, this doesn't work in Google Custom Search, and I've been unable to find any other way to search for the incorrect spelling.

For a standard Google search this behavior can be avoided by adding the argument &nfpr=1 to the query URL.
For a custom search based on the AJAX API, this unfortunately isn't possible. The only way I've found is to use JavaScript to parse the user's query, then use a regular expression to put quotes around each single word that is not yet quoted. So for example, if the keywords received are
"bmw z4" manual
you would change that to
"bmw z4" "manual"
which has the same effect, except that it disables the auto-correction. Unfortunately if you want to deal with all the special cases of advanced logical syntax (AND, OR, |, -, etc.), your regexp gets a bit complex.
Myself, I just parse the response from Google to see if this is happening, and if so notify the user how to prevent it (by putting quotes around the offending word(s)).
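The quoting workaround described above can be sketched roughly as follows. This is a minimal illustration in Python rather than JavaScript, and it only covers a small, assumed set of operators (AND, OR, | and a leading -); the function name quote_unquoted_words is made up for this example, and the full advanced syntax would need more cases.

```python
import re

# A token is either an already-quoted phrase or a run of non-space characters.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')

# Operators that are passed through unchanged (an assumed, incomplete set).
OPERATORS = {"AND", "OR", "|"}


def quote_unquoted_words(query: str) -> str:
    """Wrap each bare word in double quotes, leaving quoted phrases,
    operators and -exclusions as they are."""
    out = []
    for token in TOKEN_RE.findall(query):
        if token.startswith('"') or token.startswith("-") or token in OPERATORS:
            out.append(token)
        else:
            out.append(f'"{token}"')
    return " ".join(out)


print(quote_unquoted_words('"bmw z4" manual'))
```

For the example query above this produces "bmw z4" "manual"; unbalanced quotes and parentheses are not handled.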

Bing Spell Checker is not working as expected

https://api.bing.microsoft.com/v7.0/spellcheck?text=mus&mkt=en-GB&mode=spell
Using the above URL, with my API key supplied in the request headers, I am getting no results back from the Bing Spell Check API, either from Postman or from Refit in C# (Xamarin Forms).
For 'mus' I'm expecting 'must' and for 'tst' I'm expecting 'test', but neither seems to work.
I've set the mode to 'spell', which should be the right one; however, even 'proof' mode doesn't return any results.
Please can somebody let me know why?
Thanks
The documentation explains:
Bing Spell Check API lets you perform contextual grammar and spell checking on a text string. While most spell-checkers rely on dictionary-based rule sets, the Bing spell-checker leverages machine learning and statistical machine translation to provide accurate and contextual corrections.
I assume that passing single words in most cases won’t provide enough information for this approach.
Try passing phrases that put your words in some reasonable context like mst have or quality tst.
For single word suggestions you can try a dictionary based service or software package.
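To try the suggestion above, the same endpoint can be called with a short phrase instead of a single word. The sketch below only builds the request URL from the parameters shown in the question (text, mkt, mode); the helper name build_spellcheck_url is hypothetical, and the subscription-key header mentioned in the comment is my understanding of the Bing API rather than something stated in this thread.

```python
from urllib.parse import urlencode

# Endpoint taken from the question above.
ENDPOINT = "https://api.bing.microsoft.com/v7.0/spellcheck"


def build_spellcheck_url(text: str, market: str = "en-GB", mode: str = "spell") -> str:
    """Build the spell check URL for a phrase; spaces are encoded as '+'."""
    return ENDPOINT + "?" + urlencode({"text": text, "mkt": market, "mode": mode})


# The API key still has to be sent as a request header
# (assumed here to be "Ocp-Apim-Subscription-Key").
print(build_spellcheck_url("mst have"))
print(build_spellcheck_url("quality tst"))
```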

regular expression does nothing in import.io

I'm trying to figure out how to use regular expressions on import.io. I have an HTML column that successfully pulls data from a link on the web page. I want to extract just part of the querystring on the link, so I go to the regexp field and enter a regular expression that tests successfully on regex101.com. The problem is, the extracted data does not change at all. In fact, I can type complete gibberish in the regexp field and it has absolutely no effect on the extracted data. I'm a bit mystified.
If my regular expression is wrong, shouldn't the extracted data change to nothing? Is there some trick to using the regexp field? Do I have to enter something in the xpath field? I clicked on View JSON button and copied the xpath for this column there and pasted that into the manual xpath box, but that didn't change anything either.
Is there a tutorial somewhere for how to use the regexp field? And I'm not asking about how to use regular expressions, just the interface for it on import.io.
Grant,
You are correct. At the moment it is not possible to apply regexp to HTML columns. There is a post in the ideas forum capturing this as a feature request. You may want to upvote it; that way you'd also be notified if the idea gets built:
http://support.import.io/forums/199278-ideas-forum/suggestions/6328279-apply-regular-expressions-to-html

how to correct spelling mistakes in Google custom API

I am using Google's Custom Search API. I make an HTTP request to a URL that looks like this:
https://www.googleapis.com/customsearch/v1?key=<my-key>&cref=&num=10&q=how+can+i+do+htis
If you search for "how can i do htis" on Google, you are told "Showing results for how can i do this" and given some results (call them result set A).
But if you use the API to search for the misspelled string, you get different results from those in A. Searching with the correctly spelled string gives you result set A, which matches the ordinary Google search.
Is there a way to search directly using the suggested string? I want to use the API because I can't afford to implement a spell checker myself that can also correct people's names and everything else.
I think what you want to do is possible using the spelling suggestions of Google. This is part of the xml-results returned by your query.
See API here.
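One way to use the suggestion is to run the query once, look for a spelling suggestion in the response, and repeat the search with it. The sketch below assumes the JSON variant of the API and assumes the suggestion arrives as a spelling object with a correctedQuery field; check this against the API reference. The fetch argument is a hypothetical callable that performs the HTTP request and returns the parsed response.

```python
from typing import Callable, Optional


def corrected_query(response: dict) -> Optional[str]:
    """Return the suggested spelling from a parsed response, if any.

    Assumes a top-level "spelling" object with a "correctedQuery" field.
    """
    spelling = response.get("spelling") or {}
    return spelling.get("correctedQuery")


def search_with_correction(query: str, fetch: Callable[[str], dict]) -> dict:
    """Search once, then search again with the suggestion if one was given."""
    first = fetch(query)
    suggestion = corrected_query(first)
    if suggestion and suggestion != query:
        return fetch(suggestion)
    return first
```

This costs a second request for every misspelled query, which matters if the API quota is tight.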

"Anti-XSS protection" by adding )]}' before ajax response

Google Plus returns AJAX responses with )]}' on the first line. I heard it is protection against XSS. Are there any examples of what an attacker could do, and how, if that protection were not there?
Here's my best guess as to what's happening here.
First off, there are other aspects of the google json format that aren't quite valid json. So, in addition to any protection purposes, they may be using this specific string to signal that the rest of the file is in google-json format and needs to be interpreted accordingly.
Using this convention also means that the data feed won't execute when it is loaded through a script tag, nor when the JavaScript is interpreted directly with eval(). This ensures front-end developers pass the content through a parser, which keeps any implanted code from executing.
So to answer your question, there are two plausible attacks that this prevents: one cross-site through a script tag, but the more interesting one is within-site. Both attacks assume that:
a bug exists in how user data is escaped and
it is exploited in a way that allows an attacker to inject code into one of the data feeds.
As a simple example, let's say a user figured out how to take a string like example
["example"]
and changed it to "];alert('example');
[""];alert('example');"]
Now, when that data shows up in another user's feed, the attacker can execute arbitrary code in that user's browser. Since it's within-site, cookies are being sent to the server, and the attacker could automate things like sharing posts or messaging people from the user's account.
In the Google scenario, these attacks won't work for a number of reasons. The first 5 characters will cause a javascript error before the attack code is run. Plus, since developers are forced to parse the code instead of accidentally running it through an eval, this practice will prevent code from being executed anyway.
As others said, it's a protection against Cross Site Script Inclusion (XSSI)
We explained this on Gruyere as:
Third, you should make sure that the script is not executable. The
standard way of doing this is to append some non-executable prefix to
it, like ])}while(1);. A script running in the same domain can
read the contents of the response and strip out the prefix, but
scripts running in other domains can't.
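On the consuming side, a same-origin client reads the response as text, strips the prefix, and only then hands the rest to a JSON parser. A minimal sketch of that step, assuming the )]}' prefix from the question and written in Python for illustration (parse_prefixed_json is a made-up name):

```python
import json

# Prefix from the question; other services use different ones.
XSSI_PREFIX = ")]}'"


def parse_prefixed_json(body: str):
    """Strip the non-executable prefix, if present, and parse the rest as JSON.

    The payload is never evaluated as code, only parsed as data.
    """
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


print(parse_prefixed_json(")]}'\n[\"example\"]"))
```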

Programmatic access to On-Line Encyclopedia of Integer Sequences

Is there a way to search and retrieve the results from On-Line Encyclopedia of Integer Sequences (http://oeis.org) programmatically?
I have searched their site and the results are always returned in HTML. They do not seem to provide an API, but in the policy statement they say it's acceptable to access the database programmatically. But how can I do it without screen scraping?
Thanks a lot for your help.
The OEIS now provides several points of access, not just ones using their internal format. These seem largely undocumented, so here are all of the endpoints that I have found:
https://oeis.org/search?fmt=json&q=<sequenceTerm>&start=<itemToStartAt>
Returns a JSON formatted response of the results found from the sequenceTerm given. If too many results were returned, count will be > 0 whilst results will be null. If no results were returned, count will be 0. itemToStartAt is used for pagination of results, as only a maximum of 10 are ever returned. This starts at 0. If you wanted to return a second page of results, this would equal 10. Information about what each of the entries means can be found here.
https://oeis.org/search?fmt=text&q=<sequenceTerm>&start=<itemToStartAt>
Exactly the same arguments as before, but this returns the results in the OEIS internal format, which is largely written about here. Unless your project requires it, I'd highly recommend using the JSON format over this.
https://oeis.org/search?fmt=<json|text>&q=id:A<sequenceNumber>
Will return a single result if the sequenceNumber is found. This is the suggested method for obtaining single sequences, as it appears to be far more optimised than some of the alternative methods that can be used as queries. Requests often take under a second. Alternative search query methods can be found on this page.
https://oeis.org/A<sequenceNumber>/graph?png=1
This endpoint can be used to grab the images used to graph the data points. Alternatively, setting png to zero returns the HTML page containing a graph of it.
https://oeis.org/recent.txt
This returns a list of recently updated entries in the OEIS internal format. There are no parameters available, or JSON format, as this seems like a static text file that is simply being served to the client. Due to the length of replies from the OEIS database (for some sequences replies can take above five seconds), I'd highly recommend heavily caching requests and using the above endpoint to update them when they change.
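The search endpoints listed above can be wrapped in a couple of small URL builders. This is only a sketch: the function names are made up, and the six-digit zero padding of the A-number is an assumption based on how IDs such as A000032 are written.

```python
from urllib.parse import urlencode

BASE = "https://oeis.org/search"


def search_url(query: str, start: int = 0, fmt: str = "json") -> str:
    """URL for a term search; start is the offset of the first result
    (0 for the first page, 10 for the second, and so on)."""
    params = {"fmt": fmt, "q": query, "start": start}
    # Keep ':' and ',' unescaped so the URL matches the forms shown above.
    return BASE + "?" + urlencode(params, safe=":,")


def sequence_url(number: int, fmt: str = "json") -> str:
    """URL for a single sequence looked up by its A-number."""
    params = {"fmt": fmt, "q": f"id:A{number:06d}"}
    return BASE + "?" + urlencode(params, safe=":,")


print(search_url("2,5,14,50,233", start=10))
print(sequence_url(32, fmt="text"))
```

Given the response times mentioned above, it is worth caching whatever these URLs return rather than refetching on every call.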
A URL of the form http://oeis.org/search?fmt=text&q=2,5,14,50,233 gives a nicely formatted text output.
But it seems there is no way to get a single sequence in text form.
If you happen to use Mathematica, it sounds like the following notebook might help. It allows you to specify a sequence and automatically import a detailed list of matching entries from the OEIS:
http://www.brotherstechnology.com/math/oeis_mathematica.html
It looks like direct use of their CGI program is the only API they provide.
URL for Searching the Database
https://oeis.org/search?q=id:A000032&fmt=text
gives the plain text form of an entry in their internal format
https://oeis.org/eishelp1.html