Search for ==, ++, etc - operators

Google did not give me the desired results when querying for
python == and in or in
This is seemingly because Google omits common programming language operators like == from search terms. Is there a way to get around this limitation?
Previous answers recommended Google Code Search, yet that has been shut down. SymbolHound.com was recommended, yet it returned an empty results page.
This is not about searching for code, but instead searching for programming-related documentation (most often found on SO ;-), so are there any ways to search the web (or, for that matter, SO) for programming syntax?

There's an answer on the WebApps Stack I'll paraphrase here for posterity.
Write out the name of each symbol followed by "operator". Include the programming language if you want to be more specific.
Example                     Recommended Search
--------------------------  ------------------------
+= (addition assignment)    "plus equals operator"
%  (modulus)                "percent operator"
^  (C++ bitwise XOR)        "caret operator C++"
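
To make that advice concrete, here is a minimal Python sketch (the symbol-to-phrase table and function name are illustrative choices of mine, not any standard mapping) that turns an operator into a query a search engine won't strip:

# Minimal sketch: spell out operators so search engines keep them.
# The table is illustrative and far from complete.
OPERATOR_NAMES = {
    "+=": "plus equals operator",
    "%": "percent operator",
    "^": "caret operator",
    "==": "double equals operator",
}

def searchable_query(symbol, language=""):
    """Return a query string built from the operator's spoken name."""
    phrase = OPERATOR_NAMES.get(symbol, '"%s" operator' % symbol)
    return (phrase + " " + language).strip()

print(searchable_query("^", "C++"))       # caret operator C++
print(searchable_query("==", "python"))   # double equals operator python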

Ultimate complete list of native-internal VBA commands

After discovering (see here and here) that:
"VBA.Len" is not equivalent to "Len"
"VBA.LenB" is not equivalent to "LenB"
"VBA.Mid" is not equivalent to "Mid"
"VBA.Left$" is equivalent to "Left$"
and other confusing things like that:
"Left" and "InStr" are in this official list of keywords while it's not in this other official list of keywords
"InStrRev" and "LenB" don't appear in any official keywords lists while "InStr" and "Len" do appear in one or both lists
I'm left very confused about when to use "VBA." and when not.
Is there a way to obtain a real complete list of the native-internal VBA commands, more reliable than the official documentation?
I mean something like an "object browser" (where I can see the real complete list of the commands in the "VBA." library) for native-internal VBA commands?
The best resource is probably the VBE's help and its on-line equivalent, though it's not complete and doesn't go into the details you raise. It also doesn't document 'hidden' methods which were once intended to be deprecated but never were, some of which are very useful! Some of your questions were well answered in those other threads, but more generally about your points:
Qualifying with VBA is more about 'where' the method is sourced from, and then whether or not there is any difference in the actual method; in most cases there isn't, apart from the sourcing. Most of those string functions are in both the Strings module and the _HiddenInterface interface (which is faster).
Left$, though, is only in the Strings module, as are the similar $ functions, which is why you don't notice any difference in performance. Str is in both the _HiddenInterface and the Conversion module.
If you're sure the unqualified versions do what you want, you might think it best not to qualify, and generally that's fine. But if you ever end up with a project with a MISSING reference, unqualified Strings and DateTime functions will blow up before your code has a chance to be alerted. Depending on how the project is deployed, or if I'm not sure, I tend to fully qualify, e.g. VBA.Strings.Left$, VBA.DateTime.Date. FWIW, as a small bonus you'll get the IntelliSense.

Download OEIS sequences with known algorithm to produce them

I was reading some interesting questions about the topic "Can we make a program that, given a particular sequence, produces the next terms", like this one, and I really like the detailed answer to this one. I understand that the answer is "That's impossible without more restrictions", and that given some restrictions (polynomials, rational functions or boolean maps) we know some good algorithms, as the second answer I linked explains.
Now, a natural question is how much we can solve, trying our best even if we can't always solve it, to answer the original, general question. What I usually do when facing a hard sequence is try to see if it's in OEIS, and if it seems to be there, see whether there is any formula or algorithm to produce it. You can download a small version of OEIS with the first terms of each sequence, and you can make queries to find formulas or Maple algorithms for a particular sequence. My question is: do you think it's feasible to download a small version of OEIS that includes, along with the first terms, a little algorithm to produce each sequence?
The natural problem here is that I haven't seen any link to download the entire OEIS database with all the details, which maybe deserves its own question. Even if we had this, you would need to read the formulas/algorithms (which can be written in different languages, from what I've seen) and interpret them correctly. But I thought maybe someone here knows how to solve this; in any case, thanks in advance.
You could, as you note, download the sequences and their A-numbers from the link mentioned here: https://oeis.org/wiki/Welcome#Compressed_Versions
After searching that and finding one sequence (or a small number of sequences) of interest, you could scrape the respective page(s) for formulas. There are specific fields for Maple and Mathematica, which may be helpful, and otherwise, an entry in the PROGRAM field should include identifying information when it is not one of the standard languages with its own field in the database. See: http://oeis.org/wiki/Style_Sheet
Unofficially, but with the interests of the OEIS in mind, I would not recommend trying to download or scrape the OEIS in its entirety. Whether it's one person, or a whole host of people, we would certainly recommend using the compressed version of the database to identify sequences of interest by A-number first, then pulling their entire entry by scraping the site or querying the OEIS using methods that you have already mentioned: Programmatic access to On-Line Encyclopedia of Integer Sequences
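As a rough sketch of that workflow in Python (assuming the compressed "stripped" file from the page above has been downloaded and unpacked next to the script, and that the search endpoint's JSON format is as described on the OEIS site; the field names should be verified against a live response):

# Sketch: find candidate A-numbers locally, then fetch one full entry.
import json
import urllib.request

def find_anumbers(prefix, stripped_path="stripped"):
    """Return A-numbers whose stored terms begin with the given prefix."""
    needle = "," + ",".join(str(n) for n in prefix) + ","
    hits = []
    with open(stripped_path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            anumber, _, terms = line.partition(" ")
            if terms.strip().startswith(needle):
                hits.append(anumber)
    return hits

def fetch_entry(anumber):
    """Query the OEIS search endpoint for a single entry as JSON."""
    url = "https://oeis.org/search?q=id:%s&fmt=json" % anumber
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

candidates = find_anumbers([0, 1, 1, 2, 3, 5, 8, 13, 21])
print(candidates)                    # e.g. ['A000045', ...]
if candidates:
    entry = fetch_entry(candidates[0])
    # The 'formula', 'maple', 'mathematica' and 'program' fields hold the recipes.
    print(entry["results"][0].get("formula", [])[:3])

This keeps the bulk search local and only hits the site for the handful of entries you actually care about, in line with the advice above.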
If this sounds laborious, perhaps an alternative is the Wolfram Cloud, which achieves this through other means. For example, you can navigate to the cloud (you may have to register just to get access) at: https://www.wolframcloud.com/
Typing in something like FindSequenceFunction[{1, 2, 3, 5, 17, 305, 34865}] will give you a formula, if Wolfram/Mathematica can find one. The documentation for FindSequenceFunction can be found here: https://reference.wolfram.com/language/ref/FindSequenceFunction.html
Wolfram/Mathematica can also invoke the OEIS using packages like the one described here: https://mathematica.stackexchange.com/questions/40/is-it-possible-to-invoke-the-oeis-from-mathematica

lucene / solr remove common phrases (stop phrases)

I would like to eliminate from the search query the words/phrases that bring no meaning to the query (we could call them stop phrases). Example:
"How to .."
"Where can I find .."
"What is the meaning of .."
etc.
Where to find / how to compute a list of 'common phrases' for English and for French?
How to implement it in Solr (Is there anything more advanced than the stopwords feature?)
I think that you shouldn't try to completely get rid of these phrases, because they reveal the intent of the searcher. You can try to leverage their existence by using a natural language question answering system like Ephyra. There is even a project aimed at integrating it with Lucene. I haven't used it myself, but maybe it's at least worth evaluating.
If you are determined to remove them, then I think you need to write a custom QueryParser that filters the query, delegating further processing to a parser of your choice.
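
If you do decide to remove them but don't want to write a custom QueryParser right away, a simpler stopgap is to strip the stop phrases on the client before the query reaches Solr. Here is a minimal Python sketch with an illustrative phrase list of my own (a real deployment would curate lists per language from query logs):

import re

STOP_PHRASES = [
    "how to",
    "where can i find",
    "what is the meaning of",
    "comment faire",          # French: "how to do"
    "où puis-je trouver",     # French: "where can I find"
]

# Longest phrases first so "where can i find" wins over any shorter overlap.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in sorted(STOP_PHRASES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def strip_stop_phrases(query):
    """Remove stop phrases and collapse leftover whitespace."""
    cleaned = _PATTERN.sub(" ", query)
    return re.sub(r"\s+", " ", cleaned).strip()

print(strip_stop_phrases("How to configure solr stopwords"))   # configure solr stopwords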

Code related web searches

Is there a way to search the web which does NOT remove punctuation? For example, I want to search for window.window->window (yes, I actually do; this is a structure in Mozilla plugins). I figure that this HAS to be a fairly rare string.
Unfortunately, Google, Bing, AltaVista, Yahoo, and Excite all strip the punctuation and just show anything with the word "window" in it. And according to Google, on their site, at least, there is NO WAY AROUND IT.
In general, searching for chunks of code must be hard for this reason... anyone have any hints?
Google Code Search ("window.window->window", but it doesn't seem to return any relevant results for this request).
There are similar tools all over the internet, like Codase or Koders, but I'm not sure they let you search for exactly this string. Anyway, they might be useful to you, so I think they're worth mentioning.
edit: It is very unlikely you'll find a general-purpose search engine that will allow you to search for something like "window.window->window", because most search engines do some processing on the document before storing it. For instance, they might represent it internally as vectors of words (a vector space model) and use that to do the search, not the actual original string. Creating such a vector involves first cutting the document up according to punctuation and other critters. This is a very complex and interesting subject which I can't tell you much more about. My bad memory did a pretty good job since I studied it at school!
BTW, they might do the same kind of processing on your query too. You might want to read about tf-idf, which is probably light years from what Google and its friends are doing, but it can give you a hint about what happens to your query.
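To make the vector-space remark concrete, here is a toy tf-idf sketch in Python (a simplification of my own, nothing like what Google actually runs): once the tokenizer splits on punctuation, "window.window->window" collapses into three occurrences of the bare term "window", so the punctuation simply isn't there to be searched at query time.

import math
import re
from collections import Counter

docs = [
    "window.window->window is a structure in mozilla plugins",
    "open a new window in the browser window",
    "the door and the window",
]

def tokenize(text):
    # Splitting on punctuation is what destroys "window.window->window".
    return re.findall(r"[a-z0-9]+", text.lower())

tokenized = [tokenize(d) for d in docs]

def tf_idf(term, doc_tokens, all_docs):
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for d in all_docs if term in d)
    idf = math.log(len(all_docs) / (1 + df)) + 1   # smoothed
    return tf * idf

for i, toks in enumerate(tokenized):
    print(i, round(tf_idf("window", toks, tokenized), 3))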
There is no way to do that by itself in the main Google engine, as you discovered. However, if you are looking for information about Mozilla, then the best bet would be to structure your query something more like this:
"window.window->window" +Mozilla
OR +XUL
+ another search string related to what you are trying to do
SymbolHound is a web search engine that does not remove punctuation from queries. There is an option to search source code repositories (like the now-discontinued Google Code Search), but it also has the option to search the Internet (primarily programming-related sites such as Stack Overflow) for special characters.
try it here: http://www.symbolhound.com
-Tom (co-founder)

What are the things we should consider while writing a Spell Checker?

I want to write a very simple Spell Checker. The spell checker will try to match the input word with equivalent words from the dictionary.
What can be done to find those 'equivalent words'? What analysis can be performed on two words to mark them equivalent?
Before investing too much trying to unravel that, I'd first look to already existing implementations like Aspell or NetSpell, for two main reasons:
Not much point in re-inventing the wheel. Spell checking is much trickier than it first appears and it makes sense to build on work that has already been done
If your interest is finding out how to do it, the source code and community will be a great benefit should you decide to implement your own anyway
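To show how little code it takes to build on an existing checker, here is a minimal Python sketch (this assumes GNU Aspell is installed and on your PATH; "aspell list" reads text on stdin and prints only the words it considers misspelled):

import subprocess

text = "I recieve a seperate letter"
result = subprocess.run(
    ["aspell", "list", "--lang=en"],
    input=text,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.split())   # e.g. ['recieve', 'seperate']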
Much depends on your use case. For example:
Is your dictionary very small (about twenty words)? In this case it probably is better to precompute all possible nearby mistaken words and use a table/hash lookup.
What is your error model? Aspell has at least two (one for spelling errors caused by nearby letters on the keyboard, and the other for spelling errors caused by the way a word sounds).
How dynamic is your dictionary? Can you afford to do a massive preparation in order to get an efficient retrieval?
You may need a "word equivalence" measure like Double Metaphone, in addition to edit distance.
You can get some feel by reading Peter Norvig's great description of spelling correction.
And, of course, whenever possible, steal code. Do not reinvent the wheel without a reason - a reason could be a very special domain, a special way your users make spelling mistakes, or just to learn how it's done.
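As a feel for both the tiny-dictionary suggestion above and Norvig's approach, here is a minimal Python sketch (toy dictionary of my own; it precomputes every string one edit away from each dictionary word and then corrects by plain lookup):

DICTIONARY = {"receive", "believe", "separate", "spell", "checker"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

# For a small dictionary, precompute every mistaken form once, then just look up.
CORRECTIONS = {}
for word in DICTIONARY:
    for typo in edits1(word):
        CORRECTIONS.setdefault(typo, word)

print(CORRECTIONS.get("recieve"))   # receive
print(CORRECTIONS.get("speel"))     # spell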
Edit Distance is the theory you need to write a spell checker. You also need a dictionary. Most UNIX systems come with a dictionary already installed for your locale.
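For a quick feel of that combination, here is a tiny sketch (the dictionary path varies by system, and difflib ranks by a similarity ratio rather than true Levenshtein distance, so treat it as a stand-in):

import difflib

# Most UNIX systems ship a word list; the exact path varies by distribution.
with open("/usr/share/dict/words", encoding="utf-8") as f:
    words = [w.strip() for w in f]

print(difflib.get_close_matches("recieve", words, n=3))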
I just finished implementing a spell checker and used a combination of the following to get a list of "suggested" words:
Phonetic hashing of the "misspelled" word to look up real dictionary words with an identical hash (for Java, check out Apache Commons Codec for a suitable library). The phonetic hashes of your dictionary file can be precomputed.
Edit distance between the input and the potentials (this is reasonably expensive so you need to reduce the list first with something like a phonetic hash, assuming a higher volume load - in my case, a server based spell check)
A known list of common misspellings, e.g. recieve vs. receive.
An ordered list of the most common words in the English language
Essentially I weighted each potential word primarily based on edit-distance and commonality. e.g. if word probability is a percentage, then
weight = edit-distance * 100 / probability
(lower weights are better)
But then I also override any result with the known common misspellings (i.e. these always float to the top suggested result).
There may be better ways, but this worked pretty well.
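A rough Python sketch of that weighting (the word list, probabilities and known misspellings below are toy placeholders of mine, and the phonetic-hash narrowing step is assumed to have already produced the candidate list):

KNOWN_MISSPELLINGS = {"recieve": "receive"}
WORD_PROBABILITY = {"receive": 80.0, "relieve": 50.0, "deceive": 30.0}   # percentages

def edit_distance(a, b):
    """Plain Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggestions(word, candidates):
    if word in KNOWN_MISSPELLINGS:            # known typos always float to the top
        return [KNOWN_MISSPELLINGS[word]]
    def weight(cand):                         # weight = edit-distance * 100 / probability
        return edit_distance(word, cand) * 100 / WORD_PROBABILITY.get(cand, 1.0)
    return sorted(candidates, key=weight)     # lower weight = better suggestion

print(suggestions("recieve", list(WORD_PROBABILITY)))   # ['receive']
print(suggestions("deceeve", list(WORD_PROBABILITY)))   # commonality can outrank raw distance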
You may also wish to ignore ALL CAPS words, initials etc, so choosing what to ignore is also something to think about.
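A small sketch of that filtering step (the rules here are illustrative, not exhaustive):

import re

def should_check(token):
    """Skip tokens a checker typically ignores: ALL CAPS, initials, tokens with digits."""
    if token.isupper():                             # acronyms like NASA
        return False
    if re.fullmatch(r"(?:[A-Za-z]\.)+", token):     # initials like J.R.R.
        return False
    if any(ch.isdigit() for ch in token):           # identifiers like x86
        return False
    return True

print([t for t in ["NASA", "J.R.R.", "recieve", "x86"] if should_check(t)])   # ['recieve']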
Under Linux/Unix you have ispell. Why reinvent the wheel?