Why does this search for [help/dont-ask] return irrelevant results in DSE? - sql

Why does this ridiculously simple query on data.stackexchange.com return results that don't have [help/dont-ask] in the comment text? I feel like I'm missing something mind-numbingly obvious here.
select top 10 Id, PostId, Text
from comments
where text like '%[help/dont-ask]%'
Results I currently get:
Id PostId Text
-- ------- -----------------------------------
1 35314 not sure why this is getting downvoted -- it is correct! Double check it in your compiler if you don't believe him!
2 35314 Yeah, I didn't believe it until I created a console app - but good lord! Why would they give you the rope to hang yourself! I hated that about VB.NET - the OrElse and AndAlso keywords!
4 35195 I don't see an accepted answer now, I wonder how that got unaccepted. Incidentally, I would have marked an accepted answer based on the answers available at the time. Also, accepted doesn't mean Best :)
9 47239 Jonathan: Wow! Thank you for all of that, you did an amazing amount of work!
10 45651 It will help if you give some details of which database you are using as techniques vary.
12 47428 One of the things that make a url user-friendly is 'discover-ability', meaning you can take a guess at url's simply from the address bar. http://i.love.pets.com/search/cats+dogs could easily lead to http://i.love.pets.com/search/pug+puppies etc
14 47481 I agree, both CodeRush and RefactorPro are visually impressive (most of which can be turned off BTW), but for navigating and refactoring Resharper is much better in my opinion of using both products.
15 47373 Just wanted to mention that this is an excellent solution if you consider the problem to be linear (i.e. treating `A1B2` as a single number). I still think the problem is multi-dimensional, but I guess we'll just have to wait for the author to clarify :)
16 47497 Indeed, the only way to do this is get the server to generate your CSS file which can be done in many ways depending on which language you are using. HttpHandlers are common in C#. You could use jQuery or the likes to add styling to every element with the class 'ourColur' and parametrise your JS
18 47513 This advice goes against the spirit of CSS, which is separation of content and presentation. You way requires changing HTML for presentation sake, and stating in content which elements have same color.
...none of which contains the magic link (or even the text dont-ask).

Because [] delimits a set of characters to find.
You need to escape them.
Or just use CHARINDEX as the search is unsargable anyway.
WHERE CHARINDEX('[help/dont-ask]', text) > 0

Related

Article buddy text encryption or something else?

I tried out "article buddy" (http://article-buddy.com/sales-page/)
They claim to change any content into original content for the search engines.
Everytime I tested the output of the tool it passed copyscrape and such.
After a while I tried finding out how the tool was fooling the tests because the text did not seem to change in any way. After I pasted the so called original content into "textpad" (text editor). I saw a lot of weird characters especially the question mark that replaced certain characters in the text.
Example (original text):
WestBow Press titles are regularly reviewed by Thomas Nelson & Zondervan for new, talented authors. While there is no guarantee of the number of titles to be signed each year, this is an opportunity to get your foot in the door. - See more at:
Example (output text of article buddy):
W??tB?w Pr??? t?tl?? are regularly r?v??w?d b? Th?m?? N?l??n & Z?nd?rv?n for n?w, t?l?nt?d authors. Wh?l? th?r? ?? n? gu?r?nt?? ?f th? numb?r ?f t?tl?? t? b? ??gn?d each ???r, th?? ?? an opportunity t? g?t ??ur f??t ?n th? door. - S?? m?r? ?t:
Does anyone know what is going on here?
I have seen multiple tools like this one in the past. Typically, the way it works is that it spins the content and replace words or whole sentences.
This used to work back in the days. My suggestion is to not waist your time and money on those. The only acceptable thing these days is original content.

TSearch2 - dots explosion

Following conversion
SELECT to_tsvector('english', 'Google.com');
returns this:
'google.com':1
Why does TSearch2 engine didn't return something like this?
'google':2, 'com':1
Or how can i make the engine to return the exploded string as i wrote above?
I just need "Google.com" to be foundable by "google".
Unfortunately, there is no quick and easy solution.
Denis is correct in that the parser is recognizing it as a hostname, which is why it doesn't break it up.
There are 3 other things you can do, off the top of my head.
You can disable the host parsing in the database. See postgres documentation for details. E.g. something like ALTER TEXT SEARCH CONFIGURATION your_parser_config
DROP MAPPING FOR url, url_path
You can write your own custom dictionary.
You can pre-parse your data before it's inserted into the database in some manner (maybe splitting all domains before going into the database).
I had a similar issue to you last year and opted for solution (2), above.
My solution was to write a custom dictionary that splits words up on non-word characters. A custom dictionary is a lot easier & quicker to write than a new parser. You still have to write C tho :)
The dictionary I wrote would return something like 'www.facebook.com':4, 'com':3, 'facebook':2, 'www':1' for the 'www.facebook.com' domain (we had a unique-ish scenario, hence the 4 results instead of 3).
The trouble with a custom dictionary is that you will no longer get stemming (ie: www.books.com will come out as www, books and com). I believe there is some work (which may have been completed) to allow chaining of dictionaries which would solve this problem.
First off in case you're not aware, tsearch2 is deprecated in favor of the built-in functionality:
http://www.postgresql.org/docs/9/static/textsearch.html
As for your actual question, google.com gets recognized as a host by the parser:
http://www.postgresql.org/docs/9.0/static/textsearch-parsers.html
If you don't want this to occur, you'll need to pre-process your text accordingly (or use a custom parser).

Add spaces between words in spaceless string

I'm on OS X, and in objective-c I'm trying to convert
for example,
"Bobateagreenapple"
into
"Bob ate a green apple"
Is there any way to do this efficiently? Would something involving a spell checker work?
EDIT: Just some extra information:
I'm attempting to build something that takes some misformatted text (for example, text copy pasted from old pdfs that end up without spaces, especially from internet archives like JSTOR). Since the misformatted text is probably going to be long... well, I'm just trying to figure out whether this is feasibly possible before I actually attempt to actually write system only to find out it takes 2 hours to fix a paragraph of text.
One possibility, which I will describe this in a non-OS specific manner, is to perform a search through all the possible words that make up the collection of letters.
Basically you chop off the first letter of your letter collection and add it to the current word you are forming. If it makes a word (eg dictionary lookup) then add it to the current sentence. If you manage to use up all the letters in your collection and form words out of all of them, then you have a full sentence. But, you don't have to stop here. Instead, you keep running, and eventually you will produce all possible sentences.
Pseudo-code would look something like this:
FindWords(vector<Sentence> sentences, Sentence s, Word w, Letters l)
{
if (l.empty() and w.empty())
add s to sentences;
return;
if (l.empty())
return;
add first letter from l to w;
if w in dictionary
{
add w to s;
FindWords(sentences, s, empty word, l)
remove w from s
}
FindWords(sentences, s, w, l)
put last letter from w back onto l
}
There are, of course, a number of optimizations you could perform to make it go fast. For instance checking if the word is the stem of any word in the dictionary. But, this is the basic approach that will give you all possible sentences.
Solving this problem is much harder than anything you'll find in a framework. Notice that even in your example, there are other "solutions": "Bob a tea green apple," for one.
A very naive (and not very functional) approach might be to use a spell-checker to try to isolate one "real word" at a time in the string; of course, in this example, that would only work because "Bob" happens to be an English word.
This is not to say that there is no way to accomplish what you want, but the way you phrase this question indicates to me that it might be a lot more complicated than what you're expecting. Maybe someone can give you an acceptable solution, but I bet they'll need to know a lot more about what exactly you're trying to do.
Edit: in response to your edit, it would probably take less effort to run some kind of OCR tool on a PDF and correct its output than it would just to correct what this system might give you, let alone program it
I implemented a solution, the code is avaible on code project:
http://www.codeproject.com/Tips/704003/How-to-add-spaces-between-spaceless-strings
My idea was to prioritize results that use up most of the characters (preferable all of them) then favor the ones with the longest words, because 2,3 or 4 character long words can often come up by chance from leftout characters. Most of the times this provides the correct solution.
To find all possible permutations I used recursion. The code is quite fast even with big dictionaries (tested with 50 000 words).

What's bad about the VB With/End With keyword?

In this question, a user commented to never use the With block in VB. Why?
"Never" is a strong word.
I think it fine as long as you don't abuse it (like nesting)
IMHO - this is better:
With MyCommand.Parameters
.Count = 1
.Item(0).ParameterName = "#baz"
.Item(0).Value = fuz
End With
Than:
MyCommand.Parameters.Count = 1
MyCommand.Parameters.Item(0).ParameterName = "#baz"
MyCommand.Parameters.Item(0).Value = fuz
There is nothing wrong about the With keyword. It's true that it may reduce readibility when nested but the solution is simply don't use nested With.
There may be namespace problems in Delphi, which doesn't enforce a leading dot but that issue simply doesn't exist in VB.NET so the people that are posting rants about Delphi are losing their time in this question.
I think the real reason many people don't like the With keyword is that is not included in C* languages and many programmers automatically think that every feature not included in his/her favourite language is bad.
It's just not helpful compared to other options.
If you really miss it you can create a one or two character alias for your object instead. The alias only takes one line to setup, rather than two for the With block (With + End With lines).
The alias also gives you a quick mouse-over reference for the type of the variable. It provides a hook for the IDE to help you jump back to the top of the block if you want (though if the block is that large you have other problems). It can be passed as an argument to functions. And you can use it to reference an index property.
So we have an alternative that gives more function with less code.
Also see this question:
Why is the with() construct not included in C#, when it is really cool in VB.NET?
The with keyword is only sideswiped in a passing reference here in an hilarious article by the wonderful Verity Stob, but it's worth it for the vitriol: See the paragraph that starts
While we are on identifier confusion. The with keyword...
Worth reading the entire article!
The With keyword also provides another benefit - the object(s) in the With statement only need to be "qualified" once, which can improve performance. Check out the information on MSDN here:
http://msdn.microsoft.com/en-us/library/wc500chb(VS.80).aspx
So by all means, use it.

Proportional font IDE

I would really like to see a proportional font IDE, even if I have to build it myself (perhaps as an extension to Visual Studio). What I basically mean is MS Word style editing of code that sort of looks like the typographical style in The C++ Programming Language book.
I want to set tab stops for my indents and lining up function signatures and rows of assignment statements, which could be specified in points instead of fixed character positions. I would also like bold and italics. Various font sizes and even style sheets would be cool.
Has anyone seen anything like this out there or know the best way to start building one?
I'd still like to see a popular editor or IDE implement elastic tabstops.
Thinking with Style suggests to use your favorite text-manipulation software like Word or Writer. Create your programme code in rich XML and extract the compiler-relevant sections with XSLT. The "Office" software will provide all advanced text-manipulation and formatting features.
i expected you'll get down-modded and picked on for that suggestion, but there's some real sense to the idea.
The main advantage of the traditional 'non-proportional' font requirement in code editors is to ease the burden of performing code formatting.
But with all of the interactive automatic formatting that occurs in modern IDE's, it's really possible that a proportional font could improve the readability of the code (rather than hampering it, as i'm sure many purists would expect).
A character called Roedy Green (famous for his 'how to write unmaintainable code' articles) wrote about a theoretical editor/language, based on Java and called Bali. It didn't include non-proportional fonts exactly, but it did include the idea of having non-uniform font-sizes.
Also, this short Joel Spolsky post posts to a solution, elastic tab stops (as mentioned by another commentor) that would help with the support of non-proportional (and variable sized) fonts.
#Thomas Owens
I don't find code formatted like that easier to read.
That's fine, it is just a personal preference and we can disagree. Format it the way you think is best and I'll respect it. I frequently ask myself 'how should I format this or that thing?' My answer is always to format it to improve readability, which I admit can be subjective.
Regarding your sample, I just like having that nicely aligned column on the right hand side, its sort of a quick "index" into the code on the left. Having said that, I would probably avoid commenting every line like that anyway because the code itself shouldn't need that much explanation. And if it does I tend to write a paragraph above the code.
But consider this example from the original poster. Its easier to spot the comments in the second one in my opinion.
for (size-type i = 0; i<v.size(); i++) { // rehash:
size-type ii = has(v[i].key)%b.size9); // hash
v[i].next = b[ii]; // link
b[ii] = &v[i];
}
for (size-type i = 0; i<v.size(); i++) { // rehash:
size-type ii = has(v[i].key)%b.size9); // hash
v[i].next = b[ii]; // link
b[ii] = &v[i];
}
#Thomas Owens
But do people really line comments up
like that? ... I never try to
line up declarations or comments or
anything, and the only place I've ever
seen that is in textbooks.
Yes people do line up comments and declarations and all sorts of things. Consistently well formatted code is easier to read and code that is easier to read is easier to maintain.
I wonder why nobody actually answers your question, and why the accepted answer doesn't really have anything to do with your question. But anyway...
a proportional font IDE
In Eclipse you can cchoose any font on your system.
set tab stops for my indents
In Eclipse you can configure the automatic indentation, including setting it to "tabs only".
lining up function signatures and rows of assignment statements
In Eclipse, automatic indentation does that.
which could be specified in points instead of fixed character positions.
Sorry, I don't think Eclipse can help you there. But it is open source. ;-)
bold and italics
Eclipse has that.
Various font sizes and even style sheets would be cool
I think Eclipse only uses one font and font-size for each file type (for example Java source file), but you can have different "style sheets" for different file types.
When I last looked at Eclipse (some time ago now!) it allowed you to choose any installed font to work in. Not so sure whether it supported the notion of indenting using tab stops.
It looked cool, but the code was definitely harder to read...
Soeren: That's kind of neat, IMO. But do people really line comments up like that? For my end of line comments, I always use a single space then // or /* or equivalent, depending on language I'm using. I never try to line up declarations or comments or anything, and the only place I've ever seen that is in textbooks.
#Brian Ensink: I don't find code formatted like that easier to read.
int var1 = 1 //Comment
int longerVar = 2 //Comment
int anotherVar = 4 //Command
versus
int var2 = 1 //Comment
int longerVar = 2 //Comment
int anotherVar = 4 //Comment
I find the first lines easier to read than the second lines, personally.
The indentation part of your question is being done today in a real product, though possibly to even a greater level of automation than you imagined, the product I mention is an XSLT IDE, but the same formatting principles would work with most (but not all) conventional code syntaxes.
This really has to be seen in video to get the sense of it all (sorry about the music back-track). There's also a light XML editor spin-off product, XMLQuire, that serves as a technology demonstrator.
The screenshot below shows XML formatted with quite complex formatting rules in this XSLT IDE, where all indentation is performed word-processor style, using the left margin - not space or tab characters.
To emphasise this formatting concept, all characters have been highlighted to show where the left-margin extends to keep indentation. I use the term Virtual Formatting to describe this - it's not like elastic tab stops, because there simply are no tabs, just margin information which is part of the 'paragraph' formatting (RTF codes are used here). The parser reformats continuously, in the same pass as syntax coloring.
A proportional font hasn't been used here, but it could have been quite easily - because the indentation is set in TWIPS. The editing experience is quite compelling because, as you refactor the code (XML in this case), perhaps through drag and drop, or by extending the length of an attribute value, the indentation just re-flows itself to fit - there's no tab-key or 'reformat' button to press.
So, the indentation is there, but the font work is a more complex problem. I've experimented with this, but found that if fonts are re-selected as you type, the horizontal shifting of the code is too distracting - there would need to be a user-initiated 'format fonts' command probably. The product also has Ink/Handwriting technology built-in for annotating code, but I've yet to exploit this in the live release.
Folks are all complaining about comments not lining up.
Seems to me that there's a very simple solution: Define the unit space as the widest character in the font. Now, proportionally space all characters except the space. the space takes up as much room so as to line up the next character where it would be if all preceeding characters on the line were the widest in the font.
ie:
iiii_space_Foo
xxxx_space_Foo
would line up the "Foo", with the space after the "i" being much wider than after the "x".
So call it elastic spaces. rather than tab-stops.
If you're a smart editor, treat comments specially, but that's just gravy
Let me recall arguments about using the 'var' keyword in C#. People hated it, and thought it would make code less clear. For example, you couldn't know the type in something like:
var x = GetResults("Main");
foreach(var y in x)
{
WriteResult(x);
}
Their argument was, that you couln't see if x was an array, an List or any other IEnumerable. Or what the type of y was. In my opinion the unclearity did not arise from using var, but from picking unclear variable names. Why not just type:
var electionResults = GetRegionalElactionResults("Main");
foreach(var result in electionResults)
{
Write(result); // you can see what you're writing!!
}
"But you still cannot see the type of electionResults!" - does it really matter? If you want to change the return type of GetRegionalElectionResults, you can do so. Any IEnumerable will do.
Fast forward to now. People want to align comments en similar code:
int var2 = 1; //The number of days since startup, including the first
int longerVar = 2; //The number of free days per week
int anotherVar = 38; //The number of working hours per week
So without the comment everything is unclear. And if you don't align the values, you cannot seperate them from the variales. But do you? What about this (ignore the bullets please)
int daysSinceStartup = 1; // including first
int freeDaysPerWeek = 2;
int workingHoursPerWeek = 38;
If you need a comment on EVERY LINE, you're doing something wrong. "But you still need to align the VALUES" - do you? what does 38 have to do with 2?
In C# Most code blocks can easily be aligned using only tabs (or acually, multiples of four spaces):
var regionsWithIncrease =
from result in GetRegionalElectionResults()
where result.TotalCount > result > PreviousTotalCount &&
result.PreviousTotalCount > 0 // just new regions
select result.Region;
foreach (var region in regionsWithIncrease)
{
Write(region);
}
You should never use line-to-line comments and you should rarely need to vertically align things. Rarely, not never. So I understand if some of you guys prefer a monospaced font. I prefer the readibility of font Noto Sans or Source Sans Pro. These fonts are available freely from Google, and resemble Calibri, but are designed for programming and thus have all the neccesary characteristics:
Big : ; . , so you can clearly see the difference
Clearly distinct 0Oo and distinct Il|
The major problem with proportional fonts is they destroy the vertical alignment of the code and this is a fairly major loss when it comes to writing code.
The vertical alignment makes it possible to manipulate rectangular blocks of code that span multiple lines by allowing block operations like cut, copy, paste, delete and indent, unindent etc to be easily performed.
As an example consider this snippet of code:
a1 = a111;
B2 = aaaa;
c3 = AAAA;
w4 = wwWW;
W4 = WWWW;
In a mono-spaced font the = and the ; all line up.
Now if this text is loded into Word and display using a proportional font the text effectively turns into this:
NOTE: Extra white space added to show how the = and ; no longer line up:
a1 = a1 1 1;
B2 = aaaa;
c3 = A A A A;
w4 = w w W W;
W4 = W W W W;
With the vertical alignment gone those nice blocks of code effectively disappear.
Also because the cursor is no longer guaranteed to move vertically (i.e. the column number is not always constant from one line to the next) it makes it more difficult to write throw away macro scripts designed to manipulated similar looking lines.