Following conversion
SELECT to_tsvector('english', 'Google.com');
returns this:
'google.com':1
Why does TSearch2 engine didn't return something like this?
'google':2, 'com':1
Or how can i make the engine to return the exploded string as i wrote above?
I just need "Google.com" to be foundable by "google".
Unfortunately, there is no quick and easy solution.
Denis is correct in that the parser is recognizing it as a hostname, which is why it doesn't break it up.
There are 3 other things you can do, off the top of my head.
You can disable the host parsing in the database. See postgres documentation for details. E.g. something like ALTER TEXT SEARCH CONFIGURATION your_parser_config
DROP MAPPING FOR url, url_path
You can write your own custom dictionary.
You can pre-parse your data before it's inserted into the database in some manner (maybe splitting all domains before going into the database).
I had a similar issue to you last year and opted for solution (2), above.
My solution was to write a custom dictionary that splits words up on non-word characters. A custom dictionary is a lot easier & quicker to write than a new parser. You still have to write C tho :)
The dictionary I wrote would return something like 'www.facebook.com':4, 'com':3, 'facebook':2, 'www':1' for the 'www.facebook.com' domain (we had a unique-ish scenario, hence the 4 results instead of 3).
The trouble with a custom dictionary is that you will no longer get stemming (ie: www.books.com will come out as www, books and com). I believe there is some work (which may have been completed) to allow chaining of dictionaries which would solve this problem.
First off in case you're not aware, tsearch2 is deprecated in favor of the built-in functionality:
http://www.postgresql.org/docs/9/static/textsearch.html
As for your actual question, google.com gets recognized as a host by the parser:
http://www.postgresql.org/docs/9.0/static/textsearch-parsers.html
If you don't want this to occur, you'll need to pre-process your text accordingly (or use a custom parser).
Related
When trying to filter by tag, there is a small popup:
I have been looking for logfmt around, but all I can find is key=value format.
My questions are:
Is there a way for something more sophisticated? (starts_with, not equal, contains, etc)
I am trying to filter by url using http.url="http://example.com?bla=bla&foo=bar". I am pretty sure the value exists because I am copy/pasting from my trace. I am getting no results. Do I need to escape characters or do something else for this to work?
I did some research around logfmt as well. Based on the documentation of the original implementation and in the Python implementation of the parser (and respective tests), I would say that it doesn't support anything more sophisticated (like starts_with, not equal, contains). And this is because the output of the parser is a simple dictionary (with no regex involved in the values).
As for the second question, using the same mentioned Python parser, I was able to double-check that your filter looks fine:
from logfmt import parse_line
parse_line('http.url="http://example.com?bla=bla&foo=bar"')
Output:
{'http.url': 'http://example.com?bla=bla&foo=bar'}
This makes me suspect of an issue on the Jaeger side, but this is as far as I could go.
I've implemented auto completion to a combobox like this article shows. Is it possible to make it search for substrings instead of just the beginning of the words?
http://www.codeproject.com/Articles/2371/IAutoComplete-and-custom-IEnumString-implementatio
I haven't found any way to customize how IEnumString/IAutoComplete compares the strings. Is it possible?
The built in search options help a bit but it is complete chaos. To find instring matches you need to set flag AcoWordFilter. But this will prevent from numbers being matched!! However, there is a trick to get the numbers to match: preced with a double-quote as in "3 to find a string containing or starting with "3". Some more chaos? In the AcoWordFilter you also need to prefix other characters not considered part of a "word", eg. you need to prefix parentheses with a " but then you will not find parentheses at the first position!
So the solution is either to create your own implementation of IAutoComplete or offer the user to switch between the modes (a bit awkward).
I dont think that the MS engineers are especially proud of such chaos. How about one more option: AcoSearchAnwhere?
After retrieving the Edit control's IAutoComplete interface, query it for an IAutoComplete2 interface. Calling its SetOptions member you can disable prefix filtering by specifying the ACO_NOPREFIXFILTERING AUTOCOMPLETEOPTIONS.
This is available on Windows Vista and later. If you need a solution that works with pre-Vista versions, you'll have to write your own.
This probably sounds silly, after all I could generate the file listing via PHP, right?
But I am becoming more and more fascinated with what all can be accomplished with just Apache and JQuery alone. I've been reading documentation and it seems like things are SO VERY close, but I am obviously missing a few things.
First, can I set a directory listing to a "path" or file name,
overwriting the default, "index.html"? In particular, I am trying to
configure any request ending in "ndx.mnu" to return the directory
listing:
"DirectoryIndex ndx.mnu"
...does not accomplish that. Anyideas?
Second, does anyone know of a way to impose a numerical sort similar
to the way in which VersionSort works for files? Right now:
"foo-1, foo-2"
sorts correctly but what if I want to force:
"foo-1, bar-2"
to be order returned?
Trying to make something with as few moving parts as possible. Any pointers to read up would be appreciated.
Well for the second part, you want to sort by the number rather than the letters correct? You should be able to read the string backwards and sort from end to beginning. Using strrev() to reverse it, you can write a sorting algorithm to do that.
Or if all the file use the '-#' notation then $num = explode('-', $string); and sort by $num[1] (which should be the number on the end) though if some file names contain multiple '-' you could use regular expressions.
I have an email template where the user can enter text like this:
Hello {first_name}, how are you?
and when the email actually gets sent it replaces the placeholder text {first_name} with the actual value.
There will be several of these placeholders, though, and I wasn't sure that gsub is meant to be used like this.
body = #email.body.gsub("{first_name}", #person.first_name)gsub("{last_name}", #person.last_name).gsub("",...).gsub("",...).gsub("",...).etc...
Is there a cleaner solution to achieving this functionality? Also, if anyone's done something similar to this, did they find that they eventually hit a point where using multiple gsubs on a few paragraphs for hundreds of emails was just too slow?
EDIT
I ran some tests comparing multiple gsubs vs using regex and it came out that the regex was usually 3x FASTER than using multiple gsubs. However, I think the regex code is a littler harder to read as-is, so I'm going to have to clean it up a bit but it does indeed seem that using regex is significantly faster than multiple gsubs. Since my use case will involve multiple substitutions for a large number of documents, the faster solution is better for me, even though I'll have to add a little more documentation.
You have to put in regular expressions all strings you want to catch and in the hash you put the replacement of all catches:
"123456789".gsub /(123|456)/, "123" => "ABC",
"456" => "DEF"
This code only works for ruby 1.9.
If you can use a template library like erb or haml, they are the proper tool for this kind of task.
I have url, for example:
http://i.myhost.com/myimage.jpg
I want to change this url to
http://i.myhost.com/myimageD.jpg.
(Add D after image name and before point)
i.e I want add some words after image name and before point using regex.
What is the best way do it using regex?
Try using ^(.*)\.([a-zA-Z]{3,5}) and replacing with \1D\2. I'm assuming the extension is 3-5 alphanumeric numbers but you can modify it to suit. E.g. if it's just jpg images then you can put that instead of the [a-zA-Z]{3,5}.
Sounds like a homework question given the solution must use a regex, on that assumption here is an outline to get you going.
If all you have is a URL then #mathematical.coffee's solution will suit. However if you have a chunk of text within which is one or more URLs and you have to locate and change just those then you'll need something a little more involved.
Look at the structure of a URL: {protocol}{address}{item}; where
{protocol} is "http://", "ftp://" etc.;
{address} is a name, e.g. "www.google.com", or a number, e.g. "74.125.237.116" - there will always be at least one dot in the address; and
{item} is "/name" where name is quite flexible - there will be zero or more items, you can think of them as directories and a file but this isn't strictly true. Also the sequence of items can end in a "/" (including when there are zero of them).
To make a regex which matches a URL start by matching each part. In the case of the items you'll want to match the last in the sequence separately - you'll have zero or more "directories" and one "file", the latter must be of the form "name.extension".
Once you have regexes for each part you just concatenate them to produce a regex for the whole. To form the replacement pattern you can surround parts of your regex with parentheses and refer to those parts using \number in the replacement string - see #mathematical.coffee's solution for an example.
The best way to learn regexs is to use an editor which supports them and just experiment. The exact syntax may not be the same as NSRegularExpression but they are mostly pretty similar for the basic stuff and you can translate from one to another easily.