SQL 2008 full text word break pointers - sql

If I do a full-text search on SQL Server 2008, can I get a pointer (file or database) to the match, so that I don't have to load the 100 MB memo field into my Business Object and search it again?

It does not appear that SQL Server 2008 supports the retrieval of offset pointers to the found keywords within the memo field.
Full text search does not search the memo fields, but searches an index that specifies which keywords are in which documents. The information about where these words appear within each document does not seem to be available in the full text search index.
Microsoft offers a dynamic management function called sys.dm_fts_index_keywords_by_document. It covers the following use cases:
“I want to know how many keywords the full-text index contains”
“I want to know if a keyword is part of a given doc/row”
“I want to know how many times a keyword appears in the whole full-text index” (sum(occurrence_Count) where keyword=……)
“I want to know how many times a keyword appears in a given doc/row”
“I want to know how many keywords a given doc/row contains”
“I want to retrieve all the keywords belonging to a given doc/row”
However, the following scenarios are not covered in this release:
“I want to know the offset (word or byte) of a given keyword in a given doc/row”
“I want to know the distance (in words) between two keywords per a given doc/row”
Sources:
http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506233
http://msdn.microsoft.com/en-us/library/cc280607.aspx

Related

Sql Server search entire Json document for value

I have a few thousand rows in my table (SQL Server 2016).
One of the columns stores JSON documents (NVARCHAR(max)).
The JSON documents are quite complex in terms of nesting, and they can differ a lot from one another.
My goal is to search each document for a certain match. Say: "MagicNo":"999000".
So if the document has a property "MagicNo" and if the value is 999000 then it's a match.
I know you can navigate through the document using JSON_VALUE with $. followed by the path, but since those docs can be very different, the "MagicNo" property may appear pretty much anywhere in the document (a lot of nesting). So XPath-style navigation is out of the question here.
Is there some kind of wild card I could use with JSON_VALUE to say search the entire doc and return it if the match is found?
The simple like '%999000%' and CONTAINS searches on the VARCHAR column are out of the question here due to poor performance.
Any thoughts?
Thanks.

SQL Server - Creating a "Search library" of terms to use in a query

Firstly I apologise in advance if this question is a bit bare bones or has misleading/confusing terminology but I'm not sure how else to phrase it.
I have a few tables which capture the language of interactions based on a few different factors. What I would like to do is set up a sort of temporary library of language based terms that I can reference in a query so that I can search the various tables and find matches against the terms stored in the library.
I'll try and give an example:
The library might consist of the following terms:
English, German, French, Italian, Spanish
I then want to search these tables:
teacherSpokenLanguages, courseLanguages, studentLanguages
And find all the rows that contain the search terms in any particular field (and specify which field each term was found in).
I hope there's enough information to piece together my request. Is this even remotely possible? Could I create a temporary table to contain these values perhaps? I can't do anything permanent on the database, it all has to be housed within this one query and has to be non-destructive.

How do i include other fields in a lucene search?

Let's use emails as an example document. Each has a subject, a body, a sender, and, let's say, tags (as Gmail does).
From my understanding of QueryParser, I give it ONE field and the parser type, so whatever text a user enters is searched only in that field. I noticed it will look in the subject or body field if I write fieldName: text to search. However, how do I make a regular query such as "funny SO question unicorn" find results with some of those strings in the subject and the others in the body? At the moment, because I knew it would be easy, I made a field called ALL and combined all the other fields into it, but I would like to know how to do it properly, especially since my next app depends on text search.
Use MultiFieldQueryParser. You can specify the list of fields to search using the following constructor.
MultiFieldQueryParser(Version matchVersion, String[] fields, Analyzer analyzer)
This will generate a query as if you have created multiple queries on different fields. This partially addresses your problem. This, still, will not match one term matching in field1 and another matching in field2. For this, as you have rightly pointed out, you will need to combine all the fields in one single field and search in that field. Nevertheless, you will find MultiFieldQueryParser useful when query terms do not cross the field boundaries.

How to handle a "keyword search" via Stored Procedure?

I'm creating a self-help FAQ type application and one of the requirements is that the end user has to be able to search for FAQ topics. I have three models of note, listed below with their relevant (i.e. searchable) columns:
Topic: Name, Description
Question: Name, Answer
Problem: Name, Solution
All three tables are linked to Topic via a TopicID column. The idea is to provide a single textbox where the user can enter a search query, something either as a sentence (e.g. "How do I perform X") or a phrase (e.g. "Performing X" or "Perform X"), and provide all Topics/Questions/Problems that have any of the words they entered in either the name or description/answer/solution fields; the model will only ever have those columns searchable and I don't have to worry about filtering out the common words like "How" and such (It would be nice but isn't a requirement as it's not an exact match but a fuzzy match).
For reasons outside of my control, I have to use a Stored Procedure. My question is what would be the most appropriate way to handle a search like this; I've seen similar questions regarding multiple columns but really there is not a variable number of columns, there are always two columns per table that are actually searchable. The issue is that the search query could, in theory, be nearly anything - a sentence, a phrase, a comma-separated list of terms (e.g. "x,y,z"), so I would have to split the search term into components (e.g. split on whitespace) and then search each pair of columns for every term? Is that reasonably easy to do in SQL Server? The alternative, a little messier, is to just pull all the data back and then split the query and filter the results in the server-side code as there shouldn't ever be that many items entered, but I would feel a little dirty doing something like that ;-)
Suggest creating a new Full Text Catalog, and assign the table and columns to that catalog. Ensure your catalog is being updated at the right frequency for your needs.
You can then query this catalog using the FREETEXT predicate. It sounds like you need to match on those suffixes like 'ing', so suggest FREETEXT over CONTAINS in this case.
You can use a variable in this search, so it'll be easy to fit into a stored proc.
declare @token varchar(256);
select @token = 'perform';
select * from Problem
where freetext(Name, @token)
or freetext(Solution, @token);
-- this will match 'perform' and 'performing'

How would you reproduce a tagging system like the one StackOverflow uses?

I am trying to produce a tagging system for a recruitment agency model and love the way SO separates tags and searches for the remaining phrases.
How would you compare the tags in a table to the search query etc...
I have come up with the following, but it has some hiccups:
User enters search query
Full text SQL contains() search on tbl_tags
Returns 5 results
Check if each "exact tag phrase" exists in original query string.
If it does exist then add tagID to array.
Remove tag names from original search string...
Search in tbl_people for people with linked TagIDs and search text fields with remaining text.
Example search : French Project Managers with Oracle experience
Tags : [French] [Project Manager]s with [Oracle] experience
Remaining text : s with experience
Now the problem comes when I search for Project Managers it leaves me with a surplus "s"... and there are probably other bugs with this logic too that I cannot account for...
Any ideas on how to make the logic perfect?
Thanks in advance, I understand this might be a bit of an abstract question...
You're missing a key ingredient of how StackOverflow does its search. SO requires that the user delineate the tags in the search string by explicitly putting brackets around them. The (probably simplified) logic would then be:
Extract marked tags using regex to detect contents inside brackets
Using list of most common tags, scan string for unmarked tags and extract them.
Remove tag meta characters
Perform full-text search, filtered by tags
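
The first three steps above can be sketched roughly as follows. This is only a minimal illustration, not how StackOverflow actually implements it: the function name parse_search, the KNOWN_TAGS list, and the handling of a trailing plural "s" are all assumptions made for the example. The last step (the full-text search filtered by tags) is left out because it depends on your schema.

```python
import re

# Hypothetical list of known tags; a real system would load the most
# common tags from its own tag table.
KNOWN_TAGS = ["project manager", "french", "oracle"]


def parse_search(query, known_tags=KNOWN_TAGS):
    """Split a search string into (tags, remaining_text)."""
    # Extract tags the user marked explicitly with [brackets].
    tags = [t.strip().lower() for t in re.findall(r"\[([^\[\]]+)\]", query)]
    # Remove the bracketed tags together with their meta characters.
    text = re.sub(r"\[[^\[\]]+\]", " ", query)

    # Scan the rest for unmarked known tags. Longest tags go first so that
    # multi-word tags win; the optional "s" (an assumption of this sketch)
    # absorbs a simple plural such as "Project Managers".
    for tag in sorted(known_tags, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(tag) + r"s?\b", re.IGNORECASE)
        if pattern.search(text):
            if tag not in tags:
                tags.append(tag)
            text = pattern.sub(" ", text)

    # Collapse leftover whitespace; this text goes to the full-text search.
    remaining = " ".join(text.split())
    return tags, remaining


tags, remaining = parse_search("French Project Managers with Oracle experience")
print(tags, "|", remaining)
```

The returned tags would then be used to filter people by their linked TagIDs, and the remaining text would be passed to the full-text query. Matching on word boundaries with an optional plural is one simple way to avoid the surplus "s" from the question, though it will not handle irregular plurals.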