SQL Azure - Substring Searches?


SQL Azure does not support SQL Server's Full Text Search feature.
Does this mean a text field cannot be indexed to handle substring searches?
For example, if I have a table Emails with a Message column, and I want to find all messages containing both the words 'hello' and 'thanks', will a standard index on the Message column allow me to do this?
CREATE TABLE Emails (
[Id] bigint NOT NULL,
[Message] nvarchar({some number}) NOT NULL
);
GO
CREATE NONCLUSTERED INDEX Messages_Emails ON Emails ([Message]);
My query (using Entity Framework) would look like:
var niceMessageQuery = Context.Emails.Where(e => e.Message.Contains("hello") && e.Message.Contains("thanks"));
Is there a better way to set up this query?

As of April 2015, Full-Text Search is available (in preview) in Azure SQL Database V12: http://azure.microsoft.com/blog/2015/04/30/full-text-search-is-now-available-for-preview-in-azure-sql-database/

I'm not familiar with Azure at all, but is it possible to use a subquery? The inner (sub) query finds all records that contain "hello", and the outer query uses that inner query as its dataset to search for "thanks".
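
The subquery idea can be compared with a single query that combines both predicates with AND. Here is a minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for Azure SQL, LIKE as the substring predicate, and made-up sample rows:

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption);
# the table and column names follow the question, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emails (Id INTEGER PRIMARY KEY, Message TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Emails (Id, Message) VALUES (?, ?)",
    [
        (1, "hello and thanks for the update"),
        (2, "hello again"),
        (3, "many thanks"),
        (4, "thanks, and hello to the team"),
    ],
)

# Single query: both substring predicates combined with AND.
both = conn.execute(
    "SELECT Id FROM Emails "
    "WHERE Message LIKE '%hello%' AND Message LIKE '%thanks%' "
    "ORDER BY Id"
).fetchall()

# Nested form: the inner query filters on 'hello', the outer one on 'thanks'.
nested = conn.execute(
    "SELECT h.Id FROM "
    "(SELECT Id, Message FROM Emails WHERE Message LIKE '%hello%') AS h "
    "WHERE h.Message LIKE '%thanks%' "
    "ORDER BY h.Id"
).fetchall()

hello_and_thanks_ids = [row[0] for row in both]
nested_ids = [row[0] for row in nested]
print(hello_and_thanks_ids, nested_ids)
```

Both forms select the same rows, so the nesting by itself does not change what is matched. In either form the pattern starts with a wildcard, which an ordinary B-tree index generally cannot seek on.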

Related

Fulltext search numbers does not work in SQL Server 2012

Fulltext search numbers does not work in SQL Server 2012.
I tried to create an empty_stoplist and repopulate the index. Can anyone tell me what I am doing wrong here?
CREATE FULLTEXT CATALOG Orders_FTS
WITH ACCENT_SENSITIVITY = OFF;
GO
CREATE FULLTEXT INDEX ON dbo.Orders
(
a Language 1031,
b Language 1031,
c Language 1031,
d Language 1031
)
KEY INDEX [PK_Orders]
ON Orders_FTS;
GO
CREATE FULLTEXT STOPLIST EMPTY_STOPLIST;
ALTER FULLTEXT STOPLIST empty_stoplist DROP ALL;
ALTER FULLTEXT INDEX ON Orders SET STOPLIST EMPTY_STOPLIST;
ALTER FULLTEXT INDEX ON Orders SET STOPLIST = OFF;
ALTER FULLTEXT INDEX ON Orders START UPDATE POPULATION;
The SQL query:
SELECT
T.*, R.RANK
FROM
Orders As T
INNER JOIN
CONTAINSTABLE(Orders, *, '"*007440147*"') AS R On T.ID = R.[KEY]
ORDER BY
RANK DESC, ID DESC

The problem is that leading wildcards (ex: *bcde) are not supported by SQL Server. (More here.) The query will execute without any errors but will always return 0 results. You can only use wildcards in the middle of a string (ex: ab*de) or the end of a string (ex: abcd*).
Usually this can be worked around by creating columns that contain the reverse string and searching on those columns (ex: Column1 = abcde, Column1Reverse = edcba, query has CONTAINS(Column1Reverse, '"edcb*"')).
However in your case you want to use a wildcard at the beginning and end of the string. I think your options are limited to:
If you don't need a leading wildcard, then don't use it. For example, if the text you are trying to match is 007440147xxx then using 007440147* in your query will work fine.
Use LIKE instead of CONTAINSTABLE, for example: SELECT * FROM Orders WHERE Column1 LIKE '%007440147%'. The downside to this approach is that you won't get a rank value and queries may take a long time to execute. (Then again, even if you could use a leading wildcard in full text searches, they would be slow.)
Redesign how the data is stored and queried. I can't offer any suggestions without understanding what these numbers mean and how they need to be queried.
Consider using another search product. I believe Lucene can perform leading wildcard searches but such searches tend to be slow.
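
The reversed-column workaround mentioned above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for SQL Server, a LIKE prefix pattern stands in for the CONTAINS prefix term, and the sample values are made up. The column names Column1 and Column1Reverse follow the example in the answer.

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (ID INTEGER PRIMARY KEY, Column1 TEXT, Column1Reverse TEXT)"
)

# Store each value together with its reversed copy.
values = ["abcde", "xbcde", "abcdx"]
conn.executemany(
    "INSERT INTO Orders (Column1, Column1Reverse) VALUES (?, ?)",
    [(v, v[::-1]) for v in values],
)

# A search for values ENDING in 'bcde' becomes a search for reversed
# values STARTING with 'edcb', i.e. a trailing wildcard only.
suffix = "bcde"
pattern = suffix[::-1] + "%"

suffix_matches = [
    row[0]
    for row in conn.execute(
        "SELECT Column1 FROM Orders WHERE Column1Reverse LIKE ? ORDER BY ID",
        (pattern,),
    )
]
print(suffix_matches)
```

This only turns a leading wildcard into a trailing one. It does not help when a wildcard is needed on both sides of the term, which is the situation in the question.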

Postgres full text search: how to search multiple words in multiple fields?

I'm using PostgreSQL for the first time and I'm trying to build a search engine for my website. I have this table:
CREATE TABLE shop (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
address TEXT NOT NULL,
city TEXT NOT NULL
);
Then I created an index for every field of the table (is this the right way, or can I create one index for all fields?):
CREATE INDEX shop_name_fts ON shop USING gin(to_tsvector('italian', name));
CREATE INDEX shop_desc_fts ON shop USING gin(to_tsvector('italian', description));
CREATE INDEX shop_addr_fts ON shop USING gin(to_tsvector('italian', address));
CREATE INDEX shop_city_fts ON shop USING gin(to_tsvector('italian', city));
Now, what is the SQL query if I want to search for one word in every index?
I tried this and it works:
SELECT id FROM shop WHERE to_tsvector(name) @@ to_tsquery('$word') OR
to_tsvector(description) @@ to_tsquery('$word') OR
to_tsvector(address) @@ to_tsquery('$word') OR
to_tsvector(city) @@ to_tsquery('$word')
Is there a better way to do the same?
Can I match one to_tsquery against multiple to_tsvector values?
A friend of mine suggested a solution, but it is for MySQL:
SELECT * FROM shop WHERE MATCH(name, description, address, city) AGAINST('$word')
What is the equivalent for PostgreSQL?
In addition, can I match multiple to_tsquery values against multiple to_tsvector values? What is the SQL query if I want to search for two or more words? Can I just pass "two words" to $word from PHP? If I can, how does it work? Does it search for the first word AND the second one, or the first word OR the second one?

It looks like what you want is, in fact, to search the concatenation of all those fields.
You could build a query doing exactly this:
... where to_tsvector('italian', name||' '||coalesce(description,'')...) @@ to_tsquery('$word')
and build an index on the exact same computation:
create index your_index on shop
using GIN(to_tsvector('italian',name||' '||coalesce(description,'')...))
Don't forget to use coalesce on columns accepting NULL values.
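
On the multi-word part of the question: to_tsquery expects operators such as & (AND) or | (OR) between words, so a raw string like "two words" is rejected as a syntax error, while plainto_tsquery accepts plain text and ANDs the words together. If the operator needs to be chosen by the application, the query string can be assembled before it is sent. Below is a minimal sketch in Python; the helper name build_tsquery is hypothetical and not part of PostgreSQL or any library.

```python
import re


def build_tsquery(user_input: str, operator: str = "&") -> str:
    """Join the words of user_input into a to_tsquery-style string.

    Hypothetical helper for illustration: operator is '&' for AND, '|' for OR.
    """
    if operator not in ("&", "|"):
        raise ValueError("operator must be '&' or '|'")
    # Keep only runs of word characters (letters, digits, underscore) so that
    # stray punctuation in the input cannot break the tsquery syntax.
    words = re.findall(r"\w+", user_input)
    return f" {operator} ".join(words)


print(build_tsquery("pizza roma"))         # words combined with AND
print(build_tsquery("pizza, roma!", "|"))  # words combined with OR
```

The resulting string would then be passed to to_tsquery('italian', ...) as a bound parameter rather than interpolated into the SQL text.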

Look for "a,b,c" in column with data "a,z,b,c,x" with SQL query

I've been refactoring and upgrading the data layer of an existing news site which I didn't develop from the start. The application gets quite a lot of visits, and after a bit of research I decided to ditch EF and go with ADO.NET / Dapper, since the SQL commands will never be exposed to any kind of UI layer or string manipulation.
One problem I've run into, though, is that news tags are not normalized in the database; they are stored as a comma-separated string in the News table, and there is front-end functionality that requires "related news" to be shown to the user.
So I need to search for any occurrences of comma-delimited string values in a table column that also contains comma-delimited string values.
I've come up with the following query in SQL Server Management Studio, but it (obviously) takes a long time to return the results. Is there a better way to do this? I don't have expert knowledge of SQL, so this is the query I have working at the moment:
-- I'm declaring this variable only for testing. In reality, @Tags should also be a query
-- which returns the set of tags of the target news...
DECLARE @Tags nvarchar(MAX)
SELECT @Tags = Tags FROM News WHERE Id = 7978 -- No idea where / how to include this query
-- in the actual search query :/
-- dbo.Split is a table-valued function that takes a comma-delimited nvarchar as parameter
-- and returns table(Id int, Data nvarchar, Order int) with the separated values of the CSV
SELECT DISTINCT TOP 10 N.Id, N.Title, N.CreatedAt From News N
CROSS APPLY dbo.Split(N.Tags) B
WHERE B.Data IN
(
SELECT C.Data FROM dbo.Split(@Tags) C
)
ORDER BY N.CreatedAt DESC, N.Id DESC
I have a full-text index enabled on the "Tags" column of the News table, but couldn't think of a proper query that takes advantage of it.
SQL Server version: 2008 R2
This query is supposed to back an IEnumerable<NewsDto> GetRelatedNews(int targetNewsId) API method.

Could you try the following query:
SELECT DISTINCT TOP 10 n.Id, n.Title, n.CreatedAt
FROM dbo.Split(@Tags) c
CROSS APPLY
(
SELECT id, Title, CreatedAt
FROM News
WHERE CONTAINS(Tags, c.Data) -- THIS SHOULD MAKE USE OF FT
) n
One drawback is that it may return all of the top 10 news items from the first tag.

Further research didn't produce any alternatives to what I gave as an example in my original post. So I decided to go with that query and turn it into a stored procedure.
It takes 3 seconds to return all results and in my Web project I'm calling this method via ajax and caching the results to prevent running the same SP for every request.
Overall it doesn't impact my WebUI performance since it loads related news asynchronously and uses cached result if any exists.
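
For reference, the matching rule that the original query applies (any shared tag, newest first, top 10) can be written out in a few lines. This is a minimal sketch in Python with made-up sample rows; the function names split_tags and related_news are hypothetical, and unlike the SQL above it skips the target article itself.

```python
from datetime import date


def split_tags(csv: str) -> set:
    """Split a comma-separated tag string into a set of trimmed tags."""
    return {tag.strip() for tag in csv.split(",") if tag.strip()}


def related_news(target_id: int, news: list, limit: int = 10) -> list:
    """Return ids of news items sharing at least one tag with the target."""
    target = next(n for n in news if n["id"] == target_id)
    target_tags = split_tags(target["tags"])
    matches = [
        n for n in news
        # Design choice of this sketch: the target article is excluded.
        if n["id"] != target_id and split_tags(n["tags"]) & target_tags
    ]
    # Same ordering as the query: CreatedAt DESC, then Id DESC.
    matches.sort(key=lambda n: (n["created_at"], n["id"]), reverse=True)
    return [n["id"] for n in matches[:limit]]


# Invented sample data for illustration.
news = [
    {"id": 1, "created_at": date(2014, 1, 1), "tags": "a,z,b,c,x"},
    {"id": 2, "created_at": date(2014, 1, 3), "tags": "q,r"},
    {"id": 3, "created_at": date(2014, 1, 2), "tags": "c, y"},
    {"id": 4, "created_at": date(2014, 1, 2), "tags": "a"},
    {"id": 7978, "created_at": date(2014, 1, 5), "tags": "a,b,c"},
]

related_ids = related_news(7978, news)
print(related_ids)
```

Since every tag string is split again on each call, caching the result per target id, as described above, is a reasonable fit for this kind of lookup.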

Using sys.dm_fts_index_keywords_by_document in SQL Server 2012

I'm trying to determine how many times a word occurs within a table for a uni assignment. I've been using sys.dm_fts_index_keywords_by_document in SQL Server 2012 as I've used it previously in 2008. It worked fine before, but in this context SQL Server doesn't seem to like it very much.
SELECT display_term, SUM(occurrence_count) AS APP
FROM sys.dm_fts_index_keywords_by_document
(
DB_ID('Assign2A_341'), OBJECT_ID('Post')
)
GROUP BY display_term
ORDER BY APP DESC
I keep running into this error: Msg 30004, Level 16, State 1, Line 1
A fulltext system view or stvf cannot open user table object id 599673184.
This is the format of the table being used:
CREATE TABLE Post(
Post_ID FLOAT NOT NULL,
Post_Txt NVARCHAR(MAX) NOT NULL,
Post_Date NVARCHAR(255) NOT NULL,
Post_Author VARCHAR(50) NOT NULL,
PRIMARY KEY(Post_ID));
I can't see any reason why this shouldn't work, the context in which I previously used it was very similar to how I'm using it now, the only difference being the version of SQL Server I'm using and the content of the table.
Any help would be greatly appreciated!

Did you create the full-text index on table Post after the CREATE TABLE statement?
Also, as Books Online states, you need sufficient permissions to read from this index. Are those present?
Requires CREATE FULLTEXT CATALOG permissions and SELECT permission on the columns covered by the full-text index.

Firebird: Using SQL to search all fields?

I have a SQL query that is going to be used to search fields that are likely to change a lot, i.e. more fields will be added. How can I write a SQL query using a simple LIKE that will search across all fields without explicitly specifying the fields?
Something like:
select * from MYTABLE where CONCAT(*) like '%mySearchTerm%';
Is there an easy way to do this?

I think the only way in Firebird is to use a calculated field which concatenates the fields you want to search:
create table T (
foo varchar(24) not null,
bar varchar(24) not null,
...
SearchField varchar(1024) generated always as (foo || '##' || bar)
)
select * from T where SearchField like '%mySearchTerm%';
So each time you alter the table (adding or dropping a field) you also have to change the calculated field, but the query would remain the same.
But this has some impact on performance, as you're doing a concatenation you don't really need...
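
The concatenated search field can be tried out without a Firebird server. The sketch below swaps in a different technique for illustration: a view instead of a computed column, on an in-memory SQLite database. The view name T_search, the search helper and the sample rows are assumptions and not part of the answer above.

```python
import sqlite3

# SQLite and a view stand in for Firebird and its computed column
# (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (foo TEXT NOT NULL, bar TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO T (foo, bar) VALUES (?, ?)",
    [("alpha", "beta"), ("gamma", "delta"), ("ab", "cd")],
)

# The view exposes the same concatenation as the SearchField column above.
conn.execute(
    "CREATE VIEW T_search AS "
    "SELECT foo, bar, foo || '##' || bar AS SearchField FROM T"
)


def search(term: str) -> list:
    """Return (foo, bar) rows whose concatenated fields contain term."""
    return conn.execute(
        "SELECT foo, bar FROM T_search WHERE SearchField LIKE ? ORDER BY foo",
        ("%" + term + "%",),
    ).fetchall()


print(search("elt"))  # matches inside a single field
print(search("bc"))   # would span 'ab' and 'cd' if there were no separator
```

The '##' separator keeps a term from matching across the boundary between two fields, which is why "bc" finds nothing even though "ab" and "cd" sit next to each other.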

If you have many records and fields in the table and need high performance, I suggest you look into Full Text Index or Full Text Search for Firebird.
It is not a native feature, but there are some third-party solutions.
See this link: http://www.firebirdfaq.org/faq328/
