FullText Contains "AND DO" - sql

I found a weird issue with a query that uses a full-text index.
The following query
#1 SELECT * FROM tbparticipant where contains([FullTextQuery],'ALINE AND NASCIMENTO')
returns
ALINE DO NASCIMENTO
ALINE QUEIROZ DO NASCIMENTO
ALINE NASCIMENTO DE SOUZA
ALINE CORREIA DO NASCIMENTO
But this query
#2 SELECT * FROM tbparticipant where contains([FullTextQuery],'ALINE AND DO')
returns nothing.
I thought it would be a problem with "DO" being too short, but this query
#3 SELECT * FROM tbparticipant where contains([FullTextQuery],'ALINE AND DE')
returns
ALINE NASCIMENTO DE SOUZA
So, what's wrong with query #2?
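
"Do" is on the stopword list. Stopwords are words considered too common or too short to carry any significant meaning in full-text queries. They are left out of the full-text index, so a condition that requires "DO" cannot match any row. You can list the system stopwords for English (language_id 1033) like this: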
select * from sys.fulltext_system_stopwords where language_id = 1033
Reference:
http://msdn.microsoft.com/en-us/library/ms142551.aspx
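To see how the terms from the question are treated, one option is the sys.dm_fts_parser function. The sketch below assumes the column is indexed with the English language (LCID 1033) and the system stoplist, and that your login is allowed to call the function (it normally requires sysadmin membership):

```sql
-- Sketch only: parse a sample phrase with the English word breaker (LCID 1033),
-- the system stoplist (stoplist_id = 0) and accent sensitivity off (0).
-- Terms reported with special_term = 'Noise Word' are stopwords.
SELECT display_term, special_term
FROM sys.dm_fts_parser(N'"ALINE DO NASCIMENTO DE SOUZA"', 1033, 0, 0);
```

If the column uses a different language or a custom stoplist, the second and third arguments need to change accordingly.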


How do I check whether a text consists only of ASCII characters in PostgreSQL?

I want to select only the rows whose text consists entirely of ASCII characters.
For example, given these titles:
"Grey's Anatomy : Station 19"
"Trésors sous les mers"
"Les Légendes des Studios Marvel"
"The Great North"
"Solar Opposites"
From the titles above, I want to select only these:
"Grey's Anatomy : Station 19"
"The Great North"
"Solar Opposites"
How do I filter these in PostgreSQL?
You could use regex matching.
select * from titles where title ~ '^[[:ascii:]]+$';
Example http://sqlfiddle.com/#!17/2402a1/8
https://www.postgresql.org/docs/current/functions-matching.html

sql substr variable url extraction process

Context:
Until now I have been using regexp in SQL to extract variable URLs. I find it very slow and want to optimize it using substr and instr. This matters to me because I'm new to SQL, and it helps me become more familiar with these functions.
database:
My database consists of posts extracted from social platforms. The text column is called "titre". It contains URLs in different formats: www, http, https. I want to create a table or a view (I haven't decided yet) containing those URLs and the related id_post.
My work:
I have noticed that a URL always ends with a blank space, something like: "toto want to share with you this www.example.com in his post"
Here is what I've done so far:
--- length of the string starting at https
select LENGTH(substr(titre, INSTR(titre,'https:'))) from post_categorised_pages where id_post = '280853248721200_697941320345722';
--- length of the string starting at the blank
select LENGTH(substr(titre, INSTR(titre,' ', 171))) from post_categorised_pages where id_post = '280853248721200_697941320345722';
--- difference between the two, giving the length of the URL string
select LENGTH(substr(titre, INSTR(titre,'https:'))) - LENGTH(substr(titre, INSTR(titre,' ', 171))) as longueur_url from post_categorised_pages where id_post = '280853248721200_697941320345722';
--- url
select substr(titre, 171, 54) from post_categorised_pages where id_post = '280853248721200_697941320345722';
Question:
How can I automate that over the whole table "post_categorised_pages"?
Can I introduce CASE WHEN expressions to take https, http or www into account, and how would I do that?
Thanks a lot!
Maybe, instead of the literal "HTTP", "HTTPS" or "WWW" string, you need to use the name of a column.
In that case it would probably help to have a definition table listing all possible sources. This table would have 2 columns (ID and source_name).
Then, in your post_categorised_pages table, also store the source of the message (the ID value).
Then, in the query, join with this definition table by ID and, instead of
select substr(titre, INSTR(titre,'https:'), (LENGTH(substr(titre, INSTR(titre,'https:'))) - LENGTH(substr(titre, INSTR(titre,' ', (INSTR(titre,'https:'))))))) from post_categorised_pages where id_post = '280853248721200_697941320345722';
use
select substr(titre, INSTR(titre,"definition table".source_name), (LENGTH(substr(titre, INSTR(titre,"definition table".source_name))) - LENGTH(substr(titre, INSTR(titre,' ', (INSTR(titre,"definition table".source_name))))))) from post_categorised_pages where id_post = '280853248721200_697941320345722';
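For the CASE WHEN variant the question asks about, a minimal Oracle sketch could look like the following. It assumes the table and column names from the question, extracts only the first URL per post, and relies on the question's observation that a URL ends at the next blank. The url_start alias is made up for this example.

```sql
-- Sketch only: pick the start of the first URL by prefix priority, then cut up to the next blank.
-- A blank is appended to titre so a URL at the very end of the text is still terminated.
select id_post,
       substr(titre,
              url_start,
              instr(titre || ' ', ' ', url_start) - url_start) as url
from (
  select id_post,
         titre,
         case
           when instr(titre, 'https:') > 0 then instr(titre, 'https:')
           when instr(titre, 'http:')  > 0 then instr(titre, 'http:')
           when instr(titre, 'www.')   > 0 then instr(titre, 'www.')
         end as url_start   -- stays null when no prefix is found
  from post_categorised_pages
)
where url_start is not null;
```

Because no id_post filter is applied, this runs over the whole table. Posts containing several URLs would still need a row generator such as connect by level.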
OK guys, here is the solution I have found (there is still one mistake, see the end of the post).
I use two views to finally extract my strings.
The first view is created with a connect by query:
--- create intermediate view with the position of each targeted pattern
--- (your_table and 'xyz' are placeholders)
create or replace view Start_Position_Index as
with post as
(select id, text from your_table where id = 'xyz')
select id, instr(text, '#', 1, level) as position, text
from post
connect by level <= regexp_count(text, '#');
then
--- create working view with, for each pattern match, its position, the position of the next blank and the string length
--- (the view name must be double-quoted because it does not start with a letter)
create or replace view "_#_index" as
select id, position as hashtag_pos, INSTR(text,' ', position) as blank_position, INSTR(text,' ', position) - position as string_length, text
from Start_Position_Index;
At the end you will be able to retrieve the hashtags (in this case) you were looking for in your string.
OK, so the mistakes:
- if the pattern you are looking for is at the end of the string, it will return a null value because there is no blank space after it.
- it is not well optimized because I am working with views rather than tables. I think using tables would be faster.
But I'm pretty sure there are lots of things to do to optimize this code... any ideas? The challenge was how to extract a specific pattern repeatedly from strings without using costly regex and without using PL/SQL. What do you think of that?
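One possible way around the end-of-string problem is to append a blank to the text before searching for the terminator. The sketch below builds on the Start_Position_Index view from this answer. The view name Hashtag_Index_Padded and the hashtag column are invented for the example.

```sql
-- Sketch only: text || ' ' guarantees that a blank exists after the last hashtag,
-- so INSTR no longer returns 0 for a pattern at the end of the string.
create or replace view Hashtag_Index_Padded as
select id,
       position as hashtag_pos,
       instr(text || ' ', ' ', position) as blank_position,
       instr(text || ' ', ' ', position) - position as string_length,
       substr(text, position, instr(text || ' ', ' ', position) - position) as hashtag
from Start_Position_Index
where position > 0;   -- skip rows where no '#' was found
```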
How about using Oracle Full Text search?
This will index all the words from the column and will provide the hashtags or web addresses, as both are written in one word, without space in between.

Regular expression Oracle

I am learning to use regular expressions, and I'm using them to limit the results of a search query with REGEXP_LIKE in Oracle 11. As an example of the available data, I have the following:
Plan navegación 200 MB
Plan navegación 1 GB
Plan navegación 1 GB
Plan de navegacion 3G
Plan de navegacion 4G
Plan de navegacion 3G Empresarial
Plan de navegacion 4G Empresarial
Plan de servicios 3G
Plan de servicios 4G
Plan navegación Datos
I want the result to be limited to the following (only 3G, 4G):
Plan de navegacion 3G
Plan de navegacion 4G
Plan de navegacion 3G Empresarial
Plan de navegacion 4G Empresarial
I am using the following search patterns, but they do not filter the results properly:
Upper(PLAN_GSM),'(NAVEGA){1}|(3G|4G|5G)'
Upper(PLAN_GSM),'((NAVEGA)+)(3G|4G)+'
I have done several tests and cannot find the solution. Could someone give me hints?
You could simply use LIKE, as below:
select *
from mytable
where PLAN_GSM LIKE 'Plan de navegacion _G%';
or use REGEXP_LIKE, as below:
select *
from mytable
where REGEXP_LIKE(PLAN_GSM, '^Plan de navegacion (3|4|5)G');
SQL Fiddle demo
Reference:
Oracle/PLSQL: REGEXP_LIKE Condition on Tech on the Net
You can use this:
SELECT * FROM mytable
WHERE REGEXP_LIKE(mycolumn, '\APlan de navegacion \dG.*\z', 'c');
\d represents a digit
\A is the beginning of the string
.* greedily matches any characters
\z is the end of the string
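A variant closer to the patterns tried in the question is sketched below. It assumes the same mytable and PLAN_GSM names and is untested against the real data. The second pattern in the question fails because it requires 3G or 4G to follow NAVEGA immediately, so the sketch allows arbitrary characters in between.

```sql
-- Sketch only: NAVEGA somewhere in the name, later followed by a standalone 3G or 4G token
select *
from mytable
where regexp_like(upper(PLAN_GSM), 'NAVEGA.* (3G|4G)( |$)');
```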
select *
from all_tab_columns
where COLUMN_NAME like '%MAIN%ACCOUNT%LINK%CODE%N%' and TABLE_NAME like 'CB%' and not regexp_like(table_name,'[0-9]')
The query above returns only the objects whose table name contains no digits.
select *
from all_tab_columns
where COLUMN_NAME like '%MAIN%ACCOUNT%LINK%CODE%N%' and TABLE_NAME like 'CB%' and regexp_like(table_name,'[0-9]')
The query above returns only the objects whose table name contains digits.

Search "Our" Use Full Text Search with Contains in SQL Server

We use full-text search with CONTAINS to search records in SQL Server 2008 R2. Here are the samples:
NEWS(Title): "We", "New", "Our", "Long-Term", "Seem", "Non.Active"
As you can see, the Title field of the NEWS table holds these values.
We can search for all of the values except "Long-Term" and "Non.Active"; in fact, we cannot search for words that include a dash ("-") or a dot ("."). We also tried these:
SELECT * FROM NEWS WHERE Contains(Title, 'Non.Active');
SELECT * FROM NEWS WHERE Contains(Title, 'Non Active');
SELECT * FROM NEWS WHERE Contains(Title, 'NonActive');
SELECT * FROM NEWS WHERE Contains(Title, 'Non*');
SELECT * FROM NEWS WHERE Contains(Title, 'Active');
SELECT * FROM NEWS WHERE Contains(Title, 'Non');
SELECT * FROM NEWS WHERE Contains(Title, '*Active');
SELECT * FROM NEWS WHERE Contains(Title, ' "Non.Active" ');
SELECT * FROM NEWS WHERE Contains(Title, ' "Non Active" ');
SELECT * FROM NEWS WHERE Contains(Title, ' "NonActive" ');
SELECT * FROM NEWS WHERE Contains(Title, ' "Non*" ');
SELECT * FROM NEWS WHERE Contains(Title, ' "*Active" ');
SELECT * FROM NEWS WHERE Contains(Title, ' "Active" ');
SELECT * FROM NEWS WHERE Contains(Title, ' "Non" ');
But none of them returns any result.
We also rebuilt the full-text index and still did not get any result.
So the question is: is there any way to search for words that include "." or "-" with the full-text CONTAINS predicate? Any suggestions?
UPDATE
I'm really sorry, the main problem turns out to be a different one.
You are all right about the two words "Non" and "Active", but the case I actually tested is "We.Our", and it still does not return any result. That's so weird: I tested "Non.Active" with the searches above and it worked, but "We.Our" doesn't. So I tried another record: I inserted "our", and the search result is still empty. Is the problem the word "Our"? What is wrong with "our"? I also checked it in SQL Server 2012, and it did not work there either. Does anyone have any idea about this?
You can check how the data you need is stored in the full-text catalog:
SELECT *
FROM sys.dm_fts_index_keywords_by_document(db_id('DBname'), object_id('TableName'))
WHERE document_id = <Unique Id>
As for the word "our", it appears to be a stopword.
The point is that the stop words (noise words) should be checked. In my case "our" was a noise word, I don't know why, but the problem is solved.
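If "our" really has to be searchable, one option is a custom stoplist without that word. The sketch below assumes the Title column is indexed as English (LCID 1033). The stoplist name NewsStoplist is invented for this example.

```sql
-- Sketch only: copy the system stoplist, remove 'our' for English,
-- and attach the copy to the full-text index on NEWS.
CREATE FULLTEXT STOPLIST NewsStoplist FROM SYSTEM STOPLIST;
ALTER FULLTEXT STOPLIST NewsStoplist DROP N'our' LANGUAGE 1033;
-- Changing the stoplist repopulates the index by default.
ALTER FULLTEXT INDEX ON NEWS SET STOPLIST NewsStoplist;
```

Using SET STOPLIST OFF instead would index every word, at the cost of a larger index.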
In my case, it simply works as
SELECT * FROM NEWS WHERE Contains(Title,'"Non.*"');
But I hear there is a small bug in the 2008 version. Maybe the problem will be solved by updating your SQL Server instances.

SQL Server 2008 full-text search doesn't find word in words?

In the database I have a field with a .mht file. I want to use FTS to search in this document. I got this working, but I'm not satisfied with the result. For example (sorry, it's in Dutch, but I think you get my point) I will use 2 words: zieken and ziekenhuis. As you can see, the string 'zieken' is part of the word 'ziekenhuis'.
When I search on 'ziekenhuis' I get about 20 results. When I search on 'zieken' I get 7 results. How is this possible? I mean, why doesn't the FTS return at least the results I get for 'ziekenhuis'?
Here's the query I use:
SELECT DISTINCT
d.DocID 'Id',
d.Titel,
(SELECT afbeeldinglokatie FROM tbl_Afbeelding WHERE soort = 'beleid') as Pic,
'belDoc' as DocType
FROM docs d
JOIN kpl_Document_Lokatie dl ON d.DocID = dl.DocID
JOIN HandboekLokaties hb ON dl.LokatieID = hb.LokatieID
WHERE hb.InstellingID = #instellingId
AND (
FREETEXT(d.Doel, #searchstring)
OR FREETEXT(d.Toepassingsgebied, #searchstring)
OR FREETEXT(d.HtmlDocument, #searchstring)
OR FREETEXT (d.extraTabblad, #searchstring)
)
AND d.StatusID NOT IN( 1, 5)
I would suggest that you look at using the CONTAINS predicate, as opposed to FREETEXT.
Usage scenarios, including what you wish to achieve, can be found in the examples section of the documentation.
From your description, I believe that you are attempting to perform a "Prefix" search. For example:
USE AdventureWorks;
GO
SELECT Name
FROM Production.Product
WHERE CONTAINS(Name, ' "SearchTerm*" ');
GO
This returns every row whose Name contains at least one word that begins with the prefix search term.
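Applied to the query from the question, a prefix search could be sketched roughly as follows. It assumes all four columns belong to the full-text index on docs, and it uses a T-SQL variable @searchstring as a stand-in for the question's #searchstring placeholder. The variable @prefixTerm is invented for the example.

```sql
-- Sketch only: wrap the user's term in double quotes and add * to get a prefix term,
-- so 'zieken' also matches 'ziekenhuis'.
DECLARE @searchstring nvarchar(100) = N'zieken';
DECLARE @prefixTerm nvarchar(110) = N'"' + @searchstring + N'*"';

SELECT d.DocID, d.Titel
FROM docs d
WHERE CONTAINS((d.Doel, d.Toepassingsgebied, d.HtmlDocument, d.extraTabblad), @prefixTerm);
```

If the search string comes from user input, any double quotes in it should be removed or escaped before the term is built.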