SQL CONTAINSTABLE - Unexpected results
I have a table Programs with some records, and I get different results when using LIKE versus CONTAINSTABLE.
CREATE TABLE Programs (
ID varchar(255) NOT NULL PRIMARY KEY,
Title varchar(255) NOT NULL
);
Insert INTO Programs VALUES
('1', '5 Horas em Islamabad'),
('2','Gus Melhoras" Melhora'),
('3', '13 Horas - Os Soldados Secretos de Benghazi'),
('4','72 Horas de Medo'),
('5','As Primeiras 48 Horas');
SELECT ID, Title FROM Programs WHERE Title LIKE '%Horas%';
SELECT ID, Title, KEY_TBL.RANK
FROM Programs AS DocTable
INNER JOIN CONTAINSTABLE(Programs, Title, 'Horas') AS KEY_TBL
ON DocTable.ID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 0
ORDER BY KEY_TBL.RANK DESC;
With LIKE I get 5 results:
ID Title
1 5 Horas em Islamabad
2 Gus Melhoras" Melhora
3 13 Horas - Os Soldados Secretos de Benghazi
4 72 Horas de Medo
5 As Primeiras 48 Horas
With CONTAINSTABLE I get 2 results:
ID Title RANK
4 72 Horas de Medo 32
5 As Primeiras 48 Horas 32
I understand why the record with the title "Gus Melhoras" Melhora" is not returned: it does not contain the word "Horas".
But the records "5 Horas em Islamabad" and "13 Horas - Os Soldados Secretos de Benghazi" do contain the word "Horas", and they are not returned either.
Can anybody explain why this happens and how to fix it?
My DBMS is Microsoft SQL Server.
[Screenshot: columns used in the full-text index]
EDIT:
In my case I set the "Language for Word Breaker" to "Brazilian". If I change it to "English", the query correctly returns 4 items.
The word I am searching for, "Horas", is "Hours" in English. But if I add a new record with the title "13 hours in Islamabad" and search for the word "Hours", the record is returned.
Does anyone know why Brazilian (or Portuguese) behaves this way?
Moreover, "horas" is also the Spanish word for "hours", and if I change my "Language for Word Breaker" to Spanish, the 4 items are returned.
EDIT2:
I used the queries posted by Randy in Marin and ran the test with the Portuguese language.
SELECT s.stopword, l.name
FROM sys.fulltext_system_stopwords s
JOIN sys.fulltext_languages l ON l.lcid = s.language_id
WHERE l.lcid = 2070 -- portuguese
stopword name
0 Portuguese
1 Portuguese
2 Portuguese
3 Portuguese
4 Portuguese
5 Portuguese
6 Portuguese
7 Portuguese
8 Portuguese
9 Portuguese
a Portuguese
agora Portuguese
...
When I run the query to see the exact matches:
SELECT occurrence, special_term, left(display_term, 20) as [display_term]
FROM sys.dm_fts_parser ('"5 Horas em Islamabad"', 2070, 0, 0); -- portuguese
occurrence special_term display_term
1 Exact Match tt24050000
1 Exact Match 5 horas
1 Exact Match tt24170000
2 Noise Word em
3 Exact Match islamabad
The result is the same as for Brazilian, even though the Portuguese stoplist does contain digits as stopwords.
The DMV sys.dm_fts_parser shows how a phrase is parsed for a given language, with or without a stoplist and with or without accent sensitivity.
SET NOCOUNT ON
--select * from sys.syslanguages
SELECT occurrence, special_term, left(display_term, 20) as [display_term]
FROM sys.dm_fts_parser ('"5 Horas em Islamabad"', 1033, 0, 0); -- english
SELECT occurrence, special_term, left(display_term, 20) as [display_term]
FROM sys.dm_fts_parser ('"5 Horas em Islamabad"', 1046, 0, 0); -- brazilian
SELECT occurrence, special_term, left(display_term, 20) as [display_term]
FROM sys.dm_fts_parser ('"5 Horas em Islamabad"', 3082, 0, 0); -- spanish
English (1033):
occurrence special_term display_term
----------- ---------------- --------------------
1 Noise Word 5
1 Noise Word nn5
2 Exact Match horas
3 Exact Match em
4 Exact Match islamabad
Brazilian (1046):
occurrence special_term display_term
----------- ---------------- --------------------
1 Exact Match tt24050000
1 Exact Match 5 horas
1 Exact Match tt24170000
2 Noise Word em
3 Exact Match islamabad
Spanish (3082):
occurrence special_term display_term
----------- ---------------- --------------------
1 Noise Word 5
1 Noise Word nn5
2 Exact Match horas
3 Exact Match em
4 Exact Match islamabad
The "5" is not a noise word in Brazilian. I tried NULL for the stoplist and both 0 and 1 for accent sensitivity, and it did not help.
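One possible reading of the two parser outputs above (an interpretation, not something documented in this thread): the Brazilian word breaker emits "5 horas" as a single term, together with the special terms tt24050000 and tt24170000, which look like normalized clock times (05:00 and 17:00). If so, "horas" is never produced as a stand-alone term for this title, and a single-term search for "horas" has nothing to match; this would possibly also explain why "13 Horas" is missed while "72 Horas" and "48 Horas", which are not valid clock times, are found. The toy Python sketch below copies the English and Brazilian parser rows shown above and checks whether the search term equals one of the non-noise terms. It only models that lookup; it is not SQL Server's word breaker or index (the Spanish rows are identical to the English ones, so they are omitted).

```python
# Parser rows copied from the sys.dm_fts_parser output above for
# '"5 Horas em Islamabad"', as (special_term, display_term) pairs.
ENGLISH_1033 = [
    ("Noise Word", "5"),
    ("Noise Word", "nn5"),
    ("Exact Match", "horas"),
    ("Exact Match", "em"),
    ("Exact Match", "islamabad"),
]
BRAZILIAN_1046 = [
    ("Exact Match", "tt24050000"),
    ("Exact Match", "5 horas"),
    ("Exact Match", "tt24170000"),
    ("Noise Word", "em"),
    ("Exact Match", "islamabad"),
]


def indexed_terms(parser_rows):
    """Toy model: every term except noise words would be stored for the row."""
    return {term for kind, term in parser_rows if kind != "Noise Word"}


def matches(search_term, parser_rows):
    """Toy single-term lookup: the search term must equal a whole indexed term."""
    return search_term.lower() in indexed_terms(parser_rows)


if __name__ == "__main__":
    for name, rows in [("English", ENGLISH_1033), ("Brazilian", BRAZILIAN_1046)]:
        print(name, sorted(indexed_terms(rows)), matches("Horas", rows))
```

Under this model, "Horas" matches the English terms but not the Brazilian ones, because "5 horas" is one term rather than two.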
If you run the following two queries, it's clear that the Brazilian stoplist is very different from the English one: it does not contain digits. Perhaps it should. Maybe a support call is required.
SELECT s.stopword, l.name
FROM sys.fulltext_system_stopwords s
JOIN sys.fulltext_languages l
ON l.lcid = s.language_id
WHERE l.lcid = 1033
stopword
----------------------------------------------------------------
$
0
1
2
3
4
5
6
7
8
9
A
B
C
D
E
...
SELECT s.stopword, l.name
FROM sys.fulltext_system_stopwords s
JOIN sys.fulltext_languages l
ON l.lcid = s.language_id
WHERE l.lcid = 1046
stopword
----------------------------------------------------------------
a
abaixo
acaso
aceleradamente
acerca
acima
acolá
ademais
adentro
adiantado
adiante
adrede
afora
agora
agorinha
ainda
alerta
algo
algum
alguma
algumas
...
LIKE and CONTAINSTABLE can be expected to return different results. LIKE uses simple, deterministic pattern-matching rules, and every character is significant. CONTAINSTABLE uses a complex system that applies language-specific algorithms (word breaking, stemming, stoplists) to match words rather than character sequences.
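To make the difference concrete, here is a small Python sketch using the standard-library sqlite3 module. It runs the question's LIKE query on the five sample titles (SQLite's LIKE is case-insensitive for ASCII letters, like a case-insensitive SQL Server collation) and compares it with a deliberately naive word match that splits titles on non-word characters. The tokenizer is an assumption for illustration only: it is not SQL Server's word breaker and ignores stoplists and language rules.

```python
import re
import sqlite3

# Sample rows from the question.
ROWS = [
    ("1", "5 Horas em Islamabad"),
    ("2", 'Gus Melhoras" Melhora'),
    ("3", "13 Horas - Os Soldados Secretos de Benghazi"),
    ("4", "72 Horas de Medo"),
    ("5", "As Primeiras 48 Horas"),
]


def like_ids(pattern):
    """Pattern match: '%Horas%' is a substring test, so 'Melhoras' matches too."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Programs (ID TEXT PRIMARY KEY, Title TEXT NOT NULL)")
    con.executemany("INSERT INTO Programs VALUES (?, ?)", ROWS)
    ids = [r[0] for r in con.execute(
        "SELECT ID FROM Programs WHERE Title LIKE ? ORDER BY ID", (pattern,))]
    con.close()
    return ids


def word_ids(word):
    """Naive word match (hypothetical tokenizer): the word must be a whole token."""
    word = word.lower()
    return [id_ for id_, title in ROWS
            if word in re.findall(r"\w+", title.lower())]


if __name__ == "__main__":
    print("LIKE:", like_ids("%Horas%"))
    print("word:", word_ids("Horas"))
```

The substring test returns all five rows, while the word-level test drops "Gus Melhoras" Melhora" and returns the 4 rows the English and Spanish word breakers find.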
If you store documents in different languages, specifying the language in CONTAINSTABLE can yield better results. The LCID might be stored in each document's record and passed to CONTAINSTABLE as its language argument. If no language is specified, the full-text language of the column is used, which might not be a good match.
SELECT ID, Title, KEY_TBL.RANK
FROM Programs AS DocTable
INNER JOIN CONTAINSTABLE(Programs, Title, 'Horas', LANGUAGE 1046) AS KEY_TBL
ON DocTable.ID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 0
ORDER BY KEY_TBL.RANK DESC;
UPDATE:
Here is a way to check in which languages a value is a stopword:
select * from sys.fulltext_system_stopwords
WHERE stopword IN ('5', 'em');
stopword language_id
---------------------------------------------------------------- -----------
5 0
5 1028
5 1030
5 1031
5 1033
5 1036
5 1040
5 1041
5 1043
5 1045
5 1049
5 1053
5 1054
5 1055
5 2052
5 2057
5 2070
5 3082
em 1046
em 2070
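The same rows can be read per language. As a small illustration, the Python sketch below copies the (stopword, language_id) pairs returned above and groups them, then lists which of the two values each of the four LCIDs used in this thread treats as a stopword (the names 1033 English, 1046 Brazilian, 2070 Portuguese and 3082 Spanish come from the earlier queries). It only restates the query output: "5" is a stopword for English, Portuguese and Spanish but not for Brazilian, and "em" only for Brazilian and Portuguese, which lines up with the parser outputs above.

```python
from collections import defaultdict

# (stopword, language_id) pairs copied from the query output above.
ROWS = [("5", lcid) for lcid in (0, 1028, 1030, 1031, 1033, 1036, 1040, 1041,
                                 1043, 1045, 1049, 1053, 1054, 1055, 2052,
                                 2057, 2070, 3082)]
ROWS += [("em", 1046), ("em", 2070)]

LANGUAGES = {1033: "English", 1046: "Brazilian", 2070: "Portuguese", 3082: "Spanish"}


def stopwords_by_language(rows):
    """Group the system stopwords by language_id."""
    grouped = defaultdict(set)
    for word, lcid in rows:
        grouped[lcid].add(word)
    return grouped


if __name__ == "__main__":
    grouped = stopwords_by_language(ROWS)
    for lcid, name in LANGUAGES.items():
        print(lcid, name, sorted(grouped[lcid]))
```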
Related
How do I correctly map letters in the database?
I have two tables: one table holds the letters of different countries, and a second table maps these letters to each other. I need a query that returns the mapped letters of two languages. Can you tell me how this can be done optimally?

The letter table:

id letter language
1  A      en
2  Ä      de
3  A      de
4  O      en
5  O      de
6  Ö      de

The letter mapping table:

id letter1 (letter.id) letter2 (letter.id)
1  1                   2
2  1                   3
3  4                   5
4  4                   6

Would it be better to create a separate table for each alphabet? Maybe there is some other architectural approach for this kind of letter matching? I would really appreciate any help!
This can be achieved with a join that is restricted to the two languages you want to check:

select en.id as id_en, en.letter as letter_en, de.id as id_de, de.letter as letter_de
from letter en
join letter_mapping lm on lm.letter1 = en.id
join letter de on de.id = lm.letter2 and de.language = 'de'
where en.language = 'en';
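A runnable version of that query, using Python's built-in sqlite3 module and the sample rows from the question. The table and column names (letter, letter_mapping, letter1, letter2, language) follow the answer; an ORDER BY is added only so that the output order is stable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE letter (id INTEGER PRIMARY KEY, letter TEXT, language TEXT);
CREATE TABLE letter_mapping (id INTEGER PRIMARY KEY,
                             letter1 INTEGER REFERENCES letter(id),
                             letter2 INTEGER REFERENCES letter(id));
INSERT INTO letter VALUES (1,'A','en'),(2,'Ä','de'),(3,'A','de'),
                          (4,'O','en'),(5,'O','de'),(6,'Ö','de');
INSERT INTO letter_mapping VALUES (1,1,2),(2,1,3),(3,4,5),(4,4,6);
""")

# The answer's query: start from English letters, follow the mapping,
# and keep only target letters whose language is German.
QUERY = """
SELECT en.id AS id_en, en.letter AS letter_en,
       de.id AS id_de, de.letter AS letter_de
FROM letter en
JOIN letter_mapping lm ON lm.letter1 = en.id
JOIN letter de ON de.id = lm.letter2 AND de.language = 'de'
WHERE en.language = 'en'
ORDER BY en.id, de.id
"""
pairs = con.execute(QUERY).fetchall()
for row in pairs:
    print(row)
```

Each English letter appears once per German letter it is mapped to, so A pairs with both Ä and A, and O with both O and Ö.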
How to assign data without repetition in SQL
I need to create automatic weekly assignments of items to sites for my employees. The items table items_bank looks like this (of course there will be many more items and a few more languages):

item_id item_name        language
1       Jorge Garcia     English
2       Chrissy Metz     English
3       Nina Hagen       German
4       Harald Glööckle  German
5       Melissa Anderson French
6       Pauley Perrette  French

My second table is the sites table:

site_id site_name
1       DR
2       LI
3       IG

I need to assign items to the sites every week with the following constraints: for each site, assign at least X items in English, Y items in German, and so on; and, to create diversity, avoid repeating the assignments of the previous 2 weeks. I think we need another table in which we save the history of the last 2 weeks' assignments.

Right now I have managed to create an SQL query that assigns items, but I don't know how to take the constraints into consideration. This is what I have so far:

WITH numbered_tasks AS (
  SELECT t.*, row_number() OVER (ORDER BY rand()) item_number, count(*) OVER () total_items
  FROM item_bank t
), numbered_employees AS (
  SELECT e.*, row_number() OVER (ORDER BY rand()) site_number, count(*) OVER () total_sites
  FROM sites_bank e
)
SELECT nt.item_name, ne.acronym
FROM numbered_tasks nt
INNER JOIN numbered_employees ne
  ON ne.site_number-1 = mod(nt.item_number-1, ne.total_sites)

The expected result for the example, where site_id=1 has to get 1 item in English, site_id=2 has to get 1 item in German, and site_id=3 has to get 1 item in French:

item_id language Week_number site
1       English  1           1
4       German   1           2
5       French   1           3

Any help will be appreciated!
union result of multiple dynamic queries sql
I have two tables, tblJobs and tblJobseeker.

tblJobs:

JobId skill      location
5     .net, php  Mexico
8     java       Boston
9     sql, c++   London, Mexico

tblJobseeker:

ID skill location
3  .net  Mexico
7  sql   Boston
10 java  Boston
12 php   Mexico
13 c++   London, Boston

Now I want to loop through the first table, tblJobs, and find matching jobseekers based on skill and location. Each record in tblJobs produces a result set, and I need to union these result sets together. I was trying to use a cursor and a dynamic query, but I don't know how to set the conditions on the skill and location columns in the dynamic query. Also, the records in both tables may vary. In the case above, the result should be:

ID skill location
3  .net  Mexico
12 php   Mexico
10 java  Boston
13 c++   London, Boston

I have edited the question. I am using CHARINDEX to match the values, and an inner join is not possible. There can be any number of locations or skills, so separate columns are not possible.
A simple inner join would do:

select s.* from tblJobseeker s
inner join tblJobs j on s.skill = j.skill and s.location = j.location;
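Note that this equality join only pairs rows whose skill and location strings are identical, so with the sample data it returns only jobseeker 10 (java, Boston). If, as the question says, both columns hold comma-separated lists, one possible reading of "matching" is that the lists share at least one skill and at least one location. The Python sketch below compares the two rules on the question's sample rows; the overlap rule is an assumption, not something the answer states, but it reproduces the expected result from the question, in the same order.

```python
# Sample rows from the question: (id, skills, locations) as comma-separated text.
JOBS = [
    (5, ".net, php", "Mexico"),
    (8, "java", "Boston"),
    (9, "sql, c++", "London, Mexico"),
]
SEEKERS = [
    (3, ".net", "Mexico"),
    (7, "sql", "Boston"),
    (10, "java", "Boston"),
    (12, "php", "Mexico"),
    (13, "c++", "London, Boston"),
]


def split_list(text):
    """Turn 'London, Mexico' into {'london', 'mexico'}."""
    return {part.strip().lower() for part in text.split(",") if part.strip()}


def equality_matches(jobs, seekers):
    """What the simple inner join does: exact string equality on both columns."""
    return [sid for _, jskill, jloc in jobs
            for sid, sskill, sloc in seekers
            if sskill == jskill and sloc == jloc]


def overlap_matches(jobs, seekers):
    """Assumed rule: at least one shared skill AND at least one shared location."""
    result = []
    for _, jskill, jloc in jobs:
        for sid, sskill, sloc in seekers:
            if (split_list(jskill) & split_list(sskill)
                    and split_list(jloc) & split_list(sloc)
                    and sid not in result):
                result.append(sid)
    return result


if __name__ == "__main__":
    print(equality_matches(JOBS, SEEKERS))
    print(overlap_matches(JOBS, SEEKERS))
```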
SQL trouble with average function
I am very new to SQL and am having a lot of trouble wrapping my head around certain things. The following code

SELECT Treated.ProgScore, Patient.Ethnicity
FROM Patient
JOIN City ON Patient.ZIP = City.ZIP
JOIN Treated ON Patient.SSN = Treated.PSSN
WHERE City.Cname != 'Dallas';

produces the following data:

PROGSCORE ETHNICITY
2         Japanese
5         Caucasian
9         Caucasian
2         African
3         Japanese
7         Caucasian
10        Japanese
8         Caucasian
1         African
4         Japanese
7         Caucasian

I would like to extract the average ProgScore of each ethnicity and list it from lowest to highest average ProgScore. Is there a way to do this? I would like my output to look like this:

Average ETHNICITY
4.75    Japanese
7.2     Caucasian

Thanks for any help.
It sounds like you want:

SELECT avg(Treated.ProgScore), Patient.Ethnicity
FROM Patient
JOIN City ON Patient.ZIP = City.ZIP
JOIN Treated ON Patient.SSN = Treated.PSSN
WHERE City.Cname != 'Dallas'
GROUP BY Patient.Ethnicity
ORDER BY avg(Treated.ProgScore);
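To check the grouping logic without the Patient, City and Treated tables, the Python sketch below loads the rows that the question's first query produced into a single hypothetical table named scores (sqlite3, standard library) and applies the same GROUP BY and ORDER BY. The sample data also contains African rows, which the expected output in the question leaves out; the query returns them too. Also, if ProgScore is an integer column, SQL Server's AVG returns an integer (4 instead of 4.75), so there you would need something like AVG(CAST(Treated.ProgScore AS decimal(10, 2))); SQLite's avg always returns a floating-point value.

```python
import sqlite3

# (ProgScore, Ethnicity) rows as listed in the question, i.e. the output of
# its first query; `scores` is a hypothetical stand-in for that join.
ROWS = [(2, "Japanese"), (5, "Caucasian"), (9, "Caucasian"), (2, "African"),
        (3, "Japanese"), (7, "Caucasian"), (10, "Japanese"), (8, "Caucasian"),
        (1, "African"), (4, "Japanese"), (7, "Caucasian")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (ProgScore INTEGER, Ethnicity TEXT)")
con.executemany("INSERT INTO scores VALUES (?, ?)", ROWS)

# Same shape as the answer: aggregate per group, then sort by the aggregate.
averages = con.execute("""
    SELECT avg(ProgScore) AS Average, Ethnicity
    FROM scores
    GROUP BY Ethnicity
    ORDER BY avg(ProgScore)
""").fetchall()
for average, ethnicity in averages:
    print(average, ethnicity)
```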
Multiple JOIN (SQL)
My problem is Play! Framework / JPA specific. But I think it's applicable to general SQL syntax. Here is a sample query with a simple JOIN: return Post.find( "select distinct p from Post p join p.tags as t where t.name = ?", tag ).fetch(); It's simple and works well. My question is: What if I want to JOIN on more values in the same table? Example (Doesn't work. It's a pseudo-syntax I created): return Post.find( "select distinct p from Post p join p.tags1 as t, p.tags2 as u, p.tags3 as v where t.name = ?, u.name = ?, v.name = ?", tag1, tag2, tag3, ).fetch();
Your programming logic seems okay, but the SQL statement needs some work. It seems you're new to SQL, and as you pointed out, you don't seem to understand what a JOIN is.

You're trying to select data from 4 tables named POST, TAG1, TAG2, and TAG3. I don't know what's in these tables, and it's hard to give sample SQL statements without that information. So, I'm going to make something up, just for the purposes of discussion. Let's say that table POST has 6 columns, and there are 8 rows of data in it.

P Fname Lname   Country Color Headgear
- ----- ------- ------- ----- --------
1 Alex  Andrews 1       1     0
2 Bob   Barker  2       3     0
3 Chuck Conners 1       5     0
4 Don   Duck    3       6     1
5 Ed    Edwards 2       4     2
6 Frank Farkle  4       2     1
7 Geoff Good    1       1     0
8 Hank  Howard  1       3     0

We'll say that TAG1, TAG2, and TAG3 are lookup tables, with only 2 columns each. Table TAG1 has 4 country codes:

C Name
- -------
1 USA
2 France
3 Germany
4 Spain

Table TAG2 has 6 color codes:

C Name
- ------
1 Red
2 Orange
3 Yellow
4 Green
5 Blue
6 Violet

Table TAG3 has 4 headgear codes:

C Name
- -------
0 None
1 Glasses
2 Hat
3 Monocle

Now, when you select data from these 4 tables, for P=6, you're trying to get something like this:

Fname Lname  Country Color  Headgear
----- ------ ------- ------ --------
Frank Farkle Spain   Orange Glasses

First, let's look at your WHERE clause:

where t.name = ?, u.name = ?, v.name = ?

Sorry, but using commas like this is a syntax error. Normally you only want to find data where all 3 conditions are true; you do this by using AND:

where t.name=? AND u.name=? AND v.name=?

Second, why are you joining tables together? Because you need more information. Table POST says that Frank's COUNTRY value is 4; table TAG1 says that 4 means Spain. So we need to "join" these tables together.

The ancient way to join tables (from before the SQL-92 standard introduced the JOIN keyword) is to list more than one table name in the FROM clause, separated by commas.
This gives us:

SELECT P.FNAME, P.LNAME, T.NAME As Country, U.NAME As Color, V.NAME As Headgear
FROM POST P, TAG1 T, TAG2 U, TAG3 V

The trouble with this query is that you're not telling it WHICH rows you want, or how they relate to each other. So the database generates something called a "Cartesian product". It's extremely rare that you want a Cartesian product; normally this is a HUGE MISTAKE. Even though your database only has 22 rows in it, this SELECT statement is going to return 768 rows of data (8 × 4 × 6 × 4):

Alex Andrews USA Red None
Alex Andrews USA Red Glasses
Alex Andrews USA Red Hat
Alex Andrews USA Red Monocle
Alex Andrews USA Orange None
Alex Andrews USA Orange Glasses
...
Hank Howard Spain Violet Monocle

That's right, it returns every possible combination of data from the 4 tables. Imagine for a second that the POST table eventually grows to 20000 rows, and the three TAG tables have 100 rows each. The whole database would be less than a megabyte, but the Cartesian product would have 20,000,000,000 rows of data, probably about 120 GB of data. Any database engine would choke on that.

So if you want to use the Ancient way of specifying tables, it is VERY IMPORTANT to make sure that your WHERE clause shows the relationship between every table you're querying. This makes a lot more sense:

SELECT P.FNAME, P.LNAME, T.NAME As Country, U.NAME As Color, V.NAME As Headgear
FROM POST P, TAG1 T, TAG2 U, TAG3 V
WHERE P.Country=T.C AND P.Color=U.C AND P.Headgear=V.C

This only returns 8 rows of data.

Using the Ancient way, it's easy to accidentally create Cartesian products, which are almost always bad. So they revised SQL to make it harder to do. That's the JOIN keyword. Now, when you specify additional tables you can specify how they relate at the same time.
The New Way is:

SELECT P.FNAME, P.LNAME, T.NAME As Country, U.NAME As Color, V.NAME As Headgear
FROM POST P
INNER JOIN TAG1 T ON P.Country=T.C
INNER JOIN TAG2 U ON P.Color=U.C
INNER JOIN TAG3 V ON P.Headgear=V.C

You can still use a WHERE clause, too.

SELECT P.FNAME, P.LNAME, T.NAME As Country, U.NAME As Color, V.NAME As Headgear
FROM POST P
INNER JOIN TAG1 T ON P.Country=T.C
INNER JOIN TAG2 U ON P.Color=U.C
INNER JOIN TAG3 V ON P.Headgear=V.C
WHERE P.P=?

If you call this and pass in the value 6, you get only one row back:

Fname Lname  Country Color  Headgear
----- ------ ------- ------ --------
Frank Farkle Spain   Orange Glasses
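The row counts in this answer are easy to verify. The Python sketch below (standard-library sqlite3) builds the four sample tables, counts the rows of the comma-separated FROM without a WHERE clause and of the INNER JOIN version, and runs the P = 6 lookup, passing 6 as a bound parameter in place of the ? placeholder. With this sample data Frank's Headgear code is 1, which TAG3 maps to Glasses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE POST (P INTEGER PRIMARY KEY, Fname TEXT, Lname TEXT,
                   Country INTEGER, Color INTEGER, Headgear INTEGER);
CREATE TABLE TAG1 (C INTEGER PRIMARY KEY, Name TEXT);  -- countries
CREATE TABLE TAG2 (C INTEGER PRIMARY KEY, Name TEXT);  -- colors
CREATE TABLE TAG3 (C INTEGER PRIMARY KEY, Name TEXT);  -- headgear
INSERT INTO POST VALUES
  (1,'Alex','Andrews',1,1,0), (2,'Bob','Barker',2,3,0),
  (3,'Chuck','Conners',1,5,0), (4,'Don','Duck',3,6,1),
  (5,'Ed','Edwards',2,4,2), (6,'Frank','Farkle',4,2,1),
  (7,'Geoff','Good',1,1,0), (8,'Hank','Howard',1,3,0);
INSERT INTO TAG1 VALUES (1,'USA'),(2,'France'),(3,'Germany'),(4,'Spain');
INSERT INTO TAG2 VALUES (1,'Red'),(2,'Orange'),(3,'Yellow'),
                        (4,'Green'),(5,'Blue'),(6,'Violet');
INSERT INTO TAG3 VALUES (0,'None'),(1,'Glasses'),(2,'Hat'),(3,'Monocle');
""")

# Old style with no join condition: every combination (Cartesian product).
cartesian = con.execute(
    "SELECT COUNT(*) FROM POST P, TAG1 T, TAG2 U, TAG3 V").fetchone()[0]

# New style: each JOIN states how the lookup table relates to POST.
JOINED = """
SELECT P.FNAME, P.LNAME, T.NAME AS Country, U.NAME AS Color, V.NAME AS Headgear
FROM POST P
INNER JOIN TAG1 T ON P.Country = T.C
INNER JOIN TAG2 U ON P.Color = U.C
INNER JOIN TAG3 V ON P.Headgear = V.C
"""
joined_rows = con.execute(JOINED).fetchall()
frank = con.execute(JOINED + " WHERE P.P = ?", (6,)).fetchall()

print(cartesian, len(joined_rows), frank)
```

The unconstrained FROM yields 8 × 4 × 6 × 4 = 768 rows, while the joined query yields one row per post.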
As was mentioned in the comments, you are looking for an ON clause. SELECT * FROM TEST1 INNER JOIN TEST2 ON TEST1.A = TEST2.A AND TEST1.B = TEST2.B ...
See example usage of join here: http://en.wikibooks.org/wiki/Java_Persistence/Relationships#Join_Fetching