sqlite3 with FTS4 table: Query returns wrong row

I have a weird issue with my FTS4 index in SQLite3, namely that a MATCH query for one term returns not the exact match but another, similar one, and vice versa.
Here is an example to illustrate it better:
SELECT name FROM test_idx WHERE name MATCH 'lehmbruck-museum';
-- "Lehmbruck-Archiv"
SELECT name FROM test_idx WHERE name MATCH 'lehmbruck-archiv';
-- "Lehmbruck-Museum"
It seems to have something to do with the dash, here is a similar case that exhibits the same behavior:
SELECT name FROM test_idx WHERE name MATCH 'some-thing';
-- "some-thang"
SELECT name FROM test_idx WHERE name MATCH 'some-thang';
-- "some-thing"
Here is how this test database is built, in case somebody wants to have a go at reproducing it:
CREATE VIRTUAL TABLE test_idx USING fts4(name);
INSERT INTO test_idx (name) VALUES
('some-thing'), ('some-thang'),
('Lehmbruck-Museum'), ('Lehmbruck-Archiv');

SELECT name FROM test_idx WHERE name MATCH 'lehmbruck-museum';
What you pass to MATCH here is a full-text query expression. In that expression language the - character is a unary operator that stands for the NOT set operation, and it is almost certainly what produces your unexpected results, which are the exact opposite of what you expect. The query finds exactly what it was told to find: rows that contain lehmbruck and do NOT contain museum.
You'll need to escape the dash to get the results you want, for example by wrapping the whole term in double quotes so that it is parsed as a phrase (MATCH '"lehmbruck-museum"'). Alternatively, use the LIKE operator if you are only looking at a single column in a table.
More information on this expression language can be found in section 3 of the FTS3 and FTS4 documentation on the SQLite doc site.
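
A minimal sketch of the quoting approach, using Python's built-in sqlite3 module and assuming the underlying SQLite library was compiled with FTS4 (most builds are, but that is not guaranteed). The helper name `search` is only illustrative. Wrapping the term in double quotes makes FTS treat it as a phrase query, so the dash is no longer read as an operator:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE test_idx USING fts4(name)")
conn.executemany(
    "INSERT INTO test_idx (name) VALUES (?)",
    [("some-thing",), ("some-thang",), ("Lehmbruck-Museum",), ("Lehmbruck-Archiv",)],
)

def search(term):
    """Run `term` as an FTS phrase query (hypothetical helper)."""
    # Drop embedded double quotes, which would otherwise close the phrase early.
    phrase = '"' + term.replace('"', " ") + '"'
    rows = conn.execute(
        "SELECT name FROM test_idx WHERE name MATCH ?", (phrase,)
    ).fetchall()
    return [name for (name,) in rows]

print(search("lehmbruck-museum"))
print(search("some-thang"))
```

Passing the phrase as a bound parameter also avoids nesting double quotes inside a single-quoted SQL literal.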


How can you filter Snowflake EXPLAIN USING TABULAR output when it's embedded in the TABLE function? Can you filter it with anything?

I have a table named Posts I would like to count and profile in Snowflake using the current Snowsight UI.
When I return the results via EXPLAIN USING TABULAR, I am able to return the set with the combination of the TABLE, RESULT_SCAN, and LAST_QUERY_ID functions, but any predicate, filter, or column reference seems to fail.
Is there a valid way to do this in Snowflake with the TABLE function, or is there another way to query the output of EXPLAIN USING TABULAR?
-- Works
EXPLAIN using TABULAR SELECT COUNT(*) from Posts;
-- Works
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t;
-- Does not work
SELECT t.* FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) as t where operation = 'GlobalStats';
-- invalid identifier 'OPERATION', the column does not seem recognized.
I tried the third example and expected the predicate to apply to the function output. I don't understand why the filter works on some TABLE() results and not others.
You need to double-quote the column name:
where "operation" = 'GlobalStats'
From the documentation:
Note that because the output column names from the DESC USER command
were generated in lowercase, the commands use delimited identifier
notation (double quotes) around the column names in the query to
ensure that the column names in the query match the column names in
the output that was scanned.

How to match a regular expression from a SQL table to a text

I have used the LIKE operator before to match patterns against a specific SQL table column, for example to find all rows whose name starts with "A". The case I am trying to solve now is reversed: I have a column "Description" that holds a regular expression in each row, and I need to return the rows whose expression matches a given input text.
Table A
Description
============
\b[0-9A-Z ]*WT[A-Z0-9 ]*BALL\b
\b[0-9A-Z ]*WG[A-Z0-9 ]*BAT\b
\b[0-9A-Z ]*AX[A-Z0-9 ]*BAR\b
So the Description column holds these regular expressions, and the input text is "BKP 200 WT STAR BALL". The SELECT query should return the first row, since that is the only match. Any help with the SELECT query, or an idea for how to approach it, would be appreciated. If more details are required, please say so.
Cross join your regex table to the table you are searching within, then match the two columns against each other.
With that join you can find the texts that match any of your expressions or, by grouping on the text, the ones that match all of them.
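
The question does not name a database, so here is a hedged sketch of the cross-join idea in SQLite via Python's sqlite3 module. The `inputs` table, the second sample text, and the `regex_match` function are assumptions made for illustration; `regex_match` is registered from Python because SQLite has no built-in regex function. In other databases the native operator would take its place (for example `~` in PostgreSQL or `REGEXP` in MySQL).

```python
import re
import sqlite3

def regex_match(pattern, text):
    # Return 1 if the pattern is found anywhere in the text, else 0.
    return 1 if re.search(pattern, text) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("regex_match", 2, regex_match)

conn.execute("CREATE TABLE table_a (description TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?)", [
    (r"\b[0-9A-Z ]*WT[A-Z0-9 ]*BALL\b",),
    (r"\b[0-9A-Z ]*WG[A-Z0-9 ]*BAT\b",),
    (r"\b[0-9A-Z ]*AX[A-Z0-9 ]*BAR\b",),
])

# Hypothetical table holding the texts to test against the patterns.
conn.execute("CREATE TABLE inputs (txt TEXT)")
conn.executemany("INSERT INTO inputs VALUES (?)", [
    ("BKP 200 WT STAR BALL",),
    ("NO MATCH HERE",),
])

# Every (text, pattern) pair where the pattern matches the text.
any_matches = conn.execute("""
    SELECT i.txt, a.description
    FROM inputs AS i CROSS JOIN table_a AS a
    WHERE regex_match(a.description, i.txt)
""").fetchall()

# Texts that are matched by every pattern in table_a.
all_matches = conn.execute("""
    SELECT i.txt
    FROM inputs AS i CROSS JOIN table_a AS a
    GROUP BY i.txt
    HAVING SUM(regex_match(a.description, i.txt)) = COUNT(*)
""").fetchall()

print(any_matches)
print(all_matches)
```

With the sample data, only the first pattern pairs with "BKP 200 WT STAR BALL", and no text satisfies all three patterns at once.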

Is there a way to express AND in SIMILAR TO ignoring order of matches?

I have a Redshift table column that contains 1 to many hashtags (e.g. #a, #b, etc.). I want to write a query that finds rows where all tags from a given set exist (e.g. #a and #b) while not picking up other rows that have some but not all of the tags (e.g. only #a or only #b).
I can see how to do this with multiple LIKE statements (e.g. LIKE '%#a %' AND LIKE '%#b%') but I would really like to do it with a single statement. I can see how to do this with SIMILAR TO but not in a way that ignores ordering. The following would work but only if I include all possible combinations of ordering.
SELECT * FROM table WHERE field SIMILAR TO '(%#a%)(%#b%)|(%#b%)(%#a%)'
This works, but having to list all combinations of the tags I'm looking for would be a royal pain and prone to error. Is there a way to express 'AND' in SIMILAR TO (or another function) in Redshift that ignores order?
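Make sure to capture the whole tag in any position and not match on incomplete tags. Assuming the tags are stored back to back with no separator (e.g. #a#b), a tag is either followed by another # or sits at the end of the string:
SELECT *
FROM table
WHERE (field LIKE '%#a#%' OR field LIKE '%#a') AND
(field LIKE '%#b#%' OR field LIKE '%#b')
This avoids matching data such as #ac#b.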
Use AND and LIKE:
SELECT t.*
FROM table t
WHERE field LIKE '%#a%' AND
field LIKE '%#b%';
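
To compare plain substring matching with whole-tag matching, here is a small sketch that builds the LIKE clauses from a list of tags. It runs against SQLite through Python's sqlite3 module rather than Redshift, and assumes tags are stored back to back with no separator (as in `#ac#b`) and contain no `%` or `_` characters. The `posts` table and both helper names are hypothetical. Note that SQLite's LIKE is case-insensitive for ASCII while Redshift's is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (field TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?)",
    [("#a#b",), ("#b#a",), ("#ac#b",), ("#a",), ("#c#a#b",)],
)

def rows_with_substrings(tags):
    """Plain '%tag%' matching: also picks up longer tags that start with `tag`."""
    if not tags:
        raise ValueError("at least one tag is required")
    where = " AND ".join("field LIKE ?" for _ in tags)
    params = ["%" + tag + "%" for tag in tags]
    sql = "SELECT field FROM posts WHERE " + where + " ORDER BY rowid"
    return [f for (f,) in conn.execute(sql, params)]

def rows_with_whole_tags(tags):
    """Each tag must be followed by another '#' or end the string."""
    if not tags:
        raise ValueError("at least one tag is required")
    where = " AND ".join("(field LIKE ? OR field LIKE ?)" for _ in tags)
    params = []
    for tag in tags:
        params += ["%" + tag + "#%", "%" + tag]
    sql = "SELECT field FROM posts WHERE " + where + " ORDER BY rowid"
    return [f for (f,) in conn.execute(sql, params)]

print(rows_with_substrings(["#a", "#b"]))
print(rows_with_whole_tags(["#a", "#b"]))
```

The substring version returns `#ac#b` as well, while the whole-tag version leaves it out; both ignore the order of the tags.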

MariaDB MATCH AGAINST with single quote search term?

I'm trying to find a working query for using MATCH AGAINST while having a search term containing a single quote.
Example data in the database table:
I'm a freak
Example search term:
I'm
Search queries I tried:
SELECT * FROM table WHERE MATCH (name) AGAINST ('"I\'m"' IN NATURAL LANGUAGE MODE);
SELECT * FROM table WHERE MATCH (name) AGAINST ('"I\'m"' IN BOOLEAN MODE);
SELECT * FROM table WHERE MATCH (name) AGAINST ('I\'m*' IN BOOLEAN MODE);
SELECT * FROM table WHERE MATCH (name) AGAINST ('(I\'m)*' IN BOOLEAN MODE);
...and many more. Nothing is working.
I'm using MariaDB 10.1.33.
Any ideas?
I'm pretty sure contractions are not treated as words.
Instead the apostrophe is treated as a word separator giving you "I" and "m".
But you probably don't have innodb_ft_min_token_size=1, so those two "words" are ignored.
There are limitations of FT; you have encountered one of them.
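
To make that explanation concrete, here is a toy model in Python. It is not MariaDB's actual parser. It assumes, as described above, that the apostrophe acts as a word separator, and it ignores stopword lists. The function name `simulate_tokens` and its regular expression are illustrative only; the default of 3 mirrors the documented default of innodb_ft_min_token_size.

```python
import re

def simulate_tokens(text, min_token_size=3):
    """Toy model: split on anything that is not a letter, digit or underscore,
    then drop tokens shorter than min_token_size."""
    tokens = re.split(r"[^0-9A-Za-z_]+", text)
    return [t for t in tokens if len(t) >= min_token_size]

print(simulate_tokens("I'm a freak"))      # ['freak']
print(simulate_tokens("I'm a freak", 1))   # ['I', 'm', 'a', 'freak']
print(simulate_tokens("I'm"))              # []
```

Under these assumptions the search term "I'm" yields no indexable tokens at all, which would explain why none of the queries return a row.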

Oracle Text Context-Index returns all rows for contains() query with wildcard

We use Oracle Text in an Oracle Database (11.2.0.4.0) to perform full-text search over stored documents, as well as over multiple columns in our database.
For these multi-column indexes we noticed that some double-sided wildcard queries return the wrong number of results: The whole table!
Our application translates the query of a user into a double-sided wildcard query (e.g. "york" -> "%york%") and passes them to the contains operator.
We re-ran this on the database and could reproduce it.
Consider, for example, a table containing cities where the full-text index spans all columns: Zip-Code, Cityname, State and Country:
select * from city where contains(cityname, '%york%')>0
The following query arguments seem to return a wrong number of results (all rows):
%s%
%i%
%d%
%c%
What I checked already:
Interestingly, the non-working queries are all format-arguments in C. But I have not been able to find these as keywords or special operators in the Oracle Text documentation.
I checked that the stop word list does not contain these queries.
I set a custom lexer and turned on the "mixed case" option for it, which seems to fix the issue for lowercase queries, but the issue persists for upper case queries (%S%).
The score operator returns a value of 6 for the rows that should not match:
select cityname, state, zip, score(1) from city where contains(cityname, '%s%', 1)>0
---------------------------------
|Cityname |State|Zip | Score(1)|
|-------------------------------|
|La Cibourg|NE |2332| 6 | - WRONG
|Morlon |FR |1638| 6 | - WRONG
|Leuk Stadt|VS |3953| 12 | - Correct row
---------------------------------
Do you know any (mis-)configuration that can cause this?
Update
The exact version is 11.2.0.4.0, with Patch 18842982 applied.
The script to create the table and index is below:
drop table city_copy;
create table city_copy (
city_nr number not null,
zip_code varchar2(60),
city_name varchar2(60),
state varchar2(60)
);
insert into city_copy
select 1, 2332, 'La Ciboug', 'NE' from dual
union all
select 2, 1638, 'Morlon', 'FR' from dual
union all
select 3, 3953, 'Leuk Stadt', 'VS' from dual;
commit;
exec ctxsys.ctx_ddl.drop_preference('CITY_MULTI');
exec ctxsys.ctx_ddl.create_preference('CITY_MULTI', 'MULTI_COLUMN_DATASTORE');
exec ctxsys.ctx_ddl.set_attribute('CITY_MULTI', 'COLUMNS', 'ZIP_CODE, CITY_NAME, STATE');
create index city_idx_ft on city_copy(zip_code)
indextype is ctxsys.context parameters ('datastore CITY_MULTI sync (on commit)');
The current settings for the default lexer are:
DEFAULT_LEXER COMPOSITE GERMAN
DEFAULT_LEXER MIXED_CASE YES
DEFAULT_LEXER ALTERNATE_SPELLING GERMAN
Our stoplist is unchanged from the default stoplist for German
So, after quite a bit of research...
I am still not sure whether it is a bug, but although my intuition said the lexer causes this behavior, it does not.
Please add an attribute named DELIMITER with a value of NEWLINE to the preference:
exec ctx_ddl.set_attribute('CITY_MULTI', 'DELIMITER', 'NEWLINE');
That would solve your issue.
The default delimiter is COLUMN_NAME_TAG, which probably conflicts with very short search terms. It is supposed to treat your data as if it were XML, and somewhere in the text that Oracle concatenates there are probably the single characters you were looking for.
It looks to me as if, for the multi-column datastore, Oracle Text builds an XML document for each row that contains the column names, something like:
<XML>
<zip_code>2332</zip_code>
<city_name>La Ciboug</city_name>
<state>NE</state>
</XML>
and that XML (or a structure similar to it) is what gets indexed.
So when you search for just S, the s in the tag name "state" matches in every row.
The NEWLINE delimiter changes the way the text is built to
2332
La Ciboug
NE
which is better in your case and the way you search.
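
The hypothesis above can be played through with a toy model in Python. This is not how Oracle Text is implemented. It only assumes that the column-name tags end up in the searchable text under COLUMN_NAME_TAG and do not under NEWLINE, and it models a `%x%` query as a case-insensitive substring test. The function names are made up for the illustration.

```python
rows = [
    {"zip_code": "2332", "city_name": "La Ciboug", "state": "NE"},
    {"zip_code": "1638", "city_name": "Morlon", "state": "FR"},
    {"zip_code": "3953", "city_name": "Leuk Stadt", "state": "VS"},
]

def build_document(row, delimiter):
    """Concatenate the column values the way each delimiter is assumed to."""
    if delimiter == "COLUMN_NAME_TAG":
        return "\n".join(f"<{col}>{val}</{col}>" for col, val in row.items())
    return "\n".join(row.values())  # NEWLINE: values only

def matching_cities(fragment, delimiter):
    """Model '%fragment%' as a case-insensitive substring search."""
    return [
        row["city_name"]
        for row in rows
        if fragment.lower() in build_document(row, delimiter).lower()
    ]

for fragment in ["s", "i", "d", "c"]:
    print(fragment,
          matching_cities(fragment, "COLUMN_NAME_TAG"),
          matching_cities(fragment, "NEWLINE"))
```

In this model each of the four problem letters occurs in at least one column name (s in state; i, d and c in zip_code), so every row matches with the tags and only the rows whose data contains the letter match without them.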
More information about the DELIMITER attribute can be found here:
http://docs.oracle.com/cd/B19306_01/text.102/b14218/cdatadic.htm#i1006391
Good luck!