SQL FTS and comparison statements - sql

Short story: I am working on a project where I need to communicate with an SQLite database, and I have a couple of problems:
There is one FTS table with nodeId and nodeName columns. I need to select all nodeIds for which nodeName contains some text pattern, for instance all node names with "Donald" inside. Something similar was discussed in this thread. The point is that I can't use the CONTAINS keyword, so I use MATCH instead. And here is the question itself: how should this "Donald" string be "framed"? With the '*' or with the '%' character? Here is my query:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH "Donald"
Is it OK to write multiple comparisons in the WHERE clause of a SELECT statement? I mean something like this:
SELECT * FROM distanceTable WHERE pointId = 1 OR pointId = 4 OR pointId = 203 AND distance<200
I hope that it does not sound very confusing. Thank you in advance!

Edit: Sorry, I missed the fact that you are using FTS4. It looks like you can just do this:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH 'Donald'
Here is relevant documentation.
No wildcard characters are needed in order to match all entries in which Donald is a discrete word (e.g. the above will match Donald Duck). If you want to match words that merely start with Donald (e.g. Donalds), then you need to append * to turn the term into a prefix query:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH 'Donald*'
If your query wasn't working, it was probably because you used double quotes.
From the SQLite documentation:
The MATCH operator is a special syntax for the match()
application-defined function. The default match() function
implementation raises an exception and is not really useful for
anything. But extensions can override the match() function with more
helpful logic.
FTS4 is an extension that provides a match() function.
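To see the difference between the plain token query and the prefix query, here is a minimal sketch using Python's sqlite3 module. It assumes the SQLite library bundled with your Python was built with FTS3/FTS4 support (most are), and the sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: this SQLite build includes the FTS3/FTS4 extension.
conn.execute("CREATE VIRTUAL TABLE nodeFtsTable USING fts4(nodeId, nodeName)")
conn.executemany(
    "INSERT INTO nodeFtsTable (nodeId, nodeName) VALUES (?, ?)",
    [
        (1, "Donald Duck"),
        (2, "McDonalds"),
        (3, "Donaldson Street"),
        (4, "Daisy Duck"),
    ],
)


def matching_ids(fts_query):
    """Return the nodeIds whose nodeName matches the given FTS query."""
    rows = conn.execute(
        "SELECT nodeId FROM nodeFtsTable WHERE nodeName MATCH ? ORDER BY rowid",
        (fts_query,),
    )
    # FTS columns are untyped, so normalise the ids to int.
    return [int(row[0]) for row in rows]


whole_word = matching_ids("Donald")   # token query: the word "Donald" only
prefix = matching_ids("Donald*")      # prefix query: words starting with "Donald"
print(whole_word, prefix)
```

With the default tokenizer, the token query finds only row 1, while the prefix query also finds row 3 ("Donaldson"). Neither finds "McDonalds", because FTS4 matches whole tokens or token prefixes, not arbitrary substrings.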
Yes, it is OK to use multiple conditions as in your second query. When you have a complex set of conditions, it is important to understand the order in which the conditions will be evaluated. AND always binds more tightly than OR (they are analogous to mathematical multiplication and addition, respectively). In practice, I think it is always best to use parentheses for clarity when using a combination of AND and OR:
--This is the same as with no parentheses, but is clearer:
SELECT * FROM distanceTable WHERE
pointId = 1 OR
pointId = 4 OR
(pointId = 203 AND distance<200)
--This is something completely different:
SELECT * FROM distanceTable WHERE
(pointId = 1 OR pointId = 4 OR pointId = 203) AND
distance<200
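If you want to convince yourself, here is a small sketch that runs the three variants against an in-memory SQLite table from Python. The sample rows are invented for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distanceTable (pointId INTEGER, distance INTEGER)")
conn.executemany(
    "INSERT INTO distanceTable VALUES (?, ?)",
    [(1, 50), (1, 500), (4, 300), (203, 100), (203, 900)],
)


def rows(where):
    """Run the query with the given WHERE clause and return sorted rows."""
    sql = (
        "SELECT pointId, distance FROM distanceTable "
        f"WHERE {where} ORDER BY pointId, distance"
    )
    return conn.execute(sql).fetchall()


unparenthesized = rows(
    "pointId = 1 OR pointId = 4 OR pointId = 203 AND distance < 200"
)
and_grouped = rows(
    "pointId = 1 OR pointId = 4 OR (pointId = 203 AND distance < 200)"
)
or_grouped = rows(
    "(pointId = 1 OR pointId = 4 OR pointId = 203) AND distance < 200"
)
print(unparenthesized == and_grouped, or_grouped)
```

The first two result sets are identical; the third drops every row with a distance of 200 or more, including those for points 1 and 4.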

Related

WHERE clause returning no results

I have the following query:
SELECT *
FROM public."Matches"
WHERE 'Matches.Id' = '24e81894-2f1e-4654-bf50-b75e584ed3eb'
I'm certain there is an existing match with this Id (tried it on other Ids as well), but it returns 0 rows. I'm new to querying with PgAdmin so it's probably just a simple error, but I've read the docs up and down and can't seem to find why this is returning nothing.
Single quotes are only used for string literals in SQL. So 'Matches.Id' is a string constant, and it is obviously not equal to '24e81894-2f1e-4654-bf50-b75e584ed3eb', so the WHERE condition is always false (it's like writing WHERE 1 = 0).
You need to use double quotes for identifiers, the same way you did in the FROM clause.
WHERE "Matches"."Id" = ...
In general the use of quoted identifiers is strongly discouraged.
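The same effect can be reproduced in a few lines. This sketch uses SQLite through Python as a stand-in for PostgreSQL (an assumption made only so that it runs without a server; the public schema prefix is dropped because SQLite has no such schema), and both engines treat the single-quoted text here as a string literal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Matches" ("Id" TEXT PRIMARY KEY)')
match_id = "24e81894-2f1e-4654-bf50-b75e584ed3eb"
conn.execute('INSERT INTO "Matches" ("Id") VALUES (?)', (match_id,))

# Single quotes: compares the literal text 'Matches.Id' with the id string.
string_vs_string = conn.execute(
    """SELECT * FROM "Matches" WHERE 'Matches.Id' = ?""", (match_id,)
).fetchall()

# Double quotes: compares the Id column with the id string.
column_vs_string = conn.execute(
    'SELECT * FROM "Matches" WHERE "Matches"."Id" = ?', (match_id,)
).fetchall()

print(len(string_vs_string), len(column_vs_string))
```

The first query returns no rows even though the row exists, and the second returns it.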

Find duplicates in case-sensitive query in MS Access

I have a table containing Japanese text, in which I believe that there are some duplicate rows. I want to write a SELECT query that returns all duplicate rows. So I tried running the following query based on an answer from this site (I wasn't able to relocate the source):
SELECT [KeywordID], [Keyword]
FROM Keyword
WHERE [Keyword] IN (SELECT [Keyword]
FROM [Keyword] GROUP BY [Keyword] HAVING COUNT(*) > 1);
The problem is that Access' equality operator treats the two Japanese writing systems - hiragana and katakana - as the same thing, where they should be treated as distinct. Both writing systems have the same phonetic value, although the written characters used to represent the sound are different - e.g. あ (hiragana) and ア (katakana) both represent the sound 'a'.
When I run the above query, however, both of these characters will appear, as according to Access, they're the same character and therefore a duplicate. Essentially it's a case-insensitive search where I need a case-sensitive one.
I got around this issue when doing a simple SELECT to find a Keyword using StrComp to perform a binary comparison, because this method correctly treats hiragana and katakana as distinct. I don't know how I can adapt the query above to use StrComp, though, because it's not directly evaluating one string against another as in the linked question.
Basically what I'm asking is: how can I do a query that will return all duplicates in a table, case-sensitive?
You can use exists instead:
SELECT [KeywordID], [Keyword]
FROM Keyword as k
WHERE EXISTS (SELECT 1
FROM Keyword as k2
WHERE STRCOMP(k2.Keyword, k.KeyWord, 0) = 0 AND
k.KeywordID <> k2.KeywordID
);
Try with a self join:
SELECT k1.[KeywordID], k1.[Keyword], k2.[KeywordID], k2.[Keyword]
FROM Keyword AS k1 INNER JOIN Keyword AS k2
ON k1.[KeywordID] < k2.[KeywordID] AND STRCOMP(k1.[Keyword], k2.[Keyword], 0) = 0
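Access itself is hard to script in a self-contained way, so here is a rough analogue in SQLite via Python. It rests on two assumptions: a NOCASE column stands in for Access's loose equality, and COLLATE BINARY stands in for StrComp(..., 0). ASCII upper/lower case plays the role of hiragana/katakana, and the sample data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE mimics an equality operator that folds variants together.
conn.execute(
    "CREATE TABLE Keyword ("
    "KeywordID INTEGER PRIMARY KEY, Keyword TEXT COLLATE NOCASE)"
)
conn.executemany(
    "INSERT INTO Keyword VALUES (?, ?)",
    [(1, "apple"), (2, "APPLE"), (3, "pear"), (4, "pear"), (5, "plum")],
)

# Original GROUP BY approach: uses the loose comparison.
loose = [row[0] for row in conn.execute(
    """SELECT KeywordID FROM Keyword
       WHERE Keyword IN (SELECT Keyword FROM Keyword
                         GROUP BY Keyword HAVING COUNT(*) > 1)
       ORDER BY KeywordID"""
)]

# EXISTS approach: the comparison is forced to be byte-exact.
strict = [row[0] for row in conn.execute(
    """SELECT k.KeywordID FROM Keyword AS k
       WHERE EXISTS (SELECT 1 FROM Keyword AS k2
                     WHERE k2.Keyword = k.Keyword COLLATE BINARY
                       AND k2.KeywordID <> k.KeywordID)
       ORDER BY k.KeywordID"""
)]
print(loose, strict)
```

The loose query reports "apple" and "APPLE" as duplicates of each other; the strict one reports only the two identical "pear" rows, which is the behaviour the EXISTS and self-join queries above aim for.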

Why am I getting results that don't match the first WHERE clause?

WHERE [SOTR_CUST_CODE] = 'O004'
AND [SOTD_STRC_CODE] LIKE 'PC%'
OR [SOTD_STRC_CODE] LIKE 'PD%'
This returns records of customers that are not 'O004', and I'm not sure why. Is there also a better way to search for a string that could start with 2 different sets of characters without using the LIKE function twice?
Using SQL Server 2012
You need to use parentheses in your clause. Without parentheses, it means:
(A and B) OR C
Therefore you will get all records matching condition C regardless of condition A or B.
I know this already has a very good answer (thanks to Laposhasu Acsa), but I wanted to clarify for future readers:
The code below (without parentheses) is the same as (A AND B) OR C:
WHERE [SOTR_CUST_CODE] = 'O004'
AND [SOTD_STRC_CODE] LIKE 'PC%'
OR [SOTD_STRC_CODE] LIKE 'PD%'
The first solution is to use parentheses:
WHERE [SOTR_CUST_CODE] = 'O004'
AND ([SOTD_STRC_CODE] LIKE 'PC%'
OR [SOTD_STRC_CODE] LIKE 'PD%')
The second solution, which suits this particular case, is:
WHERE [SOTR_CUST_CODE] = 'O004'
AND [SOTD_STRC_CODE] LIKE 'P[CD]%'
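For anyone who wants to try the three variants, here is a sketch against an in-memory SQLite table from Python. The table name orders and the rows are made up. Note one substitution: SQLite's LIKE has no [CD] character class, so the last query uses SQLite's GLOB 'P[CD]*' as a stand-in for SQL Server's LIKE 'P[CD]%':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "orders" is a hypothetical table name; the question does not show one.
conn.execute("CREATE TABLE orders (SOTR_CUST_CODE TEXT, SOTD_STRC_CODE TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("O004", "PC100"),
        ("O004", "PD200"),
        ("O004", "PX300"),
        ("Z999", "PD400"),
    ],
)


def codes(where):
    """Return (customer, code) rows for the given WHERE clause."""
    sql = (
        "SELECT SOTR_CUST_CODE, SOTD_STRC_CODE FROM orders "
        f"WHERE {where} ORDER BY SOTD_STRC_CODE"
    )
    return conn.execute(sql).fetchall()


without_parens = codes(
    "SOTR_CUST_CODE = 'O004' AND SOTD_STRC_CODE LIKE 'PC%' "
    "OR SOTD_STRC_CODE LIKE 'PD%'"
)
with_parens = codes(
    "SOTR_CUST_CODE = 'O004' AND (SOTD_STRC_CODE LIKE 'PC%' "
    "OR SOTD_STRC_CODE LIKE 'PD%')"
)
# GLOB stands in for the T-SQL bracket class, which SQLite's LIKE lacks.
char_class = codes("SOTR_CUST_CODE = 'O004' AND SOTD_STRC_CODE GLOB 'P[CD]*'")
print(without_parens, with_parens, char_class)
```

Only the version without parentheses lets the Z999 row through.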

An esoteric pondering regarding the lack of compatibility between % and = and <>

I am new to the world of programming but please humor me nonetheless.
I know that % works with LIKE and NOT LIKE. For example the following two queries work:
--QUERY 1
SELECT *
FROM TrumpFeccandid_Pacs
WHERE PACID NOT LIKE 'C%'
--QUERY 2
SELECT *
FROM TrumpFeccandid_Pacs
WHERE PACID LIKE 'C%'
However % does not work with = or <>. For example, the following two queries do not work:
--QUERY A
SELECT *
FROM TrumpFeccandid_Pacs
WHERE PACID <> 'C%'
--QUERY B
SELECT *
FROM TrumpFeccandid_Pacs
WHERE PACID = 'C%'
Why is this the case? Intuitively speaking I feel like not only should queries A and B work but Query A should be equivalent to Query 1 and Query B should be equivalent to Query 2.
These examples were using T-SQL from Sql Server 2016.
Imagine a relatively simple query like this one:
SELECT *
FROM A
JOIN B ON A.Name = B.Name
If = worked like LIKE, god help you if Name contains a percent or underscore!
Intuitively speaking I feel like
That is where you go awry!
LIKE is defined a certain way, as are = and <>. The people who designed the language presumably tried to make it accessible, to make it easy to understand and remember and use. What they did not do, because they could not do, is define it such that it meets everyone's expectations and hunches.
Why is LIKE different from =?
a like 'C%' is true if a starts with 'C'
a = 'C%' is true if a is exactly the 2 letter string 'C%'
But the real moral to this story IMO is that if you want to know how the language works, the best advice is RTFM. Especially when it doesn't work as expected.
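The difference between LIKE and = described above is easy to check for yourself. Here is a sketch using SQLite from Python rather than SQL Server (an assumption made so that it runs anywhere; % behaves the same way in both for this case). The table name pacs and its rows are invented, and one row deliberately contains a literal percent sign:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pacs (PACID TEXT)")
conn.executemany(
    "INSERT INTO pacs VALUES (?)",
    [("C001",), ("C%",), ("D002",)],
)


def pacids(where):
    """Return the PACID values selected by the given WHERE clause."""
    sql = f"SELECT PACID FROM pacs WHERE {where} ORDER BY PACID"
    return [row[0] for row in conn.execute(sql)]


like_c = pacids("PACID LIKE 'C%'")          # starts with C
equals_c = pacids("PACID = 'C%'")           # is exactly the two characters C%
not_like_c = pacids("PACID NOT LIKE 'C%'")  # does not start with C
not_equals_c = pacids("PACID <> 'C%'")      # is anything except C%
print(like_c, equals_c, not_like_c, not_equals_c)
```

So queries A and B do work; they just answer a different question, namely whether the value is literally 'C%'.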
SQL provides simple pattern matching, loosely similar to what Unix tools such as grep and sed offer. These patterns can be used only with the LIKE and NOT LIKE operators.
LIKE and NOT LIKE are Boolean expressions, i.e. they return TRUE or FALSE depending on whether the match_expression matches the specified pattern.
The following wildcards can be used in patterns:
% = Any number of characters
_ = Any Single character
[] = Any single character within the specified range
[^] = Any single character not within the specified range
Documentation on patterns and like operators:
SQL server LIKE operator
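To make the four wildcards concrete, here is a rough Python sketch that translates a SQL Server style LIKE pattern into a regular expression. The helper names like_to_regex and like are hypothetical and not part of any library, and the sketch ignores the ESCAPE clause and collation rules (it is always case-sensitive):

```python
import re


def like_to_regex(pattern):
    """Translate a T-SQL style LIKE pattern into a compiled Python regex.

    Sketch only: no ESCAPE clause, and case-sensitive matching, whereas
    SQL Server's behaviour depends on the collation in use.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            parts.append(".*")   # any number of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end] if end != -1 else ""
            negated = body.startswith("^")
            chars = body[1:] if negated else body
            if chars:
                # [abc], [a-z] or [^0-9]: one character in (or not in) the set
                safe = chars.replace("\\", "\\\\").replace("[", "\\[")
                parts.append("[" + ("^" if negated else "") + safe + "]")
                i = end
            else:
                parts.append(re.escape(ch))  # unterminated or empty: literal [
        else:
            parts.append(re.escape(ch))      # ordinary character
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("C00123", "C%"), like("cat", "c_t"), like("534", "5[0-9][0-9]"))
```

Because LIKE always has to match the entire value, the sketch uses fullmatch rather than search.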

Regular expressions inside SQL Server

I have stored values in my database that look like 5XXXXXX, where X can be any digit. In other words, I need to match incoming SQL query strings like 5349878.
Does anyone have an idea how to do it?
I have different cases like XXXX7XX for example, so it has to be generic. I don't care about representing the pattern in a different way inside the SQL Server.
I'm working with c# in .NET.
You can write queries like this in SQL Server:
--each [0-9] matches a single digit, this would match 5xx
SELECT * FROM YourTable WHERE SomeField LIKE '5[0-9][0-9]'
stored value in DB is: 5XXXXXX [where x can be any digit]
You don't mention data types - if numeric, you'll likely have to use CAST/CONVERT to change the data type to [n]varchar.
Use:
WHERE CHARINDEX('5', column) = 1
AND CHARINDEX('.', column) = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
CHARINDEX
ISNUMERIC
i have also different cases like XXXX7XX for example, so it has to be generic.
Use:
WHERE PATINDEX('%7%', column) = 5
AND CHARINDEX('.', column) = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
PATINDEX
Regex Support
SQL Server 2005+ supports regex through CLR integration, but the catch is you have to create the UDF in CLR before you have the ability. There are numerous articles providing example code if you google them. Once you have that in place, you can use:
5\d{6} for your first example
\d{4}7\d{2} for your second example
For more info on regular expressions, I highly recommend this website.
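Before wiring up a CLR function, it can help to sanity-check the two patterns. Here is a quick sketch with Python's re module, used purely as a stand-in for whichever regex engine the CLR function would wrap (the patterns themselves use only syntax common to Python and .NET). The helper name is_match is made up:

```python
import re

# re.ASCII restricts \d to the digits 0-9.
FIRST = re.compile(r"5\d{6}", re.ASCII)        # 5 followed by six digits
SECOND = re.compile(r"\d{4}7\d{2}", re.ASCII)  # four digits, 7, two digits


def is_match(regex, value):
    """Return True only if the entire value matches the pattern."""
    return regex.fullmatch(value) is not None


print(is_match(FIRST, "5349878"), is_match(SECOND, "1234756"))
```

fullmatch anchors the pattern at both ends; with an engine that only searches, you would wrap the patterns in ^ and $ so that longer strings containing a matching run are rejected.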
Try this
select * from mytable
where p1 not like '%[^0-9]%' and substring(p1,1,1)='5'
Of course, you'll need to adjust the substring value, but the rest should work...
In order to match a digit, you can use [0-9].
So you could use 5[0-9][0-9][0-9][0-9][0-9][0-9] for the first case and [0-9][0-9][0-9][0-9]7[0-9][0-9] for the second. I do this a lot for zip codes.
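Since the question asks for something generic, the LIKE pattern can be generated from the template instead of being typed by hand. Here is a small sketch in Python (the asker is on C#, but the logic ports directly). The function name template_to_like is hypothetical, and it assumes X is the only placeholder and everything else is a literal digit:

```python
def template_to_like(template, placeholder="X"):
    """Expand a template such as 5XXXXXX into a SQL Server LIKE pattern.

    Each placeholder becomes [0-9]; literal digits are copied through.
    Anything else is rejected so that stray wildcards cannot sneak in.
    """
    parts = []
    for ch in template:
        if ch == placeholder:
            parts.append("[0-9]")
        elif ch in "0123456789":
            parts.append(ch)
        else:
            raise ValueError(f"unexpected character in template: {ch!r}")
    return "".join(parts)


print(template_to_like("5XXXXXX"))
print(template_to_like("XXXX7XX"))
```

The resulting string is meant to be passed to the query as a parameter, e.g. WHERE SomeField LIKE @pattern, rather than concatenated into the SQL text.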
SQL Wildcards are enough for this purpose. Follow this link: http://www.w3schools.com/SQL/sql_wildcards.asp
you need to use a query like this:
select * from mytable where msisdn like '%7%'
or
select * from mytable where msisdn like '56655%'