T-SQL CONTAINS with numbers and dots (.)

T-SQL CONTAINS with numbers and dots (.) - sql

Let's consider User.Note = 'Version:3.7.21.1'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7.2*"')
=> returns something
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns nothing
If User.Note = 'Version:3.7.21'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns something
If User.Note = 'Version:3.72.21'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns nothing
I can't figure out how it works. It should always returns something when I search for "3.7*".
Do you know what's the logic behind this ?
PS: if I replace the numbers by letters, there's no problem.

I think your problem is being caused by the unpredictability of the word breaker interacting with the punctuation marks within the data. Full text search is based on the concept of strings of characters, not including spaces and punctuation. When the engine is building the index it sees the periods and breaks the word in weird ways.
As an example, I made a small table with the three values you provided...
VALUES (1,'3.7.21.1'),(2,'3.7.21'),(3,'3.72.21')
Now when I do your selects, I get results on all four... not the results I expect, though.
For me, this returns all three values
SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7.2*"')
and this returns only 3.7.21
SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7*"')
So let's run this and take a look at the contents of the full text index
SELECT * FROM sys.dm_fts_index_keywords(db_id('{databasename}'), object_id('{tablename}'))
For my results (yours are quite probably different) I've got the following display_term values
display_term document_count
21 3
3 3
3.7.21 1
7 2
72 1
So let's look at the first search criterion '"3.7.2*"'
If I shove that into sys.dm_fts_parser...
select * from sys.dm_fts_parser('"3.7.2*"', 1033, NULL, 0)
...it's showing me that it's breaking with matches on
3
7
2
But if I do...
select * from sys.dm_fts_parser('"3.7*"', 1033, NULL, 0)
I'm getting a single exact match on the term 3.7 and sys.dm_fts_index_keywords told me earlier that I only have one document/row that contains 3.7
You might also experience additional weirdness because numbers 0-9 are usually in the system stopwords and can be left out of an index because they're considered to be useless. This might be why it works when you change to letters.
Also, I know you've decided to replace LIKE, but Microsoft has suggested that you only use alphanumeric characters in your full text indexes and, if you need to use non-alphanumeric characters in search criteria, you should use LIKE. Perhaps changing the periods to some alphanumeric replacement that won't be used in normal values?

Contains will only work if the column is in a full text index. If it it is not indexed you will need to use like:
SELECT * FROM [USER] WHERE NOTE like '3.7%' --or '%3.7%
Are you wanting to use CONTAINS because you think it will be faster?(It generally is)
The Microsoft document lists all the ways you can format and use CONTAINS(11 examples)
Here is the Microsoft doc on CONTAINS

Related

Remove unnecessary Characters by using SQL query

Do you know how to remove below kind of Characters at once on a query ?
Note : .I'm retrieving this data from the Access app and put only the valid data into the SQL.
select DISTINCT ltrim(rtrim(a.Company)) from [Legacy].[dbo].[Attorney] as a
This column is company name column.I need to keep string characters only.But I need to remove numbers only rows,numbers and characters rows,NULL,Empty and all other +,-.

Based on your extremely vague "rules" I am going to make a guess.
Maybe something like this will be somewhere close.
select DISTINCT ltrim(rtrim(a.Company))
from [Legacy].[dbo].[Attorney] as a
where LEN(ltrim(rtrim(a.Company))) > 1
and IsNumeric(a.Company) = 0
This will exclude entries that are not at least 2 characters and can't be converted to a number.

This should select the rows you want to delete:
where company not like '%[a-zA-Z]%' and -- has at least one vowel
company like '%[^ a-zA-Z0-9.&]%' -- has a not-allowed character
The list of allowed characters in the second expression may not be complete.
If this works, then you can easily adapt it for a delete statement.

Finding first and second word in a string in SQL Developer

How can I find the first word and second word in a string separated by unknown number of spaces in SQL Developer? I need to run a query to get the expected result.
String:
Hello Monkey this is me
Different sentences have different number of spaces between the first and second word and I need a generic query to get the result.
Expected Result:
Hello
Monkey
I have managed to find the first word using substr and instr. However, I do not know how to find the second word due to the unknown number of spaces between the first and second word.
select substr((select ltrim(sentence) from table1),1,
(select (instr((select ltrim(sentence) from table1),' ',1,1)-1)
from table1))
from table1

Since you seem to want them as separate result rows, you could use a simple common table expression to duplicate the rows, once with the full row, then with the first word removed. Then all you have to do is get the first word from each;
WITH cte AS (
SELECT value FROM table1
UNION ALL
SELECT SUBSTR(TRIM(value), INSTR(TRIM(value), ' ')) FROM table1
)
SELECT SUBSTR(TRIM(value), 1, INSTR(TRIM(value), ' ') -1) word
FROM cte
Note that this very simple example assumes that there is a second word, if there isn't, NULL will be returned for both words.
An SQLfiddle to test with.

While Joachim Isaksson's answer is a robust and fast approach, you can also consider splitting the string and selecting from the resulting pieces set. This is just meant as hint for another approach, if your requirements alter (e.g. more than two string pieces).
You could split finally by the regex /[ ]+/, and so getting the words between the blanks.
Find more about splitting here: How do I split a string so I can access item x?
This will strongly depend on the SQL dialect you are using.

Try this with REGEXP_SUBSTR:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),
REGEXP_SUBSTR（REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),'\w+$'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+$')
FROM table1;
result:
1 2 3 4 5
Hello Monkey Monkey this this is_me
Learn more about REGEXP_SUBSTR reference to Using Regular Expressions With Oracle Database
Test use SqlFiddle： http://sqlfiddle.com/#!4/8e9ef/9
If you only want to get the first and the second word, use REGEXP_INSTR to get second word start position :
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+') AS FIRST,
REGEXP_SUBSTR(sentence,'\w+\s',REGEXP_INSTR(sentence,'\w+\s+')+length(REGEXP_SUBSTR(sentence,'\w+\s+'))) AS SECOND
FROM table1;

Matching exactly 2 characters in string - SQL

How can i query a column with Names of people to get only the names those contain exactly 2 “a” ?
I am familiar with % symbol that's used with LIKE but that finds all names even with 1 a , when i write %a , but i need to find only those have exactly 2 characters.
Please explain - Thanks in advance
Table Name: "People"
Column Names: "Names, Age, Gender"

Assuming you're asking for two a characters search for a string with two a's but not with three.
select *
from people
where names like '%a%a%'
and name not like '%a%a%a%'

Use '_a'. '_' is a single character wildcard where '%' matches 0 or more characters.
If you need more advanced matches, use regular expressions, using REGEXP_LIKE. See Using Regular Expressions With Oracle Database.
And of course you can use other tricks as well. For instance, you can compare the length of the string with the length of the same string but with 'a's removed from it. If the difference is 2 then the string contained two 'a's. But as you can see things get ugly real soon, since length returns 'null' when a string is empty, so you have to make an exception for that, if you want to check for names that are exactly 'aa'.
select * from People
where
length(Names) - 2 = nvl(length(replace(Names, 'a', '')), 0)

Another solution is to replace everything that is not an a with nothing and check if the resulting String is exactly two characters long:
select names
from people
where length(regexp_replace(names, '[^a]', '')) = 2;
This can also be extended to deal with uppercase As:
select names
from people
where length(regexp_replace(names, '[^aA]', '')) = 2;
SQLFiddle example: http://sqlfiddle.com/#!4/09bc6

select * from People where names like '__'; also ll work

Count with muliple where conditions in ms access

I have the query below;
Select count(*) as poor
from records where deviceId='00019' and type='Poor' and timestamp between #14-Sep-2012 01:01:01# and #24-Sep-2012 01:01:01#
table is like;
id. deviceId, type, timestamp
data is like;
data is like;
1, '00019', 'Poor', '19-Sep-2012 01:01:01'
2, '00019', 'Poor', '19-Sep-2012 01:01:01'
3, '00019', 'Poor', '19-Sep-2012 01:01:01'
4, '00019', 'Poor', '19-Sep-2012 01:01:01'
i am trying to count the devices with a specific specific type.
Please help.. access always returns wrong data. it is returning 1 while 00019 has 4 entries for poor

Type and timestamp are both reserved words, so enclose them in square brackets in your query like this: [type] and [timestamp]. I doubt those reserved words are the cause of your problem, but it's hard to predict exactly when reserved words will cause query problems, so just rule out this possibility by using the square brackets.
Beyond that, stored text values sometimes contained extra non-visible characters. Check the lengths of the stored text values to see whether any are longer than expected.
SELECT
Len(deviceId) AS LenOfDeviceId,
Len([type]) AS LenOfType,
Len([timestamp]) AS LenOfTimestamp
FROM records;
In comments you mentioned spaces (ASCII value 32) in your stored values. I had been thinking we were dealing with other non-printable/invisible characters. If you have one or more actual space characters at the beginning and/or end of a stored deviceId value, the Trim() function will discard them. So this query will give you different length numbers in the two columns:
SELECT
Len(deviceId) AS LenOfDeviceId,
Len(Trim(deviceId)) AS LenOfDeviceId_NoSpaces
FROM records;
If the stored values can also include spaces within the string (not just at the beginning and/or end), Trim() will not remove those. In that case, you could use the Replace() function to discard all the spaces. Note however a query which uses Replace() must be run from inside an Access application session --- you can't use it from Java code.
SELECT
Len(deviceId) AS LenOfDeviceId,
Len(Replace(deviceId, ' ', '')) AS LenOfDeviceId_NoSpaces
FROM records;
If that query returns the same length numbers in both columns, then we are not dealing with actual space characters (ASCII value 32) ... but some other type of character(s) which look "space-like".

If you want to count devices with specific type irrespective of deviceids then use this:
Select count(*) as excellent
from records where type='Poor'
If you want to count devices with specific deviceid irrespective of types then use this:
Select count(*) as excellent
from records where deviceId='00019'

SQL: How to query items from one table that are not in another when the syntax is not the same?

I have a question because I'm really bad at SQL. I understand basic functions but when
it gets a bit more complex, I'm completly lost.
here is what I have:
tables: tA, tB
columns: tA: refA tB: refB
basically refA and refB represent the same thing (some id of a form like xxx-xxx-xxx), but
refB can have information appended (like xxx-xxx-xxx_Zxxx or xxx-xxx-xxx Zxxx)
here is what I know how to do:
querying items that are in a table but not in another (when they are exactly the same)
select refA
from tA
where not exists (select *
from tB
where tB.refB = tA.refA
)
What i want to do:
I want a query that will list items from refA that are not in refB.
BUT, Problem is if I run a "simple" query with a NOT EXISTS like I just showed, it will return everything,
because of the appends. so I thought about using some syntax like this:
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE tB.refB LIKE CONCAT(tA.refA,'%'))
but... of course, it doesn't work.
Could someone show me how it should be done, and also explain how it works, so I can learn ?
Thanks in advance !
edit: additional info
I can't use a left() or something alike, because the ref format is similar but not always the same (varies in number of characters).
The only way to detect the end of the id before the append, is that there is either a blank space or an underscore.
edit 2: data sample causing problems (MON, Jan. 10th)
here is some actual data from the tables, which makes most answers people have given here
miss some results :/
in tA:
B20-60-04-6A-1
B20-60-04-6A-11
B20-60-04-6A-12
B20-60-04-6A-13
in tB:
B20-60-04-6A-11_XX
B20-60-04-6A-12_XX
B20-60-04-6A-13_XX
problem with mid(), left(), etc. is that if we check "B20-60-04-6A-1" (14 chars)
against the 14 first chars, it will return 3 positives, while in fact it is not in tB...
so, how can we proceed ?
Examples of data patterns in tA are like this:
(X, XYZ: charaters. x: alphanumerical)
Xxx-xx-xx-x
Xxx-xx-xx-xx
Xxx-xx-xx-xx-xx
Xxx-xx-xx-xx-xx-x
etc
examples of data patterns in tB:
Xxx-xx-xx-xx-xx-XYZ-xx Z xxx_XX
Xxx-xx-xx-xx-xx-XYZZxxx_XX
Xxx-xx-xx-xx-xx Z xxx_XX
XYZ are always the same 3 characters. When we do not have XYZ, there is always a blank space or an underscore.
so the string of data we compare should be trimmed according to this:
- from start to -XYZ string
- or, if no -XYZ in the string, from start to the first " " or "_"
I'd write that lightning fast in VBA, but in SQL... well, I'll give it a shot, but I'm really bad at it :D

So, first off, you need a function that will change refB to not have the appended information, so it can be compared properly with refA. There will be several approaches, but something like this should work:
Left(tb.RefB, InStr(Replace(tb.RefB+"_", " ", "_"), "_") -1)
That will convert any refB like "123-456 123 EXTRA STUFF" or "123-456_123_EXTRA_STUFF" into "123-456". That result should then be okay to compare directly with a refA.
EDIT: A short explanation of the expression above. What I'm doing is:
Adding an underscore to the end of refB, so that there's always at least one underscore (this copes for the case where refB is the same as refA, e.g. "123" becomes "123_")
Replacing all spaces in refB with underscores (the Replace function). Now we know that the separator is always an underscore, and we also know from step 1 that there will be at least one underscore.
Finding the location of the first underscore (the InStr function). This is the position where refB is split between refA and the additional stuff.
Grabbing all the characters between the start of the string and this first underscore, i.e. the part before the separator.
So, that gives you something like this:
select refA
from tA
where not exists (select *
from tB
where Left(tb.RefB, InStr(Replace(tb.RefB+"_", " ", "_"), "_") -1) = tA.refA
)
I would use this approach rather than comparing with wildcards, or trimming refB to match the length of refA, because of this scenario:
refA
====
123
123-456
123-456-789
refB
====
123-456-789_This_is_a_test
In this case, trimming or wildcard matching refA with refB will result in success for all refAs, because "123*", "123-456*" and "123-456-789*" all match "123-456-789_This_is_a_test".

So you want everything from A where not in B, but where only the start of B's id matches?
select refA
from tA
left outer join tB
on tA.refA = left( tB.refB, len(tA.refA)) --trim B's id to the length of A's
where tB.refB is null

Maybe use a left() function, if one exists in access? Like this:
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE Left(tB.refB, Len(tA.refA)) = tA.refA)
If, as you said, you have to look for a space or underscore in the refA, you can use this:
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE Left(tB.refB, Max(Instr(tA.refA, ' '), Instr(tA.refA, '_'))) = tA.refA)

I'd change the schema. Your second table should have two columns, one containing the first part of the identifier, the other containing the second; if the column was the primary key first, just create a unique multi-column index and disallow NULL values.
You can also add a foreign key constraint this way, and/or optimize the comparisons by introducing a surrogate key in the first table and referencing that from the second.
If you do not have an index on the substring you are trying to match, you will end up with a full scan for each value you are looking for, this is hideously expensive.

I think your suggestion will work in a slightly different format, generally the wild card in Access is *, unless you have set ANSI 92 mode, however you can use ALIKE with % in 'ordinary' mode.
EDIT : DIFFERENT IDEA
SELECT tA.refA
FROM tA
WHERE (((tA.refA)
Not In (SELECT Mid(tb.RefB,1,Len(ta.RefA)) FROM tb)));

This is valid syntax and close to the syntax you say you want to write:
SELECT refA
FROM tA
WHERE NOT EXISTS (
SELECT *
FROM tB
WHERE tB.refB ALIKE tA.refA & '%'
);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

T-SQL CONTAINS with numbers and dots (.) - sql

Related

Remove unnecessary Characters by using SQL query

Finding first and second word in a string in SQL Developer

Matching exactly 2 characters in string - SQL

Count with muliple where conditions in ms access

SQL: How to query items from one table that are not in another when the syntax is not the same?

Categories

Resources