Find exact text - SQL

The Text column is of type NVARCHAR(MAX).
ID Text
001 have odds and modds
002 odds>=12
003 modds
004 odds < 1
How can I search for rows where the Text column contains "odds" as a word of its own, and not only as part of "modds"?
I tried:
Select * from MyTable
Where text LIKE '%odds%' AND text NOT LIKE '%modds%'
But the result is not correct; it returns everything. I want it to return:
ID Text
001 have odds and modds
002 odds>=12
004 odds < 1
Any ideas? Thanks!

WHERE (text LIKE '%odds%' AND text NOT LIKE '%modds%')
OR (text LIKE '%odds%odds%')
Some notes on how this works. First off, SQL works with "sets" of data, so we need a selector (the WHERE clause) to create our "set" (or the set is the entire table if no WHERE clause is included).
So here we created two portions of the set.
First, we select all the rows that include the value "odds" somewhere but do NOT include "modds". This excludes any row that contains "modds", including rows that ONLY contain "modds".
Second, we include rows that contain "odds" twice. The "%" is a wildcard, so to break the pattern down from the beginning:
'%' anything at the start
'%odds' anything at the start, followed by "odds"
'%odds%' anything at the start, then "odds", with anything following it
'%odds%odds' anything at the start, then "odds", then anything, then a second "odds"
'%odds%odds%' anything at the start, then "odds", then anything in between, then "odds", then anything at the end
This works for THIS SPECIFIC case because "modds" itself contains "odds", so the order of the two words does not matter here. IF we wanted to do the same with different words, for example to match rows with "cats", or with both "cats" and "dogs", but not rows with JUST "dogs", we would have:
WHERE (mycolumn LIKE '%cats%' AND mycolumn NOT LIKE '%dogs%')
OR ((mycolumn LIKE '%cats%dogs%') OR (mycolumn LIKE '%dogs%cats%'))
This could also be written with AND to require BOTH values:
WHERE (mycolumn LIKE '%cats%' AND mycolumn NOT LIKE '%dogs%')
OR (mycolumn LIKE '%cats%' AND mycolumn LIKE '%dogs%')
This would catch the values without regard to the order of "cats" and "dogs" in the column.
A note on the grouping parentheses in these last two examples: AND binds more tightly than OR in SQL, so they are not strictly required, but keep them, because they make the intended grouping explicit.
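To check the reasoning above against the four sample rows from the question, here is a small sketch that runs both WHERE clauses. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained and runnable); the LIKE patterns are the ones from the question and from this answer, and the helper name ids is made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID TEXT, Text TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [
        ("001", "have odds and modds"),
        ("002", "odds>=12"),
        ("003", "modds"),
        ("004", "odds < 1"),
    ],
)

def ids(where_clause):
    """Return the IDs selected by the given WHERE clause (hypothetical helper)."""
    sql = f"SELECT ID FROM MyTable WHERE {where_clause} ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]

# The query from the question.
original = ids("Text LIKE '%odds%' AND Text NOT LIKE '%modds%'")

# The query from this answer: the second branch re-admits rows
# in which "odds" occurs twice (once on its own, once inside "modds").
fixed = ids(
    "(Text LIKE '%odds%' AND Text NOT LIKE '%modds%') "
    "OR (Text LIKE '%odds%odds%')"
)

print(original)
print(fixed)
```

On these rows the first clause selects 002 and 004, and the second branch adds 001 back in, while 003 stays excluded.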

Select * from MyTable
Where text LIKE 'odds%'

Select * from MyTable Where text LIKE '% odds%' or text LIKE 'odds%'

The most flexible and efficient way is to use full-text search. This would create an index for each word in the specified text columns.
This feature is included with (at least some versions of) Microsoft SQL Server.
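To illustrate why a word-based index separates "odds" from "modds", here is a toy inverted index in Python. This is only a sketch of the general idea; it is not how SQL Server's full-text engine is implemented, and the tokenizer (runs of letters and digits) is an assumption chosen for the example.

```python
import re
from collections import defaultdict

# Sample rows from the question.
rows = {
    "001": "have odds and modds",
    "002": "odds>=12",
    "003": "modds",
    "004": "odds < 1",
}

# Map each whole word to the set of row IDs that contain it.
index = defaultdict(set)
for row_id, text in rows.items():
    for word in re.findall(r"[A-Za-z0-9]+", text.lower()):
        index[word].add(row_id)

# Looking up the word "odds" does not hit rows that only contain "modds",
# because "modds" is indexed as a different word.
matches = sorted(index["odds"])
print(matches)
```

A lookup by whole word is also a dictionary access rather than a scan of every row, which is roughly where the efficiency claim comes from.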

Related

T-SQL CONTAINS with numbers and dots (.)

Let's consider User.Note = 'Version:3.7.21.1'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7.2*"')
=> returns something
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns nothing
If User.Note = 'Version:3.7.21'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns something
If User.Note = 'Version:3.72.21'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns nothing
I can't figure out how it works. It should always return something when I search for "3.7*".
Do you know what the logic behind this is?
PS: if I replace the numbers by letters, there's no problem.
I think your problem is being caused by the unpredictability of the word breaker interacting with the punctuation marks within the data. Full text search is based on the concept of strings of characters, not including spaces and punctuation. When the engine is building the index it sees the periods and breaks the word in weird ways.
As an example, I made a small table with the three values you provided...
VALUES (1,'3.7.21.1'),(2,'3.7.21'),(3,'3.72.21')
Now when I do your selects, I get results on all four... not the results I expect, though.
For me, this returns all three values
SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7.2*"')
and this returns only 3.7.21
SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7*"')
So let's run this and take a look at the contents of the full text index
SELECT * FROM sys.dm_fts_index_keywords(db_id('{databasename}'), object_id('{tablename}'))
For my results (yours are quite probably different) I've got the following display_term values
display_term document_count
21 3
3 3
3.7.21 1
7 2
72 1
So let's look at the first search criterion '"3.7.2*"'
If I shove that into sys.dm_fts_parser...
select * from sys.dm_fts_parser('"3.7.2*"', 1033, NULL, 0)
...it's showing me that it's breaking with matches on
3
7
2
But if I do...
select * from sys.dm_fts_parser('"3.7*"', 1033, NULL, 0)
I'm getting a single exact match on the term 3.7 and sys.dm_fts_index_keywords told me earlier that I only have one document/row that contains 3.7
You might also experience additional weirdness because numbers 0-9 are usually in the system stopwords and can be left out of an index because they're considered to be useless. This might be why it works when you change to letters.
Also, I know you've decided to replace LIKE, but Microsoft has suggested that you only use alphanumeric characters in your full text indexes and, if you need to use non-alphanumeric characters in search criteria, you should use LIKE. Perhaps changing the periods to some alphanumeric replacement that won't be used in normal values?
CONTAINS will only work if the column is in a full-text index. If it is not indexed, you will need to use LIKE:
SELECT * FROM [USER] WHERE NOTE LIKE '3.7%' --or '%3.7%'
Are you wanting to use CONTAINS because you think it will be faster? (It generally is.)
The Microsoft documentation lists all the ways you can format and use CONTAINS (11 examples).
Here is the Microsoft doc on CONTAINS
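As a rough illustration of the LIKE fallback, the sketch below runs two patterns against the three Note values from the question. It uses Python's sqlite3 module in place of SQL Server (an assumption, for runnability); the table name Users and the stricter pattern '%:3.7.%' are made up for the example.

```python
import sqlite3

# In-memory SQLite table standing in for [USER] (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Note TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?)",
    [("Version:3.7.21.1",), ("Version:3.7.21",), ("Version:3.72.21",)],
)

def notes(pattern):
    """Return the Note values matching a LIKE pattern (hypothetical helper)."""
    cur = conn.execute(
        "SELECT Note FROM Users WHERE Note LIKE ? ORDER BY Note", (pattern,)
    )
    return [row[0] for row in cur]

# '.' is an ordinary character in LIKE, so this is a plain substring test.
loose = notes("%3.7%")

# Anchoring on the surrounding ':' and '.' keeps 3.72.x out of the result.
strict = notes("%:3.7.%")

print(loose)
print(strict)
```

The loose pattern also matches 'Version:3.72.21', since "3.7" is a substring of "3.72"; adding the delimiters to the pattern is one way to avoid that.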

SELECT middle part of a String if it exists. Postgresql

I've got a problem with transferring "real-world" data into my schema.
It's actually a "project" for my database course, and they gave us a table with dog race results. This table has a column which contains the name of the dog (which itself consists of the actual name and the name of the breeder) and information about the birth country, the current country of residence, and the birth year.
Example fields are "Lillycette [AU 2012]" or "Black Bear Lee [AU/AU 2013]" or "Lemon Ralph [IE/UK 1998]".
I've managed to extract the first word and save it in the right column with split_part like this:
INSERT INTO tblHund (rufname)
SELECT
split_part(name, ' ', 1) AS rufname
FROM tblimport;
tblimport is a table where I dumped the data from the csv file.
That works just as it should.
Accessing the second part of the name this way fails because sometimes there isn't a second part and sometimes the second part consists of two words.
And this is where I am stuck right now.
I tried it with substring and regular expressions:
INSERT INTO tblZwinger (Name)
SELECT
substring(vatertier from E'[^ ]*\\ ( +)$')AS Name
FROM tblimport
WHERE substring(vatertier from E'[^ ]*\\ ( +)$') != '';
The above code executes without errors but actually does nothing, because the SELECT statement just returns empty strings.
It took me more than 3 hours to understand a bit of these regular expressions, but I still feel pretty stupid when I look at them.
Is there any other way of doing this? If so, just give me a hint.
If not, what is wrong with my expression above?
Thanks for your help.
You need to use the atom ., which matches any single character, inside the capturing group:
E'[^ ]*\\ (.+)$'
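To see the difference between the two patterns, here is a minimal sketch using Python's re module on the example values from the question. Python's regex engine is only a stand-in for PostgreSQL's here (an assumption; the two can differ in edge cases), and first_group is a made-up helper that mimics substring(... from ...) returning the first parenthesized group.

```python
import re

names = [
    "Lillycette [AU 2012]",
    "Black Bear Lee [AU/AU 2013]",
    "Lemon Ralph [IE/UK 1998]",
]

# Pattern from the question: the group "( +)" only matches one or more spaces.
broken = re.compile(r"[^ ]* ( +)$")
# Corrected pattern: the group "(.+)" matches one or more of any character.
fixed = re.compile(r"[^ ]* (.+)$")

def first_group(pattern, text):
    """Return the first capture group, or None if there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None

broken_results = [first_group(broken, n) for n in names]
fixed_results = [first_group(fixed, n) for n in names]

print(broken_results)
print(fixed_results)
```

The original pattern finds nothing because none of the values end in a run of spaces. The corrected pattern returns everything after the first word, which still includes the bracketed country and year.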
SELECT
tblimport.*,
ti.parts[1] as f1,
ti.parts[2] as f2, -- It should be the "middle part"
ti.parts[3] as f3
FROM
tblimport,
regexp_matches(tblimport.vatertier, '([^\s]+)\s*(.*)\s+\[(.*)\]') as ti(parts)
WHERE
nullif(ti.parts[2], '') is not null
Something like the above.
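Here is a quick way to see what the three capture groups pick up, sketched with Python's re module instead of PostgreSQL's regexp_matches (an assumption made so the snippet runs on its own; the engines may differ on unusual input). The pattern is the one from the query, and the sample values come from the question.

```python
import re

# Same pattern as in the regexp_matches call above.
pattern = re.compile(r"([^\s]+)\s*(.*)\s+\[(.*)\]")

names = [
    "Lillycette [AU 2012]",
    "Black Bear Lee [AU/AU 2013]",
    "Lemon Ralph [IE/UK 1998]",
]

# groups() gives (first word, middle part, bracket contents) per value.
parts = [pattern.search(n).groups() for n in names]

# Equivalent of the WHERE clause: keep rows whose middle part is not empty.
with_middle = [p for p in parts if p[1] != ""]

for p in parts:
    print(p)
```

For "Lillycette [AU 2012]" the middle group comes back empty, which is the case the nullif(...) is not null filter removes.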

Remove unnecessary Characters by using SQL query

Do you know how to remove the kinds of characters below in a single query?
Note: I'm retrieving this data from the Access app and putting only the valid data into SQL.
select DISTINCT ltrim(rtrim(a.Company)) from [Legacy].[dbo].[Attorney] as a
This column is the company name column. I need to keep text values only: rows that are only numbers, rows that mix numbers and other characters, NULL, empty values, and everything else such as + and - should be removed.
Based on your extremely vague "rules" I am going to make a guess.
Maybe something like this will be somewhere close.
select DISTINCT ltrim(rtrim(a.Company))
from [Legacy].[dbo].[Attorney] as a
where LEN(ltrim(rtrim(a.Company))) > 1
and IsNumeric(a.Company) = 0
This keeps only entries that are at least 2 characters long and cannot be converted to a number.
This should select the rows you want to delete:
where company not like '%[a-zA-Z]%' and -- contains no letter at all
company like '%[^ a-zA-Z0-9.&]%' -- has a not-allowed character
The list of allowed characters in the second expression may not be complete.
If this works, then you can easily adapt it for a delete statement.
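To get a feel for what the two bracket patterns test, here is a small Python sketch that applies the same character classes with the re module. The sample company values are hypothetical, and Python's regex classes are only an approximation of T-SQL's LIKE brackets (collation can change how letter ranges behave in SQL Server).

```python
import re

# Same character classes as in the two LIKE patterns above.
HAS_LETTER = re.compile(r"[a-zA-Z]")
HAS_DISALLOWED = re.compile(r"[^ a-zA-Z0-9.&]")

# Hypothetical sample values, not taken from the real table.
samples = ["Smith & Sons", "12345", "+-", "A.B. Law 2", "---123"]

# For each value: (contains a letter, contains a not-allowed character).
flags = {
    s: (bool(HAS_LETTER.search(s)), bool(HAS_DISALLOWED.search(s)))
    for s in samples
}

# The WHERE clause as written: no letter AND at least one not-allowed character.
to_delete = [s for s, (letter, bad) in flags.items() if not letter and bad]

for value, result in flags.items():
    print(repr(value), result)
print(to_delete)
```

With the conditions combined by AND as written, a digits-only value such as '12345' is not selected, so it would need an extra condition if those rows should go too.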

SQL: How to query items from one table that are not in another when the syntax is not the same?

I have a question because I'm really bad at SQL. I understand basic functions, but when it gets a bit more complex, I'm completely lost.
here is what I have:
tables: tA, tB
columns: tA: refA tB: refB
basically refA and refB represent the same thing (some id of a form like xxx-xxx-xxx), but
refB can have information appended (like xxx-xxx-xxx_Zxxx or xxx-xxx-xxx Zxxx)
here is what I know how to do:
querying items that are in a table but not in another (when they are exactly the same)
select refA
from tA
where not exists (select *
from tB
where tB.refB = tA.refA
)
What i want to do:
I want a query that will list items from refA that are not in refB.
BUT, Problem is if I run a "simple" query with a NOT EXISTS like I just showed, it will return everything,
because of the appends. so I thought about using some syntax like this:
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE tB.refB LIKE CONCAT(tA.refA,'%'))
but... of course, it doesn't work.
Could someone show me how it should be done, and also explain how it works, so I can learn ?
Thanks in advance !
edit: additional info
I can't use a left() or something alike, because the ref format is similar but not always the same (varies in number of characters).
The only way to detect the end of the id before the append, is that there is either a blank space or an underscore.
edit 2: data sample causing problems (MON, Jan. 10th)
here is some actual data from the tables, which makes most answers people have given here
miss some results :/
in tA:
B20-60-04-6A-1
B20-60-04-6A-11
B20-60-04-6A-12
B20-60-04-6A-13
in tB:
B20-60-04-6A-11_XX
B20-60-04-6A-12_XX
B20-60-04-6A-13_XX
problem with mid(), left(), etc. is that if we check "B20-60-04-6A-1" (14 chars)
against the 14 first chars, it will return 3 positives, while in fact it is not in tB...
so, how can we proceed ?
Examples of data patterns in tA are like this:
(X, XYZ: characters. x: alphanumeric)
Xxx-xx-xx-x
Xxx-xx-xx-xx
Xxx-xx-xx-xx-xx
Xxx-xx-xx-xx-xx-x
etc
examples of data patterns in tB:
Xxx-xx-xx-xx-xx-XYZ-xx Z xxx_XX
Xxx-xx-xx-xx-xx-XYZZxxx_XX
Xxx-xx-xx-xx-xx Z xxx_XX
XYZ are always the same 3 characters. When we do not have XYZ, there is always a blank space or an underscore.
so the string of data we compare should be trimmed according to this:
- from start to -XYZ string
- or, if no -XYZ in the string, from start to the first " " or "_"
I'd write that lightning fast in VBA, but in SQL... well, I'll give it a shot, but I'm really bad at it :D
So, first off, you need a function that will change refB to not have the appended information, so it can be compared properly with refA. There will be several approaches, but something like this should work:
Left(tb.RefB, InStr(Replace(tb.RefB+"_", " ", "_"), "_") -1)
That will convert any refB like "123-456 123 EXTRA STUFF" or "123-456_123_EXTRA_STUFF" into "123-456". That result should then be okay to compare directly with a refA.
EDIT: A short explanation of the expression above. What I'm doing is:
1. Adding an underscore to the end of refB, so that there's always at least one underscore (this copes with the case where refB is the same as refA, e.g. "123" becomes "123_").
2. Replacing all spaces in refB with underscores (the Replace function). Now we know that the separator is always an underscore, and we also know from step 1 that there will be at least one underscore.
3. Finding the location of the first underscore (the InStr function). This is the position where refB is split between refA and the additional stuff.
4. Grabbing all the characters between the start of the string and this first underscore, i.e. the part before the separator.
So, that gives you something like this:
select refA
from tA
where not exists (select *
from tB
where Left(tb.RefB, InStr(Replace(tb.RefB+"_", " ", "_"), "_") -1) = tA.refA
)
I would use this approach rather than comparing with wildcards, or trimming refB to match the length of refA, because of this scenario:
refA
====
123
123-456
123-456-789
refB
====
123-456-789_This_is_a_test
In this case, trimming or wildcard matching refA with refB will result in success for all refAs, because "123*", "123-456*" and "123-456-789*" all match "123-456-789_This_is_a_test".
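The same comparison can be sketched in Python on the sample data from the question's second edit. The function base_ref is a made-up name that mirrors the Left/InStr/Replace expression step by step; it only handles the space and underscore separators and does not cover the "-XYZ" rule from the question.

```python
def base_ref(ref_b: str) -> str:
    """Mimic Left(refB, InStr(Replace(refB + "_", " ", "_"), "_") - 1)."""
    # Append "_" so a separator always exists, then unify spaces to "_".
    normalized = (ref_b + "_").replace(" ", "_")
    # Keep everything before the first separator.
    return ref_b[: normalized.index("_")]

# Sample data from the question.
t_a = ["B20-60-04-6A-1", "B20-60-04-6A-11", "B20-60-04-6A-12", "B20-60-04-6A-13"]
t_b = ["B20-60-04-6A-11_XX", "B20-60-04-6A-12_XX", "B20-60-04-6A-13_XX"]

# Separator-based comparison: strip the appended part, then compare exactly.
bases = {base_ref(b) for b in t_b}
missing_by_separator = [a for a in t_a if a not in bases]

# Prefix-based comparison (what wildcard or Left(refB, Len(refA)) matching does).
missing_by_prefix = [a for a in t_a if not any(b.startswith(a) for b in t_b)]

print(missing_by_separator)
print(missing_by_prefix)
```

The prefix comparison reports nothing missing, because "B20-60-04-6A-1" is a prefix of all three refB values, while the separator-based comparison correctly reports it as absent from tB.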
So you want everything from A where not in B, but where only the start of B's id matches?
select refA
from tA
left outer join tB
on tA.refA = left( tB.refB, len(tA.refA)) --trim B's id to the length of A's
where tB.refB is null
Maybe use a left() function, if one exists in access? Like this:
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE Left(tB.refB, Len(tA.refA)) = tA.refA)
If, as you said, you have to look for a space or underscore in the refA, you can use this:
SELECT refA
FROM tA
WHERE NOT EXISTS (SELECT *
FROM tB
WHERE Left(tB.refB, Max(Instr(tA.refA, ' '), Instr(tA.refA, '_'))) = tA.refA)
I'd change the schema. Your second table should have two columns, one containing the first part of the identifier, the other containing the second; if the column was the primary key first, just create a unique multi-column index and disallow NULL values.
You can also add a foreign key constraint this way, and/or optimize the comparisons by introducing a surrogate key in the first table and referencing that from the second.
If you do not have an index on the substring you are trying to match, you will end up with a full scan for each value you are looking for, this is hideously expensive.
I think your suggestion will work in a slightly different format, generally the wild card in Access is *, unless you have set ANSI 92 mode, however you can use ALIKE with % in 'ordinary' mode.
EDIT : DIFFERENT IDEA
SELECT tA.refA
FROM tA
WHERE (((tA.refA)
Not In (SELECT Mid(tb.RefB,1,Len(ta.RefA)) FROM tb)));
This is valid syntax and close to the syntax you say you want to write:
SELECT refA
FROM tA
WHERE NOT EXISTS (
SELECT *
FROM tB
WHERE tB.refB ALIKE tA.refA & '%'
);

Is it possible to get the matching string from an SQL query?

If I have a query to return all matching entries in a DB that have "news" in the searchable column (i.e. SELECT * FROM table WHERE column LIKE '%news%'), and one particular row has an entry starting with "In recent World news, Somalia was invaded by ...", can I return a specific "chunk" of an SQL entry? Kind of like a teaser, if you will.
select substring(column,
CHARINDEX ('news',lower(column))-10,
20)
FROM table
WHERE column LIKE '%news%'
Basically, take a substring of the column starting 10 characters before the word 'news' and continuing for 20 characters.
Edit: You'll need to make sure that 'news' isn't in the first 10 characters and adjust the start position accordingly.
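One way to handle that adjustment is to clamp the start position so it never drops below 1. The sketch below shows the idea with Python's sqlite3 module, where substr, instr and the two-argument max stand in for SUBSTRING, CHARINDEX and a CASE expression in T-SQL (an assumption for the sake of a runnable example). The table name articles, the column name body and the second sample row are made up.

```python
import sqlite3

# In-memory SQLite table standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (body TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?)",
    [
        ("In recent World news, Somalia was invaded by ...",),
        ("News flash: markets rallied today",),  # 'news' within the first 10 chars
    ],
)

# Start 10 characters before 'news', but never before position 1.
sql = """
SELECT substr(body, max(1, instr(lower(body), 'news') - 10), 20)
FROM articles
WHERE body LIKE '%news%'
ORDER BY rowid
"""
teasers = [row[0] for row in conn.execute(sql)]

for teaser in teasers:
    print(repr(teaser))
```

For the first row this gives the 20 characters around "news"; for the second row the clamp makes the teaser start at the beginning of the text.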
You can use substring function in a SELECT part. Something like:
SELECT SUBSTRING(column, 1, 20) FROM table WHERE column LIKE '%news%'
This will return the first 20 characters of the column.
I had the same problem, I ended up loading the whole field into C#, then re-searched the text for the search string, then selected x characters either side.
This will work fine for LIKE, but not for full-text queries that use FORMSOF(INFLECTIONAL, ...), because that may match "women" when you search for "woman".
If you are using MSSQL, you can perform all kinds of VB-like substring functions as part of your query.