How to replace common words in sql column - sql

I have a table of common words that are used in sentences (i.e. A, the, and, where, etc...)
What I want to do is loop through all those words and strip them out of the descriptions that people have entered to attempt to generate common keywords or tags. But I can't use replace because replace will remove any instance of the common word regardless of whether it is only a couple of letters that make up a larger word. For instance:
I want to replace A in the description. Now obviously a lot of words contain the letter a. So all those A's will be stripped from the words. I don't want that. I only want it when A is used a a whole word. I can figure this out using regular expressions but was wondering if there was anyway to do this in SQL without having to resort to CLR proc.
Maybe I am missing something but I couldn't seem to find an easy way to do this without having to write some specific scenarios like: word plus space before, word plus space after, word plus period after, etc... I don't think that is the best way.

For quick and dirty, I used to slosh through the various SQL functions PATINDEX, LEFT, RIGHT and LIKE to do this sort of thing. For one-time data prep, I export to something like Excel and eyeball it.
A good approach also is to create a new StringSubstitutionTable with two columns SOURCESTRING and TARGETSTRING and run a replace function to replace the SOURCESTRING with the TARGETSTRING on the joined table. This is cool because you can just add substitution entries as needed.

You can try nesting the replaces for each word you would like to replace. For example:
UPDATE TableName
SET ColumnName = REPLACE(REPLACE(REPLACE(REPLACE(TableName.ColumnName,' a ',' '),' the ',' '),' and ',' '), ' ', ' ')
Let me know if this is what you were looking for.

Here is they way I did something similar to what you are trying to do.
During your replace action...
Append a space before and after the common word.
Append a space before and after the description.
Let's suppose you want to remove the CommonWord "A" from the Description.
Description: "A good phrase never starts with A or ends with A"
CommonWord: "A"
Update TableName
Set Description =
LTRIM(RTRIM(Replace(' ' + Description + ' ', ' ' + CommonWord + ' ', ' ')))
This lets you delete all whole words that equal 'A'. Because you are replacing ' A ' with a space you need to LTRIM RTRIM to remove any leading or trailing spaces.
You could also do this in two steps:
--
-- Step 1 Loop through all common words removing them
--
Update TableName
Set Description = Replace(' ' + Description + ' ', ' ' + CommonWord + ' ', ' ')
--
-- Step 2 Unconditionally Trim all Descriptions
--
Update TableName
Set Description = LTRIM(RTIM(Description))

Related

SQL Wildcards for space or nothing

I am using wildcards in a join statement to find when a specific phrase is contained in a text field. Originally, I tried something like:
[Text Field] LIKE '%[^0-9A-Z]' + [Phrase] + '[^0-9A-Z]%'
However, this only gives me results that contain the phrase inside of the text field surrounded by spaces or special characters. Instead I need it to be when the phrase is surrounded by a leading or trailing space, or if the phrase is at the beginning or end of text field. Does anyone have any idea on how to accomplish this? I have searched around to no avail.
Presumably, you are using SQL Server. Just add spaces to the beginning and end of the column:
' ' + [Text Field] + ' ' LIKE '%[^0-9A-Z]' + [Phrase] + '[^0-9A-Z]%'

Full Text Search Using Multiple Partial Words

I have a sql server database that has medical descriptions in it. I've created a full text index on it, but I'm still figuring out how this works.
The easiest example to give is if there is a description of Hypertensive heart disease
Now they would like to be able to type hyp hea as a search term and have it return that.
So from what I've read it seems like my query needs to be something like
DECLARE #Term VARCHAR(100)
SET #Term = 'NEAR(''Hyper*'',''hea*'')'
SELECT * FROM Icd10Codes WHERE CONTAINS(Description, #Term)
If I take the wild card out for Hypertensive and heart, and type out the full words it works, but adding the wild card in returns nothing.
If it makes any difference I'm using Sql Server 2017
So it was a weird syntax issue that didn't cause an error, but stopped the search from working.
I changed it to
SELECT * FROM Icd10Codes where CONTAINS(description, '"hyper*" NEAR "hea*"')
The key here being I needed double quotes " and not to single quotes. I assumed it was two single quotes, the first to escape the second, but it was actually double quotes. The above query returns the results exactly as expected.
this will work:
SELECT * FROM Icd10Codes where SOUNDEX(description)=soundex('Hyp');
SELECT * FROM Icd10Codes where DIFFERENCE(description,'hyp hea')>=2;
You could try a like statement. You can find a thorough explanation here.
Like so:
SELECT * FROM Icd10Codes WHERE Icd10Codes LIKE '%hyp hea%';
And then instead of putting the String in there just use a variable.
If you need to search for separated partial words, as in an array of search terms, it gets a bit tricky, since you need to dynamically build the SQL statement.
MSSQL provides a few features for full text search. You can find those here. One of them is the CONTAINS keyword:
SELECT column FROM table WHERE CONTAINS (column , 'string1 string2 string3');
For me - this had more mileage.
create a calculated row with fields as full text search.
fullname / company / lastname all searchable.
ALTER TABLE profiles ADD COLUMN fts tsvector generated always as (to_tsvector('english', coalesce(profiles.company, '') || ' ' || coalesce(profiles.broker, '') || ' ' || coalesce(profiles.firstname, '') || ' ' || coalesce(profiles.lastname, '') || ' ' )) stored;
let { data, error } = await supabase.from('profiles')
.select()
.textSearch('fts',str)

search for the presence of two strings in oracle database in a clob column?

I have a clob column with a long string.
like "great job done today. good "
I want to retrieve all the records which has both the words great and good .
both the position is not fixed.
It's not clear what "which has both the words great and good " actually means.
If you mean you want to check if the string has both patterns of words, then it is simple as
where lower(col) like '%good%' AND lower(col) like '%great%'
If you mean they should contain true words good and great, then for simpler cases something like should do
where ' ' || lower(col) || ' ' LIKE '% good %' AND ' ' || lower(col) || ' ' LIKE '% great %'
However, this will not match a sentence ending with the word and having a full stop. It gets lengthy if you want to simply use LIKE to handle such scenarios.
REGEXP functions give more flexibility, but could be less performant.
WHERE REGEXP_LIKE ( s, '(^|\W)good(\W|$|\.)') AND REGEXP_LIKE ( s, '(^|\W)great(\W|$|\.)')
This searches for word boundaries, i.e. the word surrounded by non-word characters and start and end including an ending dot. If you want to ignore case, add ,'i' as 3rd argument to it.
I think this is the basic logic you want:
where lower(col) like '%good%' and lower(col) like '%great%'
In clobs, you can't quite use like:
where dbms_lob.instr(col, 'good') and dbms_log.instr(col, 'great')
Try this and let me know if it works for you.
SELECT *
FROM clob_table
WHERE DBMS_LOB.INSTR(clob_text,'great')>0
AND DBMS_LOB.INSTR(clob_text,'good')>0;
Remember put the names of your table and columns.
Here my results,

White spaces in sql

that will sounds stupid, but I have a table with names, those names may finish with white space or may not. E.g. I have name ' dummy ', but even if in the query I write only ' dummy' it will find the record ' dummy '. Can I fix it somehow?
SELECT *
FROM MYTABLE where NAME=' dummy'
Thanks
This is how SQL works (except Oracle), when you compare two strings the shorter one will be padded with blanks to the length of th 2nd string.
If you really need to consider trailings blanks you can switch to LIKE which doesn't follow that rule:
SELECT *
FROM MYTABLE where NAME LIKE ' dummy'
Of course, you better clean your data during load.
There's only one thing which is worse than trailing spaces, leading spaces (oh, wait a minute, you got them, too).

SQL (MySQL): Match first letter of any word in a string?

(Note: This is for MySQL's SQL, not SQL Server.)
I have a database column with values like "abc def GHI JKL". I want to write a WHERE clause that includes a case-insensitive test for any word that begins with a specific letter. For example, that example would test true for the letters a,c,g,j because there's a 'word' beginning with each of those letters. The application is for a search that offers to find records that have only words beginning with the specified letter. Also note that there is not a fulltext index for this table.
You can use a LIKE operation. If your words are space-separated, append a space to the start of the string to give the first word a chance to match:
SELECT
StringCol
FROM
MyTable
WHERE
' ' + StringCol LIKE '% ' + MyLetterParam + '%'
Where MyLetterParam could be something like this:
'[acgj]'
To look for more than a space as a word separator, you can expand that technique. The following would treat TAB, CR, LF, space and NBSP as word separators.
WHERE
' ' + StringCol LIKE '%['+' '+CHAR(9)+CHAR(10)+CHAR(13)+CHAR(160)+'][acgj]%'
This approach has the nice touch of being standard SQL. It would work unchanged across the major SQL dialects.
Using REGEXP opearator:
SELECT * FROM `articles` WHERE `body` REGEXP '[[:<:]][acgj]'
It returns records where column body contains words starting with a,c,g or i (case insensitive)
Be aware though: this is not a very good idea if you expect any heavy load (not using index - it scans every row!)
Check the Pattern Matching and Regular Expressions sections of the MySQL Reference Manual.