Mixing LIKE and NOT LIKE in SQL

I am trying to search a free-text column that contains crime reports. I want to identify “shot” as in shot from a gun, but not “blood shot” eyes. I want to exclude the term “shot” when it is part of “blood shot”, but still select the row if “shot” is used elsewhere in the report. I believe the code below will exclude the row if “blood shot” is found, even if “shot” is mentioned multiple times.
(Narrative LIKE '%[^a-z]Shot[^a-z]%' and Narrative Not Like '%[^a-z]Blood?Shot[^a-z]%')
Is there a way to exclude the term “shot” from the search when it is near the term “Blood”, but not exclude the row if “shot” shows up elsewhere in the same report?

This is really not something you should be doing in base SQL; databases are not very good at such string manipulation. You probably want to look into the full-text index capabilities of your database.
But I think the simplest method is:
where replace(lower(narrative), 'blood shot', '') like '%shot%'
That is, remove the "blood shot" from the string and then check.
You may still want to have delimiters around "shot". Perhaps:
where concat(' ', replace(lower(narrative), 'blood shot', ''), ' ') like '%[^a-z]shot[^a-z]%'
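To try the replace-then-match idea on a few sample rows, here is a minimal sketch using Python's built-in sqlite3 module. The table name reports and the sample narratives are assumptions made for illustration. SQLite's LIKE does not support the [^a-z] bracket syntax, so the sketch uses the simpler '%shot%' form of the query.

```python
import sqlite3


def rows_mentioning_shot(narratives):
    """Return narratives that contain 'shot' outside the phrase 'blood shot'."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, for illustration only.
    conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, narrative TEXT)")
    conn.executemany(
        "INSERT INTO reports (narrative) VALUES (?)",
        [(text,) for text in narratives],
    )
    # Remove every 'blood shot' first, then look for any remaining 'shot'.
    rows = conn.execute(
        """
        SELECT narrative
        FROM reports
        WHERE replace(lower(narrative), 'blood shot', '') LIKE '%shot%'
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

A row that mentions both "blood shot" eyes and a separate "shot" is still returned, because only the "blood shot" phrase is removed before matching. Without the delimiters, words such as "gunshot" also match.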

Related

How do I find the number of times a certain word appears in a title in SQL?

You have a database with a lot of movies and their specific titles, the question is as follows.
How many movies are there that have the word ‘love’ anywhere in the title? (Hint: The L in the word love can be upper or lower case and can be included in words such as ‘lovers’.)
This is my code thus far but I am not sure how to include the search for 'L' and 'Lovers'.
SELECT title
FROM Movies
WHERE title LIKE '%love%'
AND title LIKE 'love%'
OR title LIKE '%love'
Can anyone assist?
Many databases support case-insensitive strings by default, so this would find all of them:
WHERE title LIKE '%love%'
Some don't. A convenient approach is to convert the title to lower case:
WHERE LOWER(title) LIKE '%love%'
Note that %love% will also match foxglove, rollover, sloven, pullover, etc. You should also review your AND/OR use in the WHERE clause to get the expected results: AND binds more tightly than OR, so your conditions group as ('%love%' AND 'love%') OR '%love'. Having said that, both 'love%' and '%love' are already covered by '%love%' alone, since % matches nothing as well as anything.
You may get better results matching '% love%' OR 'love%', which gives (titles where love% is not the first word) plus (titles where love% is the first word). Use LOWER or UPPER as suggested by Gordon to make the search case insensitive:
WHERE UPPER(title) LIKE '% LOVE%' OR UPPER(title) LIKE 'LOVE%'
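The difference between the two patterns is easy to see on a handful of titles. The following is a rough sketch using Python's sqlite3 module; the sample titles are made up for illustration and the function name is hypothetical.

```python
import sqlite3


def count_love_titles(titles):
    """Return (substring_matches, word_start_matches) for 'love'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Movies (title TEXT)")
    conn.executemany("INSERT INTO Movies (title) VALUES (?)", [(t,) for t in titles])
    # 'love' anywhere in the title, including inside other words.
    substring = conn.execute(
        "SELECT COUNT(*) FROM Movies WHERE LOWER(title) LIKE '%love%'"
    ).fetchone()[0]
    # 'love' only at the start of a word.
    word_start = conn.execute(
        "SELECT COUNT(*) FROM Movies "
        "WHERE UPPER(title) LIKE '% LOVE%' OR UPPER(title) LIKE 'LOVE%'"
    ).fetchone()[0]
    conn.close()
    return substring, word_start
```

With titles such as "Love Actually", "The Lovers" and "Foxglove Summer", the first count includes all three while the second skips "Foxglove Summer". Which one is wanted depends on how the assignment's hint is read.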

Regex matching everything except specific words

I have looked through the other questions asked on excluding regex, but I was unable to find the answer to my question.
I have the SQL statement
select --(* vendor(microsoft), product(odbc) guid'12345678-1234-1234-1234-123456789012' *)-- from TAB
With regex, I want to find every single character in that string, except
--(* vendor(microsoft), product(odbc)
and
*)--
The vendor and product names (microsoft and odbc) could be anything as well, I still want to exclude it.
I don't care what kind of characters there are, or if the SQL statement is even syntactically correct. The string could look like this, and I still want to find everything, including whitespaces, excluding what I mentioned above:
{Jane Doe?= --(* vendor(micro1macro2?), product(cdb!o) 123$% --(**) *)-- = ?
So far, I have this expression:
(--\(\* vendor\(.*\), product\(.*?\))|(\*\)--)
Which seems to work in finding what I want to exclude https://regex101.com/r/rMbYHz/204. However, I'm unable to negate it.
Does replace() do what you want?
select replace(replace(t.col, '--(* vendor(microsoft), product(odbc)', ''
), '*)--', ''
)
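The nested replace() above hard-codes the vendor and product names. If the cleanup can happen outside the database, one possible alternative is to substitute the two markers with an empty string using the question's own pattern. This is only a sketch in Python, with non-greedy groups so the vendor and product names can be anything; the function name is hypothetical.

```python
import re

# Opening marker with arbitrary vendor/product names, or the closing marker.
MARKERS = re.compile(r"--\(\* vendor\(.*?\), product\(.*?\)|\*\)--")


def strip_markers(statement):
    """Remove the '--(* vendor(...), product(...)' and '*)--' markers."""
    return MARKERS.sub("", statement)
```

Everything else in the string, including whitespace, is left untouched, so removing a marker that sat between two spaces leaves both spaces behind.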

SQL remove spaces between specific character in a string?

I want to update a database table field using another field in the same table. Currently I have this table called sources.
Name Code
In the name column I have values like this example :
' Deals On Wheels '
'Homesru - Abu Dhabi - Madinat Zayed Gold Centre'
And I am having this update statement :
UPDATE Sources
SET Code = REPLACE((LTRIM(RTRIM(Name))),' ','-')
the result is :
Deals-On-Wheels-Al-Aweer
which is fine.
but for second one I have this :
Homesru---Abu-Dhabi---Madinat-Zayed-Gold-Centre
I want it to be like this :
Homesru-Abu-Dhabi-Madinat-Zayed-Gold-Centre
How can I achieve this? Any help is appreciated.
As suggested by @DanielE., my answer points to a more general solution, in case you ever need to replace duplicated/triplicated/quadruplicated/... occurrences of a character in a string.
I won't write a full solution for this issue; it is a recurring question and there are really good solutions around already. Check these links:
SQL Server Central: remove spaces between specific character in a string?. This forum post points to the next link I'm posting here, but it is good to know what they are asking and answering.
Replace multiple spaces with new one; you can slightly modify it to replace any character you want.
You can also rely on this answer, Find and remove repeated strings, from Aaron Bertrand.
Try:
REPLACE((LTRIM(RTRIM(REPLACE((LTRIM(RTRIM(Name))),' - ','-')))),' ','-')
This will first replace ' - ' with just '-', and then replace the remaining spaces with '-'.
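The nested REPLACE can be checked against the two sample names from the question. Below is a minimal sketch that runs it through Python's sqlite3 module (SQLite also provides LTRIM, RTRIM and REPLACE); the helper name make_codes is hypothetical.

```python
import sqlite3


def make_codes(names):
    """Fill Code from Name: ' - ' becomes '-', then remaining spaces become '-'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sources (Name TEXT, Code TEXT)")
    conn.executemany("INSERT INTO Sources (Name) VALUES (?)", [(n,) for n in names])
    conn.execute(
        "UPDATE Sources "
        "SET Code = REPLACE(LTRIM(RTRIM(REPLACE(LTRIM(RTRIM(Name)), ' - ', '-'))), ' ', '-')"
    )
    codes = [row[0] for row in conn.execute("SELECT Code FROM Sources ORDER BY rowid")]
    conn.close()
    return codes
```

This handles the ' - ' separator specifically. A name with two consecutive spaces would still produce '--', which is where the more general approaches come in.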
You might want to look into using a UDF to do a regular expression search and replace. See https://launchpad.net/mysql-udf-regexp

Search column in SQL database ignoring special characters

Does anybody know if it's possible to do a %LIKE% search against a column in a SQL Server database but get it to ignore any special characters in the column?
So, for example if I have a column called "songs" and they contain the following...
Black Or White
No Sleep 'till Brooklyn
The Ship Song
Papa Don't Preach
If the user searches for "no sleep till brooklyn" then I would like it to return a match even though they forgot to include the apostrophe. I would also like it to return the 4th row if they search for "SOUL". I'm sure you get the idea....
Any help would really be appreciated.
I would look into using a Full Text Index and then you can use the power of FREETEXT and CONTAINS to do your search.
EDIT: I would still look into refining the Full Text Index searching, however, to follow on from another answer, this is an option using REPLACE.
SELECT
Artist,
Title
FROM
Songs
WHERE
REPLACE(REPLACE(REPLACE(Artist, '#',''), '*', ''), '"', '') LIKE '%Keywords%'
You will have various characters to remove. Single quotes, double quotes, hyphens, dots, commas, etc.
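As a rough illustration of the REPLACE approach, the sketch below strips a small set of characters from both the column and the search term before comparing them. It uses Python's sqlite3 module; the list of ignored characters and the helper name are assumptions, and a real list would need to be longer.

```python
import sqlite3

# Characters to ignore on both sides of the comparison (illustrative list).
IGNORED = ["'", '"', ",", ".", "-"]


def search_songs(titles, term):
    """Return titles matching term once the ignored characters are removed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Songs (Title TEXT)")
    conn.executemany("INSERT INTO Songs (Title) VALUES (?)", [(t,) for t in titles])
    # Build REPLACE(REPLACE(Title, ?, ''), ?, '') ... with one level per character.
    expr = "Title"
    for _ in IGNORED:
        expr = "REPLACE({}, ?, '')".format(expr)
    # Apply the same cleanup to the user's search term.
    cleaned = term
    for ch in IGNORED:
        cleaned = cleaned.replace(ch, "")
    sql = "SELECT Title FROM Songs WHERE {} LIKE ? ORDER BY rowid".format(expr)
    rows = conn.execute(sql, IGNORED + ["%" + cleaned + "%"]).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

The characters are passed as bound parameters, so quotes do not need escaping. Any % or _ typed by the user still acts as a LIKE wildcard in this sketch.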
You can use Regular expressions in your where clause and do a match on the clean value. Read more about regex within SQL here.
As for the part where you want to return the 4th row for SOUL: you will need a data structure to tag songs, and you will have to search on the tags for the match. I'm afraid we will need more details on your data structure for that.
Use a combination of TRANSLATE, UPPER, and TRIM.
This is an old question but I just stumbled upon it and am also working with song titles and want to expand upon the accepted answer that uses REPLACE. You can create a list of the characters you want to ignore and create a simple function in any language to generate the quick'n'dirty never-ending REPLACE lines. For example, in Python:
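def sanitize(db_field):
    # Characters to strip; the single quote is backslash-escaped for the SQL string.
    special_chars = ['•', '"', "\\'", '*', ',']
    # Start with the innermost REPLACE, then wrap one more per character.
    sanitized = "REPLACE({}, '{}', '')".format(db_field, special_chars.pop(0))
    for s in special_chars:
        sanitized = "REPLACE({}, '{}', '')".format(sanitized, s)
    return sanitized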
A call such as sanitize("name") will return
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(name, '•', ''), '"', ''), '\'', ''), '*', ''), ',', '')
which can be used in your query. Just wrote this so hope it helps someone.

Custom ORDER BY to ignore 'the'

I'm trying to sort a list of titles, but currently there's a giant block of titles which start with 'The '. I'd like the 'The ' to be ignored, and the sort to work off the second word. Is that possible in SQL, or do I have to do custom work on the front end?
For example, current sorting:
Airplane
Children of Men
Full Metal Jacket
Pulp Fiction
The Fountain
The Great Escape
The Queen
Zardoz
Would be better sorted:
Airplane
Children of Men
The Fountain
Full Metal Jacket
The Great Escape
Pulp Fiction
The Queen
Zardoz
Almost as if the records were stored as 'Fountain, The', and the like. But I don't want to store them that way if I can, which is of course the crux of the problem.
Best is to have a computed column to do this, so that you can index the computed column and order by that. Otherwise, the sort will be a lot of work.
So then you can have your computed column as:
CASE WHEN title LIKE 'The %' THEN stuff(title,1,4,'') + ', The' ELSE title END
Edit: If STUFF isn't available in MySQL, then use RIGHT or SUBSTRING to remove the leading 4 characters. But still try to use a computed column if possible, so that indexing can be better. The same logic should be applicable to rip out "A " and "An ".
Rob
Something like:
ORDER BY IF(LEFT(title,2) = "A ",
SUBSTRING(title FROM 3),
IF(LEFT(title,3) = "An ",
SUBSTRING(title FROM 4),
IF(LEFT(title,4) = "The ",
SUBSTRING(title FROM 5),
title)))
But given the overhead of doing this more than a few times, you're really better off storing the title sort value in another column...
I think you could do something like
ORDER BY REPLACE(TITLE, 'The ', '')
Note that this would replace every occurrence of 'The ' with '', not just a leading 'The ', although I don't think this would affect the ordering very much.
The best way to handle this would be to have a column that contains the value you want to use specifically for ordering output. Then you'd just have to use:
SELECT t.title
FROM MOVIES t
ORDER BY t.order_title
There are going to be various rules about what should and should not be used to order titles.
Based on your example, an alternative would be to use something like:
SELECT t.title
FROM MOVIES t
ORDER BY CASE WHEN INSTR(t.title, 'The ') = 1 THEN SUBSTR(t.title, 5) ELSE t.title END
You could use a CASE statement to contain the various rules.
You can certainly arrange to dynamically strip off 'The', though you'll soon find that you have to deal with 'A' and 'An' (except for the special case of titles like "A is for Alibi"). When "foreign" films enter the mix, you'll need to cope with "El" and "La" (except for that pesky edge case, "LA Story"). Then mix in some German films, and you'll need to cope with 'Der' and 'Die' (except for that pesky set of 'Die Hard' edge cases). See the pattern? You're headed down a path that keeps getting longer and more pitted with special cases.
The way forward that avoids an ever-growing set of special cases is to store the title as you want it displayed and also store the title as you want it sorted.
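A small sketch of the two-column idea, using Python's sqlite3 module: each row carries the display title plus a separately stored sort value, so edge cases are settled in the data and not in the query. The column name order_title is borrowed from the earlier answer, and the sample rows are made up for illustration.

```python
import sqlite3


def titles_in_sort_order(movies):
    """movies: list of (title, order_title) pairs; returns titles by order_title."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MOVIES (title TEXT, order_title TEXT)")
    conn.executemany("INSERT INTO MOVIES (title, order_title) VALUES (?, ?)", movies)
    rows = conn.execute(
        "SELECT t.title FROM MOVIES t ORDER BY t.order_title"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

For example, "Die Hard" can keep "Die Hard" as its sort value while a German title such as "Die Welle" is stored with "Welle", and the ORDER BY needs no special cases.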
For SQLite
ORDER BY CASE WHEN LOWER(SUBSTR(title,1,4)) = 'the ' THEN SUBSTR(title,5) ELSE title END ASC
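The SQLite expression can be run directly against the question's sample titles. Here is a minimal sketch using Python's sqlite3 module; the table name Movies and the helper name are assumptions.

```python
import sqlite3


def sort_ignoring_the(titles):
    """Return titles ordered as if a leading 'The ' were not there."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Movies (title TEXT)")
    conn.executemany("INSERT INTO Movies (title) VALUES (?)", [(t,) for t in titles])
    rows = conn.execute(
        """
        SELECT title
        FROM Movies
        ORDER BY CASE WHEN LOWER(SUBSTR(title, 1, 4)) = 'the '
                      THEN SUBSTR(title, 5)
                      ELSE title
                 END ASC
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

The stored titles are unchanged; only the sort key drops the leading article. SQLite's default collation is case-sensitive, so mixed-case titles may need COLLATE NOCASE on the sort expression.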
Ways that will only remove the first The:
=SUBSTITUTE(A1,"The ","",1) OR more reliably:
=IF(IF(LEFT(A1,4)="The ",TRUE)=TRUE,RIGHT(A1,(LEN(A1)-4)),A1)
The second one basically says: if the leftmost four characters equal "The ", then count how many characters are in the cell and show only the right-hand characters, excluding "The ".