How to quickly compare many strings?

In SQL Server, I have a string column that contains numbers. Each entry I need is only one number so no parsing is needed. I need some way to find all rows that contain numbers from 400 to 450. Instead of doing:
...where my stringcolumn like '%400%' or stringcolumn like '%401%' or stringcolumn like '%402%' or ...
is there a better way that can save on some typing?
There are also other values in these rows such as: '5335154', 'test4559#me.com', '555-555-5555'. Filtering those out will need to be taken into account.

...where stringcolumn like '4[0-4][0-9]' OR stringcolumn = '450'
You don't need the wildcard if you want to restrict to 3 digits.
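To see which values a pattern like this accepts, the logic can be mirrored outside the database. The following is a minimal Python sketch, not T-SQL: it assumes the LIKE pattern '4[0-4][0-9]' acts as a whole-string match on three characters, and that wrapping it in % wildcards acts as a match anywhere in the string. The helper names and the sample values beyond those in the question are illustrative only.

```python
import re

# Rough equivalent of: col LIKE '4[0-4][0-9]' OR col = '450'
PATTERN = re.compile(r"4[0-4][0-9]|450")

def is_400_to_450(value: str) -> bool:
    """True only when the entire string is a number from 400 to 450."""
    return PATTERN.fullmatch(value) is not None

def contains_400_to_450(value: str) -> bool:
    """Analog of adding % wildcards: the pattern may match anywhere."""
    return PATTERN.search(value) is not None

samples = ["400", "425", "450", "451", "4500",
           "5335154", "test4559#me.com", "555-555-5555"]

whole_hits = [s for s in samples if is_400_to_450(s)]
anywhere_hits = [s for s in samples if contains_400_to_450(s)]

print(whole_hits)     # values matched without wildcards
print(anywhere_hits)  # values matched with wildcards, including '4500'
```

The whole-string version rejects '4500', while the wildcard version accepts it, which is why dropping the wildcards matters when each entry holds a single number.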

Use a LIKE pattern with character ranges to accomplish this.
...where stringcolumn like '4[0-4][0-9]' OR stringcolumn like '450'

One way:
WHERE Column like '%4[0-4][0-9]%'
OR Column LIKE '%450%'
Keep in mind that this will pick up anything with the number in it, so 4500 will be returned as well.

I would do the following:
select t.*
from (select t.*,
(case when charindex('4', col) > 0
then substring(col, charindex('4', col), 3)
end) as col4xx
from t
) t
where (case when isnumeric(col4xx) = 1
then (case when cast(col4xx as int) between 400 and 450 then 'true'
end)
end) = 'true'
I'm not a fan of having case statements in WHERE clauses. However, to ensure conversion to a number, this is needed (or the conversion could become a column in another subquery). Note that the following is not equivalent:
where col4xx between '400' and '450'
Since the string '44A' would match.
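The difference between comparing as strings and comparing as numbers can be checked with a small Python sketch. It is only an analogy: Python compares strings by code point, which may not match the collation SQL Server uses, and the function names are made up for illustration.

```python
def string_between(value: str) -> bool:
    # Lexicographic comparison, analogous to: col4xx BETWEEN '400' AND '450'
    return "400" <= value <= "450"

def numeric_between(value: str) -> bool:
    # Convert only when every character is an ASCII digit, then compare as int
    if value == "" or any(ch not in "0123456789" for ch in value):
        return False
    return 400 <= int(value) <= 450

for candidate in ["425", "44A"]:
    print(candidate, string_between(candidate), numeric_between(candidate))
```

'44A' sorts between '400' and '450' as text, so the string comparison lets it through, while the guarded numeric comparison rejects it.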

Related

TSQL - Remove Everything after the last period

I have a column string that may be one of the following.
10.0.2531.0
10.50.2500
10.0.2531.60
My requirement is, if there are 3 periods/decimal points, remove the last period/decimal and everything after that.
If I use the following, this will take care of the first row where there is only ".0", however, it does not work for the third row.
select
case
when right(column_1,2) = '.0' then left(column_1,len(column_1)-2)
else column_1 end
FROM
table_1
I also tried the following but that didn't work.
select
case
when right(column_1,2) = '.' then left(column_1,len(column_1)-2)
when right(column_1,3) = '.' then left(column_1,len(column_1)-3)
else column_1 end
FROM
table_1
The number after the third period/decimal may be a 0 or another number.

The following works, under the assumption that there are never more than three periods:
select (case when ip like '%.%.%.%'
then left(ip, len(ip) - charindex('.', reverse(ip)))
else ip
end) as firstThree
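The same rule can be sketched in Python to check it against the three sample values from the question. This is an illustration rather than T-SQL; the function name is hypothetical, and it shortens only strings with exactly three periods.

```python
def first_three(version: str) -> str:
    """Drop the last period and everything after it when there are three periods."""
    if version.count(".") == 3:
        # rsplit with maxsplit=1 cuts at the last period only
        return version.rsplit(".", 1)[0]
    return version

samples = ["10.0.2531.0", "10.50.2500", "10.0.2531.60"]
trimmed = [first_three(s) for s in samples]
print(trimmed)
```

Splitting from the right is the counterpart of reversing the string and searching for the first period.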

Use a combination of charindex and substring.
select reverse(substring(reverse(column_1), charindex('.',reverse(column_1))+1, len(column_1)))
from table_1
where len(column_1) - len(replace(column_1,'.','')) = 3

SQL query: convert

I'm trying to read a column from a database using a SQL query. The column consists of empty string or numbers as strings, such as
"7500" "4460" "" "2900" "2640" "1850" "" "2570" "9050" "8000" "9600"
I'm trying to find the right sql query to extract all the numbers (as integers) and removing the empty ones, but I'm stuck. So far I've got
SELECT *
FROM base
WHERE CONVERT(INT, code) IS NOT NULL
Done in program R (package sqldf)

If all non-empty values are valid integers, you could use:
select * , cast(code as int) IntCode
from base
where code <> ''
To prevent cases when field code is not a valid number, use:
select *, cast(codeN as int) IntCode
from base
cross apply (select case when code <> '' and not code like '%[^0-9]%' then code else NULL end) N(codeN)
where codeN is not null
UPDATE
To find rows where code is not a valid number, use
select * from base where code like '%[^0-9]%'
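As a cross-check of the digits-only rule (non-empty and no character outside 0-9), here is a minimal Python sketch using the values from the question plus '909XY2' as an invalid example. The helper name is hypothetical, and the check is limited to ASCII digits to match the [^0-9] pattern.

```python
codes = ["7500", "4460", "", "2900", "2640", "1850", "",
         "2570", "9050", "8000", "9600", "909XY2"]

def is_valid_code(code: str) -> bool:
    # Mirrors: code <> '' and code not like '%[^0-9]%'
    return code != "" and all(ch in "0123456789" for ch in code)

# Convert only the values that pass the check, so int() never sees bad input
int_codes = [int(c) for c in codes if is_valid_code(c)]
invalid = [c for c in codes if c != "" and not is_valid_code(c)]

print(int_codes)
print(invalid)
```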

select *
from base
where col like '[1-9]%'
Example: http://sqlfiddle.com/#!6/f7626/2/0
If you don't need to check that the whole value is a valid number (i.e. to exclude a string such as '909XY2'), then this may run marginally faster, more or less depending on the size of the table.

Is this what you want?
SELECT (case when code not like '%[^0-9]%' then cast(code as int) end)
FROM base
WHERE code <> '' and code not like '%[^0-9]%';
The conditions are repeated in the where and case on purpose. SQL Server does not guarantee that where filters are applied before logic in the select, so you can get an error with conversions. More recent versions of SQL Server have try_convert() to fix this problem.

Using sqldf with the default sqlite database and this test data:
DF <- data.frame(a = c("7500", "4460", "", "2900", "2640", "1850", "", "2570",
"9050", "8000", "9600"), stringsAsFactors = FALSE)
try this:
library(sqldf)
sqldf("select cast(a as int) as aint from DF where length(a) > 0")
giving:
aint
1 7500
2 4460
3 2900
4 2640
5 1850
6 2570
7 9050
8 8000
9 9600
Note: In plain R one could write:
transform(subset(DF, nchar(a) > 0), a = as.integer(a))

MS-SQL List of email addresses LIKE statement/regex

I have a column in my table called TO which is a comma separated list of email addresses. (1-n)
I am not concerned with a row if it ONLY contains addresses to Whatever#mycompany.com and want to flag that as 0. However, if a row contains a NON mycompany address (even if there are mycompany addresses present) I'd like to flag it as 1. Is this possible using one LIKE statement?
I've tried:
AND
[To] like '%#%[^m][^y][^c][^o][^m][^p][^a][^n][^y]%.%'
The ideal output will be:
alice#mycompany.com, bob#mycompany.com, malory#yourcompany.com 1
alice#mycompany.com, bob#mycompany.com 0
malory#yourcompany.com 1
Would it be better to write some kind of parsing function to split out addresses into a table if this isn't possible? I don't have an exhaustive list of other domains in the data.

It's ugly but it works. The case statement compares the number of occurrences of the # symbol with the number of occurrences of #mycompany.com (XXX.. is just for keeping the length of the string):
select
*
, flag = case when len(field) - len(replace(replace(field,'#mycompany.com','XXXXXXXXXXXXXX'),'#','')) > 0 then 1 else 0 end
from (
select 'alice#mycompany.com, bob#mycompany.com, malory#yourcompany.com' as field union all
select 'alice#mycompany.com, bob#mycompany.com' union all
select 'malory#yourcompany.com'
) x
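The counting idea can be checked in Python against the three rows from the question. This is a sketch of the same arithmetic, not T-SQL: it counts every '#' and subtracts the occurrences of '#mycompany.com'. The function name and its default argument are illustrative.

```python
def has_non_company_address(field: str, domain: str = "#mycompany.com") -> int:
    """Return 1 when at least one '#' is not part of the company domain, else 0."""
    total = field.count("#")
    company = field.count(domain)
    return 1 if total - company > 0 else 0

rows = [
    "alice#mycompany.com, bob#mycompany.com, malory#yourcompany.com",
    "alice#mycompany.com, bob#mycompany.com",
    "malory#yourcompany.com",
]
flags = [has_non_company_address(r) for r in rows]
print(flags)
```

Like the SQL version, this is a plain substring count, so an address such as someone#mycompany.com.example would presumably still be counted as a company address.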

I would suggest a simple counting approach. Count the number of times that "#mycompany.com" appears and count the number of addresses (the number of commas plus one). If these differ, then you have an issue:
select emails,
(case when len(emails) - len(replace(emails, ',', '')) + 1 =
len(emails) - len(replace(emails, '#mycompany.com', 'mycompany.com'))
then 0
else 1
end) as HasNonCompanyEmail
from t
To simplify the arithmetic, I replace "#mycompany.com" with "mycompany.com". This removes exactly one character.

How can I ORDER anything that looks like a number, as a number in T-SQL?

I have a column named Code that is varchar(3).
It contains numbers and strings as well. For example: ' 1', '234', 'Xxx', '9 ','Aa ' etc.
Is there a way, just like in MS Excel, to ORDER anything that looks like a number as a number?
So that output for the given example above will be:
1. 1
2. 234
3. 9
4. Aa
5. Xxx

ORDER BY CASE WHEN ISNUMERIC(YourField) = 1 THEN CONVERT(INT, YourField) - 500 ELSE ASCII(LOWER(YourField)) END
If the field can be converted to a number it is sorted by number otherwise it uses ASCII coding to sort. I have used "- 500" just so there is no cross over in the sort, and to ensure numbers are sorted ahead of text.
ADDENDUM:
Brian Arsuaga has posted a more robust solution to this which I actually prefer, but since this has already been marked as the answer I am adding his solution to this for the benefit of anyone reading this in the future.
ORDER BY
ISNUMERIC(YourField) DESC,
CASE WHEN ISNUMERIC(YourField) = 1 THEN CONVERT(INT, YourField) ELSE 0 END,
YourField

If you don't like using an arbitrary sentinel (500), which might cause sorting issues depending on the range of numbers you expect, you can use multiple expressions for the ordering.
-- put the numbers at the top
ORDER BY ISNUMERIC(YourField) DESC,
-- sort the numbers as numbers, sort the strings as nothing
CONVERT(INT, CASE WHEN ISNUMERIC(YourField) = 1 THEN YourField ELSE '0' END),
-- sort the strings
YourField
The last term is only a tiebreaker when either two terms are both numbers with the same value ('01', '1') or two terms are both non-numbers. For non-numbers, their first and second terms will always be 0.
More complicated, but maybe a little more safe.
Edited to add a nice comparison with the help of the guy below
create table #t
(
YourField varchar(4)
)
insert into #t(YourField) Values('1'), ('3'), ('234'), ('0'), ('00'),
('09'), ('9'), ('1a'), ('aaa'), ('aba'), ('-500')
Select YourField from #t
ORDER BY ISNUMERIC(YourField) DESC,
CONVERT(INT, CASE WHEN ISNUMERIC(YourField) = 1 THEN YourField ELSE '0' END),
YourField
drop table #t
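The three-part ordering can be tried out in Python with the same test values. This is a sketch under assumptions: int() stands in for ISNUMERIC plus CONVERT (ISNUMERIC accepts more inputs than int() does), strings are compared by code point rather than by a SQL Server collation, and the helper names are hypothetical.

```python
def as_int(value: str):
    """Return the integer value, or None when the string is not an integer."""
    try:
        return int(value)
    except ValueError:
        return None

def sort_key(value: str):
    number = as_int(value)
    return (
        0 if number is not None else 1,      # numbers first
        number if number is not None else 0,  # numbers by value, strings as 0
        value,                                # tiebreaker: the raw string
    )

values = ["1", "3", "234", "0", "00", "09", "9", "1a", "aaa", "aba", "-500"]
ordered = sorted(values, key=sort_key)
print(ordered)
```

Because the key is a tuple, no sentinel offset is needed: the first element separates numbers from text, and the later elements only matter within each group.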

SQL multiple words search, ordered by number of matches

I'm trying to compose an SQL SELECT query with multiple search words, but I want the results to be ordered by the number of matching words.
For example, let the search string be "red green blue". I want the results which contain all three words on top, then the results which contain two of them, and at the end those that match only one word.
SELECT
*
FROM
table
WHERE
(col LIKE '%red%') OR
(col LIKE '%green%') OR
(col LIKE '%blue%')
ORDER BY
?????
Thanks in advance!

ORDER BY
(
CASE
WHEN col LIKE '%red%' THEN 1
ELSE 0
END
+
CASE
WHEN col LIKE '%green%' THEN 1
ELSE 0
END
+
CASE
WHEN col LIKE '%blue%' THEN 1
ELSE 0
END
) DESC
If your DB vendor has IF, you can use it instead of CASE (e.g., for MySQL you can write
IF(col LIKE '%red%', 1, 0) + IF(...) and so on).
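The scoring idea (one point per matching word, highest total first) can be sketched in Python. The sample rows are made up for illustration, and the substring test is case-sensitive here, whereas LIKE may be case-insensitive depending on the database collation.

```python
words = ["red", "green", "blue"]
rows = ["blue sky", "red and green", "yellow", "red green blue", "green"]

def match_count(text: str) -> int:
    # One point per search word found anywhere in the text
    return sum(1 for word in words if word in text)

# Keep rows with at least one match (the WHERE ... OR ... part),
# then order by the number of matches, highest first (the ORDER BY part)
matched = [r for r in rows if match_count(r) > 0]
ranked = sorted(matched, key=lambda r: -match_count(r))
print(ranked)
```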

What platform are you using? If SQL Server, then it sounds like a Full Text Search architecture would be your best fit.
http://msdn.microsoft.com/en-us/library/ms142583.aspx
