Search first character in a column PostgreSQL - sql

I want to match on the first character of a column using a character list (bracket expression), but the query returns every row, even though some customers have names that start with non-letters.
I use PostgreSQL.
SELECT name
FROM customs
WHERE name ~* '[a-z]'

https://www.postgresql.org/docs/current/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP:
Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.
Some examples:
'abc' ~ 'abc' true
'abc' ~ '^a' true
'abc' ~ '(b|d)' true
'abc' ~ '^(b|c)' false
So your condition should be
WHERE name ~* '^[a-z]'
if you want to match only at the beginning of name.
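The difference between the unanchored and the anchored pattern can be checked outside the database. This is a minimal sketch in Python, assuming its re module as a stand-in for PostgreSQL's regex engine (the two agree on these simple patterns) and using made-up sample names. re.search matches anywhere in the string, like ~, and re.IGNORECASE plays the role of the * in ~*:

```python
import re

# Hypothetical sample data; not taken from the original table.
names = ["Alice", "42 Corp", "_bob", "carol"]

def matches(pattern, value):
    """Mimic PostgreSQL's `value ~* pattern`: unanchored, case-insensitive."""
    return re.search(pattern, value, re.IGNORECASE) is not None

# '[a-z]' succeeds as soon as a letter appears anywhere in the name.
unanchored = [n for n in names if matches(r"[a-z]", n)]

# '^[a-z]' succeeds only if the first character is a letter.
anchored = [n for n in names if matches(r"^[a-z]", n)]

print(unanchored)
print(anchored)
```

With this sample data the unanchored pattern keeps all four names, while the anchored one keeps only the two that begin with a letter.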

You could also dispense with regular expression matching and simply do:
where name < 'a' or name >= '{'
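{ is the character that follows z in the ASCII chart. As written, this condition selects the names that do not start with a lowercase letter a-z; negate it to select the names that do. Note: For this or any solution, you may need to check whether or not the collation is case-sensitive, because range comparisons follow the column's collation and only follow ASCII order under a byte-order collation such as "C".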
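To see why case sensitivity matters for the range comparison, here is a small sketch in Python. It assumes code-point ordering, which for ASCII text behaves like a byte-order ("C") collation; a database using a different collation may order these strings differently. The function name is hypothetical:

```python
def outside_lowercase_range(name):
    """Mirror `name < 'a' or name >= '{'` under code-point ordering."""
    return name < "a" or name >= "{"

# Hypothetical sample names.
samples = ["alice", "Alice", "42 Corp", "zed", "{x"]
flags = {name: outside_lowercase_range(name) for name in samples}
print(flags)
```

Under this ordering "Alice" falls outside the range because uppercase letters sort before "a", which is exactly the kind of surprise the collation note warns about.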

Related

How ''~'' and ''^'' actually works with practical examples in PostgreSQL?

I'm trying to solve a case for which many users have submitted solutions that use the "~" syntax.
As below:
select
business_postal_code as zip,
count(distinct case when left(business_address,1) ~ '^[0-9]' then lower(split_part(business_address, ' ', 2))
else lower(split_part(business_address, ' ', 1)) end ) as n_street
from sf_restaurant_health_violations
where business_postal_code is not null
group by 1
order by 2 desc, 1 asc;
Link to the case: https://platform.stratascratch.com/coding/10182-number-of-streets-per-zip-code?python=
But I couldn't understand how this part of the code actually works: ... ~ '^ ....
Let's simplify the query in your question to the component parts you're asking about. Once we see how they work individually, perhaps the whole query will make more sense.
To start, the ~ (tilde) is the POSIX, case-sensitive regular expression operator. The linked PostgreSQL documentation provides brief descriptions and usage examples of it and its sibling operators:
Operator | Description | Example
~ | Matches regular expression, case sensitive | 'thomas' ~ '.*thomas.*'
~* | Matches regular expression, case insensitive | 'thomas' ~* '.*Thomas.*'
!~ | Does not match regular expression, case sensitive | 'thomas' !~ '.*Thomas.*'
!~* | Does not match regular expression, case insensitive | 'thomas' !~* '.*vadim.*'
We can see that each operator has two operands: a constant string on the left, and a pattern on the right. If the string on the left is a match for the pattern on the right, the statement is true, otherwise it is false.
In the given example for the operator you're asking about, 'thomas' is a match for the pattern '.*thomas.*' by standard regular expression rules. The '.*' prefix and suffix mean "match any character any number of times (zero or more)"; note that in PostgreSQL, unlike many other regex engines, . also matches a newline by default. The whole pattern then means, "match any character any number of times, then the literal string 'thomas', then any character any number of times". One such match would be 'john thomas jones' where 'john ' matches the first '.*' and ' jones' matches the second '.*'.
I don't think this is a great example because it is functionally equivalent to 'thomas' LIKE '%thomas%' which is likely to run faster, among other benefits like being a SQL-standard operator.
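The equivalence between '.*thomas.*' and a plain substring test can be sketched in Python. This assumes Python's re module as a stand-in for the database's regex engine, with "thomas" in value standing in for LIKE '%thomas%'; the sample strings are made up:

```python
import re

def regex_contains(value):
    # Counterpart of value ~ '.*thomas.*' (unanchored, case-sensitive).
    return re.search(r".*thomas.*", value) is not None

def like_contains(value):
    # Counterpart of value LIKE '%thomas%': a plain substring test.
    return "thomas" in value

# Hypothetical sample strings.
samples = ["thomas", "john thomas jones", "Thomas", "tom"]
agreement = [regex_contains(s) == like_contains(s) for s in samples]
print(agreement)
```

Both checks agree on every sample, which is why the regex buys nothing over LIKE in this particular case.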
A better example is the query in your question where the pattern '^[0-9]' is used. Setting aside the ^ for now, this pattern means, "match any character in 0-9 (0, 1, 2, ..., 8, 9)", which would be much more verbose if you were to use the LIKE operator: field LIKE '%0%' OR field LIKE '%1%' OR field LIKE '%2%' ....
The ^ operator is not PostgreSQL-specific. Rather it is a special character in regular expressions with one of two meanings (aside from its use as a literal character; more about that in this answer):
1. The match should begin at the start of the line/string.
   For example, the string "Hello, World!" would contain a match for the pattern 'World' since the word "World" appears in it, but would not contain a match for the pattern '^World' since the word "World" is not at the start of the string.
   The string "Hello, World!" would contain a match for both of the following patterns: 'Hello' and '^Hello' since the word "Hello" is at the start of the string.
2. The given character set should be negated when making a match.
   For example, the pattern [^0-9] means, "match any character that is not in the range 0-9". So 'a' would match, '&' would match, and 'G' would match, but '7' would not match since it is in the character set that is being excluded.
The query in your question uses the first of the two meanings. The pattern '^[0-9]' means, "match any character in the range 0-9 starting at the beginning of the string". So '0123' would match since the string starts with "0", but 'a5' would not match since the string starts with "a", which is not in the character set that is being matched.
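Both meanings of ^ can be tried out with the examples from this answer. The sketch below uses Python's re module as an assumed stand-in for PostgreSQL's engine (for these patterns the behaviour is the same); found is a hypothetical helper that mirrors the ~ operator:

```python
import re

def found(pattern, value):
    """Counterpart of `value ~ pattern`: True if the pattern matches anywhere."""
    return re.search(pattern, value) is not None

text = "Hello, World!"

# Meaning 1: ^ outside brackets anchors the match to the start of the string.
print(found(r"World", text))    # "World" appears somewhere in the string
print(found(r"^World", text))   # but not at the start
print(found(r"^Hello", text))   # "Hello" is at the start

# Meaning 2: ^ as the first character inside brackets negates the set.
for ch in ["a", "&", "G", "7"]:
    print(ch, found(r"[^0-9]", ch))

# The pattern from the question: a digit at the start of the string.
print(found(r"^[0-9]", "0123"), found(r"^[0-9]", "a5"))
```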
Back to the query in your question, then. The relevant part reads:
1 count(distinct
2 case
3 when left(business_address, 1) ~ '^[0-9]'
4 then lower(split_part(business_address, ' ', 2))
5 else lower(split_part(business_address, ' ', 1))
6 end
7 ) as n_street
Line 3 contains a regular expression match that will determine if we should use this case in the overall CASE statement. If the string matches the pattern, the expression will be true and we will use this case. If the string does not match the pattern, the expression will be false and we will try the next case.
The string we are matching to the pattern is left(business_address, 1). The LEFT function takes the first n characters from the string. Since n is "1" here, this returns the first character of the field business_address.
The pattern we are trying to match this string to is '^[0-9]' which we have already said means, "match any character in the range 0-9 starting at the beginning of the string". Technically we don't need the ^ regex operator here since LEFT(..., 1) will return at most one character (which will always be the first character in the resulting string).
As an example, if business_address is "123 Jones Street, Anytown, USA", then LEFT(business_address, 1) will return "1" which will match the pattern (and therefore the expression will be true and we will use the first case).
If, instead, business_address were "Jones Plaza, Suite 123, Anytown, USA", then LEFT(business_address, 1) would return "J" which would not match the pattern (since the first character is "J" which is not in the range 0-9). Our expression would be false and we would continue to the next case.
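The CASE logic walked through above can be mirrored in a few lines of Python. This is only a sketch of what the SQL expression computes for one row, not the query itself; street_name is a hypothetical helper, and str.split(" ") is assumed to be a close enough stand-in for split_part with a single-space delimiter:

```python
import re

def street_name(business_address):
    """Sketch of the CASE expression: skip a leading house number, if any."""
    words = business_address.split(" ")
    first_char = business_address[:1]        # LEFT(business_address, 1)
    if re.search(r"^[0-9]", first_char):     # ... ~ '^[0-9]'
        # split_part(..., ' ', 2): the second word, or '' if there is none
        word = words[1] if len(words) > 1 else ""
    else:
        word = words[0]                      # split_part(..., ' ', 1)
    return word.lower()

print(street_name("123 Jones Street, Anytown, USA"))
print(street_name("Jones Plaza, Suite 123, Anytown, USA"))
```

Both example addresses from the answer end up as the same street word, which is what lets count(distinct ...) treat them as one street.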

Rails SQL: How to search on a column in a table ignoring all special characters

I am searching for matching values in MyTable.
name to search for -> This-is-a (test/with/time)
The name from DB table -> "this-is-a-test-with/time"
I would like to reduce both the column value and the search value to a form like "thisisatestwithtime", which ignores all special characters and spaces, and match on that.
value = "This-is-a (test/with/time)"
MyTable.where("upper(name) = upper(?)", value.to_s.scan(/[0-9a-z]/i).join("")).first
This removes all special characters from the search value, but how can I apply the same transformation to the value stored in the table?
You can use a regular expression search.
select * from "table" where "name" ~ 'this' and name ~ 'is' and name ~ 'a'
and name ~ 'test' and name ~ 'with' and name ~ 'time';
If you want to match whole words only (for example, match the a in -a- but not the a in cat)
name ~ '\ma\M'
For case insensitive, use
name ~* 'a'
https://www.postgresql.org/docs/9.6/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
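You can also use regexp_replace to strip the special characters from the column and compare whole values (the 'g' flag replaces every occurrence, not just the first; note that \W keeps underscores)
select * from "table" where regexp_replace(name, '\W', '', 'g') = :name
Table.where("regexp_replace(name, '\\W', '', 'g') = :name", name: 'thisisatestwithtime')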
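The idea of normalizing both sides before comparing can be sketched outside the database. The Python below is an illustration under assumptions, not the Rails or SQL code itself: normalize is a hypothetical helper that lowercases the text and keeps only ASCII letters and digits, which is slightly stricter than \W because it also drops underscores:

```python
import re

def normalize(text):
    """Lowercase the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^0-9a-z]", "", text.lower())

search_value = "This-is-a (test/with/time)"
stored_value = "this-is-a-test-with/time"

print(normalize(search_value))
print(normalize(stored_value))
print(normalize(search_value) == normalize(stored_value))
```

Applying the same function to both values is the point: the comparison only works if the column and the search term go through identical normalization.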

SQL : Confused with WildCard operators

what is difference between these two sql statements
1- select * from tblperson where name not like '[^AKG]%';
2- select * from tblperson where name like '[AKG]%';
Both show the same results: names starting with A, K, or G.
like '[^AKG]%' -- This gets you rows where the first character of name is not A, K or G. Inside the brackets, ^ matches any single character not in the specified set or range. The not in front of like adds a second negation, so when you say name not like '[^AKG]%' you get rows where the first character of name is A, K or G.
name like '[AKG]%' -- You get rows where the first character of name is A, K or G.
The wildcard character [] matches any character in a specified range or a set of characters. In your case it is a set of characters.
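So the two conditions are equivalent for any non-empty name. (An empty string matches neither pattern, so only the not like query returns it.)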
You are using a double 'NOT'. The caret '^' in your character match is shorthand for 'not', so you are evaluating "not like '[not AKG]%'", i.e. not like '[^AKG]%'.
1) In the first query you are using both 'Not' and '^'. That is basically 'not' twice, so the two cancel out;
therefore your query 'Not Like [^AKG]%' ==> 'Like [AKG]%'
^ is also known as the caret or up arrow.
The purpose of this symbol is to match any character not listed within the brackets [], meaning that normally the pattern wouldn't return anything that starts with A, K, or G. But since you added the word NOT to the query, you are basically cancelling the operator, just as you would in math:
(- 1) * (- 1)
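The double negation can be checked with anchored regular expressions in Python. This is a sketch under assumptions: re.match (anchored at the start) stands in for the bracket-style LIKE pattern followed by %, the comparison is case-sensitive (which may not match your database's collation), and the sample names are made up:

```python
import re

def like_akg(name):
    """Counterpart of: name LIKE '[AKG]%'."""
    return re.match(r"[AKG]", name) is not None

def not_like_not_akg(name):
    """Counterpart of: name NOT LIKE '[^AKG]%'."""
    return re.match(r"[^AKG]", name) is None

# Hypothetical sample names, including an empty string as an edge case.
names = ["Adam", "Kate", "Gina", "Bob", "zoe", ""]
comparison = {n: (like_akg(n), not_like_not_akg(n)) for n in names}
print(comparison)
```

For every non-empty name the two predicates agree. The empty string is the one place they differ, because '[^AKG]%' needs at least one character to match, so negating it lets the empty string through.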

Postgresql : Pattern matching of values starting with "IR"

If I have table contents that looks like this :
id | value
------------
1 |CT 6510
2 |IR 52
3 |IRAB
4 |IR AB
5 |IR52
I need to get only those rows whose contents start with "IR" followed by a number, ignoring spaces. That means I should get these values:
2 |IR 52
5 |IR52
These rows qualify because the value starts with "IR" and the next non-space character is a digit, unlike IRAB, which also starts with "IR" but has "A" as the next character. So far I've only been able to query everything starting with IR, so the other IR rows appear as well.
select * from public.record where value ilike 'ir%'
How do I do this? Thanks.
You can use the operator ~, which performs regular expression matching.
e.g:
SELECT * from public.record where value ~ '^IR ?\d';
Add an asterisk to perform case-insensitive matching.
SELECT * from public.record where value ~* '^ir ?\d';
The symbols mean:
^: beginning of the string
?: the character before it (here a space) is optional; use ' *' instead to allow any number of spaces
\d: any single digit, equivalent to [0-9]
See for more info: Regular Expression Match Operators
See also this question, very informative: difference-between-like-and-in-postgres
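The pattern can be tried against the sample values from the question. This is a quick sketch in Python, assuming its re module behaves like PostgreSQL's engine for ^, ? and \d, with re.IGNORECASE standing in for ~*:

```python
import re

# The values from the question's table.
values = ["CT 6510", "IR 52", "IRAB", "IR AB", "IR52"]

# Counterpart of: value ~ '^IR ?\d'
case_sensitive = re.compile(r"^IR ?\d")
# Counterpart of: value ~* '^ir ?\d'
case_insensitive = re.compile(r"^ir ?\d", re.IGNORECASE)

matching = [v for v in values if case_sensitive.search(v)]
matching_ci = [v for v in values if case_insensitive.search(v)]

print(matching)
print(matching_ci)
```

Only "IR 52" and "IR52" survive, since in the other IR rows the first non-space character after "IR" is a letter.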

Where filename like '%_123456_%'

I'm trying to query a table with a like statement. Is _ considered a wildcard symbol in postgres? Would
select * from table where field1 like '%_123_%'
return the same thing as
select * from table where field1 like '%123%'
Here is an example from the official documentation regarding wildcards in Postgres:
'abc' LIKE 'abc' true
'abc' LIKE 'a%' true
'abc' LIKE '_b_' true
'abc' LIKE 'c' false
_ is a wildcard for exactly one character, while % is a wildcard for any sequence of zero or more characters.
Yes, _ is a wildcard symbol that matches one character. It can't be an empty match, so no, those statements are not the same. The first requires the string be at least 5 characters long while the second only requires 3 characters.
If you're familiar with regexes, %123% is equivalent to .*123.*, while %_123_% is equivalent to .+123.+.
From the PostgreSQL manual:
To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.
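The wildcard and escape rules above can be made concrete by translating a LIKE pattern into a regular expression. The Python below is a rough sketch, not PostgreSQL's implementation: like_to_regex and like are hypothetical helpers, the backslash is assumed as the escape character, and a trailing lone escape is treated as a literal here even though the database would reject it:

```python
import re

def like_to_regex(pattern, escape="\\"):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character always matches itself literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any sequence of zero or more characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """LIKE must cover the whole string, hence fullmatch."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("123", "%123%"), like("123", "%_123_%"))
print(like("a123b", "%_123_%"))
print(like("x_123_y", r"%\_123\_%"), like("xa123by", r"%\_123\_%"))
```

The last line shows the effect of escaping: with \_ the underscores must appear literally in the value, so "xa123by" no longer matches.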
Yes.
_ matches exactly one character, while % matches any sequence of zero or more characters.