Remove special characters without removing spaces between words - ruby-on-rails-5

With my current code I am trying to strip the leading and trailing spaces from a sentence and remove special characters. It does strip the leading and trailing spaces and remove the special characters, but it also removes the spaces between words. I don't want to remove the spaces between words.
Ex: 1234whitegreen
Expected output: 1234 white green

"String".squeeze(' ').gsub(/, /, ',').gsub(/[^0-9A-Za-z,\r\n ]/i, '').strip
gsub!(/[^0-9A-Za-z ]/, '') removes all special characters but keeps every space, including the leading and trailing ones.
strip then removes the leading and trailing whitespace from the string.
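For comparison, the same two steps can be sketched in Python. This is only an illustration of the technique, not the Ruby answer itself; the function name `clean` and the sample input are made up for the example.

```python
import re

def clean(text):
    # Step 1: drop everything except ASCII letters, digits and spaces.
    # The space inside the character class is what keeps words apart.
    kept = re.sub(r"[^0-9A-Za-z ]", "", text)
    # Step 2: trim leading and trailing whitespace only.
    return kept.strip()

print(clean("  1234 white, green!  "))  # 1234 white green
```

The key point is the literal space inside the negated character class: leave it out and the spaces between words are removed along with the special characters.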

Related

Query to replace special characters in phone number field

Can anyone help with a query to remove special/non-numeric/hidden characters from a phone number column?
I've tried
LTRIM(RTRIM(REGEXP_REPLACE(
PHONE_NBR,
'[^[:digit:]][:cntrl:][:alpha:][:graph:][:blank:][:print:][:punct:][:space:]~',
'')))
but no luck, there are still a few records which contain non-numeric values.
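Your regex is saying to ONLY replace one specific sequence: a non-numeric character followed by one character for each of the bracket groups that come after it, and then a tilde. Note also that a class name such as [:cntrl:] is only recognized inside an outer bracket expression, as in [[:cntrl:]]; written bare, it is an ordinary bracket expression that matches one of the characters :, c, n, t, r or l. Either way, a single stray character never matches the whole pattern, so it is never replaced.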
You should be able to just use '[^[:digit:]]' as your regex, to remove all non-numeric characters.
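The effect of the simplified pattern can be sketched in Python. Python's `re` module has no POSIX class names, so `[^0-9]` stands in for `'[^[:digit:]]'` here; the function name and sample value are assumptions for illustration.

```python
import re

def digits_only(phone):
    # Replace every character that is not an ASCII digit with nothing.
    # Each unwanted character is matched on its own, so no particular
    # sequence of characters is required for the replacement to happen.
    return re.sub(r"[^0-9]", "", phone)

print(digits_only("(555) 123-4567\t"))  # 5551234567
```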

How to prevent two succeeding spaces in an Antlr rule?

As a lexer rule I'd like to match a string according to these rules:
must not contain tabs (\t) or line breaks (\r, \n)
must not contain two succeeding spaces
can contain all other characters, including single spaces
I came up with:
STRING: ~[\t\r\n]*
But I don't know how to prevent succeeding spaces.
This will do it:
STRING:
(
~[\t\r\n ] // non-whitespace
| ' ' ~[\t\r\n ] // or single space followed by non-whitespace
)+
' '? // may optionally end in a space (if desired, remove the line otherwise)
;
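The same idea can be checked outside ANTLR with an equivalent regular expression. The Python sketch below is an assumption-level translation of the rule above (the names `STRING` and `is_string_token` are hypothetical), useful only for trying sample inputs.

```python
import re

# One or more units, where each unit is either a non-whitespace character
# or a single space immediately followed by a non-whitespace character.
# A final optional space mirrors the ' '? at the end of the lexer rule.
STRING = re.compile(r"(?:[^\t\r\n ]|[ ][^\t\r\n ])+[ ]?")

def is_string_token(text):
    # fullmatch requires the whole input to fit the pattern.
    return STRING.fullmatch(text) is not None

print(is_string_token("a b c"))   # True
print(is_string_token("a  b"))    # False: two succeeding spaces
print(is_string_token("a\tb"))    # False: contains a tab
```

Because every space must be followed by a non-whitespace character (except for the optional last one), two spaces in a row can never be consumed.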

Trim trailing spaces with PostgreSQL

I have a column eventDate which contains trailing spaces. I am trying to remove them with the PostgreSQL function TRIM(). More specifically, I am running:
SELECT TRIM(both ' ' from eventDate)
FROM EventDates;
However, the trailing spaces don't go away. Furthermore, when I try and trim another character from the date (such as a number), it doesn't trim either. If I'm reading the manual correctly this should work. Any thoughts?
There are many different invisible characters. Many of them have the property WSpace=Y ("whitespace") in Unicode. But some special characters are not considered "whitespace" and still have no visible representation. The excellent Wikipedia articles about space (punctuation) and whitespace characters should give you an idea.
<rant>Unicode sucks in this regard: introducing lots of exotic characters that mainly serve to confuse people.</rant>
The standard SQL trim() function by default only trims the basic Latin space character (Unicode: U+0020 / ASCII 32). Same with the rtrim() and ltrim() variants. Your call also only targets that particular character.
Use regular expressions with regexp_replace() instead.
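Before choosing a pattern it can help to see which characters are actually at the end of the value. The following Python sketch is a diagnostic aid only (the helper `describe_tail` and the sample value are made up); it shows how trimming just U+0020 leaves other invisible characters in place.

```python
import unicodedata

def describe_tail(text, count=3):
    """Return code point and Unicode name for the last `count` characters."""
    return [
        # Control characters such as TAB have no Unicode name, hence the default.
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text[-count:]
    ]

value = "2012-05-01\u00a0\t"          # ends in NO-BREAK SPACE and TAB
print(value.rstrip(" ") == value)     # True: nothing trimmed, neither is U+0020
print(describe_tail(value, 2))
```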
Trailing
To remove all trailing white space (but not white space inside the string):
SELECT regexp_replace(eventdate, '\s+$', '') FROM eventdates;
The regular expression explained:
\s ... regular expression class shorthand for [[:space:]]
    - which is the set of white-space characters - see limitations below
+ ... 1 or more consecutive matches
$ ... end of string
Demo:
SELECT regexp_replace('inner white ', '\s+$', '') || '|'
Returns:
inner white|
Yes, that's a single backslash (\). Details in this related answer:
SQL select where column begins with \
Leading
To remove all leading white space (but not white space inside the string):
regexp_replace(eventdate, '^\s+', '')
^ ... start of string
Both
To remove both, you can chain the above function calls:
regexp_replace(regexp_replace(eventdate, '^\s+', ''), '\s+$', '')
Or you can combine both in a single call with two branches.
Add 'g' as 4th parameter to replace all matches, not just the first:
regexp_replace(eventdate, '^\s+|\s+$', '', 'g')
But that should typically be faster with substring():
substring(eventdate, '\S(?:.*\S)*')
\S ... everything but white space
(?:re) ... non-capturing set of parentheses
.* ... any string of 0-n characters
Or one of these:
substring(eventdate, '^\s*(.*\S)')
substring(eventdate, '(\S.*\S)') -- only works for 2+ printing characters
(re) ... Capturing set of parentheses
Effectively takes the first non-whitespace character and everything up to the last non-whitespace character if available.
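The patterns above can be tried quickly in Python, with two caveats that are assumptions of this sketch rather than statements about PostgreSQL: Python's `\s` covers a different set of characters than `[[:space:]]` in a given Postgres setup, and `re.sub` replaces all matches by default, so no `'g'` flag is needed. The function names are hypothetical.

```python
import re

def trim_trailing(s):
    return re.sub(r"\s+$", "", s)

def trim_leading(s):
    return re.sub(r"^\s+", "", s)

def trim_both(s):
    # Two branches: leading run or trailing run of white space.
    return re.sub(r"^\s+|\s+$", "", s)

def core(s):
    # Counterpart of substring(eventdate, '\S(?:.*\S)*'): first non-space
    # character through the last one. DOTALL lets '.' cross line breaks.
    m = re.search(r"\S(?:.*\S)*", s, re.DOTALL)
    return m.group(0) if m else None

sample = " \t inner white \n"
print(repr(trim_trailing(sample)))  # ' \t inner white'
print(repr(trim_both(sample)))      # 'inner white'
print(repr(core(sample)))           # 'inner white'
```

Like `substring()` with no match, `core` returns nothing (`None`) for a string made up of white space only, whereas `trim_both` returns an empty string.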
Whitespace?
There are a few more related characters which are not classified as "whitespace" in Unicode - so not contained in the character class [[:space:]].
These print as invisible glyphs in pgAdmin for me: "Mongolian vowel separator", "zero width space", "zero width non-joiner", "zero width joiner":
SELECT E'\u180e', E'\u200B', E'\u200C', E'\u200D';
'᠎' | '​' | '‌' | '‍'
Two more, printing as visible glyphs in pgAdmin, but invisible in my browser: "word joiner", "zero width non-breaking space":
SELECT E'\u2060', E'\uFEFF';
'⁠' | ''
Ultimately, whether characters are rendered invisible or not also depends on the font used for display.
To remove all of these as well, replace '\s' with '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]' or '[\s᠎​‌‍⁠]' (note trailing invisible characters!).
Example, instead of:
regexp_replace(eventdate, '\s+$', '')
use:
regexp_replace(eventdate, '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]+$', '')
or:
regexp_replace(eventdate, '[\s᠎​‌‍⁠]+$', '') -- note invisible characters
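The extended character class can be sketched in Python as well, where `\s` likewise does not match zero-width characters such as U+200B or U+FEFF. The names `INVISIBLE_TAIL` and `trim_trailing_invisible` are made up for this illustration, and the escapes are written out so that no invisible characters hide in the source.

```python
import re

# \s plus the six extra code points listed above, anchored at the end.
INVISIBLE_TAIL = re.compile(r"[\s\u180e\u200b\u200c\u200d\u2060\ufeff]+$")

def trim_trailing_invisible(text):
    return INVISIBLE_TAIL.sub("", text)

value = "2012-05-01 \u200b\ufeff"
print(repr(re.sub(r"\s+$", "", value)))      # unchanged: the tail is not white space
print(repr(trim_trailing_invisible(value)))  # '2012-05-01'
```

Writing the code points as escapes rather than as literal characters makes the pattern easier to review and copy.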
Limitations
There is also the Posix character class [[:graph:]] supposed to represent "visible characters". Example:
substring(eventdate, '([[:graph:]].*[[:graph:]])')
It works reliably for ASCII characters in every setup (where it boils down to [\x21-\x7E]), but beyond that you currently (incl. pg 10) depend on information provided by the underlying OS (to define ctype) and possibly locale settings.
Strictly speaking, that's the case for every reference to a character class, but there seems to be more disagreement with the less commonly used ones like graph. You may also have to add more characters to the character class [[:space:]] (shorthand \s) to catch all whitespace characters. For example, \u2007, \u202f and \u00a0 also seem to be missing for @XiCoN JFS.
The manual:
Within a bracket expression, the name of a character class enclosed in
[: and :] stands for the list of all characters belonging to that
class. Standard character class names are: alnum, alpha, blank, cntrl,
digit, graph, lower, print, punct, space, upper, xdigit.
These stand for the character classes defined in ctype.
A locale can provide others.
Bold emphasis mine.
Also note this limitation that was fixed with Postgres 10:
Fix regular expressions' character class handling for large character
codes, particularly Unicode characters above U+7FF (Tom Lane)
Previously, such characters were never recognized as belonging to
locale-dependent character classes such as [[:alpha:]].
It should work the way you're handling it, but it's hard to say without knowing the specific string.
If you're only trimming trailing spaces, you might want to use the more concise form:
SELECT RTRIM(eventDate)
FROM EventDates;
This is a little test to show you that it works.
Tell us if it works out!
If your whitespace is more than just the plain space character, then you will need to use regexp_replace:
SELECT '(' || REGEXP_REPLACE(eventDate, E'[[:space:]]', '', 'g') || ')'
FROM EventDates;
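In the above example I am wrapping the return value in ( and ) just so you can easily see in a psql prompt that the regex replace is working, so you'll want to remove those in your code. Note that with the 'g' flag this pattern removes every whitespace character, including the ones inside the string; anchor it as '[[:space:]]+$' to remove only trailing whitespace.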
SELECT replace((' devo system ') ,' ','');
It gives: devosystem
A tested one that works like a charm:
UPDATE company SET name = TRIM (BOTH FROM name) where id > 0

Why does this space matter in this regular expression?

Why does the space make all the difference?
select * from beds where id~'.*Extra large.* (Red).*';
and
select * from beds where id~'.*Extra large.*(Red).*';
The first one returned nothing and the second acted as I wanted. An example of what I want matched is:
"Extra large" (Red) {2012 model}
I thought the first would work since there is a space after (Red)?
EDIT: Even if I escape the brackets with '\' I still can't have a space there.
The problem is that you have not escaped your brackets around "Red". Your regex should be:
'.*Extra large.* \(Red\).*'
This makes the brackets literal brackets, but without escaping them they create a regex group (and not characters to be matched).
Your first regex grouped the characters Red and required a space to precede that group Red, so it would match "... Red...", but there is a bracket in your input before Red, so it doesn't match.
Your second regex accepts any character(s) (via .*) before Red, so it matches.
This is because you're not escaping the ().
The brackets around "Red" create a group and are not included in the match. This
is the reason why the regexp without the whitespace works.
The .* in the regexp without the whitespace matches " (, then comes Red and after that ) {2012 model}. The brackets are matched by the .* operators.
The .* in the regexp with the whitespace matches " and the ( is not included in the pattern.
So the right pattern would be this:
.*Extra large.*\(Red\).*
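The difference between the patterns is easy to reproduce. The Python sketch below uses `re.search`, which, like the `~` operator, looks for a match anywhere in the string; the helper name `matches` is an assumption for the example.

```python
import re

sample = '"Extra large" (Red) {2012 model}'

def matches(pattern):
    return re.search(pattern, sample) is not None

# Unescaped parentheses form a group, so this pattern needs the text " Red",
# but in the sample the space is followed by "(" and not by "R".
print(matches(r".*Extra large.* (Red).*"))    # False
# Without the space, the preceding .* absorbs '" (' and the group matches Red.
print(matches(r".*Extra large.*(Red).*"))     # True
# Escaped parentheses are literal characters, so the space can stay.
print(matches(r".*Extra large.* \(Red\).*"))  # True
```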

How to exclude spaces when using ValidationExpression in vb.net

I want to exclude spaces when validating a textbox in vb.net.
Here is the current ValidationExpression value:
ValidationExpression="^([a-zA-Z0-9_-.\']+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([a-zA-Z0-9-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?)$" />
When the user enters a space in the textbox, I don't want that to be reported as an error.
Example: I include spaces after "1@test.com "
This should not be treated as incorrect data in the textbox.
Any ideas?
If your spaces are leading or trailing you can do a Trim on the expressionToValidate before comparing to your regexp
Dim expressionWithoutTrailingAndLeadingWhiteSpaces As String = originalExpression.Trim()
If you want to modify the regExp to take into account the trailing spaces:
^[_a-z0-9-]+(\.[a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})( *)$
If you want to exclude also leading spaces add an extra ( *) at the beginning of the expression:
^( *)[_a-z0-9-]+(\.[a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})( *)$
Btw - the regExp you are providing is broken - I used the one found here (expression to validate email addresses)
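Both approaches can be compared in a small Python sketch. It assumes the usual `@` separator and escaped dots (`\.`) in the email expression, which is an assumption of this example, and the function names are hypothetical; it is not the ASP.NET validator itself.

```python
import re

# Strict pattern: no surrounding spaces allowed.
STRICT = re.compile(
    r"^[_a-z0-9-]+(\.[a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$"
)
# Tolerant pattern: optional spaces before and after the address.
TOLERANT = re.compile(
    r"^( *)[_a-z0-9-]+(\.[a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})( *)$"
)

def valid_after_trim(value):
    # Option 1: trim first, then validate with the strict pattern.
    return STRICT.match(value.strip(" ")) is not None

def valid_with_spaces(value):
    # Option 2: let the pattern itself accept surrounding spaces.
    return TOLERANT.match(value) is not None

print(valid_after_trim("1@test.com "))   # True
print(valid_with_spaces(" 1@test.com"))  # True
print(valid_with_spaces("1 @test.com"))  # False: space inside the address
```

Trimming first keeps the pattern simpler, while the tolerant pattern is the option when only the expression itself can be changed.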