How to query for special characters - sql

I have a large table filled with vendor information.
I need to split this list into two separate lists based on the VENDOR_NAME column: one where VENDOR_NAME contains only regular characters and numbers, and another where VENDOR_NAME contains special/foreign characters.
I am not sure what the SELECT statements would be to view this information off of the existing master table. Then I could just create two new tables.
VENDOR_NAME only numbers and regular characters
VENDOR_NAME only foreign characters
Example:
Regular: BLUE RIBBON TAG & LABEL CORP
Foreign: 俞章平
Regular: ULSTER-SOCIETY OF GASTROENTEROLOGY/1
Foreign: 马建忠

You could use the function ASCIISTR():
ASCIISTR takes as its argument a string, or an expression that
resolves to a string, in any character set and returns an ASCII
version of the string in the database character set. Non-ASCII
characters are converted to the form \xxxx, where xxxx represents a
UTF-16 code unit.
To get all strings without special characters:
SELECT * FROM table
WHERE INSTR(ASCIISTR(vendor_name),'\') = 0
Note that names containing a literal '\' would be filtered out by this as well, since ASCIISTR translates the backslash to '\005C'. To keep them, replace that sequence before searching, for example:
WHERE INSTR(REPLACE(ASCIISTR(vendor_name),'\005C','_' ),'\') = 0
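Putting the pieces together, the two lists could be produced roughly as follows. This is only a sketch: the table name vendors and the two target table names are hypothetical (the question does not give any), and a name is treated as "foreign" as soon as it contains a single non-ASCII character.

```sql
-- Hypothetical names: vendors, vendors_regular, vendors_foreign.

-- List 1: names made up only of ASCII characters.
-- REPLACE hides literal backslashes (rendered as \005C by ASCIISTR),
-- so any backslash that remains marks a non-ASCII character.
CREATE TABLE vendors_regular AS
SELECT *
FROM   vendors
WHERE  INSTR(REPLACE(ASCIISTR(vendor_name), '\005C', '_'), '\') = 0;

-- List 2: names containing at least one non-ASCII character.
CREATE TABLE vendors_foreign AS
SELECT *
FROM   vendors
WHERE  INSTR(REPLACE(ASCIISTR(vendor_name), '\005C', '_'), '\') > 0;
```

Rows with a NULL vendor_name satisfy neither condition, so they would end up in neither list and need separate handling.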

Related

Getting the Column containing the non-english language in ORACLE

I have the above entries in my database. My requirement is to extract the fields containing non-English characters (including data that combines English and non-English characters, like the HotelName field for ID 45).
I tried the regexp_like function, testing for alphanumeric and non-alphanumeric characters, but the condition fails for data that combines both.
Thanks in Advance
Raghavan
Does this do what you want?
where regexp_like(hotelname, '[^a-zA-Z0-9 ]')
That is, where the hotel name contains any character that is not a letter, digit, or space. You may need to take additional characters into account as well, such as commas, periods, and hyphens.
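As a sketch of that extension, the character class can simply list the extra punctuation. The table name hotels is hypothetical, and the set of allowed punctuation (comma, period, hyphen) is just an example to adapt to the data.

```sql
-- Hypothetical table name: hotels.
-- Allowed: letters, digits, space, comma, period, hyphen.
-- The hyphen is placed last in the class so it is read literally.
SELECT *
FROM   hotels
WHERE  REGEXP_LIKE(hotelname, '[^a-zA-Z0-9 ,.-]');
```

Any row returned contains at least one character outside the allowed set, which includes rows that mix English and non-English text.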

SQL Text Function for special characters

I have a field with text reviews in it, and I want to spot where people have used special characters to get offensive words etc. past the filters: instead of typing badword they type b.a.d.w.o.r.d or b*a*d*w*o*r*d.
Is there a way to look for, say, 3 or more special characters in a word in a text review, maybe some sort of count function for special characters?
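If you have a table with a field containing the words you don't want to allow, you could use it in your WHERE clause like so, using REGEXP_REPLACE to strip everything except letters and apostrophes before the comparison. As written, the query returns the rows whose stripped text is not a listed bad word; change NOT IN to IN to list the offending rows instead.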
SELECT yourfield
FROM yourtable
WHERE REGEXP_REPLACE(yourfield,'[^a-zA-Z'']','') NOT IN (SELECT badwords
FROM badwordstable)
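For the counting part of the question, something along these lines might work. It is a sketch that assumes Oracle 11g or later (for REGEXP_COUNT) and treats anything other than a letter, digit, or space as a "special character"; both are assumptions, since the question does not name the database.

```sql
-- Flags reviews containing three consecutive "letter + special character"
-- pairs, such as b.a.d. or b*a*d*, and reports the total number of
-- special characters in the review.
SELECT yourfield,
       REGEXP_COUNT(yourfield, '[^a-zA-Z0-9 ]') AS special_count
FROM   yourtable
WHERE  REGEXP_LIKE(yourfield, '([a-zA-Z][^a-zA-Z0-9 ]){3}');
```

Abbreviations such as U.S.A. match the same pattern, so the result is better treated as a list of candidates for review than as a definitive filter.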

Display certain sequence only in VARCHAR

I have a column error_desc with values like:
Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX.
What SQL query can I use to display only the number 111232 from that column? The number starts at the 66th position of the VARCHAR column and ends at the 71st.
SELECT substr(ERROR_DESC,66,6) as ABC FROM bl1_cycle_errors where error_desc like '%FAN%'
This solution uses regular expressions.
The challenge I faced was pulling out the alphanumerics. We have to retain only numbers and filter out strings, alphanumerics, and punctuation in this case, to detect the standalone number.
Pure strings and words not containing numbers can be easily filtered out using
[^[:digit:]]
Possible combinations of alphanumerics are:
1. Begins with a character, contains numbers, may end with characters or punctuation:
[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*
2. Begins with numbers and then contains letters, may contain punctuation:
[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*
3. Begins with numbers and then contains punctuation, may contain letters:
[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*
Combining these regular expressions using the | operator we get:
select trim(REGEXP_REPLACE(error_desc,'[^[:digit:]]|[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*|[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*|[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*',' '))
from error_table;
This will work in most cases.
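If the number always follows the literal text "Fan id", a narrower pattern may be enough. The following is a sketch under that assumption; it also assumes Oracle 11g or later, where REGEXP_SUBSTR accepts a sixth argument selecting the capture group to return.

```sql
-- Returns only the digits captured after "Fan id".
-- The 'i' flag makes the match case-insensitive (Fan, FAN, fan).
SELECT REGEXP_SUBSTR(error_desc, 'Fan id ([0-9]+)', 1, 1, 'i', 1) AS fan_id
FROM   bl1_cycle_errors
WHERE  REGEXP_LIKE(error_desc, 'Fan id [0-9]+', 'i');
```

Unlike the fixed-position substr approach, this does not depend on the number starting at a particular character offset.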

MSAccess Query a string matching a pattern

I have a table with a string field containing location information. I want to be able to query this table and retrieve all of the tags matching the format xxxxxxAA where xxxxxx is a 6-digit number and AA is two alphabetic characters.
Is there a method of querying this using SQL or is this something that I need to do in VBA?
Sample data:
BGS5 PM RGP5
022051PM
022201PM
030539PM
WAS3N
179546MM
And I want to return the following without knowing the values:
022051PM
022201PM
030539PM
179546MM
thanks in advance
Jason
You can use a query with a Like comparison in the WHERE clause.
SELECT y.text_field
FROM YourTable AS y
WHERE y.text_field Like '######[A-Z][A-Z]'
The # matches a digit.
[A-Z] matches one character from a character class consisting of only letters. That character class is actually upper case letters. However, the comparison is case-insensitive, so it will match lower case letters, too.

Postgresql query to update fields using a regular expression

I have the following data in my "Street_Address_1" column:
123 Main Street
Using Postgresql, how would I write a query to update the "Street_Name" column in my Address table? In other words, "Street_Name" is blank and I'd like to populate it with the street name value contained in the "Street_Address_1" column.
From what I can tell, I would want to use the "regexp_matches" string method. Unfortunately, I haven't had much luck.
NOTE: You can assume that all addresses are in a "StreetNumber StreetName StreetType" format.
If you just want to take Street_Address_1 and strip out any leading numbers, you can do this:
UPDATE table
SET street_name = regexp_replace(street_address_1, '^[0-9]* ','','');
This takes the value in street_address_1 and replaces any leading string of numbers (plus a single space) with an empty string (the fourth parameter is for optional regex flags like "g" (global) and "i" (case-insensitive)).
This version allows things like "1212 15th Street" to work properly.
Something like...:
UPDATE table
SET Street_Name = substring(Street_Address_1 FROM '^[0-9]+ ([a-zA-Z]+) ');
See the relevant section of the PostgreSQL 8.3.7 docs; the substring form is detailed shortly after the start of the section.
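Before running either UPDATE, the two expressions can be compared on a literal value. This is a small sketch using the sample address from the question; the column aliases are made up for illustration.

```sql
-- regexp_replace strips the leading number and keeps the rest;
-- substring returns only the first parenthesized group (one word of letters).
SELECT regexp_replace('123 Main Street', '^[0-9]* ', '', '') AS strip_number,
       substring('123 Main Street' FROM '^[0-9]+ ([a-zA-Z]+) ') AS captured_word;
```

For this input, strip_number is 'Main Street' and captured_word is 'Main'. For an address like '1212 15th Street', the substring pattern does not match (the word after the number starts with a digit) and yields NULL, while the regexp_replace form returns '15th Street'.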