How to find multiple spaces inside text for a varchar2 field in Oracle 11g? - sql

I don't recall ever seeing a field like this before, but it combines the city, state, and zipcode into a single varchar2 string. Fortunately, I believe most of the values follow the same "city, space, state, space, zipcode" format, but I started finding a few that deviate from that norm.
Right now I'm trying to identify all these distinct conditions in a table with over 5 million rows, and my queries aren't doing what I want.
I started with:
SELECT PROJECT_CTY_ST_ZIP FROM PAYMENT WHERE PROJECT_CTY_ST_ZIP LIKE '%' || CHR(32) || '%';
Then tried:
SELECT PROJECT_CTY_ST_ZIP FROM PAYMENT WHERE PROJECT_CTY_ST_ZIP LIKE '% %' AND PROJECT_CTY_ST_ZIP LIKE '% %' AND PROJECT_CTY_ST_ZIP LIKE '% %';
but both of them match on leading and trailing spaces, and I really want to find spaces inside the text. I don't want to remove them, just identify them with a query so I can parse them properly in my Java code and later insert them into city, state, and zipcode fields in another table.
While it doesn't show it here, I found this field in IA with no leading spaces, then one leading space and then two leading spaces. I fixed the leading spaces with trim.
WEST LIBERTY, IA 52776
This last one I wasn't expecting and I wanted to see if there are other conditions that might be unusual, but my query doesn't find them as the spaces are in the middle of the text:
TRUTH OR CONSEQUENCE, NM 87901
How would I go about a query to find these kinds of distinct records?

This query replaces each of the spaces with a dot (.) so you can see them:
SELECT REGEXP_REPLACE(PROJECT_CTY_ST_ZIP,
                      '([[:space:]])',
                      '.') spaces_or_now_dots
FROM PAYMENT;
This query finds the ones that have one or more spaces:
SELECT PROJECT_CTY_ST_ZIP
FROM PAYMENT
WHERE REGEXP_LIKE(PROJECT_CTY_ST_ZIP, '[[:space:]]');
I have not considered the cases of spaces in the beginning and end, because you have already taken care of them.
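Since the rows are parsed in application code afterwards anyway, the same check can be sketched outside the database. The snippet below is a minimal Python illustration, not Oracle syntax: the helper name internal_space_runs is hypothetical, and it assumes that "unusual" means a run of two or more consecutive spaces after trimming.

```python
import re

def internal_space_runs(value):
    """Return the length of every run of 2+ spaces inside the trimmed value."""
    trimmed = value.strip()  # leading/trailing spaces are handled separately
    return [len(m.group()) for m in re.finditer(r" {2,}", trimmed)]

# A normal value has only single spaces, so no runs are reported.
print(internal_space_runs("WEST LIBERTY, IA 52776"))
# A value with doubled spaces in the middle reports one run per gap.
print(internal_space_runs("  TRUTH OR  CONSEQUENCE,   NM 87901 "))
```

An empty list means the value only contains single spaces between its parts.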

Related

WHERE colname LIKE '% %' OR colname LIKE '% % %' is returning rows with four words

The following code is supposed to return rows where the City name is two or three words, however it also returns one with four words:
SELECT FirstName ||' '|| LastName AS 'Full Name', City, Country, Email, Phone
FROM customers
WHERE City LIKE '% %' OR City LIKE '% % %';
[Output from Query]
São José dos Campos appears if I use any of the following three patterns:
LIKE '% %'
LIKE '% % %'
LIKE '% % % %'
Why is this happening?
Other answers have explained why you're getting the answer you are -- the % wildcard with LIKE matches space characters, too -- but nobody has explained how to get what you want.
This will do it:
WHERE LENGTH(City) - LENGTH(REPLACE(City, ' ', '')) IN (1,2)
Syntax will vary somewhat by RDBMS (e.g., SQL Server calls it the LEN() function).
What this does is take the City field and replace each space with an empty string, which effectively removes that character. Then it subtracts the resulting length from the length of the City with the spaces still in it. That tells you how many spaces are in the City field. Then you just look for City fields that have the right number of spaces: 1 space means two words, 2 spaces mean three words, and so on.
Beware of leading or trailing spaces in some RDBMSs, as well as consecutive spaces.
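The space-counting expression can be tried out quickly with an in-memory SQLite database, which also has LENGTH and REPLACE. This is only a sketch: the helper name cities_with_space_count and the sample data are made up for illustration, and other engines may spell the functions differently.

```python
import sqlite3

def cities_with_space_count(cities, allowed=(1, 2)):
    """Return the cities whose number of spaces is in `allowed`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (City TEXT)")
        conn.executemany("INSERT INTO customers (City) VALUES (?)",
                         [(c,) for c in cities])
        placeholders = ",".join("?" for _ in allowed)
        sql = (
            "SELECT City FROM customers "
            "WHERE LENGTH(City) - LENGTH(REPLACE(City, ' ', '')) "
            f"IN ({placeholders}) ORDER BY rowid"
        )
        return [row[0] for row in conn.execute(sql, tuple(allowed))]
    finally:
        conn.close()

sample = ["Paris", "New York", "Rio de Janeiro", "São José dos Campos"]
print(cities_with_space_count(sample))
```

With the sample data, only the two-word and three-word cities come back; the four-word city has three spaces and is filtered out.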
If you need to use the LIKE operator, you can try something like this:
WHERE City LIKE '% %'
AND City NOT LIKE '% % % %'
That will return rows where the City has at least one space but fewer than three. Here, we're using the fact that the one-space pattern also matches two spaces, and the three-space pattern also matches four or more, to our advantage! You'll still want to beware of consecutive spaces and leading or trailing spaces, which may create unexpected results.
You'll have to test to see which option performs better in your system.
Some RDBMSs will give you expanded LIKE or full regex options, which can work even better than what's above.
The wildcard % matches anything including spaces. In particular a pattern like '% %' will match 'São José dos Campos' because:
The first % matches "São José dos" and the second % matches "Campos", or
The first % matches "São José" and the second % matches "dos Campos", or
The first % matches "São" and the second % matches "José dos Campos".
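The matching behaviour described above can be checked with a small script. This sketch uses Python's built-in sqlite3 module only because it is easy to run; the helper name like_matches is hypothetical, and the result should be the same in any engine where % matches zero or more characters.

```python
import sqlite3

def like_matches(value, patterns):
    """Map each LIKE pattern to whether it matches `value`."""
    conn = sqlite3.connect(":memory:")
    try:
        return {
            p: bool(conn.execute("SELECT ? LIKE ?", (value, p)).fetchone()[0])
            for p in patterns
        }
    finally:
        conn.close()

patterns = ["% %", "% % %", "% % % %", "% % % % %"]
for pattern, matched in like_matches("São José dos Campos", patterns).items():
    print(repr(pattern), matched)
```

The city has three spaces, so every pattern with up to three spaces matches it; only the four-space pattern does not.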
You don't mention which database you are using, but most database engines now offer regular expression matching. With those you can search precisely for a very specific pattern.
Symbol % means anything, including any number of spaces.
It happens because "who can do the most, can do the least".
When using a LIKE pattern, % doesn't mean "a word" but "any pattern". Therefore, % % doesn't mean "two words separated by a space", but "any pattern, followed by a space, followed by any pattern". This actually makes you look for a string that contains at least one space.
And here's the cause: a string that contains four spaces also contains two of them. So a pattern with two spaces actually matches every string that contains two or more spaces, that is, cities of three, four, or more words.
Perhaps you meant
SELECT FirstName ||' '|| LastName AS 'Full Name', City, Country, Email, Phone
FROM customers
WHERE City LIKE '% %'
and City not LIKE '% % % %'
This means there needs to be at least one space but not 3 or more spaces.
In SQL, % is zero, one, or many characters, including spaces. So cities containing one, two, or more spaces will all match LIKE '% %'.
You can try this if you want cities that have only one or two spaces within:
WHERE City LIKE '% %' AND City NOT LIKE '% % % %';
Cities that have four or five spaces are excluded as well, because '% % % %' matches any string with three or more spaces, so no further AND ... NOT LIKE conditions are needed for them.
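To see how the combined LIKE / NOT LIKE filter treats longer names, here is a small check against an in-memory SQLite table. The helper name two_or_three_word_cities and the sample rows (including the artificial names with four and five spaces) are assumptions made for this sketch.

```python
import sqlite3

def two_or_three_word_cities(cities):
    """Return cities with at least one space but fewer than three."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (City TEXT)")
        conn.executemany("INSERT INTO customers (City) VALUES (?)",
                         [(c,) for c in cities])
        rows = conn.execute(
            "SELECT City FROM customers "
            "WHERE City LIKE '% %' AND City NOT LIKE '% % % %' "
            "ORDER BY rowid"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()

sample = ["Paris", "New York", "Rio de Janeiro", "São José dos Campos",
          "a b c d e", "a b c d e f"]
print(two_or_three_word_cities(sample))
```

Only the names with one or two spaces are returned; the names with three, four, and five spaces are all rejected by the single NOT LIKE.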

Find white-space at the front in SQL data

Thank you for your help.
How do I get data that contains a space at the front or the back?
For example: (see .png)
I tried:
select * from customer_table where customer like '% %'
but it gave me all of them, because the names contain a space in between.
I need to get the ones that have a white-space at the front.
thank you!
I think you want:
where customer like ' %' or customer like '% '
LIKE patterns match the entire string, so the first condition finds values with a space at the beginning and the second finds values with a space at the end. If you want only the rows that have both, you can do that in a single pattern:
where customer like ' % '
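A quick way to confirm which rows these patterns pick up is to run them against a throwaway table. The sketch below uses Python's sqlite3 module; the helper name names_with_edge_spaces and the sample names are invented for illustration.

```python
import sqlite3

def names_with_edge_spaces(names):
    """Return names that start or end with a space."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customer_table (customer TEXT)")
        conn.executemany("INSERT INTO customer_table (customer) VALUES (?)",
                         [(n,) for n in names])
        rows = conn.execute(
            "SELECT customer FROM customer_table "
            "WHERE customer LIKE ' %' OR customer LIKE '% ' "
            "ORDER BY rowid"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()

print(names_with_edge_spaces([" Joe", "Joe ", "Joe Smith", " Ann Lee "]))
```

"Joe Smith" is skipped because its only space is in the middle, which is exactly the case that '% %' could not tell apart.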
Use a TRIM function on the column so that it fits your select condition, for instance:
WHERE LTRIM([CUSTOMER]) = 'Joe'
This example is for Microsoft SQL Server, but I believe all DBMSs have a similar function; you can use "trim" as the keyword to search for it.

Search for multiple phone number formats in database

I got a database with a user table. This table contains a column phonenumber. The problem is that its fields use multiple number patterns.
The current patterns I found:
06403/975-0
+496403975 0
06403 975-0
06403 975 0
+49 6403 975 0
When searching for a user in the database, is there a way to search for all number patterns?
SELECT id FROM user WHERE phone = '0123456789'
I use Oracle and MS SQL
Assuming your question means this:
"Is it possible to remove all the non-digit characters from the stored phone number, before making the comparison in the WHERE clause?"
a possible solution looks like this:
...
where translate(phone, '0123456789' || phone, '0123456789') = <input value here>
TRANSLATE will translate every digit to itself, and all other characters in phone to nothing (they will simply be deleted from the string). This is exactly what you want.
If you find that the query is slow, you may want to create a (function-based) index on translate(phone, '0123456789' || phone, '0123456789').
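To make the trick easier to follow, here is a small Python emulation of how TRANSLATE treats its arguments: each character of the second argument maps to the character at the same position in the third, the first occurrence of a character wins, and characters with no counterpart are deleted. The function oracle_translate is a hypothetical stand-in written for this sketch, not an Oracle API.

```python
def oracle_translate(expr, from_string, to_string):
    """Emulate TRANSLATE(expr, from_string, to_string) for illustration."""
    mapping = {}
    for i, ch in enumerate(from_string):
        if ch not in mapping:  # the first occurrence decides the mapping
            # characters beyond the end of to_string are deleted
            mapping[ch] = to_string[i] if i < len(to_string) else ""
    return "".join(mapping.get(ch, ch) for ch in expr)

digits = "0123456789"
for phone in ["06403/975-0", "06403 975-0", "06403 975 0"]:
    # Same shape as translate(phone, '0123456789' || phone, '0123456789')
    print(oracle_translate(phone, digits + phone, digits))
```

Because the digits come first in the second argument, they map to themselves, and every other character in the phone number lands past the end of the third argument and is dropped.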
EDIT: I missed the part where you said you are using both Oracle and SQL Server. I did a quick search and found that SQL Server does not have a function similar to Oracle's TRANSLATE. I will leave it to SQL Server experts to help you with that part; I don't know SQL Server.
In Oracle you could do it like this. Strip out the non-numeric characters with translate() to get the phone number. You need to handle the leading zero or international dialling code:
select username from your_table
where translate(phone, '1234567890+/ -', '1234567890') in ('064039750', '4964039750')
You may need to tweak this if you don't know what the international dialling code is.
Obviously the actual problem is one of data quality: the application should enforce a strict format on phone numbers. One bout of data cleansing on write saves a whole bunch of grief on read.
You have a database containing phone numbers. These are sometimes in international format, but often in some national format, probably German, where two leading zeros introduce a country code, while a single leading zero introduces an area code instead (assuming Germany as the home country). Moreover, a phone number can contain symbols for readability, namely '-', '/', and ' '.
So
+49 12/3456-7 means +491234567 of course
00441234567 means +441234567
04412345 means +494412345
I suggest you convert all numbers into international format in these steps:
replace a leading + with a leading 00, thus making only digits important
remove every character that is not a digit
replace a leading 00 with a leading +
replace a leading 0 with a leading +49
Use Oracle's REGEXP_REPLACE for this:
select
  regexp_replace(
    regexp_replace(
      regexp_replace(
        regexp_replace(trim(phone),
          '^\+', '00'),          -- leading '+' -> leading '00'
        '[^[:digit:]]', ''),     -- remove all non-digits
      '^00', '+'),               -- leading '00' -> leading '+'
    '^0', '+49')                 -- leading '0' -> leading '+49'
  as international_phone
from mytable;
You can do this in the WHERE clause of course:
SELECT id FROM user WHERE regexp_replace(...) = '+49123456789'
or even
SELECT id FROM user WHERE regexp_replace(...phone...) = regexp_replace(...'0123456789'...)
And you may write a PL/SQL function for this for convenience and use it so:
SELECT id FROM user WHERE international_phone(phone) = international_phone('0123456789')
This is for Oracle. There may be something similar for SQL Server.
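The four normalization steps can also be sketched in application code, for example to normalize the search value before it is sent to the database. The Python function below mirrors the steps one by one; its name matches the PL/SQL function suggested above, but the home_country_code parameter and its default of "49" are assumptions for illustration.

```python
import re

def international_phone(phone, home_country_code="49"):
    """Normalize a phone number to +<country code><number>."""
    s = phone.strip()
    s = re.sub(r"^\+", "00", s)                    # leading '+' -> leading '00'
    s = re.sub(r"[^0-9]", "", s)                   # remove all non-digits
    s = re.sub(r"^00", "+", s)                     # leading '00' -> leading '+'
    s = re.sub(r"^0", "+" + home_country_code, s)  # leading '0' -> home country
    return s

stored = ["06403/975-0", "+496403975 0", "06403 975-0",
          "06403 975 0", "+49 6403 975 0"]
print({international_phone(p) for p in stored})
```

All five stored formats from the question collapse to a single normalized value, which is what makes the equality comparison in the WHERE clause work.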

White spaces in sql

This will sound stupid, but I have a table with names, and those names may or may not end with white space. E.g. I have the name ' dummy ', but even if I write only ' dummy' in the query, it will find the record ' dummy '. Can I fix that somehow?
SELECT *
FROM MYTABLE where NAME=' dummy'
Thanks
This is how SQL works (except Oracle): when you compare two strings, the shorter one is padded with blanks to the length of the longer one.
If you really need to consider trailing blanks, you can switch to LIKE, which doesn't follow that rule:
SELECT *
FROM MYTABLE where NAME LIKE ' dummy'
Of course, you had better clean your data during load.
There's only one thing worse than trailing spaces: leading spaces (oh, wait a minute, you've got them, too).

SQL Server CONCAT function

I am trying to write a statement like this:
SELECT CONCAT(street_name, ' ', street_number) as 'street_detail'
FROM geo_map
WHERE CONCAT(street_name, ' ', street_number) LIKE '%'
My table is something like this
postal_code int
building_name nchar(200)
street_number nchar(60)
street_name nchar(120)
The result I get is just the street name, without the street number, although my street numbers have values. Any idea what went wrong in my CONCAT?
I am using SQL Server
It is best to use NVARCHAR(...) instead of NCHAR(...) types for storing information like what you have. The reason is that for NCHAR(...) types, strings are padded with trailing spaces to fill the whole length of the field.
A string in an NCHAR(120) field such as street_name is always 120 characters wide. The concatenation of street_name, a space, and street_number (NCHAR(60)) will therefore be 181 characters wide, and the street number will start at the 122nd character of the concatenation.
Perhaps you are not seeing a street number in your concatenation because your display field (in your program, SSMS, webpage, ...) just isn't wide enough.
Now with storing your street name in an NVARCHAR(200) and pretty much all other related information in NVARCHAR(...) fields, you would not have that problem. Strings stored in those fields are not padded with trailing spaces, and you would see your street number at the place you expected in your concatenation.
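The effect of the padding can be illustrated without a database. The Python sketch below only emulates fixed-width storage with ljust, using the column widths from the question (nchar(120) for street_name and nchar(60) for street_number); the helper nchar and the sample values are hypothetical, and rstrip stands in for trimming the trailing blanks.

```python
def nchar(value, width):
    """Emulate fixed-width NCHAR storage by right-padding with spaces."""
    return value.ljust(width)[:width]

street_name = nchar("Main Street", 120)
street_number = nchar("42", 60)

padded = street_name + " " + street_number
print(len(padded))         # total width of the padded concatenation
print(padded.index("42"))  # zero-based position where the number starts

# Without the trailing blanks, the parts sit next to each other.
trimmed = street_name.rstrip() + " " + street_number.rstrip()
print(repr(trimmed))
```

The street number is present in the padded result, but it sits far to the right of the street name, which is why a narrow display column appears to show the name only.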