SQL Server CONCAT function - sql

I trying to draw a statement like this
SELECT CONCAT(street_name, ' ', street_number) as 'street_detail'
FROM geo_map
WHERE CONCAT(street_name, ' ', street_number) LIKE '%'
My table is something like this
postal_code int
building_name nchar(200)
street_number nchar(60)
street_name nchar(120)
The result I get was just the street name, less the street number, although my street number have value, any idea what's went wrong in my concat.
I am using SQL Server

It is best to use NVARCHAR(...) instead of NCHAR(...) types for storing information like what you have. The reason is that for NCHAR(...) types, strings are padded with trailing spaces to fill the whole length of the field.
A string in an NCHAR(200) field is always 200 characters wide. The concatenation of street_name, a space and the street_number will be 261 characters wide. The building number will appear on the 202nd character in the concatenation.
Perhaps you are not seeing a street number in your concatenation because your display field (in your program, SSMS, webpage, ...) just isn't wide enough.
Now with storing your street name in an NVARCHAR(200) and pretty much all other related information in NVARCHAR(...) fields, you would not have that problem. Strings stored in those fields are not padded with trailing spaces, and you would see your street number at the place you expected in your concatenation.

Related

Query to retrieve only columns which have last name starting with 'K'

]
The name column has both first and last name in one column.
Look at the 'rep' column. How do I filter only those rep names where the last name is starting with 'K'?
The way that table is defined won't allow you to do that query reliably--particularly if you have international names in the table. Not all names come with the given name first and the family name second. In Asia, for example, it is quite common to write names in the opposite order.
That said, assuming you have all western names, it is possible to get the information you need--but your indexes won't be able to help you. It will be slower than if your data had been broken out properly.
SELECT rep,
RTRIM(LEFT(LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep))), CHARINDEX(' ', LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep)))) - 1)) as family_name
WHERE family_name LIKE 'K%'
So what's going on in that query is some string manipulation. The dialect up there is SQL Server, so you'll have to refer to your vendor's string manipulation function. This picks the second word, and assumes the family name is the second word.
LEFT(str, num) takes the number of characters calculated from the left of the string
RIGHT(str, num) takes the number of characters calculated from the right of the string
CHARINDEX(char, str) finds the first index of a character
So you are getting the RIGHT side of the string where the count is the length of the string minus the first instance of a space character. Then we are getting the LEFT side of the remaining string the same way. Essentially if you had a name with 3 parts, this will always pick the second one.
You could probably do this with SUBSTRING(str, start, end), but you do need to calculate where that is precisely, using only the string itself.
Hopefully you can see where there are all kinds of edge cases where this could fail:
There are a couple records with a middle name
The family name is recorded first
Some records have a title (Mr., Lord, Dr.)
It would be better if you could separate the name into different columns and then the query would be trivial--and you have the benefit of your indexes as well.
Your other option is to create a stored procedure, and do the calculations a bit more precisely and in a way that is easier to read.
Assuming that the name is <firstname> <lastname> you can use:
where rep like '% K%'

Search for multiple phone number formats in database

I got a database with a user table. This table contains a column phonenumber. The problem is that its fields use multiple number patterns.
The current patterns I found:
06403/975-0
+496403975 0
06403 975-0
06403 975 0
+49 6403 975 0
When searching for a user in the database, is there a way to search for all number patterns?
SELECT id FROM user WHERE phone = '0123456789'
I use Oracle and MS SQL
Assuming your question means this:
"Is it possible to remove all the non-digit characters from the stored phone number, before making the comparison in the WHERE clause?"
a possible solution looks like this:
...
where translate(phone, '0123456789' || phone, '0123456789') = <input value here>
TRANSLATE will translate every digit to itself, and all other characters in phone to nothing (they will simply be deleted from the string). This is exactly what you want.
If you find that the query is slow, you may want to create a (function-based) index on translate(phone, '0123456789' || phone, '0123456789').
EDIT: I missed the part where you said you are using both Oracle and SQL Server. I did a quick search and found that SQL Server does not have a function similar to Oracle's TRANSLATE. I will leave it to SQL Server experts to help you with that part; I don't know SQL Server.
In Oracle you could do it like this. Strip out the non-numeric characters with translate() to get the phone number. You need to handle the leading zero or international dialling code:
select username from your_table
where translate(phone, '1234567890+/ -', '1234567890') in ('064039750', '4964039750')
You may need to tweak this if you don't know what the international dialling code is.
Obviously the actual problem is one of data quality: the application should enforce a strict format on phone numbers. One bout of data cleansing on write saves a whole bunch of grief on read.
You have a database containing phone numbers. These are sometimes in international format, but often in some national format, probably German, where two leading zeros introduce a country code, while a single leading zero would introduce an area code instead (assuming the home country Germany then). Moreover, a phone number can contain symbols for readability, namely '-', '/', and ' '.
So
+49 12/3456-7 means +491234567 of course
00441234567 means +441234567
04412345 means +494412345
I suggest you convert all numbers into international format in these steps:
replace a leading + with a leading 00, thus making only digits important
remove every character that is not a digit
replace a leading 00 with a leading +
replace a leading 0 with a leading +49
Use Oracle's REGEXP_REPLACE for this:
select
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(trim(phone),
'^\+', '00'), -- leading '+' -> leading '00'
'[^[:digit:]]', ''), -- remove all non-digits
'^00' , '+'), -- leading '00' -> leading '+'
'^0', '+49') -- leading '0' -> leading '+49'
as international_phone
from mytable;
You can do this in the WHERE clause of course:
SELECT id FROM user WHERE regexp_replace(...) = '+49123456789'
or even
SELECT id FROM user WHERE regexp_replace(...phone...) = regexp_replace(...'0123456789'...)
And you may write a PL/SQL function for this for convenience and use it so:
SELECT id FROM user WHERE international_phone(phone) = international_phone('0123456789')
This is for Oracle. There may be something alike for SQL Server.

LEN(firstName) + LEN(lastName) does not equal LEN(firstName + lastName)

I'm trying to search a table based on the concatentation of the firstName and lastName columns. Both of these are defined as NVARCHAR(50) NOT NULL
The query sometimes fails to find a match because the concatenated column is padded with extra spaces. Here's the query:
SELECT firstName + lastName AS fullName, LEN(firstName) + LEN(lastName) AS realLength, LEN(firstName + lastName) AS concatLength FROM UsersTable
And here's an image with the results:
What is the deal with this? How can I avoid the extra spaces? If I do SELECT RTRIM(firstName) + RTRIM(lastName) ... I get the correct full name with no extra spaces, but using RTRIM is too expensive because my data set is very big. This would lead me to think that the issue is the data itself, except that LEN(firstName) is the same as LEN(RTRIM(firstName))
You have spaces at the end of your FirstName. It is easy enough to check that the following returns 4:
select len(N'abcd ')
This is a property of the varchar() data types and len(). Of course, when you concatenate them, then SQL Server decides to recognize the spaces at the end.
This behavior is documented in the "Remarks" section of the documentation:
Remarks
LEN excludes trailing blanks. If that is a problem, consider using the
DATALENGTH (Transact-SQL) function which does not trim the string. If
processing a unicode string, DATALENGTH will return twice the number
of characters.
As the comments suggest, you can ltrim()/rtrim() the values before concatenating them. Or, use like.

How to find multiple spaces inside text for a varchar2 field in Oracle 11g?

I don't recall ever seeing a field like this before, but it combines the city, state, and zipcode into a single string of varchar2. Fortunately, I believe most of the fields are in the same city space state, space zipcode format, but I started finding a few that deviated from that norm.
Right now I'm trying to identify all these distinct conditions
in the database with over 5 million rows and my queries aren't working for what I wanted.
I started with:
SELECT PROJECT_CTY_ST_ZIP FROM PAYMENT WHERE PROJECT_CTY_ST_ZIP LIKE '%' || CHR(32) || '%';
Then tried:
SELECT PROJECT_CTY_ST_ZIP FROM PAYMENT WHERE PROJECT_CTY_ST_ZIP LIKE '% %' AND PROJECT_CTY_ST_ZIP LIKE '% %' AND PROJECT_CTY_ST_ZIP LIKE '% %';
but they are both pulling based on leading and trailing spaces and I was really wanted to find spaces in the inside of the text. I don't want to remove them, just identify them with a query so I can parse them properly in my java code and then do an insert later to put them into city, state, and zipcode fields in another table.
While it doesn't show it here, I found this field in IA with no leading spaces, then one leading space and then two leading spaces. I fixed the leading spaces with trim.
WEST LIBERTY, IA 52776
This last one I wasn't expecting and I wanted to see if there are other conditions that might be unusual, but my query doesn't find them as the spaces are in the middle of the text:
TRUTH OR CONSEQUENCE, NM 87901
How would I go about a query to find these kinds of distinct records?
This query replaces each of the spaces with a dot (.) so you can see them
SELECT
REGEXP_REPLACE(PROJECT_CTY_ST_ZIP,
'([[:space:]])',
'.') spaces_or_now_dots
FROM PAYMENT
This query finds the ones that have one or more spaces.
SELECT PROJECT_CTY_ST_ZIP
FROM PAYMENT
where REGEXP_LIKE(PROJECT_CTY_ST_ZIP,
'[[:space:]]'
)
I have not considered the cases of spaces in the beginning and end, because you have already taken care of them.

Postgresql query to update fields using a regular expression

I have the following data in my "Street_Address_1" column:
123 Main Street
Using Postgresql, how would I write a query to update the "Street_Name" column in my Address table? In other words, "Street_Name" is blank and I'd like to populate it with the street name value contained in the "Street_Address_1" column.
From what I can tell, I would want to use the "regexp_matches" string method. Unfortunately, I haven't had much luck.
NOTE: You can assume that all addresses are in a "StreetNumber StreetName StreetType" format.
If you just want to take Street_Address_1 and strip out any leading numbers, you can do this:
UPDATE table
SET street_name = regexp_replace(street_address_1, '^[0-9]* ','','');
This takes the value in street_address_1 and replaces any leading string of numbers (plus a single space) with an empty string (the fourth parameter is for optional regex flags like "g" (global) and "i" (case-insensitive)).
This version allows things like "1212 15th Street" to work properly.
Something like...:
UPDATE table
SET Street_Name = substring(Street_Address_1 FROM '^[0-9]+ ([a-zAZ]+) ')
See relevant section from PGSQL 8.3.7 docs, the substring form is detailed shortly after the start of the section.