I want to split the column containing the street name and the house number, for example, "New Street 55A", into "New Street" and "55A" (two new columns if possible, "Address Name" and "House number").
I cannot use functions such as Left( ) or Right( ), as there are many addresses and all have a different number of characters (e.g. London Street 555A/B, Left 42 CC, Tenere 1, Tesla Str. 522D). Basically, it would need to split at the numeric value, keeping the number together with any letters, /, or anything else that follows it.
Hope I was clear and you can help me. I must say also I am quite a beginner so a detailed answer or a full code would mean a lot.
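One way to express the splitting rule described above is a regular expression that cuts at the first whitespace followed by a digit. The sketch below is in Python only for illustration (the question does not say which tool is in use), and split_address and ADDRESS_PATTERN are made-up names. It assumes the house number is the first number that follows a space, so a street name that itself contains such a number would be split too early.

```python
import re

# Hypothetical pattern: everything up to the first whitespace that is followed by a digit
# is the address name; the rest (digits plus any letters, "/", etc.) is the house number.
ADDRESS_PATTERN = re.compile(r"^(.*?)\s+(\d.*)$")

def split_address(address):
    """Return (address_name, house_number); house_number is '' if no number is found."""
    cleaned = address.strip()
    match = ADDRESS_PATTERN.match(cleaned)
    if match is None:
        return cleaned, ""
    return match.group(1), match.group(2)

samples = ["New Street 55A", "London Street 555A/B", "Left 42 CC", "Tenere 1", "Tesla Str. 522D"]
for sample in samples:
    print(split_address(sample))
```

Most regex engines accept the same pattern, so the idea should carry over to other environments with minor syntax changes.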
The name column has both first and last name in one column.
Look at the 'rep' column. How do I filter only those rep names where the last name is starting with 'K'?
The way that table is defined won't allow you to do that query reliably--particularly if you have international names in the table. Not all names come with the given name first and the family name second. In Asia, for example, it is quite common to write names in the opposite order.
That said, assuming you have all western names, it is possible to get the information you need--but your indexes won't be able to help you. It will be slower than if your data had been broken out properly.
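SELECT rep, family_name
FROM (SELECT rep, -- the appended ' ' keeps LEFT from failing when there is no second space
RTRIM(LEFT(LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep))), CHARINDEX(' ', LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep))) + ' ') - 1)) AS family_name
FROM reps) AS t -- "reps" is a placeholder table name
WHERE family_name LIKE 'K%' -- a column alias can only be filtered from an outer query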
So what's going on in that query is some string manipulation. The dialect up there is SQL Server, so if you use a different database you'll have to refer to your vendor's string manipulation functions. This picks the second word, and assumes the family name is the second word.
LEFT(str, num) takes the number of characters calculated from the left of the string
RIGHT(str, num) takes the number of characters calculated from the right of the string
CHARINDEX(char, str) finds the first index of a character
So you are getting the RIGHT side of the string where the count is the length of the string minus the first instance of a space character. Then we are getting the LEFT side of the remaining string the same way. Essentially if you had a name with 3 parts, this will always pick the second one.
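To make the steps easier to follow, here is the same "take everything after the first space, then cut at the next space" logic as a small Python sketch. It is only an illustration of the string manipulation, not a replacement for the query, and second_word is a made-up name.

```python
def second_word(rep):
    """Mirror the RIGHT / LTRIM / LEFT steps: return the second space-separated word."""
    first_space = rep.find(" ")            # -1 if there is no space (CHARINDEX would give 0)
    rest = rep[first_space + 1:].lstrip()  # RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep)), then LTRIM
    next_space = rest.find(" ")
    # If there is no further space, the remainder itself is the second word.
    return rest if next_space == -1 else rest[:next_space]

print(second_word("John Kennedy"))    # Kennedy
print(second_word("John F Kennedy"))  # F
```

The second call shows why a middle name breaks the assumption that the family name is the second word.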
You could probably do this with SUBSTRING(str, start, length), but you would need to calculate the start position and the length precisely, using only the string itself.
Hopefully you can see where there are all kinds of edge cases where this could fail:
There are a couple records with a middle name
The family name is recorded first
Some records have a title (Mr., Lord, Dr.)
It would be better if you could separate the name into different columns and then the query would be trivial--and you have the benefit of your indexes as well.
Your other option is to create a stored procedure, and do the calculations a bit more precisely and in a way that is easier to read.
Assuming that the name is <firstname> <lastname> you can use:
where rep like '% K%'
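A quick way to see what that pattern catches is to try it on a few rows. The sketch below uses Python's built-in sqlite3 module purely as a test bench (the original question is not about SQLite), and the table name sales and the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT)")  # 'sales' is a made-up table name
conn.executemany(
    "INSERT INTO sales (rep) VALUES (?)",
    [("Anna Kowalski",), ("Karl Smith",), ("Mary Kate Jones",)],
)

# '% K%' matches any value in which some word after the first one starts with K.
rows = conn.execute(
    "SELECT rep FROM sales WHERE rep LIKE '% K%' ORDER BY rep"
).fetchall()
matches = [row[0] for row in rows]
print(matches)  # ['Anna Kowalski', 'Mary Kate Jones']
```

Note that "Mary Kate Jones" is returned as well, so the pattern is only reliable when every name has exactly two parts.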
I'm an absolute beginner at SQL. I do know how to use SQL commands that belong to Data Manipulation Language and Data Definition Language.
The name of the table is ways_tags
id,key,value,type
164931009,street,6th Main Road Ram Nagar (N) Extn,addr
The header represents the column names, I've printed just one row.
Here's what I want to accomplish:
Select observations that have type as addr and key as street.
In such observations, scan the characters of value column and check if the last 4 characters match Extn and replace it with Extension
Also, is there a way I can reduce size of the code, where it checks for TWO patterns instead of ONE?
In other words
You can replace what I'd like to accomplish on 2. with something like
In such observations, scan the characters of value column:
i. If the last 4 characters in the value column are Extn, replace them with Extension.
ii. If the last 3 characters in the value column are St., replace them with Street.
I'm really sorry, but I just know the pseudocode and don't know how to incorporate regular expressions with normal SQL, and so I can't post anything to show that I've tried.
So eventually after performing the step,
164931009,street,6th Main Road Ram Nagar (N) Extension,addr
is the updated observation instead of
164931009,street,6th Main Road Ram Nagar (N) Extn,addr
Select observations that have type as addr and key as street.
… scan the characters of value column and check if the last 4 characters match Extn
WHERE type = 'addr'
AND key = 'street'
AND value LIKE '%Extn'
… and replace it with Extension
If "Extn" does not occur anywhere else in the value, this could be done with replace(), but in the general case, you have to extract all but the four last characters, and append the new value:
substr(value, 1, length(value) - 4) || 'Extension'
Then plug this into an UPDATE statement:
UPDATE ways_tags
SET value = substr(value, 1, length(value) - 4) || 'Extension'
WHERE type = 'addr'
AND key = 'street'
AND value LIKE '%Extn';
Doing two replacements in a single statement would be possible with a CASE expression, but would not reduce code size.
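For reference, a CASE-based version could look like the sketch below. It runs the statement through Python's sqlite3 module so it can be tried without a database file; the two extra rows are made up for the test, and the St. branch assumes that the suffix to replace is the three characters "St.".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ways_tags (id INTEGER, key TEXT, value TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO ways_tags VALUES (?, ?, ?, ?)",
    [
        (164931009, "street", "6th Main Road Ram Nagar (N) Extn", "addr"),
        (1, "street", "Example St.", "addr"),  # made-up row
        (2, "name", "Some Extn", "addr"),      # made-up row; key is not 'street'
    ],
)

# One UPDATE handles both suffixes; the WHERE clause limits it to rows that need a change.
conn.execute(
    """
    UPDATE ways_tags
    SET value = CASE
        WHEN value LIKE '%Extn' THEN substr(value, 1, length(value) - 4) || 'Extension'
        WHEN value LIKE '%St.'  THEN substr(value, 1, length(value) - 3) || 'Street'
    END
    WHERE type = 'addr'
      AND key = 'street'
      AND (value LIKE '%Extn' OR value LIKE '%St.')
    """
)

values = [row[0] for row in conn.execute("SELECT value FROM ways_tags ORDER BY id")]
print(values)
```

Each suffix still needs its own WHEN branch and its own condition in the WHERE clause, so two separate UPDATE statements are about the same length.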
I have a query (SQL) to pull out a street name from a string. It's looking for the last occurrence of a digit, and then pulling the proceeding text as the street name. I keep getting the Oracle
"argument '0' is out of range"
error but I'm struggling to figure out how to fix it.
the part of the query in question is
substr(address,regexp_instr(address,'[[:digit:]]',1,regexp_count(address,'[[:digit:]]'))+2)
any help would be amazing. (using sql developer)
The fourth parameter of regexp_instr is the occurrence:
occurrence is a positive integer indicating which occurrence of
pattern in source_string Oracle should search for. The default is 1,
meaning that Oracle searches for the first occurrence of pattern.
In this case, if an address contains no digits, regexp_count returns 0, which is not a valid occurrence.
A simpler solution, which does not require separate treatment for addresses without a house number, is this:
with t (address) as (
select '422 Hickory Str.' from dual union all
select 'One US Bank Plaza' from dual
)
select regexp_substr(address, '\s*([^0-9]*)$', 1, 1, null, 1) as street from t;
The output looks like this:
STREET
-------------------------
Hickory Str.
One US Bank Plaza
The third argument to regexp_substr is the first of the three 1's. It means start the search at the first character of address. The second 1 means find the first occurrence of the search pattern. The null means no special match modifiers (such as case insensitive - nothing like that needed here). The last 1 means "return the first SUBEXPRESSION from the match pattern". Subexpressions are parts of the match expression enclosed in parentheses.
The match pattern has a $ at the end - meaning "anchor at the end of the input string" ($ means the end of the string). Then [...] means match any of the characters in square brackets, but the ^ in [^...] changes it to match any character OTHER THAN what is in the square brackets. 0-9 means all characters between 0 and 9; so [^0-9] means match any character(s) OTHER THAN digits, and the * after that means "any number of such characters" (between 0 and everything in the input string). \s is "blank space" - if there are any blank spaces following a possible number in the address, you don't want them included right at the beginning of the street name. The subexpression is just [^0-9]* meaning the non-digits, not including any spaces before them (because the \s* is outside the left parenthesis).
My example illustrates a potential problem though - sometimes an address does, in fact, have a "number" in it, but spelled out as a word instead of using digits. What I show is in fact a real-life address in my town.
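If you want to experiment with the pattern outside the database, the same expression behaves the same way in Python's re module for these inputs. This is only a sketch for trying out test strings (street_name is a made-up helper), not Oracle code.

```python
import re

def street_name(address):
    """Return the trailing run of non-digit characters, without leading whitespace."""
    # The pattern can always match at the end of the string, so search() never returns None.
    return re.search(r"\s*([^0-9]*)$", address).group(1)

print(street_name("422 Hickory Str."))   # Hickory Str.
print(street_name("One US Bank Plaza"))  # One US Bank Plaza
```

An address that ends in a digit gives an empty string, because no non-digit characters follow the last digit.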
Good luck!
looking for the last occurrence of a digit, and then pulling the proceeding text as the street name
You could simply do:
SELECT REGEXP_REPLACE( address, '^.*\d\s*(\D*)$', '\1' )
AS street_name
FROM address_table;
I am trying to write a statement like this
SELECT CONCAT(street_name, ' ', street_number) as 'street_detail'
FROM geo_map
WHERE CONCAT(street_name, ' ', street_number) LIKE '%'
My table is something like this
postal_code int
building_name nchar(200)
street_number nchar(60)
street_name nchar(120)
The result I get is just the street name, without the street number, although my street number column has values. Any idea what went wrong in my concat?
I am using SQL Server
It is best to use NVARCHAR(...) instead of NCHAR(...) types for storing information like what you have. The reason is that for NCHAR(...) types, strings are padded with trailing spaces to fill the whole length of the field.
A string in an NCHAR(120) field is always 120 characters wide. The concatenation of street_name, a space and the street_number will be 181 characters wide (120 + 1 + 60). The street number will start at the 122nd character in the concatenation.
Perhaps you are not seeing a street number in your concatenation because your display field (in your program, SSMS, webpage, ...) just isn't wide enough.
If you store your street name in an NVARCHAR(120) field, and pretty much all other related information in NVARCHAR(...) fields as well, you will not have that problem. Strings stored in those fields are not padded with trailing spaces, and you will see your street number at the place you expect in your concatenation.
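The padding effect is easy to reproduce outside SQL Server. The Python sketch below simulates fixed-width columns with the widths from the question (street_name nchar(120), street_number nchar(60)); the nchar helper and the sample values are made up, and rstrip() plays the role that RTRIM() would play in the query.

```python
def nchar(value, width):
    """Simulate an NCHAR(width) column: pad with trailing spaces to a fixed width."""
    return value.ljust(width)

street_name = nchar("Main Street", 120)  # made-up sample values
street_number = nchar("42", 60)

padded = street_name + " " + street_number
trimmed = street_name.rstrip() + " " + street_number.rstrip()

print(len(padded))              # 181
print(padded.index("42") + 1)   # 122 (1-based position of the street number)
print(repr(trimmed))            # 'Main Street 42'
```

Trimming each part before concatenating is a possible workaround if changing the column types is not an option.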
I want to identify the number of words occurring after a comma, in a full name field in Oracle database table.
The name field contains format of "LAST, FIRST MIDDLE"
Some names may have up to 4 to 5 names, such as "DOE, JOHN A B"
For example, if the Name field = 'WILLIAMS JR, HANK', it would output 1 (for 1 word occurring after the comma).
If the Name field = 'DOE, JOHN A B', I want it to output 3.
I would like to use a regexp_count function to determine this count.
I am using the following code to identify how many words exist in the field and would like to modify it to include this functionality:
REGEXP_COUNT(REPLACE(fieldname, ',',', '), '[^ ]+')
It would likely have to remove the replace function in order to find the comma, but this was the best I could do so far.
Help is much appreciated!
How about the following:
REGEXP_COUNT(fieldname, '\w+', INSTR(fieldname, ',') + 1)
I have updated the code as follows, which appears to be working as desired:
REGEXP_COUNT(fieldname, '[^ ]+', (INSTR(fieldname, ',')+1))
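To check the counting logic against the examples from the question, the same idea can be sketched in Python (words_after_comma is a made-up name; this is only a test bench, not Oracle code). The slice start plays the role of the third argument, INSTR(fieldname, ',') + 1.

```python
import re

def words_after_comma(name):
    """Count space-separated words after the first comma (all words if there is no comma)."""
    start = name.find(",") + 1  # position just past the comma, or 0 when there is none
    return len(re.findall(r"[^ ]+", name[start:]))

print(words_after_comma("WILLIAMS JR, HANK"))  # 1
print(words_after_comma("DOE, JOHN A B"))      # 3
```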