This is a bit above my level, but I am trying to learn. I don't want to seem like I'm just trying to get my homework done, but I would appreciate any help or pointers.
I am trying to find a substring (a postcode) in an address column and, once found, copy it to the postcode column.
I have the following SQL, which finds rows whose address matches a postcode pattern.
SELECT Address
FROM tb_member
WHERE (Address LIKE '%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%')
Next I presume I need to find the substring index...
This is where I start to get a little flummoxed - Am I heading in the right direction?
So you know you want to SUBSTRING a value - look at what the function requires to make it work:
The string value
The starting point of the substring you want to capture
The length of the substring you want
In SQL Server/TSQL, PATINDEX will be better for this situation than CHARINDEX to get that starting point of the substring.
I gather you know how long the substring will always be?
PATINDEX will return the substring index for you.
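As a rough sketch of how those pieces might fit together: this assumes the postcode is always 8 characters long, as the NNN-NNNN pattern in the question implies, and that the target column is called PostCode, which is a guess at the name.

```sql
-- Preview: the start position PATINDEX finds, and the 8 characters from there.
SELECT Address,
       PATINDEX('%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', Address) AS StartPos,
       SUBSTRING(Address,
                 PATINDEX('%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', Address),
                 8) AS PostCode
FROM tb_member
WHERE Address LIKE '%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%';

-- Copy step. PostCode is an assumed column name.
UPDATE tb_member
SET PostCode = SUBSTRING(Address,
                         PATINDEX('%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', Address),
                         8)
WHERE Address LIKE '%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%';
```

The WHERE clause matters here: PATINDEX returns 0 when nothing matches, and SUBSTRING starting at 0 would then return the first few characters of the address rather than a postcode.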
Related
I am fairly new to regular expressions and have always had trouble following them. It would be really helpful if I could get an answer to the following problem.
I have a column of strings in a Redshift table and want to extract a certain part of each string (the part after the last '/'). For example, I have https://hello.com/my_first_website in the column customer_site, and from this I want to extract my_first_website. Can someone tell me a regular expression that would extract this?
You can use the regexp_substr function, for example:
SELECT regexp_substr('https://hello.com/my_first_website','[^/]*$')
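Applied to the column from the question, that might look like the following. The table name my_table is a placeholder, since the question does not give one.

```sql
-- my_table is a hypothetical table name; customer_site is the column from the question.
SELECT customer_site,
       regexp_substr(customer_site, '[^/]*$') AS last_segment
FROM my_table;
```

The pattern [^/]*$ matches the run of non-slash characters that ends the string, so a URL with a trailing slash would yield an empty string.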
I'm not sure if it is possible to do what I'm trying to do, but I thought I would give it a shot anyway. Also, I am fairly new to the SQL Server world and this is my first post, so I apologize if my wording is poor or if I leave information out. I am working with SQL Server 2005.
Let's say I have a table named "table" and a column named "column". The contents of the column are a jumbled mess of characters (ntext data type). These characters were all drawn in from multiple entry fields in a front-end application. One of those entry fields was for sensitive information that we no longer need and would like to get rid of, but I can't just drop the whole column because it also contains other valuable information. Most of the solutions I have found so far only deal with columns that have short entries, so they can simply update the whole string. For mine, I think I need to identify the beginning and the end of the specific substring and replace or delete it somehow. This is the closest I have gotten to at least selecting the data that I need. AAA and /AAA mark the beginning and the end of the substring that I need.
select
substring (column, charindex ('AAA', column), charindex ('/AAA',column))
from table
where column like '%/AAA%'
The problems I am having with this one are that the substring doesn't stop at /AAA, it just keeps going, and some of the results are just blank so it looks something like:
AAA 12345 /AAA abcdefghijklmnop
AAA 12346 /AAA qrstuvwxyzabcdef
AAA 12347 /AAA abcdefghijklmnop
With the characters in bold being the information I need to get rid of. Also even though row 3 is blank, it still does contain the info that I need but I'm guessing that it isn't returning it because it has a different amount of characters before it (for example, rows 1, 2, and 4 might have 50 characters before them but row 3 might have 100 characters before it), at least that's the only reason that I could think of.
So I suppose the first step would probably be to actually select the right substring, then to either delete it or replace it with a different, meaningless substring like "111111" or something.
If there is more information that you need to be provided with or if I was unclear about anything please let me know and thank you for taking the time to read through (and hopefully answer) my question!
Edit: Another one that gets close to the right results goes something like this
select substring(column,charindex('AAA',column),20) from table
where column like '%/AAA%'
I'm not sure if this approach would work better, since the substring I'm looking for is always going to have the same number of characters. The problem with this one, though, is that instead of blank rows I get irrelevant substrings from that column, while all of the other rows return exactly what I want.
First of all, check your usage of SUBSTRING(). The third argument is for length, not end character, so you would need to alter your query to something like:
select substring (column, charindex ('AAA',column)
, charindex ('/AAA',column)-charindex ('AAA',column))
from table where column like '%/AAA%'
Yes, your approach of finding it and then either deleting or replacing it is sound.
If some of the results are blank, it's possible that you are finding and replacing the entire string. If the LIKE pattern had not matched, the row would not have been returned at all, which is different from returning a blank value in that column.
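A minimal sketch of the find-then-replace step, using a made-up sample value in a variable rather than the real column. The sample string and the replacement text '111111' (borrowed from the question) are illustrative only. The + 4 is the length of '/AAA', so the closing marker is included in the span.

```sql
-- Hypothetical sample value standing in for one row of the column.
DECLARE @s nvarchar(max);
SET @s = N'name=Bob AAA 12345 /AAA abcdefghijklmnop';

DECLARE @start int, @end int;
SET @start = CHARINDEX(N'AAA', @s);           -- position of the opening marker
SET @end   = CHARINDEX(N'/AAA', @s, @start);  -- position of the closing marker, searched from @start

SELECT SUBSTRING(@s, @start, @end - @start + 4)          AS marked_span, -- AAA 12345 /AAA
       STUFF(@s, @start, @end - @start + 4, N'111111')   AS redacted;    -- name=Bob 111111 abcdefghijklmnop
```

For the real ntext column, the value would probably need a CAST(column AS nvarchar(max)) first, since as far as I know CHARINDEX and STUFF are documented for character types rather than ntext.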
We ran into an issue where we need to compare two numeric strings stored as varchar, for example '123456' and '123465', where characters could be swapped at any position in the string. I have no clue what to even Google for help with this, but my hope would be to assign a match ranking percentage. Is that even feasible? Any direction would be extremely appreciated.
You might google "Levenshtein distance". Here's a potentially relevant answer:
Levenshtein distance in T-SQL
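If a full Levenshtein implementation is more than you need, a much simpler position-by-position comparison can also produce a percentage. To be clear, this is not Levenshtein distance: it only counts positions where both strings hold the same character, so an inserted or deleted character would throw off everything after it. The variable names are made up for the sketch.

```sql
DECLARE @a varchar(50), @b varchar(50);
SET @a = '123456';
SET @b = '123465';

DECLARE @i int, @matches int, @len int;
SET @i = 1;
SET @matches = 0;
-- Score against the longer string so extra characters count as mismatches.
SET @len = CASE WHEN LEN(@a) > LEN(@b) THEN LEN(@a) ELSE LEN(@b) END;

WHILE @i <= @len
BEGIN
    -- Only count a match where both strings actually have a character at position @i.
    IF @i <= LEN(@a) AND @i <= LEN(@b)
       AND SUBSTRING(@a, @i, 1) = SUBSTRING(@b, @i, 1)
        SET @matches = @matches + 1;
    SET @i = @i + 1;
END

-- NULLIF avoids a divide-by-zero when both strings are empty.
SELECT 100.0 * @matches / NULLIF(@len, 0) AS match_percent;
```

For the two sample strings, four of the six positions agree, so the result is roughly 66.7 percent.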
I want to select out just the email address using SubString.
Here is my column data:
[{"IsPrimary":false,"Address":"test@gmail.com","Type":"Other"}]
Here is my Query:
SELECT SUBSTRING(EmailJson, CHARINDEX('ess":"', EmailJson)+6, CHARINDEX('","Type', EmailJson)) From Respondents
Problem is that it isn't working the way I thought substring would work. I expected it to give me a range of characters. For example I want substring to return a range of characters like 5-10. The way this substring works is that I establish the start and then how long I want it to be from the start position.
How can I alter my query to return just the email from the column?
I agree with the above comments that this is not an elegant way of doing it, but if you really need to use substring then have a look at the below.
I have changed this to work with Oracle because that is what I have available, and I am unsure what you are using, but you should be able to get the idea from it.
SELECT substr(EmailJson, instr(EmailJson, 'ess":"') + 6, instr(EmailJson, '","Type') - (instr(EmailJson, 'ess":"') + 6)) FROM Respondents;
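Since the question uses CHARINDEX and SUBSTRING, here is a sketch of the same idea in T-SQL, assuming that is the dialect in use. The key point is that the third argument is a length, so it is computed as the end position minus the start position. The variable stands in for one value of the EmailJson column.

```sql
-- Sample value in a variable so the snippet runs on its own.
DECLARE @EmailJson varchar(200);
SET @EmailJson = '[{"IsPrimary":false,"Address":"test@gmail.com","Type":"Other"}]';

DECLARE @start int;
SET @start = CHARINDEX('ess":"', @EmailJson) + 6;  -- first character of the address

-- Length = position of the closing delimiter minus the start position.
SELECT SUBSTRING(@EmailJson,
                 @start,
                 CHARINDEX('","Type', @EmailJson, @start) - @start) AS Email;
-- returns test@gmail.com
```

Against the table, the same expression would be written with the EmailJson column in place of the variable and FROM Respondents added.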
I am trying to use sql pattern matching to check if a string value is in the correct format.
The string code should have the correct format of:
alphanumericvalue.alphanumericvalue
Therefore, the following are valid codes:
D0030.2190
C0052.1925
A0025.2013
And the following are invalid codes:
D0030
.2190
C0052.
A0025.2013.
A0025.2013.2013
So far I have the following SQL IF clause to check that the string is correct:
IF @vchAccountNumber LIKE '_%._%[^.]'
I believe that the "_%" part checks for 1 or more characters. Therefore, this statement checks for one or more characters, followed by a "." character, followed by one or more characters and checking that the final character is not a ".".
It seems that this would work for all combinations except for the following format which the IF clause allows as a valid code:
A0025.2013.2013
I'm having trouble correcting this IF clause to allow it to treat this format as incorrect. Can anybody help me to correct this?
Thank you.
This stackoverflow question mentions using word-boundaries: [[:<:]] and [[:>:]] for whole word matches. You might be able to use this since you don't have spaces in your code.
This is ANSI SQL solution
This LIKE expression finds any value that is not of the form alphanumeric.alphanumeric, so NOT LIKE keeps only the values that match the format you want:
IF @vchAccountNumber NOT LIKE '%[^A-Z0-9].[^A-Z0-9]%'
However, based on your examples, you can use this...
LIKE '[A-Z][0-9][0-9][0-9][0-9].[0-9][0-9][0-9][0-9]'
...or one like this if you have 5 alphanumerics, a dot, then 4 alphanumerics
LIKE '[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9].[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]'
The 2nd approach is slightly more obvious for fixed-length values. The 1st is slightly less intuitive but works with variable-length codes on either side of the dot.
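One caveat with a single NOT LIKE pattern is that a value with no dot at all (such as D0030) never matches the pattern and so passes the check. For variable-length codes, a sketch that combines three separate LIKE tests may be easier to reason about. The table variable and its contents are just the sample codes from the question, and how [A-Z] treats lowercase letters depends on the collation.

```sql
-- Sample codes from the question, loaded into a table variable for testing.
DECLARE @codes TABLE (code varchar(50));
INSERT INTO @codes (code)
SELECT 'D0030.2190' UNION ALL
SELECT 'D0030' UNION ALL
SELECT '.2190' UNION ALL
SELECT 'C0052.' UNION ALL
SELECT 'A0025.2013.' UNION ALL
SELECT 'A0025.2013.2013';

SELECT code,
       CASE WHEN code LIKE '_%._%'               -- at least one character on each side of a dot
             AND code NOT LIKE '%.%.%'           -- no second dot
             AND code NOT LIKE '%[^A-Z0-9.]%'    -- nothing other than letters, digits and the dot
            THEN 'valid' ELSE 'invalid' END AS result
FROM @codes;
```

With these rows, only D0030.2190 comes back as valid. The same three conditions can be joined with AND inside the IF on @vchAccountNumber.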