Oracle SQL REGEXP_SUBSTR() and SUBSTR() issue - sql

I am trying to modify a query that I have created to bring back only records that DO NOT start with the string 'PO:'
The records I want are all different in the way they start but none of the ones I want start with 'PO:'. Some may start with numbers or some may start with other words.
I know that with REGEXP_SUBSTR() or just SUBSTR() I can pull back data based solely on numbers or letters, but how do I exclude certain words/strings?
Any help is appreciated!

You can use NOT LIKE.
Example: yourcolumn NOT LIKE 'PO:%'
This keeps every value that does not begin with 'PO:'.
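A quick way to sanity-check the filter is to run it against a few made-up rows. The sketch below uses Python's built-in sqlite3 module rather than Oracle, so the table name records, the column val, and the sample values are all assumptions for illustration. It uses the pattern 'PO:%' so that the colon from the question is part of the prefix. One difference to keep in mind: SQLite's LIKE is case-insensitive for ASCII by default, while Oracle's is case-sensitive.

```python
import sqlite3

# In-memory stand-in for the real table (names and rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (val TEXT)")
conn.executemany(
    "INSERT INTO records (val) VALUES (?)",
    [("PO: 1001",), ("12345 widget",), ("Invoice 77",), ("POST office",), (None,)],
)

# Keep only rows whose value does not start with the literal prefix 'PO:'.
kept = [
    row[0]
    for row in conn.execute(
        "SELECT val FROM records WHERE val NOT LIKE 'PO:%' ORDER BY rowid"
    )
]
print(kept)
```

The NULL row is dropped as well, because NOT LIKE evaluates to NULL for it. If such rows should be kept, adding OR val IS NULL to the WHERE clause is one option.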

Related

Hive Regular expression - only portion of string needed

Hi, I was trying to extract a portion of the data from one column in my Hive table, but the position of the characters I need is not fixed.
select value4,regexp_extract(value4,'*****',0) from hive_table;
column value is shown below
grade:data:home made;Cat;dinnerbox_grade_Enroll
list:date:may;animal;dinnerbox_list_value
cgrade:made_data;dinnerbox_cgrade_notEnroll
I want the data from dinnerbox to the end.
Can anyone help with this?
It is a pretty simple regular expression
.*dinnerbox(.*?)$
The leading .* is greedy, so the match locks onto the last dinnerbox in the string, and anchoring the non-greedy group with $ forces it to capture everything from there to the end of the line.
You want capture group 1
To get rid of the _ you can use
.*dinnerbox_(.*?)$
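To see what capture group 1 holds for each pattern, here is a small check against the three sample values from the question. It uses Python's re module as a stand-in; Hive uses Java regular expressions, and the two engines are assumed to agree for patterns this simple. In Hive the group would presumably be selected by passing 1 as the third argument of regexp_extract.

```python
import re

values = [
    "grade:data:home made;Cat;dinnerbox_grade_Enroll",
    "list:date:may;animal;dinnerbox_list_value",
    "cgrade:made_data;dinnerbox_cgrade_notEnroll",
]

# The two patterns from the answer: with and without the leading underscore.
with_underscore = re.compile(r".*dinnerbox(.*?)$")
without_underscore = re.compile(r".*dinnerbox_(.*?)$")

# Group 1 is everything after "dinnerbox" (or after "dinnerbox_").
tails = [with_underscore.match(v).group(1) for v in values]
trimmed = [without_underscore.match(v).group(1) for v in values]

print(tails)    # ['_grade_Enroll', '_list_value', '_cgrade_notEnroll']
print(trimmed)  # ['grade_Enroll', 'list_value', 'cgrade_notEnroll']
```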

How to select values around .(dot) using sql

I am running below query in Teradata :
sel requesttext from dbc.tables
where tablename='old_employee_table'
Result:
alter table DB_NAME.employee_table,no fallback ;
I want to get below result using SQL:
DB_NAME.employee_table
Requesttext can be:
create set table DB_NAME.employee_table;
The DB name and table name can occur anywhere in the result. Since a . (dot) joins them, I want to split on the dot.
Basically, I need SQL that returns the values surrounding the . (dot).
I want the DB name and table name in the result.
I'm not a Teradata person, but this should work for both strings given so far, as long as teradata's regexp_substr() supports positive look-behind and positive look-ahead assertions (I might have the Teradata syntax wrong, so a little tweaking may be needed):
SELECT REGEXP_SUBSTR(requesttext, '(?<= )(\w+\.\w+)(?=[,$]?)', 1, 1)
FROM dbc.tables
WHERE tablename='old_employee_table'
See the regex101 example. Hopefully it translates to Teradata easily.
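The regex looks for and returns the words on either side of, and including, the period when they are preceded by a space. The trailing look-ahead is meant to require an optional comma or the end of the line, but because [,$]? may match nothing (and $ inside brackets is a literal dollar sign), it does not actually restrict the match.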
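The pattern can be tried on both request texts from the question without a Teradata system. The sketch below runs it through Python's re module, which is only an assumption about how Teradata's regexp_substr() would behave, and then splits the match on the dot to get the two parts separately.

```python
import re

# Same pattern as in the query above.
pattern = re.compile(r"(?<= )(\w+\.\w+)(?=[,$]?)")

samples = [
    "alter table DB_NAME.employee_table,no fallback ;",
    "create set table DB_NAME.employee_table;",
]

# First match in each string, then split it into (database, table).
names = [pattern.search(s).group(1) for s in samples]
db_and_table = [tuple(n.split(".")) for n in names]

print(names)         # ['DB_NAME.employee_table', 'DB_NAME.employee_table']
print(db_and_table)  # [('DB_NAME', 'employee_table'), ('DB_NAME', 'employee_table')]
```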
You could do this with either regexp_substr() or strtok().
As Jamie Zawinski said:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
So I would go with the strtok() method. Also I'm lazy and regular expressions are hard.
Function strtok() takes three arguments:
The string being split
The delimiter to split the string
The number of the token to grab.
To get at the <database>.<table> from that string that is returned in your query, we can split by a space, grab the third token, then split that by a comma and grab the first token.
That would look like:
SELECT strtok(strtok(requestText,' ',3),',',1)
FROM dbc.tables
WHERE tablename='old_employee_table'
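The token arithmetic is easy to check outside the database. The helper nth_token below is a hypothetical Python stand-in for strtok(): it assumes 1-based token numbers, a single-character delimiter, and that empty tokens are skipped. It also shows a limitation worth knowing: the fixed position 3 fits the "alter table" text, but in the "create set table" form from the question the name is the fourth token.

```python
def nth_token(text, delimiter, n):
    """Hypothetical stand-in for strtok(text, delimiter, n); n is 1-based."""
    tokens = [t for t in text.split(delimiter) if t]  # skip empty tokens
    return tokens[n - 1] if 0 < n <= len(tokens) else None


def db_table(request_text, position=3):
    """Split on spaces, take one token, then cut it at the first comma."""
    chunk = nth_token(request_text, " ", position)
    return nth_token(chunk, ",", 1) if chunk else None


alter_stmt = "alter table DB_NAME.employee_table,no fallback ;"
create_stmt = "create set table DB_NAME.employee_table;"

print(db_table(alter_stmt))      # DB_NAME.employee_table
print(db_table(create_stmt))     # table
print(db_table(create_stmt, 4))  # DB_NAME.employee_table;
```

If both statement shapes occur in the data, the token position has to depend on the statement type, and a trailing semicolon may need to be stripped as well.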

SQL Server 2005 Update/Delete Substring of a Lengthy Column

I'm not sure if it is possible to do what I'm trying to do, but I thought I would give it a shot anyway. Also, I am fairly new to the SQL Server world and this is my first post, so I apologize if my wording is poor or if I leave information out. I am working with SQL Server 2005.
Let's say I have a table named "table" and a column named "column". The contents of column are a jumbled mess of characters (ntext data type). These characters were all drawn in from multiple entry fields in a front-end application. One of those entry fields was for sensitive information that we no longer need and would like to get rid of, but I can't just drop the whole column because it also contains other valuable information. Most of the solutions I have found so far only deal with columns that have short entries, so they can simply update the whole string, but for mine I think I need to identify the beginning and the end of the specific substring and replace or delete it somehow. This is the closest I have gotten to at least selecting the data that I need. AAA and /AAA mark the beginning and the end of the substring that I need.
select
substring (column, charindex ('AAA', column), charindex ('/AAA',column))
from table
where column like '%/AAA%'
The problems I am having with this one are that the substring doesn't stop at /AAA, it just keeps going, and some of the results are just blank so it looks something like:
AAA 12345 /AAA abcdefghijklmnop
AAA 12346 /AAA qrstuvwxyzabcdef
(blank)
AAA 12347 /AAA abcdefghijklmnop
With the characters in bold being the information I need to get rid of. Also even though row 3 is blank, it still does contain the info that I need but I'm guessing that it isn't returning it because it has a different amount of characters before it (for example, rows 1, 2, and 4 might have 50 characters before them but row 3 might have 100 characters before it), at least that's the only reason that I could think of.
So I suppose the first step would probably be to actually select the right substring, then to either delete it or replace it with a different, meaningless substring like "111111" or something.
If there is more information that you need to be provided with or if I was unclear about anything please let me know and thank you for taking the time to read through (and hopefully answer) my question!
Edit: Another one that gets close to the right results goes something like this
select substring(column,charindex('AAA',column),20) from table
where column like '%/AAA%'
I'm not sure if this approach would work better, since the substring I'm looking for is always going to have the same number of characters. The problem with this one, though, is that instead of blank rows I get irrelevant substrings from that column, while all of the other rows return exactly what I want.
First of all, check your usage of SUBSTRING(). The third argument is for length, not end character, so you would need to alter your query to something like:
select substring (column, charindex ('AAA',column)
, charindex ('/AAA',column)-charindex ('AAA',column))
from table where column like '%/AAA%'
Yes, your approach of finding it and then either deleting or replacing it is sound.
If some of the results are blank, it's possible that you are finding and replacing the entire string. If the LIKE pattern had not matched, the row would not have been returned at all, which is different from returning a blank value in that column.
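The length arithmetic can be tried end to end on made-up rows. The sketch below uses Python's built-in sqlite3 module, not SQL Server, so the table t, the column col, and the sample strings are assumptions; SQLite's substr and instr(haystack, needle) stand in for SUBSTRING and CHARINDEX(needle, haystack). It adds 4 to the length so that the closing /AAA marker is included in the span, then swaps the span for a placeholder. Whether CHARINDEX and SUBSTRING accept an ntext column directly on SQL Server 2005 is worth checking; a cast may be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [
    ("name=Bob AAA 12345 /AAA city=Rome",),
    ("a much longer prefix than before AAA 12346 /AAA x",),
    ("no marker here",),
])

# Start at 'AAA'; length = (position of '/AAA' + 4) - position of 'AAA',
# so the span covers both markers regardless of how long the prefix is.
span = "substr(col, instr(col, 'AAA'), instr(col, '/AAA') + 4 - instr(col, 'AAA'))"
only_marked = "col LIKE '%AAA%/AAA%'"

marked = [r[0] for r in conn.execute(
    f"SELECT {span} FROM t WHERE {only_marked} ORDER BY rowid")]

# Replace the sensitive span with a meaningless placeholder.
conn.execute(f"UPDATE t SET col = replace(col, {span}, '111111') WHERE {only_marked}")
scrubbed = [r[0] for r in conn.execute("SELECT col FROM t ORDER BY rowid")]

print(marked)
print(scrubbed)
```

Running the SELECT first and inspecting marked before issuing the UPDATE is a cheap safeguard, since the UPDATE cannot be undone without a backup or a transaction.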

SQL Substring a range of data

I want to select out just the email address using SubString.
Here is my column data:
[{"IsPrimary":false,"Address":"test#gmail.com","Type":"Other"}]
Here is my Query:
SELECT SUBSTRING(EmailJson, CHARINDEX('ess":"', EmailJson)+6, CHARINDEX('","Type', EmailJson)) From Respondents
Problem is that it isn't working the way I thought substring would work. I expected it to give me a range of characters. For example I want substring to return a range of characters like 5-10. The way this substring works is that I establish the start and then how long I want it to be from the start position.
How can I alter my query to return only the email from the column?
I agree with the above comments that this is not an elegant way of doing it but if you really need to use substring then have a look at the below.
I have changed this to work with Oracle because that is what I have available and I am unsure what you are using, but you should be able to get the idea from it. The start is the position just after 'ess":"', and the length is the position of '","Type' minus that start.
SELECT substr(EmailJson, instr(EmailJson, 'ess":"') + 6, instr(EmailJson, '","Type') - (instr(EmailJson, 'ess":"') + 6)) FROM Respondents;
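The start-plus-length calculation can be verified on the sample row from the question. This sketch runs it in SQLite through Python's sqlite3 module, on the assumption that SQLite's substr and instr behave like Oracle's SUBSTR and INSTR for this two-argument use; the table is recreated in memory with the single sample value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Respondents (EmailJson TEXT)")
conn.execute(
    "INSERT INTO Respondents VALUES (?)",
    ('[{"IsPrimary":false,"Address":"test#gmail.com","Type":"Other"}]',),
)

# start  = first character after 'ess":"' (the marker is 6 characters long)
# length = position of '","Type' minus start
query = """
SELECT substr(
    EmailJson,
    instr(EmailJson, 'ess":"') + 6,
    instr(EmailJson, '","Type') - (instr(EmailJson, 'ess":"') + 6)
)
FROM Respondents
"""
email = conn.execute(query).fetchone()[0]
print(email)  # test#gmail.com
```

This only holds while "Address" is immediately followed by "Type" in the stored text; if the key order can vary, parsing the value as JSON is the more robust route.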

Removing extraneous characters in column using T-SQL

I am attempting to remove extraneous characters from data in a primary key column..the data in this column serves as a control number, and the extra characters are preventing a Web application from effectively interacting with the data.
As an example, one row may look like this:
ocm03204415 820302
I want to remove everything after the space...so the characters '820302'. I could manually do it, but, there are around 2,000 records that have these extra values in the column. It would be great if I could remove them programmatically. I can't do a simple Replace because the characters have no pattern...I couldn't define a rule to discover them...the only thing uniform is the space...although, now that I look at the data set, they do all start with 8.
Is there a way I could remove these characters programmatically? I am familiar with PL/SQL in the Oracle environment, and was wondering if Transact-SQL would offer some possibilities in the MS-SQL environment?
Thanks so much.
You may want to look into the CHARINDEX function to find the space. Then you can use SUBSTRING to grab everything up to the space in a single UPDATE statement.
Try this:
UPDATE YourTable
SET YourColumn = LEFT(YourColumn,CHARINDEX(' ',YourColumn)-1)
WHERE CHARINDEX(' ',YourColumn) > 1
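Before touching roughly 2,000 real rows, the same logic can be rehearsed on a scratch copy. The sketch below is an approximation in SQLite via Python's sqlite3 module: substr(x, 1, n) stands in for LEFT(x, n) and instr(x, ' ') for CHARINDEX(' ', x), and the extra sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (YourColumn TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?)",
    [("ocm03204415 820302",), ("ocm03204416",), ("ocm03204417 820915",)],
)

# Keep everything before the first space; rows without a space are skipped.
conn.execute("""
    UPDATE YourTable
    SET YourColumn = substr(YourColumn, 1, instr(YourColumn, ' ') - 1)
    WHERE instr(YourColumn, ' ') > 1
""")

cleaned = [r[0] for r in conn.execute(
    "SELECT YourColumn FROM YourTable ORDER BY YourColumn")]
print(cleaned)  # ['ocm03204415', 'ocm03204416', 'ocm03204417']
```

Because the column is a primary key, the update would fail if two rows trimmed to the same control number, so checking for such collisions with a SELECT first is a sensible precaution.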