Delete specific pattern between commas in text file - sql

I have thousands of SQL queries in a text file open in Notepad++, one query per line. Every query contains a comma-separated list of columns to select from the database. We now want to remove from that list any column that matches a specific pattern/regular expression. Each SQL query follows a fixed structure:
A trimmed column is selected with the alias 'PK'.
Every query ends with a date-based where condition.
Sometimes the pattern we wish to remove also occurs in the PK expression, in the where condition, or in both. We don't want to remove it from those places, only from the column selection list.
Below is an example of a SQL query:
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA,TAE_RID_OWNER,TAE_FID_OWNER,TAE_CID_OWNER,TAE_TSP_REC_UPDATE from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
After removing the columns/patterns, the query should look like this:
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
We want to remove the patterns below from every query, between the commas:
.FID.
.RID.
.CID.
.TSP.
If the pattern exists inside a TRIM/DATE function, it should not be touched. It should only be removed from the column selection list.
Could somebody please help me with this? Thanks in advance.

You may use
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$))(?:(?!\sfrom\s).)*?\K,?\s*[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+
Details
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$)) - two alternatives:
\G(?!^) - the end of the previous match, excluding the position at the start of the line
| - or
\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$) - an as surrounded by single whitespace characters, followed by any 0+ chars other than line break chars and then ', 2 digits, /, 2 digits, /, 4 digits and ' at the end of the line
(?:(?!\sfrom\s).)*? - matches any char other than a line break char, 0 or more repetitions, as few as possible, as long as it does not start a whitespace, from, whitespace sequence
\K - a match reset operator discarding all text matched so far
,?\s* - an optional comma followed by 0+ whitespaces
[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+ - uppercase ASCII letters and/or _, 1 or more occurrences, followed by _, then either FID, RID, CID or TSP, then _, and again 1 or more occurrences of uppercase ASCII letters and/or _.
See the regex demo.
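Outside Notepad++ the same clean-up can be scripted. The sketch below is a minimal Python version and is not the regex from this answer: Python's re module supports neither \G nor \K, so the code instead splits each line into the part up to as PK, the column list, and the part starting at from, and filters only the column list. The function name strip_columns and the assumption that every line has the shape select ... as PK,... from ... are mine, not the asker's.

```python
import re

# A column is unwanted if its name contains _FID_, _RID_, _CID_ or _TSP_
# (the same name test as the last part of the answer's regex).
UNWANTED = re.compile(r"^[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+$")

# Assumed line layout: "select <expr> as PK,<columns> from <rest>".
QUERY = re.compile(
    r"^(?P<head>.*?\sas\s+PK\b)(?P<cols>.*?)(?P<tail>\sfrom\s.*)$",
    re.IGNORECASE,
)


def strip_columns(query: str) -> str:
    """Drop unwanted columns from the select list; leave PK and where untouched."""
    m = QUERY.match(query)
    if m is None:
        return query  # line does not have the assumed shape: leave it alone
    cols = [c.strip() for c in m.group("cols").split(",")]
    kept = [c for c in cols if c and not UNWANTED.match(c)]
    middle = "," + ",".join(kept) if kept else ""
    return m.group("head") + middle + m.group("tail")
```

Applied to the example query from the question, this returns the expected query shown there; lines that do not match the assumed layout are returned unchanged.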

Related

SQL script to update all column values starting with number and - with blank in Postgresql

I need to update a varchar column's values
The values start with a number followed by - and then some letters
For Ex: 27-Check This
I need to update this value, i.e. remove the starting number and the -
Expected output Example: Check This
NB: only the starting number and - should be removed; everything after the first letter should be left unchanged. I.e., if a number or - appears after the first letter, it should not be removed.
For ex: 27-Check 23-C This
Expected output: Check 23-C This
NB: I am new to SQL, so please help even if this looks simple to you.
You can use regexp_replace to remove the leading digits:
update the_table
set the_column = regexp_replace(the_column, '^[0-9]{1,}\s*-\s*', '')
where the_column ~ '^[0-9]{1,}'
^[0-9]{1,}\s*-\s* in detail:
^ match at the start of the string
[0-9]{1,} at least one digit
\s* followed by zero or more whitespace characters
- followed by a dash
\s* followed by zero or more whitespace characters
The where clause ensures that only the rows that need changing are updated (e.g. values not starting with a number won't be touched at all).
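To check the pattern outside the database, here is a minimal Python sketch that applies the same regular expression with re.sub. The function name strip_leading_number is hypothetical, and Python's regex flavor is only assumed to agree with PostgreSQL's for this simple pattern.

```python
import re

# Same pattern as in the update statement: leading digits, optional
# whitespace, a dash, optional whitespace.
LEADING_NUMBER = re.compile(r"^[0-9]+\s*-\s*")


def strip_leading_number(value: str) -> str:
    """Remove a leading '<number>-' prefix; later numbers and dashes are kept."""
    return LEADING_NUMBER.sub("", value, count=1)


print(strip_leading_number("27-Check 23-C This"))
```

Because the pattern is anchored with ^, only the prefix at the very start of the value can match, which is why the 23-C in the middle survives.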
If you just want everything after the first hyphen when the pattern starts with a number, you can use:
update t
set col = substring(col from '-(.*)')
where col ~ '^[0-9]+-';
substring() with a pattern is a nice implementation of what would be called regexp_substr() in other databases. It returns the first match of the pattern in the string. The full pattern is matched, but if it contains parentheses, only the portion matched by the first parenthesized group is returned.

Regular expression - capture number between underscores within a sequence between commas

I have a field in a database table in the format:
111_2222_33333,222_444_3,aaa_bbb_ccc
This format is uniform across the entire field: three underscore-separated numeric values, a comma, three more underscore-separated numeric values, another comma and then three underscore-separated text values, with no spaces in between.
I want to extract the middle value from the second numeric sequence, in the example above I want to get 444
In a SQL query I inherited, the regex used is ^.,(\d+)_.$ but this doesn't seem to do anything.
I've tried to identify the first comma, first number after and the following underscore ,222_ to use as a starting point and from there get the next number without the _ after it
This (,\d*_)(\d+[^_]) selects ,222_444 and is the closest I've gotten
We can try using REGEXP_REPLACE with a capture group:
SELECT
REGEXP_REPLACE(
'111_2222_33333,222_444_3,aaa_bbb_ccc',
'^[^,]+,[^_]+_(.*?)_[^_]+,.*$',
'\1') AS num
FROM yourTable;
Here is a demo showing that the first capture group of the regex above contains the value you want.
Demo
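The same pattern can be tried quickly in Python. This is a small sketch, not the SQL from the answer: it reads the capture group directly with re.match instead of replacing the whole string, and the function name middle_of_second_group is made up for illustration.

```python
import re
from typing import Optional

# Same pattern as in the REGEXP_REPLACE call: skip the first comma-separated
# item, skip the first number of the second item, capture the middle number.
PATTERN = re.compile(r"^[^,]+,[^_]+_(.*?)_[^_]+,.*$")


def middle_of_second_group(field: str) -> Optional[str]:
    """Return the middle value of the second item, or None if the format differs."""
    m = PATTERN.match(field)
    return m.group(1) if m else None


print(middle_of_second_group("111_2222_33333,222_444_3,aaa_bbb_ccc"))  # prints 444
```

Returning None for non-matching input is a design choice of this sketch; REGEXP_REPLACE would instead return the input string unchanged when the pattern does not match.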

Regexp for removing all spaces and digits

I am trying to use regular expression in a sql statement to remove all spaces and digits from a string 'text' which may look like
1. 000000123456 (No space)
2. 00000 123456 (spaces in the beginning and end of string)
3. 90330000 45 (2 spaces at the end)
I have been able to come up with the solutions below so far:
select regexp_replace('text','\\s(^[0-9]*)\\s','\\1')
select regexp_replace('text','[[:blank:]]+[^[0-9]*][[:blank:]]+','\\1')
The results I get are:
1. 000000123456
2. 00000 123456
3. 90330000 45
I get the text as is. If I try to just remove the digits using
regexp_replace('text','^[0-9]*','\\1'),
it works fine: all digits are removed and the result is '' (null). But for text containing spaces, neither the digits nor the spaces are removed.
What am I doing wrong here?
Try using a character class which contains both digits and whitespace:
regexp_replace('text', '[0-9\\s]+', '')
This should remove all digits and whitespace. However, it isn't completely clear what you are trying to do, because you did not show us the numbers in context.
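As a quick way to see what the character class does, here is a minimal Python sketch of the same replacement. The function name is hypothetical; in a Python raw string a single backslash is enough, and the doubled backslash in the SQL literal presumably reflects that dialect's string escaping.

```python
import re

# One character class covering both digits and whitespace.
DIGITS_AND_SPACES = re.compile(r"[0-9\s]+")


def strip_digits_and_spaces(text: str) -> str:
    """Remove every digit and every whitespace character from text."""
    return DIGITS_AND_SPACES.sub("", text)


print(repr(strip_digits_and_spaces(" 00000 123456 ")))
```

For the sample values in the question, which consist only of digits and spaces, the result is an empty string.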

Notepad++ rows to columns, in groups

I have found a ton of ways to transpose columns to text in Notepad++ and vice versa. However, where I'm struggling is that I have one column with several rows. I can't 'just' transpose these as the data winds up being in the wrong order.
Example:
RANK
COMPANY
GROWTH
REVENUE
INDUSTRY
1
Skillz
50,058.92%
$54.2m
Software
2
EnviroSolar Power
36,065.06%
$37.4m
Energy
When I transpose this, I wind up with:
RANKCOMPANYGROWTHREVENUEINDUSTRY 1Skillz50,058.92%$54.2mSoftware2EnviroSolar Power36,065.06%$37.4mEnergy
I need everything to remain in groups so I wind up with the following, noting that I also need a delimiter added:
RANK|COMPANY|GROWTH|REVENUE|INDUSTRY
1|Skillz|50,058.92%|$54.2m|Software
2|EnviroSolar Power|36,065.06%|$37.4m|Energy
As you can see with the company EnviroSolar Power, there is a space between "EnviroSolar" and "Power", and anything I've tried winds up removing the spaces that should remain intact when transposing.
I appreciate ANY help you can offer! Thank you in advance!
Assuming that your rows always start with integers (except for the header row, of course) and, furthermore, that only the first column contains integers, you could do that with two search-and-replace operations (Ctrl+H).
Be sure to opt for 'Regular expression' search mode.
First replace all newlines with pipes. This will put everything on one line for now.
Find what: \n
Replace with: |
Next, find all purely numeric fields and make each one the start of a new line to reach the desired result.
Find what: \|([0-9]+)\|
Replace with: \n$1|
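The two replacements above can be sketched in Python to see how they interact. This is only an illustration under the same assumption (only the first column holds pure integers); the function name regroup is made up, and a trailing line break is stripped first so that no stray pipe ends up at the end.

```python
import re


def regroup(text: str) -> str:
    """Join all lines with pipes, then start a new line before each integer field."""
    # Step 1: everything on one line, with | where the line breaks were.
    one_line = "|".join(text.strip().splitlines())
    # Step 2: a purely numeric field between two pipes starts a new row.
    return re.sub(r"\|([0-9]+)\|", r"\n\1|", one_line)


sample = "RANK\nCOMPANY\nGROWTH\nREVENUE\nINDUSTRY\n1\nSkillz\n50,058.92%\n$54.2m\nSoftware\n"
print(regroup(sample))
```

Fields such as 50,058.92% are not affected by the second step because they contain characters other than digits.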
If you know the number of columns (here it is 5), you could do it in two steps:
First:
Ctrl+H
Find what: (?:[^\r\n]+\R){5}
Replace with: $0\n
Replace all
Explanation:
(?: : start non capture group
[^\r\n]+ : 1 or more any character but line break
\R : any kind of line break
){5} : the group must occur 5 times;
here you can give the number of columns of your choice
This will add a line break after every 5 columns.
Check the Regular expression option
Second:
Ctrl+H
Find what: (\R)(?!\R)|(\R\R)
Replace with: (?1|:\n)
Replace all
Explanation:
(\R) : any kind of line break, in group 1
(?!\R) : negative lookahead, makes sure no other line break follows
| : OR
(\R\R) : 2 line breaks, in group 2
Replacement:
(?1 : conditional replacement: does group 1 exist?
| : yes ==> a pipe
:\n : no ==> line break
) : end of the condition
This replaces a single line break with a pipe and 2 consecutive line breaks with a single one.
Result for given example:
RANK|COMPANY|GROWTH|REVENUE|INDUSTRY
1|Skillz|50,058.92%|$54.2m|Software
2|EnviroSolar Power|36,065.06%|$37.4m|Energy
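When the column count is known, the grouping can also be done without regular expressions. The following Python sketch chunks the lines into groups of a fixed size and joins each group with the delimiter; the function name rows_to_table and its parameters are hypothetical, and blank lines are assumed to carry no data.

```python
from typing import Iterable, List


def rows_to_table(lines: Iterable[str], columns: int = 5, delimiter: str = "|") -> List[str]:
    """Group consecutive lines into rows of `columns` cells joined by `delimiter`."""
    # Drop blank lines and surrounding whitespace; inner spaces are preserved.
    cells = [line.strip() for line in lines if line.strip()]
    rows = [cells[i:i + columns] for i in range(0, len(cells), columns)]
    return [delimiter.join(row) for row in rows]


sample = ["RANK", "COMPANY", "GROWTH", "REVENUE", "INDUSTRY",
          "1", "Skillz", "50,058.92%", "$54.2m", "Software"]
for row in rows_to_table(sample):
    print(row)
```

Only whole lines are treated as cells, so a value like EnviroSolar Power keeps its inner space.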

match line that doesnt contain certain words

I have the following string:
ignoreword1,word1, ignoreword2
I would like to match any word that is not ignoreword1 or ignoreword2.
This is what I have so far:
(?s)^((?!ignoreword1).)*$
The main goal is to use the regex in a PostgreSQL database to select rows where the column matches a substring after removing "ignoreword1", "ignoreword2" and the comma ",".
To match any word that is not ignoreword1 or ignoreword2, use
\b(?!(?:ignoreword1|ignoreword2)\b)\w+
In PostgreSQL, word boundaries are [[:<:]] and [[:>:]], so use something like:
[[:<:]](?!(?:ignoreword1|ignoreword2)[[:>:]])[a-zA-Z]+
Pattern details:
[[:<:]] - leading word boundary
(?!(?:ignoreword1|ignoreword2)[[:>:]]) - fail the match if the whole word is either ignoreword1 or ignoreword2
[a-zA-Z]+ - one or more ASCII letters.
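The generic \b form of the pattern can be tried in Python, whose re module supports \b and lookaheads. This is a small sketch for experimenting only, with a made-up function name; it does not use the PostgreSQL-specific [[:<:]] and [[:>:]] syntax.

```python
import re
from typing import List

# A word boundary, then a lookahead that rejects the two ignored words
# when they are complete words, then the word itself.
WORD = re.compile(r"\b(?!(?:ignoreword1|ignoreword2)\b)\w+")


def words_to_keep(text: str) -> List[str]:
    """Return every word in text except ignoreword1 and ignoreword2."""
    return WORD.findall(text)


print(words_to_keep("ignoreword1,word1, ignoreword2"))  # prints ['word1']
```

The \b inside the lookahead matters: without it, a longer word that merely starts with ignoreword1 would be rejected as well.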