Regex to split values in PostgreSQL

I have a list of values coming from a PGSQL database that looks something like this:
198
199
1S
2
20
997
998
999
C1
C10
A
I'm looking to parse this field a bit into individual components, which I assume would take two regexp_replace function uses in my SQL. Essentially, any non-numeric character that appears before numeric ones needs to be returned for one column, and the other column would show all non-numeric characters appearing AFTER numeric ones.
The above list would then be split into this layout as the result from PG:
I have created a function that strips out the non-numeric characters (the last column) and casts it as an Integer, but I can't figure out the regex to return the string values prior to the number, or those found after the number.
All I could come up with so far, with my next to non-existent regex knowledge, was this: regexp_replace(fieldname, '[^A-Z]+', '', 'g'), which just strips out anything that is not A-Z, but I can't get it to work with strings before numeric values, or after them.

For extracting the characters before the digits:
regexp_replace(fieldname, '\d.*$', '')
For extracting the characters after the digits:
regexp_replace(fieldname, '^([^\d]*\d*)', '')
Note that:
If there are no digits, the first expression returns the original value and the second an empty string, so the concatenation still equals the original value in this case.
The concatenation of the three parts will not reproduce the original value if non-numeric characters are surrounded by digits (for example 1A2): the number column then merges all digits, while the second expression keeps everything after the first run of digits.
This also works for any non-alphanumeric characters such as #, [, ! and so on.
Final SQL
select
fieldname as original,
regexp_replace(fieldname, '\d.*$', '') as before_s,
regexp_replace(fieldname, '^([^\d]*\d*)', '') as after_s,
cast(nullif(regexp_replace(fieldname, '[^\d]', '', 'g'), '') as integer) as number
from mytable;
See fiddle.
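
To sanity-check the three patterns outside the database, here is a minimal sketch in Python. It assumes Python's re module treats these simple patterns the same way PostgreSQL does; split_value is a hypothetical helper name, not part of the answer.

```python
import re

def split_value(value):
    """Mirror the three regexp_replace calls from the query above."""
    # count=1 mimics regexp_replace without the 'g' flag (first match only).
    before_s = re.sub(r'\d.*$', '', value, count=1)
    after_s = re.sub(r'^([^\d]*\d*)', '', value, count=1)
    # The 'g' flag in SQL corresponds to replacing every match here.
    digits = re.sub(r'[^\d]', '', value)
    # nullif(..., '') yields NULL for values without digits; None plays that role.
    number = int(digits) if digits else None
    return before_s, after_s, number

for v in ['198', '1S', 'C10', 'A']:
    print(v, split_value(v))
```

Values such as 1S and C10 come straight from the sample list in the question, so the printed tuples can be compared with the layout you expect.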

This answer relies on the information you provided, which is:
Essentially, any non-numeric character that appears before numeric
ones needs to be returned for one column, and the other column would
show all non-numeric characters appearing AFTER numeric ones.
Everything non-numeric before a numeric value goes into the first column.
Everything non-numeric after a numeric value goes into the second column.
So the assumption is that every value contains a numeric part.
select
val,
regexp_matches(val,'([a-zA-Z]*)\d+') AS before_numeric,
regexp_matches(val,'\d+([a-zA-Z]*)') AS after_numeric
from
val;
Attached SQLFiddle for a preview.
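
The capture-group approach can be tried out in Python as a rough sketch. before_after is a hypothetical helper, and returning None for a value without digits is an assumption that stands in for the "no match" case, which this answer does not handle.

```python
import re

def before_after(val):
    """Return (letters before the digits, letters after the digits)."""
    before = re.search(r'([a-zA-Z]*)\d+', val)
    after = re.search(r'\d+([a-zA-Z]*)', val)
    if before is None or after is None:
        # No digits in the value: neither pattern can match.
        return None
    return before.group(1), after.group(1)

for v in ['C10', '1S', '999', 'A']:
    print(v, before_after(v))
```

A value such as A, which has no digits, yields None here, which is why the answer's assumption about a numeric part matters.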

Related

Trim Leading Zeroes Only If Numeric

I have a column containing a mix of numeric and alphanumeric values. When a value is strictly numeric, the database stores it with leading zeroes (though not always); otherwise it does not.
Here's some sample data:
I need to use these values as part of a string that I will use to join to another table. Unfortunately, the portion of the string that corresponds to this field in the other table snips off the leading zeroes of any of the numeric-only values. I'm stumped finding a method of snipping the leading zeroes ONLY in this case.
I found this solution, but it's not for SQL Server (2012): "Trim leading zeroes if it is numeric and not trim zeroes if it is alphanumeric".
I also saw this, but it also removes the leading zeroes from the hyphenated values shown in the example, which doesn't work: "Better techniques for trimming leading zeros in SQL Server?".
Help! Thanks!

You could use:
select (case when col not like '%[^0-9]%'
then convert(varchar(255), try_convert(numeric(38), col))
else col
end)
This works for up to 38 digits after the leading zeros, since 38 is the maximum precision of numeric.
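
The rule behind the CASE expression (strip zeroes only when the value is digits and nothing else) can be sketched in Python to check it against sample values. trim_leading_zeroes is a hypothetical name, and the test values are made up for illustration since the original sample data is not shown.

```python
import re

def trim_leading_zeroes(col):
    """Strip leading zeroes only when the value consists solely of digits."""
    # Same idea as  col NOT LIKE '%[^0-9]%'  in the query: no non-digit present.
    if re.fullmatch(r'[0-9]+', col):
        # Round-tripping through int drops the leading zeroes.
        return str(int(col))
    # Anything with letters, hyphens, etc. is returned untouched.
    return col

for v in ['00123', '000', '00-12', 'A001']:
    print(v, '->', trim_leading_zeroes(v))
```

Unlike the SQL version, this sketch returns an empty string unchanged; how the query treats empty strings is something to verify on your own data.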

The database does not store anything in varchar (text) fields except what you give it. If you give it leading zeroes, it will save them; it has no reason not to, as it's just a piece of text.
For your problem, you can do this:
ISNULL(CAST(TRY_CAST(field AS numeric(38)) AS varchar(insert_field_length)), field)

Regular expression - capture number between underscores within a sequence between commas

I have a field in a database table in the format:
111_2222_33333,222_444_3,aaa_bbb_ccc
This format is uniform across the entire field: three underscore-separated numeric values, a comma, three more underscore-separated numeric values, another comma, and then three underscore-separated text values, with no spaces in between.
I want to extract the middle value from the second numeric sequence; in the example above I want to get 444.
In a SQL query I inherited, the regex used is ^.,(\d+)_.$ but this doesn't seem to do anything.
I've tried to identify the first comma, first number after and the following underscore ,222_ to use as a starting point and from there get the next number without the _ after it
This (,\d*_)(\d+[^_]) selects ,222_444 and is the closest I've gotten

We can try using REGEXP_REPLACE with a capture group:
SELECT
REGEXP_REPLACE(
'111_2222_33333,222_444_3,aaa_bbb_ccc',
'^[^,]+,[^_]+_(.*?)_[^_]+,.*$',
'\1') AS num
FROM yourTable;
Here is a demo showing that the first capture group of the above regex contains the quantity you want.
Demo
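
As a quick check of the pattern outside the database, the following Python sketch applies the same regex to the sample string and cross-checks it with a plain split. It assumes Python's re behaves like the database's regex engine for this pattern, which may not hold for every engine.

```python
import re

field = '111_2222_33333,222_444_3,aaa_bbb_ccc'
pattern = r'^[^,]+,[^_]+_(.*?)_[^_]+,.*$'

# Replace the whole string with the first capture group, as REGEXP_REPLACE does.
middle_by_regex = re.sub(pattern, r'\1', field)

# Cross-check: second comma-separated group, then its middle underscore part.
middle_by_split = field.split(',')[1].split('_')[1]

print(middle_by_regex, middle_by_split)
```

Both expressions give 444 for the sample value, so the capture group lands on the intended part.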

Parsing a string and comparing values to existing column

I have the below table with the string marked "Remark" that needs to be parsed. The highlighted fares need to be compared from the columns TotalBookedFare and Remark. The only issue is that the value I need to compare under the Remark column is in the middle of a string. I've tried to parse the string but I cannot figure it out. I am using SQL Server 2008. As you can see the first row is not a match while the other three are matching.
Ideally I would like to convert the one string "Remark" to the 5 columns listed below so I can compare the TotalBookedFare to the "New" column.

I think this should work
select substring(
remark, --string base
charindex ('/', 'xyz/57.77usd/zyx') + 1,
--starting position is location one to the right of first instance of / character (5)
charindex ('u', 'xyz/57.77usd/zyx', charindex ('/', 'xyz/57.77usd/zyx')) - charindex ('/', 'xyz/57.77usd/zyx') - 1
--length is the location of the first instance of the u character
--starting from the location of first instance of the / character (10)
--then subtracted by the location of the first instance of the / character (4)
--and then an additional 1 resulting in the length of the string to be extracted (5)
)
The string I put in there is just a more concrete example, if you replace it with Remark, it should extract the substring for each row. You could even modify it with some copy/pasting to get each of those columns you were looking for.
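
The position arithmetic in the comments can be replayed in Python as a small sketch. extract_fare is a hypothetical name, and note that Python positions are 0-based while CHARINDEX is 1-based, so the intermediate numbers differ by one.

```python
def extract_fare(remark):
    """Return the text between the first '/' and the next 'u'."""
    slash = remark.index('/')      # 0-based position of the first '/'
    u = remark.index('u', slash)   # first 'u' at or after that '/'
    # Slice from just after the '/' up to (not including) the 'u'.
    return remark[slash + 1:u]

print(extract_fare('xyz/57.77usd/zyx'))
```

For the example string this yields the five characters 57.77, matching the length worked out in the SQL comments.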

DB2 TRIM 000000 to 0

I have looked at:
DB2 SQL Query Trim inside trim
Trimming Blank Spaces in Char Column in DB2
SQL Trim after final semi colon
The IBM infocenter.
I have a character column that is six characters long. A typical value would be AA01AA. I need to substring the middle 2 characters out of the value and convert them to a number.
I am doing this with the following code: TRIM(L '0' FROM(SUBSTRING(Myfield, 3, 2))). In the example value above that gives me 1. The problem comes in when the value is 000000. The trim returns ''. I need it to return 0.
I have tried REPLACE(TRIM(L '0' FROM(SUBSTRING(Myfield, 3, 2))),'' ,'0') but that simply gives me a blank string back. I have also tried TRANSLATE(TRIM(L '0' FROM(SUBSTRING(Myfield, 3, 2))), '0', '') but that gives an error about parameter 03 being an invalid data type, length etc.
I would appreciate any help.

Something like this should do the trick:
integer(substr(myField,3,2))
The SUBSTR extracts the two characters, the INTEGER takes it as input and converts it to a number. TRIM is not necessary at all.
values(integer(substr('000000',3,2)))
1
-----------
0
1 record(s) selected.
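
The same idea (convert the substring directly instead of trimming zeroes first) looks like this in Python. It is only an illustration with a hypothetical helper name; the slice [2:4] corresponds to SUBSTR(myField, 3, 2) because Python indexes from 0.

```python
def middle_number(my_field):
    """Take characters 3 and 4 (1-based) and convert them to an integer."""
    # Integer conversion ignores leading zeroes, so '00' becomes 0.
    return int(my_field[2:4])

print(middle_number('AA01AA'))
print(middle_number('000000'))
```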

Split a field and add these fields to another table

I have a table with a field that allows up to 120 characters. I want to split the field into three fields. If the field contains more than 40 and fewer than 80 characters, it should be split into two; if it is 120 characters long, it should be split into three. The split point should be the first space character before the 40th character, and the new fields should be added to another table.
Will appreciate the help!

I guess you could do something along the lines of:
SELECT
SUBSTRING(MyCol,1,40),
NULLIF(SUBSTRING(MyCol,41,40), ''),
NULLIF(SUBSTRING(MyCol,81,40), '')
FROM MyTable -- MyTable is a placeholder for your source table
to have your one column broken down correctly for your INSERT statement.
The NULLIF function sets a column to NULL whenever SUBSTRING() returns an empty string for that value.
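
The fixed-width split can be sketched in Python, alongside a word-boundary variant for comparison. Both helper names are hypothetical, and the second function uses textwrap.wrap, which goes beyond the SQL above: it is one possible way to honour the "split at a space" requirement from the question, not what the query does.

```python
import textwrap

def fixed_chunks(text, width=40, parts=3):
    """Cut at fixed positions, like the SUBSTRING calls; empty parts become None."""
    chunks = [text[i * width:(i + 1) * width] for i in range(parts)]
    # None plays the role of NULLIF(..., '') turning empty strings into NULL.
    return [c if c else None for c in chunks]

def word_chunks(text, width=40):
    """Break at whitespace so that no part exceeds `width` characters."""
    return textwrap.wrap(text, width=width)

sample = 'alpha beta gamma delta epsilon zeta eta theta iota kappa lambda'
print(fixed_chunks(sample))
print(word_chunks(sample))
```

The fixed-width version may cut a word in half, while the word-boundary version keeps words intact at the cost of uneven part lengths.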
