Getting a string path piece by piece with regex (SQL, Athena)

I want to convert a string into a table row in SQL in Amazon Athena.
Since Athena does not support certain functions, I'm forced to chain many regex functions.
An input (which can also vary in length) can look like this:
v1 facility username utm_parameter
and I want to turn it into a table that looks like this:
1st | 2nd | 3rd | 4th
------ | ------ | ----- | -----
v1 | facility | username | utm_parameter
I already extract the first piece of text from the string with this code:
SELECT REGEXP_EXTRACT( REGEXP_replace( REGEXP_REPLACE( REGEXP_EXTRACT( REGEXP_EXTRACT(message,'path=\S+'),'"(.*?)"'),'/', ' '),'"',''),'\S+') AS '1st' from data
but I don't know how to get the pieces of text after each following blank space with the regex.
Does anyone know how to write the next regex function?

Try this:
-- input, don't use in real query
WITH
input(message) AS (
SELECT 'v1 facility username utm_parameter'
)
-- input end, start real query here
SELECT
SPLIT_PART(message,' ',1) AS "1st"
, SPLIT_PART(message,' ',2) AS "2nd"
, SPLIT_PART(message,' ',3) AS "3rd"
, SPLIT_PART(message,' ',4) AS "4th"
FROM input;
1st|2nd |3rd |4th
v1 |facility|username|utm_parameter
For the remaining pieces, keep increasing the index. It's like spelling the word Mississippi: you need to know when to stop.
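To make the "know when to stop" point concrete, here is a minimal Python sketch of how SPLIT_PART treats the sample input. The helper split_part below is a stand-in written only for illustration. Returning None (SQL NULL) for an index past the last field is modeled on the behaviour documented for Presto, the engine Athena is based on, so treat that part as an assumption to check against your Athena version.

```python
def split_part(text, delimiter, index):
    """Return the 1-based field `index` of `text` split on `delimiter`.

    Illustrative stand-in for SQL SPLIT_PART: returns None (SQL NULL)
    when the index is past the last field.
    """
    if index < 1:
        raise ValueError("index must be 1 or greater")
    fields = text.split(delimiter)
    return fields[index - 1] if index <= len(fields) else None


message = "v1 facility username utm_parameter"
# Ask for one field more than the input has, to see where it stops.
row = [split_part(message, " ", i) for i in range(1, 6)]
print(row)  # ['v1', 'facility', 'username', 'utm_parameter', None]
```

Since inputs can have different lengths, selecting as many columns as the longest expected path should leave the unused trailing columns empty rather than failing, if the NULL assumption above holds.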

Related

How to use the SQL REPLACE Function, so that it will replace some text between a certain range, rather than one specific value

I have a table called Product and I am trying to replace some of the values in the Product ID column pictured below:
ProductID
PIDLL0000074853
PIDLL000086752
PIDLL00000084276
I am familiar with the REPLACE function and have used this like so:
SELECT REPLACE(ProductID, 'LL00000', '/') AS 'Product Code'
FROM Product
Which returns:
Product Code
PID/74853
PIDLL000086752
PID/084276
The letter L always appears twice (LL) in the ProductID. However, the number of zeros that follow ranges from 4 to 6. The L's and the zeros should be replaced with a single /.
If anyone could suggest the best way to achieve this, it would be greatly appreciated. I'm using Microsoft SQL Server, so standard SQL syntax would be ideal.
Please try the following solution.
All credit goes to @JeroenMostert
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, ProductID VARCHAR(50));
INSERT INTO @tbl (ProductID) VALUES
('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PITLL0000084770');
-- DDL and sample data population, end
SELECT *
, CONCAT(LEFT(ProductID,3),'/', CONVERT(DECIMAL(38, 0), STUFF(ProductID, 1, 5, ''))) AS [After]
FROM @tbl;
Output
+----+------------------+-----------+
| ID | ProductID | After |
+----+------------------+-----------+
| 1 | PIDLL0000074853 | PID/74853 |
| 2 | PIDLL000086752 | PID/86752 |
| 3 | PIDLL00000084276 | PID/84276 |
| 4 | PITLL0000084770 | PIT/84770 |
+----+------------------+-----------+
This isn't particularly pretty in T-SQL, as it doesn't support regex or even pattern replacement. Your method is therefore to use functions like CHARINDEX and PATINDEX to find the start and end positions and then replace (not with the REPLACE function) that part of the text.
This uses CHARINDEX to find the 'LL', and then PATINDEX to find the first non '0' character after that position. As PATINDEX doesn't support a start position I have to use STUFF to remove the first characters.
Then, finally, we can use STUFF (again) to replace the length of characters with a single '/':
SELECT STUFF(V.ProductID,CI.I+2,ISNULL(PI.I,0),'/')
FROM (VALUES('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PIDLL3246954384276'))V(ProductID)
CROSS APPLY(VALUES(NULLIF(CHARINDEX('LL',V.ProductID),0)))CI(I)
CROSS APPLY(VALUES(NULLIF(PATINDEX('%[^0]%',STUFF(V.ProductID,1,CI.I+2,'')),1)))PI(I);
If the values always start with "PIDLL", you can just remove the "PIDLL", cast the rest to an INT to lose the leading zeros, then prefix the result with "PID/". One line of code.
-- Sample Data
DECLARE @t TABLE (ProductID VARCHAR(40));
INSERT @t VALUES('PIDLL0000074853'),('PIDLL000086752'),('PIDLL00000084276');
-- Solution
SELECT t.ProductID, NewProdID = 'PID/'+LEFT(CAST(REPLACE(t.ProductID,'PIDLL','') AS INT),20)
FROM @t AS t;
Returns:
ProductID          NewProdID
------------------ ----------------
PIDLL0000074853    PID/74853
PIDLL000086752     PID/86752
PIDLL00000084276   PID/84276
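For checking expected values outside the database, the rule the question asks for (replace "LL" and the run of zeros right after it with one "/") can be sketched in a few lines of Python. This is only a model of the desired result, not a T-SQL solution; the function name shorten_product_id is made up for this example.

```python
import re


def shorten_product_id(product_id):
    """Replace the first 'LL' and any zeros directly after it with '/'."""
    return re.sub(r"LL0*", "/", product_id, count=1)


samples = [
    "PIDLL0000074853",
    "PIDLL000086752",
    "PIDLL00000084276",
    "PITLL0000084770",
]
for pid in samples:
    print(pid, "->", shorten_product_id(pid))
# PIDLL0000074853 -> PID/74853
# PIDLL000086752 -> PID/86752
# PIDLL00000084276 -> PID/84276
# PITLL0000084770 -> PIT/84770
```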

Split a column in two based on a variable-length field

Hi: I have a table made with rows like this:
ID_CATEGORIA CATEGORIA_DRG
------------ ---------------------------------------------------------------
1            001-002-003-543 Craniotomia
2            004-531-532 Interventi midollo spinale
3            005-533-534 Interventi vasi extracranici
4            006 Decompressione tunnel carpale
I'd like to get something like this:
ID           CATEGORIA          DESCRIZIONE
------------ ------------------ --------------------------------------
1            001-002-003-543    Craniotomia
2            004-531-532        Interventi midollo spinale
3            005-533-534        Interventi vasi extracranici
4            006                Decompressione tunnel carpale
I don't need to alter the table; a 'formatted' query is enough.
I think SUBSTRING() is the right function for me, but I don't know how to measure the length of the first (numeric, dash-separated) field.
In Python I would find that size with len("005-533-534 Interventi vasi extracranici".split(' ')[0]), but I have no idea how to write it in SQL.
Something like this should do:
SELECT ID_CATEGORIA AS ID,
RTRIM(SUBSTRING(CATEGORIA_DRG, 1, CHARINDEX(' ', CATEGORIA_DRG))) AS CATEGORIA,
LTRIM(SUBSTRING(CATEGORIA_DRG, CHARINDEX(' ', CATEGORIA_DRG), LEN(CATEGORIA_DRG))) AS DESCRIZIONE
FROM TABLENAME
Try this:
select id_categoria ID,
rtrim(substring(categoria_drg, 1, idx)) CATEGORIA,
substring(categoria_drg, idx + 1, 1000) DESCRIZIONE
from (
select id_categoria, categoria_drg, charindex(' ', categoria_drg) idx from my_table
) a
It uses charindex to find where the code ends: the code is followed by the first space in the string, and that is the position the function returns. The rtrim drops the space that the first substring would otherwise include.
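The question already frames the problem in Python, so here is the same "cut at the first space" idea as a small Python sketch. It only models the logic of the queries above; the function name split_code_and_description is made up for this example.

```python
def split_code_and_description(value):
    """Split at the first space: (dash-separated code, remaining text)."""
    code, _, description = value.partition(" ")
    return code, description.strip()


rows = [
    "001-002-003-543 Craniotomia",
    "004-531-532 Interventi midollo spinale",
    "005-533-534 Interventi vasi extracranici",
    "006 Decompressione tunnel carpale",
]
for row in rows:
    print(split_code_and_description(row))
```

str.partition returns the whole string as the code and an empty description when there is no space at all, which is the edge case to keep in mind when porting the idea to SUBSTRING and CHARINDEX.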

Extract first word from a varchar column and reverse it

I have the following data in my table:
id nml
-- -----------------
1  Temora sepanil
2  Human Mixtard
3  stlliot vergratob
I need to extract the first word in column nml and get its last 3 characters in reverse order.
That means the output should look like this:
nml               reverse
----------------- -------
Temora sepanil    aro
Human Mixtard     nam
stlliot vergratob toi
You can use PostgreSQL's string functions to achieve the desired output.
In this case I am using the split_part, right and reverse functions:
select reverse(right(split_part('Temora sepanil',' ',1),3))
output:
aro
So you can write your query in the following form:
select nml
,reverse(right(split_part(nml,' ',1),3)) "Reverse"
from tbl
Split nml using regexp_split_to_array(string text, pattern text [, flags text ]) (see the PostgreSQL documentation for more info).
Use reverse(str) to reverse the first word from the previous split.
Use substr(string, from [, count]) to select the first three letters of the reversed text.
Query
SELECT
nml,
substr(reverse((regexp_split_to_array(nml, E'\\s+'))[1]), 1, 3) AS reverse
FROM
MyTable
You can use the SUBSTRING, CHARINDEX, RIGHT and REVERSE functions.
Here's the syntax:
REVERSE(RIGHT(SUBSTRING(nml , 1, CHARINDEX(' ', nml) - 1),3))
sample:
SELECT REVERSE(RIGHT(SUBSTRING(nml , 1, CHARINDEX(' ', nml) - 1),3)) AS 'Reverse'
FROM TableNameHere
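All of the answers above compose the same three steps: take the first word, keep its last three characters, reverse them. A short Python sketch of that composition can be used to check the expected column values; the function name reversed_tail_of_first_word is hypothetical and exists only for this example.

```python
def reversed_tail_of_first_word(text, n=3):
    """Reverse the last `n` characters of the first space-separated word."""
    first_word = text.split(" ", 1)[0]
    return first_word[-n:][::-1]


for nml in ["Temora sepanil", "Human Mixtard", "stlliot vergratob"]:
    print(nml, "->", reversed_tail_of_first_word(nml))
# Temora sepanil -> aro
# Human Mixtard -> nam
# stlliot vergratob -> toi
```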

Regular Expression - Retrieve specific asterisk separated value string

I need to retrieve a specific part of a string whose values are separated by asterisks.
In the example below I need to retrieve the string Client Contact Centre Seniors2, which sits between the 6th and 7th asterisks.
I am fairly new to regular expressions and have only managed to select a value between 2 asterisks using *[\w]+*
Is there a way to specify which asterisk to look at using a regular expression, or is there a better way for me to retrieve the string I am after?
String:
2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*
Note: I will be using this regular expression in Oracle SQL using REGEXP_LIKE(string, regex).
* is a regex operator and needs to be escaped unless it is used inside brackets that hold a character list. You can use this simplified pattern to extract the seventh word:
regexp_substr(Audits.audit_log,'[^*]+',1,7)
SQL Fiddle
Query 1:
with x(y) as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*'
from dual
)
select regexp_substr(y,'([^*]+)\*',1,7,null,1)
from x
Results:
| REGEXP_SUBSTR(Y,'([^*]+)\*',1,7,NULL,1) |
|-----------------------------------------|
| Client Contact Centre Seniors2 |
Query 2:
with x(y) as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*'
from dual
)
select regexp_substr(y,'[^*]+',1,7)
from x
Results:
| REGEXP_SUBSTR(Y,'[^*]+',1,7) |
|--------------------------------|
| Client Contact Centre Seniors2 |
You could also use INSTR and SUBSTR for that. Simple and fast, but not as concise as the REGEXP_SUBSTR.
with t as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*' testvalue
from dual
)
select substr(testvalue, instr(testvalue, '*', 1, 6)+1, instr(testvalue, '*', 1, 7) - instr(testvalue, '*', 1, 6) - 1)
from t;
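The INSTR/SUBSTR arithmetic is easy to get off by one, so here is a hedged Python model of it. find_nth imitates the four-argument form of INSTR used above (1-based position of the nth occurrence, 0 when there are fewer than n), and field_between cuts out the text between the (n-1)th and nth separator. Both function names are made up for this sketch.

```python
def find_nth(text, sub, n):
    """1-based position of the nth occurrence of `sub`, or 0 if there is none."""
    pos = -1
    for _ in range(n):
        pos = text.find(sub, pos + 1)
        if pos == -1:
            return 0
    return pos + 1


def field_between(text, sep, n):
    """Text between the (n-1)th and nth `sep`; None if `sep` occurs fewer than n times."""
    start = find_nth(text, sep, n - 1) if n > 1 else 0
    end = find_nth(text, sep, n)
    if end == 0:
        return None
    # `start` is 1-based, so as a 0-based index it points just past that separator.
    return text[start:end - 1]


audit_log = (
    "2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*"
    "Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*"
)
print(field_between(audit_log, "*", 7))  # Client Contact Centre Seniors2
```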

Remove sub string from a column's text

I have the following two columns in a Postgres table:
name | last_name
----------------
AA | AA aa
BBB | BBB bbbb
.... | .....
.... | .....
How can I update last_name by removing the name text from it?
The final output should look like this:
name | last_name
----------------
AA | aa
BBB | bbbb
.... | .....
.... | .....
UPDATE table SET last_name = regexp_replace(last_name, '^' || name || ' ', '');
This only removes one copy from the beginning of the column and correctly removes the trailing space.
Edit
I'm using a regular expression here. '^' || name || ' ' builds the regular expression, so with the 'Davis McDavis' example, it builds the regular expression '^Davis '. The ^ causes the regular expression to be anchored to the beginning of the string, so it's going to match the word 'Davis' followed by a space only at the beginning of the string it is replacing in, which is the last_name column.
You could achieve the same effect without regular expressions like this:
UPDATE table SET last_name = substr(last_name, length(name) + 2);
You need to add two to the length to create the offset because substr is one-based (+1) and you want to include the space (+1). However, I prefer the regular expression solution even though it probably performs worse because I find it somewhat more self-documenting. It has the additional advantage that it is idempotent: if you run it again on the database it won't have any effect. The substr/offset method is not idempotent; if you run it again, it will eat more characters off your last name.
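The idempotence argument can be seen in a small Python sketch that applies each approach twice to the 'Davis McDavis' example. The two helper names are made up for illustration, and the re.escape call is an extra precaution of this sketch that the SQL above does not have.

```python
import re


def strip_name_regex(name, last_name):
    """Remove `name` plus one space, but only at the start of `last_name`."""
    return re.sub("^" + re.escape(name) + " ", "", last_name)


def strip_name_offset(name, last_name):
    """Model of substr(last_name, length(name) + 2), using a 0-based slice."""
    return last_name[len(name) + 1:]


once_regex = strip_name_regex("Davis", "Davis McDavis")
twice_regex = strip_name_regex("Davis", once_regex)
once_offset = strip_name_offset("Davis", "Davis McDavis")
twice_offset = strip_name_offset("Davis", once_offset)

print(once_regex, "|", twice_regex)    # McDavis | McDavis
print(once_offset, "|", twice_offset)  # McDavis | s
```

The second run of the regex version finds no match and leaves the value alone, while the second run of the offset version cuts six more characters off the already cleaned value.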
Not sure about syntax, but try this:
UPDATE table
SET last_name = TRIM(REPLACE(last_name,name,''))
I suggest checking the result first with a SELECT:
SELECT REPLACE(last_name,name,'') FROM table
You need the replace function; see http://www.postgresql.org/docs/8.1/static/functions-string.html
UPDATE table SET last_name = REPLACE(last_name,name,'')