How do I remove the | special character in a CHAR dataset using proc sql?

I am trying to remove the character | using PROC SQL. The position of | is not fixed and varies across the data, so I do not want to use the substr function.
Example 1 - 1234|5678|9|101
Example 2 - 12345|6789|1|011

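You can use the TRANSLATE() function, assigning the result back to the column:
UPDATE tab
SET Col = TRANSLATE(Col, '', '|')
Note that in SAS, TRANSLATE() replaces each | with a blank rather than deleting it; COMPRESS(Col, '|') removes the character entirely.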

In Oracle you could use REPLACE, something like:
SELECT REPLACE('1234|5678|9|101', '|', '') AS Changed
FROM DUAL;
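
To make the REPLACE approach concrete, here is a minimal sketch that runs the same kind of statement against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a stand-in engine here (an assumption, not the SAS or Oracle environment from the question), and the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in SQL engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (col TEXT)")  # hypothetical table and column names
conn.executemany(
    "INSERT INTO tab (col) VALUES (?)",
    [("1234|5678|9|101",), ("12345|6789|1|011",)],
)

# REPLACE(source, search, replacement) deletes every '|' wherever it occurs,
# so the varying position of the character does not matter.
conn.execute("UPDATE tab SET col = REPLACE(col, '|', '')")

cleaned = [row[0] for row in conn.execute("SELECT col FROM tab ORDER BY rowid")]
print(cleaned)  # ['123456789101', '1234567891011']
```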

Related

How to add delimiter to String after every n character using hive functions?

I have the hive table column value as below.
"112312452343"
I want to add a delimiter such as ":" (i.e., a colon) after every 2 characters.
I would like the output to be:
11:23:12:45:23:43
Is there any hive string manipulation function support available to achieve the above output?

For fixed length this will work fine:
select regexp_replace(str, "(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})","$1:$2:$3:$4:$5:$6")
from
(select "112312452343" as str)s
Result:
11:23:12:45:23:43
Another solution, which works for strings of any length: split the string at every empty position that is preceded by two digits (\\d{2}) counted from where the previous match ended (\\G), using a lookbehind ((?<=...)); then concatenate the array with ':' and remove any delimiter left at the end (:$):
select regexp_replace(concat_ws(':',split(str,'(?<=\\G\\d{2})')),':$','')
from
(select "112312452343" as str)s
Result:
11:23:12:45:23:43
If the string can contain characters other than digits, use a dot (.) instead of \\d:
regexp_replace(concat_ws(':',split(str,'(?<=\\G..)')),':$','')
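
The split-then-join idea can be checked outside Hive with a small Python sketch. Python's built-in re module has no \G anchor, so this version slices the string into fixed-size chunks instead; the function name and parameters are hypothetical.

```python
def add_delimiter(value: str, n: int = 2, sep: str = ":") -> str:
    """Insert sep after every n characters, without a trailing separator."""
    # Cut the string into chunks of n characters; the last chunk may be shorter.
    chunks = [value[i:i + n] for i in range(0, len(value), n)]
    return sep.join(chunks)


print(add_delimiter("112312452343"))  # 11:23:12:45:23:43
print(add_delimiter("1123124"))       # 11:23:12:4
```

Because the chunks are joined rather than suffixed, no trailing delimiter has to be removed afterwards.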

This is actually quite simple if you're familiar with regex & lookahead.
Replace every two characters that are followed by another character with themselves plus ':':
select regexp_replace('112312452343','..(?=.)','$0:')
+-------------------+
| _c0 |
+-------------------+
| 11:23:12:45:23:43 |
+-------------------+
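
The same lookahead pattern behaves identically in Python's re module, which makes it easy to try on odd-length or very short inputs. This is only a sketch; the function name is hypothetical, and Python writes the whole-match reference as \g<0> where the Hive query uses $0.

```python
import re


def colon_every_two(value: str) -> str:
    # '..(?=.)' matches two characters only when a third one follows,
    # so the final pair (or a lone trailing character) never gets a colon.
    return re.sub(r"..(?=.)", r"\g<0>:", value)


print(colon_every_two("112312452343"))  # 11:23:12:45:23:43
print(colon_every_two("1123124"))       # 11:23:12:4
```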

Single hive query to remove certain text in data

I have a column data like this in 2 formats
1)"/abc/testapp/v1?FirstName=username&Lastname=test123"
2)"/abc/testapp/v1?FirstName=username"
I want to retrieve the output as "/abc/testapp/v1?FirstName=username", stripping out everything from "&Lastname" through the end of its value. The idea is to remove the Lastname parameter together with its value.
If the data doesn't contain "&Lastname", the query should still work, as in the second format.
The value for Lastname shown in the example is "test123", but in general it will be dynamic.
I started with regexp_replace, but I am only able to remove "&Lastname", not its value:
select regexp_replace("/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US","&Lastname","");
Can someone please help me achieve both cases with a single Hive query?

Use split function:
with your_data as (--Use your table instead of this example
select stack (2,
"/abc/testapp/v1?FirstName=username&Lastname=test123",
"/abc/testapp/v1?FirstName=username"
) as str
)
select split(str,'&')[0] from your_data;
Result:
_c0
/abc/testapp/v1?FirstName=username
/abc/testapp/v1?FirstName=username
Or use '&Lastname' pattern for split:
select split(str,'&Lastname')[0] from your_data;
This keeps any other &-separated parameters that come before &Lastname; only the text from &Lastname onward is dropped.
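
Both split variants drop everything after the split point, including a later parameter such as &type=en_US from the question's own regexp_replace attempt. A pattern that removes only the Lastname parameter and its value is sketched below in Python; the function name is hypothetical, and the assumption is that the same pattern ('&Lastname=[^&]*') could be passed to Hive's regexp_replace.

```python
import re


def drop_lastname(url: str) -> str:
    # Remove '&Lastname=' plus its value, which runs up to the next '&'
    # or to the end of the string. Strings without it are returned unchanged.
    return re.sub(r"&Lastname=[^&]*", "", url)


print(drop_lastname("/abc/testapp/v1?FirstName=username&Lastname=test123"))
print(drop_lastname("/abc/testapp/v1?FirstName=username"))
print(drop_lastname("/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US"))
```

The first two calls print /abc/testapp/v1?FirstName=username, and the third keeps the trailing &type=en_US.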

This works for both cases, with or without the last name, using split in Hive. No table is needed; you can call the function directly in a select:
select
split("/abc/testapp/v1FirstName=username&Lastname=test123",'&')[0]
select
split("/abc/testapp/v1FirstName=username",'&')[0]
Result :
_c0
/abc/testapp/v1FirstName=username
You can also combine both into a single query:
select
split("/abc/testapp/v1FirstName=username&Lastname=test123",'&')[0],
split("/abc/testapp/v1FirstName=username",'&')[0]
_c0 _c1
/abc/testapp/v1FirstName=username /abc/testapp/v1FirstName=username

Get the last part of the value returned by split_part() function

I have a file_path string separated by forward slashes. I want to split them based on the forward slashes and return the file name.
INPUT
//a/b/c/xyz.png
OUTPUT
xyz.png
CURRENT SOLUTION
SELECT REVERSE(SPLIT_PART(REVERSE('//a/b/c/xyz.py'), '/', 1)) as "file_name";
Is there a more efficient way of doing this?

regexp_match() is more concise:
select (regexp_match('//a/b/c/xyz.py', '[^/]+$'))[1]

I would just use regexp_replace() to remove everything before the last slash (included):
select regexp_replace('//a/b/c/xyz.png', '.*/', '')
Demo on DB Fiddle:
| regexp_replace |
| :------------- |
| xyz.png |
You can also use substring(), which may or may not be more efficient:
substring('//a/b/c/xyz.png' from '[^/]*$')

PostgreSQL 14 supports a negative index in split_part(), which makes this a straightforward operation.
split_part
Splits string at occurrences of delimiter and returns the n'th field (counting from one), or when n is negative, returns the |n|'th-from-last field.
split_part('abc,def,ghi,jkl', ',', -2) → ghi
In this particular scenario:
SELECT SPLIT_PART('//a/b/c/xyz.py', '/', -1) as "file_name";
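
For comparison, the three approaches from the answers above (last field after a split, a trailing-run regex, and a greedy replace) can be sketched in Python. This is an illustration of the logic only, not PostgreSQL code, and the variable names are hypothetical.

```python
import re

path = "//a/b/c/xyz.png"

# Like split_part(path, '/', -1): take the text after the last slash.
by_split = path.rsplit("/", 1)[-1]

# Like the '[^/]+$' pattern: the trailing run of non-slash characters.
match = re.search(r"[^/]+$", path)
by_regex = match.group(0) if match else ""

# Like regexp_replace(path, '.*/', ''): greedily drop everything through the last slash.
by_replace = re.sub(r".*/", "", path)

print(by_split, by_regex, by_replace)  # xyz.png xyz.png xyz.png
```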

using length function in REGEXP_REPLACE() in Postgres

I am removing the last 3 characters from the string "ABC123" using the regexp_replace function in Oracle with the statement below:
select REGEXP_REPLACE('ABC123','123','', LENGTH('ABC123') - 3) from dual;
The same result can be achieved in Postgres with the statements below:
select regexp_replace('ABC123','[123]', '','g')
select translate('ABC123','123', '');
Is there any way I can use the length function in the replace, as I did in Oracle?

Why not simply use left()?
select left('ABC123', length('ABC123') - 3)
The same idea can be used in Oracle as well, but you need to use the substr() function. This should be more efficient in both databases.
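
The difference between cutting by length and removing a set of characters shows up as soon as the digits also appear earlier in the string. A small Python sketch of both behaviours follows; the function name is hypothetical.

```python
def drop_last(value: str, n: int = 3) -> str:
    # Same idea as left(value, length(value) - n): keep all but the last n characters.
    return value[:max(len(value) - n, 0)]


print(drop_last("ABC123"))     # ABC
print(drop_last("A1B2C3123"))  # A1B2C3

# Character-set removal, like translate(value, '123', ''), strips those digits everywhere.
table = str.maketrans("", "", "123")
print("A1B2C3123".translate(table))  # ABC
```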

You could also look into the trim functions.
http://www.postgresqltutorial.com/postgresql-trim-function/
select REGEXP_REPLACE('ABC123','123','', LENGTH('ABC123') - 3) from dual;
would become
select rtrim('ABC123', '123');
resulting in ABC. Note that rtrim() removes any trailing characters that belong to the given set, not the exact substring.

Regular expression for getting data after - in sql

I have a column with assignment numbers like 11827, 27266, 91717, 09818-2, 726252-3, 8716151-0, 827272, 18181.
Currently I select the records like this:
select assignment_number from table;
Now I want the column to be retrieved without the -2, -3, etc. suffixes, for example:
726252-3 --> 726252, 8716151-0 --> 8716151
I know I can use a regex for this, but I do not know how to use it.

This will select everything before the - character:
^([^-]+)
From 726252-3 it will match 726252.

You would use regexp_substr():
select regexp_substr(assignment_number, '[0-9]+')
This will return the first string of numbers encountered in the string.
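
To see what the ^[^-]+ pattern returns for each of the sample values, here is a short Python sketch. It only illustrates the regex; the list and function names are hypothetical.

```python
import re

numbers = ["11827", "27266", "91717", "09818-2", "726252-3", "8716151-0", "827272", "18181"]


def base_number(value: str) -> str:
    # '^[^-]+' matches everything before the first hyphen,
    # or the whole value when there is no hyphen.
    match = re.match(r"^[^-]+", value)
    return match.group(0) if match else value


print([base_number(n) for n in numbers])
# ['11827', '27266', '91717', '09818', '726252', '8716151', '827272', '18181']
```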
