Replacing first occurence of character in a string using HiveQL - hive

I am trying to replace the first occurrence of '-' in a string in Hive table. I am using HiveQL. I searched this topic here and other websites, but could not find clear explanation how to use metacharacters with regexp_replace() to do that.
This is a string from which I need to replace first '-' with empty space: 16-001-02707
The result should be like this: 16001-02707
This is the method I used:
select regexp_replace ('16-001-02707','[^[:digit:]]', '');
However, this doesn't do anything.

select regexp_replace ('16-001-02707','^(.*?)-', '$1');
16001-02707
Following the OP question in the comments
with t as (select '111-22-333333-4-555-6-7-8888-999999' as col)
select regexp_replace (col,'^(.*?)-','$1')
,regexp_replace (col,'^(.*?-.*?)-','$1')
,regexp_replace (col,'^((.*?-){2}.*?)-','$1')
,regexp_replace (col,'^((.*?-){3}.*?)-','$1')
,regexp_replace (col,'^((.*?-){4}.*?)-','$1')
,regexp_replace (col,'^((.*?-){5}.*?)-','$1')
from t
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
| _c0 | _c1 | _c2 | _c3 | _c4 | _c5 |
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
| 11122-333333-4-555-6-7-8888-999999 | 111-22333333-4-555-6-7-8888-999999 | 111-22-3333334-555-6-7-8888-999999 | 111-22-333333-4555-6-7-8888-999999 | 111-22-333333-4-5556-7-8888-999999 | 111-22-333333-4-555-67-8888-999999 |
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+

Related

How to explode substrings inside a string in a column in SQL

Let's say I have a table like the one below
| Header 1 | Header 2 | Header 3
--------------------------------------------------------------------------------------
| id1 | detail1 | <a#test.com> , <b#test.com> , <c#test.com> , <d#test.com>
How do i explode it on SQL based on the substring emails inside the angle brackets such that it looks like the one below.
| Header 1 | Header 2 | Header 3. |
-------------------------------------------
| id1 | detail1 | a#test.com |
| id1 | detail1 | b#test.com |
| id1 | detail1 | c#test.com |
| id1 | detail1 | d#test.com |
Using regexp_extract_all and explode should do.
select `Header 1`, `Header 2`, explode(regexp_extract_all(`Header 3`, '<(.+?)>')) as `Header 3` from table
this should get you
+--------+--------+----------+
|Header 1|Header 2|Header 3 |
+--------+--------+----------+
|id1 |detail1 |a#test.com|
|id1 |detail1 |b#test.com|
|id1 |detail1 |c#test.com|
|id1 |detail1 |d#test.com|
+--------+--------+----------+
Be aware that regexp_extract_all was added to spark since version 3.1.0.
For spark blow 3.1.0
This can be done with split, somewhat a dirty hack. But the strategy and the results are the same.
select `Header 1`, `Header 2`, explode(array_remove(split(`Header 3`, '[<>,\\s]+'), '')) as `Header 3` from table
What this do is to regex match the delimiters and split the string into array. It also needs an array_remove function call to remove unneeded empty string.
Explanation
With regexp_extract_all, we use the pattern <(.+?)> to extract all strings within angle brackets, into an array like this
['a#test.com', 'b#test.com', 'c#test.com']
For the pattern (.+?)here
. matches 1 character;
+ is a quantifier of ., looking for 1 or unlimited matches;
? is a non greedy modifier, makes the match stop as soon as possible;
brackets makes the pattern with in angle brackets as a matching group, so we can extract from groups later;
Now with explode, we can separate elements of the array into multiple rows, hence the result above.

sql-remove dashes from string column

in stored procedure, i have this field
LTRIM(ISNULL(O.Column1, ''))
If there is a dash(-) symbol at end of the value, want to remove it. only in conditions if a dash symbol exist at start/end.
Any suggestions
EDIT:
Microsoft SQL Server 2014 12.0.5546.0
Expected output:
1)input: "abc-abc" //output: "abc-abc"
2)input: "abc-" //output: "abc"
3)input: "abc" //ouput: "abc"
I think you might be stuck with string manipulation here.
The CASE expression here takes the LTRIM/RTRIM result from your column and checks both ends for a dash, and then each end for a dash. If dashes exist, it strips them out. It's not pretty, and won't perform well on a mountain of data, but will do what you need.
Data setup:
create table trim (col1 varchar(10));
insert trim (col1)
values
('abc'),
(' abc-'),
('abc- '),
('abc-abc '),
(' -abc'),
('-abc '),
(NULL),
(''),
(' -abc- ');
The query:
select
case
when right(ltrim(rtrim(isnull(col1,''))),1) = '-'
and left(ltrim(rtrim(isnull(col1,''))),1) = '-'
then substring(ltrim(rtrim(isnull(col1,''))),2,len(ltrim(rtrim(isnull(col1,''))))-2)
when right(ltrim(rtrim(isnull(col1,''))),1) = '-'
then left(ltrim(rtrim(isnull(col1,''))), len(ltrim(rtrim(isnull(col1,''))))-1)
when left(ltrim(rtrim(isnull(col1,''))),1) = '-'
then right(ltrim(rtrim(isnull(col1,''))), len(ltrim(rtrim(isnull(col1,''))))-1)
else ltrim(rtrim(isnull(col1,'')))
end as trimmed
from trim;
Results:
+---------+
| trimmed |
+---------+
| abc |
| abc |
| abc |
| abc-abc |
| abc |
| abc |
| |
| |
| abc |
+---------+
SQL Fiddle Demo
Since the Database is not mentioned, here is how you do it (rather find it)
SQL Server
Remove the last character in a string in T-SQL?
Oracle
Remove last character from string in sql plus
Postgresql
Postgresql: Remove last char in text-field if the column ends with minus sign
MySQL
Strip last two characters of a column in MySQL
You can use LEFT function, along with SUBSTRING to achieve the result.
SELECT CASE WHEN RIGHT(stringVal,1)= '-' THEN SUBSTRING(stringVal,1,LEN(stringVal)-1)
ELSE stringVal END AS ModifiedString
from
( VALUES ('abc-abc'), ('abc-'),('abc')) as t(stringVal)
+----------------+
| ModifiedString |
+----------------+
| abc-abc |
| abc |
| abc |
+----------------+

How remove symbols from the sentence in Oracle?

In Oracle database I have such table.
| TREE | ORG_NAME |
|---------------------------------|----------|
| \Google earth\Nest global\ATAP | ATAP |
| \Google earth\Nest\Beemoney\ | Beemoney |
| \Google\\\BeeKey\ | |
| | York |
I am trying to make sql query which would return such result.
| ORGANIZATION |
|-----------------------------------|
| Google earth > Nest global > ATAP |
| Google earth > Nest Beemoney |
| Google > BeeKey |
| York |
As you can see I want:
1) Replace \ symbol at the beginning and end of the sentence.
2) Replace \ symbol which is inside sentence to > symbol.
3) Replace \\\ symbol which is inside sentence to > symbol.
4) If TREE colomn is empty take record from ORG_NAME colomn.
Here is how I started. This SQL query solve 2, 3 and 4 part. How to solve problem with 1 part. I think I need to use REGEXP_REPLACE, right? How to make it correctly? Is there any other more elegant way to redisign sql query? As you can see I walk on the same table a few times.
SELECT
COALESCE (TREE, ORG_NAME) as ORGANIZATION
FROM (
SELECT
REPLACE(TREE, '\', '>') AS TREE,
ORG_NAME
FROM (
SELECT
REPLACE(TREE, '\\\', '>') AS TREE,
ORG_NAME
FROM
ORG
)
)
This could be a way with a regexp_replace and a trim to remove the characters from the beginning and the end of the string:
select nvl(regexp_replace( trim('\' from tree), '\\+', ' > '), org_name)
from yourTable
Here is a working solution which uses two calls to regexp_replace:
select
regexp_replace(
regexp_replace('\Google\\\BeeKey\', '^\\?(.*?)\\?$', '\1'), '\\+', ' > ')
from dual;
Google > BeeKey
Demo
The inner call to regexp_replace strips off any possible leading or trailing path separators. The outer call converts any number of internal path separators / to > separators as a replacement.

Odd bug in SQL TRIM() function

I have the following table:
select * from top3art;
path | count
-----------------------------+--------
/article/candidate-is-jerk | 338647
/article/bears-love-berries | 253801
/article/bad-things-gone | 170098
I want to trim off '/article/' in path values, so I do this:
select *, trim(leading '/article/' from path) from top3art;
path | count | ltrim
-----------------------------+--------+--------------------
/article/candidate-is-jerk | 338647 | ndidate-is-jerk
/article/bears-love-berries | 253801 | bears-love-berries
/article/bad-things-gone | 170098 | bad-things-gone
Rows 2 and 3 work just fine. But what happened to the 1st row??
It trimmed '/article/ca'. Why did it take 2 more characters?
Now watch what happens when I just trim '/articl':
select *, trim(leading '/articl' from path) as test from top3art;
path | count | test
-----------------------------+--------+----------------------
/article/candidate-is-jerk | 338647 | e/candidate-is-jerk
/article/bears-love-berries | 253801 | e/bears-love-berries
/article/bad-things-gone | 170098 | e/bad-things-gone
That works as expected... Now watch what happens when I add one more char in my trim clause, '/article':
select *, trim(leading '/article' from path) as test from top3art;
path | count | test
-----------------------------+--------+--------------------
/article/candidate-is-jerk | 338647 | ndidate-is-jerk
/article/bears-love-berries | 253801 | bears-love-berries
/article/bad-things-gone | 170098 | bad-things-gone
Same as the first result!
I can't make sense of this.
Why is this happening?
How do I fix it?
trim removes any character in the first argument from the second argument, so it also removes the c and the a of "candidate". Instead of trim, you could use a split_part call:
select *, split_part(path, '/article/', 2) as test from top3art;
Trim removes all signs you mentioned not words/phrases.
Instead of trim use replace()
select *, replace(path, '/article/','') from top3art;

Oracle SQL - Substring issue

I have an field pattern and value in that field is INDI/17-18/6767/KER/787 .I want to get 6767 from this string
I used the query
select substr(pattern,12,15) from pattern_table
But the output I got is 6767/KER/787 instead of 6767.
Try this:
You have to give the length as the 3rd value, not the position.
SELECT SUBSTR(pattern,12,4) FROM pattern_table
For a generic result to get the 3rd value separated by a delimiter, you may use REGEXP_SUBSTR.
SQL Fiddle
Query 1:
SELECT pattern,REGEXP_SUBSTR(pattern, '[^/]+', 1, 3) id
FROM pattern_table
Results:
| PATTERN | ID |
|--------------------------|-------|
| INDI/17-18/6767/KER/787 | 6767 |
| INDI/17-18-19/67/KER/787 | 67 |
| INDI/16-18/67890/KAR/986 | 67890 |
even this will also work:
SELECT substr('INDI/17-18/6767/KER/787',instr('INDI/17-
18/6767/KER/787','/',1,2)+1,4) FROM dual;