Odd bug in SQL TRIM() function

I have the following table:
select * from top3art;
path | count
-----------------------------+--------
/article/candidate-is-jerk | 338647
/article/bears-love-berries | 253801
/article/bad-things-gone | 170098
I want to trim off '/article/' in path values, so I do this:
select *, trim(leading '/article/' from path) from top3art;
path | count | ltrim
-----------------------------+--------+--------------------
/article/candidate-is-jerk | 338647 | ndidate-is-jerk
/article/bears-love-berries | 253801 | bears-love-berries
/article/bad-things-gone | 170098 | bad-things-gone
Rows 2 and 3 work just fine. But what happened to the 1st row??
It trimmed '/article/ca'. Why did it take 2 more characters?
Now watch what happens when I just trim '/articl':
select *, trim(leading '/articl' from path) as test from top3art;
path | count | test
-----------------------------+--------+----------------------
/article/candidate-is-jerk | 338647 | e/candidate-is-jerk
/article/bears-love-berries | 253801 | e/bears-love-berries
/article/bad-things-gone | 170098 | e/bad-things-gone
That works as expected... Now watch what happens when I add one more char in my trim clause, '/article':
select *, trim(leading '/article' from path) as test from top3art;
path | count | test
-----------------------------+--------+--------------------
/article/candidate-is-jerk | 338647 | ndidate-is-jerk
/article/bears-love-berries | 253801 | bears-love-berries
/article/bad-things-gone | 170098 | bad-things-gone
Same as the first result!
I can't make sense of this.
Why is this happening?
How do I fix it?

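trim treats its first argument as a set of characters, not as a substring: it removes every leading character of path that appears anywhere in '/article/'. That is why it also removes the c and the a of "candidate" and only stops at the n. Instead of trim, you could use a split_part call: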
select *, split_part(path, '/article/', 2) as test from top3art;
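The character-set behaviour is easy to reproduce outside the database. As a minimal sketch (not part of the original answer), Python's str.lstrip follows the same rule as trim(leading ...): its argument is a set of characters, not a prefix. The helper name strip_prefix is made up for this illustration.

```python
# str.lstrip, like SQL trim(leading ...), treats its argument as a set of characters.
paths = [
    "/article/candidate-is-jerk",
    "/article/bears-love-berries",
    "/article/bad-things-gone",
]


def strip_prefix(value: str, prefix: str) -> str:
    """Hypothetical helper: remove prefix only when value starts with it as a whole."""
    return value[len(prefix):] if value.startswith(prefix) else value


# Strips every leading character found in the set {/, a, r, t, i, c, l, e}.
char_set_result = [p.lstrip("/article/") for p in paths]

# Strips the literal prefix '/article/' and nothing more.
prefix_result = [strip_prefix(p, "/article/") for p in paths]
```

The first list reproduces the surprising output from the question; the second is what the asker actually wanted.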

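trim removes each of the individual characters you list, not the word or phrase as a whole.
Use replace() instead. Note that replace() removes every occurrence of '/article/', not only a leading one: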
select *, replace(path, '/article/','') from top3art;

Related

How to explode substrings inside a string in a column in SQL

Let's say I have a table like the one below
| Header 1 | Header 2 | Header 3
--------------------------------------------------------------------------------------
| id1 | detail1 | <a#test.com> , <b#test.com> , <c#test.com> , <d#test.com>
How do I explode it in SQL on the email substrings inside the angle brackets, so that it looks like the table below?
| Header 1 | Header 2 | Header 3. |
-------------------------------------------
| id1 | detail1 | a#test.com |
| id1 | detail1 | b#test.com |
| id1 | detail1 | c#test.com |
| id1 | detail1 | d#test.com |
Using regexp_extract_all and explode should do.
select `Header 1`, `Header 2`, explode(regexp_extract_all(`Header 3`, '<(.+?)>')) as `Header 3` from table
this should get you
+--------+--------+----------+
|Header 1|Header 2|Header 3 |
+--------+--------+----------+
|id1 |detail1 |a#test.com|
|id1 |detail1 |b#test.com|
|id1 |detail1 |c#test.com|
|id1 |detail1 |d#test.com|
+--------+--------+----------+
Note that regexp_extract_all was added in Spark 3.1.0.
For Spark below 3.1.0
This can be done with split, which is somewhat of a dirty hack, but the strategy and the results are the same.
select `Header 1`, `Header 2`, explode(array_remove(split(`Header 3`, '[<>,\\s]+'), '')) as `Header 3` from table
This uses a regex to match the delimiters and splits the string into an array. It also needs an array_remove call to drop the empty strings that the split leaves at the start and end.
Explanation
With regexp_extract_all, we use the pattern <(.+?)> to extract all strings within angle brackets into an array like this:
['a#test.com', 'b#test.com', 'c#test.com', 'd#test.com']
In the pattern (.+?):
. matches any single character;
+ is a quantifier on ., matching one or more times;
? makes the + non-greedy, so the match stops as soon as possible;
the parentheses turn the part between the angle brackets into a capturing group, so we can extract just that group;
Now with explode, we can separate elements of the array into multiple rows, hence the result above.
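To make the three steps concrete, here is a minimal Python sketch of the same idea. It is only an analogy for the Spark functions, not Spark code: re.findall stands in for regexp_extract_all, re.split plus a filter stands in for split with array_remove, and a list comprehension stands in for explode. The variable names are made up for this illustration.

```python
import re

row = ("id1", "detail1", "<a#test.com> , <b#test.com> , <c#test.com> , <d#test.com>")

# Like regexp_extract_all: collect capture group 1 of every non-greedy match.
emails = re.findall(r"<(.+?)>", row[2])

# Like split + array_remove: split on runs of delimiters, then drop empty strings.
emails_via_split = [part for part in re.split(r"[<>,\s]+", row[2]) if part]

# Like explode: emit one output row per array element.
exploded = [(row[0], row[1], email) for email in emails]
```

Both extraction strategies yield the same four addresses, and exploded holds one tuple per address.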

How to remove symbols from a string in Oracle?

In an Oracle database I have the following table.
| TREE | ORG_NAME |
|---------------------------------|----------|
| \Google earth\Nest global\ATAP | ATAP |
| \Google earth\Nest\Beemoney\ | Beemoney |
| \Google\\\BeeKey\ | |
| | York |
I am trying to write an SQL query that returns the following result.
| ORGANIZATION |
|-----------------------------------|
| Google earth > Nest global > ATAP |
| Google earth > Nest > Beemoney    |
| Google > BeeKey |
| York |
As you can see, I want to:
1) Remove the \ symbol at the beginning and end of the string.
2) Replace a \ symbol inside the string with a > symbol.
3) Replace a \\\ sequence inside the string with a > symbol.
4) If the TREE column is empty, take the value from the ORG_NAME column.
Here is how I started. This SQL query solves parts 2, 3 and 4. How do I solve part 1? I think I need to use REGEXP_REPLACE, right? How do I do it correctly? Is there a more elegant way to redesign the query? As you can see, I walk over the same table several times.
SELECT
COALESCE (TREE, ORG_NAME) as ORGANIZATION
FROM (
SELECT
REPLACE(TREE, '\', '>') AS TREE,
ORG_NAME
FROM (
SELECT
REPLACE(TREE, '\\\', '>') AS TREE,
ORG_NAME
FROM
ORG
)
)
This could be a way with a regexp_replace and a trim to remove the characters from the beginning and the end of the string:
select nvl(regexp_replace( trim('\' from tree), '\\+', ' > '), org_name)
from yourTable
Here is a working solution which uses two calls to regexp_replace:
select
regexp_replace(
regexp_replace('\Google\\\BeeKey\', '^\\?(.*?)\\?$', '\1'), '\\+', ' > ')
from dual;
Google > BeeKey
The inner call to regexp_replace strips off any leading or trailing path separator. The outer call replaces each run of internal \ path separators with a single > separator.
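The trim-then-replace strategy from both answers can be sketched in Python to check it against the sample rows. This is an illustration under assumptions, not Oracle code: the function name organization is hypothetical, and the fallback mimics Oracle treating an empty string as NULL.

```python
import re


def organization(tree, org_name):
    """Hypothetical helper mirroring trim + regexp_replace + nvl from the answers."""
    # Drop leading and trailing backslashes (the trim step).
    trimmed = (tree or "").strip("\\")
    # Oracle treats an empty string as NULL, so fall back to org_name (the nvl step).
    if not trimmed:
        return org_name
    # Collapse each run of one or more backslashes into ' > ' (the regexp_replace step).
    return re.sub(r"\\+", " > ", trimmed)


sample = [
    ("\\Google earth\\Nest global\\ATAP", "ATAP"),
    ("\\Google earth\\Nest\\Beemoney\\", "Beemoney"),
    ("\\Google\\\\\\BeeKey\\", None),
    (None, "York"),
]
result = [organization(tree, org_name) for tree, org_name in sample]
```

Because the pattern matches a run of backslashes rather than a single one, requirements 2 and 3 from the question are handled by the same replacement.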

Oracle SQL - Substring issue

I have a field named pattern, and the value in that field is INDI/17-18/6767/KER/787. I want to get 6767 from this string.
I used the query
select substr(pattern,12,15) from pattern_table
But the output I got is 6767/KER/787 instead of 6767.
The third argument of SUBSTR is the length of the substring, not the end position. Try this:
SELECT SUBSTR(pattern,12,4) FROM pattern_table
For a generic result to get the 3rd value separated by a delimiter, you may use REGEXP_SUBSTR.
Query 1:
SELECT pattern,REGEXP_SUBSTR(pattern, '[^/]+', 1, 3) id
FROM pattern_table
Results:
| PATTERN | ID |
|--------------------------|-------|
| INDI/17-18/6767/KER/787 | 6767 |
| INDI/17-18-19/67/KER/787 | 67 |
| INDI/16-18/67890/KAR/986 | 67890 |
This will also work, as long as the value is always 4 characters long:
SELECT substr('INDI/17-18/6767/KER/787',instr('INDI/17-18/6767/KER/787','/',1,2)+1,4) FROM dual;
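The difference between "length" and "end position" can be checked with a small Python sketch. The substr helper below is hypothetical and only mimics Oracle's SUBSTR for a positive, 1-based start; the split call is a rough stand-in for REGEXP_SUBSTR(pattern, '[^/]+', 1, 3) and assumes no field is empty.

```python
pattern = "INDI/17-18/6767/KER/787"


def substr(value: str, start: int, length: int) -> str:
    """Hypothetical helper: 1-based start plus a length, like SUBSTR(value, start, length)."""
    return value[start - 1 : start - 1 + length]


# Asking for 15 characters from position 12 returns everything that is left.
too_long = substr(pattern, 12, 15)

# Asking for 4 characters returns only the wanted field.
fixed = substr(pattern, 12, 4)

# Position-independent: take the third '/'-separated field.
third_field = pattern.split("/")[2]
```

The delimiter-based version keeps working when the earlier fields change length, which the fixed-position SUBSTR call does not.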

Replacing first occurrence of character in a string using HiveQL

I am trying to replace the first occurrence of '-' in a string in Hive table. I am using HiveQL. I searched this topic here and other websites, but could not find clear explanation how to use metacharacters with regexp_replace() to do that.
This is a string in which I need to replace the first '-' with an empty string: 16-001-02707
The result should be like this: 16001-02707
This is the method I used:
select regexp_replace ('16-001-02707','[^[:digit:]]', '');
However, this doesn't do anything.
select regexp_replace ('16-001-02707','^(.*?)-', '$1');
16001-02707
Following up on the OP's question in the comments, the same approach removes only the Nth hyphen:
with t as (select '111-22-333333-4-555-6-7-8888-999999' as col)
select regexp_replace (col,'^(.*?)-','$1')
,regexp_replace (col,'^(.*?-.*?)-','$1')
,regexp_replace (col,'^((.*?-){2}.*?)-','$1')
,regexp_replace (col,'^((.*?-){3}.*?)-','$1')
,regexp_replace (col,'^((.*?-){4}.*?)-','$1')
,regexp_replace (col,'^((.*?-){5}.*?)-','$1')
from t
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
| _c0 | _c1 | _c2 | _c3 | _c4 | _c5 |
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
| 11122-333333-4-555-6-7-8888-999999 | 111-22333333-4-555-6-7-8888-999999 | 111-22-3333334-555-6-7-8888-999999 | 111-22-333333-4555-6-7-8888-999999 | 111-22-333333-4-5556-7-8888-999999 | 111-22-333333-4-555-67-8888-999999 |
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
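The capture-group trick generalizes to "remove only the n-th hyphen". Here is a minimal Python sketch of that pattern; the function name remove_nth_hyphen is made up for this illustration, and Python's re module writes the backreference as \1 where Hive writes $1.

```python
import re


def remove_nth_hyphen(value: str, n: int) -> str:
    """Hypothetical helper: delete only the n-th '-' (1-based), keeping the rest."""
    # Group 1 lazily captures everything up to, but not including, the n-th hyphen:
    # (n - 1) repetitions of "text followed by '-'", then more text, then the hyphen to drop.
    pattern = r"^((?:.*?-){%d}.*?)-" % (n - 1)
    # Replace the whole match with group 1, which discards the trailing hyphen.
    return re.sub(pattern, r"\1", value, count=1)


col = "111-22-333333-4-555-6-7-8888-999999"
first_removed = remove_nth_hyphen("16-001-02707", 1)
third_removed = remove_nth_hyphen(col, 3)
```

If the string has fewer than n hyphens the pattern does not match and the value is returned unchanged.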

How to get numbers arranged right to left in sql server SELECT statements

When performing SELECT statements that include number columns (prices, for example), the result is always left-aligned, which reduces readability. Therefore I'm looking for a way to right-align the output of number columns.
I already tried to use something like
SELECT ... SPACE(15-LEN(A.Nummer))+A.Nummer ...
FROM Artikel AS A ...
which gives close results, but depending on the font they are not exact. An alternative would be to replace 'SPACE()' with 'REPLICATE('_',...)', but I don't really like the underscores in the output.
Besides that, this formula will fail on numbers with more than 15 digits, so I looked for a way to find the maximum length of the entries to make it safer, like
SELECT ... SPACE(MAX(A.Nummer)-LEN(A.Nummer))+A.Nummer ...
FROM Artikel AS A ...
but this does not work due to the aggregate character of the MAX-function.
So, what's the best way to achieve the right-justified order for the number-columns?
Thanks,
Rainer
To get your problem with the list box solved, have a look at this link: http://www.lebans.com/List_Combo.htm
I strongly believe that this type of adjustment should be made in the UI layer and not mixed in with data retrieval.
But to answer your original question, I have created a SQL Fiddle:
MS SQL Server 2008 Schema Setup:
CREATE TABLE dbo.some_numbers(n INT);
Create some example data:
INSERT INTO dbo.some_numbers
SELECT CHECKSUM(NEWID())
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))X(x);
The following query uses the OVER() clause to specify that the MAX() is to be applied over all rows. The > and < that the result is wrapped in are just for illustration and are not required for the solution.
Query 1:
SELECT '>'+
SPACE(MAX(LEN(CAST(n AS VARCHAR(MAX))))OVER()-LEN(CAST(n AS VARCHAR(MAX))))+
CAST(n AS VARCHAR(MAX))+
'<'
FROM dbo.some_numbers SN;
Results:
| COLUMN_0 |
|---------------|
| >-1486993739< |
| > 1620287540< |
| >-1451542215< |
| >-1257364471< |
| > -819471559< |
| >-1364318127< |
| >-1190313739< |
| > 1682890896< |
| >-1050938840< |
| >  484064148< |
This query does a straight cast to show the difference:
Query 2:
SELECT '>'+CAST(n AS VARCHAR(MAX))+'<'
FROM dbo.some_numbers SN;
Results:
| COLUMN_0 |
|---------------|
| >-1486993739< |
| >1620287540< |
| >-1451542215< |
| >-1257364471< |
| >-819471559< |
| >-1364318127< |
| >-1190313739< |
| >1682890896< |
| >-1050938840< |
| >484064148< |
With this query you still need to change the display font to a monospaced font like COURIER NEW. Otherwise, as you have noticed, the result is still misaligned.
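The padding logic itself is independent of SQL Server. As a minimal Python sketch using a few of the values above: compute the longest text length once, which plays the role of MAX(LEN(...)) OVER(), then left-pad every value to that width. The variable names are made up for this illustration.

```python
numbers = [-1486993739, 1620287540, -819471559, 484064148]

# Convert to text first, as the query does with CAST(n AS VARCHAR(MAX)).
texts = [str(n) for n in numbers]

# One width for the whole result set, like MAX(LEN(...)) OVER().
width = max(len(t) for t in texts)

# Left-pad with spaces, like SPACE(width - LEN(text)) + text.
aligned = [">" + t.rjust(width) + "<" for t in texts]
```

As in the SQL version, the columns only line up visually when the output is shown in a monospaced font.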