I have the following table:
select * from top3art;
path | count
-----------------------------+--------
/article/candidate-is-jerk | 338647
/article/bears-love-berries | 253801
/article/bad-things-gone | 170098
I want to trim off '/article/' in path values, so I do this:
select *, trim(leading '/article/' from path) from top3art;
path | count | ltrim
-----------------------------+--------+--------------------
/article/candidate-is-jerk | 338647 | ndidate-is-jerk
/article/bears-love-berries | 253801 | bears-love-berries
/article/bad-things-gone | 170098 | bad-things-gone
Rows 2 and 3 work just fine. But what happened to the 1st row??
It trimmed '/article/ca'. Why did it take 2 more characters?
Now watch what happens when I just trim '/articl':
select *, trim(leading '/articl' from path) as test from top3art;
path | count | test
-----------------------------+--------+----------------------
/article/candidate-is-jerk | 338647 | e/candidate-is-jerk
/article/bears-love-berries | 253801 | e/bears-love-berries
/article/bad-things-gone | 170098 | e/bad-things-gone
That works as expected... Now watch what happens when I add one more char in my trim clause, '/article':
select *, trim(leading '/article' from path) as test from top3art;
path | count | test
-----------------------------+--------+--------------------
/article/candidate-is-jerk | 338647 | ndidate-is-jerk
/article/bears-love-berries | 253801 | bears-love-berries
/article/bad-things-gone | 170098 | bad-things-gone
Same as the first result!
I can't make sense of this.
Why is this happening?
How do I fix it?
trim removes any character in the first argument from the second argument, so it also removes the c and the a of "candidate". Instead of trim, you could use a split_part call:
select *, split_part(path, '/article/', 2) as test from top3art;
Trim removes all signs you mentioned not words/phrases.
Instead of trim use replace()
select *, replace(path, '/article/','') from top3art;
Related
Let's say I have a table like the one below
| Header 1 | Header 2 | Header 3
--------------------------------------------------------------------------------------
| id1 | detail1 | <a#test.com> , <b#test.com> , <c#test.com> , <d#test.com>
How do i explode it on SQL based on the substring emails inside the angle brackets such that it looks like the one below.
| Header 1 | Header 2 | Header 3. |
-------------------------------------------
| id1 | detail1 | a#test.com |
| id1 | detail1 | b#test.com |
| id1 | detail1 | c#test.com |
| id1 | detail1 | d#test.com |
Using regexp_extract_all and explode should do.
select `Header 1`, `Header 2`, explode(regexp_extract_all(`Header 3`, '<(.+?)>')) as `Header 3` from table
this should get you
+--------+--------+----------+
|Header 1|Header 2|Header 3 |
+--------+--------+----------+
|id1 |detail1 |a#test.com|
|id1 |detail1 |b#test.com|
|id1 |detail1 |c#test.com|
|id1 |detail1 |d#test.com|
+--------+--------+----------+
Be aware that regexp_extract_all was added to spark since version 3.1.0.
For spark blow 3.1.0
This can be done with split, somewhat a dirty hack. But the strategy and the results are the same.
select `Header 1`, `Header 2`, explode(array_remove(split(`Header 3`, '[<>,\\s]+'), '')) as `Header 3` from table
What this do is to regex match the delimiters and split the string into array. It also needs an array_remove function call to remove unneeded empty string.
Explanation
With regexp_extract_all, we use the pattern <(.+?)> to extract all strings within angle brackets, into an array like this
['a#test.com', 'b#test.com', 'c#test.com']
For the pattern (.+?)here
. matches 1 character;
+ is a quantifier of ., looking for 1 or unlimited matches;
? is a non greedy modifier, makes the match stop as soon as possible;
brackets makes the pattern with in angle brackets as a matching group, so we can extract from groups later;
Now with explode, we can separate elements of the array into multiple rows, hence the result above.
In Oracle database I have such table.
| TREE | ORG_NAME |
|---------------------------------|----------|
| \Google earth\Nest global\ATAP | ATAP |
| \Google earth\Nest\Beemoney\ | Beemoney |
| \Google\\\BeeKey\ | |
| | York |
I am trying to make sql query which would return such result.
| ORGANIZATION |
|-----------------------------------|
| Google earth > Nest global > ATAP |
| Google earth > Nest Beemoney |
| Google > BeeKey |
| York |
As you can see I want:
1) Replace \ symbol at the beginning and end of the sentence.
2) Replace \ symbol which is inside sentence to > symbol.
3) Replace \\\ symbol which is inside sentence to > symbol.
4) If TREE colomn is empty take record from ORG_NAME colomn.
Here is how I started. This SQL query solve 2, 3 and 4 part. How to solve problem with 1 part. I think I need to use REGEXP_REPLACE, right? How to make it correctly? Is there any other more elegant way to redisign sql query? As you can see I walk on the same table a few times.
SELECT
COALESCE (TREE, ORG_NAME) as ORGANIZATION
FROM (
SELECT
REPLACE(TREE, '\', '>') AS TREE,
ORG_NAME
FROM (
SELECT
REPLACE(TREE, '\\\', '>') AS TREE,
ORG_NAME
FROM
ORG
)
)
This could be a way with a regexp_replace and a trim to remove the characters from the beginning and the end of the string:
select nvl(regexp_replace( trim('\' from tree), '\\+', ' > '), org_name)
from yourTable
Here is a working solution which uses two calls to regexp_replace:
select
regexp_replace(
regexp_replace('\Google\\\BeeKey\', '^\\?(.*?)\\?$', '\1'), '\\+', ' > ')
from dual;
Google > BeeKey
Demo
The inner call to regexp_replace strips off any possible leading or trailing path separators. The outer call converts any number of internal path separators / to > separators as a replacement.
I have an field pattern and value in that field is INDI/17-18/6767/KER/787 .I want to get 6767 from this string
I used the query
select substr(pattern,12,15) from pattern_table
But the output I got is 6767/KER/787 instead of 6767.
Try this:
You have to give the length as the 3rd value, not the position.
SELECT SUBSTR(pattern,12,4) FROM pattern_table
For a generic result to get the 3rd value separated by a delimiter, you may use REGEXP_SUBSTR.
SQL Fiddle
Query 1:
SELECT pattern,REGEXP_SUBSTR(pattern, '[^/]+', 1, 3) id
FROM pattern_table
Results:
| PATTERN | ID |
|--------------------------|-------|
| INDI/17-18/6767/KER/787 | 6767 |
| INDI/17-18-19/67/KER/787 | 67 |
| INDI/16-18/67890/KAR/986 | 67890 |
even this will also work:
SELECT substr('INDI/17-18/6767/KER/787',instr('INDI/17-
18/6767/KER/787','/',1,2)+1,4) FROM dual;
I am trying to replace the first occurrence of '-' in a string in Hive table. I am using HiveQL. I searched this topic here and other websites, but could not find clear explanation how to use metacharacters with regexp_replace() to do that.
This is a string from which I need to replace first '-' with empty space: 16-001-02707
The result should be like this: 16001-02707
This is the method I used:
select regexp_replace ('16-001-02707','[^[:digit:]]', '');
However, this doesn't do anything.
select regexp_replace ('16-001-02707','^(.*?)-', '$1');
16001-02707
Following the OP question in the comments
with t as (select '111-22-333333-4-555-6-7-8888-999999' as col)
select regexp_replace (col,'^(.*?)-','$1')
,regexp_replace (col,'^(.*?-.*?)-','$1')
,regexp_replace (col,'^((.*?-){2}.*?)-','$1')
,regexp_replace (col,'^((.*?-){3}.*?)-','$1')
,regexp_replace (col,'^((.*?-){4}.*?)-','$1')
,regexp_replace (col,'^((.*?-){5}.*?)-','$1')
from t
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
| _c0 | _c1 | _c2 | _c3 | _c4 | _c5 |
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
| 11122-333333-4-555-6-7-8888-999999 | 111-22333333-4-555-6-7-8888-999999 | 111-22-3333334-555-6-7-8888-999999 | 111-22-333333-4555-6-7-8888-999999 | 111-22-333333-4-5556-7-8888-999999 | 111-22-333333-4-555-67-8888-999999 |
+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+------------------------------------+
When performing SELECT statements including number columns (prices, for example), the result always is left to right ordered, which reduces the readability. Therefore I'm searching a method to format the output of number columns right to left.
I already tried to use something like
SELECT ... SPACE(15-LEN(A.Nummer))+A.Nummer ...
FROM Artikel AS A ...
which gives close results, but depending on font not really. An alternative would be to replace 'SPACE()' with 'REPLICATE('_',...)', but I don't really like the underscores in output.
Beside that this formula will crash on numbers with more digits than 15, therefore I searched for a way finding the maximum length of entries to make it more save like
SELECT ... SPACE(MAX(A.Nummer)-LEN(A.Nummer))+A.Nummer ...
FROM Artikel AS A ...
but this does not work due to the aggregate character of the MAX-function.
So, what's the best way to achieve the right-justified order for the number-columns?
Thanks,
Rainer
To get you problem with the list box solved have a look at this link: http://www.lebans.com/List_Combo.htm
I strongly believe that this type of adjustment should be made in the UI layer and not mixed in with data retrieval.
But to answer your original question i have created a SQL Fiddle:
MS SQL Server 2008 Schema Setup:
CREATE TABLE dbo.some_numbers(n INT);
Create some example data:
INSERT INTO dbo.some_numbers
SELECT CHECKSUM(NEWID())
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))X(x);
The following query is using the OVER() clause to specify that the MAX() is to be applied over all rows. The > and < that the result is wrapped in is just for illustration purposes and not required for the solution.
Query 1:
SELECT '>'+
SPACE(MAX(LEN(CAST(n AS VARCHAR(MAX))))OVER()-LEN(CAST(n AS VARCHAR(MAX))))+
CAST(n AS VARCHAR(MAX))+
'<'
FROM dbo.some_numbers SN;
Results:
| COLUMN_0 |
|---------------|
| >-1486993739< |
| > 1620287540< |
| >-1451542215< |
| >-1257364471< |
| > -819471559< |
| >-1364318127< |
| >-1190313739< |
| > 1682890896< |
| >-1050938840< |
| > 484064148< |
This query does a straight case to show the difference:
Query 2:
SELECT '>'+CAST(n AS VARCHAR(MAX))+'<'
FROM dbo.some_numbers SN;
Results:
| COLUMN_0 |
|---------------|
| >-1486993739< |
| >1620287540< |
| >-1451542215< |
| >-1257364471< |
| >-819471559< |
| >-1364318127< |
| >-1190313739< |
| >1682890896< |
| >-1050938840< |
| >484064148< |
With this query you still need to change the display font to a monospaced font like COURIER NEW. Otherwise, as you have noticed, the result is still misaligned.