How to select the first part of pipe-separated data - SQL

Each record in a column named REQUEST_IP_ADDR holds data like '10.247.32.44 | 10.247.32.44'. How do I select only the first part, i.e. 10.247.32.44?
--Below is the select query I am trying to run
SELECT DISTINCT MSG_TYPE_CD, SRC, SRC_IP from MESSAGE_LOG order by MSG_TYPE_CD;
--My table looks as below
MSG_TYPE_CD SRC SRC_IP
KB0192 ZOHO 10.247.32.44 | 10.247.32.44
KB0192 ZOHO 10.247.32.45 | 10.247.32.45
KB0192 ZOHO 127.0.0.1 | 10.240.20.137
KB0192 ZOHO 127.0.0.1 | 10.240.20.138
KB0196 GUPSHUP 10.240.20.59 | 10.10.1.19
I want to select only the first part of the data, the part before the pipe.

Using the base string functions we can try:
SELECT
SRC_IP,
SUBSTR(SRC_IP, 1, INSTR(SRC_IP, '|') - 2) AS first_ip
FROM MESSAGE_LOG
ORDER BY
MSG_TYPE_CD;
The logic behind the first query is that we find the position of the pipe | using INSTR. Then we take the substring from the first character up to two characters before the pipe, which leaves out both the pipe and the space that precedes it. This assumes there is exactly one space before the pipe.
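To check the arithmetic, here is a minimal sketch that runs the same SUBSTR/INSTR expression through Python's built-in sqlite3 module. SQLite's substr and instr are used as stand-ins for the Oracle functions (both count positions from 1), and the in-memory MESSAGE_LOG table is an assumption loaded with the sample rows from the question.

```python
import sqlite3

# In-memory stand-in for MESSAGE_LOG, loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MESSAGE_LOG (MSG_TYPE_CD TEXT, SRC TEXT, SRC_IP TEXT)")
conn.executemany(
    "INSERT INTO MESSAGE_LOG VALUES (?, ?, ?)",
    [
        ("KB0192", "ZOHO", "10.247.32.44 | 10.247.32.44"),
        ("KB0192", "ZOHO", "10.247.32.45 | 10.247.32.45"),
        ("KB0192", "ZOHO", "127.0.0.1 | 10.240.20.137"),
        ("KB0192", "ZOHO", "127.0.0.1 | 10.240.20.138"),
        ("KB0196", "GUPSHUP", "10.240.20.59 | 10.10.1.19"),
    ],
)

# INSTR returns the 1-based position of '|'; using (position - 2) as the
# length stops the substring just before the single space in front of it.
query = """
    SELECT SUBSTR(SRC_IP, 1, INSTR(SRC_IP, '|') - 2) AS first_ip
    FROM MESSAGE_LOG
    ORDER BY MSG_TYPE_CD, SRC_IP
"""
first_ips = [row[0] for row in conn.execute(query)]
print(first_ips)
```

For '127.0.0.1 | 10.240.20.137' the pipe is at position 11, so the length is 9, exactly the nine characters of 127.0.0.1.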
A slicker approach uses REGEXP_SUBSTR:
SELECT
SRC_IP,
REGEXP_SUBSTR(SRC_IP, '^[^ |]+') AS first_ip
FROM MESSAGE_LOG
ORDER BY
MSG_TYPE_CD;
The regex pattern used here is:
^[^ |]+
Starting at the beginning of SRC_IP, this matches one or more characters that are neither a space nor a pipe |, which is exactly the first IP address.
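A quick way to see what ^[^ |]+ matches is Python's re module, which reads this simple anchored, negated character class the same way. The helper name first_ip is hypothetical, and returning None is only meant to mirror the NULL that REGEXP_SUBSTR gives when nothing matches.

```python
import re
from typing import Optional

# '^' anchors at the start of the value; '[^ |]+' then takes one or more
# characters that are neither a space nor a pipe, so the match ends at the
# first space or '|'.
FIRST_IP = re.compile(r"^[^ |]+")


def first_ip(src_ip: str) -> Optional[str]:
    """Mimic REGEXP_SUBSTR(src_ip, '^[^ |]+'): the match, or None (SQL NULL) if none."""
    match = FIRST_IP.match(src_ip)
    return match.group(0) if match else None


samples = [
    "10.247.32.44 | 10.247.32.44",
    "127.0.0.1 | 10.240.20.137",
    "10.240.20.59|10.10.1.19",  # hypothetical value with no space before the pipe
]
print([first_ip(s) for s in samples])
```

The third value is made up to show that, unlike the INSTR(...) - 2 expression, the pattern does not depend on a space before the pipe.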


Split a string on different characters - SQL

So I have a column that contains many different strings. If the string contains a _, it has to be split on that character. For the others I would use a separate rule, like: if it starts with 4FH, GWO or CTW and doesn't have an _, then it has to be split after 3 characters. If it starts with 4 and doesn't have an _, ... etc.
Example
|Source |
|EC_HKT |
|4FHHTK |
|ABC_GJE |
|4SHARED |
|ETK_ETK-40|
etc..
What I want as a result is:
|Source|Instance|
|EC |HKT |
|4FH |HTK |
|ABC |GJE |
|4 |SHARED |
|ETK |40 |
As a start I first tried
SELECT
LEFT(lr.Source, CHARINDEX('_', lr.Source) - 1) AS Source,
RIGHT(lr.Source, LEN(lr.Source) - CHARINDEX('_', lr.Source)) AS Interface
But this only works if every value contains a _. Any tips or ideas? Would a CASE WHEN ... THEN work?
This requires a little creativity and no doubt more work than what I've done here; however, it gives you at least one pattern to work with and enhance as required.
The following uses a simple function to apply basic rules from your sample data to derive the point at which to split your string, plus some additional removal of characters, and removal of the source part if it also appears in the instance part.
If more than one "rule" matches, it uses the one with the most matching characters (the longest matching prefix).
Function:
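create or alter function splitpos(@string varchar(50))
returns table as
return
(
-- prefixes that decide the split point when there is no underscore
with map as (
select * from (values ('4FH'),('GWO'),('CTW'),('4'))m(v)
)
-- split just before the first '_' if present, otherwise after the longest matching prefix
select IsNull(NullIf(CharIndex('_',@string),0)-1,Max(Len(m.v))) pos
from map m
where @string like m.v + '%'
)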
Query:
select l.v source, Replace(Replace(Replace(Stuff(source,1,pos,''),'_',''),'-',''),l.v,'') instance
from t
cross apply dbo.splitpos(t.source)
cross apply (values(Left(source,pos)))l(v)
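To make the rules concrete, here is a rough Python model of dbo.splitpos and the STUFF/REPLACE chain, checked against the sample rows. It only imitates the T-SQL: split_pos and split_source are hypothetical names, and Python's str.replace is case-sensitive, whereas SQL Server's REPLACE follows the column collation (often case-insensitive).

```python
from typing import Optional, Tuple

# Prefix rules taken from the answer's "map" CTE.
PREFIXES = ["4FH", "GWO", "CTW", "4"]


def split_pos(s: str) -> Optional[int]:
    """Model of dbo.splitpos: number of characters that belong to 'source'.

    If the value contains '_', split just before it (CharIndex('_') - 1);
    otherwise use the longest matching prefix (Max(Len(m.v))).
    Returns None when neither rule applies, like the NULL the function yields.
    """
    underscore = s.find("_")  # 0-based index, -1 when absent
    if underscore >= 0:
        return underscore  # equals CharIndex('_', s) - 1
    lengths = [len(p) for p in PREFIXES if s.startswith(p)]
    return max(lengths) if lengths else None


def split_source(s: str) -> Tuple[Optional[str], Optional[str]]:
    """Model of the query: Left(...) for source, Stuff/Replace chain for instance."""
    pos = split_pos(s)
    if pos is None:
        return None, None
    source = s[:pos]  # Left(source, pos)
    rest = s[pos:]    # Stuff(source, 1, pos, '')
    # Drop '_' and '-', then any repeat of the source part (ETK_ETK-40 -> 40).
    instance = rest.replace("_", "").replace("-", "").replace(source, "")
    return source, instance


samples = ["EC_HKT", "4FHHTK", "ABC_GJE", "4SHARED", "ETK_ETK-40"]
for value in samples:
    print(value, split_source(value))
```

4FHHTK matches both '4FH' and '4', and the longer prefix wins, which is what the Max(Len(m.v)) in the function does.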
To split with different rules, use a CASE expression. (W3Schools)
SELECT CASE
WHEN lr.Source LIKE '4FH%' AND CHARINDEX('_', lr.Source) = 0
THEN LEFT(lr.Source, 3)
...
END as Source
If these are separate columns, you need a separate CASE expression for each column.
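A fuller version of the CASE approach, one CASE per output column, can be tried in SQLite through Python's sqlite3 module. This is a sketch under stated assumptions: the table t and column src are hypothetical, and SQLite's instr(src, '_') and substr(...) stand in for SQL Server's CHARINDEX('_', src), LEFT and SUBSTRING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (src TEXT)")  # hypothetical table/column names
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("EC_HKT",), ("4FHHTK",), ("ABC_GJE",), ("4SHARED",), ("ETK_ETK-40",)],
)

# One CASE expression per output column; the WHEN branches are checked in
# order, so the '_' rule wins over the prefix rules, and '4FH%' is tested
# before the shorter '4%'.
query = """
SELECT
  CASE
    WHEN instr(src, '_') > 0 THEN substr(src, 1, instr(src, '_') - 1)
    WHEN src LIKE '4FH%' OR src LIKE 'GWO%' OR src LIKE 'CTW%' THEN substr(src, 1, 3)
    WHEN src LIKE '4%' THEN substr(src, 1, 1)
  END AS source,
  CASE
    WHEN instr(src, '_') > 0 THEN substr(src, instr(src, '_') + 1)
    WHEN src LIKE '4FH%' OR src LIKE 'GWO%' OR src LIKE 'CTW%' THEN substr(src, 4)
    WHEN src LIKE '4%' THEN substr(src, 2)
  END AS instance
FROM t
ORDER BY rowid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Values that match no branch come back as NULL (None), since neither CASE has an ELSE. The ETK_ETK-40 row yields ETK-40 rather than 40, because this plain CASE does not do the extra character and repeated-source cleanup.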

Single Hive query to remove certain text from data

I have a column with data in two formats:
1)"/abc/testapp/v1?FirstName=username&Lastname=test123"
2)"/abc/testapp/v1?FirstName=username"
I want to retrieve the output as "/abc/testapp/v1?FirstName=username", stripping out everything from "&Lastname" to the end of the string. The idea is to remove Lastname together with its value.
If the data doesn't contain "&Lastname", it should still work, as in the second format.
The value for Lastname shown in the example is "test123", but in general it will be dynamic.
I started with regexp_replace, but I can only replace "&Lastname", not its value:
select regexp_replace("/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US","&Lastname","");
Can someone please help me achieve both of these with a single Hive query?
Use the split function:
with your_data as (--Use your table instead of this example
select stack (2,
"/abc/testapp/v1?FirstName=username&Lastname=test123",
"/abc/testapp/v1?FirstName=username"
) as str
)
select split(str,'&')[0] from your_data;
Result:
_c0
/abc/testapp/v1?FirstName=username
/abc/testapp/v1?FirstName=username
Or use '&Lastname' as the split pattern:
select split(str,'&Lastname')[0] from your_data;
This keeps any other &-parameters that come before &Lastname and drops only the part starting at &Lastname.
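A small Python model compares the two split variants and also completes the asker's regexp_replace idea with a pattern that matches the value too. Python's str.split and re.sub stand in for Hive's split and regexp_replace. Hive treats the split delimiter as a Java regex, which makes no difference for these literal strings. The third URL is the asker's own test string. The pattern &Lastname=[^&]* is a suggestion made here, not part of the original answers. In Hive it would presumably be written regexp_replace(str, '&Lastname=[^&]*', '').

```python
import re

urls = [
    "/abc/testapp/v1?FirstName=username&Lastname=test123",
    "/abc/testapp/v1?FirstName=username",
    "/abc/testapp/v1?FirstName=username&Lastname=test123&type=en_US",
]

# split(str, '&')[0]: keep everything before the first '&' (drops every later parameter).
by_amp = [u.split("&")[0] for u in urls]

# split(str, '&Lastname')[0]: keep everything before '&Lastname'
# (earlier '&' parameters survive, anything after Lastname is dropped).
by_lastname = [u.split("&Lastname")[0] for u in urls]

# Completing the regexp_replace attempt: remove '&Lastname=' and its value,
# i.e. all characters up to the next '&' or the end of the string.
by_regex = [re.sub(r"&Lastname=[^&]*", "", u) for u in urls]

for row in zip(by_amp, by_lastname, by_regex):
    print(row)
```

Only the regex variant keeps &type=en_US, which matters if parameters after Lastname must survive.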
This works for both strings, with or without Lastname. In Hive you don't need a table to test it; you can call the function directly, as in SELECT functionname(...):
select
split("/abc/testapp/v1?FirstName=username&Lastname=test123",'&')[0];
select
split("/abc/testapp/v1?FirstName=username",'&')[0];
Result (each query):
_c0
/abc/testapp/v1?FirstName=username
You can also combine both into a single query:
select
split("/abc/testapp/v1?FirstName=username&Lastname=test123",'&')[0],
split("/abc/testapp/v1?FirstName=username",'&')[0];
_c0 _c1
/abc/testapp/v1?FirstName=username /abc/testapp/v1?FirstName=username

Get a substring from a string in Apache Drill using the position of a character

Please help me with a solution.
From the string '/This/is/apache/drill/queries' I want a SQL query that runs on Apache Drill and fetches the substring 'drill', which comes after the 4th occurrence of '/'.
NOTE: the length of the string varies, hence the position of the '/' also varies.
The string starts with '/'.
In Drill, instr(string,'/',1,4) does not work, so I am not able to get the string that appears after the 4th occurrence of '/'.
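Drill has a split UDF that does the same thing as String.split(). The resulting array is 0-based and the leading '/' produces an empty first element, so element [4] is the text after the 4th '/'. This query: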
SELECT split(a, '/')[4] FROM (VALUES('/This/is/apache/drill/queries')) t(a);
Will return the desired result:
+---------+
| EXPR$0 |
+---------+
| drill |
+---------+
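The index arithmetic can be checked with Python's str.split, which, like the Drill result above, keeps the empty element produced by a leading delimiter. This is only a model of the indexing, not of Drill itself.

```python
path = "/This/is/apache/drill/queries"

# The leading '/' produces an empty first element, so the text after
# the 4th '/' sits at 0-based index 4.
parts = path.split("/")
print(parts)     # ['', 'This', 'is', 'apache', 'drill', 'queries']
print(parts[4])  # drill
```

For a string that starts with '/', the text after the n-th '/' is therefore element [n].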
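You can also use REGEXP_REPLACE for that purpose:
SELECT REGEXP_REPLACE('/This/is/apache/drill/queries', '^\/.*?\/.*?\/.*?\/(.*?)\/.*$', '$1');
The regular expression skips past the fourth '/' and captures the content from there up to the fifth '/'; the whole string is then replaced by that captured group. Drill's regex functions are Java-based, so the replacement refers to the group as $1.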
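Python's re module is used here only to check the pattern's behavior. Python writes the back-reference in the replacement as \1. Java's Matcher replacement syntax, which Drill's REGEXP_REPLACE is assumed to follow, writes it as $1. The \/ escapes are dropped because '/' is not special in either engine.

```python
import re

path = "/This/is/apache/drill/queries"

# Skip '/' + segment three times, then capture the segment after the
# 4th '/' (non-greedy) up to the 5th '/', and consume the rest of the string.
PATTERN = r"^/.*?/.*?/.*?/(.*?)/.*$"
fourth = re.sub(PATTERN, r"\1", path)  # Python writes the group as \1
print(fourth)  # drill

# With an extra '.' after '$' the pattern can never match,
# so the substitution returns the input unchanged.
unchanged = re.sub(PATTERN + ".", r"\1", path)
print(unchanged)  # /This/is/apache/drill/queries
```

The pattern also needs a fifth '/'. A value like '/a/b/c/drill' has no fifth '/', so it does not match and comes back unchanged.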

How to use regex OR in Impala's regexp_extract and get a different capture group

I have the following table1 with attribute co:
|-----------------------------------------
| co
|-----------------------------------------
| fsdsdf "This one" fdsfsd ghjhgj "sfdsf"
| Just This
|-----------------------------------------
If there are quotation marks, I would like to get the content of the first quoted part. If there are no quotation marks, I would like to return the content as is.
For the above example:
For the first line - This one
For the second line - Just This
I have SQL code in Impala that solves the first case:
select regexp_extract(co, '"([^"]*)"', 1) from table1
How can I generalize it to detect and return the required results for the next case?
You cannot generalize it that way in Impala. Your problem requires an OR (|) in the regex, but regexp_extract takes the capture group number as its last argument, e.g.:
select regexp_extract(co, '"([^"]*)"', 1) from table1
With | in a regex, the capture group differs between the two cases, and you cannot express that in a single regexp_extract call.
Say your regex is (A)|(B): for the first case the capture group is 1 and for the second it is 2, but regexp_extract accepts only one group number.
The generic regex would be the following. It won't work in Impala: besides the grouping problem, Impala's RE2-based regex engine does not support lookaheads such as (?!...).
^(?!.*")(.*)$|^[^"]*"(.*?)".*$
Note the capture groups: "This one" is captured as group 2, whereas "Just This" is captured as group 1.
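Python's re engine, which unlike RE2 supports lookahead, can run the generic pattern and make the group-number problem visible. It is used here purely as a demonstration outside Impala.

```python
import re

# Branch 1 applies only when the value contains no '"' at all (negative
# lookahead); branch 2 captures the text inside the first pair of quotes.
GENERIC = re.compile(r'^(?!.*")(.*)$|^[^"]*"(.*?)".*$')

rows = ['fsdsdf "This one" fdsfsd ghjhgj "sfdsf"', "Just This"]
groups = [GENERIC.match(co).groups() for co in rows]
for g in groups:
    print(g)
# The wanted text lands in group 2 for the first row but group 1 for the
# second, which a single regexp_extract(co, pattern, n) call cannot express.
```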
Alternatively, use a UNION:
select regexp_extract(co, '"([^"]*)"', 1) from table1 where co like '%"%'
union all
select co from table1 where co not like '%"%'
You can use the if function with the regex functions as its arguments:
select if(regexp_like(co, '"'),
          regexp_extract(co, '"([^"]*)', 1), co)
from table1;
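The branching in the if() expression can be modeled in Python. In this sketch re.search stands in for Impala's regexp_like and regexp_extract, and the function name first_quoted_or_all is hypothetical. It models only the logic, not Impala's RE2 engine.

```python
import re


def first_quoted_or_all(co: str) -> str:
    """Model of if(regexp_like(co, '"'), regexp_extract(co, '"([^"]*)', 1), co)."""
    if re.search('"', co):  # regexp_like(co, '"')
        # The value contains a quote, so '"([^"]*)' always matches: group 1
        # is the text after the first quote, up to the next quote (or the end).
        return re.search(r'"([^"]*)', co).group(1)
    return co  # no quotes: return the value as is


rows = ['fsdsdf "This one" fdsfsd ghjhgj "sfdsf"', "Just This"]
results = [first_quoted_or_all(co) for co in rows]
print(results)
```

Because the pattern does not require a closing quote, a value with a single unmatched quote such as abc "def yields everything after the quote.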

How to extract multiple dates from varchar2(4000) multiline string using sql?

I have two columns, ID (NUMBER) and DESCRIPTION (VARCHAR2(4000)), in the original table.
The DESCRIPTION column holds multi-line strings.
I need to extract the dates from each line of the string and also find the earliest date, so the result would look like the expected table.
Original table: (image not available)
Expected table: (image not available)
Using this query:
to_date((regexp_substr(A.Description , '\d{1,2}/\d{1,2}/\d{4}')), 'MM-DD-YYYY')
I was able to extract the date only from the first line,
Discontinued:09/10/2015:Rappaport Family Institute for Research:;
but not from the other two.
OK, I think I found a solution similar to the other post, but simpler. Note that regexp_substr() returns only one match per call. Here is an example with a string that contains embedded line feeds (they don't really matter, but they show that it works in this case):
WITH A AS
(SELECT 'this is a test:12/01/2015 01/05/2018'
|| chr(13)
||chr(10)
|| ' this is the 2nd line: 07/07/2017' Description
FROM dual
)
SELECT to_date(regexp_substr(A.Description , '\d{1,2}/\d{1,2}/\d{4}',1,level),'MM/DD/YYYY')
FROM A
CONNECT BY level <= regexp_count(a.description, '\d{1,2}/\d{1,2}/\d{4}')
Output:
12/01/2015
01/05/2018
07/07/2017
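If you are not familiar with hierarchical queries in Oracle, LEVEL is a pseudo-column. CONNECT BY level <= regexp_count(...) generates one row per level, from 1 up to the number of times the pattern matches. Passing LEVEL as the 4th parameter (occurrence) of regexp_substr makes each row return the next match: level 1 returns the first date, level 2 the second, and so on. As written, this assumes A holds a single row; with several rows, CONNECT BY level also connects levels across different rows, so a per-row version needs extra conditions (commonly PRIOR id = id AND PRIOR SYS_GUID() IS NOT NULL).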
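The question also asks for the earliest date. This Python sketch applies the same extract-then-minimum logic to the answer's sample string. re.findall stands in for the CONNECT BY loop, and datetime.strptime plays the role of to_date. In Oracle, the earliest date would presumably come from wrapping the to_date(...) expression in MIN(...), grouped by ID when A has several rows.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")

# Same sample string as the Oracle example, including the CR/LF line break.
description = "this is a test:12/01/2015 01/05/2018\r\n this is the 2nd line: 07/07/2017"

# findall returns every occurrence, which is what looping regexp_substr(..., 1, level)
# over level = 1 .. regexp_count(...) produces; strptime plays the role of to_date.
dates = [datetime.strptime(s, "%m/%d/%Y").date() for s in DATE_PATTERN.findall(description)]
earliest = min(dates)

print(dates)
print(earliest)
```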