Separate with different characters in SQL

So I have a column which contains multiple different strings. If the string contains a _ it has to be split on that character. For the others I would use a separate rule like: if it starts with 4FH, GWO or CTW and doesn't have an _, then it has to be split after 3 characters. If it starts with 4 and doesn't have an _.. etc..
Example
|Source |
|EC_HKT |
|4FHHTK |
|ABC_GJE |
|4SHARED |
|ETK_ETK-40|
etc..
What I want as a result is
|Source|Instance|
|EC |HKT |
|4FH |HTK |
|ABC |GJE |
|4 |SHARED |
|ETK |40 |
As a start I first tried
SELECT
LEFT(lr.Source, CHARINDEX('_',lr.Source)) AS Source,
RIGHT(lr.Source, LEN(lr.Source) - CHARINDEX('_', lr.Source)) AS Interface,
But this would only work if all the values had a _. Any tips or ideas? Would a CASE WHEN THEN work?

This requires a little creativity and no doubt more work than what I've done here, however this gives you at least one pattern to work with and enhance as required.
The following uses a simple function to apply basic rules from your sample data to derive the point to split your string, plus some additional removal of characters and removal of the source part if it also exists in the instance part.
If more than one "rule" matches, it uses the one with the higher number of matching characters.
Function:
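create or alter function splitpos(@string varchar(50))
returns table as
return
(
with map as (
-- known prefixes; when several match, the longest one wins
select * from (values ('4FH'),('GWO'),('CTW'),('4'))m(v)
)
-- split just before the underscore if there is one, otherwise after the longest matching prefix
select IsNull(NullIf(CharIndex('_',@string),0)-1,Max(Len(m.v))) pos
from map m
where @string like m.v + '%'
)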
Query:
select l.v source, Replace(Replace(Replace(Stuff(source,1,pos,''),'_',''),'-',''),l.v,'') instance
from t
cross apply dbo.splitpos(t.source)
cross apply (values(Left(source,pos)))l(v)
Demo DB<>Fiddle
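To check the rules outside the database, the same logic can be sketched in Python. This is only an illustration of the approach above, not part of the original answer: split_source and PREFIXES are hypothetical names, and returning the value unsplit when no rule matches is an assumption (the SQL function would yield a NULL position in that case). Note also that Python's replace is case-sensitive, which may differ from your SQL Server collation.

```python
# Hypothetical helper mirroring the splitpos rules from the answer above.
PREFIXES = ("4FH", "GWO", "CTW", "4")


def split_source(value, prefixes=PREFIXES):
    """Return (source, instance) for one raw column value."""
    underscore = value.find("_")
    if underscore >= 0:
        pos = underscore  # split just before the underscore
    else:
        lengths = [len(p) for p in prefixes if value.startswith(p)]
        if not lengths:
            # Assumption: no rule matched, so leave the value unsplit.
            return value, ""
        pos = max(lengths)  # the longest matching prefix wins
    source = value[:pos]
    # Remove the separators, then a repeated source part ("ETK_ETK-40" -> "40").
    rest = value[pos:].replace("_", "").replace("-", "")
    return source, rest.replace(source, "")


for raw in ("EC_HKT", "4FHHTK", "ABC_GJE", "4SHARED", "ETK_ETK-40"):
    print(raw, split_source(raw))
```

Keeping the prefixes in one tuple plays the same role as the map CTE: adding a rule means adding one entry.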

To split with different rules, use a CASE expression. (W3Schools)
SELECT CASE
WHEN lr.Source LIKE '4FH%' AND CHARINDEX('_', lr.Source) = 0
THEN LEFT(lr.Source, 3)
...
END as Source
If these are separate columns then you would need a CASE expression for each column.

Related

How to add delimiter to String after every n character using hive functions?

I have the hive table column value as below.
"112312452343"
I want to add a delimiter such as ":" (i.e., a colon) after every 2 characters.
I would like the output to be:
11:23:12:45:23:43
Is there any hive string manipulation function support available to achieve the above output?
For fixed length this will work fine:
select regexp_replace(str, "(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})","$1:$2:$3:$4:$5:$6")
from
(select "112312452343" as str)s
Result:
11:23:12:45:23:43
Another solution, which works for strings of any length: split the string at every empty position that is preceded (lookbehind, (?<= )) by two digits (\\d{2}) starting at the end of the previous match (\\G), then concatenate the array with the delimiter and remove the delimiter at the end (:$):
select regexp_replace(concat_ws(':',split(str,'(?<=\\G\\d{2})')),':$','')
from
(select "112312452343" as str)s
Result:
11:23:12:45:23:43
If it can contain not only digits, use dot (.) instead of \\d:
regexp_replace(concat_ws(':',split(str,'(?<=\\G..)')),':$','')
This is actually quite simple if you're familiar with regex & lookahead.
Replace every 2 characters that are followed by another character, with themselves + ':'
select regexp_replace('112312452343','..(?=.)','$0:')
+-------------------+
| _c0 |
+-------------------+
| 11:23:12:45:23:43 |
+-------------------+
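If you want to try the lookahead pattern before running it in Hive, the same regex behaves the same way in Python's re module. This is just a sketch for experimenting; add_delimiter is a hypothetical name, and Python is used here only as a convenient test bench (its re module has no \G, so only the lookahead variant is shown).

```python
import re


def add_delimiter(text, delimiter=":"):
    """Insert the delimiter after every 2 characters, except at the very end."""
    # "..(?=.)" matches two characters only when at least one more follows,
    # so the final pair (or a single leftover character) gets no delimiter.
    return re.sub(r"..(?=.)", lambda m: m.group(0) + delimiter, text)


print(add_delimiter("112312452343"))
```

An odd-length input such as "12345" keeps its last character on its own, giving "12:34:5".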

Get the last part of the value returned by split_part() function

I have a file_path string separated by forward slashes. I want to split them based on the forward slashes and return the file name.
INPUT
//a/b/c/xyz.png
OUTPUT
xyz.png
CURRENT SOLUTION
SELECT REVERSE(SPLIT_PART(REVERSE('//a/b/c/xyz.py'), '/', 1)) as "file_name";
Is there a more efficient way of doing this?
regexp_match() is more concise:
select (regexp_match('//a/b/c/xyz.py', '[^/]+$'))[1]
I would just use regexp_replace() to remove everything before the last slash (included):
select regexp_replace('//a/b/c/xyz.png', '.*/', '')
Demo on DB Fiddle:
| regexp_replace |
| :------------- |
| xyz.png |
You can also use substring(), which may or may not be more efficient:
substring('//a/b/c/xyz.png' from '[^/]*$')
PostgreSQL 14 supports a negative index in split_part, which makes this a straightforward operation.
split_part
Splits string at occurrences of delimiter and returns the n'th field (counting from one), or when n is negative, returns the |n|'th-from-last field.
split_part('abc,def,ghi,jkl', ',', -2) → ghi
In this particular scenario:
SELECT SPLIT_PART('//a/b/c/xyz.py', '/', -1) as "file_name";
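For comparison, the "last field" idea behind the negative index can be sketched in Python, which is handy for checking edge cases such as a trailing slash. The function name file_name is hypothetical and only used for illustration.

```python
def file_name(path):
    """Return everything after the last '/', like split_part(path, '/', -1)."""
    # rsplit with maxsplit=1 cuts once, at the last slash; when there is no
    # slash at all, the whole string is the only (and therefore last) field.
    return path.rsplit("/", 1)[-1]


print(file_name("//a/b/c/xyz.png"))
```

A path that ends in a slash yields an empty string, because the last field is empty.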

Format a number to NOT have commas (1,000,000 -> 1000000) in Google BigQuery

In BigQuery, how do we format a number in the result set so that it has no commas, e.g. 1,000,000 to 1000000?
I am assuming that your data type is string here.
You can use the REGEXP_REPLACE function to remove certain symbols from strings.
SELECT REGEXP_REPLACE("1,000,000", r',', '') AS Output
Returns:
+-----+---------+
| Row | Output |
+-----+---------+
| 1 | 1000000 |
+-----+---------+
If your data contains strings with and without commas, this function will return the ones without as they are so you don't need to worry about filtering the input.
Documentation for this function can be found here.

How to use regex OR operation in impala regexp_extract method and get different capture group

I have the following table1 with attribute co:
|-----------------------------------------
| co
|-----------------------------------------
| fsdsdf "This one" fdsfsd ghjhgj "sfdsf"
| Just This
|-----------------------------------------
If there are quotation marks, I would like to get the content of the first quoted occurrence. If there is no quotation mark, I would like to return the content as is.
For the above example:
For the first line - This one
For the second line - Just This
I have SQL code in Impala that solves the first case:
select regexp_extract (co, '"([^"]*)"',1) from table1
How can I generalize it to detect and return the required results for the next case?
You cannot generalize it with a single regexp_extract call in Impala. The problem requires an OR (|) in your regex, and with regexp_extract you have to pass the capture group number as the last argument, e.g.
select regexp_extract (co, '"([^"]*)"',1) from table1
But with the | operator in a regex, the capture group is different for each case, which you cannot express in the regexp_extract call.
Say (A)|(B) is your regex: for your first case the capture group will be 1 and for your second case it will be 2. To date you cannot pass both 1 and 2 to regexp_extract.
The generic regex would be (which I guess won't work with Impala's grouping):
^(?!.*")(.*)$|^[^"]*"(.*?)".*$
Watch out for the capture groups:
in the linked demo, "This one" is captured as group 2,
whereas Just This is captured as group 1.
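Check this using union: extract from the rows that contain a quotation mark, and return the other rows as they are.
select regexp_extract (co, '"([^"]*)"',1) from table1 where co like '%"%'
union all
select co from table1 where co not like '%"%'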
You can use an if function and put RegEx functions inside for the arguments. So,
if(regexp_like(co,'"'),
regexp_extract(co,'"([^"]*)',1), co)
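The two ideas in this thread (a conditional fallback, and the capture-group problem with |) can be checked quickly in Python. This is a sketch for illustration only: the function names are hypothetical, and unlike the if expression above, the pattern here requires a closing quotation mark, so a value with a single unmatched quote is returned unchanged.

```python
import re

# First quoted substring; group 1 is the text between the quotation marks.
QUOTED = re.compile(r'"([^"]*)"')
# The generic alternation quoted earlier in the thread.
EITHER = re.compile(r'^(?!.*")(.*)$|^[^"]*"(.*?)".*$')


def first_quoted_or_all(co):
    """Return the first quoted part, or the whole value if nothing is quoted."""
    match = QUOTED.search(co)
    return match.group(1) if match else co


def matched_group(co):
    """Return which capture group of the alternation matched (1 or 2)."""
    match = EITHER.match(co)
    return match.lastindex if match else None


for co in ('fsdsdf "This one" fdsfsd ghjhgj "sfdsf"', "Just This"):
    print(first_quoted_or_all(co), matched_group(co))
```

matched_group shows why a single fixed group number is not enough for the alternation: the quoted row matches through group 2 and the unquoted row through group 1.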

Query to search substring in column

I have a table whose column holds a substring value, and I want to write a query that checks whether the input string contains that substring.
My table looks like:
| company | host |
| ------- | ---------- |
| ebay | ebay.com |
| google | google.com |
| yahoo | yahoo.com |
My input will be like www.ebay.com or https://www.ebay.com or www.qa.ebay.com or www.dev.ebay.com..
If I get any of the inputs I want to return the first record.
I tried looking at CHARINDEX and INSTR, but they work in reverse. In my scenario the substring to search for is in the table and the full string is the input.
Any help is appreciated.
You can use like for this, but you also need string concatenation. In ANSI standard SQL, this looks like:
select t.*
from t
where @inputstring like concat('%.', t.host)
where @inputstring is the string you are inputting.
Note: You can also use the infix concatenation operator, which is typically || (standard) or +.
You can use the SQL wildcard like so:
SELECT * FROM table WHERE host LIKE '%ebay.com';
Go for this:
SELECT * FROM table WHERE host LIKE '%SearchString%'
It will pull all rows containing the SearchString.
You can achieve this using the LIKE operator.
Select * from yourtable
where ? like concat('%', company, '%');
Replace the ? parameter with your input.
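The direction of the match (the input is the long string, the table holds the short one) is easy to get backwards, so here is a small Python sketch of the concat('%.', host) variant from the first answer. HOSTS and find_company are hypothetical names that stand in for the table and the query; the data is the sample from the question.

```python
# Stand-in for the table from the question: company -> host.
HOSTS = {"ebay": "ebay.com", "google": "google.com", "yahoo": "yahoo.com"}


def find_company(input_string, hosts=HOSTS):
    """Return the company whose host the input ends with, or None."""
    for company, host in hosts.items():
        # Same test as: input LIKE concat('%.', host)
        if input_string.endswith("." + host):
            return company
    return None


for value in ("www.ebay.com", "https://www.ebay.com", "www.qa.ebay.com"):
    print(value, find_company(value))
```

Requiring the leading dot avoids matching an unrelated domain that merely ends in the same letters, but it also means a bare ebay.com or an input with a trailing path is not matched.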