How to split a string with "."-separated values in SQL - sql

For example, split abc.def.efg into the independent strings abc, def and efg:
Head
-----------
abc.def.efg
to
left | center | right
---------------------
abc | def | efg

On SQL Server, for a string with exactly three dot-delimited parts, you can use PARSENAME(). It numbers the parts from the right, so part 3 is the leftmost:
with t as (
select 'left.centre.right' Head
)
select ParseName(Head,3) L, ParseName(Head,2) C, ParseName(Head,1) R
from t;
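Why does PARSENAME(Head,3) return the leftmost part? A small Python model of its numbering makes this clear. It is a sketch, not SQL Server's implementation. Pieces are counted from the right, and a missing piece gives NULL (None here). As far as I know, a string with more than four parts also gives NULL, but treat that as an assumption. The helper name parsename is hypothetical, and bracket or quote handling is ignored.

```python
def parsename(name, piece):
    """Hypothetical model of SQL Server's PARSENAME numbering.

    Piece 1 is the RIGHTMOST part, so for a three-part name piece 3 is the
    leftmost. Bracketed or quoted names are not handled.
    """
    parts = name.split(".")
    # Assumed behaviour: more than four parts, or a piece that does not exist, gives NULL.
    if len(parts) > 4 or not 1 <= piece <= len(parts):
        return None
    return parts[-piece]


head = "left.centre.right"
print(parsename(head, 3), parsename(head, 2), parsename(head, 1))
# prints: left centre right
```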

On MySQL 8.0+ (needed for the WITH clause), you can use SUBSTRING_INDEX():
with t as (
select 'left.centre.right' Head
)
select
substring_index(Head,'.',1) as L,
substring_index(substring_index(Head,'.',2),'.',-1) as M,
substring_index(Head,'.',-1) as R
from t;
Results:
L | M | R
--------------------
left | centre | right
See: DBFIDDLE and DOCS
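The nested SUBSTRING_INDEX call for the middle part is easiest to follow with a model of the function. Below is a Python sketch of its behaviour, not MySQL code. A positive count returns everything left of the count-th delimiter. A negative count returns everything right of the count-th delimiter, counting from the end. The helper substring_index is hypothetical and assumes a non-empty delimiter.

```python
def substring_index(s, delim, count):
    """Hypothetical model of MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)  # assumes a non-empty delimiter
    if count > 0:
        # everything to the left of the count-th delimiter, counting from the left
        return delim.join(parts[:count])
    if count < 0:
        # everything to the right of the |count|-th delimiter, counting from the right
        return delim.join(parts[count:])
    return ""


head = "left.centre.right"
left = substring_index(head, ".", 1)                              # 'left'
middle = substring_index(substring_index(head, ".", 2), ".", -1)  # 'left.centre' -> 'centre'
right = substring_index(head, ".", -1)                            # 'right'
print(left, middle, right)
# prints: left centre right
```

The middle part works in two steps. The inner call keeps the first two parts, and the outer call keeps the last part of that result.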

Look for the split_part() equivalent in the RDBMS you're using. For example, in PostgreSQL:
SELECT
split_part(Head, '.', 1) AS "left",
split_part(Head, '.', 2) AS center,
split_part(Head, '.', 3) AS "right"
FROM your_table
EDIT: corrected the indexes, see: https://www.postgresqltutorial.com/postgresql-split_part/
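If your RDBMS has no split_part() equivalent, a recursive CTE that peels off one part per step is a fairly portable fallback. The sketch below runs it on SQLite through Python's sqlite3 module. SQLite is an assumption, chosen only because it is self-contained. your_table and Head are the names from the query above, and split_three is a hypothetical helper. Parts are numbered from the left, like split_part().

```python
import sqlite3

# Recursive CTE: each step cuts the text before the next '.' off "rest".
# A trailing '.' is appended so the last part is handled like the others.
SPLIT_SQL = """
WITH RECURSIVE parts(head, n, part, rest) AS (
    SELECT Head, 0, NULL, Head || '.' FROM your_table
    UNION ALL
    SELECT head, n + 1,
           substr(rest, 1, instr(rest, '.') - 1),
           substr(rest, instr(rest, '.') + 1)
    FROM parts
    WHERE rest <> ''
)
SELECT head,
       MAX(CASE WHEN n = 1 THEN part END) AS "left",
       MAX(CASE WHEN n = 2 THEN part END) AS center,
       MAX(CASE WHEN n = 3 THEN part END) AS "right"
FROM parts
GROUP BY head
ORDER BY head
"""


def split_three(heads):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (Head TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?)", [(h,) for h in heads])
    rows = conn.execute(SPLIT_SQL).fetchall()
    conn.close()
    return rows


print(split_three(["abc.def.efg"]))
# prints: [('abc.def.efg', 'abc', 'def', 'efg')]
```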

Related

SQL: Divide long text in multiple rows

I would like to divide a long text into multiple rows. There are other questions similar to this one, but none of the answers worked for me.
What I have:
ID | Message
----------------------------------
1 | Very looooooooooooooooong text
2 | Short text
What I would like to do is split that string every n characters.
Result if n = 15:
Id | Message
------------------------------------------
1 | Very looooooooo
1 | oooooooong text
2 | Short text
Even better would be to split at the first space after n characters.
I tried string_split and substring, but I cannot find anything that works.
I thought of using something similar to this:
SELECT index, element FROM table, CAST(message AS SUPER) AS element AT index;
But it doesn't take the length into account, and I don't like casting a varchar into a SUPER.
You can use generate_series() to accomplish this:
select m.*, gs.posn, substring(m.message, gs.posn, 15) as split_message
from messages m
cross join lateral generate_series(1, length(message), 15) gs(posn);
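If generate_series() is not available in your database, a recursive CTE can generate the positions 1, 16, 31, ... instead. Below is a minimal sketch on SQLite via Python's sqlite3, assuming the messages(id, message) table from the question. split_fixed is a hypothetical helper name.

```python
import sqlite3

# Positions 1, 1+w, 1+2w, ... are generated per row by a recursive CTE,
# standing in for generate_series(1, length(message), w).
# Unlike generate_series, an empty message still yields one (empty) row.
CHUNK_SQL = """
WITH RECURSIVE chunks(id, message, posn) AS (
    SELECT id, message, 1 FROM messages
    UNION ALL
    SELECT id, message, posn + :w
    FROM chunks
    WHERE posn + :w <= length(message)
)
SELECT id, substr(message, posn, :w) AS split_message
FROM chunks
ORDER BY id, posn
"""


def split_fixed(rows, width=15):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER, message TEXT)")
    conn.executemany("INSERT INTO messages VALUES (?, ?)", rows)
    result = conn.execute(CHUNK_SQL, {"w": width}).fetchall()
    conn.close()
    return result


print(split_fixed([(1, "Very looooooooooooooooong text"), (2, "Short text")]))
```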
Splitting at spaces after the length is a little trickier. The message has to be split into words, the words grouped into segments of at least 15 characters, and each group re-aggregated.
I could not figure out how to split on spaces without recursion. I hope you don't mind that it treats all whitespace as word boundaries:
with recursive by_words as (
select m.*, s.n, s.word, length(s.word) as word_len,
max(s.n) over (partition by m.id) as num_words
from messages m
cross join lateral regexp_split_to_table(m.message, '\s+')
with ordinality as s(word, n)
), rejoin as (
select id, n, array[word] as words, word_len as cum_word_len,
(word_len >= 15) or (num_words = 1) as keep -- a one-word message is its own final segment
from by_words
where n = 1
union all
select p.id, c.n,
case
when p.cum_word_len >= 15 then array[c.word]
else p.words||c.word
end as words,
case
when p.cum_word_len >= 15 then c.word_len
else p.cum_word_len + c.word_len + 1
end as cum_word_len,
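-- judge the segment this row builds, so a freshly started segment is not emitted early
(case
when p.cum_word_len >= 15 then c.word_len
else p.cum_word_len + c.word_len + 1
end >= 15)
or (c.n = c.num_words) as keep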
from rejoin p
join by_words c on (c.id, c.n) = (p.id, p.n + 1)
)
select id,
row_number() over (partition by id
order by n) as segnum,
array_to_string(words, ' ') as split_message
from rejoin
where keep
order by 1, 2
;
db<>fiddle here
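The grouping rule the recursive query aims for can be checked against a plain Python model. Words are appended to the current segment until it reaches 15 characters, then a new segment starts, and the last segment may be shorter. This is a sketch of the rule only, not a translation of the SQL. split_at_spaces is a hypothetical name. str.split() also drops leading and trailing whitespace, which regexp_split_to_table() does not.

```python
def split_at_spaces(message, width=15):
    """Greedy word packing: a segment is closed once its length reaches width."""
    segments, current, cum_len = [], [], 0
    # str.split() with no argument treats any whitespace run as a boundary, like \s+
    for word in message.split():
        # +1 accounts for the single space that joins this word to the previous one
        cum_len = len(word) if not current else cum_len + 1 + len(word)
        current.append(word)
        if cum_len >= width:
            segments.append(" ".join(current))
            current, cum_len = [], 0
    if current:  # the last, shorter segment (or a one-word message)
        segments.append(" ".join(current))
    return segments


print(split_at_spaces("Short text"))  # ['Short text']
```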
Edit to add:
Can you please tell me whether the below works in Redshift?
with gs as (
select generate_series as posn
from generate_series(1, 150000, 15)
)
select *, substring(m.message, gs.posn, 15) as split_message
from messages m
join gs
on gs.posn <= greatest(1, length(m.message))
order by m.id, gs.posn
;
Thanks to @Mike Organek's answer and his help, I found a solution that also works with Redshift.
The problem with Mike's answer on Redshift is that generate_series() only runs on the leader node there, so it can't be combined with user tables. Here's a workaround:
with numbered as (
select row_number() over () as x
from some_big_table -- placeholder name: any table with at least 100 rows
limit 100 -- 100 positions x 15 characters covers messages up to 1,500 characters
),
result as
(
select (x-1)*15+1 as posn from numbered --change 15 to the segment length you need
)
select * into gs
from result
And then Mike's answer:
select *, substring(m.message from gs.posn for 15) as split_message
from messages m
join gs
on gs.posn <= greatest(1, length(m.message))
order by m.id, gs.posn

SQL: Count each occurrence of words separated by commas

I have a column in a table with words separated by commas. I need to count each occurrence of each word.
My column looks like: ('a, b, c'), ('a, b, d'), ('b, c, d'), ('a'), ('a, c');
(fiddle at the bottom)
Here is what I get:
MyCol | Count
-----------------
a | 1
a, b, c | 3
a, b, d | 3
a, c | 2
b, c, d | 3
But here is what I expect:
MyCol | Count
-------------
a | 4
b | 3
c | 3
d | 2
Here is what I've done so far:
select MyCol, COUNT(*)
from Test
cross apply string_split(MyCol, ',')
group by MyCol
Fiddle : http://sqlfiddle.com/#!18/4e52e/3
Please note that the words are separated by a comma AND a space.
You are using the wrong column. Simply use the [value] column (returned from the STRING_SPLIT() call) and remove the space characters (using TRIM() for SQL Server 2017+ or LTRIM() and RTRIM() for earlier versions):
SELECT TRIM(s.[value]) AS [value], COUNT(*) AS [count]
FROM Test t
CROSS APPLY STRING_SPLIT(t.MyCol, ',') s
GROUP BY TRIM(s.[value])
ORDER BY TRIM(s.[value])
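For comparison, where STRING_SPLIT() is not available, the same split, trim and count can be written with a recursive CTE. This sketch runs on SQLite through Python's sqlite3 module, using the Test(MyCol) table from the question. count_words is a hypothetical helper.

```python
import sqlite3

# Peel one comma-separated word off "rest" per step, trimming the leading space.
COUNT_SQL = """
WITH RECURSIVE split(word, rest) AS (
    SELECT NULL, MyCol || ',' FROM Test
    UNION ALL
    SELECT trim(substr(rest, 1, instr(rest, ',') - 1)),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT word, COUNT(*) AS cnt
FROM split
WHERE word IS NOT NULL
GROUP BY word
ORDER BY word
"""


def count_words(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Test (MyCol TEXT)")
    conn.executemany("INSERT INTO Test VALUES (?)", [(v,) for v in values])
    rows = conn.execute(COUNT_SQL).fetchall()
    conn.close()
    return rows


print(count_words(["a, b, c", "a, b, d", "b, c, d", "a", "a, c"]))
# prints: [('a', 4), ('b', 3), ('c', 3), ('d', 2)]
```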
select value, count(*) cntt
from Test
cross apply string_split(replace(MyCol, ' ', ''), ',') -- strip the spaces first so ' b' and 'b' are counted together
group by value
order by value;
Remove the spaces with REPLACE and then group in an outer query:
select MyCol,count(MyCol) as Count from(
select
REPLACE (value, ' ', '' ) as MyCol
--TRIM(value) as MyCol -- same result here, since the words themselves contain no spaces
-- keep only one of these two lines
from test
cross apply string_split(MyCol, ',')) b
group by MyCol
fiddle
Simply:
select MyCol, 1 + LEN(MyCol) - LEN(REPLACE(MyCol, ',', '')) AS NUM
from Test
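Note what this expression measures. It is the number of commas plus one, i.e. the number of words in each row, rather than the number of occurrences of each word. Below is a quick check on SQLite via Python's sqlite3. length() stands in for LEN(); unlike LEN(), it does not ignore trailing spaces. words_per_row is a hypothetical helper.

```python
import sqlite3


def words_per_row(values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Test (MyCol TEXT)")
    conn.executemany("INSERT INTO Test VALUES (?)", [(v,) for v in values])
    # commas + 1 = number of words in the row
    rows = conn.execute(
        "SELECT MyCol, 1 + length(MyCol) - length(replace(MyCol, ',', '')) AS num "
        "FROM Test ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows


print(words_per_row(["a, b, c", "a, b, d", "b, c, d", "a", "a, c"]))
# prints: [('a, b, c', 3), ('a, b, d', 3), ('b, c, d', 3), ('a', 1), ('a, c', 2)]
```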

Print each character of a word line by line using a SQL query

The input word is "hello".
The output should be printed as shown below, using a SQL query, not PL/SQL:
h
e
l
l
o
You can use the SUBSTR() function within a hierarchical query, such as:
SELECT SUBSTR(col,level,1) AS "letters"
FROM t
CONNECT BY level <= LENGTH(col)
presuming your DB is Oracle, given that you mention PL/SQL.
Use it like this (SQL Server):
select substring(a.b, v.number+1, 1)
from (select 'hello' b) a
join master..spt_values v on v.number < len(a.b)
where v.type = 'P'
You didn't mention your DBMS, but in Postgres you can use:
select *
from unnest(string_to_array('hello', null));
Here is another way to do this in Oracle:
select 'hello'
,substr('hello',rownum,1) as vertical_str
from dual a
join all_tables b
on 1=1
where rownum<=length('hello')
h
e
l
l
o
Another option is a recursive CTE.
WITH
word
(word)
AS
(
SELECT 'hello' word
FROM dual
),
letter
(letter,
remainder)
AS
(
SELECT substr(word, 1, 1) letter,
substr(word, 2) remainder
FROM word
UNION ALL
SELECT substr(remainder, 1, 1) letter,
substr(remainder, 2) remainder
FROM letter
WHERE remainder IS NOT NULL
)
SELECT letter
FROM letter;
db<>fiddle
I am assuming Oracle here, as you mentioned PL/SQL. For other DBMS this would be pretty similar: the substr() function might be called substring(), instead of checking for NULL you'd have to check for remainder being the empty string, and you might need WITH RECURSIVE instead of plain WITH.
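Following those portability notes, here is the same recursive CTE adapted to SQLite and run through Python's sqlite3. It drops dual, uses WITH RECURSIVE, and checks for the empty string instead of using IS NOT NULL. A pos column is added so the ORDER BY is explicit. letters is a hypothetical helper name.

```python
import sqlite3

# One row per character: take the first letter, recurse on the remainder.
LETTERS_SQL = """
WITH RECURSIVE letters(pos, letter, remainder) AS (
    SELECT 1, substr(:word, 1, 1), substr(:word, 2)
    UNION ALL
    SELECT pos + 1, substr(remainder, 1, 1), substr(remainder, 2)
    FROM letters
    WHERE remainder <> ''  -- empty string, not NULL, marks the end here
)
SELECT letter FROM letters ORDER BY pos
"""


def letters(word):
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(LETTERS_SQL, {"word": word}).fetchall()
    conn.close()
    return [r[0] for r in rows]


print("\n".join(letters("hello")))
```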

I want to extract a string after a specific character, if the character exists, in SQL

I have a column with data like:
ss | ss_period (varchar)
--------------------------------
s1 | 01/01/2020-31/01/2020
s2 | 01/08/2019-31/08/2019
s3 | ABC 1/4/2020-30/4/2020
s4 | DEF GHI 1/4/2020-30/4/2020
s5 | ABCDEFGHIJKLMNOP
How can I get a result like this?
ss | ss_period | ss_period
--------------------------------
s1 | 01/01/2020 | 31/01/2020
s2 | 01/08/2019 | 31/08/2019
s3 | 1/4/2020 | 30/4/2020
s4 | 1/4/2020 | 30/4/2020
s5 | null | null
I tried strpos and split_part, but I couldn't get them to work.
I think I should first remove all text except "-" and "/",
but I don't know how to do that.
You could try using regex matching and replacements to do the heavy lifting. First check that an ss_period value contains two dates in the hyphen-separated format you expect. If so, use regex replacements with capture groups to isolate each date; if not, return NULL (a CASE without ELSE does that).
SELECT
ss,
CASE WHEN ss_period ~ '.*\d+/\d+/\d+-\d+/\d+/\d+.*'
THEN REGEXP_REPLACE(ss_period, '^.*?(\d+/\d+/\d+)-.*$', '\1') -- non-greedy .*? so a two-digit day keeps both digits
END AS ss_period_start,
CASE WHEN ss_period ~ '.*\d+/\d+/\d+-\d+/\d+/\d+.*'
THEN REGEXP_REPLACE(ss_period, '^.*-(\d+/\d+/\d+).*$', '\1')
END AS ss_period_end
FROM yourTable
ORDER BY ss;
Demo
You can still use the STRPOS() and SPLIT_PART() functions. Split each value on spaces with generate_series() in a CROSS JOIN, keep the token that contains '-', and LEFT JOIN the result back, such as:
SELECT t.ss,
SPLIT_PART(str, '-', 1) AS ss_period1, SPLIT_PART(str, '-', 2) AS ss_period2
FROM t
LEFT JOIN
(
SELECT ss, SPLIT_PART(ss_period, ' ', n) AS str
FROM t
CROSS JOIN generate_series(1, LENGTH(REGEXP_REPLACE(ss_period, '[^ ]', '', 'g'))+1) AS n
WHERE STRPOS(SPLIT_PART(ss_period, ' ', n),'-') > 0
) AS tt
ON tt.ss = t.ss
Demo
I view this as a two-part problem. First, find the string that has the range. Then split it into two pieces.
For the first part, regexp_match() seems to do the trick. For the second, split_part():
select t.*, split_part(v.rng, '-', 1), split_part(v.rng, '-', 2)
from t cross join lateral
(values ((regexp_match(ss_period, '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}-[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}'))[1])
) v(rng);
Here is a db<>fiddle.
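The two-step idea (find the range, then split it) can also be expressed with one pattern that captures both dates. Below is a Python sketch using the re module, assuming the d/m/yyyy-d/m/yyyy format from the question. period_bounds is a hypothetical name. Rows without a range give (None, None), matching the nulls in the expected output.

```python
import re

# Both ends of a d/m/yyyy-d/m/yyyy range, each captured in its own group.
RANGE = re.compile(r"(\d{1,2}/\d{1,2}/\d{4})-(\d{1,2}/\d{1,2}/\d{4})")


def period_bounds(ss_period):
    """Return (start, end) of the first date range in ss_period, or (None, None)."""
    m = RANGE.search(ss_period)
    return (m.group(1), m.group(2)) if m else (None, None)


for value in ["01/01/2020-31/01/2020", "ABC 1/4/2020-30/4/2020", "ABCDEFGHIJKLMNOP"]:
    print(value, "->", period_bounds(value))
```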
Along the same lines as Tim's answer, you can use the regexp_match() function:
SELECT ss
, (regexp_match(ss_period, '\d{1,2}/\d{1,2}/\d{4}'))[1] AS ss_period_start
, right((regexp_match(ss_period, '-\d{1,2}/\d{1,2}/\d{4}'))[1], -1) AS ss_period_end
FROM your_table ;

T-SQL function to split a string with two delimiters (one for columns, one for rows) into a table

I'm looking for a T-SQL function that takes a string like:
a:b,c:d,e:f
and converts it to a table like:
ID | Value
----------
a | b
c | d
e | f
Everything I found on the Internet handles single-column parsing (e.g. XMLSplit function variations), but none of them lets me describe my string with two delimiters, one for column separation and the other for row separation.
Can you please guide me? I have very limited T-SQL knowledge and cannot adapt those ready-made functions to get a two-column solution.
You can find a split() function on the web. Then, you can do string logic:
select left(val, charindex(':', val) - 1) as col1,
substring(val, charindex(':', val) + 1, len(val)) as col2
from dbo.split(@str, ',') s(val);
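Where no split() function is installed, the same logic can be sketched with a recursive CTE: split on ',', then cut each piece at ':'. This version runs on SQLite through Python's sqlite3, not T-SQL. split_pairs is a hypothetical helper, and a position column n keeps the output in input order.

```python
import sqlite3

# Split on ',' one piece per recursive step, then cut each piece at ':'.
PAIRS_SQL = """
WITH RECURSIVE pairs(n, pair, rest) AS (
    SELECT 0, NULL, :s || ','
    UNION ALL
    SELECT n + 1,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM pairs
    WHERE rest <> ''
)
SELECT substr(pair, 1, instr(pair, ':') - 1) AS id,
       substr(pair, instr(pair, ':') + 1) AS value
FROM pairs
WHERE n > 0
ORDER BY n
"""


def split_pairs(s):
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(PAIRS_SQL, {"s": s}).fetchall()
    conn.close()
    return rows


print(split_pairs("a:b,c:d,e:f"))
# prints: [('a', 'b'), ('c', 'd'), ('e', 'f')]
```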
You can use a custom SQL split function to separate the ID and value columns.
Here is a SQL split function that you can use on a development system.
It returns an ID value, which helps keep each ID and value together.
You need to split twice: first on ",", then a second split on ":".
declare @str nvarchar(100) = 'a:b,c:d,e:f'
select
id = max(id),
value = max(value)
from (
select
rowid,
id = case when id = 1 then val else null end,
value = case when id = 2 then val else null end
from (
select
s.id rowid, t.id, t.val
from (
select * from dbo.Split(@str, ',')
) s
cross apply dbo.Split(s.val, ':') t
) k
) m group by rowid