Find the frequency of all words from a concatenated column - sql

I have a concatenated text column derived from three columns in a table. I need the frequency of every single word in that concatenated column.
Column1 | Column2 | Column3
This is | Test | 1
This was | Test | two
What I need is the concatenation of all three columns, i.e. "This is Test 1" and "This was Test two", and then the count of each word, i.e.
This - 2
is - 1
was - 1
Test - 2
1 - 1
two - 1

You can use STRING_SPLIT (available from SQL Server 2016) with CROSS APPLY to achieve the required result. Try the following:
declare @tab table (col1 varchar(100), col2 varchar(100), col3 varchar(100))
insert into @tab
select 'This is', 'Test', '1'
union
select 'This was','Test','two'
select value, count(*) rec_count
from @tab
cross apply string_split((col1+' '+col2+' '+col3), ' ')
group by value
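
One thing the query above does not cover: if any of the three columns is NULL, the + concatenation yields NULL and that row contributes no words. A minimal sketch of a NULL-tolerant variant, assuming SQL Server 2016 or later; the table variable @words and its sample rows are made up for illustration:

```sql
-- Hypothetical sample data; the second row has a NULL middle column.
DECLARE @words TABLE (col1 varchar(100), col2 varchar(100), col3 varchar(100));
INSERT INTO @words VALUES
    ('This is',  'Test', '1'),
    ('This was', NULL,   'two');

SELECT s.value AS word, COUNT(*) AS rec_count
FROM @words AS w
-- CONCAT treats NULL as an empty string, so the row is still split.
CROSS APPLY STRING_SPLIT(CONCAT(w.col1, ' ', w.col2, ' ', w.col3), ' ') AS s
WHERE s.value <> ''   -- drop empty tokens produced by consecutive spaces
GROUP BY s.value;
```

Whether 'This' and 'this' are counted together depends on the collation of the column; under a case-insensitive collation they fall into the same group.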

Related

How to count task like completion from one column

I have a column that contains
"*********Task list completion *******
1.test
2.test
3.
4.
5.
How do I create a SQL query to count the completed tasks, i.e. the entries that have text after the "number." prefix, such as 1.test?
The example above should come back with count = 2.
I have tried:
SELECT id as Task
,table
,table.id
from table
where table.id like '%Tier 1 Please Insure all tasks%'
and table.id LIKE '%3._%'
I solved it like this:
SELECT COUNT(*)
FROM (
SELECT [Column] FROM [table] WHERE [Column] LIKE '[0-9]%._%'
) AS a
In the WHERE condition I check that the column value:
starts with a digit
contains a dot
has at least one character after the dot
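
To see how that pattern behaves, here is a small sketch that runs the same LIKE expression over a few literal values. The values and the column alias is_completed are illustrative only:

```sql
SELECT v.val,
       CASE WHEN v.val LIKE '[0-9]%._%' THEN 1 ELSE 0 END AS is_completed
FROM (VALUES ('1.test'), ('2.test'), ('3.'), ('4.'), ('1x.y')) AS v(val);
-- '1.test' and '2.test' match; '3.' and '4.' do not (nothing after the dot).
-- '1x.y' also matches, because % allows any characters between the digit and the dot.
```

If the dot must follow the number immediately, the pattern would need to be tightened (for example '[0-9]._%' for single-digit task numbers).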
Update
Because you did not provide the full question at the start, here is my updated answer:
Before running the query I created a table with data for testing:
CREATE TABLE tbl (field NVARCHAR(200))
INSERT INTO tbl VALUES ('Test DGS ********** Tier 1 Please Insure all tasks are fully completed ********* 1.test 2.test 3. 4. 5. 6. ')
CREATE TABLE #pom (Word NVARCHAR(50))
Then I wrote the main query:
DECLARE @Strings NVARCHAR(200) = (SELECT REVERSE(LEFT(REVERSE(field), charindex('*', REVERSE(field)) - 2)) FROM tbl)
WHILE LEN(@Strings) > 0
BEGIN
INSERT INTO #pom
-- append a space so CHARINDEX never returns 0 on the last word
SELECT LEFT(@Strings, CHARINDEX(' ', @Strings + ' ') - 1) AS 'Word'
SET @Strings = stuff(@Strings, 1, charindex(' ', @Strings + ' '), '')
END
SELECT COUNT(*)
FROM (
SELECT Word FROM #pom WHERE Word LIKE '[0-9]%._%'
) AS a
You can create a function or stored procedure with this code or whatever you need.
You can try this:
SELECT count(*)
FROM table
WHERE table.id LIKE '[1-9]%[a-z]'
If the data is consistently in this format, you can use wildcard expressions as below.
SELECT count(*)
FROM [table]
WHERE [column] LIKE '[1-9]%[a-z]'

Split a column with comma delimiter

I have a table with the data given below.
ID | Col1 | Col2 | Status
1 | 8007590006 | 8002240001,8002170828 | I
2 | 8002170828 | 8002000004 | I
3 | 8002000001 | 8002240001 | I
4 | 8769879809 | 8002000001 | I
5 | 8769879809 | 8002000001 | I
Col2 can contain multiple comma delimited values. I need to update status to C if there is a value in col2 that is also present in col1.
For example, for ID = 1, col2 contains 8002170828 which is present in Col1, ID = 2. So, status = 'C'
From what I tried, I know it won't work where there are multiple values as I need to split that data and get individual values and then apply update.
UPDATE Table1
SET STATUS = 'C'
WHERE Col1 IN (SELECT Col2 FROM Table1)
If you are using SQL Server 2016 or later, then STRING_SPLIT comes in handy:
WITH cte AS (
SELECT ID, Col1, value AS Col2
FROM Table1
CROSS APPLY STRING_SPLIT(Col2, ',')
)
UPDATE t1
SET Status = 'C'
FROM Table1 t1
INNER JOIN cte t2
ON t1.Col1 = t2.Col2;
This answer is intended as a supplement to Tim's answer.
As you don't have the native STRING_SPLIT that came in SQL Server 2016, we can make one:
CREATE FUNCTION dbo.STRING_SPLIT
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT y.i.value('(./text())[1]', 'nvarchar(4000)') as value
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
--credits to sqlperformance.com for the majority of this code - https://sqlperformance.com/2012/07/t-sql-queries/split-strings
Now Tim's answer should work for you, so I won't repeat it here.
I chose an XML-based approach because it performs well and your data looks clean and won't contain any XML special characters. If it ever does contain characters such as < or &, they will break the parsing; escape them before the split and unescape them afterwards.
If you aren't allowed to create functions, you can take the SELECT that sits between the RETURN and the GO, insert it into Tim's query, change the variable names to column names, and it will still work.
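
A rough sketch of what that inlined version might look like. The temp table #Table1 and its column types are assumptions made so the example is self-contained; the sample rows are copied from the question:

```sql
-- Assumed schema; the question does not state the column types.
CREATE TABLE #Table1 (ID int, Col1 varchar(20), Col2 varchar(200), Status char(1));
INSERT INTO #Table1 VALUES
    (1, '8007590006', '8002240001,8002170828', 'I'),
    (2, '8002170828', '8002000004', 'I'),
    (3, '8002000001', '8002240001', 'I'),
    (4, '8769879809', '8002000001', 'I'),
    (5, '8769879809', '8002000001', 'I');

WITH cte AS (
    -- Body of the XML splitter, with @List replaced by t.Col2
    -- and @Delimiter replaced by the literal ','.
    SELECT t.ID, t.Col1, y.i.value('(./text())[1]', 'varchar(20)') AS Col2
    FROM #Table1 AS t
    CROSS APPLY (
        SELECT x = CONVERT(XML, '<i>' + REPLACE(t.Col2, ',', '</i><i>') + '</i>').query('.')
    ) AS a
    CROSS APPLY a.x.nodes('i') AS y(i)
)
UPDATE t1
SET Status = 'C'
FROM #Table1 AS t1
INNER JOIN cte AS t2
    ON t1.Col1 = t2.Col2;

SELECT ID, Col1, Col2, Status FROM #Table1 ORDER BY ID;
```

The join condition is kept exactly as in Tim's query, so a row is marked 'C' when its Col1 appears among the split Col2 values.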

Get a specific string

This is my data; every ThroughRouteSid record follows the same pattern: six comma-separated numbers.
I want to take only the third and fifth numbers, write them as two records into a template table, and give both records the same Count() value.
For example, take the first record in the picture:
ThroughRouteSid(3730,2428,2428,3935,3935,3938,) Count(32).
I want a result like this:
2428 32 3935 32
That is, the numbers I want become two records in the template table, and both carry the same Count value.
You can use XML to get your result; please refer to the sample code below:
create table #t1( ThroughRouteSid varchar(500) , Cnt int)
insert into #t1
select '3730,2428,2428,3935,3935,3938,' , len('3730,2428,2428,3935,3935,3938,')
union all select '1111,2222,3333,4444,5555,6666,' , len('1111,2222,3333,4444,5555,6666,')
select cast( '<xml><td>' + REPLACE( SUBSTRING(ThroughRouteSid ,1 , len(ThroughRouteSid)-1),',','</td><td>') + '</td></xml>' as xml) XmlData , Cnt
into #t2 from #t1
select XmlData.value('(xml/td)[3]' ,'int' ), Cnt ,XmlData.value('(xml/td)[5]' ,'int' ), Cnt
from #t2
First create the split function described in "How to Split a string by delimited char in SQL Server". Then try the following query:
select (SELECT CONVERT(varchar,splitdata) + ' '+ Convert(varchar, [Count])+' ' FROM (select splitdata, ROW_NUMBER() over (ORDER BY (SELECT 100)) row_no
from [dbo].[fnSplitString](ThroughRouteSid,',')
where splitdata != '') as temp where row_no in (2,5)
for xml path('')) as col1 from [yourtable]
If you are using SQL Server 2016 or later, you can do something like this:
create table #temp (ThroughRouteSid varchar(1024),[Count] int)
insert into #temp values
('3730,2428,2428,3935,3935,3938,',32),
('730,428,428,335,935,938,',28)
select
spt.value,
t.[Count]
from #temp t
cross apply (
select value from STRING_SPLIT(t.ThroughRouteSid,',') where LEN(value) > 0
)spt
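
The query above returns every non-empty number rather than only the third and fifth, and plain STRING_SPLIT does not expose a position. A hedged sketch of one way to pick positions, assuming SQL Server 2022 or Azure SQL Database, where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column (the temp table name #routes is illustrative):

```sql
CREATE TABLE #routes (ThroughRouteSid varchar(1024), [Count] int);
INSERT INTO #routes VALUES
    ('3730,2428,2428,3935,3935,3938,', 32),
    ('730,428,428,335,935,938,', 28);

SELECT spt.value, t.[Count]
FROM #routes AS t
CROSS APPLY STRING_SPLIT(t.ThroughRouteSid, ',', 1) AS spt  -- 1 = enable_ordinal
WHERE spt.ordinal IN (3, 5);   -- keep only the third and fifth numbers
```

On older versions the ordinal argument is not available, so the XML-based approach shown earlier in this thread is the safer choice there.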

Dynamic comma-separated string into different columns

Could someone please help me with this strange scenario? I have the data given below.
DECLARE @TABLE TABLE
(
ID INT,
PHONE001 VARCHAR(500)
)
INSERT @TABLE
SELECT 1,'01323840261,01323844711' UNION ALL
SELECT 2,'' UNION ALL
SELECT 3,',01476862000' UNION ALL
SELECT 4,'01233625418,1223822583,125985' UNION ALL
SELECT 5,'2089840022,9.99021E+13'
I am trying to put each comma-separated value into a separate column. The maximum number of columns depends on the longest comma-separated string.
Expected Output
1|01323840261|01323844711|''
2|''|''|''
3|01476862000|''|''
4|01233625418|1223822583|125985
5|2089840022|9.99021E+13|''
Try:
select id,T.c.value('t[1]','varchar(50)') as col1,
T.c.value('t[2]','varchar(50)') as col2 ,
T.c.value('t[3]','varchar(50)') as col3 from
(select id,cast ('<t>'+ replace(PHONE001,',','</t><t>') +'</t>'
as xml) x
from @TABLE) a cross apply x.nodes('.') t(c)

split string in column

I have data that has come over from a hierarchical database, and it often has columns that contain data that SHOULD be in another table, if the original database had been relational.
The column's data is formatted in pairs, with LABEL\VALUE with a space as the delimiter, like this:
LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE
There is seldom more than one pair in a record, but there can be as many as three. There are 24 different possible labels. There are other columns in this table, including the ID. I have been able to convert this column into a sparse array without using a cursor, with columns for ID, LABEL1, LABEL2, etc....
But this is not ideal for use in another query. My other option is to use a cursor, loop through the entire table once and write to a temp table, but I can't seem to get it to work the way I want. I have been able to do it in just a few minutes in VB.NET, using a couple of nested loops, but can't manage to do it in T-SQL even using cursors. The problem is that I would have to remember to run this program every time before I want to use the table it creates. Not ideal.
So, I read a row, split out the pairs from 'LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE' into an array, then split them out again, then write the rows
ID, LABEL1, VALUE
ID, LABEL2, VALUE
ID, LABEL3, VALUE
etc...
I realize that 'splitting' the strings here is the hard part for SQL to do, but it just seems a lot more difficult than it needs to be. What am I missing?
Assuming that the data label contains no . characters, you can use a simple function for this:
CREATE FUNCTION [dbo].[SplitGriswold]
(
@List NVARCHAR(MAX),
@Delim1 NCHAR(1),
@Delim2 NCHAR(1)
)
RETURNS TABLE
AS
RETURN
(
SELECT
Val1 = PARSENAME(Value,2),
Val2 = PARSENAME(Value,1)
FROM
(
SELECT REPLACE(Value, @Delim2, '.') FROM
(
SELECT LTRIM(RTRIM(SUBSTRING(@List, [Number],
CHARINDEX(@Delim1, @List + @Delim1, [Number]) - [Number])))
FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(@List)
AND SUBSTRING(@Delim1 + @List, [Number], LEN(@Delim1)) = @Delim1
) AS y(Value)
) AS z(Value)
);
GO
Sample usage:
DECLARE @x TABLE(ID INT, string VARCHAR(255));
INSERT @x VALUES
(1, 'LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE'),
(2, 'LABEL1\VALUE2 LABEL2\VALUE2');
SELECT x.ID, t.val1, t.val2
FROM @x AS x CROSS APPLY
dbo.SplitGriswold(REPLACE(x.string, ' ', N'ŏ'), N'ŏ', '\') AS t;
(I used a Unicode character unlikely to appear in data above, only because a space can be problematic for things like length checks. If this character is likely to appear, choose a different one.)
Results:
ID val1 val2
-- -------- --------
1 LABEL1 VALUE
1 LABEL2 VALUE
1 LABEL3 VALUE
2 LABEL1 VALUE2
2 LABEL2 VALUE2
If your data might have ., then you can just make the query a little more complex, without changing the function, by adding yet another character to the mix that is unlikely or impossible to be in the data:
DECLARE @x TABLE(ID INT, string VARCHAR(255));
INSERT @x VALUES
(1, 'LABEL1\VALUE.A LABEL2\VALUE.B LABEL3\VALUE.C'),
(2, 'LABEL1\VALUE2.A LABEL2.1\VALUE2.B');
SELECT x.ID, val1 = REPLACE(t.val1, N'ű', '.'), val2 = REPLACE(t.val2, N'ű', '.')
FROM @x AS x CROSS APPLY
dbo.SplitGriswold(REPLACE(REPLACE(x.string, ' ', N'ŏ'), '.', N'ű'), N'ŏ', '\') AS t;
Results:
ID val1 val2
-- -------- --------
1 LABEL1 VALUE.A
1 LABEL2 VALUE.B
1 LABEL3 VALUE.C
2 LABEL1 VALUE2.A
2 LABEL2.1 VALUE2.B
With only three values, you can manage to do this by brute force:
select val1,
(case when rest like '% %'
then left(rest, charindex(' ', rest) - 1)
else rest
end) as val2,
-- val3 is whatever remains of rest after its first space
(case when rest like '% %'
then substring(rest, charindex(' ', rest) + 1, 1000)
end) as val3
from (select (case when col like '% %'
then left(col, charindex(' ', col) - 1)
else col
end) as val1,
(case when col like '% %'
then substring(col, charindex(' ', col) + 1, 1000)
end) as rest
from t
) t
Using the split-string function from the referenced SQL tutorial, you can split the label-value pairs as follows:
SELECT
id, max(label) as label, max(value) as value
FROM (
SELECT
s.id,
label = case when t.id = 1 then t.val else NULL end,
value = case when t.id = 2 then t.val else NULL end
FROM dbo.Split(N'LABEL1\VALUE1 LABEL2\VALUE2 LABEL3\VALUE3', ' ') s
CROSS APPLY dbo.Split(s.val, '\') t
) t
group by id
You can see that the split function is called twice: the first call splits the pairs from one another, and the second call, joined to the first with CROSS APPLY, splits each pair into its label and value.
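
On SQL Server 2016 or later, the same two-step idea can be sketched without a custom function: the built-in STRING_SPLIT separates the pairs, and CHARINDEX separates each label from its value. The table variable @pairs and the output column names are assumptions for illustration; the sample rows come from the earlier answer:

```sql
DECLARE @pairs TABLE (ID int, string varchar(255));
INSERT @pairs VALUES
    (1, 'LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE'),
    (2, 'LABEL1\VALUE2 LABEL2\VALUE2');

SELECT p.ID,
       -- Appending '\' guarantees CHARINDEX finds a backslash,
       -- so LEFT never receives a negative length.
       LEFT(s.value, CHARINDEX('\', s.value + '\') - 1) AS label,
       SUBSTRING(s.value, CHARINDEX('\', s.value + '\') + 1, LEN(s.value)) AS val
FROM @pairs AS p
CROSS APPLY STRING_SPLIT(p.string, ' ') AS s
WHERE s.value <> '';   -- skip empty tokens from repeated spaces
```

This yields one row per ID and pair, which is the ID, LABEL, VALUE shape the question asks for. A pair without a backslash comes back with the whole token as the label and an empty value.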