I have two tables as follows:
CREATE TABLE keyword_tbl
(
WORDS VARCHAR(100),
TOPIC VARCHAR(100)
);
INSERT INTO keyword_tbl
VALUES ('leaf', 'nature'), ('leaves', 'nature'),
('wind', 'nature'), ('knife', 'utensils'),
('knives', 'utensils'), ('calf', 'animal'),
('calves', 'animal');
CREATE TABLE content
(
CONTENT_ID VARCHAR(100),
DESCRIPTION VARCHAR(100)
);
INSERT INTO content
VALUES ('uuid1', 'leaves fall in autumn like leafs'),
('uuid2', 'the calf is playing in the leaf, the knife' ),
('uuid3', 'knives cutting the wind'),
('uuid4', 'he says hi'),
('uuid5', 'the calves running through the wind');
I want to be able to count the occurrences of each word per topic. My ideal output would look as follows.
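content_id | description                                | nature | utensils | animal
-----------------------------------------------------------------------------------
uuid1      | leaves fall in autumn like leafs           | 2      | 0        | 0
uuid2      | the calf is playing in the leaf, the knife | 1      | 1        | 1
uuid3      | knives cutting the wind                    | 1      | 1        | 0
uuid4      | he says hi                                 | 0      | 0        | 0
uuid5      | the calves running through the wind        | 1      | 0        | 1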
Explanation:
For uuid1, we count leaves and leaf, hence nature has a value of 2.
For uuid2, we count calf, leaf and knife, hence nature, utensils and animal each have a count of 1, and so on.
Is there a way to do this automatically?
Using STRTOK_SPLIT_TO_TABLE:
Tokenizes a string with the given set of delimiters and flattens the results into rows.
SELECT c.CONTENT_ID, c.DESCRIPTION
,COUNT_IF(k.TOPIC = 'nature') AS nature
,COUNT_IF(k.TOPIC = 'utensils') AS utensils
,COUNT_IF(k.TOPIC = 'animal') AS animal
FROM content c
,LATERAL STRTOK_SPLIT_TO_TABLE(c.description, '(),. ') s
LEFT JOIN keyword_tbl k -- LEFT JOIN keeps rows without any keyword match (e.g. uuid4) with counts of 0
ON TRIM(s.value) = k.words
GROUP BY c.CONTENT_ID, c.DESCRIPTION
ORDER BY c.CONTENT_ID;
To handle "leaf", "leafs" the join condition needs to be altered:
-- substring
ON TRIM(s.value) ILIKE k.words|| '%'
-- only 's'
ON TRIM(s.value) ILIKE ANY (k.words, k.words|| 's')
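To make the tokenize-then-join logic easier to follow, here is a minimal Python sketch of the same idea using the "only 's'" matching rule. It is not Snowflake code; the names `KEYWORDS` and `topic_counts` are made up for illustration, and the data is copied from the question.

```python
import re
from collections import Counter

# Keyword -> topic pairs copied from keyword_tbl in the question.
KEYWORDS = {
    "leaf": "nature", "leaves": "nature", "wind": "nature",
    "knife": "utensils", "knives": "utensils",
    "calf": "animal", "calves": "animal",
}

def topic_counts(description):
    """Count tokens per topic; a token matches a keyword exactly or with a trailing 's'."""
    counts = Counter()
    # Split on the same delimiter set passed to STRTOK_SPLIT_TO_TABLE: '(),. '
    for token in re.split(r"[(),. ]+", description.lower()):
        for word, topic in KEYWORDS.items():
            if token in (word, word + "s"):
                counts[topic] += 1
    # Always report all three topics, so rows without matches show zeros.
    return {topic: counts[topic] for topic in ("nature", "utensils", "animal")}

print(topic_counts("leaves fall in autumn like leafs"))
# {'nature': 2, 'utensils': 0, 'animal': 0}
```

Reporting every topic even when nothing matched is the same reason the SQL needs an outer join if rows such as uuid4 should appear with zeros.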
Create a Split function like this:
CREATE FUNCTION [dbo].[Split]
(
@String varchar(8000), @Delimiter char(1)
)
returns @temptable TABLE (items varchar(8000))
as
begin
declare @idx int
declare @slice varchar(8000)
select @idx = 1
if len(@String)<1 or @String is null return
while @idx!= 0
begin
set @idx = charindex(@Delimiter,@String)
if @idx!=0
set @slice = left(@String,@idx - 1)
else
set @slice = @String
if(len(@slice)>0)
insert into @temptable(Items) values(@slice)
set @String = right(@String,len(@String) - @idx)
if len(@String) = 0 break
end
return
end
Use it to get the keyword counts from your table:
select CONTENT_ID,DESCRIPTION,
(select COUNT(1) from keyword_tbl where WORDS in (select items from Split(DESCRIPTION,' ')) AND TOPIC = 'nature') as nature,
(select COUNT(1) from keyword_tbl where WORDS in (select items from Split(DESCRIPTION,' ')) AND TOPIC = 'utensils') as utensils,
(select COUNT(1) from keyword_tbl where WORDS in (select items from Split(DESCRIPTION,' ')) AND TOPIC = 'animal') as animal from content
Here the string is split on ' ', so "leafs" and "leaf" are different words and "leafs" is not counted.
Throwing a regex based solution out there.
with cte (nature, animal, utensils) as
(select listagg(iff(topic='nature', words,null),'\\b|\\b'),
listagg(iff(topic='animal', words,null),'\\b|\\b'),
listagg(iff(topic='utensils', words,null),'\\b|\\b')
from keyword_tbl)
select a.*,
regexp_count(a.description,nature) as nature,
regexp_count(a.description,utensils) as utensils,
regexp_count(a.description,animal) as animal
from content a
cross join cte;
Notes:
| works like an OR condition
\\b adds word boundaries, but feel free to modify as desired
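As a rough illustration of what the alternation with word boundaries does, here is a small Python sketch. It uses Python's `re` module rather than Snowflake's regex engine, and `count_keywords` is a hypothetical helper name. Unlike the `listagg` pattern above, it wraps the whole alternation in a group so that `\b` applies to both ends of every word.

```python
import re

def count_keywords(description, words):
    """Count whole-word occurrences of any of the given words."""
    # Escape each word, join with | (OR), and put \b on both sides of the group.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return len(re.findall(pattern, description))

nature = ["leaf", "leaves", "wind"]
print(count_keywords("knives cutting the wind", nature))
```

With strict word boundaries, "leafs" does not count as "leaf", so the first sample row would give 1 for nature rather than 2; loosen the pattern if plural forms should match.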
Sample table
Record Number | Filter | Filters_Applied
----------------------------------------------
1 | yes | red, blue
2 | yes | green
3 | no |
4 | yes | red, red, blue
Is it possible to query all records where there are duplicate string values? For example, how could I query to pull record 4 where the string "red" appeared twice? Except in the table that I am dealing with, there are far more string values that can populate in the "filters_applied" column.
CLARIFICATION I am working out of Periscope and pulling data using SQL.
I assume you will have to check this in your application logic.
You can query the table with like '%red%'.
select Filters_Applied from table where Filters_Applied like '%red%';
This returns the rows that contain red at least once. Then do the string analysis in your application code.
In PHP, you can use the substr_count function to determine the number of occurrences of the string.
// loop over the rows returned by the query ($rows is assumed to hold them)
foreach ($rows as $row) {
$number = substr_count($row['Filters_Applied'], 'red');
if ($number > 1) {
echo "this " . $row['Filters_Applied'] . " > 1";
}
}
For SQL Server, or other versions that can run these functions, apply this logic:
declare @val varchar(100) = 'yellow,blue,white,green'
DECLARE @find varchar(100) = 'green'
select @val = replace(@val,' ','') -- remove spaces
select @val;
select (len(@val)-len(replace(@val,@find,'')))/len(@find) [recurrence]
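The arithmetic behind this trick is easier to see outside SQL: removing every occurrence of the search string shortens the value by (occurrences × length of the search string). A minimal Python sketch, where `count_occurrences` is a name made up for illustration:

```python
def count_occurrences(value, find):
    """Count occurrences of `find` in `value` via the length difference."""
    stripped = value.replace(" ", "")          # remove spaces, as in the SQL
    without = stripped.replace(find, "")       # drop every occurrence of find
    return (len(stripped) - len(without)) // len(find)

print(count_occurrences("red, red, blue", "red"))  # 2
```

Note that this counts substrings, not list items, so "darkred, red" would also give 2 for "red".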
Create this function, which parses a string into rows, and write the query as given below. This works for SQL Server.
CREATE FUNCTION [dbo].[StrParse]
(@delimiter CHAR(1),
@csv NTEXT)
RETURNS @tbl TABLE(Keys NVARCHAR(255))
AS
BEGIN
DECLARE @len INT
SET @len = Datalength(@csv)
IF NOT @len > 0
RETURN
DECLARE @l INT
DECLARE @m INT
SET @l = 0
SET @m = 0
DECLARE @s VARCHAR(255)
DECLARE @slen INT
WHILE @l <= @len
BEGIN
SET @l = @m + 1--current position
SET @m = Charindex(@delimiter,Substring(@csv,@l + 1,255))--next delimiter or 0
IF @m <> 0
SET @m = @m + @l
--insert @tbl(keys) values(@m)
SELECT @slen = CASE
WHEN @m = 0 THEN 255 --returns the remainder of the string
ELSE @m - @l
END --returns number of characters up to next delimiter
IF @slen > 0
BEGIN
SET @s = Substring(@csv,@l,@slen)
INSERT INTO @tbl
(Keys)
SELECT @s
END
SELECT @l = CASE
WHEN @m = 0 THEN @len + 1 --breaks the loop
ELSE @m + 1
END --sets current position to 1 after next delimiter
END
RETURN
END
GO
CREATE TABLE Table1# (RecordNumber int, [Filter] varchar(5), Filters_Applied varchar(100))
GO
INSERT INTO Table1# VALUES
(1,'yes','red, blue')
,(2,'yes','green')
,(3,'no ','')
,(4,'yes','red, red, blue')
GO
--This query will return what you are expecting
SELECT t.RecordNumber,[Filter],Filters_Applied,ltrim(rtrim(keys)), count(*)NumberOfRows
FROM Table1# t
CROSS APPLY dbo.StrParse (',', t.Filters_Applied)
GROUP BY t.RecordNumber,[Filter],Filters_Applied,ltrim(rtrim(keys)) HAVING count(*) >1
You didn't state your DBMS, but in Postgres this isn't that complicated:
select st.*
from sample_table st
join lateral (
select count(*) <> count(distinct trim(item)) as has_duplicates
from unnest(string_to_array(filters_applied,',')) as t(item)
) x on true
where x.has_duplicates;
Online example: http://rextester.com/TJUGJ44586
With the exception of string_to_array() the above is actually standard SQL
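The check in the lateral subquery boils down to "total number of items differs from the number of distinct trimmed items". A minimal Python sketch of that comparison, with `has_duplicates` as a hypothetical helper name:

```python
def has_duplicates(filters_applied):
    """True if any trimmed item of the comma-separated list occurs more than once."""
    items = [item.strip() for item in filters_applied.split(",")]
    # Same idea as count(*) <> count(distinct trim(item)) in the query.
    return len(items) != len(set(items))

rows = {1: "red, blue", 2: "green", 3: "", 4: "red, red, blue"}
print([record for record, filters in rows.items() if has_duplicates(filters)])  # [4]
```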
I have an ID column in my database, and it shows the results as follows:
1121
1232
1233
I want to get an extra column with the sums of their digits, as follows:
5
8
9
Can anyone tell me which SQL function I should use to break a string into characters and add them?
Assuming a number that is always 4 digits long, you can simply do this:
select (id/1000)+((id%1000)/100)+((id%100)/10)+(id%10)
If the ID field is varchar, just cast it to an int before division. Of course, if the result of this has more than 1 digit, you will not be able to get the sum of its digits again.
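If the number of digits is not fixed, the same divide-and-modulo idea can be applied in a loop. A minimal Python sketch, assuming non-negative or negative integer IDs (`digit_sum` is a name made up for illustration):

```python
def digit_sum(n):
    """Sum the decimal digits of an integer of any length."""
    total = 0
    n = abs(n)            # ignore a possible sign
    while n:
        total += n % 10   # take the last digit
        n //= 10          # drop the last digit
    return total

print([digit_sum(i) for i in (1121, 1232, 1233)])  # [5, 8, 9]
```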
You could use this table-valued function:
CREATE FUNCTION [dbo].[Chars]
(
@Text NVARCHAR(MAX)
)
RETURNS @ItemTable TABLE (Item VARCHAR(250))
AS
BEGIN
DECLARE @i INT
DECLARE @Item NVARCHAR(4000)
SET @i = 1
WHILE (@i <= LEN(@Text))
BEGIN
INSERT INTO @ItemTable(Item)
VALUES(SUBSTRING(@Text, @i, 1))
SET @i = @i + 1
END
RETURN
END
Now this query should work as desired:
SELECT t.ID, SUM(CAST(Split.Item AS INT)) AS SumID
FROM dbo.TableName t
CROSS APPLY dbo.Chars(CONVERT(varchar(10), t.ID))Split
GROUP BY t.ID
Here's a demo: http://sqlfiddle.com/#!3/8eea7/8/0
I have a database table that contains column data like this:
Data (field name)
1111,44,666,77
22,55,76,54
32,31,56
I realise this is a very poor design because it is not normalised (I didn't design it - I inherited it). Is there a query that will return the data like this:
1111
44
666
77
22
55
76
54
32
31
56
I am used to using CHARINDEX and SUBSTRING, but I cannot think of a way of doing this, as the number of elements in each cell (delimited by a comma) is unknown.
You can use CTE to split the data:
;with cte (DataItem, Data) as
(
select cast(left(Data, charindex(',',Data+',')-1) as varchar(50)) DataItem,
stuff(Data, 1, charindex(',',Data+','), '') Data
from yourtable
union all
select cast(left(Data, charindex(',',Data+',')-1) as varchar(50)) DataItem,
stuff(Data, 1, charindex(',',Data+','), '') Data
from cte
where Data > ''
)
select DataItem
from cte
See SQL Fiddle with Demo
Result:
| DATAITEM |
------------
| 1111 |
| 22 |
| 32 |
| 31 |
| 56 |
| 55 |
| 76 |
| 54 |
| 44 |
| 666 |
| 77 |
Or you can create a split function:
create FUNCTION [dbo].[Split](@String varchar(MAX), @Delimiter char(1))
returns @temptable TABLE (items varchar(MAX))
as
begin
declare @idx int
declare @slice varchar(8000)
select @idx = 1
if len(@String)<1 or @String is null return
while @idx!= 0
begin
set @idx = charindex(@Delimiter,@String)
if @idx!=0
set @slice = left(@String,@idx - 1)
else
set @slice = @String
if(len(@slice)>0)
insert into @temptable(Items) values(@slice)
set @String = right(@String,len(@String) - @idx)
if len(@String) = 0 break
end
return
end;
Which you can use when you query and this will produce the same result:
select s.items declaration
from yourtable t1
outer apply dbo.split(t1.data, ',') s
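For comparison, the row-splitting that both the recursive CTE and the split function perform can be sketched in a few lines of Python. This is only an illustration of the expected result, not a replacement for the SQL; `split_rows` is a hypothetical helper name and the data is copied from the question.

```python
def split_rows(rows, delimiter=","):
    """Flatten delimited strings into one item per output row, skipping empty items."""
    return [item for row in rows for item in row.split(delimiter) if item]

data = ["1111,44,666,77", "22,55,76,54", "32,31,56"]
for item in split_rows(data):
    print(item)
```

Unlike the recursive CTE, whose result order is not guaranteed, this keeps the items in their original order.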
I created a table called [dbo].[stack] and filled it with the data you provided and this script produced what you needed. There may be a more efficient way of doing this but this works exactly how you requested.
BEGIN
DECLARE @tmp TABLE (data VARCHAR(20))
DECLARE @tmp2 TABLE (data VARCHAR(20))
--Insert all fields from your table
INSERT INTO @tmp (data)
SELECT [data]
FROM [dbo].[stack] -- your table name here
--Loop through all the records in temp table
WHILE EXISTS (SELECT 1
FROM @tmp)
BEGIN
DECLARE @data VARCHAR(100) --Variable to chop up
DECLARE @data1 VARCHAR(100) -- Untouched variable to delete from tmp table
SET @data = (SELECT TOP 1 [data]
FROM @tmp)
SET @data1 = (SELECT TOP 1 [data]
FROM @tmp)
--Loop through variable to get individual value
WHILE PATINDEX('%,%',@data) > 0
BEGIN
INSERT INTO @tmp2
SELECT SUBSTRING(@data,1,PATINDEX('%,%',@data)-1);
SET @data = SUBSTRING(@data,PATINDEX('%,%',@data)+1,LEN(@data))
IF PATINDEX('%,%',@data) = 0
INSERT INTO @tmp2
SELECT @data
END
DELETE FROM @tmp
WHERE [data] = @data1
END
SELECT * FROM @tmp2
END
Not talking about performance, you can concatenate the data in a single column and then split it.
Concatenate data: http://sqlfiddle.com/#!6/487a4/3
Split it: T-SQL: Opposite to string concatenation - how to split string into multiple records
Take a look at this article referenced in a similar question:
http://www.codeproject.com/Articles/7938/SQL-User-Defined-Function-to-Parse-a-Delimited-Str
If you create the function that they have in that article, you can call it using:
select * from dbo.fn_ParseText2Table('100|120|130.56|Yes|Cobalt Blue','|')
SELECT REPLACE(field_name, ',', ' ') from table
EDIT: Never mind this answer as you changed your question.
In my table, I have a varchar column in which multiple values are stored. An example of my table:
RecNum | Title | Category
-----------------------------------------
wja-2012-000001 | abcdef | 4,6
wja-2012-000002 | qwerty | 1,3,7
wja-2012-000003 | asdffg |
wja-2012-000004 | zxcvbb | 2,7
wja-2012-000005 | ploiuh | 3,4,12
The values in the Category column point to another table.
How can I return the relevant rows if I want to retrieve the rows with value 1,3,5,6,8 in the Category column?
When I tried using IN, I get the 'Conversion failed when converting the varchar value '1,3,5,6,8' to data type int' error.
Breaking the Categories out into a separate table would be a better design if that's a change you can make... otherwise, you could create a function to split the values into a table of integers like this:
CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))
returns @temptable TABLE (id int)
as
begin
declare @idx int
declare @slice varchar(8000)
select @idx = 1
if len(@String)<1 or @String is null return
while @idx!= 0
begin
set @idx = charindex(@Delimiter,@String)
if @idx!=0
set @slice = left(@String,@idx - 1)
else
set @slice = @String
if(len(@slice)>0)
insert into @temptable(id) values(convert(int, @slice))
set @String = right(@String,len(@String) - @idx)
if len(@String) = 0 break
end
return
end
Then call it from your query:
SELECT ...
FROM ...
WHERE @SomeID IN (SELECT id FROM dbo.Split(Category, ','))
Or if you're looking to provide a list of categories as an input parameter (such as '1,3,5,6,8'), and return all records in your table that contain at least one of these values, you could use a query like this:
SELECT ...
FROM ...
WHERE
EXISTS (
select 1
from dbo.Split(Category, ',') s1
join dbo.Split(@SearchValues, ',') s2 ON s1.id = s2.id
)
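The EXISTS query is effectively a set-intersection test between each row's categories and the search list. A minimal Python sketch of that logic, using the sample rows from the question (`rows_matching` is a name made up for illustration):

```python
def parse_ids(csv):
    """Turn a comma-separated string (or None) into a set of ints."""
    return {int(value) for value in (csv or "").split(",") if value.strip()}

def rows_matching(rows, search_values):
    """Return RecNums whose Category shares at least one id with search_values."""
    wanted = parse_ids(search_values)
    return [rec_num for rec_num, category in rows if parse_ids(category) & wanted]

rows = [
    ("wja-2012-000001", "4,6"),
    ("wja-2012-000002", "1,3,7"),
    ("wja-2012-000003", None),
    ("wja-2012-000004", "2,7"),
    ("wja-2012-000005", "3,4,12"),
]
print(rows_matching(rows, "1,3,5,6,8"))
# ['wja-2012-000001', 'wja-2012-000002', 'wja-2012-000005']
```

Comparing parsed integers rather than substrings avoids false hits such as the search value 1 matching category 12.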
You can do it like this:
declare @var varchar(30); set @var='2,3';
exec('select * from category where Category_Id in ('+@var+')')
Try this solution:
CREATE TABLE test4(RecNum varchar(20),Title varchar(10),Category varchar(15))
INSERT INTO test4
VALUES('wja-2012-000001','abcdef','4,6'),
('wja-2012-000002','qwerty','1,3,7'),
('wja-2012-000003','asdffg',null),
('wja-2012-000004','zxcvbb','2,7'),
('wja-2012-000005','ploiuh','3,4,12')
select * from test4
Declare @str varchar(25) = '1,3,5,6,8'
;WITH CTE as (select RecNum,Title,Category from test4)
,CTE1 as (
select RecNum,Title,RIGHT(@str,LEN(@str)-CHARINDEX(',',@str,1)) as rem from CTE where category like '%'+LEFT(@str,1)+'%'
union all
select c.RecNum,c.Title,RIGHT(c1.rem,LEN(c1.rem)-CHARINDEX(',',c1.rem,1)) as rem from CTE1 c1 inner join CTE c
on c.category like '%'+LEFT(c1.rem,1)+'%' and CHARINDEX(',',c1.rem,1)>0
)
select RecNum,Title from CTE1
As mentioned by others, your table design violates basic database design principles and if there is no way around it, you could normalize the table with little code (example below) and then join away with the other table. Here you go:
Data:
CREATE TABLE data(RecNum varchar(20),Title varchar(10),Category varchar(15))
INSERT INTO data
VALUES('wja-2012-000001','abcdef','4,6'),
('wja-2012-000002','qwerty','1,3,7'),
('wja-2012-000003','asdffg',null),
('wja-2012-000004','zxcvbb','2,7'),
('wja-2012-000005','ploiuh','3,4,12')
This function takes a comma separated string and returns a table:
CREATE FUNCTION listToTable (@list nvarchar(MAX))
RETURNS @tbl TABLE (number int NOT NULL) AS
BEGIN
DECLARE @pos int,
@nextpos int,
@valuelen int
SELECT @pos = 0, @nextpos = 1
WHILE @nextpos > 0
BEGIN
SELECT @nextpos = charindex(',', @list, @pos + 1)
SELECT @valuelen = CASE WHEN @nextpos > 0
THEN @nextpos
ELSE len(@list) + 1
END - @pos - 1
INSERT @tbl (number)
VALUES (convert(int, substring(@list, @pos + 1, @valuelen)))
SELECT @pos = @nextpos
END
RETURN
END
Then, you can do something like this to "normalize" the table:
SELECT *
FROM data m
CROSS APPLY listToTable(m.Category) AS t
where Category is not null
And then use the result of the above query to join with the "other" table. For example (I did not test this query):
select * from otherTable a
join listToTable('1,3,5,6,8') b
on a.Category = b.number
join(
SELECT *
FROM data m
CROSS APPLY listToTable(m.Category) AS t
where Category is not null
) c
on a.category = c.number