Search for a word in the column string and list those words - sql

I have two tables: table 1 with columns W_ID and word, and table 2 with columns N_ID and note. I have to list every N_ID whose note contains a word from table 1's word column (the easy part), and also list the matched words in another column without duplicating the N_ID. That means using STUFF to concatenate all the words found in the note column for that particular N_ID. I tried using a
FULL TEXT INDEX with CONTAINS,
but it only allows searching for one word at a time. Any suggestions on how I can use a while loop to achieve this?

If there is a maximum number of words you want displayed for N_ID, you can pivot this. You could have them in a single column by concatenating them, but I would recommend against that. Here is a pivot that supports up to 4 words per N_ID. You can adjust it as needed. You can view the SQL Fiddle for this here.
SELECT
n_id,
[1] AS word_1,
[2] AS word_2,
[3] AS word_3,
[4] AS word_4
FROM (
SELECT
n_id,
word,
ROW_NUMBER() OVER (PARTITION BY n_id ORDER BY word) AS rn
FROM tbl2
JOIN tbl1 ON
tbl2.note LIKE '%'+tbl1.word+'[ ,.?!]%'
) AS source_table
PIVOT (
MAX(word)
FOR rn IN ([1],[2],[3],[4])
) AS pivot_table
*Updated the join to look for a space or punctuation mark at the end of the word, which prevents matches on the start of a longer word.

You can join your tables together based on a positive result from the charindex function.
In SQL Server 2017 you can run:
SELECT n_id, string_agg(word, ', ') AS words_found
FROM words
inner join notes on 0 < charindex(words.word, notes.note)
GROUP BY n_id;
Prior to SQL Server 2017 there is no string_agg, so you'll need to use stuff, which is trickier:
select
n_id,
stuff((
SELECT ', ' + word
FROM words
where 0 < charindex(words.word, notes.note)
FOR XML PATH('')
), 1, 2, '') as words_found
from notes;
I used the following schema:
CREATE table WORDS
(W_ID int identity primary key
,word varchar(100)
);
CREATE table notes
(N_ID int identity primary key
,note varchar(1000)
);
insert into words (word) values
('No'),('Nope'),('Nah');
insert into notes (note) values
('I am not going to do this. Nah!!!')
,('It is OK.');
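The substring join above can be tried without a SQL Server instance. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in: SQLite's instr and group_concat play the roles of charindex and string_agg, and the lower() calls are an added assumption to mimic a case-insensitive SQL Server collation. It reuses the sample schema and data from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE notes (n_id INTEGER PRIMARY KEY, note TEXT);
INSERT INTO words (word) VALUES ('No'), ('Nope'), ('Nah');
INSERT INTO notes (note) VALUES
    ('I am not going to do this. Nah!!!'),
    ('It is OK.');
""")

# One row per note that contains at least one word, with the words concatenated.
rows = conn.execute("""
SELECT n.n_id, group_concat(w.word, ', ') AS words_found
FROM notes AS n
JOIN words AS w ON instr(lower(n.note), lower(w.word)) > 0
GROUP BY n.n_id
""").fetchall()

# group_concat does not guarantee an order, so sort the words per note here.
found = {n_id: sorted(words.split(", ")) for n_id, words in rows}
print(found)
```

Note that 'No' is reported for the first note only because it occurs inside 'not'. Plain substring matching of this kind does not respect word boundaries.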

Related

SQL to update rows to remove words with less than N characters

I have a TAGS column in my Products table in SQL Server. In my web project I split them with space, for example:
"web web_design website website_design"
=> 1.web 2. web_design 3. website ......
How can I remove words with fewer than N characters from the tags? Is it possible with a regex?
For example if N=4 so "web" will be removed from my example and the rest remains.
I will give a solution that does this without changing your design at the bottom of this answer,
but I really think you should fix the design. Here is an example of how to do that.
It starts from a table called "mytable" that has your "tag" column with all the data,
then it creates a detail table and populates it with the split values from your tag column.
After that it is very easy to do what you want to do:
create table tags (id int identity primary key, mytable_id int, tag varchar(100))
insert into tags (mytable_id, tag)
select t.id,
value
from mytable t
cross apply string_split(t.tag, ' ')
alter table mytable drop column tag
See a complete example in this dbfiddle
EDIT
If you need to show it again as if it were in one table, you can use string_agg like this:
select m.id,
m.name,
( select string_agg(t.tag, ' ')
from tags t
where t.mytable_id = m.id
) as tags
from mytable m
You can see this at work in this dbfiddle
EDIT 2
And if you really want to stick to your design, here is how you can remove the words from your tag column.
But I recommend not doing this. As you can see in the examples above, it is not so hard to fix the design and create a new table to hold the tags.
update m
set m.tag =
( select string_agg(value, ' ')
from mytable t
cross apply string_split(m.tag, ' ')
where len(value) > 3
and t.id = m.id
)
from mytable m
Look at this dbfiddle to see it in action
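The filter used in EDIT 2 (split on spaces, keep the values longer than 3 characters, join them again) can also be sketched outside SQL Server. This is only an illustration, assuming Python's sqlite3 module and a user-defined SQL function; the function name drop_short_tags and the one-row mytable are hypothetical, and SQLite is used here because it has no string_split.

```python
import sqlite3

def drop_short_tags(tags, min_len):
    """Keep only the space-separated tags with at least min_len characters."""
    return " ".join(t for t in tags.split(" ") if len(t) >= min_len)

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same (hypothetical) name.
conn.create_function("drop_short_tags", 2, drop_short_tags)

conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, tag TEXT)")
conn.execute(
    "INSERT INTO mytable (tag) VALUES ('web web_design website website_design')"
)

# N = 4: tags with fewer than 4 characters are removed in place.
conn.execute("UPDATE mytable SET tag = drop_short_tags(tag, 4)")
cleaned = conn.execute("SELECT tag FROM mytable").fetchone()[0]
print(cleaned)
```

With N = 4 the tag "web" is dropped and the other three tags are kept, which matches the example in the question.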

SQL Server concat/join different words in a string to get all substrings

I want to join words in a sentence in SQL Server. I need it to loop through, using the next word as the starting word of the concatenation each time. For example the string/sentence could be 'This is a sentence'.
I need the possible outcomes to be:
This
Thisis
Thisisa
Thisisasentence
is
isa
isasentence
a
asentence
sentence
I know how to concat, but I'm not too sure how I would go about doing each word with a different first word each time.
Thanks in advance!
EDIT:
More information as requested.
I have a table Account(ID(PK), Name, Country)
I have another table accountSubstrings(SubID(pk), AccountID, Substring)
I need to break the 'Name' column into the above example 'This is a sentence' so that each substring has its own row entry in accountSubstrings. The subID is unique to each row and the AccountID will map to whichever 'Name' the substring came from. This is being done for matching purposes.
Thanks
You can do this with a recursive CTE. Basically, you want adjacent combinations. I prefer to put spaces between the words, so they are visible. The following also assumes that the words are unique:
declare @s varchar(max) = 'This is a sentence';
with words as (
select s.value as word, row_number() over (order by charindex(s.value, @s)) as seqnum
from string_split(@s, ' ') s
),
cte as (
select seqnum, word as combined, format(seqnum, '000') as seqnums
from words
union all
select w.seqnum, concat(cte.combined, ' ', w.word), concat(seqnums, ':', format(w.seqnum, '000'))
from cte join
words w
on w.seqnum = cte.seqnum + 1
)
select *
from cte
order by seqnums;
Here is a db<>fiddle.
The tricky part here is keeping the words in order. That is what the row_number() is doing: it captures the order, and it is also where the uniqueness restriction comes from. Of course, this could be replaced by a recursive CTE, and that would be fine (and allow duplicate words).
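For readers without SQL Server at hand, the same adjacent-combination recursion can be sketched with Python's sqlite3 module. This is an illustration under stated assumptions, not the answer's exact query: the words are numbered in Python with enumerate (which keeps their order and tolerates duplicate words), the column first_seq is a hypothetical helper used only for sorting, and the words are joined without spaces to match the output requested in the question.

```python
import sqlite3

sentence = "This is a sentence"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (seqnum INTEGER PRIMARY KEY, word TEXT)")
# Number the words 1..n in their original order.
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    list(enumerate(sentence.split(" "), start=1)),
)

rows = conn.execute("""
WITH RECURSIVE cte(first_seq, seqnum, combined) AS (
    -- anchor: every word starts its own chain
    SELECT seqnum, seqnum, word FROM words
    UNION ALL
    -- step: extend each chain with the word that follows it
    SELECT cte.first_seq, w.seqnum, cte.combined || w.word
    FROM cte
    JOIN words AS w ON w.seqnum = cte.seqnum + 1
)
SELECT combined FROM cte ORDER BY first_seq, seqnum
""").fetchall()

substrings = [r[0] for r in rows]
print(substrings)
```

Each resulting string could then be inserted into accountSubstrings together with the AccountID it came from.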
Looking at the updates you made, you can do this:
SELECT *
FROM (VALUES('This is a sentence')) T(Name)
JOIN (VALUES('This'), ('is'), ('a'), ('sentence')) TT(SubString)
ON CONCAT(' ', T.Name, ' ') LIKE CONCAT('% ', TT.SubString, ' %');
Here the table T is your table Account, and the table TT is your table accountSubstrings

How to check if ONLY a combination of values exist in a column?

I have a list of values and I want to get only the rows where the column contains ONLY a combination of those values:
Ex:
CREATE TABLE User_Group(
[id] [int] NOT NULL,
[names] [varchar](255) NOT NULL,
) ON [PRIMARY]
Sample content User_Group:
1:" Joe,Jane"
2:"Jane, James,Frank"
3: "Jane, Joe,James"
I am being passed a list of names and I want to check whether a combination of those names exists in the User_Group table and return the matching rows. I want a row ONLY if it contains EXACTLY such a combination.
So for example, if I am given James, Jane and Joe, I want to check all 2^3-1 combinations (James; Jane; Joe; James & Jane; James & Joe; Jane & Joe; James & Jane & Joe) against the table. In this scenario I should get only rows 1 and 3. Row 2 is skipped because it contains Frank.
I know I can use EXISTS, but I am not sure how to check for only that particular combination.
I am also not sure how to "loop" through all the combinations. I thought about using Java to make 2^x-1 calls with different combinations (given the scenario, it is highly unlikely that the number of names will be >15).
I also read about "Select All", but I am not sure whether that helps with only distinct combinations either.
How can I elegantly achieve this?
This is a version that uses CTEs to do what you want.
I create a table variable @list that contains the lists you gave, and another table variable @search that contains each of the search terms.
declare @list table(k int, l varchar(100))
insert @list values (1,' Joe,Jane')
,(2,'Jane, James,Frank')
,(3,'Jane, Joe,James')
declare @search table(sk int,s varchar(20))
insert @search values (1,'jane'),(2,'joe'),(3,'james')
-- Top level CTE to remove spaces and surround each term with its own commas.
;with cte as (
select k,','+replace(replace(l,',',',,'),' ','')+',' l from @list
)
-- Go through the search terms recursively and remove the search terms, with their surrounding commas
,cte2 as(
select cte.k, replace(cte.l,','+s.s+',','') as l, s.sk, s.s
from cte
join @search s on s.sk=1
union all
select cte2.k, replace(cte2.l,','+s.s+',','') as l, s.sk ,s.s
from cte2
join @search s on s.sk=cte2.sk+1
)
-- Find any that result in zero length
select distinct k from cte2 where len(l)=0
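The condition being tested is that every name stored in a row belongs to the given list, so the 2^x-1 combinations never have to be enumerated. As a cross-check outside the database, here is a minimal Python sketch of that subset test. It is a different technique from the recursive CTE above, and the function name rows_with_only is hypothetical. Like the SQL version, it ignores spaces and letter case.

```python
def rows_with_only(rows, allowed_names):
    """Return the ids of rows whose names all appear in allowed_names."""
    allowed = {name.strip().lower() for name in allowed_names}
    matches = []
    for row_id, names in rows:
        present = {n.strip().lower() for n in names.split(",") if n.strip()}
        # A row qualifies when it is non-empty and a subset of the allowed set.
        if present and present <= allowed:
            matches.append(row_id)
    return matches

user_groups = [
    (1, " Joe,Jane"),
    (2, "Jane, James,Frank"),
    (3, "Jane, Joe,James"),
]

matching_ids = rows_with_only(user_groups, ["James", "Jane", "Joe"])
print(matching_ids)
```

Row 2 is rejected because Frank is not in the allowed set, which agrees with the expected result in the question.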

SQL query to find rows with the most matching keywords

I'm really bad at SQL and I would like to know what SQL I can run to solve the problem below which I suspect to be a NP-Complete problem but I'm ok with the query taking a long time to run over large datasets as this will be done as a background task. A standard sql statement is preferred but if a stored procedure is required then so be it. The SQL is required to run on Postgres 9.3.
Problem: Given a set of articles that contain a set of keywords, find the top n articles for each article that contains the most number of matching keywords.
A trimmed down version of the article table looks like this:
CREATE TABLE article (
id character varying(36) NOT NULL, -- primary key of article
keywords character varying, -- comma separated set of keywords
CONSTRAINT pk_article PRIMARY KEY (id)
);
-- Test Data
INSERT INTO article(id, keywords) VALUES(0, 'red,green,blue');
INSERT INTO article(id, keywords) VALUES(1, 'red,green,yellow');
INSERT INTO article(id, keywords) VALUES(2, 'purple,orange,blue');
INSERT INTO article(id, keywords) VALUES(3, 'lime,violet,ruby,teal');
INSERT INTO article(id, keywords) VALUES(4, 'red,green,blue,yellow');
INSERT INTO article(id, keywords) VALUES(5, 'yellow,brown,black');
INSERT INTO article(id, keywords) VALUES(6, 'black,white,blue');
Which would result in this for a SELECT * FROM article; query:
Table: article
------------------------
id keywords
------------------------
0 red,green,blue
1 red,green,yellow
2 purple,orange,blue
3 lime,violet,ruby,teal
4 red,green,blue,yellow
5 yellow,brown,black
6 black,white,blue
Assuming I want to find the top 3 articles for each article that contains the most number of matching keywords then the output should be this:
------------------------
id related
------------------------
0 4,1,6
1 4,0,5
2 0,4,6
3 null
4 0,1,6
5 1,6
6 5,0,4
Like @a_horse commented: This would be simpler with a normalized design (besides making other tasks simpler/ cleaner), but still not trivial.
Also, a PK column of data type character varying(36) is highly suspicious (and inefficient) and should most probably be an integer type or at least a uuid instead.
Here is one possible solution based on your design as is:
WITH cte AS (
SELECT id, string_to_array(a.keywords, ',') AS keys
FROM article a
)
SELECT id, string_agg(b_id, ',') AS best_matches
FROM (
SELECT a.id, b.id AS b_id
, row_number() OVER (PARTITION BY a.id ORDER BY ct.ct DESC, b.id) AS rn
FROM cte a
LEFT JOIN cte b ON a.id <> b.id AND a.keys && b.keys
LEFT JOIN LATERAL (
SELECT count(*) AS ct
FROM (
SELECT * FROM unnest(a.keys)
INTERSECT ALL
SELECT * FROM unnest(b.keys)
) i
) ct ON TRUE
ORDER BY a.id, ct.ct DESC, b.id -- b.id as tiebreaker
) sub
WHERE rn < 4
GROUP BY 1;
sqlfiddle (using an integer id instead).
The CTE cte converts the string into an array. You could even have a functional GIN index like that ...
If multiple rows tie for the top 3 picks, you need to define a tiebreaker. In my example, rows with smaller id come first.
Detailed explanation in this recent related answer:
Query and order by number of matches in JSON array
The comparison there is between a JSON array and an SQL array, but it's basically the same problem and boils down to the same solution(s). It also compares a couple of similar alternatives.
To make this fast, you should at least have a GIN index on the array column (instead of the comma-separated string) and the query wouldn't need the CTE step. A completely normalized design has other advantages, but won't necessarily be faster than an array with GIN index.
You can store lists in comma-separated strings. No problem, as long as this is just a string for you and you are not interested in its separate values. As soon as you are interested in the separate values, as in your example, store them separately.
This said, correct your database design and only then think about the query.
The following query selects all ID pairs first and counts common keywords. It then ranks the pairs by giving the other ID with the most keywords in common rank #1, etc. Then you keep only the three best matching IDs. STRING_AGG lists the best matching IDs in a string ordered by the number of keywords in common.
select
this_article as id,
string_agg(other_article, ',' order by rn) as related
from
(
select
this_article,
other_article,
row_number() over (partition by this_article order by cnt_common desc) as rn
from
(
select
this.id as this_article,
other.id as other_article,
count(other.id) as cnt_common
from keywords this
left join keywords other on other.keyword = this.keyword and other.id <> this.id
group by this.id, other.id
) pairs
) ranked
where rn <= 3
group by this_article
order by this_article;
Here is the SQL fiddle: http://sqlfiddle.com/#!15/1d20c/9.
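The query above reads from a normalized keywords table with one row per (id, keyword), which the question's schema does not have. The sketch below shows one way to get there and count common keywords, assuming Python's sqlite3 module instead of Postgres 9.3 and integer ids. The split into rows and the final top-3 cut are done in Python purely to keep the SQL portable, and ties are broken by the smaller id, which is an assumption (the expected output in the question breaks some ties differently).

```python
import sqlite3

articles = {
    0: "red,green,blue",
    1: "red,green,yellow",
    2: "purple,orange,blue",
    3: "lime,violet,ruby,teal",
    4: "red,green,blue,yellow",
    5: "yellow,brown,black",
    6: "black,white,blue",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (id INTEGER, keyword TEXT)")
# Normalize: one row per article/keyword pair.
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    [(aid, kw) for aid, csv in articles.items() for kw in csv.split(",")],
)

# Count the keywords each pair of distinct articles has in common.
pairs = conn.execute("""
SELECT this.id, other.id, count(*) AS cnt_common
FROM keywords AS this
JOIN keywords AS other
  ON other.keyword = this.keyword AND other.id <> this.id
GROUP BY this.id, other.id
ORDER BY this.id, cnt_common DESC, other.id
""").fetchall()

# Keep the three best matches per article; articles without matches stay empty.
related = {aid: [] for aid in articles}
for this_id, other_id, _cnt in pairs:
    if len(related[this_id]) < 3:
        related[this_id].append(other_id)

print(related)
```

Article 3 shares no keyword with any other article, so its list stays empty, corresponding to the null in the expected output.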

Check if a list of items already exists in a SQL database

I want to create a group of users only if the same group does not exist already in the database.
I have a GroupUser table with three columns: a primary key, a GroupId, and a UserId. A group of users is described as several lines in this table sharing a same GroupId.
Given a list of UserId, I would like to find a matching GroupId, if it exists.
What is the most efficient way to do that in SQL?
Let's say your UserId list is stored in a table called 'MyUserIDList'. The following query will efficiently return the list of GroupId values containing exactly your user list. (SQL Server syntax)
Select GroupId
From (
Select GroupId
, count(*) as GroupMemberCount
, Sum(case when MyUserIDList.UserID is null then 0 else 1 End) as GroupMemberCountInMyList
from GroupUser
left outer join MyUserIDList on GroupUser.UserID=MyUserIDList.UserID
group by GroupId
) As MySubQuery
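Where GroupMemberCount=GroupMemberCountInMyList
-- also require the same size as the list, otherwise groups holding only part of the list match too
and GroupMemberCount=(Select count(*) From MyUserIDList)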
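To see why the group size has to be compared with the size of the list as well, here is a small sketch using Python's sqlite3 module. The sample groups 10, 20 and 30 are invented for illustration, and the query assumes that neither GroupUser nor MyUserIDList contains duplicate (GroupId, UserId) or UserId entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GroupUser (Id INTEGER PRIMARY KEY, GroupId INTEGER, UserId INTEGER);
CREATE TABLE MyUserIDList (UserId INTEGER PRIMARY KEY);
INSERT INTO GroupUser (GroupId, UserId) VALUES
    (10, 1), (10, 2), (10, 3),           -- exactly the list
    (20, 1), (20, 2),                    -- only part of the list
    (30, 1), (30, 2), (30, 3), (30, 4);  -- the list plus an extra user
INSERT INTO MyUserIDList (UserId) VALUES (1), (2), (3);
""")

rows = conn.execute("""
SELECT g.GroupId
FROM GroupUser AS g
LEFT JOIN MyUserIDList AS l ON l.UserId = g.UserId
GROUP BY g.GroupId
HAVING count(*) = count(l.UserId)                        -- no member outside the list
   AND count(*) = (SELECT count(*) FROM MyUserIDList)    -- and no list entry missing
""").fetchall()

matching_groups = [r[0] for r in rows]
print(matching_groups)
```

Group 20 passes the first condition but fails the second, and group 30 fails the first, so only group 10 is returned.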
There are a couple of ways of doing this. This answer is for SQL Server only (as you have not mentioned a database in your tags).
Pass the list of user IDs as a comma-separated string to a stored procedure, build a dynamic query from it in the SP, and run that query with the EXEC command. This link will guide you in this regard.
Use a table-valued parameter in a SP. This is applicable to SQL Server 2008 and higher only.
The following link will help you get started.
http://www.codeproject.com/Articles/113458/TSQL-Passing-array-list-set-to-stored-procedure-MS
Hope this helps.
Another solution is to convert the input list into a table. This can be done with various approaches: unions, temporary tables and others. A neat solution builds on the answer of user1461607 to another question here on SO, using a comma-separated string.
WITH split(word, csv) AS (
-- 'initial query' (see SQLite docs linked above)
SELECT
'', -- place holder for each word we are looking for
'Auto,A,1234444,' -- items you are looking for
-- make sure the list ends with a comma !!
UNION ALL SELECT
substr(csv, 0, instr(csv, ',')), -- each word contains text up to next ','
substr(csv, instr(csv, ',') + 1) -- next recursion parses csv after this ','
FROM split -- recurse
WHERE csv != '' -- break recursion once no more csv words exist
) SELECT word, existing_data
FROM split s
-- now join the key you want to check for existence!
-- for demonstration purpose, I use an outer join
LEFT OUTER JOIN (select 'A' as existing_data) as t on t.existing_data = s.word
WHERE s.word != '' -- make sure we clamp the empty strings from the split function
;
Results in:
Auto,null
A,A
1234444,null
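The recursive split can be run from Python's sqlite3 module with the list passed as a bound parameter, which avoids pasting the values into the SQL text. This is a sketch under a few assumptions: the column alias matched is hypothetical, and substr(csv, 1, instr(csv, ',') - 1) is used in place of the start-at-0 form, which is intended to select the same text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH RECURSIVE split(word, csv) AS (
    SELECT '', ?                                -- the list, ending with a comma
    UNION ALL
    SELECT substr(csv, 1, instr(csv, ',') - 1), -- text up to the next ','
           substr(csv, instr(csv, ',') + 1)     -- remainder after that ','
    FROM split
    WHERE csv != ''                             -- stop once nothing is left
)
SELECT s.word, t.matched
FROM split AS s
LEFT OUTER JOIN (SELECT 'A' AS matched) AS t ON t.matched = s.word
WHERE s.word != ''
""", ("Auto,A,1234444,",)).fetchall()

# Map each item to the value it matched, or None when nothing matched.
lookup = dict(rows)
print(lookup)
```

In a real query the inline SELECT 'A' would be replaced by the table whose keys are being checked, for example the UserId column of GroupUser.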