SQL Server concat/join different words in a string to get all substrings


I want to join the words in a sentence in SQL Server. I need it to loop through the sentence, using the next word as the starting word for the concatenation each time. For example, the string/sentence could be 'This is a sentence'.
I need the possible outcomes to be:
This
Thisis
Thisisa
Thisisasentence
is
isa
isasentence
a
asentence
sentence
I know how to concatenate, but I'm not sure how to do this for each word, with a different first word each time.
Thanks in advance!
EDIT:
More information as requested.
I have a table Account(ID(PK), Name, Country)
I have another table accountSubstrings(SubID(PK), AccountID, Substring)
I need to break the 'Name' column into substrings, as in the 'This is a sentence' example above, so that each substring has its own row in accountSubstrings. The SubID is unique to each row, and the AccountID maps to whichever 'Name' the substring came from. This is being done for matching purposes.
Thanks

You can do this with a recursive CTE. Basically, you want adjacent combinations. I prefer to put spaces between the words, so they are visible. The following also assumes that the words are unique:
declare @s varchar(max) = 'This is a sentence';
with words as (
    -- number the words by where they first appear in the string
    select s.value as word, row_number() over (order by charindex(s.value, @s)) as seqnum
    from string_split(@s, ' ') s
),
cte as (
    -- anchor: every word starts its own combination
    select seqnum, word as combined, format(seqnum, '000') as seqnums
    from words
    union all
    -- recursive step: append the next adjacent word
    select w.seqnum, concat(cte.combined, ' ', w.word), concat(seqnums, ':', format(w.seqnum, '000'))
    from cte join
         words w
         on w.seqnum = cte.seqnum + 1
)
select *
from cte
order by seqnums;
Here is a db<>fiddle.
The actually tricky part here is keeping the words in order. That is what the row_number() is doing, capturing the order -- and where the uniqueness restriction comes from. Of course, this could be replaced by a recursive CTE, and that would be fine (and allow duplicate words).
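To check the expected rows outside SQL Server, the adjacent-combination logic can be sketched in a few lines of Python. This only illustrates what the recursive CTE is meant to produce; it is not a replacement for the query. The function name adjacent_combinations and the sep parameter are made up for the example. Because it works on word positions rather than word values, repeated words are not a problem here.

```python
def adjacent_combinations(sentence, sep=""):
    """Return every run of consecutive words, grouped by starting word."""
    words = sentence.split()
    results = []
    for start in range(len(words)):
        # Grow the run one word at a time, like the recursive step of the CTE.
        for end in range(start + 1, len(words) + 1):
            results.append(sep.join(words[start:end]))
    return results


# Prints the ten rows listed in the question, in the same order.
for combo in adjacent_combinations("This is a sentence"):
    print(combo)
```

Passing sep=" " gives the space-separated form used in the answer above. A sentence of n words always yields n * (n + 1) / 2 rows.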

Given the update to your question, you can do this:
SELECT *
FROM (VALUES('This is a sentence')) T(Name)
JOIN (VALUES('This'), ('is'), ('a'), ('sentence')) TT(SubString)
ON CONCAT(' ', T.Name, ' ') LIKE CONCAT(' %', TT.SubString, ' %');
Here, table T stands in for your Account table and table TT for your accountSubstrings table.

Related

Loop through table and update a specific column

I have the following table:
Id   Category
1    some thing
2    value
This table contains a lot of rows, and I'm trying to update all the Category values so that the first letter of every word is capitalized. For example, some thing should become Some Thing.
At the moment this is what I have:
UPDATE MyTable
SET Category = (SELECT UPPER(LEFT(Category,1))+LOWER(SUBSTRING(Category,2,LEN(Category))) FROM MyTable WHERE Id = 1)
WHERE Id = 1;
But there are two problems. The first is the capitalization itself: it only works for single-word values (hello => Hello, but hello world => Hello world). The second is that I would need to run this query X times, following the WHERE Id = X logic. So my question is: how can I update X rows? I was thinking of using a cursor, but I don't have much experience with them.
Here is a fiddle to play with.

You can split the words apart, apply the capitalization, then munge the words back together. No, you shouldn't be worrying about subqueries and Id because you should always approach updating a set of rows as a set-based operation and not one row at a time.
;WITH cte AS
(
SELECT Id, NewCat = STRING_AGG(CONCAT(
UPPER(LEFT(value,1)),
SUBSTRING(value,2,57)), ' ')
WITHIN GROUP (ORDER BY CHARINDEX(value, Category))
FROM
(
SELECT t.Id, t.Category, s.value
FROM dbo.MyTable AS t
CROSS APPLY STRING_SPLIT(Category, ' ') AS s
) AS x GROUP BY Id
)
UPDATE t
SET t.Category = cte.NewCat
FROM dbo.MyTable AS t
INNER JOIN cte ON t.Id = cte.Id;
This assumes your category doesn't have non-consecutive duplicates within it; for example, bora frickin bora would get messed up (meanwhile bora bora frickin would be fine). It also assumes a case-insensitive collation (which could be catered to if necessary).
In Azure SQL Database you can use the new enable_ordinal argument to STRING_SPLIT() but, for now, you'll have to rely on hacks like CHARINDEX().
Updated db<>fiddle (thank you for the head start!)
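As a cross-check for the expected values, the split / capitalize / rejoin idea can be sketched in Python. This is an illustration of the logic only, not T-SQL, and the function name capitalize_words is hypothetical. Like the query above, it upper-cases the first letter of each word and leaves the remaining letters untouched.

```python
def capitalize_words(category):
    """Upper-case the first letter of each space-separated word."""
    # Splitting on a single space keeps every word in its original position,
    # so word order and repeated words survive the round trip.
    return " ".join(word[:1].upper() + word[1:] for word in category.split(" "))


print(capitalize_words("some thing"))         # Some Thing
print(capitalize_words("bora frickin bora"))  # Bora Frickin Bora
```

The second call shows the case the CHARINDEX-based ordering cannot handle: a positional split has no trouble with non-consecutive duplicates.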

Search for a word in the column string and list those words

I have two tables: table 1 with columns W_ID and word, and table 2 with columns N_ID and note. I have to list every N_ID whose note contains a word from table 1's word column (the easy part), and also list the matching words in another column without duplicating the N_ID. That means using STUFF to concatenate all the words found in the note column for that particular N_ID. I tried using
a FULL TEXT INDEX with CONTAINS
but it only allows searching for one word at a time. Any suggestions on how I can use a WHILE loop to achieve this?

If there is a maximum number of words you want displayed for N_ID, you can pivot this. You could have them in a single column by concatenating them, but I would recommend against that. Here is a pivot that supports up to 4 words per N_ID. You can adjust it as needed. You can view the SQL Fiddle for this here.
SELECT
n_id,
[1] AS word_1,
[2] AS word_2,
[3] AS word_3,
[4] AS word_4
FROM (
SELECT
n_id,
word,
ROW_NUMBER() OVER (PARTITION BY n_id ORDER BY word) AS rn
FROM tbl2
JOIN tbl1 ON
tbl2.note LIKE '%'+tbl1.word+'[ ,.?!]%'
) AS source_table
PIVOT (
MAX(word)
FOR rn IN ([1],[2],[3],[4])
) AS pivot_table
*Updated the join to look for a space or punctuation mark that marks the end of a word.

You can join your tables together based on a positive result from the CHARINDEX function.
In SQL Server 2017 and later you can run:
SELECT n_id, string_agg(word, ', ') AS words_found -- string_agg requires a separator argument
FROM words
inner join notes on 0 < charindex(words.word, notes.note)
GROUP BY n_id;
Prior to SQL 2017, there is no string_agg so you'll need to use stuff, which is trickier:
select
stuff((
SELECT ', ' + word
FROM words
where 0 < charindex(words.word, notes.note)
FOR XML PATH('')
), 1, 2, '')
from notes;
I used the following schema:
CREATE table WORDS
(W_ID int identity primary key
,word varchar(100)
);
CREATE table notes
(N_ID int identity primary key
,note varchar(1000)
);
insert into words (word) values
('No'),('Nope'),('Nah');
insert into notes (note) values
('I am not going to do this. Nah!!!')
,('It is OK.');
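To see which rows that sample data should produce, here is a small Python sketch of the same match-and-aggregate logic. It is only an illustration: the function name words_per_note is hypothetical, and the case-insensitive comparison is an assumption standing in for a case-insensitive database collation.

```python
def words_per_note(words, notes):
    """Map each note id to the comma-separated words found inside the note."""
    found = {}
    for note_id, note in notes.items():
        # Case-insensitive substring test, similar to CHARINDEX under a
        # case-insensitive collation (an assumption about the database).
        hits = [w for w in words if w.lower() in note.lower()]
        if hits:  # like the inner join, notes without any hit are dropped
            found[note_id] = ", ".join(hits)
    return found


sample_words = ["No", "Nope", "Nah"]
sample_notes = {1: "I am not going to do this. Nah!!!", 2: "It is OK."}
print(words_per_note(sample_words, sample_notes))  # {1: 'No, Nah'}
```

Note that 'No' is reported for the first note because it occurs inside 'not'. A plain substring test does not respect word boundaries, which is what the punctuation-aware LIKE pattern in the pivot answer tries to address.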

SQL Server : GROUP CONCAT with DISTINCT is sorting natural data input

I have a similar situation. I start out with a table that has data input into a column from another source. This data is comma delimited coming in. I need to manipulate the data to remove a section at the end of each. So I split the data and remove the end with the code below. (I added the ID column later to be able to sort. I also added WITH SCHEMABINDING later to add an XML index but nothing works. I can remove this ... and the ID column, but I do not see any difference one way or the other):
ALTER VIEW [dbo].[vw_Routing]
WITH SCHEMABINDING
AS
SELECT TOP 99.9999 PERCENT
ROW_NUMBER() OVER (ORDER BY CableID) - 1 AS ID,
CableID AS [CableID],
SUBSTRING(m.n.value('.[1]', 'varchar(8000)'), 1, 13) AS Routing
FROM
(SELECT
CableID,
CAST('<XMLRoot><RowData>' + REPLACE([RouteNodeList], ',', '</RowData><RowData>') + '</RowData></XMLRoot>' AS xml) AS x
FROM
[dbo].[Cables]) t
CROSS APPLY
x.nodes('/XMLRoot/RowData') m (n)
ORDER BY
ID
Now I need to concatenate data from the Routing column's rows into one row grouped by another column into a column again. I have the code working except that it is reordering my data; I must have the data in the order it is input into the table as it is Cable Routing information. I must also remove duplicates. I use the following code. The SELECT DISTINCT removes the duplicates, but reorders the data. The SELECT (without DISTINCT) keeps the correct data order, but does NOT remove the duplicates:
Substring(
(
SELECT DISTINCT ','+ x3.Routing AS [text()] --This DISTINCT reorders the routes once concatenated.
--SELECT ','+ x3.Routing AS [text()] --This without the DISTINCT does not remove duplicates.
From vw_Routing x3
Where x3.CableID = c.CableId
For XML PATH ('')
), 2, 1000) [Routing],
I tried the code you gave above and it provided the same results with the DISTINCT reordering the data but without DISTINCT not removing the duplicates.

Perhaps GROUP BY with ORDER BY will work:
stuff((select ',' + x3.Routing AS [text()]
from vw_Routing x3
where x3.CableID = c.CableId
group by x3.Routing -- removes the duplicates
order by min(x3.id) -- keeps each route at its first input position
for XML PATH ('')
), 1, 1, '') as [Routing],
I also replaced the SUBSTRING() with STUFF(). The latter is more standard for this operation.
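The intent of GROUP BY combined with ORDER BY MIN(id) is to keep one copy of each route at the position where it first appeared. A rough Python sketch of that idea follows; the function name dedupe_keep_order and the route labels are invented for the example and do not come from the question's data.

```python
def dedupe_keep_order(routes):
    """Drop repeated routes while keeping the first-seen order."""
    # A dict preserves insertion order (Python 3.7+), so each route keeps the
    # position of its first occurrence, which is what ordering by the minimum
    # id per group aims for.
    return ",".join(dict.fromkeys(routes))


print(dedupe_keep_order(["T3", "T1", "T3", "T2", "T1"]))  # T3,T1,T2
```

A sorted DISTINCT would instead return T1,T2,T3, which is the reordering the question complains about.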

To https://stackoverflow.com/users/1144035/gordon-linoff
Unfortunately, that did not work. It gave me the same result as my select statement; that is, no dups but reordering data.
HOWEVER, I found the correct answer earlier today:
I figured it out finally!! I still have to implement it within the other code and add the new Cable Area code, but the hard part is over!!!!!
I am going to post the following to the forums so that they know not to work on it .... I was writing this to send to my friend for his help, but I figured it out myself before I sent it.
I started with raw, comma separated data in the records of a table … the data is from another source. I had to remove some of the information from each value, so I used the following code to split it up and manipulate it:
Code1
Once that was done, I had to put the manipulated data back in the same form in the same order and with no duplicates. So I needed a SELECT DISTINCT. When I used the commented out SELECT DISTINCT below, it removed duplicates but it changed the order of the data which I could not have as it is Cable Tray Routing Data. When I took out the SELECT DISTINCT, it kept correct order, but left duplicates.
Because I was using XML PATH, I had to change this code …… To this code so that I could use SELECT DISTINCT remove the duplicates:Code2 and Code3


PostgreSQL: Check if each item in array is contained by a larger string

I have an array of strings in PostgreSQL:
SELECT ARRAY['dog', 'cat', 'mouse'];
And I have a large paragraph:
Dogs and cats have a range of interactions. The natural instincts of each species lead towards antagonistic interactions, though individual animals can have non-aggressive relationships with each other, particularly under conditions where humans have socialized non-aggressive behaviors.
The generally aggressive interactions between the species have been noted in cultural expressions.
For each item in the array, I want to check if it appears in my large paragraph string. I know for any one string, I could do the following:
SELECT paragraph_text ILIKE '%dog%';
But is there a way to simultaneously check every string in the array (for an arbitrary number of array elements) without resorting to plpgsql?

I believe you want something like this (assuming paragraph_text is a column of a table named table1):
SELECT
paragraph_text,
sub.word,
paragraph_text ILIKE '%' || sub.word || '%' as is_word_in_text
FROM
table1 CROSS JOIN (
SELECT unnest(ARRAY['dog', 'cat', 'mouse']) as word
) as sub;
The function unnest(array) creates a table of records from the array values. You can then do a CROSS JOIN, which combines every row from table1 with every row from that unnested table.
If paragraph_text is a static value (not a table column), you can simply do:
SELECT
paragraph_text,
sub.word,
paragraph_text ILIKE '%' || sub.word || '%' as is_word_in_text
FROM (
SELECT unnest(ARRAY['dog', 'cat', 'mouse']) as word
) as sub;

This solution works only on PostgreSQL 8.4 and above, as unnest is not available in earlier versions.
drop table if exists t;
create temp table t (col1 text, search_terms text[] );
insert into t values
('postgress is awesome', array['postgres', 'is', 'bad']),
('i like open source', array['open', 'code', 'i']),
('sql is easy', array['mysql']);
drop table if exists t1;
select *, unnest(search_terms) as search_term into temp t1 from t;
-- Pick one of the following, depending on how you want to match.
-- This looks for the term anywhere in the text, not for whole words.
select *, position(search_term in col1) from t1;
-- This matches only whole words (@> is the array "contains" operator).
select *, string_to_array(col1, E' ') @> string_to_array(search_term, E' ') from t1;
Basically, you need to flatten the array of search_terms into one column and then match the long string against each search term, row by row.
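The difference between the two matching styles (substring versus whole word) can be sketched in Python. This is an illustration only: the function name match_terms is hypothetical, and lower-casing both sides is an assumption that mimics ILIKE rather than the case-sensitive position().

```python
def match_terms(text, terms):
    """For each term, report (substring match, whole-word match)."""
    lowered = text.lower()
    # Splitting on whitespace only, so punctuation stays attached to words,
    # just as it does with string_to_array(col1, ' ').
    tokens = set(lowered.split())
    return {
        term: (term.lower() in lowered, term.lower() in tokens)
        for term in terms
    }


print(match_terms("Dogs and cats have a range of interactions.",
                  ["dog", "cat", "mouse"]))
# {'dog': (True, False), 'cat': (True, False), 'mouse': (False, False)}
```

'dog' is found as a substring of 'Dogs' but is not a whole word in the text, so the two approaches disagree on it.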

SELECT with a Replace()

I have a table of names and addresses, which includes a postcode column. I want to strip the spaces from the postcodes and select any that match a particular pattern. I'm trying this (simplified a bit) in T-SQL on SQL Server 2005:
SELECT Replace(Postcode, ' ', '') AS P
FROM Contacts
WHERE P LIKE 'NW101%'
But I get the following error;
Msg 207, Level 16, State 1, Line 3
Invalid column name 'P'.
If I remove the WHERE clause I get a list of postcodes without spaces, which is what I want to search. How should I approach this? What am I doing wrong?

Don't use the alias (P) in your WHERE clause directly.
You can either use the same REPLACE logic again in the WHERE clause:
SELECT Replace(Postcode, ' ', '') AS P
FROM Contacts
WHERE Replace(Postcode, ' ', '') LIKE 'NW101%'
Or use an aliased subquery, as described in Nick's answer.

You can reference it that way if you wrap the query, like this:
SELECT P
FROM (SELECT Replace(Postcode, ' ', '') AS P
FROM Contacts) innertable
WHERE P LIKE 'NW101%'
Be sure to give the wrapped select an alias, even unused (SQL Server doesn't allow it without one IIRC)

You are creating an alias P and later in the where clause you are using the same, that is what is creating the problem. Don't use P in where, try this instead:
SELECT Replace(Postcode, ' ', '') AS P FROM Contacts
WHERE Replace(Postcode, ' ', '') LIKE 'NW101%'

You have to repeat your expression everywhere you want to use it:
SELECT Replace(Postcode, ' ', '') AS P
FROM Contacts
WHERE Replace(Postcode, ' ', '') LIKE 'NW101%'
or you can make it a subquery
select P
from (
SELECT Replace(Postcode, ' ', '') AS P
FROM Contacts
) t
WHERE P LIKE 'NW101%'

To expand on Oded's answer, your conceptual model needs a slight adjustment here. Aliasing of column names (AS clauses in the SELECT list) happens very late in the processing of a SELECT, which is why alias names are not available to WHERE clauses. In fact, the only thing that happens after column aliasing is sorting, which is why (to quote the docs on SELECT):
column_alias can be used in an ORDER BY clause. However, it cannot be used in a WHERE, GROUP BY, or HAVING clause.
If you have a convoluted expression in the SELECT list, you may be worried about it 'being evaluated twice' when it appears in the SELECT list and (say) a WHERE clause - however, the query engine is clever enough to work out what's going on. If you want to avoid having the expression appear twice in your query, you can do something like
SELECT c1, c2, c3, expr1
FROM
( SELECT c1, c2, c3, some_complicated_expression AS expr1 FROM some_table ) AS t
WHERE expr1 = condition
which avoids some_complicated_expression physically appearing twice.

If you want any hope of ever using an index, store the data in a consistent manner (with the spaces removed). Either remove the spaces from the stored values or add a persisted computed column. Then you can select from that column without paying the space-removal overhead every time you run the query.
Add a PERSISTED computed column:
ALTER TABLE Contacts ADD PostcodeSpaceFree AS Replace(Postcode, ' ', '') PERSISTED
go
CREATE NONCLUSTERED INDEX IX_Contacts_PostcodeSpaceFree
ON Contacts (PostcodeSpaceFree) --INCLUDE (covered columns here!!)
go
To fix the column itself by removing the spaces, use:
UPDATE Contacts
SET Postcode=Replace(Postcode, ' ', '')
Now you can search like this; either SELECT can use an index:
--search the PERSISTED computed column
SELECT
PostcodeSpaceFree
FROM Contacts
WHERE PostcodeSpaceFree LIKE 'NW101%'
or
--search the fixed (spaces removed column)
SELECT
Postcode
FROM Contacts
WHERE Postcode LIKE 'NW101%'

SELECT *
FROM Contacts
WHERE ContactId IN
(SELECT a.ContactID
FROM
(SELECT ContactId, Replace(Postcode, ' ', '') AS P
FROM Contacts
WHERE Postcode LIKE '%N%W%1%0%1%') a
WHERE a.P LIKE 'NW101%')

This will work:
SELECT Replace(Postcode, ' ', '') AS P
FROM Contacts
WHERE Replace(Postcode, ' ', '') LIKE 'NW101%'
