Remove banned words then collapse the data


This script finds exact matches of banned words in a set of blog comments. It then collapses the matches into one row per owner (the blog comment), showing the banned words found and their count.
Is there a more efficient way to do this, with less DML? There will likely be thousands of comments.
Banned words:
though
man
about
hear
Blog comments:
'There are many of us.'
'The man.'
'So glad to hear about.'
'So glad to hear about. A regular guy.'
'though, though, though.'
1st entry: the word is NOT banned; it is a variant of a banned word ('many' vs 'man'). The entry is NOT selected.
2nd entry: 1 banned word. Entry selected as 1 row, 1 banned word, counted as 1 banned word.
3rd entry: 2 different banned words. Entry selected as 1 row, 2 banned words separated by commas, counted as 2 banned words.
4th entry: 2 different banned words. Entry selected as 1 row, 2 banned words separated by commas, counted as 2 banned words.
5th entry: the same banned word 3 times. Entry selected as 1 row, 1 banned word, counted as 3 banned words.
Rules:
- Get the banned words in the blog comment.
- Only EXACT matches to the banned words. Do NOT include variants of a banned word.
- If there is more than 1 banned word in the same blog comment, only 1 row should be generated.
- Generate the owner's row with the banned words in the BannedWords column: different banned words separated by commas, and a repeated banned word listed only once.
- Count the banned words (every occurrence) and include that count as a column in the generated row.
Desired Result - 4 rows:
BlogCommentId BannedWords t_Text1 t_Text2 t_Text3 CntOfBannedWords
2 man e f g 1
3 hear,about h i j 2
4 hear,about k l m 2
5 though n o p 3
The exact banned word matching code:
DECLARE @tableFinal TABLE (
    t0_BlogCommentId int,
    t0_Word VARCHAR(50),
    t0_Text1 varchar(10),
    t0_Text2 varchar(10),
    t0_Text3 varchar(10),
    t0_CntOfBannedWords int);
DECLARE @table1 TABLE (
    t_BlogCommentId int,
    t_Word VARCHAR(50),
    t_Text1 varchar(10),
    t_Text2 varchar(10),
    t_Text3 varchar(10));
DECLARE @BlogComment TABLE (
    BlogCommentId INT IDENTITY PRIMARY KEY,
    BlogCommentContent VARCHAR(MAX),
    Text1 varchar(10),
    Text2 varchar(10),
    Text3 varchar(10));
INSERT INTO @BlogComment
    (BlogCommentContent
    ,Text1
    ,Text2
    ,Text3)
VALUES
    ('There are many of us.', 'a', 'b', 'c'),
    ('The man.', 'e', 'f', 'g'),
    ('So glad to hear about.', 'h', 'i', 'j'),
    ('So glad to hear about. A regular guy.', 'k', 'l', 'm'),
    ('though, though, though.', 'n', 'o', 'p');
DECLARE @BannedWords TABLE (
    BannedWordsId INT IDENTITY PRIMARY KEY,
    Word varchar(250));
INSERT INTO @BannedWords (Word) VALUES
    ('though'),
    ('man'),
    ('about'),
    ('hear');
-- Split each comment on spaces and strip '.' and ',' from every token.
;WITH rs AS
(
    SELECT word = REPLACE(REPLACE([value],'.',''),',','')
        ,BlogCommentId
        ,Text1
        ,Text2
        ,Text3
    FROM @BlogComment
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
)
-- Keep only the tokens that exactly match a banned word.
INSERT @table1
    (t_Word,
    t_BlogCommentId,
    t_Text1,
    t_Text2,
    t_Text3 )
SELECT bw.Word
    ,rs.BlogCommentId
    ,rs.Text1
    ,rs.Text2
    ,rs.Text3
FROM rs
INNER JOIN @BannedWords bw ON rs.word = bw.Word;
Result of the WITH above, before collapsing.
I want the 'Desired Result' to be generated here, in the WITH if possible, without having to add the additional code below it.
SELECT *
FROM @table1;
Results:
t_BlogCommentId t_BannedWords t_Text1 t_Text2 t_Text3
2 man e f g
3 about h i j
3 hear h i j
4 about k l m
4 hear k l m
5 though n o p
5 though n o p
5 though n o p
-- The 'additional code to collapse':
INSERT @tableFinal
    (t0_BlogCommentId
    ,t0_Word
    ,t0_Text1
    ,t0_Text2
    ,t0_Text3
    ,t0_CntOfBannedWords )
SELECT DISTINCT t_BlogCommentId
    ,''
    ,''
    ,''
    ,''
    ,0
FROM @table1;
UPDATE @tableFinal
SET t0_Word = t_Word
    ,t0_Text1 = t_Text1
    ,t0_Text2 = t_Text2
    ,t0_Text3 = t_Text3
FROM @table1
WHERE t0_BlogCommentId = t_BlogCommentId;
UPDATE @tableFinal
SET t0_Word = t0_Word + ',' + t_Word
FROM @table1
WHERE t0_BlogCommentId = t_BlogCommentId AND t0_Word <> t_Word;
UPDATE @tableFinal
SET t0_CntOfBannedWords = (SELECT Count (t_Word)
    FROM @table1
    WHERE t0_BlogCommentId = t_BlogCommentId);
Result of collapsing. This is now my 'Desired Result', but it takes more work and is NOT likely to be suitable if there are thousands of comments:
SELECT t0_BlogCommentId as BlogCommentId
    ,t0_Word as BannedWords
    ,t0_Text1 as Text1
    ,t0_Text2 as Text2
    ,t0_Text3 as Text3
    ,t0_CntOfBannedWords as CntOfBannedWords
FROM @tableFinal;
BlogCommentId BannedWords t_Text1 t_Text2 t_Text3 CntOfBannedWords
2 man e f g 1
3 hear,about h i j 2
4 hear,about k l m 2
5 though n o p 3
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=982989411c1a3e3fb784f1e0e46fd9e1

Excellent post. Thank you for providing all the data in an easy to use format. You were really close to having this all put together. You just needed that final piece to push the values back together. Here I used STUFF for this. I just started with your "rs" cte. I added another cte and then used the old STUFF trick. This produces the desired output for your sample data.
DECLARE @BlogComment TABLE (
    BlogCommentId INT IDENTITY PRIMARY KEY,
    BlogCommentContent VARCHAR(MAX),
    Text1 varchar(10),
    Text2 varchar(10),
    Text3 varchar(10));
INSERT INTO @BlogComment
    (BlogCommentContent
    ,Text1
    ,Text2
    ,Text3)
VALUES
    ('There are many of us.', 'a', 'b', 'c')
    , ('The man.', 'e', 'f', 'g')
    , ('So glad to hear about.', 'h', 'i', 'j')
    , ('So glad to hear about. A regular guy.', 'k', 'l', 'm')
    , ('though, though, though.', 'n', 'o', 'p');
DECLARE @BannedWords TABLE (
    BannedWordsId INT IDENTITY PRIMARY KEY,
    Word varchar(250));
INSERT INTO @BannedWords (Word) VALUES
    ('though'),
    ('man'),
    ('about'),
    ('hear');
;WITH rs AS
(
    SELECT word = REPLACE(REPLACE([value],'.',''),',','')
        ,BlogCommentId
        ,Text1
        ,Text2
        ,Text3
    FROM @BlogComment
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
)
, ExpandedWords as
(
    SELECT bw.Word
        ,rs.BlogCommentId
        ,rs.Text1
        ,rs.Text2
        ,rs.Text3
    FROM rs
    INNER JOIN @BannedWords bw ON rs.word = bw.Word
)
select BlogCommentId
    -- STUFF(..., 1, 2, '') strips the leading ', ' from the concatenated list.
    , BannedWords = STUFF((select ', ' + e2.Word
        from ExpandedWords e2
        where e2.BlogCommentId = e1.BlogCommentId
        --you could add an order by here if you want the list of words in a certain order.
        FOR XML PATH('')), 1, 2, '')
    , e1.Text1
    , e1.Text2
    , e1.Text3
    , BannedWordsCount = count(*)
from ExpandedWords e1
group by e1.BlogCommentId
    , e1.Text1
    , e1.Text2
    , e1.Text3
order by e1.BlogCommentId
On a more recent version of SQL Server (2017 or later), STRING_AGG is less verbose and less obscure than STUFF with FOR XML. Here is how that might look. I also use ROW_NUMBER here so that a banned word is returned only once if it appears multiple times in the input. It is the same concept as above, so this starts a bit later in the code, after the "rs" cte.
, ExpandedWords as
(
SELECT bw.Word
, rs.BlogCommentId
, rs.Text1
, rs.Text2
, rs.Text3
, RowNum = ROW_NUMBER()over(partition by rs.BlogCommentId, bw.Word order by (select newid())) --order doesn't really matter here
FROM rs
INNER JOIN @BannedWords bw ON rs.word = bw.Word
)
select e.BlogCommentId
, STRING_AGG(e.Word, ', ')
, e.Text1
, e.Text2
, e.Text3
, RowNum
from ExpandedWords e
where RowNum = 1
group by e.BlogCommentId
, e.Text1
, e.Text2
, e.Text3
, RowNum
order by e.BlogCommentId
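To make the target of the collapse explicit, here is a minimal sketch of the question's rules in Python rather than T-SQL. It is only a model for checking expected output, not a replacement for the queries above: the function name `collapse` and the dictionaries are made up for illustration, and it assumes the same tokenising as the "rs" cte (split on single spaces, strip '.' and ',').

```python
# Sample data copied from the question; the structure is an assumption.
BANNED = {"though", "man", "about", "hear"}

COMMENTS = {
    1: "There are many of us.",
    2: "The man.",
    3: "So glad to hear about.",
    4: "So glad to hear about. A regular guy.",
    5: "though, though, though.",
}


def collapse(comments, banned):
    """Return {comment_id: (banned_words_csv, total_hits)} for comments with hits."""
    result = {}
    for comment_id, text in comments.items():
        # Split on spaces and strip '.' and ',' the way the rs cte does.
        words = [w.replace(".", "").replace(",", "") for w in text.split(" ")]
        hits = [w for w in words if w in banned]  # exact matches only
        if not hits:
            continue  # no banned words: no row for this comment
        distinct = list(dict.fromkeys(hits))  # list each word once, first-seen order
        result[comment_id] = (",".join(distinct), len(hits))
    return result


for comment_id, (words, count) in collapse(COMMENTS, BANNED).items():
    print(comment_id, words, count)
```

Each banned word is listed once per comment while the count still includes every occurrence, which is the combination the question asks for in its fifth entry.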

Related

Find data by multiple Lookup table clauses

declare @Character table (id int, [name] varchar(12));
insert into @Character (id, [name])
values
(1, 'tom'),
(2, 'jerry'),
(3, 'dog');
declare @NameToCharacter table (id int, nameId int, characterId int);
insert into @NameToCharacter (id, nameId, characterId)
values
(1, 1, 1),
(2, 1, 3),
(3, 1, 2),
(4, 2, 1);
The Name Table has more than just 1,2,3 and the list to parse on is dynamic
NameTable
id | name
----------
1 foo
2 bar
3 steak
CharacterTable
id | name
---------
1 tom
2 jerry
3 dog
NameToCharacterTable
id | nameId | characterId
1 1 1
2 1 3
3 1 2
4 2 1
I am looking for a query that will return a character that has both of two given names. For example, with the above data and names 1 and 2, only "tom" will be returned.
SELECT *
FROM nameToCharacterTable
WHERE nameId in (1,2)
The IN clause returns every row that has a 1 or a 2. I want to return only the characters that have both a 1 and a 2.
I am stumped. I have tried everything I know and do not want to resort to dynamic SQL. Any help would be great.
The 1,2 in this example will be a dynamic list of integers; for example, it could be 1,3,4,5,.....

Count how many times each character appears in the NameToCharacter table for the names in the list you are providing, and keep the characters whose count matches (I have assumed you can convert the list into a table variable or temp table), e.g.
declare @Character table (id int, [name] varchar(12));
insert into @Character (id, [name])
values
(1, 'tom'),
(2, 'jerry'),
(3, 'dog');
declare @NameToCharacter table (id int, nameId int, characterId int);
insert into @NameToCharacter (id, nameId, characterId)
values
(1, 1, 1),
(2, 1, 3),
(3, 1, 2),
(4, 2, 1);
declare @RequiredNames table (nameId int);
insert into @RequiredNames (nameId)
values
(1),
(2);
select *
from @Character C
where (
    select count(*)
    from @NameToCharacter NC
    where NC.characterId = C.id
    and NC.nameId in (select nameId from @RequiredNames)
) = 2;
Returns:
id | name
----------
1 tom
Note: Providing DDL+DML as shown here makes it much easier for people to assist you.

This is classic Relational Division With Remainder.
There are a number of different solutions. @DaleK has given you an excellent one: inner-join everything, then check that each set has the right number of rows. This is normally the fastest solution.
If you want to ensure it works with a dynamic number of rows, just change the last line to
) = (SELECT COUNT(*) FROM @RequiredNames);
Two other common solutions exist.
Left-join and check that all rows were joined
SELECT *
FROM @Character c
WHERE EXISTS (SELECT 1
    FROM @RequiredNames rn
    LEFT JOIN @NameToCharacter nc ON nc.nameId = rn.nameId AND nc.characterId = c.id
    HAVING COUNT(*) = COUNT(nc.nameId) -- all rows are joined
);
Double anti-join, in other words: there are no "required" that are "not in the set"
SELECT *
FROM @Character c
WHERE NOT EXISTS (SELECT 1
    FROM @RequiredNames rn
    WHERE NOT EXISTS (SELECT 1
        FROM @NameToCharacter nc
        WHERE nc.nameId = rn.nameId AND nc.characterId = c.id
    )
);
A variation on the one from the other answer uses a windowed aggregate instead of a subquery. I don't think this is performant, but it may have uses in certain cases.
SELECT *
FROM @Character c
WHERE EXISTS (SELECT 1
    FROM (
        SELECT *, COUNT(*) OVER () AS cnt
        FROM @RequiredNames
    ) rn
    JOIN @NameToCharacter nc ON nc.nameId = rn.nameId AND nc.characterId = c.id
    HAVING COUNT(*) = MIN(rn.cnt)
);
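All of these queries ask the same question: is the set of required names a subset of the names linked to the character? A minimal Python sketch of that test, using the sample rows from the question (the function name `characters_with_all_names` and the data structures are made up for illustration):

```python
from collections import defaultdict

# Sample data from the question: Character and NameToCharacter.
characters = {1: "tom", 2: "jerry", 3: "dog"}
name_to_character = [(1, 1), (1, 3), (1, 2), (2, 1)]  # (nameId, characterId)


def characters_with_all_names(required_name_ids):
    """Return the names of characters linked to every required nameId."""
    required = set(required_name_ids)
    names_by_character = defaultdict(set)
    for name_id, character_id in name_to_character:
        names_by_character[character_id].add(name_id)
    # Relational division: keep a character when required is a subset of its names.
    return sorted(
        characters[cid] for cid in characters
        if required <= names_by_character[cid]
    )


print(characters_with_all_names([1, 2]))  # only tom has both name 1 and name 2
```

Because the comparison is a subset test, the required list can have any length, which is the dynamic case the question asks about.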
db<>fiddle

Removing duplicates returned based on the column value

This SQL gives me the blog comments that contain just the banned words defined in my table. I only get the EXACT matches and it removes duplicate rows. It also eliminates variants of a banned word. Which is what I want.
DECLARE @BlogComment TABLE (
    BlogCommentId INT IDENTITY PRIMARY KEY,
    BlogCommentContent VARCHAR(MAX),
    Id int);
INSERT INTO @BlogComment
    (BlogCommentContent,
    Id)
VALUES
    ('There are many of us.' ,1),
    ('This is the man.', 2),
    ('I hear you.', 2),
    ('Your the man.',2);
DECLARE @BannedWords TABLE (
    BannedWordsId INT IDENTITY PRIMARY KEY,
    Word varchar(250));
INSERT INTO @BannedWords (Word) VALUES
    ('though'),
    ('man'),
    ('hear');
;WITH rs AS
(
    SELECT word = REPLACE(REPLACE([value],'.',''),',','')
        ,Id
    FROM @BlogComment
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
)
SELECT DISTINCT bw.Word,
    rs.id
FROM rs
INNER JOIN @BannedWords bw ON rs.word = bw.Word;
Results of running this are:
Word id
hear 2
man 2
What I expect.
Now I want to take it 1 step further. Test case: I have more than 1 banned word in the same blog comment.
So I altered the code (the table values) to include the test case. A blog comment with 2 banned words.
('He is the man. I hear ya.',2),
I want only 1 row returned for this case. Either one.
Word id
hear 2
And altered the code to accommodate this by adding 2 more lines of code per the 'accepted answer' from - Get top 1 row of each group
,ROW_NUMBER() OVER(PARTITION by Id ORDER BY BlogCommentContent) AS rn
WHERE rn = 1;
DECLARE @BlogComment TABLE (
    BlogCommentId INT IDENTITY PRIMARY KEY,
    BlogCommentContent VARCHAR(MAX),
    Id int);
INSERT INTO @BlogComment
    (BlogCommentContent,
    Id)
VALUES
    ('There are many of us.',1),
    ('He is the man. I hear ya.',2),
    ('Your the man.',2);
DECLARE @BannedWords TABLE (
    BannedWordsId INT IDENTITY PRIMARY KEY,
    Word varchar(250));
INSERT INTO @BannedWords (Word) VALUES
    ('though'),
    ('man'),
    ('hear');
;WITH rs AS
(
    SELECT word = REPLACE(REPLACE([value],'.',''),',','')
        ,Id
        ,ROW_NUMBER() OVER(PARTITION by Id ORDER BY BlogCommentContent) AS rn
    FROM @BlogComment
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
)
SELECT DISTINCT bw.Word,
    rs.id
FROM rs
INNER JOIN @BannedWords bw ON rs.word = bw.Word
WHERE rn = 1;
Results of running this are no rows returned:
Word id
So, not sure why the 'accepted answer' does not work for me.

You don't need ROW_NUMBER. You only need to split each comment into words, join them to the banned words, and then count the matches for each comment, either all matches or only the distinct ones.
DECLARE @BlogComment TABLE (
    BlogCommentId INT IDENTITY PRIMARY KEY,
    BlogCommentContent VARCHAR(MAX),
    Id int);
INSERT INTO @BlogComment
    (BlogCommentContent,
    Id)
VALUES
    ('There are many of us.',1),
    ('He is the man. I hear ya.',2),
    ('Your the man.',2);
DECLARE @BannedWords TABLE (
    BannedWordsId INT IDENTITY PRIMARY KEY,
    Word varchar(250));
INSERT INTO @BannedWords (Word) VALUES
    ('though'),
    ('man'),
    ('hear');
SELECT split_words.id as comment_id
    , count(bw.Word) as total_banned_words
    , count(distinct bw.Word) as total_unique_banned_words
FROM (
    SELECT word = REPLACE(REPLACE([value],'.',''),',','')
        ,Id
        ,COUNT(*) OVER(PARTITION by Id,REPLACE(REPLACE([value],'.',''),',','')) cnt
    FROM @BlogComment
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
) split_words
LEFT JOIN @BannedWords bw
    ON bw.Word = split_words.word
GROUP BY split_words.id
ORDER BY split_words.id
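On why the rn = 1 filter in the question returned no rows: the row numbers are assigned inside the cte to every split word, before the join to the banned words, so rn = 1 lands on whichever word is numbered first for each Id, and that word is usually not banned. A minimal Python sketch of the difference follows. It assumes ties in the ORDER BY are broken by word position, which SQL Server does not guarantee, and the function names are made up for illustration.

```python
BANNED = {"though", "man", "hear"}
COMMENTS = [  # (Id, BlogCommentContent) from the question's second test
    (1, "There are many of us."),
    (2, "He is the man. I hear ya."),
    (2, "Your the man."),
]


def split_words(comments):
    """Yield (id, content, word) per token, stripped of '.' and ','."""
    for id_, content in comments:
        for token in content.split(" "):
            yield id_, content, token.replace(".", "").replace(",", "")


def ordered_rows(comments):
    # PARTITION BY Id ORDER BY BlogCommentContent; the stable sort keeps word order.
    return sorted(split_words(comments), key=lambda r: (r[0], r[1]))


def number_then_filter(comments):
    """Mimic the failing query: number every word, then keep banned words with rn = 1."""
    out, rn = [], {}
    for id_, _, word in ordered_rows(comments):
        rn[id_] = rn.get(id_, 0) + 1
        if rn[id_] == 1 and word in BANNED:
            out.append((word, id_))
    return out


def filter_then_number(comments):
    """Join to the banned words first, then keep the first hit per Id."""
    out, done = [], set()
    for id_, _, word in ordered_rows(comments):
        if word in BANNED and id_ not in done:
            done.add(id_)
            out.append((word, id_))
    return out


print(number_then_filter(COMMENTS))
print(filter_then_number(COMMENTS))
```

In T-SQL terms, the second function corresponds to computing ROW_NUMBER over the joined result (in a second cte or derived table) and filtering on rn = 1 outside it.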

Can't figure out how to join tables due to comma separated values

I have a table of the following structure.
select loginid,alloted_area from tbllogin
Which returns this result.
loginid alloted_area
------------- ---------------------------
01900017 22,153,169,174,179,301
01900117 254,91,92,285,286,287
01900217 2,690,326,327,336
17900501 null
17900601 28,513,409,410
17901101 254,91,92,285
17901701 59,1302,1303
17902101 2,690,326,327
17902301 20,159,371,161
17902401 null
I have another table, tblarea, whose ids are stored as comma-separated values in the table above when an area is assigned to a user. I want to join these two tables and leave out entries like the last one, which has not yet been assigned an area. I have been told several times that storing data as comma-separated values is bad practice (I suppose because of the very problem I am facing). I know that, but this structure was created by another developer at my company, not me, so please help instead of downvoting. This is what I have tried:
declare @csv varchar(max)='';
SELECT @csv = COALESCE(@csv + ', ', '') + case when alloted_area is null or alloted_area='' then '0' else alloted_area end from tbllogin;
select * from tblarea where id in (select 0 union select sID from splitstring(@csv,','));
This does get the area but there is no way it can give me the login of users that the areas have been assigned to. Sample input and output.
tbllogin
loginid alloted_area
------------- ---------------------------
a1 1,3,5
a2 2,4
a3 1,4
a4 null
tblarea
id area_name
------------- ---------------------------
1 v
2 w
3 x
4 y
5 z
After joining I need this result
login_id area_name
------------- ---------------------------
a1 v
a1 x
a1 z
a2 w
a2 y
a3 v
a3 y

By splitting the string (here via XML) and using CROSS APPLY, we can achieve the desired output:
DECLARE @tbllogin TABLE (LoginID CHAR(2) NOT NULL PRIMARY KEY, alloted_area VARCHAR(MAX));
INSERT @tblLogin (LoginID, alloted_area)
VALUES ('a1', '1,3,5'), ('a2', '2,4'),('a3', '1,4'), ('a4', NULL);
DECLARE @tblArea TABLE (ID INT NOT NULL PRIMARY KEY, Area_Name CHAR(1));
INSERT @tblArea (ID, Area_Name)
VALUES (1, 'v'), (2, 'w'), (3, 'x'), (4, 'y'), (5, 'z');
SELECT DT.LoginID, A.Area_Name
FROM
(
    -- Turn '1,3,5' into <S>1</S><S>3</S><S>5</S> and shred it into one row per id.
    SELECT LoginID, Split.a.value('.', 'VARCHAR(1000)') AS alloted_area
    FROM (
        SELECT LoginID, CAST('<S>' + REPLACE(alloted_area, ',', '</S><S>') + '</S>' AS XML) AS alloted_area
        FROM @tbllogin
    ) AS A
    CROSS APPLY alloted_area.nodes('/S') AS Split(a)
) DT
INNER JOIN @tblArea A
    ON A.ID = DT.alloted_area
Output
LoginID Area_Name
--------------------
a1 v
a1 x
a1 z
a2 w
a2 y
a3 v
a3 y

Consider this split function:
CREATE FUNCTION [dbo].[SplitString]
(
    @List NVARCHAR(MAX),
    @Delim VARCHAR(255)
)
RETURNS TABLE
AS
RETURN ( SELECT [Value] FROM
    (
        SELECT
            [Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
                CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
        FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
            FROM sys.all_objects) AS x
        WHERE Number <= LEN(@List)
        AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim
    ) AS y
);
Then you can do a query like this:
SELECT
t.loginid,tblarea.area_name
FROM
tbllogin AS t
CROSS APPLY(SELECT value FROM SplitString(t.alloted_area,',')) as split
JOIN tblarea ON tblarea.id=split.Value

You can join using LIKE, e.g. CONCAT(',', alloted_area, ',') LIKE CONCAT('%,', ID, ',%')
So for a full example
-- SAMPLE DATA
DECLARE @tbllogin TABLE (LoginID CHAR(2) NOT NULL PRIMARY KEY, alloted_area VARCHAR(MAX));
INSERT @tblLogin (LoginID, alloted_area)
VALUES ('a1', '1,3,5'), ('a2', '2,4'),('a3', '1,4'), ('a4', NULL);
DECLARE @tblArea TABLE (ID INT NOT NULL PRIMARY KEY, Area_Name CHAR(1));
INSERT @tblArea (ID, Area_Name)
VALUES (1, 'v'), (2, 'w'), (3, 'x'), (4, 'y'), (5, 'z');
-- QUERY
SELECT l.LoginID,
    a.Area_Name
FROM @tblLogin AS l
INNER JOIN @tblArea AS a
    ON CONCAT(',', l.alloted_area, ',') LIKE CONCAT('%,', a.ID, ',%')
ORDER BY l.LoginID;
OUTPUT
LoginID Area_Name
--------------------
a1 v
a1 x
a1 z
a2 w
a2 y
a3 v
a3 y
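The reason both sides of the LIKE are wrapped in commas is that a bare substring test would let ID 1 match a list such as '11,30'. A small Python sketch of the same predicate (the function name `area_assigned` is made up for illustration; it treats NULL as "no areas", which is how CONCAT behaves with NULL in the query above):

```python
def area_assigned(alloted_area, area_id):
    """Mimic CONCAT(',', alloted_area, ',') LIKE CONCAT('%,', ID, ',%')."""
    if alloted_area is None:
        return False  # CONCAT turns NULL into '', so nothing can match
    return f",{area_id}," in f",{alloted_area},"


# Sample data from the question.
logins = {"a1": "1,3,5", "a2": "2,4", "a3": "1,4", "a4": None}
areas = {1: "v", 2: "w", 3: "x", 4: "y", 5: "z"}

pairs = [
    (login, name)
    for login, csv in sorted(logins.items())
    for area_id, name in sorted(areas.items())
    if area_assigned(csv, area_id)
]
print(pairs)

# The wrapping commas are what prevent partial matches:
print("1" in "11,30")              # naive substring test: True (wrong)
print(area_assigned("11,30", 1))   # delimiter-wrapped test: False
```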
You could arguably split alloted_area into separate rows, but the article Split strings the right way – or the next best way by Aaron Bertrand shows that in these circumstances LIKE will outperform any of the split functions.
Although you have said that you know it is a bad design, I can't in good conscience not mention it in my answer, so whichever method you choose is not a substitute for fixing how this is stored. If not by you, by whoever designed it.
The correct method would be a junction table, tblLoginArea:
LoginID AreaID
------------------
a1 1
a1 3
a1 5
a2 2
a2 4
....etc
Then if the developers still need the csv format, then they can create a view, and update their references to that:
CREATE VIEW dbo.LoginAreaCSV
AS
SELECT l.LoginID,
Allocated_Area = STUFF(la.AllocatedAreas.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM tblLogin AS l
OUTER APPLY
( SELECT CONCAT(',', la.AreaID)
FROM tblLoginArea AS la
WHERE la.LoginID = l.LoginID
ORDER BY la.AreaID
FOR XML PATH(''), TYPE
) AS la (AllocatedAreas);
And your query can be done using equality predicates that can be optimised with indexes:
SELECT l.LoginID, a.Area_Name
FROM tblLogin AS l
INNER JOIN tblLoginArea AS la
ON la.LoginID = l.LoginID
INNER JOIN tblArea AS a
ON a.ID = la.AreaID;
Example on DB Fiddle

SQL Trigger to split string during insert without a common delimiter and store it into another table

Currently I have a system that is dumping data into a table with the format:
Table1
Id#, row#, row_dump
222, 1, “set1 = aaaa set2 =aaaaaa aaaa dd set4=1111”
I want to take the row dump and transpose it into rows and insert it into another table of the format:
Table2
Id#, setting, value
222, ‘set1’,’aaaa’
222, ‘set2’,’aaaaaa aaaa dd’
222, ‘set4’,’1111’
Is there a way to make a trigger in MSSQL that will parse this string on insert in Table1 and insert it into Table2 properly?
All of the examples I’ve found required a common delimiter. ‘=’ separates the setting from the value, space(s) separate a value from a setting but a value could have spaces in it (settings do not have spaces in them so the last word before the equal sign is the setting name but there could be spaces between the setting name and equal sign).
There could be 1-5 settings and values in any given row. The values can have spaces. There may or may not be space between the setting name and the ‘=’ sign.
I have no control over the original insert process or format as it is used for other purposes.

You could use 'set' as a delimiter. This is a simple sample. It obviously may have to be molded to your environment.
use tempdb
GO
IF OBJECT_ID('dbo.fn_TVF_Split') IS NOT NULL
DROP FUNCTION dbo.fn_TVF_Split;
GO
CREATE FUNCTION dbo.fn_TVF_Split(@arr AS NVARCHAR(2000), @sep AS NCHAR(3))
RETURNS TABLE
AS
RETURN
WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1) --2 rows
,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B) --4 rows (2x2)
,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B) --16 rows (4x4)
,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B) --256 rows (16x16)
,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B) --65536 rows (256x256)
,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B) --4,294,967,296 rows (65536x65536)
,Nums AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5)
SELECT
    (n - 1) - LEN(REPLACE(LEFT(@arr, n-1), @sep, N'')) + 1 AS pos,
    SUBSTRING(@arr, n, CHARINDEX(@sep, @arr + @sep, n) - n) AS element
FROM Nums
WHERE
    n <= LEN(@arr) + 3
    AND SUBSTRING(@sep + @arr, n, 3) = @sep
    AND N<=100000
GO
declare @t table(
    Id int,
    row int,
    row_dump varchar(Max)
);
insert into @t values(222, 1, 'set1 = aaaa set2 =aaaaaa aaaa dd set4=1111')
insert into @t values(111, 2, ' set1 =cx set2 =4444set4=124')
DECLARE @t2 TABLE(
    Id int,
    Setting VARCHAR(6),
    [Value] VARCHAR(50)
)
insert into @t2 (Id,Setting,Value)
select
    Id,
    [Setting]='set' + left(LTRIM(element),1),
    [Value]=RIGHT(element,charindex('=',reverse(element))-1)
from @t t
cross apply dbo.fn_TVF_Split(row_dump,'set')
where pos > 1
order by
    id asc,
    'set' + left(LTRIM(element),1) asc
select *
from @t2
Update
You could do something like this. It is not optimal and could probably be better handled in the transformation tool or application. Anyway here we go.
Note: You will need the split function I posted before.
declare @t table(
    Id int,
    row int,
    row_dump varchar(Max)
);
insert into @t values(222, 1, 'set1 = aaaa set2 =aaaaaa aaaa dd set3=abc set4=1111 set5=7373')
insert into @t values(111, 2, 'set1 =cx set2 = 4444 set4=124')
DECLARE @t2 TABLE(
    Id int,
    Setting VARCHAR(6),
    [Value] VARCHAR(50)
)
if OBJECT_ID('tempdb.dbo.#Vals') IS NOT NULL
BEGIN
    DROP TABLE #Vals;
END
CREATE TABLE #Vals(
    Id INT,
    Row INT,
    Element VARCHAR(MAX),
    pos int,
    value VARCHAR(MAX)
);
-- Normalise the spacing around '=' and mark each value with '|', then split on '='.
insert into #Vals
select
    Id,
    row,
    element,
    pos,
    Value=STUFF(LEFT(element,len(element) - CHARINDEX(' ',reverse(element))),1,1,'')
from(
    select
        Id,
        row,
        row_dump = REPLACE(REPLACE(REPLACE(row_dump,'= ','='),' =','='),'=','=|')
    from @t
) AS t
cross apply dbo.fn_TVF_Split(row_dump,'=')
where pos >=1 and pos < 10
-- The setting name for each value is the last word of the previous element.
insert into @t2 (Id,Setting,Value)
select
    t1.Id,
    Setting =
    (
        SELECT TOP 1
            CASE WHEN t2.pos = 1
                THEN LTRIM(RTRIM(t2.element))
                ELSE LTRIM(RTRIM(RIGHT(t2.element,CHARINDEX(' ',REVERSE(t2.element)))))
            END
        FROM #Vals t2
        where
            t2.Id = t1.id
            and t2.row = t1.row
            and t2.pos < t1.pos
        ORDER BY t2.pos DESC
    ),
    t1.Value
from #Vals t1
where t1.pos > 1 and t1.pos < 10
order by t1.id,t1.row,t1.pos
select * from @t2
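For comparison, the parsing rule from the question (the setting is the last word before each '=', and the value runs up to the next setting) can be written compactly outside the database. This is a minimal Python sketch, not something a T-SQL trigger can call directly; the names `SETTING_RE` and `parse_settings` are made up for illustration, and it assumes values never contain '='.

```python
import re

# A setting name (no spaces, no '='), optional spaces around '=', then a value
# that lazily extends until the next "name =" or the end of the string.
SETTING_RE = re.compile(r"([^\s=]+)\s*=\s*(.*?)(?=\s+[^\s=]+\s*=|\s*$)")


def parse_settings(row_dump):
    """Return [(setting, value), ...] in the order they appear in row_dump."""
    return [(m.group(1), m.group(2)) for m in SETTING_RE.finditer(row_dump)]


print(parse_settings("set1 = aaaa set2 =aaaaaa aaaa dd set4=1111"))
```

This may be useful if the transformation is moved into the application or an ETL step, as the answer suggests might be preferable.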

Using CASE WHEN in the where condition

I was wondering if anyone can help me write some code for the following logic.
We have a table
----------------
id, lang, letter
----------------
1 1 E
1 1 E
1 1 E
1 1 E
2 2 F
Problem:
I need to select ALL the rows for which the following condition fails:
id = lang (ie its either 1 or 2)
lang = 1 when letter = 'e' OR lang=2 when letter=2
I know I can hard code it. Also, I would like to do this in ONE query only.
Please help

WHERE NOT
(
id = lang
AND
(
(lang = 1 AND letter = 'e')
OR (lang = 2 AND letter = '2')
)
)

select * from table
where id <> lang and
(lang <> 1 or letter <> 'e') and
(lang <> 2 or letter <> '2')
assuming you mean you want all data where both of those conditions are false.

I think this is what you want to exclude the records meeting that criteria:
create table #t
(
id int,
lang int,
letter varchar(1)
)
insert into #t values (1, 1, 'E')
insert into #t values (1, 1, 'E')
insert into #t values (1, 1, 'E')
insert into #t values (1, 1, 'E')
insert into #t values (2, 2, 'F')
insert into #t values (1, 1, 'G')
insert into #t values (1, 1, 'H')
insert into #t values (1, 1, 'I')
insert into #t values (1, 1, 'J')
insert into #t values (2, 2, '2')
SELECT *
FROM #t
WHERE NOT
(
id = lang
AND
(
(
lang = 1
AND letter = 'E'
)
OR
(
lang = 2
AND letter = '2'
)
)
)
drop table #t
To get the records that do meet the criteria, just remove the NOT:
SELECT *
FROM #t
WHERE
(
id = lang
AND
(
(
lang = 1
AND letter = 'E'
)
OR
(
lang = 2
AND letter = '2'
)
)
)

The idea here is that there are three business rules that may be implemented as three distinct tuple constraints (i.e. not false for every row in the table):
id and lang must be equal (begging the question, why not make one a computed column?).
If letter is 'E' then lang must be 1 (I assume there is a typo in your question where you said 'e' instead of 'E').
If letter is 'F' then lang must be 2 (I assume there is a typo in your question where you said 2 instead of 'F').
The constraints 'don't have anything to say' about any other data (e.g. when letter is 'X') and will allow this to pass.
All three tuple constraints can be written in conjunctive normal form as a constraint validation query:
SELECT * FROM T
WHERE id = lang
AND ( letter <> 'E' OR lang = 1 )
AND ( letter <> 'F' OR lang = 2 )
The data that violates the constraints can be simply shown (in pseudo relational algebra) as:
T MINUS (constraint validation query)
In SQL:
SELECT * FROM T
EXCEPT
SELECT * FROM T
WHERE id = lang
AND ( letter <> 'E' OR lang = 1 )
AND ( letter <> 'F' OR lang = 2 )
It is good to be able to rewrite predicates in case one's query of choice runs like glue on one's DBMS of choice! The above may be rewritten as e.g.
SELECT * FROM T
WHERE NOT ( id = lang
AND ( letter <> 'E' OR lang = 1 )
AND ( letter <> 'F' OR lang = 2 ) )
Applying rewrite laws (De Morgan's and double-negative) e.g.
SELECT * FROM T
WHERE id <> lang
OR ( letter = 'E' AND lang <> 1 )
OR ( letter = 'F' AND lang <> 2 )
Logically speaking, this should be better for the optimizer because for the above to be a contradiction every disjunct member must be false (put another way, it only takes one OR'ed clause to be true for the data to be deemed 'bad'). In practice (in theory?), the optimizer should be able to perform such rewrites anyhow!
p.s. nulls are bad for logic -- avoid them!
Here's my test code with sample data:
WITH Nums AS ( SELECT *
FROM ( VALUES (0), (1), (2) ) AS T (c) ),
Chars AS ( SELECT *
FROM ( VALUES ('E'), ('F'), ('X') ) AS T (c) ),
T AS ( SELECT N1.c AS id, N2.c AS lang,
C1.c AS letter
FROM Nums AS N1, Nums AS N2, Chars AS C1 )
SELECT * FROM T
EXCEPT
SELECT * FROM T
WHERE id = lang
AND ( letter <> 'E' OR lang = 1 )
AND ( letter <> 'F' OR lang = 2 );
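The rewrite from the NOT ( ... ) form to the OR form can be checked by brute force over the same 27 rows the test query generates. A minimal Python sketch (function names are made up for illustration; Python has no NULL, so this only covers the two-valued case the answer recommends sticking to):

```python
from itertools import product


def violates_not_form(id_, lang, letter):
    """NOT ( id = lang AND (letter <> 'E' OR lang = 1) AND (letter <> 'F' OR lang = 2) )"""
    return not (id_ == lang
                and (letter != "E" or lang == 1)
                and (letter != "F" or lang == 2))


def violates_or_form(id_, lang, letter):
    """id <> lang OR (letter = 'E' AND lang <> 1) OR (letter = 'F' AND lang <> 2)"""
    return (id_ != lang
            or (letter == "E" and lang != 1)
            or (letter == "F" and lang != 2))


# Same cross product as the Nums x Nums x Chars test data above.
rows = list(product((0, 1, 2), (0, 1, 2), ("E", "F", "X")))
mismatches = [r for r in rows if violates_not_form(*r) != violates_or_form(*r)]
valid_rows = [r for r in rows if not violates_or_form(*r)]

print(len(rows), mismatches)
print(valid_rows)
```

An empty `mismatches` list means the two predicates agree on every row, which is what De Morgan's laws predict.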
