Pick up words in a text field - SQL Server

My data has a text field, and I want to pick out any keywords from this list: 'mr', 'jr', 'dr', 'ii'.
A keyword only counts when it has a space in front of it and after it.
So with the data below, the output should be:
id|text|keyword1|keyword2|keyword3|keyword4
1|'xxxx'|'jr'|'mr'|'ii'|''
2|'xxxx'|'mr'|''|''|''
Thank you for helping.
HHC
Create TABLE have (
id int,
text varchar(225)
);
Insert into have (id,text) values (1,'monday jr due date mr ii final');
Insert into have (id,text) values (2,'happy new year mr J');

You can use the CHARINDEX() function inside CASE expressions, for example:
SELECT id, text,
CASE WHEN CHARINDEX(' jr ',text) > 0 THEN 'jr' END AS keyword1,
CASE WHEN CHARINDEX(' mr ',text) > 0 THEN 'mr' END AS keyword2,
CASE WHEN CHARINDEX(' ii ',text) > 0 THEN 'ii' END AS keyword3,
CASE WHEN CHARINDEX(' dr ',text) > 0 THEN 'dr' END AS keyword4
FROM have
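If the keywords should be returned in the order they appear in the text rather than in fixed columns, the same CHARINDEX() test can be driven by a keyword list. The following is only a sketch: the table variable @have and the inline VALUES list stand in for the real table and keyword source and are assumptions, not part of the original answer.

```sql
-- Sample data from the question, held in a table variable for illustration
DECLARE @have TABLE (id int, [text] varchar(225));
INSERT INTO @have (id, [text]) VALUES
    (1, 'monday jr due date mr ii final'),
    (2, 'happy new year mr J');

-- One row per keyword found, ordered by where it occurs in the text.
-- A keyword matches only when it has a space on both sides.
SELECT h.id,
       k.keyword,
       CHARINDEX(' ' + k.keyword + ' ', h.[text]) AS pos
FROM @have AS h
JOIN (VALUES ('mr'), ('jr'), ('dr'), ('ii')) AS k(keyword)
  ON CHARINDEX(' ' + k.keyword + ' ', h.[text]) > 0
ORDER BY h.id, pos;
```

For the sample rows this lists jr, mr, ii for id 1 and mr for id 2. CHARINDEX() reports only the first occurrence, so a keyword that appears twice in one text is returned once.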

First grab a copy of Ngrams8K.
Next you can do this:
SELECT
h.Id,
h.[text],
ng.Token,
Keyword = ROW_NUMBER() OVER (PARTITION BY h.Id ORDER BY ng.Position)
FROM dbo.have AS h
CROSS APPLY dbo.NGrams8k(h.[text], 4) AS ng
WHERE ng.token IN (' mr ' , ' jr ', ' dr ', ' ii ');
Returns:
Id text Token Keyword
---- -------------------------------- ----------------------------
1 monday jr due date mr ii final jr 1
1 monday jr due date mr ii final mr 2
1 monday jr due date mr ii final ii 3
2 happy new year mr J mr 1
A simple modification:
SELECT
f.Id,
f.[Text],
Keyword1 = MAX(CASE f.Keyword WHEN 1 THEN f.Token ELSE '' END),
Keyword2 = MAX(CASE f.Keyword WHEN 2 THEN f.Token ELSE '' END),
Keyword3 = MAX(CASE f.Keyword WHEN 3 THEN f.Token ELSE '' END),
Keyword4 = MAX(CASE f.Keyword WHEN 4 THEN f.Token ELSE '' END)
FROM
(
SELECT h.Id, h.[text], ng.Token, Keyword =
ROW_NUMBER() OVER (PARTITION BY h.Id ORDER BY ng.Position)
FROM dbo.have AS h
CROSS APPLY dbo.NGrams8k(h.[text], 4) AS ng
WHERE ng.token IN (' mr ' , ' jr ', ' dr ', ' ii ')
) AS f
GROUP BY f.Id, f.[Text]
ORDER BY f.Id;
Returns:
Id Text Keyword1 Keyword2 Keyword3 Keyword4
---- ------------------------------- -------------- ------------ ------------- ------------
1 monday jr due date mr ii final jr mr ii
2 happy new year mr J mr

Related

Count grouped values

Again I need some help.
I have a table (for the sake of simplicity) with 3 fields.
code id letter
1 2016 Pablo A
2 2017 Pablo B
3 2016 Ana B
4 2017 Pablo A
5 2018 Ana A
6 2018 Ana A
I need a query that results in
code id letterA letterB
1 2016 Pablo 1 Null
2 2017 Pablo 1 1
3 2016 Ana Null 1
4 2018 Ana 2 Null
As you can see, I count the records per id, grouped by code. If the letters for an id fall under different codes, a new record appears, but if both letters occur under the same code, there is just one record.
I tried with UNION, but what I got was two records (with the same code) with different letters.
Thanks guys,
Edit one:
The query with union
select code, id, count(id), 'letter A' letter
from table
where letter = 'A'
union
select code, id, count(id), 'letter B' letter
from table
where letter = 'B'
I got something like this
code id count(id) letter
1 2016 Pablo 1 A
2 2017 Pablo 1 A
3 2017 Pablo 1 B
4 2016 Ana 1 B
5 2018 Ana 2 A
The problem is that I get two rows for code 2017 with id Pablo, and I would like to have just one.
You almost got it. You only need another GROUP BY to get the result that you wanted.
Using PIVOT
select *
from tbl t
pivot
(
count(letter)
for letter in ([A], [B])
) p
order by id desc, code
Using Union All
select code, id, A = sum(A), B = sum(B)
from
(
select code, id, A = count(*), B = null
from tbl t
where letter = 'A'
group by code, id
union all
select code, id, A = null, B = count(*)
from tbl t
where letter = 'B'
group by code, id
) d
group by code, id
order by id desc, code
You can do this by executing a dynamic sql query rather than giving values explicitly.
Query
declare @sql as varchar(max);
select @sql = 'select [code], [id], ' + stuff((
select distinct ', sum(case [letter] when ' + char(39) + [letter] + char(39)
+ ' then 1 else 0 end) as [letter' + [letter] + '] '
from [dbo].[your_table_name]
for xml path('')
)
, 1, 2, ''
);
select @sql += ' from [dbo].[your_table_name] group by [code], [id] order by [id];';
exec(@sql);
Another approach is to use CASE expressions in the SELECT list and group the rows.
select code,
id,
SUM(CASE WHEN letter= 'A' THEN 1 ELSE 0 END) AS 'letter A' ,
SUM(CASE WHEN letter= 'B' THEN 1 ELSE 0 END) AS 'letter B'
from table
group by code, id
Note: If there are no letters, then it returns 0 instead of NULL.
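If NULL is preferred over 0, as in the output the question asks for, one option is to wrap each sum in NULLIF(). This is a minimal sketch using a table variable @t as a stand-in for the real table; the variable name and the column types are assumptions.

```sql
-- Sample data from the question (code, id, letter)
DECLARE @t TABLE (code int, id varchar(20), letter char(1));
INSERT INTO @t (code, id, letter) VALUES
    (2016, 'Pablo', 'A'),
    (2017, 'Pablo', 'B'),
    (2016, 'Ana',   'B'),
    (2017, 'Pablo', 'A'),
    (2018, 'Ana',   'A'),
    (2018, 'Ana',   'A');

-- NULLIF(x, 0) turns a zero count into NULL and leaves other counts unchanged
SELECT code,
       id,
       NULLIF(SUM(CASE WHEN letter = 'A' THEN 1 ELSE 0 END), 0) AS letterA,
       NULLIF(SUM(CASE WHEN letter = 'B' THEN 1 ELSE 0 END), 0) AS letterB
FROM @t
GROUP BY code, id
ORDER BY id DESC, code;
```

With the sample rows this gives one row per (code, id) pair: 2016/Pablo has 1 and NULL, 2017/Pablo has 1 and 1, 2016/Ana has NULL and 1, and 2018/Ana has 2 and NULL.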

Retrieve initials from a SQL Server Table

I've been cleaning up a SQL table and splitting its data. I now need to split the initials from the last name. The only problem is that the initials are separated by spaces. For example (data from my table):
Hanse J S P > J S P are the initials
Gerson B D V > B D V are the initials
J D Timberland > J D are the initials
So basically, there are up to four initials, which can be at the beginning, middle, or end of the string. I'm at a loss as to how to get these into a separate column, where the result would be:
COL A | COL B
J S P | Jansen
B D V | Gerson
J D | Timberland
Can anyone please point me in the right direction? I'm using SQL Server.
Here's a rather hamfisted way of doing it by abusing the Parsename function. The big caveat here is that Parsename is limited to 4 tokens so J S P Jansen will work but J S P C Jansen or John J S P Jansen will not.
With parsedname AS
(
SELECT
PARSENAME(replace(name, ' ', '.'), 1) name1,
PARSENAME(replace(name, ' ', '.'), 2) name2,
PARSENAME(replace(name, ' ', '.'), 3) name3,
PARSENAME(replace(name, ' ', '.'), 4) name4
FROM yourtable
)
SELECT
CASE WHEN LEN(name4) = 1 THEN name4 ELSE '' END +
CASE WHEN LEN(name3) = 1 THEN name3 ELSE '' END +
CASE WHEN LEN(name2) = 1 THEN name2 ELSE '' END +
CASE WHEN LEN(name1) = 1 THEN name1 ELSE '' END as initials,
CASE WHEN LEN(name1) > 1 THEN name1
WHEN LEN(name2) > 1 THEN name2
WHEN LEN(name3) > 1 THEN name3
WHEN LEN(name4) > 1 THEN name4
END as surname
FROM parsedname
Here is a sqlfiddle of this in action
CREATE TABLE NAMES (name varchar(50));
INSERT INTO NAMES VALUES ('J S P Jansen');
INSERT INTO NAMES VALUES ('B D V Gerson');
INSERT INTO NAMES VALUES ('J D Timberland');
With parsedname AS
(
SELECT
PARSENAME(replace(name, ' ', '.'), 1) name1,
PARSENAME(replace(name, ' ', '.'), 2) name2,
PARSENAME(replace(name, ' ', '.'), 3) name3,
PARSENAME(replace(name, ' ', '.'), 4) name4
FROM names
)
SELECT
CASE WHEN LEN(name4) = 1 THEN name4 ELSE '' END +
CASE WHEN LEN(name3) = 1 THEN name3 ELSE '' END +
CASE WHEN LEN(name2) = 1 THEN name2 ELSE '' END +
CASE WHEN LEN(name1) = 1 THEN name1 ELSE '' END as initials,
CASE WHEN LEN(name1) > 1 THEN name1
WHEN LEN(name2) > 1 THEN name2
WHEN LEN(name3) > 1 THEN name3
WHEN LEN(name4) > 1 THEN name4
END as surname
FROM parsedname
+----------+------------+
| initials | surname |
+----------+------------+
| JSP | Jansen |
| BDV | Gerson |
| JD | Timberland |
+----------+------------+
If a space is needed between those letters, you can change that CASE expression to something like the following (TRIM requires SQL Server 2017 or later; on older versions use LTRIM(RTRIM(...)) instead):
TRIM(CASE WHEN LEN(name4) = 1 THEN name4 + ' ' ELSE '' END +
CASE WHEN LEN(name3) = 1 THEN name3 + ' ' ELSE '' END +
CASE WHEN LEN(name2) = 1 THEN name2 + ' ' ELSE '' END +
CASE WHEN LEN(name1) = 1 THEN name1 + ' ' ELSE '' END) as initials
SQLFiddle with the spaces
+----------+------------+
| initials | surname |
+----------+------------+
| J S P | Jansen |
| B D V | Gerson |
| J D | Timberland |
+----------+------------+
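The four-token caveat of PARSENAME can be checked directly. This small sketch uses the literal names mentioned in the answer and needs no table:

```sql
-- PARSENAME treats its input as a dotted object name with at most 4 parts.
-- With 5 parts the input is not a valid name and every call returns NULL.
SELECT PARSENAME(REPLACE('J S P Jansen',      ' ', '.'), 1) AS four_tokens,  -- Jansen
       PARSENAME(REPLACE('J S P C Jansen',    ' ', '.'), 1) AS five_tokens,  -- NULL
       PARSENAME(REPLACE('John J S P Jansen', ' ', '.'), 1) AS also_five;    -- NULL
```

Any row with more than four space-separated parts therefore ends up with NULL in both the initials and surname columns of the query above, so it may be worth checking the token count before relying on this approach.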
This one uses CHARINDEX and a recursive CTE to extract space-delimited substrings from the name:
- Find the substring before the first space
- Feed the remaining substring to the same CTE
Once you have the substrings, it is only a matter of gluing them back together:
WITH yourdata(FullName) AS (
SELECT 'Hanse J S P' UNION
SELECT 'Gerson B D V' UNION
SELECT 'J D Timberland' UNION
SELECT 'TEST 1 TEST 2 TEST 3'
), cte AS (
SELECT
FullName,
CASE WHEN Pos1 = 0 THEN FullName ELSE SUBSTRING(FullName, 1, Pos1 - 1) END AS LeftPart,
CASE WHEN Pos1 = 0 THEN Null ELSE SUBSTRING(FullName, Pos1 + 1, Pos2 - Pos1) END AS NextPart,
1 AS PartSort
FROM yourdata
CROSS APPLY (SELECT CHARINDEX(' ', FullName) AS Pos1, LEN(FullName) AS Pos2) AS CA
UNION ALL
SELECT
FullName,
CASE WHEN Pos1 = 0 THEN NextPart ELSE SUBSTRING(NextPart, 1, Pos1 - 1) END,
CASE WHEN Pos1 = 0 THEN Null ELSE SUBSTRING(NextPart, Pos1 + 1, Pos2 - Pos1) END,
PartSort + 1
FROM cte
CROSS APPLY (SELECT CHARINDEX(' ', NextPart) AS Pos1, LEN(NextPart) AS Pos2) AS CA
WHERE NextPart IS NOT NULL
)
SELECT yourdata.FullName, STUFF(CA1.XMLStr, 1, 1, '') AS Initials, STUFF(CA2.XMLStr, 1, 1, '') AS Names
FROM yourdata
CROSS APPLY (
SELECT CONCAT(' ', LeftPart)
FROM cte
WHERE FullName = yourdata.FullName AND LEN(LeftPart) = 1
ORDER BY PartSort
FOR XML PATH('')
) AS CA1(XMLStr)
CROSS APPLY (
SELECT CONCAT(' ', LeftPart)
FROM cte
WHERE FullName = yourdata.FullName AND LEN(LeftPart) > 1
ORDER BY PartSort
FOR XML PATH('')
) AS CA2(XMLStr)
Result:
| FullName | Initials | Names |
|----------------------|----------|----------------|
| Gerson#B#D#V | B D V | Gerson |
| Hanse#J#S#P | J S P | Hanse |
| J#D#Timberland | J D | Timberland |
| TEST#1#TEST#2#TEST#3 | 1 2 3 | TEST TEST TEST |
Similar to JNevil's answer (+1), but not limited to 4 tokens.
Example
Declare @YourTable table (SomeCol varchar(50))
Insert Into @YourTable values
('Hanse J S P')
,('Gerson B D V')
,('J D Timberland')
,('J D Timberland / J R R Tolkien')
Select A.SomeCol
,ColA = ltrim(
concat(IIF(len(Pos1)=1,' '+Pos1,null)
,IIF(len(Pos2)=1,' '+Pos2,null)
,IIF(len(Pos3)=1,' '+Pos3,null)
,IIF(len(Pos4)=1,' '+Pos4,null)
,IIF(len(Pos5)=1,' '+Pos5,null)
,IIF(len(Pos6)=1,' '+Pos6,null)
,IIF(len(Pos7)=1,' '+Pos7,null)
,IIF(len(Pos8)=1,' '+Pos8,null)
,IIF(len(Pos9)=1,' '+Pos9,null)
)
)
,ColB = ltrim(
concat(IIF(Pos1 not Like '[a-z]',' '+Pos1,null)
,IIF(Pos2 not Like '[a-z]',' '+Pos2,null)
,IIF(Pos3 not Like '[a-z]',' '+Pos3,null)
,IIF(Pos4 not Like '[a-z]',' '+Pos4,null)
,IIF(Pos5 not Like '[a-z]',' '+Pos5,null)
,IIF(Pos6 not Like '[a-z]',' '+Pos6,null)
,IIF(Pos7 not Like '[a-z]',' '+Pos7,null)
,IIF(Pos8 not Like '[a-z]',' '+Pos8,null)
,IIF(Pos9 not Like '[a-z]',' '+Pos9,null)
)
)
From @YourTable A
Cross Apply (
Select Pos1 = xDim.value('/x[1]','varchar(max)')
,Pos2 = xDim.value('/x[2]','varchar(max)')
,Pos3 = xDim.value('/x[3]','varchar(max)')
,Pos4 = xDim.value('/x[4]','varchar(max)')
,Pos5 = xDim.value('/x[5]','varchar(max)')
,Pos6 = xDim.value('/x[6]','varchar(max)')
,Pos7 = xDim.value('/x[7]','varchar(max)')
,Pos8 = xDim.value('/x[8]','varchar(max)')
,Pos9 = xDim.value('/x[9]','varchar(max)')
From (Select Cast('<x>' + replace(SomeCol,' ','</x><x>')+'</x>' as xml) as xDim) as A
) B
Returns
SomeCol ColA ColB
Hanse J S P J S P Hanse
Gerson B D V B D V Gerson
J D Timberland J D Timberland
J D Timberland / J R R Tolkien J D / J R R Timberland / Tolkien
I used some built-in functions for this. The general idea is to use string_split to split the string into rows, use ROW_NUMBER to save the order according to length and the char(s) position in the string, then use FOR XML PATH() to concatenate from rows to a single column.
--Assume your data structure
DECLARE @temp TABLE (thestring varchar(1000))
INSERT INTO @temp VALUES
('Hanse J S P'), ('Gerson B D V'), ('J D Timberland')
;WITH CTE AS
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY thestring ORDER BY thestring, LEN(value) ASC, pos ASC) [order]
FROM (
SELECT *
, value AS [theval]
, CHARINDEX(CASE WHEN len(value) = 1 THEN ' ' + value ELSE value END, thestring) AS [pos]
FROM @temp CROSS APPLY string_split(thestring, ' ')
) AS dT
)
SELECT ( SELECT value + ' ' AS [text()]
FROM cte
WHERE cte.thestring = T.thestring
AND LEN(theval) = 1
FOR XML PATH('')
) AS [COL A]
,( SELECT value + ' ' AS [text()]
FROM cte
WHERE cte.thestring = T.thestring
AND LEN(theval) > 1
FOR XML PATH('')
) AS [COL B]
FROM @temp T
GROUP BY thestring
Produces output:
COL A COL B
----- -----
B D V Gerson
J S P Hanse
J D Timberland
Which version of SQL Server do you have? Is STRING_SPLIT() available?
If yes, split using the space as a delimiter, iterate through the resulting strings, evaluate their length and concatenate a result string with the string when said string is one character in length and is a letter.
Add a space before unless the result string is so far empty.
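The steps just described can be sketched in a set-based form. This is only an illustration under stated assumptions: it relies on the third enable_ordinal argument of STRING_SPLIT (SQL Server 2022 or Azure SQL Database) to keep the pieces in their original order, on STRING_AGG (SQL Server 2017 or later), and on a table variable @names that stands in for the real table.

```sql
-- Sample names from the question
DECLARE @names TABLE (FullName varchar(50));
INSERT INTO @names (FullName) VALUES
    ('Hanse J S P'), ('Gerson B D V'), ('J D Timberland');

-- Split on spaces, then glue the one-letter pieces and the longer pieces
-- back together separately. STRING_AGG skips NULLs, so each CASE acts as a filter.
SELECT n.FullName,
       STRING_AGG(CASE WHEN s.value LIKE '[A-Za-z]' THEN s.value END, ' ')
           WITHIN GROUP (ORDER BY s.ordinal) AS Initials,
       STRING_AGG(CASE WHEN LEN(s.value) > 1 THEN s.value END, ' ')
           WITHIN GROUP (ORDER BY s.ordinal) AS Surname
FROM @names AS n
CROSS APPLY STRING_SPLIT(n.FullName, ' ', 1) AS s
GROUP BY n.FullName;
```

For 'Hanse J S P' this yields Initials 'J S P' and Surname 'Hanse'. Because the query groups by FullName, duplicate names in the source collapse into one row; grouping by a key column would avoid that.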
If STRING_SPLIT() is not available... Well... Here are a few solutions:
T-SQL split string based on delimiter
-- Addendum
To your second part of the question (which did not originally exist when I originally posted my reply) where you would like to isolate the non-initials part into a second column, I would basically separate two blocks of logic with two result strings based on the length of each element.
Note: this is not going to be very elegant in pre-2016 SQL Server and may even require a CURSOR (sigh)
I know I am going to be downvoted for mentioning a cursor.

What is the SQL code for aggregating values?

I have the following table:
GR WORD NO.
1 A 4
2 B 5
3 C 6
1 G 5
2 H 5
3 I 5
I would like to get the following table:
GR 4 5 6
1 1 1 0
2 0 2 0
3 0 1 1
For each GR column value I count the NO. values.
Here's a dynamic solution:
--Sample data
--CREATE TABLE tbl (GR int, WORD char(1), [NO] int)
--INSERT INTO tbl values
--(1,'A',4),
--(2,'B',5),
--(3,'C',6),
--(1,'G',5),
--(2,'H',5),
--(3,'I',5)
DECLARE @sql NVARCHAR(MAX)
SELECT @sql = '
SELECT *
FROM tbl
PIVOT(
COUNT(WORD) FOR [NO] IN (' +
(SELECT STUFF(
(
SELECT DISTINCT ',' + QUOTENAME(CAST([NO] AS VARCHAR(10)))
FROM tbl
FOR XML PATH('')
)
, 1, 1, ''))
+ ')
) p
'
EXEC sp_executesql @sql
This is a conditional aggregation
select
GR
,[4] = count(case when [NO.] = 4 then WORD end)
,[5] = count(case when [NO.] = 5 then WORD end)
,[6] = count(case when [NO.] = 6 then WORD end)
from YourTable
group by GR
Or a pivot
select *
from YourTable
pivot(
count(WORD) for [NO.] in ([4],[5],[6])
) p

How to create columns from a list of values

I have the following data:
Table 1
Row ID Value Cost
1 1 Priority 1 10,000
2 2 Priority 2 9,000
3 3 Priority 3 8,000
4 4 Priority 4 6,000
Table 2
Row Name Priority Cost
1 Jon 1 10,000
2 Bob 3 8,000
3 Dan 4 7,000
4 Steve 2 9,000
5 Bill 3 8,000
...
I want the table to look like this:
Table 3
Row  Name   Priority 1  Priority 2  Priority 3  Priority 4
1    Jon    10,000
2    Bob                            8,000
3    Dan                                        7,000
4    Steve              9,000
5    Bill                           8,000
...
How can I turn the rows of Table 1 into columns and fill in the output as shown in Table 3?
I am hoping this is not as basic as it sounds, but my SQL is terrible!
You can try this dynamic pivot:
DECLARE @columns VARCHAR(8000)
SELECT @columns = COALESCE(@columns + ',[' + cast(Value as varchar) + ']',
'[' + cast(Value as varchar)+ ']')
FROM Table1
GROUP BY Value
DECLARE @query VARCHAR(8000)
SET @query = 'with Priorites as
(select a.Name,b.Value,b.Cost from Table2 a left join Table1 b on a.Priority =b.id)
SELECT *
FROM Priorites
PIVOT
(
MAX(Cost)
FOR [Value]
IN (' + @columns + ')
)
AS p'
EXECUTE(@query)
Here is the link for more details http://www.tsqltutorials.com/pivot.php
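A variation on the same idea, offered as a sketch rather than a drop-in replacement: build the column list with QUOTENAME() and STRING_AGG() (SQL Server 2017 or later) so that unusual values cannot break the generated statement. The temp tables #Table1 and #Table2 and their column types are assumptions made so the example is self-contained. It also takes Cost from Table 2, since the desired output shows Dan with 7,000.

```sql
-- Sample data from the question, in temp tables so the dynamic SQL can see them
CREATE TABLE #Table1 (ID int, [Value] varchar(20), Cost int);
CREATE TABLE #Table2 ([Name] varchar(20), Priority int, Cost int);
INSERT INTO #Table1 VALUES
    (1, 'Priority 1', 10000), (2, 'Priority 2', 9000),
    (3, 'Priority 3', 8000),  (4, 'Priority 4', 6000);
INSERT INTO #Table2 VALUES
    ('Jon', 1, 10000), ('Bob', 3, 8000), ('Dan', 4, 7000),
    ('Steve', 2, 9000), ('Bill', 3, 8000);

DECLARE @columns nvarchar(max), @query nvarchar(max);

-- QUOTENAME wraps each value in brackets and escapes any ] inside it
SELECT @columns = STRING_AGG(CAST(QUOTENAME([Value]) AS nvarchar(max)), N',')
                      WITHIN GROUP (ORDER BY ID)
FROM #Table1;

SET @query = N'
SELECT [Name], ' + @columns + N'
FROM (
    SELECT t2.[Name], t1.[Value], t2.Cost
    FROM #Table2 AS t2
    JOIN #Table1 AS t1 ON t1.ID = t2.Priority
) AS src
PIVOT (MAX(Cost) FOR [Value] IN (' + @columns + N')) AS p;';

EXEC sys.sp_executesql @query;

DROP TABLE #Table1, #Table2;
```

Each name gets one row, with its cost under the matching priority column and NULL elsewhere.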
PIVOT is always useful in this sort of scenario, but if the actual data is as simple as in the question (for example, there are only 4 distinct priorities and each user has only one priority), you can achieve this with the following query:
-- One row per person; the cost appears only under that person's priority
-- and the other priority columns are NULL
select t.[Row], t.Name,
case when t.Priority = 1 then t.Cost end as Priority1,
case when t.Priority = 2 then t.Cost end as Priority2,
case when t.Priority = 3 then t.Cost end as Priority3,
case when t.Priority = 4 then t.Cost end as Priority4
from Table2 t;

Combine Multiple records per Foreign Key

I have a somewhat tricky table structure that was inherited from a legacy system.
I have a table with about 4 columns that matter.
DayNight Cust_Code Name Phone Counter
D ABC0111 Marty aaaaa 1
D ABC0111 John bbbbb 2
D ABC0111 Beth ccccc 3
N ABC0111 Sue ddddd 1
N ABC0111 Mary eeeee 2
I need to combine these 5 records into one row with the following structure.
CustCode, Day1, Day2, Day3, Night1, Night2, Night3
ABC0111, Marty aaaaa, John bbbbb, Beth ccccc, Sue ddddd , Mary eeeee, null or ''
What I have tried
SELECT DISTINCT
x.NAME,
x.DAYNIGHT,
x.PHONE,
x.COUNTER,
cp.NAME,
cp.DAYNIGHT,
cp.COUNTER,
cp.PHONE,
cp.POSITION
FROM (
SELECT *
from table1 where
table1.DAYNIGHT LIKE 'N'
) x
join table1 t1 on t1.CUST_CODE = x.CUST_CODE
where cp.DAYNIGHT LIKE 'D'
I would be inclined to do this using conditional aggregation:
select Cust_Code,
max(case when DayNight = 'D' and Counter = 1 then Name + ' ' + Phone end) as Day1,
max(case when DayNight = 'D' and Counter = 2 then Name + ' ' + Phone end) as Day2,
max(case when DayNight = 'D' and Counter = 3 then Name + ' ' + Phone end) as Day3,
max(case when DayNight = 'N' and Counter = 1 then Name + ' ' + Phone end) as Night1,
max(case when DayNight = 'N' and Counter = 2 then Name + ' ' + Phone end) as Night2,
max(case when DayNight = 'N' and Counter = 3 then Name + ' ' + Phone end) as Night3
from table1
group by Cust_Code;
You can also use PIVOT:
SELECT *
FROM (SELECT cust_code,
NAME + ' ' + phone AS pp,
CASE WHEN daynight ='D' THEN 'Day' ELSE 'Night' END + CONVERT(VARCHAR(30), counter) AS rr
FROM tablename)a
PIVOT (Max(pp)
FOR rr IN([Day1],
[Day2],
[Day3],
[Night1],
[Night2],
[Night3])) pv