I am analysing data in a 'RawDataDescriptions' table with a 'description' field that was open-ended for users to fill in.
I'm looking for ways to broadly categorise the descriptions by phrases or strings of characters that show up frequently (including a count of how many times they occur).
I have no specific words or phrases to look for with a 'like' statement; instead I'm looking for commonalities between the fields.
Whilst looking through other questions, I managed to find a query, which I adjusted for my own table, that returns the most common words (pasted below), but of course one word alone provides little, if any, insight into the descriptions.
Is it possible to write a query that provides a count of phrases and not just single words? If so, what would its main components be?
WITH E1(N) AS
(
SELECT 1
FROM (VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
) t(N)
),
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),
E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b)
SELECT
x.Item,
COUNT(*)
FROM RawDataDescriptions p
CROSS APPLY (
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = LTRIM(RTRIM(SUBSTRING(p.[Description], l.N1, l.L1)))
FROM (
SELECT s.N1,
L1 = ISNULL(NULLIF(CHARINDEX(' ',p.[Description],s.N1),0)-
s.N1,4000)
FROM(
SELECT 1 UNION ALL
SELECT t.N+1
FROM(
SELECT TOP (ISNULL(DATALENGTH(p.[Description])/2,0))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E4
) t(N)
WHERE SUBSTRING(p.[Description] ,t.N,1) = ' '
) s(N1)
) l(N1, L1)
) x
WHERE x.item <> ''
GROUP BY x.Item
ORDER BY COUNT(*) DESC
*Edit - not doable. Alternative desired outcome:
Sample table
Id | Description
---+--------------------------
01 | Customer didn't like it
02 | Person liked it
03 | Person didn't like it
04 | Client didn't like it
05 | person liked it
@Parameter = 3
Desired result :
string | count
-----------------+-------
didn't like it | 3
Person liked it | 2
edit 2** the original question was doable - see answer
Here is one option. I have several concerns, such as punctuation, control characters, and especially performance on large tables.
Example
Declare @RawDataDescriptions Table ([Id] varchar(50),[Description] varchar(50))
Insert Into @RawDataDescriptions Values
('01','Customer didn''t like it')
,('02','Person liked it')
,('03','Person didn''t like it')
,('04','Client didn''t like it')
,('05','person liked it')
;with cte as (
Select Id
,B.*
From @RawDataDescriptions A
Cross Apply (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace((Select replace(A.[Description],' ','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) B
)
Select Phrase
,Cnt = count(*)
From cte A
Cross Apply (
Select Phrase = stuff((Select ' '+RetVal
From cte
Where ID = A.ID
and RetSeq between A.RetSeq and A.RetSeq+2
Order By RetSeq
For XML Path('')),1,1,'')
) B
Where Phrase like '% % %'
Group By Phrase
Having count(*)>1
Order By 2 Desc
Returns
Phrase Cnt
didn't like it 3
Person liked it 2
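For comparison, here is a minimal sketch of the same three-word phrase count without the XML splitter. It assumes SQL Server 2022 or Azure SQL, where STRING_SPLIT accepts a third enable_ordinal argument, and it assumes Id is unique per description. The CTE name words and the columns w1, w2, w3 are made up for illustration.

```sql
-- Assumption: SQL Server 2022+ / Azure SQL (STRING_SPLIT with enable_ordinal = 1).
DECLARE @RawDataDescriptions TABLE ([Id] varchar(50), [Description] varchar(50));
INSERT INTO @RawDataDescriptions VALUES
 ('01','Customer didn''t like it')
,('02','Person liked it')
,('03','Person didn''t like it')
,('04','Client didn''t like it')
,('05','person liked it');

WITH words AS (
    -- One row per word, plus the next two words of the same description
    SELECT d.Id,
           s.ordinal,
           w1 = s.value,
           w2 = LEAD(s.value, 1) OVER (PARTITION BY d.Id ORDER BY s.ordinal),
           w3 = LEAD(s.value, 2) OVER (PARTITION BY d.Id ORDER BY s.ordinal)
    FROM @RawDataDescriptions d
    CROSS APPLY STRING_SPLIT(d.[Description], ' ', 1) s
    WHERE s.value <> ''            -- skip empty tokens caused by double spaces
)
SELECT Phrase = CONCAT(w1, ' ', w2, ' ', w3),
       Cnt    = COUNT(*)
FROM words
WHERE w3 IS NOT NULL               -- keep only complete three-word windows
GROUP BY CONCAT(w1, ' ', w2, ' ', w3)
HAVING COUNT(*) > 1
ORDER BY Cnt DESC;
```

Whether 'Person liked it' and 'person liked it' are counted together depends on the collation; under a case-insensitive collation they fall into the same group.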
UPDATE - TVF - Better Performance
I decided to turn this into a Table-Valued Function, and was shocked by the performance gains. For example, I have 130,000 descriptions from FRED (Federal Reserve Economic Data), and I was able to generate a list of common phrases (n words) in 9 seconds.
Usage
Select Phrase = B.RetVal
,Cnt = count(*)
From YourTable A
Cross Apply [dbo].[tvf-Str-Parse-Phrase](A.YourColumn,' ',4) B
Group By B.RetVal
Having count(*)>1
Order By 2 Desc
The TVF if Interested
CREATE FUNCTION [dbo].[tvf-Str-Parse-Phrase] (@String varchar(max),@Delimeter varchar(25),@WordCnt int)
Returns Table
As
Return (
with cte as (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace((Select replace(@String,@Delimeter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
)
Select RetSeq = Row_Number() over (Order By (Select Null))
,B.RetVal
From cte A
Cross Apply (Select RetVal = stuff((Select ' '+RetVal From cte Where RetSeq between A.RetSeq and A.RetSeq+@WordCnt-1 For XML Path('')),1,1,'') ) B
Where B.RetVal like Replicate('% ',@WordCnt-1)+'%'
);
--Select * from [dbo].[tvf-Str-Parse-Phrase]('This is some text that I want parsed',' ',4)
You can enable a Microsoft Full-Text Index on the table and use it for this kind of frequent-word analysis. These dynamic management functions expose the indexed keywords:
sys.dm_fts_index_keywords_by_document
sys.dm_fts_index_keywords_by_property
Could someone please explain how to split comma-separated values into separate rows in SQL Server, based on the scenario below?
I have a table like this
Column 1 | Column 2
----------+--------------
abc | 12345
bcd | 13455,45678
sdf | 78934,13345
I want the result to be in the following way
Column1 | Column2
--------+----------
abc | 12345
bcd | 13455
bcd | 45678
sdf | 78934
sdf | 13345
Jason's answer would be my first choice (1+)
However, in case you can't use (or don't want) a TVF, here is an in-line approach.
Example
Select A.[Column 1]
,[Column 2] = B.RetVal
From YourTable A
Cross Apply (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B2.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace([Column 2],',','</x><x>')+'</x>' as xml).query('.')) B1
Cross Apply x.nodes('x') AS B2(i)
) B
Returns
Column 1 Column 2
abc 12345
bcd 13455
bcd 45678
sdf 78934
sdf 13345
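If the server allows it, the built-in STRING_SPLIT function is another option. The following is only a sketch, assuming SQL Server 2016 or later with database compatibility level 130 or higher; @YourTable is a stand-in for the real table.

```sql
-- Assumption: SQL Server 2016+ (compatibility level 130+), where STRING_SPLIT exists.
DECLARE @YourTable TABLE ([Column 1] varchar(10), [Column 2] varchar(100));
INSERT INTO @YourTable VALUES
 ('abc','12345')
,('bcd','13455,45678')
,('sdf','78934,13345');

SELECT A.[Column 1]
      ,[Column 2] = LTRIM(RTRIM(S.value))   -- trim stray spaces around each element
FROM @YourTable A
CROSS APPLY STRING_SPLIT(A.[Column 2], ',') S;
```

Note that STRING_SPLIT only accepts a single-character separator and, without the enable_ordinal argument added in SQL Server 2022, does not guarantee the order of the returned elements.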
Start with a good tally based string splitting function like Jeff Moden's DelimitedSplit8K
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
GO
Now your problem becomes a very simple matter...
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
DROP TABLE #temp;
CREATE TABLE #temp (
Column1 CHAR(3),
Column2 VARCHAR(8000)
);
INSERT #temp (Column1,Column2) VALUES
('abc', '12345'),
('bcd', '13455,45678'),
('sdf', '78934,13345');
-- the actual query...
SELECT
t.Column1,
dsk.Item
FROM
#temp t
CROSS APPLY dbo.DelimitedSplit8K(t.Column2, ',') dsk;
Results...
Column1 Column2
------- --------
abc 12345
bcd 13455
bcd 45678
sdf 78934
sdf 13345
EDIT: The above makes the assumption that Column2 can have any number of elements in the CSV string. If the maximum number of elements is two, you can skip the splitter function and use something like the following...
SELECT
t.Column1,
v.Column2
FROM
#temp t
CROSS APPLY ( VALUES (NULLIF(CHARINDEX(',', t.Column2, 1), 0)) ) s (Split)
CROSS APPLY ( VALUES (1, LEFT(t.Column2, s.Split - 1)), (2, substring(t.Column2, ISNULL(s.Split, 0) + 1, 8000)) ) v (rn, Column2)
WHERE
v.Column2 IS NOT NULL
ORDER BY
v.rn;
You can use a Tally table based splitter like below:
select
column1,
split_values
from
(
select
t.column1,
SUBSTRING( t.column2, t1.N, ISNULL(NULLIF(CHARINDEX(',',t.column2,t1.N),0)-t1.N,8000)) as split_values
from #t t
join
(
select
t.column2,
1 as N
from #t t
UNION ALL
select
t.column2,
t1.N + 1 as N
from #t t
join
(
select
top 8000
row_number() over(order by (select NULL)) as N
from
sys.objects s1
cross join
sys.objects s2
) t1
on SUBSTRING(t.column2,t1.N,1) = ','
) t1
on t1.column2=t.column2
)a
order by column1
See live demo
The scenario is that we have two lists:
A: 23,45,g5,33
B: 11,12,45,g9
We want the fastest mechanism in SQL SERVER to see if any of the values from B is available in A, in this example 45 is in A so it must return true.
The solution should describe the way to store the lists (CSV, tables etc.) and the comparison mechanism.
Each list is relatively small (average 10 values in each) but the comparison is being made many many times (very few writes, many many reads)
If you are stuck with the delimited string, consider the following:
Example:
Declare @YourTable Table ([ColA] varchar(50),[ColB] varchar(50))
Insert Into @YourTable Values
('23,45,g5,33' ,'11,12,45,g9')
,('no,match' ,'found,here')
Select *
from @YourTable A
Cross Apply (
Select Match=IsNull(sum(1),0)
From [dbo].[udf-Str-Parse-8K](ColA,',') B1
Join [dbo].[udf-Str-Parse-8K](ColB,',') B2 on B1.RetVal=B2.RetVal
) B
Returns
ColA ColB Match
23,45,g5,33 11,12,45,g9 1
no,match found,here 0
The UDF if Interested
CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(25))
Returns Table
As
Return (
with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By A.N)
,RetVal = LTrim(RTrim(Substring(@String, A.N, A.L)))
From cte4 A
);
--Original Source http://www.sqlservercentral.com/articles/Tally+Table/72993/
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')
I'm still confused as to the core idea... but here is a simple solution that's better than a comma separated list. Creating indexes would make it faster, of course. It's far quicker than looping.
declare @table table (id char(4), v varchar(256))
insert into @table
values
('A','23'),
('A','45'),
('A','g5'),
('A','33'),
('B','11'),
('B','12'),
('B','45'),
('B','g9')
select distinct
base.v
--,base.*
--,compare.*
from
@table base
inner join
@table compare
on compare.v = base.v
and compare.id <> base.id
Split Way
declare @table table (id char(4), v varchar(256))
insert into @table
values
('A','23,45,g5,33'),
('B','11,12,45,g9')
;with cte as(
select
t.ID
,base.Item
from
@table t
cross apply dbo.DelimitedSplit8K(t.v,',') base)
select
t.Item
from
cte t
inner join
cte x on
x.Item = t.Item
and x.id <> t.id
where
t.id = 'A'
USING THIS FUNCTION
CREATE FUNCTION [dbo].[DelimitedSplit8K] (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
/* "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
enough to cover VARCHAR(8000)*/
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
GO
Based on the previous answer, I think it should look like this:
declare @table table (id char(4), v varchar(256))
insert into @table
values
('A','23'),
('A','45'),
('A','g5'),
('A','33'),
('B','11'),
('B','12'),
('B','45'),
('B','g9')
-- select 1 rather than count(1): an aggregate without GROUP BY always returns one row,
-- so EXISTS(select count(1) ...) would be true even when nothing matches
if exists( select 1
from
@table base
inner join
@table compare
on compare.v = base.v
and base.id='A' and compare.id='B')
print 'true'
else
print 'false'
Index on (id, v) or (v, id), depending on how your data grows.
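As a rough sketch of that indexing advice, assuming the lists are stored one value per row in a permanent or temp table (the name #ListValues is made up here), a clustered key on (v, id) lets the EXISTS check seek on the value:

```sql
-- Hypothetical table: one row per list value, keyed so lookups by value are index seeks.
IF OBJECT_ID('tempdb..#ListValues', 'U') IS NOT NULL
    DROP TABLE #ListValues;

CREATE TABLE #ListValues (
    id char(4)      NOT NULL,   -- list name, e.g. 'A' or 'B'
    v  varchar(256) NOT NULL,   -- one list element
    PRIMARY KEY CLUSTERED (v, id)
);

INSERT INTO #ListValues (id, v) VALUES
 ('A','23'),('A','45'),('A','g5'),('A','33')
,('B','11'),('B','12'),('B','45'),('B','g9');

-- EXISTS stops at the first shared value instead of counting all of them
IF EXISTS (
    SELECT 1
    FROM #ListValues a
    JOIN #ListValues b ON b.v = a.v
    WHERE a.id = 'A' AND b.id = 'B'
)
    PRINT 'true'
ELSE
    PRINT 'false';
```

Whether (v, id) or (id, v) is the better key order depends on the data, so compare the execution plans on realistic volumes before settling on one.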
I have a simple table that looks like this when I select everything from it (one column with some rows):
| a, b, c | - 1st row
| b, d, d | - 2nd row
| d, e, f | - 3rd row
Now I'm trying to split those values by comma so that each value is in a separate row, something like:
|a| - 1st row
|b| - 2nd row
|c| - 3rd row
|d| - 4th row
|e| - 5th row
|f| - 6th row
I was trying with something like:
select id,
case when CHARINDEX(', ', [value])>0
then SUBSTRING([value] , 1, CHARINDEX(', ',[value])-1) else [value] end firstname,
CASE WHEN CHARINDEX(', ', [value])>0
THEN SUBSTRING([value],CHARINDEX(', ',[value])+1,len([value])) ELSE NULL END as lastname from table
But that doesn't give me what I need.
Without a UDF Parse/Split function
You didn't specify a Table or Column name so replace YourTable and YourList with your actual table and column names.
Select Distinct RetVal
,RowNr = Dense_Rank() over (Order by RetVal)
From YourTable A
Cross Apply (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>'+ replace((Select A.YourList as [*] For XML Path('')),',','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) B
Returns
RetVal RowNr
a 1
b 2
c 3
d 4
e 5
f 6
Using a Split/Parse function (everyone should have a good one)
Select Distinct RetVal
,RowNr = Dense_Rank() over (Order by RetVal)
From YourTable A
Cross Apply (Select * from [dbo].[udf-Str-Parse-8K](A.YourList,',') ) B
The UDF -- if interested
CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By A.N)
,RetVal = LTrim(RTrim(Substring(@String, A.N, A.L)))
From cte4 A
);
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')
A recursive cte solution which finds all the ,s in the string and gets the substring between 2 ,s. (Assuming you are on a sql server version 2012+ for lead to work)
with cte as (
select val,charindex(',',','+val+',') as location from t
union all
select val,charindex(',',','+val+',',location+1) from cte
where charindex(',',','+val+',',location+1) > 0
)
,substrings as (select *,
substring(val,location,
lead(location,1) over(partition by val order by location)-location-1) as sub
from cte)
select distinct sub
from substrings
where sub is not null and sub <> ''
order by 1;
Sample Demo
1) The first cte gets all the , locations in the string recursively. , is appended at the beginning and end of the string to avoid missing the first substring before , and the last substring after ,.
2) For each string, the location of the next , is found using lead ordered by the location of ,.
3) Finally get all those substrings which are not null and are not empty strings.
You can do this by using cross apply and XML:
select distinct
p.a.value('.','varchar(10)') col
from (
select
cast('<x>' + replace(col,', ','</x><x>') + '</x>' as XML) as x
from your_table) t
cross apply x.nodes ('/x') as p(a)
I need to concatenate the name in a recursive cross join way. I don't know how to do this, I have tried a CTE using WITH RECURSIVE but no success.
I have a table like this:
group_id | name
---------------
13 | A
13 | B
19 | C
19 | D
31 | E
31 | F
31 | G
Desired output:
combinations
------------
ACE
ACF
ACG
ADE
ADF
ADG
BCE
BCF
BCG
BDE
BDF
BDG
Of course, the results should multiply if I add a 4th (or more) group.
Native Postgresql Syntax:
SqlFiddleDemo
WITH RECURSIVE cte1 AS
(
SELECT *, DENSE_RANK() OVER (ORDER BY group_id) AS rn
FROM mytable
),cte2 AS
(
SELECT
CAST(name AS VARCHAR(4000)) AS name,
rn
FROM cte1
WHERE rn = 1
UNION ALL
SELECT
CAST(CONCAT(c2.name,c1.name) AS VARCHAR(4000)) AS name
,c1.rn
FROM cte1 c1
JOIN cte2 c2
ON c1.rn = c2.rn + 1
)
SELECT name as combinations
FROM cte2
WHERE LENGTH(name) = (SELECT MAX(rn) FROM cte1)
ORDER BY name;
Before:
I hope you don't mind that I use SQL Server syntax:
Sample:
CREATE TABLE #mytable(
ID INTEGER NOT NULL
,TYPE VARCHAR(MAX) NOT NULL
);
INSERT INTO #mytable(ID,TYPE) VALUES (13,'A');
INSERT INTO #mytable(ID,TYPE) VALUES (13,'B');
INSERT INTO #mytable(ID,TYPE) VALUES (19,'C');
INSERT INTO #mytable(ID,TYPE) VALUES (19,'D');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'E');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'F');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'G');
Main query:
WITH cte1 AS
(
SELECT *, rn = DENSE_RANK() OVER (ORDER BY ID)
FROM #mytable
),cte2 AS
(
SELECT
TYPE = CAST(TYPE AS VARCHAR(MAX)),
rn
FROM cte1
WHERE rn = 1
UNION ALL
SELECT
[Type] = CAST(CONCAT(c2.TYPE,c1.TYPE) AS VARCHAR(MAX))
,c1.rn
FROM cte1 c1
JOIN cte2 c2
ON c1.rn = c2.rn + 1
)
SELECT *
FROM cte2
WHERE LEN(Type) = (SELECT MAX(rn) FROM cte1)
ORDER BY Type;
LiveDemo
I've assumed that the order of the "cross join" depends on ascending ID.
cte1 generates DENSE_RANK() because your IDs contain gaps.
cte2 is the recursive part, using CONCAT.
The main query just filters for the required length and sorts the strings.
The recursive query is a bit simpler in Postgres:
WITH RECURSIVE t AS ( -- to produce gapless group numbers
SELECT dense_rank() OVER (ORDER BY group_id) AS grp, name
FROM tbl
)
, cte AS (
SELECT grp, name
FROM t
WHERE grp = 1
UNION ALL
SELECT t.grp, c.name || t.name
FROM cte c
JOIN t ON t.grp = c.grp + 1
)
SELECT name AS combi
FROM cte
WHERE grp = (SELECT max(grp) FROM t)
ORDER BY 1;
The basic logic is the same as in the SQL Server version provided by @lad2025; I added a couple of minor improvements.
Or you can use a simple version if your maximum number of groups is not too big (can't be very big, really, since the result set grows exponentially). For a maximum of 5 groups:
WITH t AS ( -- to produce gapless group numbers
SELECT dense_rank() OVER (ORDER BY group_id) AS grp, name AS n
FROM tbl
)
SELECT concat(t1.n, t2.n, t3.n, t4.n, t5.n) AS combi
FROM (SELECT n FROM t WHERE grp = 1) t1
LEFT JOIN (SELECT n FROM t WHERE grp = 2) t2 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 3) t3 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 4) t4 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 5) t5 ON true
ORDER BY 1;
Probably faster for few groups. LEFT JOIN .. ON true makes this work even if higher levels are missing. concat() ignores NULL values. Test with EXPLAIN ANALYZE to be sure.
SQL Fiddle showing both.
I have a table called PhoneNumbers with the columns Phone and Range, as below.
The Phone column holds a phone number, and the Range column holds the upper end of the range of numbers I need to include. For the first phone number, 9125678463, I need to include the phone numbers up to 9125678465, i.e. (9125678463, 9125678464, 9125678465), and similarly for the other phone numbers. Here is what the destination table should look like.
How can I write the SQL to get this?
Thanks in advance
I have a solution that goes the classic way, BUT it needs neither recursion nor loops! And it works even if your range has a length of 3 or 5, or whatever...
First I create a table of numbers (from 1 to 1 million in this example; you can adapt this in the TOP () clause):
SELECT TOP (1000000) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
INTO dbo.Numbers
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);
CREATE UNIQUE CLUSTERED INDEX idx_numbers ON dbo.Numbers(n)
;
Once you have that table, it's pretty simple:
;WITH phonenumbers
AS
(
SELECT phone,
[range],
CAST(RIGHT(phone,LEN([range])) AS INT) AS number_to_increase,
CAST(LEFT(phone,LEN(phone)-LEN([range])) + REPLICATE('0',LEN([range])) AS BIGINT) AS base_number
FROM PhoneNumbers
)
SELECT p.base_number + num.n
FROM phonenumbers p
INNER JOIN dbo.Numbers num ON num.n BETWEEN p.number_to_increase AND p.[range]
You don't have to use a CTE as I did here; it's only there to make the idea behind this approach a bit clearer. Maybe this works for you.
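To make the two derived columns concrete, here is a small worked example for a single row. The range value '8465' is an assumption based on the first phone number in the question, and both columns are assumed to be character types.

```sql
-- Assumed sample row: phone '9125678463' with range '8465'
SELECT p.phone,
       p.[range],
       -- last LEN(range) digits of the phone: the starting offset
       number_to_increase = CAST(RIGHT(p.phone, LEN(p.[range])) AS INT),
       -- phone prefix padded with zeros: the value the offsets are added to
       base_number = CAST(LEFT(p.phone, LEN(p.phone) - LEN(p.[range]))
                          + REPLICATE('0', LEN(p.[range])) AS BIGINT)
FROM (VALUES ('9125678463', '8465')) AS p(phone, [range]);
-- number_to_increase = 8463, base_number = 9125670000
```

Joining to dbo.Numbers on n BETWEEN 8463 AND 8465 and adding n to base_number then gives 9125678463, 9125678464 and 9125678465.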
You can use CTE like this:
;WITH CTE (PhoneNumbers, [Range], i) AS (
SELECT CAST(Phone AS bigint), [Range], CAST(1 AS bigint)
FROM yourTable
UNION ALL
SELECT CAST(PhoneNumbers + 1 AS bigint), [Range], i + 1
FROM CTE
WHERE (PhoneNumbers + 1) % 10000 <= [Range]
)
SELECT PhoneNumbers
FROM CTE
ORDER BY PhoneNumbers
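One thing to watch with the recursive approach: SQL Server stops a recursive CTE after 100 levels by default, so a range spanning more than about 100 numbers raises an error unless a MAXRECURSION hint is added. A minimal sketch with a made-up table variable @yourTable and a single sample row:

```sql
-- Hypothetical sample data; [Range] holds the last four digits of the upper bound
DECLARE @yourTable TABLE (Phone varchar(10), [Range] int);
INSERT INTO @yourTable VALUES ('9125678463', 8465);

;WITH CTE (PhoneNumbers, [Range], i) AS (
    SELECT CAST(Phone AS bigint), [Range], CAST(1 AS bigint)
    FROM @yourTable
    UNION ALL
    SELECT CAST(PhoneNumbers + 1 AS bigint), [Range], i + 1
    FROM CTE
    WHERE (PhoneNumbers + 1) % 10000 <= [Range]   -- assumes four-digit ranges
)
SELECT PhoneNumbers
FROM CTE
ORDER BY PhoneNumbers
OPTION (MAXRECURSION 0);   -- 0 removes the default limit of 100 levels
```

With the limit removed, the WHERE clause is the only thing that ends the recursion, so it is worth checking that it terminates for every row.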
Here is one example of using a tally table. In my system I have that set of ctes as a view so I never have to write it again.
if OBJECT_ID('tempdb..#PhoneNumbers') is not null
drop table #PhoneNumbers;
create table #PhoneNumbers
(
Phone char(10)
, Range smallint
)
insert #PhoneNumbers
select 9135678463, 8465 union all
select 3279275678, 5679 union all
select 6372938103, 8105;
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select *
from #PhoneNumbers p
join cteTally t on t.N >= RIGHT(Phone, 4) and t.N <= Range
order by p.Phone
One more approach:
--Creating dummy table
select '9999991234' phone, '1237' rang into #tbl
union
select '9999995689', '5692'
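SELECT CAST([phone] AS BIGINT) low
-- derive the prefix from each row's own phone number instead of a hard-coded value
,(CAST([phone] AS BIGINT)/10000 * 10000 + CAST([rang] AS INT)) high
into #tbl1
FROM #tbl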
--Creating 'numbrs' to have numbers between 0 & 9999 i.e. max range
select (rn-1)rn
into #numbrs
from
(select row_number() over (partition by null order by A.object_id) rn from sys.objects A
cross join sys.objects B)A
where rn between 0 and 9999
select (low + rn)phn from #numbrs cross join #tbl1
where (low + rn) between low and high