Use a SQL Server recursive common table expression to get the full path of all files in a folder (with subfolders)

SQL Server has an undocumented extended stored procedure called xp_dirtree, which returns the names of all files and folders (including subfolders) in table format. To practice my understanding of recursive CTEs, I decided to use it to get the full path of every file in a specified folder (including subfolders). However, after an hour of head scratching I still can't figure out the correct way to do it. The following code is what I have so far. Can this be done with a recursive CTE?
DECLARE @dir NVARCHAR(260) ;
SELECT @dir = N'c:\temp' ;
IF RIGHT(@dir, 1) <> '\'
SELECT @dir = @dir + '\' ;
IF OBJECT_ID('tempdb..#dirtree', 'U') IS NOT NULL
DROP TABLE #dirtree ;
CREATE TABLE #dirtree
(
id INT PRIMARY KEY
IDENTITY,
subdirectory NVARCHAR(260),
depth INT,
is_file BIT
) ;
INSERT INTO #dirtree
EXEC xp_dirtree
@dir,
0,
1 ;
SELECT *
FROM #dirtree ;
WITH files
AS (
SELECT id,
subdirectory,
depth,
is_file, subdirectory AS path
FROM #dirtree
WHERE is_file = 1
AND depth <> 1
UNION ALL
-- ...
)
SELECT *
FROM files ;
Suppose the xp_dirtree output is:
/*
id subdirectory depth is_file
--- -------------- ------- -------
1 abc.mdf 1 1
2 a 1 0
3 a.txt 2 1
4 b.txt 2 1
5 a.rb 1 1
6 aaa.flv 1 1
*/
What I want is:
/*
path
------------------
c:\temp\abc.mdf
c:\temp\a\a.txt
c:\temp\a\b.txt
c:\temp\a.rb
c:\temp\aaa.flv
*/

If I understand you correctly, you want something like this:
Test data:
CREATE TABLE #dirtree
(
id INT,
subdirectory NVARCHAR(260),
depth INT ,
is_file BIT,
parentId INT
)
INSERT INTO #dirtree(id,subdirectory,depth,is_file)
VALUES
(1,'abc.mdf',1,1),(2,'a',1,0),(3,'a.txt',2,1),
(4,'b.txt',2,1),(5,'a.rb',1,1),(6,'aaa.flv',1,1)
Update the parent id:
UPDATE #dirtree
SET ParentId = (SELECT MAX(Id) FROM #dirtree
WHERE Depth = T1.Depth - 1 AND Id < T1.Id)
FROM #dirtree T1
Query
;WITH CTE
AS
(
SELECT
t.id,
t.subdirectory,
t.depth,
t.is_file
FROM
#dirtree AS t
WHERE
is_file=0
UNION ALL
SELECT
t.id,
CAST(CTE.subdirectory+'\'+t.subdirectory AS NVARCHAR(260)),
t.depth,
t.is_file
FROM
#dirtree AS t
JOIN CTE
ON CTE.id=t.parentId
)
SELECT
'c:\temp\'+CTE.subdirectory AS [path]
FROM
CTE
WHERE
CTE.is_file=1
UNION ALL
SELECT
'c:\temp\'+t.subdirectory
FROM
#dirtree AS t
WHERE
is_file=1
AND NOT EXISTS
(
SELECT
NULL
FROM
CTE
WHERE
CTE.id=t.id
)
Result
path
---------------
c:\temp\a\a.txt
c:\temp\a\b.txt
c:\temp\abc.mdf
c:\temp\a.rb
c:\temp\aaa.flv
EDIT
Changed the tables used in the example to more look like the ones in your question
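The parent-id update and the recursive walk above can be tried outside SQL Server. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) instead of T-SQL; the table name dirtree, the column parent_id and the CTE name walk are illustrative, not part of the answer. It differs from the query above in one respect: it seeds the recursion from every top-level row (parent_id IS NULL) rather than from the directories, so no second UNION ALL branch is needed.

```python
import sqlite3

# Sample rows copied from the question: (id, subdirectory, depth, is_file)
rows = [
    (1, "abc.mdf", 1, 1),
    (2, "a", 1, 0),
    (3, "a.txt", 2, 1),
    (4, "b.txt", 2, 1),
    (5, "a.rb", 1, 1),
    (6, "aaa.flv", 1, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dirtree (id INTEGER, subdirectory TEXT, depth INTEGER, "
    "is_file INTEGER, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO dirtree (id, subdirectory, depth, is_file) VALUES (?, ?, ?, ?)",
    rows,
)

# Parent = the closest earlier row that sits exactly one level higher.
conn.execute(
    """
    UPDATE dirtree
    SET parent_id = (SELECT MAX(p.id) FROM dirtree AS p
                     WHERE p.depth = dirtree.depth - 1 AND p.id < dirtree.id)
    """
)

# Walk down from the top-level rows, appending one path segment per level.
paths = [
    r[0]
    for r in conn.execute(
        """
        WITH RECURSIVE walk(id, is_file, path) AS (
            SELECT id, is_file, subdirectory FROM dirtree WHERE parent_id IS NULL
            UNION ALL
            SELECT t.id, t.is_file, walk.path || '\\' || t.subdirectory
            FROM dirtree AS t JOIN walk ON walk.id = t.parent_id
        )
        SELECT 'c:\\temp\\' || path FROM walk WHERE is_file = 1 ORDER BY id
        """
    )
]

for p in paths:
    print(p)
```

Ordering by id returns the files in the same order as the xp_dirtree listing.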

/*
traverse directory tree and get back complete list of filenames w/ their paths
*/
declare
@dirRoot varchar(255)='\\yourdir'
declare
@sqlCmd varchar(255),
@idx int,
@dirSearch varchar(255)
declare @directories table(directoryName varchar(255), depth int, isfile int, rootName varchar(255),rowid int identity(1,1))
insert into @directories(directoryName, depth,isFile)
exec master.sys.xp_dirtree @dirRoot,1,1
if not exists(select * from @directories)
return
update @directories
set rootName = @dirRoot + '\' + directoryName
-- traverse from root directory
select @idx=min(rowId) from @directories
-- forever always ends too soon
while 1=1
begin
select @dirSearch = rootName
from @directories
where rowid=@idx
insert into @directories(directoryName, depth,isfile)
exec master.sys.xp_dirtree @dirSearch,1,1
update @directories
set rootName = @dirSearch + '\' + directoryName
where rootName is null
set @idx = @idx + 1
-- you see what i mean don't you?
if @idx > (select max(rowid) from @directories) or @idx is null
break
end
select
case isFile when 0 then 'Directory' else 'File' end [attribute],
rootName [filePath]
from @directories
order by filePath

Almost nine years later, there is unfortunately still no out-of-the-box solution that I know of, so I'm still looking into xp_dirtree and need a solution to this.
I gave Arion's answer a try and found that it produced results. However, on a large file system of more than 11K objects it ran very slowly. It was slow right from the start, at this statement:
UPDATE #dirtree
SET ParentId = (SELECT MAX(Id) FROM #dirtree
WHERE Depth = T1.Depth - 1 AND Id < T1.Id)
FROM #dirtree T1
Although this isn't a gaps-and-islands problem, it has some similarities, and the kind of thinking used for those problems helped me here. The code at the end is my stored procedure; comments in each section explain what the code is doing.
You would use it like this:
exec directory
@root = 'c:\somepath',
@depth = 3,
@outputTable = '##results';
select * from ##results;
Which results in output like:
+---------------------------------+------------+------------+-----------+--------+-----------+----------+
| path | name | nameNoExt | extension | isFile | runtimeId | parentId |
+---------------------------------+------------+------------+-----------+--------+-----------+----------+
| c:\somePath\DataMovers | DataMovers | DataMovers | NULL | 0 | 4854 | NULL |
| c:\somePath\DataMovers\main.ps1 | main.ps1 | main | ps1 | 1 | 4859 | 4854 |
+---------------------------------+------------+------------+-----------+--------+-----------+----------+
I had to build it this way because internally it takes the xp_dirtree output and loads it into a temp table. Because nested INSERT ... EXEC statements are not allowed, a caller cannot capture the proc's results with its own INSERT ... EXEC, so the proc writes them to a global temp table instead. Don't expose @outputTable to untrusted users, because it is susceptible to SQL injection. Of course, rework the proc to avoid this in whatever way meets your needs.
/*
Summary: Lists file directory contents.
Remarks: - stackoverflow.com/q/10298910
- This assumes that the tree is put in order where
subfolders are listed right under their parent
folders. If this changes in the future, a
different logic will need to be implemented.
Example: exec directory 'c:\somepath', 3, '##results';
select * from ##results;
*/
create procedure directory
@root nvarchar(255),
@depth int,
@outputTable sysname
as
-- initializations
if @outputTable is null or not (left(@outputTable,2) = '##') or charindex(' ', @outputTable) > 0
throw 50000, '@outputTable must be a global temp table with no spaces in the name.', 1;
if exists (select 0 from tempdb.information_schema.tables where table_name = @outputTable)
begin
declare @msg nvarchar(255) = '''tempdb.dbo.' + @outputTable + ''' already exists.';
throw 50000, @msg, 1;
end
-- fetch the tree (it doesn't have full path names)
drop table if exists #dir;
create table #dir (
id int identity(1,1),
parentId int null,
path nvarchar(4000),
depth int,
isFile bit,
isLeader int default(0),
groupId int
)
insert #dir (path, depth, isFile)
exec xp_dirtree @root, @depth, 1;
-- identify the group leaders (based on a change in depth)
update d
set isLeader = _isLeader
from (
select id,
isLeader,
_isLeader = iif(depth - lag(depth) over(order by id) = 0, 0, 1)
from #dir
) d;
-- find the parents for each leader (subsetting just for leaders improves efficiency)
update #dir
set parentId = (
select max(sub.id)
from #dir sub
where sub.depth = d.depth - 1
and sub.id < d.id
and d.isLeader = 1
)
from #dir d
where d.isLeader = 1;
-- assign an identifier to each group (groups being objects that are 'siblings' of the leader)
update d
set groupId = _groupId
from (
select *, _groupId = sum(isLeader) over(order by id)
from #dir
) d;
-- set the parent id for each item based on the leader's parent id
update d
set d.parentId = leads.parentId
from #dir d
join #dir leads
on d.groupId = leads.groupId
and leads.parentId is not null;
-- convert the path names to full path names and calculate path parts
drop table if exists #pathBuilderResults;
with pathBuilder as (
select id, parentId, origId = id, path, pseudoDepth = depth
from #dir
union all
select par.id,
par.parentId,
pb.origId,
path = par.path + '\' + pb.path,
pseudoDepth = pb.pseudoDepth - 1
from pathBuilder pb
join #dir par on pb.parentId = par.id
where pb.pseudoDepth >= 2
)
select path = @root + '\' + pb.path,
name = d.path,
nameNoExt = iif(ext.value is null, d.path, left(d.path, len(d.path) - len(ext.value) - 1)),
extension = ext.value,
d.isFile,
runtimeId = pb.origId,
parentId = d.parentId
into #pathBuilderResults
from pathBuilder pb
join #dir d on pb.origId = d.id
cross apply (select value = charindex('.', d.path)) dotPos
cross apply (select value = right(d.path, len(d.path) - dotPos.value)) pseudoExt
cross apply (select value = iif(d.isFile = 1 and dotPos.value > 0, pseudoExt.value, null)) ext
where pb.pseudoDepth = 1
order by pb.origId;
-- terminate
declare @sql nvarchar(max) = 'select * into ' + @outputTable + ' from #pathBuilderResults';
exec (@sql);
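The parent lookup is where the time goes, and it rests on the assumption stated in the procedure's remarks: children are listed right under their parent. Under that same assumption, the idea can be sketched outside the database as a single pass that remembers the most recent entry seen at each depth. This is an illustrative Python sketch, not a translation of the procedure; the function names assign_parents and full_paths are made up for the example.

```python
def assign_parents(entries):
    """Return, for each (name, depth) entry in listing order, the index of its parent.

    Assumes children are listed after their parent (the xp_dirtree ordering the
    procedure also relies on). Top-level entries get None.
    """
    parents = []
    last_at_depth = {}  # depth -> index of the most recent entry at that depth
    for index, (_name, depth) in enumerate(entries):
        parents.append(last_at_depth.get(depth - 1))
        last_at_depth[depth] = index
    return parents


def full_paths(root, entries):
    """Build the full path of every entry by prefixing its parent's full path."""
    parents = assign_parents(entries)
    paths = []
    for index, (name, _depth) in enumerate(entries):
        parent = parents[index]
        # A parent always appears earlier, so its path is already computed.
        base = root if parent is None else paths[parent]
        paths.append(base + "\\" + name)
    return paths


# Sample listing from the original question: (name, depth)
entries = [
    ("abc.mdf", 1),
    ("a", 1),
    ("a.txt", 2),
    ("b.txt", 2),
    ("a.rb", 1),
    ("aaa.flv", 1),
]
result = full_paths("c:\\temp", entries)
for path in result:
    print(path)
```

Each entry is visited once, which is the same effect the leader and group updates aim for: avoiding one correlated subquery per row.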

Create the procedure below, then call it like this: sp_dirtree @Path = 'c:\', @FileOnly = 1
create or alter proc sp_dirtree
@Path nvarchar(4000)
, @Depth int = 0
, @FileOnly bit = 0
as -- Dir tree with fullpath. sergkog 2018-11-14
set nocount on
declare @Sep nchar(1) = iif(patindex('%/%',@Path) > 0,'/','\') -- windows or posix
set @Path += iif(right(@Path,1) <> @Sep, @Sep,'')
declare @dirtree table(
Id int identity(1,1)
, subdirectory nvarchar(4000) not null
, depth int not null
, is_file bit not null
, parentId int null
)
insert @dirtree(subdirectory, depth, is_file)
exec xp_dirtree @Path, @Depth, 1
update @dirtree
set ParentId = (select max(id) from @dirtree where Depth = t1.Depth - 1 and Id < t1.Id)
from @dirtree t1
;with cte as(
select t.*
from @dirtree t
where is_file=0
union all
select t.id
, convert(nvarchar(4000), cte.subdirectory+ @Sep + t.subdirectory)
, t.depth
, t.is_file
, t.parentId
from
@dirtree t join cte on cte.id = t.parentId
)
select @Path + cte.subdirectory as FullPath
, cte.is_file as IsFile
from cte
where cte.is_file = iif(@FileOnly = 1, 1,cte.is_file)
union all
select @Path + t.subdirectory
, t.is_file
from @dirtree t
where
t.is_file = iif(@FileOnly = 1, 1,t.is_file)
and not exists(select null from cte
where cte.id=t.id
)
order by FullPath, IsFile
go

Related

How to traverse a path in a table with id & parentId?

Suppose I have a table like:
id | parentId | name
1 NULL A
2 1 B
3 2 C
4 1 E
5 3 E
I am trying to write a scalar function I can call as:
SELECT dbo.GetId('A/B/C/E') which would produce "5" if we use the above reference table. The function would do the following steps:
Find the ID of 'A' which is 1
Find the ID of 'B' whose parent is 'A' (id:1) which would be id:2
Find the ID of 'C' whose parent is 'B' (id:2) which would be id:3
Find the ID of 'E' whose parent is 'C' (id:3) which would be id:5
I was trying to do it with a WHILE loop but it was getting very complicated very fast... Just thinking there must be a simple way to do this.
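The four lookup steps listed above can be sketched procedurally before moving to SQL. This is a minimal Python illustration, assuming that a name is unique among the children of any one parent; the function name get_id mirrors the desired dbo.GetId but is otherwise hypothetical.

```python
# Reference table from the question: (id, parentId, name)
rows = [
    (1, None, "A"),
    (2, 1, "B"),
    (3, 2, "C"),
    (4, 1, "E"),
    (5, 3, "E"),
]


def get_id(path, rows):
    """Resolve a slash-separated path to an id, one segment at a time."""
    # Index rows by (parent id, name) so each step is a single lookup.
    by_parent_and_name = {(parent_id, name): id_ for id_, parent_id, name in rows}
    current = None  # the root level has no parent
    for segment in path.split("/"):
        current = by_parent_and_name.get((current, segment))
        if current is None:
            return None  # no child with that name under the current node
    return current


print(get_id("A/B/C/E", rows))
print(get_id("A/E", rows))
```

The two rows named E are told apart only by their parent, which is why each step has to carry the id found by the previous one.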
The CTE version is not an optimized way to get hierarchical data. (Refer to the MSDN blog.)
You should do something like the code below. It has been tested on 10 million records and is 300 times faster than the CTE version :)
Declare @table table(Id int, ParentId int, Name varchar(10))
insert into @table values(1,NULL,'A')
insert into @table values(2,1,'B')
insert into @table values(3,2,'C')
insert into @table values(4,1,'E')
insert into @table values(5,3,'E')
DECLARE @Counter tinyint = 0;
IF OBJECT_ID('TEMPDB..#ITEM') IS NOT NULL
DROP TABLE #ITEM
CREATE TABLE #ITEM
(
ID int not null
,ParentID int
,Name VARCHAR(MAX)
,lvl int not null
,RootID int not null
)
INSERT INTO #ITEM
(ID,lvl,ParentID,Name,RootID)
SELECT Id
,0 AS LVL
,ParentId
,Name
,Id AS RootID
FROM
@table
WHERE
ISNULL(ParentId,-1) = -1
WHILE @@ROWCOUNT > 0
BEGIN
SET @Counter += 1
insert into #ITEM(ID,ParentId,Name,lvl,RootID)
SELECT ci.ID
,ci.ParentId
,ci.Name
,@Counter as cntr
,ch.RootID
FROM
@table AS ci
INNER JOIN
#ITEM AS pr
ON
CI.ParentId=PR.ID
LEFT OUTER JOIN
#ITEM AS ch
ON ch.ID=pr.ID
WHERE
ISNULL(ci.ParentId, -1) > 0
AND PR.lvl = @Counter - 1
END
select * from #ITEM
Here is an example of functional rcte based on your sample data and requirements as I understand them.
if OBJECT_ID('tempdb..#Something') is not null
drop table #Something
create table #Something
(
id int
, parentId int
, name char(1)
)
insert #Something
select 1, NULL, 'A' union all
select 2, 1, 'B' union all
select 3, 2, 'C' union all
select 4, 1, 'E' union all
select 5, 3, 'E'
declare @Root char(1) = 'A';
with MyData as
(
select *
from #Something
where name = @Root
union all
select s.*
from #Something s
join MyData d on d.id = s.parentId
)
select *
from MyData
Note that if you change the value of your variable the output will adjust. I would make this an inline table valued function.
I think I have it based on @SeanLange's recommendation to use a recursive CTE (above in the comments):
CREATE FUNCTION GetID
(
@path VARCHAR(MAX)
)
/* TEST:
SELECT dbo.GetID('A/B/C/E')
*/
RETURNS INT
AS
BEGIN
DECLARE @ID INT;
WITH cte AS (
SELECT p.id ,
p.parentId ,
CAST(p.name AS VARCHAR(MAX)) AS name
FROM dbo.tblT p
WHERE parentId IS NULL
UNION ALL
SELECT p.id ,
p.parentId ,
CAST(pcte.name + '/' + p.name AS VARCHAR(MAX)) AS name
FROM dbo.tblT p
INNER JOIN cte pcte ON
pcte.id = p.parentId
)
SELECT @ID = id
FROM cte
WHERE name = @path
RETURN @ID
END

SQL Server generate script for views and how to decide order?

I am generating the scripts for views using SQL Server's built-in feature (Tasks -> Generate Scripts), creating a separate file for each view. I have around 400 files (containing the SQL scripts of all views) to be executed on another database, and to do that automatically I have created a BAT file that takes care of it.
Some views depend on other views, and because of that many views fail to execute. Is there any way to set the order of execution and get rid of the failures?
Any pointers would be a great help.
Please let me know if you need more details.
Thanks
Jony
Could you try this query? You can execute the create scripts in order of "gen" (generation).
DECLARE @cnt int = 0, @index int;
DECLARE @viewNames table (number int, name varchar(max))
DECLARE @viewGen table (id uniqueidentifier, gen int, name varchar(max), parentId uniqueidentifier)
INSERT INTO @viewNames
SELECT ROW_NUMBER() OVER(ORDER BY object_Id), name FROM sys.views
SELECT @cnt = COUNT(*) FROM @viewNames
SET @index = @cnt;
WHILE ((SELECT COUNT(*) FROM @viewGen) < @cnt)
BEGIN
DECLARE @viewName varchar(200)
SELECT @viewName = name FROM @viewNames WHERE number = @index;
DECLARE @depCnt int = 0;
SELECT @depCnt = COUNT(*) FROM sys.dm_sql_referencing_entities ('dbo.' + @viewName, 'OBJECT')
IF (@depCnt = 0)
BEGIN
INSERT INTO @viewGen SELECT NEWID(), 0, name, null FROM @viewNames WHERE number = @index;
END
ELSE
BEGIN
IF EXISTS(SELECT * FROM sys.dm_sql_referencing_entities ('dbo.' + @viewName, 'OBJECT') AS r INNER JOIN @viewGen AS v ON r.referencing_entity_name = v.name)
BEGIN
DECLARE @parentId uniqueidentifier = NEWID();
INSERT INTO @viewGen SELECT @parentId, 0, name, null FROM @viewNames WHERE number = @index;
UPDATE v
SET v.gen = (v.gen + 1), parentId = @parentId
FROM @viewGen AS v
INNER JOIN sys.dm_sql_referencing_entities('dbo.' + @viewName, 'OBJECT') AS r ON r.referencing_entity_name = v.name
UPDATE @viewGen
SET gen = gen + 1
WHERE Id = parentId OR parentId IN (SELECT Id FROM @viewGen WHERE parentId = parentId)
END
END
SET @index = @index - 1
IF (@index < 0) BEGIN SET @index = @cnt; END
END
SELECT gen as [order], name FROM @viewGen ORDER BY gen
Expected result:
order name
0 vw_Ancient
1 vw_Child1
1 vw_Child2
2 vw_GrandChild
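Ordering scripts so that every view is created after the views it depends on is a topological sort. As a rough illustration outside SQL Server, the sketch below uses Python's standard graphlib module (Python 3.9 or later). The dependency map is an assumption invented to match the view names in the expected result; in practice it would have to be filled from the database's dependency metadata.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each view maps to the set of views it selects from.
depends_on = {
    "vw_Ancient": set(),
    "vw_Child1": {"vw_Ancient"},
    "vw_Child2": {"vw_Ancient"},
    "vw_GrandChild": {"vw_Child1", "vw_Child2"},
}

# static_order() yields each view only after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())

for position, view in enumerate(order):
    print(position, view)
```

Running the per-view script files in this order means no CREATE VIEW refers to a view that does not exist yet. TopologicalSorter raises CycleError if the dependencies are circular.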

Convert Comma Delimited String to bigint in SQL Server

I have a varchar string of comma-separated numbers that I want to use in my SQL script, but I need to compare it with a bigint field in the database. I need to know how to convert it:
DECLARE @RegionID varchar(200) = null
SET @RegionID = '853,834,16,467,841,460,495,44,859,457,437,836,864,434,86,838,458,472,832,433,142,154,159,839,831,469,442,275,840,299,446,220,300,225,227,447,301,450,230,837,441,835,302,477,855,411,395,279,303'
SELECT a.ClassAdID, -- 1
a.AdURL, -- 2
a.AdTitle, -- 3
a.ClassAdCatID, -- 4
b.ClassAdCat, -- 5
a.Img1, -- 6
a.AdText, -- 7
a.MemberID, -- 9
a.Viewed, -- 10
c.Domain, -- 11
a.CreateDate -- 12
FROM ClassAd a
INNER JOIN ClassAdCat b ON b.ClassAdCAtID = a.ClassAdCAtID
INNER JOIN Region c ON c.RegionID = a.RegionID
AND a.PostType = 'CPN'
AND DATEDIFF(d, GETDATE(), ExpirationDate) >= 0
AND a.RegionID IN (@RegionID)
AND Viewable = 'Y'
AND Viewable = 'Y'
This fails with the following error:
Error converting data type varchar to bigint.
RegionID in the database is a bigint field, so I need to convert the varchar to bigint. Any ideas?
Many thanks in advance,
neojakey
Create this function:
CREATE function [dbo].[f_split]
(
@param nvarchar(max),
@delimiter char(1)
)
returns @t table (val nvarchar(max), seq int)
as
begin
set @param += @delimiter
;with a as
(
select cast(1 as bigint) f, charindex(@delimiter, @param) t, 1 seq
union all
select t + 1, charindex(@delimiter, @param, t + 1), seq + 1
from a
where charindex(@delimiter, @param, t + 1) > 0
)
insert @t
select substring(@param, f, t - f), seq from a
option (maxrecursion 0)
return
end
Change this part:
AND a.RegionID IN (select val from dbo.f_split(@RegionID, ','))
Change this for better overall performance:
AND DATEDIFF(d, 0, GETDATE()) <= ExpirationDate
Your query does not know that those are separate values, you can use dynamic sql for this:
DECLARE @RegionID varchar(200) = null
SET @RegionID = '853,834,16,467,841,460,495,44,859,457,437,836,864,434,86,838,458,472,832,433,142,154,159,839,831,469,442,275,840,299,446,220,300,225,227,447,301,450,230,837,441,835,302,477,855,411,395,279,303'
declare @sql nvarchar(Max)
set @sql = 'SELECT a.ClassAdID, -- 1
a.AdURL, -- 2
a.AdTitle, -- 3
a.ClassAdCatID, -- 4
b.ClassAdCat, -- 5
a.Img1, -- 6
a.AdText, -- 7
a.MemberID, -- 9
a.Viewed, -- 10
c.Domain, -- 11
a.CreateDate -- 12
FROM ClassAd a
INNER JOIN ClassAdCat b ON b.ClassAdCAtID = a.ClassAdCAtID
INNER JOIN Region c ON c.RegionID = a.RegionID
AND a.PostType = ''CPN''
AND DATEDIFF(d, GETDATE(), ExpirationDate) >= 0
AND a.RegionID IN ('+@RegionID+')
AND Viewable = ''Y'''
exec sp_executesql @sql
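The point that the query sees one string rather than a list of values can be reproduced in any SQL engine. Here is a small sketch using SQLite through Python's sqlite3 module, with a cut-down ClassAd table and three made-up rows as assumptions. SQLite does not raise a conversion error the way SQL Server does; it simply finds no match, but the cause is the same. The second query passes each id as its own parameter instead of concatenating the list into the statement text.

```python
import sqlite3

region_ids = "853,834,16"  # shortened version of the list in the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ClassAd (ClassAdID INTEGER, RegionID INTEGER)")
conn.executemany(
    "INSERT INTO ClassAd VALUES (?, ?)",
    [(1, 853), (2, 16), (3, 999)],  # hypothetical sample rows
)

# One parameter holding the whole string: IN compares against a single value.
as_one_value = conn.execute(
    "SELECT ClassAdID FROM ClassAd WHERE RegionID IN (?)", (region_ids,)
).fetchall()

# One placeholder per id: the engine now sees separate integer values.
ids = [int(part) for part in region_ids.split(",")]
placeholders = ",".join("?" for _ in ids)
matched = conn.execute(
    f"SELECT ClassAdID FROM ClassAd WHERE RegionID IN ({placeholders}) "
    "ORDER BY ClassAdID",
    ids,
).fetchall()

print(as_one_value)
print(matched)
```

Only the placeholder list is built by string formatting; the values themselves never become part of the SQL text, which sidesteps the injection risk that comes with concatenating user input into a dynamic statement.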
I use this approach sometimes and find it very good.
It transforms your comma-separated string into an auxiliary table (called #ARRAY) and then queries the main table based on that table:
declare @RegionID varchar(200)
SET @RegionID = '853,834,16,467,841,460,495,44,859,457,437,836,864,434,86,838,458,472,832,433,142,154,159,839,831,469,442,275,840,299,446,220,300,225,227,447,301,450,230,837,441,835,302,477,855,411,395,279,303'
declare @S varchar(20)
if LEN(@RegionID) > 0 SET @RegionID = @RegionID + ','
CREATE TABLE #ARRAY(region_ID VARCHAR(20))
WHILE LEN(@RegionID) > 0 BEGIN
SELECT @S = LTRIM(SUBSTRING(@RegionID, 1, CHARINDEX(',', @RegionID) - 1))
INSERT INTO #ARRAY (region_ID) VALUES (@S)
SELECT @RegionID = SUBSTRING(@RegionID, CHARINDEX(',', @RegionID) + 1, LEN(@RegionID))
END
select * from your_table
where regionID IN (select region_ID from #ARRAY)
It saves you from having to concatenate the query string and then use EXEC to execute it, which I don't think is a very good approach.
If you need to run the code twice, you will need to drop the temp table first.
I think the answer should be kept simple.
Try using CHARINDEX like this:
DECLARE @RegionID VARCHAR(200) = NULL
SET @RegionID =
'853,834,16,467,841,460,495,44,859,457,437,836,864,434,86,838,458,472,832,433,142,154,159,839,831,469,442,275,840,299,446,220,300,225,227,447,301,450,230,837,441,835,302,477,855,411,395,279,303'
SELECT 1
WHERE Charindex('834', @RegionID) > 0
SELECT 1
WHERE Charindex('999', @RegionID) > 0
When CHARINDEX finds the value in the large string variable, it returns its position; otherwise it returns 0.
Use this as a search tool.
The easiest way to change this query is to replace the IN function with a string function. Here is what I consider the safest approach using LIKE (which is portable among databases):
AND ','+@RegionID+',' like '%,'+cast(a.RegionID as varchar(255))+',%'
Or CHARINDEX:
AND charindex(','+cast(a.RegionID as varchar(255))+',', ','+@RegionID+',') > 0
However, if you are explicitly putting the list in your code, why not use a table variable?
declare @RegionIds table (RegionId int);
insert into @RegionIds
select 853 union all
select 834 union all
. . .
select 303
Then you can use the table in the IN clause:
AND a.RegionId in (select RegionId from @RegionIds)
or in a JOIN clause.
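The reason the LIKE and CHARINDEX variants wrap both sides in commas is that a bare substring search also matches partial numbers. A tiny Python sketch of the same test makes the difference visible; the helper name in_list and the shortened list are illustrative only.

```python
def in_list(value, csv):
    """True if value is a whole element of the comma-separated string csv."""
    # Wrapping both sides in commas means only complete elements can match.
    return f",{value}," in f",{csv},"


region_ids = "853,834,16"

print(in_list(834, region_ids))   # whole element
print(in_list(83, region_ids))    # only a prefix of 834
print("83" in region_ids)         # naive substring search
```

The naive search reports a hit for 83 because it is a prefix of 834, which is exactly the false positive that the delimiter-wrapped comparison avoids.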
I like Diego's answer some, but I think my modification is a little better because you are declaring a table variable and not creating an actual table. I know the "in" statement can be a little slow, so I did an inner join since I needed some info from the Company table anyway.
declare @companyIdList varchar(1000)
set @companyIdList = '1,2,3'
if LEN(@companyIdList) > 0 SET @companyIdList = @companyIdList + ','
declare @CompanyIds TABLE (CompanyId bigint)
declare @S varchar(20)
WHILE LEN(@companyIdList) > 0 BEGIN
SELECT @S = LTRIM(SUBSTRING(@companyIdList, 1, CHARINDEX(',', @companyIdList) - 1))
INSERT INTO @CompanyIds (CompanyId) VALUES (@S)
SELECT @companyIdList = SUBSTRING(@companyIdList, CHARINDEX(',', @companyIdList) + 1, LEN(@companyIdList))
END
select d.Id, d.Name, c.Id, c.Name
from [Division] d
inner join [Company] c on d.CompanyId = c.Id
inner join @CompanyIds cids on c.Id = cids.CompanyId

What is the most efficient way to concatenate a string from all parent rows using T-SQL?

I have a table that has a self-referencing foreign key that represents its parent row. To illustrate the problem in its simplest form we'll use this table:
CREATE TABLE Folder(
id int IDENTITY(1,1) NOT NULL, --PK
parent_id int NULL, --FK
folder_name varchar(255) NOT NULL)
I want to create a scalar-valued function that would return a concatenated string of the folder's name and all its parent folder names all the way to the root folder, which would be designated by a null parent_id value.
My current solution is a procedural approach which I assume is not ideal. Here is what I'm doing:
CREATE FUNCTION dbo.GetEntireLineage
(@folderId INT)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @lineage VARCHAR(MAX)
DECLARE @parentFolderId INT
SELECT @lineage = folder_name, @parentFolderId = parent_id FROM Folder WHERE id = @folderId
WHILE NOT @parentFolderId IS NULL
BEGIN
SET @parentFolderId = (SELECT parent_id FROM Folder WHERE parent_id = @parentFolderId)
SET @lineage = (SELECT @lineage + '-' + (SELECT folder_name FROM Folder WHERE parent_id = @parentFolderId))
END
RETURN @lineage
END
Is there a better way to do this? I'm an experienced programmer, but T-SQL is not a familiar world to me, and I know these problems generally require a different approach because of the set-based nature of the data. Any help finding a solution, or any other tips and tricks for dealing with T-SQL, would be much appreciated.
To know for sure about performance you need to test. I have done some testing using your version (slightly modified) and the recursive CTE versions suggested by others.
I used your sample table with 2048 rows all in one single folder hierarchy so when passing 2048 as parameter to the function there are 2048 concatenations done.
The loop version:
create function GetEntireLineage1 (@id int)
returns varchar(max)
as
begin
declare @ret varchar(max)
select @ret = folder_name,
@id = parent_id
from Folder
where id = @id
while @@rowcount > 0
begin
select @ret = @ret + '-' + folder_name,
@id = parent_id
from Folder
where id = @id
end
return @ret
end
Statistics:
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 122 ms.
The recursive CTE version:
create function GetEntireLineage2(@id int)
returns varchar(max)
begin
declare @ret varchar(max);
with cte(id, name) as
(
select f.parent_id,
cast(f.folder_name as varchar(max))
from Folder as f
where f.id = @id
union all
select f.parent_id,
c.name + '-' + f.folder_name
from Folder as f
inner join cte as c
on f.id = c.id
)
select @ret = name
from cte
where id is null
option (maxrecursion 0)
return @ret
end
Statistics:
SQL Server Execution Times:
CPU time = 187 ms, elapsed time = 183 ms.
So between these two it is the loop version that is more efficient, at least on my test data. You need to test on your actual data to be sure.
Edit
Recursive CTE with for xml path('') trick.
create function [dbo].[GetEntireLineage4](@id int)
returns varchar(max)
begin
declare @ret varchar(max) = '';
with cte(id, lvl, name) as
(
select f.parent_id,
1,
f.folder_name
from Folder as f
where f.id = @id
union all
select f.parent_id,
lvl + 1,
f.folder_name
from Folder as f
inner join cte as c
on f.id = c.id
)
select @ret = (select '-'+name
from cte
order by lvl
for xml path(''), type).value('.', 'varchar(max)')
option (maxrecursion 0)
return stuff(@ret, 1, 1, '')
end
Statistics:
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 37 ms.
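For readers who want to experiment with the recursive walk without a SQL Server instance, here is a rough equivalent of the GetEntireLineage2 logic in SQLite, driven from Python's sqlite3 module. The three sample folders and the Python function name get_entire_lineage are assumptions made for the example; timings from this sketch say nothing about the SQL Server measurements above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Folder (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "folder_name TEXT NOT NULL)"
)
# Hypothetical sample hierarchy: root > docs > reports
conn.executemany(
    "INSERT INTO Folder VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "docs"), (3, 2, "reports")],
)


def get_entire_lineage(conn, folder_id):
    """Concatenate a folder's name with all ancestor names, child first."""
    row = conn.execute(
        """
        WITH RECURSIVE cte(next_id, name) AS (
            -- anchor: the starting folder, remembering which parent to visit next
            SELECT parent_id, folder_name FROM Folder WHERE id = ?
            UNION ALL
            -- step: append the parent's name and move one level up
            SELECT f.parent_id, cte.name || '-' || f.folder_name
            FROM Folder AS f JOIN cte ON f.id = cte.next_id
        )
        -- the finished string is on the row that reached the root
        SELECT name FROM cte WHERE next_id IS NULL
        """,
        (folder_id,),
    ).fetchone()
    return row[0] if row else None


print(get_entire_lineage(conn, 3))
print(get_entire_lineage(conn, 1))
```

As in the T-SQL version, the row whose parent reference is NULL carries the complete string, so the outer query keeps only that row.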
use a recursive query to traverse the parents and then this method for concatenating into a string.
A hierarchyid is often overkill unless you have a really deep hierarchy or very large sets of data that can take advantage of the indexing. This is as fast as you can get without changing your schema.
with recursiveCTE (parent_id,concatenated_name) as (
select parent_id,cast(folder_name as varchar(max))
from folder
union all
select f.parent_id,r.concatenated_name +f.folder_name
from folder f
inner join recursiveCTE r on r.parent_id = f.id
)
select concatenated_name from recursiveCTE
This should work for you:
with cte (Parent_id, Path) as
(
select Parent_Id,cast(Folder_Name as varchar(max))
from folder
union all
select f.Parent_Id,c.Path + '\' + f.Folder_Name
from Folder as f
inner join cte as c on c.Parent_Id = f.Id
)
select Path from cte

SQL query to find Missing sequence numbers

I have a column named sequence. The data in this column looks like 1, 2, 3, 4, 5, 7, 9, 10, 15.
I need to find the missing sequence numbers from the table. What SQL query will find the missing sequence numbers from my table? I am expecting results like
Missing numbers
---------------
6
8
11
12
13
14
I am using only one table. I tried the query below, but am not getting the results I want.
select de.sequence + 1 as sequence from dataentry as de
left outer join dataentry as de1 on de.sequence + 1 = de1.sequence
where de1.sequence is null order by sequence asc;
How about something like:
select (select isnull(max(val)+1,1) from mydata where val < md.val) as [from],
md.val - 1 as [to]
from mydata md
where md.val != 1 and not exists (
select 1 from mydata md2 where md2.val = md.val - 1)
giving summarised results:
from to
----------- -----------
6 6
8 8
11 14
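The from/to logic of that query is easy to check by hand with a short script. The sketch below is a Python restatement under the same assumption the query makes, namely that the sequence is expected to start at 1; the function name missing_ranges is made up for the example.

```python
def missing_ranges(values):
    """Return (from, to) pairs covering every gap below the largest value."""
    ranges = []
    previous = 0  # the sequence is assumed to start at 1
    for value in sorted(set(values)):
        if value - previous > 1:
            # everything strictly between previous and value is missing
            ranges.append((previous + 1, value - 1))
        previous = value
    return ranges


sequence = [1, 2, 3, 4, 5, 7, 9, 10, 15]
for start, end in missing_ranges(sequence):
    print(start, end)
```

Reporting ranges instead of individual numbers keeps the output small when a gap is wide, such as 11 to 14 here.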
I know this is a very old post but I wanted to add this solution that I found HERE so that I can find it easier:
WITH Missing (missnum, maxid)
AS
(
SELECT 1 AS missnum, (select max(id) from #TT)
UNION ALL
SELECT missnum + 1, maxid FROM Missing
WHERE missnum < maxid
)
SELECT missnum
FROM Missing
LEFT OUTER JOIN #TT tt on tt.id = Missing.missnum
WHERE tt.id is NULL
OPTION (MAXRECURSION 0);
Try with this:
declare @min int
declare @max int
select @min = min(seq_field), @max = max(seq_field) from [Table]
create table #tmp (Field_No int)
while @min <= @max
begin
if not exists (select * from [Table] where seq_field = @min)
insert into #tmp (Field_No) values (@min)
set @min = @min + 1
end
select * from #tmp
drop table #tmp
The best solutions are those that use a temporary table with the sequence. Assuming you build such a table, LEFT JOIN with NULL check should do the job:
SELECT #sequence.value
FROM #sequence
LEFT JOIN MyTable ON #sequence.value = MyTable.value
WHERE MyTable.value IS NULL
But if you have to repeat this operation often (and for more than one sequence in the database), I would create a "static-data" table and have a script populate it up to the MAX(value) of all the tables you need.
SELECT CASE WHEN MAX(column_name) = COUNT(*)
THEN CAST(NULL AS INTEGER)
-- THEN MAX(column_name) + 1 as other option
WHEN MIN(column_name) > 1
THEN 1
WHEN MAX(column_name) <> COUNT(*)
THEN (SELECT MIN(column_name)+1
FROM table_name
WHERE (column_name+ 1)
NOT IN (SELECT column_name FROM table_name))
ELSE NULL END
FROM table_name;
Here is a script to create a stored procedure that returns missing sequential numbers for a given date range.
CREATE PROCEDURE dbo.ddc_RolledBackOrders
-- Add the parameters for the stored procedure here
@StartDate DATETIME ,
@EndDate DATETIME
AS
BEGIN
SET NOCOUNT ON;
DECLARE @Min BIGINT
DECLARE @Max BIGINT
DECLARE @i BIGINT
IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
BEGIN
DROP TABLE #TempTable
END
CREATE TABLE #TempTable
(
TempOrderNumber BIGINT
)
SELECT @Min = ( SELECT MIN(ordernumber)
FROM dbo.Orders WITH ( NOLOCK )
WHERE OrderDate BETWEEN @StartDate AND @EndDate)
SELECT @Max = ( SELECT MAX(ordernumber)
FROM dbo.Orders WITH ( NOLOCK )
WHERE OrderDate BETWEEN @StartDate AND @EndDate)
SELECT @i = @Min
WHILE @i <= @Max
BEGIN
INSERT INTO #TempTable
SELECT @i
SELECT @i = @i + 1
END
SELECT TempOrderNumber
FROM #TempTable
LEFT JOIN dbo.orders o WITH ( NOLOCK ) ON tempordernumber = o.OrderNumber
WHERE o.OrderNumber IS NULL
END
GO
Aren't all the given solutions way too complex?
Wouldn't this be much simpler:
SELECT *
FROM (SELECT row_number() over(order by number) as N from master..spt_values) t
where N not in (select 1 as sequence union
select 2 union
select 3 union
select 4 union
select 5 union
select 7 union
select 10 union
select 15
)
This is my interpretation of this issue, placing the contents in a Table variable that I can easily access in the remainder of my script.
DECLARE @IDS TABLE (row int, ID int)
INSERT INTO @IDS
select ROW_NUMBER() OVER (ORDER BY x.[Referred_ID]), x.[Referred_ID] FROM
(SELECT b.[Referred_ID] + 1 [Referred_ID]
FROM [catalog].[dbo].[Referrals] b) as x
LEFT JOIN [catalog].[dbo].[Referrals] a ON x.[Referred_ID] = a.[Referred_ID]
WHERE a.[Referred_ID] IS NULL
select * from @IDS
Just for fun, I decided to post my solution.
I had an identity column in my table and I wanted to find missing invoice numbers.
I reviewed all the examples I could find but they were not elegant enough.
CREATE VIEW EENSkippedInvoicveNo
AS
SELECT CASE WHEN MSCNT = 1 THEN CAST(MSFIRST AS VARCHAR (8)) ELSE
CAST(MSFIRST AS VARCHAR (8)) + ' - ' + CAST(MSlAST AS VARCHAR (8)) END AS MISSING,
MSCNT, INV_DT FROM (
select invNo+1 as Msfirst, inv_no -1 as Mslast, inv_no - invno -1 as msCnt, dbo.fmtdt(Inv_dt) AS INV_dT
from (select inv_no as invNo, a4glidentity + 1 as a4glid
from oehdrhst_sql where inv_dt > 20140401) as s
inner Join oehdrhst_sql as h
on a4glid = a4glidentity
where inv_no - invno <> 1
) AS SS
DECLARE @MaxID INT = (SELECT MAX(timerecordid) FROM dbo.TimeRecord)
SELECT SeqID AS MissingSeqID
FROM (SELECT ROW_NUMBER() OVER (ORDER BY column_id) SeqID from sys.columns) LkUp
LEFT JOIN dbo.TimeRecord t ON t.timeRecordId = LkUp.SeqID
WHERE t.timeRecordId is null and SeqID < @MaxID
I found this answer here:
http://sql-developers.blogspot.com/2012/10/how-to-find-missing-identitysequence.html
I was looking for a solution and found many answers. This is the one I used and it worked very well. I hope this helps anyone looking for a similar answer.
-- This will return better Results
-- ----------------------------------
;With CTERange
As (
select (select isnull(max(ArchiveID)+1,1) from tblArchives where ArchiveID < md.ArchiveID) as [from],
md.ArchiveID - 1 as [to]
from tblArchives md
where md.ArchiveID != 1 and not exists (
select 1 from tblArchives md2 where md2.ArchiveID = md.ArchiveID - 1)
) SELECT [from], [to], ([to]-[from])+1 [total missing]
From CTERange
ORDER BY ([to]-[from])+1 DESC;
from to total missing
------- ------- --------------
6 6 1
8 8 1
11 14 4
DECLARE @TempSujith TABLE
(MissingId int)
Declare @Id Int
DECLARE @mycur CURSOR
SET @mycur = CURSOR FOR Select Id From tbl_Table
OPEN @mycur
FETCH NEXT FROM @mycur INTO @Id
Declare @index int
Set @index = 1
WHILE @@FETCH_STATUS = 0
BEGIN
if (@index < @Id)
begin
while @index < @Id
begin
insert into @TempSujith values (@index)
set @index = @index + 1
end
end
set @index = @index + 1
FETCH NEXT FROM @mycur INTO @Id
END
Select Id from tbl_Table
select MissingId from @TempSujith
Create a useful Tally table:
-- can go up to 4 million or 2^22
select top 100000 identity(int, 1, 1) Id
into Tally
from master..spt_values a
cross join master..spt_values b
Index it, or make that single column the primary key.
Then use EXCEPT to get your missing number.
select Id from Tally where Id <= (select max(Id) from TestTable)
except
select Id from TestTable
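The Tally-plus-EXCEPT idea is a set difference between "every number up to the maximum" and "the numbers actually present". A minimal Python sketch of the same operation, with the function name missing_numbers chosen for illustration:

```python
def missing_numbers(values):
    """Numbers from 1 up to max(values) that do not appear in values."""
    present = set(values)
    if not present:
        return []
    # range() plays the role of the Tally table; the subtraction is the EXCEPT.
    return sorted(set(range(1, max(present) + 1)) - present)


sequence = [1, 2, 3, 4, 5, 7, 9, 10, 15]
print(missing_numbers(sequence))
```

As with the Tally table, the full range has to be materialized, so the cost grows with the largest value rather than with the number of rows.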
You could also solve using something like a CTE to generate the full sequence:
create table #tmp(sequence int)
insert into #tmp(sequence) values (1)
insert into #tmp(sequence) values (2)
insert into #tmp(sequence) values (3)
insert into #tmp(sequence) values (5)
insert into #tmp(sequence) values (6)
insert into #tmp(sequence) values (8)
insert into #tmp(sequence) values (10)
insert into #tmp(sequence) values (11)
insert into #tmp(sequence) values (14)
DECLARE @max INT
SELECT @max = max(sequence) from #tmp;
with full_sequence
(
Sequence
)
as
(
SELECT 1 Sequence
UNION ALL
SELECT Sequence + 1
FROM full_sequence
WHERE Sequence < @max
)
SELECT
full_sequence.sequence
FROM
full_sequence
LEFT JOIN
#tmp
ON
full_sequence.sequence = #tmp.sequence
WHERE
#tmp.sequence IS NULL
I made a proc that takes the table name and the key column, and returns a list of the missing numbers from the given table:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create PROCEDURE [dbo].[action_FindMissing_Autoincremnt]
(
@tblname as nvarchar(50),
@tblKey as nvarchar(50)
)
AS
BEGIN
SET NOCOUNT ON;
declare @qry nvarchar(4000)
set @qry = 'declare @min int '
set @qry = @qry + 'declare @max int '
set @qry = @qry +'select @min = min(' + @tblKey + ')'
set @qry = @qry + ', @max = max('+ @tblKey +') '
set @qry = @qry + ' from '+ @tblname
set @qry = @qry + ' create table #tmp (Field_No int)
while @min <= @max
begin
if not exists (select * from '+ @tblname +' where '+ @tblKey +' = @min)
insert into #tmp (Field_No) values (@min)
set @min = @min + 1
end
select * from #tmp order by Field_No
drop table #tmp '
exec sp_executesql @qry
END
GO
SELECT TOP 1 (Id + 1)
FROM CustomerNumberGenerator
WHERE (Id + 1) NOT IN ( SELECT Id FROM CustomerNumberGenerator )
ORDER BY Id
I was working on a customer number generator for my company. This is not the most efficient solution, but it is definitely the most readable.
The table has one Id column.
The table allows Ids to be inserted manually by a user, out of sequence.
The solution handles the case where the user decided to pick a high number.
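The behaviour described, reusing the lowest gap even when someone has manually inserted a much higher number, can be checked with a few lines of Python. This is a sketch of the same "Id + 1 is not taken" test, taking the smallest candidate; the function name next_free_id and the sample ids are illustrative.

```python
def next_free_id(ids):
    """Smallest id + 1 that is not already taken, or None for an empty table."""
    taken = set(ids)
    candidates = [i + 1 for i in taken if i + 1 not in taken]
    return min(candidates) if candidates else None


# A user manually inserted 50, far ahead of the running sequence.
print(next_free_id([1, 2, 3, 50]))
```

Like the query, this never proposes a number below the smallest existing id, so a table whose ids start at 5 would yield 6 rather than 1.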