Recursively get nested URLs from database - sql

I have a Database table structured with nested URLs, using ParentID and ID to tell which piece of an URL belongs where.
Table structure looks like this:
+-----+----------+------------+-------------+
| ID | ParentID | Name | Url |
+-----+----------+------------+-------------+
| 1 | 0 | Categories | categories |
| 34 | 1 | Movies | movies |
| 281 | 34 | Star Wars | star-wars |
| 33 | 1 | Books | a-good-book |
+-----+----------+------------+-------------+
What I want to do is that I want to be able to recursively go through all of the fields, and according to the ParentID, save all the possible url combinations.
So, from the table above, I'd like to get the following output:
mysite.com/categories
mysite.com/categories/movies
mysite.com/categories/movies/star-wars
mysite.com/categories/books
mysite.com/categories/books/a-good-book
I've started writing a CTE, looking like this:
WITH CategoriesCTE AS
(
SELECT
Name,
Url,
ParentID,
ID
FROM myDB
WHERE ParentID = 1
UNION ALL
SELECT
a.Name,
a.Url,
a.ParentID,
a.ID
FROM myDB.a
INNER JOIN CategoriesCTE s on a.ParentID = s.ID
)
SELECT * FROM CategoriesCTE
Thing is, this database call saves everything flat. What I would have to do, is that for EACH step, save all urls, and then for each ID, save the url according to what the ParentID is. Right now it of course isn't formatted but my output is flatly something like:
mysite.com/categories
mysite.com/movies
mysite.com/star-wars
mysite.com/a-good-book
Which creates a lot of broken links.
Is there some way to do an action/select for each recursive step? How should I be approaching this problem?

Add a few of new fields to your recursive CTE to track:
Depth of recursion (so you can find the record with the greatest depth
The path which will be built through each iteration by concatenating the latest value to it.
The starting point of the recursion so you know what record you started with
WITH CategoriesCTE AS
(
SELECT Name, Url, ParentID, ID, 1 as depth, CAST(url as VARCHAR(500)) as path, url as startingpoint
FROM myDB
WHERE ParentID = 1
UNION ALL
SELECT a.Name, a.Url, a.ParentID, a.ID, s.depth + 1, a.url + s.path, s.url
FROM myDB.a
INNER JOIN CategoriesCTE s on a.ParentID = s.ID
)
SELECT * FROM CategoriesCTE

See what you think of this...
IF OBJECT_ID('tempdb..#SomeTable', 'U') IS NOT NULL
DROP TABLE #SomeTable;
CREATE TABLE #SomeTable (
ID INT NOT NULL,
ParentID INT NOT NULL,
FolderName VARCHAR(20) NOT NULL,
UrlPath VARCHAR(8000) NULL
);
INSERT #SomeTable (ID, ParentID, FolderName) VALUES
(1 , 0 , 'categories'),
(34 , 1 , 'movies'),
(281, 34, 'star-wars'),
(33 , 1 , 'a-good-book');
-- SELECT * FROM #SomeTable st;
WITH
cte_Categories AS (
SELECT
SitePath = CAST(CONCAT('mysite.com/', st.FolderName) AS VARCHAR(8000)),
st.ID,
NodeLevel = 1
FROM
#SomeTable st
WHERE
st.ParentID = 0
UNION ALL
SELECT
SitePath = CAST(CONCAT(c.SitePath, '/', st.FolderName) AS VARCHAR(8000)),
st.ID,
nodeLevel = c.NodeLevel + 1
FROM
cte_Categories c
JOIN #SomeTable st
ON c.ID = st.ParentID
)
SELECT
c.SitePath,
c.ID,
c.NodeLevel
FROM
cte_Categories c;

Related

SQL get top level object from joins

Working on a query right now where we want to understand which business is referring the most downstream orders for us. I've put together a very basic table for demonstration purposes here with 4 businesses listed. Bar and Donut were both ultimately referred by Foo and I want to be able to show Foo as a business has generated X number of orders. Obviously getting the the single referral for Foo (from Bar) and Bar (from Donut) are simple joins. But how do you go from Bar to get back to Foo?
I'll add that I've done some more googling this AM and found a few very similar questions about the top level parent and most of the responses suggest recursive CTE. It's been awhile since I've dug deep into SQL stuff, but 8 years ago I know these were not overly popular. Is there another way around this? Perhaps better to just store that parent ID on the order table at the time of order?
+----+--------+--------------------+
| Id | Name | ReferralBusinessId |
+----+--------+--------------------+
| 1 | Foo | |
| 2 | Bar | 1 |
| 3 | Donut | 2 |
| 4 | Coffee | |
+----+--------+--------------------+
WITH RECURSIVE entity_hierarchy AS (
SELECT id, name, parent FROM entities WHERE name = 'Donut'
UNION
SELECT e.id, e.name, e.parent FROM entities e INNER JOIN entity_hierarchy eh on e.id = eh.parent
)
SELECT id, name, parent FROM entity_hierarchy;
SQL Fiddle Example
Assuming you're using SQL Server, you could use a query like the one below to generate a hierarchical Id path for a particular business.
declare #tbl as table (Id int, Name varchar(30), ReferralBusinessId int)
insert into #tbl (id, Name, ReferralBusinessId) values
(1, 'Foo', null),
(2, 'Bar', 1),
(3, 'Donut', 2),
(4, 'Coffee', null);
;WITH business AS (
SELECT Id, Name, ReferralBusinessId
, 0 AS Level
, CAST(Id AS VARCHAR(255)) AS Path
FROM #tbl
UNION ALL
SELECT R.Id, R.Name, R.ReferralBusinessId
, Level + 1
, CAST(Path + '.' + CAST(R.Id AS VARCHAR(255)) AS VARCHAR(255))
FROM #tbl R
INNER JOIN business b ON b.Id = R.ReferralBusinessId
)
SELECT * FROM business ORDER BY Path

How can join table with IN() in ON couse?

I have two table
User
id | name | category
1 | test | [2,4]
Category
id | name
1 | first
2 | second
3 | third
4 | fourth
now i need to join this both table and get data like:
name | category
test | second, fourth
i tried like:
select u.name as name, c.name as category
from user
INNER JOIN category on(c.id in (u.category))
but it's not working.
As others have suggested, if you have any control whatsoever over the design of this database, don't store multiple values in user.category, but instead have a bridging table between the two which maps one or more category values to each user record.
However, if you are not in a position to be able to redesign the database, here's a way to get the result you're looking for. First, let's create some test data:
create table [user]
(
id int,
[name] varchar(50),
category varchar(50) -- I'm assuming this is a string type
)
create table category
(
id int,
[name] varchar(50)
)
insert into [user] values
(1,'test','[2,4]'),
(2,'another test','[1,2,4]'),
(3,'more test','[1,3,2,4]')
insert into category values
(1,'first'),
(2,'second'),
(3,'third'),
(4,'fourth');
Then you can use a CTE with split_string to pull apart the individual category values, join them to their names, then recombine them into a single comma-separated value with for xml:
with r as
(
select
u.[name] as username,
cat.id,
cat.[name] as categoryname
from [user] u
outer apply
(
select value from string_split(substring(u.category,2,len(u.category)-2),',')
) c
left join category cat on c.value = cat.id
)
select
r.username,
stuff(
(select ',' + categoryname
from r r2
where r.username = r2.username
order by r2.id
for xml path ('')), 1, 1, '') as categories
from r
group by r.username
which gives the desired output:
/-----------------------------------------\
| username | categories |
|-------------|---------------------------|
|another test | first,second,fourth |
|more test | first,second,third,fourth |
|test | second,fourth |
\-----------------------------------------/
I'm making a couple of assumptions here:
You're using MS SQL Server
The category values always begin with [, end with ] and contain nothing but a comma-delimited string containing value category ids

How to synthesize attribute for joined tables

I have a view defined like this:
CREATE VIEW [dbo].[PossiblyMatchingContracts] AS
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts
FROM [dbo].AllContracts AS C
INNER JOIN [dbo].AllContracts AS CC
ON C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND C.AssociatedUser IS NULL
AND C.UniqueID <> CC.UniqueID
Which is basically finding contracts where f.e. the first name and the birthday are matching. This works great. Now I want to add a synthetic attribute to each row with the value from only one source row.
Let me give you an example to make it clearer. Suppose I have the following table:
UniqueID | FirstName | LastName | Birthday
1 | Peter | Smith | 1980-11-04
2 | Peter | Gray | 1980-11-04
3 | Peter | Gray-Smith| 1980-11-04
4 | Frank | May | 1985-06-09
5 | Frank-Paul| May | 1985-06-09
6 | Gina | Ericson | 1950-11-04
The resulting view should look like this:
UniqueID | PossiblyMatchingContracts | SyntheticID
1 | 2 | PeterSmith1980-11-04
1 | 3 | PeterSmith1980-11-04
2 | 1 | PeterSmith1980-11-04
2 | 3 | PeterSmith1980-11-04
3 | 1 | PeterSmith1980-11-04
3 | 2 | PeterSmith1980-11-04
4 | 5 | FrankMay1985-06-09
5 | 4 | FrankMay1985-06-09
6 | NULL | NULL [or] GinaEricson1950-11-04
Notice that the SyntheticID column uses ONLY values from one of the matching source rows. It doesn't matter which one. I am exporting this view to another application and need to be able to identify each "match group" afterwards.
Is it clear what I mean? Any ideas how this could be done in sql?
Maybe it helps to elaborate a bit on the actual use case:
I am importing contracts from different systems. To account for the possibility of typos or people that have married but the last name was only updated in one system, I need to find so called 'possible matches'. Two or more contracts are considered a possible match if they contain the same birthday plus the same first, last or birth name. That implies, that if contract A matches contract B, contract B also matches contract A.
The target system uses multivalue reference attributes to store these relationships. The ultimate goal is to create user objects for these contracts. The catch first is, that the shall only be one user object for multiple matching contracts. Thus I'm creating these matches in the view. The second catch is, that the creation of user objects happens by workflows, which run parallel for each contract. To avoid creating multiple user objects for matching contracts, each workflow needs to check, if there is already a matching user object or another workflow, which is about to create said user object. Because the workflow engine is extremely slow compared to sql, the workflows should not repeat the whole matching test. So the idea is, to let the workflow check only for the 'syntheticID'.
I have solved it with a multi step approach:
Create the list of possible 1st level matches for each contract
Create the base groups list, assigning a different group for for
each contract (as if they were not related to anybody)
Iterate the matches list updating the group list when more contracts need to
be added to a group
Recursively build up the SyntheticID from final group list
Output results
First of all, let me explain what I have understood, so you can tell if my approach is correct or not.
1) matching propagates in "cascade"
I mean, if "Peter Smith" is grouped up with "Peter Gray", it means that all Smith and all Gray are related (if they have the same birth date) so Luke Smith can be in the same group of John Gray
2) I have not understood what you mean with "Birth Name"
You say contracts matches on "first, last or birth name", sorry, I'm italian, I thought birth name and first were the same, also in your data there is not such column. Maybe it is related to that dash symbol between names?
When FirstName is Frank-Paul it means it should match both Frank and Paul?
When LastName is Gray-Smith it means it should match both Gray and Smith?
In following code I have simply ignored this problem, but it could be handled if needed (I already did a try, breaking names, unpivoting them and treating as double match).
Step Zero: some declaration and prepare base data
declare #cli as table (UniqueID int primary key, FirstName varchar(20), LastName varchar(20), Birthday varchar(20))
declare #comb as table (id1 int, id2 int, done bit)
declare #grp as table (ix int identity primary key, grp int, id int, unique (grp,ix))
declare #str_id as table (grp int primary key, SyntheticID varchar(1000))
declare #id1 as int, #g int
;with
t as (
select *
from (values
(1 , 'Peter' , 'Smith' , '1980-11-04'),
(2 , 'Peter' , 'Gray' , '1980-11-04'),
(3 , 'Peter' , 'Gray-Smith', '1980-11-04'),
(4 , 'Frank' , 'May' , '1985-06-09'),
(5 , 'Frank-Paul', 'May' , '1985-06-09'),
(6 , 'Gina' , 'Ericson' , '1950-11-04')
) x (UniqueID , FirstName , LastName , Birthday)
)
insert into #cli
select * from t
Step One: Create the list of possible 1st level matches for each contract
;with
p as(select UniqueID, Birthday, FirstName, LastName from #cli),
m as (
select p.UniqueID UniqueID1, p.FirstName FirstName1, p.LastName LastName1, p.Birthday Birthday1, pp.UniqueID UniqueID2, pp.FirstName FirstName2, pp.LastName LastName2, pp.Birthday Birthday2
from p
join p pp on (pp.Birthday=p.Birthday) and (pp.FirstName = p.FirstName or pp.LastName = p.LastName)
where p.UniqueID<=pp.UniqueID
)
insert into #comb
select UniqueID1,UniqueID2,0
from m
Step Two: Create the base groups list
insert into #grp
select ROW_NUMBER() over(order by id1), id1 from #comb where id1=id2
Step Three: Iterate the matches list updating the group list
Only loop on contracts that have possible matches and updates only if needed
set #id1 = 0
while not(#id1 is null) begin
set #id1 = (select top 1 id1 from #comb where id1<>id2 and done=0)
if not(#id1 is null) begin
set #g = (select grp from #grp where id=#id1)
update g set grp= #g
from #grp g
inner join #comb c on g.id = c.id2
where c.id2<>#id1 and c.id1=#id1
and grp<>#g
update #comb set done=1 where id1=#id1
end
end
Step Four: Build up the SyntheticID
Recursively add ALL (distinct) first and last names of group to SyntheticID.
I used '_' as separator for birth date, first names and last names, and ',' as separator for the list of names to avoid conflicts.
;with
c as(
select c.*, g.grp
from #cli c
join #grp g on g.id = c.UniqueID
),
d as (
select *, row_number() over (partition by g order by t,s) n1, row_number() over (partition by g order by t desc,s desc) n2
from (
select distinct c.grp g, 1 t, FirstName s from c
union
select distinct c.grp, 2, LastName from c
) l
),
r as (
select d.*, cast(CONVERT(VARCHAR(10), t.Birthday, 112) + '_' + s as varchar(1000)) Names, cast(0 as bigint) i1, cast(0 as bigint) i2
from d
join #cli t on t.UniqueID=d.g
where n1=1
union all
select d.*, cast(r.names + IIF(r.t<>d.t,'_',',') + d.s as varchar(1000)), r.n1, r.n2
from d
join r on r.g = d.g and r.n1=d.n1-1
)
insert into #str_id
select g, Names
from r
where n2=1
Step Five: Output results
select c.UniqueID, case when id2=UniqueID then id1 else id2 end PossibleMatchingContract, s.SyntheticID
from #cli c
left join #comb cb on c.UniqueID in(id1,id2) and id1<>id2
left join #grp g on c.UniqueID = g.id
left join #str_id s on s.grp = g.grp
Here is the results
UniqueID PossibleMatchingContract SyntheticID
1 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
1 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
4 5 1985-06-09_Frank,Frank-Paul_May
5 4 1985-06-09_Frank,Frank-Paul_May
6 NULL 1950-11-04_Gina_Ericson
I think that in this way the resulting SyntheticID should also be "unique" for each group
This creates a synthetic value and is easy to change to suit your needs.
DECLARE #T TABLE (
UniqueID INT
,FirstName VARCHAR(200)
,LastName VARCHAR(200)
,Birthday DATE
)
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 1,'Peter','Smith','1980-11-04'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 2,'Peter','Gray','1980-11-04'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 3,'Peter','Gray-Smith','1980-11-04'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 4,'Frank','May','1985-06-09'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 5,'Frank-Paul','May','1985-06-09'
INSERT INTO #T(UniqueID,FirstName,LastName,Birthday) SELECT 6,'Gina','Ericson','1950-11-04'
DECLARE #PossibleMatches TABLE (UniqueID INT,[PossibleMatch] INT,SynKey VARCHAR(2000)
)
INSERT INTO #PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Ln=' + t1.LastName + ' Fn=' + + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM #T t1
INNER JOIN #T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO #PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM #T t1
INNER JOIN #T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO #PossibleMatches
SELECT t1.UniqueID,t2.UniqueID,'Ln=' + t1.LastName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM #T t1
INNER JOIN #T t2 ON t1.Birthday=t2.Birthday
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO #PossibleMatches
SELECT t1.UniqueID,pm.UniqueID,'Ln=' + t1.LastName + ' Fn=' + + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM #T t1
LEFT JOIN #PossibleMatches pm on pm.UniqueID=t1.UniqueID
WHERE pm.UniqueID IS NULL
SELECT *
FROM #PossibleMatches
ORDER BY UniqueID,[PossibleMatch]
I think this will work for you
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C INNER JOIN
[dbo].AllContracts AS CC ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN(
SELECT UniqueID FROM [dbo].DefinitiveMatches)
AND C.AssociatedUser IS NULL
You can try this:
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C
INNER JOIN
[dbo].AllContracts AS CC
ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND
C.AssociatedUser IS NULL
This will generate one extra row (because we left out C.UniqueID <> CC.UniqueID) but will give you the good souluton.
Following an example with some example data extracted from your original post. The idea: Generate all SyntheticID in a CTE, query all records with a "PossibleMatch" and Union it with all records which are not yet included:
DECLARE #t TABLE(
UniqueID int
,FirstName nvarchar(20)
,LastName nvarchar(20)
,Birthday datetime
)
INSERT INTO #t VALUES (1, 'Peter', 'Smith', '1980-11-04');
INSERT INTO #t VALUES (2, 'Peter', 'Gray', '1980-11-04');
INSERT INTO #t VALUES (3, 'Peter', 'Gray-Smith', '1980-11-04');
INSERT INTO #t VALUES (4, 'Frank', 'May', '1985-06-09');
INSERT INTO #t VALUES (5, 'Frank-Paul', 'May', '1985-06-09');
INSERT INTO #t VALUES (6, 'Gina', 'Ericson', '1950-11-04');
WITH ctePrep AS(
SELECT UniqueID, FirstName, LastName, BirthDay,
ROW_NUMBER() OVER (PARTITION BY FirstName, BirthDay ORDER BY FirstName, BirthDay) AS k,
FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM #t
),
cteKeys AS(
SELECT FirstName, BirthDay, SyntheticID
FROM ctePrep
WHERE k = 1
),
cteFiltered AS(
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
keys.SyntheticID
FROM #t AS C
JOIN #t AS CC ON C.FirstName = CC.FirstName
AND C.Birthday = CC.Birthday
JOIN cteKeys AS keys ON keys.FirstName = c.FirstName
AND keys.Birthday = c.Birthday
WHERE C.UniqueID <> CC.UniqueID
)
SELECT UniqueID, PossiblyMatchingContracts, SyntheticID
FROM cteFiltered
UNION ALL
SELECT UniqueID, NULL, FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM #t
WHERE UniqueID NOT IN (SELECT UniqueID FROM cteFiltered)
Hope this helps. The result looked OK to me:
UniqueID PossiblyMatchingContracts SyntheticID
---------------------------------------------------------------
2 1 PeterSmith1980-11-04
3 1 PeterSmith1980-11-04
1 2 PeterSmith1980-11-04
3 2 PeterSmith1980-11-04
1 3 PeterSmith1980-11-04
2 3 PeterSmith1980-11-04
4 NULL FrankMay1985-06-09
5 NULL Frank-PaulMay1985-06-09
6 NULL GinaEricson1950-11-04
Tested in SSMS, it works perfect. :)
--create table structure
create table #temp
(
uniqueID int,
firstname varchar(15),
lastname varchar(15),
birthday date
)
--insert data into the table
insert #temp
select 1, 'peter','smith','1980-11-04'
union all
select 2, 'peter','gray','1980-11-04'
union all
select 3, 'peter','gray-smith','1980-11-04'
union all
select 4, 'frank','may','1985-06-09'
union all
select 5, 'frank-paul','may','1985-06-09'
union all
select 6, 'gina','ericson','1950-11-04'
select * from #temp
--solution is as below
select ab.uniqueID
, PossiblyMatchingContracts
, c.firstname+c.lastname+cast(c.birthday as varchar) as synID
from
(
select a.uniqueID
, case
when a.uniqueID < min(b.uniqueID)over(partition by a.uniqueid)
then a.uniqueID
else min(b.uniqueID)over(partition by a.uniqueid)
end as SmallestID
, b.uniqueID as PossiblyMatchingContracts
from #temp a
left join #temp b
on (a.firstname = b.firstname OR a.lastname = b.lastname) AND a.birthday = b.birthday AND a.uniqueid <> b.uniqueID
) as ab
left join #temp c
on ab.SmallestID = c.uniqueID
Result capture is attached below:
Say we have following table (a VIEW in your case):
UniqueID PossiblyMatchingContracts SyntheticID
1 2 G1
1 3 G2
2 1 G3
2 3 G4
3 1 G4
3 4 G6
4 5 G7
5 4 G8
6 NULL G9
In your case you can set initial SyntheticID as a string like PeterSmith1980-11-04 using UniqueID for each line. Here is a recursive CTE query it divides all lines to unconnected groups and select MAX(SyntheticId) in the current group as a new SyntheticID for all lines in this group.
WITH CTE AS
(
SELECT CAST(','+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' as Varchar(MAX)) as GroupCont,
SyntheticID
FROM PossiblyMatchingContracts
UNION ALL
SELECT CAST(GroupCont+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' AS Varchar(MAX)) as GroupCont,
pm.SyntheticID
FROM CTE
JOIN PossiblyMatchingContracts as pm
ON
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
AND NOT
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
AND
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
)
SELECT pm.UniqueID,
pm.PossiblyMatchingContracts,
ISNULL(
(SELECT MAX(SyntheticID) FROM CTE WHERE
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
))
,pm.SyntheticID) as SyntheticID
FROM PossiblyMatchingContracts pm

postgresql with recursive grabs whole table

I have the following postgresql structure
\d brand_categories;
Table "public.brand_categories"
Column | Type | Modifiers
----------------------+---------+---------------------------------------------------------------
id | integer | not null default nextval('brand_categories_id_seq'::regclass)
category_code | text | not null
correlation_id | uuid | not null default uuid_generate_v4()
created_by_id | integer | not null
updated_by_id | integer | not null
parent_category_code | text |
I am trying to get all the parents and childs of a category via WITH RECURSIVE but not take siblings of a category. I tried to do the following (inside ruby code):
WITH RECURSIVE included_categories(category_code) AS (
SELECT category_code FROM brand_categories
WHERE category_code = 'beer'
UNION ALL
SELECT children.category_code FROM brand_categories AS parents, brand_categories AS children
WHERE parents.category_code = children.parent_category_code AND parents.category_code != 'alcohol'
UNION SELECT parents.category_code FROM brand_categories AS children, brand_categories AS parents
WHERE parents.category_code = children.parent_category_code
)
SELECT * from included_categories
The problem is that it takes the whole set of categories even though most are completely unrelated. Is there something wrong in this query?
Note that this is a simple categorization with a depth of 2 or 3.
My boss helped me to solve the problem, it made more sense to do it in 2 parts:
Find all parents
Find all children
Here is the sql:
WITH RECURSIVE children_of(category_code) AS (
SELECT category_code FROM brand_categories WHERE parent_category_code = 'alcohol'
UNION ALL
SELECT brand_categories.category_code FROM brand_categories
JOIN children_of ON brand_categories.parent_category_code = children_of.category_code
),
parents_of(parent_category_code) AS (
SELECT parent_category_code FROM brand_categories WHERE category_code = 'alcohol'
UNION
SELECT brand_categories.parent_category_code FROM parents_of
JOIN brand_categories ON brand_categories.category_code = parents_of.parent_category_code
)
SELECT category_code FROM (SELECT * FROM children_of UNION SELECT parent_category_code FROM parents_of) t0(category_code)
WHERE category_code IS NOT NULL

SQL SELECT with JOINS and Multiple Rows for one Column

The issue at hand is that I have a basic query (from a document software database), taking from multiple tables with some LEFT JOINS, but I have one column in question that needs to take from a table where there are multiple results per unique document id (DocGUID).
CURRENT QUERY
SELECT doc.[Doc #]
, '' AS 'Authors'
, ud.[Lead Author]
, doc.[Title]
, ud.[Publication]
, ud.[Citation]
, ud.[Year]
, ud.[Month]
, ud.[Comments]
, notes.[Note]
FROM [tblDocuments] doc
LEFT JOIN [tblNotes] notes ON notes.[DocGUID] = doc.[DocGUID]
LEFT JOIN [tblUserData] ud ON ud.[MasterGUID] = doc.[DocGUID]
WHERE doc.[DocGUID] = '12345678'
As you can see, I have simply queried '' for "Authors". Here's where my issue comes in. I have a table named tblMultiValues where there are two or more authors listed per DocGUID.
Table Example: (for tblMultiValues)
|------|-------------|-------------|-------------------|
| Id | DocGUID | FieldName | Value |
|------|-------------|-------------|-------------------|
| 123 | 12345678 | Authors | Collins, Nick |
| 456 | 12345678 | Authors | Williams, Robert |
| 321 | 87654321 | Authors | Smith, Kate |
| 654 | 87654321 | Authors | Hanks, Tom |
|------|-------------|-------------|-------------------|
So, what I want to show for the 2nd column of 'Authors', is the result of:
Collins, Nick; Williams, Robert
Specifically for DocGUID of '12345678'
How might one go about doing this, mixed in with the query that is already built?
(I hope this was enough info... if more is needed, please advise).
-Nick
:::EDIT:::
I was able to get things running with the following code... (very well guided from the answer given by #mohan111
SELECT DISTINCT
STUFF((
SELECT '; ' + mv2.Value
FROM [dbo].[tblMultiValues] mv2
WHERE mv1.DocGUID = mv2.DocGUID
FOR XML PATH ('')),1,2,'') AS 'Authors', mv1.FieldName, mv1.DocGUID
INTO #TempMultival
FROM [dbo].[tblMultiValues] mv1
SELECT doc.[Doc #]
, tmv.[Authors]
, ud.[Lead Author]
, doc.[Title]
, ud.[Publication]
, ud.[Citation]
, ud.[Year]
, ud.[Month]
, ud.[Comments]
, notes.[Note]
FROM [tblDocuments] doc
LEFT JOIN [tblNotes] notes ON notes.[DocGUID] = doc.[DocGUID]
LEFT JOIN [tblUserData] ud ON ud.[MasterGUID] = doc.[DocGUID]
LEFT JOIN #TempMultiVal tmv ON tmv.DoCGUID = doc.[DocGUID]
DROP TABLE #TempMultiVal
Declare #table TABLE
(
Id INT,
DocGUID int,
FieldName VARCHAR(25),
Value VARCHAR(200)
);
INSERT INTO #table
( Id,
DocGUID,
FieldName,
Value
)
VALUES
(123,12345678,'Authors','Collins, Nick'),
(456,12345678,'Authors','Williams, Robert'),
(321,87654321,'Authors','Smith, Kate'),
(654,87654321,'Authors','Hanks, Tom');
Select distinct DocGUID,
(SELECT
Substring((SELECT ', ' + CAST(i.id AS VARCHAR(1024))
FROM
#table i
WHERE i.DocGUID = tt.DocGUID
ORDER BY i.id
FOR XML PATH('')), 3, 10000000) AS list) AS ID,
FieldName,
STUFF((Select distinct t.Value + ','
from #table t
where t.DocGUID = tt.DocGUID
FOR XML PATH(''),TYPE).value('.', 'NVARCHAR(MAX)')
, 1, 0, ' ') from #table tt