Parse nested JSON in SQL Server - sql

I have data stored in SQL Server in JSON format, as shown below. I would like to retrieve it as a plain string value. I have tried JSON_VALUE, but each document may have more than one child record, so the query should retrieve all of the values.
Input
TableA
ID Education
-------------------------------------------------------------------------------------
1 {"Education": {"Record":[{"SLSubject":"MICRO ECONOMICS","Score":"77","Grade":"A"}]}}
2 {"Education": {"Record":[{"SLSubject":"Math","Score":"89","Grade":"A"},{"SLSubject":"eng","Score":"88","Grade":"B"},{"SLSubject":"tam","Score":"33","Grade":"C"}]}}
3 {"Education":{"Record":[{"SLSubject":"subject 1","Score":"87","Grade":"A"},{"SLSubject":"subject 2","Score":"67","Grade":"B"},{"SLSubject":"subject 3","Score":"45","Grade":"C"},{"SLSubject":"subject 4","Score":"87","Grade":"D"}]}}
Expected Output
ID Education
-------------------------------------------------------------------------------------
1 MICRO ECONOMICS - 77 - A
2 Math - 89 - A \n eng - 88 - B \n tam - 33 - C
3 subject 1 - 87 - A \n subject 2 - 67 - B \n subject 3 - 45 - C \n subject 4 - 87 - D
Query (which is working for one child)
SELECT ID, JSON_VALUE(Education,'$.Education.Record[0].SLSubject')
FROM TableA
Actual Result
ID Education
-------------------------------------------------------------------------------------
1 MICRO ECONOMICS - 77 - A
2 Math - 88 - B
3 subject 1 - 87- A
Sample JSON:
{"Education":{"Record":[{"SLSubject":"subject 1","Score":"87","Grade":"A"},{"SLSubject":"subject 2","Score":"67","Grade":"B"},{"SLSubject":"subject 3","Score":"45","Grade":"C"},{"SLSubject":"subject 4","Score":"87","Grade":"D"}]}}

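You need to use OPENJSON and treat your data as a dataset; JSON_VALUE only returns a scalar value. The samples here use the bare Record array; for the documents as stored in the question, pass the path '$.Education.Record' as the second argument to OPENJSON. This is likely what you really want: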
SELECT YT.ID,
OJ.SLSubject,
OJ.Score,
OJ.Grade
FROM (VALUES(1,N'[{"SLSubject":"MICRO ECONOMICS","Score":"77","Grade":"A"}]'),
(2,N'[{"SLSubject":"Math","Score":"89","Grade":"A"},{"SLSubject":"eng","Score":"88","Grade":"B"},{"SLSubject":"tam","Score":"33","Grade":"C"}]'),
(3,N'[{"SLSubject":"subject 1","Score":"87","Grade":"A"},{"SLSubject":"subject 2","Score":"67","Grade":"B"},{"SLSubject":"subject 3","Score":"45","Grade":"C"},{"SLSubject":"subject 4","Score":"87","Grade":"D"}]'))YT(ID,Education)
CROSS APPLY OPENJSON(YT.Education)
WITH (SLSubject varchar(20),
Score int,
Grade char(1)) OJ;
Seems the OP wants this?
SELECT YT.ID,
STRING_AGG(CONCAT(OJ.SLSubject, ' - ', OJ.Score, ' - ', OJ.Grade),' \n ') WITHIN GROUP (ORDER BY OJ.SLSubject) AS Education
FROM (VALUES(1,N'[{"SLSubject":"MICRO ECONOMICS","Score":"77","Grade":"A"}]'),
(2,N'[{"SLSubject":"Math","Score":"89","Grade":"A"},{"SLSubject":"eng","Score":"88","Grade":"B"},{"SLSubject":"tam","Score":"33","Grade":"C"}]'),
(3,N'[{"SLSubject":"subject 1","Score":"87","Grade":"A"},{"SLSubject":"subject 2","Score":"67","Grade":"B"},{"SLSubject":"subject 3","Score":"45","Grade":"C"},{"SLSubject":"subject 4","Score":"87","Grade":"D"}]'))YT(ID,Education)
CROSS APPLY OPENJSON(YT.Education)
WITH (SLSubject varchar(20),
Score int,
Grade char(1)) OJ
GROUP BY YT.ID;
DB<>Fiddle
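For readers who want to check the expected strings outside the database, here is a minimal Python sketch of the same flattening. It assumes the document shape shown in the question (an "Education" object holding a "Record" array); the function name flatten_education is made up for illustration. Unlike the STRING_AGG query above, it keeps the records in document order rather than sorting by subject.

```python
import json


def flatten_education(doc):
    """Turn one stored JSON document into 'Subject - Score - Grade' lines."""
    records = json.loads(doc)["Education"]["Record"]
    # One line per child record, however many there are.
    return "\n".join(
        f'{r["SLSubject"]} - {r["Score"]} - {r["Grade"]}' for r in records
    )


row2 = (
    '{"Education": {"Record":['
    '{"SLSubject":"Math","Score":"89","Grade":"A"},'
    '{"SLSubject":"eng","Score":"88","Grade":"B"},'
    '{"SLSubject":"tam","Score":"33","Grade":"C"}]}}'
)
print(flatten_education(row2))
```

A document with a single record simply yields one line, which is the case JSON_VALUE with Record[0] already handled.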

Related

Case statement logic and substring

Say I have the following data:
Passes
ID | Pass_code
-----------------
100 | 2xBronze
101 | 1xGold
102 | 1xSilver
103 | 2xSteel
Passengers
ID | Passengers
-----------------
100 | 2
101 | 5
102 | 1
103 | 3
I want to count the passes and then produce ticket output like this:
ID 100 | 2 pass (bronze)
ID 101 | 5 pass (because it is gold, we count all passengers)
ID 102 | 1 pass (silver)
ID 103 | 2 pass (steel)
I was thinking of something like the code below, but I am unsure how to finish my CASE statement. I want to take a substring of pass_code to get the number of passes, e.g. '2xBronze' should give me 2. For ID 103, we have 2 passes and 3 passengers, so we should output 2.
Also, is there a way to first find '2xbronze' if pass_code contained lots of other things, such as '101001, 1xbronze, FirstClass'? The contents may change, so I don't want to rely on fixed positions. Could we search for '2xbronze' and then pull out the 2?
SELECT
CASE
WHEN Passes.pass_code like '%gold%' THEN Passengers.passengers
WHEN Passes.pass_code like '%steel%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%bronze%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%silver%' THEN SUBSTRING(passes.pass_code, 1,1)
else 0 end as no,
Passes.ID,
Passes.Pass_code,
Passengers.Passengers
FROM Passes
JOIN Passengers ON Passes.ID = Passengers.ID
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=db698e8562546ae7658270e0ec26ca54
So, assuming you are indeed using Oracle (as your DB fiddle implies):
You can do some string magic by finding the position of a splitter character (in your case the x), then taking substrings based on that. Obviously this has its problems, and x is a bad separator character as well, but it works for your current data set.
WITH PASSCODESPLIT AS
(
SELECT PASSES.ID,
TO_Number(SUBSTR(PASSES.PASS_CODE, 0, (INSTR(PASSES.PASS_CODE, 'x')) - 1)) AS NrOfPasses,
SUBSTR(PASSES.PASS_CODE, (INSTR(PASSES.PASS_CODE, 'x')) + 1) AS PassType
FROM Passes
)
SELECT
PASSCODESPLIT.ID,
CASE
WHEN PASSCODESPLIT.PassType = 'gold' THEN Passengers.Passengers
ELSE PASSCODESPLIT.NrOfPasses
END AS NrOfPasses,
PASSCODESPLIT.PassType,
Passengers.Passengers
FROM PASSCODESPLIT
INNER JOIN Passengers ON PASSCODESPLIT.ID = Passengers.ID
ORDER BY PASSCODESPLIT.ID ASC
Gives the result of:
ID NROFPASSES PASSTYPE PASSENGERS
100 2 bronze 2
101 5 gold 5
102 1 silver 1
103 2 steel 3
As can also be seen in this fiddle
But I would strongly advise you to fix your table design. Having multiple attributes in the same column leads to troubles like these. And the more variables/variations you start storing, the more 'magic' you need to keep doing.
In this particular example I see no reason why you don't simply have the 3 columns in Passes, which would also give you the opportunity to add new columns going forward, e.g. to keep track of First class.
You can extract the numbers using regexp_substr(). So I think this does what you want:
SELECT (CASE WHEN p.pass_code LIKE '%gold%'
THEN pp.passengers
ELSE TO_NUMBER(REGEXP_SUBSTR(p.pass_code, '^[0-9]+'))
END) as num,
p.ID, p.Pass_code, pp.Passengers
FROM Passes p JOIN
Passengers pp
ON p.ID = pp.ID;
Here is a db<>fiddle.
This converts the leading digits in the code to a number. Also note the use of table aliases to simplify the query.
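The second part of the question (finding something like '1xbronze' inside a longer pass_code) is not covered by a leading-digits pattern. As a rough sketch of the search-then-extract idea, here is the rule in Python; the list of pass types, the function name ticket_count, and the choice to return 0 when nothing matches are assumptions taken from the question's sample data and its ELSE 0 branch.

```python
import re

# Digits, a literal "x", then one of the pass types seen in the question.
PASS_PATTERN = re.compile(r"(\d+)x(gold|silver|bronze|steel)", re.IGNORECASE)


def ticket_count(pass_code, passengers):
    """Return the number of tickets for one row of Passes joined to Passengers."""
    match = PASS_PATTERN.search(pass_code)  # search anywhere in the string
    if match is None:
        return 0  # mirrors the ELSE 0 branch in the question
    count = int(match.group(1))
    pass_type = match.group(2).lower()
    # Gold covers every passenger; other types use the number in the code.
    return passengers if pass_type == "gold" else count


print(ticket_count("2xSteel", 3))
print(ticket_count("101001, 1xbronze, FirstClass", 4))
```

In Oracle the same search could be attempted with REGEXP_SUBSTR and a pattern that is not anchored to the start of the string, but that variant is not tested here.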

How to sort SQL query alphabetically but ignoring leading numbers?

I am unable to find the right query for my problem. I have a table in the db and I need to sort it in a very specific manner - the column I am sorting is an address, and it starts with the number, but I need to sort it ignoring the number.
Here is my data set:
id | address
1 | 23 Bridge road
2 | 14 Kennington street
3 | 7 Bridge road
4 | 12 Oxford street
5 | 9 Bridge road
I need to sort this like:
id | address
1 | 7 Bridge road
2 | 9 Bridge road
3 | 23 Bridge road
4 | 14 Kennington street
5 | 12 Oxford street
So far I got only this:
SELECT id, address
FROM propertySearch
Order by address ASC.
Can anyone help me out on this?
If the data is always in that format (a leading number, a space, and then the address), then you can do this:
SQL-Server:
SELECT * FROM YourTable t
ORDER BY SUBSTRING(t.address,CHARINDEX(' ',t.address,1),99)
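MySQL (SUBSTRING_INDEX with -1 would return only the last word, so take everything after the first space instead):
SELECT * FROM YourTable t
ORDER BY SUBSTRING(t.address, LOCATE(' ', t.address) + 1)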
If the format is not constant, you can use SQL Server's PATINDEX():
SELECT * FROM YourTable t
ORDER BY SUBSTRING(t.address,PATINDEX('%[A-z]%',t.address),99)
NOTE: This is bad DB design! Each value should be stored in its own column, e.g. STREET, CITY, APARTMENT_NUMBER, etc., because combining them leads to exactly this kind of problem.
If you use SQL Server, you can use a combination of PATINDEX and STUFF:
SELECT *, STUFF(T.address, 1, PATINDEX('%[A-z]%', T.address) - 1, '')
FROM #Table1 AS T
ORDER BY STUFF(T.address, 1, PATINDEX('%[A-z]%', T.address) - 1, '')
PATINDEX will find first letter index in your string and STUFF is used to trim everything from the beginning to that index.
That's output:
id address (No column name)
---------------------------------------------
1 23 Bridge road Bridge road
3 7 Bridge road Bridge road
5 9 Bridge road Bridge road
2 14 Kennington street Kennington street
4 12 Oxford street Oxford street
I also noticed that the IDs are renumbered in your expected output. If that was intended, you need to use ROW_NUMBER:
SELECT ROW_NUMBER() OVER(ORDER BY STUFF(T.address, 1, PATINDEX('%[A-z]%', T.address) - 1, ''), T.id) AS ID, T.address
FROM #Table1 AS T;
This query will generate new ID for each row.
Result:
id address
------------------------
1 23 Bridge road
2 7 Bridge road
3 9 Bridge road
4 14 Kennington street
5 12 Oxford street
Anyway, this is a rather hacky solution.
I'd suggest storing your address in separate columns, such as street name, postal code, house number, house letter (optional), town, etc. This would be a much better approach.
I think this kind of operation belongs in the business layer.
If you load all the data into .NET code, sorting will be easier, more readable, and more maintainable.
Public Class Address
    Public Property Id As Integer
    Public Property AddressData As String
    'This property can be used for sorting
    Public ReadOnly Property SortedKey As String
        Get
            'Drop the first token (the house number) and rejoin the rest
            Dim rawData As IEnumerable(Of String) = Me.AddressData.Split(" "c).Skip(1)
            Return String.Join(" ", rawData)
        End Get
    End Property
End Class
Then use it with LINQ
Dim loaded As List(Of Address) = yourLoadFunction()
Dim sorted = loaded.OrderBy(Function(item) item.SortedKey).ToList()
As you've tagged vb.net, I guess you use MS SQL. If the street number and street name are always separated by a blank space, try ordering like this:
ORDER BY RIGHT([address], LEN([address]) - CHARINDEX(' ', [address], 1))
Declare @Table table (id int,address varchar(100))
Insert into @Table values
(1,'23 Bridge road'),
(2,'14 Kennington street'),
(3,'7 Bridge road'),
(4,'12 Oxford street'),
(5,'9 Bridge road')
Select * From @Table
Order By substring(address,patindex('%[a-z]%',address),200)
,cast(substring(address,1,charindex(' ',address)) as int)
Returns
id address
3 7 Bridge road
5 9 Bridge road
1 23 Bridge road
2 14 Kennington street
4 12 Oxford street
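The two-part ordering used in the last query (street name first, then house number as an integer) can be sketched outside the database as well. The Python below is only an illustration of that sort key; the function name address_sort_key is made up, and it assumes addresses of the form "number, space, street", falling back to the whole string when there is no leading number.

```python
import re


def address_sort_key(address):
    """Sort by street name, then by house number compared numerically."""
    match = re.match(r"\s*(\d+)\s+(.*)", address)
    if match is None:
        # No leading number: sort on the whole text.
        return (address.lower(), 0)
    return (match.group(2).lower(), int(match.group(1)))


addresses = [
    "23 Bridge road",
    "14 Kennington street",
    "7 Bridge road",
    "12 Oxford street",
    "9 Bridge road",
]
for line in sorted(addresses, key=address_sort_key):
    print(line)
```

Comparing the house number as an integer is what puts 7 and 9 ahead of 23; a plain string comparison would put "23" first.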

Fuzzy grouping in SQL

I need to modify a SQL table to group slightly mismatched names, and assign all elements in the group a standardized name.
For instance, if the initial table looks like this:
Name
--------
Jon Q
John Q
Jonn Q
Mary W
Marie W
Matt H
I would like to create a new table or add a field to the existing one like this:
Name | StdName
--------------------
Jon Q | Jon Q
John Q | Jon Q
Jonn Q | Jon Q
Mary W | Mary W
Marie W | Mary W
Matt H | Matt H
In this case, I've chosen the first name to assign as the "standardized name," but I don't actually care which one is chosen -- ultimately the final "standardized name" will be hashed into a unique person ID. (I'm also open to alternative solutions that go directly to a numerical ID.) I will have birthdates to match on as well, so the accuracy of the name matching doesn't actually need to be all that precise in practice. I've looked into this a bit and will probably use the Jaro-Winkler algorithm (see e.g. here).
If I knew that the names were all in pairs, this would be a relatively easy query, but there can be an arbitrary number of the same name.
I can easily conceptualize how to do this query in a procedural language, but I'm not very familiar with SQL. Unfortunately I don't have direct access to the data -- it's sensitive data and so somebody else (a bureaucrat) has to run the actual query for me. The specific implementation will be SQL Server, but I'd prefer an implementation-agnostic solution.
EDIT:
In response to a comment, I had the following procedural approach in mind. It's in Python, and I replaced the Jaro-Winkler with simply matching on the first letter of the name, for the sake of having a working code example.
nameList = ['Jon Q', 'John Q', 'Jonn Q', 'Mary W', 'Marie W', 'Larry H']
stdList = nameList[:]
# loop over all names
for i1, name1 in enumerate(stdList):
    # loop over later names in list to find matches
    for i2, name2 in enumerate(stdList[i1+1:]):
        # If there's a match, replace latter with former.
        if (name1[0] == name2[0]):
            stdList[i1+1+i2] = name1
print(stdList)
The result is ['Jon Q', 'Jon Q', 'Jon Q', 'Mary W', 'Mary W', 'Larry H'].
Just a thought, but you might be able to use the SOUNDEX() function. This will create a value for the names that are similar.
If you started with something like this:
select name, soundex(name) snd,
row_number() over(partition by soundex(name)
order by soundex(name)) rn
from yt;
See SQL Fiddle with Demo. This gives each row a soundex value, so similar names share a value, along with a row_number() so you could return only the first value for each group. For example, the above query will return:
| NAME | SND | RN |
-----------------------
| Jon Q | J500 | 1 |
| John Q | J500 | 2 |
| Jonn Q | J500 | 3 |
| Matt H | M300 | 1 |
| Mary W | M600 | 1 |
| Marie W | M600 | 2 |
Then you could select all of the rows from this result where the row_number() is equal to 1 and then join back to your main table on the soundex(name) value:
select t1.name,
t2.Stdname
from yt t1
inner join
(
select name as stdName, snd, rn
from
(
select name, soundex(name) snd,
row_number() over(partition by soundex(name)
order by soundex(name)) rn
from yt
) d
where rn = 1
) t2
on soundex(t1.name) = t2.snd;
See SQL Fiddle with Demo. This gives a result:
| NAME | STDNAME |
---------------------
| Jon Q | Jon Q |
| John Q | Jon Q |
| Jonn Q | Jon Q |
| Mary W | Mary W |
| Marie W | Mary W |
| Matt H | Matt H |
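To see why these names fall into the same groups, here is a small Python sketch of the classic American Soundex rules together with the "first name seen wins" grouping. It is an approximation for illustration, not SQL Server's implementation: the result table above suggests SOUNDEX() stops at the space, so the sketch keys on the first word only, and the function names soundex and standardize are made up.

```python
# Consonant classes of the classic Soundex scheme.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two equal codes
        code = CODES.get(ch, "")  # vowels get no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def standardize(names):
    """Pair each name with the first name seen that has the same code."""
    first_seen = {}
    pairs = []
    for name in names:
        parts = name.split()
        key = soundex(parts[0]) if parts else ""
        pairs.append((name, first_seen.setdefault(key, name)))
    return pairs


names = ["Jon Q", "John Q", "Jonn Q", "Mary W", "Marie W", "Matt H"]
for name, std in standardize(names):
    print(name, "|", std)
```

Because only the first word is coded, two people with different surnames and similar first names would collapse into one group, so in practice the key would need the surname or the birthdate added.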
Assuming you copy and paste the jaro-winkler implementation from SSC (registration required), the following code will work. I tried to build a SQLFiddle for it but it kept going belly up when I was building the schema.
This implementation has a cheat---I'm using a cursor. Generally, cursors are not conducive to performance but in this case, you need to be able to compare the set against itself. There's probably a graceful number/tally table approach to eliminate the declared cursor.
DECLARE @SRC TABLE
(
    source_string varchar(50) NOT NULL
,   ref_id int identity(1,1) NOT NULL
);
-- Identify matches
DECLARE @WORK TABLE
(
    source_ref_id int NOT NULL
,   match_ref_id int NOT NULL
);
INSERT INTO
    @SRC
SELECT 'Jon Q'
UNION ALL SELECT 'John Q'
UNION ALL SELECT 'JOHN Q'
UNION ALL SELECT 'Jonn Q'
-- Oops on matching joan to jon
UNION ALL SELECT 'Joan Q'
UNION ALL SELECT 'june'
UNION ALL SELECT 'Mary W'
UNION ALL SELECT 'Marie W'
UNION ALL SELECT 'Matt H';
-- 2 problems to address
-- duplicates in our inbound set
-- duplicates against a reference set
--
-- Better matching will occur if names are split into ordinal entities
-- Splitting on whitespace is always questionable
--
-- Mat, Matt, Matthew
DECLARE CSR CURSOR
READ_ONLY
FOR
SELECT DISTINCT
    S1.source_string
,   S1.ref_id
FROM
    @SRC AS S1
ORDER BY
    S1.ref_id;
DECLARE @source_string varchar(50), @ref_id int
OPEN CSR
FETCH NEXT FROM CSR INTO @source_string, @ref_id
WHILE (@@fetch_status <> -1)
BEGIN
    IF (@@fetch_status <> -2)
    BEGIN
        IF NOT EXISTS
        (
            SELECT * FROM @WORK W WHERE W.match_ref_id = @ref_id
        )
        BEGIN
            INSERT INTO
                @WORK
            SELECT
                @ref_id
            ,   S.ref_id
            FROM
                @SRC S
                -- If we have already matched the value, skip it
                LEFT OUTER JOIN
                    @WORK W
                    ON W.match_ref_id = S.ref_id
            WHERE
                -- Don't match yourself
                S.ref_id <> @ref_id
                -- arbitrary threshold, will need to examine this for sanity
                AND dbo.fn_calculateJaroWinkler(@source_string, S.source_string) > .95
        END
    END
    FETCH NEXT FROM CSR INTO @source_string, @ref_id
END
CLOSE CSR
DEALLOCATE CSR
-- Show me the list of all the unmatched rows
-- plus the retained
;WITH MATCHES AS
(
    SELECT
        S1.source_string
    ,   S1.ref_id
    ,   S2.source_string AS match_source_string
    ,   S2.ref_id AS match_ref_id
    FROM
        @SRC S1
        INNER JOIN
            @WORK W
            ON W.source_ref_id = S1.ref_id
        INNER JOIN
            @SRC S2
            ON S2.ref_id = W.match_ref_id
)
, UNMATCHES AS
(
    SELECT
        S1.source_string
    ,   S1.ref_id
    ,   NULL AS match_source_string
    ,   NULL AS match_ref_id
    FROM
        @SRC S1
        LEFT OUTER JOIN
            @WORK W
            ON W.source_ref_id = S1.ref_id
        LEFT OUTER JOIN
            @WORK S2
            ON S2.match_ref_id = S1.ref_id
    WHERE
        W.source_ref_id IS NULL
        and s2.match_ref_id IS NULL
)
SELECT
    M.source_string
,   M.ref_id
,   M.match_source_string
,   M.match_ref_id
FROM
    MATCHES M
UNION ALL
SELECT
    M.source_string
,   M.ref_id
,   M.match_source_string
,   M.match_ref_id
FROM
    UNMATCHES M;
-- To specifically solve your request
SELECT
    S.source_string AS Name
,   COALESCE(S2.source_string, S.source_string) As StdName
FROM
    @SRC S
    LEFT OUTER JOIN
        @WORK W
        ON W.match_ref_id = S.ref_id
    LEFT OUTER JOIN
        @SRC S2
        ON S2.ref_id = W.source_ref_id
query output 1
source_string ref_id match_source_string match_ref_id
Jon Q 1 John Q 2
Jon Q 1 JOHN Q 3
Jon Q 1 Jonn Q 4
Jon Q 1 Joan Q 5
june 6 NULL NULL
Mary W 7 NULL NULL
Marie W 8 NULL NULL
Matt H 9 NULL NULL
query output 2
Name StdName
Jon Q Jon Q
John Q Jon Q
JOHN Q Jon Q
Jonn Q Jon Q
Joan Q Jon Q
june june
Mary W Mary W
Marie W Marie W
Matt H Matt H
There be dragons
Over on SuperUser, I talked about my experience matching people. In this section, I'll list some things to be aware of.
Speed
As part of your matching, hooray in that you have a birthday to augment the match process. I would actually propose you generate a match based exclusively on birthdate first. That is an exact match and one that, with a proper index, SQL Server will be able to use to quickly include/exclude rows. You're going to need it, because the TSQL implementation is dog slow. I've been running the equivalent match against a dataset of 28k names (names that had been listed as conference attendees). There ought to be some good overlap there. I did fill @src with data, and it is a table variable with all that that implies, but it's been running now for 15 minutes and still hasn't completed.
It's slow for a number of reasons but things that jumped out at me are all the looping and string manipulation in the functions. That is not where SQL Server shines. If you have a need to do a lot of this, it might be a good idea to convert them into CLR methods so at least you can leverage the strength of the .NET libraries for some of the manipulations.
One of the matches we used to use was the Double Metaphone and it would generate a pair of possible phonetic interpretations of the name. Instead of computing that every time, compute it once and store it alongside the name. That would help speed some of the matching. Unfortunately, it doesn't look like JW lends itself to breaking it down like that.
Look at iterating too. We'd first try the algs that we knew were fast. 'John' = 'John' so there's no need to pull out the big guns so we'd try a first pass of straight name checks. If we didn't find a match, we'd try harder. The hope was that by taking various swipes at matching we'd get the low hanging fruit as fast as possible and worry about the harder matches later.
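The two ideas above (narrow the candidates with an exact birthdate match, then try the cheap comparison before the expensive one) can be sketched roughly as follows. This is an illustration in Python, not the T-SQL used in this answer: difflib.SequenceMatcher from the standard library stands in for Jaro-Winkler, the 0.8 threshold is arbitrary, and the function name match_within_birthdate and the (name, birthdate) record shape are assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def match_within_birthdate(records, threshold=0.8):
    """Return candidate (name_a, name_b) pairs, comparing only equal birthdates."""
    blocks = defaultdict(list)
    for name, birthdate in records:
        blocks[birthdate].append(name)  # exact match on birthdate first

    pairs = []
    for names in blocks.values():
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if a.lower() == b.lower():
                    pairs.append((a, b))  # fast pass: plain equality
                elif SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                    pairs.append((a, b))  # slower fuzzy pass (stand-in metric)
    return pairs


records = [
    ("Jon Q", "1980-01-01"),
    ("John Q", "1980-01-01"),
    ("Mary W", "1975-05-05"),
    ("Jon Q", "1990-02-02"),
]
print(match_within_birthdate(records))
```

The second "Jon Q" is never compared with the first because the birthdates differ, which is the point of the blocking step: the quadratic comparison only runs inside each small group.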
Names
In my SU answer and in the code comments, I mention nicknames. Bill and Billy are going to match. Billy, Liam and William are definitely not going to match even though they may be the same person. You might want to look at a list like this to provide translation between nickname and full name. After running a set of matches on the supplied name, maybe we'd try looking for a match based on the possible root name.
Obviously, there are drawbacks to this approach. For example, my grandfather-in-law is Max. Just Max. Not Maximilian, Maximus or any other thing you might think.
Your supplied names look like they're first and last concatenated together. Future readers, if you ever have the opportunity to capture individual portions of a name, please do so. There are products out there that will split names and try to match them up against directories to try and guess whether something is a first/middle name or a surname, but then you have people like "Robar Mike". If you saw that name there, you'd think Robar is a last name and you'd also pronounce it like "robber." Instead, Robar (say it with a French accent) is his first name and Mike is his last name. At any rate, I think you'll have a better matching experience if you can split first and last out into separate fields and match the individual pieces together. An exact last name match plus a partial first name match might suffice, especially in cases where legally they are "Franklin Roosevelt" and you have a candidate of "F. Roosevelt". Perhaps you have a rule that an initial letter can match. Or you don't.
Noise - as referenced in the JW post and my answer, strip out crap (punctuation, stop words, etc) for matching purposes. Also watch out for honorific titles (phd, jd, etc) and generationals (II, III, JR, SR). Our rule was a candidate with/without a generational could match one in the opposite state (Bob Jones Jr == Bob Jones) or could exactly match the generation (Bob Jones Sr = Bob Jones Sr) but you'd never want to match if both records supplied them and they were conflicting (Bob Jones Sr != Bob Jones Jr).
Case sensitivity, always check your database and tempdb to make sure you aren't making case sensitive matches. And if you are, convert everything to upper or lower for purposes of matching but don't ever throw the supplied casing away. Good luck trying to determine whether latessa should be Latessa, LaTessa or something else.
My query is coming up on an hour's worth of processing with no rows returned so I'm going to kill it and turn in. Best of luck, happy matching.

SQL query dynamic row generation with composite key

My question is made of 3 parts.
First part:
Is there a way to generate rows based on a value?
E.g:
I want to give each family a number of vouchers based on their family_members_count.
Each voucher should have a unique id:
Base table:
id name family_members_count
1 fadi 2
2 sami 3
3 ali 1
Result:
family_id name voucher_id
1 fadi 121
1 fadi 122
2 sami 123
2 sami 124
2 sami 125
3 ali 126
Second part:
Can I control the voucher_id composite key? I want the voucher_id to be like this
(location)(cycle)(sequence 5 digits)
If north = 08 and we are in the second cycle it should be:
080200001
080200002
... and so on.
Third part:
I need the solution in both MS Access 2010 SQL and PostgreSQL 9.1 SQL.
Question 1
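Use generate_series(). (In the coming version 9.3 look for the key word LATERAL.) Since generate_series() restarts at 1 for every family, number the expanded rows with row_number() to get one continuous sequence:
SELECT family_id
,name
,120 + row_number() OVER (ORDER BY family_id, n) AS voucher_id
FROM (
SELECT id AS family_id, name
,generate_series(1, family_members_count) AS n
FROM fam
) sub;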
-> sqlfiddle demo
Question 2
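SELECT family_id
,name
,location
|| to_char(cycle, 'FM00')
|| to_char(row_number() OVER (PARTITION BY location, cycle
ORDER BY family_id, n), 'FM00000')
AS voucher_id
FROM (
SELECT id AS family_id, name, location, cycle
,generate_series(1, family_members_count) AS n
FROM fam2
) sub;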
-> sqlfiddle demo
Note the use of to_char() to format numbers as text - and in particular the use of the FM pattern modifier to avoid leading white space.
Question 3
Sorry, I got MS Access out of my system 10 years ago and never looked back. Somebody else might fill in. I doubt it will be as simple.
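For reference, the row generation and the composite key layout from the question can be written out procedurally. The Python below is only a sketch of the logic, for example to produce rows that are then loaded into MS Access; the function name make_vouchers, the integer location and cycle parameters, and the single running sequence across all families are assumptions based on the question's example.

```python
def make_vouchers(families, location, cycle, start=1):
    """Yield (family_id, name, voucher_id) with one row per family member.

    voucher_id is location (2 digits) + cycle (2 digits) + sequence (5 digits).
    """
    seq = start
    rows = []
    for family_id, name, members in families:
        for _ in range(members):
            # Zero-padded composite key, e.g. 08 + 02 + 00001.
            rows.append((family_id, name, f"{location:02d}{cycle:02d}{seq:05d}"))
            seq += 1
    return rows


families = [(1, "fadi", 2), (2, "sami", 3), (3, "ali", 1)]
for row in make_vouchers(families, location=8, cycle=2):
    print(row)
```

The sequence keeps counting across families, so every voucher_id is unique within one location and cycle.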

SQL ordering hierarchy

I am working on a SQL Statement that I can't seem to figure out. I need to order the results alphabetically, however, I need "children" to come right after their "parent" in the order. Below is a simple example of the table and data I'm working with. All non relevant columns have been removed. I'm using SQL Server 2005. Is there an easy way to do this?
tblCats
=======
idCat | fldCatName | idParent
--------------------------------------
1 | Some Category | null
2 | A Category | null
3 | Top Category | null
4 | A Sub Cat | 1
5 | Sub Cat1 | 1
6 | Another Cat | 2
7 | Last Cat | 3
8 | Sub Sub Cat | 5
Desired result of the SQL statement:
A Category
    Another Cat
Some Category
    A Sub Cat
    Sub Cat1
        Sub Sub Cat
Top Category
    Last Cat
(The prefixed spaces in the result are just to aid in understanding the results; I don't want the prefixed spaces in my SQL result. The result only needs to be in this order.)
You can do it with a hierarchical query, as below.
It looks a lot more complicated than it is, due to the lack of a PAD function in T-SQL. The seed of the hierarchy is the set of categories without parents. The fourth column we select is their ranking alphabetically (converted to a string and padded). Then we union this with their children. At each recursion, the children will all be at the same level, so we can get their ranking alphabetically without needing to partition. We can concatenate these rankings together down the tree, and order by that.
;WITH Hierarchy AS (
SELECT
idCat, fldCatName, idParent,
CAST(RIGHT('00000'+
CAST(ROW_NUMBER() OVER (ORDER BY fldCatName) AS varchar(8))
, 5)
AS varchar(256)) AS strPath
FROM Category
WHERE idParent IS NULL
UNION ALL
SELECT
c.idCat, c.fldCatName, c.idParent,
CAST(h.strPath +
CAST(RIGHT('00000'+
CAST(ROW_NUMBER() OVER (ORDER BY c.fldCatName) AS varchar(8))
, 5) AS varchar(16))
AS varchar(256))
FROM Hierarchy h
INNER JOIN Category c ON c.idParent = h.idCat
)
SELECT idCat, fldCatName, idParent, strPath
FROM Hierarchy
ORDER BY strPath
With your data:
idCat fldCatName idParent strPath
------------------------------------------------
2 A Category NULL 00001
6 Another Category 2 0000100001
1 Some Category NULL 00002
4 A Sub Category 1 0000200001
5 Sub Cat1 1 0000200002
8 Sub Sub Category 5 000020000200001
3 Top Category NULL 00003
7 Last Category 3 0000300001
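The ordering this path column produces is a depth-first walk in which siblings are visited alphabetically. As a cross-check, here is a short Python sketch of that walk using the rows from the question; the function name ordered_categories is made up, and Python's default string comparison is case-sensitive, which may differ from the database collation.

```python
from collections import defaultdict


def ordered_categories(rows):
    """rows: (idCat, fldCatName, idParent). Returns names, parents before children."""
    children = defaultdict(list)
    for cat_id, name, parent_id in rows:
        children[parent_id].append((name, cat_id))

    result = []

    def visit(parent_id):
        # Siblings alphabetically, each followed at once by its own subtree.
        for name, cat_id in sorted(children[parent_id]):
            result.append(name)
            visit(cat_id)

    visit(None)  # top-level categories have no parent
    return result


rows = [
    (1, "Some Category", None), (2, "A Category", None),
    (3, "Top Category", None), (4, "A Sub Cat", 1),
    (5, "Sub Cat1", 1), (6, "Another Cat", 2),
    (7, "Last Cat", 3), (8, "Sub Sub Cat", 5),
]
for name in ordered_categories(rows):
    print(name)
```

The recursion depth equals the depth of the category tree, so this is fine for shallow hierarchies like the one shown.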
It can be done in CTE... Is this what you're after ?
With MyCats (CatName, CatId, CatLevel, SortValue)
As
( Select fldCatName CatName, idCat CatId,
    0 CatLevel, Cast(fldCatName As varChar(200)) SortValue
  From tblCats
  Where idParent Is Null
  Union All
  Select c.fldCatName CatName, c.idCat CatId,
    p.CatLevel + 1 CatLevel,
    Cast(p.SortValue + '\' + c.fldCatName as varChar(200)) SortValue
  From tblCats c Join MyCats p
    -- the CTE exposes the id as CatId, not idCat
    On p.CatId = c.idParent)
Select CatName, CatId, CatLevel, SortValue
From MyCats
Order By SortValue
EDIT: (thx to Pauls' comment below)
If 200 characters is not enough to hold the longest concatenated string "path", then change the value to as high as is needed... you can make it as high as 8000
I'm not aware of any SQL Server (or Ansi-SQL) inherent support for this.
I don't suppose you'd consider a temp table and a recursive stored procedure an "easy" way?
Paul's answer is excellent, but I thought I would throw in another idea for you. Joe Celko has a solution for this in his SQL for Smarties book (chapter 29). It involves maintaining a separate table containing the hierarchy info. Inserts, updates, and deletes are a little complicated, but selects are very fast.
Sorry I don't have a link or any code to post, but if you have access to this book, you may find this helpful.