SQL Order Chars Numerically - Node Grouping - sql

I have a column of numbers stored as chars separated by periods, which is used as a parent-child grouping mechanism. I'm having trouble ordering it because the column is a varchar, so 100 sorts before 11, as shown below (01.03.100 and 01.03.101 come before 01.03.11):
01
01.01
01.02
01.03
01.03.01
01.03.02
...
01.03.10
01.03.100
01.03.101
01.03.11
01.03.12
...
01.04
01.04.01
01.04.01.01
01.04.01.02
01.04.01.03
01.04.02
01.04.03
02
02.01
etc
Any thoughts on how I can order these chars numerically? There could potentially be unlimited child nodes, so something like this isn't impossible:
nn.nn.nn.nn.nn.nn.nn.nn.nn.nn etc
Thank you!

If there's a limit to the depth¹ of the tree, then you can write something like:
declare @t table (OrdCol varchar(50) not null)
insert into @t (OrdCol) values
('01'),
('01.01'),
('01.02'),
('01.03'),
('01.03.01'),
('01.03.02'),
('01.03.10'),
('01.03.100'),
('01.03.101'),
('01.03.11'),
('01.03.12'),
('01.04'),
('01.04.01'),
('01.04.01.01'),
('01.04.01.02'),
('01.04.01.03'),
('01.04.02'),
('01.04.03'),
('02'),
('02.01')
select OrdCol from
(select OrdCol,CAST('<a><b>' + REPLACE(OrdCol,'.','</b><b>') + '</b></a>' as xml) as xOrd from @t
) t
order by
xOrd.value('(a/b)[1]','int'),
xOrd.value('(a/b)[2]','int'),
xOrd.value('(a/b)[3]','int'),
xOrd.value('(a/b)[4]','int'),
xOrd.value('(a/b)[5]','int'),
xOrd.value('(a/b)[6]','int'),
xOrd.value('(a/b)[7]','int'),
xOrd.value('(a/b)[8]','int'),
xOrd.value('(a/b)[9]','int'),
xOrd.value('(a/b)[10]','int')
¹ This is why I asked, in a comment on your question, which way "unlimited" children was meant to be interpreted. This query handles an unlimited number of children at each level, but only a depth of up to 10.
Unlimited-depth version, which works provided there's at most one leading 0 on any of the numbers:
select OrdCol from
(select OrdCol,CAST(REPLACE(REPLACE('.' + OrdCol + '.','.0','.'),'.','/') as hierarchyid) as hOrd from @t
) t
order by
hOrd
This just munges the string into a format that can be cast to hierarchyid, which already sorts in the order you expect. Of course, if this works for your data, you might consider changing the column's data type to hierarchyid anyway.
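Both queries compare the path one component at a time as integers, with a parent sorting before its children. Below is a minimal Python sketch of that comparison, useful for checking the expected order outside the database. It assumes every component is a non-empty run of digits, and `node_sort_key` is a hypothetical helper name that is not part of either answer.

```python
from typing import Tuple


def node_sort_key(path: str) -> Tuple[int, ...]:
    # Split "01.03.100" into (1, 3, 100). Assumes every component is a
    # non-empty run of digits; int() raises ValueError otherwise.
    return tuple(int(part) for part in path.split("."))


# Tuples compare element by element, and a tuple that is a prefix of a
# longer one sorts first, so a parent precedes all of its children.
nodes = ["01.03.100", "02", "01.03.11", "01.03", "01.03.01", "01", "01.04.01.02"]
print(sorted(nodes, key=node_sort_key))
# ['01', '01.03', '01.03.01', '01.03.11', '01.03.100', '01.04.01.02', '02']
```

This is the same ordering that hierarchyid gives once the string has been munged into `/1/3/100/` form.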

Related

How to fetch only a part of string

I have a column which has inconsistent data. The column named ID and it can have values such as
0897546321
ABC,0876455321
ABC,XYZ,0873647773
ABC,
99756
test only
The SQL query should fetch only IDs that are 10 digits long, begin with 08, are not null, and are not made up entirely of characters. For values that contain both digits and characters, such as ABC,XYZ,0873647773, it should fetch only the 0873647773 part. Nothing in these values is fixed: in place of ABC, XYZ there can be anything, of any length.
The column Id is of varchar type.
My attempt: I tried the following query:
select id
from table
where id is not null
and id not like '%[^0-9]%'
and id like '[08]%[0-9]'
and len(id)=10
I am still not sure how I should deal with values like ABC,XYZ,0873647773.
P.S. I have no control over the database; I can't change its values.
SQL Server generally has poor support for regular expressions, but in this case a judicious use of PATINDEX is viable:
SELECT SUBSTRING(id, PATINDEX('%,08[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9],%', ',' + id + ','), 10) AS number
FROM yourTable
WHERE ',' + id + ',' LIKE '%,08[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9],%';
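The PATINDEX answer works by wrapping the value in commas, so a candidate only counts when it is a complete comma-delimited item. The position PATINDEX returns in the wrapped string is then exactly where the number starts in the original. Below is a minimal Python sketch of the same idea using a regular expression. `extract_08_id` is a hypothetical name, and like the SQL it returns only the first match.

```python
import re
from typing import Optional

# ",08" followed by exactly eight more digits, then a closing ",".
# [0-9] is used instead of \d so that only the ASCII digits 0-9 match.
ID_PATTERN = re.compile(r",(08[0-9]{8}),")


def extract_08_id(value: Optional[str]) -> Optional[str]:
    if value is None:
        return None  # mirrors the NULL check in the question
    match = ID_PATTERN.search("," + value + ",")
    return match.group(1) if match else None


for raw in ["0897546321", "ABC,0876455321", "ABC,XYZ,0873647773", "ABC,", "99756", "test only"]:
    print(repr(raw), "->", extract_08_id(raw))
```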
If you normalise your data and split the delimited data into parts, you can achieve this somewhat more easily:
SELECT SS.value
FROM dbo.YourTable YT
CROSS APPLY STRING_SPLIT(YT.YourColumn,',') SS
WHERE LEN(SS.value) = 10
AND SS.value LIKE '08%'
AND SS.value NOT LIKE '%[^0-9]%';
If you're on a version of SQL Server older than 2016, STRING_SPLIT isn't available. In that case you'll have to use an alternative string splitter, such as an XML splitter or a user-defined inline table-valued function; there are already plenty of examples of these on Stack Overflow.
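The STRING_SPLIT version produces one row per comma-delimited item and keeps only the items that pass the filter, so a single source value can yield several IDs. Below is a minimal Python sketch of that split-then-filter shape. It assumes plain commas with no surrounding spaces (STRING_SPLIT does not trim either), and `ids_in_value` is a hypothetical name.

```python
import re
from typing import List, Optional


def ids_in_value(value: Optional[str]) -> List[str]:
    # As with CROSS APPLY STRING_SPLIT, a NULL input produces no rows.
    if value is None:
        return []
    # Keep items that are exactly "08" followed by eight more ASCII digits.
    return [item for item in value.split(",") if re.fullmatch(r"08[0-9]{8}", item)]


print(ids_in_value("ABC,XYZ,0873647773"))
print(ids_in_value("0811111111,X,0822222222"))
```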

sql extract rightmost number in string and increment

I have transaction codes like:
"A0004", "1B2005","20CCCCCCC21"
I need to extract the rightmost number and increment the transaction code by one:
"AA0004"----->"AA0005"
"1B2005"------->"1B2006"
"20CCCCCCCC21"------>"20CCCCCCCC22"
This is in SQL Server 2012.
The length of the string is unknown.
The rightmost n characters are always a number.
Dealing with an unknown string length and an unknown number length is out of my league; some logic is always missing.
LEFT(@a,2)+RIGHT('000'+CONVERT(NVARCHAR,CONVERT(INT,SUBSTRING( SUBSTRING(@a,2,4),2,3))+1)),3
First, I want to be clear about this: I totally agree with the comments on the question from a_horse_with_no_name and Jeroen Mostert.
You should be storing one data point per column, period.
Having said that, I do realize that a lot of the time the database structure can't be changed, so here's one possible way to do that calculation for you.
First, create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(
col varchar(100)
);
INSERT INTO @T (col) VALUES
('A0004'),
('1B2005'),
('1B2000'),
('1B00'),
('20CCCCCCC21');
(I've added a couple of strings as edge cases you didn't mention in the question.)
Then, using a couple of CROSS APPLYs to minimize code repetition, I came up with this:
SELECT col,
LEFT(col, LEN(col) - LastCharIndex + 1) +
REPLICATE('0', LEN(NumberString) - LEN(CAST(NumberString as int))) +
CAST((CAST(NumberString as int) + 1) as varchar(100)) As Result
FROM @T
CROSS APPLY
(
SELECT PATINDEX('%[^0-9]%', Reverse(col)) As LastCharIndex
) As Idx
CROSS APPLY
(
SELECT RIGHT(col, LastCharIndex - 1) As NumberString
) As NS
Results:
col Result
A0004 A0005
1B2005 1B2006
1B2000 1B2001
1B00 1B01
20CCCCCCC21 20CCCCCCC22
LastCharIndex is the position of the last non-digit character, counted from the right end of the string (which is why the string is reversed).
NumberString is the number to increment, kept as a string to preserve any leading zeros.
From there, it's simply a matter of taking the left part of the string (up to the number) and concatenating it with a newly calculated number string. REPLICATE pads the result of the addition with the same number of leading zeros that the original number string had.
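Below is a minimal Python sketch of the same split / increment / re-pad steps, for checking expected results outside the database. `increment_trailing_number` is a hypothetical name.

One design choice differs from the SQL. This sketch pads back to the original width rather than re-adding the original count of leading zeros. The two agree on the sample rows above but differ when the increment carries: here A0009 gives A0010, whereas the REPLICATE expression gives A00010.

```python
import re


def increment_trailing_number(code: str) -> str:
    # Find the run of digits at the very end of the string (\Z rather
    # than $, so a trailing newline is not silently accepted).
    match = re.search(r"[0-9]+\Z", code)
    if match is None:
        raise ValueError(f"no trailing number in {code!r}")
    digits = match.group()
    # Add one, then left-pad with zeros back to the original width;
    # zfill never truncates, so "99" correctly becomes "100".
    bumped = str(int(digits) + 1).zfill(len(digits))
    return code[:match.start()] + bumped


for code in ["A0004", "1B2005", "1B2000", "1B00", "20CCCCCCC21"]:
    print(code, increment_trailing_number(code))
```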
Try this:
DECLARE @test nvarchar(1000) ='"A0004", "1B2005","20CCCCCCC21"'
DECLARE @Temp AS TABLE (ID INT IDENTITY,Data nvarchar(1000))
INSERT INTO @Temp
SELECT @test
;WITH CTE
AS
(
SELECT Id,LTRIM(RTRIM((REPLACE(Split.a.value('.' ,'nvarchar(max)'),'"','')))) AS Data
,RIGHT(LTRIM(RTRIM((REPLACE(Split.a.value('.' ,'nvarchar(max)'),'"','')))),1)+1 AS ReqData
FROM
(
SELECT ID,
CAST ('<S>'+REPLACE(Data,',','</S><S>')+'</S>' AS XML) AS Data
FROM @Temp
) AS A
CROSS APPLY Data.nodes ('S') AS Split(a)
)
SELECT CONCAT('"'+Data+'"','-------->','"'+CONCAT(LEFT(Data,LEN(Data)-1),CAST(ReqData AS VARCHAR))+'"') AS ExpectedResult
FROM CTE
Result
ExpectedResult
-----------------
"A0004"-------->"A0005"
"1B2005"-------->"1B2006"
"20CCCCCCC21"-------->"20CCCCCCC22"
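Another option, where @X is the code and @N is the increment step:
STUFF(@X
,LEN(@X)-CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END+1
,LEN(((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
,((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
It also works on number-only strings, and 99 becomes 100.
@N controls the increment: the number is rounded up to the next multiple of @N (use @N = 1 for a plain +1).
As written, it does not keep leading zeros (A0004 comes out as A5004), so it suits codes like 1B2005 better.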

Remove ASCII Extended Characters 128 onwards (SQL)

Is there a simple way to remove extended ASCII characters from a varchar(max)? I want to remove all characters from code 128 onwards, e.g. ù, ç, Ä.
I have tried this solution and it's not working; I think it's because they are still valid ASCII characters?
How do I remove extended ASCII characters from a string in T-SQL?
Thanks
The linked solution uses a loop, which is something you should avoid if possible.
My solution is completely inlineable; it's easy to create a UDF (or, even better, an inline TVF) from it.
The idea: create a set of running numbers. Here the maximum string length is limited by the number of rows in sys.objects, but there are plenty of examples of how to create a numbers tally on the fly. In the second CTE the strings are split into single characters. The final SELECT reassembles the cleaned string.
DECLARE @tbl TABLE(ID INT IDENTITY, EvilString NVARCHAR(100));
INSERT INTO @tbl(EvilString) VALUES('ËËËËeeeeËËËË'),('ËaËËbËeeeeËËËcË');
WITH RunningNumbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
FROM sys.objects
)
,SingleChars AS
(
SELECT tbl.ID,rn.Nmbr,SUBSTRING(tbl.EvilString,rn.Nmbr,1) AS Chr
FROM @tbl AS tbl
CROSS APPLY (SELECT TOP(LEN(tbl.EvilString)) Nmbr FROM RunningNumbers) AS rn
)
SELECT ID,EvilString
,(
SELECT '' + Chr
FROM SingleChars AS sc
WHERE sc.ID=tbl.ID AND ASCII(Chr)<128
ORDER BY sc.Nmbr
FOR XML PATH('')
) AS GoodString
FROM @tbl As tbl
The result
1 ËËËËeeeeËËËË eeee
2 ËaËËbËeeeeËËËcË abeeeec
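Outside the database, the same per-character filter (split into characters, keep those below 128, reassemble) is a one-liner. Below is a minimal Python sketch; `strip_non_ascii` is a hypothetical name.

`ord` returns the Unicode code point. T-SQL's ASCII() instead looks at the character's code in the varchar code page. The two may therefore disagree for characters outside that code page.

```python
def strip_non_ascii(text: str) -> str:
    # Keep only code points 0-127; everything from 128 onwards is dropped.
    return "".join(ch for ch in text if ord(ch) < 128)


print(strip_non_ascii("ËËËËeeeeËËËË"))     # eeee
print(strip_non_ascii("ËaËËbËeeeeËËËcË"))  # abeeeec
```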
Here is another answer of mine where this approach is used to replace all special characters with safe ones to get plain Latin.

Sort varchar datatype with numeric characters

SQL Server 2005
SQL sorting:
Data type: varchar
It should sort as:
1.aaaa
5.xx
11.bbbbbb
12
15.
How can I get this sort order?
Currently I get this (wrong) order:
1.aaaa
11.bbbbbb
12
15.
5.xx
On Oracle, this would work.
SELECT
*
FROM
table
ORDER BY
to_number(regexp_substr(COLUMN,'^[0-9]+')),
regexp_substr(column,'\..*');
You could do this by calculating a column based on what's on the left-hand side of the period ('.').
However, this method will be very difficult to make robust enough for a production system, unless you can make a lot of assertions about the content of the strings.
Also, handling strings without periods could cause some grief.
with r as (
select '1.aaaa' as string
union select '5.xx'
union select '11.bbbbbb'
union select '12'
union select '15.' )
select *
from r
order by
CONVERT(int, left(r.string, case when ( CHARINDEX('.', r.string)-1 < 1)
then LEN(r.string)
else CHARINDEX('.', r.string)-1 end )),
r.string
If all the entries have this form, you could split them into two parts and sort by these, for example like this:
ORDER BY
CONVERT(INT, SUBSTRING(fieldname, 1, CHARINDEX('.', fieldname + '.') - 1)),
SUBSTRING(fieldname, CHARINDEX('.', fieldname + '.') + 1, LEN(fieldname))
This should do a numeric sort on the part before the . and an alphanumeric sort on the part after the ., but it may need some tuning, as I haven't actually tried it.
Another (and faster) way might be to create computed columns that contain the parts before and after the . and sort by them.
A third way (if you can't create computed columns) could be to create a view over the table that has two additional columns with the respective parts of the field and then do the select on that view.
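The answers above all use the same two-part key: the leading digits as an integer, then the rest of the string as a tie-breaker. Below is a minimal Python sketch of that key, for checking the expected order. `prefix_sort_key` is a hypothetical name, and the sketch assumes every value starts with at least one digit.

```python
import re
from typing import Tuple


def prefix_sort_key(value: str) -> Tuple[int, str]:
    # Leading digits as an integer, then the remainder as a tie-breaker.
    match = re.match(r"[0-9]+", value)
    if match is None:
        raise ValueError(f"no leading number in {value!r}")
    return int(match.group()), value[match.end():]


rows = ["1.aaaa", "11.bbbbbb", "12", "15.", "5.xx"]
print(sorted(rows, key=prefix_sort_key))
# ['1.aaaa', '5.xx', '11.bbbbbb', '12', '15.']
```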

tsql, picking out value-pairs

I have a column that has the following data:
PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"
Is there any good, clean T-SQL syntax to grab the 551723 (the value associated with EntityId)? The combination of SUBSTRING and PATINDEX I'm using seems quite unwieldy.
That string looks just like an XML attribute list for an element, so you can wrap it in an XML element and use XPath:
declare @t table (t nvarchar(max));
insert into @t (t) values (
N'PersonId="315618" LetterId="43" MailingGroupId="1"
EntityId="551723" trackedObjectId="9538"
EmailAddress="myemailaddy@addy.com"');
with xte as (
select cast(N'<x '+t+N'/>' as xml) as x from @t)
select
n.value(N'@PersonId', N'int') as PersonId
, n.value(N'@LetterId', N'int') as LetterId
, n.value(N'@EntityId', N'int') as EntityId
, n.value(N'@EmailAddress', N'varchar(256)') as EmailAddress
from xte
cross apply x.nodes(N'/x') t(n);
Whether this is better or worse than string manipulation depends on a variety of factors, not least the size of the string and the number of records to parse. I prefer the simple, clean XPath syntax over CHARINDEX-based manipulation (the code is much more maintainable).
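The wrap-it-in-an-element trick carries over directly to any XML parser. Below is a minimal Python sketch using the standard library's ElementTree; `parse_attribute_list` is a hypothetical name.

Like the T-SQL version, it assumes the values are XML-safe: a raw `&` or `<` inside the quotes, or a repeated attribute name, makes the parse fail.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_attribute_list(text: str) -> Dict[str, str]:
    # Wrap the name="value" list in an empty element, as the
    # CAST(N'<x ' + t + N'/>' AS xml) trick does, then read the attributes.
    element = ET.fromstring("<x " + text + "/>")
    return dict(element.attrib)


sample = ('PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" '
          'trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"')
print(parse_attribute_list(sample)["EntityId"])  # 551723
```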
If that's the text in the column, then you're going to have to use substring at some stage.
declare @l_debug varchar(1000)
select @l_debug = 'PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"'
select substring(@l_debug, patindex('%EntityId="%', @l_debug)+ 10, 6)
If you don't know how long the EntityId value could be, you'll need to find the position of the next double quote after EntityId=" as well:
declare @l_debug varchar(1000), @l_sub varchar(100), @l_index2 numeric
select @l_debug = 'PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"'
select @l_sub = substring(@l_debug, patindex('%EntityId="%', @l_debug)+ 10 /*length of 'EntityId="'*/, len(@l_debug))
select @l_index2 = patindex('%"%', @l_sub)
select substring(@l_debug, patindex('%EntityId="%', @l_debug)+ 10, @l_index2 -1)
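Below is a minimal Python sketch of the same two-step search: find the key, then find the next double quote. `value_after_key` is a hypothetical helper.

Two differences and caveats apply. Unlike PATINDEX under a case-insensitive collation, `str.find` is case-sensitive. Also, like the SQL, it matches any key that merely ends with the name passed in, so searching for `Id` would hit `PersonId`.

```python
from typing import Optional


def value_after_key(text: str, key: str) -> Optional[str]:
    # Locate 'key="' (the PATINDEX step), then the next double quote
    # after it, and return whatever lies between them.
    marker = key + '="'
    start = text.find(marker)
    if start == -1:
        return None
    start += len(marker)
    end = text.find('"', start)
    if end == -1:
        return None
    return text[start:end]


sample = ('PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" '
          'trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"')
print(value_after_key(sample, "EntityId"))  # 551723
```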
If you possibly can, break out your data. Either normalize your tables or store XML in the column (with an XML data type) instead of name-value pairs. You'll then be able to use the full power and speed of SQL Server, or at least issue XPath queries (assuming a relatively recent version of SQL Server).
I know this probably won't help you in the short term, but it's a goal to work towards. :)
Substring(
Substring(EventArguments,PATINDEX('%EntityId%', EventArguments)+10,10),0,
PATINDEX('%"%', Substring(EventArguments,
PATINDEX('%EntityId%', EventArguments)+10,10))
)