SQL Sorting numeric and string - sql

Hi I have interesting problem, I have about 1500 records within a table. the format of column I need to sort against is
String Number.number.(optional number) (optional string)
In reality this could look like this:
AB 2.10.19
AB 2.10.2
AB 2.10.20 (I)
ACA 1.1
ACA 1.9 (a) V
I need a way to sort these so that instead of
AB 2.10.19
AB 2.10.2
AB 2.10.20 (I)
I get this
AB 2.10.2
AB 2.10.19
AB 2.10.20 (I)
Because of the lack of standard formatting I'm at a loss as to how I can sort this via SQL.
I'm at the point of just manually identifying a new int column to denote the sorting value, unless anyone has any suggestion?
I'm using SQL Server 2008 R2

You would need to sort on the first text token, then on the second text token (which is not a number, its a string comprising some numbers) then optionally on any remaining text.
To make the 2nd token sort correctly (like a version number I presume) you can use a hierarchyid:
with t(f) as (
select 'AB 2.10.19' union all
select 'AB 2.10.2' union all
select 'AB 2.10.20 (I)' union all
select 'AB 2.10.20 (a) Z' union all
select 'AB 2.10.21 (a)' union all
select 'ACA 1.1' union all
select 'ACA 1.9 (a) V' union all
select 'AB 4.1'
)
select * from t
order by
left(f, charindex(' ', f) - 1),
cast('/' + replace(substring(f, charindex(' ', f) + 1, patindex('%[0-9] %', f + ' ') - charindex(' ', f)) , '.', '/') + '/' as hierarchyid),
substring(f, patindex('%[0-9] %', f + ' ') + 1, len(f))
f
----------------
AB 2.10.2
AB 2.10.19
AB 2.10.20 (a) Z
AB 2.10.20 (I)
AB 2.10.21 (a)
AB 4.1
ACA 1.1
ACA 1.9 (a) V

add text for the same length
SELECT column
FROM table
ORDER BY left(column + replicate('*', 100500), 100500)

--get the start and end position of numeric in the string
with numformat as
(select val,patindex('%[0-9]%',val) strtnum,len(val)-patindex('%[0-9]%',reverse(val))+1 endnum
from t
where patindex('%[0-9]%',val) > 0) --where condition added to exclude records with no numeric part in them
--get the substring based on the previously calculated start and end positions
,substrng_to_sort_on as
(select val, substring(val,strtnum,endnum-strtnum+1) as sub from numformat)
--Final query to sort based on the 1st,2nd and the optional 3rd numbers in the string
select val
from substrng_to_sort_on
order by
cast(substring(sub,1,charindex('.',sub)-1) as numeric), --1st number in the string
cast(substring(sub,charindex('.',sub)+1,charindex('.',reverse(sub))) as numeric), --second number in the string
cast(reverse(substring(reverse(sub),1,charindex('.',reverse(sub))-1)) as numeric) --third number in the string
Sample demo

Try this:
SELECT column
FROM table
ORDER BY CASE WHEN SUBSTRING(column,LEN(column)-1,1) = '.'
THEN 0
ELSE 1
END, column
This will put any strings that have a . in the second to last position first in the ordering.
Edit:
On second thought, this won't work with the leading 'AB', 'ACA' etc. Try this instead:
SELECT column
FROM table
ORDER BY SUBSTRING(column,1,2), --This will deal with leading letters up to 2 chars
CASE WHEN SUBSTRING(column,LEN(column)-1,1) = '.'
THEN 0
ELSE 1
END,
Column
Edit2:
To also compensate for the second numeric set, use this:
SELECT column
FROM table
ORDER BY substring(column,1,2),
CASE WHEN substring(column,charindex('.',column) + 2,1) = '.' and substring(column,len(column)-1,1) = '.' THEN 0
WHEN substring(column,charindex('.',column) + 2,1) = '.' and substring(column,len(column)-1,1) <> '.' THEN 1
WHEN substring(column,charindex('.',column) + 2,1) <> '.' and substring(column,len(column)-1,1) = '.' THEN 2
ELSE 3 END, column
Basically, this is a manual way to force hierarchical ordering by accounting for each condition.

Related

SQL add column based on if statement

I have a table with a 4 character (col1) that I would like to:
split in half
use an if statement to recode the left 2 characters as specific integers
concat left onto rght (separated by '.') to generate the fin column below.
col1
left
rght
fin
xx01
xx
01
01.1
xy01
xy
01
01.2
xz01
xz
01
01.3
Bit of a SQL noob so appreciate any advice. Thanks!
As col1 length is 4 digit fixed so here half means 2 digit. You can do ORDER BY operation either full value of col1 or LEFT(col1, 2).
-- SQL SERVER
SELECT col1, LEFT(col1, 2) "left"
, RIGHT(col1, 2) "right"
, RIGHT(col1, 2) + '.' + CAST((ROW_NUMBER() OVER (ORDER BY col1)) AS VARCHAR(2)) fin
FROM test
Please check from url https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=d76d9c504472f93ed3e4c53b88b94f57
Using CASE statement
SELECT col1, LEFT(col1, 2) "left"
, RIGHT(col1, 2) "right"
-- , RIGHT(col1, 2) + '.' + CAST((ROW_NUMBER() OVER (ORDER BY col1)) AS VARCHAR(2)) fin
, RIGHT(col1, 2) + '.' + CASE LEFT(col1, 2) WHEN 'xx' THEN '1' WHEN 'xy' THEN '2' ELSE '3' END fin
FROM test
Please check from url https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=a1f10444fd50c9c1c541c49d8f23344f

how to replace a value for string if letters are missing?

I ran into a problem where i have to create a 'LettersOfName' column. As name suggest I have to get letter 2,3 and 5 from ORGANISATIONNAME column and letters 2 and 3 from CLIENTLASTNAME column, then concatenate to form letters of name column. The condition is if letters of name is not equal to length 5 than replace with '22222' also if any of the letters is missing from first name and last name than replace with '22222'. I am using this query.
select
( CASE WHEN LENGTH (UPPER( SUBSTR(ORGANISATIONNAME, 2,2) || SUBSTR(ORGANISATIONNAME,5,1)) || UPPER(SUBSTR(CLIENTLASTNAME,2,2))) != '5' THEN '22222'
ELSE UPPER( SUBSTR(ORGANISATIONNAME, 2,2) || SUBSTR(ORGANISATIONNAME,5,1)) || UPPER(SUBSTR(CLIENTLASTNAME,2,2)) END)
AS LETTERSOFNAME
from client;
So, far this query runs fine, but when we have name like 'Jo Anne' or 'J Shark' it is missing letter '2' and '3' but does not replace the string with '22222'. When length is not equal to 5 it replaces with '22222'. I am using Oracle 12c.
If after the concatenations of the letters you remove all the spaces and the length of the remaining string is less than 5 then replace with '22222':
SELECT
CASE
WHEN LENGTH(REPLACE(SUBSTR(ORGANISATIONNAME, 2, 2) || SUBSTR(ORGANISATIONNAME, 5, 1) || SUBSTR(CLIENTLASTNAME, 2, 2), ' ', '')) < 5 THEN '22222'
ELSE UPPER(SUBSTR(ORGANISATIONNAME, 2, 2) || SUBSTR(ORGANISATIONNAME, 5, 1) || SUBSTR(CLIENTLASTNAME, 2, 2))
END LETTERSOFNAME
FROM client
Or with a CTE:
WITH cte AS (
SELECT
UPPER(REPLACE(
SUBSTR(ORGANISATIONNAME, 2, 2) ||
SUBSTR(ORGANISATIONNAME, 5, 1) ||
SUBSTR(CLIENTLASTNAME, 2, 2),
' ',
''
)) LETTERSOFNAME
FROM client
)
SELECT
CASE
WHEN LENGTH(LETTERSOFNAME) < 5 THEN '22222'
ELSE LETTERSOFNAME
END LETTERSOFNAME
FROM cte
See the demo.
You should first remove the white space between the string and and then apply your case statement on it
replace ('J Shark', ' ', '')
Reason is white space is being counted as a character in J Shark and that is why second and third characters are missing.
Here is an example demo.
Here is my approach:
Put both columns ORGANISATIONNAME and CLIENTLASTNAME to another table with identity column (to identify each row)
Write a function to split text by a string (in this case pass a space)
Get the identity and the splitted data to 2 tables each for column 1 and 2
Consider each table and apply your logic
Concatenate the row values separated by space, with the ID (1 record per ID)
Join the 2 tables (by IDs)
Join the 2 tables for matches in Col-Split data, and get the IDs
Now Query for the data in table in above 1

SQL Server Recursive CTE not returning expected rows

I'm building a Markov chain name generator. I'm trying to replace a while loop with a recursive CTE. Limitations in using top and order by in the recursive part of the CTE have led me down the following path.
The point of all of this is to generate names, based on a model, which is just another word that I've chunked out into three character segments, stored in three columns in the Markov_Model table. The next character in the sequence will be a character from the Markov_Model, such that the 1st and 2nd characters in the model match the penultimate and ultimate character in the word being generated. Rather than generate a probability matrix for the that third character, I'm using a scalar function that finds all the characters that fit the criteria, and gets one of them randomly: order by newid().
The problem is that this formulation of the CTE gets the desired number of rows in the anchor segment, but the union that recursively calls the CTE only unions one row from the anchor. I've attached a sample of the desired output at the bottom.
The query:
;with names as
(
select top 5
cast('+' as nvarchar(50)) as char1,
cast('+' as nvarchar(50)) as char2,
cast(char3 as nvarchar(50)) as char3,
cast('++' + char3 as nvarchar(100)) as name_in_progress,
1 as iteration
from markov_Model
where char1 is null
and char2 is null
order by newid() -- Get some random starting characters
union all
select
n.char2 as char1,
n.char3 as char2,
cast(fnc.addition as nvarchar(50)) as char3,
cast(n.name_in_progress + fnc.addition as nvarchar(100)),
1 + n.iteration
from names n
cross apply (
-- This function takes the preceding two characters,
-- and gets a random character that follows the pattern
select isnull(dbo.[fn_markov_getNext] (n.char2, n.char3), ',') as addition
) fnc
)
select *
from names
option (maxrecursion 3) -- For debug
The trouble is the union only unions one row.
Example output:
char1 char2 char3 name_in_progress iteration
+ + F ++F 1
+ + N ++N 1
+ + K ++K 1
+ + S ++S 1
+ + B ++B 1
+ B a ++Ba 2
B a c ++Bac 3
a c h ++Bach 4
Note I'm using + and , as null replacers/delimeters.
What I want to see is the entirety of the previous recursion, with the addition of the new characters to the name_in_progress; each pass should modify the entirely of the previous pass.
My desired output would be:
Top 10 of the Markov_Model table:
Text of the function that gets the next character from the Markov_Model:
CREATEFUNCTION [dbo].[fn_markov_getNext]
(
#char2 nvarchar(1),
#char3 nvarchar(1)
)
RETURNS nvarchar(1)
AS
BEGIN
DECLARE #newChar nvarchar(1)
set #newChar = (
select top 1
isnull(char3, ',')
from markov_Model mm
where isnull(mm.char1, '+') = isnull(#char2, '+')
and isnull(mm.char2, '+') = isnull(#char3, ',')
order by (select new_id from vw_getGuid) -- A view that calls newid()
)
return #newChar
END

Using to SQL to transform and combine strings

Currently, I have an data set that is structured as follows:
CREATE TABLE notes (
date DATE NOT NULL,
author VARCHAR(100) NOT NULL,
type CHAR NOT NULL,
line_number INT NOT NULL,
note VARCHAR(4000) NOT NULL
);
Some sample date:
Date, Author, Type, Line Number, Note
2015-01-01, Abe, C, 1, First 4000 character string
2015-01-01, Abe, C, 2, Second 4000 character string
2015-01-01, Abe, C, 3, Third 4000 character string
2015-01-01, Bob, C, 1, First 4000 character string
2015-01-01, Bob, C, 2, Second 1000 character string
2015-01-01, Cal, C, 1, First 3568 character string
This data is to be migrated to a new SQL Server structure that is defined as:
CREATE TABLE notes (
date DATE NOT NULL,
author VARCHAR(100) NOT NULL,
type CHAR NOT NULL,
note VARCHAR(8000) NOT NULL
);
I would like to prefix to the multi-line (those with more than 8000 characters when combined) Notes with "Date - Author - Part X of Y // ", and place a space between concatenated strings so the data would end up like:
Date, Author, Type, Note
2015-01-01, Abe, C, 2015-01-01 - Abe - Part 1 of 2 // First 4000 character string First 3959 characters of the second 4000 character string
2015-01-01, Abe, C, 2015-01-01 - Abe - Part 2 of 2 // Remaining 41 characters of the second 4000 character string Third (up to) 4000 character string
2015-01-01, Bob, C, First 4000 character string Second 1000 character string
2015-01-01, Cal, C, First 3568 character string
I'm looking for ways to accomplish this transformation. Initially, I had an intermediate step to simple combine (coalesce) all the Note strings where Date, Author, Type are shared together but was not able to split.
Okay, so, this was a bit of a challenge but I got there in the end. Has been a thoroughly enjoyable distraction from my regular work :D
The code assumes that you will never have a note that is longer than 72,000 total characters, in that the logic which works out how much extra text is added by the Part x in y prefix assumes that x and y are single digit numbers. This could easily be remedied by padding any single digits with leading zeros, which would also ensure ordering is correct.
If you need anything explained, the comments in the code should be sufficient:
-- Declare the test data:
declare #a table ([Date] date
,author varchar(100)
,type char
,line_number int
,note varchar(8000)
,final_line int
,new_lines int
)
insert into #a values
('2015-01-01','Abel','C',1,'This is a note that is 100 characters long----------------------------------------------------------' ,null,null)
,('2015-01-01','Abel','C',2,'This is a note that is 100 characters long----------------------------------------------------------' ,null,null)
,('2015-01-01','Abel','C',3,'This is a note that is 83 characters long------------------------------------------' ,null,null)
,('2015-01-01','Bob' ,'C',1,'This is a note that is 100 characters long----------------------------------------------------------' ,null,null)
,('2015-01-01','Bob' ,'C',2,'This is a note that is 43 characters long--' ,null,null)
,('2015-01-01','Cal' ,'C',1,'This is a note that is 50 characters long---------' ,null,null)
---------------------------------------
-- Start the actual data processing. --
---------------------------------------
declare #MaxFieldLen decimal(10,2) = 100 -- Set this to your 8000 characters limit you have. I have used 100 so I didn't have to generate and work with really long text values.
-- Create Numbers table. This will perform better if created as a permanent table:
if object_id('tempdb..#Numbers') is not null
drop table #Numbers
;with e00(n) as (select 1 union all select 1)
,e02(n) as (select 1 from e00 a, e00 b)
,e04(n) as (select 1 from e02 a, e02 b)
,e08(n) as (select 1 from e04 a, e04 b)
,e16(n) as (select 1 from e08 a, e08 b)
,e32(n) as (select 1 from e16 a, e16 b)
,cte(n) as (select row_number() over (order by n) from e32)
select n-1 as Number
into #Numbers
from cte
where n <= 1000001
-- Calculate some useful figures to be used in chopping up the total note. This will need to be done across the table before doing anything else:
update #a
set final_line = t.final_line
,new_lines = t.new_lines
from #a a
inner join (select Date
,author
,type
,max(line_number) as final_line -- We only want the final line from the CTE later on, so we need a way of identifying that the line_number we are working with the last one.
-- Calculate the total number of lines that will result from the additional text being added:
,case when sum(len(note)) > #MaxFieldLen -- If the Note is long enough to be broken into two lines:
then ceiling( -- Find the next highest integer value for
sum(len(note)) -- the total length of all the notes
/ (#MaxFieldLen - len(convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part x of x //_')) -- divided by the max note size allowed minus the length of the additional text.
)
else 1 -- Otherwise return 1.
end as new_lines
from #a
group by Date
,author
,type
) t
on a.Date = t.Date
and a.author = t.author
and a.type = t.type
-- Combine the Notes using a recursive cte:
;with cte as
(
select Date
,author
,type
,line_number
,final_line
,note
,new_lines
from #a
where line_number = 1
union all
select a.Date
,a.author
,a.type
,a.line_number
,a.final_line
,c.note + a.note
,a.new_lines
from cte c
join #a a
on c.Date = a.Date
and c.author = a.author
and c.type = a.type
and c.line_number+1 = a.line_number
)
select c1.Date
,c1.author
,c1.type
,c2.note
from cte c1
cross apply (select case when c1.new_lines > 1 -- If there is more than one line to be returned, build up the prefix:
then convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part ' + cast(Number+1 as nvarchar(10)) + ' of ' + cast(c1.new_lines as nvarchar(10)) + ' // '
+ substring(c1.note -- and then append the next (Max note length - Generated prefix) number of characters in the note:
,1 + Number * (#MaxFieldLen - len(convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part x of x //_'))
,(#MaxFieldLen - len(convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part x of x //_'))-1
)
else c1.note
end as note
from #Numbers
where Number >= 0
and Number < case when c1.new_lines = 1
then 1
else len(c1.note) / (#MaxFieldLen - len(convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part x of x //_'))
end
) c2
where line_number = final_line
order by 1,2,3,4

Parsing expression only with sql \ finding an order of expression's values

Is there a way to find an order of words/letters inside an expression found in the database?
To be more clear here is an example:
From table X i'm getting the Names: "a" and "b".
In other table there is the expression: "b + a",
The result I need is b,1 | a,2
Is there any way to do it using only SQL query?
P.S. I didn't find any reference to this subject...
Beautiful question! Take a look at this solution wchich breaks expression into list of identifiers:
DECLARE #val varchar(MAX) = 'b * (c + a) / (b - c)';
WITH Split AS
(
SELECT 1 RowNumber, LEFT(#val, PATINDEX('%[^a-z]%', #val)-1) Val, STUFF(#val, 1, PATINDEX('%[^a-z]%', #val), '')+'$' Rest
UNION ALL
SELECT RowNumber+1 Rownumber, LEFT(Rest, PATINDEX('%[^a-z]%', Rest)-1) Val, STUFF(Rest, 1, PATINDEX('%[^a-z]%', Rest), '') Rest
FROM Split
WHERE PATINDEX('%[^a-z]%', Rest)<>0
)
SELECT Val, ROW_NUMBER() OVER (ORDER BY MIN(RowNumber)) RowNumber FROM Split
WHERE LEN(Val)<>0
GROUP BY Val
It yields following results (only first occurences):
b 1
c 2
a 3
If executed with DECLARE #val varchar(MAX) = 'as * (c + a) / (bike - car)' returns:
as 1
c 2
a 3
bike 4
car 5
(From an similar question)
You can do it with CHARINDEX() that searches for a substring within a larger string, and returns the position of the match, or 0 if no match is found.
CHARINDEX(' a ',' ' + REPLACE(REPLACE(#mainString,'+',' '),'.',' ') + ' ')
Add more recursive REPLACE() calls for any other punctuation that may occur
For your question here is an example:
INSERT INTO t1 ([name], [index])
SELECT name, CHARINDEX(' ' + name + ' ',' ' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE('b * (c + a) / (b - c)','+',' '),'-',' '),'*',' '),'(',' '),')',' '),'/',' ') + ' ')
FROM t2
The result will be:
a, 10
b, 1
c, 6