Order By clause not sorting properly - sql

This is one of the most interesting issues that I have come across in quite some time. I have a scalar function defined in SQL Server 2008 whose return type is varchar(max).
This is the query:
Select dbo.GetJurisdictionsSubscribed(u.UserID) as 'Jurisdiction' From Users u ORDER BY Jurisdiction desc
Could anybody explain why AAAA... is the 2nd record in the result set? I am doing a descending sort, so AAA... should appear last. If I change the query to
Jurisdiction asc
AAA goes 2nd last in the list instead of the 1st record.
This is the screenshot of the resultset: http://i48.tinypic.com/23j5vzq.jpg
Am I missing something?

That is the correct sort order. You have spaces. You must read Case Sensitive Collation Order.

Because, as you can see in your screenshot, there is a white space before the word 'Wise' in the other rows (and a white space sorts before 'A').
You can left trim this spaces with:
ORDER BY ltrim( Jurisdiction ) desc

Notice the leading white spaces. Try:
SELECT ...
FROM ...
ORDER BY LTRIM(Jurisdiction) desc
LTRIM would be fine.

It's a bit hard to tell on the screenshot, but can you check the length of the values because I think some of them have a leading space. If so then they would be sorted correctly.
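The effect the answers describe can be reproduced outside the database. The following is a minimal sketch in Python, not SQL Server itself: the sample values are hypothetical, and Python compares by code point rather than by a SQL Server collation, but in both cases a leading space sorts before any letter.

```python
# Hypothetical sample values; two of them carry a leading space.
rows = ["AAAA", " Wise", "Zulu", " Brown"]

# Plain descending sort: the space (code point 32) sorts before every letter,
# so the padded values fall to the end and "AAAA" lands above them.
plain_desc = sorted(rows, reverse=True)

# Sorting on the left-trimmed value, which is what LTRIM does in the ORDER BY.
# Note: str.lstrip removes all leading whitespace, not only spaces.
trimmed_desc = sorted(rows, key=str.lstrip, reverse=True)

print(plain_desc)    # ['Zulu', 'AAAA', ' Wise', ' Brown']
print(trimmed_desc)  # ['Zulu', ' Wise', ' Brown', 'AAAA']
```

With the trimmed key, "AAAA" ends up last in the descending order, which is what the question expected.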

Related

How to get highest alphanumeric value in Postgres

I have the following table of inventory stock symbols and want to fetch the highest alphanumeric value which is AR-JS-20. When I say "highest" I mean that the letter order is sorted first and then numbers are factored in, so AR-JS-20 is higher than AL-JS-20.
BTW, I don't want to split anything into parts because it is unknown what symbols vendors will send me in the future.
I simply want an alphanumeric sort like you sort a computer directory by name, where dashes, underscores, asterisks, etc. come first, then numbers, and letters last, with cascading priority where the first character in the symbol has the most weight, then the second character and so on.
NOTE: The question has been edited so some of the answers below no longer apply.
AL-JS-20
AR-JS-20
AR-JS-9
AB-JS-8
AA-JS-1
1A-LM-30
2BA2-1
45HT
So ideally if this table was sorted to my requirements it would look like this
AR-JS-20
AR-JS-9
AB-JS-8
AL-JS-20
AA-JS-1
45HT
2BA2-1
1A-LM-30
However, when I use this query:
select max(symbol) from stock
I get:
AR-JS-9
but what I want to get is: AR-JS-20
I also tried:
select max(symbol::bytea) from stock
But this triggers error:
function max(bytea) does not exist
There is a dedicated tag for this group of problems: natural-sort (I added it now.)
Ideally, you store the string part and the numeric part in separate columns.
While stuck with your unfortunate symbols ...
If your symbols are as regular as the sample suggests, plain left() and split_part() can do the job:
SELECT symbol
FROM stock
ORDER BY left(symbol, 5) DESC NULLS LAST
, split_part(symbol, '-', 3)::int DESC NULLS LAST
LIMIT 1;
Or, if at least the two dashes (three dash-separated parts) are a given:
...
ORDER BY split_part(symbol, '-', 1) DESC NULLS LAST
, split_part(symbol, '-', 2) DESC NULLS LAST
, split_part(symbol, '-', 3)::int DESC NULLS LAST
LIMIT 1
See:
Split comma separated column data into additional columns
Or, if the format is not as rigid: regular expression functions are more versatile, but also more expensive:
...
ORDER BY substring(symbol, '^\D+') DESC NULLS LAST
, substring(symbol, '\d+$')::int DESC NULLS LAST
LIMIT 1;
^ ... anchor to the start of the string
$ ... anchor to the end of the string
\D ... class shorthand for non-digits
\d ... class shorthand for digits
Taking only (trailing) digits, we can safely cast to integer (assuming numbers < 2^31), and sort accordingly.
Add NULLS LAST if any part can be missing, or the column can be NULL.
Specify a custom order by that trims everything up to the last - and converts the remaining number to int and take the first:
select stock_code
from mytable
order by regexp_replace(stock_code, '-?[0-9]+-?', ''), regexp_replace(stock_code, '[^0-9]', '', 'g')::int
limit 1
See live demo.
This works for numbers at both start and end of code:
regexp_replace(stock_code, '-?[0-9]+-?', '') "deletes" digits and any adjacent dashes
regexp_replace(stock_code, '[^0-9]', '', 'g') "deletes" all non-digits (the 'g' flag makes the replacement apply to every match, not just the first)
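To see what the regular-expression approach does to the sample data, here is a rough Python sketch of the same idea: compare the leading non-digit text first, then the trailing number as an integer. The function name `sort_key` is made up for illustration, and mapping a missing part to "" or -1 is an assumption meant to mimic NULLS LAST in a descending sort.

```python
import re

symbols = ["AL-JS-20", "AR-JS-20", "AR-JS-9", "AB-JS-8",
           "AA-JS-1", "1A-LM-30", "2BA2-1", "45HT"]

def sort_key(symbol):
    # Leading run of non-digits, like substring(symbol, '^\D+').
    prefix = re.match(r"\D+", symbol)
    # Trailing run of digits, like substring(symbol, '\d+$').
    number = re.search(r"\d+$", symbol)
    # Assumption: a missing part sorts lowest ("" or -1).
    return (prefix.group() if prefix else "",
            int(number.group()) if number else -1)

highest = max(symbols, key=sort_key)
print(highest)  # AR-JS-20
```

Because the trailing digits are compared as integers, 20 beats 9, which is where the plain string max() in the question went wrong.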

Select column ignore beginning numbers

I have a column that I need to select but it has an inconsistent amount of numbers/formatting in the beginning
The column values are ideally supposed to be structured like:
# Question_-_Answer
But here are some examples which make it hard to remove the numbers in the beginning
0 Question1_-_50-60
1.Question_-_apple
12Question_-_40/50
13 Question_-_orange
14.Question_-_apple
15. Question_-_orange2
Is there a way I can query this column so that it ignores everything until the first alphabetical character while also not removing any characters/alphanumerical values in the question and answer portion?
You can use PATINDEX and STUFF to achieve this:
SELECT STUFF(V.YourString,1,PATINDEX('%[A-Za-z]%',V.YourString)-1,'')
FROM (VALUES('0 Question1_-_50-60'),
('1.Question_-_apple'),
('12Question_-_40/50'),
('13 Question_-_orange'),
('14.Question_-_apple'),
('15. Question_-_orange2'))V(YourString);
This removes all characters up to the first alpha character.
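For comparison, the same "drop everything before the first letter" rule can be sketched in Python with a regular expression. This is only an illustration of the logic, not a replacement for the T-SQL above; the helper name is hypothetical, and a string containing no letters at all comes back empty in this sketch.

```python
import re

values = ["0 Question1_-_50-60", "1.Question_-_apple", "12Question_-_40/50",
          "13 Question_-_orange", "14.Question_-_apple", "15. Question_-_orange2"]

def strip_leading_non_letters(text):
    # Remove everything before the first ASCII letter; the rest is untouched.
    return re.sub(r"^[^A-Za-z]*", "", text)

cleaned = [strip_leading_non_letters(v) for v in values]
for line in cleaned:
    print(line)
```

Digits and punctuation inside the question and answer parts (such as "Question1" or "40/50") are preserved, because only the prefix before the first letter is removed.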

Understanding why length in SQL DB2 would return short results as the max character length

I have a query that pulls the count of all last names in our DB and sorts the count by the length of the last name. This is a VARCHAR field with a max length of 120.
Some results that are a much shorter character length - 5, 6, 7, etc characters - are showing as 120. Using a RTRIM seems to get the right results, but I am confused as to why when I don't have the RTRIM why most values calculate correctly, but some don't. While I know I have the right results with the RTRIM, I just want to understand why some cases don't pull that correctly without it.
SELECT LENGTH(NAME_LAST), COUNT(*)
FROM database
GROUP BY LENGTH(NAME_LAST)
ORDER BY LENGTH(NAME_LAST) DESC;
Db2 does not trim trailing spaces unless you ask it to with e.g. RTRIM
$ db2 "create table t(v varchar(120))"
$ db2 "insert into t values space(120)"
$ db2 "select length(v) from t"
1
-----------
120
1 record(s) selected.
$ db2 "select length(rtrim(v)) from t"
1
-----------
0
1 record(s) selected.
You can have leading/trailing whitespace or other non-printable characters. Try concatenating quotes or some other characters around the selected column and it will highlight them for you. Or, as @mao suggests, show the hex values.
Does this help answer your question?
"If a grouping-expression contains varying-length strings with trailing blanks, the values in the group can differ in the number of trailing blanks and might not all have the same length. In that case, a reference to grouping-expression still specifies only one value for each group, but the value for a group is chosen arbitrarily from the available set of values. Thus, the actual length of the result value is unpredictable."
https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/sqlref/src/tpc/db2z_sql_groupbyclause.html
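A small Python sketch makes the grouping effect concrete. The names below are hypothetical, and Python's len stands in for LENGTH; the point is only that values padded with trailing blanks are counted at their padded length until they are trimmed.

```python
from collections import Counter

# Hypothetical last names; two were stored padded with trailing blanks.
names = ["Smith", "Jones" + " " * 115, "Lee", "Nguyen" + " " * 114]

# Counting by raw length: the padded values both report 120.
raw_lengths = Counter(len(n) for n in names)

# Counting by trimmed length, the equivalent of LENGTH(RTRIM(NAME_LAST)).
trimmed_lengths = Counter(len(n.rstrip(" ")) for n in names)

print(sorted(raw_lengths.items(), reverse=True))      # [(120, 2), (5, 1), (3, 1)]
print(sorted(trimmed_lengths.items(), reverse=True))  # [(6, 1), (5, 2), (3, 1)]
```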

Remove unnecessary Characters by using SQL query

Do you know how to remove the kinds of characters below in a single query?
Note: I'm retrieving this data from the Access app and putting only the valid data into SQL.
select DISTINCT ltrim(rtrim(a.Company)) from [Legacy].[dbo].[Attorney] as a
This column is the company name column. I need to keep string characters only. I need to remove numbers-only rows, rows with numbers and characters, NULL, empty values, and everything else such as + and -.
Based on your extremely vague "rules" I am going to make a guess.
Maybe something like this will be somewhere close.
select DISTINCT ltrim(rtrim(a.Company))
from [Legacy].[dbo].[Attorney] as a
where LEN(ltrim(rtrim(a.Company))) > 1
and IsNumeric(a.Company) = 0
This will exclude entries that are shorter than 2 characters and entries that can be converted to a number.
This should select the rows you want to delete:
where company not like '%[a-zA-Z]%' or -- contains no letters at all
company like '%[^ a-zA-Z0-9.&]%' -- contains a character outside the allowed set
The list of allowed characters in the second expression may not be complete.
If this works, then you can easily adapt it for a delete statement.
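The two pattern tests can be tried on a few made-up company names with a short Python sketch. It assumes a row should be deleted when either condition holds (no letters at all, or any character outside the allowed set); the sample names and the helper `should_delete` are hypothetical.

```python
import re

# Hypothetical company values, including the kinds the question wants removed.
companies = ["Smith & Co.", "12345", "+-", "", "Acme Legal 2", "Doe/Roe"]

has_letter = re.compile(r"[A-Za-z]")
disallowed = re.compile(r"[^ A-Za-z0-9.&]")

def should_delete(name):
    # Delete when there is no letter at all, or any character
    # outside the allowed set (space, letters, digits, '.', '&').
    return has_letter.search(name) is None or disallowed.search(name) is not None

kept = [c for c in companies if not should_delete(c)]
print(kept)  # ['Smith & Co.', 'Acme Legal 2']
```

As the answer notes, the allowed set is probably incomplete; widening it is a matter of adding characters to the second pattern.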

How to do string manipulation in SQL query

I know I'm close to figuring this out but need a little help. What I'm trying to do is grab a column from a particular table, but chop off the first 4 characters. For example, if the value in a column is "KPIT08L", the result I want is 08L. Here is what I have so far, but I'm not getting the desired results.
SELECT LEFT(FIELD_NAME, 4)
FROM TABLE_NAME
First up, left will give you the leftmost characters. If you want the characters starting at a specific location, you need to look into mid:
select mid (field_name,5) ...
Secondly, if you value performance, portability and scalability at all, this sort of "sub-column" manipulation should generally be avoided. It's usually far easier (and faster) to patch columns together than to split them apart.
In other words, keep the first four characters in their own column and the rest in a separate column, and do your selects on the relevant one. If you're using anything less than a full column, then it's technically not one attribute of the row.
Try with
SELECT MID(FIELD_NAME, 5) FROM TABLE_NAME
Mid is very powerful: it lets you select the starting point and all the remainder, or,
if specified, the desired length, as in
SELECT MID(FIELD_NAME, 5, 2) FROM TABLE_NAME ' gives 08 in your example text
SELECT RIGHT(FIELD_NAME,LEN(FIELD_NAME)-4)
FROM TABLE_NAME;
If it is for a generic string then the above one will work...
Don't have Access at my current location, but please try this.
SELECT RIGHT(FIELD_NAME, LEN(FIELD_NAME)-4)
FROM TABLE_NAME
LEFT(FIELD_NAME, 4) will return the first 4 characters of FIELD_NAME.
What you need to do is :
SELECT MID(FIELD_NAME, 5)
FROM TABLE_NAME
If you have a FIELD_NAME of 10 characters, the function will return the last 6 characters (chopping off the first 4)!
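The difference between LEFT, MID and RIGHT is easy to check with plain string slicing. The Python helpers below are hypothetical stand-ins written for this illustration, assuming MID uses a 1-based start position as in the answers above.

```python
def left(text, count):
    # First `count` characters, like LEFT(text, count).
    return text[:count]

def mid(text, start, length=None):
    # `start` is 1-based, like MID; `length` is optional.
    begin = start - 1
    return text[begin:] if length is None else text[begin:begin + length]

def right(text, count):
    # Last `count` characters, like RIGHT(text, count).
    return text[len(text) - count:]

value = "KPIT08L"
print(left(value, 4))                # KPIT
print(mid(value, 5))                 # 08L
print(mid(value, 5, 2))              # 08
print(right(value, len(value) - 4))  # 08L
```

MID from position 5 and RIGHT with the length minus 4 give the same result, which is why both forms appear in the answers.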