Natural or Human Sort order - sql

I have been working on this for months. I just cannot get natural (true alphanumeric) results. I am shocked that I cannot get them, as I have been able to in RPG since 1992 with EBCDIC.
I am looking for any solution in SQL, VBS, or plain Excel or Access. Here is the data I have:
299-8,
3410L-87,
3410L-88,
420-A20,
420-A21,
420A-40,
4357-3,
AN3H10A,
K117GM-8,
K129-1,
K129-15,
K271B-200L,
K271B-38L,
K271D-200EL,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
M8000-3,
MS24665-1,
SK271B-200L,
SAYA4008
The order I am looking for is the true alpha-numeric order as below:
AN3H10A,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
K117GM-8,
K129-1,
K129-15,
MS24665-1,
M8000-3,
SAYA4008,
SK271B-200L
The inventory is 7800 records so I have had some problems with processing power as well.
Any help would be appreciated.
Jeff

In native Excel, you can add helper columns, one per character position, that each return the ASCII code of that character; if the character is a digit, add a large number (e.g. 1000) to the code so that digits sort after letters.
Then select the whole table, including the first column, and sort on the helper columns only (the first column is not one of the sort keys).
The formula:
=IFERROR(CODE(MID($A1,COLUMNS($A:A),1))+AND(CODE(MID($A1,COLUMNS($A:A),1))>=48,CODE(MID($A1,COLUMNS($A:A),1))<=57)*1000,"")
You can implement a similar algorithm using VBA, and probably SQL also. I dunno about VBS or Access.
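For anyone who wants to try the idea outside Excel, here is a minimal Python sketch of the same per-character scheme (one code per character, with 1000 added for digits). The function name letters_first_key and the sample list are made up for illustration, and it assumes plain ASCII part numbers like the ones in the question.

```python
def letters_first_key(part_number):
    """Return one sort code per character; ASCII digits are pushed past every letter."""
    return [
        ord(ch) + 1000 if "0" <= ch <= "9" else ord(ch)
        for ch in part_number
    ]


# A few part numbers from the question, deliberately shuffled.
parts = ["K129-15", "M8000-3", "KD1051", "K117GM-8", "MS24665-1", "K129-1", "AN3H10A"]
ordered = sorted(parts, key=letters_first_key)
print(ordered)
```

With this key, KD1051 sorts ahead of K117GM-8 and MS24665-1 ahead of M8000-3, which is the order asked for in the question.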

You could try using Format to left-pad the string in the ORDER BY:
select column
from my_table
order by Format(column, "0000000000")

Add a sorting column:
, iif (left(fieldname, 1) between '0' and '9', 1, 0) sortField
etc
order by sortField, FieldName

Let's say you have your data in column "A". If you put this formula in column "B" =IFERROR(IF(LEFT(A1,1)+1>0,"ZZZZZZZ "&A1,A1),A1), it will automatically add "ZZZZZZZ " in front of all values that start with a digit, so that they will naturally appear after all alphabetical values when you sort A-Z. Later you can find and replace that funny "ZZZZZZZ " string...

There are a number of approaches, but likely the least amount of work is to build two columns that split the value at the delimiter ("-" in this case).
You then "pad" the results (with spaces or 0s), right-justified, and then sort on the two columns.
So in the query builder we have this:
SELECT Field1,
Format(
Mid(field1,1,IIf(InStr(field1,"-")=0,50,InStr(field1,"-")-1)),
">##########") AS Expr1,
Format(
Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1)),
">##########") AS Expr2
FROM Data
Running the raw query above returns Field1 together with the two derived columns, Expr1 and Expr2.
So now, in the query builder, simply sort on the first derived column, and then on the second derived column.
Edit:
Looking at your desired results, it looks like the above sort is wrong. We have to pad on the RIGHT with 0's instead.
So this is the 2nd try:
SELECT Field1,
Left(Mid(field1,1,IIf(InStr(field1,"-")=0,30,InStr(field1,"-")-1))
& String(30,"0"),30) AS Expr1,
Left(Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1))
& String(30,"0"),30) AS Expr2
FROM Data
Given your small table size, the above query should perform quite well.
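To see what the second query's two derived columns look like, here is a rough Python sketch of the same split-and-pad idea. The name split_pad_key and the width parameter are hypothetical; the logic only approximates the Left(... & String(30,"0"),30) expressions above and is not a drop-in replacement for them.

```python
def split_pad_key(value, width=30):
    """Split at the first '-', then right-pad each half with '0' to a fixed width."""
    head, _, tail = value.partition("-")  # tail is "" when there is no '-'
    pad = lambda text: text.ljust(width, "0")[:width]
    return pad(head), pad(tail)


print(split_pad_key("K129-15", width=6))
print(split_pad_key("KD1051", width=8))
```

One thing the sketch makes visible: because the padding character is "0", values such as K129-1 and K129-10 end up with identical keys, so it is worth checking the real data for that case.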

Related

SQL full text search behavior on numeric values

I have a table with about 200 million records. One of the columns is defined as varchar(100) and it's included in a full text index. Most of the values are numeric. Only a few are not numeric.
The problem is that it's not working well. For example, if a row contains the value '123456789' and I look for '567', it's not returning this row. It will only return rows where the value is exactly '567'.
What am I doing wrong?
SQL Server 2012.
Thanks.
Full text search doesn't support leading wildcards
In my setup, these return the same
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'28400')
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"2840*"')
This gives zero rows
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"*840*"')
You'll have to use LIKE or some fancy trigram approach
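In case "trigram approach" is unfamiliar, here is a small Python sketch of the idea: index every 3-character slice of each value, use the index to narrow down candidates, then confirm with a real substring check. This is only an illustration of the technique; the function names are made up and none of this is SQL Server functionality.

```python
from collections import defaultdict


def trigrams(text):
    """All 3-character slices of text (empty set if text is shorter than 3)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(values):
    """Map each trigram to the set of row positions whose value contains it."""
    index = defaultdict(set)
    for row_id, value in enumerate(values):
        for gram in trigrams(value):
            index[gram].add(row_id)
    return index


def search(index, values, needle):
    """Return the values that contain needle anywhere, using the index to prune."""
    grams = trigrams(needle)
    if not grams:
        # Needle shorter than 3 characters: the index cannot help, so scan.
        return [v for v in values if needle in v]
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing all trigrams does not guarantee a match, so verify each candidate.
    return [values[i] for i in sorted(candidates) if needle in values[i]]


values = ["123456789", "567", "998877", "45670"]
index = build_index(values)
print(search(index, values, "567"))
```

The point of the index is that only rows sharing every trigram of the search term get the expensive substring test, instead of all 200 million.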
The problem is probably that you are using the wrong tool, since full-text queries perform linguistic searches and it seems like you want a simple LIKE condition.
If you want to get a solution to your needs then you can post DDL+DML+'desired result'
You can do this:
....your_query.... LIKE '%567%' ;
This will return all the rows that have 567 at the beginning, at the end, or somewhere in between.
99% you're missing the % before and after the string you search for in the LIKE clause.
e.g.:
SELECT * FROM t WHERE att LIKE '66'
is the same as using WHERE att = '66'.
If you write:
SELECT * FROM t WHERE att LIKE '%66%'
it will return all the rows containing two sixes one after the other.

Pentaho Dynamic SQL queries

I have a Pentaho CDE project in development and I wanted to display a chart which depends on several parameters (like month, year, precise date, country, etc.). But when I want to "add" another parameter to my query, it doesn't work anymore... So I'm sure I'm doing something wrong, but what? Please take a look at the month parameter, for example:
Select_months_query : (this is for my checkbox)
SELECT
"All" AS MONTH(TransactionDate)
UNION
SELECT DISTINCT MONTH(TransactionDate) FROM order ORDER BY MONTH(TransactionDate);
Select_barchart_query : (this is for my chart, don't mind the other tables)
SELECT pginit.Family, SUM(order.AmountEUR) AS SALES
FROM pginit INNER JOIN statg ON pginit.PG = statg.PGInit INNER JOIN order ON statg.StatGroup = order.StatGroup
WHERE (MONTH(order.TransactionDate) IN (${month}) OR "All" IN (${month}) OR ${month} IS NULL) AND
/*/* Apply the same pattern for another parameter (like year for example) *\*\
GROUP BY pginit.Family
ORDER BY SALES;
(Here, ${month} is a parameter in CDE)
Any ideas on how to do it?
I read something there that said to use CASE clauses... But how?
http://forums.pentaho.com/showthread.php?136969-Parametrized-SQL-clause-in-CDE&highlight=dynamic
Thank you for your help!
Try simplifying that query until it runs and returns something and work from there.
Here are some things I would look into as possible causes:
I think you need single quotes around ${parameter} expressions if they're strings;
"All" should probably be 'All' (single quotes instead of double quotes);
Avoid multi-line comments. I don't think you can have multi-line comments in CDE SQL queries, although "--" for single-line comments usually works.
Be careful with multi-valued parameters; they are passed as arrays, which CDA will convert into comma-separated lists. Try with a single-valued parameter, using = instead of IN.

How to order by a text column that contains int in SQL?

My current SQL statement is:
SELECT distinct [Position] FROM [Drive List] ORDER BY [Position] ASC
And the output is ordered as seen below:
1_A_0_0_0_0_0
1_A_0_0_0_0_1
1_A_0_0_0_0_10
1_A_0_0_0_0_11
1_A_0_0_0_0_12
1_A_0_0_0_0_13 - 1_A_0_0_0_0_24, and then 0_2-0_9
The field type is Text in a Microsoft Access Database. Why is the order jumbled and is there any way of correctly sorting the values?
"Why the order is jumbled":
The order only looks jumbled because you are reading it with your human brain and giving the values more meaning than the computer does, based on your understanding of what they represent. Read the output as though you could only see an array of character strings and were trying to decide which string is greater, knowing nothing about the numeric meaning of each character. You will find that the output your query generated is perfectly logical and not at all jumbled.
"Any way of correctly sorting the values"
This is a design issue and it should be addressed if it really is a problem.
Change 1_A_0_0_0_0_0 to 1_A_0_0_0_0_00
Change 1_A_0_0_0_0_1 to 1_A_0_0_0_0_01
Change 1_A_0_0_0_0_2 to 1_A_0_0_0_0_02
etc
This will make the problem go away.
Use these two separate queries:
SELECT distinct [Position] FROM [Drive List] WHERE [Position] LIKE '1_A_0_0_0_0_?' ORDER BY [Position] ASC
SELECT distinct [Position] FROM [Drive List] WHERE [Position] LIKE '1_A_0_0_0_0_??' ORDER BY [Position] ASC
...add to a temp table and append to get the results to display properly.
If you want sorting which incorporates the numerical values of those substrings, you can cast them to numbers.
In the simplest case, you're concerned with only the digit(s) after the 12th character. That case would be fairly easy.
SELECT
sub.Position,
Left(sub.Position, 12) AS sort_1,
Val(Mid(sub.Position, 13)) AS sort_2
FROM
(
SELECT DISTINCT [Position] FROM [Drive List]
) AS sub
ORDER BY 2, 3;
Or if you want to display only the Position field, you could do it this way ...
SELECT
sub.Position
FROM
(
SELECT DISTINCT [Position] FROM [Drive List]
) AS sub
ORDER BY
Left(sub.Position, 12),
Val(Mid(sub.Position, 13));
However, your actual situation could be much more challenging ... perhaps the initial substring (everything up to and including the final _ character) is not consistently 12 characters long, and/or includes digits which you also want sorted numerically. You could then use a mix of InStr(), Mid(), and Val() expressions to parse out the values to sort. But that task could get scary bad real fast! It could be less effort to alter the stored values so they sort correctly in character order, as @Justin suggested.
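To show what the more general parsing amounts to, here is a short Python sketch (not Access SQL) that splits a Position value on every _ and compares the all-digit pieces as numbers. The name position_key is hypothetical, and it assumes the pieces are either pure ASCII digits or plain text.

```python
def position_key(position):
    """Split on '_' and compare numeric pieces as integers, other pieces as text."""
    key = []
    for part in position.split("_"):
        if part.isascii() and part.isdigit():
            key.append((0, int(part), ""))  # numeric pieces sort by value
        else:
            key.append((1, 0, part))        # text pieces sort after numbers, by text
    return key


positions = [
    "1_A_0_0_0_0_10",
    "1_A_0_0_0_0_2",
    "1_A_0_0_0_0_0",
    "1_A_0_0_0_0_11",
    "1_A_0_0_0_0_1",
]
print(sorted(positions, key=position_key))
```

Doing the same thing inside a query means writing one expression per piece, which is why fixing the stored values can be the cheaper route.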
Your jumbled order is caused because it is a text field. As for solutions you could attempt to add an additional column in your table that is numeric and order by that instead of Position. I would need more information about what data you have and what it means to suggest a good way to do this.
This is the right sort for a string.
In alphabetical order, 10 comes before 2.
To order by the last number, you can try the SUBSTRING and CAST (or CONVERT) functions.

Problem with MySQL Select query with "IN" condition

I found a weird problem with MySQL select statement having "IN" in where clause:
I am trying this query:
SELECT ads.*
FROM advertisement_urls ads
WHERE ad_pool_id = 5
AND status = 1
AND ads.id = 23
AND 3 NOT IN (hide_from_publishers)
ORDER BY rank desc
In above SQL hide_from_publishers is a column of advertisement_urls table, with values as comma separated integers, e.g. 4,2 or 2,7,3 etc.
So if two records have the hide_from_publishers values above, the query should return only the record with "4,2", but it returns both records.
Now, if I change the value of hide_from_publishers for the second record to 3,2,7 and run the query again, it returns a single record, which is the correct output.
If I use direct values there instead of hide_from_publishers, i.e. (2,7,3), it does recognize them and returns a single record.
Any thoughts about this strange problem or am I doing something wrong?
There is a difference between the tuple (1, 2, 3) and the string "1, 2, 3". The former is three values, the latter is a single string value that just happens to look like three values to human eyes. As far as the DBMS is concerned, it's still a single value.
If you want more than one value associated with a record, you shouldn't be storing it as a comma-separated value within a single field, you should store it in another table and join it. That way the data remains structured and you can use it as part of a query.
You need to treat the comma-delimited hide_from_publishers column as a string. You can use the LOCATE function to determine if your value exists in the string.
Note that I've added leading and trailing commas to both strings so that a search for "3" doesn't accidentally match "13".
select ads.*
from advertisement_urls ads
where ad_pool_id = 5
and status = 1
and ads.id = 23
and locate(',3,', concat(',', hide_from_publishers, ',')) = 0
order by rank desc
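The reason for wrapping both strings in commas is easy to check outside the database. Here is a tiny Python sketch comparing a bare substring test with the comma-wrapped one; the function names are made up for illustration, and it assumes the list is stored without spaces, as in the question.

```python
def in_csv_naive(needle, csv):
    """Bare substring test: finds '3' inside '13' as well."""
    return needle in csv


def in_csv_wrapped(needle, csv):
    """Wrap both sides in commas so only whole list items can match."""
    return f",{needle}," in f",{csv},"


print(in_csv_naive("3", "4,13"))     # the false positive
print(in_csv_wrapped("3", "4,13"))
print(in_csv_wrapped("3", "2,7,3"))
```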
You need to split the string of values into separate values. See this SO question...
Can Mysql Split a column?
As well as the supplied example...
http://blog.fedecarg.com/2009/02/22/mysql-split-string-function/
Here is another SO question:
MySQL query finding values in a comma separated string
And the suggested solution:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set

How do I sort a VARCHAR column in PostgreSQL that contains words and numbers?

I need to order a select query by a varchar column, using numerical and text order. The query will be run from a Java program, using JDBC against PostgreSQL.
If I use ORDER BY in the select clause I obtain:
1
11
2
abc
However, I need to obtain:
1
2
11
abc
The problem is that the column can also contain text.
This question is similar (but targeted for SQL Server):
How do I sort a VARCHAR column in SQL server that contains words and numbers?
However, the solution proposed did not work with PostgreSQL.
Thanks in advance, regards,
I had the same problem and the following code solves it:
SELECT ...
FROM table
order by
CASE WHEN column < 'A'
THEN lpad(column, size, '0')
ELSE column
END;
The size var is the length of the varchar column, e.g. 255 for varying(255).
You can use a regular expression to do this kind of thing:
select THECOL from ...
order by
case
when substring(THECOL from '^\d+$') is null then 9999
else cast(THECOL as integer)
end,
THECOL
First you use a regular expression to detect whether the content of the column is a number or not. In this case I use '^\d+$' but you can modify it to suit the situation.
If the regexp doesn't match, return a big number so this row will fall to the bottom of the order.
If the regexp matches, convert the string to number and then sort on that.
After this, sort regularly with the column.
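The four steps above can be sketched in Python as a sort key, which may help when checking the expected order from the Java side. Note one deliberate swap: instead of the 9999 sentinel, this sketch uses a 0/1 flag to push non-numeric values to the bottom. The name numbers_then_text_key is hypothetical.

```python
import re

ALL_DIGITS = re.compile(r"[0-9]+")


def numbers_then_text_key(value):
    """Numeric strings first, ordered by value; everything else after, ordered as text."""
    if ALL_DIGITS.fullmatch(value):
        return (0, int(value), value)
    return (1, 0, value)


rows = ["1", "11", "2", "abc"]
print(sorted(rows, key=numbers_then_text_key))
```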
I'm not aware of any database having a "natural sort" like the one some know from PHP. All I've found is various functions:
Natural order sort in Postgres
Comment in the PostgreSQL ORDER BY documentation