Order by a field containing Numbers and Letters - sql

I need to extract data from an existing Paradox database under Delphi XE2 (yes, more than 10 years separate them...).
I need to order the result by a field (id in the example) containing values such as '1', '2 a', '100', '1 b', '50 bis'... and get this:
- 1
- 1 b
- 2 a
- 50 bis
- 100
Maybe something like this could do it, but these keywords don't exist:
SELECT id, TRIM(TRIM(ALPHA FROM id)) as generated, TRIM(TRIM(NUMBER FROM id)) as generatedbis, etc
FROM "my.db"
WHERE ...
ORDER BY generated, generatedbis
How could I achieve such an ordering with Paradox?

Try this:
SELECT id, CAST('0' + id AS INTEGER) A
FROM "my.db"
ORDER BY A, id

These ideas spring to mind:
- Create a sort function in Delphi that does the sort client-side, using a comparison/mapping function that rearranges the string into something that is comparable, maybe lexicographically.
- Add a column to the table whose data you wish to sort, containing a modified form of the values that can be compared with a standard string comparison and thus works with ORDER BY.
- Add a stored function to Paradox that does the modification of the values, and use this function in the ORDER BY clause.
By modification, I mean something like: separate the string into components, and re-join them with each component right-padded with enough spaces so that all of the components are in the same position in the string. This will only work reliably if you can say with confidence that, for each of the components, no value will exceed a certain length in the database.
I am making these suggestions with little/no knowledge of Paradox or Delphi, so you will have to take them with a grain of salt.
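The "modification" idea above can be sketched client-side. This is only an illustration in Python, not Delphi or Paradox code: the function name `padded_sort_key` and the widths are hypothetical, and the numeric component is padded on the left with zeros (an adjustment, since right-padding digits would still put '100' before '2'), while the text component is right-padded with spaces.

```python
import re

def padded_sort_key(value: str, num_width: int = 6, text_width: int = 10) -> str:
    """Rebuild the id as a fixed-layout string that sorts with a plain string comparison.

    Assumes each id is an optional number followed by an optional text suffix,
    and that neither part exceeds the chosen width.
    """
    match = re.match(r"\s*(\d*)\s*(.*)", value)
    number = match.group(1)
    suffix = match.group(2).strip()
    # Numbers are right-aligned (zero-padded), suffixes are left-aligned (space-padded).
    return number.rjust(num_width, "0") + suffix.ljust(text_width)

ids = ["1", "2 a", "100", "1 b", "50 bis"]
ordered = sorted(ids, key=padded_sort_key)
print(ordered)
```

The same key could be stored in an extra column so that a plain ORDER BY on that column gives the wanted order.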

Related

Can I turn multiple values stored in a single field into a set of rows from within a select statement?

This is asked regarding an Oracle 11g database.
I'm trying to query an Atlassian Confluence calendar table. It stores calendar entries for an entire calendar into a single value in a single row, which is this gigantic glob of iCal crap.
If the fields within each entry were in a consistent order, my regex fu would be strong enough to parse out the particular entry I am searching for... but since I need to search for a date, the description, and the summary, all of which can apparently be in any order within the BEGIN/END VEVENT, this is impossible. I'm halfway certain it would be impossible even with lookahead and lookbehind.
Is there a sql (not pl-sql) construction that would chop this single string/blob value out into multiple rows, so that I could do something like:
select * from (chopped up value) where x like '%something%';
This would make it sort of the reverse of a wm_concat() or group_concat...
A typical entry looks something like this (and it has 50 or 60 already):
BEGIN:VEVENT
UID:20130724T153322Z--922125579@atlassianzzz.zzz.edu
SUMMARY:Richard Smichard
ATTENDEE;X-CONFLUENCE-USER=rismich:https://atlassianzzz.zzz.edu/c
onfluence/display/~rismich
LOCATION:
DESCRIPTION:Primary
DTSTART;VALUE=DATE:20130726
DTEND;VALUE=DATE:20130729
DTSTAMP:20130724T153322Z
CREATED:20130724T153322Z
LAST-MODIFIED:20130724T153322Z
ORGANIZER;X-CONFLUENCE-USER=botard:MAILTO:botard@zzz.edu
SEQUENCE:0
END:VEVENT
I can't use PL-SQL or build a proper parser because the environment this will run in doesn't make that possible. I get to run a select statement, and it either returns the value I'm looking for, or it doesn't.
Also, NoSQL sucks. Big time.
This is a quick test:
with w1 as
(
select 'BEGIN:VEVENT\
UID:20130724T153322Z--922125579@atlassianzzz.zzz.edu
SUMMARY:Richard Smichard
ATTENDEE;X-CONFLUENCE-USER=rismich:https://atlassianzzz.zzz.edu/c
onfluence/display/~rismich
LOCATION:
DESCRIPTION:Primary
DTSTART;VALUE=DATE:20130726
DTEND;VALUE=DATE:20130729
DTSTAMP:20130724T153322Z
CREATED:20130724T153322Z
LAST-MODIFIED:20130724T153322Z
ORGANIZER;X-CONFLUENCE-USER=botard:MAILTO:botard@zzz.edu
SEQUENCE:0
END:VEVENT' text from dual
),
w2 as
(
select 'SUMMARY' label from dual
union all
select 'DESCRIPTION' label from dual
)
select regexp_substr(w1.text, 'UID.*') id, w2.label,
substr(regexp_substr(w1.text, w2.label || '.*'),
instr(regexp_substr(w1.text, w2.label || '.*'), ':') + 1) spl
from w1, w2;
It gives:
1 UID:20130724T153322Z--922125579@atlassianzzz.zzz.edu SUMMARY Richard Smichard
2 UID:20130724T153322Z--922125579@atlassianzzz.zzz.edu DESCRIPTION Primary
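To make the logic of the query above easier to follow outside Oracle, here is a rough Python sketch of the same idea: find the UID line, then for each wanted label find its line (wherever it appears in the entry) and keep what follows the first colon. The function name `extract_rows` and the shortened sample entry are assumptions made for illustration only; this is not something the original environment could run.

```python
import re

# Shortened sample entry, based on the one in the question.
ICAL_TEXT = """BEGIN:VEVENT
UID:20130724T153322Z--922125579@atlassianzzz.zzz.edu
SUMMARY:Richard Smichard
LOCATION:
DESCRIPTION:Primary
DTSTART;VALUE=DATE:20130726
END:VEVENT"""

def extract_rows(text, labels):
    """Return one (uid, label, value) row per label, like the w1 x w2 cross join."""
    uid = re.search(r"UID.*", text).group(0)
    rows = []
    for label in labels:
        # '.*' stops at the end of the line, so field order does not matter.
        line = re.search(re.escape(label) + r".*", text).group(0)
        rows.append((uid, label, line[line.index(":") + 1:]))
    return rows

rows = extract_rows(ICAL_TEXT, ["SUMMARY", "DESCRIPTION"])
for row in rows:
    print(row)
```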

Create a select statement that turns one column into multiple columns using CHARINDEX or something similar

1) Question 1: Splitting after the 5th digit. Basically I need to split my column after the 5th digit, so that 123456 creates two columns, one with 12345 and a second with 6. I have tried the code below with no result. There are no spaces or symbols, just digits.
substring(COA.UserCode2,5,charindex('',COA.UserCode2)) as Account,
substring(COA.UserCode2,6,charindex('',COA.UserCode2)) as Project
2) Question 2: Splitting on every *. I can get the first one below (Fund) to work, but my Cost Center and Source don't. Basically, if I have a string like 1234*34*500, I need the Fund column to contain 1234 (this I have already), the Cost Center column to contain 34, and the Source column to contain 500.
substring(COA.UserCode3, 1,charindex('*',COA.UserCode3)) as Fund,
substring(COA.UserCode3, 3,charindex('*',COA.UserCode3)+1) as CostCenter,
substring(COA.UserCode3, 1,charindex('*',COA.UserCode3)) as Source
Not knowing what DBMS you use (and whether it has specific SQL extensions, etc.) or what the data looks like, here are at least some suggestions:
1) If the data is fixed in size (always 6 chars long) you could do this:
SUBSTRING(UserCode2,1,5) as Account, SUBSTRING(UserCode2,6,1) as Project
2) For the second question you could do any of these:
If the data is fixed:
SUBSTRING(UserCode3,1,4) as Fund,
SUBSTRING(UserCode3,6,2) as CostCenter,
SUBSTRING(UserCode3,9,3) as Source
Or if the data is variable and you have to split on the * chars:
SUBSTRING(UserCode3,1,CHARINDEX('*',UserCode3,1)-1) as Fund,
SUBSTRING(UserCode3,CHARINDEX('*',UserCode3,1)+1,CHARINDEX('*',UserCode3,CHARINDEX('*',UserCode3,1)+1) - CHARINDEX('*',UserCode3,1)-1) as CostCenter,
RIGHT(UserCode3,3) as Source
Doing this many CHARINDEX and SUBSTRING functions would probably be bad for performance, but without knowing more about the data it's a bit difficult making informed suggestions.
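The position arithmetic behind the variable-length version can be easier to check in a small client-side sketch. The following Python mirrors the CHARINDEX/SUBSTRING logic with `str.find` and slicing; the function names are hypothetical, and it assumes exactly two `*` separators, as in the example 1234*34*500.

```python
def split_user_code3(code):
    """Split 'fund*costcenter*source' using separator positions, like CHARINDEX."""
    first = code.find("*")               # position of the first '*'
    second = code.find("*", first + 1)   # position of the second '*'
    fund = code[:first]
    cost_center = code[first + 1:second]
    source = code[second + 1:]
    return fund, cost_center, source

def split_user_code2(code):
    """Split a digit string after the 5th character (Account, Project)."""
    return code[:5], code[5:]

print(split_user_code3("1234*34*500"))
print(split_user_code2("123456"))
```

Unlike RIGHT(UserCode3,3), the slice after the second separator does not assume that the last part is always three characters long.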

Splitting text in SQL Server stored procedure

I'm working with a database, where one of the fields I extract is something like:
1-117 3-134 3-133
Each of these number sets represents a different set of data in another table. Taking 1-117 as an example, 1 = equipment ID, and 117 = equipment settings.
I have another table from which I need to extract data based on the previous field. It has two columns that split equipment ID and settings. Essentially, I need a way to go from the queried column 1-117 and run a query to extract data from another table where 1 and 117 are two separate corresponding columns.
So, is there any way to split this number to run this query?
Also, how would I split those three numbers (1-117 3-134 3-133) into three different query sets?
The tricky part is that this column can have any number of sets (such as 1-117 3-133 or 1-117 3-134 3-133 2-131).
I'm creating these queries in a stored procedure as part of a larger document to display the extracted data.
Thanks for any help.
Since you didn't provide the DB vendor, here are two posts that answer this question for SQL Server and Oracle respectively...
T-SQL: Opposite to string concatenation - how to split string into multiple records
Splitting comma separated string in a PL/SQL stored proc
And if you're using some other DBMS, go search for "splitting text ". I can almost guarantee you're not the first one to ask, and there's answers for every DBMS flavor out there.
As you said the format is constant though, you could also do something simpler using a SUBSTRING function.
EDIT in response to OP comment...
Since you're using SQL Server, and you said that these values are always in a consistent format, you can do something as simple as using SUBSTRING to get each part of the value and assign them to T-SQL variables, where you can then use them to do whatever you want, like using them in the predicate of a query.
Assuming that what you said is true about the format always being #-### (exactly 1 digit, a dash, and 3 digits) this is fairly easy.
WITH EquipmentSettings AS (
-- One row per "#-###" set; set number k starts at character k * 6 - 5
SELECT
S.*,
Convert(int, Substring(S.AwfulMultivalue, V.number * 6 - 5, 1)) EquipmentID,
Convert(int, Substring(S.AwfulMultivalue, V.number * 6 - 3, 3)) Settings
FROM
SourceTable S
INNER JOIN master.dbo.spt_values V
-- The last set has no trailing space, hence the + 1
ON V.number BETWEEN 1 AND (Len(S.AwfulMultivalue) + 1) / 6
WHERE
V.type = 'P'
)
SELECT
E.Whatever,
D.Whatever
FROM
EquipmentSettings E
INNER JOIN DestinationTable D
ON E.EquipmentID = D.EquipmentID
AND E.Settings = D.Settings
In SQL Server 2005+ this query will support 1365 values in the string.
If the length of the digits can vary, then it's a little harder. Let me know.
If there are never more than 4 sets, you can use PARSENAME to retrieve the result:
Declare @Num varchar(20)
Set @Num='1-117 3-134 3-133'
select parsename(replace (@Num,' ','.'),3)
Result :- 1-117
Now use parsename again on that result:
Select parsename(replace(parsename(replace (@Num,' ','.'),3),'-','.'),1)
Result :- 117
If there are more than 4 values, then use split functions.
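For the case where the number of sets and the length of the digits can vary, the splitting itself is simple when done client-side. A minimal Python sketch, assuming the sets are separated by spaces and each set is "equipmentID-settings"; the function name `parse_equipment_sets` is hypothetical.

```python
def parse_equipment_sets(value):
    """Turn '1-117 3-134 3-133' into a list of (equipment_id, settings) pairs."""
    pairs = []
    for token in value.split():            # one token per set, any number of sets
        equipment_id, settings = token.split("-", 1)
        pairs.append((int(equipment_id), int(settings)))
    return pairs

pairs = parse_equipment_sets("1-117 3-134 3-133 2-131")
print(pairs)
```

Each pair can then be used as the two values in the predicate of the lookup query against the other table.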

How to handle string ordering in order by clause?

Suppose I want to order records by a string field called STORY_LENGTH. This field is multi-valued, and I represent the multiple values using commas. For example, record1 has the value "1", record2 has "1,3", and record3 has "1,2". When I order the records by STORY_LENGTH, they come out as record1 > record3 > record2. It's clear that STORY_LENGTH is a string and that ORDER BY ASC compares the values as strings. But here comes the problem: when record4 = "10" and record5 = "2", the result is record4 > record5, which obviously I don't want, because 2 should come before 10. I am only using a string because the field can hold multiple values.
So, anybody, can you help me out of this? I need some good idea to fix.
thanks
Multi-values fields as you describe mean your data model is broken and should be normalized.
Once this is done, querying becomes much more simple.
From what I've understood, you want to sort items by the second or first number in comma-separated values stored in a VARCHAR field. The implementation depends on the database used; for example, in MySQL it would look like:
SELECT * FROM stories
ORDER BY CAST(COALESCE(SUBSTRING_INDEX(story_length, ',', -1), '0') AS SIGNED)
However, such sorting is generally not a good idea for performance reasons, as it requires scanning the whole table instead of using an index on the field.
Edit: After the edits, it looks like you want to sort on the first value and ignore the value(s) after the comma. Since, according to a comment above, changes to the database design are not an option, just use the following code for sorting:
SELECT * FROM stories
ORDER BY CAST(COALESCE(NULLIF(SUBSTRING_INDEX(story_length, ',', 1), ''), '0') AS SIGNED)
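The second query boils down to "take the text before the first comma, treat an empty or missing value as 0, and compare as a number". A small Python sketch of that key, with hypothetical names and sample data, in case the sorting has to happen after the rows are fetched:

```python
def first_value(story_length):
    """Numeric sort key: the first comma-separated value, or 0 if empty/missing."""
    head = (story_length or "").split(",", 1)[0].strip()
    return int(head) if head else 0

records = ["10", "1,3", "2", "1", "1,2", ""]
ordered = sorted(records, key=first_value)
print(ordered)
```

Python's sort is stable, so records with the same first value keep their original relative order.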

How to sort and display mixed lists of alphas and numbers as the users expect?

Our application has a CustomerNumber field. We have hundreds of different people using the system (each has their own login and their own list of CustomerNumbers). An individual user might have at most 100,000 customers. Many have less than 100.
Some people only put actual numbers into their customer number fields, while others use a mixture of things. The system allows 20 characters which can be A-Z, 0-9 or a dash, and stores these in a VARCHAR2(20). Anything lowercase is made uppercase before being stored.
Now, let's say we have a simple report that lists all the customers for a particular user, sorted by Customer Number. e.g.
SELECT CustomerNumber,CustomerName
FROM Customer
WHERE User = ?
ORDER BY CustomerNumber;
This is a naive solution as the people that only ever use numbers do not want to see a plain alphabetic sort (where "10" comes before "9").
I do not wish to ask the user any unnecessary questions about their data.
I'm using Oracle, but I think it would be interesting to see some solutions for other databases. Please include which database your answer works on.
What do you think the best way to implement this is?
Probably your best bet is to pre-calculate a separate column and use that for ordering and use the customer number for display. This would probably involve 0-padding any internal integers to a fixed length.
The other possibility is to do your sorting post-select on the returned results.
Jeff Atwood has put together a blog posting about how some people calculate human friendly sort orders.
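As an illustration of sorting the returned results after the select, here is a minimal "natural sort" key in Python. It is only a sketch of one common approach, not the method from the blog post; the name `natural_key` is hypothetical. It splits each value into runs of digits and non-digits and compares digit runs as numbers.

```python
import re

def natural_key(value):
    """Split into digit and non-digit runs; digit runs compare numerically.

    Each part becomes a tuple of the same shape so that numbers and text
    can be compared without type errors (numbers sort before text).
    """
    parts = [p for p in re.split(r"(\d+)", value) if p]
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]

customer_numbers = ["10", "9", "A-2", "A-10", "1"]
ordered = sorted(customer_numbers, key=natural_key)
print(ordered)
```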
In Oracle 10g:
SELECT cust_name
FROM t_customer c
ORDER BY
REGEXP_REPLACE(cust_name, '[0-9]', ''), TO_NUMBER(REGEXP_SUBSTR(cust_name, '[0-9]+'))
This sorts by the first occurrence of a number, regardless of its position, i.e.:
customer1 < customer2 < customer10
cust1omer ? customer1
cust8omer1 ? cust8omer2
where ? means that the order is undefined.
That suffices for most cases.
To force the sort order in case 2, you may add REGEXP_INSTR(cust_name, '[0-9]+', 1, n) to the ORDER BY list n times, forcing order on the position of the n-th (2nd, 3rd, etc.) group of digits.
To force the sort order in case 3, you may add TO_NUMBER(REGEXP_SUBSTR(cust_name, '[0-9]+', 1, n)) to the ORDER BY list n times, forcing order on the value of the n-th group of digits.
In practice, the query I wrote is enough.
You may create a function-based index on these expressions, but you'll need to force it with a hint, and a one-pass SORT ORDER BY will be performed anyway, as the CBO doesn't trust function-based indexes enough to allow an ORDER BY on them.
You could have a numeric column [CustomerNumberInt] that is only used when the CustomerNumber is purely numeric (NULL otherwise[1]), then
ORDER BY CustomerNumberInt, CustomerNumber
[1] depending on how your SQL version handles NULLs in ORDER BY you might want to default it to zero (or infinity!)
I have a similar horrible situation and have developed a suitably horrible function to deal with it (SQLServer)
In my situation I have a table of "units" (this is a work-tracking system for students, so unit in this context represents a course they're doing). Units have a code, which for the most part is purely numeric, but for various reasons it was made a varchar and they decided to prefix some by up to 5 characters. So they expect 53,123,237,356 to sort normally, but also T53, T123, T237, T356
UnitCode is a nvarchar(30)
Here's the body of the function:
declare @sortkey nvarchar(30)
select @sortkey =
case
when @unitcode like '[^0-9][0-9]%' then left(@unitcode,1) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-1)
when @unitcode like '[^0-9][^0-9][0-9]%' then left(@unitcode,2) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-2)
when @unitcode like '[^0-9][^0-9][^0-9][0-9]%' then left(@unitcode,3) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-3)
when @unitcode like '[^0-9][^0-9][^0-9][^0-9][0-9]%' then left(@unitcode,4) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-4)
when @unitcode like '[^0-9][^0-9][^0-9][^0-9][^0-9][0-9]%' then left(@unitcode,5) + left('000000000000000000000000000000',30-(len(@unitcode))) + right(@unitcode,len(@unitcode)-5)
when @unitcode like '%[^0-9]%' then @unitcode
else left('000000000000000000000000000000',30-len(@unitcode)) + @unitcode
end
return @sortkey
I wanted to shoot myself in the face after writing that, however it works and seems not to kill the server when it runs.
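For readers who find the CASE expression hard to follow, the same rule can be written more compactly outside SQL. This Python sketch is an approximation under stated assumptions, not a translation that has been checked against SQL Server: the name `unit_sort_key` is hypothetical, and it keeps a prefix of 1 to 5 non-digit characters, then zero-pads the rest so that the whole key is 30 characters long.

```python
import re

def unit_sort_key(code, width=30):
    """Zero-pad the part after an optional 1-5 character non-digit prefix."""
    if code.isdigit():
        # Purely numeric codes are padded to the full width.
        return code.rjust(width, "0")
    match = re.match(r"(\D{1,5})(\d.*)$", code, re.DOTALL)
    if match is None:
        # Anything else (e.g. a letter in the middle) is left unchanged.
        return code
    prefix, rest = match.groups()
    return prefix + rest.rjust(width - len(prefix), "0")

codes = ["T123", "53", "T53", "356", "123", "T356", "237", "T237"]
ordered = sorted(codes, key=unit_sort_key)
print(ordered)
```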
I used this in SQL Server and it works great. The solution is to pad the numeric values with a character in front so that all are of the same string length.
Here is an example using that approach:
select MyCol
from MyTable
order by
case IsNumeric(MyCol)
when 1 then Replicate('0', 100 - Len(MyCol)) + MyCol
else MyCol
end
The 100 should be replaced with the actual length of that column.
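The same padding idea can be sketched outside SQL Server. In this Python version, `str.isdigit` stands in for IsNumeric, which is only an approximation (T-SQL's IsNumeric also accepts values such as '1e5' or '$'); the name `padded_key` and the width of 20 (the column length from the question) are assumptions.

```python
def padded_key(value, width=20):
    """Left-pad purely numeric values with zeros; leave everything else as is."""
    return value.zfill(width) if value.isdigit() else value

values = ["10", "9", "ABC", "100", "A1"]
ordered = sorted(values, key=padded_key)
print(ordered)
```

Because '0' sorts before letters in a plain character comparison, the purely numeric values come first, in numeric order.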