ORDER BY in SQL - sql

I have a column of a type string that contains values in rows like:
1-1
1-5
1-14
1-7
1-3
Now if I use the ORDER BY on that column I get the order as:
1-1
1-14
1-3
1-5
1-7
What would be the proper way to order it as 1-1, 1-3, 1-5,1-7,1-14
Thank you for your time

Assuming your first character may also vary:
order by convert(substr(my_field, 1, locate(my_field, '-') - 1) as int),
convert(substr(my_field, locate(my_field, '-') + 1) as int)

You can rename the "1-1" into "1-01"

The proper way would be to store them as integers in different columns.

Try this:
SELECT * FROM
(
SELECT '1-1' Id
UNION
SELECT '1-5' Id
UNION
SELECT '1-14' Id
UNION
SELECT '1-7' Id
UNION
SELECT '1-3' Id
) a
ORDER BY CAST(REPLACE(Id, '-', '') AS UNSIGNED)

Is there a possibility to split the column in 2 separate columns and concat the two together after sorting them individually?
The problem now is that your column is sorting as strings not as integers.

If there's absolutely impossible to make any changes to the structure, I'd say that Carl Manaster's method is best. That would work slow on large data sets though.
You can also try to add a "sort" column (and index it), then each time a new code is added you can calculate it's value e.g.:
1-5 becomes 1000 + 5 = 1005
1-14 becomes 1000 + 14 = 1014
and save it to that sort column. That will work much faster.
You can also write a simple trigger so that this sort value is calculated automatically.

You can use something like (if the first letter doesn-t metter)
order by CAST(SUBSTRING(field, CHARINDEX(field,'-',0)+1, LEN(field)+1) as int)
Not very pretty though..

SELECT *
FROM (
SELECT '1-1' Id
UNION ALL
SELECT '1-5' Id
UNION ALL
SELECT '1-14' Id
UNION ALL
SELECT '1-7' Id
UNION ALL
SELECT '1-3' Id
UNION ALL
SELECT '10-4' Id
) a
ORDER BY
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(id, '-', -2), '-', 1) AS UNSIGNED),
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(id, '-', -1), '-', 1) AS UNSIGNED)

Related

How to unnest BigQuery nested records into multiple columns

I am trying to unnest the below table .
Using the below unnest query to flatten the table
SELECT
id,
name ,keyword
FROM `project_id.dataset_id.table_id`
,unnest (`groups` ) as `groups`
where id = 204358
Problem is , this duplicates the rows (except name) as is the case with flattening the table.
How can I modify the query to put the names in two different columns rather than rows.
Expected output below -
That's because the comma is a cross join - in combination with an unnested array it is a lateral cross join. You repeat the parent row for every row in the array.
One problem with pivoting arrays is that arrays can have a variable amount of rows, but a table must have a fixed amount of columns.
So you need a way to decide for a certain row that becomes a certain column.
E.g. with
SELECT
id,
name,
groups[ordinal(1)] as firstArrayEntry,
groups[ordinal(2)] as secondArrayEntry,
keyword
FROM `project_id.dataset_id.table_id`
unnest(groups)
where id = 204358
If your array had a key-value pair you could decide using the key. E.g.
SELECT
id,
name,
(select value from unnest(groups) where key='key1') as key1,
keyword
FROM `project_id.dataset_id.table_id`
unnest(groups)
where id = 204358
But that doesn't seem to be the case with your table ...
A third option could be PIVOT in combination with your cross-join solution but this one has restrictions too: and I'm not sure how computation-heavy this is.
Consider below simple solution
select * from (
select id, name, keyword, offset
from `project_id.dataset_id.table_id`,
unnest(`groups`) with offset
) pivot (max(name) name for offset + 1 in (1, 2))
if applied to sample data in your question - output is
Note , when you apply to your real case - you just need to know how many such name_NNN columns to expect and extend respectively list - for example for offset + 1 in (1, 2, 3, 4, 5)) if you expect 5 such columns
In case if for whatever reason you want improve this - use below where everything is built dynamically for you so you don't need to know in advance how many columns it will be in the output
execute immediate (select '''
select * from (
select id, name, keyword, offset
from `project_id.dataset_id.table_id`,
unnest(`groups`) with offset
) pivot (max(name) name for offset + 1 in (''' || string_agg('' || pos, ', ') || '''))
'''
from (select pos from (
select max(array_length(`groups`)) cnt
from `project_id.dataset_id.table_id`
), unnest(generate_array(1, cnt)) pos
))
Your question is a little unclear, because it does not specify what to do with other keywords or other columns. If you specifically want the first two values in the array for keyword "OVG", you can unnest the array and pull out the appropriate names:
SELECT id,
(SELECT g.name
FROM UNNEST(t.groups) g WITH OFFSET n
WHERE key = 'OVG'
ORDER BY n
LIMIT 1
) as name_1,
(SELECT g.name
FROM UNNEST(t.groups) g WITH OFFSET n
WHERE key = 'OVG'
ORDER BY n
LIMIT 1 OFFSET 1
) as name_2,
'OVG' as keyword
FROM `project_id.dataset_id.table_id` t
WHERE id = 204358;

SQL unique combinations

I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.
ID therapeutic_class generic_name
1 YG4 insulin
1 CJ6 maleate
1 MG9 glargine
2 C4C diaoxy
2 KR3 supplies
3 YG4 insuilin
3 CJ6 maleate
3 MG9 glargine
I need to first look at the individual combinations of therapeutic class and generic name and then want to count how many patients have the same combination. I want my output to have three columns: one being the combo of generic names, the combo of therapeutic classes and the count of the number of patients with the combination like this:
Count Combination_generic combination_therapeutic
2 insulin, maleate, glargine YG4, CJ6, MG9
1 supplies, diaoxy C4C, KR3
One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique ID's existed already in your data model; I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. The solution is more complicated, I will not cover it unless it is needed in your problem.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
select 1, 'GY6', 'insulin' from dual union all
select 1, 'MH4', 'maleate' from dual union all
select 1, 'KJ*', 'glargine' from dual union all
select 2, 'GY6', 'supplies' from dual union all
select 2, 'C4C', 'diaoxy' from dual union all
select 3, 'GY6', 'insulin' from dual union all
select 3, 'MH4', 'maleate' from dual union all
select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
select rownum, therapeutic_class, generic_name
from (
select distinct therapeutic_class, generic_name
from test_data
)
),
first_agg ( id, tc_list, gn_list ) as (
select t.id,
listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
listagg(p.generic_name , ',') within group (order by p.pair_id)
from test_data t join valid_pairs p
on t.therapeutic_class = p.therapeutic_class
and t.generic_name = p.generic_name
group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT TC_LIST GN_LIST
--- ------------------ ------------------------------
1 GY6,C4C supplies,diaoxy
2 GY6,KJ*,MH4 insulin,glargine,maleate
You are looking for listagg() and then another aggregation. I think:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;

Group rows with similar strings

I have searched a lot, but most of solutions are for concatenation option and not what I really want.
I have a table called X (in a Postgres database):
anm_id anm_category anm_sales
1 a_dog 100
2 b_dog 50
3 c_dog 60
4 a_cat 70
5 b_cat 80
6 c_cat 40
I want to get total sales by grouping 'a_dog', 'b_dog', 'c_dog' as dogs and 'a_cat', 'b_cat', 'c_cat' as cats.
I cannot change the data in the table as it is an external data base from which I am supposed to get information only.
How to do this using an SQL query? It does not need to be specific to Postgres.
Use case statement to group the animals of same categories together
SELECT CASE
WHEN anm_category LIKE '%dog' THEN 'Dogs'
WHEN anm_category LIKE '%cat' THEN 'cats'
ELSE 'Others'
END AS Animals_category,
Sum(anm_sales) AS total_sales
FROM yourtables
GROUP BY CASE
WHEN anm_category LIKE '%dog' THEN 'Dogs'
WHEN anm_category LIKE '%cat' THEN 'cats'
ELSE 'Others'
END
Also this query should work with most of the databases.
By using PostgreSQL's split_part()
select animal||'s' animal_cat,count(*) total_sales,sum(anm_sales) sales_sum from(
select split_part(anm_cat,'_',2) animal,anm_sales from x
)t
group by animal
sqlfiddle
By creating split_str() in MySQL
select animal||'s' animal_cat,count(*) total_sales,sum(anm_sales) sales_sum from(
select split_str(anm_cat,'_',2) animal,anm_sales from x
)t
group by animal
sqlfiddle
You could group by a substr of anm_catogery:
SELECT SUBSTR(anm_catogery, 3) || 's', COUNT(*)
FROM x
GROUP BY anm_catogery
If you have a constant length of the appendix like in the example:
SELECT CASE right(anm_category, 3) AS animal_type -- 3 last char
, sum(anm_sales) AS total_sales
FROM x
GROUP BY 1;
You don't need a CASE statement at all, but if you use one, make it a "simple" CASE:
Simplify nested case when statement
Use a positional reference instead of repeating a possibly lengthy expression.
If the length varies, but there is always a single underscore like in the example:
SELECT split_part(anm_category, '_', 2) AS animal_type -- word after "_"
, sum(anm_sales) AS total_sales
FROM x
GROUP BY 1;

Sort string as number in sql server

I have a column that contains data like this. dashes indicate multi copies of the same invoice and these have to be sorted in ascending order
790711
790109-1
790109-11
790109-2
i have to sort it in increasing order by this number but since this is a varchar field it sorts in alphabetical order like this
790109-1
790109-11
790109-2
790711
in order to fix this i tried replacing the -(dash) with empty and then casting it as a number and then sorting on that
select cast(replace(invoiceid,'-','') as decimal) as invoiceSort...............order by invoiceSort asc
while this is better and sorts like this
invoiceSort
790711 (790711) <-----this is wrong now as it should come later than 790109
790109-1 (7901091)
790109-2 (7901092)
790109-11 (79010911)
Someone suggested to me to split invoice id on the - (dash ) and order by on the 2 split parts
like=====> order by split1 asc,split2 asc (790109,1)
which would work i think but how would i split the column.
The various split functions on the internet are those that return a table while in this case i would be requiring a scalar function.
Are there any other approaches that can be used? The data is shown in grid view and grid view doesn't support sorting on 2 columns by default ( i can implement it though :) ) so if any simpler approaches are there i would be very nice.
EDIT : thanks for all the answers. While every answer is correct i have chosen the answer which allowed me to incorporate these columns in the GridView Sorting with minimum re factoring of the sql queries.
Judicious use of REVERSE, CHARINDEX, and SUBSTRING, can get us what we want. I have used hopefully-explanatory columns names in my code below to illustrate what's going on.
Set up sample data:
DECLARE #Invoice TABLE (
InvoiceNumber nvarchar(10)
);
INSERT #Invoice VALUES
('790711')
,('790709-1')
,('790709-11')
,('790709-21')
,('790709-212')
,('790709-2')
SELECT * FROM #Invoice
Sample data:
InvoiceNumber
-------------
790711
790709-1
790709-11
790709-21
790709-212
790709-2
And here's the code. I have a nagging feeling the final expressions could be simplified.
SELECT
InvoiceNumber
,REVERSE(InvoiceNumber)
AS Reversed
,CHARINDEX('-',REVERSE(InvoiceNumber))
AS HyphenIndexWithinReversed
,SUBSTRING(REVERSE(InvoiceNumber),1+CHARINDEX('-',REVERSE(InvoiceNumber)),LEN(InvoiceNumber))
AS ReversedWithoutAffix
,SUBSTRING(InvoiceNumber,1+LEN(SUBSTRING(REVERSE(InvoiceNumber),1+CHARINDEX('-',REVERSE(InvoiceNumber)),LEN(InvoiceNumber))),LEN(InvoiceNumber))
AS AffixIncludingHyphen
,SUBSTRING(InvoiceNumber,2+LEN(SUBSTRING(REVERSE(InvoiceNumber),1+CHARINDEX('-',REVERSE(InvoiceNumber)),LEN(InvoiceNumber))),LEN(InvoiceNumber))
AS AffixExcludingHyphen
,CAST(
SUBSTRING(InvoiceNumber,2+LEN(SUBSTRING(REVERSE(InvoiceNumber),1+CHARINDEX('-',REVERSE(InvoiceNumber)),LEN(InvoiceNumber))),LEN(InvoiceNumber))
AS int)
AS AffixAsInt
,REVERSE(SUBSTRING(REVERSE(InvoiceNumber),1+CHARINDEX('-',REVERSE(InvoiceNumber)),LEN(InvoiceNumber)))
AS WithoutAffix
FROM #Invoice
ORDER BY
-- WithoutAffix
REVERSE(SUBSTRING(REVERSE(InvoiceNumber),1+CHARINDEX('-',REVERSE(InvoiceNumber)),LEN(InvoiceNumber)))
-- AffixAsInt
,CAST(
SUBSTRING(InvoiceNumber,2+LEN(SUBSTRING(REVERSE(InvoiceNumber),1+CHARINDEX('-',REVERSE(InvoiceNumber)),LEN(InvoiceNumber))),LEN(InvoiceNumber))
AS int)
Output:
InvoiceNumber Reversed HyphenIndexWithinReversed ReversedWithoutAffix AffixIncludingHyphen AffixExcludingHyphen AffixAsInt WithoutAffix
------------- ---------- ------------------------- -------------------- -------------------- -------------------- ----------- ------------
790709-1 1-907097 2 907097 -1 1 1 790709
790709-2 2-907097 2 907097 -2 2 2 790709
790709-11 11-907097 3 907097 -11 11 11 790709
790709-21 12-907097 3 907097 -21 21 21 790709
790709-212 212-907097 4 907097 -212 212 212 790709
790711 117097 0 117097 0 790711
Note that all you actually need is the ORDER BY clause, the rest is just to show my working, which goes like this:
Reverse the string, find the hyphen, get the substring after the hyphen, reverse that part: This is the number without any affix
The length of (the number without any affix) tells us how many characters to drop from the start in order to get the affix including the hyphen. Drop an additional character to get just the numeric part, and convert this to int. Fortunately we get a break from SQL Server in that this conversion gives zero for an empty string.
Finally, having got these two pieces, we simple ORDER BY (the number without any affix) and then by (the numeric value of the affix). This is the final order we seek.
The code would be more concise if SQL Server allowed us to say SUBSTRING(value, start) to get the string starting at that point, but it doesn't, so we have to say SUBSTRING(value, start, LEN(value)) a lot.
Try this one -
Query:
DECLARE #Invoice TABLE (InvoiceNumber VARCHAR(10))
INSERT #Invoice
VALUES
('790711')
, ('790709-1')
, ('790709-21')
, ('790709-11')
, ('790709-211')
, ('790709-2')
;WITH cte AS
(
SELECT
InvoiceNumber
, lenght = LEN(InvoiceNumber)
, delimeter = CHARINDEX('-', InvoiceNumber)
FROM #Invoice
)
SELECT InvoiceNumber
FROM cte
CROSS JOIN (
SELECT repl = MAX(lenght - delimeter)
FROM cte
WHERE delimeter != 0
) mx
ORDER BY
SUBSTRING(InvoiceNumber, 1, ISNULL(NULLIF(delimeter - 1, -1), lenght))
, RIGHT(REPLICATE('0', repl) + SUBSTRING(InvoiceNumber, delimeter + 1, lenght), repl)
Output:
InvoiceNumber
-------------
790709-1
790709-2
790709-11
790709-21
790709-211
790711
Try this
SELECT invoiceid FROM Invoice
ORDER BY
CASE WHEN PatIndex('%[-]%',invoiceid) > 0
THEN LEFT(invoiceid,PatIndex('%[-]%',invoiceid)-1)
ELSE invoiceid END * 1
,CASE WHEN PatIndex('%[-]%',REVERSE(invoiceid)) > 0
THEN RIGHT(invoiceid,PatIndex('%[-]%',REVERSE(invoiceid))-1)
ELSE NULL END * 1
SQLFiddle Demo
Above query uses two case statements
Sorts first part of Invoiceid 790109-1 (eg: 790709)
Sorts second part of Invoiceid after splitting with '-' 790109-1 (eg: 1)
For detailed understanding check the below SQLfiddle
SQLFiddle Detailed Demo
OR use 'CHARINDEX'
SELECT invoiceid FROM Invoice
ORDER BY
CASE WHEN CHARINDEX('-', invoiceid) > 0
THEN LEFT(invoiceid, CHARINDEX('-', invoiceid)-1)
ELSE invoiceid END * 1
,CASE WHEN CHARINDEX('-', REVERSE(invoiceid)) > 0
THEN RIGHT(invoiceid, CHARINDEX('-', REVERSE(invoiceid))-1)
ELSE NULL END * 1
Order by each part separately is the simplest and reliable way to go, why look for other approaches? Take a look at this simple query.
select *
from Invoice
order by Convert(int, SUBSTRING(invoiceid, 0, CHARINDEX('-',invoiceid+'-'))) asc,
Convert(int, SUBSTRING(invoiceid, CHARINDEX('-',invoiceid)+1, LEN(invoiceid)-CHARINDEX('-',invoiceid))) asc
Plenty of good answers here, but I think this one might be the most compact order by clause that is effective:
SELECT *
FROM Invoice
ORDER BY LEFT(InvoiceId,CHARINDEX('-',InvoiceId+'-'))
,CAST(RIGHT(InvoiceId,CHARINDEX('-',REVERSE(InvoiceId)+'-'))AS INT)DESC
Demo: - SQL Fiddle
Note, I added the '790709' version to my test, since some of the methods listed here aren't treating the no-suffix version as lesser than the with-suffix versions.
If your invoiceID varies in length, before the '-' that is, then you'd need:
SELECT *
FROM Invoice
ORDER BY CAST(LEFT(list,CHARINDEX('-',list+'-')-1)AS INT)
,CAST(RIGHT(InvoiceId,CHARINDEX('-',REVERSE(InvoiceId)+'-'))AS INT)DESC
Demo with varying lengths before the dash: SQL Fiddle
My version:
declare #Len int
select #Len = (select max (len (invoiceid) - charindex ( '-', invoiceid))-1 from MyTable)
select
invoiceid ,
cast (SUBSTRING (invoiceid ,1,charindex ( '-', invoiceid )-1) as int) * POWER (10,#Len) +
cast (right(invoiceid, len (invoiceid) - charindex ( '-', invoiceid) ) as int )
from MyTable
You can implement this as a new column to your table:
ALTER TABLE MyTable ADD COLUMN invoice_numeric_id int null
GO
declare #Len int
select #Len = (select max (len (invoiceid) - charindex ( '-', invoiceid))-1 from MyTable)
UPDATE TABLE MyTable
SET invoice_numeric_id = cast (SUBSTRING (invoiceid ,1,charindex ( '-', invoiceid )-1) as int) * POWER (10,#Len) +
cast (right(invoiceid, len (invoiceid) - charindex ( '-', invoiceid) ) as int )
One way is to split InvoiceId into its parts, and then sort on the parts. Here I use a derived table, but it could be done with a CTE or a temporary table as well.
select InvoiceId, InvoiceId1, InvoiceId2
from
(
select
InvoiceId,
substring(InvoiceId, 0, charindex('-', InvoiceId, 0)) as InvoiceId1,
substring(InvoiceId, charindex('-', InvoiceId, 0)+1, len(InvoiceId)) as InvoiceId2
FROM Invoice
) tmp
order by
cast((case when len(InvoiceId1) > 0 then InvoiceId1 else InvoiceId2 end) as int),
cast((case when len(InvoiceId1) > 0 then InvoiceId2 else '0' end) as int)
In the above, InvoiceId1 and InvoiceId2 are the component parts of InvoiceId. The outer select includes the parts, but only for demonstration purposes - you do not need to do this in your select.
The derived table (the inner select) grabs the InvoiceId as well as the component parts. The way it works is this:
When there is a dash in InvoiceId, InvoiceId1 will contain the first part of the number and InvoiceId2 will contain the second.
When there is not a dash, InvoiceId1 will be empty and InvoiceId2 will contain the entire number.
The second case above (no dash) is not optimal because ideally InvoiceId1 would contain the number and InvoiceId2 would be empty. To make the inner select work optimally would decrease the readability of the select. I chose the non-optimal, more readable, approach since it is good enough to allow for sorting.
This is why the ORDER BY clause tests for the length - it needs to handle the two cases above.
Demo at SQL Fiddle
Break the sort into two sections:
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE TestData
(
data varchar(20)
)
INSERT TestData
SELECT '790711' as data
UNION
SELECT '790109-1'
UNION
SELECT '790109-11'
UNION
SELECT '790109-2'
Query 1:
SELECT *
FROM TestData
ORDER BY
FLOOR(CAST(REPLACE(data, '-', '.') AS FLOAT)),
CASE WHEN CHARINDEX('-', data) > 0
THEN CAST(RIGHT(data, len(data) - CHARINDEX('-', data)) AS INT)
ELSE 0
END
Results:
| DATA |
-------------
| 790109-1 |
| 790109-2 |
| 790109-11 |
| 790711 |
Try:
select invoiceid ... order by Convert(decimal(18, 2), REPLACE(invoiceid, '-', '.'))

SQL WHERE IN (...) sort by order of the list?

Let's say I have query a database with a where clause
WHERE _id IN (5,6,424,2)
Is there any way for the returned cursor to be sorted in the order that the _id's where listed in the list? _id attribute from first to last in Cursor to be 5, 6, 424, 2?
This happens to be on Android through a ContentProvider, but that's probably not relevant.
Select ID list using subquery and join with it:
select t1.*
from t1
inner join
(
select 1 as id, 1 as num
union all select 5, 2
union all select 3, 3
) ids on t1.id = ids.id
order by ids.num
UPD: Code fixed
One approach might be to do separate SQL queries with a UNION between each. You would obviously issue each query in the order you would like it returned to you.
...
order by
case when _id=5 then 1
when _id=6 then 2
end
etc.
You can join it to a virtual table that contains the list required in sort order
select tbl.*
from tbl
inner join (
select 1 as sorter, 5 as value union all
select 2, 6 union all
select 3, 424 union all
select 4, 2) X on tbl._id = X.value
ORDER BY X.sorter
List? You don't have a list! ;)
This:
WHERE _id IN (5,6,424,2)
is mere syntactic sugar for this:
WHERE (
_id = 5
OR _id = 6
OR _id = 424
OR _id = 2
)
SQL has but one data structure, being the table. Your (5,6,424,2) isn't a table either! :)
You could create a table of values but your next problem is that tables do not have any logical ordering. Therefore, as per #cyberkiwi's answer, you'd have to create a column explicitly to model the sort order. And in order to make it explicit to the calling application, ensure you expose this column in the SELECT clause of your query.