Custom Sorting in SQL order by clause? - sql

Here is the situation that I am trying to solve:
I have a query that could return a set of records. The field being sorted by could have a number of different values - for the sake of this question we will say that the value could be A, B, C, D, E or Z
Now depending on the results of the query, the sorting needs to behave as follows:
If only A-E records are found then sorting them "naturally" is okay. But if a Z record is in the results, then it needs to be the first result in the query, but the rest of the records should be in "natural" sort order.
For instance, if A C D are found, then the result should be
A
C
D
But if A B D E Z are found then the result should be sorted:
Z
A
B
D
E
Currently, the query looks like:
SELECT NAME, SOME_OTHER_FIELDS FROM TABLE ORDER BY NAME
I know I can code a sort function to do what I want, but because of how I am using the results, I can't seem to use because the results are being handled by a third party library, to which I am just passing the SQL query. It is then processing the results, and there seems to be no hooks for me to sort the results and just pass the results to the library. It needs to do the SQL query itself, and I have no access to the source code of the library.
So for all of you SQL gurus out there, can you provide a query for me that will do what I want?

How do you identify the Z record? What sets it apart? Once you understand that, add it to your ORDER BY clause.
SELECT name, *
FROM [table]
WHERE (x)
ORDER BY
(
CASE
WHEN (record matches Z) THEN 0
ELSE 1
END
),
name
This way, only the Z record will match the first ordering, and all other records will be sorted by the second-order sort (name). You can exclude the second-order sort if you really don't need it.
For example, if Z is the character string 'Bob', then your query might be:
SELECT name, *
FROM [table]
WHERE (x)
ORDER BY
(
CASE
WHEN name='Bob' THEN 0
ELSE 1
END
), name
My examples are for T-SQL, since you haven't mentioned which database you're using.

There are a number of ways to solve this problem and the best solution depends on a number of factors that you don't discuss such as the nature of those A..Z values and what database product you're using.
If you have only a single value that has to sort on top, you can ORDER BY an expression that maps that value to the lowest possible sort value (with CASE or IIF or IFEQ, depending on your database).
If you have several different special sort values you could ORDER BY a more complicated expression or you could UNION together several SELECTs, with one SELECT for the default sorts and an extra SELECT for each special value. The SELECTs would include a sort column.
Finally, if you have quite a few values you can put the sort values into a separate table and JOIN that table into your query.

Not sure what DB you use - the following works for Oracle:
SELECT
NAME,
SOME_OTHER_FIELDS,
DECODE (NAME, 'Z', '_', NAME ) SORTFIELD
FROM TABLE
ORDER BY DECODE (NAME, 'Z', '_', NAME ) ASC

Related

How to select rows that meets multiple criteria from a single column in SQL?

I have a question similar to this one:
SQL: how to select a single id ("row") that meets multiple criteria from a single column
But in my case, the pairs of values are not unique, for example:
A user_id could be paired with same ancestry more than one time (more than one row with same user_id - ancestry).
Which could be a good and efficient solution?
The array of ancestries that must pass the condition could be large and variable (until 200) which makes me think that the join solution will be very inefficient. Furthermore as pairs of values are not uniques, the "in..group by" solution will not works.
Correct me if I'm wrong. Do you want to know which user_id has X ancestors (X being a variable amount of ancestors)?
Select t.user_id
from (select distinct *
from your_table) t
where t.ancestry in XAncestors
group by t.user_id
having count(t.user_id) = length(XAncestors)
Just to clarify, this is the exact same query as in the question you posted but with a subquery in the from to select only distinct values

What is the meaning of a constant in a SELECT query?

Considering the 2 below queries:
1)
USE AdventureWorks
GO
SELECT a.ProductID, a.ListPrice
FROM Production.Product a
WHERE EXISTS (SELECT 1 FROM Sales.SalesOrderDetail b
WHERE b.ProductID = a.ProductID)
2)
USE AdventureWorks
GO
SELECT a.ProductID, a.Name, b.SalesOrderID
FROM Production.Product a LEFT OUTER JOIN Sales.SalesOrderDetail b
ON a.ProductID = b.ProductID
ORDER BY 1
My only question is know what is the meaning of the number 1 in those queries? How about if I change them to 2 or something else?
Thanks for helping
In the first case it does not matter; you can select a 2 or anything, really, because it is an existence query. In general selecting a constant can be used for other things besides existence queries (it just drops the constant into a column in the result set), but existence queries are where you are most likely to encounter a constant.
For example, given a table called person containing three columns, id, firstname, lastname, and birthdate, you can write a query like this:
select firstname, 'YAY'
from person
where month(birthdate) = 6;
and this would return something like
name 'YAY'
---------------
Ani YAY
Sipho YAY
Hiro YAY
It's not useful, but it is possible. The idea is that in a select statement you select expressions, which can be not only column names but constants and function calls, too. A more likely case is:
select lastname||','||firstname, year(birthday)
from person;
Here the || is the string concatenation operator, and year is a function I made up.
The reason you sometimes see 1 in existence queries is this. Suppose you only wanted to know whether there was a person whose name started with 'H', but you didn't care who this person was. You can say
select id
from person
where lastname like 'H%';
but since we don't need the id, you can also say
select 1
from person
where lastname like 'H%';
because all you care about is whether or not you get a non-empty result set or not.
In the second case, the 1 is a column number; it means you want your results sorted by the value in the first column. Changing that to a 2 would order by the second column.
By the way, another place where constants are selected is when you are dumping from a relational database into a highly denormalized CSV file that you will be processing in NOSQL-like systems.
In the second case the 1 is not a literal at all. Rather, it is an ordinal number, indicating that the resultset should be sorted by its first column. If you changed the 1 to 4 the query would fail with an error because the resultset only has three columns.
BTW, the reason you use a constant like 1 instead of using an actual column is you avoid the I/O of actually getting the column value. This may improve performance.

MySQL GROUP BY, and testing the grouped items

I've got a query like this:
select a, b, c, group_concat(d separator ', ')
from t
group by a;
This seems to work just fine. As I understand it (forgive me, I'm a MySQL rookie!), it's returning rows of:
each unique a value
for each a value, one b and c value
also for each a value, all the d values, concatenated into one string
This is what I want, but I also want to check that for each a, the b and c are always the same, for all rows with that a value.
My first thought is to compare:
select count(*) from t group by a, b, c;
with:
select count(*) from t group by a;
and make sure they're equal. But I've not convinced myself that this is correct, and I certainly am not sure there isn't a better way. Is there a SQL/MySQL idiom for this?
Thanks!
The issue with relying on MySQL's Hidden Columns functionality is spelled out in the documentation:
When using this feature, all rows in each group should have the same values for the columns that are ommitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.
Applied to your example, that means that the values for b and c are arbitrary -- the results can't be relied upon to consistently return the same value, and the likelihood of seeing the behavior increases with the number of possible values that b/c can return. So there's no a lot of value to compare to GROUP BY a and GROUP BY a, b, c...

Why can't I GROUP BY 1 when it's OK to ORDER BY 1?

Why are column ordinals legal for ORDER BY but not for GROUP BY? That is, can anyone tell me why this query
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY OrgUnitID
cannot be written as
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY 1
When it's perfectly legal to write a query like
SELECT OrgUnitID FROM Employee AS e ORDER BY 1
?
I'm really wondering if there's something subtle about the relational calculus, or something, that would prevent the grouping from working right.
The thing is, my example is pretty trivial. It's common that the column that I want to group by is actually a calculation, and having to repeat the exact same calculation in the GROUP BY is (a) annoying and (b) makes errors during maintenance much more likely. Here's a simple example:
SELECT DATEPART(YEAR,LastSeenOn), COUNT(*)
FROM Employee AS e
GROUP BY DATEPART(YEAR,LastSeenOn)
I would think that SQL's rule of normalize to only represent data once in the database ought to extend to code as well. I'd want to only right that calculation expression once (in the SELECT column list), and be able to refer to it by ordinal in the GROUP BY.
Clarification: I'm specifically working on SQL Server 2008, but I wonder about an overall answer nonetheless.
One of the reasons is because ORDER BY is the last thing that runs in a SQL Query, here is the order of operations
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
so once you have the columns from the SELECT clause you can use ordinal positioning
EDIT, added this based on the comment
Take this for example
create table test (a int, b int)
insert test values(1,2)
go
The query below will parse without a problem, it won't run
select a as b, b as a
from test
order by 6
here is the error
Msg 108, Level 16, State 1, Line 3
The ORDER BY position number 6 is out of range of the number of items in the select list.
This also parses fine
select a as b, b as a
from test
group by 1
But it blows up with this error
Msg 164, Level 15, State 1, Line 3
Each GROUP BY expression must contain at least one column that is not an outer reference.
There is a lot of elementary inconsistencies in SQL, and use of scalars is one of them. For example, anyone might expect
select * from countries
order by 1
and
select * from countries
order by 1.00001
to be a similar queries (the difference between the two can be made infinitesimally small, after all), which are not.
I'm not sure if the standard specifies if it is valid, but I believe it is implementation-dependent. I just tried your first example with one SQL engine, and it worked fine.
use aliasses :
SELECT DATEPART(YEAR,LastSeenOn) as 'seen_year', COUNT(*) as 'count'
FROM Employee AS e
GROUP BY 'seen_year'
** EDIT **
if GROUP BY alias is not allowed for you, here's a solution / workaround:
SELECT seen_year
, COUNT(*) AS Total
FROM (
SELECT DATEPART(YEAR,LastSeenOn) as seen_year, *
FROM Employee AS e
) AS inline_view
GROUP
BY seen_year
databases that don't support this basically are choosing not to. understand the order of the processing of the various steps, but it is very easy (as many databases have shown) to parse the sql, understand it, and apply the translation for you. Where its really a pain is when a column is a long case statement. having to repeat that in the group by clause is super annoying. yes, you can do the nested query work around as someone demonstrated above, but at this point it is just lack of care about your users to not support group by column numbers.

select distinct over specific columns

A query in a system I maintain returns
QID AID DATA
1 2 x
1 2 y
5 6 t
As per a new requirement, I do not want the (QID, AID)=(1,2) pair to be repeated. We also dont care what value is selected from "data" column. either x or y will do.
What I have done is to enclose the original query like this
SELECT * FROM (<original query text>) Results group by QID,AID
Is there a better way to go about this? The original query uses multiple joins and unions and what not, So I would prefer not to touch it unless its absolutely necesary
If you don't care which DATA will be selected, GROUP BY is nice, though using ungrouped and unaggregated columns in SELECT clause of a GROUP BY statement is MySQL specific and not portable.