Combining concatenation with ORDER BY - sql

I am having trouble combining concatenation with ORDER BY in PostgreSQL (9.1.9).
Let's say, I have a table borders with 3 fields:
Table "borders"
  Column  |         Type         | Modifiers
----------+----------------------+-----------
 country1 | character varying(4) | not null
 country2 | character varying(4) | not null
 length   | numeric              |
The first two fields are codes of the countries and the third one is the length of the border between those countries.
The primary key is defined on the first two fields.
I need to compose a select of a column that would have unique values across the whole table; in addition, this column should be selected in decreasing order.
For this I concatenate the key fields with a separator character, since otherwise two different rows might give the same result, e.g. (AB, C) and (A, BC).
So I run the following query:
select country1||'_'||country2 from borders order by 1;
However, in the result I see that the '_' character is ignored during the sorting.
The result looks like this:
?column?
----------
A_CH
A_CZ
A_D
AFG_IR
AFG_PK
AFG_TAD
AFG_TJ
AFG_TM
AFG_UZB
A_FL
A_H
A_I
...
You can see that the result is sorted as if '_' didn't exist in the strings.
If I use a letter (say 'x') as a separator, the order is correct. But I must use some special character that doesn't appear in the country1 and country2 fields, to avoid collisions.
What should I do to make the '_' character be taken into account during the sorting?
EDIT
It turned out that the concatenation has nothing to do with the problem. The problem is that ORDER BY simply ignores the '_' character.

select country1 || '_' || country2 collate "C" as a
from borders
order by 1
sql fiddle demo
Notes according to discussion in comments:
1.) COLLATE "C" applies in the ORDER BY clause as long as it references the expression in the SELECT list by positional parameter or alias. If you repeat the expression in ORDER BY, you also need to repeat the COLLATE clause if you want to affect the sort order accordingly.
sql fiddle demo
2.) In collations where _ does not influence the sort order, it is more efficient to use fog's query, all the more because it makes use of the existing index (the primary key is defined on the first two fields).
However, if _ has an influence, one needs to sort on the combined expression:
sql fiddle demo
Query performance (tested in Postgres 9.2):
sql fiddle demo
PostgreSQL Collation Support in the manual.
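For illustration, the effect can be reproduced with a minimal, self-contained sketch (the table definition follows the question; the three sample rows are made up):

```sql
-- Hypothetical sample data reproducing the effect.
CREATE TABLE borders (
  country1 varchar(4) NOT NULL,
  country2 varchar(4) NOT NULL,
  length   numeric,
  PRIMARY KEY (country1, country2)
);

INSERT INTO borders (country1, country2) VALUES
  ('A', 'CH'), ('AFG', 'IR'), ('A', 'FL');

-- Default linguistic collations ignore '_', giving A_CH, AFG_IR, A_FL.
-- Under COLLATE "C" the comparison is by byte value: '_' (0x5F) sorts
-- after the uppercase letters, so AFG_IR now comes before A_CH and A_FL.
SELECT country1 || '_' || country2 COLLATE "C" AS a
FROM   borders
ORDER  BY 1;
```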

Just order by the two columns:
SELECT country1||'_'||country2 FROM borders ORDER BY country1, country2;
Unless you use aggregates or window functions, PostgreSQL allows you to order by columns even if you don't include them in the SELECT list.
As suggested in another answer you can also change the collation of the combined column but, if you can, sorting on plain columns is faster, especially if you have an index on them.

What happens when you do the following?
select country1||'_'||country2 from borders order by country1||'_'||country2
To my knowledge, ORDER BY 1 only does an ordinal sort; it won't do anything on concatenated columns. Granted, I'm speaking from SQL Server knowledge, so let me know if I'm way off base.
Edited: OK, I just saw Parado's post as I posted mine. Maybe you could create a view from this query (give it a column name) and then query the view, ordering by that column. Or do the following:
select country_group from (
  select country1||'_'||country2 as country_group
  from borders
) a
order by country_group

Related

ways to check for invalid characters? oracle sql

Looking for a way to filter out special signs, letters, etc. from studentID in Oracle SQL and show those records as invalid.
What is the best way to filter out letters a-Z and other characters (leaving only numbers)?
SELECT replace(Translate(studentid,'a-Z<>!-\+=/&', '??'),'?','') as StudentID, 'Invalid Characters in Student ID'
FROM students
The simplest approach is to use regular expressions. For example
select studentid
from student
where regexp_like(studentid, '\D')
;
\D means a non-digit character; if the studentid contains at least one such character, in any position, it will appear in the output. Note that null will not be flagged, assuming it may appear in the column; perhaps the column is the primary key, in which case it can't be null. But this would apply to other tables as well, where studentid may be null.
If you have a very large table, or if you must perform this check often, you may want a less simple, but better performing query. Then you would want to use standard string functions, like you were trying to. Something like this will work:
select studentid
from student
where translate(studentid, 'x0123456789', 'x') is not null
;
translate will translate x to itself, and all digits to null (that is, all digits will be removed). The x trick is needed because the last argument must not be null. If the translation doesn't remove all characters from the string, then the studentid will appear in the output, as required.
If you need to show exactly which characters are non-digits (although that should be obvious), you can add the result of translate to the select clause. Note though that if a student id has, for example, trailing spaces, that will not be evident either from looking at the student id or at the result of translate. You may want to add something like dump(studentid) to select; if you are not familiar with dump, you may want to read a bit about it - it is extremely useful in diagnosing such problems, and easy to learn.
Once you find and handle all the exceptions, you may want to add a constraint to the column, requiring all student ids to consist entirely of digits. Then you won't have to put up with this kind of error anymore.
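A constraint of that kind might look like the following sketch (the constraint name is made up; REGEXP_LIKE is permitted in Oracle check constraints):

```sql
-- Reject any studentid that is not composed entirely of digits
ALTER TABLE students
  ADD CONSTRAINT studentid_digits_chk
  CHECK (REGEXP_LIKE(studentid, '^\d+$'));
```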
If you want to allow numbers only, column datatype should have been NUMBER, not VARCHAR2.
[EDIT] That's wrong, though - see @mathguy's comment about it, saying that there are situations where values do consist of digits only, but - due to leading zeros - you can't use the NUMBER datatype.
A simple option is to use regexp_like and return rows that contain anything but digits:
SQL> with students (studentid) as
2 (select '12345' from dual union all
3 select 'ABC12' from dual union all
4 select '23x#2' from dual
5 )
6 select studentid
7 from students
8 where not regexp_like(studentid, '^\d+$');
STUDE
-----
ABC12
23x#2
SQL>
You could also use the below solution, taking advantage of the translate function.
select studentid
from students
WHERE translate(studentid, '`0123456789', '`') IS NOT NULL
;
demo

Oracle SQL: Select *, statement problem (vs SQL Server)

In SQL Server, I can write a query with a select * statement plus additional columns, but the same query returns an error in Oracle.
Here is an example - let's say I have a table Order which contains these columns:
[Date] | [Order_ID] | [Amt] | [Salesman]
In SQL Server, I can write code like this :
SELECT
*,
CASE WHEN [Amt] >= 0 THEN [Order_ID] END AS [Order_with_Amt]
FROM Order
The result will be :
Date | Order_ID | Amt | Salesman | Order_with_Amt
-----------+----------+-----+----------+---------------
01/01/2022 | A123 | 100 | Peter | A123
01/01/2022 | A124 | 0 | Sam | null
However, in Oracle, I cannot write the code as :
SELECT
*,
CASE WHEN "Amt" >= 0 THEN "Order_ID" END AS "Order_with_Amt"
FROM Order
It will throw an error :
ORA-00923: FROM keyword not found where expected
Any suggestion on this issue?
In Oracle's dialect of SQL, if you combine * with anything else then it has to be prefixed with the table name:
SELECT
Order.*,
CASE WHEN "Amt" >= 0 THEN "Order_ID" END AS "Order_with_Amt"
FROM Order
or if you alias the table (note there is no AS keyword for table aliases):
SELECT
o.*,
CASE WHEN "Amt" >= 0 THEN "Order_ID" END AS "Order_with_Amt"
FROM Order o
That is shown in the railroad diagram in the documentation:
The top branch has a plain * but it can't be combined with anything else - there is no loop around to the other options. The branches that do allow you to loop and add comma-separated terms have .* prefixed by a table (or view) name or a table alias.
You are also using quoted identifiers, both for your column names and column expression aliases. It might be worth reading up on Oracle's object name rules, and seeing if you really need and want to use those.
If you create a table with a column with a quoted mixed-case name like "Amt" then you have to refer to it with quotes and exactly the same casing everywhere, which is a bit of a pain and easy to get wrong.
If you create it with an unquoted identifier like amt or Amt or AMT (or even quoted uppercase as "AMT") then those would all be stored in the data dictionary in the same form, and you could refer to it without quotes and with any case - select amt, select Amt, select AMT, etc.
But order is a reserved word, as @Joel mentioned, so if you really do (and must) have a table with that name then it would have to be a quoted identifier. I would strongly suggest you call it something else though, like orders.
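A quick sketch of those identifier rules (hypothetical table, named orders to sidestep the reserved word):

```sql
-- "Amt" is stored mixed-case; order_id is stored as ORDER_ID
CREATE TABLE orders ("Amt" NUMBER, order_id VARCHAR2(10));

SELECT "Amt" FROM orders;               -- works: exact case, quoted
SELECT order_id, ORDER_ID FROM orders;  -- unquoted names are case-insensitive
-- SELECT amt FROM orders;              -- fails: ORA-00904, since amt
--                                      -- resolves to AMT, not "Amt"
```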
I see five things.
The two databases are different dialects of SQL, and so of course there are some features that work differently between them, even if this feature works just fine.
The Oracle sample is using quoted identifiers for the column names. A quoted name like "Amt" must match the stored column name exactly, or Oracle raises an invalid-identifier error.
ORDER is a reserved word, and therefore you need to take extra steps when using it as a table name. For SQL Server, this is square brackets ([Order]). For Oracle, it's double quotes ("Order").
Oracle is case sensitive about such quoted names (SQL Server is not; it doesn't care).
SELECT * is poor practice in the first place. I know many of us often use it as a placeholder while building a complex query, but we should always fill in real column names once the query is ready for use.

Oracle SQL - Not null code is not working

I am trying to retrieve the monetary amount associated with project IDs, however I only want data where a project ID exists (not blank)
When I type my SQL code below...
SELECT project_id, monetary_amount, journal_line_date
FROM PS_JRNL_LN
where project_id is not null
and journal_line_date BETWEEN to_date ('2020/01/01','yyyy/mm/dd')
AND TO_DATE ('2020/03/04','yyyy/mm/dd')
This query works; however, I am still getting blank values in my result.
You don't have nulls but blank spaces; add the conditions below to your query:
SELECT project_id, monetary_amount, journal_line_date
FROM PS_JRNL_LN
WHERE project_id IS NOT NULL
  AND LTRIM(RTRIM(project_id)) IS NOT NULL
  AND journal_line_date BETWEEN TO_DATE('2020/01/01','yyyy/mm/dd')
                            AND TO_DATE('2020/03/04','yyyy/mm/dd')
(In Oracle, a value consisting only of spaces trims down to NULL, which is why the LTRIM/RTRIM test catches the "blank" rows.)
Here is something that can help you find out what is happening in the project_id column. (Most likely, a bunch of ' ' values, meaning non-empty string consisting of a single space.)
select project_id, dump(project_id)
from ps_jrnl_ln
where ltrim(project_id, chr(32) || chr(9)) is null
and project_id is not null
;
DUMP shows you exactly what is stored in your table. 32 is the ASCII code for a single space; 9 (or 09) is the code for horizontal tab. I expect you will get rows where the DUMP column shows a single character, with code 32. But - who knows; you may find other things as well.
That will help you understand what's in the column. (You may also check describe ps_jrnl_ln - you may find out that the column is declared not null!!!)
If you find a bunch of rows where the project id is a single space, of course, in your actual query you will have to change
where project_id is not null
to
where ltrim(project_id, chr(32) || chr(9)) is not null
Or, perhaps, if indeed a single space is used as placeholder for null:
where project_id != ' '
A few things you could implement in your table design to prevent the problem in the first place, rather than struggling with the data afterwards:
1.) Add a NOT NULL constraint to the column.
2.) Add a CHECK constraint that rejects unwanted characters such as whitespace, so only the data you want gets loaded.
3.) If you don't want a check constraint, then handle it while loading the data, using TRIM.
4.) If appropriate, make the PROJECT_ID column the primary key, which implicitly disallows NULL values. An ID column usually suggests a primary key, but that may vary in your use case.
5.) If you are not allowed to alter the design at all, then at least handle the data at the application level, where you might be taking it as input.
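Sketched as DDL against the question's table (the constraint name is made up):

```sql
-- Disallow NULLs outright
ALTER TABLE ps_jrnl_ln MODIFY (project_id NOT NULL);

-- Reject values that are empty after trimming whitespace
-- (in Oracle, an all-blank string trims down to NULL)
ALTER TABLE ps_jrnl_ln
  ADD CONSTRAINT project_id_not_blank_chk
  CHECK (TRIM(project_id) IS NOT NULL);
```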
Inner join the journal table to the source of truth for project IDs. Assuming there are no "blank" IDs in that table, you won't get "blanks" in your result.
e.g.
SELECT j.project_id, j.monetary_amount, j.journal_line_date
FROM PS_JRNL_LN J
INNER JOIN PROJECT_MASTER P ON j.project_id = p.id /* should remove "blanks" */
where j.journal_line_date >= to_date ('2020/01/01','yyyy/mm/dd')
and j.journal_line_date < TO_DATE ('2020/03/05','yyyy/mm/dd')
Note also that I never use BETWEEN for date ranges; the above pattern using >= and < is more reliable, as it works regardless of the time precision of the data.
Try using a filter condition on the trimmed value. Note that in Oracle an empty string is treated as NULL, so test with IS NOT NULL rather than comparing to '':
ltrim(rtrim(project_id)) is not null

A constant expression was encountered in the ORDER BY list, position 1

I tried to concatenate two strings using a SQL query. Below is my code, which is not working.
SELECT TOP 100 CONCAT('James ','Stephen') AS [Column1]
FROM [dbo].[ORDERS]
Group BY ()
ORDER BY CONCAT('James ','Stephen') ASC
If I use [Column1] instead of CONCAT('James ','Stephen') in the ORDER BY clause, it seems to work.
SELECT TOP 100 CONCAT('James ','Stephen') AS [Column1]
FROM [dbo].[ORDERS]
Group by ()
ORDER BY [Column1] ASC
Can anyone explain why the first query did not work?
The ORDER BY clause is to be used with columns or expressions from the underlying tables. You cannot order by constants.
This is explained in the documentation
Specifies a column or expression on which to sort the query result
set. A sort column can be specified as a name or column alias, or a
nonnegative integer representing the position of the column in the
select list.
ORDER BY documentation
Note that you can use an expression, such as ORDER BY CONCAT(field1, field2), but it makes no sense to try to sort by a hard coded string which would obviously be the same for every record.
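For example (first_name and last_name are hypothetical columns here), sorting by an expression that actually varies per row is perfectly legal:

```sql
-- Sorting by an expression over real columns works; only constant
-- expressions are rejected in the ORDER BY list.
SELECT TOP 100 CONCAT(first_name, ' ', last_name) AS [FullName]
FROM [dbo].[ORDERS]
ORDER BY CONCAT(first_name, ' ', last_name) ASC;
```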
You can get around this by referencing the alias, but this is not very useful.
From documentation
A sort column can be specified as a name or column alias, or a nonnegative integer representing the position of the column in the select list.
Multiple sort columns can be specified. Column names must be unique. The sequence of the sort columns in the ORDER BY clause defines the organization of the sorted result set. That is, the result set is sorted by the first column and then that ordered list is sorted by the second column, and so on.
However, in the first query you specified neither a name, nor a column alias, nor the position of a column in the select list - only a constant expression.

SQL Order By while ignoring leading characters

I'm looking for some help with ordering by numeric values that have leading chars (-).
For example
Column Order #
---5
--8
-6
A simple ORDER BY Order# DESC gives me an unordered output. How do you ignore the leading chars when sorting data such as the Order # above?
Using the H2 database syntax and functions, and a table declared as:
create table foo(bar varchar(10));
with the rows:
'---2'
'--3'
'-1'
the following query:
select bar from foo
order by cast (replace(bar, '-') as int);
gives me the results:
BAR
-1
---2
--3
which seems to be what you're after.
Disclaimer: this is possibly not the best way to do this, since the computed value is not indexed. For just ordering a reasonably-sized result set it doesn't matter.
The solution I came up with is similar to the answer given: I can just cast the column data to an int and sort on that, i.e.
ORDER BY CAST(ColumnName AS INT) DESC;
This successfully ignores the leading chars and sorts by the numbers.