PostgreSQL: order by column, with specific NON-NULL value LAST

When I discovered NULLS LAST, I kinda hoped it could be generalised to 'X LAST' in a CASE statement in the ORDER BY portion of a query.
Not so, it would seem.
I'm trying to sort a table by two columns (easy), but get the output in a specific order (easy), with one specific value of one column to appear last (got it done... ugly).
Let's say that the columns are zone and status (don't blame me for naming a column zone - I didn't name them). status only takes 2 values ('U' and 'S'), whereas zone can take any of about 100 values.
One subset of zone's values is (in pseudo-regexp) IN[0-7]Z, and those are first in the result. That's easy to do with a CASE.
zone can also take the value 'Future', which should appear LAST in the result.
In my typical kludgy-munge way, I have simply imposed a CASE value of 1000 as follows:
group by zone, status
order by (
    case when zone = 'IN1Z' then 1
         when zone = 'IN2Z' then 2
         when zone = 'IN3Z' then 3
         .
         .  -- other IN[X]Z etc
         .
         when zone = 'Future' then 1000
         else 11  -- [number of defined cases + 1]
    end), zone, status
This works, but it's obviously a kludge, and I wonder if there might be one-liner doing the same.
Is there a cleaner way to achieve the same result?

Postgres allows boolean values in the ORDER BY clause, so here is your generalised 'X LAST':
ORDER BY (my_column = 'X')
The expression evaluates to boolean, resulting values sort this way:
FALSE (0)
TRUE (1)
NULL
Since we deal with non-null values, that's all we need. Here is your one-liner:
...
ORDER BY (zone = 'Future'), zone, status;
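A self-contained illustration (table and sample data invented here, not from the question):

CREATE TABLE t (zone text, status text);
INSERT INTO t VALUES ('IN2Z','S'), ('Future','U'), ('IN1Z','U'), ('Future','S');

SELECT zone, status
FROM t
GROUP BY zone, status
ORDER BY (zone = 'Future'), zone, status;
-- IN1Z and IN2Z rows come first; the two 'Future' rows sort last,
-- because false (0) sorts before true (1)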
Related:
Sorting null values after all others, except special
Select query but show the result from record number 3
SQL two criteria from one group-by

I'm not familiar with PostgreSQL specifically, but I've worked with similar problems in MS SQL Server. As far as I know, the only "nice" way to solve a problem like this is to create a separate table of zone values and assign each one a sort sequence.
For example, let's call the table ZoneSequence:
Zone | Sequence
------ | --------
IN1Z | 1
IN2Z | 2
IN3Z | 3
Future | 1000
And so on. Then you simply join ZoneSequence into your query, and sort by the Sequence column (make sure to add good indexes!).
The good thing about this method is that it's easy to maintain when new zone codes are created, as they likely will be.
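A sketch of that approach (my_table stands in for your real table; all names here are assumptions for illustration):

create table ZoneSequence (
    Zone     varchar(10) primary key,
    Sequence int not null
);

insert into ZoneSequence (Zone, Sequence)
values ('IN1Z', 1), ('IN2Z', 2), ('IN3Z', 3), ('Future', 1000);

select t.zone, t.status
from my_table t
join ZoneSequence zs on zs.Zone = t.zone
group by t.zone, t.status, zs.Sequence
order by zs.Sequence, t.zone, t.status;

Adding a new zone code is then a single insert into ZoneSequence instead of an edit to every query that sorts by zone.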


Find all the rows where a column is mixed case in PostgreSQL

I have a table in a Postgres database where I need to find all the rows that are:
Between two dates, where fromTo is the date column.
And where the data column contains a mix of lower- and upper-case letters, e.g. eCTiWkAohbQAlmHHAemK.
I can do the date range as shown below, but I'm stuck on the second condition. How do I do that?
SELECT * FROM test where fromTo BETWEEN '2022-09-08' AND '2022-09-23';
The data type of the fromTo column is shown below:
fromTo | timestamp without time zone | | not null | CURRENT_TIMESTAMP
You can use a regular expression to check that it is only alphabetical characters and at least one uppercase character.
select *
from foo
where data ~ '[[:upper:]]'
  and data ~ '^[[:alpha:]]+$'
  and fromTo between '2022-09-08' and '2022-09-23';
The character classes will match all alphabetical characters, including those with accents.
Note that this may not be able to make use of an index. If your table is large, you may need to reconsider how you're storing the data.
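A quick way to convince yourself (sample table and rows made up for the test):

create table test (data text, fromTo timestamp not null default current_timestamp);
insert into test (data, fromTo) values
  ('eCTiWkAohbQAlmHHAemK', '2022-09-10'),  -- kept: letters only, has an uppercase letter
  ('alllowercase',         '2022-09-10'),  -- dropped: no uppercase letter
  ('HAS space1',           '2022-09-10');  -- dropped: contains non-letter characters

select *
from test
where data ~ '[[:upper:]]'
  and data ~ '^[[:alpha:]]+$'
  and fromTo between '2022-09-08' and '2022-09-23';

If a value must also contain at least one lowercase letter (a strict mix), add a third condition: data ~ '[[:lower:]]'.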

How to get all combinations (ordered sampling without replacement) in regex

I'm trying to match a comma-separated string of numbers to a certain pattern within an SQL query. I have successfully used regular expressions for similar problems in the past, so I'm trying to get them working here as well. The problem is as follows:
The string may contain any number in a range (e.g. 1-4) exactly 0-1 times.
Two numbers are comma-separated
The numbers have to be in ascending order
(I think this is kind of a case of ordered sampling without replacement)
Sticking with the example of 1-4, the following entries should match:
1
1,2
1,3
1,4
1,2,3
1,2,4
1,3,4
1,2,3,4
2
2,3
2,4
3
3,4
4
and these should not:
q dawda 323123 a3 a1 1aa,1234 4321 a4,32,1a 1112222334411
1,,2,33,444, 11,12,a 234 2,2,3 33 3,3,3 3,34 34 123 1,4,4,4a 1,444
The best try I currently have is:
\b[1-4][\,]?[2-4]?[\,]?[3-4]?[\,]?[4]?\b
This still has two major drawbacks:
It delivers quite a lot of false positives. Numbers are not eliminated after they have occurred once.
It will get rather long when the range of numbers increases; e.g. 1-18 is already possible as well, and larger ranges are conceivable.
I used regexpal for testing purposes.
Side notes:
As I'm using sql it would be possible to implement some algorithm in another language to generate all the possible combinations and save them in a table that can be used for joining, see e.g. How to get all possible combinations of a list’s elements?. I would like to only rely on that as a last resort, as the creation of new tables will be involved and these will contain a lot of entries.
The resulting sql statement that uses the regex should run on both Postgres and Oracle.
The set of positive examples is essentially the power set of the range (minus the empty set), with each subset written in ascending order.
Edit: Clarified the list of positive examples
I wouldn't use Regex for this, as e.g. the requirements "have to be unique" and "have to be in ascending order" can't really be expressed with a regular expression (at least I can't think of a way to do that).
As you also need to have an expression that is identical in Postgres and Oracle, I would create a function that checks such a list and then hide the DBMS specific implementation in that function.
For Postgres I would use its array handling features to implement that function:
create or replace function is_valid(p_input text)
  returns boolean
as
$$
  select coalesce(array_agg(x order by x) = string_to_array(p_input, ','), false)
  from (
    select distinct x
    from unnest(string_to_array(p_input, ',')) as t(x)
    where x ~ '^[0-9]+$' -- only numbers
  ) t
  where x::int between 1 and 4 -- the cast is safe as the inner query only returns valid numbers
$$
language sql;
The inner query returns all (distinct) elements from the input list as individual numbers. The outer query then aggregates that back for values in the desired range and numeric order. If that result isn't the same as the input, the input isn't valid.
Then with the following sample data:
with sample_data (input) as (
  values
    ('1'),
    ('1,2'),
    ('1,3'),
    ('1,4'),
    ('1,2,3'),
    ('1,2,4'),
    ('foo'),
    ('1aa,1234'),
    ('1,,2,33,444,')
)
select input, is_valid(input)
from sample_data;
It will return:
input | is_valid
-------------+---------
1 | true
1,2 | true
1,3 | true
1,4 | true
1,2,3 | true
1,2,4 | true
foo | false
1aa,1234 | false
1,,2,33,444, | false
If you want to use the same function in Postgres and Oracle, you will probably need to use returns integer in Postgres, as Oracle still doesn't support a boolean data type in SQL.
Oracle's string processing functions are less powerful than Postgres's (e.g. there is no string_to_array or unnest), but you can probably implement similar logic in PL/SQL as well (albeit in a more complicated way).
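For example, a rough PL/SQL equivalent could look like this (my own untested sketch, not from the original answer, returning 0/1 in place of a boolean):

create or replace function is_valid(p_input varchar2)
return number
is
  l_prev pls_integer := 0;
  l_item pls_integer;
  l_pos  pls_integer := 1;
begin
  -- overall shape: digits separated by single commas (rejects blanks and trailing commas)
  if not regexp_like(p_input, '^[0-9]+(,[0-9]+)*$') then
    return 0;
  end if;
  loop
    l_item := to_number(regexp_substr(p_input, '[^,]+', 1, l_pos));
    exit when l_item is null;
    -- each element must exceed its predecessor and stay in range,
    -- which enforces uniqueness and ascending order in one check
    if l_item <= l_prev or l_item > 4 then
      return 0;
    end if;
    l_prev := l_item;
    l_pos  := l_pos + 1;
  end loop;
  return 1;
end;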

ISNULL with aggregate function

What is the best way to go about using these two together? In my case, if a userID is null I want to return zero, and users can have multiple IDs, so we want to get the lowest (the original) one.
ISNULL(MIN(UserId),0)
Or,
MIN(ISNULL(UserId, 0))
Thank you.
Does the answer apply to all aggregate functions?
Those statements do not necessarily produce the same output:
the first takes the minimum that exists and, only if that is null, uses 0.
the second replaces each null user id with 0 and then takes the minimum of those (so unless a user ID can be negative, a user with a 5 and a null would output 0)
A quick script can demonstrate this :
with testData as (
    select 1 as SomeKey, 5 as userID
    union all
    select 1 as SomeKey, null as userID
    union all
    select 2 as SomeKey, 6 as userID
    union all
    select 2 as SomeKey, 5 as userID
)
select
    somekey
    , isnull(min(userid), 0) as firstScenario
    , min(isnull(userid, 0)) as SecondScenario
from testdata
group by somekey
Results:
Somekey | firstScenario | secondScenario
--------+---------------+---------------
1       | 5             | 0
2       | 5             | 5
The first scenario is the most likely one you were after, but the phrasing of the question makes it a bit ambiguous as to what the desired behaviour was.
(http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/10170)
It depends on what you want to do. But I am biased towards COALESCE() because it is the ANSI standard function.
Your two options are:
COALESCE(MIN(UserId), 0)
MIN(COALESCE(UserId, 0))
These do not do the same thing. The first returns the minimum user id. If all user ids are NULL, then this expression returns 0.
The second replaces each NULL with 0. Assuming the user ids are positive, then this returns 0 if any user ids are NULL.
Based on my understanding of your logic, you want the second version.
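For comparison, plugging the portable function into the demo script from the first answer (reusing its testData CTE) gives identical results:

select
    somekey
    , coalesce(min(userid), 0) as firstScenario
    , min(coalesce(userid, 0)) as SecondScenario
from testdata
group by somekey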
I suppose you use SQL Server, because ISNULL is a T-SQL function.
To use a function across DBMSs, you can use COALESCE.
NULL values are not included by the MIN function, so if you want to prevent a NULL result, I advise you to use the first solution:
ISNULL(MIN(UserId), 0)

How can I "dynamically" split a varchar column by specific characters?

I have a column that stores 2 values. Example below:
| Column 1 |
|some title1 =ExtractThis ; Source Title12 = ExtractThis2|
I want to extract 'ExtractThis' into one column and 'ExtractThis2' into another column. I've tried using a substring, but it doesn't work because the data in column 1 is variable, and therefore it doesn't always carve out my intended values. SQL below:
SELECT substring(d.Column1,13,24) FROM dbo.Table d
This returns 'ExtractThis', but for other rows it either takes too much or too little. Is there a function, or combination of functions, that will allow me to split consistently on the character? The delimiter is consistent in my column, unlike the length.
select substring(col1, charindex('=', col1) + 1, charindex(';', col1) - charindex('=', col1) - 1) as Val1,
       substring(col1, charindex('=', col1, charindex(';', col1)) + 1, len(col1)) as Val2
from #data
There are duplicated CHARINDEX calculations that could be reduced from five calls to three per row, but I want to believe SQL Server performs this simple optimization itself.
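If you would rather not rely on that, CROSS APPLY can factor the calls out explicitly (a sketch of my own, keeping the #data and col1 names from the answer above):

select substring(col1, eq1 + 1, semi - eq1 - 1) as Val1,
       substring(col1, eq2 + 1, len(col1)) as Val2
from #data
cross apply (select charindex('=', col1) as eq1,
                    charindex(';', col1) as semi) a
cross apply (select charindex('=', col1, semi) as eq2) b

Each CHARINDEX is now written, and at worst evaluated, once per row.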

Retrieve names by ratio of their occurrence

I'm somewhat new to SQL queries, and I'm struggling with this particular problem.
Let's say I have a query that returns the following 3 records (kept to one column for simplicity):
Tom
Jack
Tom
And I want to have those results grouped by the name and also include the fraction (ratio) of the occurrence of that name out of the total records returned.
So, the desired result would be (as two columns):
Tom | 2/3
Jack | 1/3
How would I go about it? Determining the numerator is pretty easy (I can just use COUNT() and GROUP BY name), but I'm having trouble translating that into a ratio out of the total rows returned.
SELECT name, COUNT(name) * 1.0 / (SELECT COUNT(*) FROM names) AS ratio FROM names GROUP BY name;
(The * 1.0 matters: with two integer counts, databases such as PostgreSQL and SQL Server would otherwise perform integer division and return 0 for every row.)
Since the denominator is fixed, the "ratio" is directly proportional to the numerator. Unless you really need to show the denominator, it'll be a lot easier to just use something like:
select name, count(*) from your_table_name
group by name
order by count(*) desc
and you'll get the right data in the right order, but the number that's shown will be the count instead of the ratio.
If you really want that denominator, you'd do a count(*) on a non-grouped version of the same select -- but depending on how long the select takes, that could be pretty slow.
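A window function sidesteps that second scan (a sketch; the name/names identifiers come from the first answer, and the * 1.0 again guards against integer division):

select name,
       count(*) * 1.0 / sum(count(*)) over () as ratio
from names
group by name
order by ratio desc;

sum(count(*)) over () totals the per-group counts across all groups - that is, the overall row count - in the same pass as the grouping.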