SQL MAX function and strings

I have a column nr that contains strings in the format of 12345-12345. The numbers before and after the dash can be of any length. I would like to get the maximum value for nr taking into account only the part after the dash. I tried:
SELECT MAX(nr) AS max_nr FROM table WHERE (nr LIKE '12345-%')
However, this works only for values < 10: strings are compared character by character, so 12345-9 would be returned as the maximum even if 12345-10 exists. I thought of removing the dash and doing a type conversion:
SELECT MAX(REPLACE(nr, '-', '')::int) AS max_nr FROM table WHERE (nr LIKE '12345-%')
However, this of course returns the result without the dash. What would be the best way to get the maximum value while including the dash and the number before the dash in the result?
PostgreSQL 9.1
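The Python sketch below only models why the first attempt fails, outside the database. Python's `max` on strings compares code points character by character, which matches MAX on text for these ASCII values. The helper names are hypothetical.

```python
def max_as_string(values):
    # Plain MAX on text: lexicographic, character-by-character comparison.
    return max(values)

def max_by_suffix(values):
    # Compare on the integer after the first dash, but return the whole value.
    return max(values, key=lambda s: int(s.split("-", 1)[1]))

nrs = ["12345-2", "12345-9", "12345-10"]
print(max_as_string(nrs))  # 12345-9  ('9' > '1' at the first differing character)
print(max_by_suffix(nrs))  # 12345-10
```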

I'm no expert in PostgreSQL, but you can use regexp_replace() (for example, regexp_replace('foobarbaz', 'b..', 'X') returns fooXbaz) to strip everything up to and including the dash, and then convert the remaining number to int. The following query retrieves a single nr from your table where nr is like 12345-%, sorted by the number after the dash in descending order (largest number first).
SELECT nr
FROM table WHERE (nr LIKE '12345-%')
ORDER BY regexp_replace(nr, '^\d+-', '')::integer DESC
LIMIT 1
The regular expression above removes the leading digits and the dash, leaving only the last set of digits. For example, 54352-12345 would become 12345.
See the official documentation for regexp_replace.
And here is a SQL Fiddle illustrating its use.
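The query's logic can be modelled in Python as a sketch. `re.sub(r"^\d+-", "", nr)` stands in for `regexp_replace(nr, '^\d+-', '')`, and a prefix check stands in for `LIKE '12345-%'`. The function name `max_nr` is hypothetical.

```python
import re

def max_nr(rows, prefix="12345-"):
    # WHERE nr LIKE '12345-%': keep only values that start with the prefix.
    candidates = [nr for nr in rows if nr.startswith(prefix)]
    if not candidates:
        return None  # the SQL query would return no row
    # ORDER BY regexp_replace(nr, '^\d+-', '')::integer DESC LIMIT 1
    return max(candidates, key=lambda nr: int(re.sub(r"^\d+-", "", nr)))

print(max_nr(["12345-9", "12345-10", "54352-12345"]))  # 12345-10
```

The full value, dash and prefix included, is returned, because the regex result is used only as the sort key.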

Use the substring function together with the position function:
http://www.postgresql.org/docs/8.1/static/functions-string.html
to extract the number after the dash, and then use that value in the MAX function as you do in your code now. You can also try the to_number function.
It will look similar to this (the + 1 skips the dash itself; without it the extracted text is "-10", which casts to a negative number):
MAX(substring(nr from position('-' in nr) + 1)::int)
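The off-by-one is easy to miss because SQL string positions are 1-based. The Python sketch below models `position(sub in s)` and `substring(s from start)` with hypothetical helpers, assuming the 1-based semantics described in the PostgreSQL string-function docs.

```python
def position(sub, s):
    # SQL position(sub in s): 1-based index, 0 when not found.
    return s.find(sub) + 1

def substring_from(s, start):
    # SQL substring(s from start) with a 1-based start position.
    return s[start - 1:]

nr = "12345-10"
print(int(substring_from(nr, position("-", nr))))      # -10: the dash is kept
print(int(substring_from(nr, position("-", nr) + 1)))  # 10
```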

Related

How to get highest alphanumeric value in Postgres

I have the following table of inventory stock symbols and want to fetch the highest alphanumeric value which is AR-JS-20. When I say "highest" I mean that the letter order is sorted first and then numbers are factored in, so AR-JS-20 is higher than AL-JS-20.
BTW, I don't want to split anything into parts because I don't know what symbols vendors will send me in the future.
I simply want an alphanumeric sort like sorting a computer directory by name, where dashes, underscores, asterisks, etc. come first, then numbers, and letters last, with cascading priority: the first character in the symbol has the most weight, then the second character, and so on.
NOTE: The question has been edited so some of the answers below no longer apply.
AL-JS-20
AR-JS-20
AR-JS-9
AB-JS-8
AA-JS-1
1A-LM-30
2BA2-1
45HT
So ideally, if this table were sorted to my requirements, it would look like this:
AR-JS-20
AR-JS-9
AB-JS-8
AL-JS-20
AA-JS-1
45HT
2BA2-1
1A-LM-30
However, when I use this query:
select max(symbol) from stock
I get:
AR-JS-9
but what I want to get is: AR-JS-20
I also tried:
select max(symbol::bytea) from stock
But this triggers an error:
function max(bytea) does not exist
There is a dedicated tag for this group of problems: natural-sort (I added it now).
Ideally, you store the string part and the numeric part in separate columns.
While stuck with your unfortunate symbols ...
If your symbols are as regular as the sample suggests, plain left() and split_part() can do the job:
SELECT symbol
FROM stock
ORDER BY left(symbol, 5) DESC NULLS LAST
, split_part(symbol, '-', 3)::int DESC NULLS LAST
LIMIT 1;
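In Python, a sketch of the `left()`/`split_part()` ordering looks like this. It assumes `split_part` returns '' for a missing field, as PostgreSQL does, and it only uses the regular AA-JS-n style symbols. For a symbol like 45HT the third part is '' and the `::int` cast would fail. The helper names are hypothetical, and Python compares strings by code point rather than by database collation.

```python
def split_part(s, delim, n):
    # Models PostgreSQL split_part: n-th field (1-based), '' if there is none.
    parts = s.split(delim)
    return parts[n - 1] if n <= len(parts) else ""

def highest_regular(symbols):
    # ORDER BY left(symbol, 5) DESC, split_part(symbol, '-', 3)::int DESC LIMIT 1
    return max(symbols, key=lambda s: (s[:5], int(split_part(s, "-", 3))))

regular = ["AL-JS-20", "AR-JS-20", "AR-JS-9", "AB-JS-8", "AA-JS-1"]
print(highest_regular(regular))  # AR-JS-20
```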
Or, if at least the two dashes (three parts) are a given:
...
ORDER BY split_part(symbol, '-', 1) DESC NULLS LAST
, split_part(symbol, '-', 2) DESC NULLS LAST
, split_part(symbol, '-', 3)::int DESC NULLS LAST
LIMIT 1
See:
Split comma separated column data into additional columns
Or, if the format is not as rigid: regular expression functions are more versatile, but also more expensive:
...
ORDER BY substring(symbol, '^\D+') DESC NULLS LAST
, substring(symbol, '\d+$')::int DESC NULLS LAST
LIMIT 1;
^ ... anchor to the start of the string
$ ... anchor to the end of the string
\D ... class shorthand for non-digits
\d ... class shorthand for digits
Taking only (trailing) digits, we can safely cast to integer (assuming numbers < 2^31), and sort accordingly.
Add NULLS LAST if any part can be missing, or the column can be NULL.
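The regex variant can be modelled as a Python sketch. `re.match(r"\D+")` stands in for `substring(symbol, '^\D+')` and `re.search(r"\d+$")` for `substring(symbol, '\d+$')`, with `None` playing the role of NULL. Because `True > False`, keys with a non-NULL value win, which mimics `DESC NULLS LAST`. The helper names are hypothetical.

```python
import re

def natural_keys(symbol):
    # substring(symbol, '^\D+') and substring(symbol, '\d+$')::int; None models NULL.
    prefix = re.match(r"\D+", symbol)
    suffix = re.search(r"\d+$", symbol)
    return (prefix.group() if prefix else None,
            int(suffix.group()) if suffix else None)

def highest(symbols):
    # ORDER BY prefix DESC NULLS LAST, number DESC NULLS LAST LIMIT 1
    def sort_key(symbol):
        prefix, number = natural_keys(symbol)
        # True > False, so non-NULL keys beat NULL ones (NULLs sort last).
        return (prefix is not None, prefix or "", number is not None, number or 0)
    return max(symbols, key=sort_key)

stock = ["AL-JS-20", "AR-JS-20", "AR-JS-9", "AB-JS-8",
         "AA-JS-1", "1A-LM-30", "2BA2-1", "45HT"]
print(highest(stock))  # AR-JS-20
```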
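Specify a custom ORDER BY that sorts first by the code with its digits removed, then by its digits converted to int, and take the first row (DESC puts the highest first; the 'g' flag makes regexp_replace replace every match instead of only the first one):
select stock_code
from mytable
order by regexp_replace(stock_code, '-?[0-9]+-?', '', 'g') desc,
         regexp_replace(stock_code, '[^0-9]', '', 'g')::int desc
limit 1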
See live demo.
This works for numbers at both start and end of code:
regexp_replace(stock_code, '-?[0-9]+-?', '') "deletes" digits and any adjacent dashes
regexp_replace(stock_code, '[^0-9]', '') "deletes" all non-digits
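A Python sketch of the two sort keys follows. Python's `re.sub` replaces all matches by default, so it corresponds to PostgreSQL's `regexp_replace(..., 'g')`. Without the `'g'` flag, PostgreSQL replaces only the first match. The function name `sort_keys` is hypothetical. A code with no digits at all would make the int conversion fail, both here and in SQL.

```python
import re

def sort_keys(stock_code):
    # regexp_replace(stock_code, '-?[0-9]+-?', '', 'g'): drop digits and adjacent dashes
    letters = re.sub(r"-?[0-9]+-?", "", stock_code)
    # regexp_replace(stock_code, '[^0-9]', '', 'g')::int: keep only the digits
    digits = int(re.sub(r"[^0-9]", "", stock_code))
    return letters, digits

print(sort_keys("AR-JS-20"))  # ('AR-JS', 20)
print(sort_keys("1A-LM-30"))  # ('A-LM', 130)
```

Digits at the start and in the middle of a code are merged into one number (1A-LM-30 gives 130), which is worth keeping in mind when codes mix leading and trailing numbers.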

Extract number between two characters in Hive SQL

The query below outputs 1642575.0, but I only want 1642575 (just the number, without the decimal point and the zero following it). The number of delimited values in the field varies; the only constant is that there is always exactly one number with a decimal. I was trying to write a regexp function to extract the number between the double quote (") and the dot (.).
How would I revise my regexp_extract function to get the desired output? Thank you!
select regexp_extract('{"1244644": "1642575.0", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*');
You can cast the result to bigint.
select cast(regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*') as bigint) col;
output - 1642575
You can use round if you want to round it off.
select round(regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*')) col;
output - 1642576
Use this regexp: '"(\\d+)\\.' - means double-quote, capturing group with one or more digits, dot.
select regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','"(\\d+)\\.',1)
Result:
1642575
To skip any number of leading zeroes, use this regexp: '"0*(\\d+)\\.'
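The Python sketch below checks what the capturing-group pattern matches. Hive's `regexp_extract(str, pattern, 1)` returns capture group 1. In a Hive string literal the backslashes are doubled ('\\d'), whereas a Python raw string needs only one. The second input string is a made-up example with leading zeroes.

```python
import re

field = '{"1244644": "1642575.9", "1338410": "1650435"}'

# '"(\d+)\.': a double quote, a captured run of digits, then a literal dot.
match = re.search(r'"(\d+)\.', field)
print(match.group(1))  # 1642575

# '"0*(\d+)\.': same, but leading zeroes are consumed outside the group.
padded = '{"1": "0001642575.0"}'
print(re.search(r'"0*(\d+)\.', padded).group(1))  # 1642575
```

Only the value that contains a dot can match, because every other quoted number is followed by another quote, not a dot.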

Postgres SQL regexp_replace replace all number

I need some help with the following. I have a text field in SQL that records a list of times separated by '|'. For example:
'14613|15474|3832|148|5236|5348|1055|524'. Each value is a time in milliseconds. This field can have any length; for example, '3215|2654' or '4565' (only 1 value) are perfectly valid. I need to take this field and replace every number with the value -1000.
So '14613|15474|3832|148|5236|5348|1055|524' will be '-1000|-1000|-1000|-1000|-1000|-1000|-1000|-1000'
Or '3215|2654' => '-1000|-1000' Or '4565' => '-1000'.
I tried regexp_replace(times_field,'[[:digit:]]','-1000','g'), but it replaces each digit, not the complete number, so in this example:
'3215|2654', which should become '-1000|-1000', I get:
'-1000-1000-1000-1000|-1000-1000-1000-1000'. I tried other combinations and more regexp options, but I'm out of ideas.
Please help, thanks!
We can try using REGEXP_REPLACE here:
UPDATE yourTable
SET times_field = REGEXP_REPLACE(times_field, '\y[0-9]+\y', '-1000', 'g');
If instead you don't really want to alter your data but rather just view your data this way, then use a select:
SELECT
times_field,
REGEXP_REPLACE(times_field, '\y[0-9]+\y', '-1000', 'g') AS times_field_replace
FROM yourTable;
Note that in either case we pass g as the fourth parameter to REGEXP_REPLACE to do a global replacement of all pipe-separated numbers.
[[:digit:]] - matches a digit [0-9]
+ Quantifier - matches between one and unlimited times, as many times as possible
Your regexp should look like this:
regexp_replace(times_field,'[[:digit:]]+','-1000','g')
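The Python sketch below shows the difference the `+` quantifier makes. Python's `re` module does not support POSIX classes such as `[[:digit:]]`, so `[0-9]` is used as the equivalent. `re.sub` replaces every match, like the `'g'` flag.

```python
import re

times = "3215|2654"
# One digit per match: every single digit becomes -1000.
print(re.sub(r"[0-9]", "-1000", times))   # -1000-1000-1000-1000|-1000-1000-1000-1000
# One or more digits per match: each whole number becomes -1000.
print(re.sub(r"[0-9]+", "-1000", times))  # -1000|-1000
```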

How to extract a number part of a field using regex_substr function?

I need to extract the numerical part of values in a column (varchar) if there exists a number in the value.
ColumnA has values like ABC, M365, J344, MCT etc.
I would like to check the value from the second position onward and, if it is a number, extract it. For instance:
a. M365, from 2nd position 365 is a number so I would like to return this substring.
b. M3AB, from 2nd position 3AB is not a number so I would not want to return this substring.
I tried regexp_substr('M365', '[0-9]', 2), but that is not what I want: it returns only the character at the second position, not the entire substring.
This seems to do what you want:
select regexp_substr(substr(x, 2), '^\d+$')
This applies the pattern to the value starting at the second position; the ^ and $ anchors require the whole remainder to consist of digits, so M3AB returns NULL.
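The Python sketch below models `regexp_substr(substr(x, 2), '^\d+$')`, with `x[1:]` for `substr(x, 2)`, a full match for the `^...$` anchors, and `None` for NULL. The function name `numeric_tail` is hypothetical.

```python
import re

def numeric_tail(x):
    # regexp_substr(substr(x, 2), '^\d+$'): the rest of the value if it is all digits, else None (NULL)
    m = re.fullmatch(r"[0-9]+", x[1:])
    return m.group() if m else None

for value in ["ABC", "M365", "J344", "MCT", "M3AB"]:
    print(value, numeric_tail(value))
```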
[0-9] matches only a single digit. You want to know whether they are all digits, so you need the '+' quantifier. For more info, visit:
https://www.techonthenet.com/oracle/functions/regexp_substr.php
The following code returns the run of digits found from the second position onward (note that the pattern is not anchored, so for M3AB it still returns 3):
regexp_substr('M365', '[0-9]+', 2)
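The Python sketch below contrasts this unanchored search with the anchored version. It models `regexp_substr(x, '[0-9]+', 2)` as a search over `x[1:]`, which assumes the position argument only sets where the search begins. The function name is hypothetical.

```python
import re

def digits_from_second(x):
    # regexp_substr(x, '[0-9]+', 2): first run of digits found at or after position 2 (unanchored)
    m = re.search(r"[0-9]+", x[1:])
    return m.group() if m else None

for value in ["M365", "M3AB", "ABC", "MC7"]:
    print(value, digits_from_second(value))
```

Use this form when any digits will do. Anchor the pattern when the whole remainder must be numeric.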

How to retrieve specific character positions within rows of database column using REGEX in Oracle SQL?

What Oracle SQL query could return the second, third and fourth positions of characters contained within rows of a specific column using the REGEXP_SUBSTR method instead of using SUBSTR method like my example provided below?
SELECT SUBSTR(city,2,3) AS "2nd, 3rd, 4th"
FROM student.zipcode;
One way that works for me (with test data) is:
SELECT REGEXP_SUBSTR(city, '\S{3}', 2) AS partial FROM student.zipcode;
Note that this finds the first run of three non-whitespace characters at or after the second position of the string.
You could also use:
SELECT REGEXP_SUBSTR(city, '.{3}', 2) AS partial FROM student.zipcode;
which will instead match any three characters in the 2nd to 4th position.
However, I'm not sure what advantage this has over simply:
SELECT SUBSTR(city,2,3) AS partial FROM student.zipcode;
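The Python sketch below shows where the three forms can differ. It models each call on a search over `city[1:]`, assuming the position argument sets where the search starts. The helper names and sample cities are made up for illustration.

```python
import re

def substr_2_3(city):
    # SUBSTR(city, 2, 3): exactly positions 2 to 4 (fewer if the string is short)
    return city[1:4]

def regexp_nonspace_3(city):
    # REGEXP_SUBSTR(city, '\S{3}', 2): first three consecutive non-whitespace chars at or after position 2
    m = re.search(r"\S{3}", city[1:])
    return m.group() if m else None

def regexp_any_3(city):
    # REGEXP_SUBSTR(city, '.{3}', 2): any three characters starting at position 2
    m = re.search(r".{3}", city[1:])
    return m.group() if m else None

print(repr(substr_2_3("New York")))         # 'ew '
print(repr(regexp_nonspace_3("New York")))  # 'Yor'
print(repr(regexp_any_3("New York")))       # 'ew '
print(repr(substr_2_3("Rye")), repr(regexp_any_3("Rye")))  # 'ye' None
```

So '\S{3}' can skip past whitespace, and '.{3}' returns NULL for short values where SUBSTR returns a shorter string.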
The REGEXP_INSTR function is not what you want, as it returns an index (position number) for the search item in the searched string. You can read about it here: http://www.techonthenet.com/oracle/functions/regexp_instr.php