Extract number between two characters in Hive SQL - sql

The query below outputs 1642575.0. But I only want 1642575 (just the number without the decimal and the zero following it). The number of delimited values in the field varies. The only constant is that there's always only one number with a decimal. I was trying to write a regexp function to extract the number between " and ..
How would I revise my regexp_extract function to get the desired output? Thank you!
select regexp_extract('{"1244644": "1642575.0", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*');

You can cast the result to bigint.
select cast(regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*') as bigint) col;
output - 1642575
You can use round if you want to round it off.
select round(regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*')) col;
output - 1642576

Use this regexp: '"(\\d+)\\.' - means double-quote, capturing group with one or more digits, dot.
select regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','"(\\d+)\\.',1)
Result:
1642575
To skip any number of leading zeroes, use this regexp: '"0*(\\d+)\\.'

Related

Best way to count words in a string on SQL impala/Hive

I need to count words in a string, with SQL Impala/Hive. What is the best way?
In Oracle I use regexp_count() function, like the example below:
SELECT regexp_count('1aa 2bb 3cc', '\s*[a-z]+\s*'); -- result: 3
In impala/hive we can't use the above function. Which is the best way to achieve this goal?
Thank in advance
can you simply use this formula - count of words = count of spaces +1.
so calculate total length and then length of all spaces removed or replaced by empty string. Minus first from second and you will get number of spaces.
Add 1 to it.
select length('1aa 2bb 3cc') - length(replace('1aa 2bb 3cc',' ',''))+1
Simply, you can do it with split and size function which returns the length of the split array.
SELECT size(split('1aa 2bb 3cc', ' ')); -- result: 3

Postgres SQL regexp_replace replace all number

I need some help with the next. I have a field text in SQL, this record a list of times sepparates with '|'. For example
'14613|15474|3832|148|5236|5348|1055|524' Each value is a time in milliseconds. This field could any length, for example is perfect correct '3215|2654' or '4565' (only 1 value). I need get this field and replace all number with -1000 value.
So '14613|15474|3832|148|5236|5348|1055|524' will be '-1000|-1000|-1000|-1000|-1000|-1000|-1000|-1000'
Or '3215|2654' => '-1000|-1000' Or '4565' => '-1000'.
I try use regexp_replace(times_field,'[[:digit:]]','-1000','g') but it replace each digit, not the complete number, so in this example:
'3215|2654' than must be '-1000|-1000', i get:
'-1000-1000-1000-1000|-1000-1000-1000-1000', I try with other combinations and more options of regexp but i'm done.
Please need your help, thanks!!!.
We can try using REGEXP_REPLACE here:
UPDATE yourTable
SET times_field = REGEXP_REPLACE(times_field, '\y[0-9]+\y', '-1000', 'g');
If instead you don't really want to alter your data but rather just view your data this way, then use a select:
SELECT
times_field,
REGEXP_REPLACE(times_field, '\y[0-9]+\y', '-1000', 'g') AS times_field_replace
FROM yourTable;
Note that in either case we pass g as the fourtb parameter to REGEXP_REPLACE to do a global replacement of all pipe separated numbers.
[[:digit:]] - matches a digit [0-9]
+ Quantifier - matches between one and unlimited times, as many times as possible
your regexp must look like
regexp_replace(times_field,'[[:digit:]]+','-1000','g')

replace all occurrences of a sub string between 2 charcters using sql

Input string: ["1189-13627273","89-13706681","118-13708388"]
Expected Output: ["14013627273","14013706681","14013708388"]
What I am trying to achieve is to replace any numbers till the '-' for each item with hard coded text like '140'
SELECT replace(value_to_replace, '-', '140')
FROM (
VALUES ('1189-13627273-77'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
check this
I found the right way to achieve that using the below regular expression.
SELECT REGEXP_REPLACE (string_to_change, '\\"[0-9]+\\-', '140')
You don't need a regexp for this, it's as easy as concatenation of 140 and the substring from - (or the second part when you split by -)
select '140'||substring('89-13706681' from position('-' in '89-13706681')+1 for 1000)
select '140'||split_part('89-13706681','-',2)
also, it's important to consider if you might have instances that don't contain - and what would be the output in this case
Use regexp_replace(text,text,text) function to do so giving the pattern to match and replacement string.
First argument is the value to be replaced, second is the POSIX regular expression and third is a replacement text.
Example
SELECT regexp_replace('1189-13627273', '.*-', '140');
Output: 14013627273
Sample data set query
SELECT regexp_replace(value_to_replace, '.*-', '140')
FROM (
VALUES ('1189-13627273'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
Caution! Pattern .*- will replace every character until it finds last occurence of - with text 140.

SQL MAX function and strings

I have a column nr that contains strings in the format of 12345-12345. The numbers before and after the dash can be of any length. I would like to get the maximum value for nr taking into account only the part after the dash. I tried
SELECT MAX(nr) AS max_nr FROM table WHERE (nr LIKE '12345-%')
However, this works only for values < 10 (i.e. 12345-9 would be returned as max even if 12345-10 exists). I thought of removing the dash and doing a type conversion:
SELECT MAX(REPLACE(nr, '-', '')::int) AS max_nr FROM table WHERE (nr LIKE '12345-%')
However, this of course returns the result without the dash. What would be the best way to get the maximum value while including the dash and the number before the dash in the result?
PostgreSQL 9.1
I'm no expert in PostGres, but you can use regexp_replace('foobarbaz', 'b..', 'X') to extract the string after the dash and then convert the number to int. The following query will retrieve only one row the nr from your table where the nr is like 12345-%, sorted by the number after the dash in descending order (largest number first).
SELECT nr
FROM table WHERE (nr LIKE '12345-%')
ORDER BY regexp_replace(nr, '^\d+-', '')::integer DESC
LIMIT 1
The regular expression above removes the leading digits and the dash, leaving only the last set of digits. For example, 54352-12345 would become 12345.
Official documentation.
And here is a SQL Fiddle illustrating it's use.
Use substring function with position function:
http://www.postgresql.org/docs/8.1/static/functions-string.html
to extract number after dash, and then use this value in MAX function as you have in your code now. You can also try to_number function.
It will look similiar to this:
MAX(substring(nr from position('-' in nr))::int)

Concatenate strings in Oracle SQL without a space in between?

I am trying to concatenate strings in oracle.
The following is my query:
insert into dummy values('c'||to_char(10000,'99999'));
The expected result is:
c10000
But the output I get is with a space in between 'c' and the value 10000:
c 10000
How to concat without spaces?
This is not an issue with the concatenation operator but with the function to_char(). Try instead:
to_char(10000,'FM99999')
I quote the manual here:
FM ..
Returns a value with no leading or trailing blanks.
There are two solutions:
Fill Mode ('FM') formatting prefix that suppresses the additional blank character prefix for the to_char number conversion. I suggest this one is preferred, because it is integrated with the to_char format and does not require an additional function call;
LTRIM of the returned value from the to_char number conversion.
The code below shows the results of both solutions:
Select concat('NTA', to_char(1,'FM0000000000000')),
concat('NTA', ltrim(to_char(1,'0000000000000'))),
concat('NTA', to_char(1,'0000000000000'))
from dual;
"CONCAT('NTA',TO_CHAR(1,'FM0000000000000'))": "NTA0000000000001"
"CONCAT('NTA',LTRIM(TO_CHAR(1,'0000000000000')))": "NTA0000000000001"
"CONCAT('NTA',TO_CHAR(1,'0000000000000'))": "NTA 0000000000001"