Remove leading zeros using HiveQL - hive

I have a string value that may have leading zeros, and I want to remove all of them.
For example:
accNumber = "000340" ---> "340"
Is there a UDF available in Hive for this? Can we use regexp_extract?

Yes, just use REGEXP_REPLACE().
SELECT some_string,
REGEXP_REPLACE(some_string, "^0+", '') stripped_string
FROM db.tbl
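To see what the "^0+" pattern does outside Hive, here is a minimal Python sketch of the same regular expression. The function name strip_leading_zeros is made up for illustration, and Python's re module stands in for Hive's regex engine, which should behave the same way for a pattern this simple.

```python
import re


def strip_leading_zeros(value: str) -> str:
    """Python stand-in for REGEXP_REPLACE(value, '^0+', '')."""
    # ^ anchors at the start of the string, 0+ matches one or more zeros,
    # so only the leading run of zeros is removed.
    return re.sub(r"^0+", "", value)


print(strip_leading_zeros("000340"))      # 340
print(strip_leading_zeros("100200"))      # 100200 (inner and trailing zeros are kept)
print(repr(strip_leading_zeros("000")))   # '' (an all-zero value becomes empty)
```

Note the last case: a value made only of zeros turns into an empty string, which may or may not be what you want.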

You can also cast the value to INT and back to STRING:
SELECT cast(cast("000340" as INT) as STRING) col_without_leading_zeroes
FROM db.table;
Output: 340 (the data type will be STRING)
Hope this is helpful.
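The cast round trip does not behave like a regex on every input. The Python sketch below is only a rough model: cast_round_trip is a hypothetical helper, returning None stands in for Hive producing NULL on non-numeric strings, and Python's int() is more lenient than Hive (it has no INT range limit, for example), so treat it as an illustration of the idea rather than an exact emulation.

```python
from typing import Optional


def cast_round_trip(value: str) -> Optional[str]:
    """Rough stand-in for cast(cast(value as INT) as STRING)."""
    try:
        return str(int(value))
    except ValueError:
        # Assumption: models the NULL that Hive returns for non-numeric input.
        return None


print(cast_round_trip("000340"))  # 340
print(cast_round_trip("0000"))    # 0    (a "^0+" regex replace would give '')
print(cast_round_trip("00A12"))   # None (a "^0+" regex replace would give 'A12')
```

So the cast is a good fit when the column is known to hold only digits that fit in an INT, while the regex is safer for account numbers that may contain letters or be very long.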

Related

Snowflake SQL - Format Phone Number to 10 digits

I have a column with phone numbers in varchar, currently looks something like this. Because there is no consistent format, I don't think substring works.
(956) 444-3399
964-293-4321
(929)293-1234
(919)2991234
How do I remove all brackets, spaces and dashes and have the query return just the digits, in Snowflake? The desired output:
9564443399
9642934321
9292931234
9192991234
You can use regexp_replace() function to achieve this:
REGEXP_REPLACE(yourcolumn, '[^0-9]','')
That will strip out any non-numeric character.
You could also use regexp_replace to remove only the special characters, something like this:
select regexp_replace('(956) 444-3399', '[\(\) -]', '')
An alternative using translate:
select translate('(956) 444-3399', '() -', '')
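The two approaches above differ in what they remove: '[^0-9]' drops everything that is not a digit, while the translate call drops only the listed characters. A small Python sketch of both, run against the four sample numbers from the question (the names PHONES, digits_only and drop_punctuation are made up here):

```python
import re

PHONES = ["(956) 444-3399", "964-293-4321", "(929)293-1234", "(919)2991234"]


def digits_only(phone: str) -> str:
    """Stand-in for REGEXP_REPLACE(col, '[^0-9]', ''): keep digits only."""
    return re.sub(r"[^0-9]", "", phone)


def drop_punctuation(phone: str) -> str:
    """Stand-in for TRANSLATE(col, '() -', ''): delete only the listed characters."""
    # str.maketrans("", "", chars) builds a table that deletes each character in chars.
    return phone.translate(str.maketrans("", "", "() -"))


for phone in PHONES:
    print(digits_only(phone), drop_punctuation(phone))
```

For these four inputs both functions give the same digits. They diverge on characters that are not in the list: a value such as "956.444.3399" keeps its dots with the translate-style version but not with the digits-only version.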

Redshift SQL REGEXP_REPLACE function

I have a value which is duplicated from source (can't do anything about that). I have read some examples here https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_REPLACE.html
Example value:
ABC€ABC€
So just trimming anything after the first '€'. I tried this, but I cannot figure out the correct regex.
REGEXP_REPLACE(value, '€.*\\.$', '')
"So just trimming anything after the first '€'."
Why use regex at all? Why not just:
SELECT LEFT(value, CHARINDEX('€', value)-1)
If not all your data has a euro sign, consider adding WHERE value LIKE '%€%'.
Your current regex pattern requires a literal dot (\\.) as the final character, so it never matches a value that ends in '€'. Remove the dot and your approach should work:
SELECT REGEXP_REPLACE(value, '€.*$', '') AS value_out
FROM yourTable;
Or you can take the initial sequence of non-€ characters:
REGEXP_SUBSTR(value, '^[^€]+')
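The suggestions above can be compared side by side in a short Python sketch. It assumes the sample value is meant to be 'ABC€ABC€', and that the doubled backslash in the SQL string reaches the regex engine as a single escaped dot. All function names are hypothetical.

```python
import re

SAMPLE = "ABC€ABC€"  # assumed sample value containing the euro sign


def original_attempt(s: str) -> str:
    # Requires a literal '.' right before the end, so it does not match SAMPLE.
    return re.sub(r"€.*\.$", "", s)


def trim_with_replace(s: str) -> str:
    # Like REGEXP_REPLACE(value, '€.*$', ''): drop the first € and everything after it.
    return re.sub(r"€.*$", "", s)


def trim_with_substr(s: str):
    # Like REGEXP_SUBSTR(value, '^[^€]+'): take the leading run of non-€ characters.
    match = re.search(r"^[^€]+", s)
    return match.group(0) if match else None


def trim_with_left(s: str) -> str:
    # Like LEFT(value, CHARINDEX('€', value) - 1), with a guard for values without €.
    pos = s.find("€")
    return s[:pos] if pos != -1 else s


print(original_attempt(SAMPLE))   # ABC€ABC€ (unchanged)
print(trim_with_replace(SAMPLE))  # ABC
print(trim_with_substr(SAMPLE))   # ABC
print(trim_with_left(SAMPLE))     # ABC
```

The guard in trim_with_left plays the role of the WHERE clause mentioned above: without it, a value that has no '€' needs separate handling.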

BigQuery LTRIM doesn't return desired result

I've got the following SQL code:
SELECT LTRIM("0039040123456","0039")
The result should be 040123456 but BigQuery returns 40123456.
Why is the 0 trimmed as well?
Bug or intended behavior?
Many thanks!
Try this:
SELECT LTRIM("0039p40123456","p039")
40123456
It removed the p too!
That's because:
If value2 contains more than one character or byte, the function removes all leading or trailing characters or bytes contained in value2.
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators?hl=en#trim
(so it looks at the list of characters, not the sequence of them)
What you really want is:
SELECT REGEXP_REPLACE("0039040123456","^0039","")
The column type is STRING. After looking at the docs https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators?hl=de#ltrim I guess this behavior is intended. (Have a look at the fruits example.)
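Python's str.lstrip happens to treat its argument as a set of characters in the same way, so it makes a handy way to reproduce the surprise and the fix. This is only an analogy, not BigQuery itself, and the function names are made up for the sketch.

```python
import re


def ltrim_chars(value: str, chars: str) -> str:
    """Character-set trimming: strip leading characters while they appear in chars."""
    return value.lstrip(chars)


def remove_prefix(value: str, prefix: str) -> str:
    """Sequence-based removal, like REGEXP_REPLACE(value, '^0039', '')."""
    # re.escape keeps the prefix literal even if it contains regex metacharacters.
    return re.sub("^" + re.escape(prefix), "", value)


print(ltrim_chars("0039040123456", "0039"))    # 40123456  (the extra 0 is in the set)
print(ltrim_chars("0039p40123456", "p039"))    # 40123456  (order of characters is ignored)
print(remove_prefix("0039040123456", "0039"))  # 040123456 (only the exact prefix goes)
```

Trimming stops at the first character that is not in the set, which is the 4 in both of the first two calls.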

How can I replace a string pattern with blank in hive?

I have a string as:
https://maps.googleapis.com/maps/api/staticmap?center=41.892532+-87.63811&zoom=11&scale=2&size=280x320&maptype=roadmap&format=png&visual_refresh=true%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:1%7C2413+S+State+St++Chicago+IL+60616%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:2%7C3000+N+Halsted+St++Chicago+IL+60657%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&key=AIzaSyBNEAQcC5niAEeiP3zkA_nuWGvtl0IOEs4
I want to replace the '++++' pattern at the end with blank and not the single occurrence of '+'. Tried using regexp_replace and translate functions in hive but that replaces all the single occurrences of '+' as well.
Use
regexp_replace(string,'[+]{4}','')
Pattern '[+]{4}' means the + character four times in a row.
Test:
select regexp_replace('++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&','[+]{4}','');
Result:
OK
++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C&
Did you try this?
replace(string, '++++', '')
Admittedly, this will replace all occurrences of '++++', but your string only has one of them.
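Both suggestions can be checked quickly in Python using the test string from the answer above. The function names are hypothetical, and Python's re and str.replace are stand-ins for the Hive functions.

```python
import re

TEST = "++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&"


def drop_four_plus_regex(text: str) -> str:
    """Stand-in for regexp_replace(text, '[+]{4}', '')."""
    # Putting + inside [] makes it a literal character; {4} requires exactly four in a row.
    return re.sub(r"[+]{4}", "", text)


def drop_four_plus_literal(text: str) -> str:
    """Stand-in for replace(text, '++++', '')."""
    return text.replace("++++", "")


print(drop_four_plus_regex(TEST))    # ++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C&
print(drop_four_plus_literal(TEST))  # ++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C&
```

The leading '++' survives in both cases because it is shorter than four characters. A run of five plus signs would be reduced to a single '+', since only one group of four is removed from it.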

How to get substring based on a character and starting to read the string from the right

I have the following values on a column:
DB3-0800-VRET,
DB3-0800-IC,
IB-TZ-850-IB,
O11FS-OB ...
From each value I want to remove the last part after the dash.
I need to have the following result:
DB3-0800-VRET -> DB3-0800,
DB3-0800-IC -> DB3-0800,
O11FS-OB -> O11FS
I tried to work with the SPLIT_PART function of Redshift but I didn't have any luck.
If someone knows a regex to select the part I need I'd be grateful.
In both Postgres and Redshift, you should be able to use regexp_replace():
select regexp_replace(str, '-[^-]+$', '')
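Here is a short Python sketch of the same pattern applied to the sample values, next to a split-from-the-right alternative. The function names are made up for illustration.

```python
import re

VALUES = ["DB3-0800-VRET", "DB3-0800-IC", "IB-TZ-850-IB", "O11FS-OB"]


def drop_last_segment(value: str) -> str:
    """Stand-in for regexp_replace(str, '-[^-]+$', '')."""
    # Matches the last dash plus the run of non-dash characters up to the end.
    return re.sub(r"-[^-]+$", "", value)


def drop_last_segment_split(value: str) -> str:
    """Same idea without a regex: split once, starting from the right."""
    return value.rsplit("-", 1)[0]


for v in VALUES:
    print(v, "->", drop_last_segment(v))
# DB3-0800-VRET -> DB3-0800
# DB3-0800-IC -> DB3-0800
# IB-TZ-850-IB -> IB-TZ-850
# O11FS-OB -> O11FS
```

A value with no dash at all comes back unchanged from both functions. They differ only on a value that ends in a dash: the regex leaves it alone because it needs at least one character after the dash, while the split version drops the trailing dash.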