Convert a sequential number to letter in HiveQL - hive

I have a column in my table with the following structure:
column_1
1233042
2233098
5230980
I need to replace the first number to a letter:
column_1
A233042
B233098
E230980
The range of numbers are 1-9, then the letters will be A-I.
Is there a way to do it without using CASE statement for each number in Hive?

You can use such a combination of functions :
SELECT DECODE(UNHEX(HEX(SUBSTR(column_1,1,1)+64)), 'US-ASCII')||
SUBSTR(column_1,1-LENGTH(column_1))
FROM t
where the first character replacement conforms to ASCII conversion logic(65->A,66->B,69->E...)

Related

Extract number between two characters in Hive SQL

The query below outputs 1642575.0. But I only want 1642575 (just the number without the decimal and the zero following it). The number of delimited values in the field varies. The only constant is that there's always only one number with a decimal. I was trying to write a regexp function to extract the number between " and ..
How would I revise my regexp_extract function to get the desired output? Thank you!
select regexp_extract('{"1244644": "1642575.0", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*');
You can cast the result to bigint.
select cast(regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*') as bigint) col;
output - 1642575
You can use round if you want to round it off.
select round(regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*')) col;
output - 1642576
Use this regexp: '"(\\d+)\\.' - means double-quote, capturing group with one or more digits, dot.
select regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','"(\\d+)\\.',1)
Result:
1642575
To skip any number of leading zeroes, use this regexp: '"0*(\\d+)\\.'

Find non-hexadecimal formatted data

I am trying to write a query in hive to find the rows that are not in hexadecimal format. I used RLIKE which retrieves the rows that are in hexadecimal.
Just use NOT RLIKE:
select col from table where col not rlike '^[0-9A-F]+$'; --apply your hexadecimal pattern
For both upper and lower case use this pattern: '^[0-9a-fA-F]+$'

Display certain sequence only in VARCHAR

I have a column error_desc with values like:
Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX.
What SQL query can I use to display only the number 111232 from that column? The number is placed at 66th position in VARCHAR column and ends 71st.
SELECT substr(ERROR_DESC,66,6) as ABC FROM bl1_cycle_errors where error_desc like '%FAN%'
This solution uses regular expressions.
The challenge I faced was on pulling out alphanumerics. We have to retain only numbers and filter out string,alphanumerics or punctuations in this case, to detect the standalone number.
Pure strings and words not containing numbers can be easily filtered out using
[^[:digit:]]
Possible combinations of alphanumerics are :
1.Begins with a character, contains numbers, may end with characters or punctuations :
[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*
2.Begins with numbers and then contains alphabets,may contain punctuations :
[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*
Begins with numbers then contains punctuations,may contain alphabets :
-- [0-9]+[a-zA-Z][[:punct:]]+[a-zA-Z] --Not able to highlight as code, refer solution's last regex combination
Combining these regular expressions using | operator we get:
select trim(REGEXP_REPLACE(error_desc,'[^[:digit:]]|[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*|[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*|[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*',' '))
from error_table;
Will work in most cases.

MAX in SQLSERVER

This may be a small one but i could find any , see this is how it is..
I have a sqlserver table with two columns and two rows , one of the column's name is Number and it has two rows with values
1. c7df055e-f8b5-4fc5-9c0a-8f59624c4022
2. 1234
When i query the table with this query select max(Number) from table table_name
Its giving the result c7df055e-f8b5-4fc5-9c0a-8f59624c4022 , So how does MAX calculate the maximum value when any of the values contains characters, i have searched for this and found this
For character columns, MAX finds the highest value in the collating sequence.
But could understand better , so anyone please suggest a better explanation..
Thanks in advance
Collating sequence refers to the definition of how the numeric codes translate to characters. ASCII is a common collating sequence, for example; the byte "65" translates to the character "A", the byte "58" translates to the character "8" etc.
Most languages will compare character by character, comparing the underlying values. So "c" is 99 ASCII, and "1" is 49 ASCII, so the string starting with "c" will be the larger value. In general, lowercase letters are higher than upper case are higher than numbers, and other characters are all over the place.
Your "number" column is a text type (evidenced by presence of alpha and hyphen chars). For text types, sorting is alphabetic, and letters are "higher" than numbers, so the value starting with "c" is greater than one starting with "1".
Sorting has nothing to do with the format if the value: If the first character of the alphanumeric value was a zero, you would have got "1234" as the max.

MSAccess Query a string matching a pattern

I have a table with a string field containing location information. I want to be able to query this table and retrieve all of the tags matching the format xxxxxxAA where xxxxxx is a 6-digit number and AA is two alphabetic characters.
Is there a method of querying this using SQL or is this something that I need to do in VBA?
Sample data:
BGS5 PM RGP5
022051PM
022201PM
030539PM
WAS3N
179546MM
And I want to return the following without knowing the values:
022051PM
022201PM
030539PM
179546MM
thanks in advance
Jason
You can use a query with a Like comparison in the WHERE clause.
SELECT y.text_field
FROM YourTable AS y
WHERE y.text_field Like '######[A-Z][A-Z]'
The # matches a digit.
[A-Z] matches one character from a character class consisting of only letters. That character class is actually upper case letters. However, the comparison is case-insensitive, so will match lower case letters, too.