I am trying to write a query in Hive to find the rows that are not in hexadecimal format. I tried RLIKE, but that retrieves the rows that are in hexadecimal.
Just use NOT RLIKE:
select col from table where col not rlike '^[0-9A-F]+$'; --apply your hexadecimal pattern
For both upper and lower case use this pattern: '^[0-9a-fA-F]+$'
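To see which values the negated pattern keeps, here is a small sketch in Python. It uses Python's re module as a stand-in for Hive's Java regex engine (the two should agree on a pattern this simple), and the sample values and the helper name is_not_hex are made up for illustration.

```python
import re

# Same pattern as in the answer, accepting upper- and lower-case hex digits.
HEX_PATTERN = re.compile(r'^[0-9a-fA-F]+$')

def is_not_hex(value):
    """Return True when value would be kept by `col NOT RLIKE pattern`."""
    return HEX_PATTERN.search(value) is None

# Hypothetical sample values, not taken from the question.
samples = ['1A2F', 'deadbeef', '12G4', 'xyz', '']
non_hex = [s for s in samples if is_not_hex(s)]
print(non_hex)  # ['12G4', 'xyz', '']
```

Note that in SQL a NULL col makes the NOT RLIKE predicate NULL rather than true, so such rows are not returned by the query.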
The query below outputs 1642575.0, but I only want 1642575 (just the number, without the decimal point and the zero following it). The number of delimited values in the field varies. The only constant is that there is always exactly one number with a decimal point. I was trying to write a regexp_extract call to extract the number between the " and the . characters.
How would I revise my regexp_extract function to get the desired output? Thank you!
select regexp_extract('{"1244644": "1642575.0", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*');
You can cast the result to bigint.
select cast(regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*') as bigint) col;
output - 1642575
You can use round if you want to round it off.
select round(regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','([1-9][0-9]*[.][0-9]+)&*')) col;
output - 1642576
Use this regexp: '"(\\d+)\\.' - means double-quote, capturing group with one or more digits, dot.
select regexp_extract('{"1244644": "1642575.9", "1338410": "1650435"}','"(\\d+)\\.',1)
Result:
1642575
To skip any number of leading zeroes, use this regexp: '"0*(\\d+)\\.'
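The two patterns can be tried outside Hive with a quick sketch. This uses Python's re module as a stand-in for Hive's regexp_extract (an assumption; the patterns are simple enough that both engines should behave the same), and the zero-padded input is a hypothetical value, not from the question.

```python
import re

text = '{"1244644": "1642575.9", "1338410": "1650435"}'

# '"(\d+)\.' : a double quote, a captured run of digits, then a literal dot.
match = re.search(r'"(\d+)\.', text)
integer_part = match.group(1)

# '"0*(\d+)\.' : same, but leading zeroes are consumed before the capture.
padded = '{"1244644": "0001642575.9"}'  # hypothetical zero-padded value
stripped = re.search(r'"0*(\d+)\.', padded).group(1)

print(integer_part, stripped)  # 1642575 1642575
```

Only the value followed by a dot matches, so the position of the decimal number within the field does not matter.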
I have a column in my table with the following structure:
column_1
1233042
2233098
5230980
I need to replace the first digit with a letter:
column_1
A233042
B233098
E230980
The first digit ranges from 1 to 9, so the letters will be A-I.
Is there a way to do it without using CASE statement for each number in Hive?
You can use a combination of functions such as:
SELECT DECODE(UNHEX(HEX(SUBSTR(column_1,1,1)+64)), 'US-ASCII')||
SUBSTR(column_1,1-LENGTH(column_1))
FROM t
where the replacement of the first character follows the ASCII code points (65 -> A, 66 -> B, 69 -> E, ...)
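The same digit-to-letter arithmetic, sketched in Python to make the mapping easy to check. The function name digit_to_letter is hypothetical, and the guard for values that do not start with 1-9 is an added assumption about how such rows should be treated.

```python
def digit_to_letter(value):
    """Replace a leading digit 1-9 with the letter A-I; leave other values unchanged."""
    if value and value[0] in '123456789':
        # ord('A') is 65, so digit d maps to code point d + 64.
        return chr(int(value[0]) + 64) + value[1:]
    return value

rows = ['1233042', '2233098', '5230980']
converted = [digit_to_letter(r) for r in rows]
print(converted)  # ['A233042', 'B233098', 'E230980']
```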
Due to an earlier error, I ended up with letters and symbols in places where I should have had integers and floats. At this time, I don't know the extent of the problem and am working to correct the code as we move forward.
As of right now when I run SELECT distinct col1 from table; I get integers, floats, symbols and letters. A few million of them.
How can I update the SQL to exclude all numbers? In other words, show me only letters and symbols.
You can use the GLOB operator:
select col1
from tablename
where col1 GLOB '*[^0-9]*'
This will return all values of col1 that contain any character different than a digit.
You may change it to include '.' in the list of chars:
where col1 GLOB '*[^0-9.]*'
If what you want is values that do not contain any digits then use this:
select col1
from tablename
where col1 not GLOB '*[0-9]*'
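The three GLOB filters can be compared side by side with an in-memory SQLite database. This is a minimal sketch: the sample values are hypothetical, and the helper run is only there to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table tablename (col1 text)')
# Hypothetical sample values stored as text.
conn.executemany('insert into tablename values (?)',
                 [('123',), ('4.5',), ('abc',), ('#$',), ('12a',)])

def run(where):
    """Return col1 values matching the given WHERE clause, in insertion order."""
    sql = 'select col1 from tablename where ' + where + ' order by rowid'
    return [row[0] for row in conn.execute(sql)]

# Contains at least one character that is not a digit.
any_non_digit = run("col1 GLOB '*[^0-9]*'")
# Contains at least one character that is neither a digit nor a dot.
any_non_digit_or_dot = run("col1 GLOB '*[^0-9.]*'")
# Contains no digits at all.
no_digits = run("col1 not GLOB '*[0-9]*'")

print(any_non_digit)         # ['4.5', 'abc', '#$', '12a']
print(any_non_digit_or_dot)  # ['abc', '#$', '12a']
print(no_digits)             # ['abc', '#$']
```

The difference between the last two shows up on mixed values such as '12a', which contain a digit but are still not numbers.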
Hmmm . . . SQLite doesn't have regular expressions built-in, making this a bit of a pain. If the column actually contains numbers and strings (because that is possible in SQLite), you can use
where typeof(col) = 'text'
If the types are all text (so '1.23' rather than 1.23), then this may do what you want:
where cast( (col + 0) as text) <> col
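A sketch of how the typeof check and the cast round trip behave, run against an in-memory SQLite database. The table t and its values are hypothetical. The column is declared without a type so each value keeps the storage class it was inserted with. Equality on the round trip picks out numeric-looking text, and inequality picks out everything else.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# No declared column type, so values keep their own storage class.
conn.execute('create table t (col)')
conn.executemany('insert into t values (?)',
                 [(1,), (1.23,), ('abc',), ('#',), ('42',), ('1.23',)])

def run(where):
    """Return col values matching the given WHERE clause, in insertion order."""
    sql = 'select col from t where ' + where + ' order by rowid'
    return [row[0] for row in conn.execute(sql)]

# Values stored as text, whether or not they look like numbers.
stored_as_text = run("typeof(col) = 'text'")
# Text that survives the number round trip unchanged, i.e. looks numeric.
numeric_text = run("typeof(col) = 'text' and cast((col + 0) as text) = col")
# Text that does not survive the round trip: letters and symbols.
non_numeric_text = run("typeof(col) = 'text' and cast((col + 0) as text) <> col")

print(stored_as_text)    # ['abc', '#', '42', '1.23']
print(numeric_text)      # ['42', '1.23']
print(non_numeric_text)  # ['abc', '#']
```

The round trip is strict about formatting, so numeric text in a non-canonical form (for example '1.50') would land in the non-numeric group.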
I would like to extract strings of varying length located between two repeating underscores in Hive QL. Below I show a sampling of the pattern of the rows. Specifically, I would like to extract the string between the 3rd and 4th underscores. Thanks!
2016_sadfsa_IL_THIS_xsdaf_asd_eventbyevent_tsaC_NA_300x250
2017_thisshopper_MA_THIS_NAT_Leb_ReasonsWhy_HDIMC_NA_300x600
2017_FordShopper_IL_THESE_NAT_sov_winterEvent_HDIMC_NA_300x600
Just kept trying and I modified this from previous responses to non-Hive SQL. I am still interested in knowing better ways of doing this. Note that creative_str is the name of the column:
select creative_str, ltrim(rtrim(substring(regexp_replace(cast(creative_str as varchar(1000)), '_', repeat(cast(' ' as varchar(1000)),10000)), 30001, 10000)))
from impression_cr
You should be able to do this with Hive's SPLIT() function. If you're trying to grab the value between the third and fourth underscores, this will do it:
SELECT SPLIT("2016_sadfsa_IL_THIS_xsdaf_asd_eventbyevent_tsaC_NA_300x250", "[_]")[3],
SPLIT("2017_thisshopper_MA_THIS_NAT_Leb_ReasonsWhy_HDIMC_NA_300x600", "[_]")[3],
SPLIT("2017_FordShopper_IL_THESE_NAT_sov_winterEvent_HDIMC_NA_300x600", "[_]")[3]
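The zero-based indexing is the part that is easy to get wrong, so here is the same idea sketched with Python's str.split on the sample rows from the question. Index 3 is the fourth field, which is the text between the third and fourth underscores.

```python
rows = [
    '2016_sadfsa_IL_THIS_xsdaf_asd_eventbyevent_tsaC_NA_300x250',
    '2017_thisshopper_MA_THIS_NAT_Leb_ReasonsWhy_HDIMC_NA_300x600',
    '2017_FordShopper_IL_THESE_NAT_sov_winterEvent_HDIMC_NA_300x600',
]

# Splitting on '_' gives a list; element 3 sits between the 3rd and 4th underscores.
fourth_fields = [row.split('_')[3] for row in rows]
print(fourth_fields)  # ['THIS', 'THIS', 'THESE']
```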
I have a SQL table where the username appears in different cases, for example "ACCOUNTS\Ninja.Developer" or "ACCOUNTS\ninja.developer".
I want to find how many records have a username in which the first letters of both the first name and the last name are capitalized. How can I use a regex to find the total?
x table
User
"ACCOUNTS\James.McAvoy"
"ACCOUNTS\michael.fassbender"
"ACCOUNTS\nicholas.hoult"
"ACCOUNTS\Oscar.Isaac"
Do you want something like this?
select count(*)
from t
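where name rlike 'ACCOUNTS\\\\[A-Z][a-z0-9]*[.][A-Z][a-z0-9]*' -- \\\\ is one literal backslash after string and regex unescaping; the number needed may differ by database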
Of course, different databases implement regular expressions differently, so the actual comparator may not be rlike.
In SQL Server, you can do:
select count(*)
from t
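where name like 'ACCOUNTS\[A-Z]%[.][A-Z]%';
You might need to make sure the comparison uses a case-sensitive collation. A binary collation is the safest choice, because under some dictionary-order collations the range [A-Z] can also match lowercase letters.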
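To check the intended matching logic against the sample rows, here is a sketch using Python's re module, which is case-sensitive by default. It is a stand-in for whichever regex operator the database provides, not a statement about how any particular database behaves.

```python
import re

# In the regex, \\ is a literal backslash; [A-Z] must follow it, then the
# rest of the first name, a dot, and an uppercase letter starting the last name.
pattern = re.compile(r'^ACCOUNTS\\[A-Z][a-z0-9]*[.][A-Z]')

users = [
    r'ACCOUNTS\James.McAvoy',
    r'ACCOUNTS\michael.fassbender',
    r'ACCOUNTS\nicholas.hoult',
    r'ACCOUNTS\Oscar.Isaac',
]

capitalized = [u for u in users if pattern.search(u)]
count = len(capitalized)
print(count)  # 2
```

Only the initial of the last name is tested, so mixed-case surnames such as McAvoy still count.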
In most cases the string collation in MS SQL is case-insensitive, so we need a trick. Here is an example:
declare @accts table(acct varchar(100))
--sample data
insert @accts values
('ACCOUNTS\James.McAvoy'),
('ACCOUNTS\michael.fassbender'),
('ACCOUNTS\nicholas.hoult'),
('ACCOUNTS\Oscar.Isaac')
;with accts as (
select
--cleanup and split values: frst is the text before the dot, last is the text after it
left(replace(acct,'ACCOUNTS\',''),charindex('.',replace(acct,'ACCOUNTS\',''),0)-1) frst,
right(replace(acct,'ACCOUNTS\',''),len(replace(acct,'ACCOUNTS\',''))-charindex('.',replace(acct,'ACCOUNTS\',''),0)) last
from @accts
)
,groups as (--add comparison columns
select frst, last,
case when CAST(frst as varbinary(max)) = CAST(lower(frst) as varbinary(max)) then 'lower' else 'Upper' end frstCase, --binary compare circumvents the case-insensitive collation; 'Upper' means the name contains at least one uppercase letter
case when CAST(last as varbinary(max)) = CAST(lower(last) as varbinary(max)) then 'lower' else 'Upper' end lastCase
from accts
)
--and gather fruit
select frstCase, lastCase, count(frst) cnt
from groups
group by frstCase,lastCase
Your question is a little vague, but you might be looking for the DISTINCT keyword.
I don't think you need regex.
Maybe do something like:
Get distinct names from Table X as Table A
Use inputs table A as where clause on Table X
count
union
I hope this helps,
Rhys
Given your example set you can use a combination of techniques. First if the user name always begins with "ACCOUNTS\" then you can use substr to select the characters that start after the "\" character.
For the first name:
Then you can use a regex function to see if it matches against [A-Z] or [a-z] assuming your username must start with an alpha character.
For the last name:
Use the instr function on the substr and search for the character '.' and again apply the regex function to match against [A-Z] or [a-z] to see if the last name starts with an upper or a lower character.
To total:
Select all matches where both first and last match against upper and do a count. Repeat for the lower matches and you'll have both totals.
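The steps above can be sketched in Python to make the logic concrete. String slicing plays the role of substr, partition plays the role of instr, and the function names initial_case and classify are hypothetical. The 'other' label for names that do not start with a letter is an added assumption.

```python
from collections import Counter

PREFIX = 'ACCOUNTS\\'

def initial_case(name):
    """Classify a name by its first character: 'upper', 'lower', or 'other'."""
    if name[:1].isupper():
        return 'upper'
    if name[:1].islower():
        return 'lower'
    return 'other'

def classify(user):
    # substr step: drop everything up to and including the backslash.
    rest = user[len(PREFIX):] if user.startswith(PREFIX) else user
    # instr step: split at the first '.' into first and last name.
    first, _, last = rest.partition('.')
    return initial_case(first), initial_case(last)

users = [
    r'ACCOUNTS\James.McAvoy',
    r'ACCOUNTS\michael.fassbender',
    r'ACCOUNTS\nicholas.hoult',
    r'ACCOUNTS\Oscar.Isaac',
]

# Count each (first-name case, last-name case) combination in one pass.
totals = Counter(classify(u) for u in users)
print(totals[('upper', 'upper')], totals[('lower', 'lower')])  # 2 2
```

Counting the combinations in one pass gives both totals at once and also surfaces mixed cases such as an uppercase first name with a lowercase last name.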