I am new to Hive and am trying to pull all the records in a table that match a particular format.
> Table structure-
> (
> id string,
> col_json string
> )
Within the JSON in col_json, there is a text attribute in which I am looking for the format
\"abc\":\"xyz\"
I am using a where condition like below -
and get_json_object(a.col_json,'$.Attributes[].attributeValues[].attributeValue') like '%\"abc\":\"xyz\"%'
But this does not seem to be working as I am not getting any rows returned.
Can someone suggest what is going wrong?
Add one more backslash before each backslash in the LIKE pattern and run again:
hive>
get_json_object(a.col_json,'$.Attributes[].attributeValues[].attributeValue')
like '%\\"abc\\":\\"xyz\\"%'
With a single backslash, Hive treats it as the escape character (\), so we need to use two backslashes (\\) for Hive to treat it as a literal backslash (\).
I have rows containing data like this in column called ERROR_CODE:
00111[2003] Maschine0; 000222[2003] Maschinen2
I need to filter out only values in the brackets like this in one row:
2003;2003
I have one solution, but it only gets the first element, and I need all of them, like 2003,2003:
SUBSTRING(ERROR_CODE,CHARINDEX('[',ERROR_CODE)+1 ,CHARINDEX(']',ERROR_CODE)-CHARINDEX('[',ERROR_CODE)-1)
Could you please help me find a solution?
This is based on several assumptions:
Each error is semicolon (;) delimited
An error always contains one value in brackets ([])
You are using a fully supported version of SQL Server.
One method to achieve this would be to split your string on the delimiter (;). Then you can find the positions of the left bracket ([) and the right bracket (]) in each part and use SUBSTRING to get the content between them. Finally, you can string aggregate to get 1 row (per value of your column) again:
-- Locate the brackets in each split part (SS.value), not in the whole column
SELECT STRING_AGG(SUBSTRING(SS.value, CI.LB + 1, CI.RB - CI.LB - 1), ',')
FROM (VALUES('00111[2003] Maschine0; 000222[2003] Maschinen2'))V(YourColumn)
CROSS APPLY STRING_SPLIT(V.YourColumn,';') SS
CROSS APPLY (VALUES(CHARINDEX('[',SS.value),CHARINDEX(']',SS.value)))CI(LB,RB)
GROUP BY V.YourColumn;
For point 3, if you are not using a fully supported version of SQL Server, you will need to use a user-defined (set-based or CLR) string splitter and FOR XML PATH for splitting and aggregating your strings, respectively. If either 1 or 2 is not true, you have a far more fundamental problem with your design than you let on; fix your design.
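As a language-neutral illustration of the split, extract, and aggregate steps described above, here is a minimal Python sketch. The function name bracket_values is hypothetical, and the sketch relies on the same assumptions as the answer (semicolon-delimited errors, one bracketed value per error); parts without a bracket pair are simply skipped here, which is a choice of this sketch rather than something the SQL does.

```python
def bracket_values(error_code: str, delimiter: str = ";") -> str:
    """Return the bracketed values of each delimited part, comma-joined."""
    values = []
    for part in error_code.split(delimiter):
        lb = part.find("[")            # position of the left bracket in this part
        if lb == -1:
            continue                   # no bracket in this part: skip it
        rb = part.find("]", lb + 1)    # first right bracket after the left one
        if rb == -1:
            continue
        values.append(part[lb + 1:rb])  # content between the brackets
    return ",".join(values)


if __name__ == "__main__":
    print(bracket_values("00111[2003] Maschine0; 000222[2003] Maschinen2"))
```

The key point is that the bracket positions are looked up in each split part, not in the whole string.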
I need to create a View on top of a Hive Table, masking data in a particular column.
The Table has a column of String Type. The data in that particular column is of JSON structure. I need to mask a value of a particular field say 'ip_address'
{"id":1,"first_name":"john","last_name":"doe","email":"sample#123.com","ip_address":"111.111.111.111"}
expected:
{"id":1,"first_name":"john","last_name":"doe","email":"sample#123.com","ip_address":null}
These are the few Built-in Hive Functions I have tried, they don't seem to help my cause.
mask
get_json_object
STR_TO_MAP
if clause
Also, I don't think substring and regexp_extract are useful here because the position of the field value is not always predetermined, plus I'm not familiar with regular expressions.
PS: Any help is appreciated that would help me avoid writing a new UDF.
regexp_replace:
select regexp_replace(column_name,'"ip_address":".*?"', '"ip_address":null') as column_name will work fine with any position.
You can allow any number of optional spaces before and after the colon:
regexp_replace(column_name,'"ip_address" *: *".*?"', '"ip_address":null')
Regexp '"ip_address" *: *".*?"' meaning:
"ip_address" - literally "ip_address"
* - 0 or more spaces (allowed in json)
: - literally :
* - 0 or more spaces
".*?" - any number of any characters (non-greedy) inside double-quotes.
See also similar question if you want to replace value with some calculated value, for example obfuscate using sha256, not with just null: https://stackoverflow.com/a/54179543/2700344
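To see how the pattern behaves regardless of where the field sits, here is a small Python sketch that applies the same regular expression with Python's re module. This is only an illustration: Hive evaluates regexp_replace with Java regular expressions, and the helper name mask_ip is hypothetical. For this particular pattern the two regex flavors should behave the same way.

```python
import re

# Same pattern as in the regexp_replace call: optional spaces around the colon,
# then a non-greedy match of everything inside the double quotes.
IP_FIELD = re.compile(r'"ip_address" *: *".*?"')


def mask_ip(json_text: str) -> str:
    """Replace the quoted ip_address value with a JSON null."""
    return IP_FIELD.sub('"ip_address":null', json_text)


if __name__ == "__main__":
    row = ('{"id":1,"first_name":"john","last_name":"doe",'
           '"email":"sample#123.com","ip_address":"111.111.111.111"}')
    print(mask_ip(row))
```

Because the match is non-greedy, it stops at the first closing quote, so fields that follow ip_address are left untouched.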
I need to convert the name of the table to lowercase before passing it to the query.
Irrespective of the case in which I pass the value for parameter $1, I need it to be converted to lowercase before executing the query below.
QUERY:
show tables like '$1';
I have tried something like
QUERY
show tables like 'lower($1)';
But this doesn't work.
please help.
Your response would be highly appreciated
Impala identifiers are always case-insensitive. That is, tables named
t1 and T1 always refer to the same table, regardless of quote
characters. Internally, Impala always folds all specified table and
column names to lowercase. This is why the column headers in query
output are always displayed in lowercase.
Impala Documentation
All of the queries below give the same result, because Impala converts the names to lowercase internally.
show tables like 'test*';
show tables like 'TeSt*';
show tables like 'TEST*';
Hi, I am trying to extract a portion of the data from one column in my Hive table, but the position of the text I need is not fixed:
select value4,regexp_extract(value4,'*****',0) from hive_table;
column value is shown below
grade:data:home made;Cat;dinnerbox_grade_Enroll
list:date:may;animal;dinnerbox_list_value
cgrade:made_data;dinnerbox_cgrade_notEnroll
I want the data from dinnerbox to the end of the value.
Can anyone help with this?
It is a pretty simple regular expression
.*dinnerbox(.*?)$
Using a non-greedy wildcard, but forcing it to the end of the line makes sure that you always get the dinnerbox at the end.
You want capture group 1.
To get rid of the _ you can use
.*dinnerbox_(.*?)$
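As a quick check of what capture group 1 contains for the sample values, here is a minimal Python sketch using the two patterns above. It uses Python's re module purely for illustration (Hive's regexp_extract uses Java regular expressions, which should treat these patterns the same way), and the helper name capture_tail is hypothetical.

```python
import re

AFTER_DINNERBOX = re.compile(r".*dinnerbox(.*?)$")    # keeps the leading underscore
AFTER_UNDERSCORE = re.compile(r".*dinnerbox_(.*?)$")  # drops the leading underscore


def capture_tail(value, pattern):
    """Return capture group 1, or None when the pattern does not match."""
    match = pattern.match(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "grade:data:home made;Cat;dinnerbox_grade_Enroll",
        "list:date:may;animal;dinnerbox_list_value",
        "cgrade:made_data;dinnerbox_cgrade_notEnroll",
    ]
    for value in samples:
        print(capture_tail(value, AFTER_DINNERBOX), capture_tail(value, AFTER_UNDERSCORE))
```

The leading .* is greedy, so if "dinnerbox" appeared more than once, the capture would start after the last occurrence.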
I have a column that I need to clean the data up on.
First I'd like to do a select to get a record of the bad data, then I'd like to run a replace on the invalid characters.
I'm looking to select anything that contains non-alphanumeric characters, but ignores the slash "\" as the second character and also ignores underscores and dashes in the rest of the string. Here are a couple of examples of the data I'm expecting to get back from this query.
#\AAA
A\Adam's
A\Amanda.Smith
B\Bear's-ltd
C\Couple & More
After this I'd like to run a replace on any of these invalid characters and replace them with underscores so the result would look like this:
_\AAA
A\Adam_s
A\Amanda_Smith
B\Bear_s-ltd
C\Couple_More
I do not think there is native support for that. You can create a CLR function to support regex, e.g.: https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/
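For reference, the rule such a regex function would need to apply can be sketched in Python. This is not T-SQL or CLR code; the names INVALID_RUN, has_invalid and clean are hypothetical, and the sketch assumes, based on the expected output in the question, that a run of consecutive invalid characters collapses into a single underscore.

```python
import re

# One or more characters that are not letters, digits, underscores or dashes.
INVALID_RUN = re.compile(r"[^A-Za-z0-9_-]+")


def _split(value):
    """Split into (first char, backslash or '', rest), keeping a backslash
    only when it is the second character."""
    if len(value) > 1 and value[1] == "\\":
        return value[:1], "\\", value[2:]
    return value, "", ""


def has_invalid(value: str) -> bool:
    """True when the value contains invalid characters (the select step)."""
    head, _, tail = _split(value)
    return bool(INVALID_RUN.search(head) or INVALID_RUN.search(tail))


def clean(value: str) -> str:
    """Replace each run of invalid characters with one underscore."""
    head, sep, tail = _split(value)
    return INVALID_RUN.sub("_", head) + sep + INVALID_RUN.sub("_", tail)


if __name__ == "__main__":
    for sample in [r"#\AAA", r"A\Adam's", r"A\Amanda.Smith",
                   r"B\Bear's-ltd", r"C\Couple & More"]:
        print(sample, "->", clean(sample))
```

A backslash anywhere other than the second position is treated as invalid in this sketch, which is one reading of the requirement in the question.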