Remove nulls from an array in SQL - sql
I want to remove nulls from an array in Hive/SQL.
For example, if the array is ['1',null], then after converting it to a string the result should be '1' only.
To join the array into a string I am using:
concat_ws( ",", array_val)
This gives: 1,null
Required output: 1
Thanks for the help!
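Use regexp_replace to strip the literal null entries from the concatenated string. The pattern below removes a leading "null," and any ",null" that follows another value: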
hive> select regexp_replace('null,1,2,null,2,3,null','(,+null)|(^null,)','');
OK
1,2,2,3
Time taken: 6.006 seconds, Fetched: 1 row(s)
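To see which inputs this pattern handles, the same regular expression can be tried outside Hive. The following is only a Python sketch, not Hive code: the helper name strip_nulls is made up for illustration, and it assumes Python's re module treats this particular pattern the same way Hive's Java regex engine does.

```python
import re

# Same pattern as in the Hive answer above.
NULL_PATTERN = re.compile(r"(,+null)|(^null,)")

def strip_nulls(joined):
    """Remove literal 'null' entries from a comma-joined string (hypothetical helper)."""
    return NULL_PATTERN.sub("", joined)

print(strip_nulls("null,1,2,null,2,3,null"))  # 1,2,2,3
print(strip_nulls("1,null"))                  # 1
print(strip_nulls("null"))                    # null   (a lone entry has no comma, so nothing matches)
print(strip_nulls("1,nullable"))              # 1able  (values that start with "null" get damaged)
```

The last two cases suggest the pattern is safe only when the string has at least one real value and no value begins with the text null.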
Related
R cuts the string result from SQL query
I have a simple query where [column] has a character data type. In the result, every value is cut to 16 characters. The first string longer than 16 characters is around row 500, so I guess R or SQL Server determines the length from the first 100 or 200 rows. Is there a way to avoid this?
odbc::dbGetQuery(con, "SELECT [column] FROM [database]")
Expected result:
column
String
StringString
...
StringStringString
StringStringStringString
Received result:
column
String
StringString
...
StringStringStri
StringStringStri
Well, Tim, your advice actually helped. After changing the cache settings, the output is as expected. Thanks!
how to use regexp_extract in hive
I am trying to extract a portion of the string below using regexp_extract, but am not having any success:
CUST_NEW_ACCOUNTS_LINES_2019-03-03.dat.gz
I want to get just the date portion. On the regex101.com website this seemed to work, but Hive is giving me an error message.
regexp_extract(meta_source_filename,'^(?:[^_]+_){4}([^_]+)') file_date
Can someone help me understand what is incorrect here? I am not at all familiar with regexp_extract syntax, so I have been using another function as a starting point. Thank you!
with your_data as (
  select 'CUST_NEW_ACCOUNTS_LINES_2019-03-03.dat.gz' str
)
select regexp_extract(str,'_(\\d{4}(-\\d{2}){2})\\.',1) from your_data;
Result:
OK
2019-03-03
Time taken: 0.062 seconds, Fetched: 1 row(s)
The expression '_(\\d{4}(-\\d{2}){2})\\.' means:
underscore: _
four digits: \\d{4}
hyphen and two digits, repeated two times: (-\\d{2}){2}
dot: \\.
Capture group number one (the date only) is (\\d{4}(-\\d{2}){2}).
In Hive you need to use \\ for escaping.
You have captured the substring you need into a capturing group. Pass the number of that group as the third argument:
regexp_extract(meta_source_filename,'^(?:[^_]+_){4}([^_]+)', 1) file_date
See the regexp_extract(string subject, string pattern, int index) docs:
The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.
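The two patterns from these answers can be compared side by side on the sample file name. This is a Python sketch rather than Hive: it assumes Python's re module behaves like Java's regex engine for these patterns, and it uses single backslashes because the doubled \\ is only needed inside Hive string literals.

```python
import re

filename = "CUST_NEW_ACCOUNTS_LINES_2019-03-03.dat.gz"

# Pattern from the first answer: underscore, yyyy-mm-dd, then a literal dot.
date_only = re.search(r"_(\d{4}(-\d{2}){2})\.", filename).group(1)

# Pattern from the question, reading group 1 as the second answer suggests.
after_fourth_underscore = re.search(r"^(?:[^_]+_){4}([^_]+)", filename).group(1)

print(date_only)                # 2019-03-03
print(after_fourth_underscore)  # 2019-03-03.dat.gz
```

In this check, group 1 of the question's pattern runs to the end of the name because [^_]+ stops only at an underscore, so the extension would still have to be trimmed to get the date alone.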
Insert an array of zeros into hive
I have a Hive table with a column of type array. How can I insert a record with, say, an array of 1000 0s? I could use a script to generate the array of 1000 0s, but I am looking for something more compact and less error-prone.
Use lpad to generate a string of 1000 zeros, then use split to split the result on the empty string everywhere except at the beginning and the end of the string:
hive> select split(lpad(0,1000,0),'(?!^)(?!$)');
The result is an array of 1000 0s (output abbreviated):
["0","0","0",...,"0","0","0"]
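The effect of the two lookaheads can be checked outside Hive. This is a Python sketch only: it assumes Python 3.7 or later, where re.split splits on empty matches, and it uses "0" * 1000 as a stand-in for the lpad call.

```python
import re

zeros = "0" * 1000  # stands in for lpad(0, 1000, 0)

# (?!^) rejects the split point at the start, (?!$) rejects the one at the end,
# so the string is cut between every pair of adjacent characters.
parts = re.split(r"(?!^)(?!$)", zeros)

print(len(parts))  # 1000
print(set(parts))  # {'0'}
```

Without the lookaheads, splitting on the empty string in Python also yields empty strings at both ends, which is what the two conditions are there to avoid.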
A quick hack is to select 1000 rows from any table that has at least 1000 rows and then use collect_list:
select collect_list(res) from (select 0 as res from tblWithGT1000Rows limit 1000) t
Hive variable concatenation
I am having trouble concatenating the value of a variable with a string. My script contains the following:
set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
CREATE TABLE default.udr_lt_bc_${hivevar:tab_dt} ( trans_id double ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
Here the variable tab_dt is assigned correctly with yesterday's date in the format yyyymmdd, but when I try to concatenate this variable with a static string to form a table name, the script fails. The concatenation is not performed. Kindly provide a solution.
Note: I also tried the following, which returns an error as well:
set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
set hivevar:tab_nm1= default.udr_lt_bc_;
set hivevar:tab_name= concat(${hivevar:tab_dt},${hivevar:tab_nm1})
CREATE TABLE ${hivevar:tab_name} ( trans_id double ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
Hive does not evaluate expressions in variables; it substitutes them as is. Your CREATE TABLE statement results in this:
CREATE TABLE default.udr_lt_bc_substr(date_sub(current_date,1),1,10)...
Your second attempt results in this:
CREATE TABLE concat(substr(date_sub(current_date,1),1,10),default.udr_lt_bc_)
Unfortunately, Hive does not support such expressions in DDL. I recommend calculating this variable in a shell and passing it to the Hive script as a --hivevar. For example, in the shell script:
table_name=udr_lt_bc_$(date +'%Y_%m_%d' --date "-1 day")
#table_name is udr_lt_bc_2017_10_31 now
#call your script
hive --hivevar table_name="$table_name" -f your_script.hql
Then in your_script.hql you can use the variable:
CREATE TABLE default.${hivevar:table_name}
Note that '-' is not allowed in table names, which is why I used '_' instead.
To better understand how Hive substitutes variables, try this:
hive> set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
hive> select ${hivevar:tab_dt};
OK
2017-10-31
Time taken: 1.406 seconds, Fetched: 1 row(s)
hive> select '${hivevar:tab_dt}';
OK
substr(date_sub(current_date,1),1,10)
Time taken: 0.087 seconds, Fetched: 1 row(s)
In the first select statement the variable was substituted as is before execution, and the resulting expression was then evaluated as SQL. In the second select statement the variable is quoted, so after substitution it is a string literal and is not evaluated: substr(date_sub(current_date,1),1,10).
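If the wrapper is written in Python rather than shell, the same idea (compute the name first, then hand it to Hive as a variable) might look like the sketch below. The function names yesterday_table_name and hive_command are hypothetical, the prefix and script name are taken from the answer above, and the command is only built as an argument list, not executed.

```python
from datetime import date, timedelta

def yesterday_table_name(today, prefix="udr_lt_bc_"):
    """Build the table name for the day before `today`, using '_' because '-' is not allowed."""
    yesterday = today - timedelta(days=1)
    return prefix + yesterday.strftime("%Y_%m_%d")

def hive_command(table_name, script="your_script.hql"):
    """Return the argument list for calling the Hive CLI with the name passed as a hivevar."""
    return ["hive", "--hivevar", "table_name=" + table_name, "-f", script]

name = yesterday_table_name(date(2017, 11, 1))
print(name)                # udr_lt_bc_2017_10_31
print(hive_command(name))  # ['hive', '--hivevar', 'table_name=udr_lt_bc_2017_10_31', '-f', 'your_script.hql']
```

In real use the fixed date would be replaced by date.today(), and the list could be passed to subprocess.run.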
Another way in Hive:
select concat("table_",date_sub(from_unixtime(unix_timestamp(current_date,'yyyy-MM-dd'),'yyyy-MM-dd'),0));
We can then use the above in a variable as needed.
In Hive, is it possible to get the numeric value after a particular word?
I want to get the numeric value that immediately follows a particular word in a string in Hive. For example, in APDSGDSCRAM051 I need the numeric value after the word RAM. Is this possible in Hive? Note: the string is not of fixed length.
Here you go. You need to use the built-in Hive functions substr and instr:
create table str_testing (c string);
insert into table str_testing values ('APDSGDSCRAM051');
select substr(c, instr(c, 'RAM') + 3) from str_testing;
OK
051
Time taken: 0.243 seconds, Fetched: 1 row(s)
You can implement this in Hive as:
select regexp_extract(name, '\\d+', 0) from <table_name>;
Note: I do not have a Hive environment set up, so please verify this by running it on your end. This returns only the first run of digits found in the string, so if the string has numbers in several places it may not return the one you want.
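One way around the first-run-of-digits limitation is to anchor the pattern on the word itself and capture only the digits that follow it. The sketch below tries that idea in Python: digits_after is a hypothetical helper, the second sample string is invented to show the difference, and the Hive equivalent would presumably be regexp_extract(c, 'RAM(\\d+)', 1), which has not been run here.

```python
import re

def digits_after(word, text):
    """Return the digits that immediately follow `word` in `text`, or '' if there are none."""
    match = re.search(re.escape(word) + r"(\d+)", text)
    return match.group(1) if match else ""

print(digits_after("RAM", "APDSGDSCRAM051"))        # 051
print(digits_after("RAM", "AB12RAM051X7"))          # 051
print(re.search(r"\d+", "AB12RAM051X7").group(0))   # 12  (plain \d+ picks the first digits instead)
```

Returning an empty string when nothing matches is a choice made for this sketch.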