I have column in hive table like below
testing_time
2018-12-31 14:45:55
2018-12-31 15:50:58
Now I want to get the distinct values as a variable so I can use in another query.
I have done like below
abc=`hive -e "select collect_set(testing_time)) from db.tbl";`
echo $abc
["2018-12-31 14:45:55","2018-12-31 15:50:58"]
xyz=${abc:1:-1}
when I do
hive -e "select * from db.tbl where testing_time in ($xyz)"
I get below error
Arguments for IN should be the same type! Types are {timestamp IN (string, string)
what the the mistake I am doing?
What is the correct way of achieving my result?
Note: I know I can use subquery for this scenario but I would like to use variable to achieve my result
Problem is that you're comparing timestamp (column testing_time) with string (i.e. "2018-12-31 14:45:55"), so you need to convert string to timestamp, which you can do via TIMESTAMP(string).
Here's a bash script that adds the conversion:
RES="" # here we will save the resulting SQL
IFS=","
read -ra ITEMS <<< "$xyz" # split timestamps into array
for ITEM in "${ITEMS[#]}"; do
RES="${RES}TIMESTAMP($ITEM)," # add the timestamp to RES variable,
# surrounded by TIMESTAMP(x)
done
unset IFS
RES="${RES%?}" # delete the extra comma
Then you can run the constructed SQL query:
hive -e "select * from db.tbl where testing_time in ($RES)"
Related
I am trying to define a table that has a column that is an arrays of structs using standard sql. The docs here suggest this should work:
CREATE OR REPLACE TABLE ta_producer_conformed.FundStaticData
(
id STRING,
something ARRAY<STRUCT<INT64,INT64>>
)
but I get an error:
$ bq query --use_legacy_sql=false --location=asia-east2 "$(cat xxxx.ddl.temp.sql | awk 'ORS=" "')"
Waiting on bqjob_r6735048b_00000173ed2d9645_1 ... (0s) Current status: DONE
Error in query string: Error processing job 'xxxxx-10843454-yyyyy-
dev:bqjob_r6735048b_00000173ed2d9645_1': Illegal field name:
Changing the field (edit: column!) name does not fix it. What I am doing wrong?
The fields within the struct need to be named so this works:
CREATE OR REPLACE TABLE ta_producer_conformed.FundStaticData
(
id STRING,
something ARRAY<STRUCT<x INT64,y INT64>>
)
I want to store current_day - 1 in a variable in Hive. I know there are already previous threads on this topic but the solutions provided there first recommends defining the variable outside hive in a shell environment and then using that variable inside Hive.
Storing result of query in hive variable
I first got the current_Date - 1 using
select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
Then i tried two approaches:
1. set date1 = ( select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
and
2. set hivevar:date1 = ( select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
Both the approaches are throwing an error:
"ParseException line 1:82 cannot recognize input near 'select' 'date_sub' '(' in expression specification"
When I printed (1) in place of yesterday's date the select query is saved in the variable. The (2) approach throws "{hivevar:dt_chk} is undefined
".
I am new to Hive, would appreciate any help. Thanks.
Hive doesn't support a straightforward way to store query result to variables.You have to use the shell option along with hiveconf.
date1 = $(hive -e "set hive.cli.print.header=false; select date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),1);")
hive -hiveconf "date1"="$date1" -f hive_script.hql
Then in your script you can reference the newly created varaible date1
select '${hiveconf:date1}'
After lots of research, this is probably the best way to achieve setting a variable as an output of an SQL:
INSERT OVERWRITE LOCAL DIRECTORY '<home path>/config/date1'
select CONCAT('set hivevar:date1=',date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1)) from <some table> limit 1;
source <home path>/config/date1/000000_0;
You will then be able to use ${date1} in your subsequent SQLs.
Here we had to use <some table> limit 1 as hive got a bug in insert overwrite if we don't specify a table name.
How do I bind a variable to a SQL set for an IN query in Perl DBI?
Example:
my #nature = ('TYPE1','TYPE2'); # This is normally populated from elsewhere
my $qh = $dbh->prepare(
"SELECT count(ref_no) FROM fm_fault WHERE nature IN ?"
) || die("Failed to prepare query: $DBI::errstr");
# Using the array here only takes the first entry in this example, using a array ref gives no result
# bind_param and named bind variables gives similar results
$qh->execute(#nature) || die("Failed to execute query: $DBI::errstr");
print $qh->fetchrow_array();
The result for the code as above results in only the count for TYPE1, while the required output is the sum of the count for TYPE1 and TYPE2. Replacing the bind entry with a reference to #nature (\#nature), results in 0 results.
The main use-case for this is to allow a user to check multiple options using something like a checkbox group and it is to return all the results. A work-around is to construct a string to insert into the query - it works, however it needs a whole lot of filtering to avoid SQL injection issues and it is ugly...
In my case, the database is Oracle, ideally I want a generic solution that isn't affected by the database.
There should be as many ? placeholders as there is elements in #nature, ie. in (?,?,..)
my #nature = ('TYPE1','TYPE2');
my $pholders = join ",", ("?") x #nature;
my $qh = $dbh->prepare(
"SELECT count(ref_no) FROM fm_fault WHERE nature IN ($pholders)"
) or die("Failed to prepare query: $DBI::errstr");
I have a datafile sales_history. I want to query it in the following way.
my_df<-sqldf("SELECT *
FROM sales_history
WHERE Business_Unit=='RETAIL'"")
Now I want to write a function with argument datafile and column name to do the above job. So something like:
pick_column<-function(df, column_name){
my_df<-sqldf("SELECT *
FROM df
WHERE Business_Unit==column_name"
return(my_df)
}
Ideally, after running the above function definition, I should then be able to run
pick_column(sales_history,'RETAIL'). But when I do this, the second argument 'RETAIL' is not passed to the function correctly. What's the correct way to do this then?
I know that for this example, there are other ways to do this other than using "sqldf" for SQL query. But the point of my question here is how to pass the column_name correctly as a function argument.
the sqldf package uses gsubfn to allow you to add names of R variables into your SQL commands by prefixing them with the "$" character. So you can write
sales_history <- data.frame(
price=c(12,10),
Business_Unit=c("RETAIL","BUSINESS"),
stringsAsFactors=F
)
pick_column <- function(df, columnname) {
fn$sqldf("SELECT * FROM $df WHERE Business_Unit='$columnname'")
}
pick_column("sales_history","RETAIL")
i have this script that retrieves a parameter (a number) from a SQL query and assigns it to a variable.
there are two options - either the SQL query finds a value and then the script preforms
echo "the billcycle number is $v_bc", or it doesnt find a value and it suppose to
echo "no billcycle parameter found".
im having a problem with the if condition.
this is what i came up with:
#!/bin/bash
v_bc=`sqlplus -s /#bscsprod <<EOF
set pagesize 0
select billcycle from bc_run
where billcycle not in (50,16)
and control_group_ind is null
and billseqno=6043;
EOF`
if [ -z "$v_bc" ]; then echo no billcycle parameter found
else echo "the billcycle parameter is $v_bc"
fi
when billseqno=6043, then it means that v_bc=25, and when i run the script, the result is:
"the billcycle parameter is 25". which is what i ment it to do.
when i set billseqno=6042, according to the above SQL query, v_bc will get no value, therefore what i want it to do is echo "no billcycle parameter found".
instead i get
"the billcycle parameter is
no rows were selected".
any suggestions ?
thanks very much
Assaf.
Your code is correctly checking for an empty value -- v_bc is just not empty, even with -s.
You may either:
parse the output of sqplus when no rows are returned, so add this:
if [[ "$v_bc" == "no rows were selected" ]]; then v_bc=""; fi
This uses the bash [[ ]] command with "==" for pattern matching so we don't need to worry about leading/trailing whitespace. This is not as robust as dogbane's SET FEEDBACK OFF since it's entirely possible for "no rows were selected" to be valid data.
write a better query which always returns data, like this:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:594023455752#followup-221571500346463844
The trick being to reformulate your query and use select/union with a fallback query that conditionally provides output when your query is empty:
with data as
(select billcycle from bc_run where [...])
select * from data
union all
select 'NA', null from dual where not exists (select null from data);
(see also What is the dual table in Oracle? )
Try turning feedback off in sqlplus so that you don't get any output if no rows are selected:
v_bc=`sqlplus -s /#bscsprod <<EOF
SET FEEDBACK OFF
set pagesize 0
select billcycle from bc_run
where billcycle not in (50,16)
and control_group_ind is null
and billseqno=6043;
EOF`
Try something like
if [ "${v_bc:-SHOULDNTHAPPEN}" = "SHOULDNTHAPPEN" ]; then
echo no billcycle parameter found
else......
The idiomatic way to assign a value to a variable if that variable is empty is with an = in a parameter expansion. In other words, this:
: ${v_bc:=no billcycle paramter found}
is equivalent to:
test -z "$v_bc" && v_bc='no billcycle paramter found'
In your case, it would also be easy to do:
echo ${v_bc:-no billcycle parameter found}
but the question asked in the title is not really the problem in your case. (Since the problem is not that v_bc is the empty string, but rather that it is a value you do not expect.)