How to pass variables as expressions in Bigquery timestamp_diff - google-bigquery

Currently, I am trying to check the timestamp difference in hours with expressions passed as variables through the command line, but I cannot get the desired output when the values come from variables.
a=2019-11-18 12:49:43
b=2020-04-04 20:32:33
timediff=$(bq query --nouse_legacy_sql \ 'SELECT TIMESTAMP_DIFF(TIMESTAMP "'$a'", TIMESTAMP "$b", HOUR);')
Looks like the variables I am passing are not recognized. Can someone help me understand the correct way of doing it?

In addition to Hemant's answer, here is an alternative method.
As stated in the documentation, the bq command-line tool supports parameterized queries in BigQuery. Use the --parameter flag in your bq query command to specify the variables/parameters you will use.
This flag must be in the format name:type:value. If type is left empty (name::value), STRING is assumed. As an example:
timediff=$(bq query --use_legacy_sql=false \
--parameter='ts_value:TIMESTAMP:2016-12-07 08:00:00' \
--parameter='ts_value1:TIMESTAMP:2016-12-07 09:00:00' \
'SELECT
TIMESTAMP_DIFF(@ts_value, @ts_value1, HOUR)')
echo $timediff
And the output is:
+-----+
| f0_ |
+-----+
| -1 |
+-----+
You could use --format=csv to get plain CSV output (a header line followed by the value), which an unquoted echo collapses onto one line:
f0_ -1
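To keep only the value and drop the CSV header, the captured output can be filtered before it is stored. This is a minimal sketch, assuming the CSV output is exactly the header line followed by one value as shown above. fake_bq_csv is a hypothetical stand-in for the real bq query --format=csv call, so the snippet runs without BigQuery access:

```shell
# Hypothetical stand-in for `bq query --format=csv ...`:
# prints the header line and the single value shown above.
fake_bq_csv() {
  printf 'f0_\n-1\n'
}

# tail -n +2 starts printing at the second line, which drops the header.
timediff=$(fake_bq_csv | tail -n +2)

# Quote the variable so the value is printed exactly as captured.
echo "$timediff"
```

With the real command, fake_bq_csv would be replaced by the bq query invocation; the filter stays the same.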
You can also use an alias to shorten the command (in a non-interactive bash script, aliases are only expanded after shopt -s expand_aliases). For instance:
alias bq_set="bq query --use_legacy_sql=false --format=pretty"
timediff=$(bq_set \
--parameter='ts_value:TIMESTAMP:2016-12-07 08:00:00' \
--parameter='ts_value1:TIMESTAMP:2016-12-07 09:00:00' \
'SELECT
TIMESTAMP_DIFF(@ts_value, @ts_value1, HOUR)')
echo $timediff
The output:
+-----+
| f0_ |
+-----+
| -1 |
+-----+
As you can see, the alias is just a way to simplify the command.

Try using single quotes around the variables, but double-quotes around the entire query. For example:
a='2019-11-18 12:49:43'
b='2020-04-04 20:32:33'
timediff=$(bq query --format=csv --nouse_legacy_sql "SELECT TIMESTAMP_DIFF(TIMESTAMP '$a', TIMESTAMP '$b', HOUR);" | awk 'NR>1')
echo $timediff
-3319
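The difference between the two quoting styles can be checked without calling bq at all, by printing the query string the shell would hand over. A small sketch using the values from the question; the variable names literal and expanded are only illustrative:

```shell
a='2019-11-18 12:49:43'
b='2020-04-04 20:32:33'

# Single-quoted query: the shell leaves $a and $b untouched,
# so BigQuery would receive the literal text "$a" and "$b".
literal='SELECT TIMESTAMP_DIFF(TIMESTAMP "$a", TIMESTAMP "$b", HOUR);'

# Double-quoted query: the shell substitutes the values before bq sees them;
# the single quotes inside are passed through as SQL string delimiters.
expanded="SELECT TIMESTAMP_DIFF(TIMESTAMP '$a', TIMESTAMP '$b', HOUR);"

printf '%s\n' "$literal"
printf '%s\n' "$expanded"
```

Printing the query like this before running it is a cheap way to confirm that the variables were actually substituted.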

Related

How to reference an eval variable in query

I am trying to access a variable (in this example; sampleFromDate and sampleToDate) from a sub-query. I have defined the variables with syntax eval variableName = value and would like to access with syntax filterName=$variableName$. See the example below where I am trying to access values using earliest=$sampleFromDate$ latest=$sampleToDate$
index=*
earliest=-8d latest=-1d
| eval sampleToDate=now()
| eval sampleFromDate=relative_time(now(), "-1d")
| appendcols [
search (index=*)
earliest=$sampleFromDate$ latest=$sampleToDate$
]
This produces the error:
Invalid value "$sampleFromDate$" for time term 'earliest'
The value of sampleFromDate is in the format seconds since epoch time, e.g.
1612251236.000000
I know I can do earliest=-d latest=now() - but I don't want to do this because I want to reference the variables in several locations and output them at the end.
Why are you trying to eval those time values?
Just do:
index=* earliest=-8d latest=-1d
| <rest of search>
| appendcols [
search (index=*) earliest=-1d
| <rest of appended search>
]
There's no need to explicitly set latest unless you want something other than now()

Arguments mismatch using where IN clause in query

I have column in hive table like below
testing_time
2018-12-31 14:45:55
2018-12-31 15:50:58
Now I want to get the distinct values into a variable so I can use them in another query.
I did the following:
abc=`hive -e "select collect_set(testing_time) from db.tbl";`
echo $abc
["2018-12-31 14:45:55","2018-12-31 15:50:58"]
xyz=${abc:1:-1}
When I run
hive -e "select * from db.tbl where testing_time in ($xyz)"
I get the error below:
Arguments for IN should be the same type! Types are {timestamp IN (string, string)
What mistake am I making?
What is the correct way of achieving my result?
Note: I know I can use a subquery for this scenario, but I would like to use a variable to achieve my result.
The problem is that you're comparing a timestamp (column testing_time) with strings (e.g. "2018-12-31 14:45:55"), so you need to convert each string to a timestamp, which you can do via TIMESTAMP(string).
Here's a bash script that adds the conversion:
RES="" # here we will save the resulting SQL
IFS=","
read -ra ITEMS <<< "$xyz" # split timestamps into array
for ITEM in "${ITEMS[@]}"; do
RES="${RES}TIMESTAMP($ITEM)," # add the timestamp to RES variable,
# surrounded by TIMESTAMP(x)
done
unset IFS
RES="${RES%?}" # delete the extra comma
Then you can run the constructed SQL query:
hive -e "select * from db.tbl where testing_time in ($RES)"
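As an alternative to the loop, the same wrapping can be done with a single sed substitution. This is a sketch of a different technique, not the script above; it assumes every item in $xyz is a double-quoted string with no embedded double quotes, as in the sample output from the question:

```shell
# Sample value of $xyz from the question (brackets already stripped).
xyz='"2018-12-31 14:45:55","2018-12-31 15:50:58"'

# Wrap every double-quoted item in TIMESTAMP(...).
# In the replacement, & stands for the text that was matched.
RES=$(printf '%s' "$xyz" | sed 's/"[^"]*"/TIMESTAMP(&)/g')

printf '%s\n' "$RES"
```

For the sample value this prints TIMESTAMP("2018-12-31 14:45:55"),TIMESTAMP("2018-12-31 15:50:58"), which can be placed inside the IN (...) list in the same way.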

How to pass dynamic parameters in where condition in bq command line

FTIMESTAMP="2018-07-09 00:00:00"
LTIMESTAMP="2018-07-09 08:00:00"
echo $FTIMESTAMP
echo $LTIMESTAMP
bq query --nouse_legacy_sql 'insert `table1`(Time,UserId)
select Time,UserId from `table2`
WHERE _PARTITIONTIME >= "$FTIMESTAMP" AND _PARTITIONTIME < "$LTIMESTAMP"'
When I ran these commands in .sh script, it gave the following error:
*Error in query string: Error processing job '************': Could not cast literal "$FTIMESTAMP" to type TIMESTAMP at [3:25].*
I want to pass those parameters dynamically once this query is successful.
Or is there any other way to extract the data for last 8 hours on the basis of partition time.
It's really a better idea to use query parameters instead of modifying your query text directly; you won't have issues where the query text ends up with syntax errors or other problems. Here is an example using parameters with the names from your question:
$ bq query --use_legacy_sql=false \
--parameter=FTIMESTAMP:TIMESTAMP:"2018-07-09 00:00:00" \
--parameter=LTIMESTAMP:TIMESTAMP:"2018-07-09 00:00:00" \
"SELECT @FTIMESTAMP, @LTIMESTAMP;"
+---------------------+---------------------+
| f0_ | f1_ |
+---------------------+---------------------+
| 2018-07-09 00:00:00 | 2018-07-09 00:00:00 |
+---------------------+---------------------+
In your case, you would want something like this:
$ bq query --nouse_legacy_sql \
--parameter=FTIMESTAMP:TIMESTAMP:"2018-07-09 00:00:00" \
--parameter=LTIMESTAMP:TIMESTAMP:"2018-07-09 08:00:00" \
'insert `table1`(Time,UserId)
select Time,UserId from `table2`
WHERE _PARTITIONTIME >= @FTIMESTAMP AND _PARTITIONTIME < @LTIMESTAMP'
If you still want to set the parameter values from shell variables, you can do so like this:
$ FTIMESTAMP="2018-07-09 00:00:00"
$ LTIMESTAMP="2018-07-09 08:00:00"
$ bq query --nouse_legacy_sql \
--parameter=FTIMESTAMP:TIMESTAMP:"$FTIMESTAMP" \
--parameter=LTIMESTAMP:TIMESTAMP:"$LTIMESTAMP" \
'insert `table1`(Time,UserId)
select Time,UserId from `table2`
WHERE _PARTITIONTIME >= @FTIMESTAMP AND _PARTITIONTIME < @LTIMESTAMP'
This sets the values of the query parameters from the shell variables, which are then passed to BigQuery.
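For the "last 8 hours" case from the question, the two values can be computed in the script instead of hard-coded. This is a rough sketch under two assumptions: GNU date is available (the -d '8 hours ago' syntax is GNU-specific), and ts_param is a hypothetical helper introduced here only to format the flag. It prints the flags rather than calling bq:

```shell
# Hypothetical helper: format one TIMESTAMP query parameter for bq
# in the name:type:value form used above.
ts_param() {
  printf '%s' "--parameter=${1}:TIMESTAMP:${2}"
}

# Assumes GNU date: an 8-hour window ending now, in UTC.
LTIMESTAMP=$(date -u '+%Y-%m-%d %H:%M:%S')
FTIMESTAMP=$(date -u -d '8 hours ago' '+%Y-%m-%d %H:%M:%S')

# Print the flags. In a real script each one would be passed to bq query
# as a single quoted argument, e.g. "$(ts_param FTIMESTAMP "$FTIMESTAMP")".
ts_param FTIMESTAMP "$FTIMESTAMP"; echo
ts_param LTIMESTAMP "$LTIMESTAMP"; echo
```

Keeping each flag inside double quotes matters because the timestamp value contains a space.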

Create table name in Hive using variable substitution

I'd like to create a table name in Hive using variable substitution.
E.g.
SET market = "AUS";
create table ${hiveconf:market_cd}_active as ... ;
But it fails. Any idea how it can be achieved?
You should use backticks (``) around the table name, like:
SET market=AUS;
CREATE TABLE `${hiveconf:market}_active` AS SELECT 1;
DESCRIBE `${hiveconf:market}_active`;
Example run of script.sql from beeline:
$ beeline -u jdbc:hive2://localhost:10000/ -n hadoop -f script.sql
Connecting to jdbc:hive2://localhost:10000/
...
0: jdbc:hive2://localhost:10000/> SET market=AUS;
No rows affected (0.057 seconds)
0: jdbc:hive2://localhost:10000/> CREATE TABLE `${hiveconf:market}_active` AS SELECT 1;
...
INFO : Dag name: CREATE TABLE `AUS_active` AS SELECT 1(Stage-1)
...
INFO : OK
No rows affected (12.402 seconds)
0: jdbc:hive2://localhost:10000/> DESCRIBE `${hiveconf:market}_active`;
...
INFO : Executing command(queryId=hive_20190801194250_1a57e6ec-25e7-474d-b31d-24026f171089): DESCRIBE `AUS_active`
...
INFO : OK
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| _c0 | int | |
+-----------+------------+----------+
1 row selected (0.132 seconds)
0: jdbc:hive2://localhost:10000/> Closing: 0: jdbc:hive2://localhost:10000/
Markovitz's criticisms are correct, but do not produce a correct solution. In summary, you can use variable substitution for things like string comparisons, but NOT for things like naming variables and tables. If you know much about language compilers and parsers, you get a sense of why this would be true. You could construct such behavior in a language like Java, but SQL is just too crude.
Running that code produces an error, "cannot recognize input near '$' '{' 'hiveconf' in table name". (I am running Hortonworks, Hive 1.2.1000.2.5.3.0-37.)
I spent a couple hours Googling and experimenting with different combinations of punctuation, different tools ranging from command line, Ambari, and DB Visualizer, etc., and I never found any way to construct a table name or a field name with a variable value. I think you're stuck with using variables in places where you need a string literal, like comparisons, but you cannot use them in place of reserved words or existing data structures, if that makes sense. By example:
--works
drop table if exists user_rgksp0.foo;
-- Does NOT work:
set MY_FILE_NAME=user_rgksp0.foo;
--drop table if exists ${hiveconf:MY_FILE_NAME};
-- Works
set REPORT_YEAR=2018;
select count(1) as stationary_event_count, day, zip_code, route_id from aaetl_dms_pub.dms_stationary_events_pub
where part_year = '${hiveconf:REPORT_YEAR}'
-- Does NOT Work:
set MY_VAR_NAME='zip_code'
select count(1) as stationary_event_count, day, '${hiveconf:MY_VAR_NAME}', route_id from aaetl_dms_pub.dms_stationary_events_pub
where part_year = 2018
The quotes should be removed.
You're using the wrong variable name (market_cd instead of market):
SET market=AUS; create table ${hiveconf:market}_active as select 1;
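Since the answers above disagree on whether hiveconf substitution works inside a table name on every Hive version, one way to sidestep the question is to build the statement in the shell and hand Hive the finished text. This is only a sketch: build_create_sql is a hypothetical helper, and the character check is an assumption about what a safe market code looks like:

```shell
# Hypothetical helper: build the CREATE TABLE statement for a market code.
# Rejects empty values and anything other than letters, digits and underscores.
build_create_sql() {
  case $1 in
    ''|*[!A-Za-z0-9_]*) echo "invalid market code: $1" >&2; return 1 ;;
  esac
  # Backticks are literal inside single quotes, so no escaping is needed.
  printf 'CREATE TABLE `%s_active` AS SELECT 1;' "$1"
}

market=AUS
sql=$(build_create_sql "$market") || exit 1
printf '%s\n' "$sql"
# The statement could then be run with: hive -e "$sql"
```

For market=AUS this prints CREATE TABLE `AUS_active` AS SELECT 1;, the same statement Hive shows after substitution in the beeline log above.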

BigQuery select alias using regex_extract_all in standard mode

I'm unable to reference a SELECT alias in BigQuery (standard mode).
Trying to do this query:
SELECT
REGEXP_EXTRACT_ALL(text,
r"(<div \w+>)") AS matches
FROM
regex.test
WHERE
matches IS NOT NULL
Here are steps to reproduce.
bq mk regex
bq mk -t regex.test id:integer,text:string
echo '{"id":1, "text":"<div a>"}' | bq insert regex.test
echo '{"id":2, "text":"<div b>"}' | bq insert regex.test
echo '{"id":3, "text":"<div>"}' | bq insert regex.test
bq query --use_legacy_sql=false "select REGEXP_EXTRACT_ALL(text, r\"(<div \w+>)\") AS matches FROM regex.test WHERE id IS NOT NULL"
+--------------+
| matches |
+--------------+
| [u'<div b>'] |
| [] |
| [u'<div a>'] |
+--------------+
When I try to reference the matches alias, I see an error:
bq query --use_legacy_sql=false "select REGEXP_EXTRACT_ALL(text, r\"(<div \w+>)\") AS matches FROM regex.test WHERE matches IS NOT NULL"
Error in query string: Error processing job 'myname': Unrecognized name:
matches
I am unable to reference the alias matches, and am unable to filter those results WHERE matches IS NOT NULL.
Does anyone know what I'm doing incorrectly here?
Thanks!
Even in BQ, you can't use a column alias in the where clause. Just use a subquery:
SELECT t.*
FROM (SELECT REGEXP_EXTRACT_ALL(text, r"(<div \w+>)") AS matches
FROM regex.test
) t
WHERE ARRAY_LENGTH(matches) > 0
Check out SELECT list aliases visibility
Comparing with NULL doesn't work for REGEXP_EXTRACT_ALL because it returns an array (an empty one when nothing matches), so checking the length is the way to go. Comparing with NULL still works for REGEXP_EXTRACT.
In addition, ideally you should be able to filter out records without matches using a regex match predicate. Note that REGEXP_MATCH is the legacy SQL function; in standard SQL the equivalent is REGEXP_CONTAINS.
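When running the subquery version through bq, the escaped quotes used in the reproduction steps can be avoided by keeping the SQL in a quoted heredoc. A minimal sketch that only builds and prints the query string; the variable name query is illustrative:

```shell
# Quoted heredoc delimiter ('SQL'): the shell leaves the double quotes
# and the backslash in \w untouched, so the SQL needs no escaping.
query=$(cat <<'SQL'
SELECT t.*
FROM (SELECT REGEXP_EXTRACT_ALL(text, r"(<div \w+>)") AS matches
      FROM regex.test) t
WHERE ARRAY_LENGTH(matches) > 0
SQL
)

printf '%s\n' "$query"
# The query could then be run with: bq query --use_legacy_sql=false "$query"
```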