executing HIVE query in background - hive

How do I execute a Hive query in the background when the query looks like the one below?
Select count(1) from table1 where column1='value1';
I am trying to run it from a script like this:
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script=%d:%d:%d HH:MM:SS" "$TT_HRS" "$TT_MINS" "$TT_REM_SECS"
printf "\n"
but I get an error like this:
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Please let me know exactly where I am making a mistake.

Create a file named example:
vi example
Enter the query in the file and save it.
create table sample as
Select count(1) from table1 where column1='value1';
Now run the file in the background using the following command:
hive -f example 1>example.error 2>example.output &
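The shell prints the job number of the background process (followed by its process ID), for example
[1]
Now disown the process so that the shell no longer tracks it as one of its jobs:
disown
The query now runs in the background. Note that the command above sends stdout to example.error and stderr to example.output. Hive writes its progress messages to stderr, so to follow the status of the job you may use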
tail -f example.output
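To make the redirections easier to follow, here is a minimal sketch of the same pattern. fake_job is a hypothetical stand-in for hive -f example (it only prints one line to each stream), and the file names job.out and job.err are my own choice, so treat this as an illustration rather than the exact command above.

```shell
#!/bin/sh
# fake_job is a hypothetical stand-in for `hive -f example`:
# it writes a progress line to stderr and a result line to stdout.
fake_job() {
    echo "progress: stage 1 done" >&2
    echo "42"
}

workdir=$(mktemp -d)

# Start the job in the background, keeping the two streams in separate files.
fake_job >"$workdir/job.out" 2>"$workdir/job.err" &
job_pid=$!

# A script can block until the job ends; an interactive user would
# follow the progress file with tail -f instead.
wait "$job_pid"
job_status=$?

result=$(cat "$workdir/job.out")
progress=$(cat "$workdir/job.err")
echo "exit=$job_status result=$result progress=$progress"
rm -rf "$workdir"
```

Naming the files after the stream they hold (1> for results, 2> for progress and errors) makes it obvious which one to tail.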

True, @Koushik! Glad that you found the issue.
The shell could not pass the query to Hive intact because of the nested single quotes: each inner quote ends or restarts the shell string, so the quotes themselves are removed and Hive receives Column1 = Value1, which it reads as a column reference.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in Hive,
$ hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not valid.
The best solution is to use double quotes around Value1:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
or, as a quick and dirty solution, close the single-quoted string and wrap the single quotes themselves in double quotes:
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'Value1'"';'
This makes sure that the Hive query is properly formed before it is executed. I would not suggest this approach unless you really need a single quote ;)
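If you want to see what the shell actually hands to Hive, a small sketch like the following can help. printf stands in for hive -e here, so nothing is executed against Hive; the third variant (wrapping the whole query in double quotes) is an extra option I am adding for comparison.

```shell
#!/bin/sh
# printf stands in for `hive -e`; it just shows the argument the shell builds.

# Broken: the first inner quote closes the shell string and the next one
# reopens it, so no quotes are left around Value1.
broken=$(printf '%s' 'SELECT COUNT(1) FROM Table1 WHERE Column1 = ''Value1'';')

# Double quotes inside a single-quoted shell string survive unchanged.
double_quoted=$(printf '%s' 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";')

# Wrapping the whole query in double quotes keeps the SQL single quotes.
single_quoted=$(printf '%s' "SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';")

printf '%s\n' "$broken" "$double_quoted" "$single_quoted"
```

The first line of output shows Value1 without any quotes, which is the form that triggers the "Invalid table alias or column reference" error.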

I was able to resolve it by replacing the single quotes with double quotes. The modified statement now looks like this:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'

Related

Bigquery error, "parenthesized expression cannot be parsed"

Hello, I have the following code. When I run it inside BigQuery it gives me the "correct" answer, but when I put it into a shell script and run that script inside the Google shell VM, I get the error below. Any thoughts?
I suspect the error comes from multiplying the results of CASE/WHEN expressions inside another CASE/WHEN expression.
This is an example of what my code looks like:
SELECT CASE WHEN (
(CASE WHEN TABLE1.COL1 = 'X' THEN 0 ELSE 1 END) *
(CASE WHEN TABLE2.COL2 = 'Y' THEN 1 ELSE 0 END) *
(CASE WHEN (SELECT 0
FROM TABLE3
WHERE TABLE3.ID = TABLE2.ID) = 0 THEN 0 ELSE 1 END)) = 1
THEN (SELECT '111111') ELSE NULL END
FROM TABLE1
INNER JOIN TABLE2
ON TABLE1.ID = TABLE2.ID
FULL JOIN (SELECT COL1,'TRUE' FROM TABLE4) AS XX
ON XX.COL1 = TABLE1.COL1 AND XX.COL1 IS NULL
WHERE
TABLE1.COL3 = 'YY'
I can't provide the expected result, but the query fails with the following error: Parenthesized expression cannot be parsed as an expression, struct constructor, or subquery at...
I'll post debugging tips as an answer since I think you'll probably find the problem this way. From a command-line prompt, list recent jobs:
bq ls -j --all
The failed query job will probably be at the top. Copy the job ID and use it with the next command:
bq --prettyformat=json show -j YOUR_JOB_ID
This will print out the complete job configuration as well as the error message. What I suspect you'll see is that the query is garbled; the quotes or some other character may have caused unexpected behavior when interpreted by the shell. When executing queries from the command-line, it's a good idea to put the contents in a file, then pipe it as input to the bq tool, e.g.
bq query --use_legacy_sql=false < query.sql
This prevents the shell from intercepting any part of the query as a command.
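As a rough sketch of that workflow, the query can be written to a file with a quoted here-document and then fed to the tool on stdin. In the snippet below cat stands in for bq so it runs anywhere, and the shortened query and the temporary file are only illustrative.

```shell
#!/bin/sh
# Write the query with a quoted here-document (<<'SQL'), so the shell
# leaves single quotes, $ and backticks in the SQL untouched.
query_file=$(mktemp)
cat >"$query_file" <<'SQL'
SELECT CASE WHEN TABLE1.COL1 = 'X' THEN 0 ELSE 1 END
FROM TABLE1
WHERE TABLE1.COL3 = 'YY'
SQL

# `cat` stands in for: bq query --use_legacy_sql=false < "$query_file"
received=$(cat <"$query_file")
printf '%s\n' "$received"
rm -f "$query_file"
```

Because the delimiter is quoted, what the tool reads on stdin is byte-for-byte what was typed between the SQL markers.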
Thanks @Elliott Brossard, I resolved the problem with literally no change to the query, hah. What I did was rewrite the bq command as bq query --destination_table=.. --use_legacy_sql=false --replace 'QUERY' instead of assigning the query to a variable and then running it with an echo | bq query pipeline.

Copy partitioned bigquery table that only overwrites the partitions from the source table

So something like
bq cp -f src_table dst_table
but I want the partitions in dst_table that are not present in src_table to be left untouched. Is something like this possible?
You can do something like this:
Use this query to build a script of bq commands:
#legacySql
select concat ('bq cp -f ', s.project_id, ':', s.dataset_id, '.',
s.table_id, '\$', s.partition_id, ' ',
t.project_id, ':', t.dataset_id, '.', t.table_id, '\$', t.partition_id, ';')
from [source_table$__PARTITIONS_SUMMARY__] s
inner join (select * from [target_table$__PARTITIONS_SUMMARY__]) t
on t.partition_id = s.partition_id
Then just execute the result in a terminal.
Make sure the $ is escaped. This code works on Mac/Unix; I am not sure about Windows.
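To illustrate why the escaping matters, here is a small sketch with a made-up table name and partition id. printf stands in for bq cp, so nothing is copied; the set -- line only exists to give $2 a visible value.

```shell
#!/bin/sh
# Hypothetical table name and partition id; printf stands in for `bq cp`.
set -- first second   # give $2 a visible value for the demonstration

# Unescaped: the shell reads $2 as the second positional parameter
# and substitutes it, mangling the partition decorator.
unescaped=$(printf '%s' src_table$20180101)

# Backslash-escaped, as in the generated commands: the $ stays literal.
escaped=$(printf '%s' src_table\$20180101)

# Single quotes have the same effect.
quoted=$(printf '%s' 'src_table$20180101')

printf '%s\n' "$unescaped" "$escaped" "$quoted"
```

Only the escaped and single-quoted forms keep the table$partition decorator intact.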
If I understood your question correctly, you intend to copy from one partitioned table (table 1) to another (table 2) only the partitions that are populated in table 1.
You have to configure the required permissions to perform the operation, as described in the BigQuery documentation. In your case, use bq cp -a to "append the data from the source partition to an existing table or partition in the destination dataset" instead of bq cp -f, which forces overwriting. For example:
bq --location=[LOCATION] cp -a '[PROJECT_ID]:[DATASET].[SOURCE_TABLE]$[SOURCE_PARTITION]' '[PROJECT_ID]:[DATASET].[DESTINATION_TABLE]$[DESTINATION_PARTITION]'

Google BigQuery: bq query from command line fails, although it runs fine from the UI Query Editor

When I run the following query from the BQ UI, it runs fine and gives the desired output.
But when I run the same from the command line, it gives the following error
bq query --destination_table Chicago_Traffic_Sensor_Data.Latest_Traffic_Data_with_Geocoding --replace --use_legacy_sql=false 'SELECT segmentid, _lif_lat, start_lon, _lit_lat, _lit_lon, _traffic, _last_updt, CASE WHEN _traffic < 20 THEN '#FF0000' WHEN _traffic >= 20 and _traffic < 40 THEN '#FFFF00' WHEN _traffic >= 40 THEN '008000' ELSE '#666666' END as strokeColor FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY segmentid ORDER BY _last_updt DESC) col FROM Chicago_Traffic_Sensor_Data.traffic_sensor_data) x WHERE x.col = 1 ORDER BY segmentid'
Error in query string: Error processing job 'bedrock-gcp-testing:bqjob_r5d944587b08c4e54_000001626fa3f61d_1':
Syntax error: Unexpected end of statement at [1:412]
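You need to escape any characters in your SQL that the command line is trying to interpret. Here the shell strips the inner single quotes, so BigQuery receives a bare #FF0000, and since # starts a comment in BigQuery SQL, that would explain the unexpected end of statement. What I find much easier and quicker is to put my SQL in a file and pass its contents to bq instead. For example: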
bq query --destination_table grey-sort-challenge:partitioning_magic.foobarred --use_legacy_sql=false "$(cat data.sql)"
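The double quotes around $(cat data.sql) matter. A minimal sketch, with a hypothetical count_args function standing in for bq and a shortened version of the query, shows the difference:

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for bq: it reports how many
# arguments it received.
count_args() { echo "$#"; }

sql_file=$(mktemp)
cat >"$sql_file" <<'SQL'
SELECT CASE WHEN _traffic < 20 THEN '#FF0000' ELSE '#666666' END AS strokeColor
FROM Chicago_Traffic_Sensor_Data.traffic_sensor_data
SQL

# Quoted: the whole file arrives as one argument, inner quotes intact.
quoted_count=$(count_args "$(cat "$sql_file")")
passed=$(printf '%s' "$(cat "$sql_file")")

# Unquoted: the shell splits the text on whitespace into many arguments.
unquoted_count=$(count_args $(cat "$sql_file"))

echo "quoted=$quoted_count unquoted=$unquoted_count"
rm -f "$sql_file"
```

With the quotes in place the SQL reaches the tool as a single string, including the '#FF0000' literals.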

Query errors out if run from a shell script

I can run this query fine
CREATE TABLE db.table1 STORED AS PARQUET as
SELECT * FROM db.table WHERE UPPER(executing) = 'TRUE';
But when I run it from a bash shell script like the one below, I get this error:
#!/bin/bash
bash -c 'impala-shell -k -q "CREATE TABLE db.table1 STORED AS PARQUET as
SELECT * FROM db.table WHERE UPPER(executing) = 'TRUE';"'
ERROR: AnalysisException: operands of type STRING and BOOLEAN are not
comparable: upper(executing) = TRUE
I have tried using double quotes, no quotes, and lower case, with no luck.
Single quotes cannot be included in a single-quoted string in shell. The single quotes around TRUE aren't included in the SQL command passed to impala-shell; the first closes the initial ', and the second starts a new quoted string, so your script is equivalent to
bash -c "impala-shell -k -q \"CREATE TABLE db.table1 STORED AS PARQUET as
SELECT * from db.table WHERE UPPER(executing) = TRUE;\""
One solution is to use double quotes as I have above, which allow you to include the single quotes that SQL requires.
bash -c "impala-shell -k -q \"CREATE TABLE db.table1 STORED AS PARQUET as
SELECT * from db.table WHERE UPPER(executing) = 'TRUE';\""
Alternatively, use $'...' to quote the argument to -c, in which case you can include properly escaped single quotes in the string.
bash -c $'impala-shell -k -q "CREATE TABLE db.table1 STORED AS PARQUET as
SELECT * from db.table WHERE UPPER(executing) = \'TRUE\';"'
However it's not clear why you are using bash -c at all instead of just running impala-shell directly as:
impala-shell -k -q "CREATE ... WHERE UPPER(executing) = 'TRUE';"
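If you do need to keep the outer single quotes and want something that also works in plain sh, another common idiom (not used above) is '\'': close the string, add an escaped quote, and reopen it. A small sketch, with printf standing in for impala-shell -q so that it runs without Impala:

```shell
#!/bin/sh
# printf stands in for `impala-shell -k -q`; it prints the query it receives.

# Broken: the inner quotes end and restart the outer string,
# so TRUE reaches the inner shell without quotes.
bare=$(sh -c 'printf "%s\n" "WHERE UPPER(executing) = 'TRUE';"')

# Fixed with the '\'' idiom: close the string, add an escaped quote, reopen.
kept=$(sh -c 'printf "%s\n" "WHERE UPPER(executing) = '\''TRUE'\'';"')

printf '%s\n' "$bare" "$kept"
```

The first line printed has a bare TRUE, which matches the STRING versus BOOLEAN error; the second keeps the quotes.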

how to assign a query result to a shell variable

I have an sql query that returns a date.
I call this query from a shell script and would like to assign this value to the variable called datestart (and use it later). Here is my code. Without the datestart assignment the query works fine.
#!/bin/sh
firstname="-Upgsql"
dbname="statcoll"
portname="-p5438"
datestart=(psql $firstname $portname $dbname<< EOF
SELECT MIN(latestrefdate) FROM (SELECT MAX(referencedate) AS latestrefdate FROM statistics WHERE transactionname IN(SELECT DISTINCT transactionname FROM statistics WHERE platform = 'Smarties')GROUP BY transactionname) as earliest;
EOF
)
echo $datestart
but the result is this:
Syntax error: word unexpected (expecting ")").
I have no idea where I should insert that closing bracket. Any hint is appreciated.
Plain brackets do not capture output in a variable assignment. You need command substitution: either $(...), which is POSIX and works in both sh and bash, or the older `...` form.
Try this:
#!/bin/sh
firstname="-Upgsql"
dbname="statcoll"
portname="-p5438"
datestart=`psql -t --pset="footer=off" "$firstname" "$portname" -d "$dbname" <<EOF
SELECT MIN(latestrefdate) FROM (SELECT MAX(referencedate) AS latestrefdate FROM statistics WHERE transactionname IN (SELECT DISTINCT transactionname FROM statistics WHERE platform = 'Smarties') GROUP BY transactionname) as earliest;
EOF
`
echo "$datestart"
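The same capture can be written with $(...), which is a bit easier to read and nest. In the sketch below, fake_psql is a hypothetical stand-in for psql -t so the example runs without a database, and the date it prints is made up. The trimming step assumes the value arrives padded with whitespace, as psql's aligned tuples-only output usually is.

```shell
#!/bin/sh
# fake_psql is a hypothetical stand-in for `psql -t ...`: it reads the SQL
# on stdin and prints one whitespace-padded value. The date is made up.
fake_psql() {
    cat >/dev/null
    printf ' 2015-03-01\n\n'
}

# Capture the command's output with $(...) and a here-document.
datestart=$(fake_psql <<EOF
SELECT MIN(referencedate) FROM statistics WHERE platform = 'Smarties';
EOF
)

# Strip the surrounding whitespace before using the value.
datestart=$(printf '%s' "$datestart" | tr -d '[:space:]')
echo "datestart=$datestart"
```

With the real psql, adding -A (unaligned output) to -t should make the trimming step unnecessary.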