Passing parameter in sqoop - apache-pig

Below is my Sqoop command in a shell script:
sqoop import --connect 'jdbc:sqlserver://190.148.155.91:1433;username=****;password=****;database=Testdb' --query 'Select DimFreqCellRelationID,OSSC_RC, MeContext, ENodeBFunction,EUtranCellFDD,EUtranFreqRelation, EUtranCellRelation FROM dbo.DimCellRelation WHERE DimFreqCellRelationID > $maxval and $CONDITIONS' --split-by OSS --target-dir /testval;
Before executing this command, I assign a value to $maxval; when the Sqoop command runs, that value should be substituted in place of $maxval, but that is not happening. Is it possible to pass a parameter to Sqoop this way? Can you let me know if you have any suggestion to achieve this logic?

I believe the problem you are seeing is incorrect quoting. Single quotes (') prevent bash from performing any substitutions; you need double quotes (") if you want to use variables inside the parameter. However, you also have to be careful not to substitute the $CONDITIONS placeholder, which Sqoop must receive literally. Try it without Sqoop:
jarcec@odie ~ % echo '$maxval and $CONDITIONS'
$maxval and $CONDITIONS
jarcec@odie ~ % echo "$maxval and $CONDITIONS"
and
jarcec@odie ~ % export maxval=30
jarcec@odie ~ % echo "$maxval and $CONDITIONS"
30 and
jarcec@odie ~ % echo "$maxval and \$CONDITIONS"
30 and $CONDITIONS
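Applied to the original command, the fix is to double-quote the --query string and escape the placeholder as \$CONDITIONS. A minimal sketch (the query is abbreviated, and the value 30 is just an example):

```shell
# Double quotes let the shell expand $maxval,
# while \$CONDITIONS stays literal for Sqoop.
maxval=30
query="SELECT DimFreqCellRelationID FROM dbo.DimCellRelation WHERE DimFreqCellRelationID > $maxval AND \$CONDITIONS"
echo "$query"
# sqoop import --connect '...' --query "$query" --split-by OSS --target-dir /testval
```

The echo confirms that $maxval expanded while $CONDITIONS survived verbatim for Sqoop to rewrite at runtime.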

Related

I want to execute awk command from tcl script

I want to execute the following command from my Tcl script:
exec /bin/awk '/Start/{f=1;++file}/END/{f=0}f{print > "/home/user/report/"file }' input
I'm getting this error:
awk: ^ invalid char ''' in expression
Is it possible to execute such a command from Tcl?
Thanks
Quote from tcl man page:
When translating a command from a Unix shell invocation, care should
be taken over the fact that single quote characters have no special
significance to Tcl. Thus:
awk '{sum += $1} END {print sum}' numbers.list
would be translated into something like:
exec awk {{sum += $1} END {print sum}} numbers.list
So I would try it without quotes (posted as an answer since it can't fit properly in a comment; it's just a thought from a quick Google search).
As per the comment, you may build the awk script in a variable first, like:
set awk_command "/Start/{f=1;++file}/END/{f=0}f{print > \"$tcl_variable\"file }"
exec /bin/awk $awk_command input
Here is my solution:
puts [exec /usr/bin/awk \
{/Start/{f=1;++file}END{f=0}f{print > file}} \
invoke_awk.txt]
If you don't need to show the output:
exec /usr/bin/awk \
{/Start/{f=1;++file}END{f=0}f{print > file}} \
invoke_awk.txt
Note that in Tcl you don't group with single quotes; use either double quotes or braces.

Grep query in C shell script not performing properly

When I run the grep command at the command prompt, the output is correct. However, when I run it as part of a script, I only get partial output. Does anyone know what is wrong with this program?
#!/bin/csh
set res = `grep -E "OPEN *(OUTPUT|INPUT|I-O|EXTEND)" ~/work/lst/TXT12UPD.lst`
echo $res
Your wildcard is probably being processed by the shell rather than being passed to grep as part of the pattern.
Try escaping the * with a \ (i.e. \*).
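For comparison, here is a sketch of the same pipeline in bash terms (the original script is csh, whose quoting rules differ; sample.lst is a stand-in for the asker's TXT12UPD.lst):

```shell
# Build a small stand-in input file.
cat > sample.lst <<'EOF'
OPEN OUTPUT file-a
MOVE x TO y
OPEN    INPUT file-b
EOF

# Quote the pattern so the shell never touches the * or (),
# and quote "$res" so embedded newlines survive the echo.
res=$(grep -E 'OPEN *(OUTPUT|INPUT|I-O|EXTEND)' sample.lst)
echo "$res"
```

Note that the unquoted `echo $res` in the original script collapses the matches onto one whitespace-separated line, which can also look like "partial" output.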

awk doesn't work in hadoop's mapper

This is my hadoop job:
hadoop streaming \
-D mapred.map.tasks=1\
-D mapred.reduce.tasks=1\
-mapper "awk '{if(\$0<3)print}'" \ # doesn't work
-reducer "cat" \
-input "/user/***/input/" \
-output "/user/***/out/"
this job always fails, with an error saying:
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `export TMPDIR='..../work/tmp'; /bin/awk { if ($0 < 3) print } '
But if I change the -mapper into this:
-mapper "awk '{print}'"
it works without any error. What's the problem with the if(..) ?
UPDATE:
Thanks to @paxdiablo for your detailed answer.
What I really want to do is filter out rows whose first column is greater than x before piping the input data to my custom binary, so the -mapper actually looks like this:
-mapper "awk -v x=$x{if($0<x)print} | ./bin"
Is there any other way to achieve that?
The problem's not with the if per se, it's to do with the fact that the quotes have been stripped from your awk command.
You'll realise this when you look at the error output:
sh: -c: line 0: `export TMPDIR='..../work/tmp'; /bin/awk { if ($0 < 3) print } '
and when you try to execute that quote-stripped command directly:
pax> echo hello | awk {if($0<3)print}
bash: syntax error near unexpected token `('
pax> echo hello | awk {print}
hello
The reason the {print} one works is because it doesn't contain the shell-special ( character.
One thing you might want to try is to escape the special characters to ensure the shell doesn't try to interpret them:
{if\(\$0\<3\)print}
It may take some effort to get the correctly escaped string but you can look at the error output to see what is generated. I've had to escape the () since they're shell sub-shell creation commands, the $ to prevent variable expansion, and the < to prevent input redirection.
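You can check that escaped form locally before handing it to Hadoop (the three-line input here is just a stand-in for the job's data):

```shell
# Each backslash stops the shell from interpreting (, $, and <;
# awk itself receives the unescaped program {if($0<3)print}.
printf '1\n5\n2\n' | awk {if\(\$0\<3\)print}
```

Only the lines numerically below 3 come out the other side.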
Also keep in mind that there may be other ways to filter depending on your needs, ways that can avoid shell-special characters. If you specify what your needs are, we can possibly help further.
For example, you could create a shell script (e.g., pax.sh) to do the actual awk work for you:
#!/bin/bash
awk -v x="$1" '$1 < x {print}'
then use that shell script in the mapper without any special shell characters:
hadoop streaming \
-D mapred.map.tasks=1 -D mapred.reduce.tasks=1 \
-mapper "pax.sh 3" -reducer "cat" \
-input "/user/***/input/" -output "/user/***/out/"
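You can verify the script-based approach locally without Hadoop at all (the input lines and the threshold 3 are just examples):

```shell
# Create pax.sh: print lines whose first field is below the threshold.
cat > pax.sh <<'EOF'
#!/bin/bash
awk -v x="$1" '$1 < x {print}'
EOF
chmod +x pax.sh

# Feed it sample data, as the streaming framework would.
printf '1\n5\n2\n' | ./pax.sh 3
```

Remember to ship the script to the cluster as well (e.g., with the streaming -file option) so the mapper can find it.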

How to hide result set decoration in Psql output

How do you hide the column names and row count in the output from psql?
I'm running a SQL query via psql with:
psql --user=myuser -d mydb --output=result.txt -c "SELECT * FROM mytable;"
and I'm expecting output like:
1,abc
2,def
3,xyz
but instead I get:
id,text
-------
1,abc
2,def
3,xyz
(3 rows)
Of course, it's not impossible to filter the top two rows and the bottom row out after the fact, but is there a way to do it with only psql? Reading over its man page, I see options for controlling the field delimiter, but nothing for hiding extraneous output.
You can use the -t or --tuples-only option:
psql --user=myuser -d mydb --output=result.txt -t -c "SELECT * FROM mytable;"
Edited (more than a year later) to add:
You also might want to check out the COPY command. I no longer have any PostgreSQL instances handy to test with, but I think you can write something along these lines:
psql --user=myuser -d mydb -c "COPY mytable TO 'result.txt' DELIMITER ','"
(except that result.txt will need to be an absolute path). The COPY command also supports a more-intelligent CSV format; see its documentation.
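If you also want the comma-separated fields shown in the question (1,abc rather than whitespace-aligned columns), you can combine -t with -A (unaligned output mode) and -F (field separator). A sketch using the asker's connection details, which I have not re-run against a live database:

```shell
# -t: tuples only (no header or row count)
# -A: unaligned output mode
# -F: use ',' as the field separator
psql --user=myuser -d mydb --output=result.txt -t -A -F ',' \
     -c "SELECT * FROM mytable;"
```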
You can also redirect output from within psql and use the same option. Use \o to set the output file, and \t to output tuples only (or \pset to turn off just the rowcount "footer").
\o /home/flynn/queryout.txt
\t on
SELECT * FROM a_table;
\t off
\o
Alternatively,
\o /home/flynn/queryout.txt
\pset footer off
. . .
Usually, when you want to parse psql-generated output, you would set -A (unaligned output) and -F (field separator):
# generate t.col1, t.col2, t.col3 ...
while read -r c; do
  test -z "$c" || echo ", $table_name.$c" | perl -ne 's/\n//gm;print'
done < <(cat << EOF | PGPASSWORD=${postgres_db_useradmin_pw:-} \
  psql -A -F ',' -q -t -X -w -U ${postgres_db_useradmin:-} \
  --port $postgres_db_port --host $postgres_db_host -d $postgres_db_name \
  -v table_name=${table_name:-}
SELECT column_name
FROM information_schema.columns
WHERE 1=1
  AND table_schema = 'public'
  AND table_name = :'table_name';
EOF
)
echo -e "\n\n"

Bash Script - If statement within quoted command

Within a bash script I am running a SQL query via psql -c. Based on the arguments given to the bash script, the WHERE clause of the SELECT command will be different. So basically I need to know if it's possible to do something like this:
psql -c "select statement here until we get to the where clause at which point we break out of statement and do"
if (arg1 was given)
concatenate "where arg1" to the end of the above select statement
if (arg2 was given)
concatenate "where arg2" to the end of the above select statement
and so on for as many arguments. I know I could do this much more easily in a SQL function if I just passed the arguments, but that really isn't an option. Thanks!
Edit: 5 seconds after posting this I realize I could just create a string before calling the psql command and then call the psql command on that. Doh!
psql -c "SELECT columns FROM table ${1:+WHERE $1} ${2:+AND $2}"
This uses the "use alternate value" substitution, ${VAR:+alternate}, where alternate is substituted if $VAR is set and not empty; if $VAR is empty, nothing is substituted. (The second clause is joined with AND, since a statement can have only one WHERE.)
Save a script, e.g. query.sh:
#!/bin/bash
query="select statement here until we get to the where clause at which point we break out of statement and do"
if [ $# -gt 0 ]
then
query+=" where $1"
shift
fi
while [ $# -gt 0 ]
do
query+=" and $1"
shift
done
psql -c "$query"
Call it like
chmod +x ./query.sh
./query.sh "id in (1,2,3)" "modified_by='myname'"
SQL="select x,y,z FROM foobar"
if [ "$1" != "" ]
then
SQL="$SQL where $1"
fi
psql -c "$SQL"
stmt="select statement here until we get to the where clause at which point we break out of statement and do"
if (( $# > 0 ))
then
stmt="$stmt where $1"
fi
if (( $# > 1 ))
then
stmt="$stmt and $2"
fi
psql -c "$stmt"
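The same joining logic can be wrapped in a function so it handles any number of arguments and is easy to test (the names build_query and foobar are illustrative):

```shell
# Assemble the statement: the first argument gets WHERE,
# every later argument gets AND.
build_query() {
  local query="select x,y,z FROM foobar"
  if [ $# -gt 0 ]; then
    query+=" where $1"
    shift
  fi
  while [ $# -gt 0 ]; do
    query+=" and $1"
    shift
  done
  printf '%s\n' "$query"
}

build_query "id in (1,2,3)" "modified_by='myname'"
# then: psql -c "$(build_query "id in (1,2,3)")"
```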