I am exporting query results from Hive using beeline; here is my command:
beeline -u 'jdbc:hive2://myhost.com:10000/mydb;principal=hive/myhost.COM' --incremental=true --silent=true --outputformat=dsv --disableQuotingForSV=true --delimiterForDSV=\, --showHeader=false --nullemptystring=true -f myquery.hql --hiveconf DT_ID=${DT_ID} > ${spoolFile}
This is my query :
SELECT id, concat('"',c_name,'"'), app_name from mytab where dt_id='${hiveconf:DT_ID}';
But for fields containing my field separator (,) in the column value, I get results like this:
66,^#"(Chat\, Social\, Music\, Utilities)"^#,Default
Note the ^#. Why is it appearing? How can I avoid it? What is that character? If it is a quote character, I want to keep it, so that I can remove the concat from my query. I tried toggling --disableQuotingForSV=true/false, but that did not help.
I'm trying to run a BigQuery query from the command line, but because my query is very long I've written it in a text file. The query works from the GUI, and I'm overwriting a table that already exists.
bq query --allow_large_results --replace --destination_table=me.Tbl_MyTable '`cat query.txt`'
However, I'm getting error results:
Error in query string: Error processing job
'dev:bqjob_r_00000123456789456123_1': Encountered "
"\'cat query.txt\' "" at line 1, column 1.
Was expecting: EOF
Do I need to put the entire file path in the .txt filename? (this doesn't seem to make a difference)
Are there any characters I need to be careful with in the text file (e.g. "\" or quotation marks)?
I'm using where clauses and group by clauses - is that an issue?
Instead of cat, just pipe the input from the file. The command would be:
bq query --allow_large_results --replace --destination_table=me.Tbl_MyTable < query.txt
This will send the contents of query.txt to the bq tool.
Elliot is right; and if you want to use cat, sed, or anything else, pipe it:
cat query.txt | bq query
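With the original flags from the question, the piped version looks like this (destination table taken from the question):
cat query.txt | bq query --allow_large_results --replace --destination_table=me.Tbl_MyTable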
I've got a parameter like "This is a param", and I want to pass it to the HiveQL below:
hive -hivevar sys_nm="This is a param" -e 'select * from rd_sys where rd_sys_nm=${hivevar:sys_nm}'
But Hive returned below error message:
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1409.jar!/hive-log4j.properties
FAILED: ParseException line 1:49 missing EOF at 'is' near 'This'
Does anyone know how to pass it properly?
Hive variables don't work like hiveconf, where you need to write "hiveconf:something" in the code.
When using hivevar, just reference the variable name like this: ${var_name}
for example:
through command line:
hive -hivevar MONTH_VAR='11' -e 'select * from table where month=${MONTH_VAR};'
(Note the single quotes around the -e string, so your shell does not expand ${MONTH_VAR} itself before Hive sees it.)
You can also declare it through the script:
set hivevar:MONTH_VAR=11;
-- the query would then look like this (no hiveconf prefix):
SELECT * from table where month=${MONTH_VAR};
You need to put the string in single quotes for it to parse correctly as a string inside the SQL after interpolation.
hive -hivevar sys_nm="'This is a param'" -e 'select * from rd_sys where rd_sys_nm=${hivevar:sys_nm}'
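After substitution, the statement Hive actually parses is:
select * from rd_sys where rd_sys_nm='This is a param'
so the value is now a proper string literal.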
I have a problem with the double quotes on this sqoop query:
select i.Number, i.Date,i.Station, i.lStation,
count(*) ax, “1- Pd” St , b.Type
from Leg jl, yLeg i, senger b,
where jl.LegID = i.LegID and jl.rID = b.erID and b.gID = b.ID
and b.tus not in (1,4) group by Number, Date, tion, b.Type
How can I fix it? With some escape parameter?
First, debug the query with the command below:
sqoop eval -libjars /var/lib/sqoop/ojdbc6.jar --connect jdbc:oracle:thin:@hostname:portnumber/servicename --username user -password password --query "select * from schemaname.tablename where rownum <= 10"
Write your query in --query and check whether it generates the output you expect; you can see the output in the terminal itself.
If the query gives the results you expect, use the sqoop command below to import the table:
sqoop import -libjars /var/lib/sqoop/ojdbc6.jar --connect 'jdbc:oracle:thin:@hostname/service_name' --username user -password password -m 1 --hive-overwrite --hive-import --hive-database database_name --hive-table table_name --target-dir '/user/hive/warehouse/databasename.db/tablename' --query "select * from source_database.source_tablename WHERE 1=1 AND \$CONDITIONS"
The exact problem you are facing with the double quotes can be resolved with an escape character. Please use WHERE 1=1 AND \$CONDITIONS as is, and paste your query before the WHERE in the sqoop command.
If you face any error, please paste it; you may need to add another escape character to escape the double quotes.
There are two parts to this question.
The first is: what is a valid query for your source database? Most databases have some kind of client or shell that lets you enter and execute queries. Your query should be valid as far as that shell or client is concerned.
The second part of your question is how do you take that query (as a String) and pass it to the database via sqoop. The answer to that lies in the way you're running sqoop.
If you're running sqoop via the command line, then you need to identify the characters (usually double quotes) that give your OS fits when embedded in a command-line argument. Put a backslash before those characters to help the OS parse the command correctly. You usually have to put the entire query string inside unescaped double quotes so that the OS treats your query as a single string argument.
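For example, a minimal sketch of that escaping, using the query from the question (connection details and target directory are placeholders, and the curly quotes from the original query are replaced with straight, backslash-escaped double quotes; whether the quoted term itself is valid is up to your source database):
sqoop import -libjars /var/lib/sqoop/ojdbc6.jar --connect jdbc:oracle:thin:@hostname:portnumber/servicename --username user -P -m 1 --target-dir /tmp/out --query "select count(*) ax, \"1- Pd\" St, b.Type from senger b where \$CONDITIONS group by b.Type"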
If you're running sqoop via Oozie then I strongly recommend you break the Sqoop command into arguments in the Sqoop action:
<arg>--query</arg>
<arg>select ... count(*) ax, “1- Pd” St , b.Type ... WHERE $CONDITIONS</arg>
So that you can generally paste your query as is into the action.
Of course, nothing is that simple. You still have to remember that the query is sitting inside an XML document, so any character that would break an XML parse becomes problematic. The only such characters I've encountered so far are the angle brackets, and I use property substitution (a bit of a kludge, I admit) to solve that problem:
In the Oozie workflow properties file I put:
lessThan=<
and I change my arg from
<arg>SELECT * from MyTable where $CONDITIONS AND (SOME_COL < 1000)</arg>
to
<arg>SELECT * from MyTable where $CONDITIONS AND (SOME_COL ${lessThan} 1000)</arg>
EDIT:
For those of you who don't like my kludge, you could try using a CDATA element to "escape" anything in the query (except, of course, ']]>'):
<arg><![CDATA[SELECT * from MyTable where $CONDITIONS AND (SOME_COL < 1000)]]></arg>
I am trying to create a properties file like this...
firstname=Jon
lastname=Snow
occupation=Nights_Watch
family=Stark
...from a query like this...
SELECT
a.fname as firstname,
a.lname as lastname,
b.occ as occupation...
FROM
names a,
occupation b,
family c...
WHERE...
How can I do this? I am only aware of using spool to create a CSV file, which won't work here.
These property files will be picked up by shell scripts to run automated tasks. I am using Oracle DB.
Perhaps something like this?
psql -c 'select id, name from test where id = 1' -x -t -A -F = dbname -U dbuser
Output would be like:
id=1
name=test1
(For the full list of options: man psql.)
Since you mentioned spool, I will assume you are running on Oracle. This should produce a result in the desired format, which you can spool straight away.
SELECT
'firstname=' || firstname || CHR(10) ||
'lastname=' || lastname || CHR(10) -- and so on for all fields
FROM your_tables;
The same approach should be possible with all database engines, if you know the correct incantation for a literal newline and the syntax for string concatenation.
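For Oracle specifically, a minimal sketch of the whole round trip from a shell, assuming SQL*Plus is available (credentials, file name, and table/column names are placeholders):
sqlplus -s user/pass@mydb <<'EOF'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0 TRIMSPOOL ON
SPOOL myprops.properties
SELECT 'firstname=' || fname || CHR(10) || 'lastname=' || lname FROM names;
SPOOL OFF
EXIT
EOF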
It is possible to do this from your command-line SQL client, but as STTLCU notes it might be better to get the query to output something "standard" (like CSV) and then transform the results with a shell script. Otherwise, because a lot of the features you would use are not part of any SQL standard, they depend on the database server and client application. Think of this step as sort of the obverse of ETL, where you clean up the data you "unload" so that it is useful for some other application.
There are certainly ways to build this into your query application: e.g. if you use something like perl DBI::Shell as your client (which allows you to connect to many different servers using the DBI module) you can jazz up your output in various ways. But here you'd probably be best off sending the query output to a text file and running it through awk, as in the sketch below.
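For instance, a sketch of the awk step, turning a CSV with a header row into key=value lines (file names are assumptions):
awk -F, 'NR==1 {split($0, h); next} {for (i=1; i<=NF; i++) print h[i] "=" $i}' query_output.csv > result.properties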
Having said that ... here's how the Postgresql client could do what you want. Notice how the commands to set up the formatting are not SQL but specific to the client.
~/% psql -h 192.168.2.69 -d cropdusting -u stubblejumper
psql (9.2.4, server 8.4.14)
WARNING: psql version 9.2, server version 8.4.
Some psql features might not work.
You are now connected to database "cropdusting" as user "stubblejumper".
cropdusting=# \pset border 0 \pset format unaligned \pset t \pset fieldsep =
Border style is 0.
Output format is unaligned.
Showing only tuples.
Field separator is "=".
cropdusting=# select year,wmean_yld from bckwht where year=1997 AND freq > 13 ;
1997=19.9761904762
1997=14.5533333333
1997=17.9942857143
cropdusting=#
With the psql client the \pset command sets options affecting the output of query results tables. You can probably figure out which option is doing what. If you want to do this using your SQL client tell us which one it is or read through the manual page for tips on how to format the output of your queries.
My answer is very similar to the two already posted for this question, but I will try to explain the options and provide a more precise answer.
When using Postgres, you can use the psql command-line utility to get the intended output:
psql -F = -A -x -X <other options> -c 'select a.fname as firstname, a.lname as lastname from names as a ... ;'
The options are:
-F : Use '=' sign as the field separator, instead of the default pipe '|'
-A : Do not align the output; so there is no space between the column header, separator and the column value.
-x : Use expanded output, so column headers are on left (instead of top) and row values are on right.
-X : Do not read $HOME/.psqlrc, as it may contain commands/options that can affect your output.
-c : The SQL command to execute
<other options> : Any other options, such as connection details, database name, etc.
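Putting it together with the question's query (connection options are placeholders; I have also added -t, as in the earlier answer, to suppress the record header that psql prints in expanded mode):
psql -X -x -A -t -F = -h myhost -U myuser -d mydb -c 'select a.fname as firstname, a.lname as lastname from names a'
This prints lines such as:
firstname=Jon
lastname=Snow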
You have to choose whether you want to maintain such a file from the shell or from PL/SQL. Both solutions are possible and both are correct.
Because Oracle has to both read and write the file, I would do it from the database side.
You can write data to file using UTL_FILE package.
DECLARE
  fileHandler UTL_FILE.FILE_TYPE;
BEGIN
  -- 'test_dir' is an Oracle directory object; 'W' opens the file for writing
  fileHandler := UTL_FILE.FOPEN('test_dir', 'test_file.txt', 'W');
  UTL_FILE.PUTF(fileHandler, 'firstname=Jon\n');
  UTL_FILE.PUTF(fileHandler, 'lastname=Snow\n');
  UTL_FILE.PUTF(fileHandler, 'occupation=Nights_Watch\n');
  UTL_FILE.PUTF(fileHandler, 'family=Stark\n');
  UTL_FILE.FCLOSE(fileHandler);
EXCEPTION
  WHEN UTL_FILE.INVALID_PATH THEN
    raise_application_error(-20000, 'ERROR: Invalid PATH FOR file.');
END;
Example's source: http://psoug.org/snippet/Oracle-PL-SQL-UTL_FILE-file-write-to-file-example_538.htm
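Note that 'test_dir' in the example is an Oracle directory object, not a raw filesystem path; it must exist and be writable by your user. A sketch, assuming DBA privileges and a placeholder path:
CREATE OR REPLACE DIRECTORY test_dir AS '/tmp/test_dir';
GRANT READ, WRITE ON DIRECTORY test_dir TO your_user;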
At the same time, you can read from the file using an Oracle external table.
CREATE TABLE parameters_table
(
parameters_coupled VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY test_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
FIELDS
(
parameters_coupled VARCHAR2(4000)
)
)
LOCATION ('test_file.txt')
);
At this point you have your data in a table with one column holding the coupled parameter and value, e.g. 'firstname=Jon'. You can read it from Oracle, and you can read it from any shell script, because it is plain text.
Then it is just a matter of a query, e.g.:
SELECT MAX(CASE WHEN INSTR(parameters_coupled, 'firstname=') = 1 THEN REPLACE(parameters_coupled, 'firstname=') ELSE NULL END) AS firstname
, MAX(CASE WHEN INSTR(parameters_coupled, 'lastname=') = 1 THEN REPLACE(parameters_coupled, 'lastname=') ELSE NULL END) AS lastname
, MAX(CASE WHEN INSTR(parameters_coupled, 'occupation=') = 1 THEN REPLACE(parameters_coupled, 'occupation=') ELSE NULL END) AS occupation
FROM parameters_table;
I'm trying to run a query that will include static columns in its output. The select statement works when I run it via the CLP, but not when I execute it within a shell script:
su - myid -c 'db2 connect to mydb;db2 -x -v "select COL1,'','',COL2,'','',COL3L from MYTABLE fetch first 10 rows only"; db2 connect reset;'
When I run this, the output error I get is:
SQL0104N An unexpected token "," was found following "select COL1,".
Expected tokens may include: "<select_sublist>". SQLSTATE=42601
SQL1024N A database connection does not exist. SQLSTATE=08003
I've even tried putting the select statement in a variable and inserting that into the command, but I still get the same error. Any help would be greatly appreciated. -Thx
You should escape the single quotes with a backslash, as in:
su - myid -c 'db2 connect to mydb;db2 -x -v "select COL1,\'\',\'\',COL2,\'\',\'\',COL3L from MYTABLE fetch first 10 rows only"; db2 connect reset;'
Beware, I didn't test it... no shell at hand just now.
UPDATE:
I finally got my hands on a DB2 instance, and after a little testing I got it working.
It turns out that the previous syntax was faulty. The proper way of quoting the single quote is (in this case) '\'' as in:
su - myid -c 'db2 connect to mydb;db2 -x -v "select COL1,'\'','\'',COL2,'\'','\'',COL3L from MYTABLE fetch first 10 rows only"; db2 connect reset;'
That's because the single quote around the whole command must be closed (') in order to supply the escaped single quote for the db2 query (\'), and then reopened to resume the command quoting ('). Weird as it looks, it works.
This is the command I used to test it:
bash -c 'db2 connect to mydb;db2 -x -v "select 1,'\'','\'',2,'\'','\'',3 from SYSIBM.SYSDUMMY1 fetch first 10 rows only"; db2 connect reset;'
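To see what the shell actually assembles with the '\'' idiom, you can reproduce it with a plain echo (a minimal sketch):
bash -c 'echo "select 1,'\'','\'',2 from SYSIBM.SYSDUMMY1"'
This prints select 1,',',2 from SYSIBM.SYSDUMMY1, which is exactly the statement db2 receives.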