Beeline query output coming in JSON format instead of csv table - hive

I am using a Beeline query like the one below. The underlying data sitting in HDFS comes from a mainframe server. All I want is to execute a query and dump the result to a CSV (or any tabular format):
beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=false --outputformat=csv2 -e "SELECT * FROM tbl LIMIT 2;"> tables1.csv
My issues are:
The format is not clean; there are extra rows at the top and bottom.
It appears as JSON and not as a table.
Some numbers seem to be in hexadecimal format.
+-----------------------------------------------------------------------------------------------------------------------------+
| col1:{"col1_a":"00000" col1_b:"0" col1_c:{"col11_a":"00000" col11_tb:{"mo_acct_tp":"0" col11_c:"0"}} col1_d:"0"}|
+-----------------------------------------------------------------------------------------------------------------------------+
I want a regular csv with column names on top and no nesting.

Please help us understand your data better.
Does your table return data like the following when you run the select query in either Beeline or Hive?
> select * from test;
+------------------------------------------------------------------------------------------------------------------------+--+
| test.col |
+------------------------------------------------------------------------------------------------------------------------+--+
| {"col1_a":"00000","col1_b":"0","col1_c":{"col11_a":"00000","col11_tb":{"mo_acct_tp":"0","col11_c":"0"}},"col1_d":"0"} |
+------------------------------------------------------------------------------------------------------------------------+--+
If yes, you will have to parse the values out of the JSON objects, which would look like this:
select
get_json_object(tbl.col, '$.col1_a') col1_a
, get_json_object(tbl.col, '$.col1_b') col1_b
, get_json_object(tbl.col, '$.col1_c.col11_a') col1_c_col11_a
, get_json_object(tbl.col, '$.col1_c.col11_tb.col11_c') col1_c_col11_tb_col11_c
, get_json_object(tbl.col, '$.col1_c.col11_tb.mo_acct_tp') col1_c_col11_tb_mo_acct_tp
, get_json_object(tbl.col, '$.col1_d') col1_d
from test tbl
INFO : Completed executing command(queryId=hive_20180918182457_a2d6230d-28bc-4839-a1b5-0ac63c7779a5); Time taken: 1.007 seconds
INFO : OK
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
| col1_a | col1_b | col1_c_col11_a | col1_c_col11_tb_col11_c | col1_c_col11_tb_mo_acct_tp | col1_d |
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
| 00000 | 0 | 00000 | 0 | 0 | 0 |
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
1 row selected (2.058 seconds)
Then you can use this query in your command line to export results into a file.
>beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=false --outputformat=csv2 -e "select
get_json_object(tbl.col, '$.col1_a') col1_a
, get_json_object(tbl.col, '$.col1_b') col1_b
, get_json_object(tbl.col, '$.col1_c.col11_a') col1_c_col11_a
, get_json_object(tbl.col, '$.col1_c.col11_tb.col11_c') col1_c_col11_tb_col11_c
, get_json_object(tbl.col, '$.col1_c.col11_tb.mo_acct_tp') col1_c_col11_tb_mo_acct_tp
, get_json_object(tbl.col, '$.col1_d') col1_d
from corpde_commops.test tbl;" > test.csv
If you need the column names in the file, set --showHeader=true.
With the header enabled, the final output would be:
>cat test.csv
col1_a,col1_b,col1_c_col11_a,col1_c_col11_tb_col11_c,col1_c_col11_tb_mo_acct_tp,col1_d
00000,0,00000,0,0,0
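If the extraction has to happen outside Hive, the same flattening can be sketched client-side. The following is only an illustration in Python, assuming each row arrives as one JSON string shaped like the sample above; the helper name flatten and the underscore-joined column names are choices made for this sketch, not something Beeline produces.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining key names with '_'."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Sample record copied from the table output above.
raw = ('{"col1_a":"00000","col1_b":"0","col1_c":{"col11_a":"00000",'
       '"col11_tb":{"mo_acct_tp":"0","col11_c":"0"}},"col1_d":"0"}')

row = flatten(json.loads(raw))

# Write a header line plus one data line to an in-memory CSV.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row), lineterminator="\n")
writer.writeheader()
writer.writerow(row)
csv_text = buf.getvalue()
print(csv_text)
```

The column order follows the key order inside the JSON, so it can differ from the order chosen in the select list above.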
I don't see anything wrong with your beeline statement itself.
If your data does not look like the example above, the solution may be different.
All the best.

You have to set showHeader=true to get the column names in the output:
beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=true --outputformat=csv2 -e "SELECT * FROM tbl LIMIT 2;"> tables1.csv
You can also try the table format, --outputformat=table. This does not produce CSV, but it gives you a clean tabular structure like the one below:
+-----+---------+-----------------+
| id | value | comment |
+-----+---------+-----------------+
| 1 | Value1 | Test comment 1 |
| 2 | Value2 | Test comment 2 |
| 3 | Value3 | Test comment 3 |
+-----+---------+-----------------+

Related

Clarifications about some SQL Injection commands

I'm struggling with a CTF (Capture The Flag) web challenge on Hack The Box. Not being an expert in penetration testing, I'm asking for your help in explaining (with some comments) some commands used to reach the solution, especially the syntax and logic of the commands themselves. (A reference to the commands can be found here, so you have the whole situation clear.)
I ask you to be very detailed, even on things that may seem trivial.
Leaving aside the base64 encoding (that I understand) I need to understand these commands and their related parameters (syntax and logic of the commands):
1st: {"ID":"1"}
2nd: {"ID": "1' or 1-- -"}
3rd: {"ID": "-1' union select * from (select 1)table1 JOIN (SELECT 2)table2 on 1=1-- -"}
About the 3rd command, I saw the same command but with an alteration of the table names, like this:
{"ID": "-1' union select * from (select 1)UT1 JOIN (SELECT 2)UT2 on 1=1-- -"}
What is the difference? Is the name given to the tables in the query irrelevant?
If you need further clarification or I haven't made myself clear, just tell me and I'll try to help. Thank you in advance.
The stages of hacking are: recon, scanning, gaining access, maintaining access, and clearing tracks. Basically, you obtain information and then do something with that information. It seems that this SQL injection learning module is meant to teach how to obtain information about the current system.
The basic idea of SQL injection is inserting SQL code/commands/syntax into a query. It is usually done in the WHERE clause, because web apps often have a search feature, which basically retrieves user input and inserts it into the WHERE clause.
For example, the simplest vulnerability would be like this (assuming MySQL and PHP):
SELECT * FROM mytable WHERE mycolumn='$_GET[myparam]'
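The payload is what you put inside the parameter (e.g. myparam) to perform the SQL injection.
With such a query, you can inject a payload like 1' OR 1=1-- - to test for an SQL injection vulnerability (the trailing comment swallows the closing quote that the application appends).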
1st payload
The 1st payload is used to check whether there is an injection point (a parameter that can be injected).
If you change the parameter and the output changes, there is an injection point.
Otherwise, there is no injection point.
2nd payload
The 2nd payload is used to check whether the target app has an SQL injection vulnerability (that is, whether the app sanitizes the user's input).
If the app shows all output, the app has an SQL injection vulnerability. Explanation: the query sent to the RDBMS becomes something like this:
Before injection:
SELECT col1, col2, ... colN FROM mytable WHERE col1='myparam'
After injection:
SELECT col1, col2, ... colN FROM mytable WHERE col1='1' or 1-- -'
Please note that in MySQL, -- (minus-minus-space) marks the start of an inline comment, so everything after it is ignored. The actual query would therefore be: SELECT col1, col2, ... colN FROM mytable WHERE col1='1' or 1
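The before/after queries above can be reproduced in a local sandbox. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also treats -- as a comment marker), and the table contents and the function name vulnerable_lookup are made up for the example.

```python
import sqlite3

# In-memory sandbox database; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("1", "a"), ("2", "b"), ("3", "c")],
)


def vulnerable_lookup(param):
    """Build the query by string concatenation, as a vulnerable app would."""
    query = f"SELECT col1, col2 FROM mytable WHERE col1='{param}'"
    return conn.execute(query).fetchall()


# Normal parameter: only the matching row comes back.
normal = vulnerable_lookup("1")

# 2nd payload: the WHERE clause becomes  col1='1' or 1  and the trailing
# quote is commented out, so every row comes back.
injected = vulnerable_lookup("1' or 1-- -")

print(normal)
print(injected)
```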
3rd payload
The 3rd payload is used to check how many columns the query SELECTs. To understand this you have to understand subqueries, joins, and unions (do a quick search; these are very basic concepts). The table alias names (UT1 or UT2) are not important; they are just identifiers, chosen so that they do not clash with an existing table alias.
If the query succeeds (no error, the app displays output), the app's query SELECTs 2 columns.
If the query fails, it is not 2 columns; you can change the payload to check for 3 columns, 4 columns, etc.
Example for checking whether the SELECT statement has 3 columns:
-1' union select * from (select 1)UT1 JOIN (SELECT 2)UT2 on 1=1 JOIN (SELECT 3)UT3 on 1=1 -- -
Tip: when learning about SQL injection, it's far easier to just type (or copy-paste) the payload into your SQL console (use a virtual machine or sandbox if the query is considered dangerous).
Edit 1:
basic explanation of subquery and union
Subquery: It's basically putting a query inside another query. Subqueries may be inserted in SELECT clause, FROM clause, and WHERE clause.
Example of subquery in FROM clause:
select * from (select 'hello','world','foo','bar')x;
Example of subquery in WHERE clause:
select * from tblsample t1 where t1.price>(select avg(t2.price) from tblsample t2);
Union: concatenating the output of two selects (UNION also removes duplicate rows; UNION ALL keeps them). Example:
tbl1
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | Rupert | Somewhere | 022 |
| 2 | John | Doe | 022 |
+----+--------+-----------+------+
tbl2
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | AAAAAA | DDDDDDDDD | 022 |
| 2 | BBBB | CCC | 022 |
+----+--------+-----------+------+
select * from tbl1 union select * from tbl2
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | Rupert | Somewhere | 022 |
| 2 | John | Doe | 022 |
| 1 | AAAAAA | DDDDDDDDD | 022 |
| 2 | BBBB | CCC | 022 |
+----+--------+-----------+------+
Edit 2:
further explanation on 3rd payload
In MySQL, you can make a 'literal table' by selecting a value. Here is an example:
MariaDB [(none)]> SELECT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT 1,2;
+---+---+
| 1 | 2 |
+---+---+
| 1 | 2 |
+---+---+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol;
+----------+-----------+
| firstcol | secondcol |
+----------+-----------+
| 1 | 2 |
+----------+-----------+
1 row in set (0.00 sec)
The purpose of making this 'literal table' is to check how many columns the SELECT statement we are injecting into has. For example:
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol UNION SELECT 3 thirdcol, 4 fourthcol;
+----------+-----------+
| firstcol | secondcol |
+----------+-----------+
| 1 | 2 |
| 3 | 4 |
+----------+-----------+
2 rows in set (0.07 sec)
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol UNION SELECT 3 thirdcol, 4 fourthcol, 5 fifthcol;
ERROR 1222 (21000): The used SELECT statements have a different number of columns
As shown above, when UNION is used on two SELECT statements with a different number of columns, it throws an error. Therefore, you know how many columns a SELECT statement has when the UNION DOESN'T throw an error.
So, why don't we just use SELECT 1, 2 to generate a 'literal table' with 2 columns? Because the application's firewall blocks the use of commas. Therefore we must take the roundabout way and build a 2-column 'literal table' with a JOIN query: SELECT * FROM (SELECT 1)UT1 JOIN (SELECT 2)UT2 ON 1=1
MariaDB [(none)]> SELECT * FROM (SELECT 1)UT1 JOIN (SELECT 2)UT2 ON 1=1;
+---+---+
| 1 | 2 |
+---+---+
| 1 | 2 |
+---+---+
1 row in set (0.01 sec)
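The column-counting idea can be tried out safely in a sandbox. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL/MariaDB; the two-column table mytable and the helper union_works are assumptions made for the illustration. It builds the comma-free 'literal table' with JOINs and reports for which column counts the UNION is accepted.

```python
import sqlite3

# Sandbox database with a 2-column table playing the role of the app's query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
conn.execute("INSERT INTO mytable VALUES ('1', 'secret')")


def union_works(n_columns):
    """Return True if a comma-free literal table with n_columns can be UNIONed."""
    # (SELECT 1) UT1 JOIN (SELECT 2) UT2 ON 1=1 JOIN (SELECT 3) UT3 ON 1=1 ...
    parts = ["(SELECT 1) UT1"]
    for i in range(2, n_columns + 1):
        parts.append(f"JOIN (SELECT {i}) UT{i} ON 1=1")
    probe = "SELECT * FROM " + " ".join(parts)
    query = f"SELECT * FROM mytable WHERE col1 = '-1' UNION {probe}"
    try:
        conn.execute(query).fetchall()
        return True
    except sqlite3.OperationalError:
        # Raised when both sides of the UNION have different column counts.
        return False


results = {n: union_works(n) for n in (1, 2, 3)}
print(results)
```

Only the probe whose column count matches the original SELECT goes through, which is the signal the 3rd payload relies on. Renaming UT1/UT2 to anything else changes nothing, as noted above.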
Additional note: MariaDB is a community-developed fork of MySQL, created after MySQL was acquired by Oracle. MariaDB keeps more or less the same syntax and commands as MySQL.

How to display all columns and its data type in a table via SQL query

I am trying to print the column names of a table called 'meta', and I also need their data types.
I tried this query
SELECT meta FROM INFORMATION_SCHEMA.TABLES;
but it throws an error saying no information schema available. Could you please help me, I am a beginner in SQL.
Edit:
select tables.name from tables join schemas on
tables.schema_id=schemas.id where schemas.name='sprl_db';
This query gives me all the tables in database 'sprl_db'
You can use the monetdb catalog:
select c.name, c.type, c.type_digits, c.type_scale
from sys.columns c
inner join sys.tables t on t.id = c.table_id and t.name = 'meta';
As you are using MonetDB, you can get that from sys.columns:
sys.columns
It returns all the information related to table columns.
You can also check the schema, table, and columns documentation for MonetDB.
In SQL Server, the equivalent is: exec sp_columns TableName
If I understand correctly, you need to see the columns and the types of a table called meta that you (or some other user) defined?
There are at least two ways to do this:
First (as @GMB mentioned in their answer) you can query the SQL catalog: https://www.monetdb.org/Documentation/SQLcatalog/TablesColumns
SELECT * FROM sys.tables WHERE NAME='meta';
+------+------+-----------+-------+------+--------+---------------+--------+-----------+
| id | name | schema_id | query | type | system | commit_action | access | temporary |
+======+======+===========+=======+======+========+===============+========+===========+
| 9098 | meta | 2000 | null | 0 | false | 0 | 0 | 0 |
+------+------+-----------+-------+------+--------+---------------+--------+-----------+
1 tuple
So this gets all the relevant information about the table meta. We are mostly interested in the value of the column id because this uniquely identifies the table.
(Please note that this id will probably be different in your system)
After we have this information we can query the columns table with this table id:
SELECT * FROM sys.columns WHERE table_id=9098;
+------+------+------+-------------+------------+----------+---------+-------+--------+---------+
| id | name | type | type_digits | type_scale | table_id | default | null | number | storage |
+======+======+======+=============+============+==========+=========+=======+========+=========+
| 9096 | i | int | 32 | 0 | 9098 | null | true | 0 | null |
| 9097 | j | clob | 0 | 0 | 9098 | null | true | 1 | null |
+------+------+------+-------------+------------+----------+---------+-------+--------+---------+
2 tuples
Since you are only interested in the names and types of the columns, you can modify this query as follows:
SELECT name, type FROM sys.columns WHERE table_id=9098;
+------+------+
| name | type |
+======+======+
| i | int |
| j | clob |
+------+------+
2 tuples
You can combine the two queries above with a join:
SELECT col.name, col.type FROM sys.tables as tab JOIN sys.columns as col ON tab.id=col.table_id WHERE tab.name='meta';
+------+------+
| name | type |
+======+======+
| i | int |
| j | clob |
+------+------+
2 tuples
The second, and preferred way to get this information if you are using the mclient utility of MonetDB, is by using the describe meta-command of mclient. When used without arguments it presents a list of tables that have been defined in the current database and when it is given the name of the table it prints its SQL definition:
sql>\d
TABLE sys.data
TABLE sys.meta
sql>\d sys.meta
CREATE TABLE "sys"."meta" (
"i" INTEGER,
"j" CHARACTER LARGE OBJECT
);
You can use the \? meta-command to see a list of all meta-commands in mclient:
sql>\?
\? - show this message
\<file - read input from file
\>file - save response in file, or stdout if no file is given
\|cmd - pipe result to process, or stop when no command is given
\history - show the readline history
\help - synopsis of the SQL syntax
\D table - dumps the table, or the complete database if none given.
\d[Stvsfn]+ [obj] - list database objects, or describe if obj given
\A - enable auto commit
\a - disable auto commit
\e - echo the query in sql formatting mode
\t - set the timer {none,clock,performance} (none is default)
\f - format using renderer {csv,tab,raw,sql,xml,trash,rowcount,expanded,sam}
\w# - set maximal page width (-1=unlimited, 0=terminal width, >0=limit to num)
\r# - set maximum rows per page (-1=raw)
\L file - save client-server interaction
\X - trace mclient code
\q - terminate session and quit mclient
For MySQL:
SELECT column_name,
data_type
FROM information_schema.columns
WHERE table_schema = 'yourdatabasename'
AND table_name = 'yourtablename';
Output:
+-------------+-----------+
| COLUMN_NAME | DATA_TYPE |
+-------------+-----------+
| Id | int |
| Address | varchar |
| Money | decimal |
+-------------+-----------+

Remove the junk characters from Hive tables or from Unix

We have tables in Hive like the one below, and we generate flat files from the Hive data. While generating them, we found junk characters within the data, like those below. There are many such characters in many columns. Can anyone help us remove these junk characters, either from the Hive table or from the Unix file?
ÿ,ä,í,ã
The problem is that the same data has to be sent downstream. When they load it into their DB, it shows up as a double dollar, but we designed our code with the double dollar as the column delimiter.
Basic concept
hive> select regexp_replace('Hÿelloä íworlãd','[^a-zA-Z ]','');
OK
Hello world
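For the "from Unix file" side of the question, the same character filter can be applied to an exported flat file. This is only a sketch in Python: the $$ delimiter is taken from the question, while the function name clean_line and the sample line are made up for the illustration. Each field is cleaned separately so the delimiter itself is left untouched.

```python
import re

# Anything that is not an ASCII letter, digit, or space counts as junk.
JUNK = re.compile(r"[^a-zA-Z0-9 ]")


def clean_line(line, delimiter="$$"):
    """Strip junk characters from every field while keeping the delimiter."""
    fields = line.split(delimiter)
    return delimiter.join(JUNK.sub("", field) for field in fields)


# Same expression as the Hive one-liner above.
basic = re.sub(r"[^a-zA-Z ]", "", "Hÿelloä íworlãd")

# One line of a hypothetical $$-delimited export.
cleaned = clean_line("1$$Hÿelloä$$íworlãd")

print(basic)
print(cleaned)
```

Note that this whitelist also drops punctuation such as hyphens and colons, so the character class may need widening for dates and timestamps.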
Demo
Removing undesired character from the whole table and exporting it to a file.
create table t (i int,s1 string,s2 string);
insert into t values (1,'Hÿelloä','íworlãd'),(2,'ãGããood','Byÿe');
select * from t;
+---+---------+---------+
| i | s1 | s2 |
+---+---------+---------+
| 1 | Hÿelloä | íworlãd |
| 2 | ãGããood | Byÿe |
+---+---------+---------+
create external table t_ext (rec string)
row format delimited
fields terminated by '0'
location '/user/hive/warehouse/t'
;
insert overwrite local directory '/tmp/t_ext'
select regexp_replace(regexp_replace(rec,'[^a-zA-Z0-9 \\01]',''),'\\x01','<--->')
from t_ext
;
! ls /tmp/t_ext
;
000000_0
! cat /tmp/t_ext/000000_0
;
1<--->Hello<--->world
2<--->Good<--->Bye
This works as long as your tables contain only "primitive" types (no structs, arrays, maps etc.).
I really pushed the envelope here.
Demo
create table t (i int, dt date, str string, ts timestamp, bl boolean);
insert into t select 1,current_date,'Hello world',current_timestamp,true;
select * from t;
+-----+------------+-------------+-------------------------+------+
| t.i | t.dt | t.str | t.ts | t.bl |
+-----+------------+-------------+-------------------------+------+
| 1 | 2017-03-14 | Hello world | 2017-03-14 14:37:28.889 | true |
+-----+------------+-------------+-------------------------+------+
select regexp_replace
(
printf(concat('%s',repeat('$$%s',field(unhex(1),*,unhex(1))-2)),*)
,'(\\$\\$)|[^a-zA-Z0-9 -]'
,'$1'
)
from t
;
1$$2017-03-14$$Hello world$$2017-03-14 143728.889$$true

How to output the result of an Impala query along with the query

Is there a way to output the result of an Impala query along with the query?
For example, if my query was 'show databases', I'd like to output the result to a file with something like this:
Query: show databases
Result: ------default-------
When I run impala-shell -i someip -f 'filename' -o 'output', I only see the results, one after another, so it's difficult to correlate which result goes with which query (especially when the input file contains a lot of queries).
Redirect both stderr and stdout to the file
(the queries are written to stderr):
impala-shell -f 'filename' &>'output'
Demo
[cloudera@quickstart ~]$ cat>filename
select 1;
select 2;
select 3;
[cloudera@quickstart ~]$ impala-shell -f 'filename' &>'output'
[cloudera@quickstart ~]$ cat output
Starting Impala Shell without Kerberos authentication
Connected to quickstart.cloudera:21000
Server version: impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)
Query: select 1
+---+
| 1 |
+---+
| 1 |
+---+
Fetched 1 row(s) in 0.16s
Query: select 2
+---+
| 2 |
+---+
| 2 |
+---+
Fetched 1 row(s) in 0.02s
Query: select 3
+---+
| 3 |
+---+
| 3 |
+---+
Fetched 1 row(s) in 0.02s
[cloudera@quickstart ~]$
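Once queries and results sit in the same file, they can be paired up programmatically. The sketch below is an illustration in Python that assumes the combined output looks like the demo above, with every query announced by a line starting with "Query: "; the function name split_by_query is made up for the example.

```python
def split_by_query(output_text):
    """Group the lines of a combined impala-shell log by their 'Query: ' header."""
    pairs = []
    current_query, current_lines = None, []
    for line in output_text.splitlines():
        if line.startswith("Query: "):
            # A new query starts: store the previous one, if any.
            if current_query is not None:
                pairs.append((current_query, "\n".join(current_lines)))
            current_query, current_lines = line[len("Query: "):], []
        elif current_query is not None:
            current_lines.append(line)
        # Lines before the first query (connection banner) are skipped.
    if current_query is not None:
        pairs.append((current_query, "\n".join(current_lines)))
    return pairs


sample = """Connected to quickstart.cloudera:21000
Query: select 1
+---+
| 1 |
+---+
| 1 |
+---+
Fetched 1 row(s) in 0.16s
Query: select 2
+---+
| 2 |
+---+
| 2 |
+---+
Fetched 1 row(s) in 0.02s
"""

pairs = split_by_query(sample)
for query, result in pairs:
    print("Query:", query)
    print("Result:")
    print(result)
```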

How to use Regex in SQL for extracting values after repetitive numbers

I have the following table (table1):
+---+---------------------------------------------+
|   | att1 |
+---+---------------------------------------------+
| 1 | 10.2.5.4 4.3.2.1.in-addr.arpa |
| 2 | asd 100.99.98.97 97.3.2.1.a.b.c fsdf |
| 3 | fd 95.94.93.92 92.5.7.1.a.b.c |
| 4 | a 11.4.99.75 75.77.52.41.in-addr.arpa |
+---+---------------------------------------------+
I would like to get the following values (that are located after the repetitive numbers): in-addr.arpa, a.b.c, a.b.c, in-addr.arpa.
I tried to use the following format with no success:
SELECT att1
FROM table1
WHERE REGEXP_LIKE(att1 , '^(\d+?)\1$')
I would like it to run in Impala and Oracle.
Use REGEXP_SUBSTR (assuming you are using an Oracle DB).
select regexp_substr(att1,'[0-9]\.([^0-9]+)',1,1,null,1)
from table1
[0-9]\. matches a digit followed by a literal dot.
[^0-9]+ matches any characters other than digits until the next digit is found. The () around it marks the group (the first one in this case), and only that part of the string is extracted.
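The pattern can be checked against the sample rows outside the database. The sketch below uses Python's re module purely as an illustration; the variable names are made up. It also shows that on the second row the pattern keeps the trailing " fsdf", and that excluding whitespace from the character class (an adjustment suggested here, not part of the answer above) returns exactly the values asked for.

```python
import re

rows = [
    "10.2.5.4 4.3.2.1.in-addr.arpa",
    "asd 100.99.98.97 97.3.2.1.a.b.c fsdf",
    "fd 95.94.93.92 92.5.7.1.a.b.c",
    "a 11.4.99.75 75.77.52.41.in-addr.arpa",
]

# Pattern from the answer: a digit, a dot, then a run of non-digits.
original = re.compile(r"[0-9]\.([^0-9]+)")

# Variant that also stops at whitespace.
tightened = re.compile(r"[0-9]\.([^0-9\s]+)")

with_original = [original.search(row).group(1) for row in rows]
with_tightened = [tightened.search(row).group(1) for row in rows]

print(with_original)
print(with_tightened)
```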