How to output the result of an Impala query along with the query - io-redirection

Is there a way to output the result of an Impala query along with the query?
For example, if my query was 'show databases', I'd like to output the result to a file with something like this:
Query: show databases
Result: ------default-------
When I run impala-shell -i someip -f 'filename' -o 'output', I only see the results, one after another, so it's difficult to correlate which result goes with which query (especially when the input file contains a lot of queries).

Redirect stderr and stdout to file
(The queries are in stderr)
impala-shell -f 'filename' &>'output'
Demo
[cloudera@quickstart ~]$ cat > filename
select 1;
select 2;
select 3;
[cloudera@quickstart ~]$ impala-shell -f 'filename' &>'output'
[cloudera@quickstart ~]$ cat output
Starting Impala Shell without Kerberos authentication
Connected to quickstart.cloudera:21000
Server version: impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)
Query: select 1
+---+
| 1 |
+---+
| 1 |
+---+
Fetched 1 row(s) in 0.16s
Query: select 2
+---+
| 2 |
+---+
| 2 |
+---+
Fetched 1 row(s) in 0.02s
Query: select 3
+---+
| 3 |
+---+
| 3 |
+---+
Fetched 1 row(s) in 0.02s
[cloudera@quickstart ~]$
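The trick works because impala-shell writes the echoed queries to stderr and the result tables to stdout, and `&>` captures both streams into one file. A minimal simulation of that stream merging, using a stand-in command instead of impala-shell:

```python
import subprocess

# A stand-in command that, like impala-shell, writes the query text to
# stderr and the result rows to stdout.
cmd = "echo 'Query: select 1' >&2; echo '| 1 |'"

# stderr=subprocess.STDOUT merges both streams, like '&>file' in the shell,
# so the query lines appear interleaved with their results.
merged = subprocess.run(
    ["sh", "-c", cmd],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
).stdout
print(merged)
```

Both the query echo and the result land in `merged`, in order, which is exactly why `&>'output'` gives a correlated log.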

Related

Clarifications about some SQL Injection commands

I'm struggling with a CTF (Capture The Flag) web challenge on hackthebox. Not being an expert in penetration testing, I'm asking for your help to explain to me (with some comments) some commands used to reach the solution, especially the syntax and logic of the commands themselves. (A reference to the commands can be found here (click me), so you have the whole situation very clear.)
I ask you to be very detailed, even on things that may seem trivial.
Leaving aside the base64 encoding (that I understand) I need to understand these commands and their related parameters (syntax and logic of the commands):
1st: {"ID":"1"}
2nd: {"ID": "1' or 1-- -"}
3rd: {"ID": "-1' union select * from (select 1)table1 JOIN (SELECT 2)table2 on 1=1-- -"}
About the 3rd command, I saw the same command but with an alteration of the table names, like this:
{"ID": "-1' union select * from (select 1)UT1 JOIN (SELECT 2)UT2 on 1=1-- -"}
What is the difference? Is the name given to the tables in the query irrelevant?
If you need further clarification or I haven't made myself clear, just tell it and I'll try to help you. Thank you in advance.
The stages of hacking are: recon, scanning, gaining access, maintaining access, and clearing tracks. Basically it's just obtaining information, then doing something with that information. It seems that this SQL injection learning module is used to teach how to obtain information about the current system.
The basics of SQL injection are inserting SQL code/commands/syntax. It's usually done in the WHERE clause (because webapps often have a search feature, which basically retrieves user input and inserts it into the WHERE clause).
For example, the simplest vulnerability would be like this (assuming MySQL and PHP):
SELECT * FROM mytable WHERE mycolumn='$_GET[myparam]'
Payload is what you put inside the parameter (ex: myparam) to do SQL injection.
With such query, you can inject payload 1' OR 1=1 to test for SQL injection vulnerability.
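To make the splicing concrete, here is a hypothetical sketch of the vulnerable query construction from the PHP example above (the function name `build_query` is made up for illustration): the user's input is pasted straight into the SQL text.

```python
# Hypothetical sketch of the vulnerable PHP pattern: user input is
# concatenated directly into the SQL string with no escaping.
def build_query(myparam):
    return "SELECT * FROM mytable WHERE mycolumn='" + myparam + "'"

print(build_query("1"))              # normal use
# The payload closes the opening quote early, so everything after it
# becomes live SQL instead of data:
print(build_query("1' OR 1=1-- -"))
```

The second call produces `SELECT * FROM mytable WHERE mycolumn='1' OR 1=1-- -'`, where the `OR 1=1` is now part of the query logic.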
1st payload
The 1st payload is used to check whether there is an injection point (a parameter that can be injected).
If you change the parameter and there is a change in the output, then there is an injection point.
Otherwise, there is no injection point.
2nd payload
The 2nd payload is used to check whether the target app has a SQL injection vulnerability (i.e., whether the app sanitizes the user's input).
If the app shows all output, then the app has a SQL injection vulnerability. Explanation: the query sent to the RDBMS would become something like this
Before injection:
SELECT col1, col2, ... colN FROM mytable WHERE col1='myparam'
After injection:
SELECT col1, col2, ... colN FROM mytable WHERE col1='1' or 1-- -'
Please note that in MySQL, -- (two dashes followed by a space) starts an inline comment. So the actual query would be: SELECT col1, col2, ... colN FROM mytable WHERE col1='1' or 1
3rd payload
The 3rd payload is used to check how many columns the query SELECTs. To understand this you have to understand subqueries, joins, and unions (do a quick search; they're very basic concepts). The name of the table alias is not important (UT1 or UT2); it's just an identifier chosen so that it doesn't collide with an existing table alias.
If the query succeeds (no error, the app displays output), then the app's query SELECTs 2 columns.
If the query fails, then it's not 2 columns; you can change the payload to check for 3 columns, 4 columns, etc.
Example for checking whether the SELECT statement has 3 columns:
-1' union select * from (select 1)UT1 JOIN (SELECT 2)UT2 on 1=1 JOIN (SELECT 3)UT3 on 1=1 -- -
Tips: when learning about SQL injection, it's far easier to just type (or copy-paste) the payload to your SQL console (use virtual machine or sandbox if the query is considered dangerous).
Edit 1:
basic explanation of subquery and union
Subquery: It's basically putting a query inside another query. Subqueries may be inserted in SELECT clause, FROM clause, and WHERE clause.
Example of subquery in FROM clause:
select * from (select 'hello','world','foo','bar')x;
Example of subquery in WHERE clause:
select * from tblsample t1 where t1.price>(select avg(t2.price) from tblsample t2);
Union: concatenating select output, example:
tbl1
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | Rupert | Somewhere | 022 |
| 2 | John | Doe | 022 |
+----+--------+-----------+------+
tbl2
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | AAAAAA | DDDDDDDDD | 022 |
| 2 | BBBB | CCC | 022 |
+----+--------+-----------+------+
select * from tbl1 union select * from tbl2
+----+--------+-----------+------+
| id | name | address | tele |
+----+--------+-----------+------+
| 1 | Rupert | Somewhere | 022 |
| 2 | John | Doe | 022 |
| 1 | AAAAAA | DDDDDDDDD | 022 |
| 2 | BBBB | CCC | 022 |
+----+--------+-----------+------+
Edit 2:
further explanation on 3rd payload
In mysql, you can make a 'literal table' by selecting a value. Here is an example:
MariaDB [(none)]> SELECT 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT 1,2;
+---+---+
| 1 | 2 |
+---+---+
| 1 | 2 |
+---+---+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol;
+----------+-----------+
| firstcol | secondcol |
+----------+-----------+
| 1 | 2 |
+----------+-----------+
1 row in set (0.00 sec)
The purpose of making this 'literal table' is to check how many columns the SELECT statement that we inject into has. For example:
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol UNION SELECT 3 thirdcol, 4 fourthcol;
+----------+-----------+
| firstcol | secondcol |
+----------+-----------+
| 1 | 2 |
| 3 | 4 |
+----------+-----------+
2 rows in set (0.07 sec)
MariaDB [(none)]> SELECT 1 firstcol, 2 secondcol UNION SELECT 3 thirdcol, 4 fourthcol, 5 fifthcol;
ERROR 1222 (21000): The used SELECT statements have a different number of columns
As shown above, when UNION is used on two SELECT statements with different numbers of columns, it throws an error. Therefore, you can find out how many columns a SELECT statement has by finding the column count that DOESN'T throw an error.
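The same column-count probe can be reproduced locally. Here it is in SQLite via Python's standard library (the error text differs from MariaDB's, but the behaviour is the same):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Matching column counts: the UNION succeeds.
rows = con.execute("SELECT 1, 2 UNION SELECT 3, 4").fetchall()
print(rows)

# Mismatched column counts: the engine raises an error, which is exactly
# the signal the injection payload is probing for.
try:
    con.execute("SELECT 1, 2 UNION SELECT 3, 4, 5")
except sqlite3.OperationalError as err:
    print("mismatched column counts:", err)
```

An attacker never sees the error text directly; they only see whether the page renders output or breaks, which carries the same bit of information.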
So, why don't we just use SELECT 1, 2 to generate a 'literal table' with 2 columns? Because the application's firewall blocks the use of commas. Therefore we must take the roundabout route and build a two-column 'literal table' with a JOIN query: SELECT * FROM (SELECT 1)UT1 JOIN (SELECT 2)UT2 ON 1=1
MariaDB [(none)]> SELECT * FROM (SELECT 1)UT1 JOIN (SELECT 2)UT2 ON 1=1;
+---+---+
| 1 | 2 |
+---+---+
| 1 | 2 |
+---+---+
1 row in set (0.01 sec)
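The comma-free two-column trick also checks out in SQLite, for readers without a MariaDB console at hand:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A two-column row built entirely with JOIN, no comma anywhere in the
# query, just like the firewall-evading payload.
rows = con.execute(
    "SELECT * FROM (SELECT 1) UT1 JOIN (SELECT 2) UT2 ON 1=1"
).fetchall()
print(rows)  # one row, two columns
```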
Additional note: MariaDB is a community-developed fork of MySQL (created after MySQL was acquired by Oracle). MariaDB maintains more or less the same syntax and commands as MySQL.

Beeline query output coming in JSON format instead of csv table

I am using a Beeline query like below; the underlying data sitting in HDFS comes from a mainframe server. All I want is to execute a query and dump it to a csv (or any tabular format):
beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=false --outputformat=csv2 -e "SELECT * FROM tbl LIMIT 2;" > tables1.csv
My issues are:
The format is not clean; there are extra rows at the top and bottom.
It appears as JSON and not a table.
Some numbers seem to be in hexadecimal format.
+-----------------------------------------------------------------------------------------------------------------------------+
| col1:{"col1_a":"00000" col1_b:"0" col1_c:{"col11_a":"00000" col11_tb:{"mo_acct_tp":"0" col11_c:"0"}} col1_d:"0"}|
+-----------------------------------------------------------------------------------------------------------------------------+
I want a regular csv with column names on top and no nesting.
Please help us understand your data better.
Does your table have data like below when you run the select query either in beeline or hive?:
> select * from test;
+------------------------------------------------------------------------------------------------------------------------+--+
| test.col |
+------------------------------------------------------------------------------------------------------------------------+--+
| {"col1_a":"00000","col1_b":"0","col1_c":{"col11_a":"00000","col11_tb":{"mo_acct_tp":"0","col11_c":"0"}},"col1_d":"0"} |
+------------------------------------------------------------------------------------------------------------------------+--+
If yes, you might have to parse the data out of the JSON objects, which would be as below:
select
get_json_object(tbl.col, '$.col1_a') col1_a
, get_json_object(tbl.col, '$.col1_b') col1_b
, get_json_object(tbl.col, '$.col1_c.col11_a') col1_c_col11_a
, get_json_object(tbl.col, '$.col1_c.col11_tb.col11_c') col1_c_col11_tb_col11_c
, get_json_object(tbl.col, '$.col1_c.col11_tb.mo_acct_tp') col1_c_col11_tb_mo_acct_tp
, get_json_object(tbl.col, '$.col1_d') col1_d
from test tbl
INFO : Completed executing command(queryId=hive_20180918182457_a2d6230d-28bc-4839-a1b5-0ac63c7779a5); Time taken: 1.007 seconds
INFO : OK
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
| col1_a | col1_b | col1_c_col11_a | col1_c_col11_tb_col11_c | col1_c_col11_tb_mo_acct_tp | col1_d |
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
| 00000 | 0 | 00000 | 0 | 0 | 0 |
+---------+---------+-----------------+--------------------------+-----------------------------+---------+--+
1 row selected (2.058 seconds)
Then you can use this query in your command line to export results into a file.
>beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=false --outputformat=csv2 -e "select
get_json_object(tbl.col, '$.col1_a') col1_a
, get_json_object(tbl.col, '$.col1_b') col1_b
, get_json_object(tbl.col, '$.col1_c.col11_a') col1_c_col11_a
, get_json_object(tbl.col, '$.col1_c.col11_tb.col11_c') col1_c_col11_tb_col11_c
, get_json_object(tbl.col, '$.col1_c.col11_tb.mo_acct_tp') col1_c_col11_tb_mo_acct_tp
, get_json_object(tbl.col, '$.col1_d') col1_d
from corpde_commops.test tbl;" > test.csv
If you need the column names in the file then set --showHeader=true.
Final output would be:
>cat test.csv
col1_a,col1_b,col1_c_col11_a,col1_c_col11_tb_col11_c,col1_c_col11_tb_mo_acct_tp,col1_d
00000,0,00000,0,0,0
I clearly don't see anything wrong in your beeline statement.
If your data is not like the example above, the solution might be different.
All the best.
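For readers without a Hive cluster handy, the flattening that the get_json_object calls perform can be sketched locally (the raw record and flattened column names are taken from the answer above):

```python
import csv
import io
import json

# The nested record as it appears in the question's table.
raw = ('{"col1_a":"00000","col1_b":"0","col1_c":{"col11_a":"00000",'
       '"col11_tb":{"mo_acct_tp":"0","col11_c":"0"}},"col1_d":"0"}')
rec = json.loads(raw)

# Pull each leaf out by path, mirroring the '$.col1_c.col11_a'-style
# JSON paths used with get_json_object.
row = {
    "col1_a": rec["col1_a"],
    "col1_b": rec["col1_b"],
    "col1_c_col11_a": rec["col1_c"]["col11_a"],
    "col1_c_col11_tb_col11_c": rec["col1_c"]["col11_tb"]["col11_c"],
    "col1_c_col11_tb_mo_acct_tp": rec["col1_c"]["col11_tb"]["mo_acct_tp"],
    "col1_d": rec["col1_d"],
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())
```

The printed CSV matches the test.csv shown above: a header row of flattened column names followed by one unnested data row.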
You have to do showHeader=true and you will get the desired result
beeline -u 'jdbc:hive2://server.com:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;transportMode=binary' --showHeader=true --outputformat=csv2 -e "SELECT * FROM tbl LIMIT 2;" > tables1.csv
You can also try the table format, --outputformat=table; this will not give csv output but gives you a clean tabular structure like below:
+-----+---------+-----------------+
| id | value | comment |
+-----+---------+-----------------+
| 1 | Value1 | Test comment 1 |
| 2 | Value2 | Test comment 2 |
| 3 | Value3 | Test comment 3 |
+-----+---------+-----------------+

Batch execution of hive scripts causing issue

I will start off by saying, if you need more information, please let me know, I am trying to keep this as simple as possible, but I can provide full details if required.
So I have a base hql script that looks like this
INSERT INTO TABLE ${DB_NAME}.${TABLE_NAME} PARTITION (SNAPSHOT_YEAR_MONTH,SNAPSHOT_DAY)
SELECT
`(snapshot_year_month|snapshot_day|row_num)?+.+`,
'${SNAPSHOT_YEAR}' as snapshot_year_month,
'${SNAPSHOT_DAY}' as snapshot_day
FROM
(SELECT
*,
ROW_NUMBER() OVER ( PARTITION BY ${PK} ORDER BY time_s DESC,wanted_c DESC ) AS row_num
FROM
( SELECT ${COL} FROM ${SOURCE_DB_NAME}.${SRC_TABLE_NAME} WHERE (concat(snapshot_year_month,snapshot_day) = '${LOWER_LIMIT}${LOWER_LIMIT_DAY}')
UNION ALL
SELECT ${COL} FROM ${SOURCE_DB_NAME}.${SRC_TABLE_NAME} WHERE (concat(snapshot_year_month,snapshot_day) >'${LOWER_LIMIT}${LOWER_LIMIT_DAY}' and concat(snapshot_year_month,snapshot_day) <='${UPPER_LIMIT}${UPPER_LIMIT_DAY}') ) B) A
WHERE A.row_num = 1 ;
This base script generates several hundred Hive queries that need to be run. One important part to note is the WHERE A.row_num = 1, which prevents any combination of ${PK} from having more than one row. Now, these scripts are successfully generated and stored in hql files. I then have a batch script to run all of these hive scripts:
for f in *.hql ; do beeline -u 'myserverinfo' -f $f ; done
Now, if only a couple of files are in the folder, everything works as intended. However, if I drop 500 hqls in the folder, the duplicates (which are correctly removed when only a couple of hqls are in the folder) are no longer removed.
So for example,
if I have a dataset thats like this
| my_pk1 | my_pk2 | time_s | wanted_c | other cols... |
1 1 1 1 ...
1 1 2 5 ...
1 1 6 6 ...
If I have just a couple hql files being run with my batch script, it will correctly condense that to just -
| my_pk1 | my_pk2 | time_s | wanted_c | other cols... |
1 1 6 6 ...
However, if I drop 500+ hql scripts to be run, the partitions are correctly inserted and all hql scripts run and have data, but they all have duplicate primary keys, and that same table, with the EXACT same hql script, would output
| my_pk1 | my_pk2 | time_s | wanted_c | other cols... |
1 1 1 1 ...
1 1 2 5 ...
1 1 6 6 ...
This makes absolutely no sense to me. Any ideas on what could be happening? I'm sorry if this makes no sense.
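For reference, the ROW_NUMBER() dedup logic in the base script does behave as the asker expects when run in isolation; here is a minimal local check using SQLite as a stand-in engine (table name `src` and sample values are taken from the question's example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (my_pk1, my_pk2, time_s, wanted_c);
    INSERT INTO src VALUES (1, 1, 1, 1), (1, 1, 2, 5), (1, 1, 6, 6);
""")

# Same shape as the base hql: number rows per primary key, newest first,
# then keep only row_num = 1.
rows = con.execute("""
    SELECT my_pk1, my_pk2, time_s, wanted_c FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY my_pk1, my_pk2
            ORDER BY time_s DESC, wanted_c DESC) AS row_num
        FROM src) A
    WHERE A.row_num = 1
""").fetchall()
print(rows)  # only the time_s = 6 row survives
```

Since the query itself is sound, the 500-file failure points at something environmental (e.g. how the batch runs interact), not at the dedup SQL.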

What does this "t" do?

I'm having trouble understanding the following example from the PostgreSQL documentation:
-- set returning function WITH ORDINALITY
SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
ls | n
-----------------+----
pg_serial | 1
pg_twophase | 2
postmaster.opts | 3
pg_notify | 4
...
The things inside the parentheses of the t(...) become the column names, but what is the t itself? I'm asking here because the docs don't explain it, and a single-letter name is ungoogleable. In fact, the docs don't even explain what is supposed to come after AS; the only thing we get is this one example.
It seems I can replace t by any other identifier, and it still works.
The syntax you're looking for is:
function_call [WITH ORDINALITY] [[AS] table_alias [(column_alias [, ... ])]]
https://www.postgresql.org/docs/10/static/queries-table-expressions.html#QUERIES-TABLEFUNCTIONS
t is an arbitrary table alias; you can give it any valid name you want.
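The same idea applies in any SQL engine: the alias names the derived table so its columns can be referenced elsewhere in the query. A tiny check in SQLite (which lacks pg_ls_dir and WITH ORDINALITY, so the row values here are made up to mimic the Postgres output):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# 't' is just an alias for the derived table; t.ls and t.n refer to
# its columns, exactly as with AS t(ls, n) in the Postgres example.
rows = con.execute(
    "SELECT t.ls, t.n FROM (SELECT 'pg_serial' AS ls, 1 AS n) t"
).fetchall()
print(rows)
```

Renaming `t` to any other valid identifier changes nothing, which is why the asker found that any replacement "still works".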
It's an alias for the set, letting you reference it in the column list, e.g.:
SELECT t.*, pg_database.datname
FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n)
JOIN pg_database ON true WHERE datname = 'postgres'
ls | n | datname
----------------------+----+----------
pg_dynshmem | 1 | postgres
postmaster.pid | 2 | postgres
PG_VERSION | 3 | postgres
base | 4 | postgres

mysql expanded output

I'm coming from postgresql to mysql, and I'm curious whether mysql has an expanded output flag similar to postgresql's.
i.e., in psql I could use \x to get expanded output:
id | name
---+-----
1 | foo
into
-[ Record ]------
id | 1
name | foo
How can I do this in mysql?
Try SELECT foo FROM bla\G instead of SELECT foo FROM bla;
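What \G does is simply print each column of a row on its own line instead of as a table. A rough sketch of that formatting (the helper name `expanded` is made up; the header line approximates the mysql client's output):

```python
# Sketch of "expanded output": one "name: value" line per column,
# with names right-aligned, preceded by a per-row separator header.
def expanded(colnames, row, idx=1):
    width = max(len(c) for c in colnames)
    lines = [f"*************************** {idx}. row ***************************"]
    for name, value in zip(colnames, row):
        lines.append(f"{name.rjust(width)}: {value}")
    return "\n".join(lines)

print(expanded(["id", "name"], [1, "foo"]))
```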