Count Number of Columns In Hive

I am looking for a way to count the number of columns in a table in Hive.
I know the following code works in Microsoft SQL Server. Is there a Hive equivalent?
SELECT COUNT(*)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_CATALOG = 'database_name'
AND TABLE_SCHEMA = 'schema_name'
AND TABLE_NAME = 'table_name'

Try this
SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name]
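To turn that listing into a count, you can pipe it through wc -l from the shell. A minimal sketch, assuming db_name and table_name are placeholders and that SHOW COLUMNS prints one column name per line with no header:
hive -e 'SHOW COLUMNS IN db_name.table_name;' | wc -l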

Try this, it will show you the columns of your table:
DESCRIBE schemaName.tableName;

I do not know of a way to count the columns directly; however, I solved the problem for my needs indirectly via:
echo 'table1name:, '`hive -e 'describe schemaname.table1name;' | grep -v 'col_name' | wc -l` > num_columns.csv
echo 'table2name:, '`hive -e 'describe schemaname.table2name;' | grep -v 'col_name' | wc -l` >> num_columns.csv
...
(I needed the grep -v bit because I have headers on by default; without it you get one too many lines counted in the wc -l step.)
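A sketch that generalizes this with a loop (the schema and table names are placeholders; the grep -v filter is the same header workaround as above):
for t in table1name table2name; do
  # one CSV line per table: "<table>:, <column_count>"
  echo "${t}:, $(hive -e "describe schemaname.${t};" | grep -v 'col_name' | wc -l)"
done > num_columns.csv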

You have to check whether your Hive includes HIVE-287, because for Hive versions that don't include it you'll need to use COUNT(1) in place of COUNT(*).

Just do a DESCRIBE; it will show you all the columns, and at the bottom you can see the number of rows it fetched, which is the number of columns.

Copy partitioned bigquery table that only overwrites the partitions from the source table

So something like
bq cp -f src_table dst_table
but I want partitions in dst_table that are not present in src_table to be left untouched. Is something like this possible?
You can do something like this:
Use this query to build script for bq command
#legacySQL
select concat ('bq cp -f ', s.project_id, ':', s.dataset_id, '.',
s.table_id, '\$', s.partition_id, ' ',
t.project_id, ':', t.dataset_id, '.', t.table_id, '\$', t.partition_id, ';')
from [source_table$__PARTITIONS_SUMMARY__] s
inner join (select * from [target_table$__PARTITIONS_SUMMARY__]) t
on t.partition_id = s.partition_id
Then just execute the result in a terminal.
Make sure you escape the $; this code will work on Mac/Unix, but I am not sure about Windows.
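A minimal sketch of that workflow, assuming the query above is saved in a hypothetical file build_copy_script.sql and the bq CLI is authenticated (csv output carries a one-line header, hence the tail):
bq query --use_legacy_sql=true --format=csv --max_rows=100000 "$(cat build_copy_script.sql)" | tail -n +2 > copy_partitions.sh
bash copy_partitions.sh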
If I understood your question correctly, you intend to copy from one partitioned table (table 1) to another one (table 2) just the partitions filled in table 1.
You have to configure the required permissions to perform the operation, as described here. In your case, use bq cp -a to
append the data from the source partition to an existing table or
partition in the destination dataset
instead of bq cp -f, which forces overwriting. For example:
bq --location=[LOCATION] cp -a [PROJECT_ID]:[DATASET].[SOURCE_TABLE]$[SOURCE_PARTITION] [PROJECT_ID]:[DATASET].[DESTINATION_TABLE]$[DESTINATION_PARTITION]
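For example, with the placeholders filled in (project, dataset, and partition dates below are hypothetical; quote the decorator so the shell does not expand the $):
bq cp -a 'myproject:mydataset.src_table$20180101' 'myproject:mydataset.dst_table$20180101'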

Print variable with each line of while-read command

I'm trying to set up a monitoring script that would take all the databases we have, show their tables, and do some arithmetic on them.
I have this command:
impala-shell -i impalad -q " show databases;" -B | while read a; do impala-shell -q "show tables in ${a}" -B -i impalad; done
That produces following output:
Query: show tables in database1
table1
table2
How should I format the output to display the database name ($a) with each table? I tried echoing it or using ||, but this only prints the database name after displaying all the tables. Or is there a way to pass the variable to awk?
Desired output would look like this:
database1.table1
database1.table2
It looks like the output of the show tables ... command will have a 1-line header, followed by the list of table names.
You could skip the first line by piping to tail -n +2,
and then use another while loop to echo the database name and table name pairs in the desired format:
impala-shell -i impalad -q " show databases;" -B | while read a; do
impala-shell -q "show tables in ${a}" -B -i impalad | tail -n +2 | while read table; do
echo "$a.$table"
done
done
You could also do
impala-shell -q ... | awk -v db="$a" 'NR > 1 {print db "." $0}'
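Putting it all together as one script (the impalad host name is taken from the question):
#!/bin/bash
# For every database, list its tables and print each one as db.table
impala-shell -i impalad -q 'show databases;' -B | while read -r db; do
  impala-shell -i impalad -q "show tables in ${db}" -B |
    awk -v db="$db" 'NR > 1 {print db "." $0}'
done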

SQL query over multiple tables in one database

I have the following problem:
I have a bunch of Tables in a Vertica Database say:
+------------+
| Tablenames |
+------------+
| a_1 |
| a_2 |
| a_34 |
| b_1 |
| b_4 |
+------------+
The tables are not exactly the same but have mostly similar entries. And now I want to make one query over all tables that start with a_ (a_1 a_2 a_34).
Is there a way to search through all the tables for the string a_ in their name, output some sort of list, and then either use a for loop or a join operation with the generated list?
Once I get the new table (let's call it temp_table) that has all the table names that start with a_, I would like to run one query over all of them, something like this (Matlab syntax):
for ii=1:length(temp_table)
Data{ii}=SELECT * FROM temp_table(ii) WHERE paste_condition_here
end
So Data should be a new table that appends the new rows with each iteration.
@Nirjihar - there is no INFORMATION_SCHEMA in Vertica (you need v_catalog); you are confusing it with MySQL.
select TABLE_NAME from v_catalog.TABLES where TABLE_NAME like 'a_%';
This will return all tables with a criteria of 'a_%'
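If you then want to run one query over all of them rather than just list them, here is a hedged sketch, assuming the a_ tables share a compatible column layout and reusing vsql as in the UDP answer below ($password comes from the .profile setup described there; note that _ is itself a single-character wildcard in LIKE):
#!/bin/bash
# Build one "select * from <t>" per matching table, glue them together
# with UNION ALL, then execute the generated statement.
Q=$(/opt/vertica/bin/vsql -U dbadmin -w "$password" -At -c \
  "select 'select * from ' || table_schema || '.' || table_name from v_catalog.tables where table_name like 'a_%';" \
  | paste -sd'@' - | sed 's/@/ UNION ALL /g')
/opt/vertica/bin/vsql -U dbadmin -w "$password" -c "$Q"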
Just as a complement!
In Vertica you won't have loops! For this you have to use a UDP (user defined procedure), which can be written in the language of your choice (shell, Java, R, C++).
I will go ahead and post one model here for you:
1 - Shell proc - to be created in the procedures folder
#!/bin/bash
. /home/dbadmin/.profile
/opt/vertica/bin/vsql -U $username -w $password -t -o /tmp/query.sql -c"
SELECT
' select * from '
||TABLE_SCHEMA
||'.'
||TABLE_NAME
||';'
FROM
v_catalog.TABLES where TABLE_NAME like '%$1%'
"
/opt/vertica/bin/vsql -U $username -w $password -F $'|' -At -o /tmp/query_output.csv -f /tmp/query.sql
2 - Change the sh file privileges
chmod 4750 query_table.sh
3 - Make sure you have the /home/dbadmin/.profile file populated accordingly:
#!/bin/bash
username=dbadmin
password=secretpasswd
export username
export password
Note: this is to avoid having the password in plain text everywhere and to keep it in a single place.
4 - Register the UDP with Vertica Catalog
. /home/dbadmin/.profile
admintools -t install_procedure -f /vertica/catalog//procedures/query_table.sh -d -p $password
5 - Create the UDP inside the database
. /home/dbadmin/.profile
/opt/vertica/bin/vsql -U $username -w $password -c "CREATE PROCEDURE dba.query_table(table_name varchar) AS 'query_table.sh' LANGUAGE 'external' USER 'dbadmin';"
6 - Execute the proc
select dba.query_table('you possible table name here');
7 - Check results
a - you will get a file with the query
b - one file with the exported data (csv '|' delimited)
I have a similar post here:
http://www.aodba.com/create-vertica-schema-fly/
To get all the tables that start with a_:
select TABLE_NAME from INFORMATION_SCHEMA.TABLES where TABLE_NAME like 'a_%'
Then you can alias this and join the list, or query each table, as you like.

executing HIVE query in background

How do I execute a Hive query in the background when the query looks like the one below?
Select count(1) from table1 where column1='value1';
I am trying to write it using a script like below
#!/usr/bin/ksh
exec 1> /home/koushik/Logs/`basename $0 | cut -d"." -f1 | sed 's/\.sh//g'`_$(date +"%Y%m%d_%H%M%S").log 2>&1
ST_TIME=`date +%s`
cd $HIVE_HOME/bin
./hive -e 'SELECT COUNT(1) FROM TABLE1 WHERE COLUMN1 = ''value1'';'
END_TIME=`date +%s`
TT_SECS=$(( END_TIME - ST_TIME))
TT_HRS=$(( TT_SECS / 3600 ))
TT_REM_MS=$(( TT_SECS % 3600 ))
TT_MINS=$(( TT_REM_MS / 60 ))
TT_REM_SECS=$(( TT_REM_MS % 60 ))
printf "\n"
printf "Total time taken to execute the script="$TT_HRS:$TT_MINS:$TT_REM_SECS HH:MM:SS
printf "\n"
but I am getting an error like
FAILED: SemanticException [Error 10004]: Line 1:77 Invalid table alias or column reference 'value1'
Let me know exactly where I am making a mistake.
Create a document named example
vi example
Enter the query in the document and save it.
create table sample as
Select count(1) from table1 where column1='value1';
Now run the document using the following command:
hive -f example 1>example.error 2>example.output &
You will get the result as
[1]
Now disown the process:
disown
Now the process will run in the background. If you want to know the status of the output, you may use
tail -f example.output
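If you prefer to launch it detached in a single step, a nohup variant (same file names as above) avoids the separate disown:
nohup hive -f example 1>example.error 2>example.output &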
True @Koushik! Glad that you found the issue.
In the query, the shell was unable to form the Hive query correctly due to the ambiguous single quotes.
Though SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1' is valid in hive,
$hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = 'Value1';' is not valid.
The best solution would be to use double quotes for the Value1 as
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'
or use a quick and dirty solution by including the single quotes within double quotes.
hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = '"'Value1'"';'
This would make sure that the hive query is properly formed and then executed accordingly. I'd not suggest this approach unless you have a desperate need for a single quote ;)
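You can also sidestep the shell quoting entirely with Hive's variable substitution; a sketch using --hivevar (table and column names as in the question):
hive --hivevar val='Value1' -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "${hivevar:val}";'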
I was able to resolve it by replacing the single quotes with double quotes. Now the modified statement looks like:
./hive -e 'SELECT COUNT(1) FROM Table1 WHERE Column1 = "Value1";'

How to display only the db2 query result via shell script and not the query?

There is probably a very simple solution here, but I am probably not using the right search terms. I have a SQL query running in a shell script. I get the results I am looking for; however, I am also getting the SQL query as part of the result. How can I suppress this and just show the result?
My script:
#!/usr/bin/sh
db2 connect to MYDB >/dev/null 2>&1;
db2 -x -v "select A, B, C from MYTABLE";
db2 connect reset >/dev/null 2>&1;
And my output looks like this:
select A, B, C from MYTABLE
AAA BBB CCC
AAA BBB CCC
I would like to get rid of the first row and just show the result. What am I missing?
Thanks in advance for your help!
The -v option for the DB2 command line processor causes the current statement being executed to be printed in the output.
Remove the -v from your command and you'll get only the results of the query.
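Applied to the script above, a minimal fixed version looks like this:
#!/usr/bin/sh
db2 connect to MYDB >/dev/null 2>&1
# -x suppresses column headers; without -v the statement itself is not echoed
db2 -x "select A, B, C from MYTABLE"
db2 connect reset >/dev/null 2>&1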
If you just want to skip the first row of your output, you could:
yourscript.sh | tail -n +2
Test with seq:
kent$ seq 5|tail -n +2
2
3
4
5
Try this:
db2 -o query
For more info: http://www.ibm.com/developerworks/data/library/techarticle/adamache/0109adamache.html