Changing schemas in Hive from the command line

How to change hive schema from command line?
I need to run hql scripts for creating tables but those tables need to be created inside a particular schema.
I am using hive -f createTable.hql to create table

You can pass one or more parameters to the script:
hive -hiveconf myschema=newschema -f createTable.hql
Then in the script:
CREATE SCHEMA IF NOT EXISTS ${hiveconf:myschema}
LOCATION "/foo/dir";
USE ${hiveconf:myschema};
Any tables you then create will be in that working schema. You can also make the variables part of the name:
hive -hiveconf name=Bob -f createTable.hql
In the script:
CREATE SCHEMA IF NOT EXISTS ${hiveconf:name}_SCHEMA
LOCATION "/foo/dir";
USE ${hiveconf:name}_SCHEMA;
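Putting the pieces together, a minimal createTable.hql could look like the sketch below (the schema location and the table definition are illustrative, not from the question):

```sql
-- createTable.hql (illustrative sketch)
CREATE SCHEMA IF NOT EXISTS ${hiveconf:myschema}
LOCATION "/foo/dir";
USE ${hiveconf:myschema};

-- any table created after USE lands in that schema
CREATE TABLE IF NOT EXISTS events (
  id INT,
  payload STRING
);
```

Invoked as hive -hiveconf myschema=newschema -f createTable.hql, this creates events inside newschema.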

You can do that inside your HQL/DDL script.
USE schemaName;
create table.....
....
So you can tell Hive to use a specific schema with the USE statement.
You can also run it interactively in the Hive CLI.

If you want to run queries in a particular schema via command line you can use the below command:
hive -e "use schema_name; show tables;"
Note that -f and -e cannot both be used in the same command.
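As an aside, the Hive CLI also accepts a --database option, which selects the working database without editing the script at all (run hive -H to confirm your version supports it):

```shell
# select the working database on the command line
# (assumes your Hive CLI version supports --database)
hive --database schema_name -f createTable.hql
```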

Related

How do I copy the table structure of a postgres table into a different postgres database without the data

I need to play around with a data syncing program I wrote, but I want to copy the structure of the production database into a new table on my localhost Postgres database, without copying the data to my localhost db.
I was thinking along the lines of
CREATE TABLE new_table AS
TABLE existing_table
WITH NO DATA;
But I am unsure how to modify it to work with 2 different databases.
Any help would be appreciated
This boils down to the question "how to create the DDL script for a table" which can easily be done using pg_dump on the command line.
pg_dump -d some_db -h production_server -t existing_table --schema-only -f create.sql
The file create.sql then contains the CREATE TABLE script that you can run on your local Postgres installation.
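For completeness, the generated script can then be replayed against the local installation with psql (the local database name here is illustrative):

```shell
# dump only the table definition from production...
pg_dump -d some_db -h production_server -t existing_table --schema-only -f create.sql
# ...then replay it against the local database
psql -d local_db -f create.sql
```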

How to specify an extension for the files of an external table in Hive?

I have an external table in Hive. It is generating file names without any extension. How can I force Hive to add an extension (e.g. .tsv) to the files? Is there any option in the create table statement?
You can run the query directly from the command line and redirect the output to a file with the extension you want; give it a shot:
hive -e 'select columns from table' > /<Location>/<Filename>.tsv
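If you also want a header row in the exported file, Hive's hive.cli.print.header setting can be enabled for the same one-liner (Hive CLI output is tab-separated by default, which matches the .tsv extension):

```shell
# print column names as the first row, then tab-separated data
hive --hiveconf hive.cli.print.header=true \
     -e 'select columns from table' > /<Location>/<Filename>.tsv
```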

HIVE query logs location

I am finding it very difficult to locate the Hive query logs; basically I want to see what queries were executed.
For example, I want to find queries like:
select foo, count(*) from table where field=value group by foo;
From Hive documentation:
hive.exec.scratchdir Default Value:
/tmp/${user.name} in Hive 0.2.0 through 0.8.0
/tmp/hive-${user.name} in Hive 0.8.1 through 0.14.0
/tmp/hive in Hive 0.14.0 and later
This directory is used by Hive to store the plans for different map/reduce stages for the query as well as to store the intermediate outputs of these stages.
hive.start.cleanup.scratchdir Default Value: false
Execute the query with the command below:
hive --hiveconf hive.root.logger=DRFA --hiveconf hive.log.dir=./logs --hiveconf hive.log.level=DEBUG -e "select foo, count(*) from table where field=value group by foo"
It will create a log file in the logs folder. Make sure that the logs folder exists in the current directory.

PostgreSQL. Generate create database statement for an existing database

How do I generate the "create table" SQL statement for an existing table in PostgreSQL?
Here is explained how to generate such script for one table, but how to do the same for whole database?
I need to extract the script that creates the database and all tables in it.
pg_dump -s databasename does this.
pg_dump -s databasename | awk 'RS="";/TABLE[^;]*;/' extracts only the table creation/alteration statements. Unfortunately I don't understand awk syntax.
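The awk part works in "paragraph mode": assigning RS="" makes awk treat blank-line-separated blocks as records, and the pattern /TABLE[^;]*;/ prints only the records containing a TABLE ... ; statement. A self-contained demonstration on a small, hypothetical pg_dump-style snippet:

```shell
# Hypothetical pg_dump-style output, just to demonstrate the filter
cat > /tmp/dump_demo.sql <<'EOF'
SET client_encoding = 'UTF8';

CREATE TABLE public.users (
    id integer NOT NULL
);

ALTER TABLE public.users OWNER TO postgres;
EOF

# RS=""          -> paragraph mode: records are blank-line-separated blocks
# /TABLE[^;]*;/  -> print only records containing "TABLE ... ;"
awk 'RS="";/TABLE[^;]*;/' /tmp/dump_demo.sql
```

Only the CREATE TABLE and ALTER TABLE blocks survive; the SET line is filtered out.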

run shell command inside Hive that uses Hive's variables value as input

I have a Python script that receives a Hive table name and 2 dates and adds all partitions between those dates (it runs a bunch of hive -e 'alter table add partition (date=...)' commands).
When running a Hive script that has a hiveconf:date variable, I would like to pass it to the Python script as input.
something like:
!python addpartitions.py mytable date=${hiveconf:date}
but of course the variable substitution does not take place...
Any way to achieve this?