How to overwrite a MySQL table when using sqoop export from Hive

I need to transfer data from Hive to MySQL.
Here is my sqoop command:
sqoop export --connect jdbc:mysql://mysqlserver --username username --password password --table test --columns "member_id,answer_id,answerer_id" -m 1 --export-dir /user/hive/warehouse/utils.db/test --input-fields-terminated-by '\001' --lines-terminated-by '\n' --update-mode allowinsert
But every time I run this command, the data is appended to the table instead of overwriting it.
So, is there any way to truncate the MySQL table automatically when I run this sqoop command?

I think what you are trying to do is a complete refresh of the table each time you upload the data. Usually that is something that needs to be handled on the database end: you will need to delete all records before performing the insert. The other way is to use the --staging-table parameter along with --clear-staging-table, which makes sure the staging table is emptied before each export. In this scenario, your --table will be a dummy table that is appended to each time; you can have a trigger or scheduled job clear the data of that table at a set time every day, or whenever you please. The sqoop command is below; I have used "test" as the staging table and "dummy" as the main table.
sqoop export --connect jdbc:mysql://mysqlserver --username username --password password --table dummy --columns "member_id,answer_id,answerer_id" -m 1 --export-dir /user/hive/warehouse/utils.db/test --input-fields-terminated-by '\001' --lines-terminated-by '\n' --update-mode allowinsert --staging-table test --clear-staging-table

Use the command below to update existing records and insert new records, if any. Replace KEY_COLUMN with the column (or comma-separated columns) that identifies a row; --update-key needs it to match incoming rows against existing ones.
sqoop export --connect jdbc:mysql://mysqlserver --username username --password password --table test --columns "member_id,answer_id,answerer_id" -m 1 --export-dir /user/hive/warehouse/utils.db/test --input-fields-terminated-by '\001' --lines-terminated-by '\n' --update-key KEY_COLUMN --update-mode allowinsert
Note that the command above will not apply deletes: rows removed from the source stay in the MySQL table.
If you really want to truncate the table and load all of the data again, use the commands below. This is only necessary when the source (HDFS) has deleted records.
sqoop eval --connect jdbc:mysql://mysqlserver --username username --password password --query 'TRUNCATE TABLE TABLE_NAME'
sqoop export --connect jdbc:mysql://mysqlserver --username username --password password --export-dir 'HDFS_PATH' --table TABLE_NAME
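The two commands above can be chained in one script so that the export only starts once the TRUNCATE has succeeded. Below is a minimal POSIX sh sketch, assuming Sqoop is on the PATH. run and refresh_mysql_table are hypothetical helper names. The JDBC URL, table name and HDFS path are the placeholders from the answer, and setting DRY_RUN=1 only prints the commands:

```shell
#!/bin/sh
# Sketch: empty the MySQL table, then re-export the full HDFS directory into it.
# run and refresh_mysql_table are hypothetical helpers; DRY_RUN=1 prints the
# sqoop commands instead of executing them.
set -eu

run() {
  # Print the command in dry-run mode, otherwise execute it as-is.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

refresh_mysql_table() {
  # $1 = JDBC connect string, $2 = MySQL table, $3 = HDFS export directory
  # -P asks for the password on the console instead of the command line.
  # '\001' (Hive's default field delimiter) is quoted so the shell keeps the backslash.
  # The && means the export only starts if the TRUNCATE succeeded.
  run sqoop eval --connect "$1" --username username -P \
    --query "TRUNCATE TABLE $2" &&
    run sqoop export --connect "$1" --username username -P \
      --table "$2" --export-dir "$3" \
      --input-fields-terminated-by '\001'
}
```

Usage: refresh_mysql_table jdbc:mysql://mysqlserver TABLE_NAME HDFS_PATH. Keep in mind that the table is empty, or only partly loaded, between the two steps.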

Related

PostgreSQL - dynamically script table definition

Is there a way in which I can get the table definition in a script that I can execute?
For example, I have a table "cities":
CREATE TABLE public.cities
(
name character(80) COLLATE pg_catalog."default" NOT NULL,
location point,
CONSTRAINT pk_city_name PRIMARY KEY (name)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.cities
OWNER to postgres;
Is there a way I can generate that with a script rather than using the GUI?
If you want to write your commands into a script file and then run it from the command line, you should use psql -f <filename>.
See https://www.postgresql.org/docs/9.2/static/app-psql.html
Thanks for the links and places to look.
For those reading, here is what I did:
Open cmd.
Navigate to C:\Program Files\PostgreSQL\10\bin (or create a shortcut).
pg_dump -d mydb -t cities -U postgres -h localhost > C:/test/weather.sql
Enter the password for postgres.
The file is written to the output directory.
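The pg_dump command above also dumps the rows of cities. To get only the table definition the question asks for, pg_dump's --schema-only (-s) option skips the data; on Windows cmd the flag can simply be added to the single pg_dump line. Below is a minimal POSIX sh sketch; dump_table_ddl and run are hypothetical helper names, and DRY_RUN=1 only prints the command:

```shell
#!/bin/sh
# Sketch: dump only the definition of one table (CREATE TABLE, constraints,
# ownership) without its rows. dump_table_ddl and run are hypothetical helpers;
# DRY_RUN=1 prints the pg_dump call instead of running it.
set -eu

run() {
  # Print the command in dry-run mode, otherwise execute it as-is.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

dump_table_ddl() {
  # $1 = database, $2 = table, $3 = output file
  # --schema-only leaves out the table data; --table limits the dump to one table.
  run pg_dump --schema-only --table="$2" --dbname="$1" \
    --username=postgres --host=localhost --file="$3"
}
```

Usage: dump_table_ddl mydb cities C:/test/cities.sql. The resulting file can be replayed with psql -f.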

Cannot make granting access to existing tables work

I'm failing to add a new DB role that has the SELECT privilege on the tables of a particular database.
My problem is that the role is not able to SELECT from a table in existing DB.
Here's my failing test case (written so it can safely be copy-pasted into a /tmp/test.sh and executed):
# --- cleanup objects, if any
psql -U postgres -c "REVOKE SELECT ON ALL SEQUENCES IN SCHEMA public FROM db_reader"
psql -U postgres -c "REVOKE SELECT ON ALL TABLES IN SCHEMA public FROM db_reader"
psql -U postgres -c "REVOKE USAGE ON SCHEMA public FROM db_reader"
psql -U postgres -c "DROP ROLE IF EXISTS db_reader"
psql -U postgres -c "DROP DATABASE IF EXISTS some_existing_db"
# --- test
psql -U postgres -c "CREATE DATABASE some_existing_db"
psql -U postgres some_existing_db -c "CREATE TABLE cats (name varchar(10))"
psql -U postgres some_existing_db -c "INSERT INTO cats (name) VALUES ('a'), ('b')"
psql -U postgres -c "CREATE ROLE db_reader WITH login"
psql -U postgres -c 'GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO db_reader'
psql -U postgres -c 'GRANT SELECT ON ALL TABLES IN SCHEMA public TO db_reader'
psql -U postgres -c 'GRANT USAGE ON SCHEMA public TO db_reader'
psql -U db_reader some_existing_db -c "SELECT COUNT(1) FROM cats"
Looks like I'm missing something extremely, embarrassingly obvious here, as the above fails with the following error:
ERROR: permission denied for relation cats
Why?
You are missing that databases are logically separated.
Your GRANT statements are executed in database postgres (if you do not specify a database name, psql will try to connect to a database with the same name as the database user).
Consequently, the effect of these grants is limited to the database to which you are connected.
You have to add some_existing_db to the psql invocations where you grant privileges to db_reader.
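Below is a minimal sketch of the corrected grant step, assuming the role and database from the test case. Every GRANT now connects to some_existing_db explicitly with -d. grant_read_access and run are hypothetical helper names, and DRY_RUN=1 prints the psql calls instead of running them:

```shell
#!/bin/sh
# Sketch: grant read access inside the database that actually holds the tables.
# The role is cluster-wide, but GRANT ... IN SCHEMA public only affects the
# database psql is connected to, so that database is passed with -d.
# grant_read_access and run are hypothetical helpers; DRY_RUN=1 prints the calls.
set -eu

run() {
  # Print the command in dry-run mode, otherwise execute it as-is.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

grant_read_access() {
  # $1 = database name, $2 = role name
  run psql -U postgres -d "$1" -c "GRANT USAGE ON SCHEMA public TO $2"
  run psql -U postgres -d "$1" -c "GRANT SELECT ON ALL TABLES IN SCHEMA public TO $2"
  run psql -U postgres -d "$1" -c "GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO $2"
}
```

Usage: grant_read_access some_existing_db db_reader, after which psql -U db_reader some_existing_db -c "SELECT COUNT(1) FROM cats" should no longer be denied.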

How to run alter table script in postgres using bash

I want to run ALTER TABLE commands from a bash script. I managed to create the tables, load the base model, create the config tables, etc. The script logs in to the Postgres database before executing the ALTER TABLE commands, but it gets stuck at the prompt (abcdb=>) and never proceeds to them. Is there any way to make sure the ALTER TABLE commands execute?
The relevant part of the script:
psql -h 191.169.51.10 -d abcdb -U myname
alter table attr_config rename regexp to regexp_val;
alter table class_action_config rename type to type_name;
alter table funcitem_config rename type to type_name;
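In order to run a script like this, you need to redirect the SQL/DDL (the ALTER TABLE statements) into the psql command. Otherwise psql starts an interactive session and waits at the abcdb=> prompt, and bash only tries to run the ALTER TABLE lines, as shell commands, after psql exits.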
psql -h 191.169.51.10 -d abcdb -U myname << EOF
alter table attr_config rename regexp to regexp_val;
alter table class_action_config rename type to type_name;
alter table funcitem_config rename type to type_name;
EOF
Alternatively, you can put your SQL/DDL into a separate file and have psql read from it:
psql -h 191.169.51.10 -d abcdb -U myname < alter_statements.sql
Or
psql -h 191.169.51.10 -d abcdb -U myname -f alter_statements.sql
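When the renames run from a script, it also helps to make psql stop at the first error instead of carrying on. Below is a minimal POSIX sh sketch, assuming the host, database and user from the question:

* The statements go into a file through a quoted heredoc.
* -v ON_ERROR_STOP=1 makes psql exit non-zero on the first failing statement.
* --single-transaction wraps the file in one transaction, so either all renames apply or none do.
* -X skips ~/.psqlrc.

write_alter_script, run_alter_script and run are hypothetical helper names, and DRY_RUN=1 prints the psql call:

```shell
#!/bin/sh
# Sketch: run the renames non-interactively and stop at the first error.
# write_alter_script, run_alter_script and run are hypothetical helpers;
# DRY_RUN=1 prints the psql call instead of running it.
set -eu

run() {
  # Print the command in dry-run mode, otherwise execute it as-is.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

write_alter_script() {
  # $1 = output file; the quoted 'SQL' delimiter stops any shell expansion.
  cat > "$1" <<'SQL'
ALTER TABLE attr_config RENAME regexp TO regexp_val;
ALTER TABLE class_action_config RENAME type TO type_name;
ALTER TABLE funcitem_config RENAME type TO type_name;
SQL
}

run_alter_script() {
  # $1 = SQL file to execute in a single transaction
  run psql -X -v ON_ERROR_STOP=1 --single-transaction \
    -h 191.169.51.10 -d abcdb -U myname -f "$1"
}
```

Usage: write_alter_script alter_statements.sql && run_alter_script alter_statements.sql.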

PostgreSQL create a schema under a different database using a script

What I want to do is run a script in pgAdmin3 while connected to a database I already have. The script creates a role, a tablespace and a database, then a schema and several tables under that schema.
The problem is that when I run the script, it creates the new role, tablespace and database correctly. It also creates the schema and the tables, but the schema is created in the database from which I ran the script instead of in the newly created database. The script is more or less like this:
CREATE ROLE "new_role" ... ;
CREATE TABLESPACE "new_space"
OWNER "new_role"
LOCATION '/home/...';
CREATE DATABASE "new_db"
WITH OWNER = "new_role"
TABLESPACE = "new_space";
CREATE SCHEMA "schema" AUTHORIZATION "new_role" ;
CREATE TABLE IF NOT EXISTS "schema"."new_table"(
...
) TABLESPACE "new_space";...
...
I have already seen a solution using \connect foo, but that is not what I want: I want the script itself to connect to the new database, without running things separately and typing \connect foo in the terminal.
Can anyone tell me if there is any way to do this and help me come up with a solution?
Use psql and split the work into two scripts. A psql session is connected to exactly one database, and CREATE DATABASE does not switch that connection to the new database, so the schema and tables have to be created over a second connection. Save each script in a .sql file, then run psql once per script, connecting each time to the database that script should run against, all on the same command line (with && between the commands). The two psql commands can be combined into one bash script, so there is only one command to run.
Something like this, if the script were named foo.sql:
psql -X -h <host> -U <user> -p <port> -f foo.sql <db_name>
The first script could have the create role, create tablespace and create database commands, connecting to the postgres db or a template DB, and the second script could have the rest of the commands.
You could also use createdb from the bash script instead of CREATE DATABASE.
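Below is a minimal POSIX sh sketch of that two-script approach. create_db.sql (role, tablespace, database) and create_schema.sql (schema and tables) are hypothetical file names, new_db is the database from the question, and DRY_RUN=1 prints the psql calls. --single-transaction is deliberately not used for the first file, because CREATE DATABASE and CREATE TABLESPACE cannot run inside a transaction block:

```shell
#!/bin/sh
# Sketch: run part 1 against the existing "postgres" database and part 2 over
# a fresh connection to new_db. create_db.sql and create_schema.sql are
# hypothetical file names; DRY_RUN=1 prints the psql calls instead of running them.
set -eu

run() {
  # Print the command in dry-run mode, otherwise execute it as-is.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_new_db() {
  # $1 = host, $2 = user, $3 = port
  # new_db only exists once part 1 has succeeded, hence the &&.
  run psql -X -v ON_ERROR_STOP=1 -h "$1" -U "$2" -p "$3" -f create_db.sql postgres &&
    run psql -X -v ON_ERROR_STOP=1 -h "$1" -U "$2" -p "$3" -f create_schema.sql new_db
}
```

Usage: setup_new_db localhost postgres 5432.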
Using pgAdmin 4:
1- Right-click the default database "postgres"
2- Select "Create database" and give it a name, e.g. "newdatabase"
3- Click on "newdatabase" (to establish a connection)
4- Open the Query Tool
5- Import, write or paste your code
6- Run your code, e.g.: CREATE SCHEMA newschema;
It works for me.

How to pass arguments to .sql file from shell script

I have a .sh script that executes 4 .sql scripts, each against a schema. Currently the schema name is hardcoded, but I want to make it configurable.
Given the example below, how can I pass arguments from the shell script to the .sql files?
For example, a .sql file is called from my shell script like this:
echo "DELETING SCHEMA..."
psql -f "$SCRIPT_DIR/delete_data.sql" my_db postgres
echo "DATABASE SCHEMA DELETED."
delete_data.sql
drop schema my_schema cascade;
create schema my_schema;
You could replace the my_schema part with a placeholder, like %SCHEMA%:
drop schema %SCHEMA% cascade;
create schema %SCHEMA%;
We then run a substitution using sed and pipe the result into psql (reading from stdin is equivalent to reading from a file):
schemaName="$1"
sed "s/%SCHEMA%/$schemaName/g" "$SCRIPT_DIR/delete_data.sql" | psql my_db postgres
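The sed step can be wrapped in a small function that also rejects names sed or SQL would mishandle. Below is a minimal sketch. render_sql is a hypothetical helper, and the identifier check (letters, digits and underscores only) is an assumption about which schema names are acceptable. The function prints the SQL so it can be piped into psql:

```shell
#!/bin/sh
# Sketch: substitute %SCHEMA% in a SQL template and print the result.
# render_sql is a hypothetical helper; pipe its output into psql to run it.
set -eu

render_sql() {
  # $1 = SQL template file, $2 = schema name
  # Reject empty names and anything outside [A-Za-z0-9_], since characters
  # such as / or & would break the sed expression or the SQL.
  case "$2" in
    '' | *[!A-Za-z0-9_]*)
      echo "invalid schema name: $2" >&2
      return 1
      ;;
  esac
  # The g flag replaces every placeholder on a line, not only the first one.
  sed "s/%SCHEMA%/$2/g" "$1"
}
```

Usage: render_sql "$SCRIPT_DIR/delete_data.sql" "$1" | psql my_db postgres.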
You can also do this using a heredoc for your SQL:
my_schema="$1"
echo "DELETING SCHEMA..."
psql my_db postgres <<SQL
drop schema $my_schema cascade;
create schema $my_schema;
SQL
echo "DATABASE SCHEMA DELETED."
Then call your script with the schema name as the first argument:
$ ./my_script my_schema_name
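Another option, not used in either answer above, is psql's own variables. psql -v schema=... sets a variable, and psql replaces :"schema" inside the .sql file with its value quoted as an identifier, so no sed or heredoc is needed. Below is a minimal POSIX sh sketch, assuming the my_db/postgres connection from the question. write_template, delete_schema and run are hypothetical helper names, and DRY_RUN=1 prints the psql call. Because of the identifier quoting, the name is used case-sensitively (My_Schema stays My_Schema):

```shell
#!/bin/sh
# Sketch: keep the schema name out of the SQL text and let psql substitute it.
# write_template, delete_schema and run are hypothetical helpers;
# DRY_RUN=1 prints the psql call instead of running it.
set -eu

run() {
  # Print the command in dry-run mode, otherwise execute it as-is.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

write_template() {
  # $1 = path of the .sql file to create; the quoted 'SQL' delimiter keeps the
  # shell from touching :"schema", which psql interpolates later.
  cat > "$1" <<'SQL'
DROP SCHEMA :"schema" CASCADE;
CREATE SCHEMA :"schema";
SQL
}

delete_schema() {
  # $1 = SQL file, $2 = schema name handed to psql as the variable "schema"
  run psql -v ON_ERROR_STOP=1 -v schema="$2" -f "$1" my_db postgres
}
```

Usage, inside the calling script: delete_schema "$SCRIPT_DIR/delete_data.sql" "$1".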